// // Copyright (c) 2009-2011 Artyom Beilis (Tonkikh) // // Distributed under the Boost Software License, Version 1.0. (See // accompanying file LICENSE_1_0.txt or copy at // http://www.boost.org/LICENSE_1_0.txt) // // vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 filetype=cpp.doxygen /*! \page std_locales Introduction to C++ Standard Library localization support \section std_locales_basics Getting familiar with standard C++ Locales The C++ standard library offers a simple and powerful way to provide locale-specific information. It is done via the \c std::locale class, the container that holds all the required information about a specific culture, such as number formatting patterns, date and time formatting, currency, case conversion etc. All this information is provided by facets, special classes derived from the \c std::locale::facet base class. Such facets are packed into the \c std::locale class and allow you to provide arbitrary information about the locale. The \c std::locale class keeps reference counters on installed facets and can be efficiently copied. Each facet that was installed into the \c std::locale object can be fetched using the \c std::use_facet function. For example, the \c std::ctype facet provides rules for case conversion, so you can convert a character to upper-case like this: \code std::ctype const &ctype_facet = std::use_facet >(some_locale); char upper_a = ctype_facet.toupper('a'); \endcode A locale object can be imbued into an \c iostream so it would format information according to the locale: \code cout.imbue(std::locale("en_US.UTF-8")); cout << 1345.45 << endl; cout.imbue(std::locale("ru_RU.UTF-8")); cout << 1345.45 << endl; \endcode Would display: \verbatim 1,345.45 1.345,45 \endverbatim You can also create your own facets and install them into existing locale objects. For example: \code class measure : public std::locale::facet { public: typedef enum { inches, ... } measure_type; measure(measure_type m,size_t refs=0) double from_metric(double value) const; std::string name() const; ... }; \endcode And now you can simply provide this information to a locale: \code std::locale::global(std::locale(std::locale("en_US.UTF-8"),new measure(measure::inches))); /// Create default locale built from en_US locale and add paper size facet. \endcode Now you can print a distance according to the correct locale: \code void print_distance(std::ostream &out,double value) { measure const &m = std::use_facet(out.getloc()); // Fetch locale information from stream out << m.from_metric(value) << " " << m.name(); } \endcode This technique was adopted by the Boost.Locale library in order to provide powerful and correct localization. Instead of using the very limited C++ standard library facets, it uses ICU under the hood to create its own much more powerful ones. \section std_locales_common Common Critical Problems with the Standard Library There are numerous issues in the standard library that prevent the use of its full power, and there are several additional issues: - Setting the global locale has bad side effects. \n Consider following code: \n \code int main() { std::locale::global(std::locale("")); // Set system's default locale as global std::ofstream csv("test.csv"); csv << 1.1 << "," << 1.3 << std::endl; } \endcode \n What would be the content of \c test.csv ? It may be "1.1,1.3" or it may be "1,1,1,3" rather than what you had expected. \n More than that it affects even \c printf and libraries like \c boost::lexical_cast giving incorrect or unexpected formatting. In fact many third-party libraries are broken in such a situation. \n Unlike the standard localization library, Boost.Locale never changes the basic number formatting, even when it uses \c std based localization backends, so by default, numbers are always formatted using C-style locale. Localized number formatting requires specific flags. \n - Number formatting is broken on some locales. \n Some locales use the non-breakable space u00A0 character for thousands separator, thus in \c ru_RU.UTF-8 locale number 1024 should be displayed as "1 024" where the space is a Unicode character with codepoint u00A0. Unfortunately many libraries don't handle this correctly, for example GCC and SunStudio display a "\xC2" character instead of the first character in the UTF-8 sequence "\xC2\xA0" that represents this code point, and actually generate invalid UTF-8. \n - Locale names are not standardized. For example, under MSVC you need to provide the name \c en-US or \c English_USA.1252 , when on POSIX platforms it would be \c en_US.UTF-8 or \c en_US.ISO-8859-1 \n More than that, MSVC does not support UTF-8 locales at all. \n - Many standard libraries provide only the C and POSIX locales, thus GCC supports localization only under Linux. On all other platforms, attempting to create locales other than "C" or "POSIX" would fail. */