- //
- // Copyright (c) 2009-2011 Artyom Beilis (Tonkikh)
- //
- // Distributed under the Boost Software License, Version 1.0. (See
- // accompanying file LICENSE_1_0.txt or copy at
- // http://www.boost.org/LICENSE_1_0.txt)
- //
- // vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 filetype=cpp.doxygen
- /*!
- \page std_locales Introduction to C++ Standard Library localization support
- \section std_locales_basics Getting familiar with standard C++ Locales
- The C++ standard library offers a simple and powerful way to provide locale-specific information. It is done via the \c
- std::locale class, the container that holds all the required information about a specific culture, such as number formatting
- patterns, date and time formatting, currency, case conversion etc.
- All this information is provided by facets, special classes derived from the \c std::locale::facet base class. Such facets are
- packed into the \c std::locale class and allow you to provide arbitrary information about the locale. The \c std::locale class
- keeps reference counters on installed facets and can be efficiently copied.
- Each facet that was installed into the \c std::locale object can be fetched using the \c std::use_facet function. For example,
- the \c std::ctype<Char> facet provides rules for case conversion, so you can convert a character to upper-case like this:
- \code
- std::ctype<char> const &ctype_facet = std::use_facet<std::ctype<char> >(some_locale);
- char upper_a = ctype_facet.toupper('a');
- \endcode
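- The same facet can also convert a whole range of characters. The following is a minimal sketch, assuming
- a helper named \c to_upper (a hypothetical name used only for illustration) and the classic "C" locale, so
- it does not depend on which locales are installed on the system:

```cpp
#include <locale>
#include <string>

// Hypothetical helper: upper-case a string using the ctype facet of a locale.
// Defaults to the classic "C" locale, which is always available.
std::string to_upper(std::string text, std::locale const &loc = std::locale::classic())
{
    // Fetch the character classification facet installed in the locale.
    std::ctype<char> const &ct = std::use_facet<std::ctype<char> >(loc);
    if (!text.empty()) {
        // The range overload converts the characters in place.
        ct.toupper(&text[0], &text[0] + text.size());
    }
    return text;
}
```

- Because \c std::locale objects are reference counted, passing them by value or by reference as above is cheap.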
- A locale object can be imbued into an \c iostream so that the stream formats information according to that locale:
- \code
- std::cout.imbue(std::locale("en_US.UTF-8"));
- std::cout << 1345.45 << std::endl;
- std::cout.imbue(std::locale("ru_RU.UTF-8"));
- std::cout << 1345.45 << std::endl;
- \endcode
- This would display:
- \verbatim
- 1,345.45
- 1 345,45
- \endverbatim
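- The named locales above exist only if they are installed on the system. The effect of \c imbue can also be
- sketched without them by adding a number punctuation facet to the classic locale. The names \c grouped_punct,
- \c format_number and \c make_grouped_locale below are hypothetical and serve only as an illustration:

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet for illustration: English-style digit grouping,
// independent of the locales installed on the system.
class grouped_punct : public std::numpunct<char> {
protected:
    char do_decimal_point() const override { return '.'; }
    char do_thousands_sep() const override { return ','; }
    // Group digits in blocks of three.
    std::string do_grouping() const override { return "\3"; }
};

// Format a number through a stream imbued with the given locale.
std::string format_number(double value, std::locale const &loc)
{
    std::ostringstream out;
    out.imbue(loc);
    out << value;
    return out.str();
}

// Build a locale that is the classic "C" locale plus the facet above.
std::locale make_grouped_locale()
{
    return std::locale(std::locale::classic(), new grouped_punct);
}
```

- With these definitions, \c format_number(1345.45, make_grouped_locale()) yields "1,345.45", while the same
- call with \c std::locale::classic() yields "1345.45".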
- You can also create your own facets and install them into existing locale objects. For example:
- \code
- class measure : public std::locale::facet {
- public:
- typedef enum { inches, ... } measure_type;
- // Every facet needs a static id so that std::use_facet can locate it
- static std::locale::id id;
- measure(measure_type m,size_t refs=0);
- double from_metric(double value) const;
- std::string name() const;
- ...
- };
- \endcode
- You can then install this facet into a locale:
- \code
- // Create the global locale from the en_US locale and add the measure facet.
- std::locale::global(std::locale(std::locale("en_US.UTF-8"),new measure(measure::inches)));
- \endcode
- Now you can print a distance according to the correct locale:
- \code
- void print_distance(std::ostream &out,double value)
- {
- // Fetch locale information from the stream
- measure const &m = std::use_facet<measure>(out.getloc());
- out << m.from_metric(value) << " " << m.name();
- }
- \endcode
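- The fragments above can be combined into one complete example. This is only a sketch: the \c metric
- enumerator, the conversion factor, the unit names and the \c describe_distance helper are assumptions made
- for illustration, and the classic locale is used instead of \c en_US.UTF-8 so that the example does not
- depend on installed locales:

```cpp
#include <cstddef>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>

// A user-defined facet. The unit names and the conversion factor are
// illustrative assumptions, not part of any library.
class measure : public std::locale::facet {
public:
    enum measure_type { metric, inches };

    // Every facet needs a static id so that std::use_facet can locate it.
    static std::locale::id id;

    explicit measure(measure_type m, std::size_t refs = 0)
        : std::locale::facet(refs), type_(m) {}

    // Convert a length in meters to the unit of this facet.
    double from_metric(double meters) const
    {
        return type_ == inches ? meters / 0.0254 : meters;
    }
    std::string name() const
    {
        return type_ == inches ? "inches" : "meters";
    }
private:
    measure_type type_;
};

std::locale::id measure::id;

void print_distance(std::ostream &out, double meters)
{
    // Fetch the facet from the locale imbued into the stream.
    measure const &m = std::use_facet<measure>(out.getloc());
    out << m.from_metric(meters) << " " << m.name();
}

// Hypothetical helper: format a distance through a stream whose locale
// carries the measure facet.
std::string describe_distance(double meters, measure::measure_type unit)
{
    std::ostringstream out;
    out.imbue(std::locale(std::locale::classic(), new measure(unit)));
    print_distance(out, meters);
    return out.str();
}
```

- Note that \c std::use_facet throws \c std::bad_cast if the stream's locale does not contain a \c measure
- facet, so \c print_distance should only be called on streams whose locale was prepared this way.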
- This technique was adopted by the Boost.Locale library in order to provide powerful and correct localization. Instead of using
- the very limited C++ standard library facets, it uses ICU under the hood to create its own much more powerful ones.
- \section std_locales_common Common Critical Problems with the Standard Library
- Several serious issues in the standard library prevent you from using its full power:
- - Setting the global locale has bad side effects.
- \n
- Consider the following code:
- \n
- \code
- int main()
- {
- // Set system's default locale as global
- std::locale::global(std::locale(""));
- std::ofstream csv("test.csv");
- csv << 1.1 << "," << 1.3 << std::endl;
- }
- \endcode
- \n
- What would be the content of \c test.csv ? It may be "1.1,1.3", or it may be "1,1,1,3" if the
- system locale uses a comma as the decimal separator, which is probably not what you expected.
- \n
- Moreover, it affects even \c printf and libraries like \c boost::lexical_cast, giving
- incorrect or unexpected formatting. In fact, many third-party libraries break in such a
- situation.
- \n
- Unlike the standard localization library, Boost.Locale never changes the basic number formatting,
- even when it uses \c std based localization backends, so by default, numbers are always
- formatted using the C locale. Localized number formatting requires specific flags.
- \n
- - Number formatting is broken on some locales.
- \n
- Some locales use the non-breaking space character U+00A0 as the thousands separator, so
- in the \c ru_RU.UTF-8 locale the number 1024 should be displayed as "1 024", where the space
- is the Unicode character with code point U+00A0. Unfortunately, many libraries don't handle
- this correctly. For example, GCC and SunStudio emit only "\xC2", the first byte
- of the UTF-8 sequence "\xC2\xA0" that represents this code point, and
- therefore generate invalid UTF-8.
- \n
- Locale names are not standardized. For example, under MSVC you need to provide the name
- \c en-US or \c English_USA.1252 , whereas on POSIX platforms it would be \c en_US.UTF-8
- or \c en_US.ISO-8859-1 .
- \n
- Moreover, MSVC does not support UTF-8 locales at all.
- \n
- Many standard libraries provide only the C and POSIX locales. For example, GCC supports localization
- only under Linux; on all other platforms, attempting to create a locale other than "C" or
- "POSIX" fails.
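- \n
- The first problem in the list, the side effect of the global locale, can be reproduced without relying on
- the system's locales. The sketch below installs a hypothetical \c comma_decimal facet globally and writes a
- CSV row twice: once through a stream that inherits the global locale, and once through a stream explicitly
- imbued with \c std::locale::classic(). The helper names are assumptions made for illustration, and imbuing
- the classic locale is shown as one possible mitigation in plain standard C++, not as what Boost.Locale does
- internally:

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet standing in for a locale whose decimal point is a comma.
class comma_decimal : public std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
};

// Write two values separated by a comma, as a CSV writer would.
// If protect is true, the stream is pinned to the classic "C" locale.
std::string write_csv_row(bool protect)
{
    std::ostringstream csv; // picks up the current global locale
    if (protect) {
        csv.imbue(std::locale::classic());
    }
    csv << 1.1 << "," << 1.3;
    return csv.str();
}

// Install the comma-decimal locale globally for the duration of one call,
// then restore the previous global locale.
std::string write_csv_row_under_comma_locale(bool protect)
{
    std::locale previous = std::locale::global(
        std::locale(std::locale::classic(), new comma_decimal));
    std::string row = write_csv_row(protect);
    std::locale::global(previous);
    return row;
}
```

- Under these assumptions the unprotected call produces "1,1,1,3" and the protected call produces "1.1,1.3".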
- */