//
// Copyright (c) 2009-2011 Artyom Beilis (Tonkikh)
//
// Distributed under the Boost Software License, Version 1.0. (See
// accompanying file LICENSE_1_0.txt or copy at
// http://www.boost.org/LICENSE_1_0.txt)
//
// vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 filetype=cpp.doxygen
/*!
\page std_locales Introduction to C++ Standard Library localization support

\section std_locales_basics Getting familiar with standard C++ Locales

The C++ standard library offers a simple and powerful way to provide locale-specific information. It is done via the
\c std::locale class, the container that holds all the required information about a specific culture, such as number formatting
patterns, date and time formatting, currency, case conversion, etc.

All this information is provided by facets, special classes derived from the \c std::locale::facet base class. Such facets are
packed into the \c std::locale class and allow you to provide arbitrary information about the locale. The \c std::locale class
keeps reference counters on installed facets and can be efficiently copied.

Each facet that was installed into the \c std::locale object can be fetched using the \c std::use_facet function. For example,
the \c std::ctype<Char> facet provides rules for case conversion, so you can convert a character to upper-case like this:

\code
    std::ctype<char> const &ctype_facet = std::use_facet<std::ctype<char> >(some_locale);
    char upper_a = ctype_facet.toupper('a');
\endcode
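
The same lookup can be wrapped in a small helper. The following is a minimal sketch, not part of the library: the name \c to_upper is hypothetical, and the classic "C" locale is used in the usage note only because it is available on every platform.

```cpp
#include <locale>
#include <string>

// Upper-case every character of `text` using the std::ctype<char> facet
// installed in `loc`. The helper name `to_upper` is hypothetical.
std::string to_upper(std::string text, std::locale const &loc)
{
    // Fetch the facet once and reuse it for every character.
    std::ctype<char> const &ctype_facet = std::use_facet<std::ctype<char> >(loc);
    for (std::string::size_type i = 0; i < text.size(); ++i)
        text[i] = ctype_facet.toupper(text[i]);
    return text;
}
```

Calling <tt>to_upper("hello", std::locale::classic())</tt> applies the case rules of the "C" locale; passing a different locale object switches the rules without changing the helper.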

A locale object can be imbued into an \c iostream so that the stream formats information according to that locale:

\code
    std::cout.imbue(std::locale("en_US.UTF-8"));
    std::cout << 1345.45 << std::endl;
    std::cout.imbue(std::locale("ru_RU.UTF-8"));
    std::cout << 1345.45 << std::endl;
\endcode

This would display something like the following (the exact separators depend on the platform's locale data):

\verbatim
1,345.45
1.345,45
\endverbatim
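
Named locales such as \c en_US.UTF-8 may be missing on a given system, so the effect of \c imbue can also be sketched with a locale built from the classic one plus a custom \c std::numpunct facet. The class \c dot_grouping and both helper functions below are hypothetical names chosen for illustration.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: '.' separates thousands and ',' marks the decimal point.
class dot_grouping : public std::numpunct<char> {
protected:
    virtual char do_thousands_sep() const { return '.'; }
    virtual char do_decimal_point() const { return ','; }
    virtual std::string do_grouping() const { return "\3"; } // groups of 3 digits
};

// Format `value` through a stream that has been imbued with `loc`.
std::string format_number(double value, std::locale const &loc)
{
    std::ostringstream out;
    out.imbue(loc);   // the stream now formats numbers with the facets of loc
    out << value;
    return out.str();
}

// Copy the classic locale and replace its numpunct facet.
std::locale make_dot_grouping_locale()
{
    // The locale takes ownership of the facet because refs defaults to 0.
    return std::locale(std::locale::classic(), new dot_grouping);
}
```

With the classic locale, \c format_number leaves the number ungrouped; with the locale returned by \c make_dot_grouping_locale, the same stream code produces grouped digits and a decimal comma.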

You can also create your own facets and install them into existing locale objects. For example:

\code
    class measure : public std::locale::facet {
    public:
        typedef enum { inches, ... } measure_type;
        // Every facet needs a static id so that std::use_facet can find it
        static std::locale::id id;
        measure(measure_type m,size_t refs=0);
        double from_metric(double value) const;
        std::string name() const;
        ...
    };
\endcode

And now you can simply provide this information to a locale:

\code
    // Create the default locale built from the en_US locale and add the measure facet.
    std::locale::global(std::locale(std::locale("en_US.UTF-8"),new measure(measure::inches)));
\endcode

Now you can print a distance according to the correct locale:

\code
    void print_distance(std::ostream &out,double value)
    {
        // Fetch locale information from the stream
        measure const &m = std::use_facet<measure>(out.getloc());
        out << m.from_metric(value) << " " << m.name();
    }
\endcode
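
Put together, the facet, its installation and its use might look like the sketch below. Several details are assumptions made only to obtain a complete example: the metric input is taken to be in centimeters, the \c centimeters enumerator, the unit names "in" and "cm", and the helper \c distance_in_inches are all hypothetical, and the facet is added to the classic locale rather than to \c en_US.UTF-8 so that the code does not depend on installed system locales.

```cpp
#include <cstddef>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>

class measure : public std::locale::facet {
public:
    typedef enum { centimeters, inches } measure_type;
    static std::locale::id id;   // identifies this facet type inside a locale

    explicit measure(measure_type m, std::size_t refs = 0)
        : std::locale::facet(refs), type_(m) {}

    // Assumption: `value` is a length in centimeters.
    double from_metric(double value) const
    {
        return type_ == inches ? value / 2.54 : value;
    }
    std::string name() const { return type_ == inches ? "in" : "cm"; }

private:
    measure_type type_;
};

std::locale::id measure::id;     // definition of the static member

void print_distance(std::ostream &out, double value)
{
    // Fetch locale information from the stream
    measure const &m = std::use_facet<measure>(out.getloc());
    out << m.from_metric(value) << " " << m.name();
}

// Hypothetical helper: format a length given in centimeters as inches.
std::string distance_in_inches(double cm)
{
    std::ostringstream out;
    out.imbue(std::locale(std::locale::classic(), new measure(measure::inches)));
    print_distance(out, cm);
    return out.str();
}
```

Note that \c std::use_facet throws \c std::bad_cast when the stream's locale does not contain a \c measure facet, so a caller that cannot guarantee this may want to check with \c std::has_facet first.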

This technique was adopted by the Boost.Locale library in order to provide powerful and correct localization. Instead of using
the very limited C++ standard library facets, it uses ICU under the hood to create its own much more powerful ones.

\section std_locales_common Common Critical Problems with the Standard Library

Several issues in the standard library prevent the use of its full power:

-   Setting the global locale has bad side effects.
    \n
    Consider the following code:
    \n
    \code
    #include <fstream>
    #include <locale>

    int main()
    {
        // Set the system's default locale as global
        std::locale::global(std::locale(""));
        std::ofstream csv("test.csv");
        csv << 1.1 << "," << 1.3 << std::endl;
    }
    \endcode
    \n
    What would be the content of \c test.csv ? It may be "1.1,1.3", or it may be "1,1,1,3",
    which is probably not what you expected.
    \n
    Moreover, it even affects \c printf and libraries like \c boost::lexical_cast, giving
    incorrect or unexpected formatting. In fact, many third-party libraries are broken in such a
    situation.
    \n
    Unlike the standard localization library, Boost.Locale never changes the basic number formatting,
    even when it uses \c std based localization backends, so by default, numbers are always
    formatted using the C-style locale. Localized number formatting requires specific flags.
    \n
-   Number formatting is broken on some locales.
    \n
    Some locales use the non-breaking space character U+00A0 as the thousands separator, thus
    in the \c ru_RU.UTF-8 locale the number 1024 should be displayed as "1 024", where the space
    is the Unicode character with code point U+00A0. Unfortunately, many libraries don't handle
    this correctly. For example, GCC and SunStudio emit only "\xC2", the first byte of
    the UTF-8 sequence "\xC2\xA0" that represents this code point, and
    so actually generate invalid UTF-8.
    \n
-   Locale names are not standardized. For example, under MSVC you need to provide the name
    \c en-US or \c English_USA.1252 , whereas on POSIX platforms it would be \c en_US.UTF-8
    or \c en_US.ISO-8859-1 .
    \n
    Moreover, MSVC does not support UTF-8 locales at all.
    \n
-   Many standard libraries provide only the C and POSIX locales. GCC, for example, supports localization
    only under Linux. On all other platforms, attempting to create locales other than "C" or
    "POSIX" would fail.
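
The first problem in the list above can be reproduced without relying on any installed system locale. The sketch below is an illustration under stated assumptions: the \c decimal_comma facet is a hypothetical stand-in for a system locale that uses a decimal comma, and both function names are invented for this example. Because the resulting locale has no name, this sketch does not reproduce the effect on \c printf.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical stand-in for a system locale that uses a decimal comma.
class decimal_comma : public std::numpunct<char> {
protected:
    virtual char do_decimal_point() const { return ','; }
};

// Write one CSV row; optionally pin the stream to the classic "C" locale.
std::string write_csv_row(bool force_classic)
{
    std::ostringstream csv;              // picks up the current global locale
    if (force_classic)
        csv.imbue(std::locale::classic());
    csv << 1.1 << "," << 1.3;
    return csv.str();
}

// Run write_csv_row while a decimal-comma locale is global,
// then restore the previous global locale.
std::string row_under_comma_locale(bool force_classic)
{
    std::locale previous = std::locale::global(
        std::locale(std::locale::classic(), new decimal_comma));
    std::string row = write_csv_row(force_classic);
    std::locale::global(previous);
    return row;
}
```

Streams created after the call to \c std::locale::global silently inherit the decimal comma, while a stream that is explicitly imbued with \c std::locale::classic() keeps producing machine-readable output.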
*/