//
// Copyright (c) 2009-2011 Artyom Beilis (Tonkikh)
//
// Distributed under the Boost Software License, Version 1.0. (See
// accompanying file LICENSE_1_0.txt or copy at
// http://www.boost.org/LICENSE_1_0.txt)
//
// vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 filetype=cpp.doxygen
/*!
\page conversions Text Conversions

A set of functions performs basic string conversion operations:
upper, lower and \ref term_title_case "title case" conversions, \ref term_case_folding "case folding"
and Unicode \ref term_normalization "normalization". These are \ref boost::locale::to_upper "to_upper", \ref boost::locale::to_lower "to_lower", \ref boost::locale::to_title "to_title", \ref boost::locale::fold_case "fold_case" and \ref boost::locale::normalize "normalize".

Each of these functions takes a \c std::locale object as a parameter and uses the global locale if none is given.
The global locale is used in all examples below.

\section conversions_case Case Handling

For example:

\code
    std::string grussen = "grüßEN";
    std::cout << "Upper " << boost::locale::to_upper(grussen)  << std::endl
              << "Lower " << boost::locale::to_lower(grussen)  << std::endl
              << "Title " << boost::locale::to_title(grussen)  << std::endl
              << "Fold "  << boost::locale::fold_case(grussen) << std::endl;
\endcode

This would print:

\verbatim
Upper GRÜSSEN
Lower grüßen
Title Grüßen
Fold grüssen
\endverbatim

You may notice that the Boost.StringAlgo library already provides \c to_upper and \c to_lower functions.
The difference is that the Boost.Locale functions operate on the entire string instead of performing incorrect character-by-character conversions.

For example:

\code
    std::wstring grussen = L"grüßen";
    std::wcout << boost::algorithm::to_upper_copy(grussen) << " " << boost::locale::to_upper(grussen) << std::endl;
\endcode

This would output:

\verbatim
GRÜßEN GRÜSSEN
\endverbatim

In the first case the letter "ß" was not converted to "SS" because of a limitation of the \c std::ctype facet, which maps each character to exactly one character.
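
The limitation comes from the facet's interface: its \c toupper takes one character and returns one character, so it cannot return the two-character result "SS". The following is a minimal sketch that uses only the standard library. The helper name \c upper_one is hypothetical, and the classic "C" locale is assumed:

```cpp
#include <locale>

// Hypothetical helper: upper-case a single wide character through the
// std::ctype<wchar_t> facet of the classic "C" locale.
// The return type is one wchar_t, so a one-to-two mapping such as
// "ß" -> "SS" cannot be expressed through this interface.
wchar_t upper_one(wchar_t c)
{
    return std::toupper(c, std::locale::classic());
}
```

With this helper, \c upper_one(L'a') gives \c L'A', while \c upper_one(L'\\u00df') returns "ß" unchanged.
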
The problem is worse with UTF-8 encoded strings, where characters outside US-ASCII are not converted at all.

For example, this code

\code
    std::string grussen = "grüßen";
    std::cout << boost::algorithm::to_upper_copy(grussen) << " " << boost::locale::to_upper(grussen) << std::endl;
\endcode

would modify only the ASCII characters:

\verbatim
GRüßEN GRÜSSEN
\endverbatim

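
The reason is that in UTF-8 "ü" and "ß" each occupy two bytes, and no single byte of such a sequence is a letter that a per-character conversion can recognize. The sketch below illustrates the mechanism with the standard library only. The helper name \c upper_bytewise is hypothetical, the classic "C" locale is assumed, and it is an illustration rather than the actual Boost.StringAlgo implementation:

```cpp
#include <locale>
#include <string>

// Hypothetical helper: upper-case a narrow string one byte at a time.
// For UTF-8 input this is what a character-by-character algorithm
// amounts to, because each byte is handed to std::ctype<char> alone.
std::string upper_bytewise(std::string s)
{
    const std::locale& c_locale = std::locale::classic();
    for (char& ch : s) {
        // Bytes >= 0x80 (parts of a multi-byte sequence) are not letters
        // in the "C" locale and are returned unchanged.
        ch = std::toupper(ch, c_locale);
    }
    return s;
}
```

Passing the UTF-8 bytes of "grüßen" to this helper changes only "g", "r", "e" and "n", leaving the four bytes that encode "üß" untouched.
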
\section conversions_normalization Unicode Normalization

Unicode normalization is the process of converting strings to a standard form, suitable for text processing and
comparison. For example, character "ü" can be represented by a single code point or a combination of the character "u" and the
diaeresis "¨". Normalization is an important part of Unicode text processing.
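
To make the two representations concrete, the sketch below spells out their UTF-8 bytes using only the standard library. The function names are hypothetical, and the combining form is assumed to use the combining diaeresis U+0308:

```cpp
#include <string>

// "ü" as the single precomposed code point U+00FC, encoded in UTF-8.
std::string precomposed_u_umlaut()
{
    return "\xC3\xBC";
}

// "ü" as "u" followed by the combining diaeresis U+0308, encoded in UTF-8.
std::string decomposed_u_umlaut()
{
    return "u\xCC\x88";
}
```

Both strings display as "ü", but they differ in length (2 bytes versus 3 bytes) and compare as unequal, so a plain comparison treats them as different text until both are brought to the same normalization form.
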

Unicode defines four normalization forms. Each form is selected by a flag passed
to the \ref boost::locale::normalize() "normalize" function:

- NFD - Canonical decomposition - boost::locale::norm_nfd
- NFC - Canonical decomposition followed by canonical composition - boost::locale::norm_nfc or boost::locale::norm_default
- NFKD - Compatibility decomposition - boost::locale::norm_nfkd
- NFKC - Compatibility decomposition followed by canonical composition - boost::locale::norm_nfkc

For more details on normalization forms, read <a href="http://unicode.org/reports/tr15/#Norm_Forms">this article</a>.

\section conversions_notes Notes

- \ref boost::locale::normalize() "normalize" operates only on Unicode-encoded strings, that is, UTF-8, UTF-16 or UTF-32, depending on the
  character width. Be careful when using non-UTF encodings, as they may be treated incorrectly.
- \ref boost::locale::fold_case() "fold_case" is generally a locale-independent operation, but it receives a locale as a parameter to
  determine the 8-bit encoding.
- All of these functions can work with an STL string, a NUL-terminated string, or a range defined by two pointers. They always
  return a newly created STL string.
- The length of the string may change, as in the conversion of "ß" to "SS" in the example above.

*/