<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!--
== Copyright (c) 2001 Ronald Garcia
==
== Permission to use, copy, modify, distribute and sell this software
== and its documentation for any purpose is hereby granted without fee,
== provided that the above copyright notice appears in all copies and
== that both that copyright notice and this permission notice appear
== in supporting documentation. Ronald Garcia makes no
== representations about the suitability of this software for any
== purpose. It is provided "as is" without express or implied warranty.
-->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
<link rel="stylesheet" type="text/css" href="style.css">
<title>UTF-8 Codecvt Facet</title>
</head>
<body bgcolor="#ffffff" link="#0000ee" text="#000000"
vlink="#551a8b" alink="#ff0000">
<img src="../../../boost.png" alt="C++ Boost"
width="277" height="86"> <br clear="all">
<a name="sec:utf8-codecvt-facet-class"></a>
<h1><code>utf8_codecvt_facet</code></h1>
<pre>
template&lt;
    typename InternType = wchar_t,
    typename ExternType = char
&gt; utf8_codecvt_facet
</pre>
<h2>Rationale</h2>
UTF-8 is a method of encoding Unicode text for environments
where data is stored as 8-bit characters, where some ASCII characters
are considered special (e.g. in Unix filesystem filenames), and where
ASCII characters tend to appear more often than other characters. While
UTF-8 is convenient and efficient for storing data on filesystems,
it was not meant to be manipulated in memory by
applications. Some applications (such as Unix's 'cat') can
simply ignore the encoding of data, but others should convert
from UTF-8 to UCS-4 (a fixed-width representation of Unicode)
when reading from a file, and reverse the process when writing out to
a file.
<p>The C++ Standard IOStreams library provides the <tt>std::codecvt</tt>
facet to handle exactly these cases. On reading from or
writing to a file, <tt>std::basic_filebuf</tt> can call out to
the codecvt facet to convert data from its external
format (e.g. UTF-8) to its internal format (e.g. UCS-4) and
vice versa. <tt>utf8_codecvt_facet</tt> is a
<tt>std::codecvt</tt> facet specifically designed to handle the case
of translating between UTF-8 and UCS-4.
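<p>To make the external format concrete, the following is a minimal sketch
of how a single UCS-4 code point maps onto one to four UTF-8 octets. The
function <tt>encode_utf8</tt> is a hypothetical helper written for this
illustration only; it is not part of <tt>utf8_codecvt_facet</tt> and does not
claim to mirror the facet's implementation. It assumes its input is at most
0x10FFFF and does not reject surrogate values.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper for illustration; not part of utf8_codecvt_facet.
// Encodes one UCS-4 code point (assumed <= 0x10FFFF) as UTF-8 octets.
std::string encode_utf8(std::uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {
        // 1 octet: 0xxxxxxx (plain ASCII is unchanged)
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 octets: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 octets: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 octets: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

<p>For instance, <tt>encode_utf8(0x41)</tt> yields the single octet for
'A', while <tt>encode_utf8(0x20AC)</tt> yields three octets. A codecvt facet
performs this kind of mapping, and its inverse, on whole buffers as the file
buffer reads and writes.</p>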
<h2>Template Parameters</h2>
<table border summary="template parameters">
<tr>
<th>Parameter</th><th>Description</th><th>Default</th>
</tr>
<tr>
<td><tt>InternType</tt></td>
<td>The internal type used to represent UCS-4 characters.</td>
<td><tt>wchar_t</tt></td>
</tr>
<tr>
<td><tt>ExternType</tt></td>
<td>The external type used to represent UTF-8 octets.</td>
<td><tt>char</tt></td>
</tr>
</table>
<h2>Requirements</h2>
<tt>utf8_codecvt_facet</tt> defaults to using <tt>char</tt> as
its external data type and <tt>wchar_t</tt> as its internal
data type, but on some architectures <tt>wchar_t</tt> is
not large enough to hold UCS-4 characters. To use
another internal type, you must also specialize <tt>std::codecvt</tt>
to handle your internal and external types.
(<tt>std::codecvt&lt;wchar_t,char,std::mbstate_t&gt;</tt> is required to be
supplied by any standard-conforming implementation.)
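<p>Whether <tt>wchar_t</tt> is wide enough can be checked before relying on
the default internal type. The following is a minimal sketch using only the
standard library; the helper names <tt>fits_in_wchar_t</tt> and
<tt>to_wchar_t</tt> are hypothetical and are not provided by this library.</p>

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: true when code point cp is representable in wchar_t.
inline bool fits_in_wchar_t(unsigned long cp)
{
    return cp <= static_cast<unsigned long>(std::numeric_limits<wchar_t>::max());
}

// Hypothetical helper: narrow a code point to wchar_t, asserting that
// no information is lost on this platform.
inline wchar_t to_wchar_t(unsigned long cp)
{
    assert(fits_in_wchar_t(cp));
    return static_cast<wchar_t>(cp);
}
```

<p>On a platform where <tt>fits_in_wchar_t(0x10FFFF)</tt> is false, the default
<tt>InternType</tt> cannot hold every UCS-4 character, and another internal
type is needed as described above.</p>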
<h2>Example Use</h2>
The following is a simple example of using this facet:
<pre>
  //...
  // My encoding type
  typedef wchar_t ucs4_t;

  // Build a locale that uses the UTF-8 facet for code conversion
  std::locale old_locale;
  std::locale utf8_locale(old_locale, new utf8_codecvt_facet&lt;ucs4_t&gt;);

  // Set a new global locale
  std::locale::global(utf8_locale);

  // Send the UCS-4 data out, converting to UTF-8.
  // ucs4_data is assumed to be a container of ucs4_t filled elsewhere.
  {
    std::wofstream ofs("data.ucd");
    ofs.imbue(utf8_locale);
    std::copy(ucs4_data.begin(), ucs4_data.end(),
              std::ostream_iterator&lt;ucs4_t, ucs4_t&gt;(ofs));
  }

  // Read the UTF-8 data back in, converting to UCS-4 on the way in.
  // Note that operator&gt;&gt; skips whitespace characters by default.
  std::vector&lt;ucs4_t&gt; from_file;
  {
    std::wifstream ifs("data.ucd");
    ifs.imbue(utf8_locale);
    ucs4_t item = 0;
    while (ifs &gt;&gt; item) from_file.push_back(item);
  }
  //...
</pre>
<h2>History</h2>
This code was originally written as an iterator adaptor over
containers for use with UTF-8 encoded strings in memory.
Dietmar Kuehl suggested that it would be better provided as a
codecvt facet.
<h2>Resources</h2>
<ul>
<li> <a href="http://www.unicode.org">Unicode Homepage</a>
<li> <a href="http://home.CameloT.de/langer/iostreams.htm">Standard
C++ IOStreams and Locales</a>
<li> <a href="http://www.research.att.com/~bs/3rd.html">The C++
Programming Language Special Edition, Appendix D.</a>
</ul>
<br>
<hr>
<table summary="Copyright information">
<tr valign="top">
<td nowrap>Copyright &copy; 2001</td>
<td><a href="http://www.osl.iu.edu/~garcia">Ronald Garcia</a>,
Indiana University
(<a href="mailto:garcia@cs.indiana.edu">garcia@osl.iu.edu</a>)<br>
<a href="http://www.osl.iu.edu/~lums">Andrew Lumsdaine</a>,
Indiana University
(<a href="mailto:lums@osl.iu.edu">lums@osl.iu.edu</a>)</td>
</tr>
</table>
<p><i>&copy; Copyright <a href="http://www.rrsd.com">Robert Ramey</a> 2002-2004.
Distributed under the Boost Software License, Version 1.0. (See
accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
</i></p>
</body>
</html>