<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
<title>Unicode and Boost.Regex</title>
<link rel="stylesheet" href="../../../../../doc/src/boostbook.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
<link rel="home" href="../index.html" title="Boost.Regex 5.1.4">
<link rel="up" href="../index.html" title="Boost.Regex 5.1.4">
<link rel="prev" href="intro.html" title="Introduction and Overview">
<link rel="next" href="captures.html" title="Understanding Marked Sub-Expressions and Captures">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table cellpadding="2" width="100%"><tr>
<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../boost.png"></td>
<td align="center"><a href="../../../../../index.html">Home</a></td>
<td align="center"><a href="../../../../../libs/libraries.htm">Libraries</a></td>
<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
<td align="center"><a href="../../../../../more/index.htm">More</a></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="intro.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="captures.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="boost_regex.unicode"></a><a class="link" href="unicode.html" title="Unicode and Boost.Regex">Unicode and Boost.Regex</a>
</h2></div></div></div>
<p>
There are two ways to use Boost.Regex with Unicode strings:
</p>
<h5>
<a name="boost_regex.unicode.h0"></a>
<span class="phrase"><a name="boost_regex.unicode.rely_on_wchar_t"></a></span><a class="link" href="unicode.html#boost_regex.unicode.rely_on_wchar_t">Rely
on wchar_t</a>
</h5>
<p>
If your platform's <code class="computeroutput"><span class="keyword">wchar_t</span></code> type
can hold Unicode strings, and your platform's C/C++ runtime correctly classifies
wide characters (when they are passed to <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">iswspace</span></code>,
<code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">iswlower</span></code>, etc.), then you can use <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">wregex</span></code>
to process Unicode. However, this approach has several disadvantages:
</p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem">
It's not portable: there's no guarantee on the width of <code class="computeroutput"><span class="keyword">wchar_t</span></code>,
or even on whether the runtime treats wide characters as Unicode at all. Most
Windows compilers do so, but many Unix systems do not.
</li>
<li class="listitem">
There's no support for Unicode-specific character classes such as <code class="computeroutput"><span class="special">[[:</span><span class="identifier">Nd</span><span class="special">:]]</span></code> and <code class="computeroutput"><span class="special">[[:</span><span class="identifier">Po</span><span class="special">:]]</span></code>.
</li>
<li class="listitem">
You can only search strings that are encoded as sequences of wide characters;
it is not possible to search UTF-8 text, or even UTF-16 text on many platforms.
</li>
</ul></div>
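<p>
The portability concern in the first item can be probed with the standard library alone.
The following is a minimal sketch, not part of Boost.Regex: the helper names
<code class="computeroutput">wchar_holds_any_code_point</code> and <code class="computeroutput">runtime_treats_e_acute_as_lower</code>
are hypothetical, and the second probe depends on the locale that is active when it runs,
so no particular result is claimed for it.
</p>

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t
#include <cwctype>   // std::iswlower, std::wint_t

// The largest Unicode code point is U+10FFFF, which needs 21 bits.
constexpr std::size_t unicode_bits = 21;

// Hypothetical helper: true when a single wchar_t is wide enough to hold
// any Unicode code point. A 16-bit wchar_t fails this check; a 32-bit
// wchar_t passes it.
constexpr bool wchar_holds_any_code_point()
{
    return sizeof(wchar_t) * CHAR_BIT >= unicode_bits;
}

// Hypothetical helper: asks the C runtime whether it classifies
// U+00E9 (LATIN SMALL LETTER E WITH ACUTE) as lower case.
// The answer depends on the runtime and on the current locale.
bool runtime_treats_e_acute_as_lower()
{
    return std::iswlower(static_cast<std::wint_t>(L'\u00E9')) != 0;
}
```

<p>
A program could call both helpers at start-up and fall back to a Unicode-aware
regular expression type when either returns false. Passing the width check does not
by itself show that the runtime interprets wide characters as Unicode.
</p>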
<h5>
<a name="boost_regex.unicode.h1"></a>
<span class="phrase"><a name="boost_regex.unicode.use_a_unicode_aware_regular_expr"></a></span><a class="link" href="unicode.html#boost_regex.unicode.use_a_unicode_aware_regular_expr">Use
a Unicode-Aware Regular Expression Type</a>
</h5>
<p>
If you have the <a href="http://www.ibm.com/software/globalization/icu/" target="_top">ICU
library</a>, then Boost.Regex can be <a class="link" href="install.html#boost_regex.install.building_with_unicode_and_icu_su">configured
to make use of it</a>. It then provides a distinct regular expression type, <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">u32regex</span></code>,
that supports both Unicode-specific character properties and the searching
of text that is encoded in UTF-8, UTF-16, or UTF-32. See: <a class="link" href="ref/non_std_strings/icu.html" title="Working With Unicode and ICU String Types">ICU
string class support</a>.
</p>
</div>
<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
<td align="left"></td>
<td align="right"><div class="copyright-footer">Copyright &#169; 1998-2013 John Maddock<p>
Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
</p>
</div></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="intro.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="captures.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
</body>
</html>