- [/
- / Copyright (c) 2008 Eric Niebler
- /
- / Distributed under the Boost Software License, Version 1.0. (See accompanying
- / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
- /]
- [section Localization and Regex Traits]
- [h2 Overview]
- Matching a regular expression against a string often requires locale-dependent information. For example,
- how are case-insensitive comparisons performed? The locale-sensitive behavior is captured in a traits class.
- xpressive provides three traits class templates: `cpp_regex_traits<>`, `c_regex_traits<>` and `null_regex_traits<>`.
- The first wraps a `std::locale`, the second wraps the global C locale, and the third is a stub traits type for
- use when searching non-character data. All traits templates conform to the
- [link boost_xpressive.user_s_guide.concepts.traits_requirements Regex Traits Concept].
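The kind of locale-dependent question a traits class has to answer can be illustrated with the standard library alone. The following is a minimal sketch, not xpressive code: `equal_nocase` is a hypothetical helper that folds case through the `std::ctype<char>` facet of a given `std::locale`, the same kind of facility a `std::locale`-based traits type can draw on.

```cpp
#include <locale>
#include <string>

// Hypothetical helper (not part of xpressive): compares two strings
// case-insensitively using the std::ctype<char> facet of the given locale.
bool equal_nocase(std::string const &a, std::string const &b, std::locale const &loc)
{
    if (a.size() != b.size())
        return false;

    std::ctype<char> const &ct = std::use_facet<std::ctype<char> >(loc);
    for (std::string::size_type i = 0; i < a.size(); ++i)
    {
        // The facet, not the caller, decides what "lower case" means.
        if (ct.tolower(a[i]) != ct.tolower(b[i]))
            return false;
    }
    return true;
}
```

Calling `equal_nocase("Hello", "hELLO", std::locale::classic())` uses the classic "C" locale; passing a different locale may change the outcome for characters outside ASCII.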
- [h2 Setting the Default Regex Trait]
- By default, xpressive uses `cpp_regex_traits<>` for all patterns. This causes all regex objects to use
- the global `std::locale`. If you compile with `BOOST_XPRESSIVE_USE_C_TRAITS` defined, then xpressive will use
- `c_regex_traits<>` by default.
- [h2 Using Custom Traits with Dynamic Regexes]
- To create a dynamic regex that uses a custom traits object, you must use _regex_compiler_.
- The basic steps are shown in the following example:
-     // Declare a regex_compiler that uses the global C locale
-     regex_compiler<char const *, c_regex_traits<char> > crxcomp;
-     cregex crx = crxcomp.compile( "\\w+" );
-     // Declare a regex_compiler that uses a custom std::locale
-     std::locale loc = /* ... create a locale here ... */;
-     regex_compiler<char const *, cpp_regex_traits<char> > cpprxcomp(loc);
-     cregex cpprx = cpprxcomp.compile( "\\w+" );
- The `regex_compiler` objects act as regex factories. Once they have been imbued with a locale,
- every regex object they create will use that locale.
- [h2 Using Custom Traits with Static Regexes]
- If you want a particular static regex to use a different set of traits, you can use the special `imbue()`
- pattern modifier. For instance:
-     // Define a regex that uses the global C locale
-     c_regex_traits<char> ctraits;
-     sregex crx = imbue(ctraits)( +_w );
-     // Define a regex that uses a customized std::locale
-     std::locale loc = /* ... create a locale here ... */;
-     cpp_regex_traits<char> cpptraits(loc);
-     sregex cpprx1 = imbue(cpptraits)( +_w );
-     // A shorthand for above
-     sregex cpprx2 = imbue(loc)( +_w );
- The `imbue()` pattern modifier must wrap the entire pattern. It is an error to `imbue` only
- part of a static regex. For example:
-     // ERROR! Cannot imbue() only part of a regex
-     sregex error = _w >> imbue(loc)( _w );
- [h2 Searching Non-Character Data With [^null_regex_traits]]
- With xpressive static regexes, you are not limited to searching for patterns in character sequences.
- You can search for patterns in raw bytes, integers, or anything that conforms to the
- [link boost_xpressive.user_s_guide.concepts.chart_requirements Char Concept]. The `null_regex_traits<>` template makes this simple. It is a
- stub implementation of the [link boost_xpressive.user_s_guide.concepts.traits_requirements Regex Traits Concept]. It recognizes
- no character classes and performs no case mappings.
- For example, with `null_regex_traits<>`, you can write a static regex to find a pattern in a
- sequence of integers as follows:
-     // some integral data to search
-     int const data[] = {0, 1, 2, 3, 4, 5, 6};
-     // create a null_regex_traits<> object for searching integers ...
-     null_regex_traits<int> nul;
-     // imbue a regex object with the null_regex_traits ...
-     basic_regex<int const *> rex = imbue(nul)(1 >> +((set= 2,3) | 4) >> 5);
-     match_results<int const *> what;
-     // search for the pattern in the array of integers ...
-     regex_search(data, data + 7, what, rex);
-     assert(what[0].matched);
-     assert(*what[0].first == 1);
-     assert(*what[0].second == 6);
- [endsect]