- [/
- / Copyright (c) 2008 Eric Niebler
- /
- / Distributed under the Boost Software License, Version 1.0. (See accompanying
- / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
- /]
- [section String Substitutions]
- Regular expressions are not only good for searching text; they're good at ['manipulating] it. And one of the
- most common text manipulation tasks is search-and-replace. xpressive provides the _regex_replace_ algorithm for
- searching and replacing.
- [h2 regex_replace()]
- Performing search-and-replace using _regex_replace_ is simple. All you need is an input sequence, a regex object,
- and a format string or a formatter object. There are several versions of the _regex_replace_ algorithm. Some accept
- the input sequence as a bidirectional container such as `std::string` and return the result in a new container
- of the same type. Others accept the input as a null-terminated string and return a `std::string`. Still others
- accept the input sequence as a pair of iterators and write the result into an output iterator. The substitution
- may be specified as a string with format sequences or as a formatter object. Below are some simple examples of
- using string-based substitutions.
-     std::string input("This is his face");
-     sregex re = as_xpr("his"); // find all occurrences of "his" ...
-     std::string format("her"); // ... and replace them with "her"
-     // use the version of regex_replace() that operates on strings
-     std::string output = regex_replace( input, re, format );
-     std::cout << output << '\n';
-     // use the version of regex_replace() that operates on iterators
-     std::ostream_iterator< char > out_iter( std::cout );
-     regex_replace( out_iter, input.begin(), input.end(), re, format );
- The above program prints out the following:
- [pre
- Ther is her face
- Ther is her face
- ]
- Notice that ['all] the occurrences of `"his"` have been replaced with `"her"`.
- Click [link boost_xpressive.user_s_guide.examples.replace_all_sub_strings_that_match_a_regex here] to see
- a complete example program that shows how to use _regex_replace_. And check the _regex_replace_ reference
- to see a complete list of the available overloads.
- [h2 Replace Options]
- The _regex_replace_ algorithm takes an optional bitmask parameter to control the formatting. The
- possible values of the bitmask are:
- [table Format Flags
- [[Flag] [Meaning]]
- [[`format_default`] [Recognize the ECMA-262 format sequences (see below).]]
- [[`format_first_only`] [Only replace the first match, not all of them.]]
- [[`format_no_copy`] [Don't copy the parts of the input sequence that didn't match the regex
- to the output sequence.]]
- [[`format_literal`] [Treat the format string as a literal; that is, don't recognize any
- escape sequences.]]
- [[`format_perl`] [Recognize the Perl format sequences (see below).]]
- [[`format_sed`] [Recognize the sed format sequences (see below).]]
- [[`format_all`] [In addition to the Perl format sequences, recognize some
- Boost-specific format sequences.]]
- ]
- These flags live in the `xpressive::regex_constants` namespace. If the substitution parameter is
- a function object instead of a string, the flags `format_literal`, `format_perl`, `format_sed`, and
- `format_all` are ignored.
- [h2 The ECMA-262 Format Sequences]
- When you haven't specified a substitution string dialect with one of the format flags above,
- you get the dialect defined by ECMA-262, the standard for ECMAScript. The table below shows
- the escape sequences recognized in ECMA-262 mode.
- [table Format Escape Sequences
- [[Escape Sequence] [Meaning]]
- [[[^$1], [^$2], etc.] [The corresponding sub-match]]
- [[[^$&]] [The full match]]
- [[[^$\`]] [The match prefix]]
- [[[^$']] [The match suffix]]
- [[[^$$]] [A literal `'$'` character]]
- ]
- Any other sequence beginning with `'$'` simply represents itself. For example, if the format string were
- `"$a"` then `"$a"` would be inserted into the output sequence.
- [h2 The Sed Format Sequences]
- When specifying the `format_sed` flag to _regex_replace_, the following escape sequences
- are recognized:
- [table Sed Format Escape Sequences
- [[Escape Sequence] [Meaning]]
- [[[^\\1], [^\\2], etc.] [The corresponding sub-match]]
- [[[^&]] [The full match]]
- [[[^\\a]] [A literal `'\a'`]]
- [[[^\\e]] [A literal `char_type(27)`]]
- [[[^\\f]] [A literal `'\f'`]]
- [[[^\\n]] [A literal `'\n'`]]
- [[[^\\r]] [A literal `'\r'`]]
- [[[^\\t]] [A literal `'\t'`]]
- [[[^\\v]] [A literal `'\v'`]]
- [[[^\\xFF]] [A literal `char_type(0xFF)`, where [^['F]] is any hex digit]]
- [[[^\\x{FFFF}]] [A literal `char_type(0xFFFF)`, where [^['F]] is any hex digit]]
- [[[^\\cX]] [The control character [^['X]]]]
- ]
- [h2 The Perl Format Sequences]
- When specifying the `format_perl` flag to _regex_replace_, the following escape sequences
- are recognized:
- [table Perl Format Escape Sequences
- [[Escape Sequence] [Meaning]]
- [[[^$1], [^$2], etc.] [The corresponding sub-match]]
- [[[^$&]] [The full match]]
- [[[^$\`]] [The match prefix]]
- [[[^$']] [The match suffix]]
- [[[^$$]] [A literal `'$'` character]]
- [[[^\\a]] [A literal `'\a'`]]
- [[[^\\e]] [A literal `char_type(27)`]]
- [[[^\\f]] [A literal `'\f'`]]
- [[[^\\n]] [A literal `'\n'`]]
- [[[^\\r]] [A literal `'\r'`]]
- [[[^\\t]] [A literal `'\t'`]]
- [[[^\\v]] [A literal `'\v'`]]
- [[[^\\xFF]] [A literal `char_type(0xFF)`, where [^['F]] is any hex digit]]
- [[[^\\x{FFFF}]] [A literal `char_type(0xFFFF)`, where [^['F]] is any hex digit]]
- [[[^\\cX]] [The control character [^['X]]]]
- [[[^\\l]] [Make the next character lowercase]]
- [[[^\\L]] [Make the rest of the substitution lowercase until the next [^\\E]]]
- [[[^\\u]] [Make the next character uppercase]]
- [[[^\\U]] [Make the rest of the substitution uppercase until the next [^\\E]]]
- [[[^\\E]] [Terminate [^\\L] or [^\\U]]]
- [[[^\\1], [^\\2], etc.] [The corresponding sub-match]]
- [[[^\\g<name>]] [The named backref /name/]]
- ]
- [h2 The Boost-Specific Format Sequences]
- When specifying the `format_all` flag to _regex_replace_, the escape sequences
- recognized are the same as those above for `format_perl`. In addition, conditional
- expressions of the following form are recognized:
- [pre
- ?Ntrue-expression:false-expression
- ]
- where /N/ is a decimal digit representing a sub-match. If the corresponding sub-match
- participated in the full match, then the substitution is /true-expression/. Otherwise,
- it is /false-expression/. In this mode, you can use parens [^()] for grouping. If you
- want a literal paren, you must escape it as [^\\(].
- [h2 Formatter Objects]
- Format strings are not always expressive enough for all your text substitution
- needs. Consider the simple example of wanting to map input strings to output
- strings, as you may want to do with environment variables. Rather than a format
- /string/, for this you would use a formatter /object/. Consider the following
- code, which finds embedded environment variables of the form `"$(XYZ)"` and
- computes the substitution string by looking up the environment variable in a
- map.
-     #include <map>
-     #include <string>
-     #include <iostream>
-     #include <boost/xpressive/xpressive.hpp>
-     using namespace boost;
-     using namespace xpressive;
-     std::map<std::string, std::string> env;
-     std::string const &format_fun(smatch const &what)
-     {
-         return env[what[1].str()];
-     }
-     int main()
-     {
-         env["X"] = "this";
-         env["Y"] = "that";
-         std::string input("\"$(X)\" has the value \"$(Y)\"");
-         // replace strings like "$(XYZ)" with the result of env["XYZ"]
-         sregex envar = "$(" >> (s1 = +_w) >> ')';
-         std::string output = regex_replace(input, envar, format_fun);
-         std::cout << output << std::endl;
-     }
- In this case, we use a function, `format_fun()`, to compute the substitution string
- on the fly. It accepts a _match_results_ object which contains the results of the
- current match. `format_fun()` uses the first submatch as a key into the global `env`
- map. The above code displays:
- [pre
- "this" has the value "that"
- ]
- The formatter need not be an ordinary function. It may be an object of class type.
- And rather than return a string, it may accept an output iterator into which it
- writes the substitution. Consider the following, which is functionally equivalent
- to the above.
-     #include <algorithm> // for std::copy
-     #include <map>
-     #include <string>
-     #include <iostream>
-     #include <boost/xpressive/xpressive.hpp>
-     using namespace boost;
-     using namespace xpressive;
-     struct formatter
-     {
-         typedef std::map<std::string, std::string> env_map;
-         env_map env;
-         template<typename Out>
-         Out operator()(smatch const &what, Out out) const
-         {
-             env_map::const_iterator where = env.find(what[1]);
-             if(where != env.end())
-             {
-                 std::string const &sub = where->second;
-                 out = std::copy(sub.begin(), sub.end(), out);
-             }
-             return out;
-         }
-     };
-     int main()
-     {
-         formatter fmt;
-         fmt.env["X"] = "this";
-         fmt.env["Y"] = "that";
-         std::string input("\"$(X)\" has the value \"$(Y)\"");
-         sregex envar = "$(" >> (s1 = +_w) >> ')';
-         std::string output = regex_replace(input, envar, fmt);
-         std::cout << output << std::endl;
-     }
- The formatter must be a callable object -- a function or a function object --
- that has one of three possible signatures, detailed in the table below. For
- the table, `fmt` is a function pointer or function object, `what` is a
- _match_results_ object, `out` is an OutputIterator, and `flags` is a value
- of `regex_constants::match_flag_type`:
- [table Formatter Signatures
- [
- [Formatter Invocation]
- [Return Type]
- [Semantics]
- ]
- [
- [`fmt(what)`]
- [Range of characters (e.g. `std::string`) or null-terminated string]
- [The string matched by the regex is replaced with the string returned by
- the formatter.]
- ]
- [
- [`fmt(what, out)`]
- [OutputIterator]
- [The formatter writes the replacement string into `out` and returns `out`.]
- ]
- [
- [`fmt(what, out, flags)`]
- [OutputIterator]
- [The formatter writes the replacement string into `out` and returns `out`.
- The `flags` parameter is the value of the match flags passed to the
- _regex_replace_ algorithm.]
- ]
- ]
- [h2 Formatter Expressions]
- In addition to format /strings/ and formatter /objects/, _regex_replace_ also
- accepts formatter /expressions/. A formatter expression is a lambda expression
- that generates a string. It uses the same syntax as that for
- [link boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions
- Semantic Actions], which are covered later. The above example, which uses
- _regex_replace_ to substitute strings for environment variables, is repeated
- here using a formatter expression.
-     #include <map>
-     #include <string>
-     #include <iostream>
-     #include <boost/xpressive/xpressive.hpp>
-     #include <boost/xpressive/regex_actions.hpp>
-     using namespace boost::xpressive;
-     int main()
-     {
-         std::map<std::string, std::string> env;
-         env["X"] = "this";
-         env["Y"] = "that";
-         std::string input("\"$(X)\" has the value \"$(Y)\"");
-         sregex envar = "$(" >> (s1 = +_w) >> ')';
-         std::string output = regex_replace(input, envar, ref(env)[s1]);
-         std::cout << output << std::endl;
-     }
- In the above, the formatter expression is `ref(env)[s1]`. This means to use the
- value of the first submatch, `s1`, as a key into the `env` map. The purpose of
- `xpressive::ref()` here is to make the reference to the `env` local variable /lazy/
- so that the index operation is deferred until we know what to replace `s1` with.
- [endsect]