[/
 / Copyright (c) 2008 Eric Niebler
 /
 / Distributed under the Boost Software License, Version 1.0. (See accompanying
 / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
 /]

[section String Substitutions]

Regular expressions are not only good for searching text; they're good at ['manipulating] it. And one of the
most common text manipulation tasks is search-and-replace. xpressive provides the _regex_replace_ algorithm for
searching and replacing.

[h2 regex_replace()]

Performing search-and-replace using _regex_replace_ is simple. All you need is an input sequence, a regex object,
and a format string or a formatter object. There are several versions of the _regex_replace_ algorithm. Some accept
the input sequence as a bidirectional container such as `std::string` and return the result in a new container
of the same type. Others accept the input as a null-terminated string and return a `std::string`. Still others
accept the input sequence as a pair of iterators and write the result into an output iterator. The substitution
may be specified as a string with format sequences or as a formatter object. Below are some simple examples of
using string-based substitutions.

    std::string input("This is his face");
    sregex re = as_xpr("his");                // find all occurrences of "his" ...
    std::string format("her");                // ... and replace them with "her"

    // use the version of regex_replace() that operates on strings
    std::string output = regex_replace( input, re, format );
    std::cout << output << '\n';

    // use the version of regex_replace() that operates on iterators
    std::ostream_iterator< char > out_iter( std::cout );
    regex_replace( out_iter, input.begin(), input.end(), re, format );

The above program prints out the following:

[pre
Ther is her face
Ther is her face
]

Notice that ['all] the occurrences of `"his"` have been replaced with `"her"`.

Click [link boost_xpressive.user_s_guide.examples.replace_all_sub_strings_that_match_a_regex here] to see
a complete example program that shows how to use _regex_replace_. And check the _regex_replace_ reference
to see a complete list of the available overloads.

[h2 Replace Options]

The _regex_replace_ algorithm takes an optional bitmask parameter to control the formatting. The
possible values of the bitmask are:

[table Format Flags
    [[Flag]                     [Meaning]]
    [[`format_default`]         [Recognize the ECMA-262 format sequences (see below).]]
    [[`format_first_only`]      [Only replace the first match, not all of them.]]
    [[`format_no_copy`]         [Don't copy the parts of the input sequence that didn't match the regex
                                 to the output sequence.]]
    [[`format_literal`]         [Treat the format string as a literal; that is, don't recognize any
                                 escape sequences.]]
    [[`format_perl`]            [Recognize the Perl format sequences (see below).]]
    [[`format_sed`]             [Recognize the sed format sequences (see below).]]
    [[`format_all`]             [In addition to the Perl format sequences, recognize some
                                 Boost-specific format sequences.]]
]

These flags live in the `xpressive::regex_constants` namespace. If the substitution parameter is
a function object instead of a string, the flags `format_literal`, `format_perl`, `format_sed`, and
`format_all` are ignored.

[h2 The ECMA-262 Format Sequences]

When you haven't specified a substitution string dialect with one of the format flags above,
you get the dialect defined by ECMA-262, the standard for ECMAScript. The table below shows
the escape sequences recognized in ECMA-262 mode.

[table Format Escape Sequences
    [[Escape Sequence]      [Meaning]]
    [[[^$1], [^$2], etc.]   [The corresponding sub-match]]
    [[[^$&]]                [The full match]]
    [[[^$\`]]               [The match prefix]]
    [[[^$']]                [The match suffix]]
    [[[^$$]]                [A literal `'$'` character]]
]

Any other sequence beginning with `'$'` simply represents itself. For example, if the format string were
`"$a"` then `"$a"` would be inserted into the output sequence.

[h2 The Sed Format Sequences]

When specifying the `format_sed` flag to _regex_replace_, the following escape sequences
are recognized:

[table Sed Format Escape Sequences
    [[Escape Sequence]      [Meaning]]
    [[[^\\1], [^\\2], etc.] [The corresponding sub-match]]
    [[[^&]]                 [The full match]]
    [[[^\\a]]               [A literal `'\a'`]]
    [[[^\\e]]               [A literal `char_type(27)`]]
    [[[^\\f]]               [A literal `'\f'`]]
    [[[^\\n]]               [A literal `'\n'`]]
    [[[^\\r]]               [A literal `'\r'`]]
    [[[^\\t]]               [A literal `'\t'`]]
    [[[^\\v]]               [A literal `'\v'`]]
    [[[^\\xFF]]             [A literal `char_type(0xFF)`, where [^['F]] is any hex digit]]
    [[[^\\x{FFFF}]]         [A literal `char_type(0xFFFF)`, where [^['F]] is any hex digit]]
    [[[^\\cX]]              [The control character [^['X]]]]
]

[h2 The Perl Format Sequences]

When specifying the `format_perl` flag to _regex_replace_, the following escape sequences
are recognized:

[table Perl Format Escape Sequences
    [[Escape Sequence]      [Meaning]]
    [[[^$1], [^$2], etc.]   [The corresponding sub-match]]
    [[[^$&]]                [The full match]]
    [[[^$\`]]               [The match prefix]]
    [[[^$']]                [The match suffix]]
    [[[^$$]]                [A literal `'$'` character]]
    [[[^\\a]]               [A literal `'\a'`]]
    [[[^\\e]]               [A literal `char_type(27)`]]
    [[[^\\f]]               [A literal `'\f'`]]
    [[[^\\n]]               [A literal `'\n'`]]
    [[[^\\r]]               [A literal `'\r'`]]
    [[[^\\t]]               [A literal `'\t'`]]
    [[[^\\v]]               [A literal `'\v'`]]
    [[[^\\xFF]]             [A literal `char_type(0xFF)`, where [^['F]] is any hex digit]]
    [[[^\\x{FFFF}]]         [A literal `char_type(0xFFFF)`, where [^['F]] is any hex digit]]
    [[[^\\cX]]              [The control character [^['X]]]]
    [[[^\\l]]               [Make the next character lowercase]]
    [[[^\\L]]               [Make the rest of the substitution lowercase until the next [^\\E]]]
    [[[^\\u]]               [Make the next character uppercase]]
    [[[^\\U]]               [Make the rest of the substitution uppercase until the next [^\\E]]]
    [[[^\\E]]               [Terminate [^\\L] or [^\\U]]]
    [[[^\\1], [^\\2], etc.] [The corresponding sub-match]]
    [[[^\\g<name>]]         [The named backref /name/]]
]

[h2 The Boost-Specific Format Sequences]

When specifying the `format_all` flag to _regex_replace_, the escape sequences
recognized are the same as those above for `format_perl`. In addition, conditional
expressions of the following form are recognized:

[pre
?Ntrue-expression:false-expression
]

where /N/ is a decimal digit representing a sub-match. If the corresponding sub-match
participated in the full match, then the substitution is /true-expression/. Otherwise,
it is /false-expression/. In this mode, you can use parens [^()] for grouping. If you
want a literal paren, you must escape it as [^\\(].

[h2 Formatter Objects]

Format strings are not always expressive enough for all your text substitution
needs. Consider the simple example of wanting to map input strings to output
strings, as you may want to do with environment variables. Rather than a format
/string/, for this you would use a formatter /object/. Consider the following
code, which finds embedded environment variables of the form `"$(XYZ)"` and
computes the substitution string by looking up the environment variable in a
map.

    #include <map>
    #include <string>
    #include <iostream>
    #include <boost/xpressive/xpressive.hpp>
    using namespace boost;
    using namespace xpressive;

    std::map<std::string, std::string> env;

    std::string const &format_fun(smatch const &what)
    {
        return env[what[1].str()];
    }

    int main()
    {
        env["X"] = "this";
        env["Y"] = "that";

        std::string input("\"$(X)\" has the value \"$(Y)\"");

        // replace strings like "$(XYZ)" with the result of env["XYZ"]
        sregex envar = "$(" >> (s1 = +_w) >> ')';
        std::string output = regex_replace(input, envar, format_fun);
        std::cout << output << std::endl;
    }

In this case, we use a function, `format_fun()`, to compute the substitution string
on the fly. It accepts a _match_results_ object which contains the results of the
current match. `format_fun()` uses the first submatch as a key into the global `env`
map. The above code displays:

[pre
"this" has the value "that"
]

The formatter need not be an ordinary function. It may be an object of class type.
And rather than return a string, it may accept an output iterator into which it
writes the substitution. Consider the following, which is functionally equivalent
to the above.

    #include <map>
    #include <string>
    #include <iostream>
    #include <algorithm> // for std::copy
    #include <boost/xpressive/xpressive.hpp>
    using namespace boost;
    using namespace xpressive;

    struct formatter
    {
        typedef std::map<std::string, std::string> env_map;
        env_map env;

        template<typename Out>
        Out operator()(smatch const &what, Out out) const
        {
            // look up the first submatch; write nothing if it is not in the map
            env_map::const_iterator where = env.find(what[1]);
            if(where != env.end())
            {
                std::string const &sub = where->second;
                out = std::copy(sub.begin(), sub.end(), out);
            }
            return out;
        }
    };

    int main()
    {
        formatter fmt;
        fmt.env["X"] = "this";
        fmt.env["Y"] = "that";

        std::string input("\"$(X)\" has the value \"$(Y)\"");

        sregex envar = "$(" >> (s1 = +_w) >> ')';
        std::string output = regex_replace(input, envar, fmt);
        std::cout << output << std::endl;
    }

The formatter must be a callable object -- a function or a function object --
that has one of three possible signatures, detailed in the table below. For
the table, `fmt` is a function pointer or function object, `what` is a
_match_results_ object, `out` is an OutputIterator, and `flags` is a value
of `regex_constants::match_flag_type`:

[table Formatter Signatures
[
    [Formatter Invocation]
    [Return Type]
    [Semantics]
]
[
    [`fmt(what)`]
    [Range of characters (e.g. `std::string`) or null-terminated string]
    [The string matched by the regex is replaced with the string returned by
     the formatter.]
]
[
    [`fmt(what, out)`]
    [OutputIterator]
    [The formatter writes the replacement string into `out` and returns `out`.]
]
[
    [`fmt(what, out, flags)`]
    [OutputIterator]
    [The formatter writes the replacement string into `out` and returns `out`.
     The `flags` parameter is the value of the match flags passed to the
     _regex_replace_ algorithm.]
]
]

[h2 Formatter Expressions]

In addition to format /strings/ and formatter /objects/, _regex_replace_ also
accepts formatter /expressions/. A formatter expression is a lambda expression
that generates a string. It uses the same syntax as that for
[link boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions
Semantic Actions], which are covered later. The above example, which uses
_regex_replace_ to substitute strings for environment variables, is repeated
here using a formatter expression.

    #include <map>
    #include <string>
    #include <iostream>
    #include <boost/xpressive/xpressive.hpp>
    #include <boost/xpressive/regex_actions.hpp>
    using namespace boost::xpressive;

    int main()
    {
        std::map<std::string, std::string> env;
        env["X"] = "this";
        env["Y"] = "that";

        std::string input("\"$(X)\" has the value \"$(Y)\"");

        sregex envar = "$(" >> (s1 = +_w) >> ')';
        // ref() is qualified so that it is not ambiguous with std::ref(),
        // which argument-dependent lookup also finds in C++11 and later
        std::string output = regex_replace(input, envar, boost::xpressive::ref(env)[s1]);
        std::cout << output << std::endl;
    }

In the above, the formatter expression is `boost::xpressive::ref(env)[s1]`. This means to use the
value of the first submatch, `s1`, as a key into the `env` map. The purpose of
`xpressive::ref()` here is to make the reference to the `env` local variable /lazy/
so that the index operation is deferred until we know what to replace `s1` with.
The call is written with its namespace so that it cannot be confused with `std::ref()`,
which argument-dependent lookup also considers for a `std::map` argument when compiling
as C++11 or later.

[endsect]
  255. `xpressive::ref()` here is to make the reference to the `env` local variable /lazy/
  256. so that the index operation is deferred until we know what to replace `s1` with.
  257. [endsect]