[/
 / Copyright (c) 2009 Eric Niebler
 /
 / Distributed under the Boost Software License, Version 1.0. (See accompanying
 / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
 /]

[section Named Captures]

[h2 Overview]

For complicated regular expressions, dealing with numbered captures can be a
pain. Counting left parentheses to figure out which capture to reference is
no fun. Less fun is the fact that merely editing a regular expression could
cause a capture to be assigned a new number, invalidating code that refers back
to it by the old number.

Other regular expression engines solve this problem with a feature called
/named captures/. This feature allows you to assign a name to a capture, and
to refer back to the capture by name rather than by number. Xpressive also
supports named captures, both in dynamic and in static regexes.

[h2 Dynamic Named Captures]

For dynamic regular expressions, xpressive follows the lead of other popular
regex engines with the syntax of named captures. You can create a named capture
with `"(?P<xxx>...)"` and refer back to that capture with `"(?P=xxx)"`. Here,
for instance, is a regular expression that creates a named capture and refers
back to it:

    // Create a named capture called "char" that matches a single
    // character and refer back to that capture by name.
    sregex rx = sregex::compile("(?P<char>.)(?P=char)");

The effect of the above regular expression is to find the first doubled
character.

Once you have executed a match or search operation using a regex with named
captures, you can access the named capture through the _match_results_ object
using the capture's name.

    std::string str("tweet");
    sregex rx = sregex::compile("(?P<char>.)(?P=char)");
    smatch what;
    if(regex_search(str, what, rx))
    {
        std::cout << "char = " << what["char"] << std::endl;
    }

The above code displays:

[pre
char = e
]

You can also refer back to a named capture from within a substitution string.
The syntax for that is `"\\g<xxx>"`. Below is some code that demonstrates how
to use named captures when doing string substitution.

    std::string str("tweet");
    sregex rx = sregex::compile("(?P<char>.)(?P=char)");
    str = regex_replace(str, rx, "**\\g<char>**", regex_constants::format_perl);
    std::cout << str << std::endl;

Notice that you have to specify `format_perl` when using named captures in a
substitution string. Only the Perl format syntax recognizes `"\\g<xxx>"`. The
above code displays:

[pre
tw\*\*e\*\*t
]

[h2 Static Named Captures]

If you're using static regular expressions, creating and using named
captures is even easier. You can use the _mark_tag_ type to create
a variable that you can use like [globalref boost::xpressive::s1 `s1`],
[globalref boost::xpressive::s2 `s2`] and friends, but with a name
that is more meaningful. Below is how the above example would look
using static regexes:

    mark_tag char_(1); // char_ is now a synonym for s1
    sregex rx = (char_= _) >> char_;

After a match operation, you can use the `mark_tag` to index into the
_match_results_ to access the named capture:

    std::string str("tweet");
    mark_tag char_(1);
    sregex rx = (char_= _) >> char_;
    smatch what;
    if(regex_search(str, what, rx))
    {
        std::cout << what[char_] << std::endl;
    }

The above code displays:

[pre
e
]

When doing string substitutions with _regex_replace_, you can use named
captures to create /format expressions/ as below:

    std::string str("tweet");
    mark_tag char_(1);
    sregex rx = (char_= _) >> char_;
    str = regex_replace(str, rx, "**" + char_ + "**");
    std::cout << str << std::endl;

The above code displays:

[pre
tw\*\*e\*\*t
]

[note You need to include [^<boost/xpressive/regex_actions.hpp>] to
use format expressions.]

[endsect]