[/
 / Copyright (c) 2008 Eric Niebler
 /
 / Distributed under the Boost Software License, Version 1.0. (See accompanying
 / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
 /]

[section Introduction]

[h2 What is xpressive?]

xpressive is a regular expression template library. Regular expressions
(regexes) can be written as strings that are parsed dynamically at runtime
(dynamic regexes), or as ['expression templates][footnote See
[@http://www.osl.iu.edu/~tveldhui/papers/Expression-Templates/exprtmpl.html
Expression Templates]] that are parsed at compile-time (static regexes).
Dynamic regexes have the advantage that they can be accepted from the user
as input at runtime or read from an initialization file. Static regexes
have several advantages. Since they are C++ expressions instead of
strings, they can be syntax-checked at compile-time. Also, they can naturally
refer to code and data elsewhere in your program, giving you the ability to call
back into your code from within a regex match. Finally, since they are statically
bound, the compiler can generate faster code for static regexes.

xpressive's dual nature is unique and powerful. Static xpressive is a bit
like the _spirit_fx_. Like _spirit_, you can build grammars with
static regexes using expression templates. (Unlike _spirit_, xpressive does
exhaustive backtracking, trying every possibility to find a match for your
pattern.) Dynamic xpressive is a bit like _regexpp_. In fact,
xpressive's interface should be familiar to anyone who has used _regexpp_.
xpressive's innovation comes from allowing you to mix and match static and
dynamic regexes in the same program, and even in the same expression! You
can embed a dynamic regex in a static regex, or /vice versa/, and the embedded
regex will participate fully in the search, back-tracking as needed to make
the match succeed.

[h2 Hello, world!]

Enough theory. Let's have a look at ['Hello World], xpressive style:

    #include <iostream>
    #include <string>
    #include <boost/xpressive/xpressive.hpp>

    using namespace boost::xpressive;

    int main()
    {
        std::string hello( "hello world!" );

        sregex rex = sregex::compile( "(\\w+) (\\w+)!" );
        smatch what;

        if( regex_match( hello, what, rex ) )
        {
            std::cout << what[0] << '\n'; // whole match
            std::cout << what[1] << '\n'; // first capture
            std::cout << what[2] << '\n'; // second capture
        }

        return 0;
    }

This program outputs the following:

[pre
hello world!
hello
world
]

The first thing you'll notice about the code is that all the types in xpressive live in
the `boost::xpressive` namespace.

[note Most of the rest of the examples in this document will leave off the
`using namespace boost::xpressive;` directive. Just pretend it's there.]

Next, you'll notice the type of the regular expression object is `sregex`. If you are familiar
with _regexpp_, this is different from what you are used to. The "`s`" in "`sregex`" stands for
"`string`", indicating that this regex can be used to find patterns in `std::string` objects.
I'll discuss this difference and its implications in detail later.

Notice how the regex object is initialized:

    sregex rex = sregex::compile( "(\\w+) (\\w+)!" );

To create a regular expression object from a string, you must call a factory method such as
_regex_compile_. This is another area in which xpressive differs from
other object-oriented regular expression libraries. Other libraries encourage you to think of
a regular expression as a kind of string on steroids. In xpressive, regular expressions are not
strings; they are little programs in a domain-specific language. Strings are only one ['representation]
of that language. Another representation is an expression template. For example, the above line of code
is equivalent to the following:

    sregex rex = (s1= +_w) >> ' ' >> (s2= +_w) >> '!';

This describes the same regular expression, except it uses the domain-specific embedded language
defined by static xpressive.

As you can see, static regexes have a syntax that is noticeably different from standard Perl
syntax. That is because we are constrained by C++'s syntax. The biggest difference is the use
of `>>` to mean "followed by". For instance, in Perl you can just put sub-expressions next
to each other:

    abc

But in C++, there must be an operator separating sub-expressions:

    a >> b >> c

In Perl, parentheses `()` have special meaning. They group, but as a side-effect they also create
back-references like [^$1] and [^$2]. In C++, there is no way to overload parentheses to give them
side-effects. To get the same effect, we use the special `s1`, `s2`, etc. tokens. Assign to
one to create a back-reference (known as a sub-match in xpressive).

You'll also notice that the one-or-more repetition operator `+` has moved from postfix
to prefix position. That's because C++ doesn't have a postfix `+` operator. So:

    "\\w+"

is the same as:

    +_w

We'll cover all the other differences [link boost_xpressive.user_s_guide.creating_a_regex_object.static_regexes later].

[endsect]