[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:lexer Supported Regular Expressions]

[table Regular expressions support
    [[Expression]   [Meaning]]
    [[`x`]          [Match any character `x`]]
    [[`.`]          [Match any character except newline (or optionally *any* character)]]
    [[`"..."`]      [All characters between the double quotes are taken as literals, except escape sequences]]
    [[`[xyz]`]      [A character class; in this case matches `x`, `y` or `z`]]
    [[`[abj-oZ]`]   [A character class with a range in it; matches `a`, `b`, any
                     letter from `j` through `o`, or a `Z`]]
    [[`[^A-Z]`]     [A negated character class, i.e. any character but those in
                     the class. In this case, any character except an uppercase
                     letter]]
    [[`r*`]         [Zero or more r's (greedy), where r is any regular expression]]
    [[`r*?`]        [Zero or more r's (abstemious, i.e. non-greedy), where r is any regular expression]]
    [[`r+`]         [One or more r's (greedy)]]
    [[`r+?`]        [One or more r's (abstemious)]]
    [[`r?`]         [Zero or one r's (greedy), i.e. optional]]
    [[`r??`]        [Zero or one r's (abstemious), i.e. optional]]
    [[`r{2,5}`]     [Anywhere between two and five r's (greedy)]]
    [[`r{2,5}?`]    [Anywhere between two and five r's (abstemious)]]
    [[`r{2,}`]      [Two or more r's (greedy)]]
    [[`r{2,}?`]     [Two or more r's (abstemious)]]
    [[`r{4}`]       [Exactly four r's]]
    [[`{NAME}`]     [The macro `NAME` (see below)]]
    [[`"[xyz]\"foo"`] [The literal string `[xyz]"foo`]]
    [[`\X`]         [If X is `a`, `b`, `e`, `n`, `r`, `f`, `t`, `v` then the
                     ANSI-C interpretation of `\X`. Otherwise a literal `X`
                     (used to escape operators such as `*`)]]
    [[`\0`]         [A NUL character (ASCII code 0)]]
    [[`\123`]       [The character with octal value 123]]
    [[`\x2a`]       [The character with hexadecimal value 2a]]
    [[`\cX`]        [A named control character `X`]]
    [[`\a`]         [A shortcut for Alert (bell)]]
    [[`\b`]         [A shortcut for Backspace]]
    [[`\e`]         [A shortcut for ESC (escape character `0x1b`)]]
    [[`\n`]         [A shortcut for newline]]
    [[`\r`]         [A shortcut for carriage return]]
    [[`\f`]         [A shortcut for form feed `0x0c`]]
    [[`\t`]         [A shortcut for horizontal tab `0x09`]]
    [[`\v`]         [A shortcut for vertical tab `0x0b`]]
    [[`\d`]         [A shortcut for `[0-9]`]]
    [[`\D`]         [A shortcut for `[^0-9]`]]
    [[`\s`]         [A shortcut for `[\x20\t\n\r\f\v]`]]
    [[`\S`]         [A shortcut for `[^\x20\t\n\r\f\v]`]]
    [[`\w`]         [A shortcut for `[a-zA-Z0-9_]`]]
    [[`\W`]         [A shortcut for `[^a-zA-Z0-9_]`]]
    [[`(r)`]        [Match an `r`; parentheses are used to override precedence
                     (see below)]]
    [[`(?r-s:pattern)`] [Apply option 'r' and omit option 's' while interpreting pattern.
                     Options may be zero or more of the characters 'i' or 's'.
                     'i' means case-insensitive. '-i' means case-sensitive.
                     's' alters the meaning of the '.' syntax to match any single character whatsoever.
                     '-s' alters the meaning of '.' to match any character except '`\n`'.]]
    [[`rs`]         [The regular expression `r` followed by the regular
                     expression `s` (a sequence)]]
    [[`r|s`]        [Either an `r` or an `s`]]
    [[`^r`]         [An `r` but only at the beginning of a line (i.e. when just
                     starting to scan, or right after a newline has been
                     scanned)]]
    [[`r$`]         [An `r` but only at the end of a line (i.e. just before a
                     newline)]]
]

[note POSIX character classes are not currently supported, due to performance issues
      when creating them in wide character mode.]
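
The difference between the greedy and abstemious quantifiers in the table above can be seen in a small experiment. The sketch below is an illustration only: it uses `std::regex` with its default ECMAScript grammar as a stand-in, not the engine used by the lexer, and the helper name `first_match` is hypothetical. The two syntaxes agree on the quantifiers shown here, but other constructs may differ.

```cpp
#include <regex>
#include <string>

// Returns the text of the first match of `pattern` inside `input`,
// or an empty string if nothing matches.
// Assumption: std::regex (ECMAScript grammar) is used purely as a
// stand-in to show greedy versus abstemious quantifiers.
std::string first_match(const std::string& pattern, const std::string& input)
{
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(input, m, re) ? m.str() : std::string();
}
```

With this helper, `first_match("a.*b", "aXbYb")` yields `aXbYb` because the greedy `.*` consumes as much as possible, while `first_match("a.*?b", "aXbYb")` yields `aXb` because the abstemious `.*?` stops at the first opportunity. Likewise `x{2,5}` takes five characters from `xxxxxxx` and `x{2,5}?` takes two.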

[tip If you want to build tokens for syntaxes that recognize items like quotes
     (`"'"`, `'"'`) and backslash (`\`), here is example syntax to get you started.
     The lesson here is that both C++ and regular expressions require escaping
     with `\` for some constructs, and the two levels of escaping can cascade.
``
    quote1 = "'";                    // match single "'"
    quote2 = "\\\"";                 // match single '"'
    literal_quote1 = "\\\\'";        // match backslash followed by single "'"
    literal_quote2 = "\\\\\\\"";     // match backslash followed by single '"'
    literal_backslash = "\\\\\\\\";  // match two backslashes
``
]
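
To see what the lexer actually receives after the C++ compiler has processed such a literal, it can help to compare the escaped form with a C++11 raw string literal, which is taken verbatim. The following is a minimal sketch; the function names are hypothetical and exist only for this illustration.

```cpp
#include <string>

// Escaped form: every backslash meant for the regular expression is
// doubled for C++, and the double quote is escaped as well.
// The resulting regex text is the four characters  \ \ \ "
std::string literal_quote2_escaped() { return "\\\\\\\""; }

// Raw string literal: the characters between R"( and )" are not
// processed by the compiler, so the regex is written as the lexer sees it.
std::string literal_quote2_raw() { return R"(\\\")"; }

// The same comparison for the pattern that matches two backslashes:
// eight backslashes in the escaped literal become four in the regex.
std::string literal_backslash_escaped() { return "\\\\\\\\"; }
std::string literal_backslash_raw() { return R"(\\\\)"; }
```

Both forms of each pattern produce identical strings, so the choice between them is a matter of readability. The raw form removes the C++ level of escaping only; the regular expression level still applies.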

[heading Regular Expression Precedence]

* `r*` has the highest precedence (`+`, `?`, `{n,m}` have the same precedence as `*`),
  so `ab*` is read as `a(b*)`
* `rs` has the next highest precedence
* `r|s` has the lowest precedence, so `ab|cd` is read as `(ab)|(cd)`
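
The effect of alternation having the lowest precedence, and of parentheses overriding it, can be checked with a short experiment. This is a sketch under stated assumptions: `std::regex` (ECMAScript grammar) stands in for the lexer's own engine, and the helper name `matches_all` is hypothetical.

```cpp
#include <regex>
#include <string>

// True if `pattern` matches the whole of `input`.
// Assumption: std::regex is used only as a stand-in to illustrate
// how alternation and grouping interact.
bool matches_all(const std::string& pattern, const std::string& input)
{
    return std::regex_match(input, std::regex(pattern));
}
```

Here `ab|cd` accepts `ab` and `cd` but rejects `abd`, because the alternation splits the whole expression into the sequences `ab` and `cd`. Writing `a(b|c)d` restricts the alternation to the middle character, so it accepts `abd` and `acd` but rejects `ab`.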

[heading Macros]

Regular expressions can be given a name and referred to in rules using the
syntax `{NAME}`, where `NAME` is the name you have given to the macro. A macro
name can be at most 30 characters long and must start with a `_` or a letter.
Subsequent characters can be `_`, `-`, a letter, or a decimal digit.
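
The naming rules above can be expressed as a small check. The function below is a hypothetical helper written for this illustration, not part of the library, and it assumes that "letter" means an ASCII letter.

```cpp
#include <cstddef>
#include <string>

// Checks a candidate macro name against the rules stated above:
// at most 30 characters, first character `_` or a letter, remaining
// characters `_`, `-`, a letter, or a decimal digit.
// Assumption: only ASCII letters count as letters.
bool is_valid_macro_name(const std::string& name)
{
    const std::size_t max_length = 30;
    if (name.empty() || name.size() > max_length)
        return false;

    auto is_letter = [](char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    };
    auto is_digit = [](char c) { return c >= '0' && c <= '9'; };

    // The first character has the stricter rule.
    if (name[0] != '_' && !is_letter(name[0]))
        return false;

    // Every later character may also be `-` or a digit.
    for (std::size_t i = 1; i < name.size(); ++i) {
        const char c = name[i];
        if (c != '_' && c != '-' && !is_letter(c) && !is_digit(c))
            return false;
    }
    return true;
}
```

For example, `DIGIT` and `_id-2` pass the check, while `2fast` (starts with a digit), `-x` (starts with a hyphen), and any name of 31 or more characters are rejected.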

[endsect]