- Regular expressions have a syntax in which a few characters are
-special constructs and the rest are "ordinary". An ordinary character
-is a simple regular expression which matches that character and nothing
-else. The special characters are `$', `^', `.', `*', `+', `?', `[',
-`]' and `\'; no new special characters will be defined. Any other
-character appearing in a regular expression is ordinary, unless a `\'
-precedes it.
-
- For example, `f' is not a special character, so it is ordinary, and
-therefore `f' is a regular expression that matches the string `f' and
-no other string. (It does not match the string `ff'.) Likewise, `o'
-is a regular expression that matches only `o'.
-
- Any two regular expressions A and B can be concatenated. The result
-is a regular expression which matches a string if A matches some amount
-of the beginning of that string and B matches the rest of the string.
-
- As a simple example, you can concatenate the regular expressions `f'
-and `o' to get the regular expression `fo', which matches only the
-string `fo'. To do something nontrivial, you need to use one of the
-following special characters:
-
-`. (Period)'
- is a special character that matches any single character except a
- newline. Using concatenation, you can make regular expressions
- like `a.b', which matches any three-character string which begins
- with `a' and ends with `b'.
-
-`*'
- is not a construct by itself; it is a suffix, which means the
- preceding regular expression is to be repeated as many times as
- possible. In `fo*', the `*' applies to the `o', so `fo*' matches
- one `f' followed by any number of `o's. The case of zero `o's is
- allowed: `fo*' does match `f'.
-
- `*' always applies to the smallest possible preceding expression.
- Thus, `fo*' has a repeating `o', not a repeating `fo'.
-
- The matcher processes a `*' construct by immediately matching as
- many repetitions as it can find. Then it continues with the rest
- of the pattern. If that fails, backtracking occurs, discarding
- some of the matches of the `*'-modified construct in case that
- makes it possible to match the rest of the pattern. For example,
- when `ca*ar' is matched against the string `caaar', the `a*' first
- tries to match all three `a's; but the rest of the pattern is `ar'
- and there is only `r' left to match, so this try fails. The next
- alternative is for `a*' to match only two `a's. With this choice,
- the rest of the regexp matches successfully.
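
The greedy-then-backtrack behavior described above can be sketched as a toy matcher. This is only an illustration in Python, not the matcher Emacs uses: it supports just ordinary characters, `.', and the `*' suffix, it requires the pattern to match the whole string, and the names `full_match', `match_here', `match_star' and `trace' are made up for this example.

```python
def matches_one(item: str, ch: str) -> bool:
    # '.' matches any single character except a newline;
    # any other pattern character matches only itself.
    return ch != "\n" if item == "." else item == ch


def match_star(item: str, rest: str, text: str, trace: list) -> bool:
    # Greedy step: consume as many repetitions of item as possible.
    count = 0
    while count < len(text) and matches_one(item, text[count]):
        count += 1
    # Backtracking: give back one repetition at a time until the
    # rest of the pattern matches the rest of the text.
    while count >= 0:
        trace.append(count)  # record how many repetitions were tried
        if match_here(rest, text[count:], trace):
            return True
        count -= 1
    return False


def match_here(pattern: str, text: str, trace: list) -> bool:
    if not pattern:
        return not text  # pattern used up: succeed only if text is too
    if len(pattern) >= 2 and pattern[1] == "*":
        # '*' applies to the single preceding character only.
        return match_star(pattern[0], pattern[2:], text, trace)
    if text and matches_one(pattern[0], text[0]):
        return match_here(pattern[1:], text[1:], trace)
    return False


def full_match(pattern: str, text: str):
    trace = []
    return match_here(pattern, text, trace), trace


print(full_match("ca*ar", "caaar"))  # (True, [3, 2])
print(full_match("fo*", "f"))        # (True, [0])
```

The trace `[3, 2]' shows the two attempts from the text: three `a's first, which fails, then two, which succeeds.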
-
-`+'
- is a suffix character similar to `*' except that it requires that
- the preceding expression be matched at least once. For example,
- `ca+r' will match the strings `car' and `caaaar' but not the
- string `cr', whereas `ca*r' would match all three strings.
-
-`?'
- is a suffix character similar to `*' except that it can match the
- preceding expression either once or not at all. For example,
- `ca?r' matches `car' or `cr', and nothing else.
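
The three suffix characters can be compared side by side. The sketch below uses Python's `re' module as a stand-in, on the assumption (true for these three suffixes) that it writes them the same way; it is a different regexp implementation from the one in Emacs. The helper name `whole' is invented for this example.

```python
import re


def whole(pattern: str, text: str) -> bool:
    """Return True when pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


# Print a small table: which of the three strings each pattern accepts.
for pattern in ("ca*r", "ca+r", "ca?r"):
    accepted = [t for t in ("cr", "car", "caaaar") if whole(pattern, t)]
    print(pattern, accepted)
```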
-
-`[ ... ]'
- `[' begins a "character set", which is terminated by a `]'. In
- the simplest case, the characters between the two form the set.
- Thus, `[ad]' matches either one `a' or one `d', and `[ad]*'
- matches any string composed of just `a's and `d's (including the
- empty string), from which it follows that `c[ad]*r' matches `cr',
- `car', `cdr', `caddaar', etc.
-
- You can include character ranges in a character set by writing two
- characters with a `-' between them. Thus, `[a-z]' matches any
- lower-case letter. Ranges may be intermixed freely with individual
- characters, as in `[a-z$%.]', which matches any lower-case letter
- or `$', `%', or period.
-
- Note that inside a character set the usual special characters are
- not special any more. A completely different set of special
- characters exists inside character sets: `]', `-', and `^'.
-
- To include a `]' in a character set, you must make it the first
- character. For example, `[]a]' matches `]' or `a'. To include a
- `-', write `---', which is a range containing only `-'. To
- include `^', make it other than the first character in the set.
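
The character-set rules above can be tried out as follows. This is a sketch using Python's `re' module as a stand-in; it happens to follow the same conventions for the sets shown here, but it is not the Emacs matcher, and the helper name `whole' is invented for this example.

```python
import re


def whole(pattern: str, text: str) -> bool:
    """Return True when pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


# A simple set under '*': any string of a's and d's between c and r.
print([t for t in ("cr", "car", "cdr", "caddaar", "cxr") if whole("c[ad]*r", t)])

# A range mixed with individual characters; '$' and '.' are ordinary here.
print([t for t in ("q", "$", "%", ".", "Q") if whole("[a-z$%.]", t)])

# ']' placed first is a member of the set, not its terminator.
print(whole("[]a]", "]"), whole("[]a]", "a"))

# '^' placed anywhere but first is a member of the set.
print(whole("[a^]", "^"))
```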
-
-`[^ ... ]'
- `[^' begins a "complement character set", which matches any
- character except the ones specified. Thus, `[^a-z0-9A-Z]' matches
- all characters except letters and digits.
-
- `^' is not special in a character set unless it is the first
- character. The character following the `^' is treated as if it
- were first (`-' and `]' are not special there).
-
- Note that a complement character set can match a newline, unless
- newline is mentioned as one of the characters not to match.
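
The point about newlines can be checked with a short sketch. It uses Python's `re' module as a stand-in, which treats complement sets the same way for this case; the variable names are invented for this example.

```python
import re

# Complement set: anything except letters and digits.
non_alnum = re.compile("[^a-z0-9A-Z]")

# Same set, with a literal newline added to the characters not to match.
non_alnum_or_newline = re.compile("[^a-z0-9A-Z\n]")

print(non_alnum.fullmatch("!") is not None)              # True
print(non_alnum.fullmatch("q") is not None)              # False
print(non_alnum.fullmatch("\n") is not None)             # True
print(non_alnum_or_newline.fullmatch("\n") is not None)  # False
```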
-
-`^'
- is a special character that matches the empty string, but only if
- at the beginning of a line in the text being matched. Otherwise,
- it fails to match anything. Thus, `^foo' matches a `foo' that
- occurs at the beginning of a line.
-
-`$'
- is similar to `^' but matches only at the end of a line. Thus,
- `xx*$' matches a string of one or more `x's at the end of a line.
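
A sketch of the two line anchors, using Python's `re' module as a stand-in. One difference is an assumption to keep in mind: Python only gives `^' and `$' this per-line meaning when the `re.MULTILINE' flag is passed, whereas the text above describes it as the normal behavior. The sample text is invented.

```python
import re

text = "foo one x\na foo xx b\nfoo xxx"

# '^foo' matches only where 'foo' starts a line (offsets into text).
starts = [m.start() for m in re.finditer("^foo", text, re.MULTILINE)]
print(starts)  # [0, 21]

# 'xx*$' matches a run of one or more x's that ends a line;
# the 'xx' in the middle of the second line is not matched.
runs = re.findall("xx*$", text, re.MULTILINE)
print(runs)  # ['x', 'xxx']
```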
-
-`\'
- does two things: it quotes the special characters (including `\'),
- and it introduces additional special constructs.
-
- Because `\' quotes special characters, `\$' is a regular
- expression that matches only `$', and `\[' is a regular expression
- that matches only `[', and so on.
-
- Note: for historical compatibility, special characters are treated as
-ordinary ones if they are in contexts where their special meanings make
-no sense. For example, `*foo' treats `*' as ordinary since there is no
-preceding expression on which the `*' can act. It is poor practice to
-depend on this behavior; better to quote the special character anyway,
-regardless of where it appears.
-
- Usually, `\' followed by any character matches only that character.
-However, there are several exceptions: characters which, when preceded
-by `\', are special constructs. Such characters are always ordinary
-when encountered on their own. Here is a table of `\' constructs.
-
-`\|'
- specifies an alternative. Two regular expressions A and B with
- `\|' in between form an expression that matches anything A or B
- matches.
-
- Thus, `foo\|bar' matches either `foo' or `bar' but no other string.
-
- `\|' applies to the largest possible surrounding expressions.
- Only a surrounding `\( ... \)' grouping can limit the grouping
- power of `\|'.
-
- Full backtracking capability exists to handle multiple uses of
- `\|'.
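
The reach of an alternative can be illustrated with a sketch in Python's `re' module. Note the dialect difference, which is an assumption of this example rather than anything stated above: Python writes the alternative as `|' and the grouping as `( ... )', without the backslashes. The helper name `whole' is invented.

```python
import re


def whole(pattern: str, text: str) -> bool:
    """Return True when pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


# Written foo\|bar in the syntax described above.
print(whole("foo|bar", "foo"), whole("foo|bar", "bar"), whole("foo|bar", "foobar"))

# Ungrouped, the alternative takes the largest surrounding expressions:
# this is 'foo' or 'barx', so 'foox' is rejected.
print(whole("foo|barx", "foo"), whole("foo|barx", "foox"))

# Grouped (written \(foo\|bar\)x above), the trailing 'x' applies to both.
print(whole("(foo|bar)x", "foox"), whole("(foo|bar)x", "barx"))
```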
-
-`\( ... \)'
- is a grouping construct that serves three purposes:
-
- 1. To enclose a set of `\|' alternatives for other operations.
- Thus, `\(foo\|bar\)x' matches either `foox' or `barx'.
-
- 2. To enclose a complicated expression for the postfix `*' to
- operate on. Thus, `ba\(na\)*' matches `bananana', etc., with
- any number (zero or more) of `na' strings.
-
- 3. To mark a matched substring for future reference.
-
-
- This last application is not a consequence of the idea of a
- parenthetical grouping; it is a separate feature which happens to
- be assigned as a second meaning to the same `\( ... \)' construct
- because in practice there is no conflict between the two meanings.
- Here is an explanation:
-
-`\DIGIT'
- After the end of a `\( ... \)' construct, the matcher remembers the
- beginning and end of the text matched by that construct. Then,
- later on in the regular expression, you can use `\' followed by
- DIGIT to mean "match the same text that was matched by the DIGITth
- `\( ... \)' construct."
-
- The strings matching the first nine `\( ... \)' constructs
- appearing in a regular expression are assigned numbers 1 through 9
- in the order in which the open-parentheses appear in the regular
- expression. `\1' through `\9' may be used to refer to the text
- matched by the corresponding `\( ... \)' construct.
-
- For example, `\(.*\)\1' matches any newline-free string that is
- composed of two identical halves. The `\(.*\)' matches the first
- half, which may be anything, but the `\1' that follows must match
- exactly the same text.
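
Grouping for repetition and back-references can be sketched with Python's `re' module as a stand-in. As an assumption of this example, the grouping is written `( ... )' in Python rather than `\( ... \)'; the back-reference `\1' is written the same way. The helper name `whole' is invented.

```python
import re


def whole(pattern: str, text: str) -> bool:
    """Return True when pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


# Written ba\(na\)* above: the '*' repeats the whole group 'na'.
print([t for t in ("ba", "bana", "bananana", "ban") if whole(r"ba(na)*", t)])

# Written \(.*\)\1 above: a string made of two identical halves.
print([t for t in ("abcabc", "xx", "", "abcabd", "abcab") if whole(r"(.*)\1", t)])

# '.' does not match a newline, so a half containing one is rejected.
print(whole(r"(.*)\1", "a\na\n"))
```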
-
-`\`'
- matches the empty string, provided it is at the beginning of the
- buffer.
-
-`\''
- matches the empty string, provided it is at the end of the buffer.
-
-`\b'
- matches the empty string, provided it is at the beginning or end
- of a word. Thus, `\bfoo\b' matches any occurrence of `foo' as a
- separate word. `\bballs?\b' matches `ball' or `balls' as a
- separate word.
-
-`\B'
- matches the empty string, provided it is not at the beginning or
- end of a word.
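
The two word-boundary constructs can be sketched with Python's `re' module as a stand-in, where `\b' and `\B' are written the same way. One assumption to note: Python decides what counts as a word character by its own fixed rule (letters, digits and underscore), not by a syntax table. The sample strings are invented.

```python
import re

# '\bballs?\b' finds 'ball' or 'balls' only as separate words.
words = re.findall(r"\bballs?\b", "ball balls football ballsy")
print(words)  # ['ball', 'balls']

# '\bfoo\b' needs a word boundary on both sides of 'foo'.
print(re.search(r"\bfoo\b", "a foo b") is not None)  # True
print(re.search(r"\bfoo\b", "food") is not None)     # False

# '\B' is the opposite: 'oo' must lie strictly inside a word.
print(re.search(r"\Boo\B", "food") is not None)      # True
print(re.search(r"\Boo\B", "foo") is not None)       # False
```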
-
-`\<'
- matches the empty string, provided it is at the beginning of a
- word.
-
-`\>'
- matches the empty string, provided it is at the end of a word.
-
-`\w'
- matches any word-constituent character. The editor syntax table
- determines which characters these are.
-
-`\W'
- matches any character that is not a word-constituent.
-
-`\sCODE'
- matches any character whose syntax is CODE. CODE is a character
- which represents a syntax code: thus, `w' for word constituent,
- `-' for whitespace, `(' for open-parenthesis, etc. *Note Syntax::.
-
-`\SCODE'
- matches any character whose syntax is not CODE.
-
- Here is a complicated regexp used by Emacs to recognize the end of a
-sentence together with any whitespace that follows. It is given in Lisp
-syntax to enable you to distinguish the spaces from the tab characters.
-In Lisp syntax, the string constant begins and ends with a
-double-quote. `\"' stands for a double-quote as part of the regexp,
-`\\' for a backslash as part of the regexp, `\t' for a tab and `\n' for
-a newline.