   1 @c -*-texinfo-*-
   2 @c This is part of the XEmacs Lisp Reference Manual.
   3 @c Copyright (C) 1990, 1991, 1992, 1993, 1994 Free Software Foundation, Inc.
   4 @c See the file lispref.texi for copying conditions.
   5 @setfilename ../../info/searching.info
   6 @node Searching and Matching, Syntax Tables, Text, Top
   7 @chapter Searching and Matching
   8 @cindex searching
   9
  10   XEmacs provides two ways to search through a buffer for specified
  11 text: exact string searches and regular expression searches.  After a
  12 regular expression search, you can examine the @dfn{match data} to
  13 determine which text matched the whole regular expression or various
  14 portions of it.
  15
  16 @menu
  17 * String Search::         Search for an exact match.
  18 * Regular Expressions::   Describing classes of strings.
  19 * Regexp Search::         Searching for a match for a regexp.
  20 * POSIX Regexps::         Searching POSIX-style for the longest match.
  21 * Search and Replace::    Internals of @code{query-replace}.
  22 * Match Data::            Finding out which part of the text matched
  23                             various parts of a regexp, after regexp search.
  24 * Searching and Case::    Case-independent or case-significant searching.
  25 * Standard Regexps::      Useful regexps for finding sentences, pages,...
  26 @end menu
  27
  28   The @samp{skip-chars@dots{}} functions also perform a kind of searching.
  29 @xref{Skipping Characters}.
  30
  31 @node String Search
  32 @section Searching for Strings
  33 @cindex string search
  34
  35   These are the primitive functions for searching through the text in a
  36 buffer.  They are meant for use in programs, but you may call them
  37 interactively.  If you do so, they prompt for the search string;
  38 @var{limit} and @var{noerror} are set to @code{nil}, and @var{count}
  39 is set to 1.
  40
  41 @deffn Command search-forward string &optional limit noerror count buffer
  42   This function searches forward from point for an exact match for
  43 @var{string}.  If successful, it sets point to the end of the occurrence
  44 found, and returns the new value of point.  If no match is found, the
  45 value and side effects depend on @var{noerror} (see below).
  46
  47   In the following example, point is initially at the beginning of the
  48 line.  Then @code{(search-forward "fox")} moves point after the last
  49 letter of @samp{fox}:
  50
  51 @example
  52 @group
  53 ---------- Buffer: foo ----------
  54 @point{}The quick brown fox jumped over the lazy dog.
  55 ---------- Buffer: foo ----------
  56 @end group
  57
  58 @group
  59 (search-forward "fox")
  60      @result{} 20
  61
  62 ---------- Buffer: foo ----------
  63 The quick brown fox@point{} jumped over the lazy dog.
  64 ---------- Buffer: foo ----------
  65 @end group
  66 @end example
  67
  68   The argument @var{limit} specifies the upper bound to the search.  (It
  69 must be a position in the current buffer.)  No match extending after
  70 that position is accepted.  If @var{limit} is omitted or @code{nil}, it
  71 defaults to the end of the accessible portion of the buffer.
  72
  73 @kindex search-failed
  74   What happens when the search fails depends on the value of
  75 @var{noerror}.  If @var{noerror} is @code{nil}, a @code{search-failed}
  76 error is signaled.  If @var{noerror} is @code{t}, @code{search-forward}
  77 returns @code{nil} and does nothing.  If @var{noerror} is neither
  78 @code{nil} nor @code{t}, then @code{search-forward} moves point to the
  79 upper bound and returns @code{nil}.  (It would be more consistent now
  80 to return the new position of point in that case, but some programs
  81 may depend on a value of @code{nil}.)
  82
  83 If @var{count} is supplied (it must be an integer), then the search is
  84 repeated that many times (each time starting at the end of the previous
  85 time's match).  If @var{count} is negative, the search direction is
  86 backward.  If the successive searches succeed, the function succeeds,
  87 moving point and returning its new value.  Otherwise the search fails.
  88
  89 @var{buffer} is the buffer to search in, and defaults to the current buffer.
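
As a rough sketch of how @var{limit} and @var{noerror} interact, the
following evaluates several searches in a temporary buffer.  The buffer
text and the use of @code{with-temp-buffer} are assumptions made purely
for illustration, and the symbol @code{move} is just one arbitrary value
that is neither @code{nil} nor @code{t}.

@example
@group
(with-temp-buffer
  (insert "The quick brown fox jumped over the lazy dog.")
  (goto-char (point-min))
  (list
   ;; A limit of 10 stops the search before "fox" is reached.
   (search-forward "fox" 10 t)
   ;; Without a limit the search succeeds and point moves to 20.
   (search-forward "fox" nil t)
   ;; noerror = t: the failed search returns nil, point stays at 20.
   (search-forward "cat" nil t)
   (point)
   ;; noerror = move: point goes to the end of the buffer.
   (search-forward "cat" nil 'move)
   (point)))
     @result{} (nil 20 nil 20 nil 46)
@end group
@end example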
  90 @end deffn
  91
  92 @deffn Command search-backward string &optional limit noerror count buffer
  93 This function searches backward from point for @var{string}.  It is
  94 just like @code{search-forward} except that it searches backwards and
  95 leaves point at the beginning of the match.
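
A minimal sketch of repeated backward searches, assuming a made-up
buffer text and @code{with-temp-buffer}; each call starts from where the
previous one left point:

@example
@group
(with-temp-buffer
  (insert "one fox, two fox")
  ;; Point is now at the end of the buffer (position 17).
  (list
   ;; The nearest match before point starts at position 14.
   (search-backward "fox")
   ;; A second call finds the earlier occurrence at position 5.
   (search-backward "fox")
   (point)))
     @result{} (14 5 5)
@end group
@end example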
  96 @end deffn
  97
  98 @deffn Command word-search-forward string &optional limit noerror count buffer
  99 @cindex word search
 100 This function searches forward from point for a ``word'' match for
 101 @var{string}.  If it finds a match, it sets point to the end of the
 102 match found, and returns the new value of point.
 103
 104 Word matching regards @var{string} as a sequence of words, disregarding
 105 punctuation that separates them.  It searches the buffer for the same
 106 sequence of words.  Each word must be distinct in the buffer (searching
 107 for the word @samp{ball} does not match the word @samp{balls}), but the
 108 details of punctuation and spacing are ignored (searching for @samp{ball
 109 boy} does match @samp{ball.  Boy!}).
 110
 111 In this example, point is initially at the beginning of the buffer; the
 112 search leaves it between the @samp{y} and the @samp{!}.
 113
 114 @example
 115 @group
 116 ---------- Buffer: foo ----------
 117 @point{}He said "Please!  Find
 118 the ball boy!"
 119 ---------- Buffer: foo ----------
 120 @end group
 121
 122 @group
 123 (word-search-forward "Please find the ball, boy.")
  124      @result{} 36
 125
 126 ---------- Buffer: foo ----------
 127 He said "Please!  Find
 128 the ball boy@point{}!"
 129 ---------- Buffer: foo ----------
 130 @end group
 131 @end example
 132
 133 If @var{limit} is non-@code{nil} (it must be a position in the current
 134 buffer), then it is the upper bound to the search.  The match found must
 135 not extend after that position.
 136
 137 If @var{noerror} is @code{nil}, then @code{word-search-forward} signals
 138 an error if the search fails.  If @var{noerror} is @code{t}, then it
 139 returns @code{nil} instead of signaling an error.  If @var{noerror} is
 140 neither @code{nil} nor @code{t}, it moves point to @var{limit} (or the
 141 end of the buffer) and returns @code{nil}.
 142
 143 If @var{count} is non-@code{nil}, then the search is repeated that many
 144 times.  Point is positioned at the end of the last match.
 145
 146 @var{buffer} is the buffer to search in, and defaults to the current buffer.
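
The word-matching rules can be sketched as follows.  The buffer text is
invented for illustration, and @code{case-fold-search} is bound
explicitly here only so that the outcome does not depend on the user's
settings:

@example
@group
(with-temp-buffer
  (insert "balls.  Ball.  Boy!")
  (let ((case-fold-search t))
    (list
     ;; "ball" is not found inside "balls"; the first hit is "Ball".
     (progn (goto-char (point-min))
            (word-search-forward "ball" nil t))
     ;; Punctuation and spacing between the words are ignored.
     (progn (goto-char (point-min))
            (word-search-forward "ball boy" nil t)))))
     @result{} (13 19)
@end group
@end example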
 147 @end deffn
 148
 149 @deffn Command word-search-backward string &optional limit noerror count buffer
 150 This function searches backward from point for a word match to
 151 @var{string}.  This function is just like @code{word-search-forward}
 152 except that it searches backward and normally leaves point at the
 153 beginning of the match.
 154 @end deffn
 155
 156 @node Regular Expressions
 157 @section Regular Expressions
 158 @cindex regular expression
 159 @cindex regexp
 160
 161   A @dfn{regular expression} (@dfn{regexp}, for short) is a pattern that
 162 denotes a (possibly infinite) set of strings.  Searching for matches for
 163 a regexp is a very powerful operation.  This section explains how to write
 164 regexps; the following section says how to search for them.
 165
 166  To gain a thorough understanding of regular expressions and how to use
 167 them to best advantage, we recommend that you study @cite{Mastering
 168 Regular Expressions, by Jeffrey E.F. Friedl, O'Reilly and Associates,
  169 1997}.  (It's known as the ``Hip Owls'' book, because of the picture on its
 170 cover.)  You might also read the manuals to @ref{(gawk)Top},
 171 @ref{(ed)Top}, @cite{sed}, @cite{grep}, @ref{(perl)Top},
 172 @ref{(regex)Top}, @ref{(rx)Top}, @cite{pcre}, and @ref{(flex)Top}, which
 173 also make good use of regular expressions.
 174
  175  The XEmacs regular expression syntax most closely resembles that of
  176 @cite{ed} or @cite{grep}, the GNU versions of which both use the GNU
  177 @cite{regex} library.  XEmacs' version of @cite{regex} has been
  178 extended with some Perl-like capabilities, described in the next
  179 section.
 180
 181 @menu
 182 * Syntax of Regexps::       Rules for writing regular expressions.
 183 * Regexp Example::          Illustrates regular expression syntax.
 184 @end menu
 185
 186 @node Syntax of Regexps
 187 @subsection Syntax of Regular Expressions
 188
 189   Regular expressions have a syntax in which a few characters are
 190 special constructs and the rest are @dfn{ordinary}.  An ordinary
 191 character is a simple regular expression that matches that character and
 192 nothing else.  The special characters are @samp{.}, @samp{*}, @samp{+},
 193 @samp{?}, @samp{[}, @samp{]}, @samp{^}, @samp{$}, and @samp{\}; no new
 194 special characters will be defined in the future.  Any other character
 195 appearing in a regular expression is ordinary, unless a @samp{\}
 196 precedes it.
 197
 198 For example, @samp{f} is not a special character, so it is ordinary, and
 199 therefore @samp{f} is a regular expression that matches the string
 200 @samp{f} and no other string.  (It does @emph{not} match the string
 201 @samp{ff}.)  Likewise, @samp{o} is a regular expression that matches
 202 only @samp{o}.@refill
 203
 204 Any two regular expressions @var{a} and @var{b} can be concatenated.  The
 205 result is a regular expression that matches a string if @var{a} matches
 206 some amount of the beginning of that string and @var{b} matches the rest of
 207 the string.@refill
 208
 209 As a simple example, we can concatenate the regular expressions @samp{f}
 210 and @samp{o} to get the regular expression @samp{fo}, which matches only
 211 the string @samp{fo}.  Still trivial.  To do something more powerful, you
 212 need to use one of the special characters.  Here is a list of them:
 213
 214 @need 1200
 215 @table @kbd
 216 @item .@: @r{(Period)}
 217 @cindex @samp{.} in regexp
 218 is a special character that matches any single character except a newline.
 219 Using concatenation, we can make regular expressions like @samp{a.b}, which
 220 matches any three-character string that begins with @samp{a} and ends with
 221 @samp{b}.@refill
 222
 223 @item *
 224 @cindex @samp{*} in regexp
 225 is not a construct by itself; it is a quantifying suffix operator that
 226 means to repeat the preceding regular expression as many times as
 227 possible.  In @samp{fo*}, the @samp{*} applies to the @samp{o}, so
 228 @samp{fo*} matches one @samp{f} followed by any number of @samp{o}s.
 229 The case of zero @samp{o}s is allowed: @samp{fo*} does match
 230 @samp{f}.@refill
 231
 232 @samp{*} always applies to the @emph{smallest} possible preceding
 233 expression.  Thus, @samp{fo*} has a repeating @samp{o}, not a
 234 repeating @samp{fo}.@refill
 235
 236 The matcher processes a @samp{*} construct by matching, immediately, as
  237 many repetitions as can be found; it is ``greedy''.  Then it continues
 238 with the rest of the pattern.  If that fails, backtracking occurs,
 239 discarding some of the matches of the @samp{*}-modified construct in
 240 case that makes it possible to match the rest of the pattern.  For
 241 example, in matching @samp{ca*ar} against the string @samp{caaar}, the
 242 @samp{a*} first tries to match all three @samp{a}s; but the rest of the
 243 pattern is @samp{ar} and there is only @samp{r} left to match, so this
 244 try fails.  The next alternative is for @samp{a*} to match only two
 245 @samp{a}s.  With this choice, the rest of the regexp matches
 246 successfully.@refill
 247
 248 Nested repetition operators can be extremely slow if they specify
 249 backtracking loops.  For example, it could take hours for the regular
 250 expression @samp{\(x+y*\)*a} to match the sequence
 251 @samp{xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxz}.  The slowness is because
  252 XEmacs must try every possible way of grouping the 35 @samp{x}'s before
 253 concluding that none of them can work.  To make sure your regular
 254 expressions run fast, check nested repetitions carefully.
 255
 256 @item +
 257 @cindex @samp{+} in regexp
 258 is a quantifying suffix operator similar to @samp{*} except that the
  259 preceding expression must match at least once.  It is also ``greedy''.
 260 So, for example, @samp{ca+r} matches the strings @samp{car} and
 261 @samp{caaaar} but not the string @samp{cr}, whereas @samp{ca*r} matches
 262 all three strings.
 263
 264 @item ?
 265 @cindex @samp{?} in regexp
 266 is a quantifying suffix operator similar to @samp{*}, except that the
 267 preceding expression can match either once or not at all.  For example,
 268 @samp{ca?r} matches @samp{car} or @samp{cr}, but does not match anything
 269 else.
 270
 271 @item *?
 272 @cindex @samp{*?} in regexp
 273 works just like @samp{*}, except that rather than matching the longest
 274 match, it matches the shortest match.  @samp{*?} is known as a
 275 @dfn{non-greedy} quantifier, a regexp construct borrowed from Perl.
 276 @c Did perl get this from somewhere?  What's the real history of *? ?
 277
 278 This construct is very useful for when you want to match the text inside
 279 a pair of delimiters.  For instance, @samp{/\*.*?\*/} will match C
 280 comments in a string.  This could not easily be achieved without the use
 281 of a non-greedy quantifier.
 282
  283 This construct was not available prior to XEmacs 20.4.  In GNU Emacs
  284 it has been available only since version 21.
 285
 286 @item +?
 287 @cindex @samp{+?} in regexp
 288 is the non-greedy version of @samp{+}.
 289
 290 @item ??
 291 @cindex @samp{??} in regexp
 292 is the non-greedy version of @samp{?}.
 293
 294 @item \@{n,m\@}
 295 @c Note the spacing after the close brace is deliberate.
 296 @cindex @samp{\@{n,m\@} }in regexp
  297 serves as an interval quantifier, analogous to @samp{*} or @samp{+}.  It
  298 specifies that the expression must match at least @var{n} times, but no
  299 more than @var{m} times.  This syntax is supported by most Unix regexp
  300 utilities, and was introduced in XEmacs 20.3.
  301
  302 Unfortunately, no non-greedy version of this quantifier currently
  303 exists in XEmacs, although Perl has one.
 304
 305 @item [ @dots{} ]
 306 @cindex character set (in regexp)
 307 @cindex @samp{[} in regexp
 308 @cindex @samp{]} in regexp
 309 @samp{[} begins a @dfn{character set}, which is terminated by a
 310 @samp{]}.  In the simplest case, the characters between the two brackets
 311 form the set.  Thus, @samp{[ad]} matches either one @samp{a} or one
 312 @samp{d}, and @samp{[ad]*} matches any string composed of just @samp{a}s
 313 and @samp{d}s (including the empty string), from which it follows that
 314 @samp{c[ad]*r} matches @samp{cr}, @samp{car}, @samp{cdr},
 315 @samp{caddaar}, etc.@refill
 316
 317 The usual regular expression special characters are not special inside a
 318 character set.  A completely different set of special characters exists
 319 inside character sets: @samp{]}, @samp{-} and @samp{^}.@refill
 320
 321 @samp{-} is used for ranges of characters.  To write a range, write two
 322 characters with a @samp{-} between them.  Thus, @samp{[a-z]} matches any
 323 lower case letter.  Ranges may be intermixed freely with individual
 324 characters, as in @samp{[a-z$%.]}, which matches any lower case letter
 325 or @samp{$}, @samp{%}, or a period.@refill
 326
 327 To include a @samp{]} in a character set, make it the first character.
 328 For example, @samp{[]a]} matches @samp{]} or @samp{a}.  To include a
 329 @samp{-}, write @samp{-} as the first character in the set, or put it
 330 immediately after a range.  (You can replace one individual character
 331 @var{c} with the range @samp{@var{c}-@var{c}} to make a place to put the
 332 @samp{-}.)  There is no way to write a set containing just @samp{-} and
 333 @samp{]}.
 334
 335 To include @samp{^} in a set, put it anywhere but at the beginning of
 336 the set.
 337
 338 @item [^ @dots{} ]
 339 @cindex @samp{^} in regexp
 340 @samp{[^} begins a @dfn{complement character set}, which matches any
 341 character except the ones specified.  Thus, @samp{[^a-z0-9A-Z]}
 342 matches all characters @emph{except} letters and digits.@refill
 343
 344 @samp{^} is not special in a character set unless it is the first
 345 character.  The character following the @samp{^} is treated as if it
 346 were first (thus, @samp{-} and @samp{]} are not special there).
 347
 348 Note that a complement character set can match a newline, unless
 349 newline is mentioned as one of the characters not to match.
 350
 351 @item ^
 352 @cindex @samp{^} in regexp
 353 @cindex beginning of line in regexp
 354 is a special character that matches the empty string, but only at the
 355 beginning of a line in the text being matched.  Otherwise it fails to
 356 match anything.  Thus, @samp{^foo} matches a @samp{foo} that occurs at
 357 the beginning of a line.
 358
 359 When matching a string instead of a buffer, @samp{^} matches at the
 360 beginning of the string or after a newline character @samp{\n}.
 361
 362 @item $
 363 @cindex @samp{$} in regexp
 364 is similar to @samp{^} but matches only at the end of a line.  Thus,
 365 @samp{x+$} matches a string of one @samp{x} or more at the end of a line.
 366
 367 When matching a string instead of a buffer, @samp{$} matches at the end
 368 of the string or before a newline character @samp{\n}.
 369
 370 @item \
 371 @cindex @samp{\} in regexp
 372 has two functions: it quotes the special characters (including
 373 @samp{\}), and it introduces additional special constructs.
 374
 375 Because @samp{\} quotes special characters, @samp{\$} is a regular
 376 expression that matches only @samp{$}, and @samp{\[} is a regular
 377 expression that matches only @samp{[}, and so on.
 378
 379 Note that @samp{\} also has special meaning in the read syntax of Lisp
 380 strings (@pxref{String Type}), and must be quoted with @samp{\}.  For
 381 example, the regular expression that matches the @samp{\} character is
 382 @samp{\\}.  To write a Lisp string that contains the characters
 383 @samp{\\}, Lisp syntax requires you to quote each @samp{\} with another
 384 @samp{\}.  Therefore, the read syntax for a regular expression matching
 385 @samp{\} is @code{"\\\\"}.@refill
 386 @end table
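
To make the difference between greedy and non-greedy matching, and the
double quoting of @samp{\}, more concrete, here is a small sketch using
@code{string-match} on an invented sample string:

@example
@group
(let ((text "a /* one */ b /* two */ c"))
  (list
   ;; Greedy: .* runs on to the last "*/" on the line.
   (progn (string-match "/\\*.*\\*/" text)
          (match-string 0 text))
   ;; Non-greedy: .*? stops at the first "*/".
   (progn (string-match "/\\*.*?\\*/" text)
          (match-string 0 text))
   ;; Four backslashes in Lisp syntax match one literal backslash.
   (string-match "\\\\" "a\\b")))
     @result{} ("/* one */ b /* two */" "/* one */" 1)
@end group
@end example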
 387
 388 @strong{Please note:} For historical compatibility, special characters
 389 are treated as ordinary ones if they are in contexts where their special
 390 meanings make no sense.  For example, @samp{*foo} treats @samp{*} as
 391 ordinary since there is no preceding expression on which the @samp{*}
 392 can act.  It is poor practice to depend on this behavior; quote the
 393 special character anyway, regardless of where it appears.@refill
 394
 395 For the most part, @samp{\} followed by any character matches only
 396 that character.  However, there are several exceptions: characters
 397 that, when preceded by @samp{\}, are special constructs.  Such
 398 characters are always ordinary when encountered on their own.  Here
 399 is a table of @samp{\} constructs:
 400
 401 @table @kbd
 402 @item \|
 403 @cindex @samp{|} in regexp
 404 @cindex regexp alternative
 405 specifies an alternative.
 406 Two regular expressions @var{a} and @var{b} with @samp{\|} in
 407 between form an expression that matches anything that either @var{a} or
 408 @var{b} matches.@refill
 409
 410 Thus, @samp{foo\|bar} matches either @samp{foo} or @samp{bar}
 411 but no other string.@refill
 412
 413 @samp{\|} applies to the largest possible surrounding expressions.  Only a
 414 surrounding @samp{\( @dots{} \)} grouping can limit the grouping power of
 415 @samp{\|}.@refill
 416
 417 Full backtracking capability exists to handle multiple uses of @samp{\|}.
 418
 419 @item \( @dots{} \)
 420 @cindex @samp{(} in regexp
 421 @cindex @samp{)} in regexp
 422 @cindex regexp grouping
 423 is a grouping construct that serves three purposes:
 424
 425 @enumerate
 426 @item
 427 To enclose a set of @samp{\|} alternatives for other operations.
 428 Thus, @samp{\(foo\|bar\)x} matches either @samp{foox} or @samp{barx}.
 429
 430 @item
 431 To enclose an expression for a suffix operator such as @samp{*} to act
 432 on.  Thus, @samp{ba\(na\)*} matches @samp{bananana}, etc., with any
 433 (zero or more) number of @samp{na} strings.@refill
 434
 435 @item
 436 To record a matched substring for future reference.
 437 @end enumerate
 438
 439 This last application is not a consequence of the idea of a
 440 parenthetical grouping; it is a separate feature that happens to be
 441 assigned as a second meaning to the same @samp{\( @dots{} \)} construct
 442 because there is no conflict in practice between the two meanings.
 443 Here is an explanation of this feature:
 444
 445 @item \@var{digit}
 446 matches the same text that matched the @var{digit}th occurrence of a
 447 @samp{\( @dots{} \)} construct.
 448
 449 In other words, after the end of a @samp{\( @dots{} \)} construct, the
 450 matcher remembers the beginning and end of the text matched by that
 451 construct.  Then, later on in the regular expression, you can use
 452 @samp{\} followed by @var{digit} to match that same text, whatever it
 453 may have been.
 454
 455 The strings matching the first nine @samp{\( @dots{} \)} constructs
 456 appearing in a regular expression are assigned numbers 1 through 9 in
 457 the order that the open parentheses appear in the regular expression.
 458 So you can use @samp{\1} through @samp{\9} to refer to the text matched
 459 by the corresponding @samp{\( @dots{} \)} constructs.
 460
 461 For example, @samp{\(.*\)\1} matches any newline-free string that is
 462 composed of two identical halves.  The @samp{\(.*\)} matches the first
 463 half, which may be anything, but the @samp{\1} that follows must match
 464 the same exact text.
 465
 466 @item \(?: @dots{} \)
 467 @cindex @samp{\(?:} in regexp
 468 @cindex regexp grouping
 469 is called a @dfn{shy} grouping operator, and it is used just like
 470 @samp{\( @dots{} \)}, except that it does not cause the matched
 471 substring to be recorded for future reference.
 472
 473 This is useful when you need a lot of grouping @samp{\( @dots{} \)}
 474 constructs, but only want to remember one or two -- or if you have
 475 more than nine groupings and need to use backreferences to refer to
 476 the groupings at the end.  It also allows construction of regular
 477 expressions from variable subexpressions that contain varying numbers of
 478 non-capturing subexpressions, without disturbing the group counts for
 479 the main expression.  For example
 480
 481 @example
 482 (let ((sre (if foo "\\(?:bar\\|baz\\)" "quux")))
 483   (re-search-forward (format "a\\(b+ %s c+\\) d" sre) nil t)
 484   (match-string 1))
 485 @end example
 486
 487 It is very tedious to write this kind of code without shy groups, even
 488 if you know what all the alternative subexpressions will look like.
 489
 490 Using @samp{\(?: @dots{} \)} rather than @samp{\( @dots{} \)} should
 491 give little performance gain, as the start of each group must be
 492 recorded for the purpose of back-tracking in any case, and no string
 493 copying is done until @code{match-string} is called.
 494
  495 The shy grouping operator was borrowed from Perl.  It was not
  496 available prior to XEmacs 20.3, and has been available in GNU Emacs
  497 only since version 21.
 498
 499 @item \w
 500 @cindex @samp{\w} in regexp
 501 matches any word-constituent character.  The editor syntax table
 502 determines which characters these are.  @xref{Syntax Tables}.
 503
 504 @item \W
 505 @cindex @samp{\W} in regexp
 506 matches any character that is not a word constituent.
 507
 508 @item \s@var{code}
 509 @cindex @samp{\s} in regexp
 510 matches any character whose syntax is @var{code}.  Here @var{code} is a
 511 character that represents a syntax code: thus, @samp{w} for word
 512 constituent, @samp{-} for whitespace, @samp{(} for open parenthesis,
 513 etc.  @xref{Syntax Tables}, for a list of syntax codes and the
 514 characters that stand for them.
 515
 516 @item \S@var{code}
 517 @cindex @samp{\S} in regexp
 518 matches any character whose syntax is not @var{code}.
 519 @end table
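
The following sketch shows a back-reference and a shy group at work.
The sample strings are made up, and the result assumes the standard
syntax table, in which letters are word constituents:

@example
@group
(let ((s "say hello hello world"))
  (list
   ;; \1 must repeat whatever the first \( ... \) group matched.
   (string-match "\\(\\w+\\) \\1" s)
   (match-string 1 s)
   ;; The shy group is not numbered, so group 1 is the run of x's.
   (string-match "\\(?:foo\\|bar\\)\\(x+\\)" "a barxx")
   (match-string 1 "a barxx")))
     @result{} (4 "hello" 2 "xx")
@end group
@end example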
 520
 521   The following regular expression constructs match the empty string---that is,
 522 they don't use up any characters---but whether they match depends on the
 523 context.
 524
 525 @table @kbd
 526 @item \`
 527 @cindex @samp{\`} in regexp
 528 matches the empty string, but only at the beginning
 529 of the buffer or string being matched against.
 530
 531 @item \'
 532 @cindex @samp{\'} in regexp
 533 matches the empty string, but only at the end of
 534 the buffer or string being matched against.
 535
 536 @item \=
 537 @cindex @samp{\=} in regexp
 538 matches the empty string, but only at point.
 539 (This construct is not defined when matching against a string.)
 540
 541 @item \b
 542 @cindex @samp{\b} in regexp
 543 matches the empty string, but only at the beginning or
 544 end of a word.  Thus, @samp{\bfoo\b} matches any occurrence of
 545 @samp{foo} as a separate word.  @samp{\bballs?\b} matches
 546 @samp{ball} or @samp{balls} as a separate word.@refill
 547
 548 @item \B
 549 @cindex @samp{\B} in regexp
 550 matches the empty string, but @emph{not} at the beginning or
 551 end of a word.
 552
 553 @item \<
 554 @cindex @samp{\<} in regexp
 555 matches the empty string, but only at the beginning of a word.
 556
 557 @item \>
 558 @cindex @samp{\>} in regexp
 559 matches the empty string, but only at the end of a word.
 560 @end table
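
As an illustration of these zero-width constructs, the sketch below
matches against an invented string.  It assumes the standard syntax
table, so that word boundaries fall where one would expect:

@example
@group
(let ((s "football ball"))
  (list
   ;; \b needs a boundary on both sides, so "football" is skipped.
   (string-match "\\bball\\b" s)
   ;; \` anchors at the very start of the string, where "foot" begins.
   (string-match "\\`ball" s)
   ;; \' anchors at the very end of the string.
   (string-match "ball\\'" s)
   ;; \< matches only where a word starts.
   (string-match "\\<foot" s)))
     @result{} (9 nil 9 0)
@end group
@end example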
 561
 562 @kindex invalid-regexp
 563   Not every string is a valid regular expression.  For example, a string
 564 with unbalanced square brackets is invalid (with a few exceptions, such
 565 as @samp{[]]}), and so is a string that ends with a single @samp{\}.  If
 566 an invalid regular expression is passed to any of the search functions,
 567 an @code{invalid-regexp} error is signaled.
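
A program that builds regexps from untrusted input may want to trap this
error.  One possible sketch, using @code{condition-case} with a
deliberately unbalanced bracket:

@example
@group
(condition-case err
    (string-match "[a-z" "text")
  (invalid-regexp
   ;; The car of the error data names the error condition.
   (car err)))
     @result{} invalid-regexp
@end group
@end example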
 568
 569 @defun regexp-quote string
 570 This function returns a regular expression string that matches exactly
 571 @var{string} and nothing else.  This allows you to request an exact
 572 string match when calling a function that wants a regular expression.
 573
 574 @example
 575 @group
 576 (regexp-quote "^The cat$")
 577      @result{} "\\^The cat\\$"
 578 @end group
 579 @end example
 580
 581 One use of @code{regexp-quote} is to combine an exact string match with
 582 context described as a regular expression.  For example, this searches
 583 for the string that is the value of @code{string}, surrounded by
 584 whitespace:
 585
 586 @example
 587 @group
 588 (re-search-forward
 589  (concat "\\s-" (regexp-quote string) "\\s-"))
 590 @end group
 591 @end example
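
The effect of quoting can be seen by comparing a search with and without
it.  In this sketch the variable names @code{needle} and @code{text} are
hypothetical:

@example
@group
(let ((needle "a.b")
      (text "axb a.b"))
  (list
   ;; Used directly, the period matches any character, so "axb" matches.
   (string-match needle text)
   ;; Quoted, only the literal text "a.b" matches.
   (string-match (regexp-quote needle) text)
   (regexp-quote needle)))
     @result{} (0 4 "a\\.b")
@end group
@end example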
 592 @end defun
 593
 594 @node Regexp Example
 595 @subsection Complex Regexp Example
 596
 597   Here is a complicated regexp, used by XEmacs to recognize the end of a
 598 sentence together with any whitespace that follows.  It is the value of
 599 the variable @code{sentence-end}.
 600
 601   First, we show the regexp as a string in Lisp syntax to distinguish
 602 spaces from tab characters.  The string constant begins and ends with a
 603 double-quote.  @samp{\"} stands for a double-quote as part of the
 604 string, @samp{\\} for a backslash as part of the string, @samp{\t} for a
 605 tab and @samp{\n} for a newline.
 606
 607 @example
 608 "[.?!][]\"')@}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"
 609 @end example
 610
 611   In contrast, if you evaluate the variable @code{sentence-end}, you
 612 will see the following:
 613
 614 @example
 615 @group
 616 sentence-end
 617 @result{}
 618 "[.?!][]\"')@}]*\\($\\| $\\|  \\|  \\)[
 619 ]*"
 620 @end group
 621 @end example
 622
 623 @noindent
 624 In this output, tab and newline appear as themselves.
 625
 626   This regular expression contains four parts in succession and can be
 627 deciphered as follows:
 628
 629 @table @code
 630 @item [.?!]
 631 The first part of the pattern is a character set that matches any one of
 632 three characters: period, question mark, and exclamation mark.  The
 633 match must begin with one of these three characters.
 634
 635 @item []\"')@}]*
 636 The second part of the pattern matches any closing braces and quotation
 637 marks, zero or more of them, that may follow the period, question mark
 638 or exclamation mark.  The @code{\"} is Lisp syntax for a double-quote in
 639 a string.  The @samp{*} at the end indicates that the immediately
 640 preceding regular expression (a character set, in this case) may be
 641 repeated zero or more times.
 642
 643 @item \\($\\|@ $\\|\t\\|@ @ \\)
 644 The third part of the pattern matches the whitespace that follows the
 645 end of a sentence: the end of a line, or a tab, or two spaces.  The
 646 double backslashes mark the parentheses and vertical bars as regular
 647 expression syntax; the parentheses delimit a group and the vertical bars
 648 separate alternatives.  The dollar sign is used to match the end of a
 649 line.
 650
 651 @item [ \t\n]*
 652 Finally, the last part of the pattern matches any additional whitespace
 653 beyond the minimum needed to end a sentence.
 654 @end table
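
To see the four parts working together, the sketch below applies a copy
of the regexp shown above to a short invented buffer.  The local name
@code{sentence-end-regexp} is hypothetical; it is used so that the
example does not depend on the current value of @code{sentence-end}.

@example
@group
(let ((sentence-end-regexp
       "[.?!][]\"')@}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"))
  (with-temp-buffer
    (insert "Hi there.  Bye.")
    (goto-char (point-min))
    ;; The match starts at the period and ends after the two spaces.
    (list (re-search-forward sentence-end-regexp nil t)
          (match-beginning 0))))
     @result{} (12 9)
@end group
@end example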
 655
 656 @node Regexp Search
 657 @section Regular Expression Searching
 658 @cindex regular expression searching
 659 @cindex regexp searching
 660 @cindex searching for regexp
 661
 662   In XEmacs, you can search for the next match for a regexp either
  663 incrementally or not.  Incremental search commands are described in
  664 @cite{The XEmacs User's Manual}.  @xref{Regexp Search, , Regular Expression
  665 Search, xemacs, The XEmacs User's Manual}.  Here we describe only the search
 666 functions useful in programs.  The principal one is
 667 @code{re-search-forward}.
 668
 669 @deffn Command re-search-forward regexp &optional limit noerror count buffer
 670 This function searches forward in the current buffer for a string of
 671 text that is matched by the regular expression @var{regexp}.  The
 672 function skips over any amount of text that is not matched by
 673 @var{regexp}, and leaves point at the end of the first match found.
 674 It returns the new value of point.
 675
 676 If @var{limit} is non-@code{nil} (it must be a position in the current
 677 buffer), then it is the upper bound to the search.  No match extending
 678 after that position is accepted.
 679
 680 What happens when the search fails depends on the value of
 681 @var{noerror}.  If @var{noerror} is @code{nil}, a @code{search-failed}
 682 error is signaled.  If @var{noerror} is @code{t},
 683 @code{re-search-forward} does nothing and returns @code{nil}.  If
 684 @var{noerror} is neither @code{nil} nor @code{t}, then
 685 @code{re-search-forward} moves point to @var{limit} (or the end of the
 686 buffer) and returns @code{nil}.
 687
 688 If @var{count} is supplied (it must be a positive number), then the
 689 search is repeated that many times (each time starting at the end of the
 690 previous time's match).  If these successive searches succeed, the
 691 function succeeds, moving point and returning its new value.  Otherwise
 692 the search fails.
 693
 694 In the following example, point is initially before the @samp{T}.
 695 Evaluating the search call moves point to the end of that line (between
 696 the @samp{t} of @samp{hat} and the newline).
 697
 698 @example
 699 @group
 700 ---------- Buffer: foo ----------
 701 I read "@point{}The cat in the hat
 702 comes back" twice.
 703 ---------- Buffer: foo ----------
 704 @end group
 705
 706 @group
 707 (re-search-forward "[a-z]+" nil t 5)
 708      @result{} 27
 709
 710 ---------- Buffer: foo ----------
 711 I read "The cat in the hat@point{}
 712 comes back" twice.
 713 ---------- Buffer: foo ----------
 714 @end group
 715 @end example
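
A common idiom is to call @code{re-search-forward} in a loop with
@var{noerror} set to @code{t}, so that the loop ends when no further
match is found.  The helper below is only a sketch: the name
@code{my-collect-matches} is hypothetical, and it assumes that
@var{regexp} never matches the empty string (otherwise the loop would
not advance).

@example
@group
(defun my-collect-matches (regexp)
  "Return a list of the strings matching REGEXP in the current buffer."
  (save-excursion
    (goto-char (point-min))
    (let ((matches nil))
      ;; Each successful search leaves point after the match,
      ;; so the next search starts there.
      (while (re-search-forward regexp nil t)
        (setq matches (cons (match-string 0) matches)))
      (nreverse matches))))
@end group

@group
;; In buffer `foo' above:
(my-collect-matches "[ch]at")
     @result{} ("cat" "hat")
@end group
@end example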
 716 @end deffn
 717
 718 @deffn Command re-search-backward regexp &optional limit noerror count buffer
 719 This function searches backward in the current buffer for a string of
 720 text that is matched by the regular expression @var{regexp}, leaving
 721 point at the beginning of the first text found.
 722
 723 This function is analogous to @code{re-search-forward}, but they are not
 724 simple mirror images.  @code{re-search-forward} finds the match whose
 725 beginning is as close as possible to the starting point.  If
 726 @code{re-search-backward} were a perfect mirror image, it would find the
 727 match whose end is as close as possible.  However, in fact it finds the
 728 match whose beginning is as close as possible.  The reason is that
 729 matching a regular expression at a given spot always works from
 730 beginning to end, and starts at a specified beginning position.
 731
 732 A true mirror-image of @code{re-search-forward} would require a special
 733 feature for matching regexps from end to beginning.  It's not worth the
 734 trouble of implementing that.
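
The asymmetry is easy to observe with a repetition construct.  In this
sketch (the temporary buffer holding @samp{aaa} is an assumption made
for illustration), the backward search from the end of the buffer stops
at the first starting position where a match exists, so it matches only
the last @samp{a} rather than all three:

@example
@group
(with-temp-buffer
  (insert "aaa")
  (list (re-search-backward "a+")
        (match-beginning 0)
        (match-end 0)))
     @result{} (3 3 4)
@end group
@end example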
 735 @end deffn
 736
 737 @defun string-match regexp string &optional start buffer
 738 This function returns the index of the start of the first match for
 739 the regular expression @var{regexp} in @var{string}, or @code{nil} if
 740 there is no match.  If @var{start} is non-@code{nil}, the search starts
 741 at that index in @var{string}.
 742
 743
 744 Optional arg @var{buffer} controls how case folding is done (according
 745 to the value of @code{case-fold-search} in @var{buffer} and
 746 @var{buffer}'s case tables) and defaults to the current buffer.
 747
 748 For example,
 749
 750 @example
 751 @group
 752 (string-match
 753  "quick" "The quick brown fox jumped quickly.")
 754      @result{} 4
 755 @end group
 756 @group
 757 (string-match
 758  "quick" "The quick brown fox jumped quickly." 8)
 759      @result{} 27
 760 @end group
 761 @end example
 762
 763 @noindent
 764 The index of the first character of the
 765 string is 0, the index of the second character is 1, and so on.
 766
 767 After this function returns, the index of the first character beyond
 768 the match is available as @code{(match-end 0)}.  @xref{Match Data}.
 769
 770 @example
 771 @group
 772 (string-match
 773  "quick" "The quick brown fox jumped quickly." 8)
 774      @result{} 27
 775 @end group
 776
 777 @group
 778 (match-end 0)
 779      @result{} 32
 780 @end group
 781 @end example
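
Because @var{start} and @code{(match-end 0)} use the same indexing, the
two combine naturally to visit every match in a string.  The function
below is a sketch with a hypothetical name; it assumes @var{regexp}
never matches the empty string.

@example
@group
(defun my-string-match-positions (regexp string)
  "Return the start indices of all matches for REGEXP in STRING."
  (let ((start 0)
        (positions nil))
    (while (string-match regexp string start)
      (setq positions (cons (match-beginning 0) positions))
      ;; Resume searching just past the match found.
      (setq start (match-end 0)))
    (nreverse positions)))
@end group

@group
(my-string-match-positions
 "quick" "The quick brown fox jumped quickly.")
     @result{} (4 27)
@end group
@end example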
 782 @end defun
 783
 784 @defun split-string string &optional pattern
This function splits @var{string} into substrings delimited by
@var{pattern}, and returns a list of substrings.  If @var{pattern} is
omitted, it defaults to @samp{[ \f\t\n\r\v]+}, which means that it
splits @var{string} at whitespace.
 789
 790 @example
 791 @group
 792 (split-string "foo bar")
 793      @result{} ("foo" "bar")
 794 @end group
 795
 796 @group
 797 (split-string "something")
 798      @result{} ("something")
 799 @end group
 800
 801 @group
 802 (split-string "a:b:c" ":")
 803      @result{} ("a" "b" "c")
 804 @end group
 805
 806 @group
 807 (split-string ":a::b:c" ":")
 808      @result{} ("" "a" "" "b" "c")
 809 @end group
 810 @end example
 811 @end defun
 812
 813 @defun split-path path
 814 This function splits a search path into a list of strings.  The path
 815 components are separated with the characters specified with
 816 @code{path-separator}.  Under Unix, @code{path-separator} will normally
 817 be @samp{:}, while under Windows, it will be @samp{;}.
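
For instance, on a Unix system where @code{path-separator} has its
usual value @code{":"} (an assumption of this sketch), one would
expect:

@example
@group
(split-path "/usr/bin:/usr/local/bin")
     @result{} ("/usr/bin" "/usr/local/bin")
@end group
@end example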
 818 @end defun
 819
 820 @defun looking-at regexp &optional buffer
 821 This function determines whether the text in the current buffer directly
 822 following point matches the regular expression @var{regexp}.  ``Directly
 823 following'' means precisely that: the search is ``anchored'' and it can
 824 succeed only starting with the first character following point.  The
 825 result is @code{t} if so, @code{nil} otherwise.
 826
 827 This function does not move point, but it updates the match data, which
 828 you can access using @code{match-beginning} and @code{match-end}.
 829 @xref{Match Data}.
 830
 831 In this example, point is located directly before the @samp{T}.  If it
 832 were anywhere else, the result would be @code{nil}.
 833
 834 @example
 835 @group
 836 ---------- Buffer: foo ----------
 837 I read "@point{}The cat in the hat
 838 comes back" twice.
 839 ---------- Buffer: foo ----------
 840
 841 (looking-at "The cat in the hat$")
 842      @result{} t
 843 @end group
 844 @end example
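
Since @code{looking-at} does not move point, it is convenient for
deciding what kind of text starts at a given position.  The following
sketch classifies the current line; the function name and the three
categories are hypothetical.

@example
@group
(defun my-line-kind ()
  "Return `blank', `comment' or `code' for the current line."
  (save-excursion
    (beginning-of-line)
    (cond ((looking-at "[ \t]*$") 'blank)
          ;; A Lisp-style comment: optional indentation, then `;'.
          ((looking-at "[ \t]*;") 'comment)
          (t 'code))))
@end group
@end example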
 845 @end defun
 846
 847 @node POSIX Regexps
 848 @section POSIX Regular Expression Searching
 849
 850   The usual regular expression functions do backtracking when necessary
 851 to handle the @samp{\|} and repetition constructs, but they continue
 852 this only until they find @emph{some} match.  Then they succeed and
 853 report the first match found.
 854
 855   This section describes alternative search functions which perform the
 856 full backtracking specified by the POSIX standard for regular expression
 857 matching.  They continue backtracking until they have tried all
 858 possibilities and found all matches, so they can report the longest
 859 match, as required by POSIX.  This is much slower, so use these
 860 functions only when you really need the longest match.
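
  The difference shows up when an earlier alternative matches a prefix
of what a later alternative would match.  In the sketch below, the
ordinary function reports the first match found, while the POSIX
variant is expected to report the longest one:

@example
@group
(let ((str "foobar"))
  (list (progn (string-match "foo\\|foobar" str)
               (match-end 0))
        (progn (posix-string-match "foo\\|foobar" str)
               (match-end 0))))
     @result{} (3 6)
@end group
@end example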
 861
 862   In Emacs versions prior to 19.29, these functions did not exist, and
 863 the functions described above implemented full POSIX backtracking.
 864
 865 @deffn Command posix-search-forward regexp &optional limit noerror count buffer
 866 This is like @code{re-search-forward} except that it performs the full
 867 backtracking specified by the POSIX standard for regular expression
 868 matching.
 869 @end deffn
 870
 871 @deffn Command posix-search-backward regexp &optional limit noerror count buffer
 872 This is like @code{re-search-backward} except that it performs the full
 873 backtracking specified by the POSIX standard for regular expression
 874 matching.
 875 @end deffn
 876
 877 @defun posix-looking-at regexp &optional buffer
 878 This is like @code{looking-at} except that it performs the full
 879 backtracking specified by the POSIX standard for regular expression
 880 matching.
 881 @end defun
 882
 883 @defun posix-string-match regexp string &optional start buffer
 884 This is like @code{string-match} except that it performs the full
 885 backtracking specified by the POSIX standard for regular expression
 886 matching.
 887
 888 Optional arg @var{buffer} controls how case folding is done (according
 889 to the value of @code{case-fold-search} in @var{buffer} and
 890 @var{buffer}'s case tables) and defaults to the current buffer.
 891 @end defun
 892
 893 @ignore
 894 @deffn Command delete-matching-lines regexp
 895 This function is identical to @code{delete-non-matching-lines}, save
 896 that it deletes what @code{delete-non-matching-lines} keeps.
 897
 898 In the example below, point is located on the first line of text.
 899
 900 @example
 901 @group
 902 ---------- Buffer: foo ----------
 903 We hold these truths
 904 to be self-evident,
 905 that all men are created
 906 equal, and that they are
 907 ---------- Buffer: foo ----------
 908 @end group
 909
 910 @group
 911 (delete-matching-lines "the")
 912      @result{} nil
 913
 914 ---------- Buffer: foo ----------
 915 to be self-evident,
 916 that all men are created
 917 ---------- Buffer: foo ----------
 918 @end group
 919 @end example
 920 @end deffn
 921
 922 @deffn Command flush-lines regexp
 923 This function is an alias of @code{delete-matching-lines}.
 924 @end deffn
 925
 926 @deffn Command delete-non-matching-lines regexp
 927 This function deletes all lines following point which don't
 928 contain a match for the regular expression @var{regexp}.
 929 @end deffn
 930
 931 @deffn Command keep-lines regexp
 932 This function is the same as @code{delete-non-matching-lines}.
 933 @end deffn
 934
 935 @deffn Command count-matches regexp
 936 This function counts the number of matches for @var{regexp} there are in
 937 the current buffer following point.  It prints this number in
 938 the echo area, returning the string printed.
 939 @end deffn
 940
 941 @deffn Command how-many regexp
 942 This function is an alias of @code{count-matches}.
 943 @end deffn
 944
 945 @deffn Command list-matching-lines regexp &optional nlines
 946 This function is a synonym of @code{occur}.
 947 Show all lines following point containing a match for @var{regexp}.
 948 Display each line with @var{nlines} lines before and after,
 949 or @code{-}@var{nlines} before if @var{nlines} is negative.
 950 @var{nlines} defaults to @code{list-matching-lines-default-context-lines}.
 951 Interactively it is the prefix arg.
 952
 953 The lines are shown in a buffer named @samp{*Occur*}.
 954 It serves as a menu to find any of the occurrences in this buffer.
@kbd{C-h m} (@code{describe-mode}) in that buffer gives help.
 956 @end deffn
 957
 958 @defopt list-matching-lines-default-context-lines
 959 Default value is 0.
 960 Default number of context lines to include around a @code{list-matching-lines}
 961 match.  A negative number means to include that many lines before the match.
 962 A positive number means to include that many lines both before and after.
 963 @end defopt
 964 @end ignore
 965
 966 @node Search and Replace
 967 @section Search and Replace
 968 @cindex replacement
 969
 970 @defun perform-replace from-string replacements query-flag regexp-flag delimited-flag &optional repeat-count map
 971 This function is the guts of @code{query-replace} and related commands.
 972 It searches for occurrences of @var{from-string} and replaces some or
 973 all of them.  If @var{query-flag} is @code{nil}, it replaces all
 974 occurrences; otherwise, it asks the user what to do about each one.
 975
 976 If @var{regexp-flag} is non-@code{nil}, then @var{from-string} is
 977 considered a regular expression; otherwise, it must match literally.  If
@var{delimited-flag} is non-@code{nil}, then only occurrences
surrounded by word boundaries are considered.
 980
 981 The argument @var{replacements} specifies what to replace occurrences
 982 with.  If it is a string, that string is used.  It can also be a list of
 983 strings, to be used in cyclic order.
 984
 985 If @var{repeat-count} is non-@code{nil}, it should be an integer.  Then
 986 it specifies how many times to use each of the strings in the
@var{replacements} list before advancing cyclically to the next one.
 988
 989 Normally, the keymap @code{query-replace-map} defines the possible user
 990 responses for queries.  The argument @var{map}, if non-@code{nil}, is a
 991 keymap to use instead of @code{query-replace-map}.
 992 @end defun
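
  As an illustration, here is a sketch of a non-interactive call that
replaces every occurrence after point.  The temporary buffer and its
contents are assumptions made for the example.

@example
@group
(with-temp-buffer
  (insert "foo and foo and food")
  (goto-char (point-min))
  ;; No querying, literal match, no word-boundary requirement.
  (perform-replace "foo" "bar" nil nil nil)
  (buffer-string))
     @result{} "bar and bar and bard"
@end group

@group
(with-temp-buffer
  (insert "foo and foo and food")
  (goto-char (point-min))
  ;; Non-nil delimited-flag: `food' is left alone.
  (perform-replace "foo" "bar" nil nil t)
  (buffer-string))
     @result{} "bar and bar and food"
@end group
@end example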
 993
 994 @defvar query-replace-map
 995 This variable holds a special keymap that defines the valid user
 996 responses for @code{query-replace} and related functions, as well as
 997 @code{y-or-n-p} and @code{map-y-or-n-p}.  It is unusual in two ways:
 998
 999 @itemize @bullet
1000 @item
1001 The ``key bindings'' are not commands, just symbols that are meaningful
1002 to the functions that use this map.
1003
1004 @item
Prefix keys are not supported; each key binding must be for a single event
key sequence.  This is because the functions don't use
@code{read-key-sequence} to get the input; instead, they read a single
event and look it up ``by hand.''
1008 @end itemize
1009 @end defvar
1010
1011 Here are the meaningful ``bindings'' for @code{query-replace-map}.
1012 Several of them are meaningful only for @code{query-replace} and
1013 friends.
1014
1015 @table @code
1016 @item act
1017 Do take the action being considered---in other words, ``yes.''
1018
1019 @item skip
1020 Do not take action for this question---in other words, ``no.''
1021
1022 @item exit
Answer this question ``no,'' and give up on the entire series of
questions, assuming that subsequent answers will be ``no.''
1025
1026 @item act-and-exit
1027 Answer this question ``yes,'' and give up on the entire series of
1028 questions, assuming that subsequent answers will be ``no.''
1029
1030 @item act-and-show
1031 Answer this question ``yes,'' but show the results---don't advance yet
1032 to the next question.
1033
1034 @item automatic
1035 Answer this question and all subsequent questions in the series with
1036 ``yes,'' without further user interaction.
1037
1038 @item backup
1039 Move back to the previous place that a question was asked about.
1040
1041 @item edit
1042 Enter a recursive edit to deal with this question---instead of any
1043 other action that would normally be taken.
1044
1045 @item delete-and-edit
1046 Delete the text being considered, then enter a recursive edit to replace
1047 it.
1048
1049 @item recenter
1050 Redisplay and center the window, then ask the same question again.
1051
1052 @item quit
1053 Perform a quit right away.  Only @code{y-or-n-p} and related functions
1054 use this answer.
1055
1056 @item help
1057 Display some help, then ask again.
1058 @end table
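
  Because the ``bindings'' are plain symbols, looking up a key in the
map yields one of the symbols listed above.  The particular key used
here is an assumption about the default contents of the map:

@example
@group
(lookup-key query-replace-map "y")
     @result{} act
@end group
@end example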
1059
1060 @node Match Data
1061 @section The Match Data
1062 @cindex match data
1063
1064   XEmacs keeps track of the positions of the start and end of segments of
1065 text found during a regular expression search.  This means, for example,
1066 that you can search for a complex pattern, such as a date in an Rmail
1067 message, and then extract parts of the match under control of the
1068 pattern.
1069
1070   Because the match data normally describe the most recent search only,
1071 you must be careful not to do another search inadvertently between the
1072 search you wish to refer back to and the use of the match data.  If you
1073 can't avoid another intervening search, you must save and restore the
1074 match data around it, to prevent it from being overwritten.
1075
1076 @menu
1077 * Simple Match Data::     Accessing single items of match data,
1078                             such as where a particular subexpression started.
1079 * Replacing Match::       Replacing a substring that was matched.
1080 * Entire Match Data::     Accessing the entire match data at once, as a list.
1081 * Saving Match Data::     Saving and restoring the match data.
1082 @end menu
1083
1084 @node Simple Match Data
1085 @subsection Simple Match Data Access
1086
1087   This section explains how to use the match data to find out what was
1088 matched by the last search or match operation.
1089
1090   You can ask about the entire matching text, or about a particular
1091 parenthetical subexpression of a regular expression.  The @var{count}
1092 argument in the functions below specifies which.  If @var{count} is
1093 zero, you are asking about the entire match.  If @var{count} is
1094 positive, it specifies which subexpression you want.
1095
1096   Recall that the subexpressions of a regular expression are those
1097 expressions grouped with escaped parentheses, @samp{\(@dots{}\)}.  The
1098 @var{count}th subexpression is found by counting occurrences of
1099 @samp{\(} from the beginning of the whole regular expression.  The first
1100 subexpression is numbered 1, the second 2, and so on.  Only regular
1101 expressions can have subexpressions---after a simple string search, the
1102 only information available is about the entire match.
1103
1104 @defun match-string count &optional in-string
1105 This function returns, as a string, the text matched in the last search
1106 or match operation.  It returns the entire text if @var{count} is zero,
1107 or just the portion corresponding to the @var{count}th parenthetical
1108 subexpression, if @var{count} is positive.  If @var{count} is out of
1109 range, or if that subexpression didn't match anything, the value is
1110 @code{nil}.
1111
1112 If the last such operation was done against a string with
1113 @code{string-match}, then you should pass the same string as the
1114 argument @var{in-string}.  Otherwise, after a buffer search or match,
1115 you should omit @var{in-string} or pass @code{nil} for it; but you
1116 should make sure that the current buffer when you call
1117 @code{match-string} is the one in which you did the searching or
1118 matching.
1119 @end defun
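
  For example, the pieces of a date can be extracted with one
subexpression per field.  This is a sketch only; the date format and
the variable name are assumptions made for illustration.

@example
@group
(let ((date "2024-03-15"))
  (if (string-match
       "\\([0-9]+\\)-\\([0-9]+\\)-\\([0-9]+\\)" date)
      ;; Subexpressions 1, 2 and 3 are year, month and day.
      (list (string-to-number (match-string 1 date))
            (string-to-number (match-string 2 date))
            (string-to-number (match-string 3 date)))))
     @result{} (2024 3 15)
@end group
@end example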
1120
1121 @defun match-beginning count
1122 This function returns the position of the start of text matched by the
1123 last regular expression searched for, or a subexpression of it.
1124
1125 If @var{count} is zero, then the value is the position of the start of
1126 the entire match.  Otherwise, @var{count} specifies a subexpression in
1127 the regular expression, and the value of the function is the starting
1128 position of the match for that subexpression.
1129
1130 The value is @code{nil} for a subexpression inside a @samp{\|}
1131 alternative that wasn't used in the match.
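
For instance, in this sketch only the second alternative takes part in
the match, so the first subexpression has no position:

@example
@group
(string-match "\\(a\\)\\|\\(b\\)" "b")
     @result{} 0
(match-beginning 1)
     @result{} nil
(match-beginning 2)
     @result{} 0
@end group
@end example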
1132 @end defun
1133
1134 @defun match-end count
1135 This function is like @code{match-beginning} except that it returns the
1136 position of the end of the match, rather than the position of the
1137 beginning.
1138 @end defun
1139
1140   Here is an example of using the match data, with a comment showing the
1141 positions within the text:
1142
1143 @example
1144 @group
1145 (string-match "\\(qu\\)\\(ick\\)"
1146               "The quick fox jumped quickly.")
1147               ;0123456789
1148      @result{} 4
1149 @end group
1150
1151 @group
1152 (match-string 0 "The quick fox jumped quickly.")
1153      @result{} "quick"
1154 (match-string 1 "The quick fox jumped quickly.")
1155      @result{} "qu"
1156 (match-string 2 "The quick fox jumped quickly.")
1157      @result{} "ick"
1158 @end group
1159
1160 @group
1161 (match-beginning 1)       ; @r{The beginning of the match}
1162      @result{} 4                 ;   @r{with @samp{qu} is at index 4.}
1163 @end group
1164
1165 @group
1166 (match-beginning 2)       ; @r{The beginning of the match}
1167      @result{} 6                 ;   @r{with @samp{ick} is at index 6.}
1168 @end group
1169
1170 @group
1171 (match-end 1)             ; @r{The end of the match}
1172      @result{} 6                 ;   @r{with @samp{qu} is at index 6.}
1173
1174 (match-end 2)             ; @r{The end of the match}
1175      @result{} 9                 ;   @r{with @samp{ick} is at index 9.}
1176 @end group
1177 @end example
1178
1179   Here is another example.  Point is initially located at the beginning
1180 of the line.  Searching moves point to between the space and the word
1181 @samp{in}.  The beginning of the entire match is at the 9th character of
1182 the buffer (@samp{T}), and the beginning of the match for the first
1183 subexpression is at the 13th character (@samp{c}).
1184
1185 @example
1186 @group
1187 (list
1188   (re-search-forward "The \\(cat \\)")
1189   (match-beginning 0)
1190   (match-beginning 1))
    @result{} (17 9 13)
1192 @end group
1193
1194 @group
1195 ---------- Buffer: foo ----------
1196 I read "The cat @point{}in the hat comes back" twice.
1197         ^   ^
1198         9  13
1199 ---------- Buffer: foo ----------
1200 @end group
1201 @end example
1202
1203 @noindent
1204 (In this case, the index returned is a buffer position; the first
1205 character of the buffer counts as 1.)
1206
1207 @node Replacing Match
1208 @subsection Replacing the Text That Matched
1209
1210   This function replaces the text matched by the last search with
1211 @var{replacement}.
1212
1213 @cindex case in replacements
1214 @defun replace-match replacement &optional fixedcase literal string strbuffer
1215 This function replaces the text in the buffer (or in @var{string}) that
1216 was matched by the last search.  It replaces that text with
1217 @var{replacement}.
1218
1219 If you did the last search in a buffer, you should specify @code{nil}
1220 for @var{string}.  Then @code{replace-match} does the replacement by
1221 editing the buffer; it leaves point at the end of the replacement text,
1222 and returns @code{t}.
1223
1224 If you did the search in a string, pass the same string as @var{string}.
1225 Then @code{replace-match} does the replacement by constructing and
1226 returning a new string.
1227
1228 If the fourth argument @var{string} is a string, fifth argument
1229 @var{strbuffer} specifies the buffer to be used for syntax-table and
1230 case-table lookup and defaults to the current buffer.  When @var{string}
1231 is not a string, the buffer that the match occurred in has automatically
1232 been remembered and you do not need to specify it.
1233
1234 If @var{fixedcase} is non-@code{nil}, then the case of the replacement
1235 text is not changed; otherwise, the replacement text is converted to a
1236 different case depending upon the capitalization of the text to be
1237 replaced.  If the original text is all upper case, the replacement text
1238 is converted to upper case.  If the first word of the original text is
1239 capitalized, then the first word of the replacement text is capitalized.
If the original text contains just one word, and that word is a single
capital letter, @code{replace-match} considers this a capitalized first
word rather than all upper case.
1243
1244 If @code{case-replace} is @code{nil}, then case conversion is not done,
1245 regardless of the value of @var{fixedcase}.  @xref{Searching and Case}.
1246
1247 If @var{literal} is non-@code{nil}, then @var{replacement} is inserted
1248 exactly as it is, the only alterations being case changes as needed.
1249 If it is @code{nil} (the default), then the character @samp{\} is treated
1250 specially.  If a @samp{\} appears in @var{replacement}, then it must be
1251 part of one of the following sequences:
1252
1253 @table @asis
1254 @item @samp{\&}
1255 @cindex @samp{&} in replacement
1256 @samp{\&} stands for the entire text being replaced.
1257
1258 @item @samp{\@var{n}}
1259 @cindex @samp{\@var{n}} in replacement
1260 @samp{\@var{n}}, where @var{n} is a digit, stands for the text that
1261 matched the @var{n}th subexpression in the original regexp.
1262 Subexpressions are those expressions grouped inside @samp{\(@dots{}\)}.
1263
1264 @item @samp{\\}
1265 @cindex @samp{\} in replacement
1266 @samp{\\} stands for a single @samp{\} in the replacement text.
1267 @end table
1268 @end defun
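
  The following sketch shows the effect of the arguments on a string
match.  It assumes that @code{case-fold-search} and @code{case-replace}
have non-@code{nil} values, so that the second search matches and case
conversion is allowed.

@example
@group
(let ((str "The quick fox"))
  (string-match "\\(qu\\)\\(ick\\)" str)
  (list
   ;; \2 and \1 stand for the subexpression matches.
   (replace-match "\\2-\\1" t nil str)
   ;; With literal non-nil, the backslash is not special.
   (replace-match "\\&" t t str)))
     @result{} ("The ick-qu fox" "The \\& fox")
@end group

@group
(let ((str "THE QUICK FOX"))
  (string-match "quick" str)
  ;; fixedcase is nil, so the replacement follows the
  ;; upper case of the text it replaces.
  (replace-match "slow" nil nil str))
     @result{} "THE SLOW FOX"
@end group
@end example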
1269
1270 @node Entire Match Data
1271 @subsection Accessing the Entire Match Data
1272
1273   The functions @code{match-data} and @code{set-match-data} read or
1274 write the entire match data, all at once.
1275
1276 @defun match-data &optional integers reuse
1277 This function returns a newly constructed list containing all the
1278 information on what text the last search matched.  Element zero is the
1279 position of the beginning of the match for the whole expression; element
1280 one is the position of the end of the match for the expression.  The
1281 next two elements are the positions of the beginning and end of the
1282 match for the first subexpression, and so on.  In general, element
1283 @ifinfo
1284 number 2@var{n}
1285 @end ifinfo
1286 @tex
1287 number {\mathsurround=0pt $2n$}
1288 @end tex
1289 corresponds to @code{(match-beginning @var{n})}; and
1290 element
1291 @ifinfo
1292 number 2@var{n} + 1
1293 @end ifinfo
1294 @tex
1295 number {\mathsurround=0pt $2n+1$}
1296 @end tex
1297 corresponds to @code{(match-end @var{n})}.
1298
1299 All the elements are markers or @code{nil} if matching was done on a
1300 buffer, and all are integers or @code{nil} if matching was done on a
1301 string with @code{string-match}.  However, if the optional first
1302 argument @var{integers} is non-@code{nil}, always use integers (rather
1303 than markers) to represent buffer positions.
1304
1305 If the optional second argument @var{reuse} is a list, reuse it as part
1306 of the value.  If @var{reuse} is long enough to hold all the values, and if
@var{integers} is non-@code{nil}, no new Lisp objects are created.
1308
1309 As always, there must be no possibility of intervening searches between
1310 the call to a search function and the call to @code{match-data} that is
1311 intended to access the match data for that search.
1312
1313 @example
1314 @group
1315 (match-data)
1316      @result{}  (#<marker at 9 in foo>
1317           #<marker at 17 in foo>
1318           #<marker at 13 in foo>
1319           #<marker at 17 in foo>)
1320 @end group
1321 @end example
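
After a match against a string the elements are integers, which makes
the pairing of beginnings and ends easy to see.  A small sketch:

@example
@group
(string-match "\\(qu\\)\\(ick\\)" "The quick fox")
     @result{} 4
(match-data)
     @result{} (4 9 4 6 6 9)
@end group
@end example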
1322 @end defun
1323
1324 @defun set-match-data match-list
1325 This function sets the match data from the elements of @var{match-list},
1326 which should be a list that was the value of a previous call to
1327 @code{match-data}.
1328
1329 If @var{match-list} refers to a buffer that doesn't exist, you don't get
1330 an error; that sets the match data in a meaningless but harmless way.
1331
1332 @findex store-match-data
1333 @code{store-match-data} is an alias for @code{set-match-data}.
1334 @end defun
1335
1336 @node Saving Match Data
1337 @subsection Saving and Restoring the Match Data
1338
1339   When you call a function that may do a search, you may need to save
1340 and restore the match data around that call, if you want to preserve the
1341 match data from an earlier search for later use.  Here is an example
1342 that shows the problem that arises if you fail to save the match data:
1343
1344 @example
1345 @group
1346 (re-search-forward "The \\(cat \\)")
1347      @result{} 48
1348 (foo)                   ; @r{Perhaps @code{foo} does}
1349                         ;   @r{more searching.}
1350 (match-end 0)
1351      @result{} 61              ; @r{Unexpected result---not 48!}
1352 @end group
1353 @end example
1354
1355   You can save and restore the match data with @code{save-match-data}:
1356
1357 @defspec save-match-data body@dots{}
1358 This special form executes @var{body}, saving and restoring the match
1359 data around it.
1360 @end defspec
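
  For example, in the following sketch the inner search does not
disturb the match data of the outer one:

@example
@group
(let ((str "The quick fox"))
  (string-match "quick" str)
  (save-match-data
    ;; This search would otherwise overwrite the match data.
    (string-match "fox" str))
  (match-beginning 0))
     @result{} 4
@end group
@end example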
1361
1362   You can use @code{set-match-data} together with @code{match-data} to
1363 imitate the effect of the special form @code{save-match-data}.  This is
1364 useful for writing code that can run in Emacs 18.  Here is how:
1365
1366 @example
1367 @group
1368 (let ((data (match-data)))
1369   (unwind-protect
1370       @dots{}   ; @r{May change the original match data.}
1371     (set-match-data data)))
1372 @end group
1373 @end example
1374
  XEmacs automatically saves and restores the match data when it runs
1376 process filter functions (@pxref{Filter Functions}) and process
1377 sentinels (@pxref{Sentinels}).
1378
1379 @ignore
1380   Here is a function which restores the match data provided the buffer
1381 associated with it still exists.
1382
1383 @smallexample
1384 @group
1385 (defun restore-match-data (data)
1386 @c It is incorrect to split the first line of a doc string.
1387 @c If there's a problem here, it should be solved in some other way.
1388   "Restore the match data DATA unless the buffer is missing."
1389   (catch 'foo
1390     (let ((d data))
1391 @end group
1392       (while d
1393         (and (car d)
1394              (null (marker-buffer (car d)))
1395 @group
1396              ;; @file{match-data} @r{buffer is deleted.}
1397              (throw 'foo nil))
1398         (setq d (cdr d)))
1399       (set-match-data data))))
1400 @end group
1401 @end smallexample
1402 @end ignore
1403
1404 @node Searching and Case
1405 @section Searching and Case
1406 @cindex searching and case
1407
  By default, searches in XEmacs ignore the case of the text they are
1409 searching through; if you specify searching for @samp{FOO}, then
1410 @samp{Foo} or @samp{foo} is also considered a match.  Regexps, and in
1411 particular character sets, are included: thus, @samp{[aB]} would match
1412 @samp{a} or @samp{A} or @samp{b} or @samp{B}.
1413
1414   If you do not want this feature, set the variable
1415 @code{case-fold-search} to @code{nil}.  Then all letters must match
1416 exactly, including case.  This is a buffer-local variable; altering the
1417 variable affects only the current buffer.  (@xref{Intro to
1418 Buffer-Local}.)  Alternatively, you may change the value of
1419 @code{default-case-fold-search}, which is the default value of
1420 @code{case-fold-search} for buffers that do not override it.
1421
1422   Note that the user-level incremental search feature handles case
1423 distinctions differently.  When given a lower case letter, it looks for
1424 a match of either case, but when given an upper case letter, it looks
for an upper case letter only.  But this has nothing to do with the
searching functions used in Lisp code.
1427
1428 @defopt case-replace
1429 This variable determines whether the replacement functions should
1430 preserve case.  If the variable is @code{nil}, that means to use the
1431 replacement text verbatim.  A non-@code{nil} value means to convert the
1432 case of the replacement text according to the text being replaced.
1433
1434 The function @code{replace-match} is where this variable actually has
1435 its effect.  @xref{Replacing Match}.
1436 @end defopt
1437
1438 @defopt case-fold-search
1439 This buffer-local variable determines whether searches should ignore
1440 case.  If the variable is @code{nil} they do not ignore case; otherwise
1441 they do ignore case.
1442 @end defopt
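
  In Lisp code the usual way to control case sensitivity for a single
search is to bind @code{case-fold-search} with @code{let} around it.  A
minimal sketch:

@example
@group
(list (let ((case-fold-search t))
        (string-match "FOO" "a foo"))
      (let ((case-fold-search nil))
        (string-match "FOO" "a foo")))
     @result{} (2 nil)
@end group
@end example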
1443
1444 @defvar default-case-fold-search
1445 The value of this variable is the default value for
1446 @code{case-fold-search} in buffers that do not override it.  This is the
1447 same as @code{(default-value 'case-fold-search)}.
1448 @end defvar
1449
1450 @node Standard Regexps
1451 @section Standard Regular Expressions Used in Editing
1452 @cindex regexps used standardly in editing
1453 @cindex standard regexps used in editing
1454
1455   This section describes some variables that hold regular expressions
1456 used for certain purposes in editing:
1457
1458 @defvar page-delimiter
1459 This is the regexp describing line-beginnings that separate pages.  The
1460 default value is @code{"^\014"} (i.e., @code{"^^L"} or @code{"^\C-l"});
1461 this matches a line that starts with a formfeed character.
1462 @end defvar
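
  As a sketch of how such a variable is used, the hypothetical function
below counts the pages in the current buffer by searching for
@code{page-delimiter}.  It assumes the delimiter never matches the
empty string, which holds for the default value.

@example
@group
(defun my-count-pages ()
  "Return the number of pages in the current buffer."
  (save-excursion
    (goto-char (point-min))
    (let ((count 1))
      ;; Each delimiter found starts one more page.
      (while (re-search-forward page-delimiter nil t)
        (setq count (1+ count)))
      count)))
@end group
@end example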
1463
1464   The following two regular expressions should @emph{not} assume the
1465 match always starts at the beginning of a line; they should not use
1466 @samp{^} to anchor the match.  Most often, the paragraph commands do
1467 check for a match only at the beginning of a line, which means that
1468 @samp{^} would be superfluous.  When there is a nonzero left margin,
1469 they accept matches that start after the left margin.  In that case, a
1470 @samp{^} would be incorrect.  However, a @samp{^} is harmless in modes
1471 where a left margin is never used.
1472
1473 @defvar paragraph-separate
1474 This is the regular expression for recognizing the beginning of a line
1475 that separates paragraphs.  (If you change this, you may have to
1476 change @code{paragraph-start} also.)  The default value is
1477 @w{@code{"[@ \t\f]*$"}}, which matches a line that consists entirely of
1478 spaces, tabs, and form feeds (after its left margin).
1479 @end defvar
1480
1481 @defvar paragraph-start
1482 This is the regular expression for recognizing the beginning of a line
1483 that starts @emph{or} separates paragraphs.  The default value is
1484 @w{@code{"[@ \t\n\f]"}}, which matches a line starting with a space, tab,
1485 newline, or form feed (after its left margin).
1486 @end defvar
1487
1488 @defvar sentence-end
1489 This is the regular expression describing the end of a sentence.  (All
1490 paragraph boundaries also end sentences, regardless.)  The default value
1491 is:
1492
1493 @example
1494 "[.?!][]\"')@}]*\\($\\| $\\|\t\\| \\)[ \t\n]*"
1495 @end example
1496
This means a period, question mark or exclamation mark, followed
optionally by closing parenthetical characters, followed by tabs,
spaces or newlines.
1500
1501 For a detailed explanation of this regular expression, see @ref{Regexp
1502 Example}.
1503 @end defvar