info/lispref.info-32

   1 This is ../info/lispref.info, produced by makeinfo version 4.0b from
   2 lispref/lispref.texi.
   3
   4 INFO-DIR-SECTION XEmacs Editor
   5 START-INFO-DIR-ENTRY
   6 * Lispref: (lispref).           XEmacs Lisp Reference Manual.
   7 END-INFO-DIR-ENTRY
   8
   9    Edition History:
  10
  11    GNU Emacs Lisp Reference Manual Second Edition (v2.01), May 1993 GNU
  12 Emacs Lisp Reference Manual Further Revised (v2.02), August 1993 Lucid
  13 Emacs Lisp Reference Manual (for 19.10) First Edition, March 1994
  14 XEmacs Lisp Programmer's Manual (for 19.12) Second Edition, April 1995
  15 GNU Emacs Lisp Reference Manual v2.4, June 1995 XEmacs Lisp
  16 Programmer's Manual (for 19.13) Third Edition, July 1995 XEmacs Lisp
  17 Reference Manual (for 19.14 and 20.0) v3.1, March 1996 XEmacs Lisp
  18 Reference Manual (for 19.15 and 20.1, 20.2, 20.3) v3.2, April, May,
  19 November 1997 XEmacs Lisp Reference Manual (for 21.0) v3.3, April 1998
  20
  21    Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995 Free Software
  22 Foundation, Inc.  Copyright (C) 1994, 1995 Sun Microsystems, Inc.
  23 Copyright (C) 1995, 1996 Ben Wing.
  24
  25    Permission is granted to make and distribute verbatim copies of this
  26 manual provided the copyright notice and this permission notice are
  27 preserved on all copies.
  28
  29    Permission is granted to copy and distribute modified versions of
  30 this manual under the conditions for verbatim copying, provided that the
  31 entire resulting derived work is distributed under the terms of a
  32 permission notice identical to this one.
  33
  34    Permission is granted to copy and distribute translations of this
  35 manual into another language, under the above conditions for modified
  36 versions, except that this permission notice may be stated in a
  37 translation approved by the Foundation.
  38
  39    Permission is granted to copy and distribute modified versions of
  40 this manual under the conditions for verbatim copying, provided also
  41 that the section entitled "GNU General Public License" is included
  42 exactly as in the original, and provided that the entire resulting
  43 derived work is distributed under the terms of a permission notice
  44 identical to this one.
  45
  46    Permission is granted to copy and distribute translations of this
  47 manual into another language, under the above conditions for modified
  48 versions, except that the section entitled "GNU General Public License"
  49 may be included in a translation approved by the Free Software
  50 Foundation instead of in the original English.
  51
  52 \1f
  53 File: lispref.info,  Node: Change Hooks,  Next: Transformations,  Prev: Transposition,  Up: Text
  54
  55 Change Hooks
  56 ============
  57
  58    These hook variables let you arrange to take notice of all changes in
  59 all buffers (or in a particular buffer, if you make them buffer-local).
  60
  61    The functions you use in these hooks should save and restore the
  62 match data if they do anything that uses regular expressions;
  63 otherwise, they will interfere in bizarre ways with the editing
  64 operations that call them.
  65
  66    Buffer changes made while executing the following hooks don't
  67 themselves cause any change hooks to be invoked.
  68
  69  - Variable: before-change-functions
  70      This variable holds a list of functions to call before any buffer
  71      modification.  Each function gets two arguments, the beginning and
  72      end of the region that is about to change, represented as
  73      integers.  The buffer that is about to change is always the
  74      current buffer.
  75
  76  - Variable: after-change-functions
  77      This variable holds a list of functions to call after any buffer
  78      modification.  Each function receives three arguments: the
  79      beginning and end of the region just changed, and the length of
  80      the text that existed before the change.  (To get the current
  81      length, subtract the region beginning from the region end.)  All
  82      three arguments are integers.  The buffer that has just changed
  83      is always the current buffer.
  84
  85  - Variable: before-change-function
  86      This obsolete variable holds one function to call before any buffer
  87      modification (or `nil' for no function).  It is called just like
  88      the functions in `before-change-functions'.
  89
  90  - Variable: after-change-function
  91      This obsolete variable holds one function to call after any buffer
  92      modification (or `nil' for no function).  It is called just like
  93      the functions in `after-change-functions'.
  94
  95  - Variable: first-change-hook
  96      This variable is a normal hook that is run whenever a buffer is
  97      changed that was previously in the unmodified state.
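The advice above about saving the match data can be sketched as
follows.  This is only an illustration under stated assumptions:
`my-change-log' and `my-record-change' are hypothetical names, not part
of XEmacs.  The function records a change only if the changed text
contains a digit, and wraps its regexp work in `save-match-data' so
that the editing operation that triggered the hook is not disturbed.

```elisp
(defvar my-change-log nil
  "Hypothetical variable: list of (BEG END OLD-LENGTH) entries, newest first.")

(defun my-record-change (beg end old-length)
  "Hypothetical function for `after-change-functions'.
Record the change if the text now between BEG and END contains a digit."
  ;; `string-match' overwrites the match data, so save and restore it.
  (save-match-data
    (if (string-match "[0-9]" (buffer-substring beg end))
        (setq my-change-log
              (cons (list beg end old-length) my-change-log)))))

;; Install the function globally, for changes in all buffers.
(add-hook 'after-change-functions 'my-record-change)
```

Use `remove-hook' with the same two arguments to uninstall the function.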
  98
  99 \1f
 100 File: lispref.info,  Node: Transformations,  Prev: Change Hooks,  Up: Text
 101
 102 Textual transformations--MD5 and base64 support
 103 ===============================================
 104
 105    Some textual operations inherently require examining each character
 106 in turn, and performing arithmetic operations on them.  Such operations
 107 can, of course, be implemented in Emacs Lisp, but tend to be very slow
 108 for large portions of text or data.  This is why some of them are
 109 implemented in C, with an appropriate interface for Lisp programmers.
 110 Examples of algorithms thus provided are MD5 and base64 support.
 111
 112    MD5 is an algorithm for calculating message digests, as described in
 113 RFC 1321.  Given a message of arbitrary length, MD5 produces a 128-bit
 114 "fingerprint" ("message digest") corresponding to that message.  MD5
 115 was designed to make it computationally infeasible to produce two
 116 messages with the same digest, but practical collision attacks are now
 117 known.  MD5 has been used heavily by various authentication schemes.
 118
 119    The Emacs Lisp interface to MD5 consists of a single function, `md5':
 120
 121  - Function: md5 object &optional start end coding noerror
 122      This function returns the MD5 message digest of OBJECT, a buffer
 123      or string.
 124
 125      Optional arguments START and END denote positions for computing
 126      the digest of a portion of OBJECT.
 127
 128      The optional CODING argument specifies the coding system the text
 129      is to be represented in while computing the digest.  If
 130      unspecified, it defaults to the current format of the data, or is
 131      guessed.
 132
 133      If NOERROR is non-`nil', silently assume binary coding if the
 134      guesswork fails.  Normally, an error is signaled in such a case.
 135
 136      The CODING and NOERROR arguments are meaningful only in XEmacsen with
 137      file-coding or Mule support.  Otherwise, they are ignored.  Some
 138      examples of usage:
 139
 140           ;; Calculate the digest of the entire buffer
 141           (md5 (current-buffer))
 142                => "8842b04362899b1cda8d2d126dc11712"
 143
 144           ;; Calculate the digest of the current line
 145           (md5 (current-buffer) (point-at-bol) (point-at-eol))
 146                => "60614d21e9dee27dfdb01fa4e30d6d00"
 147
 148           ;; Calculate the digest of your name and email address
 149           (md5 (concat (format "%s <%s>" (user-full-name) user-mail-address)))
 150                => "0a2188c40fd38922d941fe6032fce516"
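Unlike the digests shown above, which depend on buffer contents and
user settings, the digest of a fixed string can be checked by hand.
The sketch below uses two test vectors published in RFC 1321.
`my-same-digest-p' is a hypothetical helper, not part of XEmacs.

```elisp
;; Test vectors from the RFC 1321 test suite.
(md5 "")
     ;; => "d41d8cd98f00b204e9800998ecf8427e"
(md5 "abc")
     ;; => "900150983cd24fb0d6963f7d28e17f72"

(defun my-same-digest-p (string1 string2)
  "Hypothetical helper: non-nil if STRING1 and STRING2 have equal MD5 digests."
  (string= (md5 string1) (md5 string2)))

(my-same-digest-p "abc" "abc")
     ;; => t
```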
 151
 152    Base64 is a portable encoding for arbitrary sequences of octets, in a
 153 form that need not be readable by humans.  It uses a 65-character subset
 154 of US-ASCII, as described in RFC 2045.  Base64 is used by MIME to encode
 155 binary bodies, and to encode binary characters in message headers.
 156
 157    The Lisp interface to base64 consists of four functions:
 158
 159  - Command: base64-encode-region start end &optional no-line-break
 160      This function encodes the region between START and END of the
 161      current buffer to base64 format.  This means that the original
 162      region is deleted, and replaced with its base64 equivalent.
 163
 164      Normally, encoded base64 output is multi-line, with 76-character
 165      lines.  If NO-LINE-BREAK is non-`nil', newlines will not be
 166      inserted, resulting in single-line output.
 167
 168      Mule note: you should make sure that you convert the multibyte
 169      characters (those that do not fit into 0-255 range) to something
 170      else, because they cannot be meaningfully converted to base64.  If
 171      `base64-encode-region' encounters such characters, it will
 172      signal an error.
 173
 174      `base64-encode-region' returns the length of the encoded text.
 175
 176           ;; Encode the whole buffer in base64
 177           (base64-encode-region (point-min) (point-max))
 178
 179      The function can also be used interactively, in which case it
 180      works on the currently active region.
 181
 182  - Function: base64-encode-string string &optional no-line-break
 183      This function encodes STRING to base64, and returns the encoded
 184      string.
 185
 186      Normally, encoded base64 output is multi-line, with 76-character
 187      lines.  If NO-LINE-BREAK is non-`nil', newlines will not be
 188      inserted, resulting in single-line output.
 189
 190      For Mule, the same considerations apply as for
 191      `base64-encode-region'.
 192
 193           (base64-encode-string "fubar")
 194               => "ZnViYXI="
 195
 196  - Command: base64-decode-region start end
 197      This function decodes the region between START and END of the
 198      current buffer.  The region should be in base64 encoding.
 199
 200      If the region was decoded correctly, `base64-decode-region' returns
 201      the length of the decoded region.  If the decoding failed, `nil' is
 202      returned.
 203
 204           ;; Decode a base64 buffer, and replace it with the decoded version
 205           (base64-decode-region (point-min) (point-max))
 206
 207  - Function: base64-decode-string string
 208      This function decodes STRING from base64, and returns the decoded
 209      string.  STRING should be valid base64-encoded text.
 210
 211      If decoding was not possible, `nil' is returned.
 212
 213           (base64-decode-string "ZnViYXI=")
 214               => "fubar"
 215
 216           (base64-decode-string "totally bogus")
 217               => nil
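The string functions above can be combined into a simple round-trip
check.  This is a minimal sketch: `my-base64-round-trip-p' is a
hypothetical helper, not part of XEmacs, and it assumes STRING contains
only characters in the 0-255 range.

```elisp
(defun my-base64-round-trip-p (string)
  "Hypothetical helper: non-nil if STRING survives encoding and decoding."
  (string= string
           (base64-decode-string (base64-encode-string string))))

(my-base64-round-trip-p "fubar")
     ;; => t

;; 100 input characters encode to more than 76 output characters, so
;; the default output contains a newline ...
(string-match "\n" (base64-encode-string (make-string 100 ?a)))
     ;; => non-nil
;; ... but with NO-LINE-BREAK non-nil it does not.
(string-match "\n" (base64-encode-string (make-string 100 ?a) t))
     ;; => nil
```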
 218
 219 \1f
 220 File: lispref.info,  Node: Searching and Matching,  Next: Syntax Tables,  Prev: Text,  Up: Top
 221
 222 Searching and Matching
 223 **********************
 224
 225    XEmacs provides two ways to search through a buffer for specified
 226 text: exact string searches and regular expression searches.  After a
 227 regular expression search, you can examine the "match data" to
 228 determine which text matched the whole regular expression or various
 229 portions of it.
 230
 231 * Menu:
 232
 233 * String Search::         Search for an exact match.
 234 * Regular Expressions::   Describing classes of strings.
 235 * Regexp Search::         Searching for a match for a regexp.
 236 * POSIX Regexps::         Searching POSIX-style for the longest match.
 237 * Search and Replace::    Internals of `query-replace'.
 238 * Match Data::            Finding out which part of the text matched
 239                             various parts of a regexp, after regexp search.
 240 * Searching and Case::    Case-independent or case-significant searching.
 241 * Standard Regexps::      Useful regexps for finding sentences, pages,...
 242
 243    The `skip-chars...' functions also perform a kind of searching.
 244 *Note Skipping Characters::.
 245
 246 \1f
 247 File: lispref.info,  Node: String Search,  Next: Regular Expressions,  Up: Searching and Matching
 248
 249 Searching for Strings
 250 =====================
 251
 252    These are the primitive functions for searching through the text in a
 253 buffer.  They are meant for use in programs, but you may call them
 254 interactively.  If you do so, they prompt for the search string; LIMIT
 255 and NOERROR are set to `nil', and COUNT is set to 1.
 256
 257  - Command: search-forward string &optional limit noerror count buffer
 258      This function searches forward from point for an exact match for
 259      STRING.  If successful, it sets point to the end of the occurrence
 260      found, and returns the new value of point.  If no match is found,
 261      the value and side effects depend on NOERROR (see below).
 262
 263      In the following example, point is initially at the beginning of
 264      the line.  Then `(search-forward "fox")' moves point after the last
 265      letter of `fox':
 266
 267           ---------- Buffer: foo ----------
 268           -!-The quick brown fox jumped over the lazy dog.
 269           ---------- Buffer: foo ----------
 270
 271           (search-forward "fox")
 272                => 20
 273
 274           ---------- Buffer: foo ----------
 275           The quick brown fox-!- jumped over the lazy dog.
 276           ---------- Buffer: foo ----------
 277
 278      The argument LIMIT specifies the upper bound to the search.  (It
 279      must be a position in the current buffer.)  No match extending
 280      after that position is accepted.  If LIMIT is omitted or `nil', it
 281      defaults to the end of the accessible portion of the buffer.
 282
 283      What happens when the search fails depends on the value of
 284      NOERROR.  If NOERROR is `nil', a `search-failed' error is
 285      signaled.  If NOERROR is `t', `search-forward' returns `nil' and
 286      does nothing.  If NOERROR is neither `nil' nor `t', then
 287      `search-forward' moves point to the upper bound and returns `nil'.
 288      (It would be more consistent now to return the new position of
 289      point in that case, but some programs may depend on a value of
 290      `nil'.)
 291
 292      If COUNT is supplied (it must be an integer), then the search is
 293      repeated that many times (each time starting at the end of the
 294      previous time's match).  If COUNT is negative, the search
 295      direction is backward.  If the successive searches succeed, the
 296      function succeeds, moving point and returning its new value.
 297      Otherwise the search fails.
 298
 299      BUFFER is the buffer to search in, and defaults to the current
 300      buffer.
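A common use of the NOERROR argument is a loop that runs until the
search fails.  The following is a minimal sketch: `my-count-occurrences'
is a hypothetical function, not part of XEmacs, and it assumes STRING
is non-empty (an empty string matches without moving point, so the loop
would never end).

```elisp
(defun my-count-occurrences (string)
  "Hypothetical helper: count matches for STRING between point and
the end of the accessible portion of the current buffer.
STRING must not be empty."
  (let ((count 0))
    ;; Restore point afterwards, since each successful search moves it.
    (save-excursion
      ;; With NOERROR `t', a failed search returns nil instead of
      ;; signaling `search-failed', which ends the loop.
      (while (search-forward string nil t)
        (setq count (1+ count))))
    count))
```

Because each search starts at the end of the previous match, overlapping
occurrences are not counted twice.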
 301
 302  - Command: search-backward string &optional limit noerror count buffer
 303      This function searches backward from point for STRING.  It is just
 304      like `search-forward' except that it searches backwards and leaves
 305      point at the beginning of the match.
 306
 307  - Command: word-search-forward string &optional limit noerror count
 308           buffer
 309      This function searches forward from point for a "word" match for
 310      STRING.  If it finds a match, it sets point to the end of the
 311      match found, and returns the new value of point.
 312
 313      Word matching regards STRING as a sequence of words, disregarding
 314      punctuation that separates them.  It searches the buffer for the
 315      same sequence of words.  Each word must be distinct in the buffer
 316      (searching for the word `ball' does not match the word `balls'),
 317      but the details of punctuation and spacing are ignored (searching
 318      for `ball boy' does match `ball.  Boy!').
 319
 320      In this example, point is initially at the beginning of the
 321      buffer; the search leaves it between the `y' and the `!'.
 322
 323           ---------- Buffer: foo ----------
 324           -!-He said "Please!  Find
 325           the ball boy!"
 326           ---------- Buffer: foo ----------
 327
 328           (word-search-forward "Please find the ball, boy.")
 329                => 35
 330
 331           ---------- Buffer: foo ----------
 332           He said "Please!  Find
 333           the ball boy-!-!"
 334           ---------- Buffer: foo ----------
 335
 336      If LIMIT is non-`nil' (it must be a position in the current
 337      buffer), then it is the upper bound to the search.  The match
 338      found must not extend after that position.
 339
 340      If NOERROR is `nil', then `word-search-forward' signals an error
 341      if the search fails.  If NOERROR is `t', then it returns `nil'
 342      instead of signaling an error.  If NOERROR is neither `nil' nor
 343      `t', it moves point to LIMIT (or the end of the buffer) and
 344      returns `nil'.
 345
 346      If COUNT is non-`nil', then the search is repeated that many
 347      times.  Point is positioned at the end of the last match.
 348
 349      BUFFER is the buffer to search in, and defaults to the current
 350      buffer.
 351
 352  - Command: word-search-backward string &optional limit noerror count
 353           buffer
 354      This function searches backward from point for a word match to
 355      STRING.  This function is just like `word-search-forward' except
 356      that it searches backward and normally leaves point at the
 357      beginning of the match.
 358
 359 \1f
 360 File: lispref.info,  Node: Regular Expressions,  Next: Regexp Search,  Prev: String Search,  Up: Searching and Matching
 361
 362 Regular Expressions
 363 ===================
 364
 365    A "regular expression" ("regexp", for short) is a pattern that
 366 denotes a (possibly infinite) set of strings.  Searching for matches for
 367 a regexp is a very powerful operation.  This section explains how to
 368 write regexps; the following section says how to search for them.
 369
 370    To gain a thorough understanding of regular expressions and how to
 371 use them to best advantage, we recommend that you study `Mastering
 372 Regular Expressions, by Jeffrey E.F. Friedl, O'Reilly and Associates,
 373 1997'. (It's known as the "Hip Owls" book, because of the picture on its
 374 cover.)  You might also read the manuals to *Note (gawk)Top::, *Note
 375 (ed)Top::, `sed', `grep', *Note (perl)Top::, *Note (regex)Top::, *Note
 376 (rx)Top::, `pcre', and *Note (flex)Top::, which also make good use of
 377 regular expressions.
 378
 379    The XEmacs regular expression syntax most closely resembles that of
 380 `ed' or `grep', the GNU versions of which both use the GNU `regex'
 381 library.  XEmacs' version of `regex' has recently been extended with
 382 some Perl-like capabilities, described in the next section.
 383
 384 * Menu:
 385
 386 * Syntax of Regexps::       Rules for writing regular expressions.
 387 * Regexp Example::          Illustrates regular expression syntax.
 388
 389 \1f
 390 File: lispref.info,  Node: Syntax of Regexps,  Next: Regexp Example,  Up: Regular Expressions
 391
 392 Syntax of Regular Expressions
 393 -----------------------------
 394
 395    Regular expressions have a syntax in which a few characters are
 396 special constructs and the rest are "ordinary".  An ordinary character
 397 is a simple regular expression that matches that character and nothing
 398 else.  The special characters are `.', `*', `+', `?', `[', `]', `^',
 399 `$', and `\'; no new special characters will be defined in the future.
 400 Any other character appearing in a regular expression is ordinary,
 401 unless a `\' precedes it.
 402
 403    For example, `f' is not a special character, so it is ordinary, and
 404 therefore `f' is a regular expression that matches the string `f' and
 405 no other string.  (It does _not_ match the string `ff'.)  Likewise, `o'
 406 is a regular expression that matches only `o'.
 407
 408    Any two regular expressions A and B can be concatenated.  The result
 409 is a regular expression that matches a string if A matches some amount
 410 of the beginning of that string and B matches the rest of the string.
 411
 412    As a simple example, we can concatenate the regular expressions `f'
 413 and `o' to get the regular expression `fo', which matches only the
 414 string `fo'.  Still trivial.  To do something more powerful, you need
 415 to use one of the special characters.  Here is a list of them:
 416
 417 `. (Period)'
 418      is a special character that matches any single character except a
 419      newline.  Using concatenation, we can make regular expressions
 420      like `a.b', which matches any three-character string that begins
 421      with `a' and ends with `b'.
 422
 423 `*'
 424      is not a construct by itself; it is a quantifying suffix operator
 425      that means to repeat the preceding regular expression as many
 426      times as possible.  In `fo*', the `*' applies to the `o', so `fo*'
 427      matches one `f' followed by any number of `o's.  The case of zero
 428      `o's is allowed: `fo*' does match `f'.
 429
 430      `*' always applies to the _smallest_ possible preceding
 431      expression.  Thus, `fo*' has a repeating `o', not a repeating `fo'.
 432
 433      The matcher processes a `*' construct by matching, immediately, as
 434      many repetitions as can be found; it is "greedy".  Then it
 435      continues with the rest of the pattern.  If that fails,
 436      backtracking occurs, discarding some of the matches of the
 437      `*'-modified construct in case that makes it possible to match the
 438      rest of the pattern.  For example, in matching `ca*ar' against the
 439      string `caaar', the `a*' first tries to match all three `a's; but
 440      the rest of the pattern is `ar' and there is only `r' left to
 441      match, so this try fails.  The next alternative is for `a*' to
 442      match only two `a's.  With this choice, the rest of the regexp
 443      matches successfully.
 444
 445      Nested repetition operators can be extremely slow if they specify
 446      backtracking loops.  For example, it could take hours for the
 447      regular expression `\(x+y*\)*a' to match the sequence
 448      `xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxz'.  The slowness is because
 449      Emacs must try each imaginable way of grouping the 35 `x's before
 450      concluding that none of them can work.  To make sure your regular
 451      expressions run fast, check nested repetitions carefully.
 452
 453 `+'
 454      is a quantifying suffix operator similar to `*' except that the
 455      preceding expression must match at least once.  It is also
 456      "greedy".  So, for example, `ca+r' matches the strings `car' and
 457      `caaaar' but not the string `cr', whereas `ca*r' matches all three
 458      strings.
 459
 460 `?'
 461      is a quantifying suffix operator similar to `*', except that the
 462      preceding expression can match either once or not at all.  For
 463      example, `ca?r' matches `car' or `cr', but does not match anything
 464      else.
 465
 466 `*?'
 467      works just like `*', except that rather than matching the longest
 468      match, it matches the shortest match.  `*?' is known as a
 469      "non-greedy" quantifier, a regexp construct borrowed from Perl.
 470
 471      This construct is very useful when you want to match the text
 472      inside a pair of delimiters.  For instance, `/\*.*?\*/' will match
 473      C comments in a string.  This could not easily be achieved without
 474      the use of a non-greedy quantifier.
 475
 476      This construct was not available before XEmacs 20.4.  It is
 477      not available in FSF Emacs.
 478
 479 `+?'
 480      is the non-greedy version of `+'.
 481
 482 `??'
 483      is the non-greedy version of `?'.
 484
 485 `\{n,m\}'
 486      serves as an interval quantifier, analogous to `*' or `+', but
 487      specifies that the expression must match at least N times, but no
 488      more than M times.  This syntax is supported by most Unix regexp
 489      utilities, and was introduced in XEmacs version 20.3.
 490
 491      Unfortunately, the non-greedy version of this quantifier does not
 492      exist currently, although it does in Perl.
 493
 494 `[ ... ]'
 495      `[' begins a "character set", which is terminated by a `]'.  In
 496      the simplest case, the characters between the two brackets form
 497      the set.  Thus, `[ad]' matches either one `a' or one `d', and
 498      `[ad]*' matches any string composed of just `a's and `d's
 499      (including the empty string), from which it follows that `c[ad]*r'
 500      matches `cr', `car', `cdr', `caddaar', etc.
 501
 502      The usual regular expression special characters are not special
 503      inside a character set.  A completely different set of special
 504      characters exists inside character sets: `]', `-' and `^'.
 505
 506      `-' is used for ranges of characters.  To write a range, write two
 507      characters with a `-' between them.  Thus, `[a-z]' matches any
 508      lower case letter.  Ranges may be intermixed freely with individual
 509      characters, as in `[a-z$%.]', which matches any lower case letter
 510      or `$', `%', or a period.
 511
 512      To include a `]' in a character set, make it the first character.
 513      For example, `[]a]' matches `]' or `a'.  To include a `-', write
 514      `-' as the first character in the set, or put it immediately after
 515      a range.  (You can replace one individual character C with the
 516      range `C-C' to make a place to put the `-'.)  There is no way to
 517      write a set containing just `-' and `]'.
 518
 519      To include `^' in a set, put it anywhere but at the beginning of
 520      the set.
 521
 522 `[^ ... ]'
 523      `[^' begins a "complement character set", which matches any
 524      character except the ones specified.  Thus, `[^a-z0-9A-Z]' matches
 525      all characters _except_ letters and digits.
 526
 527      `^' is not special in a character set unless it is the first
 528      character.  The character following the `^' is treated as if it
 529      were first (thus, `-' and `]' are not special there).
 530
 531      Note that a complement character set can match a newline, unless
 532      newline is mentioned as one of the characters not to match.
 533
 534 `^'
 535      is a special character that matches the empty string, but only at
 536      the beginning of a line in the text being matched.  Otherwise it
 537      fails to match anything.  Thus, `^foo' matches a `foo' that occurs
 538      at the beginning of a line.
 539
 540      When matching a string instead of a buffer, `^' matches at the
 541      beginning of the string or after a newline character `\n'.
 542
 543 `$'
 544      is similar to `^' but matches only at the end of a line.  Thus,
 545      `x+$' matches a string of one `x' or more at the end of a line.
 546
 547      When matching a string instead of a buffer, `$' matches at the end
 548      of the string or before a newline character `\n'.
 549
 550 `\'
 551      has two functions: it quotes the special characters (including
 552      `\'), and it introduces additional special constructs.
 553
 554      Because `\' quotes special characters, `\$' is a regular
 555      expression that matches only `$', and `\[' is a regular expression
 556      that matches only `[', and so on.
 557
 558      Note that `\' also has special meaning in the read syntax of Lisp
 559      strings (*note String Type::), and must be quoted with `\'.  For
 560      example, the regular expression that matches the `\' character is
 561      `\\'.  To write a Lisp string that contains the characters `\\',
 562      Lisp syntax requires you to quote each `\' with another `\'.
 563      Therefore, the read syntax for a regular expression matching `\'
 564      is `"\\\\"'.
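The special characters listed above can be tried out on strings with
`string-match'.  The following is a minimal sketch: `my-first-match' is
a hypothetical helper, not part of XEmacs, that returns the text matched
by a regexp, or `nil' if there is no match.  Note that every `\' of a
regexp is written twice in Lisp read syntax.

```elisp
(defun my-first-match (regexp text)
  "Hypothetical helper: return the part of TEXT matched by REGEXP, or nil."
  (save-match-data
    (if (string-match regexp text)
        (match-string 0 text))))

;; `*' allows zero repetitions; `+' requires at least one.
(my-first-match "ca*r" "cr")            ; => "cr"
(my-first-match "ca+r" "cr")            ; => nil
;; A character set followed by a repetition operator.
(my-first-match "c[ad]*r" "caddaar")    ; => "caddaar"
;; Greedy versus non-greedy repetition.
(my-first-match "<.*>" "<a><b>")        ; => "<a><b>"
(my-first-match "<.*?>" "<a><b>")       ; => "<a>"
;; Interval quantifier: at least two, at most three `x's.
(my-first-match "x\\{2,3\\}" "xxxxx")   ; => "xxx"
;; The regexp `\\' matches one backslash; its read syntax is "\\\\".
(my-first-match "\\\\" "a\\b")          ; => "\\" (a single backslash)
```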
 565
 566    *Please note:* For historical compatibility, special characters are
 567 treated as ordinary ones if they are in contexts where their special
 568 meanings make no sense.  For example, `*foo' treats `*' as ordinary
 569 since there is no preceding expression on which the `*' can act.  It is
 570 poor practice to depend on this behavior; quote the special character
 571 anyway, regardless of where it appears.
 572
 573    For the most part, `\' followed by any character matches only that
 574 character.  However, there are several exceptions: characters that,
 575 when preceded by `\', are special constructs.  Such characters are
 576 always ordinary when encountered on their own.  Here is a table of `\'
 577 constructs:
 578
 579 `\|'
 580      specifies an alternative.  Two regular expressions A and B with
 581      `\|' in between form an expression that matches anything that
 582      either A or B matches.
 583
 584      Thus, `foo\|bar' matches either `foo' or `bar' but no other string.
 585
 586      `\|' applies to the largest possible surrounding expressions.
 587      Only a surrounding `\( ... \)' grouping can limit the grouping
 588      power of `\|'.
 589
 590      Full backtracking capability exists to handle multiple uses of
 591      `\|'.
 592
 593 `\( ... \)'
 594      is a grouping construct that serves three purposes:
 595
 596        1. To enclose a set of `\|' alternatives for other operations.
 597           Thus, `\(foo\|bar\)x' matches either `foox' or `barx'.
 598
 599        2. To enclose an expression for a suffix operator such as `*' to
 600           act on.  Thus, `ba\(na\)*' matches `bananana', etc., with any
 601           (zero or more) number of `na' strings.
 602
 603        3. To record a matched substring for future reference.
 604
 605      This last application is not a consequence of the idea of a
 606      parenthetical grouping; it is a separate feature that happens to be
 607      assigned as an additional meaning to the same `\( ... \)' construct
 608      because there is no conflict in practice between the meanings.
 609      Here is an explanation of this feature:
 610
 611 `\DIGIT'
 612      matches the same text that matched the DIGITth occurrence of a `\(
 613      ... \)' construct.
 614
 615      In other words, after the end of a `\( ... \)' construct, the
 616      matcher remembers the beginning and end of the text matched by that
 617      construct.  Then, later on in the regular expression, you can use
 618      `\' followed by DIGIT to match that same text, whatever it may
 619      have been.
 620
 621      The strings matching the first nine `\( ... \)' constructs
 622      appearing in a regular expression are assigned numbers 1 through 9
 623      in the order that the open parentheses appear in the regular
 624      expression.  So you can use `\1' through `\9' to refer to the text
 625      matched by the corresponding `\( ... \)' constructs.
 626
 627      For example, `\(.*\)\1' matches any newline-free string that is
 628      composed of two identical halves.  The `\(.*\)' matches the first
 629      half, which may be anything, but the `\1' that follows must match
 630      the same exact text.
 631
 632 `\(?: ... \)'
 633      is called a "shy" grouping operator, and it is used just like `\(
 634      ... \)', except that it does not cause the matched substring to be
 635      recorded for future reference.
 636
 637      This is useful when you need a lot of grouping `\( ... \)'
 638      constructs, but only want to remember one or two, or if you have
 639      more than nine groupings and need to use backreferences to refer to
 640      the groupings at the end.
 641
 642      Using `\(?: ... \)' rather than `\( ... \)' when you don't need
 643      the captured substrings ought to speed up your programs some,
 644      since it shortens the code path followed by the regular expression
 645      engine, as well as the amount of memory allocation and string
 646      copying it must do.  The actual performance gain to be observed
 647      has not been measured or quantified as of this writing.
 648
 649      The shy grouping operator was borrowed from Perl.  It was not
 650      available before XEmacs 20.3, nor is it available in FSF
 651      Emacs.
 652
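The effect of a shy group on subexpression numbering can be sketched as
follows, assuming an XEmacs recent enough to support `\(?: ... \)'.  The
sample text and the variable name `text' are made up for illustration.
With an ordinary group the digits are subexpression 2; with a shy group
they become subexpression 1.

```elisp
(let ((text "bar-42"))
  (list
   ;; Ordinary group: the alternation is group 1, the digits group 2.
   (and (string-match "\\(foo\\|bar\\)-\\([0-9]+\\)" text)
        (match-string 2 text))
   ;; Shy group: the alternation is not recorded, the digits are group 1.
   (and (string-match "\\(?:foo\\|bar\\)-\\([0-9]+\\)" text)
        (match-string 1 text))))
;; => ("42" "42")
```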
 653 `\w'
 654      matches any word-constituent character.  The editor syntax table
 655      determines which characters these are.  *Note Syntax Tables::.
 656
 657 `\W'
 658      matches any character that is not a word constituent.
 659
 660 `\sCODE'
 661      matches any character whose syntax is CODE.  Here CODE is a
 662      character that represents a syntax code: thus, `w' for word
 663      constituent, `-' for whitespace, `(' for open parenthesis, etc.
 664      *Note Syntax Tables::, for a list of syntax codes and the
 665      characters that stand for them.
 666
 667 `\SCODE'
 668      matches any character whose syntax is not CODE.
 669
 670    The following regular expression constructs match the empty
 671 string--that is, they don't use up any characters--but whether they
 672 match depends on the context.
 673
 674 `\`'
 675      matches the empty string, but only at the beginning of the buffer
 676      or string being matched against.
 677
 678 `\''
 679      matches the empty string, but only at the end of the buffer or
 680      string being matched against.
 681
 682 `\='
 683      matches the empty string, but only at point.  (This construct is
 684      not defined when matching against a string.)
 685
 686 `\b'
 687      matches the empty string, but only at the beginning or end of a
 688      word.  Thus, `\bfoo\b' matches any occurrence of `foo' as a
 689      separate word.  `\bballs?\b' matches `ball' or `balls' as a
 690      separate word.
 691
 692 `\B'
 693      matches the empty string, but _not_ at the beginning or end of a
 694      word.
 695
 696 `\<'
 697      matches the empty string, but only at the beginning of a word.
 698
 699 `\>'
 700      matches the empty string, but only at the end of a word.
 701
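The word-boundary constructs above can be tried out with `string-match'.
This is only a sketch: it assumes the current buffer uses a syntax table
in which letters are word constituents and the space is not, as in the
standard syntax table.  The sample strings are made up.

```elisp
(list
 (string-match "\\bfoo\\b" "a foo b")       ; `foo' as a separate word
 (string-match "\\bfoo\\b" "a foobar b")    ; `foo' is only a prefix here
 (string-match "\\bballs?\\b" "two balls")  ; optional `s', whole word
 (string-match "\\<cat" "concat cat")       ; only at the start of a word
 (string-match "cat\\>" "cats concat"))     ; only at the end of a word
;; => (2 nil 4 7 8)
```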
 702    Not every string is a valid regular expression.  For example, a
 703 string with unbalanced square brackets is invalid (with a few
 704 exceptions, such as `[]]'), and so is a string that ends with a single
 705 `\'.  If an invalid regular expression is passed to any of the search
 706 functions, an `invalid-regexp' error is signaled.
 707
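One way to guard against an invalid regular expression is to trap the
`invalid-regexp' error with `condition-case'.  The helper below is a
sketch; the name `my-valid-regexp-p' is hypothetical and not part of
XEmacs.  It simply attempts a match against the empty string.

```elisp
;; Hypothetical helper: non-nil result means REGEXP was accepted.
(defun my-valid-regexp-p (regexp)
  "Return t if REGEXP is accepted by the matcher, nil if it is invalid."
  (condition-case nil
      (progn (string-match regexp "") t)
    (invalid-regexp nil)))

(list (my-valid-regexp-p "[a-z]+")
      (my-valid-regexp-p "[a-z")     ; unbalanced square bracket
      (my-valid-regexp-p "[]]")      ; one of the permitted exceptions
      (my-valid-regexp-p "foo\\"))   ; regexp ends with a single `\'
;; => (t nil t nil)
```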
 708  - Function: regexp-quote string
 709      This function returns a regular expression string that matches
 710      exactly STRING and nothing else.  This allows you to request an
 711      exact string match when calling a function that wants a regular
 712      expression.
 713
 714           (regexp-quote "^The cat$")
 715                => "\\^The cat\\$"
 716
 717      One use of `regexp-quote' is to combine an exact string match with
 718      context described as a regular expression.  For example, this
 719      searches for the string that is the value of `string', surrounded
 720      by whitespace:
 721
 722           (re-search-forward
 723            (concat "\\s-" (regexp-quote string) "\\s-"))
 724
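The difference that `regexp-quote' makes can be seen by matching the
same string with and without quoting.  In this sketch the variable
`needle' and the sample strings are made up for illustration.

```elisp
(let ((needle "a.c"))
  (list
   ;; Unquoted, `.' is a regexp operator and matches the `b'.
   (string-match needle "abc")
   ;; Quoted, only the literal text `a.c' can match.
   (string-match (regexp-quote needle) "abc")
   (string-match (regexp-quote needle) "xa.c")))
;; => (0 nil 1)
```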
 725 \1f
 726 File: lispref.info,  Node: Regexp Example,  Prev: Syntax of Regexps,  Up: Regular Expressions
 727
 728 Complex Regexp Example
 729 ----------------------
 730
 731    Here is a complicated regexp, used by XEmacs to recognize the end of
 732 a sentence together with any whitespace that follows.  It is the value
 733 of the variable `sentence-end'.
 734
 735    First, we show the regexp as a string in Lisp syntax to distinguish
 736 spaces from tab characters.  The string constant begins and ends with a
 737 double-quote.  `\"' stands for a double-quote as part of the string,
 738 `\\' for a backslash as part of the string, `\t' for a tab and `\n' for
 739 a newline.
 740
 741      "[.?!][]\"')}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"
 742
 743    In contrast, if you evaluate the variable `sentence-end', you will
 744 see the following:
 745
 746      sentence-end
 747      =>
 748      "[.?!][]\"')}]*\\($\\| $\\|  \\|  \\)[
 749      ]*"
 750
 751 In this output, tab and newline appear as themselves.
 752
 753    This regular expression contains four parts in succession and can be
 754 deciphered as follows:
 755
 756 `[.?!]'
 757      The first part of the pattern is a character set that matches any
 758      one of three characters: period, question mark, and exclamation
 759      mark.  The match must begin with one of these three characters.
 760
 761 `[]\"')}]*'
 762      The second part of the pattern matches any closing braces and
 763      quotation marks, zero or more of them, that may follow the period,
 764      question mark or exclamation mark.  The `\"' is Lisp syntax for a
 765      double-quote in a string.  The `*' at the end indicates that the
 766      immediately preceding regular expression (a character set, in this
 767      case) may be repeated zero or more times.
 768
 769 `\\($\\| $\\|\t\\|  \\)'
 770      The third part of the pattern matches the whitespace that follows
 771      the end of a sentence: the end of a line, or a tab, or two spaces.
 772      The double backslashes mark the parentheses and vertical bars as
 773      regular expression syntax; the parentheses delimit a group and the
 774      vertical bars separate alternatives.  The dollar sign is used to
 775      match the end of a line.
 776
 777 `[ \t\n]*'
 778      Finally, the last part of the pattern matches any additional
 779      whitespace beyond the minimum needed to end a sentence.
 780
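The four parts described above can be checked against a short string.
The sketch below uses a local copy of the regexp (bound to the
hypothetical name `re') rather than the variable `sentence-end', and the
sample text is made up.  The first sentence end is followed by a closing
parenthesis and two spaces, so it matches; the period followed by a
single space does not.

```elisp
(let ((re "[.?!][]\"')}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*")
      (text "Done?)  Yes. No"))
  (list
   (string-match re text)     ; match starts at the `?'
   (match-end 0)              ; and extends over `)' and both spaces
   (string-match re text 8))) ; `. ' with one space is not a sentence end
;; => (4 8 nil)
```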
 781 \1f
 782 File: lispref.info,  Node: Regexp Search,  Next: POSIX Regexps,  Prev: Regular Expressions,  Up: Searching and Matching
 783
 784 Regular Expression Searching
 785 ============================
 786
 787    In XEmacs, you can search for the next match for a regexp either
 788 incrementally or not.  Incremental search commands are described in
 789 `The XEmacs User's Manual'.  *Note Regular Expression Search:
 790 (xemacs)Regexp Search.  Here we describe only the search functions
 791 useful in programs.  The principal one is `re-search-forward'.
 792
 793  - Command: re-search-forward regexp &optional limit noerror count
 794           buffer
 795      This function searches forward in the current buffer for a string
 796      of text that is matched by the regular expression REGEXP.  The
 797      function skips over any amount of text that is not matched by
 798      REGEXP, and leaves point at the end of the first match found.  It
 799      returns the new value of point.
 800
 801      If LIMIT is non-`nil' (it must be a position in the current
 802      buffer), then it is the upper bound to the search.  No match
 803      extending after that position is accepted.
 804
 805      What happens when the search fails depends on the value of
 806      NOERROR.  If NOERROR is `nil', a `search-failed' error is
 807      signaled.  If NOERROR is `t', `re-search-forward' does nothing and
 808      returns `nil'.  If NOERROR is neither `nil' nor `t', then
 809      `re-search-forward' moves point to LIMIT (or the end of the
 810      buffer) and returns `nil'.
 811
 812      If COUNT is supplied (it must be a positive number), then the
 813      search is repeated that many times (each time starting at the end
 814      of the previous time's match).  If these successive searches
 815      succeed, the function succeeds, moving point and returning its new
 816      value.  Otherwise the search fails.
 817
 818      In the following example, point is initially before the `T'.
 819      Evaluating the search call moves point to the end of that line
 820      (between the `t' of `hat' and the newline).
 821
 822           ---------- Buffer: foo ----------
 823           I read "-!-The cat in the hat
 824           comes back" twice.
 825           ---------- Buffer: foo ----------
 826
 827           (re-search-forward "[a-z]+" nil t 5)
 828                => 27
 829
 830           ---------- Buffer: foo ----------
 831           I read "The cat in the hat-!-
 832           comes back" twice.
 833           ---------- Buffer: foo ----------
 834
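The effect of the NOERROR argument on a failed search can be sketched in
a scratch buffer.  This example assumes the `with-temp-buffer' macro is
available; the buffer text is made up.  With `t', a failed search leaves
point alone; with any other non-`nil' value (the symbol `move' here is
arbitrary), point moves to the end of the buffer.

```elisp
(with-temp-buffer
  (insert "alpha beta gamma")
  (goto-char (point-min))
  (list
   (re-search-forward "beta" nil t)       ; succeeds, returns new point
   (re-search-forward "delta" nil t)      ; fails quietly
   (point)                                ; point has not moved
   (re-search-forward "delta" nil 'move)  ; fails, but moves point
   (point)))                              ; now at the end of the buffer
;; => (11 nil 11 nil 17)
```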
 835  - Command: re-search-backward regexp &optional limit noerror count
 836           buffer
 837      This function searches backward in the current buffer for a string
 838      of text that is matched by the regular expression REGEXP, leaving
 839      point at the beginning of the first text found.
 840
 841      This function is analogous to `re-search-forward', but they are not
 842      simple mirror images.  `re-search-forward' finds the match whose
 843      beginning is as close as possible to the starting point.  If
 844      `re-search-backward' were a perfect mirror image, it would find the
 845      match whose end is as close as possible.  However, in fact it
 846      finds the match whose beginning is as close as possible.  The
 847      reason is that matching a regular expression at a given spot
 848      always works from beginning to end, and starts at a specified
 849      beginning position.
 850
 851      A true mirror-image of `re-search-forward' would require a special
 852      feature for matching regexps from end to beginning.  It's not
 853      worth the trouble of implementing that.
 854
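The asymmetry described above can be observed with a repetition
construct.  In this sketch (which assumes `with-temp-buffer' is
available), a forward search for `a+' from the start of the buffer
matches all four characters, while a backward search from the end
matches only the last one, because the match begins as close as possible
to the starting point and is then extended forward.

```elisp
(with-temp-buffer
  (insert "aaaa")
  (list
   ;; Forward from the beginning: the match covers positions 1 to 5.
   (progn (goto-char (point-min))
          (re-search-forward "a+")
          (list (match-beginning 0) (match-end 0)))
   ;; Backward from the end: the match covers only positions 4 to 5.
   (progn (goto-char (point-max))
          (re-search-backward "a+")
          (list (match-beginning 0) (match-end 0)))))
;; => ((1 5) (4 5))
```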
 855  - Function: string-match regexp string &optional start buffer
 856      This function returns the index of the start of the first match for
 857      the regular expression REGEXP in STRING, or `nil' if there is no
 858      match.  If START is non-`nil', the search starts at that index in
 859      STRING.
 860
 861      Optional arg BUFFER controls how case folding is done (according
 862      to the value of `case-fold-search' in BUFFER and BUFFER's case
 863      tables) and defaults to the current buffer.
 864
 865      For example,
 866
 867           (string-match
 868            "quick" "The quick brown fox jumped quickly.")
 869                => 4
 870           (string-match
 871            "quick" "The quick brown fox jumped quickly." 8)
 872                => 27
 873
 874      The index of the first character of the string is 0, the index of
 875      the second character is 1, and so on.
 876
 877      After this function returns, the index of the first character
 878      beyond the match is available as `(match-end 0)'.  *Note Match
 879      Data::.
 880
 881           (string-match
 882            "quick" "The quick brown fox jumped quickly." 8)
 883                => 27
 884
 885           (match-end 0)
 886                => 32
 887
 888  - Function: split-string string &optional pattern
 889      This function splits STRING into substrings delimited by PATTERN,
 890      and returns a list of substrings.  If PATTERN is omitted, it
 891      defaults to `[ \f\t\n\r\v]+', which means that it splits STRING by
 892      white-space.
 893
 894           (split-string "foo bar")
 895                => ("foo" "bar")
 896
 897           (split-string "something")
 898                => ("something")
 899
 900           (split-string "a:b:c" ":")
 901                => ("a" "b" "c")
 902
 903           (split-string ":a::b:c" ":")
 904                => ("" "a" "" "b" "c")
 905
 906  - Function: split-path path
 907      This function splits a search path into a list of strings.  The
 908      path components are separated with the characters specified with
 909      `path-separator'.  Under Unix, `path-separator' will normally be
 910      `:', while under Windows, it will be `;'.
 911
 912  - Function: looking-at regexp &optional buffer
 913      This function determines whether the text in the current buffer
 914      directly following point matches the regular expression REGEXP.
 915      "Directly following" means precisely that: the search is
 916      "anchored" and it can succeed only starting with the first
 917      character following point.  The result is `t' if so, `nil'
 918      otherwise.
 919
 920      This function does not move point, but it updates the match data,
 921      which you can access using `match-beginning' and `match-end'.
 922      *Note Match Data::.
 923
 924      In this example, point is located directly before the `T'.  If it
 925      were anywhere else, the result would be `nil'.
 926
 927           ---------- Buffer: foo ----------
 928           I read "-!-The cat in the hat
 929           comes back" twice.
 930           ---------- Buffer: foo ----------
 931
 932           (looking-at "The cat in the hat$")
 933                => t
 934
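The same check can be reproduced in a scratch buffer.  This sketch
assumes `with-temp-buffer' is available and rebuilds the buffer text
from the example above.  It also shows that point is unchanged, that the
match data are set, and that the test fails one character further on.

```elisp
(with-temp-buffer
  (insert "I read \"The cat in the hat\ncomes back\" twice.")
  (goto-char 9)                          ; directly before the `T'
  (list
   (looking-at "The cat in the hat$")    ; anchored match at point
   (point)                               ; point did not move
   (match-end 0)                         ; end of the matched text
   (progn (forward-char 1)
          (looking-at "The cat"))))      ; no longer directly at point
;; => (t 9 27 nil)
```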
 935 \1f
 936 File: lispref.info,  Node: POSIX Regexps,  Next: Search and Replace,  Prev: Regexp Search,  Up: Searching and Matching
 937
 938 POSIX Regular Expression Searching
 939 ==================================
 940
 941    The usual regular expression functions do backtracking when necessary
 942 to handle the `\|' and repetition constructs, but they continue this
 943 only until they find _some_ match.  Then they succeed and report the
 944 first match found.
 945
 946    This section describes alternative search functions which perform the
 947 full backtracking specified by the POSIX standard for regular expression
 948 matching.  They continue backtracking until they have tried all
 949 possibilities and found all matches, so they can report the longest
 950 match, as required by POSIX.  This is much slower, so use these
 951 functions only when you really need the longest match.
 952
 953    In Emacs versions prior to 19.29, these functions did not exist, and
 954 the functions described above implemented full POSIX backtracking.
 955
 956  - Command: posix-search-forward regexp &optional limit noerror count
 957           buffer
 958      This is like `re-search-forward' except that it performs the full
 959      backtracking specified by the POSIX standard for regular expression
 960      matching.
 961
 962  - Command: posix-search-backward regexp &optional limit noerror count
 963           buffer
 964      This is like `re-search-backward' except that it performs the full
 965      backtracking specified by the POSIX standard for regular expression
 966      matching.
 967
 968  - Function: posix-looking-at regexp &optional buffer
 969      This is like `looking-at' except that it performs the full
 970      backtracking specified by the POSIX standard for regular expression
 971      matching.
 972
 973  - Function: posix-string-match regexp string &optional start buffer
 974      This is like `string-match' except that it performs the full
 975      backtracking specified by the POSIX standard for regular expression
 976      matching.
 977
 978      Optional arg BUFFER controls how case folding is done (according
 979      to the value of `case-fold-search' in BUFFER and BUFFER's case
 980      tables) and defaults to the current buffer.
 981
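The difference between the two families of functions shows up when an
earlier alternative is a prefix of a later one.  In the sketch below,
with made-up sample text, the ordinary matcher stops at the first
alternative that succeeds, while the POSIX variant keeps going and
reports the longest match.

```elisp
(let ((text "ab"))
  (list
   ;; Ordinary matching: the first alternative `a' succeeds at once.
   (progn (string-match "a\\|ab" text)
          (match-end 0))
   ;; POSIX matching: all alternatives are tried, the longest wins.
   (progn (posix-string-match "a\\|ab" text)
          (match-end 0))))
;; => (1 2)
```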
 982 \1f
 983 File: lispref.info,  Node: Search and Replace,  Next: Match Data,  Prev: POSIX Regexps,  Up: Searching and Matching
 984
 985 Search and Replace
 986 ==================
 987
 988  - Function: perform-replace from-string replacements query-flag
 989           regexp-flag delimited-flag &optional repeat-count map
 990      This function is the guts of `query-replace' and related commands.
 991      It searches for occurrences of FROM-STRING and replaces some or
 992      all of them.  If QUERY-FLAG is `nil', it replaces all occurrences;
 993      otherwise, it asks the user what to do about each one.
 994
 995      If REGEXP-FLAG is non-`nil', then FROM-STRING is considered a
 996      regular expression; otherwise, it must match literally.  If
 997      DELIMITED-FLAG is non-`nil', then only replacements surrounded by
 998      word boundaries are considered.
 999
1000      The argument REPLACEMENTS specifies what to replace occurrences
1001      with.  If it is a string, that string is used.  It can also be a
1002      list of strings, to be used in cyclic order.
1003
1004      If REPEAT-COUNT is non-`nil', it should be an integer.  Then it
1005      specifies how many times to use each of the strings in the
1006      REPLACEMENTS list before advancing cyclically to the next one.
1007
1008      Normally, the keymap `query-replace-map' defines the possible user
1009      responses for queries.  The argument MAP, if non-`nil', is a
1010      keymap to use instead of `query-replace-map'.
1011
1012  - Variable: query-replace-map
1013      This variable holds a special keymap that defines the valid user
1014      responses for `query-replace' and related functions, as well as
1015      `y-or-n-p' and `map-y-or-n-p'.  It is unusual in two ways:
1016
1017         * The "key bindings" are not commands, just symbols that are
1018           meaningful to the functions that use this map.
1019
1020         * Prefix keys are not supported; each key binding must be for a
1021           single event key sequence.  This is because the functions
1022           don't use `read-key-sequence' to get the input; instead, they
1023           read a single event and look it up "by hand."
1024
1025    Here are the meaningful "bindings" for `query-replace-map'.  Several
1026 of them are meaningful only for `query-replace' and friends.
1027
1028 `act'
1029      Do take the action being considered--in other words, "yes."
1030
1031 `skip'
1032      Do not take action for this question--in other words, "no."
1033
1034 `exit'
1035      Answer this question "no," and give up on the entire series of
1036      questions, assuming that subsequent answers will be "no."
1037
1038 `act-and-exit'
1039      Answer this question "yes," and give up on the entire series of
1040      questions, assuming that subsequent answers will be "no."
1041
1042 `act-and-show'
1043      Answer this question "yes," but show the results--don't advance yet
1044      to the next question.
1045
1046 `automatic'
1047      Answer this question and all subsequent questions in the series
1048      with "yes," without further user interaction.
1049
1050 `backup'
1051      Move back to the previous place that a question was asked about.
1052
1053 `edit'
1054      Enter a recursive edit to deal with this question--instead of any
1055      other action that would normally be taken.
1056
1057 `delete-and-edit'
1058      Delete the text being considered, then enter a recursive edit to
1059      replace it.
1060
1061 `recenter'
1062      Redisplay and center the window, then ask the same question again.
1063
1064 `quit'
1065      Perform a quit right away.  Only `y-or-n-p' and related functions
1066      use this answer.
1067
1068 `help'
1069      Display some help, then ask again.
1070
1071 \1f
1072 File: lispref.info,  Node: Match Data,  Next: Searching and Case,  Prev: Search and Replace,  Up: Searching and Matching
1073
1074 The Match Data
1075 ==============
1076
1077    XEmacs keeps track of the positions of the start and end of segments
1078 of text found during a regular expression search.  This means, for
1079 example, that you can search for a complex pattern, such as a date in
1080 an Rmail message, and then extract parts of the match under control of
1081 the pattern.
1082
1083    Because the match data normally describe the most recent search only,
1084 you must be careful not to do another search inadvertently between the
1085 search you wish to refer back to and the use of the match data.  If you
1086 can't avoid another intervening search, you must save and restore the
1087 match data around it, to prevent it from being overwritten.
1088
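A minimal sketch of protecting the match data from an intervening
search, using the `save-match-data' special form.  The sample text is
made up.  The inner `string-match' would otherwise overwrite the data
recorded by the outer one; here the outer data are restored when the
form exits.

```elisp
(let ((text "The quick fox"))
  (string-match "\\(qu\\)\\(ick\\)" text)
  (list
   ;; An intervening search, wrapped so that it does not disturb
   ;; the match data of the search above.
   (save-match-data
     (string-match "fox" text))
   ;; Still refers to `ick' from the first search.
   (match-beginning 2)))
;; => (10 6)
```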
1089 * Menu:
1090
1091 * Simple Match Data::     Accessing single items of match data,
1092                             such as where a particular subexpression started.
1093 * Replacing Match::       Replacing a substring that was matched.
1094 * Entire Match Data::     Accessing the entire match data at once, as a list.
1095 * Saving Match Data::     Saving and restoring the match data.
1096
1097 \1f
1098 File: lispref.info,  Node: Simple Match Data,  Next: Replacing Match,  Up: Match Data
1099
1100 Simple Match Data Access
1101 ------------------------
1102
1103    This section explains how to use the match data to find out what was
1104 matched by the last search or match operation.
1105
1106    You can ask about the entire matching text, or about a particular
1107 parenthetical subexpression of a regular expression.  The COUNT
1108 argument in the functions below specifies which.  If COUNT is zero, you
1109 are asking about the entire match.  If COUNT is positive, it specifies
1110 which subexpression you want.
1111
1112    Recall that the subexpressions of a regular expression are those
1113 expressions grouped with escaped parentheses, `\(...\)'.  The COUNTth
1114 subexpression is found by counting occurrences of `\(' from the
1115 beginning of the whole regular expression.  The first subexpression is
1116 numbered 1, the second 2, and so on.  Only regular expressions can have
1117 subexpressions--after a simple string search, the only information
1118 available is about the entire match.
1119
1120  - Function: match-string count &optional in-string
1121      This function returns, as a string, the text matched in the last
1122      search or match operation.  It returns the entire text if COUNT is
1123      zero, or just the portion corresponding to the COUNTth
1124      parenthetical subexpression, if COUNT is positive.  If COUNT is
1125      out of range, or if that subexpression didn't match anything, the
1126      value is `nil'.
1127
1128      If the last such operation was done against a string with
1129      `string-match', then you should pass the same string as the
1130      argument IN-STRING.  Otherwise, after a buffer search or match,
1131      you should omit IN-STRING or pass `nil' for it; but you should
1132      make sure that the current buffer when you call `match-string' is
1133      the one in which you did the searching or matching.
1134
1135  - Function: match-beginning count
1136      This function returns the position of the start of text matched by
1137      the last regular expression searched for, or a subexpression of it.
1138
1139      If COUNT is zero, then the value is the position of the start of
1140      the entire match.  Otherwise, COUNT specifies a subexpression in
1141      the regular expression, and the value of the function is the
1142      starting position of the match for that subexpression.
1143
1144      The value is `nil' for a subexpression inside a `\|' alternative
1145      that wasn't used in the match.
1146
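The `nil' case can be seen with two alternatives that each contain a
subexpression.  In this sketch, with made-up sample text, only the
second alternative takes part in the match, so the first subexpression
has no position.

```elisp
(let ((text "b"))
  (string-match "\\(a\\)\\|\\(b\\)" text)
  (list (match-beginning 1)       ; `\(a\)' was not used in the match
        (match-beginning 2)       ; `\(b\)' matched at index 0
        (match-string 1 text)))   ; no text for the unused subexpression
;; => (nil 0 nil)
```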
1147  - Function: match-end count
1148      This function is like `match-beginning' except that it returns the
1149      position of the end of the match, rather than the position of the
1150      beginning.
1151
1152    Here is an example of using the match data, with a comment showing
1153 the positions within the text:
1154
1155      (string-match "\\(qu\\)\\(ick\\)"
1156                    "The quick fox jumped quickly.")
1157                    ;0123456789
1158           => 4
1159
1160      (match-string 0 "The quick fox jumped quickly.")
1161           => "quick"
1162      (match-string 1 "The quick fox jumped quickly.")
1163           => "qu"
1164      (match-string 2 "The quick fox jumped quickly.")
1165           => "ick"
1166
1167      (match-beginning 1)       ; The beginning of the match
1168           => 4                 ;   with `qu' is at index 4.
1169
1170      (match-beginning 2)       ; The beginning of the match
1171           => 6                 ;   with `ick' is at index 6.
1172
1173      (match-end 1)             ; The end of the match
1174           => 6                 ;   with `qu' is at index 6.
1175
1176      (match-end 2)             ; The end of the match
1177           => 9                 ;   with `ick' is at index 9.
1178
1179    Here is another example.  Point is initially located at the beginning
1180 of the line.  Searching moves point to between the space and the word
1181 `in'.  The beginning of the entire match is at the 9th character of the
1182 buffer (`T'), and the beginning of the match for the first
1183 subexpression is at the 13th character (`c').
1184
1185      (list
1186        (re-search-forward "The \\(cat \\)")
1187        (match-beginning 0)
1188        (match-beginning 1))
1189          => (17 9 13)
1190
1191      ---------- Buffer: foo ----------
1192      I read "The cat -!-in the hat comes back" twice.
1193              ^   ^
1194              9  13
1195      ---------- Buffer: foo ----------
1196
1197 (In this case, the index returned is a buffer position; the first
1198 character of the buffer counts as 1.)
1199