   1 This is Info file ../../info/lispref.info, produced by Makeinfo version
   2 1.68 from the input file lispref.texi.
   3
   4 INFO-DIR-SECTION XEmacs Editor
   5 START-INFO-DIR-ENTRY
   6 * Lispref: (lispref).           XEmacs Lisp Reference Manual.
   7 END-INFO-DIR-ENTRY
   8
   9    Edition History:
  10
  11    GNU Emacs Lisp Reference Manual Second Edition (v2.01), May 1993
  12    GNU Emacs Lisp Reference Manual Further Revised (v2.02), August 1993
  13    Lucid Emacs Lisp Reference Manual (for 19.10) First Edition, March 1994
  14    XEmacs Lisp Programmer's Manual (for 19.12) Second Edition, April 1995
  15    GNU Emacs Lisp Reference Manual v2.4, June 1995
  16    XEmacs Lisp Programmer's Manual (for 19.13) Third Edition, July 1995
  17    XEmacs Lisp Reference Manual (for 19.14 and 20.0) v3.1, March 1996
  18    XEmacs Lisp Reference Manual (for 19.15 and 20.1, 20.2, 20.3) v3.2, April, May, November 1997
  19    XEmacs Lisp Reference Manual (for 21.0) v3.3, April 1998
  20
  21    Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995 Free Software
  22 Foundation, Inc.  Copyright (C) 1994, 1995 Sun Microsystems, Inc.
  23 Copyright (C) 1995, 1996 Ben Wing.
  24
  25    Permission is granted to make and distribute verbatim copies of this
  26 manual provided the copyright notice and this permission notice are
  27 preserved on all copies.
  28
  29    Permission is granted to copy and distribute modified versions of
  30 this manual under the conditions for verbatim copying, provided that the
  31 entire resulting derived work is distributed under the terms of a
  32 permission notice identical to this one.
  33
  34    Permission is granted to copy and distribute translations of this
  35 manual into another language, under the above conditions for modified
  36 versions, except that this permission notice may be stated in a
  37 translation approved by the Foundation.
  38
  39    Permission is granted to copy and distribute modified versions of
  40 this manual under the conditions for verbatim copying, provided also
  41 that the section entitled "GNU General Public License" is included
  42 exactly as in the original, and provided that the entire resulting
  43 derived work is distributed under the terms of a permission notice
  44 identical to this one.
  45
  46    Permission is granted to copy and distribute translations of this
  47 manual into another language, under the above conditions for modified
  48 versions, except that the section entitled "GNU General Public License"
  49 may be included in a translation approved by the Free Software
  50 Foundation instead of in the original English.
  51
  52 \1f
  53 File: lispref.info,  Node: Change Hooks,  Next: Transformations,  Prev: Transposition,  Up: Text
  54
  55 Change Hooks
  56 ============
  57
  58    These hook variables let you arrange to take notice of all changes in
  59 all buffers (or in a particular buffer, if you make them buffer-local).
  60
  61    The functions you use in these hooks should save and restore the
  62 match data if they do anything that uses regular expressions;
  63 otherwise, they will interfere in bizarre ways with the editing
  64 operations that call them.
  65
  66    Buffer changes made while executing the following hooks don't
  67 themselves cause any change hooks to be invoked.
  68
  69  - Variable: before-change-functions
  70      This variable holds a list of functions to call before any buffer
  71      modification.  Each function gets two arguments, the beginning and
  72      end of the region that is about to change, represented as
  73      integers.  The buffer that is about to change is always the
  74      current buffer.
  75
  76  - Variable: after-change-functions
  77      This variable holds a list of functions to call after any buffer
  78      modification.  Each function receives three arguments: the
  79      beginning and end of the region just changed, and the length of
  80      the text that existed before the change.  (To get the current
  81      length, subtract the region beginning from the region end.)  All
  82      three arguments are integers.  The buffer that has just changed
  83      is always the current buffer.
  84
  85  - Variable: before-change-function
  86      This obsolete variable holds one function to call before any buffer
  87      modification (or `nil' for no function).  It is called just like
  88      the functions in `before-change-functions'.
  89
  90  - Variable: after-change-function
  91      This obsolete variable holds one function to call after any buffer
  92      modification (or `nil' for no function).  It is called just like
  93      the functions in `after-change-functions'.
  94
  95  - Variable: first-change-hook
  96      This variable is a normal hook that is run whenever a buffer is
  97      changed that was previously in the unmodified state.
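
   As a minimal sketch of a change hook that follows the advice above
about match data (the names `my-change-log' and `my-record-change' are
hypothetical and not part of XEmacs), an `after-change-functions' entry
might look like this:

```elisp
;; Hypothetical example names; nothing here is defined by XEmacs itself.
(defvar my-change-log nil
  "List of (BEG END OLD-LENGTH NEW-LENGTH) records, newest first.")

(defun my-record-change (beg end old-length)
  "Record a change unless the new text is empty or only whitespace."
  ;; `string-match' clobbers the match data, so save and restore it
  ;; on behalf of whatever editing operation triggered this hook.
  (save-match-data
    (if (string-match "[^ \t\n]" (buffer-substring beg end))
        (setq my-change-log
              (cons (list beg end old-length (- end beg))
                    my-change-log)))))

;; Install the function globally, for all buffers.
(add-hook 'after-change-functions 'my-record-change)
```

   With this installed, inserting `abc' into an empty buffer should add
the record `(1 4 0 3)' to the front of `my-change-log'.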
  98
  99 \1f
 100 File: lispref.info,  Node: Transformations,  Prev: Change Hooks,  Up: Text
 101
 102 Textual transformations--MD5 and base64 support
 103 ===============================================
 104
 105    Some textual operations inherently require examining each character
 106 in turn, and performing arithmetic operations on them.  Such operations
 107 can, of course, be implemented in Emacs Lisp, but tend to be very slow
 108 for large portions of text or data.  This is why some of them are
 109 implemented in C, with an appropriate interface for Lisp programmers.
 110 Examples of algorithms thus provided are MD5 and base64 support.
 111
 112    MD5 is an algorithm for calculating message digests, as described in
 113 RFC 1321.  Given a message of arbitrary length, MD5 produces a 128-bit
 114 "fingerprint" ("message digest") corresponding to that message.  MD5
 115 was designed to make it computationally infeasible to produce two
 116 messages having the same digest, or a message having a prespecified
 117 target digest.  MD5 is used heavily by various authentication schemes.
 118
 119    The Emacs Lisp interface to MD5 consists of a single function, `md5':
 120
 121  - Function: md5 OBJECT &optional START END
 122      This function returns the MD5 message digest of OBJECT, a buffer
 123      or string.
 124
 125      Optional arguments START and END denote positions for computing
 126      the digest of a portion of OBJECT.
 127
 128      Some examples of usage:
 129
 130           ;; Calculate the digest of the entire buffer
 131           (md5 (current-buffer))
 132                => "8842b04362899b1cda8d2d126dc11712"
 133
 134           ;; Calculate the digest of the current line
 135           (md5 (current-buffer) (point-at-bol) (point-at-eol))
 136                => "60614d21e9dee27dfdb01fa4e30d6d00"
 137
 138           ;; Calculate the digest of your name and email address
 139           (md5 (format "%s <%s>" (user-full-name) user-mail-address))
 140                => "0a2188c40fd38922d941fe6032fce516"
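
     The digests above depend on the buffer contents and the user, so
     they cannot be reproduced elsewhere.  As a reproducible check,
     the following sketch uses test strings from the RFC 1321 test
     suite; the last form assumes plain ASCII text, for which the
     digest of a buffer and of the equivalent string should agree.

```elisp
;; Test vectors taken from the RFC 1321 test suite.
(md5 "")
     ;; => "d41d8cd98f00b204e9800998ecf8427e"
(md5 "abc")
     ;; => "900150983cd24fb0d6963f7d28e17f72"
(md5 "message digest")
     ;; => "f96b697d7cb7938d525a2f31aaf161d0"

;; A buffer holding the same ASCII text as a string gives the same digest.
(with-temp-buffer
  (insert "abc")
  (string= (md5 (current-buffer)) (md5 "abc")))
     ;; => t
```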
 141
 142    Base64 is a portable encoding for arbitrary sequences of octets, in a
 143 form that need not be readable by humans.  It uses a 65-character subset
 144 of US-ASCII, as described in RFC 2045.  Base64 is used by MIME to encode
 145 binary bodies, and to encode binary characters in message headers.
 146
 147    The Lisp interface to base64 consists of four functions:
 148
 149  - Function: base64-encode-region BEG END &optional NO-LINE-BREAK
 150      This function encodes the region between BEG and END of the
 151      current buffer to base64 format.  This means that the original
 152      region is deleted, and replaced with its base64 equivalent.
 153
 154      Normally, encoded base64 output is multi-line, with 76-character
 155      lines.  If NO-LINE-BREAK is non-`nil', newlines will not be
 156      inserted, resulting in single-line output.
 157
 158      Mule note: you should make sure that you convert the multibyte
 159      characters (those that do not fit into 0-255 range) to something
 160      else, because they cannot be meaningfully converted to base64.  If
 161      `base64-encode-region' encounters such characters, it will
 162      signal an error.
 163
 164      `base64-encode-region' returns the length of the encoded text.
 165
 166           ;; Encode the whole buffer in base64
 167           (base64-encode-region (point-min) (point-max))
 168
 169      The function can also be used interactively, in which case it
 170      works on the currently active region.
 171
 172  - Function: base64-encode-string STRING
 173      This function encodes STRING to base64, and returns the encoded
 174      string.
 175
 176      For Mule, the same considerations apply as for
 177      `base64-encode-region'.
 178
 179           (base64-encode-string "fubar")
 180               => "ZnViYXI="
 181
 182  - Function: base64-decode-region BEG END
 183      This function decodes the region between BEG and END of the
 184      current buffer.  The region should be in base64 encoding.
 185
 186      If the region was decoded correctly, `base64-decode-region' returns
 187      the length of the decoded region.  If the decoding failed, `nil' is
 188      returned.
 189
 190           ;; Decode a base64 buffer, and replace it with the decoded version
 191           (base64-decode-region (point-min) (point-max))
 192
 193  - Function: base64-decode-string STRING
 194      This function decodes STRING from base64, and returns the decoded
 195      string.  STRING should be valid base64-encoded text.
 196
 197      If decoding was not possible, `nil' is returned.
 198
 199           (base64-decode-string "ZnViYXI=")
 200               => "fubar"
 201
 202           (base64-decode-string "totally bogus")
 203               => nil
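
   The four functions can be combined into a round trip.  The sketch
below is illustrative only: `my-base64-round-trip' is a hypothetical
name, and the short strings are the standard base64 test vectors, which
show how the `=' padding depends on the input length.

```elisp
;; The padding depends on the input length modulo 3.
(base64-encode-string "f")    ; => "Zg=="
(base64-encode-string "fo")   ; => "Zm8="
(base64-encode-string "foo")  ; => "Zm9v"

(defun my-base64-round-trip (text)
  "Hypothetical helper: encode TEXT in a temporary buffer, then decode it.
Return a list (ENCODED DECODED)."
  (with-temp-buffer
    (insert text)
    (base64-encode-region (point-min) (point-max))
    (let ((encoded (buffer-string)))
      (base64-decode-region (point-min) (point-max))
      (list encoded (buffer-string)))))

(my-base64-round-trip "fubar")
     ;; => ("ZnViYXI=" "fubar")
```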
 204
 205 \1f
 206 File: lispref.info,  Node: Searching and Matching,  Next: Syntax Tables,  Prev: Text,  Up: Top
 207
 208 Searching and Matching
 209 **********************
 210
 211    XEmacs provides two ways to search through a buffer for specified
 212 text: exact string searches and regular expression searches.  After a
 213 regular expression search, you can examine the "match data" to
 214 determine which text matched the whole regular expression or various
 215 portions of it.
 216
 217 * Menu:
 218
 219 * String Search::         Search for an exact match.
 220 * Regular Expressions::   Describing classes of strings.
 221 * Regexp Search::         Searching for a match for a regexp.
 222 * POSIX Regexps::         Searching POSIX-style for the longest match.
 223 * Search and Replace::    Internals of `query-replace'.
 224 * Match Data::            Finding out which part of the text matched
 225                             various parts of a regexp, after regexp search.
 226 * Searching and Case::    Case-independent or case-significant searching.
 227 * Standard Regexps::      Useful regexps for finding sentences, pages,...
 228
 229    The `skip-chars...' functions also perform a kind of searching.
 230 *Note Skipping Characters::.
 231
 232 \1f
 233 File: lispref.info,  Node: String Search,  Next: Regular Expressions,  Up: Searching and Matching
 234
 235 Searching for Strings
 236 =====================
 237
 238    These are the primitive functions for searching through the text in a
 239 buffer.  They are meant for use in programs, but you may call them
 240 interactively.  If you do so, they prompt for the search string; LIMIT
 241 and NOERROR are set to `nil', and REPEAT is set to 1.
 242
 243  - Command: search-forward STRING &optional LIMIT NOERROR REPEAT
 244      This function searches forward from point for an exact match for
 245      STRING.  If successful, it sets point to the end of the occurrence
 246      found, and returns the new value of point.  If no match is found,
 247      the value and side effects depend on NOERROR (see below).
 248
 249      In the following example, point is initially at the beginning of
 250      the line.  Then `(search-forward "fox")' moves point after the last
 251      letter of `fox':
 252
 253           ---------- Buffer: foo ----------
 254           -!-The quick brown fox jumped over the lazy dog.
 255           ---------- Buffer: foo ----------
 256
 257           (search-forward "fox")
 258                => 20
 259
 260           ---------- Buffer: foo ----------
 261           The quick brown fox-!- jumped over the lazy dog.
 262           ---------- Buffer: foo ----------
 263
 264      The argument LIMIT specifies the upper bound to the search.  (It
 265      must be a position in the current buffer.)  No match extending
 266      after that position is accepted.  If LIMIT is omitted or `nil', it
 267      defaults to the end of the accessible portion of the buffer.
 268
 269      What happens when the search fails depends on the value of
 270      NOERROR.  If NOERROR is `nil', a `search-failed' error is
 271      signaled.  If NOERROR is `t', `search-forward' returns `nil' and
 272      does nothing.  If NOERROR is neither `nil' nor `t', then
 273      `search-forward' moves point to the upper bound and returns `nil'.
 274      (It would be more consistent now to return the new position of
 275      point in that case, but some programs may depend on a value of
 276      `nil'.)
 277
 278      If REPEAT is supplied (it must be a positive number), then the
 279      search is repeated that many times (each time starting at the end
 280      of the previous time's match).  If these successive searches
 281      succeed, the function succeeds, moving point and returning its new
 282      value.  Otherwise the search fails.
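
     The effect of LIMIT, NOERROR and REPEAT can be seen in a small
     experiment.  This is a sketch only; `my-search-forward-demo' is a
     hypothetical name, and the buffer text is the example sentence
     used above.

```elisp
(defun my-search-forward-demo ()
  "Hypothetical demo of the optional arguments of `search-forward'."
  (with-temp-buffer
    (insert "The quick brown fox jumped over the lazy dog.")
    (goto-char (point-min))
    (list (search-forward "fox")           ; success: point moves to 20
          (search-forward "cat" nil t)     ; failure with NOERROR `t': nil
          (point)                          ; point has not moved
          (search-forward "cat" 30 'move)  ; NOERROR neither `nil' nor `t'
          (point)                          ; point moved to the bound
          (progn
            (goto-char (point-min))
            ;; REPEAT 3: the third `o' is the one in "over".
            (search-forward "o" nil t 3)))))

(my-search-forward-demo)
     ;; => (20 nil 20 nil 30 29)
```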
 283
 284  - Command: search-backward STRING &optional LIMIT NOERROR REPEAT
 285      This function searches backward from point for STRING.  It is just
 286      like `search-forward' except that it searches backwards and leaves
 287      point at the beginning of the match.
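
     A short sketch of the backward case follows
     (`my-search-backward-demo' is a hypothetical name).  It assumes
     that, when searching backward, LIMIT acts as a lower bound: a
     match that begins before LIMIT is not accepted.

```elisp
(defun my-search-backward-demo ()
  "Hypothetical demo of `search-backward'."
  (with-temp-buffer
    (insert "The quick brown fox jumped over the lazy dog.")
    ;; After the insertion, point is at the end of the buffer.
    (list (search-backward "lazy")         ; returns start of the match
          (point)                          ; point is left at that start
          ;; "quick" begins at position 5, before the bound 10.
          (search-backward "quick" 10 t))))

(my-search-backward-demo)
     ;; => (37 37 nil)
```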
 288
 289  - Command: word-search-forward STRING &optional LIMIT NOERROR REPEAT
 290      This function searches forward from point for a "word" match for
 291      STRING.  If it finds a match, it sets point to the end of the
 292      match found, and returns the new value of point.
 293
 294      Word matching regards STRING as a sequence of words, disregarding
 295      punctuation that separates them.  It searches the buffer for the
 296      same sequence of words.  Each word must be distinct in the buffer
 297      (searching for the word `ball' does not match the word `balls'),
 298      but the details of punctuation and spacing are ignored (searching
 299      for `ball boy' does match `ball.  Boy!').
 300
 301      In this example, point is initially at the beginning of the
 302      buffer; the search leaves it between the `y' and the `!'.
 303
 304           ---------- Buffer: foo ----------
 305           -!-He said "Please!  Find
 306           the ball boy!"
 307           ---------- Buffer: foo ----------
 308
 309           (word-search-forward "Please find the ball, boy.")
 310                => 35
 311
 312           ---------- Buffer: foo ----------
 313           He said "Please!  Find
 314           the ball boy-!-!"
 315           ---------- Buffer: foo ----------
 316
 317      If LIMIT is non-`nil' (it must be a position in the current
 318      buffer), then it is the upper bound to the search.  The match
 319      found must not extend after that position.
 320
 321      If NOERROR is `nil', then `word-search-forward' signals an error
 322      if the search fails.  If NOERROR is `t', then it returns `nil'
 323      instead of signaling an error.  If NOERROR is neither `nil' nor
 324      `t', it moves point to LIMIT (or the end of the buffer) and
 325      returns `nil'.
 326
 327      If REPEAT is non-`nil', then the search is repeated that many
 328      times.  Point is positioned at the end of the last match.
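
     The two rules of word matching described above (punctuation and
     spacing are ignored, but each word must match as a whole word) can
     be checked with a sketch like the following.  The name
     `my-word-search-demo' and the sample text are assumptions made for
     illustration; `case-fold-search' is bound explicitly so that `boy'
     can match `Boy'.

```elisp
(defun my-word-search-demo ()
  "Hypothetical demo of `word-search-forward'."
  (with-temp-buffer
    (insert "He threw the ball.  Boy!  Two balls.")
    (goto-char (point-min))
    (let ((case-fold-search t))
      (list
       ;; Matches "ball.  Boy"; point ends up just after the `y'.
       (word-search-forward "ball boy" nil t)
       ;; Only "balls" remains after point, which is a different word.
       (word-search-forward "ball" nil t)))))

(my-word-search-demo)
     ;; => (24 nil)
```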
 329
 330  - Command: word-search-backward STRING &optional LIMIT NOERROR REPEAT
 331      This function searches backward from point for a word match to
 332      STRING.  This function is just like `word-search-forward' except
 333      that it searches backward and normally leaves point at the
 334      beginning of the match.
 335
 336 \1f
 337 File: lispref.info,  Node: Regular Expressions,  Next: Regexp Search,  Prev: String Search,  Up: Searching and Matching
 338
 339 Regular Expressions
 340 ===================
 341
 342    A "regular expression" ("regexp", for short) is a pattern that
 343 denotes a (possibly infinite) set of strings.  Searching for matches for
 344 a regexp is a very powerful operation.  This section explains how to
 345 write regexps; the following section says how to search for them.
 346
 347    To gain a thorough understanding of regular expressions and how to
 348 use them to best advantage, we recommend that you study `Mastering
 349 Regular Expressions, by Jeffrey E.F. Friedl, O'Reilly and Associates,
 350 1997'. (It's known as the "Hip Owls" book, because of the picture on its
 351 cover.)  You might also read the manuals to *Note (gawk)Top::, *Note
 352 (ed)Top::, `sed', `grep', *Note (perl)Top::, *Note (regex)Top::, *Note
 353 (rx)Top::, `pcre', and *Note (flex)Top::, which also make good use of
 354 regular expressions.
 355
 356    The XEmacs regular expression syntax most closely resembles that of
 357 `ed' or `grep', the GNU versions of which both use the GNU `regex'
 358 library.  XEmacs' version of `regex' has recently been extended with
 359 some Perl-like capabilities, described in the next section.
 360
 361 * Menu:
 362
 363 * Syntax of Regexps::       Rules for writing regular expressions.
 364 * Regexp Example::          Illustrates regular expression syntax.
 365
 366 \1f
 367 File: lispref.info,  Node: Syntax of Regexps,  Next: Regexp Example,  Up: Regular Expressions
 368
 369 Syntax of Regular Expressions
 370 -----------------------------
 371
 372    Regular expressions have a syntax in which a few characters are
 373 special constructs and the rest are "ordinary".  An ordinary character
 374 is a simple regular expression that matches that character and nothing
 375 else.  The special characters are `.', `*', `+', `?', `[', `]', `^',
 376 `$', and `\'; no new special characters will be defined in the future.
 377 Any other character appearing in a regular expression is ordinary,
 378 unless a `\' precedes it.
 379
 380    For example, `f' is not a special character, so it is ordinary, and
 381 therefore `f' is a regular expression that matches the string `f' and
 382 no other string.  (It does *not* match the string `ff'.)  Likewise, `o'
 383 is a regular expression that matches only `o'.
 384
 385    Any two regular expressions A and B can be concatenated.  The result
 386 is a regular expression that matches a string if A matches some amount
 387 of the beginning of that string and B matches the rest of the string.
 388
 389    As a simple example, we can concatenate the regular expressions `f'
 390 and `o' to get the regular expression `fo', which matches only the
 391 string `fo'.  Still trivial.  To do something more powerful, you need
 392 to use one of the special characters.  Here is a list of them:
 393
 394 `. (Period)'
 395      is a special character that matches any single character except a
 396      newline.  Using concatenation, we can make regular expressions
 397      like `a.b', which matches any three-character string that begins
 398      with `a' and ends with `b'.
 399
 400 `*'
 401      is not a construct by itself; it is a quantifying suffix operator
 402      that means to repeat the preceding regular expression as many
 403      times as possible.  In `fo*', the `*' applies to the `o', so `fo*'
 404      matches one `f' followed by any number of `o's.  The case of zero
 405      `o's is allowed: `fo*' does match `f'.
 406
 407      `*' always applies to the *smallest* possible preceding
 408      expression.  Thus, `fo*' has a repeating `o', not a repeating `fo'.
 409
 410      The matcher processes a `*' construct by matching, immediately, as
 411      many repetitions as can be found; it is "greedy".  Then it
 412      continues with the rest of the pattern.  If that fails,
 413      backtracking occurs, discarding some of the matches of the
 414      `*'-modified construct in case that makes it possible to match the
 415      rest of the pattern.  For example, in matching `ca*ar' against the
 416      string `caaar', the `a*' first tries to match all three `a's; but
 417      the rest of the pattern is `ar' and there is only `r' left to
 418      match, so this try fails.  The next alternative is for `a*' to
 419      match only two `a's.  With this choice, the rest of the regexp
 420      matches successfully.
 421
 422      Nested repetition operators can be extremely slow if they specify
 423      backtracking loops.  For example, it could take hours for the
 424      regular expression `\(x+y*\)*a' to match the sequence
 425      `xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxz'.  The slowness is because
 426      Emacs must try each imaginable way of grouping the 35 `x's before
 427      concluding that none of them can work.  To make sure your regular
 428      expressions run fast, check nested repetitions carefully.
 429
 430 `+'
 431      is a quantifying suffix operator similar to `*' except that the
 432      preceding expression must match at least once.  It is also
 433      "greedy".  So, for example, `ca+r' matches the strings `car' and
 434      `caaaar' but not the string `cr', whereas `ca*r' matches all three
 435      strings.
 436
 437 `?'
 438      is a quantifying suffix operator similar to `*', except that the
 439      preceding expression can match either once or not at all.  For
 440      example, `ca?r' matches `car' or `cr', but does not match anything
 441      else.
 442
 443 `*?'
 444      works just like `*', except that rather than matching the longest
 445      match, it matches the shortest match.  `*?' is known as a
 446      "non-greedy" quantifier, a regexp construct borrowed from Perl.
 447
 448      This construct is very useful when you want to match the text
 449      inside a pair of delimiters.  For instance, `/\*.*?\*/' will match
 450      C comments in a string.  This could not be achieved without the
 451      use of a non-greedy quantifier.
 452
 453      This construct was not available prior to XEmacs 20.4.  It is
 454      not available in FSF Emacs.
 455
 456 `+?'
 457      is the `+' analog to `*?'.
 458
 459 `\{n,m\}'
 460      serves as an interval quantifier, analogous to `*' or `+', but
 461      specifies that the expression must match at least N times and no
 462      more than M times.  This syntax is supported by most Unix regexp
 463      utilities, and was introduced to XEmacs in version 20.3.
 464
 465 `[ ... ]'
 466      `[' begins a "character set", which is terminated by a `]'.  In
 467      the simplest case, the characters between the two brackets form
 468      the set.  Thus, `[ad]' matches either one `a' or one `d', and
 469      `[ad]*' matches any string composed of just `a's and `d's
 470      (including the empty string), from which it follows that `c[ad]*r'
 471      matches `cr', `car', `cdr', `caddaar', etc.
 472
 473      The usual regular expression special characters are not special
 474      inside a character set.  A completely different set of special
 475      characters exists inside character sets: `]', `-' and `^'.
 476
 477      `-' is used for ranges of characters.  To write a range, write two
 478      characters with a `-' between them.  Thus, `[a-z]' matches any
 479      lower case letter.  Ranges may be intermixed freely with individual
 480      characters, as in `[a-z$%.]', which matches any lower case letter
 481      or `$', `%', or a period.
 482
 483      To include a `]' in a character set, make it the first character.
 484      For example, `[]a]' matches `]' or `a'.  To include a `-', write
 485      `-' as the first character in the set, or put it immediately after
 486      a range.  (You can replace one individual character C with the
 487      range `C-C' to make a place to put the `-'.)  There is no way to
 488      write a set containing just `-' and `]'.
 489
 490      To include `^' in a set, put it anywhere but at the beginning of
 491      the set.
 492
 493 `[^ ... ]'
 494      `[^' begins a "complement character set", which matches any
 495      character except the ones specified.  Thus, `[^a-z0-9A-Z]' matches
 496      all characters *except* letters and digits.
 497
 498      `^' is not special in a character set unless it is the first
 499      character.  The character following the `^' is treated as if it
 500      were first (thus, `-' and `]' are not special there).
 501
 502      Note that a complement character set can match a newline, unless
 503      newline is mentioned as one of the characters not to match.
 504
 505 `^'
 506      is a special character that matches the empty string, but only at
 507      the beginning of a line in the text being matched.  Otherwise it
 508      fails to match anything.  Thus, `^foo' matches a `foo' that occurs
 509      at the beginning of a line.
 510
 511      When matching a string instead of a buffer, `^' matches at the
 512      beginning of the string or after a newline character `\n'.
 513
 514 `$'
 515      is similar to `^' but matches only at the end of a line.  Thus,
 516      `x+$' matches a string of one `x' or more at the end of a line.
 517
 518      When matching a string instead of a buffer, `$' matches at the end
 519      of the string or before a newline character `\n'.
 520
 521 `\'
 522      has two functions: it quotes the special characters (including
 523      `\'), and it introduces additional special constructs.
 524
 525      Because `\' quotes special characters, `\$' is a regular
 526      expression that matches only `$', and `\[' is a regular expression
 527      that matches only `[', and so on.
 528
 529      Note that `\' also has special meaning in the read syntax of Lisp
 530      strings (*note String Type::.), and must be quoted with `\'.  For
 531      example, the regular expression that matches the `\' character is
 532      `\\'.  To write a Lisp string that contains the characters `\\',
 533      Lisp syntax requires you to quote each `\' with another `\'.
 534      Therefore, the read syntax for a regular expression matching `\'
 535      is `"\\\\"'.
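
   The special characters listed above can be tried out with
`string-match'.  The following is a minimal sketch rather than part of
XEmacs: `my-matched-text' is a hypothetical helper that returns the
text matched by a regexp, and the sample strings are chosen only for
illustration.

```elisp
(defun my-matched-text (regexp string)
  "Hypothetical helper: return the part of STRING matched by REGEXP, or nil."
  (if (string-match regexp string)
      (substring string (match-beginning 0) (match-end 0))))

(my-matched-text "fo*" "fooo!")            ; => "fooo"
(my-matched-text "fo*" "f!")               ; => "f"   (zero `o's allowed)
(my-matched-text "ca+r" "cr car")          ; => "car"
(my-matched-text "ca?r" "caar cr")         ; => "cr"
(my-matched-text "ca*ar" "caaar")          ; => "caaar" (after backtracking)
(my-matched-text "<.*>" "<a><b>")          ; => "<a><b>" (greedy)
(my-matched-text "<.*?>" "<a><b>")         ; => "<a>"    (non-greedy)
(my-matched-text "a\\{2,3\\}" "caaaar")    ; => "aaa"
(my-matched-text "c[ad]*r" "caddaar")      ; => "caddaar"
(my-matched-text "[]a]+" "x]a]y")          ; => "]a]"
(my-matched-text "[^a-z]+" "abc123def")    ; => "123"
(my-matched-text "^foo" "bar\nfoo")        ; => "foo"
(my-matched-text "x+$" "axx\nb")           ; => "xx"
;; The regexp `\\' is written "\\\\" in Lisp read syntax.
(my-matched-text "\\\\" "a\\b")            ; => "\\"
```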
 536
 537    *Please note:* For historical compatibility, special characters are
 538 treated as ordinary ones if they are in contexts where their special
 539 meanings make no sense.  For example, `*foo' treats `*' as ordinary
 540 since there is no preceding expression on which the `*' can act.  It is
 541 poor practice to depend on this behavior; quote the special character
 542 anyway, regardless of where it appears.
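
   As a brief illustration of this point (a sketch; the sample string is
an assumption), both of the following forms find the literal text
`*foo', but only the second states the intent explicitly:

```elisp
;; A leading `*' has nothing to repeat, so it is treated as ordinary.
(string-match "*foo" "a*foo")      ; => 1

;; Preferred: quote the special character regardless of context.
(string-match "\\*foo" "a*foo")    ; => 1
```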
 543
 544    For the most part, `\' followed by any character matches only that
 545 character.  However, there are several exceptions: characters that,
 546 when preceded by `\', are special constructs.  Such characters are
 547 always ordinary when encountered on their own.  Here is a table of `\'
 548 constructs:
 549
 550 `\|'
 551      specifies an alternative.  Two regular expressions A and B with
 552      `\|' in between form an expression that matches anything that
 553      either A or B matches.
 554
 555      Thus, `foo\|bar' matches either `foo' or `bar' but no other string.
 556
 557      `\|' applies to the largest possible surrounding expressions.
 558      Only a surrounding `\( ... \)' grouping can limit the grouping
 559      power of `\|'.
 560
 561      Full backtracking capability exists to handle multiple uses of
 562      `\|'.
 563
 564 `\( ... \)'
 565      is a grouping construct that serves three purposes:
 566
 567        1. To enclose a set of `\|' alternatives for other operations.
 568           Thus, `\(foo\|bar\)x' matches either `foox' or `barx'.
 569
 570        2. To enclose an expression for a suffix operator such as `*' to
 571           act on.  Thus, `ba\(na\)*' matches `bananana', etc., with any
 572           (zero or more) number of `na' strings.
 573
 574        3. To record a matched substring for future reference.
 575
 576      This last application is not a consequence of the idea of a
 577      parenthetical grouping; it is a separate feature that happens to be
 578      assigned as a second meaning to the same `\( ... \)' construct
 579      because there is no conflict in practice between the two meanings.
 580      Here is an explanation of this feature:
 581
 582 `\DIGIT'
 583      matches the same text that matched the DIGITth occurrence of a `\(
 584      ... \)' construct.
 585
 586      In other words, after the end of a `\( ... \)' construct, the
 587      matcher remembers the beginning and end of the text matched by that
 588      construct.  Then, later on in the regular expression, you can use
 589      `\' followed by DIGIT to match that same text, whatever it may
 590      have been.
 591
 592      The strings matching the first nine `\( ... \)' constructs
 593      appearing in a regular expression are assigned numbers 1 through 9
 594      in the order that the open parentheses appear in the regular
 595      expression.  So you can use `\1' through `\9' to refer to the text
 596      matched by the corresponding `\( ... \)' constructs.
 597
 598      For example, `\(.*\)\1' matches any newline-free string that is
 599      composed of two identical halves.  The `\(.*\)' matches the first
 600      half, which may be anything, but the `\1' that follows must match
 601      the same exact text.
 602
 603 `\(?: ... \)'
 604      is called a "shy" grouping operator, and it is used just like `\(
 605      ... \)', except that it does not cause the matched substring to be
 606      recorded for future reference.
 607
 608      This is useful when you need a lot of grouping `\( ... \)'
 609      constructs, but only want to remember one or two.  You can use shy
 610      groupings for those you do not need later with `match-string'.
 611
 612      Using `\(?: ... \)' rather than `\( ... \)' when you don't need
 613      the captured substrings ought to speed up your programs some,
 614      since it shortens the code path followed by the regular expression
 615      engine, as well as the amount of memory allocation and string
 616      copying it must do.  The actual performance gain to be observed
 617      has not been measured or quantified as of this writing.
 618
 619      The shy grouping operator was borrowed from Perl.  It was not
 620      available prior to XEmacs 20.3, nor is it available in FSF
 621      Emacs.
 622
 623 `\w'
 624      matches any word-constituent character.  The editor syntax table
 625      determines which characters these are.  *Note Syntax Tables::.
 626
 627 `\W'
 628      matches any character that is not a word constituent.
 629
 630 `\sCODE'
 631      matches any character whose syntax is CODE.  Here CODE is a
 632      character that represents a syntax code: thus, `w' for word
 633      constituent, `-' for whitespace, `(' for open parenthesis, etc.
 634      *Note Syntax Tables::, for a list of syntax codes and the
 635      characters that stand for them.
 636
 637 `\SCODE'
 638      matches any character whose syntax is not CODE.
 639
 640    The following regular expression constructs match the empty
 641 string--that is, they don't use up any characters--but whether they
 642 match depends on the context.
 643
 644 `\`'
 645      matches the empty string, but only at the beginning of the buffer
 646      or string being matched against.
 647
 648 `\''
 649      matches the empty string, but only at the end of the buffer or
 650      string being matched against.
 651
 652 `\='
 653      matches the empty string, but only at point.  (This construct is
 654      not defined when matching against a string.)
 655
 656 `\b'
 657      matches the empty string, but only at the beginning or end of a
 658      word.  Thus, `\bfoo\b' matches any occurrence of `foo' as a
 659      separate word.  `\bballs?\b' matches `ball' or `balls' as a
 660      separate word.
 661
 662 `\B'
 663      matches the empty string, but *not* at the beginning or end of a
 664      word.
 665
 666 `\<'
 667      matches the empty string, but only at the beginning of a word.
 668
 669 `\>'
 670      matches the empty string, but only at the end of a word.
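
   The context-dependent constructs above can be tried out on strings.
This is a minimal sketch; `example-find-word' is a hypothetical helper,
and the word boundaries assume the standard syntax table.

```elisp
;; Hypothetical helper: find WORD only where it stands as a whole word.
(defun example-find-word (word string)
  "Return the index of WORD as a separate word in STRING, or nil."
  (string-match (concat "\\<" (regexp-quote word) "\\>") string))

(example-find-word "ball" "football ball")  ; => 9
(example-find-word "ball" "balls")          ; => nil

;; `\b' matches at either end of a word.
(string-match "\\bfoo\\b" "a foo b")        ; => 2
(string-match "\\bfoo\\b" "foobar")         ; => nil

;; `\`' and `\'' anchor to the whole string, not to individual lines.
(string-match "\\`foo" "foo\nfoo")          ; => 0
(string-match "foo\\'" "foo\nfoo")          ; => 4
```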
 671
 672    Not every string is a valid regular expression.  For example, a
 673 string with unbalanced square brackets is invalid (with a few
 674 exceptions, such as `[]]'), and so is a string that ends with a single
 675 `\'.  If an invalid regular expression is passed to any of the search
 676 functions, an `invalid-regexp' error is signaled.
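
   A program that receives a regexp from the user may want to catch
this error rather than let it propagate.  The following is a sketch
only; `example-valid-regexp-p' is a hypothetical name.

```elisp
;; Hypothetical helper: try the regexp against the empty string and
;; report whether the `invalid-regexp' error was signaled.
(defun example-valid-regexp-p (regexp)
  "Return t if REGEXP is accepted by the search functions, else nil."
  (condition-case nil
      (progn (string-match regexp "") t)
    (invalid-regexp nil)))

(example-valid-regexp-p "foo\\")   ; => nil  (ends with a single `\')
(example-valid-regexp-p "[a-z")    ; => nil  (unbalanced bracket)
(example-valid-regexp-p "[]]")     ; => t    (one of the exceptions)
```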
 677
 678  - Function: regexp-quote STRING
 679      This function returns a regular expression string that matches
 680      exactly STRING and nothing else.  This allows you to request an
 681      exact string match when calling a function that wants a regular
 682      expression.
 683
 684           (regexp-quote "^The cat$")
 685                => "\\^The cat\\$"
 686
 687      One use of `regexp-quote' is to combine an exact string match with
 688      context described as a regular expression.  For example, this
 689      searches for the string that is the value of `string', surrounded
 690      by whitespace:
 691
 692           (re-search-forward
 693            (concat "\\s-" (regexp-quote string) "\\s-"))
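
   To see what the quoting buys you, compare an unquoted and a quoted
search for a string that contains a special character.  This is a small
sketch; `example-find-literal' is a hypothetical helper.

```elisp
;; Hypothetical helper: search for LITERAL exactly as written.
(defun example-find-literal (literal string)
  "Return the index of LITERAL in STRING, or nil."
  (string-match (regexp-quote literal) string))

(string-match "a.b" "axb")            ; => 0   (`.' matches any character)
(example-find-literal "a.b" "axb")    ; => nil (`.' must match literally)
(example-find-literal "a.b" "xa.b")   ; => 1
```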
 694
 695 \1f
 696 File: lispref.info,  Node: Regexp Example,  Prev: Syntax of Regexps,  Up: Regular Expressions
 697
 698 Complex Regexp Example
 699 ----------------------
 700
 701    Here is a complicated regexp, used by XEmacs to recognize the end of
 702 a sentence together with any whitespace that follows.  It is the value
 703 of the variable `sentence-end'.
 704
 705    First, we show the regexp as a string in Lisp syntax to distinguish
 706 spaces from tab characters.  The string constant begins and ends with a
 707 double-quote.  `\"' stands for a double-quote as part of the string,
 708 `\\' for a backslash as part of the string, `\t' for a tab and `\n' for
 709 a newline.
 710
 711      "[.?!][]\"')}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"
 712
 713    In contrast, if you evaluate the variable `sentence-end', you will
 714 see the following:
 715
 716      sentence-end
 717      =>
 718      "[.?!][]\"')}]*\\($\\| $\\|  \\|  \\)[
 719      ]*"
 720
 721 In this output, tab and newline appear as themselves.
 722
 723    This regular expression contains four parts in succession and can be
 724 deciphered as follows:
 725
 726 `[.?!]'
 727      The first part of the pattern is a character set that matches any
 728      one of three characters: period, question mark, and exclamation
 729      mark.  The match must begin with one of these three characters.
 730
 731 `[]\"')}]*'
 732      The second part of the pattern matches any closing braces and
 733      quotation marks, zero or more of them, that may follow the period,
 734      question mark or exclamation mark.  The `\"' is Lisp syntax for a
 735      double-quote in a string.  The `*' at the end indicates that the
 736      immediately preceding regular expression (a character set, in this
 737      case) may be repeated zero or more times.
 738
 739 `\\($\\| $\\|\t\\|  \\)'
 740      The third part of the pattern matches the whitespace that follows
 741      the end of a sentence: the end of a line (optionally preceded by a
 742      space), or a tab, or two spaces.  The double backslashes mark the
 743      parentheses and vertical bars as regular expression syntax; the
 744      parentheses delimit a group and the vertical bars separate
 745      alternatives.  The dollar sign is used to match the end of a line.
 746
 747 `[ \t\n]*'
 748      Finally, the last part of the pattern matches any additional
 749      whitespace beyond the minimum needed to end a sentence.
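
   To check this reading of the pattern, one can apply the same regexp
to a string.  The sketch below copies the regexp into a local variable
rather than relying on the current value of `sentence-end';
`example-first-sentence-end' is a hypothetical name.

```elisp
;; Hypothetical helper using the regexp shown above.
(defun example-first-sentence-end (text)
  "Return (START END) of the first sentence end in TEXT, or nil."
  (let ((regexp "[.?!][]\"')}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"))
    (and (string-match regexp text)
         (list (match-beginning 0) (match-end 0)))))

;; The match covers `!', the closing quote and the two spaces.
(example-first-sentence-end "He said \"Stop!\"  Then he left.")
;; => (13 17)
```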
 750
 751 \1f
 752 File: lispref.info,  Node: Regexp Search,  Next: POSIX Regexps,  Prev: Regular Expressions,  Up: Searching and Matching
 753
 754 Regular Expression Searching
 755 ============================
 756
 757    In XEmacs, you can search for the next match for a regexp either
 758 incrementally or not.  Incremental search commands are described in
 759 `The XEmacs Reference Manual'.  *Note Regular Expression Search:
 760 (emacs)Regexp Search.  Here we describe only the search functions
 761 useful in programs.  The principal one is `re-search-forward'.
 762
 763  - Command: re-search-forward REGEXP &optional LIMIT NOERROR REPEAT
 764      This function searches forward in the current buffer for a string
 765      of text that is matched by the regular expression REGEXP.  The
 766      function skips over any amount of text that is not matched by
 767      REGEXP, and leaves point at the end of the first match found.  It
 768      returns the new value of point.
 769
 770      If LIMIT is non-`nil' (it must be a position in the current
 771      buffer), then it is the upper bound to the search.  No match
 772      extending after that position is accepted.
 773
 774      What happens when the search fails depends on the value of
 775      NOERROR.  If NOERROR is `nil', a `search-failed' error is
 776      signaled.  If NOERROR is `t', `re-search-forward' does nothing and
 777      returns `nil'.  If NOERROR is neither `nil' nor `t', then
 778      `re-search-forward' moves point to LIMIT (or the end of the
 779      buffer) and returns `nil'.
 780
 781      If REPEAT is supplied (it must be a positive number), then the
 782      search is repeated that many times (each time starting at the end
 783      of the previous time's match).  If these successive searches
 784      succeed, the function succeeds, moving point and returning its new
 785      value.  Otherwise the search fails.
 786
 787      In the following example, point is initially before the `T'.
 788      Evaluating the search call moves point to the end of that line
 789      (between the `t' of `hat' and the newline).
 790
 791           ---------- Buffer: foo ----------
 792           I read "-!-The cat in the hat
 793           comes back" twice.
 794           ---------- Buffer: foo ----------
 795
 796           (re-search-forward "[a-z]+" nil t 5)
 797                => 27
 798
 799           ---------- Buffer: foo ----------
 800           I read "The cat in the hat-!-
 801           comes back" twice.
 802           ---------- Buffer: foo ----------
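
     The effect of NOERROR on a failed search can be sketched as
     follows.  `example-noerror-demo' is a hypothetical name, and the
     sketch assumes `with-temp-buffer' is available.

```elisp
;; Hypothetical demonstration: the buffer contains no digits, so both
;; searches fail; only the second one moves point.
(defun example-noerror-demo ()
  "Return (RESULT1 POINT1 RESULT2 POINT2) for two failing searches."
  (with-temp-buffer
    (insert "one two three")
    (goto-char (point-min))
    (let* ((r1 (re-search-forward "[0-9]+" nil t))      ; NOERROR is `t'
           (p1 (point))
           (r2 (re-search-forward "[0-9]+" nil 'move))  ; neither nil nor t
           (p2 (point)))
      (list r1 p1 r2 p2))))

(example-noerror-demo)   ; => (nil 1 nil 14)
```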
 803
 804  - Command: re-search-backward REGEXP &optional LIMIT NOERROR REPEAT
 805      This function searches backward in the current buffer for a string
 806      of text that is matched by the regular expression REGEXP, leaving
 807      point at the beginning of the first text found.
 808
 809      This function is analogous to `re-search-forward', but they are not
 810      simple mirror images.  `re-search-forward' finds the match whose
 811      beginning is as close as possible to the starting point.  If
 812      `re-search-backward' were a perfect mirror image, it would find the
 813      match whose end is as close as possible.  However, in fact it
 814      finds the match whose beginning is as close as possible.  The
 815      reason is that matching a regular expression at a given spot
 816      always works from beginning to end, and starts at a specified
 817      beginning position.
 818
 819      A true mirror-image of `re-search-forward' would require a special
 820      feature for matching regexps from end to beginning.  It's not
 821      worth the trouble of implementing that.
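
     The asymmetry shows up with a repetition operator.  In this
     sketch (`example-backward-demo' is a hypothetical name) a forward
     search from the start of the buffer would match all four `a's,
     whereas the backward search from the end matches only the last
     one, because that is the match whose beginning is closest to
     point.

```elisp
;; Hypothetical demonstration of the non-mirror-image behavior.
(defun example-backward-demo ()
  "Search backward for `a+' from the end of a buffer holding \"xaaaa\"."
  (with-temp-buffer
    (insert "xaaaa")                        ; point is now at the end
    (let ((pos (re-search-backward "a+")))
      (list pos (match-string 0)))))

(example-backward-demo)   ; => (5 "a")
```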
 822
 823  - Function: string-match REGEXP STRING &optional START
 824      This function returns the index of the start of the first match for
 825      the regular expression REGEXP in STRING, or `nil' if there is no
 826      match.  If START is non-`nil', the search starts at that index in
 827      STRING.
 828
 829      For example,
 830
 831           (string-match
 832            "quick" "The quick brown fox jumped quickly.")
 833                => 4
 834           (string-match
 835            "quick" "The quick brown fox jumped quickly." 8)
 836                => 27
 837
 838      The index of the first character of the string is 0, the index of
 839      the second character is 1, and so on.
 840
 841      After this function returns, the index of the first character
 842      beyond the match is available as `(match-end 0)'.  *Note Match
 843      Data::.
 844
 845           (string-match
 846            "quick" "The quick brown fox jumped quickly." 8)
 847                => 27
 848
 849           (match-end 0)
 850                => 32
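
     Combining START with `match-end' gives a simple way to visit
     every match in a string.  This is a sketch under the assumption
     that REGEXP never matches the empty string (otherwise the loop
     would not advance); `example-all-match-positions' is a
     hypothetical name.

```elisp
;; Hypothetical helper: collect the start index of each match.
(defun example-all-match-positions (regexp string)
  "Return a list of the indices where REGEXP matches in STRING."
  (let ((start 0)
        positions)
    (while (string-match regexp string start)
      (setq positions (cons (match-beginning 0) positions)
            start (match-end 0)))       ; resume after this match
    (nreverse positions)))

(example-all-match-positions "quick" "The quick brown fox jumped quickly.")
;; => (4 27)
```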
 851
 852  - Function: split-string STRING &optional PATTERN
 853      This function splits STRING into substrings delimited by PATTERN,
 854      and returns a list of substrings.  If PATTERN is omitted, it
 855      defaults to `[ \f\t\n\r\v]+', which means that it splits STRING by
 856      white-space.
 857
 858           (split-string "foo bar")
 859                => ("foo" "bar")
 860
 861           (split-string "something")
 862                => ("something")
 863
 864           (split-string "a:b:c" ":")
 865                => ("a" "b" "c")
 866
 867           (split-string ":a::b:c" ":")
 868                => ("" "a" "" "b" "c")
 869
 870  - Function: split-path PATH
 871      This function splits a search path into a list of strings.  The
 872      path components are separated with the characters specified with
 873      `path-separator'.  Under Unix, `path-separator' will normally be
 874      `:', while under Windows, it will be `;'.
 875
 876  - Function: looking-at REGEXP
 877      This function determines whether the text in the current buffer
 878      directly following point matches the regular expression REGEXP.
 879      "Directly following" means precisely that: the search is
 880      "anchored" and it can succeed only starting with the first
 881      character following point.  The result is `t' if so, `nil'
 882      otherwise.
 883
 884      This function does not move point, but it updates the match data,
 885      which you can access using `match-beginning' and `match-end'.
 886      *Note Match Data::.
 887
 888      In this example, point is located directly before the `T'.  If it
 889      were anywhere else, the result would be `nil'.
 890
 891           ---------- Buffer: foo ----------
 892           I read "-!-The cat in the hat
 893           comes back" twice.
 894           ---------- Buffer: foo ----------
 895
 896           (looking-at "The cat in the hat$")
 897                => t
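
     The following sketch shows that point stays put while the match
     data are updated.  `example-looking-at-demo' is a hypothetical
     name.

```elisp
;; Hypothetical demonstration: point remains at 1 after the match.
(defun example-looking-at-demo ()
  "Return (RESULT POINT GROUP-START GROUP-END) after `looking-at'."
  (with-temp-buffer
    (insert "The cat in the hat")
    (goto-char (point-min))
    (let ((found (looking-at "The \\(cat\\)")))
      (list found (point) (match-beginning 1) (match-end 1)))))

(example-looking-at-demo)   ; => (t 1 5 8)
```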
 898
 899 \1f
 900 File: lispref.info,  Node: POSIX Regexps,  Next: Search and Replace,  Prev: Regexp Search,  Up: Searching and Matching
 901
 902 POSIX Regular Expression Searching
 903 ==================================
 904
 905    The usual regular expression functions do backtracking when necessary
 906 to handle the `\|' and repetition constructs, but they continue this
 907 only until they find *some* match.  Then they succeed and report the
 908 first match found.
 909
 910    This section describes alternative search functions which perform the
 911 full backtracking specified by the POSIX standard for regular expression
 912 matching.  They continue backtracking until they have tried all
 913 possibilities and found all matches, so they can report the longest
 914 match, as required by POSIX.  This is much slower, so use these
 915 functions only when you really need the longest match.
 916
 917    In Emacs versions prior to 19.29, these functions did not exist, and
 918 the functions described above implemented full POSIX backtracking.
 919
 920  - Function: posix-search-forward REGEXP &optional LIMIT NOERROR REPEAT
 921      This is like `re-search-forward' except that it performs the full
 922      backtracking specified by the POSIX standard for regular expression
 923      matching.
 924
 925  - Function: posix-search-backward REGEXP &optional LIMIT NOERROR REPEAT
 926      This is like `re-search-backward' except that it performs the full
 927      backtracking specified by the POSIX standard for regular expression
 928      matching.
 929
 930  - Function: posix-looking-at REGEXP
 931      This is like `looking-at' except that it performs the full
 932      backtracking specified by the POSIX standard for regular expression
 933      matching.
 934
 935  - Function: posix-string-match REGEXP STRING &optional START
 936      This is like `string-match' except that it performs the full
 937      backtracking specified by the POSIX standard for regular expression
 938      matching.
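
   The difference between the two families shows up when an earlier
alternative is a prefix of a later one.  In this sketch
(`example-match-ends' is a hypothetical name) the usual function stops
at the first alternative that works, while the POSIX variant goes on
to find the longer match.

```elisp
;; Hypothetical comparison of first-match and longest-match behavior.
(defun example-match-ends (string)
  "Return the end of the match for `a\\|ab' under both strategies."
  (list (progn (string-match "a\\|ab" string) (match-end 0))
        (progn (posix-string-match "a\\|ab" string) (match-end 0))))

(example-match-ends "ab")   ; => (1 2)
```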
 939
 940 \1f
 941 File: lispref.info,  Node: Search and Replace,  Next: Match Data,  Prev: POSIX Regexps,  Up: Searching and Matching
 942
 943 Search and Replace
 944 ==================
 945
 946  - Function: perform-replace FROM-STRING REPLACEMENTS QUERY-FLAG
 947           REGEXP-FLAG DELIMITED-FLAG &optional REPEAT-COUNT MAP
 948      This function is the guts of `query-replace' and related commands.
 949      It searches for occurrences of FROM-STRING and replaces some or
 950      all of them.  If QUERY-FLAG is `nil', it replaces all occurrences;
 951      otherwise, it asks the user what to do about each one.
 952
 953      If REGEXP-FLAG is non-`nil', then FROM-STRING is considered a
 954      regular expression; otherwise, it must match literally.  If
 955      DELIMITED-FLAG is non-`nil', then only occurrences surrounded by
 956      word boundaries are considered.
 957
 958      The argument REPLACEMENTS specifies what to replace occurrences
 959      with.  If it is a string, that string is used.  It can also be a
 960      list of strings, to be used in cyclic order.
 961
 962      If REPEAT-COUNT is non-`nil', it should be an integer.  Then it
 963      specifies how many times to use each of the strings in the
 964      REPLACEMENTS list before advancing cyclically to the next one.
 965
 966      Normally, the keymap `query-replace-map' defines the possible user
 967      responses for queries.  The argument MAP, if non-`nil', is a
 968      keymap to use instead of `query-replace-map'.
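
     A non-interactive call might look like the sketch below, which
     replaces every whole-word occurrence without asking.
     `example-replace-whole-words' is a hypothetical name; the sketch
     assumes the search starts at point and that the default case
     settings are in effect.

```elisp
;; Hypothetical demonstration: QUERY-FLAG nil, REGEXP-FLAG nil,
;; DELIMITED-FLAG t, so `cat' inside `concat' is left alone.
(defun example-replace-whole-words ()
  "Replace the word `cat' with `dog' in a scratch buffer."
  (with-temp-buffer
    (insert "cat concat cat")
    (goto-char (point-min))
    (perform-replace "cat" "dog" nil nil t)
    (buffer-string)))

(example-replace-whole-words)   ; => "dog concat dog"
```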
 969
 970  - Variable: query-replace-map
 971      This variable holds a special keymap that defines the valid user
 972      responses for `query-replace' and related functions, as well as
 973      `y-or-n-p' and `map-y-or-n-p'.  It is unusual in two ways:
 974
 975         * The "key bindings" are not commands, just symbols that are
 976           meaningful to the functions that use this map.
 977
 978         * Prefix keys are not supported; each key binding must be for a
 979           single event key sequence.  This is because the functions
 980           don't use `read-key-sequence' to get the input; instead, they
 981           read a single event and look it up "by hand."
 982
 983    Here are the meaningful "bindings" for `query-replace-map'.  Several
 984 of them are meaningful only for `query-replace' and friends.
 985
 986 `act'
 987      Do take the action being considered--in other words, "yes."
 988
 989 `skip'
 990      Do not take action for this question--in other words, "no."
 991
 992 `exit'
 993      Answer this question "no," and give up on the entire series of
 994      questions, assuming that subsequent answers will be "no."
 995
 996 `act-and-exit'
 997      Answer this question "yes," and give up on the entire series of
 998      questions, assuming that subsequent answers will be "no."
 999
1000 `act-and-show'
1001      Answer this question "yes," but show the results--don't advance yet
1002      to the next question.
1003
1004 `automatic'
1005      Answer this question and all subsequent questions in the series
1006      with "yes," without further user interaction.
1007
1008 `backup'
1009      Move back to the previous place that a question was asked about.
1010
1011 `edit'
1012      Enter a recursive edit to deal with this question--instead of any
1013      other action that would normally be taken.
1014
1015 `delete-and-edit'
1016      Delete the text being considered, then enter a recursive edit to
1017      replace it.
1018
1019 `recenter'
1020      Redisplay and center the window, then ask the same question again.
1021
1022 `quit'
1023      Perform a quit right away.  Only `y-or-n-p' and related functions
1024      use this answer.
1025
1026 `help'
1027      Display some help, then ask again.
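
   Because the "bindings" are plain symbols, they can be inspected and
extended with the ordinary keymap functions.  The sketch below assumes
the usual default bindings (`y' for `act', `n' for `skip');
`example-replace-map' and its extra `j' binding are hypothetical.

```elisp
;; Look up what a key means in the default map.
(lookup-key query-replace-map "y")    ; => act
(lookup-key query-replace-map "n")    ; => skip

;; Hypothetical variant of the map with one more key meaning "yes".
;; Such a map could be passed as the MAP argument of `perform-replace'.
(defvar example-replace-map
  (let ((map (copy-keymap query-replace-map)))
    (define-key map "j" 'act)
    map)
  "Copy of `query-replace-map' in which `j' also means `act'.")

(lookup-key example-replace-map "j")  ; => act
```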
1028
1029 \1f
1030 File: lispref.info,  Node: Match Data,  Next: Searching and Case,  Prev: Search and Replace,  Up: Searching and Matching
1031
1032 The Match Data
1033 ==============
1034
1035    XEmacs keeps track of the positions of the start and end of segments
1036 of text found during a regular expression search.  This means, for
1037 example, that you can search for a complex pattern, such as a date in
1038 an Rmail message, and then extract parts of the match under control of
1039 the pattern.
1040
1041    Because the match data normally describe the most recent search only,
1042 you must be careful not to do another search inadvertently between the
1043 search you wish to refer back to and the use of the match data.  If you
1044 can't avoid another intervening search, you must save and restore the
1045 match data around it, to prevent it from being overwritten.
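
   A minimal sketch of that precaution, using `save-match-data' around
an unrelated search; `example-value-of' is a hypothetical name.

```elisp
;; Hypothetical helper: the inner `string-match' would otherwise
;; overwrite the match data needed by `match-string'.
(defun example-value-of (string)
  "Return the part after `=' in a KEY=VALUE string, or nil."
  (when (string-match "\\(\\w+\\)=\\(\\w+\\)" string)
    (save-match-data
      (string-match "x" "xyz"))         ; intervening search
    (match-string 2 string)))

(example-value-of "key=value")   ; => "value"
```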
1046
1047 * Menu:
1048
1049 * Simple Match Data::     Accessing single items of match data,
1050                             such as where a particular subexpression started.
1051 * Replacing Match::       Replacing a substring that was matched.
1052 * Entire Match Data::     Accessing the entire match data at once, as a list.
1053 * Saving Match Data::     Saving and restoring the match data.
1054
1055 \1f
1056 File: lispref.info,  Node: Simple Match Data,  Next: Replacing Match,  Up: Match Data
1057
1058 Simple Match Data Access
1059 ------------------------
1060
1061    This section explains how to use the match data to find out what was
1062 matched by the last search or match operation.
1063
1064    You can ask about the entire matching text, or about a particular
1065 parenthetical subexpression of a regular expression.  The COUNT
1066 argument in the functions below specifies which.  If COUNT is zero, you
1067 are asking about the entire match.  If COUNT is positive, it specifies
1068 which subexpression you want.
1069
1070    Recall that the subexpressions of a regular expression are those
1071 expressions grouped with escaped parentheses, `\(...\)'.  The COUNTth
1072 subexpression is found by counting occurrences of `\(' from the
1073 beginning of the whole regular expression.  The first subexpression is
1074 numbered 1, the second 2, and so on.  Only regular expressions can have
1075 subexpressions--after a simple string search, the only information
1076 available is about the entire match.
1077
1078  - Function: match-string COUNT &optional IN-STRING
1079      This function returns, as a string, the text matched in the last
1080      search or match operation.  It returns the entire text if COUNT is
1081      zero, or just the portion corresponding to the COUNTth
1082      parenthetical subexpression, if COUNT is positive.  If COUNT is
1083      out of range, or if that subexpression didn't match anything, the
1084      value is `nil'.
1085
1086      If the last such operation was done against a string with
1087      `string-match', then you should pass the same string as the
1088      argument IN-STRING.  Otherwise, after a buffer search or match,
1089      you should omit IN-STRING or pass `nil' for it; but you should
1090      make sure that the current buffer when you call `match-string' is
1091      the one in which you did the searching or matching.
1092
1093  - Function: match-beginning COUNT
1094      This function returns the position of the start of text matched by
1095      the last regular expression searched for, or a subexpression of it.
1096
1097      If COUNT is zero, then the value is the position of the start of
1098      the entire match.  Otherwise, COUNT specifies a subexpression in
1099      the regular expression, and the value of the function is the
1100      starting position of the match for that subexpression.
1101
1102      The value is `nil' for a subexpression inside a `\|' alternative
1103      that wasn't used in the match.
1104
1105  - Function: match-end COUNT
1106      This function is like `match-beginning' except that it returns the
1107      position of the end of the match, rather than the position of the
1108      beginning.
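
   The `nil' result for an unused alternative can be seen in this
sketch; `example-alternative-positions' is a hypothetical name.

```elisp
;; Hypothetical helper: only one of the two groups takes part in a match.
(defun example-alternative-positions (string)
  "Return the start positions of groups 1 and 2, or nil if no match."
  (and (string-match "\\(a\\)\\|\\(b\\)" string)
       (list (match-beginning 1) (match-beginning 2))))

(example-alternative-positions "b")   ; => (nil 0)
(example-alternative-positions "a")   ; => (0 nil)
```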
1109
1110    Here is an example of using the match data, with a comment showing
1111 the positions within the text:
1112
1113      (string-match "\\(qu\\)\\(ick\\)"
1114                    "The quick fox jumped quickly.")
1115                    ;0123456789
1116           => 4
1117
1118      (match-string 0 "The quick fox jumped quickly.")
1119           => "quick"
1120      (match-string 1 "The quick fox jumped quickly.")
1121           => "qu"
1122      (match-string 2 "The quick fox jumped quickly.")
1123           => "ick"
1124
1125      (match-beginning 1)       ; The beginning of the match
1126           => 4                 ;   with `qu' is at index 4.
1127
1128      (match-beginning 2)       ; The beginning of the match
1129           => 6                 ;   with `ick' is at index 6.
1130
1131      (match-end 1)             ; The end of the match
1132           => 6                 ;   with `qu' is at index 6.
1133
1134      (match-end 2)             ; The end of the match
1135           => 9                 ;   with `ick' is at index 9.
1136
1137    Here is another example.  Point is initially located at the beginning
1138 of the line.  Searching moves point to between the space and the word
1139 `in'.  The beginning of the entire match is at the 9th character of the
1140 buffer (`T'), and the beginning of the match for the first
1141 subexpression is at the 13th character (`c').
1142
1143      (list
1144        (re-search-forward "The \\(cat \\)")
1145        (match-beginning 0)
1146        (match-beginning 1))
1147          => (17 9 13)
1148
1149      ---------- Buffer: foo ----------
1150      I read "The cat -!-in the hat comes back" twice.
1151              ^   ^
1152              9  13
1153      ---------- Buffer: foo ----------
1154
1155 (In this case, the index returned is a buffer position; the first
1156 character of the buffer counts as 1.)
1157
1158 \1f
1159 File: lispref.info,  Node: Replacing Match,  Next: Entire Match Data,  Prev: Simple Match Data,  Up: Match Data
1160
1161 Replacing the Text That Matched
1162 -------------------------------
1163
1164    This function replaces the text matched by the last search with
1165 REPLACEMENT.
1166
1167  - Function: replace-match REPLACEMENT &optional FIXEDCASE LITERAL
1168           STRING
1169      This function replaces the text in the buffer (or in STRING) that
1170      was matched by the last search.  It replaces that text with
1171      REPLACEMENT.
1172
1173      If you did the last search in a buffer, you should specify `nil'
1174      for STRING.  Then `replace-match' does the replacement by editing
1175      the buffer; it leaves point at the end of the replacement text,
1176      and returns `t'.
1177
1178      If you did the search in a string, pass the same string as STRING.
1179      Then `replace-match' does the replacement by constructing and
1180      returning a new string.
1181
1182      If FIXEDCASE is non-`nil', then the case of the replacement text
1183      is not changed; otherwise, the replacement text is converted to a
1184      different case depending upon the capitalization of the text to be
1185      replaced.  If the original text is all upper case, the replacement
1186      text is converted to upper case.  If the first word of the
1187      original text is capitalized, then the first word of the
1188      replacement text is capitalized.  If the original text contains
1189      just one word, and that word is a capital letter, `replace-match'
1190      considers this a capitalized first word rather than all upper case.
1191
1192      If `case-replace' is `nil', then case conversion is not done,
1193      regardless of the value of FIXEDCASE.  *Note Searching and Case::.
1194
1195      If LITERAL is non-`nil', then REPLACEMENT is inserted exactly as
1196      it is, the only alterations being case changes as needed.  If it
1197      is `nil' (the default), then the character `\' is treated
1198      specially.  If a `\' appears in REPLACEMENT, then it must be part
1199      of one of the following sequences:
1200
1201     `\&'
1202           `\&' stands for the entire text being replaced.
1203
1204     `\N'
1205           `\N', where N is a digit, stands for the text that matched
1206           the Nth subexpression in the original regexp.  Subexpressions
1207           are those expressions grouped inside `\(...\)'.
1208
1209     `\\'
1210           `\\' stands for a single `\' in the replacement text.
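
     The sequences above can be sketched on a string match.  FIXEDCASE
     is `t' throughout so that no case conversion interferes;
     `example-swap-names' is a hypothetical name.

```elisp
;; Hypothetical helper: `\1' and `\2' refer to the two subexpressions.
(defun example-swap-names (name)
  "Turn \"First Last\" into \"Last, First\"; return NAME if no match."
  (if (string-match "\\(\\w+\\) \\(\\w+\\)" name)
      (replace-match "\\2, \\1" t nil name)
    name))

(example-swap-names "John Smith")    ; => "Smith, John"

;; `\&' stands for the whole match unless LITERAL is non-nil.
(let ((s "price"))
  (string-match "price" s)
  (list (replace-match "[\\&]" t nil s)    ; "[price]"
        (replace-match "[\\&]" t t s)))    ; "[\\&]"
```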
1211