   1 This is ../info/lispref.info, produced by makeinfo version 4.0 from
   2 lispref/lispref.texi.
   3
   4 INFO-DIR-SECTION XEmacs Editor
   5 START-INFO-DIR-ENTRY
   6 * Lispref: (lispref).           XEmacs Lisp Reference Manual.
   7 END-INFO-DIR-ENTRY
   8
   9    Edition History:
  10
   GNU Emacs Lisp Reference Manual Second Edition (v2.01), May 1993
   GNU Emacs Lisp Reference Manual Further Revised (v2.02), August 1993
   Lucid Emacs Lisp Reference Manual (for 19.10) First Edition, March 1994
   XEmacs Lisp Programmer's Manual (for 19.12) Second Edition, April 1995
   GNU Emacs Lisp Reference Manual v2.4, June 1995
   XEmacs Lisp Programmer's Manual (for 19.13) Third Edition, July 1995
   XEmacs Lisp Reference Manual (for 19.14 and 20.0) v3.1, March 1996
   XEmacs Lisp Reference Manual (for 19.15 and 20.1, 20.2, 20.3) v3.2,
   April, May, November 1997
   XEmacs Lisp Reference Manual (for 21.0) v3.3, April 1998
  20
  21    Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995 Free Software
  22 Foundation, Inc.  Copyright (C) 1994, 1995 Sun Microsystems, Inc.
  23 Copyright (C) 1995, 1996 Ben Wing.
  24
  25    Permission is granted to make and distribute verbatim copies of this
  26 manual provided the copyright notice and this permission notice are
  27 preserved on all copies.
  28
  29    Permission is granted to copy and distribute modified versions of
  30 this manual under the conditions for verbatim copying, provided that the
  31 entire resulting derived work is distributed under the terms of a
  32 permission notice identical to this one.
  33
  34    Permission is granted to copy and distribute translations of this
  35 manual into another language, under the above conditions for modified
  36 versions, except that this permission notice may be stated in a
  37 translation approved by the Foundation.
  38
  39    Permission is granted to copy and distribute modified versions of
  40 this manual under the conditions for verbatim copying, provided also
  41 that the section entitled "GNU General Public License" is included
  42 exactly as in the original, and provided that the entire resulting
  43 derived work is distributed under the terms of a permission notice
  44 identical to this one.
  45
  46    Permission is granted to copy and distribute translations of this
  47 manual into another language, under the above conditions for modified
  48 versions, except that the section entitled "GNU General Public License"
  49 may be included in a translation approved by the Free Software
  50 Foundation instead of in the original English.
  51
  52 \1f
  53 File: lispref.info,  Node: Regexp Example,  Prev: Syntax of Regexps,  Up: Regular Expressions
  54
  55 Complex Regexp Example
  56 ----------------------
  57
  58    Here is a complicated regexp, used by XEmacs to recognize the end of
  59 a sentence together with any whitespace that follows.  It is the value
  60 of the variable `sentence-end'.
  61
  62    First, we show the regexp as a string in Lisp syntax to distinguish
  63 spaces from tab characters.  The string constant begins and ends with a
  64 double-quote.  `\"' stands for a double-quote as part of the string,
  65 `\\' for a backslash as part of the string, `\t' for a tab and `\n' for
  66 a newline.
  67
  68      "[.?!][]\"')}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"
  69
  70    In contrast, if you evaluate the variable `sentence-end', you will
  71 see the following:
  72
  73      sentence-end
  74      =>
  75      "[.?!][]\"')}]*\\($\\| $\\|  \\|  \\)[
  76      ]*"
  77
  78 In this output, tab and newline appear as themselves.
  79
  80    This regular expression contains four parts in succession and can be
  81 deciphered as follows:
  82
  83 `[.?!]'
  84      The first part of the pattern is a character set that matches any
  85      one of three characters: period, question mark, and exclamation
  86      mark.  The match must begin with one of these three characters.
  87
  88 `[]\"')}]*'
  89      The second part of the pattern matches any closing braces and
  90      quotation marks, zero or more of them, that may follow the period,
  91      question mark or exclamation mark.  The `\"' is Lisp syntax for a
  92      double-quote in a string.  The `*' at the end indicates that the
  93      immediately preceding regular expression (a character set, in this
  94      case) may be repeated zero or more times.
  95
  96 `\\($\\| $\\|\t\\|  \\)'
     The third part of the pattern matches the whitespace that follows
     the end of a sentence: the end of a line (optionally preceded by a
     space), or a tab, or two spaces.
  99      The double backslashes mark the parentheses and vertical bars as
 100      regular expression syntax; the parentheses delimit a group and the
 101      vertical bars separate alternatives.  The dollar sign is used to
 102      match the end of a line.
 103
 104 `[ \t\n]*'
 105      Finally, the last part of the pattern matches any additional
 106      whitespace beyond the minimum needed to end a sentence.
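
   The four parts can be seen working together by matching the regexp
against a short string.  The following is only a sketch: the helper
name `my-sentence-end-span' is hypothetical, and the regexp is written
out locally instead of being read from `sentence-end', so that the
result does not depend on the user's settings.

```elisp
;; Hypothetical helper, not part of XEmacs.
(defun my-sentence-end-span (text)
  "Return (START END) of the first sentence end in TEXT, or nil."
  ;; Assumption: this is the same regexp shown above, written locally.
  (let ((re "[.?!][]\"')}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"))
    (if (string-match re text)
        (list (match-beginning 0) (match-end 0)))))

;; The match covers `!', the closing double-quote and the two spaces.
(my-sentence-end-span "He said \"Stop!\"  Then he left.")
     ;; => (13 17)

;; A period followed by a single space does not end a sentence.
(my-sentence-end-span "Hi. there")
     ;; => nil
```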
 107
 108 \1f
 109 File: lispref.info,  Node: Regexp Search,  Next: POSIX Regexps,  Prev: Regular Expressions,  Up: Searching and Matching
 110
 111 Regular Expression Searching
 112 ============================
 113
 114    In XEmacs, you can search for the next match for a regexp either
incrementally or not.  Incremental search commands are described in
`The XEmacs Reference Manual'.  *Note Regular Expression Search:
 117 (emacs)Regexp Search.  Here we describe only the search functions
 118 useful in programs.  The principal one is `re-search-forward'.
 119
 120  - Command: re-search-forward regexp &optional limit noerror repeat
 121      This function searches forward in the current buffer for a string
 122      of text that is matched by the regular expression REGEXP.  The
 123      function skips over any amount of text that is not matched by
 124      REGEXP, and leaves point at the end of the first match found.  It
 125      returns the new value of point.
 126
 127      If LIMIT is non-`nil' (it must be a position in the current
 128      buffer), then it is the upper bound to the search.  No match
 129      extending after that position is accepted.
 130
 131      What happens when the search fails depends on the value of
 132      NOERROR.  If NOERROR is `nil', a `search-failed' error is
 133      signaled.  If NOERROR is `t', `re-search-forward' does nothing and
 134      returns `nil'.  If NOERROR is neither `nil' nor `t', then
 135      `re-search-forward' moves point to LIMIT (or the end of the
 136      buffer) and returns `nil'.
 137
 138      If REPEAT is supplied (it must be a positive number), then the
 139      search is repeated that many times (each time starting at the end
 140      of the previous time's match).  If these successive searches
 141      succeed, the function succeeds, moving point and returning its new
 142      value.  Otherwise the search fails.
 143
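     In the following example, point is initially before the `T'.
     Because searches ignore case by default, `[a-z]+' also matches
     `The', so the five repetitions match the five words on the line.
     Evaluating the search call moves point to the end of that line
     (between the `t' of `hat' and the newline).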
 147
 148           ---------- Buffer: foo ----------
 149           I read "-!-The cat in the hat
 150           comes back" twice.
 151           ---------- Buffer: foo ----------
 152
 153           (re-search-forward "[a-z]+" nil t 5)
 154                => 27
 155
 156           ---------- Buffer: foo ----------
 157           I read "The cat in the hat-!-
 158           comes back" twice.
 159           ---------- Buffer: foo ----------
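
     In programs, `re-search-forward' is most often called in a loop
     with NOERROR set to `t', so that a failed search ends the loop
     instead of signaling an error.  The following is a minimal sketch
     of that idiom.  The name `my-count-matches' is hypothetical, and
     the sketch assumes that `with-temp-buffer' is available.

```elisp
;; Hypothetical helper, not part of XEmacs.
(defun my-count-matches (regexp text)
  "Count successive matches for REGEXP in TEXT.
REGEXP must not match the empty string, or the loop would not advance."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((case-fold-search t)   ; ignore case, as in the example above
          (count 0))
      ;; LIMIT is nil, so the search runs to the end of the buffer.
      ;; NOERROR is t, so a failed search returns nil and ends the loop.
      (while (re-search-forward regexp nil t)
        (setq count (1+ count)))
      count)))

(my-count-matches "[a-z]+" "The cat in the hat")
     ;; => 5
```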
 160
 161  - Command: re-search-backward regexp &optional limit noerror repeat
 162      This function searches backward in the current buffer for a string
 163      of text that is matched by the regular expression REGEXP, leaving
 164      point at the beginning of the first text found.
 165
 166      This function is analogous to `re-search-forward', but they are not
 167      simple mirror images.  `re-search-forward' finds the match whose
 168      beginning is as close as possible to the starting point.  If
 169      `re-search-backward' were a perfect mirror image, it would find the
 170      match whose end is as close as possible.  However, in fact it
 171      finds the match whose beginning is as close as possible.  The
 172      reason is that matching a regular expression at a given spot
 173      always works from beginning to end, and starts at a specified
 174      beginning position.
 175
 176      A true mirror-image of `re-search-forward' would require a special
 177      feature for matching regexps from end to beginning.  It's not
 178      worth the trouble of implementing that.
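
     The asymmetry can be observed with a repetition construct.  In
     this sketch (the helper name `my-backward-match-span' is
     hypothetical, and `with-temp-buffer' is assumed to be available),
     a backward search for `a+' from the end of `aaaa' matches only
     the last `a', because the match begins as close to point as
     possible and then runs forward.

```elisp
;; Hypothetical helper, not part of XEmacs.
(defun my-backward-match-span (regexp text)
  "Search backward for REGEXP from the end of TEXT; return (BEG END)."
  (with-temp-buffer
    (insert text)
    (goto-char (point-max))
    (if (re-search-backward regexp nil t)
        (list (match-beginning 0) (match-end 0)))))

(my-backward-match-span "a+" "aaaa")
     ;; => (4 5), only the last `a', not (1 5)
```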
 179
 180  - Function: string-match regexp string &optional start
 181      This function returns the index of the start of the first match for
 182      the regular expression REGEXP in STRING, or `nil' if there is no
 183      match.  If START is non-`nil', the search starts at that index in
 184      STRING.
 185
 186      For example,
 187
 188           (string-match
 189            "quick" "The quick brown fox jumped quickly.")
 190                => 4
 191           (string-match
 192            "quick" "The quick brown fox jumped quickly." 8)
 193                => 27
 194
 195      The index of the first character of the string is 0, the index of
 196      the second character is 1, and so on.
 197
 198      After this function returns, the index of the first character
 199      beyond the match is available as `(match-end 0)'.  *Note Match
 200      Data::.
 201
 202           (string-match
 203            "quick" "The quick brown fox jumped quickly." 8)
 204                => 27
 205
 206           (match-end 0)
 207                => 32
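
     Passing `(match-end 0)' back as START gives a simple way to visit
     every match in a string.  This is a sketch only; the name
     `my-match-positions' is hypothetical.

```elisp
;; Hypothetical helper, not part of XEmacs.
(defun my-match-positions (regexp string)
  "Return the start indices of all matches for REGEXP in STRING.
REGEXP must not match the empty string, or the loop would not advance."
  (let ((start 0)
        (positions nil))
    (while (string-match regexp string start)
      (setq positions (cons (match-beginning 0) positions))
      ;; Resume the search just past the match that was found.
      (setq start (match-end 0)))
    (nreverse positions)))

(my-match-positions "quick" "The quick brown fox jumped quickly.")
     ;; => (4 27)
```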
 208
 209  - Function: split-string string &optional pattern
     This function splits STRING into substrings delimited by PATTERN,
     and returns a list of the substrings.  If PATTERN is omitted, it
     defaults to `[ \f\t\n\r\v]+', which means that it splits STRING at
     whitespace.
 214
 215           (split-string "foo bar")
 216                => ("foo" "bar")
 217
 218           (split-string "something")
 219                => ("something")
 220
 221           (split-string "a:b:c" ":")
 222                => ("a" "b" "c")
 223
 224           (split-string ":a::b:c" ":")
 225                => ("" "a" "" "b" "c")
 226
 227  - Function: split-path path
 228      This function splits a search path into a list of strings.  The
 229      path components are separated with the characters specified with
 230      `path-separator'.  Under Unix, `path-separator' will normally be
 231      `:', while under Windows, it will be `;'.
 232
 233  - Function: looking-at regexp
 234      This function determines whether the text in the current buffer
 235      directly following point matches the regular expression REGEXP.
 236      "Directly following" means precisely that: the search is
 237      "anchored" and it can succeed only starting with the first
 238      character following point.  The result is `t' if so, `nil'
 239      otherwise.
 240
 241      This function does not move point, but it updates the match data,
 242      which you can access using `match-beginning' and `match-end'.
 243      *Note Match Data::.
 244
 245      In this example, point is located directly before the `T'.  If it
 246      were anywhere else, the result would be `nil'.
 247
 248           ---------- Buffer: foo ----------
 249           I read "-!-The cat in the hat
 250           comes back" twice.
 251           ---------- Buffer: foo ----------
 252
 253           (looking-at "The cat in the hat$")
 254                => t
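
     The following sketch shows both properties described above: the
     match is anchored at point, and point does not move while the
     match data are updated.  The helper name `my-looking-at-demo' is
     hypothetical, and `with-temp-buffer' is assumed to be available.

```elisp
;; Hypothetical helper, not part of XEmacs.
(defun my-looking-at-demo (regexp text)
  "Return (RESULT MATCH-END POINT) for `looking-at' at the start of TEXT."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((result (looking-at regexp)))
      ;; Only consult the match data when the match succeeded.
      (list result (if result (match-end 0)) (point)))))

(my-looking-at-demo "[ \t]*;" "  ;; comment")
     ;; => (t 4 1)

;; The semicolon is in the buffer, but not directly after point.
(my-looking-at-demo ";" "  ;; comment")
     ;; => (nil nil 1)
```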
 255
 256 \1f
 257 File: lispref.info,  Node: POSIX Regexps,  Next: Search and Replace,  Prev: Regexp Search,  Up: Searching and Matching
 258
 259 POSIX Regular Expression Searching
 260 ==================================
 261
 262    The usual regular expression functions do backtracking when necessary
 263 to handle the `\|' and repetition constructs, but they continue this
 264 only until they find _some_ match.  Then they succeed and report the
 265 first match found.
 266
 267    This section describes alternative search functions which perform the
 268 full backtracking specified by the POSIX standard for regular expression
 269 matching.  They continue backtracking until they have tried all
 270 possibilities and found all matches, so they can report the longest
 271 match, as required by POSIX.  This is much slower, so use these
 272 functions only when you really need the longest match.
 273
 274    In Emacs versions prior to 19.29, these functions did not exist, and
 275 the functions described above implemented full POSIX backtracking.
 276
 277  - Function: posix-search-forward regexp &optional limit noerror repeat
 278      This is like `re-search-forward' except that it performs the full
 279      backtracking specified by the POSIX standard for regular expression
 280      matching.
 281
 282  - Function: posix-search-backward regexp &optional limit noerror repeat
 283      This is like `re-search-backward' except that it performs the full
 284      backtracking specified by the POSIX standard for regular expression
 285      matching.
 286
 287  - Function: posix-looking-at regexp
 288      This is like `looking-at' except that it performs the full
 289      backtracking specified by the POSIX standard for regular expression
 290      matching.
 291
 292  - Function: posix-string-match regexp string &optional start
 293      This is like `string-match' except that it performs the full
 294      backtracking specified by the POSIX standard for regular expression
 295      matching.
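
   The difference shows up with alternatives where an earlier one
matches a prefix of a later one.  In this sketch (the helper name
`my-compare-match-ends' is hypothetical, and it assumes that REGEXP
matches STRING), the ordinary function stops at the first alternative
that succeeds, while the POSIX function goes on to find the longest
match.

```elisp
;; Hypothetical helper, not part of XEmacs.
;; Assumption: REGEXP matches STRING, so the match data are valid.
(defun my-compare-match-ends (regexp string)
  "Return (ORDINARY-END POSIX-END) for matching REGEXP against STRING."
  (list (progn (string-match regexp string) (match-end 0))
        (progn (posix-string-match regexp string) (match-end 0))))

;; `a' matches first; POSIX backtracking continues and prefers `ab'.
(my-compare-match-ends "a\\|ab" "abc")
     ;; => (1 2)
```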
 296
 297 \1f
 298 File: lispref.info,  Node: Search and Replace,  Next: Match Data,  Prev: POSIX Regexps,  Up: Searching and Matching
 299
 300 Search and Replace
 301 ==================
 302
 303  - Function: perform-replace from-string replacements query-flag
 304           regexp-flag delimited-flag &optional repeat-count map
 305      This function is the guts of `query-replace' and related commands.
 306      It searches for occurrences of FROM-STRING and replaces some or
 307      all of them.  If QUERY-FLAG is `nil', it replaces all occurrences;
 308      otherwise, it asks the user what to do about each one.
 309
 310      If REGEXP-FLAG is non-`nil', then FROM-STRING is considered a
 311      regular expression; otherwise, it must match literally.  If
     DELIMITED-FLAG is non-`nil', then only matches surrounded by
     word boundaries are replaced.
 314
 315      The argument REPLACEMENTS specifies what to replace occurrences
 316      with.  If it is a string, that string is used.  It can also be a
 317      list of strings, to be used in cyclic order.
 318
 319      If REPEAT-COUNT is non-`nil', it should be an integer.  Then it
 320      specifies how many times to use each of the strings in the
     REPLACEMENTS list before advancing cyclically to the next one.
 322
 323      Normally, the keymap `query-replace-map' defines the possible user
 324      responses for queries.  The argument MAP, if non-`nil', is a
 325      keymap to use instead of `query-replace-map'.
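
     A non-interactive call might look like the following sketch.  The
     helper name `my-replace-all' is hypothetical, `with-temp-buffer'
     is assumed to be available, and the sketch relies on the
     replacement starting from point, which is why point is first
     moved to the beginning of the buffer.

```elisp
;; Hypothetical helper, not part of XEmacs.
(defun my-replace-all (from to text)
  "Return TEXT with every literal occurrence of FROM replaced by TO."
  (with-temp-buffer
    (insert text)
    ;; Assumption: replacement proceeds from point to the end of the buffer.
    (goto-char (point-min))
    ;; QUERY-FLAG, REGEXP-FLAG and DELIMITED-FLAG are all nil:
    ;; replace every occurrence, literally, without asking.
    (perform-replace from to nil nil nil)
    (buffer-string)))

(my-replace-all "cat" "dog" "The cat in the hat saw a cat.")
     ;; => "The dog in the hat saw a dog."
```

     For most programs, a loop around `re-search-forward' and
     `replace-match' is the simpler choice; `perform-replace' is mainly
     of interest when the interactive query behavior is wanted.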
 326
 327  - Variable: query-replace-map
 328      This variable holds a special keymap that defines the valid user
 329      responses for `query-replace' and related functions, as well as
 330      `y-or-n-p' and `map-y-or-n-p'.  It is unusual in two ways:
 331
 332         * The "key bindings" are not commands, just symbols that are
 333           meaningful to the functions that use this map.
 334
 335         * Prefix keys are not supported; each key binding must be for a
 336           single event key sequence.  This is because the functions
          don't use `read-key-sequence' to get the input; instead, they
 338           read a single event and look it up "by hand."
 339
 340    Here are the meaningful "bindings" for `query-replace-map'.  Several
 341 of them are meaningful only for `query-replace' and friends.
 342
 343 `act'
 344      Do take the action being considered--in other words, "yes."
 345
 346 `skip'
 347      Do not take action for this question--in other words, "no."
 348
 349 `exit'
 350      Answer this question "no," and give up on the entire series of
     questions, assuming that subsequent answers will be "no."
 352
 353 `act-and-exit'
 354      Answer this question "yes," and give up on the entire series of
 355      questions, assuming that subsequent answers will be "no."
 356
 357 `act-and-show'
 358      Answer this question "yes," but show the results--don't advance yet
 359      to the next question.
 360
 361 `automatic'
 362      Answer this question and all subsequent questions in the series
 363      with "yes," without further user interaction.
 364
 365 `backup'
 366      Move back to the previous place that a question was asked about.
 367
 368 `edit'
 369      Enter a recursive edit to deal with this question--instead of any
 370      other action that would normally be taken.
 371
 372 `delete-and-edit'
 373      Delete the text being considered, then enter a recursive edit to
 374      replace it.
 375
 376 `recenter'
 377      Redisplay and center the window, then ask the same question again.
 378
 379 `quit'
 380      Perform a quit right away.  Only `y-or-n-p' and related functions
 381      use this answer.
 382
 383 `help'
 384      Display some help, then ask again.
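
   Since the "bindings" are plain symbols, a program can look one up
with `lookup-key' and compare the result against the symbols listed
above.  This is a sketch; the helper name `my-query-replace-answer' is
hypothetical, and the results shown assume the default map, in which
`y' is bound to `act' and `n' to `skip'.

```elisp
;; Hypothetical helper, not part of XEmacs.
(defun my-query-replace-answer (key)
  "Return the symbol that `query-replace-map' binds to KEY, a string."
  (lookup-key query-replace-map key))

;; Assumption: the default bindings have not been changed.
(my-query-replace-answer "y")
     ;; => act
(my-query-replace-answer "n")
     ;; => skip
```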
 385
 386 \1f
 387 File: lispref.info,  Node: Match Data,  Next: Searching and Case,  Prev: Search and Replace,  Up: Searching and Matching
 388
 389 The Match Data
 390 ==============
 391
 392    XEmacs keeps track of the positions of the start and end of segments
 393 of text found during a regular expression search.  This means, for
 394 example, that you can search for a complex pattern, such as a date in
 395 an Rmail message, and then extract parts of the match under control of
 396 the pattern.
 397
 398    Because the match data normally describe the most recent search only,
 399 you must be careful not to do another search inadvertently between the
 400 search you wish to refer back to and the use of the match data.  If you
 401 can't avoid another intervening search, you must save and restore the
 402 match data around it, to prevent it from being overwritten.
 403
 404 * Menu:
 405
 406 * Simple Match Data::     Accessing single items of match data,
 407                             such as where a particular subexpression started.
 408 * Replacing Match::       Replacing a substring that was matched.
 409 * Entire Match Data::     Accessing the entire match data at once, as a list.
 410 * Saving Match Data::     Saving and restoring the match data.
 411
 412 \1f
 413 File: lispref.info,  Node: Simple Match Data,  Next: Replacing Match,  Up: Match Data
 414
 415 Simple Match Data Access
 416 ------------------------
 417
 418    This section explains how to use the match data to find out what was
 419 matched by the last search or match operation.
 420
 421    You can ask about the entire matching text, or about a particular
 422 parenthetical subexpression of a regular expression.  The COUNT
 423 argument in the functions below specifies which.  If COUNT is zero, you
 424 are asking about the entire match.  If COUNT is positive, it specifies
 425 which subexpression you want.
 426
 427    Recall that the subexpressions of a regular expression are those
 428 expressions grouped with escaped parentheses, `\(...\)'.  The COUNTth
 429 subexpression is found by counting occurrences of `\(' from the
 430 beginning of the whole regular expression.  The first subexpression is
 431 numbered 1, the second 2, and so on.  Only regular expressions can have
 432 subexpressions--after a simple string search, the only information
 433 available is about the entire match.
 434
 435  - Function: match-string count &optional in-string
 436      This function returns, as a string, the text matched in the last
 437      search or match operation.  It returns the entire text if COUNT is
 438      zero, or just the portion corresponding to the COUNTth
 439      parenthetical subexpression, if COUNT is positive.  If COUNT is
 440      out of range, or if that subexpression didn't match anything, the
 441      value is `nil'.
 442
 443      If the last such operation was done against a string with
 444      `string-match', then you should pass the same string as the
 445      argument IN-STRING.  Otherwise, after a buffer search or match,
 446      you should omit IN-STRING or pass `nil' for it; but you should
 447      make sure that the current buffer when you call `match-string' is
 448      the one in which you did the searching or matching.
 449
 450  - Function: match-beginning count
 451      This function returns the position of the start of text matched by
 452      the last regular expression searched for, or a subexpression of it.
 453
 454      If COUNT is zero, then the value is the position of the start of
 455      the entire match.  Otherwise, COUNT specifies a subexpression in
 456      the regular expression, and the value of the function is the
 457      starting position of the match for that subexpression.
 458
 459      The value is `nil' for a subexpression inside a `\|' alternative
 460      that wasn't used in the match.
 461
 462  - Function: match-end count
 463      This function is like `match-beginning' except that it returns the
 464      position of the end of the match, rather than the position of the
 465      beginning.
 466
 467    Here is an example of using the match data, with a comment showing
 468 the positions within the text:
 469
 470      (string-match "\\(qu\\)\\(ick\\)"
 471                    "The quick fox jumped quickly.")
 472                    ;0123456789
 473           => 4
 474
 475      (match-string 0 "The quick fox jumped quickly.")
 476           => "quick"
 477      (match-string 1 "The quick fox jumped quickly.")
 478           => "qu"
 479      (match-string 2 "The quick fox jumped quickly.")
 480           => "ick"
 481
 482      (match-beginning 1)       ; The beginning of the match
 483           => 4                 ;   with `qu' is at index 4.
 484
 485      (match-beginning 2)       ; The beginning of the match
 486           => 6                 ;   with `ick' is at index 6.
 487
 488      (match-end 1)             ; The end of the match
 489           => 6                 ;   with `qu' is at index 6.
 490
 491      (match-end 2)             ; The end of the match
 492           => 9                 ;   with `ick' is at index 9.
 493
 494    Here is another example.  Point is initially located at the beginning
 495 of the line.  Searching moves point to between the space and the word
 496 `in'.  The beginning of the entire match is at the 9th character of the
 497 buffer (`T'), and the beginning of the match for the first
 498 subexpression is at the 13th character (`c').
 499
 500      (list
 501        (re-search-forward "The \\(cat \\)")
 502        (match-beginning 0)
 503        (match-beginning 1))
         => (17 9 13)
 505
 506      ---------- Buffer: foo ----------
 507      I read "The cat -!-in the hat comes back" twice.
 508              ^   ^
 509              9  13
 510      ---------- Buffer: foo ----------
 511
 512 (In this case, the index returned is a buffer position; the first
 513 character of the buffer counts as 1.)
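
   A typical use of subexpressions is to pull several fields out of
one match.  The following sketch extracts the parts of a date; the
helper name `my-parse-date' is hypothetical, and the date layout
`12 Mar 1996' is an assumption made for the example.

```elisp
;; Hypothetical helper, not part of XEmacs.
(defun my-parse-date (line)
  "Return (DAY MONTH YEAR) as strings from LINE, or nil if no date is found."
  (if (string-match "\\([0-9]+\\) \\([A-Za-z]+\\) \\([0-9]+\\)" line)
      ;; LINE is passed as IN-STRING because the match was made on a string.
      (list (match-string 1 line)
            (match-string 2 line)
            (match-string 3 line))))

(my-parse-date "Date: 12 Mar 1996")
     ;; => ("12" "Mar" "1996")
```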
 514
 515 \1f
 516 File: lispref.info,  Node: Replacing Match,  Next: Entire Match Data,  Prev: Simple Match Data,  Up: Match Data
 517
 518 Replacing the Text That Matched
 519 -------------------------------
 520
   The function `replace-match' replaces the text matched by the last
search with a replacement string.
 523
 524  - Function: replace-match replacement &optional fixedcase literal
 525           string
 526      This function replaces the text in the buffer (or in STRING) that
 527      was matched by the last search.  It replaces that text with
 528      REPLACEMENT.
 529
 530      If you did the last search in a buffer, you should specify `nil'
 531      for STRING.  Then `replace-match' does the replacement by editing
 532      the buffer; it leaves point at the end of the replacement text,
 533      and returns `t'.
 534
 535      If you did the search in a string, pass the same string as STRING.
 536      Then `replace-match' does the replacement by constructing and
 537      returning a new string.
 538
 539      If FIXEDCASE is non-`nil', then the case of the replacement text
 540      is not changed; otherwise, the replacement text is converted to a
 541      different case depending upon the capitalization of the text to be
 542      replaced.  If the original text is all upper case, the replacement
 543      text is converted to upper case.  If the first word of the
 544      original text is capitalized, then the first word of the
 545      replacement text is capitalized.  If the original text contains
 546      just one word, and that word is a capital letter, `replace-match'
 547      considers this a capitalized first word rather than all upper case.
 548
 549      If `case-replace' is `nil', then case conversion is not done,
     regardless of the value of FIXEDCASE.  *Note Searching and Case::.
 551
 552      If LITERAL is non-`nil', then REPLACEMENT is inserted exactly as
 553      it is, the only alterations being case changes as needed.  If it
 554      is `nil' (the default), then the character `\' is treated
 555      specially.  If a `\' appears in REPLACEMENT, then it must be part
 556      of one of the following sequences:
 557
 558     `\&'
 559           `\&' stands for the entire text being replaced.
 560
 561     `\N'
 562           `\N', where N is a digit, stands for the text that matched
 563           the Nth subexpression in the original regexp.  Subexpressions
 564           are those expressions grouped inside `\(...\)'.
 565
 566     `\\'
 567           `\\' stands for a single `\' in the replacement text.
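
   As an illustration of the `\N' sequences, the following sketch
reorders two words matched in a string.  The helper name
`my-swap-names' is hypothetical.  Note that each backslash is doubled
in Lisp string syntax.

```elisp
;; Hypothetical helper, not part of XEmacs.
(defun my-swap-names (name)
  "Turn \"First Last\" into \"Last, First\"; return NAME unchanged otherwise."
  (if (string-match "\\([A-Za-z]+\\) \\([A-Za-z]+\\)" name)
      ;; FIXEDCASE is t, so no case conversion is done.  LITERAL is
      ;; nil, so `\1' and `\2' are expanded.  NAME is passed as STRING
      ;; because the match was made on a string.
      (replace-match "\\2, \\1" t nil name)
    name))

(my-swap-names "John Smith")
     ;; => "Smith, John"
```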
 568
 569 \1f
 570 File: lispref.info,  Node: Entire Match Data,  Next: Saving Match Data,  Prev: Replacing Match,  Up: Match Data
 571
 572 Accessing the Entire Match Data
 573 -------------------------------
 574
 575    The functions `match-data' and `set-match-data' read or write the
 576 entire match data, all at once.
 577
 578  - Function: match-data
 579      This function returns a newly constructed list containing all the
 580      information on what text the last search matched.  Element zero is
 581      the position of the beginning of the match for the whole
 582      expression; element one is the position of the end of the match
 583      for the expression.  The next two elements are the positions of
 584      the beginning and end of the match for the first subexpression,
 585      and so on.  In general, element number 2N corresponds to
 586      `(match-beginning N)'; and element number 2N + 1 corresponds to
 587      `(match-end N)'.
 588
 589      All the elements are markers or `nil' if matching was done on a
 590      buffer, and all are integers or `nil' if matching was done on a
 591      string with `string-match'.  (In Emacs 18 and earlier versions,
 592      markers were used even for matching on a string, except in the case
 593      of the integer 0.)
 594
 595      As always, there must be no possibility of intervening searches
 596      between the call to a search function and the call to `match-data'
 597      that is intended to access the match data for that search.
 598
 599           (match-data)
 600                =>  (#<marker at 9 in foo>
 601                     #<marker at 17 in foo>
 602                     #<marker at 13 in foo>
 603                     #<marker at 17 in foo>)
 604
 605  - Function: set-match-data match-list
 606      This function sets the match data from the elements of MATCH-LIST,
 607      which should be a list that was the value of a previous call to
 608      `match-data'.
 609
 610      If MATCH-LIST refers to a buffer that doesn't exist, you don't get
 611      an error; that sets the match data in a meaningless but harmless
 612      way.
 613
 614      `store-match-data' is an alias for `set-match-data'.
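
   After matching on a string, the list contains integers, laid out as
described above.  A small sketch, with the hypothetical helper name
`my-string-match-data':

```elisp
;; Hypothetical helper, not part of XEmacs.
(defun my-string-match-data (regexp string)
  "Return the match data list after matching REGEXP against STRING."
  (if (string-match regexp string)
      (match-data)))

;; Elements 0 and 1: the whole match.  Elements 2 and 3: `qu'.
;; Elements 4 and 5: `ick'.
(my-string-match-data "\\(qu\\)\\(ick\\)" "The quick fox jumped quickly.")
     ;; => (4 9 4 6 6 9)
```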
 615
 616 \1f
 617 File: lispref.info,  Node: Saving Match Data,  Prev: Entire Match Data,  Up: Match Data
 618
 619 Saving and Restoring the Match Data
 620 -----------------------------------
 621
 622    When you call a function that may do a search, you may need to save
 623 and restore the match data around that call, if you want to preserve the
 624 match data from an earlier search for later use.  Here is an example
 625 that shows the problem that arises if you fail to save the match data:
 626
 627      (re-search-forward "The \\(cat \\)")
 628           => 48
 629      (foo)                   ; Perhaps `foo' does
 630                              ;   more searching.
 631      (match-end 0)
 632           => 61              ; Unexpected result--not 48!
 633
 634    You can save and restore the match data with `save-match-data':
 635
 636  - Macro: save-match-data body...
     This macro executes BODY, saving and restoring the match data
     around it.
 639
   You can use `set-match-data' together with `match-data' to imitate
the effect of the macro `save-match-data'.  This is useful for
writing code that can run in Emacs 18.  Here is how:
 643
 644      (let ((data (match-data)))
 645        (unwind-protect
 646            ...   ; May change the original match data.
 647          (set-match-data data)))
 648
 649    Emacs automatically saves and restores the match data when it runs
 650 process filter functions (*note Filter Functions::) and process
 651 sentinels (*note Sentinels::).
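
   The following sketch shows `save-match-data' protecting a caller's
match data from a search done by a subroutine.  Both function names
are hypothetical; `my-inner-search' stands in for any function that
searches as a side effect.

```elisp
;; Hypothetical helpers, not part of XEmacs.
(defun my-inner-search (string)
  "Search STRING for \"fox\" without disturbing the caller's match data."
  (save-match-data
    (string-match "fox" string)))

(defun my-outer-search (string)
  "Match \"quick\" in STRING, call `my-inner-search', return the match start."
  (if (string-match "quick" string)
      (progn
        (my-inner-search string)
        ;; Still describes the match for "quick".
        (match-beginning 0))))

(my-outer-search "The quick fox")
     ;; => 4, not 10
```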
 652
 653 \1f
 654 File: lispref.info,  Node: Searching and Case,  Next: Standard Regexps,  Prev: Match Data,  Up: Searching and Matching
 655
 656 Searching and Case
 657 ==================
 658
 659    By default, searches in Emacs ignore the case of the text they are
 660 searching through; if you specify searching for `FOO', then `Foo' or
 661 `foo' is also considered a match.  Regexps, and in particular character
 662 sets, are included: thus, `[aB]' would match `a' or `A' or `b' or `B'.
 663
 664    If you do not want this feature, set the variable `case-fold-search'
 665 to `nil'.  Then all letters must match exactly, including case.  This
 666 is a buffer-local variable; altering the variable affects only the
 667 current buffer.  (*Note Intro to Buffer-Local::.)  Alternatively, you
 668 may change the value of `default-case-fold-search', which is the
 669 default value of `case-fold-search' for buffers that do not override it.
 670
 671    Note that the user-level incremental search feature handles case
 672 distinctions differently.  When given a lower case letter, it looks for
 673 a match of either case, but when given an upper case letter, it looks
for an upper case letter only.  But this has nothing to do with the
search functions used in Lisp code.
 676
 677  - User Option: case-replace
 678      This variable determines whether the replacement functions should
 679      preserve case.  If the variable is `nil', that means to use the
 680      replacement text verbatim.  A non-`nil' value means to convert the
 681      case of the replacement text according to the text being replaced.
 682
 683      The function `replace-match' is where this variable actually has
 684      its effect.  *Note Replacing Match::.
 685
 686  - User Option: case-fold-search
 687      This buffer-local variable determines whether searches should
 688      ignore case.  If the variable is `nil' they do not ignore case;
 689      otherwise they do ignore case.
 690
 691  - Variable: default-case-fold-search
 692      The value of this variable is the default value for
 693      `case-fold-search' in buffers that do not override it.  This is the
 694      same as `(default-value 'case-fold-search)'.
 695
 696 \1f
 697 File: lispref.info,  Node: Standard Regexps,  Prev: Searching and Case,  Up: Searching and Matching
 698
 699 Standard Regular Expressions Used in Editing
 700 ============================================
 701
 702    This section describes some variables that hold regular expressions
 703 used for certain purposes in editing:
 704
 705  - Variable: page-delimiter
 706      This is the regexp describing line-beginnings that separate pages.
 707      The default value is `"^\014"' (i.e., `"^^L"' or `"^\C-l"'); this
 708      matches a line that starts with a formfeed character.
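
     As a small illustration, the following sketch counts the page
     breaks in a string by searching for the default value quoted
     above.  The function name `my-count-page-breaks' is hypothetical;
     real code would normally search for the value of `page-delimiter'
     itself rather than a copy of its default.

```elisp
(defun my-count-page-breaks (text)
  "Return the number of lines in TEXT that start with a formfeed."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((delimiter "^\014")           ; default value of `page-delimiter'
          (count 0))
      ;; Each match consumes the formfeed, so the loop always advances.
      (while (re-search-forward delimiter nil t)
        (setq count (1+ count)))
      count)))

(my-count-page-breaks "one\n\014two\n\014three")   ; => 2
;; A formfeed in the middle of a line does not separate pages.
(my-count-page-breaks "no\014break")               ; => 0
```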
 709
 710    The following two regular expressions should _not_ assume the match
 711 always starts at the beginning of a line; they should not use `^' to
 712 anchor the match.  Most often, the paragraph commands do check for a
 713 match only at the beginning of a line, which means that `^' would be
 714 superfluous.  When there is a nonzero left margin, they accept matches
 715 that start after the left margin.  In that case, a `^' would be
 716 incorrect.  However, a `^' is harmless in modes where a left margin is
 717 never used.
 718
 719  - Variable: paragraph-separate
 720      This is the regular expression for recognizing the beginning of a
 721      line that separates paragraphs.  (If you change this, you may have
 722      to change `paragraph-start' also.)  The default value is
 723      `"[ \t\f]*$"', which matches a line that consists entirely of
 724      spaces, tabs, and form feeds (after its left margin).
 725
 726  - Variable: paragraph-start
 727      This is the regular expression for recognizing the beginning of a
 728      line that starts _or_ separates paragraphs.  The default value is
 729      `"[ \t\n\f]"', which matches a line starting with a space, tab,
 730      newline, or form feed (after its left margin).
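
     To see how the two default values divide lines between them, here
     is a minimal sketch that classifies a single line.  It assumes
     there is no left margin, checks the separator regexp first, and
     uses the hypothetical name `my-classify-line'; the real paragraph
     commands are more involved.

```elisp
(defun my-classify-line (line)
  "Classify LINE as `separator', `start' or `body'.
Uses the default values of `paragraph-separate' and
`paragraph-start' quoted above, and assumes no left margin."
  (with-temp-buffer
    (insert line "\n")                  ; a buffer line ends in a newline
    (goto-char (point-min))
    (cond ((looking-at "[ \t\f]*$") 'separator)
          ((looking-at "[ \t\n\f]") 'start)
          (t 'body))))

(my-classify-line "")                      ; => separator
(my-classify-line "  \t")                  ; => separator
(my-classify-line "  Indented first line") ; => start
(my-classify-line "Plain text")            ; => body
```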
 731
 732  - Variable: sentence-end
 733      This is the regular expression describing the end of a sentence.
 734      (All paragraph boundaries also end sentences, regardless.)  The
 735      default value is:
 736
 737           "[.?!][]\"')}]*\\($\\| $\\|\t\\| \\)[ \t\n]*"
 738
 739      This means a period, question mark or exclamation mark, followed
 740      by any number of closing quote or parenthesis characters, followed
 741      by tabs, spaces or newlines.
 742
 743      For a detailed explanation of this regular expression, see *Note
 744      Regexp Example::.
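
     The following sketch applies the default value shown above to a
     short string and collects the positions just after each sentence
     end.  The names `my-sentence-end' and `my-sentence-end-positions'
     are hypothetical; the regexp is copied from this page rather than
     read from the variable.

```elisp
(defconst my-sentence-end
  "[.?!][]\"')}]*\\($\\| $\\|\t\\| \\)[ \t\n]*"
  "Copy of the default `sentence-end' value quoted in the manual.")

(defun my-sentence-end-positions (text)
  "Return the buffer positions following each sentence end in TEXT."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((ends '()))
      ;; The regexp always consumes a punctuation mark, so each
      ;; successful search moves point forward.
      (while (re-search-forward my-sentence-end nil t)
        (setq ends (cons (point) ends)))
      (nreverse ends))))

(my-sentence-end-positions "Is it so?  Yes (it is.)  Done.")
     ;; => (12 26 31)
```

     Note that the second match skips over the closing parenthesis
     after the period before consuming the spaces.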
 745
 746 \1f
 747 File: lispref.info,  Node: Syntax Tables,  Next: Abbrevs,  Prev: Searching and Matching,  Up: Top
 748
 749 Syntax Tables
 750 *************
 751
 752    A "syntax table" specifies the syntactic textual function of each
 753 character.  This information is used by the parsing commands, the
 754 complex movement commands, and others to determine where words, symbols,
 755 and other syntactic constructs begin and end.  The current syntax table
 756 controls the meaning of the word motion functions (*note Word Motion::)
 757 and the list motion functions (*note List Motion::) as well as the
 758 functions in this chapter.
 759
 760 * Menu:
 761
 762 * Basics: Syntax Basics.     Basic concepts of syntax tables.
 763 * Desc: Syntax Descriptors.  How characters are classified.
 764 * Syntax Table Functions::   How to create, examine and alter syntax tables.
 765 * Motion and Syntax::        Moving over characters with certain syntaxes.
 766 * Parsing Expressions::      Parsing balanced expressions
 767                                 using the syntax table.
 768 * Standard Syntax Tables::   Syntax tables used by various major modes.
 769 * Syntax Table Internals::   How syntax table information is stored.
 770
 771 \1f
 772 File: lispref.info,  Node: Syntax Basics,  Next: Syntax Descriptors,  Up: Syntax Tables
 773
 774 Syntax Table Concepts
 775 =====================
 776
 777    A "syntax table" provides Emacs with the information that determines
 778 the syntactic use of each character in a buffer.  This information is
 779 used by the parsing commands, the complex movement commands, and others
 780 to determine where words, symbols, and other syntactic constructs begin
 781 and end.  The current syntax table controls the meaning of the word
 782 motion functions (*note Word Motion::) and the list motion functions
 783 (*note List Motion::) as well as the functions in this chapter.
 784
 785    Under XEmacs 20, a syntax table is a particular subtype of the
 786 primitive char table type (*note Char Tables::), and each element of the
 787 char table is an integer that encodes the syntax of the character in
 788 question, or a cons of such an integer and a matching character (for
 789 characters with parenthesis syntax).
 790
 791    Under XEmacs 19, a syntax table is a vector of 256 elements; it
 792 contains one entry for each of the 256 possible characters in an 8-bit
 793 byte.  Each element is an integer that encodes the syntax of the
 794 character in question.  (The matching character, if any, is embedded in
 795 the bits of this integer.)
 796
 797    Syntax tables are used only for moving across text, not for the Emacs
 798 Lisp reader.  XEmacs Lisp uses built-in syntactic rules when reading
 799 Lisp expressions, and these rules cannot be changed.
 800
 801    Each buffer has its own major mode, and each major mode has its own
 802 idea of the syntactic class of various characters.  For example, in Lisp
 803 mode, the character `;' begins a comment, but in C mode, it terminates
 804 a statement.  To support these variations, XEmacs makes the choice of
 805 syntax table local to each buffer.  Typically, each major mode has its
 806 own syntax table and installs that table in each buffer that uses that
 807 mode.  Changing this table alters the syntax in all those buffers as
 808 well as in any buffers subsequently put in that mode.  Occasionally
 809 several similar modes share one syntax table.  *Note Example Major
 810 Modes::, for an example of how to set up a syntax table.
 811
 812    A syntax table can inherit the data for some characters from the
 813 standard syntax table, while specifying other characters itself.  The
 814 "inherit" syntax class means "inherit this character's syntax from the
 815 standard syntax table."  Most major modes' syntax tables inherit the
 816 syntax of character codes 0 through 31 and 128 through 255.  This is
 817 useful with character sets such as ISO Latin-1 that have additional
 818 alphabetic characters in the range 128 to 255.  Just changing the
 819 standard syntax for these characters affects all major modes.
 820
 821  - Function: syntax-table-p object
 822      This function returns `t' if OBJECT may be a syntax table.  Under
 823      XEmacs 19, any vector of length 256 passes this test, no matter
 824      what its contents.  Under XEmacs 20, where a syntax table is a
 825      subtype of char table (*note Char Tables::), OBJECT must be one.
 826
 827 \1f
 828 File: lispref.info,  Node: Syntax Descriptors,  Next: Syntax Table Functions,  Prev: Syntax Basics,  Up: Syntax Tables
 829
 830 Syntax Descriptors
 831 ==================
 832
 833    This section describes the syntax classes and flags that denote the
 834 syntax of a character, and how they are represented as a "syntax
 835 descriptor", which is a Lisp string that you pass to
 836 `modify-syntax-entry' to specify the desired syntax.
 837
 838    XEmacs defines a number of "syntax classes".  Each syntax table puts
 839 each character into one class.  There is no necessary relationship
 840 between the class of a character in one syntax table and its class in
 841 any other table.
 842
 843    Each class is designated by a mnemonic character, which serves as the
 844 name of the class when you need to specify a class.  Usually the
 845 designator character is one that is frequently in that class; however,
 846 its meaning as a designator is unvarying and independent of what syntax
 847 that character currently has.
 848
 849    A syntax descriptor is a Lisp string that specifies a syntax class, a
 850 matching character (used only for the parenthesis classes) and flags.
 851 The first character is the designator for a syntax class.  The second
 852 character is the character to match; if it is unused, put a space there.
 853 Then come the characters for any desired flags.  If no matching
 854 character or flags are needed, one character is sufficient.
 855
 856    For example, the descriptor for the character `*' in C mode is
 857 `. 23' (i.e., punctuation, matching character slot unused, second
 858 character of a comment-starter, first character of a comment-ender),
 859 and the entry for `/' is `. 14' (i.e., punctuation, matching character
 860 slot unused, first character of a comment-starter, second character of
 861 a comment-ender).
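
   The layout of a descriptor can be made concrete with a small helper
that takes one apart.  This is only an illustrative sketch; the function
`my-split-syntax-descriptor' is hypothetical and is not how XEmacs
itself parses descriptors.

```elisp
(defun my-split-syntax-descriptor (descriptor)
  "Split DESCRIPTOR into a list (CLASS MATCH FLAGS).
CLASS is the class designator character, MATCH is the matching
character or nil if that slot is unused, and FLAGS is a string."
  (let ((len (length descriptor))
        (space (string-to-char " ")))
    (list (aref descriptor 0)
          (and (> len 1)
               (not (eq (aref descriptor 1) space))
               (aref descriptor 1))
          (if (> len 2) (substring descriptor 2) ""))))

;; `*' in C mode: punctuation, no matching character, flags "23".
(my-split-syntax-descriptor ". 23")
;; An open parenthesis matched by `^', with no flags.
(my-split-syntax-descriptor "(^")
;; One character is sufficient when nothing else is needed.
(my-split-syntax-descriptor "w")
```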
 862
 863 * Menu:
 864
 865 * Syntax Class Table::      Table of syntax classes.
 866 * Syntax Flags::            Additional flags each character can have.
 867
 868 \1f
 869 File: lispref.info,  Node: Syntax Class Table,  Next: Syntax Flags,  Up: Syntax Descriptors
 870
 871 Table of Syntax Classes
 872 -----------------------
 873
 874    Here is a table of syntax classes, the characters that stand for
 875 them, their meanings, and examples of their use.
 876
 877  - Syntax class: whitespace character
 878      "Whitespace characters" (designated with ` ' or `-') separate
 879      symbols and words from each other.  Typically, whitespace
 880      characters have no other syntactic significance, and multiple
 881      whitespace characters are syntactically equivalent to a single
 882      one.  Space, tab, newline and formfeed are almost always
 883      classified as whitespace.
 884
 885  - Syntax class: word constituent
 886      "Word constituents" (designated with `w') are parts of normal
 887      English words and are typically used in variable and command names
 888      in programs.  All upper- and lower-case letters, and the digits,
 889      are typically word constituents.
 890
 891  - Syntax class: symbol constituent
 892      "Symbol constituents" (designated with `_') are the extra
 893      characters that are used in variable and command names along with
 894      word constituents.  For example, the symbol constituents class is
 895      used in Lisp mode to indicate that certain characters may be part
 896      of symbol names even though they are not part of English words.
 897      These characters are `$&*+-_<>'.  In standard C, the only
 898      non-word-constituent character that is valid in symbols is
 899      underscore (`_').
 900
 901  - Syntax class: punctuation character
 902      "Punctuation characters" (`.') are those characters that are used
 903      as punctuation in English, or are used in some way in a programming
 904      language to separate symbols from one another.  Most programming
 905      language modes, including Emacs Lisp mode, have no characters in
 906      this class since the few characters that are not symbol or word
 907      constituents all have other uses.
 908
 909  - Syntax class: open parenthesis character
 910  - Syntax class: close parenthesis character
 911      Open and close "parenthesis characters" are characters used in
 912      dissimilar pairs to surround sentences or expressions.  Such a
 913      grouping is begun with an open parenthesis character and
 914      terminated with a close.  Each open parenthesis character matches
 915      a particular close parenthesis character, and vice versa.
 916      Normally, XEmacs indicates momentarily the matching open
 917      parenthesis when you insert a close parenthesis.  *Note Blinking::.
 918
 919      The class of open parentheses is designated with `(', and that of
 920      close parentheses with `)'.
 921
 922      In English text, and in C code, the parenthesis pairs are `()',
 923      `[]', and `{}'.  In XEmacs Lisp, the delimiters for lists and
 924      vectors (`()' and `[]') are classified as parenthesis characters.
 925
 926  - Syntax class: string quote
 927      "String quote characters" (designated with `"') are used in many
 928      languages, including Lisp and C, to delimit string constants.  The
 929      same string quote character appears at the beginning and the end
 930      of a string.  Such quoted strings do not nest.
 931
 932      The parsing facilities of XEmacs consider a string as a single
 933      token.  The usual syntactic meanings of the characters in the
 934      string are suppressed.
 935
 936      The Lisp modes have two string quote characters: double-quote (`"')
 937      and vertical bar (`|').  `|' is not used in XEmacs Lisp, but it is
 938      used in Common Lisp.  C also has two string quote characters:
 939      double-quote for strings, and single-quote (`'') for character
 940      constants.
 941
 942      English text has no string quote characters because English is not
 943      a programming language.  Although quotation marks are used in
 944      English, we do not want them to turn off the usual syntactic
 945      properties of other characters in the quotation.
 946
 947  - Syntax class: escape
 948      An "escape character" (designated with `\') starts an escape
 949      sequence such as is used in C string and character constants.  The
 950      character `\' belongs to this class in both C and Lisp.  (In C, it
 951      is used thus only inside strings, but it turns out to cause no
 952      trouble to treat it this way throughout C code.)
 953
 954      Characters in this class count as part of words if
 955      `words-include-escapes' is non-`nil'.  *Note Word Motion::.
 956
 957  - Syntax class: character quote
 958      A "character quote character" (designated with `/') quotes the
 959      following character so that it loses its normal syntactic meaning.
 960      This differs from an escape character in that only the character
 961      immediately following is ever affected.
 962
 963      Characters in this class count as part of words if
 964      `words-include-escapes' is non-`nil'.  *Note Word Motion::.
 965
 966      This class is used for backslash in TeX mode.
 967
 968  - Syntax class: paired delimiter
 969      "Paired delimiter characters" (designated with `$') are like
 970      string quote characters except that the syntactic properties of the
 971      characters between the delimiters are not suppressed.  Only TeX
 972      mode uses a paired delimiter presently--the `$' that both enters
 973      and leaves math mode.
 974
 975  - Syntax class: expression prefix
 976      An "expression prefix operator" (designated with `'') is used for
 977      syntactic operators that are part of an expression if they appear
 978      next to one.  These characters in Lisp include the apostrophe, `''
 979      (used for quoting), the comma, `,' (used in macros), and `#' (used
 980      in the read syntax for certain data types).
 981
 982  - Syntax class: comment starter
 983  - Syntax class: comment ender
 984      The "comment starter" and "comment ender" characters are used in
 985      various languages to delimit comments.  These classes are
 986      designated with `<' and `>', respectively.
 987
 988      English text has no comment characters.  In Lisp, the semicolon
 989      (`;') starts a comment and a newline or formfeed ends one.
 990
 991  - Syntax class: inherit
 992      This syntax class does not specify a syntax.  It says to look in
 993      the standard syntax table to find the syntax of this character.
 994      The designator for this syntax code is `@'.
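
   As a sketch of how the designators in this table relate to what
`char-syntax' returns, the following maps a character to the name of
its class in the current syntax table.  The names
`my-syntax-designators', `my-syntax-class-names' and
`my-syntax-class-name' are hypothetical helpers for illustration.

```elisp
(defconst my-syntax-designators " w_.()\"\\/$'<>@"
  "Class designator characters, in the order of the table above.")

(defconst my-syntax-class-names
  '("whitespace" "word constituent" "symbol constituent"
    "punctuation" "open parenthesis" "close parenthesis"
    "string quote" "escape" "character quote" "paired delimiter"
    "expression prefix" "comment starter" "comment ender" "inherit")
  "Class names, parallel to `my-syntax-designators'.")

(defun my-syntax-class-name (char)
  "Return the name of CHAR's syntax class in the current syntax table."
  (let ((class (char-syntax char))
        (i 0)
        (names my-syntax-class-names)
        (found nil))
    (while (and names (not found))
      (if (eq (aref my-syntax-designators i) class)
          (setq found (car names)))
      (setq i (1+ i)
            names (cdr names)))
    found))

;; With the standard syntax table current:
(with-temp-buffer
  (set-syntax-table (standard-syntax-table))
  (list (my-syntax-class-name ?a)
        (my-syntax-class-name ?\()))
```

   Because `char-syntax' consults the current buffer's table, the same
character can yield a different name in a buffer with another major
mode.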
 995
 996 \1f
 997 File: lispref.info,  Node: Syntax Flags,  Prev: Syntax Class Table,  Up: Syntax Descriptors
 998
 999 Syntax Flags
1000 ------------
1001
1002    In addition to the classes, entries for characters in a syntax table
1003 can include flags.  There are six possible flags, represented by the
1004 characters `1', `2', `3', `4', `b' and `p'.
1005
1006    All the flags except `p' are used to describe multi-character
1007 comment delimiters.  The digit flags indicate that a character can
1008 _also_ be part of a comment sequence, in addition to the syntactic
1009 properties associated with its character class.  The flags are
1010 independent of the class and each other for the sake of characters such
1011 as `*' in C mode, which is a punctuation character, _and_ the second
1012 character of a start-of-comment sequence (`/*'), _and_ the first
1013 character of an end-of-comment sequence (`*/').
1014
1015    The flags for a character C are:
1016
1017    * `1' means C is the start of a two-character comment-start sequence.
1018
1019    * `2' means C is the second character of such a sequence.
1020
1021    * `3' means C is the start of a two-character comment-end sequence.
1022
1023    * `4' means C is the second character of such a sequence.
1024
1025    * `b' means that C as a comment delimiter belongs to the alternative
1026      "b" comment style.
1027
1028      Emacs supports two comment styles simultaneously in any one syntax
1029      table.  This is for the sake of C++.  Each style of comment syntax
1030      has its own comment-start sequence and its own comment-end
1031      sequence.  Each comment must stick to one style or the other;
1032      thus, if it starts with the comment-start sequence of style "b",
1033      it must also end with the comment-end sequence of style "b".
1034
1035      The two comment-start sequences must begin with the same
1036      character; only the second character may differ.  Mark the second
1037      character of the "b"-style comment-start sequence with the `b'
1038      flag.
1039
1040      A comment-end sequence (one or two characters) applies to the "b"
1041      style if its first character has the `b' flag set; otherwise, it
1042      applies to the "a" style.
1043
1044      The appropriate comment syntax settings for C++ are as follows:
1045
1046     `/'
1047           `124b'
1048
1049     `*'
1050           `23'
1051
1052     newline
1053           `>b'
1054
1055      This defines four comment-delimiting sequences:
1056
1057     `/*'
1058           This is a comment-start sequence for "a" style because the
1059           second character, `*', does not have the `b' flag.
1060
1061     `//'
1062           This is a comment-start sequence for "b" style because the
1063           second character, `/', does have the `b' flag.
1064
1065     `*/'
1066           This is a comment-end sequence for "a" style because the first
1067           character, `*', does not have the `b' flag.
1068
1069     newline
1070           This is a comment-end sequence for "b" style, because the
1071           newline character has the `b' flag.
1072
1073    * `p' identifies an additional "prefix character" for Lisp syntax.
1074      These characters are treated as whitespace when they appear between
1075      expressions.  When they appear within an expression, they are
1076      handled according to their usual syntax codes.
1077
1078      The function `backward-prefix-chars' moves back over these
1079      characters, as well as over characters whose primary syntax class
1080      is prefix (`'').  *Note Motion and Syntax::.
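
   The C++ settings listed above can be tried out on a fresh syntax
table.  The sketch below assumes `/' and `*' are given punctuation
class and uses `parse-partial-sexp' (*note Parsing Expressions::) to ask
whether the end of a string lies inside a comment.  The function names
are hypothetical.

```elisp
(defun my-c++-comment-table ()
  "Return a new syntax table with the C++ comment flags set."
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?/ ". 124b" table)
    (modify-syntax-entry ?* ". 23" table)
    (modify-syntax-entry ?\n "> b" table)
    table))

(defun my-in-comment-p (text)
  "Return non-nil if the end of TEXT is inside a comment."
  (with-temp-buffer
    (set-syntax-table (my-c++-comment-table))
    (insert text)
    ;; Element 4 of the parse state is non-nil inside a comment.
    (nth 4 (parse-partial-sexp (point-min) (point-max)))))

(my-in-comment-p "x = 1; // note")      ; non-nil: "b" style is open
(my-in-comment-p "x = 1; // note\ny")   ; nil: newline ended it
(my-in-comment-p "/* a\nb")             ; non-nil: newline is "b" only
(my-in-comment-p "/* a */ b")           ; nil: `*/' ended "a" style
```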
1081
1082 \1f
1083 File: lispref.info,  Node: Syntax Table Functions,  Next: Motion and Syntax,  Prev: Syntax Descriptors,  Up: Syntax Tables
1084
1085 Syntax Table Functions
1086 ======================
1087
1088    In this section we describe functions for creating, accessing and
1089 altering syntax tables.
1090
1091  - Function: make-syntax-table &optional table
1092      This function creates a new syntax table.  Character codes 0
1093      through 31 and 128 through 255 are set up to inherit from the
1094      standard syntax table.  The other character codes are set up by
1095      copying what the standard syntax table says about them.
1096
1097      Most major mode syntax tables are created in this way.
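
     A typical pattern, sketched here for a hypothetical `my-mode',
     is to build the table once and store it in a variable.  The
     particular entries are assumptions chosen for illustration.

```elisp
(defvar my-mode-syntax-table
  (let ((table (make-syntax-table)))
    ;; `#' starts a comment and a newline ends it.
    (modify-syntax-entry ?# "<" table)
    (modify-syntax-entry ?\n ">" table)
    ;; `-' may appear inside symbol names.
    (modify-syntax-entry ?- "_" table)
    table)
  "Syntax table for the hypothetical `my-mode'.")

;; The major mode function would then install it in each buffer:
;; (set-syntax-table my-mode-syntax-table)
```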
1098
1099  - Function: copy-syntax-table &optional table
1100      This function constructs a copy of TABLE and returns it.  If TABLE
1101      is not supplied (or is `nil'), it returns a copy of the current
1102      syntax table.  Otherwise, an error is signaled if TABLE is not a
1103      syntax table.
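
     The following sketch checks that a copy can be changed without
     affecting the table it was copied from.  The helpers
     `my-char-syntax-in' and `my-copy-demo' are hypothetical.

```elisp
(defun my-char-syntax-in (char table)
  "Return the syntax class designator of CHAR according to TABLE."
  (with-temp-buffer
    (set-syntax-table table)
    (char-syntax char)))

(defun my-copy-demo ()
  "Return the classes of `+' in an original table and in its copy."
  (let ((original (make-syntax-table))
        copy)
    (modify-syntax-entry ?+ "." original)
    (setq copy (copy-syntax-table original))
    ;; Only the copy is altered here.
    (modify-syntax-entry ?+ "w" copy)
    (list (char-to-string (my-char-syntax-in ?+ original))
          (char-to-string (my-char-syntax-in ?+ copy)))))

(my-copy-demo)
     ;; => ("." "w")
```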
1104
1105  - Command: modify-syntax-entry char syntax-descriptor &optional table
1106      This function sets the syntax entry for CHAR according to
1107      SYNTAX-DESCRIPTOR.  The syntax is changed only for TABLE, which
1108      defaults to the current buffer's syntax table, and not in any
1109      other syntax table.  The argument SYNTAX-DESCRIPTOR specifies the
1110      desired syntax; this is a string beginning with a class designator
1111      character, and optionally containing a matching character and
1112      flags as well.  *Note Syntax Descriptors::.
1113
1114      This function always returns `nil'.  The old syntax information in
1115      the table for this character is discarded.
1116
1117      An error is signaled if the first character of the syntax
1118      descriptor is not a syntax class designator character (*note
1119      Syntax Class Table::), or if CHAR is not a character.
1120
1121      Examples:
1122
1123           ;; Put the space character in class whitespace.
1124           (modify-syntax-entry ?\  " ")
1125                => nil
1126
1127           ;; Make `$' an open parenthesis character,
1128           ;;   with `^' as its matching close.
1129           (modify-syntax-entry ?$ "(^")
1130                => nil
1131
1132           ;; Make `^' a close parenthesis character,
1133           ;;   with `$' as its matching open.
1134           (modify-syntax-entry ?^ ")$")
1135                => nil
1136
1137           ;; Make `/' a punctuation character,
1138           ;;   the first character of a start-comment sequence,
1139           ;;   and the second character of an end-comment sequence.
1140           ;;   This is used in C mode.
1141           (modify-syntax-entry ?/ ". 14")
1142                => nil
1143
1144  - Function: char-syntax character
1145      This function returns the syntax class of CHARACTER, represented
1146      by its mnemonic designator character.  This _only_ returns the
1147      class, not any matching parenthesis or flags.
1148
1149      An error is signaled if CHARACTER is not a character.
1150
1151      The following examples apply to C mode.  The first example shows
1152      that the syntax class of space is whitespace (represented by a
1153      space).  The second example shows that the syntax of `/' is
1154      punctuation.  This does not show the fact that it is also part of
1155      comment-start and -end sequences.  The third example shows that
1156      open parenthesis is in the class of open parentheses.  This does
1157      not show the fact that it has a matching character, `)'.
1158
1159           (char-to-string (char-syntax ?\ ))
1160                => " "
1161
1162           (char-to-string (char-syntax ?/))
1163                => "."
1164
1165           (char-to-string (char-syntax ?\())
1166                => "("
1167
1168  - Function: set-syntax-table table &optional buffer
1169      This function makes TABLE the syntax table for BUFFER, which
1170      defaults to the current buffer if omitted.  It returns TABLE.
1171
1172  - Function: syntax-table &optional buffer
1173      This function returns the syntax table for BUFFER, which defaults
1174      to the current buffer if omitted.
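
     These two functions are often combined to switch syntax tables
     temporarily.  Here is a minimal sketch, assuming the called
     function does not change the current buffer; the name
     `my-call-with-syntax-table' is hypothetical.

```elisp
(defun my-call-with-syntax-table (table function)
  "Call FUNCTION with TABLE as the current buffer's syntax table.
The previous syntax table is restored afterwards, even if
FUNCTION signals an error.  Return the value of FUNCTION."
  (let ((old (syntax-table)))
    (unwind-protect
        (progn
          (set-syntax-table table)
          (funcall function))
      (set-syntax-table old))))

;; Treat `+' as a word constituent for the duration of one call.
(let ((table (make-syntax-table)))
  (modify-syntax-entry ?+ "w" table)
  (my-call-with-syntax-table
   table
   (lambda () (char-to-string (char-syntax ?+)))))
     ;; => "w"
```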
1175
1176 \1f
1177 File: lispref.info,  Node: Motion and Syntax,  Next: Parsing Expressions,  Prev: Syntax Table Functions,  Up: Syntax Tables
1178
1179 Motion and Syntax
1180 =================
1181
1182    This section describes functions for moving across characters in
1183 certain syntax classes.  None of these functions exists in Emacs
1184 version 18 or earlier.
1185
1186  - Function: skip-syntax-forward syntaxes &optional limit buffer
1187      This function moves point forward across characters having syntax
1188      classes mentioned in SYNTAXES.  It stops when it encounters the
1189      end of the buffer, or position LIMIT (if specified), or a
1190      character it is not supposed to skip.  Optional argument BUFFER
1191      defaults to the current buffer if omitted.
1192
1193  - Function: skip-syntax-backward syntaxes &optional limit buffer
1194      This function moves point backward across characters whose syntax
1195      classes are mentioned in SYNTAXES.  It stops when it encounters
1196      the beginning of the buffer, or position LIMIT (if specified), or a
1197      character it is not supposed to skip.  Optional argument BUFFER
1198      defaults to the current buffer if omitted.
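
     As an illustration of both functions, the following sketch picks
     out the first and last runs of word and symbol constituents in a
     string.  It assumes the standard syntax table, in which `_' is a
     symbol constituent; the function names are hypothetical.

```elisp
(defun my-first-symbol (text)
  "Return the first run of word and symbol constituents in TEXT.
Leading whitespace is skipped."
  (with-temp-buffer
    (set-syntax-table (standard-syntax-table))
    (insert text)
    (goto-char (point-min))
    (skip-syntax-forward " ")           ; whitespace class
    (let ((start (point)))
      (skip-syntax-forward "w_")        ; word or symbol constituents
      (buffer-substring start (point)))))

(defun my-last-symbol (text)
  "Return the run of word and symbol constituents that ends TEXT."
  (with-temp-buffer
    (set-syntax-table (standard-syntax-table))
    (insert text)
    (let ((end (point)))
      (skip-syntax-backward "w_")
      (buffer-substring (point) end))))

(my-first-symbol "   foo_bar baz")   ; => "foo_bar"
(my-last-symbol "   foo_bar baz")    ; => "baz"
```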
1199
1200
1201  - Function: backward-prefix-chars &optional buffer
1202      This function moves point backward over any number of characters
1203      with expression prefix syntax.  This includes both characters in
1204      the expression prefix syntax class, and characters with the `p'
1205      flag.  Optional argument BUFFER defaults to the current buffer if
1206      omitted.
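
     A minimal sketch of its use follows.  It assumes a Lisp-like
     syntax table in which `'' and `#' are in the expression prefix
     class, and the helper name `my-start-of-prefixed-expression' is
     hypothetical.

```elisp
(defun my-start-of-prefixed-expression (text pos)
  "Insert TEXT in a scratch buffer and go to POS.
Return the position reached after moving back over prefix characters."
  (with-temp-buffer
    (let ((table (make-syntax-table)))
      ;; Assumption: give `'' and `#' expression prefix syntax.
      (modify-syntax-entry ?' "'" table)
      (modify-syntax-entry ?# "'" table)
      (set-syntax-table table))
    (insert text)
    (goto-char pos)
    (backward-prefix-chars)
    (point)))

;; Starting at the open parenthesis, point moves back over `#''.
(my-start-of-prefixed-expression "x #'(a)" 5)   ; => 3
;; With no prefix characters before point, nothing moves.
(my-start-of-prefixed-expression "x (a)" 3)     ; => 3
```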
1207