This is ../info/lispref.info, produced by makeinfo version 4.0 from

INFO-DIR-SECTION XEmacs Editor
* Lispref: (lispref).           XEmacs Lisp Reference Manual.

GNU Emacs Lisp Reference Manual Second Edition (v2.01), May 1993
GNU Emacs Lisp Reference Manual Further Revised (v2.02), August 1993
Lucid Emacs Lisp Reference Manual (for 19.10) First Edition, March 1994
XEmacs Lisp Programmer's Manual (for 19.12) Second Edition, April 1995
GNU Emacs Lisp Reference Manual v2.4, June 1995
XEmacs Lisp Programmer's Manual (for 19.13) Third Edition, July 1995
XEmacs Lisp Reference Manual (for 19.14 and 20.0) v3.1, March 1996
XEmacs Lisp Reference Manual (for 19.15 and 20.1, 20.2, 20.3) v3.2, April, May, November 1997
XEmacs Lisp Reference Manual (for 21.0) v3.3, April 1998

   Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995 Free Software
Foundation, Inc. Copyright (C) 1994, 1995 Sun Microsystems, Inc.
Copyright (C) 1995, 1996 Ben Wing.

   Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided also
that the section entitled "GNU General Public License" is included
exactly as in the original, and provided that the entire resulting
derived work is distributed under the terms of a permission notice
identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that the section entitled "GNU General Public License"
may be included in a translation approved by the Free Software
Foundation instead of in the original English.

File: lispref.info, Node: Change Hooks, Next: Transformations, Prev: Transposition, Up: Text

Change Hooks
============

   These hook variables let you arrange to take notice of all changes in
all buffers (or in a particular buffer, if you make them buffer-local).

   The functions you use in these hooks should save and restore the
match data (for example, with `save-match-data') if they do anything
that uses regular expressions; otherwise, they will interfere in
bizarre ways with the editing operations that call them.

   Buffer changes made while executing the following hooks don't
themselves cause any change hooks to be invoked.

 - Variable: before-change-functions
     This variable holds a list of functions to call before any buffer
     modification. Each function gets two arguments, the beginning and
     end of the region that is about to change, represented as
     integers. The buffer that is about to change is always the
     current buffer.

 - Variable: after-change-functions
     This variable holds a list of functions to call after any buffer
     modification. Each function receives three arguments: the
     beginning and end of the region just changed, and the length of
     the text that existed before the change. (To get the current
     length, subtract the region beginning from the region end.) All
     three arguments are integers. The buffer that has just changed is
     always the current buffer.

 - Variable: before-change-function
     This obsolete variable holds one function to call before any buffer
     modification (or `nil' for no function). It is called just like
     the functions in `before-change-functions'.

 - Variable: after-change-function
     This obsolete variable holds one function to call after any buffer
     modification (or `nil' for no function). It is called just like
     the functions in `after-change-functions'.

 - Variable: first-change-hook
     This variable is a normal hook that is run whenever a buffer that
     was previously unmodified is changed.

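   As an illustration, here is a minimal sketch that installs
buffer-local versions of `after-change-functions' and
`first-change-hook' in a temporary buffer and records what they see. It
assumes `with-temp-buffer' and a four-argument `add-hook' whose fourth
argument, LOCAL, makes the hook buffer-local; every name beginning with
`my-' is hypothetical, invented for this example.

```elisp
;; Sketch: watch one buffer with buffer-local change hooks.
;; All `my-' names are hypothetical.
(defvar my-change-log nil
  "List of (BEG END OLD-LEN) entries, most recent first.")
(defvar my-first-changes 0
  "Number of times `first-change-hook' ran in the watched buffer.")

(defun my-record-change (beg end old-len)
  "Record the arguments of one call from `after-change-functions'."
  ;; No regexps are used here, but `save-match-data' is the safe habit
  ;; for hook functions that might use them.
  (save-match-data
    (setq my-change-log (cons (list beg end old-len) my-change-log))))

(defun my-note-first-change ()
  "Count transitions of the buffer from unmodified to modified."
  (setq my-first-changes (1+ my-first-changes)))

(with-temp-buffer
  ;; The fourth argument LOCAL makes both hooks buffer-local, so only
  ;; this temporary buffer is watched.
  (add-hook 'after-change-functions 'my-record-change nil t)
  (add-hook 'first-change-hook 'my-note-first-change nil t)
  (insert "hello")             ; buffer was unmodified: both hooks run
  (delete-region 1 3)          ; deletes "he": only the change hook runs
  (set-buffer-modified-p nil)  ; mark the buffer unmodified again
  (insert "!"))                ; a first change again: both hooks run
```

   When it finishes, `my-change-log' should be `((4 5 0) (1 1 2) (1 6
0))' (the insertion of `hello', the deletion of `he', then the final
insertion, most recent first), and `my-first-changes' should be 2,
because only the first insertion and the insertion after
`set-buffer-modified-p' found the buffer unmodified.
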
File: lispref.info, Node: Transformations, Prev: Change Hooks, Up: Text

Textual transformations--MD5 and base64 support
===============================================

   Some textual operations inherently require examining each character
in turn, and performing arithmetic operations on them. Such operations
can, of course, be implemented in Emacs Lisp, but tend to be very slow
for large portions of text or data. This is why some of them are
implemented in C, with an appropriate interface for Lisp programmers.
MD5 and base64 support are two algorithms provided this way.

   MD5 is an algorithm for calculating message digests, as described in
RFC 1321. Given a message of arbitrary length, MD5 produces a 128-bit
"fingerprint" ("message digest") corresponding to that message. When
RFC 1321 was written, it was considered computationally infeasible to
produce two messages having the same MD5 digest, or to produce a
message having a prespecified target digest; practical methods for
producing two messages with the same digest have since been published,
so MD5 should not be relied on where collision resistance matters. MD5
is used heavily by various authentication schemes.

   The Emacs Lisp interface to MD5 consists of a single function, `md5':

 - Function: md5 object &optional start end
     This function returns the MD5 message digest of OBJECT, a buffer
     or string.

     Optional arguments START and END denote positions for computing
     the digest of a portion of OBJECT.

     Some examples of usage:

          ;; Calculate the digest of the entire buffer
          (md5 (current-buffer))
               => "8842b04362899b1cda8d2d126dc11712"

          ;; Calculate the digest of the current line
          (md5 (current-buffer) (point-at-bol) (point-at-eol))
               => "60614d21e9dee27dfdb01fa4e30d6d00"

          ;; Calculate the digest of your name and email address
          (md5 (format "%s <%s>" (user-full-name) user-mail-address))
               => "0a2188c40fd38922d941fe6032fce516"

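   The digests shown above depend on your buffer and identity, so they
cannot be reproduced elsewhere. The following sketch instead checks
`md5' against two test vectors published in RFC 1321, and uses START
and END to digest part of a buffer. The `my-' variable names are
hypothetical.

```elisp
;; Sketch: check `md5' against RFC 1321 test vectors.
;; The `my-' variable names are hypothetical.
(defvar my-md5-empty (md5 ""))   ; digest of the empty string
(defvar my-md5-abc (md5 "abc"))  ; digest of "abc"

(defvar my-md5-region
  (with-temp-buffer
    (insert "xabcx")
    ;; Positions 2 to 5 cover exactly the characters "abc".
    (md5 (current-buffer) 2 5)))
```

   `my-md5-empty' should be `"d41d8cd98f00b204e9800998ecf8427e"' and
`my-md5-abc' should be `"900150983cd24fb0d6963f7d28e17f72"'.
`my-md5-region' equals `my-md5-abc', since both digest the same three
characters.
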
   Base64 is a portable encoding for arbitrary sequences of octets, in a
form that need not be readable by humans. It uses a 65-character subset
of US-ASCII, as described in RFC 2045. Base64 is used by MIME to encode
binary bodies, and to encode binary characters in message headers.

   The Lisp interface to base64 consists of four functions:

 - Function: base64-encode-region beg end &optional no-line-break
     This function encodes the region between BEG and END of the
     current buffer to base64 format. This means that the original
     region is deleted, and replaced with its base64 equivalent.

     Normally, encoded base64 output is multi-line, with 76-character
     lines. If NO-LINE-BREAK is non-`nil', newlines will not be
     inserted, resulting in single-line output.

     Mule note: you should make sure that you convert the multibyte
     characters (those that do not fit into the 0-255 range) to
     something else, because they cannot be meaningfully converted to
     base64. If `base64-encode-region' encounters such characters, it
     will signal an error.

     `base64-encode-region' returns the length of the encoded text.

          ;; Encode the whole buffer in base64
          (base64-encode-region (point-min) (point-max))

     The function can also be used interactively, in which case it
     works on the currently active region.

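   A short sketch of `base64-encode-region' in a temporary buffer,
showing its return value and the effect of NO-LINE-BREAK. The names
`my-b64-short' and `my-b64-encode-60' are hypothetical.

```elisp
;; Sketch: in-place base64 encoding of a region.
;; `my-b64-short' and `my-b64-encode-60' are hypothetical names.
(defvar my-b64-short
  (with-temp-buffer
    (insert "fubar")
    ;; The return value is the encoded length; the buffer now holds
    ;; the encoded text in place of the original.
    (list (base64-encode-region (point-min) (point-max))
          (buffer-string))))

(defun my-b64-encode-60 (no-line-break)
  "Base64-encode 60 `a' characters in a temporary buffer.
60 octets give 80 encoded characters, more than one 76-character line."
  (with-temp-buffer
    (insert (make-string 60 ?a))
    (base64-encode-region (point-min) (point-max) no-line-break)
    (buffer-string)))
```

   `my-b64-short' should be `(8 "ZnViYXI=")'. Because 80 characters
exceed one 76-character line, `(my-b64-encode-60 nil)' contains a
newline, while `(my-b64-encode-60 t)' is a single line of 80
characters.
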
 - Function: base64-encode-string string
     This function encodes STRING to base64, and returns the encoded
     string.

     For Mule, the same considerations apply as for
     `base64-encode-region'.

          (base64-encode-string "fubar")
              => "ZnViYXI="

 - Function: base64-decode-region beg end
     This function decodes the region between BEG and END of the
     current buffer. The region should be in base64 encoding.

     If the region was decoded correctly, `base64-decode-region' returns
     the length of the decoded region. If the decoding failed, `nil' is
     returned.

          ;; Decode a base64 buffer, and replace it with the decoded version
          (base64-decode-region (point-min) (point-max))

 - Function: base64-decode-string string
     This function decodes STRING from base64, and returns the decoded
     string. STRING should be valid base64-encoded text.

     If decoding was not possible, `nil' is returned.

          (base64-decode-string "ZnViYXI=")
              => "fubar"

          (base64-decode-string "totally bogus")
              => nil

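   The string functions can be sketched the same way. This example
shows the `=' padding that base64 adds when the input length is not a
multiple of three, and a round trip through both functions.
`my-b64-padding' and `my-b64-round-trip' are hypothetical names, and
the sketch sticks to ASCII input to stay clear of the Mule issue noted
above.

```elisp
;; Sketch: base64 string encoding and decoding.
;; `my-b64-padding' and `my-b64-round-trip' are hypothetical names.

;; Every 3 input octets become 4 output characters; "=" pads a final
;; group of 1 or 2 octets.
(defvar my-b64-padding
  (mapcar 'base64-encode-string '("f" "fo" "foo")))

(defun my-b64-round-trip (string)
  "Encode STRING to base64, then decode the result again.
STRING should contain only characters in the 0-255 range."
  (base64-decode-string (base64-encode-string string)))
```

   `my-b64-padding' should be `("Zg==" "Zm8=" "Zm9v")', and
`(my-b64-round-trip "Hello,\tworld!\n")' should return a string equal
to its argument.
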
File: lispref.info, Node: Searching and Matching, Next: Syntax Tables, Prev: Text, Up: Top

Searching and Matching
**********************

   XEmacs provides two ways to search through a buffer for specified
text: exact string searches and regular expression searches. After a
regular expression search, you can examine the "match data" to
determine which text matched the whole regular expression or various
portions of it.

* Menu:

* String Search::         Search for an exact match.
* Regular Expressions::   Describing classes of strings.
* Regexp Search::         Searching for a match for a regexp.
* POSIX Regexps::         Searching POSIX-style for the longest match.
* Search and Replace::    Internals of `query-replace'.
* Match Data::            Finding out which part of the text matched
                            various parts of a regexp, after regexp search.
* Searching and Case::    Case-independent or case-significant searching.
* Standard Regexps::      Useful regexps for finding sentences, pages,...

   The `skip-chars...' functions also perform a kind of searching.
*Note Skipping Characters::.

File: lispref.info, Node: String Search, Next: Regular Expressions, Up: Searching and Matching

Searching for Strings
=====================

   These are the primitive functions for searching through the text in a
buffer. They are meant for use in programs, but you may call them
interactively. If you do so, they prompt for the search string; LIMIT
and NOERROR are set to `nil', and REPEAT is set to 1.

 - Command: search-forward string &optional limit noerror repeat
     This function searches forward from point for an exact match for
     STRING. If successful, it sets point to the end of the occurrence
     found, and returns the new value of point. If no match is found,
     the value and side effects depend on NOERROR (see below).

     In the following example, point is initially at the beginning of
     the line. Then `(search-forward "fox")' moves point after the last
     letter of the string.

          ---------- Buffer: foo ----------
          -!-The quick brown fox jumped over the lazy dog.
          ---------- Buffer: foo ----------

          (search-forward "fox")
               => 20

          ---------- Buffer: foo ----------
          The quick brown fox-!- jumped over the lazy dog.
          ---------- Buffer: foo ----------

     The argument LIMIT specifies the upper bound to the search. (It
     must be a position in the current buffer.) No match extending
     after that position is accepted. If LIMIT is omitted or `nil', it
     defaults to the end of the accessible portion of the buffer.

     What happens when the search fails depends on the value of
     NOERROR. If NOERROR is `nil', a `search-failed' error is
     signaled. If NOERROR is `t', `search-forward' returns `nil' and
     does nothing. If NOERROR is neither `nil' nor `t', then
     `search-forward' moves point to the upper bound and returns `nil'.
     (It would be more consistent now to return the new position of
     point in that case, but some programs may depend on a value of
     `nil'.)

     If REPEAT is supplied (it must be a positive number), then the
     search is repeated that many times (each time starting at the end
     of the previous time's match). If these successive searches
     succeed, the function succeeds, moving point and returning its new
     value. Otherwise the search fails.

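   The following sketch runs `search-forward' on the sentence from the
example above and collects the results of the different NOERROR and
REPEAT settings. `my-search-results' is a hypothetical name.

```elisp
;; Sketch: the NOERROR and REPEAT arguments of `search-forward'.
;; `my-search-results' is a hypothetical name.
(defvar my-search-results
  (with-temp-buffer
    (insert "The quick brown fox jumped over the lazy dog.")
    (goto-char (point-min))
    (list
     (search-forward "fox")            ; found: returns the new point
     (search-forward "fox" nil t)      ; NOERROR t: nil, point unchanged
     (point)
     (search-forward "fox" nil 'move)  ; other non-nil: nil, point moves
     (point)                           ; ... to the end of the buffer
     (condition-case nil
         (search-forward "cat")        ; NOERROR nil: signals an error
       (search-failed 'search-failed))
     (progn
       (goto-char (point-min))
       (search-forward "o" nil nil 2))))) ; end of the second "o"
```

   With the buffer text shown, the list should be `(20 nil 20 nil 46
search-failed 19)': after the failed search with NOERROR `t', point
stays at 20, while the `move' variant leaves it at the end of the
buffer (position 46).
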
 - Command: search-backward string &optional limit noerror repeat
     This function searches backward from point for STRING. It is just
     like `search-forward' except that it searches backwards and leaves
     point at the beginning of the match.

 - Command: word-search-forward string &optional limit noerror repeat
     This function searches forward from point for a "word" match for
     STRING. If it finds a match, it sets point to the end of the
     match found, and returns the new value of point.

     Word matching regards STRING as a sequence of words, disregarding
     punctuation that separates them. It searches the buffer for the
     same sequence of words. Each word must be distinct in the buffer
     (searching for the word `ball' does not match the word `balls'),
     but the details of punctuation and spacing are ignored (searching
     for `ball boy' does match `ball. Boy!').

     In this example, point is initially at the beginning of the
     buffer; the search leaves it between the `y' and the `!'.

          ---------- Buffer: foo ----------
          -!-He said "Please! Find
          the ball boy!"
          ---------- Buffer: foo ----------

          (word-search-forward "Please find the ball, boy.")

          ---------- Buffer: foo ----------
          He said "Please! Find
          the ball boy-!-!"
          ---------- Buffer: foo ----------

     If LIMIT is non-`nil' (it must be a position in the current
     buffer), then it is the upper bound to the search. The match
     found must not extend after that position.

     If NOERROR is `nil', then `word-search-forward' signals an error
     if the search fails. If NOERROR is `t', then it returns `nil'
     instead of signaling an error. If NOERROR is neither `nil' nor
     `t', it moves point to LIMIT (or the end of the buffer) and
     returns `nil'.

     If REPEAT is non-`nil', then the search is repeated that many
     times. Point is positioned at the end of the last match.

 - Command: word-search-backward string &optional limit noerror repeat
     This function searches backward from point for a word match to
     STRING. This function is just like `word-search-forward' except
     that it searches backward and normally leaves point at the
     beginning of the match.

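   Here is a small sketch of word matching, built on the example buffer
above. It relies on the default non-`nil' value of `case-fold-search'
so that `find' matches `Find' (*note Searching and Case::), and it
leaves the final period off the search string so that the result does
not depend on how a particular Emacs treats trailing punctuation.
`my-word-search-results' is a hypothetical name.

```elisp
;; Sketch: word matching with `word-search-forward'.
;; `my-word-search-results' is a hypothetical name.
(defvar my-word-search-results
  (list
   (with-temp-buffer
     (insert "He said \"Please! Find\nthe ball boy!\"")
     (goto-char (point-min))
     ;; Punctuation, spacing and the line break between words are
     ;; ignored; case is ignored because `case-fold-search' is non-nil.
     (word-search-forward "Please find the ball, boy"))
   (with-temp-buffer
     (insert "three balls")
     (goto-char (point-min))
     ;; "ball" is not a whole word here, so this returns nil.
     (word-search-forward "ball" nil t))))
```

   The list should be `(35 nil)': point ends between `boy' and `!' in
the first buffer, and `ball' is not found as a separate word in `three
balls'.
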
File: lispref.info, Node: Regular Expressions, Next: Regexp Search, Prev: String Search, Up: Searching and Matching

Regular Expressions
===================

   A "regular expression" ("regexp", for short) is a pattern that
denotes a (possibly infinite) set of strings. Searching for matches for
a regexp is a very powerful operation. This section explains how to
write regexps; the following section says how to search for them.

   To gain a thorough understanding of regular expressions and how to
use them to best advantage, we recommend that you study `Mastering
Regular Expressions, by Jeffrey E.F. Friedl, O'Reilly and Associates,
1997'. (It's known as the "Hip Owls" book, because of the picture on its
cover.) You might also read the manuals to *Note (gawk)Top::, *Note
(ed)Top::, `sed', `grep', *Note (perl)Top::, *Note (regex)Top::, *Note
(rx)Top::, `pcre', and *Note (flex)Top::, which also make good use of
regular expressions.

   The XEmacs regular expression syntax most closely resembles that of
`ed' or `grep', whose GNU versions both use the GNU `regex' library.
XEmacs' version of `regex' has recently been extended with some
Perl-like capabilities, described in the next section.

* Menu:

* Syntax of Regexps::       Rules for writing regular expressions.
* Regexp Example::          Illustrates regular expression syntax.

File: lispref.info, Node: Syntax of Regexps, Next: Regexp Example, Up: Regular Expressions

Syntax of Regular Expressions
-----------------------------

   Regular expressions have a syntax in which a few characters are
special constructs and the rest are "ordinary". An ordinary character
is a simple regular expression that matches that character and nothing
else. The special characters are `.', `*', `+', `?', `[', `]', `^',
`$', and `\'; no new special characters will be defined in the future.
Any other character appearing in a regular expression is ordinary,
unless a `\' precedes it.

   For example, `f' is not a special character, so it is ordinary, and
therefore `f' is a regular expression that matches the string `f' and
no other string. (It does _not_ match the string `ff'.) Likewise, `o'
is a regular expression that matches only `o'.

   Any two regular expressions A and B can be concatenated. The result
is a regular expression that matches a string if A matches some amount
of the beginning of that string and B matches the rest of the string.

   As a simple example, we can concatenate the regular expressions `f'
and `o' to get the regular expression `fo', which matches only the
string `fo'. Still trivial. To do something more powerful, you need
to use one of the special characters. Here is a list of them:

`.' (Period)
     is a special character that matches any single character except a
     newline. Using concatenation, we can make regular expressions
     like `a.b', which matches any three-character string that begins
     with `a' and ends with `b'.

`*'
     is not a construct by itself; it is a quantifying suffix operator
     that means to repeat the preceding regular expression as many
     times as possible. In `fo*', the `*' applies to the `o', so `fo*'
     matches one `f' followed by any number of `o's. The case of zero
     `o's is allowed: `fo*' does match `f'.

     `*' always applies to the _smallest_ possible preceding
     expression. Thus, `fo*' has a repeating `o', not a repeating `fo'.

     The matcher processes a `*' construct by matching, immediately, as
     many repetitions as can be found; it is "greedy". Then it
     continues with the rest of the pattern. If that fails,
     backtracking occurs, discarding some of the matches of the
     `*'-modified construct in case that makes it possible to match the
     rest of the pattern. For example, in matching `ca*ar' against the
     string `caaar', the `a*' first tries to match all three `a's; but
     the rest of the pattern is `ar' and there is only `r' left to
     match, so this try fails. The next alternative is for `a*' to
     match only two `a's. With this choice, the rest of the regexp
     matches successfully.

     Nested repetition operators can be extremely slow if they specify
     backtracking loops. For example, it could take hours for the
     regular expression `\(x+y*\)*a' to match the sequence
     `xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxz'. It is slow because Emacs
     must try each imaginable way of grouping the 35 `x's before
     concluding that none of them can work. To make sure your regular
     expressions run fast, check nested repetitions carefully.

`+'
     is a quantifying suffix operator similar to `*' except that the
     preceding expression must match at least once. It is also
     "greedy". So, for example, `ca+r' matches the strings `car' and
     `caaaar' but not the string `cr', whereas `ca*r' matches all three
     strings.

`?'
     is a quantifying suffix operator similar to `*', except that the
     preceding expression can match either once or not at all. For
     example, `ca?r' matches `car' or `cr', but does not match anything
     else.

`*?'
     works just like `*', except that rather than matching the longest
     match, it matches the shortest match. `*?' is known as a
     "non-greedy" quantifier, a regexp construct borrowed from Perl.

     This construct is very useful when you want to match the text
     inside a pair of delimiters. For instance, `/\*.*?\*/' will match
     C comments in a string. This could not be achieved with a greedy
     quantifier: `/\*.*\*/' would match everything from the first `/*'
     to the last `*/'.

     This construct was not available prior to XEmacs 20.4. It is not
     available in FSF Emacs.

`+?'
     is the `+' analog to `*?'.

`\{n,m\}'
     serves as an interval quantifier, analogous to `*' or `+', but
     specifies that the expression must match at least N times, but no
     more than M times. This syntax is supported by most Unix regexp
     utilities, and was introduced in XEmacs 20.3.

`[ ... ]'
     `[' begins a "character set", which is terminated by a `]'. In
     the simplest case, the characters between the two brackets form
     the set. Thus, `[ad]' matches either one `a' or one `d', and
     `[ad]*' matches any string composed of just `a's and `d's
     (including the empty string), from which it follows that `c[ad]*r'
     matches `cr', `car', `cdr', `caddaar', etc.

     The usual regular expression special characters are not special
     inside a character set. A completely different set of special
     characters exists inside character sets: `]', `-' and `^'.

     `-' is used for ranges of characters. To write a range, write two
     characters with a `-' between them. Thus, `[a-z]' matches any
     lower case letter. Ranges may be intermixed freely with individual
     characters, as in `[a-z$%.]', which matches any lower case letter
     or `$', `%', or a period.

     To include a `]' in a character set, make it the first character.
     For example, `[]a]' matches `]' or `a'. To include a `-', write
     `-' as the first character in the set, or put it immediately after
     a range. (You can replace one individual character C with the
     range `C-C' to make a place to put the `-'.) There is no way to
     write a set containing just `-' and `]'.

     To include `^' in a set, put it anywhere but at the beginning of
     the set.

`[^ ... ]'
     `[^' begins a "complement character set", which matches any
     character except the ones specified. Thus, `[^a-z0-9A-Z]' matches
     all characters _except_ letters and digits.

     `^' is not special in a character set unless it is the first
     character. The character following the `^' is treated as if it
     were first (thus, `-' and `]' are not special there).

     Note that a complement character set can match a newline, unless
     newline is mentioned as one of the characters not to match.

`^'
     is a special character that matches the empty string, but only at
     the beginning of a line in the text being matched. Otherwise it
     fails to match anything. Thus, `^foo' matches a `foo' that occurs
     at the beginning of a line.

     When matching a string instead of a buffer, `^' matches at the
     beginning of the string or after a newline character `\n'.

`$'
     is similar to `^' but matches only at the end of a line. Thus,
     `x+$' matches a string of one `x' or more at the end of a line.

     When matching a string instead of a buffer, `$' matches at the end
     of the string or before a newline character `\n'.

`\'
     has two functions: it quotes the special characters (including
     `\'), and it introduces additional special constructs.

     Because `\' quotes special characters, `\$' is a regular
     expression that matches only `$', and `\[' is a regular expression
     that matches only `[', and so on.

     Note that `\' also has special meaning in the read syntax of Lisp
     strings (*note String Type::), and must be quoted with `\'. For
     example, the regular expression that matches the `\' character is
     `\\'. To write a Lisp string that contains the characters `\\',
     Lisp syntax requires you to quote each `\' with another `\'.
     Therefore, the read syntax for a regular expression matching `\'
     is `"\\\\"'.

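   The sketch below tries several of the special characters described
above with `string-match', which matches a regexp against a string.
`my-first-match' and `my-special-char-results' are hypothetical names.
As explained above, Lisp string syntax doubles each backslash, so the
regexp `\*' is written `"\\*"'.

```elisp
;; Sketch: some special characters, tried with `string-match'.
;; `my-first-match' and `my-special-char-results' are hypothetical.
(defun my-first-match (regexp string)
  "Return the text of the first match for REGEXP in STRING, or nil."
  (and (string-match regexp string)
       (match-string 0 string)))

(defvar my-special-char-results
  (list
   (my-first-match "a.b" "xa-by")                      ; `.' is any char
   (my-first-match "ca*ar" "caaar")                    ; needs backtracking
   (my-first-match "ca+r" "cr")                        ; `+' needs one `a'
   (my-first-match "ca?r" "car")                       ; `?' allows one
   (my-first-match "/\\*.*\\*/" "/* a */ x; /* b */")  ; greedy
   (my-first-match "/\\*.*?\\*/" "/* a */ x; /* b */") ; non-greedy
   (my-first-match "a\\{2,3\\}" "aaaa")                ; at most three
   (my-first-match "[]a]+" "x]a]y")                    ; `]' listed first
   (my-first-match "[^a-z0-9A-Z]" "abc-123")           ; complement set
   (my-first-match "^foo" "bar\nfoo")                  ; `^' after newline
   (my-first-match "\\$" "costs $5")))                 ; quoted `$'
```

   The expected list is `("a-b" "caaar" nil "car" "/* a */ x; /* b */"
"/* a */" "aaa" "]a]" "-" "foo" "$")'. Note how the greedy comment
regexp runs on to the last `*/', while the non-greedy one stops at the
first.
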
   *Please note:* For historical compatibility, special characters are
treated as ordinary ones if they are in contexts where their special
meanings make no sense. For example, `*foo' treats `*' as ordinary
since there is no preceding expression on which the `*' can act. It is
poor practice to depend on this behavior; quote the special character
anyway, regardless of where it appears.

   For the most part, `\' followed by any character matches only that
character. However, there are several exceptions: characters that,
when preceded by `\', are special constructs. Such characters are
always ordinary when encountered on their own. Here is a table of `\'
constructs.

`\|'
     specifies an alternative. Two regular expressions A and B with
     `\|' in between form an expression that matches anything that
     either A or B matches.

     Thus, `foo\|bar' matches either `foo' or `bar' but no other string.

     `\|' applies to the largest possible surrounding expressions.
     Only a surrounding `\( ... \)' grouping can limit the grouping
     power of `\|'.

     Full backtracking capability exists to handle multiple uses of
     `\|'.

`\( ... \)'
     is a grouping construct that serves three purposes:

       1. To enclose a set of `\|' alternatives for other operations.
          Thus, `\(foo\|bar\)x' matches either `foox' or `barx'.

       2. To enclose an expression for a suffix operator such as `*' to
          act on. Thus, `ba\(na\)*' matches `bananana', etc., with any
          (zero or more) number of `na' strings.

       3. To record a matched substring for future reference.

     This last application is not a consequence of the idea of a
     parenthetical grouping; it is a separate feature that happens to be
     assigned as a second meaning to the same `\( ... \)' construct
     because there is no conflict in practice between the two meanings.
     Here is an explanation of this feature:

`\DIGIT'
     matches the same text that matched the DIGITth occurrence of a `\(
     ... \)' construct.

     In other words, after the end of a `\( ... \)' construct, the
     matcher remembers the beginning and end of the text matched by that
     construct. Then, later on in the regular expression, you can use
     `\' followed by DIGIT to match that same text, whatever it may
     have been.

     The strings matching the first nine `\( ... \)' constructs
     appearing in a regular expression are assigned numbers 1 through 9
     in the order that the open parentheses appear in the regular
     expression. So you can use `\1' through `\9' to refer to the text
     matched by the corresponding `\( ... \)' constructs.

     For example, `\(.*\)\1' matches any newline-free string that is
     composed of two identical halves. The `\(.*\)' matches the first
     half, which may be anything, but the `\1' that follows must match
     the same exact text.

`\(?: ... \)'
     is called a "shy" grouping operator, and it is used just like `\(
     ... \)', except that it does not cause the matched substring to be
     recorded for future reference, and it is not counted when the
     groups are numbered for `\DIGIT'.

     This is useful when you need a lot of grouping `\( ... \)'
     constructs, but only want to remember one or two. Then you can use
     `\(?: ... \)' when you want to group a subexpression, but do not
     want to remember it for later use with `match-string'.

     Using `\(?: ... \)' rather than `\( ... \)' when you don't need
     the captured substrings should make your programs somewhat faster,
     since it shortens the code path followed by the regular expression
     engine, as well as the amount of memory allocation and string
     copying it must do. The actual performance gain has not been
     measured or quantified as of this writing.

     The shy grouping operator has been borrowed from Perl, and was not
     available prior to XEmacs 20.3, nor is it available in FSF Emacs.

`\w'
     matches any word-constituent character. The editor syntax table
     determines which characters these are. *Note Syntax Tables::.

`\W'
     matches any character that is not a word constituent.

`\sCODE'
     matches any character whose syntax is CODE. Here CODE is a
     character that represents a syntax code: thus, `w' for word
     constituent, `-' for whitespace, `(' for open parenthesis, etc.
     *Note Syntax Tables::, for a list of syntax codes and the
     characters that stand for them.

`\SCODE'
     matches any character whose syntax is not CODE.

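   The next sketch exercises the backslash constructs above. It runs
inside a temporary buffer so that `\w' and `\s-' use the standard
syntax table, since `string-match' takes the syntax table from the
current buffer. `my-group-match' and `my-backslash-results' are
hypothetical names.

```elisp
;; Sketch: backslash constructs, tried with `string-match'.
;; `my-group-match' and `my-backslash-results' are hypothetical.
(defun my-group-match (regexp string &optional group)
  "Return the text matched by GROUP (default 0) of REGEXP in STRING.
Return nil if REGEXP does not match."
  (and (string-match regexp string)
       (match-string (or group 0) string)))

(defvar my-backslash-results
  ;; `string-match' uses the current buffer's syntax table for \w and
  ;; \s; a temporary buffer has the standard one.
  (with-temp-buffer
    (list
     (my-group-match "\\(foo\\|bar\\)x" "a barx")     ; alternatives
     (my-group-match "^ba\\(na\\)*$" "bananana")      ; repeated group
     (my-group-match "^\\(.*\\)\\1$" "abcabc" 1)      ; back-reference
     (my-group-match "^\\(.*\\)\\1$" "abcab")         ; no identical halves
     (my-group-match "\\(?:ab\\)+\\(c\\)" "ababc" 1)  ; shy group unnumbered
     (my-group-match "\\w+" "  hello, world")         ; word constituents
     (my-group-match "\\s-+" "foo \t bar"))))         ; whitespace syntax
```

   The list should be `("barx" "bananana" "abc" nil "c" "hello" " \t
")'. Because the shy group in the fifth regexp is not numbered, the
plain `\(c\)' that follows it is group 1.
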
   The following regular expression constructs match the empty string
(that is, they don't use up any characters), but whether they match
depends on the context.

`\`'
     matches the empty string, but only at the beginning of the buffer
     or string being matched against.

`\''
     matches the empty string, but only at the end of the buffer or
     string being matched against.

`\='
     matches the empty string, but only at point. (This construct is
     not defined when matching against a string.)

`\b'
     matches the empty string, but only at the beginning or end of a
     word. Thus, `\bfoo\b' matches any occurrence of `foo' as a
     separate word. `\bballs?\b' matches `ball' or `balls' as a
     separate word.

`\B'
     matches the empty string, but _not_ at the beginning or end of a
     word.

`\<'
     matches the empty string, but only at the beginning of a word.

`\>'
     matches the empty string, but only at the end of a word.

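   This sketch contrasts the empty-string constructs with each other
and with `^' and `$'. The helper `my-all-matches' (a hypothetical name,
like `my-empty-construct-results') collects successive matches; it
would loop forever on a regexp that matches the empty string, so it is
only used here with regexps that consume at least one character.

```elisp
;; Sketch: empty-string constructs, tried with `string-match'.
;; `my-all-matches' and `my-empty-construct-results' are hypothetical.
(defun my-all-matches (regexp string)
  "Return the list of successive matches of REGEXP in STRING.
REGEXP must not match the empty string, or the loop never ends."
  (let ((start 0)
        (matches nil))
    (while (string-match regexp string start)
      (setq matches (cons (match-string 0 string) matches)
            start (match-end 0)))
    (nreverse matches)))

(defvar my-empty-construct-results
  (with-temp-buffer                     ; standard syntax table
    (list
     (my-all-matches "\\bballs?\\b" "ball, balls and ballsy")
     (my-all-matches "\\<\\w" "two words here")   ; first letter of words
     (my-all-matches "\\w\\>" "two words here")   ; last letter of words
     (string-match "^b" "abc\nbcd")     ; `^' also matches after a newline
     (string-match "\\`b" "abc\nbcd")   ; `\`' only at the very start
     (string-match "c$" "abc\nabc")     ; `$' also matches before a newline
     (string-match "c\\'" "abc\nabc"))))   ; `\'' only at the very end
```

   The expected result is `(("ball" "balls") ("t" "w" "h") ("o" "s" "e")
4 nil 2 6)': `ballsy' is rejected because no word boundary follows
`ball' or `balls' there, and `^' and `$' also match next to the
embedded newline, while `\`' and `\'' match only at the ends of the
string.
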
   Not every string is a valid regular expression. For example, a
string with unbalanced square brackets is invalid (with a few
exceptions, such as `[]]'), and so is a string that ends with a single
`\'. If an invalid regular expression is passed to any of the search
functions, an `invalid-regexp' error is signaled.

 - Function: regexp-quote string
     This function returns a regular expression string that matches
     exactly STRING and nothing else. This allows you to request an
     exact string match when calling a function that wants a regular
     expression.

          (regexp-quote "^The cat$")
               => "\\^The cat\\$"

     One use of `regexp-quote' is to combine an exact string match with
     context described as a regular expression. For example, this
     searches for the string that is the value of `string', surrounded
     by whitespace:

          (re-search-forward
           (concat "\\s-" (regexp-quote string) "\\s-"))

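   The following sketch shows why quoting matters. An unquoted `['
starts a character set, so `a[1' is an invalid regexp, and an unquoted
`.' matches any character. `my-search-delimited' is a hypothetical
wrapper around the search shown above; unlike that call, it passes
NOERROR `t' so that a failed search returns `nil'. `my-quote-results'
is also hypothetical.

```elisp
;; Sketch: `regexp-quote' in practice.
;; `my-search-delimited' and `my-quote-results' are hypothetical.
(defun my-search-delimited (string)
  "Search forward for STRING surrounded by whitespace.
Return the new point, or nil (without moving) if there is no match."
  (re-search-forward
   (concat "\\s-" (regexp-quote string) "\\s-") nil t))

(defvar my-quote-results
  (list
   (regexp-quote "^The cat$")
   ;; An unquoted `[' starts a character set that is never closed.
   (condition-case nil
       (string-match "a[1" "a[1]")
     (invalid-regexp 'invalid-regexp))
   ;; Quoted, the same text is an ordinary regexp.
   (string-match (regexp-quote "a[1") "a[1]")
   (with-temp-buffer
     (insert "about 3x50 or 3.50 each")
     (goto-char (point-min))
     ;; Unquoted, `.' matches any character, so " 3x50 " is found.
     (re-search-forward "\\s-3.50\\s-" nil t))
   (with-temp-buffer
     (insert "about 3x50 or 3.50 each")
     (goto-char (point-min))
     ;; Quoted, only the literal text " 3.50 " matches.
     (my-search-delimited "3.50"))))
```

   The list should be `("\\^The cat\\$" invalid-regexp 0 12 20)': the
unquoted search stops after ` 3x50 ', while the quoted one skips it and
stops after ` 3.50 '.
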
File: lispref.info, Node: Regexp Example, Prev: Syntax of Regexps, Up: Regular Expressions

Complex Regexp Example
----------------------

   Here is a complicated regexp, used by XEmacs to recognize the end of
a sentence together with any whitespace that follows. It is the value
of the variable `sentence-end'.

   First, we show the regexp as a string in Lisp syntax to distinguish
spaces from tab characters. The string constant begins and ends with a
double-quote. `\"' stands for a double-quote as part of the string,
`\\' for a backslash as part of the string, `\t' for a tab and `\n' for
a newline.

     "[.?!][]\"')}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"

   In contrast, if you evaluate the variable `sentence-end', you will
see the following:

     sentence-end
          =>
     "[.?!][]\"')}]*\\($\\| $\\|	\\|  \\)[ 	
     ]*"

   In this output, tab and newline appear as themselves.

   This regular expression contains four parts in succession and can be
deciphered as follows:

`[.?!]'
     The first part of the pattern is a character set that matches any
     one of three characters: period, question mark, and exclamation
     mark. The match must begin with one of these three characters.

`[]\"')}]*'
     The second part of the pattern matches any closing brackets,
     parentheses, braces, and quotation marks, zero or more of them,
     that may follow the period, question mark or exclamation mark.
     The `\"' is Lisp syntax for a double-quote in a string. The `*'
     at the end indicates that the immediately preceding regular
     expression (a character set, in this case) may be repeated zero or
     more times.

`\\($\\| $\\|\t\\|  \\)'
     The third part of the pattern matches the whitespace that follows
     the end of a sentence: the end of a line (optionally preceded by a
     space), or a tab, or two spaces. The double backslashes mark the
     parentheses and vertical bars as regular expression syntax; the
     parentheses delimit a group and the vertical bars separate
     alternatives. The dollar sign is used to match the end of a line.

`[ \t\n]*'
     Finally, the last part of the pattern matches any additional
     whitespace beyond the minimum needed to end a sentence.

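   To see the four parts working together, the sketch below applies the
same regexp to a few strings with `string-match'. It stores the regexp
in a hypothetical variable, `my-sentence-end', rather than relying on
the current value of `sentence-end', which may differ between
versions; `my-sentence-end-at' is likewise hypothetical.

```elisp
;; Sketch: the sentence-end regexp from this section, applied to strings.
;; `my-sentence-end' and `my-sentence-end-at' are hypothetical names.
(defvar my-sentence-end
  "[.?!][]\"')}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"
  "Copy of the sentence-end regexp discussed above.")

(defun my-sentence-end-at (string)
  "Return (START . END) of the first sentence end in STRING, or nil."
  (and (string-match my-sentence-end string)
       (cons (match-beginning 0) (match-end 0))))
```

   With this definition, `(my-sentence-end-at "It ended.  Then")' should
return `(8 . 11)', `(my-sentence-end-at "He asked \"Why?\"\tNext")'
should return `(13 . 16)' (the closing quote is absorbed by the second
part), `(my-sentence-end-at "Done.\nNext")' should return `(4 . 6)'
(the newline is absorbed by the last part), and `(my-sentence-end-at
"e.g. this")' should return `nil', because no period there is followed
by enough whitespace.
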
File: lispref.info, Node: Regexp Search, Next: POSIX Regexps, Prev: Regular Expressions, Up: Searching and Matching

Regular Expression Searching
============================

   In XEmacs, you can search for the next match for a regexp either
incrementally or not. Incremental search commands are described in
`The XEmacs Reference Manual'. *Note Regular Expression Search:
(emacs)Regexp Search. Here we describe only the search functions
useful in programs. The principal one is `re-search-forward'.

 - Command: re-search-forward regexp &optional limit noerror repeat
     This function searches forward in the current buffer for a string
     of text that is matched by the regular expression REGEXP. The
     function skips over any amount of text that is not matched by
     REGEXP, and leaves point at the end of the first match found. It
     returns the new value of point.

     If LIMIT is non-`nil' (it must be a position in the current
     buffer), then it is the upper bound to the search. No match
     extending after that position is accepted.

     What happens when the search fails depends on the value of
     NOERROR. If NOERROR is `nil', a `search-failed' error is
     signaled. If NOERROR is `t', `re-search-forward' does nothing and
     returns `nil'. If NOERROR is neither `nil' nor `t', then
     `re-search-forward' moves point to LIMIT (or the end of the
     buffer) and returns `nil'.

     If REPEAT is supplied (it must be a positive number), then the
     search is repeated that many times (each time starting at the end
     of the previous time's match). If these successive searches
     succeed, the function succeeds, moving point and returning its new
     value. Otherwise the search fails.

787 In the following example, point is initially before the `T'.
788 Evaluating the search call moves point to the end of that line
789 (between the `t' of `hat' and the newline).
---------- Buffer: foo ----------
I read "-!-The cat in the hat
comes back" twice.
---------- Buffer: foo ----------

(re-search-forward "[a-z]+" nil t 5)
     => 27

---------- Buffer: foo ----------
I read "The cat in the hat-!-
comes back" twice.
---------- Buffer: foo ----------
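The sketch below exercises LIMIT, NOERROR and REPEAT on the buffer text from the example above, inside a temporary buffer. The function name is hypothetical, and the expected values assume exactly that buffer text with point starting just before the `T'.

```elisp
;; Hypothetical helper: returns its results in evaluation order.
(defun my-re-search-forward-demo ()
  (with-temp-buffer
    (insert "I read \"The cat in the hat\ncomes back\" twice.")
    (goto-char 9)                        ; just before the `T'
    (list
     ;; REPEAT = 5: point ends after `hat'; the new point is returned.
     (re-search-forward "[a-z]+" nil t 5)
     ;; NOERROR nil: a failed search signals `search-failed'.
     (condition-case nil
         (re-search-forward "dog")
       (search-failed 'signaled))
     ;; NOERROR t: a failed search returns nil and leaves point alone.
     (re-search-forward "dog" nil t)
     (point)
     ;; NOERROR neither nil nor t: a failed search moves point to LIMIT.
     (re-search-forward "dog" 30 'move)
     (point))))

(my-re-search-forward-demo)   ; => (27 signaled nil 27 nil 30)
```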
804 - Command: re-search-backward regexp &optional limit noerror repeat
805 This function searches backward in the current buffer for a string
806 of text that is matched by the regular expression REGEXP, leaving
807 point at the beginning of the first text found.
809 This function is analogous to `re-search-forward', but they are not
810 simple mirror images. `re-search-forward' finds the match whose
811 beginning is as close as possible to the starting point. If
812 `re-search-backward' were a perfect mirror image, it would find the
813 match whose end is as close as possible. However, in fact it
814 finds the match whose beginning is as close as possible. The
815 reason is that matching a regular expression at a given spot
816 always works from beginning to end, and starts at a specified
819 A true mirror-image of `re-search-forward' would require a special
820 feature for matching regexps from end to beginning. It's not
821 worth the trouble of implementing that.
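A small sketch of this asymmetry, using a hypothetical function name: searching backward for `a+' from the end of `xaaa' finds only the last `a', because the match whose beginning is closest to point wins, whereas a forward search from the start matches all three.

```elisp
;; Hypothetical helper contrasting the two search directions.
(defun my-backward-vs-forward ()
  (with-temp-buffer
    (insert "xaaa")                      ; x is at 1, the a's at 2, 3 and 4
    (list
     ;; Backward from the end: the match beginning closest to point wins.
     (progn (goto-char (point-max))
            (re-search-backward "a+")
            (cons (match-beginning 0) (match-end 0)))
     ;; Forward from the start: the same regexp matches all three a's.
     (progn (goto-char (point-min))
            (re-search-forward "a+")
            (cons (match-beginning 0) (match-end 0))))))

(my-backward-vs-forward)   ; => ((4 . 5) (2 . 5))
```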
823 - Function: string-match regexp string &optional start
824 This function returns the index of the start of the first match for
825 the regular expression REGEXP in STRING, or `nil' if there is no
match. If START is non-`nil', the search starts at that index in
STRING.

For example,

(string-match
 "quick" "The quick brown fox jumped quickly.")
     => 4
(string-match
 "quick" "The quick brown fox jumped quickly." 8)
     => 27

The index of the first character of the string is 0, the index of
the second character is 1, and so on.

After this function returns, the index of the first character
beyond the match is available as `(match-end 0)'. *Note Match
Data::.

(string-match
 "quick" "The quick brown fox jumped quickly." 8)
     => 27

(match-end 0)
     => 32
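Because START lets a search resume where the previous match ended, `string-match' and `match-end' combine into a loop that finds every match. This is a sketch with a hypothetical function name; the guard against empty matches is an assumption of this example, not part of `string-match' itself.

```elisp
;; Hypothetical helper: start index of every match of REGEXP in STRING.
(defun my-all-match-positions (regexp string)
  (let ((start 0)
        (positions '()))
    (while (and (<= start (length string))
                (string-match regexp string start))
      (setq positions (cons (match-beginning 0) positions))
      ;; Resume after this match; step one character past an empty match
      ;; so that the loop always makes progress.
      (setq start (max (match-end 0) (1+ (match-beginning 0)))))
    (nreverse positions)))

(my-all-match-positions "quick" "The quick brown fox jumped quickly.")
;; => (4 27)
```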
852 - Function: split-string string &optional pattern
This function splits STRING into substrings delimited by PATTERN,
and returns a list of the substrings. If PATTERN is omitted, it
defaults to `[ \f\t\n\r\v]+', which means that it splits STRING by
white space.

(split-string "foo bar")
     => ("foo" "bar")

(split-string "something")
     => ("something")

(split-string "a:b:c" ":")
     => ("a" "b" "c")

(split-string ":a::b:c" ":")
     => ("" "a" "" "b" "c")
870 - Function: split-path path
871 This function splits a search path into a list of strings. The
path components are separated by the character specified by
873 `path-separator'. Under Unix, `path-separator' will normally be
874 `:', while under Windows, it will be `;'.
876 - Function: looking-at regexp
877 This function determines whether the text in the current buffer
878 directly following point matches the regular expression REGEXP.
879 "Directly following" means precisely that: the search is
880 "anchored" and it can succeed only starting with the first
character following point. The result is `t' if so, `nil'
otherwise.
884 This function does not move point, but it updates the match data,
885 which you can access using `match-beginning' and `match-end'.
888 In this example, point is located directly before the `T'. If it
889 were anywhere else, the result would be `nil'.
891 ---------- Buffer: foo ----------
I read "-!-The cat in the hat
comes back" twice.
---------- Buffer: foo ----------

(looking-at "The cat in the hat$")
     => t
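The sketch below, with a hypothetical function name, checks the two properties stated above on the same buffer text: `looking-at' is anchored at point and does not move it, yet it does set the match data.

```elisp
;; Hypothetical helper; the buffer text matches the example above.
(defun my-looking-at-demo ()
  (with-temp-buffer
    (insert "I read \"The cat in the hat\ncomes back\" twice.")
    (goto-char 9)                        ; directly before the `T'
    (list (looking-at "The \\(cat\\)")   ; anchored at point: t
          (point)                        ; point has not moved: 9
          (match-beginning 1)            ; match data was set: 13
          (looking-at "cat"))))          ; `cat' does not start at point: nil

(my-looking-at-demo)   ; => (t 9 13 nil)
```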
900 File: lispref.info, Node: POSIX Regexps, Next: Search and Replace, Prev: Regexp Search, Up: Searching and Matching
902 POSIX Regular Expression Searching
903 ==================================
905 The usual regular expression functions do backtracking when necessary
906 to handle the `\|' and repetition constructs, but they continue this
only until they find _some_ match. Then they succeed and report the
first match found.
910 This section describes alternative search functions which perform the
911 full backtracking specified by the POSIX standard for regular expression
912 matching. They continue backtracking until they have tried all
913 possibilities and found all matches, so they can report the longest
914 match, as required by POSIX. This is much slower, so use these
915 functions only when you really need the longest match.
917 In Emacs versions prior to 19.29, these functions did not exist, and
918 the functions described above implemented full POSIX backtracking.
920 - Function: posix-search-forward regexp &optional limit noerror repeat
921 This is like `re-search-forward' except that it performs the full
backtracking specified by the POSIX standard for regular expression
matching.
925 - Function: posix-search-backward regexp &optional limit noerror repeat
926 This is like `re-search-backward' except that it performs the full
backtracking specified by the POSIX standard for regular expression
matching.
930 - Function: posix-looking-at regexp
931 This is like `looking-at' except that it performs the full
backtracking specified by the POSIX standard for regular expression
matching.
935 - Function: posix-string-match regexp string &optional start
936 This is like `string-match' except that it performs the full
backtracking specified by the POSIX standard for regular expression
matching.
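The difference shows up with alternatives that share a prefix. In this sketch (hypothetical function name), the ordinary matcher stops at the first alternative that succeeds, while the POSIX variant keeps backtracking and reports the longest match.

```elisp
;; Hypothetical helper comparing where each match ends.
(defun my-posix-demo ()
  (list
   ;; Ordinary matcher: the first alternative, "a", succeeds and it stops.
   (progn (string-match "a\\|ab" "abc") (match-end 0))
   ;; POSIX matcher: keeps backtracking and reports the longest match, "ab".
   (progn (posix-string-match "a\\|ab" "abc") (match-end 0))))

(my-posix-demo)   ; => (1 2)
```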
File: lispref.info, Node: Search and Replace, Next: Match Data, Prev: POSIX Regexps, Up: Searching and Matching

Search and Replace
==================
946 - Function: perform-replace from-string replacements query-flag
947 regexp-flag delimited-flag &optional repeat-count map
948 This function is the guts of `query-replace' and related commands.
949 It searches for occurrences of FROM-STRING and replaces some or
950 all of them. If QUERY-FLAG is `nil', it replaces all occurrences;
951 otherwise, it asks the user what to do about each one.
953 If REGEXP-FLAG is non-`nil', then FROM-STRING is considered a
954 regular expression; otherwise, it must match literally. If
DELIMITED-FLAG is non-`nil', then only matches surrounded by
956 word boundaries are considered.
958 The argument REPLACEMENTS specifies what to replace occurrences
959 with. If it is a string, that string is used. It can also be a
960 list of strings, to be used in cyclic order.
962 If REPEAT-COUNT is non-`nil', it should be an integer. Then it
963 specifies how many times to use each of the strings in the
REPLACEMENTS list before advancing cyclically to the next one.
966 Normally, the keymap `query-replace-map' defines the possible user
967 responses for queries. The argument MAP, if non-`nil', is a
968 keymap to use instead of `query-replace-map'.
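A minimal non-interactive sketch, assuming point is at the start of the text to process (replacement runs from point to the end of the buffer). With QUERY-FLAG `nil' nothing is asked, DELIMITED-FLAG restricts replacements to whole words, and a list for REPLACEMENTS is used in cyclic order. The function name is hypothetical.

```elisp
;; Hypothetical helper; each call replaces from point to end of buffer.
(defun my-perform-replace-demo ()
  (list
   (with-temp-buffer
     (insert "cat concat cat")
     (goto-char (point-min))
     ;; QUERY-FLAG nil, REGEXP-FLAG nil, DELIMITED-FLAG t:
     ;; replace whole-word occurrences only, without asking.
     (perform-replace "cat" "dog" nil nil t)
     (buffer-string))
   (with-temp-buffer
     (insert "x x x")
     (goto-char (point-min))
     ;; A list of REPLACEMENTS is used in cyclic order.
     (perform-replace "x" '("1" "2") nil nil nil)
     (buffer-string))))
```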
970 - Variable: query-replace-map
971 This variable holds a special keymap that defines the valid user
972 responses for `query-replace' and related functions, as well as
973 `y-or-n-p' and `map-y-or-n-p'. It is unusual in two ways:
975 * The "key bindings" are not commands, just symbols that are
976 meaningful to the functions that use this map.
978 * Prefix keys are not supported; each key binding must be for a
979 single event key sequence. This is because the functions
do not use `read-key-sequence' to get the input; instead, they
981 read a single event and look it up "by hand."
983 Here are the meaningful "bindings" for `query-replace-map'. Several
984 of them are meaningful only for `query-replace' and friends.
`act'
     Do take the action being considered--in other words, "yes."

`skip'
     Do not take action for this question--in other words, "no."

`exit'
     Answer this question "no," and give up on the entire series of
     questions, assuming that the answers will be "no."

`act-and-exit'
     Answer this question "yes," and give up on the entire series of
     questions, assuming that subsequent answers will be "no."

`act-and-show'
     Answer this question "yes," but show the results--don't advance yet
     to the next question.

`automatic'
     Answer this question and all subsequent questions in the series
     with "yes," without further user interaction.

`backup'
     Move back to the previous place that a question was asked about.

`edit'
     Enter a recursive edit to deal with this question--instead of any
     other action that would normally be taken.

`delete-and-edit'
     Delete the text being considered, then enter a recursive edit to
     replace it.

`recenter'
     Redisplay and center the window, then ask the same question again.

`quit'
     Perform a quit right away. Only `y-or-n-p' and related functions
     use this answer.

`help'
     Display some help, then ask again.
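Since the bindings are plain symbols, a program can find out what a key means with `lookup-key'. This sketch assumes the usual default bindings for `y', `n', `!' and `q'; a site or user may have changed them, and the helper name is hypothetical.

```elisp
;; Hypothetical helper: the answer symbol bound to KEY, a one-character string.
(defun my-answer-for (key)
  (lookup-key query-replace-map key))

(mapcar #'my-answer-for '("y" "n" "!" "q"))
;; => (act skip automatic exit), with the default bindings
```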
1030 File: lispref.info, Node: Match Data, Next: Searching and Case, Prev: Search and Replace, Up: Searching and Matching
1035 XEmacs keeps track of the positions of the start and end of segments
1036 of text found during a regular expression search. This means, for
1037 example, that you can search for a complex pattern, such as a date in
an Rmail message, and then extract parts of the match under control of
the pattern.
1041 Because the match data normally describe the most recent search only,
1042 you must be careful not to do another search inadvertently between the
1043 search you wish to refer back to and the use of the match data. If you
1044 can't avoid another intervening search, you must save and restore the
1045 match data around it, to prevent it from being overwritten.
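A minimal sketch of the save-and-restore idiom, using the `save-match-data' macro (see the Saving Match Data node). The function name is hypothetical.

```elisp
;; Hypothetical helper: an inner search wrapped in `save-match-data'
;; leaves the outer match data intact.
(defun my-save-match-data-demo ()
  (let ((s "The quick fox"))
    (string-match "quick" s)             ; match data now covers 4..9
    (save-match-data
      (string-match "fox" s))            ; intervening search
    (list (match-beginning 0) (match-end 0))))

(my-save-match-data-demo)   ; => (4 9)
```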
* Menu:

* Simple Match Data:: Accessing single items of match data,
1050 such as where a particular subexpression started.
1051 * Replacing Match:: Replacing a substring that was matched.
1052 * Entire Match Data:: Accessing the entire match data at once, as a list.
1053 * Saving Match Data:: Saving and restoring the match data.
1056 File: lispref.info, Node: Simple Match Data, Next: Replacing Match, Up: Match Data
1058 Simple Match Data Access
1059 ------------------------
1061 This section explains how to use the match data to find out what was
1062 matched by the last search or match operation.
1064 You can ask about the entire matching text, or about a particular
1065 parenthetical subexpression of a regular expression. The COUNT
1066 argument in the functions below specifies which. If COUNT is zero, you
1067 are asking about the entire match. If COUNT is positive, it specifies
1068 which subexpression you want.
1070 Recall that the subexpressions of a regular expression are those
1071 expressions grouped with escaped parentheses, `\(...\)'. The COUNTth
1072 subexpression is found by counting occurrences of `\(' from the
1073 beginning of the whole regular expression. The first subexpression is
1074 numbered 1, the second 2, and so on. Only regular expressions can have
1075 subexpressions--after a simple string search, the only information
1076 available is about the entire match.
1078 - Function: match-string count &optional in-string
1079 This function returns, as a string, the text matched in the last
1080 search or match operation. It returns the entire text if COUNT is
1081 zero, or just the portion corresponding to the COUNTth
1082 parenthetical subexpression, if COUNT is positive. If COUNT is
out of range, or if that subexpression didn't match anything, the
value is `nil'.
1086 If the last such operation was done against a string with
1087 `string-match', then you should pass the same string as the
1088 argument IN-STRING. Otherwise, after a buffer search or match,
1089 you should omit IN-STRING or pass `nil' for it; but you should
1090 make sure that the current buffer when you call `match-string' is
1091 the one in which you did the searching or matching.
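To illustrate extracting parts of a match (the date example mentioned in the Match Data node), the sketch below pulls the fields out of a YYYY-MM-DD date, once from a string and once from a buffer. The date format, the regexp constant and both function names are hypothetical.

```elisp
;; Hypothetical helpers; the YYYY-MM-DD format is only an example.
(defconst my-date-regexp "\\([0-9]+\\)-\\([0-9]+\\)-\\([0-9]+\\)")

(defun my-parse-date (text)
  "Return (YEAR MONTH DAY) for the first date in the string TEXT, or nil."
  (when (string-match my-date-regexp text)
    ;; The match was done against a string, so pass TEXT as IN-STRING.
    (mapcar (lambda (n) (string-to-number (match-string n text)))
            '(1 2 3))))

(defun my-month-in-buffer ()
  "Return the month field of a date found in a buffer, as a string."
  (with-temp-buffer
    (insert "Date: 1995-07-14")
    (goto-char (point-min))
    (when (re-search-forward my-date-regexp nil t)
      ;; Buffer search: omit IN-STRING, and stay in the searched buffer.
      (match-string 2))))

(my-parse-date "Date: 1995-07-14")   ; => (1995 7 14)
(my-month-in-buffer)                 ; => "07"
```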
1093 - Function: match-beginning count
1094 This function returns the position of the start of text matched by
1095 the last regular expression searched for, or a subexpression of it.
1097 If COUNT is zero, then the value is the position of the start of
1098 the entire match. Otherwise, COUNT specifies a subexpression in
1099 the regular expression, and the value of the function is the
1100 starting position of the match for that subexpression.
1102 The value is `nil' for a subexpression inside a `\|' alternative
1103 that wasn't used in the match.
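A sketch of the `nil' case, with a hypothetical function name: only the second alternative takes part in the match, so subexpression 1 has no position.

```elisp
;; Hypothetical helper: only the second alternative takes part in the match.
(defun my-unused-group-demo ()
  (string-match "\\(a\\)\\|\\(b\\)" "b")
  (list (match-beginning 0)    ; whole match starts at 0
        (match-beginning 1)    ; group 1 was in the unused alternative: nil
        (match-beginning 2)))  ; group 2 matched "b" at 0

(my-unused-group-demo)   ; => (0 nil 0)
```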
1105 - Function: match-end count
1106 This function is like `match-beginning' except that it returns the
position of the end of the match, rather than the position of the
beginning.
1110 Here is an example of using the match data, with a comment showing
1111 the positions within the text:
(string-match "\\(qu\\)\\(ick\\)"
              "The quick fox jumped quickly.")
              ;0123456789
     => 4

(match-string 0 "The quick fox jumped quickly.")
     => "quick"
(match-string 1 "The quick fox jumped quickly.")
     => "qu"
(match-string 2 "The quick fox jumped quickly.")
     => "ick"
1125 (match-beginning 1) ; The beginning of the match
1126 => 4 ; with `qu' is at index 4.
1128 (match-beginning 2) ; The beginning of the match
1129 => 6 ; with `ick' is at index 6.
1131 (match-end 1) ; The end of the match
1132 => 6 ; with `qu' is at index 6.
1134 (match-end 2) ; The end of the match
1135 => 9 ; with `ick' is at index 9.
1137 Here is another example. Point is initially located at the beginning
1138 of the line. Searching moves point to between the space and the word
1139 `in'. The beginning of the entire match is at the 9th character of the
1140 buffer (`T'), and the beginning of the match for the first
1141 subexpression is at the 13th character (`c').
(list
  (re-search-forward "The \\(cat \\)")
  (match-beginning 0)
  (match-beginning 1))
     => (17 9 13)

---------- Buffer: foo ----------
I read "The cat -!-in the hat comes back" twice.
        ^   ^
        9  13
---------- Buffer: foo ----------
1155 (In this case, the index returned is a buffer position; the first
1156 character of the buffer counts as 1.)
1159 File: lispref.info, Node: Replacing Match, Next: Entire Match Data, Prev: Simple Match Data, Up: Match Data
1161 Replacing the Text That Matched
1162 -------------------------------
This function replaces the text matched by the last search with
REPLACEMENT.
- Function: replace-match replacement &optional fixedcase literal string
1169 This function replaces the text in the buffer (or in STRING) that
was matched by the last search. It replaces that text with
REPLACEMENT.
1173 If you did the last search in a buffer, you should specify `nil'
1174 for STRING. Then `replace-match' does the replacement by editing
the buffer; it leaves point at the end of the replacement text,
and returns `t'.
1178 If you did the search in a string, pass the same string as STRING.
1179 Then `replace-match' does the replacement by constructing and
1180 returning a new string.
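Combining `re-search-forward' with `replace-match' gives the usual program-level replacement loop. This is a sketch with a hypothetical function name. REGEXP must not match the empty string, or the loop would never advance.

```elisp
;; Hypothetical helper: replace every match of REGEXP after point.
;; REGEXP must not match the empty string, or the loop never advances.
(defun my-replace-all (regexp replacement)
  (let ((count 0))
    (while (re-search-forward regexp nil t)
      ;; STRING omitted: edit the buffer, leaving point after the new text.
      (replace-match replacement t nil)
      (setq count (1+ count)))
    count))

(with-temp-buffer
  (insert "cat, cat and dog")
  (goto-char (point-min))
  (list (my-replace-all "cat" "lion") (buffer-string)))
;; => (2 "lion, lion and dog")
```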
1182 If FIXEDCASE is non-`nil', then the case of the replacement text
1183 is not changed; otherwise, the replacement text is converted to a
1184 different case depending upon the capitalization of the text to be
1185 replaced. If the original text is all upper case, the replacement
1186 text is converted to upper case. If the first word of the
1187 original text is capitalized, then the first word of the
1188 replacement text is capitalized. If the original text contains
1189 just one word, and that word is a capital letter, `replace-match'
1190 considers this a capitalized first word rather than all upper case.
1192 If `case-replace' is `nil', then case conversion is not done,
regardless of the value of FIXEDCASE. *Note Searching and Case::.
1195 If LITERAL is non-`nil', then REPLACEMENT is inserted exactly as
1196 it is, the only alterations being case changes as needed. If it
1197 is `nil' (the default), then the character `\' is treated
1198 specially. If a `\' appears in REPLACEMENT, then it must be part
1199 of one of the following sequences:
`\&'
     `\&' stands for the entire text being replaced.

`\N'
     `\N', where N is a digit, stands for the text that matched
     the Nth subexpression in the original regexp. Subexpressions
     are those expressions grouped inside `\(...\)'.

`\\'
     `\\' stands for a single `\' in the replacement text.
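The sketch below (hypothetical function name) works the backslash sequences and case conversion through `replace-match' on a string. The expected values follow GNU Emacs semantics. The text above says XEmacs also consults `case-replace'; since that variable normally defaults to non-`nil', it should not change these results.

```elisp
;; Hypothetical helper exercising REPLACEMENT syntax on a string.
(defun my-replace-match-demo ()
  (let ((s "The quick fox"))
    (string-match "\\(qu\\)\\(ick\\)" s)
    (list
     ;; \& is the whole match and \2 the second subexpression.
     (replace-match "[\\&/\\2]" t nil s)
     ;; LITERAL non-nil: the backslash is inserted unchanged.
     (replace-match "\\&" t t s)
     ;; \\ in REPLACEMENT stands for a single backslash.
     (replace-match "a\\\\b" t nil s)
     ;; FIXEDCASE nil: an all-caps match gives an all-caps replacement.
     (let ((case-fold-search t)
           (caps "HELLO there"))
       (string-match "hello" caps)
       (replace-match "bye" nil nil caps)))))

(my-replace-match-demo)
;; => ("The [quick/ick] fox" "The \\& fox" "The a\\b fox" "BYE there")
```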