1 This is ../info/lispref.info, produced by makeinfo version 4.0 from
4 INFO-DIR-SECTION XEmacs Editor
6 * Lispref: (lispref). XEmacs Lisp Reference Manual.
11 GNU Emacs Lisp Reference Manual Second Edition (v2.01), May 1993 GNU
12 Emacs Lisp Reference Manual Further Revised (v2.02), August 1993 Lucid
13 Emacs Lisp Reference Manual (for 19.10) First Edition, March 1994
14 XEmacs Lisp Programmer's Manual (for 19.12) Second Edition, April 1995
15 GNU Emacs Lisp Reference Manual v2.4, June 1995 XEmacs Lisp
16 Programmer's Manual (for 19.13) Third Edition, July 1995 XEmacs Lisp
17 Reference Manual (for 19.14 and 20.0) v3.1, March 1996 XEmacs Lisp
18 Reference Manual (for 19.15 and 20.1, 20.2, 20.3) v3.2, April, May,
19 November 1997 XEmacs Lisp Reference Manual (for 21.0) v3.3, April 1998
21 Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995 Free Software
22 Foundation, Inc. Copyright (C) 1994, 1995 Sun Microsystems, Inc.
23 Copyright (C) 1995, 1996 Ben Wing.
25 Permission is granted to make and distribute verbatim copies of this
26 manual provided the copyright notice and this permission notice are
27 preserved on all copies.
29 Permission is granted to copy and distribute modified versions of
30 this manual under the conditions for verbatim copying, provided that the
31 entire resulting derived work is distributed under the terms of a
32 permission notice identical to this one.
34 Permission is granted to copy and distribute translations of this
35 manual into another language, under the above conditions for modified
36 versions, except that this permission notice may be stated in a
37 translation approved by the Foundation.
39 Permission is granted to copy and distribute modified versions of
40 this manual under the conditions for verbatim copying, provided also
41 that the section entitled "GNU General Public License" is included
42 exactly as in the original, and provided that the entire resulting
43 derived work is distributed under the terms of a permission notice
44 identical to this one.
46 Permission is granted to copy and distribute translations of this
47 manual into another language, under the above conditions for modified
48 versions, except that the section entitled "GNU General Public License"
49 may be included in a translation approved by the Free Software
50 Foundation instead of in the original English.
53 File: lispref.info, Node: Examining Properties, Next: Changing Properties, Up: Text Properties
55 Examining Text Properties
56 -------------------------
58 The simplest way to examine text properties is to ask for the value
59 of a particular property of a particular character. For that, use
60 `get-text-property'. Use `text-properties-at' to get the entire
61 property list of a character. *Note Property Search::, for functions
62 to examine the properties of a number of characters at once.
These functions handle both strings and buffers. (Keep in mind that
positions in a string start from 0, whereas positions in a buffer start
from 1.)
68 - Function: get-text-property pos prop &optional object
69 This function returns the value of the PROP property of the
70 character after position POS in OBJECT (a buffer or string). The
71 argument OBJECT is optional and defaults to the current buffer.
73 - Function: get-char-property pos prop &optional object
74 This function is like `get-text-property', except that it checks
75 all extents, not just text-property extents.
78 - Function: text-properties-at position &optional object
79 This function returns the entire property list of the character at
80 POSITION in the string or buffer OBJECT. If OBJECT is `nil', it
81 defaults to the current buffer.
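As a minimal sketch of the functions above applied to a string (where positions start from 0), the following hypothetical helper `my-props-demo' puts a `face' property on the first five characters of a fresh string and then examines it. The helper name and the sample text are illustrative assumptions, not part of XEmacs.

```elisp
(defun my-props-demo ()
  "Illustrative helper: return property lookups on a sample string."
  ;; Copy the literal so that we never modify a constant string.
  (let ((s (copy-sequence "hello world")))
    (put-text-property 0 5 'face 'bold s)
    (list (get-text-property 0 'face s)   ; character inside the range
          (get-text-property 5 'face s)   ; first character past the range
          (text-properties-at 0 s))))     ; whole property list at 0
```

The first element of the result should be `bold' and the second `nil', since the character at position 5 lies outside the range that received the property.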
83 - Variable: default-text-properties
84 This variable holds a property list giving default values for text
85 properties. Whenever a character does not specify a value for a
property, the value stored in this list is used instead. Here is
an example:
     (setq default-text-properties '(foo 69))
     ;; Make sure character 1 has no properties of its own.
     (set-text-properties 1 2 nil)
     ;; What we get, when we ask, is the default value.
     (get-text-property 1 'foo)
          => 69
97 File: lispref.info, Node: Changing Properties, Next: Property Search, Prev: Examining Properties, Up: Text Properties
99 Changing Text Properties
100 ------------------------
102 The primitives for changing properties apply to a specified range of
103 text. The function `set-text-properties' (see end of section) sets the
104 entire property list of the text in that range; more often, it is
useful to add, change, or delete just certain properties specified by
name.
108 Since text properties are considered part of the buffer's contents,
109 and can affect how the buffer looks on the screen, any change in the
110 text properties is considered a buffer modification. Buffer text
111 property changes are undoable (*note Undo::).
113 - Function: put-text-property start end prop value &optional object
114 This function sets the PROP property to VALUE for the text between
115 START and END in the string or buffer OBJECT. If OBJECT is `nil',
116 it defaults to the current buffer.
118 - Function: add-text-properties start end props &optional object
119 This function modifies the text properties for the text between
120 START and END in the string or buffer OBJECT. If OBJECT is `nil',
121 it defaults to the current buffer.
123 The argument PROPS specifies which properties to change. It
124 should have the form of a property list (*note Property Lists::):
125 a list whose elements include the property names followed
126 alternately by the corresponding values.
128 The return value is `t' if the function actually changed some
129 property's value; `nil' otherwise (if PROPS is `nil' or its values
130 agree with those in the text).
132 For example, here is how to set the `comment' and `face'
133 properties of a range of text:
135 (add-text-properties START END
136 '(comment t face highlight))
138 - Function: remove-text-properties start end props &optional object
139 This function deletes specified text properties from the text
140 between START and END in the string or buffer OBJECT. If OBJECT
141 is `nil', it defaults to the current buffer.
143 The argument PROPS specifies which properties to delete. It
144 should have the form of a property list (*note Property Lists::):
145 a list whose elements are property names alternating with
146 corresponding values. But only the names matter--the values that
accompany them are ignored. For example, here's how to remove the
`face' property:
     (remove-text-properties START END '(face nil))
152 The return value is `t' if the function actually changed some
153 property's value; `nil' otherwise (if PROPS is `nil' or if no
154 character in the specified text had any of those properties).
156 - Function: set-text-properties start end props &optional object
157 This function completely replaces the text property list for the
158 text between START and END in the string or buffer OBJECT. If
159 OBJECT is `nil', it defaults to the current buffer.
161 The argument PROPS is the new property list. It should be a list
whose elements are property names alternating with corresponding
values.
165 After `set-text-properties' returns, all the characters in the
166 specified range have identical properties.
168 If PROPS is `nil', the effect is to get rid of all properties from
169 the specified range of text. Here's an example:
171 (set-text-properties START END nil)
See also the function `buffer-substring-without-properties' (*note
Buffer Contents::), which copies text from the buffer but does not copy
its properties.
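The return values described above can be observed with a short experiment. The following is only a sketch, using a hypothetical helper `my-change-props-demo' and a scratch string; it assumes the documented behavior that `add-text-properties' and `remove-text-properties' return `t' only when something actually changed.

```elisp
(defun my-change-props-demo ()
  "Illustrative helper: collect return values of property changes."
  (let ((s (copy-sequence "abcdef")))
    (list
     ;; Adds two new properties, so something changes.
     (add-text-properties 0 3 '(comment t face highlight) s)
     ;; `comment' is already t on this range, so nothing changes.
     (add-text-properties 0 3 '(comment t) s)
     ;; Removes `face'; the value in the list (nil) is ignored.
     (remove-text-properties 0 3 '(face nil) s)
     ;; `face' is gone, `comment' is still there.
     (get-text-property 0 'face s)
     (get-text-property 0 'comment s))))
```

Under those assumptions the result is `(t nil t nil t)'.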
178 File: lispref.info, Node: Property Search, Next: Special Properties, Prev: Changing Properties, Up: Text Properties
180 Property Search Functions
181 -------------------------
183 In typical use of text properties, most of the time several or many
184 consecutive characters have the same value for a property. Rather than
185 writing your programs to examine characters one by one, it is much
186 faster to process chunks of text that have the same property value.
188 Here are functions you can use to do this. They use `eq' for
comparing property values. In all cases, OBJECT defaults to the
current buffer.
192 For high performance, it's very important to use the LIMIT argument
193 to these functions, especially the ones that search for a single
194 property--otherwise, they may spend a long time scanning to the end of
195 the buffer, if the property you are interested in does not change.
197 Remember that a position is always between two characters; the
198 position returned by these functions is between two characters with
199 different properties.
201 - Function: next-property-change pos &optional object limit
202 The function scans the text forward from position POS in the
203 string or buffer OBJECT till it finds a change in some text
204 property, then returns the position of the change. In other
205 words, it returns the position of the first character beyond POS
whose properties are not identical to those of the character just
after POS.
209 If LIMIT is non-`nil', then the scan ends at position LIMIT. If
210 there is no property change before that point,
211 `next-property-change' returns LIMIT.
213 The value is `nil' if the properties remain unchanged all the way
214 to the end of OBJECT and LIMIT is `nil'. If the value is
215 non-`nil', it is a position greater than or equal to POS. The
216 value equals POS only when LIMIT equals POS.
218 Here is an example of how to scan the buffer by chunks of text
219 within which all properties are constant:
     (while (not (eobp))
       (let ((plist (text-properties-at (point)))
             (next-change
              (or (next-property-change (point) (current-buffer))
                  (point-max))))
         ;; Process text from point to NEXT-CHANGE...
         (goto-char next-change)))
229 - Function: next-single-property-change pos prop &optional object limit
230 The function scans the text forward from position POS in the
231 string or buffer OBJECT till it finds a change in the PROP
232 property, then returns the position of the change. In other
233 words, it returns the position of the first character beyond POS
whose PROP property differs from that of the character just after
POS.
237 If LIMIT is non-`nil', then the scan ends at position LIMIT. If
238 there is no property change before that point,
239 `next-single-property-change' returns LIMIT.
241 The value is `nil' if the property remains unchanged all the way to
242 the end of OBJECT and LIMIT is `nil'. If the value is non-`nil',
it is a position greater than or equal to POS; it equals POS only
if LIMIT equals POS.
246 - Function: previous-property-change pos &optional object limit
247 This is like `next-property-change', but scans back from POS
248 instead of forward. If the value is non-`nil', it is a position
249 less than or equal to POS; it equals POS only if LIMIT equals POS.
- Function: previous-single-property-change pos prop &optional object limit
253 This is like `next-single-property-change', but scans back from
254 POS instead of forward. If the value is non-`nil', it is a
position less than or equal to POS; it equals POS only if LIMIT
equals POS.
258 - Function: text-property-any start end prop value &optional object
259 This function returns non-`nil' if at least one character between
260 START and END has a property PROP whose value is VALUE. More
261 precisely, it returns the position of the first such character.
262 Otherwise, it returns `nil'.
264 The optional fifth argument, OBJECT, specifies the string or
265 buffer to scan. Positions are relative to OBJECT. The default
266 for OBJECT is the current buffer.
268 - Function: text-property-not-all start end prop value &optional object
269 This function returns non-`nil' if at least one character between
270 START and END has a property PROP whose value differs from VALUE.
271 More precisely, it returns the position of the first such
272 character. Otherwise, it returns `nil'.
274 The optional fifth argument, OBJECT, specifies the string or
275 buffer to scan. Positions are relative to OBJECT. The default
276 for OBJECT is the current buffer.
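To illustrate the advice about LIMIT, here is a sketch of a chunk-by-chunk scan over a string using `next-single-property-change'. The function name `my-property-runs' is hypothetical. Because LIMIT is always the length of the string, the search never returns `nil', and each step advances past at least one character.

```elisp
(defun my-property-runs (string prop)
  "Return a list of (START END VALUE) runs of PROP in STRING.
Each run is a maximal stretch of characters whose PROP values are `eq'."
  (let ((pos 0)
        (len (length string))
        (runs nil))
    (while (< pos len)
      ;; With LIMIT supplied, the result is a position, never nil.
      (let ((next (next-single-property-change pos prop string len)))
        (setq runs (cons (list pos next (get-text-property pos prop string))
                         runs))
        (setq pos next)))
    (nreverse runs)))
```

For a six-character string whose first three characters carry the property `foo' with value 1, this should return `((0 3 1) (3 6 nil))'.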
279 File: lispref.info, Node: Special Properties, Next: Saving Properties, Prev: Property Search, Up: Text Properties
281 Properties with Special Meanings
282 --------------------------------
284 The predefined properties are the same as those for extents. *Note
288 File: lispref.info, Node: Saving Properties, Prev: Special Properties, Up: Text Properties
290 Saving Text Properties in Files
291 -------------------------------
293 You can save text properties in files, and restore text properties
294 when inserting the files, using these two hooks:
296 - Variable: write-region-annotate-functions
297 This variable's value is a list of functions for `write-region' to
298 run to encode text properties in some fashion as annotations to
299 the text being written in the file. *Note Writing to Files::.
301 Each function in the list is called with two arguments: the start
302 and end of the region to be written. These functions should not
303 alter the contents of the buffer. Instead, they should return
304 lists indicating annotations to write in the file in addition to
305 the text in the buffer.
307 Each function should return a list of elements of the form
308 `(POSITION . STRING)', where POSITION is an integer specifying the
309 relative position in the text to be written, and STRING is the
310 annotation to add there.
312 Each list returned by one of these functions must be already
313 sorted in increasing order by POSITION. If there is more than one
function, `write-region' merges the lists destructively into one
sorted list.
317 When `write-region' actually writes the text from the buffer to the
318 file, it intermixes the specified annotations at the corresponding
319 positions. All this takes place without modifying the buffer.
321 - Variable: after-insert-file-functions
322 This variable holds a list of functions for `insert-file-contents'
323 to call after inserting a file's contents. These functions should
324 scan the inserted text for annotations, and convert them to the
325 text properties they stand for.
327 Each function receives one argument, the length of the inserted
328 text; point indicates the start of that text. The function should
329 scan that text for annotations, delete them, and create the text
330 properties that the annotations specify. The function should
331 return the updated length of the inserted text, as it stands after
332 those changes. The value returned by one function becomes the
333 argument to the next function.
These functions should always return with point at the beginning of
the inserted text.
338 The intended use of `after-insert-file-functions' is for converting
339 some sort of textual annotations into actual text properties. But
340 other uses may be possible.
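The contract for `after-insert-file-functions' (receive the length, leave point at the start, return the updated length) can be sketched as follows. The annotation syntax `{{TEXT}}', the property name `my-marked', and the function name `my-decode-annotations' are all invented for this illustration; no such format is defined by XEmacs.

```elisp
(defun my-decode-annotations (length)
  "Hypothetical decoder: turn {{TEXT}} into TEXT carrying `my-marked'.
LENGTH is the length of the inserted text, which starts at point.
Return the length of that text after the annotations are removed."
  (let ((start (point))
        (end (copy-marker (+ (point) length))))
    (save-match-data
      (while (re-search-forward "{{\\([^{}]*\\)}}" (marker-position end) t)
        ;; Record every position first; deletions shift later text.
        (let ((beg (match-beginning 0))
              (text-beg (match-beginning 1))
              (text-end (match-end 1))
              (all-end (match-end 0)))
          (delete-region text-end all-end)   ; closing delimiter first
          (delete-region beg text-beg)       ; then the opening delimiter
          (let ((new-end (+ beg (- text-end text-beg))))
            (put-text-property beg new-end 'my-marked t)
            (goto-char new-end)))))
    ;; Leave point at the beginning of the inserted text.
    (goto-char start)
    (prog1 (- (marker-position end) start)
      (set-marker end nil))))
```

Such a function would be installed with something like `(add-hook 'after-insert-file-functions 'my-decode-annotations)'. The marker `end' tracks the end of the inserted text as delimiters are deleted, which is what makes the returned length correct.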
342 We invite users to write Lisp programs to store and retrieve text
343 properties in files, using these hooks, and thus to experiment with
344 various data formats and find good ones. Eventually we hope users will
345 produce good, general extensions we can install in Emacs.
347 We suggest not trying to handle arbitrary Lisp objects as property
348 names or property values--because a program that general is probably
349 difficult to write, and slow. Instead, choose a set of possible data
350 types that are reasonably flexible, and not too hard to encode.
352 *Note Format Conversion::, for a related feature.
355 File: lispref.info, Node: Substitution, Next: Registers, Prev: Text Properties, Up: Text
357 Substituting for a Character Code
358 =================================
360 The following functions replace characters within a specified region
361 based on their character codes.
- Function: subst-char-in-region start end old-char new-char &optional noundo
365 This function replaces all occurrences of the character OLD-CHAR
366 with the character NEW-CHAR in the region of the current buffer
367 defined by START and END.
369 If NOUNDO is non-`nil', then `subst-char-in-region' does not
370 record the change for undo and does not mark the buffer as
371 modified. This feature is used for controlling selective display
372 (*note Selective Display::).
374 `subst-char-in-region' does not move point and returns `nil'.
376 ---------- Buffer: foo ----------
377 This is the contents of the buffer before.
378 ---------- Buffer: foo ----------
     (subst-char-in-region 1 20 ?i ?X)
          => nil
383 ---------- Buffer: foo ----------
384 ThXs Xs the contents of the buffer before.
385 ---------- Buffer: foo ----------
387 - Function: translate-region start end table
388 This function applies a translation table to the characters in the
389 buffer between positions START and END. The translation table
390 TABLE can be either a string, a vector, or a char-table.
392 If TABLE is a string, its Nth element is the mapping for the
393 character with code N.
395 If TABLE is a vector, its Nth element is the mapping for character
396 with code N. Legal mappings are characters, strings, or `nil'
397 (meaning don't replace.)
399 If TABLE is a char-table, its elements describe the mapping
400 between characters and their replacements. The char-table should
401 be of type `char' or `generic'.
403 When the TABLE is a string or vector and its length is less than
404 the total number of characters (256 without Mule), any characters
with codes larger than the length of TABLE are not altered by the
translation.
408 The return value of `translate-region' is the number of characters
409 that were actually changed by the translation. This does not
count characters that were mapped into themselves in the
translation table.
413 *NOTE*: Prior to XEmacs 21.2, the TABLE argument was allowed only
414 to be a string. This is still the case in FSF Emacs.
416 The following example creates a char-table that is passed to
417 `translate-region', which translates character `a' to `the letter
418 a', removes character `b', and translates character `c' to newline.
420 ---------- Buffer: foo ----------
421 Here is a sentence in the buffer.
422 ---------- Buffer: foo ----------
424 (let ((table (make-char-table 'generic)))
425 (put-char-table ?a "the letter a" table)
426 (put-char-table ?b "" table)
427 (put-char-table ?c ?\n table)
428 (translate-region (point-min) (point-max) table))
431 ---------- Buffer: foo ----------
Here is the letter a senten
e in the uffer.
434 ---------- Buffer: foo ----------
437 File: lispref.info, Node: Registers, Next: Transposition, Prev: Substitution, Up: Text
Registers
=========
A register is a sort of variable used in XEmacs editing that can
443 hold a marker, a string, a rectangle, a window configuration (of one
444 frame), or a frame configuration (of all frames). Each register is
445 named by a single character. All characters, including control and
446 meta characters (but with the exception of `C-g'), can be used to name
447 registers. Thus, there are 255 possible registers. A register is
448 designated in Emacs Lisp by a character that is its name.
The functions in this section return unpredictable values unless
otherwise stated.
453 - Variable: register-alist
454 This variable is an alist of elements of the form `(NAME .
455 CONTENTS)'. Normally, there is one element for each XEmacs
456 register that has been used.
458 The object NAME is a character (an integer) identifying the
459 register. The object CONTENTS is a string, marker, or list
460 representing the register contents. A string represents text
461 stored in the register. A marker represents a position. A list
represents a rectangle; its elements are strings, one per line of
the rectangle.
465 - Function: get-register reg
466 This function returns the contents of the register REG, or `nil'
467 if it has no contents.
469 - Function: set-register reg value
470 This function sets the contents of register REG to VALUE. A
471 register can be set to any value, but the other register functions
472 expect only certain data types. The return value is VALUE.
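As a small sketch of the two accessors above, the hypothetical helper `my-register-demo' stores a string in register `a' and reads it back. The register name and the text are arbitrary choices for the illustration.

```elisp
(defun my-register-demo ()
  "Illustrative helper: store text in register `a' and fetch it again."
  ;; A string is one of the data types the other register
  ;; functions (such as `insert-register') know how to handle.
  (set-register ?a "saved text")
  (get-register ?a))
```

Calling `(my-register-demo)' should return the string just stored. A register that has never been set yields `nil' from `get-register'.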
474 - Command: view-register reg
475 This command displays what is contained in register REG.
477 - Command: insert-register reg &optional beforep
This command inserts the contents of register REG into the current
buffer.
481 Normally, this command puts point before the inserted text, and the
482 mark after it. However, if the optional second argument BEFOREP
483 is non-`nil', it puts the mark before and point after. You can
484 pass a non-`nil' second argument BEFOREP to this function
485 interactively by supplying any prefix argument.
487 If the register contains a rectangle, then the rectangle is
488 inserted with its upper left corner at point. This means that
text is inserted in the current line and underneath it on
successive lines.
492 If the register contains something other than saved text (a
493 string) or a rectangle (a list), currently useless things happen.
494 This may be changed in the future.
497 File: lispref.info, Node: Transposition, Next: Change Hooks, Prev: Registers, Up: Text
499 Transposition of Text
500 =====================
502 This subroutine is used by the transposition commands.
- Function: transpose-regions start1 end1 start2 end2 &optional leave-markers
506 This function exchanges two nonoverlapping portions of the buffer.
507 Arguments START1 and END1 specify the bounds of one portion and
508 arguments START2 and END2 specify the bounds of the other portion.
510 Normally, `transpose-regions' relocates markers with the transposed
511 text; a marker previously positioned within one of the two
512 transposed portions moves along with that portion, thus remaining
513 between the same two characters in their new position. However,
514 if LEAVE-MARKERS is non-`nil', `transpose-regions' does not do
515 this--it leaves all markers unrelocated.
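A minimal sketch of `transpose-regions' in a scratch buffer follows. The helper name `my-swap-demo' and the sample text are assumptions made for the illustration. The two portions are the words on either side of the space, so the space itself stays where it is.

```elisp
(defun my-swap-demo ()
  "Illustrative helper: swap the two words of \"hello world\"."
  (with-temp-buffer
    (insert "hello world")
    ;; \"hello\" occupies positions 1-6 and \"world\" positions 7-12.
    (transpose-regions 1 6 7 12)
    (buffer-string)))
```

The resulting buffer text is "world hello".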
518 File: lispref.info, Node: Change Hooks, Next: Transformations, Prev: Transposition, Up: Text
Change Hooks
============
These hook variables let you arrange to take notice of all changes in
524 all buffers (or in a particular buffer, if you make them buffer-local).
526 The functions you use in these hooks should save and restore the
527 match data if they do anything that uses regular expressions;
528 otherwise, they will interfere in bizarre ways with the editing
529 operations that call them.
531 Buffer changes made while executing the following hooks don't
532 themselves cause any change hooks to be invoked.
534 - Variable: before-change-functions
This variable holds a list of functions to call before any buffer
modification. Each function gets two arguments, the beginning and
end of the region that is about to change, represented as
integers. The buffer that is about to change is always the
current buffer.
541 - Variable: after-change-functions
This variable holds a list of functions to call after any buffer
modification. Each function receives three arguments: the
beginning and end of the region just changed, and the length of
the text that existed before the change. (To get the current
length, subtract the region beginning from the region end.) All
three arguments are integers. The buffer that has just changed
is always the current buffer.
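The three arguments passed to an `after-change-functions' entry can be sketched with a function that records each change. The names `my-change-log' and `my-record-change' are hypothetical. This function uses no regular expressions, so it does not need to save the match data; one that did would wrap its body in `save-match-data'.

```elisp
(defvar my-change-log nil
  "Hypothetical list of recorded changes, most recent first.
Each element is (BEG END OLD-LENGTH NEW-LENGTH).")

(defun my-record-change (beg end old-length)
  "Record a change to the region BEG..END that replaced OLD-LENGTH chars."
  ;; The new length is the region end minus the region beginning.
  (setq my-change-log
        (cons (list beg end old-length (- end beg))
              my-change-log)))

(add-hook 'after-change-functions 'my-record-change)
```

After three characters are inserted at position 1 of a buffer, the function is called with the arguments 1, 4 and 0, and the recorded entry is `(1 4 0 3)'.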
550 - Variable: before-change-function
551 This obsolete variable holds one function to call before any buffer
552 modification (or `nil' for no function). It is called just like
553 the functions in `before-change-functions'.
555 - Variable: after-change-function
556 This obsolete variable holds one function to call after any buffer
557 modification (or `nil' for no function). It is called just like
558 the functions in `after-change-functions'.
560 - Variable: first-change-hook
561 This variable is a normal hook that is run whenever a buffer is
562 changed that was previously in the unmodified state.
565 File: lispref.info, Node: Transformations, Prev: Change Hooks, Up: Text
567 Textual transformations--MD5 and base64 support
568 ===============================================
570 Some textual operations inherently require examining each character
571 in turn, and performing arithmetic operations on them. Such operations
572 can, of course, be implemented in Emacs Lisp, but tend to be very slow
573 for large portions of text or data. This is why some of them are
574 implemented in C, with an appropriate interface for Lisp programmers.
575 Examples of algorithms thus provided are MD5 and base64 support.
MD5 is an algorithm for calculating message digests, as described in
RFC 1321. Given a message of arbitrary length, MD5 produces a 128-bit
"fingerprint" ("message digest") corresponding to that message. MD5
was designed so that it would be computationally infeasible to produce
two messages having the same digest, or to produce a message having a
prespecified target digest. Practical collision attacks have since
been published, so MD5 is better suited to checksums and
interoperability than to new security-sensitive uses. MD5 has been
used heavily by various authentication schemes.
The Emacs Lisp interface to MD5 consists of a single function, `md5':
586 - Function: md5 object &optional start end
This function returns the MD5 message digest of OBJECT, a buffer
or string.
590 Optional arguments START and END denote positions for computing
591 the digest of a portion of OBJECT.
593 Some examples of usage:
595 ;; Calculate the digest of the entire buffer
596 (md5 (current-buffer))
597 => "8842b04362899b1cda8d2d126dc11712"
599 ;; Calculate the digest of the current line
600 (md5 (current-buffer) (point-at-bol) (point-at-eol))
601 => "60614d21e9dee27dfdb01fa4e30d6d00"
603 ;; Calculate the digest of your name and email address
604 (md5 (concat (format "%s <%s>" (user-full-name) user-mail-address)))
605 => "0a2188c40fd38922d941fe6032fce516"
607 Base64 is a portable encoding for arbitrary sequences of octets, in a
608 form that need not be readable by humans. It uses a 65-character subset
609 of US-ASCII, as described in rfc2045. Base64 is used by MIME to encode
610 binary bodies, and to encode binary characters in message headers.
612 The Lisp interface to base64 consists of four functions:
614 - Function: base64-encode-region beg end &optional no-line-break
615 This function encodes the region between BEG and END of the
616 current buffer to base64 format. This means that the original
617 region is deleted, and replaced with its base64 equivalent.
619 Normally, encoded base64 output is multi-line, with 76-character
620 lines. If NO-LINE-BREAK is non-`nil', newlines will not be
621 inserted, resulting in single-line output.
623 Mule note: you should make sure that you convert the multibyte
624 characters (those that do not fit into 0-255 range) to something
else, because they cannot be meaningfully converted to base64. If
`base64-encode-region' encounters such characters, it will
signal an error.
629 `base64-encode-region' returns the length of the encoded text.
631 ;; Encode the whole buffer in base64
632 (base64-encode-region (point-min) (point-max))
634 The function can also be used interactively, in which case it
635 works on the currently active region.
637 - Function: base64-encode-string string
This function encodes STRING to base64, and returns the encoded
string.
641 For Mule, the same considerations apply as for
642 `base64-encode-region'.
     (base64-encode-string "fubar")
          => "ZnViYXI="
647 - Function: base64-decode-region beg end
648 This function decodes the region between BEG and END of the
649 current buffer. The region should be in base64 encoding.
651 If the region was decoded correctly, `base64-decode-region' returns
the length of the decoded region. If the decoding failed, `nil' is
returned.
655 ;; Decode a base64 buffer, and replace it with the decoded version
656 (base64-decode-region (point-min) (point-max))
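The two region functions can be combined into a round trip, sketched below with a hypothetical helper `my-base64-roundtrip' working in a scratch buffer. It assumes TEXT contains only characters in the 0-255 range, as the Mule note above requires.

```elisp
(defun my-base64-roundtrip (text)
  "Illustrative helper: encode TEXT in a scratch buffer, then decode it.
Return a list (ENCODED DECODED)."
  (with-temp-buffer
    (insert text)
    ;; The buffer text is replaced by its base64 equivalent.
    (base64-encode-region (point-min) (point-max))
    (let ((encoded (buffer-string)))
      ;; Decoding replaces the base64 text with the original text.
      (base64-decode-region (point-min) (point-max))
      (list encoded (buffer-string)))))
```

For the input "fubar" the result should be `("ZnViYXI=" "fubar")'.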
658 - Function: base64-decode-string string
This function decodes STRING from base64, and returns the decoded
string. STRING should be valid base64-encoded text.
If decoding was not possible, `nil' is returned.
     (base64-decode-string "ZnViYXI=")
          => "fubar"
     (base64-decode-string "totally bogus")
          => nil
671 File: lispref.info, Node: Searching and Matching, Next: Syntax Tables, Prev: Text, Up: Top
673 Searching and Matching
674 **********************
676 XEmacs provides two ways to search through a buffer for specified
677 text: exact string searches and regular expression searches. After a
678 regular expression search, you can examine the "match data" to
determine which text matched the whole regular expression or various
portions of it.
* Menu:
684 * String Search:: Search for an exact match.
685 * Regular Expressions:: Describing classes of strings.
686 * Regexp Search:: Searching for a match for a regexp.
687 * POSIX Regexps:: Searching POSIX-style for the longest match.
688 * Search and Replace:: Internals of `query-replace'.
689 * Match Data:: Finding out which part of the text matched
690 various parts of a regexp, after regexp search.
691 * Searching and Case:: Case-independent or case-significant searching.
692 * Standard Regexps:: Useful regexps for finding sentences, pages,...
694 The `skip-chars...' functions also perform a kind of searching.
695 *Note Skipping Characters::.
698 File: lispref.info, Node: String Search, Next: Regular Expressions, Up: Searching and Matching
700 Searching for Strings
701 =====================
703 These are the primitive functions for searching through the text in a
704 buffer. They are meant for use in programs, but you may call them
705 interactively. If you do so, they prompt for the search string; LIMIT
706 and NOERROR are set to `nil', and REPEAT is set to 1.
708 - Command: search-forward string &optional limit noerror repeat
709 This function searches forward from point for an exact match for
710 STRING. If successful, it sets point to the end of the occurrence
711 found, and returns the new value of point. If no match is found,
712 the value and side effects depend on NOERROR (see below).
714 In the following example, point is initially at the beginning of
the line. Then `(search-forward "fox")' moves point after the last
character of `fox':
718 ---------- Buffer: foo ----------
719 -!-The quick brown fox jumped over the lazy dog.
720 ---------- Buffer: foo ----------
     (search-forward "fox")
          => 20
725 ---------- Buffer: foo ----------
726 The quick brown fox-!- jumped over the lazy dog.
727 ---------- Buffer: foo ----------
729 The argument LIMIT specifies the upper bound to the search. (It
730 must be a position in the current buffer.) No match extending
731 after that position is accepted. If LIMIT is omitted or `nil', it
732 defaults to the end of the accessible portion of the buffer.
734 What happens when the search fails depends on the value of
735 NOERROR. If NOERROR is `nil', a `search-failed' error is
736 signaled. If NOERROR is `t', `search-forward' returns `nil' and
737 does nothing. If NOERROR is neither `nil' nor `t', then
738 `search-forward' moves point to the upper bound and returns `nil'.
(It would be more consistent now to return the new position of
point in that case, but some programs may depend on a value of
`nil'.)
743 If REPEAT is supplied (it must be a positive number), then the
744 search is repeated that many times (each time starting at the end
745 of the previous time's match). If these successive searches
746 succeed, the function succeeds, moving point and returning its new
747 value. Otherwise the search fails.
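A common use of the LIMIT and NOERROR arguments is a loop that stops quietly when no further match exists. The following sketch counts exact matches between point and LIMIT. The function name `my-count-matches' is hypothetical, and the guard against an empty search string is an assumption added here, because an empty string matches without moving point and would otherwise loop forever.

```elisp
(defun my-count-matches (string &optional limit)
  "Count exact matches of STRING between point and LIMIT.
LIMIT defaults to the end of the accessible portion of the buffer.
Point is left where it was."
  (let ((count 0))
    (if (equal string "")
        0
      (save-excursion
        ;; NOERROR is t, so a failed search returns nil
        ;; instead of signaling `search-failed'.
        (while (search-forward string limit t)
          (setq count (1+ count))))
      count)))
```

Each successful search leaves point at the end of the match, so the next iteration continues from there and overlapping matches are not counted.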
749 - Command: search-backward string &optional limit noerror repeat
750 This function searches backward from point for STRING. It is just
751 like `search-forward' except that it searches backwards and leaves
752 point at the beginning of the match.
754 - Command: word-search-forward string &optional limit noerror repeat
755 This function searches forward from point for a "word" match for
756 STRING. If it finds a match, it sets point to the end of the
757 match found, and returns the new value of point.
759 Word matching regards STRING as a sequence of words, disregarding
760 punctuation that separates them. It searches the buffer for the
761 same sequence of words. Each word must be distinct in the buffer
762 (searching for the word `ball' does not match the word `balls'),
763 but the details of punctuation and spacing are ignored (searching
764 for `ball boy' does match `ball. Boy!').
766 In this example, point is initially at the beginning of the
767 buffer; the search leaves it between the `y' and the `!'.
769 ---------- Buffer: foo ----------
770 -!-He said "Please! Find
771 the ball, boy!"
772 ---------- Buffer: foo ----------
774 (word-search-forward "Please find the ball, boy.")
777 ---------- Buffer: foo ----------
778 He said "Please! Find
779 the ball, boy-!-!"
780 ---------- Buffer: foo ----------
782 If LIMIT is non-`nil' (it must be a position in the current
783 buffer), then it is the upper bound to the search. The match
784 found must not extend after that position.
786 If NOERROR is `nil', then `word-search-forward' signals an error
787 if the search fails. If NOERROR is `t', then it returns `nil'
788 instead of signaling an error. If NOERROR is neither `nil' nor
789 `t', it moves point to LIMIT (or the end of the buffer) and
790 returns `nil'.
792 If REPEAT is non-`nil', then the search is repeated that many
793 times. Point is positioned at the end of the last match.
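As a small sketch of the word-matching rules, the following wraps `word-search-forward' in a helper that searches a scratch text. The name `my-word-search-in-text' is hypothetical, and the example assumes the standard syntax table, in which letters are word constituents and punctuation is not.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-word-search-in-text (text words)
  "Return the position after a word match for WORDS in TEXT, or nil."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((case-fold-search t))         ; ignore case, as in the example above
      ;; NOERROR is t: return nil on failure instead of signaling.
      (word-search-forward words nil t))))

(list (my-word-search-in-text "ball. Boy!" "ball boy") ; punctuation ignored
      (my-word-search-in-text "balls" "ball"))         ; `ball' is not `balls'
     ;; => (10 nil)
```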
795 - Command: word-search-backward string &optional limit noerror repeat
796 This function searches backward from point for a word match to
797 STRING. This function is just like `word-search-forward' except
798 that it searches backward and normally leaves point at the
799 beginning of the match.
802 File: lispref.info, Node: Regular Expressions, Next: Regexp Search, Prev: String Search, Up: Searching and Matching
804 Regular Expressions
805 ===================
807 A "regular expression" ("regexp", for short) is a pattern that
808 denotes a (possibly infinite) set of strings. Searching for matches for
809 a regexp is a very powerful operation. This section explains how to
810 write regexps; the following section says how to search for them.
812 To gain a thorough understanding of regular expressions and how to
813 use them to best advantage, we recommend that you study `Mastering
814 Regular Expressions, by Jeffrey E.F. Friedl, O'Reilly and Associates,
815 1997'. (It's known as the "Hip Owls" book, because of the picture on its
816 cover.) You might also read the manuals to *Note (gawk)Top::, *Note
817 (ed)Top::, `sed', `grep', *Note (perl)Top::, *Note (regex)Top::, *Note
818 (rx)Top::, `pcre', and *Note (flex)Top::, which also make good use of
819 regular expressions.
821 The XEmacs regular expression syntax most closely resembles that of
822 `ed' or `grep', the GNU versions of which both use the GNU `regex'
823 library. XEmacs' version of `regex' has recently been extended with
824 some Perl-like capabilities, described in the next section.
826 * Menu:
828 * Syntax of Regexps:: Rules for writing regular expressions.
829 * Regexp Example:: Illustrates regular expression syntax.
832 File: lispref.info, Node: Syntax of Regexps, Next: Regexp Example, Up: Regular Expressions
834 Syntax of Regular Expressions
835 -----------------------------
837 Regular expressions have a syntax in which a few characters are
838 special constructs and the rest are "ordinary". An ordinary character
839 is a simple regular expression that matches that character and nothing
840 else. The special characters are `.', `*', `+', `?', `[', `]', `^',
841 `$', and `\'; no new special characters will be defined in the future.
842 Any other character appearing in a regular expression is ordinary,
843 unless a `\' precedes it.
845 For example, `f' is not a special character, so it is ordinary, and
846 therefore `f' is a regular expression that matches the string `f' and
847 no other string. (It does _not_ match the string `ff'.) Likewise, `o'
848 is a regular expression that matches only `o'.
850 Any two regular expressions A and B can be concatenated. The result
851 is a regular expression that matches a string if A matches some amount
852 of the beginning of that string and B matches the rest of the string.
854 As a simple example, we can concatenate the regular expressions `f'
855 and `o' to get the regular expression `fo', which matches only the
856 string `fo'. Still trivial. To do something more powerful, you need
857 to use one of the special characters. Here is a list of them:
859 `.'
860 is a special character that matches any single character except a
861 newline. Using concatenation, we can make regular expressions
862 like `a.b', which matches any three-character string that begins
863 with `a' and ends with `b'.
865 `*'
866 is not a construct by itself; it is a quantifying suffix operator
867 that means to repeat the preceding regular expression as many
868 times as possible. In `fo*', the `*' applies to the `o', so `fo*'
869 matches one `f' followed by any number of `o's. The case of zero
870 `o's is allowed: `fo*' does match `f'.
872 `*' always applies to the _smallest_ possible preceding
873 expression. Thus, `fo*' has a repeating `o', not a repeating `fo'.
875 The matcher processes a `*' construct by matching, immediately, as
876 many repetitions as can be found; it is "greedy". Then it
877 continues with the rest of the pattern. If that fails,
878 backtracking occurs, discarding some of the matches of the
879 `*'-modified construct in case that makes it possible to match the
880 rest of the pattern. For example, in matching `ca*ar' against the
881 string `caaar', the `a*' first tries to match all three `a's; but
882 the rest of the pattern is `ar' and there is only `r' left to
883 match, so this try fails. The next alternative is for `a*' to
884 match only two `a's. With this choice, the rest of the regexp
885 matches successfully.
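The behavior of `.' and of the greedy `*' can be checked with `string-match', which returns the index where the match starts (or `nil'), together with `match-end'. This is a minimal sketch on literal strings, not an excerpt from any XEmacs source.

```elisp
(list (string-match "a.b" "axb")       ; `.' matches the `x'
      (string-match "a.b" "a\nb")      ; but never a newline
      (string-match "fo*" "f")         ; zero `o's are allowed
      (string-match "ca*ar" "caaar")   ; succeeds after backtracking
      (match-end 0))                   ; the match covers all five characters
     ;; => (0 nil 0 0 5)
```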
887 Nested repetition operators can be extremely slow if they specify
888 backtracking loops. For example, it could take hours for the
889 regular expression `\(x+y*\)*a' to match the sequence
890 `xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxz'. The slowness is because
891 Emacs must try each imaginable way of grouping the 35 `x''s before
892 concluding that none of them can work. To make sure your regular
893 expressions run fast, check nested repetitions carefully.
895 `+'
896 is a quantifying suffix operator similar to `*' except that the
897 preceding expression must match at least once. It is also
898 "greedy". So, for example, `ca+r' matches the strings `car' and
899 `caaaar' but not the string `cr', whereas `ca*r' matches all three
900 strings.
902 `?'
903 is a quantifying suffix operator similar to `*', except that the
904 preceding expression can match either once or not at all. For
905 example, `ca?r' matches `car' or `cr', but does not match anything
906 else.
908 `*?'
909 works just like `*', except that rather than matching the longest
910 match, it matches the shortest match. `*?' is known as a
911 "non-greedy" quantifier, a regexp construct borrowed from Perl.
913 This construct is very useful when you want to match the text
914 inside a pair of delimiters. For instance, `/\*.*?\*/' will match
915 C comments in a string. This could not be achieved without the
916 use of a non-greedy quantifier.
918 This construct was not available prior to XEmacs 20.4. It is
919 not available in FSF Emacs.
921 `+?'
922 is the `+' analog to `*?'.
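The difference between the `+', `?' and non-greedy quantifiers can be sketched by comparing the text each pattern matches. The helper name `my-match-text' is hypothetical, and the last expression assumes a version that supports `*?' (XEmacs 20.4 or later, according to the text above).

```elisp
;; Hypothetical helper, for illustration only.
(defun my-match-text (regexp string)
  "Return the part of STRING matched by REGEXP, or nil if nothing matches."
  (and (string-match regexp string)
       (match-string 0 string)))

(list (my-match-text "ca+r" "cr")                         ; `+' needs one `a'
      (my-match-text "ca?r" "cr")                         ; `?' allows none
      (my-match-text "ca?r" "caar")                       ; but at most one
      (my-match-text "/\\*.*\\*/"  "/* a */ x /* b */")   ; greedy
      (my-match-text "/\\*.*?\\*/" "/* a */ x /* b */"))  ; non-greedy
     ;; => (nil "cr" nil "/* a */ x /* b */" "/* a */")
```

The greedy pattern runs to the last `*/' in the string, while the non-greedy one stops at the first, which is what a comment matcher normally wants.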
924 `\{n,m\}'
925 serves as an interval quantifier, analogous to `*' or `+', but
926 specifies that the expression must match at least N times, but no
927 more than M times. This syntax is supported by most Unix regexp
928 utilities, and was introduced in XEmacs 20.3.
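A minimal sketch of the interval quantifier, written `\{N,M\}' in a regexp, with N = 2 and M = 3. The `^' and `$' anchors (described later in this section) force the whole string to be matched, so that the bounds are visible in the results.

```elisp
(list (string-match "^a\\{2,3\\}$" "a")      ; too few
      (string-match "^a\\{2,3\\}$" "aa")
      (string-match "^a\\{2,3\\}$" "aaa")
      (string-match "^a\\{2,3\\}$" "aaaa"))  ; too many
     ;; => (nil 0 0 nil)
```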
930 `[ ... ]'
931 `[' begins a "character set", which is terminated by a `]'. In
932 the simplest case, the characters between the two brackets form
933 the set. Thus, `[ad]' matches either one `a' or one `d', and
934 `[ad]*' matches any string composed of just `a's and `d's
935 (including the empty string), from which it follows that `c[ad]*r'
936 matches `cr', `car', `cdr', `caddaar', etc.
938 The usual regular expression special characters are not special
939 inside a character set. A completely different set of special
940 characters exists inside character sets: `]', `-' and `^'.
942 `-' is used for ranges of characters. To write a range, write two
943 characters with a `-' between them. Thus, `[a-z]' matches any
944 lower case letter. Ranges may be intermixed freely with individual
945 characters, as in `[a-z$%.]', which matches any lower case letter
946 or `$', `%', or a period.
948 To include a `]' in a character set, make it the first character.
949 For example, `[]a]' matches `]' or `a'. To include a `-', write
950 `-' as the first character in the set, or put it immediately after
951 a range. (You can replace one individual character C with the
952 range `C-C' to make a place to put the `-'.) There is no way to
953 write a set containing just `-' and `]'.
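The character-set rules above can be tried out as follows. This is a sketch only; `case-fold-search' is bound to `nil' here on the assumption that otherwise `[a-z]' would also match capital letters.

```elisp
(let ((case-fold-search nil))            ; keep `[a-z]' from matching capitals
  (list (string-match "c[ad]*r" "caddaar")
        (match-end 0)                    ; the whole string is matched
        (string-match "[a-z$%.]" "A%")   ; `%' is in the set, `A' is not
        (string-match "[]a]" "x]")       ; `]' first: an ordinary member
        (string-match "[-a]" "x-")))     ; `-' first: an ordinary member
     ;; => (0 7 1 1 1)
```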
955 To include `^' in a set, put it anywhere but at the beginning of
956 the set.
958 `[^ ... ]'
959 `[^' begins a "complement character set", which matches any
960 character except the ones specified. Thus, `[^a-z0-9A-Z]' matches
961 all characters _except_ letters and digits.
963 `^' is not special in a character set unless it is the first
964 character. The character following the `^' is treated as if it
965 were first (thus, `-' and `]' are not special there).
967 Note that a complement character set can match a newline, unless
968 newline is mentioned as one of the characters not to match.
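The point about newlines is easy to overlook, so here is a brief sketch. As before, `case-fold-search' is bound to `nil' so that case folding does not affect the sets.

```elisp
(let ((case-fold-search nil))
  (list (string-match "[^a-z]" "ab\ncd")         ; the newline itself matches
        (string-match "[^a-z\n]" "ab\ncd")       ; newline excluded as well
        (string-match "[^a-z0-9A-Z]" "ab_1")))   ; `_' is not a letter or digit
     ;; => (2 nil 2)
```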
970 `^'
971 is a special character that matches the empty string, but only at
972 the beginning of a line in the text being matched. Otherwise it
973 fails to match anything. Thus, `^foo' matches a `foo' that occurs
974 at the beginning of a line.
976 When matching a string instead of a buffer, `^' matches at the
977 beginning of the string or after a newline character `\n'.
979 `$'
980 is similar to `^' but matches only at the end of a line. Thus,
981 `x+$' matches a string of one `x' or more at the end of a line.
983 When matching a string instead of a buffer, `$' matches at the end
984 of the string or before a newline character `\n'.
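When matching against a string, the line anchors behave as sketched below; the strings are arbitrary examples containing one newline.

```elisp
(list (string-match "^foo" "bar\nfoo")   ; matches just after the newline
      (string-match "^foo" "barfoo")     ; not at the beginning of a line
      (string-match "x+$" "axx\nb")      ; matches just before the newline
      (match-end 0))
     ;; => (4 nil 1 3)
```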
986 `\'
987 has two functions: it quotes the special characters (including
988 `\'), and it introduces additional special constructs.
990 Because `\' quotes special characters, `\$' is a regular
991 expression that matches only `$', and `\[' is a regular expression
992 that matches only `[', and so on.
994 Note that `\' also has special meaning in the read syntax of Lisp
995 strings (*note String Type::), and must be quoted with `\'. For
996 example, the regular expression that matches the `\' character is
997 `\\'. To write a Lisp string that contains the characters `\\',
998 Lisp syntax requires you to quote each `\' with another `\'.
999 Therefore, the read syntax for a regular expression matching `\'
1000 is `"\\\\"'.
1002 *Please note:* For historical compatibility, special characters are
1003 treated as ordinary ones if they are in contexts where their special
1004 meanings make no sense. For example, `*foo' treats `*' as ordinary
1005 since there is no preceding expression on which the `*' can act. It is
1006 poor practice to depend on this behavior; quote the special character
1007 anyway, regardless of where it appears.
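The two levels of quoting (Lisp read syntax and regexp syntax) can be seen in a short sketch. Each `\\' in the Lisp source stands for one `\' character in the string that the regexp matcher receives.

```elisp
(list (length "\\\\")                   ; the string holds two characters: \\
      (string-match "\\\\" "a\\b")      ; finds the single `\' in a\b
      (string-match "\\$" "cost: $5")   ; `\$' matches a literal `$'
      (string-match "\\*foo" "x*foo"))  ; a quoted `*', as recommended above
     ;; => (2 1 6 1)
```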
1009 For the most part, `\' followed by any character matches only that
1010 character. However, there are several exceptions: characters that,
1011 when preceded by `\', are special constructs. Such characters are
1012 always ordinary when encountered on their own. Here is a table of `\'
1013 constructs:
1015 `\|'
1016 specifies an alternative. Two regular expressions A and B with
1017 `\|' in between form an expression that matches anything that
1018 either A or B matches.
1020 Thus, `foo\|bar' matches either `foo' or `bar' but no other string.
1022 `\|' applies to the largest possible surrounding expressions.
1023 Only a surrounding `\( ... \)' grouping can limit the grouping
1024 power of `\|'.
1026 Full backtracking capability exists to handle multiple uses of
1027 `\|'.
1029 `\( ... \)'
1030 is a grouping construct that serves three purposes:
1032 1. To enclose a set of `\|' alternatives for other operations.
1033 Thus, `\(foo\|bar\)x' matches either `foox' or `barx'.
1035 2. To enclose an expression for a suffix operator such as `*' to
1036 act on. Thus, `ba\(na\)*' matches `bananana', etc., with any
1037 (zero or more) number of `na' strings.
1039 3. To record a matched substring for future reference.
1041 This last application is not a consequence of the idea of a
1042 parenthetical grouping; it is a separate feature that happens to be
1043 assigned as a second meaning to the same `\( ... \)' construct
1044 because there is no conflict in practice between the two meanings.
1045 Here is an explanation of this feature:
1047 `\DIGIT'
1048 matches the same text that matched the DIGITth occurrence of a `\(
1049 ... \)' construct.
1051 In other words, after the end of a `\( ... \)' construct, the
1052 matcher remembers the beginning and end of the text matched by that
1053 construct. Then, later on in the regular expression, you can use
1054 `\' followed by DIGIT to match that same text, whatever it may
1055 have been.
1057 The strings matching the first nine `\( ... \)' constructs
1058 appearing in a regular expression are assigned numbers 1 through 9
1059 in the order that the open parentheses appear in the regular
1060 expression. So you can use `\1' through `\9' to refer to the text
1061 matched by the corresponding `\( ... \)' constructs.
1063 For example, `\(.*\)\1' matches any newline-free string that is
1064 composed of two identical halves. The `\(.*\)' matches the first
1065 half, which may be anything, but the `\1' that follows must match
1066 the same exact text.
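The three uses of grouping, and back-references, can be sketched with `string-match' and `match-string'. The strings are arbitrary examples; the last pattern adds `^' and `$' so that a string which is not made of two identical halves fails to match.

```elisp
(let ((s "barx") (d "abcabc"))
  (list (string-match "\\(foo\\|bar\\)x" s)    ; group limits the `\|'
        (match-string 1 s)                     ; text recorded for group 1
        (progn (string-match "ba\\(na\\)*" "bananana")
               (match-end 0))                  ; `*' repeats the whole group
        (progn (string-match "\\(.*\\)\\1" d)
               (match-string 1 d))             ; first half of the string
        (string-match "^\\(.*\\)\\1$" "abcab")))
     ;; => (0 "bar" 8 "abc" nil)
```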
1068 `\(?: ... \)'
1069 is called a "shy" grouping operator, and it is used just like `\(
1070 ... \)', except that it does not cause the matched substring to be
1071 recorded for future reference.
1073 This is useful when you need a lot of grouping `\( ... \)'
1074 constructs, but only want to remember one or two. Use `\(?: ... \)'
1075 for the groups you do not want to remember with `match-string'.
1077 Using `\(?: ... \)' rather than `\( ... \)' when you don't need
1078 the captured substrings ought to speed up your programs some,
1079 since it shortens the code path followed by the regular expression
1080 engine, as well as the amount of memory allocation and string
1081 copying it must do. The actual performance gain to be observed
1082 has not been measured or quantified as of this writing.
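The effect of a shy group on group numbering can be sketched as follows, assuming a version that supports `\(?: ... \)'. With an ordinary group the digits are recorded as group 2; with a shy group they become group 1.

```elisp
(let ((s "bar42"))
  (list (progn (string-match "\\(foo\\|bar\\)\\([0-9]+\\)" s)
               (match-string 2 s))      ; ordinary group: digits are group 2
        (progn (string-match "\\(?:foo\\|bar\\)\\([0-9]+\\)" s)
               (match-string 1 s))))    ; shy group: digits are group 1
     ;; => ("42" "42")
```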
1084 The shy grouping operator was borrowed from Perl. It was not
1085 available prior to XEmacs 20.3, nor is it available in FSF
1086 Emacs.
1088 `\w'
1089 matches any word-constituent character. The editor syntax table
1090 determines which characters these are. *Note Syntax Tables::.
1092 `\W'
1093 matches any character that is not a word constituent.
1095 `\sCODE'
1096 matches any character whose syntax is CODE. Here CODE is a
1097 character that represents a syntax code: thus, `w' for word
1098 constituent, `-' for whitespace, `(' for open parenthesis, etc.
1099 *Note Syntax Tables::, for a list of syntax codes and the
1100 characters that stand for them.
1102 `\SCODE'
1103 matches any character whose syntax is not CODE.
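The syntax-based constructs depend on the syntax table in effect. The sketch below runs inside `with-temp-buffer' on the assumption that such a buffer uses the standard syntax table, where letters are word constituents and the space character is whitespace.

```elisp
(with-temp-buffer                    ; assumed to use the standard syntax table
  (list (string-match "\\w+" "  hello")
        (match-end 0)
        (string-match "\\W" "ab cd")       ; first non-word character
        (string-match "\\s-+" "a  b")      ; `-' is the whitespace code
        (string-match "\\S-" "  x")        ; first non-whitespace character
        (string-match "\\sw" "; x")))      ; `w' is the word-constituent code
     ;; => (2 7 2 1 2 2)
```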
1105 The following regular expression constructs match the empty
1106 string--that is, they don't use up any characters--but whether they
1107 match depends on the context.
1109 `\`'
1110 matches the empty string, but only at the beginning of the buffer
1111 or string being matched against.
1113 `\''
1114 matches the empty string, but only at the end of the buffer or
1115 string being matched against.
1117 `\='
1118 matches the empty string, but only at point. (This construct is
1119 not defined when matching against a string.)
1121 `\b'
1122 matches the empty string, but only at the beginning or end of a
1123 word. Thus, `\bfoo\b' matches any occurrence of `foo' as a
1124 separate word. `\bballs?\b' matches `ball' or `balls' as a
1125 separate word.
1127 `\B'
1128 matches the empty string, but _not_ at the beginning or end of a
1129 word.
1131 `\<'
1132 matches the empty string, but only at the beginning of a word.
1134 `\>'
1135 matches the empty string, but only at the end of a word.
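The empty-string constructs can be compared side by side in a short sketch. It again assumes the standard syntax table, and contrasts the buffer-or-string anchor `\`' with the line anchor `^'.

```elisp
(with-temp-buffer                    ; assumed to use the standard syntax table
  (list (string-match "\\bballs?\\b" "the balls") ; a separate word
        (string-match "\\bballs?\\b" "football")  ; part of a longer word
        (string-match "\\<ball" "a ball")         ; beginning of a word
        (string-match "ball\\>" "balls")          ; not at the end of a word
        (string-match "\\Bball" "football")       ; inside a word
        (string-match "\\`foo" "bar\nfoo")        ; only at the string start
        (string-match "^foo" "bar\nfoo")))        ; at any line start
     ;; => (4 nil 2 nil 4 nil 4)
```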
1137 Not every string is a valid regular expression. For example, a
1138 string with unbalanced square brackets is invalid (with a few
1139 exceptions, such as `[]]'), and so is a string that ends with a single
1140 `\'. If an invalid regular expression is passed to any of the search
1141 functions, an `invalid-regexp' error is signaled.
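A program that accepts regexps from the user may want to trap this error rather than let it propagate. The following is a minimal sketch using `condition-case'; the predicate name `my-valid-regexp-p' is hypothetical.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-valid-regexp-p (regexp)
  "Return t if REGEXP can be used for matching, nil if it is invalid."
  (condition-case nil
      (progn (string-match regexp "") t)
    ;; Signaled by the search functions for a malformed regexp.
    (invalid-regexp nil)))

(list (my-valid-regexp-p "[a-z]+")
      (my-valid-regexp-p "[a")        ; unbalanced square bracket
      (my-valid-regexp-p "[]]")       ; one of the valid exceptions
      (my-valid-regexp-p "foo\\"))    ; ends with a single `\'
     ;; => (t nil t nil)
```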
1143 - Function: regexp-quote string
1144 This function returns a regular expression string that matches
1145 exactly STRING and nothing else. This allows you to request an
1146 exact string match when calling a function that wants a regular
1147 expression.
1149 (regexp-quote "^The cat$")
1150 => "\\^The cat\\$"
1152 One use of `regexp-quote' is to combine an exact string match with
1153 context described as a regular expression. For example, this
1154 searches for the string that is the value of `string', surrounded
1155 by whitespace:
1157 (re-search-forward
1158 (concat "\\s-" (regexp-quote string) "\\s-"))
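The effect of `regexp-quote' can also be checked on strings. This sketch compares a quoted and an unquoted pattern, then repeats the whitespace-delimited search above with `string-match'; the literal strings are arbitrary examples.

```elisp
(list (regexp-quote "a.b")                          ; the `.' is quoted
      (string-match (regexp-quote "a.b") "axb a.b") ; only the literal a.b
      (string-match "a.b" "axb a.b")                ; unquoted `.' matches `x'
      (string-match (concat "\\s-" (regexp-quote "1+1") "\\s-")
                    "so 1+1 is 2"))
     ;; => ("a\\.b" 4 0 2)
```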