1 This is ../info/lispref.info, produced by makeinfo version 4.0 from
4 INFO-DIR-SECTION XEmacs Editor
6 * Lispref: (lispref). XEmacs Lisp Reference Manual.
11 GNU Emacs Lisp Reference Manual Second Edition (v2.01), May 1993 GNU
12 Emacs Lisp Reference Manual Further Revised (v2.02), August 1993 Lucid
13 Emacs Lisp Reference Manual (for 19.10) First Edition, March 1994
14 XEmacs Lisp Programmer's Manual (for 19.12) Second Edition, April 1995
15 GNU Emacs Lisp Reference Manual v2.4, June 1995 XEmacs Lisp
16 Programmer's Manual (for 19.13) Third Edition, July 1995 XEmacs Lisp
17 Reference Manual (for 19.14 and 20.0) v3.1, March 1996 XEmacs Lisp
18 Reference Manual (for 19.15 and 20.1, 20.2, 20.3) v3.2, April, May,
19 November 1997 XEmacs Lisp Reference Manual (for 21.0) v3.3, April 1998
21 Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995 Free Software
22 Foundation, Inc. Copyright (C) 1994, 1995 Sun Microsystems, Inc.
23 Copyright (C) 1995, 1996 Ben Wing.
25 Permission is granted to make and distribute verbatim copies of this
26 manual provided the copyright notice and this permission notice are
27 preserved on all copies.
29 Permission is granted to copy and distribute modified versions of
30 this manual under the conditions for verbatim copying, provided that the
31 entire resulting derived work is distributed under the terms of a
32 permission notice identical to this one.
34 Permission is granted to copy and distribute translations of this
35 manual into another language, under the above conditions for modified
36 versions, except that this permission notice may be stated in a
37 translation approved by the Foundation.
39 Permission is granted to copy and distribute modified versions of
40 this manual under the conditions for verbatim copying, provided also
41 that the section entitled "GNU General Public License" is included
42 exactly as in the original, and provided that the entire resulting
43 derived work is distributed under the terms of a permission notice
44 identical to this one.
46 Permission is granted to copy and distribute translations of this
47 manual into another language, under the above conditions for modified
48 versions, except that the section entitled "GNU General Public License"
49 may be included in a translation approved by the Free Software
50 Foundation instead of in the original English.
53 File: lispref.info, Node: Regexp Example, Prev: Syntax of Regexps, Up: Regular Expressions
55 Complex Regexp Example
56 ----------------------
58 Here is a complicated regexp, used by XEmacs to recognize the end of
59 a sentence together with any whitespace that follows. It is the value
60 of the variable `sentence-end'.
62 First, we show the regexp as a string in Lisp syntax to distinguish
63 spaces from tab characters. The string constant begins and ends with a
64 double-quote. `\"' stands for a double-quote as part of the string,
`\\' for a backslash as part of the string, `\t' for a tab and `\n' for
a newline.

     "[.?!][]\"')}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"
In contrast, if you evaluate the variable `sentence-end', you will
see the following:

     sentence-end
     =>
     "[.?!][]\"')}]*\\($\\| $\\|	\\|  \\)[ 	
     ]*"
78 In this output, tab and newline appear as themselves.
80 This regular expression contains four parts in succession and can be
81 deciphered as follows:
`[.?!]'
     The first part of the pattern is a character set that matches any
     one of three characters: period, question mark, and exclamation
     mark.  The match must begin with one of these three characters.

`[]\"')}]*'
     The second part of the pattern matches any closing braces and
     quotation marks, zero or more of them, that may follow the period,
     question mark or exclamation mark.  The `\"' is Lisp syntax for a
     double-quote in a string.  The `*' at the end indicates that the
     immediately preceding regular expression (a character set, in this
     case) may be repeated zero or more times.

`\\($\\| $\\|\t\\|  \\)'
     The third part of the pattern matches the whitespace that follows
     the end of a sentence: the end of a line, or a tab, or two spaces.
     The double backslashes mark the parentheses and vertical bars as
     regular expression syntax; the parentheses delimit a group and the
     vertical bars separate alternatives.  The dollar sign is used to
     match the end of a line.

`[ \t\n]*'
     Finally, the last part of the pattern matches any additional
     whitespace beyond the minimum needed to end a sentence.
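As an illustration of the four parts working together, the sketch below
applies the same pattern to a short string.  The function name
`sketch-sentence-end-span' and the sample text are assumptions made for
this example; they are not part of XEmacs.  The pattern is written out
literally rather than read from `sentence-end', since the value of that
variable may differ between versions.

```elisp
;; Hypothetical helper, not part of XEmacs: report where the
;; documented sentence-end pattern first matches in TEXT.
(defun sketch-sentence-end-span (text)
  "Return (START END) of the first sentence end in TEXT, or nil."
  (let ((pattern "[.?!][]\"')}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"))
    (if (string-match pattern text)
        (list (match-beginning 0) (match-end 0))
      nil)))

;; The match covers the `!', the closing quote, and the two spaces.
(sketch-sentence-end-span "He said \"Stop!\"  Then he left.")
;; => (13 17)
```

A string containing none of the three terminating characters yields
`nil', because the first part of the pattern can never match.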
109 File: lispref.info, Node: Regexp Search, Next: POSIX Regexps, Prev: Regular Expressions, Up: Searching and Matching
111 Regular Expression Searching
112 ============================
114 In XEmacs, you can search for the next match for a regexp either
115 incrementally or not. Incremental search commands are described in the
116 `The XEmacs Reference Manual'. *Note Regular Expression Search:
117 (emacs)Regexp Search. Here we describe only the search functions
118 useful in programs. The principal one is `re-search-forward'.
120 - Command: re-search-forward regexp &optional limit noerror repeat
121 This function searches forward in the current buffer for a string
122 of text that is matched by the regular expression REGEXP. The
123 function skips over any amount of text that is not matched by
124 REGEXP, and leaves point at the end of the first match found. It
125 returns the new value of point.
127 If LIMIT is non-`nil' (it must be a position in the current
128 buffer), then it is the upper bound to the search. No match
129 extending after that position is accepted.
131 What happens when the search fails depends on the value of
132 NOERROR. If NOERROR is `nil', a `search-failed' error is
133 signaled. If NOERROR is `t', `re-search-forward' does nothing and
134 returns `nil'. If NOERROR is neither `nil' nor `t', then
135 `re-search-forward' moves point to LIMIT (or the end of the
136 buffer) and returns `nil'.
138 If REPEAT is supplied (it must be a positive number), then the
139 search is repeated that many times (each time starting at the end
140 of the previous time's match). If these successive searches
141 succeed, the function succeeds, moving point and returning its new
142 value. Otherwise the search fails.
144 In the following example, point is initially before the `T'.
145 Evaluating the search call moves point to the end of that line
146 (between the `t' of `hat' and the newline).
     ---------- Buffer: foo ----------
     I read "-!-The cat in the hat
     comes back" twice.
     ---------- Buffer: foo ----------

     (re-search-forward "[a-z]+" nil t 5)
          => 27

     ---------- Buffer: foo ----------
     I read "The cat in the hat-!-
     comes back" twice.
     ---------- Buffer: foo ----------
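A common way to use NOERROR is to drive a loop: with NOERROR `t', a
failed search simply returns `nil' and ends the loop.  The following is
a minimal sketch under that assumption; `sketch-collect-words' is a
hypothetical helper, and `case-fold-search' is bound explicitly so that
the result does not depend on the user's settings.

```elisp
;; Hypothetical helper: collect every run of letters found in TEXT.
(defun sketch-collect-words (text)
  "Return a list of the letter runs in TEXT, in order."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((case-fold-search t)
          (words '()))
      ;; NOERROR is `t', so failure returns nil instead of signaling.
      (while (re-search-forward "[a-z]+" nil t)
        (setq words (cons (match-string 0) words)))
      (nreverse words))))

(sketch-collect-words "The cat in the hat")
;; => ("The" "cat" "in" "the" "hat")
```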
161 - Command: re-search-backward regexp &optional limit noerror repeat
162 This function searches backward in the current buffer for a string
163 of text that is matched by the regular expression REGEXP, leaving
164 point at the beginning of the first text found.
166 This function is analogous to `re-search-forward', but they are not
167 simple mirror images. `re-search-forward' finds the match whose
168 beginning is as close as possible to the starting point. If
169 `re-search-backward' were a perfect mirror image, it would find the
170 match whose end is as close as possible. However, in fact it
171 finds the match whose beginning is as close as possible. The
172 reason is that matching a regular expression at a given spot
always works from beginning to end, and starts at a specified
beginning position.
176 A true mirror-image of `re-search-forward' would require a special
177 feature for matching regexps from end to beginning. It's not
178 worth the trouble of implementing that.
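The asymmetry is easiest to see with a repetition operator.  In the
sketch below (the helper `sketch-search-span' is hypothetical), the
forward search from the start of the buffer matches all three
characters, while the backward search from the end stops at the nearest
position where a match can begin and so matches only the last one.

```elisp
;; Hypothetical helper: search TEXT for REGEXP forward from the start,
;; or backward from the end when BACKWARD is non-nil.
(defun sketch-search-span (text regexp backward)
  "Return (BEGINNING END) of the match found, or nil."
  (with-temp-buffer
    (insert text)
    (if backward
        (goto-char (point-max))
      (goto-char (point-min)))
    (if (if backward
            (re-search-backward regexp nil t)
          (re-search-forward regexp nil t))
        (list (match-beginning 0) (match-end 0))
      nil)))

(sketch-search-span "aaa" "a+" nil)   ; forward
;; => (1 4)
(sketch-search-span "aaa" "a+" t)     ; backward
;; => (3 4)
```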
180 - Function: string-match regexp string &optional start
181 This function returns the index of the start of the first match for
182 the regular expression REGEXP in STRING, or `nil' if there is no
match.  If START is non-`nil', the search starts at that index in
STRING.

For example,

     (string-match
      "quick" "The quick brown fox jumped quickly.")
          => 4
     (string-match
      "quick" "The quick brown fox jumped quickly." 8)
          => 27
195 The index of the first character of the string is 0, the index of
196 the second character is 1, and so on.
198 After this function returns, the index of the first character
beyond the match is available as `(match-end 0)'.  *Note Match
Data::.

     (string-match
      "quick" "The quick brown fox jumped quickly." 8)
          => 27

     (match-end 0)
          => 32
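START and `(match-end 0)' combine naturally to find every occurrence
in a string.  The following sketch assumes that REGEXP never matches
the empty string (otherwise the loop would not advance);
`sketch-match-indices' is a hypothetical name.

```elisp
;; Hypothetical helper: list the start index of each match of REGEXP
;; in STRING.  Assumes REGEXP cannot match the empty string.
(defun sketch-match-indices (regexp string)
  "Return the indices in STRING at which REGEXP matches."
  (let ((start 0)
        (hits '()))
    (while (string-match regexp string start)
      (setq hits (cons (match-beginning 0) hits))
      ;; Resume just past the match that was found.
      (setq start (match-end 0)))
    (nreverse hits)))

(sketch-match-indices "quick" "The quick brown fox jumped quickly.")
;; => (4 27)
```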
209 - Function: split-string string &optional pattern
210 This function splits STRING to substrings delimited by PATTERN,
211 and returns a list of substrings. If PATTERN is omitted, it
defaults to `[ \f\t\n\r\v]+', which means that it splits STRING by
whitespace.

     (split-string "foo bar")
          => ("foo" "bar")

     (split-string "something")
          => ("something")

     (split-string "a:b:c" ":")
          => ("a" "b" "c")

     (split-string ":a::b:c" ":")
          => ("" "a" "" "b" "c")
227 - Function: split-path path
228 This function splits a search path into a list of strings. The
229 path components are separated with the characters specified with
230 `path-separator'. Under Unix, `path-separator' will normally be
231 `:', while under Windows, it will be `;'.
233 - Function: looking-at regexp
234 This function determines whether the text in the current buffer
235 directly following point matches the regular expression REGEXP.
236 "Directly following" means precisely that: the search is
237 "anchored" and it can succeed only starting with the first
character following point.  The result is `t' if so, `nil'
otherwise.
241 This function does not move point, but it updates the match data,
242 which you can access using `match-beginning' and `match-end'.
245 In this example, point is located directly before the `T'. If it
246 were anywhere else, the result would be `nil'.
     ---------- Buffer: foo ----------
     I read "-!-The cat in the hat
     comes back" twice.
     ---------- Buffer: foo ----------

     (looking-at "The cat in the hat$")
          => t
257 File: lispref.info, Node: POSIX Regexps, Next: Search and Replace, Prev: Regexp Search, Up: Searching and Matching
259 POSIX Regular Expression Searching
260 ==================================
262 The usual regular expression functions do backtracking when necessary
263 to handle the `\|' and repetition constructs, but they continue this
only until they find _some_ match.  Then they succeed and report the
first match found.
267 This section describes alternative search functions which perform the
268 full backtracking specified by the POSIX standard for regular expression
269 matching. They continue backtracking until they have tried all
270 possibilities and found all matches, so they can report the longest
271 match, as required by POSIX. This is much slower, so use these
272 functions only when you really need the longest match.
274 In Emacs versions prior to 19.29, these functions did not exist, and
275 the functions described above implemented full POSIX backtracking.
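The difference shows up when an earlier alternative is a prefix of a
later one.  The sketch below compares the length of the match reported
by `string-match' with the one reported by `posix-string-match', which
is described in this section.  `sketch-match-length' is a hypothetical
helper.  The ordinary matcher stops at the first alternative that
succeeds, so it reports a length of 1 here; the POSIX matcher is
expected to report the longer alternative instead.

```elisp
;; Hypothetical helper: length of the text that MATCHER (a function
;; such as `string-match') matches for REGEXP in STRING, or nil.
(defun sketch-match-length (matcher regexp string)
  "Return the length of the match MATCHER finds, or nil."
  (if (funcall matcher regexp string)
      (- (match-end 0) (match-beginning 0))
    nil))

;; First successful alternative wins: only "a" is matched.
(sketch-match-length 'string-match "a\\|ab" "ab")
;; => 1

;; Full backtracking looks for the longest match.
(sketch-match-length 'posix-string-match "a\\|ab" "ab")
```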
- Function: posix-search-forward regexp &optional limit noerror repeat
This is like `re-search-forward' except that it performs the full
backtracking specified by the POSIX standard for regular expression
matching.
- Function: posix-search-backward regexp &optional limit noerror repeat
This is like `re-search-backward' except that it performs the full
backtracking specified by the POSIX standard for regular expression
matching.
- Function: posix-looking-at regexp
This is like `looking-at' except that it performs the full
backtracking specified by the POSIX standard for regular expression
matching.
- Function: posix-string-match regexp string &optional start
This is like `string-match' except that it performs the full
backtracking specified by the POSIX standard for regular expression
matching.
298 File: lispref.info, Node: Search and Replace, Next: Match Data, Prev: POSIX Regexps, Up: Searching and Matching
Search and Replace
==================
303 - Function: perform-replace from-string replacements query-flag
304 regexp-flag delimited-flag &optional repeat-count map
305 This function is the guts of `query-replace' and related commands.
306 It searches for occurrences of FROM-STRING and replaces some or
307 all of them. If QUERY-FLAG is `nil', it replaces all occurrences;
308 otherwise, it asks the user what to do about each one.
310 If REGEXP-FLAG is non-`nil', then FROM-STRING is considered a
311 regular expression; otherwise, it must match literally. If
312 DELIMITED-FLAG is non-`nil', then only replacements surrounded by
313 word boundaries are considered.
315 The argument REPLACEMENTS specifies what to replace occurrences
316 with. If it is a string, that string is used. It can also be a
317 list of strings, to be used in cyclic order.
319 If REPEAT-COUNT is non-`nil', it should be an integer. Then it
320 specifies how many times to use each of the strings in the
REPLACEMENTS list before advancing cyclically to the next one.
323 Normally, the keymap `query-replace-map' defines the possible user
324 responses for queries. The argument MAP, if non-`nil', is a
325 keymap to use instead of `query-replace-map'.
327 - Variable: query-replace-map
328 This variable holds a special keymap that defines the valid user
329 responses for `query-replace' and related functions, as well as
330 `y-or-n-p' and `map-y-or-n-p'. It is unusual in two ways:
332 * The "key bindings" are not commands, just symbols that are
333 meaningful to the functions that use this map.
335 * Prefix keys are not supported; each key binding must be for a
336 single event key sequence. This is because the functions
don't use `read-key-sequence' to get the input; instead, they
338 read a single event and look it up "by hand."
340 Here are the meaningful "bindings" for `query-replace-map'. Several
341 of them are meaningful only for `query-replace' and friends.
`act'
     Do take the action being considered--in other words, "yes."

`skip'
     Do not take action for this question--in other words, "no."

`exit'
     Answer this question "no," and give up on the entire series of
     questions, assuming that the answers will be "no."

`act-and-exit'
     Answer this question "yes," and give up on the entire series of
     questions, assuming that subsequent answers will be "no."

`act-and-show'
     Answer this question "yes," but show the results--don't advance yet
     to the next question.

`automatic'
     Answer this question and all subsequent questions in the series
     with "yes," without further user interaction.

`backup'
     Move back to the previous place that a question was asked about.

`edit'
     Enter a recursive edit to deal with this question--instead of any
     other action that would normally be taken.

`delete-and-edit'
     Delete the text being considered, then enter a recursive edit to
     replace it.

`recenter'
     Redisplay and center the window, then ask the same question again.

`quit'
     Perform a quit right away.  Only `y-or-n-p' and related functions
     use this answer.

`help'
     Display some help, then ask again.
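Because the "bindings" are plain symbols, a program can look one up
with the ordinary keymap functions and dispatch on the result itself.
The sketch below assumes the default contents of `query-replace-map',
in which `y' and `n' are bound to the symbols `act' and `skip';
`sketch-query-answer' is a hypothetical helper.

```elisp
;; Hypothetical helper: the symbol that `query-replace-map'
;; associates with KEY (a single-event key sequence).
(defun sketch-query-answer (key)
  "Return the answer symbol bound to KEY in `query-replace-map'."
  (lookup-key query-replace-map key))

;; With the default map these are expected to be `act' and `skip'.
(list (sketch-query-answer "y")
      (sketch-query-answer "n"))
```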
387 File: lispref.info, Node: Match Data, Next: Searching and Case, Prev: Search and Replace, Up: Searching and Matching
The Match Data
==============
392 XEmacs keeps track of the positions of the start and end of segments
393 of text found during a regular expression search. This means, for
394 example, that you can search for a complex pattern, such as a date in
an Rmail message, and then extract parts of the match under control of
the pattern.
398 Because the match data normally describe the most recent search only,
399 you must be careful not to do another search inadvertently between the
400 search you wish to refer back to and the use of the match data. If you
401 can't avoid another intervening search, you must save and restore the
402 match data around it, to prevent it from being overwritten.
* Menu:
406 * Simple Match Data:: Accessing single items of match data,
407 such as where a particular subexpression started.
408 * Replacing Match:: Replacing a substring that was matched.
409 * Entire Match Data:: Accessing the entire match data at once, as a list.
410 * Saving Match Data:: Saving and restoring the match data.
413 File: lispref.info, Node: Simple Match Data, Next: Replacing Match, Up: Match Data
415 Simple Match Data Access
416 ------------------------
418 This section explains how to use the match data to find out what was
419 matched by the last search or match operation.
421 You can ask about the entire matching text, or about a particular
422 parenthetical subexpression of a regular expression. The COUNT
423 argument in the functions below specifies which. If COUNT is zero, you
424 are asking about the entire match. If COUNT is positive, it specifies
425 which subexpression you want.
427 Recall that the subexpressions of a regular expression are those
428 expressions grouped with escaped parentheses, `\(...\)'. The COUNTth
429 subexpression is found by counting occurrences of `\(' from the
430 beginning of the whole regular expression. The first subexpression is
431 numbered 1, the second 2, and so on. Only regular expressions can have
432 subexpressions--after a simple string search, the only information
433 available is about the entire match.
435 - Function: match-string count &optional in-string
436 This function returns, as a string, the text matched in the last
437 search or match operation. It returns the entire text if COUNT is
438 zero, or just the portion corresponding to the COUNTth
439 parenthetical subexpression, if COUNT is positive. If COUNT is
out of range, or if that subexpression didn't match anything, the
value is `nil'.
443 If the last such operation was done against a string with
444 `string-match', then you should pass the same string as the
445 argument IN-STRING. Otherwise, after a buffer search or match,
446 you should omit IN-STRING or pass `nil' for it; but you should
447 make sure that the current buffer when you call `match-string' is
448 the one in which you did the searching or matching.
450 - Function: match-beginning count
451 This function returns the position of the start of text matched by
452 the last regular expression searched for, or a subexpression of it.
454 If COUNT is zero, then the value is the position of the start of
455 the entire match. Otherwise, COUNT specifies a subexpression in
456 the regular expression, and the value of the function is the
457 starting position of the match for that subexpression.
459 The value is `nil' for a subexpression inside a `\|' alternative
460 that wasn't used in the match.
462 - Function: match-end count
463 This function is like `match-beginning' except that it returns the
position of the end of the match, rather than the position of the
beginning.
467 Here is an example of using the match data, with a comment showing
468 the positions within the text:
     (string-match "\\(qu\\)\\(ick\\)"
                   "The quick fox jumped quickly.")
                   ;0123456789
          => 4

     (match-string 0 "The quick fox jumped quickly.")
          => "quick"
     (match-string 1 "The quick fox jumped quickly.")
          => "qu"
     (match-string 2 "The quick fox jumped quickly.")
          => "ick"
482 (match-beginning 1) ; The beginning of the match
483 => 4 ; with `qu' is at index 4.
485 (match-beginning 2) ; The beginning of the match
486 => 6 ; with `ick' is at index 6.
488 (match-end 1) ; The end of the match
489 => 6 ; with `qu' is at index 6.
491 (match-end 2) ; The end of the match
492 => 9 ; with `ick' is at index 9.
494 Here is another example. Point is initially located at the beginning
495 of the line. Searching moves point to between the space and the word
496 `in'. The beginning of the entire match is at the 9th character of the
497 buffer (`T'), and the beginning of the match for the first
498 subexpression is at the 13th character (`c').
     (list
       (re-search-forward "The \\(cat \\)")
       (match-beginning 0)
       (match-beginning 1))
         => (17 9 13)

     ---------- Buffer: foo ----------
     I read "The cat -!-in the hat comes back" twice.
             ^   ^
             9  13
     ---------- Buffer: foo ----------
512 (In this case, the index returned is a buffer position; the first
513 character of the buffer counts as 1.)
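The usual pattern is to test the result of the search and only then
ask for the subexpressions.  The following sketch pulls the fields out
of a date-like string; the helper name `sketch-date-fields', the
pattern, and the sample text are all assumptions made for this example.

```elisp
;; Hypothetical helper: split a "DAY MONTH YEAR" date found in TEXT
;; into its three fields using numbered subexpressions.
(defun sketch-date-fields (text)
  "Return (DAY MONTH YEAR) as strings, or nil if TEXT has no date."
  (if (string-match "\\([0-9]+\\) \\([A-Za-z]+\\) \\([0-9]+\\)" text)
      ;; The search was on a string, so pass that string as IN-STRING.
      (list (match-string 1 text)
            (match-string 2 text)
            (match-string 3 text))
    nil))

(sketch-date-fields "Date: 15 Apr 1998")
;; => ("15" "Apr" "1998")
```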
516 File: lispref.info, Node: Replacing Match, Next: Entire Match Data, Prev: Simple Match Data, Up: Match Data
518 Replacing the Text That Matched
519 -------------------------------
This function replaces the text matched by the last search with
REPLACEMENT.
- Function: replace-match replacement &optional fixedcase literal
string
This function replaces the text in the buffer (or in STRING) that
was matched by the last search.  It replaces that text with
REPLACEMENT.
If you did the last search in a buffer, you should specify `nil'
for STRING.  Then `replace-match' does the replacement by editing
the buffer; it leaves point at the end of the replacement text,
and returns `t'.
535 If you did the search in a string, pass the same string as STRING.
536 Then `replace-match' does the replacement by constructing and
537 returning a new string.
539 If FIXEDCASE is non-`nil', then the case of the replacement text
540 is not changed; otherwise, the replacement text is converted to a
541 different case depending upon the capitalization of the text to be
542 replaced. If the original text is all upper case, the replacement
543 text is converted to upper case. If the first word of the
544 original text is capitalized, then the first word of the
545 replacement text is capitalized. If the original text contains
546 just one word, and that word is a capital letter, `replace-match'
547 considers this a capitalized first word rather than all upper case.
549 If `case-replace' is `nil', then case conversion is not done,
regardless of the value of FIXEDCASE.  *Note Searching and Case::.
552 If LITERAL is non-`nil', then REPLACEMENT is inserted exactly as
553 it is, the only alterations being case changes as needed. If it
554 is `nil' (the default), then the character `\' is treated
555 specially. If a `\' appears in REPLACEMENT, then it must be part
556 of one of the following sequences:
`\&'
     `\&' stands for the entire text being replaced.

`\N'
     `\N', where N is a digit, stands for the text that matched
     the Nth subexpression in the original regexp.  Subexpressions
     are those expressions grouped inside `\(...\)'.

`\\'
     `\\' stands for a single `\' in the replacement text.
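A search loop combined with `replace-match' is a simple way to rewrite
every occurrence of a pattern in a buffer.  The sketch below reorders
the fields of dates written as YEAR-MONTH-DAY, using `\N' sequences in
the replacement.  `sketch-reorder-dates' and the sample text are
hypothetical; FIXEDCASE is passed as `t' so that no case conversion is
attempted.

```elisp
;; Hypothetical helper: rewrite YEAR-MONTH-DAY dates in TEXT as
;; DAY/MONTH/YEAR and return the resulting string.
(defun sketch-reorder-dates (text)
  "Return TEXT with each YYYY-MM-DD date rewritten as DD/MM/YYYY."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (while (re-search-forward
            "\\([0-9]+\\)-\\([0-9]+\\)-\\([0-9]+\\)" nil t)
      ;; `\3', `\2' and `\1' refer to the third, second and first
      ;; subexpressions of the regexp just matched.
      (replace-match "\\3/\\2/\\1" t))
    (buffer-string)))

(sketch-reorder-dates "from 1997-11-01 to 1998-04-15")
;; => "from 01/11/1997 to 15/04/1998"
```

Since `replace-match' leaves point at the end of the replacement text,
the next iteration resumes after the text just inserted.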
570 File: lispref.info, Node: Entire Match Data, Next: Saving Match Data, Prev: Replacing Match, Up: Match Data
572 Accessing the Entire Match Data
573 -------------------------------
575 The functions `match-data' and `set-match-data' read or write the
576 entire match data, all at once.
578 - Function: match-data
579 This function returns a newly constructed list containing all the
580 information on what text the last search matched. Element zero is
581 the position of the beginning of the match for the whole
582 expression; element one is the position of the end of the match
583 for the expression. The next two elements are the positions of
584 the beginning and end of the match for the first subexpression,
585 and so on. In general, element number 2N corresponds to
`(match-beginning N)'; and element number 2N + 1 corresponds to
`(match-end N)'.
589 All the elements are markers or `nil' if matching was done on a
590 buffer, and all are integers or `nil' if matching was done on a
591 string with `string-match'. (In Emacs 18 and earlier versions,
markers were used even for matching on a string, except in the case
of the integer 0.)
595 As always, there must be no possibility of intervening searches
596 between the call to a search function and the call to `match-data'
597 that is intended to access the match data for that search.
     (match-data)
          =>  (#<marker at 9 in foo>
               #<marker at 17 in foo>
               #<marker at 13 in foo>
               #<marker at 17 in foo>)
605 - Function: set-match-data match-list
606 This function sets the match data from the elements of MATCH-LIST,
which should be a list that was the value of a previous call to
`match-data'.
610 If MATCH-LIST refers to a buffer that doesn't exist, you don't get
an error; that sets the match data in a meaningless but harmless
way.
614 `store-match-data' is an alias for `set-match-data'.
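After a match against a string, the list consists of integers laid out
in the order described above.  The following sketch shows that layout
for a regexp with two subexpressions; `sketch-match-positions' is a
hypothetical helper.

```elisp
;; Hypothetical helper: match REGEXP against STRING and return the
;; whole match data as a list, or nil if there is no match.
(defun sketch-match-positions (regexp string)
  "Return the match data for REGEXP in STRING, or nil."
  (if (string-match regexp string)
      (match-data)
    nil))

;; Elements 0-1: whole match; 2-3: `qu'; 4-5: `ick'.
(sketch-match-positions "\\(qu\\)\\(ick\\)" "The quick fox")
;; => (4 9 4 6 6 9)
```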
617 File: lispref.info, Node: Saving Match Data, Prev: Entire Match Data, Up: Match Data
619 Saving and Restoring the Match Data
620 -----------------------------------
622 When you call a function that may do a search, you may need to save
623 and restore the match data around that call, if you want to preserve the
624 match data from an earlier search for later use. Here is an example
625 that shows the problem that arises if you fail to save the match data:
     (re-search-forward "The \\(cat \\)")
          => 48
     (foo)                   ; Perhaps `foo' does
                             ;   more searching.
     (match-end 0)
          => 61              ; Unexpected result--not 48!
634 You can save and restore the match data with `save-match-data':
636 - Macro: save-match-data body...
This special form executes BODY, saving and restoring the match
data around it.
640 You can use `set-match-data' together with `match-data' to imitate
641 the effect of the special form `save-match-data'. This is useful for
642 writing code that can run in Emacs 18. Here is how:
     (let ((data (match-data)))
       (unwind-protect
           ...   ; May change the original match data.
         (set-match-data data)))
649 Emacs automatically saves and restores the match data when it runs
650 process filter functions (*note Filter Functions::) and process
651 sentinels (*note Sentinels::).
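The effect of `save-match-data' can be checked directly.  In the
sketch below, an inner search runs between the outer search and the
use of its match data, and the outer result is still reported
afterwards.  `sketch-outer-match-start' is a hypothetical helper.

```elisp
;; Hypothetical helper: find OUTER in TEXT, run an unrelated search
;; for INNER inside `save-match-data', then report where OUTER began.
(defun sketch-outer-match-start (text outer inner)
  "Return the start of OUTER in TEXT despite a search for INNER."
  (if (string-match outer text)
      (progn
        (save-match-data
          ;; This search would otherwise overwrite the match data.
          (string-match inner text))
        (match-beginning 0))
    nil))

(sketch-outer-match-start "The quick fox" "quick" "fox")
;; => 4
```

Without the `save-match-data' wrapper, the final call would report the
position of `fox' (10) instead.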
654 File: lispref.info, Node: Searching and Case, Next: Standard Regexps, Prev: Match Data, Up: Searching and Matching
Searching and Case
==================
659 By default, searches in Emacs ignore the case of the text they are
660 searching through; if you specify searching for `FOO', then `Foo' or
661 `foo' is also considered a match. Regexps, and in particular character
662 sets, are included: thus, `[aB]' would match `a' or `A' or `b' or `B'.
664 If you do not want this feature, set the variable `case-fold-search'
665 to `nil'. Then all letters must match exactly, including case. This
666 is a buffer-local variable; altering the variable affects only the
667 current buffer. (*Note Intro to Buffer-Local::.) Alternatively, you
668 may change the value of `default-case-fold-search', which is the
669 default value of `case-fold-search' for buffers that do not override it.
671 Note that the user-level incremental search feature handles case
672 distinctions differently. When given a lower case letter, it looks for
673 a match of either case, but when given an upper case letter, it looks
for an upper case letter only.  But this has nothing to do with the
search functions used in Lisp code.
677 - User Option: case-replace
678 This variable determines whether the replacement functions should
679 preserve case. If the variable is `nil', that means to use the
680 replacement text verbatim. A non-`nil' value means to convert the
681 case of the replacement text according to the text being replaced.
683 The function `replace-match' is where this variable actually has
684 its effect. *Note Replacing Match::.
686 - User Option: case-fold-search
687 This buffer-local variable determines whether searches should
688 ignore case. If the variable is `nil' they do not ignore case;
689 otherwise they do ignore case.
691 - Variable: default-case-fold-search
692 The value of this variable is the default value for
693 `case-fold-search' in buffers that do not override it. This is the
694 same as `(default-value 'case-fold-search)'.
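In a program it is usually safer to bind `case-fold-search' around a
search than to rely on its current value.  The sketch below runs the
same `string-match' call under both settings;
`sketch-match-with-folding' is a hypothetical helper.

```elisp
;; Hypothetical helper: match REGEXP against STRING with
;; `case-fold-search' temporarily bound to FOLD.
(defun sketch-match-with-folding (regexp string fold)
  "Return the match index for REGEXP in STRING with case folding FOLD."
  (let ((case-fold-search fold))
    (string-match regexp string)))

(sketch-match-with-folding "foo" "FOO" t)
;; => 0
(sketch-match-with-folding "foo" "FOO" nil)
;; => nil
```
<test>
(unless (equal (sketch-match-with-folding "foo" "FOO" t) 0)
  (error "expected a case-insensitive match"))
(unless (null (sketch-match-with-folding "foo" "FOO" nil))
  (error "expected no match when case is significant"))
(unless (equal (sketch-match-with-folding "[aB]" "xA" t) 1)
  (error "character sets should fold case too"))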
697 File: lispref.info, Node: Standard Regexps, Prev: Searching and Case, Up: Searching and Matching
699 Standard Regular Expressions Used in Editing
700 ============================================
702 This section describes some variables that hold regular expressions
703 used for certain purposes in editing:
705 - Variable: page-delimiter
706 This is the regexp describing line-beginnings that separate pages.
707 The default value is `"^\014"' (i.e., `"^^L"' or `"^\C-l"'); this
708 matches a line that starts with a formfeed character.
710 The following two regular expressions should _not_ assume the match
711 always starts at the beginning of a line; they should not use `^' to
712 anchor the match. Most often, the paragraph commands do check for a
713 match only at the beginning of a line, which means that `^' would be
714 superfluous. When there is a nonzero left margin, they accept matches
715 that start after the left margin. In that case, a `^' would be
incorrect.  However, a `^' is harmless in modes where a left margin is
never used.
719 - Variable: paragraph-separate
720 This is the regular expression for recognizing the beginning of a
721 line that separates paragraphs. (If you change this, you may have
722 to change `paragraph-start' also.) The default value is
723 `"[ \t\f]*$"', which matches a line that consists entirely of
724 spaces, tabs, and form feeds (after its left margin).
726 - Variable: paragraph-start
727 This is the regular expression for recognizing the beginning of a
728 line that starts _or_ separates paragraphs. The default value is
729 `"[ \t\n\f]"', which matches a line starting with a space, tab,
730 newline, or form feed (after its left margin).
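To see how such a regexp is applied at the beginning of a line, the
sketch below classifies each line of a string with `looking-at', using
the documented default value of `paragraph-separate' written out
literally.  It assumes no left margin, and it is only an illustration:
the paragraph commands themselves may apply further rules.
`sketch-classify-lines' is a hypothetical helper.

```elisp
;; Hypothetical helper: mark each line of TEXT as `separator' or
;; `text', testing the documented default of `paragraph-separate'
;; at the beginning of the line (no left margin assumed).
(defun sketch-classify-lines (text)
  "Return a list with one symbol per line of TEXT."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((kinds '()))
      (while (not (eobp))
        (setq kinds (cons (if (looking-at "[ \t\f]*$")
                              'separator
                            'text)
                          kinds))
        (forward-line 1))
      (nreverse kinds))))

(sketch-classify-lines "first paragraph\n \t\nsecond paragraph\n")
;; => (text separator text)
```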
732 - Variable: sentence-end
733 This is the regular expression describing the end of a sentence.
(All paragraph boundaries also end sentences, regardless.)  The
default value is:

     "[.?!][]\"')}]*\\($\\| $\\|\t\\|  \\)[ \t\n]*"

This means a period, question mark or exclamation mark, followed
optionally by a closing parenthetical character, followed by tabs,
spaces or new lines.

For a detailed explanation of this regular expression, see *Note
Regexp Example::.
747 File: lispref.info, Node: Syntax Tables, Next: Abbrevs, Prev: Searching and Matching, Up: Top
Syntax Tables
*************
752 A "syntax table" specifies the syntactic textual function of each
753 character. This information is used by the parsing commands, the
754 complex movement commands, and others to determine where words, symbols,
755 and other syntactic constructs begin and end. The current syntax table
756 controls the meaning of the word motion functions (*note Word Motion::)
757 and the list motion functions (*note List Motion::) as well as the
758 functions in this chapter.
* Menu:
762 * Basics: Syntax Basics. Basic concepts of syntax tables.
763 * Desc: Syntax Descriptors. How characters are classified.
764 * Syntax Table Functions:: How to create, examine and alter syntax tables.
765 * Motion and Syntax:: Moving over characters with certain syntaxes.
766 * Parsing Expressions:: Parsing balanced expressions
767 using the syntax table.
768 * Standard Syntax Tables:: Syntax tables used by various major modes.
769 * Syntax Table Internals:: How syntax table information is stored.
772 File: lispref.info, Node: Syntax Basics, Next: Syntax Descriptors, Up: Syntax Tables
774 Syntax Table Concepts
775 =====================
777 A "syntax table" provides Emacs with the information that determines
778 the syntactic use of each character in a buffer. This information is
779 used by the parsing commands, the complex movement commands, and others
780 to determine where words, symbols, and other syntactic constructs begin
781 and end. The current syntax table controls the meaning of the word
782 motion functions (*note Word Motion::) and the list motion functions
783 (*note List Motion::) as well as the functions in this chapter.
785 Under XEmacs 20, a syntax table is a particular subtype of the
786 primitive char table type (*note Char Tables::), and each element of the
787 char table is an integer that encodes the syntax of the character in
788 question, or a cons of such an integer and a matching character (for
789 characters with parenthesis syntax).
791 Under XEmacs 19, a syntax table is a vector of 256 elements; it
792 contains one entry for each of the 256 possible characters in an 8-bit
793 byte. Each element is an integer that encodes the syntax of the
794 character in question. (The matching character, if any, is embedded in
795 the bits of this integer.)
797 Syntax tables are used only for moving across text, not for the Emacs
798 Lisp reader. XEmacs Lisp uses built-in syntactic rules when reading
799 Lisp expressions, and these rules cannot be changed.
801 Each buffer has its own major mode, and each major mode has its own
802 idea of the syntactic class of various characters. For example, in Lisp
803 mode, the character `;' begins a comment, but in C mode, it terminates
804 a statement. To support these variations, XEmacs makes the choice of
805 syntax table local to each buffer. Typically, each major mode has its
806 own syntax table and installs that table in each buffer that uses that
807 mode. Changing this table alters the syntax in all those buffers as
808 well as in any buffers subsequently put in that mode. Occasionally
809 several similar modes share one syntax table. *Note Example Major
810 Modes::, for an example of how to set up a syntax table.
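As a minimal sketch of the arrangement just described, the following defines one syntax table for a mode and installs it in each buffer that enters the mode. The names `sample-mode' and `sample-mode-syntax-table' are hypothetical and exist only for this illustration; a real major mode does considerably more setup.

```elisp
;; Hypothetical mode; the names are assumptions made for this sketch.
(defvar sample-mode-syntax-table
  (let ((table (make-syntax-table)))
    ;; As in Lisp mode: `;' starts a comment and newline ends it.
    (modify-syntax-entry ?\; "<" table)
    (modify-syntax-entry ?\n ">" table)
    table)
  "Syntax table shared by every buffer in the hypothetical `sample-mode'.")

(defun sample-mode ()
  "Install `sample-mode-syntax-table' in the current buffer."
  (interactive)
  (set-syntax-table sample-mode-syntax-table))
```

Because every buffer in the mode holds the same table object, a later `modify-syntax-entry' on `sample-mode-syntax-table' is seen by all of them.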
812 A syntax table can inherit the data for some characters from the
813 standard syntax table, while specifying other characters itself. The
814 "inherit" syntax class means "inherit this character's syntax from the
815 standard syntax table." Most major modes' syntax tables inherit the
816 syntax of character codes 0 through 31 and 128 through 255. This is
817 useful with character sets such as ISO Latin-1 that have additional
818 alphabetic characters in the range 128 to 255. Just changing the
819 standard syntax for these characters affects all major modes.
821 - Function: syntax-table-p object
822 This function returns `t' if OBJECT is a vector of 256 elements,
823 the XEmacs 19 representation of a syntax table described above.
824 According to this test, any vector of length 256 is considered to
825 be a syntax table, no matter what its contents.
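One possible use of this predicate is to guard a call to `set-syntax-table'. The helper name `sample-install-table' is hypothetical, and the sketch assumes only that `syntax-table-p' accepts tables made by `make-syntax-table' and rejects a string.

```elisp
;; Hypothetical helper, not part of XEmacs.
(defun sample-install-table (table)
  "Install TABLE in the current buffer if it passes `syntax-table-p'.
Signal an error otherwise."
  (if (syntax-table-p table)
      (set-syntax-table table)
    (error "Not a syntax table: %S" table)))
```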
828 File: lispref.info, Node: Syntax Descriptors, Next: Syntax Table Functions, Prev: Syntax Basics, Up: Syntax Tables
830 Syntax Descriptors
831 ==================
833 This section describes the syntax classes and flags that denote the
834 syntax of a character, and how they are represented as a "syntax
835 descriptor", which is a Lisp string that you pass to
836 `modify-syntax-entry' to specify the desired syntax.
838 XEmacs defines a number of "syntax classes". Each syntax table puts
839 each character into one class. There is no necessary relationship
840 between the class of a character in one syntax table and its class in
841 any other table.
843 Each class is designated by a mnemonic character, which serves as the
844 name of the class when you need to specify a class. Usually the
845 designator character is one that is frequently in that class; however,
846 its meaning as a designator is unvarying and independent of what syntax
847 that character currently has.
849 A syntax descriptor is a Lisp string that specifies a syntax class, a
850 matching character (used only for the parenthesis classes) and flags.
851 The first character is the designator for a syntax class. The second
852 character is the character to match; if it is unused, put a space there.
853 Then come the characters for any desired flags. If no matching
854 character or flags are needed, one character is sufficient.
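The layout of a descriptor can be sketched as a small function that assembles one from its parts. The name `sample-syntax-descriptor' is hypothetical; the function only builds the string and does not check that the class designator or flags are valid.

```elisp
;; Hypothetical helper, not part of XEmacs.
(defun sample-syntax-descriptor (class &optional match flags)
  "Build a syntax descriptor string from CLASS, MATCH and FLAGS.
CLASS and MATCH are characters; FLAGS is a string of flag characters.
A space fills the matching-character slot when only flags are given."
  (let ((flags (or flags "")))
    (concat (char-to-string class)
            (cond (match (char-to-string match))
                  ((> (length flags) 0) " ")
                  (t ""))
            flags)))

(sample-syntax-descriptor ?w)            ; => "w"
(sample-syntax-descriptor ?\( ?\))       ; => "()"
(sample-syntax-descriptor ?. nil "23")   ; => ". 23"
```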
856 For example, the descriptor for the character `*' in C mode is
857 `. 23' (i.e., punctuation, matching character slot unused, second
858 character of a comment-starter, first character of a comment-ender),
859 and the entry for `/' is `. 14' (i.e., punctuation, matching character
860 slot unused, first character of a comment-starter, second character of
861 a comment-ender).
865 * Syntax Class Table:: Table of syntax classes.
866 * Syntax Flags:: Additional flags each character can have.
869 File: lispref.info, Node: Syntax Class Table, Next: Syntax Flags, Up: Syntax Descriptors
871 Table of Syntax Classes
872 -----------------------
874 Here is a table of syntax classes, the characters that stand for
875 them, their meanings, and examples of their use.
877 - Syntax class: whitespace character
878 "Whitespace characters" (designated with ` ' or `-') separate
879 symbols and words from each other. Typically, whitespace
880 characters have no other syntactic significance, and multiple
881 whitespace characters are syntactically equivalent to a single
882 one. Space, tab, newline and formfeed are almost always
883 classified as whitespace.
885 - Syntax class: word constituent
886 "Word constituents" (designated with `w') are parts of normal
887 English words and are typically used in variable and command names
888 in programs. All upper- and lower-case letters, and the digits,
889 are typically word constituents.
891 - Syntax class: symbol constituent
892 "Symbol constituents" (designated with `_') are the extra
893 characters that are used in variable and command names along with
894 word constituents. For example, the symbol constituents class is
895 used in Lisp mode to indicate that certain characters may be part
896 of symbol names even though they are not part of English words.
897 These characters are `$&*+-_<>'. In standard C, the only
898 non-word-constituent character that is valid in symbols is
899 underscore (`_').
901 - Syntax class: punctuation character
902 "Punctuation characters" (`.') are those characters that are used
903 as punctuation in English, or are used in some way in a programming
904 language to separate symbols from one another. Most programming
905 language modes, including Emacs Lisp mode, have no characters in
906 this class since the few characters that are not symbol or word
907 constituents all have other uses.
909 - Syntax class: open parenthesis character
910 - Syntax class: close parenthesis character
911 Open and close "parenthesis characters" are characters used in
912 dissimilar pairs to surround sentences or expressions. Such a
913 grouping is begun with an open parenthesis character and
914 terminated with a close. Each open parenthesis character matches
915 a particular close parenthesis character, and vice versa.
916 Normally, XEmacs momentarily indicates the matching open
917 parenthesis when you insert a close parenthesis. *Note Blinking::.
919 The class of open parentheses is designated with `(', and that of
920 close parentheses with `)'.
922 In English text, and in C code, the parenthesis pairs are `()',
923 `[]', and `{}'. In XEmacs Lisp, the delimiters for lists and
924 vectors (`()' and `[]') are classified as parenthesis characters.
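Here is a sketch of how a parenthesis pair affects list motion. It makes `<' and `>' a matching pair in a fresh table and then moves over one balanced expression with `forward-sexp' (*note List Motion::). The function name `sample-angle-sexp-end' is hypothetical.

```elisp
;; Hypothetical example function, not part of XEmacs.
(defun sample-angle-sexp-end ()
  "Return point after moving over the expression `<a b>'.
Uses a temporary buffer and a table in which `<' and `>' match."
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?< "(>" table)   ; open, matched by `>'
    (modify-syntax-entry ?> ")<" table)   ; close, matched by `<'
    (with-temp-buffer
      (set-syntax-table table)
      (insert "<a b> c")
      (goto-char (point-min))
      (forward-sexp 1)
      (point))))

(sample-angle-sexp-end)   ; => 6, just after the `>'
```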
926 - Syntax class: string quote
927 "String quote characters" (designated with `"') are used in many
928 languages, including Lisp and C, to delimit string constants. The
929 same string quote character appears at the beginning and the end
930 of a string. Such quoted strings do not nest.
932 The parsing facilities of XEmacs consider a string as a single
933 token. The usual syntactic meanings of the characters in the
934 string are suppressed.
936 The Lisp modes have two string quote characters: double-quote (`"')
937 and vertical bar (`|'). `|' is not used in XEmacs Lisp, but it is
938 used in Common Lisp. C also has two string quote characters:
939 double-quote for strings, and single-quote (`'') for character
940 constants.
942 English text has no string quote characters because English is not
943 a programming language. Although quotation marks are used in
944 English, we do not want them to turn off the usual syntactic
945 properties of other characters in the quotation.
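The single-token treatment of strings can be seen in a short sketch. The open parenthesis inside the string below does not start a list, so `forward-sexp' moves over the whole string at once. The function name `sample-string-token-end' is hypothetical.

```elisp
;; Hypothetical example function, not part of XEmacs.
(defun sample-string-token-end ()
  "Return point after moving over a string that contains `('."
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?\" "\"" table)  ; double-quote is a string quote
    (with-temp-buffer
      (set-syntax-table table)
      (insert "\"a (b\" c")
      (goto-char (point-min))
      (forward-sexp 1)
      (point))))

(sample-string-token-end)   ; => 7, just after the closing quote
```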
947 - Syntax class: escape
948 An "escape character" (designated with `\') starts an escape
949 sequence such as is used in C string and character constants. The
950 character `\' belongs to this class in both C and Lisp. (In C, it
951 is used thus only inside strings, but it turns out to cause no
952 trouble to treat it this way throughout C code.)
954 Characters in this class count as part of words if
955 `words-include-escapes' is non-`nil'. *Note Word Motion::.
957 - Syntax class: character quote
958 A "character quote character" (designated with `/') quotes the
959 following character so that it loses its normal syntactic meaning.
960 This differs from an escape character in that only the character
961 immediately following is ever affected.
963 Characters in this class count as part of words if
964 `words-include-escapes' is non-`nil'. *Note Word Motion::.
966 This class is used for backslash in TeX mode.
968 - Syntax class: paired delimiter
969 "Paired delimiter characters" (designated with `$') are like
970 string quote characters except that the syntactic properties of the
971 characters between the delimiters are not suppressed. Only TeX
972 mode uses a paired delimiter presently--the `$' that both enters
973 and leaves math mode.
975 - Syntax class: expression prefix
976 An "expression prefix operator" (designated with `'') is used for
977 syntactic operators that are part of an expression if they appear
978 next to one. These characters in Lisp include the apostrophe, `''
979 (used for quoting), the comma, `,' (used in macros), and `#' (used
980 in the read syntax for certain data types).
982 - Syntax class: comment starter
983 - Syntax class: comment ender
984 The "comment starter" and "comment ender" characters are used in
985 various languages to delimit comments. These classes are
986 designated with `<' and `>', respectively.
988 English text has no comment characters. In Lisp, the semicolon
989 (`;') starts a comment and a newline or formfeed ends one.
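As a sketch of these two classes working together, the following gives `;' comment-starter syntax and newline comment-ender syntax, then skips one comment with `forward-comment'. That function is not described in this section and is assumed here to move over one whole comment, including its ender. The name `sample-skip-line-comment' is hypothetical.

```elisp
;; Hypothetical example function, not part of XEmacs.
(defun sample-skip-line-comment ()
  "Return point after skipping the comment in \"; note\\ncode\"."
  (let ((table (make-syntax-table)))
    (modify-syntax-entry ?\; "<" table)   ; comment starter
    (modify-syntax-entry ?\n ">" table)   ; comment ender
    (with-temp-buffer
      (set-syntax-table table)
      (insert "; note\ncode")
      (goto-char (point-min))
      (forward-comment 1)
      (point))))
```

After the call, point should be at the start of `code', just past the newline that ends the comment.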
991 - Syntax class: inherit
992 This syntax class does not specify a syntax. It says to look in
993 the standard syntax table to find the syntax of this character.
994 The designator for this syntax code is `@'.
997 File: lispref.info, Node: Syntax Flags, Prev: Syntax Class Table, Up: Syntax Descriptors
999 Syntax Flags
1000 ------------
1002 In addition to the classes, entries for characters in a syntax table
1003 can include flags. There are six possible flags, represented by the
1004 characters `1', `2', `3', `4', `b' and `p'.
1006 All the flags except `p' are used to describe multi-character
1007 comment delimiters. The digit flags indicate that a character can
1008 _also_ be part of a comment sequence, in addition to the syntactic
1009 properties associated with its character class. The flags are
1010 independent of the class and each other for the sake of characters such
1011 as `*' in C mode, which is a punctuation character, _and_ the second
1012 character of a start-of-comment sequence (`/*'), _and_ the first
1013 character of an end-of-comment sequence (`*/').
1015 The flags for a character C are:
1017 * `1' means C is the start of a two-character comment-start sequence.
1019 * `2' means C is the second character of such a sequence.
1021 * `3' means C is the start of a two-character comment-end sequence.
1023 * `4' means C is the second character of such a sequence.
1025 * `b' means that C as a comment delimiter belongs to the alternative
1026 "b" comment style.
1028 Emacs supports two comment styles simultaneously in any one syntax
1029 table. This is for the sake of C++. Each style of comment syntax
1030 has its own comment-start sequence and its own comment-end
1031 sequence. Each comment must stick to one style or the other;
1032 thus, if it starts with the comment-start sequence of style "b",
1033 it must also end with the comment-end sequence of style "b".
1035 The two comment-start sequences must begin with the same
1036 character; only the second character may differ. Mark the second
1037 character of the "b"-style comment-start sequence with the `b'
1038 flag.
1040 A comment-end sequence (one or two characters) applies to the "b"
1041 style if its first character has the `b' flag set; otherwise, it
1042 applies to the "a" style.
1044 The appropriate comment syntax settings for C++ are as follows:
1046 `/'
1047 `124b'
1049 `*'
1050 `23'
1052 newline
1053 `>b'
1055 This defines four comment-delimiting sequences:
1057 `/*'
1058 This is a comment-start sequence for "a" style because the
1059 second character, `*', does not have the `b' flag.
1061 `//'
1062 This is a comment-start sequence for "b" style because the
1063 second character, `/', does have the `b' flag.
1065 `*/'
1066 This is a comment-end sequence for "a" style because the first
1067 character, `*', does not have the `b' flag.
1069 newline
1070 This is a comment-end sequence for "b" style, because the
1071 newline character has the `b' flag.
1073 * `p' identifies an additional "prefix character" for Lisp syntax.
1074 These characters are treated as whitespace when they appear between
1075 expressions. When they appear within an expression, they are
1076 handled according to their usual syntax codes.
1078 The function `backward-prefix-chars' moves back over these
1079 characters, as well as over characters whose primary syntax class
1080 is prefix (`''). *Note Motion and Syntax::.
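The digit flags can be sketched with the C-style delimiters `/*' and `*/', using the descriptors quoted earlier for C mode (`. 14' for `/' and `. 23' for `*'). Only the "a" comment style is involved, so the `b' flag is not used. The function name `sample-skip-block-comment' is hypothetical, and `forward-comment' is assumed to move over one whole comment.

```elisp
;; Hypothetical example function, not part of XEmacs.
(defun sample-skip-block-comment ()
  "Return point after skipping the comment in \"/* hi */x\"."
  (let ((table (make-syntax-table)))
    ;; `/' is punctuation, first char of `/*', second char of `*/'.
    (modify-syntax-entry ?/ ". 14" table)
    ;; `*' is punctuation, second char of `/*', first char of `*/'.
    (modify-syntax-entry ?* ". 23" table)
    (with-temp-buffer
      (set-syntax-table table)
      (insert "/* hi */x")
      (goto-char (point-min))
      (forward-comment 1)
      (point))))
```

After the call, point should be just before the `x', immediately after the two-character comment-end sequence.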
1083 File: lispref.info, Node: Syntax Table Functions, Next: Motion and Syntax, Prev: Syntax Descriptors, Up: Syntax Tables
1085 Syntax Table Functions
1086 ======================
1088 In this section we describe functions for creating, accessing and
1089 altering syntax tables.
1091 - Function: make-syntax-table &optional table
1092 This function creates a new syntax table. Character codes 0
1093 through 31 and 128 through 255 are set up to inherit from the
1094 standard syntax table. The other character codes are set up by
1095 copying what the standard syntax table says about them.
1097 Most major mode syntax tables are created in this way.
1099 - Function: copy-syntax-table &optional table
1100 This function constructs a copy of TABLE and returns it. If TABLE
1101 is not supplied (or is `nil'), it returns a copy of the current
1102 syntax table. Otherwise, an error is signaled if TABLE is not a
1103 syntax table.
1105 - Command: modify-syntax-entry char syntax-descriptor &optional table
1106 This function sets the syntax entry for CHAR according to
1107 SYNTAX-DESCRIPTOR. The syntax is changed only for TABLE, which
1108 defaults to the current buffer's syntax table, and not in any
1109 other syntax table. The argument SYNTAX-DESCRIPTOR specifies the
1110 desired syntax; this is a string beginning with a class designator
1111 character, and optionally containing a matching character and
1112 flags as well. *Note Syntax Descriptors::.
1114 This function always returns `nil'. The old syntax information in
1115 the table for this character is discarded.
1117 An error is signaled if the first character of the syntax
1118 descriptor is not one of the syntax class designator
1119 characters. An error is also signaled if CHAR is not a character.
1123 ;; Put the space character in class whitespace.
1124 (modify-syntax-entry ?\ " ")
1127 ;; Make `$' an open parenthesis character,
1128 ;; with `^' as its matching close.
1129 (modify-syntax-entry ?$ "(^")
1132 ;; Make `^' a close parenthesis character,
1133 ;; with `$' as its matching open.
1134 (modify-syntax-entry ?^ ")$")
1137 ;; Make `/' a punctuation character,
1138 ;; the first character of a start-comment sequence,
1139 ;; and the second character of an end-comment sequence.
1140 ;; This is used in C mode.
1141 (modify-syntax-entry ?/ ". 14")
1144 - Function: char-syntax character
1145 This function returns the syntax class of CHARACTER, represented
1146 by its mnemonic designator character. This _only_ returns the
1147 class, not any matching parenthesis or flags.
1149 An error is signaled if CHARACTER is not a character.
1151 The following examples apply to C mode. The first example shows
1152 that the syntax class of space is whitespace (represented by a
1153 space). The second example shows that the syntax of `/' is
1154 punctuation. This does not show the fact that it is also part of
1155 comment-start and -end sequences. The third example shows that
1156 open parenthesis is in the class of open parentheses. This does
1157 not show the fact that it has a matching character, `)'.
1159 (char-to-string (char-syntax ?\ ))
1160 => " "
1162 (char-to-string (char-syntax ?/))
1163 => "."
1165 (char-to-string (char-syntax ?\())
1166 => "("
1168 - Function: set-syntax-table table &optional buffer
1169 This function makes TABLE the syntax table for BUFFER, which
1170 defaults to the current buffer if omitted. It returns TABLE.
1172 - Function: syntax-table &optional buffer
1173 This function returns the syntax table for BUFFER, which defaults
1174 to the current buffer if omitted.
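These two functions are often used together to switch tables temporarily. The following sketch saves the current table, installs another one, and restores the original even if an error occurs. The name `sample-call-with-table' is hypothetical, and the sketch assumes that FUNCTION does not switch buffers.

```elisp
;; Hypothetical helper, not part of XEmacs.
(defun sample-call-with-table (table function)
  "Call FUNCTION with TABLE as the current buffer's syntax table.
The previous table is restored afterwards, even after an error.
The value is whatever FUNCTION returns."
  (let ((old (syntax-table)))
    (unwind-protect
        (progn
          (set-syntax-table table)
          (funcall function))
      (set-syntax-table old))))
```

For example, a command could parse text under a modified copy of the mode's table this way without disturbing the table the buffer normally uses.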
1177 File: lispref.info, Node: Motion and Syntax, Next: Parsing Expressions, Prev: Syntax Table Functions, Up: Syntax Tables
1179 Motion and Syntax
1180 =================
1182 This section describes functions for moving across characters in
1183 certain syntax classes. None of these functions exists in Emacs
1184 version 18 or earlier.
1186 - Function: skip-syntax-forward syntaxes &optional limit buffer
1187 This function moves point forward across characters having syntax
1188 classes mentioned in SYNTAXES. It stops when it encounters the
1189 end of the buffer, or position LIMIT (if specified), or a
1190 character it is not supposed to skip. Optional argument BUFFER
1191 defaults to the current buffer if omitted.
1193 - Function: skip-syntax-backward syntaxes &optional limit buffer
1194 This function moves point backward across characters whose syntax
1195 classes are mentioned in SYNTAXES. It stops when it encounters
1196 the beginning of the buffer, or position LIMIT (if specified), or a
1197 character it is not supposed to skip. Optional argument BUFFER
1198 defaults to the current buffer if omitted.
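Here is a sketch that combines two calls to `skip-syntax-forward': the first skips leading whitespace, and the second skips the word that follows. The function name `sample-first-word' is hypothetical, and the sketch assumes the temporary buffer uses a table in which letters are word constituents and space is whitespace.

```elisp
;; Hypothetical helper, not part of XEmacs.
(defun sample-first-word (string)
  "Return the first word in STRING, ignoring leading whitespace."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (skip-syntax-forward " ")             ; whitespace class
    (let ((start (point)))
      (skip-syntax-forward "w")           ; word-constituent class
      (buffer-substring start (point)))))

(sample-first-word "  hello world")   ; => "hello"
```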
1201 - Function: backward-prefix-chars &optional buffer
1202 This function moves point backward over any number of characters
1203 with expression prefix syntax. This includes both characters in
1204 the expression prefix syntax class, and characters with the `p'
1205 flag. Optional argument BUFFER defaults to the current buffer if
1206 omitted.