-This is Info file ../info/lispref.info, produced by Makeinfo version
-1.68 from the input file lispref/lispref.texi.
+This is ../info/lispref.info, produced by makeinfo version 4.0 from
+lispref/lispref.texi.
INFO-DIR-SECTION XEmacs Editor
START-INFO-DIR-ENTRY
Foundation instead of in the original English.
\1f
-File: lispref.info, Node: Entire Match Data, Next: Saving Match Data, Prev: Replacing Match, Up: Match Data
-
-Accessing the Entire Match Data
--------------------------------
-
- The functions `match-data' and `set-match-data' read or write the
-entire match data, all at once.
-
- - Function: match-data
- This function returns a newly constructed list containing all the
- information on what text the last search matched. Element zero is
- the position of the beginning of the match for the whole
- expression; element one is the position of the end of the match
- for the expression. The next two elements are the positions of
- the beginning and end of the match for the first subexpression,
- and so on. In general, element number 2N corresponds to
- `(match-beginning N)'; and element number 2N + 1 corresponds to
- `(match-end N)'.
-
- All the elements are markers or `nil' if matching was done on a
- buffer, and all are integers or `nil' if matching was done on a
- string with `string-match'. (In Emacs 18 and earlier versions,
- markers were used even for matching on a string, except in the case
- of the integer 0.)
-
- As always, there must be no possibility of intervening searches
- between the call to a search function and the call to `match-data'
- that is intended to access the match data for that search.
-
- (match-data)
- => (#<marker at 9 in foo>
- #<marker at 17 in foo>
- #<marker at 13 in foo>
- #<marker at 17 in foo>)
-
- - Function: set-match-data MATCH-LIST
- This function sets the match data from the elements of MATCH-LIST,
- which should be a list that was the value of a previous call to
- `match-data'.
-
- If MATCH-LIST refers to a buffer that doesn't exist, you don't get
- an error; that sets the match data in a meaningless but harmless
- way.
-
- `store-match-data' is an alias for `set-match-data'.
+File: lispref.info, Node: Change Hooks, Next: Transformations, Prev: Transposition, Up: Text
+
+Change Hooks
+============
+
+ These hook variables let you arrange to take notice of all changes in
+all buffers (or in a particular buffer, if you make them buffer-local).
+
+ The functions you use in these hooks should save and restore the
+match data if they do anything that uses regular expressions;
+otherwise, they will interfere in bizarre ways with the editing
+operations that call them.
+
+ Buffer changes made while executing the following hooks don't
+themselves cause any change hooks to be invoked.
+
+ - Variable: before-change-functions
+ This variable holds a list of functions to call before any buffer
+ modification. Each function gets two arguments, the beginning and
+ end of the region that is about to change, represented as
+ integers. The buffer that is about to change is always the
+ current buffer.
+
+ - Variable: after-change-functions
+ This variable holds a list of functions to call after any buffer
+ modification. Each function receives three arguments: the
+ beginning and end of the region just changed, and the length of
+ the text that existed before the change. (To get the current
+ length, subtract the region beginning from the region end.) All
+ three arguments are integers. The buffer that has just changed
+ is always the current buffer.
+
+ - Variable: before-change-function
+ This obsolete variable holds one function to call before any buffer
+ modification (or `nil' for no function). It is called just like
+ the functions in `before-change-functions'.
+
+ - Variable: after-change-function
+ This obsolete variable holds one function to call after any buffer
+ modification (or `nil' for no function). It is called just like
+ the functions in `after-change-functions'.
+
+ - Variable: first-change-hook
+ This variable is a normal hook that is run whenever a buffer is
+ changed that was previously in the unmodified state.
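+
+ As a minimal sketch of the advice above, the following function is
+meant for `after-change-functions' and wraps its regular expression
+match in `save-match-data', so the editing operation that triggered it
+keeps its own match data. The names `my-change-log' and
+`my-log-change' are hypothetical, chosen only for illustration.

```elisp
;; Hypothetical helper names; nothing here is part of XEmacs itself.
(defvar my-change-log nil
  "List of (BEG END OLD-LENGTH) records, most recent first.")

(defun my-log-change (beg end old-length)
  "Record one buffer change; intended for `after-change-functions'."
  ;; The regexp match below would otherwise overwrite the match data
  ;; of whatever command caused this change.
  (save-match-data
    ;; Skip internal buffers, whose names start with a space.
    (if (not (string-match "^ " (buffer-name)))
        (setq my-change-log
              (cons (list beg end old-length) my-change-log)))))

(add-hook 'after-change-functions 'my-log-change)
```

+ Each record holds the three integer arguments the hook receives, so
+the current length of a changed region is END minus BEG.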
\1f
-File: lispref.info, Node: Saving Match Data, Prev: Entire Match Data, Up: Match Data
+File: lispref.info, Node: Transformations, Prev: Change Hooks, Up: Text
-Saving and Restoring the Match Data
------------------------------------
+Textual transformations--MD5 and base64 support
+===============================================
- When you call a function that may do a search, you may need to save
-and restore the match data around that call, if you want to preserve the
-match data from an earlier search for later use. Here is an example
-that shows the problem that arises if you fail to save the match data:
+ Some textual operations inherently require examining each character
+in turn, and performing arithmetic operations on them. Such operations
+can, of course, be implemented in Emacs Lisp, but tend to be very slow
+for large portions of text or data. This is why some of them are
+implemented in C, with an appropriate interface for Lisp programmers.
+Examples of algorithms thus provided are MD5 and base64 support.
- (re-search-forward "The \\(cat \\)")
- => 48
- (foo) ; Perhaps `foo' does
- ; more searching.
- (match-end 0)
- => 61 ; Unexpected result--not 48!
+ MD5 is an algorithm for calculating message digests, as described in
+RFC 1321. Given a message of arbitrary length, MD5 produces a 128-bit
+"fingerprint" ("message digest") corresponding to that message. It is
+considered computationally infeasible to produce a message having a
+prespecified target digest, although practical attacks that produce two
+messages having the same MD5 digest have since been published. MD5 has
+been used heavily by various authentication schemes.
- You can save and restore the match data with `save-match-data':
+ The Emacs Lisp interface to MD5 consists of a single function `md5':
- - Macro: save-match-data BODY...
- This special form executes BODY, saving and restoring the match
- data around it.
+ - Function: md5 object &optional start end coding noerror
+ This function returns the MD5 message digest of OBJECT, a buffer
+ or string.
- You can use `set-match-data' together with `match-data' to imitate
-the effect of the special form `save-match-data'. This is useful for
-writing code that can run in Emacs 18. Here is how:
+ Optional arguments START and END denote positions for computing
+ the digest of a portion of OBJECT.
- (let ((data (match-data)))
- (unwind-protect
- ... ; May change the original match data.
- (set-match-data data)))
+ The optional CODING argument specifies the coding system the text
+ is to be represented in while computing the digest. If
+ unspecified, it defaults to the current format of the data, or is
+ guessed.
- Emacs automatically saves and restores the match data when it runs
-process filter functions (*note Filter Functions::.) and process
-sentinels (*note Sentinels::.).
+ If NOERROR is non-`nil', silently assume binary coding if the
+ guesswork fails. Normally, an error is signaled in such a case.
-\1f
-File: lispref.info, Node: Searching and Case, Next: Standard Regexps, Prev: Match Data, Up: Searching and Matching
-
-Searching and Case
-==================
-
- By default, searches in Emacs ignore the case of the text they are
-searching through; if you specify searching for `FOO', then `Foo' or
-`foo' is also considered a match. Regexps, and in particular character
-sets, are included: thus, `[aB]' would match `a' or `A' or `b' or `B'.
-
- If you do not want this feature, set the variable `case-fold-search'
-to `nil'. Then all letters must match exactly, including case. This
-is a buffer-local variable; altering the variable affects only the
-current buffer. (*Note Intro to Buffer-Local::.) Alternatively, you
-may change the value of `default-case-fold-search', which is the
-default value of `case-fold-search' for buffers that do not override it.
-
- Note that the user-level incremental search feature handles case
-distinctions differently. When given a lower case letter, it looks for
-a match of either case, but when given an upper case letter, it looks
-for an upper case letter only. But this has nothing to do with the
-searching functions Lisp functions use.
-
- - User Option: case-replace
- This variable determines whether the replacement functions should
- preserve case. If the variable is `nil', that means to use the
- replacement text verbatim. A non-`nil' value means to convert the
- case of the replacement text according to the text being replaced.
-
- The function `replace-match' is where this variable actually has
- its effect. *Note Replacing Match::.
-
- - User Option: case-fold-search
- This buffer-local variable determines whether searches should
- ignore case. If the variable is `nil' they do not ignore case;
- otherwise they do ignore case.
-
- - Variable: default-case-fold-search
- The value of this variable is the default value for
- `case-fold-search' in buffers that do not override it. This is the
- same as `(default-value 'case-fold-search)'.
-
-\1f
-File: lispref.info, Node: Standard Regexps, Prev: Searching and Case, Up: Searching and Matching
-
-Standard Regular Expressions Used in Editing
-============================================
-
- This section describes some variables that hold regular expressions
-used for certain purposes in editing:
-
- - Variable: page-delimiter
- This is the regexp describing line-beginnings that separate pages.
- The default value is `"^\014"' (i.e., `"^^L"' or `"^\C-l"'); this
- matches a line that starts with a formfeed character.
-
- The following two regular expressions should *not* assume the match
-always starts at the beginning of a line; they should not use `^' to
-anchor the match. Most often, the paragraph commands do check for a
-match only at the beginning of a line, which means that `^' would be
-superfluous. When there is a nonzero left margin, they accept matches
-that start after the left margin. In that case, a `^' would be
-incorrect. However, a `^' is harmless in modes where a left margin is
-never used.
-
- - Variable: paragraph-separate
- This is the regular expression for recognizing the beginning of a
- line that separates paragraphs. (If you change this, you may have
- to change `paragraph-start' also.) The default value is
- `"[ \t\f]*$"', which matches a line that consists entirely of
- spaces, tabs, and form feeds (after its left margin).
-
- - Variable: paragraph-start
- This is the regular expression for recognizing the beginning of a
- line that starts *or* separates paragraphs. The default value is
- `"[ \t\n\f]"', which matches a line starting with a space, tab,
- newline, or form feed (after its left margin).
-
- - Variable: sentence-end
- This is the regular expression describing the end of a sentence.
- (All paragraph boundaries also end sentences, regardless.) The
- default value is:
-
- "[.?!][]\"')}]*\\($\\| $\\|\t\\| \\)[ \t\n]*"
-
- This means a period, question mark or exclamation mark, followed
- optionally by a closing parenthetical character, followed by tabs,
- spaces or new lines.
-
- For a detailed explanation of this regular expression, see *Note
- Regexp Example::.
-
-\1f
-File: lispref.info, Node: Syntax Tables, Next: Abbrevs, Prev: Searching and Matching, Up: Top
-
-Syntax Tables
-*************
-
- A "syntax table" specifies the syntactic textual function of each
-character. This information is used by the parsing commands, the
-complex movement commands, and others to determine where words, symbols,
-and other syntactic constructs begin and end. The current syntax table
-controls the meaning of the word motion functions (*note Word Motion::.)
-and the list motion functions (*note List Motion::.) as well as the
-functions in this chapter.
-
-* Menu:
-
-* Basics: Syntax Basics. Basic concepts of syntax tables.
-* Desc: Syntax Descriptors. How characters are classified.
-* Syntax Table Functions:: How to create, examine and alter syntax tables.
-* Motion and Syntax:: Moving over characters with certain syntaxes.
-* Parsing Expressions:: Parsing balanced expressions
- using the syntax table.
-* Standard Syntax Tables:: Syntax tables used by various major modes.
-* Syntax Table Internals:: How syntax table information is stored.
-
-\1f
-File: lispref.info, Node: Syntax Basics, Next: Syntax Descriptors, Up: Syntax Tables
-
-Syntax Table Concepts
-=====================
-
- A "syntax table" provides Emacs with the information that determines
-the syntactic use of each character in a buffer. This information is
-used by the parsing commands, the complex movement commands, and others
-to determine where words, symbols, and other syntactic constructs begin
-and end. The current syntax table controls the meaning of the word
-motion functions (*note Word Motion::.) and the list motion functions
-(*note List Motion::.) as well as the functions in this chapter.
-
- Under XEmacs 20, a syntax table is a particular subtype of the
-primitive char table type (*note Char Tables::.), and each element of
-the char table is an integer that encodes the syntax of the character in
-question, or a cons of such an integer and a matching character (for
-characters with parenthesis syntax).
-
- Under XEmacs 19, a syntax table is a vector of 256 elements; it
-contains one entry for each of the 256 possible characters in an 8-bit
-byte. Each element is an integer that encodes the syntax of the
-character in question. (The matching character, if any, is embedded in
-the bits of this integer.)
-
- Syntax tables are used only for moving across text, not for the Emacs
-Lisp reader. XEmacs Lisp uses built-in syntactic rules when reading
-Lisp expressions, and these rules cannot be changed.
-
- Each buffer has its own major mode, and each major mode has its own
-idea of the syntactic class of various characters. For example, in Lisp
-mode, the character `;' begins a comment, but in C mode, it terminates
-a statement. To support these variations, XEmacs makes the choice of
-syntax table local to each buffer. Typically, each major mode has its
-own syntax table and installs that table in each buffer that uses that
-mode. Changing this table alters the syntax in all those buffers as
-well as in any buffers subsequently put in that mode. Occasionally
-several similar modes share one syntax table. *Note Example Major
-Modes::, for an example of how to set up a syntax table.
-
- A syntax table can inherit the data for some characters from the
-standard syntax table, while specifying other characters itself. The
-"inherit" syntax class means "inherit this character's syntax from the
-standard syntax table." Most major modes' syntax tables inherit the
-syntax of character codes 0 through 31 and 128 through 255. This is
-useful with character sets such as ISO Latin-1 that have additional
-alphabetic characters in the range 128 to 255. Just changing the
-standard syntax for these characters affects all major modes.
-
- - Function: syntax-table-p OBJECT
- This function returns `t' if OBJECT is a vector of length 256
- elements. This means that the vector may be a syntax table.
- However, according to this test, any vector of length 256 is
- considered to be a syntax table, no matter what its contents.
-
-\1f
-File: lispref.info, Node: Syntax Descriptors, Next: Syntax Table Functions, Prev: Syntax Basics, Up: Syntax Tables
-
-Syntax Descriptors
-==================
-
- This section describes the syntax classes and flags that denote the
-syntax of a character, and how they are represented as a "syntax
-descriptor", which is a Lisp string that you pass to
-`modify-syntax-entry' to specify the desired syntax.
-
- XEmacs defines a number of "syntax classes". Each syntax table puts
-each character into one class. There is no necessary relationship
-between the class of a character in one syntax table and its class in
-any other table.
-
- Each class is designated by a mnemonic character, which serves as the
-name of the class when you need to specify a class. Usually the
-designator character is one that is frequently in that class; however,
-its meaning as a designator is unvarying and independent of what syntax
-that character currently has.
-
- A syntax descriptor is a Lisp string that specifies a syntax class, a
-matching character (used only for the parenthesis classes) and flags.
-The first character is the designator for a syntax class. The second
-character is the character to match; if it is unused, put a space there.
-Then come the characters for any desired flags. If no matching
-character or flags are needed, one character is sufficient.
-
- For example, the descriptor for the character `*' in C mode is
-`. 23' (i.e., punctuation, matching character slot unused, second
-character of a comment-starter, first character of an comment-ender),
-and the entry for `/' is `. 14' (i.e., punctuation, matching character
-slot unused, first character of a comment-starter, second character of
-a comment-ender).
-
-* Menu:
-
-* Syntax Class Table:: Table of syntax classes.
-* Syntax Flags:: Additional flags each character can have.
+ The CODING and NOERROR arguments are meaningful only in XEmacsen
+ with file-coding or Mule support. Otherwise, they are ignored.
+ Some examples of usage:
-\1f
-File: lispref.info, Node: Syntax Class Table, Next: Syntax Flags, Up: Syntax Descriptors
-
-Table of Syntax Classes
------------------------
-
- Here is a table of syntax classes, the characters that stand for
-them, their meanings, and examples of their use.
-
- - Syntax class: whitespace character
- "Whitespace characters" (designated with ` ' or `-') separate
- symbols and words from each other. Typically, whitespace
- characters have no other syntactic significance, and multiple
- whitespace characters are syntactically equivalent to a single
- one. Space, tab, newline and formfeed are almost always
- classified as whitespace.
-
- - Syntax class: word constituent
- "Word constituents" (designated with `w') are parts of normal
- English words and are typically used in variable and command names
- in programs. All upper- and lower-case letters, and the digits,
- are typically word constituents.
-
- - Syntax class: symbol constituent
- "Symbol constituents" (designated with `_') are the extra
- characters that are used in variable and command names along with
- word constituents. For example, the symbol constituents class is
- used in Lisp mode to indicate that certain characters may be part
- of symbol names even though they are not part of English words.
- These characters are `$&*+-_<>'. In standard C, the only
- non-word-constituent character that is valid in symbols is
- underscore (`_').
-
- - Syntax class: punctuation character
- "Punctuation characters" (`.') are those characters that are used
- as punctuation in English, or are used in some way in a programming
- language to separate symbols from one another. Most programming
- language modes, including Emacs Lisp mode, have no characters in
- this class since the few characters that are not symbol or word
- constituents all have other uses.
-
- - Syntax class: open parenthesis character
- - Syntax class: close parenthesis character
- Open and close "parenthesis characters" are characters used in
- dissimilar pairs to surround sentences or expressions. Such a
- grouping is begun with an open parenthesis character and
- terminated with a close. Each open parenthesis character matches
- a particular close parenthesis character, and vice versa.
- Normally, XEmacs indicates momentarily the matching open
- parenthesis when you insert a close parenthesis. *Note Blinking::.
-
- The class of open parentheses is designated with `(', and that of
- close parentheses with `)'.
-
- In English text, and in C code, the parenthesis pairs are `()',
- `[]', and `{}'. In XEmacs Lisp, the delimiters for lists and
- vectors (`()' and `[]') are classified as parenthesis characters.
-
- - Syntax class: string quote
- "String quote characters" (designated with `"') are used in many
- languages, including Lisp and C, to delimit string constants. The
- same string quote character appears at the beginning and the end
- of a string. Such quoted strings do not nest.
-
- The parsing facilities of XEmacs consider a string as a single
- token. The usual syntactic meanings of the characters in the
- string are suppressed.
-
- The Lisp modes have two string quote characters: double-quote (`"')
- and vertical bar (`|'). `|' is not used in XEmacs Lisp, but it is
- used in Common Lisp. C also has two string quote characters:
- double-quote for strings, and single-quote (`'') for character
- constants.
-
- English text has no string quote characters because English is not
- a programming language. Although quotation marks are used in
- English, we do not want them to turn off the usual syntactic
- properties of other characters in the quotation.
-
- - Syntax class: escape
- An "escape character" (designated with `\') starts an escape
- sequence such as is used in C string and character constants. The
- character `\' belongs to this class in both C and Lisp. (In C, it
- is used thus only inside strings, but it turns out to cause no
- trouble to treat it this way throughout C code.)
-
- Characters in this class count as part of words if
- `words-include-escapes' is non-`nil'. *Note Word Motion::.
-
- - Syntax class: character quote
- A "character quote character" (designated with `/') quotes the
- following character so that it loses its normal syntactic meaning.
- This differs from an escape character in that only the character
- immediately following is ever affected.
-
- Characters in this class count as part of words if
- `words-include-escapes' is non-`nil'. *Note Word Motion::.
-
- This class is used for backslash in TeX mode.
-
- - Syntax class: paired delimiter
- "Paired delimiter characters" (designated with `$') are like
- string quote characters except that the syntactic properties of the
- characters between the delimiters are not suppressed. Only TeX
- mode uses a paired delimiter presently--the `$' that both enters
- and leaves math mode.
-
- - Syntax class: expression prefix
- An "expression prefix operator" (designated with `'') is used for
- syntactic operators that are part of an expression if they appear
- next to one. These characters in Lisp include the apostrophe, `''
- (used for quoting), the comma, `,' (used in macros), and `#' (used
- in the read syntax for certain data types).
-
- - Syntax class: comment starter
- - Syntax class: comment ender
- The "comment starter" and "comment ender" characters are used in
- various languages to delimit comments. These classes are
- designated with `<' and `>', respectively.
-
- English text has no comment characters. In Lisp, the semicolon
- (`;') starts a comment and a newline or formfeed ends one.
-
- - Syntax class: inherit
- This syntax class does not specify a syntax. It says to look in
- the standard syntax table to find the syntax of this character.
- The designator for this syntax code is `@'.
+ ;; Calculate the digest of the entire buffer
+ (md5 (current-buffer))
+ => "8842b04362899b1cda8d2d126dc11712"
+
+ ;; Calculate the digest of the current line
+ (md5 (current-buffer) (point-at-bol) (point-at-eol))
+ => "60614d21e9dee27dfdb01fa4e30d6d00"
+
+ ;; Calculate the digest of your name and email address
+ (md5 (concat (format "%s <%s>" (user-full-name) user-mail-address)))
+ => "0a2188c40fd38922d941fe6032fce516"
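+
+ A common use of `md5' is checking text against a known digest. The
+sketch below assumes plain ASCII text, so the CODING argument plays no
+role. The helper name `my-digest-matches-p' is hypothetical; the
+expected value for "abc" comes from the test suite in RFC 1321.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-digest-matches-p (text expected)
  "Return non-nil if the MD5 digest of TEXT equals the hex string EXPECTED."
  ;; `md5' returns lowercase hexadecimal, so normalize EXPECTED first.
  (string= (md5 text) (downcase expected)))

(my-digest-matches-p "abc" "900150983CD24FB0D6963F7D28E17F72")
;; => t

;; A buffer region and the equivalent string give the same digest.
(with-temp-buffer
  (insert "abc\ndef")
  ;; Positions 1 to 4 cover the first three characters, "abc".
  (string= (md5 (current-buffer) 1 4) (md5 "abc")))
;; => t
```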
-\1f
-File: lispref.info, Node: Syntax Flags, Prev: Syntax Class Table, Up: Syntax Descriptors
+ Base64 is a portable encoding for arbitrary sequences of octets, in a
+form that need not be readable by humans. It uses a 65-character subset
+of US-ASCII, as described in RFC 2045. Base64 is used by MIME to encode
+binary bodies, and to encode binary characters in message headers.
-Syntax Flags
-------------
+ The Lisp interface to base64 consists of four functions:
- In addition to the classes, entries for characters in a syntax table
-can include flags. There are six possible flags, represented by the
-characters `1', `2', `3', `4', `b' and `p'.
+ - Command: base64-encode-region start end &optional no-line-break
+ This function encodes the region between START and END of the
+ current buffer to base64 format. This means that the original
+ region is deleted, and replaced with its base64 equivalent.
- All the flags except `p' are used to describe multi-character
-comment delimiters. The digit flags indicate that a character can
-*also* be part of a comment sequence, in addition to the syntactic
-properties associated with its character class. The flags are
-independent of the class and each other for the sake of characters such
-as `*' in C mode, which is a punctuation character, *and* the second
-character of a start-of-comment sequence (`/*'), *and* the first
-character of an end-of-comment sequence (`*/').
+ Normally, encoded base64 output is multi-line, with 76-character
+ lines. If NO-LINE-BREAK is non-`nil', newlines will not be
+ inserted, resulting in single-line output.
- The flags for a character C are:
+ Mule note: you should make sure that you convert multibyte
+ characters (those that do not fit into the 0-255 range) to
+ something else, because they cannot be meaningfully converted to
+ base64. If `base64-encode-region' encounters such characters, it
+ will signal an error.
- * `1' means C is the start of a two-character comment-start sequence.
+ `base64-encode-region' returns the length of the encoded text.
- * `2' means C is the second character of such a sequence.
+ ;; Encode the whole buffer in base64
+ (base64-encode-region (point-min) (point-max))
- * `3' means C is the start of a two-character comment-end sequence.
+ The function can also be used interactively, in which case it
+ works on the currently active region.
- * `4' means C is the second character of such a sequence.
+ - Function: base64-encode-string string &optional no-line-break
+ This function encodes STRING to base64, and returns the encoded
+ string.
- * `b' means that C as a comment delimiter belongs to the alternative
- "b" comment style.
+ Normally, encoded base64 output is multi-line, with 76-character
+ lines. If NO-LINE-BREAK is non-`nil', newlines will not be
+ inserted, resulting in single-line output.
- Emacs supports two comment styles simultaneously in any one syntax
- table. This is for the sake of C++. Each style of comment syntax
- has its own comment-start sequence and its own comment-end
- sequence. Each comment must stick to one style or the other;
- thus, if it starts with the comment-start sequence of style "b",
- it must also end with the comment-end sequence of style "b".
+ For Mule, the same considerations apply as for
+ `base64-encode-region'.
- The two comment-start sequences must begin with the same
- character; only the second character may differ. Mark the second
- character of the "b"-style comment-start sequence with the `b'
- flag.
+ (base64-encode-string "fubar")
+ => "ZnViYXI="
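+
+ The effect of NO-LINE-BREAK can be seen with input long enough to
+exceed one output line. This is only a sketch: the 60-character input
+is an arbitrary choice, and `string-match' is used merely to locate
+the first newline in each result.

```elisp
;; Sixty bytes of input encode to eighty base64 characters, which is
;; more than fits on one 76-character output line.
(let ((input (make-string 60 ?a)))
  (list
   ;; Default: a newline follows the first 76 encoded characters.
   (string-match "\n" (base64-encode-string input))
   ;; With NO-LINE-BREAK non-nil, the result contains no newline.
   (string-match "\n" (base64-encode-string input t))))
;; => (76 nil)
```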
- A comment-end sequence (one or two characters) applies to the "b"
- style if its first character has the `b' flag set; otherwise, it
- applies to the "a" style.
+ - Command: base64-decode-region start end
+ This function decodes the region between START and END of the
+ current buffer. The region should be in base64 encoding.
- The appropriate comment syntax settings for C++ are as follows:
+ If the region was decoded correctly, `base64-decode-region' returns
+ the length of the decoded region. If the decoding failed, `nil' is
+ returned.
- `/'
- `124b'
+ ;; Decode a base64 buffer, and replace it with the decoded version
+ (base64-decode-region (point-min) (point-max))
- `*'
- `23'
+ - Function: base64-decode-string string
+ This function decodes STRING from base64, and returns the decoded
+ string. STRING should be valid base64-encoded text.
- newline
- `>b'
+ If decoding was not possible, `nil' is returned.
- This defines four comment-delimiting sequences:
+ (base64-decode-string "ZnViYXI=")
+ => "fubar"
+
+ (base64-decode-string "totally bogus")
+ => nil
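+
+ Code that decodes untrusted input may want one fallback path for
+every kind of failure. The sketch below treats a `nil' result as
+failure, as described above, and also catches errors, since other
+Emacs variants may signal an error for invalid input instead. The name
+`my-base64-decode-or-default' is hypothetical.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-base64-decode-or-default (string default)
  "Decode base64 STRING, returning DEFAULT if decoding fails."
  (or (condition-case nil
          (base64-decode-string string)
        ;; Treat a signaled error the same way as a nil result.
        (error nil))
      default))

(my-base64-decode-or-default "ZnViYXI=" "")
;; => "fubar"

;; Encoding and then decoding returns the original text.
(my-base64-decode-or-default (base64-encode-string "fubar") "")
;; => "fubar"
```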
- `/*'
- This is a comment-start sequence for "a" style because the
- second character, `*', does not have the `b' flag.
+\1f
+File: lispref.info, Node: Searching and Matching, Next: Syntax Tables, Prev: Text, Up: Top
- `//'
- This is a comment-start sequence for "b" style because the
- second character, `/', does have the `b' flag.
+Searching and Matching
+**********************
- `*/'
- This is a comment-end sequence for "a" style because the first
- character, `*', does not have the `b' flag
+ XEmacs provides two ways to search through a buffer for specified
+text: exact string searches and regular expression searches. After a
+regular expression search, you can examine the "match data" to
+determine which text matched the whole regular expression or various
+portions of it.
- newline
- This is a comment-end sequence for "b" style, because the
- newline character has the `b' flag.
+* Menu:
- * `p' identifies an additional "prefix character" for Lisp syntax.
- These characters are treated as whitespace when they appear between
- expressions. When they appear within an expression, they are
- handled according to their usual syntax codes.
+* String Search:: Search for an exact match.
+* Regular Expressions:: Describing classes of strings.
+* Regexp Search:: Searching for a match for a regexp.
+* POSIX Regexps:: Searching POSIX-style for the longest match.
+* Search and Replace:: Internals of `query-replace'.
+* Match Data:: Finding out which part of the text matched
+ various parts of a regexp, after regexp search.
+* Searching and Case:: Case-independent or case-significant searching.
+* Standard Regexps:: Useful regexps for finding sentences, pages,...
- The function `backward-prefix-chars' moves back over these
- characters, as well as over characters whose primary syntax class
- is prefix (`''). *Note Motion and Syntax::.
+ The `skip-chars...' functions also perform a kind of searching.
+*Note Skipping Characters::.
\1f
-File: lispref.info, Node: Syntax Table Functions, Next: Motion and Syntax, Prev: Syntax Descriptors, Up: Syntax Tables
-
-Syntax Table Functions
-======================
-
- In this section we describe functions for creating, accessing and
-altering syntax tables.
-
- - Function: make-syntax-table &optional TABLE
- This function creates a new syntax table. Character codes 0
- through 31 and 128 through 255 are set up to inherit from the
- standard syntax table. The other character codes are set up by
- copying what the standard syntax table says about them.
+File: lispref.info, Node: String Search, Next: Regular Expressions, Up: Searching and Matching
- Most major mode syntax tables are created in this way.
-
- - Function: copy-syntax-table &optional TABLE
- This function constructs a copy of TABLE and returns it. If TABLE
- is not supplied (or is `nil'), it returns a copy of the current
- syntax table. Otherwise, an error is signaled if TABLE is not a
- syntax table.
-
- - Command: modify-syntax-entry CHAR SYNTAX-DESCRIPTOR &optional TABLE
- This function sets the syntax entry for CHAR according to
- SYNTAX-DESCRIPTOR. The syntax is changed only for TABLE, which
- defaults to the current buffer's syntax table, and not in any
- other syntax table. The argument SYNTAX-DESCRIPTOR specifies the
- desired syntax; this is a string beginning with a class designator
- character, and optionally containing a matching character and
- flags as well. *Note Syntax Descriptors::.
+Searching for Strings
+=====================
- This function always returns `nil'. The old syntax information in
- the table for this character is discarded.
+ These are the primitive functions for searching through the text in a
+buffer. They are meant for use in programs, but you may call them
+interactively. If you do so, they prompt for the search string; LIMIT
+and NOERROR are set to `nil', and COUNT is set to 1.
- An error is signaled if the first character of the syntax
- descriptor is not one of the twelve syntax class designator
- characters. An error is also signaled if CHAR is not a character.
+ - Command: search-forward string &optional limit noerror count buffer
+ This function searches forward from point for an exact match for
+ STRING. If successful, it sets point to the end of the occurrence
+ found, and returns the new value of point. If no match is found,
+ the value and side effects depend on NOERROR (see below).
- Examples:
+ In the following example, point is initially at the beginning of
+ the line. Then `(search-forward "fox")' moves point after the last
+ letter of `fox':
- ;; Put the space character in class whitespace.
- (modify-syntax-entry ?\ " ")
- => nil
-
- ;; Make `$' an open parenthesis character,
- ;; with `^' as its matching close.
- (modify-syntax-entry ?$ "(^")
- => nil
+ ---------- Buffer: foo ----------
+ -!-The quick brown fox jumped over the lazy dog.
+ ---------- Buffer: foo ----------
- ;; Make `^' a close parenthesis character,
- ;; with `$' as its matching open.
- (modify-syntax-entry ?^ ")$")
- => nil
+ (search-forward "fox")
+ => 20
- ;; Make `/' a punctuation character,
- ;; the first character of a start-comment sequence,
- ;; and the second character of an end-comment sequence.
- ;; This is used in C mode.
- (modify-syntax-entry ?/ ". 14")
- => nil
-
- - Function: char-syntax CHARACTER
- This function returns the syntax class of CHARACTER, represented
- by its mnemonic designator character. This *only* returns the
- class, not any matching parenthesis or flags.
-
- An error is signaled if CHAR is not a character.
-
- The following examples apply to C mode. The first example shows
- that the syntax class of space is whitespace (represented by a
- space). The second example shows that the syntax of `/' is
- punctuation. This does not show the fact that it is also part of
- comment-start and -end sequences. The third example shows that
- open parenthesis is in the class of open parentheses. This does
- not show the fact that it has a matching character, `)'.
-
- (char-to-string (char-syntax ?\ ))
- => " "
+ ---------- Buffer: foo ----------
+ The quick brown fox-!- jumped over the lazy dog.
+ ---------- Buffer: foo ----------
+
+ The argument LIMIT specifies the upper bound to the search. (It
+ must be a position in the current buffer.) No match extending
+ after that position is accepted. If LIMIT is omitted or `nil', it
+ defaults to the end of the accessible portion of the buffer.
+
+ What happens when the search fails depends on the value of
+ NOERROR. If NOERROR is `nil', a `search-failed' error is
+ signaled. If NOERROR is `t', `search-forward' returns `nil' and
+ does nothing. If NOERROR is neither `nil' nor `t', then
+ `search-forward' moves point to the upper bound and returns `nil'.
+ (It would be more consistent now to return the new position of
+ point in that case, but some programs may depend on a value of
+ `nil'.)
+
+ If COUNT is supplied (it must be an integer), then the search is
+ repeated that many times (each time starting at the end of the
+ previous time's match). If COUNT is negative, the search
+ direction is backward. If the successive searches succeed, the
+ function succeeds, moving point and returning its new value.
+ Otherwise the search fails.
+
+ BUFFER is the buffer to search in, and defaults to the current
+ buffer.
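
The interaction of LIMIT and NOERROR can be checked in a scratch buffer. The following is a minimal sketch, not part of the manual's own examples: the name `my-search-forward-demo' and the buffer text are illustrative, and the optional BUFFER argument is left out so that the code does not depend on it.

```elisp
(defun my-search-forward-demo ()
  "Return sample `search-forward' results from a temporary buffer.
The function name and the buffer text are illustrative only."
  (with-temp-buffer
    (insert "The quick brown fox jumped over the lazy dog.")
    (goto-char (point-min))
    (list
     ;; Success: point moves just past "fox" and is returned.
     (search-forward "fox" nil t)
     ;; NOERROR is t: failure returns nil and point does not move.
     (list (search-forward "cat" nil t) (point))
     ;; LIMIT 30 lies before "dog" and NOERROR is neither nil nor t:
     ;; the search fails, point moves to the limit, the value is nil.
     (list (search-forward "dog" 30 'move) (point)))))
```

Under the semantics described above, `(my-search-forward-demo)' should return `(20 (nil 20) (nil 30))'.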
+
+ - Command: search-backward string &optional limit noerror count buffer
+ This function searches backward from point for STRING. It is just
+ like `search-forward' except that it searches backwards and leaves
+ point at the beginning of the match.
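
As a hedged illustration of where point ends up after a backward search, the sketch below searches from the end of a temporary buffer. The function name is hypothetical and the text is the same sample sentence used for `search-forward'.

```elisp
(defun my-search-backward-demo ()
  "Return the value of a backward search and a check of where point lands.
The function name and the buffer text are illustrative only."
  (with-temp-buffer
    (insert "The quick brown fox jumped over the lazy dog.")
    ;; After `insert', point is at the end of the buffer.
    (list (search-backward "fox" nil t)   ; position where the match starts
          (looking-at "fox")              ; point sits just before "fox"
          (point))))
```

Here the returned position and the final value of point are the same, because the search leaves point at the beginning of the match.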
+
+ - Command: word-search-forward string &optional limit noerror count
+ buffer
+ This function searches forward from point for a "word" match for
+ STRING. If it finds a match, it sets point to the end of the
+ match found, and returns the new value of point.
+
+ Word matching regards STRING as a sequence of words, disregarding
+ punctuation that separates them. It searches the buffer for the
+ same sequence of words. Each word must be distinct in the buffer
+ (searching for the word `ball' does not match the word `balls'),
+ but the details of punctuation and spacing are ignored (searching
+ for `ball boy' does match `ball. Boy!').
+
+ In this example, point is initially at the beginning of the
+ buffer; the search leaves it between the `y' and the `!'.
+
+ ---------- Buffer: foo ----------
+ -!-He said "Please! Find
+ the ball boy!"
+ ---------- Buffer: foo ----------
- (char-to-string (char-syntax ?/))
- => "."
+ (word-search-forward "Please find the ball, boy.")
+ => 35
- (char-to-string (char-syntax ?\())
- => "("
-
- - Function: set-syntax-table TABLE &optional BUFFER
- This function makes TABLE the syntax table for BUFFER, which
- defaults to the current buffer if omitted. It returns TABLE.
-
- - Function: syntax-table &optional BUFFER
- This function returns the syntax table for BUFFER, which defaults
- to the current buffer if omitted.
-
-\1f
-File: lispref.info, Node: Motion and Syntax, Next: Parsing Expressions, Prev: Syntax Table Functions, Up: Syntax Tables
-
-Motion and Syntax
-=================
-
- This section describes functions for moving across characters in
-certain syntax classes. None of these functions exists in Emacs
-version 18 or earlier.
-
- - Function: skip-syntax-forward SYNTAXES &optional LIMIT BUFFER
- This function moves point forward across characters having syntax
- classes mentioned in SYNTAXES. It stops when it encounters the
- end of the buffer, or position LIMIT (if specified), or a
- character it is not supposed to skip. Optional argument BUFFER
- defaults to the current buffer if omitted.
-
- - Function: skip-syntax-backward SYNTAXES &optional LIMIT BUFFER
- This function moves point backward across characters whose syntax
- classes are mentioned in SYNTAXES. It stops when it encounters
- the beginning of the buffer, or position LIMIT (if specified), or a
- character it is not supposed to skip. Optional argument BUFFER
- defaults to the current buffer if omitted.
-
-
- - Function: backward-prefix-chars &optional BUFFER
- This function moves point backward over any number of characters
- with expression prefix syntax. This includes both characters in
- the expression prefix syntax class, and characters with the `p'
- flag. Optional argument BUFFER defaults to the current buffer if
- omitted.
+ ---------- Buffer: foo ----------
+ He said "Please! Find
+ the ball boy-!-!"
+ ---------- Buffer: foo ----------
+
+ If LIMIT is non-`nil' (it must be a position in the current
+ buffer), then it is the upper bound to the search. The match
+ found must not extend after that position.
+
+ If NOERROR is `nil', then `word-search-forward' signals an error
+ if the search fails. If NOERROR is `t', then it returns `nil'
+ instead of signaling an error. If NOERROR is neither `nil' nor
+ `t', it moves point to LIMIT (or the end of the buffer) and
+ returns `nil'.
+
+ If COUNT is non-`nil', then the search is repeated that many
+ times. Point is positioned at the end of the last match.
+
+ BUFFER is the buffer to search in, and defaults to the current
+ buffer.
+
+ - Command: word-search-backward string &optional limit noerror count
+ buffer
+ This function searches backward from point for a word match to
+ STRING. This function is just like `word-search-forward' except
+ that it searches backward and normally leaves point at the
+ beginning of the match.
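
The word-matching rules can be tried out as follows. This is only a sketch under stated assumptions: the function name is made up, the optional BUFFER argument is omitted, and the search string uses the same letter case as the buffer so that the result does not depend on `case-fold-search'.

```elisp
(defun my-word-search-demo ()
  "Return two word-search results; all names and text are illustrative."
  (list
   (with-temp-buffer
     (insert "He said \"Please! Find\nthe ball boy!\"")
     (goto-char (point-min))
     ;; Punctuation, spacing and the line break do not prevent the
     ;; match; afterwards point is between the `y' and the `!'.
     (and (word-search-forward "Please Find the ball, boy." nil t)
          (looking-at "!")))
   (with-temp-buffer
     (insert "two balls")
     (goto-char (point-min))
     ;; `ball' is not the same word as `balls', so this search fails.
     (word-search-forward "ball" nil t))))
```

The first element should be `t' and the second `nil'.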
\1f
-File: lispref.info, Node: Parsing Expressions, Next: Standard Syntax Tables, Prev: Motion and Syntax, Up: Syntax Tables
-
-Parsing Balanced Expressions
-============================
-
- Here are several functions for parsing and scanning balanced
-expressions, also known as "sexps", in which parentheses match in
-pairs. The syntax table controls the interpretation of characters, so
-these functions can be used for Lisp expressions when in Lisp mode and
-for C expressions when in C mode. *Note List Motion::, for convenient
-higher-level functions for moving over balanced expressions.
-
- - Function: parse-partial-sexp START LIMIT &optional TARGET-DEPTH
- STOP-BEFORE STATE STOP-COMMENT BUFFER
- This function parses a sexp in the current buffer starting at
- START, not scanning past LIMIT. It stops at position LIMIT or
- when certain criteria described below are met, and sets point to
- the location where parsing stops. It returns a value describing
- the status of the parse at the point where it stops.
-
- If STATE is `nil', START is assumed to be at the top level of
- parenthesis structure, such as the beginning of a function
- definition. Alternatively, you might wish to resume parsing in the
- middle of the structure. To do this, you must provide a STATE
- argument that describes the initial status of parsing.
-
- If the third argument TARGET-DEPTH is non-`nil', parsing stops if
- the depth in parentheses becomes equal to TARGET-DEPTH. The depth
- starts at 0, or at whatever is given in STATE.
-
- If the fourth argument STOP-BEFORE is non-`nil', parsing stops
- when it comes to any character that starts a sexp. If
- STOP-COMMENT is non-`nil', parsing stops when it comes to the
- start of a comment.
-
- The fifth argument STATE is an eight-element list of the same form
- as the value of this function, described below. The return value
- of one call may be used to initialize the state of the parse on
- another call to `parse-partial-sexp'.
-
- The result is a list of eight elements describing the final state
- of the parse:
-
- 0. The depth in parentheses, counting from 0.
-
- 1. The character position of the start of the innermost
- parenthetical grouping containing the stopping point; `nil'
- if none.
-
- 2. The character position of the start of the last complete
- subexpression terminated; `nil' if none.
-
- 3. Non-`nil' if inside a string. More precisely, this is the
- character that will terminate the string.
-
- 4. `t' if inside a comment (of either style).
+File: lispref.info, Node: Regular Expressions, Next: Regexp Search, Prev: String Search, Up: Searching and Matching
+
+Regular Expressions
+===================
+
+ A "regular expression" ("regexp", for short) is a pattern that
+denotes a (possibly infinite) set of strings. Searching for matches for
+a regexp is a very powerful operation. This section explains how to
+write regexps; the following section says how to search for them.
+
+ To gain a thorough understanding of regular expressions and how to
+use them to best advantage, we recommend that you study `Mastering
+Regular Expressions, by Jeffrey E.F. Friedl, O'Reilly and Associates,
+1997'. (It's known as the "Hip Owls" book, because of the picture on its
+cover.) You might also read the manuals to *Note (gawk)Top::, *Note
+(ed)Top::, `sed', `grep', *Note (perl)Top::, *Note (regex)Top::, *Note
+(rx)Top::, `pcre', and *Note (flex)Top::, which also make good use of
+regular expressions.
+
+ The XEmacs regular expression syntax most closely resembles that of
+`ed' or `grep', the GNU versions of which both use the GNU `regex'
+library. XEmacs' version of `regex' has recently been extended with
+some Perl-like capabilities, described in the next section.
- 5. `t' if point is just after a quote character.
-
- 6. The minimum parenthesis depth encountered during this scan.
-
- 7. `t' if inside a comment of style "b".
-
- Elements 0, 3, 4, 5 and 7 are significant in the argument STATE.
-
- This function is most often used to compute indentation for
- languages that have nested parentheses.
-
- - Function: scan-lists FROM COUNT DEPTH &optional BUFFER NOERROR
- This function scans forward COUNT balanced parenthetical groupings
- from character number FROM. It returns the character position
- where the scan stops.
-
- If DEPTH is nonzero, parenthesis depth counting begins from that
- value. The only candidates for stopping are places where the
- depth in parentheses becomes zero; `scan-lists' counts COUNT such
- places and then stops. Thus, a positive value for DEPTH means go
- out DEPTH levels of parenthesis.
-
- Scanning ignores comments if `parse-sexp-ignore-comments' is
- non-`nil'.
-
- If the scan reaches the beginning or end of the buffer (or its
- accessible portion), and the depth is not zero, an error is
- signaled. If the depth is zero but the count is not used up,
- `nil' is returned.
-
- If optional arg BUFFER is non-`nil', scanning occurs in that
- buffer instead of in the current buffer.
-
- If optional arg NOERROR is non-`nil', `scan-lists' will return
- `nil' instead of signalling an error.
-
- - Function: scan-sexps FROM COUNT &optional BUFFER NOERROR
- This function scans forward COUNT sexps from character position
- FROM. It returns the character position where the scan stops.
-
- Scanning ignores comments if `parse-sexp-ignore-comments' is
- non-`nil'.
-
- If the scan reaches the beginning or end of (the accessible part
- of) the buffer in the middle of a parenthetical grouping, an error
- is signaled. If it reaches the beginning or end between groupings
- but before count is used up, `nil' is returned.
-
- If optional arg BUFFER is non-`nil', scanning occurs in that
- buffer instead of in the current buffer.
-
- If optional arg NOERROR is non-`nil', `scan-sexps' will return nil
- instead of signalling an error.
-
- - Variable: parse-sexp-ignore-comments
- If the value is non-`nil', then comments are treated as whitespace
- by the functions in this section and by `forward-sexp'.
-
- In older Emacs versions, this feature worked only when the comment
- terminator is something like `*/', and appears only to end a
- comment. In languages where newlines terminate comments, it was
- necessary make this variable `nil', since not every newline is the
- end of a comment. This limitation no longer exists.
-
- You can use `forward-comment' to move forward or backward over one
-comment or several comments.
-
- - Function: forward-comment COUNT &optional BUFFER
- This function moves point forward across COUNT comments (backward,
- if COUNT is negative). If it finds anything other than a comment
- or whitespace, it stops, leaving point at the place where it
- stopped. It also stops after satisfying COUNT.
-
- Optional argument BUFFER defaults to the current buffer.
+* Menu:
- To move forward over all comments and whitespace following point, use
-`(forward-comment (buffer-size))'. `(buffer-size)' is a good argument
-to use, because the number of comments in the buffer cannot exceed that
-many.
+* Syntax of Regexps:: Rules for writing regular expressions.
+* Regexp Example:: Illustrates regular expression syntax.
\1f
-File: lispref.info, Node: Standard Syntax Tables, Next: Syntax Table Internals, Prev: Parsing Expressions, Up: Syntax Tables
-
-Some Standard Syntax Tables
-===========================
-
- Most of the major modes in XEmacs have their own syntax tables. Here
-are several of them:
-
- - Function: standard-syntax-table
- This function returns the standard syntax table, which is the
- syntax table used in Fundamental mode.
-
- - Variable: text-mode-syntax-table
- The value of this variable is the syntax table used in Text mode.
-
- - Variable: c-mode-syntax-table
- The value of this variable is the syntax table for C-mode buffers.
-
- - Variable: emacs-lisp-mode-syntax-table
- The value of this variable is the syntax table used in Emacs Lisp
- mode by editing commands. (It has no effect on the Lisp `read'
- function.)
+File: lispref.info, Node: Syntax of Regexps, Next: Regexp Example, Up: Regular Expressions
+
+Syntax of Regular Expressions
+-----------------------------
+
+ Regular expressions have a syntax in which a few characters are
+special constructs and the rest are "ordinary". An ordinary character
+is a simple regular expression that matches that character and nothing
+else. The special characters are `.', `*', `+', `?', `[', `]', `^',
+`$', and `\'; no new special characters will be defined in the future.
+Any other character appearing in a regular expression is ordinary,
+unless a `\' precedes it.
+
+ For example, `f' is not a special character, so it is ordinary, and
+therefore `f' is a regular expression that matches the string `f' and
+no other string. (It does _not_ match the string `ff'.) Likewise, `o'
+is a regular expression that matches only `o'.
+
+ Any two regular expressions A and B can be concatenated. The result
+is a regular expression that matches a string if A matches some amount
+of the beginning of that string and B matches the rest of the string.
+
+ As a simple example, we can concatenate the regular expressions `f'
+and `o' to get the regular expression `fo', which matches only the
+string `fo'. Still trivial. To do something more powerful, you need
+to use one of the special characters. Here is a list of them:
+
+`. (Period)'
+ is a special character that matches any single character except a
+ newline. Using concatenation, we can make regular expressions
+ like `a.b', which matches any three-character string that begins
+ with `a' and ends with `b'.
+
+`*'
+ is not a construct by itself; it is a quantifying suffix operator
+ that means to repeat the preceding regular expression as many
+ times as possible. In `fo*', the `*' applies to the `o', so `fo*'
+ matches one `f' followed by any number of `o's. The case of zero
+ `o's is allowed: `fo*' does match `f'.
+
+ `*' always applies to the _smallest_ possible preceding
+ expression. Thus, `fo*' has a repeating `o', not a repeating `fo'.
+
+ The matcher processes a `*' construct by matching, immediately, as
+ many repetitions as can be found; it is "greedy". Then it
+ continues with the rest of the pattern. If that fails,
+ backtracking occurs, discarding some of the matches of the
+ `*'-modified construct in case that makes it possible to match the
+ rest of the pattern. For example, in matching `ca*ar' against the
+ string `caaar', the `a*' first tries to match all three `a's; but
+ the rest of the pattern is `ar' and there is only `r' left to
+ match, so this try fails. The next alternative is for `a*' to
+ match only two `a's. With this choice, the rest of the regexp
+ matches successfully.
+
+ Nested repetition operators can be extremely slow if they specify
+ backtracking loops. For example, it could take hours for the
+ regular expression `\(x+y*\)*a' to match the sequence
+ `xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxz'. The slowness is because
+ Emacs must try each imaginable way of grouping the 35 `x's before
+ concluding that none of them can work. To make sure your regular
+ expressions run fast, check nested repetitions carefully.
+
+`+'
+ is a quantifying suffix operator similar to `*' except that the
+ preceding expression must match at least once. It is also
+ "greedy". So, for example, `ca+r' matches the strings `car' and
+ `caaaar' but not the string `cr', whereas `ca*r' matches all three
+ strings.
+
+`?'
+ is a quantifying suffix operator similar to `*', except that the
+ preceding expression can match either once or not at all. For
+ example, `ca?r' matches `car' or `cr', but does not match anything
+ else.
+
+`*?'
+ works just like `*', except that rather than matching the longest
+ match, it matches the shortest match. `*?' is known as a
+ "non-greedy" quantifier, a regexp construct borrowed from Perl.
+
+ This construct is useful when you want to match the text inside a
+ pair of delimiters. For instance, `/\*.*?\*/' matches a C comment
+ in a string, provided the comment does not contain a newline
+ (since `.' does not match one). This could not easily be achieved
+ without the use of a non-greedy quantifier.
+
+ This construct was not available before XEmacs 20.4. It is not
+ available in FSF Emacs.
+
+`+?'
+ is the non-greedy version of `+'.
+
+`??'
+ is the non-greedy version of `?'.
+
+`\{n,m\}'
+ serves as an interval quantifier, analogous to `*' or `+', but
+ specifies that the expression must match at least N times, but no
+ more than M times. This syntax is supported by most Unix regexp
+ utilities, and was introduced in XEmacs 20.3.
+
+ Unfortunately, the non-greedy version of this quantifier does not
+ exist currently, although it does in Perl.
+
+`[ ... ]'
+ `[' begins a "character set", which is terminated by a `]'. In
+ the simplest case, the characters between the two brackets form
+ the set. Thus, `[ad]' matches either one `a' or one `d', and
+ `[ad]*' matches any string composed of just `a's and `d's
+ (including the empty string), from which it follows that `c[ad]*r'
+ matches `cr', `car', `cdr', `caddaar', etc.
+
+ The usual regular expression special characters are not special
+ inside a character set. A completely different set of special
+ characters exists inside character sets: `]', `-' and `^'.
+
+ `-' is used for ranges of characters. To write a range, write two
+ characters with a `-' between them. Thus, `[a-z]' matches any
+ lower case letter. Ranges may be intermixed freely with individual
+ characters, as in `[a-z$%.]', which matches any lower case letter
+ or `$', `%', or a period.
+
+ To include a `]' in a character set, make it the first character.
+ For example, `[]a]' matches `]' or `a'. To include a `-', write
+ `-' as the first character in the set, or put it immediately after
+ a range. (You can replace one individual character C with the
+ range `C-C' to make a place to put the `-'.) There is no way to
+ write a set containing just `-' and `]'.
+
+ To include `^' in a set, put it anywhere but at the beginning of
+ the set.
+
+`[^ ... ]'
+ `[^' begins a "complement character set", which matches any
+ character except the ones specified. Thus, `[^a-z0-9A-Z]' matches
+ all characters _except_ letters and digits.
+
+ `^' is not special in a character set unless it is the first
+ character. The character following the `^' is treated as if it
+ were first (thus, `-' and `]' are not special there).
+
+ Note that a complement character set can match a newline, unless
+ newline is mentioned as one of the characters not to match.
+
+`^'
+ is a special character that matches the empty string, but only at
+ the beginning of a line in the text being matched. Otherwise it
+ fails to match anything. Thus, `^foo' matches a `foo' that occurs
+ at the beginning of a line.
+
+ When matching a string instead of a buffer, `^' matches at the
+ beginning of the string or after a newline character `\n'.
+
+`$'
+ is similar to `^' but matches only at the end of a line. Thus,
+ `x+$' matches a string of one `x' or more at the end of a line.
+
+ When matching a string instead of a buffer, `$' matches at the end
+ of the string or before a newline character `\n'.
+
+`\'
+ has two functions: it quotes the special characters (including
+ `\'), and it introduces additional special constructs.
+
+ Because `\' quotes special characters, `\$' is a regular
+ expression that matches only `$', and `\[' is a regular expression
+ that matches only `[', and so on.
+
+ Note that `\' also has special meaning in the read syntax of Lisp
+ strings (*note String Type::), and must be quoted with `\'. For
+ example, the regular expression that matches the `\' character is
+ `\\'. To write a Lisp string that contains the characters `\\',
+ Lisp syntax requires you to quote each `\' with another `\'.
+ Therefore, the read syntax for a regular expression matching `\'
+ is `"\\\\"'.
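
The special characters in the table above can be exercised with `string-match', which returns the index where a match starts (or `nil') and sets the match data. The sketch below is illustrative only: the function name, patterns and sample strings are assumptions chosen for this example, and the non-greedy and interval quantifiers require a matcher that supports them.

```elisp
(defun my-regexp-basics-demo ()
  "Return a list of sample match results; names and strings are illustrative."
  (list
   ;; Greedy `*' backtracks: `a*' gives back one `a' so `ar' can match.
   (and (string-match "ca*ar" "caaar") (match-end 0))
   ;; `+' needs at least one repetition; `?' allows none.
   (list (string-match "ca+r" "cr") (string-match "ca?r" "cr"))
   ;; Greedy versus non-greedy matching across two C comments.
   (let ((s "/* a */ x /* b */"))
     (list (and (string-match "/\\*.*\\*/" s) (match-string 0 s))
           (and (string-match "/\\*.*?\\*/" s) (match-string 0 s))))
   ;; Interval: exactly two or three `a's between `^' and `$'.
   (mapcar (lambda (s) (and (string-match "^a\\{2,3\\}$" s) t))
           '("a" "aa" "aaa" "aaaa"))
   ;; Character sets, including a `]' placed first in the set.
   (list (string-match "c[ad]*r" "caddaar") (string-match "[]a]" "x]"))
   ;; A complement set can match a newline.
   (string-match "[^a-z]" "abc\ndef")
   ;; In a string, `^' and `$' also match next to a newline.
   (list (string-match "^foo" "bar\nfoo") (string-match "x+$" "axx\nb"))
   ;; Four backslashes in Lisp syntax match one literal backslash.
   (string-match "\\\\" "a\\b")))
```

Each element of the result corresponds to one comment in the code, in the same order; the third element shows the greedy pattern swallowing both comments while the non-greedy one stops at the first `*/'.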
+
+ *Please note:* For historical compatibility, special characters are
+treated as ordinary ones if they are in contexts where their special
+meanings make no sense. For example, `*foo' treats `*' as ordinary
+since there is no preceding expression on which the `*' can act. It is
+poor practice to depend on this behavior; quote the special character
+anyway, regardless of where it appears.
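
A small sketch of this compatibility rule, with a hypothetical function name: a leading `*' has nothing to repeat, so it is taken literally, and the explicitly quoted form finds the same match.

```elisp
(defun my-leading-star-demo ()
  "Return match positions for an unquoted and a quoted leading `*'.
The function name and the sample string are illustrative only."
  (list (string-match "*foo" "a*foo")      ; `*' treated as ordinary
        (string-match "\\*foo" "a*foo")))  ; quoted form, preferred
```

Both searches should find the match at index 1; the second form does not rely on the historical behavior.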
+
+ For the most part, `\' followed by any character matches only that
+character. However, there are several exceptions: characters that,
+when preceded by `\', are special constructs. Such characters are
+always ordinary when encountered on their own. Here is a table of `\'
+constructs:
+
+`\|'
+ specifies an alternative. Two regular expressions A and B with
+ `\|' in between form an expression that matches anything that
+ either A or B matches.
+
+ Thus, `foo\|bar' matches either `foo' or `bar' but no other string.
+
+ `\|' applies to the largest possible surrounding expressions.
+ Only a surrounding `\( ... \)' grouping can limit the grouping
+ power of `\|'.
+
+ Full backtracking capability exists to handle multiple uses of
+ `\|'.
+
+`\( ... \)'
+ is a grouping construct that serves three purposes:
+
+ 1. To enclose a set of `\|' alternatives for other operations.
+ Thus, `\(foo\|bar\)x' matches either `foox' or `barx'.
+
+ 2. To enclose an expression for a suffix operator such as `*' to
+ act on. Thus, `ba\(na\)*' matches `bananana', etc., with any
+ (zero or more) number of `na' strings.
+
+ 3. To record a matched substring for future reference.
+
+ This last application is not a consequence of the idea of a
+ parenthetical grouping; it is a separate feature that happens to be
+ assigned as a second meaning to the same `\( ... \)' construct
+ because there is no conflict in practice between the two meanings.
+ Here is an explanation of this feature:
+
+`\DIGIT'
+ matches the same text that matched the DIGITth occurrence of a `\(
+ ... \)' construct.
+
+ In other words, after the end of a `\( ... \)' construct, the
+ matcher remembers the beginning and end of the text matched by that
+ construct. Then, later on in the regular expression, you can use
+ `\' followed by DIGIT to match that same text, whatever it may
+ have been.
+
+ The strings matching the first nine `\( ... \)' constructs
+ appearing in a regular expression are assigned numbers 1 through 9
+ in the order that the open parentheses appear in the regular
+ expression. So you can use `\1' through `\9' to refer to the text
+ matched by the corresponding `\( ... \)' constructs.
+
+ For example, `\(.*\)\1' matches any newline-free string that is
+ composed of two identical halves. The `\(.*\)' matches the first
+ half, which may be anything, but the `\1' that follows must match
+ the same exact text.
+
+`\(?: ... \)'
+ is called a "shy" grouping operator, and it is used just like `\(
+ ... \)', except that it does not cause the matched substring to be
+ recorded for future reference.
+
+ This is useful when you need a lot of grouping `\( ... \)'
+ constructs, but only want to remember one or two - or if you have
+ more than nine groupings and need to use backreferences to refer to
+ the groupings at the end.
+
+ Using `\(?: ... \)' rather than `\( ... \)' when you don't need
+ the captured substrings ought to speed up your programs some,
+ since it shortens the code path followed by the regular expression
+ engine, as well as the amount of memory allocation and string
+ copying it must do. The actual performance gain to be observed
+ has not been measured or quantified as of this writing.
+
+ The shy grouping operator was borrowed from Perl. It was not
+ available before XEmacs 20.3, and it is not available in FSF
+ Emacs.
+
+`\w'
+ matches any word-constituent character. The editor syntax table
+ determines which characters these are. *Note Syntax Tables::.
+
+`\W'
+ matches any character that is not a word constituent.
+
+`\sCODE'
+ matches any character whose syntax is CODE. Here CODE is a
+ character that represents a syntax code: thus, `w' for word
+ constituent, `-' for whitespace, `(' for open parenthesis, etc.
+ *Note Syntax Tables::, for a list of syntax codes and the
+ characters that stand for them.
+
+`\SCODE'
+ matches any character whose syntax is not CODE.
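
The `\' constructs in the table above can be tried with `string-match' and `match-string'. This is a sketch under stated assumptions: the function name and sample strings are invented for illustration, the matcher is assumed to support shy groups, and the code runs in a temporary buffer so that the syntax-class tests use the syntax table of a fresh buffer.

```elisp
(defun my-backslash-constructs-demo ()
  "Return sample results for alternation, groups and syntax classes.
The function name and the sample strings are illustrative only."
  (with-temp-buffer                   ; fresh buffer, default syntax table
    (list
     ;; Alternation confined by a group: `foox' or `barx' only.
     (mapcar (lambda (s) (and (string-match "^\\(foo\\|bar\\)x$" s) t))
             '("foox" "barx" "bazx"))
     ;; A group as the operand of `*'.
     (and (string-match "ba\\(na\\)*" "bananana") (match-end 0))
     ;; Backreference: the string must be two identical halves.
     (mapcar (lambda (s)
               (and (string-match "^\\(.*\\)\\1$" s) (match-string 1 s)))
             '("abcabc" "abcabd"))
     ;; The shy group takes no number, so the second group is group 1.
     (let ((s "barx"))
       (and (string-match "\\(?:foo\\|bar\\)\\(x\\)" s) (match-string 1 s)))
     ;; Syntax classes: first word constituent, first open parenthesis.
     (list (string-match "\\w" ". hello") (string-match "\\s(" "f(x)")))))
```

The call should return `((t t nil) 8 ("abc" nil) "x" (2 1))'.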
+
+ The following regular expression constructs match the empty
+string--that is, they don't use up any characters--but whether they
+match depends on the context.
+
+`\`'
+ matches the empty string, but only at the beginning of the buffer
+ or string being matched against.
+
+`\''
+ matches the empty string, but only at the end of the buffer or
+ string being matched against.
+
+`\='
+ matches the empty string, but only at point. (This construct is
+ not defined when matching against a string.)
+
+`\b'
+ matches the empty string, but only at the beginning or end of a
+ word. Thus, `\bfoo\b' matches any occurrence of `foo' as a
+ separate word. `\bballs?\b' matches `ball' or `balls' as a
+ separate word.
+
+`\B'
+ matches the empty string, but _not_ at the beginning or end of a
+ word.
+
+`\<'
+ matches the empty string, but only at the beginning of a word.
+
+`\>'
+ matches the empty string, but only at the end of a word.
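
The following sketch contrasts a few of these empty-string constructs on strings. The function name and sample text are illustrative assumptions; word boundaries depend on the syntax table in effect, so the code runs in a fresh temporary buffer.

```elisp
(defun my-boundary-demo ()
  "Return sample results for word-boundary and start-of-text matching.
The function name and the sample strings are illustrative only."
  (with-temp-buffer
    (list
     ;; A separate word `balls' is found ...
     (string-match "\\bballs?\\b" "the balls bounce")
     ;; ... but `ball' inside `football' is not a separate word.
     (string-match "\\bballs?\\b" "football")
     ;; `\B' requires the absence of a word boundary.
     (string-match "\\Bball" "football")
     ;; `^' matches after a newline; `\`' only at the very start.
     (list (string-match "^foo" "bar\nfoo")
           (string-match "\\`foo" "bar\nfoo")))))
```

The expected value is `(4 nil 4 (4 nil))'.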
+
+ Not every string is a valid regular expression. For example, a
+string with unbalanced square brackets is invalid (with a few
+exceptions, such as `[]]'), and so is a string that ends with a single
+`\'. If an invalid regular expression is passed to any of the search
+functions, an `invalid-regexp' error is signaled.
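
One way to test a regexp before using it is to trap the `invalid-regexp' error. The helper below is a hypothetical sketch, not a function provided by XEmacs; note that it calls `string-match' and therefore changes the match data.

```elisp
(defun my-valid-regexp-p (regexp)
  "Return t if REGEXP is accepted by the matcher, nil if it is invalid.
This helper is hypothetical; it is not part of XEmacs."
  (condition-case nil
      (progn (string-match regexp "") t)
    (invalid-regexp nil)))

(defun my-valid-regexp-demo ()
  "Apply `my-valid-regexp-p' to two valid and two invalid patterns."
  (mapcar 'my-valid-regexp-p '("[a-z]+" "[]]" "[a-z" "foo\\")))
```

The last two patterns are an unbalanced bracket and a string ending in a single `\', so `(my-valid-regexp-demo)' should return `(t t nil nil)'.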
+
+ - Function: regexp-quote string
+ This function returns a regular expression string that matches
+ exactly STRING and nothing else. This allows you to request an
+ exact string match when calling a function that wants a regular
+ expression.
+
+ (regexp-quote "^The cat$")
+ => "\\^The cat\\$"
+
+ One use of `regexp-quote' is to combine an exact string match with
+ context described as a regular expression. For example, this
+ searches for the string that is the value of `string', surrounded
+ by whitespace:
+
+ (re-search-forward
+ (concat "\\s-" (regexp-quote string) "\\s-"))
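
The difference that quoting makes can be seen in a short sketch; the function name and sample strings here are illustrative assumptions.

```elisp
(defun my-regexp-quote-demo ()
  "Return matches for a pattern with and without `regexp-quote'.
The function name and the sample strings are illustrative only."
  (list (string-match "a.b" "axb")                  ; `.' is special
        (string-match (regexp-quote "a.b") "axb")   ; literal period: no match
        (string-match (regexp-quote "a.b") "xa.b")))
```

Unquoted, the period matches any character; quoted, only the literal text `a.b' matches, so the result should be `(0 nil 1)'.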
\1f
-File: lispref.info, Node: Syntax Table Internals, Prev: Standard Syntax Tables, Up: Syntax Tables
-
-Syntax Table Internals
-======================
+File: lispref.info, Node: Regexp Example, Prev: Syntax of Regexps, Up: Regular Expressions
- Each element of a syntax table is an integer that encodes the syntax
-of one character: the syntax class, possible matching character, and
-flags. Lisp programs don't usually work with the elements directly; the
-Lisp-level syntax table functions usually work with syntax descriptors
-(*note Syntax Descriptors::.).
+Complex Regexp Example
+----------------------
- The low 8 bits of each element of a syntax table indicate the syntax
-class.
+ Here is a complicated regexp, used by XEmacs to recognize the end of
+a sentence together with any whitespace that follows. It is the value
+of the variable `sentence-end'.
-Integer
- Class
+ First, we show the regexp as a string in Lisp syntax to distinguish
+spaces from tab characters. The string constant begins and ends with a
+double-quote. `\"' stands for a double-quote as part of the string,
+`\\' for a backslash as part of the string, `\t' for a tab and `\n' for
+a newline.
-0
- whitespace
+ "[.?!][]\"')}]*\\($\\| $\\|\t\\| \\)[ \t\n]*"
-1
- punctuation
+ In contrast, if you evaluate the variable `sentence-end', you will
+see the following:
-2
- word
+ sentence-end
+ =>
+ "[.?!][]\"')}]*\\($\\| $\\| \\| \\)[
+ ]*"
-3
- symbol
+In this output, tab and newline appear as themselves.
-4
- open parenthesis
+ This regular expression contains four parts in succession and can be
+deciphered as follows:
-5
- close parenthesis
+`[.?!]'
+ The first part of the pattern is a character set that matches any
+ one of three characters: period, question mark, and exclamation
+ mark. The match must begin with one of these three characters.
-6
- expression prefix
+`[]\"')}]*'
+ The second part of the pattern matches any closing braces and
+ quotation marks, zero or more of them, that may follow the period,
+ question mark or exclamation mark. The `\"' is Lisp syntax for a
+ double-quote in a string. The `*' at the end indicates that the
+ immediately preceding regular expression (a character set, in this
+ case) may be repeated zero or more times.
-7
- string quote
+`\\($\\| $\\|\t\\| \\)'
+ The third part of the pattern matches the whitespace that follows
+ the end of a sentence: the end of a line, or a tab, or two spaces.
+ The double backslashes mark the parentheses and vertical bars as
+ regular expression syntax; the parentheses delimit a group and the
+ vertical bars separate alternatives. The dollar sign is used to
+ match the end of a line.
-8
- paired delimiter
+`[ \t\n]*'
+ Finally, the last part of the pattern matches any additional
+ whitespace beyond the minimum needed to end a sentence.
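
As a rough illustration of the four parts described above, the sketch
below rebuilds the regexp piece by piece and tries it on a string with
`string-match'. The names `my-sentence-end' and `my-first-sentence-end'
and the sample text are made up for this example; only the regexp
itself comes from the text above.

```elisp
;; Illustrative only: `my-sentence-end' is a hypothetical name, not an
;; XEmacs variable.
(defvar my-sentence-end
  (concat "[.?!]"                    ; sentence-ending punctuation
          "[]\"')}]*"                ; optional closing quotes and brackets
          "\\($\\| $\\|\t\\|  \\)"   ; end of line, tab, or two spaces
          "[ \t\n]*")                ; any further whitespace
  "Copy of the sentence-end regexp discussed above, split into its parts.")

(defun my-first-sentence-end (string)
  "Return (START . END) of the first sentence end in STRING, or nil."
  (if (string-match my-sentence-end string)
      (cons (match-beginning 0) (match-end 0))))

(my-first-sentence-end "Done!\nNext.")
;; => (4 . 6)
```

Here the `!' at index 4 is followed by the end of a line, so the `$'
alternative matches and the trailing `[ \t\n]*' consumes the newline.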
-9
- escape
-
-10
- character quote
+\1f
+File: lispref.info, Node: Regexp Search, Next: POSIX Regexps, Prev: Regular Expressions, Up: Searching and Matching
-11
- comment-start
+Regular Expression Searching
+============================
-12
- comment-end
+ In XEmacs, you can search for the next match for a regexp either
+incrementally or not. Incremental search commands are described in
+`The XEmacs Reference Manual'. *Note Regular Expression Search:
+(emacs)Regexp Search. Here we describe only the search functions
+useful in programs. The principal one is `re-search-forward'.
+
+ - Command: re-search-forward regexp &optional limit noerror count
+ buffer
+ This function searches forward in the current buffer for a string
+ of text that is matched by the regular expression REGEXP. The
+ function skips over any amount of text that is not matched by
+ REGEXP, and leaves point at the end of the first match found. It
+ returns the new value of point.
+
+ If LIMIT is non-`nil' (it must be a position in the current
+ buffer), then it is the upper bound to the search. No match
+ extending after that position is accepted.
+
+ What happens when the search fails depends on the value of
+ NOERROR. If NOERROR is `nil', a `search-failed' error is
+ signaled. If NOERROR is `t', `re-search-forward' does nothing and
+ returns `nil'. If NOERROR is neither `nil' nor `t', then
+ `re-search-forward' moves point to LIMIT (or the end of the
+ buffer) and returns `nil'.
+
+ If COUNT is supplied (it must be a positive number), then the
+ search is repeated that many times (each time starting at the end
+ of the previous time's match). If these successive searches
+ succeed, the function succeeds, moving point and returning its new
+ value. Otherwise the search fails.
+
+ In the following example, point is initially before the `T'.
+ Evaluating the search call moves point to the end of that line
+ (between the `t' of `hat' and the newline).
+
+ ---------- Buffer: foo ----------
+ I read "-!-The cat in the hat
+ comes back" twice.
+ ---------- Buffer: foo ----------
+
+ (re-search-forward "[a-z]+" nil t 5)
+ => 27
+
+ ---------- Buffer: foo ----------
+ I read "The cat in the hat-!-
+ comes back" twice.
+ ---------- Buffer: foo ----------
+
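A common use of LIMIT and NOERROR together is a loop that visits
every match in a region. The following is a minimal sketch, assuming a
regexp that never matches the empty string; `my-count-matches' is a
hypothetical helper, not a function provided by XEmacs.

```elisp
(defun my-count-matches (regexp &optional limit)
  "Count matches for REGEXP between point and LIMIT.
REGEXP must not match the empty string, or the loop never ends.
Point is left where it started.  This helper is hypothetical."
  (save-excursion
    (let ((count 0))
      ;; NOERROR is t, so a failed search returns nil instead of
      ;; signaling `search-failed', which ends the loop.
      (while (re-search-forward regexp limit t)
        (setq count (1+ count)))
      count)))

(with-temp-buffer
  (insert "I read \"The cat in the hat\ncomes back\" twice.")
  (goto-char 9)                         ; just before the `T'
  (my-count-matches "[a-z]+" 27))       ; 27 is the end of the first line
;; => 5
```
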
+ - Command: re-search-backward regexp &optional limit noerror count
+ buffer
+ This function searches backward in the current buffer for a string
+ of text that is matched by the regular expression REGEXP, leaving
+ point at the beginning of the first text found.
+
+ This function is analogous to `re-search-forward', but they are not
+ simple mirror images. `re-search-forward' finds the match whose
+ beginning is as close as possible to the starting point. If
+ `re-search-backward' were a perfect mirror image, it would find the
+ match whose end is as close as possible. However, in fact it
+ finds the match whose beginning is as close as possible. The
+ reason is that matching a regular expression at a given spot
+ always works from beginning to end, and starts at a specified
+ beginning position.
+
+ A true mirror-image of `re-search-forward' would require a special
+ feature for matching regexps from end to beginning. It's not
+ worth the trouble of implementing that.
+
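The asymmetry described above can be seen with a repetition operator.
In this small sketch (the buffer contents are made up for the
example), a backward search for `a+' from the end of `aaa' stops at
the nearest position where a match can begin, so it matches only the
last `a'.

```elisp
(with-temp-buffer
  (insert "aaa")                        ; point is now 4, after the last `a'
  (list (re-search-backward "a+")       ; returns the new value of point
        (match-beginning 0)
        (match-end 0)))
;; => (3 3 4)
```

A true mirror image of `re-search-forward' would instead have matched
all three characters, from position 1 to position 4.
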
+ - Function: string-match regexp string &optional start buffer
+ This function returns the index of the start of the first match for
+ the regular expression REGEXP in STRING, or `nil' if there is no
+ match. If START is non-`nil', the search starts at that index in
+ STRING.
+
+ Optional arg BUFFER controls how case folding is done (according
+ to the value of `case-fold-search' in BUFFER and BUFFER's case
+ tables) and defaults to the current buffer.
+
+ For example,
+
+ (string-match
+ "quick" "The quick brown fox jumped quickly.")
+ => 4
+ (string-match
+ "quick" "The quick brown fox jumped quickly." 8)
+ => 27
+
+ The index of the first character of the string is 0, the index of
+ the second character is 1, and so on.
+
+ After this function returns, the index of the first character
+ beyond the match is available as `(match-end 0)'. *Note Match
+ Data::.
+
+ (string-match
+ "quick" "The quick brown fox jumped quickly." 8)
+ => 27
+
+ (match-end 0)
+ => 32
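
The START argument and `(match-end 0)' can be combined to walk through
every match in a string. The following is a sketch only;
`my-all-match-starts' is a hypothetical helper written for this
example.

```elisp
(defun my-all-match-starts (regexp string)
  "Return a list of the start indices of successive matches for REGEXP.
The matches are searched for in STRING.  This helper is hypothetical."
  (let ((start 0)
        (result nil))
    (while (and (<= start (length string))
                (string-match regexp string start))
      (setq result (cons (match-beginning 0) result))
      ;; Resume after this match; step one extra character after an
      ;; empty match so that the loop always terminates.
      (setq start (if (= (match-end 0) (match-beginning 0))
                      (1+ (match-end 0))
                    (match-end 0))))
    (nreverse result)))

(my-all-match-starts "quick" "The quick brown fox jumped quickly.")
;; => (4 27)
```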
-13
- inherit
+ - Function: split-string string &optional pattern
+ This function splits STRING into substrings delimited by PATTERN,
+ and returns a list of substrings. If PATTERN is omitted, it
+ defaults to `[ \f\t\n\r\v]+', which means that it splits STRING by
+ white-space.
- The next 8 bits are the matching opposite parenthesis (if the
-character has parenthesis syntax); otherwise, they are not meaningful.
-The next 6 bits are the flags.
+ (split-string "foo bar")
+ => ("foo" "bar")
+
+ (split-string "something")
+ => ("something")
+
+ (split-string "a:b:c" ":")
+ => ("a" "b" "c")
+
+ (split-string ":a::b:c" ":")
+ => ("" "a" "" "b" "c")
+
+ - Function: split-path path
+ This function splits a search path into a list of strings. The
+ path components are separated with the characters specified with
+ `path-separator'. Under Unix, `path-separator' will normally be
+ `:', while under Windows, it will be `;'.
+
+ - Function: looking-at regexp &optional buffer
+ This function determines whether the text in the current buffer
+ directly following point matches the regular expression REGEXP.
+ "Directly following" means precisely that: the search is
+ "anchored" and it can succeed only starting with the first
+ character following point. The result is `t' if so, `nil'
+ otherwise.
+
+ This function does not move point, but it updates the match data,
+ which you can access using `match-beginning' and `match-end'.
+ *Note Match Data::.
+
+ In this example, point is located directly before the `T'. If it
+ were anywhere else, the result would be `nil'.
+
+ ---------- Buffer: foo ----------
+ I read "-!-The cat in the hat
+ comes back" twice.
+ ---------- Buffer: foo ----------
+
+ (looking-at "The cat in the hat$")
+ => t
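
The sketch below, using a temporary buffer with the same made-up text,
illustrates both points: `looking-at' leaves point alone but records
match data, and it fails as soon as point is anywhere other than the
start of the matching text.

```elisp
(with-temp-buffer
  (insert "I read \"The cat in the hat\ncomes back\" twice.")
  (goto-char 9)                         ; directly before the `T'
  (list (looking-at "The \\(cat\\)")    ; anchored match at point
        (point)                         ; point has not moved
        (match-beginning 1)             ; start of `cat'
        (match-end 1)                   ; end of `cat'
        (progn (goto-char 10)           ; one character later
               (looking-at "The"))))
;; => (t 9 13 16 nil)
```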
\1f
-File: lispref.info, Node: Abbrevs, Next: Extents, Prev: Syntax Tables, Up: Top
-
-Abbrevs And Abbrev Expansion
-****************************
-
- An abbreviation or "abbrev" is a string of characters that may be
-expanded to a longer string. The user can insert the abbrev string and
-find it replaced automatically with the expansion of the abbrev. This
-saves typing.
-
- The set of abbrevs currently in effect is recorded in an "abbrev
-table". Each buffer has a local abbrev table, but normally all buffers
-in the same major mode share one abbrev table. There is also a global
-abbrev table. Normally both are used.
-
- An abbrev table is represented as an obarray containing a symbol for
-each abbreviation. The symbol's name is the abbreviation; its value is
-the expansion; its function definition is the hook function to do the
-expansion (*note Defining Abbrevs::.); its property list cell contains
-the use count, the number of times the abbreviation has been expanded.
-Because these symbols are not interned in the usual obarray, they will
-never appear as the result of reading a Lisp expression; in fact,
-normally they are never used except by the code that handles abbrevs.
-Therefore, it is safe to use them in an extremely nonstandard way.
-*Note Creating Symbols::.
-
- For the user-level commands for abbrevs, see *Note Abbrev Mode:
-(emacs)Abbrevs.
-
-* Menu:
-
-* Abbrev Mode:: Setting up XEmacs for abbreviation.
-* Tables: Abbrev Tables. Creating and working with abbrev tables.
-* Defining Abbrevs:: Specifying abbreviations and their expansions.
-* Files: Abbrev Files. Saving abbrevs in files.
-* Expansion: Abbrev Expansion. Controlling expansion; expansion subroutines.
-* Standard Abbrev Tables:: Abbrev tables used by various major modes.
+File: lispref.info, Node: POSIX Regexps, Next: Search and Replace, Prev: Regexp Search, Up: Searching and Matching
+
+POSIX Regular Expression Searching
+==================================
+
+ The usual regular expression functions do backtracking when necessary
+to handle the `\|' and repetition constructs, but they continue this
+only until they find _some_ match. Then they succeed and report the
+first match found.
+
+ This section describes alternative search functions which perform the
+full backtracking specified by the POSIX standard for regular expression
+matching. They continue backtracking until they have tried all
+possibilities and found all matches, so they can report the longest
+match, as required by POSIX. This is much slower, so use these
+functions only when you really need the longest match.
+
+ In Emacs versions prior to 19.29, these functions did not exist, and
+the functions described above implemented full POSIX backtracking.
+
+ - Command: posix-search-forward regexp &optional limit noerror count
+ buffer
+ This is like `re-search-forward' except that it performs the full
+ backtracking specified by the POSIX standard for regular expression
+ matching.
+
+ - Command: posix-search-backward regexp &optional limit noerror count
+ buffer
+ This is like `re-search-backward' except that it performs the full
+ backtracking specified by the POSIX standard for regular expression
+ matching.
+
+ - Function: posix-looking-at regexp &optional buffer
+ This is like `looking-at' except that it performs the full
+ backtracking specified by the POSIX standard for regular expression
+ matching.
+
+ - Function: posix-string-match regexp string &optional start buffer
+ This is like `string-match' except that it performs the full
+ backtracking specified by the POSIX standard for regular expression
+ matching.
+
+ Optional arg BUFFER controls how case folding is done (according
+ to the value of `case-fold-search' in BUFFER and BUFFER's case
+ tables) and defaults to the current buffer.
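
The difference between the two kinds of matching shows up when an
earlier alternative is a prefix of a later one. In this small sketch
(the pattern and text are chosen only for illustration), the usual
function stops at the first alternative that succeeds, while the POSIX
variant keeps backtracking to find the longest match.

```elisp
(let ((text "foobar"))
  (list (progn (string-match "foo\\|foobar" text)
               (match-end 0))           ; first match found
        (progn (posix-string-match "foo\\|foobar" text)
               (match-end 0))))         ; longest match
;; => (3 6)
```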
\1f
-File: lispref.info, Node: Abbrev Mode, Next: Abbrev Tables, Up: Abbrevs
-
-Setting Up Abbrev Mode
-======================
+File: lispref.info, Node: Search and Replace, Next: Match Data, Prev: POSIX Regexps, Up: Searching and Matching
- Abbrev mode is a minor mode controlled by the value of the variable
-`abbrev-mode'.
-
- - Variable: abbrev-mode
- A non-`nil' value of this variable turns on the automatic expansion
- of abbrevs when their abbreviations are inserted into a buffer.
- If the value is `nil', abbrevs may be defined, but they are not
- expanded automatically.
+Search and Replace
+==================
- This variable automatically becomes local when set in any fashion.
+ - Function: perform-replace from-string replacements query-flag
+ regexp-flag delimited-flag &optional repeat-count map
+ This function is the guts of `query-replace' and related commands.
+ It searches for occurrences of FROM-STRING and replaces some or
+ all of them. If QUERY-FLAG is `nil', it replaces all occurrences;
+ otherwise, it asks the user what to do about each one.
- - Variable: default-abbrev-mode
- This is the value of `abbrev-mode' for buffers that do not
- override it. This is the same as `(default-value 'abbrev-mode)'.
+ If REGEXP-FLAG is non-`nil', then FROM-STRING is considered a
+ regular expression; otherwise, it must match literally. If
+ DELIMITED-FLAG is non-`nil', then only occurrences surrounded by
+ word boundaries are considered.
-\1f
-File: lispref.info, Node: Abbrev Tables, Next: Defining Abbrevs, Prev: Abbrev Mode, Up: Abbrevs
+ The argument REPLACEMENTS specifies what to replace occurrences
+ with. If it is a string, that string is used. It can also be a
+ list of strings, to be used in cyclic order.
-Abbrev Tables
-=============
+ If REPEAT-COUNT is non-`nil', it should be an integer. Then it
+ specifies how many times to use each of the strings in the
+ REPLACEMENTS list before advancing cyclically to the next one.
- This section describes how to create and manipulate abbrev tables.
+ Normally, the keymap `query-replace-map' defines the possible user
+ responses for queries. The argument MAP, if non-`nil', is a
+ keymap to use instead of `query-replace-map'.
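
When no querying is needed, a program can often get the same effect
with a plain search loop. The following is a minimal sketch of such a
loop, not the implementation of `perform-replace'; `my-replace-all' is
a hypothetical name, and the loop assumes that REGEXP never matches
the empty string. `replace-match' is described under the node
"Replacing Match".

```elisp
(defun my-replace-all (regexp replacement)
  "Replace each match for REGEXP after point with REPLACEMENT.
REGEXP must not match the empty string.  Return the number of
replacements made.  This helper is hypothetical."
  (let ((count 0))
    (while (re-search-forward regexp nil t)
      ;; FIXEDCASE and LITERAL are both t: insert REPLACEMENT as is.
      (replace-match replacement t t)
      (setq count (1+ count)))
    count))

(with-temp-buffer
  (insert "foo bar foo")
  (goto-char (point-min))
  (list (my-replace-all "foo" "baz")
        (buffer-string)))
;; => (2 "baz bar baz")
```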
- - Function: make-abbrev-table
- This function creates and returns a new, empty abbrev table--an
- obarray containing no symbols. It is a vector filled with zeros.
+ - Variable: query-replace-map
+ This variable holds a special keymap that defines the valid user
+ responses for `query-replace' and related functions, as well as
+ `y-or-n-p' and `map-y-or-n-p'. It is unusual in two ways:
- - Function: clear-abbrev-table TABLE
- This function undefines all the abbrevs in abbrev table TABLE,
- leaving it empty. The function returns `nil'.
+ * The "key bindings" are not commands, just symbols that are
+ meaningful to the functions that use this map.
- - Function: define-abbrev-table TABNAME DEFINITIONS
- This function defines TABNAME (a symbol) as an abbrev table name,
- i.e., as a variable whose value is an abbrev table. It defines
- abbrevs in the table according to DEFINITIONS, a list of elements
- of the form `(ABBREVNAME EXPANSION HOOK USECOUNT)'. The value is
- always `nil'.
+ * Prefix keys are not supported; each key binding must be for a
+ single event key sequence. This is because the functions
+ don't use `read-key-sequence' to get the input; instead, they
+ read a single event and look it up "by hand."
- - Variable: abbrev-table-name-list
- This is a list of symbols whose values are abbrev tables.
- `define-abbrev-table' adds the new abbrev table name to this list.
+ Here are the meaningful "bindings" for `query-replace-map'. Several
+of them are meaningful only for `query-replace' and friends.
- - Function: insert-abbrev-table-description NAME &optional HUMAN
- This function inserts before point a description of the abbrev
- table named NAME. The argument NAME is a symbol whose value is an
- abbrev table. The value is always `nil'.
+`act'
+ Do take the action being considered--in other words, "yes."
- If HUMAN is non-`nil', the description is human-oriented.
- Otherwise the description is a Lisp expression--a call to
- `define-abbrev-table' that would define NAME exactly as it is
- currently defined.
+`skip'
+ Do not take action for this question--in other words, "no."
-\1f
-File: lispref.info, Node: Defining Abbrevs, Next: Abbrev Files, Prev: Abbrev Tables, Up: Abbrevs
+`exit'
+ Answer this question "no," and give up on the entire series of
+ questions, assuming that subsequent answers will be "no."
-Defining Abbrevs
-================
+`act-and-exit'
+ Answer this question "yes," and give up on the entire series of
+ questions, assuming that subsequent answers will be "no."
- These functions define an abbrev in a specified abbrev table.
-`define-abbrev' is the low-level basic function, while `add-abbrev' is
-used by commands that ask for information from the user.
+`act-and-show'
+ Answer this question "yes," but show the results--don't advance yet
+ to the next question.
- - Function: add-abbrev TABLE TYPE ARG
- This function adds an abbreviation to abbrev table TABLE based on
- information from the user. The argument TYPE is a string
- describing in English the kind of abbrev this will be (typically,
- `"global"' or `"mode-specific"'); this is used in prompting the
- user. The argument ARG is the number of words in the expansion.
+`automatic'
+ Answer this question and all subsequent questions in the series
+ with "yes," without further user interaction.
- The return value is the symbol that internally represents the new
- abbrev, or `nil' if the user declines to confirm redefining an
- existing abbrev.
+`backup'
+ Move back to the previous place that a question was asked about.
- - Function: define-abbrev TABLE NAME EXPANSION HOOK
- This function defines an abbrev in TABLE named NAME, to expand to
- EXPANSION, and call HOOK. The return value is an uninterned
- symbol that represents the abbrev inside XEmacs; its name is NAME.
+`edit'
+ Enter a recursive edit to deal with this question--instead of any
+ other action that would normally be taken.
- The argument NAME should be a string. The argument EXPANSION
- should be a string, or `nil' to undefine the abbrev.
+`delete-and-edit'
+ Delete the text being considered, then enter a recursive edit to
+ replace it.
- The argument HOOK is a function or `nil'. If HOOK is non-`nil',
- then it is called with no arguments after the abbrev is replaced
- with EXPANSION; point is located at the end of EXPANSION when HOOK
- is called.
+`recenter'
+ Redisplay and center the window, then ask the same question again.
- The use count of the abbrev is initialized to zero.
+`quit'
+ Perform a quit right away. Only `y-or-n-p' and related functions
+ use this answer.
- - User Option: only-global-abbrevs
- If this variable is non-`nil', it means that the user plans to use
- global abbrevs only. This tells the commands that define
- mode-specific abbrevs to define global ones instead. This
- variable does not alter the behavior of the functions in this
- section; it is examined by their callers.
+`help'
+ Display some help, then ask again.
\1f
-File: lispref.info, Node: Abbrev Files, Next: Abbrev Expansion, Prev: Defining Abbrevs, Up: Abbrevs
-
-Saving Abbrevs in Files
-=======================
-
- A file of saved abbrev definitions is actually a file of Lisp code.
-The abbrevs are saved in the form of a Lisp program to define the same
-abbrev tables with the same contents. Therefore, you can load the file
-with `load' (*note How Programs Do Loading::.). However, the function
-`quietly-read-abbrev-file' is provided as a more convenient interface.
+File: lispref.info, Node: Match Data, Next: Searching and Case, Prev: Search and Replace, Up: Searching and Matching
- User-level facilities such as `save-some-buffers' can save abbrevs
-in a file automatically, under the control of variables described here.
+The Match Data
+==============
- - User Option: abbrev-file-name
- This is the default file name for reading and saving abbrevs.
+ XEmacs keeps track of the positions of the start and end of segments
+of text found during a regular expression search. This means, for
+example, that you can search for a complex pattern, such as a date in
+an Rmail message, and then extract parts of the match under control of
+the pattern.
- - Function: quietly-read-abbrev-file FILENAME
- This function reads abbrev definitions from a file named FILENAME,
- previously written with `write-abbrev-file'. If FILENAME is
- `nil', the file specified in `abbrev-file-name' is used.
- `save-abbrevs' is set to `t' so that changes will be saved.
+ Because the match data normally describe the most recent search only,
+you must be careful not to do another search inadvertently between the
+search you wish to refer back to and the use of the match data. If you
+can't avoid another intervening search, you must save and restore the
+match data around it, to prevent it from being overwritten.
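
As a small sketch of saving and restoring, the example below wraps an
intervening search in `save-match-data' (*note Saving Match Data::).
The helper `my-noisy-helper' is hypothetical and exists only to
overwrite the match data.

```elisp
(defun my-noisy-helper (string)
  "Hypothetical helper that runs its own search, changing the match data."
  (string-match "fox" string))

(let ((text "The quick fox jumped quickly."))
  (string-match "\\(qu\\)\\(ick\\)" text)
  ;; Without `save-match-data', the helper's search would replace
  ;; the match data recorded by the search above.
  (save-match-data
    (my-noisy-helper text))
  (list (match-beginning 2) (match-end 2)))
;; => (6 9)
```
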
- This function does not display any messages. It returns `nil'.
-
- - User Option: save-abbrevs
- A non-`nil' value for `save-abbrev' means that XEmacs should save
- abbrevs when files are saved. `abbrev-file-name' specifies the
- file to save the abbrevs in.
-
- - Variable: abbrevs-changed
- This variable is set non-`nil' by defining or altering any
- abbrevs. This serves as a flag for various XEmacs commands to
- offer to save your abbrevs.
+* Menu:
- - Command: write-abbrev-file FILENAME
- Save all abbrev definitions, in all abbrev tables, in the file
- FILENAME, in the form of a Lisp program that when loaded will
- define the same abbrevs. This function returns `nil'.
+* Simple Match Data:: Accessing single items of match data,
+ such as where a particular subexpression started.
+* Replacing Match:: Replacing a substring that was matched.
+* Entire Match Data:: Accessing the entire match data at once, as a list.
+* Saving Match Data:: Saving and restoring the match data.
\1f
-File: lispref.info, Node: Abbrev Expansion, Next: Standard Abbrev Tables, Prev: Abbrev Files, Up: Abbrevs
-
-Looking Up and Expanding Abbreviations
-======================================
-
- Abbrevs are usually expanded by commands for interactive use,
-including `self-insert-command'. This section describes the
-subroutines used in writing such functions, as well as the variables
-they use for communication.
-
- - Function: abbrev-symbol ABBREV &optional TABLE
- This function returns the symbol representing the abbrev named
- ABBREV. The value returned is `nil' if that abbrev is not
- defined. The optional second argument TABLE is the abbrev table
- to look it up in. If TABLE is `nil', this function tries first
- the current buffer's local abbrev table, and second the global
- abbrev table.
-
- - Function: abbrev-expansion ABBREV &optional TABLE
- This function returns the string that ABBREV would expand into (as
- defined by the abbrev tables used for the current buffer). The
- optional argument TABLE specifies the abbrev table to use, as in
- `abbrev-symbol'.
-
- - Command: expand-abbrev
- This command expands the abbrev before point, if any. If point
- does not follow an abbrev, this command does nothing. The command
- returns `t' if it did expansion, `nil' otherwise.
-
- - Command: abbrev-prefix-mark &optional ARG
- Mark current point as the beginning of an abbrev. The next call to
- `expand-abbrev' will use the text from here to point (where it is
- then) as the abbrev to expand, rather than using the previous word
- as usual.
-
- - User Option: abbrev-all-caps
- When this is set non-`nil', an abbrev entered entirely in upper
- case is expanded using all upper case. Otherwise, an abbrev
- entered entirely in upper case is expanded by capitalizing each
- word of the expansion.
-
- - Variable: abbrev-start-location
- This is the buffer position for `expand-abbrev' to use as the start
- of the next abbrev to be expanded. (`nil' means use the word
- before point instead.) `abbrev-start-location' is set to `nil'
- each time `expand-abbrev' is called. This variable is also set by
- `abbrev-prefix-mark'.
-
- - Variable: abbrev-start-location-buffer
- The value of this variable is the buffer for which
- `abbrev-start-location' has been set. Trying to expand an abbrev
- in any other buffer clears `abbrev-start-location'. This variable
- is set by `abbrev-prefix-mark'.
-
- - Variable: last-abbrev
- This is the `abbrev-symbol' of the last abbrev expanded. This
- information is left by `expand-abbrev' for the sake of the
- `unexpand-abbrev' command.
-
- - Variable: last-abbrev-location
- This is the location of the last abbrev expanded. This contains
- information left by `expand-abbrev' for the sake of the
- `unexpand-abbrev' command.
-
- - Variable: last-abbrev-text
- This is the exact expansion text of the last abbrev expanded,
- after case conversion (if any). Its value is `nil' if the abbrev
- has already been unexpanded. This contains information left by
- `expand-abbrev' for the sake of the `unexpand-abbrev' command.
-
- - Variable: pre-abbrev-expand-hook
- This is a normal hook whose functions are executed, in sequence,
- just before any expansion of an abbrev. *Note Hooks::. Since it
- is a normal hook, the hook functions receive no arguments.
- However, they can find the abbrev to be expanded by looking in the
- buffer before point.
-
- The following sample code shows a simple use of
-`pre-abbrev-expand-hook'. If the user terminates an abbrev with a
-punctuation character, the hook function asks for confirmation. Thus,
-this hook allows the user to decide whether to expand the abbrev, and
-aborts expansion if it is not confirmed.
-
- (add-hook 'pre-abbrev-expand-hook 'query-if-not-space)
+File: lispref.info, Node: Simple Match Data, Next: Replacing Match, Up: Match Data
+
+Simple Match Data Access
+------------------------
+
+ This section explains how to use the match data to find out what was
+matched by the last search or match operation.
+
+ You can ask about the entire matching text, or about a particular
+parenthetical subexpression of a regular expression. The COUNT
+argument in the functions below specifies which. If COUNT is zero, you
+are asking about the entire match. If COUNT is positive, it specifies
+which subexpression you want.
+
+ Recall that the subexpressions of a regular expression are those
+expressions grouped with escaped parentheses, `\(...\)'. The COUNTth
+subexpression is found by counting occurrences of `\(' from the
+beginning of the whole regular expression. The first subexpression is
+numbered 1, the second 2, and so on. Only regular expressions can have
+subexpressions--after a simple string search, the only information
+available is about the entire match.
+
+ - Function: match-string count &optional in-string
+ This function returns, as a string, the text matched in the last
+ search or match operation. It returns the entire text if COUNT is
+ zero, or just the portion corresponding to the COUNTth
+ parenthetical subexpression, if COUNT is positive. If COUNT is
+ out of range, or if that subexpression didn't match anything, the
+ value is `nil'.
+
+ If the last such operation was done against a string with
+ `string-match', then you should pass the same string as the
+ argument IN-STRING. Otherwise, after a buffer search or match,
+ you should omit IN-STRING or pass `nil' for it; but you should
+ make sure that the current buffer when you call `match-string' is
+ the one in which you did the searching or matching.
+
+ - Function: match-beginning count
+ This function returns the position of the start of text matched by
+ the last regular expression searched for, or a subexpression of it.
+
+ If COUNT is zero, then the value is the position of the start of
+ the entire match. Otherwise, COUNT specifies a subexpression in
+ the regular expression, and the value of the function is the
+ starting position of the match for that subexpression.
+
+ The value is `nil' for a subexpression inside a `\|' alternative
+ that wasn't used in the match.
+
+ - Function: match-end count
+ This function is like `match-beginning' except that it returns the
+ position of the end of the match, rather than the position of the
+ beginning.
+
+ Here is an example of using the match data, with a comment showing
+the positions within the text:
+
+ (string-match "\\(qu\\)\\(ick\\)"
+ "The quick fox jumped quickly.")
+ ;0123456789
+ => 4
- ;; This is the function invoked by `pre-abbrev-expand-hook'.
+ (match-string 0 "The quick fox jumped quickly.")
+ => "quick"
+ (match-string 1 "The quick fox jumped quickly.")
+ => "qu"
+ (match-string 2 "The quick fox jumped quickly.")
+ => "ick"
- ;; If the user terminated the abbrev with a space, the function does
- ;; nothing (that is, it returns so that the abbrev can expand). If the
- ;; user entered some other character, this function asks whether
- ;; expansion should continue.
+ (match-beginning 1) ; The beginning of the match
+ => 4 ; with `qu' is at index 4.
- ;; If the user answers the prompt with `y', the function returns
- ;; `nil' (because of the `not' function), but that is
- ;; acceptable; the return value has no effect on expansion.
+ (match-beginning 2) ; The beginning of the match
+ => 6 ; with `ick' is at index 6.
- (defun query-if-not-space ()
- (if (/= ?\ (preceding-char))
- (if (not (y-or-n-p "Do you want to expand this abbrev? "))
- (error "Not expanding this abbrev"))))
-
-\1f
-File: lispref.info, Node: Standard Abbrev Tables, Prev: Abbrev Expansion, Up: Abbrevs
-
-Standard Abbrev Tables
-======================
-
- Here we list the variables that hold the abbrev tables for the
-preloaded major modes of XEmacs.
-
- - Variable: global-abbrev-table
- This is the abbrev table for mode-independent abbrevs. The abbrevs
- defined in it apply to all buffers. Each buffer may also have a
- local abbrev table, whose abbrev definitions take precedence over
- those in the global table.
-
- - Variable: local-abbrev-table
- The value of this buffer-local variable is the (mode-specific)
- abbreviation table of the current buffer.
-
- - Variable: fundamental-mode-abbrev-table
- This is the local abbrev table used in Fundamental mode; in other
- words, it is the local abbrev table in all buffers in Fundamental
- mode.
-
- - Variable: text-mode-abbrev-table
- This is the local abbrev table used in Text mode.
-
- - Variable: c-mode-abbrev-table
- This is the local abbrev table used in C mode.
-
- - Variable: lisp-mode-abbrev-table
- This is the local abbrev table used in Lisp mode and Emacs Lisp
- mode.
+ (match-end 1) ; The end of the match
+ => 6 ; with `qu' is at index 6.
+
+ (match-end 2) ; The end of the match
+ => 9 ; with `ick' is at index 9.
+
+ Here is another example. Point is initially located at the beginning
+of the line. Searching moves point to between the space and the word
+`in'. The beginning of the entire match is at the 9th character of the
+buffer (`T'), and the beginning of the match for the first
+subexpression is at the 13th character (`c').
+
+ (list
+ (re-search-forward "The \\(cat \\)")
+ (match-beginning 0)
+ (match-beginning 1))
+ => (17 9 13)
+
+ ---------- Buffer: foo ----------
+ I read "The cat -!-in the hat comes back" twice.
+          ^   ^
+          9  13
+ ---------- Buffer: foo ----------
+
+(In this case, the index returned is a buffer position; the first
+character of the buffer counts as 1.)
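
As a further sketch of extracting parts of a match under control of
the pattern, the following pulls the fields out of a date. The
function `my-parse-date' and the date format are assumptions made for
this example.

```elisp
(defun my-parse-date (string)
  "Return (YEAR MONTH DAY) for the first YYYY-MM-DD in STRING, or nil.
This helper is hypothetical."
  (if (string-match
       "\\([0-9][0-9][0-9][0-9]\\)-\\([0-9][0-9]\\)-\\([0-9][0-9]\\)"
       string)
      ;; Pass STRING as IN-STRING because the match was made on a string.
      (list (string-to-number (match-string 1 string))
            (string-to-number (match-string 2 string))
            (string-to-number (match-string 3 string)))))

(my-parse-date "Date: 1999-12-31 23:59")
;; => (1999 12 31)
```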