X-Git-Url: http://git.chise.org/gitweb/?a=blobdiff_plain;f=info%2Flispref.info-32;h=0a550157f55e39769e40fc1b9d37e6bec0333064;hb=d0e6b27b97829086778013f56539d65d6d3d6def;hp=3173f9fe9830d1ea6552ab8175b9ef93f58f263b;hpb=7d6edaefa00e7b7e102354283824a4f6a721b71a;p=chise%2Fxemacs-chise.git diff --git a/info/lispref.info-32 b/info/lispref.info-32 index 3173f9f..0a55015 100644 --- a/info/lispref.info-32 +++ b/info/lispref.info-32 @@ -50,1184 +50,1150 @@ may be included in a translation approved by the Free Software Foundation instead of in the original English.  -File: lispref.info, Node: Entire Match Data, Next: Saving Match Data, Prev: Replacing Match, Up: Match Data - -Accessing the Entire Match Data -------------------------------- - - The functions `match-data' and `set-match-data' read or write the -entire match data, all at once. - - - Function: match-data - This function returns a newly constructed list containing all the - information on what text the last search matched. Element zero is - the position of the beginning of the match for the whole - expression; element one is the position of the end of the match - for the expression. The next two elements are the positions of - the beginning and end of the match for the first subexpression, - and so on. In general, element number 2N corresponds to - `(match-beginning N)'; and element number 2N + 1 corresponds to - `(match-end N)'. - - All the elements are markers or `nil' if matching was done on a - buffer, and all are integers or `nil' if matching was done on a - string with `string-match'. (In Emacs 18 and earlier versions, - markers were used even for matching on a string, except in the case - of the integer 0.) - - As always, there must be no possibility of intervening searches - between the call to a search function and the call to `match-data' - that is intended to access the match data for that search. 
- - (match-data) - => (# - # - # - #) - - - Function: set-match-data match-list - This function sets the match data from the elements of MATCH-LIST, - which should be a list that was the value of a previous call to - `match-data'. - - If MATCH-LIST refers to a buffer that doesn't exist, you don't get - an error; that sets the match data in a meaningless but harmless - way. - - `store-match-data' is an alias for `set-match-data'. +File: lispref.info, Node: Change Hooks, Next: Transformations, Prev: Transposition, Up: Text + +Change Hooks +============ + + These hook variables let you arrange to take notice of all changes in +all buffers (or in a particular buffer, if you make them buffer-local). + + The functions you use in these hooks should save and restore the +match data if they do anything that uses regular expressions; +otherwise, they will interfere in bizarre ways with the editing +operations that call them. + + Buffer changes made while executing the following hooks don't +themselves cause any change hooks to be invoked. + + - Variable: before-change-functions + This variable holds a list of functions to call before any buffer + modification. Each function gets two arguments, the beginning and + end of the region that is about to change, represented as + integers. The buffer that is about to change is always the + current buffer. + + - Variable: after-change-functions + This variable holds a list of functions to call after any buffer + modification. Each function receives three arguments: the + beginning and end of the region just changed, and the length of + the text that existed before the change. (To get the current + length, subtract the region beginning from the region end.) All + three arguments are integers. The buffer that has just changed + is always the current buffer. + + - Variable: before-change-function + This obsolete variable holds one function to call before any buffer + modification (or `nil' for no function). 
It is called just like + the functions in `before-change-functions'. + + - Variable: after-change-function + This obsolete variable holds one function to call after any buffer + modification (or `nil' for no function). It is called just like + the functions in `after-change-functions'. + + - Variable: first-change-hook + This variable is a normal hook that is run whenever a buffer is + changed that was previously in the unmodified state.  -File: lispref.info, Node: Saving Match Data, Prev: Entire Match Data, Up: Match Data +File: lispref.info, Node: Transformations, Prev: Change Hooks, Up: Text -Saving and Restoring the Match Data ------------------------------------ +Textual transformations--MD5 and base64 support +=============================================== - When you call a function that may do a search, you may need to save -and restore the match data around that call, if you want to preserve the -match data from an earlier search for later use. Here is an example -that shows the problem that arises if you fail to save the match data: + Some textual operations inherently require examining each character +in turn, and performing arithmetic operations on them. Such operations +can, of course, be implemented in Emacs Lisp, but tend to be very slow +for large portions of text or data. This is why some of them are +implemented in C, with an appropriate interface for Lisp programmers. +Examples of algorithms thus provided are MD5 and base64 support. - (re-search-forward "The \\(cat \\)") - => 48 - (foo) ; Perhaps `foo' does - ; more searching. - (match-end 0) - => 61 ; Unexpected result--not 48! + MD5 is an algorithm for calculating message digests, as described in +rfc1321. Given a message of arbitrary length, MD5 produces an 128-bit +"fingerprint" ("message digest") corresponding to that message. It is +considered computationally infeasible to produce two messages having +the same MD5 digest, or to produce a message having a prespecified +target digest. 
MD5 is used heavily by various authentication schemes. - You can save and restore the match data with `save-match-data': + Emacs Lisp interface to MD5 consists of a single function `md5': - - Macro: save-match-data body... - This special form executes BODY, saving and restoring the match - data around it. + - Function: md5 object &optional start end coding noerror + This function returns the MD5 message digest of OBJECT, a buffer + or string. - You can use `set-match-data' together with `match-data' to imitate -the effect of the special form `save-match-data'. This is useful for -writing code that can run in Emacs 18. Here is how: + Optional arguments START and END denote positions for computing + the digest of a portion of OBJECT. - (let ((data (match-data))) - (unwind-protect - ... ; May change the original match data. - (set-match-data data))) + The optional CODING argument specifies the coding system the text + is to be represented in while computing the digest. If + unspecified, it defaults to the current format of the data, or is + guessed. - Emacs automatically saves and restores the match data when it runs -process filter functions (*note Filter Functions::) and process -sentinels (*note Sentinels::). + If NOERROR is non-`nil', silently assume binary coding if the + guesswork fails. Normally, an error is signaled in such case. - -File: lispref.info, Node: Searching and Case, Next: Standard Regexps, Prev: Match Data, Up: Searching and Matching - -Searching and Case -================== - - By default, searches in Emacs ignore the case of the text they are -searching through; if you specify searching for `FOO', then `Foo' or -`foo' is also considered a match. Regexps, and in particular character -sets, are included: thus, `[aB]' would match `a' or `A' or `b' or `B'. - - If you do not want this feature, set the variable `case-fold-search' -to `nil'. Then all letters must match exactly, including case. 
This -is a buffer-local variable; altering the variable affects only the -current buffer. (*Note Intro to Buffer-Local::.) Alternatively, you -may change the value of `default-case-fold-search', which is the -default value of `case-fold-search' for buffers that do not override it. - - Note that the user-level incremental search feature handles case -distinctions differently. When given a lower case letter, it looks for -a match of either case, but when given an upper case letter, it looks -for an upper case letter only. But this has nothing to do with the -searching functions Lisp functions use. - - - User Option: case-replace - This variable determines whether the replacement functions should - preserve case. If the variable is `nil', that means to use the - replacement text verbatim. A non-`nil' value means to convert the - case of the replacement text according to the text being replaced. - - The function `replace-match' is where this variable actually has - its effect. *Note Replacing Match::. - - - User Option: case-fold-search - This buffer-local variable determines whether searches should - ignore case. If the variable is `nil' they do not ignore case; - otherwise they do ignore case. - - - Variable: default-case-fold-search - The value of this variable is the default value for - `case-fold-search' in buffers that do not override it. This is the - same as `(default-value 'case-fold-search)'. - - -File: lispref.info, Node: Standard Regexps, Prev: Searching and Case, Up: Searching and Matching - -Standard Regular Expressions Used in Editing -============================================ - - This section describes some variables that hold regular expressions -used for certain purposes in editing: - - - Variable: page-delimiter - This is the regexp describing line-beginnings that separate pages. - The default value is `"^\014"' (i.e., `"^^L"' or `"^\C-l"'); this - matches a line that starts with a formfeed character. 
- - The following two regular expressions should _not_ assume the match -always starts at the beginning of a line; they should not use `^' to -anchor the match. Most often, the paragraph commands do check for a -match only at the beginning of a line, which means that `^' would be -superfluous. When there is a nonzero left margin, they accept matches -that start after the left margin. In that case, a `^' would be -incorrect. However, a `^' is harmless in modes where a left margin is -never used. - - - Variable: paragraph-separate - This is the regular expression for recognizing the beginning of a - line that separates paragraphs. (If you change this, you may have - to change `paragraph-start' also.) The default value is - `"[ \t\f]*$"', which matches a line that consists entirely of - spaces, tabs, and form feeds (after its left margin). - - - Variable: paragraph-start - This is the regular expression for recognizing the beginning of a - line that starts _or_ separates paragraphs. The default value is - `"[ \t\n\f]"', which matches a line starting with a space, tab, - newline, or form feed (after its left margin). - - - Variable: sentence-end - This is the regular expression describing the end of a sentence. - (All paragraph boundaries also end sentences, regardless.) The - default value is: - - "[.?!][]\"')}]*\\($\\| $\\|\t\\| \\)[ \t\n]*" - - This means a period, question mark or exclamation mark, followed - optionally by a closing parenthetical character, followed by tabs, - spaces or new lines. - - For a detailed explanation of this regular expression, see *Note - Regexp Example::. - - -File: lispref.info, Node: Syntax Tables, Next: Abbrevs, Prev: Searching and Matching, Up: Top - -Syntax Tables -************* - - A "syntax table" specifies the syntactic textual function of each -character. This information is used by the parsing commands, the -complex movement commands, and others to determine where words, symbols, -and other syntactic constructs begin and end. 
The current syntax table -controls the meaning of the word motion functions (*note Word Motion::) -and the list motion functions (*note List Motion::) as well as the -functions in this chapter. - -* Menu: - -* Basics: Syntax Basics. Basic concepts of syntax tables. -* Desc: Syntax Descriptors. How characters are classified. -* Syntax Table Functions:: How to create, examine and alter syntax tables. -* Motion and Syntax:: Moving over characters with certain syntaxes. -* Parsing Expressions:: Parsing balanced expressions - using the syntax table. -* Standard Syntax Tables:: Syntax tables used by various major modes. -* Syntax Table Internals:: How syntax table information is stored. - - -File: lispref.info, Node: Syntax Basics, Next: Syntax Descriptors, Up: Syntax Tables - -Syntax Table Concepts -===================== - - A "syntax table" provides Emacs with the information that determines -the syntactic use of each character in a buffer. This information is -used by the parsing commands, the complex movement commands, and others -to determine where words, symbols, and other syntactic constructs begin -and end. The current syntax table controls the meaning of the word -motion functions (*note Word Motion::) and the list motion functions -(*note List Motion::) as well as the functions in this chapter. - - Under XEmacs 20, a syntax table is a particular subtype of the -primitive char table type (*note Char Tables::), and each element of the -char table is an integer that encodes the syntax of the character in -question, or a cons of such an integer and a matching character (for -characters with parenthesis syntax). - - Under XEmacs 19, a syntax table is a vector of 256 elements; it -contains one entry for each of the 256 possible characters in an 8-bit -byte. Each element is an integer that encodes the syntax of the -character in question. (The matching character, if any, is embedded in -the bits of this integer.) 
- - Syntax tables are used only for moving across text, not for the Emacs -Lisp reader. XEmacs Lisp uses built-in syntactic rules when reading -Lisp expressions, and these rules cannot be changed. - - Each buffer has its own major mode, and each major mode has its own -idea of the syntactic class of various characters. For example, in Lisp -mode, the character `;' begins a comment, but in C mode, it terminates -a statement. To support these variations, XEmacs makes the choice of -syntax table local to each buffer. Typically, each major mode has its -own syntax table and installs that table in each buffer that uses that -mode. Changing this table alters the syntax in all those buffers as -well as in any buffers subsequently put in that mode. Occasionally -several similar modes share one syntax table. *Note Example Major -Modes::, for an example of how to set up a syntax table. - - A syntax table can inherit the data for some characters from the -standard syntax table, while specifying other characters itself. The -"inherit" syntax class means "inherit this character's syntax from the -standard syntax table." Most major modes' syntax tables inherit the -syntax of character codes 0 through 31 and 128 through 255. This is -useful with character sets such as ISO Latin-1 that have additional -alphabetic characters in the range 128 to 255. Just changing the -standard syntax for these characters affects all major modes. - - - Function: syntax-table-p object - This function returns `t' if OBJECT is a vector of length 256 - elements. This means that the vector may be a syntax table. - However, according to this test, any vector of length 256 is - considered to be a syntax table, no matter what its contents. 
- - -File: lispref.info, Node: Syntax Descriptors, Next: Syntax Table Functions, Prev: Syntax Basics, Up: Syntax Tables - -Syntax Descriptors -================== - - This section describes the syntax classes and flags that denote the -syntax of a character, and how they are represented as a "syntax -descriptor", which is a Lisp string that you pass to -`modify-syntax-entry' to specify the desired syntax. - - XEmacs defines a number of "syntax classes". Each syntax table puts -each character into one class. There is no necessary relationship -between the class of a character in one syntax table and its class in -any other table. - - Each class is designated by a mnemonic character, which serves as the -name of the class when you need to specify a class. Usually the -designator character is one that is frequently in that class; however, -its meaning as a designator is unvarying and independent of what syntax -that character currently has. - - A syntax descriptor is a Lisp string that specifies a syntax class, a -matching character (used only for the parenthesis classes) and flags. -The first character is the designator for a syntax class. The second -character is the character to match; if it is unused, put a space there. -Then come the characters for any desired flags. If no matching -character or flags are needed, one character is sufficient. - - For example, the descriptor for the character `*' in C mode is -`. 23' (i.e., punctuation, matching character slot unused, second -character of a comment-starter, first character of an comment-ender), -and the entry for `/' is `. 14' (i.e., punctuation, matching character -slot unused, first character of a comment-starter, second character of -a comment-ender). - -* Menu: - -* Syntax Class Table:: Table of syntax classes. -* Syntax Flags:: Additional flags each character can have. + CODING and NOERROR arguments are meaningful only in XEmacsen with + file-coding or Mule support. Otherwise, they are ignored. 
Some + examples of usage: - -File: lispref.info, Node: Syntax Class Table, Next: Syntax Flags, Up: Syntax Descriptors - -Table of Syntax Classes ------------------------ - - Here is a table of syntax classes, the characters that stand for -them, their meanings, and examples of their use. - - - Syntax class: whitespace character - "Whitespace characters" (designated with ` ' or `-') separate - symbols and words from each other. Typically, whitespace - characters have no other syntactic significance, and multiple - whitespace characters are syntactically equivalent to a single - one. Space, tab, newline and formfeed are almost always - classified as whitespace. - - - Syntax class: word constituent - "Word constituents" (designated with `w') are parts of normal - English words and are typically used in variable and command names - in programs. All upper- and lower-case letters, and the digits, - are typically word constituents. - - - Syntax class: symbol constituent - "Symbol constituents" (designated with `_') are the extra - characters that are used in variable and command names along with - word constituents. For example, the symbol constituents class is - used in Lisp mode to indicate that certain characters may be part - of symbol names even though they are not part of English words. - These characters are `$&*+-_<>'. In standard C, the only - non-word-constituent character that is valid in symbols is - underscore (`_'). - - - Syntax class: punctuation character - "Punctuation characters" (`.') are those characters that are used - as punctuation in English, or are used in some way in a programming - language to separate symbols from one another. Most programming - language modes, including Emacs Lisp mode, have no characters in - this class since the few characters that are not symbol or word - constituents all have other uses. 
- - - Syntax class: open parenthesis character - - Syntax class: close parenthesis character - Open and close "parenthesis characters" are characters used in - dissimilar pairs to surround sentences or expressions. Such a - grouping is begun with an open parenthesis character and - terminated with a close. Each open parenthesis character matches - a particular close parenthesis character, and vice versa. - Normally, XEmacs indicates momentarily the matching open - parenthesis when you insert a close parenthesis. *Note Blinking::. - - The class of open parentheses is designated with `(', and that of - close parentheses with `)'. - - In English text, and in C code, the parenthesis pairs are `()', - `[]', and `{}'. In XEmacs Lisp, the delimiters for lists and - vectors (`()' and `[]') are classified as parenthesis characters. - - - Syntax class: string quote - "String quote characters" (designated with `"') are used in many - languages, including Lisp and C, to delimit string constants. The - same string quote character appears at the beginning and the end - of a string. Such quoted strings do not nest. - - The parsing facilities of XEmacs consider a string as a single - token. The usual syntactic meanings of the characters in the - string are suppressed. - - The Lisp modes have two string quote characters: double-quote (`"') - and vertical bar (`|'). `|' is not used in XEmacs Lisp, but it is - used in Common Lisp. C also has two string quote characters: - double-quote for strings, and single-quote (`'') for character - constants. - - English text has no string quote characters because English is not - a programming language. Although quotation marks are used in - English, we do not want them to turn off the usual syntactic - properties of other characters in the quotation. - - - Syntax class: escape - An "escape character" (designated with `\') starts an escape - sequence such as is used in C string and character constants. 
The - character `\' belongs to this class in both C and Lisp. (In C, it - is used thus only inside strings, but it turns out to cause no - trouble to treat it this way throughout C code.) - - Characters in this class count as part of words if - `words-include-escapes' is non-`nil'. *Note Word Motion::. - - - Syntax class: character quote - A "character quote character" (designated with `/') quotes the - following character so that it loses its normal syntactic meaning. - This differs from an escape character in that only the character - immediately following is ever affected. - - Characters in this class count as part of words if - `words-include-escapes' is non-`nil'. *Note Word Motion::. - - This class is used for backslash in TeX mode. - - - Syntax class: paired delimiter - "Paired delimiter characters" (designated with `$') are like - string quote characters except that the syntactic properties of the - characters between the delimiters are not suppressed. Only TeX - mode uses a paired delimiter presently--the `$' that both enters - and leaves math mode. - - - Syntax class: expression prefix - An "expression prefix operator" (designated with `'') is used for - syntactic operators that are part of an expression if they appear - next to one. These characters in Lisp include the apostrophe, `'' - (used for quoting), the comma, `,' (used in macros), and `#' (used - in the read syntax for certain data types). - - - Syntax class: comment starter - - Syntax class: comment ender - The "comment starter" and "comment ender" characters are used in - various languages to delimit comments. These classes are - designated with `<' and `>', respectively. - - English text has no comment characters. In Lisp, the semicolon - (`;') starts a comment and a newline or formfeed ends one. - - - Syntax class: inherit - This syntax class does not specify a syntax. It says to look in - the standard syntax table to find the syntax of this character. 
- The designator for this syntax code is `@'. + ;; Calculate the digest of the entire buffer + (md5 (current-buffer)) + => "8842b04362899b1cda8d2d126dc11712" + + ;; Calculate the digest of the current line + (md5 (current-buffer) (point-at-bol) (point-at-eol)) + => "60614d21e9dee27dfdb01fa4e30d6d00" + + ;; Calculate the digest of your name and email address + (md5 (concat (format "%s <%s>" (user-full-name) user-mail-address))) + => "0a2188c40fd38922d941fe6032fce516" - -File: lispref.info, Node: Syntax Flags, Prev: Syntax Class Table, Up: Syntax Descriptors + Base64 is a portable encoding for arbitrary sequences of octets, in a +form that need not be readable by humans. It uses a 65-character subset +of US-ASCII, as described in rfc2045. Base64 is used by MIME to encode +binary bodies, and to encode binary characters in message headers. -Syntax Flags ------------- + The Lisp interface to base64 consists of four functions: - In addition to the classes, entries for characters in a syntax table -can include flags. There are six possible flags, represented by the -characters `1', `2', `3', `4', `b' and `p'. + - Command: base64-encode-region start end &optional no-line-break + This function encodes the region between START and END of the + current buffer to base64 format. This means that the original + region is deleted, and replaced with its base64 equivalent. - All the flags except `p' are used to describe multi-character -comment delimiters. The digit flags indicate that a character can -_also_ be part of a comment sequence, in addition to the syntactic -properties associated with its character class. The flags are -independent of the class and each other for the sake of characters such -as `*' in C mode, which is a punctuation character, _and_ the second -character of a start-of-comment sequence (`/*'), _and_ the first -character of an end-of-comment sequence (`*/'). + Normally, encoded base64 output is multi-line, with 76-character + lines. 
 If NO-LINE-BREAK is non-`nil', newlines will not be + inserted, resulting in single-line output. - The flags for a character C are: + Mule note: you should make sure that you convert the multibyte + characters (those that do not fit into the 0-255 range) to something + else, because they cannot be meaningfully converted to base64. If + `base64-encode-region' encounters such characters, it will + signal an error. - * `1' means C is the start of a two-character comment-start sequence. + `base64-encode-region' returns the length of the encoded text. - * `2' means C is the second character of such a sequence. + ;; Encode the whole buffer in base64 + (base64-encode-region (point-min) (point-max)) - * `3' means C is the start of a two-character comment-end sequence. + The function can also be used interactively, in which case it + works on the currently active region. - * `4' means C is the second character of such a sequence. + - Function: base64-encode-string string &optional no-line-break + This function encodes STRING to base64, and returns the encoded + string. - * `b' means that C as a comment delimiter belongs to the alternative - "b" comment style. + Normally, encoded base64 output is multi-line, with 76-character + lines. If NO-LINE-BREAK is non-`nil', newlines will not be + inserted, resulting in single-line output. - Emacs supports two comment styles simultaneously in any one syntax - table. This is for the sake of C++. Each style of comment syntax - has its own comment-start sequence and its own comment-end - sequence. Each comment must stick to one style or the other; - thus, if it starts with the comment-start sequence of style "b", - it must also end with the comment-end sequence of style "b". + For Mule, the same considerations apply as for + `base64-encode-region'. - The two comment-start sequences must begin with the same - character; only the second character may differ. 
 Mark the second - character of the "b"-style comment-start sequence with the `b' - flag. + (base64-encode-string "fubar") + => "ZnViYXI=" - A comment-end sequence (one or two characters) applies to the "b" - style if its first character has the `b' flag set; otherwise, it - applies to the "a" style. + - Command: base64-decode-region start end + This function decodes the region between START and END of the + current buffer. The region should be in base64 encoding. - The appropriate comment syntax settings for C++ are as follows: + If the region was decoded correctly, `base64-decode-region' returns + the length of the decoded region. If the decoding failed, `nil' is + returned. - `/' - `124b' + ;; Decode a base64 buffer, and replace it with the decoded version + (base64-decode-region (point-min) (point-max)) - `*' - `23' + - Function: base64-decode-string string + This function decodes STRING from base64, and returns the decoded + string. STRING should be valid base64-encoded text. - newline - `>b' + If decoding was not possible, `nil' is returned. - This defines four comment-delimiting sequences: + (base64-decode-string "ZnViYXI=") + => "fubar" + + (base64-decode-string "totally bogus") + => nil - `/*' - This is a comment-start sequence for "a" style because the - second character, `*', does not have the `b' flag. + +File: lispref.info, Node: Searching and Matching, Next: Syntax Tables, Prev: Text, Up: Top - `//' - This is a comment-start sequence for "b" style because the - second character, `/', does have the `b' flag. +Searching and Matching +********************** - `*/' - This is a comment-end sequence for "a" style because the first - character, `*', does not have the `b' flag + XEmacs provides two ways to search through a buffer for specified +text: exact string searches and regular expression searches. After a +regular expression search, you can examine the "match data" to +determine which text matched the whole regular expression or various +portions of it. 
- newline - This is a comment-end sequence for "b" style, because the - newline character has the `b' flag. +* Menu: - * `p' identifies an additional "prefix character" for Lisp syntax. - These characters are treated as whitespace when they appear between - expressions. When they appear within an expression, they are - handled according to their usual syntax codes. +* String Search:: Search for an exact match. +* Regular Expressions:: Describing classes of strings. +* Regexp Search:: Searching for a match for a regexp. +* POSIX Regexps:: Searching POSIX-style for the longest match. +* Search and Replace:: Internals of `query-replace'. +* Match Data:: Finding out which part of the text matched + various parts of a regexp, after regexp search. +* Searching and Case:: Case-independent or case-significant searching. +* Standard Regexps:: Useful regexps for finding sentences, pages,... - The function `backward-prefix-chars' moves back over these - characters, as well as over characters whose primary syntax class - is prefix (`''). *Note Motion and Syntax::. + The `skip-chars...' functions also perform a kind of searching. +*Note Skipping Characters::.  -File: lispref.info, Node: Syntax Table Functions, Next: Motion and Syntax, Prev: Syntax Descriptors, Up: Syntax Tables - -Syntax Table Functions -====================== - - In this section we describe functions for creating, accessing and -altering syntax tables. - - - Function: make-syntax-table &optional table - This function creates a new syntax table. Character codes 0 - through 31 and 128 through 255 are set up to inherit from the - standard syntax table. The other character codes are set up by - copying what the standard syntax table says about them. +File: lispref.info, Node: String Search, Next: Regular Expressions, Up: Searching and Matching - Most major mode syntax tables are created in this way. - - - Function: copy-syntax-table &optional table - This function constructs a copy of TABLE and returns it. 
If TABLE - is not supplied (or is `nil'), it returns a copy of the current - syntax table. Otherwise, an error is signaled if TABLE is not a - syntax table. - - - Command: modify-syntax-entry char syntax-descriptor &optional table - This function sets the syntax entry for CHAR according to - SYNTAX-DESCRIPTOR. The syntax is changed only for TABLE, which - defaults to the current buffer's syntax table, and not in any - other syntax table. The argument SYNTAX-DESCRIPTOR specifies the - desired syntax; this is a string beginning with a class designator - character, and optionally containing a matching character and - flags as well. *Note Syntax Descriptors::. +Searching for Strings +===================== - This function always returns `nil'. The old syntax information in - the table for this character is discarded. + These are the primitive functions for searching through the text in a +buffer. They are meant for use in programs, but you may call them +interactively. If you do so, they prompt for the search string; LIMIT +and NOERROR are set to `nil', and COUNT is set to 1. - An error is signaled if the first character of the syntax - descriptor is not one of the twelve syntax class designator - characters. An error is also signaled if CHAR is not a character. + - Command: search-forward string &optional limit noerror count buffer + This function searches forward from point for an exact match for + STRING. If successful, it sets point to the end of the occurrence + found, and returns the new value of point. If no match is found, + the value and side effects depend on NOERROR (see below). - Examples: + In the following example, point is initially at the beginning of + the line. Then `(search-forward "fox")' moves point after the last + letter of `fox': - ;; Put the space character in class whitespace. - (modify-syntax-entry ?\ " ") - => nil - - ;; Make `$' an open parenthesis character, - ;; with `^' as its matching close. 
- (modify-syntax-entry ?$ "(^") - => nil + ---------- Buffer: foo ---------- + -!-The quick brown fox jumped over the lazy dog. + ---------- Buffer: foo ---------- - ;; Make `^' a close parenthesis character, - ;; with `$' as its matching open. - (modify-syntax-entry ?^ ")$") - => nil + (search-forward "fox") + => 20 - ;; Make `/' a punctuation character, - ;; the first character of a start-comment sequence, - ;; and the second character of an end-comment sequence. - ;; This is used in C mode. - (modify-syntax-entry ?/ ". 14") - => nil - - - Function: char-syntax character - This function returns the syntax class of CHARACTER, represented - by its mnemonic designator character. This _only_ returns the - class, not any matching parenthesis or flags. - - An error is signaled if CHAR is not a character. - - The following examples apply to C mode. The first example shows - that the syntax class of space is whitespace (represented by a - space). The second example shows that the syntax of `/' is - punctuation. This does not show the fact that it is also part of - comment-start and -end sequences. The third example shows that - open parenthesis is in the class of open parentheses. This does - not show the fact that it has a matching character, `)'. - - (char-to-string (char-syntax ?\ )) - => " " + ---------- Buffer: foo ---------- + The quick brown fox-!- jumped over the lazy dog. + ---------- Buffer: foo ---------- + + The argument LIMIT specifies the upper bound to the search. (It + must be a position in the current buffer.) No match extending + after that position is accepted. If LIMIT is omitted or `nil', it + defaults to the end of the accessible portion of the buffer. + + What happens when the search fails depends on the value of + NOERROR. If NOERROR is `nil', a `search-failed' error is + signaled. If NOERROR is `t', `search-forward' returns `nil' and + does nothing. 
If NOERROR is neither `nil' nor `t', then + `search-forward' moves point to the upper bound and returns `nil'. + (It would be more consistent now to return the new position of + point in that case, but some programs may depend on a value of + `nil'.) + + If COUNT is supplied (it must be an integer), then the search is + repeated that many times (each time starting at the end of the + previous time's match). If COUNT is negative, the search + direction is backward. If the successive searches succeed, the + function succeeds, moving point and returning its new value. + Otherwise the search fails. + + BUFFER is the buffer to search in, and defaults to the current + buffer. + + - Command: search-backward string &optional limit noerror count buffer + This function searches backward from point for STRING. It is just + like `search-forward' except that it searches backwards and leaves + point at the beginning of the match. + + - Command: word-search-forward string &optional limit noerror count + buffer + This function searches forward from point for a "word" match for + STRING. If it finds a match, it sets point to the end of the + match found, and returns the new value of point. + + Word matching regards STRING as a sequence of words, disregarding + punctuation that separates them. It searches the buffer for the + same sequence of words. Each word must be distinct in the buffer + (searching for the word `ball' does not match the word `balls'), + but the details of punctuation and spacing are ignored (searching + for `ball boy' does match `ball. Boy!'). + + In this example, point is initially at the beginning of the + buffer; the search leaves it between the `y' and the `!'. + + ---------- Buffer: foo ---------- + -!-He said "Please! Find + the ball boy!" + ---------- Buffer: foo ---------- - (char-to-string (char-syntax ?/)) - => "." 
+ (word-search-forward "Please find the ball, boy.") + => 35 - (char-to-string (char-syntax ?\()) - => "(" - - - Function: set-syntax-table table &optional buffer - This function makes TABLE the syntax table for BUFFER, which - defaults to the current buffer if omitted. It returns TABLE. - - - Function: syntax-table &optional buffer - This function returns the syntax table for BUFFER, which defaults - to the current buffer if omitted. - - -File: lispref.info, Node: Motion and Syntax, Next: Parsing Expressions, Prev: Syntax Table Functions, Up: Syntax Tables - -Motion and Syntax -================= - - This section describes functions for moving across characters in -certain syntax classes. None of these functions exists in Emacs -version 18 or earlier. - - - Function: skip-syntax-forward syntaxes &optional limit buffer - This function moves point forward across characters having syntax - classes mentioned in SYNTAXES. It stops when it encounters the - end of the buffer, or position LIMIT (if specified), or a - character it is not supposed to skip. Optional argument BUFFER - defaults to the current buffer if omitted. - - - Function: skip-syntax-backward syntaxes &optional limit buffer - This function moves point backward across characters whose syntax - classes are mentioned in SYNTAXES. It stops when it encounters - the beginning of the buffer, or position LIMIT (if specified), or a - character it is not supposed to skip. Optional argument BUFFER - defaults to the current buffer if omitted. - - - - Function: backward-prefix-chars &optional buffer - This function moves point backward over any number of characters - with expression prefix syntax. This includes both characters in - the expression prefix syntax class, and characters with the `p' - flag. Optional argument BUFFER defaults to the current buffer if - omitted. + ---------- Buffer: foo ---------- + He said "Please! Find + the ball boy-!-!" 
+ ---------- Buffer: foo ---------- + + If LIMIT is non-`nil' (it must be a position in the current + buffer), then it is the upper bound to the search. The match + found must not extend after that position. + + If NOERROR is `nil', then `word-search-forward' signals an error + if the search fails. If NOERROR is `t', then it returns `nil' + instead of signaling an error. If NOERROR is neither `nil' nor + `t', it moves point to LIMIT (or the end of the buffer) and + returns `nil'. + + If COUNT is non-`nil', then the search is repeated that many + times. Point is positioned at the end of the last match. + + BUFFER is the buffer to search in, and defaults to the current + buffer. + + - Command: word-search-backward string &optional limit noerror count + buffer + This function searches backward from point for a word match to + STRING. This function is just like `word-search-forward' except + that it searches backward and normally leaves point at the + beginning of the match.  -File: lispref.info, Node: Parsing Expressions, Next: Standard Syntax Tables, Prev: Motion and Syntax, Up: Syntax Tables - -Parsing Balanced Expressions -============================ - - Here are several functions for parsing and scanning balanced -expressions, also known as "sexps", in which parentheses match in -pairs. The syntax table controls the interpretation of characters, so -these functions can be used for Lisp expressions when in Lisp mode and -for C expressions when in C mode. *Note List Motion::, for convenient -higher-level functions for moving over balanced expressions. - - - Function: parse-partial-sexp start limit &optional target-depth - stop-before state stop-comment buffer - This function parses a sexp in the current buffer starting at - START, not scanning past LIMIT. It stops at position LIMIT or - when certain criteria described below are met, and sets point to - the location where parsing stops. 
It returns a value describing - the status of the parse at the point where it stops. - - If STATE is `nil', START is assumed to be at the top level of - parenthesis structure, such as the beginning of a function - definition. Alternatively, you might wish to resume parsing in the - middle of the structure. To do this, you must provide a STATE - argument that describes the initial status of parsing. - - If the third argument TARGET-DEPTH is non-`nil', parsing stops if - the depth in parentheses becomes equal to TARGET-DEPTH. The depth - starts at 0, or at whatever is given in STATE. - - If the fourth argument STOP-BEFORE is non-`nil', parsing stops - when it comes to any character that starts a sexp. If - STOP-COMMENT is non-`nil', parsing stops when it comes to the - start of a comment. - - The fifth argument STATE is an eight-element list of the same form - as the value of this function, described below. The return value - of one call may be used to initialize the state of the parse on - another call to `parse-partial-sexp'. - - The result is a list of eight elements describing the final state - of the parse: - - 0. The depth in parentheses, counting from 0. - - 1. The character position of the start of the innermost - parenthetical grouping containing the stopping point; `nil' - if none. - - 2. The character position of the start of the last complete - subexpression terminated; `nil' if none. - - 3. Non-`nil' if inside a string. More precisely, this is the - character that will terminate the string. - - 4. `t' if inside a comment (of either style). +File: lispref.info, Node: Regular Expressions, Next: Regexp Search, Prev: String Search, Up: Searching and Matching + +Regular Expressions +=================== + + A "regular expression" ("regexp", for short) is a pattern that +denotes a (possibly infinite) set of strings. Searching for matches for +a regexp is a very powerful operation. 
This section explains how to +write regexps; the following section says how to search for them. + + To gain a thorough understanding of regular expressions and how to +use them to best advantage, we recommend that you study `Mastering +Regular Expressions, by Jeffrey E.F. Friedl, O'Reilly and Associates, +1997'. (It's known as the "Hip Owls" book, because of the picture on its +cover.) You might also read the manuals to *Note (gawk)Top::, *Note +(ed)Top::, `sed', `grep', *Note (perl)Top::, *Note (regex)Top::, *Note +(rx)Top::, `pcre', and *Note (flex)Top::, which also make good use of +regular expressions. + + The XEmacs regular expression syntax most closely resembles that of +`ed', or `grep', the GNU versions of which all utilize the GNU `regex' +library. XEmacs' version of `regex' has recently been extended with +some Perl-like capabilities, described in the next section. - 5. `t' if point is just after a quote character. - - 6. The minimum parenthesis depth encountered during this scan. - - 7. `t' if inside a comment of style "b". - - Elements 0, 3, 4, 5 and 7 are significant in the argument STATE. - - This function is most often used to compute indentation for - languages that have nested parentheses. - - - Function: scan-lists from count depth &optional buffer noerror - This function scans forward COUNT balanced parenthetical groupings - from character number FROM. It returns the character position - where the scan stops. - - If DEPTH is nonzero, parenthesis depth counting begins from that - value. The only candidates for stopping are places where the - depth in parentheses becomes zero; `scan-lists' counts COUNT such - places and then stops. Thus, a positive value for DEPTH means go - out DEPTH levels of parenthesis. - - Scanning ignores comments if `parse-sexp-ignore-comments' is - non-`nil'. - - If the scan reaches the beginning or end of the buffer (or its - accessible portion), and the depth is not zero, an error is - signaled. 
If the depth is zero but the count is not used up, - `nil' is returned. - - If optional arg BUFFER is non-`nil', scanning occurs in that - buffer instead of in the current buffer. - - If optional arg NOERROR is non-`nil', `scan-lists' will return - `nil' instead of signalling an error. - - - Function: scan-sexps from count &optional buffer noerror - This function scans forward COUNT sexps from character position - FROM. It returns the character position where the scan stops. - - Scanning ignores comments if `parse-sexp-ignore-comments' is - non-`nil'. - - If the scan reaches the beginning or end of (the accessible part - of) the buffer in the middle of a parenthetical grouping, an error - is signaled. If it reaches the beginning or end between groupings - but before count is used up, `nil' is returned. - - If optional arg BUFFER is non-`nil', scanning occurs in that - buffer instead of in the current buffer. - - If optional arg NOERROR is non-`nil', `scan-sexps' will return nil - instead of signalling an error. - - - Variable: parse-sexp-ignore-comments - If the value is non-`nil', then comments are treated as whitespace - by the functions in this section and by `forward-sexp'. - - In older Emacs versions, this feature worked only when the comment - terminator is something like `*/', and appears only to end a - comment. In languages where newlines terminate comments, it was - necessary make this variable `nil', since not every newline is the - end of a comment. This limitation no longer exists. - - You can use `forward-comment' to move forward or backward over one -comment or several comments. - - - Function: forward-comment count &optional buffer - This function moves point forward across COUNT comments (backward, - if COUNT is negative). If it finds anything other than a comment - or whitespace, it stops, leaving point at the place where it - stopped. It also stops after satisfying COUNT. - - Optional argument BUFFER defaults to the current buffer. 
+* Menu: - To move forward over all comments and whitespace following point, use -`(forward-comment (buffer-size))'. `(buffer-size)' is a good argument -to use, because the number of comments in the buffer cannot exceed that -many. +* Syntax of Regexps:: Rules for writing regular expressions. +* Regexp Example:: Illustrates regular expression syntax.  -File: lispref.info, Node: Standard Syntax Tables, Next: Syntax Table Internals, Prev: Parsing Expressions, Up: Syntax Tables - -Some Standard Syntax Tables -=========================== - - Most of the major modes in XEmacs have their own syntax tables. Here -are several of them: - - - Function: standard-syntax-table - This function returns the standard syntax table, which is the - syntax table used in Fundamental mode. - - - Variable: text-mode-syntax-table - The value of this variable is the syntax table used in Text mode. - - - Variable: c-mode-syntax-table - The value of this variable is the syntax table for C-mode buffers. - - - Variable: emacs-lisp-mode-syntax-table - The value of this variable is the syntax table used in Emacs Lisp - mode by editing commands. (It has no effect on the Lisp `read' - function.) +File: lispref.info, Node: Syntax of Regexps, Next: Regexp Example, Up: Regular Expressions + +Syntax of Regular Expressions +----------------------------- + + Regular expressions have a syntax in which a few characters are +special constructs and the rest are "ordinary". An ordinary character +is a simple regular expression that matches that character and nothing +else. The special characters are `.', `*', `+', `?', `[', `]', `^', +`$', and `\'; no new special characters will be defined in the future. +Any other character appearing in a regular expression is ordinary, +unless a `\' precedes it. + + For example, `f' is not a special character, so it is ordinary, and +therefore `f' is a regular expression that matches the string `f' and +no other string. (It does _not_ match the string `ff'.) 
Likewise, `o' +is a regular expression that matches only `o'. + + Any two regular expressions A and B can be concatenated. The result +is a regular expression that matches a string if A matches some amount +of the beginning of that string and B matches the rest of the string. + + As a simple example, we can concatenate the regular expressions `f' +and `o' to get the regular expression `fo', which matches only the +string `fo'. Still trivial. To do something more powerful, you need +to use one of the special characters. Here is a list of them: + +`. (Period)' + is a special character that matches any single character except a + newline. Using concatenation, we can make regular expressions + like `a.b', which matches any three-character string that begins + with `a' and ends with `b'. + +`*' + is not a construct by itself; it is a quantifying suffix operator + that means to repeat the preceding regular expression as many + times as possible. In `fo*', the `*' applies to the `o', so `fo*' + matches one `f' followed by any number of `o's. The case of zero + `o's is allowed: `fo*' does match `f'. + + `*' always applies to the _smallest_ possible preceding + expression. Thus, `fo*' has a repeating `o', not a repeating `fo'. + + The matcher processes a `*' construct by matching, immediately, as + many repetitions as can be found; it is "greedy". Then it + continues with the rest of the pattern. If that fails, + backtracking occurs, discarding some of the matches of the + `*'-modified construct in case that makes it possible to match the + rest of the pattern. For example, in matching `ca*ar' against the + string `caaar', the `a*' first tries to match all three `a's; but + the rest of the pattern is `ar' and there is only `r' left to + match, so this try fails. The next alternative is for `a*' to + match only two `a's. With this choice, the rest of the regexp + matches successfully. + + Nested repetition operators can be extremely slow if they specify + backtracking loops. 
For example, it could take hours for the + regular expression `\(x+y*\)*a' to match the sequence + `xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxz'. The slowness is because + Emacs must try each imaginable way of grouping the 35 `x''s before + concluding that none of them can work. To make sure your regular + expressions run fast, check nested repetitions carefully. + +`+' + is a quantifying suffix operator similar to `*' except that the + preceding expression must match at least once. It is also + "greedy". So, for example, `ca+r' matches the strings `car' and + `caaaar' but not the string `cr', whereas `ca*r' matches all three + strings. + +`?' + is a quantifying suffix operator similar to `*', except that the + preceding expression can match either once or not at all. For + example, `ca?r' matches `car' or `cr', but does not match anything + else. + +`*?' + works just like `*', except that rather than matching the longest + match, it matches the shortest match. `*?' is known as a + "non-greedy" quantifier, a regexp construct borrowed from Perl. + + This construct is very useful for when you want to match the text + inside a pair of delimiters. For instance, `/\*.*?\*/' will match + C comments in a string. This could not easily be achieved without + the use of a non-greedy quantifier. + + This construct has not been available prior to XEmacs 20.4. It is + not available in FSF Emacs. + +`+?' + is the non-greedy version of `+'. + +`??' + is the non-greedy version of `?'. + +`\{n,m\}' + serves as an interval quantifier, analogous to `*' or `+', but + specifies that the expression must match at least N times, but no + more than M times. This syntax is supported by most Unix regexp + utilities, and has been introduced to XEmacs for the version 20.3. + + Unfortunately, the non-greedy version of this quantifier does not + exist currently, although it does in Perl. + +`[ ... ]' + `[' begins a "character set", which is terminated by a `]'. 
In + the simplest case, the characters between the two brackets form + the set. Thus, `[ad]' matches either one `a' or one `d', and + `[ad]*' matches any string composed of just `a's and `d's + (including the empty string), from which it follows that `c[ad]*r' + matches `cr', `car', `cdr', `caddaar', etc. + + The usual regular expression special characters are not special + inside a character set. A completely different set of special + characters exists inside character sets: `]', `-' and `^'. + + `-' is used for ranges of characters. To write a range, write two + characters with a `-' between them. Thus, `[a-z]' matches any + lower case letter. Ranges may be intermixed freely with individual + characters, as in `[a-z$%.]', which matches any lower case letter + or `$', `%', or a period. + + To include a `]' in a character set, make it the first character. + For example, `[]a]' matches `]' or `a'. To include a `-', write + `-' as the first character in the set, or put it immediately after + a range. (You can replace one individual character C with the + range `C-C' to make a place to put the `-'.) There is no way to + write a set containing just `-' and `]'. + + To include `^' in a set, put it anywhere but at the beginning of + the set. + +`[^ ... ]' + `[^' begins a "complement character set", which matches any + character except the ones specified. Thus, `[^a-z0-9A-Z]' matches + all characters _except_ letters and digits. + + `^' is not special in a character set unless it is the first + character. The character following the `^' is treated as if it + were first (thus, `-' and `]' are not special there). + + Note that a complement character set can match a newline, unless + newline is mentioned as one of the characters not to match. + +`^' + is a special character that matches the empty string, but only at + the beginning of a line in the text being matched. Otherwise it + fails to match anything. 
Thus, `^foo' matches a `foo' that occurs + at the beginning of a line. + + When matching a string instead of a buffer, `^' matches at the + beginning of the string or after a newline character `\n'. + +`$' + is similar to `^' but matches only at the end of a line. Thus, + `x+$' matches a string of one `x' or more at the end of a line. + + When matching a string instead of a buffer, `$' matches at the end + of the string or before a newline character `\n'. + +`\' + has two functions: it quotes the special characters (including + `\'), and it introduces additional special constructs. + + Because `\' quotes special characters, `\$' is a regular + expression that matches only `$', and `\[' is a regular expression + that matches only `[', and so on. + + Note that `\' also has special meaning in the read syntax of Lisp + strings (*note String Type::), and must be quoted with `\'. For + example, the regular expression that matches the `\' character is + `\\'. To write a Lisp string that contains the characters `\\', + Lisp syntax requires you to quote each `\' with another `\'. + Therefore, the read syntax for a regular expression matching `\' + is `"\\\\"'. + + *Please note:* For historical compatibility, special characters are +treated as ordinary ones if they are in contexts where their special +meanings make no sense. For example, `*foo' treats `*' as ordinary +since there is no preceding expression on which the `*' can act. It is +poor practice to depend on this behavior; quote the special character +anyway, regardless of where it appears. + + For the most part, `\' followed by any character matches only that +character. However, there are several exceptions: characters that, +when preceded by `\', are special constructs. Such characters are +always ordinary when encountered on their own. Here is a table of `\' +constructs: + +`\|' + specifies an alternative. 
Two regular expressions A and B with + `\|' in between form an expression that matches anything that + either A or B matches. + + Thus, `foo\|bar' matches either `foo' or `bar' but no other string. + + `\|' applies to the largest possible surrounding expressions. + Only a surrounding `\( ... \)' grouping can limit the grouping + power of `\|'. + + Full backtracking capability exists to handle multiple uses of + `\|'. + +`\( ... \)' + is a grouping construct that serves three purposes: + + 1. To enclose a set of `\|' alternatives for other operations. + Thus, `\(foo\|bar\)x' matches either `foox' or `barx'. + + 2. To enclose an expression for a suffix operator such as `*' to + act on. Thus, `ba\(na\)*' matches `bananana', etc., with any + (zero or more) number of `na' strings. + + 3. To record a matched substring for future reference. + + This last application is not a consequence of the idea of a + parenthetical grouping; it is a separate feature that happens to be + assigned as a second meaning to the same `\( ... \)' construct + because there is no conflict in practice between the two meanings. + Here is an explanation of this feature: + +`\DIGIT' + matches the same text that matched the DIGITth occurrence of a `\( + ... \)' construct. + + In other words, after the end of a `\( ... \)' construct, the + matcher remembers the beginning and end of the text matched by that + construct. Then, later on in the regular expression, you can use + `\' followed by DIGIT to match that same text, whatever it may + have been. + + The strings matching the first nine `\( ... \)' constructs + appearing in a regular expression are assigned numbers 1 through 9 + in the order that the open parentheses appear in the regular + expression. So you can use `\1' through `\9' to refer to the text + matched by the corresponding `\( ... \)' constructs. + + For example, `\(.*\)\1' matches any newline-free string that is + composed of two identical halves.
The `\(.*\)' matches the first + half, which may be anything, but the `\1' that follows must match + the same exact text. + +`\(?: ... \)' + is called a "shy" grouping operator, and it is used just like `\( + ... \)', except that it does not cause the matched substring to be + recorded for future reference. + + This is useful when you need a lot of grouping `\( ... \)' + constructs, but only want to remember one or two - or if you have + more than nine groupings and need to use backreferences to refer to + the groupings at the end. + + Using `\(?: ... \)' rather than `\( ... \)' when you don't need + the captured substrings ought to speed up your programs some, + since it shortens the code path followed by the regular expression + engine, as well as the amount of memory allocation and string + copying it must do. The actual performance gain to be observed + has not been measured or quantified as of this writing. + + The shy grouping operator has been borrowed from Perl, and has not + been available prior to XEmacs 20.3, nor is it available in FSF + Emacs. + +`\w' + matches any word-constituent character. The editor syntax table + determines which characters these are. *Note Syntax Tables::. + +`\W' + matches any character that is not a word constituent. + +`\sCODE' + matches any character whose syntax is CODE. Here CODE is a + character that represents a syntax code: thus, `w' for word + constituent, `-' for whitespace, `(' for open parenthesis, etc. + *Note Syntax Tables::, for a list of syntax codes and the + characters that stand for them. + +`\SCODE' + matches any character whose syntax is not CODE. + + The following regular expression constructs match the empty +string--that is, they don't use up any characters--but whether they +match depends on the context. + +`\`' + matches the empty string, but only at the beginning of the buffer + or string being matched against. 
+ +`\'' + matches the empty string, but only at the end of the buffer or + string being matched against. + +`\=' + matches the empty string, but only at point. (This construct is + not defined when matching against a string.) + +`\b' + matches the empty string, but only at the beginning or end of a + word. Thus, `\bfoo\b' matches any occurrence of `foo' as a + separate word. `\bballs?\b' matches `ball' or `balls' as a + separate word. + +`\B' + matches the empty string, but _not_ at the beginning or end of a + word. + +`\<' + matches the empty string, but only at the beginning of a word. + +`\>' + matches the empty string, but only at the end of a word. + + Not every string is a valid regular expression. For example, a +string with unbalanced square brackets is invalid (with a few +exceptions, such as `[]]'), and so is a string that ends with a single +`\'. If an invalid regular expression is passed to any of the search +functions, an `invalid-regexp' error is signaled. + + - Function: regexp-quote string + This function returns a regular expression string that matches + exactly STRING and nothing else. This allows you to request an + exact string match when calling a function that wants a regular + expression. + + (regexp-quote "^The cat$") + => "\\^The cat\\$" + + One use of `regexp-quote' is to combine an exact string match with + context described as a regular expression. For example, this + searches for the string that is the value of `string', surrounded + by whitespace: + + (re-search-forward + (concat "\\s-" (regexp-quote string) "\\s-"))  -File: lispref.info, Node: Syntax Table Internals, Prev: Standard Syntax Tables, Up: Syntax Tables - -Syntax Table Internals -====================== +File: lispref.info, Node: Regexp Example, Prev: Syntax of Regexps, Up: Regular Expressions - Each element of a syntax table is an integer that encodes the syntax -of one character: the syntax class, possible matching character, and -flags. 
Lisp programs don't usually work with the elements directly; the -Lisp-level syntax table functions usually work with syntax descriptors -(*note Syntax Descriptors::). +Complex Regexp Example +---------------------- - The low 8 bits of each element of a syntax table indicate the syntax -class. + Here is a complicated regexp, used by XEmacs to recognize the end of +a sentence together with any whitespace that follows. It is the value +of the variable `sentence-end'. -Integer - Class + First, we show the regexp as a string in Lisp syntax to distinguish +spaces from tab characters. The string constant begins and ends with a +double-quote. `\"' stands for a double-quote as part of the string, +`\\' for a backslash as part of the string, `\t' for a tab and `\n' for +a newline. -0 - whitespace + "[.?!][]\"')}]*\\($\\| $\\|\t\\| \\)[ \t\n]*" -1 - punctuation + In contrast, if you evaluate the variable `sentence-end', you will +see the following: -2 - word + sentence-end + => + "[.?!][]\"')}]*\\($\\| $\\| \\| \\)[ + ]*" -3 - symbol +In this output, tab and newline appear as themselves. -4 - open parenthesis + This regular expression contains four parts in succession and can be +deciphered as follows: -5 - close parenthesis +`[.?!]' + The first part of the pattern is a character set that matches any + one of three characters: period, question mark, and exclamation + mark. The match must begin with one of these three characters. -6 - expression prefix +`[]\"')}]*' + The second part of the pattern matches any closing braces and + quotation marks, zero or more of them, that may follow the period, + question mark or exclamation mark. The `\"' is Lisp syntax for a + double-quote in a string. The `*' at the end indicates that the + immediately preceding regular expression (a character set, in this + case) may be repeated zero or more times. 
-7 - string quote +`\\($\\| $\\|\t\\| \\)' + The third part of the pattern matches the whitespace that follows + the end of a sentence: the end of a line, or a tab, or two spaces. + The double backslashes mark the parentheses and vertical bars as + regular expression syntax; the parentheses delimit a group and the + vertical bars separate alternatives. The dollar sign is used to + match the end of a line. -8 - paired delimiter +`[ \t\n]*' + Finally, the last part of the pattern matches any additional + whitespace beyond the minimum needed to end a sentence. -9 - escape - -10 - character quote + +File: lispref.info, Node: Regexp Search, Next: POSIX Regexps, Prev: Regular Expressions, Up: Searching and Matching -11 - comment-start +Regular Expression Searching +============================ -12 - comment-end + In XEmacs, you can search for the next match for a regexp either +incrementally or not. Incremental search commands are described in the +`The XEmacs Lisp Reference Manual'. *Note Regular Expression Search: +(xemacs)Regexp Search. Here we describe only the search functions +useful in programs. The principal one is `re-search-forward'. + + - Command: re-search-forward regexp &optional limit noerror count + buffer + This function searches forward in the current buffer for a string + of text that is matched by the regular expression REGEXP. The + function skips over any amount of text that is not matched by + REGEXP, and leaves point at the end of the first match found. It + returns the new value of point. + + If LIMIT is non-`nil' (it must be a position in the current + buffer), then it is the upper bound to the search. No match + extending after that position is accepted. + + What happens when the search fails depends on the value of + NOERROR. If NOERROR is `nil', a `search-failed' error is + signaled. If NOERROR is `t', `re-search-forward' does nothing and + returns `nil'. 
If NOERROR is neither `nil' nor `t', then + `re-search-forward' moves point to LIMIT (or the end of the + buffer) and returns `nil'. + + If COUNT is supplied (it must be a positive number), then the + search is repeated that many times (each time starting at the end + of the previous time's match). If these successive searches + succeed, the function succeeds, moving point and returning its new + value. Otherwise the search fails. + + In the following example, point is initially before the `T'. + Evaluating the search call moves point to the end of that line + (between the `t' of `hat' and the newline). + + ---------- Buffer: foo ---------- + I read "-!-The cat in the hat + comes back" twice. + ---------- Buffer: foo ---------- + + (re-search-forward "[a-z]+" nil t 5) + => 27 + + ---------- Buffer: foo ---------- + I read "The cat in the hat-!- + comes back" twice. + ---------- Buffer: foo ---------- + + - Command: re-search-backward regexp &optional limit noerror count + buffer + This function searches backward in the current buffer for a string + of text that is matched by the regular expression REGEXP, leaving + point at the beginning of the first text found. + + This function is analogous to `re-search-forward', but they are not + simple mirror images. `re-search-forward' finds the match whose + beginning is as close as possible to the starting point. If + `re-search-backward' were a perfect mirror image, it would find the + match whose end is as close as possible. However, in fact it + finds the match whose beginning is as close as possible. The + reason is that matching a regular expression at a given spot + always works from beginning to end, and starts at a specified + beginning position. + + A true mirror-image of `re-search-forward' would require a special + feature for matching regexps from end to beginning. It's not + worth the trouble of implementing that. 
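The NOERROR argument described above makes `re-search-forward' convenient as a loop condition. The following is only a sketch, not part of the manual's own examples: the helper name `my-collect-matches' is hypothetical, `with-temp-buffer' is assumed to be available, and REGEXP is assumed never to match the empty string (otherwise the loop would not advance).

```elisp
;; Hypothetical helper (not an XEmacs built-in): collect every
;; non-overlapping match of REGEXP found in STRING.
;; Assumption: REGEXP never matches the empty string.
(defun my-collect-matches (regexp string)
  "Return a list of the substrings of STRING matched by REGEXP."
  (with-temp-buffer
    (insert string)
    (goto-char (point-min))
    (let ((matches '()))
      ;; LIMIT is nil (search to end of buffer); NOERROR is t, so a
      ;; failed search returns nil instead of signaling `search-failed'.
      (while (re-search-forward regexp nil t)
        ;; Each successful search leaves point at the end of the match,
        ;; so the next iteration resumes from there.
        (setq matches (cons (match-string 0) matches)))
      (nreverse matches))))

(my-collect-matches "[0-9]+" "a1 b22 c333")
;; => ("1" "22" "333")
```

Because the search runs in a temporary buffer, point and the contents of the caller's buffer are left untouched; the match data, however, is still overwritten.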
+ + - Function: string-match regexp string &optional start buffer + This function returns the index of the start of the first match for + the regular expression REGEXP in STRING, or `nil' if there is no + match. If START is non-`nil', the search starts at that index in + STRING. + + Optional arg BUFFER controls how case folding is done (according + to the value of `case-fold-search' in BUFFER and BUFFER's case + tables) and defaults to the current buffer. + + For example, + + (string-match + "quick" "The quick brown fox jumped quickly.") + => 4 + (string-match + "quick" "The quick brown fox jumped quickly." 8) + => 27 + + The index of the first character of the string is 0, the index of + the second character is 1, and so on. + + After this function returns, the index of the first character + beyond the match is available as `(match-end 0)'. *Note Match + Data::. + + (string-match + "quick" "The quick brown fox jumped quickly." 8) + => 27 + + (match-end 0) + => 32 -13 - inherit + - Function: split-string string &optional pattern + This function splits STRING into substrings delimited by PATTERN, + and returns a list of substrings. If PATTERN is omitted, it + defaults to `[ \f\t\n\r\v]+', which means that it splits STRING at + runs of whitespace. - The next 8 bits are the matching opposite parenthesis (if the -character has parenthesis syntax); otherwise, they are not meaningful. -The next 6 bits are the flags. + (split-string "foo bar") + => ("foo" "bar") + + (split-string "something") + => ("something") + + (split-string "a:b:c" ":") + => ("a" "b" "c") + + (split-string ":a::b:c" ":") + => ("" "a" "" "b" "c") + + - Function: split-path path + This function splits a search path into a list of strings. The + path components are separated with the characters specified with + `path-separator'. Under Unix, `path-separator' will normally be + `:', while under Windows, it will be `;'.
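The START argument of `string-match', together with `(match-end 0)', is enough to walk through every match in a string. The sketch below illustrates that idea; the helper name `my-match-positions' is hypothetical and is not defined by XEmacs.

```elisp
;; Hypothetical helper (not an XEmacs built-in): list the index
;; ranges of all non-overlapping matches of REGEXP in STRING.
(defun my-match-positions (regexp string)
  "Return a list of (START . END) index pairs for matches of REGEXP in STRING."
  (let ((start 0)
        (positions '()))
    (while (and (< start (length string))
                (string-match regexp string start))
      (setq positions
            (cons (cons (match-beginning 0) (match-end 0)) positions))
      ;; Resume just past this match.  Advance by at least one
      ;; character so that an empty match cannot loop forever.
      (setq start (max (match-end 0) (1+ (match-beginning 0)))))
    (nreverse positions)))

(my-match-positions "quick" "The quick brown fox jumped quickly.")
;; => ((4 . 9) (27 . 32))
```

The indices agree with the examples above: the first match starts at 4, and the second starts at 27 and ends at 32.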
+ + - Function: looking-at regexp &optional buffer + This function determines whether the text in the current buffer + directly following point matches the regular expression REGEXP. + "Directly following" means precisely that: the search is + "anchored" and it can succeed only starting with the first + character following point. The result is `t' if so, `nil' + otherwise. + + This function does not move point, but it updates the match data, + which you can access using `match-beginning' and `match-end'. + *Note Match Data::. + + In this example, point is located directly before the `T'. If it + were anywhere else, the result would be `nil'. + + ---------- Buffer: foo ---------- + I read "-!-The cat in the hat + comes back" twice. + ---------- Buffer: foo ---------- + + (looking-at "The cat in the hat$") + => t  -File: lispref.info, Node: Abbrevs, Next: Extents, Prev: Syntax Tables, Up: Top - -Abbrevs And Abbrev Expansion -**************************** - - An abbreviation or "abbrev" is a string of characters that may be -expanded to a longer string. The user can insert the abbrev string and -find it replaced automatically with the expansion of the abbrev. This -saves typing. - - The set of abbrevs currently in effect is recorded in an "abbrev -table". Each buffer has a local abbrev table, but normally all buffers -in the same major mode share one abbrev table. There is also a global -abbrev table. Normally both are used. - - An abbrev table is represented as an obarray containing a symbol for -each abbreviation. The symbol's name is the abbreviation; its value is -the expansion; its function definition is the hook function to do the -expansion (*note Defining Abbrevs::); its property list cell contains -the use count, the number of times the abbreviation has been expanded. 
-Because these symbols are not interned in the usual obarray, they will -never appear as the result of reading a Lisp expression; in fact, -normally they are never used except by the code that handles abbrevs. -Therefore, it is safe to use them in an extremely nonstandard way. -*Note Creating Symbols::. - - For the user-level commands for abbrevs, see *Note Abbrev Mode: -(emacs)Abbrevs. - -* Menu: - -* Abbrev Mode:: Setting up XEmacs for abbreviation. -* Tables: Abbrev Tables. Creating and working with abbrev tables. -* Defining Abbrevs:: Specifying abbreviations and their expansions. -* Files: Abbrev Files. Saving abbrevs in files. -* Expansion: Abbrev Expansion. Controlling expansion; expansion subroutines. -* Standard Abbrev Tables:: Abbrev tables used by various major modes. +File: lispref.info, Node: POSIX Regexps, Next: Search and Replace, Prev: Regexp Search, Up: Searching and Matching + +POSIX Regular Expression Searching +================================== + + The usual regular expression functions do backtracking when necessary +to handle the `\|' and repetition constructs, but they continue this +only until they find _some_ match. Then they succeed and report the +first match found. + + This section describes alternative search functions which perform the +full backtracking specified by the POSIX standard for regular expression +matching. They continue backtracking until they have tried all +possibilities and found all matches, so they can report the longest +match, as required by POSIX. This is much slower, so use these +functions only when you really need the longest match. + + In Emacs versions prior to 19.29, these functions did not exist, and +the functions described above implemented full POSIX backtracking. + + - Command: posix-search-forward regexp &optional limit noerror count + buffer + This is like `re-search-forward' except that it performs the full + backtracking specified by the POSIX standard for regular expression + matching. 
+ + - Command: posix-search-backward regexp &optional limit noerror count + buffer + This is like `re-search-backward' except that it performs the full + backtracking specified by the POSIX standard for regular expression + matching. + + - Function: posix-looking-at regexp &optional buffer + This is like `looking-at' except that it performs the full + backtracking specified by the POSIX standard for regular expression + matching. + + - Function: posix-string-match regexp string &optional start buffer + This is like `string-match' except that it performs the full + backtracking specified by the POSIX standard for regular expression + matching. + + Optional arg BUFFER controls how case folding is done (according + to the value of `case-fold-search' in BUFFER and BUFFER's case + tables) and defaults to the current buffer.  -File: lispref.info, Node: Abbrev Mode, Next: Abbrev Tables, Up: Abbrevs - -Setting Up Abbrev Mode -====================== +File: lispref.info, Node: Search and Replace, Next: Match Data, Prev: POSIX Regexps, Up: Searching and Matching - Abbrev mode is a minor mode controlled by the value of the variable -`abbrev-mode'. - - - Variable: abbrev-mode - A non-`nil' value of this variable turns on the automatic expansion - of abbrevs when their abbreviations are inserted into a buffer. - If the value is `nil', abbrevs may be defined, but they are not - expanded automatically. +Search and Replace +================== - This variable automatically becomes local when set in any fashion. + - Function: perform-replace from-string replacements query-flag + regexp-flag delimited-flag &optional repeat-count map + This function is the guts of `query-replace' and related commands. + It searches for occurrences of FROM-STRING and replaces some or + all of them. If QUERY-FLAG is `nil', it replaces all occurrences; + otherwise, it asks the user what to do about each one. 
- - Variable: default-abbrev-mode - This is the value of `abbrev-mode' for buffers that do not - override it. This is the same as `(default-value 'abbrev-mode)'. + If REGEXP-FLAG is non-`nil', then FROM-STRING is considered a + regular expression; otherwise, it must match literally. If + DELIMITED-FLAG is non-`nil', then only occurrences surrounded by + word boundaries are considered. - -File: lispref.info, Node: Abbrev Tables, Next: Defining Abbrevs, Prev: Abbrev Mode, Up: Abbrevs + The argument REPLACEMENTS specifies what to replace occurrences + with. If it is a string, that string is used. It can also be a + list of strings, to be used in cyclic order. -Abbrev Tables -============= + If REPEAT-COUNT is non-`nil', it should be an integer. Then it + specifies how many times to use each of the strings in the + REPLACEMENTS list before advancing cyclically to the next one. - This section describes how to create and manipulate abbrev tables. + Normally, the keymap `query-replace-map' defines the possible user + responses for queries. The argument MAP, if non-`nil', is a + keymap to use instead of `query-replace-map'. - - Function: make-abbrev-table - This function creates and returns a new, empty abbrev table--an - obarray containing no symbols. It is a vector filled with zeros. + - Variable: query-replace-map + This variable holds a special keymap that defines the valid user + responses for `query-replace' and related functions, as well as + `y-or-n-p' and `map-y-or-n-p'. It is unusual in two ways: - - Function: clear-abbrev-table table - This function undefines all the abbrevs in abbrev table TABLE, - leaving it empty. The function returns `nil'. + * The "key bindings" are not commands, just symbols that are + meaningful to the functions that use this map. - - Function: define-abbrev-table tabname definitions - This function defines TABNAME (a symbol) as an abbrev table name, - i.e., as a variable whose value is an abbrev table.
It defines - abbrevs in the table according to DEFINITIONS, a list of elements - of the form `(ABBREVNAME EXPANSION HOOK USECOUNT)'. The value is - always `nil'. + * Prefix keys are not supported; each key binding must be for a + single-event key sequence. This is because the functions + don't use `read-key-sequence' to get the input; instead, they + read a single event and look it up "by hand." - - Variable: abbrev-table-name-list - This is a list of symbols whose values are abbrev tables. - `define-abbrev-table' adds the new abbrev table name to this list. + Here are the meaningful "bindings" for `query-replace-map'. Several +of them are meaningful only for `query-replace' and friends. - - Function: insert-abbrev-table-description name &optional human - This function inserts before point a description of the abbrev - table named NAME. The argument NAME is a symbol whose value is an - abbrev table. The value is always `nil'. +`act' + Do take the action being considered--in other words, "yes." - If HUMAN is non-`nil', the description is human-oriented. - Otherwise the description is a Lisp expression--a call to - `define-abbrev-table' that would define NAME exactly as it is - currently defined. +`skip' + Do not take action for this question--in other words, "no." - -File: lispref.info, Node: Defining Abbrevs, Next: Abbrev Files, Prev: Abbrev Tables, Up: Abbrevs +`exit' + Answer this question "no," and give up on the entire series of + questions, assuming that the answers will be "no." -Defining Abbrevs -================ +`act-and-exit' + Answer this question "yes," and give up on the entire series of + questions, assuming that subsequent answers will be "no." - These functions define an abbrev in a specified abbrev table. -`define-abbrev' is the low-level basic function, while `add-abbrev' is -used by commands that ask for information from the user. +`act-and-show' + Answer this question "yes," but show the results--don't advance yet + to the next question.
- - Function: add-abbrev table type arg - This function adds an abbreviation to abbrev table TABLE based on - information from the user. The argument TYPE is a string - describing in English the kind of abbrev this will be (typically, - `"global"' or `"mode-specific"'); this is used in prompting the - user. The argument ARG is the number of words in the expansion. +`automatic' + Answer this question and all subsequent questions in the series + with "yes," without further user interaction. - The return value is the symbol that internally represents the new - abbrev, or `nil' if the user declines to confirm redefining an - existing abbrev. +`backup' + Move back to the previous place that a question was asked about. - - Function: define-abbrev table name expansion hook - This function defines an abbrev in TABLE named NAME, to expand to - EXPANSION, and call HOOK. The return value is an uninterned - symbol that represents the abbrev inside XEmacs; its name is NAME. +`edit' + Enter a recursive edit to deal with this question--instead of any + other action that would normally be taken. - The argument NAME should be a string. The argument EXPANSION - should be a string, or `nil' to undefine the abbrev. +`delete-and-edit' + Delete the text being considered, then enter a recursive edit to + replace it. - The argument HOOK is a function or `nil'. If HOOK is non-`nil', - then it is called with no arguments after the abbrev is replaced - with EXPANSION; point is located at the end of EXPANSION when HOOK - is called. +`recenter' + Redisplay and center the window, then ask the same question again. - The use count of the abbrev is initialized to zero. +`quit' + Perform a quit right away. Only `y-or-n-p' and related functions + use this answer. - - User Option: only-global-abbrevs - If this variable is non-`nil', it means that the user plans to use - global abbrevs only. This tells the commands that define - mode-specific abbrevs to define global ones instead. 
This - variable does not alter the behavior of the functions in this - section; it is examined by their callers. +`help' + Display some help, then ask again.  -File: lispref.info, Node: Abbrev Files, Next: Abbrev Expansion, Prev: Defining Abbrevs, Up: Abbrevs - -Saving Abbrevs in Files -======================= - - A file of saved abbrev definitions is actually a file of Lisp code. -The abbrevs are saved in the form of a Lisp program to define the same -abbrev tables with the same contents. Therefore, you can load the file -with `load' (*note How Programs Do Loading::). However, the function -`quietly-read-abbrev-file' is provided as a more convenient interface. +File: lispref.info, Node: Match Data, Next: Searching and Case, Prev: Search and Replace, Up: Searching and Matching - User-level facilities such as `save-some-buffers' can save abbrevs -in a file automatically, under the control of variables described here. +The Match Data +============== - - User Option: abbrev-file-name - This is the default file name for reading and saving abbrevs. + XEmacs keeps track of the positions of the start and end of segments +of text found during a regular expression search. This means, for +example, that you can search for a complex pattern, such as a date in +an Rmail message, and then extract parts of the match under control of +the pattern. - - Function: quietly-read-abbrev-file filename - This function reads abbrev definitions from a file named FILENAME, - previously written with `write-abbrev-file'. If FILENAME is - `nil', the file specified in `abbrev-file-name' is used. - `save-abbrevs' is set to `t' so that changes will be saved. + Because the match data normally describe the most recent search only, +you must be careful not to do another search inadvertently between the +search you wish to refer back to and the use of the match data. 
If you +can't avoid another intervening search, you must save and restore the +match data around it, to prevent it from being overwritten. - This function does not display any messages. It returns `nil'. - - - User Option: save-abbrevs - A non-`nil' value for `save-abbrev' means that XEmacs should save - abbrevs when files are saved. `abbrev-file-name' specifies the - file to save the abbrevs in. - - - Variable: abbrevs-changed - This variable is set non-`nil' by defining or altering any - abbrevs. This serves as a flag for various XEmacs commands to - offer to save your abbrevs. +* Menu: - - Command: write-abbrev-file filename - Save all abbrev definitions, in all abbrev tables, in the file - FILENAME, in the form of a Lisp program that when loaded will - define the same abbrevs. This function returns `nil'. +* Simple Match Data:: Accessing single items of match data, + such as where a particular subexpression started. +* Replacing Match:: Replacing a substring that was matched. +* Entire Match Data:: Accessing the entire match data at once, as a list. +* Saving Match Data:: Saving and restoring the match data.  -File: lispref.info, Node: Abbrev Expansion, Next: Standard Abbrev Tables, Prev: Abbrev Files, Up: Abbrevs - -Looking Up and Expanding Abbreviations -====================================== - - Abbrevs are usually expanded by commands for interactive use, -including `self-insert-command'. This section describes the -subroutines used in writing such functions, as well as the variables -they use for communication. - - - Function: abbrev-symbol abbrev &optional table - This function returns the symbol representing the abbrev named - ABBREV. The value returned is `nil' if that abbrev is not - defined. The optional second argument TABLE is the abbrev table - to look it up in. If TABLE is `nil', this function tries first - the current buffer's local abbrev table, and second the global - abbrev table. 
- - - Function: abbrev-expansion abbrev &optional table - This function returns the string that ABBREV would expand into (as - defined by the abbrev tables used for the current buffer). The - optional argument TABLE specifies the abbrev table to use, as in - `abbrev-symbol'. - - - Command: expand-abbrev - This command expands the abbrev before point, if any. If point - does not follow an abbrev, this command does nothing. The command - returns `t' if it did expansion, `nil' otherwise. - - - Command: abbrev-prefix-mark &optional arg - Mark current point as the beginning of an abbrev. The next call to - `expand-abbrev' will use the text from here to point (where it is - then) as the abbrev to expand, rather than using the previous word - as usual. - - - User Option: abbrev-all-caps - When this is set non-`nil', an abbrev entered entirely in upper - case is expanded using all upper case. Otherwise, an abbrev - entered entirely in upper case is expanded by capitalizing each - word of the expansion. - - - Variable: abbrev-start-location - This is the buffer position for `expand-abbrev' to use as the start - of the next abbrev to be expanded. (`nil' means use the word - before point instead.) `abbrev-start-location' is set to `nil' - each time `expand-abbrev' is called. This variable is also set by - `abbrev-prefix-mark'. - - - Variable: abbrev-start-location-buffer - The value of this variable is the buffer for which - `abbrev-start-location' has been set. Trying to expand an abbrev - in any other buffer clears `abbrev-start-location'. This variable - is set by `abbrev-prefix-mark'. - - - Variable: last-abbrev - This is the `abbrev-symbol' of the last abbrev expanded. This - information is left by `expand-abbrev' for the sake of the - `unexpand-abbrev' command. - - - Variable: last-abbrev-location - This is the location of the last abbrev expanded. This contains - information left by `expand-abbrev' for the sake of the - `unexpand-abbrev' command. 
- - - Variable: last-abbrev-text - This is the exact expansion text of the last abbrev expanded, - after case conversion (if any). Its value is `nil' if the abbrev - has already been unexpanded. This contains information left by - `expand-abbrev' for the sake of the `unexpand-abbrev' command. - - - Variable: pre-abbrev-expand-hook - This is a normal hook whose functions are executed, in sequence, - just before any expansion of an abbrev. *Note Hooks::. Since it - is a normal hook, the hook functions receive no arguments. - However, they can find the abbrev to be expanded by looking in the - buffer before point. - - The following sample code shows a simple use of -`pre-abbrev-expand-hook'. If the user terminates an abbrev with a -punctuation character, the hook function asks for confirmation. Thus, -this hook allows the user to decide whether to expand the abbrev, and -aborts expansion if it is not confirmed. - - (add-hook 'pre-abbrev-expand-hook 'query-if-not-space) +File: lispref.info, Node: Simple Match Data, Next: Replacing Match, Up: Match Data + +Simple Match Data Access +------------------------ + + This section explains how to use the match data to find out what was +matched by the last search or match operation. + + You can ask about the entire matching text, or about a particular +parenthetical subexpression of a regular expression. The COUNT +argument in the functions below specifies which. If COUNT is zero, you +are asking about the entire match. If COUNT is positive, it specifies +which subexpression you want. + + Recall that the subexpressions of a regular expression are those +expressions grouped with escaped parentheses, `\(...\)'. The COUNTth +subexpression is found by counting occurrences of `\(' from the +beginning of the whole regular expression. The first subexpression is +numbered 1, the second 2, and so on. 
Only regular expressions can have +subexpressions--after a simple string search, the only information +available is about the entire match. + + - Function: match-string count &optional in-string + This function returns, as a string, the text matched in the last + search or match operation. It returns the entire text if COUNT is + zero, or just the portion corresponding to the COUNTth + parenthetical subexpression, if COUNT is positive. If COUNT is + out of range, or if that subexpression didn't match anything, the + value is `nil'. + + If the last such operation was done against a string with + `string-match', then you should pass the same string as the + argument IN-STRING. Otherwise, after a buffer search or match, + you should omit IN-STRING or pass `nil' for it; but you should + make sure that the current buffer when you call `match-string' is + the one in which you did the searching or matching. + + - Function: match-beginning count + This function returns the position of the start of text matched by + the last regular expression searched for, or a subexpression of it. + + If COUNT is zero, then the value is the position of the start of + the entire match. Otherwise, COUNT specifies a subexpression in + the regular expression, and the value of the function is the + starting position of the match for that subexpression. + + The value is `nil' for a subexpression inside a `\|' alternative + that wasn't used in the match. + + - Function: match-end count + This function is like `match-beginning' except that it returns the + position of the end of the match, rather than the position of the + beginning. + + Here is an example of using the match data, with a comment showing +the positions within the text: + + (string-match "\\(qu\\)\\(ick\\)" + "The quick fox jumped quickly.") + ;0123456789 + => 4 - ;; This is the function invoked by `pre-abbrev-expand-hook'. 
+ (match-string 0 "The quick fox jumped quickly.") + => "quick" + (match-string 1 "The quick fox jumped quickly.") + => "qu" + (match-string 2 "The quick fox jumped quickly.") + => "ick" - ;; If the user terminated the abbrev with a space, the function does - ;; nothing (that is, it returns so that the abbrev can expand). If the - ;; user entered some other character, this function asks whether - ;; expansion should continue. + (match-beginning 1) ; The beginning of the match + => 4 ; with `qu' is at index 4. - ;; If the user answers the prompt with `y', the function returns - ;; `nil' (because of the `not' function), but that is - ;; acceptable; the return value has no effect on expansion. + (match-beginning 2) ; The beginning of the match + => 6 ; with `ick' is at index 6. - (defun query-if-not-space () - (if (/= ?\ (preceding-char)) - (if (not (y-or-n-p "Do you want to expand this abbrev? ")) - (error "Not expanding this abbrev")))) - - -File: lispref.info, Node: Standard Abbrev Tables, Prev: Abbrev Expansion, Up: Abbrevs - -Standard Abbrev Tables -====================== - - Here we list the variables that hold the abbrev tables for the -preloaded major modes of XEmacs. - - - Variable: global-abbrev-table - This is the abbrev table for mode-independent abbrevs. The abbrevs - defined in it apply to all buffers. Each buffer may also have a - local abbrev table, whose abbrev definitions take precedence over - those in the global table. - - - Variable: local-abbrev-table - The value of this buffer-local variable is the (mode-specific) - abbreviation table of the current buffer. - - - Variable: fundamental-mode-abbrev-table - This is the local abbrev table used in Fundamental mode; in other - words, it is the local abbrev table in all buffers in Fundamental - mode. - - - Variable: text-mode-abbrev-table - This is the local abbrev table used in Text mode. - - - Variable: c-mode-abbrev-table - This is the local abbrev table used in C mode. 
- - - Variable: lisp-mode-abbrev-table - This is the local abbrev table used in Lisp mode and Emacs Lisp - mode. + (match-end 1) ; The end of the match + => 6 ; with `qu' is at index 6. + + (match-end 2) ; The end of the match + => 9 ; with `ick' is at index 9. + + Here is another example. Point is initially located at the beginning +of the line. Searching moves point to between the space and the word +`in'. The beginning of the entire match is at the 9th character of the +buffer (`T'), and the beginning of the match for the first +subexpression is at the 13th character (`c'). + + (list + (re-search-forward "The \\(cat \\)") + (match-beginning 0) + (match-beginning 1)) + => (9 9 13) + + ---------- Buffer: foo ---------- + I read "The cat -!-in the hat comes back" twice. + ^ ^ + 9 13 + ---------- Buffer: foo ---------- + +(In this case, the index returned is a buffer position; the first +character of the buffer counts as 1.)
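As a further illustration of extracting parts of a match "under control of the pattern," here is a small sketch that pulls the fields out of a date. The helper name `my-parse-iso-date' and the YYYY-MM-DD layout are assumptions made for this example only, and `when' and `string-to-number' are assumed to be available.

```elisp
;; Hypothetical helper (not an XEmacs built-in): extract the numeric
;; fields of the first YYYY-MM-DD date found in STRING.
(defun my-parse-iso-date (string)
  "Return (YEAR MONTH DAY) as integers, or nil if STRING holds no date."
  (when (string-match
         "\\([0-9][0-9][0-9][0-9]\\)-\\([0-9][0-9]\\)-\\([0-9][0-9]\\)"
         string)
    ;; The match was made against a string, so the same string is
    ;; passed to `match-string' as its IN-STRING argument.
    (list (string-to-number (match-string 1 string))   ; first  \( ... \)
          (string-to-number (match-string 2 string))   ; second \( ... \)
          (string-to-number (match-string 3 string))))) ; third  \( ... \)

(my-parse-iso-date "Released on 2001-04-12.")
;; => (2001 4 12)
```

All three calls to `match-string' happen immediately after the `string-match', with no intervening search that could overwrite the match data.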