This is ../info/lispref.info, produced by makeinfo version 4.0 from
lispref/lispref.texi.

INFO-DIR-SECTION XEmacs Editor
START-INFO-DIR-ENTRY
* Lispref: (lispref).		XEmacs Lisp Reference Manual.
END-INFO-DIR-ENTRY

   Edition History:

   GNU Emacs Lisp Reference Manual Second Edition (v2.01), May 1993
   GNU Emacs Lisp Reference Manual Further Revised (v2.02), August 1993
   Lucid Emacs Lisp Reference Manual (for 19.10) First Edition, March 1994
   XEmacs Lisp Programmer's Manual (for 19.12) Second Edition, April 1995
   GNU Emacs Lisp Reference Manual v2.4, June 1995
   XEmacs Lisp Programmer's Manual (for 19.13) Third Edition, July 1995
   XEmacs Lisp Reference Manual (for 19.14 and 20.0) v3.1, March 1996
   XEmacs Lisp Reference Manual (for 19.15 and 20.1, 20.2, 20.3) v3.2,
   April, May, November 1997
   XEmacs Lisp Reference Manual (for 21.0) v3.3, April 1998

   Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995 Free Software
Foundation, Inc.  Copyright (C) 1994, 1995 Sun Microsystems, Inc.
Copyright (C) 1995, 1996 Ben Wing.

   Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided also
that the section entitled "GNU General Public License" is included
exactly as in the original, and provided that the entire resulting
derived work is distributed under the terms of a permission notice
identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that the section entitled "GNU General Public License"
may be included in a translation approved by the Free Software
Foundation instead of in the original English.


File: lispref.info,  Node: Examining Properties,  Next: Changing Properties,  Up: Text Properties

Examining Text Properties
-------------------------

   The simplest way to examine text properties is to ask for the value
of a particular property of a particular character.  For that, use
`get-text-property'.  Use `text-properties-at' to get the entire
property list of a character.  *Note Property Search::, for functions
to examine the properties of a number of characters at once.

   These functions handle both strings and buffers.  (Keep in mind that
positions in a string start from 0, whereas positions in a buffer start
from 1.)

 - Function: get-text-property pos prop &optional object
     This function returns the value of the PROP property of the
     character after position POS in OBJECT (a buffer or string).  The
     argument OBJECT is optional and defaults to the current buffer.

 - Function: get-char-property pos prop &optional object
     This function is like `get-text-property', except that it checks
     all extents, not just text-property extents.


 - Function: text-properties-at position &optional object
     This function returns the entire property list of the character at
     POSITION in the string or buffer OBJECT.  If OBJECT is `nil', it
     defaults to the current buffer.

 - Variable: default-text-properties
     This variable holds a property list giving default values for text
     properties.  Whenever a character does not specify a value for a
     property, the value stored in this list is used instead.  Here is
     an example:

          (setq default-text-properties '(foo 69))
          ;; Make sure character 1 has no properties of its own.
          (set-text-properties 1 2 nil)
          ;; What we get, when we ask, is the default value.
          (get-text-property 1 'foo)
               => 69
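
   As a further illustration, the following sketch attaches a property
to part of a string and then examines it.  The property name `my-tag'
is made up for this example, and the string is copied first so that the
literal itself is not modified.  The positions are zero-based because
OBJECT is a string.

     (let ((str (copy-sequence "hello")))
       ;; Give the first two characters a `my-tag' property.
       (put-text-property 0 2 'my-tag 'greeting str)
       ;; Position 0 has the property; position 3 does not.
       (list (get-text-property 0 'my-tag str)
             (get-text-property 3 'my-tag str)))
          => (greeting nil)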


File: lispref.info,  Node: Changing Properties,  Next: Property Search,  Prev: Examining Properties,  Up: Text Properties

Changing Text Properties
------------------------

   The primitives for changing properties apply to a specified range of
text.  The function `set-text-properties' (see end of section) sets the
entire property list of the text in that range; more often, it is
useful to add, change, or delete just certain properties specified by
name.

   Since text properties are considered part of the buffer's contents,
and can affect how the buffer looks on the screen, any change in the
text properties is considered a buffer modification.  Buffer text
property changes are undoable (*note Undo::).

 - Function: put-text-property start end prop value &optional object
     This function sets the PROP property to VALUE for the text between
     START and END in the string or buffer OBJECT.  If OBJECT is `nil',
     it defaults to the current buffer.

 - Function: add-text-properties start end props &optional object
     This function modifies the text properties for the text between
     START and END in the string or buffer OBJECT.  If OBJECT is `nil',
     it defaults to the current buffer.

     The argument PROPS specifies which properties to change.  It
     should have the form of a property list (*note Property Lists::):
     a list whose elements are property names alternating with the
     corresponding values.

     The return value is `t' if the function actually changed some
     property's value; `nil' otherwise (if PROPS is `nil' or its values
     agree with those in the text).

     For example, here is how to set the `comment' and `face'
     properties of a range of text:

          (add-text-properties START END
                               '(comment t face highlight))

 - Function: remove-text-properties start end props &optional object
     This function deletes specified text properties from the text
     between START and END in the string or buffer OBJECT.  If OBJECT
     is `nil', it defaults to the current buffer.

     The argument PROPS specifies which properties to delete.  It
     should have the form of a property list (*note Property Lists::):
     a list whose elements are property names alternating with
     corresponding values.  But only the names matter--the values that
     accompany them are ignored.  For example, here's how to remove the
     `face' property:

          (remove-text-properties START END '(face nil))

     The return value is `t' if the function actually changed some
     property's value; `nil' otherwise (if PROPS is `nil' or if no
     character in the specified text had any of those properties).

 - Function: set-text-properties start end props &optional object
     This function completely replaces the text property list for the
     text between START and END in the string or buffer OBJECT.  If
     OBJECT is `nil', it defaults to the current buffer.

     The argument PROPS is the new property list.  It should be a list
     whose elements are property names alternating with corresponding
     values.

     After `set-text-properties' returns, all the characters in the
     specified range have identical properties.

     If PROPS is `nil', the effect is to get rid of all properties from
     the specified range of text.  Here's an example:

          (set-text-properties START END nil)

   See also the function `buffer-substring-without-properties' (*note
Buffer Contents::) which copies text from the buffer but does not copy
its properties.
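
   The return values described above can be observed with a short
sketch such as the following, which operates on a copy of a string.
The property name `my-note' is hypothetical.

     (let ((str (copy-sequence "abcdef")))
       (list
        ;; The first call changes something, so it returns `t'.
        (add-text-properties 0 3 '(my-note t) str)
        ;; The same values again: nothing changes, so `nil'.
        (add-text-properties 0 3 '(my-note t) str)
        ;; The value in PROPS is ignored; only the name matters.
        (remove-text-properties 0 6 '(my-note nil) str)
        ;; The property is gone afterwards.
        (get-text-property 0 'my-note str)))
          => (t nil t nil)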


File: lispref.info,  Node: Property Search,  Next: Special Properties,  Prev: Changing Properties,  Up: Text Properties

Property Search Functions
-------------------------

   In typical use of text properties, most of the time several or many
consecutive characters have the same value for a property.  Rather than
writing your programs to examine characters one by one, it is much
faster to process chunks of text that have the same property value.

   Here are functions you can use to do this.  They use `eq' for
comparing property values.  In all cases, OBJECT defaults to the
current buffer.

   For high performance, it's very important to use the LIMIT argument
to these functions, especially the ones that search for a single
property--otherwise, they may spend a long time scanning to the end of
the buffer, if the property you are interested in does not change.

   Remember that a position is always between two characters; the
position returned by these functions is between two characters with
different properties.

 - Function: next-property-change pos &optional object limit
     The function scans the text forward from position POS in the
     string or buffer OBJECT till it finds a change in some text
     property, then returns the position of the change.  In other
     words, it returns the position of the first character beyond POS
     whose properties are not identical to those of the character just
     after POS.

     If LIMIT is non-`nil', then the scan ends at position LIMIT.  If
     there is no property change before that point,
     `next-property-change' returns LIMIT.

     The value is `nil' if the properties remain unchanged all the way
     to the end of OBJECT and LIMIT is `nil'.  If the value is
     non-`nil', it is a position greater than or equal to POS.  The
     value equals POS only when LIMIT equals POS.

     Here is an example of how to scan the buffer by chunks of text
     within which all properties are constant:

          (while (not (eobp))
            (let ((plist (text-properties-at (point)))
                  (next-change
                   (or (next-property-change (point) (current-buffer))
                       (point-max))))
              ;; Process text from point to NEXT-CHANGE...
              (goto-char next-change)))

 - Function: next-single-property-change pos prop &optional object limit
     The function scans the text forward from position POS in the
     string or buffer OBJECT till it finds a change in the PROP
     property, then returns the position of the change.  In other
     words, it returns the position of the first character beyond POS
     whose PROP property differs from that of the character just after
     POS.

     If LIMIT is non-`nil', then the scan ends at position LIMIT.  If
     there is no property change before that point,
     `next-single-property-change' returns LIMIT.

     The value is `nil' if the property remains unchanged all the way to
     the end of OBJECT and LIMIT is `nil'.  If the value is non-`nil',
     it is a position greater than or equal to POS; it equals POS only
     if LIMIT equals POS.
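
     The following sketch shows one way to use the LIMIT argument so
     that the result is never `nil'.  It collects the runs of a single
     property in a string.  The function name `my-property-runs' and
     the property `my-tag' are invented for this example.

          (defun my-property-runs (string prop)
            "Return a list of (START END VALUE) runs of PROP in STRING."
            (let ((pos 0)
                  (end (length string))
                  (runs nil))
              (while (< pos end)
                ;; With LIMIT equal to END, the result is always a
                ;; position greater than POS, so the loop terminates.
                (let ((next (next-single-property-change
                             pos prop string end)))
                  (setq runs (cons (list pos next
                                         (get-text-property pos prop string))
                                   runs))
                  (setq pos next)))
              (nreverse runs)))

          (let ((str (copy-sequence "abcdef")))
            (put-text-property 2 4 'my-tag 'x str)
            (my-property-runs str 'my-tag))
               => ((0 2 nil) (2 4 x) (4 6 nil))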

 - Function: previous-property-change pos &optional object limit
     This is like `next-property-change', but scans back from POS
     instead of forward.  If the value is non-`nil', it is a position
     less than or equal to POS; it equals POS only if LIMIT equals POS.

 - Function: previous-single-property-change pos prop &optional object
          limit
     This is like `next-single-property-change', but scans back from
     POS instead of forward.  If the value is non-`nil', it is a
     position less than or equal to POS; it equals POS only if LIMIT
     equals POS.

 - Function: text-property-any start end prop value &optional object
     This function returns non-`nil' if at least one character between
     START and END has a property PROP whose value is VALUE.  More
     precisely, it returns the position of the first such character.
     Otherwise, it returns `nil'.

     The optional fifth argument, OBJECT, specifies the string or
     buffer to scan.  Positions are relative to OBJECT.  The default
     for OBJECT is the current buffer.

 - Function: text-property-not-all start end prop value &optional object
     This function returns non-`nil' if at least one character between
     START and END has a property PROP whose value differs from VALUE.
     More precisely, it returns the position of the first such
     character.  Otherwise, it returns `nil'.

     The optional fifth argument, OBJECT, specifies the string or
     buffer to scan.  Positions are relative to OBJECT.  The default
     for OBJECT is the current buffer.
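
   As a small sketch of the two functions above, the following marks
the middle of a string with a made-up property `my-tag' and then asks
where the marked text begins and where it ends.

     (let ((str (copy-sequence "abcdef")))
       (put-text-property 2 4 'my-tag 'x str)
       (list
        ;; First position whose `my-tag' is `x'.
        (text-property-any 0 6 'my-tag 'x str)
        ;; First position at or after 2 whose `my-tag' is not `x'.
        (text-property-not-all 2 6 'my-tag 'x str)
        ;; No character in this range has the value.
        (text-property-any 0 2 'my-tag 'x str)))
          => (2 4 nil)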


File: lispref.info,  Node: Special Properties,  Next: Saving Properties,  Prev: Property Search,  Up: Text Properties

Properties with Special Meanings
--------------------------------

   The predefined properties are the same as those for extents.  *Note
Extent Properties::.


File: lispref.info,  Node: Saving Properties,  Prev: Special Properties,  Up: Text Properties

Saving Text Properties in Files
-------------------------------

   You can save text properties in files, and restore text properties
when inserting the files, using these two hooks:

 - Variable: write-region-annotate-functions
     This variable's value is a list of functions for `write-region' to
     run to encode text properties in some fashion as annotations to
     the text being written in the file.  *Note Writing to Files::.

     Each function in the list is called with two arguments: the start
     and end of the region to be written.  These functions should not
     alter the contents of the buffer.  Instead, they should return
     lists indicating annotations to write in the file in addition to
     the text in the buffer.

     Each function should return a list of elements of the form
     `(POSITION . STRING)', where POSITION is an integer specifying the
     relative position in the text to be written, and STRING is the
     annotation to add there.

     Each list returned by one of these functions must be already
     sorted in increasing order by POSITION.  If there is more than one
     function, `write-region' merges the lists destructively into one
     sorted list.

     When `write-region' actually writes the text from the buffer to the
     file, it intermixes the specified annotations at the corresponding
     positions.  All this takes place without modifying the buffer.
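
     A function suitable for this list might look like the following
     sketch.  The function name `my-annotate-bold', the property
     `my-bold', and the `<b>' markers are all invented for
     illustration, and the sketch assumes that each POSITION is given
     as a position in the buffer being written.  It returns the
     annotations already sorted by POSITION and does not alter the
     buffer.

          (defun my-annotate-bold (start end)
            "Return annotations around text whose `my-bold' is non-nil."
            (let ((pos start)
                  (annotations nil))
              (while (< pos end)
                (let ((next (next-single-property-change
                             pos 'my-bold nil end)))
                  (if (get-text-property pos 'my-bold)
                      ;; Pushed in reverse; `nreverse' restores the order.
                      (setq annotations
                            (cons (cons next "</b>")
                                  (cons (cons pos "<b>") annotations))))
                  (setq pos next)))
              (nreverse annotations)))

          (with-temp-buffer
            (insert "plain bold plain")
            (put-text-property 7 11 'my-bold t)
            (my-annotate-bold (point-min) (point-max)))
               => ((7 . "<b>") (11 . "</b>"))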

 - Variable: after-insert-file-functions
     This variable holds a list of functions for `insert-file-contents'
     to call after inserting a file's contents.  These functions should
     scan the inserted text for annotations, and convert them to the
     text properties they stand for.

     Each function receives one argument, the length of the inserted
     text; point indicates the start of that text.  The function should
     scan that text for annotations, delete them, and create the text
     properties that the annotations specify.  The function should
     return the updated length of the inserted text, as it stands after
     those changes.  The value returned by one function becomes the
     argument to the next function.

     These functions should always return with point at the beginning of
     the inserted text.

     The intended use of `after-insert-file-functions' is for converting
     some sort of textual annotations into actual text properties.  But
     other uses may be possible.

   We invite users to write Lisp programs to store and retrieve text
properties in files, using these hooks, and thus to experiment with
various data formats and find good ones.  Eventually we hope users will
produce good, general extensions we can install in Emacs.

   We suggest not trying to handle arbitrary Lisp objects as property
names or property values--because a program that general is probably
difficult to write, and slow.  Instead, choose a set of possible data
types that are reasonably flexible, and not too hard to encode.

   *Note Format Conversion::, for a related feature.


File: lispref.info,  Node: Substitution,  Next: Registers,  Prev: Text Properties,  Up: Text

Substituting for a Character Code
=================================

   The following functions replace characters within a specified region
based on their character codes.

 - Function: subst-char-in-region start end old-char new-char &optional
          noundo
     This function replaces all occurrences of the character OLD-CHAR
     with the character NEW-CHAR in the region of the current buffer
     defined by START and END.

     If NOUNDO is non-`nil', then `subst-char-in-region' does not
     record the change for undo and does not mark the buffer as
     modified.  This feature is used for controlling selective display
     (*note Selective Display::).

     `subst-char-in-region' does not move point and returns `nil'.

          ---------- Buffer: foo ----------
          This is the contents of the buffer before.
          ---------- Buffer: foo ----------
          
          (subst-char-in-region 1 20 ?i ?X)
               => nil
          
          ---------- Buffer: foo ----------
          ThXs Xs the contents of the buffer before.
          ---------- Buffer: foo ----------

 - Function: translate-region start end table
     This function applies a translation table to the characters in the
     buffer between positions START and END.  The translation table
     TABLE can be a string, a vector, or a char-table.

     If TABLE is a string, its Nth element is the mapping for the
     character with code N.

     If TABLE is a vector, its Nth element is the mapping for the
     character with code N.  Legal mappings are characters, strings, or
     `nil' (meaning don't replace).

     If TABLE is a char-table, its elements describe the mapping
     between characters and their replacements.  The char-table should
     be of type `char' or `generic'.

     When TABLE is a string or vector and its length is less than
     the total number of characters (256 without Mule), any characters
     with codes greater than or equal to the length of TABLE are not
     altered by the translation.

     The return value of `translate-region' is the number of characters
     that were actually changed by the translation.  This does not
     count characters that were mapped into themselves in the
     translation table.

     *NOTE*: Prior to XEmacs 21.2, the TABLE argument was allowed only
     to be a string.  This is still the case in FSF Emacs.

     The following example creates a char-table that is passed to
     `translate-region', which translates character `a' to `the letter
     a', removes character `b', and translates character `c' to newline.

          ---------- Buffer: foo ----------
          Here is a sentence in the buffer.
          ---------- Buffer: foo ----------
          
          (let ((table (make-char-table 'generic)))
            (put-char-table ?a "the letter a" table)
            (put-char-table ?b "" table)
            (put-char-table ?c ?\n table)
            (translate-region (point-min) (point-max) table))
               => 3
          
          ---------- Buffer: foo ----------
          Here is the letter a senten
          e in the uffer.
          ---------- Buffer: foo ----------


File: lispref.info,  Node: Registers,  Next: Transposition,  Prev: Substitution,  Up: Text

Registers
=========

   A register is a sort of variable used in XEmacs editing that can
hold a marker, a string, a rectangle, a window configuration (of one
frame), or a frame configuration (of all frames).  Each register is
named by a single character.  All characters, including control and
meta characters (but with the exception of `C-g'), can be used to name
registers.  Thus, there are 255 possible registers.  A register is
designated in Emacs Lisp by a character that is its name.

   The functions in this section return unpredictable values unless
otherwise stated.

 - Variable: register-alist
     This variable is an alist of elements of the form `(NAME .
     CONTENTS)'.  Normally, there is one element for each XEmacs
     register that has been used.

     The object NAME is a character (an integer) identifying the
     register.  The object CONTENTS is a string, marker, or list
     representing the register contents.  A string represents text
     stored in the register.  A marker represents a position.  A list
     represents a rectangle; its elements are strings, one per line of
     the rectangle.

 - Function: get-register reg
     This function returns the contents of the register REG, or `nil'
     if it has no contents.

 - Function: set-register reg value
     This function sets the contents of register REG to VALUE.  A
     register can be set to any value, but the other register functions
     expect only certain data types.  The return value is VALUE.

 - Command: view-register reg
     This command displays what is contained in register REG.

 - Command: insert-register reg &optional beforep
     This command inserts the contents of register REG into the current
     buffer.

     Normally, this command puts point before the inserted text, and the
     mark after it.  However, if the optional second argument BEFOREP
     is non-`nil', it puts the mark before and point after.  You can
     pass a non-`nil' second argument BEFOREP to this function
     interactively by supplying any prefix argument.

     If the register contains a rectangle, then the rectangle is
     inserted with its upper left corner at point.  This means that
     text is inserted in the current line and underneath it on
     successive lines.

     If the register contains something other than saved text (a
     string) or a rectangle (a list), currently useless things happen.
     This may be changed in the future.
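
   A minimal sketch of storing and retrieving text through a register
from Lisp follows.  It assumes that register `a' may be overwritten;
`with-temp-buffer' is used only so that the example does not touch a
real buffer.

     (set-register ?a "some text")
          => "some text"

     (get-register ?a)
          => "some text"

     (with-temp-buffer
       (insert-register ?a)
       (buffer-string))
          => "some text"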


File: lispref.info,  Node: Transposition,  Next: Change Hooks,  Prev: Registers,  Up: Text

Transposition of Text
=====================

   This subroutine is used by the transposition commands.

 - Function: transpose-regions start1 end1 start2 end2 &optional
          leave-markers
     This function exchanges two nonoverlapping portions of the buffer.
     Arguments START1 and END1 specify the bounds of one portion and
     arguments START2 and END2 specify the bounds of the other portion.

     Normally, `transpose-regions' relocates markers with the transposed
     text; a marker previously positioned within one of the two
     transposed portions moves along with that portion, thus remaining
     between the same two characters in their new position.  However,
     if LEAVE-MARKERS is non-`nil', `transpose-regions' does not do
     this--it leaves all markers unrelocated.
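
     As a small sketch, the following exchanges the two words in a
     scratch buffer.  The space between them lies outside both
     portions and stays where it is.

          (with-temp-buffer
            (insert "hello world")
            ;; "hello" occupies 1..6 and "world" occupies 7..12.
            (transpose-regions 1 6 7 12)
            (buffer-string))
               => "world hello"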


File: lispref.info,  Node: Change Hooks,  Next: Transformations,  Prev: Transposition,  Up: Text

Change Hooks
============

   These hook variables let you arrange to take notice of all changes in
all buffers (or in a particular buffer, if you make them buffer-local).

   The functions you use in these hooks should save and restore the
match data if they do anything that uses regular expressions;
otherwise, they will interfere in bizarre ways with the editing
operations that call them.

   Buffer changes made while executing the following hooks don't
themselves cause any change hooks to be invoked.

 - Variable: before-change-functions
     This variable holds a list of functions to call before any buffer
     modification.  Each function gets two arguments, the beginning and
     end of the region that is about to change, represented as
     integers.  The buffer that is about to change is always the
     current buffer.

 - Variable: after-change-functions
     This variable holds a list of functions to call after any buffer
     modification.  Each function receives three arguments: the
     beginning and end of the region just changed, and the length of
     the text that existed before the change.  (To get the current
     length, subtract the region beginning from the region end.)  All
     three arguments are integers.  The buffer that has just changed
     is always the current buffer.

 - Variable: before-change-function
     This obsolete variable holds one function to call before any buffer
     modification (or `nil' for no function).  It is called just like
     the functions in `before-change-functions'.

 - Variable: after-change-function
     This obsolete variable holds one function to call after any buffer
     modification (or `nil' for no function).  It is called just like
     the functions in `after-change-functions'.

 - Variable: first-change-hook
     This variable is a normal hook that is run whenever a buffer is
     changed that was previously in the unmodified state.
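
   The following sketch shows one way to write a function for
`after-change-functions' that follows the advice above about match
data.  The names `my-change-log' and `my-record-change' are invented
for this example, and the hook variable is bound with `let' only to
keep the effect temporary.

     (defvar my-change-log nil
       "List of (BEG END OLD-LENGTH HAS-DIGIT) entries, newest first.")

     (defun my-record-change (beg end old-length)
       "Record one change while protecting the caller's match data."
       (save-match-data
         (setq my-change-log
               (cons (list beg end old-length
                           ;; Uses a regexp, hence `save-match-data'.
                           (and (string-match
                                 "[0-9]" (buffer-substring beg end))
                                t))
                     my-change-log))))

     (with-temp-buffer
       (setq my-change-log nil)
       (let ((after-change-functions '(my-record-change)))
         (insert "abc1")
         (delete-region 1 2))
       my-change-log)
          => ((1 1 1 nil) (1 5 0 t))

   The insertion is reported with an old length of 0, and the deletion
with an empty region and an old length of 1.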


File: lispref.info,  Node: Transformations,  Prev: Change Hooks,  Up: Text

Textual transformations--MD5 and base64 support
===============================================

   Some textual operations inherently require examining each character
in turn, and performing arithmetic operations on them.  Such operations
can, of course, be implemented in Emacs Lisp, but tend to be very slow
for large portions of text or data.  This is why some of them are
implemented in C, with an appropriate interface for Lisp programmers.
Examples of algorithms thus provided are MD5 and base64 support.

   MD5 is an algorithm for calculating message digests, as described in
RFC 1321.  Given a message of arbitrary length, MD5 produces a 128-bit
"fingerprint" ("message digest") corresponding to that message.  It is
intended to be computationally infeasible to produce two messages having
the same MD5 digest, or to produce a message having a prespecified
target digest, although practical collision attacks on MD5 are now
known.  MD5 is used heavily by various authentication schemes.

   The Emacs Lisp interface to MD5 consists of a single function, `md5':

 - Function: md5 object &optional start end
     This function returns the MD5 message digest of OBJECT, a buffer
     or string.

     Optional arguments START and END denote positions for computing
     the digest of a portion of OBJECT.

     Some examples of usage:

          ;; Calculate the digest of the entire buffer
          (md5 (current-buffer))
               => "8842b04362899b1cda8d2d126dc11712"
          
          ;; Calculate the digest of the current line
          (md5 (current-buffer) (point-at-bol) (point-at-eol))
               => "60614d21e9dee27dfdb01fa4e30d6d00"
          
          ;; Calculate the digest of your name and email address
          (md5 (format "%s <%s>" (user-full-name) user-mail-address))
               => "0a2188c40fd38922d941fe6032fce516"

   Base64 is a portable encoding for arbitrary sequences of octets, in a
form that need not be readable by humans.  It uses a 65-character subset
of US-ASCII, as described in RFC 2045.  Base64 is used by MIME to encode
binary bodies, and to encode binary characters in message headers.

   The Lisp interface to base64 consists of four functions:

 - Function: base64-encode-region beg end &optional no-line-break
     This function encodes the region between BEG and END of the
     current buffer to base64 format.  This means that the original
     region is deleted, and replaced with its base64 equivalent.

     Normally, encoded base64 output is multi-line, with 76-character
     lines.  If NO-LINE-BREAK is non-`nil', newlines will not be
     inserted, resulting in single-line output.

     Mule note: you should make sure that you convert multibyte
     characters (those that do not fit into the 0-255 range) to
     something else, because they cannot be meaningfully converted to
     base64.  If `base64-encode-region' encounters such characters, it
     will signal an error.

     `base64-encode-region' returns the length of the encoded text.

          ;; Encode the whole buffer in base64
          (base64-encode-region (point-min) (point-max))

     The function can also be used interactively, in which case it
     works on the currently active region.

 - Function: base64-encode-string string
     This function encodes STRING to base64, and returns the encoded
     string.

     For Mule, the same considerations apply as for
     `base64-encode-region'.

          (base64-encode-string "fubar")
              => "ZnViYXI="

 - Function: base64-decode-region beg end
     This function decodes the region between BEG and END of the
     current buffer.  The region should be in base64 encoding.

     If the region was decoded correctly, `base64-decode-region' returns
     the length of the decoded region.  If the decoding failed, `nil' is
     returned.

          ;; Decode a base64 buffer, and replace it with the decoded version
          (base64-decode-region (point-min) (point-max))

 - Function: base64-decode-string string
     This function decodes STRING from base64, and returns the decoded
     string.  STRING should be valid base64-encoded text.

     If decoding was not possible, `nil' is returned.

          (base64-decode-string "ZnViYXI=")
              => "fubar"
          
          (base64-decode-string "totally bogus")
              => nil
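
   The sketch below ties these functions together: it encodes a region
in a scratch buffer, checks that decoding restores an encoded string,
and computes a digest.  The digest shown is the test value that RFC
1321 lists for the string "abc".

     (with-temp-buffer
       (insert "fubar")
       ;; The buffer text is replaced by its base64 equivalent.
       (base64-encode-region (point-min) (point-max))
       (buffer-string))
          => "ZnViYXI="

     (base64-decode-string (base64-encode-string "fubar"))
          => "fubar"

     (md5 "abc")
          => "900150983cd24fb0d6963f7d28e17f72"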


File: lispref.info,  Node: Searching and Matching,  Next: Syntax Tables,  Prev: Text,  Up: Top

Searching and Matching
**********************

   XEmacs provides two ways to search through a buffer for specified
text: exact string searches and regular expression searches.  After a
regular expression search, you can examine the "match data" to
determine which text matched the whole regular expression or various
portions of it.

* Menu:

* String Search::         Search for an exact match.
* Regular Expressions::   Describing classes of strings.
* Regexp Search::         Searching for a match for a regexp.
* POSIX Regexps::         Searching POSIX-style for the longest match.
* Search and Replace::    Internals of `query-replace'.
* Match Data::            Finding out which part of the text matched
                            various parts of a regexp, after regexp search.
* Searching and Case::    Case-independent or case-significant searching.
* Standard Regexps::      Useful regexps for finding sentences, pages,...

   The `skip-chars...' functions also perform a kind of searching.
*Note Skipping Characters::.


File: lispref.info,  Node: String Search,  Next: Regular Expressions,  Up: Searching and Matching

Searching for Strings
=====================

   These are the primitive functions for searching through the text in a
buffer.  They are meant for use in programs, but you may call them
interactively.  If you do so, they prompt for the search string; LIMIT
and NOERROR are set to `nil', and REPEAT is set to 1.

 - Command: search-forward string &optional limit noerror repeat
     This function searches forward from point for an exact match for
     STRING.  If successful, it sets point to the end of the occurrence
     found, and returns the new value of point.  If no match is found,
     the value and side effects depend on NOERROR (see below).

     In the following example, point is initially at the beginning of
     the line.  Then `(search-forward "fox")' moves point after the last
     letter of `fox':

          ---------- Buffer: foo ----------
          -!-The quick brown fox jumped over the lazy dog.
          ---------- Buffer: foo ----------
          
          (search-forward "fox")
               => 20
          
          ---------- Buffer: foo ----------
          The quick brown fox-!- jumped over the lazy dog.
          ---------- Buffer: foo ----------

     The argument LIMIT specifies the upper bound to the search.  (It
     must be a position in the current buffer.)  No match extending
     after that position is accepted.  If LIMIT is omitted or `nil', it
     defaults to the end of the accessible portion of the buffer.

     What happens when the search fails depends on the value of
     NOERROR.  If NOERROR is `nil', a `search-failed' error is
     signaled.  If NOERROR is `t', `search-forward' returns `nil' and
     does nothing.  If NOERROR is neither `nil' nor `t', then
     `search-forward' moves point to the upper bound and returns `nil'.
     (It would be more consistent now to return the new position of
     point in that case, but some programs may depend on a value of
     `nil'.)

     If REPEAT is supplied (it must be a positive number), then the
     search is repeated that many times (each time starting at the end
     of the previous time's match).  If these successive searches
     succeed, the function succeeds, moving point and returning its new
     value.  Otherwise the search fails.
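
     As a minimal sketch of how LIMIT, NOERROR, and REPEAT interact, the
     following runs several searches in a scratch buffer holding the
     sentence from the example above.  The function name is
     hypothetical, and the sketch assumes that `with-temp-buffer' is
     available in your version.

     ```elisp
     (defun my-search-forward-demo ()
       "Hypothetical helper: show how LIMIT, NOERROR and REPEAT behave."
       (with-temp-buffer
         (insert "The quick brown fox jumped over the lazy dog.")
         (goto-char (point-min))
         (list
          ;; Success: point moves to the end of the match.
          (search-forward "fox" nil t)
          ;; Failure with NOERROR `t': returns nil and leaves point alone.
          (search-forward "cat" nil t)
          (point)
          ;; Failure with NOERROR neither nil nor t: point moves to LIMIT.
          (search-forward "dog" 30 'move)
          (point)
          ;; REPEAT of 3: starting over, stop after the third "o".
          (progn (goto-char (point-min))
                 (search-forward "o" nil t 3)))))

     (my-search-forward-demo)
     ;; => (20 nil 20 nil 30 29)
     ```

     Passing a non-`nil' NOERROR lets a program test the return value
     instead of wrapping the call in an error handler.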

 - Command: search-backward string &optional limit noerror repeat
     This function searches backward from point for STRING.  It is just
     like `search-forward' except that it searches backwards and leaves
     point at the beginning of the match.
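
     A short sketch of the difference in where point is left, again
     using a scratch buffer (the function name is hypothetical and
     `with-temp-buffer' is assumed to be available):

     ```elisp
     (defun my-search-backward-demo ()
       "Hypothetical helper: search backward from the end of a scratch buffer."
       (with-temp-buffer
         (insert "The quick brown fox jumped over the lazy dog.")
         ;; After `insert', point is at the end of the buffer.
         (list (search-backward "fox") (point))))

     (my-search-backward-demo)
     ;; => (17 17)
     ```

     The returned value and point are both 17, the position just
     before the `f' of `fox'.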

 - Command: word-search-forward string &optional limit noerror repeat
     This function searches forward from point for a "word" match for
     STRING.  If it finds a match, it sets point to the end of the
     match found, and returns the new value of point.

     Word matching regards STRING as a sequence of words, disregarding
     punctuation that separates them.  It searches the buffer for the
     same sequence of words.  Each word must be distinct in the buffer
     (searching for the word `ball' does not match the word `balls'),
     but the details of punctuation and spacing are ignored (searching
     for `ball boy' does match `ball.  Boy!').

     In this example, point is initially at the beginning of the
     buffer; the search leaves it between the `y' and the `!'.

          ---------- Buffer: foo ----------
          -!-He said "Please!  Find
          the ball boy!"
          ---------- Buffer: foo ----------
          
          (word-search-forward "Please find the ball, boy.")
               => 36
          
          ---------- Buffer: foo ----------
          He said "Please!  Find
          the ball boy-!-!"
          ---------- Buffer: foo ----------

     If LIMIT is non-`nil' (it must be a position in the current
     buffer), then it is the upper bound to the search.  The match
     found must not extend after that position.

     If NOERROR is `nil', then `word-search-forward' signals an error
     if the search fails.  If NOERROR is `t', then it returns `nil'
     instead of signaling an error.  If NOERROR is neither `nil' nor
     `t', it moves point to LIMIT (or the end of the buffer) and
     returns `nil'.

     If REPEAT is non-`nil', then the search is repeated that many
     times.  Point is positioned at the end of the last match.
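
     The word-matching rule can be checked with a small predicate.
     This is only a sketch: the function name is hypothetical, it
     assumes `with-temp-buffer' is available, and it relies on the
     default case-insensitive search so that `boy' matches `Boy'.

     ```elisp
     (defun my-word-search-p (words text)
       "Hypothetical helper: non-nil if WORDS word-matches somewhere in TEXT."
       (with-temp-buffer
         (insert text)
         (goto-char (point-min))
         ;; NOERROR of `t' turns a failed search into a nil return value.
         (and (word-search-forward words nil t) t)))

     (my-word-search-p "ball boy" "the ball.  Boy!")   ; => t
     (my-word-search-p "ball" "two balls")             ; => nil
     ```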

 - Command: word-search-backward string &optional limit noerror repeat
     This function searches backward from point for a word match to
     STRING.  This function is just like `word-search-forward' except
     that it searches backward and normally leaves point at the
     beginning of the match.


File: lispref.info,  Node: Regular Expressions,  Next: Regexp Search,  Prev: String Search,  Up: Searching and Matching

Regular Expressions
===================

   A "regular expression" ("regexp", for short) is a pattern that
denotes a (possibly infinite) set of strings.  Searching for matches for
a regexp is a very powerful operation.  This section explains how to
write regexps; the following section says how to search for them.

   To gain a thorough understanding of regular expressions and how to
use them to best advantage, we recommend that you study `Mastering
Regular Expressions', by Jeffrey E.F. Friedl (O'Reilly and Associates,
1997).  (It's known as the "Hip Owls" book, because of the picture on its
cover.)  You might also read the manuals to *Note (gawk)Top::, *Note
(ed)Top::, `sed', `grep', *Note (perl)Top::, *Note (regex)Top::, *Note
(rx)Top::, `pcre', and *Note (flex)Top::, which also make good use of
regular expressions.

   The XEmacs regular expression syntax most closely resembles that of
`ed' or `grep', the GNU versions of which both use the GNU `regex'
library.  XEmacs' version of `regex' has been extended with some
Perl-like capabilities, described in the next section.

* Menu:

* Syntax of Regexps::       Rules for writing regular expressions.
* Regexp Example::          Illustrates regular expression syntax.


File: lispref.info,  Node: Syntax of Regexps,  Next: Regexp Example,  Up: Regular Expressions

Syntax of Regular Expressions
-----------------------------

   Regular expressions have a syntax in which a few characters are
special constructs and the rest are "ordinary".  An ordinary character
is a simple regular expression that matches that character and nothing
else.  The special characters are `.', `*', `+', `?', `[', `]', `^',
`$', and `\'; no new special characters will be defined in the future.
Any other character appearing in a regular expression is ordinary,
unless a `\' precedes it.

   For example, `f' is not a special character, so it is ordinary, and
therefore `f' is a regular expression that matches the string `f' and
no other string.  (It does _not_ match the string `ff'.)  Likewise, `o'
is a regular expression that matches only `o'.

   Any two regular expressions A and B can be concatenated.  The result
is a regular expression that matches a string if A matches some amount
of the beginning of that string and B matches the rest of the string.

   As a simple example, we can concatenate the regular expressions `f'
and `o' to get the regular expression `fo', which matches only the
string `fo'.  Still trivial.  To do something more powerful, you need
to use one of the special characters.  Here is a list of them:

`. (Period)'
     is a special character that matches any single character except a
     newline.  Using concatenation, we can make regular expressions
     like `a.b', which matches any three-character string that begins
     with `a' and ends with `b'.

`*'
     is not a construct by itself; it is a quantifying suffix operator
     that means to repeat the preceding regular expression as many
     times as possible.  In `fo*', the `*' applies to the `o', so `fo*'
     matches one `f' followed by any number of `o's.  The case of zero
     `o's is allowed: `fo*' does match `f'.

     `*' always applies to the _smallest_ possible preceding
     expression.  Thus, `fo*' has a repeating `o', not a repeating `fo'.

     The matcher processes a `*' construct by matching, immediately, as
     many repetitions as can be found; it is "greedy".  Then it
     continues with the rest of the pattern.  If that fails,
     backtracking occurs, discarding some of the matches of the
     `*'-modified construct in case that makes it possible to match the
     rest of the pattern.  For example, in matching `ca*ar' against the
     string `caaar', the `a*' first tries to match all three `a's; but
     the rest of the pattern is `ar' and there is only `r' left to
     match, so this try fails.  The next alternative is for `a*' to
     match only two `a's.  With this choice, the rest of the regexp
     matches successfully.

     Nested repetition operators can be extremely slow if they specify
     backtracking loops.  For example, it could take hours for the
     regular expression `\(x+y*\)*a' to match the sequence
     `xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxz'.  The slowness is because
     Emacs must try each imaginable way of grouping the 35 `x's before
     concluding that none of them can work.  To make sure your regular
     expressions run fast, check nested repetitions carefully.

`+'
     is a quantifying suffix operator similar to `*' except that the
     preceding expression must match at least once.  It is also
     "greedy".  So, for example, `ca+r' matches the strings `car' and
     `caaaar' but not the string `cr', whereas `ca*r' matches all three
     strings.

`?'
     is a quantifying suffix operator similar to `*', except that the
     preceding expression can match either once or not at all.  For
     example, `ca?r' matches `car' or `cr', but does not match anything
     else.
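
     The three quantifiers `*', `+', and `?' can be compared side by
     side.  The sketch below uses `string-match' (described with the
     other regexp search functions), which returns the index where a
     match starts, or `nil' if there is none.  The helper name is
     hypothetical.

     ```elisp
     (defun my-match-table (regexp strings)
       "Hypothetical helper: `string-match' REGEXP against each of STRINGS."
       (mapcar (lambda (s) (string-match regexp s)) strings))

     (my-match-table "ca+r" '("cr" "car" "caaaar"))   ; => (nil 0 0)
     (my-match-table "ca*r" '("cr" "car" "caaaar"))   ; => (0 0 0)
     (my-match-table "ca?r" '("cr" "car" "caaaar"))   ; => (0 0 nil)
     ```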

`*?'
     works just like `*', except that rather than matching the longest
     match, it matches the shortest match.  `*?' is known as a
     "non-greedy" quantifier, a regexp construct borrowed from Perl.

     This construct is very useful when you want to match the text
     inside a pair of delimiters.  For instance, `/\*.*?\*/' matches a
     C comment that lies on a single line (since `.' does not match a
     newline).  This is hard to achieve without a non-greedy
     quantifier.

     This construct was not available prior to XEmacs 20.4.  It is
     not available in FSF Emacs.
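
     To see the difference between the greedy and non-greedy forms,
     consider a line holding two comments.  This is a sketch with a
     hypothetical helper name; it assumes a version that supports `*?'.
     Note that each `\' of the regexp is doubled in the Lisp string.

     ```elisp
     (defun my-greedy-vs-lazy (text)
       "Hypothetical helper: match TEXT with a greedy and a non-greedy regexp.
     Return a list of the two matched substrings."
       (list
        ;; The greedy form runs to the last comment terminator on the line.
        (and (string-match "/\\*.*\\*/" text) (match-string 0 text))
        ;; The non-greedy form stops at the first comment terminator.
        (and (string-match "/\\*.*?\\*/" text) (match-string 0 text))))

     (my-greedy-vs-lazy "a /* one */ b /* two */ c")
     ;; => ("/* one */ b /* two */" "/* one */")
     ```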

`+?'
     is the non-greedy version of `+'.

`??'
     is the non-greedy version of `?'.

`\{n,m\}'
     serves as an interval quantifier, analogous to `*' or `+', but
     specifies that the expression must match at least N times, but no
     more than M times.  This syntax is supported by most Unix regexp
     utilities, and was introduced in XEmacs version 20.3.

     Unfortunately, a non-greedy version of this quantifier does not
     currently exist, although Perl has one.
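
     A brief sketch of the interval quantifier, assuming a version that
     supports it (the function name is hypothetical).  The regexp
     `ca\{2,3\}r' requires two or three `a's between `c' and `r':

     ```elisp
     (defun my-interval-demo ()
       "Hypothetical helper: try an interval quantifier on four strings."
       (mapcar (lambda (s) (string-match "ca\\{2,3\\}r" s))
               '("car" "caar" "caaar" "caaaar")))

     (my-interval-demo)
     ;; => (nil 0 0 nil)
     ```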

`[ ... ]'
     `[' begins a "character set", which is terminated by a `]'.  In
     the simplest case, the characters between the two brackets form
     the set.  Thus, `[ad]' matches either one `a' or one `d', and
     `[ad]*' matches any string composed of just `a's and `d's
     (including the empty string), from which it follows that `c[ad]*r'
     matches `cr', `car', `cdr', `caddaar', etc.

     The usual regular expression special characters are not special
     inside a character set.  A completely different set of special
     characters exists inside character sets: `]', `-' and `^'.

     `-' is used for ranges of characters.  To write a range, write two
     characters with a `-' between them.  Thus, `[a-z]' matches any
     lower case letter.  Ranges may be intermixed freely with individual
     characters, as in `[a-z$%.]', which matches any lower case letter
     or `$', `%', or a period.

     To include a `]' in a character set, make it the first character.
     For example, `[]a]' matches `]' or `a'.  To include a `-', write
     `-' as the first character in the set, or put it immediately after
     a range.  (You can replace one individual character C with the
     range `C-C' to make a place to put the `-'.)  There is no way to
     write a set containing just `-' and `]'.

     To include `^' in a set, put it anywhere but at the beginning of
     the set.
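
     The placement rules for `]', `-', and `^' can be tried out as
     follows.  This is a sketch with a hypothetical function name;
     `case-fold-search' is bound to `nil' so that `[a-z]' does not also
     match upper case letters.

     ```elisp
     (defun my-char-set-demo ()
       "Hypothetical helper: exercise the special characters of character sets."
       (let ((case-fold-search nil))
         (list
          (string-match "c[ad]*r" "caddaar")  ; matches the whole string
          (match-end 0)                       ; end of that match
          (string-match "[a-z$%.]" "AB$")     ; only `$' is in the set
          (string-match "[]a]" "x]")          ; `]' first in the set
          (string-match "[a-z-]" "A-")        ; `-' right after a range
          (string-match "[a^]" "X^"))))       ; `^' not at the beginning

     (my-char-set-demo)
     ;; => (0 7 2 1 1 1)
     ```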

`[^ ... ]'
     `[^' begins a "complement character set", which matches any
     character except the ones specified.  Thus, `[^a-z0-9A-Z]' matches
     all characters _except_ letters and digits.

     `^' is not special in a character set unless it is the first
     character.  The character following the `^' is treated as if it
     were first (thus, `-' and `]' are not special there).

     Note that a complement character set can match a newline, unless
     newline is mentioned as one of the characters not to match.
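
     The point about newlines is easy to overlook, so here is a small
     sketch (hypothetical function name) showing a complement set
     matching a newline, and the same set with newline excluded:

     ```elisp
     (defun my-complement-demo ()
       "Hypothetical helper: show what complement character sets match."
       (list
        ;; The first character that is neither a letter nor a digit is `_'.
        (string-match "[^a-z0-9A-Z]" "abc123_x")
        ;; The newline at index 3 is not a lower case letter, so it matches.
        (string-match "[^a-z]" "abc\ndef")
        ;; With newline listed in the set, nothing in the string matches.
        (string-match "[^a-z\n]" "abc\ndef")))

     (my-complement-demo)
     ;; => (6 3 nil)
     ```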

`^'
     is a special character that matches the empty string, but only at
     the beginning of a line in the text being matched.  Otherwise it
     fails to match anything.  Thus, `^foo' matches a `foo' that occurs
     at the beginning of a line.

     When matching a string instead of a buffer, `^' matches at the
     beginning of the string or after a newline character `\n'.

`$'
     is similar to `^' but matches only at the end of a line.  Thus,
     `x+$' matches a string of one `x' or more at the end of a line.

     When matching a string instead of a buffer, `$' matches at the end
     of the string or before a newline character `\n'.

`\'
     has two functions: it quotes the special characters (including
     `\'), and it introduces additional special constructs.

     Because `\' quotes special characters, `\$' is a regular
     expression that matches only `$', and `\[' is a regular expression
     that matches only `[', and so on.

     Note that `\' also has special meaning in the read syntax of Lisp
     strings (*note String Type::), and must be quoted with `\'.  For
     example, the regular expression that matches the `\' character is
     `\\'.  To write a Lisp string that contains the characters `\\',
     Lisp syntax requires you to quote each `\' with another `\'.
     Therefore, the read syntax for a regular expression matching `\'
     is `"\\\\"'.
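
     The two levels of quoting can be confusing, so the following
     sketch (hypothetical function name) shows the length of the Lisp
     string and a few quoted special characters in use:

     ```elisp
     (defun my-backslash-demo ()
       "Hypothetical helper: show regexp quoting inside Lisp strings."
       (list
        ;; Four backslashes in the read syntax give a two-character string.
        (length "\\\\")
        ;; That two-character regexp matches one backslash, here at index 1.
        (string-match "\\\\" "a\\b")
        ;; A quoted `$' matches a literal dollar sign.
        (string-match "\\$" "cost: $5")
        ;; A quoted `[' matches a literal open bracket.
        (string-match "\\[" "x[y")))

     (my-backslash-demo)
     ;; => (2 1 6 1)
     ```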

   *Please note:* For historical compatibility, special characters are
treated as ordinary ones if they are in contexts where their special
meanings make no sense.  For example, `*foo' treats `*' as ordinary
since there is no preceding expression on which the `*' can act.  It is
poor practice to depend on this behavior; quote the special character
anyway, regardless of where it appears.

   For the most part, `\' followed by any character matches only that
character.  However, there are several exceptions: characters that,
when preceded by `\', are special constructs.  Such characters are
always ordinary when encountered on their own.  Here is a table of `\'
constructs:

`\|'
     specifies an alternative.  Two regular expressions A and B with
     `\|' in between form an expression that matches anything that
     either A or B matches.

     Thus, `foo\|bar' matches either `foo' or `bar' but no other string.

     `\|' applies to the largest possible surrounding expressions.
     Only a surrounding `\( ... \)' grouping can limit the grouping
     power of `\|'.

     Full backtracking capability exists to handle multiple uses of
     `\|'.
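
     Because `\|' applies to the largest possible surrounding
     expressions, leaving out the grouping changes what is matched.
     A sketch with a hypothetical function name:

     ```elisp
     (defun my-alternation-demo ()
       "Hypothetical helper: compare ungrouped and grouped alternation."
       (let ((s "foox"))
         (list
          ;; Ungrouped: the alternatives are `foo' and `barx'.
          (and (string-match "foo\\|barx" s) (match-string 0 s))
          ;; Grouped: the alternatives are `foo' and `bar', then `x'.
          (and (string-match "\\(foo\\|bar\\)x" s) (match-string 0 s)))))

     (my-alternation-demo)
     ;; => ("foo" "foox")
     ```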

`\( ... \)'
     is a grouping construct that serves three purposes:

       1. To enclose a set of `\|' alternatives for other operations.
          Thus, `\(foo\|bar\)x' matches either `foox' or `barx'.

       2. To enclose an expression for a suffix operator such as `*' to
          act on.  Thus, `ba\(na\)*' matches `bananana', etc., with any
          (zero or more) number of `na' strings.

       3. To record a matched substring for future reference.

     This last application is not a consequence of the idea of a
     parenthetical grouping; it is a separate feature that happens to be
     assigned as a second meaning to the same `\( ... \)' construct
     because there is no conflict in practice between the two meanings.
     Here is an explanation of this feature:

`\DIGIT'
     matches the same text that matched the DIGITth occurrence of a `\(
     ... \)' construct.

     In other words, after the end of a `\( ... \)' construct, the
     matcher remembers the beginning and end of the text matched by that
     construct.  Then, later on in the regular expression, you can use
     `\' followed by DIGIT to match that same text, whatever it may
     have been.

     The strings matching the first nine `\( ... \)' constructs
     appearing in a regular expression are assigned numbers 1 through 9
     in the order that the open parentheses appear in the regular
     expression.  So you can use `\1' through `\9' to refer to the text
     matched by the corresponding `\( ... \)' constructs.

     For example, `\(.*\)\1' matches any newline-free string that is
     composed of two identical halves.  The `\(.*\)' matches the first
     half, which may be anything, but the `\1' that follows must match
     the same exact text.
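
     The recorded text can be inspected with `match-string' after a
     successful match.  The sketch below (hypothetical function name)
     shows the backreference example from above, and also what a
     repeated group records:

     ```elisp
     (defun my-backref-demo ()
       "Hypothetical helper: show recorded groups and a backreference."
       (let ((doubled "abcabc")
             (fruit "bananana"))
         (list
          ;; Group 1 records the first half; the backreference must
          ;; then match that same text again.
          (and (string-match "\\(.*\\)\\1" doubled) (match-string 1 doubled))
          ;; A repeated group records only its last repetition.
          (and (string-match "ba\\(na\\)*" fruit)
               (list (match-string 0 fruit) (match-string 1 fruit))))))

     (my-backref-demo)
     ;; => ("abc" ("bananana" "na"))
     ```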

`\(?: ... \)'
     is called a "shy" grouping operator, and it is used just like `\(
     ... \)', except that it does not cause the matched substring to be
     recorded for future reference.

     This is useful when you need a lot of grouping `\( ... \)'
     constructs but only want to remember one or two, or when you have
     more than nine groupings and need to use backreferences to refer to
     the groupings at the end.

     Using `\(?: ... \)' rather than `\( ... \)' when you don't need
     the captured substrings should speed up your programs somewhat,
     since it shortens the code path followed by the regular expression
     engine and reduces the memory allocation and string copying it
     must do.  The actual performance gain has not been measured as of
     this writing.

     The shy grouping operator was borrowed from Perl.  It was not
     available prior to XEmacs 20.3, and it is not available in FSF
     Emacs.
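
     In the following sketch (hypothetical function name, assuming a
     version that supports shy groups), the alternation is grouped
     without being recorded, so the `x's become group number 1:

     ```elisp
     (defun my-shy-group-demo ()
       "Hypothetical helper: show that a shy group is not numbered."
       (let ((s "barxx"))
         (and (string-match "\\(?:foo\\|bar\\)\\(x+\\)" s)
              (match-string 1 s))))

     (my-shy-group-demo)
     ;; => "xx"
     ```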

`\w'
     matches any word-constituent character.  The editor syntax table
     determines which characters these are.  *Note Syntax Tables::.

`\W'
     matches any character that is not a word constituent.

`\sCODE'
     matches any character whose syntax is CODE.  Here CODE is a
     character that represents a syntax code: thus, `w' for word
     constituent, `-' for whitespace, `(' for open parenthesis, etc.
     *Note Syntax Tables::, for a list of syntax codes and the
     characters that stand for them.

`\SCODE'
     matches any character whose syntax is not CODE.
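
     Here is a sketch of the syntax-class constructs in use.  The
     function name is hypothetical, and the results assume the
     standard syntax table, in which letters are word constituents,
     spaces are whitespace, and the comma is punctuation.  The matching
     is done inside a fresh scratch buffer so that its syntax table is
     the one in effect.

     ```elisp
     (defun my-syntax-class-demo ()
       "Hypothetical helper: try the word and syntax-class constructs."
       (with-temp-buffer
         (let ((s "  hello, world"))
           (list
            (string-match "\\w+" s)            ; first run of word characters
            (match-string 0 s)                 ; the text of that run
            (string-match "\\W" "ab cd")       ; first non-word character
            (string-match "\\s-+" "ab   cd")   ; first run of whitespace
            (match-end 0)                      ; end of that run
            (string-match "\\S-" "  x")))))    ; first non-whitespace character

     (my-syntax-class-demo)
     ;; => (2 "hello" 2 2 5 2)
     ```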

   The following regular expression constructs match the empty
string--that is, they don't use up any characters--but whether they
match depends on the context.

`\`'
     matches the empty string, but only at the beginning of the buffer
     or string being matched against.

`\''
     matches the empty string, but only at the end of the buffer or
     string being matched against.
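
     These two constructs differ from `^' and `$' only when the text
     contains newlines.  A sketch (hypothetical function name) that
     matches against a two-line string:

     ```elisp
     (defun my-anchor-demo ()
       "Hypothetical helper: compare line anchors with string anchors."
       (let ((s "bar\nfoo"))
         (list
          (string-match "^foo" s)       ; start of the second line
          (string-match "\\`foo" s)     ; `foo' is not at the string start
          (string-match "bar$" s)       ; end of the first line
          (string-match "bar\\'" s))))  ; `bar' is not at the string end

     (my-anchor-demo)
     ;; => (4 nil 0 nil)
     ```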

`\='
     matches the empty string, but only at point.  (This construct is
     not defined when matching against a string.)

`\b'
     matches the empty string, but only at the beginning or end of a
     word.  Thus, `\bfoo\b' matches any occurrence of `foo' as a
     separate word.  `\bballs?\b' matches `ball' or `balls' as a
     separate word.

`\B'
     matches the empty string, but _not_ at the beginning or end of a
     word.

`\<'
     matches the empty string, but only at the beginning of a word.

`\>'
     matches the empty string, but only at the end of a word.
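
     The word-boundary constructs can be compared with the sketch
     below.  The function name is hypothetical, and the results assume
     the standard syntax table, in which letters are word constituents
     and a space is not.

     ```elisp
     (defun my-word-boundary-demo ()
       "Hypothetical helper: try the word-boundary constructs."
       (with-temp-buffer
         (list
          ;; The `ball' inside `football' does not start at a boundary.
          (string-match "\\bballs?\\b" "football balls")
          ;; Beginning of a word.
          (string-match "\\<foo" "a foo")
          ;; End of a word: the `foo' in `foobar' does not qualify.
          (string-match "foo\\>" "foobar foo")
          ;; Not at a boundary: the `oo' inside `foo'.
          (string-match "\\Boo" "foo oops"))))

     (my-word-boundary-demo)
     ;; => (9 2 7 1)
     ```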

   Not every string is a valid regular expression.  For example, a
string with unbalanced square brackets is invalid (with a few
exceptions, such as `[]]'), and so is a string that ends with a single
`\'.  If an invalid regular expression is passed to any of the search
functions, an `invalid-regexp' error is signaled.
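
   A program that accepts regexps from a user may want to trap this
error.  The following is a minimal sketch, with a hypothetical function
name, that uses `condition-case' to test whether a string is a valid
regular expression:

     ```elisp
     (defun my-valid-regexp-p (regexp)
       "Hypothetical helper: return t if REGEXP is valid, nil otherwise."
       (condition-case nil
           (progn
             ;; Matching against the empty string is enough to make the
             ;; matcher parse REGEXP.
             (string-match regexp "")
             t)
         (invalid-regexp nil)))

     (my-valid-regexp-p "[]]")     ; => t
     (my-valid-regexp-p "[a")      ; => nil  (unbalanced bracket)
     (my-valid-regexp-p "foo\\")   ; => nil  (ends with a single backslash)
     ```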

 - Function: regexp-quote string
     This function returns a regular expression string that matches
     exactly STRING and nothing else.  This allows you to request an
     exact string match when calling a function that wants a regular
     expression.

          (regexp-quote "^The cat$")
               => "\\^The cat\\$"

     One use of `regexp-quote' is to combine an exact string match with
     context described as a regular expression.  For example, this
     searches for the string that is the value of `string', surrounded
     by whitespace:

          (re-search-forward
           (concat "\\s-" (regexp-quote string) "\\s-"))