This is ../info/lispref.info, produced by makeinfo version 4.0 from
lispref/lispref.texi.

INFO-DIR-SECTION XEmacs Editor
START-INFO-DIR-ENTRY
* Lispref: (lispref).           XEmacs Lisp Reference Manual.
END-INFO-DIR-ENTRY

   Edition History:

   GNU Emacs Lisp Reference Manual Second Edition (v2.01), May 1993
   GNU Emacs Lisp Reference Manual Further Revised (v2.02), August 1993
   Lucid Emacs Lisp Reference Manual (for 19.10) First Edition, March 1994
   XEmacs Lisp Programmer's Manual (for 19.12) Second Edition, April 1995
   GNU Emacs Lisp Reference Manual v2.4, June 1995
   XEmacs Lisp Programmer's Manual (for 19.13) Third Edition, July 1995
   XEmacs Lisp Reference Manual (for 19.14 and 20.0) v3.1, March 1996
   XEmacs Lisp Reference Manual (for 19.15 and 20.1, 20.2, 20.3) v3.2,
     April, May, November 1997
   XEmacs Lisp Reference Manual (for 21.0) v3.3, April 1998

   Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995 Free Software
Foundation, Inc.  Copyright (C) 1994, 1995 Sun Microsystems, Inc.
Copyright (C) 1995, 1996 Ben Wing.

   Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided also
that the section entitled "GNU General Public License" is included
exactly as in the original, and provided that the entire resulting
derived work is distributed under the terms of a permission notice
identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that the section entitled "GNU General Public License"
may be included in a translation approved by the Free Software
Foundation instead of in the original English.

\1f
File: lispref.info,  Node: Examining Properties,  Next: Changing Properties,  Up: Text Properties

Examining Text Properties
-------------------------

   The simplest way to examine text properties is to ask for the value
of a particular property of a particular character.  For that, use
`get-text-property'.  Use `text-properties-at' to get the entire
property list of a character.  *Note Property Search::, for functions
to examine the properties of a number of characters at once.

   These functions handle both strings and buffers.  (Keep in mind that
positions in a string start from 0, whereas positions in a buffer start
from 1.)

 - Function: get-text-property pos prop &optional object
     This function returns the value of the PROP property of the
     character after position POS in OBJECT (a buffer or string).  The
     argument OBJECT is optional and defaults to the current buffer.

 - Function: get-char-property pos prop &optional object
     This function is like `get-text-property', except that it checks
     all extents, not just text-property extents.

 - Function: text-properties-at position &optional object
     This function returns the entire property list of the character at
     POSITION in the string or buffer OBJECT.  If OBJECT is `nil', it
     defaults to the current buffer.

 - Variable: default-text-properties
     This variable holds a property list giving default values for text
     properties.  Whenever a character does not specify a value for a
     property, the value stored in this list is used instead.  Here is
     an example:

          (setq default-text-properties '(foo 69))
          ;; Make sure character 1 has no properties of its own.
          (set-text-properties 1 2 nil)
          ;; What we get, when we ask, is the default value.
          (get-text-property 1 'foo)
               => 69

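   As a minimal sketch (not part of the original manual), the following
examines the properties of a string, where positions start at 0.  The
function name `my-examine-demo' is hypothetical; the calls inside it are
the ones documented above.

```elisp
;; Hypothetical demo function: look up text properties in a string.
(defun my-examine-demo ()
  "Return (FACE-AT-0 FACE-AT-3 PLIST-AT-1) for a sample string."
  (let ((s (copy-sequence "hello")))   ; copy, so a literal is never modified
    ;; Give the first two characters (positions 0 and 1) a `face' property.
    (put-text-property 0 2 'face 'bold s)
    (list (get-text-property 0 'face s)    ; inside the range
          (get-text-property 3 'face s)    ; outside the range
          (text-properties-at 1 s))))      ; whole property list
```

   Calling `(my-examine-demo)' should return `(bold nil (face bold))'.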
\1f
File: lispref.info,  Node: Changing Properties,  Next: Property Search,  Prev: Examining Properties,  Up: Text Properties

Changing Text Properties
------------------------

   The primitives for changing properties apply to a specified range of
text.  The function `set-text-properties' (see end of section) sets the
entire property list of the text in that range; more often, it is
useful to add, change, or delete just certain properties specified by
name.

   Since text properties are considered part of the buffer's contents,
and can affect how the buffer looks on the screen, any change in the
text properties is considered a buffer modification.  Buffer text
property changes are undoable (*note Undo::).

 - Function: put-text-property start end prop value &optional object
     This function sets the PROP property to VALUE for the text between
     START and END in the string or buffer OBJECT.  If OBJECT is `nil',
     it defaults to the current buffer.

 - Function: add-text-properties start end props &optional object
     This function modifies the text properties for the text between
     START and END in the string or buffer OBJECT.  If OBJECT is `nil',
     it defaults to the current buffer.

     The argument PROPS specifies which properties to change.  It
     should have the form of a property list (*note Property Lists::):
     a list whose elements are property names alternating with the
     corresponding values.

     The return value is `t' if the function actually changed some
     property's value; `nil' otherwise (if PROPS is `nil' or its values
     agree with those in the text).

     For example, here is how to set the `comment' and `face'
     properties of a range of text:

          (add-text-properties START END
                               '(comment t face highlight))

 - Function: remove-text-properties start end props &optional object
     This function deletes specified text properties from the text
     between START and END in the string or buffer OBJECT.  If OBJECT
     is `nil', it defaults to the current buffer.

     The argument PROPS specifies which properties to delete.  It
     should have the form of a property list (*note Property Lists::):
     a list whose elements are property names alternating with
     corresponding values.  But only the names matter--the values that
     accompany them are ignored.  For example, here's how to remove the
     `face' property:

          (remove-text-properties START END '(face nil))

     The return value is `t' if the function actually changed some
     property's value; `nil' otherwise (if PROPS is `nil' or if no
     character in the specified text had any of those properties).

 - Function: set-text-properties start end props &optional object
     This function completely replaces the text property list for the
     text between START and END in the string or buffer OBJECT.  If
     OBJECT is `nil', it defaults to the current buffer.

     The argument PROPS is the new property list.  It should be a list
     whose elements are property names alternating with corresponding
     values.

     After `set-text-properties' returns, all the characters in the
     specified range have identical properties.

     If PROPS is `nil', the effect is to get rid of all properties from
     the specified range of text.  Here's an example:

          (set-text-properties START END nil)

   See also the function `buffer-substring-without-properties' (*note
Buffer Contents::) which copies text from the buffer but does not copy
its properties.

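   The return values described above can be seen in the following
sketch, which works on a string rather than a buffer.  The function
name `my-change-demo' is hypothetical; it assumes the functions behave
as documented in this node.

```elisp
;; Hypothetical demo: the return values of the property-changing functions.
(defun my-change-demo ()
  "Return the results of successive property changes on a string."
  (let ((s (copy-sequence "abcdef")))
    (list
     ;; Something actually changes, so the value is t.
     (add-text-properties 0 3 '(comment t face bold) s)
     ;; Same properties again: nothing changes, so the value is nil.
     (add-text-properties 0 3 '(comment t face bold) s)
     ;; Only the names in PROPS matter; the nil after `face' is ignored.
     (remove-text-properties 0 3 '(face nil) s)
     ;; What is left on the first character.
     (text-properties-at 0 s)
     ;; Wipe every property from the range, then look again.
     (progn (set-text-properties 0 3 nil s)
            (text-properties-at 0 s)))))
```

   `(my-change-demo)' should return `(t nil t (comment t) nil)'.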
\1f
File: lispref.info,  Node: Property Search,  Next: Special Properties,  Prev: Changing Properties,  Up: Text Properties

Property Search Functions
-------------------------

   In typical use of text properties, most of the time several or many
consecutive characters have the same value for a property.  Rather than
writing your programs to examine characters one by one, it is much
faster to process chunks of text that have the same property value.

   Here are functions you can use to do this.  They use `eq' for
comparing property values.  In all cases, OBJECT defaults to the
current buffer.

   For high performance, it's very important to use the LIMIT argument
to these functions, especially the ones that search for a single
property--otherwise, they may spend a long time scanning to the end of
the buffer, if the property you are interested in does not change.

   Remember that a position is always between two characters; the
position returned by these functions is between two characters with
different properties.

 - Function: next-property-change pos &optional object limit
     The function scans the text forward from position POS in the
     string or buffer OBJECT until it finds a change in some text
     property, then returns the position of the change.  In other
     words, it returns the position of the first character beyond POS
     whose properties are not identical to those of the character just
     after POS.

     If LIMIT is non-`nil', then the scan ends at position LIMIT.  If
     there is no property change before that point,
     `next-property-change' returns LIMIT.

     The value is `nil' if the properties remain unchanged all the way
     to the end of OBJECT and LIMIT is `nil'.  If the value is
     non-`nil', it is a position greater than or equal to POS.  The
     value equals POS only when LIMIT equals POS.

     Here is an example of how to scan the buffer by chunks of text
     within which all properties are constant:

          (while (not (eobp))
            (let ((plist (text-properties-at (point)))
                  (next-change
                   (or (next-property-change (point) (current-buffer))
                       (point-max))))
              ;; Process the text from point to NEXT-CHANGE here;
              ;; every character in it has the properties in PLIST.
              (goto-char next-change)))

 - Function: next-single-property-change pos prop &optional object limit
     The function scans the text forward from position POS in the
     string or buffer OBJECT until it finds a change in the PROP
     property, then returns the position of the change.  In other
     words, it returns the position of the first character beyond POS
     whose PROP property differs from that of the character just after
     POS.

     If LIMIT is non-`nil', then the scan ends at position LIMIT.  If
     there is no property change before that point,
     `next-single-property-change' returns LIMIT.

     The value is `nil' if the property remains unchanged all the way to
     the end of OBJECT and LIMIT is `nil'.  If the value is non-`nil',
     it is a position greater than or equal to POS; it equals POS only
     if LIMIT equals POS.

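   A common use of `next-single-property-change' is to split a string
into runs over which one property is constant.  The helper below is a
hypothetical sketch, not a standard function; it passes the string
length as LIMIT so that each call returns a number rather than `nil'.

```elisp
;; Hypothetical helper: split STRING into runs where PROP is constant.
(defun my-property-runs (string prop)
  "Return a list of (START END VALUE) for each run of PROP in STRING."
  (let ((pos 0)
        (len (length string))
        (runs nil))
    (while (< pos len)
      ;; Passing LEN as LIMIT guarantees a number back, never nil.
      (let ((next (next-single-property-change pos prop string len)))
        (push (list pos next (get-text-property pos prop string)) runs)
        (setq pos next)))
    (nreverse runs)))
```

   For the string "aaabbbccc" with `face' set to `bold' on positions 3
to 6, `(my-property-runs s 'face)' should return
`((0 3 nil) (3 6 bold) (6 9 nil))'.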
 - Function: previous-property-change pos &optional object limit
     This is like `next-property-change', but scans back from POS
     instead of forward.  If the value is non-`nil', it is a position
     less than or equal to POS; it equals POS only if LIMIT equals POS.

 - Function: previous-single-property-change pos prop &optional object
          limit
     This is like `next-single-property-change', but scans back from
     POS instead of forward.  If the value is non-`nil', it is a
     position less than or equal to POS; it equals POS only if LIMIT
     equals POS.

 - Function: text-property-any start end prop value &optional object
     This function returns non-`nil' if at least one character between
     START and END has a property PROP whose value is VALUE.  More
     precisely, it returns the position of the first such character.
     Otherwise, it returns `nil'.

     The optional fifth argument, OBJECT, specifies the string or
     buffer to scan.  Positions are relative to OBJECT.  The default
     for OBJECT is the current buffer.

 - Function: text-property-not-all start end prop value &optional object
     This function returns non-`nil' if at least one character between
     START and END has a property PROP whose value differs from VALUE.
     More precisely, it returns the position of the first such
     character.  Otherwise, it returns `nil'.

     The optional fifth argument, OBJECT, specifies the string or
     buffer to scan.  Positions are relative to OBJECT.  The default
     for OBJECT is the current buffer.

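   The difference between `text-property-any' and
`text-property-not-all' is easiest to see side by side.  This is a
small sketch on a string; the function name `my-any-not-all-demo' is
hypothetical.

```elisp
;; Hypothetical demo of `text-property-any' and `text-property-not-all'.
(defun my-any-not-all-demo ()
  "Return the positions found by the two search functions on a sample."
  (let ((s (copy-sequence "abcdef")))
    (put-text-property 2 4 'face 'bold s)             ; "cd" is bold
    (list (text-property-any 0 6 'face 'bold s)       ; first bold char
          (text-property-not-all 2 6 'face 'bold s)   ; first non-bold from 2
          (text-property-not-all 2 4 'face 'bold s)))) ; all bold, so nil
```

   `(my-any-not-all-demo)' should return `(2 4 nil)'.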
\1f
File: lispref.info,  Node: Special Properties,  Next: Saving Properties,  Prev: Property Search,  Up: Text Properties

Properties with Special Meanings
--------------------------------

   The predefined properties are the same as those for extents.  *Note
Extent Properties::.

\1f
File: lispref.info,  Node: Saving Properties,  Prev: Special Properties,  Up: Text Properties

Saving Text Properties in Files
-------------------------------

   You can save text properties in files, and restore text properties
when inserting the files, using these two hooks:

 - Variable: write-region-annotate-functions
     This variable's value is a list of functions for `write-region' to
     run to encode text properties in some fashion as annotations to
     the text being written in the file.  *Note Writing to Files::.

     Each function in the list is called with two arguments: the start
     and end of the region to be written.  These functions should not
     alter the contents of the buffer.  Instead, they should return
     lists indicating annotations to write in the file in addition to
     the text in the buffer.

     Each function should return a list of elements of the form
     `(POSITION . STRING)', where POSITION is an integer specifying the
     relative position in the text to be written, and STRING is the
     annotation to add there.

     Each list returned by one of these functions must be already
     sorted in increasing order by POSITION.  If there is more than one
     function, `write-region' merges the lists destructively into one
     sorted list.

     When `write-region' actually writes the text from the buffer to the
     file, it intermixes the specified annotations at the corresponding
     positions.  All this takes place without modifying the buffer.

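   As an illustration only, here is a sketch of a function suitable for
`write-region-annotate-functions'.  The markup `<b>...</b>' is an
invented format, the function name is hypothetical, and the sketch
reports annotation positions as buffer positions between START and END.
It returns the list already sorted, as required above.

```elisp
;; Hypothetical `write-region-annotate-functions' entry: wrap each run of
;; text whose `face' is `bold' in the invented markup <b>...</b>.
(defun my-annotate-bold (start end)
  "Return a sorted list of (POSITION . STRING) annotations for START..END."
  (let ((pos start)
        (annotations nil))
    (while (< pos end)
      ;; END as LIMIT keeps the scan inside the region being written.
      (let ((next (next-single-property-change pos 'face nil end)))
        (when (eq (get-text-property pos 'face) 'bold)
          (push (cons pos "<b>") annotations)
          (push (cons next "</b>") annotations))
        (setq pos next)))
    ;; Runs were visited left to right, so reversing gives ascending order.
    (nreverse annotations)))
```

   It could be installed with
`(add-hook 'write-region-annotate-functions 'my-annotate-bold)'.  Because
it only reads the buffer, it respects the rule that these functions must
not modify the buffer contents.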
 - Variable: after-insert-file-functions
     This variable holds a list of functions for `insert-file-contents'
     to call after inserting a file's contents.  These functions should
     scan the inserted text for annotations, and convert them to the
     text properties they stand for.

     Each function receives one argument, the length of the inserted
     text; point indicates the start of that text.  The function should
     scan that text for annotations, delete them, and create the text
     properties that the annotations specify.  The function should
     return the updated length of the inserted text, as it stands after
     those changes.  The value returned by one function becomes the
     argument to the next function.

     These functions should always return with point at the beginning of
     the inserted text.

     The intended use of `after-insert-file-functions' is for converting
     some sort of textual annotations into actual text properties.  But
     other uses may be possible.

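   The following is a sketch of a function suitable for
`after-insert-file-functions', assuming an invented `<b>...</b>' markup
that stands for a `face' property of `bold'.  The function name is
hypothetical.  It receives the length of the inserted text, deletes the
markup, creates the properties, returns the new length, and uses
`save-excursion' so that point stays at the start of the inserted text.

```elisp
;; Hypothetical `after-insert-file-functions' entry that turns the
;; invented <b>...</b> markup into `face' properties.
(defun my-decode-bold (len)
  "Turn <b>...</b> in the LEN characters after point into `face' `bold'.
Return the length of the inserted text after the markup is removed."
  (save-excursion                        ; leave point at the start
    (save-restriction
      ;; Only look at the text that was just inserted.
      (narrow-to-region (point) (+ (point) len))
      (goto-char (point-min))
      (while (search-forward "<b>" nil t)
        (let ((beg (match-beginning 0)))
          (delete-region beg (point))    ; drop "<b>"; point is now BEG
          (when (search-forward "</b>" nil t)
            ;; Drop "</b>"; point ends up just after the bold text.
            (delete-region (match-beginning 0) (point))
            (put-text-property beg (point) 'face 'bold))))
      ;; The updated length of the inserted text.
      (- (point-max) (point-min)))))
```

   With the buffer containing "x <b>hi</b> y" and point at its start,
`(my-decode-bold 13)' should leave "x hi y" in the buffer, with "hi" in
`bold', and return 6.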
   We invite users to write Lisp programs to store and retrieve text
properties in files, using these hooks, and thus to experiment with
various data formats and find good ones.  Eventually we hope users will
produce good, general extensions we can install in Emacs.

   We suggest not trying to handle arbitrary Lisp objects as property
names or property values--because a program that general is probably
difficult to write, and slow.  Instead, choose a set of possible data
types that are reasonably flexible, and not too hard to encode.

   *Note Format Conversion::, for a related feature.

\1f
File: lispref.info,  Node: Substitution,  Next: Registers,  Prev: Text Properties,  Up: Text

Substituting for a Character Code
=================================

   The following functions replace characters within a specified region
based on their character codes.

 - Function: subst-char-in-region start end old-char new-char &optional
          noundo
     This function replaces all occurrences of the character OLD-CHAR
     with the character NEW-CHAR in the region of the current buffer
     defined by START and END.

     If NOUNDO is non-`nil', then `subst-char-in-region' does not
     record the change for undo and does not mark the buffer as
     modified.  This feature is used for controlling selective display
     (*note Selective Display::).

     `subst-char-in-region' does not move point and returns `nil'.

          ---------- Buffer: foo ----------
          This is the contents of the buffer before.
          ---------- Buffer: foo ----------
          
          (subst-char-in-region 1 20 ?i ?X)
               => nil
          
          ---------- Buffer: foo ----------
          ThXs Xs the contents of the buffer before.
          ---------- Buffer: foo ----------

 - Function: translate-region start end table
     This function applies a translation table to the characters in the
     buffer between positions START and END.  The translation table
     TABLE can be a string, a vector, or a char-table.

     If TABLE is a string, its Nth element is the mapping for the
     character with code N.

     If TABLE is a vector, its Nth element is the mapping for the
     character with code N.  Valid mappings are characters, strings, or
     `nil' (meaning don't replace).

     If TABLE is a char-table, its elements describe the mapping
     between characters and their replacements.  The char-table should
     be of type `char' or `generic'.

     When TABLE is a string or vector and its length is less than the
     total number of characters (256 without Mule), any characters
     whose codes are greater than or equal to the length of TABLE are
     not altered by the translation.

     The return value of `translate-region' is the number of characters
     that were actually changed by the translation.  This does not
     count characters that were mapped into themselves in the
     translation table.

     *NOTE*: Prior to XEmacs 21.2, the TABLE argument was allowed only
     to be a string.  At the time of writing, this was still the case
     in FSF Emacs.

     The following example creates a char-table that is passed to
     `translate-region', which translates character `a' to `the letter
     a', removes character `b', and translates character `c' to newline.

          ---------- Buffer: foo ----------
          Here is a sentence in the buffer.
          ---------- Buffer: foo ----------
          
          (let ((table (make-char-table 'generic)))
            (put-char-table ?a "the letter a" table)
            (put-char-table ?b "" table)
            (put-char-table ?c ?\n table)
            (translate-region (point-min) (point-max) table))
               => 3
          
          ---------- Buffer: foo ----------
          Here is the letter a senten
          e in the uffer.
          ---------- Buffer: foo ----------

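   Since a string TABLE is accepted in both XEmacs and FSF Emacs, the
following sketch uses one.  It builds a hypothetical 128-character
table that maps each ASCII code to itself except that `a' through `z'
map to their uppercase forms; characters with codes of 128 or more are
left alone because they lie beyond the end of the table.

```elisp
;; Hypothetical helper: a string translation table that upcases ASCII a-z.
(defun my-ascii-upcase-table ()
  "Return a 128-character translation table mapping a-z to A-Z."
  (let ((table (make-string 128 ?\0)))
    ;; Start with the identity mapping: character N maps to itself.
    (dotimes (i 128)
      (aset table i i))
    ;; Then override the lowercase letters.
    (let ((c ?a))
      (while (<= c ?z)
        (aset table c (upcase c))
        (setq c (1+ c))))
    table))
```

   In a buffer containing "Hello, World",
`(translate-region (point-min) (point-max) (my-ascii-upcase-table))'
should return 8 (only the lowercase letters change) and leave
"HELLO, WORLD" in the buffer.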
\1f
File: lispref.info,  Node: Registers,  Next: Transposition,  Prev: Substitution,  Up: Text

Registers
=========

   A register is a sort of variable used in XEmacs editing that can
hold a marker, a string, a rectangle, a window configuration (of one
frame), or a frame configuration (of all frames).  Each register is
named by a single character.  All characters, including control and
meta characters (but with the exception of `C-g'), can be used to name
registers.  Thus, there are 255 possible registers.  A register is
designated in Emacs Lisp by a character that is its name.

   The functions in this section return unpredictable values unless
otherwise stated.

 - Variable: register-alist
     This variable is an alist of elements of the form `(NAME .
     CONTENTS)'.  Normally, there is one element for each XEmacs
     register that has been used.

     The object NAME is a character (an integer) identifying the
     register.  The object CONTENTS is a string, marker, or list
     representing the register contents.  A string represents text
     stored in the register.  A marker represents a position.  A list
     represents a rectangle; its elements are strings, one per line of
     the rectangle.

 - Function: get-register reg
     This function returns the contents of the register REG, or `nil'
     if it has no contents.

 - Function: set-register reg value
     This function sets the contents of register REG to VALUE.  A
     register can be set to any value, but the other register functions
     expect only certain data types.  The return value is VALUE.

 - Command: view-register reg
     This command displays what is contained in register REG.

 - Command: insert-register reg &optional beforep
     This command inserts the contents of register REG into the current
     buffer.

     Normally, this command puts point before the inserted text, and the
     mark after it.  However, if the optional second argument BEFOREP
     is non-`nil', it puts the mark before and point after.  You can
     pass a non-`nil' second argument BEFOREP to this function
     interactively by supplying any prefix argument.

     If the register contains a rectangle, then the rectangle is
     inserted with its upper left corner at point.  This means that
     text is inserted in the current line and underneath it on
     successive lines.

     If the register contains something other than saved text (a
     string) or a rectangle (a list), currently useless things happen.
     This may be changed in the future.

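   The following sketch stores a string in a register, reads it back
with `get-register', and inserts it with `insert-register'.  The
function name is hypothetical, and register `z' is an arbitrary choice;
note that the sketch overwrites whatever that register held before.

```elisp
;; Hypothetical demo: store a string in register ?z and insert it.
;; This overwrites any previous contents of register ?z.
(defun my-register-demo ()
  "Save a string in register ?z and insert it into a scratch buffer."
  (set-register ?z "saved text")
  (with-temp-buffer
    (insert-register ?z)          ; a string register inserts its text
    (buffer-string)))
```

   `(my-register-demo)' should return "saved text", and afterwards
`(get-register ?z)' returns the same string.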
\1f
File: lispref.info,  Node: Transposition,  Next: Change Hooks,  Prev: Registers,  Up: Text

Transposition of Text
=====================

   This subroutine is used by the transposition commands.

 - Function: transpose-regions start1 end1 start2 end2 &optional
          leave-markers
     This function exchanges two nonoverlapping portions of the buffer.
     Arguments START1 and END1 specify the bounds of one portion and
     arguments START2 and END2 specify the bounds of the other portion.

     Normally, `transpose-regions' relocates markers with the transposed
     text; a marker previously positioned within one of the two
     transposed portions moves along with that portion, thus remaining
     between the same two characters in their new position.  However,
     if LEAVE-MARKERS is non-`nil', `transpose-regions' does not do
     this--it leaves all markers unrelocated.

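   To see the marker relocation described above, the following sketch
swaps "ab" with "123" in a scratch buffer while a marker sits between
`a' and `b'.  The function name is hypothetical.

```elisp
;; Hypothetical demo: swap two regions and watch a marker move with its text.
(defun my-transpose-demo ()
  "Return (TEXT MARKER-POS) after swapping \"ab\" with \"123\"."
  (with-temp-buffer
    (insert "ab-123")                 ; a=1 b=2 -=3 1=4 2=5 3=6
    (let ((m (copy-marker 2)))        ; marker between `a' and `b'
      (transpose-regions 1 3 4 7)     ; LEAVE-MARKERS is nil
      (list (buffer-string) (marker-position m)))))
```

   The buffer becomes "123-ab" and the marker, having moved with "ab",
is still between `a' and `b', now at position 6, so the result should
be `("123-ab" 6)'.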
\1f
File: lispref.info,  Node: Change Hooks,  Next: Transformations,  Prev: Transposition,  Up: Text

Change Hooks
============

   These hook variables let you arrange to take notice of all changes in
all buffers (or in a particular buffer, if you make them buffer-local).

   The functions you use in these hooks should save and restore the
match data if they do anything that uses regular expressions;
otherwise, they will interfere in bizarre ways with the editing
operations that call them.

   Buffer changes made while executing the following hooks don't
themselves cause any change hooks to be invoked.

 - Variable: before-change-functions
     This variable holds a list of functions to call before any buffer
     modification.  Each function gets two arguments, the beginning and
     end of the region that is about to change, represented as
     integers.  The buffer that is about to change is always the
     current buffer.

 - Variable: after-change-functions
     This variable holds a list of functions to call after any buffer
     modification.  Each function receives three arguments: the
     beginning and end of the region just changed, and the length of
     the text that existed before the change.  (To get the current
     length, subtract the region beginning from the region end.)  All
     three arguments are integers.  The buffer that has just changed is
     always the current buffer.

 - Variable: before-change-function
     This obsolete variable holds one function to call before any buffer
     modification (or `nil' for no function).  It is called just like
     the functions in `before-change-functions'.

 - Variable: after-change-function
     This obsolete variable holds one function to call after any buffer
     modification (or `nil' for no function).  It is called just like
     the functions in `after-change-functions'.

 - Variable: first-change-hook
     This variable is a normal hook that is run whenever a buffer is
     changed that was previously in the unmodified state.

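   The three arguments passed to `after-change-functions' can be
observed with a small logging function.  In this sketch the names
`my-change-log', `my-record-change' and `my-change-hook-demo' are
hypothetical; the hook is added buffer-locally (the fourth argument of
`add-hook') so that only the scratch buffer is affected.

```elisp
;; Hypothetical demo: record every change made in a scratch buffer.
(defvar my-change-log nil
  "List of (BEG END OLD-LEN) entries collected by `my-record-change'.")

(defun my-record-change (beg end old-len)
  "Hypothetical `after-change-functions' entry: log BEG, END and OLD-LEN."
  (push (list beg end old-len) my-change-log))

(defun my-change-hook-demo ()
  "Make two edits in a temporary buffer and return the logged changes."
  (setq my-change-log nil)
  (with-temp-buffer
    (insert "hello")                    ; made before the hook is added
    ;; LOCAL argument t: the hook applies to this buffer only.
    (add-hook 'after-change-functions #'my-record-change nil t)
    (goto-char (point-max))
    (insert " world")                   ; 6 characters at position 6
    (delete-region 1 3))                ; deletes "he"
  (nreverse my-change-log))
```

   `(my-change-hook-demo)' should return `((6 12 0) (1 1 2))': the
insertion produced a region of 6 new characters that replaced nothing,
and the deletion left an empty region where 2 characters used to be.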
\1f
File: lispref.info,  Node: Transformations,  Prev: Change Hooks,  Up: Text

Textual transformations--MD5 and base64 support
===============================================

   Some textual operations inherently require examining each character
in turn, and performing arithmetic operations on them.  Such operations
can, of course, be implemented in Emacs Lisp, but tend to be very slow
for large portions of text or data.  This is why some of them are
implemented in C, with an appropriate interface for Lisp programmers.
Examples of algorithms thus provided are MD5 and base64 support.

   MD5 is an algorithm for calculating message digests, as described in
RFC 1321.  Given a message of arbitrary length, MD5 produces a 128-bit
"fingerprint" ("message digest") corresponding to that message.  RFC
1321 conjectured that it is computationally infeasible to produce two
messages having the same MD5 digest, or to produce a message having a
prespecified target digest.  Practical methods for producing two
messages with the same MD5 digest have since been published, so MD5
should not be relied upon where collision resistance matters.  MD5 has
been widely used by various authentication schemes.

   The Emacs Lisp interface to MD5 consists of a single function, `md5':

 - Function: md5 object &optional start end
     This function returns the MD5 message digest of OBJECT, a buffer
     or string.

     Optional arguments START and END denote positions for computing
     the digest of a portion of OBJECT.

     Some examples of usage:

          ;; Calculate the digest of the entire buffer
          (md5 (current-buffer))
               => "8842b04362899b1cda8d2d126dc11712"
          
          ;; Calculate the digest of the current line
          (md5 (current-buffer) (point-at-bol) (point-at-eol))
               => "60614d21e9dee27dfdb01fa4e30d6d00"
          
          ;; Calculate the digest of your name and email address
          (md5 (format "%s <%s>" (user-full-name) user-mail-address))
               => "0a2188c40fd38922d941fe6032fce516"

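   The START and END arguments select a portion of a buffer.  The
following sketch, with a hypothetical function name, computes the
digest of "abc" once as a string and once as a slice of a buffer; both
should equal the value given for "abc" in the test suite of RFC 1321.

```elisp
;; Hypothetical helper: MD5 of a string versus MD5 of part of a buffer.
(defun my-md5-demo ()
  "Return the MD5 of \"abc\" computed two ways."
  (list (md5 "abc")
        (with-temp-buffer
          (insert "xabcx")
          ;; START and END select "abc" (positions 2 up to 5).
          (md5 (current-buffer) 2 5))))
```

   Both elements of the result should be
"900150983cd24fb0d6963f7d28e17f72".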
 607    Base64 is a portable encoding for arbitrary sequences of octets, in a
 608 form that need not be readable by humans.  It uses a 65-character subset
 609 of US-ASCII, as described in rfc2045.  Base64 is used by MIME to encode
 610 binary bodies, and to encode binary characters in message headers.
 611
 612    The Lisp interface to base64 consists of four functions:
 613
 614  - Function: base64-encode-region beg end &optional no-line-break
 615      This function encodes the region between BEG and END of the
 616      current buffer to base64 format.  This means that the original
 617      region is deleted, and replaced with its base64 equivalent.
 618
 619      Normally, encoded base64 output is multi-line, with 76-character
 620      lines.  If NO-LINE-BREAK is non-`nil', newlines will not be
 621      inserted, resulting in single-line output.
 622
 623      Mule note: you should make sure that you convert the multibyte
 624      characters (those that do not fit into 0-255 range) to something
 625      else, because they cannot be meaningfully converted to base64.  If
 626      the `base64-encode-region' encounters such characters, it will
 627      signal an error.
 628
 629      `base64-encode-region' returns the length of the encoded text.
 630
 631           ;; Encode the whole buffer in base64
 632           (base64-encode-region (point-min) (point-max))
 633
 634      The function can also be used interactively, in which case it
 635      works on the currently active region.
 636
 637  - Function: base64-encode-string string
 638      This function encodes STRING to base64, and returns the encoded
 639      string.
 640
 641      For Mule, the same considerations apply as for
 642      `base64-encode-region'.
 643
 644           (base64-encode-string "fubar")
 645               => "ZnViYXI="
 646
 647  - Function: base64-decode-region beg end
 648      This function decodes the region between BEG and END of the
 649      current buffer.  The region should be in base64 encoding.
 650
 651      If the region was decoded correctly, `base64-decode-region' returns
 652      the length of the decoded region.  If the decoding failed, `nil' is
 653      returned.
 654
 655           ;; Decode a base64 buffer, and replace it with the decoded version
 656           (base64-decode-region (point-min) (point-max))
 657
 658  - Function: base64-decode-string string
 659      This function decodes STRING from base64, and returns the decoded
 660      string.  STRING should be valid base64-encoded text.
 661
 662      If decoding was not possible, `nil' is returned.
 663
 664           (base64-decode-string "ZnViYXI=")
 665               => "fubar"
 666
 667           (base64-decode-string "totally bogus")
 668               => nil
 669
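     Decoding inverts encoding, so a round trip should return the original
     string.  The hypothetical predicate below checks this.  It assumes
     STRING contains only characters in the 0-255 range, as required by
     the Mule note above.  The test includes a long input so that the line
     breaks inserted by the encoder are also exercised when decoding.

```elisp
(defun my-base64-round-trip-p (string)
  "Hypothetical check: non-nil if STRING survives encoding and decoding.
STRING must contain only characters in the 0-255 range."
  (equal string (base64-decode-string (base64-encode-string string))))

(my-base64-round-trip-p "fubar")
;; => t
```
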
 670 \1f
 671 File: lispref.info,  Node: Searching and Matching,  Next: Syntax Tables,  Prev: Text,  Up: Top
 672
 673 Searching and Matching
 674 **********************
 675
 676    XEmacs provides two ways to search through a buffer for specified
 677 text: exact string searches and regular expression searches.  After a
 678 regular expression search, you can examine the "match data" to
 679 determine which text matched the whole regular expression or various
 680 portions of it.
 681
 682 * Menu:
 683
 684 * String Search::         Search for an exact match.
 685 * Regular Expressions::   Describing classes of strings.
 686 * Regexp Search::         Searching for a match for a regexp.
 687 * POSIX Regexps::         Searching POSIX-style for the longest match.
 688 * Search and Replace::    Internals of `query-replace'.
 689 * Match Data::            Finding out which part of the text matched
 690                             various parts of a regexp, after regexp search.
 691 * Searching and Case::    Case-independent or case-significant searching.
 692 * Standard Regexps::      Useful regexps for finding sentences, pages,...
 693
 694    The `skip-chars...' functions also perform a kind of searching.
 695 *Note Skipping Characters::.
 696
 697 \1f
 698 File: lispref.info,  Node: String Search,  Next: Regular Expressions,  Up: Searching and Matching
 699
 700 Searching for Strings
 701 =====================
 702
 703    These are the primitive functions for searching through the text in a
 704 buffer.  They are meant for use in programs, but you may call them
 705 interactively.  If you do so, they prompt for the search string; LIMIT
 706 and NOERROR are set to `nil', and REPEAT is set to 1.
 707
 708  - Command: search-forward string &optional limit noerror repeat
 709      This function searches forward from point for an exact match for
 710      STRING.  If successful, it sets point to the end of the occurrence
 711      found, and returns the new value of point.  If no match is found,
 712      the value and side effects depend on NOERROR (see below).
 713
 714      In the following example, point is initially at the beginning of
 715      the line.  Then `(search-forward "fox")' moves point after the last
 716      letter of `fox':
 717
 718           ---------- Buffer: foo ----------
 719           -!-The quick brown fox jumped over the lazy dog.
 720           ---------- Buffer: foo ----------
 721
 722           (search-forward "fox")
 723                => 20
 724
 725           ---------- Buffer: foo ----------
 726           The quick brown fox-!- jumped over the lazy dog.
 727           ---------- Buffer: foo ----------
 728
 729      The argument LIMIT specifies the upper bound to the search.  (It
 730      must be a position in the current buffer.)  No match extending
 731      after that position is accepted.  If LIMIT is omitted or `nil', it
 732      defaults to the end of the accessible portion of the buffer.
 733
 734      What happens when the search fails depends on the value of
 735      NOERROR.  If NOERROR is `nil', a `search-failed' error is
 736      signaled.  If NOERROR is `t', `search-forward' returns `nil' and
 737      does nothing.  If NOERROR is neither `nil' nor `t', then
 738      `search-forward' moves point to the upper bound and returns `nil'.
 739      (It would be more consistent now to return the new position of
 740      point in that case, but some programs may depend on a value of
 741      `nil'.)
 742
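     The three NOERROR behaviors can be compared side by side.  In this
     sketch, the function name is hypothetical.  Each search starts at the
     beginning of a buffer that does not contain `cat'.  The buffer holds
     19 characters, so its end is position 20.

```elisp
(defun my-search-noerror-demo ()
  "Hypothetical demo of the NOERROR argument of `search-forward'."
  (with-temp-buffer
    (insert "The quick brown fox")
    (list
     ;; NOERROR nil: a failed search signals `search-failed'.
     (progn (goto-char (point-min))
            (condition-case nil
                (search-forward "cat")
              (search-failed 'signaled)))
     ;; NOERROR t: return nil and leave point where it was.
     (progn (goto-char (point-min))
            (list (search-forward "cat" nil t) (point)))
     ;; Any other non-nil NOERROR: move point to the bound, return nil.
     (progn (goto-char (point-min))
            (list (search-forward "cat" nil 'move) (point))))))

(my-search-noerror-demo)
;; => (signaled (nil 1) (nil 20))
```
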
 743      If REPEAT is supplied (it must be a positive number), then the
 744      search is repeated that many times (each time starting at the end
 745      of the previous time's match).  If these successive searches
 746      succeed, the function succeeds, moving point and returning its new
 747      value.  Otherwise the search fails.
 748
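     Here is a sketch of REPEAT.  It uses a hypothetical helper that
     returns the position just after the Nth occurrence of a string.  The
     helper returns `nil' when there are fewer than N occurrences, because
     NOERROR is `t' here.

```elisp
(defun my-nth-occurrence-end (string text n)
  "Hypothetical helper: position just after the Nth STRING in TEXT, or nil."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; REPEAT = N; NOERROR = t so that too few matches yield nil.
    (search-forward string nil t n)))

(my-nth-occurrence-end "fish" "one fish two fish red fish" 2)
;; => 18
```
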
 749  - Command: search-backward string &optional limit noerror repeat
 750      This function searches backward from point for STRING.  It is just
 751      like `search-forward' except that it searches backward and leaves
 752      point at the beginning of the match.
 753
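     This example uses the buffer text from the `search-forward' example,
     with point at the end of the text.  A backward search leaves point at
     the `f' of `fox' and returns that position.

```elisp
(with-temp-buffer
  (insert "The quick brown fox jumped over the lazy dog.")
  ;; After `insert', point is at the end of the buffer.
  (list (search-backward "fox")   ; new position of point
        (looking-at "fox")))      ; point is at the start of the match
;; => (17 t)
```
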
 754  - Command: word-search-forward string &optional limit noerror repeat
 755      This function searches forward from point for a "word" match for
 756      STRING.  If it finds a match, it sets point to the end of the
 757      match found, and returns the new value of point.
 758
 759      Word matching regards STRING as a sequence of words, disregarding
 760      punctuation that separates them.  It searches the buffer for the
 761      same sequence of words.  Each word must be distinct in the buffer
 762      (searching for the word `ball' does not match the word `balls'),
 763      but the details of punctuation and spacing are ignored (searching
 764      for `ball boy' does match `ball.  Boy!').
 765
 766      In this example, point is initially at the beginning of the
 767      buffer; the search leaves it between the `y' and the `!'.
 768
 769           ---------- Buffer: foo ----------
 770           -!-He said "Please!  Find
 771           the ball boy!"
 772           ---------- Buffer: foo ----------
 773
 774           (word-search-forward "Please find the ball, boy.")
 775                => 36
 776
 777           ---------- Buffer: foo ----------
 778           He said "Please!  Find
 779           the ball boy-!-!"
 780           ---------- Buffer: foo ----------
 781
 782      If LIMIT is non-`nil' (it must be a position in the current
 783      buffer), then it is the upper bound to the search.  The match
 784      found must not extend after that position.
 785
 786      If NOERROR is `nil', then `word-search-forward' signals an error
 787      if the search fails.  If NOERROR is `t', then it returns `nil'
 788      instead of signaling an error.  If NOERROR is neither `nil' nor
 789      `t', it moves point to LIMIT (or the end of the buffer) and
 790      returns `nil'.
 791
 792      If REPEAT is non-`nil', then the search is repeated that many
 793      times.  Point is positioned at the end of the last match.
 794
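     Here is a small sketch of word matching.  It assumes the default
     non-`nil' value of `case-fold-search'.  The search skips `balls boy',
     because `ball' must be a whole word.  It matches `ball.  Boy' even
     though that text has different punctuation, spacing, and
     capitalization.

```elisp
(with-temp-buffer
  (insert "the balls boy ... the ball.  Boy!")
  (goto-char (point-min))
  ;; Returns the position just after the matched "Boy".
  (word-search-forward "ball boy" nil t))
;; => 33
```
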
 795  - Command: word-search-backward string &optional limit noerror repeat
 796      This function searches backward from point for a word match to
 797      STRING.  This function is just like `word-search-forward' except
 798      that it searches backward and normally leaves point at the
 799      beginning of the match.
 800
 801 \1f
 802 File: lispref.info,  Node: Regular Expressions,  Next: Regexp Search,  Prev: String Search,  Up: Searching and Matching
 803
 804 Regular Expressions
 805 ===================
 806
 807    A "regular expression" ("regexp", for short) is a pattern that
 808 denotes a (possibly infinite) set of strings.  Searching for matches for
 809 a regexp is a very powerful operation.  This section explains how to
 810 write regexps; the following section says how to search for them.
 811
 812    To gain a thorough understanding of regular expressions and how to
 813 use them to best advantage, we recommend that you study `Mastering
 814 Regular Expressions, by Jeffrey E.F. Friedl, O'Reilly and Associates,
 815 1997'. (It's known as the "Hip Owls" book, because of the picture on its
 816 cover.)  You might also read the manuals for *Note (gawk)Top::, *Note
 817 (ed)Top::, `sed', `grep', *Note (perl)Top::, *Note (regex)Top::, *Note
 818 (rx)Top::, `pcre', and *Note (flex)Top::, which also make good use of
 819 regular expressions.
 820
 821    The XEmacs regular expression syntax most closely resembles that of
 822 `ed' or `grep', the GNU versions of which both use the GNU `regex'
 823 library.  XEmacs' version of `regex' has been extended with some
 824 Perl-like capabilities, described in the next section.
 825
 826 * Menu:
 827
 828 * Syntax of Regexps::       Rules for writing regular expressions.
 829 * Regexp Example::          Illustrates regular expression syntax.
 830
 831 \1f
 832 File: lispref.info,  Node: Syntax of Regexps,  Next: Regexp Example,  Up: Regular Expressions
 833
 834 Syntax of Regular Expressions
 835 -----------------------------
 836
 837    Regular expressions have a syntax in which a few characters are
 838 special constructs and the rest are "ordinary".  An ordinary character
 839 is a simple regular expression that matches that character and nothing
 840 else.  The special characters are `.', `*', `+', `?', `[', `]', `^',
 841 `$', and `\'; no new special characters will be defined in the future.
 842 Any other character appearing in a regular expression is ordinary,
 843 unless a `\' precedes it.
 844
 845    For example, `f' is not a special character, so it is ordinary, and
 846 therefore `f' is a regular expression that matches the string `f' and
 847 no other string.  (It does _not_ match the string `ff'.)  Likewise, `o'
 848 is a regular expression that matches only `o'.
 849
 850    Any two regular expressions A and B can be concatenated.  The result
 851 is a regular expression that matches a string if A matches some amount
 852 of the beginning of that string and B matches the rest of the string.
 853
 854    As a simple example, we can concatenate the regular expressions `f'
 855 and `o' to get the regular expression `fo', which matches only the
 856 string `fo'.  Still trivial.  To do something more powerful, you need
 857 to use one of the special characters.  Here is a list of them:
 858
 859 `. (Period)'
 860      is a special character that matches any single character except a
 861      newline.  Using concatenation, we can make regular expressions
 862      like `a.b', which matches any three-character string that begins
 863      with `a' and ends with `b'.
 864
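     The following `string-match' calls illustrate `.'.  `string-match'
     returns the index where the match starts, or `nil' if there is no
     match.

```elisp
(list (string-match "a.b" "xxa-bxx")   ; `.' matches the `-'
      (string-match "a.b" "a\nb"))     ; `.' does not match a newline
;; => (2 nil)
```
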
 865 `*'
 866      is not a construct by itself; it is a quantifying suffix operator
 867      that means to repeat the preceding regular expression as many
 868      times as possible.  In `fo*', the `*' applies to the `o', so `fo*'
 869      matches one `f' followed by any number of `o's.  The case of zero
 870      `o's is allowed: `fo*' does match `f'.
 871
 872      `*' always applies to the _smallest_ possible preceding
 873      expression.  Thus, `fo*' has a repeating `o', not a repeating `fo'.
 874
 875      The matcher processes a `*' construct by matching, immediately, as
 876      many repetitions as can be found; it is "greedy".  Then it
 877      continues with the rest of the pattern.  If that fails,
 878      backtracking occurs, discarding some of the matches of the
 879      `*'-modified construct in case that makes it possible to match the
 880      rest of the pattern.  For example, in matching `ca*ar' against the
 881      string `caaar', the `a*' first tries to match all three `a's; but
 882      the rest of the pattern is `ar' and there is only `r' left to
 883      match, so this try fails.  The next alternative is for `a*' to
 884      match only two `a's.  With this choice, the rest of the regexp
 885      matches successfully.
 886
 887      Nested repetition operators can be extremely slow if they specify
 888      backtracking loops.  For example, it could take hours for the
 889      regular expression `\(x+y*\)*a' to try to match the sequence
 890      `xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxz'.  The slowness is because
 891      Emacs must try each imaginable way of grouping the 35 `x''s before
 892      concluding that none of them can work.  To make sure your regular
 893      expressions run fast, check nested repetitions carefully.
 894
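     These calls illustrate greediness, backtracking, and zero repetitions.
     `match-end' returns the index just after the matched text.

```elisp
(list
 ;; `a*' first takes all three a's, then backtracks by one so that
 ;; the rest of the pattern, "ar", can match.
 (and (string-match "ca*ar" "caaar") (match-end 0))
 ;; Zero repetitions are allowed: "fo*" matches the string "f".
 (string-match "\\`fo*\\'" "f"))
;; => (5 0)
```
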
 895 `+'
 896      is a quantifying suffix operator similar to `*' except that the
 897      preceding expression must match at least once.  It is also
 898      "greedy".  So, for example, `ca+r' matches the strings `car' and
 899      `caaaar' but not the string `cr', whereas `ca*r' matches all three
 900      strings.
 901
 902 `?'
 903      is a quantifying suffix operator similar to `*', except that the
 904      preceding expression can match either once or not at all.  For
 905      example, `ca?r' matches `car' or `cr', but does not match anything
 906      else.
 907
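     You can compare `*', `+', and `?' on the same inputs.  The
     hypothetical helper below places the pattern between `\`' and `\''
     (described later in this section), so that only whole-string matches
     count.

```elisp
(defun my-full-match-p (regexp string)
  "Hypothetical helper: non-nil if REGEXP matches all of STRING."
  ;; \` and \' anchor the match at the start and end of STRING.
  (and (string-match (concat "\\`\\(" regexp "\\)\\'") string) t))

(mapcar (lambda (s)
          (list s
                (my-full-match-p "ca*r" s)
                (my-full-match-p "ca+r" s)
                (my-full-match-p "ca?r" s)))
        '("cr" "car" "caaaar"))
;; => (("cr" t nil t) ("car" t t t) ("caaaar" t t nil))
```
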
 908 `*?'
 909      works just like `*', except that rather than matching the longest
 910      match, it matches the shortest match.  `*?' is known as a
 911      "non-greedy" quantifier, a regexp construct borrowed from Perl.
 912
 913      This construct is very useful when you want to match the text
 914      inside a pair of delimiters.  For instance, `/\*.*?\*/' will match
 915      single-line C comments in a string.  This would be much harder to
 916      achieve with a greedy quantifier alone.
 917
 918      This construct was not available prior to XEmacs 20.4.  It is
 919      not available in FSF Emacs.
 920
 921 `+?'
 922      is the `+' analog to `*?'.
 923
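     This sketch compares the greedy and non-greedy forms on a line that
     contains two C comments.  Both comments are on one line, because `.'
     does not match a newline.  The last call applies `+?' to a run of
     `a's.

```elisp
(let ((s "a /* one */ b /* two */ c"))
  (list
   ;; Non-greedy: stops at the first "*/".
   (and (string-match "/\\*.*?\\*/" s) (match-string 0 s))
   ;; Greedy: runs on to the last "*/" on the line.
   (and (string-match "/\\*.*\\*/" s) (match-string 0 s))
   ;; `+?' still needs one `a', but takes no more than that.
   (progn (string-match "a+?" "aaa") (match-end 0))))
;; => ("/* one */" "/* one */ b /* two */" 1)
```
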
 924 `\{n,m\}'
 925      serves as an interval quantifier, analogous to `*' or `+', but
 926      specifies that the expression must match at least N times, but no
 927      more than M times.  This syntax is supported by most Unix regexp
 928      utilities, and was introduced in XEmacs version 20.3.
 929
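     Here is a quick check of the interval quantifier.  In a Lisp string,
     each `\' must be doubled, so `\{2,3\}' is written `\\{2,3\\}'.

```elisp
(mapcar (lambda (s) (and (string-match "\\`a\\{2,3\\}\\'" s) t))
        '("a" "aa" "aaa" "aaaa"))
;; => (nil t t nil)
```
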
 930 `[ ... ]'
 931      `[' begins a "character set", which is terminated by a `]'.  In
 932      the simplest case, the characters between the two brackets form
 933      the set.  Thus, `[ad]' matches either one `a' or one `d', and
 934      `[ad]*' matches any string composed of just `a's and `d's
 935      (including the empty string), from which it follows that `c[ad]*r'
 936      matches `cr', `car', `cdr', `caddaar', etc.
 937
 938      The usual regular expression special characters are not special
 939      inside a character set.  A completely different set of special
 940      characters exists inside character sets: `]', `-' and `^'.
 941
 942      `-' is used for ranges of characters.  To write a range, write two
 943      characters with a `-' between them.  Thus, `[a-z]' matches any
 944      lower case letter.  Ranges may be intermixed freely with individual
 945      characters, as in `[a-z$%.]', which matches any lower case letter
 946      or `$', `%', or a period.
 947
 948      To include a `]' in a character set, make it the first character.
 949      For example, `[]a]' matches `]' or `a'.  To include a `-', write
 950      `-' as the first character in the set, or put it immediately after
 951      a range.  (You can replace one individual character C with the
 952      range `C-C' to make a place to put the `-'.)  There is no way to
 953      write a set containing just `-' and `]'.
 954
 955      To include `^' in a set, put it anywhere but at the beginning of
 956      the set.
 957
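     These calls exercise the rules above.  They cover a plain set under
     `*', a set with `]' placed first, a range mixed with single
     characters, and a set with `-' placed first.

```elisp
(list (string-match "\\`c[ad]*r\\'" "caddaar")  ; whole string matches
      (string-match "[]a]" "x]")                 ; `]' first is literal
      (string-match "[a-z$%.]" "@%")             ; `%' is in the set
      (string-match "[-a]" "x-"))                ; `-' first is literal
;; => (0 1 1 1)
```
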
 958 `[^ ... ]'
 959      `[^' begins a "complement character set", which matches any
 960      character except the ones specified.  Thus, `[^a-z0-9A-Z]' matches
 961      all characters _except_ letters and digits.
 962
 963      `^' is not special in a character set unless it is the first
 964      character.  The character following the `^' is treated as if it
 965      were first (thus, `-' and `]' are not special there).
 966
 967      Note that a complement character set can match a newline, unless
 968      newline is mentioned as one of the characters not to match.
 969
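     A complement set excludes only the characters it lists.  It therefore
     matches a newline unless the newline is listed.

```elisp
(list (string-match "[^a-z0-9A-Z]" "abc-def")  ; first non-alphanumeric
      (string-match "[^a]" "\n")               ; newline is matched
      (string-match "[^a\n]" "\n"))            ; newline excluded
;; => (3 0 nil)
```
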
 970 `^'
 971      is a special character that matches the empty string, but only at
 972      the beginning of a line in the text being matched.  Otherwise it
 973      fails to match anything.  Thus, `^foo' matches a `foo' that occurs
 974      at the beginning of a line.
 975
 976      When matching a string instead of a buffer, `^' matches at the
 977      beginning of the string or after a newline character `\n'.
 978
 979 `$'
 980      is similar to `^' but matches only at the end of a line.  Thus,
 981      `x+$' matches a string of one `x' or more at the end of a line.
 982
 983      When matching a string instead of a buffer, `$' matches at the end
 984      of the string or before a newline character `\n'.
 985
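     When matching a string, `^' and `$' also match next to an embedded
     newline, as these calls show.

```elisp
(list (string-match "^foo" "bar\nfoo")   ; after the newline
      (string-match "^foo" "barfoo")     ; not at a line start
      (string-match "x+$" "axx\nb")      ; just before the newline
      (string-match "x+$" "axxb"))       ; not at a line end
;; => (4 nil 1 nil)
```
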
 986 `\'
 987      has two functions: it quotes the special characters (including
 988      `\'), and it introduces additional special constructs.
 989
 990      Because `\' quotes special characters, `\$' is a regular
 991      expression that matches only `$', and `\[' is a regular expression
 992      that matches only `[', and so on.
 993
 994      Note that `\' also has special meaning in the read syntax of Lisp
 995      strings (*note String Type::), and must be quoted with `\'.  For
 996      example, the regular expression that matches the `\' character is
 997      `\\'.  To write a Lisp string that contains the characters `\\',
 998      Lisp syntax requires you to quote each `\' with another `\'.
 999      Therefore, the read syntax for a regular expression matching `\'
1000      is `"\\\\"'.
1001
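     These calls quote special characters with `\' in regexps written as
     Lisp strings.  Note the doubled backslashes that the string read
     syntax requires.

```elisp
(list (string-match "\\$" "cost: $5")   ; regexp \$ matches a literal $
      (string-match "\\[" "a[b")        ; regexp \[ matches a literal [
      (string-match "\\\\" "a\\b"))     ; regexp \\ matches one backslash
;; => (6 1 1)
```
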
1002    *Please note:* For historical compatibility, special characters are
1003 treated as ordinary ones if they are in contexts where their special
1004 meanings make no sense.  For example, `*foo' treats `*' as ordinary
1005 since there is no preceding expression on which the `*' can act.  It is
1006 poor practice to depend on this behavior; quote the special character
1007 anyway, regardless of where it appears.
1008
1009    For the most part, `\' followed by any character matches only that
1010 character.  However, there are several exceptions: characters that,
1011 when preceded by `\', are special constructs.  Such characters are
1012 always ordinary when encountered on their own.  Here is a table of `\'
1013 constructs:
1014
1015 `\|'
1016      specifies an alternative.  Two regular expressions A and B with
1017      `\|' in between form an expression that matches anything that
1018      either A or B matches.
1019
1020      Thus, `foo\|bar' matches either `foo' or `bar' but no other string.
1021
1022      `\|' applies to the largest possible surrounding expressions.
1023      Only a surrounding `\( ... \)' grouping can limit the grouping
1024      power of `\|'.
1025
1026      Full backtracking capability exists to handle multiple uses of
1027      `\|'.
1028
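     `\|' applies to the largest surrounding expression.  As a result, the
     first pattern below means `\`foo' or `bar\''.  It does not require a
     whole-string match of either word.  The group in the second pattern
     limits the alternation.

```elisp
(list
 ;; Alternatives are "\`foo" and "bar\'": "foox" starts with foo.
 (string-match "\\`foo\\|bar\\'" "foox")
 ;; The group restricts \| so the whole string must be foo or bar.
 (string-match "\\`\\(foo\\|bar\\)\\'" "foox"))
;; => (0 nil)
```
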
1029 `\( ... \)'
1030      is a grouping construct that serves three purposes:
1031
1032        1. To enclose a set of `\|' alternatives for other operations.
1033           Thus, `\(foo\|bar\)x' matches either `foox' or `barx'.
1034
1035        2. To enclose an expression for a suffix operator such as `*' to
1036           act on.  Thus, `ba\(na\)*' matches `bananana', etc., with any
1037           (zero or more) number of `na' strings.
1038
1039        3. To record a matched substring for future reference.
1040
1041      This last application is not a consequence of the idea of a
1042      parenthetical grouping; it is a separate feature that happens to be
1043      assigned as a second meaning to the same `\( ... \)' construct
1044      because there is no conflict in practice between the two meanings.
1045      Here is an explanation of this feature:
1046
1047 `\DIGIT'
1048      matches the same text that matched the DIGITth occurrence of a `\(
1049      ... \)' construct.
1050
1051      In other words, after the end of a `\( ... \)' construct, the
1052      matcher remembers the beginning and end of the text matched by that
1053      construct.  Then, later on in the regular expression, you can use
1054      `\' followed by DIGIT to match that same text, whatever it may
1055      have been.
1056
1057      The strings matching the first nine `\( ... \)' constructs
1058      appearing in a regular expression are assigned numbers 1 through 9
1059      in the order that the open parentheses appear in the regular
1060      expression.  So you can use `\1' through `\9' to refer to the text
1061      matched by the corresponding `\( ... \)' constructs.
1062
1063      For example, `\(.*\)\1' matches any newline-free string that is
1064      composed of two identical halves.  The `\(.*\)' matches the first
1065      half, which may be anything, but the `\1' that follows must match
1066      the same exact text.
1067
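     The following calls illustrate back references and the first two uses
     of grouping.  After a successful `string-match', `match-string'
     returns the text matched by a numbered group.

```elisp
(let ((s "abcabc"))
  (list
   ;; \1 must repeat exactly the text that group 1 matched.
   (and (string-match "\\`\\(.*\\)\\1\\'" s) (match-string 1 s))
   (string-match "\\`\\(.*\\)\\1\\'" "abcab")
   ;; A group enclosing alternatives.
   (and (string-match "\\(foo\\|bar\\)x" "a barx") (match-string 0 "a barx"))
   ;; A group under a suffix operator: ba followed by three "na".
   (and (string-match "ba\\(na\\)*" "bananana") (match-end 0))))
;; => ("abc" nil "barx" 8)
```
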
1068 `\(?: ... \)'
1069      is called a "shy" grouping operator, and it is used just like `\(
1070      ... \)', except that it does not cause the matched substring to be
1071      recorded for future reference.
1072
1073      This is useful when you need a lot of grouping constructs, but only
1074      want to remember one or two.  Then you can use `\(?: ... \)' for the
1075      groups you do not want to remember for later use with `match-string'.
1076
1077      Using `\(?: ... \)' rather than `\( ... \)' when you don't need
1078      the captured substrings ought to speed up your programs somewhat,
1079      since it shortens the code path followed by the regular expression
1080      engine, as well as the amount of memory allocation and string
1081      copying it must do.  The actual performance gain to be observed
1082      has not been measured or quantified as of this writing.
1083
1084      The shy grouping operator was borrowed from Perl.  It was not
1085      available prior to XEmacs 20.3, and it is not available in FSF
1086      Emacs.
1087
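     In this sketch, a shy group holds alternatives whose text is not
     needed later.  As a result, the only numbered group is the one that
     captures the digits.

```elisp
(let ((s "bar-42"))
  (when (string-match "\\(?:foo\\|bar\\)-\\([0-9]+\\)" s)
    ;; The shy group gets no number, so group 1 is the digits.
    (match-string 1 s)))
;; => "42"
```
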
1088 `\w'
1089      matches any word-constituent character.  The editor syntax table
1090      determines which characters these are.  *Note Syntax Tables::.
1091
1092 `\W'
1093      matches any character that is not a word constituent.
1094
1095 `\sCODE'
1096      matches any character whose syntax is CODE.  Here CODE is a
1097      character that represents a syntax code: thus, `w' for word
1098      constituent, `-' for whitespace, `(' for open parenthesis, etc.
1099      *Note Syntax Tables::, for a list of syntax codes and the
1100      characters that stand for them.
1101
1102 `\SCODE'
1103      matches any character whose syntax is not CODE.
1104
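     `string-match' consults the current buffer's syntax table, so these
     calls run in a temporary buffer.  This assumes that a new buffer uses
     the standard syntax table.  In that table, letters are word
     constituents, a comma is punctuation, and a space is whitespace.

```elisp
(with-temp-buffer
  (let ((s "  hello, world"))
    (list (and (string-match "\\w+" s) (match-string 0 s))  ; a word
          (string-match "\\W" "abc,def")                   ; the comma
          (string-match "\\s-+" "foo   bar")               ; whitespace
          (string-match "\\S-" "   x"))))                  ; non-whitespace
;; => ("hello" 3 3 3)
```
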
1105    The following regular expression constructs match the empty
1106 string--that is, they don't use up any characters--but whether they
1107 match depends on the context.
1108
1109 `\`'
1110      matches the empty string, but only at the beginning of the buffer
1111      or string being matched against.
1112
1113 `\''
1114      matches the empty string, but only at the end of the buffer or
1115      string being matched against.
1116
1117 `\='
1118      matches the empty string, but only at point.  (This construct is
1119      not defined when matching against a string.)
1120
1121 `\b'
1122      matches the empty string, but only at the beginning or end of a
1123      word.  Thus, `\bfoo\b' matches any occurrence of `foo' as a
1124      separate word.  `\bballs?\b' matches `ball' or `balls' as a
1125      separate word.
1126
1127 `\B'
1128      matches the empty string, but _not_ at the beginning or end of a
1129      word.
1130
1131 `\<'
1132      matches the empty string, but only at the beginning of a word.
1133
1134 `\>'
1135      matches the empty string, but only at the end of a word.
1136
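     This sketch exercises the empty-string constructs.  It runs in a
     temporary buffer, again assuming the standard syntax table.  The last
     call uses the buffer text, because `\=' refers to point.

```elisp
(with-temp-buffer
  (list (string-match "\\bballs?\\b" "the balls here")   ; separate word
        (string-match "\\bfoo\\b" "foobar")              ; not a word
        (string-match "\\<bar" "foobar bar")             ; word start
        (string-match "foo\\>" "foobar foo")             ; word end
        (string-match "\\`abc\\'" "abcd")                ; whole string
        ;; \= matches only at point, which is before the first "b".
        (progn (insert "abcb")
               (goto-char 2)
               (re-search-forward "\\=b" nil t))))
;; => (4 nil 7 7 nil 3)
```
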
1137    Not every string is a valid regular expression.  For example, a
1138 string with unbalanced square brackets is invalid (with a few
1139 exceptions, such as `[]]'), and so is a string that ends with a single
1140 `\'.  If an invalid regular expression is passed to any of the search
1141 functions, an `invalid-regexp' error is signaled.
1142
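     To check a regexp before using it, you can catch `invalid-regexp'.
     The predicate name below is hypothetical.  It matches against an
     empty string only to force the regexp to be compiled.

```elisp
(defun my-valid-regexp-p (regexp)
  "Hypothetical helper: non-nil if REGEXP is a valid regular expression."
  (condition-case nil
      (progn (string-match regexp "") t)
    (invalid-regexp nil)))

(list (my-valid-regexp-p "[]]")     ; one of the bracket exceptions
      (my-valid-regexp-p "[abc")    ; unbalanced bracket
      (my-valid-regexp-p "abc\\"))  ; ends with a single backslash
;; => (t nil nil)
```
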
1143  - Function: regexp-quote string
1144      This function returns a regular expression string that matches
1145      exactly STRING and nothing else.  This allows you to request an
1146      exact string match when calling a function that wants a regular
1147      expression.
1148
1149           (regexp-quote "^The cat$")
1150                => "\\^The cat\\$"
1151
1152      One use of `regexp-quote' is to combine an exact string match with
1153      context described as a regular expression.  For example, this
1154      searches for the string that is the value of `string', surrounded
1155      by whitespace:
1156
1157           (re-search-forward
1158            (concat "\\s-" (regexp-quote string) "\\s-"))
1159
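     This sketch applies the same idea to a string rather than the buffer.
     The helper name is hypothetical.  It runs in a temporary buffer so
     that `\s-' uses the standard syntax table.  Without `regexp-quote',
     the `+' in `1+1' would act as a repetition operator, and the pattern
     would also match ` 11 '.

```elisp
(defun my-find-delimited (string text)
  "Hypothetical helper: index in TEXT where STRING occurs between
whitespace characters, or nil.  STRING is matched literally."
  (with-temp-buffer              ; standard syntax table for \s-
    (string-match (concat "\\s-" (regexp-quote string) "\\s-") text)))

(list (regexp-quote "1+1")                    ; `+' is quoted
      (my-find-delimited "1+1" "is 1+1 two")  ; index of the space
      (my-find-delimited "1+1" "is 11 two"))  ; "11" is not "1+1"
;; => ("1\\+1" 2 nil)
```
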