This is ../info/lispref.info, produced by makeinfo version 4.6 from
lispref/lispref.texi.

INFO-DIR-SECTION XEmacs Editor
START-INFO-DIR-ENTRY
* Lispref: (lispref).		XEmacs Lisp Reference Manual.
END-INFO-DIR-ENTRY

   Edition History:

   GNU Emacs Lisp Reference Manual Second Edition (v2.01), May 1993
   GNU Emacs Lisp Reference Manual Further Revised (v2.02), August 1993
   Lucid Emacs Lisp Reference Manual (for 19.10) First Edition, March 1994
   XEmacs Lisp Programmer's Manual (for 19.12) Second Edition, April 1995
   GNU Emacs Lisp Reference Manual v2.4, June 1995
   XEmacs Lisp Programmer's Manual (for 19.13) Third Edition, July 1995
   XEmacs Lisp Reference Manual (for 19.14 and 20.0) v3.1, March 1996
   XEmacs Lisp Reference Manual (for 19.15 and 20.1, 20.2, 20.3) v3.2,
   April, May, November 1997
   XEmacs Lisp Reference Manual (for 21.0) v3.3, April 1998

   Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995 Free Software
Foundation, Inc.  Copyright (C) 1994, 1995 Sun Microsystems, Inc.
Copyright (C) 1995, 1996 Ben Wing.

   Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided also
that the section entitled "GNU General Public License" is included
exactly as in the original, and provided that the entire resulting
derived work is distributed under the terms of a permission notice
identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that the section entitled "GNU General Public License"
may be included in a translation approved by the Free Software
Foundation instead of in the original English.


File: lispref.info,  Node: The LDAP Lisp Object,  Next: Opening and Closing a LDAP Connection,  Prev: The Low-Level LDAP API,  Up: The Low-Level LDAP API

The LDAP Lisp Object
....................

An internal built-in `ldap' Lisp object represents an LDAP connection.

 - Function: ldapp object
     This function returns non-`nil' if OBJECT is a `ldap' object.

 - Function: ldap-host ldap
     Return the server host of the connection represented by LDAP.

 - Function: ldap-live-p ldap
     Return non-`nil' if LDAP is an active LDAP connection.


File: lispref.info,  Node: Opening and Closing a LDAP Connection,  Next: Low-level Operations on a LDAP Server,  Prev: The LDAP Lisp Object,  Up: The Low-Level LDAP API

Opening and Closing a LDAP Connection
.....................................

 - Function: ldap-open host &optional plist
     Open a LDAP connection to HOST.  PLIST is a property list
     containing additional parameters for the connection.  Valid keys
     in that list are:
    `port'
          The TCP port to use for the connection if different from
          `ldap-default-port' or the library builtin value.

    `auth'
          The authentication method to use.  Possible values depend on
          the LDAP library XEmacs was compiled with; they may include
          `simple', `krbv41' and `krbv42'.

    `binddn'
          The distinguished name of the user to bind as.  This may look
          like `cn=Babs Jensen, o=Acme, c=com'; see RFC 1779 for
          details.

    `passwd'
          The password to use for authentication.

    `deref'
          The dereference policy is one of the symbols `never',
          `always', `search' or `find' and defines how aliases are
          dereferenced.
         `never'
               Aliases are never dereferenced.

         `always'
               Aliases are always dereferenced.

         `search'
               Aliases are dereferenced when searching.

         `find'
               Aliases are dereferenced when locating the base object
               for the search.
          The default is `never'.

    `timelimit'
          The timeout limit for the connection in seconds.

    `sizelimit'
          The maximum number of matches to return for searches
          performed on this connection.

 - Function: ldap-close ldap
     Close the connection represented by LDAP.
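
     As a minimal sketch, a connection might be opened, checked and
     closed as follows.  The host name `ldap.example.org' and the
     `sizelimit' value are hypothetical; the plist keys are the ones
     listed above for `ldap-open'.

          ;; Hypothetical host; substitute a real directory server.
          (let ((conn (ldap-open "ldap.example.org"
                                 '(deref never sizelimit 50))))
            (when (ldap-live-p conn)
              (message "Connected to %s" (ldap-host conn)))
            (ldap-close conn))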


File: lispref.info,  Node: Low-level Operations on a LDAP Server,  Prev: Opening and Closing a LDAP Connection,  Up: The Low-Level LDAP API

Low-level Operations on a LDAP Server
.....................................

`ldap-search-basic' is the low-level primitive to perform a search on an
LDAP server.  It works directly on an open LDAP connection and thus
requires a preliminary call to `ldap-open'.  Multiple searches can be
made on the same connection; afterwards the session must be closed with
`ldap-close'.

 - Function: ldap-search-basic ldap filter &optional base scope attrs
          attrsonly withdn verbose
     Perform a search on an open connection LDAP created with
     `ldap-open'.  FILTER is a filter string for the search (*note
     Syntax of Search Filters::).  BASE is the distinguished name at
     which to start the search.  SCOPE is one of the symbols `base',
     `onelevel' or `subtree', indicating that the search is limited to
     the base object, to a single level or to the whole subtree.  The
     default is `subtree'.  ATTRS is a list of strings indicating which
     attributes to retrieve for each matching entry.  If `nil', all
     available attributes are returned.  If ATTRSONLY is non-`nil',
     only the attributes are retrieved, not their associated values.
     If WITHDN is non-`nil', each entry in the result is prepended
     with its distinguished name (DN).  If VERBOSE is non-`nil',
     progress messages are echoed.

     The function returns a list of matching entries.  Each entry is
     itself an alist of attribute/value pairs, optionally preceded by
     the DN of the entry according to the value of WITHDN.
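
     For instance, assuming CONN is a connection returned by
     `ldap-open' and that the base DN `o=Acme, c=com' exists on the
     server (both are hypothetical here), the following sketch asks
     for the `cn' and `mail' attributes of every entry in the subtree
     that has a `mail' attribute:

          (ldap-search-basic conn "(mail=*)"
                             "o=Acme, c=com" ; hypothetical base DN
                             'subtree
                             '("cn" "mail")
                             nil    ; ATTRSONLY: retrieve values too
                             t)     ; WITHDN: prepend each entry's DN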

 - Function: ldap-add ldap dn entry
     Add ENTRY to the LDAP directory reached through the connection
     LDAP, which must have been opened with `ldap-open'.  DN is the
     distinguished name of the entry to add.  ENTRY is an entry
     specification, i.e., a list of cons cells containing
     attribute/value string pairs.

 - Function: ldap-modify ldap dn mods
     Modify an entry in an LDAP directory.  LDAP is an LDAP connection
     object created with `ldap-open'.  DN is the distinguished name of
     the entry to modify.  MODS is a list of modifications to apply.  A
     modification is a list of the form `(MOD-OP ATTR VALUE1 VALUE2
     ...)'.  MOD-OP and ATTR are mandatory; the VALUEs are optional,
     depending on MOD-OP.  MOD-OP is the type of modification, one of
     the symbols `add', `delete' or `replace'.  ATTR is the LDAP
     attribute type to modify.

 - Function: ldap-delete ldap dn
     Delete an entry from an LDAP directory.  LDAP is an LDAP connection
     object created with `ldap-open'.  DN is the distinguished name of
     the entry to delete.
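
     The three update functions above can be combined as in the
     following sketch.  CONN is assumed to be a live connection bound
     with sufficient privileges; the DN and all attribute values are
     hypothetical.

          (let ((dn "cn=Babs Jensen, o=Acme, c=com"))
            ;; Create the entry from a list of (ATTR . VALUE) pairs.
            (ldap-add conn dn '(("objectClass" . "person")
                                ("cn" . "Babs Jensen")
                                ("sn" . "Jensen")))
            ;; Each modification has the form (MOD-OP ATTR VALUE...).
            (ldap-modify conn dn '((add "telephoneNumber" "+1 555 0100")
                                   (replace "sn" "Jensen-Smith")))
            ;; Remove the entry again.
            (ldap-delete conn dn))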


File: lispref.info,  Node: LDAP Internationalization,  Prev: The Low-Level LDAP API,  Up: XEmacs LDAP API

LDAP Internationalization
-------------------------

The XEmacs LDAP API provides basic internationalization features based
on the LDAP v3 specification (essentially RFC2252 on "LDAP v3 Attribute
Syntax Definitions").  Unfortunately, since there is currently no free
LDAP v3 server software, this part has not received much testing and
should be considered experimental.  The framework is in place though.

 - Function: ldap-decode-attribute attr
     Decode the attribute/value pair ATTR according to LDAP rules.  The
     attribute name is looked up in `ldap-attribute-syntaxes-alist' and
     the corresponding decoder is then retrieved from
     `ldap-attribute-syntax-decoders' and applied to the value(s).

* Menu:

* LDAP Internationalization Variables::
* Encoder/Decoder Functions::


File: lispref.info,  Node: LDAP Internationalization Variables,  Next: Encoder/Decoder Functions,  Prev: LDAP Internationalization,  Up: LDAP Internationalization

LDAP Internationalization Variables
...................................

 - Variable: ldap-ignore-attribute-codings
     If non-`nil', no encoding/decoding will be performed on LDAP
     attribute values.

 - Variable: ldap-coding-system
     Coding system of LDAP string values.  LDAP v3 specifies the coding
     system of strings to be UTF-8.  You need an XEmacs with Mule
     support for this.

 - Variable: ldap-default-attribute-decoder
     Decoder function to use for attributes whose syntax is unknown.
     Such a function receives an encoded attribute value as a string
     and should return the decoded value as a string.

 - Variable: ldap-attribute-syntax-encoders
     A vector of functions used to encode LDAP attribute values.  The
     sequence of functions corresponds to the sequence of LDAP
     attribute syntax object identifiers of the form
     1.3.6.1.4.1.1466.1115.121.1.* as defined in RFC2252 section 4.3.2.
     As of this writing, only a few encoder functions are available.

 - Variable: ldap-attribute-syntax-decoders
     A vector of functions used to decode LDAP attribute values.  The
     sequence of functions corresponds to the sequence of LDAP
     attribute syntax object identifiers of the form
     1.3.6.1.4.1.1466.1115.121.1.* as defined in RFC2252 section 4.3.2.
     As of this writing, only a few decoder functions are available.

 - Variable: ldap-attribute-syntaxes-alist
     A map of LDAP attribute names to their type object id minor number.
     This table is built from RFC2252 Section 5 and RFC2256 Section 5.


File: lispref.info,  Node: Encoder/Decoder Functions,  Prev: LDAP Internationalization Variables,  Up: LDAP Internationalization

Encoder/Decoder Functions
.........................

 - Function: ldap-encode-boolean bool
     A function that encodes an elisp boolean BOOL into a LDAP boolean
     string representation.

 - Function: ldap-decode-boolean str
     A function that decodes a LDAP boolean string representation STR
     into an elisp boolean.

 - Function: ldap-decode-string str
     Decode a string STR according to `ldap-coding-system'.

 - Function: ldap-encode-string str
     Encode a string STR according to `ldap-coding-system'.

 - Function: ldap-decode-address str
     Decode an address STR according to `ldap-coding-system',
     replacing $ signs with newlines as specified by LDAP encoding
     rules for addresses.

 - Function: ldap-encode-address str
     Encode an address STR according to `ldap-coding-system',
     replacing newlines with $ signs as specified by LDAP encoding
     rules for addresses.
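
     As an illustration, a multi-line postal address might be
     converted as sketched below.  The address is made up, and the
     exact strings returned also depend on `ldap-coding-system', so no
     results are shown.

          ;; Newlines become $ signs before the value is sent.
          (ldap-encode-address "1 Main Street\nSpringfield")
          ;; $ signs become newlines when the value is read back.
          (ldap-decode-address "1 Main Street$Springfield")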


File: lispref.info,  Node: Syntax of Search Filters,  Prev: XEmacs LDAP API,  Up: LDAP Support

Syntax of Search Filters
========================

LDAP search functions use RFC1558 syntax to describe the search filter.
In that syntax simple filters have the form:

     (<attr> <filtertype> <value>)

   `<attr>' is an attribute name such as `cn' for Common Name, `o' for
Organization, etc.

   `<value>' is the corresponding value.  This is generally an exact
string but may also contain `*' characters as wildcards.

   `<filtertype>' is one of `=', `~=', `<=' or `>=', which respectively
denote equality, approximate equality, less than or equal to, and
greater than or equal to.

   Thus `(cn=John Smith)' matches all records having a common name
equal to John Smith.

   A special case is the presence filter `(<attr>=*)', which matches
records containing a particular attribute.  For instance `(mail=*)'
matches all records containing a `mail' attribute.

   Simple filters can be connected together with the logical operators
`&', `|' and `!' which stand for the usual and, or and not operators.

   `(&(objectClass=Person)(mail=*)(|(sn=Smith)(givenname=John)))'
matches records of class `Person' containing a `mail' attribute and
corresponding to people whose last name is `Smith' or whose first name
is `John'.
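
   Since filters are plain strings, compound filters can be assembled
with ordinary string functions.  The helper `my-ldap-filter-and' below
is hypothetical and not part of the XEmacs LDAP API; it is only a
sketch of how the `&' operator wraps its operands.

     (defun my-ldap-filter-and (&rest filters)
       "Return a filter string matching records that satisfy all FILTERS."
       (concat "(&" (apply 'concat filters) ")"))

     (my-ldap-filter-and "(objectClass=Person)" "(mail=*)")
          => "(&(objectClass=Person)(mail=*))"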


File: lispref.info,  Node: PostgreSQL Support,  Next: Internationalization,  Prev: LDAP Support,  Up: Top

PostgreSQL Support
******************

XEmacs can be linked with PostgreSQL libpq run-time support to provide
relational database access from Emacs Lisp code.

* Menu:

* Building XEmacs with PostgreSQL support::
* XEmacs PostgreSQL libpq API::
* XEmacs PostgreSQL libpq Examples::


File: lispref.info,  Node: Building XEmacs with PostgreSQL support,  Next: XEmacs PostgreSQL libpq API,  Up: PostgreSQL Support

Building XEmacs with PostgreSQL support
=======================================

XEmacs PostgreSQL support requires linking to the PostgreSQL libpq
library.  Describing how to build and install PostgreSQL is beyond the
scope of this document.  See the PostgreSQL manual for details.

   If you have installed XEmacs from one of the binary kits on
(<ftp://ftp.xemacs.org/>), or are using an XEmacs binary from a CD ROM,
you may have XEmacs PostgreSQL support by default.  `M-x
describe-installation' will tell you if you do.

   If you are building XEmacs from source, you need to install
PostgreSQL first.  On some systems, PostgreSQL will come pre-installed
in /usr.  In this case, it should be autodetected when you run
configure.  If PostgreSQL is installed into its default location,
`/usr/local/pgsql', you must specify `--site-prefixes=/usr/local/pgsql'
when you run configure.  If PostgreSQL is installed into another
location, use that instead of `/usr/local/pgsql' when specifying
`--site-prefixes'.

   As of XEmacs 21.2, PostgreSQL versions 6.5.3 and 7.0 are supported.
XEmacs Lisp support for V7.0 is somewhat more extensive than support for
V6.5.  In particular, asynchronous queries are supported.


File: lispref.info,  Node: XEmacs PostgreSQL libpq API,  Next: XEmacs PostgreSQL libpq Examples,  Prev: Building XEmacs with PostgreSQL support,  Up: PostgreSQL Support

XEmacs PostgreSQL libpq API
===========================

The XEmacs PostgreSQL API is intended to be a policy-free, low-level
binding to libpq.  The intent is to provide all the basic functionality
and then let high level Lisp code decide its own policies.

   This documentation assumes that the reader has knowledge of SQL, but
requires no prior knowledge of libpq.

   There are many examples in this manual and some setup will be
required.  In order to run most of the following examples, the
following code needs to be executed.  In addition to relying on the
data in this table, nearly all of the examples assume that the free
variable `P' refers to this database connection.  The examples in the
original edition of this manual were run against Postgres 7.0beta1.

     (progn
       (setq P (pq-connectdb ""))
       ;; id is the primary key, shikona is a Japanese word that
       ;; means `the professional name of a Sumo wrestler', and
       ;; rank is the Sumo rank name.
       (pq-exec P (concat "CREATE TABLE xemacs_test"
                          " (id int, shikona text, rank text);"))
       (pq-exec P "COPY xemacs_test FROM stdin;")
       (pq-put-line P "1\tMusashimaru\tYokozuna\n")
       (pq-put-line P "2\tDejima\tOozeki\n")
       (pq-put-line P "3\tMusoyama\tSekiwake\n")
       (pq-put-line P "4\tMiyabiyama\tSekiwake\n")
       (pq-put-line P "5\tWakanoyama\tMaegashira\n")
       (pq-put-line P "\\.\n")
       (pq-end-copy P))
          => nil

* Menu:

* libpq Lisp Variables::
* libpq Lisp Symbols and DataTypes::
* Synchronous Interface Functions::
* Asynchronous Interface Functions::
* Large Object Support::
* Other libpq Functions::
* Unimplemented libpq Functions::


File: lispref.info,  Node: libpq Lisp Variables,  Next: libpq Lisp Symbols and DataTypes,  Prev: XEmacs PostgreSQL libpq API,  Up: XEmacs PostgreSQL libpq API

libpq Lisp Variables
--------------------

Various Unix environment variables are used by libpq to provide defaults
to the many different parameters.  In the XEmacs Lisp API, these
environment variables are bound to Lisp variables to provide more
convenient access from Lisp code.  These variables are passed to the
backend database server during the establishment of a database
connection and when the `pq-setenv' call is made.

 - Variable: pg:host
     Initialized from the `PGHOST' environment variable.  The default
     host to connect to.

 - Variable: pg:user
     Initialized from the `PGUSER' environment variable.  The default
     database user name.

 - Variable: pg:options
     Initialized from the `PGOPTIONS' environment variable.  Default
     additional server options.

 - Variable: pg:port
     Initialized from the `PGPORT' environment variable.  The default
     TCP port to connect to.

 - Variable: pg:tty
     Initialized from the `PGTTY' environment variable.  The default
     debugging TTY.

     Compatibility note:  Debugging TTYs are turned off in the XEmacs
     Lisp binding.

 - Variable: pg:database
     Initialized from the `PGDATABASE' environment variable.  The
     default database to connect to.

 - Variable: pg:realm
     Initialized from the `PGREALM' environment variable.  The default
     Kerberos realm.

 - Variable: pg:client-encoding
     Initialized from the `PGCLIENTENCODING' environment variable.  The
     default client encoding.

     Compatibility note:  This variable is not present in non-Mule
     XEmacsen.  This variable is not present in versions of libpq prior
     to 7.0.  In the current implementation, client encoding is
     equivalent to the `file-name-coding-system' format.

 - Variable: pg:authtype
     Initialized from the `PGAUTHTYPE' environment variable.  The
     default authentication scheme used.

     Compatibility note:  This variable is unused in versions of libpq
     after 6.5.  It is not implemented at all in the XEmacs Lisp
     binding.

 - Variable: pg:geqo
     Initialized from the `PGGEQO' environment variable.  Genetic
     optimizer options.

 - Variable: pg:cost-index
     Initialized from the `PGCOSTINDEX' environment variable.  Cost
     index options.

 - Variable: pg:cost-heap
     Initialized from the `PGCOSTHEAP' environment variable.  Cost heap
     options.

 - Variable: pg:tz
     Initialized from the `PGTZ' environment variable.  Default
     timezone.

 - Variable: pg:date-style
     Initialized from the `PGDATESTYLE' environment variable.  Default
     date style in returned date objects.

 - Variable: pg-coding-system
     This is a variable controlling which coding system is used to
     encode non-ASCII strings sent to the database.

     Compatibility Note: This variable is not present in InfoDock.


File: lispref.info,  Node: libpq Lisp Symbols and DataTypes,  Next: Synchronous Interface Functions,  Prev: libpq Lisp Variables,  Up: XEmacs PostgreSQL libpq API

libpq Lisp Symbols and Datatypes
--------------------------------

The following set of symbols is used to represent the intermediate
states involved in the asynchronous interface.

 - Symbol: pgres::polling-failed
     Undocumented.  A fatal error has occurred during processing of an
     asynchronous operation.

 - Symbol: pgres::polling-reading
     An intermediate status return during an asynchronous operation.  It
     indicates that one may use `select' before polling again.

 - Symbol: pgres::polling-writing
     An intermediate status return during an asynchronous operation.  It
     indicates that one may use `select' before polling again.

 - Symbol: pgres::polling-ok
     An asynchronous operation has successfully completed.

 - Symbol: pgres::polling-active
     An intermediate status return during an asynchronous operation.
     One can call the poll function again immediately.

 - Function: pq-pgconn conn field
     Return the value of a field of a database connection object.
     CONN is a database connection object.  FIELD is a symbol
     indicating which field of PGconn to fetch.  Possible values are
     shown in the following table.
    `pq::db'
          Database name

    `pq::user'
          Database user name

    `pq::pass'
          Database user's password

    `pq::host'
          Hostname database server is running on

    `pq::port'
          TCP port number used in the connection

    `pq::tty'
          Debugging TTY

          Compatibility note:  Debugging TTYs are not used in the
          XEmacs Lisp API.

    `pq::options'
          Additional server options

    `pq::status'
          Connection status.  Possible return values are shown in the
          following table.
         `pg::connection-ok'
               The normal, connected status.

         `pg::connection-bad'
               The connection is not open and the PGconn object needs
               to be deleted by `pq-finish'.

         `pg::connection-started'
               An asynchronous connection has been started, but is not
               yet complete.

         `pg::connection-made'
               An asynchronous connect has been made, and there is data
               waiting to be sent.

         `pg::connection-awaiting-response'
               Awaiting data from the backend during an asynchronous
               connection.

         `pg::connection-auth-ok'
               Received authentication, waiting for the backend to
               start up.

         `pg::connection-setenv'
               Negotiating environment during an asynchronous
               connection.

    `pq::error-message'
          The last error message that was delivered to this connection.

    `pq::backend-pid'
          The process ID of the backend database server.
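
     For example, assuming `P' is the connection created by the setup
     code at the beginning of this chapter, a sanity check before
     issuing queries could be sketched like this:

          (if (eq (pq-pgconn P 'pq::status) 'pg::connection-ok)
              (message "Connected to database %s on port %s"
                       (pq-pgconn P 'pq::db)
                       (pq-pgconn P 'pq::port))
            (message "Connection problem: %s"
                     (pq-pgconn P 'pq::error-message)))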

   The `PGresult' object is used by libpq to encapsulate the results of
queries.  The printed representation takes on four forms.  When the
PGresult object contains tuples from an SQL `SELECT' it will look like:

     (setq R (pq-exec P "SELECT * FROM xemacs_test;"))
          => #<PGresult PGRES_TUPLES_OK[5] - SELECT>

   The number in brackets indicates how many rows of data are available.
When the PGresult object is the result of a command query that doesn't
return anything, it will look like:

     (pq-exec P "CREATE TABLE a_new_table (i int);")
          => #<PGresult PGRES_COMMAND_OK - CREATE>

   When the query is a command-type query that can affect a number of
different rows, but doesn't return any of them, it will look like:

     (progn
       (pq-exec P "INSERT INTO a_new_table VALUES (1);")
       (pq-exec P "INSERT INTO a_new_table VALUES (2);")
       (pq-exec P "INSERT INTO a_new_table VALUES (3);")
       (setq R (pq-exec P "DELETE FROM a_new_table;")))
          => #<PGresult PGRES_COMMAND_OK[3] - DELETE 3>

   Lastly, when the underlying PGresult object has been deallocated
directly by `pq-clear' the printed representation will look like:

     (progn
       (setq R (pq-exec P "SELECT * FROM xemacs_test;"))
       (pq-clear R)
       R)
          => #<PGresult DEAD>

   The following functions are accessors to various data in the
PGresult object.

 - Function: pq-result-status result
     Return status of a query result.  RESULT is a PGresult object.
     The return value is one of the symbols in the following table.
    `pgres::empty-query'
          A query contained no text.  This is usually the result of a
          recoverable error, or a minor programming error.

    `pgres::command-ok'
          A query command that doesn't return anything was executed
          properly by the backend.

    `pgres::tuples-ok'
          A query command that returns tuples was executed properly by
          the backend.

    `pgres::copy-out'
          Copy Out data transfer is in progress.

    `pgres::copy-in'
          Copy In data transfer is in progress.

    `pgres::bad-response'
          An unexpected response was received from the backend.

    `pgres::nonfatal-error'
          Undocumented.  This value is returned when the libpq function
          `PQresultStatus' is called with a `NULL' pointer.

    `pgres::fatal-error'
          Undocumented.  An error has occurred in processing the query
          and the operation was not completed.
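
     A sketch of how the status symbol might be used to guard access to
     the tuples of a result follows.  `P' is assumed to be the
     connection from the setup code at the beginning of this chapter.

          (let ((R (pq-exec P "SELECT * FROM xemacs_test;")))
            (if (eq (pq-result-status R) 'pgres::tuples-ok)
                (pq-ntuples R)
              (error "Query failed: %s" (pq-result-error-message R))))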

 - Function: pq-res-status result
     Return the query result status as a string, not a symbol.  RESULT
     is a PGresult object.

          (setq R (pq-exec P "SELECT * FROM xemacs_test;"))
               => #<PGresult PGRES_TUPLES_OK[5] - SELECT>
          (pq-res-status R)
               => "PGRES_TUPLES_OK"

 - Function: pq-result-error-message result
     Return an error message generated by the query, if any.  RESULT is
     a PGresult object.

          (setq R (pq-exec P "SELECT * FROM xemacs-test;"))
               => <A fatal error is signaled in the echo area>
          (pq-result-error-message R)
               => "ERROR:  parser: parse error at or near \"-\"
          "

 - Function: pq-ntuples result
     Return the number of tuples in the query result.  RESULT is a
     PGresult object.

          (setq R (pq-exec P "SELECT * FROM xemacs_test;"))
               => #<PGresult PGRES_TUPLES_OK[5] - SELECT>
          (pq-ntuples R)
               => 5

 - Function: pq-nfields result
     Return the number of fields in each tuple of the query result.
     RESULT is a PGresult object.

          (setq R (pq-exec P "SELECT * FROM xemacs_test;"))
               => #<PGresult PGRES_TUPLES_OK[5] - SELECT>
          (pq-nfields R)
               => 3

 - Function: pq-binary-tuples result
     Returns t if binary tuples are present in the results, nil
     otherwise.  RESULT is a PGresult object.

          (setq R (pq-exec P "SELECT * FROM xemacs_test;"))
               => #<PGresult PGRES_TUPLES_OK[5] - SELECT>
          (pq-binary-tuples R)
               => nil

 - Function: pq-fname result field-index
     Returns the name of a specific field.  RESULT is a PGresult object.
     FIELD-INDEX is the number of the column to select from.  The first
     column is number zero.

          (let (i l)
            (setq R (pq-exec P "SELECT * FROM xemacs_test;"))
            (setq i (pq-nfields R))
            (while (>= (decf i) 0)
              (push (pq-fname R i) l))
            l)
               => ("id" "shikona" "rank")

 - Function: pq-fnumber result field-name
     Return the field number corresponding to the given field name.  -1
     is returned on a bad field name.  RESULT is a PGresult object.
     FIELD-NAME is a string representing the field name to find.
          (setq R (pq-exec P "SELECT * FROM xemacs_test;"))
               => #<PGresult PGRES_TUPLES_OK[5] - SELECT>
          (pq-fnumber R "id")
               => 0
          (pq-fnumber R "Not a field")
               => -1

 - Function: pq-ftype result field-num
     Return an integer code representing the data type of the specified
     column.  RESULT is a PGresult object.  FIELD-NUM is the field
     number.

     The return value of this function is the Object ID (Oid) in the
     database of the type.  Further queries need to be made to various
     system tables in order to convert this value into something useful.

 - Function: pq-fmod result field-num
     Return the type modifier code associated with a field.  Field
     numbers start at zero.  RESULT is a PGresult object.  FIELD-NUM
     selects which field to use.

 - Function: pq-fsize result field-index
     Return size of the given field.  RESULT is a PGresult object.
     FIELD-INDEX selects which field to use.

          (let (i l)
            (setq R (pq-exec P "SELECT * FROM xemacs_test;"))
            (setq i (pq-nfields R))
            (while (>= (decf i) 0)
              (push (list (pq-ftype R i) (pq-fsize R i)) l))
            l)
               => ((23 23) (25 25) (25 25))

 - Function: pq-get-value result tup-num field-num
     Retrieve a return value.  RESULT is a PGresult object.  TUP-NUM
     selects which tuple to fetch from.  FIELD-NUM selects which field
     to fetch from.

     Both tuples and fields are numbered from zero.

          (setq R (pq-exec P "SELECT * FROM xemacs_test;"))
               => #<PGresult PGRES_TUPLES_OK[5] - SELECT>
          (pq-get-value R 0 1)
               => "Musashimaru"
          (pq-get-value R 1 1)
               => "Dejima"
          (pq-get-value R 2 1)
               => "Musoyama"

 - Function: pq-get-length result tup-num field-num
     Return the length of a specific value.  RESULT is a PGresult
     object.  TUP-NUM selects which tuple to fetch from.  FIELD-NUM
     selects which field to fetch from.

          (setq R (pq-exec P "SELECT * FROM xemacs_test;"))
               => #<PGresult PGRES_TUPLES_OK[5] - SELECT>
          (pq-get-length R 0 1)
               => 11
          (pq-get-length R 1 1)
               => 6
          (pq-get-length R 2 1)
               => 8

 - Function: pq-get-is-null result tup-num field-num
     Return t if the specific value is the SQL `NULL'.  RESULT is a
     PGresult object.  TUP-NUM selects which tuple to fetch from.
     FIELD-NUM selects which field to fetch from.
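
     The accessors above can be combined to copy a whole result into a
     Lisp list.  The helper `my-pq-result-rows' below is hypothetical
     and not part of the libpq binding; it is a sketch that represents
     SQL `NULL' values as `nil' and every other value as the string
     returned by `pq-get-value'.

          (defun my-pq-result-rows (result)
            "Return the tuples of RESULT as a list of lists."
            (let ((row (pq-ntuples result))
                  (nfields (pq-nfields result))
                  rows)
              ;; Walk the tuples from last to first so that `push'
              ;; leaves them in their original order.
              (while (>= (decf row) 0)
                (let ((col nfields)
                      fields)
                  (while (>= (decf col) 0)
                    (push (if (pq-get-is-null result row col)
                              nil
                            (pq-get-value result row col))
                          fields))
                  (push fields rows)))
              rows))

     It could then be called as
     `(my-pq-result-rows (pq-exec P "SELECT * FROM xemacs_test;"))'.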

 - Function: pq-cmd-status result
     Return a summary string from the query.  RESULT is a PGresult
     object.
          (setq R (pq-exec P "INSERT INTO xemacs_test
                             VALUES (6, 'Wakanohana', 'Yokozuna');"))
               => #<PGresult PGRES_COMMAND_OK[1] - INSERT 542086 1>
          (pq-cmd-status R)
               => "INSERT 542086 1"
          (setq R (pq-exec P "UPDATE xemacs_test SET rank='retired'
                              WHERE shikona='Wakanohana';"))
               => #<PGresult PGRES_COMMAND_OK[1] - UPDATE 1>
          (pq-cmd-status R)
               => "UPDATE 1"

     Note that the first number returned from an insertion, like in the
     example, is an object ID number and will almost certainly vary from
     system to system since object ID numbers in Postgres must be unique
     across all databases.

 - Function: pq-cmd-tuples result
     Return the number of tuples if the last command was an
     INSERT/UPDATE/DELETE.  If the last command was something else, the
     empty string is returned.  RESULT is a PGresult object.

          (setq R (pq-exec P "INSERT INTO xemacs_test VALUES
                              (7, 'Takanohana', 'Yokozuna');"))
               => #<PGresult PGRES_COMMAND_OK[1] - INSERT 38688 1>
          (pq-cmd-tuples R)
               => "1"
          (setq R (pq-exec P "SELECT * from xemacs_test;"))
               => #<PGresult PGRES_TUPLES_OK[7] - SELECT>
          (pq-cmd-tuples R)
               => ""
          (setq R (pq-exec P "DELETE FROM xemacs_test
                              WHERE shikona LIKE '%hana';"))
               => #<PGresult PGRES_COMMAND_OK[2] - DELETE 2>
          (pq-cmd-tuples R)
               => "2"

 - Function: pq-oid-value result
     Return the object id of the insertion if the last command was an
     INSERT.  0 is returned if the last command was not an insertion.
     RESULT is a PGresult object.

     In the first example, the numbers you will see on your local
     system will almost certainly be different; however, the second
     number from the right in the printed representation of the
     PGresult object and the number returned by `pq-oid-value' should
     match.
          (setq R (pq-exec P "INSERT INTO xemacs_test VALUES
                              (8, 'Terao', 'Maegashira');"))
               => #<PGresult PGRES_COMMAND_OK[1] - INSERT 542089 1>
          (pq-oid-value R)
               => 542089
          (setq R (pq-exec P "SELECT shikona FROM xemacs_test
                              WHERE rank='Maegashira';"))
               => #<PGresult PGRES_TUPLES_OK[2] - SELECT>
          (pq-oid-value R)
               => 0

 - Function: pq-make-empty-pgresult conn status
     Create an empty pgresult with the given status.  CONN a database
     connection object.  STATUS a value that can be returned by
     `pq-result-status'.

     The caller is responsible for making sure the return value gets
     properly freed.


File: lispref.info,  Node: Synchronous Interface Functions,  Next: Asynchronous Interface Functions,  Prev: libpq Lisp Symbols and DataTypes,  Up: XEmacs PostgreSQL libpq API

Synchronous Interface Functions
-------------------------------

 - Function: pq-connectdb conninfo
     Establish a (synchronous) database connection.  CONNINFO A string
     of blank separated options.  Options are of the form "OPTION =
     VALUE".  If VALUE contains blanks, it must be single quoted.
     Blanks around the equal sign are optional.  Multiple option
     assignments are blank separated.
          (pq-connectdb "dbname=japanese port = 25432")
               => #<PGconn localhost:25432 steve/japanese>
     The printed representation of a database connection object has four
     fields.  The first field is the hostname where the database server
     is running (in this case localhost), the second field is the port
     number, the third field is the database user name, and the fourth
     field is the name of the database.

     Database connection objects which have been disconnected and will
     generate an immediate error if they are used look like:
            #<PGconn BAD>
     Bad connections can be reestablished with `pq-reset', or deleted
     entirely with `pq-finish'.

     A database connection object that has been deleted looks like:
          (let ((P1 (pq-connectdb "")))
            (pq-finish P1)
            P1)
               => #<PGconn DEAD>

     Note that database connection objects are the most heavyweight
     objects in XEmacs Lisp at this writing, usually representing as
     much as several megabytes of virtual memory on the machine the
     database server is running on.  It is wisest to explicitly delete
     them when you are finished with them, rather than letting garbage
     collection do it.  An example idiom is:

          (let ((P (pq-connectdb "")))
            (unwind-protect
                (progn
                  (...)) ; access database here
              (pq-finish P)))

     The following options are available in the options string:
    `authtype'
          Authentication type.  Same as `PGAUTHTYPE'.  This is no
          longer used.

    `user'
          Database user name.  Same as `PGUSER'.

    `password'
          Database password.

    `dbname'
          Database name.  Same as `PGDATABASE'.

    `host'
          Symbolic hostname.  Same as `PGHOST'.

    `hostaddr'
          Host address as four octets (e.g., 192.168.1.1).

    `port'
          TCP port to connect to.  Same as `PGPORT'.

    `tty'
          Debugging TTY.  Same as `PGTTY'.  This value is suppressed in
          the XEmacs Lisp API.

    `options'
          Extra backend database options.  Same as `PGOPTIONS'.
     A database connection object is returned regardless of whether a
     connection was established or not.
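
     Since an object comes back even when the connection attempt
     failed, it can be worth checking the status before using it.  The
     sketch below combines that check with the `unwind-protect' idiom
     shown above.  It relies only on `pq-pgconn' with the `pq::status'
     field and the `pg::connection-ok' symbol; `my-with-pq-connection'
     is a hypothetical macro name, not part of the XEmacs API.

```elisp
;; Hypothetical macro: `my-with-pq-connection' is not part of the XEmacs API.
(defmacro my-with-pq-connection (var conninfo &rest body)
  "Bind VAR to a connection made from CONNINFO, run BODY, then close it."
  `(let ((,var (pq-connectdb ,conninfo)))
     (unwind-protect
         (progn
           ;; A connection object is returned even when connecting
           ;; failed, so check its status before running BODY.
           (if (not (eq (pq-pgconn ,var 'pq::status) 'pg::connection-ok))
               (error "Database connection failed"))
           ,@body)
       ;; Always delete the connection, even after an error.
       (pq-finish ,var))))
```

     A call might then look like `(my-with-pq-connection P
     "dbname=japanese" (pq-exec P "SELECT * FROM xemacs_test;"))'.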

 - Function: pq-reset conn
     Reestablish database connection.  CONN A database connection
     object.

     This function reestablishes a database connection using the
     original connection parameters.  This is useful if something has
     happened to the TCP link and it has become broken.

 - Function: pq-exec conn query
     Make a synchronous database query.  CONN A database connection
     object.  QUERY A string containing an SQL query.  A PGresult
     object is returned, which in turn may be queried by its many
     accessor functions to retrieve state out of it.  If the query
     string contains multiple SQL commands, only results from the final
     command are returned.

          (setq R (pq-exec P "SELECT * FROM xemacs_test;
          DELETE FROM xemacs_test WHERE id=8;"))
               => #<PGresult PGRES_COMMAND_OK[1] - DELETE 1>

 - Function: pq-notifies conn
     Return the latest async notification that has not yet been handled.
     CONN A database connection object.  If there has been a
     notification, then a list of two elements will be returned.  The
     first element contains the relation name being notified, the second
     element contains the backend process ID number.  nil is returned
     if there aren't any notifications to process.
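
     A caller will often want every pending notification rather than
     just one.  The following sketch loops until `nil' comes back.  It
     assumes that each call consumes the notification it returns, as
     the underlying libpq routine does; `my-pq-drain-notifications' is
     a hypothetical helper name.

```elisp
;; Hypothetical helper: `my-pq-drain-notifications' is not part of the
;; XEmacs API.  Assumes each `pq-notifies' call removes the notification
;; it returns from the pending queue.
(defun my-pq-drain-notifications (conn)
  "Return a list of all pending notifications on CONN, oldest first.
Each element is a two-element list as returned by `pq-notifies'."
  (let (note notes)
    (while (setq note (pq-notifies conn))
      (setq notes (cons note notes)))
    (nreverse notes)))
```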

 - Function: PQsetenv conn
     Synchronous transfer of environment variables to a backend.  CONN
     A database connection object.

     Environment variable transfer is done as a normal part of database
     connection.

     Compatibility note: This function was present but not documented
     in versions of libpq prior to 7.0.


File: lispref.info,  Node: Asynchronous Interface Functions,  Next: Large Object Support,  Prev: Synchronous Interface Functions,  Up: XEmacs PostgreSQL libpq API

Asynchronous Interface Functions
--------------------------------

Giving command-by-command examples is too complex for the asynchronous
interface functions.  See the examples section for complete calling
sequences.

 - Function: pq-connect-start conninfo
     Begin establishing an asynchronous database connection.  CONNINFO
     A string containing the connection options.  See the documentation
     of `pq-connectdb' for a listing of all the available flags.

 - Function: pq-connect-poll conn
     An intermediate function to be called during an asynchronous
     database connection.  CONN A database connection object.  The
     result codes are documented in a previous section.

 - Function: pq-is-busy conn
     Returns t if `pq-get-result' would block waiting for input.  CONN
     A database connection object.

 - Function: pq-consume-input conn
     Consume any available input from the backend.  CONN A database
     connection object.

     `nil' is returned if anything bad happens.

 - Function: pq-reset-start conn
     Reset connection to the backend asynchronously.  CONN A database
     connection object.

 - Function: pq-reset-poll conn
     Poll an asynchronous reset for completion.  CONN A database
     connection object.

 - Function: pq-reset-cancel conn
     Attempt to request cancellation of the current operation.  CONN A
     database connection object.

     The return value is t if the cancel request was successfully
     dispatched, nil if not (in which case conn->errorMessage is set).
     Note: successful dispatch is no guarantee that there will be any
     effect at the backend.  The application must read the operation
     result as usual.

 - Function: pq-send-query conn query
     Submit a query to Postgres and don't wait for the result.  CONN A
     database connection object.

     Returns `t' if the query was successfully submitted, `nil' on
     error (in which case conn->errorMessage is set).

 - Function: pq-get-result conn
     Retrieve an asynchronous result from a query.  CONN A database
     connection object.

     `nil' is returned when no more query work remains.
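
     When blocking is acceptable, `pq-send-query' and `pq-get-result'
     can be paired to collect the result of every command in a query
     string, which `pq-exec' does not do.  This is a minimal sketch;
     `my-pq-query-all' is a hypothetical helper name.

```elisp
;; Hypothetical helper: `my-pq-query-all' is not part of the XEmacs API.
(defun my-pq-query-all (conn query)
  "Send QUERY on CONN and return a list of all its results, in order.
This blocks in `pq-get-result' until every result has arrived."
  (if (not (pq-send-query conn query))
      (error "Could not submit query")
    (let (result results)
      ;; `pq-get-result' returns nil once no more query work remains.
      (while (setq result (pq-get-result conn))
        (setq results (cons result results)))
      (nreverse results))))
```

     For a version that does not block XEmacs, see the timeout-driven
     poller in the examples section.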

 - Function: pq-set-nonblocking conn arg
     Makes the PGconn's database connection non-blocking if the arg is
     TRUE, or blocking if the arg is FALSE.  This will not protect you
     from PQexec(); you will only be safe when using the non-blocking
     API.  CONN A database connection object.

 - Function: pq-is-nonblocking conn
     Return the blocking status of the database connection.  CONN A
     database connection object.

 - Function: pq-flush conn
     Force the write buffer to be written (or at least try).  CONN A
     database connection object.

 - Function: PQsetenvStart conn
     Start asynchronously passing environment variables to a backend.
     CONN A database connection object.

     Compatibility note: this function is only available with libpq-7.0.

 - Function: PQsetenvPoll conn
     Check an asynchronous environment variables transfer for
     completion.  CONN A database connection object.

     Compatibility note: this function is only available with libpq-7.0.

 - Function: PQsetenvAbort conn
     Attempt to terminate an asynchronous environment variables
     transfer.  CONN A database connection object.

     Compatibility note: this function is only available with libpq-7.0.


File: lispref.info,  Node: Large Object Support,  Next: Other libpq Functions,  Prev: Asynchronous Interface Functions,  Up: XEmacs PostgreSQL libpq API

Large Object Support
--------------------

 - Function: pq-lo-import conn filename
     Import a file as a large object into the database.  CONN a
     database connection object.  FILENAME filename to import.

     On success, the object id is returned.

 - Function: pq-lo-export conn oid filename
     Copy a large object in the database into a file.  CONN a database
     connection object.  OID object id number of a large object.
     FILENAME filename to export to.
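
     The two functions can be combined into a round trip.  The sketch
     below wraps both calls in a transaction, on the assumption that
     the backend expects large object operations to run inside one;
     that requirement is not stated in this manual.
     `my-pq-lo-copy-file' is a hypothetical helper name.

```elisp
;; Hypothetical helper: `my-pq-lo-copy-file' is not part of the XEmacs API.
;; Assumes large object operations should run inside a transaction.
(defun my-pq-lo-copy-file (conn source dest)
  "Import SOURCE as a large object on CONN, then export it to DEST.
Return the object id of the large object."
  (let (oid)
    (pq-exec conn "BEGIN;")
    (unwind-protect
        (progn
          (setq oid (pq-lo-import conn source))
          (pq-lo-export conn oid dest))
      ;; Close the transaction whether or not the copy succeeded.
      (pq-exec conn "END;"))
    oid))
```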


File: lispref.info,  Node: Other libpq Functions,  Next: Unimplemented libpq Functions,  Prev: Large Object Support,  Up: XEmacs PostgreSQL libpq API

Other libpq Functions
---------------------

 - Function: pq-finish conn
     Destroy a database connection object by calling free on it.  CONN
     a database connection object.

     It is possible to not call this routine because the usual XEmacs
     garbage collection mechanism will call the underlying libpq
     routine whenever it is releasing stale `PGconn' objects.  However,
     this routine is useful in `unwind-protect' clauses to make
     connections go away quickly when unrecoverable errors have
     occurred.

     After calling this routine, the printed representation of the
     XEmacs wrapper object will contain the string "DEAD".

 - Function: pq-client-encoding conn
     Return the client encoding as an integer code.  CONN a database
     connection object.

          (pq-client-encoding P)
               => 1

     Compatibility note: This function did not exist prior to libpq-7.0
     and does not exist in a non-Mule XEmacs.

 - Function: pq-set-client-encoding conn encoding
     Set client coding system.  CONN a database connection object.
     ENCODING a string representing the desired coding system.

          (pq-set-client-encoding P "EUC_JP")
               => 0

     The current idiom for ensuring proper coding system conversion is
     the following (illustrated for EUC Japanese encoding):
          (setq P (pq-connectdb "..."))
          (let ((file-name-coding-system 'euc-jp)
                (pg-coding-system 'euc-jp))
            (pq-set-client-encoding P "EUC_JP")
            ...)
          (pq-finish P)
     Compatibility note: This function did not exist prior to libpq-7.0
     and does not exist in a non-Mule XEmacs.

 - Function: pq-env-2-encoding
     Return the integer code representing the coding system in
     `PGCLIENTENCODING'.

          (pq-env-2-encoding)
               => 0
     Compatibility note: This function did not exist prior to libpq-7.0
     and does not exist in a non-Mule XEmacs.

 - Function: pq-clear res
     Destroy a query result object by calling free() on it.  RES a
     query result object.

     Note:  The memory allocation systems of libpq and XEmacs are
     different.  The XEmacs representation of a query result object
     will have both the XEmacs version and the libpq version freed at
     the next garbage collection when the object is no longer being
     referenced.  Calling this function does not release the XEmacs
     object; it is still subject to the usual rules for Lisp objects.
     The printed representation of the XEmacs object will contain the
     string "DEAD" after this routine is called indicating that it is no
     longer useful for anything.

 - Function: pq-conn-defaults
     Return a data structure that represents the connection defaults.
     The data is returned as a list of lists, where each sublist
     contains info regarding a single option.


File: lispref.info,  Node: Unimplemented libpq Functions,  Prev: Other libpq Functions,  Up: XEmacs PostgreSQL libpq API

Unimplemented libpq Functions
-----------------------------

 - Unimplemented Function: PGconn *PQsetdbLogin (char *pghost, char
          *pgport, char *pgoptions, char *pgtty, char *dbName, char
          *login, char *pwd)
     Synchronous database connection.  PGHOST is the hostname of the
     PostgreSQL backend to connect to.  PGPORT is the TCP port number
     to use.  PGOPTIONS specifies other backend options.  PGTTY
     specifies the debugging tty to use.  DBNAME specifies the database
     name to use.  LOGIN specifies the database user name.  PWD
     specifies the database user's password.

     This routine is deprecated as of libpq-7.0, and its functionality
     can be replaced by external Lisp code if needed.

 - Unimplemented Function: PGconn *PQsetdb (char *pghost, char *pgport,
          char *pgoptions, char *pgtty, char *dbName)
     Synchronous database connection.  PGHOST is the hostname of the
     PostgreSQL backend to connect to.  PGPORT is the TCP port number
     to use.  PGOPTIONS specifies other backend options.  PGTTY
     specifies the debugging tty to use.  DBNAME specifies the database
     name to use.

     This routine was deprecated in libpq-6.5.

 - Unimplemented Function: int PQsocket (PGconn *conn)
     Return socket file descriptor to a backend database process.  CONN
     database connection object.

 - Unimplemented Function: void PQprint (FILE *fout, PGresult *res,
          PGprintOpt *ps)
     Print out the results of a query to a designated C stream.  FOUT C
     stream to print to.  RES the query result object to print.  PS the
     print options structure.

     This routine is deprecated as of libpq-7.0 and cannot be sensibly
     exported to XEmacs Lisp.

 - Unimplemented Function: void PQdisplayTuples (PGresult *res, FILE
          *fp, int fillAlign, char *fieldSep, int printHeader, int
          quiet)
     RES query result object to print.  FP C stream to print to.
     FILLALIGN pad the fields with spaces.  FIELDSEP field separator.
     PRINTHEADER display headers?  QUIET

     This routine was deprecated in libpq-6.5.

 - Unimplemented Function: void PQprintTuples (PGresult *res, FILE
          *fout, int printAttName, int terseOutput, int width)
     RES query result object to print.  FOUT C stream to print to.
     PRINTATTNAME print attribute names.  TERSEOUTPUT delimiter bars.
     WIDTH width of column, if 0, use variable width.

     This routine was deprecated in libpq-6.5.

 - Unimplemented Function: int PQmblen (char *s, int encoding)
     Determine length of a multibyte encoded char at `*s'.  S encoded
     string.  ENCODING type of encoding.

     Compatibility note:  This function was introduced in libpq-7.0.

 - Unimplemented Function: void PQtrace (PGconn *conn, FILE *debug_port)
     Enable tracing on `debug_port'.  CONN database connection object.
     DEBUG_PORT C output stream to use.

 - Unimplemented Function: void PQuntrace (PGconn *conn)
     Disable tracing.  CONN database connection object.

 - Unimplemented Function: char *PQoidStatus (PGconn *conn)
     Return the object id as a string of the last tuple inserted.  CONN
     database connection object.

     Compatibility note: This function is deprecated in libpq-7.0,
     however it is used internally by the XEmacs binding code when
     linked against versions prior to 7.0.

 - Unimplemented Function: PGresult *PQfn (PGconn *conn, int fnid, int
          *result_buf, int *result_len, int result_is_int, PQArgBlock
          *args, int nargs)
     "Fast path" interface -- not really recommended for application use
     CONN A database connection object.  FNID RESULT_BUF RESULT_LEN
     RESULT_IS_INT ARGS NARGS

   The following set of very low level large object functions aren't
appropriate to be exported to Lisp.

 - Unimplemented Function: int pq-lo-open (PGconn *conn, int lobjid,
          int mode)
     CONN a database connection object.  LOBJID a large object ID.
     MODE opening modes.

 - Unimplemented Function: int pq-lo-close (PGconn *conn, int fd)
     CONN a database connection object.  FD a large object file
     descriptor

 - Unimplemented Function: int pq-lo-read (PGconn *conn, int fd, char
          *buf, int len)
     CONN a database connection object.  FD a large object file
     descriptor.  BUF buffer to read into.  LEN size of buffer.

 - Unimplemented Function: int pq-lo-write (PGconn *conn, int fd, char
          *buf, size_t len)
     CONN a database connection object.  FD a large object file
     descriptor.  BUF buffer to write from.  LEN size of buffer.

 - Unimplemented Function: int pq-lo-lseek (PGconn *conn, int fd, int
          offset, int whence)
     CONN a database connection object.  FD a large object file
     descriptor.  OFFSET WHENCE

 - Unimplemented Function: int pq-lo-creat (PGconn *conn, int mode)
     CONN a database connection object.  MODE opening modes.

 - Unimplemented Function: int pq-lo-tell (PGconn *conn, int fd)
     CONN a database connection object.  FD a large object file
     descriptor.

 - Unimplemented Function: int pq-lo-unlink (PGconn *conn, int lobjid)
     CONN a database connection object.  LOBJID a large object ID.


File: lispref.info,  Node: XEmacs PostgreSQL libpq Examples,  Prev: XEmacs PostgreSQL libpq API,  Up: PostgreSQL Support

XEmacs PostgreSQL libpq Examples
================================

This is an example of one method of establishing an asynchronous
connection.

     (defun database-poller (P)
       (message "%S before poll" (pq-pgconn P 'pq::status))
       (pq-connect-poll P)
       (message "%S after poll" (pq-pgconn P 'pq::status))
       (if (eq (pq-pgconn P 'pq::status) 'pg::connection-ok)
           (message "Done!")
         (add-timeout .1 'database-poller P)))
          => database-poller
     (progn
       (setq P (pq-connect-start ""))
       (add-timeout .1 'database-poller P))
          => pg::connection-started before poll
          => pg::connection-made after poll
          => pg::connection-made before poll
          => pg::connection-awaiting-response after poll
          => pg::connection-awaiting-response before poll
          => pg::connection-auth-ok after poll
          => pg::connection-auth-ok before poll
          => pg::connection-setenv after poll
          => pg::connection-setenv before poll
          => pg::connection-ok after poll
          => Done!
     P
          => #<PGconn localhost:25432 steve/steve>

   Here is an example of one method of doing an asynchronous reset.

     (defun database-poller (P)
       (let (PS)
         (message "%S before poll" (pq-pgconn P 'pq::status))
         (setq PS (pq-reset-poll P))
         (message "%S after poll [%S]" (pq-pgconn P 'pq::status) PS)
         (if (eq (pq-pgconn P 'pq::status) 'pg::connection-ok)
             (message "Done!")
           (add-timeout .1 'database-poller P))))
          => database-poller
     (progn
       (pq-reset-start P)
       (add-timeout .1 'database-poller P))
          => pg::connection-started before poll
          => pg::connection-made after poll [pgres::polling-writing]
          => pg::connection-made before poll
          => pg::connection-awaiting-response after poll [pgres::polling-reading]
          => pg::connection-awaiting-response before poll
          => pg::connection-setenv after poll [pgres::polling-reading]
          => pg::connection-setenv before poll
          => pg::connection-ok after poll [pgres::polling-ok]
          => Done!
     P
          => #<PGconn localhost:25432 steve/steve>

   And finally, an asynchronous query.

     (defun database-poller (P)
       (let (R)
         (pq-consume-input P)
         (if (pq-is-busy P)
             (add-timeout .1 'database-poller P)
           (setq R (pq-get-result P))
           (if R
               (progn
                 (push R result-list)
                 (add-timeout .1 'database-poller P))))))
          => database-poller
     (when (pq-send-query P "SELECT * FROM xemacs_test;")
       (setq result-list nil)
       (add-timeout .1 'database-poller P))
          => 885
     ;; wait a moment
     result-list
          => (#<PGresult PGRES_TUPLES_OK - SELECT>)

   Here is an example showing how multiple SQL statements in a single
query can have all their results collected.
     ;; Using the same `database-poller' function from the previous example
     (when (pq-send-query P "SELECT * FROM xemacs_test;
     SELECT * FROM pg_database;
     SELECT * FROM pg_user;")
       (setq result-list nil)
       (add-timeout .1 'database-poller P))
          => 1782
     ;; wait a moment
     result-list
          => (#<PGresult PGRES_TUPLES_OK - SELECT> #<PGresult PGRES_TUPLES_OK - SELECT> #<PGresult PGRES_TUPLES_OK - SELECT>)

   Here is an example which illustrates collecting all data from a
query, including the field names.

     (defun pg-util-query-results (results)
       "Retrieve the contents of query result RESULTS into a list structure."
       (let ((i (1- (pq-ntuples results)))
             j l1 l2)
         ;; Collect the tuples, last row first, so `push' restores order.
         (while (>= i 0)
           (setq j (1- (pq-nfields results)))
           (setq l2 nil)
           (while (>= j 0)
             (push (pq-get-value results i j) l2)
             (decf j))
           (push l2 l1)
           (decf i))
         ;; Put the list of field names at the front.
         (setq j (1- (pq-nfields results)))
         (setq l2 nil)
         (while (>= j 0)
           (push (pq-fname results j) l2)
           (decf j))
         (push l2 l1)
         l1))
          => pg-util-query-results
     (setq R (pq-exec P "SELECT * FROM xemacs_test ORDER BY field2 DESC;"))
          => #<PGresult PGRES_TUPLES_OK - SELECT>
     (pg-util-query-results R)
          => (("f1" "field2") ("a" "97") ("b" "97") ("stuff" "42") ("a string" "12") ("foo" "10") ("string" "2") ("text" "1"))

   Here is an example of a query that uses a database cursor.

     (let (data R)
       (setq R (pq-exec P "BEGIN;"))
       (setq R (pq-exec P "DECLARE k_cursor CURSOR FOR SELECT * FROM xemacs_test ORDER BY f1 DESC;"))
     
       (setq R (pq-exec P "FETCH k_cursor;"))
       (while (eq (pq-ntuples R) 1)
         (push (list (pq-get-value R 0 0) (pq-get-value R 0 1)) data)
         (setq R (pq-exec P "FETCH k_cursor;")))
       (setq R (pq-exec P "END;"))
       data)
          => (("a" "97") ("a string" "12") ("b" "97") ("foo" "10") ("string" "2") ("stuff" "42") ("text" "1"))

   Here's another example of cursors, this time with a Lisp macro to
implement a mapping function over a table.

     (defmacro map-db (P table condition callout)
       `(let (R)
          (pq-exec ,P "BEGIN;")
          (pq-exec ,P (concat "DECLARE k_cursor CURSOR FOR SELECT * FROM "
                              ,table
                              " "
                              ,condition
                              " ORDER BY f1 DESC;"))
          (setq R (pq-exec ,P "FETCH k_cursor;"))
          (while (eq (pq-ntuples R) 1)
            (,callout (pq-get-value R 0 0) (pq-get-value R 0 1))
            (setq R (pq-exec ,P "FETCH k_cursor;")))
          (pq-exec ,P "END;")))
          => map-db
     (defun callback (arg1 arg2)
       (message "arg1 = %s, arg2 = %s" arg1 arg2))
          => callback
     (map-db P "xemacs_test" "WHERE field2 > 10" callback)
          => arg1 = stuff, arg2 = 42
          => arg1 = b, arg2 = 97
          => arg1 = a string, arg2 = 12
          => arg1 = a, arg2 = 97
          => #<PGresult PGRES_COMMAND_OK - COMMIT>


File: lispref.info,  Node: Internationalization,  Next: MULE,  Prev: PostgreSQL Support,  Up: Top

Internationalization
********************

* Menu:

* I18N Levels 1 and 2:: Support for different time, date, and currency formats.
* I18N Level 3::        Support for localized messages.
* I18N Level 4::        Support for Asian languages.


File: lispref.info,  Node: I18N Levels 1 and 2,  Next: I18N Level 3,  Up: Internationalization

I18N Levels 1 and 2
===================

XEmacs is now compliant with I18N levels 1 and 2.  Specifically, this
means that it is 8-bit clean and correctly handles time and date
functions.  XEmacs will correctly display the entire ISO-Latin 1
character set.

   The compose key may now be used to create any character in the
ISO-Latin 1 character set not directly available via the keyboard.  In
order for the compose key to work it is necessary to load the file
`x-compose.el'.  At any time while composing a character, `C-h' will
display all valid completions and the character which would be produced.


File: lispref.info,  Node: I18N Level 3,  Next: I18N Level 4,  Prev: I18N Levels 1 and 2,  Up: Internationalization

I18N Level 3
============

* Menu:

* Level 3 Basics::
* Level 3 Primitives::
* Dynamic Messaging::
* Domain Specification::
* Documentation String Extraction::


File: lispref.info,  Node: Level 3 Basics,  Next: Level 3 Primitives,  Up: I18N Level 3

Level 3 Basics
--------------

XEmacs now provides alpha-level functionality for I18N Level 3.  This
means that everything necessary for full messaging is available, but
not every file has been converted.

   The two message files which have been created are `src/emacs.po' and
`lisp/packages/mh-e.po'.  Both files need to be converted using
`msgfmt', and the resulting `.mo' files placed in some locale's
`LC_MESSAGES' directory.  The test "translations" in these files are
the original messages prefixed by `TRNSLT_'.

   The domain for a variable is stored on the variable's property list
under the property name VARIABLE-DOMAIN.  The function
`documentation-property' uses this information when translating a
variable's documentation.


File: lispref.info,  Node: Level 3 Primitives,  Next: Dynamic Messaging,  Prev: Level 3 Basics,  Up: I18N Level 3

Level 3 Primitives
------------------

 - Function: gettext string
     This function looks up STRING in the default message domain and
     returns its translation.  If `I18N3' was not enabled when XEmacs
     was compiled, it just returns STRING.

 - Function: dgettext domain string
     This function looks up STRING in the specified message domain and
     returns its translation.  If `I18N3' was not enabled when XEmacs
     was compiled, it just returns STRING.

 - Function: bind-text-domain domain pathname
     This function associates a pathname with a message domain.  Here's
     how the path to the message file is constructed under SunOS 5.x:

          `{pathname}/{LANG}/LC_MESSAGES/{domain}.mo'

     If `I18N3' was not enabled when XEmacs was compiled, this function
     does nothing.
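
     As an illustration only, the path template above can be written
     out as a small function.  `my-message-file-path' is a
     hypothetical name and is not how XEmacs itself locates message
     files; the directory and locale in the usage note are made up.

```elisp
;; Hypothetical helper: spells out the SunOS 5.x path template shown above.
(defun my-message-file-path (pathname lang domain)
  "Return the message file path for DOMAIN under PATHNAME for locale LANG."
  (format "%s/%s/LC_MESSAGES/%s.mo" pathname lang domain))
```

     For example, `(my-message-file-path "/usr/lib/locale" "ja"
     "emacs-gorilla")' returns
     "/usr/lib/locale/ja/LC_MESSAGES/emacs-gorilla.mo".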

 - Special Form: domain string
     This function specifies the text domain used for translating
     documentation strings and interactive prompts of a function.  For
     example, write:

          (defun foo (arg) "Doc string" (domain "emacs-foo") ...)

     to specify `emacs-foo' as the text domain of the function `foo'.
     The "call" to `domain' is actually a declaration rather than a
     function; when actually called, `domain' just returns `nil'.

 - Function: domain-of function
     This function returns the text domain of FUNCTION; it returns
     `nil' if it is the default domain.  If `I18N3' was not enabled
     when XEmacs was compiled, it always returns `nil'.


File: lispref.info,  Node: Dynamic Messaging,  Next: Domain Specification,  Prev: Level 3 Primitives,  Up: I18N Level 3

Dynamic Messaging
-----------------

The `format' function has been extended to permit you to change the
order of parameter insertion.  For example, the conversion format
`%1$s' inserts parameter one as a string, while `%2$s' inserts
parameter two.  This is useful when creating translations which require
you to change the word order.
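
   A short sketch of the effect follows.  The message templates, the
function name `my-ownership-message', and the argument values are made
up for illustration; only the `%N$s' syntax comes from the description
above.

```elisp
;; Hypothetical helper: TEMPLATE stands in for a translated message that
;; may refer to its two arguments in either order.
(defun my-ownership-message (template file owner)
  "Format TEMPLATE with FILE as parameter one and OWNER as parameter two."
  (format template file owner))

(my-ownership-message "%1$s is owned by %2$s" "file.txt" "steve")
     ;; => "file.txt is owned by steve"
(my-ownership-message "%2$s owns %1$s" "file.txt" "steve")
     ;; => "steve owns file.txt"
```

   The calling code passes the arguments in one fixed order, and each
translation decides where they appear.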


File: lispref.info,  Node: Domain Specification,  Next: Documentation String Extraction,  Prev: Dynamic Messaging,  Up: I18N Level 3

Domain Specification
--------------------

The default message domain of XEmacs is `emacs'.  For add-on packages,
it is best to use a different domain.  For example, let us say we want
to convert the "gorilla" package to use the domain `emacs-gorilla'.  To
translate the message "What gorilla?", use `dgettext' as follows:

     (dgettext "emacs-gorilla" "What gorilla?")

   A function (or macro) which has a documentation string or an
interactive prompt needs to be associated with the domain in order for
the documentation or prompt to be translated.  This is done with the
`domain' special form as follows:

     (defun scratch (location)
       "Scratch the specified location."
       (domain "emacs-gorilla")
       (interactive "sScratch: ")
       ... )

   It is most efficient to specify the domain in the first line of the
function body, before the `interactive' form.

   For variables and constants which have documentation strings,
specify the domain after the documentation.

 - Special Form: defvar symbol [value [doc-string [domain]]]
     Example:
          (defvar weight 250 "Weight of gorilla, in pounds." "emacs-gorilla")

 - Special Form: defconst symbol [value [doc-string [domain]]]
     Example:
          (defconst limbs 4 "Number of limbs" "emacs-gorilla")

 - Function: autoload function filename &optional docstring interactive
          type
     This function defines FUNCTION to autoload from FILENAME.  Example:
          (autoload 'explore "jungle" "Explore the jungle." nil nil "emacs-gorilla")


File: lispref.info,  Node: Documentation String Extraction,  Prev: Domain Specification,  Up: I18N Level 3

Documentation String Extraction
-------------------------------

The utility `etc/make-po' scans the file `DOC' to extract documentation
strings and creates a message file `doc.po'.  This file may then be
inserted within `emacs.po'.

   Currently, `make-po' is hard-coded to read from `DOC' and write to
`doc.po'.  In order to extract documentation strings from an add-on
package, first run `make-docfile' on the package to produce the `DOC'
file.  Then run `make-po -p'; the `-p' argument indicates that we are
extracting documentation for an add-on package.

   (The `-p' argument is a kludge to make up for a subtle difference
between pre-loaded documentation and add-on documentation:  For add-on
packages, the final carriage returns in the strings produced by
`make-docfile' must be ignored.)


File: lispref.info,  Node: I18N Level 4,  Prev: I18N Level 3,  Up: Internationalization

I18N Level 4
============

The Asian-language support in XEmacs is called "MULE".  *Note MULE::.


File: lispref.info,  Node: MULE,  Next: Tips,  Prev: Internationalization,  Up: Top

MULE
****

"MULE" is the name originally given to the version of GNU Emacs
extended for multi-lingual (and in particular Asian-language) support.
"MULE" is short for "MUlti-Lingual Emacs".  It is an extension and
complete rewrite of Nemacs ("Nihon Emacs" where "Nihon" is the Japanese
word for "Japan"), which only provided support for Japanese.  XEmacs
refers to its multi-lingual support as "MULE support" since it is based
on "MULE".

* Menu:

* Internationalization Terminology::
                        Definition of various internationalization terms.
* Charsets::            Sets of related characters.
* MULE Characters::     Working with characters in XEmacs/MULE.
* Composite Characters:: Making new characters by overstriking other ones.
* Coding Systems::      Ways of representing a string of chars using integers.
* CCL::                 A special language for writing fast converters.
* Category Tables::     Subdividing charsets into groups.


File: lispref.info,  Node: Internationalization Terminology,  Next: Charsets,  Up: MULE

Internationalization Terminology
================================

In internationalization terminology, a string of text is divided up
into "characters", which are the printable units that make up the text.
A single character is (for example) a capital `A', the number `2', a
Katakana character, a Hangul character, a Kanji ideograph (an
"ideograph" is a "picture" character, such as is used in Japanese
Kanji, Chinese Hanzi, and Korean Hanja; typically there are thousands
of such ideographs in each language), etc.  The basic property of a
character is that it is the smallest unit of text with semantic
significance in text processing.

   Human beings normally process text visually, so to a first
approximation a character may be identified with its shape.  Note that
the same character may be drawn by two different people (or in two
different fonts) in slightly different ways, although the "basic shape"
will be the same.  But consider the works of Scott Kim; human beings
can recognize hugely variant shapes as the "same" character.
Sometimes, especially where characters are extremely complicated to
write, completely different shapes may be defined as the "same"
character in national standards.  The Taiwanese variant of Hanzi is
generally the most complicated; over the centuries, the Japanese,
Koreans, and the People's Republic of China have adopted
simplifications of the shape, but the line of descent from the original
shape is recorded, and the meanings and pronunciation of different
forms of the same character are considered to be identical within each
language.  (Of course, it may take a specialist to recognize the
related form; the point is that the relations are standardized, despite
the differing shapes.)

   In some cases, the differences will be significant enough that it is
actually possible to identify two or more distinct shapes that both
represent the same character.  For example, the lowercase letters `a'
and `g' each have two distinct possible shapes--the `a' can optionally
have a curved tail projecting off the top, and the `g' can be formed
either of two loops, or of one loop and a tail hanging off the bottom.
Such distinct possible shapes of a character are called "glyphs".  The
important characteristic of two glyphs making up the same character is
that the choice between one or the other is purely stylistic and has no
linguistic effect on a word (this is the reason why a capital `A' and
lowercase `a' are different characters rather than different
glyphs--e.g.  `Aspen' is a city while `aspen' is a kind of tree).

   Note that "character" and "glyph" are used differently here than
elsewhere in XEmacs.

   A "character set" is essentially a set of related characters.  ASCII,
for example, is a set of 94 characters (or 128, if you count
non-printing characters).  Other character sets are ISO8859-1 (ASCII
plus various accented characters and other international symbols), JIS
X 0201 (ASCII, more or less, plus half-width Katakana), JIS X 0208
(Japanese Kanji), JIS X 0212 (a second set of less-used Japanese Kanji),
GB2312 (Mainland Chinese Hanzi), etc.

   The definition of a character set will implicitly or explicitly give
it an "ordering", a way of assigning a number to each character in the
set.  For many character sets, there is a natural ordering, for example
the "ABC" ordering of the Roman letters.  But it is not clear whether
digits should come before or after the letters, and in fact different
European languages treat the ordering of accented characters
differently.  It is useful to use the natural order where available, of
course.  The number assigned to any particular character is called the
character's "code point".  (Within a given character set, each
character has a unique code point.  Thus the word "set" is ill-chosen;
different orderings of the same characters are different character sets.
Identifying characters is simple enough for alphabetic character sets,
but the difference in ordering can cause great headaches when the same
thousands of characters are used by different cultures as in the Hanzi.)

   A code point may be broken into a number of "position codes".  The
number of position codes required to index a particular character in a
character set is called the "dimension" of the character set.  For
practical purposes, a position code may be thought of as a byte-sized
index.  The printing characters of ASCII form a relatively small
character set of dimension one: each character in the set is
indexed using a single position code, in the range 1 through 94.  Use of
this unusual range, rather than the familiar 33 through 126, is an
intentional abstraction; to understand the programming issues you must
break the equation between character sets and encodings.

   JIS X 0208, i.e. Japanese Kanji, has thousands of characters, and is
of dimension two - every character is indexed by two position codes,
each in the range 1 through 94.  (This number "94" is not a
coincidence; we shall see that the JIS position codes were chosen so
that JIS kanji could be encoded without using codes that in ASCII are
associated with device control functions.)  Note that the choice of the
range here is somewhat arbitrary.  You could just as easily index the
printing characters in ASCII using numbers in the range 0 through 93, 2
through 95, 3 through 96, etc.  In fact, the standardized _encoding_
for the ASCII _character set_ uses the range 33 through 126.
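
   The relationship between a character's place in a set and its
position codes can be sketched as follows.  This is only an
illustration in Python, not part of XEmacs: the function name
`position_codes' and the notion of a 0-based linear index are
assumptions made for the example, and the range 1 through 94 is the
convention described above.

```python
def position_codes(index, dimension, chars=94):
    """Split a 0-based linear index into 1-based position codes.

    Hypothetical helper: `index` counts characters in charset order,
    `dimension` is the number of position codes per character.
    """
    codes = []
    for _ in range(dimension):
        index, remainder = divmod(index, chars)
        codes.append(remainder + 1)      # position codes start at 1
    if index:
        raise ValueError("index does not fit in this character set")
    return tuple(reversed(codes))        # most significant code first


print(position_codes(0, 1))       # first character of a 94-charset
print(position_codes(283, 2))     # a character of a 94x94 charset
```

   Under these assumptions, index 283 of a 94x94 set yields the
position codes (4, 2), i.e. row 4, cell 2.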

   An "encoding" is a way of numerically representing characters from
one or more character sets as a stream of like-sized numerical values
called "words"; typically these are 8-bit, 16-bit, or 32-bit
quantities.  If an encoding encompasses only one character set, then the
position codes for the characters in that character set could be used
directly.  (This is the case with the trivial cipher used by children,
assigning 1 to `A', 2 to `B', and so on.)  However, even with ASCII,
other considerations intrude.  For example, why are the upper- and
lowercase alphabets separated by 6 characters?  Why do the digits start
with `0' being assigned the code 48?  In both cases the reason is that
semantically interesting operations (case conversion and numerical
value extraction) become convenient masking operations: corresponding
upper- and lowercase letters differ only in the bit with value 32, and
the low four bits of a digit's code give its numerical value.  Other
artificial aspects (the control characters being assigned to codes 0-31
and 127) are historical accidents.  (The use of 127 for `DEL' is an
artifact of the "punch once" nature of paper tape, for example.)
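
   The masking operations mentioned above can be tried out directly.
The following Python fragment is a small sketch for illustration only;
the helper names `toggle_case' and `digit_value' are invented here, and
both helpers assume their argument really is an ASCII letter or digit.

```python
def toggle_case(letter):
    """Flip the case of an ASCII letter by toggling the bit with value 32."""
    return chr(ord(letter) ^ 0x20)


def digit_value(digit):
    """Return the value of an ASCII digit, taken from its low four bits."""
    return ord(digit) & 0x0F


print(toggle_case("a"), toggle_case("Q"))   # A q
print(digit_value("0"), digit_value("7"))   # 0 7
```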

   Naive use of the position code is not possible, however, if more than
one character set is to be used in the encoding.  For example, printed
Japanese text typically requires characters from multiple character sets
- ASCII, JIS X 0208, and JIS X 0212, to be specific.  Each of these is
indexed using one or more position codes in the range 1 through 94, so
the position codes could not be used directly or there would be no way
to tell which character was meant.  Different Japanese encodings handle
this differently - JIS uses special escape characters to denote
different character sets; EUC sets the high bit of the position codes
for JIS X 0208 and JIS X 0212, and puts a special extra byte before each
JIS X 0212 character; etc.  (JIS, EUC, and most of the other encodings
you will encounter in files are 7-bit or 8-bit encodings.  There is one
common 16-bit encoding, which is Unicode; this strives to represent all
the world's characters in a single large character set.  32-bit
encodings are often used internally in programs, such as XEmacs with
MULE support, to simplify the code that manipulates them; however, they
are not used externally because they are not very space-efficient.)

   A general method of handling text using multiple character sets
(whether for multilingual text, or simply text in an extremely
complicated single language like Japanese) is defined in the
international standard ISO 2022.  ISO 2022 will be discussed in more
detail later (*note ISO 2022::), but for now suffice it to say that text
needs control functions (at least spacing), and if escape sequences are
to be used, an escape sequence introducer.  It was decided to make all
text streams compatible with ASCII in the sense that the codes 0-31
(and 128-159) would always be control codes, never graphic characters,
and where defined by the character set the `SPC' character would be
assigned code 32, and `DEL' would be assigned 127.  Thus there are 94
code points remaining if 7 bits are used.  This is the reason that most
character sets are defined using position codes in the range 1 through
94.  Then ISO 2022 compatible encodings are produced by shifting the
position codes 1 to 94 into character codes 33 to 126, or (if 8 bit
codes are available) into character codes 161 to 254.
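
   The shift described in the last two sentences is plain addition.
The sketch below, in Python and for illustration only, uses an invented
helper name, `shift_position_codes'; the comparison with Python's own
`euc_jp' codec assumes that the Hiragana letter A occupies row 4, cell
2 of JIS X 0208.

```python
def shift_position_codes(codes, eight_bit=False):
    """Map position codes 1..94 to character codes 33..126 or 161..254."""
    offset = 160 if eight_bit else 32
    for code in codes:
        if code not in range(1, 95):
            raise ValueError("position codes must lie in 1 through 94")
    return bytes(code + offset for code in codes)


print(shift_position_codes((1,)), shift_position_codes((94,)))
print(shift_position_codes((4, 2), eight_bit=True))
# Cross-check against the standard codec (U+3042 is HIRAGANA LETTER A).
print("\u3042".encode("euc_jp"))
```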

   Encodings are classified as either "modal" or "non-modal".  In a
"modal encoding", there are multiple states that the encoding can be
in, and the interpretation of the values in the stream depends on the
current global state of the encoding.  Special values in the encoding,
called "escape sequences", are used to change the global state.  JIS,
for example, is a modal encoding.  The bytes `ESC $ B' indicate that,
from then on, bytes are to be interpreted as position codes for JIS X
0208, rather than as ASCII.  This effect is cancelled using the bytes
`ESC ( B', which mean "switch from whatever the current state is to
ASCII".  To switch to JIS X 0212, the escape sequence `ESC $ ( D' is
used.  (Note that here, as is common, the escape sequences do in fact
begin with `ESC'.  This is not necessarily the case, however.  Some
encodings use control characters called "locking shifts" (effect
persists until cancelled) to switch character sets.)
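
   A decoder for a modal encoding has to carry the global state along
as it reads.  The following Python sketch is illustrative only and is
not how XEmacs implements JIS: the function name `decode_modal' is
invented, only the two escape sequences `ESC $ B' and `ESC ( B' are
handled, and truncated input is not checked for.

```python
ESC = 0x1B


def decode_modal(data):
    """Tokenize a JIS-style byte string into (charset, value...) tuples."""
    state = "ascii"          # global state, changed only by escape sequences
    tokens = []
    i = 0
    while i < len(data):
        if data[i] == ESC:
            sequence = data[i + 1:i + 3]
            if sequence == b"$B":
                state = "jisx0208"
            elif sequence == b"(B":
                state = "ascii"
            else:
                raise ValueError("escape sequence not handled by this sketch")
            i += 3
        elif state == "ascii":
            tokens.append(("ascii", chr(data[i])))
            i += 1
        else:
            # Two bytes in the range 33..126 become position codes 1..94.
            tokens.append(("jisx0208", data[i] - 32, data[i + 1] - 32))
            i += 2
    return tokens


print(decode_modal(b'A\x1b$B$"\x1b(BA'))
```

   The same pair of bytes would be read as two ASCII characters if the
escape sequence in front of it were missing; the meaning lives in the
state, not in the bytes themselves.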

   A "non-modal encoding" has no global state that extends past the
character currently being interpreted.  EUC, for example, is a
non-modal encoding.  Characters in JIS X 0208 are encoded by setting
the high bit of the position codes, and characters in JIS X 0212 are
encoded by doing the same but also prefixing the character with the
byte 0x8F.
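
   Because no state is carried between characters, the extent of each
character can be decided from its first byte alone.  The Python sketch
below illustrates this for the three cases just described; the name
`split_euc' is invented, and other prefix bytes that real EUC variants
may define are deliberately ignored.

```python
def split_euc(data):
    """Split an EUC-style byte string into one byte string per character."""
    characters = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            width = 1        # ASCII: high bit clear
        elif lead == 0x8F:
            width = 3        # prefix byte, then two high-bit position codes
        else:
            width = 2        # JIS X 0208: two high-bit position codes
        characters.append(data[i:i + width])
        i += width
    return characters


print(split_euc(b"A\xa4\xa2\x8f\xb0\xa1B"))
```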

   The advantage of a modal encoding is that it is generally more
space-efficient, and is easily extendible because there are essentially
an arbitrary number of escape sequences that can be created.  The
disadvantage, however, is that it is much more difficult to work with
if it is not being processed in a sequential manner.  In the non-modal
EUC encoding, for example, the byte 0x41 always refers to the letter
`A'; whereas in JIS, it could either be the letter `A', or one of the
two position codes in a JIS X 0208 character, or one of the two
position codes in a JIS X 0212 character.  Determining exactly which
one is meant could be difficult and time-consuming if the previous
bytes in the string have not already been processed, or impossible if
they are drawn from an external stream that cannot be rewound.
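
   The ambiguity of the byte 0x41 can be observed with Python's
standard codecs, used here purely as a convenient stand-in for the JIS
and EUC encodings discussed above.  The example assumes that the codec
names `iso2022_jp' and `euc_jp' correspond to those encodings, and uses
the Hiragana letter TI (U+3061), whose second JIS byte happens to be
0x41.

```python
letter = "\u3061"                        # HIRAGANA LETTER TI

jis = letter.encode("iso2022_jp")        # modal: escape, two bytes, escape
euc = letter.encode("euc_jp")            # non-modal: two high-bit bytes

print(jis)
print(euc)
print(0x41 in jis, 0x41 in euc)          # is the byte for `A' present?
```

   In the modal form the byte 0x41 appears inside a two-byte
character, so a reader that starts in the middle of the stream cannot
tell it from the letter `A'; in the non-modal form it does not appear
at all.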

   Non-modal encodings are further divided into "fixed-width" and
"variable-width" formats.  A fixed-width encoding always uses the same
number of words per character, whereas a variable-width encoding does
not.  EUC is a good example of a variable-width encoding: one to three
bytes are used per character, depending on the character set.  16-bit
and 32-bit encodings are nearly always fixed-width, and this is in fact
one of the main reasons for using an encoding with a larger word size.
The advantage of fixed-width encodings is that the Nth character of a
text can be found by simple arithmetic, without scanning.  The
advantages of variable-width encodings are that they are generally more
space-efficient and allow for compatibility with existing 8-bit
encodings such as ASCII.  (For example, in Unicode ASCII characters are
simply promoted to a 16-bit representation.  That means that every
ASCII character contains a `NUL' byte; evidently all of the standard
string manipulation functions will lose badly in a fixed-width Unicode
environment.)
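
   The trade-off can be seen with Python's standard codecs, which are
used here only as an illustration: `utf-16-be' stands in for a 16-bit
fixed-width Unicode representation (for the two characters shown) and
`euc_jp' for a variable-width encoding.

```python
text = "A\u3042"                         # `A' and HIRAGANA LETTER A

# Bytes used per character in each encoding.
print([len(ch.encode("utf-16-be")) for ch in text])
print([len(ch.encode("euc_jp")) for ch in text])

# The promoted ASCII character contains a NUL byte.
print("A".encode("utf-16-be"))
```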

   The bytes in an 8-bit encoding are often referred to as "octets"
rather than simply as bytes.  This terminology dates back to the days
before 8-bit bytes were universal, when some computers had 9-bit bytes,
others had 10-bit bytes, etc.


File: lispref.info,  Node: Charsets,  Next: MULE Characters,  Prev: Internationalization Terminology,  Up: MULE

Charsets
========

A "charset" in MULE is an object that encapsulates a particular
character set as well as an ordering of those characters.  Charsets are
permanent objects and are named using symbols, like faces.

 - Function: charsetp object
     This function returns non-`nil' if OBJECT is a charset.

* Menu:

* Charset Properties::          Properties of a charset.
* Basic Charset Functions::     Functions for working with charsets.
* Charset Property Functions::  Functions for accessing charset properties.
* Predefined Charsets::         Predefined charset objects.


File: lispref.info,  Node: Charset Properties,  Next: Basic Charset Functions,  Up: Charsets

Charset Properties
------------------

Charsets have the following properties:

`name'
     A symbol naming the charset.  Every charset must have a different
     name; this allows a charset to be referred to using its name
     rather than the actual charset object.

`doc-string'
     A documentation string describing the charset.

`registry'
     A regular expression matching the font registry field for this
     character set.  For example, both the `ascii' and `latin-iso8859-1'
     charsets use the registry `"ISO8859-1"'.  This field is used to
     choose an appropriate font when the user gives a general font
     specification such as `-*-courier-medium-r-*-140-*', i.e. a
     14-point upright medium-weight Courier font.

`dimension'
     Number of position codes used to index a character in the
     character set.  XEmacs/MULE can only handle character sets of
     dimension 1 or 2.  This property defaults to 1.

`chars'
     Number of characters in each dimension.  In XEmacs/MULE, the only
     allowed values are 94 or 96. (There are a couple of pre-defined
     character sets, such as ASCII, that do not follow this, but you
     cannot define new ones like this.) Defaults to 94.  Note that if
     the dimension is 2, the character set thus described is 94x94 or
     96x96.

`columns'
     Number of columns used to display a character in this charset.
     Only used in TTY mode. (Under X, the actual width of a character
     can be derived from the font used to display the characters.)  If
     unspecified, defaults to the dimension. (This is almost always the
     correct value, because character sets with dimension 2 are usually
     ideograph character sets, which need two columns to display the
     intricate ideographs.)

`direction'
     A symbol, either `l2r' (left-to-right) or `r2l' (right-to-left).
     Defaults to `l2r'.  This specifies the direction that the text
     should be displayed in, and will be left-to-right for most
     charsets but right-to-left for Hebrew and Arabic. (Right-to-left
     display is not currently implemented.)

`final'
     Final byte of the standard ISO 2022 escape sequence designating
     this charset.  Must be supplied.  Each combination of (DIMENSION,
     CHARS) defines a separate namespace for final bytes, and each
     charset within a particular namespace must have a different final
     byte.  Note that ISO 2022 restricts the final byte to the range
     0x30 - 0x7E if dimension == 1, and 0x30 - 0x5F if dimension == 2.
     Note also that final bytes in the range 0x30 - 0x3F are reserved
     for user-defined (not official) character sets.  For more
     information on ISO 2022, see *Note Coding Systems::.

`graphic'
     0 (use left half of font on output) or 1 (use right half of font on
     output).  Defaults to 0.  This specifies how to convert the
     position codes that index a character in a character set into an
     index into the font used to display the character set.  With
     `graphic' set to 0, position codes 33 through 126 map to font
     indices 33 through 126; with it set to 1, position codes 33
     through 126 map to font indices 161 through 254 (i.e. the same
     number but with the high bit set).  For example, for a font whose
     registry is ISO8859-1, the left half of the font (octets 0x20 -
     0x7F) is the `ascii' charset, while the right half (octets 0xA0 -
     0xFF) is the `latin-iso8859-1' charset.

`ccl-program'
     A compiled CCL program used to convert a character in this charset
     into an index into the font.  This is in addition to the `graphic'
     property.  If a CCL program is defined, the position codes of a
     character will first be processed according to `graphic' and then
     passed through the CCL program, with the resulting values used to
     index the font.

     This is used, for example, in the Big5 character set (used in
     Taiwan).  This character set is not ISO-2022-compliant, and its
     size (94x157) does not fit within the maximum 96x96 size of
     ISO-2022-compliant character sets.  As a result, XEmacs/MULE
     splits it (in a rather complex fashion, so as to group the most
     commonly used characters together) into two charset objects
     (`big5-1' and `big5-2'), each of size 94x94, and each charset
     object uses a CCL program to convert the modified position codes
     back into standard Big5 indices to retrieve a character from a
     Big5 font.

   Most of the above properties can only be set when the charset is
initialized, and cannot be changed later.  *Note Charset Property
Functions::.
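
   The effect of the `graphic' property described above amounts to
setting or clearing the high bit.  The following Python fragment is a
sketch of that arithmetic only, not of the XEmacs redisplay code; the
function name `font_index' is invented for the example.

```python
def font_index(position_code, graphic):
    """Map a position code 33..126 to a font index, per `graphic'."""
    if graphic not in (0, 1):
        raise ValueError("graphic must be 0 or 1")
    if position_code not in range(33, 127):
        raise ValueError("position code must lie in 33 through 126")
    # graphic 0 uses the left half of the font, graphic 1 the right half.
    return position_code | 0x80 if graphic else position_code


print(font_index(33, 0), font_index(126, 0))
print(font_index(33, 1), font_index(126, 1))
```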


File: lispref.info,  Node: Basic Charset Functions,  Next: Charset Property Functions,  Prev: Charset Properties,  Up: Charsets

Basic Charset Functions
-----------------------

 - Function: find-charset charset-or-name
     This function retrieves the charset of the given name.  If
     CHARSET-OR-NAME is a charset object, it is simply returned.
     Otherwise, CHARSET-OR-NAME should be a symbol.  If there is no
     such charset, `nil' is returned.  Otherwise the associated charset
     object is returned.

 - Function: get-charset name
     This function retrieves the charset of the given name.  It is the
     same as `find-charset' except that an error is signalled if there
     is no such charset, instead of `nil' being returned.

 - Function: charset-list
     This function returns a list of the names of all defined charsets.

 - Function: make-charset name doc-string props
     This function defines a new character set.  This function is for
     use with MULE support.  NAME is a symbol, the name by which the
     character set is normally referred.  DOC-STRING is a string
     describing the character set.  PROPS is a property list,
     describing the specific nature of the character set.  The
     recognized properties are `registry', `dimension', `columns',
     `chars', `final', `graphic', `direction', and `ccl-program', as
     previously described.

 - Function: make-reverse-direction-charset charset new-name
     This function makes a charset equivalent to CHARSET but which goes
     in the opposite direction.  NEW-NAME is the name of the new
     charset.  The new charset is returned.

 - Function: charset-from-attributes dimension chars final &optional
          direction
     This function returns a charset with the given DIMENSION, CHARS,
     FINAL, and DIRECTION.  If DIRECTION is omitted, both directions
     will be checked (left-to-right will be returned if character sets
     exist for both directions).

 - Function: charset-reverse-direction-charset charset
     This function returns the charset (if any) with the same dimension,
     number of characters, and final byte as CHARSET, but which is
     displayed in the opposite direction.
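
   The lookup rule of `charset-from-attributes' can be modelled
outside XEmacs.  The Python sketch below is only a model: the table
holds a handful of rows copied from the predefined charsets, and the
function and dictionary names are invented for the illustration.  It
shows that the same final byte may be reused across different
(DIMENSION, CHARS) combinations, and that left-to-right is preferred
when no direction is given.

```python
# (dimension, chars, final, direction) -> charset name
CHARSETS = {
    (1, 94, "B", "l2r"): "ascii",
    (1, 94, "B", "r2l"): "ascii-r2l",
    (1, 96, "B", "l2r"): "latin-iso8859-2",
    (1, 96, "H", "r2l"): "hebrew-iso8859-8",
    (2, 94, "B", "l2r"): "japanese-jisx0208",
}


def charset_from_attributes(dimension, chars, final, direction=None):
    """Return the matching charset name, or None if there is none."""
    directions = (direction,) if direction else ("l2r", "r2l")
    for candidate in directions:
        name = CHARSETS.get((dimension, chars, final, candidate))
        if name is not None:
            return name
    return None


print(charset_from_attributes(1, 94, "B"))
print(charset_from_attributes(1, 94, "B", "r2l"))
print(charset_from_attributes(2, 94, "B"))
```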


File: lispref.info,  Node: Charset Property Functions,  Next: Predefined Charsets,  Prev: Basic Charset Functions,  Up: Charsets

Charset Property Functions
--------------------------

All of these functions accept either a charset name or charset object.

 - Function: charset-property charset prop
     This function returns property PROP of CHARSET.  *Note Charset
     Properties::.

   Convenience functions are also provided for retrieving individual
properties of a charset.

 - Function: charset-name charset
     This function returns the name of CHARSET.  This will be a symbol.

 - Function: charset-description charset
     This function returns the documentation string of CHARSET.

 - Function: charset-registry charset
     This function returns the registry of CHARSET.

 - Function: charset-dimension charset
     This function returns the dimension of CHARSET.

 - Function: charset-chars charset
     This function returns the number of characters per dimension of
     CHARSET.

 - Function: charset-width charset
     This function returns the number of display columns per character
     (in TTY mode) of CHARSET.

 - Function: charset-direction charset
     This function returns the display direction of CHARSET--either
     `l2r' or `r2l'.

 - Function: charset-iso-final-char charset
     This function returns the final byte of the ISO 2022 escape
     sequence designating CHARSET.

 - Function: charset-iso-graphic-plane charset
     This function returns either 0 or 1, depending on whether the
     position codes of characters in CHARSET map to the left or right
     half of their font, respectively.

 - Function: charset-ccl-program charset
     This function returns the CCL program, if any, for converting
     position codes of characters in CHARSET into font indices.

   The two properties of a charset that can currently be set after the
charset has been created are the CCL program and the font registry.

 - Function: set-charset-ccl-program charset ccl-program
     This function sets the `ccl-program' property of CHARSET to
     CCL-PROGRAM.

 - Function: set-charset-registry charset registry
     This function sets the `registry' property of CHARSET to REGISTRY.


File: lispref.info,  Node: Predefined Charsets,  Prev: Charset Property Functions,  Up: Charsets

Predefined Charsets
-------------------

The following charsets are predefined in the C code.

     Name                     Type  Fi Gr Dir Registry
     --------------------------------------------------------------
     ascii                    94    B  0  l2r ISO8859-1
     control-1                94       0  l2r ---
     latin-iso8859-1          96    A  1  l2r ISO8859-1
     latin-iso8859-2          96    B  1  l2r ISO8859-2
     latin-iso8859-3          96    C  1  l2r ISO8859-3
     latin-iso8859-4          96    D  1  l2r ISO8859-4
     cyrillic-iso8859-5       96    L  1  l2r ISO8859-5
     arabic-iso8859-6         96    G  1  r2l ISO8859-6
     greek-iso8859-7          96    F  1  l2r ISO8859-7
     hebrew-iso8859-8         96    H  1  r2l ISO8859-8
     latin-iso8859-9          96    M  1  l2r ISO8859-9
     thai-tis620              96    T  1  l2r TIS620
     katakana-jisx0201        94    I  1  l2r JISX0201.1976
     latin-jisx0201           94    J  0  l2r JISX0201.1976
     japanese-jisx0208-1978   94x94 @  0  l2r JISX0208.1978
     japanese-jisx0208        94x94 B  0  l2r JISX0208.19(83|90)
     japanese-jisx0212        94x94 D  0  l2r JISX0212
     chinese-gb2312           94x94 A  0  l2r GB2312
     chinese-cns11643-1       94x94 G  0  l2r CNS11643.1
     chinese-cns11643-2       94x94 H  0  l2r CNS11643.2
     chinese-big5-1           94x94 0  0  l2r Big5
     chinese-big5-2           94x94 1  0  l2r Big5
     korean-ksc5601           94x94 C  0  l2r KSC5601
     composite                96x96    0  l2r ---

   The following charsets are predefined in the Lisp code.

     Name                     Type  Fi Gr Dir Registry
     --------------------------------------------------------------
     arabic-digit             94    2  0  l2r MuleArabic-0
     arabic-1-column          94    3  0  r2l MuleArabic-1
     arabic-2-column          94    4  0  r2l MuleArabic-2
     sisheng                  94    0  0  l2r sisheng_cwnn\|OMRON_UDC_ZH
     chinese-cns11643-3       94x94 I  0  l2r CNS11643.1
     chinese-cns11643-4       94x94 J  0  l2r CNS11643.1
     chinese-cns11643-5       94x94 K  0  l2r CNS11643.1
     chinese-cns11643-6       94x94 L  0  l2r CNS11643.1
     chinese-cns11643-7       94x94 M  0  l2r CNS11643.1
     ethiopic                 94x94 2  0  l2r Ethio
     ascii-r2l                94    B  0  r2l ISO8859-1
     ipa                      96    0  1  l2r MuleIPA
     vietnamese-viscii-lower  96    1  1  l2r VISCII1.1
     vietnamese-viscii-upper  96    2  1  l2r VISCII1.1

   For all of the above charsets, the dimension and number of columns
are the same.

   Note that ASCII, Control-1, and Composite are handled specially.
This is why some of the fields are blank; and some of the filled-in
fields (e.g. the type) are not really accurate.


File: lispref.info,  Node: MULE Characters,  Next: Composite Characters,  Prev: Charsets,  Up: MULE

MULE Characters
===============

 - Function: make-char charset arg1 &optional arg2
     This function makes a multi-byte character from CHARSET and octets
     ARG1 and ARG2.

 - Function: char-charset character
     This function returns the character set of char CHARACTER.

 - Function: char-octet character &optional n
     This function returns the octet (i.e. position code) numbered N
     (should be 0 or 1) of char CHARACTER.  N defaults to 0 if omitted.

 - Function: find-charset-region start end &optional buffer
     This function returns a list of the charsets in the region between
     START and END.  BUFFER defaults to the current buffer if omitted.

 - Function: find-charset-string string
     This function returns a list of the charsets in STRING.


File: lispref.info,  Node: Composite Characters,  Next: Coding Systems,  Prev: MULE Characters,  Up: MULE

Composite Characters
====================

Composite characters are not yet completely implemented.

 - Function: make-composite-char string
     This function converts a string into a single composite character.
     The character is the result of overstriking all the characters in
     the string.

 - Function: composite-char-string character
     This function returns a string of the characters comprising a
     composite character.

 - Function: compose-region start end &optional buffer
     This function composes the characters in the region from START to
     END in BUFFER into one composite character.  The composite
     character replaces the composed characters.  BUFFER defaults to
     the current buffer if omitted.

 - Function: decompose-region start end &optional buffer
     This function decomposes any composite characters in the region
     from START to END in BUFFER.  This converts each composite
     character into one or more characters, the individual characters
     out of which the composite character was formed.  Non-composite
     characters are left as-is.  BUFFER defaults to the current buffer
     if omitted.


File: lispref.info,  Node: Coding Systems,  Next: CCL,  Prev: Composite Characters,  Up: MULE

Coding Systems
==============

A coding system is an object that defines how text containing multiple
character sets is encoded into a stream of (typically 8-bit) bytes.  The
coding system is used to decode the stream into a series of characters
(which may be from multiple charsets) when the text is read from a file
or process, and is used to encode the text back into the same format
when it is written out to a file or process.

   For example, many ISO-2022-compliant coding systems (such as Compound
Text, which is used for inter-client data under the X Window System) use
escape sequences to switch between different charsets - Japanese Kanji,
for example, is invoked with `ESC $ ( B'; ASCII is invoked with `ESC (
B'; and Cyrillic is invoked with `ESC - L'.  See `make-coding-system'
for more information.
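
   The three escape sequences quoted above follow a pattern that
depends on the charset's dimension, size, and final byte.  The Python
sketch below reproduces only those three forms; the function name
`designation' is invented, and the other forms that ISO 2022 defines
are deliberately left out.

```python
ESC = b"\x1b"


def designation(dimension, chars, final):
    """Build the escape sequence for the three forms quoted in the text."""
    if (dimension, chars) == (1, 94):
        return ESC + b"(" + final        # e.g. ASCII: ESC ( B
    if (dimension, chars) == (2, 94):
        return ESC + b"$(" + final       # e.g. Japanese Kanji: ESC $ ( B
    if (dimension, chars) == (1, 96):
        return ESC + b"-" + final        # e.g. Cyrillic: ESC - L
    raise ValueError("form not covered by this sketch")


print(designation(1, 94, b"B"))
print(designation(2, 94, b"B"))
print(designation(1, 96, b"L"))
```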

   Coding systems are normally identified using a symbol, and the
symbol is accepted in place of the actual coding system object whenever
a coding system is called for. (This is similar to how faces and
charsets work.)

 - Function: coding-system-p object
     This function returns non-`nil' if OBJECT is a coding system.

* Menu:

* Coding System Types::               Classifying coding systems.
* ISO 2022::                          An international standard for
                                        charsets and encodings.
* EOL Conversion::                    Dealing with different ways of denoting
                                        the end of a line.
* Coding System Properties::          Properties of a coding system.
* Basic Coding System Functions::     Working with coding systems.
* Coding System Property Functions::  Retrieving a coding system's properties.
* Encoding and Decoding Text::        Encoding and decoding text.
* Detection of Textual Encoding::     Determining how text is encoded.
* Big5 and Shift-JIS Functions::      Special functions for these non-standard
                                        encodings.
* Predefined Coding Systems::         Coding systems implemented by MULE.


File: lispref.info,  Node: Coding System Types,  Next: ISO 2022,  Up: Coding Systems

Coding System Types
-------------------

The coding system type determines the basic algorithm XEmacs will use to
decode or encode a data stream.  Character encodings will be converted
to the MULE encoding, escape sequences processed, and newline sequences
converted to XEmacs's internal representation.  There are three basic
classes of coding system type: no-conversion, ISO-2022, and special.

   No conversion allows you to look at the file's internal
representation.  Since XEmacs is basically a text editor, "no
conversion" does convert newline conventions by default.  (Use the
`binary' coding system if this is not desired.)

   ISO 2022 (*note ISO 2022::) is the basic international standard
regulating use of "coded character sets for the exchange of data", i.e.
text streams.  ISO 2022 contains functions that make it possible to
encode text streams to comply with restrictions of the Internet mail
system and de facto restrictions of most file systems (e.g. use of the
separator character in file names).  Coding systems which are not ISO
2022 conformant can be difficult to handle.  Perhaps more important,
they are not adaptable to multilingual information interchange, with
the obvious exception of ISO 10646 (Unicode).  (Unicode is partially
supported by XEmacs with the addition of the Lisp package ucs-conv.)

   The special class of coding systems includes automatic detection,
CCL (a "little language" embedded as an interpreter, useful for
translating between variants of a single character set),
non-ISO-2022-conformant encodings like Unicode, Shift JIS, and Big5,
and MULE internal coding.  (NB: this list is based on XEmacs 21.2.
Terminology may vary slightly for other versions of XEmacs and for GNU
Emacs 20.)

`no-conversion'
     No conversion, for binary files, and a few special cases of
     non-ISO-2022 coding systems where conversion is done by hook
     functions (usually implemented in CCL).  On output, graphic
     characters that are not in ASCII or Latin-1 will be replaced by a
     `?'. (For a no-conversion-encoded buffer, these characters will
     only be present if you explicitly insert them.)

`iso2022'
     Any ISO-2022-compliant encoding.  Among others, this includes JIS
     (the Japanese encoding commonly used for e-mail), national
     variants of EUC (the standard Unix encoding for Japanese and other
     languages), and Compound Text (an encoding used in X11).  You can
     specify more specific information about the conversion with the
     FLAGS argument.

`ucs-4'
     ISO 10646 UCS-4 encoding.  A 31-bit fixed-width superset of
     Unicode.

`utf-8'
     ISO 10646 UTF-8 encoding.  A "file system safe" transformation
     format that can be used with both UCS-4 and Unicode.

`undecided'
     Automatic conversion.  XEmacs attempts to detect the coding system
     used in the file.

`shift-jis'
     Shift-JIS (a Japanese encoding commonly used in PC operating
     systems).

`big5'
     Big5 (the Chinese encoding commonly used in Taiwan).

`ccl'
     The conversion is performed using a user-written pseudo-code
     program.  CCL (Code Conversion Language) is the name of this
     pseudo-code.  For example, CCL is used to map KOI8-R characters
     (an encoding for Russian Cyrillic) to ISO8859-5 (the form used
     internally by MULE).

`internal'
     Write out or read in the raw contents of the memory representing
     the buffer's text.  This is primarily useful for debugging
     purposes, and is only enabled when XEmacs has been compiled with
     `DEBUG_XEMACS' set (the `--debug' configure option).  *Warning*:
     Reading in a file using `internal' conversion can result in an
     internal inconsistency in the memory representing a buffer's text,
     which will produce unpredictable results and may cause XEmacs to
     crash.  Under normal circumstances you should never use `internal'
     conversion.


File: lispref.info,  Node: ISO 2022,  Next: EOL Conversion,  Prev: Coding System Types,  Up: Coding Systems

ISO 2022
========

This section briefly describes the ISO 2022 encoding standard.  A more
thorough treatment is available in the original document of ISO 2022 as
well as various national standards (such as JIS X 0202).

   Character sets ("charsets") are classified into the following four
categories, according to the number of characters in the charset:
94-charset, 96-charset, 94x94-charset, and 96x96-charset.  This means
that although an ISO 2022 coding system may have variable width
characters, each charset used is fixed-width (in contrast to the MULE
character set and UTF-8, for example).

   ISO 2022 provides for switching between character sets via escape
sequences.  This switching is somewhat complicated, because ISO 2022
provides for both legacy applications like Internet mail that accept
only 7 significant bits in some contexts (RFC 822 headers, for example),
and more modern "8-bit clean" applications.  It also provides for
compact and transparent representation of languages like Japanese which
mix ASCII and a national script (even outside of computer programs).

   First, ISO 2022 codified prevailing practice by dividing the code
space into "control" and "graphic" regions.  The code points 0x00-0x1F
and 0x80-0x9F are reserved for "control characters", while "graphic
characters" must be assigned to code points in the regions 0x20-0x7F and
0xA0-0xFF.  The positions 0x20 and 0x7F are special, and under some
circumstances must be assigned the graphic character "ASCII SPACE" and
the control character "ASCII DEL" respectively.

   The various regions are given the names C0 (0x00-0x1F), GL
(0x20-0x7F), C1 (0x80-0x9F), and GR (0xA0-0xFF).  GL and GR stand for
"graphic left" and "graphic right", respectively, because of the
standard method of displaying graphic character sets in tables with the
high nibble indexing columns and the low nibble indexing rows.  I don't
find it very intuitive, but these are called "registers".
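
   The region boundaries above can be sketched in a few lines.  This is
an illustrative Python fragment, not XEmacs code; the function name
`iso2022_region' is hypothetical.

```python
def iso2022_region(byte: int) -> str:
    """Return the ISO 2022 region name (C0, GL, C1, GR) for one byte."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0x00-0xFF")
    if byte <= 0x1F:
        return "C0"   # control characters
    if byte <= 0x7F:
        return "GL"   # graphic left (0x20 and 0x7F are special cases)
    if byte <= 0x9F:
        return "C1"   # control characters with the high bit set
    return "GR"       # graphic right


# Newline, 'A', a C1 control, and a typical EUC lead byte.
regions = [iso2022_region(b) for b in (0x0A, 0x41, 0x8E, 0xA4)]
print(regions)
```

   Note that the sketch only names the region a byte falls in; it says
nothing about which character set is currently invoked there.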

   An ISO 2022-conformant encoding for a graphic character set must use
a fixed number of bytes per character, and the values must fit into a
single register; that is, each byte must range over either 0x20-0x7F, or
0xA0-0xFF.  It is not allowed to extend the range of the repertoire of a
character set by using both ranges at the same time.  This is why a
standard character set such as ISO 8859-1 is actually considered by ISO
2022 to be an aggregation of two character sets, ASCII and LATIN-1, and
why it is technically incorrect to refer to ISO 8859-1 as "Latin 1".
Also, a single character's bytes must all be drawn from the same
register; this is why Shift JIS (for Japanese) and Big 5 (for Chinese)
are not ISO 2022-compatible encodings.

   The reason for this restriction becomes clear when you attempt to
define an efficient, robust encoding for a language like Japanese.
Like ISO 8859, Japanese encodings are aggregations of several character
sets.  In practice, the vast majority of characters are drawn from the
"JIS Roman" character set (a derivative of ASCII; it won't hurt to
think of it as ASCII) and the JIS X 0208 standard "basic Japanese"
character set including not only ideographic characters ("kanji") but
syllabic Japanese characters ("kana"), a wide variety of symbols, and
many alphabetic characters (Roman, Greek, and Cyrillic) as well.
Although JIS X 0208 includes the whole Roman alphabet, as a 2-byte code
it is not suited to programming; thus the inclusion of ASCII in the
standard Japanese encodings.

   For normal Japanese text such as in newspapers, a broad repertoire of
approximately 3000 characters is used.  Evidently this won't fit into
one byte; two must be used.  But much of the text processed by Japanese
computers is computer source code, nearly all of which is ASCII.  A not
insignificant portion of ordinary text is English (as such or as
borrowed Japanese vocabulary) or other languages which can be represented
at least approximately in ASCII, as well.  It seems reasonable then to
represent ASCII in one byte, and JIS X 0208 in two.  And this is exactly
what the Extended Unix Code for Japanese (EUC-JP) does.  ASCII is
invoked to the GL register, and JIS X 0208 is invoked to the GR
register.  Thus, each byte can be tested for its character set by
looking at the high bit; if set, it is Japanese, if clear, it is ASCII.
Furthermore, since control characters like newline can never be part of
a graphic character, even in the case of corruption in transmission the
stream will be resynchronized at every line break, on the order of 60-80
bytes.  This coding system requires no escape sequences or special
control codes to represent 99.9% of all Japanese text.
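
   The high-bit test described above can be sketched as follows.  This
is an illustrative Python fragment, not the MULE decoder; the name
`split_euc_jp' is hypothetical, and the sketch assumes that only ASCII
and JIS X 0208 occur in the input (the other EUC-JP character sets are
rejected rather than handled).

```python
from typing import List, Tuple


def split_euc_jp(data: bytes) -> List[Tuple[str, bytes]]:
    """Split EUC-JP bytes into (charset, bytes) units by the high bit."""
    units = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte < 0x80:
            # High bit clear: a single ASCII (or control) byte.
            units.append(("ascii", data[i:i + 1]))
            i += 1
        elif 0xA1 <= byte <= 0xFE:
            # High bit set: a two-byte character, both bytes in GR.
            pair = data[i:i + 2]
            if len(pair) != 2 or not 0xA1 <= pair[1] <= 0xFE:
                raise ValueError("truncated or corrupt two-byte character")
            units.append(("jisx0208", pair))
            i += 2
        else:
            raise ValueError("byte 0x%02X is outside this sketch" % byte)
    return units


sample = "XEmacs は".encode("euc_jp")
units = split_euc_jp(sample)
for charset, raw in units:
    print(charset, raw.hex())
```

   Because a newline is always a single byte with the high bit clear,
a scanner of this kind can restart cleanly at the next line break after
hitting corrupt input.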

   Note carefully the distinction between the character sets (ASCII and
JIS X 0208), the encoding (EUC-JP), and the coding system (ISO 2022).
The JIS X 0208 character set is used in three different encodings for
Japanese, but in ISO-2022-JP it is invoked into GL (so the high bit is
always clear), in EUC-JP it is invoked into GR (setting the high bit in
the process), and in Shift JIS the high bit may be set or reset, and the
significant bits are shifted within the 16-bit character so that the two
main character sets can coexist with a third (the "halfwidth katakana"
of JIS X 0201).  As the name implies, the ISO-2022-JP encoding is also a
version of the ISO-2022 coding system.
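
   The difference between the three encodings is easy to see on a
single character.  The following sketch uses Python's standard codecs
(`iso2022_jp', `euc_jp', and `shift_jis') rather than XEmacs, so it
illustrates the byte layouts only.

```python
text = "あ"  # HIRAGANA LETTER A, a JIS X 0208 character

iso2022_jp = text.encode("iso2022_jp")  # designated by ESC $ B, bytes in GL
euc_jp = text.encode("euc_jp")          # the same two bytes, invoked into GR
shift_jis = text.encode("shift_jis")    # bits rearranged; not ISO 2022

# Drop the leading ESC $ B and the trailing ESC ( B to get the raw code.
jis_code = iso2022_jp[3:5]

for name, data in (("ISO-2022-JP", iso2022_jp),
                   ("EUC-JP", euc_jp),
                   ("Shift JIS", shift_jis)):
    print("%-12s %s" % (name, " ".join("%02X" % b for b in data)))
```

   The EUC-JP bytes are exactly the ISO-2022-JP bytes with the high bit
set, while the Shift JIS bytes are not related to them by a simple
offset.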

   In order to systematically treat subsidiary character sets (like the
"halfwidth katakana" already mentioned, and the "supplementary kanji" of
JIS X 0212), four further registers are defined: G0, G1, G2, and G3.
Unlike GL and GR, they are not logically distinguished by internal
format.  Instead, the process of "invocation" mentioned earlier is
broken into two steps: first, a character set is "designated" to one of
the registers G0-G3 by use of an "escape sequence" of the form:

             ESC [I] I F

   where I is an intermediate character or characters in the range
0x20-0x3F, and F, from the range 0x30-0x7F, is the final character
identifying this charset.  (Final characters in the range 0x30-0x3F are
reserved for private use and will never have a publicly registered
meaning.)

   Then that register is "invoked" to either GL or GR, either
automatically (designations to G0 normally involve invocation to GL as
well), or by use of shifting (affecting only the following character in
the data stream) or locking (effective until the next designation or
locking) control sequences.  An encoding conformant to ISO 2022 is
typically defined by designating the initial contents of the G0-G3
registers, specifying a 7 or 8 bit environment, and specifying whether
further designations will be recognized.

   Some examples of character sets and the registered final characters
F used to designate them:

94-charset
     ASCII (B), left (J) and right (I) half of JIS X 0201, ...

96-charset
     Latin-1 (A), Latin-2 (B), Latin-3 (C), ...

94x94-charset
     GB2312 (A), JIS X 0208 (B), KSC5601 (C), ...

96x96-charset
     none for the moment

   The meanings of the various characters in these sequences, where not
specified by the ISO 2022 standard (such as the ESC character), are
assigned by "ECMA", the European Computer Manufacturers Association.

   The meanings of the intermediate characters are:

             $ [0x24]: indicate charset of dimension 2 (94x94 or 96x96).
             ( [0x28]: designate to G0 a 94-charset whose final byte is F.
             ) [0x29]: designate to G1 a 94-charset whose final byte is F.
             * [0x2A]: designate to G2 a 94-charset whose final byte is F.
             + [0x2B]: designate to G3 a 94-charset whose final byte is F.
             , [0x2C]: designate to G0 a 96-charset whose final byte is F.
             - [0x2D]: designate to G1 a 96-charset whose final byte is F.
             . [0x2E]: designate to G2 a 96-charset whose final byte is F.
             / [0x2F]: designate to G3 a 96-charset whose final byte is F.

   The comma may be used in files read and written only by MULE, as a
MULE extension, but this is illegal in ISO 2022.  (The reason is that
in ISO 2022 G0 must be a 94-member character set, with 0x20 assigned
the value SPACE, and 0x7F assigned the value DEL.)

   Here are examples of designations:

             ESC ( B :              designate to G0 ASCII
             ESC - A :              designate to G1 Latin-1
             ESC $ ( A or ESC $ A : designate to G0 GB2312
             ESC $ ( B or ESC $ B : designate to G0 JISX0208
             ESC $ ) C :            designate to G1 KSC5601

   (The short forms used to designate GB2312 and JIS X 0208 are for
backwards compatibility; the long forms are preferred.)
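
   The table of intermediate characters and the designation examples
above can be combined into a small parser.  This is an illustrative
Python sketch under stated assumptions, not the MULE implementation:
the names `Designation' and `parse_designation' are hypothetical, only
the sequences listed above are recognized, and a final byte of 0x7F is
conservatively rejected.

```python
from typing import NamedTuple

ESC = 0x1B
REGISTER_94 = {0x28: "G0", 0x29: "G1", 0x2A: "G2", 0x2B: "G3"}  # ( ) * +
REGISTER_96 = {0x2C: "G0", 0x2D: "G1", 0x2E: "G2", 0x2F: "G3"}  # , - . /


class Designation(NamedTuple):
    register: str   # "G0" .. "G3"
    size: int       # 94 or 96
    dimension: int  # 1, or 2 for 94x94 and 96x96 sets
    final: str      # the final character F


def parse_designation(seq: bytes) -> Designation:
    """Parse one complete designation escape sequence."""
    if len(seq) < 3 or seq[0] != ESC:
        raise ValueError("not a designation sequence")
    body = seq[1:]
    dimension = 1
    if body[0] == 0x24:          # '$' marks a charset of dimension 2
        dimension = 2
        body = body[1:]
    final = body[-1]
    if not 0x30 <= final <= 0x7E:
        raise ValueError("bad final byte")
    if dimension == 2 and len(body) == 1:
        # Short form ESC $ F: designate a 94x94 set to G0.
        return Designation("G0", 94, 2, chr(final))
    if len(body) != 2:
        raise ValueError("unsupported sequence length")
    if body[0] in REGISTER_94:
        return Designation(REGISTER_94[body[0]], 94, dimension, chr(final))
    if body[0] in REGISTER_96:
        return Designation(REGISTER_96[body[0]], 96, dimension, chr(final))
    raise ValueError("unknown intermediate byte")


for seq in (b"\x1b(B", b"\x1b-A", b"\x1b$(B", b"\x1b$B", b"\x1b$)C"):
    print(seq, parse_designation(seq))
```

   The sketch does not map the final character to a charset name,
because the same final character identifies different character sets
depending on the size and dimension (compare `B' for ASCII, Latin-2,
and JIS X 0208 in the list of examples earlier in this section).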

   To use a charset designated to G2 or G3, and to use a charset
designated to G1 in a 7-bit environment, you must explicitly invoke G1,
G2, or G3 into GL.  There are two types of invocation, Locking Shift
(forever) and Single Shift (one character only).

   Locking Shift is done as follows:

             LS0 or SI (0x0F): invoke G0 into GL
             LS1 or SO (0x0E): invoke G1 into GL
             LS2:  invoke G2 into GL
             LS3:  invoke G3 into GL
             LS1R: invoke G1 into GR
             LS2R: invoke G2 into GR
             LS3R: invoke G3 into GR

   Single Shift is done as follows:

             SS2 or ESC N: invoke G2 into GL
             SS3 or ESC O: invoke G3 into GL

   The shift functions (such as LS1R and SS3) are represented by control
characters (from C1) in 8 bit environments and by escape sequences in 7
bit environments.
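
   As a rough illustration of the two representations, the following
Python sketch tabulates the shift functions whose byte values are
given in this manual (SI, SO, ESC N, ESC O, and the C1 codes 0x8E and
0x8F for SS2 and SS3).  The names `SHIFT_FUNCTIONS' and `shift_bytes'
are hypothetical, and the final lines rely on Python's `euc_jp' codec,
not on XEmacs.

```python
ESC = b"\x1b"

# name -> (7-bit form, 8-bit form).  SI and SO are C0 controls in both
# environments; SS2 and SS3 have single-byte C1 forms in 8-bit use.
SHIFT_FUNCTIONS = {
    "LS0": (b"\x0f", b"\x0f"),     # SI
    "LS1": (b"\x0e", b"\x0e"),     # SO
    "SS2": (ESC + b"N", b"\x8e"),
    "SS3": (ESC + b"O", b"\x8f"),
}


def shift_bytes(name: str, eight_bit: bool) -> bytes:
    """Return the byte sequence for a shift function in either environment."""
    seven, eight = SHIFT_FUNCTIONS[name]
    return eight if eight_bit else seven


# EUC-JP places halfwidth katakana in G2 and reaches it with SS2.
kana = "ｱ".encode("euc_jp")
print(kana.hex(), kana[:1] == shift_bytes("SS2", True))
```

   In the output of Python's codec, at least, the byte that follows the
8-bit SS2 has its high bit set.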

   (#### Ben says: I think the above is slightly incorrect.  It appears
that SS2 invokes G2 into GR and SS3 invokes G3 into GR, whereas ESC N
and ESC O behave as indicated.  The above definitions will not parse
EUC-encoded text correctly, and it looks like the code in mule-coding.c
has similar problems.)

   Evidently there are a lot of ISO-2022-compliant ways of encoding
multilingual text.  Many coding systems in actual use, such as X11's
Compound Text, Japanese JUNET code, and so-called EUC (Extended UNIX
Code), are variants of ISO 2022.

   In MULE, we characterize a version of ISO 2022 by the following
attributes:

  1. The character sets initially designated to G0 through G3.

  2. Whether short form designations are allowed for Japanese and
     Chinese.

  3. Whether ASCII should be designated to G0 before control characters.

  4. Whether ASCII should be designated to G0 at the end of line.

  5. 7-bit environment or 8-bit environment.

  6. Whether Locking Shifts are used or not.

  7. Whether to use ASCII or the variant JIS X 0201-1976-Roman.

  8. Whether to use JIS X 0208-1983 or the older version JIS X
     0208-1976.

   (The last two are only for Japanese.)

   By specifying these attributes, you can create any variant of ISO
2022.

   Here are several examples:

     ISO-2022-JP -- Coding system used in Japanese email (RFC 1468).
             1. G0 <- ASCII, G1..3 <- never used
             2. Yes.
             3. Yes.
             4. Yes.
             5. 7-bit environment
             6. No.
             7. Use ASCII
             8. Use JIS X 0208-1983
     
     ctext -- X11 Compound Text
             1. G0 <- ASCII, G1 <- Latin-1, G2,3 <- never used.
             2. No.
             3. No.
             4. Yes.
             5. 8-bit environment.
             6. No.
             7. Use ASCII.
             8. Use JIS X 0208-1983.
     
     euc-china -- Chinese EUC.  Often called the "GB encoding", but that is
     technically incorrect.
             1. G0 <- ASCII, G1 <- GB 2312, G2,3 <- never used.
             2. No.
             3. Yes.
             4. Yes.
             5. 8-bit environment.
             6. No.
             7. Use ASCII.
             8. Use JIS X 0208-1983.
     
     ISO-2022-KR -- Coding system used in Korean email.
             1. G0 <- ASCII, G1 <- KSC 5601, G2,3 <- never used.
             2. No.
             3. Yes.
             4. Yes.
             5. 7-bit environment.
             6. Yes.
             7. Use ASCII.
             8. Use JIS X 0208-1983.

   MULE creates all of these coding systems by default.
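
   The eight attributes and the four examples above can be written
down as plain data.  The following Python sketch is only a restatement
of the table, not the way MULE stores coding systems; the class name
`Iso2022Variant' and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Iso2022Variant:
    designations: Tuple[Optional[str], ...]  # G0..G3; None = never used
    short_forms: bool            # attribute 2
    ascii_before_control: bool   # attribute 3
    ascii_at_eol: bool           # attribute 4
    seven_bit: bool              # attribute 5
    locking_shift: bool          # attribute 6
    jis_roman: bool = False      # attribute 7 (False means plain ASCII)
    old_jisx0208: bool = False   # attribute 8 (False means JIS X 0208-1983)


VARIANTS = {
    "iso-2022-jp": Iso2022Variant(("ascii", None, None, None),
                                  True, True, True, True, False),
    "ctext": Iso2022Variant(("ascii", "latin-1", None, None),
                            False, False, True, False, False),
    "euc-china": Iso2022Variant(("ascii", "gb2312", None, None),
                                False, True, True, False, False),
    "iso-2022-kr": Iso2022Variant(("ascii", "ksc5601", None, None),
                                  False, True, True, True, True),
}

seven_bit_names = sorted(n for n, v in VARIANTS.items() if v.seven_bit)
locking_names = sorted(n for n, v in VARIANTS.items() if v.locking_shift)
print(seven_bit_names, locking_names)
```

   Laid out this way, it is easy to see that the two mail encodings are
the 7-bit ones, and that of these four only ISO-2022-KR relies on
locking shifts.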


File: lispref.info,  Node: EOL Conversion,  Next: Coding System Properties,  Prev: ISO 2022,  Up: Coding Systems

EOL Conversion
--------------

`nil'
     Automatically detect the end-of-line type (LF, CRLF, or CR).  Also
     generate subsidiary coding systems named `NAME-unix', `NAME-dos',
     and `NAME-mac', that are identical to this coding system but have
     an EOL-TYPE value of `lf', `crlf', and `cr', respectively.

`lf'
     The end of a line is marked externally using ASCII LF.  Since this
     is also the way that XEmacs represents an end-of-line internally,
     specifying this option results in no end-of-line conversion.  This
     is the standard format for Unix text files.

`crlf'
     The end of a line is marked externally using ASCII CRLF.  This is
     the standard format for MS-DOS text files.

`cr'
     The end of a line is marked externally using ASCII CR.  This is the
     standard format for Macintosh text files.

`t'
     Automatically detect the end-of-line type but do not generate
     subsidiary coding systems.  (This value is converted to `nil' when
     stored internally, and `coding-system-property' will return `nil'.)
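
   The EOL types above amount to a simple byte substitution.  The
following Python sketch illustrates the idea; it is not the XEmacs
implementation.  All function names are hypothetical, and the
detection rule (look at the first line break only) is an assumption
made for brevity.

```python
from typing import Dict, Optional

EOL_BYTES = {"lf": b"\n", "crlf": b"\r\n", "cr": b"\r"}


def detect_eol(data: bytes) -> Optional[str]:
    """Return 'lf', 'crlf', or 'cr' for the first line break, else None."""
    for i, byte in enumerate(data):
        if byte == 0x0A:
            return "lf"
        if byte == 0x0D:
            return "crlf" if data[i + 1:i + 2] == b"\n" else "cr"
    return None


def decode_eol(data: bytes, eol_type: Optional[str] = None) -> bytes:
    """Convert external line breaks to the internal LF convention."""
    if eol_type is None:                 # like an EOL-TYPE of nil
        eol_type = detect_eol(data) or "lf"
    if eol_type == "lf":
        return data                      # no conversion needed
    return data.replace(EOL_BYTES[eol_type], b"\n")


def encode_eol(data: bytes, eol_type: str) -> bytes:
    """Convert internal LF line breaks to the external convention."""
    return data.replace(b"\n", EOL_BYTES[eol_type])


def subsidiary_names(name: str) -> Dict[str, str]:
    """Names of the subsidiaries generated for an EOL-TYPE of nil."""
    return {"lf": name + "-unix", "crlf": name + "-dos", "cr": name + "-mac"}


print(decode_eol(b"one\r\ntwo\r\n"), subsidiary_names("euc-jp"))
```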


File: lispref.info,  Node: Coding System Properties,  Next: Basic Coding System Functions,  Prev: EOL Conversion,  Up: Coding Systems

Coding System Properties
------------------------

`mnemonic'
     String to be displayed in the modeline when this coding system is
     active.

`eol-type'
     End-of-line conversion to be used.  It should be one of the types
     listed in *Note EOL Conversion::.

`eol-lf'
     The coding system which is the same as this one, except that it
     uses the Unix line-breaking convention.

`eol-crlf'
     The coding system which is the same as this one, except that it
     uses the DOS line-breaking convention.

`eol-cr'
     The coding system which is the same as this one, except that it
     uses the Macintosh line-breaking convention.

`post-read-conversion'
     Function called after a file has been read in, to perform the
     decoding.  Called with two arguments, START and END, denoting a
     region of the current buffer to be decoded.

`pre-write-conversion'
     Function called before a file is written out, to perform the
     encoding.  Called with two arguments, START and END, denoting a
     region of the current buffer to be encoded.

   The following additional properties are recognized if TYPE is
`iso2022':

`charset-g0'
`charset-g1'
`charset-g2'
`charset-g3'
     The character set initially designated to the G0 - G3 registers.
     The value should be one of

        * A charset object (designate that character set)

        * `nil' (do not ever use this register)

        * `t' (no character set is initially designated to the
          register, but may be later on; this automatically sets the
          corresponding `force-g*-on-output' property)

`force-g0-on-output'
`force-g1-on-output'
`force-g2-on-output'
`force-g3-on-output'
     If non-`nil', send an explicit designation sequence on output
     before using the specified register.

`short'
     If non-`nil', use the short forms `ESC $ @', `ESC $ A', and `ESC $
     B' on output in place of the full designation sequences `ESC $ (
     @', `ESC $ ( A', and `ESC $ ( B'.

`no-ascii-eol'
     If non-`nil', don't designate ASCII to G0 at each end of line on
     output.  Setting this to non-`nil' also suppresses other
     state-resetting that normally happens at the end of a line.

`no-ascii-cntl'
     If non-`nil', don't designate ASCII to G0 before control chars on
     output.

`seven'
     If non-`nil', use 7-bit environment on output.  Otherwise, use
     8-bit environment.

`lock-shift'
     If non-`nil', use locking-shift (SO/SI) instead of single-shift or
     designation by escape sequence.

`no-iso6429'
     If non-`nil', don't use ISO6429's direction specification.

`escape-quoted'
     If non-`nil', literal control characters that are the same as the
     beginning of a recognized ISO 2022 or ISO 6429 escape sequence (in
     particular, ESC (0x1B), SO (0x0E), SI (0x0F), SS2 (0x8E), SS3
     (0x8F), and CSI (0x9B)) are "quoted" with an escape character so
     that they can be properly distinguished from an escape sequence.
     (Note that doing this results in a non-portable encoding.) This
     encoding flag is used for byte-compiled files.  Note that ESC is a
     good choice for a quoting character because there are no escape
     sequences whose second byte is a character from the Control-0 or
     Control-1 character sets; this is explicitly disallowed by the ISO
     2022 standard.

`input-charset-conversion'
     A list of conversion specifications, specifying conversion of
     characters in one charset to another when decoding is performed.
     Each specification is a list of two elements: the source charset,
     and the destination charset.

`output-charset-conversion'
     A list of conversion specifications, specifying conversion of
     characters in one charset to another when encoding is performed.
     The form of each specification is the same as for
     `input-charset-conversion'.

   The following additional properties are recognized (and required) if
TYPE is `ccl':

`decode'
     CCL program used for decoding (converting to internal format).

`encode'
     CCL program used for encoding (converting to external format).

   The following properties are used internally:  EOL-CR, EOL-CRLF,
EOL-LF, and BASE.


File: lispref.info,  Node: Basic Coding System Functions,  Next: Coding System Property Functions,  Prev: Coding System Properties,  Up: Coding Systems

Basic Coding System Functions
-----------------------------

 - Function: find-coding-system coding-system-or-name
     This function retrieves the coding system of the given name.

     If CODING-SYSTEM-OR-NAME is a coding-system object, it is simply
     returned.  Otherwise, CODING-SYSTEM-OR-NAME should be a symbol.
     If there is no such coding system, `nil' is returned.  Otherwise
     the associated coding system object is returned.

 - Function: get-coding-system name
     This function retrieves the coding system of the given name.  Same
     as `find-coding-system' except an error is signalled if there is no
     such coding system instead of returning `nil'.

 - Function: coding-system-list
     This function returns a list of the names of all defined coding
     systems.

 - Function: coding-system-name coding-system
     This function returns the name of the given coding system.

 - Function: coding-system-base coding-system
     This function returns the base coding system of CODING-SYSTEM,
     that is, the coding system with an undecided EOL convention.

 - Function: make-coding-system name type &optional doc-string props
     This function registers symbol NAME as a coding system.

     TYPE describes the conversion method used and should be one of the
     types listed in *Note Coding System Types::.

     DOC-STRING is a string describing the coding system.

     PROPS is a property list, describing the specific nature of the
     coding system.  Recognized properties are as in *Note Coding
     System Properties::.

 - Function: copy-coding-system old-coding-system new-name
     This function copies OLD-CODING-SYSTEM to NEW-NAME.  If NEW-NAME
     does not name an existing coding system, a new one will be created.

 - Function: subsidiary-coding-system coding-system eol-type
     This function returns the subsidiary coding system of
     CODING-SYSTEM with eol type EOL-TYPE.


File: lispref.info,  Node: Coding System Property Functions,  Next: Encoding and Decoding Text,  Prev: Basic Coding System Functions,  Up: Coding Systems

Coding System Property Functions
--------------------------------

 - Function: coding-system-doc-string coding-system
     This function returns the doc string for CODING-SYSTEM.

 - Function: coding-system-type coding-system
     This function returns the type of CODING-SYSTEM.

 - Function: coding-system-property coding-system prop
     This function returns the PROP property of CODING-SYSTEM.


File: lispref.info,  Node: Encoding and Decoding Text,  Next: Detection of Textual Encoding,  Prev: Coding System Property Functions,  Up: Coding Systems

Encoding and Decoding Text
--------------------------

 - Function: decode-coding-region start end coding-system &optional
          buffer
     This function decodes the text between START and END which is
     encoded in CODING-SYSTEM.  This is useful if you've read in
     encoded text from a file without decoding it (e.g. you read in a
     JIS-formatted file but used the `binary' or `no-conversion' coding
     system, so that it shows up as `^[$B!<!+^[(B').  The length of the
     decoded text is returned.  BUFFER defaults to the current buffer
     if unspecified.

 - Function: encode-coding-region start end coding-system &optional
          buffer
     This function encodes the text between START and END using
     CODING-SYSTEM.  This will, for example, convert Japanese
     characters into byte sequences such as `^[$B!<!+^[(B' if you use
     the JIS encoding.  The length of the encoded text is returned.
     BUFFER defaults to the current buffer if unspecified.


File: lispref.info,  Node: Detection of Textual Encoding,  Next: Big5 and Shift-JIS Functions,  Prev: Encoding and Decoding Text,  Up: Coding Systems

Detection of Textual Encoding
-----------------------------

 - Function: coding-category-list
     This function returns a list of all recognized coding categories.

 - Function: set-coding-priority-list list
     This function changes the priority order of the coding categories.
     LIST should be a list of coding categories, in descending order of
     priority.  Unspecified coding categories will be lower in priority
     than all specified ones, in the same relative order they were in
     previously.

 - Function: coding-priority-list
     This function returns a list of coding categories in descending
     order of priority.

 - Function: set-coding-category-system coding-category coding-system
     This function changes the coding system associated with a coding
     category.

 - Function: coding-category-system coding-category
     This function returns the coding system associated with a coding
     category.

 - Function: detect-coding-region start end &optional buffer
     This function detects the coding system of the text in the region
     between START and END.  The return value is a list of possible
     coding systems ordered by priority.  If only ASCII characters are
     found, it returns `autodetect' or one of its subsidiary coding
     systems according to the detected end-of-line type.  The optional
     argument BUFFER defaults to the current buffer.
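
   The reordering rule described for `set-coding-priority-list' above
(listed categories first, all others after them in their previous
relative order) can be sketched in Python as follows.  This models the
documented behavior only; the function name `reorder_priority' and the
category names in the example are hypothetical placeholders, not the
actual XEmacs coding categories.

```python
from typing import List, Sequence


def reorder_priority(current: Sequence[str],
                     preferred: Sequence[str]) -> List[str]:
    """Return CURRENT reordered so that PREFERRED comes first."""
    unknown = [c for c in preferred if c not in current]
    if unknown:
        raise ValueError("unknown coding categories: %r" % unknown)
    # Unspecified categories keep their previous relative order.
    rest = [c for c in current if c not in preferred]
    return list(preferred) + rest


# Placeholder category names, for illustration only.
old_order = ["cat-a", "cat-b", "cat-c", "cat-d"]
new_order = reorder_priority(old_order, ["cat-c", "cat-a"])
print(new_order)
```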


File: lispref.info,  Node: Big5 and Shift-JIS Functions,  Next: Predefined Coding Systems,  Prev: Detection of Textual Encoding,  Up: Coding Systems

Big5 and Shift-JIS Functions
----------------------------

These are special functions for working with the non-standard Shift-JIS
and Big5 encodings.

 - Function: decode-shift-jis-char code
     This function decodes a JIS X 0208 character encoded in the
     Shift-JIS coding system.  CODE is the character code in Shift-JIS
     as a cons of two bytes.  The corresponding character is returned.

 - Function: encode-shift-jis-char character
     This function encodes the JIS X 0208 character CHARACTER in the
     Shift-JIS coding system.  The corresponding character code in
     Shift-JIS is returned as a cons of two bytes.

 - Function: decode-big5-char code
     This function decodes a character encoded in the Big5 coding
     system.  CODE is the character code in Big5.  The corresponding
     character is returned.

 - Function: encode-big5-char character
     This function encodes the Big5 character CHARACTER in the Big5
     coding system.  The corresponding character code in Big5 is
     returned.
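
   For the Shift-JIS functions above, the relation between a JIS X 0208
code and its two Shift-JIS bytes is pure arithmetic.  The following
Python sketch shows the commonly used conversion; it is not taken from
the XEmacs sources, the function names are hypothetical, and a Python
tuple stands in for the cons of two bytes.

```python
from typing import Tuple


def jisx0208_to_shift_jis(j1: int, j2: int) -> Tuple[int, int]:
    """Map a JIS X 0208 code (two bytes in 0x21-0x7E) to Shift-JIS."""
    s1 = ((j1 + 1) >> 1) + 0x70
    if s1 >= 0xA0:
        s1 += 0x40                  # skip the halfwidth katakana range
    if j1 & 1:                      # odd row: first half of the trail range
        s2 = j2 + 0x1F
        if s2 >= 0x7F:
            s2 += 1                 # 0x7F is never used as a trail byte
    else:                           # even row: second half
        s2 = j2 + 0x7E
    return s1, s2


def shift_jis_to_jisx0208(s1: int, s2: int) -> Tuple[int, int]:
    """Inverse of jisx0208_to_shift_jis."""
    if s1 >= 0xE0:
        s1 -= 0x40
    row_pair = s1 - 0x70
    if s2 >= 0x9F:
        return 2 * row_pair, s2 - 0x7E
    j2 = s2 - 0x1F
    if s2 >= 0x80:
        j2 -= 1
    return 2 * row_pair - 1, j2


# HIRAGANA LETTER A is 0x2422 in JIS X 0208.
print(["%02X" % b for b in jisx0208_to_shift_jis(0x24, 0x22)])
```

   The skipped lead-byte range is what lets the halfwidth katakana of
JIS X 0201 coexist with the two-byte characters, at the price of
mixing GL and GR bytes within one character.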


File: lispref.info,  Node: Predefined Coding Systems,  Prev: Big5 and Shift-JIS Functions,  Up: Coding Systems

Coding Systems Implemented
--------------------------

MULE initializes most of the commonly used coding systems at XEmacs's
startup.  A few others are initialized only when the relevant language
environment is selected and support libraries are loaded.  (NB: The
following list is based on XEmacs 21.2.19, the development branch at the
time of writing.  The list may be somewhat different for other
versions.  Recent versions of GNU Emacs 20 implement a few more rare
coding systems; work is being done to port these to XEmacs.)

   Unfortunately, there is not a consistent naming convention for
character sets, and for practical purposes coding systems often take
their name from their principal character sets (ASCII, KOI8-R, Shift
JIS).  Others take their names from the coding system (ISO-2022-JP,
EUC-KR), and a few from their non-text usages (internal, binary).  To
provide for this, and for the fact that many coding systems have
several common names, an aliasing system is provided.  Finally, some
effort has been made to use names that are registered as MIME charsets
(this is why the name 'shift_jis contains that un-Lisp-y underscore).

   There is a systematic naming convention regarding end-of-line (EOL)
conventions for different systems.  A coding system whose name ends in
"-unix" forces the assumption that lines are broken by newlines (0x0A).
A coding system whose name ends in "-mac" forces the assumption that
lines are broken by ASCII CRs (0x0D).  A coding system whose name ends
in "-dos" forces the assumption that lines are broken by CRLF sequences
(0x0D 0x0A).  These subsidiary coding systems are automatically derived
from a base coding system.  Use of the base coding system implies
autodetection of the text file convention.  (The fact that the -unix,
-mac, and -dos are derived from a base system results in them showing up
as "aliases" in `list-coding-systems'.)  These subsidiaries have a
consistent modeline indicator as well.  "-dos" coding systems have ":T"
appended to their modeline indicator, while "-mac" coding systems have
":t" appended (e.g., "ISO8:t" for iso-2022-8-mac).

   In the following table, each coding system is given with its mode
line indicator in parentheses.  Non-textual coding systems are listed
first, followed by textual coding systems and their aliases. (The
coding system subsidiary modeline indicators ":T" and ":t" will be
omitted from the table of coding systems.)

   ### SJT 1999-08-23 Maybe should order these by language?  Definitely
need language usage for the ISO-8859 family.

   Note that although true coding system aliases have been implemented
for XEmacs 21.2, the coding system initialization has not yet been
converted as of 21.2.19.  So coding systems described as aliases have
the same properties as the aliased coding system, but will not be equal
as Lisp objects.

`automatic-conversion'
`undecided'
`undecided-dos'
`undecided-mac'
`undecided-unix'
     Modeline indicator: `Auto'.  A type `undecided' coding system.
     Attempts to determine an appropriate coding system from file
     contents or the environment.

`raw-text'
`no-conversion'
`raw-text-dos'
`raw-text-mac'
`raw-text-unix'
`no-conversion-dos'
`no-conversion-mac'
`no-conversion-unix'
     Modeline indicator: `Raw'.  A type `no-conversion' coding system,
     which converts only line-break-codes.  An implementation quirk
     means that this coding system is also used for ISO8859-1.

`binary'
     Modeline indicator: `Binary'.  A type `no-conversion' coding
     system which does no character coding or EOL conversions.  An
     alias for `raw-text-unix'.

`alternativnyj'
`alternativnyj-dos'
`alternativnyj-mac'
`alternativnyj-unix'
     Modeline indicator: `Cy.Alt'.  A type `ccl' coding system used for
     Alternativnyj, an encoding of the Cyrillic alphabet.

`big5'
`big5-dos'
`big5-mac'
`big5-unix'
     Modeline indicator: `Zh/Big5'.  A type `big5' coding system used
     for BIG5, the most common encoding of traditional Chinese as used
     in Taiwan.

`cn-gb-2312'
`cn-gb-2312-dos'
`cn-gb-2312-mac'
`cn-gb-2312-unix'
     Modeline indicator: `Zh-GB/EUC'.  A type `iso2022' coding system
     used for simplified Chinese (as used in the People's Republic of
     China), with the `ascii' (G0), `chinese-gb2312' (G1), and `sisheng'
     (G2) character sets initially designated.  Chinese EUC (Extended
     Unix Code).

`ctext-hebrew'
`ctext-hebrew-dos'
`ctext-hebrew-mac'
`ctext-hebrew-unix'
     Modeline indicator: `CText/Hbrw'.  A type `iso2022' coding system
     with the `ascii' (G0) and `hebrew-iso8859-8' (G1) character sets
     initially designated for Hebrew.

`ctext'
`ctext-dos'
`ctext-mac'
`ctext-unix'
     Modeline indicator: `CText'.  A type `iso2022' 8-bit coding system
     with the `ascii' (G0) and `latin-iso8859-1' (G1) character sets
     initially designated.  X11 Compound Text Encoding.  Often
     mistakenly recognized instead of EUC encodings; usual cause is
     inappropriate setting of `coding-priority-list'.

`escape-quoted'
     Modeline indicator: `ESC/Quot'.  A type `iso2022' 8-bit coding
     system with the `ascii' (G0) and `latin-iso8859-1' (G1) character
     sets initially designated and escape quoting.  Unix EOL conversion
     (ie, no conversion).  It is used for .ELC files.

`euc-jp'
`euc-jp-dos'
`euc-jp-mac'
`euc-jp-unix'
     Modeline indicator: `Ja/EUC'.  A type `iso2022' 8-bit coding system
     with `ascii' (G0), `japanese-jisx0208' (G1), `katakana-jisx0201'
     (G2), and `japanese-jisx0212' (G3) initially designated.  Japanese
     EUC (Extended Unix Code).

`euc-kr'
`euc-kr-dos'
`euc-kr-mac'
`euc-kr-unix'
     Modeline indicator: `ko/EUC'.  A type `iso2022' 8-bit coding system
     with `ascii' (G0) and `korean-ksc5601' (G1) initially designated.
     Korean EUC (Extended Unix Code).

`hz-gb-2312'
     Modeline indicator: `Zh-GB/Hz'.  A type `no-conversion' coding
     system with Unix EOL convention (ie, no conversion) using
     post-read-decode and pre-write-encode functions to translate the
     Hz/ZW coding system used for Chinese.

`iso-2022-7bit'
`iso-2022-7bit-unix'
`iso-2022-7bit-dos'
`iso-2022-7bit-mac'
`iso-2022-7'
     Modeline indicator: `ISO7'.  A type `iso2022' 7-bit coding system
     with `ascii' (G0) initially designated.  Other character sets must
     be explicitly designated to be used.

`iso-2022-7bit-ss2'
`iso-2022-7bit-ss2-dos'
`iso-2022-7bit-ss2-mac'
`iso-2022-7bit-ss2-unix'
     Modeline indicator: `ISO7/SS'.  A type `iso2022' 7-bit coding
     system with `ascii' (G0) initially designated.  Other character
     sets must be explicitly designated to be used.  SS2 is used to
     invoke a 96-charset, one character at a time.

`iso-2022-8'
`iso-2022-8-dos'
`iso-2022-8-mac'
`iso-2022-8-unix'
     Modeline indicator: `ISO8'.  A type `iso2022' 8-bit coding system
     with `ascii' (G0) and `latin-iso8859-1' (G1) initially designated.
     Other character sets must be explicitly designated to be used.
     No single-shift or locking-shift.

`iso-2022-8bit-ss2'
`iso-2022-8bit-ss2-dos'
`iso-2022-8bit-ss2-mac'
`iso-2022-8bit-ss2-unix'
     Modeline indicator: `ISO8/SS'.  A type `iso2022' 8-bit coding
     system with `ascii' (G0) and `latin-iso8859-1' (G1) initially
     designated.  Other character sets must be explicitly designated to
     be used.  SS2 is used to invoke a 96-charset, one character at a
     time.

`iso-2022-int-1'
`iso-2022-int-1-dos'
`iso-2022-int-1-mac'
`iso-2022-int-1-unix'
     Modeline indicator: `INT-1'.  A type `iso2022' 7-bit coding system
     with `ascii' (G0) and `korean-ksc5601' (G1) initially designated.
     ISO-2022-INT-1.

`iso-2022-jp-1978-irv'
`iso-2022-jp-1978-irv-dos'
`iso-2022-jp-1978-irv-mac'
`iso-2022-jp-1978-irv-unix'
     Modeline indicator: `Ja-78/7bit'.  A type `iso2022' 7-bit coding
     system.  For compatibility with old Japanese terminals; if you
     need to know, look at the source.

`iso-2022-jp'
`iso-2022-jp-2 (ISO7/SS)'
`iso-2022-jp-dos'
`iso-2022-jp-mac'
`iso-2022-jp-unix'
`iso-2022-jp-2-dos'
`iso-2022-jp-2-mac'
`iso-2022-jp-2-unix'
     Modeline indicator: `MULE/7bit'.  A type `iso2022' 7-bit coding
     system with `ascii' (G0) initially designated, and complex
     specifications to ensure backward compatibility with old Japanese
     systems.  Used for communication with mail and news in Japan.  The
     "-2" versions also use SS2 to invoke a 96-charset one character at
     a time.

`iso-2022-kr'
     Modeline indicator: `Ko/7bit'.  A type `iso2022' 7-bit coding
     system with `ascii' (G0) and `korean-ksc5601' (G1) initially
     designated.  Used for e-mail in Korea.

`iso-2022-lock'
`iso-2022-lock-dos'
`iso-2022-lock-mac'
`iso-2022-lock-unix'
     Modeline indicator: `ISO7/Lock'.  A type `iso2022' 7-bit coding
     system with `ascii' (G0) initially designated, using Locking-Shift
     to invoke a 96-charset.

`iso-8859-1'
`iso-8859-1-dos'
`iso-8859-1-mac'
`iso-8859-1-unix'
     Due to implementation, this is not a type `iso2022' coding system,
     but rather an alias for the `raw-text' coding system.

`iso-8859-2'
`iso-8859-2-dos'
`iso-8859-2-mac'
`iso-8859-2-unix'
     Modeline indicator: `MIME/Ltn-2'.  A type `iso2022' coding system
     with `ascii' (G0) and `latin-iso8859-2' (G1) initially invoked.

`iso-8859-3'
`iso-8859-3-dos'
`iso-8859-3-mac'
`iso-8859-3-unix'
     Modeline indicator: `MIME/Ltn-3'.  A type `iso2022' coding system
     with `ascii' (G0) and `latin-iso8859-3' (G1) initially invoked.

`iso-8859-4'
`iso-8859-4-dos'
`iso-8859-4-mac'
`iso-8859-4-unix'
     Modeline indicator: `MIME/Ltn-4'.  A type `iso2022' coding system
     with `ascii' (G0) and `latin-iso8859-4' (G1) initially invoked.

`iso-8859-5'
`iso-8859-5-dos'
`iso-8859-5-mac'
`iso-8859-5-unix'
     Modeline indicator: `ISO8/Cyr'.  A type `iso2022' coding system
     with `ascii' (G0) and `cyrillic-iso8859-5' (G1) initially invoked.

`iso-8859-7'
`iso-8859-7-dos'
`iso-8859-7-mac'
`iso-8859-7-unix'
     Modeline indicator: `Grk'.  A type `iso2022' coding system with
     `ascii' (G0) and `greek-iso8859-7' (G1) initially invoked.

`iso-8859-8'
`iso-8859-8-dos'
`iso-8859-8-mac'
`iso-8859-8-unix'
     Modeline indicator: `MIME/Hbrw'.  A type `iso2022' coding system
     with `ascii' (G0) and `hebrew-iso8859-8' (G1) initially invoked.

`iso-8859-9'
`iso-8859-9-dos'
`iso-8859-9-mac'
`iso-8859-9-unix'
     Modeline indicator: `MIME/Ltn-5'.  A type `iso2022' coding system
     with `ascii' (G0) and `latin-iso8859-9' (G1) initially invoked.

`koi8-r'
`koi8-r-dos'
`koi8-r-mac'
`koi8-r-unix'
     Modeline indicator: `KOI8'.  A type `ccl' coding-system used for
     KOI8-R, an encoding of the Cyrillic alphabet.

`shift_jis'
`shift_jis-dos'
`shift_jis-mac'
`shift_jis-unix'
     Modeline indicator: `Ja/SJIS'.  A type `shift-jis' coding-system
     implementing the Shift-JIS encoding for Japanese.  The underscore
     is to conform to the MIME charset implementing this encoding.

`tis-620'
`tis-620-dos'
`tis-620-mac'
`tis-620-unix'
     Modeline indicator: `TIS620'.  A type `ccl' encoding for Thai.  The
     external encoding is defined by TIS620, the internal encoding is
     peculiar to MULE, and called `thai-xtis'.

`viqr'
     Modeline indicator: `VIQR'.  A type `no-conversion' coding system
     with Unix EOL convention (ie, no conversion) using
     post-read-decode and pre-write-encode functions to translate the
     VIQR coding system for Vietnamese.

`viscii'
`viscii-dos'
`viscii-mac'
`viscii-unix'
     Modeline indicator: `VISCII'.  A type `ccl' coding-system used for
     VISCII 1.1 for Vietnamese.  Differs slightly from VSCII; VISCII is
     given priority by XEmacs.

`vscii'
`vscii-dos'
`vscii-mac'
`vscii-unix'
     Modeline indicator: `VSCII'.  A type `ccl' coding-system used for
     VSCII 1.1 for Vietnamese.  Differs slightly from VISCII, which is
     given priority by XEmacs.  Use `(prefer-coding-system
     'vietnamese-vscii)' to give priority to VSCII.



File: lispref.info,  Node: CCL,  Next: Category Tables,  Prev: Coding Systems,  Up: MULE

CCL
===

CCL (Code Conversion Language) is a simple structured programming
language designed for character coding conversions.  A CCL program is
compiled to CCL code (represented by a vector of integers) and executed
by the CCL interpreter embedded in Emacs.  The CCL interpreter
implements a virtual machine with 8 registers called `r0', ..., `r7', a
number of control structures, and some I/O operators.  Take care when
using registers `r0' (used in implicit "set" statements) and especially
`r7' (used internally by several statements and operations, especially
for multiple return values and I/O operations).

   CCL is used for code conversion during process I/O and file I/O for
non-ISO2022 coding systems.  (It is the only way for a user to specify a
code conversion function.)  It is also used for calculating the code
point of an X11 font from a character code.  However, since CCL is
designed as a powerful programming language, it can be used for more
generic calculation where efficiency is demanded.  A combination of
three or more arithmetic operations can be calculated faster by CCL than
by Emacs Lisp.

   *Warning:*  The code in `src/mule-ccl.c' and
`$packages/lisp/mule-base/mule-ccl.el' is the definitive description of
CCL's semantics.  The previous version of this section contained
several typos and obsolete names left from earlier versions of MULE,
and many may remain.  (I am not an experienced CCL programmer; the few
who know CCL well find writing English painful.)

   A CCL program transforms an input data stream into an output data
stream.  The input stream, held in a buffer of constant bytes, is left
unchanged.  The buffer may be filled by an external input operation,
taken from an Emacs buffer, or taken from a Lisp string.  The output
buffer is a dynamic array of bytes, which can be written by an external
output operation, inserted into an Emacs buffer, or returned as a Lisp
string.

   A CCL program is a (Lisp) list containing two or three members.  The
first member is the "buffer magnification", which indicates the
required minimum size of the output buffer as a multiple of the input
buffer.  It is followed by the "main block" which executes while there
is input remaining, and an optional "EOF block" which is executed when
the input is exhausted.  Both the main block and the EOF block are CCL
blocks.

   A "CCL block" is either a CCL statement or list of CCL statements.
A "CCL statement" is either a "set statement" (either an integer or an
"assignment", which is a list of a register to receive the assignment,
an assignment operator, and an expression) or a "control statement" (a
list starting with a keyword, whose allowable syntax depends on the
keyword).
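
   As a minimal sketch of this structure, the following hypothetical
program (the name `my-ccl-copy' is invented for illustration and does
not come from the XEmacs sources) has all three members, written as
quoted Lisp data.  The magnification of 2 is an assumption that leaves
room for the extra byte written by the EOF block:

     (defconst my-ccl-copy
       '(2                              ; buffer magnification
         ((loop                         ; main block
           (read r0)                    ; next input byte into r0
           (write r0)                   ; copy it to the output
           (repeat)))                   ; continue from the loop head
         ((write 10)))                  ; EOF block: append a newline
       "Hypothetical CCL program source that copies input to output.")

   Here the main block and the EOF block are each a list containing a
single statement.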

* Menu:

* CCL Syntax::          CCL program syntax in BNF notation.
* CCL Statements::      Semantics of CCL statements.
* CCL Expressions::     Operators and expressions in CCL.
* Calling CCL::         Running CCL programs.
* CCL Examples::        The encoding functions for Big5 and KOI-8.


File: lispref.info,  Node: CCL Syntax,  Next: CCL Statements,  Up: CCL

CCL Syntax
----------

The full syntax of a CCL program in BNF notation:

CCL_PROGRAM :=
        (BUFFER_MAGNIFICATION
         CCL_MAIN_BLOCK
         [ CCL_EOF_BLOCK ])

BUFFER_MAGNIFICATION := integer
CCL_MAIN_BLOCK := CCL_BLOCK
CCL_EOF_BLOCK := CCL_BLOCK

CCL_BLOCK :=
        STATEMENT | (STATEMENT [STATEMENT ...])
STATEMENT :=
        SET | IF | BRANCH | LOOP | REPEAT | BREAK | READ | WRITE
        | CALL | END

SET :=
        (REG = EXPRESSION)
        | (REG ASSIGNMENT_OPERATOR EXPRESSION)
        | integer

EXPRESSION := ARG | (EXPRESSION OPERATOR ARG)

IF := (if EXPRESSION CCL_BLOCK [CCL_BLOCK])
BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
LOOP := (loop STATEMENT [STATEMENT ...])
BREAK := (break)
REPEAT :=
        (repeat)
        | (write-repeat [REG | integer | string])
        | (write-read-repeat REG [integer | ARRAY])
READ :=
        (read REG ...)
        | (read-if (REG OPERATOR ARG) CCL_BLOCK CCL_BLOCK)
        | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
WRITE :=
        (write REG ...)
        | (write EXPRESSION)
        | (write integer) | (write string) | (write REG ARRAY)
        | string
CALL := (call ccl-program-name)
END := (end)

REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
ARG := REG | integer
OPERATOR :=
        + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
        | < | > | == | <= | >= | != | de-sjis | en-sjis
ASSIGNMENT_OPERATOR :=
        += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
ARRAY := '[' integer ... ']'


File: lispref.info,  Node: CCL Statements,  Next: CCL Expressions,  Prev: CCL Syntax,  Up: CCL

CCL Statements
--------------

The Emacs Code Conversion Language provides the following statement
types: "set", "if", "branch", "loop", "repeat", "break", "read",
"write", "call", and "end".

Set statement:
==============

The "set" statement has three variants with the syntaxes `(REG =
EXPRESSION)', `(REG ASSIGNMENT_OPERATOR EXPRESSION)', and `INTEGER'.
The assignment operator variation of the "set" statement works the same
way as the corresponding C expression statement does.  The assignment
operators are `+=', `-=', `*=', `/=', `%=', `&=', `|=', `^=', `<<=',
and `>>=', and they have the same meanings as in C.  A "naked integer"
INTEGER is equivalent to a SET statement of the form `(r0 = INTEGER)'.
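
   For illustration only, the main block below uses each variant of
the "set" statement.  The program name is hypothetical, and the
comments give the values implied by the rules above:

     (defconst my-ccl-set-demo
       '(1
         ((r1 = 65)                     ; plain assignment: r1 is 65
          (r1 += 1)                     ; as in C: r1 is now 66
          (r2 = (r1 * 2))               ; r2 is 132
          7))                           ; naked integer: (r0 = 7)
       "Hypothetical CCL program source showing the set statement.")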

I/O statements:
===============

The "read" statement takes one or more registers as arguments.  It
reads one byte (a C char) from the input into each register in turn.

   The "write" statement takes several forms.  In the form `(write REG
...)' it takes one or more registers as arguments and writes each in
turn to the output.  The integer in a register (interpreted as an
Emchar) is encoded to multibyte form (ie, Bufbytes) and written to the
current output buffer.  If it is less than 256, it is written as is.
The forms `(write EXPRESSION)' and `(write INTEGER)' are treated
analogously.  The form `(write STRING)' writes the constant string to
the output.  A "naked string" `STRING' is equivalent to the statement
`(write STRING)'.  The form `(write REG ARRAY)' writes the REGth
element of the ARRAY to the output.
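
   The following sketch, under an invented program name, combines
several of these forms.  It assumes that a magnification of 2 is
adequate, since four bytes are written for every two bytes read:

     (defconst my-ccl-io-demo
       '(2
         ((loop
           (read r0 r1)                 ; next two input bytes
           (write r1 r0)                ; write them in swapped order
           (r2 = (r0 & 1))              ; 0 if r0 is even, 1 if odd
           (write r2 [101 111])         ; element r2 of the array
           "\n"                         ; naked string: (write "\n")
           (repeat))))                  ; continue from the loop head
       "Hypothetical CCL program source showing read and write forms.")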

Conditional statements:
=======================

The "if" statement takes an EXPRESSION, a CCL BLOCK, and an optional
SECOND CCL BLOCK as arguments.  If the EXPRESSION evaluates to
non-zero, the first CCL BLOCK is executed.  Otherwise, if there is a
SECOND CCL BLOCK, it is executed.

   The "read-if" variant of the "if" statement takes an EXPRESSION, a
CCL BLOCK, and an optional SECOND CCL BLOCK as arguments.  The
EXPRESSION must have the form `(REG OPERATOR OPERAND)' (where OPERAND is
a register or an integer).  The `read-if' statement first reads from
the input into the first register operand in the EXPRESSION, then
conditionally executes a CCL block just as the `if' statement does.

   The "branch" statement takes an EXPRESSION and one or more CCL
blocks as arguments.  The CCL blocks are treated as a zero-indexed
array, and the `branch' statement uses the EXPRESSION as the index of
the CCL block to execute.  Null CCL blocks may be used as no-ops,
continuing execution with the statement following the `branch'
statement in the containing CCL block.  Out-of-range values for the
EXPRESSION are also treated as no-ops.

   The "read-branch" variant of the "branch" statement takes a REGISTER
and one or more CCL blocks as arguments.  The `read-branch' statement
first reads from the input into the REGISTER, then uses its value as
the index of the CCL block to execute, just as the `branch' statement
does.
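
   As a sketch of the conditional statements, the two hypothetical
programs below (the names are invented) use `read-if' to turn each
carriage return (code 13) into a linefeed (code 10), and `branch' to
select a block by the low two bits of each input byte:

     (defconst my-ccl-cr-to-lf
       '(1
         ((loop
           (read-if (r0 == 13)          ; read a byte, compare with CR
                    (write 10)          ; CR: write LF instead
                    (write r0))         ; otherwise copy the byte
           (repeat))))
       "Hypothetical CCL program source converting CR to LF.")

     (defconst my-ccl-branch-demo
       '(1
         ((loop
           (read r0)
           (branch (r0 & 3)
                   (write 48)           ; index 0: write the digit 0
                   (write 49)           ; index 1: write the digit 1
                   nil                  ; index 2: null block, a no-op
                   (write 51))          ; index 3: write the digit 3
           (repeat))))
       "Hypothetical CCL program source showing the branch statement.")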

Loop control statements:
========================

The "loop" statement creates a block with an implied jump from the end
of the block back to its head.  The loop is exited on a `break'
statement, and continued without executing the tail by a `repeat'
statement.

   The "break" statement, written `(break)', terminates the current
loop and continues with the next statement in the current block.

   The "repeat" statement has three variants, `repeat', `write-repeat',
and `write-read-repeat'.  Each continues the current loop from its
head, possibly after performing I/O.  `repeat' takes no arguments and
does no I/O before jumping.  `write-repeat' takes a single argument (a
register, an integer, or a string), writes it to the output, then jumps.
`write-read-repeat' takes one or two arguments.  The first must be a
register.  The second may be an integer or an array; if absent, it is
implicitly set to the first (register) argument.  `write-read-repeat'
writes its second argument to the output, then reads from the input
into the register, and finally jumps.  See the `write' and `read'
statements for the semantics of the I/O operations for each type of
argument.
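
   The two hypothetical programs below sketch these statements.  The
first copies input bytes until it reads a zero byte; the second copies
all of its input using `write-read-repeat' with a single argument:

     (defconst my-ccl-copy-until-nul
       '(1
         ((loop
           (read r0)
           (if (r0 == 0)
               (break))                 ; leave the loop at a zero byte
           (write r0)
           (repeat))))                  ; continue from the loop head
       "Hypothetical CCL program source that stops at a zero byte.")

     (defconst my-ccl-copy-all
       '(1
         ((read r0)                     ; prime r0 with the first byte
          (loop
           (write-read-repeat r0))))    ; write r0, read r0, jump
       "Hypothetical CCL program source that copies all of its input.")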

Other control statements:
=========================

The "call" statement, written `(call CCL-PROGRAM-NAME)', executes a CCL
program as a subroutine.  It does not return a value to the caller, but
can modify the register status.

   The "end" statement, written `(end)', terminates the CCL program
successfully, and returns to the caller (which may be a CCL program).
It does not alter the status of the registers.


File: lispref.info,  Node: CCL Expressions,  Next: Calling CCL,  Prev: CCL Statements,  Up: CCL

CCL Expressions
---------------

CCL, unlike Lisp, uses infix expressions.  The simplest CCL expressions
consist of a single OPERAND, either a register (one of `r0', ..., `r7')
or an integer.  Complex expressions are lists of the form `( EXPRESSION
OPERATOR OPERAND )'.  Unlike C, assignments are not expressions.

   In the following table, X is the target register for a "set".  In
subexpressions, this is implicitly `r7'.  This means that `>8', `//',
`de-sjis', and `en-sjis' cannot be used freely in subexpressions, since
they return parts of their values in `r7'.  Y may be an expression,
register, or integer, while Z must be a register or an integer.

Name             Operator   Code   C-like Description
CCL_PLUS         `+'        0x00   X = Y + Z
CCL_MINUS        `-'        0x01   X = Y - Z
CCL_MUL          `*'        0x02   X = Y * Z
CCL_DIV          `/'        0x03   X = Y / Z
CCL_MOD          `%'        0x04   X = Y % Z
CCL_AND          `&'        0x05   X = Y & Z
CCL_OR           `|'        0x06   X = Y | Z
CCL_XOR          `^'        0x07   X = Y ^ Z
CCL_LSH          `<<'       0x08   X = Y << Z
CCL_RSH          `>>'       0x09   X = Y >> Z
CCL_LSH8         `<8'       0x0A   X = (Y << 8) | Z
CCL_RSH8         `>8'       0x0B   X = Y >> 8, r[7] = Y & 0xFF
CCL_DIVMOD       `//'       0x0C   X = Y / Z, r[7] = Y % Z
CCL_LS           `<'        0x10   X = (Y < Z)
CCL_GT           `>'        0x11   X = (Y > Z)
CCL_EQ           `=='       0x12   X = (Y == Z)
CCL_LE           `<='       0x13   X = (Y <= Z)
CCL_GE           `>='       0x14   X = (Y >= Z)
CCL_NE           `!='       0x15   X = (Y != Z)
CCL_ENCODE_SJIS  `en-sjis'  0x16   X = HIGHER_BYTE (SJIS (Y, Z))
                                   r[7] = LOWER_BYTE (SJIS (Y, Z))
CCL_DECODE_SJIS  `de-sjis'  0x17   X = HIGHER_BYTE (DE-SJIS (Y, Z))
                                   r[7] = LOWER_BYTE (DE-SJIS (Y, Z))

   The CCL operators are as in C, with the addition of CCL_LSH8,
CCL_RSH8, CCL_DIVMOD, CCL_ENCODE_SJIS, and CCL_DECODE_SJIS.  The
CCL_ENCODE_SJIS and CCL_DECODE_SJIS treat their two operands (Y and Z)
as the high and low bytes of a two-byte character code.  (SJIS stands
for Shift JIS, an encoding of Japanese characters used by Microsoft.
CCL_ENCODE_SJIS is a complicated transformation of the Japanese
standard JIS encoding to Shift JIS.  CCL_DECODE_SJIS is its inverse.)
It is somewhat odd to represent the SJIS operations in infix form.
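
   The set statements in the following hypothetical program sketch
how such expressions are written.  The comments restate the table
entries for the operators used; the program name is invented:

     (defconst my-ccl-expression-demo
       '(1
         ((r0 = ((r1 * 94) + r2))       ; nested: Y is itself (r1 * 94)
          (r3 = (r1 <8 r2))             ; r3 = (r1 << 8) | r2
          (r4 = (r3 // 10))             ; r4 = r3 / 10, r7 = r3 % 10
          (r5 = (r1 < r2))))            ; r5 = (r1 < r2)
       "Hypothetical CCL program source showing infix expressions.")

   Since `//' returns its remainder in `r7', it appears here only as
the outermost operator of a "set" statement.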


File: lispref.info,  Node: Calling CCL,  Next: CCL Examples,  Prev: CCL Expressions,  Up: CCL

Calling CCL
-----------

CCL programs are called automatically during Emacs buffer I/O when the
external representation has a coding system type of `shift-jis',
`big5', or `ccl'.  The program is specified by the coding system (*note
Coding Systems::).  You can also call CCL programs from other CCL
programs, and from Lisp using these functions:

 - Function: ccl-execute ccl-program status
     Execute CCL-PROGRAM with registers initialized by STATUS.
     CCL-PROGRAM is a vector of compiled CCL code created by
     `ccl-compile'.  It is an error for the program to try to execute a
     CCL I/O command.  STATUS must be a vector of nine values,
     specifying the initial value for the R0, R1 .. R7 registers and
     for the instruction counter IC.  A `nil' value for a register
     initializer causes the register to be set to 0.  A `nil' value for
     the IC initializer causes execution to start at the beginning of
     the program.  When the program is done, STATUS is modified (by
     side-effect) to contain the ending values for the corresponding
     registers and IC.

 - Function: ccl-execute-on-string ccl-program status string &optional
          continue
     Execute CCL-PROGRAM with initial STATUS on STRING.  CCL-PROGRAM is
     a vector of compiled CCL code created by `ccl-compile'.  STATUS
     must be a vector of nine values, specifying the initial value for
     the R0, R1 .. R7 registers and for the instruction counter IC.  A
     `nil' value for a register initializer causes the register to be
     set to 0.  A `nil' value for the IC initializer causes execution
     to start at the beginning of the program.  An optional fourth
     argument CONTINUE, if non-`nil', causes the IC to remain on the
     unsatisfied read operation if the program terminates due to
     exhaustion of the input buffer.  Otherwise the IC is set to the end
     of the program.  When the program is done, STATUS is modified (by
     side-effect) to contain the ending values for the corresponding
     registers and IC.  Returns the resulting string.
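
   As a sketch of calling CCL from Lisp, the hypothetical helper below
compiles a small program with `ccl-compile' and runs it on a string.
The helper's name and the CCL program are invented for illustration,
and the sketch assumes that `ccl-compile' accepts the program in the
list form described earlier in this chapter:

     (defun my-ccl-cr-to-lf-string (string)
       "Return a copy of STRING with each CR byte replaced by LF.
     A hypothetical helper; the conversion itself is done by CCL."
       (ccl-execute-on-string
        (ccl-compile
         '(1
           ((loop
             (read-if (r0 == 13)        ; read a byte, compare with CR
                      (write 10)        ; CR: write LF instead
                      (write r0))       ; otherwise copy the byte
             (repeat)))))
        (make-vector 9 nil)             ; r0..r7 and IC, all defaulted
        string))

   Under these assumptions, `(my-ccl-cr-to-lf-string "a\rb")' should
return a string in which the carriage return has become a linefeed.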

   To call a CCL program from another CCL program, it must first be
registered:

 - Function: register-ccl-program name ccl-program
     Register NAME for CCL program CCL-PROGRAM in `ccl-program-table'.
     CCL-PROGRAM should be the compiled form of a CCL program, or
     `nil'.  Return index number of the registered CCL program.
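
   A minimal sketch of this sequence follows.  The names
`my-ccl-add-one' and `my-ccl-add-two' are hypothetical, and the sketch
assumes that the subroutine must be registered before a program that
calls it is compiled:

     ;; Register a subroutine that increments r1.
     (register-ccl-program
      'my-ccl-add-one
      (ccl-compile '(1 ((r1 += 1)))))

     ;; Compile a caller that invokes the subroutine twice.
     (defconst my-ccl-add-two
       (ccl-compile
        '(1 ((call my-ccl-add-one)
             (call my-ccl-add-one))))
       "Hypothetical compiled CCL program that adds 2 to r1.")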

   Information about the processor time used by the CCL interpreter can
be obtained using these functions:

 - Function: ccl-elapsed-time
     Returns the elapsed processor time of the CCL interpreter as a
     cons of user and system time, as floating point numbers measured
     in seconds.  If only one overall value can be determined, the
     return value will be a cons of that value and 0.

 - Function: ccl-reset-elapsed-time
     Resets the CCL interpreter's internal elapsed time registers.


File: lispref.info,  Node: CCL Examples,  Prev: Calling CCL,  Up: CCL

CCL Examples
------------

This section is not yet written.


File: lispref.info,  Node: Category Tables,  Prev: CCL,  Up: MULE

Category Tables
===============

A category table is a type of char table used for keeping track of
categories.  Categories are used for classifying characters for use in
regexps--you can refer to a category rather than having to use a
complicated [] expression (and category lookups are significantly
faster).

   There are 95 different categories available, one for each printable
character (including space) in the ASCII charset.  Each category is
designated by one such character, called a "category designator".  They
are specified in a regexp using the syntax `\cX', where X is a category
designator. (This is not yet implemented.)

   A category table specifies, for each character, the categories that
the character is in.  Note that a character can be in more than one
category.  More specifically, a category table maps from a character to
either the value `nil' (meaning the character is in no categories) or a
95-element bit vector, specifying for each of the 95 categories whether
the character is in that category.

   Special Lisp functions are provided that abstract this, so you do not
have to directly manipulate bit vectors.

 - Function: category-table-p object
     This function returns `t' if OBJECT is a category table.

 - Function: category-table &optional buffer
     This function returns the current category table.  This is the one
     specified by the current buffer, or by BUFFER if it is non-`nil'.

 - Function: standard-category-table
     This function returns the standard category table.  This is the
     one used for new buffers.

 - Function: copy-category-table &optional category-table
     This function returns a new category table which is a copy of
     CATEGORY-TABLE, which defaults to the standard category table.

 - Function: set-category-table category-table &optional buffer
     This function selects CATEGORY-TABLE as the new category table for
     BUFFER.  BUFFER defaults to the current buffer if omitted.

 - Function: category-designator-p object
     This function returns `t' if OBJECT is a category designator (a
     char in the range `' '' to `'~'').

 - Function: category-table-value-p object
     This function returns `t' if OBJECT is a category table value.
     Valid values are `nil' or a bit vector of size 95.
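
   As a sketch of how these functions fit together, the hypothetical
helper below gives the current buffer a private copy of its category
table, so that later changes to it would not affect other buffers.  The
helper's name is invented for illustration:

     (defun my-make-local-category-table ()
       "Give the current buffer its own copy of its category table.
     Return the new table.  A hypothetical helper, for illustration."
       (let ((table (copy-category-table (category-table))))
         (set-category-table table)
         table))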


File: lispref.info,  Node: Tips,  Next: Building XEmacs and Object Allocation,  Prev: MULE,  Up: Top

Tips and Standards
******************

This chapter describes no additional features of XEmacs Lisp.  Instead
it gives advice on making effective use of the features described in
the previous chapters.

* Menu:

* Style Tips::                Writing clean and robust programs.
* Compilation Tips::          Making compiled code run fast.
* Documentation Tips::        Writing readable documentation strings.
* Comment Tips::              Conventions for writing comments.
* Library Headers::           Standard headers for library packages.


File: lispref.info,  Node: Style Tips,  Next: Compilation Tips,  Up: Tips

Writing Clean Lisp Programs
===========================

Here are some tips for avoiding common errors in writing Lisp code
intended for widespread use:

   * Since all global variables share the same name space, and all
     functions share another name space, you should choose a short word
     to distinguish your program from other Lisp programs.  Then take
     care to begin the names of all global variables, constants, and
     functions with the chosen prefix.  This helps avoid name conflicts.

     This recommendation applies even to names for traditional Lisp
     primitives that are not primitives in XEmacs Lisp--even to `cadr'.
     Believe it or not, there is more than one plausible way to define
     `cadr'.  Play it safe; prepend your name prefix to produce a name
     like `foo-cadr' or `mylib-cadr' instead.

     If you write a function that you think ought to be added to Emacs
     under a certain name, such as `twiddle-files', don't call it by
     that name in your program.  Call it `mylib-twiddle-files' in your
     program, and send mail to `bug-gnu-emacs@prep.ai.mit.edu'
     suggesting we add it to Emacs.  If and when we do, we can change
     the name easily enough.

     If one prefix is insufficient, your package may use two or three
     alternative common prefixes, so long as they make sense.

     Separate the prefix from the rest of the symbol name with a hyphen,
     `-'.  This will be consistent with XEmacs itself and with most
     Emacs Lisp programs.

   * It is often useful to put a call to `provide' in each separate
     library program, at least if there is more than one entry point to
     the program.

   * If a file requires certain other library programs to be loaded
     beforehand, then the comments at the beginning of the file should
     say so.  Also, use `require' to make sure they are loaded.

   * If one file FOO uses a macro defined in another file BAR, FOO
     should contain this expression before the first use of the macro:

          (eval-when-compile (require 'BAR))

     (And BAR should contain `(provide 'BAR)', to make the `require'
     work.)  This will cause BAR to be loaded when you byte-compile
     FOO.  Otherwise, you risk compiling FOO without the necessary
     macro loaded, and that would produce compiled code that won't work
     right.  *Note Compiling Macros::.

     Using `eval-when-compile' avoids loading BAR when the compiled
     version of FOO is _used_.

   * If you define a major mode, make sure to run a hook variable using
     `run-hooks', just as the existing major modes do.  *Note Hooks::.

   * If the purpose of a function is to tell you whether a certain
     condition is true or false, give the function a name that ends in
     `p'.  If the name is one word, add just `p'; if the name is
     multiple words, add `-p'.  Examples are `framep' and
     `frame-live-p'.

   * If a user option variable records a true-or-false condition, give
     it a name that ends in `-flag'.

   * Please do not define `C-c LETTER' as a key in your major modes.
     These sequences are reserved for users; they are the *only*
     sequences reserved for users, so we cannot do without them.

     Instead, define sequences consisting of `C-c' followed by a
     non-letter.  These sequences are reserved for major modes.

     Changing all the major modes in Emacs 18 so they would follow this
     convention was a lot of work.  Abandoning this convention would
     make that work go to waste, and inconvenience users.

   * Sequences consisting of `C-c' followed by `{', `}', `<', `>', `:'
     or `;' are also reserved for major modes.

   * Sequences consisting of `C-c' followed by any other punctuation
     character are allocated for minor modes.  Using them in a major
     mode is not absolutely prohibited, but if you do that, the major
     mode binding may be shadowed from time to time by minor modes.

   * You should not bind `C-h' following any prefix character (including
     `C-c').  If you don't bind `C-h', it is automatically available as
     a help character for listing the subcommands of the prefix
     character.

   * You should not bind a key sequence ending in <ESC> except following
     another <ESC>.  (That is, it is ok to bind a sequence ending in
     `<ESC> <ESC>'.)

     The reason for this rule is that a non-prefix binding for <ESC> in
     any context prevents recognition of escape sequences as function
     keys in that context.

   * Applications should not bind mouse events based on button 1 with
     the shift key held down.  These events include `S-mouse-1',
     `M-S-mouse-1', `C-S-mouse-1', and so on.  They are reserved for
     users.

   * Modes should redefine `mouse-2' as a command to follow some sort of
     reference in the text of a buffer, if users usually would not want
     to alter the text in that buffer by hand.  Modes such as Dired,
     Info, Compilation, and Occur redefine it in this way.

   * When a package provides a modification of ordinary Emacs behavior,
     it is good to include a command to enable and disable the feature.
     Provide a command named `WHATEVER-mode' which turns the feature on
     or off, and make it autoload (*note Autoload::).  Design the
     package so that simply loading it has no visible effect--that
     should not enable the feature.  Users will request the feature by
     invoking the command.

   * It is a bad idea to define aliases for the Emacs primitives.  Use
     the standard names instead.

   * Redefining an Emacs primitive is an even worse idea.  It may do
     the right thing for a particular program, but there is no telling
     what other programs might break as a result.

   * If a file does replace any of the functions or library programs of
     standard XEmacs, prominent comments at the beginning of the file
     should say which functions are replaced, and how the behavior of
     the replacements differs from that of the originals.

   * Please keep the names of your XEmacs Lisp source files to 13
     characters or less.  This way, if the files are compiled, the
     compiled files' names will be 14 characters or less, which is
     short enough to fit on all kinds of Unix systems.

   * Don't use `next-line' or `previous-line' in programs; nearly
     always, `forward-line' is more convenient as well as more
     predictable and robust.  *Note Text Lines::.

   * Don't call functions that set the mark, unless setting the mark is
     one of the intended features of your program.  The mark is a
     user-level feature, so it is incorrect to change the mark except
     to supply a value for the user's benefit.  *Note The Mark::.

     In particular, don't use these functions:

        * `beginning-of-buffer', `end-of-buffer'

        * `replace-string', `replace-regexp'

     If you just want to move point, or replace a certain string,
     without any of the other features intended for interactive users,
     you can replace these functions with one or two lines of simple
     Lisp code.

   * Use lists rather than vectors, except when there is a particular
     reason to use a vector.  Lisp has more facilities for manipulating
     lists than for vectors, and working with lists is usually more
     convenient.

     Vectors are advantageous for tables that are substantial in size
     and are accessed in random order (not searched front to back),
     provided there is no need to insert or delete elements (only lists
     allow that).

   * The recommended way to print a message in the echo area is with
     the `message' function, not `princ'.  *Note The Echo Area::.

   * When you encounter an error condition, call the function `error'
     (or `signal').  The function `error' does not return.  *Note
     Signaling Errors::.

     Do not use `message', `throw', `sleep-for', or `beep' to report
     errors.

   * An error message should start with a capital letter but should not
     end with a period.

   * Try to avoid using recursive edits.  Instead, do what the Rmail `e'
     command does: use a new local keymap that contains one command
     defined to switch back to the old local keymap.  Or do what the
     `edit-options' command does: switch to another buffer and let the
     user switch back at will.  *Note Recursive Editing::.

   * In some other systems there is a convention of choosing variable
     names that begin and end with `*'.  We don't use that convention
     in Emacs Lisp, so please don't use it in your programs.  (Emacs
     uses such names only for program-generated buffers.)  The users
     will find Emacs more coherent if all libraries use the same
     conventions.

   * Use names starting with a space for temporary buffers (*note
     Buffer Names::), or at least call `buffer-disable-undo' on them.
     Otherwise they may stay referenced by internal undo variable(s)
     after getting killed.  If this happens before dumping (*note
     Building XEmacs::), this may cause a fatal error when the portable
     dumper is used.

   * Indent each function with `C-M-q' (`indent-sexp') using the
     default indentation parameters.

   * Don't make a habit of putting close-parentheses on lines by
     themselves; Lisp programmers find this disconcerting.  Once in a
     while, when there is a sequence of many consecutive
     close-parentheses, it may make sense to split them in one or two
     significant places.

   * Please put a copyright notice on the file if you give copies to
     anyone.  Use the same lines that appear at the top of the Lisp
     files in XEmacs itself.  If you have not signed papers to assign
     the copyright to the Foundation, then place your name in the
     copyright notice in place of the Foundation's name.
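
   As an illustration of the tips above on reporting errors, here is a
minimal sketch that signals an error with `error' instead of displaying
a message.  The function name `my-checked-sqrt' is hypothetical and
serves only as an example.  The error message starts with a capital
letter and does not end with a period.

     (defun my-checked-sqrt (x)
       "Return the square root of the number X.
     Signal an error if X is negative."
       (if (< x 0)
           ;; `error' does not return, so `sqrt' is never reached here.
           (error "Cannot take square root of negative number: %s" x)
         (sqrt x)))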


File: lispref.info,  Node: Compilation Tips,  Next: Documentation Tips,  Prev: Style Tips,  Up: Tips

Tips for Making Compiled Code Fast
==================================

Here are ways of improving the execution speed of byte-compiled Lisp
programs.

   * Use the `profile' library to profile your program.  See the file
     `profile.el' for instructions.

   * Use iteration rather than recursion whenever possible.  Function
     calls are slow in XEmacs Lisp even when a compiled function is
     calling another compiled function.

   * Using the primitive list-searching functions `memq', `member',
     `assq', or `assoc' is even faster than explicit iteration.  It may
     be worth rearranging a data structure so that one of these
     primitive search functions can be used.

   * Certain built-in functions are handled specially in byte-compiled
     code, avoiding the need for an ordinary function call.  It is a
     good idea to use these functions rather than alternatives.  To see
     whether a function is handled specially by the compiler, examine
     its `byte-compile' property.  If the property is non-`nil', then
     the function is handled specially.

     For example, the following input will show you that `aref' is
     compiled specially (*note Array Functions::) while `elt' is not
     (*note Sequence Functions::):

          (get 'aref 'byte-compile)
               => byte-compile-two-args
          
          (get 'elt 'byte-compile)
               => nil

   * If calling a small function accounts for a substantial part of
     your program's running time, make the function inline.  This
     eliminates the function call overhead.  Since making a function
     inline reduces the flexibility of changing the program, don't do
     it unless it gives a noticeable speedup in something slow enough
     that users care about the speed.  *Note Inline Functions::.
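
   The following sketch illustrates two of the tips above: replacing an
explicit loop with the primitive `memq', and defining a small helper
inline with `defsubst'.  All three function names are hypothetical and
exist only for this example; whether either change gives a noticeable
speedup in a particular program is best checked by profiling.

     ;; Explicit iteration: every step of the loop runs as Lisp code.
     (defun my-member-by-loop (item items)
       "Return t if ITEM is `eq' to an element of the list ITEMS."
       (let ((found nil))
         (while (and items (not found))
           (if (eq (car items) item)
               (setq found t))
           (setq items (cdr items)))
         found))

     ;; The same test using the primitive search function `memq'.
     (defun my-member-by-memq (item items)
       "Return t if ITEM is `eq' to an element of the list ITEMS."
       (if (memq item items) t nil))

     ;; A small, frequently called helper defined as an inline function.
     (defsubst my-square (x)
       "Return the number X multiplied by itself."
       (* x x))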


File: lispref.info,  Node: Documentation Tips,  Next: Comment Tips,  Prev: Compilation Tips,  Up: Tips

Tips for Documentation Strings
==============================

Here are some tips for the writing of documentation strings.

   * Every command, function, or variable intended for users to know
     about should have a documentation string.

   * An internal variable or subroutine of a Lisp program might as well
     have a documentation string.  In earlier Emacs versions, you could
     save space by using a comment instead of a documentation string,
     but that is no longer the case.

   * The first line of the documentation string should consist of one
     or two complete sentences that stand on their own as a summary.
     `M-x apropos' displays just the first line, and if it doesn't
     stand on its own, the result looks bad.  In particular, start the
     first line with a capital letter and end with a period.

     The documentation string can have additional lines that expand on
     the details of how to use the function or variable.  The
     additional lines should be made up of complete sentences also, but
     they may be filled if that looks good.

   * For consistency, phrase the verb in the first sentence of a
     documentation string as an infinitive with "to" omitted.  For
     instance, use "Return the cons of A and B." in preference to
     "Returns the cons of A and B."  Usually it looks good to do
     likewise for the rest of the first paragraph.  Subsequent
     paragraphs usually look better if they have proper subjects.

   * Write documentation strings in the active voice, not the passive,
     and in the present tense, not the future.  For instance, use
     "Return a list containing A and B." instead of "A list containing
     A and B will be returned."

   * Avoid using the word "cause" (or its equivalents) unnecessarily.
     Instead of, "Cause Emacs to display text in boldface," write just
     "Display text in boldface."

   * Do not start or end a documentation string with whitespace.

   * Format the documentation string so that it fits in an Emacs window
     on an 80-column screen.  It is a good idea for most lines to be no
     wider than 60 characters.  The first line can be wider if
     necessary to fit the information that ought to be there.

     However, rather than simply filling the entire documentation
     string, you can make it much more readable by choosing line breaks
     with care.  Use blank lines between topics if the documentation
     string is long.

   * *Do not* indent subsequent lines of a documentation string so that
     the text is lined up in the source code with the text of the first
     line.  This looks nice in the source code, but looks bizarre when
     users view the documentation.  Remember that the indentation
     before the starting double-quote is not part of the string!

   * A variable's documentation string should start with `*' if the
     variable is one that users would often want to set interactively.
     If the value is a long list, or a function, or if the variable
     would be set only in init files, then don't start the
     documentation string with `*'.  *Note Defining Variables::.

   * The documentation string for a variable that is a yes-or-no flag
     should start with words such as "Non-nil means...", to make it
     clear that all non-`nil' values are equivalent and indicate
     explicitly what `nil' and non-`nil' mean.

   * When a function's documentation string mentions the value of an
     argument of the function, use the argument name in capital letters
     as if it were a name for that value.  Thus, the documentation
     string of the function `/' refers to its second argument as
     `DIVISOR', because the actual argument name is `divisor'.

     Also use all caps for meta-syntactic variables, such as when you
     show the decomposition of a list or vector into subunits, some of
     which may vary.

   * When a documentation string refers to a Lisp symbol, write it as it
     would be printed (which usually means in lower case), with
     single-quotes around it.  For example: `lambda'.  There are two
     exceptions: write t and nil without single-quotes.  (In this
     manual, we normally do use single-quotes for those symbols.)

   * Don't write key sequences directly in documentation strings.
     Instead, use the `\\[...]' construct to stand for them.  For
     example, instead of writing `C-f', write `\\[forward-char]'.  When
     Emacs displays the documentation string, it substitutes whatever
     key is currently bound to `forward-char'.  (This is normally `C-f',
     but it may be some other character if the user has moved key
     bindings.)  *Note Keys in Documentation::.

   * In documentation strings for a major mode, you will want to refer
     to the key bindings of that mode's local map, rather than global
     ones.  Therefore, use the construct `\\<...>' once in the
     documentation string to specify which key map to use.  Do this
     before the first use of `\\[...]'.  The text inside the `\\<...>'
     should be the name of the variable containing the local keymap for
     the major mode.

     It is not practical to use `\\[...]' very many times, because
     display of the documentation string will become slow.  So use this
     to describe the most important commands in your major mode, and
     then use `\\{...}' to display the rest of the mode's keymap.
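
   Here is a short sketch that applies several of the conventions
above: a one-sentence summary on the first line, the imperative verb
form, argument names in capital letters, unindented continuation lines,
a flag variable documented with "Non-nil means", and `\\[...]' in place
of a literal key sequence.  The names `my-greet' and `my-greet-loudly'
are hypothetical and serve only as an example.

     (defvar my-greet-loudly nil
       "*Non-nil means `my-greet' displays its greeting in upper case.")

     (defun my-greet (name)
       "Display a greeting for NAME in the echo area.
     NAME is a string.  If `my-greet-loudly' is non-nil, convert the
     greeting to upper case before displaying it.

     Type \\[describe-function] to read this documentation again."
       (interactive "sName: ")
       (let ((text (format "Hello, %s" name)))
         (message "%s" (if my-greet-loudly (upcase text) text))))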


File: lispref.info,  Node: Comment Tips,  Next: Library Headers,  Prev: Documentation Tips,  Up: Tips

Tips on Writing Comments
========================

We recommend these conventions for where to put comments and how to
indent them:

`;'
     Comments that start with a single semicolon, `;', should all be
     aligned to the same column on the right of the source code.  Such
     comments usually explain how the code on the same line does its
     job.  In Lisp mode and related modes, the `M-;'
     (`indent-for-comment') command automatically inserts such a `;' in
     the right place, or aligns such a comment if it is already present.

     This and following examples are taken from the Emacs sources.

          (setq base-version-list                 ; there was a base
                (assoc (substring fn 0 start-vn)  ; version to which
                       file-version-assoc-list))  ; this looks like
                                                  ; a subversion

`;;'
     Comments that start with two semicolons, `;;', should be aligned to
     the same level of indentation as the code.  Such comments usually
     describe the purpose of the following lines or the state of the
     program at that point.  For example:

          (prog1 (setq auto-fill-function
                       ...
                       ...
            ;; update modeline
            (redraw-modeline)))

     Every function that has no documentation string (because it is
     used only internally within the package it belongs to), should
     have instead a two-semicolon comment right before the function,
     explaining what the function does and how to call it properly.
     Explain precisely what each argument means and how the function
     interprets its possible values.

`;;;'
     Comments that start with three semicolons, `;;;', should start at
     the left margin.  Such comments are used outside function
     definitions to make general statements explaining the design
     principles of the program.  For example:

          ;;; This Lisp code is run in XEmacs
          ;;; when it is to operate as a server
          ;;; for other processes.

     Another use for triple-semicolon comments is for commenting out
     lines within a function.  We use triple-semicolons for this
     precisely so that they remain at the left margin.

          (defun foo (a)
          ;;; This is no longer necessary.
          ;;;  (force-mode-line-update)
            (message "Finished with %s" a))

`;;;;'
     Comments that start with four semicolons, `;;;;', should be aligned
     to the left margin and are used for headings of major sections of a
     program.  For example:

          ;;;; The kill ring

The indentation commands of the Lisp modes in XEmacs, such as `M-;'
(`indent-for-comment') and <TAB> (`lisp-indent-line'), automatically
indent comments according to these conventions, depending on the number
of semicolons.  *Note Manipulating Comments: (xemacs)Comments.


File: lispref.info,  Node: Library Headers,  Prev: Comment Tips,  Up: Tips

Conventional Headers for XEmacs Libraries
=========================================

XEmacs has conventions for using special comments in Lisp libraries to
divide them into sections and give information such as who wrote them.
This section explains these conventions.  First, an example:

     ;;; lisp-mnt.el --- minor mode for Emacs Lisp maintainers
     
     ;; Copyright (C) 1992 Free Software Foundation, Inc.
     
     ;; Author: Eric S. Raymond <esr@snark.thyrsus.com>
     ;; Maintainer: Eric S. Raymond <esr@snark.thyrsus.com>
     ;; Created: 14 Jul 1992
     ;; Version: 1.2
     ;; Keywords: docs
     
     ;; This file is part of XEmacs.
     COPYING PERMISSIONS...

   The very first line should have this format:

     ;;; FILENAME --- DESCRIPTION

The description should be complete in one line.

   After the copyright notice come several "header comment" lines, each
beginning with `;; HEADER-NAME:'.  Here is a table of the conventional
possibilities for HEADER-NAME:

`Author'
     This line states the name and net address of at least the principal
     author of the library.

     If there are multiple authors, you can list them on continuation
     lines led by `;;' and a tab character, like this:

          ;; Author: Ashwin Ram <Ram-Ashwin@cs.yale.edu>
          ;;      Dave Sill <de5@ornl.gov>
          ;;      Dave Brennan <brennan@hal.com>
          ;;      Eric Raymond <esr@snark.thyrsus.com>

`Maintainer'
     This line should contain a single name/address as in the Author
     line, or an address only, or the string `FSF'.  If there is no
     maintainer line, the person(s) in the Author field are presumed to
     be the maintainers.  The example above is mildly bogus because the
     maintainer line is redundant.

     The idea behind the `Author' and `Maintainer' lines is to make
     possible a Lisp function to "send mail to the maintainer" without
     having to mine the name out by hand.

     Be sure to surround the network address with `<...>' if you
     include the person's full name as well as the network address.

`Created'
     This optional line gives the original creation date of the file.
     It is of historical interest only.

`Version'
     If you wish to record version numbers for the individual Lisp
     program, put them in this line.

`Adapted-By'
     In this header line, place the name of the person who adapted the
     library for installation (to make it fit the style conventions, for
     example).

`Keywords'
     This line lists keywords for the `finder-by-keyword' help command.
     This field is important; it's how people will find your package
     when they're looking for things by topic area.  To separate the
     keywords, you can use spaces, commas, or both.

   Just about every Lisp library ought to have the `Author' and
`Keywords' header comment lines.  Use the others if they are
appropriate.  You can also put in header lines with other header
names--they have no standard meanings, so they can't do any harm.

   We use additional stylized comments to subdivide the contents of the
library file.  Here is a table of them:

`;;; Commentary:'
     This begins introductory comments that explain how the library
     works.  It should come right after the copying permissions.

`;;; Change log:'
     This begins change log information stored in the library file (if
     you store the change history there).  For most of the Lisp files
     distributed with XEmacs, the change history is kept in the file
     `ChangeLog' and not in the source file at all; these files do not
     have a `;;; Change log:' line.

`;;; Code:'
     This begins the actual code of the program.

`;;; FILENAME ends here'
     This is the "footer line"; it appears at the very end of the file.
     Its purpose is to enable people to detect truncated versions of
     the file from the lack of a footer line.
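
   Putting these conventions together, a small library file might be
laid out as in the following sketch.  The file name `my-hello.el', the
author line, the keyword, and the function are all hypothetical
placeholders chosen for illustration, and the copyright line must be
filled in by the actual author.

     ;;; my-hello.el --- display a friendly greeting

     ;; Copyright (C) YEAR YOUR-NAME

     ;; Author: A. Hacker <hacker@example.org>
     ;; Keywords: games

     ;; (The copying permissions go here.)

     ;;; Commentary:

     ;; This library defines one command, `my-hello', which displays
     ;; a greeting in the echo area.

     ;;; Code:

     (defun my-hello ()
       "Display a greeting in the echo area."
       (interactive)
       (message "Hello, world"))

     (provide 'my-hello)

     ;;; my-hello.el ends here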


File: lispref.info,  Node: Building XEmacs and Object Allocation,  Next: Standard Errors,  Prev: Tips,  Up: Top

Building XEmacs; Allocation of Objects
**************************************

This chapter describes how the runnable XEmacs executable is dumped
with the preloaded Lisp libraries in it and how storage is allocated.

   There is an entire separate document, the `XEmacs Internals Manual',
devoted to the internals of XEmacs from the perspective of the C
programmer.  It contains much more detailed information about the build
process, the allocation and garbage-collection process, and other
aspects related to the internals of XEmacs.

* Menu:

* Building XEmacs::     How to preload Lisp libraries into XEmacs.
* Pure Storage::        A kludge to make preloaded Lisp functions sharable.
* Garbage Collection::  Reclaiming space for Lisp objects no longer used.


File: lispref.info,  Node: Building XEmacs,  Next: Pure Storage,  Up: Building XEmacs and Object Allocation

Building XEmacs
===============

This section explains the steps involved in building the XEmacs
executable.  You don't have to know this material to build and install
XEmacs, since the makefiles do all these things automatically.  This
information is pertinent to XEmacs maintenance.

   The `XEmacs Internals Manual' contains more information about this.

   Compilation of the C source files in the `src' directory produces an
executable file called `temacs', also called a "bare impure XEmacs".
It contains the XEmacs Lisp interpreter and I/O routines, but not the
editing commands.

   Before XEmacs is actually usable, a number of Lisp files need to be
loaded.  These define all the editing commands, plus most of the startup
code and many very basic Lisp primitives.  This is accomplished by
loading the file `loadup.el', which in turn loads all of the other
standard preloaded Lisp files.

   It takes a substantial time to load the standard Lisp files.
Luckily, you don't have to do this each time you run XEmacs; `temacs'
can dump out an executable program called `xemacs' that has these files
preloaded.  `xemacs' starts more quickly because it does not need to
load the files.  This is the XEmacs executable that is normally
installed.

   To create `xemacs', use the command `temacs -batch -l loadup dump'.
The purpose of `-batch' here is to tell `temacs' to run in
non-interactive, command-line mode. (`temacs' can _only_ run in this
fashion.  Part of the code required to initialize frames and faces is
in Lisp, and must be loaded before XEmacs is able to create any frames.)
The argument `dump' tells `loadup.el' to dump a new executable named
`xemacs'.

   The dumping process is highly system-specific, and some operating
systems don't support dumping.  On those systems, you must start XEmacs
with the `temacs -batch -l loadup run-temacs' command each time you use
it.  This takes a substantial time, but since you need to start Emacs
once a day at most--or once a week if you never log out--the extra time
is not too severe a problem. (In older versions of Emacs, you started
Emacs from `temacs' using `temacs -l loadup'.)

   You are free to start XEmacs directly from `temacs' if you want,
even if there is already a dumped `xemacs'.  Normally you wouldn't want
to do that; but the Makefiles do this when you rebuild XEmacs using
`make all-elc', which builds XEmacs and simultaneously compiles any
out-of-date Lisp files. (You need `xemacs' in order to compile Lisp
files.  However, you also need the compiled Lisp files in order to dump
out `xemacs'.  If both of these are missing or corrupted, you are out
of luck unless you're able to bootstrap `xemacs' from `temacs'.  Note
that `make all-elc' actually loads the alternative loadup file
`loadup-el.el', which works like `loadup.el' but disables the
pure-copying process and forces XEmacs to ignore any compiled Lisp
files even if they exist.)

   You can specify additional files to preload by writing a library
named `site-load.el' that loads them.  You may need to increase the
value of `PURESIZE', in `src/puresize.h', to make room for the
additional files.  You should _not_ modify this file directly, however;
instead, use the `--puresize' configuration option. (If you run out of
pure space while dumping `xemacs', you will be told how much pure space
you actually will need.) However, the advantage of preloading
additional files decreases as machines get faster.  On modern machines,
it is often not advisable, especially if the Lisp code is on a file
system local to the machine running XEmacs.
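
   As a minimal sketch, a `site-load.el' that preloads one additional
library could contain nothing more than a call to `load'.  The library
name `my-site-tools' is hypothetical; substitute a library that can be
found when `temacs' loads `loadup.el'.

     ;;; site-load.el --- preload extra libraries before dumping

     ;; `my-site-tools' is a hypothetical library name used only for
     ;; illustration.
     (load "my-site-tools")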

   You can specify other Lisp expressions to execute just before dumping
by putting them in a library named `site-init.el'.  However, if they
might alter the behavior that users expect from an ordinary unmodified
XEmacs, it is better to put them in `default.el', so that users can
override them if they wish.  *Note Start-up Summary::.

   Before `loadup.el' dumps the new executable, it finds the
documentation strings for primitive and preloaded functions (and
variables) in the file where they are stored, by calling
`Snarf-documentation' (*note Accessing Documentation::).  These strings
were moved out of the `xemacs' executable to make it smaller.  *Note
Documentation Basics::.

 - Function: dump-emacs to-file from-file
     This function dumps the current state of XEmacs into an executable
     file TO-FILE.  It takes symbols from FROM-FILE (this is normally
     the executable file `temacs').

     If you use this function in an XEmacs that was already dumped, you
     must set `command-line-processed' to `nil' first for good results.
     *Note Command Line Arguments::.

 - Function: run-emacs-from-temacs &rest args
     This is the function that implements the `run-temacs' command-line
     argument.  It is called from `loadup.el' as appropriate.  You
     should most emphatically _not_ call this yourself; it will
     reinitialize your XEmacs process and you'll be sorry.

 - Command: emacs-version &optional arg
     This function returns a string describing the version of XEmacs
     that is running.  It is useful to include this string in bug
     reports.

     When called interactively with a prefix argument, it inserts the
     string at point.  Don't use this function in programs to choose
     actions according to the system configuration; look at
     `system-configuration' instead.

          (emacs-version)
            => "XEmacs 20.1 [Lucid] (i586-unknown-linux2.0.29)
                           of Mon Apr  7 1997 on altair.xemacs.org"

     Called interactively, the function prints the same information in
     the echo area.
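
     As a sketch of testing `system-configuration' rather than parsing
     the version string, the hypothetical function below checks whether
     the configuration name mentions Linux.  The exact form of the
     configuration name varies between systems, so treat the pattern as
     an assumption.

          (defun my-configured-for-linux-p ()
            "Return t if `system-configuration' mentions Linux."
            (if (string-match "linux" system-configuration) t nil))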

 - Variable: emacs-build-time
     The value of this variable is the time at which XEmacs was built
     at the local site.

          emacs-build-time
               => "Mon Apr  7 20:28:52 1997"

 - Variable: emacs-version
     The value of this variable is the version of Emacs being run.  It
     is a string, e.g. `"20.1 XEmacs Lucid"'.

   The following two variables did not exist before FSF GNU Emacs
version 19.23 and XEmacs version 19.10, which reduces their usefulness
at present, but we hope they will be convenient in the future.

 - Variable: emacs-major-version
     The major version number of Emacs, as an integer.  For XEmacs
     version 20.1, the value is 20.

 - Variable: emacs-minor-version
     The minor version number of Emacs, as an integer.  For XEmacs
     version 20.1, the value is 1.
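
   These two variables make it straightforward to test for a minimum
version.  The following is a minimal sketch; the function name
`my-emacs-at-least-p' is hypothetical.  Because the variables are
missing in old versions, the function checks that they are bound before
using them.

     (defun my-emacs-at-least-p (major minor)
       "Return non-nil if the running Emacs is version MAJOR.MINOR or later.
     Return nil if the version variables are not defined."
       (and (boundp 'emacs-major-version)
            (boundp 'emacs-minor-version)
            (or (> emacs-major-version major)
                (and (= emacs-major-version major)
                     (>= emacs-minor-version minor)))))

   For example, in XEmacs 20.1, `(my-emacs-at-least-p 19 14)' returns
`t', while `(my-emacs-at-least-p 21 0)' returns `nil'.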


File: lispref.info,  Node: Pure Storage,  Next: Garbage Collection,  Prev: Building XEmacs,  Up: Building XEmacs and Object Allocation

Pure Storage
============

XEmacs Lisp uses two kinds of storage for user-created Lisp objects:
"normal storage" and "pure storage".  Normal storage is where all the
new data created during an XEmacs session is kept; see the following
section for information on normal storage.  Pure storage is used for
certain data in the preloaded standard Lisp files--data that should
never change during actual use of XEmacs.

   Pure storage is allocated only while `temacs' is loading the
standard preloaded Lisp libraries.  In the file `xemacs', it is marked
as read-only (on operating systems that permit this), so that the
memory space can be shared by all the XEmacs jobs running on the machine
at once.  Pure storage is not expandable; a fixed amount is allocated
when XEmacs is compiled, and if that is not sufficient for the preloaded
libraries, `temacs' aborts with an error message.  If that happens, you
must increase the compilation parameter `PURESIZE' using the
`--puresize' option to `configure'.  This normally won't happen unless
you try to preload additional libraries or add features to the standard
ones.

 - Function: purecopy object
     This function makes a copy of OBJECT in pure storage and returns
     it.  It copies strings by simply making a new string with the same
     characters in pure storage.  It recursively copies the contents of
     vectors and cons cells.  It does not make copies of other objects
     such as symbols, but just returns them unchanged.  It signals an
     error if asked to copy markers.

     This function is a no-op in XEmacs, and its use in new code is
     deprecated.

 - Variable: pure-bytes-used
     The value of this variable is the number of bytes of pure storage
     allocated so far.  Typically, in a dumped XEmacs, this number is
     very close to the total amount of pure storage available--if it
     were not, we would preallocate less.

 - Variable: purify-flag
     This variable determines whether `defun' should make a copy of the
     function definition in pure storage.  If it is non-`nil', then the
     function definition is copied into pure storage.

     This flag is `t' while loading all of the basic functions for
     building XEmacs initially (allowing those functions to be sharable
     and non-collectible).  Dumping XEmacs as an executable always
     writes `nil' in this variable, regardless of the value it actually
     has before and after dumping.

     You should not change this flag in a running XEmacs.


File: lispref.info,  Node: Garbage Collection,  Prev: Pure Storage,  Up: Building XEmacs and Object Allocation

Garbage Collection
==================

When a program creates a list or the user defines a new function (such
as by loading a library), that data is placed in normal storage.  If
normal storage runs low, then XEmacs asks the operating system to
allocate more memory in blocks of 2k bytes.  Each block is used for one
type of Lisp object, so symbols, cons cells, markers, etc., are
segregated in distinct blocks in memory.  (Vectors, long strings,
buffers and certain other editing types, which are fairly large, are
allocated in individual blocks, one per object, while small strings are
packed into blocks of 8k bytes. [More correctly, a string is allocated
in two sections: a fixed size chunk containing the length, list of
extents, etc.; and a chunk containing the actual characters in the
string.  It is this latter chunk that is either allocated individually
or packed into 8k blocks.  The fixed size chunk is packed into 2k
blocks, as for conses, markers, etc.])

   It is quite common to use some storage for a while, then release it
by (for example) killing a buffer or deleting the last pointer to an
object.  XEmacs provides a "garbage collector" to reclaim this
abandoned storage.  (This name is traditional, but "garbage recycler"
might be a more intuitive metaphor for this facility.)

   The garbage collector operates by finding and marking all Lisp
objects that are still accessible to Lisp programs.  To begin with, it
assumes all the symbols, their values and associated function
definitions, and any data presently on the stack, are accessible.  Any
objects that can be reached indirectly through other accessible objects
are also accessible.

   When marking is finished, all objects still unmarked are garbage.  No
matter what the Lisp program or the user does, it is impossible to refer
to them, since there is no longer a way to reach them.  Their space
might as well be reused, since no one will miss them.  The second
("sweep") phase of the garbage collector arranges to reuse them.

   The sweep phase puts unused cons cells onto a "free list" for future
allocation; likewise for symbols, markers, extents, events, floats,
compiled-function objects, and the fixed-size portion of strings.  It
compacts the accessible small string-chars chunks so they occupy fewer
8k blocks; then it frees the other 8k blocks.  Vectors, buffers,
windows, and other large objects are individually allocated and freed
using `malloc' and `free'.

     Common Lisp note: unlike other Lisps, XEmacs Lisp does not call
     the garbage collector when the free list is empty.  Instead, it
     simply requests the operating system to allocate more storage, and
     processing continues until `gc-cons-threshold' bytes have been
     used.

     This means that you can make sure that the garbage collector will
     not run during a certain portion of a Lisp program by calling the
     garbage collector explicitly just before it (provided that portion
     of the program does not use so much space as to force a second
     garbage collection).
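
     As a minimal sketch of this technique, the hypothetical function
     `my-call-after-gc' below collects garbage and then calls its
     argument.  Whether a further collection is really avoided depends
     on how much storage the called function allocates.

          (defun my-call-after-gc (function)
            "Run the garbage collector, then call FUNCTION with no arguments.
          Return the value that FUNCTION returns."
            (garbage-collect)
            (funcall function))

          (my-call-after-gc (function (lambda () (+ 1 2))))
               => 3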

 - Command: garbage-collect
     This command runs a garbage collection, and returns information on
     the amount of space in use.  (Garbage collection can also occur
     spontaneously if you use more than `gc-cons-threshold' bytes of
     Lisp data since the previous garbage collection.)

     `garbage-collect' returns a list containing the following
     information:

          ((USED-CONSES . FREE-CONSES)
           (USED-SYMS . FREE-SYMS)
           (USED-MARKERS . FREE-MARKERS)
           USED-STRING-CHARS
           USED-VECTOR-SLOTS
           (PLIST))
          
          => ((73362 . 8325) (13718 . 164)
          (5089 . 5098) 949121 118677
          (conses-used 73362 conses-free 8329 cons-storage 658168
          symbols-used 13718 symbols-free 164 symbol-storage 335216
          bit-vectors-used 0 bit-vectors-total-length 0
          bit-vector-storage 0 vectors-used 7882
          vectors-total-length 118677 vector-storage 537764
          compiled-functions-used 1336 compiled-functions-free 37
          compiled-function-storage 44440 short-strings-used 28829
          long-strings-used 2 strings-free 7722
          short-strings-total-length 916657 short-string-storage 1179648
          long-strings-total-length 32464 string-header-storage 441504
          floats-used 3 floats-free 43 float-storage 2044 markers-used 5089
          markers-free 5098 marker-storage 245280 events-used 103
          events-free 835 event-storage 110656 extents-used 10519
          extents-free 2718 extent-storage 372736
          extent-auxiliarys-used 111 extent-auxiliarys-freed 3
          extent-auxiliary-storage 4440 window-configurations-used 39
          window-configurations-on-free-list 5
          window-configurations-freed 10 window-configuration-storage 9492
          popup-datas-used 3 popup-data-storage 72 toolbar-buttons-used 62
          toolbar-button-storage 4960 toolbar-datas-used 12
          toolbar-data-storage 240 symbol-value-buffer-locals-used 182
          symbol-value-buffer-local-storage 5824
          symbol-value-lisp-magics-used 22
          symbol-value-lisp-magic-storage 1496
          symbol-value-varaliases-used 43
          symbol-value-varalias-storage 1032 opaque-lists-used 2
          opaque-list-storage 48 color-instances-used 12
          color-instance-storage 288 font-instances-used 5
          font-instance-storage 180 opaques-used 11 opaque-storage 312
          range-tables-used 1 range-table-storage 16 faces-used 34
          face-storage 2584 glyphs-used 124 glyph-storage 4464
          specifiers-used 775 specifier-storage 43869 weak-lists-used 786
          weak-list-storage 18864 char-tables-used 40
          char-table-storage 41920 buffers-used 25 buffer-storage 7000
          extent-infos-used 457 extent-infos-freed 73
          extent-info-storage 9140 keymaps-used 275 keymap-storage 12100
          consoles-used 4 console-storage 384 command-builders-used 2
          command-builder-storage 120 devices-used 2 device-storage 344
          frames-used 3 frame-storage 624 image-instances-used 47
          image-instance-storage 3008 windows-used 27 windows-freed 2
          window-storage 9180 lcrecord-lists-used 15
          lcrecord-list-storage 360 hash-tables-used 631
          hash-table-storage 25240 streams-used 1 streams-on-free-list 3
          streams-freed 12 stream-storage 91))

     Here is a table explaining each element:

    USED-CONSES
          The number of cons cells in use.

    FREE-CONSES
          The number of cons cells for which space has been obtained
          from the operating system, but that are not currently being
          used.

    USED-SYMS
          The number of symbols in use.

    FREE-SYMS
          The number of symbols for which space has been obtained from
          the operating system, but that are not currently being used.

    USED-MARKERS
          The number of markers in use.

    FREE-MARKERS
          The number of markers for which space has been obtained from
          the operating system, but that are not currently being used.

    USED-STRING-CHARS
          The total size of all strings, in characters.

    USED-VECTOR-SLOTS
          The total number of elements of existing vectors.

    PLIST
          A list of alternating keyword/value pairs providing more
          detailed information. (As you can see above, quite a lot of
          information is provided.)

 - User Option: gc-cons-threshold
     The value of this variable is the number of bytes of storage that
     must be allocated for Lisp objects after one garbage collection in
     order to trigger another garbage collection.  A cons cell counts
     as eight bytes, a string as one byte per character plus a few
     bytes of overhead, and so on; space allocated to the contents of
     buffers does not count.  Note that the subsequent garbage
     collection does not happen immediately when the threshold is
     exhausted, but only the next time the Lisp evaluator is called.

     The initial threshold value is 500,000.  If you specify a larger
     value, garbage collection will happen less often.  This reduces the
     amount of time spent garbage collecting, but increases total
     memory use.  You may want to do this when running a program that
     creates lots of Lisp data.

     You can make collections more frequent by specifying a smaller
     value, down to 10,000.  A value less than 10,000 will remain in
     effect only until the subsequent garbage collection, at which time
     `garbage-collect' will set the threshold back to 10,000. (This does
     not apply if XEmacs was configured with `--debug'.  Therefore, be
     careful when setting `gc-cons-threshold' in that case!)
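
     As a rough sketch of the trade-off described above, the threshold
     can be raised for the duration of a single allocation-heavy
     computation by binding the variable with `let'.  The function name
     `my-build-big-list' and the figure of 5,000,000 bytes are
     illustrative assumptions, not recommendations from this manual.

```elisp
;; Hypothetical helper; not part of XEmacs.
(defun my-build-big-list (n)
  "Return the list (0 1 ... N-1), consing with a raised GC threshold."
  ;; Bind rather than `setq', so the previous threshold is restored on
  ;; exit, even if an error is signaled.  5000000 is an arbitrary figure.
  (let ((gc-cons-threshold (max gc-cons-threshold 5000000))
        (result nil)
        (i n))
    (while (> i 0)
      (setq i (1- i))
      (setq result (cons i result)))
    result))
```

     Because the variable is bound rather than set, the earlier
     threshold is back in effect as soon as the function returns.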

 - Variable: pre-gc-hook
     This is a normal hook to be run just before each garbage
     collection.  Interrupts, garbage collection, and errors are
     inhibited while this hook runs, so be extremely careful in what
     you add here.  In particular, avoid consing, and do not interact
     with the user.

 - Variable: post-gc-hook
     This is a normal hook to be run just after each garbage collection.
     Interrupts, garbage collection, and errors are inhibited while
     this hook runs, so be extremely careful in what you add here.  In
     particular, avoid consing, and do not interact with the user.
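
     A hook function that stays within these restrictions might do no
     more than update a counter.  The following is a minimal sketch;
     `my-gc-count' and `my-count-gc' are hypothetical names introduced
     only for this example.

```elisp
;; Hypothetical names; not part of XEmacs.
(defvar my-gc-count 0
  "Number of garbage collections counted since the hook was installed.")

(defun my-count-gc ()
  "Increment `my-gc-count'.
This uses small-integer arithmetic only: it does not cons and does not
interact with the user."
  (setq my-gc-count (1+ my-gc-count)))

;; `add-hook' is the recommended way to put a function on a normal hook.
(add-hook 'post-gc-hook 'my-count-gc)
```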

 - Variable: gc-message
     This is a string to print to indicate that a garbage collection is
     in progress.  This is printed in the echo area.  If the selected
     frame is on a window system and `gc-pointer-glyph' specifies a
     value (i.e. a pointer image instance) in the domain of the
     selected frame, the mouse cursor will change instead of this
     message being printed.

 - Glyph: gc-pointer-glyph
     This holds the pointer glyph used to indicate that a garbage
     collection is in progress.  If the selected window is on a window
     system and this glyph specifies a value (i.e. a pointer image
     instance) in the domain of the selected window, the cursor will be
     changed as specified during garbage collection.  Otherwise, a
     message will be printed in the echo area, as controlled by
     `gc-message'.  *Note Glyphs::.

   If XEmacs was configured with `--debug', you can set the following
two variables to get direct information about all the allocation that
is happening in a segment of Lisp code.

 - Variable: debug-allocation
     If non-zero, print out information to stderr about all objects
     allocated.

 - Variable: debug-allocation-backtrace
     Length (in stack frames) of short backtrace printed out by
     `debug-allocation'.


File: lispref.info,  Node: Standard Errors,  Next: Standard Buffer-Local Variables,  Prev: Building XEmacs and Object Allocation,  Up: Top

Standard Errors
***************

Here is the complete list of the error symbols in standard Emacs,
grouped by concept.  The list includes each symbol's message (on the
`error-message' property of the symbol) and a cross reference to a
description of how the error can occur.

   Each error symbol has an `error-conditions' property that is a list
of symbols.  Normally this list includes the error symbol itself and
the symbol `error'.  Occasionally it includes additional symbols, which
are intermediate classifications, narrower than `error' but broader
than a single error symbol.  For example, all the errors in accessing
files have the condition `file-error'.

   As a special exception, the error symbol `quit' does not have the
condition `error', because quitting is not considered an error.
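
   The classification scheme can be inspected directly through the
`error-conditions' property.  The sketch below assumes nothing beyond
the properties described above; the helper names `my-error-is-a' and
`my-catch-as-file-error', and the file name in the error data, are
hypothetical.

```elisp
;; Hypothetical helpers; not part of XEmacs.
(defun my-error-is-a (error-symbol condition)
  "Return non-nil if ERROR-SYMBOL has CONDITION among its error conditions."
  (memq condition (get error-symbol 'error-conditions)))

(defun my-catch-as-file-error ()
  "Signal `file-already-exists' and catch it with a `file-error' handler.
Return the error symbol that was actually signaled."
  (condition-case err
      (signal 'file-already-exists
              '("File already exists" "/hypothetical/path"))
    ;; A handler for the broader condition also catches the narrower error.
    (file-error (car err))))
```

   With these definitions, `(my-error-is-a 'quit 'error)' should return
`nil', in line with the exception for `quit' noted above.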

   *Note Errors::, for an explanation of how errors are generated and
handled.

`SYMBOL'
     STRING; REFERENCE.

`error'
     `"error"'
     *Note Errors::.

`quit'
     `"Quit"'
     *Note Quitting::.

`args-out-of-range'
     `"Args out of range"'
     *Note Sequences Arrays Vectors::.

`arith-error'
     `"Arithmetic error"'
     See `/' and `%' in *Note Numbers::.

`beginning-of-buffer'
     `"Beginning of buffer"'
     *Note Motion::.

`buffer-read-only'
     `"Buffer is read-only"'
     *Note Read Only Buffers::.

`cyclic-function-indirection'
     `"Symbol's chain of function indirections contains a loop"'
     *Note Function Indirection::.

`domain-error'
     `"Arithmetic domain error"'

`end-of-buffer'
     `"End of buffer"'
     *Note Motion::.

`end-of-file'
     `"End of file during parsing"'
     This is not a `file-error'.
     *Note Input Functions::.

`file-error'
     This error and its subcategories do not have error-strings,
     because the error message is constructed from the data items alone
     when the error condition `file-error' is present.
     *Note Files::.

`file-locked'
     This is a `file-error'.
     *Note File Locks::.

`file-already-exists'
     This is a `file-error'.
     *Note Writing to Files::.

`file-supersession'
     This is a `file-error'.
     *Note Modification Time::.

`invalid-byte-code'
     `"Invalid byte code"'
     *Note Byte Compilation::.

`invalid-function'
     `"Invalid function"'
     *Note Classifying Lists::.

`invalid-read-syntax'
     `"Invalid read syntax"'
     *Note Input Functions::.

`invalid-regexp'
     `"Invalid regexp"'
     *Note Regular Expressions::.

`mark-inactive'
     `"The mark is not active now"'

`no-catch'
     `"No catch for tag"'
     *Note Catch and Throw::.

`overflow-error'
     `"Arithmetic overflow error"'

`protected-field'
     `"Attempt to modify a protected field"'

`range-error'
     `"Arithmetic range error"'

`search-failed'
     `"Search failed"'
     *Note Searching and Matching::.

`setting-constant'
     `"Attempt to set a constant symbol"'
     *Note Variables that Never Change: Constant Variables.

`singularity-error'
     `"Arithmetic singularity error"'

`tooltalk-error'
     `"ToolTalk error"'
     *Note ToolTalk Support::.

`undefined-keystroke-sequence'
     `"Undefined keystroke sequence"'

`void-function'
     `"Symbol's function definition is void"'
     *Note Function Cells::.

`void-variable'
     `"Symbol's value as variable is void"'
     *Note Accessing Variables::.

`wrong-number-of-arguments'
     `"Wrong number of arguments"'
     *Note Classifying Lists::.

`wrong-type-argument'
     `"Wrong type argument"'
     *Note Type Predicates::.

   These error types, which are all classified as special cases of
`arith-error', can occur on certain systems for invalid use of
mathematical functions.

`domain-error'
     `"Arithmetic domain error"'
     *Note Math Functions::.

`overflow-error'
     `"Arithmetic overflow error"'
     *Note Math Functions::.

`range-error'
     `"Arithmetic range error"'
     *Note Math Functions::.

`singularity-error'
     `"Arithmetic singularity error"'
     *Note Math Functions::.

`underflow-error'
     `"Arithmetic underflow error"'
     *Note Math Functions::.
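
   Since each of these is a special case of `arith-error', a single
handler for `arith-error' is enough to cover them all, along with
ordinary errors such as integer division by zero.  A minimal sketch,
using the hypothetical name `my-safe-quotient':

```elisp
;; Hypothetical helper; not part of XEmacs.
(defun my-safe-quotient (a b)
  "Return the quotient of A divided by B, or nil on an arithmetic error."
  (condition-case nil
      (/ a b)
    ;; `arith-error' also matches its special cases, such as
    ;; `domain-error' and `overflow-error'.
    (arith-error nil)))
```

   For example, `(my-safe-quotient 1 0)' returns `nil' instead of
signaling an error.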


File: lispref.info,  Node: Standard Buffer-Local Variables,  Next: Standard Keymaps,  Prev: Standard Errors,  Up: Top

Buffer-Local Variables
**********************

The table below lists the general-purpose Emacs variables that are
automatically local (when set) in each buffer.  Many Lisp packages
define such variables for their internal use; we don't list them here.
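
   The phrase "automatically local (when set)" can be illustrated with
`fill-column', one of the variables listed below.  The following is
only a sketch; the function name `my-fill-column-demo' and the buffer
name are hypothetical.

```elisp
;; Hypothetical helper; not part of XEmacs.
(defun my-fill-column-demo ()
  "Set `fill-column' in a scratch buffer and check that only it changed.
Return a list of two flags: whether the variable is now local in the
scratch buffer, and whether the default value is unchanged."
  (let ((buf (generate-new-buffer " *my-local-demo*"))
        (before (default-value 'fill-column)))
    (unwind-protect
        (save-excursion
          (set-buffer buf)
          ;; A plain `setq' suffices: the variable becomes local when set.
          (setq fill-column (+ before 7))
          (list (local-variable-p 'fill-column buf)
                (= (default-value 'fill-column) before)))
      ;; Always discard the scratch buffer.
      (kill-buffer buf))))
```

   Both elements of the returned list should be non-`nil': the new
value is local to the scratch buffer, and the default is untouched.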

`abbrev-mode'
     *note Abbrevs::

`auto-fill-function'
     *note Auto Filling::

`buffer-auto-save-file-name'
     *note Auto-Saving::

`buffer-backed-up'
     *note Backup Files::

`buffer-display-table'
     *note Display Tables::

`buffer-file-format'
     *note Format Conversion::

`buffer-file-name'
     *note Buffer File Name::

`buffer-file-number'
     *note Buffer File Name::

`buffer-file-truename'
     *note Buffer File Name::

`buffer-file-type'
     *note Files and MS-DOS::

`buffer-invisibility-spec'
     *note Invisible Text::

`buffer-offer-save'
     *note Saving Buffers::

`buffer-read-only'
     *note Read Only Buffers::

`buffer-saved-size'
     *note Point::

`buffer-undo-list'
     *note Undo::

`cache-long-line-scans'
     *note Text Lines::

`case-fold-search'
     *note Searching and Case::

`ctl-arrow'
     *note Usual Display::

`comment-column'
     *note Comments: (xemacs)Comments.

`default-directory'
     *note System Environment::

`defun-prompt-regexp'
     *note List Motion::

`fill-column'
     *note Auto Filling::

`goal-column'
     *note Moving Point: (xemacs)Moving Point.

`left-margin'
     *note Indentation::

`local-abbrev-table'
     *note Abbrevs::

`local-write-file-hooks'
     *note Saving Buffers::

`major-mode'
     *note Mode Help::

`mark-active'
     *note The Mark::

`mark-ring'
     *note The Mark::

`minor-modes'
     *note Minor Modes::

`modeline-buffer-identification'
     *note Modeline Variables::

`modeline-format'
     *note Modeline Data::

`modeline-modified'
     *note Modeline Variables::

`modeline-process'
     *note Modeline Variables::

`mode-name'
     *note Modeline Variables::

`overwrite-mode'
     *note Insertion::

`paragraph-separate'
     *note Standard Regexps::

`paragraph-start'
     *note Standard Regexps::

`point-before-scroll'
     Used for communication between mouse commands and scroll-bar
     commands.

`require-final-newline'
     *note Insertion::

`selective-display'
     *note Selective Display::

`selective-display-ellipses'
     *note Selective Display::

`tab-width'
     *note Usual Display::

`truncate-lines'
     *note Truncation::

`vc-mode'
     *note Modeline Variables::


File: lispref.info,  Node: Standard Keymaps,  Next: Standard Hooks,  Prev: Standard Buffer-Local Variables,  Up: Top

Standard Keymaps
****************

The following symbols are used as the names for various keymaps.  Some
of these exist when XEmacs is first started; others are loaded only
when their respective mode is used.  This is not an exhaustive list.

   Almost all of these maps are used as local maps.  Indeed, of the
modes that presently exist, only Vip mode and Terminal mode ever change
the global keymap.
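
   Because these symbols name ordinary keymaps, a binding can be added
to one of them with `define-key'.  The sketch below binds a key in
`text-mode-map', so it takes effect only in buffers that use that map
as their local map.  The command `my-insert-date-stamp' and the choice
of `C-c d' are assumptions made for illustration.

```elisp
;; Hypothetical command; not part of XEmacs.
(defun my-insert-date-stamp ()
  "Insert the current date, formatted as YYYY-MM-DD, at point."
  (interactive)
  (insert (format-time-string "%Y-%m-%d")))

;; Bind `C-c d' in Text mode only; other modes are unaffected.
(define-key text-mode-map "\C-cd" 'my-insert-date-stamp)
```

   Evaluating `(lookup-key text-mode-map "\C-cd")' afterwards should
return the symbol `my-insert-date-stamp'.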

`bookmark-map'
     A keymap containing bindings to bookmark functions.

`Buffer-menu-mode-map'
     A keymap used by Buffer Menu mode.

`c++-mode-map'
     A keymap used by C++ mode.

`c-mode-map'
     A sparse keymap used by C mode.

`command-history-map'
     A keymap used by Command History mode.

`ctl-x-4-map'
     A keymap for subcommands of the prefix `C-x 4'.

`ctl-x-5-map'
     A keymap for subcommands of the prefix `C-x 5'.

`ctl-x-map'
     A keymap for `C-x' commands.

`debugger-mode-map'
     A keymap used by Debugger mode.

`dired-mode-map'
     A keymap for `dired-mode' buffers.

`edit-abbrevs-map'
     A keymap used in `edit-abbrevs'.

`edit-tab-stops-map'
     A keymap used in `edit-tab-stops'.

`electric-buffer-menu-mode-map'
     A keymap used by Electric Buffer Menu mode.

`electric-history-map'
     A keymap used by Electric Command History mode.

`emacs-lisp-mode-map'
     A keymap used by Emacs Lisp mode.

`help-map'
     A keymap for characters following the Help key.

`Helper-help-map'
     A keymap used by the help utility package.
     It has the same keymap in its value cell and in its function cell.

`Info-edit-map'
     A keymap used by the `e' command of Info.

`Info-mode-map'
     A keymap containing Info commands.

`isearch-mode-map'
     A keymap that defines the characters you can type within
     incremental search.

`itimer-edit-map'
     A keymap used when in Itimer Edit mode.

`lisp-interaction-mode-map'
     A keymap used by Lisp Interaction mode.

`lisp-mode-map'
     A keymap used by Lisp mode.

`minibuffer-local-completion-map'
     A keymap for minibuffer input with completion.

`minibuffer-local-isearch-map'
     A keymap for editing isearch strings in the minibuffer.

`minibuffer-local-map'
     Default keymap to use when reading from the minibuffer.

`minibuffer-local-must-match-map'
     A keymap for minibuffer input with completion, for exact match.

`mode-specific-map'
     The keymap for characters following `C-c'.  Note that this map is
     in the global map.  It is not actually mode specific: its name was
     chosen to be informative for the user in `C-h b'
     (`describe-bindings'), where it describes the main use of the
     `C-c' prefix key.

`modeline-map'
     The keymap consulted for mouse-clicks on the modeline of a window.

`objc-mode-map'
     A keymap used in Objective C mode as a local map.

`occur-mode-map'
     A local keymap used by Occur mode.

`overriding-local-map'
     A keymap that overrides all other local keymaps.

`query-replace-map'
     A local keymap used for responses in `query-replace' and related
     commands; also for `y-or-n-p' and `map-y-or-n-p'.  The functions
     that use this map do not support prefix keys; they look up one
     event at a time.

`read-expression-map'
     The minibuffer keymap used for reading Lisp expressions.

`read-shell-command-map'
     The minibuffer keymap used by `shell-command' and related commands.

`shared-lisp-mode-map'
     A keymap for commands shared by all sorts of Lisp modes.

`text-mode-map'
     A keymap used by Text mode.

`toolbar-map'
     The keymap consulted for mouse-clicks over a toolbar.

`view-mode-map'
     A keymap used by View mode.


File: lispref.info,  Node: Standard Hooks,  Next: Index,  Prev: Standard Keymaps,  Up: Top

Standard Hooks
**************

The following is a list of hook variables that let you provide
functions to be called from within Emacs on suitable occasions.

   Most of these variables have names ending with `-hook'.  They are
"normal hooks", run by means of `run-hooks'.  The value of such a hook
is a list of functions.  The recommended way to put a new function on
such a hook is to call `add-hook'.  *Note Hooks::, for more information
about using hooks.

   The variables whose names end in `-function' have single functions
as their values.  Usually there is a specific reason why the variable is
not a normal hook, such as the need to pass arguments to the function.
(In older Emacs versions, some of these variables had names ending in
`-hook' even though they were not normal hooks.)

   The variables whose names end in `-hooks' or `-functions' have lists
of functions as their values, but these functions are called in a
special way (they are passed arguments, or else their values are used).
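
   As a minimal sketch of the recommended use of `add-hook' with a
normal hook, the following puts a setup function on `text-mode-hook'.
The function name `my-text-mode-setup' and the column value 72 are
hypothetical choices for this example.

```elisp
;; Hypothetical setup function; not part of XEmacs.
(defun my-text-mode-setup ()
  "Use a fill column of 72 in buffers that enter Text mode."
  ;; `fill-column' becomes buffer-local when set, so this affects only
  ;; the buffer in which the hook runs.
  (setq fill-column 72))

;; `add-hook' does not add the function again if it is already present.
(add-hook 'text-mode-hook 'my-text-mode-setup)
```

   A variable whose name ends in `-function' is instead set directly to
a single function, since its value is not a list.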

`activate-menubar-hook'

`activate-popup-menu-hook'

`ad-definition-hooks'

`adaptive-fill-function'

`add-log-current-defun-function'

`after-change-functions'

`after-delete-annotation-hook'

`after-init-hook'

`after-insert-file-functions'

`after-revert-hook'

`after-save-hook'

`after-set-visited-file-name-hooks'

`after-write-file-hooks'

`auto-fill-function'

`auto-save-hook'

`before-change-functions'

`before-delete-annotation-hook'

`before-init-hook'

`before-revert-hook'

`blink-paren-function'

`buffers-menu-switch-to-buffer-function'

`c++-mode-hook'

`c-delete-function'

`c-mode-common-hook'

`c-mode-hook'

`c-special-indent-hook'

`calendar-load-hook'

`change-major-mode-hook'

`command-history-hook'

`comment-indent-function'

`compilation-buffer-name-function'

`compilation-exit-message-function'

`compilation-finish-function'

`compilation-parse-errors-function'

`compilation-mode-hook'

`create-console-hook'

`create-device-hook'

`create-frame-hook'

`dabbrev-friend-buffer-function'

`dabbrev-select-buffers-function'

`delete-console-hook'

`delete-device-hook'

`delete-frame-hook'

`deselect-frame-hook'

`diary-display-hook'

`diary-hook'

`dired-after-readin-hook'

`dired-before-readin-hook'

`dired-load-hook'

`dired-mode-hook'

`disabled-command-hook'

`display-buffer-function'

`ediff-after-setup-control-frame-hook'

`ediff-after-setup-windows-hook'

`ediff-before-setup-control-frame-hook'

`ediff-before-setup-windows-hook'

`ediff-brief-help-message-function'

`ediff-cleanup-hook'

`ediff-control-frame-position-function'

`ediff-display-help-hook'

`ediff-focus-on-regexp-matches-function'

`ediff-forward-word-function'

`ediff-hide-regexp-matches-function'

`ediff-keymap-setup-hook'

`ediff-load-hook'

`ediff-long-help-message-function'

`ediff-make-wide-display-function'

`ediff-merge-split-window-function'

`ediff-meta-action-function'

`ediff-meta-redraw-function'

`ediff-mode-hook'

`ediff-prepare-buffer-hook'

`ediff-quit-hook'

`ediff-registry-setup-hook'

`ediff-select-hook'

`ediff-session-action-function'

`ediff-session-group-setup-hook'

`ediff-setup-diff-regions-function'

`ediff-show-registry-hook'

`ediff-show-session-group-hook'

`ediff-skip-diff-region-function'

`ediff-split-window-function'

`ediff-startup-hook'

`ediff-suspend-hook'

`ediff-toggle-read-only-function'

`ediff-unselect-hook'

`ediff-window-setup-function'

`edit-picture-hook'

`electric-buffer-menu-mode-hook'

`electric-command-history-hook'

`electric-help-mode-hook'

`emacs-lisp-mode-hook'

`fill-paragraph-function'

`find-file-hooks'

`find-file-not-found-hooks'

`first-change-hook'

`font-lock-after-fontify-buffer-hook'

`font-lock-beginning-of-syntax-function'

`font-lock-mode-hook'

`fume-found-function-hook'

`fume-list-mode-hook'

`fume-rescan-buffer-hook'

`fume-sort-function'

`gnus-startup-hook'

`hack-local-variables-hook'

`highlight-headers-follow-url-function'

`hyper-apropos-mode-hook'

`indent-line-function'

`indent-mim-hook'

`indent-region-function'

`initial-calendar-window-hook'

`isearch-mode-end-hook'

`isearch-mode-hook'

`java-mode-hook'

`kill-buffer-hook'

`kill-buffer-query-functions'

`kill-emacs-hook'

`kill-emacs-query-functions'

`kill-hooks'

`LaTeX-mode-hook'

`latex-mode-hook'

`ledit-mode-hook'

`lisp-indent-function'

`lisp-interaction-mode-hook'

`lisp-mode-hook'

`list-diary-entries-hook'

`load-read-function'

`log-message-filter-function'

`m2-mode-hook'

`mail-citation-hook'

`mail-mode-hook'

`mail-setup-hook'

`make-annotation-hook'

`makefile-mode-hook'

`map-frame-hook'

`mark-diary-entries-hook'

`medit-mode-hook'

`menu-no-selection-hook'

`mh-compose-letter-hook'

`mh-folder-mode-hook'

`mh-letter-mode-hook'

`mim-mode-hook'

`minibuffer-exit-hook'

`minibuffer-setup-hook'

`mode-motion-hook'

`mouse-enter-frame-hook'

`mouse-leave-frame-hook'

`mouse-track-cleanup-hook'

`mouse-track-click-hook'

`mouse-track-down-hook'

`mouse-track-drag-hook'

`mouse-track-drag-up-hook'

`mouse-track-up-hook'

`mouse-yank-function'

`news-mode-hook'

`news-reply-mode-hook'

`news-setup-hook'

`nongregorian-diary-listing-hook'

`nongregorian-diary-marking-hook'

`nroff-mode-hook'

`objc-mode-hook'

`outline-mode-hook'

`perl-mode-hook'

`plain-TeX-mode-hook'

`post-command-hook'

`post-gc-hook'

`pre-abbrev-expand-hook'

`pre-command-hook'

`pre-display-buffer-function'

`pre-gc-hook'

`pre-idle-hook'

`print-diary-entries-hook'

`prolog-mode-hook'

`protect-innocence-hook'

`remove-message-hook'

`revert-buffer-function'

`revert-buffer-insert-contents-function'

`rmail-edit-mode-hook'

`rmail-mode-hook'

`rmail-retry-setup-hook'

`rmail-summary-mode-hook'

`scheme-indent-hook'

`scheme-mode-hook'

`scribe-mode-hook'

`select-frame-hook'

`send-mail-function'

`shell-mode-hook'

`shell-set-directory-error-hook'

`special-display-function'

`suspend-hook'

`suspend-resume-hook'

`temp-buffer-show-function'

`term-setup-hook'

`terminal-mode-hook'

`terminal-mode-break-hook'

`TeX-mode-hook'

`tex-mode-hook'

`text-mode-hook'

`today-visible-calendar-hook'

`today-invisible-calendar-hook'

`tooltalk-message-handler-hook'

`tooltalk-pattern-handler-hook'

`tooltalk-unprocessed-message-hook'

`unmap-frame-hook'

`vc-checkin-hook'

`vc-checkout-writable-buffer-hook'

`vc-log-after-operation-hook'

`vc-make-buffer-writable-hook'

`view-hook'

`vm-arrived-message-hook'

`vm-arrived-messages-hook'

`vm-chop-full-name-function'

`vm-display-buffer-hook'

`vm-edit-message-hook'

`vm-forward-message-hook'

`vm-iconify-frame-hook'

`vm-inhibit-write-file-hook'

`vm-key-functions'

`vm-mail-hook'

`vm-mail-mode-hook'

`vm-menu-setup-hook'

`vm-mode-hook'

`vm-quit-hook'

`vm-rename-current-buffer-function'

`vm-reply-hook'

`vm-resend-bounced-message-hook'

`vm-resend-message-hook'

`vm-retrieved-spooled-mail-hook'

`vm-select-message-hook'

`vm-select-new-message-hook'

`vm-select-unread-message-hook'

`vm-send-digest-hook'

`vm-summary-mode-hook'

`vm-summary-pointer-update-hook'

`vm-summary-redo-hook'

`vm-summary-update-hook'

`vm-undisplay-buffer-hook'

`vm-visit-folder-hook'

`window-setup-hook'

`write-contents-hooks'

`write-file-data-hooks'

`write-file-hooks'

`write-region-annotate-functions'

`x-lost-selection-hooks'

`x-sent-selection-hooks'

`zmacs-activate-region-hook'

`zmacs-deactivate-region-hook'

`zmacs-update-region-hook'