This is ../info/lispref.info, produced by makeinfo version 4.6 from lispref/lispref.texi. INFO-DIR-SECTION XEmacs Editor START-INFO-DIR-ENTRY * Lispref: (lispref). XEmacs Lisp Reference Manual. END-INFO-DIR-ENTRY Edition History: GNU Emacs Lisp Reference Manual Second Edition (v2.01), May 1993 GNU Emacs Lisp Reference Manual Further Revised (v2.02), August 1993 Lucid Emacs Lisp Reference Manual (for 19.10) First Edition, March 1994 XEmacs Lisp Programmer's Manual (for 19.12) Second Edition, April 1995 GNU Emacs Lisp Reference Manual v2.4, June 1995 XEmacs Lisp Programmer's Manual (for 19.13) Third Edition, July 1995 XEmacs Lisp Reference Manual (for 19.14 and 20.0) v3.1, March 1996 XEmacs Lisp Reference Manual (for 19.15 and 20.1, 20.2, 20.3) v3.2, April, May, November 1997 XEmacs Lisp Reference Manual (for 21.0) v3.3, April 1998 Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995 Free Software Foundation, Inc. Copyright (C) 1994, 1995 Sun Microsystems, Inc. Copyright (C) 1995, 1996 Ben Wing. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Foundation. 
Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided also that the section entitled "GNU General Public License" is included exactly as in the original, and provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that the section entitled "GNU General Public License" may be included in a translation approved by the Free Software Foundation instead of in the original English.  File: lispref.info, Node: The LDAP Lisp Object, Next: Opening and Closing a LDAP Connection, Prev: The Low-Level LDAP API, Up: The Low-Level LDAP API The LDAP Lisp Object .................... An internal built-in `ldap' lisp object represents a LDAP connection. - Function: ldapp object This function returns non-`nil' if OBJECT is a `ldap' object. - Function: ldap-host ldap Return the server host of the connection represented by LDAP. - Function: ldap-live-p ldap Return non-`nil' if LDAP is an active LDAP connection.  File: lispref.info, Node: Opening and Closing a LDAP Connection, Next: Low-level Operations on a LDAP Server, Prev: The LDAP Lisp Object, Up: The Low-Level LDAP API Opening and Closing a LDAP Connection ..................................... - Function: ldap-open host &optional plist Open a LDAP connection to HOST. PLIST is a property list containing additional parameters for the connection. Valid keys in that list are: `port' The TCP port to use for the connection if different from `ldap-default-port' or the library builtin value `auth' The authentication method to use, possible values depend on the LDAP library XEmacs was compiled with, they may include `simple', `krbv41' and `krbv42'. `binddn' The distinguished name of the user to bind as. 
This may look like `c=com, o=Acme, cn=Babs Jensen', see RFC 1779 for details. `passwd' The password to use for authentication. `deref' The dereference policy is one of the symbols `never', `always', `search' or `find' and defines how aliases are dereferenced. `never' Aliases are never dereferenced. `always' Aliases are always dereferenced. `search' Aliases are dereferenced when searching. `find' Aliases are dereferenced when locating the base object for the search. The default is `never'. `timelimit' The timeout limit for the connection in seconds. `sizelimit' The maximum number of matches to return for searches performed on this connection. - Function: ldap-close ldap Close the connection represented by LDAP.  File: lispref.info, Node: Low-level Operations on a LDAP Server, Prev: Opening and Closing a LDAP Connection, Up: The Low-Level LDAP API Low-level Operations on a LDAP Server ..................................... `ldap-search-basic' is the low-level primitive to perform a search on a LDAP server. It works directly on an open LDAP connection thus requiring a preliminary call to `ldap-open'. Multiple searches can be made on the same connection, then the session must be closed with `ldap-close'. - Function: ldap-search-basic ldap filter &optional base scope attrs attrsonly withdn verbose Perform a search on an open connection LDAP created with `ldap-open'. FILTER is a filter string for the search *note Syntax of Search Filters:: BASE is the distinguished name at which to start the search. SCOPE is one of the symbols `base', `onelevel' or `subtree' indicating the scope of the search limited to a base object, to a single level or to the whole subtree. The default is `subtree'. ATTRS is a list of strings indicating which attributes to retrieve for each matching entry. If `nil' all available attributes are returned. If ATTRSONLY is non-`nil' then only the attributes are retrieved, not their associated values. 
If WITHDN is non-`nil' then each entry in the result is prepended with its distinguished name DN. If VERBOSE is non-`nil' then progress messages are echoed. The function returns a list of matching entries. Each entry is itself an alist of attribute/value pairs, optionally preceded by the DN of the entry according to the value of WITHDN.

 - Function: ldap-add ldap dn entry
     Add ENTRY to the LDAP directory to which the connection LDAP was opened with `ldap-open'. DN is the distinguished name of the entry to add. ENTRY is an entry specification, i.e., a list of cons cells containing attribute/value string pairs.

 - Function: ldap-modify ldap dn mods
     Modify an entry in an LDAP directory. LDAP is an LDAP connection object created with `ldap-open'. DN is the distinguished name of the entry to modify. MODS is a list of modifications to apply. A modification is a list of the form `(MOD-OP ATTR VALUE1 VALUE2 ...)'. MOD-OP and ATTR are mandatory; the VALUEs are optional depending on MOD-OP. MOD-OP is the type of modification, one of the symbols `add', `delete' or `replace'. ATTR is the LDAP attribute type to modify.

 - Function: ldap-delete ldap dn
     Delete an entry from an LDAP directory. LDAP is an LDAP connection object created with `ldap-open'. DN is the distinguished name of the entry to delete.

File: lispref.info, Node: LDAP Internationalization, Prev: The Low-Level LDAP API, Up: XEmacs LDAP API

LDAP Internationalization
-------------------------

The XEmacs LDAP API provides basic internationalization features based on the LDAP v3 specification (essentially RFC2252 on "LDAP v3 Attribute Syntax Definitions"). Unfortunately, since there is currently no free LDAP v3 server software, this part has not received much testing and should be considered experimental. The framework is in place though.

 - Function: ldap-decode-attribute attr
     Decode the attribute/value pair ATTR according to LDAP rules.
     The attribute name is looked up in `ldap-attribute-syntaxes-alist' and the corresponding decoder is then retrieved from `ldap-attribute-syntax-decoders' and applied to the value(s).

* Menu:

* LDAP Internationalization Variables::
* Encoder/Decoder Functions::

File: lispref.info, Node: LDAP Internationalization Variables, Next: Encoder/Decoder Functions, Prev: LDAP Internationalization, Up: LDAP Internationalization

LDAP Internationalization Variables
...................................

 - Variable: ldap-ignore-attribute-codings
     If non-`nil', no encoding/decoding will be performed on LDAP attribute values.

 - Variable: ldap-coding-system
     Coding system of LDAP string values. LDAP v3 specifies the coding system of strings to be UTF-8. You need an XEmacs with Mule support for this.

 - Variable: ldap-default-attribute-decoder
     Decoder function to use for attributes whose syntax is unknown. Such a function receives an encoded attribute value as a string and should return the decoded value as a string.

 - Variable: ldap-attribute-syntax-encoders
     A vector of functions used to encode LDAP attribute values. The sequence of functions corresponds to the sequence of LDAP attribute syntax object identifiers of the form 1.3.6.1.4.1.1466.1115.121.1.* as defined in RFC2252 section 4.3.2. As of this writing, only a few encoder functions are available.

 - Variable: ldap-attribute-syntax-decoders
     A vector of functions used to decode LDAP attribute values. The sequence of functions corresponds to the sequence of LDAP attribute syntax object identifiers of the form 1.3.6.1.4.1.1466.1115.121.1.* as defined in RFC2252 section 4.3.2. As of this writing, only a few decoder functions are available.

 - Variable: ldap-attribute-syntaxes-alist
     A map of LDAP attribute names to their type object id minor number. This table is built from RFC2252 Section 5 and RFC2256 Section 5.
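The lookup performed by `ldap-decode-attribute' can be sketched as follows. This is only an illustration, not the library's code: the tables `my-syntaxes-alist' and `my-decoders' and the function `my-decode-attribute' are hypothetical stand-ins for the real variables, and the sketch assumes that a syntax minor number N selects slot N-1 of the decoder vector.

```elisp
;; Hypothetical toy tables standing in for `ldap-attribute-syntaxes-alist'
;; and `ldap-attribute-syntax-decoders'.
(defvar my-syntaxes-alist '((cn . 2) (mail . 1))
  "Toy map of attribute names to syntax minor numbers.")

(defvar my-decoders (vector 'identity 'upcase)
  "Toy decoder vector; slot N-1 is assumed to serve minor number N.")

(defun my-decode-attribute (attr)
  "Decode ATTR, a list (NAME VALUE...), using the toy tables above.
Attributes whose syntax is not listed are returned unchanged."
  (let* ((name (car attr))
         ;; Attribute names are matched case-insensitively as symbols.
         (syntax-id (cdr (assq (intern (downcase name))
                               my-syntaxes-alist)))
         (decoder (and syntax-id
                       (aref my-decoders (1- syntax-id)))))
    (if decoder
        (cons name (mapcar decoder (cdr attr)))
      attr)))

(my-decode-attribute '("cn" "babs jensen"))
```

With these toy tables the `cn' values are passed through `upcase', while an attribute such as `("o" "Acme")' comes back unchanged because it has no entry in the alist.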
File: lispref.info, Node: Encoder/Decoder Functions, Prev: LDAP Internationalization Variables, Up: LDAP Internationalization

Encoder/Decoder Functions
.........................

 - Function: ldap-encode-boolean bool
     A function that encodes an elisp boolean BOOL into an LDAP boolean string representation.

 - Function: ldap-decode-boolean str
     A function that decodes an LDAP boolean string representation STR into an elisp boolean.

 - Function: ldap-decode-string str
     Decode a string STR according to `ldap-coding-system'.

 - Function: ldap-encode-string str
     Encode a string STR according to `ldap-coding-system'.

 - Function: ldap-decode-address str
     Decode an address STR according to `ldap-coding-system', replacing $ signs with newlines as specified by LDAP encoding rules for addresses.

 - Function: ldap-encode-address str
     Encode an address STR according to `ldap-coding-system', replacing newlines with $ signs as specified by LDAP encoding rules for addresses.

File: lispref.info, Node: Syntax of Search Filters, Prev: XEmacs LDAP API, Up: LDAP Support

Syntax of Search Filters
========================

LDAP search functions use RFC1558 syntax to describe the search filter. In that syntax simple filters have the form:

     (ATTR FILTERTYPE VALUE)

ATTR is an attribute name such as `cn' for Common Name, `o' for Organization, etc.

VALUE is the corresponding value. This is generally an exact string but may also contain `*' characters as wildcards.

FILTERTYPE is one of `=', `~=', `<=', `>=', which respectively describe equality, approximate equality, less than or equal, and greater than or equal.

Thus `(cn=John Smith)' matches all records having a common name equal to John Smith.

A special case is the presence filter `(ATTR=*)' which matches records containing a particular attribute. For instance `(mail=*)' matches all records containing a `mail' attribute.

Simple filters can be connected together with the logical operators `&', `|' and `!' which stand for the usual and, or and not operators.
`(&(objectClass=Person)(mail=*)(|(sn=Smith)(givenname=John)))' matches records of class `Person' containing a `mail' attribute and corresponding to people whose last name is `Smith' or whose first name is `John'.  File: lispref.info, Node: PostgreSQL Support, Next: Internationalization, Prev: LDAP Support, Up: Top PostgreSQL Support ****************** XEmacs can be linked with PostgreSQL libpq run-time support to provide relational database access from Emacs Lisp code. * Menu: * Building XEmacs with PostgreSQL support:: * XEmacs PostgreSQL libpq API:: * XEmacs PostgreSQL libpq Examples::  File: lispref.info, Node: Building XEmacs with PostgreSQL support, Next: XEmacs PostgreSQL libpq API, Up: PostgreSQL Support Building XEmacs with PostgreSQL support ======================================= XEmacs PostgreSQL support requires linking to the PostgreSQL libpq library. Describing how to build and install PostgreSQL is beyond the scope of this document. See the PostgreSQL manual for details. If you have installed XEmacs from one of the binary kits on (), or are using an XEmacs binary from a CD ROM, you may have XEmacs PostgreSQL support by default. `M-x describe-installation' will tell you if you do. If you are building XEmacs from source, you need to install PostgreSQL first. On some systems, PostgreSQL will come pre-installed in /usr. In this case, it should be autodetected when you run configure. If PostgreSQL is installed into its default location, `/usr/local/pgsql', you must specify `--site-prefixes=/usr/local/pgsql' when you run configure. If PostgreSQL is installed into another location, use that instead of `/usr/local/pgsql' when specifying `--site-prefixes'. As of XEmacs 21.2, PostgreSQL versions 6.5.3 and 7.0 are supported. XEmacs Lisp support for V7.0 is somewhat more extensive than support for V6.5. In particular, asynchronous queries are supported.  
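Lisp code can also test at run time whether the libpq binding is present before relying on it. The sketch below simply checks whether functions documented in this chapter are defined; the names `my-pq-available-p' and `my-pq-async-available-p' are hypothetical helpers, not part of XEmacs.

```elisp
(defun my-pq-available-p ()
  "Return t if the synchronous libpq binding appears to be present."
  ;; `pq-connectdb' is only defined when PostgreSQL support is built in.
  (fboundp 'pq-connectdb))

(defun my-pq-async-available-p ()
  "Return t if the asynchronous interface appears to be present."
  (fboundp 'pq-connect-start))

(if (my-pq-available-p)
    (message "PostgreSQL support is available")
  (message "This Emacs was built without PostgreSQL support"))
```

Checking with `fboundp' avoids assuming a particular feature name and degrades gracefully in a binary built without PostgreSQL support.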
File: lispref.info, Node: XEmacs PostgreSQL libpq API, Next: XEmacs PostgreSQL libpq Examples, Prev: Building XEmacs with PostgreSQL support, Up: PostgreSQL Support

XEmacs PostgreSQL libpq API
===========================

The XEmacs PostgreSQL API is intended to be a policy-free, low-level binding to libpq. The intent is to provide all the basic functionality and then let high level Lisp code decide its own policies.

This documentation assumes that the reader has knowledge of SQL, but requires no prior knowledge of libpq.

There are many examples in this manual and some setup will be required. To run most of the following examples, the code below must be executed first. Besides relying on the data in this table, nearly all of the examples assume that the free variable `P' refers to this database connection. The examples in the original edition of this manual were run against Postgres 7.0beta1.

     (progn
       (setq P (pq-connectdb ""))
       ;; id is the primary key, shikona is a Japanese word that
       ;; means `the professional name of a Sumo wrestler', and
       ;; rank is the Sumo rank name.
(pq-exec P (concat "CREATE TABLE xemacs_test" " (id int, shikona text, rank text);")) (pq-exec P "COPY xemacs_test FROM stdin;") (pq-put-line P "1\tMusashimaru\tYokuzuna\n") (pq-put-line P "2\tDejima\tOozeki\n") (pq-put-line P "3\tMusoyama\tSekiwake\n") (pq-put-line P "4\tMiyabiyama\tSekiwake\n") (pq-put-line P "5\tWakanoyama\tMaegashira\n") (pq-put-line P "\\.\n") (pq-end-copy P)) => nil * Menu: * libpq Lisp Variables:: * libpq Lisp Symbols and DataTypes:: * Synchronous Interface Functions:: * Asynchronous Interface Functions:: * Large Object Support:: * Other libpq Functions:: * Unimplemented libpq Functions::  File: lispref.info, Node: libpq Lisp Variables, Next: libpq Lisp Symbols and DataTypes, Prev: XEmacs PostgreSQL libpq API, Up: XEmacs PostgreSQL libpq API libpq Lisp Variables -------------------- Various Unix environment variables are used by libpq to provide defaults to the many different parameters. In the XEmacs Lisp API, these environment variables are bound to Lisp variables to provide more convenient access to Lisp Code. These variables are passed to the backend database server during the establishment of a database connection and when the `pq-setenv' call is made. - Variable: pg:host Initialized from the `PGHOST' environment variable. The default host to connect to. - Variable: pg:user Initialized from the `PGUSER' environment variable. The default database user name. - Variable: pg:options Initialized from the `PGOPTIONS' environment variable. Default additional server options. - Variable: pg:port Initialized from the `PGPORT' environment variable. The default TCP port to connect to. - Variable: pg:tty Initialized from the `PGTTY' environment variable. The default debugging TTY. Compatibility note: Debugging TTYs are turned off in the XEmacs Lisp binding. - Variable: pg:database Initialized from the `PGDATABASE' environment variable. The default database to connect to. - Variable: pg:realm Initialized from the `PGREALM' environment variable. 
The default Kerberos realm. - Variable: pg:client-encoding Initialized from the `PGCLIENTENCODING' environment variable. The default client encoding. Compatibility note: This variable is not present in non-Mule XEmacsen. This variable is not present in versions of libpq prior to 7.0. In the current implementation, client encoding is equivalent to the `file-name-coding-system' format. - Variable: pg:authtype Initialized from the `PGAUTHTYPE' environment variable. The default authentication scheme used. Compatibility note: This variable is unused in versions of libpq after 6.5. It is not implemented at all in the XEmacs Lisp binding. - Variable: pg:geqo Initialized from the `PGGEQO' environment variable. Genetic optimizer options. - Variable: pg:cost-index Initialized from the `PGCOSTINDEX' environment variable. Cost index options. - Variable: pg:cost-heap Initialized from the `PGCOSTHEAP' environment variable. Cost heap options. - Variable: pg:tz Initialized from the `PGTZ' environment variable. Default timezone. - Variable: pg:date-style Initialized from the `PGDATESTYLE' environment variable. Default date style in returned date objects. - Variable: pg-coding-system This is a variable controlling which coding system is used to encode non-ASCII strings sent to the database. Compatibility Note: This variable is not present in InfoDock.  File: lispref.info, Node: libpq Lisp Symbols and DataTypes, Next: Synchronous Interface Functions, Prev: libpq Lisp Variables, Up: XEmacs PostgreSQL libpq API libpq Lisp Symbols and Datatypes -------------------------------- The following set of symbols are used to represent the intermediate states involved in the asynchronous interface. - Symbol: pgres::polling-failed Undocumented. A fatal error has occurred during processing of an asynchronous operation. - Symbol: pgres::polling-reading An intermediate status return during an asynchronous operation. It indicates that one may use `select' before polling again. 
- Symbol: pgres::polling-writing An intermediate status return during an asynchronous operation. It indicates that one may use `select' before polling again. - Symbol: pgres::polling-ok An asynchronous operation has successfully completed. - Symbol: pgres::polling-active An intermediate status return during an asynchronous operation. One can call the poll function again immediately. - Function: pq-pgconn conn field CONN A database connection object. FIELD A symbol indicating which field of PGconn to fetch. Possible values are shown in the following table. `pq::db' Database name `pq::user' Database user name `pq::pass' Database user's password `pq::host' Hostname database server is running on `pq::port' TCP port number used in the connection `pq::tty' Debugging TTY Compatibility note: Debugging TTYs are not used in the XEmacs Lisp API. `pq::options' Additional server options `pq::status' Connection status. Possible return values are shown in the following table. `pg::connection-ok' The normal, connected status. `pg::connection-bad' The connection is not open and the PGconn object needs to be deleted by `pq-finish'. `pg::connection-started' An asynchronous connection has been started, but is not yet complete. `pg::connection-made' An asynchronous connect has been made, and there is data waiting to be sent. `pg::connection-awaiting-response' Awaiting data from the backend during an asynchronous connection. `pg::connection-auth-ok' Received authentication, waiting for the backend to start up. `pg::connection-setenv' Negotiating environment during an asynchronous connection. `pq::error-message' The last error message that was delivered to this connection. `pq::backend-pid' The process ID of the backend database server. The `PGresult' object is used by libpq to encapsulate the results of queries. The printed representation takes on four forms. 
When the PGresult object contains tuples from an SQL `SELECT' it will look like:

     (setq R (pq-exec P "SELECT * FROM xemacs_test;"))
          => #

The number in brackets indicates how many rows of data are available. When the PGresult object is the result of a command query that doesn't return anything, it will look like:

     (pq-exec P "CREATE TABLE a_new_table (i int);")
          => #

When the query is a command-type query that can affect a number of different rows, but doesn't return any of them, it will look like:

     (progn
       (pq-exec P "INSERT INTO a_new_table VALUES (1);")
       (pq-exec P "INSERT INTO a_new_table VALUES (2);")
       (pq-exec P "INSERT INTO a_new_table VALUES (3);")
       (setq R (pq-exec P "DELETE FROM a_new_table;")))
          => #

Lastly, when the underlying PGresult object has been deallocated directly by `pq-clear' the printed representation will look like:

     (progn
       (setq R (pq-exec P "SELECT * FROM xemacs_test;"))
       (pq-clear R)
       R)
          => #

The following functions are accessors to various data in the PGresult object.

 - Function: pq-result-status result
     Return status of a query result. RESULT is a PGresult object. The return value is one of the symbols in the following table.

    `pgres::empty-query'
          A query contained no text. This is usually the result of a recoverable error, or a minor programming error.

    `pgres::command-ok'
          A query command that doesn't return anything was executed properly by the backend.

    `pgres::tuples-ok'
          A query command that returns tuples was executed properly by the backend.

    `pgres::copy-out'
          Copy Out data transfer is in progress.

    `pgres::copy-in'
          Copy In data transfer is in progress.

    `pgres::bad-response'
          An unexpected response was received from the backend.

    `pgres::nonfatal-error'
          Undocumented. This value is returned when the libpq function `PQresultStatus' is called with a `NULL' pointer.

    `pgres::fatal-error'
          Undocumented. An error has occurred in processing the query and the operation was not completed.
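Since `pq-result-status' returns a symbol, callers can dispatch on it directly. The following is a minimal sketch of one possible grouping of the status symbols listed above; the function name `my-pq-status-class' and the class names `ok', `copy', `empty' and `error' are hypothetical choices, not part of the API.

```elisp
(defun my-pq-status-class (status)
  "Classify STATUS, a symbol as returned by `pq-result-status'.
Return one of the symbols `ok', `copy', `empty' or `error'."
  (cond ((memq status '(pgres::command-ok pgres::tuples-ok)) 'ok)
        ((memq status '(pgres::copy-in pgres::copy-out)) 'copy)
        ((eq status 'pgres::empty-query) 'empty)
        ;; bad-response, nonfatal-error, fatal-error and anything
        ;; unrecognized are all treated as errors here.
        (t 'error)))
```

Given a PGresult `R', a caller might write `(my-pq-status-class (pq-result-status R))' and consult `pq-result-error-message' only when the class is `error'.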
- Function: pq-res-status result Return the query result status as a string, not a symbol. RESULT is a PGresult object. (setq R (pq-exec P "SELECT * FROM xemacs_test;")) => # (pq-res-status R) => "PGRES_TUPLES_OK" - Function: pq-result-error-message result Return an error message generated by the query, if any. RESULT is a PGresult object. (setq R (pq-exec P "SELECT * FROM xemacs-test;")) => (pq-result-error-message R) => "ERROR: parser: parse error at or near \"-\" " - Function: pq-ntuples result Return the number of tuples in the query result. RESULT is a PGresult object. (setq R (pq-exec P "SELECT * FROM xemacs_test;")) => # (pq-ntuples R) => 5 - Function: pq-nfields result Return the number of fields in each tuple of the query result. RESULT is a PGresult object. (setq R (pq-exec P "SELECT * FROM xemacs_test;")) => # (pq-nfields R) => 3 - Function: pq-binary-tuples result Returns t if binary tuples are present in the results, nil otherwise. RESULT is a PGresult object. (setq R (pq-exec P "SELECT * FROM xemacs_test;")) => # (pq-binary-tuples R) => nil - Function: pq-fname result field-index Returns the name of a specific field. RESULT is a PGresult object. FIELD-INDEX is the number of the column to select from. The first column is number zero. (let (i l) (setq R (pq-exec P "SELECT * FROM xemacs_test;")) (setq i (pq-nfields R)) (while (>= (decf i) 0) (push (pq-fname R i) l)) l) => ("id" "shikona" "rank") - Function: pq-fnumber result field-name Return the field number corresponding to the given field name. -1 is returned on a bad field name. RESULT is a PGresult object. FIELD-NAME is a string representing the field name to find. (setq R (pq-exec P "SELECT * FROM xemacs_test;")) => # (pq-fnumber R "id") => 0 (pq-fnumber R "Not a field") => -1 - Function: pq-ftype result field-num Return an integer code representing the data type of the specified column. RESULT is a PGresult object. FIELD-NUM is the field number. 
The return value of this function is the Object ID (Oid) in the database of the type. Further queries need to be made to various system tables in order to convert this value into something useful. - Function: pq-fmod result field-num Return the type modifier code associated with a field. Field numbers start at zero. RESULT is a PGresult object. FIELD-INDEX selects which field to use. - Function: pq-fsize result field-index Return size of the given field. RESULT is a PGresult object. FIELD-INDEX selects which field to use. (let (i l) (setq R (pq-exec P "SELECT * FROM xemacs_test;")) (setq i (pq-nfields R)) (while (>= (decf i) 0) (push (list (pq-ftype R i) (pq-fsize R i)) l)) l) => ((23 23) (25 25) (25 25)) - Function: pq-get-value result tup-num field-num Retrieve a return value. RESULT is a PGresult object. TUP-NUM selects which tuple to fetch from. FIELD-NUM selects which field to fetch from. Both tuples and fields are numbered from zero. (setq R (pq-exec P "SELECT * FROM xemacs_test;")) => # (pq-get-value R 0 1) => "Musashimaru" (pq-get-value R 1 1) => "Dejima" (pq-get-value R 2 1) => "Musoyama" - Function: pq-get-length result tup-num field-num Return the length of a specific value. RESULT is a PGresult object. TUP-NUM selects which tuple to fetch from. FIELD-NUM selects which field to fetch from. (setq R (pq-exec P "SELECT * FROM xemacs_test;")) => # (pq-get-length R 0 1) => 11 (pq-get-length R 1 1) => 6 (pq-get-length R 2 1) => 8 - Function: pq-get-is-null result tup-num field-num Return t if the specific value is the SQL `NULL'. RESULT is a PGresult object. TUP-NUM selects which tuple to fetch from. FIELD-NUM selects which field to fetch from. - Function: pq-cmd-status result Return a summary string from the query. RESULT is a PGresult object. 
(setq R (pq-exec P "INSERT INTO xemacs_test VALUES (6, 'Wakanohana', 'Yokozuna');")) => # (pq-cmd-status R) => "INSERT 542086 1" (setq R (pq-exec P "UPDATE xemacs_test SET rank='retired' WHERE shikona='Wakanohana';")) => # (pq-cmd-status R) => "UPDATE 1" Note that the first number returned from an insertion, like in the example, is an object ID number and will almost certainly vary from system to system since object ID numbers in Postgres must be unique across all databases. - Function: pq-cmd-tuples result Return the number of tuples if the last command was an INSERT/UPDATE/DELETE. If the last command was something else, the empty string is returned. RESULT is a PGresult object. (setq R (pq-exec P "INSERT INTO xemacs_test VALUES (7, 'Takanohana', 'Yokuzuna');")) => # (pq-cmd-tuples R) => "1" (setq R (pq-exec P "SELECT * from xemacs_test;")) => # (pq-cmd-tuples R) => "" (setq R (pq-exec P "DELETE FROM xemacs_test WHERE shikona LIKE '%hana';")) => # (pq-cmd-tuples R) => "2" - Function: pq-oid-value result Return the object id of the insertion if the last command was an INSERT. 0 is returned if the last command was not an insertion. RESULT is a PGresult object. In the first example, the numbers you will see on your local system will almost certainly be different, however the second number from the right in the unprintable PGresult object and the number returned by `pq-oid-value' should match. (setq R (pq-exec P "INSERT INTO xemacs_test VALUES (8, 'Terao', 'Maegashira');")) => # (pq-oid-value R) => 542089 (setq R (pq-exec P "SELECT shikona FROM xemacs_test WHERE rank='Maegashira';")) => # (pq-oid-value R) => 0 - Function: pq-make-empty-pgresult conn status Create an empty pgresult with the given status. CONN a database connection object STATUS a value that can be returned by `pq-result-status'. The caller is responsible for making sure the return value gets properly freed.  
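The accessors in this node are usually combined to walk a whole result. As a rough sketch, the hypothetical helper `my-pq-rows' below collects every tuple of a PGresult into a list of lists, using only `pq-ntuples', `pq-nfields', `pq-get-is-null' and `pq-get-value' as documented above; representing SQL `NULL' as `nil' is a choice made for this example.

```elisp
(defun my-pq-rows (result)
  "Return the tuples of RESULT as a list of lists of strings.
SQL NULL values are represented as nil."
  (let ((tup (1- (pq-ntuples result)))
        (nfields (pq-nfields result))
        (rows nil))
    ;; Walk tuples and fields backwards so that consing onto the
    ;; front of each list yields the original order.
    (while (>= tup 0)
      (let ((field (1- nfields))
            (row nil))
        (while (>= field 0)
          (setq row (cons (if (pq-get-is-null result tup field)
                              nil
                            (pq-get-value result tup field))
                          row))
          (setq field (1- field)))
        (setq rows (cons row rows)))
      (setq tup (1- tup)))
    rows))
```

With the setup table from this chapter, `(my-pq-rows (pq-exec P "SELECT * FROM xemacs_test;"))' would be expected to return one three-element list per wrestler.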
File: lispref.info, Node: Synchronous Interface Functions, Next: Asynchronous Interface Functions, Prev: libpq Lisp Symbols and DataTypes, Up: XEmacs PostgreSQL libpq API

Synchronous Interface Functions
-------------------------------

 - Function: pq-connectdb conninfo
     Establish a (synchronous) database connection.

    CONNINFO
          A string of blank separated options. Options are of the form "OPTION = VALUE". If VALUE contains blanks, it must be single quoted. Blanks around the equal sign are optional. Multiple option assignments are blank separated.

          (pq-connectdb "dbname=japanese port = 25432")
               => #

     The printed representation of a database connection object has four fields. The first field is the hostname where the database server is running (in this case localhost), the second field is the port number, the third field is the database user name, and the fourth field is the name of the database.

     Database connection objects that have been disconnected, and that will generate an immediate error if they are used, look like:

          #

     Bad connections can be reestablished with `pq-reset', or deleted entirely with `pq-finish'.

     A database connection object that has been deleted looks like:

          (let ((P1 (pq-connectdb "")))
            (pq-finish P1)
            P1)
               => #

     Note that database connection objects are the most heavyweight objects in XEmacs Lisp at this writing, usually representing as much as several megabytes of virtual memory on the machine the database server is running on. It is wisest to explicitly delete them when you are finished with them, rather than letting garbage collection do it. An example idiom is:

          (let ((P (pq-connectdb "")))
            (unwind-protect
                (progn
                  (...)) ; access database here
              (pq-finish P)))

     The following options are available in the options string:

    `authtype'
          Authentication type. Same as `PGAUTHTYPE'. This is no longer used.

    `user'
          Database user name. Same as `PGUSER'.

    `password'
          Database password.

    `dbname'
          Database name. Same as `PGDATABASE'.

    `host'
          Symbolic hostname. Same as `PGHOST'.
`hostaddr' Host address as four octets (eg. like 192.168.1.1). `port' TCP port to connect to. Same as `PGPORT'. `tty' Debugging TTY. Same as `PGTTY'. This value is suppressed in the XEmacs Lisp API. `options' Extra backend database options. Same as `PGOPTIONS'. A database connection object is returned regardless of whether a connection was established or not. - Function: pq-reset conn Reestablish database connection. CONN A database connection object. This function reestablishes a database connection using the original connection parameters. This is useful if something has happened to the TCP link and it has become broken. - Function: pq-exec conn query Make a synchronous database query. CONN A database connection object. QUERY A string containing an SQL query. A PGresult object is returned, which in turn may be queried by its many accessor functions to retrieve state out of it. If the query string contains multiple SQL commands, only results from the final command are returned. (setq R (pq-exec P "SELECT * FROM xemacs_test; DELETE FROM xemacs_test WHERE id=8;")) => # - Function: pq-notifies conn Return the latest async notification that has not yet been handled. CONN A database connection object. If there has been a notification, then a list of two elements will be returned. The first element contains the relation name being notified, the second element contains the backend process ID number. nil is returned if there aren't any notifications to process. - Function: PQsetenv conn Synchronous transfer of environment variables to a backend CONN A database connection object. Environment variable transfer is done as a normal part of database connection. Compatibility note: This function was present but not documented in versions of libpq prior to 7.0.  
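The synchronous functions above can be combined into a complete connect, query and clean-up sequence. The sketch below is illustrative only; `my-pq-count-rows' is a hypothetical helper. Because `pq-connectdb' returns a connection object whether or not the connection succeeded, the sketch checks `pq::status' before querying, and it uses `unwind-protect' so that `pq-finish' runs even if an error is signaled.

```elisp
(defun my-pq-count-rows (conninfo query)
  "Connect using CONNINFO, run QUERY, and return the number of tuples.
Signal an error if the connection or the query fails."
  (let ((conn (pq-connectdb conninfo)))
    (unwind-protect
        (progn
          ;; A connection object is returned even on failure,
          ;; so the status has to be checked explicitly.
          (unless (eq (pq-pgconn conn 'pq::status) 'pg::connection-ok)
            (error "Connection failed: %s"
                   (pq-pgconn conn 'pq::error-message)))
          (let ((result (pq-exec conn query)))
            (unless (eq (pq-result-status result) 'pgres::tuples-ok)
              (error "Query failed: %s"
                     (pq-result-error-message result)))
            (pq-ntuples result)))
      ;; Always release the connection, even after an error.
      (pq-finish conn))))
```

A call such as `(my-pq-count-rows "" "SELECT * FROM xemacs_test;")' would be expected to return the number of rows currently in the example table.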
File: lispref.info, Node: Asynchronous Interface Functions, Next: Large Object Support, Prev: Synchronous Interface Functions, Up: XEmacs PostgreSQL libpq API Asynchronous Interface Functions -------------------------------- Making command by command examples is too complex with the asynchronous interface functions. See the examples section for complete calling sequences. - Function: pq-connect-start conninfo Begin establishing an asynchronous database connection. CONNINFO A string containing the connection options. See the documentation of `pq-connectdb' for a listing of all the available flags. - Function: pq-connect-poll conn An intermediate function to be called during an asynchronous database connection. CONN A database connection object. The result codes are documented in a previous section. - Function: pq-is-busy conn Returns t if `pq-get-result' would block waiting for input. CONN A database connection object. - Function: pq-consume-input conn Consume any available input from the backend. CONN A database connection object. Nil is returned if anything bad happens. - Function: pq-reset-start conn Reset connection to the backend asynchronously. CONN A database connection object. - Function: pq-reset-poll conn Poll an asynchronous reset for completion CONN A database connection object. - Function: pq-reset-cancel conn Attempt to request cancellation of the current operation. CONN A database connection object. The return value is t if the cancel request was successfully dispatched, nil if not (in which case conn->errorMessage is set). Note: successful dispatch is no guarantee that there will be any effect at the backend. The application must read the operation result as usual. - Function: pq-send-query conn query Submit a query to Postgres and don't wait for the result. CONN A database connection object. Returns: t if successfully submitted nil if error (conn->errorMessage is set) - Function: pq-get-result conn Retrieve an asynchronous result from a query. 
CONN A database connection object. `nil' is returned when no more query work remains. - Function: pq-set-nonblocking conn arg Make the database connection non-blocking if ARG is true, or blocking if ARG is false. This will not protect you from PQexec(); you are only safe when using the non-blocking API. CONN A database connection object. - Function: pq-is-nonblocking conn Return the blocking status of the database connection. CONN A database connection object. - Function: pq-flush conn Force the write buffer to be written (or at least try). CONN A database connection object. - Function: PQsetenvStart conn Start asynchronously passing environment variables to a backend. CONN A database connection object. Compatibility note: this function is only available with libpq-7.0. - Function: PQsetenvPoll conn Check an asynchronous environment variables transfer for completion. CONN A database connection object. Compatibility note: this function is only available with libpq-7.0. - Function: PQsetenvAbort conn Attempt to terminate an asynchronous environment variables transfer. CONN A database connection object. Compatibility note: this function is only available with libpq-7.0.  File: lispref.info, Node: Large Object Support, Next: Other libpq Functions, Prev: Asynchronous Interface Functions, Up: XEmacs PostgreSQL libpq API Large Object Support -------------------- - Function: pq-lo-import conn filename Import a file as a large object into the database. CONN a database connection object. FILENAME filename to import. On success, the object id is returned. - Function: pq-lo-export conn oid filename Copy a large object in the database into a file. CONN a database connection object. OID object id number of a large object. FILENAME filename to export to.  
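As a rough illustration of the two large object functions, the following sketch imports a file and immediately exports it again. The helper name `example-pq-lo-round-trip' is hypothetical. Wrapping the calls in a `BEGIN'/`END' transaction is an assumption based on common PostgreSQL practice for large objects, not something this manual states. The sketch does no error handling, because the failure return values are not documented here.

```elisp
;; Sketch only: `example-pq-lo-round-trip' is a hypothetical helper.
(defun example-pq-lo-round-trip (conn source copy)
  "Import file SOURCE as a large object via CONN, then export it to COPY.
Returns the object id of the new large object."
  (let (oid)
    ;; Assumption: large object operations run inside a transaction.
    (pq-exec conn "BEGIN;")
    (setq oid (pq-lo-import conn source))
    (pq-lo-export conn oid copy)
    (pq-exec conn "END;")
    oid))
```

CONN is expected to be an open connection object, such as the one returned by `pq-connectdb'.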
File: lispref.info, Node: Other libpq Functions, Next: Unimplemented libpq Functions, Prev: Large Object Support, Up: XEmacs PostgreSQL libpq API Other libpq Functions --------------------- - Function: pq-finish conn Destroy a database connection object by calling free on it. CONN a database connection object. Calling this routine is optional, because the usual XEmacs garbage collection mechanism will call the underlying libpq routine whenever it is releasing stale `PGconn' objects. However, this routine is useful in `unwind-protect' clauses to make connections go away quickly when unrecoverable errors have occurred. After calling this routine, the printed representation of the XEmacs wrapper object will contain the string "DEAD". - Function: pq-client-encoding conn Return the client encoding as an integer code. CONN a database connection object. (pq-client-encoding P) => 1 Compatibility note: This function did not exist prior to libpq-7.0 and does not exist in a non-Mule XEmacs. - Function: pq-set-client-encoding conn encoding Set client coding system. CONN a database connection object. ENCODING a string representing the desired coding system. (pq-set-client-encoding P "EUC_JP") => 0 The current idiom for ensuring proper coding system conversion is the following (illustrated for EUC Japanese encoding): (setq P (pq-connectdb "...")) (let ((file-name-coding-system 'euc-jp) (pg-coding-system 'euc-jp)) (pq-set-client-encoding P "EUC_JP") ...) (pq-finish P) Compatibility note: This function did not exist prior to libpq-7.0 and does not exist in a non-Mule XEmacs. - Function: pq-env-2-encoding Return the integer code representing the coding system in `PGCLIENTENCODING'. (pq-env-2-encoding) => 0 Compatibility note: This function did not exist prior to libpq-7.0 and does not exist in a non-Mule XEmacs. - Function: pq-clear res Destroy a query result object by calling free() on it.
RES a query result object Note: The memory allocation systems of libpq and XEmacs are different. The XEmacs representation of a query result object will have both the XEmacs version and the libpq version freed at the next garbage collection when the object is no longer being referenced. Calling this function does not release the XEmacs object, it is still subject to the usual rules for Lisp objects. The printed representation of the XEmacs object will contain the string "DEAD" after this routine is called indicating that it is no longer useful for anything. - Function: pq-conn-defaults Return a data structure that represents the connection defaults. The data is returned as a list of lists, where each sublist contains info regarding a single option.  File: lispref.info, Node: Unimplemented libpq Functions, Prev: Other libpq Functions, Up: XEmacs PostgreSQL libpq API Unimplemented libpq Functions ----------------------------- - Unimplemented Function: PGconn *PQsetdbLogin (char *pghost, char *pgport, char *pgoptions, char *pgtty, char *dbName, char *login, char *pwd) Synchronous database connection. PGHOST is the hostname of the PostgreSQL backend to connect to. PGPORT is the TCP port number to use. PGOPTIONS specifies other backend options. PGTTY specifies the debugging tty to use. DBNAME specifies the database name to use. LOGIN specifies the database user name. PWD specifies the database user's password. This routine is deprecated as of libpq-7.0, and its functionality can be replaced by external Lisp code if needed. - Unimplemented Function: PGconn *PQsetdb (char *pghost, char *pgport, char *pgoptions, char *pgtty, char *dbName) Synchronous database connection. PGHOST is the hostname of the PostgreSQL backend to connect to. PGPORT is the TCP port number to use. PGOPTIONS specifies other backend options. PGTTY specifies the debugging tty to use. DBNAME specifies the database name to use. This routine was deprecated in libpq-6.5. 
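The remark that `PQsetdbLogin' and `PQsetdb' can be replaced by external Lisp code might be realized along the following lines. This is only a sketch: all three function names are hypothetical, and it assumes that `pq-connectdb' accepts the usual libpq `keyword=value' connection string with single-quoted values. The `tty' parameter is left out because that value is suppressed in the XEmacs Lisp API.

```elisp
;; Sketch only: every `example-pq-' name below is hypothetical.
(defun example-pq-quote-value (value)
  "Return VALUE wrapped in single quotes for use in a conninfo string.
Backslashes and single quotes inside VALUE are backslash-escaped."
  (concat "'"
          (mapconcat (lambda (c)
                       (if (memq c '(?\\ ?\'))
                           (concat "\\" (char-to-string c))
                         (char-to-string c)))
                     value "")
          "'"))

(defun example-pq-conninfo (params)
  "Build a conninfo string from PARAMS, an alist of (KEYWORD . VALUE).
KEYWORD and VALUE are strings; pairs whose VALUE is nil are skipped."
  (let (parts)
    (while params
      (let ((key (car (car params)))
            (value (cdr (car params))))
        (if value
            (setq parts (cons (concat key "=" (example-pq-quote-value value))
                              parts))))
      (setq params (cdr params)))
    (mapconcat 'identity (nreverse parts) " ")))

(defun example-pq-setdb-login (host port options dbname login pwd)
  "Connect in the style of PQsetdbLogin; any argument may be nil."
  (pq-connectdb
   (example-pq-conninfo (list (cons "host" host)
                              (cons "port" port)
                              (cons "options" options)
                              (cons "dbname" dbname)
                              (cons "user" login)
                              (cons "password" pwd)))))
```

For instance, `(example-pq-conninfo '(("host" . "localhost") ("dbname" . "test")))' returns the string `host='localhost' dbname='test''. Keeping the string construction separate from the call to `pq-connectdb' allows the quoting to be checked without a running database.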
- Unimplemented Function: int PQsocket (PGconn *conn) Return socket file descriptor to a backend database process. CONN database connection object. - Unimplemented Function: void PQprint (FILE *fout, PGresult *res, PGprintOpt *ps) Print out the results of a query to a designated C stream. FOUT C stream to print to RES the query result object to print PS the print options structure. This routine is deprecated as of libpq-7.0 and cannot be sensibly exported to XEmacs Lisp. - Unimplemented Function: void PQdisplayTuples (PGresult *res, FILE *fp, int fillAlign, char *fieldSep, int printHeader, int quiet) RES query result object to print FP C stream to print to FILLALIGN pad the fields with spaces FIELDSEP field separator PRINTHEADER display headers? QUIET This routine was deprecated in libpq-6.5. - Unimplemented Function: void PQprintTuples (PGresult *res, FILE *fout, int printAttName, int terseOutput, int width) RES query result object to print FOUT C stream to print to PRINTATTNAME print attribute names TERSEOUTPUT delimiter bars WIDTH width of column, if 0, use variable width This routine was deprecated in libpq-6.5. - Unimplemented Function: int PQmblen (char *s, int encoding) Determine length of a multibyte encoded char at `*s'. S encoded string ENCODING type of encoding Compatibility note: This function was introduced in libpq-7.0. - Unimplemented Function: void PQtrace (PGconn *conn, FILE *debug_port) Enable tracing on `debug_port'. CONN database connection object. DEBUG_PORT C output stream to use. - Unimplemented Function: void PQuntrace (PGconn *conn) Disable tracing. CONN database connection object. - Unimplemented Function: char *PQoidStatus (PGconn *conn) Return the object id as a string of the last tuple inserted. CONN database connection object. Compatibility note: This function is deprecated in libpq-7.0, however it is used internally by the XEmacs binding code when linked against versions prior to 7.0. 
- Unimplemented Function: PGresult *PQfn (PGconn *conn, int fnid, int *result_buf, int *result_len, int result_is_int, PQArgBlock *args, int nargs) "Fast path" interface -- not really recommended for application use. CONN A database connection object. FNID RESULT_BUF RESULT_LEN RESULT_IS_INT ARGS NARGS The following set of very low level large object functions aren't appropriate to be exported to Lisp. - Unimplemented Function: int pq-lo-open (PGconn *conn, int lobjid, int mode) CONN a database connection object. LOBJID a large object ID. MODE opening modes. - Unimplemented Function: int pq-lo-close (PGconn *conn, int fd) CONN a database connection object. FD a large object file descriptor. - Unimplemented Function: int pq-lo-read (PGconn *conn, int fd, char *buf, int len) CONN a database connection object. FD a large object file descriptor. BUF buffer to read into. LEN size of buffer. - Unimplemented Function: int pq-lo-write (PGconn *conn, int fd, char *buf, size_t len) CONN a database connection object. FD a large object file descriptor. BUF buffer to write from. LEN size of buffer. - Unimplemented Function: int pq-lo-lseek (PGconn *conn, int fd, int offset, int whence) CONN a database connection object. FD a large object file descriptor. OFFSET WHENCE - Unimplemented Function: int pq-lo-creat (PGconn *conn, int mode) CONN a database connection object. MODE opening modes. - Unimplemented Function: int pq-lo-tell (PGconn *conn, int fd) CONN a database connection object. FD a large object file descriptor. - Unimplemented Function: int pq-lo-unlink (PGconn *conn, int lobjid) CONN a database connection object. LOBJID a large object ID.  File: lispref.info, Node: XEmacs PostgreSQL libpq Examples, Prev: XEmacs PostgreSQL libpq API, Up: PostgreSQL Support XEmacs PostgreSQL libpq Examples ================================ This is an example of one method of establishing an asynchronous connection.
(defun database-poller (P) (message "%S before poll" (pq-pgconn P 'pq::status)) (pq-connect-poll P) (message "%S after poll" (pq-pgconn P 'pq::status)) (if (eq (pq-pgconn P 'pq::status) 'pg::connection-ok) (message "Done!") (add-timeout .1 'database-poller P))) => database-poller (progn (setq P (pq-connect-start "")) (add-timeout .1 'database-poller P)) => pg::connection-started before poll => pg::connection-made after poll => pg::connection-made before poll => pg::connection-awaiting-response after poll => pg::connection-awaiting-response before poll => pg::connection-auth-ok after poll => pg::connection-auth-ok before poll => pg::connection-setenv after poll => pg::connection-setenv before poll => pg::connection-ok after poll => Done! P => # Here is an example of one method of doing an asynchronous reset. (defun database-poller (P) (let (PS) (message "%S before poll" (pq-pgconn P 'pq::status)) (setq PS (pq-reset-poll P)) (message "%S after poll [%S]" (pq-pgconn P 'pq::status) PS) (if (eq (pq-pgconn P 'pq::status) 'pg::connection-ok) (message "Done!") (add-timeout .1 'database-poller P)))) => database-poller (progn (pq-reset-start P) (add-timeout .1 'database-poller P)) => pg::connection-started before poll => pg::connection-made after poll [pgres::polling-writing] => pg::connection-made before poll => pg::connection-awaiting-response after poll [pgres::polling-reading] => pg::connection-awaiting-response before poll => pg::connection-setenv after poll [pgres::polling-reading] => pg::connection-setenv before poll => pg::connection-ok after poll [pgres::polling-ok] => Done! P => # And finally, an asynchronous query. 
(defun database-poller (P) (let (R) (pq-consume-input P) (if (pq-is-busy P) (add-timeout .1 'database-poller P) (setq R (pq-get-result P)) (if R (progn (push R result-list) (add-timeout .1 'database-poller P)))))) => database-poller (when (pq-send-query P "SELECT * FROM xemacs_test;") (setq result-list nil) (add-timeout .1 'database-poller P)) => 885 ;; wait a moment result-list => (#) Here is an example showing how multiple SQL statements in a single query can have all their results collected. ;; Using the same `database-poller' function from the previous example (when (pq-send-query P "SELECT * FROM xemacs_test; SELECT * FROM pg_database; SELECT * FROM pg_user;") (setq result-list nil) (add-timeout .1 'database-poller P)) => 1782 ;; wait a moment result-list => (# # #) Here is an example which illustrates collecting all data from a query, including the field names. (defun pg-util-query-results (results) "Retrieve the contents of the query result RESULTS into a list structure." (let ((i (1- (pq-ntuples results))) j l1 l2) (while (>= i 0) (setq j (1- (pq-nfields results))) (setq l2 nil) (while (>= j 0) (push (pq-get-value results i j) l2) (decf j)) (push l2 l1) (decf i)) (setq j (1- (pq-nfields results))) (setq l2 nil) (while (>= j 0) (push (pq-fname results j) l2) (decf j)) (push l2 l1) l1)) => pg-util-query-results (setq R (pq-exec P "SELECT * FROM xemacs_test ORDER BY field2 DESC;")) => # (pg-util-query-results R) => (("f1" "field2") ("a" "97") ("b" "97") ("stuff" "42") ("a string" "12") ("foo" "10") ("string" "2") ("text" "1")) Here is an example of a query that uses a database cursor.
(let (data R) (setq R (pq-exec P "BEGIN;")) (setq R (pq-exec P "DECLARE k_cursor CURSOR FOR SELECT * FROM xemacs_test ORDER BY f1 DESC;")) (setq R (pq-exec P "FETCH k_cursor;")) (while (eq (pq-ntuples R) 1) (push (list (pq-get-value R 0 0) (pq-get-value R 0 1)) data) (setq R (pq-exec P "FETCH k_cursor;"))) (setq R (pq-exec P "END;")) data) => (("a" "97") ("a string" "12") ("b" "97") ("foo" "10") ("string" "2") ("stuff" "42") ("text" "1")) Here's another example of cursors, this time with a Lisp macro to implement a mapping function over a table. (defmacro map-db (P table condition callout) `(let (R) (pq-exec ,P "BEGIN;") (pq-exec ,P (concat "DECLARE k_cursor CURSOR FOR SELECT * FROM " ,table " " ,condition " ORDER BY f1 DESC;")) (setq R (pq-exec ,P "FETCH k_cursor;")) (while (eq (pq-ntuples R) 1) (,callout (pq-get-value R 0 0) (pq-get-value R 0 1)) (setq R (pq-exec ,P "FETCH k_cursor;"))) (pq-exec ,P "END;"))) => map-db (defun callback (arg1 arg2) (message "arg1 = %s, arg2 = %s" arg1 arg2)) => callback (map-db P "xemacs_test" "WHERE field2 > 10" callback) => arg1 = stuff, arg2 = 42 => arg1 = b, arg2 = 97 => arg1 = a string, arg2 = 12 => arg1 = a, arg2 = 97 => #  File: lispref.info, Node: Internationalization, Next: MULE, Prev: PostgreSQL Support, Up: Top Internationalization ******************** * Menu: * I18N Levels 1 and 2:: Support for different time, date, and currency formats. * I18N Level 3:: Support for localized messages. * I18N Level 4:: Support for Asian languages.  File: lispref.info, Node: I18N Levels 1 and 2, Next: I18N Level 3, Up: Internationalization I18N Levels 1 and 2 =================== XEmacs is now compliant with I18N levels 1 and 2. Specifically, this means that it is 8-bit clean and correctly handles time and date functions. XEmacs will correctly display the entire ISO-Latin 1 character set. The compose key may now be used to create any character in the ISO-Latin 1 character set not directly available via the keyboard.
In order for the compose key to work it is necessary to load the file `x-compose.el'. At any time while composing a character, `C-h' will display all valid completions and the character which would be produced.  File: lispref.info, Node: I18N Level 3, Next: I18N Level 4, Prev: I18N Levels 1 and 2, Up: Internationalization I18N Level 3 ============ * Menu: * Level 3 Basics:: * Level 3 Primitives:: * Dynamic Messaging:: * Domain Specification:: * Documentation String Extraction::  File: lispref.info, Node: Level 3 Basics, Next: Level 3 Primitives, Up: I18N Level 3 Level 3 Basics -------------- XEmacs now provides alpha-level functionality for I18N Level 3. This means that everything necessary for full messaging is available, but not every file has been converted. The two message files which have been created are `src/emacs.po' and `lisp/packages/mh-e.po'. Both files need to be converted using `msgfmt', and the resulting `.mo' files placed in some locale's `LC_MESSAGES' directory. The test "translations" in these files are the original messages prefixed by `TRNSLT_'. The domain for a variable is stored on the variable's property list under the property name VARIABLE-DOMAIN. The function `documentation-property' uses this information when translating a variable's documentation.  File: lispref.info, Node: Level 3 Primitives, Next: Dynamic Messaging, Prev: Level 3 Basics, Up: I18N Level 3 Level 3 Primitives ------------------ - Function: gettext string This function looks up STRING in the default message domain and returns its translation. If `I18N3' was not enabled when XEmacs was compiled, it just returns STRING. - Function: dgettext domain string This function looks up STRING in the specified message domain and returns its translation. If `I18N3' was not enabled when XEmacs was compiled, it just returns STRING. - Function: bind-text-domain domain pathname This function associates a pathname with a message domain. 
Here's how the path to message file is constructed under SunOS 5.x: `{pathname}/{LANG}/LC_MESSAGES/{domain}.mo' If `I18N3' was not enabled when XEmacs was compiled, this function does nothing. - Special Form: domain string This function specifies the text domain used for translating documentation strings and interactive prompts of a function. For example, write: (defun foo (arg) "Doc string" (domain "emacs-foo") ...) to specify `emacs-foo' as the text domain of the function `foo'. The "call" to `domain' is actually a declaration rather than a function; when actually called, `domain' just returns `nil'. - Function: domain-of function This function returns the text domain of FUNCTION; it returns `nil' if it is the default domain. If `I18N3' was not enabled when XEmacs was compiled, it always returns `nil'.  File: lispref.info, Node: Dynamic Messaging, Next: Domain Specification, Prev: Level 3 Primitives, Up: I18N Level 3 Dynamic Messaging ----------------- The `format' function has been extended to permit you to change the order of parameter insertion. For example, the conversion format `%1$s' inserts parameter one as a string, while `%2$s' inserts parameter two. This is useful when creating translations which require you to change the word order.  File: lispref.info, Node: Domain Specification, Next: Documentation String Extraction, Prev: Dynamic Messaging, Up: I18N Level 3 Domain Specification -------------------- The default message domain of XEmacs is `emacs'. For add-on packages, it is best to use a different domain. For example, let us say we want to convert the "gorilla" package to use the domain `emacs-gorilla'. To translate the message "What gorilla?", use `dgettext' as follows: (dgettext "emacs-gorilla" "What gorilla?") A function (or macro) which has a documentation string or an interactive prompt needs to be associated with the domain in order for the documentation or prompt to be translated. 
This is done with the `domain' special form as follows: (defun scratch (location) "Scratch the specified location." (domain "emacs-gorilla") (interactive "sScratch: ") ... ) It is most efficient to specify the domain in the first line of the function body, before the `interactive' form. For variables and constants which have documentation strings, specify the domain after the documentation. - Special Form: defvar symbol [value [doc-string [domain]]] Example: (defvar weight 250 "Weight of gorilla, in pounds." "emacs-gorilla") - Special Form: defconst symbol [value [doc-string [domain]]] Example: (defconst limbs 4 "Number of limbs" "emacs-gorilla") - Function: autoload function filename &optional docstring interactive type This function defines FUNCTION to autoload from FILENAME Example: (autoload 'explore "jungle" "Explore the jungle." nil nil "emacs-gorilla")  File: lispref.info, Node: Documentation String Extraction, Prev: Domain Specification, Up: I18N Level 3 Documentation String Extraction ------------------------------- The utility `etc/make-po' scans the file `DOC' to extract documentation strings and creates a message file `doc.po'. This file may then be inserted within `emacs.po'. Currently, `make-po' is hard-coded to read from `DOC' and write to `doc.po'. In order to extract documentation strings from an add-on package, first run `make-docfile' on the package to produce the `DOC' file. Then run `make-po -p' with the `-p' argument to indicate that we are extracting documentation for an add-on package. (The `-p' argument is a kludge to make up for a subtle difference between pre-loaded documentation and add-on documentation: For add-on packages, the final carriage returns in the strings produced by `make-docfile' must be ignored.)  File: lispref.info, Node: I18N Level 4, Prev: I18N Level 3, Up: Internationalization I18N Level 4 ============ The Asian-language support in XEmacs is called "MULE". *Note MULE::.  
File: lispref.info, Node: MULE, Next: Tips, Prev: Internationalization, Up: Top MULE **** "MULE" is the name originally given to the version of GNU Emacs extended for multi-lingual (and in particular Asian-language) support. "MULE" is short for "MUlti-Lingual Emacs". It is an extension and complete rewrite of Nemacs ("Nihon Emacs" where "Nihon" is the Japanese word for "Japan"), which only provided support for Japanese. XEmacs refers to its multi-lingual support as "MULE support" since it is based on "MULE". * Menu: * Internationalization Terminology:: Definition of various internationalization terms. * Charsets:: Sets of related characters. * MULE Characters:: Working with characters in XEmacs/MULE. * Composite Characters:: Making new characters by overstriking other ones. * Coding Systems:: Ways of representing a string of chars using integers. * CCL:: A special language for writing fast converters. * Category Tables:: Subdividing charsets into groups.  File: lispref.info, Node: Internationalization Terminology, Next: Charsets, Up: MULE Internationalization Terminology ================================ In internationalization terminology, a string of text is divided up into "characters", which are the printable units that make up the text. A single character is (for example) a capital `A', the number `2', a Katakana character, a Hangul character, a Kanji ideograph (an "ideograph" is a "picture" character, such as is used in Japanese Kanji, Chinese Hanzi, and Korean Hanja; typically there are thousands of such ideographs in each language), etc. The basic property of a character is that it is the smallest unit of text with semantic significance in text processing. Human beings normally process text visually, so to a first approximation a character may be identified with its shape. Note that the same character may be drawn by two different people (or in two different fonts) in slightly different ways, although the "basic shape" will be the same. 
But consider the works of Scott Kim; human beings can recognize hugely variant shapes as the "same" character. Sometimes, especially where characters are extremely complicated to write, completely different shapes may be defined as the "same" character in national standards. The Taiwanese variant of Hanzi is generally the most complicated; over the centuries, the Japanese, Koreans, and the People's Republic of China have adopted simplifications of the shape, but the line of descent from the original shape is recorded, and the meanings and pronunciation of different forms of the same character are considered to be identical within each language. (Of course, it may take a specialist to recognize the related form; the point is that the relations are standardized, despite the differing shapes.) In some cases, the differences will be significant enough that it is actually possible to identify two or more distinct shapes that both represent the same character. For example, the lowercase letters `a' and `g' each have two distinct possible shapes--the `a' can optionally have a curved tail projecting off the top, and the `g' can be formed either of two loops, or of one loop and a tail hanging off the bottom. Such distinct possible shapes of a character are called "glyphs". The important characteristic of two glyphs making up the same character is that the choice between one or the other is purely stylistic and has no linguistic effect on a word (this is the reason why a capital `A' and lowercase `a' are different characters rather than different glyphs--e.g. `Aspen' is a city while `aspen' is a kind of tree). Note that "character" and "glyph" are used differently here than elsewhere in XEmacs. A "character set" is essentially a set of related characters. ASCII, for example, is a set of 94 characters (or 128, if you count non-printing characters). 
Other character sets are ISO8859-1 (ASCII plus various accented characters and other international symbols), JIS X 0201 (ASCII, more or less, plus half-width Katakana), JIS X 0208 (Japanese Kanji), JIS X 0212 (a second set of less-used Japanese Kanji), GB2312 (Mainland Chinese Hanzi), etc. The definition of a character set will implicitly or explicitly give it an "ordering", a way of assigning a number to each character in the set. For many character sets, there is a natural ordering, for example the "ABC" ordering of the Roman letters. But it is not clear whether digits should come before or after the letters, and in fact different European languages treat the ordering of accented characters differently. It is useful to use the natural order where available, of course. The number assigned to any particular character is called the character's "code point". (Within a given character set, each character has a unique code point. Thus the word "set" is ill-chosen; different orderings of the same characters are different character sets. Identifying characters is simple enough for alphabetic character sets, but the difference in ordering can cause great headaches when the same thousands of characters are used by different cultures as in the Hanzi.) A code point may be broken into a number of "position codes". The number of position codes required to index a particular character in a character set is called the "dimension" of the character set. For practical purposes, a position code may be thought of as a byte-sized index. The set of printing ASCII characters, being relatively small, is of dimension one, and each character in the set is indexed using a single position code, in the range 1 through 94. Use of this unusual range, rather than the familiar 33 through 126, is an intentional abstraction; to understand the programming issues you must break the equation between character sets and encodings. JIS X 0208, i.e.
Japanese Kanji, has thousands of characters, and is of dimension two - every character is indexed by two position codes, each in the range 1 through 94. (This number "94" is not a coincidence; we shall see that the JIS position codes were chosen so that JIS kanji could be encoded without using codes that in ASCII are associated with device control functions.) Note that the choice of the range here is somewhat arbitrary. You could just as easily index the printing characters in ASCII using numbers in the range 0 through 93, 2 through 95, 3 through 96, etc. In fact, the standardized _encoding_ for the ASCII _character set_ uses the range 33 through 126. An "encoding" is a way of numerically representing characters from one or more character sets into a stream of like-sized numerical values called "words"; typically these are 8-bit, 16-bit, or 32-bit quantities. If an encoding encompasses only one character set, then the position codes for the characters in that character set could be used directly. (This is the case with the trivial cipher used by children, assigning 1 to `A', 2 to `B', and so on.) However, even with ASCII, other considerations intrude. For example, why are the upper- and lowercase alphabets separated by 8 characters? Why do the digits start with `0' being assigned the code 48? In both cases because semantically interesting operations (case conversion and numerical value extraction) become convenient masking operations. Other artificial aspects (the control characters being assigned to codes 0-31 and 127) are historical accidents. (The use of 127 for `DEL' is an artifact of the "punch once" nature of paper tape, for example.) Naive use of the position code is not possible, however, if more than one character set is to be used in the encoding. For example, printed Japanese text typically requires characters from multiple character sets - ASCII, JIS X 0208, and JIS X 0212, to be specific. 
Each of these is indexed using one or more position codes in the range 1 through 94, so the position codes could not be used directly or there would be no way to tell which character was meant. Different Japanese encodings handle this differently - JIS uses special escape characters to denote different character sets; EUC sets the high bit of the position codes for JIS X 0208 and JIS X 0212, and puts a special extra byte before each JIS X 0212 character; etc. (JIS, EUC, and most of the other encodings you will encounter in files are 7-bit or 8-bit encodings. There is one common 16-bit encoding, which is Unicode; this strives to represent all the world's characters in a single large character set. 32-bit encodings are often used internally in programs, such as XEmacs with MULE support, to simplify the code that manipulates them; however, they are not used externally because they are not very space-efficient.) A general method of handling text using multiple character sets (whether for multilingual text, or simply text in an extremely complicated single language like Japanese) is defined in the international standard ISO 2022. ISO 2022 will be discussed in more detail later (*note ISO 2022::), but for now suffice it to say that text needs control functions (at least spacing), and if escape sequences are to be used, an escape sequence introducer. It was decided to make all text streams compatible with ASCII in the sense that the codes 0-31 (and 128-159) would always be control codes, never graphic characters, and where defined by the character set the `SPC' character would be assigned code 32, and `DEL' would be assigned 127. Thus there are 94 code points remaining if 7 bits are used. This is the reason that most character sets are defined using position codes in the range 1 through 94. 
Then ISO 2022 compatible encodings are produced by shifting the position codes 1 to 94 into character codes 33 to 126, or (if 8 bit codes are available) into character codes 161 to 254. Encodings are classified as either "modal" or "non-modal". In a "modal encoding", there are multiple states that the encoding can be in, and the interpretation of the values in the stream depends on the current global state of the encoding. Special values in the encoding, called "escape sequences", are used to change the global state. JIS, for example, is a modal encoding. The bytes `ESC $ B' indicate that, from then on, bytes are to be interpreted as position codes for JIS X 0208, rather than as ASCII. This effect is cancelled using the bytes `ESC ( B', which mean "switch from whatever the current state is to ASCII". To switch to JIS X 0212, the escape sequence `ESC $ ( D' is used. (Note that here, as is common, the escape sequences do in fact begin with `ESC'. This is not necessarily the case, however. Some encodings use control characters called "locking shifts" (effect persists until cancelled) to switch character sets.) A "non-modal encoding" has no global state that extends past the character currently being interpreted. EUC, for example, is a non-modal encoding. Characters in JIS X 0208 are encoded by setting the high bit of the position codes, and characters in JIS X 0212 are encoded by doing the same but also prefixing the character with the byte 0x8F. The advantage of a modal encoding is that it is generally more space-efficient, and is easily extensible because there is essentially an arbitrary number of escape sequences that can be created. The disadvantage, however, is that it is much more difficult to work with if it is not being processed in a sequential manner.
In the non-modal EUC encoding, for example, the byte 0x41 always refers to the letter `A'; whereas in JIS, it could either be the letter `A', or one of the two position codes in a JIS X 0208 character, or one of the two position codes in a JIS X 0212 character. Determining exactly which one is meant could be difficult and time-consuming if the previous bytes in the string have not already been processed, or impossible if they are drawn from an external stream that cannot be rewound. Non-modal encodings are further divided into "fixed-width" and "variable-width" formats. A fixed-width encoding always uses the same number of words per character, whereas a variable-width encoding does not. EUC is a good example of a variable-width encoding: one to three bytes are used per character, depending on the character set. 16-bit and 32-bit encodings are nearly always fixed-width, and this is in fact one of the main reasons for using an encoding with a larger word size. The advantages of fixed-width encodings should be obvious. The advantages of variable-width encodings are that they are generally more space-efficient and allow for compatibility with existing 8-bit encodings such as ASCII. (For example, in Unicode ASCII characters are simply promoted to a 16-bit representation. That means that every ASCII character contains a `NUL' byte; evidently all of the standard string manipulation functions will lose badly in a fixed-width Unicode environment.) The bytes in an 8-bit encoding are often referred to as "octets" rather than simply as bytes. This terminology dates back to the days before 8-bit bytes were universal, when some computers had 9-bit bytes, others had 10-bit bytes, etc.  File: lispref.info, Node: Charsets, Next: MULE Characters, Prev: Internationalization Terminology, Up: MULE Charsets ======== A "charset" in MULE is an object that encapsulates a particular character set as well as an ordering of those characters. 
Charsets are permanent objects and are named using symbols, like faces. - Function: charsetp object This function returns non-`nil' if OBJECT is a charset. * Menu: * Charset Properties:: Properties of a charset. * Basic Charset Functions:: Functions for working with charsets. * Charset Property Functions:: Functions for accessing charset properties. * Predefined Charsets:: Predefined charset objects.  File: lispref.info, Node: Charset Properties, Next: Basic Charset Functions, Up: Charsets Charset Properties ------------------ Charsets have the following properties: `name' A symbol naming the charset. Every charset must have a different name; this allows a charset to be referred to using its name rather than the actual charset object. `doc-string' A documentation string describing the charset. `registry' A regular expression matching the font registry field for this character set. For example, both the `ascii' and `latin-iso8859-1' charsets use the registry `"ISO8859-1"'. This field is used to choose an appropriate font when the user gives a general font specification such as `-*-courier-medium-r-*-140-*', i.e. a 14-point upright medium-weight Courier font. `dimension' Number of position codes used to index a character in the character set. XEmacs/MULE can only handle character sets of dimension 1 or 2. This property defaults to 1. `chars' Number of characters in each dimension. In XEmacs/MULE, the only allowed values are 94 or 96. (There are a couple of pre-defined character sets, such as ASCII, that do not follow this, but you cannot define new ones like this.) Defaults to 94. Note that if the dimension is 2, the character set thus described is 94x94 or 96x96. `columns' Number of columns used to display a character in this charset. Only used in TTY mode. (Under X, the actual width of a character can be derived from the font used to display the characters.) If unspecified, defaults to the dimension. 
(This is almost always the correct value, because character sets with dimension 2 are usually ideograph character sets, which need two columns to display the intricate ideographs.) `direction' A symbol, either `l2r' (left-to-right) or `r2l' (right-to-left). Defaults to `l2r'. This specifies the direction that the text should be displayed in, and will be left-to-right for most charsets but right-to-left for Hebrew and Arabic. (Right-to-left display is not currently implemented.) `final' Final byte of the standard ISO 2022 escape sequence designating this charset. Must be supplied. Each combination of (DIMENSION, CHARS) defines a separate namespace for final bytes, and each charset within a particular namespace must have a different final byte. Note that ISO 2022 restricts the final byte to the range 0x30 - 0x7E if dimension == 1, and 0x30 - 0x5F if dimension == 2. Note also that final bytes in the range 0x30 - 0x3F are reserved for user-defined (not official) character sets. For more information on ISO 2022, see *Note Coding Systems::. `graphic' 0 (use left half of font on output) or 1 (use right half of font on output). Defaults to 0. This specifies how to convert the position codes that index a character in a character set into an index into the font used to display the character set. With `graphic' set to 0, position codes 33 through 126 map to font indices 33 through 126; with it set to 1, position codes 33 through 126 map to font indices 161 through 254 (i.e. the same number but with the high bit set). For example, for a font whose registry is ISO8859-1, the left half of the font (octets 0x20 - 0x7F) is the `ascii' charset, while the right half (octets 0xA0 - 0xFF) is the `latin-iso8859-1' charset. `ccl-program' A compiled CCL program used to convert a character in this charset into an index into the font. This is in addition to the `graphic' property. 
If a CCL program is defined, the position codes of a character will first be processed according to `graphic' and then passed through the CCL program, with the resulting values used to index the font. This is used, for example, in the Big5 character set (used in Taiwan). This character set is not ISO-2022-compliant, and its size (94x157) does not fit within the maximum 96x96 size of ISO-2022-compliant character sets. As a result, XEmacs/MULE splits it (in a rather complex fashion, so as to group the most commonly used characters together) into two charset objects (`big5-1' and `big5-2'), each of size 94x94, and each charset object uses a CCL program to convert the modified position codes back into standard Big5 indices to retrieve a character from a Big5 font. Most of the above properties can only be set when the charset is initialized, and cannot be changed later. *Note Charset Property Functions::.  File: lispref.info, Node: Basic Charset Functions, Next: Charset Property Functions, Prev: Charset Properties, Up: Charsets Basic Charset Functions ----------------------- - Function: find-charset charset-or-name This function retrieves the charset of the given name. If CHARSET-OR-NAME is a charset object, it is simply returned. Otherwise, CHARSET-OR-NAME should be a symbol. If there is no such charset, `nil' is returned. Otherwise the associated charset object is returned. - Function: get-charset name This function retrieves the charset of the given name. Same as `find-charset' except an error is signalled if there is no such charset instead of returning `nil'. - Function: charset-list This function returns a list of the names of all defined charsets. - Function: make-charset name doc-string props This function defines a new character set. This function is for use with MULE support. NAME is a symbol, the name by which the character set is normally referred. DOC-STRING is a string describing the character set. 
PROPS is a property list, describing the specific nature of the character set. The recognized properties are `registry', `dimension', `columns', `chars', `final', `graphic', `direction', and `ccl-program', as previously described. - Function: make-reverse-direction-charset charset new-name This function makes a charset equivalent to CHARSET but which goes in the opposite direction. NEW-NAME is the name of the new charset. The new charset is returned. - Function: charset-from-attributes dimension chars final &optional direction This function returns a charset with the given DIMENSION, CHARS, FINAL, and DIRECTION. If DIRECTION is omitted, both directions will be checked (left-to-right will be returned if character sets exist for both directions). - Function: charset-reverse-direction-charset charset This function returns the charset (if any) with the same dimension, number of characters, and final byte as CHARSET, but which is displayed in the opposite direction.  File: lispref.info, Node: Charset Property Functions, Next: Predefined Charsets, Prev: Basic Charset Functions, Up: Charsets Charset Property Functions -------------------------- All of these functions accept either a charset name or charset object. - Function: charset-property charset prop This function returns property PROP of CHARSET. *Note Charset Properties::. Convenience functions are also provided for retrieving individual properties of a charset. - Function: charset-name charset This function returns the name of CHARSET. This will be a symbol. - Function: charset-description charset This function returns the documentation string of CHARSET. - Function: charset-registry charset This function returns the registry of CHARSET. - Function: charset-dimension charset This function returns the dimension of CHARSET. - Function: charset-chars charset This function returns the number of characters per dimension of CHARSET. 
- Function: charset-width charset This function returns the number of display columns per character (in TTY mode) of CHARSET. - Function: charset-direction charset This function returns the display direction of CHARSET--either `l2r' or `r2l'. - Function: charset-iso-final-char charset This function returns the final byte of the ISO 2022 escape sequence designating CHARSET. - Function: charset-iso-graphic-plane charset This function returns either 0 or 1, depending on whether the position codes of characters in CHARSET map to the left or right half of their font, respectively. - Function: charset-ccl-program charset This function returns the CCL program, if any, for converting position codes of characters in CHARSET into font indices. The two properties of a charset that can currently be set after the charset has been created are the CCL program and the font registry. - Function: set-charset-ccl-program charset ccl-program This function sets the `ccl-program' property of CHARSET to CCL-PROGRAM. - Function: set-charset-registry charset registry This function sets the `registry' property of CHARSET to REGISTRY.  File: lispref.info, Node: Predefined Charsets, Prev: Charset Property Functions, Up: Charsets Predefined Charsets ------------------- The following charsets are predefined in the C code. 
     Name                     Type   Fi  Gr  Dir  Registry
     --------------------------------------------------------------
     ascii                    94     B   0   l2r  ISO8859-1
     control-1                94         0   l2r  ---
     latin-iso8859-1          96     A   1   l2r  ISO8859-1
     latin-iso8859-2          96     B   1   l2r  ISO8859-2
     latin-iso8859-3          96     C   1   l2r  ISO8859-3
     latin-iso8859-4          96     D   1   l2r  ISO8859-4
     cyrillic-iso8859-5       96     L   1   l2r  ISO8859-5
     arabic-iso8859-6         96     G   1   r2l  ISO8859-6
     greek-iso8859-7          96     F   1   l2r  ISO8859-7
     hebrew-iso8859-8         96     H   1   r2l  ISO8859-8
     latin-iso8859-9          96     M   1   l2r  ISO8859-9
     thai-tis620              96     T   1   l2r  TIS620
     katakana-jisx0201        94     I   1   l2r  JISX0201.1976
     latin-jisx0201           94     J   0   l2r  JISX0201.1976
     japanese-jisx0208-1978   94x94  @   0   l2r  JISX0208.1978
     japanese-jisx0208        94x94  B   0   l2r  JISX0208.19(83|90)
     japanese-jisx0212        94x94  D   0   l2r  JISX0212
     chinese-gb2312           94x94  A   0   l2r  GB2312
     chinese-cns11643-1       94x94  G   0   l2r  CNS11643.1
     chinese-cns11643-2       94x94  H   0   l2r  CNS11643.2
     chinese-big5-1           94x94  0   0   l2r  Big5
     chinese-big5-2           94x94  1   0   l2r  Big5
     korean-ksc5601           94x94  C   0   l2r  KSC5601
     composite                96x96      0   l2r  ---

The following charsets are predefined in the Lisp code.

     Name                     Type   Fi  Gr  Dir  Registry
     --------------------------------------------------------------
     arabic-digit             94     2   0   l2r  MuleArabic-0
     arabic-1-column          94     3   0   r2l  MuleArabic-1
     arabic-2-column          94     4   0   r2l  MuleArabic-2
     sisheng                  94     0   0   l2r  sisheng_cwnn\|OMRON_UDC_ZH
     chinese-cns11643-3       94x94  I   0   l2r  CNS11643.1
     chinese-cns11643-4       94x94  J   0   l2r  CNS11643.1
     chinese-cns11643-5       94x94  K   0   l2r  CNS11643.1
     chinese-cns11643-6       94x94  L   0   l2r  CNS11643.1
     chinese-cns11643-7       94x94  M   0   l2r  CNS11643.1
     ethiopic                 94x94  2   0   l2r  Ethio
     ascii-r2l                94     B   0   r2l  ISO8859-1
     ipa                      96     0   1   l2r  MuleIPA
     vietnamese-viscii-lower  96     1   1   l2r  VISCII1.1
     vietnamese-viscii-upper  96     2   1   l2r  VISCII1.1

In both tables, `Type' gives the `chars' property (squared for charsets of dimension 2), `Fi' the final byte, `Gr' the `graphic' property, and `Dir' the direction. For all of the above charsets, the dimension and number of columns are the same.

Note that ASCII, Control-1, and Composite are handled specially. This is why some of the fields are blank, and why some of the filled-in fields (e.g. the type) are not really accurate.
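As a hedged illustration of the charset functions described in the preceding nodes, the following sketch defines a charset and queries a few properties. It assumes an XEmacs built with MULE support. The name `my-sample-charset', the registry "MySample-0", and the choice of final byte are hypothetical, and the values shown for the predefined charsets are read from the tables above rather than taken from a live session.

```elisp
;; Assumes XEmacs with MULE.  The final byte ?9 lies in the user-defined
;; range 0x30 - 0x3F and is assumed not to be taken by another
;; 94-character charset of dimension 1.
(make-charset 'my-sample-charset
              "A hypothetical 94-character charset, for illustration only."
              '(dimension 1
                chars 94
                final ?9
                graphic 0
                direction l2r
                registry "MySample-0"))

;; The property functions accept either a charset name or a charset object.
(charset-dimension 'my-sample-charset)             ; expected: 1
(charset-chars (get-charset 'my-sample-charset))   ; expected: 94
(charset-property 'my-sample-charset 'registry)    ; expected: "MySample-0"

;; Predefined charsets can be queried the same way.
(charset-dimension 'japanese-jisx0208)             ; expected: 2 (94x94)
(charset-iso-graphic-plane 'latin-iso8859-2)       ; expected: 1
(charset-direction 'hebrew-iso8859-8)              ; expected: r2l

;; `find-charset' returns nil for an unknown name, whereas
;; `get-charset' signals an error.  `no-such-charset' is a made-up name.
(find-charset 'no-such-charset)                    ; expected: nil
(condition-case nil
    (get-charset 'no-such-charset)
  (error 'not-found))                              ; expected: not-found
```

Since most properties cannot be changed after creation, a definition like this one would normally be evaluated once, at load time.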
File: lispref.info, Node: MULE Characters, Next: Composite Characters, Prev: Charsets, Up: MULE MULE Characters =============== - Function: make-char charset arg1 &optional arg2 This function makes a multi-byte character from CHARSET and octets ARG1 and ARG2. - Function: char-charset character This function returns the character set of char CHARACTER. - Function: char-octet character &optional n This function returns the octet (i.e. position code) numbered N (should be 0 or 1) of char CHARACTER. N defaults to 0 if omitted. - Function: find-charset-region start end &optional buffer This function returns a list of the charsets in the region between START and END. BUFFER defaults to the current buffer if omitted. - Function: find-charset-string string This function returns a list of the charsets in STRING.  File: lispref.info, Node: Composite Characters, Next: Coding Systems, Prev: MULE Characters, Up: MULE Composite Characters ==================== Composite characters are not yet completely implemented. - Function: make-composite-char string This function converts a string into a single composite character. The character is the result of overstriking all the characters in the string. - Function: composite-char-string character This function returns a string of the characters comprising a composite character. - Function: compose-region start end &optional buffer This function composes the characters in the region from START to END in BUFFER into one composite character. The composite character replaces the composed characters. BUFFER defaults to the current buffer if omitted. - Function: decompose-region start end &optional buffer This function decomposes any composite characters in the region from START to END in BUFFER. This converts each composite character into one or more characters, the individual characters out of which the composite character was formed. Non-composite characters are left as-is. BUFFER defaults to the current buffer if omitted.  
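The character functions from the MULE Characters node can be combined as in the following sketch. It assumes an XEmacs built with MULE support; the position codes 36 and 34 are sample in-range values chosen only for illustration, and no particular printed representation of the results is claimed.

```elisp
;; Assumes XEmacs with MULE.  Build a two-octet character and take it
;; apart again.
(let ((ch (make-char 'japanese-jisx0208 36 34)))
  (list (charset-name (char-charset ch))   ; expected: japanese-jisx0208
        (char-octet ch 0)                  ; first position code
        (char-octet ch 1)))                ; second position code

;; List the charsets that occur in a string.
(find-charset-string "hello")
```

The composite-character functions are left out of this sketch because, as noted above, they are not yet completely implemented.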
File: lispref.info, Node: Coding Systems, Next: CCL, Prev: Composite Characters, Up: MULE Coding Systems ============== A coding system is an object that defines how text containing multiple character sets is encoded into a stream of (typically 8-bit) bytes. The coding system is used to decode the stream into a series of characters (which may be from multiple charsets) when the text is read from a file or process, and is used to encode the text back into the same format when it is written out to a file or process. For example, many ISO-2022-compliant coding systems (such as Compound Text, which is used for inter-client data under the X Window System) use escape sequences to switch between different charsets - Japanese Kanji, for example, is invoked with `ESC $ ( B'; ASCII is invoked with `ESC ( B'; and Cyrillic is invoked with `ESC - L'. See `make-coding-system' for more information. Coding systems are normally identified using a symbol, and the symbol is accepted in place of the actual coding system object whenever a coding system is called for. (This is similar to how faces and charsets work.) - Function: coding-system-p object This function returns non-`nil' if OBJECT is a coding system. * Menu: * Coding System Types:: Classifying coding systems. * ISO 2022:: An international standard for charsets and encodings. * EOL Conversion:: Dealing with different ways of denoting the end of a line. * Coding System Properties:: Properties of a coding system. * Basic Coding System Functions:: Working with coding systems. * Coding System Property Functions:: Retrieving a coding system's properties. * Encoding and Decoding Text:: Encoding and decoding text. * Detection of Textual Encoding:: Determining how text is encoded. * Big5 and Shift-JIS Functions:: Special functions for these non-standard encodings. * Predefined Coding Systems:: Coding systems implemented by MULE.  
File: lispref.info, Node: Coding System Types, Next: ISO 2022, Up: Coding Systems

Coding System Types
-------------------

The coding system type determines the basic algorithm XEmacs will use to decode or encode a data stream. Character encodings will be converted to the MULE encoding, escape sequences processed, and newline sequences converted to XEmacs's internal representation. There are three basic classes of coding system type: no-conversion, ISO-2022, and special.

No conversion allows you to look at the file's internal representation. Since XEmacs is basically a text editor, "no conversion" does convert newline conventions by default. (Use the `binary' coding system if this is not desired.)

ISO 2022 (*note ISO 2022::) is the basic international standard regulating use of "coded character sets for the exchange of data", i.e., text streams. ISO 2022 contains functions that make it possible to encode text streams to comply with restrictions of the Internet mail system and de facto restrictions of most file systems (e.g., use of the separator character in file names). Coding systems which are not ISO 2022 conformant can be difficult to handle. Perhaps more important, they are not adaptable to multilingual information interchange, with the obvious exception of ISO 10646 (Unicode). (Unicode is partially supported by XEmacs with the addition of the Lisp package ucs-conv.)

The special class of coding systems includes automatic detection, CCL (a "little language" embedded as an interpreter, useful for translating between variants of a single character set), non-ISO-2022-conformant encodings like Unicode, Shift JIS, and Big5, and MULE internal coding. (NB: this list is based on XEmacs 21.2. Terminology may vary slightly for other versions of XEmacs and for GNU Emacs 20.)

`no-conversion'
     No conversion, for binary files, and a few special cases of non-ISO-2022 coding systems where conversion is done by hook functions (usually implemented in CCL).
On output, graphic characters that are not in ASCII or Latin-1 will be replaced by a `?'. (For a no-conversion-encoded buffer, these characters will only be present if you explicitly insert them.) `iso2022' Any ISO-2022-compliant encoding. Among others, this includes JIS (the Japanese encoding commonly used for e-mail), national variants of EUC (the standard Unix encoding for Japanese and other languages), and Compound Text (an encoding used in X11). You can specify more specific information about the conversion with the FLAGS argument. `ucs-4' ISO 10646 UCS-4 encoding. A 31-bit fixed-width superset of Unicode. `utf-8' ISO 10646 UTF-8 encoding. A "file system safe" transformation format that can be used with both UCS-4 and Unicode. `undecided' Automatic conversion. XEmacs attempts to detect the coding system used in the file. `shift-jis' Shift-JIS (a Japanese encoding commonly used in PC operating systems). `big5' Big5 (the encoding commonly used for Taiwanese). `ccl' The conversion is performed using a user-written pseudo-code program. CCL (Code Conversion Language) is the name of this pseudo-code. For example, CCL is used to map KOI8-R characters (an encoding for Russian Cyrillic) to ISO8859-5 (the form used internally by MULE). `internal' Write out or read in the raw contents of the memory representing the buffer's text. This is primarily useful for debugging purposes, and is only enabled when XEmacs has been compiled with `DEBUG_XEMACS' set (the `--debug' configure option). *Warning*: Reading in a file using `internal' conversion can result in an internal inconsistency in the memory representing a buffer's text, which will produce unpredictable results and may cause XEmacs to crash. Under normal circumstances you should never use `internal' conversion.  File: lispref.info, Node: ISO 2022, Next: EOL Conversion, Prev: Coding System Types, Up: Coding Systems ISO 2022 ======== This section briefly describes the ISO 2022 encoding standard. 
A more thorough treatment is available in the original document of ISO 2022 as well as various national standards (such as JIS X 0202). Character sets ("charsets") are classified into the following four categories, according to the number of characters in the charset: 94-charset, 96-charset, 94x94-charset, and 96x96-charset. This means that although an ISO 2022 coding system may have variable width characters, each charset used is fixed-width (in contrast to the MULE character set and UTF-8, for example). ISO 2022 provides for switching between character sets via escape sequences. This switching is somewhat complicated, because ISO 2022 provides for both legacy applications like Internet mail that accept only 7 significant bits in some contexts (RFC 822 headers, for example), and more modern "8-bit clean" applications. It also provides for compact and transparent representation of languages like Japanese which mix ASCII and a national script (even outside of computer programs). First, ISO 2022 codified prevailing practice by dividing the code space into "control" and "graphic" regions. The code points 0x00-0x1F and 0x80-0x9F are reserved for "control characters", while "graphic characters" must be assigned to code points in the regions 0x20-0x7F and 0xA0-0xFF. The positions 0x20 and 0x7F are special, and under some circumstances must be assigned the graphic character "ASCII SPACE" and the control character "ASCII DEL" respectively. The various regions are given the name C0 (0x00-0x1F), GL (0x20-0x7F), C1 (0x80-0x9F), and GR (0xA0-0xFF). GL and GR stand for "graphic left" and "graphic right", respectively, because of the standard method of displaying graphic character sets in tables with the high byte indexing columns and the low byte indexing rows. I don't find it very intuitive, but these are called "registers". 
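The division of the code space into the four regions named above can be expressed as a simple range test. The function below is a hypothetical helper written for this illustration, not an XEmacs built-in.

```elisp
;; `my-iso2022-region' is a made-up name used only in this sketch.
(defun my-iso2022-region (byte)
  "Return the ISO 2022 region (C0, GL, C1 or GR) containing BYTE (0..255)."
  (cond ((<= byte #x1F) 'C0)
        ((<= byte #x7F) 'GL)
        ((<= byte #x9F) 'C1)
        (t 'GR)))

(my-iso2022-region #x0A)   ; => C0  (newline)
(my-iso2022-region #x41)   ; => GL  (ASCII `A')
(my-iso2022-region #x8F)   ; => C1
(my-iso2022-region #xA4)   ; => GR
```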
An ISO 2022-conformant encoding for a graphic character set must use a fixed number of bytes per character, and the values must fit into a single register; that is, each byte must range over either 0x20-0x7F, or 0xA0-0xFF. It is not allowed to extend the range of the repertoire of a character set by using both ranges at the same time. This is why a standard character set such as ISO 8859-1 is actually considered by ISO 2022 to be an aggregation of two character sets, ASCII and LATIN-1, and why it is technically incorrect to refer to ISO 8859-1 as "Latin 1". Also, a single character's bytes must all be drawn from the same register; this is why Shift JIS (for Japanese) and Big 5 (for Chinese) are not ISO 2022-compatible encodings.

The reason for this restriction becomes clear when you attempt to define an efficient, robust encoding for a language like Japanese. Like ISO 8859, Japanese encodings are aggregations of several character sets. In practice, the vast majority of characters are drawn from the "JIS Roman" character set (a derivative of ASCII; it won't hurt to think of it as ASCII) and the JIS X 0208 standard "basic Japanese" character set, which includes not only ideographic characters ("kanji") but also syllabic Japanese characters ("kana"), a wide variety of symbols, and many alphabetic characters (Roman, Greek, and Cyrillic). Although JIS X 0208 includes the whole Roman alphabet, as a 2-byte code it is not suited to programming; thus the inclusion of ASCII in the standard Japanese encodings.

For normal Japanese text such as in newspapers, a broad repertoire of approximately 3000 characters is used. Evidently this won't fit into one byte; two must be used. But much of the text processed by Japanese computers is computer source code, nearly all of which is ASCII. A not insignificant portion of ordinary text is English (as such or as borrowed Japanese vocabulary) or other languages which can be represented at least approximately in ASCII, as well.
It seems reasonable then to represent ASCII in one byte, and JIS X 0208 in two. And this is exactly what the Extended Unix Code for Japanese (EUC-JP) does. ASCII is invoked to the GL register, and JIS X 0208 is invoked to the GR register. Thus, each byte can be tested for its character set by looking at the high bit; if set, it is Japanese, if clear, it is ASCII. Furthermore, since control characters like newline can never be part of a graphic character, even in the case of corruption in transmission the stream will be resynchronized at every line break, on the order of 60-80 bytes. This coding system requires no escape sequences or special control codes to represent 99.9% of all Japanese text. Note carefully the distinction between the character sets (ASCII and JIS X 0208), the encoding (EUC-JP), and the coding system (ISO 2022). The JIS X 0208 character set is used in three different encodings for Japanese, but in ISO-2022-JP it is invoked into GL (so the high bit is always clear), in EUC-JP it is invoked into GR (setting the high bit in the process), and in Shift JIS the high bit may be set or reset, and the significant bits are shifted within the 16-bit character so that the two main character sets can coexist with a third (the "halfwidth katakana" of JIS X 0201). As the name implies, the ISO-2022-JP encoding is also a version of the ISO-2022 coding system. In order to systematically treat subsidiary character sets (like the "halfwidth katakana" already mentioned, and the "supplementary kanji" of JIS X 0212), four further registers are defined: G0, G1, G2, and G3. Unlike GL and GR, they are not logically distinguished by internal format. 
Instead, the process of "invocation" mentioned earlier is broken into two steps: first, a character set is "designated" to one of the registers G0-G3 by use of an "escape sequence" of the form:

     ESC [I] I F

where I is an intermediate character or characters in the range 0x20 - 0x2F, and F, from the range 0x30 - 0x7E, is the final character identifying this charset. (Final characters in the range 0x30 - 0x3F are reserved for private use and will never have a publicly registered meaning.)

Then that register is "invoked" to either GL or GR, either automatically (designations to G0 normally involve invocation to GL as well), or by use of shifting (affecting only the following character in the data stream) or locking (effective until the next designation or locking) control sequences. An encoding conformant to ISO 2022 is typically defined by designating the initial contents of the G0-G3 registers, specifying a 7-bit or 8-bit environment, and specifying whether further designations will be recognized.

Some examples of character sets and the registered final characters F used to designate them:

94-charset
     ASCII (B), left (J) and right (I) half of JIS X 0201, ...

96-charset
     Latin-1 (A), Latin-2 (B), Latin-3 (C), ...

94x94-charset
     GB2312 (A), JIS X 0208 (B), KSC5601 (C), ...

96x96-charset
     none for the moment

The meanings of the various characters in these sequences, where not specified by the ISO 2022 standard (such as the ESC character), are assigned by "ECMA", the European Computer Manufacturers Association.

The meanings of the intermediate characters are:

     $ [0x24]: indicate charset of dimension 2 (94x94 or 96x96).
     ( [0x28]: designate to G0 a 94-charset whose final byte is F.
     ) [0x29]: designate to G1 a 94-charset whose final byte is F.
     * [0x2A]: designate to G2 a 94-charset whose final byte is F.
     + [0x2B]: designate to G3 a 94-charset whose final byte is F.
     , [0x2C]: designate to G0 a 96-charset whose final byte is F.
     - [0x2D]: designate to G1 a 96-charset whose final byte is F.
     . [0x2E]: designate to G2 a 96-charset whose final byte is F.
     / [0x2F]: designate to G3 a 96-charset whose final byte is F.

The comma may be used in files read and written only by MULE, as a MULE extension, but this is illegal in ISO 2022. (The reason is that in ISO 2022 G0 must be a 94-member character set, with 0x20 assigned the value SPACE, and 0x7F assigned the value DEL.)

Here are examples of designations:

     ESC ( B :              designate to G0 ASCII
     ESC - A :              designate to G1 Latin-1
     ESC $ ( A or ESC $ A : designate to G0 GB2312
     ESC $ ( B or ESC $ B : designate to G0 JISX0208
     ESC $ ) C :            designate to G1 KSC5601

(The short forms used to designate GB2312 and JIS X 0208 are for backwards compatibility; the long forms are preferred.)

To use a charset designated to G2 or G3, and to use a charset designated to G1 in a 7-bit environment, you must explicitly invoke G1, G2, or G3 into GL. There are two types of invocation, Locking Shift (which stays in effect until the next locking shift) and Single Shift (one character only).

Locking Shift is done as follows:

     LS0 or SI (0x0F): invoke G0 into GL
     LS1 or SO (0x0E): invoke G1 into GL
     LS2:  invoke G2 into GL
     LS3:  invoke G3 into GL
     LS1R: invoke G1 into GR
     LS2R: invoke G2 into GR
     LS3R: invoke G3 into GR

Single Shift is done as follows:

     SS2 or ESC N: invoke G2 into GL
     SS3 or ESC O: invoke G3 into GL

The shift functions (such as LS1R and SS3) are represented by control characters (from C1) in 8-bit environments and by escape sequences in 7-bit environments.

(#### Ben says: I think the above is slightly incorrect. It appears that SS2 invokes G2 into GR and SS3 invokes G3 into GR, whereas ESC N and ESC O behave as indicated. The above definitions will not parse EUC-encoded text correctly, and it looks like the code in mule-coding.c has similar problems.)

Evidently there are a lot of ISO-2022-compliant ways of encoding multilingual text.
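The designation sequences above follow a regular pattern, which the following sketch assembles mechanically. The function is a hypothetical helper, not part of XEmacs; it works on plain integers and always produces the long form, never the short forms `ESC $ A' and `ESC $ B'.

```elisp
;; `my-designation-bytes' is a made-up name used only in this sketch.
(defun my-designation-bytes (register chars dimension final)
  "Return the list of bytes that designates a charset to REGISTER (0..3).
CHARS is 94 or 96, DIMENSION is 1 or 2, and FINAL is the final byte."
  (let ((intermediate (+ (if (= chars 94) #x28 #x2C) register)))
    (append (list #x1B)                        ; ESC
            (if (= dimension 2) (list #x24))   ; `$' marks dimension 2
            (list intermediate final))))

(my-designation-bytes 0 94 1 #x42)   ; ESC ( B    => (27 40 66)
(my-designation-bytes 1 96 1 #x41)   ; ESC - A    => (27 45 65)
(my-designation-bytes 0 94 2 #x42)   ; ESC $ ( B  => (27 36 40 66)
(my-designation-bytes 1 94 2 #x43)   ; ESC $ ) C  => (27 36 41 67)
```

Note that the helper will also produce `ESC , F' for a 96-charset in register 0, which, as explained above, is a MULE extension rather than valid ISO 2022.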
Many coding systems in use today, such as X11's Compound Text, the Japanese JUNET code, and the so-called EUC (Extended UNIX Code), are variants of ISO 2022.

In MULE, we characterize a version of ISO 2022 by the following attributes:

  1. The character sets initially designated to G0 through G3.

  2. Whether short form designations are allowed for Japanese and Chinese.

  3. Whether ASCII should be designated to G0 before control characters.

  4. Whether ASCII should be designated to G0 at the end of line.

  5. 7-bit environment or 8-bit environment.

  6. Whether Locking Shifts are used or not.

  7. Whether to use ASCII or the variant JIS X 0201-1976-Roman.

  8. Whether to use JIS X 0208-1983 or the older version JIS X 0208-1978.

(The last two are only for Japanese.)

By specifying these attributes, you can create any variant of ISO 2022.

Here are several examples:

ISO-2022-JP -- Coding system used in Japanese email (RFC 1468).
     1. G0 <- ASCII, G1..3 <- never used
     2. Yes.
     3. Yes.
     4. Yes.
     5. 7-bit environment
     6. No.
     7. Use ASCII
     8. Use JIS X 0208-1983

ctext -- X11 Compound Text
     1. G0 <- ASCII, G1 <- Latin-1, G2,3 <- never used.
     2. No.
     3. No.
     4. Yes.
     5. 8-bit environment.
     6. No.
     7. Use ASCII.
     8. Use JIS X 0208-1983.

euc-china -- Chinese EUC. Often called the "GB encoding", but that is technically incorrect.
     1. G0 <- ASCII, G1 <- GB 2312, G2,3 <- never used.
     2. No.
     3. Yes.
     4. Yes.
     5. 8-bit environment.
     6. No.
     7. Use ASCII.
     8. Use JIS X 0208-1983.

ISO-2022-KR -- Coding system used in Korean email.
     1. G0 <- ASCII, G1 <- KSC 5601, G2,3 <- never used.
     2. No.
     3. Yes.
     4. Yes.
     5. 7-bit environment.
     6. Yes.
     7. Use ASCII.
     8. Use JIS X 0208-1983.

MULE creates all of these coding systems by default.

File: lispref.info, Node: EOL Conversion, Next: Coding System Properties, Prev: ISO 2022, Up: Coding Systems

EOL Conversion
--------------

`nil'
     Automatically detect the end-of-line type (LF, CRLF, or CR).
Also generate subsidiary coding systems named `NAME-unix', `NAME-dos', and `NAME-mac', that are identical to this coding system but have an EOL-TYPE value of `lf', `crlf', and `cr', respectively. `lf' The end of a line is marked externally using ASCII LF. Since this is also the way that XEmacs represents an end-of-line internally, specifying this option results in no end-of-line conversion. This is the standard format for Unix text files. `crlf' The end of a line is marked externally using ASCII CRLF. This is the standard format for MS-DOS text files. `cr' The end of a line is marked externally using ASCII CR. This is the standard format for Macintosh text files. `t' Automatically detect the end-of-line type but do not generate subsidiary coding systems. (This value is converted to `nil' when stored internally, and `coding-system-property' will return `nil'.)  File: lispref.info, Node: Coding System Properties, Next: Basic Coding System Functions, Prev: EOL Conversion, Up: Coding Systems Coding System Properties ------------------------ `mnemonic' String to be displayed in the modeline when this coding system is active. `eol-type' End-of-line conversion to be used. It should be one of the types listed in *Note EOL Conversion::. `eol-lf' The coding system which is the same as this one, except that it uses the Unix line-breaking convention. `eol-crlf' The coding system which is the same as this one, except that it uses the DOS line-breaking convention. `eol-cr' The coding system which is the same as this one, except that it uses the Macintosh line-breaking convention. `post-read-conversion' Function called after a file has been read in, to perform the decoding. Called with two arguments, START and END, denoting a region of the current buffer to be decoded. `pre-write-conversion' Function called before a file is written out, to perform the encoding. Called with two arguments, START and END, denoting a region of the current buffer to be encoded. 
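The general properties listed so far can be inspected with `coding-system-property', which is mentioned under EOL Conversion. The sketch below assumes an XEmacs built with MULE support and that a coding system named `iso-2022-jp' is defined; no return values are claimed. The helper `my-eol-subsidiary-names' is hypothetical and only spells out the `NAME-unix', `NAME-dos', `NAME-mac' naming scheme.

```elisp
;; Pure Lisp: derive the subsidiary names from a base name.
(defun my-eol-subsidiary-names (name)
  "Return the subsidiary coding system names derived from symbol NAME."
  (mapcar (lambda (suffix) (intern (format "%s-%s" name suffix)))
          '("unix" "dos" "mac")))

(my-eol-subsidiary-names 'iso-2022-jp)
;; => (iso-2022-jp-unix iso-2022-jp-dos iso-2022-jp-mac)

;; Assumes XEmacs with MULE and an existing `iso-2022-jp' coding system.
(coding-system-property 'iso-2022-jp 'mnemonic)   ; modeline string
(coding-system-property 'iso-2022-jp 'eol-type)   ; nil means autodetect
(coding-system-property 'iso-2022-jp 'eol-lf)     ; the Unix-EOL variant
```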
The following additional properties are recognized if TYPE is `iso2022': `charset-g0' `charset-g1' `charset-g2' `charset-g3' The character set initially designated to the G0 - G3 registers. The value should be one of * A charset object (designate that character set) * `nil' (do not ever use this register) * `t' (no character set is initially designated to the register, but may be later on; this automatically sets the corresponding `force-g*-on-output' property) `force-g0-on-output' `force-g1-on-output' `force-g2-on-output' `force-g3-on-output' If non-`nil', send an explicit designation sequence on output before using the specified register. `short' If non-`nil', use the short forms `ESC $ @', `ESC $ A', and `ESC $ B' on output in place of the full designation sequences `ESC $ ( @', `ESC $ ( A', and `ESC $ ( B'. `no-ascii-eol' If non-`nil', don't designate ASCII to G0 at each end of line on output. Setting this to non-`nil' also suppresses other state-resetting that normally happens at the end of a line. `no-ascii-cntl' If non-`nil', don't designate ASCII to G0 before control chars on output. `seven' If non-`nil', use 7-bit environment on output. Otherwise, use 8-bit environment. `lock-shift' If non-`nil', use locking-shift (SO/SI) instead of single-shift or designation by escape sequence. `no-iso6429' If non-`nil', don't use ISO6429's direction specification. `escape-quoted' If non-`nil', literal control characters that are the same as the beginning of a recognized ISO 2022 or ISO 6429 escape sequence (in particular, ESC (0x1B), SO (0x0E), SI (0x0F), SS2 (0x8E), SS3 (0x8F), and CSI (0x9B)) are "quoted" with an escape character so that they can be properly distinguished from an escape sequence. (Note that doing this results in a non-portable encoding.) This encoding flag is used for byte-compiled files. 
Note that ESC is a good choice for a quoting character because there are no escape sequences whose second byte is a character from the Control-0 or Control-1 character sets; this is explicitly disallowed by the ISO 2022 standard. `input-charset-conversion' A list of conversion specifications, specifying conversion of characters in one charset to another when decoding is performed. Each specification is a list of two elements: the source charset, and the destination charset. `output-charset-conversion' A list of conversion specifications, specifying conversion of characters in one charset to another when encoding is performed. The form of each specification is the same as for `input-charset-conversion'. The following additional properties are recognized (and required) if TYPE is `ccl': `decode' CCL program used for decoding (converting to internal format). `encode' CCL program used for encoding (converting to external format). The following properties are used internally: EOL-CR, EOL-CRLF, EOL-LF, and BASE.  File: lispref.info, Node: Basic Coding System Functions, Next: Coding System Property Functions, Prev: Coding System Properties, Up: Coding Systems Basic Coding System Functions ----------------------------- - Function: find-coding-system coding-system-or-name This function retrieves the coding system of the given name. If CODING-SYSTEM-OR-NAME is a coding-system object, it is simply returned. Otherwise, CODING-SYSTEM-OR-NAME should be a symbol. If there is no such coding system, `nil' is returned. Otherwise the associated coding system object is returned. - Function: get-coding-system name This function retrieves the coding system of the given name. Same as `find-coding-system' except an error is signalled if there is no such coding system instead of returning `nil'. - Function: coding-system-list This function returns a list of the names of all defined coding systems. 
- Function: coding-system-name coding-system This function returns the name of the given coding system. - Function: coding-system-base coding-system This function returns the base coding system of CODING-SYSTEM, that is, the one with an undecided EOL convention. - Function: make-coding-system name type &optional doc-string props This function registers symbol NAME as a coding system. TYPE describes the conversion method used and should be one of the types listed in *Note Coding System Types::. DOC-STRING is a string describing the coding system. PROPS is a property list, describing the specific nature of the coding system. Recognized properties are as in *Note Coding System Properties::. - Function: copy-coding-system old-coding-system new-name This function copies OLD-CODING-SYSTEM to NEW-NAME. If NEW-NAME does not name an existing coding system, a new one will be created. - Function: subsidiary-coding-system coding-system eol-type This function returns the subsidiary coding system of CODING-SYSTEM with eol type EOL-TYPE.  File: lispref.info, Node: Coding System Property Functions, Next: Encoding and Decoding Text, Prev: Basic Coding System Functions, Up: Coding Systems Coding System Property Functions -------------------------------- - Function: coding-system-doc-string coding-system This function returns the doc string for CODING-SYSTEM. - Function: coding-system-type coding-system This function returns the type of CODING-SYSTEM. - Function: coding-system-property coding-system prop This function returns the PROP property of CODING-SYSTEM.  File: lispref.info, Node: Encoding and Decoding Text, Next: Detection of Textual Encoding, Prev: Coding System Property Functions, Up: Coding Systems Encoding and Decoding Text -------------------------- - Function: decode-coding-region start end coding-system &optional buffer This function decodes the text between START and END which is encoded in CODING-SYSTEM. This is useful if you've read in encoded text from a file without decoding it (e.g.
you read in a JIS-formatted file but used the `binary' or `no-conversion' coding system, so that it shows up as `^[$B!> | <8 | >8 | // | < | > | == | <= | >= | != | de-sjis | en-sjis ASSIGNMENT_OPERATOR := += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>= ARRAY := '[' integer ... ']'  File: lispref.info, Node: CCL Statements, Next: CCL Expressions, Prev: CCL Syntax, Up: CCL CCL Statements -------------- The Emacs Code Conversion Language provides the following statement types: "set", "if", "branch", "loop", "repeat", "break", "read", "write", "call", and "end". Set statement: ============== The "set" statement has three variants with the syntaxes `(REG = EXPRESSION)', `(REG ASSIGNMENT_OPERATOR EXPRESSION)', and `INTEGER'. The assignment operator variation of the "set" statement works the same way as the corresponding C expression statement does. The assignment operators are `+=', `-=', `*=', `/=', `%=', `&=', `|=', `^=', `<<=', and `>>=', and they have the same meanings as in C. A "naked integer" INTEGER is equivalent to a SET statement of the form `(r0 = INTEGER)'. I/O statements: =============== The "read" statement takes one or more registers as arguments. It reads one byte (a C char) from the input into each register in turn. The "write" takes several forms. In the form `(write REG ...)' it takes one or more registers as arguments and writes each in turn to the output. The integer in a register (interpreted as an Emchar) is encoded to multibyte form (ie, Bufbytes) and written to the current output buffer. If it is less than 256, it is written as is. The forms `(write EXPRESSION)' and `(write INTEGER)' are treated analogously. The form `(write STRING)' writes the constant string to the output. A "naked string" `STRING' is equivalent to the statement `(write STRING)'. The form `(write REG ARRAY)' writes the REGth element of the ARRAY to the output. 
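The I/O statements above can be tried out with a very small program. The following is a sketch only: it assumes a Mule-enabled XEmacs, it uses `ccl-compile' and `ccl-execute-on-string' to compile and run the program, and it assumes that a CCL program is a list whose first element is the buffer magnification and whose second element is the main block. The names `mylib-ccl-swap' and `mylib-ccl-swap-string' are hypothetical.

```elisp
;; Hypothetical example program: write the first two input bytes
;; in the opposite order.
(defconst mylib-ccl-swap
  '(1                      ; buffer magnification (assumed format)
    ((read r0 r1)          ; read one byte into r0, then one into r1
     (write r1 r0)))       ; write r1 first, then r0
  "CCL source that writes the first two input bytes in reverse order.")

(defun mylib-ccl-swap-string (string)
  "Run `mylib-ccl-swap' on STRING and return the resulting string."
  (ccl-execute-on-string (ccl-compile mylib-ccl-swap)
                         (make-vector 9 nil) ; r0..r7 and IC start at 0
                         string))
```

Calling `(mylib-ccl-swap-string "ab")' should read the two bytes into `r0' and `r1' and write them back in the opposite order. Any input after the second byte is ignored, because the program ends once its main block has been executed.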
Conditional statements: ======================= The "if" statement takes an EXPRESSION, a CCL BLOCK, and an optional SECOND CCL BLOCK as arguments. If the EXPRESSION evaluates to non-zero, the first CCL BLOCK is executed. Otherwise, if there is a SECOND CCL BLOCK, it is executed. The "read-if" variant of the "if" statement takes an EXPRESSION, a CCL BLOCK, and an optional SECOND CCL BLOCK as arguments. The EXPRESSION must have the form `(REG OPERATOR OPERAND)' (where OPERAND is a register or an integer). The `read-if' statement first reads from the input into the first register operand in the EXPRESSION, then conditionally executes a CCL block just as the `if' statement does. The "branch" statement takes an EXPRESSION and one or more CCL blocks as arguments. The CCL blocks are treated as a zero-indexed array, and the `branch' statement uses the EXPRESSION as the index of the CCL block to execute. Null CCL blocks may be used as no-ops, continuing execution with the statement following the `branch' statement in the containing CCL block. Out-of-range values for the EXPRESSION are also treated as no-ops. The "read-branch" variant of the "branch" statement takes a REGISTER and one or more CCL blocks as arguments. The `read-branch' statement first reads from the input into the REGISTER, then uses the value read as the index of the CCL block to execute, just as the `branch' statement does. Loop control statements: ======================== The "loop" statement creates a block with an implied jump from the end of the block back to its head. The loop is exited on a `break' statement, and continued without executing the tail by a `repeat' statement. The "break" statement, written `(break)', terminates the current loop and continues with the next statement in the current block. The "repeat" statement has three variants, `repeat', `write-repeat', and `write-read-repeat'. Each continues the current loop from its head, possibly after performing I/O.
`repeat' takes no arguments and does no I/O before jumping. `write-repeat' takes a single argument (a register, an integer, or a string), writes it to the output, then jumps. `write-read-repeat' takes one or two arguments. The first must be a register. The second may be an integer or an array; if absent, it is implicitly set to the first (register) argument. `write-read-repeat' writes its second argument to the output, then reads from the input into the register, and finally jumps. See the `write' and `read' statements for the semantics of the I/O operations for each type of argument. Other control statements: ========================= The "call" statement, written `(call CCL-PROGRAM-NAME)', executes a CCL program as a subroutine. It does not return a value to the caller, but can modify the register status. The "end" statement, written `(end)', terminates the CCL program successfully, and returns to the caller (which may be a CCL program). It does not alter the status of the registers.  File: lispref.info, Node: CCL Expressions, Next: Calling CCL, Prev: CCL Statements, Up: CCL CCL Expressions --------------- CCL, unlike Lisp, uses infix expressions. The simplest CCL expressions consist of a single OPERAND, either a register (one of `r0', ..., `r7') or an integer. Complex expressions are lists of the form `( EXPRESSION OPERATOR OPERAND )'. Unlike C, assignments are not expressions. In the following table, X is the target register for a "set". In subexpressions, this is implicitly `r7'. This means that `>8', `//', `de-sjis', and `en-sjis' cannot be used freely in subexpressions, since they return parts of their values in `r7'. Y may be an expression, register, or integer, while Z must be a register or an integer.
Name             Operator   Code  C-like Description
CCL_PLUS         `+'        0x00  X = Y + Z
CCL_MINUS        `-'        0x01  X = Y - Z
CCL_MUL          `*'        0x02  X = Y * Z
CCL_DIV          `/'        0x03  X = Y / Z
CCL_MOD          `%'        0x04  X = Y % Z
CCL_AND          `&'        0x05  X = Y & Z
CCL_OR           `|'        0x06  X = Y | Z
CCL_XOR          `^'        0x07  X = Y ^ Z
CCL_LSH          `<<'       0x08  X = Y << Z
CCL_RSH          `>>'       0x09  X = Y >> Z
CCL_LSH8         `<8'       0x0A  X = (Y << 8) | Z
CCL_RSH8         `>8'       0x0B  X = Y >> 8, r[7] = Y & 0xFF
CCL_DIVMOD       `//'       0x0C  X = Y / Z, r[7] = Y % Z
CCL_LS           `<'        0x10  X = (Y < Z)
CCL_GT           `>'        0x11  X = (Y > Z)
CCL_EQ           `=='       0x12  X = (Y == Z)
CCL_LE           `<='       0x13  X = (Y <= Z)
CCL_GE           `>='       0x14  X = (Y >= Z)
CCL_NE           `!='       0x15  X = (Y != Z)
CCL_ENCODE_SJIS  `en-sjis'  0x16  X = HIGHER_BYTE (SJIS (Y, Z)), r[7] = LOWER_BYTE (SJIS (Y, Z))
CCL_DECODE_SJIS  `de-sjis'  0x17  X = HIGHER_BYTE (DE-SJIS (Y, Z)), r[7] = LOWER_BYTE (DE-SJIS (Y, Z))

The CCL operators are as in C, with the addition of CCL_LSH8, CCL_RSH8, CCL_DIVMOD, CCL_ENCODE_SJIS, and CCL_DECODE_SJIS. CCL_ENCODE_SJIS and CCL_DECODE_SJIS treat their first and second operands as the high and low bytes of a two-byte character code. (SJIS stands for Shift JIS, an encoding of Japanese characters used by Microsoft. CCL_ENCODE_SJIS is a complicated transformation of the Japanese standard JIS encoding to Shift JIS. CCL_DECODE_SJIS is its inverse.) It is somewhat odd to represent the SJIS operations in infix form.  File: lispref.info, Node: Calling CCL, Next: CCL Examples, Prev: CCL Expressions, Up: CCL Calling CCL ----------- CCL programs are called automatically during Emacs buffer I/O when the external representation has a coding system type of `shift-jis', `big5', or `ccl'. The program is specified by the coding system (*note Coding Systems::). You can also call CCL programs from other CCL programs, and from Lisp using these functions: - Function: ccl-execute ccl-program status Execute CCL-PROGRAM with registers initialized by STATUS. CCL-PROGRAM is a vector of compiled CCL code created by `ccl-compile'.
It is an error for the program to try to execute a CCL I/O command. STATUS must be a vector of nine values, specifying the initial value for the R0, R1 .. R7 registers and for the instruction counter IC. A `nil' value for a register initializer causes the register to be set to 0. A `nil' value for the IC initializer causes execution to start at the beginning of the program. When the program is done, STATUS is modified (by side-effect) to contain the ending values for the corresponding registers and IC. - Function: ccl-execute-on-string ccl-program status string &optional continue Execute CCL-PROGRAM with initial STATUS on STRING. CCL-PROGRAM is a vector of compiled CCL code created by `ccl-compile'. STATUS must be a vector of nine values, specifying the initial value for the R0, R1 .. R7 registers and for the instruction counter IC. A `nil' value for a register initializer causes the register to be set to 0. A `nil' value for the IC initializer causes execution to start at the beginning of the program. An optional fourth argument CONTINUE, if non-`nil', causes the IC to remain on the unsatisfied read operation if the program terminates due to exhaustion of the input buffer. Otherwise the IC is set to the end of the program. When the program is done, STATUS is modified (by side-effect) to contain the ending values for the corresponding registers and IC. Returns the resulting string. To call a CCL program from another CCL program, it must first be registered: - Function: register-ccl-program name ccl-program Register NAME for CCL program CCL-PROGRAM in `ccl-program-table'. CCL-PROGRAM should be the compiled form of a CCL program, or `nil'. Return index number of the registered CCL program. Information about the processor time used by the CCL interpreter can be obtained using these functions: - Function: ccl-elapsed-time Returns the elapsed processor time of the CCL interpreter as cons of user and system time, as floating point numbers measured in seconds. 
If only one overall value can be determined, the return value will be a cons of that value and 0. - Function: ccl-reset-elapsed-time Resets the CCL interpreter's internal elapsed time registers.  File: lispref.info, Node: CCL Examples, Prev: Calling CCL, Up: CCL CCL Examples ------------ This section is not yet written.  File: lispref.info, Node: Category Tables, Prev: CCL, Up: MULE Category Tables =============== A category table is a type of char table used for keeping track of categories. Categories are used for classifying characters for use in regexps--you can refer to a category rather than having to use a complicated [] expression (and category lookups are significantly faster). There are 95 different categories available, one for each printable character (including space) in the ASCII charset. Each category is designated by one such character, called a "category designator". They are specified in a regexp using the syntax `\cX', where X is a category designator. (This is not yet implemented.) A category table specifies, for each character, the categories that the character is in. Note that a character can be in more than one category. More specifically, a category table maps from a character to either the value `nil' (meaning the character is in no categories) or a 95-element bit vector, specifying for each of the 95 categories whether the character is in that category. Special Lisp functions are provided that abstract this, so you do not have to directly manipulate bit vectors. - Function: category-table-p object This function returns `t' if OBJECT is a category table. - Function: category-table &optional buffer This function returns the current category table. This is the one specified by the current buffer, or by BUFFER if it is non-`nil'. - Function: standard-category-table This function returns the standard category table. This is the one used for new buffers. 
- Function: copy-category-table &optional category-table This function returns a new category table which is a copy of CATEGORY-TABLE, which defaults to the standard category table. - Function: set-category-table category-table &optional buffer This function selects CATEGORY-TABLE as the new category table for BUFFER. BUFFER defaults to the current buffer if omitted. - Function: category-designator-p object This function returns `t' if OBJECT is a category designator (a char in the range `' '' to `'~''). - Function: category-table-value-p object This function returns `t' if OBJECT is a category table value. Valid values are `nil' or a bit vector of size 95.  File: lispref.info, Node: Tips, Next: Building XEmacs and Object Allocation, Prev: MULE, Up: Top Tips and Standards ****************** This chapter describes no additional features of XEmacs Lisp. Instead it gives advice on making effective use of the features described in the previous chapters. * Menu: * Style Tips:: Writing clean and robust programs. * Compilation Tips:: Making compiled code run fast. * Documentation Tips:: Writing readable documentation strings. * Comment Tips:: Conventions for writing comments. * Library Headers:: Standard headers for library packages.  File: lispref.info, Node: Style Tips, Next: Compilation Tips, Up: Tips Writing Clean Lisp Programs =========================== Here are some tips for avoiding common errors in writing Lisp code intended for widespread use: * Since all global variables share the same name space, and all functions share another name space, you should choose a short word to distinguish your program from other Lisp programs. Then take care to begin the names of all global variables, constants, and functions with the chosen prefix. This helps avoid name conflicts. This recommendation applies even to names for traditional Lisp primitives that are not primitives in XEmacs Lisp--even to `cadr'. 
Believe it or not, there is more than one plausible way to define `cadr'. Play it safe; prepend your name prefix to produce a name like `foo-cadr' or `mylib-cadr' instead. If you write a function that you think ought to be added to Emacs under a certain name, such as `twiddle-files', don't call it by that name in your program. Call it `mylib-twiddle-files' in your program, and send mail to `bug-gnu-emacs@prep.ai.mit.edu' suggesting we add it to Emacs. If and when we do, we can change the name easily enough. If one prefix is insufficient, your package may use two or three alternative common prefixes, so long as they make sense. Separate the prefix from the rest of the symbol name with a hyphen, `-'. This will be consistent with XEmacs itself and with most Emacs Lisp programs. * It is often useful to put a call to `provide' in each separate library program, at least if there is more than one entry point to the program. * If a file requires certain other library programs to be loaded beforehand, then the comments at the beginning of the file should say so. Also, use `require' to make sure they are loaded. * If one file FOO uses a macro defined in another file BAR, FOO should contain this expression before the first use of the macro: (eval-when-compile (require 'BAR)) (And BAR should contain `(provide 'BAR)', to make the `require' work.) This will cause BAR to be loaded when you byte-compile FOO. Otherwise, you risk compiling FOO without the necessary macro loaded, and that would produce compiled code that won't work right. *Note Compiling Macros::. Using `eval-when-compile' avoids loading BAR when the compiled version of FOO is _used_. * If you define a major mode, make sure to run a hook variable using `run-hooks', just as the existing major modes do. *Note Hooks::. * If the purpose of a function is to tell you whether a certain condition is true or false, give the function a name that ends in `p'.
If the name is one word, add just `p'; if the name is multiple words, add `-p'. Examples are `framep' and `frame-live-p'. * If a user option variable records a true-or-false condition, give it a name that ends in `-flag'. * Please do not define `C-c LETTER' as a key in your major modes. These sequences are reserved for users; they are the *only* sequences reserved for users, so we cannot do without them. Instead, define sequences consisting of `C-c' followed by a non-letter. These sequences are reserved for major modes. Changing all the major modes in Emacs 18 so they would follow this convention was a lot of work. Abandoning this convention would make that work go to waste, and inconvenience users. * Sequences consisting of `C-c' followed by `{', `}', `<', `>', `:' or `;' are also reserved for major modes. * Sequences consisting of `C-c' followed by any other punctuation character are allocated for minor modes. Using them in a major mode is not absolutely prohibited, but if you do that, the major mode binding may be shadowed from time to time by minor modes. * You should not bind `C-h' following any prefix character (including `C-c'). If you don't bind `C-h', it is automatically available as a help character for listing the subcommands of the prefix character. * You should not bind a key sequence ending in <ESC> except following another <ESC>. (That is, it is ok to bind a sequence ending in `<ESC> <ESC>'.) The reason for this rule is that a non-prefix binding for <ESC> in any context prevents recognition of escape sequences as function keys in that context. * Applications should not bind mouse events based on button 1 with the shift key held down. These events include `S-mouse-1', `M-S-mouse-1', `C-S-mouse-1', and so on. They are reserved for users. * Modes should redefine `mouse-2' as a command to follow some sort of reference in the text of a buffer, if users usually would not want to alter the text in that buffer by hand.
Modes such as Dired, Info, Compilation, and Occur redefine it in this way. * When a package provides a modification of ordinary Emacs behavior, it is good to include a command to enable and disable the feature. Provide a command named `WHATEVER-mode' which turns the feature on or off, and make it autoload (*note Autoload::). Design the package so that simply loading it has no visible effect--that should not enable the feature. Users will request the feature by invoking the command. * It is a bad idea to define aliases for the Emacs primitives. Use the standard names instead. * Redefining an Emacs primitive is an even worse idea. It may do the right thing for a particular program, but there is no telling what other programs might break as a result. * If a file does replace any of the functions or library programs of standard XEmacs, prominent comments at the beginning of the file should say which functions are replaced, and how the behavior of the replacements differs from that of the originals. * Please keep the names of your XEmacs Lisp source files to 13 characters or less. This way, if the files are compiled, the compiled files' names will be 14 characters or less, which is short enough to fit on all kinds of Unix systems. * Don't use `next-line' or `previous-line' in programs; nearly always, `forward-line' is more convenient as well as more predictable and robust. *Note Text Lines::. * Don't call functions that set the mark, unless setting the mark is one of the intended features of your program. The mark is a user-level feature, so it is incorrect to change the mark except to supply a value for the user's benefit. *Note The Mark::. In particular, don't use these functions: * `beginning-of-buffer', `end-of-buffer' * `replace-string', `replace-regexp' If you just want to move point, or replace a certain string, without any of the other features intended for interactive users, you can replace these functions with one or two lines of simple Lisp code.
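The advice about avoiding `beginning-of-buffer' and `replace-string' in programs can be sketched as follows. This is an illustrative example only, not code from XEmacs; the function name `mylib-replace-all' and its `mylib-' prefix are hypothetical.

```elisp
;; Hypothetical helper: replace text without touching the mark.
(defun mylib-replace-all (from to)
  "Replace every occurrence of the string FROM with TO in the current buffer.
FROM must not be empty.  Do not set the mark; restore point afterwards.
Return the number of replacements made."
  (let ((count 0))
    (save-excursion
      (goto-char (point-min))            ; instead of `beginning-of-buffer'
      (while (search-forward from nil t) ; instead of `replace-string'
        (replace-match to t t)           ; fixed case, literal replacement
        (setq count (1+ count))))
    count))
```

Because the function only uses `goto-char', `search-forward', and `replace-match', it neither sets the mark nor prints messages, which is what a caller in a program normally wants.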
* Use lists rather than vectors, except when there is a particular reason to use a vector. Lisp has more facilities for manipulating lists than for vectors, and working with lists is usually more convenient. Vectors are advantageous for tables that are substantial in size and are accessed in random order (not searched front to back), provided there is no need to insert or delete elements (only lists allow that). * The recommended way to print a message in the echo area is with the `message' function, not `princ'. *Note The Echo Area::. * When you encounter an error condition, call the function `error' (or `signal'). The function `error' does not return. *Note Signaling Errors::. Do not use `message', `throw', `sleep-for', or `beep' to report errors. * An error message should start with a capital letter but should not end with a period. * Try to avoid using recursive edits. Instead, do what the Rmail `e' command does: use a new local keymap that contains one command defined to switch back to the old local keymap. Or do what the `edit-options' command does: switch to another buffer and let the user switch back at will. *Note Recursive Editing::. * In some other systems there is a convention of choosing variable names that begin and end with `*'. We don't use that convention in Emacs Lisp, so please don't use it in your programs. (Emacs uses such names only for program-generated buffers.) The users will find Emacs more coherent if all libraries use the same conventions. * Use names starting with a space for temporary buffers (*note Buffer Names::), or at least call `buffer-disable-undo' on them. Otherwise they may remain referenced by internal undo variables after being killed. If this happens before dumping (*note Building XEmacs::), it may cause a fatal error when the portable dumper is used. * Indent each function with `C-M-q' (`indent-sexp') using the default indentation parameters.
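The conventions above for reporting errors can be illustrated with a short sketch. The function `mylib-checked-nth' is a hypothetical example written for this purpose; it is not part of XEmacs.

```elisp
;; Hypothetical helper showing how to report an error condition.
(defun mylib-checked-nth (n list)
  "Return element N of LIST, counting from zero.
Signal an error if N is out of range."
  (if (or (< n 0) (>= n (length list)))
      ;; Call `error' rather than `message' or `beep'; the message
      ;; starts with a capital letter and has no final period.
      (error "Index %d out of range" n)
    (nth n list)))
```

Since `error' does not return, callers that want to recover can wrap the call in `condition-case'.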
* Don't make a habit of putting close-parentheses on lines by themselves; Lisp programmers find this disconcerting. Once in a while, when there is a sequence of many consecutive close-parentheses, it may make sense to split them in one or two significant places. * Please put a copyright notice on the file if you give copies to anyone. Use the same lines that appear at the top of the Lisp files in XEmacs itself. If you have not signed papers to assign the copyright to the Foundation, then place your name in the copyright notice in place of the Foundation's name.  File: lispref.info, Node: Compilation Tips, Next: Documentation Tips, Prev: Style Tips, Up: Tips Tips for Making Compiled Code Fast ================================== Here are ways of improving the execution speed of byte-compiled Lisp programs. * Use the `profile' library to profile your program. See the file `profile.el' for instructions. * Use iteration rather than recursion whenever possible. Function calls are slow in XEmacs Lisp even when a compiled function is calling another compiled function. * Using the primitive list-searching functions `memq', `member', `assq', or `assoc' is even faster than explicit iteration. It may be worth rearranging a data structure so that one of these primitive search functions can be used. * Certain built-in functions are handled specially in byte-compiled code, avoiding the need for an ordinary function call. It is a good idea to use these functions rather than alternatives. To see whether a function is handled specially by the compiler, examine its `byte-compile' property. If the property is non-`nil', then the function is handled specially. 
For example, the following input will show you that `aref' is compiled specially (*note Array Functions::) while `elt' is not (*note Sequence Functions::): (get 'aref 'byte-compile) => byte-compile-two-args (get 'elt 'byte-compile) => nil * If calling a small function accounts for a substantial part of your program's running time, make the function inline. This eliminates the function call overhead. Since making a function inline reduces the flexibility of changing the program, don't do it unless it gives a noticeable speedup in something slow enough that users care about the speed. *Note Inline Functions::.  File: lispref.info, Node: Documentation Tips, Next: Comment Tips, Prev: Compilation Tips, Up: Tips Tips for Documentation Strings ============================== Here are some tips for the writing of documentation strings. * Every command, function, or variable intended for users to know about should have a documentation string. * An internal variable or subroutine of a Lisp program might as well have a documentation string. In earlier Emacs versions, you could save space by using a comment instead of a documentation string, but that is no longer the case. * The first line of the documentation string should consist of one or two complete sentences that stand on their own as a summary. `M-x apropos' displays just the first line, and if it doesn't stand on its own, the result looks bad. In particular, start the first line with a capital letter and end with a period. The documentation string can have additional lines that expand on the details of how to use the function or variable. The additional lines should be made up of complete sentences also, but they may be filled if that looks good. * For consistency, phrase the verb in the first sentence of a documentation string as an infinitive with "to" omitted. For instance, use "Return the cons of A and B." in preference to "Returns the cons of A and B." 
Usually it looks good to do likewise for the rest of the first paragraph. Subsequent paragraphs usually look better if they have proper subjects. * Write documentation strings in the active voice, not the passive, and in the present tense, not the future. For instance, use "Return a list containing A and B." instead of "A list containing A and B will be returned." * Avoid using the word "cause" (or its equivalents) unnecessarily. Instead of, "Cause Emacs to display text in boldface," write just "Display text in boldface." * Do not start or end a documentation string with whitespace. * Format the documentation string so that it fits in an Emacs window on an 80-column screen. It is a good idea for most lines to be no wider than 60 characters. The first line can be wider if necessary to fit the information that ought to be there. However, rather than simply filling the entire documentation string, you can make it much more readable by choosing line breaks with care. Use blank lines between topics if the documentation string is long. * *Do not* indent subsequent lines of a documentation string so that the text is lined up in the source code with the text of the first line. This looks nice in the source code, but looks bizarre when users view the documentation. Remember that the indentation before the starting double-quote is not part of the string! * A variable's documentation string should start with `*' if the variable is one that users would often want to set interactively. If the value is a long list, or a function, or if the variable would be set only in init files, then don't start the documentation string with `*'. *Note Defining Variables::. * The documentation string for a variable that is a yes-or-no flag should start with words such as "Non-nil means...", to make it clear that all non-`nil' values are equivalent and indicate explicitly what `nil' and non-`nil' mean. 
* When a function's documentation string mentions the value of an argument of the function, use the argument name in capital letters as if it were a name for that value. Thus, the documentation string of the function `/' refers to its second argument as `DIVISOR', because the actual argument name is `divisor'. Also use all caps for meta-syntactic variables, such as when you show the decomposition of a list or vector into subunits, some of which may vary. * When a documentation string refers to a Lisp symbol, write it as it would be printed (which usually means in lower case), with single-quotes around it. For example: `lambda'. There are two exceptions: write t and nil without single-quotes. (In this manual, we normally do use single-quotes for those symbols.) * Don't write key sequences directly in documentation strings. Instead, use the `\\[...]' construct to stand for them. For example, instead of writing `C-f', write `\\[forward-char]'. When Emacs displays the documentation string, it substitutes whatever key is currently bound to `forward-char'. (This is normally `C-f', but it may be some other character if the user has moved key bindings.) *Note Keys in Documentation::. * In documentation strings for a major mode, you will want to refer to the key bindings of that mode's local map, rather than global ones. Therefore, use the construct `\\<...>' once in the documentation string to specify which key map to use. Do this before the first use of `\\[...]'. The text inside the `\\<...>' should be the name of the variable containing the local keymap for the major mode. It is not practical to use `\\[...]' very many times, because display of the documentation string will become slow. So use this to describe the most important commands in your major mode, and then use `\\{...}' to display the rest of the mode's keymap.  
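As a minimal sketch of the documentation-string tips in this node, here are two hypothetical definitions. The names `my-demo-verbose' and `my-demo-pair' are invented for illustration and are not part of XEmacs. The sketch assumes only standard primitives (`defvar', `defun', `cons', `message') and tries to show a self-contained first line, an infinitive verb, argument names in capitals, quoted symbol names, unquoted nil, the leading `*' for a user option, unindented continuation lines, and the `\\[...]' construct in place of a literal key sequence.

```elisp
;; Hypothetical user option.  The leading `*' marks it as a variable
;; that users may want to set interactively.
(defvar my-demo-verbose nil
  "*Non-nil means `my-demo-pair' displays its result in the echo area.
nil means it works silently.")

;; Hypothetical function.  The first docstring line is a complete
;; sentence, and continuation lines start at the left margin.
(defun my-demo-pair (a b)
  "Return the cons of A and B.
If `my-demo-verbose' is non-nil, also display the pair in the
echo area.

To try this function, write a call to it in a Lisp buffer and
type \\[eval-last-sexp] after the closing parenthesis."
  (let ((pair (cons a b)))
    (if my-demo-verbose
        (message "Pair: %S" pair))
    pair))
```

With these definitions loaded, evaluating `(my-demo-pair 1 2)' returns `(1 . 2)'. When the documentation string is displayed, `\\[eval-last-sexp]' is replaced by whatever key is currently bound to that command.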
File: lispref.info, Node: Comment Tips, Next: Library Headers, Prev: Documentation Tips, Up: Tips Tips on Writing Comments ======================== We recommend these conventions for where to put comments and how to indent them: `;' Comments that start with a single semicolon, `;', should all be aligned to the same column on the right of the source code. Such comments usually explain how the code on the same line does its job. In Lisp mode and related modes, the `M-;' (`indent-for-comment') command automatically inserts such a `;' in the right place, or aligns such a comment if it is already present. This and following examples are taken from the Emacs sources. (setq base-version-list ; there was a base (assoc (substring fn 0 start-vn) ; version to which file-version-assoc-list)) ; this looks like ; a subversion `;;' Comments that start with two semicolons, `;;', should be aligned to the same level of indentation as the code. Such comments usually describe the purpose of the following lines or the state of the program at that point. For example: (prog1 (setq auto-fill-function ... ... ;; update modeline (redraw-modeline))) Every function that has no documentation string (because it is used only internally within the package it belongs to), should have instead a two-semicolon comment right before the function, explaining what the function does and how to call it properly. Explain precisely what each argument means and how the function interprets its possible values. `;;;' Comments that start with three semicolons, `;;;', should start at the left margin. Such comments are used outside function definitions to make general statements explaining the design principles of the program. For example: ;;; This Lisp code is run in XEmacs ;;; when it is to operate as a server ;;; for other processes. Another use for triple-semicolon comments is for commenting out lines within a function. We use triple-semicolons for this precisely so that they remain at the left margin. 
(defun foo (a) ;;; This is no longer necessary. ;;; (force-mode-line-update) (message "Finished with %s" a)) `;;;;' Comments that start with four semicolons, `;;;;', should be aligned to the left margin and are used for headings of major sections of a program. For example: ;;;; The kill ring The indentation commands of the Lisp modes in XEmacs, such as `M-;' (`indent-for-comment') and `<TAB>' (`lisp-indent-line'), automatically indent comments according to these conventions, depending on the number of semicolons. *Note Manipulating Comments: (xemacs)Comments.  File: lispref.info, Node: Library Headers, Prev: Comment Tips, Up: Tips Conventional Headers for XEmacs Libraries ========================================= XEmacs has conventions for using special comments in Lisp libraries to divide them into sections and give information such as who wrote them. This section explains these conventions. First, an example: ;;; lisp-mnt.el --- minor mode for Emacs Lisp maintainers ;; Copyright (C) 1992 Free Software Foundation, Inc. ;; Author: Eric S. Raymond ;; Maintainer: Eric S. Raymond ;; Created: 14 Jul 1992 ;; Version: 1.2 ;; Keywords: docs ;; This file is part of XEmacs. COPYING PERMISSIONS... The very first line should have this format: ;;; FILENAME --- DESCRIPTION The description should be complete in one line. After the copyright notice come several "header comment" lines, each beginning with `;; HEADER-NAME:'. Here is a table of the conventional possibilities for HEADER-NAME: `Author' This line states the name and net address of at least the principal author of the library. If there are multiple authors, you can list them on continuation lines led by `;;' and a tab character, like this: ;; Author: Ashwin Ram ;; Dave Sill ;; Dave Brennan ;; Eric Raymond `Maintainer' This line should contain a single name/address as in the Author line, or an address only, or the string `FSF'. If there is no maintainer line, the person(s) in the Author field are presumed to be the maintainers.
The example above is mildly bogus because the maintainer line is redundant. The idea behind the `Author' and `Maintainer' lines is to make possible a Lisp function to "send mail to the maintainer" without having to mine the name out by hand. Be sure to surround the network address with `<...>' if you include the person's full name as well as the network address. `Created' This optional line gives the original creation date of the file. For historical interest only. `Version' If you wish to record version numbers for the individual Lisp program, put them in this line. `Adapted-By' In this header line, place the name of the person who adapted the library for installation (to make it fit the style conventions, for example). `Keywords' This line lists keywords for the `finder-by-keyword' help command. This field is important; it's how people will find your package when they're looking for things by topic area. To separate the keywords, you can use spaces, commas, or both. Just about every Lisp library ought to have the `Author' and `Keywords' header comment lines. Use the others if they are appropriate. You can also put in header lines with other header names--they have no standard meanings, so they can't do any harm. We use additional stylized comments to subdivide the contents of the library file. Here is a table of them: `;;; Commentary:' This begins introductory comments that explain how the library works. It should come right after the copying permissions. `;;; Change log:' This begins change log information stored in the library file (if you store the change history there). For most of the Lisp files distributed with XEmacs, the change history is kept in the file `ChangeLog' and not in the source file at all; these files do not have a `;;; Change log:' line. `;;; Code:' This begins the actual code of the program. `;;; FILENAME ends here' This is the "footer line"; it appears at the very end of the file. 
Its purpose is to enable people to detect truncated versions of the file from the lack of a footer line.  File: lispref.info, Node: Building XEmacs and Object Allocation, Next: Standard Errors, Prev: Tips, Up: Top Building XEmacs; Allocation of Objects ************************************** This chapter describes how the runnable XEmacs executable is dumped with the preloaded Lisp libraries in it and how storage is allocated. There is an entire separate document, the `XEmacs Internals Manual', devoted to the internals of XEmacs from the perspective of the C programmer. It contains much more detailed information about the build process, the allocation and garbage-collection process, and other aspects related to the internals of XEmacs. * Menu: * Building XEmacs:: How to preload Lisp libraries into XEmacs. * Pure Storage:: A kludge to make preloaded Lisp functions sharable. * Garbage Collection:: Reclaiming space for Lisp objects no longer used.  File: lispref.info, Node: Building XEmacs, Next: Pure Storage, Up: Building XEmacs and Object Allocation Building XEmacs =============== This section explains the steps involved in building the XEmacs executable. You don't have to know this material to build and install XEmacs, since the makefiles do all these things automatically. This information is pertinent to XEmacs maintenance. The `XEmacs Internals Manual' contains more information about this. Compilation of the C source files in the `src' directory produces an executable file called `temacs', also called a "bare impure XEmacs". It contains the XEmacs Lisp interpreter and I/O routines, but not the editing commands. Before XEmacs is actually usable, a number of Lisp files need to be loaded. These define all the editing commands, plus most of the startup code and many very basic Lisp primitives. This is accomplished by loading the file `loadup.el', which in turn loads all of the other standardly-loaded Lisp files. 
It takes a substantial time to load the standard Lisp files. Luckily, you don't have to do this each time you run XEmacs; `temacs' can dump out an executable program called `xemacs' that has these files preloaded. `xemacs' starts more quickly because it does not need to load the files. This is the XEmacs executable that is normally installed. To create `xemacs', use the command `temacs -batch -l loadup dump'. The purpose of `-batch' here is to tell `temacs' to run in non-interactive, command-line mode. (`temacs' can _only_ run in this fashion. Part of the code required to initialize frames and faces is in Lisp, and must be loaded before XEmacs is able to create any frames.) The argument `dump' tells `loadup.el' to dump a new executable named `xemacs'. The dumping process is highly system-specific, and some operating systems don't support dumping. On those systems, you must start XEmacs with the `temacs -batch -l loadup run-temacs' command each time you use it. This takes a substantial time, but since you need to start Emacs once a day at most--or once a week if you never log out--the extra time is not too severe a problem. (In older versions of Emacs, you started Emacs from `temacs' using `temacs -l loadup'.) You are free to start XEmacs directly from `temacs' if you want, even if there is already a dumped `xemacs'. Normally you wouldn't want to do that; but the Makefiles do this when you rebuild XEmacs using `make all-elc', which builds XEmacs and simultaneously compiles any out-of-date Lisp files. (You need `xemacs' in order to compile Lisp files. However, you also need the compiled Lisp files in order to dump out `xemacs'. If both of these are missing or corrupted, you are out of luck unless you're able to bootstrap `xemacs' from `temacs'. Note that `make all-elc' actually loads the alternative loadup file `loadup-el.el', which works like `loadup.el' but disables the pure-copying process and forces XEmacs to ignore any compiled Lisp files even if they exist.) 
You can specify additional files to preload by writing a library named `site-load.el' that loads them. You may need to increase the value of `PURESIZE', in `src/puresize.h', to make room for the additional files. You should _not_ modify this file directly, however; instead, use the `--puresize' configuration option. (If you run out of pure space while dumping `xemacs', you will be told how much pure space you actually will need.) However, the advantage of preloading additional files decreases as machines get faster. On modern machines, it is often not advisable, especially if the Lisp code is on a file system local to the machine running XEmacs. You can specify other Lisp expressions to execute just before dumping by putting them in a library named `site-init.el'. However, if they might alter the behavior that users expect from an ordinary unmodified XEmacs, it is better to put them in `default.el', so that users can override them if they wish. *Note Start-up Summary::. Before `loadup.el' dumps the new executable, it finds the documentation strings for primitive and preloaded functions (and variables) in the file where they are stored, by calling `Snarf-documentation' (*note Accessing Documentation::). These strings were moved out of the `xemacs' executable to make it smaller. *Note Documentation Basics::. - Function: dump-emacs to-file from-file This function dumps the current state of XEmacs into an executable file TO-FILE. It takes symbols from FROM-FILE (this is normally the executable file `temacs'). If you use this function in an XEmacs that was already dumped, you must set `command-line-processed' to `nil' first for good results. *Note Command Line Arguments::. - Function: run-emacs-from-temacs &rest args This is the function that implements the `run-temacs' command-line argument. It is called from `loadup.el' as appropriate. You should most emphatically _not_ call this yourself; it will reinitialize your XEmacs process and you'll be sorry. 
- Command: emacs-version &optional arg This function returns a string describing the version of XEmacs that is running. It is useful to include this string in bug reports. When called interactively with a prefix argument, it inserts the string at point. Don't use this function in programs to choose actions according to the system configuration; look at `system-configuration' instead. (emacs-version) => "XEmacs 20.1 [Lucid] (i586-unknown-linux2.0.29) of Mon Apr 7 1997 on altair.xemacs.org" Called interactively, the function prints the same information in the echo area. - Variable: emacs-build-time The value of this variable is the time at which XEmacs was built at the local site. emacs-build-time => "Mon Apr 7 20:28:52 1997" - Variable: emacs-version The value of this variable is the version of Emacs being run. It is a string, e.g. `"20.1 XEmacs Lucid"'. The following two variables did not exist before FSF GNU Emacs version 19.23 and XEmacs version 19.10, which reduces their usefulness at present, but we hope they will be convenient in the future. - Variable: emacs-major-version The major version number of Emacs, as an integer. For XEmacs version 20.1, the value is 20. - Variable: emacs-minor-version The minor version number of Emacs, as an integer. For XEmacs version 20.1, the value is 1.  File: lispref.info, Node: Pure Storage, Next: Garbage Collection, Prev: Building XEmacs, Up: Building XEmacs and Object Allocation Pure Storage ============ XEmacs Lisp uses two kinds of storage for user-created Lisp objects: "normal storage" and "pure storage". Normal storage is where all the new data created during an XEmacs session is kept; see the following section for information on normal storage. Pure storage is used for certain data in the preloaded standard Lisp files--data that should never change during actual use of XEmacs. Pure storage is allocated only while `temacs' is loading the standard preloaded Lisp libraries.
In the file `xemacs', it is marked as read-only (on operating systems that permit this), so that the memory space can be shared by all the XEmacs jobs running on the machine at once. Pure storage is not expandable; a fixed amount is allocated when XEmacs is compiled, and if that is not sufficient for the preloaded libraries, `temacs' aborts with an error message. If that happens, you must increase the compilation parameter `PURESIZE' using the `--puresize' option to `configure'. This normally won't happen unless you try to preload additional libraries or add features to the standard ones. - Function: purecopy object This function makes a copy of OBJECT in pure storage and returns it. It copies strings by simply making a new string with the same characters in pure storage. It recursively copies the contents of vectors and cons cells. It does not make copies of other objects such as symbols, but just returns them unchanged. It signals an error if asked to copy markers. This function is a no-op in XEmacs, and its use in new code is deprecated. - Variable: pure-bytes-used The value of this variable is the number of bytes of pure storage allocated so far. Typically, in a dumped XEmacs, this number is very close to the total amount of pure storage available--if it were not, we would preallocate less. - Variable: purify-flag This variable determines whether `defun' should make a copy of the function definition in pure storage. If it is non-`nil', then the function definition is copied into pure storage. This flag is `t' while loading all of the basic functions for building XEmacs initially (allowing those functions to be sharable and non-collectible). Dumping XEmacs as an executable always writes `nil' in this variable, regardless of the value it actually has before and after dumping. You should not change this flag in a running XEmacs.  
File: lispref.info, Node: Garbage Collection, Prev: Pure Storage, Up: Building XEmacs and Object Allocation Garbage Collection ================== When a program creates a list or the user defines a new function (such as by loading a library), that data is placed in normal storage. If normal storage runs low, then XEmacs asks the operating system to allocate more memory in blocks of 2k bytes. Each block is used for one type of Lisp object, so symbols, cons cells, markers, etc., are segregated in distinct blocks in memory. (Vectors, long strings, buffers and certain other editing types, which are fairly large, are allocated in individual blocks, one per object, while small strings are packed into blocks of 8k bytes. [More correctly, a string is allocated in two sections: a fixed size chunk containing the length, list of extents, etc.; and a chunk containing the actual characters in the string. It is this latter chunk that is either allocated individually or packed into 8k blocks. The fixed size chunk is packed into 2k blocks, as for conses, markers, etc.]) It is quite common to use some storage for a while, then release it by (for example) killing a buffer or deleting the last pointer to an object. XEmacs provides a "garbage collector" to reclaim this abandoned storage. (This name is traditional, but "garbage recycler" might be a more intuitive metaphor for this facility.) The garbage collector operates by finding and marking all Lisp objects that are still accessible to Lisp programs. To begin with, it assumes all the symbols, their values and associated function definitions, and any data presently on the stack, are accessible. Any objects that can be reached indirectly through other accessible objects are also accessible. When marking is finished, all objects still unmarked are garbage. No matter what the Lisp program or the user does, it is impossible to refer to them, since there is no longer a way to reach them. 
Their space might as well be reused, since no one will miss them. The second ("sweep") phase of the garbage collector arranges to reuse them. The sweep phase puts unused cons cells onto a "free list" for future allocation; likewise for symbols, markers, extents, events, floats, compiled-function objects, and the fixed-size portion of strings. It compacts the accessible small string-chars chunks so they occupy fewer 8k blocks; then it frees the other 8k blocks. Vectors, buffers, windows, and other large objects are individually allocated and freed using `malloc' and `free'. Common Lisp note: unlike other Lisps, XEmacs Lisp does not call the garbage collector when the free list is empty. Instead, it simply requests the operating system to allocate more storage, and processing continues until `gc-cons-threshold' bytes have been used. This means that you can make sure that the garbage collector will not run during a certain portion of a Lisp program by calling the garbage collector explicitly just before it (provided that portion of the program does not use so much space as to force a second garbage collection). - Command: garbage-collect This command runs a garbage collection, and returns information on the amount of space in use. (Garbage collection can also occur spontaneously if you use more than `gc-cons-threshold' bytes of Lisp data since the previous garbage collection.) `garbage-collect' returns a list containing the following information: ((USED-CONSES . FREE-CONSES) (USED-SYMS . FREE-SYMS) (USED-MARKERS . FREE-MARKERS) USED-STRING-CHARS USED-VECTOR-SLOTS (PLIST)) => ((73362 . 8325) (13718 . 164) (5089 . 
5098) 949121 118677 (conses-used 73362 conses-free 8329 cons-storage 658168 symbols-used 13718 symbols-free 164 symbol-storage 335216 bit-vectors-used 0 bit-vectors-total-length 0 bit-vector-storage 0 vectors-used 7882 vectors-total-length 118677 vector-storage 537764 compiled-functions-used 1336 compiled-functions-free 37 compiled-function-storage 44440 short-strings-used 28829 long-strings-used 2 strings-free 7722 short-strings-total-length 916657 short-string-storage 1179648 long-strings-total-length 32464 string-header-storage 441504 floats-used 3 floats-free 43 float-storage 2044 markers-used 5089 markers-free 5098 marker-storage 245280 events-used 103 events-free 835 event-storage 110656 extents-used 10519 extents-free 2718 extent-storage 372736 extent-auxiliarys-used 111 extent-auxiliarys-freed 3 extent-auxiliary-storage 4440 window-configurations-used 39 window-configurations-on-free-list 5 window-configurations-freed 10 window-configuration-storage 9492 popup-datas-used 3 popup-data-storage 72 toolbar-buttons-used 62 toolbar-button-storage 4960 toolbar-datas-used 12 toolbar-data-storage 240 symbol-value-buffer-locals-used 182 symbol-value-buffer-local-storage 5824 symbol-value-lisp-magics-used 22 symbol-value-lisp-magic-storage 1496 symbol-value-varaliases-used 43 symbol-value-varalias-storage 1032 opaque-lists-used 2 opaque-list-storage 48 color-instances-used 12 color-instance-storage 288 font-instances-used 5 font-instance-storage 180 opaques-used 11 opaque-storage 312 range-tables-used 1 range-table-storage 16 faces-used 34 face-storage 2584 glyphs-used 124 glyph-storage 4464 specifiers-used 775 specifier-storage 43869 weak-lists-used 786 weak-list-storage 18864 char-tables-used 40 char-table-storage 41920 buffers-used 25 buffer-storage 7000 extent-infos-used 457 extent-infos-freed 73 extent-info-storage 9140 keymaps-used 275 keymap-storage 12100 consoles-used 4 console-storage 384 command-builders-used 2 command-builder-storage 120 devices-used 2 
device-storage 344 frames-used 3 frame-storage 624 image-instances-used 47 image-instance-storage 3008 windows-used 27 windows-freed 2 window-storage 9180 lcrecord-lists-used 15 lcrecord-list-storage 360 hash-tables-used 631 hash-table-storage 25240 streams-used 1 streams-on-free-list 3 streams-freed 12 stream-storage 91)) Here is a table explaining each element: USED-CONSES The number of cons cells in use. FREE-CONSES The number of cons cells for which space has been obtained from the operating system, but that are not currently being used. USED-SYMS The number of symbols in use. FREE-SYMS The number of symbols for which space has been obtained from the operating system, but that are not currently being used. USED-MARKERS The number of markers in use. FREE-MARKERS The number of markers for which space has been obtained from the operating system, but that are not currently being used. USED-STRING-CHARS The total size of all strings, in characters. USED-VECTOR-SLOTS The total number of elements of existing vectors. PLIST A list of alternating keyword/value pairs providing more detailed information. (As you can see above, quite a lot of information is provided.) - User Option: gc-cons-threshold The value of this variable is the number of bytes of storage that must be allocated for Lisp objects after one garbage collection in order to trigger another garbage collection. A cons cell counts as eight bytes, a string as one byte per character plus a few bytes of overhead, and so on; space allocated to the contents of buffers does not count. Note that the subsequent garbage collection does not happen immediately when the threshold is exhausted, but only the next time the Lisp evaluator is called. The initial threshold value is 500,000. If you specify a larger value, garbage collection will happen less often. This reduces the amount of time spent garbage collecting, but increases total memory use. 
You may want to do this when running a program that creates lots of Lisp data. You can make collections more frequent by specifying a smaller value, down to 10,000. A value less than 10,000 will remain in effect only until the subsequent garbage collection, at which time `garbage-collect' will set the threshold back to 10,000. (This does not apply if XEmacs was configured with `--debug'. Therefore, be careful when setting `gc-cons-threshold' in that case!) - Variable: pre-gc-hook This is a normal hook to be run just before each garbage collection. Interrupts, garbage collection, and errors are inhibited while this hook runs, so be extremely careful in what you add here. In particular, avoid consing, and do not interact with the user. - Variable: post-gc-hook This is a normal hook to be run just after each garbage collection. Interrupts, garbage collection, and errors are inhibited while this hook runs, so be extremely careful in what you add here. In particular, avoid consing, and do not interact with the user. - Variable: gc-message This is a string to print to indicate that a garbage collection is in progress. This is printed in the echo area. If the selected frame is on a window system and `gc-pointer-glyph' specifies a value (i.e. a pointer image instance) in the domain of the selected frame, the mouse cursor will change instead of this message being printed. - Glyph: gc-pointer-glyph This holds the pointer glyph used to indicate that a garbage collection is in progress. If the selected window is on a window system and this glyph specifies a value (i.e. a pointer image instance) in the domain of the selected window, the cursor will be changed as specified during garbage collection. Otherwise, a message will be printed in the echo area, as controlled by `gc-message'. *Note Glyphs::. If XEmacs was configured with `--debug', you can set the following two variables to get direct information about all the allocation that is happening in a segment of Lisp code. 
- Variable: debug-allocation If non-zero, print out information to stderr about all objects allocated. - Variable: debug-allocation-backtrace Length (in stack frames) of short backtrace printed out by `debug-allocation'.  File: lispref.info, Node: Standard Errors, Next: Standard Buffer-Local Variables, Prev: Building XEmacs and Object Allocation, Up: Top Standard Errors *************** Here is the complete list of the error symbols in standard Emacs, grouped by concept. The list includes each symbol's message (on the `error-message' property of the symbol) and a cross reference to a description of how the error can occur. Each error symbol has an `error-conditions' property that is a list of symbols. Normally this list includes the error symbol itself and the symbol `error'. Occasionally it includes additional symbols, which are intermediate classifications, narrower than `error' but broader than a single error symbol. For example, all the errors in accessing files have the condition `file-error'. As a special exception, the error symbol `quit' does not have the condition `error', because quitting is not considered an error. *Note Errors::, for an explanation of how errors are generated and handled. `SYMBOL' STRING; REFERENCE. `error' `"error"' *Note Errors::. `quit' `"Quit"' *Note Quitting::. `args-out-of-range' `"Args out of range"' *Note Sequences Arrays Vectors::. `arith-error' `"Arithmetic error"' See `/' and `%' in *Note Numbers::. `beginning-of-buffer' `"Beginning of buffer"' *Note Motion::. `buffer-read-only' `"Buffer is read-only"' *Note Read Only Buffers::. `cyclic-function-indirection' `"Symbol's chain of function indirections contains a loop"' *Note Function Indirection::. `domain-error' `"Arithmetic domain error"' `end-of-buffer' `"End of buffer"' *Note Motion::. `end-of-file' `"End of file during parsing"' This is not a `file-error'. *Note Input Functions::. 
`file-error' This error and its subcategories do not have error-strings, because the error message is constructed from the data items alone when the error condition `file-error' is present. *Note Files::. `file-locked' This is a `file-error'. *Note File Locks::. `file-already-exists' This is a `file-error'. *Note Writing to Files::. `file-supersession' This is a `file-error'. *Note Modification Time::. `invalid-byte-code' `"Invalid byte code"' *Note Byte Compilation::. `invalid-function' `"Invalid function"' *Note Classifying Lists::. `invalid-read-syntax' `"Invalid read syntax"' *Note Input Functions::. `invalid-regexp' `"Invalid regexp"' *Note Regular Expressions::. `mark-inactive' `"The mark is not active now"' `no-catch' `"No catch for tag"' *Note Catch and Throw::. `overflow-error' `"Arithmetic overflow error"' `protected-field' `"Attempt to modify a protected field"' `range-error' `"Arithmetic range error"' `search-failed' `"Search failed"' *Note Searching and Matching::. `setting-constant' `"Attempt to set a constant symbol"' *Note Variables that Never Change: Constant Variables. `singularity-error' `"Arithmetic singularity error"' `tooltalk-error' `"ToolTalk error"' *Note ToolTalk Support::. `undefined-keystroke-sequence' `"Undefined keystroke sequence"' `void-function' `"Symbol's function definition is void"' *Note Function Cells::. `void-variable' `"Symbol's value as variable is void"' *Note Accessing Variables::. `wrong-number-of-arguments' `"Wrong number of arguments"' *Note Classifying Lists::. `wrong-type-argument' `"Wrong type argument"' *Note Type Predicates::. These error types, which are all classified as special cases of `arith-error', can occur on certain systems for invalid use of mathematical functions. `domain-error' `"Arithmetic domain error"' *Note Math Functions::. `overflow-error' `"Arithmetic overflow error"' *Note Math Functions::. `range-error' `"Arithmetic range error"' *Note Math Functions::. 
`singularity-error' `"Arithmetic singularity error"' *Note Math Functions::. `underflow-error' `"Arithmetic underflow error"' *Note Math Functions::.  File: lispref.info, Node: Standard Buffer-Local Variables, Next: Standard Keymaps, Prev: Standard Errors, Up: Top Buffer-Local Variables ********************** The table below lists the general-purpose Emacs variables that are automatically local (when set) in each buffer. Many Lisp packages define such variables for their internal use; we don't list them here. `abbrev-mode' *note Abbrevs:: `auto-fill-function' *note Auto Filling:: `buffer-auto-save-file-name' *note Auto-Saving:: `buffer-backed-up' *note Backup Files:: `buffer-display-table' *note Display Tables:: `buffer-file-format' *note Format Conversion:: `buffer-file-name' *note Buffer File Name:: `buffer-file-number' *note Buffer File Name:: `buffer-file-truename' *note Buffer File Name:: `buffer-file-type' *note Files and MS-DOS:: `buffer-invisibility-spec' *note Invisible Text:: `buffer-offer-save' *note Saving Buffers:: `buffer-read-only' *note Read Only Buffers:: `buffer-saved-size' *note Point:: `buffer-undo-list' *note Undo:: `cache-long-line-scans' *note Text Lines:: `case-fold-search' *note Searching and Case:: `ctl-arrow' *note Usual Display:: `comment-column' *note Comments: (xemacs)Comments. `default-directory' *note System Environment:: `defun-prompt-regexp' *note List Motion:: `fill-column' *note Auto Filling:: `goal-column' *note Moving Point: (xemacs)Moving Point. 
`left-margin' *note Indentation:: `local-abbrev-table' *note Abbrevs:: `local-write-file-hooks' *note Saving Buffers:: `major-mode' *note Mode Help:: `mark-active' *note The Mark:: `mark-ring' *note The Mark:: `minor-modes' *note Minor Modes:: `modeline-buffer-identification' *note Modeline Variables:: `modeline-format' *note Modeline Data:: `modeline-modified' *note Modeline Variables:: `modeline-process' *note Modeline Variables:: `mode-name' *note Modeline Variables:: `overwrite-mode' *note Insertion:: `paragraph-separate' *note Standard Regexps:: `paragraph-start' *note Standard Regexps:: `point-before-scroll' Used for communication between mouse commands and scroll-bar commands. `require-final-newline' *note Insertion:: `selective-display' *note Selective Display:: `selective-display-ellipses' *note Selective Display:: `tab-width' *note Usual Display:: `truncate-lines' *note Truncation:: `vc-mode' *note Modeline Variables::  File: lispref.info, Node: Standard Keymaps, Next: Standard Hooks, Prev: Standard Buffer-Local Variables, Up: Top Standard Keymaps **************** The following symbols are used as the names for various keymaps. Some of these exist when XEmacs is first started; others are loaded only when their respective mode is used. This is not an exhaustive list. Almost all of these maps are used as local maps. Indeed, of the modes that presently exist, only Vip mode and Terminal mode ever change the global keymap. `bookmark-map' A keymap containing bindings to bookmark functions. `Buffer-menu-mode-map' A keymap used by Buffer Menu mode. `c++-mode-map' A keymap used by C++ mode. `c-mode-map' A sparse keymap used by C mode. `command-history-map' A keymap used by Command History mode. `ctl-x-4-map' A keymap for subcommands of the prefix `C-x 4'. `ctl-x-5-map' A keymap for subcommands of the prefix `C-x 5'. `ctl-x-map' A keymap for `C-x' commands. 
`debugger-mode-map' A keymap used by Debugger mode. `dired-mode-map' A keymap for `dired-mode' buffers. `edit-abbrevs-map' A keymap used in `edit-abbrevs'. `edit-tab-stops-map' A keymap used in `edit-tab-stops'. `electric-buffer-menu-mode-map' A keymap used by Electric Buffer Menu mode. `electric-history-map' A keymap used by Electric Command History mode. `emacs-lisp-mode-map' A keymap used by Emacs Lisp mode. `help-map' A keymap for characters following the Help key. `Helper-help-map' A keymap used by the help utility package. It has the same keymap in its value cell and in its function cell. `Info-edit-map' A keymap used by the `e' command of Info. `Info-mode-map' A keymap containing Info commands. `isearch-mode-map' A keymap that defines the characters you can type within incremental search. `itimer-edit-map' A keymap used when in Itimer Edit mode. `lisp-interaction-mode-map' A keymap used by Lisp Interaction mode. `lisp-mode-map' A keymap used by Lisp mode. `minibuffer-local-completion-map' A keymap for minibuffer input with completion. `minibuffer-local-isearch-map' A keymap for editing isearch strings in the minibuffer. `minibuffer-local-map' Default keymap to use when reading from the minibuffer. `minibuffer-local-must-match-map' A keymap for minibuffer input with completion, for exact match. `mode-specific-map' The keymap for characters following `C-c'. Note that this is in the global map. This map is not actually mode specific: its name was chosen to be informative for the user in `C-h b' (`describe-bindings'), where it describes the main use of the `C-c' prefix key. `modeline-map' The keymap consulted for mouse-clicks on the modeline of a window. `objc-mode-map' A keymap used in Objective C mode as a local map. `occur-mode-map' A local keymap used by Occur mode. `overriding-local-map' A keymap that overrides all other local keymaps. `query-replace-map' A local keymap used for responses in `query-replace' and related commands; also for `y-or-n-p' and `map-y-or-n-p'. 
The functions that use this map do not support prefix keys; they look up one event at a time. `read-expression-map' The minibuffer keymap used for reading Lisp expressions. `read-shell-command-map' The minibuffer keymap used by `shell-command' and related commands. `shared-lisp-mode-map' A keymap for commands shared by all sorts of Lisp modes. `text-mode-map' A keymap used by Text mode. `toolbar-map' The keymap consulted for mouse-clicks over a toolbar. `view-mode-map' A keymap used by View mode.  File: lispref.info, Node: Standard Hooks, Next: Index, Prev: Standard Keymaps, Up: Top Standard Hooks ************** The following is a list of hook variables that let you provide functions to be called from within Emacs on suitable occasions. Most of these variables have names ending with `-hook'. They are "normal hooks", run by means of `run-hooks'. The value of such a hook is a list of functions. The recommended way to put a new function on such a hook is to call `add-hook'. *Note Hooks::, for more information about using hooks. The variables whose names end in `-function' have single functions as their values. Usually there is a specific reason why the variable is not a normal hook, such as the need to pass arguments to the function. (In older Emacs versions, some of these variables had names ending in `-hook' even though they were not normal hooks.) The variables whose names end in `-hooks' or `-functions' have lists of functions as their values, but these functions are called in a special way (they are passed arguments, or else their values are used). 
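As an illustration of the normal-hook convention described above, here is a minimal sketch. The names `my-demo-hook', `my-demo-log', `my-demo-first' and `my-demo-second' are hypothetical and exist only for this example; `add-hook' and `run-hooks' are the standard functions. It assumes that a non-`nil' third argument to `add-hook' appends the function to the end of the hook rather than putting it at the front.

```elisp
;; Sketch only: every `my-demo-...' name below is hypothetical.

(defvar my-demo-log nil
  "Symbols recorded by the demo hook functions, most recent first.")

(defvar my-demo-hook nil
  "A normal hook: a list of functions, each called with no arguments.")

(defun my-demo-first ()
  "Record that the first demo function ran."
  (setq my-demo-log (cons 'first my-demo-log)))

(defun my-demo-second ()
  "Record that the second demo function ran."
  (setq my-demo-log (cons 'second my-demo-log)))

;; Put the functions on the hook with `add-hook' instead of
;; setting the variable directly.
(add-hook 'my-demo-hook 'my-demo-first)
;; A non-nil third argument appends, so this function runs last.
(add-hook 'my-demo-hook 'my-demo-second t)

;; Call each function on the hook, in order, with no arguments.
(run-hooks 'my-demo-hook)
```

After `run-hooks' returns, `my-demo-log' holds `(second first)', since each function pushes its symbol onto the front of the list. The same pattern applies to the real hooks listed below, for example `(add-hook 'text-mode-hook 'some-function)', where `some-function' stands for any function of no arguments.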
`activate-menubar-hook' `activate-popup-menu-hook' `ad-definition-hooks' `adaptive-fill-function' `add-log-current-defun-function' `after-change-functions' `after-delete-annotation-hook' `after-init-hook' `after-insert-file-functions' `after-revert-hook' `after-save-hook' `after-set-visited-file-name-hooks' `after-write-file-hooks' `auto-fill-function' `auto-save-hook' `before-change-functions' `before-delete-annotation-hook' `before-init-hook' `before-revert-hook' `blink-paren-function' `buffers-menu-switch-to-buffer-function' `c++-mode-hook' `c-delete-function' `c-mode-common-hook' `c-mode-hook' `c-special-indent-hook' `calendar-load-hook' `change-major-mode-hook' `command-history-hook' `comment-indent-function' `compilation-buffer-name-function' `compilation-exit-message-function' `compilation-finish-function' `compilation-parse-errors-function' `compilation-mode-hook' `create-console-hook' `create-device-hook' `create-frame-hook' `dabbrev-friend-buffer-function' `dabbrev-select-buffers-function' `delete-console-hook' `delete-device-hook' `delete-frame-hook' `deselect-frame-hook' `diary-display-hook' `diary-hook' `dired-after-readin-hook' `dired-before-readin-hook' `dired-load-hook' `dired-mode-hook' `disabled-command-hook' `display-buffer-function' `ediff-after-setup-control-frame-hook' `ediff-after-setup-windows-hook' `ediff-before-setup-control-frame-hook' `ediff-before-setup-windows-hook' `ediff-brief-help-message-function' `ediff-cleanup-hook' `ediff-control-frame-position-function' `ediff-display-help-hook' `ediff-focus-on-regexp-matches-function' `ediff-forward-word-function' `ediff-hide-regexp-matches-function' `ediff-keymap-setup-hook' `ediff-load-hook' `ediff-long-help-message-function' `ediff-make-wide-display-function' `ediff-merge-split-window-function' `ediff-meta-action-function' `ediff-meta-redraw-function' `ediff-mode-hook' `ediff-prepare-buffer-hook' `ediff-quit-hook' `ediff-registry-setup-hook' `ediff-select-hook' 
`ediff-session-action-function' `ediff-session-group-setup-hook' `ediff-setup-diff-regions-function' `ediff-show-registry-hook' `ediff-show-session-group-hook' `ediff-skip-diff-region-function' `ediff-split-window-function' `ediff-startup-hook' `ediff-suspend-hook' `ediff-toggle-read-only-function' `ediff-unselect-hook' `ediff-window-setup-function' `edit-picture-hook' `electric-buffer-menu-mode-hook' `electric-command-history-hook' `electric-help-mode-hook' `emacs-lisp-mode-hook' `fill-paragraph-function' `find-file-hooks' `find-file-not-found-hooks' `first-change-hook' `font-lock-after-fontify-buffer-hook' `font-lock-beginning-of-syntax-function' `font-lock-mode-hook' `fume-found-function-hook' `fume-list-mode-hook' `fume-rescan-buffer-hook' `fume-sort-function' `gnus-startup-hook' `hack-local-variables-hook' `highlight-headers-follow-url-function' `hyper-apropos-mode-hook' `indent-line-function' `indent-mim-hook' `indent-region-function' `initial-calendar-window-hook' `isearch-mode-end-hook' `isearch-mode-hook' `java-mode-hook' `kill-buffer-hook' `kill-buffer-query-functions' `kill-emacs-hook' `kill-emacs-query-functions' `kill-hooks' `LaTeX-mode-hook' `latex-mode-hook' `ledit-mode-hook' `lisp-indent-function' `lisp-interaction-mode-hook' `lisp-mode-hook' `list-diary-entries-hook' `load-read-function' `log-message-filter-function' `m2-mode-hook' `mail-citation-hook' `mail-mode-hook' `mail-setup-hook' `make-annotation-hook' `makefile-mode-hook' `map-frame-hook' `mark-diary-entries-hook' `medit-mode-hook' `menu-no-selection-hook' `mh-compose-letter-hook' `mh-folder-mode-hook' `mh-letter-mode-hook' `mim-mode-hook' `minibuffer-exit-hook' `minibuffer-setup-hook' `mode-motion-hook' `mouse-enter-frame-hook' `mouse-leave-frame-hook' `mouse-track-cleanup-hook' `mouse-track-click-hook' `mouse-track-down-hook' `mouse-track-drag-hook' `mouse-track-drag-up-hook' `mouse-track-up-hook' `mouse-yank-function' `news-mode-hook' `news-reply-mode-hook' `news-setup-hook' 
`nongregorian-diary-listing-hook' `nongregorian-diary-marking-hook' `nroff-mode-hook' `objc-mode-hook' `outline-mode-hook' `perl-mode-hook' `plain-TeX-mode-hook' `post-command-hook' `post-gc-hook' `pre-abbrev-expand-hook' `pre-command-hook' `pre-display-buffer-function' `pre-gc-hook' `pre-idle-hook' `print-diary-entries-hook' `prolog-mode-hook' `protect-innocence-hook' `remove-message-hook' `revert-buffer-function' `revert-buffer-insert-contents-function' `rmail-edit-mode-hook' `rmail-mode-hook' `rmail-retry-setup-hook' `rmail-summary-mode-hook' `scheme-indent-hook' `scheme-mode-hook' `scribe-mode-hook' `select-frame-hook' `send-mail-function' `shell-mode-hook' `shell-set-directory-error-hook' `special-display-function' `suspend-hook' `suspend-resume-hook' `temp-buffer-show-function' `term-setup-hook' `terminal-mode-hook' `terminal-mode-break-hook' `TeX-mode-hook' `tex-mode-hook' `text-mode-hook' `today-visible-calendar-hook' `today-invisible-calendar-hook' `tooltalk-message-handler-hook' `tooltalk-pattern-handler-hook' `tooltalk-unprocessed-message-hook' `unmap-frame-hook' `vc-checkin-hook' `vc-checkout-writable-buffer-hook' `vc-log-after-operation-hook' `vc-make-buffer-writable-hook' `view-hook' `vm-arrived-message-hook' `vm-arrived-messages-hook' `vm-chop-full-name-function' `vm-display-buffer-hook' `vm-edit-message-hook' `vm-forward-message-hook' `vm-iconify-frame-hook' `vm-inhibit-write-file-hook' `vm-key-functions' `vm-mail-hook' `vm-mail-mode-hook' `vm-menu-setup-hook' `vm-mode-hook' `vm-quit-hook' `vm-rename-current-buffer-function' `vm-reply-hook' `vm-resend-bounced-message-hook' `vm-resend-message-hook' `vm-retrieved-spooled-mail-hook' `vm-select-message-hook' `vm-select-new-message-hook' `vm-select-unread-message-hook' `vm-send-digest-hook' `vm-summary-mode-hook' `vm-summary-pointer-update-hook' `vm-summary-redo-hook' `vm-summary-update-hook' `vm-undisplay-buffer-hook' `vm-visit-folder-hook' `window-setup-hook' `write-contents-hooks' 
`write-file-data-hooks' `write-file-hooks' `write-region-annotate-functions' `x-lost-selection-hooks' `x-sent-selection-hooks' `zmacs-activate-region-hook' `zmacs-deactivate-region-hook' `zmacs-update-region-hook'
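The entries above whose names end in `-hooks' or `-functions' are not normal hooks: their functions are passed arguments, or their return values are used. The following sketch shows both cases with hypothetical variables (`my-demo-notify-functions', `my-demo-handler-functions') and hypothetical functions; none of them is part of XEmacs. It assumes that `run-hook-with-args' and `run-hook-with-args-until-success' are available, which may not be true of very old releases.

```elisp
;; Sketch only: every `my-demo-...' name below is hypothetical.

(defvar my-demo-seen nil
  "Arguments recorded by `my-demo-remember', most recent first.")

(defvar my-demo-notify-functions nil
  "Abnormal hook: each function is called with one argument, NAME.")

(defun my-demo-remember (name)
  "Record NAME in `my-demo-seen'."
  (setq my-demo-seen (cons name my-demo-seen)))

(add-hook 'my-demo-notify-functions 'my-demo-remember)
;; Every function on the hook receives the argument "report.txt".
(run-hook-with-args 'my-demo-notify-functions "report.txt")

(defvar my-demo-handler-functions nil
  "Abnormal hook: functions are tried in order with one argument, NAME,
until one of them returns non-nil.")

(defvar my-demo-result nil
  "Value returned by the first handler that accepted the request.")

(defun my-demo-decline (name)
  "Refuse to handle NAME by returning nil."
  nil)

(defun my-demo-accept (name)
  "Handle NAME and return a non-nil description."
  (concat "handled " name))

(add-hook 'my-demo-handler-functions 'my-demo-decline)
(add-hook 'my-demo-handler-functions 'my-demo-accept t)
;; The return value of the first function that returns non-nil is used.
(setq my-demo-result
      (run-hook-with-args-until-success
       'my-demo-handler-functions "report.txt"))
```

Here `my-demo-decline' returns `nil', so `my-demo-accept' is tried next and its return value becomes the value of the call. The arguments and return-value conventions of each real abnormal hook differ; consult the documentation of the individual variable before adding a function to it.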