1 This is ../info/lispref.info, produced by makeinfo version 4.8 from
4 INFO-DIR-SECTION XEmacs Editor
6 * Lispref: (lispref). XEmacs Lisp Reference Manual.
11 GNU Emacs Lisp Reference Manual Second Edition (v2.01), May 1993 GNU
12 Emacs Lisp Reference Manual Further Revised (v2.02), August 1993 Lucid
13 Emacs Lisp Reference Manual (for 19.10) First Edition, March 1994
14 XEmacs Lisp Programmer's Manual (for 19.12) Second Edition, April 1995
15 GNU Emacs Lisp Reference Manual v2.4, June 1995 XEmacs Lisp
16 Programmer's Manual (for 19.13) Third Edition, July 1995 XEmacs Lisp
17 Reference Manual (for 19.14 and 20.0) v3.1, March 1996 XEmacs Lisp
18 Reference Manual (for 19.15 and 20.1, 20.2, 20.3) v3.2, April, May,
19 November 1997 XEmacs Lisp Reference Manual (for 21.0) v3.3, April 1998
21 Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995 Free Software
22 Foundation, Inc. Copyright (C) 1994, 1995 Sun Microsystems, Inc.
23 Copyright (C) 1995, 1996 Ben Wing.
25 Permission is granted to make and distribute verbatim copies of this
26 manual provided the copyright notice and this permission notice are
27 preserved on all copies.
29 Permission is granted to copy and distribute modified versions of
30 this manual under the conditions for verbatim copying, provided that the
31 entire resulting derived work is distributed under the terms of a
32 permission notice identical to this one.
34 Permission is granted to copy and distribute translations of this
35 manual into another language, under the above conditions for modified
36 versions, except that this permission notice may be stated in a
37 translation approved by the Foundation.
39 Permission is granted to copy and distribute modified versions of
40 this manual under the conditions for verbatim copying, provided also
41 that the section entitled "GNU General Public License" is included
42 exactly as in the original, and provided that the entire resulting
43 derived work is distributed under the terms of a permission notice
44 identical to this one.
46 Permission is granted to copy and distribute translations of this
47 manual into another language, under the above conditions for modified
48 versions, except that the section entitled "GNU General Public License"
49 may be included in a translation approved by the Free Software
50 Foundation instead of in the original English.
53 File: lispref.info, Node: Flow Control, Next: Batch Mode, Prev: Terminal Output, Up: System Interface
58 This section attempts to answer the question "Why does XEmacs choose to
59 use flow-control characters in its command character set?" For a
60 second view on this issue, read the comments on flow control in the
61 `emacs/INSTALL' file from the distribution; for help with Termcap
62 entries and DEC terminal concentrators, see `emacs/etc/TERMS'.
64 At one time, most terminals did not need flow control, and none used
65 `C-s' and `C-q' for flow control. Therefore, the choice of `C-s' and
66 `C-q' as command characters was uncontroversial. XEmacs, for economy
67 of keystrokes and portability, used nearly all the ASCII control
68 characters, with mnemonic meanings when possible; thus, `C-s' for
69 search and `C-q' for quote.
71 Later, some terminals were introduced which required these characters
72 for flow control. They were not very good terminals for full-screen
73 editing, so XEmacs maintainers did not pay attention. In later years,
74 flow control with `C-s' and `C-q' became widespread among terminals,
75 but by this time it was usually an option. And the majority of users,
76 who can turn flow control off, were unwilling to switch to less
77 mnemonic key bindings for the sake of flow control.
79 So which usage is "right", XEmacs's or that of some terminal and
80 concentrator manufacturers? This question has no simple answer.
82 One reason why we are reluctant to cater to the problems caused by
83 `C-s' and `C-q' is that they are gratuitous. There are other
84 techniques (albeit less common in practice) for flow control that
85 preserve transparency of the character stream. Note also that their use
86 for flow control is not an official standard. Interestingly, on the
87 model 33 teletype with a paper tape punch (which is very old), `C-s'
88 and `C-q' were sent by the computer to turn the punch on and off!
90 As X servers and other window systems replace character-only
terminals, this problem is gradually being cured. In the meantime,
92 XEmacs provides a convenient way of enabling flow control if you want
93 it: call the function `enable-flow-control'.
95 -- Command: enable-flow-control &optional argument
96 This function enables use of `C-s' and `C-q' for output flow
97 control, and provides the characters `C-\' and `C-^' as aliases
for them using `keyboard-translate-table' (*note Translating Input::).
With optional argument ARGUMENT (interactively the prefix
argument), enable flow control mode if ARGUMENT is positive; else disable it.
105 You can use the function `enable-flow-control-on' in your `.emacs'
106 file to enable flow control automatically on certain terminal types.
108 -- Function: enable-flow-control-on &rest termtypes
109 This function enables flow control, and the aliases `C-\' and
110 `C-^', if the terminal type is one of TERMTYPES. For example:
112 (enable-flow-control-on "vt200" "vt300" "vt101" "vt131")
114 Here is how `enable-flow-control' does its job:
116 1. It sets CBREAK mode for terminal input, and tells the operating
117 system to handle flow control, with `(set-input-mode nil t)'.
119 2. It sets up `keyboard-translate-table' to translate `C-\' and `C-^'
120 into `C-s' and `C-q'. Except at its very lowest level, XEmacs
121 never knows that the characters typed were anything but `C-s' and
122 `C-q', so you can in effect type them as `C-\' and `C-^' even when
123 they are input for other commands. *Note Translating Input::.
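
The two steps above can be sketched in Lisp roughly as follows. This is a simplified illustration, not the actual definition of `enable-flow-control'. The name `my-flow-control-sketch' is hypothetical, and using `keyboard-translate' to fill in `keyboard-translate-table' is an assumption, which is why the call is guarded with `fboundp'.

```elisp
(defun my-flow-control-sketch ()
  "Illustrate the two steps described above (hypothetical helper)."
  ;; Step 1: CBREAK mode (first argument nil), with the operating
  ;; system handling C-s/C-q flow control (second argument t).
  (set-input-mode nil t)
  ;; Step 2: make C-\ and C-^ arrive as C-s and C-q.
  ;; Assumption: `keyboard-translate' is available to update
  ;; `keyboard-translate-table'.
  (if (fboundp 'keyboard-translate)
      (progn
        (keyboard-translate ?\C-\\ ?\C-s)
        (keyboard-translate ?\C-^ ?\C-q))))
```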
125 If the terminal is the source of the flow control characters, then
126 once you enable kernel flow control handling, you probably can make do
127 with less padding than normal for that terminal. You can reduce the
128 amount of padding by customizing the Termcap entry. You can also
129 reduce it by setting `baud-rate' to a smaller value so that XEmacs uses
a smaller speed when calculating the padding needed. *Note Terminal Output::.
134 File: lispref.info, Node: Batch Mode, Prev: Flow Control, Up: System Interface
139 The command line option `-batch' causes XEmacs to run noninteractively.
140 In this mode, XEmacs does not read commands from the terminal, it does
141 not alter the terminal modes, and it does not expect to be outputting
142 to an erasable screen. The idea is that you specify Lisp programs to
143 run; when they are finished, XEmacs should exit. The way to specify
144 the programs to run is with `-l FILE', which loads the library named
145 FILE, and `-f FUNCTION', which calls FUNCTION with no arguments.
147 Any Lisp program output that would normally go to the echo area,
148 either using `message' or using `prin1', etc., with `t' as the stream,
149 goes instead to XEmacs's standard error descriptor when in batch mode.
150 Thus, XEmacs behaves much like a noninteractive application program.
151 (The echo area output that XEmacs itself normally generates, such as
152 command echoing, is suppressed entirely.)
154 -- Function: noninteractive
This function returns non-`nil' when XEmacs is running in batch mode.
158 -- Variable: noninteractive
159 This variable is non-`nil' when XEmacs is running in batch mode.
160 Setting this variable to `nil', however, will not change whether
161 XEmacs is running in batch mode, and will not change the return
162 value of the `noninteractive' function.
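
As a small illustration of the points above, a library meant to be run with `-batch' might look like the following sketch. The file name `my-report.el' and both function names are hypothetical.

```elisp
;; my-report.el (hypothetical file name)
(defun my-report-prefix ()
  "Return a prefix showing whether XEmacs is running in batch mode."
  (if noninteractive "[batch] " ""))

(defun my-report-main ()
  "Entry point intended for `-f my-report-main'."
  ;; In batch mode, `message' output goes to standard error
  ;; instead of the echo area.
  (message "%sreport finished" (my-report-prefix)))
```

Under these assumptions it could be run as `xemacs -batch -l my-report.el -f my-report-main'.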
165 File: lispref.info, Node: X-Windows, Next: ToolTalk Support, Prev: System Interface, Up: Top
167 58 Functions Specific to the X Window System
168 ********************************************
170 XEmacs provides the concept of "devices", which generalizes connections
171 to an X server, a TTY device, etc. Most information about an X server
172 that XEmacs is connected to can be determined through general console
173 and device functions. *Note Consoles and Devices::. However, there
174 are some features of the X Window System that do not generalize well,
175 and they are covered specially here.
179 * X Selections:: Transferring text to and from other X clients.
* X Server:: Information about the X server connected to a particular device.
182 * X Miscellaneous:: Other X-specific functions and variables.
185 File: lispref.info, Node: X Selections, Next: X Server, Up: X-Windows
190 The X server records a set of "selections" which permit transfer of
191 data between application programs. The various selections are
192 distinguished by "selection types", represented in XEmacs by symbols.
X clients including XEmacs can read or set the selection for any given type.
196 -- Function: x-own-selection data &optional type
197 This function sets a "selection" in the X server. It takes two
198 arguments: a value, DATA, and the selection type TYPE to assign it
199 to. DATA may be a string, a cons of two markers, or an extent.
200 In the latter cases, the selection is considered to be the text
201 between the markers, or between the extent's endpoints.
203 Each possible TYPE has its own selection value, which changes
204 independently. The usual values of TYPE are `PRIMARY' and
205 `SECONDARY'; these are symbols with upper-case names, in accord
206 with X Windows conventions. The default is `PRIMARY'.
208 (In FSF Emacs, this function is called `x-set-selection' and takes
209 different arguments.)
211 -- Function: x-get-selection
212 This function accesses selections set up by XEmacs or by other X
213 clients. It returns the value of the current primary selection.
215 -- Function: x-disown-selection &optional secondary-p
216 Assuming we own the selection, this function disowns it. If
217 SECONDARY-P is non-`nil', the secondary selection instead of the
218 primary selection is discarded.
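
A minimal sketch of how these functions fit together is shown below. The helper name is hypothetical, and the code assumes that the selected device is an X device. The `fboundp' test only guards against running it where the X selection functions are missing.

```elisp
(defun my-primary-selection-round-trip (text)
  "Own the primary selection with TEXT, then read it back.
Hypothetical helper; assumes an X device is selected."
  (if (and (fboundp 'x-own-selection) (fboundp 'x-get-selection))
      (progn
        ;; TYPE defaults to PRIMARY; it is spelled out for clarity.
        (x-own-selection text 'PRIMARY)
        (x-get-selection))
    (error "X selection functions are not available")))
```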
220 The X server also has a set of numbered "cut buffers" which can
221 store text or other data being moved between applications. Cut buffers
222 are considered obsolete, but XEmacs supports them for the sake of X
223 clients that still use them.
225 -- Function: x-get-cutbuffer &optional n
226 This function returns the contents of cut buffer number N. (This
227 function is called `x-get-cut-buffer' in FSF Emacs.)
229 -- Function: x-store-cutbuffer string &optional push
This function stores STRING into the first cut buffer (cut buffer 0).
233 Normally, the contents of the first cut buffer are simply replaced
234 by STRING. However, if optional argument PUSH is non-`nil', the
235 cut buffers are rotated. This means that the previous value of
236 the first cut buffer moves to the second cut buffer, and the
237 second to the third, and so on, moving the other values down
238 through the series of cut buffers, kill-ring-style. There are 8
239 cut buffers altogether.
Cut buffers are considered obsolete; you should use selections instead.
This function has no effect if support for cut buffers was not compiled in.
247 This function is called `x-set-cut-buffer' in FSF Emacs.
250 File: lispref.info, Node: X Server, Next: X Miscellaneous, Prev: X Selections, Up: X-Windows
255 This section describes how to access and change the overall status of
256 the X server XEmacs is using.
260 * Resources:: Getting resource values from the server.
261 * Server Data:: Getting info about the X server.
262 * Grabs:: Restricting access to the server by other apps.
265 File: lispref.info, Node: Resources, Next: Server Data, Up: X Server
270 -- Function: default-x-device
This function returns the default X device for resourcing. This is
272 the first-created X device that still exists.
-- Function: x-get-resource name class type &optional locale device noerror
276 This function retrieves a resource value from the X resource
279 * The first arg is the name of the resource to retrieve, such as
282 * The second arg is the class of the resource to retrieve, like
285 * The third arg should be one of the symbols `string',
286 `integer', `natnum', or `boolean', specifying the type of
287 object that the database is searched for.
289 * The fourth arg is the locale to search for the resources on,
290 and can currently be a buffer, a frame, a device, or the
291 symbol `global'. If omitted, it defaults to `global'.
293 * The fifth arg is the device to search for the resources on.
294 (The resource database for a particular device is constructed
by combining non-device-specific resources such as any
296 command-line resources specified and any app-defaults files
297 found [or the fallback resources supplied by XEmacs, if no
298 app-defaults file is found] with device-specific resources
299 such as those supplied using `xrdb'.) If omitted, it defaults
300 to the device of LOCALE, if a device can be derived (i.e. if
301 LOCALE is a frame or device), and otherwise defaults to the
302 value of `default-x-device'.
304 * The sixth arg NOERROR, if non-`nil', means do not signal an
305 error if a bogus resource specification was retrieved (e.g.
306 if a non-integer was given when an integer was requested).
307 In this case, a warning is issued instead.
The resource names passed to this function are looked up relative to the locale.
312 If you want to search for a subresource, you just need to specify
313 the resource levels in NAME and CLASS. For example, NAME could be
314 `"modeline.attributeFont"', and CLASS `"Face.AttributeFont"'.
318 1. If LOCALE is a buffer, a call
320 `(x-get-resource "foreground" "Foreground" 'string SOME-BUFFER)'
322 is an interface to a C call something like
324 `XrmGetResource (db, "xemacs.buffer.BUFFER-NAME.foreground",
325 "Emacs.EmacsLocaleType.EmacsBuffer.Foreground",
328 2. If LOCALE is a frame, a call
330 `(x-get-resource "foreground" "Foreground" 'string SOME-FRAME)'
332 is an interface to a C call something like
334 `XrmGetResource (db, "xemacs.frame.FRAME-NAME.foreground",
335 "Emacs.EmacsLocaleType.EmacsFrame.Foreground",
338 3. If LOCALE is a device, a call
340 `(x-get-resource "foreground" "Foreground" 'string SOME-DEVICE)'
342 is an interface to a C call something like
344 `XrmGetResource (db, "xemacs.device.DEVICE-NAME.foreground",
345 "Emacs.EmacsLocaleType.EmacsDevice.Foreground",
348 4. If LOCALE is the symbol `global', a call
350 `(x-get-resource "foreground" "Foreground" 'string 'global)'
352 is an interface to a C call something like
354 `XrmGetResource (db, "xemacs.foreground",
358 Note that for `global', no prefix is added other than that of the
359 application itself; thus, you can use this locale to retrieve
360 arbitrary application resources, if you really want to.
362 The returned value of this function is `nil' if the queried
363 resource is not found. If TYPE is `string', a string is returned,
364 and if it is `integer', an integer is returned. If TYPE is
365 `boolean', then the returned value is the list `(t)' for true,
366 `(nil)' for false, and is `nil' to mean "unspecified".
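
Because a `boolean' result distinguishes three cases, callers usually need a small wrapper to fold "unspecified" into a default. The following is one possible sketch; the helper name is hypothetical, and the resource name and class in the commented example are made up.

```elisp
(defun my-resource-boolean (result default)
  "Interpret RESULT, the value of `x-get-resource' with TYPE `boolean'.
Return t or nil if the resource was specified, DEFAULT otherwise."
  (if (consp result)
      (car result)     ; (t) => t, (nil) => nil
    default))          ; nil => resource unspecified

;; Possible use; the resource name and class are made up:
;; (my-resource-boolean
;;  (x-get-resource "myFlag" "MyFlag" 'boolean 'global)
;;  t)
```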
368 -- Function: x-put-resource resource-line &optional device
369 This function adds a resource to the resource database for DEVICE.
370 RESOURCE-LINE specifies the resource to add and should be a
371 standard resource specification.
373 -- Variable: x-emacs-application-class
This variable holds the X application class of the XEmacs process.
375 This controls, among other things, the name of the "app-defaults"
376 file that XEmacs will use. For changes to this variable to take
377 effect, they must be made before the connection to the X server is
378 initialized, that is, this variable may only be changed before
379 XEmacs is dumped, or by setting it in the file
380 `lisp/term/x-win.el'.
382 By default, this variable is `nil' at startup. When the connection
383 to the X server is first initialized, the X resource database will
384 be consulted and the value will be set according to whether any
385 resources are found for the application class "XEmacs".
388 File: lispref.info, Node: Server Data, Next: Grabs, Prev: Resources, Up: X Server
390 58.2.2 Data about the X Server
391 ------------------------------
393 This section describes functions and a variable that you can use to get
394 information about the capabilities and origin of the X server
395 corresponding to a particular device. The device argument is generally
396 optional and defaults to the selected device.
398 -- Function: x-server-version &optional device
399 This function returns the list of version numbers of the X server
400 DEVICE is on. The returned value is a list of three integers: the
401 major and minor version numbers of the X protocol in use, and the
402 vendor-specific release number.
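
For instance, the three-element list could be turned into a readable string with a helper along these lines. The helper name and the wording of the output are hypothetical.

```elisp
(defun my-format-x-server-version (version)
  "Format VERSION, a list as returned by `x-server-version'."
  (format "X protocol %d.%d, vendor release %d"
          (nth 0 version)    ; major protocol version
          (nth 1 version)    ; minor protocol version
          (nth 2 version)))  ; vendor-specific release number

;; Possible use on an X device:
;; (my-format-x-server-version (x-server-version))
```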
404 -- Function: x-server-vendor &optional device
This function returns the vendor supporting the X server DEVICE is on.
408 -- Function: x-display-visual-class &optional device
409 This function returns the visual class of the display DEVICE is
410 on. The value is one of the symbols `static-gray', `gray-scale',
411 `static-color', `pseudo-color', `true-color', and `direct-color'.
412 (Note that this is different from previous versions of XEmacs,
413 which returned `StaticGray', `GrayScale', etc.)
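
A program that only needs to know whether the display can show color might classify the returned symbol as in this sketch. The helper name is hypothetical, and treating exactly these four classes as "color" is a choice made for the example.

```elisp
(defun my-color-visual-class-p (class)
  "Return t if CLASS, a visual class symbol, denotes a color visual."
  (if (memq class '(static-color pseudo-color true-color direct-color))
      t
    nil))

;; Possible use on an X device:
;; (my-color-visual-class-p (x-display-visual-class))
```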
416 File: lispref.info, Node: Grabs, Prev: Server Data, Up: X Server
418 58.2.3 Restricting Access to the Server by Other Apps
419 -----------------------------------------------------
421 -- Function: x-grab-keyboard &optional device
422 This function grabs the keyboard on the given device (defaulting
423 to the selected one). So long as the keyboard is grabbed, all
424 keyboard events will be delivered to XEmacs--it is not possible
425 for other X clients to eavesdrop on them. Ungrab the keyboard
426 with `x-ungrab-keyboard' (use an `unwind-protect'). Returns `t'
427 if the grab was successful; `nil' otherwise.
429 -- Function: x-ungrab-keyboard &optional device
430 This function releases a keyboard grab made with `x-grab-keyboard'.
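
The advice to use `unwind-protect' can be followed as in this sketch, which reads a string while the keyboard is grabbed. The helper name is hypothetical, and the code assumes that the selected device is an X device.

```elisp
(defun my-read-with-keyboard-grab (prompt)
  "Read a string with PROMPT while the keyboard is grabbed.
Hypothetical helper; assumes the selected device is an X device."
  (let ((grabbed (x-grab-keyboard)))
    (unwind-protect
        (read-string prompt)
      ;; Release the grab even if the user quits with C-g,
      ;; but only if the grab actually succeeded.
      (if grabbed
          (x-ungrab-keyboard)))))
```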
432 -- Function: x-grab-pointer &optional device cursor ignore-keyboard
433 This function grabs the pointer and restricts it to its current
434 window. If optional DEVICE argument is `nil', the selected device
435 will be used. If optional CURSOR argument is non-`nil', change
436 the pointer shape to that until `x-ungrab-pointer' is called (it
437 should be an object returned by the `make-cursor' function). If
the third optional argument IGNORE-KEYBOARD is non-`nil', ignore
439 all keyboard events during the grab. Returns `t' if the grab is
440 successful, `nil' otherwise.
442 -- Function: x-ungrab-pointer &optional device
443 This function releases a pointer grab made with `x-grab-pointer'.
444 If optional first arg DEVICE is `nil' the selected device is used.
445 If it is `t' the pointer will be released on all X devices.
448 File: lispref.info, Node: X Miscellaneous, Prev: X Server, Up: X-Windows
450 58.3 Miscellaneous X Functions and Variables
451 ============================================
453 -- Variable: x-bitmap-file-path
454 This variable holds a list of the directories in which X bitmap
455 files may be found. If `nil', this is initialized from the
456 `"*bitmapFilePath"' resource. This is used by the
457 `make-image-instance' function (however, note that if the
458 environment variable `XBMLANGPATH' is set, it is consulted first).
460 -- Variable: x-library-search-path
461 This variable holds the search path used by `read-color' to find
464 -- Function: x-valid-keysym-name-p keysym
465 This function returns true if KEYSYM names a keysym that the X
466 library knows about. Valid keysyms are listed in the files
467 `/usr/include/X11/keysymdef.h' and in `/usr/lib/X11/XKeysymDB', or
468 whatever the equivalents are on your system.
470 -- Function: x-window-id &optional frame
471 This function returns the ID of the X11 window. This gives us a
472 chance to manipulate the Emacs window from within a different
473 program. Since the ID is an unsigned long, we return it as a
476 -- Variable: x-allow-sendevents
477 If non-`nil', synthetic events are allowed. `nil' means they are
478 ignored. Beware: allowing XEmacs to process SendEvents opens a
481 -- Function: x-debug-mode arg &optional device
482 With a true arg, make the connection to the X server synchronous.
483 With false, make it asynchronous. Synchronous connections are
484 much slower, but are useful for debugging. (If you get X errors,
485 make the connection synchronous, and use a debugger to set a
486 breakpoint on `x_error_handler'. Your backtrace of the C stack
487 will now be useful. In asynchronous mode, the stack above
488 `x_error_handler' isn't helpful because of buffering.) If DEVICE
489 is not specified, the selected device is assumed.
491 Calling this function is the same as calling the C function
`XSynchronize', or starting the program with the `-sync' command line argument.
495 -- Variable: x-debug-events
496 If non-zero, debug information about events that XEmacs sees is
displayed. Information is displayed on stderr. Currently defined values are:
500 * 1 == non-verbose output
502 * 2 == verbose output
505 File: lispref.info, Node: ToolTalk Support, Next: LDAP Support, Prev: X-Windows, Up: Top
512 * XEmacs ToolTalk API Summary::
514 * Receiving Messages::
517 File: lispref.info, Node: XEmacs ToolTalk API Summary, Next: Sending Messages, Up: ToolTalk Support
519 59.1 XEmacs ToolTalk API Summary
520 ================================
522 The XEmacs Lisp interface to ToolTalk is similar, at least in spirit,
523 to the standard C ToolTalk API. Only the message and pattern parts of
524 the API are supported at present; more of the API could be added if
525 needed. The Lisp interface departs from the C API in a few ways:
527 * ToolTalk is initialized automatically at XEmacs startup-time.
Messages can only be sent to other ToolTalk applications connected to
529 the same X11 server that XEmacs is running on.
531 * There are fewer entry points; polymorphic functions with keyword
532 arguments are used instead.
534 * The callback interface is simpler and marginally less functional.
535 A single callback may be associated with a message or a pattern;
536 the callback is specified with a Lisp symbol (the symbol should
537 have a function binding).
539 * The session attribute for messages and patterns is always
540 initialized to the default session.
542 * Anywhere a ToolTalk enum constant, e.g. `TT_SESSION', is valid, one
543 can substitute the corresponding symbol, e.g. `'TT_SESSION'. This
544 simplifies building lists that represent messages and patterns.
547 File: lispref.info, Node: Sending Messages, Next: Receiving Messages, Prev: XEmacs ToolTalk API Summary, Up: ToolTalk Support
549 59.2 Sending Messages
550 =====================
554 * Example of Sending Messages::
555 * Elisp Interface for Sending Messages::
558 File: lispref.info, Node: Example of Sending Messages, Next: Elisp Interface for Sending Messages, Up: Sending Messages
560 59.2.1 Example of Sending Messages
561 ----------------------------------
563 Here's a simple example that sends a query to another application and
564 then displays its reply. Both the query and the reply are stored in
565 the first argument of the message.
     (defun tooltalk-random-query-handler (msg)
       ;; Called each time the state of MSG changes.
       (let ((state (get-tooltalk-message-attribute msg 'state)))
         (cond
          ((eq state 'TT_HANDLED)
           (message (get-tooltalk-message-attribute msg 'arg_val 0)))
          ((memq state '(TT_FAILED TT_REJECTED))
           (message "Random query turns up nothing")))))
575 (defvar random-query-message
580 args '((TT_INOUT "?" "string"))
581 callback tooltalk-random-query-handler))
583 (let ((m (make-tooltalk-message random-query-message)))
584 (send-tooltalk-message m))
587 File: lispref.info, Node: Elisp Interface for Sending Messages, Prev: Example of Sending Messages, Up: Sending Messages
589 59.2.2 Elisp Interface for Sending Messages
590 -------------------------------------------
592 -- Function: make-tooltalk-message attributes
593 Create a ToolTalk message and initialize its attributes. The
594 value of ATTRIBUTES must be a list of alternating keyword/values,
595 where keywords are symbols that name valid message attributes.
598 (make-tooltalk-message
603 args ("arg1" 12345 (TT_INOUT "arg3" "string"))))
605 Values must always be strings, integers, or symbols that represent
606 ToolTalk constants. Attribute names are the same as those
607 supported by `set-tooltalk-message-attribute', plus `args'.
609 The value of `args' should be a list of message arguments where
610 each message argument has the following form:
612 `(mode [value [type]])' or just `value'
614 Where MODE is one of `TT_IN', `TT_OUT', or `TT_INOUT' and TYPE is
615 a string. If TYPE isn't specified then `int' is used if VALUE is
616 a number; otherwise `string' is used. If TYPE is `string' then
617 VALUE is converted to a string (if it isn't a string already) with
618 `prin1-to-string'. If only a value is specified then MODE
619 defaults to `TT_IN'. If MODE is `TT_OUT' then VALUE and TYPE
620 don't need to be specified. You can find out more about the
621 semantics and uses of ToolTalk message arguments in chapter 4 of
622 the `ToolTalk Programmer's Guide'.
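
The defaulting rules just described can be made explicit with a small constructor for argument specifications. This is only a sketch of the rules as stated here, not code taken from XEmacs, and the helper name is hypothetical.

```elisp
(defun my-tooltalk-arg-spec (value &optional mode type)
  "Return a message argument in the full (MODE VALUE TYPE) form.
MODE defaults to TT_IN.  TYPE defaults to \"int\" if VALUE is a
number and to \"string\" otherwise."
  (list (or mode 'TT_IN)
        value
        (or type (if (numberp value) "int" "string"))))

;; (my-tooltalk-arg-spec 12345)         => (TT_IN 12345 "int")
;; (my-tooltalk-arg-spec "x" 'TT_INOUT) => (TT_INOUT "x" "string")
```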
625 -- Function: send-tooltalk-message msg
626 Send the message on its way. Once the message has been sent it's
627 almost always a good idea to get rid of it with
628 `destroy-tooltalk-message'.
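
One way to follow this advice is to pair the two calls with `unwind-protect', as in this sketch. The helper name is hypothetical, and the code is meant for messages that have no callback, since a message with a callback is cleaned up by the callback machinery.

```elisp
(defun my-send-tooltalk-notice (attributes)
  "Create a message from ATTRIBUTES, send it, then destroy it.
Hypothetical helper, meant for messages with no callback."
  (let ((msg (make-tooltalk-message attributes)))
    (unwind-protect
        (send-tooltalk-message msg)
      ;; Destroy the message whether or not sending succeeded.
      (destroy-tooltalk-message msg))))
```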
631 -- Function: return-tooltalk-message msg &optional mode
632 Send a reply to this message. The second argument can be `reply',
633 `reject' or `fail'; the default is `reply'. Before sending a
634 reply, all message arguments whose mode is `TT_INOUT' or `TT_OUT'
635 should have been filled in--see `set-tooltalk-message-attribute'.
-- Function: get-tooltalk-message-attribute msg attribute &optional argn
640 Returns the indicated ToolTalk message attribute. Attributes are
641 identified by symbols with the same name (underscores and all) as
642 the suffix of the ToolTalk `tt_message_<attribute>' function that
643 extracts the value. String attribute values are copied and
644 enumerated type values (except disposition) are converted to
symbols (e.g. `TT_HANDLER' is `'TT_HANDLER'); `uid' and `gid' are
646 represented by fixnums (small integers), `opnum' is converted to a
647 string, and `disposition' is converted to a fixnum. We convert
648 `opnum' (a C int) to a string (e.g. `123' => `"123"') because
649 there's no guarantee that opnums will fit within the range of
650 XEmacs Lisp integers.
652 [TBD] Use the `plist' attribute instead of C API `user' attribute
653 for user-defined message data. To retrieve the value of a message
654 property, specify the indicator for ARGN. For example, to get the
655 value of a property called `rflag', use
657 (get-tooltalk-message-attribute msg 'plist 'rflag)
659 To get the value of a message argument use one of the `arg_val'
660 (strings), `arg_ival' (integers), or `arg_bval' (strings with
661 embedded nulls), attributes. For example, to get the integer
662 value of the third argument:
664 (get-tooltalk-message-attribute msg 'arg_ival 2)
666 As you can see, argument numbers are zero-based. The type of each
argument can be retrieved with the `arg_type' attribute; however,
668 ToolTalk doesn't define any semantics for the string value of
669 `arg_type'. Conventionally `string' is used for strings and `int'
670 for 32 bit integers. Note that XEmacs Lisp stores the lengths of
671 strings explicitly (unlike C) so treating the value returned by
672 `arg_bval' like a string is fine.
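
Since argument numbers are zero-based, collecting several argument values is a simple counting loop. The sketch below gathers the first few string arguments. The helper name is hypothetical, and the caller is assumed to know how many arguments the message carries.

```elisp
(defun my-tooltalk-string-args (msg count)
  "Return a list of the first COUNT argument values of MSG as strings.
Hypothetical helper; COUNT is supplied by the caller."
  (let ((result nil)
        (i 0))
    (while (< i count)
      ;; Argument numbers are zero-based.
      (setq result (cons (get-tooltalk-message-attribute msg 'arg_val i)
                         result))
      (setq i (1+ i)))
    (nreverse result)))
```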
-- Function: set-tooltalk-message-attribute value msg attribute &optional argn
677 Initialize one ToolTalk message attribute.
679 Attribute names and values are the same as for
680 `get-tooltalk-message-attribute'. A property list is provided for
681 user data (instead of the `user' message attribute); see
682 `get-tooltalk-message-attribute'.
684 Callbacks are handled slightly differently than in the C ToolTalk
685 API. The value of CALLBACK should be the name of a function of one
686 argument. It will be called each time the state of the message
687 changes. This is usually used to notice when the message's state
688 has changed to `TT_HANDLED' (or `TT_FAILED'), so that reply
689 argument values can be used.
691 If one of the argument attributes is specified as `arg_val',
692 `arg_ival', or `arg_bval', then ARGN must be the number of an
693 already created argument. Arguments can be added to a message
694 with `add-tooltalk-message-arg'.
697 -- Function: add-tooltalk-message-arg msg mode type &optional value
698 Append one new argument to the message. MODE must be one of
699 `TT_IN', `TT_INOUT', or `TT_OUT', TYPE must be a string, and VALUE
700 can be a string or an integer. ToolTalk doesn't define any
701 semantics for TYPE, so only the participants in the protocol
702 you're using need to agree what types mean (if anything).
703 Conventionally `string' is used for strings and `int' for 32 bit
integers. Arguments can be initialized by providing a value or with
705 `set-tooltalk-message-attribute'; the latter is necessary if you
706 want to initialize the argument with a string that can contain
707 embedded nulls (use `arg_bval').
710 -- Function: create-tooltalk-message &optional no-callback
711 Create a new ToolTalk message. The message's session attribute is
712 initialized to the default session. Other attributes can be
713 initialized with `set-tooltalk-message-attribute'.
714 `make-tooltalk-message' is the preferred way to create and
715 initialize a message.
717 Optional arg NO-CALLBACK says don't add a C-level callback at all.
718 Normally don't do that; just don't specify the Lisp callback when
719 calling `make-tooltalk-message'.
722 -- Function: destroy-tooltalk-message msg
723 Apply `tt_message_destroy' to the message. It's not necessary to
724 destroy messages after they've been processed by a message or
pattern callback; the Lisp/ToolTalk callback machinery does this for you.
729 File: lispref.info, Node: Receiving Messages, Prev: Sending Messages, Up: ToolTalk Support
731 59.3 Receiving Messages
732 =======================
736 * Example of Receiving Messages::
737 * Elisp Interface for Receiving Messages::
740 File: lispref.info, Node: Example of Receiving Messages, Next: Elisp Interface for Receiving Messages, Up: Receiving Messages
742 59.3.1 Example of Receiving Messages
743 ------------------------------------
745 Here's a simple example of a handler for a message that tells XEmacs to
746 display a string in the mini-buffer area. The message operation is
called `emacs-display-string'. Its first (0th) argument is the string to display.
750 (defun tooltalk-display-string-handler (msg)
751 (message (get-tooltalk-message-attribute msg 'arg_val 0)))
753 (defvar display-string-pattern
756 op "emacs-display-string"
757 callback tooltalk-display-string-handler))
759 (let ((p (make-tooltalk-pattern display-string-pattern)))
760 (register-tooltalk-pattern p))
763 File: lispref.info, Node: Elisp Interface for Receiving Messages, Prev: Example of Receiving Messages, Up: Receiving Messages
765 59.3.2 Elisp Interface for Receiving Messages
766 ---------------------------------------------
768 -- Function: make-tooltalk-pattern attributes
769 Create a ToolTalk pattern and initialize its attributes. The
770 value of attributes must be a list of alternating keyword/values,
771 where keywords are symbols that name valid pattern attributes or
772 lists of valid attributes. For example:
774 (make-tooltalk-pattern
775 '(category TT_OBSERVE
777 op ("operation1" "operation2")
778 args ("arg1" 12345 (TT_INOUT "arg3" "string"))))
780 Attribute names are the same as those supported by
781 `add-tooltalk-pattern-attribute', plus `'args'.
783 Values must always be strings, integers, or symbols that represent
784 ToolTalk constants or lists of same. When a list of values is
785 provided all of the list elements are added to the attribute. In
786 the example above, messages whose `op' attribute is `"operation1"'
787 or `"operation2"' would match the pattern.
789 The value of ARGS should be a list of pattern arguments where each
790 pattern argument has the following form:
792 `(mode [value [type]])' or just `value'
794 Where MODE is one of `TT_IN', `TT_OUT', or `TT_INOUT' and TYPE is
795 a string. If TYPE isn't specified then `int' is used if VALUE is
796 a number; otherwise `string' is used. If TYPE is `string' then
797 VALUE is converted to a string (if it isn't a string already) with
798 `prin1-to-string'. If only a value is specified then MODE
799 defaults to `TT_IN'. If MODE is `TT_OUT' then VALUE and TYPE
800 don't need to be specified. You can find out more about the
801 semantics and uses of ToolTalk pattern arguments in chapter 3 of
802 the `ToolTalk Programmer's Guide'.
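The defaulting rules for pattern arguments described above can be sketched in plain Lisp. This is only an illustration of the rules as stated, not the code XEmacs uses; the helper name `my-tooltalk-normalize-pattern-arg' is hypothetical.

```elisp
;; Hypothetical helper, not part of the ToolTalk interface.
(defun my-tooltalk-normalize-pattern-arg (arg)
  "Return ARG as a full (MODE VALUE TYPE) list.
ARG is either a bare value or a list (MODE [VALUE [TYPE]])."
  (let* ((spec (if (consp arg) arg (list 'TT_IN arg))) ; bare value: TT_IN
         (mode (car spec))
         (value (car (cdr spec)))
         ;; Without TYPE, numbers are "int" and anything else "string".
         (type (or (car (cdr (cdr spec)))
                   (if (numberp value) "int" "string"))))
    ;; A "string" argument whose value is not a string is printed.
    ;; A missing value (as allowed for TT_OUT) is left as nil.
    (if (and (equal type "string") value (not (stringp value)))
        (setq value (prin1-to-string value)))
    (list mode value type)))

(my-tooltalk-normalize-pattern-arg "arg1")
(my-tooltalk-normalize-pattern-arg 12345)
(my-tooltalk-normalize-pattern-arg '(TT_INOUT "arg3" "string"))
```

Applied to the three arguments of the `make-tooltalk-pattern' example, the helper yields a `TT_IN' string argument, a `TT_IN' int argument, and the `TT_INOUT' argument unchanged.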
805 -- Function: register-tooltalk-pattern pattern
806 XEmacs will begin receiving messages that match this pattern.
808 -- Function: unregister-tooltalk-pattern pattern
809 XEmacs will stop receiving messages that match this pattern.
811 -- Function: add-tooltalk-pattern-attribute value pattern indicator
812 Add one value to the indicated pattern attribute. The names of
813 attributes are the same as the ToolTalk accessors used to set them
814 less the `tt_pattern_' prefix and the `_add' suffix. For
815 example, the name of the attribute set by the
816 `tt_pattern_disposition_add' accessor is `disposition'. The
817 `category' attribute is handled specially, since a pattern can only
818 be a member of one category (`TT_OBSERVE' or `TT_HANDLE').
820 Callbacks are handled slightly differently than in the C ToolTalk
821 API. The value of CALLBACK should be the name of a function of one
822 argument. It will be called each time the pattern matches an
823 incoming message.
825 -- Function: add-tooltalk-pattern-arg pattern mode vtype &optional
826 value
827 Add one fully-specified argument to a ToolTalk pattern. MODE must
828 be one of `TT_IN', `TT_INOUT', or `TT_OUT'. VTYPE must be a
829 string. VALUE can be an integer, string or `nil'. If VALUE is an
830 integer then an integer argument (`tt_pattern_iarg_add') is added;
831 otherwise a string argument is added. At present there's no way
832 to add a binary data argument.
835 -- Function: create-tooltalk-pattern
836 Create a new ToolTalk pattern and initialize its session attribute
837 to be the default session.
839 -- Function: destroy-tooltalk-pattern pattern
840 Apply `tt_pattern_destroy' to the pattern. This effectively
841 unregisters the pattern.
843 -- Function: describe-tooltalk-message msg &optional stream
844 Print the message's attributes and arguments to STREAM. This is
845 often useful for debugging.
848 File: lispref.info, Node: LDAP Support, Next: PostgreSQL Support, Prev: ToolTalk Support, Up: Top
853 XEmacs can be linked with an LDAP client library to provide Elisp
854 primitives to access directory servers using the Lightweight Directory
855 Access Protocol (LDAP).
859 * Building XEmacs with LDAP support:: How to add LDAP support to XEmacs
860 * XEmacs LDAP API:: Lisp access to LDAP functions
861 * Syntax of Search Filters:: A brief summary of RFC 1558
864 File: lispref.info, Node: Building XEmacs with LDAP support, Next: XEmacs LDAP API, Prev: LDAP Support, Up: LDAP Support
866 60.1 Building XEmacs with LDAP support
867 ======================================
869 LDAP support must be added to XEmacs at build time since it requires
870 linking to an external LDAP client library. As of 21.2, XEmacs has been
871 successfully built and tested with
873 * OpenLDAP 1.2 (`http://www.openldap.org/')
875 * University of Michigan's LDAP 3.3
876 (`http://www.umich.edu/~dirsvcs/ldap/')
878 * LDAP SDK 1.0 from Netscape Corp. (`http://developer.netscape.com/')
880 Other libraries conforming to RFC 1823 will probably work also but
881 may require some minor tweaking at C level.
883 The standard XEmacs configure script auto-detects an installed LDAP
884 library provided the library itself and the corresponding header files
885 can be found in the library and include paths. A successful detection
886 will be signalled in the final output of the configure script.
889 File: lispref.info, Node: XEmacs LDAP API, Next: Syntax of Search Filters, Prev: Building XEmacs with LDAP support, Up: LDAP Support
894 The XEmacs LDAP API consists of two layers: a low-level layer which tries
895 to stay as close as possible to the C API (where practical) and a
896 higher-level layer which provides more convenient primitives to
897 effectively use LDAP.
899 The low-level API should be used directly for very specific purposes
900 (such as multiple operations on a connection) only. The higher-level
901 functions provide a more convenient way to access LDAP directories
902 hiding the subtleties of handling the connection, translating arguments
903 and ensuring compliance with LDAP internationalization rules and formats
904 (currently partly implemented only).
908 * LDAP Variables:: Lisp variables related to LDAP
909 * The High-Level LDAP API:: High-level LDAP lisp functions
910 * The Low-Level LDAP API:: Low-level LDAP lisp primitives
911 * LDAP Internationalization:: I18n variables and functions
914 File: lispref.info, Node: LDAP Variables, Next: The High-Level LDAP API, Prev: XEmacs LDAP API, Up: XEmacs LDAP API
916 60.2.1 LDAP Variables
917 ---------------------
919 -- Variable: ldap-default-host
920 The default LDAP server hostname. A TCP port number can be
921 appended to that name using a colon as a separator.
923 -- Variable: ldap-default-port
924 Default TCP port for LDAP connections. Initialized from the LDAP
925 library. Default value is 389.
927 -- Variable: ldap-default-base
928 Default base for LDAP searches. This is a string using the syntax
929 of RFC 1779. For instance, "o=ACME, c=US" limits the search to the
930 Acme organization in the United States.
932 -- Variable: ldap-host-parameters-alist
933 An alist of per host options for LDAP transactions. The list
934 elements look like `(HOST PROP1 VAL1 PROP2 VAL2 ...)' HOST is the
935 name of an LDAP server. A TCP port number can be appended to that
936 name using a colon as a separator. PROPN and VALN are
937 property/value pairs describing parameters for the server. Valid
938 keys are:
939 `binddn'
940 The distinguished name of the user to bind as. This may look
941 like `cn=Babs Jensen,o=ACME,c=US'; see RFC 1779 for details.
943 `passwd'
944 The password to use for authentication.
946 `auth'
947 The authentication method to use. Possible values depend on
948 the LDAP library XEmacs was compiled with; they may include
949 `simple', `krbv41' and `krbv42'.
951 `base'
952 The base for the search. This may look like `c=US, o=Acme'; see
953 RFC 1779 for syntax details.
955 `scope'
956 One of the symbols `base', `onelevel' or `subtree' indicating
957 the scope of the search limited to a base object, to a single
958 level or to the whole subtree.
960 `deref'
961 The dereference policy is one of the symbols `never',
962 `always', `search' or `find' and defines how aliases are
963 treated:
964 `never'
965 Aliases are never dereferenced.
967 `always'
968 Aliases are always dereferenced.
970 `search'
971 Aliases are dereferenced when searching.
973 `find'
974 Aliases are dereferenced when locating the base object.
977 `timelimit'
978 The timeout limit for the connection in seconds.
980 `sizelimit'
981 The maximum number of matches to return for searches
982 performed on this connection.
984 -- Variable: ldap-verbose
985 If non-`nil', LDAP operations will echo progress messages.
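As an illustration, the variables above might be set as follows in an init file. The server name and all values are made up, and the property names (`base', `scope', `timelimit', `sizelimit') are assumed from the parameter descriptions above. The lookup helper `my-ldap-host-parameter' is hypothetical and only shows how an element of the alist is laid out.

```elisp
;; Hypothetical server name and values, for illustration only.
(setq ldap-default-host "ldap.example.org")
(setq ldap-host-parameters-alist
      '(("ldap.example.org"
         base "o=ACME, c=US"
         scope subtree
         timelimit 30
         sizelimit 100)))

;; Hypothetical helper: each element is (HOST PROP1 VAL1 PROP2 VAL2 ...),
;; so the tail after HOST can be read as a property list.
(defun my-ldap-host-parameter (host prop)
  "Return the value of PROP for HOST in `ldap-host-parameters-alist'."
  (plist-get (cdr (assoc host ldap-host-parameters-alist)) prop))

(my-ldap-host-parameter "ldap.example.org" 'scope)
```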
989 File: lispref.info, Node: The High-Level LDAP API, Next: The Low-Level LDAP API, Prev: LDAP Variables, Up: XEmacs LDAP API
991 60.2.2 The High-Level LDAP API
992 ------------------------------
994 The following functions provide the most convenient interface to perform
995 LDAP operations. All of them open a connection to a host, perform an
996 operation (add/search/modify/delete) on one or several entries and
997 cleanly close the connection thus insulating the user from all the
998 details of the low-level interface such as LDAP Lisp objects (*note The
999 Low-Level LDAP API::).
1001 Note that `ldap-search' which used to be the name of the high-level
1002 search function in XEmacs 21.1 is now obsolete. For consistency in the
1003 naming as well as backward compatibility, that function now acts as a
1004 wrapper that calls either `ldap-search-basic' (low-level search
1005 function) or `ldap-search-entries' (high-level search function)
1006 according to the actual parameters. A direct call to one of these two
1007 functions is preferred since it is faster and unambiguous.
1009 -- Command: ldap-search-entries filter &optional host attributes
1010 attrsonly withdn
1011 Perform an LDAP search. FILTER is the search filter (*note Syntax
1012 of Search Filters::). HOST is the LDAP host on which to perform the
1013 search. ATTRIBUTES is the specific attributes to retrieve, `nil'
1014 means retrieve all. ATTRSONLY if non-`nil' retrieves the
1015 attributes only without their associated values. If WITHDN is
1016 non-`nil' each entry in the result will be prepended with its
1017 distinguished name DN. Additional search parameters can be
1018 specified through `ldap-host-parameters-alist'. The function
1019 returns a list of matching entries. Each entry is itself an alist
1020 of attribute/value pairs optionally preceded by the DN of the
1021 entry according to the value of WITHDN.
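The shape of the return value can be illustrated on hand-written data. The sample below is made up, not output from a real server, and it assumes that each attribute pair is a list whose first element is the attribute name and whose remaining elements are its values. The helper `my-ldap-entry-values' is hypothetical.

```elisp
;; A call such as the following needs an XEmacs built with LDAP support
;; and a reachable server, so it is shown as a comment only:
;; (ldap-search-entries "(cn=*Smith*)" "ldap.example.org" '("cn" "mail") nil t)

;; Made-up sample in the shape described above, as if WITHDN were non-nil.
(defvar my-ldap-sample-result
  '(("cn=John Smith, o=ACME, c=US"
     ("cn" "John Smith")
     ("mail" "jsmith@example.org"))
    ("cn=Babs Jensen, o=ACME, c=US"
     ("cn" "Babs Jensen")
     ("mail" "babs@example.org" "bjensen@example.org"))))

(defun my-ldap-entry-values (entry attr)
  "Return the values of ATTR in ENTRY, skipping a leading DN string."
  (let ((pairs (if (stringp (car entry)) (cdr entry) entry)))
    (cdr (assoc attr pairs))))

;; Collect the mail values of every entry.
(mapcar (lambda (entry) (my-ldap-entry-values entry "mail"))
        my-ldap-sample-result)
```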
1023 -- Function: ldap-add-entries entries &optional host binddn passwd
1024 Add entries to an LDAP directory. ENTRIES is a list of entry
1025 specifications of the form `(DN (ATTR . VALUE) (ATTR . VALUE) ...)'
1026 where DN is the distinguished name of an entry to add and the following
1027 elements are cons cells containing attribute/value string pairs. HOST is
1028 the LDAP host, defaulting to `ldap-default-host'. BINDDN is the
1029 DN to bind as to the server. PASSWD is the corresponding password.
1031 -- Function: ldap-modify-entries entry-mods &optional host binddn
1032 passwd
1033 Modify entries of an LDAP directory. ENTRY-MODS is a list of
1034 entry modifications of the form `(DN MOD-SPEC1 MOD-SPEC2 ...)'
1035 where DN is the distinguished name of the entry to modify and the
1036 following elements are modification specifications. A modification
1037 specification is itself a list of the form `(MOD-OP ATTR VALUE1
1038 VALUE2 ...)'. MOD-OP and ATTR are mandatory; VALUES are optional
1039 depending on MOD-OP. MOD-OP is the type of modification, one of
1040 the symbols `add', `delete' or `replace'. ATTR is the LDAP
1041 attribute type to modify. HOST is the LDAP host, defaulting to
1042 `ldap-default-host'. BINDDN is the DN to bind as to the server.
1043 PASSWD is the corresponding password.
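A list of entry modifications in the form just described could be built as follows. This is a sketch: the constructor `my-ldap-mod-spec', the DN and the attribute values are all hypothetical, and the final call is left as a comment because it needs a live server.

```elisp
(defun my-ldap-mod-spec (mod-op attr &rest values)
  "Return a modification specification (MOD-OP ATTR VALUE...)."
  (or (memq mod-op '(add delete replace))
      (error "Invalid LDAP modification type: %s" mod-op))
  (cons mod-op (cons attr values)))

;; One entry modification: (DN MOD-SPEC1 MOD-SPEC2).
(defvar my-ldap-entry-mods
  (list
   (list "cn=John Smith, o=ACME, c=US"
         (my-ldap-mod-spec 'replace "mail" "jsmith@example.org")
         (my-ldap-mod-spec 'delete "telephoneNumber"))))

;; (ldap-modify-entries my-ldap-entry-mods "ldap.example.org" BINDDN PASSWD)
```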
1045 -- Function: ldap-delete-entries dn &optional host binddn passwd
1046 Delete an entry from an LDAP directory. DN is the distinguished
1047 name of an entry to delete or a list of those. HOST is the LDAP
1048 host, defaulting to `ldap-default-host'. BINDDN is the DN to bind
1049 as to the server. PASSWD is the corresponding password.
1052 File: lispref.info, Node: The Low-Level LDAP API, Next: LDAP Internationalization, Prev: The High-Level LDAP API, Up: XEmacs LDAP API
1054 60.2.3 The Low-Level LDAP API
1055 -----------------------------
1057 The low-level API should be used directly for very specific purposes
1058 (such as multiple operations on a connection) only. The higher-level
1059 functions provide a more convenient way to access LDAP directories
1060 hiding the subtleties of handling the connection, translating arguments
1061 and ensuring compliance with LDAP internationalization rules and formats
1062 (currently partly implemented only). See *note The High-Level LDAP API::.
1064 Note that the former `ldap-*-internal' functions have been
1065 renamed in XEmacs 21.2.
1069 * The LDAP Lisp Object::
1070 * Opening and Closing a LDAP Connection::
1071 * Low-level Operations on a LDAP Server::
1074 File: lispref.info, Node: The LDAP Lisp Object, Next: Opening and Closing a LDAP Connection, Prev: The Low-Level LDAP API, Up: The Low-Level LDAP API
1076 60.2.3.1 The LDAP Lisp Object
1077 .............................
1079 An internal built-in `ldap' lisp object represents a LDAP connection.
1081 -- Function: ldapp object
1082 This function returns non-`nil' if OBJECT is a `ldap' object.
1084 -- Function: ldap-host ldap
1085 Return the server host of the connection represented by LDAP.
1087 -- Function: ldap-live-p ldap
1088 Return non-`nil' if LDAP is an active LDAP connection.
1091 File: lispref.info, Node: Opening and Closing a LDAP Connection, Next: Low-level Operations on a LDAP Server, Prev: The LDAP Lisp Object, Up: The Low-Level LDAP API
1093 60.2.3.2 Opening and Closing a LDAP Connection
1094 ..............................................
1096 -- Function: ldap-open host &optional plist
1097 Open an LDAP connection to HOST. PLIST is a property list
1098 containing additional parameters for the connection. Valid keys
1099 are:
1100 `port'
1101 The TCP port to use for the connection if different from
1102 `ldap-default-port' or the library builtin value.
1104 `auth'
1105 The authentication method to use. Possible values depend on
1106 the LDAP library XEmacs was compiled with; they may include
1107 `simple', `krbv41' and `krbv42'.
1109 `binddn'
1110 The distinguished name of the user to bind as. This may look
1111 like `c=com, o=Acme, cn=Babs Jensen'; see RFC 1779 for
1112 details.
1114 `passwd'
1115 The password to use for authentication.
1117 `deref'
1118 The dereference policy is one of the symbols `never',
1119 `always', `search' or `find' and defines how aliases are
1120 treated:
1121 `never'
1122 Aliases are never dereferenced.
1124 `always'
1125 Aliases are always dereferenced.
1127 `search'
1128 Aliases are dereferenced when searching.
1130 `find'
1131 Aliases are dereferenced when locating the base object.
1133 The default is `never'.
1135 `timelimit'
1136 The timeout limit for the connection in seconds.
1138 `sizelimit'
1139 The maximum number of matches to return for searches
1140 performed on this connection.
1142 -- Function: ldap-close ldap
1143 Close the connection represented by LDAP.
1146 File: lispref.info, Node: Low-level Operations on a LDAP Server, Prev: Opening and Closing a LDAP Connection, Up: The Low-Level LDAP API
1148 60.2.3.3 Low-level Operations on a LDAP Server
1149 ..............................................
1151 `ldap-search-basic' is the low-level primitive to perform a search on a
1152 LDAP server. It works directly on an open LDAP connection thus
1153 requiring a preliminary call to `ldap-open'. Multiple searches can be
1154 made on the same connection; the session must then be closed with
1155 `ldap-close'.
1157 -- Function: ldap-search-basic ldap filter &optional base scope attrs
1158 attrsonly withdn verbose
1159 Perform a search on an open connection LDAP created with
1160 `ldap-open'. FILTER is a filter string for the search (*note
1161 Syntax of Search Filters::). BASE is the distinguished name at which
1162 to start the search. SCOPE is one of the symbols `base',
1163 `onelevel' or `subtree' indicating the scope of the search limited
1164 to a base object, to a single level or to the whole subtree. The
1165 default is `subtree'. ATTRS is a list of strings indicating which
1166 attributes to retrieve for each matching entry. If `nil' all
1167 available attributes are returned. If ATTRSONLY is non-`nil' then
1168 only the attributes are retrieved, not their associated values.
1169 If WITHDN is non-`nil' then each entry in the result is prepended
1170 with its distinguished name DN. If VERBOSE is non-`nil' then
1171 progress messages are echoed. The function returns a list of
1172 matching entries. Each entry is itself an alist of
1173 attribute/value pairs optionally preceded by the DN of the entry
1174 according to the value of WITHDN.
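The open, search, close sequence described at the start of this node can be wrapped so that the connection is closed even if a search signals an error. The wrapper below is a sketch with a hypothetical name; it uses only `ldap-open' and `ldap-close' as documented here, and the host, base and filters in the usage comment are made up.

```elisp
(defun my-ldap-call-with-connection (host plist function)
  "Open a connection to HOST using PLIST, call FUNCTION with it, then close it.
Return the value of FUNCTION."
  (let ((ldap (ldap-open host plist)))
    (unwind-protect
        (funcall function ldap)
      ;; Runs whether FUNCTION returns normally or signals an error.
      (ldap-close ldap))))

;; Usage sketch: two searches on one connection.
;; (my-ldap-call-with-connection
;;  "ldap.example.org" nil
;;  (lambda (ldap)
;;    (list (ldap-search-basic ldap "(sn=Smith)" "o=ACME, c=US" 'subtree)
;;          (ldap-search-basic ldap "(sn=Jensen)" "o=ACME, c=US" 'subtree))))
```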
1176 -- Function: ldap-add ldap dn entry
1177 Add ENTRY to the LDAP directory to which the connection LDAP has been
1178 opened with `ldap-open'. DN is the distinguished name of the
1179 entry to add. ENTRY is an entry specification, i.e., a list of
1180 cons cells containing attribute/value string pairs.
1182 -- Function: ldap-modify ldap dn mods
1183 Modify an entry in an LDAP directory. LDAP is an LDAP connection
1184 object created with `ldap-open'. DN is the distinguished name of
1185 the entry to modify. MODS is a list of modifications to apply. A
1186 modification is a list of the form `(MOD-OP ATTR VALUE1 VALUE2
1187 ...)' MOD-OP and ATTR are mandatory, VALUES are optional
1188 depending on MOD-OP. MOD-OP is the type of modification, one of
1189 the symbols `add', `delete' or `replace'. ATTR is the LDAP
1190 attribute type to modify.
1192 -- Function: ldap-delete ldap dn
1193 Delete an entry from an LDAP directory. LDAP is an LDAP connection
1194 object created with `ldap-open'. DN is the distinguished name of
1195 the entry to delete.
1198 File: lispref.info, Node: LDAP Internationalization, Prev: The Low-Level LDAP API, Up: XEmacs LDAP API
1200 60.2.4 LDAP Internationalization
1201 --------------------------------
1203 The XEmacs LDAP API provides basic internationalization features based
1204 on the LDAP v3 specification (essentially RFC2252 on "LDAP v3 Attribute
1205 Syntax Definitions"). Unfortunately since there is currently no free
1206 LDAP v3 server software, this part has not received much testing and
1207 should be considered experimental. The framework is in place though.
1209 -- Function: ldap-decode-attribute attr
1210 Decode the attribute/value pair ATTR according to LDAP rules. The
1211 attribute name is looked up in `ldap-attribute-syntaxes-alist' and
1212 the corresponding decoder is then retrieved from
1213 `ldap-attribute-syntax-decoders' and applied to the value(s).
1217 * LDAP Internationalization Variables::
1218 * Encoder/Decoder Functions::
1221 File: lispref.info, Node: LDAP Internationalization Variables, Next: Encoder/Decoder Functions, Prev: LDAP Internationalization, Up: LDAP Internationalization
1223 60.2.4.1 LDAP Internationalization Variables
1224 ............................................
1226 -- Variable: ldap-ignore-attribute-codings
1227 If non-`nil', no encoding/decoding will be performed on LDAP
1228 attribute values.
1230 -- Variable: ldap-coding-system
1231 Coding system of LDAP string values. LDAP v3 specifies the coding
1232 system of strings to be UTF-8. You need an XEmacs with Mule
1233 support for this.
1235 -- Variable: ldap-default-attribute-decoder
1236 Decoder function to use for attributes whose syntax is unknown.
1237 Such a function receives an encoded attribute value as a string
1238 and should return the decoded value as a string.
1240 -- Variable: ldap-attribute-syntax-encoders
1241 A vector of functions used to encode LDAP attribute values. The
1242 sequence of functions corresponds to the sequence of LDAP
1243 attribute syntax object identifiers of the form
1244 1.3.6.1.4.1.1466.1115.121.1.* as defined in RFC2252 section 4.3.2.
1245 As of this writing, only a few encoder functions are available.
1247 -- Variable: ldap-attribute-syntax-decoders
1248 A vector of functions used to decode LDAP attribute values. The
1249 sequence of functions corresponds to the sequence of LDAP
1250 attribute syntax object identifiers of the form
1251 1.3.6.1.4.1.1466.1115.121.1.* as defined in RFC2252 section 4.3.2.
1252 As of this writing, only a few decoder functions are available.
1254 -- Variable: ldap-attribute-syntaxes-alist
1255 A map of LDAP attribute names to their type object id minor number.
1256 This table is built from RFC2252 Section 5 and RFC2256 Section 5.
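The lookup performed with these tables (attribute name to syntax number, syntax number to decoder function) can be sketched with small stand-in tables. Everything prefixed `my-' is hypothetical, the attribute name `enabled' is made up, and indexing the vector with the minor number minus one is an assumption of this sketch rather than a documented detail.

```elisp
;; Stand-in for `ldap-attribute-syntaxes-alist': name -> OID minor number.
(defvar my-syntaxes-alist '((cn . 15) (enabled . 7)))

(defun my-decode-boolean (str)
  "Return t if STR is the LDAP boolean string \"TRUE\"."
  (equal str "TRUE"))

;; Stand-in for `ldap-attribute-syntax-decoders'.
(defvar my-decoders (make-vector 58 nil))
(aset my-decoders (1- 15) 'identity)
(aset my-decoders (1- 7) 'my-decode-boolean)

(defun my-decode-attribute (attr)
  "Decode ATTR, a list (NAME VALUE...), using the tables above.
Attributes without a known decoder are returned unchanged."
  (let* ((name (car attr))
         (id (cdr (assq (intern (downcase name)) my-syntaxes-alist)))
         (decoder (and id (aref my-decoders (1- id)))))
    (if decoder
        (cons name (mapcar decoder (cdr attr)))
      attr)))

(my-decode-attribute '("enabled" "TRUE" "FALSE"))
```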
1259 File: lispref.info, Node: Encoder/Decoder Functions, Prev: LDAP Internationalization Variables, Up: LDAP Internationalization
1261 60.2.4.2 Encoder/Decoder Functions
1262 ..................................
1264 -- Function: ldap-encode-boolean bool
1265 A function that encodes an elisp boolean BOOL into a LDAP boolean
1266 string representation.
1268 -- Function: ldap-decode-boolean str
1269 A function that decodes a LDAP boolean string representation STR
1270 into an elisp boolean.
1272 -- Function: ldap-decode-string str
1273 Decode a string STR according to `ldap-coding-system'.
1275 -- Function: ldap-encode-string str
1276 Encode a string STR according to `ldap-coding-system'.
1278 -- Function: ldap-decode-address str
1279 Decode an address STR according to `ldap-coding-system' and
1280 replacing $ signs with newlines as specified by LDAP encoding
1281 rules for addresses.
1283 -- Function: ldap-encode-address str
1284 Encode an address STR according to `ldap-coding-system' and
1285 replacing newlines with $ signs as specified by LDAP encoding
1286 rules for addresses.
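The newline and `$' substitution performed for addresses can be sketched in plain Lisp. The two functions below are hypothetical stand-ins that handle only the separator substitution; they do not perform the `ldap-coding-system' conversion that the real functions are described as doing.

```elisp
(defun my-ldap-address-to-wire (str)
  "Return STR with newlines replaced by `$' signs."
  (mapconcat (lambda (c) (if (eq c ?\n) "$" (char-to-string c))) str ""))

(defun my-ldap-address-from-wire (str)
  "Return STR with `$' signs replaced by newlines."
  (mapconcat (lambda (c) (if (eq c ?$) "\n" (char-to-string c))) str ""))

(my-ldap-address-to-wire "1 Main Street\nSpringfield")
```

Applying `my-ldap-address-from-wire' to the result gives back the original two-line address.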
1289 File: lispref.info, Node: Syntax of Search Filters, Prev: XEmacs LDAP API, Up: LDAP Support
1291 60.3 Syntax of Search Filters
1292 =============================
1294 LDAP search functions use RFC1558 syntax to describe the search filter.
1295 In that syntax simple filters have the form:
1297 (<attr> <filtertype> <value>)
1299 `<attr>' is an attribute name such as `cn' for Common Name, `o' for
1300 Organization, etc.
1302 `<value>' is the corresponding value. This is generally an exact
1303 string but may also contain `*' characters as wildcards.
1305 `<filtertype>' is one of `=', `~=', `<=' or `>=', which respectively denote
1306 equality, approximate equality, less-than-or-equal and greater-than-or-equal.
1308 Thus `(cn=John Smith)' matches all records having a common name
1309 equal to John Smith.
1311 A special case is the presence filter `(<attr>=*)' which matches
1312 records containing a particular attribute. For instance `(mail=*)'
1313 matches all records containing a `mail' attribute.
1315 Simple filters can be connected together with the logical operators
1316 `&', `|' and `!' which stand for the usual and, or and not operators.
1318 `(&(objectClass=Person)(mail=*)(|(sn=Smith)(givenname=John)))'
1319 matches records of class `Person' containing a `mail' attribute and
1320 corresponding to people whose last name is `Smith' or whose first name
1321 is `John'.
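Filters of this form are ordinary strings, so they can be assembled with a few helper functions. The helpers below are hypothetical and deliberately minimal: they do no escaping of special characters in values.

```elisp
(defun my-ldap-filter (attr type value)
  "Return the simple filter (ATTR TYPE VALUE) as a string."
  (concat "(" attr type value ")"))

(defun my-ldap-filter-combine (operator filters)
  "Join the filter strings FILTERS under OPERATOR, \"&\" or \"|\"."
  (concat "(" operator (mapconcat 'identity filters "") ")"))

;; Rebuild the compound filter discussed above.
(my-ldap-filter-combine
 "&"
 (list (my-ldap-filter "objectClass" "=" "Person")
       (my-ldap-filter "mail" "=" "*")
       (my-ldap-filter-combine
        "|"
        (list (my-ldap-filter "sn" "=" "Smith")
              (my-ldap-filter "givenname" "=" "John")))))
```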
1324 File: lispref.info, Node: PostgreSQL Support, Next: Internationalization, Prev: LDAP Support, Up: Top
1326 61 PostgreSQL Support
1327 *********************
1329 XEmacs can be linked with PostgreSQL libpq run-time support to provide
1330 relational database access from Emacs Lisp code.
1334 * Building XEmacs with PostgreSQL support::
1335 * XEmacs PostgreSQL libpq API::
1336 * XEmacs PostgreSQL libpq Examples::
1339 File: lispref.info, Node: Building XEmacs with PostgreSQL support, Next: XEmacs PostgreSQL libpq API, Up: PostgreSQL Support
1341 61.1 Building XEmacs with PostgreSQL support
1342 ============================================
1344 XEmacs PostgreSQL support requires linking to the PostgreSQL libpq
1345 library. Describing how to build and install PostgreSQL is beyond the
1346 scope of this document. See the PostgreSQL manual for details.
1348 If you have installed XEmacs from one of the binary kits on
1349 (`ftp://ftp.xemacs.org/'), or are using an XEmacs binary from a CD ROM,
1350 you may have XEmacs PostgreSQL support by default. `M-x
1351 describe-installation' will tell you if you do.
1353 If you are building XEmacs from source, you need to install
1354 PostgreSQL first. On some systems, PostgreSQL will come pre-installed
1355 in /usr. In this case, it should be autodetected when you run
1356 configure. If PostgreSQL is installed into its default location,
1357 `/usr/local/pgsql', you must specify `--site-prefixes=/usr/local/pgsql'
1358 when you run configure. If PostgreSQL is installed into another
1359 location, use that instead of `/usr/local/pgsql' when specifying
1360 `--site-prefixes'.
1362 As of XEmacs 21.2, PostgreSQL versions 6.5.3 and 7.0 are supported.
1363 XEmacs Lisp support for V7.0 is somewhat more extensive than support for
1364 V6.5. In particular, asynchronous queries are supported.
1367 File: lispref.info, Node: XEmacs PostgreSQL libpq API, Next: XEmacs PostgreSQL libpq Examples, Prev: Building XEmacs with PostgreSQL support, Up: PostgreSQL Support
1369 61.2 XEmacs PostgreSQL libpq API
1370 ================================
1372 The XEmacs PostgreSQL API is intended to be a policy-free, low-level
1373 binding to libpq. The intent is to provide all the basic functionality
1374 and then let high level Lisp code decide its own policies.
1376 This documentation assumes that the reader has knowledge of SQL, but
1377 requires no prior knowledge of libpq.
1379 There are many examples in this manual and some setup will be
1380 required. In order to run most of the following examples, the
1381 following code needs to be executed. In addition to the data in
1382 this table, nearly all of the examples will assume that the free
1383 variable `P' refers to this database connection. The examples in the
1384 original edition of this manual were run against Postgres 7.0beta1.
1387 (setq P (pq-connectdb ""))
1388 ;; id is the primary key, shikona is a Japanese word that
1389 ;; means `the professional name of a Sumo wrestler', and
1390 ;; rank is the Sumo rank name.
1391 (pq-exec P (concat "CREATE TABLE xemacs_test"
1392 " (id int, shikona text, rank text);"))
1393 (pq-exec P "COPY xemacs_test FROM stdin;")
1394 (pq-put-line P "1\tMusashimaru\tYokuzuna\n")
1395 (pq-put-line P "2\tDejima\tOozeki\n")
1396 (pq-put-line P "3\tMusoyama\tSekiwake\n")
1397 (pq-put-line P "4\tMiyabiyama\tSekiwake\n")
1398 (pq-put-line P "5\tWakanoyama\tMaegashira\n")
1399 (pq-put-line P "\\.\n")
1400 (pq-end-copy P)
1405 * libpq Lisp Variables::
1406 * libpq Lisp Symbols and DataTypes::
1407 * Synchronous Interface Functions::
1408 * Asynchronous Interface Functions::
1409 * Large Object Support::
1410 * Other libpq Functions::
1411 * Unimplemented libpq Functions::
1414 File: lispref.info, Node: libpq Lisp Variables, Next: libpq Lisp Symbols and DataTypes, Prev: XEmacs PostgreSQL libpq API, Up: XEmacs PostgreSQL libpq API
1416 61.2.1 libpq Lisp Variables
1417 ---------------------------
1419 Various Unix environment variables are used by libpq to provide defaults
1420 to the many different parameters. In the XEmacs Lisp API, these
1421 environment variables are bound to Lisp variables to provide more
1422 convenient access to Lisp Code. These variables are passed to the
1423 backend database server during the establishment of a database
1424 connection and when the `pq-setenv' call is made.
1426 -- Variable: pg:host
1427 Initialized from the `PGHOST' environment variable. The default
1428 host to connect to.
1430 -- Variable: pg:user
1431 Initialized from the `PGUSER' environment variable. The default
1432 database user name.
1434 -- Variable: pg:options
1435 Initialized from the `PGOPTIONS' environment variable. Default
1436 additional server options.
1438 -- Variable: pg:port
1439 Initialized from the `PGPORT' environment variable. The default
1440 TCP port to connect to.
1442 -- Variable: pg:tty
1443 Initialized from the `PGTTY' environment variable. The default
1444 debugging TTY.
1446 Compatibility note: Debugging TTYs are turned off in the XEmacs
1447 Lisp binding.
1449 -- Variable: pg:database
1450 Initialized from the `PGDATABASE' environment variable. The
1451 default database to connect to.
1453 -- Variable: pg:realm
1454 Initialized from the `PGREALM' environment variable. The default
1455 Kerberos realm.
1457 -- Variable: pg:client-encoding
1458 Initialized from the `PGCLIENTENCODING' environment variable. The
1459 default client encoding.
1461 Compatibility note: This variable is not present in non-Mule
1462 XEmacsen. This variable is not present in versions of libpq prior
1463 to 7.0. In the current implementation, client encoding is
1464 equivalent to the `file-name-coding-system' format.
1466 -- Variable: pg:authtype
1467 Initialized from the `PGAUTHTYPE' environment variable. The
1468 default authentication scheme used.
1470 Compatibility note: This variable is unused in versions of libpq
1471 after 6.5. It is not implemented at all in the XEmacs Lisp
1472 binding.
1474 -- Variable: pg:geqo
1475 Initialized from the `PGGEQO' environment variable. Genetic
1476 optimizer options.
1478 -- Variable: pg:cost-index
1479 Initialized from the `PGCOSTINDEX' environment variable. Cost
1480 index options.
1482 -- Variable: pg:cost-heap
1483 Initialized from the `PGCOSTHEAP' environment variable. Cost heap
1484 options.
1486 -- Variable: pg:tz
1487 Initialized from the `PGTZ' environment variable. Default
1488 timezone.
1490 -- Variable: pg:date-style
1491 Initialized from the `PGDATESTYLE' environment variable. Default
1492 date style in returned date objects.
1494 -- Variable: pg-coding-system
1495 This is a variable controlling which coding system is used to
1496 encode non-ASCII strings sent to the database.
1498 Compatibility Note: This variable is not present in InfoDock.
1501 File: lispref.info, Node: libpq Lisp Symbols and DataTypes, Next: Synchronous Interface Functions, Prev: libpq Lisp Variables, Up: XEmacs PostgreSQL libpq API
1503 61.2.2 libpq Lisp Symbols and Datatypes
1504 ---------------------------------------
1506 The following set of symbols are used to represent the intermediate
1507 states involved in the asynchronous interface.
1509 -- Symbol: pgres::polling-failed
1510 Undocumented. A fatal error has occurred during processing of an
1511 asynchronous operation.
1513 -- Symbol: pgres::polling-reading
1514 An intermediate status return during an asynchronous operation. It
1515 indicates that one may use `select' before polling again.
1517 -- Symbol: pgres::polling-writing
1518 An intermediate status return during an asynchronous operation. It
1519 indicates that one may use `select' before polling again.
1521 -- Symbol: pgres::polling-ok
1522 An asynchronous operation has successfully completed.
1524 -- Symbol: pgres::polling-active
1525 An intermediate status return during an asynchronous operation.
1526 One can call the poll function again immediately.
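A driver loop that dispatches on these status symbols might look like the sketch below. The function name is hypothetical, and POLL-FUNCTION stands for whatever zero-argument function performs one polling step and returns one of the symbols above. A real client would wait on the connection's socket in the reading and writing cases; this sketch simply polls again.

```elisp
(defun my-pq-poll-until-done (poll-function)
  "Call POLL-FUNCTION until it reports completion or failure.
Return t on success and nil on failure."
  (let ((status (funcall poll-function))
        (done nil)
        (ok nil))
    (while (not done)
      (cond ((eq status 'pgres::polling-ok)
             (setq done t ok t))
            ((eq status 'pgres::polling-failed)
             (setq done t))
            (t
             ;; `pgres::polling-reading' and `pgres::polling-writing':
             ;; a real client would wait for the socket here.
             ;; `pgres::polling-active': polling again at once is allowed.
             (setq status (funcall poll-function)))))
    ok))
```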
1528 -- Function: pq-pgconn conn field
1529 CONN is a database connection object. FIELD is a symbol indicating
1530 which field of PGconn to fetch. Possible values are shown in the
1531 following table.
1539 Database user's password
1542 Hostname database server is running on
1545 TCP port number used in the connection
1550 Compatibility note: Debugging TTYs are not used in the
1554 Additional server options
1557 Connection status. Possible return values are shown in the
1560 The normal, connected status.
1562 `pg::connection-bad'
1563 The connection is not open and the PGconn object needs
1564 to be deleted by `pq-finish'.
1566 `pg::connection-started'
1567 An asynchronous connection has been started, but is not
1570 `pg::connection-made'
1571 An asynchronous connect has been made, and there is data
1574 `pg::connection-awaiting-response'
1575 Awaiting data from the backend during an asynchronous
1578 `pg::connection-auth-ok'
1579 Received authentication, waiting for the backend to
1582 `pg::connection-setenv'
1583 Negotiating environment during an asynchronous
1587 The last error message that was delivered to this connection.
1590 The process ID of the backend database server.
1592 The `PGresult' object is used by libpq to encapsulate the results of
1593 queries. The printed representation takes on four forms. When the
1594 PGresult object contains tuples from an SQL `SELECT' it will look like:
1596 (setq R (pq-exec P "SELECT * FROM xemacs_test;"))
1597 => #<PGresult PGRES_TUPLES_OK[5] - SELECT>
1599 The number in brackets indicates how many rows of data are available.
1600 When the PGresult object is the result of a command query that doesn't
1601 return anything, it will look like:
1603 (pq-exec P "CREATE TABLE a_new_table (i int);")
1604 => #<PGresult PGRES_COMMAND_OK - CREATE>
1606 When the query is a command-type query that can affect a
1607 number of different rows, but doesn't return any of them, it will look
1608 like:
1610 (progn
1611 (pq-exec P "INSERT INTO a_new_table VALUES (1);")
1612 (pq-exec P "INSERT INTO a_new_table VALUES (2);")
1613 (pq-exec P "INSERT INTO a_new_table VALUES (3);")
1614 (setq R (pq-exec P "DELETE FROM a_new_table;")))
1615 => #<PGresult PGRES_COMMAND_OK[3] - DELETE 3>
1617 Lastly, when the underlying PGresult object has been deallocated
1618 directly by `pq-clear' the printed representation will look like:
1620 (progn
1621 (setq R (pq-exec P "SELECT * FROM xemacs_test;"))
1622 (pq-clear R)
1623 R)
1624 => #<PGresult DEAD>
1626 The following set of functions are accessors to various data in the
1627 PGresult object.
1629 -- Function: pq-result-status result
1630 Return status of a query result. RESULT is a PGresult object.
1631 The return value is one of the symbols in the following table.
1632 `pgres::empty-query'
1633 A query contained no text. This is usually the result of a
1634 recoverable error, or a minor programming error.
1637 A query command that doesn't return anything was executed
1638 properly by the backend.
1641 A query command that returns tuples was executed properly by
1645 Copy Out data transfer is in progress.
1648 Copy In data transfer is in progress.
1650 `pgres::bad-response'
1651 An unexpected response was received from the backend.
1653 `pgres::nonfatal-error'
1654 Undocumented. This value is returned when the libpq function
1655 `PQresultStatus' is called with a `NULL' pointer.
1657 `pgres::fatal-error'
1658 Undocumented. An error has occurred in processing the query
1659 and the operation was not completed.
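As a hedged illustration of how the status symbols in the table above might be used, the following sketch treats a result as failed when `pq-result-status' returns one of the three error-like symbols. The helper name `my-pq-result-failed-p' is hypothetical and is not part of the XEmacs libpq binding.

```elisp
;; Hypothetical helper, not part of the libpq binding.
(defun my-pq-result-failed-p (result)
  "Return non-nil if RESULT carries one of the error statuses."
  (memq (pq-result-status result)
        '(pgres::bad-response
          pgres::nonfatal-error
          pgres::fatal-error)))
```

Comparing symbols with `memq' avoids the string comparison that would be needed with `pq-res-status'.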
1661 -- Function: pq-res-status result
1662 Return the query result status as a string, not a symbol. RESULT
1663 is a PGresult object.
1665 (setq R (pq-exec P "SELECT * FROM xemacs_test;"))
1666 => #<PGresult PGRES_TUPLES_OK[5] - SELECT>
1668 => "PGRES_TUPLES_OK"
1670 -- Function: pq-result-error-message result
1671 Return an error message generated by the query, if any. RESULT is
1674 (setq R (pq-exec P "SELECT * FROM xemacs-test;"))
1675 => <A fatal error is signaled in the echo area>
1676 (pq-result-error-message R)
1677 => "ERROR: parser: parse error at or near \"-\"
1680 -- Function: pq-ntuples result
1681 Return the number of tuples in the query result. RESULT is a
1684 (setq R (pq-exec P "SELECT * FROM xemacs_test;"))
1685 => #<PGresult PGRES_TUPLES_OK[5] - SELECT>
1689 -- Function: pq-nfields result
1690 Return the number of fields in each tuple of the query result.
1691 RESULT is a PGresult object.
1693 (setq R (pq-exec P "SELECT * FROM xemacs_test;"))
1694 => #<PGresult PGRES_TUPLES_OK[5] - SELECT>
1698 -- Function: pq-binary-tuples result
1699 Returns t if binary tuples are present in the results, nil
1700 otherwise. RESULT is a PGresult object.
1702 (setq R (pq-exec P "SELECT * FROM xemacs_test;"))
1703 => #<PGresult PGRES_TUPLES_OK[5] - SELECT>
1704 (pq-binary-tuples R)
1707 -- Function: pq-fname result field-index
1708 Returns the name of a specific field. RESULT is a PGresult object.
1709 FIELD-INDEX is the number of the column to select from. The first
1710 column is number zero.
1713 (setq R (pq-exec P "SELECT * FROM xemacs_test;"))
1714 (setq i (pq-nfields R))
1715 (while (>= (decf i) 0)
1716 (push (pq-fname R i) l))
1718 => ("id" "shikona" "rank")
1720 -- Function: pq-fnumber result field-name
1721 Return the field number corresponding to the given field name. -1
1722 is returned on a bad field name. RESULT is a PGresult object.
1723 FIELD-NAME is a string representing the field name to find.
1724 (setq R (pq-exec P "SELECT * FROM xemacs_test;"))
1725 => #<PGresult PGRES_TUPLES_OK[5] - SELECT>
1728 (pq-fnumber R "Not a field")
1731 -- Function: pq-ftype result field-num
1732 Return an integer code representing the data type of the specified
1733 column. RESULT is a PGresult object. FIELD-NUM is the field
1736 The return value of this function is the Object ID (Oid) in the
1737 database of the type. Further queries need to be made to various
1738 system tables in order to convert this value into something useful.
1740 -- Function: pq-fmod result field-num
1741 Return the type modifier code associated with a field. Field
1742 numbers start at zero. RESULT is a PGresult object. FIELD-NUM
1743 selects which field to use.
1745 -- Function: pq-fsize result field-index
1746 Return size of the given field. RESULT is a PGresult object.
1747 FIELD-INDEX selects which field to use.
1750 (setq R (pq-exec P "SELECT * FROM xemacs_test;"))
1751 (setq i (pq-nfields R))
1752 (while (>= (decf i) 0)
1753 (push (list (pq-ftype R i) (pq-fsize R i)) l))
1755 => ((23 23) (25 25) (25 25))
1757 -- Function: pq-get-value result tup-num field-num
1758 Retrieve a return value. RESULT is a PGresult object. TUP-NUM
1759 selects which tuple to fetch from. FIELD-NUM selects which field
1762 Both tuples and fields are numbered from zero.
1764 (setq R (pq-exec P "SELECT * FROM xemacs_test;"))
1765 => #<PGresult PGRES_TUPLES_OK[5] - SELECT>
1766 (pq-get-value R 0 1)
1768 (pq-get-value R 1 1)
1770 (pq-get-value R 2 1)
1773 -- Function: pq-get-length result tup-num field-num
1774 Return the length of a specific value. RESULT is a PGresult
1775 object. TUP-NUM selects which tuple to fetch from. FIELD-NUM
1776 selects which field to fetch from.
1778 (setq R (pq-exec P "SELECT * FROM xemacs_test;"))
1779 => #<PGresult PGRES_TUPLES_OK[5] - SELECT>
1780 (pq-get-length R 0 1)
1782 (pq-get-length R 1 1)
1784 (pq-get-length R 2 1)
1787 -- Function: pq-get-is-null result tup-num field-num
1788 Return t if the specific value is the SQL `NULL'. RESULT is a
1789 PGresult object. TUP-NUM selects which tuple to fetch from.
1790 FIELD-NUM selects which field to fetch from.
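The accessors above can be combined to read a whole tuple. The following is a minimal sketch, assuming the caller wants SQL `NULL' fields represented as `nil'; the helper name `my-pq-row' is hypothetical.

```elisp
;; Hypothetical helper: collect one tuple as a list of values,
;; using nil for SQL NULL fields.
(defun my-pq-row (result tup-num)
  "Return tuple TUP-NUM of RESULT as a list, with nil for NULLs."
  (let ((j (1- (pq-nfields result)))
        (row nil))
    ;; Walk the fields from last to first so that `push' builds
    ;; the list in column order.
    (while (>= j 0)
      (push (if (pq-get-is-null result tup-num j)
                nil
              (pq-get-value result tup-num j))
            row)
      (setq j (1- j)))
    row))
```

Checking `pq-get-is-null' first lets the caller tell a `NULL' field apart from an empty string.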
1792 -- Function: pq-cmd-status result
1793 Return a summary string from the query. RESULT is a PGresult
1795 (setq R (pq-exec P "INSERT INTO xemacs_test
1796 VALUES (6, 'Wakanohana', 'Yokozuna');"))
1797 => #<PGresult PGRES_COMMAND_OK[1] - INSERT 542086 1>
1799 => "INSERT 542086 1"
1800 (setq R (pq-exec P "UPDATE xemacs_test SET rank='retired'
1801 WHERE shikona='Wakanohana';"))
1802 => #<PGresult PGRES_COMMAND_OK[1] - UPDATE 1>
1806 Note that the first number returned from an insertion, like in the
1807 example, is an object ID number and will almost certainly vary from
1808 system to system since object ID numbers in Postgres must be unique
1809 across all databases.
1811 -- Function: pq-cmd-tuples result
1812 Return the number of tuples if the last command was an
1813 INSERT/UPDATE/DELETE. If the last command was something else, the
1814 empty string is returned. RESULT is a PGresult object.
1816 (setq R (pq-exec P "INSERT INTO xemacs_test VALUES
1817 (7, 'Takanohana', 'Yokozuna');"))
1818 => #<PGresult PGRES_COMMAND_OK[1] - INSERT 38688 1>
1821 (setq R (pq-exec P "SELECT * from xemacs_test;"))
1822 => #<PGresult PGRES_TUPLES_OK[7] - SELECT>
1825 (setq R (pq-exec P "DELETE FROM xemacs_test
1826 WHERE shikona LIKE '%hana';"))
1827 => #<PGresult PGRES_COMMAND_OK[2] - DELETE 2>
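Because the description above says an empty string is returned for other commands, the count appears to be delivered as a string. A minimal sketch, under that assumption, of turning it into an integer (the helper name `my-pq-affected-rows' is hypothetical):

```elisp
;; Hypothetical helper: convert the string from `pq-cmd-tuples'
;; into an integer, or nil when no count is available.
(defun my-pq-affected-rows (result)
  "Return the affected-row count of RESULT as an integer, or nil."
  (let ((count (pq-cmd-tuples result)))
    (if (string= count "")
        nil
      (string-to-number count))))
```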
1831 -- Function: pq-oid-value result
1832 Return the object id of the insertion if the last command was an
1833 INSERT. 0 is returned if the last command was not an insertion.
1834 RESULT is a PGresult object.
1836 In the first example, the numbers you will see on your local
1837 system will almost certainly be different; however, the second
1838 number from the right in the unprintable PGresult object and the
1839 number returned by `pq-oid-value' should match.
1840 (setq R (pq-exec P "INSERT INTO xemacs_test VALUES
1841 (8, 'Terao', 'Maegashira');"))
1842 => #<PGresult PGRES_COMMAND_OK[1] - INSERT 542089 1>
1845 (setq R (pq-exec P "SELECT shikona FROM xemacs_test
1846 WHERE rank='Maegashira';"))
1847 => #<PGresult PGRES_TUPLES_OK[2] - SELECT>
1851 -- Function: pq-make-empty-pgresult conn status
1852 Create an empty pgresult with the given status. CONN a database
1853 connection object. STATUS a value that can be returned by
1856 The caller is responsible for making sure the return value gets
1860 File: lispref.info, Node: Synchronous Interface Functions, Next: Asynchronous Interface Functions, Prev: libpq Lisp Symbols and DataTypes, Up: XEmacs PostgreSQL libpq API
1862 61.2.3 Synchronous Interface Functions
1863 --------------------------------------
1865 -- Function: pq-connectdb conninfo
1866 Establish a (synchronous) database connection. CONNINFO A string
1867 of blank-separated options. Options are of the form "OPTION =
1868 VALUE". If VALUE contains blanks, it must be single quoted.
1869 Blanks around the equal sign are optional. Multiple option
1870 assignments are blank-separated.
1871 (pq-connectdb "dbname=japanese port = 25432")
1872 => #<PGconn localhost:25432 steve/japanese>
1873 The printed representation of a database connection object has four
1874 fields. The first field is the hostname where the database server
1875 is running (in this case localhost), the second field is the port
1876 number, the third field is the database user name, and the fourth
1877 field is the name of the database.
1879 Database connection objects which have been disconnected and will
1880 generate an immediate error if they are used look like:
1882 Bad connections can be reestablished with `pq-reset', or deleted
1883 entirely with `pq-finish'.
1885 A database connection object that has been deleted looks like:
1886 (let ((P1 (pq-connectdb "")))
1891 Note that database connection objects are the most heavyweight
1892 objects in XEmacs Lisp at this writing, usually representing as
1893 much as several megabytes of virtual memory on the machine the
1894 database server is running on. It is wisest to explicitly delete
1895 them when you are finished with them, rather than letting garbage
1896 collection do it. An example idiom is:
1898 (let ((P (pq-connectdb "")))
1901 (...)) ; access database here
1904 The following options are available in the options string:
1906 Authentication type. Same as `PGAUTHTYPE'. This is no
1910 Database user name. Same as `PGUSER'.
1916 Database name. Same as `PGDATABASE'.
1919 Symbolic hostname. Same as `PGHOST'.
1922 Host address as four octets (e.g. 192.168.1.1).
1925 TCP port to connect to. Same as `PGPORT'.
1928 Debugging TTY. Same as `PGTTY'. This value is suppressed in
1929 the XEmacs Lisp API.
1932 Extra backend database options. Same as `PGOPTIONS'.
1933 A database connection object is returned regardless of whether a
1934 connection was established or not.
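Since a connection object is returned even when the connection failed, callers may want to check its status before using it. The following is a minimal sketch, assuming the `pq::status' field and the `pg::connection-ok' symbol used in the examples section of this manual; the helper name `my-pq-query-once' is hypothetical.

```elisp
;; Hypothetical helper: connect, run one query, and always
;; release the connection, even if an error is signaled.
(defun my-pq-query-once (conninfo query)
  "Connect using CONNINFO, run QUERY, and close the connection.
Return the PGresult, or nil if the connection is not usable."
  (let ((conn (pq-connectdb conninfo)))
    (unwind-protect
        (when (eq (pq-pgconn conn 'pq::status) 'pg::connection-ok)
          (pq-exec conn query))
      (pq-finish conn))))
```

A call such as `(my-pq-query-once "dbname=japanese" "SELECT * FROM xemacs_test;")' would then either return a result object or `nil', without leaving the connection for the garbage collector.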
1936 -- Function: pq-reset conn
1937 Reestablish database connection. CONN A database connection
1940 This function reestablishes a database connection using the
1941 original connection parameters. This is useful if something has
1942 happened to the TCP link and it has become broken.
1944 -- Function: pq-exec conn query
1945 Make a synchronous database query. CONN A database connection
1946 object. QUERY A string containing an SQL query. A PGresult
1947 object is returned, which in turn may be queried by its many
1948 accessor functions to retrieve state out of it. If the query
1949 string contains multiple SQL commands, only results from the final
1950 command are returned.
1952 (setq R (pq-exec P "SELECT * FROM xemacs_test;
1953 DELETE FROM xemacs_test WHERE id=8;"))
1954 => #<PGresult PGRES_COMMAND_OK[1] - DELETE 1>
1956 -- Function: pq-notifies conn
1957 Return the latest async notification that has not yet been handled.
1958 CONN A database connection object. If there has been a
1959 notification, then a list of two elements will be returned. The
1960 first element contains the relation name being notified, the second
1961 element contains the backend process ID number. nil is returned
1962 if there aren't any notifications to process.
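A minimal sketch of consuming the two-element list described above. The helper name `my-pq-report-notification' is hypothetical, and the sketch assumes nothing about when the backend delivers notifications.

```elisp
;; Hypothetical helper: report one pending notification, if any.
(defun my-pq-report-notification (conn)
  "Show the pending notification on CONN, if any.
Return the notification list, or nil if there was none."
  (let ((note (pq-notifies conn)))
    (when note
      (message "Relation %S notified by backend process %S"
               (nth 0 note) (nth 1 note)))
    note))
```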
1964 -- Function: PQsetenv conn
1965 Synchronous transfer of environment variables to a backend. CONN A
1966 database connection object.
1968 Environment variable transfer is done as a normal part of database
1971 Compatibility note: This function was present but not documented
1972 in versions of libpq prior to 7.0.
1975 File: lispref.info, Node: Asynchronous Interface Functions, Next: Large Object Support, Prev: Synchronous Interface Functions, Up: XEmacs PostgreSQL libpq API
1977 61.2.4 Asynchronous Interface Functions
1978 ---------------------------------------
1980 Command-by-command examples are impractical for the asynchronous
1981 interface functions. See the examples section for complete calling
1984 -- Function: pq-connect-start conninfo
1985 Begin establishing an asynchronous database connection. CONNINFO
1986 A string containing the connection options. See the documentation
1987 of `pq-connectdb' for a listing of all the available flags.
1989 -- Function: pq-connect-poll conn
1990 An intermediate function to be called during an asynchronous
1991 database connection. CONN A database connection object. The
1992 result codes are documented in a previous section.
1994 -- Function: pq-is-busy conn
1995 Returns t if `pq-get-result' would block waiting for input. CONN
1996 A database connection object.
1998 -- Function: pq-consume-input conn
1999 Consume any available input from the backend. CONN A database
2002 Nil is returned if anything bad happens.
2004 -- Function: pq-reset-start conn
2005 Reset connection to the backend asynchronously. CONN A database
2008 -- Function: pq-reset-poll conn
2009 Poll an asynchronous reset for completion. CONN A database
2012 -- Function: pq-reset-cancel conn
2013 Attempt to request cancellation of the current operation. CONN A
2014 database connection object.
2016 The return value is t if the cancel request was successfully
2017 dispatched, nil if not (in which case conn->errorMessage is set).
2018 Note: successful dispatch is no guarantee that there will be any
2019 effect at the backend. The application must read the operation
2022 -- Function: pq-send-query conn query
2023 Submit a query to Postgres and don't wait for the result. CONN A
2024 database connection object. Returns t if successfully submitted,
2025 nil if an error occurred (conn->errorMessage is set).
2027 -- Function: pq-get-result conn
2028 Retrieve an asynchronous result from a query. CONN A database
2031 `nil' is returned when no more query work remains.
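The pairing of `pq-send-query' and `pq-get-result' can be sketched as a simple loop that collects results until `nil' comes back. This is only an illustration: the helper name `my-pq-collect-results' is hypothetical, and the loop may block while waiting for the backend, which the timer-based poller in the examples section avoids by checking `pq-is-busy' first.

```elisp
;; Hypothetical helper: submit QUERY and gather every result.
(defun my-pq-collect-results (conn query)
  "Send QUERY on CONN and return all of its results, in order."
  (let ((results nil)
        (r nil))
    (when (pq-send-query conn query)
      ;; `pq-get-result' returns nil once no more work remains.
      (while (setq r (pq-get-result conn))
        (push r results)))
    (nreverse results)))
```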
2033 -- Function: pq-set-nonblocking conn arg
2034 Sets the PGconn's database connection non-blocking if the arg is
2035 TRUE, or makes it blocking if the arg is FALSE. This will not
2036 protect you from PQexec(); you'll only be safe when using the
2037 non-blocking API. CONN A database connection object.
2039 -- Function: pq-is-nonblocking conn
2040 Return the blocking status of the database connection. CONN A
2041 database connection object.
2043 -- Function: pq-flush conn
2044 Force the write buffer to be written (or at least try). CONN A
2045 database connection object.
2047 -- Function: PQsetenvStart conn
2048 Start asynchronously passing environment variables to a backend.
2049 CONN A database connection object.
2051 Compatibility note: this function is only available with libpq-7.0.
2053 -- Function: PQsetenvPoll conn
2054 Check an asynchronous environment variables transfer for
2055 completion. CONN A database connection object.
2057 Compatibility note: this function is only available with libpq-7.0.
2059 -- Function: PQsetenvAbort conn
2060 Attempt to terminate an asynchronous environment variables
2061 transfer. CONN A database connection object.
2063 Compatibility note: this function is only available with libpq-7.0.
2066 File: lispref.info, Node: Large Object Support, Next: Other libpq Functions, Prev: Asynchronous Interface Functions, Up: XEmacs PostgreSQL libpq API
2068 61.2.5 Large Object Support
2069 ---------------------------
2071 -- Function: pq-lo-import conn filename
2072 Import a file as a large object into the database. CONN a
2073 database connection object. FILENAME filename to import.
2075 On success, the object id is returned.
2077 -- Function: pq-lo-export conn oid filename
2078 Copy a large object in the database into a file. CONN a database
2079 connection object. OID object id number of a large object.
2080 FILENAME filename to export to.
2083 File: lispref.info, Node: Other libpq Functions, Next: Unimplemented libpq Functions, Prev: Large Object Support, Up: XEmacs PostgreSQL libpq API
2085 61.2.6 Other libpq Functions
2086 ----------------------------
2088 -- Function: pq-finish conn
2089 Destroy a database connection object by calling free on it. CONN
2090 a database connection object.
2092 It is possible to not call this routine because the usual XEmacs
2093 garbage collection mechanism will call the underlying libpq
2094 routine whenever it is releasing stale `PGconn' objects. However,
2095 this routine is useful in `unwind-protect' clauses to make
2096 connections go away quickly when unrecoverable errors have
2099 After calling this routine, the printed representation of the
2100 XEmacs wrapper object will contain the string "DEAD".
2102 -- Function: pq-client-encoding conn
2103 Return the client encoding as an integer code. CONN a database
2106 (pq-client-encoding P)
2109 Compatibility note: This function did not exist prior to libpq-7.0
2110 and does not exist in a non-Mule XEmacs.
2112 -- Function: pq-set-client-encoding conn encoding
2113 Set client coding system. CONN a database connection object.
2114 ENCODING a string representing the desired coding system.
2116 (pq-set-client-encoding P "EUC_JP")
2119 The current idiom for ensuring proper coding system conversion is
2120 the following (illustrated for EUC Japanese encoding):
2121 (setq P (pq-connectdb "..."))
2122 (let ((file-name-coding-system 'euc-jp)
2123 (pg-coding-system 'euc-jp))
2124 (pq-set-client-encoding P "EUC_JP")
2127 Compatibility note: This function did not exist prior to libpq-7.0
2128 and does not exist in a non-Mule XEmacs.
2130 -- Function: pq-env-2-encoding
2131 Return the integer code representing the coding system in
2136 Compatibility note: This function did not exist prior to libpq-7.0
2137 and does not exist in a non-Mule XEmacs.
2139 -- Function: pq-clear res
2140 Destroy a query result object by calling free() on it. RES a
2143 Note: The memory allocation systems of libpq and XEmacs are
2144 different. The XEmacs representation of a query result object
2145 will have both the XEmacs version and the libpq version freed at
2146 the next garbage collection when the object is no longer being
2147 referenced. Calling this function does not release the XEmacs
2148 object, it is still subject to the usual rules for Lisp objects.
2149 The printed representation of the XEmacs object will contain the
2150 string "DEAD" after this routine is called indicating that it is no
2151 longer useful for anything.
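A minimal sketch of releasing a result promptly instead of waiting for garbage collection, in the same spirit as the `unwind-protect' idiom recommended for connections. The helper name `my-pq-count-rows' is hypothetical.

```elisp
;; Hypothetical helper: run QUERY, return its tuple count, and
;; free the libpq result right away.
(defun my-pq-count-rows (conn query)
  "Return the number of tuples QUERY yields on CONN."
  (let ((r (pq-exec conn query)))
    (unwind-protect
        (pq-ntuples r)
      ;; After this call R prints as DEAD and must not be reused.
      (pq-clear r))))
```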
2153 -- Function: pq-conn-defaults
2154 Return a data structure that represents the connection defaults.
2155 The data is returned as a list of lists, where each sublist
2156 contains info regarding a single option.
2159 File: lispref.info, Node: Unimplemented libpq Functions, Prev: Other libpq Functions, Up: XEmacs PostgreSQL libpq API
2161 61.2.7 Unimplemented libpq Functions
2162 ------------------------------------
2164 -- Unimplemented Function: PGconn *PQsetdbLogin (char *pghost, char
2165 *pgport, char *pgoptions, char *pgtty, char *dbName, char
2167 Synchronous database connection. PGHOST is the hostname of the
2168 PostgreSQL backend to connect to. PGPORT is the TCP port number
2169 to use. PGOPTIONS specifies other backend options. PGTTY
2170 specifies the debugging tty to use. DBNAME specifies the database
2171 name to use. LOGIN specifies the database user name. PWD
2172 specifies the database user's password.
2174 This routine is deprecated as of libpq-7.0, and its functionality
2175 can be replaced by external Lisp code if needed.
2177 -- Unimplemented Function: PGconn *PQsetdb (char *pghost, char
2178 *pgport, char *pgoptions, char *pgtty, char *dbName)
2179 Synchronous database connection. PGHOST is the hostname of the
2180 PostgreSQL backend to connect to. PGPORT is the TCP port number
2181 to use. PGOPTIONS specifies other backend options. PGTTY
2182 specifies the debugging tty to use. DBNAME specifies the database
2185 This routine was deprecated in libpq-6.5.
2187 -- Unimplemented Function: int PQsocket (PGconn *conn)
2188 Return socket file descriptor to a backend database process. CONN
2189 database connection object.
2191 -- Unimplemented Function: void PQprint (FILE *fout, PGresult *res,
2193 Print out the results of a query to a designated C stream. FOUT C
2194 stream to print to. RES the query result object to print. PS the
2195 print options structure.
2197 This routine is deprecated as of libpq-7.0 and cannot be sensibly
2198 exported to XEmacs Lisp.
2200 -- Unimplemented Function: void PQdisplayTuples (PGresult *res, FILE
2201 *fp, int fillAlign, char *fieldSep, int printHeader, int
2203 RES query result object to print. FP C stream to print to. FILLALIGN
2204 pad the fields with spaces. FIELDSEP field separator. PRINTHEADER
2205 display headers? QUIET
2207 This routine was deprecated in libpq-6.5.
2209 -- Unimplemented Function: void PQprintTuples (PGresult *res, FILE
2210 *fout, int printAttName, int terseOutput, int width)
2211 RES query result object to print. FOUT C stream to print to.
2212 PRINTATTNAME print attribute names. TERSEOUTPUT delimiter bars.
2213 WIDTH width of column, if 0, use variable width.
2215 This routine was deprecated in libpq-6.5.
2217 -- Unimplemented Function: int PQmblen (char *s, int encoding)
2218 Determine length of a multibyte encoded char at `*s'. S encoded
2219 string. ENCODING type of encoding.
2221 Compatibility note: This function was introduced in libpq-7.0.
2223 -- Unimplemented Function: void PQtrace (PGconn *conn, FILE
2225 Enable tracing on `debug_port'. CONN database connection object.
2226 DEBUG_PORT C output stream to use.
2228 -- Unimplemented Function: void PQuntrace (PGconn *conn)
2229 Disable tracing. CONN database connection object.
2231 -- Unimplemented Function: char *PQoidStatus (PGconn *conn)
2232 Return the object id as a string of the last tuple inserted. CONN
2233 database connection object.
2235 Compatibility note: This function is deprecated in libpq-7.0,
2236 however it is used internally by the XEmacs binding code when
2237 linked against versions prior to 7.0.
2239 -- Unimplemented Function: PGresult *PQfn (PGconn *conn, int fnid, int
2240 *result_buf, int *result_len, int result_is_int, PQArgBlock
2242 "Fast path" interface -- not really recommended for application use.
2243 CONN A database connection object. FNID RESULT_BUF RESULT_LEN
2244 RESULT_IS_INT ARGS NARGS
2246 The following set of very low level large object functions aren't
2247 appropriate to be exported to Lisp.
2249 -- Unimplemented Function: int pq-lo-open (PGconn *conn, int lobjid,
2251 CONN a database connection object. LOBJID a large object ID.
2254 -- Unimplemented Function: int pq-lo-close (PGconn *conn, int fd)
2255 CONN a database connection object. FD a large object file
2258 -- Unimplemented Function: int pq-lo-read (PGconn *conn, int fd, char
2260 CONN a database connection object. FD a large object file
2261 descriptor. BUF buffer to read into. LEN size of buffer.
2263 -- Unimplemented Function: int pq-lo-write (PGconn *conn, int fd, char
2265 CONN a database connection object. FD a large object file
2266 descriptor. BUF buffer to write from. LEN size of buffer.
2268 -- Unimplemented Function: int pq-lo-lseek (PGconn *conn, int fd, int
2270 CONN a database connection object. FD a large object file
2271 descriptor. OFFSET WHENCE
2273 -- Unimplemented Function: int pq-lo-creat (PGconn *conn, int mode)
2274 CONN a database connection object. MODE opening modes.
2276 -- Unimplemented Function: int pq-lo-tell (PGconn *conn, int fd)
2277 CONN a database connection object. FD a large object file
2280 -- Unimplemented Function: int pq-lo-unlink (PGconn *conn, int lobjid)
2281 CONN a database connection object. LOBJID a large object ID.
2284 File: lispref.info, Node: XEmacs PostgreSQL libpq Examples, Prev: XEmacs PostgreSQL libpq API, Up: PostgreSQL Support
2286 61.3 XEmacs PostgreSQL libpq Examples
2287 =====================================
2289 This is an example of one method of establishing an asynchronous
2292 (defun database-poller (P)
2293 (message "%S before poll" (pq-pgconn P 'pq::status))
2295 (message "%S after poll" (pq-pgconn P 'pq::status))
2296 (if (eq (pq-pgconn P 'pq::status) 'pg::connection-ok)
2298 (add-timeout .1 'database-poller P)))
2301 (setq P (pq-connect-start ""))
2302 (add-timeout .1 'database-poller P))
2303 => pg::connection-started before poll
2304 => pg::connection-made after poll
2305 => pg::connection-made before poll
2306 => pg::connection-awaiting-response after poll
2307 => pg::connection-awaiting-response before poll
2308 => pg::connection-auth-ok after poll
2309 => pg::connection-auth-ok before poll
2310 => pg::connection-setenv after poll
2311 => pg::connection-setenv before poll
2312 => pg::connection-ok after poll
2315 => #<PGconn localhost:25432 steve/steve>
2317 Here is an example of one method of doing an asynchronous reset.
2319 (defun database-poller (P)
2321 (message "%S before poll" (pq-pgconn P 'pq::status))
2322 (setq PS (pq-reset-poll P))
2323 (message "%S after poll [%S]" (pq-pgconn P 'pq::status) PS)
2324 (if (eq (pq-pgconn P 'pq::status) 'pg::connection-ok)
2326 (add-timeout .1 'database-poller P))))
2330 (add-timeout .1 'database-poller P))
2331 => pg::connection-started before poll
2332 => pg::connection-made after poll [pgres::polling-writing]
2333 => pg::connection-made before poll
2334 => pg::connection-awaiting-response after poll [pgres::polling-reading]
2335 => pg::connection-awaiting-response before poll
2336 => pg::connection-setenv after poll [pgres::polling-reading]
2337 => pg::connection-setenv before poll
2338 => pg::connection-ok after poll [pgres::polling-ok]
2341 => #<PGconn localhost:25432 steve/steve>
2343 And finally, an asynchronous query.
2345 (defun database-poller (P)
2347 (pq-consume-input P)
2349 (add-timeout .1 'database-poller P)
2350 (setq R (pq-get-result P))
2353 (push R result-list)
2354 (add-timeout .1 'database-poller P))))))
2356 (when (pq-send-query P "SELECT * FROM xemacs_test;")
2357 (setq result-list nil)
2358 (add-timeout .1 'database-poller P))
2362 => (#<PGresult PGRES_TUPLES_OK - SELECT>)
2364 Here is an example showing how multiple SQL statements in a single
2365 query can have all their results collected.
2366 ;; Using the same `database-poller' function from the previous example
2367 (when (pq-send-query P "SELECT * FROM xemacs_test;
2368 SELECT * FROM pg_database;
2369 SELECT * FROM pg_user;")
2370 (setq result-list nil)
2371 (add-timeout .1 'database-poller P))
2375 => (#<PGresult PGRES_TUPLES_OK - SELECT> #<PGresult PGRES_TUPLES_OK - SELECT> #<PGresult PGRES_TUPLES_OK - SELECT>)
2377 Here is an example which illustrates collecting all data from a
2378 query, including the field names.
2380 (defun pg-util-query-results (R)
2381 "Retrieve results of last SQL query into a list structure."
2382 (let ((i (1- (pq-ntuples R)))
2385 (setq j (1- (pq-nfields R)))
2388 (push (pq-get-value R i j) l2)
2392 (setq j (1- (pq-nfields R)))
2395 (push (pq-fname R j) l2)
2399 => pg-util-query-results
2400 (setq R (pq-exec P "SELECT * FROM xemacs_test ORDER BY field2 DESC;"))
2401 => #<PGresult PGRES_TUPLES_OK - SELECT>
2402 (pg-util-query-results R)
2403 => (("f1" "field2") ("a" "97") ("b" "97") ("stuff" "42") ("a string" "12") ("foo" "10") ("string" "2") ("text" "1"))
2405 Here is an example of a query that uses a database cursor.
2408 (setq R (pq-exec P "BEGIN;"))
2409 (setq R (pq-exec P "DECLARE k_cursor CURSOR FOR SELECT * FROM xemacs_test ORDER BY f1 DESC;"))
2411 (setq R (pq-exec P "FETCH k_cursor;"))
2412 (while (eq (pq-ntuples R) 1)
2413 (push (list (pq-get-value R 0 0) (pq-get-value R 0 1)) data)
2414 (setq R (pq-exec P "FETCH k_cursor;")))
2415 (setq R (pq-exec P "END;"))
2417 => (("a" "97") ("a string" "12") ("b" "97") ("foo" "10") ("string" "2") ("stuff" "42") ("text" "1"))
2419 Here's another example of cursors, this time with a Lisp macro to
2420 implement a mapping function over a table.
2422 (defmacro map-db (P table condition callout)
2424 (pq-exec ,P "BEGIN;")
2425 (pq-exec ,P (concat "DECLARE k_cursor CURSOR FOR SELECT * FROM "
2429 " ORDER BY f1 DESC;"))
2430 (setq R (pq-exec ,P "FETCH k_cursor;"))
2431 (while (eq (pq-ntuples R) 1)
2432 (,callout (pq-get-value R 0 0) (pq-get-value R 0 1))
2433 (setq R (pq-exec ,P "FETCH k_cursor;")))
2434 (pq-exec ,P "END;")))
2436 (defun callback (arg1 arg2)
2437 (message "arg1 = %s, arg2 = %s" arg1 arg2))
2439 (map-db P "xemacs_test" "WHERE field2 > 10" callback)
2440 => arg1 = stuff, arg2 = 42
2441 => arg1 = b, arg2 = 97
2442 => arg1 = a string, arg2 = 12
2443 => arg1 = a, arg2 = 97
2444 => #<PGresult PGRES_COMMAND_OK - COMMIT>
2447 File: lispref.info, Node: Internationalization, Next: MULE, Prev: PostgreSQL Support, Up: Top
2449 62 Internationalization
2450 ***********************
2454 * I18N Levels 1 and 2:: Support for different time, date, and currency formats.
2455 * I18N Level 3:: Support for localized messages.
2456 * I18N Level 4:: Support for Asian languages.
2459 File: lispref.info, Node: I18N Levels 1 and 2, Next: I18N Level 3, Up: Internationalization
2461 62.1 I18N Levels 1 and 2
2462 ========================
2464 XEmacs is now compliant with I18N levels 1 and 2. Specifically, this
2465 means that it is 8-bit clean and correctly handles time and date
2466 functions. XEmacs will correctly display the entire ISO-Latin 1
2469 The compose key may now be used to create any character in the
2470 ISO-Latin 1 character set not directly available via the keyboard. In
2471 order for the compose key to work it is necessary to load the file
2472 `x-compose.el'. At any time while composing a character, `C-h' will
2473 display all valid completions and the character which would be produced.
2476 File: lispref.info, Node: I18N Level 3, Next: I18N Level 4, Prev: I18N Levels 1 and 2, Up: Internationalization
2484 * Level 3 Primitives::
2485 * Dynamic Messaging::
2486 * Domain Specification::
2487 * Documentation String Extraction::
2490 File: lispref.info, Node: Level 3 Basics, Next: Level 3 Primitives, Up: I18N Level 3
2492 62.2.1 Level 3 Basics
2493 ---------------------
2495 XEmacs now provides alpha-level functionality for I18N Level 3. This
2496 means that everything necessary for full messaging is available, but
2497 not every file has been converted.
2499 The two message files which have been created are `src/emacs.po' and
2500 `lisp/packages/mh-e.po'. Both files need to be converted using
2501 `msgfmt', and the resulting `.mo' files placed in some locale's
2502 `LC_MESSAGES' directory. The test "translations" in these files are
2503 the original messages prefixed by `TRNSLT_'.
2505 The domain for a variable is stored on the variable's property list
2506 under the property name VARIABLE-DOMAIN. The function
2507 `documentation-property' uses this information when translating a
2508 variable's documentation.
2511 File: lispref.info, Node: Level 3 Primitives, Next: Dynamic Messaging, Prev: Level 3 Basics, Up: I18N Level 3
2513 62.2.2 Level 3 Primitives
2514 -------------------------
2516 -- Function: gettext string
2517 This function looks up STRING in the default message domain and
2518 returns its translation. If `I18N3' was not enabled when XEmacs
2519 was compiled, it just returns STRING.
2521 -- Function: dgettext domain string
2522 This function looks up STRING in the specified message domain and
2523 returns its translation. If `I18N3' was not enabled when XEmacs
2524 was compiled, it just returns STRING.
2526 -- Function: bind-text-domain domain pathname
2527 This function associates a pathname with a message domain. Here's
2528 how the path to the message file is constructed under SunOS 5.x:
2530 `{pathname}/{LANG}/LC_MESSAGES/{domain}.mo'
2532 If `I18N3' was not enabled when XEmacs was compiled, this function
2535 -- Special Form: domain string
2536 This special form specifies the text domain used for translating
2537 documentation strings and interactive prompts of a function. For
2540 (defun foo (arg) "Doc string" (domain "emacs-foo") ...)
2542 to specify `emacs-foo' as the text domain of the function `foo'.
2543 The "call" to `domain' is actually a declaration rather than a
2544 function; when actually called, `domain' just returns `nil'.
2546 -- Function: domain-of function
2547 This function returns the text domain of FUNCTION; it returns
2548 `nil' if it is the default domain. If `I18N3' was not enabled
2549 when XEmacs was compiled, it always returns `nil'.
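As a rough sketch of how these primitives fit together, a package might bind its domain to a directory and then look up a message in that domain. The helper name `my-gorilla-message' and any directory passed to it are hypothetical; without `I18N3' support the lookup simply returns the original string.

```elisp
;; Hypothetical helper: bind the "emacs-gorilla" domain to
;; LOCALE-DIR, then translate one message from that domain.
(defun my-gorilla-message (locale-dir)
  "Return the translation of \"What gorilla?\" found under LOCALE-DIR."
  (bind-text-domain "emacs-gorilla" locale-dir)
  (dgettext "emacs-gorilla" "What gorilla?"))
```

With a directory such as "/usr/local/lib/locale" (an assumed location), the message catalog would be looked up as `{pathname}/{LANG}/LC_MESSAGES/emacs-gorilla.mo', following the pattern shown for `bind-text-domain'.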
2552 File: lispref.info, Node: Dynamic Messaging, Next: Domain Specification, Prev: Level 3 Primitives, Up: I18N Level 3
2554 62.2.3 Dynamic Messaging
2555 ------------------------
2557 The `format' function has been extended to permit you to change the
2558 order of parameter insertion. For example, the conversion format
2559 `%1$s' inserts parameter one as a string, while `%2$s' inserts
2560 parameter two. This is useful when creating translations which require
2561 you to change the word order.
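A minimal sketch of the reordering, assuming two message templates that
a translator might supply. The function name `deletion-message' and the
sample arguments are hypothetical; only the `%N$s' conversions come from
the description above.

```elisp
;; The calling code always passes the file name first and the user
;; name second; each template decides where they appear.
(defun deletion-message (template)
  "Format TEMPLATE with a fixed file name and user name."
  (format template "notes.txt" "alice"))

(deletion-message "%1$s was deleted by %2$s")
;; => "notes.txt was deleted by alice"

(deletion-message "%2$s deleted %1$s")
;; => "alice deleted notes.txt"
```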
2564 File: lispref.info, Node: Domain Specification, Next: Documentation String Extraction, Prev: Dynamic Messaging, Up: I18N Level 3
2566 62.2.4 Domain Specification
2567 ---------------------------
2569 The default message domain of XEmacs is `emacs'. For add-on packages,
2570 it is best to use a different domain. For example, let us say we want
2571 to convert the "gorilla" package to use the domain `emacs-gorilla'. To
2572 translate the message "What gorilla?", use `dgettext' as follows:
2574 (dgettext "emacs-gorilla" "What gorilla?")
2576 A function (or macro) which has a documentation string or an
2577 interactive prompt needs to be associated with the domain in order for
2578 the documentation or prompt to be translated. This is done with the
2579 `domain' special form as follows:
2581 (defun scratch (location)
2582 "Scratch the specified location."
2583 (domain "emacs-gorilla")
2584 (interactive "sScratch: ")
2587 It is most efficient to specify the domain in the first line of the
2588 function body, before the `interactive' form.
2590 For variables and constants which have documentation strings,
2591 specify the domain after the documentation.
2593 -- Special Form: defvar symbol [value [doc-string [domain]]]
2595 (defvar weight 250 "Weight of gorilla, in pounds." "emacs-gorilla")
2597 -- Special Form: defconst symbol [value [doc-string [domain]]]
2599 (defconst limbs 4 "Number of limbs" "emacs-gorilla")
2601 -- Function: autoload function filename &optional docstring
2603 This function defines FUNCTION to autoload from FILENAME. Example:
2604 (autoload 'explore "jungle" "Explore the jungle." nil nil "emacs-gorilla")
2607 File: lispref.info, Node: Documentation String Extraction, Prev: Domain Specification, Up: I18N Level 3
2609 62.2.5 Documentation String Extraction
2610 --------------------------------------
2612 The utility `etc/make-po' scans the file `DOC' to extract documentation
2613 strings and creates a message file `doc.po'. This file may then be
2614 inserted within `emacs.po'.
2616 Currently, `make-po' is hard-coded to read from `DOC' and write to
2617 `doc.po'. In order to extract documentation strings from an add-on
2618 package, first run `make-docfile' on the package to produce the `DOC'
2619 file. Then run `make-po -p'; the `-p' argument indicates that we
2620 are extracting documentation for an add-on package.
2622 (The `-p' argument is a kludge to make up for a subtle difference
2623 between pre-loaded documentation and add-on documentation: For add-on
2624 packages, the final carriage returns in the strings produced by
2625 `make-docfile' must be ignored.)
2628 File: lispref.info, Node: I18N Level 4, Prev: I18N Level 3, Up: Internationalization
2633 The Asian-language support in XEmacs is called "MULE". *Note MULE::.
2636 File: lispref.info, Node: MULE, Next: Tips, Prev: Internationalization, Up: Top
2641 "MULE" is the name originally given to the version of GNU Emacs
2642 extended for multi-lingual (and in particular Asian-language) support.
2643 "MULE" is short for "MUlti-Lingual Emacs". It is an extension and
2644 complete rewrite of Nemacs ("Nihon Emacs" where "Nihon" is the Japanese
2645 word for "Japan"), which only provided support for Japanese. XEmacs
2646 refers to its multi-lingual support as "MULE support" since it is based
2651 * Internationalization Terminology::
2652 Definition of various internationalization terms.
2653 * Charsets:: Sets of related characters.
2654 * MULE Characters:: Working with characters in XEmacs/MULE.
2655 * Composite Characters:: Making new characters by overstriking other ones.
2656 * Coding Systems:: Ways of representing a string of chars using integers.
2657 * CCL:: A special language for writing fast converters.
2658 * Category Tables:: Subdividing charsets into groups.
2661 File: lispref.info, Node: Internationalization Terminology, Next: Charsets, Up: MULE
2663 63.1 Internationalization Terminology
2664 =====================================
2666 In internationalization terminology, a string of text is divided up
2667 into "characters", which are the printable units that make up the text.
2668 A single character is (for example) a capital `A', the number `2', a
2669 Katakana character, a Hangul character, a Kanji ideograph (an
2670 "ideograph" is a "picture" character, such as is used in Japanese
2671 Kanji, Chinese Hanzi, and Korean Hanja; typically there are thousands
2672 of such ideographs in each language), etc. The basic property of a
2673 character is that it is the smallest unit of text with semantic
2674 significance in text processing.
2676 Human beings normally process text visually, so to a first
2677 approximation a character may be identified with its shape. Note that
2678 the same character may be drawn by two different people (or in two
2679 different fonts) in slightly different ways, although the "basic shape"
2680 will be the same. But consider the works of Scott Kim; human beings
2681 can recognize hugely variant shapes as the "same" character.
2682 Sometimes, especially where characters are extremely complicated to
2683 write, completely different shapes may be defined as the "same"
2684 character in national standards. The Taiwanese variant of Hanzi is
2685 generally the most complicated; over the centuries, the Japanese,
2686 Koreans, and the People's Republic of China have adopted
2687 simplifications of the shape, but the line of descent from the original
2688 shape is recorded, and the meanings and pronunciation of different
2689 forms of the same character are considered to be identical within each
2690 language. (Of course, it may take a specialist to recognize the
2691 related form; the point is that the relations are standardized, despite
2692 the differing shapes.)
2694 In some cases, the differences will be significant enough that it is
2695 actually possible to identify two or more distinct shapes that both
2696 represent the same character. For example, the lowercase letters `a'
2697 and `g' each have two distinct possible shapes--the `a' can optionally
2698 have a curved tail projecting off the top, and the `g' can be formed
2699 either of two loops, or of one loop and a tail hanging off the bottom.
2700 Such distinct possible shapes of a character are called "glyphs". The
2701 important characteristic of two glyphs making up the same character is
2702 that the choice between one or the other is purely stylistic and has no
2703 linguistic effect on a word (this is the reason why a capital `A' and
2704 lowercase `a' are different characters rather than different
2705 glyphs--e.g. `Aspen' is a city while `aspen' is a kind of tree).
2707 Note that "character" and "glyph" are used differently here than
2708 elsewhere in XEmacs.
2710 A "character set" is essentially a set of related characters. ASCII,
2711 for example, is a set of 94 characters (or 128, if you count
2712 non-printing characters). Other character sets are ISO8859-1 (ASCII
2713 plus various accented characters and other international symbols), JIS
2714 X 0201 (ASCII, more or less, plus half-width Katakana), JIS X 0208
2715 (Japanese Kanji), JIS X 0212 (a second set of less-used Japanese Kanji),
2716 GB2312 (Mainland Chinese Hanzi), etc.
2718 The definition of a character set will implicitly or explicitly give
2719 it an "ordering", a way of assigning a number to each character in the
2720 set. For many character sets, there is a natural ordering, for example
2721 the "ABC" ordering of the Roman letters. But it is not clear whether
2722 digits should come before or after the letters, and in fact different
2723 European languages treat the ordering of accented characters
2724 differently. It is useful to use the natural order where available, of
2725 course. The number assigned to any particular character is called the
2726 character's "code point". (Within a given character set, each
2727 character has a unique code point. Thus the word "set" is ill-chosen;
2728 different orderings of the same characters are different character sets.
2729 Identifying characters is simple enough for alphabetic character sets,
2730 but the difference in ordering can cause great headaches when the same
2731 thousands of characters are used by different cultures as in the Hanzi.)
2733 A code point may be broken into a number of "position codes". The
2734 number of position codes required to index a particular character in a
2735 character set is called the "dimension" of the character set. For
2736 practical purposes, a position code may be thought of as a byte-sized
2737 index. The printing characters of ASCII form a relatively small
2738 character set of dimension one, and each character in the set is
2739 indexed using a single position code, in the range 1 through 94. Use of
2740 this unusual range, rather than the familiar 33 through 126, is an
2741 intentional abstraction; to understand the programming issues you must
2742 break the equation between character sets and encodings.
2744 JIS X 0208, i.e. Japanese Kanji, has thousands of characters, and is
2745 of dimension two - every character is indexed by two position codes,
2746 each in the range 1 through 94. (This number "94" is not a
2747 coincidence; we shall see that the JIS position codes were chosen so
2748 that JIS kanji could be encoded without using codes that in ASCII are
2749 associated with device control functions.) Note that the choice of the
2750 range here is somewhat arbitrary. You could just as easily index the
2751 printing characters in ASCII using numbers in the range 0 through 93, 2
2752 through 95, 3 through 96, etc. In fact, the standardized _encoding_
2753 for the ASCII _character set_ uses the range 33 through 126.
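To make the idea of dimension concrete, here is a small sketch that
folds the two position codes of a 94x94 character set into one linear
index. The function name `position-codes-to-index' and the zero-based
numbering are assumptions chosen for illustration, not part of any
standard.

```elisp
;; Hypothetical helper for a dimension-2 character set whose position
;; codes each run from 1 through 94.
(defun position-codes-to-index (row column)
  "Return a zero-based linear index for position codes ROW and COLUMN."
  (+ (* 94 (1- row)) (1- column)))

(position-codes-to-index 1 1)    ; => 0
(position-codes-to-index 2 1)    ; => 94
(position-codes-to-index 94 94)  ; => 8835, the last of 94*94 = 8836 slots
```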
2755 An "encoding" is a way of numerically representing characters from
2756 one or more character sets as a stream of like-sized numerical values
2757 called "words"; typically these are 8-bit, 16-bit, or 32-bit
2758 quantities. If an encoding encompasses only one character set, then the
2759 position codes for the characters in that character set could be used
2760 directly. (This is the case with the trivial cipher used by children,
2761 assigning 1 to `A', 2 to `B', and so on.) However, even with ASCII,
2762 other considerations intrude. For example, why are the upper- and
2763 lowercase alphabets separated by 6 characters? Why do the digits start
2764 with `0' being assigned the code 48? In both cases because semantically
2765 interesting operations (case conversion and numerical value extraction)
2766 become convenient masking operations. Other artificial aspects (the
2767 control characters being assigned to codes 0-31 and 127) are historical
2768 accidents. (The use of 127 for `DEL' is an artifact of the "punch
2769 once" nature of paper tape, for example.)
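The masking operations mentioned above can be sketched as follows. The
helper names are hypothetical, and plain integers are used for the
ASCII codes so that the example does not depend on how a given Emacs
represents characters.

```elisp
;; The codes of `0' through `9' are 48 through 57, so the low four
;; bits of a digit's code are its numeric value.
(defun ascii-digit-value (code)
  "Return the numeric value of the ASCII digit whose code is CODE."
  (logand code 15))

;; The codes of `A' (65) and `a' (97) differ only in the bit worth 32.
(defun ascii-toggle-case (code)
  "Flip the case of the ASCII letter whose code is CODE."
  (logxor code 32))

(ascii-digit-value 55)   ; 55 is the code of `7'  => 7
(ascii-toggle-case 97)   ; 97 is the code of `a'  => 65, the code of `A'
(ascii-toggle-case 65)   ; => 97
```

Neither helper checks its argument; they are only meaningful for digit
codes and letter codes respectively.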
2771 Naive use of the position code is not possible, however, if more than
2772 one character set is to be used in the encoding. For example, printed
2773 Japanese text typically requires characters from multiple character sets
2774 - ASCII, JIS X 0208, and JIS X 0212, to be specific. Each of these is
2775 indexed using one or more position codes in the range 1 through 94, so
2776 the position codes could not be used directly or there would be no way
2777 to tell which character was meant. Different Japanese encodings handle
2778 this differently - JIS uses special escape characters to denote
2779 different character sets; EUC sets the high bit of the position codes
2780 for JIS X 0208 and JIS X 0212, and puts a special extra byte before each
2781 JIS X 0212 character; etc. (JIS, EUC, and most of the other encodings
2782 you will encounter in files are 7-bit or 8-bit encodings. There is one
2783 common 16-bit encoding, which is Unicode; this strives to represent all
2784 the world's characters in a single large character set. 32-bit
2785 encodings are often used internally in programs, such as XEmacs with
2786 MULE support, to simplify the code that manipulates them; however, they
2787 are not used externally because they are not very space-efficient.)
2789 A general method of handling text using multiple character sets
2790 (whether for multilingual text, or simply text in an extremely
2791 complicated single language like Japanese) is defined in the
2792 international standard ISO 2022. ISO 2022 will be discussed in more
2793 detail later (*note ISO 2022::), but for now suffice it to say that text
2794 needs control functions (at least spacing), and if escape sequences are
2795 to be used, an escape sequence introducer. It was decided to make all
2796 text streams compatible with ASCII in the sense that the codes 0-31
2797 (and 128-159) would always be control codes, never graphic characters,
2798 and where defined by the character set the `SPC' character would be
2799 assigned code 32, and `DEL' would be assigned 127. Thus there are 94
2800 code points remaining if 7 bits are used. This is the reason that most
2801 character sets are defined using position codes in the range 1 through
2802 94. Then ISO 2022 compatible encodings are produced by shifting the
2803 position codes 1 to 94 into character codes 33 to 126, or (if 8 bit
2804 codes are available) into character codes 161 to 254.
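The shifting just described amounts to adding a constant. A minimal
sketch, with hypothetical function names:

```elisp
(defun position-code-to-7bit (pos)
  "Map position code POS (1 through 94) to a character code 33 through 126."
  (if (and (>= pos 1) (<= pos 94))
      (+ pos 32)
    (error "Position code out of range: %d" pos)))

(defun position-code-to-8bit (pos)
  "Map position code POS (1 through 94) to a character code 161 through 254."
  ;; Same as the 7-bit code with the high bit (128) set.
  (+ (position-code-to-7bit pos) 128))

(position-code-to-7bit 1)    ; => 33
(position-code-to-7bit 94)   ; => 126
(position-code-to-8bit 1)    ; => 161
(position-code-to-8bit 94)   ; => 254
```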
2806 Encodings are classified as either "modal" or "non-modal". In a
2807 "modal encoding", there are multiple states that the encoding can be
2808 in, and the interpretation of the values in the stream depends on the
2809 current global state of the encoding. Special values in the encoding,
2810 called "escape sequences", are used to change the global state. JIS,
2811 for example, is a modal encoding. The bytes `ESC $ B' indicate that,
2812 from then on, bytes are to be interpreted as position codes for JIS X
2813 0208, rather than as ASCII. This effect is cancelled using the bytes
2814 `ESC ( B', which mean "switch from whatever the current state is to
2815 ASCII". To switch to JIS X 0212, the escape sequence is `ESC $ ( D'.
2816 (Note that here, as is common, the escape sequences do in fact begin
2817 with `ESC'. This is not necessarily the case, however. Some encodings
2818 use control characters called "locking shifts" (effect persists until
2819 cancelled) to switch character sets.)
2821 A "non-modal encoding" has no global state that extends past the
2822 character currently being interpreted. EUC, for example, is a
2823 non-modal encoding. Characters in JIS X 0208 are encoded by setting
2824 the high bit of the position codes, and characters in JIS X 0212 are
2825 encoded by doing the same but also prefixing the character with the
2828 The advantage of a modal encoding is that it is generally more
2829 space-efficient, and is easily extendible because there are essentially
2830 an arbitrary number of escape sequences that can be created. The
2831 disadvantage, however, is that it is much more difficult to work with
2832 if it is not being processed in a sequential manner. In the non-modal
2833 EUC encoding, for example, the byte 0x41 always refers to the letter
2834 `A'; whereas in JIS, it could either be the letter `A', or one of the
2835 two position codes in a JIS X 0208 character, or one of the two
2836 position codes in a JIS X 0212 character. Determining exactly which
2837 one is meant could be difficult and time-consuming if the previous
2838 bytes in the string have not already been processed, or impossible if
2839 they are drawn from an external stream that cannot be rewound.
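The dependence on earlier bytes can be seen in a toy decoder. The
sketch below is not a real JIS decoder: it recognizes only the two
escape sequences `ESC $ B' and `ESC ( B' mentioned earlier, works on a
list of integers rather than a string, and the name `jis-label-bytes'
is hypothetical.

```elisp
(defun jis-label-bytes (bytes)
  "Pair each data byte in BYTES with the state in effect when it is read.
BYTES is a list of integers.  Escape sequences are consumed, not returned."
  (let ((state 'ascii)
        (result nil))
    (while bytes
      (cond
       ;; ESC $ B (27 36 66): following bytes are JIS X 0208 position codes.
       ((and (equal (car bytes) 27)
             (equal (nth 1 bytes) 36)
             (equal (nth 2 bytes) 66))
        (setq state 'jisx0208
              bytes (nthcdr 3 bytes)))
       ;; ESC ( B (27 40 66): back to ASCII.
       ((and (equal (car bytes) 27)
             (equal (nth 1 bytes) 40)
             (equal (nth 2 bytes) 66))
        (setq state 'ascii
              bytes (nthcdr 3 bytes)))
       ;; Any other byte is data, interpreted according to STATE.
       (t
        (setq result (cons (cons state (car bytes)) result)
              bytes (cdr bytes)))))
    (nreverse result)))

;; The byte 65 (0x41) appears four times but means two different things.
(jis-label-bytes '(65 27 36 66 65 65 27 40 66 65))
;; => ((ascii . 65) (jisx0208 . 65) (jisx0208 . 65) (ascii . 65))
```

Starting the same function in the middle of the list, after the first
escape sequence has been skipped, would label the two middle bytes as
ASCII, which is exactly the difficulty described above.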
2841 Non-modal encodings are further divided into "fixed-width" and
2842 "variable-width" formats. A fixed-width encoding always uses the same
2843 number of words per character, whereas a variable-width encoding does
2844 not. EUC is a good example of a variable-width encoding: one to three
2845 bytes are used per character, depending on the character set. 16-bit
2846 and 32-bit encodings are nearly always fixed-width, and this is in fact
2847 one of the main reasons for using an encoding with a larger word size.
2848 The advantages of fixed-width encodings should be obvious. The
2849 advantages of variable-width encodings are that they are generally more
2850 space-efficient and allow for compatibility with existing 8-bit
2851 encodings such as ASCII. (For example, in Unicode ASCII characters are
2852 simply promoted to a 16-bit representation. That means that every
2853 ASCII character contains a `NUL' byte; evidently all of the standard
2854 string manipulation functions will lose badly in a fixed-width Unicode
2857 The bytes in an 8-bit encoding are often referred to as "octets"
2858 rather than simply as bytes. This terminology dates back to the days
2859 before 8-bit bytes were universal, when some computers had 9-bit bytes,
2860 others had 10-bit bytes, etc.
2863 File: lispref.info, Node: Charsets, Next: MULE Characters, Prev: Internationalization Terminology, Up: MULE
2868 A "charset" in MULE is an object that encapsulates a particular
2869 character set as well as an ordering of those characters. Charsets are
2870 permanent objects and are named using symbols, like faces.
2872 -- Function: charsetp object
2873 This function returns non-`nil' if OBJECT is a charset.
2877 * Charset Properties:: Properties of a charset.
2878 * Basic Charset Functions:: Functions for working with charsets.
2879 * Charset Property Functions:: Functions for accessing charset properties.
2880 * Predefined Charsets:: Predefined charset objects.
2883 File: lispref.info, Node: Charset Properties, Next: Basic Charset Functions, Up: Charsets
2885 63.2.1 Charset Properties
2886 -------------------------
2888 Charsets have the following properties:
2891 A symbol naming the charset. Every charset must have a different
2892 name; this allows a charset to be referred to using its name
2893 rather than the actual charset object.
2896 A documentation string describing the charset.
2899 A regular expression matching the font registry field for this
2900 character set. For example, both the `ascii' and `latin-iso8859-1'
2901 charsets use the registry `"ISO8859-1"'. This field is used to
2902 choose an appropriate font when the user gives a general font
2903 specification such as `-*-courier-medium-r-*-140-*', i.e. a
2904 14-point upright medium-weight Courier font.
2907 Number of position codes used to index a character in the
2908 character set. XEmacs/MULE can only handle character sets of
2909 dimension 1 or 2. This property defaults to 1.
2912 Number of characters in each dimension. In XEmacs/MULE, the only
2913 allowed values are 94 or 96. (There are a couple of pre-defined
2914 character sets, such as ASCII, that do not follow this, but you
2915 cannot define new ones like this.) Defaults to 94. Note that if
2916 the dimension is 2, the character set thus described is 94x94 or
2920 Number of columns used to display a character in this charset.
2921 Only used in TTY mode. (Under X, the actual width of a character
2922 can be derived from the font used to display the characters.) If
2923 unspecified, defaults to the dimension. (This is almost always the
2924 correct value, because character sets with dimension 2 are usually
2925 ideograph character sets, which need two columns to display the
2926 intricate ideographs.)
2929 A symbol, either `l2r' (left-to-right) or `r2l' (right-to-left).
2930 Defaults to `l2r'. This specifies the direction that the text
2931 should be displayed in, and will be left-to-right for most
2932 charsets but right-to-left for Hebrew and Arabic. (Right-to-left
2933 display is not currently implemented.)
2936 Final byte of the standard ISO 2022 escape sequence designating
2937 this charset. Must be supplied. Each combination of (DIMENSION,
2938 CHARS) defines a separate namespace for final bytes, and each
2939 charset within a particular namespace must have a different final
2940 byte. Note that ISO 2022 restricts the final byte to the range
2941 0x30 - 0x7E if dimension == 1, and 0x30 - 0x5F if dimension == 2.
2942 Note also that final bytes in the range 0x30 - 0x3F are reserved
2943 for user-defined (not official) character sets. For more
2944 information on ISO 2022, see *Note Coding Systems::.
2947 0 (use left half of font on output) or 1 (use right half of font on
2948 output). Defaults to 0. This specifies how to convert the
2949 position codes that index a character in a character set into an
2950 index into the font used to display the character set. With
2951 `graphic' set to 0, position codes 33 through 126 map to font
2952 indices 33 through 126; with it set to 1, position codes 33
2953 through 126 map to font indices 161 through 254 (i.e. the same
2954 number but with the high bit set). For example, for a font whose
2955 registry is ISO8859-1, the left half of the font (octets 0x20 -
2956 0x7F) is the `ascii' charset, while the right half (octets 0xA0 -
2957 0xFF) is the `latin-iso8859-1' charset.
2960 A compiled CCL program used to convert a character in this charset
2961 into an index into the font. This is in addition to the `graphic'
2962 property. If a CCL program is defined, the position codes of a
2963 character will first be processed according to `graphic' and then
2964 passed through the CCL program, with the resulting values used to
2967 This is used, for example, in the Big5 character set (used in
2968 Taiwan). This character set is not ISO-2022-compliant, and its
2969 size (94x157) does not fit within the maximum 96x96 size of
2970 ISO-2022-compliant character sets. As a result, XEmacs/MULE
2971 splits it (in a rather complex fashion, so as to group the most
2972 commonly used characters together) into two charset objects
2973 (`big5-1' and `big5-2'), each of size 94x94, and each charset
2974 object uses a CCL program to convert the modified position codes
2975 back into standard Big5 indices to retrieve a character from a
2978 Most of the above properties can only be set when the charset is
2979 initialized, and cannot be changed later. *Note Charset Property
2983 File: lispref.info, Node: Basic Charset Functions, Next: Charset Property Functions, Prev: Charset Properties, Up: Charsets
2985 63.2.2 Basic Charset Functions
2986 ------------------------------
2988 -- Function: find-charset charset-or-name
2989 This function retrieves the charset of the given name. If
2990 CHARSET-OR-NAME is a charset object, it is simply returned.
2991 Otherwise, CHARSET-OR-NAME should be a symbol. If there is no
2992 such charset, `nil' is returned. Otherwise the associated charset
2995 -- Function: get-charset name
2996 This function retrieves the charset of the given name. Same as
2997 `find-charset' except an error is signalled if there is no such
2998 charset instead of returning `nil'.
3000 -- Function: charset-list
3001 This function returns a list of the names of all defined charsets.
3003 -- Function: make-charset name doc-string props
3004 This function defines a new character set. This function is for
3005 use with MULE support. NAME is a symbol, the name by which the
3006 character set is normally referred. DOC-STRING is a string
3007 describing the character set. PROPS is a property list,
3008 describing the specific nature of the character set. The
3009 recognized properties are `registry', `dimension', `columns',
3010 `chars', `final', `graphic', `direction', and `ccl-program', as
3011 previously described.
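A hedged sketch of a `make-charset' call follows. Everything specific
in it is an assumption: the charset name `gorilla-symbols', its doc
string, the registry string, and the choice of `9' as the final byte
(picked from the 0x30 - 0x3F range reserved for user-defined sets). The
sketch also assumes that `charset-from-attributes' returns `nil' when no
matching charset exists.

```elisp
(defun gorilla-define-charset ()
  "Define the hypothetical `gorilla-symbols' charset if possible.
Return the charset object, or nil if nothing was defined."
  (when (and (fboundp 'make-charset)
             (fboundp 'find-charset)
             (fboundp 'charset-from-attributes)
             ;; Do nothing if the name is already taken ...
             (not (find-charset 'gorilla-symbols))
             ;; ... or if a 94-character, dimension-1 charset already
             ;; uses this final byte.
             (not (charset-from-attributes 1 94 ?9)))
    (make-charset 'gorilla-symbols
                  "Hypothetical private symbols for the gorilla package."
                  '(registry "gorilla-symbols"
                    dimension 1
                    chars 94
                    columns 1
                    final ?9
                    graphic 0
                    direction l2r))))

(gorilla-define-charset)
```

Checking the final byte first matters because each combination of
dimension and number of characters is a separate namespace in which
final bytes must be unique.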
3013 -- Function: make-reverse-direction-charset charset new-name
3014 This function makes a charset equivalent to CHARSET but which goes
3015 in the opposite direction. NEW-NAME is the name of the new
3016 charset. The new charset is returned.
3018 -- Function: charset-from-attributes dimension chars final &optional
3020 This function returns a charset with the given DIMENSION, CHARS,
3021 FINAL, and DIRECTION. If DIRECTION is omitted, both directions
3022 will be checked (left-to-right will be returned if character sets
3023 exist for both directions).
3025 -- Function: charset-reverse-direction-charset charset
3026 This function returns the charset (if any) with the same dimension,
3027 number of characters, and final byte as CHARSET, but which is
3028 displayed in the opposite direction.
3031 File: lispref.info, Node: Charset Property Functions, Next: Predefined Charsets, Prev: Basic Charset Functions, Up: Charsets
3033 63.2.3 Charset Property Functions
3034 ---------------------------------
3036 All of these functions accept either a charset name or charset object.
3038 -- Function: charset-property charset prop
3039 This function returns property PROP of CHARSET. *Note Charset
3042 Convenience functions are also provided for retrieving individual
3043 properties of a charset.
3045 -- Function: charset-name charset
3046 This function returns the name of CHARSET. This will be a symbol.
3048 -- Function: charset-description charset
3049 This function returns the documentation string of CHARSET.
3051 -- Function: charset-registry charset
3052 This function returns the registry of CHARSET.
3054 -- Function: charset-dimension charset
3055 This function returns the dimension of CHARSET.
3057 -- Function: charset-chars charset
3058 This function returns the number of characters per dimension of
3061 -- Function: charset-width charset
3062 This function returns the number of display columns per character
3063 (in TTY mode) of CHARSET.
3065 -- Function: charset-direction charset
3066 This function returns the display direction of CHARSET--either
3069 -- Function: charset-iso-final-char charset
3070 This function returns the final byte of the ISO 2022 escape
3071 sequence designating CHARSET.
3073 -- Function: charset-iso-graphic-plane charset
3074 This function returns either 0 or 1, depending on whether the
3075 position codes of characters in CHARSET map to the left or right
3076 half of their font, respectively.
3078 -- Function: charset-ccl-program charset
3079 This function returns the CCL program, if any, for converting
3080 position codes of characters in CHARSET into font indices.
3082 The two properties of a charset that can currently be set after the
3083 charset has been created are the CCL program and the font registry.
3085 -- Function: set-charset-ccl-program charset ccl-program
3086 This function sets the `ccl-program' property of CHARSET to
3089 -- Function: set-charset-registry charset registry
3090 This function sets the `registry' property of CHARSET to REGISTRY.
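The accessors above can be combined into a small summary function. This
is only a sketch; the name `charset-summary' is hypothetical, and the
`fboundp' test is there for an Emacs in which `find-charset' is not
defined.

```elisp
(defun charset-summary (name)
  "Return a list (NAME DIMENSION CHARS DIRECTION) for the charset NAME.
Return nil if there is no such charset."
  (let ((charset (and (fboundp 'find-charset)
                      (find-charset name))))
    (if charset
        (list (charset-name charset)
              (charset-dimension charset)
              (charset-chars charset)
              (charset-direction charset)))))

(charset-summary 'japanese-jisx0208)
(charset-summary 'no-such-charset)
```

Using `find-charset' rather than `get-charset' lets the function return
`nil' for an unknown name instead of signalling an error. Going by the
table of predefined charsets, the first call would be expected to
report dimension 2, 94 characters per dimension, and direction `l2r'
under XEmacs with MULE support.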
3093 File: lispref.info, Node: Predefined Charsets, Prev: Charset Property Functions, Up: Charsets
3095 63.2.4 Predefined Charsets
3096 --------------------------
3098 The following charsets are predefined in the C code.
3100 Name                    Type   Fi  Gr  Dir  Registry
3101 --------------------------------------------------------------
3102 ascii                   94     B   0   l2r  ISO8859-1
3103 control-1               94         0   l2r  ---
3104 latin-iso8859-1         96     A   1   l2r  ISO8859-1
3105 latin-iso8859-2         96     B   1   l2r  ISO8859-2
3106 latin-iso8859-3         96     C   1   l2r  ISO8859-3
3107 latin-iso8859-4         96     D   1   l2r  ISO8859-4
3108 cyrillic-iso8859-5      96     L   1   l2r  ISO8859-5
3109 arabic-iso8859-6        96     G   1   r2l  ISO8859-6
3110 greek-iso8859-7         96     F   1   l2r  ISO8859-7
3111 hebrew-iso8859-8        96     H   1   r2l  ISO8859-8
3112 latin-iso8859-9         96     M   1   l2r  ISO8859-9
3113 thai-tis620             96     T   1   l2r  TIS620
3114 katakana-jisx0201       94     I   1   l2r  JISX0201.1976
3115 latin-jisx0201          94     J   0   l2r  JISX0201.1976
3116 japanese-jisx0208-1978  94x94  @   0   l2r  JISX0208.1978
3117 japanese-jisx0208       94x94  B   0   l2r  JISX0208.19(83|90)
3118 japanese-jisx0212       94x94  D   0   l2r  JISX0212
3119 chinese-gb2312          94x94  A   0   l2r  GB2312
3120 chinese-cns11643-1      94x94  G   0   l2r  CNS11643.1
3121 chinese-cns11643-2      94x94  H   0   l2r  CNS11643.2
3122 chinese-big5-1          94x94  0   0   l2r  Big5
3123 chinese-big5-2          94x94  1   0   l2r  Big5
3124 korean-ksc5601          94x94  C   0   l2r  KSC5601
3125 composite               96x96      0   l2r  ---
3127 The following charsets are predefined in the Lisp code.
3129 Name                    Type   Fi  Gr  Dir  Registry
3130 --------------------------------------------------------------
3131 arabic-digit            94     2   0   l2r  MuleArabic-0
3132 arabic-1-column         94     3   0   r2l  MuleArabic-1
3133 arabic-2-column         94     4   0   r2l  MuleArabic-2
3134 sisheng                 94     0   0   l2r  sisheng_cwnn\|OMRON_UDC_ZH
3135 chinese-cns11643-3      94x94  I   0   l2r  CNS11643.1
3136 chinese-cns11643-4      94x94  J   0   l2r  CNS11643.1
3137 chinese-cns11643-5      94x94  K   0   l2r  CNS11643.1
3138 chinese-cns11643-6      94x94  L   0   l2r  CNS11643.1
3139 chinese-cns11643-7      94x94  M   0   l2r  CNS11643.1
3140 ethiopic                94x94  2   0   l2r  Ethio
3141 ascii-r2l               94     B   0   r2l  ISO8859-1
3142 ipa                     96     0   1   l2r  MuleIPA
3143 vietnamese-viscii-lower 96     1   1   l2r  VISCII1.1
3144 vietnamese-viscii-upper 96     2   1   l2r  VISCII1.1
3146 For all of the above charsets, the dimension and number of columns
3149 Note that ASCII, Control-1, and Composite are handled specially.
3150 This is why some of the fields are blank; and some of the filled-in
3151 fields (e.g. the type) are not really accurate.
3154 File: lispref.info, Node: MULE Characters, Next: Composite Characters, Prev: Charsets, Up: MULE
3156 63.3 MULE Characters
3157 ====================
3159 -- Function: make-char charset arg1 &optional arg2
3160 This function makes a multi-byte character from CHARSET and octets
3163 -- Function: char-charset character
3164 This function returns the character set of char CHARACTER.
3166 -- Function: char-octet character &optional n
3167 This function returns the octet (i.e. position code) numbered N
3168 (should be 0 or 1) of char CHARACTER. N defaults to 0 if omitted.
3170 -- Function: find-charset-region start end &optional buffer
3171 This function returns a list of the charsets in the region between
3172 START and END. BUFFER defaults to the current buffer if omitted.
3174 -- Function: find-charset-string string
3175 This function returns a list of the charsets in STRING.
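As an illustration of these functions, the sketch below builds one
JIS X 0208 character and asks which charsets occur in a short string
containing it. The function name `mixed-text-charsets' is hypothetical,
and the octets 52 and 65 (0x34 and 0x41) are assumed to be acceptable to
`make-char' in their 33 through 126 form. No particular return value is
claimed, since the order and naming of the reported charsets may vary.

```elisp
(defun mixed-text-charsets ()
  "Return the charset of a sample Kanji and the charsets of a mixed string.
Return the symbol `mule-not-available' without MULE support."
  (if (and (featurep 'mule) (fboundp 'make-char))
      (let* ((kanji (make-char 'japanese-jisx0208 52 65))
             ;; One ASCII letter followed by the Kanji.
             (text (concat "A" (char-to-string kanji))))
        (list (char-charset kanji)
              (find-charset-string text)))
    'mule-not-available))

(mixed-text-charsets)
```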
3178 File: lispref.info, Node: Composite Characters, Next: Coding Systems, Prev: MULE Characters, Up: MULE
3180 63.4 Composite Characters
3181 =========================
3183 Composite characters are not yet completely implemented.
3185 -- Function: make-composite-char string
3186 This function converts a string into a single composite character.
3187 The character is the result of overstriking all the characters in
3190 -- Function: composite-char-string character
3191 This function returns a string of the characters comprising a
3192 composite character.
3194 -- Function: compose-region start end &optional buffer
3195 This function composes the characters in the region from START to
3196 END in BUFFER into one composite character. The composite
3197 character replaces the composed characters. BUFFER defaults to
3198 the current buffer if omitted.
3200 -- Function: decompose-region start end &optional buffer
3201 This function decomposes any composite characters in the region
3202 from START to END in BUFFER. This converts each composite
3203 character into one or more characters, the individual characters
3204 out of which the composite character was formed. Non-composite
3205 characters are left as-is. BUFFER defaults to the current buffer
3209 File: lispref.info, Node: Coding Systems, Next: CCL, Prev: Composite Characters, Up: MULE
3214 A coding system is an object that defines how text containing multiple
3215 character sets is encoded into a stream of (typically 8-bit) bytes. The
3216 coding system is used to decode the stream into a series of characters
3217 (which may be from multiple charsets) when the text is read from a file
3218 or process, and is used to encode the text back into the same format
3219 when it is written out to a file or process.
3221 For example, many ISO-2022-compliant coding systems (such as Compound
3222 Text, which is used for inter-client data under the X Window System) use
3223 escape sequences to switch between different charsets - Japanese Kanji,
3224 for example, is invoked with `ESC $ ( B'; ASCII is invoked with `ESC (
3225 B'; and Cyrillic is invoked with `ESC - L'. See `make-coding-system'
3226 for more information.
3228 Coding systems are normally identified using a symbol, and the
3229 symbol is accepted in place of the actual coding system object whenever
3230 a coding system is called for. (This is similar to how faces and
3233 -- Function: coding-system-p object
3234 This function returns non-`nil' if OBJECT is a coding system.
3238 * Coding System Types:: Classifying coding systems.
3239 * ISO 2022:: An international standard for
3240 charsets and encodings.
3241 * EOL Conversion:: Dealing with different ways of denoting
3243 * Coding System Properties:: Properties of a coding system.
3244 * Basic Coding System Functions:: Working with coding systems.
3245 * Coding System Property Functions:: Retrieving a coding system's properties.
3246 * Encoding and Decoding Text:: Encoding and decoding text.
3247 * Detection of Textual Encoding:: Determining how text is encoded.
3248 * Big5 and Shift-JIS Functions:: Special functions for these non-standard
3250 * Predefined Coding Systems:: Coding systems implemented by MULE.
3253 File: lispref.info, Node: Coding System Types, Next: ISO 2022, Up: Coding Systems
3255 63.5.1 Coding System Types
3256 --------------------------
3258 The coding system type determines the basic algorithm XEmacs will use to
3259 decode or encode a data stream. Character encodings will be converted
3260 to the MULE encoding, escape sequences processed, and newline sequences
3261 converted to XEmacs's internal representation. There are three basic
3262 classes of coding system type: no-conversion, ISO-2022, and special.
3264 No conversion allows you to look at the file's internal
3265 representation. Since XEmacs is basically a text editor, "no
3266 conversion" does convert newline conventions by default. (Use the
3267 'binary coding-system if this is not desired.)
3269 ISO 2022 (*note ISO 2022::) is the basic international standard
3270 regulating use of "coded character sets for the exchange of data", ie,
3271 text streams. ISO 2022 contains functions that make it possible to
3272 encode text streams to comply with restrictions of the Internet mail
3273 system and de facto restrictions of most file systems (eg, use of the
3274 separator character in file names). Coding systems which are not ISO
3275 2022 conformant can be difficult to handle. Perhaps more important,
3276 they are not adaptable to multilingual information interchange, with
3277 the obvious exception of ISO 10646 (Unicode). (Unicode is partially
3278 supported by XEmacs with the addition of the Lisp package ucs-conv.)
3280 The special class of coding systems includes automatic detection,
3281 CCL (a "little language" embedded as an interpreter, useful for
3282 translating between variants of a single character set),
3283 non-ISO-2022-conformant encodings like Unicode, Shift JIS, and Big5,
3284 and MULE internal coding. (NB: this list is based on XEmacs 21.2.
3285 Terminology may vary slightly for other versions of XEmacs and for GNU
3289 No conversion, for binary files, and a few special cases of
3290 non-ISO-2022 coding systems where conversion is done by hook
3291 functions (usually implemented in CCL). On output, graphic
3292 characters that are not in ASCII or Latin-1 will be replaced by a
3293 `?'. (For a no-conversion-encoded buffer, these characters will
3294 only be present if you explicitly insert them.)
3297 Any ISO-2022-compliant encoding. Among others, this includes JIS
3298 (the Japanese encoding commonly used for e-mail), national
3299 variants of EUC (the standard Unix encoding for Japanese and other
3300 languages), and Compound Text (an encoding used in X11). You can
3301 specify more specific information about the conversion with the
3305 ISO 10646 UCS-4 encoding. A 31-bit fixed-width superset of
3309 ISO 10646 UTF-8 encoding. A "file system safe" transformation
3310 format that can be used with both UCS-4 and Unicode.
3313 Automatic conversion. XEmacs attempts to detect the coding system
3317 Shift-JIS (a Japanese encoding commonly used in PC operating
3321 Big5 (the encoding commonly used for Taiwanese).
3324 The conversion is performed using a user-written pseudo-code
3325 program. CCL (Code Conversion Language) is the name of this
3326 pseudo-code. For example, CCL is used to map KOI8-R characters
3327 (an encoding for Russian Cyrillic) to ISO8859-5 (the form used
3328 internally by MULE).
3331 Write out or read in the raw contents of the memory representing
3332 the buffer's text. This is primarily useful for debugging
3333 purposes, and is only enabled when XEmacs has been compiled with
3334 `DEBUG_XEMACS' set (the `--debug' configure option). *Warning*:
3335 Reading in a file using `internal' conversion can result in an
3336 internal inconsistency in the memory representing a buffer's text,
3337 which will produce unpredictable results and may cause XEmacs to
3338 crash. Under normal circumstances you should never use `internal'
3342 File: lispref.info, Node: ISO 2022, Next: EOL Conversion, Prev: Coding System Types, Up: Coding Systems
3347 This section briefly describes the ISO 2022 encoding standard. A more
3348 thorough treatment is available in the original document of ISO 2022 as
3349 well as various national standards (such as JIS X 0202).
3351 Character sets ("charsets") are classified into the following four
3352 categories, according to the number of characters in the charset:
3353 94-charset, 96-charset, 94x94-charset, and 96x96-charset. This means
3354 that although an ISO 2022 coding system may have variable width
3355 characters, each charset used is fixed-width (in contrast to the MULE
3356 character set and UTF-8, for example).
3358 ISO 2022 provides for switching between character sets via escape
3359 sequences. This switching is somewhat complicated, because ISO 2022
3360 provides for both legacy applications like Internet mail that accept
3361 only 7 significant bits in some contexts (RFC 822 headers, for example),
3362 and more modern "8-bit clean" applications. It also provides for
3363 compact and transparent representation of languages like Japanese which
3364 mix ASCII and a national script (even outside of computer programs).
3366 First, ISO 2022 codified prevailing practice by dividing the code
3367 space into "control" and "graphic" regions. The code points 0x00-0x1F
3368 and 0x80-0x9F are reserved for "control characters", while "graphic
3369 characters" must be assigned to code points in the regions 0x20-0x7F and
3370 0xA0-0xFF. The positions 0x20 and 0x7F are special, and under some
3371 circumstances must be assigned the graphic character "ASCII SPACE" and
3372 the control character "ASCII DEL" respectively.
3374 The various regions are given the name C0 (0x00-0x1F), GL
3375 (0x20-0x7F), C1 (0x80-0x9F), and GR (0xA0-0xFF). GL and GR stand for
3376 "graphic left" and "graphic right", respectively, because of the
3377 standard method of displaying graphic character sets in tables with the
3378 high byte indexing columns and the low byte indexing rows. I don't
3379 find it very intuitive, but these are called "registers".
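The division of the 8-bit code space described above can be sketched as a small classifier. This is a Python illustration, not XEmacs code, and the function name `iso2022_region` is hypothetical.

```python
def iso2022_region(byte: int) -> str:
    """Return the ISO 2022 region name (C0, GL, C1, or GR) for one byte value."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not a byte value: %r" % (byte,))
    if byte <= 0x1F:
        return "C0"   # control characters, 0x00-0x1F
    if byte <= 0x7F:
        return "GL"   # "graphic left", 0x20-0x7F
    if byte <= 0x9F:
        return "C1"   # control characters, 0x80-0x9F
    return "GR"       # "graphic right", 0xA0-0xFF


regions = [iso2022_region(b) for b in (0x0A, 0x41, 0x8E, 0xA4)]
```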
3381 An ISO 2022-conformant encoding for a graphic character set must use
3382 a fixed number of bytes per character, and the values must fit into a
3383 single register; that is, each byte must range over either 0x20-0x7F, or
3384 0xA0-0xFF. It is not allowed to extend the range of the repertoire of a
3385 character set by using both ranges at the same time. This is why a standard
3386 character set such as ISO 8859-1 is actually considered by ISO 2022 to
3387 be an aggregation of two character sets, ASCII and LATIN-1, and why it
3388 is technically incorrect to refer to ISO 8859-1 as "Latin 1". Also, a
3389 single character's bytes must all be drawn from the same register; this
3390 is why Shift JIS (for Japanese) and Big 5 (for Chinese) are not ISO
3391 2022-compatible encodings.
3393 The reason for this restriction becomes clear when you attempt to
3394 define an efficient, robust encoding for a language like Japanese.
3395 Like ISO 8859, Japanese encodings are aggregations of several character
3396 sets. In practice, the vast majority of characters are drawn from the
3397 "JIS Roman" character set (a derivative of ASCII; it won't hurt to
3398 think of it as ASCII) and the JIS X 0208 standard "basic Japanese"
3399 character set including not only ideographic characters ("kanji") but
3400 syllabic Japanese characters ("kana"), a wide variety of symbols, and
3401 many alphabetic characters (Roman, Greek, and Cyrillic) as well.
3402 Although JIS X 0208 includes the whole Roman alphabet, as a 2-byte code
3403 it is not suited to programming; thus the inclusion of ASCII in the
3404 standard Japanese encodings.
3406 For normal Japanese text such as in newspapers, a broad repertoire of
3407 approximately 3000 characters is used. Evidently this won't fit into
3408 one byte; two must be used. But much of the text processed by Japanese
3409 computers is computer source code, nearly all of which is ASCII. A not
3410 insignificant portion of ordinary text is English (as such or as
3411 borrowed Japanese vocabulary) or other languages which can be represented
3412 at least approximately in ASCII, as well. It seems reasonable then to
3413 represent ASCII in one byte, and JIS X 0208 in two. And this is exactly
3414 what the Extended Unix Code for Japanese (EUC-JP) does. ASCII is
3415 invoked to the GL register, and JIS X 0208 is invoked to the GR
3416 register. Thus, each byte can be tested for its character set by
3417 looking at the high bit; if set, it is Japanese, if clear, it is ASCII.
3418 Furthermore, since control characters like newline can never be part of
3419 a graphic character, even in the case of corruption in transmission the
3420 stream will be resynchronized at every line break, on the order of 60-80
3421 bytes. This coding system requires no escape sequences or special
3422 control codes to represent 99.9% of all Japanese text.
3424 Note carefully the distinction between the character sets (ASCII and
3425 JIS X 0208), the encoding (EUC-JP), and the coding system (ISO 2022).
3426 The JIS X 0208 character set is used in three different encodings for
3427 Japanese, but in ISO-2022-JP it is invoked into GL (so the high bit is
3428 always clear), in EUC-JP it is invoked into GR (setting the high bit in
3429 the process), and in Shift JIS the high bit may be set or reset, and the
3430 significant bits are shifted within the 16-bit character so that the two
3431 main character sets can coexist with a third (the "halfwidth katakana"
3432 of JIS X 0201). As the name implies, the ISO-2022-JP encoding is also a
3433 version of the ISO-2022 coding system.
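The relationship between the three encodings can be checked with a short Python sketch. It relies on Python's standard `iso2022_jp`, `euc_jp`, and `shift_jis` codecs, which are assumptions of this illustration and not part of XEmacs.

```python
# Illustration only: Python's standard codecs, not XEmacs/MULE.
text = "日本"

jis7 = text.encode("iso2022_jp")   # JIS X 0208 invoked into GL
euc = text.encode("euc_jp")        # JIS X 0208 invoked into GR
sjis = text.encode("shift_jis")    # bits rearranged within each 16-bit code

# Drop the leading designation (ESC $ B) and the trailing return
# to ASCII (ESC ( B); each is three bytes long.
payload = jis7[3:-3]

# Setting the high bit of every payload byte yields the EUC-JP bytes.
payload_in_gr = bytes(b | 0x80 for b in payload)
```

Here `payload_in_gr` equals `euc`, while `sjis` differs from both because Shift JIS is not a simple GL or GR invocation of the same code values.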
3435 In order to systematically treat subsidiary character sets (like the
3436 "halfwidth katakana" already mentioned, and the "supplementary kanji" of
3437 JIS X 0212), four further registers are defined: G0, G1, G2, and G3.
3438 Unlike GL and GR, they are not logically distinguished by internal
3439 format. Instead, the process of "invocation" mentioned earlier is
3440 broken into two steps: first, a character set is "designated" to one of
3441 the registers G0-G3 by use of an "escape sequence" of the form:
3445 where I is an intermediate character or characters in the range 0x20
3446 - 0x3F, and F, from the range 0x30-0x7F, is the final character
3447 identifying this charset. (Final characters in the range 0x30-0x3F are
3448 reserved for private use and will never have a publicly registered
3451 Then that register is "invoked" to either GL or GR, either
3452 automatically (designations to G0 normally involve invocation to GL as
3453 well), or by use of shifting (affecting only the following character in
3454 the data stream) or locking (effective until the next designation or
3455 locking) control sequences. An encoding conformant to ISO 2022 is
3456 typically defined by designating the initial contents of the G0-G3
3457 registers, specifying a 7 or 8 bit environment, and specifying whether
3458 further designations will be recognized.
3460 Some examples of character sets and the registered final characters
3461 F used to designate them:
3464 ASCII (B), left (J) and right (I) half of JIS X 0201, ...
3467 Latin-1 (A), Latin-2 (B), Latin-3 (C), ...
3470 GB2312 (A), JIS X 0208 (B), KSC5601 (C), ...
3475 The meanings of the various characters in these sequences, where not
3476 specified by the ISO 2022 standard (such as the ESC character), are
3477 assigned by "ECMA", the European Computer Manufacturers Association.
3479 The meanings of the intermediate characters are:
3481 $ [0x24]: indicate charset of dimension 2 (94x94 or 96x96).
3482 ( [0x28]: designate to G0 a 94-charset whose final byte is F.
3483 ) [0x29]: designate to G1 a 94-charset whose final byte is F.
3484 * [0x2A]: designate to G2 a 94-charset whose final byte is F.
3485 + [0x2B]: designate to G3 a 94-charset whose final byte is F.
3486 , [0x2C]: designate to G0 a 96-charset whose final byte is F.
3487 - [0x2D]: designate to G1 a 96-charset whose final byte is F.
3488 . [0x2E]: designate to G2 a 96-charset whose final byte is F.
3489 / [0x2F]: designate to G3 a 96-charset whose final byte is F.
3491 The comma may be used in files read and written only by MULE, as a
3492 MULE extension, but this is illegal in ISO 2022. (The reason is that
3493 in ISO 2022 G0 must be a 94-member character set, with 0x20 assigned
3494 the value SPACE, and 0x7F assigned the value DEL.)
3496 Here are examples of designations:
3498 ESC ( B : designate to G0 ASCII
3499 ESC - A : designate to G1 Latin-1
3500 ESC $ ( A or ESC $ A : designate to G0 GB2312
3501 ESC $ ( B or ESC $ B : designate to G0 JISX0208
3502 ESC $ ) C : designate to G1 KSC5601
3504 (The short forms used to designate GB2312 and JIS X 0208 are for
3505 backwards compatibility; the long forms are preferred.)
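The table of intermediate characters can be turned into a small helper that builds the long-form designation sequences. This is a Python sketch for illustration; the function name `designation` and its parameters are hypothetical, and the short forms are not produced.

```python
# Intermediate characters selecting the target register, per the table above.
INTERMEDIATE_94 = {0: "(", 1: ")", 2: "*", 3: "+"}
INTERMEDIATE_96 = {0: ",", 1: "-", 2: ".", 3: "/"}  # "," for G0 is a MULE extension


def designation(register: int, final: str, size: int = 94,
                dimension: int = 1) -> bytes:
    """Build the long-form escape sequence designating a charset to G0-G3."""
    if size not in (94, 96):
        raise ValueError("size must be 94 or 96")
    if dimension not in (1, 2):
        raise ValueError("dimension must be 1 or 2")
    table = INTERMEDIATE_94 if size == 94 else INTERMEDIATE_96
    sequence = "\x1b"
    if dimension == 2:
        sequence += "$"          # marks a charset of dimension 2
    sequence += table[register] + final
    return sequence.encode("ascii")


ascii_to_g0 = designation(0, "B")                    # ESC ( B
latin1_to_g1 = designation(1, "A", size=96)          # ESC - A
jisx0208_to_g0 = designation(0, "B", dimension=2)    # ESC $ ( B
ksc5601_to_g1 = designation(1, "C", dimension=2)     # ESC $ ) C
```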
3507 To use a charset designated to G2 or G3, and to use a charset
3508 designated to G1 in a 7-bit environment, you must explicitly invoke G1,
3509 G2, or G3 into GL. There are two types of invocation, Locking Shift
3510 (forever) and Single Shift (one character only).
3512 Locking Shift is done as follows:
3514 LS0 or SI (0x0F): invoke G0 into GL
3515 LS1 or SO (0x0E): invoke G1 into GL
3516 LS2: invoke G2 into GL
3517 LS3: invoke G3 into GL
3518 LS1R: invoke G1 into GR
3519 LS2R: invoke G2 into GR
3520 LS3R: invoke G3 into GR
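As an illustration of locking shifts in a 7-bit environment, the following Python sketch inspects text encoded with Python's standard `iso2022_kr` codec. That codec is an assumption of this example and is unrelated to the XEmacs implementation.

```python
# Illustration only: Python's standard codecs, not XEmacs/MULE.
SO = b"\x0e"  # LS1: invoke G1 into GL
SI = b"\x0f"  # LS0: invoke G0 into GL

text = "한글"
encoded = text.encode("iso2022_kr")

# The bytes between SO and SI are read through G1 (KSC 5601).
start = encoded.index(SO) + 1
end = encoded.index(SI, start)
shifted = encoded[start:end]

# Everything before SO includes the designation of KSC 5601 to G1.
prefix = encoded[:start]

# Setting the high bit gives the same characters as invoked into GR.
as_gr = bytes(b | 0x80 for b in shifted)
```

The shift stays in effect for every character up to the SI byte, which is what distinguishes a locking shift from a single shift.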
3522 Single Shift is done as follows:
3524 SS2 or ESC N: invoke G2 into GL
3525 SS3 or ESC O: invoke G3 into GL
3527 The shift functions (such as LS1R and SS3) are represented by control
3528 characters (from C1) in 8 bit environments and by escape sequences in 7 bit environments.
3531 (#### Ben says: I think the above is slightly incorrect. It appears
3532 that SS2 invokes G2 into GR and SS3 invokes G3 into GR, whereas ESC N
3533 and ESC O behave as indicated. The above definitions will not parse
3534 EUC-encoded text correctly, and it looks like the code in mule-coding.c
3535 has similar problems.)
3537 Evidently there are many ISO-2022-compliant ways of encoding
3538 multilingual text. Many coding systems in actual use, such as X11's
3539 Compound Text, the Japanese JUNET code, and the so-called EUC
3540 (Extended UNIX Code), are variants of ISO 2022.
3542 In MULE, we characterize a version of ISO 2022 by the following
3545 1. The character sets initially designated to G0 thru G3.
3547 2. Whether short form designations are allowed for Japanese and
3550 3. Whether ASCII should be designated to G0 before control characters.
3552 4. Whether ASCII should be designated to G0 at the end of line.
3554 5. 7-bit environment or 8-bit environment.
3556 6. Whether Locking Shifts are used or not.
3558 7. Whether to use ASCII or the variant JIS X 0201-1976-Roman.
3560 8. Whether to use JIS X 0208-1983 or the older version JIS X
3563 (The last two are only for Japanese.)
3565 By specifying these attributes, you can create any variant of ISO
3568 Here are several examples:
3570 ISO-2022-JP -- Coding system used in Japanese email (RFC 1468).
3571 1. G0 <- ASCII, G1..3 <- never used
3575 5. 7-bit environment
3578 8. Use JIS X 0208-1983
3580 ctext -- X11 Compound Text
3581 1. G0 <- ASCII, G1 <- Latin-1, G2,3 <- never used.
3585 5. 8-bit environment.
3588 8. Use JIS X 0208-1983.
3590 euc-china -- Chinese EUC. Often called the "GB encoding", but that is
3591 technically incorrect.
3592 1. G0 <- ASCII, G1 <- GB 2312, G2,3 <- never used.
3596 5. 8-bit environment.
3599 8. Use JIS X 0208-1983.
3601 ISO-2022-KR -- Coding system used in Korean email.
3602 1. G0 <- ASCII, G1 <- KSC 5601, G2,3 <- never used.
3606 5. 7-bit environment.
3609 8. Use JIS X 0208-1983.
3611 MULE creates all of these coding systems by default.
3614 File: lispref.info, Node: EOL Conversion, Next: Coding System Properties, Prev: ISO 2022, Up: Coding Systems
3616 63.6.1 EOL Conversion
3617 ---------------------
3620 Automatically detect the end-of-line type (LF, CRLF, or CR). Also
3621 generate subsidiary coding systems named `NAME-unix', `NAME-dos',
3622 and `NAME-mac', that are identical to this coding system but have
3623 an EOL-TYPE value of `lf', `crlf', and `cr', respectively.
3626 The end of a line is marked externally using ASCII LF. Since this
3627 is also the way that XEmacs represents an end-of-line internally,
3628 specifying this option results in no end-of-line conversion. This
3629 is the standard format for Unix text files.
3632 The end of a line is marked externally using ASCII CRLF. This is
3633 the standard format for MS-DOS text files.
3636 The end of a line is marked externally using ASCII CR. This is the
3637 standard format for Macintosh text files.
3640 Automatically detect the end-of-line type but do not generate
3641 subsidiary coding systems. (This value is converted to `nil' when
3642 stored internally, and `coding-system-property' will return `nil'.)
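The end-of-line types above can be illustrated with a minimal Python sketch. The detection heuristic here is an assumption made for the example and is not the algorithm XEmacs uses; the function names are hypothetical, while the type names `lf`, `crlf`, and `cr` follow the text.

```python
def detect_eol(data: bytes):
    """Guess the EOL type of DATA: 'crlf', 'cr', 'lf', or None if no line break."""
    if b"\r\n" in data:
        return "crlf"
    if b"\r" in data:
        return "cr"
    if b"\n" in data:
        return "lf"
    return None


def to_internal(data: bytes, eol_type=None) -> bytes:
    """Convert external line breaks to the internal LF convention."""
    eol = eol_type or detect_eol(data)
    if eol == "crlf":
        return data.replace(b"\r\n", b"\n")
    if eol == "cr":
        return data.replace(b"\r", b"\n")
    return data  # 'lf' or no line breaks: nothing to convert


def from_internal(data: bytes, eol_type: str) -> bytes:
    """Convert internal LF line breaks to the requested external convention."""
    external = {"lf": b"\n", "crlf": b"\r\n", "cr": b"\r"}[eol_type]
    return data.replace(b"\n", external)
```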
3645 File: lispref.info, Node: Coding System Properties, Next: Basic Coding System Functions, Prev: EOL Conversion, Up: Coding Systems
3647 63.6.2 Coding System Properties
3648 -------------------------------
3651 String to be displayed in the modeline when this coding system is
3655 End-of-line conversion to be used. It should be one of the types
3656 listed in *Note EOL Conversion::.
3659 The coding system which is the same as this one, except that it
3660 uses the Unix line-breaking convention.
3663 The coding system which is the same as this one, except that it
3664 uses the DOS line-breaking convention.
3667 The coding system which is the same as this one, except that it
3668 uses the Macintosh line-breaking convention.
3670 `post-read-conversion'
3671 Function called after a file has been read in, to perform the
3672 decoding. Called with two arguments, START and END, denoting a
3673 region of the current buffer to be decoded.
3675 `pre-write-conversion'
3676 Function called before a file is written out, to perform the
3677 encoding. Called with two arguments, START and END, denoting a
3678 region of the current buffer to be encoded.
3680 The following additional properties are recognized if TYPE is
3687 The character set initially designated to the G0 - G3 registers.
3688 The value should be one of
3690 * A charset object (designate that character set)
3692 * `nil' (do not ever use this register)
3694 * `t' (no character set is initially designated to the
3695 register, but may be later on; this automatically sets the
3696 corresponding `force-g*-on-output' property)
3698 `force-g0-on-output'
3699 `force-g1-on-output'
3700 `force-g2-on-output'
3701 `force-g3-on-output'
3702 If non-`nil', send an explicit designation sequence on output
3703 before using the specified register.
3706 If non-`nil', use the short forms `ESC $ @', `ESC $ A', and `ESC $
3707 B' on output in place of the full designation sequences `ESC $ (
3708 @', `ESC $ ( A', and `ESC $ ( B'.
3711 If non-`nil', don't designate ASCII to G0 at each end of line on
3712 output. Setting this to non-`nil' also suppresses other
3713 state-resetting that normally happens at the end of a line.
3716 If non-`nil', don't designate ASCII to G0 before control chars on
3720 If non-`nil', use 7-bit environment on output. Otherwise, use
3724 If non-`nil', use locking-shift (SO/SI) instead of single-shift or
3725 designation by escape sequence.
3728 If non-`nil', don't use ISO6429's direction specification.
3731 If non-`nil', literal control characters that are the same as the
3732 beginning of a recognized ISO 2022 or ISO 6429 escape sequence (in
3733 particular, ESC (0x1B), SO (0x0E), SI (0x0F), SS2 (0x8E), SS3
3734 (0x8F), and CSI (0x9B)) are "quoted" with an escape character so
3735 that they can be properly distinguished from an escape sequence.
3736 (Note that doing this results in a non-portable encoding.) This
3737 encoding flag is used for byte-compiled files. Note that ESC is a
3738 good choice for a quoting character because there are no escape
3739 sequences whose second byte is a character from the Control-0 or
3740 Control-1 character sets; this is explicitly disallowed by the ISO
3743 `input-charset-conversion'
3744 A list of conversion specifications, specifying conversion of
3745 characters in one charset to another when decoding is performed.
3746 Each specification is a list of two elements: the source charset,
3747 and the destination charset.
3749 `output-charset-conversion'
3750 A list of conversion specifications, specifying conversion of
3751 characters in one charset to another when encoding is performed.
3752 The form of each specification is the same as for
3753 `input-charset-conversion'.
3755 The following additional properties are recognized (and required) if
3759 CCL program used for decoding (converting to internal format).
3762 CCL program used for encoding (converting to external format).
3764 The following properties are used internally: EOL-CR, EOL-CRLF,
3768 File: lispref.info, Node: Basic Coding System Functions, Next: Coding System Property Functions, Prev: Coding System Properties, Up: Coding Systems
3770 63.6.3 Basic Coding System Functions
3771 ------------------------------------
3773 -- Function: find-coding-system coding-system-or-name
3774 This function retrieves the coding system of the given name.
3776 If CODING-SYSTEM-OR-NAME is a coding-system object, it is simply
3777 returned. Otherwise, CODING-SYSTEM-OR-NAME should be a symbol.
3778 If there is no such coding system, `nil' is returned. Otherwise
3779 the associated coding system object is returned.
3781 -- Function: get-coding-system name
3782 This function retrieves the coding system of the given name. Same
3783 as `find-coding-system' except an error is signalled if there is no
3784 such coding system instead of returning `nil'.
3786 -- Function: coding-system-list
3787 This function returns a list of the names of all defined coding systems.
3790 -- Function: coding-system-name coding-system
3791 This function returns the name of the given coding system.
3793 -- Function: coding-system-base coding-system
3794 This function returns the base coding system (the one with undecided EOL convention) of CODING-SYSTEM.
3797 -- Function: make-coding-system name type &optional doc-string props
3798 This function registers symbol NAME as a coding system.
3800 TYPE describes the conversion method used and should be one of the
3801 types listed in *Note Coding System Types::.
3803 DOC-STRING is a string describing the coding system.
3805 PROPS is a property list, describing the specific nature of the
3806 character set. Recognized properties are as in *Note Coding
3807 System Properties::.
3809 -- Function: copy-coding-system old-coding-system new-name
3810 This function copies OLD-CODING-SYSTEM to NEW-NAME. If NEW-NAME
3811 does not name an existing coding system, a new one will be created.
3813 -- Function: subsidiary-coding-system coding-system eol-type
3814 This function returns the subsidiary coding system of
3815 CODING-SYSTEM with eol type EOL-TYPE.
3818 File: lispref.info, Node: Coding System Property Functions, Next: Encoding and Decoding Text, Prev: Basic Coding System Functions, Up: Coding Systems
3820 63.6.4 Coding System Property Functions
3821 ---------------------------------------
3823 -- Function: coding-system-doc-string coding-system
3824 This function returns the doc string for CODING-SYSTEM.
3826 -- Function: coding-system-type coding-system
3827 This function returns the type of CODING-SYSTEM.
3829 -- Function: coding-system-property coding-system prop
3830 This function returns the PROP property of CODING-SYSTEM.
3833 File: lispref.info, Node: Encoding and Decoding Text, Next: Detection of Textual Encoding, Prev: Coding System Property Functions, Up: Coding Systems
3835 63.6.5 Encoding and Decoding Text
3836 ---------------------------------
3838 -- Function: decode-coding-region start end coding-system &optional buffer
3840 This function decodes the text between START and END which is
3841 encoded in CODING-SYSTEM. This is useful if you've read in
3842 encoded text from a file without decoding it (e.g. you read in a
3843 JIS-formatted file but used the `binary' or `no-conversion' coding
3844 system, so that it shows up as `^[$B!<!+^[(B'). The length of the
3845 encoded text is returned. BUFFER defaults to the current buffer if unspecified.
3848 -- Function: encode-coding-region start end coding-system &optional buffer
3850 This function encodes the text between START and END using
3851 CODING-SYSTEM. This will, for example, convert Japanese
3852 characters into stuff such as `^[$B!<!+^[(B' if you use the JIS
3853 encoding. The length of the encoded text is returned. BUFFER
3854 defaults to the current buffer if unspecified.
3857 File: lispref.info, Node: Detection of Textual Encoding, Next: Big5 and Shift-JIS Functions, Prev: Encoding and Decoding Text, Up: Coding Systems
3859 63.6.6 Detection of Textual Encoding
3860 ------------------------------------
3862 -- Function: coding-category-list
3863 This function returns a list of all recognized coding categories.
3865 -- Function: set-coding-priority-list list
3866 This function changes the priority order of the coding categories.
3867 LIST should be a list of coding categories, in descending order of
3868 priority. Unspecified coding categories will be lower in priority
3869 than all specified ones, in the same relative order they were in previously.
3872 -- Function: coding-priority-list
3873 This function returns a list of coding categories in descending order of priority.
3876 -- Function: set-coding-category-system coding-category coding-system
3877 This function changes the coding system associated with a coding category.
3880 -- Function: coding-category-system coding-category
3881 This function returns the coding system associated with a coding category.
3884 -- Function: detect-coding-region start end &optional buffer
3885 This function detects the coding system of the text in the region
3886 between START and END. The returned value is a list of possible coding
3887 systems ordered by priority. If only ASCII characters are found,
3888 it returns `autodetect' or one of its subsidiary coding systems
3889 according to a detected end-of-line type. Optional arg BUFFER
3890 defaults to the current buffer.
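The idea of returning candidates ordered by priority can be sketched in Python with trial decoding. This is a stand-in technique chosen for illustration, not the detector XEmacs implements, and `detect_candidates` is a hypothetical name. The codec names are those of Python's standard library.

```python
def detect_candidates(data: bytes, priority):
    """Return the codecs in PRIORITY that can decode DATA, in the same order."""
    candidates = []
    for name in priority:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the byte stream
        candidates.append(name)
    return candidates


priority = ["ascii", "euc_jp"]
japanese = detect_candidates("日本語".encode("euc_jp"), priority)
plain = detect_candidates(b"hello", priority)
```

Changing the order of `priority` changes the order of the result, which is the role the priority list plays in the functions above.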
3893 File: lispref.info, Node: Big5 and Shift-JIS Functions, Next: Predefined Coding Systems, Prev: Detection of Textual Encoding, Up: Coding Systems
3895 63.6.7 Big5 and Shift-JIS Functions
3896 -----------------------------------
3898 These are special functions for working with the non-standard Shift-JIS and Big5 encodings.
3901 -- Function: decode-shift-jis-char code
3902 This function decodes a JIS X 0208 character of Shift-JIS
3903 coding-system. CODE is the character code in Shift-JIS as a cons
3904 of two bytes. The corresponding character is returned.
3906 -- Function: encode-shift-jis-char character
3907 This function encodes a JIS X 0208 character CHARACTER to
3908 SHIFT-JIS coding-system. The corresponding character code in
3909 SHIFT-JIS is returned as a cons of two bytes.
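The byte arithmetic behind the Shift-JIS form of a JIS X 0208 code can be sketched as follows. This is a Python illustration of the commonly used conversion formula, not the XEmacs implementation; the function name `jis_to_sjis` is hypothetical.

```python
def jis_to_sjis(j1: int, j2: int):
    """Convert a 7-bit JIS X 0208 byte pair to the Shift-JIS byte pair."""
    if not (0x21 <= j1 <= 0x7E and 0x21 <= j2 <= 0x7E):
        raise ValueError("not a JIS X 0208 code")
    if j1 % 2:
        # Odd rows map to the lower half of the second-byte range,
        # skipping 0x7F.
        s2 = j2 + 0x1F
        if s2 >= 0x7F:
            s2 += 1
    else:
        # Even rows map to the upper half.
        s2 = j2 + 0x7E
    # Two JIS rows share one Shift-JIS lead byte.
    s1 = (j1 + 1) // 2 + 0x70
    if s1 >= 0xA0:
        s1 += 0x40  # skip 0xA0-0xDF, used by halfwidth katakana
    return s1, s2


# 0x467C is the JIS X 0208 code used for the first character of "日本語".
first = jis_to_sjis(0x46, 0x7C)
```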
3911 -- Function: decode-big5-char code
3912 This function decodes a Big5 character CODE of BIG5 coding-system.
3913 CODE is the character code in BIG5. The corresponding character is returned.
3916 -- Function: encode-big5-char character
3917 This function encodes the Big5 character CHARACTER to BIG5
3918 coding-system. The corresponding character code in Big5 is returned.
3922 File: lispref.info, Node: Predefined Coding Systems, Prev: Big5 and Shift-JIS Functions, Up: Coding Systems
3924 63.6.8 Coding Systems Implemented
3925 ---------------------------------
3927 MULE initializes most of the commonly used coding systems at XEmacs's
3928 startup. A few others are initialized only when the relevant language
3929 environment is selected and support libraries are loaded. (NB: The
3930 following list is based on XEmacs 21.2.19, the development branch at the
3931 time of writing. The list may be somewhat different for other
3932 versions. Recent versions of GNU Emacs 20 implement a few more rare
3933 coding systems; work is being done to port these to XEmacs.)
3935 Unfortunately, there is not a consistent naming convention for
3936 character sets, and for practical purposes coding systems often take
3937 their name from their principal character sets (ASCII, KOI8-R, Shift
3938 JIS). Others take their names from the coding system (ISO-2022-JP,
3939 EUC-KR), and a few from their non-text usages (internal, binary). To
3940 provide for this, and for the fact that many coding systems have
3941 several common names, an aliasing system is provided. Finally, some
3942 effort has been made to use names that are registered as MIME charsets
3943 (this is why the name 'shift_jis contains that un-Lisp-y underscore).
3945 There is a systematic naming convention regarding end-of-line (EOL)
3946 conventions for different systems. A coding system whose name ends in
3947 "-unix" forces the assumption that lines are broken by newlines (0x0A).
3948 A coding system whose name ends in "-mac" forces the assumption that
3949 lines are broken by ASCII CRs (0x0D). A coding system whose name ends
3950 in "-dos" forces the assumption that lines are broken by CRLF sequences
3951 (0x0D 0x0A). These subsidiary coding systems are automatically derived
3952 from a base coding system. Use of the base coding system implies
3953 autodetection of the text file convention. (The fact that the -unix,
3954 -mac, and -dos are derived from a base system results in them showing up
3955 as "aliases" in `list-coding-systems'.) These subsidiaries have a
3956 consistent modeline indicator as well. "-dos" coding systems have ":T"
3957 appended to their modeline indicator, while "-mac" coding systems have
3958 ":t" appended (eg, "ISO8:t" for iso-2022-8-mac).
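The suffix convention can be sketched as a small name parser. This is a Python illustration only; `split_coding_system_name` is a hypothetical helper, not an XEmacs function, and the EOL type names follow those used for EOL conversion.

```python
# Suffix -> EOL type, following the naming convention described above.
EOL_SUFFIXES = {"-unix": "lf", "-dos": "crlf", "-mac": "cr"}


def split_coding_system_name(name: str):
    """Split NAME into (base name, EOL type); EOL type is None for a base name."""
    for suffix, eol_type in EOL_SUFFIXES.items():
        if name.endswith(suffix):
            return name[:-len(suffix)], eol_type
    return name, None  # base coding system: EOL convention is autodetected


parsed = split_coding_system_name("iso-2022-8-mac")
```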
3960 In the following table, each coding system is given with its mode
3961 line indicator in parentheses. Non-textual coding systems are listed
3962 first, followed by textual coding systems and their aliases. (The
3963 coding system subsidiary modeline indicators ":T" and ":t" will be
3964 omitted from the table of coding systems.)
3966 ### SJT 1999-08-23 Maybe should order these by language? Definitely
3967 need language usage for the ISO-8859 family.
3969 Note that although true coding system aliases have been implemented
3970 for XEmacs 21.2, the coding system initialization has not yet been
3971 converted as of 21.2.19. So coding systems described as aliases have
3972 the same properties as the aliased coding system, but will not be equal
3975 `automatic-conversion'
3980 Modeline indicator: `Auto'. A type `undecided' coding system.
3981 Attempts to determine an appropriate coding system from file
3982 contents or the environment.
3991 `no-conversion-unix'
3992 Modeline indicator: `Raw'. A type `no-conversion' coding system,
3993 which converts only line-break-codes. An implementation quirk
3994 means that this coding system is also used for ISO8859-1.
3997 Modeline indicator: `Binary'. A type `no-conversion' coding
3998 system which does no character coding or EOL conversions. An
3999 alias for `raw-text-unix'.
4004 `alternativnyj-unix'
4005 Modeline indicator: `Cy.Alt'. A type `ccl' coding system used for
4006 Alternativnyj, an encoding of the Cyrillic alphabet.
4012 Modeline indicator: `Zh/Big5'. A type `big5' coding system used
4013 for BIG5, the most common encoding of traditional Chinese as used
4020 Modeline indicator: `Zh-GB/EUC'. A type `iso2022' coding system
4021 used for simplified Chinese (as used in the People's Republic of
4022 China), with the `ascii' (G0), `chinese-gb2312' (G1), and `sisheng'
4023 (G2) character sets initially designated. Chinese EUC (Extended
4030 Modeline indicator: `CText/Hbrw'. A type `iso2022' coding system
4031 with the `ascii' (G0) and `hebrew-iso8859-8' (G1) character sets
4032 initially designated for Hebrew.
4038 Modeline indicator: `CText'. A type `iso2022' 8-bit coding system
4039 with the `ascii' (G0) and `latin-iso8859-1' (G1) character sets
4040 initially designated. X11 Compound Text Encoding. Often
4041 mistakenly recognized instead of EUC encodings; usual cause is
4042 inappropriate setting of `coding-priority-list'.
4045 Modeline indicator: `ESC/Quot'. A type `iso2022' 8-bit coding
4046 system with the `ascii' (G0) and `latin-iso8859-1' (G1) character
4047 sets initially designated and escape quoting. Unix EOL conversion
4048 (ie, no conversion). It is used for .ELC files.
4054 Modeline indicator: `Ja/EUC'. A type `iso2022' 8-bit coding system
4055 with `ascii' (G0), `japanese-jisx0208' (G1), `katakana-jisx0201'
4056 (G2), and `japanese-jisx0212' (G3) initially designated. Japanese
4057 EUC (Extended Unix Code).
4063 Modeline indicator: `ko/EUC'. A type `iso2022' 8-bit coding system
4064 with `ascii' (G0) and `korean-ksc5601' (G1) initially designated.
4065 Korean EUC (Extended Unix Code).
4068 Modeline indicator: `Zh-GB/Hz'. A type `no-conversion' coding
4069 system with Unix EOL convention (ie, no conversion) using
4070 post-read-decode and pre-write-encode functions to translate the
4071 Hz/ZW coding system used for Chinese.
4074 `iso-2022-7bit-unix'
4078 Modeline indicator: `ISO7'. A type `iso2022' 7-bit coding system
4079 with `ascii' (G0) initially designated. Other character sets must
4080 be explicitly designated to be used.
4083 `iso-2022-7bit-ss2-dos'
4084 `iso-2022-7bit-ss2-mac'
4085 `iso-2022-7bit-ss2-unix'
4086 Modeline indicator: `ISO7/SS'. A type `iso2022' 7-bit coding
4087 system with `ascii' (G0) initially designated. Other character
4088 sets must be explicitly designated to be used. SS2 is used to
4089 invoke a 96-charset, one character at a time.
4095 Modeline indicator: `ISO8'. A type `iso2022' 8-bit coding system
4096 with `ascii' (G0) and `latin-iso8859-1' (G1) initially designated.
4097 Other character sets must be explicitly designated to be used.
4098 No single-shift or locking-shift.
4101 `iso-2022-8bit-ss2-dos'
4102 `iso-2022-8bit-ss2-mac'
4103 `iso-2022-8bit-ss2-unix'
4104 Modeline indicator: `ISO8/SS'. A type `iso2022' 8-bit coding
4105 system with `ascii' (G0) and `latin-iso8859-1' (G1) initially
4106 designated. Other character sets must be explicitly designated to
4107 be used. SS2 is used to invoke a 96-charset, one character at a
4111 `iso-2022-int-1-dos'
4112 `iso-2022-int-1-mac'
4113 `iso-2022-int-1-unix'
4114 Modeline indicator: `INT-1'. A type `iso2022' 7-bit coding system
4115 with `ascii' (G0) and `korean-ksc5601' (G1) initially designated.
4118 `iso-2022-jp-1978-irv'
4119 `iso-2022-jp-1978-irv-dos'
4120 `iso-2022-jp-1978-irv-mac'
4121 `iso-2022-jp-1978-irv-unix'
4122 Modeline indicator: `Ja-78/7bit'. A type `iso2022' 7-bit coding
4123 system. For compatibility with old Japanese terminals; if you
4124 need to know, look at the source.
4127 `iso-2022-jp-2 (ISO7/SS)'
4133 `iso-2022-jp-2-unix'
4134 Modeline indicator: `MULE/7bit'. A type `iso2022' 7-bit coding
4135 system with `ascii' (G0) initially designated, and complex
4136 specifications to ensure backward compatibility with old Japanese
4137 systems. Used for communication with mail and news in Japan. The
4138 "-2" versions also use SS2 to invoke a 96-charset one character at
4142 Modeline indicator: `Ko/7bit'. A type `iso2022' 7-bit coding
4143 system with `ascii' (G0) and `korean-ksc5601' (G1) initially
4144 designated. Used for e-mail in Korea.
4149 `iso-2022-lock-unix'
4150 Modeline indicator: `ISO7/Lock'. A type `iso2022' 7-bit coding
4151 system with `ascii' (G0) initially designated, using Locking-Shift
4152 to invoke a 96-charset.
4158 Due to implementation, this is not a type `iso2022' coding system,
4159 but rather an alias for the `raw-text' coding system.
4165 Modeline indicator: `MIME/Ltn-2'. A type `iso2022' coding system
4166 with `ascii' (G0) and `latin-iso8859-2' (G1) initially invoked.
4172 Modeline indicator: `MIME/Ltn-3'. A type `iso2022' coding system
4173 with `ascii' (G0) and `latin-iso8859-3' (G1) initially invoked.
4179 Modeline indicator: `MIME/Ltn-4'. A type `iso2022' coding system
4180 with `ascii' (G0) and `latin-iso8859-4' (G1) initially invoked.
4186 Modeline indicator: `ISO8/Cyr'. A type `iso2022' coding system
4187 with `ascii' (G0) and `cyrillic-iso8859-5' (G1) initially invoked.
4193 Modeline indicator: `Grk'. A type `iso2022' coding system with
4194 `ascii' (G0) and `greek-iso8859-7' (G1) initially invoked.
4200 Modeline indicator: `MIME/Hbrw'. A type `iso2022' coding system
4201 with `ascii' (G0) and `hebrew-iso8859-8' (G1) initially invoked.
4207 Modeline indicator: `MIME/Ltn-5'. A type `iso2022' coding system
4208 with `ascii' (G0) and `latin-iso8859-9' (G1) initially invoked.
4214 Modeline indicator: `KOI8'. A type `ccl' coding-system used for
4215 KOI8-R, an encoding of the Cyrillic alphabet.
4221 Modeline indicator: `Ja/SJIS'. A type `shift-jis' coding-system
4222 implementing the Shift-JIS encoding for Japanese. The underscore
4223 is to conform to the MIME charset implementing this encoding.
4229 Modeline indicator: `TIS620'. A type `ccl' encoding for Thai. The
4230 external encoding is defined by TIS620, the internal encoding is
4231 peculiar to MULE, and called `thai-xtis'.
4234 Modeline indicator: `VIQR'. A type `no-conversion' coding system
4235 with Unix EOL convention (ie, no conversion) using
4236 post-read-decode and pre-write-encode functions to translate the
4237 VIQR coding system for Vietnamese.
4243 Modeline indicator: `VISCII'. A type `ccl' coding-system used for
4244 VISCII 1.1 for Vietnamese. Differs slightly from VSCII; VISCII is
4245 given priority by XEmacs.
4251 Modeline indicator: `VSCII'. A type `ccl' coding-system used for
4252 VSCII 1.1 for Vietnamese. Differs slightly from VISCII, which is
4253 given priority by XEmacs. Use `(prefer-coding-system
4254 'vietnamese-vscii)' to give priority to VSCII.
4258 File: lispref.info, Node: CCL, Next: Category Tables, Prev: Coding Systems, Up: MULE
4263 CCL (Code Conversion Language) is a simple structured programming
4264 language designed for character coding conversions. A CCL program is
4265 compiled to CCL code (represented by a vector of integers) and executed
4266 by the CCL interpreter embedded in Emacs. The CCL interpreter
4267 implements a virtual machine with 8 registers called `r0', ..., `r7', a
4268 number of control structures, and some I/O operators. Take care when
4269 using registers `r0' (used in implicit "set" statements) and especially
4270 `r7' (used internally by several statements and operations, especially
4271 for multiple return values and I/O operations).
4273 CCL is used for code conversion during process I/O and file I/O for
4274 non-ISO2022 coding systems. (It is the only way for a user to specify a
4275 code conversion function.) It is also used for calculating the code
4276 point of an X11 font from a character code. However, since CCL is
4277 designed as a powerful programming language, it can be used for more
4278 generic calculation where efficiency is demanded. A combination of
4279 three or more arithmetic operations can be calculated faster by CCL than
4282 *Warning:* The code in `src/mule-ccl.c' and
4283 `$packages/lisp/mule-base/mule-ccl.el' is the definitive description of
4284 CCL's semantics. The previous version of this section contained
4285 several typos and obsolete names left from earlier versions of MULE,
4286 and many may remain. (I am not an experienced CCL programmer; the few
4287 who know CCL well find writing English painful.)
4289 A CCL program transforms an input data stream into an output data
4290 stream. The input stream, held in a buffer of constant bytes, is left
4291 unchanged. The buffer may be filled by an external input operation,
4292 taken from an Emacs buffer, or taken from a Lisp string. The output
4293 buffer is a dynamic array of bytes, which can be written by an external
4294 output operation, inserted into an Emacs buffer, or returned as a Lisp
4297 A CCL program is a (Lisp) list containing two or three members. The
4298 first member is the "buffer magnification", which indicates the
4299 required minimum size of the output buffer as a multiple of the input
4300 buffer. It is followed by the "main block" which executes while there
4301 is input remaining, and an optional "EOF block" which is executed when
4302 the input is exhausted. Both the main block and the EOF block are CCL
4305 A "CCL block" is either a CCL statement or list of CCL statements.
4306 A "CCL statement" is either a "set statement" (either an integer or an
4307 "assignment", which is a list of a register to receive the assignment,
4308 an assignment operator, and an expression) or a "control statement" (a
4309 list starting with a keyword, whose allowable syntax depends on the
4314 * CCL Syntax:: CCL program syntax in BNF notation.
4315 * CCL Statements:: Semantics of CCL statements.
4316 * CCL Expressions:: Operators and expressions in CCL.
4317 * Calling CCL:: Running CCL programs.
4318 * CCL Example:: A trivial program to transform the Web's URL encoding.
4321 File: lispref.info, Node: CCL Syntax, Next: CCL Statements, Up: CCL
4326 The full syntax of a CCL program in BNF notation:
4329 (BUFFER_MAGNIFICATION
4333 BUFFER_MAGNIFICATION := integer
4334 CCL_MAIN_BLOCK := CCL_BLOCK
4335 CCL_EOF_BLOCK := CCL_BLOCK
4338 STATEMENT | (STATEMENT [STATEMENT ...])
4340 SET | IF | BRANCH | LOOP | REPEAT | BREAK | READ | WRITE
4345 | (REG ASSIGNMENT_OPERATOR EXPRESSION)
4348 EXPRESSION := ARG | (EXPRESSION OPERATOR ARG)
4350 IF := (if EXPRESSION CCL_BLOCK [CCL_BLOCK])
4351 BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
4352 LOOP := (loop STATEMENT [STATEMENT ...])
4356 | (write-repeat [REG | integer | string])
4357 | (write-read-repeat REG [integer | ARRAY])
4360 | (read-if (REG OPERATOR ARG) CCL_BLOCK CCL_BLOCK)
4361 | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
4364 | (write EXPRESSION)
4365 | (write integer) | (write string) | (write REG ARRAY)
4367 CALL := (call ccl-program-name)
4370 REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
4371 ARG := REG | integer
4373 + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
4374 | < | > | == | <= | >= | != | de-sjis | en-sjis
4375 ASSIGNMENT_OPERATOR :=
4376 += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
4377 ARRAY := '[' integer ... ']'
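To make the top-level shape concrete, here is a sketch of an uncompiled program form picked apart with ordinary list functions. The variable name `my-ccl-copy-form' is made up for illustration, and the form is only inspected here, not compiled or run.

```elisp
(defvar my-ccl-copy-form
  '(1                          ; BUFFER_MAGNIFICATION
    ((read r0)                 ; CCL_MAIN_BLOCK: a list of statements
     (loop
      (write r0)
      (read r0)
      (repeat)))
    (write 10))                ; CCL_EOF_BLOCK: a single statement
  "An uncompiled CCL program form (hypothetical example).")

(car my-ccl-copy-form)         ; the buffer magnification
(nth 1 my-ccl-copy-form)       ; the main block
(nth 2 my-ccl-copy-form)       ; the optional EOF block, nil if absent
```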
4380 File: lispref.info, Node: CCL Statements, Next: CCL Expressions, Prev: CCL Syntax, Up: CCL
4382 63.7.2 CCL Statements
4383 ---------------------
4385 The Emacs Code Conversion Language provides the following statement
4386 types: "set", "if", "branch", "loop", "repeat", "break", "read",
4387 "write", "call", and "end".
4392 The "set" statement has three variants with the syntaxes `(REG =
4393 EXPRESSION)', `(REG ASSIGNMENT_OPERATOR EXPRESSION)', and `INTEGER'.
4394 The assignment operator variation of the "set" statement works the same
4395 way as the corresponding C expression statement does. The assignment
4396 operators are `+=', `-=', `*=', `/=', `%=', `&=', `|=', `^=', `<<=',
4397 and `>>=', and they have the same meanings as in C. A "naked integer"
4398 INTEGER is equivalent to a SET statement of the form `(r0 = INTEGER)'.
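As a minimal sketch of the first two "set" forms, the program below clears bit 5 of every input byte, which turns lowercase ASCII letters into uppercase ones. The name `my-ccl-upcase-ascii' is hypothetical, and the sketch assumes that `define-ccl-program' and `ccl-execute-on-string' behave as described elsewhere in this chapter.

```elisp
;; `my-ccl-upcase-ascii' is a name chosen for this sketch only.
(define-ccl-program my-ccl-upcase-ascii
  `(1
    ((r1 = 223)          ; (REG = EXPRESSION): r1 holds the mask #xDF
     (loop
      (read r0)          ; one input byte into r0
      (r0 &= r1)         ; (REG ASSIGNMENT_OPERATOR EXPRESSION), as in C
      (write r0)
      (repeat))))
  "Clear bit 5 of every input byte (illustrative only).")

(ccl-execute-on-string my-ccl-upcase-ascii (make-vector 9 nil) "abc")
```

With the semantics given above, the call should return "ABC". Bytes that are not lowercase letters are altered too, so this is a demonstration of the statement forms rather than a usable case converter.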
4403 The "read" statement takes one or more registers as arguments. It
4404 reads one byte (a C char) from the input into each register in turn.
4406 The "write" statement takes several forms. In the form `(write REG ...)' it
4407 takes one or more registers as arguments and writes each in turn to the
4408 output. The integer in a register (interpreted as an Emchar) is
4409 encoded to multibyte form (ie, Bufbytes) and written to the current
4410 output buffer. If it is less than 256, it is written as is. The forms
4411 `(write EXPRESSION)' and `(write INTEGER)' are treated analogously.
4412 The form `(write STRING)' writes the constant string to the output. A
4413 "naked string" `STRING' is equivalent to the statement `(write
4414 STRING)'. The form `(write REG ARRAY)' writes the REGth element of the
4415 ARRAY to the output.
4417 Conditional statements:
4418 =======================
4420 The "if" statement takes an EXPRESSION, a CCL BLOCK, and an optional
4421 SECOND CCL BLOCK as arguments. If the EXPRESSION evaluates to
4422 non-zero, the first CCL BLOCK is executed. Otherwise, if there is a
4423 SECOND CCL BLOCK, it is executed.
4425 The "read-if" variant of the "if" statement takes an EXPRESSION, a
4426 CCL BLOCK, and an optional SECOND CCL BLOCK as arguments. The
4427 EXPRESSION must have the form `(REG OPERATOR OPERAND)' (where OPERAND is
4428 a register or an integer). The `read-if' statement first reads from
4429 the input into the first register operand in the EXPRESSION, then
4430 conditionally executes a CCL block just as the `if' statement does.
4432 The "branch" statement takes an EXPRESSION and one or more CCL
4433 blocks as arguments. The CCL blocks are treated as a zero-indexed
4434 array, and the `branch' statement uses the EXPRESSION as the index of
4435 the CCL block to execute. Null CCL blocks may be used as no-ops,
4436 continuing execution with the statement following the `branch'
4437 statement in the containing CCL block. Out-of-range values for the
4438 EXPRESSION are also treated as no-ops.
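The following sketch combines `if' and `branch'. It is an illustration under the assumption that the interpreter follows the description above; the program name `my-ccl-branch-demo' is hypothetical.

```elisp
(define-ccl-program my-ccl-branch-demo
  `(1
    ((loop
      (read r0)
      (if (r0 < 58)               ; below ":" in ASCII; treat as a digit
          ((r0 -= 48)             ; "0" -> 0, "1" -> 1, ...
           (branch r0
                   (write 90)     ; block 0 writes "Z"
                   (write 79)))   ; block 1 writes "O"; other indices: no-op
        (write r0))               ; second block: copy the byte unchanged
      (repeat))))
  "Map \"0\" to \"Z\" and \"1\" to \"O\"; copy bytes from \":\" upwards.")

(ccl-execute-on-string my-ccl-branch-demo (make-vector 9 nil) "01x")
```

The call should return "ZOx". A digit such as "7" would select a block that does not exist, so nothing is written for it.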
4440 The "read-branch" variant of the "branch" statement takes a
4441 REGISTER and one or more CCL blocks as arguments.
4442 The `read-branch' statement first reads from the input into the
4443 REGISTER, then conditionally executes a CCL block just as the `branch'
4446 Loop control statements:
4447 ========================
4449 The "loop" statement creates a block with an implied jump from the end
4450 of the block back to its head. The loop is exited on a `break'
4451 statement, and continued without executing the tail by a `repeat'
4454 The "break" statement, written `(break)', terminates the current
4455 loop and continues with the next statement in the current block.
4457 The "repeat" statement has three variants, `repeat', `write-repeat',
4458 and `write-read-repeat'. Each continues the current loop from its
4459 head, possibly after performing I/O. `repeat' takes no arguments and
4460 does no I/O before jumping. `write-repeat' takes a single argument (a
4461 register, an integer, or a string), writes it to the output, then jumps.
4462 `write-read-repeat' takes one or two arguments. The first must be a
4463 register. The second may be an integer or an array; if absent, it is
4464 implicitly set to the first (register) argument. `write-read-repeat'
4465 writes its second argument to the output, then reads from the input
4466 into the register, and finally jumps. See the `write' and `read'
4467 statements for the semantics of the I/O operations for each type of
4470 Other control statements:
4471 =========================
4473 The "call" statement, written `(call CCL-PROGRAM-NAME)', executes a CCL
4474 program as a subroutine. It does not return a value to the caller, but
4475 can modify the register status.
4477 The "end" statement, written `(end)', terminates the CCL program
4478 successfully, and returns to caller (which may be a CCL program). It
4479 does not alter the status of the registers.
4482 File: lispref.info, Node: CCL Expressions, Next: Calling CCL, Prev: CCL Statements, Up: CCL
4484 63.7.3 CCL Expressions
4485 ----------------------
4487 CCL, unlike Lisp, uses infix expressions. The simplest CCL expressions
4488 consist of a single OPERAND, either a register (one of `r0', ..., `r7')
4489 or an integer. Complex expressions are lists of the form `( EXPRESSION
4490 OPERATOR OPERAND )'. Unlike C, assignments are not expressions.
4492 In the following table, X is the target register for a "set". In
4493 subexpressions, this is implicitly `r7'. This means that `>8', `//',
4494 `de-sjis', and `en-sjis' cannot be used freely in subexpressions, since
4495 they return parts of their values in `r7'. Y may be an expression,
4496 register, or integer, while Z must be a register or an integer.
4498 Name Operator Code C-like Description
4499 CCL_PLUS `+' 0x00 X = Y + Z
4500 CCL_MINUS `-' 0x01 X = Y - Z
4501 CCL_MUL `*' 0x02 X = Y * Z
4502 CCL_DIV `/' 0x03 X = Y / Z
4503 CCL_MOD `%' 0x04 X = Y % Z
4504 CCL_AND `&' 0x05 X = Y & Z
4505 CCL_OR `|' 0x06 X = Y | Z
4506 CCL_XOR `^' 0x07 X = Y ^ Z
4507 CCL_LSH `<<' 0x08 X = Y << Z
4508 CCL_RSH `>>' 0x09 X = Y >> Z
4509 CCL_LSH8 `<8' 0x0A X = (Y << 8) | Z
4510 CCL_RSH8 `>8' 0x0B X = Y >> 8, r[7] = Y & 0xFF
4511 CCL_DIVMOD `//' 0x0C X = Y / Z, r[7] = Y % Z
4512 CCL_LS `<' 0x10 X = (Y < Z)
4513 CCL_GT `>' 0x11 X = (Y > Z)
4514 CCL_EQ `==' 0x12 X = (Y == Z)
4515 CCL_LE `<=' 0x13 X = (Y <= Z)
4516 CCL_GE `>=' 0x14 X = (Y >= Z)
4517 CCL_NE `!=' 0x15 X = (Y != Z)
4518 CCL_ENCODE_SJIS `en-sjis' 0x16 X = HIGHER_BYTE (SJIS (Y, Z))
4519 r[7] = LOWER_BYTE (SJIS (Y, Z))
4520 CCL_DECODE_SJIS `de-sjis' 0x17 X = HIGHER_BYTE (DE-SJIS (Y, Z))
4521 r[7] = LOWER_BYTE (DE-SJIS (Y, Z))
4523 The CCL operators are as in C, with the addition of CCL_LSH8,
4524 CCL_RSH8, CCL_DIVMOD, CCL_ENCODE_SJIS, and CCL_DECODE_SJIS. The
4525 CCL_ENCODE_SJIS and CCL_DECODE_SJIS treat their first and second operands
4526 as the high and low bytes of a two-byte character code. (SJIS stands
4527 for Shift JIS, an encoding of Japanese characters used by Microsoft.
4528 CCL_ENCODE_SJIS is a complicated transformation of the Japanese
4529 standard JIS encoding to Shift JIS. CCL_DECODE_SJIS is its inverse.)
4530 It is somewhat odd to represent the SJIS operations in infix form.
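The three non-C arithmetic operators can be modeled in ordinary Lisp. The functions below are hypothetical stand-ins written for this illustration; they are not part of the CCL implementation, and the operators that set `r7' are modeled by returning a cons of X and the value left in `r7'.

```elisp
(defun my-ccl-lsh8 (y z)
  "Model of `<8': return (Y << 8) | Z."
  (logior (ash y 8) z))

(defun my-ccl-rsh8 (y)
  "Model of `>8': return (X . R7) with X = Y >> 8 and R7 = Y & 0xFF."
  (cons (ash y -8) (logand y 255)))

(defun my-ccl-divmod (y z)
  "Model of `//': return (X . R7) with X = Y / Z and R7 = Y % Z."
  (cons (/ y z) (% y z)))

(my-ccl-lsh8 #x12 #x34)   ; => 4660, that is #x1234
(my-ccl-rsh8 #x1234)      ; => (18 . 52), that is (#x12 . #x34)
(my-ccl-divmod 17 5)      ; => (3 . 2)
```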
4533 File: lispref.info, Node: Calling CCL, Next: CCL Example, Prev: CCL Expressions, Up: CCL
4538 CCL programs are called automatically during Emacs buffer I/O when the
4539 external representation has a coding system type of `shift-jis',
4540 `big5', or `ccl'. The program is specified by the coding system (*note
4541 Coding Systems::). You can also call CCL programs from other CCL
4542 programs, and from Lisp using these functions:
4544 -- Function: ccl-execute ccl-program status
4545 Execute CCL-PROGRAM with registers initialized by STATUS.
4546 CCL-PROGRAM is a vector of compiled CCL code created by
4547 `ccl-compile'. It is an error for the program to try to execute a
4548 CCL I/O command. STATUS must be a vector of nine values,
4549 specifying the initial value for the R0, R1 .. R7 registers and
4550 for the instruction counter IC. A `nil' value for a register
4551 initializer causes the register to be set to 0. A `nil' value for
4552 the IC initializer causes execution to start at the beginning of
4553 the program. When the program is done, STATUS is modified (by
4554 side-effect) to contain the ending values for the corresponding
4557 -- Function: ccl-execute-on-string ccl-program status string &optional
4559 Execute CCL-PROGRAM with initial STATUS on STRING. CCL-PROGRAM is
4560 a vector of compiled CCL code created by `ccl-compile'. STATUS
4561 must be a vector of nine values, specifying the initial value for
4562 the R0, R1 .. R7 registers and for the instruction counter IC. A
4563 `nil' value for a register initializer causes the register to be
4564 set to 0. A `nil' value for the IC initializer causes execution
4565 to start at the beginning of the program. An optional fourth
4566 argument CONTINUE, if non-`nil', causes the IC to remain on the
4567 unsatisfied read operation if the program terminates due to
4568 exhaustion of the input buffer. Otherwise the IC is set to the end
4569 of the program. When the program is done, STATUS is modified (by
4570 side-effect) to contain the ending values for the corresponding
4571 registers and IC. Returns the resulting string.
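A minimal sketch of a call with a non-default STATUS follows. The program name `my-ccl-add-r1' is hypothetical; the sketch assumes the nine-element STATUS layout described above, with the initial value of `r1' at index 1.

```elisp
(define-ccl-program my-ccl-add-r1
  `(1
    ((loop
      (read r0)
      (r0 += r1)         ; r1 is supplied by the caller through STATUS
      (write r0)
      (repeat))))
  "Add the caller-supplied value of r1 to every input byte (illustrative).")

(let ((status (make-vector 9 nil)))   ; r0..r7 and IC; nil means 0 or start
  (aset status 1 1)                   ; initial value of r1
  (ccl-execute-on-string my-ccl-add-r1 status "HAL"))
```

With `r1' initialized to 1, the call should return "IBM".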
4573 To call a CCL program from another CCL program, it must first be
4576 -- Function: register-ccl-program name ccl-program
4577 Register NAME for CCL program CCL-PROGRAM in `ccl-program-table'.
4578 CCL-PROGRAM should be the compiled form of a CCL program, or
4579 `nil'. Return index number of the registered CCL program.
4581 Information about the processor time used by the CCL interpreter can
4582 be obtained using these functions:
4584 -- Function: ccl-elapsed-time
4585 Returns the elapsed processor time of the CCL interpreter as a cons
4586 of user and system time, as floating point numbers measured in
4587 seconds. If only one overall value can be determined, the return
4588 value will be a cons of that value and 0.
4590 -- Function: ccl-reset-elapsed-time
4591 Resets the CCL interpreter's internal elapsed time registers.
4594 File: lispref.info, Node: CCL Example, Prev: Calling CCL, Up: CCL
4599 In this section, we describe the implementation of a trivial coding
4600 system to transform from the Web's URL encoding to XEmacs' internal
4601 coding. Many people will have been first exposed to URL encoding when
4602 they saw "%20" where they expected a space in a file's name on their
4603 local hard disk; this can happen when a browser saves a file from the
4604 web and doesn't encode the name, as passed from the server, properly.
4606 URL encoding itself is underspecified with regard to encodings beyond
4607 ASCII. The relevant document, RFC 1738, explicitly doesn't give any
4608 information on how to encode non-ASCII characters, and the "obvious"
4609 way--use the %xx values for the octets of the eight bit MIME character
4610 set in which the page was served--breaks when a user types a character
4611 outside that character set. Best practice for web development is to
4612 serve all pages as UTF-8 and treat incoming form data as using that
4613 coding system. (Oh, and gamble that your clients won't ever want to
4614 type anything outside Unicode. But that's not so much of a gamble with
4615 today's client operating systems.) We don't treat non-ASCII in this
4616 example, as dealing with `(read-multibyte-character ...)' and errors
4617 therewith would make it much harder to understand.
4619 Since CCL isn't a very rich language, we move much of the logic that
4620 would ordinarily be computed from operations like `(member ..)', `(and
4621 ...)' and `(or ...)' into tables, from which register values are read
4622 and written, and on which `if' statements are predicated. Much more of
4623 the implementation of this coding system is occupied with constructing
4624 these tables--in normal Emacs Lisp--than it is with actual CCL code.
4626 All the `defvar' statements we deal with in the next few sections
4627 are surrounded by a `(eval-and-compile ...)', which means that the
4628 logic which initializes these variables executes at compile time, and if
4629 XEmacs loads the compiled version of the file, these variables are
4630 initialized as constants.
4634 * Four bits to ASCII:: Two tables used for getting hex digits from ASCII.
4635 * URI Encoding constants:: Useful predefined characters.
4636 * Numeric to ASCII-hexadecimal conversion:: Trivial in Lisp, not so in CCL.
4637 * Characters to be preserved:: No transformation needed for these characters.
4638 * The program to decode to internal format:: .
4639 * The program to encode from internal format:: .
4640 * The actual coding system:: .
4643 File: lispref.info, Node: Four bits to ASCII, Next: URI Encoding constants, Up: CCL Example
4645 63.7.5.1 Four bits to ASCII
4646 ...........................
4648 The first `defvar' is for `url-coding-high-order-nybble-as-ascii', a
4649 256-entry table that maps from an octet's value to the ASCII encoding
4650 for the hex value of its most significant four bits. That might sound
4651 complex, but it isn't; for decimal 65, hex value `#x41', the entry in
4652 the table is the ASCII encoding of `4'. For decimal 122, ASCII `z',
4653 hex value `#x7a', `(elt url-coding-high-order-nybble-as-ascii #x7a)'
4654 after this file is loaded gives the ASCII encoding of 7.
4656 (defvar url-coding-high-order-nybble-as-ascii
4657 (let ((val (make-vector 256 0))
4659 (while (< i (length val))
4660 (aset val i (char-to-int (aref (format "%02X" i) 0)))
4663 "Table to find an ASCII version of an octet's most significant 4 bits.")
4665 The next table, `url-coding-low-order-nybble-as-ascii' is almost the
4666 same thing, but this time it has a map for the hex encoding of the
4667 low-order four bits. So the entry at offset `#x41' (decimal 65) is the
4668 ASCII encoding of `1', and the entry at offset `#x7a' (decimal 122) is
4669 the ASCII encoding of `A'.
4671 (defvar url-coding-low-order-nybble-as-ascii
4672 (let ((val (make-vector 256 0))
4674 (while (< i (length val))
4675 (aset val i (char-to-int (aref (format "%02X" i) 1)))
4678 "Table to find an ASCII version of an octet's least significant 4 bits.")
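The contents of both tables can be cross-checked with integer arithmetic alone. The function names below are hypothetical and are not used by the coding system itself; they compute the value each table entry is expected to hold.

```elisp
(defun my-hex-digit-code (n)
  "Return the ASCII code of the uppercase hex digit for N (0..15)."
  (if (< n 10)
      (+ 48 n)      ; 48 is the ASCII code of "0"
    (+ 55 n)))      ; 65 is the ASCII code of "A", and 65 - 10 = 55

(defun my-high-nybble-as-ascii (octet)
  "ASCII code of the hex digit for the top four bits of OCTET."
  (my-hex-digit-code (ash octet -4)))

(defun my-low-nybble-as-ascii (octet)
  "ASCII code of the hex digit for the bottom four bits of OCTET."
  (my-hex-digit-code (logand octet 15)))

(my-high-nybble-as-ascii #x7a)   ; => 55, the ASCII code of "7"
(my-low-nybble-as-ascii #x7a)    ; => 65, the ASCII code of "A"
```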
4681 File: lispref.info, Node: URI Encoding constants, Next: Numeric to ASCII-hexadecimal conversion, Prev: Four bits to ASCII, Up: CCL Example
4683 63.7.5.2 URI Encoding constants
4684 ...............................
4686 Next, we have a couple of variables that make the CCL code more
4687 readable. The first is the ASCII encoding of the percentage sign; this
4688 character is used as an escape code, to start the encoding of a
4689 non-printable character. For historical reasons, URL encoding allows
4690 the space character to be encoded as a plus sign (it does make typing
4691 URLs like `http://google.com/search?q=XEmacs+home+page' easier), and as
4692 such, we have to check when decoding for this value, and map it to the
4693 space character. When doing this in CCL, we use the
4694 `url-coding-escaped-space-code' variable.
4696 (defvar url-coding-escape-character-code (char-to-int ?%)
4697 "The code point for the percentage sign, in ASCII.")
4699 (defvar url-coding-escaped-space-code (char-to-int ?+)
4700 "The URL-encoded value of the space character, that is, +.")
4703 File: lispref.info, Node: Numeric to ASCII-hexadecimal conversion, Next: Characters to be preserved, Prev: URI Encoding constants, Up: CCL Example
4705 63.7.5.3 Numeric to ASCII-hexadecimal conversion
4706 ................................................
4708 Now, we have a couple of utility tables that wouldn't be necessary in a
4709 more expressive programming language than is CCL. The first is sixteen
4710 in length, and maps a hexadecimal number to the ASCII encoding of that
4711 number; so zero maps to ASCII `0', ten maps to ASCII `A'. The second
4712 does the reverse; that is, it maps an ASCII character to its value when
4713 interpreted as a hexadecimal digit. ('A' => 10, 'c' => 12, '2' => 2, as
4716 (defvar url-coding-hex-digit-table
4718 (val (make-vector 16 0)))
4720 (aset val i (char-to-int (aref (format "%X" i) 0)))
4723 "A map from a hexadecimal digit's numeric value to its encoding in ASCII.")
4725 (defvar url-coding-latin-1-as-hex-table
4726 (let ((val (make-vector 256 0))
4728 (while (< i (length val))
4729 ;; Get a hex val for this ASCII character.
4730 (aset val i (string-to-int (format "%c" i) 16))
4733 "A map from Latin 1 code points to their values as hexadecimal digits.")
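The reverse mapping can also be written without tables, which is handy for checking individual entries. This is a sketch only: `my-hex-code-to-value' is a hypothetical name, and it assumes that octets which are not hexadecimal digits map to zero, as a failed numeric parse would.

```elisp
(defun my-hex-code-to-value (code)
  "Return the value of the ASCII hex digit CODE, or 0 if it is not one."
  (cond ((and (>= code 48) (<= code 57)) (- code 48))    ; "0".."9"
        ((and (>= code 65) (<= code 70)) (- code 55))    ; "A".."F"
        ((and (>= code 97) (<= code 102)) (- code 87))   ; "a".."f"
        (t 0)))

(my-hex-code-to-value 65)   ; "A" => 10
(my-hex-code-to-value 99)   ; "c" => 12
(my-hex-code-to-value 50)   ; "2" => 2
```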
4736 File: lispref.info, Node: Characters to be preserved, Next: The program to decode to internal format, Prev: Numeric to ASCII-hexadecimal conversion, Up: CCL Example
4738 63.7.5.4 Characters to be preserved
4739 ...................................
4741 And finally, the last of these tables. URL encoding says that
4742 alphanumeric characters, the underscore, hyphen and the full stop (1)
4743 retain their ASCII encoding, and don't undergo transformation.
4744 `url-coding-should-preserve-table' is an array in which the entries are
4745 one if the corresponding ASCII character should be left as-is, and zero
4746 if they should be transformed. So the entries for all the control and
4747 most of the punctuation characters are zero. Lisp programmers will
4748 observe that this initialization is particularly inefficient, but
4749 they'll also be aware that this is a long way from an inner loop where
4750 every nanosecond counts.
4752 (defvar url-coding-should-preserve-table
4754 (list ?- ?_ ?. ?a ?b ?c ?d ?e ?f ?g ?h ?i ?j ?k ?l ?m ?n ?o
4755 ?p ?q ?r ?s ?t ?u ?v ?w ?x ?y ?z ?A ?B ?C ?D ?E ?F ?G
4756 ?H ?I ?J ?K ?L ?M ?N ?O ?P ?Q ?R ?S ?T ?U ?V ?W ?X ?Y
4757 ?Z ?0 ?1 ?2 ?3 ?4 ?5 ?6 ?7 ?8 ?9))
4759 (res (make-vector 256 0)))
4761 (when (member (int-char i) preserve)
4765 "A 256-entry array of flags, indicating whether or not to preserve an
4766 octet as its ASCII encoding.")
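For comparison, the same rule can be stated as a predicate over integer octet values. The name `my-url-preserved-p' is hypothetical; the predicate returns `t' or `nil' where the table holds one or zero.

```elisp
(defun my-url-preserved-p (octet)
  "Return t if OCTET keeps its ASCII encoding under URL encoding."
  (or (and (>= octet 48) (<= octet 57))     ; "0".."9"
      (and (>= octet 65) (<= octet 90))     ; "A".."Z"
      (and (>= octet 97) (<= octet 122))    ; "a".."z"
      (= octet 45)                          ; hyphen
      (= octet 95)                          ; underscore
      (= octet 46)))                        ; full stop

(my-url-preserved-p 97)   ; "a" => t
(my-url-preserved-p 32)   ; space => nil
```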
4768 ---------- Footnotes ----------
4770 (1) That's what the standards call it, though my North American
4771 readers will be more familiar with it as the period character.
4774 File: lispref.info, Node: The program to decode to internal format, Next: The program to encode from internal format, Prev: Characters to be preserved, Up: CCL Example
4776 63.7.5.5 The program to decode to internal format
4777 .................................................
4779 After the almost interminable tables, we get to the CCL. The first CCL
4780 program, `ccl-decode-urlcoding' decodes from the URL coding to our
4781 internal format; since this version of CCL doesn't have support for
4782 error checking on the input, we don't do any verification on it.
4784 The buffer magnification (the approximate ratio of the size of the output
4785 buffer to the size of the input buffer) is declared as one, because
4786 fractional values aren't allowed. (Since all those %20's will map to `
4787 ', the length of the output text will be less than that of the input
4790 So, first we read an octet from the input buffer into register `r0',
4791 to set up the loop. Next, we start the loop, with a `(loop ...)'
4792 statement, and we check if the value in `r0' is a percentage sign.
4793 (Note the comma before `url-coding-escape-character-code'; since CCL is
4794 a Lisp macro language, we can break out of the macro evaluation with a
4795 comma, and as such, "`,url-coding-escape-character-code'" will be
4796 evaluated as a literal `37'.)
4798 If it is a percentage sign, we read the next two octets into `r2'
4799 and `r3', and convert them into their hexadecimal numeric values, using
4800 the `url-coding-latin-1-as-hex-table' array declared above. (But
4801 again, it'll be interpreted as a literal array.) We then left shift
4802 the first by four bits, OR the two together, and write the result to
4805 If it isn't a percentage sign, and it is a `+' sign, we write a
4806 space (hexadecimal 20) to the output buffer.
4808 If none of those things are true, we pass the octet to the output
4809 buffer untransformed. (This could be a place to put error checking, in
4810 a more expressive language.) We then read one more octet from the input
4811 buffer, and move to the next iteration of the loop.
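The steps above can be mirrored in ordinary Lisp as a reference against which to compare the CCL program's output. This is a sketch, not the manual's implementation: the names are hypothetical, octets are represented as a list of integers, and, like the CCL version, it assumes well-formed input and does no validation.

```elisp
(defvar my-url-hex-value-table
  (let ((table (make-vector 256 0))
        (i 0))
    (while (< i 10)                       ; "0".."9" map to 0..9
      (aset table (+ 48 i) i)
      (setq i (1+ i)))
    (setq i 0)
    (while (< i 6)                        ; "A".."F" and "a".."f" map to 10..15
      (aset table (+ 65 i) (+ 10 i))
      (aset table (+ 97 i) (+ 10 i))
      (setq i (1+ i)))
    table)
  "Hypothetical table from ASCII codes to hexadecimal digit values.")

(defun my-url-decode-octets (octets)
  "Decode URL-coded OCTETS, a list of integers, into a list of octets."
  (let ((result nil))
    (while octets
      (let ((octet (car octets)))
        (setq octets (cdr octets))
        (cond ((= octet 37)               ; "%": two hex digits follow
               (setq result
                     (cons (logior
                            (ash (aref my-url-hex-value-table (car octets)) 4)
                            (aref my-url-hex-value-table (car (cdr octets))))
                           result))
               (setq octets (cdr (cdr octets))))
              ((= octet 43)               ; "+" stands for a space
               (setq result (cons 32 result)))
              (t                          ; anything else passes through
               (setq result (cons octet result))))))
    (nreverse result)))

(my-url-decode-octets '(97 43 98 37 52 49))   ; the octets of "a+b%41"
;; => (97 32 98 65), the octets of "a bA"
```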
4813 (define-ccl-program ccl-decode-urlcoding
4817 (if (r0 == ,url-coding-escape-character-code)
4819 ;; Look up r2 and r3 in url-coding-latin-1-as-hex-table to get their values.
4821 (r2 = r2 ,url-coding-latin-1-as-hex-table)
4822 (r3 = r3 ,url-coding-latin-1-as-hex-table)
4826 (if (r0 == ,url-coding-escaped-space-code)
4831 "CCL program to take URI-encoded ASCII text and transform it to our
4832 internal encoding. ")
4835 File: lispref.info, Node: The program to encode from internal format, Next: The actual coding system, Prev: The program to decode to internal format, Up: CCL Example
4837 63.7.5.6 The program to encode from internal format
4838 ...................................................
4840 Next, we see the CCL program to encode ASCII text as URL coded text.
4841 Here, the buffer magnification is specified as three, to account for ` '
4842 mapping to %20, etc. As before, we read an octet from the input into
4843 `r0', and move into the body of the loop. Next, we check if we should
4844 preserve the value of this octet, by reading from offset `r0' in the
4845 `url-coding-should-preserve-table' into `r1'. Then we have an `if'
4846 statement predicated on the value in `r1'; for the true branch, we
4847 write the input octet directly. For the false branch, we write a
4848 percentage sign, the ASCII encoding of the high four bits in hex, and
4849 then the ASCII encoding of the low four bits in hex.
4851 We then read an octet from the input into `r0', and repeat the loop.
4853 (define-ccl-program ccl-encode-urlcoding
4857 (r1 = r0 ,url-coding-should-preserve-table)
4858 ;; If we should preserve the value, just write the octet directly.
4861 ;; else, write a percentage sign, and the hex value of the octet, in
4862 ;; an ASCII-friendly format.
4863 ((write ,url-coding-escape-character-code)
4864 (write r0 ,url-coding-high-order-nybble-as-ascii)
4865 (write r0 ,url-coding-low-order-nybble-as-ascii)))
4868 "CCL program to encode octets (almost) according to RFC 1738")
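
The encoding direction can be sketched in plain Emacs Lisp in the same
way. The names below are hypothetical, and the set of preserved
characters is an assumption made for this sketch; the real
`url-coding-should-preserve-table' may preserve a different set.

```elisp
;; Hypothetical names throughout; this is not the code of url-coding.el.
(defconst my-url-hex-digits "0123456789ABCDEF"
  "Hexadecimal digits as ASCII characters, indexed by nybble value.")

(defun my-url-preserve-p (char)
  "Return t if CHAR can be written to the output unescaped.
Assumption for this sketch: only ASCII letters, digits, `-', `_' and
`.' are preserved."
  (if (or (and (>= char ?a) (<= char ?z))
          (and (>= char ?A) (<= char ?Z))
          (and (>= char ?0) (<= char ?9))
          (eq char ?-)
          (eq char ?_)
          (eq char ?.))
      t
    nil))

(defun my-url-encode-string (string)
  "Return a copy of STRING with unpreserved octets written as %XX.
STRING is assumed to contain only characters below 256."
  (let ((len (length string))
        (i 0)
        (pieces nil)
        char)
    (while (< i len)
      (setq char (aref string i))
      (setq pieces
            (cons (if (my-url-preserve-p char)
                      (char-to-string char)
                    ;; A percent sign, then the high and low nybbles as
                    ;; ASCII hexadecimal digits.
                    (concat "%"
                            (char-to-string
                             (aref my-url-hex-digits (logand (ash char -4) 15)))
                            (char-to-string
                             (aref my-url-hex-digits (logand char 15)))))
                  pieces))
      (setq i (1+ i)))
    (apply 'concat (nreverse pieces))))
```

Each input octet produces at most three output octets, which is why a
buffer magnification of three is sufficient. For example,
`(my-url-encode-string "a b/c")' returns `"a%20b%2Fc"'.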
4871 File: lispref.info, Node: The actual coding system, Prev: The program to encode from internal format, Up: CCL Example
4873 63.7.5.7 The actual coding system
4874 .................................
4876 To actually create the coding system, we call `make-coding-system'.
4877 The first argument is the symbol that is to be the name of the coding
4878 system, in our case `url-coding'. The second specifies that the coding
4879 system is to be of type `ccl'--there are several other coding system
4880 types available; see the documentation for
4881 `make-coding-system' for the full list. Then there's a documentation
4882 string describing the wherefore and caveats of the coding system, and
4883 the final argument is a property list giving information about the CCL
4884 programs and the coding system's mnemonic.
4888 "The coding used by application/x-www-form-urlencoded HTTP applications.
4889 This coding form doesn't specify anything about non-ASCII characters, so
4890 make sure you've transformed to a seven-bit coding system first."
4891 '(decode ccl-decode-urlcoding
4892 encode ccl-encode-urlcoding
4895 If you're lucky, the `url-coding' coding system described here should
4896 be available in the XEmacs package system. Otherwise, downloading it
4897 from `http://www.parhasard.net/url-coding.el' should work for the
4901 File: lispref.info, Node: Category Tables, Prev: CCL, Up: MULE
4903 63.8 Category Tables
4904 ====================
4906 A category table is a type of char table used for keeping track of
4907 categories. Categories are used for classifying characters for use in
4908 regexps--you can refer to a category rather than having to use a
4909 complicated [] expression (and category lookups are significantly
4912 There are 95 different categories available, one for each printable
4913 character (including space) in the ASCII charset. Each category is
4914 designated by one such character, called a "category designator". They
4915 are specified in a regexp using the syntax `\cX', where X is a category
4916 designator. (This is not yet implemented.)
4918 A category table specifies, for each character, the categories that
4919 the character is in. Note that a character can be in more than one
4920 category. More specifically, a category table maps from a character to
4921 either the value `nil' (meaning the character is in no categories) or a
4922 95-element bit vector, specifying for each of the 95 categories whether
4923 the character is in that category.
4925 Special Lisp functions are provided that abstract this, so you do not
4926 have to directly manipulate bit vectors.
4928 -- Function: category-table-p object
4929 This function returns `t' if OBJECT is a category table.
4931 -- Function: category-table &optional buffer
4932 This function returns the current category table. This is the one
4933 specified by the current buffer, or by BUFFER if it is non-`nil'.
4935 -- Function: standard-category-table
4936 This function returns the standard category table. This is the
4937 one used for new buffers.
4939 -- Function: copy-category-table &optional category-table
4940 This function returns a new category table which is a copy of
4941 CATEGORY-TABLE, which defaults to the standard category table.
4943 -- Function: set-category-table category-table &optional buffer
4944 This function selects CATEGORY-TABLE as the new category table for
4945 BUFFER. BUFFER defaults to the current buffer if omitted.
4947 -- Function: category-designator-p object
4948 This function returns `t' if OBJECT is a category designator (a
4949 char in the range `' '' to `'~'').
4951 -- Function: category-table-value-p object
4952 This function returns `t' if OBJECT is a category table value.
4953 Valid values are `nil' or a bit vector of size 95.
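
As a brief illustration of how these functions fit together, the
following sketch copies the standard category table and selects the
copy in a temporary buffer. The function name
`my-category-table-demo' is hypothetical, and the sketch assumes a
build in which category tables are available.

```elisp
;; Hypothetical demo function; only the category-table calls named in
;; this section are assumed to exist.
(defun my-category-table-demo ()
  "Return a list of three flags describing a copied category table."
  (let ((table (copy-category-table (standard-category-table))))
    (with-temp-buffer
      ;; Select the copy for the temporary buffer only.
      (set-category-table table)
      (list (category-table-p table)        ; the copy is a category table
            (eq (category-table) table)     ; and is now the current one
            ;; but it is a distinct object from the standard table
            (eq table (standard-category-table))))))
```

Calling `(my-category-table-demo)' should return `(t t nil)': the copy
is a category table, it is current in the temporary buffer, and it is
not the same object as the standard table.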
4956 File: lispref.info, Node: Tips, Next: Building XEmacs and Object Allocation, Prev: MULE, Up: Top
4958 Appendix A Tips and Standards
4959 *****************************
4961 This chapter describes no additional features of XEmacs Lisp. Instead
4962 it gives advice on making effective use of the features described in
4963 the previous chapters.
4967 * Style Tips:: Writing clean and robust programs.
4968 * Compilation Tips:: Making compiled code run fast.
4969 * Documentation Tips:: Writing readable documentation strings.
4970 * Comment Tips:: Conventions for writing comments.
4971 * Library Headers:: Standard headers for library packages.
4974 File: lispref.info, Node: Style Tips, Next: Compilation Tips, Up: Tips
4976 A.1 Writing Clean Lisp Programs
4977 ===============================
4979 Here are some tips for avoiding common errors in writing Lisp code
4980 intended for widespread use:
4982 * Since all global variables share the same name space, and all
4983 functions share another name space, you should choose a short word
4984 to distinguish your program from other Lisp programs. Then take
4985 care to begin the names of all global variables, constants, and
4986 functions with the chosen prefix. This helps avoid name conflicts.
4988 This recommendation applies even to names for traditional Lisp
4989 primitives that are not primitives in XEmacs Lisp--even to `cadr'.
4990 Believe it or not, there is more than one plausible way to define
4991 `cadr'. Play it safe; append your name prefix to produce a name
4992 like `foo-cadr' or `mylib-cadr' instead.
4994 If you write a function that you think ought to be added to Emacs
4995 under a certain name, such as `twiddle-files', don't call it by
4996 that name in your program. Call it `mylib-twiddle-files' in your
4997 program, and send mail to `bug-gnu-emacs@prep.ai.mit.edu'
4998 suggesting we add it to Emacs. If and when we do, we can change
4999 the name easily enough.
5001 If one prefix is insufficient, your package may use two or three
5002 alternative common prefixes, so long as they make sense.
5004 Separate the prefix from the rest of the symbol name with a hyphen,
5005 `-'. This will be consistent with XEmacs itself and with most
5006 Emacs Lisp programs.
5008 * It is often useful to put a call to `provide' in each separate
5009 library program, at least if there is more than one entry point to
5012 * If a file requires certain other library programs to be loaded
5013 beforehand, then the comments at the beginning of the file should
5014 say so. Also, use `require' to make sure they are loaded.
5016 * If one file FOO uses a macro defined in another file BAR, FOO
5017 should contain this expression before the first use of the macro:
5019 (eval-when-compile (require 'BAR))
5021 (And BAR should contain `(provide 'BAR)', to make the `require'
5022 work.) This will cause BAR to be loaded when you byte-compile
5023 FOO. Otherwise, you risk compiling FOO without the necessary
5024 macro loaded, and that would produce compiled code that won't work
5025 right. *Note Compiling Macros::.
5027 Using `eval-when-compile' avoids loading BAR when the compiled
5028 version of FOO is _used_.
5030 * If you define a major mode, make sure to run a hook variable using
5031 `run-hooks', just as the existing major modes do. *Note Hooks::.
5033 * If the purpose of a function is to tell you whether a certain
5034 condition is true or false, give the function a name that ends in
5035 `p'. If the name is one word, add just `p'; if the name is
5036 multiple words, add `-p'. Examples are `framep' and
5039 * If a user option variable records a true-or-false condition, give
5040 it a name that ends in `-flag'.
5042 * Please do not define `C-c LETTER' as a key in your major modes.
5043 These sequences are reserved for users; they are the *only*
5044 sequences reserved for users, so we cannot do without them.
5046 Instead, define sequences consisting of `C-c' followed by a
5047 non-letter. These sequences are reserved for major modes.
5049 Changing all the major modes in Emacs 18 so they would follow this
5050 convention was a lot of work. Abandoning this convention would
5051 make that work go to waste, and inconvenience users.
5053 * Sequences consisting of `C-c' followed by `{', `}', `<', `>', `:'
5054 or `;' are also reserved for major modes.
5056 * Sequences consisting of `C-c' followed by any other punctuation
5057 character are allocated for minor modes. Using them in a major
5058 mode is not absolutely prohibited, but if you do that, the major
5059 mode binding may be shadowed from time to time by minor modes.
5061 * You should not bind `C-h' following any prefix character (including
5062 `C-c'). If you don't bind `C-h', it is automatically available as
5063 a help character for listing the subcommands of the prefix
5066 * You should not bind a key sequence ending in <ESC> except following
5067 another <ESC>. (That is, it is ok to bind a sequence ending in
5070 The reason for this rule is that a non-prefix binding for <ESC> in
5071 any context prevents recognition of escape sequences as function
5072 keys in that context.
5074 * Applications should not bind mouse events based on button 1 with
5075 the shift key held down. These events include `S-mouse-1',
5076 `M-S-mouse-1', `C-S-mouse-1', and so on. They are reserved for
5079 * Modes should redefine `mouse-2' as a command to follow some sort of
5080 reference in the text of a buffer, if users usually would not want
5081 to alter the text in that buffer by hand. Modes such as Dired,
5082 Info, Compilation, and Occur redefine it in this way.
5084 * When a package provides a modification of ordinary Emacs behavior,
5085 it is good to include a command to enable and disable the feature.
5086 Provide a command named `WHATEVER-mode' which turns the feature on
5087 or off, and make it autoload (*note Autoload::). Design the
5088 package so that simply loading it has no visible effect--that
5089 should not enable the feature. Users will request the feature by
5090 invoking the command.
5092 * It is a bad idea to define aliases for the Emacs primitives. Use
5093 the standard names instead.
5095 * Redefining an Emacs primitive is an even worse idea. It may do
5096 the right thing for a particular program, but there is no telling
5097 what other programs might break as a result.
5099 * If a file does replace any of the functions or library programs of
5100 standard XEmacs, prominent comments at the beginning of the file
5101 should say which functions are replaced, and how the behavior of
5102 the replacements differs from that of the originals.
5104 * Please keep the names of your XEmacs Lisp source files to 13
5105 characters or less. This way, if the files are compiled, the
5106 compiled files' names will be 14 characters or less, which is
5107 short enough to fit on all kinds of Unix systems.
5109 * Don't use `next-line' or `previous-line' in programs; nearly
5110 always, `forward-line' is more convenient as well as more
5111 predictable and robust. *Note Text Lines::.
5113 * Don't call functions that set the mark, unless setting the mark is
5114 one of the intended features of your program. The mark is a
5115 user-level feature, so it is incorrect to change the mark except
5116 to supply a value for the user's benefit. *Note The Mark::.
5118 In particular, don't use these functions:
5120 * `beginning-of-buffer', `end-of-buffer'
5122 * `replace-string', `replace-regexp'
5124 If you just want to move point, or replace a certain string,
5125 without any of the other features intended for interactive users,
5126 you can replace these functions with one or two lines of simple
5129 * Use lists rather than vectors, except when there is a particular
5130 reason to use a vector. Lisp has more facilities for manipulating
5131 lists than for vectors, and working with lists is usually more
5134 Vectors are advantageous for tables that are substantial in size
5135 and are accessed in random order (not searched front to back),
5136 provided there is no need to insert or delete elements (only lists
5139 * The recommended way to print a message in the echo area is with
5140 the `message' function, not `princ'. *Note The Echo Area::.
5142 * When you encounter an error condition, call the function `error'
5143 (or `signal'). The function `error' does not return. *Note
5146 Do not use `message', `throw', `sleep-for', or `beep' to report
5149 * An error message should start with a capital letter but should not
5152 * Try to avoid using recursive edits. Instead, do what the Rmail `e'
5153 command does: use a new local keymap that contains one command
5154 defined to switch back to the old local keymap. Or do what the
5155 `edit-options' command does: switch to another buffer and let the
5156 user switch back at will. *Note Recursive Editing::.
5158 * In some other systems there is a convention of choosing variable
5159 names that begin and end with `*'. We don't use that convention
5160 in Emacs Lisp, so please don't use it in your programs. (Emacs
5161 uses such names only for program-generated buffers.) The users
5162 will find Emacs more coherent if all libraries use the same
5165 * Use names starting with a space for temporary buffers (*note
5166 Buffer Names::), or at least call `buffer-disable-undo' on them.
5167 Otherwise they may stay referenced by internal undo variable(s)
5168 after getting killed. If this happens before dumping (*note
5169 Building XEmacs::), this may cause a fatal error when portable
5172 * Indent each function with `C-M-q' (`indent-sexp') using the
5173 default indentation parameters.
5175 * Don't make a habit of putting close-parentheses on lines by
5176 themselves; Lisp programmers find this disconcerting. Once in a
5177 while, when there is a sequence of many consecutive
5178 close-parentheses, it may make sense to split them in one or two
5181 * Please put a copyright notice on the file if you give copies to
5182 anyone. Use the same lines that appear at the top of the Lisp
5183 files in XEmacs itself. If you have not signed papers to assign
5184 the copyright to the Foundation, then place your name in the
5185 copyright notice in place of the Foundation's name.
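
Several of the tips above can be seen together in a small sketch. The
prefix `mylib' and every name built from it are hypothetical and are
used only for illustration. The sketch shows a common name prefix, a
flag variable whose name ends in `-flag', a predicate whose name ends
in `-p', the use of `error' to report a problem, and direct movement of
point with `goto-char' in place of `beginning-of-buffer' and
`replace-string'.

```elisp
;; `mylib' is a hypothetical package prefix used only for illustration.

(defvar mylib-verbose-flag nil
  "Non-nil means `mylib-count-matches' reports its result in the echo area.")

(defun mylib-blank-string-p (string)
  "Return t if STRING is empty or contains only whitespace."
  (if (string-match "\\`[ \t\n]*\\'" string) t nil))

(defun mylib-count-matches (string)
  "Return the number of occurrences of STRING in the current buffer.
Signal an error if STRING is blank."
  (if (mylib-blank-string-p string)
      (error "Search string must not be blank"))
  (let ((count 0))
    (save-excursion
      ;; Move point directly instead of calling `beginning-of-buffer',
      ;; which would also set the mark.
      (goto-char (point-min))
      (while (search-forward string nil t)
        (setq count (1+ count))))
    (if mylib-verbose-flag
        (message "Found %d occurrences" count))
    count))

(defun mylib-replace-all (from to)
  "Replace every occurrence of FROM with TO in the current buffer.
Signal an error if FROM is blank."
  (if (mylib-blank-string-p from)
      (error "Search string must not be blank"))
  (save-excursion
    (goto-char (point-min))
    ;; This loop stands in for a call to `replace-string'.
    (while (search-forward from nil t)
      (replace-match to t t))))

(provide 'mylib)
```

Because neither function sets the mark or leaves point moved, a caller
can use them inside a larger command without disturbing the user's
editing state.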
5188 File: lispref.info, Node: Compilation Tips, Next: Documentation Tips, Prev: Style Tips, Up: Tips
5190 A.2 Tips for Making Compiled Code Fast
5191 ======================================
5193 Here are ways of improving the execution speed of byte-compiled Lisp
5196 * Use the `profile' library to profile your program. See the file
5197 `profile.el' for instructions.
5199 * Use iteration rather than recursion whenever possible. Function
5200 calls are slow in XEmacs Lisp even when a compiled function is
5201 calling another compiled function.
5203 * Using the primitive list-searching functions `memq', `member',
5204 `assq', or `assoc' is even faster than explicit iteration. It may
5205 be worth rearranging a data structure so that one of these
5206 primitive search functions can be used.
5208 * Certain built-in functions are handled specially in byte-compiled
5209 code, avoiding the need for an ordinary function call. It is a
5210 good idea to use these functions rather than alternatives. To see
5211 whether a function is handled specially by the compiler, examine
5212 its `byte-compile' property. If the property is non-`nil', then
5213 the function is handled specially.
5215 For example, the following input will show you that `aref' is
5216 compiled specially (*note Array Functions::) while `elt' is not
5217 (*note Sequence Functions::):
5219 (get 'aref 'byte-compile)
5220 => byte-compile-two-args
5222 (get 'elt 'byte-compile)
5225 * If calling a small function accounts for a substantial part of
5226 your program's running time, make the function inline. This
5227 eliminates the function call overhead. Since making a function
5228 inline reduces the flexibility of changing the program, don't do
5229 it unless it gives a noticeable speedup in something slow enough
5230 that users care about the speed. *Note Inline Functions::.
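
The following sketch puts three of these tips side by side: a small
function made inline with `defsubst', a loop written with `while'
rather than recursion, and a membership test that relies on `memq'
instead of an explicit loop. All names beginning with `mylib-' are
hypothetical. Whether any of this gives a noticeable speedup in a
particular program is something only profiling can tell you.

```elisp
;; Hypothetical names, used only to illustrate the tips above.

(defsubst mylib-square (x)
  "Return X multiplied by itself."
  (* x x))

(defun mylib-sum-of-squares (numbers)
  "Return the sum of the squares of NUMBERS, a list of numbers."
  ;; Iterate with `while' rather than recursing on the tail of the list.
  (let ((sum 0))
    (while numbers
      (setq sum (+ sum (mylib-square (car numbers)))
            numbers (cdr numbers)))
    sum))

(defconst mylib-lisp-modes
  '(emacs-lisp-mode lisp-mode lisp-interaction-mode)
  "List of major mode symbols that `mylib-lisp-mode-p' accepts.")

(defun mylib-lisp-mode-p (mode)
  "Return t if MODE is a member of `mylib-lisp-modes'."
  ;; `memq' searches the list in a single primitive call.
  (if (memq mode mylib-lisp-modes) t nil))
```

For example, `(mylib-sum-of-squares '(1 2 3))' returns `14', and
`(mylib-lisp-mode-p 'lisp-mode)' returns `t'.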
5233 File: lispref.info, Node: Documentation Tips, Next: Comment Tips, Prev: Compilation Tips, Up: Tips
5235 A.3 Tips for Documentation Strings
5236 ==================================
5238 Here are some tips for the writing of documentation strings.
5240 * Every command, function, or variable intended for users to know
5241 about should have a documentation string.
5243 * An internal variable or subroutine of a Lisp program might as well
5244 have a documentation string. In earlier Emacs versions, you could
5245 save space by using a comment instead of a documentation string,
5246 but that is no longer the case.
5248 * The first line of the documentation string should consist of one
5249 or two complete sentences that stand on their own as a summary.
5250 `M-x apropos' displays just the first line, and if it doesn't
5251 stand on its own, the result looks bad. In particular, start the
5252 first line with a capital letter and end with a period.
5254 The documentation string can have additional lines that expand on
5255 the details of how to use the function or variable. The
5256 additional lines should be made up of complete sentences also, but
5257 they may be filled if that looks good.
5259 * For consistency, phrase the verb in the first sentence of a
5260 documentation string as an infinitive with "to" omitted. For
5261 instance, use "Return the cons of A and B." in preference to
5262 "Returns the cons of A and B." Usually it looks good to do
5263 likewise for the rest of the first paragraph. Subsequent
5264 paragraphs usually look better if they have proper subjects.
5266 * Write documentation strings in the active voice, not the passive,
5267 and in the present tense, not the future. For instance, use
5268 "Return a list containing A and B." instead of "A list containing
5269 A and B will be returned."
5271 * Avoid using the word "cause" (or its equivalents) unnecessarily.
5272 Instead of, "Cause Emacs to display text in boldface," write just
5273 "Display text in boldface."
5275 * Do not start or end a documentation string with whitespace.
5277 * Format the documentation string so that it fits in an Emacs window
5278 on an 80-column screen. It is a good idea for most lines to be no
5279 wider than 60 characters. The first line can be wider if
5280 necessary to fit the information that ought to be there.
5282 However, rather than simply filling the entire documentation
5283 string, you can make it much more readable by choosing line breaks
5284 with care. Use blank lines between topics if the documentation
5287 * *Do not* indent subsequent lines of a documentation string so that
5288 the text is lined up in the source code with the text of the first
5289 line. This looks nice in the source code, but looks bizarre when
5290 users view the documentation. Remember that the indentation
5291 before the starting double-quote is not part of the string!
5293 * A variable's documentation string should start with `*' if the
5294 variable is one that users would often want to set interactively.
5295 If the value is a long list, or a function, or if the variable
5296 would be set only in init files, then don't start the
5297 documentation string with `*'. *Note Defining Variables::.
5299 * The documentation string for a variable that is a yes-or-no flag
5300 should start with words such as "Non-nil means...", to make it
5301 clear that all non-`nil' values are equivalent and indicate
5302 explicitly what `nil' and non-`nil' mean.
5304 * When a function's documentation string mentions the value of an
5305 argument of the function, use the argument name in capital letters
5306 as if it were a name for that value. Thus, the documentation
5307 string of the function `/' refers to its second argument as
5308 `DIVISOR', because the actual argument name is `divisor'.
5310 Also use all caps for meta-syntactic variables, such as when you
5311 show the decomposition of a list or vector into subunits, some of
5314 * When a documentation string refers to a Lisp symbol, write it as it
5315 would be printed (which usually means in lower case), with
5316 single-quotes around it. For example: `lambda'. There are two
5317 exceptions: write t and nil without single-quotes. (In this
5318 manual, we normally do use single-quotes for those symbols.)
5320 * Don't write key sequences directly in documentation strings.
5321 Instead, use the `\\[...]' construct to stand for them. For
5322 example, instead of writing `C-f', write `\\[forward-char]'. When
5323 Emacs displays the documentation string, it substitutes whatever
5324 key is currently bound to `forward-char'. (This is normally `C-f',
5325 but it may be some other character if the user has moved key
5326 bindings.) *Note Keys in Documentation::.
5328 * In documentation strings for a major mode, you will want to refer
5329 to the key bindings of that mode's local map, rather than global
5330 ones. Therefore, use the construct `\\<...>' once in the
5331 documentation string to specify which key map to use. Do this
5332 before the first use of `\\[...]'. The text inside the `\\<...>'
5333 should be the name of the variable containing the local keymap for
5336 It is not practical to use `\\[...]' very many times, because
5337 display of the documentation string will become slow. So use this
5338 to describe the most important commands in your major mode, and
5339 then use `\\{...}' to display the rest of the mode's keymap.
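
Here is a short sketch of documentation strings written according to
these tips. The names are hypothetical. Note the one-sentence first
lines, the imperative verb, the argument names in capital letters, the
"Non-nil means" wording and leading `*' for the user flag, and the
`\\[...]' construct in place of a literal key sequence.

```elisp
;; Hypothetical definitions, shown only for their documentation strings.

(defvar mylib-quiet-flag nil
  "*Non-nil means suppress progress messages from mylib commands.
nil means display them in the echo area.")

(defun mylib-pair (a b)
  "Return the cons of A and B.
If B is nil, the result is a one-element list containing A."
  (cons a b))

(defun mylib-next-line-start ()
  "Move point to the beginning of the next line.
Use \\[forward-char] to move by a single character instead."
  (interactive)
  (forward-line 1))
```

The continuation lines start at the left margin of the source file, so
that they are not indented when the documentation is displayed.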
5342 File: lispref.info, Node: Comment Tips, Next: Library Headers, Prev: Documentation Tips, Up: Tips
5344 A.4 Tips on Writing Comments
5345 ============================
5347 We recommend these conventions for where to put comments and how to
5351 Comments that start with a single semicolon, `;', should all be
5352 aligned to the same column on the right of the source code. Such
5353 comments usually explain how the code on the same line does its
5354 job. In Lisp mode and related modes, the `M-;'
5355 (`indent-for-comment') command automatically inserts such a `;' in
5356 the right place, or aligns such a comment if it is already present.
5358 This and following examples are taken from the Emacs sources.
5360 (setq base-version-list ; there was a base
5361 (assoc (substring fn 0 start-vn) ; version to which
5362 file-version-assoc-list)) ; this looks like
5366 Comments that start with two semicolons, `;;', should be aligned to
5367 the same level of indentation as the code. Such comments usually
5368 describe the purpose of the following lines or the state of the
5369 program at that point. For example:
5371 (prog1 (setq auto-fill-function
5377 Every function that has no documentation string (because it is
5378 used only internally within the package it belongs to), should
5379 have instead a two-semicolon comment right before the function,
5380 explaining what the function does and how to call it properly.
5381 Explain precisely what each argument means and how the function
5382 interprets its possible values.
5385 Comments that start with three semicolons, `;;;', should start at
5386 the left margin. Such comments are used outside function
5387 definitions to make general statements explaining the design
5388 principles of the program. For example:
5390 ;;; This Lisp code is run in XEmacs
5391 ;;; when it is to operate as a server
5392 ;;; for other processes.
5394 Another use for triple-semicolon comments is for commenting out
5395 lines within a function. We use triple-semicolons for this
5396 precisely so that they remain at the left margin.
5399 ;;; This is no longer necessary.
5400 ;;; (force-mode-line-update)
5401 (message "Finished with %s" a))
5404 Comments that start with four semicolons, `;;;;', should be aligned
5405 to the left margin and are used for headings of major sections of a
5406 program. For example:
5410 The indentation commands of the Lisp modes in XEmacs, such as `M-;'
5411 (`indent-for-comment') and <TAB> (`lisp-indent-line') automatically
5412 indent comments according to these conventions, depending on the number
5413 of semicolons. *Note Manipulating Comments: (xemacs)Comments.
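
As a compact illustration, the fragment below uses all four comment
styles in one place. The function `mylib-count-nonblank-lines' is
hypothetical; it has no documentation string, so it is preceded by a
two-semicolon comment that explains it.

```elisp
;;;; Counting helpers

;;; The functions in this section never modify the buffer; they only
;;; move point inside `save-excursion'.

;; Return the number of lines in the current buffer that contain
;; something other than spaces and tabs.  Takes no arguments.
(defun mylib-count-nonblank-lines ()
  (let ((count 0))                      ; lines counted so far
    (save-excursion
      (goto-char (point-min))
      ;; Visit each line once, starting from the first.
      (while (not (eobp))
        (if (not (looking-at "[ \t]*$"))
            (setq count (1+ count)))
        (forward-line 1)))
    count))
```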
5416 File: lispref.info, Node: Library Headers, Prev: Comment Tips, Up: Tips
5418 A.5 Conventional Headers for XEmacs Libraries
5419 =============================================
5421 XEmacs has conventions for using special comments in Lisp libraries to
5422 divide them into sections and give information such as who wrote them.
5423 This section explains these conventions. First, an example:
5425 ;;; lisp-mnt.el --- minor mode for Emacs Lisp maintainers
5427 ;; Copyright (C) 1992 Free Software Foundation, Inc.
5429 ;; Author: Eric S. Raymond <esr@snark.thyrsus.com>
5430 ;; Maintainer: Eric S. Raymond <esr@snark.thyrsus.com>
5431 ;; Created: 14 Jul 1992
5435 ;; This file is part of XEmacs.
5436 COPYING PERMISSIONS...
5438 The very first line should have this format:
5440 ;;; FILENAME --- DESCRIPTION
5442 The description should be complete in one line.
5444 After the copyright notice come several "header comment" lines, each
5445 beginning with `;; HEADER-NAME:'. Here is a table of the conventional
5446 possibilities for HEADER-NAME:
5449 This line states the name and net address of at least the principal
5450 author of the library.
5452 If there are multiple authors, you can list them on continuation
5453 lines led by `;;' and a tab character, like this:
5455 ;; Author: Ashwin Ram <Ram-Ashwin@cs.yale.edu>
5456 ;; Dave Sill <de5@ornl.gov>
5457 ;; Dave Brennan <brennan@hal.com>
5458 ;; Eric Raymond <esr@snark.thyrsus.com>
5461 This line should contain a single name/address as in the Author
5462 line, or an address only, or the string `FSF'. If there is no
5463 maintainer line, the person(s) in the Author field are presumed to
5464 be the maintainers. The example above is mildly bogus because the
5465 maintainer line is redundant.
5467 The idea behind the `Author' and `Maintainer' lines is to make
5468 possible a Lisp function to "send mail to the maintainer" without
5469 having to mine the name out by hand.
5471 Be sure to surround the network address with `<...>' if you
5472 include the person's full name as well as the network address.
5475 This optional line gives the original creation date of the file.
5476 For historical interest only.
5479 If you wish to record version numbers for the individual Lisp
5480 program, put them in this line.
5483 In this header line, place the name of the person who adapted the
5484 library for installation (to make it fit the style conventions, for
5488 This line lists keywords for the `finder-by-keyword' help command.
5489 This field is important; it's how people will find your package
5490 when they're looking for things by topic area. To separate the
5491 keywords, you can use spaces, commas, or both.
5493 Just about every Lisp library ought to have the `Author' and
5494 `Keywords' header comment lines. Use the others if they are
5495 appropriate. You can also put in header lines with other header
5496 names--they have no standard meanings, so they can't do any harm.
5498 We use additional stylized comments to subdivide the contents of the
5499 library file. Here is a table of them:
5502 This begins introductory comments that explain how the library
5503 works. It should come right after the copying permissions.
5506 This begins change log information stored in the library file (if
5507 you store the change history there). For most of the Lisp files
5508 distributed with XEmacs, the change history is kept in the file
5509 `ChangeLog' and not in the source file at all; these files do not
5510 have a `;;; Change log:' line.
5513 This begins the actual code of the program.
5515 `;;; FILENAME ends here'
5516 This is the "footer line"; it appears at the very end of the file.
5517 Its purpose is to enable people to detect truncated versions of
5518 the file from the lack of a footer line.
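
Putting these conventions together, a complete but minimal library file
might look like the sketch below. The file name `mylib.el', the author
details, and the function are all placeholders invented for this
illustration; substitute your own, and add the copying permissions
where indicated.

```elisp
;;; mylib.el --- hypothetical example of the conventional library layout

;; Copyright (C) YEAR AUTHOR NAME

;; Author: AUTHOR NAME <author@example.org>
;; Created: DATE
;; Version: 0.1
;; Keywords: lisp, tools

;; COPYING PERMISSIONS...

;;; Commentary:

;; This placeholder library defines a single function,
;; `mylib-version', so that the layout has some code to hold.

;;; Code:

(defun mylib-version ()
  "Return the version of mylib as a string."
  "0.1")

(provide 'mylib)

;;; mylib.el ends here
```

The `Maintainer' line is omitted here because, as noted above, the
author is presumed to be the maintainer when no such line is present.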
5521 File: lispref.info, Node: Building XEmacs and Object Allocation, Next: Standard Errors, Prev: Tips, Up: Top
5523 Appendix B Building XEmacs; Allocation of Objects
5524 *************************************************
5526 This chapter describes how the runnable XEmacs executable is dumped
5527 with the preloaded Lisp libraries in it and how storage is allocated.
5529 There is an entire separate document, the `XEmacs Internals Manual',
5530 devoted to the internals of XEmacs from the perspective of the C
5531 programmer. It contains much more detailed information about the build
5532 process, the allocation and garbage-collection process, and other
5533 aspects related to the internals of XEmacs.
5537 * Building XEmacs:: How to preload Lisp libraries into XEmacs.
5538 * Pure Storage:: A kludge to make preloaded Lisp functions sharable.
5539 * Garbage Collection:: Reclaiming space for Lisp objects no longer used.
5542 File: lispref.info, Node: Building XEmacs, Next: Pure Storage, Up: Building XEmacs and Object Allocation
5547 This section explains the steps involved in building the XEmacs
5548 executable. You don't have to know this material to build and install
5549 XEmacs, since the makefiles do all these things automatically. This
5550 information is pertinent to XEmacs maintenance.
5552 The `XEmacs Internals Manual' contains more information about this.
5554 Compilation of the C source files in the `src' directory produces an
5555 executable file called `temacs', also called a "bare impure XEmacs".
5556 It contains the XEmacs Lisp interpreter and I/O routines, but not the
5559 Before XEmacs is actually usable, a number of Lisp files need to be
5560 loaded. These define all the editing commands, plus most of the startup
5561 code and many very basic Lisp primitives. This is accomplished by
5562 loading the file `loadup.el', which in turn loads all of the other
5563 standardly-loaded Lisp files.
5565 It takes a substantial time to load the standard Lisp files.
5566 Luckily, you don't have to do this each time you run XEmacs; `temacs'
5567 can dump out an executable program called `xemacs' that has these files
5568 preloaded. `xemacs' starts more quickly because it does not need to
5569 load the files. This is the XEmacs executable that is normally
5572 To create `xemacs', use the command `temacs -batch -l loadup dump'.
5573 The purpose of `-batch' here is to tell `temacs' to run in
5574 non-interactive, command-line mode. (`temacs' can _only_ run in this
5575 fashion. Part of the code required to initialize frames and faces is
5576 in Lisp, and must be loaded before XEmacs is able to create any frames.)
5577 The argument `dump' tells `loadup.el' to dump a new executable named
5580 The dumping process is highly system-specific, and some operating
5581 systems don't support dumping. On those systems, you must start XEmacs
5582 with the `temacs -batch -l loadup run-temacs' command each time you use
5583 it. This takes a substantial time, but since you need to start Emacs
5584 once a day at most--or once a week if you never log out--the extra time
5585 is not too severe a problem. (In older versions of Emacs, you started
5586 Emacs from `temacs' using `temacs -l loadup'.)
5588 You are free to start XEmacs directly from `temacs' if you want,
5589 even if there is already a dumped `xemacs'. Normally you wouldn't want
5590 to do that; but the Makefiles do this when you rebuild XEmacs using
5591 `make all-elc', which builds XEmacs and simultaneously compiles any
5592 out-of-date Lisp files. (You need `xemacs' in order to compile Lisp
5593 files. However, you also need the compiled Lisp files in order to dump
5594 out `xemacs'. If both of these are missing or corrupted, you are out
5595 of luck unless you're able to bootstrap `xemacs' from `temacs'. Note
5596 that `make all-elc' actually loads the alternative loadup file
5597 `loadup-el.el', which works like `loadup.el' but disables the
5598 pure-copying process and forces XEmacs to ignore any compiled Lisp
5599 files even if they exist.)
5601 You can specify additional files to preload by writing a library
5602 named `site-load.el' that loads them. You may need to increase the
5603 value of `PURESIZE', in `src/puresize.h', to make room for the
5604 additional files. You should _not_ modify this file directly, however;
5605 instead, use the `--puresize' configuration option. (If you run out of
5606 pure space while dumping `xemacs', you will be told how much pure space
5607 you actually will need.) However, the advantage of preloading
5608 additional files decreases as machines get faster. On modern machines,
5609 it is often not advisable, especially if the Lisp code is on a file
5610 system local to the machine running XEmacs.
5612 You can specify other Lisp expressions to execute just before dumping
5613 by putting them in a library named `site-init.el'. However, if they
5614 might alter the behavior that users expect from an ordinary unmodified
5615 XEmacs, it is better to put them in `default.el', so that users can
5616 override them if they wish. *Note Start-up Summary::.
5618 Before `loadup.el' dumps the new executable, it finds the
5619 documentation strings for primitive and preloaded functions (and
5620 variables) in the file where they are stored, by calling
5621 `Snarf-documentation' (*note Accessing Documentation::). These strings
5622 were moved out of the `xemacs' executable to make it smaller. *Note
5623 Documentation Basics::.
5625 -- Function: dump-emacs to-file from-file
5626 This function dumps the current state of XEmacs into an executable
5627 file TO-FILE. It takes symbols from FROM-FILE (this is normally
5628 the executable file `temacs').
5630 If you use this function in an XEmacs that was already dumped, you
5631 must set `command-line-processed' to `nil' first for good results.
5632 *Note Command Line Arguments::.
5634 -- Function: run-emacs-from-temacs &rest args
5635 This is the function that implements the `run-temacs' command-line
5636 argument. It is called from `loadup.el' as appropriate. You
5637 should most emphatically _not_ call this yourself; it will
5638 reinitialize your XEmacs process and you'll be sorry.
5640 -- Command: emacs-version &optional arg
5641 This function returns a string describing the version of XEmacs
5642 that is running. It is useful to include this string in bug reports.
5645 When called interactively with a prefix argument, it inserts the string at
5646 point. Don't use this function in programs to choose actions
5647 according to the system configuration; look at
5648 `system-configuration' instead.
5651 => "XEmacs 20.1 [Lucid] (i586-unknown-linux2.0.29)
5652 of Mon Apr 7 1997 on altair.xemacs.org"
5654 Called interactively, the function prints the same information in
5657 -- Variable: emacs-build-time
5658 The value of this variable is the time at which XEmacs was built
5661 emacs-build-time "Mon Apr 7 20:28:52 1997"
5664 -- Variable: emacs-version
5665 The value of this variable is the version of Emacs being run. It
5666 is a string, e.g. `"20.1 XEmacs Lucid"'.
5668 The following two variables did not exist before FSF GNU Emacs
5669 version 19.23 and XEmacs version 19.10, which reduces their usefulness
5670 at present, but we hope they will be convenient in the future.
5672 -- Variable: emacs-major-version
5673 The major version number of Emacs, as an integer. For XEmacs
5674 version 20.1, the value is 20.
5676 -- Variable: emacs-minor-version
5677 The minor version number of Emacs, as an integer. For XEmacs
5678 version 20.1, the value is 1.
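As an illustration of how these two variables might be used, here is a minimal sketch of a version test. The helper name `my-emacs-version-at-least-p' is hypothetical and is not defined by XEmacs. It checks `boundp' first because, as noted above, the variables do not exist in very old versions.

```elisp
;; Hypothetical helper: the name `my-emacs-version-at-least-p' is an
;; assumption made for this example and is not part of XEmacs.
(defun my-emacs-version-at-least-p (major minor)
  "Return non-nil if the running Emacs is at least version MAJOR.MINOR.
Return nil if the version variables are not defined at all."
  (and (boundp 'emacs-major-version)
       (boundp 'emacs-minor-version)
       (or (> emacs-major-version major)
           (and (= emacs-major-version major)
                (>= emacs-minor-version minor)))))

;; Choose a code path only when the variables exist and the version
;; is 19.10 or later.
(if (my-emacs-version-at-least-p 19 10)
    (message "Version variables are available")
  (message "Running a very old Emacs"))
```

Comparing integers this way is usually more robust than parsing the string held in `emacs-version'.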
5681 File: lispref.info, Node: Pure Storage, Next: Garbage Collection, Prev: Building XEmacs, Up: Building XEmacs and Object Allocation
5686 XEmacs Lisp uses two kinds of storage for user-created Lisp objects:
5687 "normal storage" and "pure storage". Normal storage is where all the
5688 new data created during an XEmacs session is kept; see the following
5689 section for information on normal storage. Pure storage is used for
5690 certain data in the preloaded standard Lisp files--data that should
5691 never change during actual use of XEmacs.
5693 Pure storage is allocated only while `temacs' is loading the
5694 standard preloaded Lisp libraries. In the file `xemacs', it is marked
5695 as read-only (on operating systems that permit this), so that the
5696 memory space can be shared by all the XEmacs jobs running on the machine
5697 at once. Pure storage is not expandable; a fixed amount is allocated
5698 when XEmacs is compiled, and if that is not sufficient for the preloaded
5699 libraries, `temacs' aborts with an error message. If that happens, you
5700 must increase the compilation parameter `PURESIZE' using the
5701 `--puresize' option to `configure'. This normally won't happen unless
5702 you try to preload additional libraries or add features to the standard
5705 -- Function: purecopy object
5706 This function makes a copy of OBJECT in pure storage and returns
5707 it. It copies strings by simply making a new string with the same
5708 characters in pure storage. It recursively copies the contents of
5709 vectors and cons cells. It does not make copies of other objects
5710 such as symbols, but just returns them unchanged. It signals an
5711 error if asked to copy markers.
5713 This function is a no-op in XEmacs, and its use in new code is
5716 -- Variable: pure-bytes-used
5717 The value of this variable is the number of bytes of pure storage
5718 allocated so far. Typically, in a dumped XEmacs, this number is
5719 very close to the total amount of pure storage available--if it
5720 were not, we would preallocate less.
5722 -- Variable: purify-flag
5723 This variable determines whether `defun' should make a copy of the
5724 function definition in pure storage. If it is non-`nil', then the
5725 function definition is copied into pure storage.
5727 This flag is `t' while loading all of the basic functions for
5728 building XEmacs initially (allowing those functions to be sharable
5729 and non-collectible). Dumping XEmacs as an executable always
5730 writes `nil' in this variable, regardless of the value it actually
5731 has before and after dumping.
5733 You should not change this flag in a running XEmacs.
5736 File: lispref.info, Node: Garbage Collection, Prev: Pure Storage, Up: Building XEmacs and Object Allocation
5738 B.3 Garbage Collection
5739 ======================
5741 When a program creates a list or the user defines a new function (such
5742 as by loading a library), that data is placed in normal storage. If
5743 normal storage runs low, then XEmacs asks the operating system to
5744 allocate more memory in blocks of 2k bytes. Each block is used for one
5745 type of Lisp object, so symbols, cons cells, markers, etc., are
5746 segregated in distinct blocks in memory. (Vectors, long strings,
5747 buffers and certain other editing types, which are fairly large, are
5748 allocated in individual blocks, one per object, while small strings are
5749 packed into blocks of 8k bytes. [More correctly, a string is allocated
5750 in two sections: a fixed size chunk containing the length, list of
5751 extents, etc.; and a chunk containing the actual characters in the
5752 string. It is this latter chunk that is either allocated individually
5753 or packed into 8k blocks. The fixed size chunk is packed into 2k
5754 blocks, as for conses, markers, etc.])
5756 It is quite common to use some storage for a while, then release it
5757 by (for example) killing a buffer or deleting the last pointer to an
5758 object. XEmacs provides a "garbage collector" to reclaim this
5759 abandoned storage. (This name is traditional, but "garbage recycler"
5760 might be a more intuitive metaphor for this facility.)
5762 The garbage collector operates by finding and marking all Lisp
5763 objects that are still accessible to Lisp programs. To begin with, it
5764 assumes all the symbols, their values and associated function
5765 definitions, and any data presently on the stack, are accessible. Any
5766 objects that can be reached indirectly through other accessible objects
5767 are also accessible.
5769 When marking is finished, all objects still unmarked are garbage. No
5770 matter what the Lisp program or the user does, it is impossible to refer
5771 to them, since there is no longer a way to reach them. Their space
5772 might as well be reused, since no one will miss them. The second
5773 ("sweep") phase of the garbage collector arranges to reuse them.
5775 The sweep phase puts unused cons cells onto a "free list" for future
5776 allocation; likewise for symbols, markers, extents, events, floats,
5777 compiled-function objects, and the fixed-size portion of strings. It
5778 compacts the accessible small string-chars chunks so they occupy fewer
5779 8k blocks; then it frees the other 8k blocks. Vectors, buffers,
5780 windows, and other large objects are individually allocated and freed
5781 using `malloc' and `free'.
5783 Common Lisp note: unlike other Lisps, XEmacs Lisp does not call
5784 the garbage collector when the free list is empty. Instead, it
5785 simply requests the operating system to allocate more storage, and
5786 processing continues until `gc-cons-threshold' bytes have been
5789 This means that you can make sure that the garbage collector will
5790 not run during a certain portion of a Lisp program by calling the
5791 garbage collector explicitly just before it (provided that portion
5792 of the program does not use so much space as to force a second
5793 garbage collection).
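The technique described in this note can be sketched as follows. The macro name `my-with-fresh-gc' is hypothetical. The sketch only makes a collection during BODY unlikely; it cannot prevent one if BODY allocates more than `gc-cons-threshold' bytes.

```elisp
;; Hypothetical macro: `my-with-fresh-gc' is an illustrative name only.
(defmacro my-with-fresh-gc (&rest body)
  "Run a garbage collection, then evaluate BODY and return its value.
Collecting first resets the allocation count, so BODY is unlikely to
be interrupted by a collection unless it allocates a lot of storage."
  `(progn
     (garbage-collect)
     ,@body))

;; Sum the integers 0..999 right after an explicit collection.
(my-with-fresh-gc
  (let ((sum 0)
        (i 0))
    (while (< i 1000)
      (setq sum (+ sum i)
            i (1+ i)))
    sum))
     ;; => 499500
```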
5795 -- Command: garbage-collect
5796 This command runs a garbage collection, and returns information on
5797 the amount of space in use. (Garbage collection can also occur
5798 spontaneously if you use more than `gc-cons-threshold' bytes of
5799 Lisp data since the previous garbage collection.)
5801 `garbage-collect' returns a list containing the following
5804 ((USED-CONSES . FREE-CONSES)
5805 (USED-SYMS . FREE-SYMS)
5806 (USED-MARKERS . FREE-MARKERS)
5811 => ((73362 . 8325) (13718 . 164)
5812 (5089 . 5098) 949121 118677
5813 (conses-used 73362 conses-free 8329 cons-storage 658168
5814 symbols-used 13718 symbols-free 164 symbol-storage 335216
5815 bit-vectors-used 0 bit-vectors-total-length 0
5816 bit-vector-storage 0 vectors-used 7882
5817 vectors-total-length 118677 vector-storage 537764
5818 compiled-functions-used 1336 compiled-functions-free 37
5819 compiled-function-storage 44440 short-strings-used 28829
5820 long-strings-used 2 strings-free 7722
5821 short-strings-total-length 916657 short-string-storage 1179648
5822 long-strings-total-length 32464 string-header-storage 441504
5823 floats-used 3 floats-free 43 float-storage 2044 markers-used 5089
5824 markers-free 5098 marker-storage 245280 events-used 103
5825 events-free 835 event-storage 110656 extents-used 10519
5826 extents-free 2718 extent-storage 372736
5827 extent-auxiliarys-used 111 extent-auxiliarys-freed 3
5828 extent-auxiliary-storage 4440 window-configurations-used 39
5829 window-configurations-on-free-list 5
5830 window-configurations-freed 10 window-configuration-storage 9492
5831 popup-datas-used 3 popup-data-storage 72 toolbar-buttons-used 62
5832 toolbar-button-storage 4960 toolbar-datas-used 12
5833 toolbar-data-storage 240 symbol-value-buffer-locals-used 182
5834 symbol-value-buffer-local-storage 5824
5835 symbol-value-lisp-magics-used 22
5836 symbol-value-lisp-magic-storage 1496
5837 symbol-value-varaliases-used 43
5838 symbol-value-varalias-storage 1032 opaque-lists-used 2
5839 opaque-list-storage 48 color-instances-used 12
5840 color-instance-storage 288 font-instances-used 5
5841 font-instance-storage 180 opaques-used 11 opaque-storage 312
5842 range-tables-used 1 range-table-storage 16 faces-used 34
5843 face-storage 2584 glyphs-used 124 glyph-storage 4464
5844 specifiers-used 775 specifier-storage 43869 weak-lists-used 786
5845 weak-list-storage 18864 char-tables-used 40
5846 char-table-storage 41920 buffers-used 25 buffer-storage 7000
5847 extent-infos-used 457 extent-infos-freed 73
5848 extent-info-storage 9140 keymaps-used 275 keymap-storage 12100
5849 consoles-used 4 console-storage 384 command-builders-used 2
5850 command-builder-storage 120 devices-used 2 device-storage 344
5851 frames-used 3 frame-storage 624 image-instances-used 47
5852 image-instance-storage 3008 windows-used 27 windows-freed 2
5853 window-storage 9180 lcrecord-lists-used 15
5854 lcrecord-list-storage 360 hash-tables-used 631
5855 hash-table-storage 25240 streams-used 1 streams-on-free-list 3
5856 streams-freed 12 stream-storage 91))
5858 Here is a table explaining each element:
5861 USED-CONSES: The number of cons cells in use.
5864 FREE-CONSES: The number of cons cells for which space has been obtained
5865 from the operating system, but that are not currently being used.
5869 USED-SYMS: The number of symbols in use.
5872 FREE-SYMS: The number of symbols for which space has been obtained from
5873 the operating system, but that are not currently being used.
5876 USED-MARKERS: The number of markers in use.
5879 FREE-MARKERS: The number of markers for which space has been obtained from
5880 the operating system, but that are not currently being used.
5883 The total size of all strings, in characters.
5886 The total number of elements of existing vectors.
5889 A list of alternating keyword/value pairs providing more
5890 detailed information. (As you can see above, quite a lot of
5891 information is provided.)
5893 -- User Option: gc-cons-threshold
5894 The value of this variable is the number of bytes of storage that
5895 must be allocated for Lisp objects after one garbage collection in
5896 order to trigger another garbage collection. A cons cell counts
5897 as eight bytes, a string as one byte per character plus a few
5898 bytes of overhead, and so on; space allocated to the contents of
5899 buffers does not count. Note that the subsequent garbage
5900 collection does not happen immediately when the threshold is
5901 exhausted, but only the next time the Lisp evaluator is called.
5903 The initial threshold value is 500,000. If you specify a larger
5904 value, garbage collection will happen less often. This reduces the
5905 amount of time spent garbage collecting, but increases total
5906 memory use. You may want to do this when running a program that
5907 creates lots of Lisp data.
5909 You can make collections more frequent by specifying a smaller
5910 value, down to 10,000. A value less than 10,000 will remain in
5911 effect only until the subsequent garbage collection, at which time
5912 `garbage-collect' will set the threshold back to 10,000. (This does
5913 not apply if XEmacs was configured with `--debug'. Therefore, be
5914 careful when setting `gc-cons-threshold' in that case!)
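As a sketch of the advice above, a program that creates a lot of Lisp data can bind `gc-cons-threshold' to a larger value with `let', so that the old value is restored automatically on exit. The function name `my-build-big-list' and the value 5000000 (ten times the initial threshold) are arbitrary choices for this example.

```elisp
;; Hypothetical function: the name and the threshold value are
;; illustrative assumptions, not recommendations.
(defun my-build-big-list (n)
  "Return a list of the integers 0 to N-1.
The GC threshold is raised only while this function runs."
  (let ((gc-cons-threshold 5000000)
        (result nil)
        (i 0))
    (while (< i n)
      (setq result (cons i result)
            i (1+ i)))
    (nreverse result)))

(length (my-build-big-list 10000))
     ;; => 10000
```

Binding the variable rather than setting it with `setq' keeps the change local to the allocation-heavy code.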
5916 -- Variable: pre-gc-hook
5917 This is a normal hook to be run just before each garbage
5918 collection. Interrupts, garbage collection, and errors are
5919 inhibited while this hook runs, so be extremely careful in what
5920 you add here. In particular, avoid consing, and do not interact with the user.
5923 -- Variable: post-gc-hook
5924 This is a normal hook to be run just after each garbage collection.
5925 Interrupts, garbage collection, and errors are inhibited while
5926 this hook runs, so be extremely careful in what you add here. In
5927 particular, avoid consing, and do not interact with the user.
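Here is a minimal sketch of a function suited to `post-gc-hook'. It only increments an integer counter, so it does no consing and does not interact with the user. The names `my-gc-count' and `my-count-gc' are hypothetical.

```elisp
;; Hypothetical names: `my-gc-count' and `my-count-gc' are not part
;; of XEmacs.
(defvar my-gc-count 0
  "Number of garbage collections seen since the hook was installed.")

(defun my-count-gc ()
  "Increment `my-gc-count'.
This function allocates no Lisp objects and prints nothing."
  (setq my-gc-count (1+ my-gc-count)))

(add-hook 'post-gc-hook 'my-count-gc)
```

Inspecting `my-gc-count' later gives a rough idea of how often collections happen. Use `remove-hook' to uninstall the function.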
5929 -- Variable: gc-message
5930 This is a string to print to indicate that a garbage collection is
5931 in progress. This is printed in the echo area. If the selected
5932 frame is on a window system and `gc-pointer-glyph' specifies a
5933 value (i.e. a pointer image instance) in the domain of the
5934 selected frame, the mouse cursor will change instead of this
5935 message being printed.
5937 -- Glyph: gc-pointer-glyph
5938 This holds the pointer glyph used to indicate that a garbage
5939 collection is in progress. If the selected window is on a window
5940 system and this glyph specifies a value (i.e. a pointer image
5941 instance) in the domain of the selected window, the cursor will be
5942 changed as specified during garbage collection. Otherwise, a
5943 message will be printed in the echo area, as controlled by
5944 `gc-message'. *Note Glyphs::.
5946 If XEmacs was configured with `--debug', you can set the following
5947 two variables to get direct information about all the allocation that
5948 is happening in a segment of Lisp code.
5950 -- Variable: debug-allocation
5951 If non-zero, print out information to stderr about all objects
5954 -- Variable: debug-allocation-backtrace
5955 Length (in stack frames) of short backtrace printed out by
5959 File: lispref.info, Node: Standard Errors, Next: Standard Buffer-Local Variables, Prev: Building XEmacs and Object Allocation, Up: Top
5961 Appendix C Standard Errors
5962 **************************
5964 Here is the complete list of the error symbols in standard Emacs,
5965 grouped by concept. The list includes each symbol's message (on the
5966 `error-message' property of the symbol) and a cross reference to a
5967 description of how the error can occur.
5969 Each error symbol has an `error-conditions' property that is a list
5970 of symbols. Normally this list includes the error symbol itself and
5971 the symbol `error'. Occasionally it includes additional symbols, which
5972 are intermediate classifications, narrower than `error' but broader
5973 than a single error symbol. For example, all the errors in accessing
5974 files have the condition `file-error'.
5976 As a special exception, the error symbol `quit' does not have the
5977 condition `error', because quitting is not considered an error.
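As a sketch of how these condition lists are used, the example below handles `arith-error', which also covers the more specific arithmetic errors listed later in this appendix. It then inspects the `error-conditions' property directly. The function name `my-safe-divide' is hypothetical.

```elisp
;; Hypothetical function: `my-safe-divide' is an illustrative name.
(defun my-safe-divide (a b)
  "Return A divided by B, or the symbol `undefined' on an arithmetic error."
  (condition-case nil
      (/ a b)
    ;; This handler matches any error whose condition list
    ;; contains `arith-error'.
    (arith-error 'undefined)))

(my-safe-divide 10 2)
     ;; => 5
(my-safe-divide 1 0)
     ;; => undefined

;; An error symbol's conditions normally include `error' ...
(memq 'error (get 'arith-error 'error-conditions))
     ;; => non-nil
;; ... but `quit' is the exception.
(memq 'error (get 'quit 'error-conditions))
     ;; => nil
```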
5979 *Note Errors::, for an explanation of how errors are generated and handled.
5994 `"Args out of range"'
5995 *Note Sequences Arrays Vectors::.
5998 `"Arithmetic error"'
5999 See `/' and `%' in *Note Numbers::.
6001 `beginning-of-buffer'
6002 `"Beginning of buffer"'
6006 `"Buffer is read-only"'
6007 *Note Read Only Buffers::.
6009 `cyclic-function-indirection'
6010 `"Symbol's chain of function indirections contains a loop"'
6011 *Note Function Indirection::.
6014 `"Arithmetic domain error"'
6020 `"End of file during parsing"'
6021 This is not a `file-error'.
6022 *Note Input Functions::.
6025 This error and its subcategories do not have error-strings,
6026 because the error message is constructed from the data items alone
6027 when the error condition `file-error' is present.
6031 This is a `file-error'.
6034 `file-already-exists'
6035 This is a `file-error'.
6036 *Note Writing to Files::.
6039 This is a `file-error'.
6040 *Note Modification Time::.
6043 `"Invalid byte code"'
6044 *Note Byte Compilation::.
6047 `"Invalid function"'
6048 *Note Classifying Lists::.
6050 `invalid-read-syntax'
6051 `"Invalid read syntax"'
6052 *Note Input Functions::.
6056 *Note Regular Expressions::.
6059 `"The mark is not active now"'
6061 `"No catch for tag"'
6062 *Note Catch and Throw::.
6065 `"Arithmetic overflow error"'
6067 `"Attempt to modify a protected field"'
6069 `"Arithmetic range error"'
6072 *Note Searching and Matching::.
6075 `"Attempt to set a constant symbol"'
6076 *Note Variables that Never Change: Constant Variables.
6079 `"Arithmetic singularity error"'
6082 *Note ToolTalk Support::.
6084 `undefined-keystroke-sequence'
6085 `"Undefined keystroke sequence"'
6087 `"Symbol's function definition is void"'
6088 *Note Function Cells::.
6091 `"Symbol's value as variable is void"'
6092 *Note Accessing Variables::.
6094 `wrong-number-of-arguments'
6095 `"Wrong number of arguments"'
6096 *Note Classifying Lists::.
6098 `wrong-type-argument'
6099 `"Wrong type argument"'
6100 *Note Type Predicates::.
6102 These error types, which are all classified as special cases of
6103 `arith-error', can occur on certain systems for invalid use of
6104 mathematical functions.
6107 `"Arithmetic domain error"'
6108 *Note Math Functions::.
6111 `"Arithmetic overflow error"'
6112 *Note Math Functions::.
6115 `"Arithmetic range error"'
6116 *Note Math Functions::.
6119 `"Arithmetic singularity error"'
6120 *Note Math Functions::.
6123 `"Arithmetic underflow error"'
6124 *Note Math Functions::.
6127 File: lispref.info, Node: Standard Buffer-Local Variables, Next: Standard Keymaps, Prev: Standard Errors, Up: Top
6129 Appendix D Buffer-Local Variables
6130 *********************************
6132 The table below lists the general-purpose Emacs variables that are
6133 automatically local (when set) in each buffer. Many Lisp packages
6134 define such variables for their internal use; we don't list them here.
6139 `auto-fill-function'
6140 *note Auto Filling::
6142 `buffer-auto-save-file-name'
6146 *note Backup Files::
6148 `buffer-display-table'
6149 *note Display Tables::
6151 `buffer-file-format'
6152 *note Format Conversion::
6155 *note Buffer File Name::
6157 `buffer-file-number'
6158 *note Buffer File Name::
6160 `buffer-file-truename'
6161 *note Buffer File Name::
6164 *note Files and MS-DOS::
6166 `buffer-invisibility-spec'
6167 *note Invisible Text::
6170 *note Saving Buffers::
6173 *note Read Only Buffers::
6181 `cache-long-line-scans'
6185 *note Searching and Case::
6188 *note Usual Display::
6191 *note Comments: (xemacs)Comments.
6194 *note System Environment::
6196 `defun-prompt-regexp'
6200 *note Auto Filling::
6203 *note Moving Point: (xemacs)Moving Point.
6208 `local-abbrev-table'
6211 `local-write-file-hooks'
6212 *note Saving Buffers::
6227 *note Modeline Data::
6229 `modeline-buffer-identification'
6230 *note Modeline Variables::
6233 *note Modeline Data::
6236 *note Modeline Variables::
6239 *note Modeline Variables::
6242 *note Modeline Variables::
6247 `paragraph-separate'
6248 *note Standard Regexps::
6251 *note Standard Regexps::
6253 `point-before-scroll'
6254 Used for communication between mouse commands and scroll-bar
6257 `require-final-newline'
6261 *note Selective Display::
6263 `selective-display-ellipses'
6264 *note Selective Display::
6267 *note Usual Display::
6273 *note Modeline Variables::
6276 File: lispref.info, Node: Standard Keymaps, Next: Standard Hooks, Prev: Standard Buffer-Local Variables, Up: Top
6278 Appendix E Standard Keymaps
6279 ***************************
6281 The following symbols are used as the names for various keymaps. Some
6282 of these exist when XEmacs is first started, others are loaded only
6283 when their respective mode is used. This is not an exhaustive list.
6285 Almost all of these maps are used as local maps. Indeed, of the
6286 modes that presently exist, only Vip mode and Terminal mode ever change
6290 A keymap containing bindings to bookmark functions.
6292 `Buffer-menu-mode-map'
6293 A keymap used by Buffer Menu mode.
6296 A keymap used by C++ mode.
6299 A sparse keymap used by C mode.
6301 `command-history-map'
6302 A keymap used by Command History mode.
6305 A keymap for subcommands of the prefix `C-x 4'.
6308 A keymap for subcommands of the prefix `C-x 5'.
6311 A keymap for `C-x' commands.
6314 A keymap used by Debugger mode.
6317 A keymap for `dired-mode' buffers.
6320 A keymap used in `edit-abbrevs'.
6322 `edit-tab-stops-map'
6323 A keymap used in `edit-tab-stops'.
6325 `electric-buffer-menu-mode-map'
6326 A keymap used by Electric Buffer Menu mode.
6328 `electric-history-map'
6329 A keymap used by Electric Command History mode.
6331 `emacs-lisp-mode-map'
6332 A keymap used by Emacs Lisp mode.
6335 A keymap for characters following the Help key.
6338 A keymap used by the help utility package.
6339 It has the same keymap in its value cell and in its function cell.
6342 A keymap used by the `e' command of Info.
6345 A keymap containing Info commands.
6348 A keymap that defines the characters you can type within
6352 A keymap used when in Itimer Edit mode.
6354 `lisp-interaction-mode-map'
6355 A keymap used by Lisp Interaction mode.
6358 A keymap used by Lisp mode.
6360 A keymap for minibuffer input with completion.
6362 `minibuffer-local-isearch-map'
6363 A keymap for editing isearch strings in the minibuffer.
6365 `minibuffer-local-map'
6366 Default keymap to use when reading from the minibuffer.
6368 `minibuffer-local-must-match-map'
6369 A keymap for minibuffer input with completion, for exact match.
6372 The keymap for characters following `C-c'. Note, this is in the
6373 global map. This map is not actually mode specific: its name was
6374 chosen to be informative for the user in `C-h b'
6375 (`describe-bindings'), where it describes the main use of the `C-c'
6379 The keymap consulted for mouse-clicks on the modeline of a window.
6382 A keymap used in Objective C mode as a local map.
6385 A local keymap used by Occur mode.
6387 `overriding-local-map'
6388 A keymap that overrides all other local keymaps.
6391 A local keymap used for responses in `query-replace' and related
6392 commands; also for `y-or-n-p' and `map-y-or-n-p'. The functions
6393 that use this map do not support prefix keys; they look up one
6396 `read-expression-map'
6397 The minibuffer keymap used for reading Lisp expressions.
6399 `read-shell-command-map'
6400 The minibuffer keymap used by `shell-command' and related commands.
6402 `shared-lisp-mode-map'
6403 A keymap for commands shared by all sorts of Lisp modes.
6406 A keymap used by Text mode.
6409 The keymap consulted for mouse-clicks over a toolbar.
6412 A keymap used by View mode.
6415 File: lispref.info, Node: Standard Hooks, Next: Index, Prev: Standard Keymaps, Up: Top
6417 Appendix F Standard Hooks
6418 *************************
6420 The following is a list of hook variables that let you provide
6421 functions to be called from within Emacs on suitable occasions.
6423 Most of these variables have names ending with `-hook'. They are
6424 "normal hooks", run by means of `run-hooks'. The value of such a hook
6425 is a list of functions. The recommended way to put a new function on
6426 such a hook is to call `add-hook'. *Note Hooks::, for more information
6429 The variables whose names end in `-function' have single functions
6430 as their values. Usually there is a specific reason why the variable is
6431 not a normal hook, such as the need to pass arguments to the function.
6432 (In older Emacs versions, some of these variables had names ending in
6433 `-hook' even though they were not normal hooks.)
6435 The variables whose names end in `-hooks' or `-functions' have lists
6436 of functions as their values, but these functions are called in a
6437 special way (they are passed arguments, or else their values are used).
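As a short illustration of the recommended way to use a normal hook, the sketch below adds a setup function to `emacs-lisp-mode-hook' with `add-hook'. The function name `my-elisp-mode-setup' and the setting it changes are arbitrary choices for this example.

```elisp
;; Hypothetical function: `my-elisp-mode-setup' is an illustrative name.
(defun my-elisp-mode-setup ()
  "Indent with spaces rather than tabs in the current buffer only."
  (set (make-local-variable 'indent-tabs-mode) nil))

;; Put the function on the hook; it then runs, with no arguments,
;; each time the mode is turned on.
(add-hook 'emacs-lisp-mode-hook 'my-elisp-mode-setup)
```

Calling `remove-hook' with the same two arguments takes the function off the hook again.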
6439 `activate-menubar-hook'
6441 `activate-popup-menu-hook'
6443 `ad-definition-hooks'
6445 `adaptive-fill-function'
6447 `add-log-current-defun-function'
6449 `after-change-functions'
6451 `after-delete-annotation-hook'
6455 `after-insert-file-functions'
6461 `after-set-visited-file-name-hooks'
6463 `after-write-file-hooks'
6465 `auto-fill-function'
6469 `before-change-functions'
6471 `before-delete-annotation-hook'
6475 `before-revert-hook'
6477 `blink-paren-function'
6479 `buffers-menu-switch-to-buffer-function'
6485 `c-mode-common-hook'
6489 `c-special-indent-hook'
6491 `calendar-load-hook'
6493 `change-major-mode-hook'
6495 `command-history-hook'
6497 `comment-indent-function'
6499 `compilation-buffer-name-function'
6501 `compilation-exit-message-function'
6503 `compilation-finish-function'
6505 `compilation-parse-errors-function'
6507 `compilation-mode-hook'
6509 `create-console-hook'
6511 `create-device-hook'
6515 `dabbrev-friend-buffer-function'
6517 `dabbrev-select-buffers-function'
6519 `delete-console-hook'
6521 `delete-device-hook'
6525 `deselect-frame-hook'
6527 `diary-display-hook'
6531 `dired-after-readin-hook'
6533 `dired-before-readin-hook'
6539 `disabled-command-hook'
6541 `display-buffer-function'
6543 `ediff-after-setup-control-frame-hook'
6545 `ediff-after-setup-windows-hook'
6547 `ediff-before-setup-control-frame-hook'
6549 `ediff-before-setup-windows-hook'
6551 `ediff-brief-help-message-function'
6553 `ediff-cleanup-hook'
6555 `ediff-control-frame-position-function'
6557 `ediff-display-help-hook'
6559 `ediff-focus-on-regexp-matches-function'
6561 `ediff-forward-word-function'
6563 `ediff-hide-regexp-matches-function'
6565 `ediff-keymap-setup-hook'
6569 `ediff-long-help-message-function'
6571 `ediff-make-wide-display-function'
6573 `ediff-merge-split-window-function'
6575 `ediff-meta-action-function'
6577 `ediff-meta-redraw-function'
6581 `ediff-prepare-buffer-hook'
6585 `ediff-registry-setup-hook'
6589 `ediff-session-action-function'
6591 `ediff-session-group-setup-hook'
6593 `ediff-setup-diff-regions-function'
6595 `ediff-show-registry-hook'
6597 `ediff-show-session-group-hook'
6599 `ediff-skip-diff-region-function'
6601 `ediff-split-window-function'
6603 `ediff-startup-hook'
6605 `ediff-suspend-hook'
6607 `ediff-toggle-read-only-function'
6609 `ediff-unselect-hook'
6611 `ediff-window-setup-function'
6615 `electric-buffer-menu-mode-hook'
6617 `electric-command-history-hook'
6619 `electric-help-mode-hook'
6621 `emacs-lisp-mode-hook'
6623 `fill-paragraph-function'
6627 `find-file-not-found-hooks'
6631 `font-lock-after-fontify-buffer-hook'
6633 `font-lock-beginning-of-syntax-function'
6635 `font-lock-mode-hook'
6637 `fume-found-function-hook'
6639 `fume-list-mode-hook'
6641 `fume-rescan-buffer-hook'
6643 `fume-sort-function'
6647 `hack-local-variables-hook'
6649 `highlight-headers-follow-url-function'
6651 `hyper-apropos-mode-hook'
6653 `indent-line-function'
6657 `indent-region-function'
6659 `initial-calendar-window-hook'
6661 `isearch-mode-end-hook'
6669 `kill-buffer-query-functions'
6673 `kill-emacs-query-functions'
6683 `lisp-indent-function'
6685 `lisp-interaction-mode-hook'
6689 `list-diary-entries-hook'
6691 `load-read-function'
6693 `log-message-filter-function'
6697 `mail-citation-hook'
6703 `make-annotation-hook'
6705 `makefile-mode-hook'
6709 `mark-diary-entries-hook'
6713 `menu-no-selection-hook'
6715 `mh-compose-letter-hook'
6717 `mh-folder-mode-hook'
6719 `mh-letter-mode-hook'
6723 `minibuffer-exit-hook'
6725 `minibuffer-setup-hook'
6729 `mouse-enter-frame-hook'
6731 `mouse-leave-frame-hook'
6733 `mouse-track-cleanup-hook'
6735 `mouse-track-click-hook'
6737 `mouse-track-down-hook'
6739 `mouse-track-drag-hook'
6741 `mouse-track-drag-up-hook'
6743 `mouse-track-up-hook'
6745 `mouse-yank-function'
6749 `news-reply-mode-hook'
6753 `nongregorian-diary-listing-hook'
6755 `nongregorian-diary-marking-hook'
6765 `plain-TeX-mode-hook'
6771 `pre-abbrev-expand-hook'
6775 `pre-display-buffer-function'
6781 `print-diary-entries-hook'
6785 `protect-innocence-hook'
6787 `remove-message-hook'
6789 `revert-buffer-function'
6791 `revert-buffer-insert-contents-function'
6793 `rmail-edit-mode-hook'
6797 `rmail-retry-setup-hook'
6799 `rmail-summary-mode-hook'
6801 `scheme-indent-hook'
6809 `send-mail-function'
6813 `shell-set-directory-error-hook'
6815 `special-display-function'
6819 `suspend-resume-hook'
6821 `temp-buffer-show-function'
6825 `terminal-mode-hook'
6827 `terminal-mode-break-hook'
6835 `today-visible-calendar-hook'
6837 `today-invisible-calendar-hook'
6839 `tooltalk-message-handler-hook'
6841 `tooltalk-pattern-handler-hook'
6843 `tooltalk-unprocessed-message-hook'
6849 `vc-checkout-writable-buffer-hook'
6851 `vc-log-after-operation-hook'
6853 `vc-make-buffer-writable-hook'
6857 `vm-arrived-message-hook'
6859 `vm-arrived-messages-hook'
6861 `vm-chop-full-name-function'
6863 `vm-display-buffer-hook'
6865 `vm-edit-message-hook'
6867 `vm-forward-message-hook'
6869 `vm-iconify-frame-hook'
6871 `vm-inhibit-write-file-hook'
6879 `vm-menu-setup-hook'
6885 `vm-rename-current-buffer-function'
6889 `vm-resend-bounced-message-hook'
6891 `vm-resend-message-hook'
6893 `vm-retrieved-spooled-mail-hook'
6895 `vm-select-message-hook'
6897 `vm-select-new-message-hook'
6899 `vm-select-unread-message-hook'
6901 `vm-send-digest-hook'
6903 `vm-summary-mode-hook'
6905 `vm-summary-pointer-update-hook'
6907 `vm-summary-redo-hook'
6909 `vm-summary-update-hook'
6911 `vm-undisplay-buffer-hook'
6913 `vm-visit-folder-hook'
6917 `write-contents-hooks'
6919 `write-file-data-hooks'
6923 `write-region-annotate-functions'
6925 `x-lost-selection-hooks'
6927 `x-sent-selection-hooks'
6929 `zmacs-activate-region-hook'
6931 `zmacs-deactivate-region-hook'
6933 `zmacs-update-region-hook'