This is ../info/lispref.info, produced by makeinfo version 4.0 from

INFO-DIR-SECTION XEmacs Editor

* Lispref: (lispref). XEmacs Lisp Reference Manual.

GNU Emacs Lisp Reference Manual Second Edition (v2.01), May 1993
GNU Emacs Lisp Reference Manual Further Revised (v2.02), August 1993
Lucid Emacs Lisp Reference Manual (for 19.10) First Edition, March 1994
XEmacs Lisp Programmer's Manual (for 19.12) Second Edition, April 1995
GNU Emacs Lisp Reference Manual v2.4, June 1995
XEmacs Lisp Programmer's Manual (for 19.13) Third Edition, July 1995
XEmacs Lisp Reference Manual (for 19.14 and 20.0) v3.1, March 1996
XEmacs Lisp Reference Manual (for 19.15 and 20.1, 20.2, 20.3) v3.2, April, May, November 1997
XEmacs Lisp Reference Manual (for 21.0) v3.3, April 1998

Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995 Free Software Foundation, Inc.
Copyright (C) 1994, 1995 Sun Microsystems, Inc.
Copyright (C) 1995, 1996 Ben Wing.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.

Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided also
that the section entitled "GNU General Public License" is included
exactly as in the original, and provided that the entire resulting
derived work is distributed under the terms of a permission notice
identical to this one.

Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that the section entitled "GNU General Public License"
may be included in a translation approved by the Free Software
Foundation instead of in the original English.


File: lispref.info, Node: Coding System Properties, Next: Basic Coding System Functions, Prev: EOL Conversion, Up: Coding Systems

Coding System Properties
------------------------

     String to be displayed in the modeline when this coding system is
     active.

     End-of-line conversion to be used. It should be one of the types
     listed in *Note EOL Conversion::.

     The coding system which is the same as this one, except that it
     uses the Unix line-breaking convention.

     The coding system which is the same as this one, except that it
     uses the DOS line-breaking convention.

     The coding system which is the same as this one, except that it
     uses the Macintosh line-breaking convention.

`post-read-conversion'
     Function called after a file has been read in, to perform the
     decoding. Called with two arguments, BEG and END, denoting a
     region of the current buffer to be decoded.

`pre-write-conversion'
     Function called before a file is written out, to perform the
     encoding. Called with two arguments, BEG and END, denoting a
     region of the current buffer to be encoded.

The following additional properties are recognized if TYPE is
`iso2022':

     The character set initially designated to the G0 - G3 registers.
     The value should be one of

        * A charset object (designate that character set)

        * `nil' (do not ever use this register)

        * `t' (no character set is initially designated to the
          register, but may be later on; this automatically sets the
          corresponding `force-g*-on-output' property)

     If non-`nil', send an explicit designation sequence on output
     before using the specified register.

     If non-`nil', use the short forms `ESC $ @', `ESC $ A', and `ESC $
     B' on output in place of the full designation sequences `ESC $ (
     @', `ESC $ ( A', and `ESC $ ( B'.

     If non-`nil', don't designate ASCII to G0 at each end of line on
     output. Setting this to non-`nil' also suppresses other
     state-resetting that normally happens at the end of a line.

     If non-`nil', don't designate ASCII to G0 before control chars on
     output.

     If non-`nil', use 7-bit environment on output. Otherwise, use
     8-bit environment.

     If non-`nil', use locking-shift (SO/SI) instead of single-shift or
     designation by escape sequence.

     If non-`nil', don't use ISO6429's direction specification.

     If non-`nil', literal control characters that are the same as the
     beginning of a recognized ISO 2022 or ISO 6429 escape sequence (in
     particular, ESC (0x1B), SO (0x0E), SI (0x0F), SS2 (0x8E), SS3
     (0x8F), and CSI (0x9B)) are "quoted" with an escape character so
     that they can be properly distinguished from an escape sequence.
     (Note that doing this results in a non-portable encoding.) This
     encoding flag is used for byte-compiled files. Note that ESC is a
     good choice for a quoting character because there are no escape
     sequences whose second byte is a character from the Control-0 or
     Control-1 character sets; this is explicitly disallowed by the ISO
     2022 standard.

`input-charset-conversion'
     A list of conversion specifications, specifying conversion of
     characters in one charset to another when decoding is performed.
     Each specification is a list of two elements: the source charset,
     and the destination charset.

`output-charset-conversion'
     A list of conversion specifications, specifying conversion of
     characters in one charset to another when encoding is performed.
     The form of each specification is the same as for
     `input-charset-conversion'.

The following additional properties are recognized (and required) if
TYPE is `ccl':

     CCL program used for decoding (converting to internal format).

     CCL program used for encoding (converting to external format).

The following properties are used internally: EOL-CR, EOL-CRLF,


File: lispref.info, Node: Basic Coding System Functions, Next: Coding System Property Functions, Prev: Coding System Properties, Up: Coding Systems

Basic Coding System Functions
-----------------------------

 - Function: find-coding-system coding-system-or-name
     This function retrieves the coding system of the given name.

     If CODING-SYSTEM-OR-NAME is a coding-system object, it is simply
     returned. Otherwise, CODING-SYSTEM-OR-NAME should be a symbol.
     If there is no such coding system, `nil' is returned. Otherwise
     the associated coding system object is returned.

 - Function: get-coding-system name
     This function retrieves the coding system of the given name. It is
     the same as `find-coding-system', except that it signals an error
     instead of returning `nil' if there is no such coding system.

 - Function: coding-system-list
     This function returns a list of the names of all defined coding
     systems.

 - Function: coding-system-name coding-system
     This function returns the name of the given coding system.

 - Function: coding-system-base coding-system
     This function returns the base coding system of CODING-SYSTEM,
     that is, the one with an undecided EOL convention.

 - Function: make-coding-system name type &optional doc-string props
     This function registers symbol NAME as a coding system.

     TYPE describes the conversion method used and should be one of the
     types listed in *Note Coding System Types::.

     DOC-STRING is a string describing the coding system.

     PROPS is a property list, describing the specific nature of the
     coding system. Recognized properties are as in *Note Coding
     System Properties::.

 - Function: copy-coding-system old-coding-system new-name
     This function copies OLD-CODING-SYSTEM to NEW-NAME. If NEW-NAME
     does not name an existing coding system, a new one will be created.

 - Function: subsidiary-coding-system coding-system eol-type
     This function returns the subsidiary coding system of
     CODING-SYSTEM with eol type EOL-TYPE.


File: lispref.info, Node: Coding System Property Functions, Next: Encoding and Decoding Text, Prev: Basic Coding System Functions, Up: Coding Systems

Coding System Property Functions
--------------------------------

 - Function: coding-system-doc-string coding-system
     This function returns the doc string for CODING-SYSTEM.

 - Function: coding-system-type coding-system
     This function returns the type of CODING-SYSTEM.

 - Function: coding-system-property coding-system prop
     This function returns the PROP property of CODING-SYSTEM.

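The lookup and property functions above combine naturally. The following is a minimal sketch, assuming an XEmacs built with MULE; the helper name `my-coding-system-summary' is hypothetical and not part of XEmacs. It relies on `find-coding-system' returning `nil' for an unknown name rather than signalling an error.

```elisp
;; Sketch only: `my-coding-system-summary' is a hypothetical helper.
;; It uses only functions described in this chapter.
(defun my-coding-system-summary (name)
  "Return a list (NAME TYPE DOC-STRING) for the coding system NAME.
Return nil if NAME does not name a coding system."
  (let ((cs (find-coding-system name)))   ; nil, not an error, when unknown
    (and cs
         (list (coding-system-name cs)
               (coding-system-type cs)
               (coding-system-doc-string cs)))))
```

Use `get-coding-system' in place of `find-coding-system' if an unknown name should be treated as an error.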

File: lispref.info, Node: Encoding and Decoding Text, Next: Detection of Textual Encoding, Prev: Coding System Property Functions, Up: Coding Systems

Encoding and Decoding Text
--------------------------

 - Function: decode-coding-region start end coding-system &optional buffer
     This function decodes the text between START and END which is
     encoded in CODING-SYSTEM. This is useful if you've read in
     encoded text from a file without decoding it (e.g. you read in a
     JIS-formatted file but used the `binary' or `no-conversion' coding
     system, so that it shows up as `^[$B!<!+^[(B'). The length of the
     encoded text is returned. BUFFER defaults to the current buffer
     if unspecified.

 - Function: encode-coding-region start end coding-system &optional buffer
     This function encodes the text between START and END using
     CODING-SYSTEM. This will, for example, convert Japanese
     characters into byte sequences such as `^[$B!<!+^[(B' if you use
     the JIS encoding. The length of the encoded text is returned.
     BUFFER defaults to the current buffer if unspecified.

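As an illustration of the situation described for `decode-coding-region', the sketch below decodes a whole buffer that was read in without conversion. It assumes an XEmacs built with MULE; `my-decode-whole-buffer' is a hypothetical name. `get-coding-system' is used so that a misspelled coding system name signals an error before any text is touched.

```elisp
;; Sketch only: `my-decode-whole-buffer' is a hypothetical helper.
(defun my-decode-whole-buffer (coding-system-name)
  "Decode the current buffer's text as CODING-SYSTEM-NAME, in place.
Return whatever `decode-coding-region' returns."
  (let ((cs (get-coding-system coding-system-name))) ; errors if unknown
    (decode-coding-region (point-min) (point-max) cs)))
```

The inverse operation follows the same pattern with `encode-coding-region'.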

File: lispref.info, Node: Detection of Textual Encoding, Next: Big5 and Shift-JIS Functions, Prev: Encoding and Decoding Text, Up: Coding Systems

Detection of Textual Encoding
-----------------------------

 - Function: coding-category-list
     This function returns a list of all recognized coding categories.

 - Function: set-coding-priority-list list
     This function changes the priority order of the coding categories.
     LIST should be a list of coding categories, in descending order of
     priority. Unspecified coding categories will be lower in priority
     than all specified ones, in the same relative order they were in
     previously.

 - Function: coding-priority-list
     This function returns a list of coding categories in descending
     order of priority.

 - Function: set-coding-category-system coding-category coding-system
     This function changes the coding system associated with a coding
     category.

 - Function: coding-category-system coding-category
     This function returns the coding system associated with a coding
     category.

 - Function: detect-coding-region start end &optional buffer
     This function detects the coding system of the text in the region
     between START and END. The returned value is a list of possible
     coding systems ordered by priority. If only ASCII characters are
     found, it returns `autodetect' or one of its subsidiary coding
     systems according to the detected end-of-line type. The optional
     argument BUFFER defaults to the current buffer.

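Because `detect-coding-region' returns either a priority-ordered list or, for pure ASCII text, a single coding system, callers usually normalize the result. A minimal sketch, assuming an XEmacs built with MULE; `my-best-coding-system-guess' is a hypothetical name:

```elisp
;; Sketch only: `my-best-coding-system-guess' is a hypothetical helper.
(defun my-best-coding-system-guess ()
  "Return the highest-priority coding system detected for the current buffer."
  (let ((guess (detect-coding-region (point-min) (point-max))))
    (if (consp guess)
        (car guess)        ; a list: the first element has highest priority
      guess)))             ; ASCII only: a single coding system
```

If the wrong candidate is consistently ranked first, the ranking itself can be changed with `set-coding-priority-list'.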

File: lispref.info, Node: Big5 and Shift-JIS Functions, Next: Predefined Coding Systems, Prev: Detection of Textual Encoding, Up: Coding Systems

Big5 and Shift-JIS Functions
----------------------------

These are special functions for working with the non-standard
Shift-JIS and Big5 encodings.

 - Function: decode-shift-jis-char code
     This function decodes a JIS X 0208 character encoded in the
     Shift-JIS coding system. CODE is the character code in Shift-JIS
     as a cons of two bytes. The corresponding character is returned.

 - Function: encode-shift-jis-char ch
     This function encodes the JIS X 0208 character CH in the Shift-JIS
     coding system. The corresponding character code in Shift-JIS is
     returned as a cons of two bytes.

 - Function: decode-big5-char code
     This function decodes a Big5 character encoded in the Big5 coding
     system. CODE is the character code in Big5. The corresponding
     character is returned.

 - Function: encode-big5-char ch
     This function encodes the Big5 character CH in the Big5 coding
     system. The corresponding character code in Big5 is returned.


File: lispref.info, Node: Predefined Coding Systems, Prev: Big5 and Shift-JIS Functions, Up: Coding Systems

Coding Systems Implemented
--------------------------

MULE initializes most of the commonly used coding systems at XEmacs's
startup. A few others are initialized only when the relevant language
environment is selected and support libraries are loaded. (NB: The
following list is based on XEmacs 21.2.19, the development branch at the
time of writing. The list may be somewhat different for other
versions. Recent versions of GNU Emacs 20 implement a few more rare
coding systems; work is being done to port these to XEmacs.)

Unfortunately, there is not a consistent naming convention for
character sets, and for practical purposes coding systems often take
their name from their principal character sets (ASCII, KOI8-R, Shift
JIS). Others take their names from the coding system (ISO-2022-JP,
EUC-KR), and a few from their non-text usages (internal, binary). To
provide for this, and for the fact that many coding systems have
several common names, an aliasing system is provided. Finally, some
effort has been made to use names that are registered as MIME charsets
(this is why the name `shift_jis' contains that un-Lisp-y underscore).

There is a systematic naming convention regarding end-of-line (EOL)
conventions for different systems. A coding system whose name ends in
"-unix" forces the assumption that lines are broken by newlines (0x0A).
A coding system whose name ends in "-mac" forces the assumption that
lines are broken by ASCII CRs (0x0D). A coding system whose name ends
in "-dos" forces the assumption that lines are broken by CRLF sequences
(0x0D 0x0A). These subsidiary coding systems are automatically derived
from a base coding system. Use of the base coding system implies
autodetection of the text file convention. (The fact that the -unix,
-mac, and -dos are derived from a base system results in them showing up
as "aliases" in `list-coding-systems'.) These subsidiaries have a
consistent modeline indicator as well. "-dos" coding systems have ":T"
appended to their modeline indicator, while "-mac" coding systems have
":t" appended (e.g., "ISO8:t" for iso-2022-8-mac).
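
The suffix convention just described is regular enough to express in a few lines of Lisp. The sketch below uses only ordinary Lisp primitives; the names `my-eol-sequences' and `my-eol-variant-name' are hypothetical, and the function merely builds the conventional name without checking that such a coding system is defined.

```elisp
;; Sketch only: both names below are hypothetical, for illustration.
(defconst my-eol-sequences
  '((unix . "\n")       ; newline, 0x0A
    (mac  . "\r")       ; carriage return, 0x0D
    (dos  . "\r\n"))    ; CRLF, 0x0D 0x0A
  "Line-break sequence assumed by each EOL suffix.")

(defun my-eol-variant-name (base eol)
  "Return the symbol naming the EOL subsidiary of BASE.
EOL must be one of the symbols `unix', `mac', or `dos'."
  (or (assq eol my-eol-sequences)
      (error "Unknown EOL convention: %s" eol))
  (intern (format "%s-%s" base eol)))
```

For example, `(my-eol-variant-name 'iso-2022-8 'mac)' yields the symbol `iso-2022-8-mac', the name used in the modeline example above.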

In the following table, each coding system is given with its mode
line indicator in parentheses. Non-textual coding systems are listed
first, followed by textual coding systems and their aliases. (The
coding system subsidiary modeline indicators ":T" and ":t" will be
omitted from the table of coding systems.)

### SJT 1999-08-23 Maybe should order these by language? Definitely
need language usage for the ISO-8859 family.

Note that although true coding system aliases have been implemented
for XEmacs 21.2, the coding system initialization has not yet been
converted as of 21.2.19. So coding systems described as aliases have
the same properties as the aliased coding system, but will not be equal

`automatic-conversion'
     Modeline indicator: `Auto'. A type `undecided' coding system.
     Attempts to determine an appropriate coding system from file
     contents or the environment.

     Modeline indicator: `Raw'. A type `no-conversion' coding system,
     which converts only line-break-codes. An implementation quirk
     means that this coding system is also used for ISO8859-1.

     Modeline indicator: `Binary'. A type `no-conversion' coding
     system which does no character coding or EOL conversions. An
     alias for `raw-text-unix'.

     Modeline indicator: `Cy.Alt'. A type `ccl' coding system used for
     Alternativnyj, an encoding of the Cyrillic alphabet.

     Modeline indicator: `Zh/Big5'. A type `big5' coding system used
     for BIG5, the most common encoding of traditional Chinese as used

     Modeline indicator: `Zh-GB/EUC'. A type `iso2022' coding system
     used for simplified Chinese (as used in the People's Republic of
     China), with the `ascii' (G0), `chinese-gb2312' (G1), and `sisheng'
     (G2) character sets initially designated. Chinese EUC (Extended
     Unix Code).

     Modeline indicator: `CText/Hbrw'. A type `iso2022' coding system
     with the `ascii' (G0) and `hebrew-iso8859-8' (G1) character sets
     initially designated for Hebrew.

     Modeline indicator: `CText'. A type `iso2022' 8-bit coding system
     with the `ascii' (G0) and `latin-iso8859-1' (G1) character sets
     initially designated. X11 Compound Text Encoding. Often
     mistakenly recognized instead of EUC encodings; the usual cause is
     an inappropriate setting of `coding-priority-list'.

     Modeline indicator: `ESC/Quot'. A type `iso2022' 8-bit coding
     system with the `ascii' (G0) and `latin-iso8859-1' (G1) character
     sets initially designated and escape quoting. Unix EOL conversion
     (i.e., no conversion). It is used for .ELC files.

     Modeline indicator: `Ja/EUC'. A type `iso2022' 8-bit coding system
     with `ascii' (G0), `japanese-jisx0208' (G1), `katakana-jisx0201'
     (G2), and `japanese-jisx0212' (G3) initially designated. Japanese
     EUC (Extended Unix Code).

     Modeline indicator: `ko/EUC'. A type `iso2022' 8-bit coding system
     with `ascii' (G0) and `korean-ksc5601' (G1) initially designated.
     Korean EUC (Extended Unix Code).

     Modeline indicator: `Zh-GB/Hz'. A type `no-conversion' coding
     system with Unix EOL convention (i.e., no conversion) using
     post-read-decode and pre-write-encode functions to translate the
     Hz/ZW coding system used for Chinese.

     Modeline indicator: `ISO7'. A type `iso2022' 7-bit coding system
     with `ascii' (G0) initially designated. Other character sets must
     be explicitly designated to be used.

`iso-2022-7bit-ss2-dos'
`iso-2022-7bit-ss2-mac'
`iso-2022-7bit-ss2-unix'
     Modeline indicator: `ISO7/SS'. A type `iso2022' 7-bit coding
     system with `ascii' (G0) initially designated. Other character
     sets must be explicitly designated to be used. SS2 is used to
     invoke a 96-charset, one character at a time.

     Modeline indicator: `ISO8'. A type `iso2022' 8-bit coding system
     with `ascii' (G0) and `latin-iso8859-1' (G1) initially designated.
     Other character sets must be explicitly designated to be used.
     No single-shift or locking-shift.

`iso-2022-8bit-ss2-dos'
`iso-2022-8bit-ss2-mac'
`iso-2022-8bit-ss2-unix'
     Modeline indicator: `ISO8/SS'. A type `iso2022' 8-bit coding
     system with `ascii' (G0) and `latin-iso8859-1' (G1) initially
     designated. Other character sets must be explicitly designated to
     be used. SS2 is used to invoke a 96-charset, one character at a
     time.

`iso-2022-int-1-unix'
     Modeline indicator: `INT-1'. A type `iso2022' 7-bit coding system
     with `ascii' (G0) and `korean-ksc5601' (G1) initially designated.

`iso-2022-jp-1978-irv'
`iso-2022-jp-1978-irv-dos'
`iso-2022-jp-1978-irv-mac'
`iso-2022-jp-1978-irv-unix'
     Modeline indicator: `Ja-78/7bit'. A type `iso2022' 7-bit coding
     system. For compatibility with old Japanese terminals; if you
     need to know, look at the source.

`iso-2022-jp-2 (ISO7/SS)'
     Modeline indicator: `MULE/7bit'. A type `iso2022' 7-bit coding
     system with `ascii' (G0) initially designated, and complex
     specifications to ensure backward compatibility with old Japanese
     systems. Used for communication with mail and news in Japan. The
     "-2" versions also use SS2 to invoke a 96-charset one character at
     a time.

     Modeline indicator: `Ko/7bit'. A type `iso2022' 7-bit coding
     system with `ascii' (G0) and `korean-ksc5601' (G1) initially
     designated. Used for e-mail in Korea.

     Modeline indicator: `ISO7/Lock'. A type `iso2022' 7-bit coding
     system with `ascii' (G0) initially designated, using Locking-Shift
     to invoke a 96-charset.

     Due to implementation, this is not a type `iso2022' coding system,
     but rather an alias for the `raw-text' coding system.

     Modeline indicator: `MIME/Ltn-2'. A type `iso2022' coding system
     with `ascii' (G0) and `latin-iso8859-2' (G1) initially invoked.

     Modeline indicator: `MIME/Ltn-3'. A type `iso2022' coding system
     with `ascii' (G0) and `latin-iso8859-3' (G1) initially invoked.

     Modeline indicator: `MIME/Ltn-4'. A type `iso2022' coding system
     with `ascii' (G0) and `latin-iso8859-4' (G1) initially invoked.

     Modeline indicator: `ISO8/Cyr'. A type `iso2022' coding system
     with `ascii' (G0) and `cyrillic-iso8859-5' (G1) initially invoked.

     Modeline indicator: `Grk'. A type `iso2022' coding system with
     `ascii' (G0) and `greek-iso8859-7' (G1) initially invoked.

     Modeline indicator: `MIME/Hbrw'. A type `iso2022' coding system
     with `ascii' (G0) and `hebrew-iso8859-8' (G1) initially invoked.

     Modeline indicator: `MIME/Ltn-5'. A type `iso2022' coding system
     with `ascii' (G0) and `latin-iso8859-9' (G1) initially invoked.

     Modeline indicator: `KOI8'. A type `ccl' coding-system used for
     KOI8-R, an encoding of the Cyrillic alphabet.

     Modeline indicator: `Ja/SJIS'. A type `shift-jis' coding-system
     implementing the Shift-JIS encoding for Japanese. The underscore
     is to conform to the MIME charset implementing this encoding.

     Modeline indicator: `TIS620'. A type `ccl' encoding for Thai. The
     external encoding is defined by TIS620, the internal encoding is
     peculiar to MULE, and called `thai-xtis'.

     Modeline indicator: `VIQR'. A type `no-conversion' coding system
     with Unix EOL convention (i.e., no conversion) using
     post-read-decode and pre-write-encode functions to translate the
     VIQR coding system for Vietnamese.

     Modeline indicator: `VISCII'. A type `ccl' coding-system used for
     VISCII 1.1 for Vietnamese. Differs slightly from VSCII; VISCII is
     given priority by XEmacs.

     Modeline indicator: `VSCII'. A type `ccl' coding-system used for
     VSCII 1.1 for Vietnamese. Differs slightly from VISCII, which is
     given priority by XEmacs. Use `(prefer-coding-system
     'vietnamese-vscii)' to give priority to VSCII.


File: lispref.info, Node: CCL, Next: Category Tables, Prev: Coding Systems, Up: MULE

CCL (Code Conversion Language) is a simple structured programming
language designed for character coding conversions. A CCL program is
compiled to CCL code (represented by a vector of integers) and executed
by the CCL interpreter embedded in Emacs. The CCL interpreter
implements a virtual machine with 8 registers called `r0', ..., `r7', a
number of control structures, and some I/O operators. Take care when
using registers `r0' (used in implicit "set" statements) and especially
`r7' (used internally by several statements and operations, especially
for multiple return values and I/O operations).

CCL is used for code conversion during process I/O and file I/O for
non-ISO2022 coding systems. (It is the only way for a user to specify a
code conversion function.) It is also used for calculating the code
point of an X11 font from a character code. However, since CCL is
designed as a powerful programming language, it can be used for more
generic calculation where efficiency is demanded. A combination of
three or more arithmetic operations can be calculated faster by CCL than
by Emacs Lisp.

*Warning:* The code in `src/mule-ccl.c' and
`$packages/lisp/mule-base/mule-ccl.el' is the definitive description of
CCL's semantics. The previous version of this section contained
several typos and obsolete names left from earlier versions of MULE,
and many may remain. (I am not an experienced CCL programmer; the few
who know CCL well find writing English painful.)

A CCL program transforms an input data stream into an output data
stream. The input stream, held in a buffer of constant bytes, is left
unchanged. The buffer may be filled by an external input operation,
taken from an Emacs buffer, or taken from a Lisp string. The output
buffer is a dynamic array of bytes, which can be written by an external
output operation, inserted into an Emacs buffer, or returned as a Lisp
string.

A CCL program is a (Lisp) list containing two or three members. The
first member is the "buffer magnification", which indicates the
required minimum size of the output buffer as a multiple of the input
buffer. It is followed by the "main block" which executes while there
is input remaining, and an optional "EOF block" which is executed when
the input is exhausted. Both the main block and the EOF block are CCL
blocks.

A "CCL block" is either a CCL statement or a list of CCL statements.
A "CCL statement" is either a "set statement" (either an integer or an
"assignment", which is a list of a register to receive the assignment,
an assignment operator, and an expression) or a "control statement" (a
list starting with a keyword, whose allowable syntax depends on the
keyword).
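
Since a CCL program is ordinary Lisp data, its two or three members can be seen directly in a quoted list. The sketch below shows a program that copies its input to its output unchanged, written with the `read', `loop', and `write-read-repeat' statements described in the following nodes. The names `my-ccl-copy-program' and the three accessor functions are hypothetical and exist only for illustration; the program is shown as data and is not compiled or run here.

```elisp
;; Sketch only: all `my-ccl-...' names are hypothetical.
(defconst my-ccl-copy-program
  '(1                                  ; buffer magnification: output <= 1x input
    ((read r0)                         ; main block: read the first byte into r0
     (loop
      (write-read-repeat r0))))        ; write r0, read the next byte, loop again
  "Illustrative CCL program that copies input to output; it has no EOF block.")

(defun my-ccl-magnification (program)
  "Return the buffer magnification of the CCL program PROGRAM."
  (nth 0 program))

(defun my-ccl-main-block (program)
  "Return the main block of the CCL program PROGRAM."
  (nth 1 program))

(defun my-ccl-eof-block (program)
  "Return the EOF block of the CCL program PROGRAM, or nil if it has none."
  (nth 2 program))
```

A conversion that can expand its input, such as one emitting two bytes per input byte, would need a correspondingly larger buffer magnification.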

* CCL Syntax::          CCL program syntax in BNF notation.
* CCL Statements::      Semantics of CCL statements.
* CCL Expressions::     Operators and expressions in CCL.
* Calling CCL::         Running CCL programs.
* CCL Examples::        The encoding functions for Big5 and KOI-8.


File: lispref.info, Node: CCL Syntax, Next: CCL Statements, Up: CCL

The full syntax of a CCL program in BNF notation:

     (BUFFER_MAGNIFICATION

     BUFFER_MAGNIFICATION := integer
     CCL_MAIN_BLOCK := CCL_BLOCK
     CCL_EOF_BLOCK := CCL_BLOCK

       STATEMENT | (STATEMENT [STATEMENT ...])

       SET | IF | BRANCH | LOOP | REPEAT | BREAK | READ | WRITE

       | (REG ASSIGNMENT_OPERATOR EXPRESSION)

     EXPRESSION := ARG | (EXPRESSION OPERATOR ARG)

     IF := (if EXPRESSION CCL_BLOCK [CCL_BLOCK])
     BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
     LOOP := (loop STATEMENT [STATEMENT ...])

       | (write-repeat [REG | integer | string])
       | (write-read-repeat REG [integer | ARRAY])

       | (read-if (REG OPERATOR ARG) CCL_BLOCK CCL_BLOCK)
       | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])

       | (write integer) | (write string) | (write REG ARRAY)

     CALL := (call ccl-program-name)

     REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7

       + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
       | < | > | == | <= | >= | != | de-sjis | en-sjis
     ASSIGNMENT_OPERATOR :=
       += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
     ARRAY := '[' integer ... ']'


File: lispref.info, Node: CCL Statements, Next: CCL Expressions, Prev: CCL Syntax, Up: CCL

The Emacs Code Conversion Language provides the following statement
types: "set", "if", "branch", "loop", "repeat", "break", "read",
"write", "call", and "end".

The "set" statement has three variants with the syntaxes `(REG =
EXPRESSION)', `(REG ASSIGNMENT_OPERATOR EXPRESSION)', and `INTEGER'.
The assignment operator variation of the "set" statement works the same
way as the corresponding C expression statement does. The assignment
operators are `+=', `-=', `*=', `/=', `%=', `&=', `|=', `^=', `<<=',
and `>>=', and they have the same meanings as in C. A "naked integer"
INTEGER is equivalent to a SET statement of the form `(r0 = INTEGER)'.
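
To make the three variants concrete, here is a small model of the "set" statement written in ordinary Lisp. It is a sketch for illustration, not the CCL interpreter: it handles only the five arithmetic operators and their assignment forms, keeps the registers in an 8-element vector, and all `my-ccl-...' names are hypothetical.

```elisp
;; Sketch only: a toy model of CCL "set" statements, not the real interpreter.
(defconst my-ccl-ops
  '((+ . +) (- . -) (* . *) (/ . /) (% . %))
  "Subset of CCL binary operators, mapped to Lisp functions.")

(defconst my-ccl-assign-ops
  '((+= . +) (-= . -) (*= . *) (/= . /) (%= . %))
  "Subset of CCL assignment operators, mapped to Lisp functions.")

(defun my-ccl-reg (sym)
  "Return the register number named by SYM, one of r0 ... r7."
  (string-to-number (substring (symbol-name sym) 1)))

(defun my-ccl-eval (expr regs)
  "Evaluate the CCL expression EXPR against the register vector REGS."
  (cond ((integerp expr) expr)
        ((symbolp expr) (aref regs (my-ccl-reg expr)))
        (t (funcall (cdr (assq (nth 1 expr) my-ccl-ops)) ; (EXPR OP ARG)
                    (my-ccl-eval (nth 0 expr) regs)
                    (my-ccl-eval (nth 2 expr) regs)))))

(defun my-ccl-set (stmt regs)
  "Apply the set statement STMT to the register vector REGS; return REGS."
  (if (integerp stmt)
      (aset regs 0 stmt)                ; naked integer means (r0 = INTEGER)
    (let* ((i (my-ccl-reg (car stmt)))
           (op (nth 1 stmt))
           (val (my-ccl-eval (nth 2 stmt) regs)))
      (aset regs i (if (eq op '=)
                       val              ; plain assignment
                     (funcall (cdr (assq op my-ccl-assign-ops))
                              (aref regs i) val)))))
  regs)

(defun my-ccl-run-sets (stmts)
  "Run the set statements STMTS on fresh zeroed registers; return the registers."
  (let ((regs (make-vector 8 0)))
    (mapcar (lambda (s) (my-ccl-set s regs)) stmts)
    regs))
```

Under this model, `(my-ccl-run-sets '(5 (r1 = (r0 + 3)) (r1 *= 2)))' leaves 5 in `r0' and 16 in `r1': the naked integer sets `r0', the second statement computes 5 + 3, and the third doubles the result.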

The "read" statement takes one or more registers as arguments. It
reads one byte (a C char) from the input into each register in turn.

The "write" statement takes several forms. In the form `(write REG
...)' it takes one or more registers as arguments and writes each in
turn to the output. The integer in a register (interpreted as an
Emchar) is encoded to multibyte form (i.e., Bufbytes) and written to the
current output buffer. If it is less than 256, it is written as is.
The forms `(write EXPRESSION)' and `(write INTEGER)' are treated
analogously. The form `(write STRING)' writes the constant string to
the output. A "naked string" `STRING' is equivalent to the statement
`(write STRING)'. The form `(write REG ARRAY)' writes the REGth element
of the ARRAY to the output.

Conditional statements:
=======================

The "if" statement takes an EXPRESSION, a CCL BLOCK, and an optional
SECOND CCL BLOCK as arguments. If the EXPRESSION evaluates to
non-zero, the first CCL BLOCK is executed. Otherwise, if there is a
SECOND CCL BLOCK, it is executed.

The "read-if" variant of the "if" statement takes an EXPRESSION, a
CCL BLOCK, and an optional SECOND CCL BLOCK as arguments. The
EXPRESSION must have the form `(REG OPERATOR OPERAND)' (where OPERAND is
a register or an integer). The `read-if' statement first reads from
the input into the first register operand in the EXPRESSION, then
conditionally executes a CCL block just as the `if' statement does.

The "branch" statement takes an EXPRESSION and one or more CCL
blocks as arguments. The CCL blocks are treated as a zero-indexed
array, and the `branch' statement uses the EXPRESSION as the index of
the CCL block to execute. Null CCL blocks may be used as no-ops,
continuing execution with the statement following the `branch'
statement in the containing CCL block. Out-of-range values for the
EXPRESSION are also treated as no-ops.

The "read-branch" variant of the "branch" statement takes a
REGISTER and one or more CCL blocks as arguments. The `read-branch'
statement first reads from the input into the REGISTER, then uses the
value read as the index of the CCL block to execute, just as the
`branch' statement does.
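
The indexing rule for `branch' and `read-branch' can be modelled in a few lines of ordinary Lisp. This is a sketch of the selection rule only, with a hypothetical function name; it does not execute the selected block.

```elisp
;; Sketch only: `my-ccl-branch-select' is a hypothetical helper that models
;; which block a `branch' statement would pick.
(defun my-ccl-branch-select (index blocks)
  "Return the element of BLOCKS that `branch' would run for INDEX.
Return nil, meaning a no-op, when INDEX is out of range."
  (if (and (>= index 0) (< index (length blocks)))
      (nth index blocks)   ; blocks form a zero-indexed array
    nil))
```

The explicit range check matters because `nth' by itself does not treat a negative index as out of range.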

Loop control statements:
========================

The "loop" statement creates a block with an implied jump from the
end of the block back to its head. The loop is exited on a `break'
statement, and continued without executing the tail by a `repeat'
statement.

The "break" statement, written `(break)', terminates the current
loop and continues with the next statement in the current block.

The "repeat" statement has three variants, `repeat', `write-repeat',
and `write-read-repeat'. Each continues the current loop from its
head, possibly after performing I/O. `repeat' takes no arguments and
does no I/O before jumping. `write-repeat' takes a single argument (a
register, an integer, or a string), writes it to the output, then jumps.
`write-read-repeat' takes one or two arguments. The first must be a
register. The second may be an integer or an array; if absent, it is
implicitly set to the first (register) argument. `write-read-repeat'
writes its second argument to the output, then reads from the input
into the register, and finally jumps. See the `write' and `read'
statements for the semantics of the I/O operations for each type of
argument.

Other control statements:
=========================

The "call" statement, written `(call CCL-PROGRAM-NAME)', executes a
CCL program as a subroutine. It does not return a value to the caller,
but can modify the register status.

The "end" statement, written `(end)', terminates the CCL program
successfully, and returns to the caller (which may be a CCL program).
It does not alter the status of the registers.


File: lispref.info, Node: CCL Expressions, Next: Calling CCL, Prev: CCL Statements, Up: CCL

CCL, unlike Lisp, uses infix expressions. The simplest CCL
expressions consist of a single OPERAND, either a register (one of `r0',
..., `r7') or an integer. Complex expressions are lists of the form `(
EXPRESSION OPERATOR OPERAND )'. Unlike C, assignments are not
expressions.

In the following table, X is the target register for a "set". In
subexpressions, this is implicitly `r7'. This means that `>8', `//',
`de-sjis', and `en-sjis' cannot be used freely in subexpressions, since
they return parts of their values in `r7'. Y may be an expression,
register, or integer, while Z must be a register or an integer.

906 Name             Operator   Code  C-like Description
907 CCL_PLUS         `+'        0x00  X = Y + Z
908 CCL_MINUS        `-'        0x01  X = Y - Z
909 CCL_MUL          `*'        0x02  X = Y * Z
910 CCL_DIV          `/'        0x03  X = Y / Z
911 CCL_MOD          `%'        0x04  X = Y % Z
912 CCL_AND          `&'        0x05  X = Y & Z
913 CCL_OR           `|'        0x06  X = Y | Z
914 CCL_XOR          `^'        0x07  X = Y ^ Z
915 CCL_LSH          `<<'       0x08  X = Y << Z
916 CCL_RSH          `>>'       0x09  X = Y >> Z
917 CCL_LSH8         `<8'       0x0A  X = (Y << 8) | Z
918 CCL_RSH8         `>8'       0x0B  X = Y >> 8, r[7] = Y & 0xFF
919 CCL_DIVMOD       `//'       0x0C  X = Y / Z, r[7] = Y % Z
920 CCL_LS           `<'        0x10  X = (Y < Z)
921 CCL_GT           `>'        0x11  X = (Y > Z)
922 CCL_EQ           `=='       0x12  X = (Y == Z)
923 CCL_LE           `<='       0x13  X = (Y <= Z)
924 CCL_GE           `>='       0x14  X = (Y >= Z)
925 CCL_NE           `!='       0x15  X = (Y != Z)
926 CCL_ENCODE_SJIS  `en-sjis'  0x16  X = HIGHER_BYTE (SJIS (Y, Z))
927                                   r[7] = LOWER_BYTE (SJIS (Y, Z))
928 CCL_DECODE_SJIS  `de-sjis'  0x17  X = HIGHER_BYTE (DE-SJIS (Y, Z))
929                                   r[7] = LOWER_BYTE (DE-SJIS (Y, Z))
931 The CCL operators are as in C, with the addition of CCL_LSH8,
932 CCL_RSH8, CCL_DIVMOD, CCL_ENCODE_SJIS, and CCL_DECODE_SJIS. The
933 CCL_ENCODE_SJIS and CCL_DECODE_SJIS treat their first and second operands
934 as the high and low bytes of a two-byte character code. (SJIS stands
935 for Shift JIS, an encoding of Japanese characters used by Microsoft.
936 CCL_ENCODE_SJIS is a complicated transformation of the Japanese
937 standard JIS encoding to Shift JIS. CCL_DECODE_SJIS is its inverse.)
938 It is somewhat odd to represent the SJIS operations in infix form.
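The three arithmetic operators that have no direct C counterpart can be
modelled in ordinary Lisp. The functions below are hypothetical helpers
(the `my-ccl-' names are not part of any CCL API); they only mirror the
"C-like Description" column for non-negative integers, returning a cons
where CCL would place the second result in `r7'.

```lisp
;; Lisp models of `<8', `>8' and `//'.  Illustration only; a real CCL
;; program would write, for example, (r0 = (r1 <8 r2)).

(defun my-ccl-lsh8 (y z)
  "Model of `<8': shift Y left by 8 bits and OR in Z."
  (logior (ash y 8) z))

(defun my-ccl-rsh8 (y)
  "Model of `>8': return (X . R7), where X = Y >> 8 and R7 = Y & 0xFF."
  (cons (ash y -8) (logand y 255)))

(defun my-ccl-divmod (y z)
  "Model of `//': return (X . R7), where X = Y / Z and R7 = Y % Z."
  (cons (/ y z) (% y z)))

;; (my-ccl-lsh8 18 52)   evaluates to 4660      ; 0x12, 0x34 -> 0x1234
;; (my-ccl-rsh8 4660)    evaluates to (18 . 52)
;; (my-ccl-divmod 17 5)  evaluates to (3 . 2)
```

Because `>8' and `//' deposit half of their result in `r7', nesting one
of them inside a larger expression would clobber the implicit target of
the enclosing subexpression, which is why the text above warns against
using them freely there.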
941 File: lispref.info, Node: Calling CCL, Next: CCL Examples, Prev: CCL Expressions, Up: CCL
946 CCL programs are called automatically during Emacs buffer I/O when
947 the external representation has a coding system type of `shift-jis',
948 `big5', or `ccl'. The program is specified by the coding system (*note
949 Coding Systems::). You can also call CCL programs from other CCL
950 programs, and from Lisp using these functions:
952 - Function: ccl-execute ccl-program status
953 Execute CCL-PROGRAM with registers initialized by STATUS.
954 CCL-PROGRAM is a vector of compiled CCL code created by
955 `ccl-compile'. It is an error for the program to try to execute a
956 CCL I/O command. STATUS must be a vector of nine values,
957 specifying the initial value for the R0, R1 .. R7 registers and
958 for the instruction counter IC. A `nil' value for a register
959 initializer causes the register to be set to 0. A `nil' value for
960 the IC initializer causes execution to start at the beginning of
961 the program. When the program is done, STATUS is modified (by
962 side-effect) to contain the ending values for the corresponding registers and IC.
965 - Function: ccl-execute-on-string ccl-program status str &optional continue
967 Execute CCL-PROGRAM with initial STATUS on STR. CCL-PROGRAM is
968 a vector of compiled CCL code created by `ccl-compile'. STATUS
969 must be a vector of nine values, specifying the initial value for
970 the R0, R1 .. R7 registers and for the instruction counter IC. A
971 `nil' value for a register initializer causes the register to be
972 set to 0. A `nil' value for the IC initializer causes execution
973 to start at the beginning of the program. An optional fourth
974 argument CONTINUE, if non-nil, causes the IC to remain on the
975 unsatisfied read operation if the program terminates due to
976 exhaustion of the input buffer. Otherwise the IC is set to the end
977 of the program. When the program is done, STATUS is modified (by
978 side-effect) to contain the ending values for the corresponding
979 registers and IC. Returns the resulting string.
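A minimal sketch of both entry points follows. It targets the XEmacs
version described here (with the nine-element STATUS vector) and has not
been checked against other Emacs variants. The `my-' function names are
hypothetical, and the program form handed to `ccl-compile' (a buffer
magnification followed by a block of statements) is an assumption about
the CCL source syntax.

```lisp
;; Sketch only: wraps the two documented calls in helper functions so
;; that nothing runs until the helpers are called.

(defun my-ccl-add (a b)
  "Return A + B as computed by a tiny CCL program that does no I/O."
  (let ((program (ccl-compile '(1 ((r0 = (r1 + r2))))))
        ;; Nine slots: R0..R7, then IC.  nil means 0 (registers) or
        ;; "start at the beginning" (IC).
        (status (vector nil a b nil nil nil nil nil nil)))
    (ccl-execute program status)
    ;; STATUS was modified by side effect; slot 0 is R0.
    (aref status 0)))

(defun my-ccl-copy-string (string)
  "Run a copy-through CCL program on STRING and return its output."
  (let ((program (ccl-compile '(1 ((read r0)
                                   (loop (write-read-repeat r0))))))
        (status (make-vector 9 nil)))
    ;; CONTINUE is omitted, so the IC ends up at the end of the program
    ;; once the input is exhausted.
    (ccl-execute-on-string program status string)))
```

Note that `my-ccl-add' uses `ccl-execute' only because its program
performs no I/O; the copy program must go through
`ccl-execute-on-string' instead.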
981 To call a CCL program from another CCL program, it must first be registered:
984 - Function: register-ccl-program name ccl-program
985 Register NAME for CCL program CCL-PROGRAM in `ccl-program-table'.
986 CCL-PROGRAM should be the compiled form of a CCL program, or nil.
987 Returns the index number of the registered CCL program.
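Registration and the `call' statement fit together roughly as sketched
below. The names `my-ccl-double' and `my-ccl-make-caller' are
hypothetical, NAME is assumed to be a symbol, and the program forms
passed to `ccl-compile' are assumptions about the CCL source syntax.

```lisp
;; Sketch only: register a subroutine, then build a caller that
;; invokes it by name.

(defun my-ccl-make-caller ()
  "Register `my-ccl-double' and return compiled code that calls it."
  ;; The subroutine doubles r0.  It returns no value, but the caller
  ;; sees the modified register.
  (register-ccl-program 'my-ccl-double
                        (ccl-compile '(1 ((r0 = (r0 * 2))))))
  ;; The caller runs the subroutine, then terminates successfully.
  (ccl-compile '(1 ((call my-ccl-double)
                    (end)))))
```

The subroutine is registered before the caller is compiled so that the
name used in `(call my-ccl-double)' is already known at that point.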
989 Information about the processor time used by the CCL interpreter can
990 be obtained using these functions:
992 - Function: ccl-elapsed-time
993 Returns the elapsed processor time of the CCL interpreter as a cons
994 of user and system time, as floating point numbers measured in
995 seconds. If only one overall value can be determined, the return
996 value will be a cons of that value and 0.
998 - Function: ccl-reset-elapsed-time
999 Resets the CCL interpreter's internal elapsed time registers.
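One way to use the two timing functions together is to reset the
counters, run the work of interest, and then read them back. The helper
name `my-ccl-time' is hypothetical.

```lisp
;; Sketch only: measure the CCL interpreter time used by THUNK.

(defun my-ccl-time (thunk)
  "Call THUNK with no arguments and return the CCL time it consumed.
The value is whatever `ccl-elapsed-time' reports: a cons of user and
system time in seconds."
  (ccl-reset-elapsed-time)      ; start from zero
  (funcall thunk)               ; THUNK is expected to run CCL code
  (ccl-elapsed-time))
```

For example, THUNK could be a `lambda' that calls
`ccl-execute-on-string' on a large string.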
1002 File: lispref.info, Node: CCL Examples, Prev: Calling CCL, Up: CCL
1007 This section is not yet written.
1010 File: lispref.info, Node: Category Tables, Prev: CCL, Up: MULE
1015 A category table is a type of char table used for keeping track of
1016 categories. Categories are used for classifying characters for use in
1017 regexps--you can refer to a category rather than having to use a
1018 complicated [] expression (and category lookups are significantly faster).
1021 There are 95 different categories available, one for each printable
1022 character (including space) in the ASCII charset. Each category is
1023 designated by one such character, called a "category designator". They
1024 are specified in a regexp using the syntax `\cX', where X is a category
1025 designator. (This is not yet implemented.)
1027 A category table specifies, for each character, the categories that
1028 the character is in. Note that a character can be in more than one
1029 category. More specifically, a category table maps from a character to
1030 either the value `nil' (meaning the character is in no categories) or a
1031 95-element bit vector, specifying for each of the 95 categories whether
1032 the character is in that category.
1034 Special Lisp functions are provided that abstract this, so you do not
1035 have to directly manipulate bit vectors.
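To make the representation concrete, here is a small model of a
category table value. It is not the real implementation: the `my-'
names are hypothetical, an ordinary 95-element vector of 0s and 1s
stands in for the bit vector, and the mapping from designator to index
(character code minus 32, so that space is index 0 and `~' is index 94)
is an assumption.

```lisp
;; Model only; the documented functions hide these details.

(defun my-category-index (designator)
  "Return the assumed index (0..94) of DESIGNATOR in a table value."
  ;; 32 is the code of space, 126 the code of `~'.
  (if (and (>= designator 32) (<= designator 126))
      (- designator 32)
    (error "Not a category designator")))

(defun my-category-member-p (value designator)
  "Return t if VALUE marks membership in the category DESIGNATOR.
VALUE is nil (no categories) or a 95-element vector of 0s and 1s."
  (and value
       (= 1 (aref value (my-category-index designator)))))

;; A character that is in category `a' and nothing else:
;; (let ((value (make-vector 95 0)))
;;   (aset value (my-category-index ?a) 1)
;;   (my-category-member-p value ?a))
```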
1037 - Function: category-table-p obj
1038 This function returns `t' if OBJ is a category table.
1040 - Function: category-table &optional buffer
1041 This function returns the current category table. This is the one
1042 specified by the current buffer, or by BUFFER if it is non-`nil'.
1044 - Function: standard-category-table
1045 This function returns the standard category table. This is the
1046 one used for new buffers.
1048 - Function: copy-category-table &optional table
1049 This function constructs a new category table and returns it. It
1050 is a copy of TABLE, which defaults to the standard category table.
1053 - Function: set-category-table table &optional buffer
1054 This function selects TABLE as the category table for BUFFER.
1055 TABLE must be a category table. BUFFER defaults to the current buffer if omitted.
1058 - Function: category-designator-p obj
1059 This function returns `t' if OBJ is a category designator (a char
1060 in the range `' '' to `'~'').
1062 - Function: category-table-value-p obj
1063 This function returns `t' if OBJ is a category table value. Valid
1064 values are `nil' or a bit vector of size 95.
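The table-level functions above might be combined as in the following
sketch, which gives a buffer a private copy of the standard category
table so that later changes do not affect other buffers. The function
name `my-use-private-category-table' is hypothetical, and the code
assumes an XEmacs built with MULE support.

```lisp
;; Sketch only: uses just the functions documented in this node.

(defun my-use-private-category-table (&optional buffer)
  "Install a copy of the standard category table in BUFFER.
BUFFER defaults to the current buffer.  Return the new table."
  (let ((table (copy-category-table (standard-category-table))))
    (set-category-table table buffer)
    table))
```

After the call, `(category-table buffer)' should return the copy rather
than the standard table.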
1067 File: lispref.info, Node: Tips, Next: Building XEmacs and Object Allocation, Prev: MULE, Up: Top
1072 This chapter describes no additional features of XEmacs Lisp.
1073 Instead it gives advice on making effective use of the features
1074 described in the previous chapters.
1078 * Style Tips:: Writing clean and robust programs.
1079 * Compilation Tips:: Making compiled code run fast.
1080 * Documentation Tips:: Writing readable documentation strings.
1081 * Comment Tips:: Conventions for writing comments.
1082 * Library Headers:: Standard headers for library packages.