This is ../info/lispref.info, produced by makeinfo version 4.0 from lispref/lispref.texi. INFO-DIR-SECTION XEmacs Editor START-INFO-DIR-ENTRY * Lispref: (lispref). XEmacs Lisp Reference Manual. END-INFO-DIR-ENTRY Edition History: GNU Emacs Lisp Reference Manual Second Edition (v2.01), May 1993 GNU Emacs Lisp Reference Manual Further Revised (v2.02), August 1993 Lucid Emacs Lisp Reference Manual (for 19.10) First Edition, March 1994 XEmacs Lisp Programmer's Manual (for 19.12) Second Edition, April 1995 GNU Emacs Lisp Reference Manual v2.4, June 1995 XEmacs Lisp Programmer's Manual (for 19.13) Third Edition, July 1995 XEmacs Lisp Reference Manual (for 19.14 and 20.0) v3.1, March 1996 XEmacs Lisp Reference Manual (for 19.15 and 20.1, 20.2, 20.3) v3.2, April, May, November 1997 XEmacs Lisp Reference Manual (for 21.0) v3.3, April 1998 Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995 Free Software Foundation, Inc. Copyright (C) 1994, 1995 Sun Microsystems, Inc. Copyright (C) 1995, 1996 Ben Wing. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Foundation. 
Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided also that the section entitled "GNU General Public License" is included exactly as in the original, and provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that the section entitled "GNU General Public License" may be included in a translation approved by the Free Software Foundation instead of in the original English.  File: lispref.info, Node: Coding System Types, Next: ISO 2022, Up: Coding Systems Coding System Types ------------------- The coding system type determines the basic algorithm XEmacs will use to decode or encode a data stream. Character encodings will be converted to the MULE encoding, escape sequences processed, and newline sequences converted to XEmacs's internal representation. There are three basic classes of coding system type: no-conversion, ISO-2022, and special. No conversion allows you to look at the file's internal representation. Since XEmacs is basically a text editor, "no conversion" does convert newline conventions by default. (Use the 'binary coding-system if this is not desired.) ISO 2022 (*note ISO 2022::) is the basic international standard regulating use of "coded character sets for the exchange of data", ie, text streams. ISO 2022 contains functions that make it possible to encode text streams to comply with restrictions of the Internet mail system and de facto restrictions of most file systems (eg, use of the separator character in file names). Coding systems which are not ISO 2022 conformant can be difficult to handle. Perhaps more important, they are not adaptable to multilingual information interchange, with the obvious exception of ISO 10646 (Unicode). 
(Unicode is partially supported by XEmacs with the addition of the Lisp package ucs-conv.) The special class of coding systems includes automatic detection, CCL (a "little language" embedded as an interpreter, useful for translating between variants of a single character set), non-ISO-2022-conformant encodings like Unicode, Shift JIS, and Big5, and MULE internal coding. (NB: this list is based on XEmacs 21.2. Terminology may vary slightly for other versions of XEmacs and for GNU Emacs 20.) `no-conversion' No conversion, for binary files, and a few special cases of non-ISO-2022 coding systems where conversion is done by hook functions (usually implemented in CCL). On output, graphic characters that are not in ASCII or Latin-1 will be replaced by a `?'. (For a no-conversion-encoded buffer, these characters will only be present if you explicitly insert them.) `iso2022' Any ISO-2022-compliant encoding. Among others, this includes JIS (the Japanese encoding commonly used for e-mail), national variants of EUC (the standard Unix encoding for Japanese and other languages), and Compound Text (an encoding used in X11). You can specify more specific information about the conversion with the FLAGS argument. `ucs-4' ISO 10646 UCS-4 encoding. A 31-bit fixed-width superset of Unicode. `utf-8' ISO 10646 UTF-8 encoding. A "file system safe" transformation format that can be used with both UCS-4 and Unicode. `undecided' Automatic conversion. XEmacs attempts to detect the coding system used in the file. `shift-jis' Shift-JIS (a Japanese encoding commonly used in PC operating systems). `big5' Big5 (the encoding commonly used for Taiwanese). `ccl' The conversion is performed using a user-written pseudo-code program. CCL (Code Conversion Language) is the name of this pseudo-code. For example, CCL is used to map KOI8-R characters (an encoding for Russian Cyrillic) to ISO8859-5 (the form used internally by MULE). 
`internal' Write out or read in the raw contents of the memory representing the buffer's text. This is primarily useful for debugging purposes, and is only enabled when XEmacs has been compiled with `DEBUG_XEMACS' set (the `--debug' configure option). *Warning*: Reading in a file using `internal' conversion can result in an internal inconsistency in the memory representing a buffer's text, which will produce unpredictable results and may cause XEmacs to crash. Under normal circumstances you should never use `internal' conversion.  File: lispref.info, Node: ISO 2022, Next: EOL Conversion, Prev: Coding System Types, Up: Coding Systems ISO 2022 ======== This section briefly describes the ISO 2022 encoding standard. A more thorough treatment is available in the original document of ISO 2022 as well as various national standards (such as JIS X 0202). Character sets ("charsets") are classified into the following four categories, according to the number of characters in the charset: 94-charset, 96-charset, 94x94-charset, and 96x96-charset. This means that although an ISO 2022 coding system may have variable width characters, each charset used is fixed-width (in contrast to the MULE character set and UTF-8, for example). ISO 2022 provides for switching between character sets via escape sequences. This switching is somewhat complicated, because ISO 2022 provides for both legacy applications like Internet mail that accept only 7 significant bits in some contexts (RFC 822 headers, for example), and more modern "8-bit clean" applications. It also provides for compact and transparent representation of languages like Japanese which mix ASCII and a national script (even outside of computer programs). First, ISO 2022 codified prevailing practice by dividing the code space into "control" and "graphic" regions. 
The code points 0x00-0x1F and 0x80-0x9F are reserved for "control characters", while "graphic characters" must be assigned to code points in the regions 0x20-0x7F and 0xA0-0xFF. The positions 0x20 and 0x7F are special, and under some circumstances must be assigned the graphic character "ASCII SPACE" and the control character "ASCII DEL" respectively.

The various regions are given the names C0 (0x00-0x1F), GL (0x20-0x7F), C1 (0x80-0x9F), and GR (0xA0-0xFF). GL and GR stand for "graphic left" and "graphic right", respectively, because of the standard method of displaying graphic character sets in tables with the high nibble indexing columns and the low nibble indexing rows. I don't find it very intuitive, but these are called "registers".

An ISO 2022-conformant encoding for a graphic character set must use a fixed number of bytes per character, and the values must fit into a single register; that is, each byte must range over either 0x20-0x7F, or 0xA0-0xFF. It is not allowed to extend the range of the repertoire of a character set by using both ranges at the same time. This is why a standard character set such as ISO 8859-1 is actually considered by ISO 2022 to be an aggregation of two character sets, ASCII and LATIN-1, and why it is technically incorrect to refer to ISO 8859-1 as "Latin 1". Also, a single character's bytes must all be drawn from the same register; this is why Shift JIS (for Japanese) and Big 5 (for Chinese) are not ISO 2022-compatible encodings.

The reason for this restriction becomes clear when you attempt to define an efficient, robust encoding for a language like Japanese. Like ISO 8859, Japanese encodings are aggregations of several character sets.
In practice, the vast majority of characters are drawn from the "JIS Roman" character set (a derivative of ASCII; it won't hurt to think of it as ASCII) and the JIS X 0208 standard "basic Japanese" character set including not only ideographic characters ("kanji") but syllabic Japanese characters ("kana"), a wide variety of symbols, and many alphabetic characters (Roman, Greek, and Cyrillic) as well. Although JIS X 0208 includes the whole Roman alphabet, as a 2-byte code it is not suited to programming; thus the inclusion of ASCII in the standard Japanese encodings.

For normal Japanese text such as in newspapers, a broad repertoire of approximately 3000 characters is used. Evidently this won't fit into one byte; two must be used. But much of the text processed by Japanese computers is computer source code, nearly all of which is ASCII. A not insignificant portion of ordinary text is English (as such or as borrowed Japanese vocabulary) or other languages which can be represented at least approximately in ASCII, as well. It seems reasonable then to represent ASCII in one byte, and JIS X 0208 in two. And this is exactly what the Extended Unix Code for Japanese (EUC-JP) does. ASCII is invoked to the GL register, and JIS X 0208 is invoked to the GR register. Thus, each byte can be tested for its character set by looking at the high bit; if set, it is Japanese, if clear, it is ASCII. Furthermore, since control characters like newline can never be part of a graphic character, even in the case of corruption in transmission the stream will be resynchronized at every line break, on the order of 60-80 bytes. This coding system requires no escape sequences or special control codes to represent 99.9% of all Japanese text.

Note carefully the distinction between the character sets (ASCII and JIS X 0208), the encoding (EUC-JP), and the coding system (ISO 2022).
The JIS X 0208 character set is used in three different encodings for Japanese, but in ISO-2022-JP it is invoked into GL (so the high bit is always clear), in EUC-JP it is invoked into GR (setting the high bit in the process), and in Shift JIS the high bit may be set or clear, and the significant bits are shifted within the 16-bit character so that the two main character sets can coexist with a third (the "halfwidth katakana" of JIS X 0201). As the name implies, the ISO-2022-JP encoding is also a version of the ISO-2022 coding system.

In order to systematically treat subsidiary character sets (like the "halfwidth katakana" already mentioned, and the "supplementary kanji" of JIS X 0212), four further registers are defined: G0, G1, G2, and G3. Unlike GL and GR, they are not logically distinguished by internal format. Instead, the process of "invocation" mentioned earlier is broken into two steps: first, a character set is "designated" to one of the registers G0-G3 by use of an "escape sequence" of the form:

     ESC [I] I F

where I is an intermediate character or characters in the range 0x20-0x2F, and F, from the range 0x30-0x7E, is the final character identifying this charset. (Final characters in the range 0x30-0x3F are reserved for private use and will never have a publicly registered meaning.)

Then that register is "invoked" to either GL or GR, either automatically (designations to G0 normally involve invocation to GL as well), or by use of shifting (affecting only the following character in the data stream) or locking (effective until the next designation or locking) control sequences. An encoding conformant to ISO 2022 is typically defined by designating the initial contents of the G0-G3 registers, specifying a 7- or 8-bit environment, and specifying whether further designations will be recognized.

Some examples of character sets and the registered final characters F used to designate them:

94-charset
     ASCII (B), left (J) and right (I) half of JIS X 0201, ...
96-charset
     Latin-1 (A), Latin-2 (B), Latin-3 (C), ...

94x94-charset
     GB2312 (A), JIS X 0208 (B), KSC5601 (C), ...

96x96-charset
     none for the moment

The meanings of the various characters in these sequences, where not specified by the ISO 2022 standard (such as the ESC character), are assigned by "ECMA", the European Computer Manufacturers Association.

The meanings of the intermediate characters are:

     $ [0x24]: indicate charset of dimension 2 (94x94 or 96x96).
     ( [0x28]: designate to G0 a 94-charset whose final byte is F.
     ) [0x29]: designate to G1 a 94-charset whose final byte is F.
     * [0x2A]: designate to G2 a 94-charset whose final byte is F.
     + [0x2B]: designate to G3 a 94-charset whose final byte is F.
     , [0x2C]: designate to G0 a 96-charset whose final byte is F.
     - [0x2D]: designate to G1 a 96-charset whose final byte is F.
     . [0x2E]: designate to G2 a 96-charset whose final byte is F.
     / [0x2F]: designate to G3 a 96-charset whose final byte is F.

The comma may be used in files read and written only by MULE, as a MULE extension, but this is illegal in ISO 2022. (The reason is that in ISO 2022 G0 must be a 94-member character set, with 0x20 assigned the value SPACE, and 0x7F assigned the value DEL.)

Here are examples of designations:

     ESC ( B : designate to G0 ASCII
     ESC - A : designate to G1 Latin-1
     ESC $ ( A or ESC $ A : designate to G0 GB2312
     ESC $ ( B or ESC $ B : designate to G0 JISX0208
     ESC $ ) C : designate to G1 KSC5601

(The short forms used to designate GB2312 and JIS X 0208 are for backwards compatibility; the long forms are preferred.)

To use a charset designated to G2 or G3, and to use a charset designated to G1 in a 7-bit environment, you must explicitly invoke G1, G2, or G3 into GL. There are two types of invocation, Locking Shift (forever) and Single Shift (one character only).
Locking Shift is done as follows:

     LS0 or SI (0x0F): invoke G0 into GL
     LS1 or SO (0x0E): invoke G1 into GL
     LS2: invoke G2 into GL
     LS3: invoke G3 into GL
     LS1R: invoke G1 into GR
     LS2R: invoke G2 into GR
     LS3R: invoke G3 into GR

Single Shift is done as follows:

     SS2 or ESC N: invoke G2 into GL
     SS3 or ESC O: invoke G3 into GL

The shift functions (such as LS1R and SS3) are represented by control characters (from C1) in 8-bit environments and by escape sequences in 7-bit environments.

(#### Ben says: I think the above is slightly incorrect. It appears that SS2 invokes G2 into GR and SS3 invokes G3 into GR, whereas ESC N and ESC O behave as indicated. The above definitions will not parse EUC-encoded text correctly, and it looks like the code in mule-coding.c has similar problems.)

Evidently there are a lot of ISO-2022-compliant ways of encoding multilingual text. Many coding systems in actual use, such as X11's Compound Text, Japanese JUNET code, and so-called EUC (Extended UNIX Code), are variants of ISO 2022.

In MULE, we characterize a version of ISO 2022 by the following attributes:

  1. The character sets initially designated to G0 through G3.
  2. Whether short form designations are allowed for Japanese and Chinese.
  3. Whether ASCII should be designated to G0 before control characters.
  4. Whether ASCII should be designated to G0 at the end of line.
  5. 7-bit environment or 8-bit environment.
  6. Whether Locking Shifts are used or not.
  7. Whether to use ASCII or the variant JIS X 0201-1976-Roman.
  8. Whether to use JIS X 0208-1983 or the older version JIS X 0208-1976.

(The last two are only for Japanese.)

By specifying these attributes, you can create any variant of ISO 2022.

Here are several examples:

ISO-2022-JP -- Coding system used in Japanese email (RFC 1468).

     1. G0 <- ASCII, G1..3 <- never used
     2. Yes.
     3. Yes.
     4. Yes.
     5. 7-bit environment
     6. No.
     7. Use ASCII
     8. Use JIS X 0208-1983

ctext -- X11 Compound Text

     1.
G0 <- ASCII, G1 <- Latin-1, G2,3 <- never used. 2. No. 3. No. 4. Yes. 5. 8-bit environment. 6. No. 7. Use ASCII. 8. Use JIS X 0208-1983. euc-china -- Chinese EUC. Often called the "GB encoding", but that is technically incorrect. 1. G0 <- ASCII, G1 <- GB 2312, G2,3 <- never used. 2. No. 3. Yes. 4. Yes. 5. 8-bit environment. 6. No. 7. Use ASCII. 8. Use JIS X 0208-1983. ISO-2022-KR -- Coding system used in Korean email. 1. G0 <- ASCII, G1 <- KSC 5601, G2,3 <- never used. 2. No. 3. Yes. 4. Yes. 5. 7-bit environment. 6. Yes. 7. Use ASCII. 8. Use JIS X 0208-1983. MULE creates all of these coding systems by default.  File: lispref.info, Node: EOL Conversion, Next: Coding System Properties, Prev: ISO 2022, Up: Coding Systems EOL Conversion -------------- `nil' Automatically detect the end-of-line type (LF, CRLF, or CR). Also generate subsidiary coding systems named `NAME-unix', `NAME-dos', and `NAME-mac', that are identical to this coding system but have an EOL-TYPE value of `lf', `crlf', and `cr', respectively. `lf' The end of a line is marked externally using ASCII LF. Since this is also the way that XEmacs represents an end-of-line internally, specifying this option results in no end-of-line conversion. This is the standard format for Unix text files. `crlf' The end of a line is marked externally using ASCII CRLF. This is the standard format for MS-DOS text files. `cr' The end of a line is marked externally using ASCII CR. This is the standard format for Macintosh text files. `t' Automatically detect the end-of-line type but do not generate subsidiary coding systems. (This value is converted to `nil' when stored internally, and `coding-system-property' will return `nil'.)  File: lispref.info, Node: Coding System Properties, Next: Basic Coding System Functions, Prev: EOL Conversion, Up: Coding Systems Coding System Properties ------------------------ `mnemonic' String to be displayed in the modeline when this coding system is active. 
`eol-type' End-of-line conversion to be used. It should be one of the types listed in *Note EOL Conversion::. `eol-lf' The coding system which is the same as this one, except that it uses the Unix line-breaking convention. `eol-crlf' The coding system which is the same as this one, except that it uses the DOS line-breaking convention. `eol-cr' The coding system which is the same as this one, except that it uses the Macintosh line-breaking convention. `post-read-conversion' Function called after a file has been read in, to perform the decoding. Called with two arguments, BEG and END, denoting a region of the current buffer to be decoded. `pre-write-conversion' Function called before a file is written out, to perform the encoding. Called with two arguments, BEG and END, denoting a region of the current buffer to be encoded. The following additional properties are recognized if TYPE is `iso2022': `charset-g0' `charset-g1' `charset-g2' `charset-g3' The character set initially designated to the G0 - G3 registers. The value should be one of * A charset object (designate that character set) * `nil' (do not ever use this register) * `t' (no character set is initially designated to the register, but may be later on; this automatically sets the corresponding `force-g*-on-output' property) `force-g0-on-output' `force-g1-on-output' `force-g2-on-output' `force-g3-on-output' If non-`nil', send an explicit designation sequence on output before using the specified register. `short' If non-`nil', use the short forms `ESC $ @', `ESC $ A', and `ESC $ B' on output in place of the full designation sequences `ESC $ ( @', `ESC $ ( A', and `ESC $ ( B'. `no-ascii-eol' If non-`nil', don't designate ASCII to G0 at each end of line on output. Setting this to non-`nil' also suppresses other state-resetting that normally happens at the end of a line. `no-ascii-cntl' If non-`nil', don't designate ASCII to G0 before control chars on output. `seven' If non-`nil', use 7-bit environment on output. 
Otherwise, use 8-bit environment. `lock-shift' If non-`nil', use locking-shift (SO/SI) instead of single-shift or designation by escape sequence. `no-iso6429' If non-`nil', don't use ISO6429's direction specification. `escape-quoted' If non-nil, literal control characters that are the same as the beginning of a recognized ISO 2022 or ISO 6429 escape sequence (in particular, ESC (0x1B), SO (0x0E), SI (0x0F), SS2 (0x8E), SS3 (0x8F), and CSI (0x9B)) are "quoted" with an escape character so that they can be properly distinguished from an escape sequence. (Note that doing this results in a non-portable encoding.) This encoding flag is used for byte-compiled files. Note that ESC is a good choice for a quoting character because there are no escape sequences whose second byte is a character from the Control-0 or Control-1 character sets; this is explicitly disallowed by the ISO 2022 standard. `input-charset-conversion' A list of conversion specifications, specifying conversion of characters in one charset to another when decoding is performed. Each specification is a list of two elements: the source charset, and the destination charset. `output-charset-conversion' A list of conversion specifications, specifying conversion of characters in one charset to another when encoding is performed. The form of each specification is the same as for `input-charset-conversion'. The following additional properties are recognized (and required) if TYPE is `ccl': `decode' CCL program used for decoding (converting to internal format). `encode' CCL program used for encoding (converting to external format). The following properties are used internally: EOL-CR, EOL-CRLF, EOL-LF, and BASE.  
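As a rough illustration of how the properties above fit together, the following sketch defines a 7-bit ISO 2022 variant with `make-coding-system' (*note Basic Coding System Functions::). It is only a sketch under stated assumptions: the name `my-iso-2022-jp', its doc string, and its mnemonic are hypothetical, `get-charset' is assumed to be available from the charset functions of a MULE-enabled XEmacs, and the property values shown are one plausible combination rather than the definition of any coding system that XEmacs ships.

```elisp
;; Sketch only: `my-iso-2022-jp' is a hypothetical name chosen for
;; illustration.  Assumes an XEmacs built with MULE support, and
;; assumes `get-charset' returns the charset object named `ascii'.
(make-coding-system
 'my-iso-2022-jp 'iso2022
 "Illustrative 7-bit Japanese coding system (example only)."
 (list 'charset-g0 (get-charset 'ascii) ; ASCII initially designated to G0
       'short t          ; short forms such as `ESC $ B' on output
       'seven t          ; 7-bit environment on output
       'eol-type 'lf     ; LF line ends, so no end-of-line conversion
       'mnemonic "MyJP")) ; string shown in the modeline

;; Read one of the properties back from the new coding system.
(coding-system-property (get-coding-system 'my-iso-2022-jp) 'seven)
```

Leaving `no-ascii-eol' and `no-ascii-cntl' unset keeps the default behavior described above, in which ASCII is designated to G0 at each end of line and before control characters.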
File: lispref.info, Node: Basic Coding System Functions, Next: Coding System Property Functions, Prev: Coding System Properties, Up: Coding Systems

Basic Coding System Functions
-----------------------------

 - Function: find-coding-system coding-system-or-name
     This function retrieves the coding system of the given name.

     If CODING-SYSTEM-OR-NAME is a coding-system object, it is simply returned. Otherwise, CODING-SYSTEM-OR-NAME should be a symbol. If there is no such coding system, `nil' is returned. Otherwise the associated coding system object is returned.

 - Function: get-coding-system name
     This function retrieves the coding system of the given name. It is the same as `find-coding-system', except that an error is signalled if there is no such coding system instead of returning `nil'.

 - Function: coding-system-list
     This function returns a list of the names of all defined coding systems.

 - Function: coding-system-name coding-system
     This function returns the name of the given coding system.

 - Function: coding-system-base coding-system
     This function returns the base coding system of CODING-SYSTEM, that is, the one with an undecided EOL convention.

 - Function: make-coding-system name type &optional doc-string props
     This function registers symbol NAME as a coding system.

     TYPE describes the conversion method used and should be one of the types listed in *Note Coding System Types::.

     DOC-STRING is a string describing the coding system.

     PROPS is a property list, describing the specific nature of the coding system. Recognized properties are as in *Note Coding System Properties::.

 - Function: copy-coding-system old-coding-system new-name
     This function copies OLD-CODING-SYSTEM to NEW-NAME. If NEW-NAME does not name an existing coding system, a new one will be created.

 - Function: subsidiary-coding-system coding-system eol-type
     This function returns the subsidiary coding system of CODING-SYSTEM with eol type EOL-TYPE.
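The difference between `find-coding-system' and `get-coding-system', and the use of the other functions in this node, can be sketched as follows. This is a minimal sketch, not a definitive recipe: it assumes a MULE-enabled XEmacs in which a coding system named `iso-2022-jp' is defined with automatic end-of-line detection, and the names `my-no-such-system' and `my-jp-copy' are hypothetical.

```elisp
;; Sketch only; assumes an XEmacs with MULE in which `iso-2022-jp'
;; is defined.  `my-no-such-system' and `my-jp-copy' are hypothetical.

;; `find-coding-system' returns nil for a name that is not defined ...
(find-coding-system 'my-no-such-system)

;; ... whereas `get-coding-system' signals an error, caught here.
(condition-case err
    (get-coding-system 'my-no-such-system)
  (error (message "No such coding system: %S" err)))

(let ((cs (find-coding-system 'iso-2022-jp)))
  (if cs
      (list (coding-system-name cs)                 ; its name
            (coding-system-base cs)                 ; undecided-EOL base
            (subsidiary-coding-system cs 'crlf))))  ; the CRLF variant

;; Copy an existing coding system under a new (hypothetical) name.
(copy-coding-system 'iso-2022-jp 'my-jp-copy)
```

Testing the result of `find-coding-system' before use avoids the need for an error handler when a coding system may be absent, for example in an XEmacs built without MULE data for that language.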
File: lispref.info, Node: Coding System Property Functions, Next: Encoding and Decoding Text, Prev: Basic Coding System Functions, Up: Coding Systems

Coding System Property Functions
--------------------------------

 - Function: coding-system-doc-string coding-system
     This function returns the doc string for CODING-SYSTEM.

 - Function: coding-system-type coding-system
     This function returns the type of CODING-SYSTEM.

 - Function: coding-system-property coding-system prop
     This function returns the PROP property of CODING-SYSTEM.

File: lispref.info, Node: Encoding and Decoding Text, Next: Detection of Textual Encoding, Prev: Coding System Property Functions, Up: Coding Systems

Encoding and Decoding Text
--------------------------

 - Function: decode-coding-region start end coding-system &optional buffer
     This function decodes the text between START and END which is encoded in CODING-SYSTEM. This is useful if you've read in encoded text from a file without decoding it (e.g. you read in a JIS-formatted file but used the `binary' or `no-conversion' coding system, so that it shows up as `^[$B! [...]

     OPERATOR := + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
                 | < | > | == | <= | >= | != | de-sjis | en-sjis

     ASSIGNMENT_OPERATOR := += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=

     ARRAY := '[' integer ... ']'

File: lispref.info, Node: CCL Statements, Next: CCL Expressions, Prev: CCL Syntax, Up: CCL

CCL Statements
--------------

The Emacs Code Conversion Language provides the following statement types: "set", "if", "branch", "loop", "repeat", "break", "read", "write", "call", and "end".

Set statement:
==============

The "set" statement has three variants with the syntaxes `(REG = EXPRESSION)', `(REG ASSIGNMENT_OPERATOR EXPRESSION)', and `INTEGER'. The assignment operator variation of the "set" statement works the same way as the corresponding C expression statement does. The assignment operators are `+=', `-=', `*=', `/=', `%=', `&=', `|=', `^=', `<<=', and `>>=', and they have the same meanings as in C.
A "naked integer" INTEGER is equivalent to a SET statement of the form `(r0 = INTEGER)'. I/O statements: =============== The "read" statement takes one or more registers as arguments. It reads one byte (a C char) from the input into each register in turn. The "write" takes several forms. In the form `(write REG ...)' it takes one or more registers as arguments and writes each in turn to the output. The integer in a register (interpreted as an Emchar) is encoded to multibyte form (ie, Bufbytes) and written to the current output buffer. If it is less than 256, it is written as is. The forms `(write EXPRESSION)' and `(write INTEGER)' are treated analogously. The form `(write STRING)' writes the constant string to the output. A "naked string" `STRING' is equivalent to the statement `(write STRING)'. The form `(write REG ARRAY)' writes the REGth element of the ARRAY to the output. Conditional statements: ======================= The "if" statement takes an EXPRESSION, a CCL BLOCK, and an optional SECOND CCL BLOCK as arguments. If the EXPRESSION evaluates to non-zero, the first CCL BLOCK is executed. Otherwise, if there is a SECOND CCL BLOCK, it is executed. The "read-if" variant of the "if" statement takes an EXPRESSION, a CCL BLOCK, and an optional SECOND CCL BLOCK as arguments. The EXPRESSION must have the form `(REG OPERATOR OPERAND)' (where OPERAND is a register or an integer). The `read-if' statement first reads from the input into the first register operand in the EXPRESSION, then conditionally executes a CCL block just as the `if' statement does. The "branch" statement takes an EXPRESSION and one or more CCL blocks as arguments. The CCL blocks are treated as a zero-indexed array, and the `branch' statement uses the EXPRESSION as the index of the CCL block to execute. Null CCL blocks may be used as no-ops, continuing execution with the statement following the `branch' statement in the containing CCL block. 
Out-of-range values for the EXPRESSION are also treated as no-ops.

The "read-branch" variant of the "branch" statement takes a REGISTER and one or more CCL blocks as arguments. The `read-branch' statement first reads from the input into the REGISTER, then conditionally executes a CCL block just as the `branch' statement does.

Loop control statements:
========================

The "loop" statement creates a block with an implied jump from the end of the block back to its head. The loop is exited on a `break' statement, and continued without executing the tail by a `repeat' statement.

The "break" statement, written `(break)', terminates the current loop and continues with the next statement in the current block.

The "repeat" statement has three variants, `repeat', `write-repeat', and `write-read-repeat'. Each continues the current loop from its head, possibly after performing I/O. `repeat' takes no arguments and does no I/O before jumping. `write-repeat' takes a single argument (a register, an integer, or a string), writes it to the output, then jumps. `write-read-repeat' takes one or two arguments. The first must be a register. The second may be an integer or an array; if absent, it is implicitly set to the first (register) argument. `write-read-repeat' writes its second argument to the output, then reads from the input into the register, and finally jumps. See the `write' and `read' statements for the semantics of the I/O operations for each type of argument.

Other control statements:
=========================

The "call" statement, written `(call CCL-PROGRAM-NAME)', executes a CCL program as a subroutine. It does not return a value to the caller, but can modify the register status.

The "end" statement, written `(end)', terminates the CCL program successfully, and returns to the caller (which may be a CCL program). It does not alter the status of the registers.
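The statements described in this node can be combined into a small program such as the following sketch, which copies its input to its output while replacing each byte 97 (`a') with 65 (`A'). Several things here are assumptions rather than facts stated in this node: the outer form of a CCL program (a buffer magnification factor followed by a main block) and the `define-ccl-program' macro are taken to be as in the CCL syntax description, and the name `my-ccl-upcase-a' is hypothetical.

```elisp
;; Sketch only.  The outer form (BUFFER-MAGNIFICATION MAIN-BLOCK) and
;; the `define-ccl-program' macro are assumed from the CCL syntax
;; description; `my-ccl-upcase-a' is a hypothetical name.
(define-ccl-program my-ccl-upcase-a
  '(1                           ; output needs at most 1x the input size
    ((loop                      ; block that is re-entered at its head
      (read r0)                 ; read one byte into register r0
      (if (r0 == 97)            ; is the byte the code of `a'?
          (write-repeat 65)     ; yes: write `A', jump to the loop head
        (write-repeat r0))))))  ; no: copy the byte, jump to the loop head
```

Both branches of the `if' end in `write-repeat', so every pass through the loop jumps back to its head explicitly and the program does not rely on the implied jump at the end of the `loop' block.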