+ - Function: decode-big5-char code
+ This function decodes CODE, a character code in the Big5 coding
+ system, and returns the corresponding character.
+
+ - Function: encode-big5-char character
+ This function encodes CHARACTER, a Big5 character, in the Big5
+ coding system and returns the corresponding Big5 character code.
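+
+ As a minimal sketch using only the two functions above, a round trip
+ through Big5 should give back the original character. The helper
+ name `my-big5-round-trip-p' is hypothetical; the sketch assumes a
+ MULE-enabled XEmacs and a character that belongs to a Big5 charset.

```elisp
(defun my-big5-round-trip-p (character)
  "Return non-nil if CHARACTER survives a Big5 encode/decode round trip.
CHARACTER is assumed to be a character from one of the Big5 charsets."
  (let ((code (encode-big5-char character)))   ; character -> Big5 code
    (eq character (decode-big5-char code))))   ; Big5 code -> character
```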
+
+\1f
+File: lispref.info, Node: Predefined Coding Systems, Prev: Big5 and Shift-JIS Functions, Up: Coding Systems
+
+Coding Systems Implemented
+--------------------------
+
+ MULE initializes most of the commonly used coding systems at XEmacs's
+startup. A few others are initialized only when the relevant language
+environment is selected and support libraries are loaded. (NB: The
+following list is based on XEmacs 21.2.19, the development branch at the
+time of writing. The list may be somewhat different for other
+versions. Recent versions of GNU Emacs 20 implement a few more rare
+coding systems; work is being done to port these to XEmacs.)
+
+ Unfortunately, there is not a consistent naming convention for
+character sets, and for practical purposes coding systems often take
+their name from their principal character sets (ASCII, KOI8-R, Shift
+JIS). Others take their names from the coding system (ISO-2022-JP,
+EUC-KR), and a few from their non-text usages (internal, binary). To
+provide for this, and for the fact that many coding systems have
+several common names, an aliasing system is provided. Finally, some
+effort has been made to use names that are registered as MIME charsets
+(this is why the name 'shift_jis contains that un-Lisp-y underscore).
+
+ There is a systematic naming convention regarding end-of-line (EOL)
+conventions for different systems. A coding system whose name ends in
+"-unix" forces the assumption that lines are broken by newlines (0x0A).
+A coding system whose name ends in "-mac" forces the assumption that
+lines are broken by ASCII CRs (0x0D). A coding system whose name ends
+in "-dos" forces the assumption that lines are broken by CRLF sequences
+(0x0D 0x0A). These subsidiary coding systems are automatically derived
+from a base coding system. Use of the base coding system implies
+autodetection of the text file convention. (Because the "-unix",
+"-mac", and "-dos" variants are derived from a base system, they show
+up as "aliases" in `list-coding-systems'.) These subsidiaries have a
+consistent modeline indicator as well. "-dos" coding systems have ":T"
+appended to their modeline indicator, while "-mac" coding systems have
+":t" appended (e.g., "ISO8:t" for iso-2022-8-mac).
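+
+ The naming convention can be sketched in a few lines of Lisp. The
+function name `my-coding-system-subsidiaries' is hypothetical. It only
+builds the three names by appending the suffixes described above, and
+does not check that the resulting coding systems are actually defined.

```elisp
(defun my-coding-system-subsidiaries (base)
  "Return the names of the three EOL subsidiaries of coding system BASE.
BASE is a symbol such as `iso-2022-8'.  Only the names are built;
nothing here checks that the coding systems exist."
  (mapcar (lambda (suffix)
            ;; Append the EOL suffix to the base name and intern it.
            (intern (concat (symbol-name base) suffix)))
          '("-unix" "-dos" "-mac")))

;; (my-coding-system-subsidiaries 'iso-2022-8)
;;      => (iso-2022-8-unix iso-2022-8-dos iso-2022-8-mac)
```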
+
+ In the following table, each coding system is given with its mode
+line indicator in parentheses. Non-textual coding systems are listed
+first, followed by textual coding systems and their aliases. (The
+coding system subsidiary modeline indicators ":T" and ":t" will be
+omitted from the table of coding systems.)
+
+ ### SJT 1999-08-23 Maybe should order these by language? Definitely
+need language usage for the ISO-8859 family.
+
+ Note that although true coding system aliases have been implemented
+for XEmacs 21.2, the coding system initialization has not yet been
+converted as of 21.2.19. So coding systems described as aliases have
+the same properties as the aliased coding system, but will not be equal
+as Lisp objects.
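+
+ One way to see the difference is to compare the coding system objects
+directly. The sketch below is illustrative only: the function name
+`my-coding-system-alias-status' is hypothetical, and it assumes that
+`find-coding-system' returns the coding system object for a name, or
+`nil' if no such coding system is defined.

```elisp
(defun my-coding-system-alias-status (alias target)
  "Describe how the coding systems named ALIAS and TARGET are related.
Both arguments are symbols.  Return `unknown' if either name is not
defined, `same-object' if both names yield the same Lisp object, and
`distinct-objects' otherwise."
  (let ((a (find-coding-system alias))
        (b (find-coding-system target)))
    (cond ((or (null a) (null b)) 'unknown)
          ((eq a b) 'same-object)
          (t 'distinct-objects))))

;; Example call, using an alias pair from the table below:
;; (my-coding-system-alias-status 'binary 'raw-text-unix)
```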
+
+`automatic-conversion'
+`undecided'
+`undecided-dos'
+`undecided-mac'
+`undecided-unix'
+ Modeline indicator: `Auto'. A type `undecided' coding system.
+ Attempts to determine an appropriate coding system from file
+ contents or the environment.
+
+`raw-text'
+`no-conversion'
+`raw-text-dos'
+`raw-text-mac'
+`raw-text-unix'
+`no-conversion-dos'
+`no-conversion-mac'
+`no-conversion-unix'
+ Modeline indicator: `Raw'. A type `no-conversion' coding system,
+ which converts only line-break-codes. An implementation quirk
+ means that this coding system is also used for ISO8859-1.
+
+`binary'
+ Modeline indicator: `Binary'. A type `no-conversion' coding
+ system which does no character coding or EOL conversions. An
+ alias for `raw-text-unix'.
+
+`alternativnyj'
+`alternativnyj-dos'
+`alternativnyj-mac'
+`alternativnyj-unix'
+ Modeline indicator: `Cy.Alt'. A type `ccl' coding system used for
+ Alternativnyj, an encoding of the Cyrillic alphabet.
+
+`big5'
+`big5-dos'
+`big5-mac'
+`big5-unix'
+ Modeline indicator: `Zh/Big5'. A type `big5' coding system used
+ for BIG5, the most common encoding of traditional Chinese as used
+ in Taiwan.
+
+`cn-gb-2312'
+`cn-gb-2312-dos'
+`cn-gb-2312-mac'
+`cn-gb-2312-unix'
+ Modeline indicator: `Zh-GB/EUC'. A type `iso2022' coding system
+ used for simplified Chinese (as used in the People's Republic of
+ China), with the `ascii' (G0), `chinese-gb2312' (G1), and `sisheng'
+ (G2) character sets initially designated. Chinese EUC (Extended
+ Unix Code).
+
+`ctext-hebrew'
+`ctext-hebrew-dos'
+`ctext-hebrew-mac'
+`ctext-hebrew-unix'
+ Modeline indicator: `CText/Hbrw'. A type `iso2022' coding system
+ with the `ascii' (G0) and `hebrew-iso8859-8' (G1) character sets
+ initially designated for Hebrew.
+
+`ctext'
+`ctext-dos'
+`ctext-mac'
+`ctext-unix'
+ Modeline indicator: `CText'. A type `iso2022' 8-bit coding system
+ with the `ascii' (G0) and `latin-iso8859-1' (G1) character sets
+ initially designated. X11 Compound Text Encoding. It is often
+ mistakenly detected in place of an EUC encoding; the usual cause is
+ an inappropriate setting of `coding-priority-list'.
+
+`escape-quoted'
+ Modeline indicator: `ESC/Quot'. A type `iso2022' 8-bit coding
+ system with the `ascii' (G0) and `latin-iso8859-1' (G1) character
+ sets initially designated and escape quoting. Unix EOL conversion
+ (i.e., no conversion). It is used for .ELC files.
+
+`euc-jp'
+`euc-jp-dos'
+`euc-jp-mac'
+`euc-jp-unix'
+ Modeline indicator: `Ja/EUC'. A type `iso2022' 8-bit coding system
+ with `ascii' (G0), `japanese-jisx0208' (G1), `katakana-jisx0201'
+ (G2), and `japanese-jisx0212' (G3) initially designated. Japanese
+ EUC (Extended Unix Code).
+
+`euc-kr'
+`euc-kr-dos'
+`euc-kr-mac'
+`euc-kr-unix'
+ Modeline indicator: `ko/EUC'. A type `iso2022' 8-bit coding system
+ with `ascii' (G0) and `korean-ksc5601' (G1) initially designated.
+ Korean EUC (Extended Unix Code).
+
+`hz-gb-2312'
+ Modeline indicator: `Zh-GB/Hz'. A type `no-conversion' coding
+ system with Unix EOL convention (i.e., no conversion) using
+ post-read-decode and pre-write-encode functions to translate the
+ Hz/ZW coding system used for Chinese.
+
+`iso-2022-7bit'
+`iso-2022-7bit-unix'
+`iso-2022-7bit-dos'
+`iso-2022-7bit-mac'
+`iso-2022-7'
+ Modeline indicator: `ISO7'. A type `iso2022' 7-bit coding system
+ with `ascii' (G0) initially designated. Other character sets must
+ be explicitly designated to be used.
+
+`iso-2022-7bit-ss2'
+`iso-2022-7bit-ss2-dos'
+`iso-2022-7bit-ss2-mac'
+`iso-2022-7bit-ss2-unix'
+ Modeline indicator: `ISO7/SS'. A type `iso2022' 7-bit coding
+ system with `ascii' (G0) initially designated. Other character
+ sets must be explicitly designated to be used. SS2 is used to
+ invoke a 96-charset, one character at a time.
+
+`iso-2022-8'
+`iso-2022-8-dos'
+`iso-2022-8-mac'
+`iso-2022-8-unix'
+ Modeline indicator: `ISO8'. A type `iso2022' 8-bit coding system
+ with `ascii' (G0) and `latin-iso8859-1' (G1) initially designated.
+ Other character sets must be explicitly designated to be used.
+ No single-shift or locking-shift.
+
+`iso-2022-8bit-ss2'
+`iso-2022-8bit-ss2-dos'
+`iso-2022-8bit-ss2-mac'
+`iso-2022-8bit-ss2-unix'
+ Modeline indicator: `ISO8/SS'. A type `iso2022' 8-bit coding
+ system with `ascii' (G0) and `latin-iso8859-1' (G1) initially
+ designated. Other character sets must be explicitly designated to
+ be used. SS2 is used to invoke a 96-charset, one character at a
+ time.
+
+`iso-2022-int-1'
+`iso-2022-int-1-dos'
+`iso-2022-int-1-mac'
+`iso-2022-int-1-unix'
+ Modeline indicator: `INT-1'. A type `iso2022' 7-bit coding system
+ with `ascii' (G0) and `korean-ksc5601' (G1) initially designated.
+ ISO-2022-INT-1.
+
+`iso-2022-jp-1978-irv'
+`iso-2022-jp-1978-irv-dos'
+`iso-2022-jp-1978-irv-mac'
+`iso-2022-jp-1978-irv-unix'
+ Modeline indicator: `Ja-78/7bit'. A type `iso2022' 7-bit coding
+ system. For compatibility with old Japanese terminals; if you
+ need to know, look at the source.
+
+`iso-2022-jp'
+`iso-2022-jp-2 (ISO7/SS)'
+`iso-2022-jp-dos'
+`iso-2022-jp-mac'
+`iso-2022-jp-unix'
+`iso-2022-jp-2-dos'
+`iso-2022-jp-2-mac'
+`iso-2022-jp-2-unix'
+ Modeline indicator: `MULE/7bit'. A type `iso2022' 7-bit coding
+ system with `ascii' (G0) initially designated, and complex
+ specifications to ensure backward compatibility with old Japanese
+ systems. Used for communication with mail and news in Japan. The
+ "-2" versions also use SS2 to invoke a 96-charset one character at
+ a time.
+
+`iso-2022-kr'
+ Modeline indicator: `Ko/7bit'. A type `iso2022' 7-bit coding
+ system with `ascii' (G0) and `korean-ksc5601' (G1) initially
+ designated. Used for e-mail in Korea.
+
+`iso-2022-lock'
+`iso-2022-lock-dos'
+`iso-2022-lock-mac'
+`iso-2022-lock-unix'
+ Modeline indicator: `ISO7/Lock'. A type `iso2022' 7-bit coding
+ system with `ascii' (G0) initially designated, using Locking-Shift
+ to invoke a 96-charset.
+
+`iso-8859-1'
+`iso-8859-1-dos'
+`iso-8859-1-mac'
+`iso-8859-1-unix'
+ Due to implementation, this is not a type `iso2022' coding system,
+ but rather an alias for the `raw-text' coding system.
+
+`iso-8859-2'
+`iso-8859-2-dos'
+`iso-8859-2-mac'
+`iso-8859-2-unix'
+ Modeline indicator: `MIME/Ltn-2'. A type `iso2022' coding system
+ with `ascii' (G0) and `latin-iso8859-2' (G1) initially invoked.
+
+`iso-8859-3'
+`iso-8859-3-dos'
+`iso-8859-3-mac'
+`iso-8859-3-unix'
+ Modeline indicator: `MIME/Ltn-3'. A type `iso2022' coding system
+ with `ascii' (G0) and `latin-iso8859-3' (G1) initially invoked.
+
+`iso-8859-4'
+`iso-8859-4-dos'
+`iso-8859-4-mac'
+`iso-8859-4-unix'
+ Modeline indicator: `MIME/Ltn-4'. A type `iso2022' coding system
+ with `ascii' (G0) and `latin-iso8859-4' (G1) initially invoked.
+
+`iso-8859-5'
+`iso-8859-5-dos'
+`iso-8859-5-mac'
+`iso-8859-5-unix'
+ Modeline indicator: `ISO8/Cyr'. A type `iso2022' coding system
+ with `ascii' (G0) and `cyrillic-iso8859-5' (G1) initially invoked.
+
+`iso-8859-7'
+`iso-8859-7-dos'
+`iso-8859-7-mac'
+`iso-8859-7-unix'
+ Modeline indicator: `Grk'. A type `iso2022' coding system with
+ `ascii' (G0) and `greek-iso8859-7' (G1) initially invoked.
+
+`iso-8859-8'
+`iso-8859-8-dos'
+`iso-8859-8-mac'
+`iso-8859-8-unix'
+ Modeline indicator: `MIME/Hbrw'. A type `iso2022' coding system
+ with `ascii' (G0) and `hebrew-iso8859-8' (G1) initially invoked.
+
+`iso-8859-9'
+`iso-8859-9-dos'
+`iso-8859-9-mac'
+`iso-8859-9-unix'
+ Modeline indicator: `MIME/Ltn-5'. A type `iso2022' coding system
+ with `ascii' (G0) and `latin-iso8859-9' (G1) initially invoked.
+
+`koi8-r'
+`koi8-r-dos'
+`koi8-r-mac'
+`koi8-r-unix'
+ Modeline indicator: `KOI8'. A type `ccl' coding-system used for
+ KOI8-R, an encoding of the Cyrillic alphabet.
+
+`shift_jis'
+`shift_jis-dos'
+`shift_jis-mac'
+`shift_jis-unix'
+ Modeline indicator: `Ja/SJIS'. A type `shift-jis' coding-system
+ implementing the Shift-JIS encoding for Japanese. The underscore
+ is to conform to the MIME charset implementing this encoding.
+
+`tis-620'
+`tis-620-dos'
+`tis-620-mac'
+`tis-620-unix'
+ Modeline indicator: `TIS620'. A type `ccl' encoding for Thai. The
+ external encoding is defined by TIS620, the internal encoding is
+ peculiar to MULE, and called `thai-xtis'.
+
+`viqr'
+ Modeline indicator: `VIQR'. A type `no-conversion' coding system
+ with Unix EOL convention (i.e., no conversion) using
+ post-read-decode and pre-write-encode functions to translate the
+ VIQR coding system for Vietnamese.
+
+`viscii'
+`viscii-dos'
+`viscii-mac'
+`viscii-unix'
+ Modeline indicator: `VISCII'. A type `ccl' coding-system used for
+ VISCII 1.1 for Vietnamese. Differs slightly from VSCII; VISCII is
+ given priority by XEmacs.
+
+`vscii'
+`vscii-dos'
+`vscii-mac'
+`vscii-unix'
+ Modeline indicator: `VSCII'. A type `ccl' coding-system used for
+ VSCII 1.1 for Vietnamese. Differs slightly from VISCII, which is
+ given priority by XEmacs. Use `(prefer-coding-system
+ 'vietnamese-vscii)' to give priority to VSCII.
+
+\1f
+File: lispref.info, Node: CCL, Next: Category Tables, Prev: Coding Systems, Up: MULE
+
+CCL
+===
+
+ CCL (Code Conversion Language) is a simple structured programming
+language designed for character coding conversions. A CCL program is
+compiled to CCL code (represented by a vector of integers) and executed
+by the CCL interpreter embedded in Emacs. The CCL interpreter
+implements a virtual machine with 8 registers called `r0', ..., `r7', a
+number of control structures, and some I/O operators. Take care when
+using registers `r0' (used in implicit "set" statements) and especially
+`r7' (used internally by several statements and operations, especially
+for multiple return values and I/O operations).
+
+ CCL is used for code conversion during process I/O and file I/O for
+non-ISO2022 coding systems. (It is the only way for a user to specify a
+code conversion function.) It is also used for calculating the code
+point of an X11 font from a character code. However, since CCL is
+designed as a powerful programming language, it can be used for more
+generic calculation where efficiency is demanded. A combination of
+three or more arithmetic operations can be calculated faster by CCL than
+by Emacs Lisp.
+
+ *Warning:* The code in `src/mule-ccl.c' and
+`$packages/lisp/mule-base/mule-ccl.el' is the definitive description of
+CCL's semantics. The previous version of this section contained
+several typos and obsolete names left from earlier versions of MULE,
+and many may remain. (I am not an experienced CCL programmer; the few
+who know CCL well find writing English painful.)
+
+ A CCL program transforms an input data stream into an output data
+stream. The input stream, held in a buffer of constant bytes, is left
+unchanged. The buffer may be filled by an external input operation,
+taken from an Emacs buffer, or taken from a Lisp string. The output
+buffer is a dynamic array of bytes, which can be written by an external
+output operation, inserted into an Emacs buffer, or returned as a Lisp
+string.
+
+ A CCL program is a (Lisp) list containing two or three members. The
+first member is the "buffer magnification", which indicates the
+required minimum size of the output buffer as a multiple of the input
+buffer. It is followed by the "main block" which executes while there
+is input remaining, and an optional "EOF block" which is executed when
+the input is exhausted. Both the main block and the EOF block are CCL
+blocks.
+
+ A "CCL block" is either a CCL statement or a list of CCL statements.
+A "CCL statement" is either a "set statement" (either an integer or an
+"assignment", which is a list of a register to receive the assignment,
+an assignment operator, and an expression) or a "control statement" (a
+list starting with a keyword, whose allowable syntax depends on the
+keyword).
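+
+ The overall shape of a CCL program can be sketched as plain Lisp
+data. The names `my-ccl-copy-program' and `my-ccl-program-parts' are
+hypothetical, and the control statements used here (`loop', `read',
+`write', `repeat') are assumptions about CCL's keywords; check them
+against `mule-ccl.el' before relying on them. Nothing below compiles
+or runs the program; it only shows the two-or-three-member list.

```elisp
(defvar my-ccl-copy-program
  '(1                                       ; buffer magnification
    ((loop (read r0) (write r0) (repeat)))  ; main block (a list of statements)
    ((write 10)))                           ; optional EOF block
  "Illustrative CCL program, held as quoted data.
It is meant to copy each input byte to the output and to write a
newline (code 10) once the input is exhausted.")

(defun my-ccl-program-parts (program)
  "Split PROGRAM into a list (MAGNIFICATION MAIN-BLOCK EOF-BLOCK).
EOF-BLOCK is nil when PROGRAM has only two members."
  (list (nth 0 program) (nth 1 program) (nth 2 program)))

;; (nth 0 (my-ccl-program-parts my-ccl-copy-program))
;;      => 1
```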