   1 This is ../info/lispref.info, produced by makeinfo version 4.0 from
   2 lispref/lispref.texi.
   3
   4 INFO-DIR-SECTION XEmacs Editor
   5 START-INFO-DIR-ENTRY
   6 * Lispref: (lispref).           XEmacs Lisp Reference Manual.
   7 END-INFO-DIR-ENTRY
   8
   9    Edition History:
  10
  11    GNU Emacs Lisp Reference Manual Second Edition (v2.01), May 1993
  12    GNU Emacs Lisp Reference Manual Further Revised (v2.02), August 1993
  13    Lucid Emacs Lisp Reference Manual (for 19.10) First Edition, March 1994
  14    XEmacs Lisp Programmer's Manual (for 19.12) Second Edition, April 1995
  15    GNU Emacs Lisp Reference Manual v2.4, June 1995
  16    XEmacs Lisp Programmer's Manual (for 19.13) Third Edition, July 1995
  17    XEmacs Lisp Reference Manual (for 19.14 and 20.0) v3.1, March 1996
  18    XEmacs Lisp Reference Manual (for 19.15 and 20.1, 20.2, 20.3) v3.2, April, May, November 1997
  19    XEmacs Lisp Reference Manual (for 21.0) v3.3, April 1998
  20
  21    Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995 Free Software
  22 Foundation, Inc.  Copyright (C) 1994, 1995 Sun Microsystems, Inc.
  23 Copyright (C) 1995, 1996 Ben Wing.
  24
  25    Permission is granted to make and distribute verbatim copies of this
  26 manual provided the copyright notice and this permission notice are
  27 preserved on all copies.
  28
  29    Permission is granted to copy and distribute modified versions of
  30 this manual under the conditions for verbatim copying, provided that the
  31 entire resulting derived work is distributed under the terms of a
  32 permission notice identical to this one.
  33
  34    Permission is granted to copy and distribute translations of this
  35 manual into another language, under the above conditions for modified
  36 versions, except that this permission notice may be stated in a
  37 translation approved by the Foundation.
  38
  39    Permission is granted to copy and distribute modified versions of
  40 this manual under the conditions for verbatim copying, provided also
  41 that the section entitled "GNU General Public License" is included
  42 exactly as in the original, and provided that the entire resulting
  43 derived work is distributed under the terms of a permission notice
  44 identical to this one.
  45
  46    Permission is granted to copy and distribute translations of this
  47 manual into another language, under the above conditions for modified
  48 versions, except that the section entitled "GNU General Public License"
  49 may be included in a translation approved by the Free Software
  50 Foundation instead of in the original English.
  51
  52 \1f
  53 File: lispref.info,  Node: Coding System Types,  Next: ISO 2022,  Up: Coding Systems
  54
  55 Coding System Types
  56 -------------------
  57
  58    The coding system type determines the basic algorithm XEmacs will
  59 use to decode or encode a data stream.  Character encodings will be
  60 converted to the MULE encoding, escape sequences processed, and newline
  61 sequences converted to XEmacs's internal representation.  There are
  62 three basic classes of coding system type: no-conversion, ISO-2022, and
  63 special.
  64
  65    No conversion allows you to look at the file's internal
  66 representation.  Since XEmacs is basically a text editor, "no
  67 conversion" does convert newline conventions by default.  (Use the
  68 'binary coding-system if this is not desired.)
  69
  70    ISO 2022 (*note ISO 2022::) is the basic international standard
  71 regulating use of "coded character sets for the exchange of data", i.e.,
  72 text streams.  ISO 2022 contains functions that make it possible to
  73 encode text streams to comply with restrictions of the Internet mail
  74 system and de facto restrictions of most file systems (e.g., use of the
  75 separator character in file names).  Coding systems which are not ISO
  76 2022 conformant can be difficult to handle.  Perhaps more important,
  77 they are not adaptable to multilingual information interchange, with
  78 the obvious exception of ISO 10646 (Unicode).  (Unicode is partially
  79 supported by XEmacs with the addition of the Lisp package ucs-conv.)
  80
  81    The special class of coding systems includes automatic detection,
  82 CCL (a "little language" embedded as an interpreter, useful for
  83 translating between variants of a single character set),
  84 non-ISO-2022-conformant encodings like Unicode, Shift JIS, and Big5,
  85 and MULE internal coding.  (NB: this list is based on XEmacs 21.2.
  86 Terminology may vary slightly for other versions of XEmacs and for GNU
  87 Emacs 20.)
  88
  89 `no-conversion'
  90      No conversion, for binary files, and a few special cases of
  91      non-ISO-2022 coding systems where conversion is done by hook
  92      functions (usually implemented in CCL).  On output, graphic
  93      characters that are not in ASCII or Latin-1 will be replaced by a
  94      `?'. (For a no-conversion-encoded buffer, these characters will
  95      only be present if you explicitly insert them.)
  96
  97 `iso2022'
  98      Any ISO-2022-compliant encoding.  Among others, this includes JIS
  99      (the Japanese encoding commonly used for e-mail), national
 100      variants of EUC (the standard Unix encoding for Japanese and other
 101      languages), and Compound Text (an encoding used in X11).  You can
 102      specify more specific information about the conversion with the
 103      FLAGS argument.
 104
 105 `ucs-4'
 106      ISO 10646 UCS-4 encoding.  A 31-bit fixed-width superset of
 107      Unicode.
 108
 109 `utf-8'
 110      ISO 10646 UTF-8 encoding.  A "file system safe" transformation
 111      format that can be used with both UCS-4 and Unicode.
 112
 113 `undecided'
 114      Automatic conversion.  XEmacs attempts to detect the coding system
 115      used in the file.
 116
 117 `shift-jis'
 118      Shift-JIS (a Japanese encoding commonly used in PC operating
 119      systems).
 120
 121 `big5'
 122      Big5 (the encoding commonly used for Traditional Chinese in Taiwan).
 123
 124 `ccl'
 125      The conversion is performed using a user-written pseudo-code
 126      program.  CCL (Code Conversion Language) is the name of this
 127      pseudo-code.  For example, CCL is used to map KOI8-R characters
 128      (an encoding for Russian Cyrillic) to ISO8859-5 (the form used
 129      internally by MULE).
 130
 131 `internal'
 132      Write out or read in the raw contents of the memory representing
 133      the buffer's text.  This is primarily useful for debugging
 134      purposes, and is only enabled when XEmacs has been compiled with
 135      `DEBUG_XEMACS' set (the `--debug' configure option).  *Warning*:
 136      Reading in a file using `internal' conversion can result in an
 137      internal inconsistency in the memory representing a buffer's text,
 138      which will produce unpredictable results and may cause XEmacs to
 139      crash.  Under normal circumstances you should never use `internal'
 140      conversion.
 141
 142 \1f
 143 File: lispref.info,  Node: ISO 2022,  Next: EOL Conversion,  Prev: Coding System Types,  Up: Coding Systems
 144
 145 ISO 2022
 146 ========
 147
 148    This section briefly describes the ISO 2022 encoding standard.  A
 149 more thorough treatment is available in the original document of ISO
 150 2022 as well as various national standards (such as JIS X 0202).
 151
 152    Character sets ("charsets") are classified into the following four
 153 categories, according to the number of characters in the charset:
 154 94-charset, 96-charset, 94x94-charset, and 96x96-charset.  This means
 155 that although an ISO 2022 coding system may have variable width
 156 characters, each charset used is fixed-width (in contrast to the MULE
 157 character set and UTF-8, for example).
 158
 159    ISO 2022 provides for switching between character sets via escape
 160 sequences.  This switching is somewhat complicated, because ISO 2022
 161 provides for both legacy applications like Internet mail that accept
 162 only 7 significant bits in some contexts (RFC 822 headers, for example),
 163 and more modern "8-bit clean" applications.  It also provides for
 164 compact and transparent representation of languages like Japanese which
 165 mix ASCII and a national script (even outside of computer programs).
 166
 167    First, ISO 2022 codified prevailing practice by dividing the code
 168 space into "control" and "graphic" regions.  The code points 0x00-0x1F
 169 and 0x80-0x9F are reserved for "control characters", while "graphic
 170 characters" must be assigned to code points in the regions 0x20-0x7F and
 171 0xA0-0xFF.  The positions 0x20 and 0x7F are special, and under some
 172 circumstances must be assigned the graphic character "ASCII SPACE" and
 173 the control character "ASCII DEL" respectively.
 174
 175    The various regions are given the name C0 (0x00-0x1F), GL
 176 (0x20-0x7F), C1 (0x80-0x9F), and GR (0xA0-0xFF).  GL and GR stand for
 177 "graphic left" and "graphic right", respectively, because of the
 178 standard method of displaying graphic character sets in tables with the
 179 high nibble indexing columns and the low nibble indexing rows.  I don't
 180 find it very intuitive, but these are called "registers".
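
   The four regions can be told apart by simple range checks.  The
following is a minimal sketch, not code from XEmacs; the function name
`my-iso2022-byte-region' is hypothetical, and bytes are passed as plain
integers so that the example does not depend on how characters are
represented.

```elisp
;; Hypothetical helper: classify one byte by the ISO 2022 region it
;; falls in.  BYTE is an integer from 0 to 255.
(defun my-iso2022-byte-region (byte)
  "Return the ISO 2022 region (`c0', `gl', `c1' or `gr') of BYTE."
  (cond ((< byte #x20) 'c0)   ; 0x00-0x1F: control characters
        ((< byte #x80) 'gl)   ; 0x20-0x7F: graphic left
        ((< byte #xA0) 'c1)   ; 0x80-0x9F: control characters
        (t 'gr)))             ; 0xA0-0xFF: graphic right

(mapcar 'my-iso2022-byte-region '(#x0A #x41 #x8E #xA4))
;; => (c0 gl c1 gr)
```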
 181
 182    An ISO 2022-conformant encoding for a graphic character set must use
 183 a fixed number of bytes per character, and the values must fit into a
 184 single register; that is, each byte must range over either 0x20-0x7F, or
 185 0xA0-0xFF.  It is not allowed to extend the range of the repertoire of a
 186 character set by using both ranges at the same time.  This is why a standard
 187 character set such as ISO 8859-1 is actually considered by ISO 2022 to
 188 be an aggregation of two character sets, ASCII and LATIN-1, and why it
 189 is technically incorrect to refer to ISO 8859-1 as "Latin 1".  Also, a
 190 single character's bytes must all be drawn from the same register; this
 191 is why Shift JIS (for Japanese) and Big 5 (for Chinese) are not ISO
 192 2022-compatible encodings.
 193
 194    The reason for this restriction becomes clear when you attempt to
 195 define an efficient, robust encoding for a language like Japanese.
 196 Like ISO 8859, Japanese encodings are aggregations of several character
 197 sets.  In practice, the vast majority of characters are drawn from the
 198 "JIS Roman" character set (a derivative of ASCII; it won't hurt to
 199 think of it as ASCII) and the JIS X 0208 standard "basic Japanese"
 200 character set including not only ideographic characters ("kanji") but
 201 syllabic Japanese characters ("kana"), a wide variety of symbols, and
 202 many alphabetic characters (Roman, Greek, and Cyrillic) as well.
 203 Although JIS X 0208 includes the whole Roman alphabet, as a 2-byte code
 204 it is not suited to programming; thus the inclusion of ASCII in the
 205 standard Japanese encodings.
 206
 207    For normal Japanese text such as in newspapers, a broad repertoire of
 208 approximately 3000 characters is used.  Evidently this won't fit into
 209 one byte; two must be used.  But much of the text processed by Japanese
 210 computers is computer source code, nearly all of which is ASCII.  A not
 211 insignificant portion of ordinary text is English (as such or as
 212 borrowed Japanese vocabulary) or other languages which can be represented
 213 at least approximately in ASCII, as well.  It seems reasonable then to
 214 represent ASCII in one byte, and JIS X 0208 in two.  And this is exactly
 215 what the Extended Unix Code for Japanese (EUC-JP) does.  ASCII is
 216 invoked to the GL register, and JIS X 0208 is invoked to the GR
 217 register.  Thus, each byte can be tested for its character set by
 218 looking at the high bit; if set, it is Japanese, if clear, it is ASCII.
 219 Furthermore, since control characters like newline can never be part of
 220 a graphic character, even in the case of corruption in transmission the
 221 stream will be resynchronized at every line break, on the order of 60-80
 222 bytes.  This coding system requires no escape sequences or special
 223 control codes to represent 99.9% of all Japanese text.
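
   The high-bit test described above can be sketched as follows.  This
is an illustration only, not the XEmacs decoder: the name
`my-euc-jp-classify' is hypothetical, the input is assumed to be
well-formed EUC-JP containing nothing but ASCII and JIS X 0208, and
control characters such as newline are simply lumped in with ASCII.

```elisp
;; Hypothetical helper: split a list of EUC-JP bytes (integers) into
;; characters by looking only at the high bit of each byte.
(defun my-euc-jp-classify (bytes)
  "Return a list of (CHARSET BYTE...) entries for the EUC-JP BYTES."
  (let ((result nil))
    (while bytes
      (if (and (/= 0 (logand (car bytes) #x80)) (cdr bytes))
          ;; High bit set: two GR bytes form one JIS X 0208 character.
          (progn
            (setq result (cons (list 'jisx0208
                                     (car bytes) (car (cdr bytes)))
                               result))
            (setq bytes (cdr (cdr bytes))))
        ;; High bit clear: a single byte, ASCII or a control character.
        (setq result (cons (list 'ascii (car bytes)) result))
        (setq bytes (cdr bytes))))
    (nreverse result)))

;; "A", one JIS X 0208 character (0xA4 0xA2), then a newline.
(my-euc-jp-classify '(#x41 #xA4 #xA2 #x0A))
;; => ((ascii 65) (jisx0208 164 162) (ascii 10))
```

   Because the newline can never be mistaken for half of a two-byte
character, a decoder that loses its place can start afresh at the next
line break.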
 224
 225    Note carefully the distinction between the character sets (ASCII and
 226 JIS X 0208), the encoding (EUC-JP), and the coding system (ISO 2022).
 227 The JIS X 0208 character set is used in three different encodings for
 228 Japanese, but in ISO-2022-JP it is invoked into GL (so the high bit is
 229 always clear), in EUC-JP it is invoked into GR (setting the high bit in
 230 the process), and in Shift JIS the high bit may be set or reset, and the
 231 significant bits are shifted within the 16-bit character so that the two
 232 main character sets can coexist with a third (the "halfwidth katakana"
 233 of JIS X 0201).  As the name implies, the ISO-2022-JP encoding is also a
 234 version of the ISO-2022 coding system.
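
   The difference between the GR invocation of EUC-JP and the bit
shifting of Shift JIS can be made concrete with a little arithmetic.
The sketch below is not taken from XEmacs; both function names are
hypothetical, and the Shift JIS formula is the commonly published
conversion for JIS X 0208 codes whose two bytes each lie in 0x21-0x7E.

```elisp
;; Hypothetical helpers working on a JIS X 0208 code given as two
;; 7-bit integers J1 and J2 (the form used as-is in ISO-2022-JP).

(defun my-jisx0208-to-euc-jp (j1 j2)
  "Return the EUC-JP bytes for J1 J2: the same bytes with the high bit set."
  (list (logior j1 #x80) (logior j2 #x80)))

(defun my-jisx0208-to-shift-jis (j1 j2)
  "Return the Shift JIS bytes for the JIS X 0208 code J1 J2."
  (let ((s1 (+ (/ (+ j1 1) 2) (if (< j1 #x5F) #x70 #xB0)))
        (s2 (if (= (% j1 2) 1)
                ;; Odd row: second byte lands in 0x40-0x9E, skipping 0x7F.
                (+ j2 (if (< j2 #x60) #x1F #x20))
              ;; Even row: second byte lands in 0x9F-0xFC.
              (+ j2 #x7E))))
    (list s1 s2)))

;; Hiragana A is JIS X 0208 code 0x24 0x22.
(my-jisx0208-to-euc-jp #x24 #x22)      ; => (164 162), i.e. 0xA4 0xA2
(my-jisx0208-to-shift-jis #x24 #x22)   ; => (130 160), i.e. 0x82 0xA0
```

   Note that the second Shift JIS byte may have its high bit either set
or clear, which is why the high bit alone cannot identify the character
set in that encoding.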
 235
 236    In order to systematically treat subsidiary character sets (like the
 237 "halfwidth katakana" already mentioned, and the "supplementary kanji" of
 238 JIS X 0212), four further registers are defined: G0, G1, G2, and G3.
 239 Unlike GL and GR, they are not logically distinguished by internal
 240 format.  Instead, the process of "invocation" mentioned earlier is
 241 broken into two steps: first, a character set is "designated" to one of
 242 the registers G0-G3 by use of an "escape sequence" of the form:
 243
 244              ESC [I] I F
 245
 246    where I is an intermediate character or characters in the range 0x20
 247 - 0x3F, and F, from the range 0x30-0x7F, is the final character
 248 identifying this charset.  (Final characters in the range 0x30-0x3F are
 249 reserved for private use and will never have a publicly registered
 250 meaning.)
 251
 252    Then that register is "invoked" to either GL or GR, either
 253 automatically (designations to G0 normally involve invocation to GL as
 254 well), or by use of shifting (affecting only the following character in
 255 the data stream) or locking (effective until the next designation or
 256 locking) control sequences.  An encoding conformant to ISO 2022 is
 257 typically defined by designating the initial contents of the G0-G3
 258 registers, specifying a 7- or 8-bit environment, and specifying whether
 259 further designations will be recognized.
 260
 261    Some examples of character sets and the registered final characters
 262 F used to designate them:
 263
 264 94-charset
 265      ASCII (B), left (J) and right (I) half of JIS X 0201, ...
 266
 267 96-charset
 268      Latin-1 (A), Latin-2 (B), Latin-3 (C), ...
 269
 270 94x94-charset
 271      GB2312 (A), JIS X 0208 (B), KSC5601 (C), ...
 272
 273 96x96-charset
 274      none for the moment
 275
 276    The meanings of the various characters in these sequences, where not
 277 specified by the ISO 2022 standard (such as the ESC character), are
 278 assigned by "ECMA", the European Computer Manufacturers Association.
 279
 280    The meanings of the intermediate characters are:
 281
 282              $ [0x24]: indicate charset of dimension 2 (94x94 or 96x96).
 283              ( [0x28]: designate to G0 a 94-charset whose final byte is F.
 284              ) [0x29]: designate to G1 a 94-charset whose final byte is F.
 285              * [0x2A]: designate to G2 a 94-charset whose final byte is F.
 286              + [0x2B]: designate to G3 a 94-charset whose final byte is F.
 287              , [0x2C]: designate to G0 a 96-charset whose final byte is F.
 288              - [0x2D]: designate to G1 a 96-charset whose final byte is F.
 289              . [0x2E]: designate to G2 a 96-charset whose final byte is F.
 290              / [0x2F]: designate to G3 a 96-charset whose final byte is F.
 291
 292    The comma may be used in files read and written only by MULE, as a
 293 MULE extension, but this is illegal in ISO 2022.  (The reason is that
 294 in ISO 2022 G0 must be a 94-member character set, with 0x20 assigned
 295 the value SPACE, and 0x7F assigned the value DEL.)
 296
 297    Here are examples of designations:
 298
 299              ESC ( B :              designate to G0 ASCII
 300              ESC - A :              designate to G1 Latin-1
 301              ESC $ ( A or ESC $ A : designate to G0 GB2312
 302              ESC $ ( B or ESC $ B : designate to G0 JISX0208
 303              ESC $ ) C :            designate to G1 KSC5601
 304
 305    (The short forms used to designate GB2312 and JIS X 0208 are for
 306 backwards compatibility; the long forms are preferred.)
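
   The way a designation is assembled from the intermediate characters
can be sketched as below.  The function name `my-iso2022-designation'
and its argument conventions are hypothetical; the sketch always
produces the long form and returns the sequence as a list of integers.

```elisp
;; Hypothetical helper: build the bytes of a designation sequence.
;; REGISTER is 0-3 (G0-G3), CHARS is 94 or 96, DIMENSION is 1 or 2,
;; and FINAL is the final byte F as an integer.
(defun my-iso2022-designation (register chars dimension final)
  "Return the escape sequence designating a charset, as a list of bytes."
  ;; `(' to `+' (0x28-0x2B) select G0-G3 for 94-charsets,
  ;; `,' to `/' (0x2C-0x2F) select G0-G3 for 96-charsets.
  (let ((intermediate (+ register (if (= chars 94) #x28 #x2C))))
    (if (= dimension 2)
        (list #x1B #x24 intermediate final)   ; ESC $ I F
      (list #x1B intermediate final))))       ; ESC I F

(my-iso2022-designation 0 94 1 #x42)   ; => (27 40 66),    ESC ( B
(my-iso2022-designation 1 96 1 #x41)   ; => (27 45 65),    ESC - A
(my-iso2022-designation 1 94 2 #x43)   ; => (27 36 41 67), ESC $ ) C
```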
 307
 308    To use a charset designated to G2 or G3, and to use a charset
 309 designated to G1 in a 7-bit environment, you must explicitly invoke G1,
 310 G2, or G3 into GL.  There are two types of invocation, Locking Shift
 311 (forever) and Single Shift (one character only).
 312
 313    Locking Shift is done as follows:
 314
 315              LS0 or SI (0x0F): invoke G0 into GL
 316              LS1 or SO (0x0E): invoke G1 into GL
 317              LS2:  invoke G2 into GL
 318              LS3:  invoke G3 into GL
 319              LS1R: invoke G1 into GR
 320              LS2R: invoke G2 into GR
 321              LS3R: invoke G3 into GR
 322
 323    Single Shift is done as follows:
 324
 325              SS2 or ESC N: invoke G2 into GL
 326              SS3 or ESC O: invoke G3 into GL
 327
 328    The shift functions (such as LS1R and SS3) are represented by control
 329 characters (from C1) in 8 bit environments and by escape sequences in 7
 330 bit environments.
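
   As an illustration of locking shifts in a 7-bit environment, the
following sketch tags each graphic byte with the charset currently
invoked into GL.  It handles only SI and SO, ignores designations and
single shifts, and is not the XEmacs decoder; the function name and the
charset symbols passed to it are hypothetical.

```elisp
;; Hypothetical helper: follow SI/SO locking shifts through BYTES.
;; G0 and G1 are symbols naming the charsets designated to those
;; registers.
(defun my-locking-shift-decode (bytes g0 g1)
  "Return a list of (CHARSET . BYTE) pairs for the graphic BYTES."
  (let ((gl g0)        ; G0 is invoked into GL initially
        (result nil))
    (while bytes
      (cond ((= (car bytes) #x0F) (setq gl g0))   ; SI (LS0)
            ((= (car bytes) #x0E) (setq gl g1))   ; SO (LS1)
            (t (setq result (cons (cons gl (car bytes)) result))))
      (setq bytes (cdr bytes)))
    (nreverse result)))

(my-locking-shift-decode '(#x41 #x0E #x30 #x21 #x0F #x42)
                         'ascii 'ksc5601)
;; => ((ascii . 65) (ksc5601 . 48) (ksc5601 . 33) (ascii . 66))
```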
 331
 332    (#### Ben says: I think the above is slightly incorrect.  It appears
 333 that SS2 invokes G2 into GR and SS3 invokes G3 into GR, whereas ESC N
 334 and ESC O behave as indicated.  The above definitions will not parse
 335 EUC-encoded text correctly, and it looks like the code in mule-coding.c
 336 has similar problems.)
 337
 338    Evidently there are a lot of ISO-2022-compliant ways of encoding
 339 multilingual text.  Many coding systems in actual use, such as X11's
 340 Compound Text, Japanese JUNET code, and so-called EUC (Extended UNIX
 341 Code), are variants of ISO 2022.
 342
 343    In MULE, we characterize a version of ISO 2022 by the following
 344 attributes:
 345
 346   1. The character sets initially designated to G0 through G3.
 347
 348   2. Whether short form designations are allowed for Japanese and
 349      Chinese.
 350
 351   3. Whether ASCII should be designated to G0 before control characters.
 352
 353   4. Whether ASCII should be designated to G0 at the end of line.
 354
 355   5. 7-bit environment or 8-bit environment.
 356
 357   6. Whether Locking Shifts are used or not.
 358
 359   7. Whether to use ASCII or the variant JIS X 0201-1976-Roman.
 360
 361   8. Whether to use JIS X 0208-1983 or the older version JIS X
 362      0208-1976.
 363
 364    (The last two are only for Japanese.)
 365
 366    By specifying these attributes, you can create any variant of ISO
 367 2022.
 368
 369    Here are several examples:
 370
 371      ISO-2022-JP -- Coding system used in Japanese email (RFC 1468).
 372              1. G0 <- ASCII, G1..3 <- never used
 373              2. Yes.
 374              3. Yes.
 375              4. Yes.
 376              5. 7-bit environment
 377              6. No.
 378              7. Use ASCII
 379              8. Use JIS X 0208-1983
 380
 381      ctext -- X11 Compound Text
 382              1. G0 <- ASCII, G1 <- Latin-1, G2,3 <- never used.
 383              2. No.
 384              3. No.
 385              4. Yes.
 386              5. 8-bit environment.
 387              6. No.
 388              7. Use ASCII.
 389              8. Use JIS X 0208-1983.
 390
 391      euc-china -- Chinese EUC.  Often called the "GB encoding", but that is
 392      technically incorrect.
 393              1. G0 <- ASCII, G1 <- GB 2312, G2,3 <- never used.
 394              2. No.
 395              3. Yes.
 396              4. Yes.
 397              5. 8-bit environment.
 398              6. No.
 399              7. Use ASCII.
 400              8. Use JIS X 0208-1983.
 401
 402      ISO-2022-KR -- Coding system used in Korean email.
 403              1. G0 <- ASCII, G1 <- KSC 5601, G2,3 <- never used.
 404              2. No.
 405              3. Yes.
 406              4. Yes.
 407              5. 7-bit environment.
 408              6. Yes.
 409              7. Use ASCII.
 410              8. Use JIS X 0208-1983.
 411
 412    MULE creates all of these coding systems by default.
 413
 414 \1f
 415 File: lispref.info,  Node: EOL Conversion,  Next: Coding System Properties,  Prev: ISO 2022,  Up: Coding Systems
 416
 417 EOL Conversion
 418 --------------
 419
 420 `nil'
 421      Automatically detect the end-of-line type (LF, CRLF, or CR).  Also
 422      generate subsidiary coding systems named `NAME-unix', `NAME-dos',
 423      and `NAME-mac', that are identical to this coding system but have
 424      an EOL-TYPE value of `lf', `crlf', and `cr', respectively.
 425
 426 `lf'
 427      The end of a line is marked externally using ASCII LF.  Since this
 428      is also the way that XEmacs represents an end-of-line internally,
 429      specifying this option results in no end-of-line conversion.  This
 430      is the standard format for Unix text files.
 431
 432 `crlf'
 433      The end of a line is marked externally using ASCII CRLF.  This is
 434      the standard format for MS-DOS text files.
 435
 436 `cr'
 437      The end of a line is marked externally using ASCII CR.  This is the
 438      standard format for Macintosh text files.
 439
 440 `t'
 441      Automatically detect the end-of-line type but do not generate
 442      subsidiary coding systems.  (This value is converted to `nil' when
 443      stored internally, and `coding-system-property' will return `nil'.)
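
   What automatic detection has to decide can be sketched as follows.
This is only an illustration that looks at the first line break in a
list of bytes; it is an assumption that this resembles the built-in
detection, which may well examine more of the data.  The function name
is hypothetical.

```elisp
;; Hypothetical helper: guess the end-of-line type of BYTES, a list of
;; integers, from the first line break found.
(defun my-detect-eol-type (bytes)
  "Return `crlf', `cr' or `lf', or nil if BYTES has no line break."
  (let ((type nil))
    (while (and bytes (not type))
      (cond ((= (car bytes) #x0A) (setq type 'lf))
            ((= (car bytes) #x0D)
             ;; CR followed by LF is the DOS convention, a lone CR the
             ;; Macintosh one.
             (setq type (if (and (cdr bytes) (= (car (cdr bytes)) #x0A))
                            'crlf
                          'cr))))
      (setq bytes (cdr bytes)))
    type))

(my-detect-eol-type '(#x61 #x0D #x0A #x62))   ; => crlf
(my-detect-eol-type '(#x61 #x0D #x62))        ; => cr
(my-detect-eol-type '(#x61 #x0A))             ; => lf
```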
 444
 445 \1f
 446 File: lispref.info,  Node: Coding System Properties,  Next: Basic Coding System Functions,  Prev: EOL Conversion,  Up: Coding Systems
 447
 448 Coding System Properties
 449 ------------------------
 450
 451 `mnemonic'
 452      String to be displayed in the modeline when this coding system is
 453      active.
 454
 455 `eol-type'
 456      End-of-line conversion to be used.  It should be one of the types
 457      listed in *Note EOL Conversion::.
 458
 459 `eol-lf'
 460      The coding system which is the same as this one, except that it
 461      uses the Unix line-breaking convention.
 462
 463 `eol-crlf'
 464      The coding system which is the same as this one, except that it
 465      uses the DOS line-breaking convention.
 466
 467 `eol-cr'
 468      The coding system which is the same as this one, except that it
 469      uses the Macintosh line-breaking convention.
 470
 471 `post-read-conversion'
 472      Function called after a file has been read in, to perform the
 473      decoding.  Called with two arguments, BEG and END, denoting a
 474      region of the current buffer to be decoded.
 475
 476 `pre-write-conversion'
 477      Function called before a file is written out, to perform the
 478      encoding.  Called with two arguments, BEG and END, denoting a
 479      region of the current buffer to be encoded.
 480
 481    The following additional properties are recognized if TYPE is
 482 `iso2022':
 483
 484 `charset-g0'
 485 `charset-g1'
 486 `charset-g2'
 487 `charset-g3'
 488      The character set initially designated to the G0 - G3 registers.
 489      The value should be one of
 490
 491         * A charset object (designate that character set)
 492
 493         * `nil' (do not ever use this register)
 494
 495         * `t' (no character set is initially designated to the
 496           register, but may be later on; this automatically sets the
 497           corresponding `force-g*-on-output' property)
 498
 499 `force-g0-on-output'
 500 `force-g1-on-output'
 501 `force-g2-on-output'
 502 `force-g3-on-output'
 503      If non-`nil', send an explicit designation sequence on output
 504      before using the specified register.
 505
 506 `short'
 507      If non-`nil', use the short forms `ESC $ @', `ESC $ A', and `ESC $
 508      B' on output in place of the full designation sequences `ESC $ (
 509      @', `ESC $ ( A', and `ESC $ ( B'.
 510
 511 `no-ascii-eol'
 512      If non-`nil', don't designate ASCII to G0 at each end of line on
 513      output.  Setting this to non-`nil' also suppresses other
 514      state-resetting that normally happens at the end of a line.
 515
 516 `no-ascii-cntl'
 517      If non-`nil', don't designate ASCII to G0 before control chars on
 518      output.
 519
 520 `seven'
 521      If non-`nil', use 7-bit environment on output.  Otherwise, use
 522      8-bit environment.
 523
 524 `lock-shift'
 525      If non-`nil', use locking-shift (SO/SI) instead of single-shift or
 526      designation by escape sequence.
 527
 528 `no-iso6429'
 529      If non-`nil', don't use ISO6429's direction specification.
 530
 531 `escape-quoted'
 532      If non-`nil', literal control characters that are the same as the
 533      beginning of a recognized ISO 2022 or ISO 6429 escape sequence (in
 534      particular, ESC (0x1B), SO (0x0E), SI (0x0F), SS2 (0x8E), SS3
 535      (0x8F), and CSI (0x9B)) are "quoted" with an escape character so
 536      that they can be properly distinguished from an escape sequence.
 537      (Note that doing this results in a non-portable encoding.) This
 538      encoding flag is used for byte-compiled files.  Note that ESC is a
 539      good choice for a quoting character because there are no escape
 540      sequences whose second byte is a character from the Control-0 or
 541      Control-1 character sets; this is explicitly disallowed by the ISO
 542      2022 standard.
 543
 544 `input-charset-conversion'
 545      A list of conversion specifications, specifying conversion of
 546      characters in one charset to another when decoding is performed.
 547      Each specification is a list of two elements: the source charset,
 548      and the destination charset.
 549
 550 `output-charset-conversion'
 551      A list of conversion specifications, specifying conversion of
 552      characters in one charset to another when encoding is performed.
 553      The form of each specification is the same as for
 554      `input-charset-conversion'.
 555
 556    The following additional properties are recognized (and required) if
 557 TYPE is `ccl':
 558
 559 `decode'
 560      CCL program used for decoding (converting to internal format).
 561
 562 `encode'
 563      CCL program used for encoding (converting to external format).
 564
 565    The following properties are used internally:  EOL-CR, EOL-CRLF,
 566 EOL-LF, and BASE.
 567
 568 \1f
 569 File: lispref.info,  Node: Basic Coding System Functions,  Next: Coding System Property Functions,  Prev: Coding System Properties,  Up: Coding Systems
 570
 571 Basic Coding System Functions
 572 -----------------------------
 573
 574  - Function: find-coding-system coding-system-or-name
 575      This function retrieves the coding system of the given name.
 576
 577      If CODING-SYSTEM-OR-NAME is a coding-system object, it is simply
 578      returned.  Otherwise, CODING-SYSTEM-OR-NAME should be a symbol.
 579      If there is no such coding system, `nil' is returned.  Otherwise
 580      the associated coding system object is returned.
 581
 582  - Function: get-coding-system name
 583      This function retrieves the coding system of the given name.  Same
 584      as `find-coding-system' except an error is signalled if there is no
 585      such coding system instead of returning `nil'.
 586
 587  - Function: coding-system-list
 588      This function returns a list of the names of all defined coding
 589      systems.
 590
 591  - Function: coding-system-name coding-system
 592      This function returns the name of the given coding system.
 593
 594  - Function: coding-system-base coding-system
 595      This function returns the base coding system of CODING-SYSTEM,
 596      that is, the one with an undecided EOL convention.
 597
 598  - Function: make-coding-system name type &optional doc-string props
 599      This function registers symbol NAME as a coding system.
 600
 601      TYPE describes the conversion method used and should be one of the
 602      types listed in *Note Coding System Types::.
 603
 604      DOC-STRING is a string describing the coding system.
 605
 606      PROPS is a property list, describing the specific nature of the
 607      coding system.  Recognized properties are as in *Note Coding
 608      System Properties::.
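
     As a rough sketch of how the arguments fit together, the form
     below defines a 7-bit Japanese coding system resembling
     ISO-2022-JP.  The name `my-7bit-jp' and its mnemonic are
     hypothetical, the charset name `ascii' assumes an XEmacs built
     with MULE, and the guard makes the form do nothing elsewhere.

```elisp
;; Sketch only: `my-7bit-jp' is a hypothetical name, and the property
;; values assume an XEmacs built with MULE support.
(if (and (featurep 'xemacs) (featurep 'mule))
    (make-coding-system
     'my-7bit-jp 'iso2022
     "Hypothetical 7-bit Japanese coding system, similar to ISO-2022-JP."
     '(charset-g0 ascii         ; G0 initially holds ASCII
       short t                  ; use the short forms ESC $ @, ESC $ A, ESC $ B
       seven t                  ; 7-bit environment on output
       mnemonic "My/7bit")))    ; string shown in the modeline
```

     Properties that are left out, such as `charset-g1' through
     `charset-g3', are not given a value by this form.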
 609
 610  - Function: copy-coding-system old-coding-system new-name
 611      This function copies OLD-CODING-SYSTEM to NEW-NAME.  If NEW-NAME
 612      does not name an existing coding system, a new one will be created.
 613
 614  - Function: subsidiary-coding-system coding-system eol-type
 615      This function returns the subsidiary coding system of
 616      CODING-SYSTEM with eol type EOL-TYPE.
 617
 618 \1f
 619 File: lispref.info,  Node: Coding System Property Functions,  Next: Encoding and Decoding Text,  Prev: Basic Coding System Functions,  Up: Coding Systems
 620
 621 Coding System Property Functions
 622 --------------------------------
 623
 624  - Function: coding-system-doc-string coding-system
 625      This function returns the doc string for CODING-SYSTEM.
 626
 627  - Function: coding-system-type coding-system
 628      This function returns the type of CODING-SYSTEM.
 629
 630  - Function: coding-system-property coding-system prop
 631      This function returns the PROP property of CODING-SYSTEM.
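
     These accessors can be combined to summarize a coding system.  The
     sketch below uses only functions described in this chapter; the
     name `my-coding-system-summary' is hypothetical, and the `fboundp'
     guard is an assumption added so that the function returns `nil'
     instead of failing where these functions are not available.

```elisp
;; Hypothetical helper built on the accessors described above.
(defun my-coding-system-summary (name)
  "Return (NAME TYPE MNEMONIC EOL-TYPE) for the coding system NAME.
Return nil if NAME is not defined or the functions are unavailable."
  (if (fboundp 'find-coding-system)
      (let ((cs (find-coding-system name)))
        (if cs
            (list (coding-system-name cs)
                  (coding-system-type cs)
                  (coding-system-property cs 'mnemonic)
                  (coding-system-property cs 'eol-type))))))

;; Example call; the result depends on the coding systems defined in
;; the running XEmacs.
(my-coding-system-summary 'binary)
```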
 632
 633 \1f
 634 File: lispref.info,  Node: Encoding and Decoding Text,  Next: Detection of Textual Encoding,  Prev: Coding System Property Functions,  Up: Coding Systems
 635
 636 Encoding and Decoding Text
 637 --------------------------
 638
 639  - Function: decode-coding-region start end coding-system &optional
 640           buffer
 641      This function decodes the text between START and END which is
 642      encoded in CODING-SYSTEM.  This is useful if you've read in
 643      encoded text from a file without decoding it (e.g. you read in a
 644      JIS-formatted file but used the `binary' or `no-conversion' coding
 645      system, so that it shows up as `^[$B!<!+^[(B').  The length of the
 646      encoded text is returned.  BUFFER defaults to the current buffer
 647      if unspecified.
 648
 649  - Function: encode-coding-region start end coding-system &optional
 650           buffer
 651      This function encodes the text between START and END using
 652      CODING-SYSTEM.  This will, for example, convert Japanese
 653      characters into stuff such as `^[$B!<!+^[(B' if you use the JIS
 654      encoding.  The length of the encoded text is returned.  BUFFER
 655      defaults to the current buffer if unspecified.
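
     A minimal sketch of decoding text that was read in without
     conversion follows.  The helper name `my-decode-string' is
     hypothetical, and the example assumes that a coding system named
     `iso-2022-jp' is defined, which requires MULE support.

```elisp
;; Hypothetical helper: decode RAW, a string of undecoded bytes, in a
;; temporary buffer and return the resulting text.
(defun my-decode-string (raw coding-system)
  "Decode RAW using CODING-SYSTEM and return the decoded string."
  (with-temp-buffer
    (insert raw)
    (decode-coding-region (point-min) (point-max) coding-system)
    (buffer-string)))

;; ESC $ B designates JIS X 0208 to G0, the two bytes `$"' select one
;; character, and ESC ( B switches back to ASCII.
(my-decode-string "\e$B$\"\e(B" 'iso-2022-jp)
```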
 656
 657 \1f
 658 File: lispref.info,  Node: Detection of Textual Encoding,  Next: Big5 and Shift-JIS Functions,  Prev: Encoding and Decoding Text,  Up: Coding Systems
 659
 660 Detection of Textual Encoding
 661 -----------------------------
 662
 663  - Function: coding-category-list
 664      This function returns a list of all recognized coding categories.
 665
 666  - Function: set-coding-priority-list list
 667      This function changes the priority order of the coding categories.
 668      LIST should be a list of coding categories, in descending order of
 669      priority.  Unspecified coding categories will be lower in priority
 670      than all specified ones, in the same relative order they were in
 671      previously.
 672
 673  - Function: coding-priority-list
 674      This function returns a list of coding categories in descending
 675      order of priority.
 676
 677  - Function: set-coding-category-system coding-category coding-system
 678      This function changes the coding system associated with a coding
 679      category.
 680
 681  - Function: coding-category-system coding-category
 682      This function returns the coding system associated with a coding
 683      category.
 684
 685  - Function: detect-coding-region start end &optional buffer
 686      This function detects the coding system of the text in the region
 687      between START and END.  The return value is a list of possible
 688      coding systems ordered by priority.  If only ASCII characters are
 689      found, it returns `autodetect' or one of its subsidiary coding
 690      systems according to the detected end-of-line type.  The optional
 691      argument BUFFER defaults to the current buffer.
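
        The following sketch shows one way the functions in this node
     might be combined.  Both helper names are hypothetical, and no
     particular coding category symbols are assumed: the category passed
     in must be one of those returned by `coding-category-list'.

```elisp
;; Hypothetical helpers built only on the functions documented above.
(defun my-prefer-coding-category (category)
  "Give CATEGORY the highest detection priority.
Return the resulting priority list."
  (if (memq category (coding-category-list))
      (progn
        ;; Unspecified categories keep their relative order below
        ;; CATEGORY, so a one-element list is enough.
        (set-coding-priority-list (list category))
        (coding-priority-list))
    (error "Unknown coding category: %s" category)))

(defun my-best-guess-coding-system (start end &optional buffer)
  "Return the highest-priority guess for the text between START and END."
  (let ((result (detect-coding-region start end buffer)))
    ;; The result is documented as a list, but a single coding system
    ;; may come back for pure ASCII text, so handle both shapes.
    (if (consp result) (car result) result)))
```

        Handling both a list and a single value in
     `my-best-guess-coding-system' is a defensive choice based on the
     description of `detect-coding-region' above, not on a reading of
     the implementation.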
 692
 693 \1f
 694 File: lispref.info,  Node: Big5 and Shift-JIS Functions,  Next: Predefined Coding Systems,  Prev: Detection of Textual Encoding,  Up: Coding Systems
 695
 696 Big5 and Shift-JIS Functions
 697 ----------------------------
 698
 699    These are special functions for working with the non-standard
 700 Shift-JIS and Big5 encodings.
 701
 702  - Function: decode-shift-jis-char code
 703      This function decodes a JIS X 0208 character from the Shift-JIS
 704      coding system.  CODE is the character code in Shift-JIS as a cons
 705      of two bytes.  The corresponding character is returned.
 706
 707  - Function: encode-shift-jis-char ch
 708      This function encodes a JIS X 0208 character CH in the Shift-JIS
 709      coding system.  The corresponding character code in Shift-JIS is
 710      returned as a cons of two bytes.
 711
 712  - Function: decode-big5-char code
 713      This function decodes a Big5 character CODE of BIG5 coding-system.
 714      CODE is the character code in BIG5.  The corresponding character
 715      is returned.
 716
 717  - Function: encode-big5-char ch
 718      This function encodes the Big5 character CH to BIG5
 719      coding-system.  The corresponding character code in Big5 is
 720      returned.
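
        A minimal sketch of how the Shift-JIS pair might be used
     together.  The predicate name `my-shift-jis-round-trip-p' is
     hypothetical, and the sketch assumes CH is a JIS X 0208 character,
     since that is the only case the descriptions above cover.

```elisp
;; Hypothetical helper: checks that encoding and then decoding a
;; character through Shift-JIS gives back the same character.
(defun my-shift-jis-round-trip-p (ch)
  "Return non-nil if CH survives a Shift-JIS encode/decode round trip."
  (let ((code (encode-shift-jis-char ch)))   ; a cons of two bytes
    (equal ch (decode-shift-jis-char code))))
```

        A code can also be built by hand as a cons of two bytes, for
     example `(decode-shift-jis-char '(130 . 160))', where 130 and 160
     are the decimal forms of the Shift-JIS bytes 0x82 0xA0.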
 721
 722 \1f
 723 File: lispref.info,  Node: Predefined Coding Systems,  Prev: Big5 and Shift-JIS Functions,  Up: Coding Systems
 724
 725 Coding Systems Implemented
 726 --------------------------
 727
 728    MULE initializes most of the commonly used coding systems at XEmacs's
 729 startup.  A few others are initialized only when the relevant language
 730 environment is selected and support libraries are loaded.  (NB: The
 731 following list is based on XEmacs 21.2.19, the development branch at the
 732 time of writing.  The list may be somewhat different for other
 733 versions.  Recent versions of GNU Emacs 20 implement a few more rare
 734 coding systems; work is being done to port these to XEmacs.)
 735
 736    Unfortunately, there is not a consistent naming convention for
 737 character sets, and for practical purposes coding systems often take
 738 their name from their principal character sets (ASCII, KOI8-R, Shift
 739 JIS).  Others take their names from the coding system (ISO-2022-JP,
 740 EUC-KR), and a few from their non-text usages (internal, binary).  To
 741 provide for this, and for the fact that many coding systems have
 742 several common names, an aliasing system is provided.  Finally, some
 743 effort has been made to use names that are registered as MIME charsets
 744 (this is why the name 'shift_jis contains that un-Lisp-y underscore).
 745
 746    There is a systematic naming convention regarding end-of-line (EOL)
 747 conventions for different systems.  A coding system whose name ends in
 748 "-unix" forces the assumption that lines are broken by newlines (0x0A).
 749 A coding system whose name ends in "-mac" forces the assumption that
 750 lines are broken by ASCII CRs (0x0D).  A coding system whose name ends
 751 in "-dos" forces the assumption that lines are broken by CRLF sequences
 752 (0x0D 0x0A).  These subsidiary coding systems are automatically derived
 753 from a base coding system.  Use of the base coding system implies
 754 autodetection of the text file convention.  (Because the -unix, -mac,
 755 and -dos variants are derived from a base system, they show up as
 756 "aliases" in `list-coding-systems'.)  These subsidiaries have a
 757 consistent modeline indicator as well.  "-dos" coding systems have ":T"
 758 appended to their modeline indicator, while "-mac" coding systems have
 759 ":t" appended (e.g., "ISO8:t" for iso-2022-8-mac).
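
        The naming convention above is regular enough to compute.  The
     sketch below builds the name of a subsidiary coding system from a
     base name; `my-eol-subsidiary-name' is a hypothetical helper, and
     it only constructs a symbol without checking that such a coding
     system is actually defined.

```elisp
;; Hypothetical helper illustrating the -unix/-dos/-mac naming rule.
(defun my-eol-subsidiary-name (base eol-type)
  "Return the symbol naming the EOL-TYPE subsidiary of coding system BASE.
EOL-TYPE must be one of the symbols `unix', `dos' or `mac'."
  (if (memq eol-type '(unix dos mac))
      ;; `%s' prints a symbol by its name, so this joins the two
      ;; names with a hyphen.
      (intern (format "%s-%s" base eol-type))
    (error "Unknown EOL type: %s" eol-type)))
```

        For instance, `(my-eol-subsidiary-name 'iso-2022-8 'mac)'
     evaluates to the symbol `iso-2022-8-mac'.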
 760
 761    In the following table, each coding system is given with its mode
 762 line indicator in parentheses.  Non-textual coding systems are listed
 763 first, followed by textual coding systems and their aliases. (The
 764 coding system subsidiary modeline indicators ":T" and ":t" will be
 765 omitted from the table of coding systems.)
 766
 767    ### SJT 1999-08-23 Maybe should order these by language?  Definitely
 768 need language usage for the ISO-8859 family.
 769
 770    Note that although true coding system aliases have been implemented
 771 for XEmacs 21.2, the coding system initialization has not yet been
 772 converted as of 21.2.19.  So coding systems described as aliases have
 773 the same properties as the aliased coding system, but will not be equal
 774 as Lisp objects.
 775
 776 `automatic-conversion'
 777 `undecided'
 778 `undecided-dos'
 779 `undecided-mac'
 780 `undecided-unix'
 781      Modeline indicator: `Auto'.  A type `undecided' coding system.
 782      Attempts to determine an appropriate coding system from file
 783      contents or the environment.
 784
 785 `raw-text'
 786 `no-conversion'
 787 `raw-text-dos'
 788 `raw-text-mac'
 789 `raw-text-unix'
 790 `no-conversion-dos'
 791 `no-conversion-mac'
 792 `no-conversion-unix'
 793      Modeline indicator: `Raw'.  A type `no-conversion' coding system,
 794      which converts only line-break-codes.  An implementation quirk
 795      means that this coding system is also used for ISO8859-1.
 796
 797 `binary'
 798      Modeline indicator: `Binary'.  A type `no-conversion' coding
 799      system which does no character coding or EOL conversions.  An
 800      alias for `raw-text-unix'.
 801
 802 `alternativnyj'
 803 `alternativnyj-dos'
 804 `alternativnyj-mac'
 805 `alternativnyj-unix'
 806      Modeline indicator: `Cy.Alt'.  A type `ccl' coding system used for
 807      Alternativnyj, an encoding of the Cyrillic alphabet.
 808
 809 `big5'
 810 `big5-dos'
 811 `big5-mac'
 812 `big5-unix'
 813      Modeline indicator: `Zh/Big5'.  A type `big5' coding system used
 814      for BIG5, the most common encoding of traditional Chinese as used
 815      in Taiwan.
 816
 817 `cn-gb-2312'
 818 `cn-gb-2312-dos'
 819 `cn-gb-2312-mac'
 820 `cn-gb-2312-unix'
 821      Modeline indicator: `Zh-GB/EUC'.  A type `iso2022' coding system
 822      used for simplified Chinese (as used in the People's Republic of
 823      China), with the `ascii' (G0), `chinese-gb2312' (G1), and `sisheng'
 824      (G2) character sets initially designated.  Chinese EUC (Extended
 825      Unix Code).
 826
 827 `ctext-hebrew'
 828 `ctext-hebrew-dos'
 829 `ctext-hebrew-mac'
 830 `ctext-hebrew-unix'
 831      Modeline indicator: `CText/Hbrw'.  A type `iso2022' coding system
 832      with the `ascii' (G0) and `hebrew-iso8859-8' (G1) character sets
 833      initially designated for Hebrew.
 834
 835 `ctext'
 836 `ctext-dos'
 837 `ctext-mac'
 838 `ctext-unix'
 839      Modeline indicator: `CText'.  A type `iso2022' 8-bit coding system
 840      with the `ascii' (G0) and `latin-iso8859-1' (G1) character sets
 841      initially designated.  X11 Compound Text Encoding.  Often
 842      mistakenly recognized instead of EUC encodings; the usual cause
 843      is an inappropriate setting of `coding-priority-list'.
 844
 845 `escape-quoted'
 846      Modeline indicator: `ESC/Quot'.  A type `iso2022' 8-bit coding
 847      system with the `ascii' (G0) and `latin-iso8859-1' (G1) character
 848      sets initially designated and escape quoting.  Unix EOL conversion
 849      (i.e., no conversion).  It is used for .ELC files.
 850
 851 `euc-jp'
 852 `euc-jp-dos'
 853 `euc-jp-mac'
 854 `euc-jp-unix'
 855      Modeline indicator: `Ja/EUC'.  A type `iso2022' 8-bit coding system
 856      with `ascii' (G0), `japanese-jisx0208' (G1), `katakana-jisx0201'
 857      (G2), and `japanese-jisx0212' (G3) initially designated.  Japanese
 858      EUC (Extended Unix Code).
 859
 860 `euc-kr'
 861 `euc-kr-dos'
 862 `euc-kr-mac'
 863 `euc-kr-unix'
 864      Modeline indicator: `ko/EUC'.  A type `iso2022' 8-bit coding system
 865      with `ascii' (G0) and `korean-ksc5601' (G1) initially designated.
 866      Korean EUC (Extended Unix Code).
 867
 868 `hz-gb-2312'
 869      Modeline indicator: `Zh-GB/Hz'.  A type `no-conversion' coding
 870      system with Unix EOL convention (i.e., no conversion) using
 871      post-read-decode and pre-write-encode functions to translate the
 872      Hz/ZW coding system used for Chinese.
 873
 874 `iso-2022-7bit'
 875 `iso-2022-7bit-unix'
 876 `iso-2022-7bit-dos'
 877 `iso-2022-7bit-mac'
 878 `iso-2022-7'
 879      Modeline indicator: `ISO7'.  A type `iso2022' 7-bit coding system
 880      with `ascii' (G0) initially designated.  Other character sets must
 881      be explicitly designated to be used.
 882
 883 `iso-2022-7bit-ss2'
 884 `iso-2022-7bit-ss2-dos'
 885 `iso-2022-7bit-ss2-mac'
 886 `iso-2022-7bit-ss2-unix'
 887      Modeline indicator: `ISO7/SS'.  A type `iso2022' 7-bit coding
 888      system with `ascii' (G0) initially designated.  Other character
 889      sets must be explicitly designated to be used.  SS2 is used to
 890      invoke a 96-charset, one character at a time.
 891
 892 `iso-2022-8'
 893 `iso-2022-8-dos'
 894 `iso-2022-8-mac'
 895 `iso-2022-8-unix'
 896      Modeline indicator: `ISO8'.  A type `iso2022' 8-bit coding system
 897      with `ascii' (G0) and `latin-iso8859-1' (G1) initially designated.
 898      Other character sets must be explicitly designated to be used.
 899      No single-shift or locking-shift.
 900
 901 `iso-2022-8bit-ss2'
 902 `iso-2022-8bit-ss2-dos'
 903 `iso-2022-8bit-ss2-mac'
 904 `iso-2022-8bit-ss2-unix'
 905      Modeline indicator: `ISO8/SS'.  A type `iso2022' 8-bit coding
 906      system with `ascii' (G0) and `latin-iso8859-1' (G1) initially
 907      designated.  Other character sets must be explicitly designated to
 908      be used.  SS2 is used to invoke a 96-charset, one character at a
 909      time.
 910
 911 `iso-2022-int-1'
 912 `iso-2022-int-1-dos'
 913 `iso-2022-int-1-mac'
 914 `iso-2022-int-1-unix'
 915      Modeline indicator: `INT-1'.  A type `iso2022' 7-bit coding system
 916      with `ascii' (G0) and `korean-ksc5601' (G1) initially designated.
 917      ISO-2022-INT-1.
 918
 919 `iso-2022-jp-1978-irv'
 920 `iso-2022-jp-1978-irv-dos'
 921 `iso-2022-jp-1978-irv-mac'
 922 `iso-2022-jp-1978-irv-unix'
 923      Modeline indicator: `Ja-78/7bit'.  A type `iso2022' 7-bit coding
 924      system.  For compatibility with old Japanese terminals; if you
 925      need to know, look at the source.
 926
 927 `iso-2022-jp'
 928 `iso-2022-jp-2 (ISO7/SS)'
 929 `iso-2022-jp-dos'
 930 `iso-2022-jp-mac'
 931 `iso-2022-jp-unix'
 932 `iso-2022-jp-2-dos'
 933 `iso-2022-jp-2-mac'
 934 `iso-2022-jp-2-unix'
 935      Modeline indicator: `MULE/7bit'.  A type `iso2022' 7-bit coding
 936      system with `ascii' (G0) initially designated, and complex
 937      specifications to ensure backward compatibility with old Japanese
 938      systems.  Used for communication with mail and news in Japan.  The
 939      "-2" versions also use SS2 to invoke a 96-charset one character at
 940      a time.
 941
 942 `iso-2022-kr'
 943      Modeline indicator: `Ko/7bit'.  A type `iso2022' 7-bit coding
 944      system with `ascii' (G0) and `korean-ksc5601' (G1) initially
 945      designated.  Used for e-mail in Korea.
 946
 947 `iso-2022-lock'
 948 `iso-2022-lock-dos'
 949 `iso-2022-lock-mac'
 950 `iso-2022-lock-unix'
 951      Modeline indicator: `ISO7/Lock'.  A type `iso2022' 7-bit coding
 952      system with `ascii' (G0) initially designated, using Locking-Shift
 953      to invoke a 96-charset.
 954
 955 `iso-8859-1'
 956 `iso-8859-1-dos'
 957 `iso-8859-1-mac'
 958 `iso-8859-1-unix'
 959      Due to implementation, this is not a type `iso2022' coding system,
 960      but rather an alias for the `raw-text' coding system.
 961
 962 `iso-8859-2'
 963 `iso-8859-2-dos'
 964 `iso-8859-2-mac'
 965 `iso-8859-2-unix'
 966      Modeline indicator: `MIME/Ltn-2'.  A type `iso2022' coding system
 967      with `ascii' (G0) and `latin-iso8859-2' (G1) initially invoked.
 968
 969 `iso-8859-3'
 970 `iso-8859-3-dos'
 971 `iso-8859-3-mac'
 972 `iso-8859-3-unix'
 973      Modeline indicator: `MIME/Ltn-3'.  A type `iso2022' coding system
 974      with `ascii' (G0) and `latin-iso8859-3' (G1) initially invoked.
 975
 976 `iso-8859-4'
 977 `iso-8859-4-dos'
 978 `iso-8859-4-mac'
 979 `iso-8859-4-unix'
 980      Modeline indicator: `MIME/Ltn-4'.  A type `iso2022' coding system
 981      with `ascii' (G0) and `latin-iso8859-4' (G1) initially invoked.
 982
 983 `iso-8859-5'
 984 `iso-8859-5-dos'
 985 `iso-8859-5-mac'
 986 `iso-8859-5-unix'
 987      Modeline indicator: `ISO8/Cyr'.  A type `iso2022' coding system
 988      with `ascii' (G0) and `cyrillic-iso8859-5' (G1) initially invoked.
 989
 990 `iso-8859-7'
 991 `iso-8859-7-dos'
 992 `iso-8859-7-mac'
 993 `iso-8859-7-unix'
 994      Modeline indicator: `Grk'.  A type `iso2022' coding system with
 995      `ascii' (G0) and `greek-iso8859-7' (G1) initially invoked.
 996
 997 `iso-8859-8'
 998 `iso-8859-8-dos'
 999 `iso-8859-8-mac'
1000 `iso-8859-8-unix'
1001      Modeline indicator: `MIME/Hbrw'.  A type `iso2022' coding system
1002      with `ascii' (G0) and `hebrew-iso8859-8' (G1) initially invoked.
1003
1004 `iso-8859-9'
1005 `iso-8859-9-dos'
1006 `iso-8859-9-mac'
1007 `iso-8859-9-unix'
1008      Modeline indicator: `MIME/Ltn-5'.  A type `iso2022' coding system
1009      with `ascii' (G0) and `latin-iso8859-9' (G1) initially invoked.
1010
1011 `koi8-r'
1012 `koi8-r-dos'
1013 `koi8-r-mac'
1014 `koi8-r-unix'
1015      Modeline indicator: `KOI8'.  A type `ccl' coding-system used for
1016      KOI8-R, an encoding of the Cyrillic alphabet.
1017
1018 `shift_jis'
1019 `shift_jis-dos'
1020 `shift_jis-mac'
1021 `shift_jis-unix'
1022      Modeline indicator: `Ja/SJIS'.  A type `shift-jis' coding-system
1023      implementing the Shift-JIS encoding for Japanese.  The underscore
1024      is to conform to the MIME charset implementing this encoding.
1025
1026 `tis-620'
1027 `tis-620-dos'
1028 `tis-620-mac'
1029 `tis-620-unix'
1030      Modeline indicator: `TIS620'.  A type `ccl' encoding for Thai.  The
1031      external encoding is defined by TIS620; the internal encoding is
1032      peculiar to MULE, and called `thai-xtis'.
1033
1034 `viqr'
1035      Modeline indicator: `VIQR'.  A type `no-conversion' coding system
1036      with Unix EOL convention (i.e., no conversion) using
1037      post-read-decode and pre-write-encode functions to translate the
1038      VIQR coding system for Vietnamese.
1039
1040 `viscii'
1041 `viscii-dos'
1042 `viscii-mac'
1043 `viscii-unix'
1044      Modeline indicator: `VISCII'.  A type `ccl' coding-system used for
1045      VISCII 1.1 for Vietnamese.  Differs slightly from VSCII; VISCII is
1046      given priority by XEmacs.
1047
1048 `vscii'
1049 `vscii-dos'
1050 `vscii-mac'
1051 `vscii-unix'
1052      Modeline indicator: `VSCII'.  A type `ccl' coding-system used for
1053      VSCII 1.1 for Vietnamese.  Differs slightly from VISCII, which is
1054      given priority by XEmacs.  Use `(prefer-coding-system
1055      'vietnamese-vscii)' to give priority to VSCII.
1056
1057 \1f
1058 File: lispref.info,  Node: CCL,  Next: Category Tables,  Prev: Coding Systems,  Up: MULE
1059
1060 CCL
1061 ===
1062
1063    CCL (Code Conversion Language) is a simple structured programming
1064 language designed for character coding conversions.  A CCL program is
1065 compiled to CCL code (represented by a vector of integers) and executed
1066 by the CCL interpreter embedded in Emacs.  The CCL interpreter
1067 implements a virtual machine with 8 registers called `r0', ..., `r7', a
1068 number of control structures, and some I/O operators.  Take care when
1069 using registers `r0' (used in implicit "set" statements) and especially
1070 `r7' (used internally by several statements and operations, especially
1071 for multiple return values and I/O operations).
1072
1073    CCL is used for code conversion during process I/O and file I/O for
1074 non-ISO2022 coding systems.  (It is the only way for a user to specify a
1075 code conversion function.)  It is also used for calculating the code
1076 point of an X11 font from a character code.  However, since CCL is
1077 designed as a powerful programming language, it can be used for more
1078 generic calculation where efficiency is demanded.  A combination of
1079 three or more arithmetic operations can be calculated faster by CCL than
1080 by Emacs Lisp.
1081
1082    *Warning:*  The code in `src/mule-ccl.c' and
1083 `$packages/lisp/mule-base/mule-ccl.el' is the definitive description of
1084 CCL's semantics.  The previous version of this section contained
1085 several typos and obsolete names left from earlier versions of MULE,
1086 and many may remain.  (I am not an experienced CCL programmer; the few
1087 who know CCL well find writing English painful.)
1088
1089    A CCL program transforms an input data stream into an output data
1090 stream.  The input stream, held in a buffer of constant bytes, is left
1091 unchanged.  The buffer may be filled by an external input operation,
1092 taken from an Emacs buffer, or taken from a Lisp string.  The output
1093 buffer is a dynamic array of bytes, which can be written by an external
1094 output operation, inserted into an Emacs buffer, or returned as a Lisp
1095 string.
1096
1097    A CCL program is a (Lisp) list containing two or three members.  The
1098 first member is the "buffer magnification", which indicates the
1099 required minimum size of the output buffer as a multiple of the input
1100 buffer.  It is followed by the "main block" which executes while there
1101 is input remaining, and an optional "EOF block" which is executed when
1102 the input is exhausted.  Both the main block and the EOF block are CCL
1103 blocks.
1104
1105    A "CCL block" is either a CCL statement or list of CCL statements.
1106 A "CCL statement" is either a "set statement" (either an integer or an
1107 "assignment", which is a list of a register to receive the assignment,
1108 an assignment operator, and an expression) or a "control statement" (a
1109 list starting with a keyword, whose allowable syntax depends on the
1110 keyword).
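
        To make the structure described above concrete, here is a small
     sketch of a complete CCL program written as Lisp data.  The
     variable name `my-ccl-copy-program' is hypothetical, and the
     program is only a quoted list here; compiling and running it is a
     separate step.

```elisp
;; A CCL program is ordinary Lisp data.  This hypothetical example
;; copies its input to its output one byte at a time.
(defconst my-ccl-copy-program
  '(1                       ; buffer magnification
    ((loop                  ; main block: a list holding one statement
      (read r0)             ; control statement: read one byte into r0
      (write-repeat r0)))   ; write r0, then jump to the head of the loop
    (end))                  ; optional EOF block: a single statement
  "Sketch of a three-member CCL program that copies input to output.")
```

        The main block is written as a list of statements and the EOF
     block as a bare statement, to show that a CCL block may take
     either form.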
1111
1112 * Menu:
1113
1114 * CCL Syntax::          CCL program syntax in BNF notation.
1115 * CCL Statements::      Semantics of CCL statements.
1116 * CCL Expressions::     Operators and expressions in CCL.
1117 * Calling CCL::         Running CCL programs.
1118 * CCL Examples::        The encoding functions for Big5 and KOI-8.
1119
1120 \1f
1121 File: lispref.info,  Node: CCL Syntax,  Next: CCL Statements,  Up: CCL
1122
1123 CCL Syntax
1124 ----------
1125
1126    The full syntax of a CCL program in BNF notation:
1127
1128 CCL_PROGRAM :=
1129         (BUFFER_MAGNIFICATION
1130          CCL_MAIN_BLOCK
1131          [ CCL_EOF_BLOCK ])
1132
1133 BUFFER_MAGNIFICATION := integer
1134 CCL_MAIN_BLOCK := CCL_BLOCK
1135 CCL_EOF_BLOCK := CCL_BLOCK
1136
1137 CCL_BLOCK :=
1138         STATEMENT | (STATEMENT [STATEMENT ...])
1139 STATEMENT :=
1140         SET | IF | BRANCH | LOOP | REPEAT | BREAK | READ | WRITE
1141         | CALL | END
1142
1143 SET :=
1144         (REG = EXPRESSION)
1145         | (REG ASSIGNMENT_OPERATOR EXPRESSION)
1146         | integer
1147
1148 EXPRESSION := ARG | (EXPRESSION OPERATOR ARG)
1149
1150 IF := (if EXPRESSION CCL_BLOCK [CCL_BLOCK])
1151 BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
1152 LOOP := (loop STATEMENT [STATEMENT ...])
1153 BREAK := (break)
1154 REPEAT :=
1155         (repeat)
1156         | (write-repeat [REG | integer | string])
1157         | (write-read-repeat REG [integer | ARRAY])
1158 READ :=
1159         (read REG ...)
1160         | (read-if (REG OPERATOR ARG) CCL_BLOCK CCL_BLOCK)
1161         | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
1162 WRITE :=
1163         (write REG ...)
1164         | (write EXPRESSION)
1165         | (write integer) | (write string) | (write REG ARRAY)
1166         | string
1167 CALL := (call ccl-program-name)
1168 END := (end)
1169
1170 REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
1171 ARG := REG | integer
1172 OPERATOR :=
1173         + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
1174         | < | > | == | <= | >= | != | de-sjis | en-sjis
1175 ASSIGNMENT_OPERATOR :=
1176         += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
1177 ARRAY := '[' integer ... ']'
1178
1179 \1f
1180 File: lispref.info,  Node: CCL Statements,  Next: CCL Expressions,  Prev: CCL Syntax,  Up: CCL
1181
1182 CCL Statements
1183 --------------
1184
1185    The Emacs Code Conversion Language provides the following statement
1186 types: "set", "if", "branch", "loop", "repeat", "break", "read",
1187 "write", "call", and "end".
1188
1189 Set statement:
1190 ==============
1191
1192    The "set" statement has three variants with the syntaxes `(REG =
1193 EXPRESSION)', `(REG ASSIGNMENT_OPERATOR EXPRESSION)', and `INTEGER'.
1194 The assignment operator variation of the "set" statement works the same
1195 way as the corresponding C expression statement does.  The assignment
1196 operators are `+=', `-=', `*=', `/=', `%=', `&=', `|=', `^=', `<<=',
1197 and `>>=', and they have the same meanings as in C.  A "naked integer"
1198 INTEGER is equivalent to a SET statement of the form `(r0 = INTEGER)'.
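
        The variants of the "set" statement might look as follows when
     written out.  This is only an illustrative CCL block held in a
     hypothetical variable, `my-ccl-set-examples'; the register choices
     are arbitrary.

```elisp
;; Hypothetical CCL block showing each form of the "set" statement.
(defconst my-ccl-set-examples
  '((r1 = (r0 + 1))     ; plain assignment: r1 receives r0 + 1
    (r1 <<= 2)          ; assignment operator, like C's r1 <<= 2
    (r2 &= 255)         ; like C's r2 &= 255 (keeps the low 8 bits)
    65)                 ; naked integer, equivalent to (r0 = 65)
  "Sketch of a CCL block made of four set statements.")
```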
1199
1200 I/O statements:
1201 ===============
1202
1203    The "read" statement takes one or more registers as arguments.  It
1204 reads one byte (a C char) from the input into each register in turn.
1205
1206    The "write" statement takes several forms.  In the form `(write REG
1207 ...)' it takes one or more registers as arguments and writes each in
1208 turn to the output.  The integer in a register (interpreted as an
1209 Emchar) is encoded to multibyte form (i.e., Bufbytes) and written to
1210 the current output buffer.  If it is less than 256, it is written as
1211 is.  The forms `(write EXPRESSION)' and `(write INTEGER)' are treated
1212 analogously.  The form `(write STRING)' writes the constant string to
1213 the output.  A "naked string" `STRING' is equivalent to the statement
1214 `(write STRING)'.  The form `(write REG ARRAY)' writes the REGth
1215 element of the ARRAY to the output.
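
        As an illustration, the block below uses each I/O form once.
     It is a sketch held in a hypothetical variable,
     `my-ccl-io-examples', and the last statement assumes that r2 holds
     0, 1 or 2 so that it indexes a valid array element.

```elisp
;; Hypothetical CCL block showing the "read" and "write" forms.
(defconst my-ccl-io-examples
  '((read r0 r1)            ; next input byte into r0, the one after into r1
    (write r1 r0)           ; write both registers, r1 first
    (write (r0 + 1))        ; write the value of an expression
    (write 10)              ; write a constant integer
    (write "ok")            ; write a constant string
    "ok"                    ; naked string, same as (write "ok")
    (read r2)               ; read one more byte into r2
    (write r2 [48 49 50]))  ; write element number r2 of the array
  "Sketch of a CCL block exercising the I/O statements.")
```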
1216
1217 Conditional statements:
1218 =======================
1219
1220    The "if" statement takes an EXPRESSION, a CCL BLOCK, and an optional
1221 SECOND CCL BLOCK as arguments.  If the EXPRESSION evaluates to
1222 non-zero, the first CCL BLOCK is executed.  Otherwise, if there is a
1223 SECOND CCL BLOCK, it is executed.
1224
1225    The "read-if" variant of the "if" statement takes an EXPRESSION, a
1226 CCL BLOCK, and an optional SECOND CCL BLOCK as arguments.  The
1227 EXPRESSION must have the form `(REG OPERATOR OPERAND)' (where OPERAND is
1228 a register or an integer).  The `read-if' statement first reads from
1229 the input into the first register operand in the EXPRESSION, then
1230 conditionally executes a CCL block just as the `if' statement does.
1231
1232    The "branch" statement takes an EXPRESSION and one or more CCL
1233 blocks as arguments.  The CCL blocks are treated as a zero-indexed
1234 array, and the `branch' statement uses the EXPRESSION as the index of
1235 the CCL block to execute.  Null CCL blocks may be used as no-ops,
1236 continuing execution with the statement following the `branch'
1237 statement in the containing CCL block.  Out-of-range values for the
1238 EXPRESSION are also treated as no-ops.
1239
1240    The "read-branch" variant of the "branch" statement takes a
1241 REGISTER and one or more CCL blocks as arguments.  The `read-branch'
1242 statement first reads from the input into the REGISTER, then uses its
1243 value to select a CCL block to execute just as the `branch' statement
1244 does.
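
        The conditional statements might be written as in the sketch
     below.  The variable name `my-ccl-conditional-examples' and the
     strings written are purely illustrative.

```elisp
;; Hypothetical CCL block showing "if", "branch" and "read-if".
(defconst my-ccl-conditional-examples
  '((read r0)
    (if (r0 == 13)            ; is the byte a CR?
        (write 10)            ; first block: write LF instead
      (write r0))             ; second block: copy the byte unchanged
    (r1 = (r0 & 3))           ; r1 is now 0, 1, 2 or 3
    (branch r1                ; r1 selects a block, counting from zero
            (write "zero")
            (write "one")
            ()                ; null block: a no-op for index 2
            (write "three"))
    (read-if (r2 > 127)       ; read into r2 first, then test it
             (write "8-bit")
             (write "7-bit")))
  "Sketch of a CCL block exercising the conditional statements.")
```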
1245
1246 Loop control statements:
1247 ========================
1248
1249    The "loop" statement creates a block with an implied jump from the
1250 end of the block back to its head.  The loop is exited on a `break'
1251 statement, and continued without executing the tail by a `repeat'
1252 statement.
1253
1254    The "break" statement, written `(break)', terminates the current
1255 loop and continues with the next statement in the current block.
1256
1257    The "repeat" statement has three variants, `repeat', `write-repeat',
1258 and `write-read-repeat'.  Each continues the current loop from its
1259 head, possibly after performing I/O.  `repeat' takes no arguments and
1260 does no I/O before jumping.  `write-repeat' takes a single argument (a
1261 register, an integer, or a string), writes it to the output, then jumps.
1262 `write-read-repeat' takes one or two arguments.  The first must be a
1263 register.  The second may be an integer or an array; if absent, it is
1264 implicitly set to the first (register) argument.  `write-read-repeat'
1265 writes its second argument to the output, then reads from the input
1266 into the register, and finally jumps.  See the `write' and `read'
1267 statements for the semantics of the I/O operations for each type of
1268 argument.
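
        A sketch of how the loop control statements might combine in
     practice.  Both variable names are hypothetical.  The first
     program copies input up to the first NUL byte while dropping CR
     bytes; the second is a plain copy written with
     `write-read-repeat'.

```elisp
;; Hypothetical CCL program using "loop", "break" and "repeat".
(defconst my-ccl-loop-example
  '(1
    ((loop
      (read r0)
      (if (r0 == 0)
          (break))            ; leave the loop at the first NUL byte
      (if (r0 == 13)
          (repeat))           ; CR: restart the loop without writing
      (write-repeat r0))))    ; otherwise write r0 and restart the loop
  "Sketch of a CCL program that stops at NUL and drops CR bytes.")

;; Hypothetical CCL program using "write-read-repeat" with one argument:
;; write r0, read the next byte into r0, then jump to the loop head.
(defconst my-ccl-write-read-repeat-example
  '(1
    ((read r0)
     (loop
      (write-read-repeat r0))))
  "Sketch of a copying CCL program built on `write-read-repeat'.")
```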
1269
1270 Other control statements:
1271 =========================
1272
1273    The "call" statement, written `(call CCL-PROGRAM-NAME)', executes a
1274 CCL program as a subroutine.  It does not return a value to the caller,
1275 but can modify the register status.
1276
1277    The "end" statement, written `(end)', terminates the CCL program
1278 successfully, and returns to the caller (which may be a CCL program).
1279 It does not alter the status of the registers.
1280