This is ../info/lispref.info, produced by makeinfo version 4.0 from
lispref/lispref.texi.

INFO-DIR-SECTION XEmacs Editor
START-INFO-DIR-ENTRY
* Lispref: (lispref).           XEmacs Lisp Reference Manual.
END-INFO-DIR-ENTRY

   Edition History:

   GNU Emacs Lisp Reference Manual Second Edition (v2.01), May 1993
   GNU Emacs Lisp Reference Manual Further Revised (v2.02), August 1993
   Lucid Emacs Lisp Reference Manual (for 19.10) First Edition, March 1994
   XEmacs Lisp Programmer's Manual (for 19.12) Second Edition, April 1995
   GNU Emacs Lisp Reference Manual v2.4, June 1995
   XEmacs Lisp Programmer's Manual (for 19.13) Third Edition, July 1995
   XEmacs Lisp Reference Manual (for 19.14 and 20.0) v3.1, March 1996
   XEmacs Lisp Reference Manual (for 19.15 and 20.1, 20.2, 20.3) v3.2, April, May, November 1997
   XEmacs Lisp Reference Manual (for 21.0) v3.3, April 1998

   Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995 Free Software
Foundation, Inc.  Copyright (C) 1994, 1995 Sun Microsystems, Inc.
Copyright (C) 1995, 1996 Ben Wing.

   Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that the
entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.

   Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided also
that the section entitled "GNU General Public License" is included
exactly as in the original, and provided that the entire resulting
derived work is distributed under the terms of a permission notice
identical to this one.

   Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that the section entitled "GNU General Public License"
may be included in a translation approved by the Free Software
Foundation instead of in the original English.

\1f
File: lispref.info,  Node: Level 3 Basics,  Next: Level 3 Primitives,  Up: I18N Level 3

Level 3 Basics
--------------

   XEmacs now provides alpha-level functionality for I18N Level 3.
This means that everything necessary for full messaging is available,
but not every file has been converted.

   The two message files which have been created are `src/emacs.po' and
`lisp/packages/mh-e.po'.  Both files need to be converted using
`msgfmt', and the resulting `.mo' files placed in some locale's
`LC_MESSAGES' directory.  The test "translations" in these files are
the original messages prefixed by `TRNSLT_'.

   The domain for a variable is stored on the variable's property list
under the property name VARIABLE-DOMAIN.  The function
`documentation-property' uses this information when translating a
variable's documentation.

\1f
File: lispref.info,  Node: Level 3 Primitives,  Next: Dynamic Messaging,  Prev: Level 3 Basics,  Up: I18N Level 3

Level 3 Primitives
------------------

 - Function: gettext string
     This function looks up STRING in the default message domain and
     returns its translation.  If `I18N3' was not enabled when XEmacs
     was compiled, it just returns STRING.

 - Function: dgettext domain string
     This function looks up STRING in the specified message domain and
     returns its translation.  If `I18N3' was not enabled when XEmacs
     was compiled, it just returns STRING.

 - Function: bind-text-domain domain pathname
     This function associates a pathname with a message domain.  Here's
     how the path to the message file is constructed under SunOS 5.x:

          `{pathname}/{LANG}/LC_MESSAGES/{domain}.mo'

     If `I18N3' was not enabled when XEmacs was compiled, this function
     does nothing.

 - Special Form: domain string
     This special form specifies the text domain used for translating
     documentation strings and interactive prompts of a function.  For
     example, write:

          (defun foo (arg) "Doc string" (domain "emacs-foo") ...)

     to specify `emacs-foo' as the text domain of the function `foo'.
     The "call" to `domain' is actually a declaration rather than a
     function call; when actually called, `domain' just returns `nil'.

 - Function: domain-of function
     This function returns the text domain of FUNCTION; it returns
     `nil' if it is the default domain.  If `I18N3' was not enabled
     when XEmacs was compiled, it always returns `nil'.
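   Python's standard `gettext' module uses the same message-catalog
conventions, which makes it a convenient way to experiment with them
outside XEmacs.  The sketch below is an illustration only, not XEmacs
code: it assumes the `{pathname}/{LANG}/LC_MESSAGES/{domain}.mo' layout
shown for `bind-text-domain', uses a hypothetical helper
`message_file_path' and the example domain `emacs-gorilla', and checks
that `gettext.find' locates an (empty, placeholder) file placed at that
path.  It also shows the same fallback that `dgettext' exhibits when no
translation is available: the string is returned unchanged.

```python
import gettext
import os
import tempfile


def message_file_path(pathname: str, lang: str, domain: str) -> str:
    """Hypothetical helper: {pathname}/{LANG}/LC_MESSAGES/{domain}.mo."""
    return os.path.join(pathname, lang, "LC_MESSAGES", domain + ".mo")


def locate_catalog(domain: str, lang: str) -> bool:
    """Create an empty placeholder catalog and check gettext.find agrees."""
    with tempfile.TemporaryDirectory() as root:
        expected = message_file_path(root, lang, domain)
        os.makedirs(os.path.dirname(expected))
        open(expected, "wb").close()  # placeholder only; find() never parses it
        found = gettext.find(domain, root, languages=[lang])
        return found == expected


print(locate_catalog("emacs-gorilla", "ja"))
# No catalog is bound for this domain, so the message comes back unchanged.
print(gettext.dgettext("emacs-gorilla", "What gorilla?"))
```

The placeholder file is never bound with `bindtextdomain', because
`dgettext' would then try to parse it as a real catalog.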

\1f
File: lispref.info,  Node: Dynamic Messaging,  Next: Domain Specification,  Prev: Level 3 Primitives,  Up: I18N Level 3

Dynamic Messaging
-----------------

   The `format' function has been extended to permit you to change the
order of parameter insertion.  For example, the conversion format
`%1$s' inserts parameter one as a string, while `%2$s' inserts
parameter two.  This is useful when creating translations which require
you to change the word order.
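   Python's `%' operator does not accept the `%1$s' syntax, so the
following is a hypothetical re-implementation of just the positional
`%N$s' conversion, to show why it helps translators: the same arguments
can be inserted in a different order by changing only the format
string.  `format_positional' is an illustrative name, not an XEmacs
function, and only the `s' conversion is handled.

```python
import re

# Matches a positional string conversion such as %1$s or %12$s.
_POSITIONAL = re.compile(r"%(\d+)\$s")


def format_positional(template: str, *args: object) -> str:
    """Replace each %N$s with str(args[N - 1]); N counts from 1."""
    def substitute(match):
        return str(args[int(match.group(1)) - 1])
    return _POSITIONAL.sub(substitute, template)


args = ("the gorilla", "the banana")
print(format_positional("%1$s ate %2$s", *args))           # the gorilla ate the banana
print(format_positional("%2$s was eaten by %1$s", *args))  # the banana was eaten by the gorilla
```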

\1f
File: lispref.info,  Node: Domain Specification,  Next: Documentation String Extraction,  Prev: Dynamic Messaging,  Up: I18N Level 3

Domain Specification
--------------------

   The default message domain of XEmacs is `emacs'.  For add-on
packages, it is best to use a different domain.  For example, let us
say we want to convert the "gorilla" package to use the domain
`emacs-gorilla'.  To translate the message "What gorilla?", use
`dgettext' as follows:

     (dgettext "emacs-gorilla" "What gorilla?")

   A function (or macro) which has a documentation string or an
interactive prompt needs to be associated with the domain in order for
the documentation or prompt to be translated.  This is done with the
`domain' special form as follows:

     (defun scratch (location)
       "Scratch the specified location."
       (domain "emacs-gorilla")
       (interactive "sScratch: ")
       ... )

   It is most efficient to specify the domain in the first line of the
function body, before the `interactive' form.

   For variables and constants which have documentation strings,
specify the domain after the documentation.

 - Special Form: defvar symbol [value [doc-string [domain]]]
     Example:
          (defvar weight 250 "Weight of gorilla, in pounds." "emacs-gorilla")

 - Special Form: defconst symbol [value [doc-string [domain]]]
     Example:
          (defconst limbs 4 "Number of limbs" "emacs-gorilla")

   Autoloaded functions which are specified in `loaddefs.el' do not need
to have a domain specification, because their documentation strings are
extracted into the main message base.  However, for autoloaded functions
which are specified in a separate package, use the following syntax:

 - Function: autoload symbol filename &optional docstring interactive
          macro domain
     Example:
          (autoload 'explore "jungle" "Explore the jungle." nil nil "emacs-gorilla")

\1f
File: lispref.info,  Node: Documentation String Extraction,  Prev: Domain Specification,  Up: I18N Level 3

Documentation String Extraction
-------------------------------

   The utility `etc/make-po' scans the file `DOC' to extract
documentation strings and creates a message file `doc.po'.  This file
may then be inserted within `emacs.po'.

   Currently, `make-po' is hard-coded to read from `DOC' and write to
`doc.po'.  In order to extract documentation strings from an add-on
package, first run `make-docfile' on the package to produce the `DOC'
file.  Then run `make-po -p'; the `-p' argument indicates that the
documentation is being extracted for an add-on package.

   (The `-p' argument is a kludge to make up for a subtle difference
between pre-loaded documentation and add-on documentation: for add-on
packages, the final carriage returns in the strings produced by
`make-docfile' must be ignored.)
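   To make the extraction step concrete, here is a hypothetical sketch
of turning documentation strings into PO-file entries.  It is not
`make-po' itself: it does not read the `DOC' file format, and it
assumes that ignoring the "final carriage returns" for add-on packages
amounts to dropping one trailing newline from each string (`addon=True'
below).  The `msgid'/`msgstr' entry layout and the backslash escapes
follow the standard GNU gettext PO syntax.

```python
def po_escape(text: str) -> str:
    """Escape a string for use inside a double-quoted PO msgid."""
    return (text.replace("\\", "\\\\")   # backslashes first
                .replace('"', '\\"')
                .replace("\t", "\\t")
                .replace("\n", "\\n"))


def make_po_entries(docstrings, addon: bool = False) -> str:
    """Return one msgid/msgstr pair per documentation string.

    With addon=True a single trailing newline is dropped first; this is
    an assumption standing in for the behaviour of `make-po -p'.
    """
    entries = []
    for doc in docstrings:
        if addon and doc.endswith("\n"):
            doc = doc[:-1]
        entries.append('msgid "%s"\nmsgstr ""\n' % po_escape(doc))
    return "\n".join(entries)  # blank line between entries


print(make_po_entries(["Scratch the specified location.\n"], addon=True))
```

Empty `msgstr' strings are what a translator later fills in.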

\1f
File: lispref.info,  Node: I18N Level 4,  Prev: I18N Level 3,  Up: Internationalization

I18N Level 4
============

   The Asian-language support in XEmacs is called "MULE".  *Note MULE::.

\1f
File: lispref.info,  Node: MULE,  Next: Tips,  Prev: Internationalization,  Up: Top

MULE
****

   "MULE" is the name originally given to the version of GNU Emacs
extended for multi-lingual (and in particular Asian-language) support.
"MULE" is short for "MUlti-Lingual Emacs".  It was originally called
Nemacs ("Nihon Emacs" where "Nihon" is the Japanese word for "Japan"),
when it only provided support for Japanese.  XEmacs refers to its
multi-lingual support as "MULE support" since it is based on "MULE".

* Menu:

* Internationalization Terminology::
                        Definition of various internationalization terms.
* Charsets::            Sets of related characters.
* MULE Characters::     Working with characters in XEmacs/MULE.
* Composite Characters:: Making new characters by overstriking other ones.
* ISO 2022::            An international standard for charsets and encodings.
* Coding Systems::      Ways of representing a string of chars using integers.
* CCL::                 A special language for writing fast converters.
* Category Tables::     Subdividing charsets into groups.

\1f
File: lispref.info,  Node: Internationalization Terminology,  Next: Charsets,  Up: MULE

Internationalization Terminology
================================

   In internationalization terminology, a string of text is divided up
into "characters", which are the printable units that make up the text.
A single character is (for example) a capital `A', the number `2', a
Katakana character, a Kanji ideograph (an "ideograph" is a "picture"
character, such as is used in Japanese Kanji, Chinese Hanzi, and Korean
Hanja; typically there are thousands of such ideographs in each
language), etc.  The basic property of a character is its shape.  Note
that the same character may be drawn by two different people (or in two
different fonts) in slightly different ways, although the basic shape
will be the same.

   In some cases, the differences will be significant enough that it is
actually possible to identify two or more distinct shapes that both
represent the same character.  For example, the lowercase letters `a'
and `g' each have two distinct possible shapes: the `a' can optionally
have a curved tail projecting off the top, and the `g' can be formed
either of two loops, or of one loop and a tail hanging off the bottom.
Such distinct possible shapes of a character are called "glyphs".  The
important characteristic of two glyphs making up the same character is
that the choice between one or the other is purely stylistic and has no
linguistic effect on a word.  (This is the reason why a capital `A' and
lowercase `a' are different characters rather than different glyphs:
`Aspen' is a city while `aspen' is a kind of tree.)

   Note that "character" and "glyph" are used differently here than
elsewhere in XEmacs.

   A "character set" is simply a set of related characters.  ASCII, for
example, is a set of 94 characters (or 128, if you count non-printing
characters).  Other character sets are ISO8859-1 (ASCII plus various
accented characters and other international symbols), JISX0201 (ASCII,
more or less, plus half-width Katakana), JISX0208 (Japanese Kanji),
JISX0212 (a second set of less-used Japanese Kanji), GB2312 (Mainland
Chinese Hanzi), etc.

   Every character set has one or more "orderings", which can be viewed
as a way of assigning a number (or set of numbers) to each character in
the set.  For most character sets, there is a standard ordering, and in
fact all of the character sets mentioned above define a particular
ordering.  ASCII, for example, places letters in their "natural" order,
puts uppercase letters before lowercase letters, numbers before
letters, etc.  Note that for many of the Asian character sets, there is
no natural ordering of the characters.  The actual orderings are based
on one or more salient characteristics, of which there are many to
choose from, e.g. number of strokes, common radicals, phonetic
ordering, etc.

   The set of numbers assigned to any particular character is called
the character's "position codes".  The number of position codes
required to index a particular character in a character set is called
the "dimension" of the character set.  ASCII, being a relatively small
character set, is of dimension one, and each character in the set is
indexed using a single position code, in the range 0 through 127 (if
non-printing characters are included) or 33 through 126 (if only the
printing characters are considered).  JISX0208, i.e.  Japanese Kanji,
has thousands of characters, and is of dimension two: every character
is indexed by two position codes, each in the range 33 through 126.
(Note that the choice of the range here is somewhat arbitrary.
Although a character set such as JISX0208 defines an _ordering_ of all
its characters, it does not define the actual mapping between numbers
and characters.  You could just as easily index the characters in
JISX0208 using numbers in the range 0 through 93, 1 through 94, 2
through 95, etc.  The reason for the actual range chosen is so that the
position codes match up with the actual values used in the common
encodings.)
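   The position codes of a JISX0208 character can be observed with any
EUC-JP encoder, because EUC stores each JISX0208 position code as one
byte with the high bit set.  The following sketch uses Python's
standard `euc_jp' codec; the helper `position_codes' is hypothetical
and handles only ASCII and JISX0208 characters.  For the Kanji U+65E5
("sun"), it recovers two position codes, 0x46 and 0x7C, both within 33
through 126, so the character set has dimension two; the ASCII letter
`A' has the single position code 65.

```python
def position_codes(ch: str) -> tuple:
    """Hypothetical helper: position codes of an ASCII or JISX0208 char.

    ASCII bytes are used as-is.  A JISX0208 character is two EUC-JP
    bytes (each 0xA1 or above), so clearing the high bit of each byte
    yields its position codes.  JISX0201 Katakana and JISX0212 are
    deliberately rejected.
    """
    encoded = ch.encode("euc_jp")
    if len(encoded) == 1:
        return (encoded[0],)
    if len(encoded) == 2 and encoded[0] >= 0xA1:
        return tuple(b & 0x7F for b in encoded)
    raise ValueError("not an ASCII or JISX0208 character")


print(position_codes("A"))       # (65,)
print(position_codes("\u65e5"))  # (70, 124), i.e. 0x46 0x7C
```

The length of the returned tuple is the dimension of the character's
character set.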

   An "encoding" is a way of numerically representing characters from
one or more character sets as a stream of like-sized numerical values
called "words"; typically these are 8-bit, 16-bit, or 32-bit
quantities.  If an encoding encompasses only one character set, then the
position codes for the characters in that character set could be used
directly.  (This is the case with ASCII, and as a result, many people do
not understand the difference between a character set and an encoding.)
This is not possible, however, if more than one character set is to be
used in the encoding.  For example, printed Japanese text typically
requires characters from multiple character sets: ASCII, JISX0208, and
JISX0212, to be specific.  Each of these is indexed using one or more
position codes in the range 33 through 126, so the position codes could
not be used directly or there would be no way to tell which character
was meant.  Different Japanese encodings handle this differently: JIS
uses special escape characters to denote different character sets; EUC
sets the high bit of the position codes for JISX0208 and JISX0212, and
puts a special extra byte before each JISX0212 character; etc.  (JIS,
EUC, and most of the other encodings you will encounter are 7-bit or
8-bit encodings.  There is one common 16-bit encoding, which is Unicode;
this strives to represent all the world's characters in a single large
character set.  32-bit encodings are generally used internally in
programs to simplify the code that manipulates them; however, they are
not much used externally because they are not very space-efficient.)

   Encodings are classified as either "modal" or "non-modal".  In a
"modal encoding", there are multiple states that the encoding can be in,
and the interpretation of the values in the stream depends on the
current global state of the encoding.  Special values in the encoding,
called "escape sequences", are used to change the global state.  JIS,
for example, is a modal encoding.  The bytes `ESC $ B' indicate that,
from then on, bytes are to be interpreted as position codes for
JISX0208, rather than as ASCII.  This effect is cancelled using the
bytes `ESC ( B', which mean "switch from whatever the current state is
to ASCII".  To switch to JISX0212, the escape sequence `ESC $ ( D' is
used.  (Note that here, as is common, the escape sequences do in fact
begin with `ESC'.  This is not necessarily the case, however.)

   A "non-modal encoding" has no global state that extends past the
character currently being interpreted.  EUC, for example, is a
non-modal encoding.  Characters in JISX0208 are encoded by setting the
high bit of the position codes, and characters in JISX0212 are encoded
by doing the same but also prefixing the character with the byte 0x8F.
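   The difference between the two approaches can be seen by encoding
the same text, the letter `A' followed by the Kanji U+65E5, with
Python's standard `iso2022_jp' (JIS) and `euc_jp' codecs.  This is a
sketch using Python's codecs rather than XEmacs coding systems.  The
JIS output brackets the Kanji's position codes 0x46 0x7C with `ESC $ B'
and `ESC ( B', while the EUC output simply sets the high bit of each
position code.

```python
text = "A\u65e5"  # ASCII "A" followed by a JISX0208 Kanji

jis = text.encode("iso2022_jp")  # modal: escape sequences switch state
euc = text.encode("euc_jp")      # non-modal: the high bit marks JISX0208

print(jis)  # b'A\x1b$BF|\x1b(B'
print(euc)  # b'A\xc6\xfc'

# Both carry the same position codes (0x46, 0x7C) for the Kanji.
jis_codes = jis[len(b"A\x1b$B"):-len(b"\x1b(B")]
euc_codes = bytes(b & 0x7F for b in euc[1:])
print(jis_codes == euc_codes == b"\x46\x7c")  # True
```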

   The advantage of a modal encoding is that it is generally more
space-efficient, and is easily extendable because an essentially
arbitrary number of escape sequences can be created.  The
disadvantage, however, is that it is much more difficult to work with
if it is not being processed in a sequential manner.  In the non-modal
EUC encoding, for example, the byte 0x41 always refers to the letter
`A'; whereas in JIS, it could either be the letter `A', or one of the
two position codes in a JISX0208 character, or one of the two position
codes in a JISX0212 character.  Determining exactly which one is meant
could be difficult and time-consuming if the previous bytes in the
string have not already been processed.
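   The following sketch, again using Python's standard `iso2022_jp'
and `euc_jp' codecs, shows the problem just described: the same two
bytes 0x46 0x7C decode to the ASCII text `F|' or to a single Kanji,
depending on whether an earlier `ESC $ B' has been seen, so a JIS
decoder must track state from the start of the stream.  In EUC-JP the
same bytes always mean ASCII.

```python
payload = b"\x46\x7c"  # two bytes whose meaning depends on the JIS state

# Initial JIS state is ASCII, so the bytes are two ASCII characters.
as_ascii = payload.decode("iso2022_jp")
# After ESC $ B, the same bytes are the position codes of one Kanji.
as_kanji = (b"\x1b$B" + payload + b"\x1b(B").decode("iso2022_jp")
# EUC-JP has no state: bytes below 0x80 are always ASCII.
in_euc = payload.decode("euc_jp")

print(as_ascii)              # F|
print(as_kanji == "\u65e5")  # True
print(in_euc)                # F|
```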

   Non-modal encodings are further divided into "fixed-width" and
"variable-width" formats.  A fixed-width encoding always uses the same
number of words per character, whereas a variable-width encoding does
not.  EUC is a good example of a variable-width encoding: one to three
bytes are used per character, depending on the character set.  16-bit
and 32-bit encodings are nearly always fixed-width, and this is in fact
one of the main reasons for using an encoding with a larger word size.
The main advantage of a fixed-width encoding is that the Nth character
can be located directly, without scanning the text that precedes it.
The advantages of variable-width encodings are that they are generally
more space-efficient and allow for compatibility with existing 8-bit
encodings such as ASCII.
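   EUC-JP's variable width is easy to observe with Python's standard
`euc_jp' codec (a sketch, not XEmacs code): an ASCII letter takes one
byte, a JISX0208 Kanji two bytes, and a half-width JISX0201 Katakana
character two bytes (a 0x8E prefix followed by the character's code).
For comparison, the fixed-width UTF-32 encoding uses four bytes for
each of them.

```python
samples = {
    "ASCII letter": "A",
    "JISX0208 Kanji": "\u65e5",
    "JISX0201 half-width Katakana": "\uff71",
}

for label, ch in samples.items():
    euc = ch.encode("euc_jp")        # variable width
    utf32 = ch.encode("utf-32-be")   # fixed width, no byte-order mark
    print(f"{label}: EUC-JP {len(euc)} byte(s) {euc.hex()}, "
          f"UTF-32 {len(utf32)} bytes")
```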

   Note that the bytes in an 8-bit encoding are often referred to as
"octets" rather than simply as bytes.  This terminology dates back to
the days before 8-bit bytes were universal, when some computers had
9-bit bytes, others had 10-bit bytes, etc.

\1f
File: lispref.info,  Node: Charsets,  Next: MULE Characters,  Prev: Internationalization Terminology,  Up: MULE

Charsets
========

   A "charset" in MULE is an object that encapsulates a particular
character set as well as an ordering of those characters.  Charsets are
permanent objects and are named using symbols, like faces.

 - Function: charsetp object
     This function returns non-`nil' if OBJECT is a charset.

* Menu:

* Charset Properties::          Properties of a charset.
* Basic Charset Functions::     Functions for working with charsets.
* Charset Property Functions::  Functions for accessing charset properties.
* Predefined Charsets::         Predefined charset objects.

\1f
File: lispref.info,  Node: Charset Properties,  Next: Basic Charset Functions,  Up: Charsets

Charset Properties
------------------

   Charsets have the following properties:

`name'
     A symbol naming the charset.  Every charset must have a different
     name; this allows a charset to be referred to using its name
     rather than the actual charset object.

`doc-string'
     A documentation string describing the charset.

`registry'
     A regular expression matching the font registry field for this
     character set.  For example, both the `ascii' and `latin-iso8859-1'
     charsets use the registry `"ISO8859-1"'.  This field is used to
     choose an appropriate font when the user gives a general font
     specification such as `-*-courier-medium-r-*-140-*', i.e. a
     14-point upright medium-weight Courier font.

`dimension'
     Number of position codes used to index a character in the
     character set.  XEmacs/MULE can only handle character sets of
     dimension 1 or 2.  This property defaults to 1.

`chars'
     Number of characters in each dimension.  In XEmacs/MULE, the only
     allowed values are 94 or 96.  (There are a couple of pre-defined
     character sets, such as ASCII, that do not follow this, but you
     cannot define new ones like this.)  Defaults to 94.  Note that if
     the dimension is 2, the character set thus described is 94x94 or
     96x96.

`columns'
     Number of columns used to display a character in this charset.
     Only used in TTY mode.  (Under X, the actual width of a character
     can be derived from the font used to display the characters.)  If
     unspecified, defaults to the dimension.  (This is almost always the
     correct value, because character sets with dimension 2 are usually
     ideograph character sets, which need two columns to display the
     intricate ideographs.)

`direction'
     A symbol, either `l2r' (left-to-right) or `r2l' (right-to-left).
     Defaults to `l2r'.  This specifies the direction that the text
     should be displayed in, and will be left-to-right for most
     charsets but right-to-left for Hebrew and Arabic.  (Right-to-left
     display is not currently implemented.)

`final'
     Final byte of the standard ISO 2022 escape sequence designating
     this charset.  Must be supplied.  Each combination of (DIMENSION,
     CHARS) defines a separate namespace for final bytes, and each
     charset within a particular namespace must have a different final
     byte.  Note that ISO 2022 restricts the final byte to the range
     0x30 - 0x7E if dimension == 1, and 0x30 - 0x5F if dimension == 2.
     Note also that final bytes in the range 0x30 - 0x3F are reserved
     for user-defined (not official) character sets.  For more
     information on ISO 2022, *note Coding Systems::.

`graphic'
     0 (use left half of font on output) or 1 (use right half of font on
     output).  Defaults to 0.  This specifies how to convert the
     position codes that index a character in a character set into an
     index into the font used to display the character set.  With
     `graphic' set to 0, position codes 33 through 126 map to font
     indices 33 through 126; with it set to 1, position codes 33
     through 126 map to font indices 161 through 254 (i.e. the same
     number but with the high bit set).  For example, for a font whose
     registry is ISO8859-1, the left half of the font (octets 0x20 -
     0x7F) is the `ascii' charset, while the right half (octets 0xA0 -
     0xFF) is the `latin-iso8859-1' charset.

`ccl-program'
     A compiled CCL program used to convert a character in this charset
     into an index into the font.  This is in addition to the `graphic'
     property.  If a CCL program is defined, the position codes of a
     character will first be processed according to `graphic' and then
     passed through the CCL program, with the resulting values used to
     index the font.

     This is used, for example, in the Big5 character set (used in
     Taiwan).  This character set is not ISO-2022-compliant, and its
     size (94x157) does not fit within the maximum 96x96 size of
     ISO-2022-compliant character sets.  As a result, XEmacs/MULE
     splits it (in a rather complex fashion, so as to group the most
     commonly used characters together) into two charset objects
     (`chinese-big5-1' and `chinese-big5-2'), each of size 94x94, and
     each charset object uses a CCL program to convert the modified
     position codes back into standard Big5 indices to retrieve a
     character from a Big5 font.

   Most of the above properties can only be set when the charset is
created.  *Note Charset Property Functions::.
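   The constraints on `dimension', `chars', `direction', and `final',
and the `graphic' mapping from position codes to font indices, can be
summarized in a small model.  This is a hypothetical Python sketch, not
the XEmacs `make-charset' implementation: the class `CharsetModel' and
its methods are illustrative names, and only the rules stated above
are checked (the `ccl-program' step is not modelled).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharsetModel:
    """Hypothetical model of a few charset properties (not an XEmacs API)."""
    name: str
    final: str          # final byte of the ISO 2022 designation; required
    dimension: int = 1
    chars: int = 94
    graphic: int = 0
    direction: str = "l2r"

    def __post_init__(self):
        if self.dimension not in (1, 2):
            raise ValueError("dimension must be 1 or 2")
        if self.chars not in (94, 96):
            raise ValueError("chars must be 94 or 96")
        if self.graphic not in (0, 1):
            raise ValueError("graphic must be 0 or 1")
        if self.direction not in ("l2r", "r2l"):
            raise ValueError("direction must be l2r or r2l")
        # ISO 2022: 0x30-0x7E for dimension 1, 0x30-0x5F for dimension 2.
        upper = 0x7E if self.dimension == 1 else 0x5F
        if len(self.final) != 1 or not 0x30 <= ord(self.final) <= upper:
            raise ValueError("final byte outside the ISO 2022 range")

    def is_private(self) -> bool:
        """Final bytes 0x30-0x3F are reserved for user-defined charsets."""
        return 0x30 <= ord(self.final) <= 0x3F

    def font_index(self, code: int) -> int:
        """Map a position code (33-126) to a font index per `graphic'."""
        return code | 0x80 if self.graphic == 1 else code


latin1 = CharsetModel("latin-iso8859-1", final="A", graphic=1)
print(latin1.font_index(33), latin1.font_index(126))  # 161 254
print(CharsetModel("chinese-big5-1", final="0", dimension=2).is_private())  # True
```

Making the model immutable (`frozen=True') mirrors the fact that most
charset properties cannot be changed after creation.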

\1f
File: lispref.info,  Node: Basic Charset Functions,  Next: Charset Property Functions,  Prev: Charset Properties,  Up: Charsets

Basic Charset Functions
-----------------------

 - Function: find-charset charset-or-name
     This function retrieves the charset of the given name.  If
     CHARSET-OR-NAME is a charset object, it is simply returned.
     Otherwise, CHARSET-OR-NAME should be a symbol.  If there is no
     such charset, `nil' is returned.  Otherwise the associated charset
     object is returned.

 - Function: get-charset name
     This function retrieves the charset of the given name.  It is the
     same as `find-charset' except that an error is signalled if there
     is no such charset, instead of `nil' being returned.

 - Function: charset-list
     This function returns a list of the names of all defined charsets.

 - Function: make-charset name doc-string props
     This function defines a new character set.  This function is for
     use with Mule support.  NAME is a symbol, the name by which the
     character set is normally referred to.  DOC-STRING is a string
     describing the character set.  PROPS is a property list,
     describing the specific nature of the character set.  The
     recognized properties are `registry', `dimension', `columns',
     `chars', `final', `graphic', `direction', and `ccl-program', as
     previously described.

 - Function: make-reverse-direction-charset charset new-name
     This function makes a charset equivalent to CHARSET but displayed
     in the opposite direction.  NEW-NAME is the name of the new
     charset.  The new charset is returned.

 - Function: charset-from-attributes dimension chars final &optional
          direction
     This function returns a charset with the given DIMENSION, CHARS,
     FINAL, and DIRECTION.  If DIRECTION is omitted, both directions
     will be checked (left-to-right will be returned if character sets
     exist for both directions).

 - Function: charset-reverse-direction-charset charset
     This function returns the charset (if any) with the same dimension,
     number of characters, and final byte as CHARSET, but which is
     displayed in the opposite direction.
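   The lookup rules above can be modelled with an ordinary table keyed
by name.  The following is a hypothetical Python sketch, not XEmacs
code: `REGISTRY' holds only two entries copied from the
predefined-charset tables (`ascii' and `ascii-r2l', which share
dimension 1, 94 characters, and final byte `B'), names stand in for
charset objects, and `KeyError' stands in for the error that
`get-charset' signals.

```python
# Hypothetical registry: name -> (dimension, chars, final, direction).
REGISTRY = {
    "ascii": (1, 94, "B", "l2r"),
    "ascii-r2l": (1, 94, "B", "r2l"),
}


def find_charset(name):
    """Like find-charset: return the name if defined, else None."""
    return name if name in REGISTRY else None


def get_charset(name):
    """Like get-charset: same lookup, but raise instead of returning None."""
    if find_charset(name) is None:
        raise KeyError(f"no such charset: {name}")
    return name


def charset_from_attributes(dimension, chars, final, direction=None):
    """Like charset-from-attributes: try l2r before r2l if no direction."""
    for wanted in ([direction] if direction else ["l2r", "r2l"]):
        for name, attrs in REGISTRY.items():
            if attrs == (dimension, chars, final, wanted):
                return name
    return None


def charset_reverse_direction_charset(name):
    """Like charset-reverse-direction-charset: same attributes, other way."""
    dimension, chars, final, direction = REGISTRY[get_charset(name)]
    other = "r2l" if direction == "l2r" else "l2r"
    return charset_from_attributes(dimension, chars, final, other)


print(charset_from_attributes(1, 94, "B"))         # ascii
print(charset_from_attributes(1, 94, "B", "r2l"))  # ascii-r2l
print(charset_reverse_direction_charset("ascii"))  # ascii-r2l
print(find_charset("no-such-charset"))             # None
```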
 539
 540 \1f
 541 File: lispref.info,  Node: Charset Property Functions,  Next: Predefined Charsets,  Prev: Basic Charset Functions,  Up: Charsets
 542
 543 Charset Property Functions
 544 --------------------------
 545
 546    All of these functions accept either a charset name or charset
 547 object.
 548
 549  - Function: charset-property charset prop
 550      This function returns property PROP of CHARSET.  *Note Charset
 551      Properties::.
 552
 553    Convenience functions are also provided for retrieving individual
 554 properties of a charset.
 555
 556  - Function: charset-name charset
 557      This function returns the name of CHARSET.  This will be a symbol.
 558
 559  - Function: charset-doc-string charset
 560      This function returns the doc string of CHARSET.
 561
 562  - Function: charset-registry charset
 563      This function returns the registry of CHARSET.
 564
 565  - Function: charset-dimension charset
 566      This function returns the dimension of CHARSET.
 567
 568  - Function: charset-chars charset
 569      This function returns the number of characters per dimension of
 570      CHARSET.
 571
 572  - Function: charset-columns charset
 573      This function returns the number of display columns per character
 574      (in TTY mode) of CHARSET.
 575
 576  - Function: charset-direction charset
 577      This function returns the display direction of CHARSET--either
 578      `l2r' or `r2l'.
 579
 580  - Function: charset-final charset
 581      This function returns the final byte of the ISO 2022 escape
 582      sequence designating CHARSET.
 583
 584  - Function: charset-graphic charset
 585      This function returns either 0 or 1, depending on whether the
 586      position codes of characters in CHARSET map to the left or right
 587      half of their font, respectively.
 588
 589  - Function: charset-ccl-program charset
 590      This function returns the CCL program, if any, for converting
 591      position codes of characters in CHARSET into font indices.
 592
 593    The only property of a charset that can currently be set after the
 594 charset has been created is the CCL program.
 595
 596  - Function: set-charset-ccl-program charset ccl-program
 597      This function sets the `ccl-program' property of CHARSET to
 598      CCL-PROGRAM.
 599
 600 \1f
 601 File: lispref.info,  Node: Predefined Charsets,  Prev: Charset Property Functions,  Up: Charsets
 602
 603 Predefined Charsets
 604 -------------------
 605
 606    The following charsets are predefined in the C code.
 607
 608      Name                    Type  Fi Gr Dir Registry
 609      --------------------------------------------------------------
 610      ascii                    94    B  0  l2r ISO8859-1
 611      control-1                94       0  l2r ---
     latin-iso8859-1          96    A  1  l2r ISO8859-1
     latin-iso8859-2          96    B  1  l2r ISO8859-2
     latin-iso8859-3          96    C  1  l2r ISO8859-3
     latin-iso8859-4          96    D  1  l2r ISO8859-4
     cyrillic-iso8859-5       96    L  1  l2r ISO8859-5
     arabic-iso8859-6         96    G  1  r2l ISO8859-6
     greek-iso8859-7          96    F  1  l2r ISO8859-7
     hebrew-iso8859-8         96    H  1  r2l ISO8859-8
     latin-iso8859-9          96    M  1  l2r ISO8859-9
     thai-tis620              96    T  1  l2r TIS620
     katakana-jisx0201        94    I  1  l2r JISX0201.1976
     latin-jisx0201           94    J  0  l2r JISX0201.1976
     japanese-jisx0208-1978   94x94 @  0  l2r JISX0208.1978
     japanese-jisx0208        94x94 B  0  l2r JISX0208.19(83|90)
     japanese-jisx0212        94x94 D  0  l2r JISX0212
     chinese-gb2312           94x94 A  0  l2r GB2312
     chinese-cns11643-1       94x94 G  0  l2r CNS11643.1
     chinese-cns11643-2       94x94 H  0  l2r CNS11643.2
     chinese-big5-1           94x94 0  0  l2r Big5
     chinese-big5-2           94x94 1  0  l2r Big5
     korean-ksc5601           94x94 C  0  l2r KSC5601
     composite                96x96    0  l2r ---

   The following charsets are predefined in the Lisp code.

     Name                     Type  Fi Gr Dir Registry
     --------------------------------------------------------------
     arabic-digit             94    2  0  l2r MuleArabic-0
     arabic-1-column          94    3  0  r2l MuleArabic-1
     arabic-2-column          94    4  0  r2l MuleArabic-2
     sisheng                  94    0  0  l2r sisheng_cwnn\|OMRON_UDC_ZH
     chinese-cns11643-3       94x94 I  0  l2r CNS11643.1
     chinese-cns11643-4       94x94 J  0  l2r CNS11643.1
     chinese-cns11643-5       94x94 K  0  l2r CNS11643.1
     chinese-cns11643-6       94x94 L  0  l2r CNS11643.1
     chinese-cns11643-7       94x94 M  0  l2r CNS11643.1
     ethiopic                 94x94 2  0  l2r Ethio
     ascii-r2l                94    B  0  r2l ISO8859-1
     ipa                      96    0  1  l2r MuleIPA
     vietnamese-lower         96    1  1  l2r VISCII1.1
     vietnamese-upper         96    2  1  l2r VISCII1.1

   In these tables, `Fi' is the final character that identifies the
charset in ISO 2022 escape sequences, `Gr' is the `graphic' property
(0 if the charset uses the left half of its font on output, 1 if it
uses the right half), and `Dir' is the display direction.  For all of
the above charsets, the dimension and number of columns are the same.

   Note that ASCII, Control-1, and Composite are handled specially.
This is why some of the fields are blank, and why some of the
filled-in fields (e.g. the type) are not really accurate.

\1f
File: lispref.info,  Node: MULE Characters,  Next: Composite Characters,  Prev: Charsets,  Up: MULE

MULE Characters
===============

 - Function: make-char charset arg1 &optional arg2
     This function makes a multi-byte character from CHARSET and octets
     ARG1 and ARG2.

 - Function: char-charset ch
     This function returns the character set of char CH.

 - Function: char-octet ch &optional n
     This function returns the octet (i.e. position code) numbered N
     (should be 0 or 1) of char CH.  N defaults to 0 if omitted.

 - Function: find-charset-region start end &optional buffer
     This function returns a list of the charsets in the region between
     START and END.  BUFFER defaults to the current buffer if omitted.

 - Function: find-charset-string string
     This function returns a list of the charsets in STRING.
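
   In a dimension-2 charset such as `japanese-jisx0208', a character
is identified by two octets (position codes), the values that
`make-char' takes and `char-octet' returns.  As an illustration outside
XEmacs, the following Python sketch recovers the two octets of a JIS
X 0208 character from its EUC-JP encoding, in which that charset is
invoked into GR, by clearing the high bit of each byte.  The helper
name `jisx0208_octets' is hypothetical; it is not part of XEmacs or
Python.

```python
def jisx0208_octets(ch: str) -> tuple:
    """Return the two JIS X 0208 position codes of CH (hypothetical helper).

    EUC-JP invokes JIS X 0208 into GR, so each of its two bytes is a
    position code with the high bit set; clearing that bit gives the
    0x21..0x7E form used in GL.
    """
    data = ch.encode("euc_jp")
    # Reject ASCII (1 byte), half-width katakana (0x8E prefix) and
    # JIS X 0212 (0x8F prefix, 3 bytes): they are not JIS X 0208.
    if len(data) != 2 or data[0] < 0xA1:
        raise ValueError(f"{ch!r} is not a JIS X 0208 character")
    return (data[0] & 0x7F, data[1] & 0x7F)


print([hex(octet) for octet in jisx0208_octets("日")])  # ['0x46', '0x7c']
```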

\1f
File: lispref.info,  Node: Composite Characters,  Next: ISO 2022,  Prev: MULE Characters,  Up: MULE

Composite Characters
====================

   Composite characters are not yet completely implemented.

 - Function: make-composite-char string
     This function converts a string into a single composite character.
     The character is the result of overstriking all the characters in
     the string.

 - Function: composite-char-string ch
     This function returns a string of the characters comprising a
     composite character.

 - Function: compose-region start end &optional buffer
     This function composes the characters in the region from START to
     END in BUFFER into one composite character.  The composite
     character replaces the composed characters.  BUFFER defaults to
     the current buffer if omitted.

 - Function: decompose-region start end &optional buffer
     This function decomposes any composite characters in the region
     from START to END in BUFFER.  This converts each composite
     character into one or more characters, the individual characters
     out of which the composite character was formed.  Non-composite
     characters are left as-is.  BUFFER defaults to the current buffer
     if omitted.

\1f
File: lispref.info,  Node: ISO 2022,  Next: Coding Systems,  Prev: Composite Characters,  Up: MULE

ISO 2022
========

   This section briefly describes the ISO 2022 encoding standard.  For
a more thorough understanding, please refer to the ISO 2022 standard
itself.

   Character sets ("charsets") are classified into the following four
categories, according to the number of characters in the charset:
94-charset, 96-charset, 94x94-charset, and 96x96-charset.

94-charset
     ASCII(B), left(J) and right(I) halves of JISX0201, ...

96-charset
     Latin-1(A), Latin-2(B), Latin-3(C), ...

94x94-charset
     GB2312(A), JISX0208(B), KSC5601(C), ...

96x96-charset
     none for the moment
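
   The categories differ in how many position codes each dimension may
use: a 94-charset uses 0x21 through 0x7E, leaving out SPACE and DEL,
while a 96-charset uses all of 0x20 through 0x7F (values given in
their GL form).  A short Python sketch, with hypothetical helper names,
makes the resulting sizes concrete:

```python
def position_codes(chars: int) -> range:
    """Position codes (GL form) available in each dimension of a charset."""
    if chars == 94:
        return range(0x21, 0x7F)   # 0x21..0x7E: GL without SPACE and DEL
    if chars == 96:
        return range(0x20, 0x80)   # 0x20..0x7F: all of GL
    raise ValueError("an ISO 2022 charset has 94 or 96 codes per dimension")


def charset_size(chars: int, dimension: int) -> int:
    """Number of code points in a charset with CHARS codes per dimension."""
    return len(position_codes(chars)) ** dimension


for chars, dimension in [(94, 1), (96, 1), (94, 2), (96, 2)]:
    name = f"{chars}x{chars}" if dimension == 2 else str(chars)
    print(f"{name}-charset: {charset_size(chars, dimension)} characters")
```

   Running it reports 94, 96, 8836, and 9216 characters for the four
categories.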

   The character in parentheses after the name of each charset is the
"final character" F, which can be regarded as the identifier of the
charset.  ECMA allocates F to each charset.  F is in the range
0x30..0x7E, but 0x30..0x3F are only for private use.

   Note: "ECMA" = European Computer Manufacturers Association

   There are four "registers of charsets", called G0 through G3.  You
can designate (or assign) any charset to one of these registers.

   The code space contained within one octet (of size 256) is divided
into 4 areas: C0, GL, C1, and GR.  GL and GR are the areas into which a
charset register can be invoked.

             C0: 0x00 - 0x1F
             GL: 0x20 - 0x7F
             C1: 0x80 - 0x9F
             GR: 0xA0 - 0xFF
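
   The division of the 256 octet values can be expressed as a small
classification function.  This Python sketch (the name `code_area' is
hypothetical) follows the table above exactly, so SPACE (0x20) and DEL
(0x7F) count as part of GL:

```python
def code_area(octet: int) -> str:
    """Classify an octet into the ISO 2022 areas C0, GL, C1, or GR."""
    if not 0x00 <= octet <= 0xFF:
        raise ValueError("an octet must be in the range 0x00-0xFF")
    if octet <= 0x1F:
        return "C0"
    if octet <= 0x7F:
        return "GL"
    if octet <= 0x9F:
        return "C1"
    return "GR"


# 'A', ESC, SS2, and the first EUC-JP byte of a kanji, respectively.
print([code_area(b) for b in b"A\x1b\x8e\xc6"])  # ['GL', 'C0', 'C1', 'GR']
```

   In a 7-bit environment, only the C0 and GL results should ever
occur.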

   Usually, in the initial state, G0 is invoked into GL, and G1 is
invoked into GR.

   ISO 2022 distinguishes 7-bit environments and 8-bit environments.  In
7-bit environments, only C0 and GL are used.

   Charset designation is done by escape sequences of the form:

             ESC [I] I F

   where I is an intermediate character in the range 0x20 - 0x2F, and F
is the final character identifying this charset.

   The meanings of the intermediate characters are:

             $ [0x24]: indicate charset of dimension 2 (94x94 or 96x96).
             ( [0x28]: designate to G0 a 94-charset whose final byte is F.
             ) [0x29]: designate to G1 a 94-charset whose final byte is F.
             * [0x2A]: designate to G2 a 94-charset whose final byte is F.
             + [0x2B]: designate to G3 a 94-charset whose final byte is F.
             - [0x2D]: designate to G1 a 96-charset whose final byte is F.
             . [0x2E]: designate to G2 a 96-charset whose final byte is F.
             / [0x2F]: designate to G3 a 96-charset whose final byte is F.

   The following rule is not allowed in ISO 2022 but can be used in
Mule.

             , [0x2C]: designate to G0 a 96-charset whose final byte is F.

   Here are examples of designations:

             ESC ( B :              designate to G0 ASCII
             ESC - A :              designate to G1 Latin-1
             ESC $ ( A or ESC $ A : designate to G0 GB2312
             ESC $ ( B or ESC $ B : designate to G0 JISX0208
             ESC $ ) C :            designate to G1 KSC5601
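
   The designation rules above are mechanical enough to express in a
few lines.  The following Python sketch (the names `INTERMEDIATE' and
`designation' are hypothetical) builds the escape sequence for a given
register, charset category, and final character; the optional short
form corresponds to the `ESC $ @', `ESC $ A', and `ESC $ B' sequences
selected by the `short' coding-system property:

```python
ESC = "\x1b"

# Intermediate characters from the table above, keyed by (register, chars).
INTERMEDIATE = {
    (0, 94): "(", (1, 94): ")", (2, 94): "*", (3, 94): "+",
    (0, 96): ",",   # Mule-only: ISO 2022 does not allow a 96-charset in G0
    (1, 96): "-", (2, 96): ".", (3, 96): "/",
}


def designation(register: int, chars: int, dimension: int, final: str,
                short: bool = False) -> bytes:
    """Escape sequence designating a charset to G0..G3 (hypothetical helper)."""
    inter = INTERMEDIATE[(register, chars)]
    if dimension == 1:
        seq = ESC + inter + final
    elif short and register == 0 and chars == 94 and final in ("@", "A", "B"):
        seq = ESC + "$" + final          # short form, e.g. ESC $ B
    else:
        seq = ESC + "$" + inter + final  # '$' marks a dimension-2 charset
    return seq.encode("ascii")


print(designation(0, 94, 2, "B"))              # b'\x1b$(B'
print(designation(0, 94, 2, "B", short=True))  # b'\x1b$B'
```

   For example, `designation(1, 94, 2, "C")' yields `ESC $ ) C', the
KSC5601 designation listed above.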

   To use a charset designated to G2 or G3, or a charset designated to
G1 in a 7-bit environment, you must explicitly invoke G1, G2, or G3
into GL.  There are two types of invocation: Locking Shift (which stays
in effect until the next shift) and Single Shift (which affects one
character only).

   Locking Shift is done as follows:

             LS0 or SI (0x0F): invoke G0 into GL
             LS1 or SO (0x0E): invoke G1 into GL
             LS2:  invoke G2 into GL
             LS3:  invoke G3 into GL
             LS1R: invoke G1 into GR
             LS2R: invoke G2 into GR
             LS3R: invoke G3 into GR
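
   Locking shifts make the encoder stateful: once SO has invoked G1
into GL, every following GL byte belongs to G1 until SI restores G0.
The Python sketch below shows this for a 7-bit environment with ASCII
in G0 and a two-byte charset in G1, roughly the arrangement of the
`korean-mail' example later in this section.  The function name and
the tuple-based input are hypothetical, the position codes are
arbitrary values chosen for illustration, and a real encoder would
also handle the end-of-line and control-character resets that this
sketch omits:

```python
SO, SI = 0x0E, 0x0F   # LS1 (invoke G1 into GL) and LS0 (invoke G0 into GL)


def encode_locking_shift(tokens, g1_designation: bytes) -> bytes:
    """Encode TOKENS in a 7-bit environment with G0 = ASCII.

    Each token is either a one-character ASCII string (a G0 character)
    or a tuple of GL position codes (a character of the charset in G1).
    """
    out = bytearray(g1_designation)   # designate G1 once, at the start
    g1_invoked = False                # is G1 currently invoked into GL?
    for token in tokens:
        if isinstance(token, tuple):
            if not g1_invoked:
                out.append(SO)        # stays in effect until SI
                g1_invoked = True
            out.extend(token)
        else:
            if g1_invoked:
                out.append(SI)        # back to ASCII in GL
                g1_invoked = False
            out.extend(token.encode("ascii"))
    if g1_invoked:
        out.append(SI)                # leave the stream in the initial state
    return bytes(out)


data = encode_locking_shift(["a", (0x47, 0x51), (0x4B, 0x5B), "b"], b"\x1b$)C")
print(data)  # b'\x1b$)Ca\x0eGQK[\x0fb'
```

   Note that only one SO is emitted for the two consecutive G1
characters, which is what distinguishes a locking shift from a single
shift.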

   Single Shift is done as follows:

             SS2 or ESC N: invoke G2 into GL
             SS3 or ESC O: invoke G3 into GL

   (#### Ben says: I think the above is slightly incorrect.  It appears
that SS2 invokes G2 into GR and SS3 invokes G3 into GR, whereas ESC N
and ESC O behave as indicated.  The above definitions will not parse
EUC-encoded text correctly, and it looks like the code in mule-coding.c
has similar problems.)

   As you can see, there are many ISO-2022-compliant ways of encoding
multilingual text.  Many coding systems in use, such as X11's Compound
Text, Japanese JUNET code, and so-called EUC (Extended UNIX Code), are
variants of ISO 2022.

   In Mule, we characterize ISO 2022 by the following attributes:

  1. Initial designation to G0 through G3.

  2. Allow designation of short form for Japanese and Chinese.

  3. Should we designate ASCII to G0 before control characters?

  4. Should we designate ASCII to G0 at the end of line?

  5. 7-bit environment or 8-bit environment.

  6. Use Locking Shift or not.

  7. Use ASCII or JISX0201-1976-Roman.

  8. Use JISX0208-1983 or JISX0208-1978.

   (The last two are only for Japanese.)

   By specifying these attributes, you can create any variant of ISO
2022.

   Here are several examples:

     junet -- Coding system used in JUNET.
             1. G0 <- ASCII, G1..3 <- never used
             2. Yes.
             3. Yes.
             4. Yes.
             5. 7-bit environment
             6. No.
             7. Use ASCII
             8. Use JISX0208-1983

     ctext -- Compound Text
             1. G0 <- ASCII, G1 <- Latin-1, G2,3 <- never used
             2. No.
             3. No.
             4. Yes.
             5. 8-bit environment
             6. No.
             7. Use ASCII
             8. Use JISX0208-1983

     euc-china -- Chinese EUC.  Although many people call this
     "GB encoding", the name may cause misunderstanding.
             1. G0 <- ASCII, G1 <- GB2312, G2,3 <- never used
             2. No.
             3. Yes.
             4. Yes.
             5. 8-bit environment
             6. No.
             7. Use ASCII
             8. Use JISX0208-1983

     korean-mail -- Coding system used in Korean networks.
             1. G0 <- ASCII, G1 <- KSC5601, G2,3 <- never used
             2. No.
             3. Yes.
             4. Yes.
             5. 7-bit environment
             6. Yes.
             7. No.
             8. No.

   Mule creates all these coding systems by default.
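
   Python's standard codecs implement several of these ISO 2022
variants, which makes it easy to see how the attributes change the
bytes that are written.  In the sketch below, `iso2022_jp' behaves
like a JUNET-style encoding (7-bit, short-form designation of JIS X
0208 to G0, back to ASCII at the end), while `euc_jp' is an 8-bit EUC
that keeps JIS X 0208 in G1, invoked into GR.  The codec names are
Python's, not Mule's:

```python
text = "日"   # a JIS X 0208 character with position codes 0x46 0x7C

jis = text.encode("iso2022_jp")   # 7-bit: ESC $ B designates JIS X 0208 to G0
euc = text.encode("euc_jp")       # 8-bit: JIS X 0208 in G1, invoked into GR

print(jis)  # b'\x1b$BF|\x1b(B'
print(euc)  # b'\xc6\xfc'

# Same position codes either way: EUC merely sets the high bit.
print(bytes(b & 0x7F for b in euc) == jis[3:5])  # True
```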

\1f
File: lispref.info,  Node: Coding Systems,  Next: CCL,  Prev: ISO 2022,  Up: MULE

Coding Systems
==============

   A coding system is an object that defines how text containing
multiple character sets is encoded into a stream of (typically 8-bit)
bytes.  The coding system is used to decode the stream into a series of
characters (which may be from multiple charsets) when the text is read
from a file or process, and is used to encode the text back into the
same format when it is written out to a file or process.

   For example, many ISO-2022-compliant coding systems (such as Compound
Text, which is used for inter-client data under the X Window System) use
escape sequences to switch between different charsets: Japanese Kanji
is designated with `ESC $ ( B', ASCII with `ESC ( B', and Cyrillic with
`ESC - L'.  See `make-coding-system' for more information.

   Coding systems are normally identified using a symbol, and the
symbol is accepted in place of the actual coding system object whenever
a coding system is called for. (This is similar to how faces and
charsets work.)

 - Function: coding-system-p object
     This function returns non-`nil' if OBJECT is a coding system.

* Menu:

* Coding System Types::               Classifying coding systems.
* EOL Conversion::                    Dealing with different ways of denoting
                                        the end of a line.
* Coding System Properties::          Properties of a coding system.
* Basic Coding System Functions::     Working with coding systems.
* Coding System Property Functions::  Retrieving a coding system's properties.
* Encoding and Decoding Text::        Encoding and decoding text.
* Detection of Textual Encoding::     Determining how text is encoded.
* Big5 and Shift-JIS Functions::      Special functions for these non-standard
                                        encodings.

\1f
File: lispref.info,  Node: Coding System Types,  Next: EOL Conversion,  Up: Coding Systems

Coding System Types
-------------------

`nil'
`autodetect'
     Automatic conversion.  XEmacs attempts to detect the coding system
     used in the file.

`no-conversion'
     No conversion.  Use this for binary files and such.  On output,
     graphic characters that are not in ASCII or Latin-1 will be
     replaced by a `?'. (For a no-conversion-encoded buffer, these
     characters will only be present if you explicitly insert them.)

`shift-jis'
     Shift-JIS (a Japanese encoding commonly used in PC operating
     systems).

`iso2022'
     Any ISO-2022-compliant encoding.  Among other things, this
     includes JIS (the Japanese encoding commonly used for e-mail),
     national variants of EUC (the standard Unix encoding for Japanese
     and other languages), and Compound Text (an encoding used in X11).
     You can specify more specific information about the conversion
     with the PROPS argument to `make-coding-system'.

`big5'
     Big5 (the encoding commonly used for Traditional Chinese,
     especially in Taiwan).

`ccl'
     The conversion is performed using a user-written pseudo-code
     program.  CCL (Code Conversion Language) is the name of this
     pseudo-code.

`internal'
     Write out or read in the raw contents of the memory representing
     the buffer's text.  This is primarily useful for debugging
     purposes, and is only enabled when XEmacs has been compiled with
     `DEBUG_XEMACS' set (the `--debug' configure option).  *Warning*:
     Reading in a file using `internal' conversion can result in an
     internal inconsistency in the memory representing a buffer's text,
     which will produce unpredictable results and may cause XEmacs to
     crash.  Under normal circumstances you should never use `internal'
     conversion.
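
   Python has no coding-system objects like these, but several of its
standard codecs behave like the types above, which can be handy for
checking what a given conversion ought to produce.  The codec names in
this sketch are Python's, not XEmacs's, and Latin-1 encoding with
`errors="replace"' is only an analogue of `no-conversion' output:

```python
text = "aé日"

# Like `no-conversion' on output: graphic characters outside ASCII and
# Latin-1 are replaced by '?'.
print(text.encode("latin-1", errors="replace"))  # b'a\xe9?'

# Like `shift-jis': 日 (JIS 0x467C) becomes the two bytes 0x93 0xFA.
print("日".encode("shift_jis"))                   # b'\x93\xfa'

# Like `big5': a Traditional Chinese character takes two bytes.
print(len("日".encode("big5")))                   # 2
```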

\1f
File: lispref.info,  Node: EOL Conversion,  Next: Coding System Properties,  Prev: Coding System Types,  Up: Coding Systems

EOL Conversion
--------------

`nil'
     Automatically detect the end-of-line type (LF, CRLF, or CR).  Also
     generate subsidiary coding systems named `NAME-unix', `NAME-dos',
     and `NAME-mac', which are identical to this coding system but have
     an EOL-TYPE value of `lf', `crlf', and `cr', respectively.

`lf'
     The end of a line is marked externally using ASCII LF.  Since this
     is also the way that XEmacs represents an end-of-line internally,
     specifying this option results in no end-of-line conversion.  This
     is the standard format for Unix text files.

`crlf'
     The end of a line is marked externally using ASCII CRLF.  This is
     the standard format for MS-DOS text files.

`cr'
     The end of a line is marked externally using ASCII CR.  This is the
     standard format for Macintosh text files.

`t'
     Automatically detect the end-of-line type but do not generate
     subsidiary coding systems.  (This value is converted to `nil' when
     stored internally, and `coding-system-property' will return `nil'.)
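
   The three EOL types and the conversion to the internal LF
representation can be sketched as follows.  This Python code is an
illustration only: the function names are hypothetical, and deciding
the type from the first line terminator is a simplifying assumption,
not necessarily how XEmacs performs detection:

```python
from typing import Optional


def detect_eol(data: bytes) -> Optional[str]:
    """Guess the EOL type from the first line terminator in DATA."""
    for i, byte in enumerate(data):
        if byte == 0x0D:                       # CR: is it followed by LF?
            return "crlf" if data[i + 1:i + 2] == b"\n" else "cr"
        if byte == 0x0A:                       # bare LF
            return "lf"
    return None                                # no terminator seen yet


def decode_eol(data: bytes, eol_type: str) -> bytes:
    """Convert external line ends to the internal LF representation."""
    if eol_type == "crlf":
        return data.replace(b"\r\n", b"\n")
    if eol_type == "cr":
        return data.replace(b"\r", b"\n")
    return data                                # `lf': nothing to convert


def encode_eol(data: bytes, eol_type: str) -> bytes:
    """Convert internal LFs back to the external EOL convention."""
    external = {"lf": b"\n", "crlf": b"\r\n", "cr": b"\r"}[eol_type]
    return data.replace(b"\n", external)


raw = b"line one\r\nline two\r\n"
eol = detect_eol(raw)
print(eol, decode_eol(raw, eol))   # crlf b'line one\nline two\n'
```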

\1f
File: lispref.info,  Node: Coding System Properties,  Next: Basic Coding System Functions,  Prev: EOL Conversion,  Up: Coding Systems

Coding System Properties
------------------------

`mnemonic'
     String to be displayed in the modeline when this coding system is
     active.

`eol-type'
     End-of-line conversion to be used.  It should be one of the types
     listed in *Note EOL Conversion::.

`post-read-conversion'
     Function called after a file has been read in, to perform the
     decoding.  Called with two arguments, BEG and END, denoting a
     region of the current buffer to be decoded.

`pre-write-conversion'
     Function called before a file is written out, to perform the
     encoding.  Called with two arguments, BEG and END, denoting a
     region of the current buffer to be encoded.

   The following additional properties are recognized if TYPE is
`iso2022':

`charset-g0'
`charset-g1'
`charset-g2'
`charset-g3'
     The character set initially designated to the G0 - G3 registers.
     The value should be one of

        * A charset object (designate that character set)

        * `nil' (do not ever use this register)

        * `t' (no character set is initially designated to the
          register, but may be later on; this automatically sets the
          corresponding `force-g*-on-output' property)

`force-g0-on-output'
`force-g1-on-output'
`force-g2-on-output'
`force-g3-on-output'
     If non-`nil', send an explicit designation sequence on output
     before using the specified register.

`short'
     If non-`nil', use the short forms `ESC $ @', `ESC $ A', and
     `ESC $ B' on output in place of the full designation sequences
     `ESC $ ( @', `ESC $ ( A', and `ESC $ ( B'.

`no-ascii-eol'
     If non-`nil', don't designate ASCII to G0 at each end of line on
     output.  Setting this to non-`nil' also suppresses other
     state-resetting that normally happens at the end of a line.

`no-ascii-cntl'
     If non-`nil', don't designate ASCII to G0 before control chars on
     output.

`seven'
     If non-`nil', use 7-bit environment on output.  Otherwise, use
     8-bit environment.

`lock-shift'
     If non-`nil', use locking-shift (SO/SI) instead of single-shift or
     designation by escape sequence.

`no-iso6429'
     If non-`nil', don't use ISO6429's direction specification.

`escape-quoted'
     If non-`nil', literal control characters that are the same as the
     beginning of a recognized ISO 2022 or ISO 6429 escape sequence (in
     particular, ESC (0x1B), SO (0x0E), SI (0x0F), SS2 (0x8E), SS3
     (0x8F), and CSI (0x9B)) are "quoted" with an escape character so
     that they can be properly distinguished from an escape sequence.
     (Note that doing this results in a non-portable encoding.) This
     encoding flag is used for byte-compiled files.  Note that ESC is a
     good choice for a quoting character because there are no escape
     sequences whose second byte is a character from the Control-0 or
     Control-1 character sets; this is explicitly disallowed by the ISO
     2022 standard.

`input-charset-conversion'
     A list of conversion specifications, specifying conversion of
     characters in one charset to another when decoding is performed.
     Each specification is a list of two elements: the source charset,
     and the destination charset.

`output-charset-conversion'
     A list of conversion specifications, specifying conversion of
     characters in one charset to another when encoding is performed.
     The form of each specification is the same as for
     `input-charset-conversion'.
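
   The `escape-quoted' flag can be illustrated with a simple quoting
scheme.  This Python sketch assumes that quoting means writing ESC
immediately before each literal byte listed above; the function names
are hypothetical, and the exact byte format XEmacs uses is not
specified in this section.  Because no escape sequence has a C0 or C1
byte as its second byte, the decoder can tell a quoted byte from a real
escape sequence by looking one byte ahead:

```python
ESC = 0x1B
# ESC, SO, SI, SS2, SS3 and CSI: bytes that can start a control sequence.
QUOTED = {0x1B, 0x0E, 0x0F, 0x8E, 0x8F, 0x9B}


def quote_controls(data: bytes) -> bytes:
    """Prefix each literal QUOTED byte with ESC (assumed scheme)."""
    out = bytearray()
    for byte in data:
        if byte in QUOTED:
            out.append(ESC)
        out.append(byte)
    return bytes(out)


def unquote_controls(data: bytes) -> bytes:
    """Undo quote_controls; ESC followed by a non-QUOTED byte is kept as-is."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == ESC and i + 1 < len(data) and data[i + 1] in QUOTED:
            i += 1                     # drop the quoting ESC
        out.append(data[i])
        i += 1
    return bytes(out)


print(quote_controls(b"a\x1bb\x0e"))   # b'a\x1b\x1bb\x1b\x0e'
```

   A genuine escape sequence such as `ESC ( B' passes through
`unquote_controls' untouched, since `(' is not a control character.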

   The following additional properties are recognized (and required) if
TYPE is `ccl':

`decode'
     CCL program used for decoding (converting to internal format).

`encode'
     CCL program used for encoding (converting to external format).

\1f
File: lispref.info,  Node: Basic Coding System Functions,  Next: Coding System Property Functions,  Prev: Coding System Properties,  Up: Coding Systems

Basic Coding System Functions
-----------------------------

 - Function: find-coding-system coding-system-or-name
     This function retrieves the coding system of the given name.

     If CODING-SYSTEM-OR-NAME is a coding-system object, it is simply
     returned.  Otherwise, CODING-SYSTEM-OR-NAME should be a symbol.
     If there is no such coding system, `nil' is returned.  Otherwise
     the associated coding system object is returned.

 - Function: get-coding-system name
     This function retrieves the coding system of the given name.  It
     is the same as `find-coding-system' except that it signals an
     error, instead of returning `nil', if there is no such coding
     system.

 - Function: coding-system-list
     This function returns a list of the names of all defined coding
     systems.

 - Function: coding-system-name coding-system
     This function returns the name of the given coding system.

 - Function: make-coding-system name type &optional doc-string props
     This function registers symbol NAME as a coding system.

     TYPE describes the conversion method used and should be one of the
     types listed in *Note Coding System Types::.

     DOC-STRING is a string describing the coding system.

     PROPS is a property list, describing the specific nature of the
     coding system.  Recognized properties are as in *Note Coding
     System Properties::.

 - Function: copy-coding-system old-coding-system new-name
     This function copies OLD-CODING-SYSTEM to NEW-NAME.  If NEW-NAME
     does not name an existing coding system, a new one will be created.

 - Function: subsidiary-coding-system coding-system eol-type
     This function returns the subsidiary coding system of
     CODING-SYSTEM with eol type EOL-TYPE.
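
   The difference between `find-coding-system' and
`get-coding-system', and the `NAME-unix', `NAME-dos', and `NAME-mac'
subsidiaries described under EOL Conversion, can be modelled with a
dictionary.  In this Python sketch the registry and all function names
are hypothetical stand-ins for the Lisp functions, and a dictionary of
properties stands in for a coding system object:

```python
from typing import Optional

# Hypothetical registry mapping coding-system names to property dicts.
REGISTRY: dict = {}
EOL_SUFFIX = {"lf": "unix", "crlf": "dos", "cr": "mac"}


def make_coding_system(name: str, type_: str,
                       eol_type: Optional[str] = None) -> None:
    REGISTRY[name] = {"type": type_, "eol-type": eol_type}
    if eol_type is None:  # autodetecting EOL: also make NAME-unix/-dos/-mac
        for eol, suffix in EOL_SUFFIX.items():
            REGISTRY[f"{name}-{suffix}"] = {"type": type_, "eol-type": eol}


def find_coding_system(name: str) -> Optional[dict]:
    return REGISTRY.get(name)               # None plays the role of `nil'


def get_coding_system(name: str) -> dict:
    coding_system = find_coding_system(name)
    if coding_system is None:               # signal an error instead
        raise LookupError(f"No such coding system: {name}")
    return coding_system


def subsidiary_coding_system(name: str, eol_type: str) -> dict:
    return get_coding_system(f"{name}-{EOL_SUFFIX[eol_type]}")


make_coding_system("my-iso2022", "iso2022")
print(sorted(REGISTRY))
# ['my-iso2022', 'my-iso2022-dos', 'my-iso2022-mac', 'my-iso2022-unix']
```

   Here `find_coding_system("no-such-system")' returns `None', the
analogue of `nil', whereas `get_coding_system' raises `LookupError'.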

\1f
File: lispref.info,  Node: Coding System Property Functions,  Next: Encoding and Decoding Text,  Prev: Basic Coding System Functions,  Up: Coding Systems

Coding System Property Functions
--------------------------------

 - Function: coding-system-doc-string coding-system
     This function returns the doc string for CODING-SYSTEM.

 - Function: coding-system-type coding-system
     This function returns the type of CODING-SYSTEM.

 - Function: coding-system-property coding-system prop
     This function returns the PROP property of CODING-SYSTEM.

\1f
File: lispref.info,  Node: Encoding and Decoding Text,  Next: Detection of Textual Encoding,  Prev: Coding System Property Functions,  Up: Coding Systems

Encoding and Decoding Text
--------------------------

 - Function: decode-coding-region start end coding-system &optional
          buffer
     This function decodes the text between START and END which is
     encoded in CODING-SYSTEM.  This is useful if you've read in
     encoded text from a file without decoding it (e.g. you read in a
     JIS-formatted file but used the `binary' or `no-conversion' coding
     system, so that it shows up as `^[$B!<!+^[(B').  The length of the
     decoded text is returned.  BUFFER defaults to the current buffer
     if unspecified.

 - Function: encode-coding-region start end coding-system &optional
          buffer
     This function encodes the text between START and END using
     CODING-SYSTEM.  This will, for example, convert Japanese
     characters into byte sequences such as `^[$B!<!+^[(B' if you use
     the JIS encoding.  The length of the encoded text is returned.
     BUFFER defaults to the current buffer if unspecified.

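   In the `^[$B!<!+^[(B' example above, `^[' is how ESC is displayed,
so the text is ten bytes: `ESC $ B', then `!<!+', then `ESC ( B'.
Decoding those bytes with Python's `iso2022_jp' codec, used here only
as a stand-in for a JIS coding system, shows that `ESC $ B' designates
JIS X 0208, the pairs `!<' and `!+' are the position codes of two
characters, and `ESC ( B' returns to ASCII:

```python
raw = b"\x1b$B!<!+\x1b(B"        # the `^[$B!<!+^[(B' text shown above

text = raw.decode("iso2022_jp")   # analogous to decode-coding-region
print(len(text), [hex(ord(c)) for c in text])  # 2 ['0x30fc', '0x309b']

print(text.encode("iso2022_jp") == raw)       # True: encoding restores it
```

   Encoding the result again reproduces the original bytes, the same
kind of round trip that `decode-coding-region' followed by
`encode-coding-region' with the same coding system is meant to give.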