   1 This is Info file ../../info/lispref.info, produced by Makeinfo version
   2 1.68 from the input file lispref.texi.
   3
   4 INFO-DIR-SECTION XEmacs Editor
   5 START-INFO-DIR-ENTRY
   6 * Lispref: (lispref).           XEmacs Lisp Reference Manual.
   7 END-INFO-DIR-ENTRY
   8
   9    Edition History:
  10
  11    GNU Emacs Lisp Reference Manual Second Edition (v2.01), May 1993
  12    GNU Emacs Lisp Reference Manual Further Revised (v2.02), August 1993
  13    Lucid Emacs Lisp Reference Manual (for 19.10) First Edition, March 1994
  14    XEmacs Lisp Programmer's Manual (for 19.12) Second Edition, April 1995
  15    GNU Emacs Lisp Reference Manual v2.4, June 1995
  16    XEmacs Lisp Programmer's Manual (for 19.13) Third Edition, July 1995
  17    XEmacs Lisp Reference Manual (for 19.14 and 20.0) v3.1, March 1996
  18    XEmacs Lisp Reference Manual (for 19.15 and 20.1, 20.2, 20.3) v3.2, April, May, November 1997
  19    XEmacs Lisp Reference Manual (for 21.0) v3.3, April 1998
  20
  21    Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995 Free Software Foundation, Inc.
  22    Copyright (C) 1994, 1995 Sun Microsystems, Inc.
  23    Copyright (C) 1995, 1996 Ben Wing.
  24
  25    Permission is granted to make and distribute verbatim copies of this
  26 manual provided the copyright notice and this permission notice are
  27 preserved on all copies.
  28
  29    Permission is granted to copy and distribute modified versions of
  30 this manual under the conditions for verbatim copying, provided that the
  31 entire resulting derived work is distributed under the terms of a
  32 permission notice identical to this one.
  33
  34    Permission is granted to copy and distribute translations of this
  35 manual into another language, under the above conditions for modified
  36 versions, except that this permission notice may be stated in a
  37 translation approved by the Foundation.
  38
  39    Permission is granted to copy and distribute modified versions of
  40 this manual under the conditions for verbatim copying, provided also
  41 that the section entitled "GNU General Public License" is included
  42 exactly as in the original, and provided that the entire resulting
  43 derived work is distributed under the terms of a permission notice
  44 identical to this one.
  45
  46    Permission is granted to copy and distribute translations of this
  47 manual into another language, under the above conditions for modified
  48 versions, except that the section entitled "GNU General Public License"
  49 may be included in a translation approved by the Free Software
  50 Foundation instead of in the original English.
  51
  52 \1f
  53 File: lispref.info,  Node: Level 3 Primitives,  Next: Dynamic Messaging,  Prev: Level 3 Basics,  Up: I18N Level 3
  54
  55 Level 3 Primitives
  56 ------------------
  57
  58  - Function: gettext STRING
  59      This function looks up STRING in the default message domain and
  60      returns its translation.  If `I18N3' was not enabled when XEmacs
  61      was compiled, it just returns STRING.
  62
  63  - Function: dgettext DOMAIN STRING
  64      This function looks up STRING in the specified message domain and
  65      returns its translation.  If `I18N3' was not enabled when XEmacs
  66      was compiled, it just returns STRING.
  67
  68  - Function: bind-text-domain DOMAIN PATHNAME
  69      This function associates a pathname with a message domain.  Here's
  70      how the path to the message file is constructed under SunOS 5.x:
  71
  72           `{pathname}/{LANG}/LC_MESSAGES/{domain}.mo'
  73
  74      If `I18N3' was not enabled when XEmacs was compiled, this function
  75      does nothing.
  76
  77  - Special Form: domain STRING
  78      This special form specifies the text domain used for translating
  79      documentation strings and interactive prompts of a function.  For
  80      example, write:
  81
  82           (defun foo (arg) "Doc string" (domain "emacs-foo") ...)
  83
  84      to specify `emacs-foo' as the text domain of the function `foo'.
  85      The "call" to `domain' is actually a declaration rather than a
  86      function; when actually called, `domain' just returns `nil'.
  87
  88  - Function: domain-of FUNCTION
  89      This function returns the text domain of FUNCTION; it returns
  90      `nil' if it is the default domain.  If `I18N3' was not enabled
  91      when XEmacs was compiled, it always returns `nil'.
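
   The path template given above for `bind-text-domain' can be sketched
outside of XEmacs.  The following Python fragment is only an
illustration, not XEmacs code: the function name `message_file_path' and
the sample directory, locale and domain are assumptions invented for
the example.

```python
import posixpath


def message_file_path(pathname, lang, domain):
    """Assemble {pathname}/{LANG}/LC_MESSAGES/{domain}.mo from its parts."""
    return posixpath.join(pathname, lang, "LC_MESSAGES", domain + ".mo")


# Hypothetical values: a directory bound to the domain, and LANG=ja.
example_path = message_file_path("/usr/lib/locale", "ja", "emacs-gorilla")
print(example_path)
```

   Under these assumptions the domain `emacs-gorilla' bound to
`/usr/lib/locale' would be looked up in a file below the `ja'
subdirectory.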
  92
  93 \1f
  94 File: lispref.info,  Node: Dynamic Messaging,  Next: Domain Specification,  Prev: Level 3 Primitives,  Up: I18N Level 3
  95
  96 Dynamic Messaging
  97 -----------------
  98
  99    The `format' function has been extended to permit you to change the
 100 order of parameter insertion.  For example, the conversion format
 101 `%1$s' inserts parameter one as a string, while `%2$s' inserts
 102 parameter two.  This is useful when creating translations which require
 103 you to change the word order.
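
   The effect of positional conversions can be sketched with a small
stand-in.  The Python fragment below is not the XEmacs `format'
implementation; it handles only the `%N$s' form mentioned above, and the
name `format_positional' and the sample sentences are assumptions made
for illustration.

```python
import re

# Matches conversions of the form %1$s, %2$s, ...
_POSITIONAL = re.compile(r"%(\d+)\$s")


def format_positional(template, *args):
    """Replace each %N$s in TEMPLATE with the N-th argument (counting from 1)."""
    def substitute(match):
        index = int(match.group(1))
        return str(args[index - 1])
    return _POSITIONAL.sub(substitute, template)


# The same two arguments, inserted in a different order by the "translation".
original = format_positional("%1$s chases %2$s", "cat", "mouse")
reordered = format_positional("%2$s is chased by %1$s", "cat", "mouse")
print(original)
print(reordered)
```

   The caller passes the arguments in one fixed order; only the
translated template decides where each one lands.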
 104
 105 \1f
 106 File: lispref.info,  Node: Domain Specification,  Next: Documentation String Extraction,  Prev: Dynamic Messaging,  Up: I18N Level 3
 107
 108 Domain Specification
 109 --------------------
 110
 111    The default message domain of XEmacs is `emacs'.  For add-on
 112 packages, it is best to use a different domain.  For example, let us
 113 say we want to convert the "gorilla" package to use the domain
 114 `emacs-gorilla'.  To translate the message "What gorilla?", use
 115 `dgettext' as follows:
 116
 117      (dgettext "emacs-gorilla" "What gorilla?")
 118
 119    A function (or macro) which has a documentation string or an
 120 interactive prompt needs to be associated with the domain in order for
 121 the documentation or prompt to be translated.  This is done with the
 122 `domain' special form as follows:
 123
 124      (defun scratch (location)
 125        "Scratch the specified location."
 126        (domain "emacs-gorilla")
 127        (interactive "sScratch: ")
 128        ... )
 129
 130    It is most efficient to specify the domain in the first line of the
 131 function body, before the `interactive' form.
 132
 133    For variables and constants which have documentation strings,
 134 specify the domain after the documentation.
 135
 136  - Special Form: defvar SYMBOL [VALUE [DOC-STRING [DOMAIN]]]
 137      Example:
 138           (defvar weight 250 "Weight of gorilla, in pounds." "emacs-gorilla")
 139
 140  - Special Form: defconst SYMBOL [VALUE [DOC-STRING [DOMAIN]]]
 141      Example:
 142           (defconst limbs 4 "Number of limbs" "emacs-gorilla")
 143
 144    Autoloaded functions which are specified in `loaddefs.el' do not need
 145 to have a domain specification, because their documentation strings are
 146 extracted into the main message base.  However, for autoloaded functions
 147 which are specified in a separate package, use the following syntax:
 148
 149  - Function: autoload SYMBOL FILENAME &optional DOCSTRING INTERACTIVE
 150           MACRO DOMAIN
 151      Example:
 152           (autoload 'explore "jungle" "Explore the jungle." nil nil "emacs-gorilla")
 153
 154 \1f
 155 File: lispref.info,  Node: Documentation String Extraction,  Prev: Domain Specification,  Up: I18N Level 3
 156
 157 Documentation String Extraction
 158 -------------------------------
 159
 160    The utility `etc/make-po' scans the file `DOC' to extract
 161 documentation strings and creates a message file `doc.po'.  This file
 162 may then be inserted within `emacs.po'.
 163
 164    Currently, `make-po' is hard-coded to read from `DOC' and write to
 165 `doc.po'.  In order to extract documentation strings from an add-on
 166 package, first run `make-docfile' on the package to produce the `DOC'
 167 file.  Then run `make-po' with the `-p' argument to indicate that we
 168 are extracting documentation for an add-on package.
 169
 170    (The `-p' argument is a kludge to make up for a subtle difference
 171 between pre-loaded documentation and add-on documentation:  For add-on
 172 packages, the final carriage returns in the strings produced by
 173 `make-docfile' must be ignored.)
 174
 175 \1f
 176 File: lispref.info,  Node: I18N Level 4,  Prev: I18N Level 3,  Up: Internationalization
 177
 178 I18N Level 4
 179 ============
 180
 181    The Asian-language support in XEmacs is called "MULE".  *Note MULE::.
 182
 183 \1f
 184 File: lispref.info,  Node: MULE,  Next: Tips,  Prev: Internationalization,  Up: Top
 185
 186 MULE
 187 ****
 188
 189    "MULE" is the name originally given to the version of GNU Emacs
 190 extended for multi-lingual (and in particular Asian-language) support.
 191 "MULE" is short for "MUlti-Lingual Emacs".  It was originally called
 192 Nemacs ("Nihon Emacs" where "Nihon" is the Japanese word for "Japan"),
 193 when it only provided support for Japanese.  XEmacs refers to its
 194 multi-lingual support as "MULE support" since it is based on "MULE".
 195
 196 * Menu:
 197
 198 * Internationalization Terminology::
 199                         Definition of various internationalization terms.
 200 * Charsets::            Sets of related characters.
 201 * MULE Characters::     Working with characters in XEmacs/MULE.
 202 * Composite Characters:: Making new characters by overstriking other ones.
 203 * ISO 2022::            An international standard for charsets and encodings.
 204 * Coding Systems::      Ways of representing a string of chars using integers.
 205 * CCL::                 A special language for writing fast converters.
 206 * Category Tables::     Subdividing charsets into groups.
 207
 208 \1f
 209 File: lispref.info,  Node: Internationalization Terminology,  Next: Charsets,  Up: MULE
 210
 211 Internationalization Terminology
 212 ================================
 213
 214    In internationalization terminology, a string of text is divided up
 215 into "characters", which are the printable units that make up the text.
 216 A single character is (for example) a capital `A', the number `2', a
 217 Katakana character, a Kanji ideograph (an "ideograph" is a "picture"
 218 character, such as is used in Japanese Kanji, Chinese Hanzi, and Korean
 219 Hanja; typically there are thousands of such ideographs in each
 220 language), etc.  The basic property of a character is its shape.  Note
 221 that the same character may be drawn by two different people (or in two
 222 different fonts) in slightly different ways, although the basic shape
 223 will be the same.
 224
 225    In some cases, the differences will be significant enough that it is
 226 actually possible to identify two or more distinct shapes that both
 227 represent the same character.  For example, the lowercase letters `a'
 228 and `g' each have two distinct possible shapes - the `a' can optionally
 229 have a curved tail projecting off the top, and the `g' can be formed
 230 either of two loops, or of one loop and a tail hanging off the bottom.
 231 Such distinct possible shapes of a character are called "glyphs".  The
 232 important characteristic of two glyphs making up the same character is
 233 that the choice between one or the other is purely stylistic and has no
 234 linguistic effect on a word (this is the reason why a capital `A' and
 235 lowercase `a' are different characters rather than different glyphs -
 236 e.g.  `Aspen' is a city while `aspen' is a kind of tree).
 237
 238    Note that "character" and "glyph" are used differently here than
 239 elsewhere in XEmacs.
 240
 241    A "character set" is simply a set of related characters.  ASCII, for
 242 example, is a set of 94 characters (or 128, if you count non-printing
 243 characters).  Other character sets are ISO8859-1 (ASCII plus various
 244 accented characters and other international symbols), JISX0201 (ASCII,
 245 more or less, plus half-width Katakana), JISX0208 (Japanese Kanji),
 246 JISX0212 (a second set of less-used Japanese Kanji), GB2312 (Mainland
 247 Chinese Hanzi), etc.
 248
 249    Every character set has one or more "orderings", which can be viewed
 250 as a way of assigning a number (or set of numbers) to each character in
 251 the set.  For most character sets, there is a standard ordering, and in
 252 fact all of the character sets mentioned above define a particular
 253 ordering.  ASCII, for example, places letters in their "natural" order,
 254 puts uppercase letters before lowercase letters, numbers before
 255 letters, etc.  Note that for many of the Asian character sets, there is
 256 no natural ordering of the characters.  The actual orderings are based
 257 on one or more salient characteristics, of which there are many to
 258 choose from - e.g. number of strokes, common radicals, phonetic
 259 ordering, etc.
 260
 261    The numbers assigned to any particular character are called
 262 the character's "position codes".  The number of position codes
 263 required to index a particular character in a character set is called
 264 the "dimension" of the character set.  ASCII, being a relatively small
 265 character set, is of dimension one, and each character in the set is
 266 indexed using a single position code, in the range 0 through 127 (if
 267 non-printing characters are included) or 33 through 126 (if only the
 268 printing characters are considered).  JISX0208, i.e.  Japanese Kanji,
 269 has thousands of characters, and is of dimension two - every character
 270 is indexed by two position codes, each in the range 33 through 126.
 271 (Note that the choice of the range here is somewhat arbitrary.
 272 Although a character set such as JISX0208 defines an *ordering* of all
 273 its characters, it does not define the actual mapping between numbers
 274 and characters.  You could just as easily index the characters in
 275 JISX0208 using numbers in the range 0 through 93, 1 through 94, 2
 276 through 95, etc.  The reason for the actual range chosen is so that the
 277 position codes match up with the actual values used in the common
 278 encodings.)
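
   Position codes and dimension can be made concrete with a short
sketch.  The Python fragment below is an illustration only; it leans on
Python's `euc_jp' codec (which stores JISX0208 position codes with the
high bit set) to recover the codes, and the helper names
`position_codes' and `rebase' are assumptions invented here.

```python
def position_codes(char):
    """Return the position codes (each 33..126) of an ASCII or JISX0208 character."""
    raw = char.encode("euc_jp")
    if len(raw) == 1 and raw[0] < 0x80:
        return (raw[0],)                      # ASCII: one position code
    if len(raw) == 2 and raw[0] >= 0xA1:
        return tuple(b & 0x7F for b in raw)   # JISX0208: two position codes
    raise ValueError("not an ASCII or JISX0208 character: %r" % char)


def rebase(codes, first=0):
    """Re-index CODES so that the range starts at FIRST instead of 33."""
    return tuple(c - 33 + first for c in codes)


ascii_codes = position_codes("A")
kana_codes = position_codes("\u3042")   # HIRAGANA LETTER A, a JISX0208 character

print(len(ascii_codes), ascii_codes)    # dimension one
print(len(kana_codes), kana_codes)      # dimension two
print(rebase(kana_codes), rebase(kana_codes, 1))
```

   The last line shows the same character indexed in the ranges 0
through 93 and 1 through 94; only the numbering changes, not the
ordering.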
 279
 280    An "encoding" is a way of numerically representing characters from
 281 one or more character sets as a stream of like-sized numerical values
 282 called "words"; typically these are 8-bit, 16-bit, or 32-bit
 283 quantities.  If an encoding encompasses only one character set, then the
 284 position codes for the characters in that character set could be used
 285 directly. (This is the case with ASCII, and as a result, most people do
 286 not understand the difference between a character set and an encoding.)
 287 This is not possible, however, if more than one character set is to be
 288 used in the encoding.  For example, printed Japanese text typically
 289 requires characters from multiple character sets - ASCII, JISX0208, and
 290 JISX0212, to be specific.  Each of these is indexed using one or more
 291 position codes in the range 33 through 126, so the position codes could
 292 not be used directly or there would be no way to tell which character
 293 was meant.  Different Japanese encodings handle this differently - JIS
 294 uses special escape characters to denote different character sets; EUC
 295 sets the high bit of the position codes for JISX0208 and JISX0212, and
 296 puts a special extra byte before each JISX0212 character; etc. (JIS,
 297 EUC, and most of the other encodings you will encounter are 7-bit or
 298 8-bit encodings.  There is one common 16-bit encoding, which is Unicode;
 299 this strives to represent all the world's characters in a single large
 300 character set.  32-bit encodings are generally used internally in
 301 programs to simplify the code that manipulates them; however, they are
 302 not much used externally because they are not very space-efficient.)
 303
 304    Encodings are classified as either "modal" or "non-modal".  In a
 305 "modal encoding", there are multiple states that the encoding can be in,
 306 and the interpretation of the values in the stream depends on the
 307 current global state of the encoding.  Special values in the encoding,
 308 called "escape sequences", are used to change the global state.  JIS,
 309 for example, is a modal encoding.  The bytes `ESC $ B' indicate that,
 310 from then on, bytes are to be interpreted as position codes for
 311 JISX0208, rather than as ASCII.  This effect is cancelled using the
 312 bytes `ESC ( B', which mean "switch from whatever the current state is
 313 to ASCII".  To switch to JISX0212, use the escape sequence `ESC $ ( D'.
 314 (Note that here, as is common, the escape sequences do in fact begin
 315 with `ESC'.  This is not necessarily the case, however.)
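
   The state-switching behaviour of a modal encoding can be sketched as
a small decoder.  The Python fragment below is an illustration under
simplifying assumptions: it knows only the three escape sequences named
above, and the names `JIS_ESCAPES' and `decode_jis' are invented for the
example.

```python
# Escape sequence -> (charset selected, number of position codes per character)
JIS_ESCAPES = {
    b"\x1b(B": ("ascii", 1),
    b"\x1b$B": ("jisx0208", 2),
    b"\x1b$(D": ("jisx0212", 2),
}


def decode_jis(stream):
    """Split a JIS byte stream into (charset, position-codes) pairs."""
    charset, dimension = "ascii", 1          # the stream starts out in ASCII
    result = []
    i = 0
    while i < len(stream):
        if stream[i] == 0x1B:
            for sequence, state in JIS_ESCAPES.items():
                if stream.startswith(sequence, i):
                    charset, dimension = state   # change the global state
                    i += len(sequence)
                    break
            else:
                raise ValueError("unknown escape sequence at offset %d" % i)
            continue
        codes = tuple(stream[i:i + dimension])
        if len(codes) != dimension:
            raise ValueError("truncated character at offset %d" % i)
        result.append((charset, codes))
        i += dimension
    return result


sample = b'A\x1b$B$"\x1b(BZ'
decoded = decode_jis(sample)
print(decoded)
```

   The two bytes between the escape sequences are read as one JISX0208
character only because of the state set by `ESC $ B'; the same bytes
after `ESC ( B' would be two ASCII characters.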
 316
 317    A "non-modal encoding" has no global state that extends past the
 318 character currently being interpreted.  EUC, for example, is a
 319 non-modal encoding.  Characters in JISX0208 are encoded by setting the
 320 high bit of the position codes, and characters in JISX0212 are encoded
 321 by doing the same but also prefixing the character with the byte 0x8F.
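
   The EUC rules just described are simple enough to write down
directly.  The Python fragment below is a sketch rather than a complete
EUC encoder; it covers only the three character sets discussed here, and
the name `euc_encode' is an assumption made for the example.

```python
def euc_encode(charset, codes):
    """Encode one character, given its charset name and position codes, as EUC bytes."""
    if charset == "ascii":
        return bytes(codes)                   # ASCII bytes are used unchanged
    high = bytes(c | 0x80 for c in codes)     # set the high bit of each position code
    if charset == "jisx0208":
        return high
    if charset == "jisx0212":
        return b"\x8f" + high                 # same, but prefixed with the byte 0x8F
    raise ValueError("unsupported charset: %s" % charset)


ascii_bytes = euc_encode("ascii", (0x41,))
jisx0208_bytes = euc_encode("jisx0208", (0x24, 0x22))
jisx0212_bytes = euc_encode("jisx0212", (0x30, 0x21))
print(ascii_bytes, jisx0208_bytes, jisx0212_bytes)
```

   Every byte identifies its own role: a byte below 0x80 is ASCII, 0x8F
announces a JISX0212 character, and any other high-bit byte belongs to a
JISX0208 character, so no state has to be carried between characters.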
 322
 323    The advantage of a modal encoding is that it is generally more
 324 space-efficient, and is easily extendable because there are essentially
 325 an arbitrary number of escape sequences that can be created.  The
 326 disadvantage, however, is that it is much more difficult to work with
 327 if it is not being processed in a sequential manner.  In the non-modal
 328 EUC encoding, for example, the byte 0x41 always refers to the letter
 329 `A'; whereas in JIS, it could either be the letter `A', or one of the
 330 two position codes in a JISX0208 character, or one of the two position
 331 codes in a JISX0212 character.  Determining exactly which one is meant
 332 could be difficult and time-consuming if the previous bytes in the
 333 string have not already been processed.
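
   The cost of looking at a byte out of sequence can be sketched as
follows.  The Python fragment is an illustration only; it recognizes
just the three JIS escape sequences mentioned earlier, and the names
`euc_is_ascii' and `jis_charset_at' are assumptions invented here.

```python
JIS_ESCAPES = {
    b"\x1b(B": "ascii",
    b"\x1b$B": "jisx0208",
    b"\x1b$(D": "jisx0212",
}


def euc_is_ascii(stream, offset):
    """In EUC the byte itself answers the question."""
    return stream[offset] < 0x80


def jis_charset_at(stream, offset):
    """In JIS the whole prefix must be replayed to learn the current state."""
    if not 0 <= offset < len(stream):
        raise IndexError(offset)
    charset = "ascii"
    i = 0
    while i <= offset:
        for sequence, name in JIS_ESCAPES.items():
            if stream.startswith(sequence, i):
                if offset < i + len(sequence):
                    return "escape"           # OFFSET lies inside the sequence
                charset = name
                i += len(sequence)
                break
        else:
            if i == offset:
                return charset
            i += 1
    raise AssertionError("unreachable")


# The byte 0x41 ('A') occurs at offsets 0, 4, 5 and 9 of this JIS stream.
jis = b"A\x1b$BAA\x1b(BA"
states = [jis_charset_at(jis, n) for n in (0, 4, 5, 9)]
print(states)
```

   The EUC test inspects one byte, while the JIS test has to scan
everything before the byte in question.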
 334
 335    Non-modal encodings are further divided into "fixed-width" and
 336 "variable-width" formats.  A fixed-width encoding always uses the same
 337 number of words per character, whereas a variable-width encoding does
 338 not.  EUC is a good example of a variable-width encoding: one to three
 339 bytes are used per character, depending on the character set.  16-bit
 340 and 32-bit encodings are nearly always fixed-width, and this is in fact
 341 one of the main reasons for using an encoding with a larger word size.
 342 The advantages of fixed-width encodings should be obvious.  The
 343 advantages of variable-width encodings are that they are generally more
 344 space-efficient and allow for compatibility with existing 8-bit
 345 encodings such as ASCII.
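
   The practical difference between the two formats shows up when the
Nth character is wanted.  The Python sketch below is an illustration
under stated assumptions: UTF-32 (big-endian, no byte-order mark) stands
in for "a 32-bit fixed-width encoding", and the helper names are
invented for the example.

```python
def euc_char_width(lead):
    """Number of bytes in an EUC character, judged from its first byte."""
    if lead < 0x80:
        return 1        # ASCII
    if lead == 0x8F:
        return 3        # JISX0212: prefix byte plus two position codes
    return 2            # JISX0208 (or another two-byte form)


def euc_split(stream):
    """Variable width: the stream must be walked from the start."""
    chars = []
    i = 0
    while i < len(stream):
        width = euc_char_width(stream[i])
        chars.append(stream[i:i + width])
        i += width
    return chars


def utf32_nth(stream, n):
    """Fixed width: the Nth character is found by arithmetic alone."""
    return stream[4 * n:4 * n + 4]


text = "A\u3042"
euc = text.encode("euc_jp")
utf32 = text.encode("utf_32_be")
print(len(euc), euc_split(euc))
print(len(utf32), utf32_nth(utf32, 1))
```

   The EUC form of the two characters is shorter, but only the 32-bit
form allows direct indexing.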
 346
 347    Note that the bytes in an 8-bit encoding are often referred to as
 348 "octets" rather than simply as bytes.  This terminology dates back to
 349 the days before 8-bit bytes were universal, when some computers had
 350 9-bit bytes, others had 10-bit bytes, etc.
 351
 352 \1f
 353 File: lispref.info,  Node: Charsets,  Next: MULE Characters,  Prev: Internationalization Terminology,  Up: MULE
 354
 355 Charsets
 356 ========
 357
 358    A "charset" in MULE is an object that encapsulates a particular
 359 character set as well as an ordering of those characters.  Charsets are
 360 permanent objects and are named using symbols, like faces.
 361
 362  - Function: charsetp OBJECT
 363      This function returns non-`nil' if OBJECT is a charset.
 364
 365 * Menu:
 366
 367 * Charset Properties::          Properties of a charset.
 368 * Basic Charset Functions::     Functions for working with charsets.
 369 * Charset Property Functions::  Functions for accessing charset properties.
 370 * Predefined Charsets::         Predefined charset objects.
 371
 372 \1f
 373 File: lispref.info,  Node: Charset Properties,  Next: Basic Charset Functions,  Up: Charsets
 374
 375 Charset Properties
 376 ------------------
 377
 378    Charsets have the following properties:
 379
 380 `name'
 381      A symbol naming the charset.  Every charset must have a different
 382      name; this allows a charset to be referred to using its name
 383      rather than the actual charset object.
 384
 385 `doc-string'
 386      A documentation string describing the charset.
 387
 388 `registry'
 389      A regular expression matching the font registry field for this
 390      character set.  For example, both the `ascii' and `latin-iso8859-1'
 391      charsets use the registry `"ISO8859-1"'.  This field is used to
 392      choose an appropriate font when the user gives a general font
 393      specification such as `-*-courier-medium-r-*-140-*', i.e. a
 394      14-point upright medium-weight Courier font.
 395
 396 `dimension'
 397      Number of position codes used to index a character in the
 398      character set.  XEmacs/MULE can only handle character sets of
 399      dimension 1 or 2.  This property defaults to 1.
 400
 401 `chars'
 402      Number of characters in each dimension.  In XEmacs/MULE, the only
 403      allowed values are 94 or 96. (There are a couple of pre-defined
 404      character sets, such as ASCII, that do not follow this, but you
 405      cannot define new ones like this.) Defaults to 94.  Note that if
 406      the dimension is 2, the character set thus described is 94x94 or
 407      96x96.
 408
 409 `columns'
 410      Number of columns used to display a character in this charset.
 411      Only used in TTY mode. (Under X, the actual width of a character
 412      can be derived from the font used to display the characters.)  If
 413      unspecified, defaults to the dimension. (This is almost always the
 414      correct value, because character sets with dimension 2 are usually
 415      ideograph character sets, which need two columns to display the
 416      intricate ideographs.)
 417
 418 `direction'
 419      A symbol, either `l2r' (left-to-right) or `r2l' (right-to-left).
 420      Defaults to `l2r'.  This specifies the direction that the text
 421      should be displayed in, and will be left-to-right for most
 422      charsets but right-to-left for Hebrew and Arabic. (Right-to-left
 423      display is not currently implemented.)
 424
 425 `final'
 426      Final byte of the standard ISO 2022 escape sequence designating
 427      this charset.  Must be supplied.  Each combination of (DIMENSION,
 428      CHARS) defines a separate namespace for final bytes, and each
 429      charset within a particular namespace must have a different final
 430      byte.  Note that ISO 2022 restricts the final byte to the range
 431      0x30 - 0x7E if dimension == 1, and 0x30 - 0x5F if dimension == 2.
 432      Note also that final bytes in the range 0x30 - 0x3F are reserved
 433      for user-defined (not official) character sets.  For more
 434      information on ISO 2022, see *Note Coding Systems::.
 435
 436 `graphic'
 437      0 (use left half of font on output) or 1 (use right half of font on
 438      output).  Defaults to 0.  This specifies how to convert the
 439      position codes that index a character in a character set into an
 440      index into the font used to display the character set.  With
 441      `graphic' set to 0, position codes 33 through 126 map to font
 442      indices 33 through 126; with it set to 1, position codes 33
 443      through 126 map to font indices 161 through 254 (i.e. the same
 444      number but with the high bit set).  For example, for a font whose
 445      registry is ISO8859-1, the left half of the font (octets 0x20 -
 446      0x7F) is the `ascii' charset, while the right half (octets 0xA0 -
 447      0xFF) is the `latin-iso8859-1' charset.
 448
 449 `ccl-program'
 450      A compiled CCL program used to convert a character in this charset
 451      into an index into the font.  This is in addition to the `graphic'
 452      property.  If a CCL program is defined, the position codes of a
 453      character will first be processed according to `graphic' and then
 454      passed through the CCL program, with the resulting values used to
 455      index the font.
 456
 457      This is used, for example, in the Big5 character set (used in
 458      Taiwan).  This character set is not ISO-2022-compliant, and its
 459      size (94x157) does not fit within the maximum 96x96 size of
 460      ISO-2022-compliant character sets.  As a result, XEmacs/MULE
 461      splits it (in a rather complex fashion, so as to group the most
 462      commonly used characters together) into two charset objects
 463      (`big5-1' and `big5-2'), each of size 94x94, and each charset
 464      object uses a CCL program to convert the modified position codes
 465      back into standard Big5 indices to retrieve a character from a
 466      Big5 font.
 467
 468    Most of the above properties can only be changed when the charset is
 469 created.  *Note Charset Property Functions::.
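
   How some of these properties interact can be sketched with a small
record type.  The Python fragment below is only a model of the
descriptions above, not of the XEmacs charset object; the class name
`CharsetProps', its method names, and the use of 0 to mean "columns
unspecified" are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharsetProps:
    name: str
    final: str              # must be supplied
    dimension: int = 1      # defaults to 1
    chars: int = 94         # defaults to 94
    graphic: int = 0        # defaults to 0 (left half of the font)
    direction: str = "l2r"  # defaults to left-to-right
    columns: int = 0        # 0 here stands for "unspecified"

    def size(self):
        """Total number of positions: 94, 96, 94x94 or 96x96."""
        return self.chars ** self.dimension

    def display_columns(self):
        """Columns default to the dimension when unspecified."""
        return self.columns or self.dimension

    def font_index(self, code):
        """Map one position code to a font index according to `graphic'."""
        return code | 0x80 if self.graphic == 1 else code


jisx0208 = CharsetProps("japanese-jisx0208", "B", dimension=2)
right_half = CharsetProps("example-right-half", "A", graphic=1)  # hypothetical charset

print(jisx0208.size(), jisx0208.display_columns())
print(right_half.font_index(33), right_half.font_index(126))
```

   With `graphic' set to 1, position codes 33 through 126 land on font
indices 161 through 254, as stated above.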
 470
 471 \1f
 472 File: lispref.info,  Node: Basic Charset Functions,  Next: Charset Property Functions,  Prev: Charset Properties,  Up: Charsets
 473
 474 Basic Charset Functions
 475 -----------------------
 476
 477  - Function: find-charset CHARSET-OR-NAME
 478      This function retrieves the charset of the given name.  If
 479      CHARSET-OR-NAME is a charset object, it is simply returned.
 480      Otherwise, CHARSET-OR-NAME should be a symbol.  If there is no
 481      such charset, `nil' is returned.  Otherwise the associated charset
 482      object is returned.
 483
 484  - Function: get-charset NAME
 485      This function retrieves the charset of the given name.  Same as
 486      `find-charset' except an error is signalled if there is no such
 487      charset instead of returning `nil'.
 488
 489  - Function: charset-list
 490      This function returns a list of the names of all defined charsets.
 491
 492  - Function: make-charset NAME DOC-STRING PROPS
 493      This function defines a new character set.  This function is for
 494      use with Mule support.  NAME is a symbol, the name by which the
 495      character set is normally referred.  DOC-STRING is a string
 496      describing the character set.  PROPS is a property list,
 497      describing the specific nature of the character set.  The
 498      recognized properties are `registry', `dimension', `columns',
 499      `chars', `final', `graphic', `direction', and `ccl-program', as
 500      previously described.
 501
 502  - Function: make-reverse-direction-charset CHARSET NEW-NAME
 503      This function makes a charset equivalent to CHARSET but which goes
 504      in the opposite direction.  NEW-NAME is the name of the new
 505      charset.  The new charset is returned.
 506
 507  - Function: charset-from-attributes DIMENSION CHARS FINAL &optional
 508           DIRECTION
 509      This function returns a charset with the given DIMENSION, CHARS,
 510      FINAL, and DIRECTION.  If DIRECTION is omitted, both directions
 511      will be checked (left-to-right will be returned if character sets
 512      exist for both directions).
 513
 514  - Function: charset-reverse-direction-charset CHARSET
 515      This function returns the charset (if any) with the same dimension,
 516      number of characters, and final byte as CHARSET, but which is
 517      displayed in the opposite direction.
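
   The lookup behaviour described for `find-charset', `get-charset' and
`charset-from-attributes' can be modelled in a few lines.  The Python
fragment below is a toy model, not the XEmacs implementation: charsets
are plain dictionaries, `define_charset' is an invented helper, and
`LookupError' stands in for the Lisp error.

```python
CHARSETS = {}   # name -> property dictionary, standing in for charset objects


def define_charset(name, dimension, chars, final, direction="l2r"):
    CHARSETS[name] = {"name": name, "dimension": dimension, "chars": chars,
                      "final": final, "direction": direction}
    return CHARSETS[name]


def find_charset(charset_or_name):
    """Return the charset, or None (the analogue of nil) if there is none."""
    if isinstance(charset_or_name, dict):
        return charset_or_name               # already a charset: return it as is
    return CHARSETS.get(charset_or_name)


def get_charset(name):
    """Like find_charset, but signal an error instead of returning None."""
    charset = find_charset(name)
    if charset is None:
        raise LookupError("no such charset: %s" % name)
    return charset


def charset_from_attributes(dimension, chars, final, direction=None):
    """Check both directions when DIRECTION is omitted, preferring l2r."""
    wanted = (direction,) if direction else ("l2r", "r2l")
    for d in wanted:
        for cs in CHARSETS.values():
            key = (cs["dimension"], cs["chars"], cs["final"], cs["direction"])
            if key == (dimension, chars, final, d):
                return cs
    return None


define_charset("ascii-r2l", 1, 94, "B", "r2l")
define_charset("ascii", 1, 94, "B")

print(find_charset("no-such-charset"))
print(charset_from_attributes(1, 94, "B")["name"])
print(charset_from_attributes(1, 94, "B", "r2l")["name"])
```

   Both sample charsets share the same dimension, number of characters
and final byte, so the direction alone tells them apart.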
 518
 519 \1f
 520 File: lispref.info,  Node: Charset Property Functions,  Next: Predefined Charsets,  Prev: Basic Charset Functions,  Up: Charsets
 521
 522 Charset Property Functions
 523 --------------------------
 524
 525    All of these functions accept either a charset name or charset
 526 object.
 527
 528  - Function: charset-property CHARSET PROP
 529      This function returns property PROP of CHARSET.  *Note Charset
 530      Properties::.
 531
 532    Convenience functions are also provided for retrieving individual
 533 properties of a charset.
 534
 535  - Function: charset-name CHARSET
 536      This function returns the name of CHARSET.  This will be a symbol.
 537
 538  - Function: charset-doc-string CHARSET
 539      This function returns the doc string of CHARSET.
 540
 541  - Function: charset-registry CHARSET
 542      This function returns the registry of CHARSET.
 543
 544  - Function: charset-dimension CHARSET
 545      This function returns the dimension of CHARSET.
 546
 547  - Function: charset-chars CHARSET
 548      This function returns the number of characters per dimension of
 549      CHARSET.
 550
 551  - Function: charset-columns CHARSET
 552      This function returns the number of display columns per character
 553      (in TTY mode) of CHARSET.
 554
 555  - Function: charset-direction CHARSET
 556      This function returns the display direction of CHARSET - either
 557      `l2r' or `r2l'.
 558
 559  - Function: charset-final CHARSET
 560      This function returns the final byte of the ISO 2022 escape
 561      sequence designating CHARSET.
 562
 563  - Function: charset-graphic CHARSET
 564      This function returns either 0 or 1, depending on whether the
 565      position codes of characters in CHARSET map to the left or right
 566      half of their font, respectively.
 567
 568  - Function: charset-ccl-program CHARSET
 569      This function returns the CCL program, if any, for converting
 570      position codes of characters in CHARSET into font indices.
 571
 572    The only property of a charset that can currently be set after the
 573 charset has been created is the CCL program.
 574
 575  - Function: set-charset-ccl-program CHARSET CCL-PROGRAM
 576      This function sets the `ccl-program' property of CHARSET to
 577      CCL-PROGRAM.
 578
 579 \1f
 580 File: lispref.info,  Node: Predefined Charsets,  Prev: Charset Property Functions,  Up: Charsets
 581
 582 Predefined Charsets
 583 -------------------
 584
 585    The following charsets are predefined in the C code.
 586
 587      Name                     Type  Fi Gr Dir Registry
 588      --------------------------------------------------------------
 589      ascii                    94    B  0  l2r ISO8859-1
 590      control-1                94       0  l2r ---
 591      latin-iso8859-1          96    A  1  l2r ISO8859-1
 592      latin-iso8859-2          96    B  1  l2r ISO8859-2
 593      latin-iso8859-3          96    C  1  l2r ISO8859-3
 594      latin-iso8859-4          96    D  1  l2r ISO8859-4
 595      cyrillic-iso8859-5       96    L  1  l2r ISO8859-5
 596      arabic-iso8859-6         96    G  1  r2l ISO8859-6
 597      greek-iso8859-7          96    F  1  l2r ISO8859-7
 598      hebrew-iso8859-8         96    H  1  r2l ISO8859-8
 599      latin-iso8859-9          96    M  1  l2r ISO8859-9
 600      thai-tis620              96    T  1  l2r TIS620
 601      katakana-jisx0201        94    I  1  l2r JISX0201.1976
 602      latin-jisx0201           94    J  0  l2r JISX0201.1976
 603      japanese-jisx0208-1978   94x94 @  0  l2r JISX0208.1978
 604      japanese-jisx0208        94x94 B  0  l2r JISX0208.19(83|90)
 605      japanese-jisx0212        94x94 D  0  l2r JISX0212
 606      chinese-gb2312           94x94 A  0  l2r GB2312
 607      chinese-cns11643-1       94x94 G  0  l2r CNS11643.1
 608      chinese-cns11643-2       94x94 H  0  l2r CNS11643.2
 609      chinese-big5-1           94x94 0  0  l2r Big5
 610      chinese-big5-2           94x94 1  0  l2r Big5
 611      korean-ksc5601           94x94 C  0  l2r KSC5601
 612      composite                96x96    0  l2r ---
 613
 614    The following charsets are predefined in the Lisp code.
 615
 616      Name                     Type  Fi Gr Dir Registry
 617      --------------------------------------------------------------
 618      arabic-digit             94    2  0  l2r MuleArabic-0
 619      arabic-1-column          94    3  0  r2l MuleArabic-1
 620      arabic-2-column          94    4  0  r2l MuleArabic-2
 621      sisheng                  94    0  0  l2r sisheng_cwnn\|OMRON_UDC_ZH
 622      chinese-cns11643-3       94x94 I  0  l2r CNS11643.1
 623      chinese-cns11643-4       94x94 J  0  l2r CNS11643.1
 624      chinese-cns11643-5       94x94 K  0  l2r CNS11643.1
 625      chinese-cns11643-6       94x94 L  0  l2r CNS11643.1
 626      chinese-cns11643-7       94x94 M  0  l2r CNS11643.1
 627      ethiopic                 94x94 2  0  l2r Ethio
 628      ascii-r2l                94    B  0  r2l ISO8859-1
 629      ipa                      96    0  1  l2r MuleIPA
 630      vietnamese-lower         96    1  1  l2r VISCII1.1
 631      vietnamese-upper         96    2  1  l2r VISCII1.1
 632
 633    For all of the above charsets, the dimension and number of columns
 634 are the same.
 635
 636    Note that ASCII, Control-1, and Composite are handled specially.
 637 This is why some of the fields are blank, and why some of the filled-in
 638 fields (e.g. the type) are not really accurate.
 639
 640 \1f
 641 File: lispref.info,  Node: MULE Characters,  Next: Composite Characters,  Prev: Charsets,  Up: MULE
 642
 643 MULE Characters
 644 ===============
 645
 646  - Function: make-char CHARSET ARG1 &optional ARG2
 647      This function makes a multi-byte character from CHARSET and octets
 648      ARG1 and ARG2.
 649
 650  - Function: char-charset CH
 651      This function returns the character set of char CH.
 652
 653  - Function: char-octet CH &optional N
 654      This function returns the octet (i.e. position code) numbered N
 655      (should be 0 or 1) of char CH.  N defaults to 0 if omitted.
 656
 657  - Function: find-charset-region START END &optional BUFFER
 658      This function returns a list of the charsets in the region between
 659      START and END.  BUFFER defaults to the current buffer if omitted.
 660
 661  - Function: find-charset-string STRING
 662      This function returns a list of the charsets in STRING.
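As a rough illustration of how the functions above fit together, the
following sketch builds a character and takes it apart again.  It assumes
a MULE-enabled XEmacs and that position codes are passed in the 7-bit
range; the octets 0x21 0x3C are borrowed from the JIS sample shown under
`decode-coding-region'.  No results are shown because their printed form
depends on the build.

```elisp
;; Sketch only: assumes MULE support is compiled in.
;; 33 and 60 are the position codes 0x21 and 0x3C.
(let ((ch (make-char 'japanese-jisx0208 33 60)))
  (list (char-charset ch)        ; charset the character belongs to
        (char-octet ch 0)        ; first position code
        (char-octet ch 1)        ; second position code
        ;; charsets found in a one-character string holding CH
        (find-charset-string (char-to-string ch))))
```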
 663
 664 \1f
 665 File: lispref.info,  Node: Composite Characters,  Next: ISO 2022,  Prev: MULE Characters,  Up: MULE
 666
 667 Composite Characters
 668 ====================
 669
 670    Composite characters are not yet completely implemented.
 671
 672  - Function: make-composite-char STRING
 673      This function converts a string into a single composite character.
 674      The character is the result of overstriking all the characters in
 675      the string.
 676
 677  - Function: composite-char-string CH
 678      This function returns a string of the characters comprising a
 679      composite character.
 680
 681  - Function: compose-region START END &optional BUFFER
 682      This function composes the characters in the region from START to
 683      END in BUFFER into one composite character.  The composite
 684      character replaces the composed characters.  BUFFER defaults to
 685      the current buffer if omitted.
 686
 687  - Function: decompose-region START END &optional BUFFER
 688      This function decomposes any composite characters in the region
 689      from START to END in BUFFER.  This converts each composite
 690      character into one or more characters, the individual characters
 691      out of which the composite character was formed.  Non-composite
 692      characters are left as-is.  BUFFER defaults to the current buffer
 693      if omitted.
 694
 695 \1f
 696 File: lispref.info,  Node: ISO 2022,  Next: Coding Systems,  Prev: Composite Characters,  Up: MULE
 697
 698 ISO 2022
 699 ========
 700
 701    This section briefly describes the ISO 2022 encoding standard.  For
 702 a more thorough understanding, please refer to the ISO 2022 standard
 703 itself.
 704
 705    Character sets ("charsets") are classified into the following four
 706 categories, according to the number of characters in the charset:
 707 94-charset, 96-charset, 94x94-charset, and 96x96-charset.
 708
 709 94-charset
 710      ASCII(B), left(J) and right(I) halves of JISX0201, ...
 711
 712 96-charset
 713      Latin-1(A), Latin-2(B), Latin-3(C), ...
 714
 715 94x94-charset
 716      GB2312(A), JISX0208(B), KSC5601(C), ...
 717
 718 96x96-charset
 719      none for the moment
 720
 721    The character in parentheses after the name of each charset is the
 722 "final character" F, which can be regarded as the identifier of the
 723 charset.  ECMA allocates F to each charset.  F is in the range of
 724 0x30..0x7E, but 0x30..0x3F are only for private use.
 725
 726    Note: "ECMA" = European Computer Manufacturers Association
 727
 728    There are four "registers of charsets", called G0 thru G3.  You can
 729 designate (or assign) any charset to one of these registers.
 730
 731    The code space contained within one octet (of size 256) is divided
 732 into 4 areas: C0, GL, C1, and GR.  GL and GR are the areas into which a
 733 register of charsets can be invoked.
 734
 735         C0: 0x00 - 0x1F
 736         GL: 0x20 - 0x7F
 737         C1: 0x80 - 0x9F
 738         GR: 0xA0 - 0xFF
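The partition above can be sketched as a small helper.  This is plain
Emacs Lisp written for illustration; the name `my-iso2022-octet-area' is
made up here and is not part of XEmacs.

```elisp
(defun my-iso2022-octet-area (octet)
  "Return the ISO 2022 code area containing OCTET, as a symbol.
The result is one of `c0', `gl', `c1', or `gr'."
  (if (or (not (integerp octet)) (< octet 0) (> octet 255))
      (error "Not an octet: %S" octet))
  (cond ((<= octet 31)  'c0)    ; 0x00 - 0x1F
        ((<= octet 127) 'gl)    ; 0x20 - 0x7F
        ((<= octet 159) 'c1)    ; 0x80 - 0x9F
        (t              'gr)))  ; 0xA0 - 0xFF

;; (my-iso2022-octet-area 27)   => c0    ; ESC, 0x1B
;; (my-iso2022-octet-area 66)   => gl    ; `B', 0x42
;; (my-iso2022-octet-area 142)  => c1    ; SS2, 0x8E
;; (my-iso2022-octet-area 233)  => gr    ; 0xE9
```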
 739
 740    Usually, in the initial state, G0 is invoked into GL, and G1 is
 741 invoked into GR.
 742
 743    ISO 2022 distinguishes 7-bit environments and 8-bit environments.  In
 744 7-bit environments, only C0 and GL are used.
 745
 746    Charset designation is done by escape sequences of the form:
 747
 748         ESC [I] I F
 749
 750    where I is an intermediate character in the range 0x20 - 0x2F, and F
 751 is the final character identifying this charset.
 752
 753    The meanings of the intermediate characters are:
 754
 755         $ [0x24]: indicate charset of dimension 2 (94x94 or 96x96).
 756         ( [0x28]: designate to G0 a 94-charset whose final byte is F.
 757         ) [0x29]: designate to G1 a 94-charset whose final byte is F.
 758         * [0x2A]: designate to G2 a 94-charset whose final byte is F.
 759         + [0x2B]: designate to G3 a 94-charset whose final byte is F.
 760         - [0x2D]: designate to G1 a 96-charset whose final byte is F.
 761         . [0x2E]: designate to G2 a 96-charset whose final byte is F.
 762         / [0x2F]: designate to G3 a 96-charset whose final byte is F.
 763
 764    The following rule is not allowed in ISO 2022 but can be used in
 765 Mule.
 766
 767         , [0x2C]: designate to G0 a 96-charset whose final byte is F.
 768
 769    Here are examples of designations:
 770
 771         ESC ( B :              designate to G0 ASCII
 772         ESC - A :              designate to G1 Latin-1
 773         ESC $ ( A or ESC $ A : designate to G0 GB2312
 774         ESC $ ( B or ESC $ B : designate to G0 JISX0208
 775         ESC $ ) C :            designate to G1 KSC5601
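The designation rules above can be sketched as a function that builds
the long-form escape sequence.  The helper name and its argument
conventions are assumptions made for this illustration; it is not an
XEmacs function, and it never produces the short forms such as
`ESC $ A'.

```elisp
(defun my-iso2022-designation (register chars dimension final)
  "Return the escape sequence designating a charset to REGISTER.
REGISTER is 0 through 3, CHARS is 94 or 96, DIMENSION is 1 or 2,
and FINAL is the final character of the charset."
  (if (or (< register 0) (> register 3))
      (error "Invalid register: %S" register))
  (let ((intermediates
         (cond ((= chars 94) "()*+")
               ((= chars 96) ",-./")   ; `,' (G0) is the Mule-only rule
               (t (error "Invalid charset size: %S" chars)))))
    (concat "\e"
            (if (= dimension 2) "$" "")
            (char-to-string (aref intermediates register))
            (char-to-string final))))

;; (my-iso2022-designation 0 94 1 ?B)  => ESC ( B      ; ASCII to G0
;; (my-iso2022-designation 1 96 1 ?A)  => ESC - A      ; Latin-1 to G1
;; (my-iso2022-designation 0 94 2 ?B)  => ESC $ ( B    ; JISX0208 to G0
;; (my-iso2022-designation 1 94 2 ?C)  => ESC $ ) C    ; KSC5601 to G1
```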
 776
 777    To use a charset designated to G2 or G3, and to use a charset
 778 designated to G1 in a 7-bit environment, you must explicitly invoke G1,
 779 G2, or G3 into GL.  There are two types of invocation: Locking Shift
 780 (in effect until changed) and Single Shift (one character only).
 781
 782    Locking Shift is done as follows:
 783
 784         LS0 or SI (0x0F): invoke G0 into GL
 785         LS1 or SO (0x0E): invoke G1 into GL
 786         LS2:  invoke G2 into GL
 787         LS3:  invoke G3 into GL
 788         LS1R: invoke G1 into GR
 789         LS2R: invoke G2 into GR
 790         LS3R: invoke G3 into GR
 791
 792    Single Shift is done as follows:
 793
 794         SS2 or ESC N: invoke G2 into GL
 795         SS3 or ESC O: invoke G3 into GL
 796
 797    (#### Ben says: I think the above is slightly incorrect.  It appears
 798 that SS2 invokes G2 into GR and SS3 invokes G3 into GR, whereas ESC N
 799 and ESC O behave as indicated.  The above definitions will not parse
 800 EUC-encoded text correctly, and it looks like the code in mule-coding.c
 801 has similar problems.)
 802
 803    As you can see, there are many ISO-2022-compliant ways of encoding
 804 multilingual text.  Many coding systems in actual use, such as X11's
 805 Compound Text, Japanese JUNET code, and so-called EUC (Extended UNIX
 806 Code), are variants of ISO 2022.
 807
 808    In Mule, we characterize ISO 2022 by the following attributes:
 809
 810   1. Initial designation to G0 thru G3.
 811
 812   2. Allow designation of short form for Japanese and Chinese.
 813
 814   3. Should we designate ASCII to G0 before control characters?
 815
 816   4. Should we designate ASCII to G0 at the end of line?
 817
 818   5. 7-bit environment or 8-bit environment.
 819
 820   6. Use Locking Shift or not.
 821
 822   7. Use ASCII or JISX0201-1976-Roman.
 823
 824   8. Use JISX0208-1983 or JISX0208-1978.
 825
 826    (The last two are only for Japanese.)
 827
 828    By specifying these attributes, you can create any variant of ISO
 829 2022.
 830
 831    Here are several examples:
 832
 833      junet -- Coding system used in JUNET.
 834         1. G0 <- ASCII, G1..3 <- never used
 835         2. Yes.
 836         3. Yes.
 837         4. Yes.
 838         5. 7-bit environment
 839         6. No.
 840         7. Use ASCII
 841         8. Use JISX0208-1983
 842
 843      ctext -- Compound Text
 844         1. G0 <- ASCII, G1 <- Latin-1, G2,3 <- never used
 845         2. No.
 846         3. No.
 847         4. Yes.
 848         5. 8-bit environment
 849         6. No.
 850         7. Use ASCII
 851         8. Use JISX0208-1983
 852
 853      euc-china -- Chinese EUC.  Although many people call this
 854      "GB encoding", the name may cause misunderstanding.
 855         1. G0 <- ASCII, G1 <- GB2312, G2,3 <- never used
 856         2. No.
 857         3. Yes.
 858         4. Yes.
 859         5. 8-bit environment
 860         6. No.
 861         7. Use ASCII
 862         8. Use JISX0208-1983
 863
 864      korean-mail -- Coding system used on Korean networks.
 865         1. G0 <- ASCII, G1 <- KSC5601, G2,3 <- never used
 866         2. No.
 867         3. Yes.
 868         4. Yes.
 869         5. 7-bit environment
 870         6. Yes.
 871         7. No.
 872         8. No.
 873
 874    Mule creates all these coding systems by default.
 875
 876 \1f
 877 File: lispref.info,  Node: Coding Systems,  Next: CCL,  Prev: ISO 2022,  Up: MULE
 878
 879 Coding Systems
 880 ==============
 881
 882    A coding system is an object that defines how text containing
 883 multiple character sets is encoded into a stream of (typically 8-bit)
 884 bytes.  The coding system is used to decode the stream into a series of
 885 characters (which may be from multiple charsets) when the text is read
 886 from a file or process, and is used to encode the text back into the
 887 same format when it is written out to a file or process.
 888
 889    For example, many ISO-2022-compliant coding systems (such as Compound
 890 Text, which is used for inter-client data under the X Window System) use
 891 escape sequences to switch between different charsets: Japanese Kanji,
 892 for example, is designated with `ESC $ ( B'; ASCII is designated with
 893 `ESC ( B'; and Cyrillic is designated with `ESC - L'.  See
 894 `make-coding-system' for more information.
 895
 896    Coding systems are normally identified using a symbol, and the
 897 symbol is accepted in place of the actual coding system object whenever
 898 a coding system is called for. (This is similar to how faces and
 899 charsets work.)
 900
 901  - Function: coding-system-p OBJECT
 902      This function returns non-`nil' if OBJECT is a coding system.
 903
 904 * Menu:
 905
 906 * Coding System Types::               Classifying coding systems.
 907 * EOL Conversion::                    Dealing with different ways of denoting
 908                                         the end of a line.
 909 * Coding System Properties::          Properties of a coding system.
 910 * Basic Coding System Functions::     Working with coding systems.
 911 * Coding System Property Functions::  Retrieving a coding system's properties.
 912 * Encoding and Decoding Text::        Encoding and decoding text.
 913 * Detection of Textual Encoding::     Determining how text is encoded.
 914 * Big5 and Shift-JIS Functions::      Special functions for these non-standard
 915                                         encodings.
 916
 917 \1f
 918 File: lispref.info,  Node: Coding System Types,  Next: EOL Conversion,  Up: Coding Systems
 919
 920 Coding System Types
 921 -------------------
 922
 923 `nil'
 924 `autodetect'
 925      Automatic conversion.  XEmacs attempts to detect the coding system
 926      used in the file.
 927
 928 `no-conversion'
 929      No conversion.  Use this for binary files and such.  On output,
 930      graphic characters that are not in ASCII or Latin-1 will be
 931      replaced by a `?'. (For a no-conversion-encoded buffer, these
 932      characters will only be present if you explicitly insert them.)
 933
 934 `shift-jis'
 935      Shift-JIS (a Japanese encoding commonly used in PC operating
 936      systems).
 937
 938 `iso2022'
 939      Any ISO-2022-compliant encoding.  Among other things, this
 940      includes JIS (the Japanese encoding commonly used for e-mail),
 941      national variants of EUC (the standard Unix encoding for Japanese
 942      and other languages), and Compound Text (an encoding used in X11).
 943      You can give more detailed information about the conversion
 944      with the PROPS argument of `make-coding-system'.
 945
 946 `big5'
 947      Big5 (an encoding commonly used for Traditional Chinese).
 948
 949 `ccl'
 950      The conversion is performed using a user-written pseudo-code
 951      program.  CCL (Code Conversion Language) is the name of this
 952      pseudo-code.
 953
 954 `internal'
 955      Write out or read in the raw contents of the memory representing
 956      the buffer's text.  This is primarily useful for debugging
 957      purposes, and is only enabled when XEmacs has been compiled with
 958      `DEBUG_XEMACS' set (the `--debug' configure option).  *Warning*:
 959      Reading in a file using `internal' conversion can result in an
 960      internal inconsistency in the memory representing a buffer's text,
 961      which will produce unpredictable results and may cause XEmacs to
 962      crash.  Under normal circumstances you should never use `internal'
 963      conversion.
 964
 965 \1f
 966 File: lispref.info,  Node: EOL Conversion,  Next: Coding System Properties,  Prev: Coding System Types,  Up: Coding Systems
 967
 968 EOL Conversion
 969 --------------
 970
 971 `nil'
 972      Automatically detect the end-of-line type (LF, CRLF, or CR).  Also
 973      generate subsidiary coding systems named `NAME-unix', `NAME-dos',
 974      and `NAME-mac', that are identical to this coding system but have
 975      an EOL-TYPE value of `lf', `crlf', and `cr', respectively.
 976
 977 `lf'
 978      The end of a line is marked externally using ASCII LF.  Since this
 979      is also the way that XEmacs represents an end-of-line internally,
 980      specifying this option results in no end-of-line conversion.  This
 981      is the standard format for Unix text files.
 982
 983 `crlf'
 984      The end of a line is marked externally using ASCII CRLF.  This is
 985      the standard format for MS-DOS text files.
 986
 987 `cr'
 988      The end of a line is marked externally using ASCII CR.  This is the
 989      standard format for Macintosh text files.
 990
 991 `t'
 992      Automatically detect the end-of-line type but do not generate
 993      subsidiary coding systems.  (This value is converted to `nil' when
 994      stored internally, and `coding-system-property' will return `nil'.)
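Going by the naming rule given under `nil' above, a coding system
created with automatic EOL detection should have three subsidiaries.  The
sketch below collects their names.  It assumes that a coding system named
`ctext' exists in your build and was created with an EOL-TYPE of `nil';
both are assumptions, so no result is shown.

```elisp
;; Look up each subsidiary of `ctext' by EOL type and collect its name.
(mapcar (lambda (eol-type)
          (coding-system-name
           (subsidiary-coding-system 'ctext eol-type)))
        '(lf crlf cr))
```

If the naming rule holds, the names should end in `-unix', `-dos', and
`-mac' respectively.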
 995
 996 \1f
 997 File: lispref.info,  Node: Coding System Properties,  Next: Basic Coding System Functions,  Prev: EOL Conversion,  Up: Coding Systems
 998
 999 Coding System Properties
1000 ------------------------
1001
1002 `mnemonic'
1003      String to be displayed in the modeline when this coding system is
1004      active.
1005
1006 `eol-type'
1007      End-of-line conversion to be used.  It should be one of the types
1008      listed in *Note EOL Conversion::.
1009
1010 `post-read-conversion'
1011      Function called after a file has been read in, to perform the
1012      decoding.  Called with two arguments, BEG and END, denoting a
1013      region of the current buffer to be decoded.
1014
1015 `pre-write-conversion'
1016      Function called before a file is written out, to perform the
1017      encoding.  Called with two arguments, BEG and END, denoting a
1018      region of the current buffer to be encoded.
1019
1020    The following additional properties are recognized if TYPE is
1021 `iso2022':
1022
1023 `charset-g0'
1024 `charset-g1'
1025 `charset-g2'
1026 `charset-g3'
1027      The character set initially designated to the G0 - G3 registers.
1028      The value should be one of
1029
1030         * A charset object (designate that character set)
1031
1032         * `nil' (do not ever use this register)
1033
1034         * `t' (no character set is initially designated to the
1035           register, but may be later on; this automatically sets the
1036           corresponding `force-g*-on-output' property)
1037
1038 `force-g0-on-output'
1039 `force-g1-on-output'
1040 `force-g2-on-output'
1041 `force-g3-on-output'
1042      If non-`nil', send an explicit designation sequence on output
1043      before using the specified register.
1044
1045 `short'
1046      If non-`nil', use the short forms `ESC $ @', `ESC $ A', and `ESC $
1047      B' on output in place of the full designation sequences `ESC $ (
1048      @', `ESC $ ( A', and `ESC $ ( B'.
1049
1050 `no-ascii-eol'
1051      If non-`nil', don't designate ASCII to G0 at each end of line on
1052      output.  Setting this to non-`nil' also suppresses other
1053      state-resetting that normally happens at the end of a line.
1054
1055 `no-ascii-cntl'
1056      If non-`nil', don't designate ASCII to G0 before control chars on
1057      output.
1058
1059 `seven'
1060      If non-`nil', use 7-bit environment on output.  Otherwise, use
1061      8-bit environment.
1062
1063 `lock-shift'
1064      If non-`nil', use locking-shift (SO/SI) instead of single-shift or
1065      designation by escape sequence.
1066
1067 `no-iso6429'
1068      If non-`nil', don't use ISO6429's direction specification.
1069
1070 `escape-quoted'
1071      If non-`nil', literal control characters that are the same as the
1072      beginning of a recognized ISO 2022 or ISO 6429 escape sequence (in
1073      particular, ESC (0x1B), SO (0x0E), SI (0x0F), SS2 (0x8E), SS3
1074      (0x8F), and CSI (0x9B)) are "quoted" with an escape character so
1075      that they can be properly distinguished from an escape sequence.
1076      (Note that doing this results in a non-portable encoding.) This
1077      encoding flag is used for byte-compiled files.  Note that ESC is a
1078      good choice for a quoting character because there are no escape
1079      sequences whose second byte is a character from the Control-0 or
1080      Control-1 character sets; this is explicitly disallowed by the ISO
1081      2022 standard.
1082
1083 `input-charset-conversion'
1084      A list of conversion specifications, specifying conversion of
1085      characters in one charset to another when decoding is performed.
1086      Each specification is a list of two elements: the source charset,
1087      and the destination charset.
1088
1089 `output-charset-conversion'
1090      A list of conversion specifications, specifying conversion of
1091      characters in one charset to another when encoding is performed.
1092      The form of each specification is the same as for
1093      `input-charset-conversion'.
1094
1095    The following additional properties are recognized (and required) if
1096 TYPE is `ccl':
1097
1098 `decode'
1099      CCL program used for decoding (converting to internal format).
1100
1101 `encode'
1102      CCL program used for encoding (converting to external format).
1103
1104 \1f
1105 File: lispref.info,  Node: Basic Coding System Functions,  Next: Coding System Property Functions,  Prev: Coding System Properties,  Up: Coding Systems
1106
1107 Basic Coding System Functions
1108 -----------------------------
1109
1110  - Function: find-coding-system CODING-SYSTEM-OR-NAME
1111      This function retrieves the coding system of the given name.
1112
1113      If CODING-SYSTEM-OR-NAME is a coding-system object, it is simply
1114      returned.  Otherwise, CODING-SYSTEM-OR-NAME should be a symbol.
1115      If there is no such coding system, `nil' is returned.  Otherwise
1116      the associated coding system object is returned.
1117
1118  - Function: get-coding-system NAME
1119      This function retrieves the coding system of the given name.  Same
1120      as `find-coding-system' except an error is signalled if there is no
1121      such coding system instead of returning `nil'.
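The difference between the two lookup functions can be sketched as
follows.  The symbol `my-no-such-coding-system' is a made-up name chosen
so that the lookup fails; the messages are illustrative only.

```elisp
;; `find-coding-system' returns nil for an unknown name, so it can be
;; used directly as a test.
(if (find-coding-system 'my-no-such-coding-system)
    (message "Coding system is defined")
  (message "Coding system is not defined"))

;; `get-coding-system' signals an error instead, so trap it explicitly.
(condition-case err
    (get-coding-system 'my-no-such-coding-system)
  (error (message "Lookup failed: %S" err)))
```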
1122
1123  - Function: coding-system-list
1124      This function returns a list of the names of all defined coding
1125      systems.
1126
1127  - Function: coding-system-name CODING-SYSTEM
1128      This function returns the name of the given coding system.
1129
1130  - Function: make-coding-system NAME TYPE &optional DOC-STRING PROPS
1131      This function registers symbol NAME as a coding system.
1132
1133      TYPE describes the conversion method used and should be one of the
1134      types listed in *Note Coding System Types::.
1135
1136      DOC-STRING is a string describing the coding system.
1137
1138      PROPS is a property list, describing the specific nature of the
1139      coding system.  Recognized properties are as in *Note Coding
1140      System Properties::.
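A minimal sketch of a call, assuming a MULE-enabled XEmacs.  The name
`my-junet-like' and the mnemonic are hypothetical; the property values
loosely mirror the junet attribute list in the ISO 2022 node.  It is
also assumed here that a charset symbol such as `ascii' is accepted
where a charset object is expected.

```elisp
;; Define the coding system only once; `my-junet-like' is a made-up name.
(if (not (find-coding-system 'my-junet-like))
    (make-coding-system
     'my-junet-like 'iso2022
     "Illustrative 7-bit ISO 2022 coding system modelled on junet."
     '(charset-g0 ascii     ; G0 <- ASCII; G1..G3 are left unused
       short t              ; allow short-form designations
       seven t              ; 7-bit environment
       mnemonic "MyJ")))    ; string shown in the modeline
```

Properties that are left out, such as `lock-shift', are assumed to
default to `nil'.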
1141
1142  - Function: copy-coding-system OLD-CODING-SYSTEM NEW-NAME
1143      This function copies OLD-CODING-SYSTEM to NEW-NAME.  If NEW-NAME
1144      does not name an existing coding system, a new one will be created.
1145
1146  - Function: subsidiary-coding-system CODING-SYSTEM EOL-TYPE
1147      This function returns the subsidiary coding system of
1148      CODING-SYSTEM with eol type EOL-TYPE.
1149
1150 \1f
1151 File: lispref.info,  Node: Coding System Property Functions,  Next: Encoding and Decoding Text,  Prev: Basic Coding System Functions,  Up: Coding Systems
1152
1153 Coding System Property Functions
1154 --------------------------------
1155
1156  - Function: coding-system-doc-string CODING-SYSTEM
1157      This function returns the doc string for CODING-SYSTEM.
1158
1159  - Function: coding-system-type CODING-SYSTEM
1160      This function returns the type of CODING-SYSTEM.
1161
1162  - Function: coding-system-property CODING-SYSTEM PROP
1163      This function returns the PROP property of CODING-SYSTEM.
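The accessors above can be combined to summarize a coding system.  This
sketch assumes that a coding system named `ctext' may or may not exist
in your build, and checks before using it; no result is shown because
the values depend on the build.

```elisp
(let ((cs (find-coding-system 'ctext)))
  (if cs
      (list (coding-system-name cs)
            (coding-system-type cs)
            (coding-system-doc-string cs)
            (coding-system-property cs 'mnemonic)
            (coding-system-property cs 'eol-type))
    (message "No coding system named ctext in this build")))
```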
1164
1165 \1f
1166 File: lispref.info,  Node: Encoding and Decoding Text,  Next: Detection of Textual Encoding,  Prev: Coding System Property Functions,  Up: Coding Systems
1167
1168 Encoding and Decoding Text
1169 --------------------------
1170
1171  - Function: decode-coding-region START END CODING-SYSTEM &optional
1172           BUFFER
1173      This function decodes the text between START and END which is
1174      encoded in CODING-SYSTEM.  This is useful if you've read in
1175      encoded text from a file without decoding it (e.g. you read in a
1176      JIS-formatted file but used the `binary' or `no-conversion' coding
1177      system, so that it shows up as `^[$B!<!+^[(B').  The length of the
1178      decoded text is returned.  BUFFER defaults to the current buffer
1179      if unspecified.
1180
1181  - Function: encode-coding-region START END CODING-SYSTEM &optional
1182           BUFFER
1183      This function encodes the text between START and END using
1184      CODING-SYSTEM.  This will, for example, convert Japanese
1185      characters into stuff such as `^[$B!<!+^[(B' if you use the JIS
1186      encoding.  The length of the encoded text is returned.  BUFFER
1187      defaults to the current buffer if unspecified.
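The following sketch decodes the JIS sample shown above in a scratch
buffer and then reports which charsets the result contains.  It assumes
that the `junet' coding system described in the ISO 2022 node is
defined, and that `with-temp-buffer' is available in your XEmacs
version; neither has been checked against a running XEmacs.

```elisp
(with-temp-buffer
  ;; Raw JIS bytes: ESC $ B designates JISX0208, ESC ( B returns to ASCII.
  (insert "\e$B!<!+\e(B")
  (decode-coding-region (point-min) (point-max) 'junet)
  ;; Report which charsets the decoded text now contains.
  (find-charset-region (point-min) (point-max)))
```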
1188
1189 \1f
1190 File: lispref.info,  Node: Detection of Textual Encoding,  Next: Big5 and Shift-JIS Functions,  Prev: Encoding and Decoding Text,  Up: Coding Systems
1191
1192 Detection of Textual Encoding
1193 -----------------------------
1194
1195  - Function: coding-category-list
1196      This function returns a list of all recognized coding categories.
1197
1198  - Function: set-coding-priority-list LIST
1199      This function changes the priority order of the coding categories.
1200      LIST should be a list of coding categories, in descending order of
1201      priority.  Unspecified coding categories will be lower in priority
1202      than all specified ones, in the same relative order they were in
1203      previously.
1204
1205  - Function: coding-priority-list
1206      This function returns a list of coding categories in descending
1207      order of priority.
1208
1209  - Function: set-coding-category-system CODING-CATEGORY CODING-SYSTEM
1210      This function changes the coding system associated with a coding
1211      category.
1212
1213  - Function: coding-category-system CODING-CATEGORY
1214      This function returns the coding system associated with a coding
1215      category.
1216
1217  - Function: detect-coding-region START END &optional BUFFER
1218      This function detects the coding system of the text in the region
1219      between START and END.  The returned value is a list of possible
1220      coding systems ordered by priority.  If only ASCII characters are
1221      found, it returns `autodetect' or one of its subsidiary coding
1222      systems according to the detected end-of-line type.  Optional arg
1223      BUFFER defaults to the current buffer.
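A short sketch of how the detection functions might be used together.
It relies only on the functions described in this node and invents no
category names; the values returned depend on the build and on the
buffer contents, so none are shown.

```elisp
;; Pair each coding category with the coding system assigned to it.
(mapcar (lambda (category)
          (cons category (coding-category-system category)))
        (coding-category-list))

;; Promote the second-highest category to the front.  Categories not
;; listed keep their previous relative order, as described above.
(let ((categories (coding-priority-list)))
  (if (cdr categories)
      (set-coding-priority-list (list (car (cdr categories))))))

;; Detect the encoding of the whole current buffer.
(detect-coding-region (point-min) (point-max))
```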
1224