update.

[chise/xemacs-chise.git-] / info / lispref.info-44
diff --git a/info/lispref.info-44 b/info/lispref.info-44

index ace6584..d539962 100644 (file)
--- a/info/lispref.info-44
+++ b/info/lispref.info-44
@@ -50,6 +50,399 @@ may be included in a translation approved by the Free Software
  Foundation instead of in the original English.
  
  \1f
  Foundation instead of in the original English.
  
  \1f
+File: lispref.info,  Node: Coding System Types,  Next: ISO 2022,  Up: Coding Systems
+
+Coding System Types
+-------------------
+
+   The coding system type determines the basic algorithm XEmacs will
+use to decode or encode a data stream.  Character encodings will be
+converted to the MULE encoding, escape sequences processed, and newline
+sequences converted to XEmacs's internal representation.  There are
+three basic classes of coding system type: no-conversion, ISO-2022, and
+special.
+
+   No conversion allows you to look at the file's internal
+representation.  Since XEmacs is basically a text editor, "no
+conversion" does convert newline conventions by default.  (Use the
+'binary coding-system if this is not desired.)
+
+   ISO 2022 (*note ISO 2022::) is the basic international standard
+regulating use of "coded character sets for the exchange of data", ie,
+text streams.  ISO 2022 contains functions that make it possible to
+encode text streams to comply with restrictions of the Internet mail
+system and de facto restrictions of most file systems (eg, use of the
+separator character in file names).  Coding systems which are not ISO
+2022 conformant can be difficult to handle.  Perhaps more important,
+they are not adaptable to multilingual information interchange, with
+the obvious exception of ISO 10646 (Unicode).  (Unicode is partially
+supported by XEmacs with the addition of the Lisp package ucs-conv.)
+
+   The special class of coding systems includes automatic detection,
+CCL (a "little language" embedded as an interpreter, useful for
+translating between variants of a single character set),
+non-ISO-2022-conformant encodings like Unicode, Shift JIS, and Big5,
+and MULE internal coding.  (NB: this list is based on XEmacs 21.2.
+Terminology may vary slightly for other versions of XEmacs and for GNU
+Emacs 20.)
+
+`no-conversion'
+     No conversion, for binary files, and a few special cases of
+     non-ISO-2022 coding systems where conversion is done by hook
+     functions (usually implemented in CCL).  On output, graphic
+     characters that are not in ASCII or Latin-1 will be replaced by a
+     `?'. (For a no-conversion-encoded buffer, these characters will
+     only be present if you explicitly insert them.)
+
+`iso2022'
+     Any ISO-2022-compliant encoding.  Among others, this includes JIS
+     (the Japanese encoding commonly used for e-mail), national
+     variants of EUC (the standard Unix encoding for Japanese and other
+     languages), and Compound Text (an encoding used in X11).  You can
+     specify more specific information about the conversion with the
+     FLAGS argument.
+
+`ucs-4'
+     ISO 10646 UCS-4 encoding.  A 31-bit fixed-width superset of
+     Unicode.
+
+`utf-8'
+     ISO 10646 UTF-8 encoding.  A "file system safe" transformation
+     format that can be used with both UCS-4 and Unicode.
+
+`undecided'
+     Automatic conversion.  XEmacs attempts to detect the coding system
+     used in the file.
+
+`shift-jis'
+     Shift-JIS (a Japanese encoding commonly used in PC operating
+     systems).
+
+`big5'
+     Big5 (the encoding commonly used for Taiwanese).
+
+`ccl'
+     The conversion is performed using a user-written pseudo-code
+     program.  CCL (Code Conversion Language) is the name of this
+     pseudo-code.  For example, CCL is used to map KOI8-R characters
+     (an encoding for Russian Cyrillic) to ISO8859-5 (the form used
+     internally by MULE).
+
+`internal'
+     Write out or read in the raw contents of the memory representing
+     the buffer's text.  This is primarily useful for debugging
+     purposes, and is only enabled when XEmacs has been compiled with
+     `DEBUG_XEMACS' set (the `--debug' configure option).  *Warning*:
+     Reading in a file using `internal' conversion can result in an
+     internal inconsistency in the memory representing a buffer's text,
+     which will produce unpredictable results and may cause XEmacs to
+     crash.  Under normal circumstances you should never use `internal'
+     conversion.
+
+\1f
+File: lispref.info,  Node: ISO 2022,  Next: EOL Conversion,  Prev: Coding System Types,  Up: Coding Systems
+
+ISO 2022
+========
+
+   This section briefly describes the ISO 2022 encoding standard.  A
+more thorough treatment is available in the original document of ISO
+2022 as well as various national standards (such as JIS X 0202).
+
+   Character sets ("charsets") are classified into the following four
+categories, according to the number of characters in the charset:
+94-charset, 96-charset, 94x94-charset, and 96x96-charset.  This means
+that although an ISO 2022 coding system may have variable width
+characters, each charset used is fixed-width (in contrast to the MULE
+character set and UTF-8, for example).
+
+   ISO 2022 provides for switching between character sets via escape
+sequences.  This switching is somewhat complicated, because ISO 2022
+provides for both legacy applications like Internet mail that accept
+only 7 significant bits in some contexts (RFC 822 headers, for example),
+and more modern "8-bit clean" applications.  It also provides for
+compact and transparent representation of languages like Japanese which
+mix ASCII and a national script (even outside of computer programs).
+
+   First, ISO 2022 codified prevailing practice by dividing the code
+space into "control" and "graphic" regions.  The code points 0x00-0x1F
+and 0x80-0x9F are reserved for "control characters", while "graphic
+characters" must be assigned to code points in the regions 0x20-0x7F and
+0xA0-0xFF.  The positions 0x20 and 0x7F are special, and under some
+circumstances must be assigned the graphic character "ASCII SPACE" and
+the control character "ASCII DEL" respectively.
+
+   The various regions are given the name C0 (0x00-0x1F), GL
+(0x20-0x7F), C1 (0x80-0x9F), and GR (0xA0-0xFF).  GL and GR stand for
+"graphic left" and "graphic right", respectively, because of the
+standard method of displaying graphic character sets in tables with the
+high byte indexing columns and the low byte indexing rows.  I don't
+find it very intuitive, but these are called "registers".
+
+   An ISO 2022-conformant encoding for a graphic character set must use
+a fixed number of bytes per character, and the values must fit into a
+single register; that is, each byte must range over either 0x20-0x7F, or
+0xA0-0xFF.  It is not allowed to extend the range of the repertoire of a
+character set by using both ranges at the same.  This is why a standard
+character set such as ISO 8859-1 is actually considered by ISO 2022 to
+be an aggregation of two character sets, ASCII and LATIN-1, and why it
+is technically incorrect to refer to ISO 8859-1 as "Latin 1".  Also, a
+single character's bytes must all be drawn from the same register; this
+is why Shift JIS (for Japanese) and Big 5 (for Chinese) are not ISO
+2022-compatible encodings.
+
+   The reason for this restriction becomes clear when you attempt to
+define an efficient, robust encoding for a language like Japanese.
+Like ISO 8859, Japanese encodings are aggregations of several character
+sets.  In practice, the vast majority of characters are drawn from the
+"JIS Roman" character set (a derivative of ASCII; it won't hurt to
+think of it as ASCII) and the JIS X 0208 standard "basic Japanese"
+character set including not only ideographic characters ("kanji") but
+syllabic Japanese characters ("kana"), a wide variety of symbols, and
+many alphabetic characters (Roman, Greek, and Cyrillic) as well.
+Although JIS X 0208 includes the whole Roman alphabet, as a 2-byte code
+it is not suited to programming; thus the inclusion of ASCII in the
+standard Japanese encodings.
+
+   For normal Japanese text such as in newspapers, a broad repertoire of
+approximately 3000 characters is used.  Evidently this won't fit into
+one byte; two must be used.  But much of the text processed by Japanese
+computers is computer source code, nearly all of which is ASCII.  A not
+insignificant portion of ordinary text is English (as such or as
+borrowed Japanese vocabulary) or other languages which can represented
+at least approximately in ASCII, as well.  It seems reasonable then to
+represent ASCII in one byte, and JIS X 0208 in two.  And this is exactly
+what the Extended Unix Code for Japanese (EUC-JP) does.  ASCII is
+invoked to the GL register, and JIS X 0208 is invoked to the GR
+register.  Thus, each byte can be tested for its character set by
+looking at the high bit; if set, it is Japanese, if clear, it is ASCII.
+Furthermore, since control characters like newline can never be part of
+a graphic character, even in the case of corruption in transmission the
+stream will be resynchronized at every line break, on the order of 60-80
+bytes.  This coding system requires no escape sequences or special
+control codes to represent 99.9% of all Japanese text.
+
+   Note carefully the distinction between the character sets (ASCII and
+JIS X 0208), the encoding (EUC-JP), and the coding system (ISO 2022).
+The JIS X 0208 character set is used in three different encodings for
+Japanese, but in ISO-2022-JP it is invoked into GL (so the high bit is
+always clear), in EUC-JP it is invoked into GR (setting the high bit in
+the process), and in Shift JIS the high bit may be set or reset, and the
+significant bits are shifted within the 16-bit character so that the two
+main character sets can coexist with a third (the "halfwidth katakana"
+of JIS X 0201).  As the name implies, the ISO-2022-JP encoding is also a
+version of the ISO-2022 coding system.
+
+   In order to systematically treat subsidiary character sets (like the
+"halfwidth katakana" already mentioned, and the "supplementary kanji" of
+JIS X 0212), four further registers are defined: G0, G1, G2, and G3.
+Unlike GL and GR, they are not logically distinguished by internal
+format.  Instead, the process of "invocation" mentioned earlier is
+broken into two steps: first, a character set is "designated" to one of
+the registers G0-G3 by use of an "escape sequence" of the form:
+
+             ESC [I] I F
+
+   where I is an intermediate character or characters in the range 0x20
+- 0x3F, and F, from the range 0x30-0x7Fm is the final character
+identifying this charset.  (Final characters in the range 0x30-0x3F are
+reserved for private use and will never have a publically registered
+meaning.)
+
+   Then that register is "invoked" to either GL or GR, either
+automatically (designations to G0 normally involve invocation to GL as
+well), or by use of shifting (affecting only the following character in
+the data stream) or locking (effective until the next designation or
+locking) control sequences.  An encoding conformant to ISO 2022 is
+typically defined by designating the initial contents of the G0-G3
+registers, specifying an 7 or 8 bit environment, and specifying whether
+further designations will be recognized.
+
+   Some examples of character sets and the registered final characters
+F used to designate them:
+
+94-charset
+     ASCII (B), left (J) and right (I) half of JIS X 0201, ...
+
+96-charset
+     Latin-1 (A), Latin-2 (B), Latin-3 (C), ...
+
+94x94-charset
+     GB2312 (A), JIS X 0208 (B), KSC5601 (C), ...
+
+96x96-charset
+     none for the moment
+
+   The meanings of the various characters in these sequences, where not
+specified by the ISO 2022 standard (such as the ESC character), are
+assigned by "ECMA", the European Computer Manufacturers Association.
+
+   The meaning of intermediate characters are:
+
+             $ [0x24]: indicate charset of dimension 2 (94x94 or 96x96).
+             ( [0x28]: designate to G0 a 94-charset whose final byte is F.
+             ) [0x29]: designate to G1 a 94-charset whose final byte is F.
+             * [0x2A]: designate to G2 a 94-charset whose final byte is F.
+             + [0x2B]: designate to G3 a 94-charset whose final byte is F.
+             , [0x2C]: designate to G0 a 96-charset whose final byte is F.
+             - [0x2D]: designate to G1 a 96-charset whose final byte is F.
+             . [0x2E]: designate to G2 a 96-charset whose final byte is F.
+             / [0x2F]: designate to G3 a 96-charset whose final byte is F.
+
+   The comma may be used in files read and written only by MULE, as a
+MULE extension, but this is illegal in ISO 2022.  (The reason is that
+in ISO 2022 G0 must be a 94-member character set, with 0x20 assigned
+the value SPACE, and 0x7F assigned the value DEL.)
+
+   Here are examples of designations:
+
+             ESC ( B :              designate to G0 ASCII
+             ESC - A :              designate to G1 Latin-1
+             ESC $ ( A or ESC $ A : designate to G0 GB2312
+             ESC $ ( B or ESC $ B : designate to G0 JISX0208
+             ESC $ ) C :            designate to G1 KSC5601
+
+   (The short forms used to designate GB2312 and JIS X 0208 are for
+backwards compatibility; the long forms are preferred.)
+
+   To use a charset designated to G2 or G3, and to use a charset
+designated to G1 in a 7-bit environment, you must explicitly invoke G1,
+G2, or G3 into GL.  There are two types of invocation, Locking Shift
+(forever) and Single Shift (one character only).
+
+   Locking Shift is done as follows:
+
+             LS0 or SI (0x0F): invoke G0 into GL
+             LS1 or SO (0x0E): invoke G1 into GL
+             LS2:  invoke G2 into GL
+             LS3:  invoke G3 into GL
+             LS1R: invoke G1 into GR
+             LS2R: invoke G2 into GR
+             LS3R: invoke G3 into GR
+
+   Single Shift is done as follows:
+
+             SS2 or ESC N: invoke G2 into GL
+             SS3 or ESC O: invoke G3 into GL
+
+   The shift functions (such as LS1R and SS3) are represented by control
+characters (from C1) in 8 bit environments and by escape sequences in 7
+bit environments.
+
+   (#### Ben says: I think the above is slightly incorrect.  It appears
+that SS2 invokes G2 into GR and SS3 invokes G3 into GR, whereas ESC N
+and ESC O behave as indicated.  The above definitions will not parse
+EUC-encoded text correctly, and it looks like the code in mule-coding.c
+has similar problems.)
+
+   Evidently there are a lot of ISO-2022-compliant ways of encoding
+multilingual text.  Now, in the world, there exist many coding systems
+such as X11's Compound Text, Japanese JUNET code, and so-called EUC
+(Extended UNIX Code); all of these are variants of ISO 2022.
+
+   In MULE, we characterize a version of ISO 2022 by the following
+attributes:
+
+  1. The character sets initially designated to G0 thru G3.
+
+  2. Whether short form designations are allowed for Japanese and
+     Chinese.
+
+  3. Whether ASCII should be designated to G0 before control characters.
+
+  4. Whether ASCII should be designated to G0 at the end of line.
+
+  5. 7-bit environment or 8-bit environment.
+
+  6. Whether Locking Shifts are used or not.
+
+  7. Whether to use ASCII or the variant JIS X 0201-1976-Roman.
+
+  8. Whether to use JIS X 0208-1983 or the older version JIS X
+     0208-1976.
+
+   (The last two are only for Japanese.)
+
+   By specifying these attributes, you can create any variant of ISO
+2022.
+
+   Here are several examples:
+
+     ISO-2022-JP -- Coding system used in Japanese email (RFC 1463 #### check).
+             1. G0 <- ASCII, G1..3 <- never used
+             2. Yes.
+             3. Yes.
+             4. Yes.
+             5. 7-bit environment
+             6. No.
+             7. Use ASCII
+             8. Use JIS X 0208-1983
+     
+     ctext -- X11 Compound Text
+             1. G0 <- ASCII, G1 <- Latin-1, G2,3 <- never used.
+             2. No.
+             3. No.
+             4. Yes.
+             5. 8-bit environment.
+             6. No.
+             7. Use ASCII.
+             8. Use JIS X 0208-1983.
+     
+     euc-china -- Chinese EUC.  Often called the "GB encoding", but that is
+     technically incorrect.
+             1. G0 <- ASCII, G1 <- GB 2312, G2,3 <- never used.
+             2. No.
+             3. Yes.
+             4. Yes.
+             5. 8-bit environment.
+             6. No.
+             7. Use ASCII.
+             8. Use JIS X 0208-1983.
+     
+     ISO-2022-KR -- Coding system used in Korean email.
+             1. G0 <- ASCII, G1 <- KSC 5601, G2,3 <- never used.
+             2. No.
+             3. Yes.
+             4. Yes.
+             5. 7-bit environment.
+             6. Yes.
+             7. Use ASCII.
+             8. Use JIS X 0208-1983.
+
+   MULE creates all of these coding systems by default.
+
+\1f
+File: lispref.info,  Node: EOL Conversion,  Next: Coding System Properties,  Prev: ISO 2022,  Up: Coding Systems
+
+EOL Conversion
+--------------
+
+`nil'
+     Automatically detect the end-of-line type (LF, CRLF, or CR).  Also
+     generate subsidiary coding systems named `NAME-unix', `NAME-dos',
+     and `NAME-mac', that are identical to this coding system but have
+     an EOL-TYPE value of `lf', `crlf', and `cr', respectively.
+
+`lf'
+     The end of a line is marked externally using ASCII LF.  Since this
+     is also the way that XEmacs represents an end-of-line internally,
+     specifying this option results in no end-of-line conversion.  This
+     is the standard format for Unix text files.
+
+`crlf'
+     The end of a line is marked externally using ASCII CRLF.  This is
+     the standard format for MS-DOS text files.
+
+`cr'
+     The end of a line is marked externally using ASCII CR.  This is the
+     standard format for Macintosh text files.
+
+`t'
+     Automatically detect the end-of-line type but do not generate
+     subsidiary coding systems.  (This value is converted to `nil' when
+     stored internally, and `coding-system-property' will return `nil'.)
+
+\1f
  File: lispref.info,  Node: Coding System Properties,  Next: Basic Coding System Functions,  Prev: EOL Conversion,  Up: Coding Systems
  
  Coding System Properties
  File: lispref.info,  Node: Coding System Properties,  Next: Basic Coding System Functions,  Prev: EOL Conversion,  Up: Coding Systems
  
  Coding System Properties
@@ -885,199 +1278,3 @@ but can modify the register status.
  successfully, and returns to caller (which may be a CCL program).  It
  does not alter the status of the registers.
  
  successfully, and returns to caller (which may be a CCL program).  It
  does not alter the status of the registers.
  
-\1f
-File: lispref.info,  Node: CCL Expressions,  Next: Calling CCL,  Prev: CCL Statements,  Up: CCL
-
-CCL Expressions
----------------
-
-   CCL, unlike Lisp, uses infix expressions.  The simplest CCL
-expressions consist of a single OPERAND, either a register (one of `r0',
-..., `r0') or an integer.  Complex expressions are lists of the form `(
-EXPRESSION OPERATOR OPERAND )'.  Unlike C, assignments are not
-expressions.
-
-   In the following table, X is the target resister for a "set".  In
-subexpressions, this is implicitly `r7'.  This means that `>8', `//',
-`de-sjis', and `en-sjis' cannot be used freely in subexpressions, since
-they return parts of their values in `r7'.  Y may be an expression,
-register, or integer, while Z must be a register or an integer.
-
-Name             Operator   Code   C-like Description
-CCL_PLUS         `+'        0x00   X = Y + Z
-CCL_MINUS        `-'        0x01   X = Y - Z
-CCL_MUL          `*'        0x02   X = Y * Z
-CCL_DIV          `/'        0x03   X = Y / Z
-CCL_MOD          `%'        0x04   X = Y % Z
-CCL_AND          `&'        0x05   X = Y & Z
-CCL_OR           `|'        0x06   X = Y | Z
-CCL_XOR          `^'        0x07   X = Y ^ Z
-CCL_LSH          `<<'       0x08   X = Y << Z
-CCL_RSH          `>>'       0x09   X = Y >> Z
-CCL_LSH8         `<8'       0x0A   X = (Y << 8) | Z
-CCL_RSH8         `>8'       0x0B   X = Y >> 8, r[7] = Y & 0xFF
-CCL_DIVMOD       `//'       0x0C   X = Y / Z, r[7] = Y % Z
-CCL_LS           `<'        0x10   X = (X < Y)
-CCL_GT           `>'        0x11   X = (X > Y)
-CCL_EQ           `=='       0x12   X = (X == Y)
-CCL_LE           `<='       0x13   X = (X <= Y)
-CCL_GE           `>='       0x14   X = (X >= Y)
-CCL_NE           `!='       0x15   X = (X != Y)
-CCL_ENCODE_SJIS  `en-sjis'  0x16   X = HIGHER_BYTE (SJIS (Y, Z))
-                                   r[7] = LOWER_BYTE (SJIS (Y, Z)
-CCL_DECODE_SJIS  `de-sjis'  0x17   X = HIGHER_BYTE (DE-SJIS (Y, Z))
-                                   r[7] = LOWER_BYTE (DE-SJIS (Y, Z))
-
-   The CCL operators are as in C, with the addition of CCL_LSH8,
-CCL_RSH8, CCL_DIVMOD, CCL_ENCODE_SJIS, and CCL_DECODE_SJIS.  The
-CCL_ENCODE_SJIS and CCL_DECODE_SJIS treat their first and second bytes
-as the high and low bytes of a two-byte character code.  (SJIS stands
-for Shift JIS, an encoding of Japanese characters used by Microsoft.
-CCL_ENCODE_SJIS is a complicated transformation of the Japanese
-standard JIS encoding to Shift JIS.  CCL_DECODE_SJIS is its inverse.)
-It is somewhat odd to represent the SJIS operations in infix form.
-
-\1f
-File: lispref.info,  Node: Calling CCL,  Next: CCL Examples,  Prev: CCL Expressions,  Up: CCL
-
-Calling CCL
------------
-
-   CCL programs are called automatically during Emacs buffer I/O when
-the external representation has a coding system type of `shift-jis',
-`big5', or `ccl'.  The program is specified by the coding system (*note
-Coding Systems::).  You can also call CCL programs from other CCL
-programs, and from Lisp using these functions:
-
- - Function: ccl-execute ccl-program status
-     Execute CCL-PROGRAM with registers initialized by STATUS.
-     CCL-PROGRAM is a vector of compiled CCL code created by
-     `ccl-compile'.  It is an error for the program to try to execute a
-     CCL I/O command.  STATUS must be a vector of nine values,
-     specifying the initial value for the R0, R1 .. R7 registers and
-     for the instruction counter IC.  A `nil' value for a register
-     initializer causes the register to be set to 0.  A `nil' value for
-     the IC initializer causes execution to start at the beginning of
-     the program.  When the program is done, STATUS is modified (by
-     side-effect) to contain the ending values for the corresponding
-     registers and IC.
-
- - Function: ccl-execute-on-string ccl-program status str &optional
-          continue
-     Execute CCL-PROGRAM with initial STATUS on STRING.  CCL-PROGRAM is
-     a vector of compiled CCL code created by `ccl-compile'.  STATUS
-     must be a vector of nine values, specifying the initial value for
-     the R0, R1 .. R7 registers and for the instruction counter IC.  A
-     `nil' value for a register initializer causes the register to be
-     set to 0.  A `nil' value for the IC initializer causes execution
-     to start at the beginning of the program.  An optional fourth
-     argument CONTINUE, if non-nil, causes the IC to remain on the
-     unsatisfied read operation if the program terminates due to
-     exhaustion of the input buffer.  Otherwise the IC is set to the end
-     of the program.  When the program is done, STATUS is modified (by
-     side-effect) to contain the ending values for the corresponding
-     registers and IC.  Returns the resulting string.
-
-   To call a CCL program from another CCL program, it must first be
-registered:
-
- - Function: register-ccl-program name ccl-program
-     Register NAME for CCL program PROGRAM in `ccl-program-table'.
-     PROGRAM should be the compiled form of a CCL program, or nil.
-     Return index number of the registered CCL program.
-
-   Information about the processor time used by the CCL interpreter can
-be obtained using these functions:
-
- - Function: ccl-elapsed-time
-     Returns the elapsed processor time of the CCL interpreter as cons
-     of user and system time, as floating point numbers measured in
-     seconds.  If only one overall value can be determined, the return
-     value will be a cons of that value and 0.
-
- - Function: ccl-reset-elapsed-time
-     Resets the CCL interpreter's internal elapsed time registers.
-
-\1f
-File: lispref.info,  Node: CCL Examples,  Prev: Calling CCL,  Up: CCL
-
-CCL Examples
-------------
-
-   This section is not yet written.
-
-\1f
-File: lispref.info,  Node: Category Tables,  Prev: CCL,  Up: MULE
-
-Category Tables
-===============
-
-   A category table is a type of char table used for keeping track of
-categories.  Categories are used for classifying characters for use in
-regexps--you can refer to a category rather than having to use a
-complicated [] expression (and category lookups are significantly
-faster).
-
-   There are 95 different categories available, one for each printable
-character (including space) in the ASCII charset.  Each category is
-designated by one such character, called a "category designator".  They
-are specified in a regexp using the syntax `\cX', where X is a category
-designator. (This is not yet implemented.)
-
-   A category table specifies, for each character, the categories that
-the character is in.  Note that a character can be in more than one
-category.  More specifically, a category table maps from a character to
-either the value `nil' (meaning the character is in no categories) or a
-95-element bit vector, specifying for each of the 95 categories whether
-the character is in that category.
-
-   Special Lisp functions are provided that abstract this, so you do not
-have to directly manipulate bit vectors.
-
- - Function: category-table-p obj
-     This function returns `t' if ARG is a category table.
-
- - Function: category-table &optional buffer
-     This function returns the current category table.  This is the one
-     specified by the current buffer, or by BUFFER if it is non-`nil'.
-
- - Function: standard-category-table
-     This function returns the standard category table.  This is the
-     one used for new buffers.
-
- - Function: copy-category-table &optional table
-     This function constructs a new category table and return it.  It
-     is a copy of the TABLE, which defaults to the standard category
-     table.
-
- - Function: set-category-table table &optional buffer
-     This function selects a new category table for BUFFER.  One
-     argument, a category table.  BUFFER defaults to the current buffer
-     if omitted.
-
- - Function: category-designator-p obj
-     This function returns `t' if ARG is a category designator (a char
-     in the range `' '' to `'~'').
-
- - Function: category-table-value-p obj
-     This function returns `t' if ARG is a category table value.  Valid
-     values are `nil' or a bit vector of size 95.
-
-\1f
-File: lispref.info,  Node: Tips,  Next: Building XEmacs and Object Allocation,  Prev: MULE,  Up: Top
-
-Tips and Standards
-******************
-
-   This chapter describes no additional features of XEmacs Lisp.
-Instead it gives advice on making effective use of the features
-described in the previous chapters.
-
-* Menu:
-
-* Style Tips::                Writing clean and robust programs.
-* Compilation Tips::          Making compiled code run fast.
-* Documentation Tips::        Writing readable documentation strings.
-* Comment Tips::             Conventions for writing comments.
-* Library Headers::           Standard headers for library packages.
-