   1 This is ../info/lispref.info, produced by makeinfo version 4.0 from
   2 lispref/lispref.texi.
   3
   4 INFO-DIR-SECTION XEmacs Editor
   5 START-INFO-DIR-ENTRY
   6 * Lispref: (lispref).           XEmacs Lisp Reference Manual.
   7 END-INFO-DIR-ENTRY
   8
   Edition History:

   GNU Emacs Lisp Reference Manual Second Edition (v2.01), May 1993
   GNU Emacs Lisp Reference Manual Further Revised (v2.02), August 1993
   Lucid Emacs Lisp Reference Manual (for 19.10) First Edition, March 1994
   XEmacs Lisp Programmer's Manual (for 19.12) Second Edition, April 1995
   GNU Emacs Lisp Reference Manual v2.4, June 1995
   XEmacs Lisp Programmer's Manual (for 19.13) Third Edition, July 1995
   XEmacs Lisp Reference Manual (for 19.14 and 20.0) v3.1, March 1996
   XEmacs Lisp Reference Manual (for 19.15 and 20.1, 20.2, 20.3) v3.2,
   April, May, November 1997
   XEmacs Lisp Reference Manual (for 21.0) v3.3, April 1998
  20
  21    Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995 Free Software
  22 Foundation, Inc.  Copyright (C) 1994, 1995 Sun Microsystems, Inc.
  23 Copyright (C) 1995, 1996 Ben Wing.
  24
  25    Permission is granted to make and distribute verbatim copies of this
  26 manual provided the copyright notice and this permission notice are
  27 preserved on all copies.
  28
  29    Permission is granted to copy and distribute modified versions of
  30 this manual under the conditions for verbatim copying, provided that the
  31 entire resulting derived work is distributed under the terms of a
  32 permission notice identical to this one.
  33
  34    Permission is granted to copy and distribute translations of this
  35 manual into another language, under the above conditions for modified
  36 versions, except that this permission notice may be stated in a
  37 translation approved by the Foundation.
  38
  39    Permission is granted to copy and distribute modified versions of
  40 this manual under the conditions for verbatim copying, provided also
  41 that the section entitled "GNU General Public License" is included
  42 exactly as in the original, and provided that the entire resulting
  43 derived work is distributed under the terms of a permission notice
  44 identical to this one.
  45
  46    Permission is granted to copy and distribute translations of this
  47 manual into another language, under the above conditions for modified
  48 versions, except that the section entitled "GNU General Public License"
  49 may be included in a translation approved by the Free Software
  50 Foundation instead of in the original English.
  51
  52 \1f
  53 File: lispref.info,  Node: Unimplemented libpq Functions,  Prev: Other libpq Functions,  Up: XEmacs PostgreSQL libpq API
  54
  55 Unimplemented libpq Functions
  56 -----------------------------
  57
  58  - Unimplemented Function: PGconn *PQsetdbLogin (char *pghost, char
  59           *pgport, char *pgoptions, char *pgtty, char *dbName, char
  60           *login, char *pwd)
  61      Synchronous database connection.  PGHOST is the hostname of the
  62      PostgreSQL backend to connect to.  PGPORT is the TCP port number
  63      to use.  PGOPTIONS specifies other backend options.  PGTTY
  64      specifies the debugging tty to use.  DBNAME specifies the database
  65      name to use.  LOGIN specifies the database user name.  PWD
  66      specifies the database user's password.
  67
  68      This routine is deprecated as of libpq-7.0, and its functionality
  69      can be replaced by external Lisp code if needed.
  70
  71  - Unimplemented Function: PGconn *PQsetdb (char *pghost, char *pgport,
  72           char *pgoptions, char *pgtty, char *dbName)
  73      Synchronous database connection.  PGHOST is the hostname of the
  74      PostgreSQL backend to connect to.  PGPORT is the TCP port number
  75      to use.  PGOPTIONS specifies other backend options.  PGTTY
  76      specifies the debugging tty to use.  DBNAME specifies the database
  77      name to use.
  78
  79      This routine was deprecated in libpq-6.5.
  80
  81  - Unimplemented Function: int PQsocket (PGconn *conn)
  82      Return socket file descriptor to a backend database process.  CONN
  83      database connection object.
  84
  85  - Unimplemented Function: void PQprint (FILE *fout, PGresult *res,
  86           PGprintOpt *ps)
     Print out the results of a query to a designated C stream.  FOUT
     is the C stream to print to.  RES is the query result object to
     print.  PS is the print options structure.
  90
  91      This routine is deprecated as of libpq-7.0 and cannot be sensibly
  92      exported to XEmacs Lisp.
  93
  94  - Unimplemented Function: void PQdisplayTuples (PGresult *res, FILE
  95           *fp, int fillAlign, char *fieldSep, int printHeader, int
  96           quiet)
     RES is the query result object to print.  FP is the C stream to
     print to.  FILLALIGN says whether to pad the fields with spaces.
     FIELDSEP is the field separator.  PRINTHEADER says whether to
     display headers.  QUIET is undocumented.
 100
 101      This routine was deprecated in libpq-6.5.
 102
 103  - Unimplemented Function: void PQprintTuples (PGresult *res, FILE
 104           *fout, int printAttName, int terseOutput, int width)
     RES is the query result object to print.  FOUT is the C stream to
     print to.  PRINTATTNAME says whether to print attribute names.
     TERSEOUTPUT controls the delimiter bars.  WIDTH is the width of a
     column; if 0, use variable width.
 108
 109      This routine was deprecated in libpq-6.5.
 110
 111  - Unimplemented Function: int PQmblen (char *s, int encoding)
     Determine the length of a multibyte encoded character at `*s'.  S
     is the encoded string.  ENCODING is the type of encoding.
 114
 115      Compatibility note:  This function was introduced in libpq-7.0.
 116
 117  - Unimplemented Function: void PQtrace (PGconn *conn, FILE *debug_port)
 118      Enable tracing on `debug_port'.  CONN database connection object.
 119      DEBUG_PORT C output stream to use.
 120
 121  - Unimplemented Function: void PQuntrace (PGconn *conn)
 122      Disable tracing.  CONN database connection object.
 123
 124  - Unimplemented Function: char *PQoidStatus (PGconn *conn)
 125      Return the object id as a string of the last tuple inserted.  CONN
 126      database connection object.
 127
 128      Compatibility note: This function is deprecated in libpq-7.0,
 129      however it is used internally by the XEmacs binding code when
 130      linked against versions prior to 7.0.
 131
 132  - Unimplemented Function: PGresult *PQfn (PGconn *conn, int fnid, int
 133           *result_buf, int *result_len, int result_is_int, PQArgBlock
 134           *args, int nargs)
     "Fast path" interface; not really recommended for application
     use.  CONN is a database connection object.  The remaining
     arguments (FNID, RESULT_BUF, RESULT_LEN, RESULT_IS_INT, ARGS and
     NARGS) are undocumented.
 138
 139    The following set of very low level large object functions aren't
 140 appropriate to be exported to Lisp.
 141
 142  - Unimplemented Function: int pq-lo-open (PGconn *conn, int lobjid,
 143           int mode)
 144      CONN a database connection object.  LOBJID a large object ID.
 145      MODE opening modes.
 146
 147  - Unimplemented Function: int pq-lo-close (PGconn *conn, int fd)
 148      CONN a database connection object.  FD a large object file
 149      descriptor
 150
 151  - Unimplemented Function: int pq-lo-read (PGconn *conn, int fd, char
 152           *buf, int len)
 153      CONN a database connection object.  FD a large object file
 154      descriptor.  BUF buffer to read into.  LEN size of buffer.
 155
 156  - Unimplemented Function: int pq-lo-write (PGconn *conn, int fd, char
 157           *buf, size_t len)
 158      CONN a database connection object.  FD a large object file
 159      descriptor.  BUF buffer to write from.  LEN size of buffer.
 160
 161  - Unimplemented Function: int pq-lo-lseek (PGconn *conn, int fd, int
 162           offset, int whence)
 163      CONN a database connection object.  FD a large object file
 164      descriptor.  OFFSET WHENCE
 165
 166  - Unimplemented Function: int pq-lo-creat (PGconn *conn, int mode)
 167      CONN a database connection object.  MODE opening modes.
 168
 169  - Unimplemented Function: int pq-lo-tell (PGconn *conn, int fd)
 170      CONN a database connection object.  FD a large object file
 171      descriptor.
 172
 173  - Unimplemented Function: int pq-lo-unlink (PGconn *conn, int lobjid)
     CONN a database connection object.  LOBJID a large object ID.
 175
 176 \1f
 177 File: lispref.info,  Node: XEmacs PostgreSQL libpq Examples,  Prev: XEmacs PostgreSQL libpq API,  Up: PostgreSQL Support
 178
 179 XEmacs PostgreSQL libpq Examples
 180 ================================
 181
 182    This is an example of one method of establishing an asynchronous
 183 connection.
 184
 185      (defun database-poller (P)
 186        (message "%S before poll" (pq-pgconn P 'pq::status))
 187        (pq-connect-poll P)
 188        (message "%S after poll" (pq-pgconn P 'pq::status))
 189        (if (eq (pq-pgconn P 'pq::status) 'pg::connection-ok)
 190            (message "Done!")
 191          (add-timeout .1 'database-poller P)))
 192           => database-poller
 193      (progn
 194        (setq P (pq-connect-start ""))
 195        (add-timeout .1 'database-poller P))
 196           => pg::connection-started before poll
 197           => pg::connection-made after poll
 198           => pg::connection-made before poll
 199           => pg::connection-awaiting-response after poll
 200           => pg::connection-awaiting-response before poll
 201           => pg::connection-auth-ok after poll
 202           => pg::connection-auth-ok before poll
 203           => pg::connection-setenv after poll
 204           => pg::connection-setenv before poll
 205           => pg::connection-ok after poll
 206           => Done!
 207      P
 208           => #<PGconn localhost:25432 steve/steve>
 209
 210    Here is an example of one method of doing an asynchronous reset.
 211
     (defun database-poller (P)
       (let (PS)
         (message "%S before poll" (pq-pgconn P 'pq::status))
         (setq PS (pq-reset-poll P))
         (message "%S after poll [%S]" (pq-pgconn P 'pq::status) PS)
         (if (eq (pq-pgconn P 'pq::status) 'pg::connection-ok)
             (message "Done!")
           (add-timeout .1 'database-poller P))))
 220           => database-poller
 221      (progn
 222        (pq-reset-start P)
 223        (add-timeout .1 'database-poller P))
 224           => pg::connection-started before poll
 225           => pg::connection-made after poll [pgres::polling-writing]
 226           => pg::connection-made before poll
 227           => pg::connection-awaiting-response after poll [pgres::polling-reading]
 228           => pg::connection-awaiting-response before poll
 229           => pg::connection-setenv after poll [pgres::polling-reading]
 230           => pg::connection-setenv before poll
 231           => pg::connection-ok after poll [pgres::polling-ok]
 232           => Done!
 233      P
 234           => #<PGconn localhost:25432 steve/steve>
 235
 236    And finally, an asynchronous query.
 237
     (defun database-poller (P)
       (let (R)
         (pq-consume-input P)
         (if (pq-is-busy P)
             (add-timeout .1 'database-poller P)
           (setq R (pq-get-result P))
           (if R
               (progn
                 (push R result-list)
                 (add-timeout .1 'database-poller P))))))
 248           => database-poller
 249      (when (pq-send-query P "SELECT * FROM xemacs_test;")
 250        (setq result-list nil)
 251        (add-timeout .1 'database-poller P))
 252           => 885
 253      ;; wait a moment
 254      result-list
 255           => (#<PGresult PGRES_TUPLES_OK - SELECT>)
 256
   Here is an example showing how multiple SQL statements in a single
query can have all their results collected.

 259      ;; Using the same `database-poller' function from the previous example
 260      (when (pq-send-query P "SELECT * FROM xemacs_test;
 261      SELECT * FROM pg_database;
 262      SELECT * FROM pg_user;")
 263        (setq result-list nil)
 264        (add-timeout .1 'database-poller P))
 265           => 1782
 266      ;; wait a moment
 267      result-list
 268           => (#<PGresult PGRES_TUPLES_OK - SELECT> #<PGresult PGRES_TUPLES_OK - SELECT> #<PGresult PGRES_TUPLES_OK - SELECT>)
 269
 270    Here is an example which illustrates collecting all data from a
 271 query, including the field names.
 272
     (defun pg-util-query-results (results)
       "Return the field names and tuples of RESULTS as a list of lists."
       (let ((i (1- (pq-ntuples results)))
             j l1 l2)
         ;; Walk the tuples last to first, so `push' leaves them in order.
         (while (>= i 0)
           (setq j (1- (pq-nfields results)))
           (setq l2 nil)
           (while (>= j 0)
             (push (pq-get-value results i j) l2)
             (decf j))
           (push l2 l1)
           (decf i))
         ;; Put the list of field names at the head of the result.
         (setq j (1- (pq-nfields results)))
         (setq l2 nil)
         (while (>= j 0)
           (push (pq-fname results j) l2)
           (decf j))
         (push l2 l1)
         l1))
 292           => pg-util-query-results
 293      (setq R (pq-exec P "SELECT * FROM xemacs_test ORDER BY field2 DESC;"))
 294           => #<PGresult PGRES_TUPLES_OK - SELECT>
 295      (pg-util-query-results R)
 296           => (("f1" "field2") ("a" "97") ("b" "97") ("stuff" "42") ("a string" "12") ("foo" "10") ("string" "2") ("text" "1"))
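
   The list returned above keeps the field names only in its first
element.  As a rough sketch, the rows can be paired with those names
afterwards.  The helper `pg-util-rows-to-alists' is hypothetical and is
not part of the XEmacs libpq binding; it works on the plain list
structure and needs no database connection.

```elisp
;; Hypothetical helper for illustration; not provided by XEmacs.
(defun pg-util-rows-to-alists (table)
  "Convert TABLE into a list of alists, one per row.
TABLE is a list whose first element is a list of field names and whose
remaining elements are rows, as built by `pg-util-query-results'."
  (let ((names (car table)))
    (mapcar (lambda (row)
              ;; Pair each field name with the value in the same column.
              (let ((fields names) (vals row) alist)
                (while fields
                  (push (cons (car fields) (car vals)) alist)
                  (setq fields (cdr fields)
                        vals (cdr vals)))
                (nreverse alist)))
            (cdr table))))

(pg-util-rows-to-alists '(("f1" "field2") ("a" "97") ("text" "1")))
     ;; => ((("f1" . "a") ("field2" . "97")) (("f1" . "text") ("field2" . "1")))
```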
 297
 298    Here is an example of a query that uses a database cursor.
 299
 300      (let (data R)
 301        (setq R (pq-exec P "BEGIN;"))
 302        (setq R (pq-exec P "DECLARE k_cursor CURSOR FOR SELECT * FROM xemacs_test ORDER BY f1 DESC;"))
 303
 304        (setq R (pq-exec P "FETCH k_cursor;"))
 305        (while (eq (pq-ntuples R) 1)
 306          (push (list (pq-get-value R 0 0) (pq-get-value R 0 1)) data)
 307          (setq R (pq-exec P "FETCH k_cursor;")))
 308        (setq R (pq-exec P "END;"))
 309        data)
 310           => (("a" "97") ("a string" "12") ("b" "97") ("foo" "10") ("string" "2") ("stuff" "42") ("text" "1"))
 311
 312    Here's another example of cursors, this time with a Lisp macro to
 313 implement a mapping function over a table.
 314
     (defmacro map-db (P table condition callout)
       `(let (R)
          (pq-exec ,P "BEGIN;")
          (pq-exec ,P (concat "DECLARE k_cursor CURSOR FOR SELECT * FROM "
                              ,table
                              " "
                              ,condition
                              " ORDER BY f1 DESC;"))
          (setq R (pq-exec ,P "FETCH k_cursor;"))
          (while (eq (pq-ntuples R) 1)
            (,callout (pq-get-value R 0 0) (pq-get-value R 0 1))
            (setq R (pq-exec ,P "FETCH k_cursor;")))
          (pq-exec ,P "END;")))
 328           => map-db
 329      (defun callback (arg1 arg2)
 330        (message "arg1 = %s, arg2 = %s" arg1 arg2))
 331           => callback
 332      (map-db P "xemacs_test" "WHERE field2 > 10" callback)
 333           => arg1 = stuff, arg2 = 42
 334           => arg1 = b, arg2 = 97
 335           => arg1 = a string, arg2 = 12
 336           => arg1 = a, arg2 = 97
 337           => #<PGresult PGRES_COMMAND_OK - COMMIT>
 338
 339 \1f
 340 File: lispref.info,  Node: Internationalization,  Next: MULE,  Prev: PostgreSQL Support,  Up: Top
 341
 342 Internationalization
 343 ********************
 344
 345 * Menu:
 346
 347 * I18N Levels 1 and 2:: Support for different time, date, and currency formats.
 348 * I18N Level 3::        Support for localized messages.
 349 * I18N Level 4::        Support for Asian languages.
 350
 351 \1f
 352 File: lispref.info,  Node: I18N Levels 1 and 2,  Next: I18N Level 3,  Up: Internationalization
 353
 354 I18N Levels 1 and 2
 355 ===================
 356
 357    XEmacs is now compliant with I18N levels 1 and 2.  Specifically,
 358 this means that it is 8-bit clean and correctly handles time and date
 359 functions.  XEmacs will correctly display the entire ISO-Latin 1
 360 character set.
 361
   The compose key may now be used to create any character in the
ISO-Latin 1 character set not directly available via the keyboard.  In
order for the compose key to work it is necessary to load the file
`x-compose.el'.  At any time while composing a character, `C-h' will
display all valid completions and the character which would be produced.
 367
 368 \1f
 369 File: lispref.info,  Node: I18N Level 3,  Next: I18N Level 4,  Prev: I18N Levels 1 and 2,  Up: Internationalization
 370
 371 I18N Level 3
 372 ============
 373
 374 * Menu:
 375
 376 * Level 3 Basics::
 377 * Level 3 Primitives::
 378 * Dynamic Messaging::
 379 * Domain Specification::
 380 * Documentation String Extraction::
 381
 382 \1f
 383 File: lispref.info,  Node: Level 3 Basics,  Next: Level 3 Primitives,  Up: I18N Level 3
 384
 385 Level 3 Basics
 386 --------------
 387
 388    XEmacs now provides alpha-level functionality for I18N Level 3.
 389 This means that everything necessary for full messaging is available,
 390 but not every file has been converted.
 391
 392    The two message files which have been created are `src/emacs.po' and
 393 `lisp/packages/mh-e.po'.  Both files need to be converted using
 394 `msgfmt', and the resulting `.mo' files placed in some locale's
 395 `LC_MESSAGES' directory.  The test "translations" in these files are
 396 the original messages prefixed by `TRNSLT_'.
 397
 398    The domain for a variable is stored on the variable's property list
 399 under the property name VARIABLE-DOMAIN.  The function
 400 `documentation-property' uses this information when translating a
 401 variable's documentation.
 402
 403 \1f
 404 File: lispref.info,  Node: Level 3 Primitives,  Next: Dynamic Messaging,  Prev: Level 3 Basics,  Up: I18N Level 3
 405
 406 Level 3 Primitives
 407 ------------------
 408
 409  - Function: gettext string
 410      This function looks up STRING in the default message domain and
 411      returns its translation.  If `I18N3' was not enabled when XEmacs
 412      was compiled, it just returns STRING.
 413
 414  - Function: dgettext domain string
 415      This function looks up STRING in the specified message domain and
 416      returns its translation.  If `I18N3' was not enabled when XEmacs
 417      was compiled, it just returns STRING.
 418
 419  - Function: bind-text-domain domain pathname
     This function associates a pathname with a message domain.  Here's
     how the path to the message file is constructed under SunOS 5.x:
 422
 423           `{pathname}/{LANG}/LC_MESSAGES/{domain}.mo'
 424
 425      If `I18N3' was not enabled when XEmacs was compiled, this function
 426      does nothing.
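
     As a rough sketch of the template above, the file name can be
     assembled as follows.  The helper `i18n3-message-file' and the
     sample directory, language and domain are hypothetical, chosen
     only for illustration; XEmacs does not define such a function,
     and the actual lookup is done by the message catalog library.

```elisp
;; Hypothetical helper for illustration; not provided by XEmacs.
(defun i18n3-message-file (pathname lang domain)
  "Return the message file name for DOMAIN bound to PATHNAME under LANG."
  (format "%s/%s/LC_MESSAGES/%s.mo" pathname lang domain))

(i18n3-message-file "/usr/local/lib/locale" "fr" "emacs-gorilla")
     ;; => "/usr/local/lib/locale/fr/LC_MESSAGES/emacs-gorilla.mo"
```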
 427
 428  - Special Form: domain string
     This special form specifies the text domain used for translating
     documentation strings and interactive prompts of a function.  For
     example, write:
 432
 433           (defun foo (arg) "Doc string" (domain "emacs-foo") ...)
 434
 435      to specify `emacs-foo' as the text domain of the function `foo'.
 436      The "call" to `domain' is actually a declaration rather than a
 437      function; when actually called, `domain' just returns `nil'.
 438
 439  - Function: domain-of function
 440      This function returns the text domain of FUNCTION; it returns
 441      `nil' if it is the default domain.  If `I18N3' was not enabled
 442      when XEmacs was compiled, it always returns `nil'.
 443
 444 \1f
 445 File: lispref.info,  Node: Dynamic Messaging,  Next: Domain Specification,  Prev: Level 3 Primitives,  Up: I18N Level 3
 446
 447 Dynamic Messaging
 448 -----------------
 449
 450    The `format' function has been extended to permit you to change the
 451 order of parameter insertion.  For example, the conversion format
 452 `%1$s' inserts parameter one as a string, while `%2$s' inserts
 453 parameter two.  This is useful when creating translations which require
 454 you to change the word order.
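
   A minimal illustration of the reordering follows.  Both format
strings are English and invented for this sketch; in practice the second
one would be the translated string found in a message file.  The
arguments are passed in the same order each time and only the format
string changes.

```elisp
;; Original word order: the file name comes first.
(format "%1$s has %2$s lines" "foo.el" "120")
     ;; => "foo.el has 120 lines"

;; A translation that needs the count first reorders the conversions,
;; not the arguments.
(format "%2$s lines are in %1$s" "foo.el" "120")
     ;; => "120 lines are in foo.el"
```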
 455
 456 \1f
 457 File: lispref.info,  Node: Domain Specification,  Next: Documentation String Extraction,  Prev: Dynamic Messaging,  Up: I18N Level 3
 458
 459 Domain Specification
 460 --------------------
 461
 462    The default message domain of XEmacs is `emacs'.  For add-on
 463 packages, it is best to use a different domain.  For example, let us
 464 say we want to convert the "gorilla" package to use the domain
 465 `emacs-gorilla'.  To translate the message "What gorilla?", use
 466 `dgettext' as follows:
 467
 468      (dgettext "emacs-gorilla" "What gorilla?")
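
   A package that translates many messages may prefer to name its
domain in one place.  The wrapper below is only a sketch: the name
`gorilla-gettext' is hypothetical, and the `fboundp' test is an added
precaution for Emacs variants that lack `dgettext' altogether, which
this manual does not discuss.

```elisp
;; Hypothetical wrapper for illustration; not provided by XEmacs.
(defun gorilla-gettext (string)
  "Translate STRING in the `emacs-gorilla' message domain."
  (if (fboundp 'dgettext)
      (dgettext "emacs-gorilla" string)
    ;; No I18N3 primitives available: fall back to the original text.
    string))

(gorilla-gettext "What gorilla?")
```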
 469
 470    A function (or macro) which has a documentation string or an
 471 interactive prompt needs to be associated with the domain in order for
 472 the documentation or prompt to be translated.  This is done with the
 473 `domain' special form as follows:
 474
 475      (defun scratch (location)
 476        "Scratch the specified location."
 477        (domain "emacs-gorilla")
 478        (interactive "sScratch: ")
 479        ... )
 480
 481    It is most efficient to specify the domain in the first line of the
 482 function body, before the `interactive' form.
 483
 484    For variables and constants which have documentation strings,
 485 specify the domain after the documentation.
 486
 487  - Special Form: defvar symbol [value [doc-string [domain]]]
 488      Example:
 489           (defvar weight 250 "Weight of gorilla, in pounds." "emacs-gorilla")
 490
 491  - Special Form: defconst symbol [value [doc-string [domain]]]
 492      Example:
 493           (defconst limbs 4 "Number of limbs" "emacs-gorilla")
 494
   Autoloaded functions which are specified in `loaddefs.el' do not need
to have a domain specification, because their documentation strings are
extracted into the main message base.  However, for autoloaded functions
which are specified in a separate package, use the following syntax:
 499
 500  - Function: autoload symbol filename &optional docstring interactive
 501           macro domain
 502      Example:
 503           (autoload 'explore "jungle" "Explore the jungle." nil nil "emacs-gorilla")
 504
 505 \1f
 506 File: lispref.info,  Node: Documentation String Extraction,  Prev: Domain Specification,  Up: I18N Level 3
 507
 508 Documentation String Extraction
 509 -------------------------------
 510
 511    The utility `etc/make-po' scans the file `DOC' to extract
 512 documentation strings and creates a message file `doc.po'.  This file
 513 may then be inserted within `emacs.po'.
 514
   Currently, `make-po' is hard-coded to read from `DOC' and write to
`doc.po'.  In order to extract documentation strings from an add-on
package, first run `make-docfile' on the package to produce the `DOC'
file.  Then run `make-po -p'; the `-p' argument indicates that we are
extracting documentation for an add-on package.
 520
 521    (The `-p' argument is a kludge to make up for a subtle difference
 522 between pre-loaded documentation and add-on documentation:  For add-on
 523 packages, the final carriage returns in the strings produced by
 524 `make-docfile' must be ignored.)
 525
 526 \1f
 527 File: lispref.info,  Node: I18N Level 4,  Prev: I18N Level 3,  Up: Internationalization
 528
 529 I18N Level 4
 530 ============
 531
 532    The Asian-language support in XEmacs is called "MULE".  *Note MULE::.
 533
 534 \1f
 535 File: lispref.info,  Node: MULE,  Next: Tips,  Prev: Internationalization,  Up: Top
 536
 537 MULE
 538 ****
 539
 540    "MULE" is the name originally given to the version of GNU Emacs
 541 extended for multi-lingual (and in particular Asian-language) support.
 542 "MULE" is short for "MUlti-Lingual Emacs".  It is an extension and
 543 complete rewrite of Nemacs ("Nihon Emacs" where "Nihon" is the Japanese
 544 word for "Japan"), which only provided support for Japanese.  XEmacs
 545 refers to its multi-lingual support as "MULE support" since it is based
 546 on "MULE".
 547
 548 * Menu:
 549
 550 * Internationalization Terminology::
 551                         Definition of various internationalization terms.
 552 * Charsets::            Sets of related characters.
 553 * MULE Characters::     Working with characters in XEmacs/MULE.
 554 * Composite Characters:: Making new characters by overstriking other ones.
 555 * Coding Systems::      Ways of representing a string of chars using integers.
 556 * CCL::                 A special language for writing fast converters.
 557 * Category Tables::     Subdividing charsets into groups.
 558
 559 \1f
 560 File: lispref.info,  Node: Internationalization Terminology,  Next: Charsets,  Up: MULE
 561
 562 Internationalization Terminology
 563 ================================
 564
 565    In internationalization terminology, a string of text is divided up
 566 into "characters", which are the printable units that make up the text.
 567 A single character is (for example) a capital `A', the number `2', a
 568 Katakana character, a Hangul character, a Kanji ideograph (an
 569 "ideograph" is a "picture" character, such as is used in Japanese
 570 Kanji, Chinese Hanzi, and Korean Hanja; typically there are thousands
 571 of such ideographs in each language), etc.  The basic property of a
 572 character is that it is the smallest unit of text with semantic
 573 significance in text processing.
 574
 575    Human beings normally process text visually, so to a first
 576 approximation a character may be identified with its shape.  Note that
 577 the same character may be drawn by two different people (or in two
 578 different fonts) in slightly different ways, although the "basic shape"
 579 will be the same.  But consider the works of Scott Kim; human beings
 580 can recognize hugely variant shapes as the "same" character.
 581 Sometimes, especially where characters are extremely complicated to
 582 write, completely different shapes may be defined as the "same"
 583 character in national standards.  The Taiwanese variant of Hanzi is
 584 generally the most complicated; over the centuries, the Japanese,
 585 Koreans, and the People's Republic of China have adopted
 586 simplifications of the shape, but the line of descent from the original
 587 shape is recorded, and the meanings and pronunciation of different
 588 forms of the same character are considered to be identical within each
 589 language.  (Of course, it may take a specialist to recognize the
 590 related form; the point is that the relations are standardized, despite
 591 the differing shapes.)
 592
 593    In some cases, the differences will be significant enough that it is
 594 actually possible to identify two or more distinct shapes that both
 595 represent the same character.  For example, the lowercase letters `a'
 596 and `g' each have two distinct possible shapes--the `a' can optionally
 597 have a curved tail projecting off the top, and the `g' can be formed
 598 either of two loops, or of one loop and a tail hanging off the bottom.
 599 Such distinct possible shapes of a character are called "glyphs".  The
 600 important characteristic of two glyphs making up the same character is
 601 that the choice between one or the other is purely stylistic and has no
 602 linguistic effect on a word (this is the reason why a capital `A' and
 603 lowercase `a' are different characters rather than different
 604 glyphs--e.g.  `Aspen' is a city while `aspen' is a kind of tree).
 605
 606    Note that "character" and "glyph" are used differently here than
 607 elsewhere in XEmacs.
 608
 609    A "character set" is essentially a set of related characters.  ASCII,
 610 for example, is a set of 94 characters (or 128, if you count
 611 non-printing characters).  Other character sets are ISO8859-1 (ASCII
 612 plus various accented characters and other international symbols), JIS
 613 X 0201 (ASCII, more or less, plus half-width Katakana), JIS X 0208
 614 (Japanese Kanji), JIS X 0212 (a second set of less-used Japanese Kanji),
 615 GB2312 (Mainland Chinese Hanzi), etc.
 616
 617    The definition of a character set will implicitly or explicitly give
 618 it an "ordering", a way of assigning a number to each character in the
 619 set.  For many character sets, there is a natural ordering, for example
 620 the "ABC" ordering of the Roman letters.  But it is not clear whether
 621 digits should come before or after the letters, and in fact different
 622 European languages treat the ordering of accented characters
 623 differently.  It is useful to use the natural order where available, of
 624 course.  The number assigned to any particular character is called the
 625 character's "code point".  (Within a given character set, each
 626 character has a unique code point.  Thus the word "set" is ill-chosen;
 627 different orderings of the same characters are different character sets.
 628 Identifying characters is simple enough for alphabetic character sets,
 629 but the difference in ordering can cause great headaches when the same
 630 thousands of characters are used by different cultures as in the Hanzi.)
 631
   A code point may be broken into a number of "position codes".  The
number of position codes required to index a particular character in a
character set is called the "dimension" of the character set.  For
practical purposes, a position code may be thought of as a byte-sized
index.  The set of printing characters of ASCII, being relatively
small, is of dimension one, and each character in the set is indexed
using a single position code, in the range 1 through 94.  Use of this
unusual range, rather than the familiar 33 through 126, is an
intentional abstraction; to understand the programming issues you must
break the equation between character sets and encodings.
 642
 643    JIS X 0208, i.e. Japanese Kanji, has thousands of characters, and is
 644 of dimension two - every character is indexed by two position codes,
 645 each in the range 1 through 94.  (This number "94" is not a
 646 coincidence; we shall see that the JIS position codes were chosen so
 647 that JIS kanji could be encoded without using codes that in ASCII are
 648 associated with device control functions.)  Note that the choice of the
 649 range here is somewhat arbitrary.  You could just as easily index the
 650 printing characters in ASCII using numbers in the range 0 through 93, 2
 651 through 95, 3 through 96, etc.  In fact, the standardized _encoding_
 652 for the ASCII _character set_ uses the range 33 through 126.
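
   The relation between the two ranges for ASCII is a fixed offset of
32, which can be sketched as follows.  The function names are
hypothetical and codes are handled as plain integers; nothing here is
an XEmacs primitive.

```elisp
;; Hypothetical helpers for illustration; not provided by XEmacs.
(defun ascii-position-to-code (position)
  "Return the ASCII code for printing-character POSITION (1 through 94)."
  (if (and (>= position 1) (<= position 94))
      (+ position 32)
    (error "Position code out of range: %d" position)))

(defun ascii-code-to-position (code)
  "Return the position code for printing ASCII CODE (33 through 126)."
  (if (and (>= code 33) (<= code 126))
      (- code 32)
    (error "Not a printing ASCII code: %d" code)))

(ascii-position-to-code 1)     ; => 33, the code of `!'
(ascii-position-to-code 94)    ; => 126, the code of `~'
(ascii-code-to-position 65)    ; => 33, since `A' has code 65
```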
 653
 654    An "encoding" is a way of numerically representing characters from
 655 one or more character sets into a stream of like-sized numerical values
 656 called "words"; typically these are 8-bit, 16-bit, or 32-bit
 657 quantities.  If an encoding encompasses only one character set, then the
 658 position codes for the characters in that character set could be used
 659 directly.  (This is the case with the trivial cipher used by children,
 660 assigning 1 to `A', 2 to `B', and so on.)  However, even with ASCII,
 661 other considerations intrude.  For example, why are the upper- and
 662 lowercase alphabets separated by 6 characters?  Why do the digits start
 663 with `0' being assigned the code 48?  In both cases, it is because
 664 semantically interesting operations (case conversion and numerical
 665 value extraction) become convenient masking operations.  Other
 666 artificial aspects (the control characters being assigned to codes 0-31
 667 and 127) are historical accidents.  (The use of 127 for `DEL' is an
 668 artifact of the "punch once" nature of paper tape, for example.)
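
   The masking operations mentioned above can be tried out directly.
The following is a minimal sketch in Python, not part of XEmacs; the
function names are hypothetical and chosen only for illustration.  It
relies on the ASCII layout in which corresponding upper- and lowercase
letters differ by 32 and the digits start at code 48.

```python
def to_upper(code):
    """Clear the bit worth 32 in the code of a lowercase ASCII letter."""
    if ord("a") <= code <= ord("z"):
        return code & ~0x20
    return code


def to_lower(code):
    """Set the bit worth 32 in the code of an uppercase ASCII letter."""
    if ord("A") <= code <= ord("Z"):
        return code | 0x20
    return code


def digit_value(code):
    """Extract the numerical value of an ASCII digit by masking."""
    if not ord("0") <= code <= ord("9"):
        raise ValueError("not an ASCII digit: %r" % code)
    return code & 0x0F


print(chr(to_upper(ord("q"))), digit_value(ord("7")))  # Q 7
```

   Codes outside the letter ranges are returned unchanged, since the
mask is only meaningful for letters.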
 669
 670    Naive use of the position code is not possible, however, if more than
 671 one character set is to be used in the encoding.  For example, printed
 672 Japanese text typically requires characters from multiple character sets
 673 - ASCII, JIS X 0208, and JIS X 0212, to be specific.  Each of these is
 674 indexed using one or more position codes in the range 1 through 94, so
 675 the position codes could not be used directly or there would be no way
 676 to tell which character was meant.  Different Japanese encodings handle
 677 this differently - JIS uses special escape characters to denote
 678 different character sets; EUC sets the high bit of the position codes
 679 for JIS X 0208 and JIS X 0212, and puts a special extra byte before each
 680 JIS X 0212 character; etc.  (JIS, EUC, and most of the other encodings
 681 you will encounter in files are 7-bit or 8-bit encodings.  There is one
 682 common 16-bit encoding, which is Unicode; this strives to represent all
 683 the world's characters in a single large character set.  32-bit
 684 encodings are often used internally in programs, such as XEmacs with
 685 MULE support, to simplify the code that manipulates them; however, they
 686 are not used externally because they are not very space-efficient.)
 687
 688    A general method of handling text using multiple character sets
 689 (whether for multilingual text, or simply text in an extremely
 690 complicated single language like Japanese) is defined in the
 691 international standard ISO 2022.  ISO 2022 will be discussed in more
 692 detail later (*note ISO 2022::), but for now suffice it to say that text
 693 needs control functions (at least spacing), and if escape sequences are
 694 to be used, an escape sequence introducer.  It was decided to make all
 695 text streams compatible with ASCII in the sense that the codes 0-31
 696 (and 128-159) would always be control codes, never graphic characters,
 697 and where defined by the character set the `SPC' character would be
 698 assigned code 32, and `DEL' would be assigned 127.  Thus there are 94
 699 code points remaining if 7 bits are used.  This is the reason that most
 700 character sets are defined using position codes in the range 1 through
 701 94.  Then ISO 2022 compatible encodings are produced by shifting the
 702 position codes 1 to 94 into character codes 33 to 126, or (if 8-bit
 703 codes are available) into character codes 161 to 254.
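
   The shift from position codes to character codes described above
amounts to adding a constant.  The sketch below is in Python and is only
an illustration; the function names are hypothetical, and the `eight_bit'
flag is an assumption standing in for "8-bit codes are available".

```python
def to_character_code(position, eight_bit=False):
    """Shift a position code (1..94) into 33..126, or 161..254 if eight_bit."""
    if not 1 <= position <= 94:
        raise ValueError("position code out of range: %r" % position)
    return position + (160 if eight_bit else 32)


def to_position_code(code):
    """Recover the position code from a 7-bit or 8-bit character code."""
    low7 = code & 0x7F            # drop the high bit, if any
    if not 33 <= low7 <= 126:
        raise ValueError("not a graphic character code: %r" % code)
    return low7 - 32


print([to_character_code(p) for p in (1, 94)])                  # [33, 126]
print([to_character_code(p, eight_bit=True) for p in (1, 94)])  # [161, 254]
```

   Codes 0-32, 127, 128-160 and 255 are rejected by `to_position_code',
which matches the reservation of those values for control functions,
`SPC' and `DEL'.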
 704
 705    Encodings are classified as either "modal" or "non-modal".  In a
 706 "modal encoding", there are multiple states that the encoding can be
 707 in, and the interpretation of the values in the stream depends on the
 708 current global state of the encoding.  Special values in the encoding,
 709 called "escape sequences", are used to change the global state.  JIS,
 710 for example, is a modal encoding.  The bytes `ESC $ B' indicate that,
 711 from then on, bytes are to be interpreted as position codes for JIS X
 712 0208, rather than as ASCII.  This effect is cancelled using the bytes
 713 `ESC ( B', which mean "switch from whatever the current state is to
 714 ASCII".  To switch to JIS X 0212, use the escape sequence `ESC $ ( D'.
 715 (Note that here, as is common, the escape sequences do in fact begin
 716 with `ESC'.  This is not necessarily the case, however.  Some encodings
 717 use control characters called "locking shifts" (effect persists until
 718 cancelled) to switch character sets.)
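
   To make the idea of a global state concrete, here is a small Python
sketch of a reader for a JIS-style modal stream.  It is not how XEmacs
decodes text; the function name and the state labels (`"ascii"',
`"jisx0208"', `"jisx0212"') are made up for this example, and only the
three escape sequences quoted above are recognized.

```python
ESC = 0x1B

# The escape sequences named in the text, mapped to the state they select.
DESIGNATIONS = {
    b"\x1b(B": "ascii",
    b"\x1b$B": "jisx0208",
    b"\x1b$(D": "jisx0212",
}


def split_jis(data):
    """Return a list of (charset, codes) pairs for a modal JIS byte stream.

    For "ascii" the tuple holds the ASCII code; for the two-dimensional
    sets it holds the two position codes (each 1..94).
    """
    state = "ascii"               # the global state of the encoding
    out = []
    i = 0
    while i < len(data):
        if data[i] == ESC:
            for seq, charset in DESIGNATIONS.items():
                if data.startswith(seq, i):
                    state = charset
                    i += len(seq)
                    break
            else:
                raise ValueError("unsupported escape sequence at %d" % i)
        elif state == "ascii":
            out.append(("ascii", (data[i],)))
            i += 1
        else:
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte character at %d" % i)
            out.append((state, (data[i] - 32, data[i + 1] - 32)))
            i += 2
    return out


print(split_jis(b"AA\x1b$BAA\x1b(BAA"))
```

   In the sample stream the two bytes `AA' appear three times.  The
first and last pairs are read as two ASCII letters each, while the
middle pair is read as a single JIS X 0208 character with position
codes 33 and 33, purely because of the escape sequence that precedes it.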
 719
 720    A "non-modal encoding" has no global state that extends past the
 721 character currently being interpreted.  EUC, for example, is a
 722 non-modal encoding.  Characters in JIS X 0208 are encoded by setting
 723 the high bit of the position codes, and characters in JIS X 0212 are
 724 encoded by doing the same but also prefixing the character with the
 725 byte 0x8F.
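
   The EUC rules just described can be written down in a few lines.
This Python sketch is illustrative only; `euc_encode' and the charset
labels are hypothetical names, and only the three character sets
mentioned above are handled.

```python
def euc_encode(charset, codes):
    """Encode one character as EUC bytes.

    `codes` is the ASCII code for "ascii", or the two position codes
    (each 1..94) for "jisx0208" and "jisx0212".
    """
    if charset == "ascii":
        return bytes(codes)
    # Shift each position code into 33..126, then set the high bit.
    body = bytes((p + 32) | 0x80 for p in codes)
    if charset == "jisx0208":
        return body
    if charset == "jisx0212":
        return b"\x8f" + body     # extra prefix byte marks JIS X 0212
    raise ValueError("unsupported charset: %r" % charset)


print(euc_encode("ascii", (0x41,)).hex())      # 41
print(euc_encode("jisx0208", (38, 92)).hex())  # c6fc
print(euc_encode("jisx0212", (16, 1)).hex())   # 8fb0a1
```

   Because every byte of a non-ASCII character has its high bit set, a
byte below 0x80 can be recognized as ASCII without looking at anything
that came before it.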
 726
 727    The advantage of a modal encoding is that it is generally more
 728 space-efficient, and is easily extendible because there are essentially
 729 an arbitrary number of escape sequences that can be created.  The
 730 disadvantage, however, is that it is much more difficult to work with
 731 if it is not being processed in a sequential manner.  In the non-modal
 732 EUC encoding, for example, the byte 0x41 always refers to the letter
 733 `A'; whereas in JIS, it could either be the letter `A', or one of the
 734 two position codes in a JIS X 0208 character, or one of the two
 735 position codes in a JIS X 0212 character.  Determining exactly which
 736 one is meant could be difficult and time-consuming if the previous
 737 bytes in the string have not already been processed, or impossible if
 738 they are drawn from an external stream that cannot be rewound.
 739
 740    Non-modal encodings are further divided into "fixed-width" and
 741 "variable-width" formats.  A fixed-width encoding always uses the same
 742 number of words per character, whereas a variable-width encoding does
 743 not.  EUC is a good example of a variable-width encoding: one to three
 744 bytes are used per character, depending on the character set.  16-bit
 745 and 32-bit encodings are nearly always fixed-width, and this is in fact
 746 one of the main reasons for using an encoding with a larger word size.
 747 The advantages of fixed-width encodings should be obvious.  The
 748 advantages of variable-width encodings are that they are generally more
 749 space-efficient and allow for compatibility with existing 8-bit
 750 encodings such as ASCII.  (For example, in Unicode, ASCII characters are
 751 simply promoted to a 16-bit representation.  That means that every
 752 ASCII character contains a `NUL' byte; evidently all of the standard
 753 string manipulation functions will lose badly in a fixed-width Unicode
 754 environment.)
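
   The `NUL' byte problem and the variable width of EUC can both be
observed with Python's standard codecs.  This is a sketch for
illustration: `c_strlen' is a hypothetical stand-in for a C-style string
routine that stops at the first `NUL' byte, and the big-endian 16-bit
form (`utf-16-be') is assumed for the fixed-width case.

```python
def c_strlen(buf):
    """Length as seen by a routine that stops at the first NUL byte."""
    end = buf.find(b"\x00")
    return len(buf) if end < 0 else end


wide = "ABC".encode("utf-16-be")   # each ASCII character becomes 2 bytes
print(wide)                        # b'\x00A\x00B\x00C'
print(c_strlen(wide))              # 0
print(c_strlen(b"ABC"))            # 3

# EUC uses a different number of bytes depending on the character set.
print([len(ch.encode("euc_jp")) for ch in "A日"])  # [1, 2]
```

   The 16-bit form of `ABC' starts with a `NUL' byte, so a byte-oriented
routine sees an empty string, whereas the EUC form of ASCII text is
byte-for-byte identical to ASCII.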
 755
 756    The bytes in an 8-bit encoding are often referred to as "octets"
 757 rather than simply as bytes.  This terminology dates back to the days
 758 before 8-bit bytes were universal, when some computers had 9-bit bytes,
 759 others had 10-bit bytes, etc.
 760
 761 \1f
 762 File: lispref.info,  Node: Charsets,  Next: MULE Characters,  Prev: Internationalization Terminology,  Up: MULE
 763
 764 Charsets
 765 ========
 766
 767    A "charset" in MULE is an object that encapsulates a particular
 768 character set as well as an ordering of those characters.  Charsets are
 769 permanent objects and are named using symbols, like faces.
 770
 771  - Function: charsetp object
 772      This function returns non-`nil' if OBJECT is a charset.
 773
 774 * Menu:
 775
 776 * Charset Properties::          Properties of a charset.
 777 * Basic Charset Functions::     Functions for working with charsets.
 778 * Charset Property Functions::  Functions for accessing charset properties.
 779 * Predefined Charsets::         Predefined charset objects.
 780
 781 \1f
 782 File: lispref.info,  Node: Charset Properties,  Next: Basic Charset Functions,  Up: Charsets
 783
 784 Charset Properties
 785 ------------------
 786
 787    Charsets have the following properties:
 788
 789 `name'
 790      A symbol naming the charset.  Every charset must have a different
 791      name; this allows a charset to be referred to using its name
 792      rather than the actual charset object.
 793
 794 `doc-string'
 795      A documentation string describing the charset.
 796
 797 `registry'
 798      A regular expression matching the font registry field for this
 799      character set.  For example, both the `ascii' and `latin-iso8859-1'
 800      charsets use the registry `"ISO8859-1"'.  This field is used to
 801      choose an appropriate font when the user gives a general font
 802      specification such as `-*-courier-medium-r-*-140-*', i.e. a
 803      14-point upright medium-weight Courier font.
 804
 805 `dimension'
 806      Number of position codes used to index a character in the
 807      character set.  XEmacs/MULE can only handle character sets of
 808      dimension 1 or 2.  This property defaults to 1.
 809
 810 `chars'
 811      Number of characters in each dimension.  In XEmacs/MULE, the only
 812      allowed values are 94 or 96. (There are a couple of pre-defined
 813      character sets, such as ASCII, that do not follow this, but you
 814      cannot define new ones like this.) Defaults to 94.  Note that if
 815      the dimension is 2, the character set thus described is 94x94 or
 816      96x96.
 817
 818 `columns'
 819      Number of columns used to display a character in this charset.
 820      Only used in TTY mode. (Under X, the actual width of a character
 821      can be derived from the font used to display the characters.)  If
 822      unspecified, defaults to the dimension. (This is almost always the
 823      correct value, because character sets with dimension 2 are usually
 824      ideograph character sets, which need two columns to display the
 825      intricate ideographs.)
 826
 827 `direction'
 828      A symbol, either `l2r' (left-to-right) or `r2l' (right-to-left).
 829      Defaults to `l2r'.  This specifies the direction that the text
 830      should be displayed in, and will be left-to-right for most
 831      charsets but right-to-left for Hebrew and Arabic. (Right-to-left
 832      display is not currently implemented.)
 833
 834 `final'
 835      Final byte of the standard ISO 2022 escape sequence designating
 836      this charset.  Must be supplied.  Each combination of (DIMENSION,
 837      CHARS) defines a separate namespace for final bytes, and each
 838      charset within a particular namespace must have a different final
 839      byte.  Note that ISO 2022 restricts the final byte to the range
 840      0x30 - 0x7E if dimension == 1, and 0x30 - 0x5F if dimension == 2.
 841      Note also that final bytes in the range 0x30 - 0x3F are reserved
 842      for user-defined (not official) character sets.  For more
 843      information on ISO 2022, see *Note Coding Systems::.
 844
 845 `graphic'
 846      0 (use left half of font on output) or 1 (use right half of font on
 847      output).  Defaults to 0.  This specifies how to convert the
 848      position codes that index a character in a character set into an
 849      index into the font used to display the character set.  With
 850      `graphic' set to 0, position codes 33 through 126 map to font
 851      indices 33 through 126; with it set to 1, position codes 33
 852      through 126 map to font indices 161 through 254 (i.e. the same
 853      number but with the high bit set).  For example, for a font whose
 854      registry is ISO8859-1, the left half of the font (octets 0x20 -
 855      0x7F) is the `ascii' charset, while the right half (octets 0xA0 -
 856      0xFF) is the `latin-iso8859-1' charset.
 857
 858 `ccl-program'
 859      A compiled CCL program used to convert a character in this charset
 860      into an index into the font.  This is in addition to the `graphic'
 861      property.  If a CCL program is defined, the position codes of a
 862      character will first be processed according to `graphic' and then
 863      passed through the CCL program, with the resulting values used to
 864      index the font.
 865
 866      This is used, for example, in the Big5 character set (used in
 867      Taiwan).  This character set is not ISO-2022-compliant, and its
 868      size (94x157) does not fit within the maximum 96x96 size of
 869      ISO-2022-compliant character sets.  As a result, XEmacs/MULE
 870      splits it (in a rather complex fashion, so as to group the most
 871      commonly used characters together) into two charset objects
 872      (`chinese-big5-1' and `chinese-big5-2'), each of size 94x94, and
 873      each charset object uses a CCL program to convert the modified
 874      position codes back into standard Big5 indices to retrieve a
 875      character from a Big5 font.
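
   The numeric constraints stated for `dimension', `chars', `final' and
`graphic' can be summarized as a small checker.  The Python sketch below
is a model of the rules as written in this node, not the validation that
XEmacs itself performs; both function names and the returned dictionary
are hypothetical.

```python
def check_charset_props(dimension=1, chars=94, final=None, graphic=0):
    """Check a set of charset properties against the documented limits."""
    if dimension not in (1, 2):
        raise ValueError("dimension must be 1 or 2")
    if chars not in (94, 96):
        raise ValueError("chars must be 94 or 96")
    if final is None:
        raise ValueError("final must be supplied")
    limit = 0x7E if dimension == 1 else 0x5F
    if not 0x30 <= ord(final) <= limit:
        raise ValueError("final byte out of range for this dimension")
    if graphic not in (0, 1):
        raise ValueError("graphic must be 0 or 1")
    return {
        "size": chars ** dimension,             # e.g. 94x94 for dimension 2
        "user_defined": ord(final) <= 0x3F,     # 0x30 - 0x3F are reserved
    }


def font_index(code, graphic):
    """Map a position code in 33..126 to a font index per `graphic`."""
    return code | 0x80 if graphic else code


print(check_charset_props(dimension=2, chars=94, final="B"))
print(font_index(33, 0), font_index(33, 1))    # 33 161
```

   The defaults in the signature mirror the documented defaults
(dimension 1, 94 characters, graphic 0); `final' has no default because
it must always be supplied.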
 876
 877    Most of the above properties can only be set when the charset is
 878 initialized, and cannot be changed later.  *Note Charset Property
 879 Functions::.
 880
 881 \1f
 882 File: lispref.info,  Node: Basic Charset Functions,  Next: Charset Property Functions,  Prev: Charset Properties,  Up: Charsets
 883
 884 Basic Charset Functions
 885 -----------------------
 886
 887  - Function: find-charset charset-or-name
 888      This function retrieves the charset of the given name.  If
 889      CHARSET-OR-NAME is a charset object, it is simply returned.
 890      Otherwise, CHARSET-OR-NAME should be a symbol.  If there is no
 891      such charset, `nil' is returned.  Otherwise the associated charset
 892      object is returned.
 893
 894  - Function: get-charset name
 895      This function retrieves the charset of the given name.  Same as
 896      `find-charset' except an error is signalled if there is no such
 897      charset instead of returning `nil'.
 898
 899  - Function: charset-list
 900      This function returns a list of the names of all defined charsets.
 901
 902  - Function: make-charset name doc-string props
 903      This function defines a new character set.  This function is for
 904      use with MULE support.  NAME is a symbol, the name by which the
 905      character set is normally referred.  DOC-STRING is a string
 906      describing the character set.  PROPS is a property list,
 907      describing the specific nature of the character set.  The
 908      recognized properties are `registry', `dimension', `columns',
 909      `chars', `final', `graphic', `direction', and `ccl-program', as
 910      previously described.
 911
 912  - Function: make-reverse-direction-charset charset new-name
 913      This function makes a charset equivalent to CHARSET but which goes
 914      in the opposite direction.  NEW-NAME is the name of the new
 915      charset.  The new charset is returned.
 916
 917  - Function: charset-from-attributes dimension chars final &optional
 918           direction
 919      This function returns a charset with the given DIMENSION, CHARS,
 920      FINAL, and DIRECTION.  If DIRECTION is omitted, both directions
 921      will be checked (left-to-right will be returned if character sets
 922      exist for both directions).
 923
 924  - Function: charset-reverse-direction-charset charset
 925      This function returns the charset (if any) with the same dimension,
 926      number of characters, and final byte as CHARSET, but which is
 927      displayed in the opposite direction.
 928
 929 \1f
 930 File: lispref.info,  Node: Charset Property Functions,  Next: Predefined Charsets,  Prev: Basic Charset Functions,  Up: Charsets
 931
 932 Charset Property Functions
 933 --------------------------
 934
 935    All of these functions accept either a charset name or charset
 936 object.
 937
 938  - Function: charset-property charset prop
 939      This function returns property PROP of CHARSET.  *Note Charset
 940      Properties::.
 941
 942    Convenience functions are also provided for retrieving individual
 943 properties of a charset.
 944
 945  - Function: charset-name charset
 946      This function returns the name of CHARSET.  This will be a symbol.
 947
 948  - Function: charset-doc-string charset
 949      This function returns the doc string of CHARSET.
 950
 951  - Function: charset-registry charset
 952      This function returns the registry of CHARSET.
 953
 954  - Function: charset-dimension charset
 955      This function returns the dimension of CHARSET.
 956
 957  - Function: charset-chars charset
 958      This function returns the number of characters per dimension of
 959      CHARSET.
 960
 961  - Function: charset-columns charset
 962      This function returns the number of display columns per character
 963      (in TTY mode) of CHARSET.
 964
 965  - Function: charset-direction charset
 966      This function returns the display direction of CHARSET--either
 967      `l2r' or `r2l'.
 968
 969  - Function: charset-final charset
 970      This function returns the final byte of the ISO 2022 escape
 971      sequence designating CHARSET.
 972
 973  - Function: charset-graphic charset
 974      This function returns either 0 or 1, depending on whether the
 975      position codes of characters in CHARSET map to the left or right
 976      half of their font, respectively.
 977
 978  - Function: charset-ccl-program charset
 979      This function returns the CCL program, if any, for converting
 980      position codes of characters in CHARSET into font indices.
 981
 982    The only property of a charset that can currently be set after the
 983 charset has been created is the CCL program.
 984
 985  - Function: set-charset-ccl-program charset ccl-program
 986      This function sets the `ccl-program' property of CHARSET to
 987      CCL-PROGRAM.
 988
 989 \1f
 990 File: lispref.info,  Node: Predefined Charsets,  Prev: Charset Property Functions,  Up: Charsets
 991
 992 Predefined Charsets
 993 -------------------
 994
 995    The following charsets are predefined in the C code.
 996
 997      Name                     Type  Fi Gr Dir Registry
 998      --------------------------------------------------------------
 999      ascii                    94    B  0  l2r ISO8859-1
1000      control-1                94       0  l2r ---
1001      latin-iso8859-1          96    A  1  l2r ISO8859-1
1002      latin-iso8859-2          96    B  1  l2r ISO8859-2
1003      latin-iso8859-3          96    C  1  l2r ISO8859-3
1004      latin-iso8859-4          96    D  1  l2r ISO8859-4
1005      cyrillic-iso8859-5       96    L  1  l2r ISO8859-5
1006      arabic-iso8859-6         96    G  1  r2l ISO8859-6
1007      greek-iso8859-7          96    F  1  l2r ISO8859-7
1008      hebrew-iso8859-8         96    H  1  r2l ISO8859-8
1009      latin-iso8859-9          96    M  1  l2r ISO8859-9
1010      thai-tis620              96    T  1  l2r TIS620
1011      katakana-jisx0201        94    I  1  l2r JISX0201.1976
1012      latin-jisx0201           94    J  0  l2r JISX0201.1976
1013      japanese-jisx0208-1978   94x94 @  0  l2r JISX0208.1978
1014      japanese-jisx0208        94x94 B  0  l2r JISX0208.19(83|90)
1015      japanese-jisx0212        94x94 D  0  l2r JISX0212
1016      chinese-gb2312           94x94 A  0  l2r GB2312
1017      chinese-cns11643-1       94x94 G  0  l2r CNS11643.1
1018      chinese-cns11643-2       94x94 H  0  l2r CNS11643.2
1019      chinese-big5-1           94x94 0  0  l2r Big5
1020      chinese-big5-2           94x94 1  0  l2r Big5
1021      korean-ksc5601           94x94 C  0  l2r KSC5601
1022      composite                96x96    0  l2r ---
1023
1024    The following charsets are predefined in the Lisp code.
1025
1026      Name                     Type  Fi Gr Dir Registry
1027      --------------------------------------------------------------
1028      arabic-digit             94    2  0  l2r MuleArabic-0
1029      arabic-1-column          94    3  0  r2l MuleArabic-1
1030      arabic-2-column          94    4  0  r2l MuleArabic-2
1031      sisheng                  94    0  0  l2r sisheng_cwnn\|OMRON_UDC_ZH
1032      chinese-cns11643-3       94x94 I  0  l2r CNS11643.1
1033      chinese-cns11643-4       94x94 J  0  l2r CNS11643.1
1034      chinese-cns11643-5       94x94 K  0  l2r CNS11643.1
1035      chinese-cns11643-6       94x94 L  0  l2r CNS11643.1
1036      chinese-cns11643-7       94x94 M  0  l2r CNS11643.1
1037      ethiopic                 94x94 2  0  l2r Ethio
1038      ascii-r2l                94    B  0  r2l ISO8859-1
1039      ipa                      96    0  1  l2r MuleIPA
1040      vietnamese-lower         96    1  1  l2r VISCII1.1
1041      vietnamese-upper         96    2  1  l2r VISCII1.1
1042
1043    For all of the above charsets, the dimension and number of columns
1044 are the same.
1045
1046    Note that ASCII, Control-1, and Composite are handled specially.
1047 This is why some of the fields are blank; and some of the filled-in
1048 fields (e.g. the type) are not really accurate.
1049
1050 \1f
1051 File: lispref.info,  Node: MULE Characters,  Next: Composite Characters,  Prev: Charsets,  Up: MULE
1052
1053 MULE Characters
1054 ===============
1055
1056  - Function: make-char charset arg1 &optional arg2
1057      This function makes a multi-byte character from CHARSET and octets
1058      ARG1 and ARG2.
1059
1060  - Function: char-charset ch
1061      This function returns the character set of char CH.
1062
1063  - Function: char-octet ch &optional n
1064      This function returns the octet (i.e. position code) numbered N
1065      (should be 0 or 1) of char CH.  N defaults to 0 if omitted.
1066
1067  - Function: find-charset-region start end &optional buffer
1068      This function returns a list of the charsets in the region between
1069      START and END.  BUFFER defaults to the current buffer if omitted.
1070
1071  - Function: find-charset-string string
1072      This function returns a list of the charsets in STRING.
1073
1074 \1f
1075 File: lispref.info,  Node: Composite Characters,  Next: Coding Systems,  Prev: MULE Characters,  Up: MULE
1076
1077 Composite Characters
1078 ====================
1079
1080    Composite characters are not yet completely implemented.
1081
1082  - Function: make-composite-char string
1083      This function converts a string into a single composite character.
1084      The character is the result of overstriking all the characters in
1085      the string.
1086
1087  - Function: composite-char-string ch
1088      This function returns a string of the characters comprising a
1089      composite character.
1090
1091  - Function: compose-region start end &optional buffer
1092      This function composes the characters in the region from START to
1093      END in BUFFER into one composite character.  The composite
1094      character replaces the composed characters.  BUFFER defaults to
1095      the current buffer if omitted.
1096
1097  - Function: decompose-region start end &optional buffer
1098      This function decomposes any composite characters in the region
1099      from START to END in BUFFER.  This converts each composite
1100      character into one or more characters, the individual characters
1101      out of which the composite character was formed.  Non-composite
1102      characters are left as-is.  BUFFER defaults to the current buffer
1103      if omitted.
1104
1105 \1f
1106 File: lispref.info,  Node: Coding Systems,  Next: CCL,  Prev: Composite Characters,  Up: MULE
1107
1108 Coding Systems
1109 ==============
1110
1111    A coding system is an object that defines how text containing
1112 multiple character sets is encoded into a stream of (typically 8-bit)
1113 bytes.  The coding system is used to decode the stream into a series of
1114 characters (which may be from multiple charsets) when the text is read
1115 from a file or process, and is used to encode the text back into the
1116 same format when it is written out to a file or process.
1117
1118    For example, many ISO-2022-compliant coding systems (such as Compound
1119 Text, which is used for inter-client data under the X Window System) use
1120 escape sequences to switch between different charsets - Japanese Kanji,
1121 for example, is invoked with `ESC $ ( B'; ASCII is invoked with `ESC (
1122 B'; and Cyrillic is invoked with `ESC - L'.  See `make-coding-system'
1123 for more information.
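
   The three escape sequences quoted above follow a regular pattern
built from a charset's dimension, size and final byte.  The Python
sketch below reproduces only those three forms and is an assumption-laden
illustration: `designation' is a hypothetical name, 94-character sets
are assumed to be designated with `(' and 96-character sets with `-', and
other designations permitted by ISO 2022 are not covered.

```python
ESC = b"\x1b"


def designation(dimension, chars, final):
    """Build the escape sequence that invokes a charset.

    Covers only the forms quoted in the text: `(' for 94-character
    sets, `-' for 96-character sets, and a leading `$' when the
    charset has dimension 2.
    """
    if chars == 94:
        intermediate = b"("
    elif chars == 96:
        intermediate = b"-"
    else:
        raise ValueError("chars must be 94 or 96")
    if dimension not in (1, 2):
        raise ValueError("dimension must be 1 or 2")
    multibyte = b"$" if dimension == 2 else b""
    return ESC + multibyte + intermediate + final.encode("ascii")


print(designation(2, 94, "B"))   # b'\x1b$(B'   Japanese Kanji
print(designation(1, 94, "B"))   # b'\x1b(B'    ASCII
print(designation(1, 96, "L"))   # b'\x1b-L'    Cyrillic
```

   The arguments used here are the type and final byte listed for
`japanese-jisx0208', `ascii' and `cyrillic-iso8859-5' among the
predefined charsets.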
1124
1125    Coding systems are normally identified using a symbol, and the
1126 symbol is accepted in place of the actual coding system object whenever
1127 a coding system is called for. (This is similar to how faces and
1128 charsets work.)
1129
1130  - Function: coding-system-p object
1131      This function returns non-`nil' if OBJECT is a coding system.
1132
1133 * Menu:
1134
1135 * Coding System Types::               Classifying coding systems.
1136 * ISO 2022::                          An international standard for
1137                                         charsets and encodings.
1138 * EOL Conversion::                    Dealing with different ways of denoting
1139                                         the end of a line.
1140 * Coding System Properties::          Properties of a coding system.
1141 * Basic Coding System Functions::     Working with coding systems.
1142 * Coding System Property Functions::  Retrieving a coding system's properties.
1143 * Encoding and Decoding Text::        Encoding and decoding text.
1144 * Detection of Textual Encoding::     Determining how text is encoded.
1145 * Big5 and Shift-JIS Functions::      Special functions for these non-standard
1146                                         encodings.
1147 * Predefined Coding Systems::         Coding systems implemented by MULE.
1148