X-Git-Url: http://git.chise.org/gitweb/?a=blobdiff_plain;f=man%2Fxemacs%2Fmule.texi;h=b71786b0bcfbbe65c3d7081f0d90a3cdd3e047fe;hb=4f5f06637b2e5634e93f267666c2a5b4157e61e9;hp=66cb4531ed032952c40db126160e1cd0ab4f9504;hpb=6883ee56ec887c2c48abe5b06b5e66aa74031910;p=chise%2Fxemacs-chise.git- diff --git a/man/xemacs/mule.texi b/man/xemacs/mule.texi index 66cb453..b71786b 100644 --- a/man/xemacs/mule.texi +++ b/man/xemacs/mule.texi @@ -13,13 +13,14 @@ @cindex IPA @cindex Japanese @cindex Korean +@cindex Cyrillic @cindex Russian - If you compile XEmacs with mule option, it supports a wide variety of + If you compile XEmacs with Mule option, it supports a wide variety of world scripts, including Latin script, as well as Arabic script, Simplified Chinese script (for mainland of China), Traditional Chinese script (for Taiwan and Hong-Kong), Greek script, Hebrew script, IPA symbols, Japanese scripts (Hiragana, Katakana and Kanji), Korean scripts -(Hangul and Hanja) and Cyrillic script (for Beylorussian, Bulgarian, +(Hangul and Hanja) and Cyrillic script (for Byelorussian, Bulgarian, Russian, Serbian and Ukrainian). These features have been merged from the modified version of Emacs known as MULE (for ``MULti-lingual Enhancement to GNU Emacs''). @@ -29,6 +30,7 @@ Enhancement to GNU Emacs''). * Language Environments:: Setting things up for the language you use. * Input Methods:: Entering text characters not on your keyboard. * Select Input Method:: Specifying your choice of input methods. +* Mule and Fonts:: Additional font-related issues * Coding Systems:: Character set conversion when you read and write files, and so on. * Recognize Coding:: How XEmacs figures out which conversion to use. @@ -36,18 +38,60 @@ Enhancement to GNU Emacs''). @end menu @node Mule Intro, Language Environments, Mule, Mule -@section Introduction to world scripts - - The users of these scripts have established many more-or-less standard -coding systems for storing files. -@c XEmacs internally uses a single multibyte character encoding, so that it -@c can intermix characters from all these scripts in a single buffer or -@c string. This encoding represents each non-ASCII character as a sequence -@c of bytes in the range 0200 through 0377. +@section What is Mule? + +Mule is the MUltiLingual Extension to XEmacs. It provides facilities +not only for handling text written in many different languages, but in +fact multilingual texts containing several languages in the same buffer. +This goes beyond the simple facilities offered by Unicode for +representation of multilingual text. Mule also supports input methods, +composing display using fonts in various different encodings, changing +character syntax and other editing facilities to correspond to local +language usage, and more. + +The most obvious problem is that of the different character coding +systems used by different languages. ASCII supplies all the characters +needed for most computer programming languages and US English (it lacks +the currency symbol for British English), but other Western European +languages (French, Spanish, German) require more than 96 code positions +for accented characters. In fact, even with 8 bits to represent 96 more +character (including accented characters and symbols such as currency +symbols), some languages' alphabets remain incomplete (Croatian, +Polish). (The 64 "missing characters" are reserved for control +characters.) Furthermore, many European languages have their own +alphabets, which must conflict with the accented characters since the +ASCII characters are needed for computer interaction (error and log +messages are typically in ASCII). + +For economy of space, historical practice has been for each language to +establish its own encoding for the characters it needs. This allows +most European languages to represented with one octet (byte) per +character. However, many Asian languages have thousands of characters +and require two or more octets per character. For multilingual +purposes, the ISO 2022 standard establishes escape codes that allow +switching encodings in midstream. (It's also ISO 2022 that establishes +the standard that code points 0-31 and 128-159 are control codes.) + +However, this is error-prone and complex for internal processing. For +this reason XEmacs uses an internal coding system which can encode all +of the world's scripts. Unfortunately, for historical reasons, this +code is not Unicode, although we are moving in that direction. + XEmacs translates between the internal character encoding and various other coding systems when reading and writing files, when exchanging data with subprocesses, and (in some cases) in the @kbd{C-q} command -(see below). +(see below). The internal encoding is never visible to the user in a +production XEmacs, but unfortunately the process cannot be completely +transparent to the user. This is because the same ranges of octets may +represent 1-octet ISO-8859-1 (which is satisfactory for most Western +European use prior to the introduction of the Euro currency), 1-octet +ISO-8859-15 (which substitutes the Euro for the rarely used "generic +currency" symbol), 1-octet ISO-8859-5 (Cyrillic), or multioctet EUC-JP +(Japanese). There's no way to tell without being able to read! + +A number of heuristics are incorporated in Mule for automatic +recognition, there are facilities for the user to set defaults, and +where necessary (rarely, we hope) to set coding systems directly. @kindex C-h h @findex view-hello-file @@ -70,7 +114,7 @@ to world scripts, coding systems, and input methods. @cindex language environments All supported character sets are supported in XEmacs buffers if it is -compile with mule; there is no need to select a particular language in +compiled with Mule; there is no need to select a particular language in order to display its characters in an XEmacs buffer. However, it is important to select a @dfn{language environment} in order to set various defaults. The language environment really represents a choice of @@ -89,8 +133,10 @@ current when you use this command, because the effects apply globally to the XEmacs session. The supported language environments include: @quotation -Chinese-BIG5, Chinese-CNS, Chinese-GB, Cyrillic-ISO, English, Ethiopic, -Greek, Japanese, Korean, Latin-1, Latin-2, Latin-3, Latin-4, Latin-5. +ASCII, Chinese-BIG5, Chinese-GB, Croatian, Cyrillic-ALT, Cyrillic-ISO, +Cyrillic-KOI8, Cyrillic-Win, Czech, English, Ethiopic, French, German, +Greek, Hebrew, IPA, Japanese, Korean, Latin-1, Latin-2, Latin-3, Latin-4, +Latin-5, Norwegian, Polish, Romanian, Slovenian, Thai-XTIS, Vietnamese. @end quotation Some operating systems let you specify the language you are using by @@ -187,7 +233,7 @@ the partial sequence is highlighted in the buffer. If characters to type next is displayed in the echo area (but not when you are in the minibuffer). -@node Select Input Method, Coding Systems, Input Methods, Mule +@node Select Input Method, Mule and Fonts, Input Methods, Mule @section Selecting an Input Method @table @kbd @@ -249,7 +295,71 @@ the command @kbd{M-x quail-set-keyboard-layout}. list-input-methods}. The list gives information about each input method, including the string that stands for it in the mode line. -@node Coding Systems, Recognize Coding, Select Input Method, Mule +@node Mule and Fonts, Coding Systems, Select Input Method, Mule +@section Mule and Fonts +@cindex fonts +@cindex font registry +@cindex font encoding +@cindex CCL programs + +(This section is X11-specific.) + +Text in XEmacs buffers is displayed using various faces. In addition to +specifying properties of a face, such as font and color, there are some +additional properties of Mule charsets that are used in text. + +There is currently two properties of a charset that could be adjusted by +user: font registry and so called @dfn{ccl-program}. + +Font registry is a regular expression matching the font registry field +for this character set. For example, both the @code{ascii} and +@w{@code{latin-iso8859-1}} charsets use the registry @code{"ISO8859-1"}. +This field is used to choose an appropriate font when the user gives a +general font specification such as @w{@samp{-*-courier-medium-r-*-140-*}}, +i.e. a 14-point upright medium-weight Courier font. + +You can set font registry for a charset using +@samp{set-charset-registry} function in one of your startup files. This +function takes two arguments: character set (as a symbol) and font +registry (as a string). + +E.@w{ }g., for Cyrillic texts Mule uses @w{@code{cyrillic-iso8859-5}} +charset with @samp{"ISO8859-5"} as a default registry, and we want to +use @samp{"koi8-r"} instead, because fonts in that encoding are +installed on our system. Use: + +@example +(set-charset-registry 'cyrillic-iso8859-5 "koi8-r") +@end example + +(Please note that you probably also want to set font registry for +@samp{ascii} charset so that mixed English/Cyrillic texts be displayed +using the same font.) + +"CCL-programs" are a little special-purpose scripts defined within +XEmacs or in some package. Those scripts allow XEmacs to use fonts that +are in different encoding from the encoding that is used by Mule for +text in buffer. Returning to the above example, we need to somehow tell +XEmacs that we have different encodings of fonts and text and so it +needs to convert characters between those encodings when displaying. +That's what @samp{set-charset-ccl-program} function is used for. There +are quite a few various CCL programs defined within XEmacs, and there is +no comprehensive list of them, so you currently have to consult sources. +@c FIXME: there must be a list of CCL programs + +We know that there is a CCL program called @samp{ccl-encode-koi8-r-font} +that is used exactly for needed purpose: to convert characters between +@samp{ISO8859-5} encoding and @samp{koi8-r}. Use: + +@example +(set-charset-ccl-program 'cyrillic-iso8859-5 'ccl-encode-koi8-r-font) +@end example + +There are several more uses for CCL programs, not related to fonts, but +those uses are not described here. + + +@node Coding Systems, Recognize Coding, Mule and Fonts, Mule @section Coding Systems @cindex coding systems @@ -282,11 +392,15 @@ Describe the coding systems currently in use. @item M-x list-coding-systems Display a list of all the supported coding systems. + +@item C-u M-x list-coding-systems +Display comprehensive list of specific details of all supported coding +systems. @end table -@kindex C-h C +@kindex C-x @key{RET} C @findex describe-coding-system - The command @kbd{C-h C} (@code{describe-coding-system}) displays + The command @kbd{C-x RET C} (@code{describe-coding-system}) displays information about particular coding systems. You can specify a coding system name as argument; alternatively, with an empty argument, it describes the coding systems currently selected for various purposes, @@ -351,6 +465,41 @@ the usual three variants to specify the kind of end-of-line conversion. @node Recognize Coding, Specify Coding, Coding Systems, Mule @section Recognizing Coding Systems +@c #### This section is out of date. The following set-*-coding-system +@c functions are known: + +@c set-buffer-file-coding-system +@c set-buffer-file-coding-system-for-read +@c set-buffer-process-coding-system +@c set-console-tty-coding-system +@c set-console-tty-input-coding-system +@c set-console-tty-output-coding-system +@c set-default-buffer-file-coding-system +@c set-default-coding-systems +@c set-default-file-coding-system +@c set-file-coding-system +@c set-file-coding-system-for-read +@c set-keyboard-coding-system +@c set-pathname-coding-system +@c set-process-coding-system +@c set-process-input-coding-system +@c set-process-output-coding-system +@c set-terminal-coding-system + +@c Some are marked as broken. Agenda: (1) Update this section using +@c docstrings. Note that they may be inaccurate. (2) Correct the +@c documentation here, updating docstrings at the same time. + +@c Document this. + +@c set-language-environment-coding-systems + +@c What are these? + +@c dontusethis-set-value-file-name-coding-system-handler +@c dontusethis-set-value-keyboard-coding-system-handler +@c dontusethis-set-value-terminal-coding-system-handler + Most of the time, XEmacs can recognize which coding system to use for any given file--once you have specified your preferences. @@ -435,7 +584,8 @@ Specify coding system @var{coding} for the immediately following command. @item C-x @key{RET} k @var{coding} @key{RET} -Use coding system @var{coding} for keyboard input. +Use coding system @var{coding} for keyboard input. (This feature is +non-functional and is temporarily disabled.) @item C-x @key{RET} t @var{coding} @key{RET} Use coding system @var{coding} for terminal output. @@ -518,6 +668,8 @@ the sequences that are translated are typically sequences of ASCII printing characters. Coding systems typically translate sequences of non-graphic characters. +(This feature is non-functional and is temporarily disabled.) + @kindex C-x RET p @findex set-buffer-process-coding-system The command @kbd{C-x @key{RET} p} (@code{set-buffer-process-coding-system})