1 /* Copyright (C) 2003, 2004, 2005
2 National Institute of Advanced Industrial Science and Technology (AIST)
3 Registration Number H15PRO112
4 See the end for copying conditions. */
8 @page mdbIM Input Method
10 @section im-description DESCRIPTION
12 The m17n library provides a driver for input methods that are
13 dynamically loadable from the m17n database (see @ref m17nInputMethod
14 @latexonly (P.\pageref{group__m17nInputMethod}) @endlatexonly).
16 This section describes the data format that defines those input
19 @section im-format SYNTAX and SEMANTICS
21 The following data format defines an input method. The driver loads a
22 definition from a file, a stream, etc. The driver then converts the
23 definition into a plist.
27 IM-DECLARATION ? DESCRIPTION ? TITLE ?
28 VARIABLE-LIST ? COMMAND-LIST ? MODULE-LIST ?
29 MACRO-LIST ? MAP-LIST ? STATE-LIST ?
31 IM-DECLARATION ::= '(' 'input-method' LANGUAGE NAME EXTRA-ID ? VERSION ? ')'
32 VERSION ::= '(' 'version' VERSION-NUMBER ')'
33 DESCRIPTION ::= '(' 'description' [ MTEXT-OR-GETTEXT | nil] ')'
34 VARIABLE-LIST ::= '(' 'variable' VARIABLE-DECLARATION * ')'
35 COMMAND-LIST ::= '(' 'command' COMMAND-DECLARATION * ')'
36 TITLE ::= '(' 'title' TITLE-TEXT ')'
38 VARIABLE-DECLARATION ::=
39 '(' VAR-NAME [ [ MTEXT-OR-GETTEXT | nil ] VALUE VALUE-CANDIDATE * ]')'
41 COMMAND-DECLARATION ::=
42 '(' CMD-NAME [ [ MTEXT-OR-GETTEXT | nil ] KEYSEQ * ] ')'
45 [ MTEXT | '(' '_' MTEXT ')']
51 IM-DESCRIPTION ::= MTEXT
53 VAR-DESCRIPTION ::= MTEXT
54 VALUE ::= MTEXT | SYMBOL | INTEGER
55 VALUE-CANDIDATE ::= VALUE | '(' RANGE-FROM RANGE-TO ')'
56 RANGE-FROM ::= INTEGER
59 CMD-DESCRIPTION ::= MTEXT
63 @c IM-DECLARATION specifies the language and name of this input
66 When @c LANGUAGE is @c t, the use of the input method is not limited
69 When @c NAME is @c nil, the input method is not standalone, but
70 is expected to be used in other input methods. In such cases,
71 @c EXTRA-ID is required to identify the input method.
73 @c VERSION specifies the required minimum version number of the m17n
74 library. The format is "XX.YY.ZZ" where XX is a major version
75 number, YY is a minor version number, and ZZ is a patch level.
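As an illustration of the "XX.YY.ZZ" format, the following Python sketch
splits a version string into its three numeric fields and checks whether
a library version satisfies a required minimum. It is not part of the
m17n library; the function names @c parse_version and @c satisfies are
hypothetical, and the comparison rule (field by field, major first) is an
assumption.

```python
def parse_version(text):
    """Split "XX.YY.ZZ" into (major, minor, patch) integers."""
    parts = text.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError("expected XX.YY.ZZ, got %r" % text)
    return tuple(int(p) for p in parts)


def satisfies(required, library):
    """True if the library version is at least the required minimum."""
    return parse_version(library) >= parse_version(required)
```

Comparing the fields as integers rather than as strings keeps, for
instance, minor version 10 above minor version 9.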
77 @c DESCRIPTION specifies the description text of this input method by
78 @c MTEXT-OR-GETTEXT. If it takes the second form, the text is translated
79 according to the current locale by "gettext" (if the translation is
82 @c TITLE-TEXT is a text displayed on the screen when this input method
85 There is one special input method file "global.mim" that declares
86 common variables and commands. The input method driver always loads
87 this file and other input methods can inherit the variables and the
90 @c VARIABLE-DECLARATION declares a variable used in this input method.
91 If a variable must be initialized to the default value, or is to be
92 customized by a user, it must be declared here. The declaration can
93 be used in two ways. One is to introduce a new variable. In that
94 case, @c VALUE must not be omitted. The other is to inherit the variable
95 from its declaration in "global.mim", and to give it a different default
96 value and/or to make it customizable specifically for the
97 current input method. In the latter case, @c VALUE can be omitted.
99 @c COMMAND-DECLARATION declares a command used in this input method.
100 If a command must be bound to the default key sequence, or is to be
101 customized by a user, it must be declared here. Like @c
102 VARIABLE-DECLARATION, the declaration can be used in two ways. One is
103 to introduce a new command. In that case, @c KEYSEQ must not be omitted.
104 The other is to inherit the command from its declaration in "global.mim",
105 and to give it a different key binding and/or to make the command
106 customizable specifically for the current input method. In the latter
107 case, @c KEYSEQ can be omitted.
111 MODULE-LIST ::= '(' 'module' MODULE * ')'
113 MODULE ::= '(' MODULE-NAME FUNCTION * ')'
115 MODULE-NAME ::= SYMBOL
120 Each @c MODULE declares the name of an external module (i.e. dynamic
121 library) and function names exported by the module. If a @c FUNCTION has
122 name "init", it is called with only the default arguments (see the
123 section about @c CALL) when an input context is created for the input
124 method. If a @c FUNCTION has name "fini", it is called with only the
125 default arguments when an input context is destroyed.
128 MACRO-LIST ::= MACRO-INCLUSION ? '(' 'macro' MACRO * ')' MACRO-INCLUSION ?
130 MACRO ::= '(' MACRO-NAME MACRO-ACTION * ')'
132 MACRO-NAME ::= SYMBOL
134 MACRO-ACTION ::= ACTION
136 TAGS ::= '(' LANGUAGE NAME EXTRA-ID ? ')'
138 MACRO-INCLUSION ::= '(' 'include' TAGS 'macro' MACRO-NAME ? ')'
142 @c MACRO-INCLUSION includes macros from another input method specified
143 by @c TAGS. When @c MACRO-NAME is not given, all macros from the
144 input method are included.
146 @verbatim MAP-LIST ::= MAP-INCLUSION ? '(' 'map' MAP * ')'
149 MAP ::= '(' MAP-NAME RULE * ')'
153 RULE ::= '(' KEYSEQ MAP-ACTION * ')'
155 KEYSEQ ::= MTEXT | '(' [ SYMBOL | INTEGER ] * ')'
157 MAP-INCLUSION ::= '(' 'include' TAGS 'map' MAP-NAME ? ')'
161 When an input method is never standalone and always included in
162 another method, @c MAP-LIST can be omitted.
164 @c SYMBOL in the definitions of @c MAP-NAME must not be @c t nor @c
167 @c MTEXT in the definition of @c KEYSEQ consists of characters that
168 can be generated by a keyboard. Therefore @c MTEXT usually contains
169 only ASCII characters. However, if the input method is intended to be
170 used, for instance, with a West European keyboard, @c MTEXT may
171 contain Latin-1 characters.
173 @c SYMBOL in the definition of @c KEYSEQ must be the return value of
174 the minput_event_to_key () function. Under the X window system, you
175 can quickly check the value using the @c xev command. For example,
176 the return key, the backspace key, and the 0 key on the keypad are
177 represented as @c (Return) , @c (BackSpace) , and @c (KP_0)
178 respectively. If the shift, control, meta, alt, super, and hyper
179 modifiers are used, they are represented by the S- , C- , M- , A- , s-
180 , and H- prefixes respectively in this order. Thus, "return with
181 shift with meta with hyper" is @c (S-M-H-Return) . Note that "a with
182 shift" .. "z with shift" are represented simply as A .. Z . Thus "a
183 with shift with meta with hyper" is @c (M-H-A) .
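The naming rules above can be sketched in Python as follows. This is only
a model of the rules stated in this section, not the implementation of
minput_event_to_key (); the function name @c key_symbol and the modifier
names passed to it are hypothetical.

```python
# Prefixes in the order given above: shift, control, meta, alt, super, hyper.
MODIFIER_PREFIXES = [
    ("shift", "S-"), ("control", "C-"), ("meta", "M-"),
    ("alt", "A-"), ("super", "s-"), ("hyper", "H-"),
]


def key_symbol(key, modifiers=()):
    """Build the symbol name for KEY pressed with MODIFIERS (a set of names)."""
    mods = set(modifiers)
    unknown = mods - {name for name, _ in MODIFIER_PREFIXES}
    if unknown:
        raise ValueError("unknown modifiers: %s" % sorted(unknown))
    # "a with shift" .. "z with shift" are written as A .. Z without S-.
    if "shift" in mods and len(key) == 1 and "a" <= key <= "z":
        key = key.upper()
        mods.discard("shift")
    prefix = "".join(p for name, p in MODIFIER_PREFIXES if name in mods)
    return prefix + key
```

For example, calling @c key_symbol with "Return" and the shift, meta, and
hyper modifiers yields "S-M-H-Return", while "a" with the same modifiers
yields "M-H-A".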
185 @c INTEGER in the definition of @c KEYSEQ must be a valid character
188 @c MAP-INCLUSION includes maps from another input method specified by
189 @c TAGS. When @c MAP-NAME is not given, all maps from the input method
194 MAP-ACTION ::= ACTION
196 ACTION ::= INSERT | DELETE | SELECT | MOVE | MARK
197 | SHOW | HIDE | PUSHBACK | POP | UNDO
198 | COMMIT | UNHANDLE | SHIFT | CALL
199 | SET | IF | COND | '(' MACRO-NAME ')'
201 PREDEFINED-SYMBOL ::=
202 '@0' | '@1' | '@2' | '@3' | '@4'
203 | '@5' | '@6' | '@7' | '@8' | '@9'
204 | '@<' | '@=' | '@>' | '@-' | '@+' | '@[' | '@]'
206 | '@-0' | '@-N' | '@+N'
210 STATE-LIST ::= STATE-INCLUSION ? '(' 'state' STATE * ')' STATE-INCLUSION ?
212 STATE ::= '(' STATE-NAME [ STATE-TITLE-TEXT ] BRANCH * ')'
214 STATE-NAME ::= SYMBOL
216 STATE-TITLE-TEXT ::= MTEXT
218 BRANCH ::= '(' MAP-NAME BRANCH-ACTION * ')'
219 | '(' nil BRANCH-ACTION * ')'
220 | '(' t BRANCH-ACTION * ')'
222 STATE-INCLUSION ::= '(' 'include' TAGS 'state' STATE-NAME ? ')'
226 When an input method is never standalone and always included in
227 another method, @c STATE-LIST can be omitted.
229 @c STATE-INCLUSION includes states from another input method specified
230 by @c TAGS. When @c STATE-NAME is not given, all states from the input
233 The optional @c STATE-TITLE-TEXT specifies a title text displayed on
234 the screen when the input method is in this state. If @c
235 STATE-TITLE-TEXT is omitted, @c TITLE-TEXT is used.
237 In the first form of @c BRANCH, @c MAP-NAME must be an item that
238 appears in @c MAP. In this case, if a key sequence matching one of @c
239 KEYSEQs of @c MAP-NAME is typed, @c BRANCH-ACTIONs are executed.
241 In the second form of @c BRANCH, @c BRANCH-ACTIONs are executed if a
242 key sequence that doesn't match any of the @c BRANCHs of the current
245 If there is no @c BRANCH beginning with @c nil and the typed key
246 sequence does not match any of the current @c BRANCHs, the input
247 method transits to the initial state.
249 In the third form of @c BRANCH, @c BRANCH-ACTIONs are executed when
250 shifted to the current state. If the current state is the initial
251 state, @c BRANCH-ACTIONs are executed also when an input context of
252 the input method is created.
255 BRANCH-ACTION ::= ACTION
258 An input method has the following two lists of symbols.
263 A marker is a symbol indicating a character position in the preediting
264 text. The @c MARK action assigns a position to a marker. The
265 position of a marker is referred to by the @c MOVE and the @c DELETE actions.
269 A variable is a symbol associated with an integer, a symbol, or an
270 M-text value. The integer value of a variable can be set
271 by the @c SET action, and can be referred to by the @c SET, the @c
272 INSERT, the @c SELECT, the @c UNDO, the @c IF, and the @c COND actions.
273 The M-text value of a variable can be referred to by the @c INSERT
274 action. The symbol value of a variable cannot be referred to directly;
275 it is used by the library implicitly (e.g. candidates-charset). All
276 variables are implicitly initialized to the integer value zero.
280 Each @c PREDEFINED-SYMBOL has a special meaning when used as a marker.
283 <li> @c @@0, @c @@1, @c @@2, @c @@3, @c @@4, @c @@5, @c @@6, @c @@7, @c @@8, @c @@9
285 The 0th, 1st, 2nd, ... 9th position respectively.
287 <li> @c @@<, @c @@=, @c @@>
289 The first, the current, and the last position.
293 The previous and the next position.
297 The previous and the next position where a candidate list changes.
300 Some of the @c PREDEFINED-SYMBOLs have a special meaning when used as a candidate
301 index in the @c SELECT action.
305 <li> @c @@<, @c @@=, @c @@>
307 The first, the current, and the last candidate of the current candidate group.
311 The previous candidate. If the current candidate is the first one in
312 the current candidate group, then it means the last candidate in the
313 previous candidate group.
317 The next candidate. If the current candidate is the last one in the
318 current candidate group, then it means the first candidate in the next
323 The candidate in the previous and the next candidate group having the same
324 candidate index as the current one.
327 The following symbol also has a special meaning.
332 Number of handled keys at that moment.
336 These are for supporting surrounding text handling.
341 -1 if surrounding text is supported, -2 if not.
345 Here, @c N is a positive integer. The value is the Nth previous
346 character in the preedit buffer. If there are only M (M<N) previous
347 characters in it, the value is the (N-M)th previous character from the
348 inputting spot. When this is used as the argument of @c delete
349 action, it specifies the number of characters to be deleted.
353 Here, @c N is a positive integer. The value is the Nth following
354 character in the preedit buffer. If there are only M (M<N) following
355 characters in it, the value is the (N-M)th following character from
356 the inputting spot. When this is used as the argument of @c delete
357 action, it specifies the number of characters to be deleted.
360 The arguments and the behavior of each action are listed below.
363 INSERT ::= '(' 'insert' MTEXT ')'
367 | '(' 'insert' SYMBOL ')'
368 | '(' 'insert' '(' CANDIDATES * ')' ')'
369 | '(' CANDIDATES * ')'
371 CANDIDATES ::= MTEXT | '(' MTEXT * ')'
374 The first and second forms insert @c MTEXT before the current position.
376 The third form inserts the character @c INTEGER before the current
379 The fourth and fifth forms treat @c SYMBOL as a variable, and insert
380 its value (if it is a valid character code) before the current
383 In the sixth and seventh forms, each @c CANDIDATES represents a
384 candidate group, and each element of @c CANDIDATES represents a
385 candidate, i.e. if @c CANDIDATES is an M-text, the candidates are the
386 characters in the M-text; if @c CANDIDATES is a list of M-texts, the
387 candidates are the M-texts in the list.
389 These forms insert the first candidate before the current position.
390 The inserted string is associated with the list of candidates and
391 the information indicating the currently selected candidate.
393 The marker positions affected by the insertion are automatically relocated.
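How a @c CANDIDATES element expands into individual candidates can be
sketched in Python as below. The sketch uses a Python string to stand in
for an M-text and a Python list for a list of M-texts; both choices, and
the function names, are assumptions made for illustration only.

```python
def expand_group(candidates):
    """Return the list of candidates in one CANDIDATES element."""
    if isinstance(candidates, str):
        # An M-text: each character is one candidate.
        return list(candidates)
    # A list of M-texts: each M-text is one candidate.
    return list(candidates)


def first_candidate(groups):
    """Return the candidate that the insert action would place in the text."""
    return expand_group(groups[0])[0]
```

With two groups, the first given as a string and the second as a list,
the inserted text is the first character of the first group.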
396 DELETE ::= '(' 'delete' SYMBOL ')'
397 | '(' 'delete' INTEGER ')'
400 The first form treats @c SYMBOL as a marker, and deletes characters
401 between the current position and the marker position.
403 The second form treats @c INTEGER as a character position, and deletes
404 characters between the current position and the character position.
406 The marker positions affected by the deletion are automatically relocated.
409 SELECT ::= '(' 'select' PREDEFINED-SYMBOL ')'
410 | '(' 'select' INTEGER ')'
411 | '(' 'select' SYMBOL ')'
414 This action first checks if the character just before the current position
415 belongs to a string that is associated with a candidate list. If it does,
416 the action replaces that string with a candidate specified by the
419 The first form treats @c PREDEFINED-SYMBOL as a candidate index (as
420 described above) that specifies a new candidate in the candidate list.
422 The second form treats @c INTEGER as a candidate index that specifies a
423 new candidate in the candidate list.
425 In the third form, @c SYMBOL must have an integer value, and it is treated
426 as a candidate index.
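The movement between candidate groups described for the predefined
candidate indices can be modelled with the Python sketch below. It
assumes that @c \@- and @c \@+ are the symbols for the previous and the
next candidate, that candidate groups are plain lists, and that the
selection stays where it is at the very first or very last candidate; the
text above does not specify the last point, so treat it as an assumption.
The function name @c select is hypothetical.

```python
def select(groups, group, index, symbol):
    """Return the new (group, index) for a predefined candidate index."""
    last = len(groups[group]) - 1
    if symbol == "@<":
        return group, 0
    if symbol == "@=":
        return group, index
    if symbol == "@>":
        return group, last
    if symbol == "@-":
        if index > 0:
            return group, index - 1
        if group > 0:
            # First in its group: last candidate of the previous group.
            return group - 1, len(groups[group - 1]) - 1
        return group, index  # assumption: stay put at the very first candidate
    if symbol == "@+":
        if index < last:
            return group, index + 1
        if group < len(groups) - 1:
            # Last in its group: first candidate of the next group.
            return group + 1, 0
        return group, index  # assumption: stay put at the very last candidate
    raise ValueError("unsupported symbol: %r" % symbol)
```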
428 @verbatim SHOW ::= '(show)' @endverbatim
430 This action instructs the input method driver to display a candidate
431 list associated with the string before the current position.
437 This action instructs the input method driver to hide the currently
438 displayed candidate list.
441 MOVE ::= '(' 'move' SYMBOL ')'
442 | '(' 'move' INTEGER ')'
445 The first form treats @c SYMBOL as a marker, and makes the marker
446 position be the new current position.
448 The second form treats @c INTEGER as a character position, and makes
449 that position be the new current position.
452 MARK ::= '(' 'mark' SYMBOL ')'
455 This action treats @c SYMBOL as a marker, and sets its position to the
456 current position. @c SYMBOL must not be a @c PREDEFINED-SYMBOL.
459 PUSHBACK ::= '(' 'pushback' INTEGER ')'
460 | '(' 'pushback' KEYSEQ ')'
463 The first form pushes back the latest @c INTEGER number of key events
464 to the event queue if @c INTEGER is positive, and pushes back all key
465 events if @c INTEGER is zero.
467 The second form pushes back keys in @c KEYSEQ to the event queue.
470 POP ::= '(' 'pop' ')'
473 This action pops the first key event that is not yet handled from the
477 UNDO ::= '(' 'undo' [ INTEGER | SYMBOL ] ')'
480 If there's no argument, this action cancels the last two key events
481 (i.e. the one that invoked this command, and the previous one).
483 If there's an integer argument NUM, it must be positive or negative
484 (not zero). If positive, from the NUMth to the last events are
485 canceled. If negative, the last (- NUM) events are canceled.
487 If there's a symbol argument, it must be resolved to an integer number
488 and the number is treated as the actual argument as above.
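The three cases of the undo argument can be traced with the following
Python sketch, which returns the key events that remain after the action.
It is a model of the description above, not the driver's code; the
function name @c undo is hypothetical, and counting "the NUMth event"
from one is an assumption.

```python
def undo(events, num=None):
    """Return the key events that remain after an undo action.

    EVENTS lists the key events so far; the last one invoked the undo.
    """
    if num is None:
        # No argument: cancel the invoking event and the one before it.
        return events[:-2]
    if num == 0:
        raise ValueError("NUM must be positive or negative, not zero")
    if num > 0:
        # Cancel from the NUMth event (assumed 1-based) to the last one.
        return events[:num - 1]
    # Negative: cancel the last (-NUM) events.
    return events[:num]
```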
491 COMMIT ::= '(commit)'
494 This action commits the current preedit.
497 UNHANDLE ::= '(unhandle)'
500 This action commits the current preedit and returns the last key as
504 SHIFT ::= '(' 'shift' STATE-NAME ')'
507 If @c STATE-NAME is @c t, this action shifts the current state to the
508 previous one, otherwise it shifts to @c STATE-NAME. In the latter
509 case, @c STATE-NAME must appear in @c STATE-LIST.
512 CALL ::= '(' 'call' MODULE-NAME FUNCTION ARG * ')'
514 ARG ::= INTEGER | SYMBOL | MTEXT | PLIST
517 This action calls the function @c FUNCTION of external module @c
518 MODULE-NAME. @c MODULE-NAME and @c FUNCTION must appear in @c
521 The function is called with an argument of the type (#MPlist *). The
522 key of the first element is #Mt and its value is a pointer to an
523 object of the type #MInputContext. The key of the second element is
524 #Msymbol and its value is the current state name. @c ARGs are used as
525 the value of the third and later elements. Their keys are determined
526 automatically; if an @c ARG is an integer, the corresponding key is
527 #Minteger; if an @c ARG is a symbol, the corresponding key is
530 The function must return NULL or a value of the type (#MPlist *) that
531 represents a list of actions to take.
534 SET ::= '(' CMD SYMBOL1 EXPRESSION ')'
536 CMD ::= 'set' | 'add' | 'sub' | 'mul' | 'div'
538 EXPRESSION ::= INTEGER | SYMBOL2 | '(' OPERATOR EXPRESSION * ')'
540 OPERATOR ::= '+' | '-' | '*' | '/' | '|' | '&' | '!'
541 | '=' | '<' | '>' | '<=' | '>='
545 This action treats @c SYMBOL1 and @c SYMBOL2 as variables and sets the
546 value of @c SYMBOL1 as below.
548 If @c CMD is 'set', it sets the value of @c SYMBOL1 to the value of @c EXPRESSION.
551 If @c CMD is 'add', it increments the value of @c SYMBOL1 by the value of @c EXPRESSION.
554 If @c CMD is 'sub', it decrements the value of @c SYMBOL1 by the value of @c EXPRESSION.
557 If @c CMD is 'mul', it multiplies the value of @c SYMBOL1 by the value of @c EXPRESSION.
560 If @c CMD is 'div', it divides the value of @c SYMBOL1 by the value of @c EXPRESSION.
564 IF ::= '(' CONDITION ACTION-LIST1 ACTION-LIST2 ? ')'
566 CONDITION ::= [ '=' | '<' | '>' | '<=' | '>=' ] EXPRESSION1 EXPRESSION2
568 ACTION-LIST1 ::= '(' ACTION * ')'
570 ACTION-LIST2 ::= '(' ACTION * ')'
573 This action performs actions in @c ACTION-LIST1 if @c CONDITION is
574 true, and performs @c ACTION-LIST2 (if any) otherwise.
576 @c SYMBOL1 and @c SYMBOL2 are treated as variables.
579 COND ::= '(' 'cond' [ '(' EXPRESSION ACTION * ')' ] * ')'
582 This action performs the first action @c ACTION whose corresponding
583 @c EXPRESSION has nonzero value.
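The way the @c SET and @c COND actions use expressions can be modelled
with the Python sketch below. It covers only a subset: the 'set', 'add',
'sub', and 'mul' commands and the arithmetic and comparison operators
other than division. Expressions are written as Python integers, variable
names, or tuples; that encoding, the function names, and the choice of 1
and 0 as the results of comparisons are assumptions made for
illustration.

```python
def evaluate(expr, variables):
    """Evaluate an EXPRESSION: an integer, a variable name, or (op, args...)."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return variables.get(expr, 0)  # variables start at integer zero
    op, *args = expr
    values = [evaluate(a, variables) for a in args]
    if op in ("+", "*"):
        result = 0 if op == "+" else 1
        for v in values:
            result = result + v if op == "+" else result * v
        return result
    a, b = values  # the remaining operators are modelled as binary here
    if op == "-":
        return a - b
    compare = {"=": a == b, "<": a < b, ">": a > b, "<=": a <= b, ">=": a >= b}
    return int(compare[op])  # assumption: true is 1, false is 0


def apply_set(cmd, symbol, expr, variables):
    """Apply a SET action ('set', 'add', 'sub' or 'mul') to SYMBOL."""
    value = evaluate(expr, variables)
    current = variables.get(symbol, 0)
    variables[symbol] = {"set": value, "add": current + value,
                         "sub": current - value, "mul": current * value}[cmd]


def cond(clauses, variables):
    """Return the actions of the first clause whose EXPRESSION is nonzero."""
    for expr, actions in clauses:
        if evaluate(expr, variables) != 0:
            return actions
    return []
```

An undeclared variable evaluates to zero here, following the rule that
all variables are implicitly initialized to the integer value zero.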
587 @section im-example1 EXAMPLE 1
589 This is a very simple example for inputting Latin characters with
590 diacritical marks (acute and cedilla). For instance, when you type:
592 Comme'die-Franc,aise, chic,,
597 Commédie-Française, chic,
602 \hskip5mm\texttt{\footnotesize Comm\'{e}die-Fran\c{c}aise, chic,}
606 The definition of the input method is very simple as below, and it is
607 quite straightforward to extend it to cover all Latin characters.
611 (title "latin-postfix")
614 ("a'" ?á) ("e'" ?é) ("i'" ?í) ("o'" ?ó) ("u'" ?ú) ("c," ?ç)
615 ("A'" ?Á) ("E'" ?É) ("I'" ?Í) ("O'" ?Ó) ("U'" ?Ú) ("C," ?Ç)
616 ("a''" "a'") ("e''" "e'") ("i''" "i'") ("o''" "o'") ("u''" "u'")
618 ("A''" "A'") ("E''" "E'") ("I''" "I'") ("O''" "O'") ("U''" "U'")
627 \texttt{\footnotesize
628 \hskip2mm(title "latin-postfix")\\
631 \hskip6mm ("a'" ?\'{a}) ("e'" ?\'{e}) ("i'" ?\'{i}) ("o'" ?\'{o})
632 ("u'" ?\'{u}) ("c," ?\c{c})\\
633 \hskip6mm ("A'" ?\'{A}) ("E'" ?\'{E}) ("I'" ?\'{I}) ("O'" ?\'{O})
634 ("U'" ?\'{U}) ("C," ?\c{C})\\
635 \hskip6mm ("a''" "a'") ("e''" "e'") ("i''" "i'") ("o''" "o'") ("u''" "u'")\\
636 \hskip6mm ("c,," "c,")\\
637 \hskip6mm ("A''" "A'") ("E''" "E'") ("I''" "I'") ("O''" "O'") ("U''" "U'")\\
638 \hskip6mm ("C,," "C,")))\\
645 @section im-example2 EXAMPLE 2
647 This example is for inputting Unicode characters by typing C-u
648 (Control-u) followed by four hexadecimal digits. For instance, when
649 you type ("^u" means Control-u):
651 ^u2190^u2191^u2192^u2193
653 you will get this (Unicode arrow symbols):
656 $\leftarrow \uparrow \rightarrow \downarrow$
665 The definition utilizes @c SET and @c IF commands as below:
672 ("0" ?0) ("1" ?1) ... ("9" ?9) ("a" ?A) ("b" ?B) ... ("f" ?F)))
675 (starter (set code 0) (set count 0) (shift unicode)))
681 (mul code 16) (add code this)
684 ((delete @<) (insert code) (shift init))))))
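The arithmetic performed by the (mul code 16) and (add code this) actions
in this example can be traced with the short Python sketch below. It only
mirrors the accumulation step; the conversion of a typed digit to its
numeric value is done here with Python's int() and is an assumption about
what the variable "this" holds, and the function name @c accumulate is
hypothetical.

```python
def accumulate(hex_digits):
    """Fold hexadecimal digits into one code, as (mul code 16) (add code this)."""
    code = 0                      # (set code 0)
    for digit in hex_digits:
        this = int(digit, 16)     # numeric value of the typed digit (assumed)
        code = code * 16          # (mul code 16)
        code = code + this        # (add code this)
    return code
```

After the four digits 2, 1, 9, 0 the accumulated value is hexadecimal
2190, the code of the leftwards arrow.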
687 @section im-example3 EXAMPLE 3
689 This example is for inputting Chinese characters by typing PinYin key
692 For instance, when you type:
701 The definition utilizes @c CANDIDATE and @c SELECT commands as below.
702 Note that this is just an example, and it ignores such important key
709 ;; The initial character of Pinyin.
711 ("a") ("b") ... ("h") ("j") ... ("t") ("w") ("x") ("y") ("z"))
713 ;; Big table of Pinyin vs the corresponding Chinese characters.
716 ("bei" ("被北备背悲辈杯倍贝碑" ...))
717 ("hao" ("好号毫豪浩耗皓嚎昊郝" ...))
718 ("jing" ("经京精境警竟静惊景敬" ...))
719 ("ni" ("你呢尼泥逆倪匿拟腻妮" ...))
721 ;; Typing 1, 2, ..., 0 selects the 0th, 1st, ..., 9th candidate.
723 ("1" (select 0)) ("2" (select 1)) ... ("9" (select 8)) ("0" (select 9))))
727 ;; When an initial character of Pinyin is typed, re-handle it in
728 ;; "main" state. Anything else is just produced as is.
729 (starter (show) (pushback 1) (shift main)))
732 ;; When a complete Pinyin sequence is typed, shift to "select" state
733 ;; to allow users to select one from the candidates.
734 (pinyin (shift select))
736 ;; When anything else is typed, produce the current candidate (if
737 ;; any), and re-handle the last input in "init" state.
738 (nil (hide) (shift init)))
741 ;; When a number is typed, select the corresponding candidate,
742 ;; produce it, and shift to "init" state.
743 (choose (hide) (shift init))
745 ;; When anything else is typed, produce the current candidate,
746 ;; and re-handle the last input in "init" state.
747 (nil (hide) (shift init))))
753 \fbox{This example is readable only in the documentation of HTML version.}
760 @section im-seealso SEE ALSO
762 @ref mim-list "Input Methods provided by the m17n database",
763 @ref mdbGeneral "mdbGeneral(5)"
767 Copyright (C) 2003, 2004, 2005
768 National Institute of Advanced Industrial Science and Technology (AIST)
769 Registration Number H15PRO112
771 This file is part of the m17n database; a sub-part of the m17n
774 The m17n library is free software; you can redistribute it and/or
775 modify it under the terms of the GNU Lesser General Public License
776 as published by the Free Software Foundation; either version 2.1 of
777 the License, or (at your option) any later version.
779 The m17n library is distributed in the hope that it will be useful,
780 but WITHOUT ANY WARRANTY; without even the implied warranty of
781 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
782 Lesser General Public License for more details.
784 You should have received a copy of the GNU Lesser General Public
785 License along with the m17n library; if not, write to the Free
786 Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
787 Boston, MA 02110-1301, USA.
790 /* Local Variables: */