2 National Institute of Advanced Industrial Science and Technology (AIST)
3 Registration Number H15PRO112
4 See the end for copying conditions. */
9 <style type="text/css">
15 @page mdbTutorialIM Tutorial of input method
17 @section im-struct Structure of an input method file
19 An input method is defined in a *.mim file with this format.
(input-method LANG NAME)
(description (_ "DESCRIPTION"))
(title "TITLE-STRING")
(map
  (MAP-NAME
    (KEYSEQ MAP-ACTION MAP-ACTION ...)   <- rule
    (KEYSEQ MAP-ACTION MAP-ACTION ...)   <- rule
    ...)
  (MAP-NAME
    (KEYSEQ MAP-ACTION MAP-ACTION ...)   <- rule
    (KEYSEQ MAP-ACTION MAP-ACTION ...)   <- rule
    ...))
(state
  (STATE-NAME
    (MAP-NAME BRANCH-ACTION BRANCH-ACTION ...)   <- branch
    ...)
  (STATE-NAME
    (MAP-NAME BRANCH-ACTION BRANCH-ACTION ...)   <- branch
    ...))
48 Lowercase letters and parentheses are literals, so they must be
49 written as they are. Uppercase letters represent arbitrary strings.
51 KEYSEQ specifies a sequence of keys in this format:
53 (SYMBOLIC-KEY SYMBOLIC-KEY ...)
where SYMBOLIC-KEY is the keysym value returned by the xev command.
For instance, [[(n i)]] represents a key sequence of <<n>> and <<i>>.
If all SYMBOLIC-KEYs are ASCII characters, you can use the short form
[["ni"]] instead. Consult #mdbIM for non-ASCII characters.
Both MAP-ACTION and BRANCH-ACTION are sequences of actions, each of the
form [[(ACTION ARG ARG ...)]].

The most common action is [[insert]], which is written as
[[(insert "TEXT")]]. Because it is used so frequently, you can use the
short form [["TEXT"]] instead.

If [["TEXT"]] contains only one character "C", you can write it as
[[(insert ?C)]] or, even shorter, as [[?C]].
So the shortest notation for an action that inserts "a" is [[?a]].
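As a hedged illustration of these notations, the fragment below writes the same kind of insertion in each form. The map name [[example-map]] and the chosen keys are hypothetical and serve only to show the syntax side by side.

```lisp
(map
  (example-map                ; hypothetical map name
    ((a) (insert "A"))        ; full KEYSEQ form with the full insert action
    ("b" (insert "B"))        ; short KEYSEQ form for ASCII keys
    ("c" "C")                 ; short form of (insert "C")
    ("d" (insert ?D))         ; a single character written as ?D
    ("e" ?E)))                ; shortest notation
```

All five rules do the same kind of work: each maps one key to one inserted uppercase letter.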
92 @section im-upcase Simple example of capslock
94 Here is a simple example of an input method that works as CapsLock.
(input-method en capslock)
(description (_ "Upcase all lowercase letters"))
(map
  (toupper ("a" "A") ("b" "B") ("c" "C") ("d" "D") ("e" "E")
           ("f" "F") ("g" "G") ("h" "H") ("i" "I") ("j" "J")
           ("k" "K") ("l" "L") ("m" "M") ("n" "N") ("o" "O")
           ("p" "P") ("q" "Q") ("r" "R") ("s" "S") ("t" "T")
           ("u" "U") ("v" "V") ("w" "W") ("x" "X") ("y" "Y")
           ("z" "Z")))
(state
  (init (toupper)))
111 When this input method is activated, it is in the initial condition of
112 the first state (in this case, the only state [[init]]). In the
113 initial condition, no key is being processed and no action is
114 suspended. When the input method receives a key event <<a>>, it
115 searches branches in the current state for a rule that matches <<a>>
116 and finds one in the map [[toupper]]. Then it executes MAP-ACTIONs
117 (in this case, just inserting "A" in the preedit buffer). After all
118 MAP-ACTIONs have been executed, the input method shifts to the initial
119 condition of the current state.
The shift to <em>the initial condition of the first state</em> has a special
meaning; it commits all characters in the preedit buffer and then clears
the preedit buffer.

As a result, "A" is given to the application program.

When a key event does not match any rule in the current state,
that event is unhandled and given back to the application program.
Turkish users may want to extend the above example for "İ" (U+0130:
LATIN CAPITAL LETTER I WITH DOT ABOVE). Assigning the key sequence
<<i>> <<i>> to that character seems convenient, so they would add the
rule [[("ii" "İ")]] to [[toupper]].

However, we already have the rule [[("i" "I")]].

What will happen when a key event <<i>> is sent to the input method?
147 No problem. When the input method receives <<i>>, it inserts "I" in the
148 preedit buffer. It knows that there is another rule that may
149 match the additional key event <<i>>. So, after inserting "I", it
150 suspends the normal behavior of shifting to the initial condition, and
151 waits for another key. Thus, the user sees "I" with underline, which
152 indicates it is not yet committed.
When the input method receives the next <<i>>, it cancels the effects
of the rule for the previous "i" (in this case, the preedit buffer is
cleared), and executes the MAP-ACTIONs of the rule for "ii". So, "İ" is
inserted in the preedit buffer. This time, as there are no other rules
that could match an additional key, it shifts to the initial condition
of the current state, which commits "İ".
161 Then, what will happen when the next key event is <<a>> instead of <<i>>?
The input method knows that no rule matches the <<i>> <<a>> key
sequence. So, when it receives the next <<a>>, it executes the
suspended behavior (i.e. shifting to the initial condition), which
commits "I". Then the input method tries to handle <<a>> in
the current state, which commits "A".
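The two cases above can be summarized in a small trace. This is only a sketch: the map is cut down to the three rules involved, and the trace in the comments restates the behavior described in the text rather than output captured from a running input method.

```lisp
(map
  (toupper ("i" "I") ("ii" "İ") ("a" "A")))   ; only the rules relevant here

;; Keys typed     Preedit buffer        Committed so far
;; <<i>>          "I" (underlined)      (nothing)
;; <<i>> <<i>>    (empty)               "İ"
;; <<i>> <<a>>    (empty)               "IA"
```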
171 So far, we have explained MAP-ACTION, but not
172 BRANCH-ACTION. The format of BRANCH-ACTION is the same as that of MAP-ACTION.
173 It is executed only after a matching rule has been determined and the
174 corresponding MAP-ACTIONs have been executed. A typical use of
175 BRANCH-ACTION is to shift to a different state.
177 To see this effect, let us modify the current input method to upcase only
178 word-initial letters (i.e. to capitalize). For that purpose,
we modify the "init" state as follows:

(init
  (toupper (shift non-upcase)))
186 Here [[(shift non-upcase)]] is an action to shift to the new state
187 [[non-upcase]], which has two branches as below:
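A sketch of the [[non-upcase]] state, assuming the two branches discussed in the remainder of this section: a [[lower]] branch that commits explicitly, and a [[nil]] branch that returns to [[init]].

```lisp
(non-upcase
  (lower (commit))       ; lowercase letters are inserted as they are, then committed
  (nil (shift init)))    ; any other key: go back to init without consuming the key
```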
The first branch is simple. We can define the new map [[lower]] as
follows to insert lowercase letters as they are.

(lower ("a" "a") ("b" "b") ("c" "c") ("d" "d") ("e" "e")
       ("f" "f") ("g" "g") ("h" "h") ("i" "i") ("j" "j")
       ("k" "k") ("l" "l") ("m" "m") ("n" "n") ("o" "o")
       ("p" "p") ("q" "q") ("r" "r") ("s" "s") ("t" "t")
       ("u" "u") ("v" "v") ("w" "w") ("x" "x") ("y" "y")
       ("z" "z"))
209 The second branch has a special meaning. The map name [[nil]] means
210 that it matches with any key event that does not match any rules in the
211 other maps in the current state. In addition, it does not
212 consume any key event. We will show the full code of the new input
213 method before explaining how it works.
(input-method en titlecase)
(description (_ "Titlecase letters"))
(map
  (toupper ("a" "A") ("b" "B") ("c" "C") ("d" "D") ("e" "E")
           ("f" "F") ("g" "G") ("h" "H") ("i" "I") ("j" "J")
           ("k" "K") ("l" "L") ("m" "M") ("n" "N") ("o" "O")
           ("p" "P") ("q" "Q") ("r" "R") ("s" "S") ("t" "T")
           ("u" "U") ("v" "V") ("w" "W") ("x" "X") ("y" "Y")
           ("z" "Z") ("ii" "İ"))
  (lower ("a" "a") ("b" "b") ("c" "c") ("d" "d") ("e" "e")
         ("f" "f") ("g" "g") ("h" "h") ("i" "i") ("j" "j")
         ("k" "k") ("l" "l") ("m" "m") ("n" "n") ("o" "o")
         ("p" "p") ("q" "q") ("r" "r") ("s" "s") ("t" "t")
         ("u" "u") ("v" "v") ("w" "w") ("x" "x") ("y" "y")
         ("z" "z")))
(state
  (init
    (toupper (shift non-upcase)))
  (non-upcase
    (lower (commit))
    (nil (shift init))))
240 Let's see what happens when the user types the key sequence <<a>> <<b>> << >>.
241 Upon <<a>>, "A" is committed and the state shifts to [[non-upcase]].
So, the next <<b>> is handled in the [[non-upcase]] state. As it matches a
rule in the map [[lower]], "b" is inserted in the preedit buffer and
is committed explicitly by the [[commit]] action in BRANCH-ACTION. After
that, the input method is still in the [[non-upcase]] state. So the next << >>
is also handled in [[non-upcase]]. This time, no rule in this state
matches it. Thus the branch [[(nil (shift init))]] is selected and the
state is shifted to [[init]]. Please note that << >> is not yet
handled because the map [[nil]] does not consume any key event.
So, the input method tries to handle it in the [[init]] state. Again no
rule matches it. Therefore, that event is given back to the application
program, which usually inserts a space in response.
When you type "a quick blown fox" with this input method, you get "A
Quick Blown Fox". Now suppose you notice the typo in "blown", which should be
"brown". To correct it, you would probably move the cursor after "l" and type
<<BackSpace>> and <<r>>. However, if the current input method is still
active, a capital "R" is inserted. This is not sophisticated behavior.
262 @section im-surrounding-text Example of utilizing surrounding text support
264 To make the input method work well also in such a case, we must use
265 "surrounding text support". It is a way to check characters around
266 the inputting spot and delete them if necessary. Note that
267 this facility is available only with Gtk+ applications and Qt
268 applications. You cannot use it with applications that use XIM
269 to communicate with an input method.
Before explaining how to utilize "surrounding text support", you must
understand how to use variables, arithmetic comparisons, and
conditional actions.

First, any symbol (except for a few reserved ones) used as ARG
of an action is treated as a variable. For instance, the commands
279 (set X 32) (insert X)
282 set the variable [[X]] to integer value 32, then insert a character
283 whose Unicode character code is 32 (i.e. SPACE).
285 The second argument of the [[set]] action can be an expression of this form:
288 (OPERAND ARG1 [ARG2])
291 Both ARG1 and ARG2 can be an expression. So,
294 (set X (+ (* Y 32) Z))
297 sets [[X]] to the value of [[Y * 32 + Z]].
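For illustration, a rule can combine [[set]] with a nested expression and then insert the result. The map name and the key below are hypothetical, and the values are chosen only so that the arithmetic is easy to follow.

```lisp
(map
  (example-map                      ; hypothetical map name
    ("x" (set Y 3)                  ; Y = 3
         (set Z 1)                  ; Z = 1
         (set X (+ (* Y 32) Z))     ; X = 3 * 32 + 1 = 97
         (insert X))))              ; inserts "a" (U+0061)
```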
We have the following arithmetic/bitwise OPERANDs (each requires two arguments):
[[+]], [[-]], [[*]], [[/]], [[&]], and [[|]];
these relational OPERANDs (each requires two arguments):
[[=]], [[<]], [[>]], [[<=]], and [[>=]];
and this logical OPERAND (requires one argument): [[!]].
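A short sketch of how these kinds of OPERANDs combine. It assumes that a relational expression yields nonzero when it holds and zero otherwise, and that [[!]] is the logical negation; the variable names are arbitrary.

```lisp
;; X becomes nonzero when Y is a lowercase ASCII letter.
(set X (& (>= Y ?a) (<= Y ?z)))

;; Z becomes nonzero when Y is not a lowercase ASCII letter.
(set Z (! X))
```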
For surrounding text support, we have these reserved variables:

@-0, @-N, @+N (N is a positive integer)

Their values are predefined as follows and cannot be altered.

[[@-0]]: -1 if surrounding text is supported, -2 if not.

[[@-N]]: The Nth previous character in the preedit buffer. If there are only M
(M<N) previous characters in it, the value is the (N-M)th previous
character from the inputting spot.

[[@+N]]: The Nth following character in the preedit buffer. If there are only M
(M<N) following characters in it, the value is the (N-M)th following
character from the inputting spot.
344 So, provided that you have this context:
350 ("def" is in the preedit buffer, two "|"s indicate borders between the
351 preedit buffer and the surrounding text) and your current position in
352 the preedit buffer is between "d" and "e", you get these values:
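As a worked illustration, assume the surrounding text is "ABC" before and "GHI" after the preedit buffer. These two strings are hypothetical; only "def" and the current position come from the description above. Applying the rules for [[@-N]] and [[@+N]] then gives the values in the comments.

```lisp
;; Hypothetical context ("." marks the current position):
;;
;;   ABC|d.ef|GHI
;;
;; @-1 is ?d   (1st previous character, inside the preedit buffer)
;; @-2 is ?C   (only 1 previous character in the buffer, so the
;;              1st previous character from the inputting spot)
;; @-3 is ?B
;; @+1 is ?e
;; @+2 is ?f
;; @+3 is ?G   (only 2 following characters in the buffer, so the
;;              1st following character from the inputting spot)

(cond ((= @-2 ?C) "!"))   ; in this context the test holds, so "!" is inserted
```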
363 Next, you have to understand the conditional action of this form:
(cond
  (EXPR1 ACTION ACTION ...)
  (EXPR2 ACTION ACTION ...)
  ...)

where EXPRn are expressions. When an input method executes this
action, it evaluates the EXPRn one by one, starting from the first branch.
As soon as an EXPRn evaluates to nonzero, the corresponding
actions are executed and the remaining branches are skipped.
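A minimal sketch of a conditional with a catch-all branch, assuming the variable [[X]] already holds a character code. It is an illustration of the form above, not part of any shipped input method.

```lisp
(cond
  ((& (>= X ?a) (<= X ?z))         ; X is a lowercase ASCII letter:
   (set X (- X 32)) (insert X))    ;   insert its uppercase form
  (1 (insert X)))                  ; otherwise: insert X unchanged
```

The constant [[1]] in the last branch is always nonzero, so that branch runs whenever the first test fails.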
377 Now you are ready to write a new version of the input method "Titlecase".
(input-method en titlecase2)
(description (_ "Titlecase letters"))
(map
  (toupper ("a" "A") ("b" "B") ("c" "C") ("d" "D") ("e" "E")
           ("f" "F") ("g" "G") ("h" "H") ("i" "I") ("j" "J")
           ("k" "K") ("l" "L") ("m" "M") ("n" "N") ("o" "O")
           ("p" "P") ("q" "Q") ("r" "R") ("s" "S") ("t" "T")
           ("u" "U") ("v" "V") ("w" "W") ("x" "X") ("y" "Y")
           ("z" "Z") ("ii" "İ")))
(state
  (init
    (toupper

     ;; Now we have exactly one uppercase character in the preedit
     ;; buffer.  So, "@-2" is the character just before the inputting
     ;; spot.

     (cond ((| (& (>= @-2 ?A) (<= @-2 ?Z))
               (& (>= @-2 ?a) (<= @-2 ?z))
               (= @-2 ?İ))

            ;; If the character before the inputting spot is A..Z,
            ;; a..z, or İ, remember the only character in the preedit
            ;; buffer in the variable X and delete it.

            (set X @-1) (delete @-)

            ;; Then insert the lowercase version of X.

            (cond ((= X ?İ) "i")
                  (1 (set X (+ X 32)) (insert X))))))))
414 The above example contains the new action [[delete]]. So, it is time
415 to explain more about the preedit buffer. The preedit buffer is a
416 temporary place to store a sequence of characters. In this buffer,
417 the input method keeps a position called the "current position". The
418 current position exists between two characters, at the beginning of
419 the buffer, or at the end of the buffer. The [[insert]] action inserts
characters before the current position. For instance, when your
preedit buffer contains "ab.c" ("." indicates the current position),
the action [[(insert "xyz")]] changes the buffer to "abxyz.c".
There are several predefined variables that represent a specific position in the
preedit buffer. They are:

[[@<]], [[@=]], [[@>]]: The first, current, and last positions.

[[@-]], [[@+]]: The previous and the next positions.
The format of the [[delete]] action is [[(delete POS)]],
where POS is a predefined positional variable.
The action deletes the characters between POS and
the current position. So, [[(delete @-)]] deletes one character before
the current position. Other examples of [[delete]] include the following:
454 (delete @+) ; delete the next character
455 (delete @<) ; delete all the preceding characters in the buffer
456 (delete @>) ; delete all the following characters in the buffer
459 You can change the current position using the [[move]] action as below:
(move @-) ; move the current position to before the previous character
(move @<) ; move to the first position
467 Other positional variables work similarly.
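The following sketch traces [[insert]], [[move]], and [[delete]] working together inside one rule. The map name and the key are hypothetical; the comments show the buffer after each action, with "." marking the current position.

```lisp
(map
  (example-map                ; hypothetical map name
    ("x" "abc"                ; buffer: "abc."
         (move @<)            ; buffer: ".abc"
         (delete @+)          ; buffer: ".bc"
         (move @>)            ; buffer: "bc."
         "d")))               ; buffer: "bcd."
```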
Let's see how our new example works. Whatever the key event is, the
input method stays in its only state, [[init]]. Since a lowercase letter
key event is first handled by MAP-ACTIONs, every such key is changed into the
corresponding uppercase letter and put into the preedit buffer. This character
can now be accessed with [[@-1]].
475 How can we tell whether the new character should be a lowercase or an
476 uppercase? We can do so by checking the character before it, i.e.
477 [[@-2]]. BRANCH-ACTIONs in the [[init]] state do the job.
It first checks whether the character [[@-2]] is between A and Z, between
a and z, or İ, using the conditional below.

(cond ((| (& (>= @-2 ?A) (<= @-2 ?Z))
          (& (>= @-2 ?a) (<= @-2 ?z))
          (= @-2 ?İ))
If not, there is nothing special to do. If so, our new key should
be changed back into lowercase. Since the uppercase character is
already in the preedit buffer, we retrieve and remember it in the
variable [[X]] with [[(set X @-1)]] and then delete that character with
[[(delete @-)]].
Lastly we re-insert the character in its lowercase form. The
problem here is that "İ" must be changed into "i", so we need another
conditional. The first branch [[((= X ?İ) "i")]]
means that "if the character remembered in X is 'İ', 'i' is inserted".
The second branch
516 (1 (set X (+ X 32)) (insert X))
519 starts with "1", which is always resolved into nonzero, so this branch
520 is a catchall. Actions in this branch increase [[X]] by 32, then
521 insert [[X]]. In other words, they change A...Z into a...z
522 respectively and insert the resulting lowercase character into the
preedit buffer. As the input method reaches the end of the
BRANCH-ACTIONs, the character is committed.
526 This new input method always checks the character before the current
527 position, so "A Quick Blown Fox" will be successfully fixed to "A
528 Quick Brown Fox" by the key sequence <<BackSpace>> <<r>>.
535 National Institute of Advanced Industrial Science and Technology (AIST)
536 Registration Number H15PRO112
538 This file is part of the m17n database; a sub-part of the m17n
541 The m17n library is free software; you can redistribute it and/or
542 modify it under the terms of the GNU Lesser General Public License
543 as published by the Free Software Foundation; either version 2.1 of
544 the License, or (at your option) any later version.
546 The m17n library is distributed in the hope that it will be useful,
547 but WITHOUT ANY WARRANTY; without even the implied warranty of
548 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
549 Lesser General Public License for more details.
551 You should have received a copy of the GNU Lesser General Public
552 License along with the m17n library; if not, write to the Free
553 Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
554 Boston, MA 02110-1301, USA.