1 An input method is defined in a *.mim file with this format.
3 (input-method LANG NAME)
5 (description (_ "DESCRIPTION"))
11 (KEYSEQ MAP-ACTION MAP-ACTION ...) <- rule
12 (KEYSEQ MAP-ACTION MAP-ACTION ...) <- rule
15 (KEYSEQ MAP-ACTION MAP-ACTION ...) <- rule
16 (KEYSEQ MAP-ACTION MAP-ACTION ...) <- rule
22 (MAP-NAME BRANCH-ACTION BRANCH-ACTION ...) <- branch
25 (MAP-NAME BRANCH-ACTION BRANCH-ACTION ...) <- branch
29 Lowercase letters and parentheses are literals, so they must be
30 written as they are. Uppercase letters represent arbitrary strings.
32 KEYSEQ specifies a sequence of keys in this format:
34 (SYMBOLIC-KEY SYMBOLIC-KEY ...)
36 where SYMBOLIC-KEY is the keysym
41 represents a key sequence of <n> and <i>. If all SYMBOLIC-KEYs are
42 ASCII characters, you can use the short form
46 instead. Consult the file IM.txt for Non-ASCII characters.
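
As an illustration, the two KEYSEQ notations can be put side by side.
The rules below are hypothetical (the inserted text "NI" is made up for
this sketch) and assume, as in the listings later in this tutorial,
that a string stands for the short form of a key sequence:

```lisp
;; Long form: KEYSEQ is a list of symbolic keys, here <n> then <i>.
((n i) "NI")

;; Short form: every key is an ASCII character, so the same
;; KEYSEQ can be written as a string.
("ni" "NI")
```

Both rules describe the same key sequence, so only one of them would
appear in a real map.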
48 MAP-ACTION and BRANCH-ACTION are a sequence of actions of this format:
52 The most common action is "insert", which can be written like this:
56 But as it is very frequently used, you can use the short form
60 When TEXT contains only one character <C>, you can write it as
68 Therefore the shortest notation for an action of inserting <a> is
72 Here is a simple example:
74 ---upcase.mim---------------------------------------------------------
75 (input-method en upcase)
76 (description (_ "Upcase all lowercase letters"))
79 (toupper ("a" "A") ("b" "B") ... ("z" "Z")))
82 ----------------------------------------------------------------------
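
The rules in this listing all use a short notation for "insert". As a
sketch of how the notations relate, the hypothetical rules below should
be equivalent ways of inserting a single uppercase letter. The ?C form
for a one-character TEXT is assumed here from the way ?A and ?Z are
written later in this tutorial:

```lisp
("a" (insert "A"))   ; explicit "insert" action
("b" "B")            ; short form: TEXT alone
("c" ?C)             ; one-character TEXT written as a character
```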
84 When this input method is activated, it is in the initial condition of
85 the first state (in this case, it is "init"). In the initial condition,
86 no key is being processed and no action is suspended. When it
87 receives a key event <a>, it searches branches in the current state
88 for a rule that matches with <a> and finds one in the map "toupper".
89 Then it executes MAP-ACTIONs (in this case, just inserting "A" in the
90 preedit buffer). After all MAP-ACTIONs are executed, the input
91 method shifts to the initial condition of the current state.
93 The shift to "the initial condition of the first state" has a special
94 meaning; it commits all characters in the preedit buffer then clears
97 As a result, "A" is given to an application program.
99 When a key event does not match with any rule in the current state,
100 that event is unhandled and given back to an application program.
102 A Turkish user may want to extend the above example for "İ" (U+0130:
103 LATIN CAPITAL LETTER I WITH DOT ABOVE). It seems that assigning the
104 key sequence <i> <i> for that character is convenient. So, they will
105 add this rule to "toupper".
109 But we already have this rule too:
113 What will happen when a key event <i> is sent to the input method?
115 No problem. When the input method receives <i>, it inserts "I" in the
116 preedit buffer. But, it knows that there is another rule that may
117 match with the additional key event <i>. So, after inserting "I", it
118 suspends the normal behavior of shifting to the initial condition, and
119 waits for another key. Thus, the user sees "I" with underline, which
120 indicates it is not yet committed.
122 When the input method receives the next <i>, it cancels the effects
123 done by the rule for the previous "i" (in this case, the preedit buffer is
124 cleared), and executes MAP-ACTIONs of the rule for "ii". So, "İ" is
125 inserted in the preedit buffer. This time, as there are no other rules
126 that match with an additional key, it shifts to the initial condition
127 of the current state, and commits "İ".
129 Then, what will happen when the next key event is <a> instead of <i>?
133 The input method knows that there are no rules that match the <i> <a> key
134 sequence. So, when it receives the next <a>, it executes the
135 suspended behavior (i.e. shifting to the initial condition), which
136 commits "I". Then the input method tries to handle <a> in the current
137 state, which commits "A".
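
The overlapping rules discussed above can be summarized in one
fragment. This is only a sketch of the "toupper" map reduced to three
rules; the comments restate the behavior described in the text and are
not output from the m17n library:

```lisp
(toupper
  ("a" "A")
  ("i" "I")     ; matches <i>, but the commit is suspended
  ("ii" "İ"))   ; matches <i> <i>, replacing the tentative "I"

;; Behavior, following the explanation above:
;;   <i> <i>  ->  "İ" is committed
;;   <i> <a>  ->  "I" is committed, then "A" is committed
```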
139 So far, we have explained MAP-ACTION, but not
140 BRANCH-ACTION. The format of BRANCH-ACTION is the same as that of MAP-ACTION.
141 It is executed only after a matching rule was determined and the
142 corresponding MAP-ACTIONs were executed. A typical use of
143 BRANCH-ACTION is to shift to a different state.
145 To see this effect, let us modify the current input method to upcase a
146 letter only at the beginning of a word (i.e. capitalizing). For that purpose,
147 we modify the "init" state like this:
150 (toupper (shift non-upcase)))
152 Here "(shift non-upcase)" is an action to shift to the new state
153 "non-upcase", which has two branches as below:
159 The first branch is simple. We can define the new map "lower" as the
160 following to insert lowercase letters as they are.
164 (lower ("a" "a") ("b" "b") ... ("z" "z")))
166 The second branch has a special meaning. The map name "nil" means
167 that it matches with any key event that does not match any rules in the
168 other maps in the same state. In addition, it does not
169 consume any key event. We will show the full code of the new input
170 method before explaining how it works.
172 ---titlecase.mim------------------------------------------------------
173 (input-method en titlecase)
174 (description (_ "Titlecase letters"))
177 (toupper ("a" "A") ("b" "B") ... ("z" "Z") ("ii" "İ"))
178 (lower ("a" "a") ("b" "b") ... ("z" "z")))
181 (toupper (shift non-upcase)))
185 ----------------------------------------------------------------------
187 Let's see what happens when the user types the keys <a> <b> < >.
189 Upon <a>, "A" is committed and the state shifts to "non-upcase".
190 So, the next <b> is handled in the "non-upcase" state. As it matches with a
191 rule in the map "lower", "b" is inserted in the preedit buffer and it
192 is committed explicitly by the "commit" command in BRANCH-ACTION. After
193 that, the input method is still in "non-upcase" state. So the next < >
194 is also handled in "non-upcase". This time, as no rule in this state
195 matches with it, the branch "(nil (shift init))" is selected, and the
196 state is changed to "init". Please note that < > is not yet handled.
197 So, the input method tries to handle it in the "init" state. Again no
198 rule matches with it. Therefore, that event is given back to an application
199 program, which usually inserts a space for that.
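
Putting the two states together, the state section described above can
be sketched as follows. This is a reconstruction from the prose (the
"commit" action in the "lower" branch and the "nil" branch are taken
from the walkthrough), so the layout of the actual file may differ. The
enclosing "state" form is an assumption based on the m17n database
format:

```lisp
(state
  (init
    (toupper (shift non-upcase)))   ; after an uppercase letter, go to "non-upcase"
  (non-upcase
    (lower (commit))                ; lowercase letters are committed as typed
    (nil (shift init))))            ; any other key: back to "init", key not consumed

;; <a> <b> < >  ->  "A" (init), "b" (non-upcase), then < > falls
;; through "nil" to "init" and is given back to the application.
```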
201 When you type "a quick blown fox" with this input method, you get "A
202 Quick Blown Fox". OK, you find a typo in "blown", it should be
203 "brown". To correct it, you move the cursor after "l" and type
204 Backspace and <r>. However, if the current input method is still
205 active, a capital "R" is inserted. It is not a sophisticated
208 To make the input method work well also in such a case, we must use
209 "surrounding text support". It is a way to check characters around
210 the inputting spot and delete them if necessary. Please note that
211 this facility is available only with Gtk+ applications and Qt
212 applications. You cannot use it with applications that use XIM
213 to communicate with an input method.
215 Before explaining how to utilize "surrounding text support", you must
216 understand how to use variables, arithmetic comparison, and
219 First, any symbol (except for several reserved ones) used as ARG
220 of an action is treated as a variable. For instance, the commands
222 (set X 32) (insert X)
224 set the variable "X" to integer value 32, then insert a character
225 whose Unicode character code is 32 (i.e. SPACE).
227 The second argument of the "set" action can be an expression of this form:
229 (OPERAND ARG1 [ARG2])
231 Both ARG1 and ARG2 can be an expression. So,
233 (set X (+ (* Y 32) Z)) --- set "X" to the value of "Y*32+Z"
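
Expressions can also combine comparisons. The fragment below is a small
sketch that uses only operators appearing in the listings of this
tutorial (+, &, >=, <=). The variable names X and U are arbitrary, and
?A is taken to denote the character code of "A":

```lisp
;; U becomes nonzero when X holds an uppercase ASCII letter.
(set U (& (>= X ?A) (<= X ?Z)))

;; Add 32 to X, which maps A..Z onto a..z.
(set X (+ X 32))
```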
235 We have the following arithmetic/bitwise OPERANDs (require two arguments):
239 these relational OPERANDs (require two arguments):
243 and this logical OPERAND (require one argument):
247 For surrounding text support, we have these reserved variables:
249 @-0, @-N, @+N (N is a natural number)
251 Their values are predefined as below and cannot be altered.
253 @-0 -- -1 if surrounding text is supported, -2 if not.
254 @-N -- The Nth previous character in the preedit buffer. If there are only
255 M (M<N) previous characters in it, the value is the (N-M)th previous
256 character from the inputting spot.
257 @+N -- The Nth following character in the preedit buffer. If there are only
258 M (M<N) following characters in it, the value is the (N-M)th following
259 character from the inputting spot.
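
To make the fallback rule concrete, here is a separate, hypothetical
case worked out by hand from the definitions above. The context and
the inserted "!" are made up for this sketch, and "=" is assumed to be
the equality operator:

```lisp
;; Hypothetical context: "ab" is in the preedit buffer with the
;; current position at its end; "xy" precedes and "z" follows the
;; inputting spot.
;;
;;   @-1 = ?b   @-2 = ?a      (taken from the preedit buffer)
;;   @-3 = ?y   @-4 = ?x      (taken from the surrounding text)
;;   @+1 = ?z                 (taken from the surrounding text)

(cond ((= @-0 -1)             ; surrounding text is supported
       (cond ((= @-3 ?y) (insert "!")))))
```

Checking @-0 first keeps the inner test from running in an
application where surrounding text is not available.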
261 So, provided that you have this context:
265 ("def" is in the preedit buffer, two "|"s indicate borders between the
266 preedit buffer and the surrounding text) and your current position in
267 the preedit buffer is between "d" and "e", we have these values:
276 Next, you have to understand a conditional action of this form:
279 (EXPR1 ACTION ACTION ...)
280 (EXPR2 ACTION ACTION ...)
283 where EXPRn are expressions. When an input method executes this
284 action, it evaluates EXPRn one by one, starting from the first branch.
285 For the first EXPRn that evaluates to nonzero, the corresponding
286 actions are executed and the remaining branches are skipped.
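
A minimal sketch of such a conditional, assuming that X already holds a
character code and that "=" is the equality operator:

```lisp
(cond
  ((= X ?İ) (insert "i"))                       ; special case first
  ((& (>= X ?A) (<= X ?Z)) (set X (+ X 32))     ; A..Z: convert to a..z
                           (insert X))
  (1 (insert X)))                               ; anything else: insert as is
```

The order of the branches matters: the constant 1 in the last branch is
always nonzero, so it only takes effect when no earlier branch matched.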
288 Now you are ready to write a new version of the input method "Titlecase".
290 ---titlecase2.mim------------------------------------------------------
291 (input-method en titlecase)
292 (description (_ "Titlecase letters"))
295 (toupper ("a" "A") ("b" "B") ("c" "C") ("d" "D") ("e" "E")
296 ("f" "F") ("g" "G") ("h" "H") ("i" "I") ("j" "J")
297 ("k" "K") ("l" "L") ("m" "M") ("n" "N") ("o" "O")
298 ("p" "P") ("q" "Q") ("r" "R") ("s" "S") ("t" "T")
299 ("u" "U") ("v" "V") ("w" "W") ("x" "X") ("y" "Y")
300 ("z" "Z") ("ii" "İ")))
305 ;; Now we have one character in the preedit buffer. So, "@-2" is
306 ;; the character just before the inputting spot.
308 (cond ((| (& (>= @-2 ?A) (<= @-2 ?Z))
309 (& (>= @-2 ?a) (<= @-2 ?z))
312 ;; If that character is A..Z, a..z, or İ, remember the
313 ;; character in the preedit in X and delete it.
315 (set X @-1) (delete @-)
317 ;; Then insert a proper lower case version of X.
320 (1 (set X (+ X 32)) (insert X))))))))
321 ----------------------------------------------------------------------
323 The above example contains a new action "delete". So, it is time to
324 explain more about the preedit buffer. The preedit buffer is a
325 temporary place to store a sequence of characters. In this buffer, the
326 input method keeps one position called the "current position". The current
327 position exists between two characters, at the beginning of the buffer, or at
328 the end of the buffer. The "insert" action inserts characters before
329 the current position. For instance, when your preedit
330 buffer contains "ab.c" ("." indicates the current position),
334 changes the buffer to "abxyz.c".
336 There are several predefined variables that represent a specific position in the
337 preedit buffer. They are:
339 @<, @=, @> -- the first, current, and last positions
340 @-, @+ -- the previous and the next positions
342 The format of the "delete" action is:
346 where POS is a positional variable beginning with @.
347 The above action deletes the characters between POS and
348 the current position. So, "(delete @-)" deletes one character before
349 the current position. The other examples of "delete" actions are:
351 (delete @+) -- delete the next character
352 (delete @<) -- delete all the preceding characters in the buffer
353 (delete @>) -- delete all the following characters in the buffer
355 You can change the current position by the "move" action as below:
357 (move @-) -- move the current position to the position before the
359 (move @<) -- move to the first position
361 Other positional variables work similarly.
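
The actions above can be traced on one buffer. The sequence below is a
hand-worked sketch based on the descriptions in this section, starting
from the buffer "ab.c" used earlier; the comments show the expected
buffer after each action and are not output from the m17n library:

```lisp
;; Buffer states are shown in comments; "." is the current position.
;;                      "ab.c"
(insert "xyz")    ;  -> "abxyz.c"
(delete @-)       ;  -> "abxy.c"   one character before the position
(move @<)         ;  -> ".abxyc"   position moved to the beginning
(delete @+)       ;  -> ".bxyc"    one character after the position
(move @>)         ;  -> "bxyc."    position moved to the end
(delete @<)       ;  -> "."        everything before the position
```
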
362 Let's see how our new example works. Whatever the key event is, the
363 input method is in its only state, "init". Since an event of a lowercase letter
364 key is first handled by MAP-ACTIONs, every such key is changed into the
365 corresponding uppercase letter and put into the preedit buffer. Now this character
366 can be retrieved with @-1.
368 How can we determine whether the new character should be
369 lowercase or not? We have to check the character before it, that is, @-2.
370 BRANCH-ACTIONs in the "init" state do the job.
372 It first checks if the character @-2 is between A and Z, or between a
373 and z, by the conditional below.
375 (cond ((| (& (>= @-2 ?A) (<= @-2 ?Z))
376 (& (>= @-2 ?a) (<= @-2 ?z))
379 If not, nothing special needs to be done. If so, our new key
380 should be changed back into lowercase. The uppercase character is already
381 in the preedit buffer and we have to retrieve and remember it in the
386 and then delete that character by
390 Lastly we insert the character back in its lowercase form. The
391 problem here is that "İ" must be changed into "i", so we need another
392 conditional. The first branch
396 means that "if the character remembered in X is İ, "i" is inserted".
400 (1 (set X (+ X 32)) (insert X))
402 starts with "1", which always evaluates to nonzero, so this branch
403 is a catch-all. Actions in this branch add 32 to X itself and then insert
404 X. In other words, they make A...Z into a...z respectively and insert the
405 lowercase character into the preedit buffer. As the input method
406 has reached the end of the BRANCH-ACTIONs, the character is committed.
408 This new input method always checks the character before the current
409 position, so "A Quick Blown Fox" will be successfully fixed to "A
410 Quick Brown Fox" by a backspace and <r>.