FORMATS/IM-tut.txt

   1 An input method is defined in a *.mim file with this format.
   2
   3 (input-method LANG NAME)
   4
   5 (description (_ "DESCRIPTION"))
   6
   7 (title "TITLE-STRING")
   8
   9 (map
  10   (MAP-NAME
  11     (KEYSEQ MAP-ACTION MAP-ACTION ...)        <- rule
  12     (KEYSEQ MAP-ACTION MAP-ACTION ...)        <- rule
  13     ...)
  14   (MAP-NAME
  15     (KEYSEQ MAP-ACTION MAP-ACTION ...)        <- rule
  16     (KEYSEQ MAP-ACTION MAP-ACTION ...)        <- rule
  17     ...)
  18   ...)
  19
  20 (state
  21   (STATE-NAME
  22     (MAP-NAME BRANCH-ACTION BRANCH-ACTION ...)   <- branch
  23     ...)
  24   (STATE-NAME
  25     (MAP-NAME BRANCH-ACTION BRANCH-ACTION ...)   <- branch
  26     ...)
  27   ...)
  28
  29 Lowercase letters and parentheses are literals, so they must be
  30 written as they are.  Uppercase letters represent arbitrary strings.
  31
  32 KEYSEQ specifies a sequence of keys in this format:
  33
  34   (SYMBOLIC-KEY SYMBOLIC-KEY ...)
  35
  36 where SYMBOLIC-KEY is the keysym of a key, as reported by the xev command.
  37 For instance
  38
  39   (n i)
  40
  41 represents a key sequence of <n> and <i>.  If all SYMBOLIC-KEYs are
  42 ASCII characters, you can use the short form
  43
  44   "ni"
  45
  46 instead.  Consult the file IM.txt for non-ASCII characters.
  47
  48 Each MAP-ACTION and BRANCH-ACTION is an action of this format:
  49
  50   (ACTION ARG ARG ...)
  51
  52 The most common action is "insert", which is written like this:
  53
  54   (insert "TEXT")
  55
  56 But as it is very frequently used, you can use the short form
  57
  58   "TEXT"
  59
  60 When TEXT contains only one character <C>, you can write it as
  61
  62   (insert ?C)
  63
  64 or just
  65
  66   ?C
  67
  68 Therefore the shortest notation for an action of inserting <a> is
  69
  70   ?a
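
     To tie the notations together, here are four ways to write one rule
     that inserts "A" when <a> is typed.  According to the short forms
     described above they should all be equivalent.  Use only one of them
     in a real map, since they all share the same KEYSEQ.

       ((a) (insert "A"))
       ("a" (insert "A"))
       ("a" "A")
       ("a" ?A)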
  71
  72 Here is a simple example:
  73
  74 ---upcase.mim---------------------------------------------------------
  75 (input-method en upcase)
  76 (description (_ "Upcase all lowercase letters"))
  77 (title "a->A")
  78 (map
  79   (toupper ("a" "A") ("b" "B") ... ("z" "Z")))
  80 (state
  81   (init (toupper)))
  82 ----------------------------------------------------------------------
  83
  84 When this input method is activated, it is in the initial condition of
  85 the first state (in this case, "init").  In the initial condition,
  86 no key is being processed and no action is suspended.  When the input
  87 method receives a key event <a>, it searches the branches of the current
  88 state for a rule that matches <a> and finds one in the map "toupper".
  89 Then it executes the MAP-ACTIONs (in this case, just inserting "A" in the
  90 preedit buffer).  After all MAP-ACTIONs have been executed, the input
  91 method shifts to the initial condition of the current state.
  92
  93 The shift to "the initial condition of the first state" has a special
  94 meaning; it commits all characters in the preedit buffer then clears
  95 the preedit buffer.
  96
  97 As a result, "A" is given to an application program.
  98
  99 When a key event does not match any rule in the current state,
 100 that event is unhandled and given back to the application program.
 101
 102 A Turkish user may want to extend the above example for "İ" (U+0130:
 103 LATIN CAPITAL LETTER I WITH DOT ABOVE).  Assigning the key sequence
 104 <i> <i> to that character seems convenient, so the user adds this
 105 rule to "toupper":
 106
 107     ("ii" "İ")
 108
 109 But we already have this rule too:
 110
 111     ("i" "I")
 112
 113 What will happen when a key event <i> is sent to the input method?
 114
 115 No problem.  When the input method receives <i>, it inserts "I" in the
 116 preedit buffer.  But it knows that there is another rule that may
 117 match if an additional key event <i> arrives.  So, after inserting "I",
 118 it suspends the normal behavior of shifting to the initial condition and
 119 waits for another key.  Thus, the user sees an underlined "I", which
 120 indicates that it is not yet committed.
 121
 122 When the input method receives the next <i>, it cancels the effects
 123 done by the rule for the previous "i" (in this case, the preedit buffer is
 124 cleared), and executes MAP-ACTIONs of the rule for "ii".  So, "İ" is
 125 inserted in the preedit buffer.  This time, as there are no other rules
 126 that match with an additional key, it shifts to the initial condition
 127 of the current state, and commits "İ".
 128
 129 Then, what will happen when the next key event is <a> instead of <i>?
 130
 131 No problem, either.
 132
 133 The input method knows that there are no rules that match the <i> <a> key
 134 sequence.  So, when it receives the next <a>, it executes the
 135 suspended behavior (i.e. shifting to the initial condition), which
 136 commits "I".  Then the input method tries to handle <a> in the current
 137 state, which commits "A".
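
     The three cases above can be summarized as a trace.  This is only a
     sketch of the behavior described in this section, assuming that
     "toupper" contains both the "i" rule and the "ii" rule:

       keys typed    preedit after each key    committed text
       <i>           "I" (underlined)          (nothing yet)
       <i> <i>       "I", then "İ"             "İ"
       <i> <a>       "I", then "A"             "I", then "A"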
 138
 139 So far, we have explained MAP-ACTION, but not
 140 BRANCH-ACTION.  The format of BRANCH-ACTION is the same as that of MAP-ACTION.
 141 It is executed only after a matching rule has been determined and the
 142 corresponding MAP-ACTIONs have been executed.  A typical use of
 143 BRANCH-ACTION is to shift to a different state.
 144
 145 To see this effect, let us modify the current input method to upcase a
 146 letter only at the beginning of a word (i.e. capitalizing).  For that
 147 purpose, we modify the "init" state as follows:
 148
 149   (init
 150     (toupper (shift non-upcase)))
 151
 152 Here "(shift non-upcase)" is an action to shift to the new state
 153 "non-upcase", which has two branches as below:
 154
 155   (non-upcase
 156     (lower)
 157     (nil (shift init)))
 158
 159 The first branch is simple.  We can define the new map "lower" as
 160 follows to insert lowercase letters as they are.
 161
 162 (map
 163   ...
 164   (lower ("a" "a") ("b" "b") ... ("z" "z")))
 165
 166 The second branch has a special meaning.  The map name "nil" means
 167 that the branch matches any key event that does not match any rule in
 168 the other maps of the same state.  In addition, it does not
 169 consume the key event.  We will show the full code of the new input
 170 method before explaining how it works.
 171
 172 ---titlecase.mim------------------------------------------------------
 173 (input-method en titlecase)
 174 (description (_ "Titlecase letters"))
 175 (title "abc->Abc")
 176 (map
 177   (toupper ("a" "A") ("b" "B") ... ("z" "Z") ("ii" "İ"))
 178   (lower ("a" "a") ("b" "b") ... ("z" "z")))
 179 (state
 180   (init
 181     (toupper (shift non-upcase)))
 182   (non-upcase
 183     (lower (commit))
 184     (nil (shift init))))
 185 ----------------------------------------------------------------------
 186
 187 Let's see what happens when the user types the keys <a> <b> < >.
 188
 189 Upon <a>, "A" is committed and the state shifts to "non-upcase".
 190 So, the next <b> is handled in the "non-upcase" state.  As it matches with a
 191 rule in the map "lower", "b" is inserted in the preedit buffer and it
 192 is committed explicitly by the "commit" command in BRANCH-ACTION.  After
 193 that, the input method is still in "non-upcase" state.  So the next < >
 194 is also handled in "non-upcase".  This time, as no rule in this state
 195 matches with it, the branch "(nil (shift init))" is selected, and the
 196 state is changed to init.  Please note that < > is not yet handled.
 197 So, the input method tries to handle it in the "init" state.  Again no
 198 rule matches with it.  Therefore, that event is given back to an application
 199 program, which usually inserts a space for that.
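
     The walk-through above can be condensed into the following trace.  It
     is a sketch that restates the description, not output captured from a
     running input method:

       key   state        branch used   effect                  next state
       <a>   init         toupper       "A" committed           non-upcase
       <b>   non-upcase   lower         "b" committed           non-upcase
       < >   non-upcase   nil           key not consumed        init
       < >   init         (none)        given back to the app   init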
 200
 201 When you type "a quick blown fox" with this input method, you get "A
 202 Quick Blown Fox".  Now suppose you notice the typo in "Blown", which
 203 should be "Brown".  To correct it, you move the cursor after "l" and type
 204 Backspace and <r>.  However, if the current input method is still
 205 active, a capital "R" is inserted, giving "BRown".  That is not the
 206 behavior we want.
 207
 208 To make the input method work well also in such a case, we must use
 209 "surrounding text support".  It is a way to check characters around
 210 the inputting spot and delete them if necessary.  Please note that
 211 this facility is available only with Gtk+ applications and Qt
 212 applications.  You cannot use it with applications that use XIM
 213 to communicate with an input method.
 214
 215 Before explaining how to utilize "surrounding text support", you must
 216 understand how to use variables, arithmetic comparison, and
 217 conditional actions.
 218
 219 First, any symbol (except for several reserved ones) used as ARG
 220 of an action is treated as a variable.  For instance, the commands
 221
 222   (set X 32) (insert X)
 223
 224 set the variable "X" to the integer value 32, then insert a character
 225 whose Unicode character code is 32 (i.e. SPACE).
 226
 227 The second argument of the "set" action can be an expression of this form:
 228
 229   (OPERAND ARG1 [ARG2])
 230
 231 Both ARG1 and ARG2 can be an expression.  So,
 232
 233   (set X (+ (* Y 32) Z)) --- set "X" to the value of "Y*32+Z"
 234
 235 We have the following arithmetic/bitwise OPERANDs (require two arguments):
 236
 237   + - * / & |
 238
 239 these relational OPERANDs (require two arguments):
 240
 241   = <= >= < >
 242
 243 and this logical OPERAND (requires one argument):
 244
 245   !
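
     As a small sketch combining these pieces (the variable names X and Y
     are arbitrary), the actions below compute an uppercase letter from a
     lowercase one.  It assumes that a character literal such as ?a can be
     used wherever an integer is expected, as the later examples in this
     tutorial do:

       ;; X becomes the character code of "a", which is 97.
       (set X ?a)
       ;; Subtracting 32 gives 65, the character code of "A".
       (set X (- X 32))
       ;; Y becomes nonzero because 65 lies between ?A and ?Z.
       (set Y (& (>= X ?A) (<= X ?Z)))
       ;; Inserts "A".
       (insert X)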
 246
 247 For surrounding text support, we have these reserved variables:
 248
 249   @-0, @-N, @+N (N is a positive integer)
 250
 251 Their values are predefined as below and cannot be altered.
 252
 253   @-0  -- -1 if surrounding text is supported, -2 if not.
 254   @-N  -- The Nth previous character in the preedit buffer.  If there are
 255           only M (M<N) previous characters in it, the value is the (N-M)th
 256           previous character from the inputting spot.
 257   @+N  -- The Nth following character in the preedit buffer.  If there are
 258           only M (M<N) following characters in it, the value is the (N-M)th
 259           following character from the inputting spot.
 260
 261 So, provided that you have this context:
 262
 263   ABC|def|GHI
 264
 265 ("def" is in the preedit buffer, two "|"s indicate borders between the
 266 preedit buffer and the surrounding text) and your current position in
 267 the preedit buffer is between "d" and "e", we have these values:
 268
 269   @-3 -- ?B
 270   @-2 -- ?C
 271   @-1 -- ?d
 272   @+1 -- ?e
 273   @+2 -- ?f
 274   @+3 -- ?G
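
     As a further worked example under the same rules, suppose the current
     position is instead at the end of the preedit buffer, just after "f",
     and that surrounding text is supported.  Three characters precede the
     position in the buffer and none follow it, so @-4 and every @+N fall
     through to the surrounding text, and we would expect:

       @-4 -- ?C
       @-3 -- ?d
       @-2 -- ?e
       @-1 -- ?f
       @+1 -- ?G
       @+2 -- ?H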
 275
 276 Next, you have to understand a conditional action of this form:
 277
 278   (cond
 279     (EXPR1 ACTION ACTION ...)
 280     (EXPR2 ACTION ACTION ...)
 281     ...)
 282
 283 where EXPRn are expressions.  When an input method executes this
 284 action, it resolves the values of EXPRn one by one from the first branch.
 285 When an EXPRn resolves to a nonzero value, the corresponding actions are
 286 executed and the remaining branches are skipped.
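
     As a sketch of how the branches are tried in order (the variable X and
     the inserted strings are hypothetical and serve only as illustration),
     consider this action when X holds the character code of "m":

       (cond
         ((< X ?a) (insert "before a"))
         ((<= X ?z) (insert "a..z"))
         (1 (insert "after z")))

     The first expression resolves to zero, so its branch is skipped.  The
     second resolves to nonzero, so "a..z" is inserted.  The last branch,
     whose expression is the constant 1, would run only when neither of
     the earlier expressions resolved to nonzero.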
 287
 288 Now you are ready to write a new version of the input method "Titlecase".
 289
 290 ---titlecase2.mim------------------------------------------------------
 291 (input-method en titlecase)
 292 (description (_ "Titlecase letters"))
 293 (title "abc->Abc")
 294 (map
 295   (toupper ("a" "A") ("b" "B") ("c" "C") ("d" "D") ("e" "E")
 296            ("f" "F") ("g" "G") ("h" "H") ("i" "I") ("j" "J")
 297            ("k" "K") ("l" "L") ("m" "M") ("n" "N") ("o" "O")
 298            ("p" "P") ("q" "Q") ("r" "R") ("s" "S") ("t" "T")
 299            ("u" "U") ("v" "V") ("w" "W") ("x" "X") ("y" "Y")
 300            ("z" "Z") ("ii" "İ")))
 301 (state
 302   (init
 303     (toupper
 304
 305      ;; Now we have one character in the preedit buffer.  So, "@-2" is
 306      ;; the character just before the inputting spot.
 307
 308      (cond ((| (& (>= @-2 ?A) (<= @-2 ?Z))
 309                (& (>= @-2 ?a) (<= @-2 ?z))
 310                (= @-2 ?İ))
 311
 312             ;; If that character is A..Z, a..z, or İ, remember the
 313             ;; character in the preedit in X and delete it.
 314
 315             (set X @-1) (delete @-)
 316
 317             ;; Then insert a proper lower case version of X.
 318
 319             (cond ((= X ?İ) "i")
 320                   (1 (set X (+ X 32)) (insert X))))))))
 321 ----------------------------------------------------------------------
 322
 323 The above example contains a new action "delete".  So, it is time to
 324 explain more about the preedit buffer.  The preedit buffer is a
 325 temporary place to store a sequence of characters.  In this buffer, the
 326 input method keeps one position called the "current position".  The current
 327 position exists between two characters, at the beginning of the buffer, or at
 328 the end of the buffer.  The "insert" action inserts characters before
 329 the current position.  For instance, when your preedit
 330 buffer contains "ab.c" ("." indicates the current position),
 331
 332   (insert "xyz")
 333
 334 changes the buffer to "abxyz.c".
 335
 336 There are several predefined variables that represent a specific position in the
 337 preedit buffer.  They are:
 338
 339   @<, @=, @> -- the first, current, and last positions
 340   @-, @+     -- the previous and the next positions
 341
 342 The format of the "delete" action is:
 343
 344   (delete POS)
 345
 346 where POS is a positional variable beginning with @.
 347 The above action deletes the characters between POS and
 348 the current position.  So, "(delete @-)" deletes one character before
 349 the current position.  The other examples of "delete" actions are:
 350
 351   (delete @+)  -- delete the next character
 352   (delete @<)  -- delete all the preceding characters in the buffer
 353   (delete @>)  -- delete all the following characters in the buffer
 354
 355 You can change the current position by the "move" action as below:
 356
 357   (move @-)  -- move the current position to the position before the
 358                 previous character
 359   (move @<)  -- move to the first position
 360
 361 Other positional variables work similarly.
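
     To see "insert", "move", and "delete" working together, here is a
     sketch that starts from the buffer "ab.c" used earlier ("." again
     indicates the current position).  The buffer contents shown in the
     comments are what the descriptions in this section imply:

       ;; buffer: "ab.c"
       (move @<)
       ;; buffer: ".abc"
       (insert "x")
       ;; buffer: "x.abc"
       (move @>)
       ;; buffer: "xabc."
       (delete @-)
       ;; buffer: "xab."
       (delete @<)
       ;; buffer: "." (empty)
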
 362 Let's see how our new example works.  Whatever the key event is, the
 363 input method is in its only state, "init".  Since an event of a lowercase
 364 letter key is first handled by MAP-ACTIONs, every such key is changed into
 365 the corresponding uppercase letter and put into the preedit buffer.  Now
 366 this character can be retrieved with @-1.
 367
 368 How can we determine whether the new character should be
 369 lowercase or not?  We have to check the character before it, that is, @-2.
 370 BRANCH-ACTIONs in the "init" state do the job.
 371
 372 It first checks whether the character @-2 is between A and Z, between a
 373 and z, or İ, by the conditional below.
 374
 375      (cond ((| (& (>= @-2 ?A) (<= @-2 ?Z))
 376                (& (>= @-2 ?a) (<= @-2 ?z))
 377                (= @-2 ?İ))
 378
 379 If not, nothing special needs to be done.  If so, our new key
 380 should be changed back into lowercase.  The uppercase character is already
 381 in the preedit buffer, so we have to retrieve it and remember it in the
 382 variable X by
 383
 384     (set X @-1)
 385
 386 and then delete that character by
 387
 388     (delete @-)
 389
 390 Lastly we insert the character back in its lowercase form.  The
 391 problem here is that "İ" must be changed into "i", so we need another
 392 conditional.  The first branch
 393
 394     ((= X ?İ) "i")
 395
 396 means that if the character remembered in X is "İ", "i" is inserted.
 397
 398 The second branch
 399
 400     (1 (set X (+ X 32)) (insert X))
 401
 402 starts with "1", which always resolves to nonzero, so this branch
 403 is a catch-all.  Actions in this branch add 32 to X itself and then insert
 404 X.  In other words, they make A...Z into a...z respectively and insert the
 405 lowercase character into the preedit buffer.  As the input method
 406 has reached the end of the BRANCH-ACTIONs, the character is committed.
 407
 408 This new input method always checks the character before the current
 409 position, so "A Quick Blown Fox" will be successfully fixed to "A
 410 Quick Brown Fox" by a backspace and <r>.
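
     As a worked trace of that last case (a sketch that assumes surrounding
     text support is available and that the cursor sits right after the "B"
     of "Bown"), typing <r> would proceed as follows:

       MAP-ACTION "R"       -- inserts "R"; now @-1 is ?R and @-2 is ?B
       outer cond           -- ?B is within ?A..?Z, so the branch is taken
       (set X @-1)          -- X becomes 82, the character code of "R"
       (delete @-)          -- "R" is removed from the preedit buffer
       (= X ?İ)             -- zero, so the catch-all branch is used
       (set X (+ X 32))     -- X becomes 114
       (insert X)           -- inserts "r", which is then committed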