--- /dev/null
+/* -*- coding: utf-8; -*- */
+/*** @page m17nDBTutorial Tutorial for writing the m17n database
+This section contains tutorials for writing various database files of
+the m17n database.
+<ul>
+<li> @ref mdbTutorialIM "TutorialIM" -- Tutorial of input method
+</ul>
+@section mdbTutorialIM Tutorial of input method
+@subsection im-struct Structure of an input method file
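+An input method is defined in a *.mim file with this format:
+(input-method LANG NAME)
+(description (_ "DESCRIPTION"))
+(title "TITLE-STRING")
+(map
+  (MAP-NAME
+    (KEYSEQ MAP-ACTION MAP-ACTION ...)
+    ...)
+  (MAP-NAME
+    ...)
+  ...)
+(state
+  (STATE-NAME
+    (MAP-NAME BRANCH-ACTION BRANCH-ACTION ...)
+    ...)
+  (STATE-NAME
+    ...)
+  ...)
+Each (KEYSEQ MAP-ACTION ...) entry in a map is called a rule, and each
+(MAP-NAME BRANCH-ACTION ...) entry in a state is called a branch.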
+Lowercase letters and parentheses are literals, so they must be
+written as they are. Uppercase letters represent arbitrary strings.
+KEYSEQ specifies a sequence of keys in this format:
+ (SYMBOLIC-KEY SYMBOLIC-KEY ...)
+where SYMBOLIC-KEY is the keysym value returned by the xev command.
+For instance,
+ (n i)
+represents a key sequence of \<n\> and \<i\>.
+If all SYMBOLIC-KEYs are ASCII characters, you can use the short form
+ "ni"
+instead. Consult #mdbIM for non-ASCII characters.
+Both MAP-ACTION and BRANCH-ACTION are a sequence of actions of this format:
+ (ACTION ARG ARG ...)
+The most common action is <span class="fragment">insert</span>, which is written like this:
+ (insert "TEXT")
+But as it is very frequently used, you can use the short form
+ "TEXT"
+If <span class="fragment">"TEXT"</span> contains only one character "C", you can write it as
+ (insert ?C)
+or even shorter as
+ ?C
+So the shortest notation for an action of inserting "a" is
+ ?a
+@subsection im-upcase Simple example of capslock
+Here is a simple example of an input method that works as CapsLock.
+(input-method en capslock)
+(description (_ "Upcase all lowercase letters"))
+(title "a->A")
+(map
+  (toupper ("a" "A") ("b" "B") ("c" "C") ("d" "D") ("e" "E")
+           ("f" "F") ("g" "G") ("h" "H") ("i" "I") ("j" "J")
+           ("k" "K") ("l" "L") ("m" "M") ("n" "N") ("o" "O")
+           ("p" "P") ("q" "Q") ("r" "R") ("s" "S") ("t" "T")
+           ("u" "U") ("v" "V") ("w" "W") ("x" "X") ("y" "Y")
+           ("z" "Z")))
+(state
+  (init (toupper)))
+When this input method is activated, it is in the initial condition of
+the first state (in this case, the only state <span class="fragment">init</span>). In the
+initial condition, no key is being processed and no action is
+suspended. When the input method receives a key event \<a\>, it
+searches branches in the current state for a rule that matches \<a\>
+and finds one in the map <span class="fragment">toupper</span>. Then it executes MAP-ACTIONs
+(in this case, just inserting "A" in the preedit buffer). After all
+MAP-ACTIONs have been executed, the input method shifts to the initial
+condition of the current state.
+The shift to <em>the initial condition of the first state</em> has a special
+meaning; it commits all characters in the preedit buffer then clears
+the preedit buffer.
+As a result, "A" is given to the application program.
+When a key event does not match any rule in the current state,
+that event is unhandled and given back to the application program.
+Turkish users may want to extend the above example for "İ" (U+0130:
+LATIN CAPITAL LETTER I WITH DOT ABOVE). Assigning the key sequence
+\<i\> \<i\> to that character seems convenient, so they would
+add this rule to <span class="fragment">toupper</span>:
+ ("ii" "İ")
+However, we already have the following rule:
+ ("i" "I")
+What will happen when a key event \<i\> is sent to the input method?
+No problem. When the input method receives \<i\>, it inserts "I" in the
+preedit buffer. It knows that there is another rule that may
+match the additional key event \<i\>. So, after inserting "I", it
+suspends the normal behavior of shifting to the initial condition, and
+waits for another key. Thus, the user sees "I" with an underline, which
+indicates that it is not yet committed.
+When the input method receives the next \<i\>, it cancels the effects
+done by the rule for the previous "i" (in this case, the preedit buffer is
+cleared), and executes MAP-ACTIONs of the rule for "ii". So, "İ" is
+inserted in the preedit buffer. This time, as no other rule
+matches an additional key, it shifts to the initial condition
+of the current state, which commits "İ".
+Then, what will happen when the next key event is \<a\> instead of \<i\>?
+No problem, either.
+The input method knows that there are no rules that match the \<i\> \<a\> key
+sequence. So, when it receives the next \<a\>, it executes the
+suspended behavior (i.e. shifting to the initial condition), which
+commits "I". Then the input method tries to handle \<a\> in
+the current state, which commits "A".
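+The matching behavior described above can be sketched as a small Python
+simulation. This is only an illustration of the explanation in this section,
+not the m17n library's implementation. The function name
+<span class="fragment">type_keys</span> and its return value are hypothetical, and the sketch assumes
+that every proper prefix of a multi-key rule is itself a rule (as with "i" and "ii").
```python
import string


def type_keys(rules, keys):
    """Feed keys one at a time to a single-state input method.

    Returns (committed_text, pending_preedit). `rules` maps a key
    sequence such as "ii" to the text it inserts. Simplifying assumption:
    every proper prefix of a multi-key rule is itself a rule.
    """
    committed = []  # text already handed to the application
    preedit = ""    # text shown with an underline, not yet committed
    seq = ""        # keys matched so far while waiting for a longer rule

    for key in keys:
        if seq and seq + key not in rules:
            # No rule matches the longer sequence: run the suspended
            # shift to the initial condition, which commits the preedit.
            committed.append(preedit)
            preedit, seq = "", ""
        candidate = seq + key
        if candidate not in rules:
            # Unhandled key: given back to the application as it is.
            committed.append(key)
            continue
        # Cancel the effect of the shorter match and run the new rule.
        preedit = rules[candidate]
        if any(r != candidate and r.startswith(candidate) for r in rules):
            seq = candidate  # a longer rule may still match: wait
        else:
            committed.append(preedit)  # shift to the initial condition
            preedit, seq = "", ""
    return "".join(committed), preedit


# The capslock rules plus the Turkish extension discussed above.
RULES = {c: c.upper() for c in string.ascii_lowercase}
RULES["ii"] = "İ"
```
+With these rules, <span class="fragment">type_keys(RULES, "i")</span> leaves "I" pending in the preedit
+buffer, while "ii" commits "İ" and "ia" commits "IA".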
+So far, we have explained MAP-ACTION, but not
+BRANCH-ACTION. The format of BRANCH-ACTION is the same as that of MAP-ACTION.
+It is executed only after a matching rule has been determined and the
+corresponding MAP-ACTIONs have been executed. A typical use of
+BRANCH-ACTION is to shift to a different state.
+To see this effect, let us modify the current input method to upcase only
+word-initial letters (i.e. to capitalize). For that purpose,
+we modify the "init" state as follows:
+ (init
+ (toupper (shift non-upcase)))
+Here <span class="fragment">(shift non-upcase)</span> is an action to shift to the new state
+<span class="fragment">non-upcase</span>, which has two branches as below:
+ (non-upcase
+ (lower)
+ (nil (shift init)))
+The first branch is simple. We can define the new map <span class="fragment">lower</span> as
+follows to insert lowercase letters as they are.
+(map
+ ...
+ (lower ("a" "a") ("b" "b") ("c" "c") ("d" "d") ("e" "e")
+ ("f" "f") ("g" "g") ("h" "h") ("i" "i") ("j" "j")
+ ("k" "k") ("l" "l") ("m" "m") ("n" "n") ("o" "o")
+ ("p" "p") ("q" "q") ("r" "r") ("s" "s") ("t" "t")
+ ("u" "u") ("v" "v") ("w" "w") ("x" "x") ("y" "y")
+ ("z" "z")))
+The second branch has a special meaning. The map name <span class="fragment">nil</span> means
+that it matches any key event that does not match any rule in the
+other maps in the current state. In addition, it does not
+consume any key event. We will show the full code of the new input
+method before explaining how it works.
+(input-method en titlecase)
+(description (_ "Titlecase letters"))
+(title "abc->Abc")
+(map
+  (toupper ("a" "A") ("b" "B") ("c" "C") ("d" "D") ("e" "E")
+           ("f" "F") ("g" "G") ("h" "H") ("i" "I") ("j" "J")
+           ("k" "K") ("l" "L") ("m" "M") ("n" "N") ("o" "O")
+           ("p" "P") ("q" "Q") ("r" "R") ("s" "S") ("t" "T")
+           ("u" "U") ("v" "V") ("w" "W") ("x" "X") ("y" "Y")
+           ("z" "Z") ("ii" "İ"))
+  (lower ("a" "a") ("b" "b") ("c" "c") ("d" "d") ("e" "e")
+         ("f" "f") ("g" "g") ("h" "h") ("i" "i") ("j" "j")
+         ("k" "k") ("l" "l") ("m" "m") ("n" "n") ("o" "o")
+         ("p" "p") ("q" "q") ("r" "r") ("s" "s") ("t" "t")
+         ("u" "u") ("v" "v") ("w" "w") ("x" "x") ("y" "y")
+         ("z" "z")))
+(state
+  (init
+    (toupper (shift non-upcase)))
+  (non-upcase
+    (lower (commit))
+    (nil (shift init))))
+Let's see what happens when the user types the key sequence \<a\> \<b\> \< \>.
+Upon \<a\>, "A" is committed and the state shifts to <span class="fragment">non-upcase</span>.
+So, the next \<b\> is handled in the <span class="fragment">non-upcase</span> state.
+As it matches a
+rule in the map <span class="fragment">lower</span>, "b" is inserted in the preedit buffer and it
+is committed explicitly by the "commit" command in BRANCH-ACTION. After
+that, the input method is still in the <span class="fragment">non-upcase</span> state. So the next \< \>
+is also handled in <span class="fragment">non-upcase</span>. This time, no rule in this state
+matches it. Thus the branch <span class="fragment">(nil (shift init))</span> is selected and the
+state is shifted to <span class="fragment">init</span>. Please note that \< \> is not yet
+handled because the map <span class="fragment">nil</span> does not consume any key event.
+So, the input method tries to handle it in the <span class="fragment">init</span> state. Again no
+rule matches it. Therefore, that event is given back to the application
+program, which usually inserts a space for that.
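+The walk through the two states can be reproduced with a small Python model.
+It is a sketch under stated assumptions, not the library's algorithm: the
+names <span class="fragment">MAPS</span>, <span class="fragment">STATES</span>, <span class="fragment">find_branch</span>, and <span class="fragment">run</span> are hypothetical, only
+single-key rules are modeled (the "ii" rule is left out), and the preedit
+buffer and commit timing are not modeled; the result is simply the text
+that ends up in the application.
```python
import string

MAPS = {
    "toupper": {c: c.upper() for c in string.ascii_lowercase},
    "lower": {c: c for c in string.ascii_lowercase},
}

# Each state is an ordered list of branches (map name, state to shift to).
# A map name of None stands for the nil map; a target of None means
# "stay in the current state".
STATES = {
    "init": [("toupper", "non-upcase")],
    "non-upcase": [("lower", None), (None, "init")],
}


def find_branch(state, key):
    """Return the branch whose map has a rule for key, else the nil branch."""
    nil_branch = None
    for map_name, target in STATES[state]:
        if map_name is None:
            nil_branch = (map_name, target)
        elif key in MAPS[map_name]:
            return (map_name, target)
    return nil_branch


def run(keys, first_state="init"):
    """Return (resulting text, trace of (key, state, map or 'unhandled'))."""
    state, out, trace = first_state, [], []
    for key in keys:
        hops = 0
        while True:
            branch = find_branch(state, key)
            if branch is None:
                # No rule and no nil branch: the application gets the key.
                out.append(key)
                trace.append((key, state, "unhandled"))
                break
            map_name, target = branch
            if map_name is None:
                # nil branch: shift without consuming the key, then retry.
                state = target
                hops += 1
                if hops > len(STATES):
                    raise RuntimeError("nil branches form a loop")
                continue
            out.append(MAPS[map_name][key])
            trace.append((key, state, map_name))
            if target is not None:
                state = target
            break
    return "".join(out), trace
```
+Calling <span class="fragment">run("ab ")</span> shows \<a\> handled in <span class="fragment">init</span>, \<b\> handled in
+<span class="fragment">non-upcase</span>, and the space falling through the nil branch to <span class="fragment">init</span>, where it is unhandled.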
+When you type "a quick blown fox" with this input method, you get "A
+Quick Blown Fox". Now suppose you notice a typo: "blown" should be
+"brown". To correct it, you would probably move the cursor after "l" and type
+\<BackSpace\> and \<r\>. However, if the current input method is still
+active, a capital "R" is inserted. That is not very sophisticated behavior.
+@subsection im-surrounding-text Example of utilizing surrounding text support
+To make the input method work well also in such a case, we must use
+"surrounding text support". It is a way to check characters around
+the inputting spot and delete them if necessary. Note that
+this facility is available only with Gtk+ applications and Qt
+applications. You cannot use it with applications that use XIM
+to communicate with an input method.
+Before explaining how to utilize "surrounding text support", you must
+understand how to use variables, arithmetic comparisons, and
+conditional actions.
+First, any symbol (except for several reserved ones) used as ARG
+of an action is treated as a variable. For instance, the commands
+ (set X 32) (insert X)
+set the variable <span class="fragment">X</span> to integer value 32, then insert a character
+whose Unicode character code is 32 (i.e. SPACE).
+The second argument of the <span class="fragment">set</span> action can be an expression of this form:
+ (OPERATOR ARG1 [ARG2])
+Both ARG1 and ARG2 can be an expression. So,
+ (set X (+ (* Y 32) Z))
+sets <span class="fragment">X</span> to the value of <span class="fragment">Y * 32 + Z</span>.
+We have the following arithmetic/bitwise OPERATORs (require two arguments):
+ + - * / & |
+these relational OPERATORs (require two arguments):
+ = <= >= < >
+and this logical OPERATOR (requires one argument):
+ !
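+To make the evaluation order of nested expressions concrete, here is a
+minimal Python sketch of a resolver for such prefix expressions. It is an
+illustration only. The function <span class="fragment">resolve</span> and the tuple encoding are
+hypothetical, and three details are assumptions not stated in this tutorial:
+relational and logical results are 1 or 0, division is Python floor division,
+and equality is accepted under either spelling.
```python
import operator

# Two-argument operators. Relational ones yield 1 (true) or 0 (false),
# which is an assumption of this sketch.
BINARY = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.floordiv,  # assumption: integer (floor) division
    "&": operator.and_,
    "|": operator.or_,
    "=": operator.eq,
    "==": operator.eq,       # accepted as an alias for "="
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
}


def resolve(expr, variables):
    """Resolve an integer, a variable name, or a nested (op, arg...) tuple."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return variables[expr]
    op, *args = expr
    values = [resolve(arg, variables) for arg in args]
    if op == "!":
        (value,) = values          # exactly one argument
        return int(value == 0)
    left, right = values           # exactly two arguments
    return int(BINARY[op](left, right))


# (set X (+ (* Y 32) Z)) with Y = 2 and Z = 1
variables = {"Y": 2, "Z": 1}
variables["X"] = resolve(("+", ("*", "Y", 32), "Z"), variables)
```
+After the last line, <span class="fragment">variables["X"]</span> holds 2 * 32 + 1.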
+For surrounding text support, we have these reserved variables:
+ @-0, @-N, @+N (N is a positive integer)
+Their values are predefined as below and cannot be altered.
+<ul>
+<li> <span class="fragment">@-0</span>
+-1 if surrounding text is supported, -2 if not.
+<li> <span class="fragment">@-N</span>
+The Nth previous character in the preedit buffer. If there are only M
+(M<N) previous characters in it, the value is the (N-M)th previous
+character from the inputting spot.
+<li> <span class="fragment">@+N</span>
+The Nth following character in the preedit buffer. If there are only M
+(M<N) following characters in it, the value is the (N-M)th following
+character from the inputting spot.
+</ul>
+So, provided that you have this context:
+ ABC|def|GHI
+("def" is in the preedit buffer, two "|"s indicate borders between the
+preedit buffer and the surrounding text) and your current position in
+the preedit buffer is between "d" and "e", you get these values:
+ @-3 -- ?B
+ @-2 -- ?C
+ @-1 -- ?d
+ @+1 -- ?e
+ @+2 -- ?f
+ @+3 -- ?G
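+The lookup rule behind this table can be written out as a short Python
+function. This is a sketch of the rule as described above, not library code.
+The name <span class="fragment">surrounding_char</span> and its parameters are hypothetical, and returning
+<span class="fragment">None</span> when the surrounding text is too short is an assumption, since this
+tutorial does not say what happens in that case.
```python
def surrounding_char(offset, before, preedit, pos, after):
    """Return the character named by @-N (offset = -N) or @+N (offset = +N).

    before / after: surrounding text on each side of the preedit buffer.
    preedit: contents of the preedit buffer.
    pos: current position in the preedit buffer (0 = at its beginning).
    """
    if offset == 0:
        raise ValueError("@-0 is a support flag, not a character")
    n = abs(offset)
    if offset < 0:
        m = pos                      # previous characters in the buffer
        if n <= m:
            return preedit[pos - n]
        k = n - m                    # (N-M)th previous character outside
        return before[-k] if k <= len(before) else None
    m = len(preedit) - pos           # following characters in the buffer
    if n <= m:
        return preedit[pos + n - 1]
    k = n - m                        # (N-M)th following character outside
    return after[k - 1] if k <= len(after) else None


# The context ABC|def|GHI with the current position between "d" and "e".
context = dict(before="ABC", preedit="def", pos=1, after="GHI")
values = {n: surrounding_char(n, **context) for n in (-3, -2, -1, 1, 2, 3)}
```
+Here <span class="fragment">values</span> maps -3, -2, -1, 1, 2, 3 to "B", "C", "d", "e", "f", "G", matching the table.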
+Next, you have to understand the conditional action of this form:
+ (cond
+ (EXPR1 ACTION ACTION ...)
+ (EXPR2 ACTION ACTION ...)
+ ...)
+where EXPRn are expressions. When an input method executes this
+action, it resolves the values of EXPRn one by one from the first branch.
+As soon as the value of some EXPRn is resolved into nonzero, the
+corresponding actions are executed and the remaining branches are skipped.
+Now you are ready to write a new version of the input method "Titlecase".
+(input-method en titlecase2)
+(description (_ "Titlecase letters"))
+(title "abc->Abc")
+(map
+  (toupper ("a" "A") ("b" "B") ("c" "C") ("d" "D") ("e" "E")
+           ("f" "F") ("g" "G") ("h" "H") ("i" "I") ("j" "J")
+           ("k" "K") ("l" "L") ("m" "M") ("n" "N") ("o" "O")
+           ("p" "P") ("q" "Q") ("r" "R") ("s" "S") ("t" "T")
+           ("u" "U") ("v" "V") ("w" "W") ("x" "X") ("y" "Y")
+           ("z" "Z") ("ii" "İ")))
+(state
+  (init
+    (toupper
+      ;; Now we have exactly one uppercase character in the preedit
+      ;; buffer. So, "@-2" is the character just before the inputting
+      ;; spot.
+      (cond ((| (& (>= @-2 ?A) (<= @-2 ?Z))
+                (& (>= @-2 ?a) (<= @-2 ?z))
+                (= @-2 ?İ))
+             ;; If the character before the inputting spot is A..Z,
+             ;; a..z, or İ, remember the only character in the preedit
+             ;; buffer in the variable X and delete it.
+             (set X @-1) (delete @-)
+             ;; Then insert the lowercase version of X.
+             (cond ((= X ?İ) "i")
+                   (1 (set X (+ X 32)) (insert X))))))))
+The above example contains the new action <span class="fragment">delete</span>. So, it is time
+to explain more about the preedit buffer. The preedit buffer is a
+temporary place to store a sequence of characters. In this buffer,
+the input method keeps a position called the "current position". The
+current position exists between two characters, at the beginning of
+the buffer, or at the end of the buffer. The <span class="fragment">insert</span> action inserts
+characters before the current position. For instance, when your
+preedit buffer contains "ab.c" ("." indicates the current position),
+ (insert "xyz")
+changes the buffer to "abxyz.c".
+There are several predefined variables that represent a specific position in the
+preedit buffer. They are:
+<ul>
+<li> <span class="fragment">@@<, @=, @@></span>
+The first, current, and last positions.
+<li> <span class="fragment">@-, @+</span>
+The previous and the next positions.
+</ul>
+The format of the <span class="fragment">delete</span> action is this:
+ (delete POS)
+where POS is a predefined positional variable.
+The above action deletes the characters between POS and
+the current position. So, <span class="fragment">(delete @-)</span> deletes one character before
+the current position. Other examples of <span class="fragment">delete</span> include the following:
+ (delete @+) ; delete the next character
+ (delete @<) ; delete all the preceding characters in the buffer
+ (delete @>) ; delete all the following characters in the buffer
+You can change the current position using the <span class="fragment">move</span> action as below:
+ (move @-) ; move the current position to the position before the
+           ; previous character
+ (move @<) ; move to the first position
+Other positional variables work similarly.
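+The three buffer actions can be modeled with a few lines of Python. This is a
+sketch for illustration, not the library's data structure. The class name
+<span class="fragment">Preedit</span> and its method names are hypothetical, and clamping the previous
+and next positions at the ends of the buffer is an assumption.
```python
class Preedit:
    """Toy model of the preedit buffer with a current position."""

    def __init__(self, text="", pos=None):
        self.text = text
        # By default the current position is at the end of the buffer.
        self.pos = len(text) if pos is None else pos

    def _resolve(self, name):
        """Turn a positional variable into an index into the buffer."""
        positions = {
            "@<": 0,                                  # first position
            "@=": self.pos,                           # current position
            "@>": len(self.text),                     # last position
            "@-": max(self.pos - 1, 0),               # previous (clamped)
            "@+": min(self.pos + 1, len(self.text)),  # next (clamped)
        }
        return positions[name]

    def insert(self, s):
        """Insert s before the current position."""
        self.text = self.text[:self.pos] + s + self.text[self.pos:]
        self.pos += len(s)

    def delete(self, name):
        """Delete the characters between the named position and pos."""
        lo, hi = sorted((self._resolve(name), self.pos))
        self.text = self.text[:lo] + self.text[hi:]
        self.pos = lo

    def move(self, name):
        """Move the current position to the named position."""
        self.pos = self._resolve(name)


buf = Preedit("abc", pos=2)   # "ab.c"
buf.insert("xyz")             # "abxyz.c"
```
+Starting from "ab.c", the insert gives "abxyz.c" as in the example above. A
+following <span class="fragment">buf.delete("@-")</span> would remove the "z".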
+Let's see how our new example works. Whatever the key event is, the
+input method is in its only state, <span class="fragment">init</span>. Since an event of a lowercase letter
+key is first handled by MAP-ACTIONs, every such key is changed into the
+corresponding uppercase letter and put into the preedit buffer. Now this character
+can be accessed with <span class="fragment">@-1</span>.
+How can we tell whether the new character should be a lowercase or an
+uppercase? We can do so by checking the character before it, i.e.
+<span class="fragment">@-2</span>. BRANCH-ACTIONs in the <span class="fragment">init</span> state do the job.
+It first checks whether the character <span class="fragment">@-2</span> is between A and Z, between
+a and z, or İ by the conditional below.
+ (cond ((| (& (>= @-2 ?A) (<= @-2 ?Z))
+ (& (>= @-2 ?a) (<= @-2 ?z))
+ (= @-2 ?İ))
+If not, nothing special needs to be done. If so, our new key should
+be changed back into lowercase. Since the uppercase character is
+already in the preedit buffer, we retrieve and remember it in the
+variable <span class="fragment">X</span> by
+ (set X @-1)
+and then delete that character by
+ (delete @-)
+Lastly we re-insert the character in its lowercase form. The
+problem here is that "İ" must be changed into "i", so we need another
+conditional. The first branch
+ ((= X ?İ) "i")
+means that "if the character remembered in X is 'İ', 'i' is inserted".
+The second branch
+ (1 (set X (+ X 32)) (insert X))
+starts with "1", which is always resolved into nonzero, so this branch
+is a catchall. Actions in this branch increase <span class="fragment">X</span> by 32, then
+insert <span class="fragment">X</span>. In other words, they change A...Z into a...z
+respectively and insert the resulting lowercase character into the
+preedit buffer. As the input method reaches the end of the
+BRANCH-ACTIONs, the character is committed.
+This new input method always checks the character before the current
+position, so "A Quick Blown Fox" will be successfully fixed to "A
+Quick Brown Fox" by the key sequence \<BackSpace\> \<r\>.
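+The decision made by the BRANCH-ACTIONs can be checked with a short Python
+sketch. It mirrors the logic explained above and is not the library's code.
+The names <span class="fragment">TOUPPER</span>, <span class="fragment">is_letter</span>, and <span class="fragment">handle</span> are hypothetical, a key
+sequence is passed in already matched (so "ii" is one call), and an unhandled
+key is assumed to be inserted by the application as it is.
```python
import string

TOUPPER = {c: c.upper() for c in string.ascii_lowercase}
TOUPPER["ii"] = "İ"


def is_letter(ch):
    """True for A..Z, a..z, or İ, like the outer cond expression."""
    return ch is not None and (
        "A" <= ch <= "Z" or "a" <= ch <= "z" or ch == "İ"
    )


def handle(keyseq, before):
    """Return the text before the inputting spot after typing keyseq.

    before: the text that precedes the inputting spot in the application.
    """
    if keyseq not in TOUPPER:
        # Assumption: the application inserts an unhandled key unchanged.
        return before + keyseq
    x = TOUPPER[keyseq]                      # the preedit holds x, so @-1 is x
    prev = before[-1] if before else None    # this is @-2
    if is_letter(prev):
        # Delete the uppercase character and insert its lowercase form.
        x = "i" if x == "İ" else chr(ord(x) + 32)
    return before + x


# Fixing "A Quick Blown Fox": cursor after "l", BackSpace, then "r".
before, after = "A Quick Bl", "own Fox"
before = before[:-1]                         # effect of BackSpace
fixed = handle("r", before) + after
```
+In this run <span class="fragment">fixed</span> is "A Quick Brown Fox", because the "B" before the
+inputting spot makes the new "R" fall back to lowercase.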