+++ /dev/null
-/* -*- coding: utf-8; -*- */
-/*** @page m17nDBTutorial Tutorial for writing the m17n database
-
-This section contains tutorials for writing various database files of
-the m17n database.
-
-<ul>
-<li> @ref mdbTutorialIM "TutorialIM" -- Tutorial of input method
-</ul>
-*/
-/***
-
-@section mdbTutorialIM Tutorial of input method
-
-@subsection im-struct Structure of an input method file
-
-An input method is defined in a *.mim file with this format.
-
-@verbatim
-(input-method LANG NAME)
-
-(description (_ "DESCRIPTION"))
-
-(title "TITLE-STRING")
-
-(map
- (MAP-NAME
- (KEYSEQ MAP-ACTION MAP-ACTION ...) <- rule
- (KEYSEQ MAP-ACTION MAP-ACTION ...) <- rule
- ...)
- (MAP-NAME
- (KEYSEQ MAP-ACTION MAP-ACTION ...) <- rule
- (KEYSEQ MAP-ACTION MAP-ACTION ...) <- rule
- ...)
- ...)
-
-(state
- (STATE-NAME
- (MAP-NAME BRANCH-ACTION BRANCH-ACTION ...) <- branch
- ...)
- (STATE-NAME
- (MAP-NAME BRANCH-ACTION BRANCH-ACTION ...) <- branch
- ...)
- ...)
-@endverbatim
-Lowercase letters and parentheses are literals, so they must be
-written as they are. Uppercase letters represent arbitrary strings.
-
-KEYSEQ specifies a sequence of keys in this format:
-@verbatim
- (SYMBOLIC-KEY SYMBOLIC-KEY ...)
-@endverbatim
-where SYMBOLIC-KEY is the keysym value returned by the xev command.
-For instance
-@verbatim
- (n i)
-@endverbatim
-represents a key sequence of \<n\> and \<i\>.
-If all SYMBOLIC-KEYs are ASCII characters, you can use the short form
-@verbatim
- "ni"
-@endverbatim
-instead. Consult #mdbIM for Non-ASCII characters.
-
-Both MAP-ACTION and BRANCH-ACTION are a sequence of actions of this format:
-@verbatim
- (ACTION ARG ARG ...)
-@endverbatim
-The most common action is <span class="fragment">insert</span>, which is written as this:
-@verbatim
- (insert "TEXT")
-@endverbatim
-But as it is very frequently used, you can use the short form
-@verbatim
- "TEXT"
-@endverbatim
-If <span class="fragment">"TEXT"</span> contains only one character "C", you can write it as
-@verbatim
- (insert ?C)
-@endverbatim
-or even shorter as
-@verbatim
- ?C
-@endverbatim
-So the shortest notation for an action of inserting "a" is
-@verbatim
- ?a
-@endverbatim
-
-@subsection im-upcase Simple example of capslock
-
-Here is a simple example of an input method that works as CapsLock.
-
-@verbatim
-(input-method en capslock)
-(description (_ "Upcase all lowercase letters"))
-(title "a->A")
-(map
- (toupper ("a" "A") ("b" "B") ("c" "C") ("d" "D") ("e" "E")
- ("f" "F") ("g" "G") ("h" "H") ("i" "I") ("j" "J")
- ("k" "K") ("l" "L") ("m" "M") ("n" "N") ("o" "O")
- ("p" "P") ("q" "Q") ("r" "R") ("s" "S") ("t" "T")
- ("u" "U") ("v" "V") ("w" "W") ("x" "X") ("y" "Y")
- ("z" "Z")))
-(state
- (init (toupper)))
-@endverbatim
-
-When this input method is activated, it is in the initial condition of
-the first state (in this case, the only state <span class="fragment">init</span>). In the
-initial condition, no key is being processed and no action is
-suspended. When the input method receives a key event \<a\>, it
-searches branches in the current state for a rule that matches \<a\>
-and finds one in the map <span class="fragment">toupper</span>. Then it executes MAP-ACTIONs
-(in this case, just inserting "A" in the preedit buffer). After all
-MAP-ACTIONs have been executed, the input method shifts to the initial
-condition of the current state.
-
-The shift to <em>the initial condition of the first state</em> has a special
-meaning; it commits all characters in the preedit buffer then clears
-the preedit buffer.
-
-As a result, "A" is given to the application program.
-
-When a key event does not match with any rule in the current state,
-that event is unhandled and given back to the application program.
-
-Turkish users may want to extend the above example for "İ" (U+0130:
-LATIN CAPITAL LETTER I WITH DOT ABOVE). It seems that assigning the
-key sequence \<i\> \<i\> for that character is convenient. So, he
-will add this rule in <span class="fragment">toupper</span>.
-
-@verbatim
- ("ii" "İ")
-@endverbatim
-
-However, we already have the following rule:
-
-@verbatim
- ("i" "I")
-@endverbatim
-
-What will happen when a key event \<i\> is sent to the input method?
-
-No problem. When the input method receives \<i\>, it inserts "I" in the
-preedit buffer. It knows that there is another rule that may
-match the additional key event \<i\>. So, after inserting "I", it
-suspends the normal behavior of shifting to the initial condition, and
-waits for another key. Thus, the user sees "I" with underline, which
-indicates it is not yet committed.
-
-When the input method receives the next \<i\>, it cancels the effects
-done by the rule for the previous "i" (in this case, the preedit buffer is
-cleared), and executes MAP-ACTIONs of the rule for "ii". So, "İ" is
-inserted in the preedit buffer. This time, as there are no other rules
-that match with an additional key, it shifts to the initial condition
-of the current state, which leads to commit "İ".
-
-Then, what will happen when the next key event is \<a\> instead of \<i\>?
-
-No problem, either.
-
-The input method knows that there are no rules that match the \<i\> \<a\> key
-sequence. So, when it receives the next \<a\>, it executes the
-suspended behavior (i.e. shifting to the initial condition), which
-leads to commit "I". Then the input method tries to handle \<a\> in
-the current state, which leads to commit "A".
-
-So far, we have explained MAP-ACTION, but not
-BRANCH-ACTION. The format of BRANCH-ACTION is the same as that of MAP-ACTION.
-It is executed only after a matching rule has been determined and the
-corresponding MAP-ACTIONs have been executed. A typical use of
-BRANCH-ACTION is to shift to a different state.
-
-To see this effect, let us modify the current input method to upcase only
-word-initial letters (i.e. to capitalize). For that purpose,
-we modify the "init" state as this:
-
-@verbatim
- (init
- (toupper (shift non-upcase)))
-@endverbatim
-
-Here <span class="fragment">(shift non-upcase)</span> is an action to shift to the new state
-<span class="fragment">non-upcase</span>, which has two branches as below:
-
-@verbatim
- (non-upcase
- (lower)
- (nil (shift init)))
-@endverbatim
-
-The first branch is simple. We can define the new map <span class="fragment">lower</span> as the
-following to insert lowercase letters as they are.
-
-@verbatim
-(map
- ...
- (lower ("a" "a") ("b" "b") ("c" "c") ("d" "d") ("e" "e")
- ("f" "f") ("g" "g") ("h" "h") ("i" "i") ("j" "j")
- ("k" "k") ("l" "l") ("m" "m") ("n" "n") ("o" "o")
- ("p" "p") ("q" "q") ("r" "r") ("s" "s") ("t" "t")
- ("u" "u") ("v" "v") ("w" "w") ("x" "x") ("y" "y")
- ("z" "z")))
-@endverbatim
-
-The second branch has a special meaning. The map name <span class="fragment">nil</span> means
-that it matches with any key event that does not match any rules in the
-other maps in the current state. In addition, it does not
-consume any key event. We will show the full code of the new input
-method before explaining how it works.
-
-@verbatim
-(input-method en titlecase)
-(description (_ "Titlecase letters"))
-(title "abc->Abc")
-(map
- (toupper ("a" "A") ("b" "B") ("c" "C") ("d" "D") ("e" "E")
- ("f" "F") ("g" "G") ("h" "H") ("i" "I") ("j" "J")
- ("k" "K") ("l" "L") ("m" "M") ("n" "N") ("o" "O")
- ("p" "P") ("q" "Q") ("r" "R") ("s" "S") ("t" "T")
- ("u" "U") ("v" "V") ("w" "W") ("x" "X") ("y" "Y")
- ("z" "Z") ("ii" "İ"))
- (lower ("a" "a") ("b" "b") ("c" "c") ("d" "d") ("e" "e")
- ("f" "f") ("g" "g") ("h" "h") ("i" "i") ("j" "j")
- ("k" "k") ("l" "l") ("m" "m") ("n" "n") ("o" "o")
- ("p" "p") ("q" "q") ("r" "r") ("s" "s") ("t" "t")
- ("u" "u") ("v" "v") ("w" "w") ("x" "x") ("y" "y")
- ("z" "z")))
-(state
- (init
- (toupper (shift non-upcase)))
- (non-upcase
- (lower (commit))
- (nil (shift init))))
-@endverbatim
-
-Let's see what happens when the user types the key sequence \<a\> \<b\> \< \>.
-Upon \<a\>, "A" is committed and the state shifts to <span class="fragment">non-upcase</span>.
-So, the next \<b\> is handled in the <span class="fragment">non-upcase</span> state.
-As it matches a
-rule in the map <span class="fragment">lower</span>, "b" is inserted in the preedit buffer and it
-is committed explicitly by the "commit" command in BRANCH-ACTION. After
-that, the input method is still in the <span class="fragment">non-upcase</span> state. So the next \< \>
-is also handled in <span class="fragment">non-upcase</span>. For this time, no rule in this state
-matches it. Thus the branch <span class="fragment">(nil (shift init))</span> is selected and the
-state is shifted to <span class="fragment">init</span>. Please note that \< \> is not yet
-handled because the map <span class="fragment">nil</span> does not consume any key event.
-So, the input method tries to handle it in the <span class="fragment">init</span> state. Again no
-rule matches it. Therefore, that event is given back to the application
-program, which usually inserts a space for that.
-
-When you type "a quick blown fox" with this input method, you get "A
-Quick Blown Fox". OK, you find a typo in "blown", which should be
-"brown". To correct it, you probably move the cursor after "l" and type
-\<Backspace\> and \<r\>. However, if the current input method is still
-active, a capital "R" is inserted. It is not a sophisticated
-behavior.
-
-@subsection im-surrounding-text Example of utilizing surrounding text support
-
-To make the input method work well also in such a case, we must use
-"surrounding text support". It is a way to check characters around
-the inputting spot and delete them if necessary. Note that
-this facility is available only with Gtk+ applications and Qt
-applications. You cannot use it with applications that use XIM
-to communicate with an input method.
-
-Before explaining how to utilize "surrounding text support", you must
-understand how to use variables, arithmetic comparisons, and
-conditional actions.
-
-At first, any symbol (except for several preserved ones) used as ARG
-of an action is treated as a variable. For instance, the commands
-
-@verbatim
- (set X 32) (insert X)
-@endverbatim
-
-set the variable <span class="fragment">X</span> to integer value 32, then insert a character
-whose Unicode character code is 32 (i.e. SPACE).
-
-The second argument of the <span class="fragment">set</span> action can be an expression of this form:
-
-@verbatim
- (OPERAND ARG1 [ARG2])
-@endverbatim
-
-Both ARG1 and ARG2 can be an expression. So,
-
-@verbatim
- (set X (+ (* Y 32) Z))
-@endverbatim
-
-sets <span class="fragment">X</span> to the value of <span class="fragment">Y * 32 + Z</span>.
-
-We have the following arithmetic/bitwise OPERANDs (require two arguments):
-
-@verbatim
- + - * / & |
-@endverbatim
-
-these relational OPERANDs (require two arguments):
-
-@verbatim
- == <= >= < >
-@endverbatim
-
-and this logical OPERAND (requires one argument):
-
-@verbatim
- !
-@endverbatim
-
-For surrounding text support, we have these preserved variables:
-
-@verbatim
- @-0, @-N, @+N (N is a positive integer)
-@endverbatim
-
-The values of them are predefined as below and can not be altered.
-
-<ul>
-<li> <span class="fragment">@-0</span>
-
--1 if surrounding text is supported, -2 if not.
-
-<li> <span class="fragment">@-N</span>
-
-The Nth previous character in the preedit buffer. If there are only M
-(M<N) previous characters in it, the value is the (N-M)th previous
-character from the inputting spot.
-
-<li> <span class="fragment">@+N</span>
-
-The Nth following character in the preedit buffer. If there are only M
-(M<N) following characters in it, the value is the (N-M)th following
-character from the inputting spot.
-
-</ul>
-
-So, provided that you have this context:
-
-@verbatim
- ABC|def|GHI
-@endverbatim
-
-("def" is in the preedit buffer, two "|"s indicate borders between the
-preedit buffer and the surrounding text) and your current position in
-the preedit buffer is between "d" and "e", you get these values:
-
-@verbatim
- @-3 -- ?B
- @-2 -- ?C
- @-1 -- ?d
- @+1 -- ?e
- @+2 -- ?f
- @+3 -- ?G
-@endverbatim
-
-Next, you have to understand the conditional action of this form:
-
-@verbatim
- (cond
- (EXPR1 ACTION ACTION ...)
- (EXPR2 ACTION ACTION ...)
- ...)
-@endverbatim
-
-where EXPRn are expressions. When an input method executes this
-action, it resolves the values of EXPRn one by one from the first branch.
-If the value of EXPRn is resolved into nonzero, the corresponding
-actions are executed.
-
-Now you are ready to write a new version of the input method "Titlecase".
-
-@verbatim
-(input-method en titlecase2)
-(description (_ "Titlecase letters"))
-(title "abc->Abc")
-(map
- (toupper ("a" "A") ("b" "B") ("c" "C") ("d" "D") ("e" "E")
- ("f" "F") ("g" "G") ("h" "H") ("i" "I") ("j" "J")
- ("k" "K") ("l" "L") ("m" "M") ("n" "N") ("o" "O")
- ("p" "P") ("q" "Q") ("r" "R") ("s" "S") ("t" "T")
- ("u" "U") ("v" "V") ("w" "W") ("x" "X") ("y" "Y")
- ("z" "Z") ("ii" "İ")))
-(state
- (init
- (toupper
-
- ;; Now we have exactly one uppercase character in the preedit
- ;; buffer. So, "@-2" is the character just before the inputting
- ;; spot.
-
- (cond ((| (& (>= @-2 ?A) (<= @-2 ?Z))
- (& (>= @-2 ?a) (<= @-2 ?z))
- (= @-2 ?İ))
-
- ;; If the character before the inputting spot is A..Z,
- ;; a..z, or İ, remember the only character in the preedit
- ;; buffer in the variable X and delete it.
-
- (set X @-1) (delete @-)
-
- ;; Then insert the lowercase version of X.
-
- (cond ((= X ?İ) "i")
- (1 (set X (+ X 32)) (insert X))))))))
-@endverbatim
-
-The above example contains the new action <span class="fragment">delete</span>. So, it is time
-to explain more about the preedit buffer. The preedit buffer is a
-temporary place to store a sequence of characters. In this buffer,
-the input method keeps a position called the "current position". The
-current position exists between two characters, at the beginning of
-the buffer, or at the end of the buffer. The <span class="fragment">insert</span> action inserts
-characters before the current position. For instance, when your
-preedit buffer contains "ab.c" ("." indicates the current position),
-
-@verbatim
- (insert "xyz")
-@endverbatim
-
-changes the buffer to "abxyz.c".
-
-There are several predefined variables that represent a specific position in the
-preedit buffer. They are:
-
-<ul>
-<li> <span class="fragment">@@<, @=, @@></span>
-
-The first, current, and last positions.
-
-<li> <span class="fragment">@-, @+</span>
-
-The previous and the next positions.
-</ul>
-
-The format of the <span class="fragment">delete</span> action is this:
-
-@verbatim
- (delete POS)
-@endverbatim
-
-where POS is a predefined positional variable.
-The above action deletes the characters between POS and
-the current position. So, <span class="fragment">(delete @-)</span> deletes one character before
-the current position. The other examples of <span class="fragment">delete</span> include the followings:
-
-@verbatim
- (delete @+) ; delete the next character
- (delete @<) ; delete all the preceding characters in the buffer
- (delete @>) ; delete all the following characters in the buffer
-@endverbatim
-
-You can change the current position using the <span class="fragment">move</span> action as below:
-
-@verbatim
- (move @-) ; move the current position to the position before the
- previous character
- (move @<) ; move to the first position
-@endverbatim
-
-Other positional variables work similarly.
-
-Let's see how our new example works. Whatever a key event is, the
-input method is in its only state, <span class="fragment">init</span>. Since an event of a lower letter
-key is firstly handled by MAP-ACTIONs, every key is changed into the
-corresponding uppercase and put into the preedit buffer. Now this character
-can be accessed with <span class="fragment">@-1</span>.
-
-How can we tell whether the new character should be a lowercase or an
-uppercase? We can do so by checking the character before it, i.e.
-<span class="fragment">@-2</span>. BRANCH-ACTIONs in the <span class="fragment">init</span> state do the job.
-
-It first checks if the character <span class="fragment">@-2</span> is between A to Z, between
-a to z, or İ by the conditional below.
-
-@verbatim
- (cond ((| (& (>= @-2 ?A) (<= @-2 ?Z))
- (& (>= @-2 ?a) (<= @-2 ?z))
- (= @-2 ?İ))
-@endverbatim
-
-If not, there is nothing to do specially. If so, our new key should
-be changed back into lowercase. Since the uppercase character is
-already in the preedit buffer, we retrieve and remember it in the
-variable <span class="fragment">X</span> by
-
-@verbatim
- (set X @-1)
-@endverbatim
-
-and then delete that character by
-
-@verbatim
- (delete @-)
-@endverbatim
-
-Lastly we re-insert the character in its lowercase form. The
-problem here is that "İ" must be changed into "i", so we need another
-conditional. The first branch
-
-@verbatim
- ((= X ?İ) "i")
-@endverbatim
-
-means that "if the character remembered in X is 'İ', 'i' is inserted".
-
-The second branch
-
-@verbatim
- (1 (set X (+ X 32)) (insert X))
-@endverbatim
-
-starts with "1", which is always resolved into nonzero, so this branch
-is a catchall. Actions in this branch increase <span class="fragment">X</span> by 32, then
-insert <span class="fragment">X</span>. In other words, they change A...Z into a...z
-respectively and insert the resulting lowercase character into the
-preedit buffer. As the input method reaches the end of the
-BRANCH-ACTIONs, the character is commited.
-
-This new input method always checks the character before the current
-position, so "A Quick Blown Fox" will be successfully fixed to "A
-Quick Brown Fox" by the key sequence \<BackSpace\> \<r\>.
-
-
-*/