+/* Copyright (C) 2007
+ National Institute of Advanced Industrial Science and Technology (AIST)
+ Registration Number H15PRO112
+ See the end for copying conditions. */
+
+/***
+
+@htmlonly
+<style type="text/css">
+<!--
+.red { color:red }
+-->
+</style>
+@endhtmlonly
+
+@page mdbTutorialIM Tutorial of input method
+
+@section im-struct Structure of an input method file
+
An input method is defined in a *.mim file with this format.
+@verbatim
(input-method LANG NAME)
(description (_ "DESCRIPTION"))
(MAP-NAME BRANCH-ACTION BRANCH-ACTION ...) <- branch
...)
...)
-
+@endverbatim
Lowercase letters and parentheses are literals, so they must be
written as they are. Uppercase letters represent arbitrary strings.
KEYSEQ specifies a sequence of keys in this format:
-
- (SYMBOLIC-KEY BYMBOLIC-KEY ...)
-
-where SYMBOLIC-KEY is the keysym
+@verbatim
+ (SYMBOLIC-KEY SYMBOLIC-KEY ...)
+@endverbatim
+where SYMBOLIC-KEY is the keysym value returned by the xev command.
For instance,
-
- (n i)
-
-represents a key sequence of <n> and <i>. If all SYMBOLIC-KEYs are
-ASCII characters, you can use the short form
-
- "ni"
-
-instead. Consult the file IM.txt for Non-ASCII characters.
+@verbatim
+ (n i)
+@endverbatim
+represents a key sequence of <<n>> and <<i>>.
+If all SYMBOLIC-KEYs are ASCII characters, you can use the short form
+@verbatim
+ "ni"
+@endverbatim
+instead. Consult #mdbIM for non-ASCII characters.
MAP-ACTION and BRANCH-ACTION are sequences of actions, each of this format:
-
+@verbatim
(ACTION ARG ARG ...)
-
-The most common action is "insert" which can be written as this:
-
+@endverbatim
+The most common action is [[insert]], which is written like this:
+@verbatim
(insert "TEXT")
-
+@endverbatim
As it is used so frequently, you can also use the short form
-
+@verbatim
"TEXT"
-
-When TEXT contains only one character <C>, you can write it as
-
+@endverbatim
+If [["TEXT"]] contains only one character "C", you can write it as
+@verbatim
(insert ?C)
-
+@endverbatim
or just
-
+@verbatim
?C
-
-Therefore the shortest notation for an action of inserting <a> is
-
+@endverbatim
+So the shortest notation for an action of inserting "a" is
+@verbatim
?a
+@endverbatim
-Here is a simple example:
+@section im-upcase Simple example of CapsLock
----upcase.mim---------------------------------------------------------
-(input-method en upcase)
+Here is a simple example of an input method that works like CapsLock.
+
+@verbatim
+(input-method en capslock)
(description (_ "Upcase all lowercase letters"))
(title "a->A")
(map
(toupper ("a" "A") ("b" "B") ... ("z" "Z")))
(state
(init (toupper)))
-----------------------------------------------------------------------
+@endverbatim
When this input method is activated, it is in the initial condition of
-the first state (in this case, it is "init"). In the initial condition,
+the first state (in this case, it is [[init]]). In the initial condition,
no key is being processed and no action is suspended. When it
-receives a key event <a>, it searches branches in the current state
-for a rule that matches with <a> and finds one in the map "toupper".
+receives a key event <<a>>, it searches branches in the current state
+for a rule that matches <<a>> and finds one in the map [[toupper]].
Then it executes MAP-ACTIONs (in this case, just inserting "A" in the
-preedit buffer). After all MAP-ACTIONs are executed, the input
+preedit buffer). After all MAP-ACTIONs have been executed, the input
method shifts to the initial condition of the current state.
-The shift to "the initial condition of the first state" has a special
+The shift to <em>the initial condition of the first state</em> has a special
meaning: it commits all characters in the preedit buffer and then clears
the preedit buffer.
When a key event does not match any rule in the current state,
the event is left unhandled and is given back to the application program.
-A Turkish user may want to extend the above example for "İ" (U+0130:
+Turkish users may want to extend the above example for "İ" (U+0130:
LATIN CAPITAL LETTER I WITH DOT ABOVE). It seems that assigning the
-key sequence <i> <i> for that character is convenient. So, he will
-add this rule in "toupper".
+key sequence <<i>> <<i>> for that character is convenient. So, they
+would add this rule to [[toupper]]:
+@verbatim
("ii" "İ")
+@endverbatim
But we already have this rule too:
+@verbatim
("i" "I")
+@endverbatim
What will happen when a key event <<i>> is sent to the input method?
-No problem. When the input method receives <i>, it inserts "I" in the
+No problem. When the input method receives <<i>>, it inserts "I" in the
preedit buffer. However, it knows that there is another rule that may
-match with the additional key event <i>. So, after inserting "I", it
+match the additional key event <<i>>. So, after inserting "I", it
suspends the normal behavior of shifting to the initial condition, and
waits for another key. Thus, the user sees "I" underlined, which
indicates it is not yet committed.
-When the input method receives the next <i>, it cancels the effects
+When the input method receives the next <<i>>, it cancels the effects
done by the rule for the previous "i" (in this case, the preedit buffer is
cleared), and executes MAP-ACTIONs of the rule for "ii". So, "İ" is
inserted in the preedit buffer. This time, as there are no other rules
that match an additional key, it shifts to the initial condition
of the current state, and commits "İ".
-Then, what will happen when the next key event is <a> instead of <i>?
+Then, what will happen when the next key event is <<a>> instead of <<i>>?
No problem, either.
-The input method knows that there are no rules that match the <i> <a> key
-sequence. So, when it receives the next <a>, it executes the
+The input method knows that there are no rules that match the <<i>> <<a>> key
+sequence. So, when it receives the next <<a>>, it executes the
suspended behavior (i.e. shifting to the initial condition), which
-leads to commit "S". Then the input method tries to handle <a> in the current
-state, which leads to commit "A".
+commits "I". Then the input method tries to handle <<a>> in
+the current state, which commits "A".
So far, we have explained MAP-ACTION, but not
BRANCH-ACTION. The format of BRANCH-ACTION is the same as that of MAP-ACTION.
letter only at the beginning of a word (i.e. capitalizing). For that purpose,
we modify the [[init]] state as follows:
+@verbatim
(init
(toupper (shift non-upcase)))
+@endverbatim
-Here "(shift non-upcase)" is an action to shift to the new state
-"non-upcase", which has two branches as below:
+Here [[(shift non-upcase)]] is an action to shift to the new state
+[[non-upcase]], which has two branches as below:
+@verbatim
(non-upcase
(lower)
(nil (shift init)))
+@endverbatim
-The first branch is simple. We can define the new map "lower" as the
+The first branch is simple. We can define a new map [[lower]] as
follows to insert lowercase letters as they are:
+@verbatim
(map
...
(lower ("a" "a") ("b" "b") ... ("z" "z")))
+@endverbatim
-The second branch has a special meaning. The map name "nil" means
+The second branch has a special meaning. The map name [[nil]] means
that it matches any key event that does not match any rule in the
other maps in the same state. In addition, it does not
consume any key event. We will show the full code of the new input
method before explaining how it works.
----titlecase.mim------------------------------------------------------
+@verbatim
(input-method en titlecase)
(description (_ "Titlecase letters"))
(title "abc->Abc")
(map
- (toupper ("a" "A") ("b" "B") ... ("z" "Z") ("ii" "İ"))
+ (toupper ("a" "A") ("b" "B") ... ("z" "Z") ("ii" "İ"))
(lower ("a" "a") ("b" "b") ... ("z" "z")))
(state
(init
(toupper (shift non-upcase)))
(non-upcase
(lower (commit))
(nil (shift init))))
-----------------------------------------------------------------------
-
-Let's see what happens when the user types the keys <a> <b> < >.
+@endverbatim
-Upon <a>, "A" is committed and the state shifts to "non-upcase".
-So, the next <b> is handled in the "non-upcase" state. As it matches with a
+Let's see what happens when a user types the keys <<a>> <<b>> << >>.
+Upon <<a>>, "A" is committed and the state shifts to [[non-upcase]].
+So, the next <<b>> is handled in the [[non-upcase]] state.
+As it matches a
rule in the map [[lower]], "b" is inserted in the preedit buffer and it
is committed explicitly by the [[commit]] action in the BRANCH-ACTION. After
-that, the input method is still in "non-upcase" state. So the next < >
-is also handled in "non-upcase". This time, as no rule in this state
-matches with it, the branch "(nil (shift init))" is selected, and the
-state is changed to init. Please note that < > is not yet handled.
-So, the input method tries to handle it in the "init" state. Again no
-rule matches with it. Therefore, that event is given back to an application
+that, the input method is still in the [[non-upcase]] state. So the next << >>
+is also handled in [[non-upcase]]. This time, as no rule in this state
+matches it, the branch [[(nil (shift init))]] is selected, and the
+state is changed to [[init]]. Please note that << >> is not yet handled.
+So, the input method tries to handle it in the [[init]] state. Again no
+rule matches it. Therefore, that event is given back to the application
program, which usually inserts a space for that.
When you type "a quick blown fox" with this input method, you get "A
Quick Blown Fox". Suppose you then find a typo in "blown": it should be
"brown". To correct it, you move the cursor after "l" and type
-Backspace and <r>. However, if the current input method is still
+<<BackSpace>> and <<r>>. However, if the current input method is still
active, a capital "R" is inserted, which is not very sophisticated
behavior.
+@section im-surrounding-text Example of using surrounding text support
+
To make the input method work well in such cases too, we must use
"surrounding text support". It is a way to check characters around
the inputting spot and delete them if necessary. Please note that
First, any symbol (except for several reserved ones) used as ARG
of an action is treated as a variable. For instance, the commands
+@verbatim
(set X 32) (insert X)
+@endverbatim
-set the variable "X" to integer value 32, then insert a character
+set the variable [[X]] to integer value 32, then insert a character
whose Unicode character code is 32 (i.e. SPACE).
-The second argument of the "set" action can be an expression of this form:
+The second argument of the [[set]] action can be an expression of this form:
+@verbatim
(OPERATOR ARG1 [ARG2])
+@endverbatim
Both ARG1 and ARG2 can themselves be expressions. So,
- (set X (+ (* Y 32) Z)) --- set "X" to the value of "Y*32+Z"
+@verbatim
+ (set X (+ (* Y 32) Z))
+@endverbatim
+
+sets [[X]] to the value of [[Y * 32 + Z]].
We have the following arithmetic/bitwise OPERATORs (each takes two arguments):
+@verbatim
+ - * / & |
+@endverbatim
these relational OPERATORs (each takes two arguments):
+@verbatim
= <= >= < >
+@endverbatim
and this logical OPERATOR (takes one argument):
+@verbatim
!
+@endverbatim
For surrounding text support, we have these reserved variables:
+@verbatim
@-0, @-N, @+N (N is a natural number)
+@endverbatim
Their values are predefined as below and cannot be altered.
- @-0 -- -1 if surrounding text is supported, -2 if not.
- @-N -- The Nth previous character in the preedit buffer. If there is only
- M (M<N) previous characters in it, the value is the (N-M)th previous
- character from the inputting spot.
- @+N -- The Nth following character in the preedit buffer. If there is only
- M (M<N) following characters in it, the value is the (N-M)th following
- character from the inputting spot.
+<ul>
+<li> [[@-0]]
+
+-1 if surrounding text is supported, -2 if not.
+
+<li> [[@-N]]
+
+The Nth previous character in the preedit buffer. If there are only M
+(M<N) previous characters in it, the value is the (N-M)th previous
+character from the inputting spot.
+
+<li> [[@+N]]
+
+The Nth following character in the preedit buffer. If there are only M
+(M<N) following characters in it, the value is the (N-M)th following
+character from the inputting spot.
+
+</ul>
So, provided that you have this context:
+@verbatim
ABC|def|GHI
+@endverbatim
("def" is in the preedit buffer, two "|"s indicate borders between the
preedit buffer and the surrounding text) and your current position in
the preedit buffer is between "d" and "e", we have these values:
+@verbatim
@-3 -- ?B
@-2 -- ?C
@-1 -- ?d
@+1 -- ?e
@+2 -- ?f
@+3 -- ?G
+@endverbatim
Next, you have to understand a conditional action of this form:
+@verbatim
(cond
(EXPR1 ACTION ACTION ...)
(EXPR2 ACTION ACTION ...)
...)
+@endverbatim
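where EXPRn are expressions. When an input method executes this
action, it evaluates EXPR1, EXPR2, ... one by one, starting from the first
branch. At the first EXPRn whose value is nonzero, it executes the
ACTIONs of that branch and ignores the remaining branches.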
Now you are ready to write a new version of the input method "Titlecase".
----titlecase2.mim------------------------------------------------------
-(input-method en titlecase)
+@verbatim
+(input-method en titlecase2)
(description (_ "Titlecase letters"))
(title "abc->Abc")
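(map
(toupper ("a" "A") ("b" "B") ("c" "C") ("d" "D") ("e" "E")
("f" "F") ("g" "G") ("h" "H") ("i" "I") ("j" "J")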
("k" "K") ("l" "L") ("m" "M") ("n" "N") ("o" "O")
("p" "P") ("q" "Q") ("r" "R") ("s" "S") ("t" "T")
("u" "U") ("v" "V") ("w" "W") ("x" "X") ("y" "Y")
- ("z" "Z") ("ii" "İ")))
+ ("z" "Z") ("ii" "İ")))
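(state
(init
(toupper
(cond ((| (& (>= @-2 ?A) (<= @-2 ?Z))
(& (>= @-2 ?a) (<= @-2 ?z))
(= @-2 ?İ))
(set X @-1)
(delete @-)
(cond ((= X ?İ) "i")
(1 (set X (+ X 32)) (insert X))))))))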
-----------------------------------------------------------------------
+@endverbatim
-The above example contains a new action "delete". So, it is time to
+The above example contains a new action [[delete]]. So, it is time to
explain more about the preedit buffer. The preedit buffer is a
temporary place to store a sequence of characters. In this buffer, the
-input method keeps one position called the "current position". The current
+input method keeps a position called the "current position". The current
position exists between two characters, at the beginning of the buffer, or at
the end of the buffer. The "insert" action inserts characters before
the current position. For instance, when your preedit
buffer contains "ab.c" ("." indicates the current position),
+@verbatim
(insert "xyz")
+@endverbatim
changes the buffer to "abxyz.c".
There are several predefined variables that represent a specific position in the
preedit buffer. They are:
- @<, @=, @> -- the first, current, and last positions
- @-, @+ -- the previous and the next positions
+<ul>
+<li> [[@<, @=, @>]]
-The format of the "delete" action is:
+The first, current, and last positions.
+<li> [[@-, @+]]
+
+The previous and the next positions.
+</ul>
+
+The format of the [[delete]] action is this:
+
+@verbatim
(delete POS)
+@endverbatim
-where POS is a positional variable beginning with @.
+where POS is a positional variable beginning with [[@]].
The above action deletes the characters between POS and
-the current position. So, "(delete @-)" deletes one character before
-the current position. The other examples of "delete" actions are:
+the current position. So, [[(delete @-)]] deletes one character before
+the current position. Other examples of [[delete]] include the following:
+@verbatim
(delete @+) -- delete the next character
(delete @<) -- delete all the preceding characters in the buffer
(delete @>) -- delete all the following characters in the buffer
+@endverbatim
-You can change the current position by the "move" action as below:
+You can change the current position using the [[move]] action as below:
+@verbatim
(move @-) -- move the current position to the position before the
previous character
(move @<) -- move to the first position
+@endverbatim
Other positional variables work similarly.
+
Let's see how our new example works. Whatever the key event, the
-input method is in its only state, "init". Since an event of a lower letter
-key is first handled by MAP-ACTIONs, every key is changed into the
+input method is always in its only state, [[init]]. Since a lowercase letter
+key event is first handled by MAP-ACTIONs, every such key is changed into the
corresponding uppercase and put into the preedit buffer. Now this character
-can be retrieved with @-1.
+can be accessed with [[@-1]].
-How can we determine whether the new character should be
-lowercase or not? We have to check the character before it, that is, @-2.
-BRANCH-ACTIONs in the "init" state do the job.
+How can we tell whether the new character should be
+lowercase or not? We have to check the character before it, that is, [[@-2]].
+BRANCH-ACTIONs in the [[init]] state do the job.
-It first checks if the character @-2 is between A and Z, or between a
-and z, by the conditional below.
+It first checks whether the character [[@-2]] is between A and Z, between
+a and z, or equal to "İ", by the conditional below.
+@verbatim
(cond ((| (& (>= @-2 ?A) (<= @-2 ?Z))
(& (>= @-2 ?a) (<= @-2 ?z))
(= @-2 ?İ))
+@endverbatim
If not, there is nothing special to do. If so, our new key
should be changed back into lowercase. The uppercase character is already
in the preedit buffer and we have to retrieve and remember it in the
-variable X by
+variable [[X]] by
+@verbatim
(set X @-1)
+@endverbatim
and then delete that character by
+@verbatim
(delete @-)
+@endverbatim
-Lastly we insert the character back in its lowercase form. The
+Lastly, we re-insert the character in its lowercase form. The
problem here is that "İ" must be changed into "i", so we need another
conditional. The first branch
+@verbatim
((= X ?İ) "i")
+@endverbatim
-means that "if the character remembered in X is İ, "i" is inserted".
+means that "if the character remembered in X is 'İ', 'i' is inserted".
The second branch
+@verbatim
(1 (set X (+ X 32)) (insert X))
+@endverbatim
starts with "1", which always evaluates to nonzero, so this branch
-is catchall. Actions in this branch add 32 to X itself and then insert
-X. In other words, they make A...Z into a...z respectively and insert the
-lowercase character into the preedit buffer. As the input method
-reached the end of the BRANCH-ACTIONs, the character is commited.
+is a catch-all. Actions in this branch add 32 to [[X]] itself and then
+insert [[X]]. In other words, they change A...Z into a...z
+respectively and insert the lowercase character into the preedit
+buffer. As the input method has reached the end of the BRANCH-ACTIONs,
+the character is committed.
This new input method always checks the character before the current
position, so "A Quick Blown Fox" will be successfully fixed to "A
-Quick Brown Fox" by a backspace and <r>.
+Quick Brown Fox" by a <<BackSpace>> and <<r>>.
+
+
+*/
+
+/*
+Copyright (C) 2007
+ National Institute of Advanced Industrial Science and Technology (AIST)
+ Registration Number H15PRO112
+
+This file is part of the m17n database; a sub-part of the m17n
+library.
+
+The m17n library is free software; you can redistribute it and/or
+modify it under the terms of the GNU Lesser General Public License
+as published by the Free Software Foundation; either version 2.1 of
+the License, or (at your option) any later version.
+
+The m17n library is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+Lesser General Public License for more details.
+
+You should have received a copy of the GNU Lesser General Public
+License along with the m17n library; if not, write to the Free
+Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
+Boston, MA 02110-1301, USA.
+*/