From: nisikimi Date: Thu, 15 Jan 2009 02:04:35 +0000 (+0000) Subject: *** empty log message *** X-Git-Tag: XML-base~2 X-Git-Url: http://git.chise.org/gitweb/?a=commitdiff_plain;h=c598f6d94ae33729aad747e6da263da81a4abe85;p=m17n%2Fm17n-docs.git *** empty log message *** --- diff --git a/data/dbtutorial.txt b/data/dbtutorial.txt new file mode 100644 index 0000000..40ed121 --- /dev/null +++ b/data/dbtutorial.txt @@ -0,0 +1,529 @@ +/* -*- coding: utf-8; -*- */ +/*** @page m17nDBTutorial Tutorial for writing the m17n database + +This section contains tutorials for writing various database files of +the m17n database. + + +*/ +/*** + +@section mdbTutorialIM Tutorial of input method + +@subsection im-struct Structure of an input method file + +An input method is defined in a *.mim file with this format. + +@verbatim +(input-method LANG NAME) + +(description (_ "DESCRIPTION")) + +(title "TITLE-STRING") + +(map + (MAP-NAME + (KEYSEQ MAP-ACTION MAP-ACTION ...) <- rule + (KEYSEQ MAP-ACTION MAP-ACTION ...) <- rule + ...) + (MAP-NAME + (KEYSEQ MAP-ACTION MAP-ACTION ...) <- rule + (KEYSEQ MAP-ACTION MAP-ACTION ...) <- rule + ...) + ...) + +(state + (STATE-NAME + (MAP-NAME BRANCH-ACTION BRANCH-ACTION ...) <- branch + ...) + (STATE-NAME + (MAP-NAME BRANCH-ACTION BRANCH-ACTION ...) <- branch + ...) + ...) +@endverbatim +Lowercase letters and parentheses are literals, so they must be +written as they are. Uppercase letters represent arbitrary strings. + +KEYSEQ specifies a sequence of keys in this format: +@verbatim + (SYMBOLIC-KEY SYMBOLIC-KEY ...) +@endverbatim +where SYMBOLIC-KEY is the keysym value returned by the xev command. +For instance +@verbatim + (n i) +@endverbatim +represents a key sequence of \ and \. +If all SYMBOLIC-KEYs are ASCII characters, you can use the short form +@verbatim + "ni" +@endverbatim +instead. Consult #mdbIM for Non-ASCII characters. + +Both MAP-ACTION and BRANCH-ACTION are a sequence of actions of this format: +@verbatim + (ACTION ARG ARG ...) +@endverbatim +The most common action is insert, which is written as this: +@verbatim + (insert "TEXT") +@endverbatim +But as it is very frequently used, you can use the short form +@verbatim + "TEXT" +@endverbatim +If "TEXT" contains only one character "C", you can write it as +@verbatim + (insert ?C) +@endverbatim +or even shorter as +@verbatim + ?C +@endverbatim +So the shortest notation for an action of inserting "a" is +@verbatim + ?a +@endverbatim + +@subsection im-upcase Simple example of capslock + +Here is a simple example of an input method that works as CapsLock. + +@verbatim +(input-method en capslock) +(description (_ "Upcase all lowercase letters")) +(title "a->A") +(map + (toupper ("a" "A") ("b" "B") ("c" "C") ("d" "D") ("e" "E") + ("f" "F") ("g" "G") ("h" "H") ("i" "I") ("j" "J") + ("k" "K") ("l" "L") ("m" "M") ("n" "N") ("o" "O") + ("p" "P") ("q" "Q") ("r" "R") ("s" "S") ("t" "T") + ("u" "U") ("v" "V") ("w" "W") ("x" "X") ("y" "Y") + ("z" "Z"))) +(state + (init (toupper))) +@endverbatim + +When this input method is activated, it is in the initial condition of +the first state (in this case, the only state init). In the +initial condition, no key is being processed and no action is +suspended. When the input method receives a key event \, it +searches branches in the current state for a rule that matches \ +and finds one in the map toupper. Then it executes MAP-ACTIONs +(in this case, just inserting "A" in the preedit buffer). After all +MAP-ACTIONs have been executed, the input method shifts to the initial +condition of the current state. + +The shift to the initial condition of the first state has a special +meaning; it commits all characters in the preedit buffer then clears +the preedit buffer. + +As a result, "A" is given to the application program. + +When a key event does not match with any rule in the current state, +that event is unhandled and given back to the application program. + +Turkish users may want to extend the above example for "Ä°" (U+0130: +LATIN CAPITAL LETTER I WITH DOT ABOVE). It seems that assigning the +key sequence \ \ for that character is convenient. So, he +will add this rule in toupper. + +@verbatim + ("ii" "Ä°") +@endverbatim + +However, we already have the following rule: + +@verbatim + ("i" "I") +@endverbatim + +What will happen when a key event \ is sent to the input method? + +No problem. When the input method receives \, it inserts "I" in the +preedit buffer. It knows that there is another rule that may +match the additional key event \. So, after inserting "I", it +suspends the normal behavior of shifting to the initial condition, and +waits for another key. Thus, the user sees "I" with underline, which +indicates it is not yet committed. + +When the input method receives the next \, it cancels the effects +done by the rule for the previous "i" (in this case, the preedit buffer is +cleared), and executes MAP-ACTIONs of the rule for "ii". So, "Ä°" is +inserted in the preedit buffer. This time, as there are no other rules +that match with an additional key, it shifts to the initial condition +of the current state, which leads to commit "Ä°". + +Then, what will happen when the next key event is \ instead of \? + +No problem, either. + +The input method knows that there are no rules that match the \ \ key +sequence. So, when it receives the next \, it executes the +suspended behavior (i.e. shifting to the initial condition), which +leads to commit "I". Then the input method tries to handle \ in +the current state, which leads to commit "A". + +So far, we have explained MAP-ACTION, but not +BRANCH-ACTION. The format of BRANCH-ACTION is the same as that of MAP-ACTION. +It is executed only after a matching rule has been determined and the +corresponding MAP-ACTIONs have been executed. A typical use of +BRANCH-ACTION is to shift to a different state. + +To see this effect, let us modify the current input method to upcase only +word-initial letters (i.e. to capitalize). For that purpose, +we modify the "init" state as this: + +@verbatim + (init + (toupper (shift non-upcase))) +@endverbatim + +Here (shift non-upcase) is an action to shift to the new state +non-upcase, which has two branches as below: + +@verbatim + (non-upcase + (lower) + (nil (shift init))) +@endverbatim + +The first branch is simple. We can define the new map lower as the +following to insert lowercase letters as they are. + +@verbatim +(map + ... + (lower ("a" "a") ("b" "b") ("c" "c") ("d" "d") ("e" "e") + ("f" "f") ("g" "g") ("h" "h") ("i" "i") ("j" "j") + ("k" "k") ("l" "l") ("m" "m") ("n" "n") ("o" "o") + ("p" "p") ("q" "q") ("r" "r") ("s" "s") ("t" "t") + ("u" "u") ("v" "v") ("w" "w") ("x" "x") ("y" "y") + ("z" "z"))) +@endverbatim + +The second branch has a special meaning. The map name nil means +that it matches with any key event that does not match any rules in the +other maps in the current state. In addition, it does not +consume any key event. We will show the full code of the new input +method before explaining how it works. + +@verbatim +(input-method en titlecase) +(description (_ "Titlecase letters")) +(title "abc->Abc") +(map + (toupper ("a" "A") ("b" "B") ("c" "C") ("d" "D") ("e" "E") + ("f" "F") ("g" "G") ("h" "H") ("i" "I") ("j" "J") + ("k" "K") ("l" "L") ("m" "M") ("n" "N") ("o" "O") + ("p" "P") ("q" "Q") ("r" "R") ("s" "S") ("t" "T") + ("u" "U") ("v" "V") ("w" "W") ("x" "X") ("y" "Y") + ("z" "Z") ("ii" "Ä°")) + (lower ("a" "a") ("b" "b") ("c" "c") ("d" "d") ("e" "e") + ("f" "f") ("g" "g") ("h" "h") ("i" "i") ("j" "j") + ("k" "k") ("l" "l") ("m" "m") ("n" "n") ("o" "o") + ("p" "p") ("q" "q") ("r" "r") ("s" "s") ("t" "t") + ("u" "u") ("v" "v") ("w" "w") ("x" "x") ("y" "y") + ("z" "z"))) +(state + (init + (toupper (shift non-upcase))) + (non-upcase + (lower (commit)) + (nil (shift init)))) +@endverbatim + +Let's see what happens when the user types the key sequence \ \ \< \>. +Upon \, "A" is committed and the state shifts to non-upcase. +So, the next \ is handled in the non-upcase state. +As it matches a +rule in the map lower, "b" is inserted in the preedit buffer and it +is committed explicitly by the "commit" command in BRANCH-ACTION. After +that, the input method is still in the non-upcase state. So the next \< \> +is also handled in non-upcase. For this time, no rule in this state +matches it. Thus the branch (nil (shift init)) is selected and the +state is shifted to init. Please note that \< \> is not yet +handled because the map nil does not consume any key event. +So, the input method tries to handle it in the init state. Again no +rule matches it. Therefore, that event is given back to the application +program, which usually inserts a space for that. + +When you type "a quick blown fox" with this input method, you get "A +Quick Blown Fox". OK, you find a typo in "blown", which should be +"brown". To correct it, you probably move the cursor after "l" and type +\ and \. However, if the current input method is still +active, a capital "R" is inserted. It is not a sophisticated +behavior. + +@subsection im-surrounding-text Example of utilizing surrounding text support + +To make the input method work well also in such a case, we must use +"surrounding text support". It is a way to check characters around +the inputting spot and delete them if necessary. Note that +this facility is available only with Gtk+ applications and Qt +applications. You cannot use it with applications that use XIM +to communicate with an input method. + +Before explaining how to utilize "surrounding text support", you must +understand how to use variables, arithmetic comparisons, and +conditional actions. + +At first, any symbol (except for several preserved ones) used as ARG +of an action is treated as a variable. For instance, the commands + +@verbatim + (set X 32) (insert X) +@endverbatim + +set the variable X to integer value 32, then insert a character +whose Unicode character code is 32 (i.e. SPACE). + +The second argument of the set action can be an expression of this form: + +@verbatim + (OPERAND ARG1 [ARG2]) +@endverbatim + +Both ARG1 and ARG2 can be an expression. So, + +@verbatim + (set X (+ (* Y 32) Z)) +@endverbatim + +sets X to the value of Y * 32 + Z. + +We have the following arithmetic/bitwise OPERANDs (require two arguments): + +@verbatim + + - * / & | +@endverbatim + +these relational OPERANDs (require two arguments): + +@verbatim + == <= >= < > +@endverbatim + +and this logical OPERAND (requires one argument): + +@verbatim + ! +@endverbatim + +For surrounding text support, we have these preserved variables: + +@verbatim + @-0, @-N, @+N (N is a positive integer) +@endverbatim + +The values of them are predefined as below and can not be altered. + +
    +
  • @-0 + +-1 if surrounding text is supported, -2 if not. + +
  • @-N + +The Nth previous character in the preedit buffer. If there are only M +(M @+N + +The Nth following character in the preedit buffer. If there are only M +(M + +So, provided that you have this context: + +@verbatim + ABC|def|GHI +@endverbatim + +("def" is in the preedit buffer, two "|"s indicate borders between the +preedit buffer and the surrounding text) and your current position in +the preedit buffer is between "d" and "e", you get these values: + +@verbatim + @-3 -- ?B + @-2 -- ?C + @-1 -- ?d + @+1 -- ?e + @+2 -- ?f + @+3 -- ?G +@endverbatim + +Next, you have to understand the conditional action of this form: + +@verbatim + (cond + (EXPR1 ACTION ACTION ...) + (EXPR2 ACTION ACTION ...) + ...) +@endverbatim + +where EXPRn are expressions. When an input method executes this +action, it resolves the values of EXPRn one by one from the first branch. +If the value of EXPRn is resolved into nonzero, the corresponding +actions are executed. + +Now you are ready to write a new version of the input method "Titlecase". + +@verbatim +(input-method en titlecase2) +(description (_ "Titlecase letters")) +(title "abc->Abc") +(map + (toupper ("a" "A") ("b" "B") ("c" "C") ("d" "D") ("e" "E") + ("f" "F") ("g" "G") ("h" "H") ("i" "I") ("j" "J") + ("k" "K") ("l" "L") ("m" "M") ("n" "N") ("o" "O") + ("p" "P") ("q" "Q") ("r" "R") ("s" "S") ("t" "T") + ("u" "U") ("v" "V") ("w" "W") ("x" "X") ("y" "Y") + ("z" "Z") ("ii" "Ä°"))) +(state + (init + (toupper + + ;; Now we have exactly one uppercase character in the preedit + ;; buffer. So, "@-2" is the character just before the inputting + ;; spot. + + (cond ((| (& (>= @-2 ?A) (<= @-2 ?Z)) + (& (>= @-2 ?a) (<= @-2 ?z)) + (= @-2 ?Ä°)) + + ;; If the character before the inputting spot is A..Z, + ;; a..z, or Ä°, remember the only character in the preedit + ;; buffer in the variable X and delete it. + + (set X @-1) (delete @-) + + ;; Then insert the lowercase version of X. + + (cond ((= X ?Ä°) "i") + (1 (set X (+ X 32)) (insert X)))))))) +@endverbatim + +The above example contains the new action delete. So, it is time +to explain more about the preedit buffer. The preedit buffer is a +temporary place to store a sequence of characters. In this buffer, +the input method keeps a position called the "current position". The +current position exists between two characters, at the beginning of +the buffer, or at the end of the buffer. The insert action inserts +characters before the current position. For instance, when your +preedit buffer contains "ab.c" ("." indicates the current position), + +@verbatim + (insert "xyz") +@endverbatim + +changes the buffer to "abxyz.c". + +There are several predefined variables that represent a specific position in the +preedit buffer. They are: + +
      +
    • @@<, @=, @@> + +The first, current, and last positions. + +
    • @-, @+ + +The previous and the next positions. +
    + +The format of the delete action is this: + +@verbatim + (delete POS) +@endverbatim + +where POS is a predefined positional variable. +The above action deletes the characters between POS and +the current position. So, (delete @-) deletes one character before +the current position. The other examples of delete include the followings: + +@verbatim + (delete @+) ; delete the next character + (delete @<) ; delete all the preceding characters in the buffer + (delete @>) ; delete all the following characters in the buffer +@endverbatim + +You can change the current position using the move action as below: + +@verbatim + (move @-) ; move the current position to the position before the + previous character + (move @<) ; move to the first position +@endverbatim + +Other positional variables work similarly. + +Let's see how our new example works. Whatever a key event is, the +input method is in its only state, init. Since an event of a lower letter +key is firstly handled by MAP-ACTIONs, every key is changed into the +corresponding uppercase and put into the preedit buffer. Now this character +can be accessed with @-1. + +How can we tell whether the new character should be a lowercase or an +uppercase? We can do so by checking the character before it, i.e. +@-2. BRANCH-ACTIONs in the init state do the job. + +It first checks if the character @-2 is between A to Z, between +a to z, or Ä° by the conditional below. + +@verbatim + (cond ((| (& (>= @-2 ?A) (<= @-2 ?Z)) + (& (>= @-2 ?a) (<= @-2 ?z)) + (= @-2 ?Ä°)) +@endverbatim + +If not, there is nothing to do specially. If so, our new key should +be changed back into lowercase. Since the uppercase character is +already in the preedit buffer, we retrieve and remember it in the +variable X by + +@verbatim + (set X @-1) +@endverbatim + +and then delete that character by + +@verbatim + (delete @-) +@endverbatim + +Lastly we re-insert the character in its lowercase form. The +problem here is that "Ä°" must be changed into "i", so we need another +conditional. The first branch + +@verbatim + ((= X ?Ä°) "i") +@endverbatim + +means that "if the character remembered in X is 'Ä°', 'i' is inserted". + +The second branch + +@verbatim + (1 (set X (+ X 32)) (insert X)) +@endverbatim + +starts with "1", which is always resolved into nonzero, so this branch +is a catchall. Actions in this branch increase X by 32, then +insert X. In other words, they change A...Z into a...z +respectively and insert the resulting lowercase character into the +preedit buffer. As the input method reaches the end of the +BRANCH-ACTIONs, the character is commited. + +This new input method always checks the character before the current +position, so "A Quick Blown Fox" will be successfully fixed to "A +Quick Brown Fox" by the key sequence \ \. + + +*/