From: nisikimi
Date: Thu, 19 Nov 2009 02:27:36 +0000 (+0000)
Subject: *** empty log message ***
X-Git-Tag: XML-BEFORE-XEX~5
X-Git-Url: http://git.chise.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=e6bdbc9fd3c5788e360f728b6aa217db8528eb1d;p=m17n%2Fm17n-db.git

*** empty log message ***
---

diff --git a/FORMATS/IM-tut.txt b/FORMATS/IM-tut.txt
index 2f6dcca..b7a3bcb 100644
--- a/FORMATS/IM-tut.txt
+++ b/FORMATS/IM-tut.txt
@@ -26,40 +26,41 @@ An input method is defined in a *.mimx file with this format.
 ...
-
+
-
- ACTIONS1 ...
+ INPUT_KEYS11
+ ACTIONS11 ...
-
- ACTIONS2 ...
+ INPUT_KEYS12
+ ACTIONS12 ...
 ...
-
+
-
+ INPUT_KEYS21 ACTIONS21 ...
-
+ INPUT_KEYS22 ACTIONS22 ...
-
 ...
 ...
-
-
- S-ACTIONS1
+
+ HOOK ACTONS
+
+ ACTIONS1
-
- S-ACTIONS2
+
+ ACTIONS2
 ....
+ CATCH-ALL ACTIONS
 ....
@@ -72,12 +73,28 @@ characters through some actions.
 Tags should be written as they are. Contents and attribute values
 (written with uppercases here) may be restricted to some patterns.
 (See m17n-db-xml/MIM/mim.rng for details.) Every child
-element but is optional and we will not see the variable-list,
+element but is optional. We will not see the variable-list,
 command-list, module-list and macro-list in this tutorial.
 
- specifies a sequence of keys in one of the following two ways.
-@li one or more (the keysym value returned by the xev command) or
-.
+Input sequence is translated into characters according to the s
+in the s and actions in the s. The characters are
+temporarily put into a special place called @c preedit @c buffer. The
+input method driver uses this buffer to store, change or re-arrenge
+characters, and when it is done, commit the characters in the buffer
+to applications.
+
+ consists of a that triggers the rule and actions to
+apply to the characters in the preedit buffer.
+
+ specifies a sequence of keys in one of the following two
+ways.
+
+@li a list of s or s. A variable that refers
+to a , or a function call that returns a or a
+ can also appear as an element of the list. A symbol
+specifies a key event (the keysym value returned by the xev command)
+and an integer specifies a character-code.
+
 @li a string that can be entered from the keyboard. (Usually only
 ASCII characters. However, if the input method is intended to be
 used, for instance, with a West European keyboard, the value may
@@ -85,29 +102,25 @@ contain Latin-1 characters.)
 
 @verbatim
-
- 0x2E
- A-z
-
+
+ 0x2E
+ A-z
+
-
+ egsk
 @endverbatim
 
 These are both valid s.
 
-Characters translated from an input sequence is temporarily put into a
-special place @c preedit @c buffer. The input method driver uses this
-buffer to store, change or re-arrenge characters, and when it is done,
-commit the characters in the buffer to applications.
-
 Actions for the translation are defined in s and s.
-ACTIONS and S-ACTIONS are a sequence of actions. They may or may not
-have attributes or contents that specify its details. For example,
-the action for character insertion takes the character to be inserted
-as the value of its attribute "character", and the action for calling
-external function requires the function to be called as its content.
+Actions are a sequence of . They may or may not have
+contents that specify its details. For example, the action for
+character insertion takes the character to be inserted as its content,
+the action for calling external function requires the function to be
+called as its content, and the action to commit the translated
+characters requires no content.
 
 The most common action is for inserting fixed characters or strings.
 The input method driver keeps a position called the "current
@@ -116,17 +129,18 @@ two characters, at the beginning of the buffer, or at the end of the
 buffer. The inserting action puts characters before the current
 position.
 
-Inserting actions are written as below.
+Here is two examples of inserting actions:
 
 @verbatim
-
+ tutorial
-
+ 0x0BB3
 @endverbatim
 
-When your preedit buffer contains "this ^text" ("^" indicates the
-current position), the first example change the buffer to "this
-tutorial ^text".
+When your preedit buffer contains "this text" and the current position
+is between the space and the second "t", the first example change the
+buffer to "this tutorial text" and the current position is between the
+second space and fourth "t".
 
 The second example inserts a Tamil Letter LAA to the preedit buffer.
 
@@ -143,23 +157,21 @@ Here is a simple example of an input method that works as Caps Lock.
 Up-case all lowercase letters a->A
-
-
-
-
-
+
+ aA
+ bB
+ cC/string>
 :
 :
-
+ iI
 :
 :
-
-
-
+ xX
+ yY
+ zZ
-
-
-
+
+
@@ -167,65 +179,70 @@ Here is a simple example of an input method that works as Caps Lock.
 
 When an input method is activated, the input method driver is in the
 initial condition of the first in the . In this
-case, it is the state whose @c id is @c state-init. In the initial
+case, it is the state whose @c mname is @c state-init. In the initial
 condition, no key is being processed and no action is suspended.
 
-Each has es. has an attribute @c
-branch-selecting-map and its value appears as the value of @c id
-attribute of one of the s. This attribute defines the
-correspondence between a and a . A has s,
-and a has a , so when a key sequence is given, a
-that handles the key sequence is determined, and a that is
-responsible for the map is determined.
+Each has es. Each has an attribute @c mname
+that defines the correspondent . A has s, and a
+ has a , so when a key sequence is given, a for
+the key sequence is determined, a that contains the rule is
+determined, and a that is responsible for the map is
+determined.
 
 When the input method driver receives a key sequence "a", it searches
 for a whose part matches with "a", and finds one in
-the whose @c id is @c map-to-upper. The selected branch is the
-one whose @c branch-selecting-map is @c map-to-upper.
+the whose @c mname is @c map-to-upper. The selected branch is
+the one whose @c mname is @c map-to-upper.
 
 When a given key sequence does not match with any in any
 that corresponds with a of the current , that event is
 unhandled and given back to the application program.
 
-The driver then executes ACTIONs of the . In this case, it
-inserts "A" in the preedit buffer. Then S-ACTIONs in the , if
-any, are executed. When all ACTIONs and S-ACTIONs have been handled,
-the driver shifts to the initial condition of the current state.
+The driver then executes actions of the . In this case, it
+is
+@verbatim
+A
+@endverbatim
+that inserts an "A" in the preedit buffer. Then actions in the
+, if any, are executed. When all actions in the rules and the
+branch have been handled, the driver shifts to the initial condition
+of the current state.
 
 The shift to the initial condition of the first state has a special
-meaning; it commits all characters in the preedit buffer and clears
-it. In this case, as the result, "A" is given to the
+meaning; it commits all the characters in the preedit buffer and
+clears it. In this case, as the result, "A" is given to the
 application program.
 
 Turkish users may want to extend the above example for "İ" (U+0130:
 LATIN CAPITAL LETTER I WITH DOT ABOVE). Assigning the key sequence
-"ii" for that character would be convenient, so and the user might add
+"ii" for that character would be convenient, so the user might add
 this rule in the @c map-to-upper map.
 
 @verbatim
-
+ iiİ
 @endverbatim
 
 However, we already have the following rule:
 
 @verbatim
-
+ iI
 @endverbatim
 
-Will these rules conflict? What will happen when a key sequence "i" is
-entered?
+Won't these rules conflict? What will happen when a key sequence "i"
+is entered?
 
 The input method driver takes care of these kind of overlapping rules.
-When the driver receives a "i", it inserts "I" in the preedit buffer.
-As it knows that there is another rule that may match the additional
-key event "i", after inserting "I", it suspends the normal behavior of
-shifting to the initial condition, and waits for another key. The user
-will see "I" with underline, which indicates the rule for this
-translation is not deterministic and the "I" is not yet committed.
+When the driver receives a "i", it inserts an "I" in the preedit
+buffer. As it knows that there is another rule that may match the
+additional key event "i", after inserting "I", it suspends the normal
+behavior of shifting to the initial condition, and waits for another
+key. The user will see "I" with underline, which indicates the rule
+for this translation is not deterministic and the "I" is not yet
+committed.
 
 When the input method driver receives the next "i", it cancels all the
 effects of the rule for the previous "i". In this case, the preedit
-buffer is cleared. Then it executes ACTIONs of the rule for "ii",
+buffer is cleared. Then it executes actions of the rule for "ii",
 that is, inserts an "İ" to the preedit buffer. This time, there is no
 rule that matches with "ii" and an additional key, so the character is
 determined, the driver shifts to the initial condition of the current
@@ -241,53 +258,49 @@ leads to commit "A".
 
 @section im-state-action Use of state example: Capitalizing
 
-We have so far explained ACTIONs, but not S-ACTIONs. The format of a
-S-ACTION is the same as that of an ACTION. It is executed only after
-a matching rule has been determined and the corresponding ACTIONs have
-been executed. A typical use of S-ACTION is to shift to a different
-state.
+We have so far explained actions in s, but not in s.
+Actions in s are executed only after a matching rule has been
+determined its actions have been executed. A typical use of action in
+a is to shift to a different state.
 
-In order to see how S-ACTIONs are used, let us modify the current
-input method to upcase only such letters that start a word (i.e. to
-capitalize). For this purpose, the "state-init" state should be
-modified as below.
+In order to see how actions in a are used, let us modify the
+current input method to upcase only such letters that start a word
+(i.e. to capitalize). For this purpose, the "state-init" state should
+be modified as below.
 
 @verbatim
-
-
-
-
+
+
+ state-non-upcase
 @endverbatim
 
-The S-ACTION here is <shift-to> that shifts the input method
-driver to another state whose id is @c state-non-upcase.
-
-We now need to define the state. It has one branch and one catchall.
+The action in here is that shifts the input method
+driver to another state whose sname is @c state-non-upcase. Let us
+define the state. It has one branch and one catchall.
 
 @verbatim
-
-
-
+
+
+ state-init
 @endverbatim
 
 The branch is for character "a" to "z", and we need a new map with the
-id "map-lower" that inserts lowercase letters as they are.
+mname "map-lower" that inserts lowercase letters as they are.
 
 @verbatim
-
-
+ aa
+ bb
 :
 :
-
-
+ zz
 @endverbatim
 
 The catchall branch matches with any key event that does not match any
-rules in the other maps in the current state. In this case, it
+rules in any other maps in the current state. In this case, it
 matches with characters other than [a-z]. A catchall branch does not
-consume any key event.
+consume key event.
 
 We will show the full code of the new input method before explaining
 how it works.
@@ -301,30 +314,30 @@ how it works.
 Titlecase letters abc->Abc
-
-
-
+
+ aA
+ bB
 :
 :
-
-
+ yY
+ zZ
-
-
-
+
+ aa
+ bb
 :
 :
-
-
+ yy
+ zz
-
-
-
+
+
+ state-non-upcase
-
-
-
+
+
+ state-init
@@ -337,15 +350,14 @@ preedit buffer and the driver shifts to the state @c state-non-upcase.
 
 The next "b" is handled in @c state-non-upcase. It matches the
 <keyseq> of the second <rule> in the map @c map-lower, so
-it is handled by the <branch> whose @c branch-selecting-map is @c
-map-lower. By the rule in the map, "b" is <inserted in the preedit
-buffer and it is committed explicitly by the <commit> in
-<branch>.
+it is handled by the <branch> whose @c mname is @c map-lower.
+By the rule in the map, "b" is inserted in the preedit buffer and it
+is committed explicitly by the <commit> in the <branch>.
 
 At this point, the input method is still in @c state-non-upcase, where
 the next " " key is handled. This time, however, the only branch in
 this state has no rule for the key and <catch-all-branch> is
-selected. S-action in this branch is to the shift to @c state-init.
+selected. The action in this branch is to the shift to @c state-init.
 
 Note that the key " " is not yet handled because
 <catch-all-branch> does not consume any key event. The input
@@ -369,112 +381,76 @@ applications and Qt applications, and cannot be used with applications
 that utilizes XIM to communicate with an input method.
 
 Before "surrounding text support", we explain a few features of the
-input method; variables, arithmetic operations and comparisons, and
-conditional actions.
+expressions used in input methods; variables, arithmetic operations
+and comparisons, and conditional actions.
 
-As we have already seen in <insert> action, some actions takes
-the attribute or the content that specifies the target of the action,
-and some attribute or content may contain a variable as its value.
+As we have already seen in <insert> action, an action may use
+its content to specify the target, and some content can be a variable
+reference or function call.
 
 For instance, the actions
 
 @verbatim
- 32
-
+ 32
+
 @endverbatim
 
 set the variable @c X to integer value 32, then insert a character
 whose Unicode character code is 32 (i.e. SPACE).
 
-The variable value can be set to an integer value, another variable,
-or an expression of this form:
-
-@verbatim
-
- EXPRESSION1
- EXPRESSION2
-
-@endverbatim
-
-EXPRESSION1 and EXPRESSION2 can also be an expression. For example,
-the action below sets the value of the variable @c X to @c Y*32+Z.
+The variable value can be set to any term. Terms contain, in addtion
+to other items, integer values, variable references and function
+calls. (See EXPR.txt for the definition of .) For example,
+the action below contains two variable references and two function calls
+and sets the value of the variable @c X to @c Y*32+Z.
 
 @verbatim
-
-
- 32
-
-
-
+
+ 32
+
+
 @endverbatim
 
-The operators that appear in expressions are divided into the
-following three groups.
+ and here, in addiotn to others, are calls to predefined
+functions. Predefined functions include arithmetic and bitwise
+operators (add, subtract, etc.) , relational operators (equal to,
+greater than, etc.), logical operators (and, not, etc.), list
+operators (append, nth, etc.) and control structures (loop, cond,
+etc.) EXPR.txt gives the complete list and descriptions of predefined
+functions.
 
-@li Arithmetic and bitwise operators that requires two arguments.
+The input method can control the processing flow with <cond>
+that has the following form.
 
 @verbatim
- + - * / & |
-@endverbatim
-
-@li Relational operators that requires two arguments.
-
-@verbatim
- == <= >= < >
-@endverbatim
-
-@li Logical operators that requires one argument.
-
-@verbatim
- !
-@endverbatim
-
-The input method can control the processing flow with
-<conditional> that has the following form.
-
-@verbatim
-
-
- EXPRESSION1
- ACTIONs1
-
-
- EXPRESSION1
- ACTIONs1
-
+
+ EXPRESSION1 ACTIONs1
+ EXPRESSION2 ACTIONs2
 .....
-
+
 @endverbatim
 
- checks the value of EXPRESSION in s one by one,
-and when the whose EXPRESSION has a nonzero value is
-encountered, ACTIONs in that are performed.
-
-Now let us return to surrounding text support. Some variables are
-predefined and among them are @c predefined-surround-text-flag and @c
-predefined-nth-previous-or-following-character whose values are
-defined as below and can not be altered.
+ checks the value of EXPRESSION in s one by one, and when
+the whose EXPRESSION has a nonzero value is encountered,
+ACTIONs in that are performed.
 
-<ul>
-<li> predefined-surround-text-flag
+Now let us return to surrounding text support. Calls to the
+predefined function returns -1 if surrounding
+text is supported, and -2 if not.
 
--1 if surrounding text is supported, -2 if not.
+In order to know what characters surrounds the input spot, we need the
+help of and . indicates a position in the
+buffer and the predefined function returns the character at
+the specified position.
 
-<li> predefined-nth-previous-or-following-character
-
-This variable takes an attribute @c position whose value must be an
-positive or negative integer. If the @c position value is negative,
-the value of the @c predefined-nth-previous-or-following-character is
-the Nth previous character in the preedit buffer. If there are only M
-(M
+@@+N and @@-N (N is an positive
+integer) mark the N-th preceding or following position, and are used
+to specify a character inside or outside of the preedit buffer. If
+the number of preceding or following characters in the preedit buffer
+is less than N, it marks the (N minus the number of characters)th
+preceding or following character from the input spot.
 
 When you have the context below, where "def" is in the preedit buffer
 and your current position in the preedit buffer is between "d" and "e":
@@ -483,19 +459,19 @@ and your current position in the preedit buffer is between "d" and "e":
 ABCdefGHI
 @endverbatim
 
-The @c predefined-nth-previous-or-following-character has the
-following values.
+The calls to the functions return the following values.
 
 @verbatim
- --> ?B
- --> ?C
- --> ?d
- --> ?e
- --> ?f
- --> ?G
+ -3 --> ?B
+ -2 --> ?C
+ -1 --> ?d
+ +1 --> ?e
+ +2 --> ?f
+ +3 --> ?G
 @endverbatim
 
-Now you are ready to write a new version of the input method "Titlecase".
+Now we are ready to write a new version of the input method
+"Titlecase".
 
 @verbatim
@@ -506,77 +482,69 @@ Now you are ready to write a new version of the input method "Titlecase".
 Titlecase letters abc->Abc
-
-
-
+
+ aA
+ bB
 :
 :
-
-
-
+ yY
+ zZ
+ iiİ
-
-
+
+
+ preedit buffer, -2
+ returns the character just before the inputting spot. -->
-
-
-
+
+
+
-
-
-
- ?A
-
-
- ?Z
+
+ -2
+ ?A
+ -2
+ ?Z
+
-
-
-
- ?a
-
-
- ?z
+
+ -2
+ ?a
+ -2
+ ?z
+
-
-
- ?İ
-
+ -2?İ
+
-
-
-
- -1
+ @<
-
+
-
- ?İ
-
-
-
-
- 1
+
+ ?İ
+ i
+
+
+
+ 1
- 32
-
-
-
-
-
+ 32
+
+
+
+
+
@@ -584,81 +552,78 @@ Now you are ready to write a new version of the input method "Titlecase".
 @endverbatim
 
-The above example contains the new action <delete-to-marker>,
-Several markers are predefined to represent (or mark) a specific
-position in the preedit buffer.
-
-<delete-to-marker> action takes the attribute named @c position
-and its value must be a marker. It deletes the characters between
-that position and the current position. The examples of
-delete-to-marker are:
+The above example contains the new action <delete>, Several
+markers are predefined to mark a specific position in the preedit
+buffer. When the content of is a marker, a function call to
+the deletes the characters between the marked position and
+the current position. The examples are:
 
 @verbatim
- ; delete the previous character
- ; delete the next character
- ; delete all the preceding characters in the buffer
- ; delete all the following characters in the buffer
+ @- ; delete the previous character
+ @+ ; delete the next character
+ @< ; delete the character between the current position
+ ; and the first position of the buffer, that is,
+ ; delete all the preceding characters in the buffer
+ @> ; delete the character between the current position
+ ; and the last position of the buffer, that is,
+ ; delete all the following characters in the buffer
 @endverbatim
 
 Let us see how our new example works. Whatever a key event is, the
 input method is in its only state, @c state-init. Since an event of a
-lower letter key falls into the branch whose @c branch-selecting-map
-is @c map-to-upper and handled by <rule>s in that <map>,
-the key is changed into the corresponding uppercase character and
-inserted into the preedit buffer. Now this uppercase character can be
-accessed with @c position="@previous".
+lower letter key falls into the branch whose @c mname is @c
+map-to-upper and handled by <rule>s in that <map>, the key
+is changed into the corresponding uppercase character and inserted
+into the preedit buffer. Now this uppercase character can be accessed
+with -1.
 
 How can we tell whether the new character should be left as an
 uppercase or changed back to a lowercase? We need to check the
 character before. That character can be accessed by
-<predefined-nth-previous-or-following-character position="-2"/>.
+-2.
 
-The character is checked by the @c EXPRESSION part of the <case>
-in the first <conditional> of the branch for @c map-to-upper. It
-is the disjunction of three s; each becomes true when the
-character is between A to Z, between a to z, or İ.
+The character is checked by the @c EXPRESSION part of the <list>
+in the first <cond> of the branch for @c map-to-upper. It is the
+disjunction of three conditons; each becomes true when the character
+is between A to Z, between a to z, or İ.
 
-When the character is not one of the above, the @c EXPRESSION does not
-have a nonzero value and @c ACTIONs in this <case> will not be
-executed. As there is no more <case> in this
-<conditional>, nothing is done to the new character in the
+When the character is not one of the above, @c ACTIONs in this
+<list> will not be executed. As there is no more <list>
+in this <cond>, nothing is done to the new character in the
 preedit.
 
 When the @c EXPRESSION becomes true, the new character must be changed
-into a lowercase. @c ACTIONs part in <case> does the work.
-
-Since the uppercase character is already in the preedit buffer, we
-retrieve and remember it in the variable "X" with
+into a lowercase, and @c ACTIONs part does the work. Since the
+uppercase character is already in the preedit buffer, we retrieve and
+remember it in the variable "X" with
 
 @verbatim
-
-
-
+ -1
 @endverbatim
 
 and then delete it with
 
 @verbatim
-
- ?İ
-
-
+
+ ?İ
+ i
+
 @endverbatim
 
 insert "i" if the value of the variable @c X is "İ".
 
-The @c EXPRESSION part of the second <case> is
+The @c EXPRESSION part of the second <list> is
 
 @verbatim
- 1
+ 1
 @endverbatim
 
 which is always resolved into nonzero, so this is the catchall.
@@ -666,17 +631,16 @@ which is always resolved into nonzero, so this is the catchall.
 Its @c ACTIONs part
 
 @verbatim
- 32
-
-
+ 32
+
 @endverbatim
 
 first increases the "X" value by 32, and insert "X". In other words,
 it changes A...Z into a...z respectively and inserts the lowercase
 character into the preedit buffer.
 
-Now the input method reaches the end of the S-ACTIONs, the character
-in the preedit buffer is committed.
+Now that the input method reaches the end of the acions in the branch,
+the character in the preedit buffer is committed.
 
 This new input method always checks the character before the current
 position, so "A Quick Blown Fox" will be successfully fixed to "A
diff --git a/FORMATS/IM.txt b/FORMATS/IM.txt
index 695c08e..5388d86 100644
--- a/FORMATS/IM.txt
+++ b/FORMATS/IM.txt
@@ -423,7 +423,7 @@ This code moves the marker to the user defined position T.
 @-5
 #endif
 
-This code refers to the 5th previous character wherever it is.
+This code refers to the 5th preceding character wherever it is.
 
 @verbatim