/***en @page m17nDBFormat Data format of the m17n database This section describes the data formats of the m17n database. */ /***en @section m17nDBgeneral General format @subsection description DESCRIPTION The mdatabase_load () function returns the data specified by tags in the form of plist if the first tag is not @c Mchartable nor @c Mcharset. The keys of the returned plist are limited to Minteger, Msymbol, Mtext, and Mplist. The type of the value is unambiguously determined by the corresponding key. If the key is Minteger, the value is an integer. If the key is Msymbol, the value is a symbol. And so on. A number of expressions are possible to represent a plist. For instance, we can use the form (K1:V1, K2:V2, ..., Kn:Vn) to represent a plist whose first property key and value are K1 and V1, second key and value are K2 and V2, and so on. However, we can use a simpler expression here because the types of plists used in the m17n database are fairly restricted. Hereafter, we use an expression, which is similar to S-expression, to represent a plist. (Actually, the default database loader of the m17n library is designed to read data files written in this expression.) The expression consists of one or more elements. Each element represents a property, i.e. a single element of a plist. Elements are separated by one or more whitespaces, i.e. a space (code 32), a tab (code 9), or a newline (code 10). Comments begin with a semicolon (;) and extend to the end of the line. The key and the value of each property are determined based on the type of the element as explained below. @subsection example EXAMPLE Here is an example of plist that is written in the expression explained above. @verbatim abc 123 (pqr 0xff) "m\"text" (_\\_ ("string" xyz) -456) @endverbatim It represents the following plist: @verbatim (Msymbol:abc, Minteger:123, Mplist:(Msymbol:pqr, Minteger:255), Mtext:m"text, Mplist:(Msymbol:_\_, Mplist:(Mtext:string, Msymbol:xyz), Minteger:-456)) @endverbatim */ /***en @section m17nDBflt Font Layout Table @subsection description DESCRIPTION Usually, the rendering engine converts character codes of into glyph codes one by one by consulting information about encoding of each selected font. But, for rendering a text that requires complicated layouting (e.g. Thai and Indic), such an one to one conversion is not sufficient. In addition, some glyphs must be shifted 2-dimensionally on the screen. For such a case, a font layout table (FLT in short) must be used. A FLT can contain the information equivarent to OpenType Layout Table (CMAP, GSUB, and GPOS) in addition to the information about how to extract a grapheme cluster and how to re-order characters. The m17n library loads a FLT from the m17n database by the tags \. The plist format of the data is as follows: @verbatim FONT-LAYOUT-TABLE ::= '(' CATEGORY-TABLE GENERATOR ( CATEGORY-TABLE ? GENERATOR ) * ')' CATEGORY-TABLE ::= '(' 'category' CATEGORY-SPEC + ')' CATEGORY-SPEC ::= '(' ( CODE | CODE-FROM CODE-TO ) CATEGORY-CODE ')' CATEGORY-CODE ::= [ '?A' | ... | '?Z' | '?a' | ... | '?z' ] GENERATOR ::= '(' 'generator' RULE RULE-DEF * ')' RULE-DEF ::= '(' RULE-NAME RULE ')' RULE ::= REGEXP-RULE | MATCH-RULE | MAP-RULE | PREDEFINED-RULE | OTF-RULE REGEXP-RULE ::= '(' REGEXP RULE * ')' | RULE-NAME MATCH-RULE ::= '(' MATCH-IDX RULE * ')' | RULE-NAME PREDEFINED-RULE ::= '=' | '*' | '<' | '>' | '|' OTF-URLE ::= 'otf:SCRIPT-TAG[:LANGSYS-TAG]' where SCRIPT-TAG is OTF's ScriptTag name (four letters) listed at: LANGSYS-TAG is OTF's Language System name (four letters) listed at: Example: 'otf:deva' MAP-RULE ::= '(' ( SEQ-TO-SEQ | RANGE-TO-CODE ) + ')' | RULE-NAME SEQ-TO-SEQ ::= '(' CODE + '-1' ( COMBINING ? CODE ) * ')' RANGE-TO-CODE ::= '(' '(' CODE-FROM CODE-TO ')' COMBINING ? CODE ')' COMBINING ::= 'VHOVH' where V ::= ( 't' | 'c' | 'b' | 'B' ) H ::= ( 'l' | 'c' | 'r' ) O ::= ( '.' | XOFF | YOFF XOFF ? ) XOFF ::= ('<' | '>') DIGITS YOFF ::= ('+' | '-') DIGITS Example: 'tc+bc', 'Bl-10<20Br' Semantics of PREDEFINED-RULE: '*': repeat the previous command (if successful) '<': start grapheme cluster '>': end grapheme cluster '|': produce a special glyph that has category-code ' ' '=': use the first glyph of the current run as is @endverbatim @subsection seealso SEE ALSO m17nDBgeneral(5) */ /***en @section m17nDBfontenc Font Encoding The m17n library loads information about the encoding of each font form the m17n database by the tags \. The plist format of the data is as follows: @verbatim FONT-ENCODING ::= PER-FONT-INFO * PER-FONT-INFO ::= '(' FONT-SPEC ENCODING ')' FONT-SPEC ::= '(' [ FOUNDRY FAMILY [ WEIGHT [ STYLE [ STRETCH [ ADSTYLE ] ] ] ] ] REGISTRY ')' @endverbatim @c FOUNDRY to @c REGISTRY are symbols specifying the corresponding XLFD font name fields. Omitted symbols are regarded as @c nil, and @c nil means a wild card. For instance, this @c FONT-SPEC: @verbatim (nil alice0\ lao iso8859-1) @endverbatim should be applied to all fonts whose family is "alice0 lao", and registry is "iso8859-1". @c ENCODING is a symbol representing a charset. A font matching @c FONT-SPEC supports all characters of the charset, and a character code is mapped to the corresponding glyph code of the font by this charset. */ /***en @section m17nDBfontset Fontset The m17n library loads a fontset definition from the m17n database by the tags \. The plist format of the data is as follows: @verbatim FONTSET ::= PER-SCRIPT * PER-CHARSET * FALLBACK * PER-SCRIPT ::= '(' SCRIPT PER-LANGUAGE + ')' PER-LANGUAGE ::= '(' LANGUAGE FONT-SPEC-ELEMENT + ')' PER-CHARSET ::= '(' CHARSET FONT-SPEC-ELEMENT + ')' FALLBACK ::= FONT-SPEC-ELEMENT FONT-SPEC-ELEMENT ::= '(' FONT-SPEC [ FLT-NAME ] ')' FONT-SPEC ::= '(' [ FOUNDRY FAMILY [ WEIGHT [ STYLE [ STRETCH [ ADSTYLE ] ] ] ] ] REGISTRY ')' @endverbatim @c SCRIPT is a symbol of script name (e.g. latin, han) or @c nil. @c LANGUAGE is a two-letter symbol of language name code defined by ISO 639 (e.g. ja, zh) or @c nil. The meanings of @c FOUNDRY to @c REGISTRY are the same as @e Font @e Encoding. @c FLT-NAME is a name of @ref flt. For instance, this @c PER_SCRIPT: @verbatim (han (ja ((jisx0208.1983-0))) (zh ((gb2312.1980-0))) (nil ((big5-0)))) @endverbatim instructs the rendering engine to use a font of registry "jisx0208.1983-0" for a "han" character that has @c Mlanguage text propert "ja" if the character is in the repertories of such fonts. Otherwise, try a font of registry "gb2312.1980-0" or "big5-0". If a "han" character does not have @c Mlangauge text property, try all three fonts. */ /***en @section m17nDBfontsize Font Resizing In some case, a font contains incorrect information about its size (typically in the case of a hacked TrueType font), which results in a bad text layout when such a font is used in combination with the other fonts. To overcome this problem, the m17n library loads information about font-size correction from the m17n database by the tags \. The plist format of the data is as follows: @verbatim FONT-RESIZE ::= PER-FONT-INFO * PER-FONT-INFO ::= '(' FONT-SPEC RESIZE-RATIO ')' FONT-SPEC ::= '(' [ FOUNDRY FAMILY [ WEIGHT [ STYLE [ STRETCH [ ADSTYLE ] ] ] ] ] REGISTRY ')' @endverbatim The meanings of @c FOUNDRY to @c REGISTRY are the same as @e Font @e Encoding. @c RESIZE-RATIO is an integer number specifying by percentage how much the font-size must be adjusted. For instance, this @c PER-FONT-INFO: @verbatim ((devanagari-cdac) 150) @endverbatim means that, to use a font of registry "devanagari-cdac" with a specific size, we have to open an 1.5 times bigger one. */ /***en @section m17nDBinput Input Metod @verbatim INPUT-METHOD ::= '(' TITLE MAP-LIST MODULE-LIST ? STATE-LIST ')' TITLE ::= '(' title MTEXT ')' MAP-LIST ::= '(' 'map' MAP * ')' MAP ::= '(' MAP-NAME RULE * ')' MAP-NAME ::= symbol RULE ::= '(' KEYSEQ MAP-ACTION * ')' KEYSEQ ::= mtext | '(' [ symbol | integer ] * ')' MAP-ACTION ::= ACTION ACTION ::= mtext | CANDIDATES | '(' ACTION-NAME ACTION-ARG * ')' CANDIDATES ::= '(' CANDIDATE-GROUP * ')' (in ASCTION, MTEXT is a short form of "(insert MTEXT)" CANDIDATES is a short form of "(candidates CANDIDATE-GROUP *)") ACTION-NAME ::= 'insert' | 'candidates' | 'delete' | 'select' | 'select-group' | 'move' | 'mark' | 'pushback' | 'undo' | 'shift' | 'call' ACTION-ARG ::= integer | symbol | mtext | CANDIDATE-GROUP CANDIDATE-GROUP ::= mtext | '(' mtext * ')' PREDEFINED-SYMBOL ::= '@0' | '@1' | '@2' | '@3' | '@4' | '@5' | '@6' | '@7' | '@8' | '@9' | '@<' | '@=' | '@>' | '@-' | '@+' MARKER ::= PREDEFINED-SYMBOL | symbol CANDIDATE-INDEX ::= PREDEFINED-SYMBOL CANDIDATE-GROUP-INDEX ::= PREDEFINED-SYMBOL ;; The first five actions ('insert' .. 'select-group') are for ;; editing a text in the preediting buffer. The buffer can keep ;; multiple markers. Each marker is represented by a symbol, and ;; keeps a position between characters in the preediting buffer. ;; PREDEFINED-SYMBOL has special meanings when used as MARKER: ;; '@0', '@1', ... '@9' ;; The 0th, 1th, ... 9th position. ;; '@<', '@=', '@>' ;; The first, current, and end position. ;; '@-', '@+' ;; The previous and next position. ;; PREDEFINED-SYMBOL has special meanings when used as CANDIDATE-INDEX: ;; '@0', '@1', ... '@9' ;; The 0th, 1th, ... 9th candidate of the current candidate group. ;; '@<', '@=', '@>' ;; The first, current, and end candidate of the current candidate ;; group. ;; '@-' ;; The previous candidate. If the current candidate is the ;; first of the current candidate group, the last candidate of ;; the previous candidate group. ;; '@+' ;; The next candidate. If the current candidate is the last of ;; the current candidate group, the first candidate of the ;; previous candidate group. ;; PREDEFINED-SYMBOL has special meanings when used as CANDIDATE-INDEX: ;; '@0', '@1', ... '@9' ;; The 0th, 1th, ... 9th candidate of the current candidate group. ;; '@<', '@=', '@>' ;; The first, current, and end candidate of the current candidate ;; group. ;; '@-' ;; The previous candidate. If the current candidate is the ;; first of the current candidate group, the last candidate of ;; the previous candidate group. ;; '@+' ;; The next candidate. If the current candidate is the last of ;; the current candidate group, the first candidate of the ;; previous candidate group. ;; PREDEFINED-SYMBOL has special meanings when used as ;; CANDIDATE-GROUP-INDEX: ;; '@0', '@1', ... '@9' ;; The 0th, 1th, ... 9th candidate group. ;; '@<', '@=', '@>' ;; The first, current, and end candidate group. ;; '@-', '@+' ;; The previous and the next candidate group. ;; (insert MTEXT) inserts MTEXT before @=. ;; (candidates CANDIDATE-GROUP *) set the current candidates list ;; to the list of arguments, set the current candidate group to the ;; first argument, insert the first candidate of the current group ;; before @=, and mark the inserted text as the current candidate. ;; Each element of CANDIDATE-GROUP represents a candidate, i.e. if ;; CANDIDATE-GROUP is MTEXT, each character in MTEXT is a ;; candidate, if CANDIDATE-GROUP is a list of MTEXT, each MTEXT is ;; a candidate. ;; (select CANDIDATE-INDEX) replaces the current candidate with ;; what specified by CANDIDATE-INDEX. If a candidate of the ;; different candidate group is specified, set the current ;; candidate group to that group. ;; (select-group CANDIDATE-GROUP-INDEX) sets the current candidate ;; group to a group indicated by CANDIDATE-GROUP-INDEX, and relaces ;; the current candiate with the candiate of the same index in the ;; new group. ;; (delete MARKER) deletes characters between @= and the position ;; specified by MARKER. ;; (move MARKER) sets @= to the position of specified by MARKER. ;; (mark MARKER) sets MARKER to the position of @=. MARKER must ;; not be PREDEFINED-SYMBOL. ;; (pushback) pushbacks the latest key events to the event queue. ;; (undo) cancels the last key events. ;; (shift STATE-NAME) shifts the current state to the state ;; specified by STATE-NAME. ;; (call FUNCTION ARG *) calls the function FUNCTION of an external ;; module. FUNCTION must be defined in MODULE-LIST. ;; The function is called with a property list (MPlist *) that has ;; these properties in this order. ;; KEY VALUE ;; --- ----- ;; mtext The current preedit text. ;; symbol The current state name (MSymbol). ;; The remaining properties (if any) are ARGs. ;; ;; The function must return a property list (MPlist *) that ;; represents a list of ACTIONs to take. MODULE-LIST ::= '(' 'module' MODULE * ')' MODULE ::= '(' MODULE-NAME FUNCTION * ')' MODULE-NAME ::= mtext FUNCTION ::= symbol STATE-LIST ::= '(' 'state' STATE * ')' STATE ::= '(' STATE-NAME BRANCH * ')' STATE-NAME ::= symbol BRANCH ::= '(' [ MAP-NAME | 'nil' ] BRANCH-ACTION * ')' ;; If MAP-NAME is specified, it must be a name of a map defined in ;; MAP-LIST. Otherwise, BRANCH is the default branch of STATE. BRANCH-ACTION ::= ACTION ;; Example: (title "sample") (maps (single ("a" "A") ("b" "B")) (double ("bb" (("BB" "Bb")))) (select ((Left) (select @-)) ((Right) (select @+)))) (states (init (single) (double (shift selection))) (selection (select))) ;; When this input method is loaded, the following state transition ;; machine is created. ;; STATE-TRANSITION-MACHINE ::= ;; '(' STATE-NAME ROOT-MAP ')' * ;; ROOT-MAP ::= TRANSITION-MAP ;; TRANSITION-MAP ::= ;; '(' INDEX [ KEY | 'nil' ] ;; MAP-ACTIONS TRANSITION-MAPS BRANCH-ACTIONS ')' ;; TRANSITION-MAPS ::= ;; '(' TRANSITION-MAP * ')' ;; MAP-ACTIONS ::= ;; '(' MAP-ACTION * ')' ;; BRANCH-ACTIONS ::= ;; '(' BRANCH-ACTION * ')' (init (#0 nil nil ((#1 'a' ((insert "A")) nil nil) (#2 'b' ((insert "B")) ((#3 'b' ((candidates (("BB" "Bb")))) nil (shift selection))) nil)) nil)) (selection (#4 nil nil ((#5 'Left' ((delete @<) (select @-)) nil nil) (#6 'Right' ((delete @<) (select @+)) nil nil)) nil)) ;; The state transition machine keeps these things: ;; STATE: the current state, initially 'init'. ;; MAP: current transition map, initially #0. ;; PREEDIT: the preediting buffer, initially empty. ;; MARKERS: the positions assigned to each marker, @= is initially 0. ;; PRODUCED: the produced text, initially empty. ;; CANDIDATE-LIST: the current candidate group list, initially NULL. ;; CANDIDATE-GROUP: the current candidate group, initially NULL. ;; CANDIDATE: the the current candidate, initially NULL. ;; ;; When MAP is changed to the root map of the initial state, PREEDIT ;; is concatenated to PRODUCED and reset to empty. This way, the ;; machine produces a text. ;; ;; The machine accepts one key KEY, handles it while updating internal ;; information, and return 'nil' (if KEY is correctly handled) or KEY ;; (if KEY can't be handled in the machine). ;; Here we describes how the key sequence: ;; 'a' 'b' 'b' 'Right' 'b' 'a' ;; is handled by the machine and "aBbba" is produced. ;; 'a' arrives. ;; ;; Lookup the transition maps of #0 for 'a' and find #1. Change MAP ;; to #1. Perform its map actions. Now PREEDIT contains "a". As it ;; has no sub transition maps, no branch actions, no state to shift, ;; change MAP to #0, the root map of the current state. ;; ;; As we have changed MAP to the root map of the initial map, ;; concatenate PREEDIT to PRODUCED, and reset PREEDIT to empty. ;; ;; Now we have consumed the key. As MAP has sub transition maps, wait ;; for the next key. ;; 'b' arrives. ;; ;; Lookup #0 for 'b' and find #2. Change MAP to #2. Perform the map ;; actions. Now, PREEDIT contains "B". As it has sub transition ;; maps, and we have consumed the key, wait for the next key. ;; 'b' arrives. ;; ;; Lookup #2 for 'b' and find #3. Cancel the change in PREEDIT done ;; by the map actions of #2, and change MAP to #3. Perform the map ;; actions. Now, PREEDIT is "BB", CANDIDATE-LIST is (("BB" "Bb")), ;; CANDIDATE is "BB". As #3 has no sub transition maps and no branch ;; actions, change STATE to 'selection' and change MAP to #4, the root ;; map of 'selection'. As it has sub transition maps, and we have ;; consumed the key. Wait for the next key. ;; 'Right' arrives. ;; ;; Lookup #4 for 'Right' and find #5. Change MAP to #5. Perform the ;; map actions. Now PREEDIT is "Bb". As #5 has no sub transition ;; maps, no branch actions, no state to shift, change MAP to #4 (the ;; initial map of the current state 'selection'. ;; ;; As #4 has sub transition maps, and we have consumed the key. Wait ;; for the next key. ;; 'b' arrives. ;; ;; Lookup #4 for 'b' and fail. As #4 has no branch maps, no shift to ;; transit, change STATE to 'init', MAP to #0. ;; ;; As we have changed MAP to the root map of the initial map, ;; concatenate PREEDIT to PRODUCED, and reset PREEDIT to empty. Now ;; PRODUCED has "aBb". ;; ;; As we have not yet consumed the key 'b', handle it in MAP (#0). ;; ;; Lookup #0 for 'b' and find #2. Change MAP to #2. Perform the map ;; actions. Now, PREEDIT contains "B". As it has sub transition ;; maps, and we have consumed the key, wait for the next key. ;; 'a' arrives. Lookup #2 for 'a' and fail. As #2 has no branch ;; actions, no state to shift, Change MAP to #0, the root map of the ;; current state. ;; ;; As we have changed MAP to the root map of the initial map, ;; concatenate PREEDIT to PRODUCED, and reset PREEDIT to empty. Now ;; PRODUCED has "aBbb". ;; ;; As we have not yet consumed the key 'a', handle it in MAP (#0). ;; ;; Lookup #0 for 'a' and find #1. Change MAP to #1. Perform its map ;; actions. Now PREEDIT contains "a". As it has no sub transition ;; maps, no branch actions, no state to shift, change MAP to #0, the ;; root map of the current state. ;; ;; As we have changed MAP to the root map of the initial map, ;; concatenate PREEDIT to PRODUCED, and reset PREEDIT to empty. Now ;; PRODUCED has "aBbba". ;; Now we have consumed the key. As MAP has sub transition maps, wait ;; for the next key. @endverbatim */ ////