X-Git-Url: http://git.chise.org/gitweb/?p=m17n%2Fm17n-docs.git;a=blobdiff_plain;f=data-usr%2Fdbformat.txt;h=0ad20fbad01bdcdc943aef23ca0bd1401367ec76;hp=3a8ecc1a92eeaa20cd267c763e36cc94ec41cd9c;hb=d5b639418e6f1532218a9ec58738290bd8f87322;hpb=021ab1a22c30d1edf07b79e6c99d2adef45e7476
diff --git a/data-usr/dbformat.txt b/data-usr/dbformat.txt
index 3a8ecc1..0ad20fb 100644
--- a/data-usr/dbformat.txt
+++ b/data-usr/dbformat.txt
@@ -1,61 +1,74 @@
-/***@page m17nDatabaseFormat DatabaseFormat
+/***@page m17nDatabaseFormat Data Format of the m17n database
 
-This section describes the formats of predefined @e plist @e type data
-in the m17n database.
+This section describes the formats of data in the m17n database.
 
 @section dbformat General format
 
-The m17n library expects that the function mdatabase_load () returns a
-plist of a specific format on loading data identified by a specific
-set of tags. As the plist format used for the database data is
-strongly limited, we can use the equivalent text of simple syntax
-(S-expression) to represent the plist as below.
-
-The text consists of one or more @e elements . Each element
-represents a property, i.e. a single element of a plist, and the
-sequence of the elements constitute a plist.
-
-Elements are separated by one or more @e whitespaces, i.e. a space
+The mdatabase_load () function returns the data specified by tags in
+the form of a plist if the first tag is neither @c Mchartable nor @c
+Mcharset. The keys of the returned plist are limited to
+Minteger, Msymbol, Mtext, and
+Mplist. The type of the value is unambiguously determined by
+the corresponding key. If the key is Minteger, the value is
+an integer. If the key is Msymbol, the value is a symbol.
+Likewise, an Mtext key has an M-text value and an Mplist key a plist.
+
+A plist can be written in several notations. For
+instance, we can use the form (K1:V1, K2:V2, ..., Kn:Vn) to
+represent a plist whose first property key and value are K1 and V1,
+second key and value are K2 and V2, and so on. However, we can use a
+simpler expression here because the types of plists used in the m17n
+database are fairly restricted.
+
+Hereafter, we use an expression similar to an S-expression to
+represent a plist. (Actually, the default database loader of the m17n
+library is designed to read data files written in this expression.)
+
+The expression consists of one or more elements. Each element
+represents a property, i.e. a single element of a plist.
+
+Elements are separated by one or more whitespace characters, i.e. a space
 (code 32), a tab (code 9), or a newline (code 10). Comments begin
 with a semicolon (;) and extend to the end of the line.
 
 The key and the value of each property are determined based on the
-form of element as below:
+type of the element as explained below.
 
 EXAMPLE
 
-Suppose (K1:V1, K2:V2, ... ,Kn:Vn) represents a plist whose first
-property key and value are K1 and V1, second key and value are K2 and
-V2, and so on. Then the line:
+Here is an example of a plist written in the expression
+explained above.
+
 @verbatim
 abc 123 (pqr 0xff) "m\"text" (_\\_ ("string" xyz) -456)
 @endverbatim
-is interpreted as follows.
+
+It represents the following plist:
+
 @verbatim
 (Msymbol:abc,
  Minteger:123,
@@ -94,11 +108,6 @@ is interpreted as follows.
  Minteger:-456))
 @endverbatim
 
-The default database loader of the m17n library actually read a text
-of this syntax from the database file.
-
-Here after we describes the plist format of each data by this syntax.
-
 @section fontenc Font Encoding
 
 The m17n library loads information about the encoding of each font
@@ -107,7 +116,7 @@ format of the data is as follows:
 
 @verbatim
 FONT-ENCODING ::=
-    '(' PER-FONT-INFO * ')'
+    PER-FONT-INFO *
 
 PER-FONT-INFO ::=
     '(' FONT-SPEC ENCODING ')'
@@ -129,14 +138,15 @@ XLFD font name fields. Omitted symbols are regarded as @c nil, and
 should be applied to all fonts whose family is "alice0 lao", and
 registry is "iso8859-1".
 
-@c ENCODING is a charset symbol. A font matching @c FONT-SPEC
-supports all characters of the charset, and a character code is mapped
-to the corresponding glyph code of the font by this charset.
+@c ENCODING is a symbol representing a charset. A font matching @c
+FONT-SPEC supports all characters of the charset, and a character code
+is mapped to the corresponding glyph code of the font by this charset.
+
 
 @section fontsize Font Resizing
 
 In some case, a font contains incorrect information about its size
-(typically in the case of a hacked TrueType font), and results in a
+(typically in the case of a hacked TrueType font), which results in a
 bad text layout when such a font is used in combination with the
 other fonts. To overcome this problem, the m17n library loads
 information about font-size correction from the m17n database by the
 tags \. The plist format of the data is as follows:
 
 @verbatim
 FONT-RESIZE ::=
-    '(' PER-FONT-INFO * ')'
+    PER-FONT-INFO *
 
 PER-FONT-INFO ::=
     '(' FONT-SPEC RESIZE-RATIO ')'
@@ -175,7 +185,7 @@ as follows:
 
 @verbatim
 FONTSET ::=
-    '(' PER-SCRIPT + ')'
+    PER-SCRIPT * PER-CHARSET * FALLBACK *
 
 PER-SCRIPT ::=
     '(' SCRIPT PER-LANGUAGE + ')'
@@ -183,6 +193,12 @@ PER-SCRIPT ::=
 PER-LANGUAGE ::=
     '(' LANGUAGE FONT-SPEC-ELEMENT + ')'
 
+PER-CHARSET ::=
+    '(' CHARSET FONT-SPEC-ELEMENT + ')'
+
+FALLBACK ::=
+    FONT-SPEC-ELEMENT
+
 FONT-SPEC-ELEMENT ::=
     '(' FONT-SPEC [ FLT-NAME ] ')'
 
@@ -192,7 +208,7 @@ FONT-SPEC ::=
     ')'
 @endverbatim
 
-@c SCRIPT is a symbol of script name (e.g. latin, han), or @c nil. @c
+@c SCRIPT is a symbol of script name (e.g. latin, han) or @c nil. @c
 LANGUAGE is a two-letter symbol of language name code defined by ISO
 639 (e.g. ja, zh) or @c nil. The meanings of @c FOUNDRY to @c
 REGISTRY are the same as @e Font @e Encoding. @c FLT-NAME is a name
@@ -214,22 +230,22 @@ instructs the rendering engine to use a font of registry
 "jisx0208.1983-0" for a "han" character that has @c Mlanguage text propert
 "ja" if the character is in the repertories of such fonts.
 Otherwise, try a font of registry "gb2312.1980-0" or "big5-0". If a
-"han" character doesn not have @c Mlangauge text property, try all
+"han" character does not have @c Mlanguage text property, try all
 three fonts.
 
 @section flt Font Layout Table
 
-Usually, the rendering engine converts character codes of a text into
-glyph codes one by one by consulting information about encoding of
-each selected font. But, for rendering a text that requires
-complicated layouting (e.g. Thai and Indic), such an one to one
-conversion is not sufficient. In addition, some glyphs must be
-shifted 2-dimensionally on the screen. For such a case, a font layout
-table (FLT in short) must be used.
-
-A FLT can contain all the information in OpenType Layout Table (CMAP,
-GSUB, and GPOS) in addition to the information about how to extract a
-grapheme cluster and how to re-order characters.
+Usually, the rendering engine converts the character codes of a text
+into glyph codes one by one by consulting information about the
+encoding of each selected font. But for rendering a text that requires
+complicated layout (e.g. Thai and Indic), such a one-to-one conversion
+is not sufficient. In addition, some glyphs must be shifted
+two-dimensionally on the screen. In such cases, a font layout table
+(FLT in short) must be used.
+
+An FLT can contain information equivalent to the OpenType Layout
+Tables (CMAP, GSUB, and GPOS) in addition to information about how to
+extract a grapheme cluster and how to re-order characters.
 
 The m17n library loads a FLT from the m17n database by the tags
 \. The plist format of the data is as