X-Git-Url: http://git.chise.org/gitweb/?p=m17n%2Fm17n-docs.git;a=blobdiff_plain;f=data-usr%2Fdbformat.txt;h=0ad20fbad01bdcdc943aef23ca0bd1401367ec76;hp=19812692ab4e6535b057f1f7a4a3b2bbf8007a9b;hb=d5b639418e6f1532218a9ec58738290bd8f87322;hpb=2dbf661e686b887d3519a9dc348fd57fb73cd30b
diff --git a/data-usr/dbformat.txt b/data-usr/dbformat.txt
index 1981269..0ad20fb 100644
--- a/data-usr/dbformat.txt
+++ b/data-usr/dbformat.txt
@@ -1,65 +1,74 @@
-/*** @addtogroup m17nDatabase */
-////
-/*** @{ */
-////
-/***@defgroup m17nDatabaseFormat DatabaseFormat
+/***@page m17nDatabaseFormat Data Format of the m17n database
+
+This section describes formats of data in the m17n database.
+
+@section dbformat General format
-This section describes the formats of predefined @e plist @e type data
-in the m17n database.
+The mdatabase_load () function returns the data specified by tags in
+the form of a plist if the first tag is neither @c Mchartable nor @c
+Mcharset. The keys of the returned plist are limited to
+Minteger, Msymbol, Mtext, and
+Mplist. The type of the value is unambiguously determined by
+the corresponding key. If the key is Minteger, the value is
+an integer. If the key is Msymbol, the value is a symbol.
+And so on.
-@section general General format
+A plist can be represented by a number of expressions. For
+instance, we can use the form (K1:V1, K2:V2, ..., Kn:Vn) to
+represent a plist whose first property key and value are K1 and V1,
+second key and value are K2 and V2, and so on. However, we can use a
+simpler expression here because the types of plists used in the m17n
+database are fairly restricted.
-The m17n library expects that the function mdatabase_load () returns a
-plist of a specific format on loading data identified by a specific
-set of tags. As the plist format used for the database data is
-strongly limited, we can use the equivalent text of simple syntax
-(S-expression) to represent the plist as below.
+Hereafter, we use an expression, which is similar to an S-expression, to
+represent a plist. (Actually, the default database loader of the m17n
+library is designed to read data files written in this expression.)
-The text consists of one or more @e elements . Each element
-represents a property, i.e. a single element of a plist, and the
-sequence of the elements constitute a plist.
+The expression consists of one or more elements. Each element
+represents a property, i.e. a single element of a plist.
-Elements are separated by one or more @e whitespaces, i.e. a space
+Elements are separated by one or more whitespaces, i.e. a space
 (code 32), a tab (code 9), or a newline (code 10). Comments begin with
 a semicolon (;) and extend to the end of the line.
 The key and the value of each property are determined based on the
-form of element as below:
+type of the element as explained below.
 EXAMPLE
-Suppose (K1:V1, K2:V2, ... ,Kn:Vn) represents a plist whose first
-property key and value are K1 and V1, second key and value are K2 and
-V2, and so on. Then the line:
+Here is an example of a plist that is written in the expression
+explained above.
+
 @verbatim
 abc 123 (pqr 0xff) "m\"text" (_\\_ ("string" xyz) -456)
 @endverbatim
-is interpreted as follows.
+
+It represents the following plist:
+
 @verbatim
 (Msymbol:abc,
  Minteger:123,
@@ -98,18 +108,155 @@ is interpreted as follows.
  Minteger:-456))
 @endverbatim
-The default database loader of the m17n library actually read a text
-of this syntax from the database file.
+@section fontenc Font Encoding
+
+The m17n library loads information about the encoding of each font
+from the m17n database by the tags \.
The plist
+format of the data is as follows:
+
+@verbatim
+FONT-ENCODING ::=
+    PER-FONT-INFO *
+
+PER-FONT-INFO ::=
+    '(' FONT-SPEC ENCODING ')'
+
+FONT-SPEC ::=
+    '('
+    [ FOUNDRY FAMILY [ WEIGHT [ STYLE [ STRETCH [ ADSTYLE ] ] ] ] ] REGISTRY
+    ')'
+@endverbatim
+
+@c FOUNDRY to @c REGISTRY are symbols specifying the corresponding
+XLFD font name fields. Omitted symbols are regarded as @c nil, and
+@c nil means a wild card. For instance, this @c FONT-SPEC:
+
+@verbatim
+    (nil alice0\ lao iso8859-1)
+@endverbatim
+
+should be applied to all fonts whose family is "alice0 lao" and
+whose registry is "iso8859-1".
+
+@c ENCODING is a symbol representing a charset. A font matching @c
+FONT-SPEC supports all characters of the charset, and a character code
+is mapped to the corresponding glyph code of the font by this charset.
+
+
+@section fontsize Font Resizing
+
+In some cases, a font contains incorrect information about its size
+(typically in the case of a hacked TrueType font), which results in a
+bad text layout when such a font is used in combination with other
+fonts. To overcome this problem, the m17n library loads information
+about font-size correction from the m17n database by the tags \. The plist format of the data is as follows:
+
+@verbatim
+FONT-RESIZE ::=
+    PER-FONT-INFO *
+
+PER-FONT-INFO ::=
+    '(' FONT-SPEC RESIZE-RATIO ')'
+
+FONT-SPEC ::=
+    '('
+    [ FOUNDRY FAMILY [ WEIGHT [ STYLE [ STRETCH [ ADSTYLE ] ] ] ] ] REGISTRY
+    ')'
+@endverbatim
-Here after we describes the plist format of each data by this syntax.
+The meanings of @c FOUNDRY to @c REGISTRY are the same as in @e Font
+@e Encoding. @c RESIZE-RATIO is an integer specifying, as a
+percentage, how much the font size must be adjusted. For instance,
+this @c PER-FONT-INFO:
+
+@verbatim
+    ((devanagari-cdac) 150)
+@endverbatim
+
+means that, to use a font of registry "devanagari-cdac" with a
+specific size, we have to open one that is 1.5 times bigger.
+
+@section fontset Fontset
+
+The m17n library loads a fontset definition from the m17n database by
+the tags \. The plist format of the data is
+as follows:
+
+@verbatim
+FONTSET ::=
+    PER-SCRIPT * PER-CHARSET * FALLBACK *
+
+PER-SCRIPT ::=
+    '(' SCRIPT PER-LANGUAGE + ')'
+
+PER-LANGUAGE ::=
+    '(' LANGUAGE FONT-SPEC-ELEMENT + ')'
+
+PER-CHARSET ::=
+    '(' CHARSET FONT-SPEC-ELEMENT + ')'
+
+FALLBACK ::=
+    FONT-SPEC-ELEMENT
+
+FONT-SPEC-ELEMENT ::=
+    '(' FONT-SPEC [ FLT-NAME ] ')'
+
+FONT-SPEC ::=
+    '('
+    [ FOUNDRY FAMILY [ WEIGHT [ STYLE [ STRETCH [ ADSTYLE ] ] ] ] ] REGISTRY
+    ')'
+@endverbatim
+
+@c SCRIPT is a symbol of a script name (e.g. latin, han) or @c nil. @c
+LANGUAGE is a two-letter symbol of a language name code defined by ISO
+639 (e.g. ja, zh) or @c nil. The meanings of @c FOUNDRY to @c
+REGISTRY are the same as in @e Font @e Encoding. @c FLT-NAME is a name
+of @ref flt.
+
+For instance, this @c PER-SCRIPT:
+
+@verbatim
+(han
+  (ja
+    ((jisx0208.1983-0)))
+  (zh
+    ((gb2312.1980-0)))
+  (nil
+    ((big5-0))))
+@endverbatim
+
+instructs the rendering engine to use a font of registry
+"jisx0208.1983-0" for a "han" character that has @c Mlanguage text
+property "ja" if the character is in the repertories of such fonts.
+Otherwise, try a font of registry "gb2312.1980-0" or "big5-0". If a
+"han" character does not have @c Mlanguage text property, try all
+three fonts.
 @section flt Font Layout Table
+Usually, the rendering engine converts character codes into glyph
+codes one by one by consulting information about the encoding of each
+selected font. But, for rendering a text that requires complicated
+layout (e.g. Thai and Indic), such a one-to-one conversion is not
+sufficient. In addition, some glyphs must be shifted 2-dimensionally
+on the screen. For such a case, a font layout table (FLT for short)
+must be used.
+
+An FLT can contain information equivalent to the OpenType Layout Tables
+(CMAP, GSUB, and GPOS) in addition to the information about how to
+extract a grapheme cluster and how to re-order characters.
+
+The m17n library loads an FLT from the m17n database by the tags
+\. The plist format of the data is as
+follows:
+
 @verbinclude flt.txt
 @section im Input Method
-@verbinclude im.txt */
-////
-/*** @} */
+@verbinclude im.txt
+
+*/
+
 ////