/***@page m17nDatabaseFormat DatabaseFormat

This section describes the data formats used in the m17n database.

@section dbformat General format
The mdatabase_load () function receives a set of tags and returns the
contents of the database in the form of a plist. The keys of the
returned plist are limited to <tt>Minteger</tt>, <tt>Msymbol</tt>,
<tt>Mtext</tt>, and <tt>Mplist</tt>. The type of each value is
unambiguously determined by the corresponding key: if the key is
<tt>Minteger</tt>, the value is an integer; if the key is
<tt>Msymbol</tt>, the value is a symbol; and so on.

A plist can be written down in several ways. For instance, the form
<tt>(K1:V1, K2:V2, ..., Kn:Vn)</tt> represents a plist whose first
property has key K1 and value V1, whose second property has key K2 and
value V2, and so on. However, a simpler expression suffices here
because the types of plists used in the m17n database are fairly
restricted.
Hereafter, we use an expression similar to an S-expression to
represent a plist. (Actually, the default database loader of the m17n
library is designed to read data files written in this expression.)

The expression consists of one or more <i>elements</i>. Each
element represents a property, i.e. a single element of a plist.

Elements are separated by one or more <i>whitespace</i> characters,
i.e. a space (code 32), a tab (code 9), or a newline (code 10). Comments
begin with a semicolon (<tt>;</tt>) and extend to the end of the line.

The key and the value of each property are determined by the type of
the element, as explained below.
An element that matches the regular expression <tt>-?[0-9]+</tt> or
<tt>0[xX][0-9A-Fa-f]+</tt> represents a property whose key is
<tt>Minteger</tt>. An element matching the former expression is
interpreted as an integer in decimal notation, and one matching the
latter as an integer in hexadecimal notation. The value of the
property is the result of that interpretation.
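
The two patterns for integer elements can be tried out with a few lines
of Python. This is only an illustrative sketch, not the loader used by
the m17n library; the function name <tt>parse_integer_element</tt> is
made up for this example.

```python
import re

# The two patterns quoted in the text.
DECIMAL = re.compile(r"-?[0-9]+")
HEXADECIMAL = re.compile(r"0[xX][0-9A-Fa-f]+")


def parse_integer_element(element):
    """Return the integer an element stands for, or None if it is not an integer element.

    Hypothetical helper for illustration only.
    """
    if HEXADECIMAL.fullmatch(element):
        # int() with base 16 accepts the leading "0x" / "0X" prefix.
        return int(element, 16)
    if DECIMAL.fullmatch(element):
        return int(element, 10)
    return None


print(parse_integer_element("0xff"))
print(parse_integer_element("-456"))
print(parse_integer_element("abc"))
```

Both patterns are applied to the whole element, so an element such as
<tt>12ab</tt> is not treated as an integer by this sketch.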
For instance, the element <tt>0xA0</tt> represents a property whose value is
160.
An element that matches the regular expression
<tt>[^-0-9(]([^\\()]|\\.)+</tt> represents a property whose key is
<tt>Msymbol</tt>. In the element, <tt>\\t</tt>, <tt>\\n</tt>,
<tt>\\r</tt>, and <tt>\\e</tt> are replaced with tab (code 9), newline
(code 10), carriage return (code 13), and escape (code 27),
respectively. Any other character following a backslash is interpreted
as itself. The value of the property is the symbol having the
resulting string as its name.

For instance, the element <tt>abc\ def</tt> represents a property
whose value is the symbol having the name "abc def".
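
The escape rules for symbol elements can be sketched as a small Python
function. This is an illustration under the rules stated above, not the
library's implementation; <tt>resolve_symbol_escapes</tt> is a
hypothetical name, and keeping a lone trailing backslash unchanged is an
assumption of this sketch.

```python
# Escapes named in the text; any other escaped character stands for itself.
ESCAPES = {"t": "\t", "n": "\n", "r": "\r", "e": "\x1b"}


def resolve_symbol_escapes(element):
    """Return the symbol name for a symbol element (hypothetical helper)."""
    out = []
    i = 0
    while i < len(element):
        ch = element[i]
        if ch == "\\" and i + 1 < len(element):
            following = element[i + 1]
            out.append(ESCAPES.get(following, following))
            i += 2
        else:
            # Ordinary character, or a lone backslash at the very end
            # (kept as is; the text does not cover that case).
            out.append(ch)
            i += 1
    return "".join(out)


# A raw string keeps the backslash exactly as it appears in a data file.
print(resolve_symbol_escapes(r"abc\ def"))
```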
An element that matches the regular expression <tt>"([^"]|\\")*"</tt>
represents a property whose key is <tt>Mtext</tt>. The backslash escapes
explained above also apply here. Moreover, each part of the
element matching the regular expression
<tt>\\[xX][0-9A-Fa-f][0-9A-Fa-f]</tt> is replaced with its hexadecimal
interpretation.

After the backslash escapes have been resolved, the byte sequence between
the double quotes is interpreted as a UTF-8 sequence and decoded into
an M-text. This M-text is the value of the property.
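
The two-step reading of an M-text element (resolve escapes on bytes,
then decode as UTF-8) might look as follows in Python. This is a sketch
only: <tt>decode_mtext_element</tt> is a hypothetical name, and
resolving the hexadecimal and the single-character escapes in one
left-to-right pass is an assumption, since the text does not fix the
order.

```python
import re

# Single-character escapes shared with symbol elements.
SIMPLE = {b"t": b"\t", b"n": b"\n", b"r": b"\r", b"e": b"\x1b"}

# Either a two-digit hexadecimal escape or a backslash followed by one byte.
ESCAPE = re.compile(rb"\\[xX]([0-9A-Fa-f]{2})|\\(.)", re.DOTALL)


def decode_mtext_element(element: bytes) -> str:
    """Decode a double-quoted element into a Python string (hypothetical helper)."""
    if len(element) < 2 or element[:1] != b'"' or element[-1:] != b'"':
        raise ValueError("not an Mtext element")

    def replace(match):
        if match.group(1) is not None:
            # Hexadecimal escape: becomes a single byte with that value.
            return bytes([int(match.group(1).decode("ascii"), 16)])
        # Named escape, or any other byte taken as itself.
        return SIMPLE.get(match.group(2), match.group(2))

    raw = ESCAPE.sub(replace, element[1:-1])
    # The resolved byte sequence is interpreted as UTF-8.
    return raw.decode("utf-8")


# Three hexadecimal escapes that together form one UTF-8 encoded character.
text = decode_mtext_element(rb'"\xE3\x81\x82"')
print(len(text), hex(ord(text)))
```

Working on bytes rather than on decoded characters matters here: a
hexadecimal escape contributes one byte, and several of them may be
needed to form a single character.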
Zero or more elements surrounded by a pair of parentheses represent a
property whose key is <tt>Mplist</tt>. Whitespace before and after a
parenthesis can be omitted. The value of the property is a plist,
which is the result of recursively interpreting the elements
between the parentheses.
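Here is an example of a plist written in the expression explained
above:

@verbatim
abc 123 (pqr 0xff) "m\"text" (_\\_ ("string" xyz) -456)
@endverbatim

It represents the following plist:

@verbatim
(Msymbol:abc,
 Minteger:123,
 Mplist:(Msymbol:pqr,
         Minteger:255),
 Mtext:m"text,
 Mplist:(Msymbol:_\_,
         Mplist:(Mtext:string,
                 Msymbol:xyz),
         Minteger:-456))
@endverbatim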
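
Putting the element rules of this section together, a compact reader
can be sketched in Python. It is an illustration only, not the default
loader of the m17n library. All function names are hypothetical, a
plist is modelled as a list of (key, value) pairs, and treating every
other token (for example one that starts with a digit but is not an
integer) as a symbol is an assumption of this sketch.

```python
import re

# Single-character escapes; any other escaped character stands for itself.
SIMPLE = {"t": "\t", "n": "\n", "r": "\r", "e": "\x1b"}
WHITESPACE = " \t\n"


def unescape(raw):
    """Resolve backslash escapes in a symbol element."""
    return re.sub(r"\\(.)", lambda m: SIMPLE.get(m.group(1), m.group(1)),
                  raw, flags=re.DOTALL)


def decode_mtext(raw):
    """Resolve escapes at the byte level, then decode the bytes as UTF-8."""
    def repl(m):
        if m.group(1) is not None:  # two-digit hexadecimal escape
            return bytes([int(m.group(1).decode("ascii"), 16)])
        ch = m.group(2).decode("latin-1")
        return SIMPLE.get(ch, ch).encode("latin-1")
    data = re.sub(rb"\\[xX]([0-9A-Fa-f]{2})|\\(.)", repl,
                  raw.encode("utf-8"), flags=re.DOTALL)
    return data.decode("utf-8")


def skip_escaped(text, pos, stops):
    """Advance to the first unescaped character in stops, or to the end."""
    while pos < len(text) and text[pos] not in stops:
        pos += 2 if (text[pos] == "\\" and pos + 1 < len(text)) else 1
    return pos


def read_elements(text, pos=0, nested=False):
    """Return (plist, next_pos); plist is a list of (key, value) pairs."""
    plist = []
    while pos < len(text):
        ch = text[pos]
        if ch in WHITESPACE:
            pos += 1
        elif ch == ";":  # comment: skip to the end of the line
            while pos < len(text) and text[pos] != "\n":
                pos += 1
        elif ch == "(":
            value, pos = read_elements(text, pos + 1, nested=True)
            plist.append(("Mplist", value))
        elif ch == ")":
            if not nested:
                raise ValueError("unbalanced ')'")
            return plist, pos + 1
        elif ch == '"':
            end = skip_escaped(text, pos + 1, '"')
            if end >= len(text):
                raise ValueError("unterminated M-text")
            plist.append(("Mtext", decode_mtext(text[pos + 1:end])))
            pos = end + 1
        else:
            end = skip_escaped(text, pos, WHITESPACE + "()")
            token = text[pos:end]
            if re.fullmatch(r"0[xX][0-9A-Fa-f]+", token):
                plist.append(("Minteger", int(token, 16)))
            elif re.fullmatch(r"-?[0-9]+", token):
                plist.append(("Minteger", int(token)))
            else:
                plist.append(("Msymbol", unescape(token)))
            pos = end
    if nested:
        raise ValueError("missing ')'")
    return plist, pos


def read_plist(text):
    """Parse a whole expression into a list of (key, value) pairs."""
    return read_elements(text)[0]


# The sample expression from this section, kept verbatim in a raw string.
sample = r'abc 123 (pqr 0xff) "m\"text" (_\\_ ("string" xyz) -456)'
for key, value in read_plist(sample):
    print(key, repr(value))
```

The recursion in <tt>read_elements</tt> mirrors the rule for
<tt>Mplist</tt>: a parenthesized group is read by the same routine that
reads the top level.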
@section fontenc Font Encoding

The m17n library loads information about the encoding of each font
from the m17n database by the tags \<font, encoding\>. The plist
format of the data is as follows:
@verbatim
PER-FONT-INFO ::=
    '(' FONT-SPEC ENCODING ')'

FONT-SPEC ::=
    '(' [ FOUNDRY FAMILY [ WEIGHT [ STYLE [ STRETCH [ ADSTYLE ] ] ] ] ] REGISTRY ')'
@endverbatim
@c FOUNDRY to @c REGISTRY are symbols specifying the corresponding
XLFD font name fields. Omitted symbols are regarded as @c nil, and
@c nil means a wildcard. For instance, this @c FONT-SPEC:

@verbatim
(nil alice0\ lao iso8859-1)
@endverbatim

applies to all fonts whose family is "alice0 lao" and whose
registry is "iso8859-1".

@c ENCODING is a charset symbol. A font matching @c FONT-SPEC
supports all characters of the charset, and a character code is mapped
to the corresponding glyph code of the font by this charset.
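
The wildcard behaviour of a @c FONT-SPEC can be illustrated with a short
Python sketch. It is not the matching code of the m17n library: the
function names and the dictionary used to describe a font are
hypothetical, and comparing field values by exact string equality is an
assumption.

```python
# Optional leading fields of a FONT-SPEC, in the order given by the grammar.
FIELDS = ("foundry", "family", "weight", "style", "stretch", "adstyle")


def expand_font_spec(symbols):
    """Map the symbols of a FONT-SPEC onto named fields; None stands for nil."""
    if not symbols:
        raise ValueError("a FONT-SPEC needs at least REGISTRY")
    *leading, registry = symbols
    # FOUNDRY and FAMILY are only optional as a pair.
    if len(leading) == 1 or len(leading) > len(FIELDS):
        raise ValueError("unexpected number of symbols in FONT-SPEC")
    spec = dict.fromkeys(FIELDS)  # omitted fields start as None (nil)
    for name, symbol in zip(FIELDS, leading):
        spec[name] = None if symbol == "nil" else symbol
    spec["registry"] = None if registry == "nil" else registry
    return spec


def font_matches(spec, font):
    """Return True if every non-nil field of spec equals the font's field."""
    return all(value is None or font.get(name) == value
               for name, value in spec.items())


# The FONT-SPEC from the text, after escape resolution of its symbols.
spec = expand_font_spec(["nil", "alice0 lao", "iso8859-1"])
# Hypothetical font description, for illustration only.
font = {"foundry": "misc", "family": "alice0 lao", "registry": "iso8859-1"}
print(font_matches(spec, font))
```

Because the foundry is @c nil and the remaining optional fields are
omitted, only the family and the registry take part in the comparison.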
@section fontsize Font Resizing

In some cases, a font contains incorrect information about its size
(typically in the case of a hacked TrueType font), which results in
bad text layout when such a font is used in combination with other
fonts. To overcome this problem, the m17n library loads information
about font-size correction from the m17n database by the tags \<font,
resize\>. The plist format of the data is as follows:
@verbatim
PER-FONT-INFO ::=
    '(' FONT-SPEC RESIZE-RATIO ')'

FONT-SPEC ::=
    '(' [ FOUNDRY FAMILY [ WEIGHT [ STYLE [ STRETCH [ ADSTYLE ] ] ] ] ] REGISTRY ')'
@endverbatim
The meanings of @c FOUNDRY to @c REGISTRY are the same as in @e Font
@e Encoding. @c RESIZE-RATIO is an integer specifying, as a
percentage, how much the font size must be adjusted. For instance,
this @c PER-FONT-INFO:

@verbatim
((devanagari-cdac) 150)
@endverbatim

means that, to use a font of registry "devanagari-cdac" at a
given size, we have to open a font 1.5 times larger.
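
The arithmetic behind @c RESIZE-RATIO is a plain percentage scaling,
shown here as a Python sketch. The function names and the lookup table
are hypothetical, the unit of the size is left abstract, and keying the
table by registry alone is a simplification: a real @c FONT-SPEC may
also constrain the other fields.

```python
def size_to_open(requested_size, resize_ratio=100):
    """Scale a requested size by RESIZE-RATIO, which is given in percent."""
    return requested_size * resize_ratio / 100


# Hypothetical table built from entries such as ((devanagari-cdac) 150).
RESIZE_BY_REGISTRY = {"devanagari-cdac": 150}


def adjusted_size(registry, requested_size):
    """Return the size to open for a font of the given registry."""
    # Fonts without an entry are opened at the requested size (ratio 100).
    ratio = RESIZE_BY_REGISTRY.get(registry, 100)
    return size_to_open(requested_size, ratio)


print(adjusted_size("devanagari-cdac", 12))
print(adjusted_size("iso8859-1", 12))
```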
@section fontset Fontset

The m17n library loads a fontset definition from the m17n database by
the tags \<fontset, FONTSET-NAME\>. The plist format of the data is
as follows:
@verbatim
PER-SCRIPT ::=
    '(' SCRIPT PER-LANGUAGE + ')'

PER-LANGUAGE ::=
    '(' LANGUAGE FONT-SPEC-ELEMENT + ')'

FONT-SPEC-ELEMENT ::=
    '(' FONT-SPEC [ FLT-NAME ] ')'

FONT-SPEC ::=
    '(' [ FOUNDRY FAMILY [ WEIGHT [ STYLE [ STRETCH [ ADSTYLE ] ] ] ] ] REGISTRY ')'
@endverbatim
@c SCRIPT is a symbol naming a script (e.g. latin, han), or @c nil.
@c LANGUAGE is a two-letter language code symbol defined by ISO
639 (e.g. ja, zh), or @c nil. The meanings of @c FOUNDRY to
@c REGISTRY are the same as in @e Font @e Encoding. @c FLT-NAME is a name

For instance, this @c PER-SCRIPT:

instructs the rendering engine to use a font of registry
"jisx0208.1983-0" for a "han" character that has the @c Mlanguage text
property "ja", provided the character is in the repertoire of such a font.
Otherwise, it tries a font of registry "gb2312.1980-0" or "big5-0". If a
"han" character does not have an @c Mlanguage text property, try all
@section flt Font Layout Table

Usually, the rendering engine converts the character codes of a text into
glyph codes one by one by consulting the encoding information of
each selected font. However, for rendering text that requires
complicated layout (e.g. Thai and Indic scripts), such a one-to-one
conversion is not sufficient. In addition, some glyphs must be
shifted two-dimensionally on the screen. In such cases, a font layout
table (FLT for short) must be used.

An FLT can contain all the information in the OpenType Layout Tables (CMAP,
GSUB, and GPOS) in addition to information about how to extract a
grapheme cluster and how to reorder characters.

The m17n library loads an FLT from the m17n database by the tags
\<font, layouter, FLT-NAME\>. The plist format of the data is as

@section im Input Method