/***@page m17nDatabaseFormat Data Format of the m17n database
This section describes formats of data in the m17n database.
@section dbformat General format
The mdatabase_load () function returns the data specified by tags in
the form of a plist if the first tag is neither @c Mchartable nor
@c Mcharset. The keys of the returned plist are limited to
@c Minteger, @c Msymbol, @c Mtext, and @c Mplist. The type of each
value is unambiguously determined by the corresponding key: if the key
is @c Minteger, the value is an integer; if it is @c Msymbol, the value
is a symbol; if it is @c Mtext, the value is an M-text; and if it is
@c Mplist, the value is a plist.
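The way a key fixes the type of its value can be illustrated in C with a tagged union. This is a minimal sketch. The names @c PropKey, @c Property, @c Plist, @c key_name, and @c print_property are hypothetical, not the m17n library's own types, and a UTF-8 C string stands in for an M-text.

```c
#include <assert.h>
#include <stdio.h>

// Which of the four keys a property has (hypothetical names).
typedef enum { KEY_INTEGER, KEY_SYMBOL, KEY_TEXT, KEY_PLIST } PropKey;

struct Plist;                       // forward declaration for nesting

// One property: the key tag alone says which union member is valid.
typedef struct Property {
    PropKey key;
    union {
        int integer;                // valid when key == KEY_INTEGER
        const char *symbol_name;    // valid when key == KEY_SYMBOL
        const char *text_utf8;      // valid when key == KEY_TEXT (M-text stand-in)
        struct Plist *plist;        // valid when key == KEY_PLIST
    } value;
} Property;

// A plist modeled as an array of properties.
typedef struct Plist {
    Property *props;
    size_t count;
} Plist;

// Name of the m17n key that corresponds to a PropKey tag.
const char *key_name(PropKey key)
{
    switch (key) {
    case KEY_INTEGER: return "Minteger";
    case KEY_SYMBOL:  return "Msymbol";
    case KEY_TEXT:    return "Mtext";
    case KEY_PLIST:   return "Mplist";
    }
    return "?";
}

// Print one property in the K:V form; switching on the key is exactly
// how a reader knows which union member to look at.
void print_property(const Property *p)
{
    assert(p != NULL);
    switch (p->key) {
    case KEY_INTEGER: printf("Minteger:%d", p->value.integer); break;
    case KEY_SYMBOL:  printf("Msymbol:%s", p->value.symbol_name); break;
    case KEY_TEXT:    printf("Mtext:%s", p->value.text_utf8); break;
    case KEY_PLIST:   printf("Mplist:(%zu properties)", p->value.plist->count); break;
    }
}
```

Because the key is stored next to the value, no separate type field is needed, which mirrors the statement that the value type is unambiguous once the key is known.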
A plist can be written in several ways. For instance, the form
(K1:V1, K2:V2, ..., Kn:Vn) represents a plist whose first property key
and value are K1 and V1, whose second key and value are K2 and V2, and
so on. However, because the types of plists used in the m17n database
are fairly restricted, a simpler notation can be used here.
Hereafter, we represent a plist with a notation similar to
S-expressions. (In fact, the default database loader of the m17n
library is designed to read data files written in this notation.)
An expression consists of one or more elements, each of which
represents a property, i.e. a single element of a plist. Elements are
separated by one or more whitespace characters, i.e. a space (code 32),
a tab (code 9), or a newline (code 10). A comment begins with a
semicolon (;) and extends to the end of the line.
The key and the value of each property are determined based on the
type of the element as explained below.
- INTEGER
An element that matches the regular expression -?[0-9]+ or
0[xX][0-9A-Fa-f]+ represents a property whose key is
Minteger. An element matching the former expression is
interpreted as an integer in decimal notation, and one matching the
latter is interpreted as an integer in hexadecimal notation. The
value of the property is the result of interpretation.
For instance, the element 0xA0 represents a property whose value is
160 in decimal.
- SYMBOL
An element that matches the regular expression
[^-0-9(]([^\\()]|\\.)+ represents a property whose key is
Msymbol. In the element, \\t, \\n,
\\r, and \\e are replaced with tab (code 9), newline
(code 10), carriage return (code 13), and escape (code 27)
respectively. Any other character following a backslash is taken
literally. The value of the property is the symbol whose name is the
resulting string.
For instance, the element abc\ def represents a property
whose value is the symbol having the name "abc def".
- M-TEXT
An element that matches the regular expression "([^"]|\\")*"
represents a property whose key is Mtext. The backslash escape
explained above also applies here. Moreover, each part of the
element matching the regular expression
\\[xX][0-9A-Fa-f][0-9A-Fa-f] is replaced with the single byte whose
value is given by its two hexadecimal digits.
After the backslash escapes have been resolved, the byte sequence
between the double quotes is interpreted as a UTF-8 sequence and
decoded into an M-text, which is the value of the property.
- PLIST
Zero or more elements surrounded by a pair of parentheses represent a
property whose key is Mplist. Whitespace before and after a
parenthesis may be omitted. The value of the property is a plist,
which is the result of recursive interpretation of the elements
between the parentheses.
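The decoding rules for the atomic elements (INTEGER, SYMBOL, and M-TEXT) can be sketched in C as follows. The functions @c decode_integer and @c decode_escapes are hypothetical helpers written for this illustration, not part of the m17n API. The sketch stops at the resolved byte sequence and does not perform the final UTF-8 decoding into an M-text.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

// INTEGER: accept "-?[0-9]+" (decimal) or "0[xX][0-9A-Fa-f]+" (hex).
// Returns 1 and stores the value in *out on success, 0 if s is not an
// INTEGER element. Overflow is not checked in this sketch.
int decode_integer(const char *s, long *out)
{
    assert(s != NULL && out != NULL);
    int hex = s[0] == '0' && (s[1] == 'x' || s[1] == 'X');
    const char *digits = hex ? s + 2 : (s[0] == '-' ? s + 1 : s);
    if (*digits == '\0')
        return 0;
    for (const char *p = digits; *p != '\0'; p++)
        if (hex ? !isxdigit((unsigned char) *p) : !isdigit((unsigned char) *p))
            return 0;
    *out = strtol(s, NULL, hex ? 16 : 10);   // base 16 accepts the 0x prefix
    return 1;
}

// Value of one hexadecimal digit; the caller has checked isxdigit.
static int hex_value(char c)
{
    return isdigit((unsigned char) c) ? c - '0'
                                      : tolower((unsigned char) c) - 'a' + 10;
}

// SYMBOL and M-TEXT: resolve backslash escapes in src[0..len) into dst,
// which must hold at least len + 1 bytes, and return the decoded length.
// The escapes t, n, r, e become TAB, LF, CR, ESC. If allow_hex is nonzero
// (the M-TEXT case), x or X followed by two hex digits becomes that byte.
// Any other escaped character is copied literally.
size_t decode_escapes(const char *src, size_t len, char *dst, int allow_hex)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        char c = src[i];
        if (c == '\\' && i + 1 < len) {
            c = src[++i];
            if (c == 't')
                c = '\t';
            else if (c == 'n')
                c = '\n';
            else if (c == 'r')
                c = '\r';
            else if (c == 'e')
                c = 27;                       // ESC
            else if (allow_hex && (c == 'x' || c == 'X') && i + 2 < len
                     && isxdigit((unsigned char) src[i + 1])
                     && isxdigit((unsigned char) src[i + 2])) {
                c = (char) (hex_value(src[i + 1]) * 16 + hex_value(src[i + 2]));
                i += 2;
            }
        }
        dst[n++] = c;
    }
    dst[n] = '\0';
    return n;
}
```

For instance, decode_integer turns the element 0xA0 into 160, matching the INTEGER example, and decode_escapes with hexadecimal escapes disabled turns the SYMBOL example into the name "abc def".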
EXAMPLE
Here is an example of a plist written in the notation explained
above.
@verbatim
abc 123 (pqr 0xff) "m\"text" (_\\_ ("string" xyz) -456)
@endverbatim
It represents the following plist:
@verbatim
(Msymbol:abc,
Minteger:123,
Mplist:(Msymbol:pqr,
Minteger:255),
Mtext:m"text,
Mplist:(Msymbol:_\_,
Mplist:(Mtext:string,
Msymbol:xyz),
Minteger:-456))
@endverbatim
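To check the example mechanically, the following C sketch reads the notation and renders it in the (K:V, ...) form. It is a simplified, hypothetical reader (the names @c plist_to_string, @c read_elements, @c put_atom, @c ends_atom, @c skip, and @c put are illustrative), not the default loader of the m17n library. It keeps the character that follows a backslash as is, so the special escapes for tab, newline, carriage return, escape, and hexadecimal bytes are not translated.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Append n bytes of s to the NUL-terminated buffer out of capacity cap,
// truncating silently when the buffer is full.
static void put(char *out, size_t cap, const char *s, size_t n)
{
    size_t len = strlen(out);
    if (len + n >= cap)
        n = cap - len - 1;
    memcpy(out + len, s, n);
    out[len + n] = '\0';
}

// Skip whitespace (space, tab, newline) and ';' comments.
static const char *skip(const char *p)
{
    while (*p == ' ' || *p == '\t' || *p == '\n' || *p == ';') {
        if (*p == ';') {
            while (*p != '\0' && *p != '\n')
                p++;
        } else {
            p++;
        }
    }
    return p;
}

// Characters that end an INTEGER or SYMBOL element.
static int ends_atom(char c)
{
    return c == '\0' || c == ' ' || c == '\t' || c == '\n' || c == '(' || c == ')';
}

// Append the INTEGER or SYMBOL element occupying [p, q).
static void put_atom(const char *p, const char *q, char *out, size_t cap)
{
    int hex = q - p > 2 && p[0] == '0' && (p[1] == 'x' || p[1] == 'X');
    const char *d = hex ? p + 2 : (*p == '-' ? p + 1 : p);
    int is_int = d < q;
    for (const char *r = d; r < q; r++)
        if (!(hex ? isxdigit((unsigned char) *r) : isdigit((unsigned char) *r)))
            is_int = 0;
    if (is_int) {
        char num[32];
        snprintf(num, sizeof num, "Minteger:%ld", strtol(p, NULL, hex ? 16 : 10));
        put(out, cap, num, strlen(num));
        return;
    }
    put(out, cap, "Msymbol:", 8);
    for (; p < q; p++) {
        if (*p == '\\' && p + 1 < q)
            p++;                                // keep the escaped character
        put(out, cap, p, 1);
    }
}

// Append elements up to the next ')' or the end of input and return a
// pointer to that ')' (or to the terminating NUL).
static const char *read_elements(const char *p, char *out, size_t cap)
{
    int first = 1;
    for (p = skip(p); *p != '\0' && *p != ')'; p = skip(p)) {
        if (!first)
            put(out, cap, ", ", 2);
        first = 0;
        if (*p == '(') {                        // PLIST: read recursively
            put(out, cap, "Mplist:(", 8);
            p = read_elements(p + 1, out, cap);
            if (*p == ')')
                p++;
            put(out, cap, ")", 1);
        } else if (*p == '"') {                 // M-TEXT
            put(out, cap, "Mtext:", 6);
            for (p++; *p != '\0' && *p != '"'; p++) {
                if (*p == '\\' && p[1] != '\0')
                    p++;                        // keep the escaped character
                put(out, cap, p, 1);
            }
            if (*p == '"')
                p++;
        } else {                                // INTEGER or SYMBOL
            const char *q = p;
            while (!ends_atom(*q))
                q += (*q == '\\' && q[1] != '\0') ? 2 : 1;
            put_atom(p, q, out, cap);
            p = q;
        }
    }
    return p;
}

// Render the expression src as "(K:V, K:V, ...)" into out.
void plist_to_string(const char *src, char *out, size_t cap)
{
    assert(src != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    put(out, cap, "(", 1);
    read_elements(src, out, cap);
    put(out, cap, ")", 1);
}
```

Applied to the example expression, plist_to_string yields the same plist as shown above, written on a single line.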
@section fontenc Font Encoding
The m17n library loads information about the encoding of each font
from the m17n database by the corresponding tags. The plist
format of the data is as follows:
@verbatim
FONT-ENCODING ::=
PER-FONT-INFO *
PER-FONT-INFO ::=
'(' FONT-SPEC ENCODING ')'
FONT-SPEC ::=
'('
[ FOUNDRY FAMILY [ WEIGHT [ STYLE [ STRETCH [ ADSTYLE ] ] ] ] ] REGISTRY
')'
@endverbatim
@c FOUNDRY to @c REGISTRY are symbols specifying the corresponding
XLFD font name fields. Omitted symbols are regarded as @c nil, and
@c nil means a wild card. For instance, this @c FONT-SPEC:
@verbatim
(nil alice0\ lao iso8859-1)
@endverbatim
applies to all fonts whose family is "alice0 lao" and whose
registry is "iso8859-1".
@c ENCODING is a symbol representing a charset. A font matching
@c FONT-SPEC supports all characters of the charset, and the charset
maps each character code to the corresponding glyph code of the font.
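The wildcard rule for @c FONT-SPEC can be sketched in C as below. The enumeration, the function @c font_spec_matches, and the representation of a font as an array of XLFD field strings are assumptions made for this illustration, and @c NULL stands for the symbol @c nil.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// XLFD-derived fields of a font, in the order used by FONT-SPEC
// (hypothetical representation, not the m17n API).
enum { FOUNDRY, FAMILY, WEIGHT, STYLE, STRETCH, ADSTYLE, REGISTRY, NFIELDS };

// spec[0..n-1] holds the symbols of a FONT-SPEC, with NULL for nil.
// The last symbol is always REGISTRY; the preceding ones, if any, fill
// FOUNDRY, FAMILY, ... from the left, and omitted fields act like nil.
// Returns 1 if the font described by font[] matches the spec.
int font_spec_matches(const char *const *spec, size_t n,
                      const char *const font[NFIELDS])
{
    assert(n >= 1 && n <= NFIELDS && n != 2);   // grammar allows 1 or 3..7
    for (size_t i = 0; i < n; i++) {
        size_t field = (i == n - 1) ? REGISTRY : i;
        if (spec[i] != NULL && strcmp(spec[i], font[field]) != 0)
            return 0;                           // a non-nil field differs
    }
    return 1;                                   // nil and omitted fields match anything
}
```

With the FONT-SPEC shown above, only the family and the registry are compared; the foundry is nil and the remaining fields are omitted, so they match any font.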
@section fontsize Font Resizing
In some cases, a font contains incorrect information about its size
(typically in the case of a hacked TrueType font), which results in a
bad text layout when such a font is used together with other fonts.
To overcome this problem, the m17n library loads information about
font-size correction from the m17n database by the corresponding
tags. The plist format of the data is as follows:
@verbatim
FONT-RESIZE ::=
PER-FONT-INFO *
PER-FONT-INFO ::=
'(' FONT-SPEC RESIZE-RATIO ')'
FONT-SPEC ::=
'('
[ FOUNDRY FAMILY [ WEIGHT [ STYLE [ STRETCH [ ADSTYLE ] ] ] ] ] REGISTRY
')'
@endverbatim
The meanings of @c FOUNDRY to @c REGISTRY are the same as in @e Font
@e Encoding. @c RESIZE-RATIO is an integer specifying, as a
percentage, how much the font size must be adjusted. For instance,
this @c PER-FONT-INFO:
@verbatim
((devanagari-cdac) 150)
@endverbatim
means that, to use a font of registry "devanagari-cdac" at a given
size, we have to open a font 1.5 times as large.
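The percentage arithmetic can be sketched as follows. The function name @c resized_font_size and the round-to-nearest behavior are assumptions of this example; the text above only states the ratio.

```c
#include <assert.h>

// Given the size the caller wants and RESIZE-RATIO (a percentage),
// return the size of the font to open. Rounding to nearest via integer
// arithmetic is an assumption of this sketch, not documented behavior.
int resized_font_size(int wanted_size, int resize_ratio)
{
    assert(wanted_size >= 0 && resize_ratio > 0);
    return (wanted_size * resize_ratio + 50) / 100;
}
```

With the ratio 150 from the example, a requested size of 12 gives 18.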
@section fontset Fontset
The m17n library loads a fontset definition from the m17n database by
the corresponding tags. The plist format of the data is as follows:
@verbatim
FONTSET ::=
PER-SCRIPT * PER-CHARSET * FALLBACK *
PER-SCRIPT ::=
'(' SCRIPT PER-LANGUAGE + ')'
PER-LANGUAGE ::=
'(' LANGUAGE FONT-SPEC-ELEMENT + ')'
PER-CHARSET ::=
'(' CHARSET FONT-SPEC-ELEMENT + ')'
FALLBACK ::=
FONT-SPEC-ELEMENT
FONT-SPEC-ELEMENT ::=
'(' FONT-SPEC [ FLT-NAME ] ')'
FONT-SPEC ::=
'('
[ FOUNDRY FAMILY [ WEIGHT [ STYLE [ STRETCH [ ADSTYLE ] ] ] ] ] REGISTRY
')'
@endverbatim
@c SCRIPT is a symbol naming a script (e.g. latin, han), or @c nil.
@c LANGUAGE is a two-letter ISO 639 language-code symbol (e.g. ja,
zh), or @c nil. The meanings of @c FOUNDRY to @c REGISTRY are the same
as in @e Font @e Encoding. @c FLT-NAME is the name of a @ref flt.
For instance, this @c PER-SCRIPT:
@verbatim
(han
(ja
((jisx0208.1983-0)))
(zh
((gb2312.1980-0)))
(nil
((big5-0))))
@endverbatim
instructs the rendering engine to use a font of registry
"jisx0208.1983-0" for a "han" character that has the @c Mlanguage
text property "ja", provided the character is in the repertoire of
such a font. Otherwise, the engine tries a font of registry
"gb2312.1980-0" or "big5-0". If a "han" character does not have the
@c Mlanguage text property, all three fonts are tried.
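The selection order described above can be sketched in C. The @c LangFont type and the @c candidate_registries function are hypothetical. Each PER-LANGUAGE part is reduced to a single registry and the repertoire check is omitted. Trying matching-language fonts first and then the remaining fonts in listed order is an assumption of this sketch rather than documented behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// One PER-LANGUAGE part of a PER-SCRIPT entry, reduced to a language
// (NULL for nil) and a single registry. Not the m17n data structure.
typedef struct {
    const char *language;
    const char *registry;
} LangFont;

// Fill order[] (room for n entries) with the registries to try for a
// character whose Mlanguage text property is lang (NULL if it has none)
// and return their count. Assumed order: entries whose language equals
// lang first, then every other entry in the order listed.
size_t candidate_registries(const LangFont *entries, size_t n,
                            const char *lang, const char **order)
{
    assert(entries != NULL && order != NULL);
    size_t k = 0;
    if (lang != NULL)
        for (size_t i = 0; i < n; i++)
            if (entries[i].language && strcmp(entries[i].language, lang) == 0)
                order[k++] = entries[i].registry;
    for (size_t i = 0; i < n; i++)
        if (lang == NULL || entries[i].language == NULL
            || strcmp(entries[i].language, lang) != 0)
            order[k++] = entries[i].registry;
    return k;
}
```

For the "han" example, a character tagged "ja" gets "jisx0208.1983-0" first, followed by "gb2312.1980-0" and "big5-0"; an untagged character gets all three in listed order.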
@section flt Font Layout Table
Usually, the rendering engine converts character codes into glyph
codes one by one by consulting information about the encoding of each
selected font. However, such a one-to-one conversion is not
sufficient for rendering text that requires complicated layout
(e.g. Thai and Indic scripts); in addition, some glyphs must be
shifted two-dimensionally on the screen. In such cases, a font layout
table (FLT for short) must be used.
An FLT can contain information equivalent to the OpenType tables
CMAP, GSUB, and GPOS, in addition to information about how to extract
a grapheme cluster and how to reorder characters.
The m17n library loads an FLT from the m17n database by the
corresponding tags. The plist format of the data is as follows:
@verbinclude flt.txt
@section im Input Method
@verbinclude im.txt
*/
////