X-Git-Url: http://git.chise.org/gitweb/?p=m17n%2Fm17n-docs.git;a=blobdiff_plain;f=data-usr%2Fdbformat.txt;h=0ad20fbad01bdcdc943aef23ca0bd1401367ec76;hp=bd6c345b19267f28add74a32b013d1b01fb55d09;hb=040f97e25c3c541ac88cfd9e57d7d16867f5be7d;hpb=e3fea7f1664cea64a3ae89e31d78049bafb4db8c
diff --git a/data-usr/dbformat.txt b/data-usr/dbformat.txt
index bd6c345..0ad20fb 100644
--- a/data-usr/dbformat.txt
+++ b/data-usr/dbformat.txt
@@ -1,65 +1,74 @@
-/*** @addtogroup m17nDatabase */
-////
-/*** @{ */
-////
-/***@defgroup m17nDatabaseFormat DatabaseFormat
+/***@page m17nDatabaseFormat Data Format of the m17n database
-This section describes the formats of predefined @e plist @e type data
-in the m17n database.
+This section describes the formats of data in the m17n database.
@section dbformat General format
-The m17n library expects that the function mdatabase_load () returns a
-plist of a specific format on loading data identified by a specific
-set of tags. As the plist format used for the database data is
-strongly limited, we can use the equivalent text of simple syntax
-(S-expression) to represent the plist as below.
-
-The text consists of one or more @e elements . Each element
-represents a property, i.e. a single element of a plist, and the
-sequence of the elements constitute a plist.
-
-Elements are separated by one or more @e whitespaces, i.e. a space
+The mdatabase_load () function returns the data specified by tags in
+the form of a plist if the first tag is neither @c Mchartable nor @c
+Mcharset. The keys of the returned plist are limited to
+Minteger, Msymbol, Mtext, and
+Mplist. The type of each value is unambiguously determined by
+the corresponding key: if the key is Minteger, the value is
+an integer; if the key is Msymbol, the value is a symbol;
+and so on.
+
+A plist can be written down in several ways. For
+instance, we can use the form (K1:V1, K2:V2, ..., Kn:Vn) to
+represent a plist whose first property key and value are K1 and V1,
+second key and value are K2 and V2, and so on. However, a simpler
+expression suffices here because the types of plists used in the
+m17n database are fairly restricted.
+
+Hereafter, we represent a plist by an expression similar to an
+S-expression. (In fact, the default database loader of the m17n
+library reads data files written in this form.)
+
+The expression consists of one or more elements. Each element
+represents a property, i.e. a single element of a plist.
+
+Elements are separated by one or more whitespace characters, i.e. a space
(code 32), a tab (code 9), or a newline (code 10). Comments begin
with a semicolon (;) and extend to the end of the line.
The key and the value of each property are determined based on the
-form of element as below:
+type of the element as explained below.
- INTEGER
-An element that matches the regular expression @c -?[0-9]+ or @c
-0[xX][0-9A-Fa-f]+ represents a property whose key is @c Minteger . An
-element matching the former expression is interpreted as an integer in
-decimal notation, and one matching the latter expression is
-interpreted as an integer in hexadecimal notation. The resulting
-integer is the value of the property.
+An element that matches the regular expression -?[0-9]+ or
+0[xX][0-9A-Fa-f]+ represents a property whose key is
+Minteger. An element matching the former expression is
+interpreted as an integer in decimal notation, and one matching the
+latter is interpreted as an integer in hexadecimal notation. The
+value of the property is the resulting integer.
-For instance, the element 0xA0 represents a property whose value is
-the integer 160.
+For instance, the element 0xA0 represents a property whose value is
+160 in decimal.
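A minimal C sketch of the INTEGER rule, assuming the element has already been isolated as a NUL-terminated string. `parse_integer_element` is a hypothetical helper, not part of the m17n API; it checks the element against the two regular expressions above and converts it with the standard `strtol`, using base 16 for the 0x form and base 10 otherwise.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper (not part of the m17n API). If ELEM matches
   -?[0-9]+ (decimal) or 0[xX][0-9A-Fa-f]+ (hexadecimal), stores its value
   in *OUT and returns true; otherwise returns false. Overflow is not
   checked. */
static bool parse_integer_element(const char *elem, long *out)
{
    bool hex = (elem[0] == '0' && (elem[1] == 'x' || elem[1] == 'X'));
    const char *digits = hex ? elem + 2 : (elem[0] == '-' ? elem + 1 : elem);

    if (*digits == '\0')
        return false;                       /* "0x" or "-" alone */
    for (const char *q = digits; *q; q++)
        if (!(hex ? isxdigit((unsigned char) *q) : isdigit((unsigned char) *q)))
            return false;

    /* Base 16 for the 0x form, base 10 otherwise (so "007" is 7, not octal). */
    *out = strtol(hex ? digits : elem, NULL, hex ? 16 : 10);
    return true;
}
```

With this sketch, "0xA0" yields 160, matching the example above.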
- SYMBOL
-An element that matchs the regular expression @c
-[^-0-9(]([^\\()]|\\.)+ represents a property whose key is @c Msymbol .
-In the sequence, @c \\t , @c \\n , @c \\r , and @c \\e are replaced
-with tab (code 9), newline (code 10), carriage return (code 13), and
-escape (code 27) respectively. Other characters following a backslash
-is interpreted as it is. The symbol having the name of the resulting
-sequence is the value of the property.
+An element that matches the regular expression
+[^-0-9(]([^\\()]|\\.)+ represents a property whose key is
+Msymbol. In the element, \\t, \\n,
+\\r, and \\e are replaced with tab (code 9), newline
+(code 10), carriage return (code 13), and escape (code 27)
+respectively. Any other character following a backslash stands for
+itself. The value of the property is the symbol whose name is the
+resulting string.
For instance, the element abc\ def represents a property
-whose value is the symbol of name "abc def".
+whose value is the symbol having the name "abc def".
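The escape handling for SYMBOL elements can be sketched in C as below. `unescape_symbol` is a hypothetical name; the sketch assumes the element has already been delimited and only resolves the backslash escapes, without checking the regular expression.

```c
#include <stddef.h>

/* Hypothetical helper: resolves backslash escapes in a SYMBOL element.
   \t, \n, \r, \e become codes 9, 10, 13, 27; any other escaped character
   stands for itself; a trailing lone backslash is copied unchanged.
   OUT must hold at least strlen(elem) + 1 bytes. Returns the name length. */
static size_t unescape_symbol(const char *elem, char *out)
{
    size_t n = 0;

    for (const char *p = elem; *p; p++) {
        if (*p == '\\' && p[1] != '\0') {
            p++;
            switch (*p) {
            case 't': out[n++] = 9;  break;
            case 'n': out[n++] = 10; break;
            case 'r': out[n++] = 13; break;
            case 'e': out[n++] = 27; break;
            default:  out[n++] = *p; break;   /* e.g. "\ " gives ' ' */
            }
        } else {
            out[n++] = *p;
        }
    }
    out[n] = '\0';
    return n;
}
```

Applied to the element abc\ def, it produces the 7-character name "abc def".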
- M-TEXT
-An element that matches the regular expression @c "([^"]|\\")*"
-represents a property whose key is @c Mtext . The backslash escape
+An element that matches the regular expression "([^"]|\\")*"
+represents a property whose key is Mtext. The backslash escape
explained above also applies here. Moreover, each part in the
-sequence matching the regular expression @c
-\\[xX][0-9A-Fa-f][0-9A-Fa-f] is replaced with its hexadecimal
+element matching the regular expression
+\\[xX][0-9A-Fa-f][0-9A-Fa-f] is replaced with its hexadecimal
interpretation.
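A sketch of the M-TEXT escape resolution in C, assuming the caller passes only the characters between the double quotes. `unescape_mtext` is a hypothetical helper; it yields the resolved byte sequence and stops short of building an M-text from it.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hexadecimal digit C (C must satisfy isxdigit). */
static int hexval(int c)
{
    return isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
}

/* Hypothetical helper: decodes the body of an M-TEXT element into OUT
   (at least strlen(body) + 1 bytes). \xHH or \XHH with exactly two hex
   digits becomes the byte 0xHH; \t, \n, \r, \e and other escapes are
   handled as for symbols. Returns the number of bytes written. */
static size_t unescape_mtext(const char *body, unsigned char *out)
{
    size_t n = 0;
    const char *p = body;

    while (*p) {
        if (*p != '\\' || p[1] == '\0') {
            out[n++] = (unsigned char) *p++;
            continue;
        }
        p++;                                  /* skip the backslash */
        if ((*p == 'x' || *p == 'X')
            && isxdigit((unsigned char) p[1]) && isxdigit((unsigned char) p[2])) {
            out[n++] = (unsigned char) (hexval((unsigned char) p[1]) * 16
                                        + hexval((unsigned char) p[2]));
            p += 3;
            continue;
        }
        switch (*p) {
        case 't': out[n++] = 9;  break;
        case 'n': out[n++] = 10; break;
        case 'r': out[n++] = 13; break;
        case 'e': out[n++] = 27; break;
        default:  out[n++] = (unsigned char) *p; break;
        }
        p++;
    }
    out[n] = '\0';
    return n;
}
```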
After having resolved the backslash escapes, the byte sequence between
@@ -68,24 +77,25 @@ an M-text. This M-text is the value of the property.
- PLIST
-An element that is preceded by a left parenthesis, followed zero or
-more elements, and terminated by a right parenthesis represents a
-property whose key is @c Mplist . Parentheses also serve as a
-separator, which means whitespaces before and after a parenthesis can
-be omitted. The value of the property is a plist, which is the result
-of recursive interpretation of the elements between the parentheses.
+Zero or more elements enclosed in a pair of parentheses represent a
+property whose key is Mplist. Because parentheses also act as
+separators, whitespace before and after a parenthesis can be omitted.
+The value of the property is a plist, which is the result of
+recursive interpretation of the elements between the parentheses.
EXAMPLE
-Suppose (K1:V1, K2:V2, ... ,Kn:Vn) represents a plist whose first
-property key and value are K1 and V1, second key and value are K2 and
-V2, and so on. Then the line:
+Here is an example of a plist written in the expression explained
+above.
+
@verbatim
abc 123 (pqr 0xff) "m\"text" (_\\_ ("string" xyz) -456)
@endverbatim
-is interpreted as follows.
+
+It represents the following plist:
+
@verbatim
(Msymbol:abc,
Minteger:123,
@@ -98,13 +108,6 @@ is interpreted as follows.
Minteger:-456))
@endverbatim
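To make the recursive structure concrete, here is a compact C sketch that reads the expression and prints it in the (KEY:VALUE, ...) notation. It is not the m17n default loader: all names are hypothetical, it produces text rather than a real plist, integers are printed in decimal, M-text elements are copied with their quotes and escapes unresolved, and error handling is minimal.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Fixed-size output buffer; text that would overflow it is dropped. */
typedef struct { char buf[1024]; size_t len; } Out;

static void emit(Out *o, const char *s, size_t n)
{
    if (o->len + n >= sizeof o->buf)
        return;
    memcpy(o->buf + o->len, s, n);
    o->len += n;
    o->buf[o->len] = '\0';
}

#define IS_WS(c) ((c) == ' ' || (c) == '\t' || (c) == '\n')  /* space, tab, newline */

/* Skips whitespace and ';' comments, which run to the end of the line. */
static const char *skip_space(const char *p)
{
    while (IS_WS(*p) || *p == ';') {
        if (*p == ';')
            while (*p && *p != '\n')
                p++;
        else
            p++;
    }
    return p;
}

/* 1 if S[0..N) matches -?[0-9]+ or 0[xX][0-9A-Fa-f]+; *HEX tells which. */
static int is_integer(const char *s, size_t n, int *hex)
{
    *hex = n > 2 && s[0] == '0' && (s[1] == 'x' || s[1] == 'X');
    size_t i = *hex ? 2 : (n > 1 && s[0] == '-') ? 1 : 0;
    for (; i < n; i++)
        if (!(*hex ? isxdigit((unsigned char) s[i]) : isdigit((unsigned char) s[i])))
            return 0;
    return 1;
}

/* Appends a symbol name with its backslash escapes resolved. */
static void emit_symbol(Out *o, const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        char c = s[i];
        if (c == '\\' && i + 1 < n) {
            c = s[++i];
            c = c == 't' ? 9 : c == 'n' ? 10 : c == 'r' ? 13 : c == 'e' ? 27 : c;
        }
        emit(o, &c, 1);
    }
}

/* Parses elements up to end of input or an unmatched ')', appending them
   to O as (KEY:VALUE, ...). Returns the position where parsing stopped. */
static const char *parse_seq(const char *p, Out *o)
{
    int first = 1;
    emit(o, "(", 1);
    for (p = skip_space(p); *p && *p != ')'; p = skip_space(p)) {
        if (!first)
            emit(o, ", ", 2);
        first = 0;
        if (*p == '(') {                    /* PLIST: recurse */
            emit(o, "Mplist:", 7);
            p = parse_seq(p + 1, o);
            if (*p == ')')
                p++;
        } else if (*p == '"') {             /* M-TEXT: copied verbatim */
            const char *start = p++;
            while (*p && *p != '"')
                p += (*p == '\\' && p[1]) ? 2 : 1;
            if (*p == '"')
                p++;
            emit(o, "Mtext:", 6);
            emit(o, start, (size_t) (p - start));
        } else {                            /* INTEGER or SYMBOL */
            const char *start = p;
            int hex;
            while (*p && !IS_WS(*p) && *p != '(' && *p != ')')
                p += (*p == '\\' && p[1]) ? 2 : 1;
            if (is_integer(start, (size_t) (p - start), &hex)) {
                char num[32];
                int len = snprintf(num, sizeof num, "%ld",
                                   strtol(start, NULL, hex ? 16 : 10));
                emit(o, "Minteger:", 9);
                emit(o, num, (size_t) len);
            } else {
                emit(o, "Msymbol:", 8);
                emit_symbol(o, start, (size_t) (p - start));
            }
        }
    }
    emit(o, ")", 1);
    return p;
}
```

Called on the example line above with a zero-initialized Out, this sketch produces (Msymbol:abc, Minteger:123, Mplist:(Msymbol:pqr, Minteger:255), Mtext:"m\"text", Mplist:(Msymbol:_\_, Mplist:(Mtext:"string", Msymbol:xyz), Minteger:-456)).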
-The default database loader of the m17n library actually read a text
-of this syntax from the database file.
-
-Here after we describes the plist format of each data by this syntax.
-
-@section fontset Fontset
-
@section fontenc Font Encoding
The m17n library loads information about the encoding of each font
@@ -113,7 +116,7 @@ format of the data is as follows:
@verbatim
FONT-ENCODING ::=
- '(' PER-FONT-INFO * ')'
+ PER-FONT-INFO *
PER-FONT-INFO ::=
'(' FONT-SPEC ENCODING ')'
@@ -135,14 +138,15 @@ XLFD font name fields. Omitted symbols are regarded as @c nil, and
should be applied to all fonts whose family is "alice0 lao", and
registry is "iso8859-1".
-@c ENCODING is a charset symbol. A font matching @c FONT-SPEC
-supports all characters of the charset, and a character code is mapped
-to the corresponding glyph code of the font by this charset.
+@c ENCODING is a symbol representing a charset. A font matching @c
+FONT-SPEC supports all characters of the charset, and a character code
+is mapped to the corresponding glyph code of the font by this charset.
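How a table of PER-FONT-INFO entries might be consulted is sketched below in C. The types and functions are hypothetical. NULL stands for nil, and the field order FOUNDRY FAMILY WEIGHT STYLE STRETCH ADSTYLE REGISTRY is assumed. Exact, case-sensitive comparison is a simplification, since XLFD names are case-insensitive. Taking the first matching entry is also an assumption, not documented behavior.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed field order of a FONT-SPEC. */
enum { FOUNDRY, FAMILY, WEIGHT, STYLE, STRETCH, ADSTYLE, REGISTRY, NFIELDS };

/* Hypothetical form of one PER-FONT-INFO entry; a NULL field means nil. */
typedef struct {
    const char *spec[NFIELDS];
    const char *encoding;          /* name of the charset */
} PerFontInfo;

/* A nil (NULL) field in SPEC matches anything; other fields must be equal. */
static bool font_spec_matches(const char *const spec[NFIELDS],
                              const char *const font[NFIELDS])
{
    for (int i = 0; i < NFIELDS; i++)
        if (spec[i] != NULL
            && (font[i] == NULL || strcmp(spec[i], font[i]) != 0))
            return false;
    return true;
}

/* Returns the charset of the first entry whose FONT-SPEC matches FONT,
   or NULL if no entry matches. */
static const char *find_encoding(const PerFontInfo *table, size_t n,
                                 const char *const font[NFIELDS])
{
    for (size_t i = 0; i < n; i++)
        if (font_spec_matches(table[i].spec, font))
            return table[i].encoding;
    return NULL;
}
```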
+
@section fontsize Font Resizing
In some case, a font contains incorrect information about its size
-(typically in the case of a hacked TrueType font), and results in a
+(typically in the case of a hacked TrueType font), which results in a
bad text layout when such a font is used in combination with the other
fonts. To overcome this problem, the m17n library loads information
about font-size correction from the m17n database by the tags \. The plist format of the data is as follows:
@verbatim
FONT-RESIZE ::=
- '(' PER-FONT-INFO * ')'
+ PER-FONT-INFO *
PER-FONT-INFO ::=
'(' FONT-SPEC RESIZE-RATIO ')'
@@ -173,14 +177,86 @@ this @c PER-FONT-INFO:
means that, to use a font of registry "devanagari-cdac" with a
specific size, we have to open an 1.5 times bigger one.
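As arithmetic, the correction amounts to scaling the requested size. The C sketch below assumes RESIZE-RATIO is an integer percentage, so the 1.5-times case above corresponds to 150. It rounds to the nearest integer. Both the helper name and the rounding are assumptions.

```c
/* Hypothetical helper. RATIO_PERCENT is assumed to be the RESIZE-RATIO in
   percent (150 means 1.5 times). Returns the size of the font to open so
   that it looks like REQUESTED, rounded to the nearest integer; sizes are
   assumed to be non-negative. */
static int resized_font_size(int requested, int ratio_percent)
{
    return (requested * ratio_percent + 50) / 100;
}
```

For example, a request for size 120 with a ratio of 150 opens a font of size 180.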
+@section fontset Fontset
+
+The m17n library loads a fontset definition from the m17n database by
+the tags \. The plist format of the data is
+as follows:
+
+@verbatim
+FONTSET ::=
+ PER-SCRIPT * PER-CHARSET * FALLBACK *
+
+PER-SCRIPT ::=
+ '(' SCRIPT PER-LANGUAGE + ')'
+
+PER-LANGUAGE ::=
+ '(' LANGUAGE FONT-SPEC-ELEMENT + ')'
+
+PER-CHARSET ::=
+ '(' CHARSET FONT-SPEC-ELEMENT + ')'
+
+FALLBACK ::=
+ FONT-SPEC-ELEMENT
+
+FONT-SPEC-ELEMENT ::=
+ '(' FONT-SPEC [ FLT-NAME ] ')'
+
+FONT-SPEC ::=
+ '('
+ [ FOUNDRY FAMILY [ WEIGHT [ STYLE [ STRETCH [ ADSTYLE ] ] ] ] ] REGISTRY
+ ')'
+@endverbatim
+
+@c SCRIPT is a symbol naming a script (e.g. latin, han) or @c nil. @c
+LANGUAGE is a symbol whose name is a two-letter ISO 639 language code
+(e.g. ja, zh) or @c nil. @c FOUNDRY through @c REGISTRY have the same
+meanings as in @e Font @e Encoding. @c FLT-NAME is the name of a
+font layout table (see @ref flt).
+
+For instance, this @c PER-SCRIPT:
+
+@verbatim
+(han
+ (ja
+ ((jisx0208.1983-0)))
+ (zh
+ ((gb2312.1980-0)))
+ (nil
+ ((big5-0))))
+@endverbatim
+
+instructs the rendering engine to use a font of registry
+"jisx0208.1983-0" for a "han" character whose @c Mlanguage text
+property is "ja", provided the character is in the repertoire of such
+a font. Otherwise, the engine tries a font of registry "gb2312.1980-0"
+or "big5-0". If a "han" character does not have the @c Mlanguage text
+property, all three fonts are tried.
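The candidate order described for the han example can be sketched in C as follows. This is a simplified, hypothetical reading of the text above, not m17n's font selection code. Only registries are kept, and PER-CHARSET and FALLBACK entries are ignored. Checking whether a character is in a font's repertoire is left to the caller.

```c
#include <stddef.h>
#include <string.h>

enum { MAX_REGS = 4 };

/* Hypothetical in-memory form of one PER-LANGUAGE entry. A NULL language
   stands for nil; only the REGISTRY of each FONT-SPEC is kept. */
typedef struct {
    const char *language;
    const char *registries[MAX_REGS];   /* unused slots are NULL */
} PerLanguage;

/* Stores in OUT (capacity MAX) the registries to try, in order, for a
   character whose Mlanguage text property is LANG (NULL if it has none).
   Fonts listed for LANG come first, then those of all other entries. */
static size_t candidate_registries(const PerLanguage *entries, size_t n,
                                   const char *lang,
                                   const char **out, size_t max)
{
    size_t count = 0;

    for (int pass = 0; pass < 2; pass++) {
        for (size_t i = 0; i < n; i++) {
            int same = lang != NULL && entries[i].language != NULL
                       && strcmp(entries[i].language, lang) == 0;
            /* Pass 0 takes the entries for LANG, pass 1 takes the rest. */
            if (same != (pass == 0))
                continue;
            for (size_t j = 0; j < MAX_REGS && entries[i].registries[j] != NULL
                               && count < max; j++)
                out[count++] = entries[i].registries[j];
        }
    }
    return count;
}
```

For the han entry above with language "ja", this yields jisx0208.1983-0, gb2312.1980-0, and big5-0. With no language, it yields all three in the order listed.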
+
@section flt Font Layout Table
+Usually, the rendering engine converts character codes into glyph
+codes one by one by consulting information about the encoding of each
+selected font. But to render text that requires complicated layout
+(e.g. Thai and Indic scripts), such a one-to-one conversion is not
+sufficient. In addition, some glyphs must be shifted in two dimensions
+on the screen. In such a case, a font layout table (FLT for short)
+must be used.
+
+An FLT can contain information equivalent to the OpenType layout tables
+(CMAP, GSUB, and GPOS), in addition to information about how to
+extract a grapheme cluster and how to reorder characters.
+
+The m17n library loads an FLT from the m17n database by the tags
+\. The plist format of the data is as
+follows:
+
@verbinclude flt.txt
@section im Input Method
-@verbinclude im.txt */
+@verbinclude im.txt
+
+*/
-////
-/*** @} */
////