/* Copyright (C) 2003, 2004
     National Institute of Advanced Industrial Science and Technology (AIST)
     Registration Number H15PRO112
   See the end for copying conditions.  */

@page mdbGeneral General Format

@section general-description DESCRIPTION

The mdatabase_load () function returns the data specified by tags in
the form of a plist if the first tag is neither @c Mchartable nor @c
Mcharset.  The keys of the returned plist are limited to
<tt>Minteger</tt>, <tt>Msymbol</tt>, <tt>Mtext</tt>, and
<tt>Mplist</tt>.  The type of each value is unambiguously determined by
the corresponding key.  If the key is <tt>Minteger</tt>, the value is
an integer.  If the key is <tt>Msymbol</tt>, the value is a symbol.
Likewise, the value for <tt>Mtext</tt> is an M-text, and the value for
<tt>Mplist</tt> is a plist.

A plist can be written down in a number of ways.  For
instance, we can use the form <tt>(K1:V1, K2:V2, ..., Kn:Vn)</tt> to
represent a plist whose first property key and value are K1 and V1,
whose second key and value are K2 and V2, and so on.  However, we can
use a simpler expression here because the types of plists used in the
m17n database are fairly restricted.

Hereafter, we use an expression similar to an S-expression to
represent a plist.  (In fact, the default database loader of the m17n
library is designed to read data files written in this form.)

The expression consists of one or more <i>elements</i>.  Each element
represents a property, i.e. a single element of a plist.

Elements are separated by one or more <i>whitespace</i> characters,
i.e. a space (code 32), a tab (code 9), or a newline (code 10).
Comments begin with a semicolon (<tt>;</tt>) and extend to the end of
the line.

The key and the value of each property are determined based on the
type of the element as explained below.

An element that matches the regular expression <tt>-?[0-9]+</tt> or
<tt>0[xX][0-9A-Fa-f]+</tt> represents a property whose key is
<tt>Minteger</tt>.  An element matching the former expression is
interpreted as an integer in decimal notation, and one matching the
latter is interpreted as an integer in hexadecimal notation.  The
value of the property is the result of interpretation.

For instance, the element <tt>0xA0</tt> represents a property whose
value is 160 in decimal.
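
The two patterns above can be tried out directly.  The following Python
sketch is only an illustration and is not part of the m17n library; the
function name <tt>parse_integer_element</tt> is hypothetical, and only
the two regular expressions are taken from the text.

```python
import re

# Patterns copied from the description: hexadecimal and (signed) decimal.
HEXADECIMAL = re.compile(r"0[xX][0-9A-Fa-f]+")
DECIMAL = re.compile(r"-?[0-9]+")


def parse_integer_element(element):
    """Return the integer an element denotes, or None if it is not an integer element."""
    if HEXADECIMAL.fullmatch(element):
        return int(element, 16)   # int() accepts the 0x prefix when base is 16
    if DECIMAL.fullmatch(element):
        return int(element, 10)
    return None


print(parse_integer_element("0xA0"))   # 160
print(parse_integer_element("-456"))   # -456
print(parse_integer_element("abc"))    # None
```

Returning <tt>None</tt> for a non-matching element lets a caller fall
through to the other element types.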

An element that matches the regular expression
<tt>[^-0-9(]([^\\()]|\\.)+</tt> represents a property whose key is
<tt>Msymbol</tt>.  In the element, <tt>\\t</tt>, <tt>\\n</tt>,
<tt>\\r</tt>, and <tt>\\e</tt> are replaced with tab (code 9), newline
(code 10), carriage return (code 13), and escape (code 27)
respectively.  Any other character following a backslash is
interpreted as is.  The value of the property is the symbol having the
resulting string as its name.

For instance, the element <tt>abc\ def</tt> represents a property
whose value is the symbol having the name "abc def".
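
The escape rules for symbol names can be sketched as follows.  This is
an illustrative Python fragment, not code from the m17n library; the
names <tt>ESCAPES</tt> and <tt>symbol_name</tt> are hypothetical, and a
symbol is represented here simply by the string of its name.

```python
# The four special escapes named in the text; any other escaped
# character stands for itself.
ESCAPES = {"t": "\t", "n": "\n", "r": "\r", "e": "\x1b"}


def symbol_name(element):
    """Resolve backslash escapes in a symbol element and return the symbol name."""
    out = []
    i = 0
    while i < len(element):
        ch = element[i]
        if ch == "\\" and i + 1 < len(element):
            nxt = element[i + 1]
            out.append(ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(symbol_name(r"abc\ def"))   # abc def
```

A raw string literal is used in the call so that the backslash reaches
the function unchanged.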

An element that matches the regular expression <tt>"([^"]|\\")*"</tt>
represents a property whose key is <tt>Mtext</tt>.  The backslash
escapes explained above also apply here.  Moreover, each part of the
element matching the regular expression
<tt>\\[xX][0-9A-Fa-f][0-9A-Fa-f]</tt> is replaced with its hexadecimal
interpretation, i.e. the single byte having that value.

After the backslash escapes have been resolved, the byte sequence
between the double quotes is interpreted as a UTF-8 sequence and
decoded into an M-text.  This M-text is the value of the property.
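
The two steps described above (resolve the escapes into bytes, then
decode the bytes as UTF-8) can be sketched in Python.  This is an
assumption-laden illustration rather than the loader of the m17n
library: the name <tt>decode_mtext</tt> is hypothetical, the element is
passed in as bytes including its surrounding double quotes, and the
resulting M-text is represented by a Python string.

```python
# Escapes shared with symbol elements, keyed and valued by byte code.
ESCAPES = {ord("t"): 9, ord("n"): 10, ord("r"): 13, ord("e"): 27}
HEX_DIGITS = b"0123456789ABCDEFabcdef"


def decode_mtext(element):
    """Decode a quoted M-text element (bytes) into a string."""
    if len(element) < 2 or element[:1] != b'"' or element[-1:] != b'"':
        raise ValueError("not an M-text element")
    body = element[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        b = body[i]
        if b == 0x5C and i + 1 < len(body):          # 0x5C is the backslash
            nxt = body[i + 1]
            if (nxt in b"xX" and i + 3 < len(body)
                    and body[i + 2] in HEX_DIGITS and body[i + 3] in HEX_DIGITS):
                out.append(int(body[i + 2:i + 4], 16))   # \xHH becomes one byte
                i += 4
            else:
                out.append(ESCAPES.get(nxt, nxt))        # \t, \n, \r, \e, or literal
                i += 2
        else:
            out.append(b)
            i += 1
    return out.decode("utf-8")                       # step 2: UTF-8 decoding


print(decode_mtext(rb'"m\"text"'))   # m"text
```

Collecting bytes first matters because a single character may be
written as several hexadecimal escapes, one per UTF-8 byte.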

Zero or more elements surrounded by a pair of parentheses represent a
property whose key is <tt>Mplist</tt>.  Whitespace before and after a
parenthesis can be omitted.  The value of the property is a plist,
which is the result of recursive interpretation of the elements
between the parentheses.
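
Taken together, the four element types suggest a small recursive
reader.  The following Python sketch is one possible reading of the
rules above and is not the actual loader of the m17n library.  All
function names are hypothetical.  It assumes that a plist is
represented as a list of (key, value) pairs with the key given as a
string, that an M-text is represented as a Python string, that a bare
element ends at an unescaped whitespace character or parenthesis, and
that any bare element which is not an integer is treated as a symbol.

```python
import re

INTEGER = re.compile(r"0[xX][0-9A-Fa-f]+|-?[0-9]+")
ESCAPES = {"t": "\t", "n": "\n", "r": "\r", "e": "\x1b"}
WHITESPACE = " \t\n"


def read_bare(text, pos):
    """Read an integer or symbol element; return (raw, unescaped, next_pos)."""
    raw, name = [], []
    while pos < len(text) and text[pos] not in WHITESPACE + "()":
        ch = text[pos]
        if ch == "\\" and pos + 1 < len(text):
            nxt = text[pos + 1]
            raw.append(ch + nxt)
            name.append(ESCAPES.get(nxt, nxt))
            pos += 2
        else:
            raw.append(ch)
            name.append(ch)
            pos += 1
    return "".join(raw), "".join(name), pos


def read_mtext(text, pos):
    """Read an M-text body starting just after the opening double quote."""
    data = bytearray()
    while pos < len(text) and text[pos] != '"':
        ch = text[pos]
        if ch == "\\" and pos + 1 < len(text):
            nxt = text[pos + 1]
            pair = text[pos + 2:pos + 4]
            if nxt in "xX" and re.fullmatch(r"[0-9A-Fa-f]{2}", pair):
                data.append(int(pair, 16))
                pos += 4
            else:
                data += ESCAPES.get(nxt, nxt).encode("utf-8")
                pos += 2
        else:
            data += ch.encode("utf-8")
            pos += 1
    if pos >= len(text):
        raise ValueError("unterminated M-text")
    return data.decode("utf-8"), pos + 1


def read_plist(text, pos=0, nested=False):
    """Return (plist, next_pos); called recursively for each '('."""
    plist = []
    while pos < len(text):
        ch = text[pos]
        if ch in WHITESPACE:
            pos += 1
        elif ch == ";":                      # comment: skip to end of line
            while pos < len(text) and text[pos] != "\n":
                pos += 1
        elif ch == "(":
            value, pos = read_plist(text, pos + 1, nested=True)
            plist.append(("Mplist", value))
        elif ch == ")":
            if not nested:
                raise ValueError("unbalanced ')'")
            return plist, pos + 1
        elif ch == '"':
            value, pos = read_mtext(text, pos + 1)
            plist.append(("Mtext", value))
        else:
            raw, name, pos = read_bare(text, pos)
            if INTEGER.fullmatch(raw):
                base = 16 if raw[:2].lower() == "0x" else 10
                plist.append(("Minteger", int(raw, base)))
            else:
                plist.append(("Msymbol", name))
    if nested:
        raise ValueError("missing ')'")
    return plist, pos


def parse(text):
    """Parse a whole data text and return the top-level plist."""
    return read_plist(text)[0]


print(parse('(a 0x10 "b") ; comment\n-7'))
# [('Mplist', [('Msymbol', 'a'), ('Minteger', 16), ('Mtext', 'b')]), ('Minteger', -7)]
```

Because a parenthesis ends a bare element, input such as
<tt>(a(b)c)</tt> is read without any whitespace around the inner
parentheses.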

@section general-syntax SYNTAX NOTATION

The plist format of data is explained in a BNF-like notation.  In the
notation, non-terminals are represented by a string of uppercase
letters (possibly including '-' in the middle), and terminals are
represented by a string surrounded by '"'.  The special non-terminals
INTEGER, SYMBOL, MTEXT and PLIST represent a property whose value is
an integer, a symbol, an M-text, or a plist respectively.

@section general-example EXAMPLE

Here is an example of database data that is read into a plist of this
format:

@verbatim
DATA-FORMAT ::=
    [ INTEGER | SYMBOL | MTEXT | FUNC ] *

FUNC ::=
    "(" FUNC-NAME FUNC-ARG * ")"

FUNC-NAME ::=
    SYMBOL

FUNC-ARG ::=
    INTEGER | SYMBOL | MTEXT | "(" FUNC-ARG * ")"
@endverbatim

For instance, a data file that contains this text matches the above
syntax:

@verbatim
abc 123 (pqr 0xff) "m\"text" (_\\_ ("string" xyz) -456)
@endverbatim

and is read into this plist:

@verbatim
1st element: key: Msymbol, value: abc
2nd element: key: Minteger, value: 123
3rd element: key: Mplist, value: a plist of these elements:
    1st element: key: Msymbol, value: pqr
    2nd element: key: Minteger, value: 255
4th element: key: Mtext, value: m"text
5th element: key: Mplist, value: a plist of these elements:
    1st element: key: Msymbol, value: _\_
    2nd element: key: Mplist, value: a plist of these elements:
        1st element: key: Mtext, value: string
        2nd element: key: Msymbol, value: xyz
    3rd element: key: Minteger, value: -456
@endverbatim
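
The nesting in the listing above can be made explicit by writing the
plist as nested data and walking it recursively.  The Python sketch
below is only an illustration: the representation of a plist as a list
of (key, value) pairs and the names <tt>EXAMPLE</tt>,
<tt>ordinal</tt>, and <tt>describe</tt> are assumptions made for this
example and are not part of the m17n library.

```python
# The example plist, written by hand as nested (key, value) pairs.
EXAMPLE = [
    ("Msymbol", "abc"),
    ("Minteger", 123),
    ("Mplist", [("Msymbol", "pqr"), ("Minteger", 255)]),
    ("Mtext", 'm"text'),
    ("Mplist", [
        ("Msymbol", "_\\_"),
        ("Mplist", [("Mtext", "string"), ("Msymbol", "xyz")]),
        ("Minteger", -456),
    ]),
]


def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 11 <= n % 100 <= 13:
        return f"{n}th"
    return f"{n}" + {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


def describe(plist, indent=0):
    """Return one line of text per property, indenting nested plists."""
    lines = []
    pad = " " * indent
    for i, (key, value) in enumerate(plist, start=1):
        if key == "Mplist":
            lines.append(f"{pad}{ordinal(i)} element: key: Mplist, "
                         "value: a plist of these elements:")
            lines.extend(describe(value, indent + 4))
        else:
            lines.append(f"{pad}{ordinal(i)} element: key: {key}, value: {value}")
    return lines


print("\n".join(describe(EXAMPLE)))
```

Only the <tt>Mplist</tt> key triggers recursion, which reflects the
rule that the key alone determines the type of the value.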

Copyright (C) 2003, 2004
  National Institute of Advanced Industrial Science and Technology (AIST)
  Registration Number H15PRO112

This file is part of the m17n database; a sub-part of the m17n
library.

The m17n library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1 of
the License, or (at your option) any later version.

The m17n library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public
License along with the m17n library; if not, write to the Free
Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
02111-1307, USA.