1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
2             "http://www.w3.org/TR/html4/loose.dtd">
3 <html lang="en">
4 <head>
5 <title>CHaracter Information Service Environment</title>
6 </head>
7 <body>
8 <p>
9 [<a href="http://cvs.m17n.org/chise/">m17n.org</a>]
10 [<a href="http://www.kanji.zinbun.kyoto-u.ac.jp/projects/chise/">
11 Kyoto University, Institute for Research in Humanities, Documentation
12             and Information Center for Chinese Studies
13 </a>]
14 </p>
15
16 <h1>
17 <table cellspacing="8">
18 <tr><th align="center" valign="top">
19 <img alt="DICCS" src="images/cm450118-s.jpg">
20 <td align="center" valign="middle">
21 <font size="+3">CHISE project</font>
22 </table>
23 </h1>
24 <p>
25 <br>
26 <b><a href="index.html.ja.iso-2022-jp"><img
27 src="images/japanese-page.png">
28 </a></b><br>
29 <hr>
30
31 <h2>About the CHISE Project</h2>
32 <p>
33 The CHISE (CHaracter Information Service Environment) project aims to
34 collect information about characters in the scripts of the world and to
35 organize it into a knowledge base.  A new character processing environment
36 based on this knowledge base is currently under development.
37 </p>
38
39
40 <h2>News</h2>
41 <ul>
42    <li>2002-09-20 to 22 <a
43        href="http://www.kanji.zinbun.kyoto-u.ac.jp/~tomo/">Tomohiko
44                          MORIOKA</a> and 
45        <a href="http://www.kanji.zinbun.kyoto-u.ac.jp/~wittern/">
46        Christian WITTERN</a> made a presentation at the <a
47        href="http://pnc-ecai.oiu.ac.jp/prog2.htm">
48        PNC Annual Conference and Joint Meetings 2002
49        </a>.
50    <li>2002-09-19 <a
51        href="http://www.kanji.zinbun.kyoto-u.ac.jp/~tomo/"
52        >Tomohiko MORIOKA</a> gave a presentation at the <a href="http://lc.linux.or.jp/lc2002/">
53        Linux Conference 2002</a>.
54    <li>2002-08-21 <a href="http://www.kanji.zinbun.kyoto-u.ac.jp/projects/chise/dist/XEmacs/xemacs-utf-2000-0.19.tar.gz">
55        XEmacs UTF-2000 0.19 (Koriyama)
56        </a> has been released.
57 </ul>
58
59 <hr>
60 <!--
61 <h2>文字知識データベースに基づく文字処理アーキテクチャの開発</h2>
62 -->
63 <h2>Development of a character processing architecture based on a
64 character knowledge base</h2>
65
66 <h3><a name="xemacs/">XEmacs UTF-2000</a></h3> <p> <!-- 外部文字データ
67 ベースから文字属性を lazy-loading 可能になりました。IA32 アーキテクチャ
68 で実行形式の大きさが従来約 30 MB だったのが約 15 MB になりました。現在、
69 cvs.m17n.org の /cvs/root のXEmacs モジュールの utf-2000 枝でから 
70 anonymous CVS で入手可能です。--> It is now possible to load character
71 attributes from an external database on demand ("lazy loading").  On
72 Intel 32-bit processor architectures, the size of the executable file
73 thus shrinks from the roughly 30 MB required by the traditional build to
74 just about 15 MB. This version can now be downloaded as <a
75 href="http://www.kanji.zinbun.kyoto-u.ac.jp/projects/chise/dist/XEmacs/xemacs-utf-2000-0.19.tar.gz">
76 XEmacs UTF-2000 0.19 (Koriyama)</a>. In addition, there is a UTF-2000
77 branch of the XEmacs tree at cvs.m17n.org in /cvs/root, which can be
78 accessed by anonymous CVS.</p>
79
80
81 <h2><a name="topicmaps">A</a>
82 <a href="http://www.topicmaps.org">TopicMaps</a> based approach to a
83 character database
84 </h2>
85 <p>
86 In 2001, a prototype of a Topic Map engine was developed based
87 on <a href="http://www.zope.org/">Zope</a>.  This proved less than
88 ideal for this purpose, so the focus for this year is to port the
89 engine to a relational database backend.  Development currently
90 continues with PostgreSQL. It is planned to enable Topic Map editing
91 within XEmacs UTF-2000, and also to allow other clients in addition
92 to this.
93 </p>
94
95
96
97 <h2>Database of features of characters</h2>
98
99 <h3><a name="ids">Database of the component structure of Chinese Characters</a></h3>
100
101 <p>
102 Based on the Ideographic Description Sequences (IDS) defined in
103 ISO/IEC 10646-1:2000 and Unicode, we are now developing a database
104 that expresses the structure of Chinese characters using this syntax.
105 At the moment, we are using the characters in the Unicode tables as a
106 reference.  The basic <em>CJK Unified Ideographs</em>, as well as
107 <em>Extension A</em> and <em>Extension B</em>, together more
108 than 70,000 characters, are currently covered.
109 </p>
110
111 <p>
112 <a href="images/ids-ext-b-1.png">
113 <img alt="IDS" src="images/ids-ext-b-1-s.png">
114 <br>
115 Table of the component structure database
116 </a>
117 </p>
118
119 <p>
120 The following tables are currently available via anonymous CVS from <a
121 href="http://cvs.m17n.org/">cvs.m17n.org</a> at <a
122 href="http://cvs.m17n.org/cgi-bin/viewcvs/?cvsroot=chise">/cvs/chise</a> 
123 as module <a
124 href="http://cvs.m17n.org/cgi-bin/viewcvs/ids/?cvsroot=chise">ids</a>:
125 </p>
126
127 <blockquote>
128 <dl compact>
129   <dt><a
130 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Basic.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
131       IDS-UCS-Basic.txt
132       </a>
133   <dd>CJK Unified Ideographs (U+4E00 〜 U+9FA5) of ISO/IEC
134       10646-1:2000
135
136   <dt><a
137 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Ext-A.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
138       IDS-UCS-Ext-A.txt
139       </a>
140   <dd>CJK Unified Ideographs Extension A (U+3400 〜 U+4DB5, U+FA1F and
141       U+FA23) of ISO/IEC 10646-1:2000
142
143   <dt><a
144       href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Compat.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
145       IDS-UCS-Compat.txt
146       </a>
147   <dd>CJK Compatibility Ideographs (U+F900 〜 U+FA2D, except U+FA1F
148       and U+FA23) of ISO/IEC 10646-1:2000
149
150   <dt><a
151       href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Ext-B-1.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
152        IDS-UCS-Ext-B-1.txt
153       </a>
154   <dd>CJK Unified Ideographs Extension B [part 1] (U-00020000 〜 
155       U-00021FFF) of ISO/IEC 10646-2:2001
156
157   <dt><a
158       href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Ext-B-2.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
159        IDS-UCS-Ext-B-2.txt
160       </a>
161   <dd>CJK Unified Ideographs Extension B [part 2] (U-00022000 〜 
162       U-00023FFF) of ISO/IEC 10646-2:2001
163   <dt><a
164       href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Ext-B-3.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
165        IDS-UCS-Ext-B-3.txt
166       </a>
167   <dd>CJK Unified Ideographs Extension B [part 3] (U-00024000 〜 
168       U-00025FFF) of ISO/IEC 10646-2:2001
169   <dt><a
170       href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Ext-B-4.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
171        IDS-UCS-Ext-B-4.txt
172       </a>
173   <dd>CJK Unified Ideographs Extension B [part 4] (U-00026000 〜
174       U-00027FFF) of ISO/IEC 10646-2:2001
175   <dt><a
176       href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Ext-B-5.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
177        IDS-UCS-Ext-B-5.txt
178       </a>
179   <dd>CJK Unified Ideographs Extension B [part 5] (U-00028000 〜
180       U-00029FFF) of ISO/IEC 10646-2:2001
181   <dt><a
182       href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Ext-B-6.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
183        IDS-UCS-Ext-B-6.txt
184       </a>
185   <dd>CJK Unified Ideographs Extension B [part 6] (U-0002A000 〜
186       U-0002A6D6) of ISO/IEC 10646-2:2001
187   <dt><a
188       href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Compat-Supplement.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
189       IDS-UCS-Compat-Supplement.txt
190       </a>
191   <dd>CJK Compatibility Ideographs Supplement (U-0002F800 〜 
192       U-0002FA1D) of ISO/IEC 10646-2:2001
193   <dt><a
194       href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-01.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
195       IDS-Daikanwa-01.txt
196       </a>
197   <dd>Morohashi: Daikanwa Jiten, Volume 1
198   <dt><a
199       href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-02.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
200       IDS-Daikanwa-02.txt
201       </a>
202   <dd>Morohashi: Daikanwa Jiten, Volume 2
203   <dt><a
204       href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-03.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
205       IDS-Daikanwa-03.txt
206       </a>
207   <dd>Morohashi: Daikanwa Jiten, Volume 3
208   <dt><a
209       href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-04.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
210       IDS-Daikanwa-04.txt
211       </a>
212   <dd>Morohashi: Daikanwa Jiten, Volume 4
213   <dt><a
214       href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-05.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
215       IDS-Daikanwa-05.txt
216       </a>
217   <dd>Morohashi: Daikanwa Jiten, Volume 5
218   <dt><a
219       href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-06.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
220       IDS-Daikanwa-06.txt
221       </a>
222   <dd>Morohashi: Daikanwa Jiten, Volume 6
223   <dt><a
224       href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-07.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
225       IDS-Daikanwa-07.txt
226       </a>
227   <dd>Morohashi: Daikanwa Jiten, Volume 7
228   <dt><a
229       href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-08.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
230       IDS-Daikanwa-08.txt
231       </a>
232   <dd>Morohashi: Daikanwa Jiten, Volume 8
233   <dt><a
234       href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-09.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
235       IDS-Daikanwa-09.txt
236       </a>
237   <dd>Morohashi: Daikanwa Jiten, Volume 9
238   <dt><a
239       href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-10.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
240       IDS-Daikanwa-10.txt
241       </a>
242   <dd>Morohashi: Daikanwa Jiten, Volume 10
243   <dt><a
244       href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-11.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
245       IDS-Daikanwa-11.txt
246       </a>
247   <dd>Morohashi: Daikanwa Jiten, Volume 11
248   <dt><a
249       href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-12.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
250       IDS-Daikanwa-12.txt
251       </a>
252   <dd>Morohashi: Daikanwa Jiten, Volume 12
253   <dt><a
254       href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-dx.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
255       IDS-Daikanwa-dx.txt
256       </a>
257   <dd>Morohashi: Daikanwa Jiten, Additions
258   <dt><a
259       href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-ho.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
260       IDS-Daikanwa-ho.txt
261       </a>
262   <dd>Morohashi: Daikanwa Jiten, Appendix
263   <dt><a
264       href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-CBETA.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
265       IDS-CBETA.txt
266       </a>
267   <dd>Characters encountered by the <a href="http://www.cbeta.org/">Chinese Buddhist Electronic Text
268       Association (CBETA)</a>
269 </dl>
270 </blockquote>
271
272 <ul>
273   <li><a href="http://web.sfc.keio.ac.jp/~kamichi/">Koichi KAMICHI</a>
274       (<a href="http://www.fonts.jp/">
275       Forum for development of on-the-fly generation of Kanji Fonts
276       </a>)
277       <a href="http://www.fonts.jp/search.html">
278                         Analytic tool for Kanji Fonts (in Japanese)
279       </a>
280 </ul>
281
282
283 <h3><a name="glyph">Integration and Composition of Character Glyphs
284 and Styles</a></h3> <p> Information about character glyphs and styles
285 is collected in the character database.  This makes it possible to
286 combine this information with the other knowledge about a character in
287 the database and to build a system that uses the <a href="#ids">component
288 structure information</a> to assemble the font for a character from
289 its components, depending on the contextual requirements.  With
290 this system, mismatches caused by erroneous association
291 or insufficient contextual information are avoided, and it will be
292 possible to easily display and print character forms that have not been
293 codified and for which no fonts exist.
294 <ul>
295   <li>
296       <a href="http://www.fonts.jp/">
297       Forum for development of on-the-fly generation of Kanji Fonts
298       </a>
299 </ul>
300
301
302 <h3><a name="network">Mathematical analysis and visualization of
303 character knowledge</a></h3>
304 <ul>
305   <li>Yoshi Fujiwara, Yasuhiro Suzuki, Tomohiko
306       Morioka, “<a
307       href="http://www2.crl.go.jp/jt/a134/yoshi/pc/kanji/nw.ps">
308       Network of Words</a>”, <a href="http://arob.cc.oita-u.ac.jp/">
309       Artificial Life and Robotics 2002</a>
310       (<a href="http://www2.crl.go.jp/jt/a134/yoshi/pc/kanji/index.html">
311                         Presentation material
312       </a>)
313   <li>Model for the relation of Kanji characters that share a component
314       <br>
315       <a
316                         href="http://www2.crl.go.jp/jt/a134/yoshi/pc/kanji/mage1.jpg">
317       <img alt="Image 1"
318       src="images/mage1-s.jpg"><br>Image 1</a>
319 &nbsp;<br>
320       <a href="http://www2.crl.go.jp/jt/a134/yoshi/pc/kanji/mage2.jpg">
321       <img alt="Image 2"
322       src="images/mage2-s.jpg"><br>Image 2</a>
323 </ul>
324
325 <!--  <h2>TOMOYO Project</h2> -->
326 <!--  <p> -->
327 <!--  The TOMOYO (Text Operation Models and Outfits for Your Objects) -->
328 <!--  project, formerly called the "UTF-2000 project", -->
329 <!--  is a project to develop a character processing architecture -->
330 <!--  based on a character knowledge database. -->
331 <!--  </p> -->
332
333
334 <hr>
335 <h2>Mailing List</h2>
336 <p>
337 Discussion about the CHISE Project takes place on the CHISE-{ja|en} mailing lists.
338 <p>
339 Anybody who would like to take part in the discussion and
340 development of the CHISE Project, or who has ideas, questions about the
341 implementation, or wishes for new features, is welcome to join the
342 English list, the Japanese list, or both.
343 <p>
344 To become a member of a CHISE mailing list, send a message to the
345 corresponding address:
346 <dl compact>
347   <dt>For Japanese:
348   <dd><a href="mailto:chise-ja-ctl@m17n.org">
349       chise-ja-ctl@m17n.org</a>
350
351   <dt>For English:
352   <dd><a href="mailto:chise-en-ctl@m17n.org">
353       chise-en-ctl@m17n.org</a>
354 </dl>
355
356 with the line
357 <blockquote>subscribe Your Name</blockquote>
358 in the body of the message.  You will then receive a confirmation
359 message containing the line
360
361 <blockquote>
362 confirm PASSWORD Your Name
363 </blockquote> You will have to reply to this message to become a member.
364
365
366 <hr>
367
368 <h2>Papers and Presentations</h2>
369 <ul>
370   <li><a href="xemacs/#presentation">
371       About XEmacs UTF-2000</a>
372   <li><a href="#network">About mathematical analysis of Character Information</a>
373   <li>Other
374       <ul>
375         <li><a href="papers/u2k-plan.ja/">
376             &ldquo;Model and Implementation of a Next Generation Multilingual
377             Processing System&rdquo;
378             </a> (in Japanese. October 1999)
379         <li><a href="http://www.kanji.zinbun.kyoto-u.ac.jp/~wittern/">WITTERN, Christian</a>, 
380             “Non-system characters in XML documents”, in:
381             <i>The Frontier of Asian Information Processing</i>
382             [Seminar Series of the National Documentation and
383                                 Information Centers in Humanities] No. 10, November 2000
384         <li><a href="http://www.kanji.zinbun.kyoto-u.ac.jp/~tomo/">MORIOKA Tomohiko</a>, 
385             &ldquo;The UTF-2000 Project&rdquo;, in:
386             <a
387             href="http://www.kanji.zinbun.kyoto-u.ac.jp/publications/kanji-and-info-2.pdf">
388             Kanji and Information, No.2</a>, March 2001 (in Japanese)
389         <li><a href="http://www.kanji.zinbun.kyoto-u.ac.jp/~tomo/">MORIOKA Tomohiko</a>,
390             “CHISE project &mdash; beyond the UTF-2000”,
391             <a href="http://www.m17n.org/m17n2001/">
392             m17n2001: the Fifth International Symposium on Multilingual
393             Information Processing and Open Source Software
394             </a>.
395         <li><a href="http://www.kanji.zinbun.kyoto-u.ac.jp/~tomo/">MORIOKA Tomohiko</a>, 
396             “A Short Introduction to UTF-2000 Project”,
397             the First TEI Character Set Issues Working Group (October 2001,
398             University of California, Berkeley, USA).
399         <li><a href="http://www.kanji.zinbun.kyoto-u.ac.jp/~wittern/">WITTERN, Christian</a>, 
400             &ldquo;What is Digitisation?&rdquo;, in:
401             <a
402             href="http://www.kanji.zinbun.kyoto-u.ac.jp/publications/kanji-and-info-3.pdf">
403             Kanji and Information, No.3</a>, October 2001 (in Japanese).
404         <li><a href="http://www.ya.sakura.ne.jp/~moro/">MORO, Shigeki</a>, 
405             &ldquo;The meaning of 'beyond character codes'&rdquo;, in:
406             <a
407             href="http://www.kanji.zinbun.kyoto-u.ac.jp/publications/kanji-and-info-3.pdf">
408             Kanji and Information, No.3</a>, October 2001 (in Japanese).
409         <li><a href="http://www.kanji.zinbun.kyoto-u.ac.jp/~wittern/">WITTERN, Christian</a>, 
410             “Some thoughts on the digitization of Kanji”,
411             <i>Information Technology and the Humanities</i>
412             [Seminar Series of the National Documentation and
413                                 Information Centers in Humanities] No. 11, November 2001.
414         <li><a href="http://web.sfc.keio.ac.jp/~kamichi/">KAMICHI, Koichi</a>, 
415             &ldquo;Building KAGE (Kanji-font Automatic Generating Engine):
416             The Next Generation of Kanji Processing beyond the Character Code Model&rdquo;
417             in <a href="http://www.jaet.gr.jp/jj/3.html"><i>Journal of Japan Association for 
418             East Asian Text Processing (JAET)</i> No. 3</a>, October 2002 (in Japanese).
419         <li><a href="http://www.ya.sakura.ne.jp/~moro/">MORO, Shigeki</a>, 
420             &ldquo;Software Review: CHISE Project,&rdquo;
421             in <a href="http://www.jaet.gr.jp/jj/3.html"><i>Journal of Japan Association for 
422             East Asian Text Processing (JAET)</i> No. 3</a>, October 2002 (in Japanese).
423         <!-- <li><a href="http://www.kanji.zinbun.kyoto-u.ac.jp/~tomo/">MORIOKA, Tomohiko</a>,
424             <a href="papers/dc2002.pdf">
425             「ポスト文字コード時代の文書処理技術に関する展望」</a>、
426             「データベースの活用と人文社会科学」
427             (全国文献・情報センター人文社会科学学術セミナーシリーズ No.12),
428             2002年11月 -->
429       </ul>
430 </ul>
431
432 <h2><a name="history">History</a></h2>
433
434 <hr>
435
436 <br>
437 <b>[<a href="http://www.kanji.zinbun.kyoto-u.ac.jp/">Documentation and Information Center for Chinese Studies</a> at the 
438 <a href="http://www.zinbun.kyoto-u.ac.jp/">
439 Institute for Research in the Humanities</a>&nbsp;
440 <a href="http://www.kanji.zinbun.kyoto-u.ac.jp/projects/">
441 Related Projects
442 </a>]</b>
443 <p align="center"><img SRC="images/dragon.jpg" height=146 width=198></p>
444
445 <hr>
446
447 <!-- hhmts start -->
448 Last modified: Wed Oct  9 03:33:25 JST 2002
449 <!-- hhmts end -->.
450 <a href="http://www.aurora.dti.ne.jp/~zom/Counter/index.html">
451 <img
452  src="http://mousai.as.wakwak.ne.jp/cgi-bin/counterp.cgi?projects_chise-en.log"
453  alt="counter"></a>
454 since Oct 9 2002.
455
456 </body>
457 </html>
458 <!-- Keep this comment at the end of the file
459 Local variables:
460 mode: text
461 tab-width: 8
462 time-stamp-line-limit:40
463 -->