1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
2 "http://www.w3.org/TR/html4/loose.dtd">
5 <title>CHaracter Information Service Environment</title>
9 [<a href="http://cvs.m17n.org/chise/">m17n.org</a>]
10 [<a href="http://www.kanji.zinbun.kyoto-u.ac.jp/projects/chise/">
11 Kyoto University, Institute for Research in Humanities, Documentation
12 and Information Center for Chinese Studies
17 <table cellspacing="8">
18 <tr><th align="center" valign="top">
19 <img alt="DICCS" src="images/cm450118-s.jpg">
20 <td align="center" valign="middle">
21 <font size="+3">CHISE project</font>
26 <b><a href="index.html.ja.iso-2022-jp"><img
27 src="images/japanese-page.png">
31 <h2>About the CHISE Project</h2>
The CHISE (CHaracter Information Service Environment) project aims
to collect information about the characters in the scripts of the
world and to organize it into a knowledge base. A new processing
environment based on this architecture is currently under development.
42 <li>2002-09-20 to 22 <a
43 href="http://www.kanji.zinbun.kyoto-u.ac.jp/~tomo/">Tomohiko
45 <a href="http://www.kanji.zinbun.kyoto-u.ac.jp/~wittern/">
46 Christian WITTERN</a> made a presentation at the <a
47 href="http://pnc-ecai.oiu.ac.jp/prog2.htm">
48 PNC Annual Conference and Joint Meetings 2002
51 href="http://www.kanji.zinbun.kyoto-u.ac.jp/~tomo/"
52 >Tomohiko MORIOKA</a> gave a presentation at the <a href="http://lc.linux.or.jp/lc2002/">
53 Linux Conference 2002</a>
54 <li>2002-08-21 <a href="http://www.kanji.zinbun.kyoto-u.ac.jp/projects/chise/dist/XEmacs/xemacs-utf-2000-0.19.tar.gz">
55 XEmacs UTF-2000 0.19 (Koriyama)
56 </a> has been released.
63 <h2>Development of a character processing architecture based on a
64 character knowledge base</h2>
66 <h3><a name="xemacs/">XEmacs UTF-2000</a></h3> <p> <!-- 外部文字データ
67 ベースから文字属性を lazy-loading 可能になりました。IA32 アーキテクチャ
68 で実行形式の大きさが従来約 30 MB だったのが約 15 MB になりました。現在、
69 cvs.m17n.org の /cvs/root のXEmacs モジュールの utf-2000 枝でから
anonymous CVS で入手可能です。--> Character attributes can now be loaded
on demand from an external database ("lazy loading"). On 32-bit Intel
(IA-32) architectures, this shrinks the executable file from about
30 MB with the traditional build to about 15 MB. This version can be
downloaded as <a
href="http://www.kanji.zinbun.kyoto-u.ac.jp/projects/chise/dist/XEmacs/xemacs-utf-2000-0.19.tar.gz">
XEmacs UTF-2000 0.19 (Koriyama)</a>. In addition, the utf-2000 branch
of the XEmacs module at cvs.m17n.org, under /cvs/root, can be
accessed by anonymous CVS.</p>
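<p>The idea of loading attributes on demand can be sketched in a few
lines of Python. This is only an illustration under stated assumptions,
not the XEmacs UTF-2000 implementation: the class name
<code>LazyCharAttributes</code>, the table <code>char_attr</code>, the
attribute name <code>structure</code> and the use of SQLite as the
external store are all hypothetical stand-ins.</p>
<pre>
```python
import sqlite3


class LazyCharAttributes:
    """Fetch character attributes from an external store on first use only."""

    def __init__(self, connection):
        self._conn = connection
        self._cache = {}
        self.loads = 0  # number of database reads, kept for illustration

    def get(self, char, attribute):
        key = (char, attribute)
        if key not in self._cache:
            # Nothing is held in memory until somebody asks for it.
            row = self._conn.execute(
                "SELECT value FROM char_attr WHERE ch = ? AND name = ?",
                (char, attribute),
            ).fetchone()
            self._cache[key] = row[0] if row else None
            self.loads += 1
        return self._cache[key]


def demo():
    # Hypothetical schema: one row per (character, attribute name).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE char_attr (ch TEXT, name TEXT, value TEXT)")
    conn.execute(
        "INSERT INTO char_attr VALUES (?, ?, ?)",
        ("\u6797", "structure", "\u2ff0\u6728\u6728"),
    )
    conn.commit()

    attrs = LazyCharAttributes(conn)
    first = attrs.get("\u6797", "structure")   # read from the database
    second = attrs.get("\u6797", "structure")  # served from the cache
    return first, second, attrs.loads
```
</pre>
<p>Because attributes stay in the external store until requested, they
do not have to be compiled into the executable, which is the kind of
saving described above.</p>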
81 <h2>A <a name="topicmaps">
82 <a href="http://www.topicmaps.org">TopicMaps</a> based approach to a
In 2001, a prototype Topic Map engine was developed based
on <a href="http://www.zope.org/">Zope</a>. This proved less than
ideal for the purpose, so the focus for this year is to port the
engine to a relational database backend. Development is currently
continuing with PostgreSQL. It is planned to enable Topic Map editing
within XEmacs UTF-2000, but also to allow multiple clients in addition
97 <h2>Database of features of characters</h2>
99 <h3>Database of the component structure of Chinese Characters</h3>
Based on the Ideographic Description Sequences (IDS) defined in
ISO/IEC 10646-1:2000 and Unicode, we are now developing a database
that expresses the structure of Chinese characters using this syntax.
At the moment, we are using the characters in the Unicode tables as a
reference. The basic <em>CJK Unified Ideographs</em>, as well as
<em>Extension A</em> and <em>Extension B</em>, together more
than 70,000 characters, are currently covered.
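<p>As an illustration only (not the project's own code), the prefix
syntax of an IDS can be parsed into a tree with a short Python
function. The sketch assumes the twelve Ideographic Description
Characters U+2FF0 to U+2FFB, of which U+2FF2 and U+2FF3 take three
operands and the others take two; the name <code>parse_ids</code> is
hypothetical.</p>
<pre>
```python
# Arity of each Ideographic Description Character (U+2FF0..U+2FFB).
# U+2FF2 and U+2FF3 take three operands; the others take two.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2ff2"] = 3
IDC_ARITY["\u2ff3"] = 3


def _parse(ids, pos):
    """Parse one node starting at ids[pos]; return (node, next position)."""
    if pos == len(ids):
        raise ValueError("IDS ended before all operands were supplied")
    ch = ids[pos]
    pos += 1
    arity = IDC_ARITY.get(ch)
    if arity is None:
        return ch, pos  # a plain component character
    operands = []
    for _ in range(arity):
        node, pos = _parse(ids, pos)
        operands.append(node)
    return (ch, *operands), pos


def parse_ids(ids):
    """Parse an IDS string into a nested tuple (operator, operand, ...)."""
    tree, end = _parse(ids, 0)
    if end != len(ids):
        raise ValueError("trailing characters after a complete IDS")
    return tree
```
</pre>
<p>For example, <code>parse_ids("⿱木⿰木木")</code> yields a tree whose
root ⿱ places 木 above the pair ⿰木木, one possible decomposition of
森.</p>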
112 <a href="images/ids-ext-b-1.png">
<img alt="ids" src="images/ids-ext-b-1-s.png">
115 Table of the component structure database
120 The following tables are currently available via anonymous CVS from <a
121 href="http://cvs.m17n.org/">cvs.m17n.org</a> at <a
122 href="http://cvs.m17n.org/cgi-bin/viewcvs/?cvsroot=chise">/cvs/chise</a>
124 href="http://cvs.m17n.org/cgi-bin/viewcvs/ids/?cvsroot=chise">ids:</a>
130 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Basic.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
133 <dd>CJK Unified Ideographs (U+4E00 〜 U+9FA5) of ISO/IEC
137 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Ext-A.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
140 <dd>CJK Unified Ideographs Extension A (U+3400 〜 U+4DB5, U+FA1F and
141 U+FA23) of ISO/IEC 10646-1:2000
144 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Compat.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
147 <dd>CJK Compatibility Ideographs (U+F900 〜 U+FA2D, except U+FA1F
148 and U+FA23) of ISO/IEC 10646-1:2000
151 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Ext-B-1.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
154 <dd>CJK Unified Ideographs Extension B [part 1] (U-00020000 〜
155 U-00021FFF) of ISO/IEC 10646-2:2001
158 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Ext-B-2.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
161 <dd>CJK Unified Ideographs Extension B [part 2] (U-00022000 〜
162 U-00023FFF) of ISO/IEC 10646-2:2001
164 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Ext-B-3.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
167 <dd>CJK Unified Ideographs Extension B [part 3] (U-00024000 〜
168 U-00025FFF) of ISO/IEC 10646-2:2001
170 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Ext-B-4.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
173 <dd>CJK Unified Ideographs Extension B [part 4] (U-00026000 〜
174 U-00027FFF) of ISO/IEC 10646-2:2001
176 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Ext-B-5.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
179 <dd>CJK Unified Ideographs Extension B [part 5] (U-00028000 〜
180 U-00029FFF) of ISO/IEC 10646-2:2001
182 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Ext-B-6.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
185 <dd>CJK Unified Ideographs Extension B [part 6] (U-0002A000 〜
186 U-0002A6D6) of ISO/IEC 10646-2:2001
188 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Compat-Supplement.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
189 IDS-UCS-Compat-Supplement.txt
191 <dd>CJK Compatibility Ideographs Supplement (U-0002F800 〜
192 U-0002FA1D) of ISO/IEC 10646-2:2001
194 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-01.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
197 <dd>Morohashi: Daikanwa Jiten, Volume 1
199 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-02.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
202 <dd>Morohashi: Daikanwa Jiten, Volume 2
204 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-03.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
207 <dd>Morohashi: Daikanwa Jiten, Volume 3
209 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-04.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
212 <dd>Morohashi: Daikanwa Jiten, Volume 4
214 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-05.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
217 <dd>Morohashi: Daikanwa Jiten, Volume 5
219 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-06.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
222 <dd>Morohashi: Daikanwa Jiten, Volume 6
224 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-07.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
227 <dd>Morohashi: Daikanwa Jiten, Volume 7
229 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-08.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
232 <dd>Morohashi: Daikanwa Jiten, Volume 8
234 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-09.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
237 <dd>Morohashi: Daikanwa Jiten, Volume 9
239 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-10.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
242 <dd>Morohashi: Daikanwa Jiten, Volume 10
244 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-11.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
247 <dd>Morohashi: Daikanwa Jiten, Volume 11
249 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-12.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
252 <dd>Morohashi: Daikanwa Jiten, Volume 12
254 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-dx.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
257 <dd>Morohashi: Daikanwa Jiten, Additions
259 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-ho.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
262 <dd>Morohashi: Daikanwa Jiten, Appendix
264 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-CBETA.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
267 <dd>Characters encountered by the <a href="http://www.cbeta.org/">Chinese Buddhist Electronic Text
268 Association (CBETA)</a>
273 <li><a href="http://web.sfc.keio.ac.jp/~kamichi/">Koichi KAMICHI</a>
274 (<a href="http://www.fonts.jp/">
275 Forum for development of on-the-fly generation of Kanji Fonts
277 <a href="http://www.fonts.jp/search.html">
278 Analytic tool for Kanji Fonts (in Japanese)
<h3><a name="glyph">Integration and Composition of Character Glyphs
and Styles</a></h3> <p> The character database also collects
information about character glyphs and styles. Combined with the other
knowledge about a character in the database, this makes it possible to
build a system that uses the <a href="#ids">component
structure information</a> to assemble the glyph for a character
from its components, according to the requirements of the context. Such
a system rules out mismatches caused by erroneous association
or insufficient contextual information, and makes it
easy to display and print character forms that have not been codified and for
which no fonts exist.
296 <a href="http://www.fonts.jp/">
297 Forum for development of on-the-fly generation of Kanji Fonts
<h3><a name="network">Mathematical analysis and visualization of
character knowledge</a></h3>
305 <li>Yoshi Fujiwara, Yasuhiro Suzuki, Tomohiko
307 href="http://www2.crl.go.jp/jt/a134/yoshi/pc/kanji/nw.ps">
308 Network of Words</a>”, <a href="http://arob.cc.oita-u.ac.jp/">
309 Artificial Life and Robotics 2002</a>
310 (<a href="http://www2.crl.go.jp/jt/a134/yoshi/pc/kanji/index.html">
311 Presentation material
313 <li>Model for the relation of Kanji characters that share a component
316 href="http://www2.crl.go.jp/jt/a134/yoshi/pc/kanji/mage1.jpg">
318 src="images/mage1-s.jpg"><br>Image 1</a>
320 <a href="http://www2.crl.go.jp/jt/a134/yoshi/pc/kanji/mage2.jpg">
322 src="images/mage2-s.jpg"><br>Image 2</a>
325 <!-- <h2>TOMOYO Project</h2> -->
327 <!-- TOMOYO (Text Operation Models and Outfits for Your Objects) -->
<!-- The project was formerly called the "UTF-2000 Project". It aims -->
<!-- to develop a character processing architecture -->
<!-- based on a character knowledge base. -->
335 <h2>Mailing List</h2>
Discussion about the CHISE Project takes place on the CHISE-{ja|en} mailing lists.
Anybody who would like to take part in the discussion and
development of the CHISE Project, or who has ideas or questions about
the implementation or wishes for new features, is welcome to join the
English list, the Japanese list, or both.
To join a CHISE mailing list, send a message to the
348 <dd><a href="mailto:chise-ja-ctl@m17n.org">
349 chise-ja-ctl@m17n.org</a>
352 <dd><a href="mailto:chise-en-ctl@m17n.org">
353 chise-en-ctl@m17n.org</a>
357 <blockquote>subscribe Your Name</blockquote>
in the body of the message. You will then receive a confirmation
message with the line
362 confirm PASSWORD Your Name
363 </blockquote> You will have to reply to this message to become a member.
368 <h2>Papers and Presentations</h2>
370 <li><a href="xemacs/#presentation">
371 About XEmacs UTF-2000</a>
372 <li><a href="#network">About mathematical analysis of Character Information</a>
375 <li><a href="papers/u2k-plan.ja/">
376 “Model and Implementation of a Next Generation Multilingual
377 Processing System”
</a> (in Japanese, October 1999)
379 <li><a href="http://www.kanji.zinbun.kyoto-u.ac.jp/~wittern/">WITTERN, Christian</a>,
380 “Non-system characters in XML documents”, in:
381 <i>The Frontier of Asian Information Processing</i>
382 [Seminar Series of the National Documentation and
383 Information Centers in Humanities] No. 10, November 2000
384 <li><a href="http://www.kanji.zinbun.kyoto-u.ac.jp/~tomo/">MORIOKA Tomohiko</a>,
385 “The UTF-2000 Project”, in:
387 href="http://www.kanji.zinbun.kyoto-u.ac.jp/publications/kanji-and-info-2.pdf">
388 Kanji and Information, No.2</a>, March 2001 (in Japanese)
389 <li><a href="http://www.kanji.zinbun.kyoto-u.ac.jp/~tomo/">MORIOKA Tomohiko</a>,
“CHISE project &mdash; beyond the UTF-2000”,
391 <a href="http://www.m17n.org/m17n2001/">
392 m17n2001: the Fifth International Symposium on Multilingual
393 Information Processing and Open Source Software
395 <li><a href="http://www.kanji.zinbun.kyoto-u.ac.jp/~tomo/">MORIOKA Tomohiko</a>,
396 “A Short Introduction to UTF-2000 Project”,
397 the First TEI Character Set Issues Working Group (October 2001,
398 University of California, Berkeley, USA).
399 <li><a href="http://www.kanji.zinbun.kyoto-u.ac.jp/~wittern/">WITTERN, Christian</a>,
400 “What is Digitisation?”, in:
402 href="http://www.kanji.zinbun.kyoto-u.ac.jp/publications/kanji-and-info-3.pdf">
403 Kanji and Information, No.3</a>, October 2001 (in Japanese).
404 <li><a href="http://www.ya.sakura.ne.jp/~moro/">MORO, Shigeki</a>,
405 “The meaning of 'beyond character codes'”, in:
407 href="http://www.kanji.zinbun.kyoto-u.ac.jp/publications/kanji-and-info-3.pdf">
408 Kanji and Information, No.3</a>, October 2001 (in Japanese).
409 <li><a href="http://www.kanji.zinbun.kyoto-u.ac.jp/~wittern/">WITTERN, Christian</a>,
410 “Some thoughts on the digitization of Kanji”,
411 <i>Information Technology and the Humanities</i>
412 [Seminar Series of the National Documentation and
413 Information Centers in Humanities] No. 11, November 2001.
414 <li><a href="http://web.sfc.keio.ac.jp/~kamichi/">KAMICHI, Koichi</a>,
415 “Building KAGE (Kanji-font Automatic Generating Engine):
The Next Generation of Kanji Processing beyond the Character Code Model”
417 in <a href="http://www.jaet.gr.jp/jj/3.html"><i>Journal of Japan Association for
418 East Asian Text Processing (JAET)</i> No. 3</a>, October 2002 (in Japanese).
419 <li><a href="http://www.ya.sakura.ne.jp/~moro/">MORO, Shigeki</a>,
420 “Software Review: CHISE Project,”
421 in <a href="http://www.jaet.gr.jp/jj/3.html"><i>Journal of Japan Association for
422 East Asian Text Processing (JAET)</i> No. 3</a>, October 2002 (in Japanese).
423 <!-- <li><a href="http://www.kanji.zinbun.kyoto-u.ac.jp/~tomo/">MORIOKA, Tomohiko</a>,
424 <a href="papers/dc2002.pdf">
425 「ポスト文字コード時代の文書処理技術に関する展望」</a>、
427 (全国文献・情報センター人文社会科学学術セミナーシリーズ No.12),
432 <h2><a name="history">History</a></h2>
437 <b>[<a href="http://www.kanji.zinbun.kyoto-u.ac.jp/">Documentation and Information Center for Chinese Studies</a> at the
438 <a href="http://www.zinbun.kyoto-u.ac.jp/">
439 Institute for Research in the Humanities</a>
440 <a href="http://www.kanji.zinbun.kyoto-u.ac.jp/projects/">
443 <p><img SRC="images/dragon.jpg" height=146 width=198></center>
448 Last modified: Wed Oct 9 03:33:25 JST 2002
450 <a href="http://www.aurora.dti.ne.jp/~zom/Counter/index.html">
452 src="http://mousai.as.wakwak.ne.jp/cgi-bin/counterp.cgi?projects_chise-en.log"
458 <!-- Keep this comment at the end of the file
462 time-stamp-line-limit:40