1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
2 "http://www.w3.org/TR/html4/loose.dtd">
5 <title>CHaracter Information Service Environment</title>
9 [<a href="http://cvs.m17n.org/chise/">m17n.org</a>]
10 [<a href="http://www.kanji.zinbun.kyoto-u.ac.jp/projects/chise/">
11 Kyoto University, Institute for Research in Humanities, Documentation
12 and Information Center for Chinese Studies
17 <table cellspacing="8">
18 <tr><th align="center" valign="top">
19 <img alt="DICCS" src="images/cm450118-s.jpg">
20 <td align="center" valign="middle">
21 <font size="+3">CHISE project</font>
26 Time-stamp: "2002-09-26 17:49:31 JST chris"
29 <b><a href="index.html.ja.iso-2022-jp"><img
30 src="../../images/japanese-page.gif">
34 <h2>About the CHISE Project</h2>
The CHISE (CHaracter Information Service Environment) project aims
to collect information about the characters of the world's scripts
and to organize it into a knowledge base. A new character processing
environment based on this architecture is currently under development.
46 href="http://www.kanji.zinbun.kyoto-u.ac.jp/~tomo/"
47 >Tomohiko MORIOKA</a> is presenting at the <a href="http://lc.linux.or.jp/lc2002/">
48 Linux Conference 2002</a>
49 <li>2002-09-20 to 22 <a
50 href="http://www.kanji.zinbun.kyoto-u.ac.jp/~tomo/">Tomohiko
52 <a href="http://www.kanji.zinbun.kyoto-u.ac.jp/~wittern/">
53 Christian WITTERN</a> are presenting at the <a
54 href="http://pnc-ecai.oiu.ac.jp/prog2.htm">
55 PNC Annual Conference and Joint Meetings 2002
57 <li>2002-08-21 <a href="http://www.kanji.zinbun.kyoto-u.ac.jp/projects/chise/dist/XEmacs/xemacs-utf-2000-0.19.tar.gz">
58 XEmacs UTF-2000 0.19 (Koriyama)
59 </a> has been released.
66 <h2>Development of a character processing architecture based on a
67 character knowledge base</h2>
<h3><a name="xemacs/">XEmacs UTF-2000</a></h3> <p> It is now possible to load character
attributes from an external database on demand ("lazy loading"). On
32-bit Intel (IA32) architectures, the size of the executable file
thus shrinks from the roughly 30 MB required by the traditional build to
about 15 MB. This version can be downloaded as <a
href="http://www.kanji.zinbun.kyoto-u.ac.jp/projects/chise/dist/XEmacs/xemacs-utf-2000-0.19.tar.gz">
XEmacs UTF-2000 0.19 (Koriyama)</a>. In addition, there is a utf-2000
branch of the XEmacs module at cvs.m17n.org in /cvs/root, which can be
accessed by anonymous CVS.</p>
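<p>The idea of loading attributes on demand can be illustrated with a
small sketch. This is not the XEmacs UTF-2000 implementation: the class
name, the <code>fetch</code> callable, the attribute name
<code>total-strokes</code> and the in-memory dictionary standing in for
the external database are all assumptions made for illustration.</p>

```python
class LazyCharacterAttributes:
    """Fetch character attributes from an external store only when first requested."""

    def __init__(self, fetch):
        # fetch is a hypothetical callable (char, attribute) -> value that
        # stands in for a query against the external character database.
        self._fetch = fetch
        self._cache = {}

    def get(self, char, attribute):
        key = (char, attribute)
        if key not in self._cache:
            # First request for this pair: query the external store once.
            self._cache[key] = self._fetch(char, attribute)
        return self._cache[key]


# Stand-in for the external database; the attribute name is illustrative.
EXTERNAL_DB = {("\u65e5", "total-strokes"): 4}
lookups = []  # records every query that reaches the stand-in database


def fetch_from_db(char, attribute):
    lookups.append((char, attribute))
    return EXTERNAL_DB.get((char, attribute))


attrs = LazyCharacterAttributes(fetch_from_db)
first = attrs.get("\u65e5", "total-strokes")   # triggers one external lookup
second = attrs.get("\u65e5", "total-strokes")  # served from the cache
```

<p>Nothing is read when the object is created; data is pulled in only
for the characters actually used, which is the general reason a
lazy-loading design can keep attribute tables out of the executable.</p>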
84 <h2>A <a name="topicmaps">
85 <a href="http://www.topicmaps.org">TopicMaps</a> based approach to a
In 2001, a prototype of a Topic Map engine was developed based
on <a href="http://www.zope.org/">Zope</a>. This proved less than
ideal for the purpose, so the focus for this year is to port the
engine to a relational database backend. Development currently
continues with PostgreSQL. It is planned to enable Topic Map editing
within XEmacs UTF-2000, but also to allow multiple clients in addition
100 <h2>Database of features of characters</h2>
102 <h3>Database of the component structure of Chinese Characters</h3>
Based on the Ideographic Description Sequences (IDS) defined in
ISO/IEC 10646-1:2000 and Unicode, we are now developing a database
that expresses the structure of Chinese characters using this syntax.
At the moment, we are using the characters in the Unicode tables as a
reference. The basic <em>CJK Unified Ideographs</em>, as well as
<em>Extension A</em> and <em>Extension B</em>, together more
than 70,000 characters, are currently covered.
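<p>As a rough illustration of the syntax, an Ideographic Description
Sequence is written in prefix notation: an Ideographic Description
Character (U+2FF0 to U+2FFB) is followed by its components, three for
U+2FF2 and U+2FF3 and two for the others. The sketch below parses such a
string into a tree. The function names are hypothetical, the sample
sequences are illustrative rather than taken from the CHISE tables, and
nothing is assumed here about the layout of the IDS files
themselves.</p>

```python
# Ideographic Description Characters U+2FF0..U+2FFB and their operand counts.
# U+2FF2 and U+2FF3 take three components; the others take two.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2ff2"] = 3
IDC_ARITY["\u2ff3"] = 3


def parse_ids(ids):
    """Parse a prefix-notation IDS string into a nested tuple tree."""

    def parse_at(pos):
        if pos >= len(ids):
            raise ValueError("unexpected end of sequence")
        ch = ids[pos]
        pos += 1
        arity = IDC_ARITY.get(ch)
        if arity is None:
            return ch, pos  # a plain component character
        children = []
        for _ in range(arity):
            child, pos = parse_at(pos)
            children.append(child)
        return (ch, *children), pos

    tree, end = parse_at(0)
    if end != len(ids):
        raise ValueError("trailing characters after a complete sequence")
    return tree


def components(tree):
    """Return the leaf components of a parsed IDS tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    leaves = []
    for child in tree[1:]:
        leaves.extend(components(child))
    return leaves


# Left-right composition of U+65E5 and U+6708.
simple = parse_ids("\u2ff0\u65e5\u6708")
# Top-bottom composition whose upper part is itself a left-right composition.
nested = parse_ids("\u2ff1\u2ff0\u6728\u76ee\u5fc3")
```

<p>Because the notation is recursive, a component may itself be a
sequence, so the nested example yields a two-level tree whose leaves
are the three basic components.</p>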
115 <a href="images/ids-ext-b-1.png">
116 <img align="ids" src="images/ids-ext-b-1-s.png">
118 Table of the component structure database
123 The following tables are currently available via anonymous CVS from <a
124 href="http://cvs.m17n.org/">cvs.m17n.org</a> at <a
125 href="http://cvs.m17n.org/cgi-bin/viewcvs/?cvsroot=chise">/cvs/chise</a>
127 href="http://cvs.m17n.org/cgi-bin/viewcvs/ids/?cvsroot=chise">ids:</a>
133 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Basic.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
<dd>CJK Unified Ideographs (U+4E00 to U+9FA5) of ISO/IEC
140 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Ext-A.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
<dd>CJK Unified Ideographs Extension A (U+3400 to U+4DB5, U+FA1F and
U+FA23) of ISO/IEC 10646-1:2000
147 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Compat.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
<dd>CJK Compatibility Ideographs (U+F900 to U+FA2D, except U+FA1F
and U+FA23) of ISO/IEC 10646-1:2000
154 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Ext-B-1.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
<dd>CJK Unified Ideographs Extension B [part 1] (U-00020000 to
U-00021FFF) of ISO/IEC 10646-2:2001
161 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Ext-B-2.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
<dd>CJK Unified Ideographs Extension B [part 2] (U-00022000 to
U-00023FFF) of ISO/IEC 10646-2:2001
167 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Ext-B-3.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
<dd>CJK Unified Ideographs Extension B [part 3] (U-00024000 to
U-00025FFF) of ISO/IEC 10646-2:2001
173 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Ext-B-4.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
<dd>CJK Unified Ideographs Extension B [part 4] (U-00026000 to
U-00027FFF) of ISO/IEC 10646-2:2001
179 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Ext-B-5.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
<dd>CJK Unified Ideographs Extension B [part 5] (U-00028000 to
U-00029FFF) of ISO/IEC 10646-2:2001
185 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Ext-B-6.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
<dd>CJK Unified Ideographs Extension B [part 6] (U-0002A000 to
U-0002A6D6) of ISO/IEC 10646-2:2001
191 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-UCS-Compat-Supplement.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
192 IDS-UCS-Compat-Supplement.txt
<dd>CJK Compatibility Ideographs Supplement (U-0002F800 to
U-0002FA1D) of ISO/IEC 10646-2:2001
197 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-01.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
200 <dd>Morohashi: Daikanwa Jiten, Volume 1
202 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-02.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
205 <dd>Morohashi: Daikanwa Jiten, Volume 2
207 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-03.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
210 <dd>Morohashi: Daikanwa Jiten, Volume 3
212 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-04.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
215 <dd>Morohashi: Daikanwa Jiten, Volume 4
217 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-05.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
220 <dd>Morohashi: Daikanwa Jiten, Volume 5
222 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-06.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
225 <dd>Morohashi: Daikanwa Jiten, Volume 6
227 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-07.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
230 <dd>Morohashi: Daikanwa Jiten, Volume 7
232 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-08.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
235 <dd>Morohashi: Daikanwa Jiten, Volume 8
237 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-09.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
240 <dd>Morohashi: Daikanwa Jiten, Volume 9
242 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-10.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
245 <dd>Morohashi: Daikanwa Jiten, Volume 10
247 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-11.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
250 <dd>Morohashi: Daikanwa Jiten, Volume 11
252 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-12.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
255 <dd>Morohashi: Daikanwa Jiten, Volume 12
257 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-dx.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
260 <dd>Morohashi: Daikanwa Jiten, Additions
262 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-Daikanwa-ho.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
265 <dd>Morohashi: Daikanwa Jiten, Appendix
267 href="http://cvs.m17n.org/cgi-bin/viewcvs/*checkout*/ids/IDS-CBETA.txt?rev=HEAD&cvsroot=chise&content-type=text/plain">
270 <dd>Characters encountered by the <a href="http://www.cbeta.org/">Chinese Buddhist Electronic Text
271 Association (CBETA)</a>
276 <li><a href="http://web.sfc.keio.ac.jp/~kamichi/">Koichi KAMICHI</a>
277 (<a href="http://www.fonts.jp/">
278 Forum for development of on-the-fly generation of Kanji Fonts
280 <a href="http://www.fonts.jp/search.html">
281 Analytic tool for Kanji Fonts (in Japanese)
<h3><a name="glyph">Integration and Composition of Character Glyphs
and Styles</a></h3> <p> The character database also collects information about
character glyphs and styles. Combined with the other knowledge about a
character in the database, this makes it possible to build a system
that uses the <a href="#ids">component structure information</a>
to assemble the glyph for a character from its components, according to
the requirements of the context. Such a system avoids mismatches caused
by erroneous association or insufficient contextual information, and it
will make it easy to display and print character forms that have not
been encoded and for which no fonts exist.
299 <a href="http://www.fonts.jp/">
300 Forum for development of on-the-fly generation of Kanji Fonts
<h3><a name="network">Mathematical analysis and visualization of
character knowledge</a></h3>
<li>Yoshi Fujiwara, Yasuhiro Suzuki, Tomohiko
Morioka, "<a
href="http://www2.crl.go.jp/jt/a134/yoshi/pc/kanji/nw.ps">
Network of Words</a>", <a href="http://arob.cc.oita-u.ac.jp/">
Artificial Life and Robotics 2002</a>
313 (<a href="http://www2.crl.go.jp/jt/a134/yoshi/pc/kanji/index.html">
314 Presentation material
316 <li>Model for the relation of Kanji characters that share a component
319 href="http://www2.crl.go.jp/jt/a134/yoshi/pc/kanji/mage1.jpg">
321 src="images/mage1-s.jpg"><br>Image 1</a>
323 <a href="http://www2.crl.go.jp/jt/a134/yoshi/pc/kanji/mage2.jpg">
<img alt="Schematic diagram 2"
src="images/mage2-s.jpg"><br>Image 2</a>
328 <!-- <h2>TOMOYO Project</h2> -->
330 <!-- TOMOYO (Text Operation Models and Outfits for Your Objects) -->
<!-- project is what was formerly called the "UTF-2000 project". -->
<!-- It is a project to develop a character processing architecture -->
<!-- based on a character knowledge base. -->
338 <h2>Mailing List</h2>
Discussion about the CHISE Project takes place on the CHISE-{ja|en} mailing lists.
Anybody who would like to take part in the discussion and
development of the CHISE Project, or who has ideas, questions about the
implementation, or wishes for new features, is welcome to join
the English list, the Japanese list, or both.
To become a member of a CHISE mailing list, send a message to the
351 <dd><a href="mailto:chise-ja-ctl@m17n.org">
352 chise-ja-ctl@m17n.org</a>
355 <dd><a href="mailto:chise-en-ctl@m17n.org">
356 chise-en-ctl@m17n.org</a>
360 <blockquote>subscribe Your Name</blockquote>
in the body of the message. You will then receive a confirmation
message with the line
365 confirm PASSWORD Your Name
366 </blockquote> You will have to reply to this message to become a member.
371 <h2>Papers and Presentations</h2>
373 <li><a href="xemacs/#presentation">
375 <li><a href="#network">About mathematical analysis of Character Information</a>
378 <li><a href="papers/u2k-plan.ja/">
379 "Model and Implementation of a Next Generation Multilingual
<li>WITTERN, Christian, "Non-system characters in XML documents", in:
383 <i>The Frontier of Asian Information Processing</i>
384 [Seminar Series of the National Documentation and
385 Information Centers in Humanities] No. 10, November 2000
<li>MORIOKA Tomohiko, "The UTF-2000 Project", in:
388 href="http://www.kanji.zinbun.kyoto-u.ac.jp/publications/kanji-and-info-2.pdf">
389 Kanji and Information, No.2</a>, March 2001
<li>MORIOKA Tomohiko, "CHISE project &mdash; beyond the UTF-2000",
391 <a href="http://www.m17n.org/m17n2001/">
392 m17n2001: the Fifth International Symposium on Multilingual
393 Information Processing and Open Source Software
<li>MORIOKA Tomohiko, "A Short Introduction to UTF-2000 Project",
396 the First TEI Character Set Issues Working Group (October 2001,
397 University of California, Berkeley, USA).
398 <li>WITTERN, Christian, "What is Digitisation?", in:
400 href="http://www.kanji.zinbun.kyoto-u.ac.jp/publications/kanji-and-info-3.pdf">
401 Kanji and Information, No.3</a>, October 2001
402 <li>MORO Shigeki, "The meaning of 'beyond character codes'", in:
404 href="http://www.kanji.zinbun.kyoto-u.ac.jp/publications/kanji-and-info-3.pdf">
405 Kanji and Information, No.3</a>, October 2001
<li>WITTERN, Christian, "Some thoughts on the digitization of Kanji",
407 <i>Information Technology and the Humanities</i>
408 [Seminar Series of the National Documentation and
409 Information Centers in Humanities] No. 11, November 2001
413 <h2><a name="history">History</a></h2>
418 <b>[<a href="http://www.kanji.zinbun.kyoto-u.ac.jp/">Documentation and Information Center for Chinese Studies</a> at the
419 <a href="http://www.zinbun.kyoto-u.ac.jp/">
420 Institute for Research in the Humanities</a>
421 <a href="http://www.kanji.zinbun.kyoto-u.ac.jp/projects/">
424 <p><img SRC="images/dragon.jpg" height=146 width=198></center>
428 <!-- Keep this comment at the end of the file
432 time-stamp-line-limit:40