[m17n.org] [ Kyoto University, Institute for Research in Humanities, Documentation and Information Center for Chinese Studies ]
CHISE project
Last modified: Fri Sep 27 00:30:59 JST 2002
The CHISE (CHaracter Information Service Environment) project aims to collect information about the characters in the scripts of the world and to organize it into a knowledge base. A new processing environment based on this architecture is currently under development.
It is now possible to load character attributes from an external database on demand ("lazy loading"). On 32-bit Intel processor architectures, the size of the executable file thus shrinks from the 30 MB required by the traditional build to about 15 MB. This build can now be downloaded as XEmacs UTF-2000 0.19 (Koriyama). In addition, there is a UTF-2000 branch of the XEmacs tree at cvs.m17n.org in /cvs/root, which can be accessed by anonymous CVS.
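As a rough illustration of on-demand loading, the following Python sketch reads an attribute from an external database only when it is first requested and caches the result afterwards. It does not reproduce the actual XEmacs UTF-2000 implementation: the class `LazyCharacterAttributes`, the table `char_attribute`, the attribute name `total-strokes`, and the use of SQLite as the external database are all assumptions made for this example.

```python
import sqlite3


class LazyCharacterAttributes:
    """Fetch character attributes from an external database on first access."""

    def __init__(self, connection):
        self._connection = connection
        self._cache = {}     # (code_point, attribute) -> value
        self.db_reads = 0    # number of real database lookups, for illustration

    def get(self, code_point, attribute):
        key = (code_point, attribute)
        if key not in self._cache:
            # Nothing is loaded up front; the database is consulted only now.
            self.db_reads += 1
            row = self._connection.execute(
                "SELECT value FROM char_attribute"
                " WHERE code_point = ? AND attribute = ?",
                key,
            ).fetchone()
            self._cache[key] = row[0] if row else None
        return self._cache[key]


def build_demo_database():
    """Create a tiny in-memory stand-in for the external attribute database."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE char_attribute ("
        " code_point INTEGER, attribute TEXT, value TEXT,"
        " PRIMARY KEY (code_point, attribute))"
    )
    # Hypothetical sample rows; the real database layout is not shown here.
    connection.executemany(
        "INSERT INTO char_attribute VALUES (?, ?, ?)",
        [
            (0x4E00, "total-strokes", "1"),
            (0x5B57, "total-strokes", "6"),
        ],
    )
    return connection
```

Calling `get` twice with the same arguments touches the database once. The executable itself only needs the lookup code, not the attribute tables, which is the kind of saving the size reduction above reflects.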
In 2001, a prototype of a Topic Map engine was developed based on Zope. Zope proved less than ideal for this purpose, so the focus for this year is to port the engine to a relational database backend. Development currently continues with PostgreSQL. The plan is to enable Topic Map editing within XEmacs UTF-2000, while also allowing other clients in addition to it.
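To make the idea of a relational backend for Topic Maps more concrete, here is a minimal sketch of how topics and associations could be stored in tables. It is not the project's schema: the table names, column names, and helper functions are hypothetical, occurrences are left out for brevity, and SQLite is used only as a self-contained stand-in for PostgreSQL.

```python
import sqlite3

# Hypothetical schema: topics, associations, and the roles topics play in them.
SCHEMA = """
CREATE TABLE topic (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE association (
    id   INTEGER PRIMARY KEY,
    kind TEXT NOT NULL
);
CREATE TABLE association_role (
    association_id INTEGER NOT NULL REFERENCES association(id),
    topic_id       INTEGER NOT NULL REFERENCES topic(id),
    role           TEXT NOT NULL
);
"""


def open_store():
    """Open an in-memory database with the sketch schema."""
    connection = sqlite3.connect(":memory:")
    connection.executescript(SCHEMA)
    return connection


def add_topic(connection, name):
    """Insert a topic and return its id."""
    cursor = connection.execute("INSERT INTO topic (name) VALUES (?)", (name,))
    return cursor.lastrowid


def associate(connection, kind, members):
    """Create an association; `members` maps a role name to a topic id."""
    cursor = connection.execute(
        "INSERT INTO association (kind) VALUES (?)", (kind,)
    )
    association_id = cursor.lastrowid
    connection.executemany(
        "INSERT INTO association_role VALUES (?, ?, ?)",
        [(association_id, topic_id, role) for role, topic_id in members.items()],
    )
    return association_id


def related_topics(connection, topic_id):
    """Return the names of all topics sharing an association with `topic_id`."""
    rows = connection.execute(
        "SELECT DISTINCT t.name FROM association_role mine"
        " JOIN association_role other"
        "   ON other.association_id = mine.association_id"
        "  AND other.topic_id != mine.topic_id"
        " JOIN topic t ON t.id = other.topic_id"
        " WHERE mine.topic_id = ? ORDER BY t.name",
        (topic_id,),
    )
    return [name for (name,) in rows]
```

Because the data sits in ordinary tables, any client that can talk to the database can read or edit it, which fits the goal of supporting clients other than XEmacs UTF-2000.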
Using the Ideographic Description Sequences (IDS) of
ISO/IEC 10646-1:2000 and Unicode, we are now developing a database
that expresses the structure of Chinese characters in this syntax.
At the moment, we are using the characters in the Unicode tables as a
reference. The basic
Table of the component structure database
The following tables are currently available via anonymous CVS from cvs.m17n.org at /cvs/chise as module ids:
- IDS-UCS-Basic.txt: CJK Unified Ideographs (U+4E00 〜 U+9FA5) of ISO/IEC 10646-1:2000
- IDS-UCS-Ext-A.txt: CJK Unified Ideographs Extension A (U+3400 〜 U+4DB5, U+FA1F and U+FA23) of ISO/IEC 10646-1:2000
- IDS-UCS-Compat.txt: CJK Compatibility Ideographs (U+F900 〜 U+FA2D, except U+FA1F and U+FA23) of ISO/IEC 10646-1:2000
- IDS-UCS-Ext-B-1.txt: CJK Unified Ideographs Extension B [part 1] (U-00020000 〜 U-00021FFF) of ISO/IEC 10646-2:2001
- IDS-UCS-Ext-B-2.txt: CJK Unified Ideographs Extension B [part 2] (U-00022000 〜 U-00023FFF) of ISO/IEC 10646-2:2001
- IDS-UCS-Ext-B-3.txt: CJK Unified Ideographs Extension B [part 3] (U-00024000 〜 U-00025FFF) of ISO/IEC 10646-2:2001
- IDS-UCS-Ext-B-4.txt: CJK Unified Ideographs Extension B [part 4] (U-00026000 〜 U-00027FFF) of ISO/IEC 10646-2:2001
- IDS-UCS-Ext-B-5.txt: CJK Unified Ideographs Extension B [part 5] (U-00028000 〜 U-00029FFF) of ISO/IEC 10646-2:2001
- IDS-UCS-Ext-B-6.txt: CJK Unified Ideographs Extension B [part 6] (U-0002A000 〜 U-0002A6D6) of ISO/IEC 10646-2:2001
- IDS-UCS-Compat-Supplement.txt: CJK Compatibility Ideographs Supplement (U-0002F800 〜 U-0002FA1D) of ISO/IEC 10646-2:2001
- IDS-Daikanwa-01.txt: Morohashi: Daikanwa Jiten, Volume 1
- IDS-Daikanwa-02.txt: Morohashi: Daikanwa Jiten, Volume 2
- IDS-Daikanwa-03.txt: Morohashi: Daikanwa Jiten, Volume 3
- IDS-Daikanwa-04.txt: Morohashi: Daikanwa Jiten, Volume 4
- IDS-Daikanwa-05.txt: Morohashi: Daikanwa Jiten, Volume 5
- IDS-Daikanwa-06.txt: Morohashi: Daikanwa Jiten, Volume 6
- IDS-Daikanwa-07.txt: Morohashi: Daikanwa Jiten, Volume 7
- IDS-Daikanwa-08.txt: Morohashi: Daikanwa Jiten, Volume 8
- IDS-Daikanwa-09.txt: Morohashi: Daikanwa Jiten, Volume 9
- IDS-Daikanwa-10.txt: Morohashi: Daikanwa Jiten, Volume 10
- IDS-Daikanwa-11.txt: Morohashi: Daikanwa Jiten, Volume 11
- IDS-Daikanwa-12.txt: Morohashi: Daikanwa Jiten, Volume 12
- IDS-Daikanwa-dx.txt: Morohashi: Daikanwa Jiten, Additions
- IDS-Daikanwa-ho.txt: Morohashi: Daikanwa Jiten, Appendix
- IDS-CBETA.txt: Characters encountered by the Chinese Buddhist Electronic Text Association (CBETA)
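The IDS syntax used in these tables is a prefix notation: an Ideographic Description Character (U+2FF0 to U+2FFB) is followed by its operands, each of which is either a component or another description. The Python sketch below parses such a string into a nested structure. It is an illustration only, not the project's own tooling: the function names are hypothetical, every component is assumed to be a single character, and the layout of the table files themselves is not modelled.

```python
# U+2FF2 and U+2FF3 take three operands; the other ten operators take two.
TERNARY_OPERATORS = {"\u2FF2", "\u2FF3"}
BINARY_OPERATORS = {chr(c) for c in range(0x2FF0, 0x2FFC)} - TERNARY_OPERATORS


def parse_ids(text):
    """Parse an IDS string into a leaf character or (operator, child, ...)."""
    node, end = _parse_node(text, 0)
    if end != len(text):
        raise ValueError(f"unexpected trailing characters at index {end}")
    return node


def _parse_node(text, index):
    """Parse one node starting at `index`; return (node, next_index)."""
    if index >= len(text):
        raise ValueError("IDS ended before all operands were supplied")
    char = text[index]
    index += 1
    if char in TERNARY_OPERATORS:
        arity = 3
    elif char in BINARY_OPERATORS:
        arity = 2
    else:
        return char, index  # a leaf component
    children = []
    for _ in range(arity):
        child, index = _parse_node(text, index)
        children.append(child)
    return (char, *children), index


def components(node):
    """Return the leaf components of a parsed IDS, left to right."""
    if isinstance(node, str):
        return [node]
    leaves = []
    for child in node[1:]:
        leaves.extend(components(child))
    return leaves
```

For example, `parse_ids("⿱⿰木目心")` describes 想 as 相 (itself 木 beside 目) above 心, and `components` lists the three leaves in order.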
The character database also collects information about character glyphs and styles. Combined with the other knowledge about a character in the database, this information makes it possible to build a system that uses the component structure information to assemble the font for a character from its components, depending on the contextual requirements. With such a system, mismatches caused by erroneous association or insufficient contextual information are excluded, and it becomes easy to display and print character forms that have not been codified and for which no fonts exist.
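The sketch below gives a very simplified picture of what assembling a character from its components could involve: given a component structure as a nested tuple `(operator, child, ...)`, it assigns each component a rectangle inside a unit square. This is not the project's rendering method. Splitting the space evenly is a naive assumption, only the left-to-right (U+2FF0, U+2FF2) and top-to-bottom (U+2FF1, U+2FF3) operators are handled, and the function name and tuple representation are hypothetical.

```python
HORIZONTAL = {"\u2FF0", "\u2FF2"}  # components placed side by side
VERTICAL = {"\u2FF1", "\u2FF3"}    # components stacked top to bottom


def layout(node, box=(0.0, 0.0, 1.0, 1.0)):
    """Return [(component, (x, y, width, height)), ...] for a component tree.

    Boxes use a unit square with the origin at the top left and y growing
    downward. Space is divided evenly among the children of each operator.
    """
    if isinstance(node, str):
        return [(node, box)]
    operator, *children = node
    x, y, width, height = box
    count = len(children)
    if operator in HORIZONTAL:
        boxes = [
            (x + i * width / count, y, width / count, height)
            for i in range(count)
        ]
    elif operator in VERTICAL:
        boxes = [
            (x, y + i * height / count, width, height / count)
            for i in range(count)
        ]
    else:
        raise NotImplementedError(f"operator {operator!r} is not handled here")
    placed = []
    for child, child_box in zip(children, boxes):
        placed.extend(layout(child, child_box))
    return placed
```

A real system would adjust proportions and stroke weights per component and per style, drawing on the glyph information in the database, rather than dividing the square mechanically.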
Discussion about the CHISE Project takes place on the CHISE-{ja|en} mailing lists.
Anyone who would like to take part in the discussion and development of the CHISE Project, or who has ideas or questions about the implementation or wishes for new features, is welcome to join the English list, the Japanese list, or both.
To become a member of a CHISE mailing list, send a message to the following address:
with the line
    subscribe Your Name
in the body of the message. You will then receive a confirmation message with the line
    confirm PASSWORD Your Name
You will have to reply to this message to become a member.
This project was supported by the Exploratory Software Project (未踏ソフトウェア創造事業), 2001.
[
Documentation and Information Center for Chinese Studies (DICCS),
Institute for Research in the Humanities,
Kyoto University
]
[
m17n.org (the Organization for Multilingualization)
(National Institute of Advanced Industrial Science and Technology)
]
[
Hanazono University
]
[
National Institute of Advanced Industrial Science and Technology
]
[
Dept. of Bioinformatics,
Medical Research Institute,
Tokyo Medical and Dental University
]