[m17n.org] [ Kyoto University, Institute for Research in Humanities, Documentation and Information Center for Chinese Studies ]


About the CHISE Project

The CHISE (CHaracter Information Service Environment) project aims to collect information about the characters in the scripts of the world and to organize it into a knowledge base. A new character processing environment based on this architecture is currently under development.


Development of a character processing architecture based on a character knowledge base

XEmacs UTF-2000

It is now possible to load character attributes from an external database on demand ("lazy loading"). On Intel 32-bit processor architectures, the size of the executable file thus shrinks from the 30 MB required by the traditional build to just about 15 MB. This version is available for download as XEmacs UTF-2000 0.19 (Koriyama). In addition, there is a UTF-2000 branch of the XEmacs tree at cvs.m17n.org in /cvs/root, which can be accessed by anonymous CVS.
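The idea of lazy loading can be illustrated with a minimal Python sketch. It is not the XEmacs UTF-2000 implementation: the class name, the table layout, the attribute names, and the use of an in-memory SQLite database as a stand-in for the external database are all assumptions made for illustration.

```python
import sqlite3


class LazyCharacterAttributes:
    """Look up character attributes on demand instead of preloading them."""

    def __init__(self, connection):
        self._conn = connection
        self._cache = {}     # code point -> dict of attributes already loaded
        self.db_reads = 0    # how often the external database was consulted

    def get(self, char, name, default=None):
        code = ord(char)
        if code not in self._cache:
            # First access to this character: fetch all of its attributes now.
            rows = self._conn.execute(
                "SELECT name, value FROM attributes WHERE code = ?", (code,)
            ).fetchall()
            self._cache[code] = dict(rows)
            self.db_reads += 1
        return self._cache[code].get(name, default)


def build_demo_database():
    """Create a tiny in-memory stand-in for the external attribute database.

    The table layout and attribute names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE attributes (code INTEGER, name TEXT, value TEXT)")
    conn.executemany(
        "INSERT INTO attributes VALUES (?, ?, ?)",
        [
            (ord("字"), "total-strokes", "6"),
            (ord("字"), "ideographic-structure", "⿱宀子"),
            (ord("木"), "total-strokes", "4"),
        ],
    )
    return conn
```

Nothing is read when the object is created; the database is consulted only the first time a given character is asked about, and later lookups for that character are served from the cache.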

A Topic Maps-based approach to a character database

In 2001, a prototype of a Topic Map engine was developed on top of Zope. This proved less than ideal for the purpose, so the focus for this year is to port the engine to a relational database backend. Development currently continues with PostgreSQL. The plan is to enable Topic Map editing within XEmacs UTF-2000 while also supporting other clients in addition to it.
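As a rough illustration of what storing a Topic Map in a relational backend can look like, the following Python sketch maps topics, occurrences, and associations onto tables. It is not the schema of the CHISE engine: every table, column, and function name is hypothetical, and an in-memory SQLite database stands in for PostgreSQL so that the example is self-contained.

```python
import sqlite3

# Hypothetical schema: topics, their occurrences, and typed associations
# in which topics play named roles.
SCHEMA = """
CREATE TABLE topic (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE occurrence (
    topic_id INTEGER NOT NULL REFERENCES topic(id),
    kind TEXT NOT NULL,
    value TEXT NOT NULL
);
CREATE TABLE association (id INTEGER PRIMARY KEY, kind TEXT NOT NULL);
CREATE TABLE association_role (
    association_id INTEGER NOT NULL REFERENCES association(id),
    role TEXT NOT NULL,
    topic_id INTEGER NOT NULL REFERENCES topic(id)
);
"""


def add_topic(conn, name):
    """Insert a topic and return its id."""
    return conn.execute("INSERT INTO topic (name) VALUES (?)", (name,)).lastrowid


def associate(conn, kind, roles):
    """Create an association; roles maps a role name to a topic id."""
    assoc_id = conn.execute(
        "INSERT INTO association (kind) VALUES (?)", (kind,)
    ).lastrowid
    conn.executemany(
        "INSERT INTO association_role VALUES (?, ?, ?)",
        [(assoc_id, role, topic_id) for role, topic_id in roles.items()],
    )
    return assoc_id


def components_of(conn, whole_name):
    """Names of topics playing 'part' where whole_name plays 'whole'."""
    rows = conn.execute(
        """
        SELECT part.name
        FROM topic AS whole
        JOIN association_role AS w
          ON w.topic_id = whole.id AND w.role = 'whole'
        JOIN association AS a
          ON a.id = w.association_id AND a.kind = 'has-component'
        JOIN association_role AS p
          ON p.association_id = a.id AND p.role = 'part'
        JOIN topic AS part ON part.id = p.topic_id
        WHERE whole.name = ?
        """,
        (whole_name,),
    ).fetchall()
    return [name for (name,) in rows]
```

Because the data sits in ordinary tables, any client that can talk to the database can query or edit it, which fits the goal of supporting clients other than the editor.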

Database of features of characters

Database of the component structure of Chinese Characters

Using the Ideographic Description Sequence (IDS) syntax, which is built on the Ideographic Description Characters defined in ISO/IEC 10646-1:2000 and Unicode, we are now developing a database that expresses the structure of Chinese characters. At the moment, we are using the characters in the Unicode tables as a reference. The basic CJK Unified Ideographs, together with Extension A and Extension B, are currently covered, for a total of more than 70,000 characters.
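To show how such a description can be processed, here is a minimal Python sketch that parses an IDS string into a tree. It handles only the twelve description characters U+2FF0 to U+2FFB, of which U+2FF2 and U+2FF3 take three operands and the rest take two. The function names and the nested-tuple representation are assumptions for illustration, not the format of the CHISE tables.

```python
# Ideographic Description Characters U+2FF0..U+2FFB and their operand counts.
# U+2FF2 and U+2FF3 take three operands; the others take two.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3
IDC_ARITY["\u2FF3"] = 3


def parse_ids(ids):
    """Parse an IDS string into a nested tuple (operator, operand, ...).

    A plain component character is returned as a one-character string.
    """

    def parse_at(pos):
        if pos >= len(ids):
            raise ValueError("IDS ends before all operands are supplied")
        ch = ids[pos]
        pos += 1
        if ch not in IDC_ARITY:
            return ch, pos  # a plain component character
        operands = []
        for _ in range(IDC_ARITY[ch]):
            node, pos = parse_at(pos)
            operands.append(node)
        return (ch, *operands), pos

    tree, end = parse_at(0)
    if end != len(ids):
        raise ValueError("trailing characters after a complete IDS")
    return tree


def components(tree):
    """List the leaf components of a parsed IDS, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in components(child)]
```

For example, `parse_ids("⿰言⿱五口")` yields a tree whose left operand is 言 and whose right operand is itself the top-bottom structure of 五 and 口, and `components` flattens that tree back into the three leaf components.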

Table of the component structure database

The following tables are currently available via anonymous CVS from cvs.m17n.org at /cvs/chise as module ids:

CJK Unified Ideographs (U+4E00 〜 U+9FA5) of ISO/IEC 10646-1:2000
CJK Unified Ideographs Extension A (U+3400 〜 U+4DB5, U+FA1F and U+FA23) of ISO/IEC 10646-1:2000
CJK Compatibility Ideographs (U+F900 〜 U+FA2D, except U+FA1F and U+FA23) of ISO/IEC 10646-1:2000
CJK Unified Ideographs Extension B [part 1] (U-00020000 〜 U-00021FFF) of ISO/IEC 10646-2:2001
CJK Unified Ideographs Extension B [part 2] (U-00022000 〜 U-00023FFF) of ISO/IEC 10646-2:2001
CJK Unified Ideographs Extension B [part 3] (U-00024000 〜 U-00025FFF) of ISO/IEC 10646-2:2001
CJK Unified Ideographs Extension B [part 4] (U-00026000 〜 U-00027FFF) of ISO/IEC 10646-2:2001
CJK Unified Ideographs Extension B [part 5] (U-00028000 〜 U-00029FFF) of ISO/IEC 10646-2:2001
CJK Unified Ideographs Extension B [part 6] (U-0002A000 〜 U-0002A6D6) of ISO/IEC 10646-2:2001
CJK Compatibility Ideographs Supplement (U-0002F800 〜 U-0002FA1D) of ISO/IEC 10646-2:2001
Morohashi: Daikanwa Jiten, Volume 1
Morohashi: Daikanwa Jiten, Volume 2
Morohashi: Daikanwa Jiten, Volume 3
Morohashi: Daikanwa Jiten, Volume 4
Morohashi: Daikanwa Jiten, Volume 5
Morohashi: Daikanwa Jiten, Volume 6
Morohashi: Daikanwa Jiten, Volume 7
Morohashi: Daikanwa Jiten, Volume 8
Morohashi: Daikanwa Jiten, Volume 9
Morohashi: Daikanwa Jiten, Volume 10
Morohashi: Daikanwa Jiten, Volume 11
Morohashi: Daikanwa Jiten, Volume 12
Morohashi: Daikanwa Jiten, Additions
Morohashi: Daikanwa Jiten, Appendix
Characters encountered by the Chinese Buddhist Electronic Text Association (CBETA)
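
The code-point ranges listed above can be turned into a small lookup that tells which table covers a given character. The following Python sketch uses only the ranges stated in the list; the function name and the returned labels are illustrative, it does not reflect the actual file names in the ids module, and the Daikanwa and CBETA tables are not covered because they are not defined by code-point ranges.

```python
# (first, last, label) for the tables that are defined by code-point ranges.
RANGES = [
    (0x4E00, 0x9FA5, "CJK Unified Ideographs"),
    (0x3400, 0x4DB5, "CJK Unified Ideographs Extension A"),
    (0xF900, 0xFA2D, "CJK Compatibility Ideographs"),
    (0x2F800, 0x2FA1D, "CJK Compatibility Ideographs Supplement"),
]

# Extension B is split into six parts of 0x2000 code points each;
# the last part ends at U-0002A6D6.
EXTENSION_B_LAST = 0x2A6D6
for part in range(6):
    first = 0x20000 + part * 0x2000
    last = min(first + 0x1FFF, EXTENSION_B_LAST)
    label = "CJK Unified Ideographs Extension B [part %d]" % (part + 1)
    RANGES.append((first, last, label))

# The list assigns these two code points to the Extension A table
# rather than to the compatibility table.
EXTENSION_A_EXTRAS = {0xFA1F, 0xFA23}


def table_for(code_point):
    """Return the label of the table covering code_point, or None."""
    if code_point in EXTENSION_A_EXTRAS:
        return "CJK Unified Ideographs Extension A"
    for first, last, label in RANGES:
        if first <= code_point <= last:
            return label
    return None
```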

Integration and Composition of Character Glyphs and Styles

The character database also collects information about character glyphs and styles. Combined with the other knowledge about a character held in the database, this makes it possible to build a system that uses the component structure information to assemble the glyph for a character from its components, according to the requirements of the context. Such a system avoids mismatches caused by erroneous association or insufficient contextual information, and makes it easy to display and print character forms that have not been codified and for which no fonts exist.
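
How component structure can drive glyph assembly is sketched below in Python. This is a deliberately naive illustration and not the CHISE composition system: it assumes a structure tree given as nested tuples `(operator, child, ...)`, divides the unit square into equal columns or rows for the left-to-right and top-to-bottom operators, and simply lets the components of all other operators share the whole box. A real system would use style-dependent proportions and proper handling of enclosures.

```python
from fractions import Fraction

HORIZONTAL = {"\u2FF0", "\u2FF2"}  # left-to-right operators (two or three parts)
VERTICAL = {"\u2FF1", "\u2FF3"}    # top-to-bottom operators (two or three parts)

UNIT_BOX = (Fraction(0), Fraction(0), Fraction(1), Fraction(1))


def layout(tree, box=UNIT_BOX):
    """Return [(component, (x, y, width, height)), ...] for a structure tree.

    tree is either a component character or a tuple (operator, child, ...).
    Coordinates are fractions of the em square; y grows downward from the top.
    """
    if isinstance(tree, str):
        return [(tree, box)]
    op, *children = tree
    x, y, w, h = box
    n = len(children)
    placed = []
    for i, child in enumerate(children):
        if op in HORIZONTAL:
            # Equal-width columns, left to right.
            child_box = (x + w * i / n, y, w / n, h)
        elif op in VERTICAL:
            # Equal-height rows, top to bottom.
            child_box = (x, y + h * i / n, w, h / n)
        else:
            # Enclosures and overlays: naive fallback, share the whole box.
            child_box = box
        placed.extend(layout(child, child_box))
    return placed
```

Given the tree `("⿰", "言", ("⿱", "五", "口"))`, the sketch places 言 in the left half and stacks 五 above 口 in the right half, so a renderer could scale an existing glyph of each component into its box to approximate a character for which no font exists.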

Mathematical analysis and visualization of character knowledge

Mailing List

Discussion about the CHISE Project takes place on the CHISE-{ja|en} mailing lists.

Anybody who would like to take part in the discussion and development of the CHISE Project, or who has ideas or questions about the implementation or wishes for new features, is welcome to join the English list, the Japanese list, or both.

To become a member of a CHISE mailing list, send a message to the corresponding address:

For Japanese:
For English:
with the word
subscribe Your Name
in the body of the message. You will then receive a confirmation message with the line
confirm PASSWORD Your Name
You will have to reply to this message to become a member.

Papers and Presentations



Last modified: Wed Oct 9 03:33:25 JST 2002