papers/chise-m17n-2001.txt

   1 -*- coding: utf-8-gb-er -*-
   2 \f
   3
   4 知世 project: beyond UTF-2000
   5
   6
   7
   8
   9
  10
  11   守岡 知彦 / MORIOKA Tomohiko
  12         京都大学 漢字情報硏究センター
  13         Document Information Center
  14         for Chinese Studies, Kyōto University
  15 \f
  16
  17 What is 知世? (1)
  18
  19     知 (Knowledge, Information)
  20
  21     世 (world and age)
  22
  23 Not only across the world,
  24  but also across time (ancient → future)
  25
  26 \f
  27
  28 What is 知世? (2)
  29
  30 ・CHISE (CHaracter Information
  31                 Service Environment)
  32         character information server
  33
  34 ・TOMOYO (Text Object Manipulator
  35                 and Outfit for YOurself)
  36 \f
  37
  38 History (1)— Before UTF-2000
  39
  40 ・each character is
  41         represented by its code in a coded character set
  42
  43 \f
  44
  45 History (2) — UTF-2000 (1)
  46
  47 ・each character is
  48         represented by a character object
  49
  50 \f
  51
  52 UTF-2000 (2)
  53
  54 ・All character-related information
  55         is stored in a character database
  56
  57   - the system gets a character's properties
  58         from the database
  59
  60   - users can add characters by definition
  61         → not only by shape
  62         → users can apply their own unification rules
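
The idea on this slide can be illustrated with a minimal Python sketch. It is not the UTF-2000 implementation: the class names, the property names ("ucs", "strokes", "variant_of") and the unify_by helper are all hypothetical, chosen only to show a character defined by its properties and a unification rule supplied by the user.

```python
from dataclasses import dataclass, field


@dataclass
class CharacterObject:
    """A character defined by its properties, not by one code point."""
    name: str
    properties: dict = field(default_factory=dict)


class CharacterDatabase:
    """Toy in-memory character database (hypothetical API)."""

    def __init__(self):
        self._chars = []

    def define(self, name, **properties):
        # A user adds a character simply by stating its properties.
        char = CharacterObject(name, dict(properties))
        self._chars.append(char)
        return char

    def get_property(self, char, key):
        # The system asks the database instead of using built-in tables.
        return char.properties.get(key)

    def find(self, key, value):
        # Characters can be looked up by any property, such as a code
        # point in some coded character set.
        return [c for c in self._chars if c.properties.get(key) == value]


def unify_by(keys):
    """Build a user-defined unification rule: two characters count as the
    same if they agree on every listed property."""
    def same(a, b):
        return all(a.properties.get(k) == b.properties.get(k) for k in keys)
    return same


db = CharacterDatabase()
chi = db.define("chi", ucs=0x77E5, strokes=8)  # 知
# A private variant with no code point at all; it exists only by definition.
chi_variant = db.define("chi-private-variant", strokes=8, variant_of="chi")

loose = unify_by(["strokes"])          # one possible user rule
strict = unify_by(["ucs", "strokes"])  # a stricter rule
```

Under the loose rule the two characters are unified; under the strict rule they stay distinct, because the variant has no "ucs" property.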
  63 \f
  64
  65 XEmacs UTF-2000
  66
  67 ・sample implementation of UTF-2000
  68
  69         based on XEmacs-Mule
  70
  71 \f
  72
  73 Problems of XEmacs UTF-2000
  74
  75 ・Requires too much memory
  76   → external database + lazy loading
  77
  78 ・There is no UTF-2000-based
  79         external representation
  80   → XML? for file
  81      multipart/related
  82         + application/char-info for MIME
  83
  84 → 知世 project
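
The "external database + lazy loading" remedy could look roughly like the Python sketch below. It assumes a dbm file keyed by hexadecimal code point with JSON-encoded properties; that layout, the class name LazyCharacterTable and the sample entries are illustrative assumptions, not the format used by XEmacs UTF-2000.

```python
import dbm
import json
import os
import tempfile


class LazyCharacterTable:
    """Reads a character's properties from an external database on first use."""

    def __init__(self, path):
        self._path = path
        self._cache = {}   # only characters actually used are kept in memory
        self.loads = 0     # number of database reads, for illustration

    def properties(self, code_point):
        if code_point not in self._cache:
            key = format(code_point, "X").encode("ascii")
            with dbm.open(self._path, "r") as db:
                raw = db[key]
            self._cache[code_point] = json.loads(raw.decode("utf-8"))
            self.loads += 1
        return self._cache[code_point]


def build_sample_database(path):
    # Stand-in for the external database; the two entries are sample data.
    with dbm.open(path, "c") as db:
        db[b"77E5"] = json.dumps({"name": "chi", "strokes": 8}).encode("utf-8")
        db[b"4E16"] = json.dumps({"name": "se", "strokes": 5}).encode("utf-8")


path = os.path.join(tempfile.mkdtemp(), "chars")
build_sample_database(path)
table = LazyCharacterTable(path)
```

Nothing is read when the table is created; the first call to table.properties(0x77E5) reads one record, and later calls for the same character are served from the cache.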
  85 \f
  86
  87 Plan of 知世 (CHISE)
  88
  89 (1) private character database
  90         based on a simple dbm-like database
  91
  92 (2) local character database server
  93         (based on PostgreSQL?)
  94
  95 (3) distributed server system
  96         - How to synchronize
  97         - Detect and report conflicts
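
One possible reading of "check conflicts and report" is sketched below in Python. The slide leaves the synchronization method open, so this is an assumption: each database is modelled as a plain dict mapping a character name to its properties, and the function name merge_with_conflicts and the sample data (including the deliberately wrong stroke count on the remote side) are hypothetical.

```python
def merge_with_conflicts(local, remote):
    """Merge two character databases and report conflicting properties.

    Both arguments map a character name to a dict of properties. A property
    defined on only one side is copied; one defined differently on both
    sides keeps the local value and is reported.
    """
    merged = {name: dict(props) for name, props in local.items()}
    conflicts = []
    for name, props in remote.items():
        target = merged.setdefault(name, {})
        for key, value in props.items():
            if key not in target:
                target[key] = value
            elif target[key] != value:
                # (character, property, local value, remote value)
                conflicts.append((name, key, target[key], value))
    return merged, conflicts


local = {"chi": {"ucs": 0x77E5, "strokes": 8}}
remote = {
    "chi": {"strokes": 9, "reading": "chi"},  # strokes disagrees on purpose
    "se": {"ucs": 0x4E16},
}
merged, conflicts = merge_with_conflicts(local, remote)
```

Here the conflict list contains a single entry for the "strokes" property of "chi", while the non-conflicting "reading" property and the new character "se" are merged without comment.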
  98 \f
  99
 100 Plan of 知世 (TOMOYO)
 101
 102 (0) Complete UTF-2000
 103     (a) complete XEmacs UTF-2000
 104                 and send MEGA patch
 105                 to xemacs-patches :-)
 106     (b) implement GNU Emacs 21 UTF-2000
 107
 108 (1) Multiple representations in one system
 109
 110 (2) Character definition editor
 111
 112 (3) Network representation
 113 \f
 114
 115 Related Plans
 116
 117 ・Develop high-quality character data
 118         that does not depend on any character code
 119
 120 ・Integrate glyph, shape and
 121         typesetting information
 122         into the character database system
 123
 124 ・Searchable image-based document database
 125         (especially for classical
 126          Chinese documents,
 127                 such as 拓本 (ink rubbings) and 稀覯本 (rare books))