From: MORIOKA Tomohiko Date: Tue, 11 Jun 2002 17:37:01 +0000 (+0000) Subject: New files. X-Git-Url: http://git.chise.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=1719044f96038f7b46acada68efffc7c7bb279e8;p=www%2Fchise.git New files. --- diff --git a/papers/mitou-2001-report/main/images/mitou-report01.jpg b/papers/mitou-2001-report/main/images/mitou-report01.jpg new file mode 100644 index 0000000..5decdbc Binary files /dev/null and b/papers/mitou-2001-report/main/images/mitou-report01.jpg differ diff --git a/papers/mitou-2001-report/main/images/mitou-report02.gif b/papers/mitou-2001-report/main/images/mitou-report02.gif new file mode 100644 index 0000000..1588d61 Binary files /dev/null and b/papers/mitou-2001-report/main/images/mitou-report02.gif differ diff --git a/papers/mitou-2001-report/main/images/mitou-report03.jpg b/papers/mitou-2001-report/main/images/mitou-report03.jpg new file mode 100644 index 0000000..abd21ce Binary files /dev/null and b/papers/mitou-2001-report/main/images/mitou-report03.jpg differ diff --git a/papers/mitou-2001-report/main/images/mitou-report04.jpg b/papers/mitou-2001-report/main/images/mitou-report04.jpg new file mode 100644 index 0000000..3ed15eb Binary files /dev/null and b/papers/mitou-2001-report/main/images/mitou-report04.jpg differ diff --git a/papers/mitou-2001-report/main/images/mitou-report05.jpg b/papers/mitou-2001-report/main/images/mitou-report05.jpg new file mode 100644 index 0000000..a3517f0 Binary files /dev/null and b/papers/mitou-2001-report/main/images/mitou-report05.jpg differ diff --git a/papers/mitou-2001-report/main/images/mitou-report06.jpg b/papers/mitou-2001-report/main/images/mitou-report06.jpg new file mode 100644 index 0000000..cb92f29 Binary files /dev/null and b/papers/mitou-2001-report/main/images/mitou-report06.jpg differ diff --git a/papers/mitou-2001-report/main/images/mitou-report07.jpg b/papers/mitou-2001-report/main/images/mitou-report07.jpg new file mode 100644 index 0000000..c8a8567 Binary files 
/dev/null and b/papers/mitou-2001-report/main/images/mitou-report07.jpg differ diff --git a/papers/mitou-2001-report/main/images/mitou-report08.jpg b/papers/mitou-2001-report/main/images/mitou-report08.jpg new file mode 100644 index 0000000..c51be0c Binary files /dev/null and b/papers/mitou-2001-report/main/images/mitou-report08.jpg differ diff --git a/papers/mitou-2001-report/main/images/mitou-report09.jpg b/papers/mitou-2001-report/main/images/mitou-report09.jpg new file mode 100644 index 0000000..76f49ed Binary files /dev/null and b/papers/mitou-2001-report/main/images/mitou-report09.jpg differ diff --git a/papers/mitou-2001-report/main/images/mitou-report10.jpg b/papers/mitou-2001-report/main/images/mitou-report10.jpg new file mode 100644 index 0000000..e1cd624 Binary files /dev/null and b/papers/mitou-2001-report/main/images/mitou-report10.jpg differ diff --git a/papers/mitou-2001-report/main/images/mitou-report12.jpg b/papers/mitou-2001-report/main/images/mitou-report12.jpg new file mode 100644 index 0000000..2aac94a Binary files /dev/null and b/papers/mitou-2001-report/main/images/mitou-report12.jpg differ diff --git a/papers/mitou-2001-report/main/img10.png b/papers/mitou-2001-report/main/img10.png new file mode 100644 index 0000000..cf62f18 Binary files /dev/null and b/papers/mitou-2001-report/main/img10.png differ diff --git a/papers/mitou-2001-report/main/img12.png b/papers/mitou-2001-report/main/img12.png new file mode 100644 index 0000000..f28fb3a Binary files /dev/null and b/papers/mitou-2001-report/main/img12.png differ diff --git a/papers/mitou-2001-report/main/img2.png b/papers/mitou-2001-report/main/img2.png new file mode 100644 index 0000000..c7cc6df Binary files /dev/null and b/papers/mitou-2001-report/main/img2.png differ diff --git a/papers/mitou-2001-report/main/img4.png b/papers/mitou-2001-report/main/img4.png new file mode 100644 index 0000000..08f4e3b Binary files /dev/null and b/papers/mitou-2001-report/main/img4.png differ diff --git 
a/papers/mitou-2001-report/main/img5.png b/papers/mitou-2001-report/main/img5.png new file mode 100644 index 0000000..8725730 Binary files /dev/null and b/papers/mitou-2001-report/main/img5.png differ diff --git a/papers/mitou-2001-report/main/img6.png b/papers/mitou-2001-report/main/img6.png new file mode 100644 index 0000000..389e335 Binary files /dev/null and b/papers/mitou-2001-report/main/img6.png differ diff --git a/papers/mitou-2001-report/main/img7.png b/papers/mitou-2001-report/main/img7.png new file mode 100644 index 0000000..4cf5f46 Binary files /dev/null and b/papers/mitou-2001-report/main/img7.png differ diff --git a/papers/mitou-2001-report/main/img8.png b/papers/mitou-2001-report/main/img8.png new file mode 100644 index 0000000..8f70c80 Binary files /dev/null and b/papers/mitou-2001-report/main/img8.png differ diff --git a/papers/mitou-2001-report/main/img9.png b/papers/mitou-2001-report/main/img9.png new file mode 100644 index 0000000..a4c0670 Binary files /dev/null and b/papers/mitou-2001-report/main/img9.png differ diff --git a/papers/mitou-2001-report/main/index.html b/papers/mitou-2001-report/main/index.html new file mode 100644 index 0000000..11dd987 --- /dev/null +++ b/papers/mitou-2001-report/main/index.html @@ -0,0 +1,92 @@ + + + + + +2001ǯÅṲ̀Ƨ¥½¥Õ¥È¥¦¥§¥¢ÁϤ»ö¶È ʸ»ú¥Ç¡¼¥¿¥Ù¡¼¥¹¤Ë´ð¤Å¤¯ ʸ»ú¥ª¥Ö¥¸¥§¥¯¥Èµ»½Ñ¤Î¹½ÃÛ ¡Ê·ÀÌóÈÖ¹æ 13¾ð·ÐÂè1154¹æ¡Ë Êó¹ð½ñ + + + + + + + + + + + + + + + + + +next +up +previous +
+
+ + +

+

FY2001 Mitou (Exploratory) Software Creation Project +

+
+ Building Character Object Technology +
Based on a Character Database +
(Contract No. 13情経第1154号) +
+
+
Report +

+
+
+

+

Developers: MORIOKA Tomohiko, Christian Wittern

+ +

+ +

+ +

+


+ + + + + +

+
+MORIOKA Tomohiko +2002-02-15 +
+ + diff --git a/papers/mitou-2001-report/main/node1.html b/papers/mitou-2001-report/main/node1.html new file mode 100644 index 0000000..c634049 --- /dev/null +++ b/papers/mitou-2001-report/main/node1.html @@ -0,0 +1,109 @@ + + + + + +ÌÜŪ + + + + + + + + + + + + + + + + + + + + +next + +up + +previous +
+
+ + +

+Purpose +

+ +

+As a foundation for next-generation multilingual document processing,
+we research and develop character object technology based on character
+databases. To this end, we develop character database systems for
+personal, site-wide, and Internet-wide use, their contents, and the
+client systems that use them.
+

+Character representation is the basis of many kinds of data
+representation on computers, and the diverse content on the Internet
+also depends on it. At present, all character representation on
+computers is based on coded-character technology, a method that
+represents each character by a corresponding number.
+

+With this method, both parties must share the coded character set
+(character code), that is, the rule that maps characters to numbers.
+Otherwise the text is garbled (mojibake). Moreover, a coded character
+set contains only a finite number of characters, and characters absent
+from it cannot be represented. For this reason, people have either
+given up interchangeability and used private-use characters (gaiji),
+or given up the conveniences of coded characters, such as searching,
+and used images. There are also efforts to avoid these problems by
+adding large numbers of characters to character code standards, but
+these too face technical and political problems.
+

+To solve these problems fundamentally, to make not only everyday
+documents but also historical documents and documents yet to be
+written representable, and to guarantee the permanence of each
+document, we believe it is necessary to establish a next-generation
+multilingual document processing technology that does not depend on
+coded character sets.
+

+This requires that every element of document representation be
+describable declaratively in a machine-readable way. For elements
+larger than characters, markup methods based on SGML/XML and the like
+are already in use. We are therefore trying the same approach for
+characters.
+

+To establish and put into practice a next-generation document
+processing technology based on machine-readable descriptions of
+characters, efficient implementations of a general-purpose character
+database system and client systems are indispensable. Character
+database contents that can be used on top of them are also required.
+

+If the technology is to spread over the Internet, at least its core
+must consist of freely usable programs and data. We therefore make use
+of free software and free data, and run the project in such a way that
+anyone who wishes can freely use the results and take part in the
+development.
+

+


+
+ + diff --git a/papers/mitou-2001-report/main/node2.html b/papers/mitou-2001-report/main/node2.html new file mode 100644 index 0000000..a53db65 --- /dev/null +++ b/papers/mitou-2001-report/main/node2.html @@ -0,0 +1,280 @@ + + + + + +ʸ»ú¥Ç¡¼¥¿¥Ù¡¼¥¹¤Ë´ð¤Å¤¯Ê¸»úɽ¸½¥â¥Ç¥ë + + + + + + + + + + + + + + + + + + + + +next + +up + +previous +
+
+ + +Subsections + + + +
+ +

+Character Representation Model Based on a Character Database +

+ +

+ +

+Features and Problems of Conventional Character Representation +

+ +

+ +

+The Coded Character Model +

+ +

+At present, most character processing systems do not handle characters
+themselves; instead they assign a unique number to each character and
+represent the character by that number. The rule that maps characters
+to numbers is called a "coded character set" (CCS) or a "character
+code", and the number assigned to a character is called its "code
+point". In this report, the technique of fixing a finite set of
+characters, assigning each a unique number, and representing
+characters by those numbers is called the "coded character model".
+

+ +

+
+ + + +
Figure 2.1: +Conceptual diagram of the coded character model
+
+ +\scalebox{0.5}{\includegraphics*[0mm,15mm][300mm,155mm]{char-code.eps}} +
+
+

+ +

+As Figure 2.1 shows, in the coded character model a character is
+represented by the number assigned to it. Knowledge about characters
+resides in the definition of the coded character set and normally does
+not exist inside the computer. As a result, the computer can represent
+characters with little memory, but the kinds of characters it can
+handle, and its very notion of a character, are bound to the
+definition of the coded character set.
+

+When communicating on the basis of the coded character model, the
+sender and the receiver must share the character code, that is, the
+rule that maps characters to numbers. Otherwise the text is garbled.
+If some character cannot be represented in a given character code, the
+two parties must agree to extend the character code used in the
+communication procedure. However, extending a standard character code
+used in a large-scale open system such as the Internet, in which an
+unspecified number of parties take part and many kinds of equipment
+exist, is extremely difficult.
+

+As practical measures, therefore, people have either given up
+interchangeability and used private-use characters (gaiji), or given
+up the conveniences of coded characters, such as searching, and used
+images. There are also continuing efforts to solve the problem by
+establishing ISO/IEC 10646 (UCS)/Unicode as an international de
+jure/de facto standard code, sharing it worldwide, and extending it,
+but this too faces technical and political problems and is not easy.
+[*] +

+ +

+Code Extension: the Mule Approach +

+ +

+As described above, in the coded character model the relation between
+"the various characters that exist in the world" and "characters that
+have been coded, that is, assigned numbers called code points" is
+defined by a coded character set. This view presupposes that the
+various properties of characters belong not to individual characters
+but to the coded character set. Furthermore, if each coded character
+set can be assumed to be a collection of characters with similar
+properties, namely the characters used by a particular script or
+language, then identifying the coded character set also identifies
+the script or language.
+

+If this assumption holds, each coded character set can serve as an
+approximation of a script or language, and a large number of
+characters can be handled with little memory. Until coded character
+sets such as UCS/Unicode, which cover many scripts for many languages,
+came into use, most coded character sets were designed for a single
+language or script, so the assumption was largely satisfied.
+

+Mule [3] is a system that makes coded character sets addable on the
+basis of this assumption.[*] +

+However, many coded character sets contain not only ordinary
+characters but also symbols such as punctuation marks, and there were
+many coded character sets that contain several scripts, such as JIS X
+0208, or that are used for several languages, such as ISO 8859-1, so
+this was never more than an approximation. Conversely, the same
+character is treated as a different character when the coded
+character set differs. For example, the "á" of ISO 8859-1 and the "á"
+of ISO 8859-2, or the 「あ」 of JIS X 0208:1978 and the 「あ」 of JIS X
+0208:1983, are regarded as distinct characters. This causes problems
+for searching and for defining the properties of characters. It is
+especially inconvenient in environments such as Western and Eastern
+Europe, which have traditionally used different coded character sets
+and where exchange has recently been increasing.
+

+The conventional Mule-style character representation, which presumes
+that character repertoires hardly overlap, was effective as a kind of
+approximate solution in the days when memory was expensive. Now that
+memory is becoming comparatively plentiful, its drawbacks outweigh its
+advantages. To realize more advanced character processing and more
+flexible extensibility, and to make better use of the various coded
+character sets now in use, the authors aim to realize a new mechanism
+of character representation that replaces the conventional Mule
+approach.
+

+ +

+ +
+Toward Character Representation Independent of Coded Character Sets +

+ +

+How, then, can a character be represented without the coded character
+model, that is, referred to without identifying it by a unique number?
+One way is probably to enumerate the properties of the character we
+want to refer to. For example, 「あ」 could be referred to by

+
Script
+
hiragana + +
+
Phonetic value
+
/a/ + +
+
+[*] For kanji, the pronunciation alone is not enough. If we try to
+refer to 「字」 by
+
Script
+
kanji + +
+
Sound
+
/じ/ + +
+
+then besides 「字」 we also get 「時」, 「次」, and every other kanji with
+the sound /じ/. Adding the total stroke count,
+
Script
+
kanji + +
+
Sound
+
/じ/ + +
+
Total strokes
+
6 + +
+
+excludes 「時」, 「事」 and the like, and narrows the result to the set of
+all kanji with six strokes and the sound /じ/, such as 「字」, 「次」, and
+「耳」. Specifying the radical 「子」 narrows it further, and so does
+specifying the structure of the character ("『宀』 above 『子』") or its
+meaning.
+

+By representing a character as a set of attributes in this way, a
+character (or a set of characters) can be represented without relying
+on coded characters. Moreover, by specifying more or fewer attributes,
+the unification criteria for the represented character can be made
+finer or coarser.
+
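The narrowing described above can be sketched in a few lines of Python. This is only an illustration of the idea, not code from XEmacs UTF-2000: the attribute names (`script`, `sound`, `total_strokes`, `radical`) and the tiny table are assumptions made for this example.

```python
# Each character is modelled as a plain dict of attributes.
# Attribute names and the table contents are illustrative assumptions.
CHARACTERS = {
    "字": {"script": "kanji", "sound": "ji", "total_strokes": 6, "radical": "子"},
    "次": {"script": "kanji", "sound": "ji", "total_strokes": 6, "radical": "欠"},
    "耳": {"script": "kanji", "sound": "ji", "total_strokes": 6, "radical": "耳"},
    "時": {"script": "kanji", "sound": "ji", "total_strokes": 10, "radical": "日"},
    "事": {"script": "kanji", "sound": "ji", "total_strokes": 8, "radical": "亅"},
    "あ": {"script": "hiragana", "sound": "a"},
}


def matching(spec):
    """Return the set of characters whose attributes include every pair in spec."""
    return {
        char
        for char, attrs in CHARACTERS.items()
        if all(attrs.get(name) == value for name, value in spec.items())
    }


# Adding attributes makes the specification finer; removing them makes it coarser.
coarse = matching({"script": "kanji", "sound": "ji"})
finer = matching({"script": "kanji", "sound": "ji", "total_strokes": 6})
finest = matching({"script": "kanji", "sound": "ji", "total_strokes": 6, "radical": "子"})
print(sorted(coarse), sorted(finer), sorted(finest))
```

A specification therefore denotes a set of characters, and a single character is simply a specification that happens to match exactly one entry.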

+In such an attribute-based character representation, information
+about the properties of a character is contained in the representation
+itself, as character attributes. Consequently, the information needed
+for a particular kind of processing must be included in the
+representation as attributes. For example, displaying a character
+requires information about its glyph. Conversely, a character
+representation that presupposes no display needs no glyph attribute.
+

+In this report, the approach of representing each character by the
+set of attributes it has, storing these attribute sets in a database,
+and processing characters by consulting that database is called the
+"UTF-2000 approach". Because the UTF-2000 approach does not program
+knowledge about characters directly but puts it into a database and
+consults it, the kinds of characters a computer can handle, and its
+notion of a character, are not bound to the definition of a coded
+character set. The kinds and notions of characters can easily be
+changed by swapping the character database in use. In exchange for
+this high flexibility, handling many characters is expected to require
+a large amount of memory. A mechanism is therefore needed for handling
+the character database efficiently without losing flexibility.
+

+The UTF-2000 approach itself does not conflict with existing
+coded-character technology. The kind of coded character set and the
+code point can be regarded as one character attribute, and by
+including the coded character sets one wants to use among the
+character attributes, information can be exchanged with the world of
+those coded character sets.
+

+ +

+


+
+ + diff --git a/papers/mitou-2001-report/main/node3.html b/papers/mitou-2001-report/main/node3.html new file mode 100644 index 0000000..99a6965 --- /dev/null +++ b/papers/mitou-2001-report/main/node3.html @@ -0,0 +1,532 @@ + + + + + +ʸ»ú¥Ç¡¼¥¿¥Ù¡¼¥¹¤Ë´ð¤Å¤¯Ê¸½ñÊÔ½¸·Ï + + + + + + + + + + + + + + + + + + + + +next + +up + +previous +
+
+ + +Subsections + + + +
+ +

+A Text Editor Based on a Character Database +

+ +

+We developed XEmacs UTF-2000 in order to demonstrate the "UTF-2000
+approach" described in Section 2.2. This chapter outlines XEmacs
+UTF-2000, mainly from the viewpoint of a text editor based on a
+character database.
+

+ +

+XEmacs UTF-2000 +

+ +

+XEmacs-UTF-2000 (Figure 3.1) is based on an interactive integrated
+environment called XEmacs [6]. XEmacs reorganizes and extends GNU
+Emacs [4], an interactive integrated environment centered on an
+extensible editor, so that it can also handle pictures and the
+like[*]. Mule (MULtilingual enhancement of GNU Emacsen) [3], on the
+other hand, is a multilingual extension of GNU Emacs developed mainly
+by Ken'ichi Handa of the Electrotechnical Laboratory, and is now
+integrated into GNU Emacs. XEmacs-Mule, which integrates the Mule
+features into XEmacs, is also under development and is distributed as
+part of XEmacs. XEmacs UTF-2000 is a heavily modified version of this
+XEmacs-Mule.
+

+ +

+
+ + + +
Figure 3.1: +XEmacs UTF-2000
+\scalebox{0.5}{\includegraphics{xe-u2k-cd.eps}}
+

+ +

+GNU Emacs and XEmacs can be extended in the Emacs Lisp language, and
+many applications written in Emacs Lisp have in fact been created and
+are in use. For example, one can read and write e-mail and netnews and
+view WWW pages inside GNU Emacs/XEmacs. Originally, however, these
+could be used only with 1-byte coded character sets such as Latin-1
+(ISO-8859-1).
+

+Mule, by contrast, greatly extends the usable coded character sets
+while keeping the convenience and extensibility of GNU Emacs/XEmacs,
+and provides multilingual features, including the display of various
+characters and input methods for various languages. In GNU
+Emacs/XEmacs with the Mule features, the functions of GNU Emacs/XEmacs
+become available for various characters and languages.[*] +

+In Mule, a character is represented by a pair of a charset-id, which
+identifies the coded character set, and a code point. Up to 128
+charset-ids can be used at the same time. The usable coded character
+sets, however, are limited to the 94-character sets, 96-character
+sets, 94×94-character sets, and 96×96-character sets of ISO 2022 [1].
+Coded character sets that do not fit the structure of ISO 2022 graphic
+character sets, such as Big5, must be converted to fit that structure.
+In addition, the character representation space is 19 bits wide, and
+the available space is already running out. Furthermore, because a
+character is represented by a pair of charset-id and code point,
+characters that are essentially the same are treated as different
+characters when their coded character sets differ.
+

+XEmacs UTF-2000 inherits the advantages of GNU Emacs, XEmacs, and
+Mule, and solves these problems through the extensions and
+modifications described below.
+

+ +

+Character Objects and Character Representation +

+ +

+XEmacs UTF-2000 processes characters by consulting a character
+database, based on the model of representing a character as a set of
+character attributes. To make the character database easy to use, the
+internal representation has been changed. Because the character
+representation space has been enlarged to 30 bits, the number of
+characters usable at the same time has increased greatly. Internally,
+a character is represented by the id of a character object, called
+the character id, and processing is carried out by looking up the
+character object through this id.
+

+A character object is defined as a set of character attributes and is
+assigned a unique "character id". To define characters, XEmacs
+UTF-2000 provides a builtin function called define-char.
+

+

+ + Function define-char (attributes) + +
+Defines the character object represented by the character attributes attributes and + returns that character object. +
+

+

The argument attributes is an association list. + +
+
+

+
[Example]
+
+(define-char
+  '((name               . "CJK RADICAL SECOND TWO")
+    (general-category   symbol other) ; Informative Category
+    (bidi-category      . "ON")
+    (mirrored           . nil)
+    (total-strokes       . 1)
+    (<-radical
+     ((ucs                . #x4E5A)
+      ))
+    (ideograph-cdp      . -21)
+    (chinese-big5-cdp   . #x8C5D)
+    (ucs                . #x2E83)
+    ))
+
+ +
+
+ +

+In addition, the following functions are provided for handling characters and character attributes: + +

+

+ + Function get-char-attribute + (character attribute) + +
+Returns the value of the attribute attribute of the character object character. + + +
+
+

+
[Example]
+
+(get-char-attribute ?あ 'name)
+→ "HIRAGANA LETTER A"
+
+
+
+ + +
+ +

+

+ + Function put-char-attribute + (character attribute value) + +
+Sets the value of the attribute attribute of the character object character + to value. + +
+
+
[Example]
+
+(get-char-attribute ?あ 'foo)
+→nil
+(put-char-attribute ?あ 'foo 1)
+→1
+(get-char-attribute ?あ 'foo)
+→1
+
+
+
+ +

+

+ + Function find-char + (attributes) + +
+To look up a character object by its attributes, the builtin function find-char is convenient. It + retrieves the character that has the specified attributes. + +
+
+ +

+

+ + Function map-char-attribute + (function attribute &optional range) + +
+A map function for character attributes is also available. It + is useful for finding characters that have a given character + attribute, or for processing characters attribute by attribute. +
+

+

This function maps function over the entries in attribute, + +calling it with two arguments: the key and the value of each entry in the table. +
+

+

Range + specifies a subrange to map over and + +is in the same format as the + range argument to + 'put-range-table'. If omitted or t, it defaults to + the entire table. See Fig 3.2 + +for an example using this function. + +
+
+ +

+

+ + Function char-attribute-alist + (character) + +
+All attributes of a character can be obtained as an association list + with the builtin function char-attribute-alist. +
+

+

This function returns the alist of attributes of character. + +
+
+ +

+

+ + Function char-attribute-list () + +
+The list of character attributes can be obtained with the builtin function + char-attribute-list. +
+

+

This function returns the list of all existing character + attributes. + +
+
+ +
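To make the behaviour of these functions concrete, here is a minimal Python model of a character database. It is a sketch under stated assumptions, not the XEmacs UTF-2000 implementation: the class name `CharacterDatabase`, the sequential id allocation, and the exact-match semantics of `find_char` are all assumptions. The method names mirror the builtin functions listed above, and `None` plays the role of `nil`.

```python
class CharacterDatabase:
    """Toy model: character objects are attribute sets keyed by a character id."""

    def __init__(self):
        self._attrs = {}    # character id -> {attribute name: value}
        self._next_id = 0   # assumption: ids are simply handed out in order

    def define_char(self, attributes):
        """Define a character object from (name, value) pairs and return its id."""
        char_id = self._next_id
        self._next_id += 1
        self._attrs[char_id] = dict(attributes)
        return char_id

    def get_char_attribute(self, char_id, attribute):
        return self._attrs[char_id].get(attribute)

    def put_char_attribute(self, char_id, attribute, value):
        self._attrs[char_id][attribute] = value
        return value

    def find_char(self, attributes):
        """Return the first character that has all the given attribute values."""
        wanted = dict(attributes)
        for char_id, attrs in self._attrs.items():
            if all(attrs.get(k) == v for k, v in wanted.items()):
                return char_id
        return None

    def map_char_attribute(self, function, attribute):
        """Call function(char_id, value) for every character having attribute."""
        for char_id, attrs in self._attrs.items():
            if attribute in attrs:
                function(char_id, attrs[attribute])

    def char_attribute_alist(self, char_id):
        return list(self._attrs[char_id].items())

    def char_attribute_list(self):
        names = []
        for attrs in self._attrs.values():
            for name in attrs:
                if name not in names:
                    names.append(name)
        return names


db = CharacterDatabase()
hiragana_a = db.define_char([("name", "HIRAGANA LETTER A"), ("ucs", 0x3042)])
db.put_char_attribute(hiragana_a, "foo", 1)
print(db.char_attribute_alist(hiragana_a))
```

Keeping the attributes in a per-character mapping makes `char-attribute-alist` trivial, while `map-char-attribute` needs a scan. A real implementation would more plausibly keep one table per attribute, which is what the "entries in attribute" wording above suggests.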

+Character attributes are described in detail in Chapter 4.
+

+ +

+Internal Representation of Character Object +

+ +

+XEmacs UTF-2000 processes characters based on the UTF-2000 model;
+that is, it operates on character objects and their attributes stored
+in its character database. For this purpose, XEmacs UTF-2000 modifies
+and extends the internal character and string representations.
+

+In Mule, the internal representation depends on the structure of the
+graphic character sets[*] of ISO/IEC 2022 [1]. In this paper, the kind
+of CCS used for the internal representation of Mule is called a
+Mule-charset. Each character is represented by a pair of a
+Mule-charset and a code point in it. The internal character
+representation is divided into 7-bit segments. It is designed for bit
+calculus and is therefore very sparse. The internal string
+representation is a kind of multi-byte encoding. ASCII characters are
+represented by their ASCII code points. Other characters are
+represented by sequences of 2 to 4 bytes. In this case, the first
+byte, called the leading-byte, specifies a CCS, and the following
+bytes indicate a code point[*]. Every valid multi-byte sequence can be
+mapped to a corresponding character representation, but some possible
+character representations cannot be mapped to any multi-byte sequence.
+The code space is limited by two parameters: the charset-id (basically
+the same as the leading-byte) and the character representation. The
+number of charset-ids has to be smaller than 128. Each character is
+represented by a 19-bit integer: 14 bits can be used for the code
+point and 5 bits for the charset-id of a 2-byte set. However, the
+full 14 bits cannot be used for a code point. Because the character
+representation has to satisfy the structure of ISO/IEC 2022 graphic
+character sets, only 33 to 126 or 32 to 127 can be used for each
+7-bit segment (octet).
+
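The sparseness mentioned above can be checked with a little arithmetic. The Python sketch below packs a 5-bit charset-id and two 7-bit segments into a 19-bit integer and counts how many of the 2^14 code-point slots a 94×94 or 96×96 set can actually occupy. The field order (charset-id in the high bits) and the function names are assumptions made for illustration. The text above gives only the bit widths, not the layout.

```python
SEGMENT_BITS = 7      # each code-point segment is 7 bits wide
CHARSET_ID_BITS = 5   # charset-id of a 2-byte set, as stated in the text


def pack(charset_id, seg1, seg2):
    """Pack charset-id and two segments into one 19-bit integer (assumed layout)."""
    if not 0 <= charset_id < (1 << CHARSET_ID_BITS):
        raise ValueError("charset-id must fit in 5 bits")
    for seg in (seg1, seg2):
        if not 32 <= seg <= 127:
            raise ValueError("segment must lie in 32..127")
    return (charset_id << (2 * SEGMENT_BITS)) | (seg1 << SEGMENT_BITS) | seg2


def unpack(value):
    """Inverse of pack: return (charset_id, seg1, seg2)."""
    mask = (1 << SEGMENT_BITS) - 1
    return value >> (2 * SEGMENT_BITS), (value >> SEGMENT_BITS) & mask, value & mask


slots = 1 << (2 * SEGMENT_BITS)   # 14-bit code-point field
usable_94x94 = 94 * 94            # segments restricted to 33..126
usable_96x96 = 96 * 96            # segments restricted to 32..127
print(slots, usable_94x94, usable_96x96)
```

Only a little over half of the 14-bit field is reachable even for a 96×96 set, which is the sense in which the segmented representation is sparse.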

+The character and string representations of Mule are too limited to
+implement a large character database. It is better to guarantee a
+one-to-one mapping between the character representation and the
+string representation, and neither should depend on any coded
+character set. A simple, non-segmented, linear space is probably
+better than a complex, segmented, sparse one. In addition, a 19-bit
+code space is too narrow even for UCS [2] (Unicode [5]): supporting
+UCS requires a code space of at least 21 bits. Therefore, we decided
+to change the internal representation.
+

+In XEmacs UTF-2000, each character object has a character-id, which
+is represented by a 30-bit integer. In strings and buffers, each
+character object is represented by a multi-byte sequence, namely its
+character-id encoded in UTF-8 [2]. This space is wide enough to
+represent various kinds of characters at the same time: it can hold
+every Unicode character, and in addition users can define many other
+characters.
+
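As an illustration of how a 30-bit character-id can be carried in a UTF-8 byte sequence, the following Python sketch implements the original UTF-8 scheme of ISO/IEC 10646, which allows sequences of up to six bytes. Python's own codec stops at U+10FFFF, so the encoding is written out by hand. The function names are hypothetical, and it is an assumption that XEmacs UTF-2000 uses exactly this byte layout for ids beyond the Unicode range.

```python
def encode_char_id(char_id):
    """Encode a 30-bit character-id with the original (up to 6-byte) UTF-8 scheme."""
    if not 0 <= char_id < (1 << 30):
        raise ValueError("character-id must fit in 30 bits")
    if char_id < 0x80:
        return bytes([char_id])
    # (exclusive upper bound, sequence length, lead-byte marker)
    for limit, length, marker in (
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    ):
        if char_id < limit:
            break
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (char_id & 0x3F))  # low 6 bits go into a trail byte
        char_id >>= 6
    return bytes([marker | char_id] + tail[::-1])


def decode_char_id(data):
    """Decode one sequence produced by encode_char_id back to the integer."""
    first = data[0]
    if first < 0x80:
        return first
    length = 0
    while first & (0x80 >> length):   # count the leading 1 bits of the lead byte
        length += 1
    value = first & (0x7F >> length)
    for byte in data[1:length]:
        value = (value << 6) | (byte & 0x3F)
    return value


print(encode_char_id(0x3042).hex(), encode_char_id((1 << 30) - 1).hex())
```

For ids inside the Unicode range the result coincides with ordinary UTF-8, so existing UTF-8 text remains valid, and only the ids beyond that range need the 5- and 6-byte forms.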

+ +

+
+ + + +
Figure 3.2: +Example of map-char-attribute
+

+ +

+ +

+Builtin Characters +

+ +

+In XEmacs UTF-2000, a character is defined by specifying a set of
+character attributes with the function define-char. But if no
+character existed when the first character is defined, the program
+that defines characters could not itself be expressed as a string.
+Because this would be inconvenient, some characters are defined in
+advance, before any character is defined with define-char. These are
+called builtin characters.
+

+A builtin character is treated as a kind of character object, except
+that it has no character attributes at all. Although it has no
+character attributes, each builtin character is interpreted as if it
+had a code point of some coded character set as a character
+attribute. With this mechanism, XEmacs UTF-2000 can read coded
+character strings during bootstrapping without any character
+definitions.
+

+As with characters defined by character definitions, character
+attributes can be added to a builtin character. In that case the
+builtin character is redefined as an ordinary character that has the
+added attributes.
+

+Incidentally, if builtin characters were needed only for expressing
+the definition programs, the characters of ISO 8859-1 would suffice.
+However, so that at least the glyphs can be checked without a large
+number of character definitions, the characters of UCS, of the ISO
+2022 graphic character sets, of Mojikyo, and so on are also builtin
+characters.
+

+ +

+ +
+coded-charset +

+ +

+In XEmacs UTF-2000, characters are not represented by coded character
+sets, so a Mule-charset in the sense used in Mule is not needed.
+However, for exchanging information with the world of existing
+character codes, for using existing fonts, and for keeping
+compatibility with existing implementations so that existing
+applications can be used, in other words for portability, something
+equivalent to a Mule-charset is convenient. XEmacs UTF-2000 therefore
+provides the coded-charset as an abstraction of a coded character
+set.
+

+The coded-charset API of XEmacs UTF-2000 is designed to be upward
+compatible with the Mule-charset API of XEmacs-Mule. As with the
+Mule-charset of XEmacs-Mule, there is a charset type, and a charset
+object has a name represented by a symbol. In Mule, a Mule-charset
+depends on the internal representation of characters, and because of
+its structure it was limited to four kinds: 94-character sets,
+96-character sets, 94×94-character sets, and 96×96-character sets.
+XEmacs UTF-2000 has no such restriction, so this is extended to
+$\{94 \mid 96 \mid 128 \mid 256\}^{1 \sim 4}$-character sets, and any
+coded character set can be used as long as its code points can be
+represented in 4 bytes or fewer. For example, the Basic Multilingual
+Plane (BMP) of UCS [2] can be represented as a 256×256-character set,
+and the whole of UCS-4 as a $256^4$-character set.
+
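The sizes involved can be worked out directly. The short Python sketch below enumerates the sixteen shapes covered by the notation above, reading it as one base per set raised to a length of 1 to 4 bytes, and compares them with the four shapes Mule allows. The function and variable names are assumptions made for this example.

```python
BASES = (94, 96, 128, 256)   # characters per byte position
LENGTHS = (1, 2, 3, 4)       # bytes per code point


def charset_size(base, length):
    """Number of code points in a base^length character set."""
    if base not in BASES or length not in LENGTHS:
        raise ValueError("unsupported shape")
    return base ** length


MULE_SHAPES = [(94, 1), (96, 1), (94, 2), (96, 2)]
UTF2000_SHAPES = [(base, length) for base in BASES for length in LENGTHS]

for base, length in UTF2000_SHAPES:
    marker = "Mule" if (base, length) in MULE_SHAPES else ""
    print(f"{base}^{length} = {charset_size(base, length):>10} {marker}")
```

In this reading the BMP corresponds to the shape (256, 2) and UCS-4 to (256, 4), matching the two examples given in the text.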

+Because a coded-charset in XEmacs UTF-2000 has no direct connection
+with the internal representation of characters, the relation between
+characters and code points must be expressed by a table that maps
+each character in the coded-charset to its code point there. Two
+kinds of table are provided: the decoding-table, for obtaining a
+character from a code point, and the encoding-table, for obtaining a
+code point from a character. The latter is realized as a kind of
+character attribute: a character attribute whose name is the name of
+a coded-charset is regarded as an element of the encoding-table. When
+a character attribute with the name of a coded-charset is specified
+with the function define-char or put-char-attribute, its value is
+taken as the code point in that coded-charset, and the decoding-table
+of that coded-charset is set at the same time. With this mechanism,
+various coded character sets can be represented without specifying
+conversion tables explicitly or referring to the internal
+representation.
+
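The coupling between the two tables can be sketched as follows in Python. This is a simplified model, not the actual implementation: the class `CharacterTable`, the methods `define_charset`, `encode_char`, and `decode_char`, and the charset name `japanese-jisx0208` are assumptions introduced for the example. Only the rule that setting a charset-named attribute also fills that charset's decoding-table comes from the text.

```python
class CharacterTable:
    """Toy model of character attributes plus per-charset decoding tables."""

    def __init__(self):
        self.attributes = {}       # character id -> {attribute name: value}
        self.decoding_tables = {}  # charset name -> {code point: character id}

    def define_charset(self, name):
        self.decoding_tables[name] = {}

    def put_char_attribute(self, char_id, attribute, value):
        self.attributes.setdefault(char_id, {})[attribute] = value
        # An attribute named after a coded-charset is a code point in that
        # charset, so the charset's decoding table is updated as well.
        if attribute in self.decoding_tables:
            self.decoding_tables[attribute][value] = char_id
        return value

    def encode_char(self, char_id, charset):
        """Character -> code point: the encoding-table is just the attribute."""
        return self.attributes.get(char_id, {}).get(charset)

    def decode_char(self, charset, code_point):
        """Code point -> character, via the charset's decoding-table."""
        return self.decoding_tables[charset].get(code_point)


table = CharacterTable()
table.define_charset("ucs")
table.define_charset("japanese-jisx0208")  # assumed charset name

HIRAGANA_A = 1  # arbitrary character id chosen for the example
table.put_char_attribute(HIRAGANA_A, "ucs", 0x3042)
table.put_char_attribute(HIRAGANA_A, "japanese-jisx0208", 0x2422)
table.put_char_attribute(HIRAGANA_A, "name", "HIRAGANA LETTER A")
print(table.decode_char("japanese-jisx0208", 0x2422) == table.decode_char("ucs", 0x3042))
```

Because both code points resolve to the same character id, the same character reached through two coded character sets is no longer treated as two different characters.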

+ +

+


+ + diff --git a/papers/mitou-2001-report/main/node4.html b/papers/mitou-2001-report/main/node4.html new file mode 100644 index 0000000..8e67eea --- /dev/null +++ b/papers/mitou-2001-report/main/node4.html @@ -0,0 +1,901 @@ + + + + + +ʸ»ú°À­¥Ç¡¼¥¿¥Ù¡¼¥¹ + + + + + + + + + + + + + + + + + + + + +next + +up + +previous +
+
+ + +Subsections + + + +
+ +

+ +
+Character Attribute Database +

+ +

+As described so far, an implementation of the "UTF-2000 approach"
+(hereafter a "UTF-2000 implementation") has to consult the attributes
+needed for processing a character whenever it processes that
+character. A database that stores the attributes of each character in
+machine-readable form is therefore required.
+

+For this reason we are developing a character attribute database
+expressed in the define-char format. Its purpose is to demonstrate
+the UTF-2000 technology, and we also have in view the construction of
+a standard database that can serve as the basis for future character
+information interchange based on the UTF-2000 technology.
+

+The current database integrates sources such as + +

+and corrects the inconsistencies among them. It still contains many
+errors and its quality is not high, but at present it holds
+definitions for about 70,000 characters.
+

+In this standard character database, non-kanji characters largely
+follow the definitions of Unicode [5], whereas for kanji even minute
+glyph differences are distinguished. Among kanji characters (glyphs),
+those whose glyph is not the same as in the Dai Kan-Wa Jiten carry,
+as the value of the attribute morohashi-daikanwa, the Daikanwa
+number, the degree of difference, and a serial number. For those
+whose glyph is not the same as the representative glyph in The
+Unicode Standard [5], the corresponding Unicode code point is given
+as the value of the attribute =>ucs [*].
+

+For a UTF-2000 implementation, the character attribute database
+defines the behavior of the implementation, so the character
+attributes needed for processing must be tied to the behavior of the
+implementation. That is, at least for the character attributes used
+in processing, names, types, and meanings must be defined. Specifying
+the format and meaning of character attributes is also important when
+people use the database as a dictionary and when they maintain it.
+

+From this viewpoint, we have specified, for XEmacs UTF-2000, naming
+conventions for some character attributes and the format and meaning
+of some character attributes, and we are developing a character
+attribute database based on them. This chapter outlines the
+conventions for character attributes and the character attribute
+database.
+

+ +

+ +
+Naming Conventions for Character Attribute Names +

+ +

+At present, the UTF-2000 model and XEmacs UTF-2000 do not prescribe
+the meaning of character attributes, except for the "code point
+attributes" described in Section 4.2 and the "->decomposition
+attribute" described in Section 4.4. Nevertheless, some conventions
+are desirable for building and maintaining the character attribute
+database and for writing application programs that use it. We are
+therefore working out naming conventions for character attributes
+empirically, trying to associate patterns of attribute names with
+their rough format and meaning.
+

+ +

+Attributes for Relations Between Characters +

+ +

+When there are characters $\gamma_i$ that stand in the relation foo
+to a character $C$, the attribute ->foo of $C$ means that each
+element $\gamma_i$ of its value is a foo of $C$. Here the value of
+->foo, $\gamma_1 ... \gamma_n$, is a list.
+

+Similarly, the attribute <-foo of $C$ means that $C$ is a foo of each
+element $\gamma_j$ of its value. As with ->foo, the value of <-foo,
+$\gamma_1 ... \gamma_m$, is also a list.
+

+

+
Example
+
Let lowercase be the relation that denotes the lowercase letter. Then +
+
(a)
+
The attribute (->lowercase ?a) of the character A expresses that the character + a is the lowercase of the character A. + +
+
(b)
+
The attribute (<-lowercase ?A) of the character a expresses that the character + a is the lowercase of the character A. + +
+
+
+
+ +
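The `->foo` / `<-foo` convention can be modelled in Python as a pair of mutually inverse list-valued attributes. The sketch below is an illustration only: the helper names are hypothetical, and the text does not say whether XEmacs UTF-2000 maintains the inverse attribute automatically, so `add_relation` simply shows one way to keep both directions in step.

```python
def inverse_name(attribute):
    """Map '->foo' to '<-foo' and vice versa."""
    if attribute.startswith("->"):
        return "<-" + attribute[2:]
    if attribute.startswith("<-"):
        return "->" + attribute[2:]
    raise ValueError("not a relation attribute: " + attribute)


def add_relation(db, char, relation, related):
    """Record that `related` is a `relation` of `char`, in both directions."""
    db.setdefault(char, {}).setdefault("->" + relation, []).append(related)
    db.setdefault(related, {}).setdefault("<-" + relation, []).append(char)


def is_consistent(db):
    """Check that every relation attribute has its inverse on the other side."""
    for char, attrs in db.items():
        for attribute, values in attrs.items():
            if attribute[:2] not in ("->", "<-"):
                continue
            for other in values:
                if char not in db.get(other, {}).get(inverse_name(attribute), []):
                    return False
    return True


db = {}
add_relation(db, "A", "lowercase", "a")
print(db["A"]["->lowercase"], db["a"]["<-lowercase"], is_consistent(db))
```

This reproduces the example above: `A` carries `(->lowercase ?a)` and `a` carries `(<-lowercase ?A)`, both stating that `a` is the lowercase of `A`.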

+ +

+ +
+Character Specification Format +

+ +

+Inside a UTF-2000 implementation, a character can be handled as a
+kind of object without being coded, but some translation method is
+needed between the implementation and the outside world.
+

+If a coded character set is available and can adequately represent
+the character in question, the character can be represented by its
+code point in that coded character set (for this purpose XEmacs
+UTF-2000 provides the coded-charset feature described in Section
+4.2). However, if the character is not contained in any available
+coded character set, or if a corresponding character is contained but
+the difference between the abstract character defined there and the
+character to be represented is not acceptable, a coded character set
+cannot be used.
+

+To solve this problem, what is needed is a format that, following the
+UTF-2000 approach, enumerates the properties of a character object.
+For this, XEmacs UTF-2000 specifies the character specification
+(character-specification; char-spec) format.
+

+A character specification is a Lisp association list in which each
+element represents one character attribute. The key part of the
+association list (the car of each element) is the attribute name, and
+the value part (the cdr of each element) is the attribute value.
+

+The meaning of a character specification is the abstract character
+(a set of concrete, that is, written, characters) that has those
+properties.
+

+This format is the same as the one given as the argument of the
+function define-char.
+

+ +

+Character-reference format

+ +

+When relations between characters are described in a character database, it is sometimes desirable to attach attributes not only to the characters listed in the value but also to the relation itself. For example, normalization rules for characters differ from application to application, so a variant-character database built for normalization has to record which normalization rule is being used. Likewise, in a scholarly database, when theories in the study of characters differ, the source and the theory being followed have to be recorded. In addition, the management of intellectual property rights requires that the source of the data and rights information be recorded.

+For these purposes XEmacs UTF-2000 defines the character-reference (char-ref) format.

+A character reference is a Lisp property list (property-list). Arbitrary properties can be used in it, but the meaning of some property names is defined in advance.

+The properties whose meaning is defined are described below:

+

+
:char
+
The character being referenced.

+[Type] A character or a character specification.

+

+
:source
+
Source, authority, etc.

+[Type] A list of symbols denoting sources or authorities (source symbols).

+The source symbols used in the basic character database of XEmacs UTF-2000 are listed below. For Chinese classics and modern Chinese literature we decided to adopt the internationally used Pinyin transcription of Chinese for source symbols, but for historical reasons some symbols in Japanese romanization also exist. In Table 4.1.3, the source symbol names to be used from now on are given under `Name', and the source symbols in Japanese romanization that are currently in use for historical reasons are given under `Alternative name'.

+

+
+
+
+

+
+

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Table 4.1: The :source attribute in character references
Name                 Alternative name         Content
-                    chuuka-daijiten          中華大字典
-                    doubun-tsuukou           同文通考
-                    gyokuhen                 玉篇
-                    henkai                   篇海
-                    inkai                    韻会
-                    inkaiho                  韻会補
-                    jii                      字彙
-                    jiiho                    字彙補
jiyun                shuuin                   集韻
-                    kaihen                   海篇
kangxi               -                        康熙字典
-                    kouin                    広韻
morohashi-daikanwa   -                        大漢和辞典
-                    ruishuu-meigishou        類聚名義抄
-                    seiin                    正韻
-                    seiji-tsuu               正字通
-                    setumon-tuukun-teisei    説文通訓定声
shouwen              -                        説文解字
-                    sougen-irai-zokujifu     宋元以来俗字譜
yuquan               -                        玉泉
+
+
+ +

+

+
+ +

+ +

+ +
+Code-point attributes

+ +

+As described in Section 4.2, in XEmacs UTF-2000 a character attribute whose name is the name of a coded-charset is a special character attribute that gives the code point in that coded-charset. The information in this attribute is used in the conversion of character codes, for example in file input and output.

+The value of a code-point attribute is an integer. The range of values the integer can take is constrained by the corresponding coded-charset.

+At present, whether a character attribute name is a code-point attribute name depends on whether a coded-charset with that name exists. In other words, there is currently no naming convention for code-point attributes, and it cannot be decided mechanically from the attribute name alone whether a character attribute is a code-point attribute. In the current XEmacs UTF-2000 this causes no problem, because the coded-charsets are defined separately, but for a character attribute database some convention might be preferable.

+ +

+The =>ucs attribute

+ +

+The attribute named =>ucs is used to indicate a UCS code point of +a character. If a user would not like to unify characters that are +unified in UCS, or would like to define a character that is not +included in UCS, this attribute is available to specify the nearest +UCS code point. + +

+If a user needs to refer to a UCS code point, the user can use

+(or (get-char-attribute CHAR 'ucs) + (get-char-attribute CHAR '=>ucs)) + +
+instead of (get-char-attribute CHAR 'ucs). + +

+The information from =>ucs attributes is stored in the internal variants database, and users can find the variant characters corresponding to a UCS code point with the following function:

+

+Function: char-variants (character)
+This function returns variants of character. +
+

+

There may be other kinds of variant relations, so we are planning to generalize this feature.
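The lookup order of the Lisp expression above, and the kind of reverse index a variants database needs, can be sketched as follows. This is a Python illustration under stated assumptions: the dictionary-based database, the character keys `char-1` and `char-2`, and both function names are hypothetical and do not reproduce the internal data structures of XEmacs UTF-2000.

```python
def ucs_code_point(char, database):
    """Mimic (or (get-char-attribute CHAR 'ucs)
               (get-char-attribute CHAR '=>ucs))."""
    attrs = database.get(char, {})
    ucs = attrs.get("ucs")
    return ucs if ucs is not None else attrs.get("=>ucs")


def build_variants_index(database):
    """Map each UCS code point to the characters that name it in =>ucs."""
    index = {}
    for char, attrs in database.items():
        if "=>ucs" in attrs:
            index.setdefault(attrs["=>ucs"], []).append(char)
    return index


# Hypothetical entries: char-1 is the UCS character itself,
# char-2 is a separately defined character whose nearest code point is the same.
database = {
    "char-1": {"ucs": 0x9AD8},
    "char-2": {"=>ucs": 0x9AD8},
}
variants = build_variants_index(database)
```

With such an index, finding the variants of a code point is a single dictionary lookup, which is presumably the reason for keeping the relation in a separate database.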
+
+ +

+ +

+ +
+The ->decomposition attribute

+ +

+The attribute named ->decomposition specifies the combining sequence of a composite (precomposed) character. The value of a ->decomposition attribute is a list of characters or character-specifications (Section 4.1.2); a character defined with a ->decomposition attribute can therefore be interpreted as the sequence of characters given by the value of the attribute. For example, if á has the attribute (->decomposition ?a ?´), the sequence (?a ?´) can be composed into á.

+This information can be used by the coding-system feature, which is the code-conversion feature of the Mule API [*].

+In addition, there is a built-in function that finds the precomposed character for a combining sequence given as a list.

+

+Function: get-composite-char (list)
+This function returns a character composed from elements of the + list. +
+

+

Each element is a character, an integer or a character-specification. If an element is an integer, it is interpreted as the code point of a UCS character.
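The behaviour described for get-composite-char can be approximated by a reverse table built from the ->decomposition attributes. The sketch below is in Python and is an assumption-laden illustration, not the built-in function: the table layout and names are hypothetical, and character-specification elements are left out for brevity.

```python
def build_composition_table(database):
    """Map each decomposition sequence to its precomposed character."""
    return {tuple(attrs["->decomposition"]): char
            for char, attrs in database.items()
            if "->decomposition" in attrs}


def get_composite_char(sequence, table):
    """Return the precomposed character for `sequence`, or None.
    Integers are read as UCS code points, as in the description above."""
    normalized = tuple(chr(e) if isinstance(e, int) else e for e in sequence)
    return table.get(normalized)


# The example from the text: a-acute decomposes into a and the acute accent.
database = {"\u00e1": {"->decomposition": ["a", "\u00b4"]}}
table = build_composition_table(database)
composed = get_composite_char(["a", 0xB4], table)  # 0xB4 is the acute accent
```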
+
+ +

+ +

+Information on the component composition of Kanji

+ +

+Many Kanji are made up of combinations of components such as left-hand and right-hand radicals. Most existing encoding methods, however, treat the assembled Kanji as the unit, and information about the component structure of Kanji has mostly not been a subject of encoding. As a result, freely usable data has not been accumulated in sufficient quantity either.

+In order to make use of information on the component composition of Kanji, this project defined the ideographic-structure attribute described in Section 4.5.5 and made it possible to handle this attribute in XEmacs UTF-2000.

+We also surveyed earlier attempts to record the component composition of Kanji and, for those that are freely usable, wrote programs to convert them into the ideographic-structure format.

+ +

+What are glyph expressions +

+ +

+Kanji characters are very visibly composed not of atomic units, but of a relatively small number of components. The tradition of defining a character by its components is as old as the script itself. Encoding of Kanji in computers, however, has so far failed to take advantage of this structural feature and has treated every Kanji as an atomic unit. To make Kanji encoding more efficient, it has been suggested to encode these parts and compose the characters on the fly. This will make the rendering process more complex and potentially less appealing. The feasibility of this will depend on the applications available for rendering and will certainly require more research. Even if not used as the primary encoding in text, however, such a character component database will still serve an important purpose for classifying, analysing and retrieving characters.

+Since the 1970s, research concerning such an analytic encoding has been conducted in Taiwan, China, Japan and elsewhere. One of the most important and thoroughly researched proposals has been that of Hsieh Ching-chun (謝清俊) of Academia Sinica, Taiwan. Building on previous results, he started in 1990 to build a database of the structure of Kanji characters. Since this work was carried out at his `Chinese Document Processing' lab, it came to be known as the CDP database. Christian Wittern has been involved with this project since 1994. Currently, the database contains glyph expressions of more than 55500 characters, including all characters contained in the 漢語大辞典. The database has been developed on the Traditional Chinese version of Windows using Access as the database engine. The user interface only runs on versions of Chinese Windows from Windows 95 up to Windows ME. Professor Hsieh graciously gave permission to port the content of the CDP database to the UTF-2000 project and release it under the GPL.

+ +

+The CDP database +

+ +

+The expressions in the CDP database are based on Big5, the local encoding for Kanji characters mostly used in Taiwan. To express parts of characters that are not characters themselves, more than 2000 code points from the private use area (PUA) of Big5 have been used. Furthermore, the CDP database uses a set of only three operators for connecting the characters, although in practice this has been expanded to 11 by the introduction of shortcut operators for handling multiple occurrences of the same component in one character. Figure 4.5.2 shows a list of these operators. There are three more operator-like characters, which are used when embedding glyph expressions in running text.

+ +

+
+ + + +
Figure 4.1: +The connecting operators used in the CDP
+
+ +\scalebox{0.5}{\includegraphics{mitou-report01.eps}} +
+

+ +

+ +

+The CBETA database +

+ +

+Another database of Chinese characters and glyph expressions, if somewhat smaller than the CDP with around 13000 characters at the moment, is the database developed by the Chinese Buddhist Electronic Text Association (CBETA). This is a by-product of CBETA's groundbreaking work of creating an electronic version of the Chinese Buddhist scriptures. So far, more than 80 million characters of text have been input, carefully proofread and marked up in XML according to the Guidelines of the Text Encoding Initiative. The base character set used for this is again Big5. Characters that could not be found in Big5 have been collected and expressed with glyph expressions. The CBETA database again uses a simple system of three basic connecting operators, expressed with ASCII punctuation as follows:

+

+
+
+
+

+
+

+ + + + + + + + + + + + + + +
Table 4.2: The operators used in the CBETA character database
operator   meaning
/          top/bottom connection
*          left/right connection
@          enclosure connection
+
+
+ +

+The CBETA character database avoids relying on characters from the PUA. Instead, character components are expressed by using the arithmetic operators - and + for deletion and replacement of components. In this manner, a glyph expression for the character 寵 can be constructed as [宋-木+龍]; here the part 木 is replaced with 龍. Using this simple arithmetic, a surprisingly large number of characters can be expressed without much effort. Some expressions do, however, get more complicated, for example [((((狭-口)-小)-日+(工/十))*支)/皿].
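To make the structure of such expressions concrete, here is a minimal Python sketch of a parser for the CBETA notation, applied to an expression of the form [宋-木+龍] (宋 with 木 removed and 龍 put in its place). The source does not state operator precedence, so the sketch assumes that all five operators are left-associative with equal precedence and that grouping is otherwise given by parentheses; the function name and the tuple representation are likewise assumptions.

```python
OPERATORS = set("/*@-+")


def parse_cbeta(expression):
    """Parse a CBETA glyph expression into nested (operator, left, right) tuples.
    Assumption: equal precedence, left-associative, parentheses for grouping."""
    text = expression.strip()
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def parse_operand():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            node = parse_sequence()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return node
        if token in OPERATORS or token == ")":
            raise ValueError(f"unexpected token {token!r}")
        return token

    def parse_sequence():
        nonlocal pos
        node = parse_operand()
        while pos < len(tokens) and tokens[pos] in OPERATORS:
            operator = tokens[pos]
            pos += 1
            node = (operator, node, parse_operand())
        return node

    tree = parse_sequence()
    if pos != len(tokens):
        raise ValueError("trailing tokens in expression")
    return tree


tree = parse_cbeta("[宋-木+龍]")
print(tree)  # ('+', ('-', '宋', '木'), '龍')
```

The resulting tree keeps deletion, replacement and connection steps explicit, which is the information a conversion to another notation would need.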

+Since Christian Wittern has been involved with the CBETA project for +some time, it has been possible to gain permission to include the +CBETA character database into the UTF-2000 character database. This +is especially interesting, since the CBETA data are derived directly +from text input and sources for the characters are easily determined, +quite contrary to dictionaries and standard documents, where it is not +easy to find real world examples for some of the characters. + +

+ +

+ +
+The ideographic-structure attribute

+ +

+The ideographic-structure attribute was defined so that information about the component structure of Kanji can be used in XEmacs UTF-2000.

+The type of the ideographic-structure attribute is a list of characters, character specifications or character references.

+A character given as an element of the value of an ideographic-structure attribute (or a character specification, or the character or character reference given by the :char property of a character reference) can itself have an ideographic-structure attribute. In this way ideographic-structure attributes can be nested.

+ +

+Extending the UTF-2000 character database +

+ +

+In version 3.0, the Unicode Standard introduced a set of so-called `IDEOGRAPHIC DESCRIPTION CHARACTERs' (IDC) to allow the construction of Kanji glyph sequences. This set of operators followed a proposal from China, based on research done there, and comprises 12 operators. For the purpose of using glyph expressions in the UTF-2000 character database, we decided to use the operator set from Unicode/ISO 10646. This set is shown in Figure 4.5.4.

+ +

+
+ + + +
Figure 4.2: +The IDC from Unicode/ISO 10646
+
+ +\scalebox{0.5}{\includegraphics{mitou-report02.eps}} +
+
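A sequence built with the twelve IDCs (U+2FF0 to U+2FFB) is written in prefix order, so it can be turned into a nested structure with a short recursive parser. The Python sketch below assumes only the operator arities defined by Unicode: U+2FF2 and U+2FF3 take three operands, the other ten take two. The function name and the tuple representation are illustrative choices, not part of UTF-2000.

```python
# Arity of the twelve Ideographic Description Characters of Unicode 3.0.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2ff2"] = 3  # left / middle / right
IDC_ARITY["\u2ff3"] = 3  # above / middle / below


def parse_ids(sequence):
    """Parse a prefix-notation description sequence into nested tuples
    of the form (operator, operand, operand[, operand])."""
    def parse(pos):
        if pos >= len(sequence):
            raise ValueError("incomplete description sequence")
        ch = sequence[pos]
        pos += 1
        if ch not in IDC_ARITY:
            return ch, pos  # a plain component character
        children = []
        for _ in range(IDC_ARITY[ch]):
            child, pos = parse(pos)
            children.append(child)
        return (ch, *children), pos

    tree, end = parse(0)
    if end != len(sequence):
        raise ValueError("trailing characters after description sequence")
    return tree


# U+2FF0 is the left-to-right operator, U+2FF1 the above-to-below operator.
simple = parse_ids("\u2ff0日月")
nested = parse_ids("\u2ff1艹\u2ff0日月")
```

The nested result mirrors the nesting that the ideographic-structure attribute allows: an operand may itself be a described component.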

+ +

+The adaptation of the CDP and CBETA databases and their subsequent inclusion in the UTF-2000 character database thus involved the following steps:

+

  1. Converting the underlying character code from Big5 to Unicode
  2. Mapping entries for characters outside of the reference encoding Big5 to Unicode
  3. Mapping the characters from the PUA to Unicode where possible
  4. Where the previous step did not produce a mapping, applying IDCs recursively where possible
  5. Modifying the glyph expressions to adjust for the different scope of the operators
  6. Adding new glyph expressions for characters not in the CDP

+Apart from this, some related supporting tasks were also necessary. Since it is difficult to input unknown and rare Kanji characters, a new input method had to be devised. For this purpose, a table of input keys of the Four Corner system, originally created by Christian Wittern for the Kanji characters in CNS 11643:1992 as part of the `KanjiBase', has been ported and adapted so that it can be used within UTF-2000. Additionally, input keys for Kanji radicals in different shapes and other characters from Unicode that were not yet covered (essentially, characters with fewer than 7 strokes) have been added. This newly expanded input table now contains more than 50000 input keys and will be part of the UTF-2000 character database.

+Quite a different problem that requires further attention is the way the glyph expressions are composed. The CDP database uses an `intuitive' approach and splits characters where the most logical cut-off line is. This is, however, not always the etymologically correct way of splitting. In the UTF-2000 character database, we prefer etymological splitting, and new expressions are added in this way. The task of systematically identifying and changing the intuitive splittings has not yet been done.

+The whole process, which is not yet fully completed, involves the tedious and time-consuming task of meticulously checking the accuracy of every single entry for more than 70000 characters. At the time of this writing, a first pass of porting and checking has been done for more than 40000 characters. This is an important and fundamental addition to the UTF-2000 character database.

+ +

+
+ + + +
Figure 4.3: +A table of glyph expressions in XEmacs UTF-2000
+
+ +\scalebox{0.5}{\includegraphics{mitou-report03.eps}} +
+

+ +

+ +

+Additional benefits +

+ +

+As mentioned above, a table of input keys for the Four Corner method has been ported to UTF-2000. Since the Four Corner numbers are systematically assigned to the four corners of a character, it is possible to generate new Four Corner values based on existing characters if the composition of the characters is known. Since this information is exactly the content of the glyph expressions, new Four Corner input keys can be generated automatically, thus covering all 70000 Unicode characters. This also provides an additional method for proofreading both the glyph expression data and the Four Corner input codes.

+ +

+


+MORIOKA Tomohiko +2002-02-15 +
+ + diff --git a/papers/mitou-2001-report/main/node5.html b/papers/mitou-2001-report/main/node5.html new file mode 100644 index 0000000..a5cbfc5 --- /dev/null +++ b/papers/mitou-2001-report/main/node5.html @@ -0,0 +1,684 @@ + + + + + + Mitou project: Model and implementation of a Character + Object Database + + + +

Contents

+ + +
+

5. Topic Maps +

+ + +
+

5.1. The topic map paradigm +

+ + +

The Topic Map Standard ‘provides a standardized notation for interchangeably representing information about the structure of information resources used to define topics, and the relationships between topics.’ A set of one or more interrelated documents that employs the notation defined by this International Standard is called a `topic map'. In general, the structural information conveyed by topic maps includes:

  1. groupings of addressable information objects around `topics' (`occurrences'), and
  2. relationships between topics (`associations').

+ +

A topic map defines a + multidimensional topic space — a space in which the + locations are topics, and in which the distances between + topics are measurable in terms of the number of + intervening topics which must be visited in order to get + from one topic to another, and the kinds of relationships + that define the path from one topic to another, if any, + through the intervening topics, if any. (from the document + defining Topic Maps as ISO/IEC 13250:2000) +
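The notion of distance in the quoted definition, the number of intervening topics that must be visited to get from one topic to another, can be sketched as a breadth-first search over associations. The following Python example is only an illustration: the representation of associations as tuples of topic names and the sample topics are assumptions, and the kinds of relationships along the path, which the standard also mentions, are ignored.

```python
from collections import deque


def topic_distance(associations, start, goal):
    """Number of intervening topics on a shortest path from `start`
    to `goal`, or None if the topics are not connected."""
    # Build an undirected neighbour table from the association members.
    neighbours = {}
    for members in associations:
        for topic in members:
            neighbours.setdefault(topic, set()).update(
                m for m in members if m != topic)

    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        topic, hops = queue.popleft()
        if topic == goal:
            # hops counts edges; the topics in between are one fewer.
            return max(hops - 1, 0)
        for nxt in sorted(neighbours.get(topic, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


# Hypothetical associations between three topics.
associations = [("kanji", "han-script"), ("han-script", "chinese")]
distance = topic_distance(associations, "kanji", "chinese")
```

Here `distance` is 1, because exactly one topic (`han-script`) lies between the two end points.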

+ +

The standard as published in 2000 includes a serialization format specification in the form of a Document Type Definition (DTD), originally using HyTime (ISO/IEC 10744:1992) Architectural Forms and an SGML (ISO/IEC 8879:1986) based syntax. An independent group of vendors started development of an XML-based version, which was published as XTM 1.0 in December 2000 and adopted by ballot as an amendment to the ISO standard in December 2001. The development here is based on the XML syntax, which also has quite different elements and structure.

+ +

Since SGML/XML based formats are overly verbose (especially XTM 1.0) and awkward to work with, other formats have been suggested, including the `Asymptotic Topic Map Notation, Authoring' (AsTMA, by Robert Barta) and the `Linear Topic Map Notation' (LTM, by Lars Marius Garshol). Both are essentially line based and can be easily edited in UTF-2000 and other editors.

+ +

Besides defining a serialization format for the exchange of information, the Topic Map standard also includes constructs that are intended to facilitate the exchange of information. One of the most important tasks is to reliably identify identical pieces of information across different sources. Towards this end, rules for subsetting and merging of topic maps are laid down in the standard. Topics can be defined with reference to Published Subject Indicators (PSI), which function in a similar way to XML Namespaces.

+ +
+ +
+

5.2. The character database as a topic map +

+ + + +

Characters are used in scripts for the writing of languages, and languages are distributed over different areas. The exact form of these characters, as well as their phonetic representation, changes over time and area. The adaptation of the Topic Map paradigm in a character database tries to use these different axes to organise characters in a way that is appropriate to the domain in which they are encountered. Characters are thus not only objects in their own right; these objects are also organized in `super-class / sub-class' and `class / instance' hierarchies.

+ +

+ +

The topic map currently contains information along the + following axes: + +

  • abstract character
  • character instances
  • variant shapes
  • character structure
  • language
  • readings
  • meanings
  • time
  • space
  • frequency of usage
  • mappings to coded character sets
  • references to dictionaries

+ +

While most of these are organized as occurrences of the + +

+ +

It might be appropriate to illustrate this with an + example. The character attributes for the character + U+03432 when viewed within the UTF-2000 framework + might have attributes similar to those shown in Figure 4. +

+ +


+ Figure 4 + +   The character U+03432 displayed with + the `what-char-definition' function. + +

+

+ +

Transformed to the topic map notation, the attributes of the same character will look similar to Figure 5. The content has not changed, only the notation: within the <occurrence> element, the attributes are similar to key / value pairs. What is not visible here, however, is the underlying structure that has been used to define the topic map.

+ +


+ Figure 5 + +    The attributes of character U+03432 + in Topic Map notation. + +

It should also be noted that the attributes under `ideographic-structure' are not listed as occurrences. These attributes are expressed using separate topics for the character components and the <association> element to connect them, as shown in Figure 6.

+ +


+ Figure 6 + +    The ideographic-structure of character U+03432 + in Topic Map notation. + +

+ +

+ +
+ +
+

5.3. A Topic Map engine with Zope +

+ + +
+

5.3.1. Why Zope? +

+ + +

Zope (Z Object Publishing Environment) is an object-oriented Web application server developed by Zope Corporation (formerly Digital Creations) using a community-based open-source development model. It is written in Python, with only a few critical parts in C. Although it is mostly considered an environment for rapid development of dynamic Web content, it is originally and foremost an environment for publishing objects. The underlying storage is an object-oriented database, which makes it well suited for storing hierarchical data structures like a Topic Map.

+ +

Since Zope acts as a Web-Server, it can also be seen as a + networked database. It can be accessed through the HTTP + protocol, but also through WebDAV and XML-RPC. One of the + advantages of using a Zope based implementation is thus that + it can also be used as a distributed editing environment and + at the same time act as a backend to be accessed from XEmacs + UTF-2000. +

+ +
+ +
+

5.3.2. Requirements for a Topic Map engine +

+ + +

Some of the concepts of Topic Maps are quite new and not yet fully fleshed out in the Topic Map community. For example, the Topic Maps Query Language (TMQL) is still at the requirements stage, and no consensus has been reached on what it will mean to query a topic map. Some of the more arcane features will therefore not be covered by this prototype. Instead of more demanding Topic Map queries, which might involve inferences and other Topic Map calculus, searches will directly access the data in the Topic Map. Merging directives, which are problematic among other things because of the `Topic Map Basename Constraint' (TMBC), are not initially supported.

+ +

The prototype should be able to : + +

  • Import and export data from XEmacs UTF-2000
  • Use a network-based communication protocol to communicate
  • Provide access to the Topic Map (read/write topics, occurrences and associations)
  • Be designed for generic Topic Maps, not for specific data types
  • Allow an assessment of the feasibility of this approach.

+ +
+ +
+

5.3.3. Implementation details +

+ + +

Zope is extended in functionality by developing add-on modules, called `Products' in Zope-speak. Products can be developed within the Zope database based on ZClasses or as file-system based Python classes. In a first implementation, ZClasses have been used.

+ +

In this implementation, four classes have been used to + represent the different objects of a Topic Map: +

  • topicmap: The container item for all the other classes
  • topic: Container item for occurrences
  • occurrence: Holds the key / value pairs of occurrences
  • association: Information about the type, role and value of the members is held in instance attributes

This data structure was closely modelled on the underlying data structure of the Topic Map serialization format, as realized in the XTM 1.0 DTD. The built-in Zope search engine ZCatalog was used to build indices and access the different information axes. Figure 7 shows a screenshot of the Zope development interface with the classes being developed.
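The four classes listed above can be pictured with plain Python dataclasses. This is a sketch of the data model only, not the ZClass or Zope Product code of the project: the field names, the `Member` helper class, the methods and all sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Occurrence:
    key: str      # key of the key / value pair
    value: str


@dataclass
class Topic:
    name: str
    occurrences: list = field(default_factory=list)


@dataclass
class Member:
    role: str     # role the topic plays in the association
    topic: str    # name of the participating topic


@dataclass
class Association:
    type: str
    members: list = field(default_factory=list)


@dataclass
class TopicMap:
    topics: dict = field(default_factory=dict)
    associations: list = field(default_factory=list)

    def add_topic(self, name):
        """Return the topic called `name`, creating it if necessary."""
        return self.topics.setdefault(name, Topic(name))

    def associate(self, type_, **roles):
        """Create an association; `roles` maps role names to topic names."""
        for topic_name in roles.values():
            self.add_topic(topic_name)
        assoc = Association(type_, [Member(r, t) for r, t in roles.items()])
        self.associations.append(assoc)
        return assoc


# Placeholder data, not taken from the actual character database.
tm = TopicMap()
tm.add_topic("U+03432").occurrences.append(
    Occurrence("example-key", "example-value"))
tm.associate("ideographic-structure", whole="U+03432", part="component-1")
```

Keeping associations outside the topics, as here, reflects the way the ideographic-structure information is expressed with the <association> element rather than as occurrences.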

+ +


+ Figure 7 + +   The Zope Management screen with the ZClasses + under development + +

+ +

+ +

This approach turned out to induce a large overhead for the data and proved problematic for Topic Maps with more than approximately 1000 topics and associations. For this reason, it was abandoned.

+ +

The next logical step was to use a native Python Product instead of the ZClasses. This should give better performance, since less overhead is involved, and it also allows greater flexibility in the data structures. An additional advantage is that a more efficient development environment could be used, because the source is on the file system and not in the Zope database.

+ +

Performance was slightly improved, but not as much as + hoped for. It also turned out that some flaws in the + data structure defined for the Python classes did not + allow the full expressive power that was required for + Topic Maps in XTM 1.0. +

+ +

Around this time, development activity started once again on the Zope ParsedXML product, which is the Zope product that provides XML functionality. Since an XML Document Object Model (DOM) tree shares some similarity with the Zope DOM (ZDOM) used to store the Zope objects, it was expected that this approach might scale better. An additional advantage was that Zope procedures could be used to directly expose XML elements in DTML (Document Template Markup Language). For this reason, it was decided to start once again, this time with ZClasses using the ParsedXML product.

+ +

Development of this prototype had progressed for quite some while when it was realized that the support for Unicode in Zope, which was introduced in Zope 2.4.0, had some flaws. While UTF-8 could be used without problems in previous versions, the partial support for Unicode meant that Python UnicodeStrings could in some cases be cast as AsciiStrings, which would crash the process. While some patches became available and development of the Zope core continued to address this problem, it remained acute even with the recent 2.5.0 release and will probably only be resolved in the upcoming Zope 3.0 release, which will be a major rewrite.

+ +

While improving the support for Unicode within Zope is important, it remains outside the scope and timeframe of this project. As a temporary fix, therefore, no Unicode characters can be used in the Topic Map engine. This is unfortunate, since the XML standard explicitly requires conformant XML processors to support at least UTF-8 and UTF-16, but nothing can be done about it at the moment. The situation will improve with the arrival of a fully Unicode-compliant version of Zope.

+ +
+ +
+

5.3.4. A browser-based interface to the Topic Map engine +

+ + +

When a new Topic Map has been created or imported into the Zope Topic Map engine, it can be explored on the Topic Map overview screen, as shown in Figure 8.

+ +


+ Figure 8 + +   The Topic Map overview screen + +

+ +

+ +

This screen is divided into several parts. The top frame provides a general interface to manage the display of the Topic Map; it is also here that other Topic Maps can be selected. This part also allows the addition of new topics as well as global searches over the Topic Map. The frame on the left is for navigating the Topic Map. By default, it shows a list of topics in the topic map. Since this list can potentially be very long, the default length is set to 20; if there are more topics, the list will be displayed in batches. The list can be narrowed down in various ways:

  • by using the scopes (or themes) defined in the Topic Map
  • by searching the Topic Map; this will limit the list to the search results
  • by defining new scopes (if the user has the appropriate rights, these can also be stored in the Topic Map and be used in the future)

+ +

The main frame initially shows brief information about this Topic Map engine; it is also used to display the topic details, as shown in Figure 9.

+ +


+ Figure 9 + +   The details of a topic + +

+ +

+ +

The Topic Map engine can be used not only to browse the Topic Map, but also to add or edit topics, occurrences or associations. A click on the `Add' button in the upper right area of Figure 8 will lead to the entry screen in Figure 10.

+ +


+ Figure 10 + +   The entry form for new topics + +

+ +

+ +

Occurrences for topics can be added from the topic details screen as shown in Figure 9; associations can be added by checking the topics to be associated in the list of topics in the left frame and then clicking on the `Add Association' button.

+ +

The interface to the Topic Map as developed here is generic and rather primitive. It does, however, allow Topic Maps to be developed and maintained in a distributed way. Because of its generic nature, it is cumbersome to use for specific Topic Maps, since it is not aware of topics that might be defined as Topic Map templates. Since there is not yet a standardized way to define Topic Map templates, automatic generation of a customized user interface for specific Topic Maps will have to wait until such a definition is finalized.

+ +
+ +
+

5.3.5. The interface to XEmacs UTF-2000 +

+ + +

Besides the browser-based user interface described in the previous section, the Zope Topic Map engine can also be accessed from XEmacs UTF-2000. This can be done through XML-RPC, WebDAV or HTTP. The returned values can be formatted as XML, as HTML or as a list in LISP syntax.

+ +

Currently, the following commands are implemented + (parameters are key/value pairs that are submitted using + the appropriate syntax): + +

Retrieval

+
  • tm-topics: Lists topics. Parameters are:
      • scope: string that specifies the scoping topics
      • name: string that will be used to search for the <baseName> of topics
      • display: scope to be used to select a name to return
      • occurences: type or scope of the occurrences to be returned
      • format: `XML', `HTML' or `LISP'.
  • tm-members: Lists associations that have members as specified in the query. Parameters are:
      • scope: string that specifies the members to look for
      • display: scope to be used to select a name to return
      • occurences: type or scope of the occurrences to be returned for the members
      • format: `XML', `HTML' or `LISP'.

Maintenance

+
  • tm-add: This command will add a new topic. If the topic already exists, it will replace or add <occurrence> or <baseName> elements as specified in the request. It can also be used to change the list of scoping topics. Parameters:
      • args: A string that gives the items to be added as key/value pairs
  • tm-delete: This command will delete the specified topic.
      • topic: the topic to be deleted

+ +

This is a very low-level interface that will need to be complemented with higher-level commands to integrate it with the overall workings of XEmacs and the XEmacs UTF-2000 character database.
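As an illustration of how a client might address these commands over plain HTTP, the following Python sketch builds a request URL from a command name and its key/value parameters. The source does not specify the URL layout, so everything about it is an assumption: the base URL, the idea that a command is a path segment, and the use of a GET query string are all hypothetical.

```python
from urllib.parse import urlencode


def build_request_url(base_url, command, **parameters):
    """Build a GET URL for a Topic Map command.
    Assumption: commands are path segments and parameters form the query."""
    url = f"{base_url.rstrip('/')}/{command}"
    query = urlencode(sorted(parameters.items()))
    return f"{url}?{query}" if query else url


# Hypothetical server address; `name` and `format` are parameters of tm-topics.
url = build_request_url(
    "http://localhost:8080/topicmap", "tm-topics",
    name="U+03432", format="LISP")
print(url)
# http://localhost:8080/topicmap/tm-topics?format=LISP&name=U%2B03432
```

Note that the plus sign in a name such as U+03432 has to be percent-encoded, since an unescaped `+` in a query string is read as a space.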

+ + +
+ +
+

5.3.6. Evaluation +

+ + +

The goal of developing a complete Topic Map engine based on Zope has not been reached. This has been partly due to the development process, which had to confront some fundamental issues of processing Topic Maps that had not been solved so far. While the goal of developing a generic Topic Map engine is worthwhile and important, it proved to be too ambitious for the context of this project. We therefore had to settle for a solution that works well for this context, and we are confident that it will be possible to generalize from there.

+ +

    It has also become clear that Zope may not be a + suitable platform for holding the potentially very large + data of a Topic Map. Storing this data in a database + would be better. +

+ +
+ +
+ +
+

5.4. Other possibilities +

+ + +

    The current model of implementing the Topic Map engine + and interfacing it with XEmacs UTF-2000 is + based on a two-way connection. +

+ +

    Storing the Topic Map in the Zope object database proved + to be a performance bottleneck. The logical way to solve + this problem is to move the data to external storage. To + test the feasibility of this approach, the Topic Map + data structure has been mapped to a set of relational + database tables and a Topic Map has been imported into a + PostgreSQL database. +
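A mapping of the Topic Map data structure onto relational tables could look roughly like the sketch below. The table and column names are hypothetical, since the report does not list the actual schema, and SQLite (via Python's standard `sqlite3` module) stands in for PostgreSQL so that the example is self-contained.

```python
import sqlite3

# Hypothetical table layout: the report does not list the actual schema.
SCHEMA = """
CREATE TABLE topic       (id TEXT PRIMARY KEY);
CREATE TABLE base_name   (topic_id TEXT REFERENCES topic(id), scope TEXT, name TEXT);
CREATE TABLE occurrence  (topic_id TEXT REFERENCES topic(id), type TEXT, value TEXT);
CREATE TABLE association (id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE member      (association_id INTEGER REFERENCES association(id),
                          role TEXT, topic_id TEXT REFERENCES topic(id));
"""


def open_store():
    """Create an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_topic(conn, topic_id, names=()):
    """Insert a topic together with its (scope, name) base names."""
    conn.execute("INSERT INTO topic VALUES (?)", (topic_id,))
    conn.executemany("INSERT INTO base_name VALUES (?, ?, ?)",
                     [(topic_id, scope, name) for scope, name in names])
    conn.commit()


def associate(conn, assoc_type, members):
    """Insert an association with its (role, topic_id) members; return its id."""
    cur = conn.execute("INSERT INTO association (type) VALUES (?)", (assoc_type,))
    assoc_id = cur.lastrowid
    conn.executemany("INSERT INTO member VALUES (?, ?, ?)",
                     [(assoc_id, role, topic_id) for role, topic_id in members])
    conn.commit()
    return assoc_id


def associations_with_member(conn, topic_id):
    """List (id, type) of every association that has `topic_id` as a member."""
    rows = conn.execute(
        "SELECT a.id, a.type FROM association a "
        "JOIN member m ON m.association_id = a.id "
        "WHERE m.topic_id = ? ORDER BY a.id", (topic_id,))
    return rows.fetchall()


conn = open_store()
add_topic(conn, "water", names=[("en", "water"), ("de", "Wasser")])
add_topic(conn, "ice", names=[("en", "ice")])
associate(conn, "state-of", [("substance", "water"), ("state", "ice")])
print(associations_with_member(conn, "ice"))
```

The last query corresponds loosely to what a command such as tm-members has to answer: which associations contain a given member.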

+ +

    The connection between XEmacs UTF-2000, the + Topic Map engine within Zope and the storage backend can now + be established in a triangular way as shown in Figure 12. The red arrows symbolize updates to + the database, while the green arrows stand for data that is + retrieved from the database. Both XEmacs UTF-2000 and the + Zope Topic Map engine will be able to commit updates and + retrieve data. While the model employed so far + assumed direct communication between XEmacs UTF-2000 and + the Zope Topic Map engine, this model provides a far more + flexible way of communication by introducing another layer + between them. This model is also extensible, since more + partners can be connected to the database through a set of + well-defined interfaces and a cascade of such layers can be + built in a distributed way. +

+ +


+ Figure 12 + +   Communication between XEmacs UTF-2000, Zope and + the PostgreSQL database + +

+ +

+ +

+ Although time did not permit changing the backend + of the Topic Map engine properly, + this should be a straightforward task that is not expected to + require changes to the other layers of the program. +

+ +
+ +
+ +
+
Date: Time-stamp: "02/02/13 17:30:32 chris" +  Author: Christian Wittern. +
+
+ + \ No newline at end of file diff --git a/papers/mitou-2001-report/main/node6.html b/papers/mitou-2001-report/main/node6.html new file mode 100644 index 0000000..ba2e912 --- /dev/null +++ b/papers/mitou-2001-report/main/node6.html @@ -0,0 +1,323 @@ + + + + + +Integrating the Document Editing System with an External Database +
+ +

+Integrating the Document Editing System with an External Database +

+ +

+Because a UTF-2000 implementation must hold its knowledge about every character it processes as character attributes, it needs more memory than a conventional character processing system based on the coded character model. + +

+In XEmacs UTF-2000 the character attribute database is expressed as Emacs Lisp programs in define-char form. An executable is built by dumping the memory image after the whole character attribute database has been loaded, and this dumped executable is what is used. The difference in size between the dumped XEmacs UTF-2000 executable and the XEmacs-Mule executable it is derived from therefore indicates how much memory is needed to hold the character attribute database. + +

+In early versions of XEmacs UTF-2000 the mechanism for holding the character attribute database was not yet efficient enough. On Linux on the i386 architecture, the XEmacs-Mule executable of the time was about 10 MB, whereas the XEmacs UTF-2000 executable holding data for about 50,000 characters exceeded 40 MB. The mechanism for holding character data was later improved and made more memory-efficient, so that the executable of recent XEmacs UTF-2000 versions is about 27 MB while holding data for about 70,000 characters. + +

+Holding the character attribute database in main memory is thus problematic because it requires a large amount of memory. Since ordinary use needs only a few hundred to a few thousand characters, and only a limited set of character attributes, dumping the whole character attribute database is not desirable. + +

+Another problem with dumping the whole character attribute database is that the UTF-2000 implementation and the character attribute database become inseparable. This means that the character attribute database cannot be shared between different UTF-2000 implementations. Moreover, although developing a UTF-2000 implementation and maintaining the character attribute database are very different kinds of work, the source files of both have to be managed together. + +

+For these reasons it is better to separate the UTF-2000 implementation from the character attribute database. That is, the character attribute database is kept in a database outside the UTF-2000 implementation, and the implementation loads from this external database only the data it needs for processing. + +

+Following this idea, this project developed a mechanism that lets XEmacs UTF-2000 use an external character database. This chapter outlines that mechanism. + +

+ +

+´ðËܹ½Â¤ +

+ +

+In the current XEmacs UTF-2000 the character attribute database is held in a separate data structure (char-id-table) for each attribute. A char-id-table is a data structure for looking up an attribute value from a character id, and any Lisp object can be stored as an attribute value. The correspondence between character attribute names and char-id-tables is managed by a hash table, which makes it possible to look up an attribute value from the attribute name. + +

+As described in Section 4.2, code-point attributes are used for decoding in a coded-charset, which amounts to a reverse lookup of a char-id-table. Because decoding is an important operation, a separate data structure (decoding-table) is provided to speed it up; it looks up the character object from a code point in a coded-charset. It is implemented as nested arrays indexed by the code point split into single bytes. + +
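The nested-array idea behind the decoding-table can be sketched as follows. This is only an illustration of the technique in Python; the function names are hypothetical, and the sketch assumes one table per coded-charset in which all code points have the same byte length.

```python
def make_decoding_table():
    """One level of the nested array: 256 slots, one per possible byte value."""
    return [None] * 256


def put(table, code_point_bytes, char):
    """Store `char` under a code point given as a bytes object."""
    node = table
    # Walk (and create) one array level per leading byte.
    for b in code_point_bytes[:-1]:
        if node[b] is None:
            node[b] = [None] * 256
        node = node[b]
    node[code_point_bytes[-1]] = char


def get(table, code_point_bytes):
    """Return the character stored under the code point, or None."""
    node = table
    for b in code_point_bytes:
        if not isinstance(node, list):
            return None          # the path ends before all bytes are consumed
        node = node[b]
    # A list here means the code point was only a prefix of longer ones.
    return None if isinstance(node, list) else node


# A one-byte charset: each code point is a single byte.
ascii_table = make_decoding_table()
put(ascii_table, bytes([0x41]), "A")

# A two-byte charset: the first byte selects the inner array.
two_byte_table = make_decoding_table()
put(two_byte_table, bytes([0x30, 0x21]), "<character at 0x3021>")

print(get(ascii_table, bytes([0x41])))
print(get(two_byte_table, bytes([0x30, 0x21])))
```

Each lookup costs one array index per byte of the code point, which is what makes this layout attractive for the frequent decoding path.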

+To obtain information from the external database only when it is needed (lazy loading), XEmacs UTF-2000 needs a marker which shows that a value is not present in a char-id-table or decoding-table. For this purpose a special Lisp object, Qunloaded, was introduced to indicate that a value is not present in memory. If the value obtained by looking up a char-id-table or decoding-table is Qunloaded, the information has to be fetched from the external database. If it can be fetched, Qunloaded is replaced with the result and the fetched value is returned. + +
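The lookup path with an "unloaded" marker can be sketched like this. It is a Python illustration, not the XEmacs C code: `UNLOADED` stands in for Qunloaded, a plain dictionary stands in for the external database, and the class name is hypothetical. What happens when the external database has no value either is not spelled out in the text; the sketch simply returns a default in that case.

```python
UNLOADED = object()   # stand-in for the special Qunloaded object


class LazyAttributeTable:
    """Table from character id to attribute value with lazy loading (sketch)."""

    def __init__(self, external_db):
        self._db = external_db   # any mapping: character id -> value
        self._values = {}        # in-memory part; a missing key counts as UNLOADED
        self.fetches = 0         # how often the external database was consulted

    def get(self, char_id, default=None):
        value = self._values.get(char_id, UNLOADED)
        if value is not UNLOADED:
            return value                      # already in memory
        self.fetches += 1
        if char_id in self._db:
            value = self._db[char_id]
            self._values[char_id] = value     # replace the marker with the result
            return value
        return default                        # assumption: nothing to load


external = {0x41: "LATIN CAPITAL LETTER A"}
names = LazyAttributeTable(external)
print(names.get(0x41))   # first call reads the external database
print(names.get(0x41))   # second call is served from memory
print(names.fetches)
```

Only the characters that are actually looked up ever occupy memory, which is the point of the design.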

+ +

+Implementation Using Berkeley DB +

+ +

+XEmacs has a database facility that abstracts simple databases for holding attribute values, such as Berkeley DB. This project used that facility to move the character attribute database out of the executable. So far, operation has been verified only with Berkeley DB Version 3 on Debian GNU/Linux (sid). + +

+ +

+Mapping to Database Files +

+ +

+Each char-id-table and decoding-table is associated with a database whose file name has the form +

+"database root"/"key type"/"value type" + +
+where "database root" is the root directory of the character attribute database. + +

+Here the "key type" denotes how the key of the information is encoded. For a char-id-table, for example, the key is an XEmacs UTF-2000 character id, so the name that denotes this encoding, system-char-id, is the "key type". For a decoding-table the name of the coded-charset is the "key type"; for ascii, for example, the "key type" is ascii. + +

+The "value type" corresponds to the name of the attribute. For a char-id-table the name of the character attribute is used. If the attribute name contains characters that cannot be used in a file name, Emacs Lisp \-quoting is applied. For a decoding-table, system-char-id is used. + +

+Some examples follow: +

+Assuming the "database root" is + /usr/local/libexec/char-db/: +
+
Example 1
+
The character attribute ideographic-structure corresponds to + /usr/local/libexec/char-db/system-char-id/ideographic-structure. + +
+
Example 2
+
The decoding-table of ascii corresponds to + /usr/local/libexec/char-db/ascii/system-char-id. + +
+
+ +
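The naming rule and the two examples above can be reproduced with a few lines of Python. The function names are hypothetical, and the \-quoting of attribute names that contain characters unsuitable for file names is deliberately left out, because its exact rules are not given here.

```python
from pathlib import PurePosixPath


def db_file(root, key_type, value_type):
    """Join root, key type and value type into a database file name."""
    return str(PurePosixPath(root) / key_type / value_type)


def char_attribute_db(root, attribute):
    """File for a char-id-table: keyed by character id, valued by the attribute."""
    return db_file(root, "system-char-id", attribute)


def decoding_table_db(root, coded_charset):
    """File for a decoding-table: keyed by the coded-charset, valued by character id."""
    return db_file(root, coded_charset, "system-char-id")


ROOT = "/usr/local/libexec/char-db/"
print(char_attribute_db(ROOT, "ideographic-structure"))
print(decoding_table_db(ROOT, "ascii"))
```

Note how system-char-id appears as the key type in one direction and as the value type in the other, which reflects that a decoding-table is the reverse of a char-id-table lookup.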

+ +

+API for Handling the External Database +

+ +

+Character data is fetched from the external database automatically when it is needed, but several API functions were added to control input and output between the character data in memory and the external database. + +

+

+ + Function save-char-attribute-table (attribute) + +
+Saves all values of the character attribute attribute to the external database associated with that attribute. +
+

+

Does nothing if no associated external database exists. + +
+
+ +

+

+ + Function save-charset-mapping-table (coded-charset) + +
+Saves the decoding-table of coded-charset to its associated external database. +
+

+

Does nothing if no associated external database exists. + +
+
+ +

+

+ + Function reset-char-attribute-table (attribute) + +
+If an external database associated with the character attribute attribute exists, erases all attribute values in memory so that they will be read from the external database again. + +
+
+ +
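The intended behaviour of saving and resetting can be modelled in a few lines. This is a Python sketch of the semantics described above, not the Emacs Lisp API itself: the class and method names are hypothetical, and a dictionary stands in for the external database file.

```python
UNLOADED = object()   # stand-in for the special Qunloaded object


class CharAttributeTable:
    """In-memory attribute values with an optional external database (sketch)."""

    def __init__(self, external_db=None):
        self.external_db = external_db   # dict stand-in, or None if not associated
        self.values = {}

    def put(self, char_id, value):
        self.values[char_id] = value

    def get(self, char_id):
        value = self.values.get(char_id, UNLOADED)
        if (value is UNLOADED and self.external_db is not None
                and char_id in self.external_db):
            # Lazy load: replace the marker with the stored value.
            value = self.values[char_id] = self.external_db[char_id]
        return None if value is UNLOADED else value

    def save(self):
        """Like save-char-attribute-table: write all values out."""
        if self.external_db is None:
            return                       # no associated database: do nothing
        self.external_db.update(self.values)

    def reset(self):
        """Like reset-char-attribute-table: forget values so they reload."""
        if self.external_db is None:
            return                       # nothing to reload from: keep values
        self.values.clear()


db = {}
table = CharAttributeTable(db)
table.put(0x41, "A")
table.save()            # the value now also lives in the external database
table.reset()           # memory is emptied ...
print(table.get(0x41))  # ... and the value is loaded again on demand
```

Saving followed by resetting is thus a way to release memory without losing data, provided an external database is associated with the attribute.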

+ +

+Evaluation +

+ +

+On Debian GNU/Linux (sid) running on a TM 5800, the dumped executable of XEmacs 21.2.43 UTF-2000 with about 70,000 character definitions is 27 MB (22 MB after strip), whereas the executable of the lazy-loading version is 15 MB (10 MB after strip). For comparison, the executable of XEmacs 21.2.43 (with Mule) is 10 MB (6 MB after strip). + +

+The lazy-loading executable is still about 5 MB larger than XEmacs-Mule, probably because the Emacs Lisp code inherited from XEmacs-Mule makes heavy use of char-tables keyed by coded-charset. In XEmacs UTF-2000 a char-table is implemented with a char-id-table, and setting a value with a coded-charset as the key sets that value for every character belonging to the coded-charset, which presumably inflates the memory needed. In addition, unlike character attributes, char-tables are not associated with an external database and therefore cannot be loaded lazily. Improving this point should bring the executable of the lazy-loading XEmacs UTF-2000 down to about the size of XEmacs-Mule. + +

+ +

+


+MORIOKA Tomohiko +2002-02-15 +
+ + diff --git a/papers/mitou-2001-report/main/node7.html b/papers/mitou-2001-report/main/node7.html new file mode 100644 index 0000000..f6e56e8 --- /dev/null +++ b/papers/mitou-2001-report/main/node7.html @@ -0,0 +1,98 @@ + + + + + +Bibliography +

+ + +

+Bibliography +

1 +
+International Organization for Standardization (ISO). +
Information technology - Character code structure and extension + techniques, 1994. +
ISO/IEC 2022:1994 (= JIS X 0202, 「情報交換用符号の拡張法」). + +

2 +
+International Organization for Standardization (ISO). +
Universal Multiple-Octet Coded Character Set (UCS) - Part 1: + Architecture and Basic Multilingual Plane (BMP), March 2000. +
ISO/IEC 10646-1:2000. + +

3 +
+Mikiko Nishikimi, Ken'ichi Handa, and Satoru Tomura. +
Mule: MULtilingual Enhancement to GNU Emacs. +
In Proc. INET '93, pages GAB-1-GAB-9, 1993. + +

4 +
+Richard M. Stallman et al. +
GNU Emacs version 20.7. +
ftp://ftp.gnu.org/gnu/emacs-20.7.tar.gz, June 2000. + +

5 +
+The Unicode Consortium. +
The Unicode Standard, Version 3.0, February 2000. + +

6 +
+XEmacs. +
http://www.xemacs.org/. +
+ +

+


+
+MORIOKA Tomohiko +2002-02-15 +
+ +