X-Git-Url: http://git.chise.org/gitweb/?a=blobdiff_plain;f=papers%2Fmitou-2001-report%2Fmain%2Fnode5.html;fp=papers%2Fmitou-2001-report%2Fmain%2Fnode5.html;h=a5cbfc53aa00c17948d0603cd65ac2ef45f9a9e5;hb=1719044f96038f7b46acada68efffc7c7bb279e8;hp=0000000000000000000000000000000000000000;hpb=0d7256d0f62d646161c6b9348efe69331e8e7fb6;p=www%2Fchise.git diff --git a/papers/mitou-2001-report/main/node5.html b/papers/mitou-2001-report/main/node5.html new file mode 100644 index 0000000..a5cbfc5 --- /dev/null +++ b/papers/mitou-2001-report/main/node5.html @@ -0,0 +1,684 @@ + + + + + + Mitou project: Model and implementation of a Character + Object Database + + + +


+ + +
+

5. Topic Maps +

+ + +
+

5.1. The topic map paradigm +

+ + +

The Topic Map Standard `provides a standardized notation for interchangeably representing information about the structure of information resources used to define topics, and the relationships between topics.' A set of one or more interrelated documents that employs the notation defined by this International Standard is called a `topic map'. In general, the structural information conveyed by topic maps includes:

  1. groupings of addressable information objects around `topics' (`occurrences'), and
  2. relationships between topics (`associations').
+ +

+ +

A topic map defines a + multidimensional topic space — a space in which the + locations are topics, and in which the distances between + topics are measurable in terms of the number of + intervening topics which must be visited in order to get + from one topic to another, and the kinds of relationships + that define the path from one topic to another, if any, + through the intervening topics, if any. (from the document + defining Topic Maps as ISO/IEC 13250:2000) +
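The notion of distance in a topic space can be illustrated with a small breadth-first search. This is only a sketch under stated assumptions: associations are reduced to plain tuples of topic names, the relationship types along the path are ignored, and the topic names and the function `topic_distance` are invented for the illustration. The function counts hops; the number of intervening topics is one less than that.

```python
from collections import deque

def topic_distance(associations, start, goal):
    """Return the number of association hops from start to goal, or None."""
    # Build an undirected adjacency map: every member of an association
    # is treated as adjacent to every other member of the same association.
    neighbours = {}
    for members in associations:
        for topic in members:
            neighbours.setdefault(topic, set()).update(
                m for m in members if m != topic
            )
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        topic, hops = queue.popleft()
        if topic == goal:
            return hops
        for nxt in neighbours.get(topic, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None  # no path through the associations

# Hypothetical associations, each given as a tuple of member topics.
associations = [
    ("character", "script"),
    ("script", "language"),
    ("language", "area"),
]
hops = topic_distance(associations, "character", "area")
print(hops, "hops,", hops - 1, "intervening topics")
```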

+ +

The standard as published in 2000 includes a serialization format specification in the form of a Document Type Definition (DTD), originally using HyTime (ISO/IEC 10744:1992) Architectural Forms and an SGML (ISO/IEC 8879:1986) based syntax. An independent group of vendors started development of an XML based version, which was published as XTM 1.0 in December 2000 and adopted by ballot as an amendment to the ISO standard in December 2001. The development here is based on the XML syntax, whose elements and structure differ considerably from those of the SGML version.

+ +

Since SGML/XML based formats are overly verbose (especially XTM 1.0) and awkward to work with, other formats have been suggested, including the `Asymptotic Topic Map Notation, Authoring' (AsTMA, by Robert Barta) and the `Linear Topic Map Notation' (LTM, by Lars Marius Garshol). Both are essentially line based and can easily be edited in UTF-2000 and other editors.

+ +

Besides defining a serialization format for the exchange of information, the Topic Map standard also includes constructs that are intended to facilitate this exchange. One of the most important tasks is to reliably identify identical pieces of information across different sources. To this end, the standard lays down rules for subsetting and merging topic maps. Topics can be defined with reference to Published Subject Indicators (PSI), which function in a similar way to XML Namespaces.
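The idea of identifying identical subjects across sources can be sketched as follows. This is a simplified illustration, not the merging procedure of the standard: topics are plain dictionaries, only shared subject indicators trigger a merge, and the `example.org` URLs and the function `merge_topics` are invented for the example.

```python
def merge_topics(topics):
    """Merge topics that share at least one subject indicator (PSI)."""
    merged = []  # entries with pairwise disjoint PSI sets
    for topic in topics:
        psis = set(topic["psis"])
        names = set(topic["names"])
        remaining = []
        for other in merged:
            if other["psis"] & psis:
                # Same subject: absorb the earlier entry into this one.
                psis |= other["psis"]
                names |= other["names"]
            else:
                remaining.append(other)
        remaining.append({"psis": psis, "names": names})
        merged = remaining
    return merged

# Hypothetical topics from two different sources.
topics = [
    {"psis": ["http://example.org/psi/han"], "names": ["Han"]},
    {"psis": ["http://example.org/psi/han",
              "http://example.org/psi/kanji"], "names": ["Kanji"]},
    {"psis": ["http://example.org/psi/latin"], "names": ["Latin"]},
]
result = merge_topics(topics)
print(len(result), sorted(result[0]["names"]))
```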

+ +
+ +
+

5.2. The character database as a topic map +

+ + + +

Characters are used in scripts for the writing of languages, and languages are distributed over different areas. The exact form of these characters, as well as their phonetic representation, changes over time and area. The adaptation of the Topic Map paradigm to a character database tries to use these different axes to organise the characters in a way that is appropriate to the domain they are encountered in. Characters are thus not only objects in their own right; these objects are also organized in `super-class / sub-class' and `class / instance' hierarchies.

+ +

+ +

The topic map currently contains information along the + following axes: + +

+

+ +

While most of these are organized as occurrences of the + +

+ +

It might be appropriate to illustrate this with an example. When viewed within the UTF-2000 framework, the character U+03432 might have attributes similar to those shown in Figure 4.

+ +

Figure 4: The character U+03432 displayed with the `what-char-definition' function.

+

+ +

Transformed to the topic map notation, the attributes of the same character will look similar to Figure 5. The content has not changed, only the notation; within the <occurrence> element, the attributes are similar to key / value pairs. What is not visible here, however, is the underlying structure, which has been used to define the topic map.

+ +

Figure 5: The attributes of character U+03432 in Topic Map notation.

It should also be noted that the attributes under `ideographic-structure' are not listed as occurrences. These attributes are expressed using separate topics for the character components and the <association> element to connect them, as shown in Figure 6.

+ +

Figure 6: The ideographic-structure of character U+03432 in Topic Map notation.

+ +
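The split between occurrences and associations described above might be sketched like this. The code is a hedged illustration only: the attribute values, the component topic names, the role names `whole` and `component`, and the function `to_topic_map` are placeholders and are not taken from the figures.

```python
def to_topic_map(char_id, attributes):
    """Split character attributes into occurrences and associations."""
    occurrences = []
    associations = []
    for key, value in attributes.items():
        if key == "ideographic-structure":
            # Components are separate topics, tied to the character
            # by one association (role names are assumptions).
            members = [{"role": "whole", "topic": char_id}]
            members += [{"role": "component", "topic": c} for c in value]
            associations.append({"type": key, "members": members})
        else:
            # Every other attribute becomes a key / value occurrence.
            occurrences.append({"type": key, "value": str(value)})
    topic = {"id": char_id, "occurrences": occurrences}
    return topic, associations

# Placeholder attributes; only the code point itself is real.
attributes = {
    "ucs": 0x3432,
    "example-attribute": "placeholder",
    "ideographic-structure": ["component-a", "component-b"],
}
topic, associations = to_topic_map("u3432", attributes)
print(len(topic["occurrences"]), len(associations[0]["members"]))
```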

+ +
+ +
+

5.3. A Topic Map engine with Zope +

+ + +
+

5.3.1. Why Zope? +

+ + +

Zope (Z Object Publishing Environment) is an object-oriented Web application server developed by Zope Corporation (formerly Digital Creations) using a community-based open-source development model. It is written in Python, with only a few critical parts in C. Although it is mostly regarded as an environment for rapid development of dynamic Web content, it is originally and foremost an environment for publishing objects. The underlying storage is an object-oriented database, which makes it uniquely suited for storing hierarchical data structures like a Topic Map.

+ +

Since Zope acts as a Web server, it can also be seen as a networked database. It can be accessed through HTTP, but also through WebDAV and XML-RPC. One advantage of a Zope based implementation is thus that it can serve as a distributed editing environment and at the same time act as a backend to be accessed from XEmacs UTF-2000.
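To give an impression of what access through XML-RPC involves, the following sketch marshals a request with Python's standard `xmlrpc.client` module and decodes it again, so that it runs without a server. The method name `get_topic` and its parameters are hypothetical and are not part of the engine described here.

```python
import xmlrpc.client

# Hypothetical method name and parameters, chosen only for illustration.
params = ({"topic": "u3432", "format": "XML"},)
request_body = xmlrpc.client.dumps(params, methodname="get_topic")

# The receiving side recovers the same parameters and method name.
decoded_params, method_name = xmlrpc.client.loads(request_body)
print(method_name, decoded_params[0]["topic"])
```

Against a running server, the same call would normally go through `xmlrpc.client.ServerProxy`, which performs this marshalling and the HTTP transport in one step.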

+ +
+ +
+

5.3.2. Requirements for a Topic Map engine +

+ + +

Since some of the concepts of Topic Maps are quite new and not yet fully fleshed out in the Topic Map community (for example, the Topic Maps Query Language TMQL is still at the requirements stage, and no consensus has been reached on what it will mean to query a topic map), some of the more arcane features will not be covered by this prototype. Instead of more demanding Topic Map queries, which might involve inferences and other Topic Map calculus, searches will directly access the data in the Topic Map. Merging directives, which are problematic because of, among other things, the `Topic Map Basename Constraint' (TMBC), are not initially supported.

+ +

The prototype should be able to:

  • Import and export data from XEmacs UTF-2000
  • Use a network based communication protocol to communicate
  • Provide access to the Topic Map (read/write topics, occurrences and associations)
  • Be designed for generic Topic Maps, not for specific data types
  • Allow an assessment of the feasibility of this approach.
+ +

+ +
+ +
+

5.3.3. Implementation details +

+ + +

Zope is extended in functionality by developing add-on modules, called `Products' in Zope-speak. Products can be developed within the Zope database based on ZClasses, or as file-system based Python classes. In a first implementation, ZClasses were used.

+ +

In this implementation, four classes were used to represent the different objects of a Topic Map:

  • topicmap: The container item for all the other classes
  • topic: Container item for occurrences
  • occurrence: Holds the key / value pairs of occurrences
  • association: Information about the type, role and value of the members is held in instance attributes

This data structure was closely modelled on the underlying data structure of the Topic Map serialization format, as realized in the XTM 1.0 DTD. The built-in Zope search engine ZCatalog was used to build indices and access the different information axes. Figure 7 shows a screenshot of the Zope development interface with the classes being developed.

+ +

Figure 7: The Zope Management screen with the ZClasses under development

+ +
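The four classes can be pictured with plain Python dataclasses. This is a sketch of the container relationships only, not the ZClass code: field names are assumptions, and the linear search in `find_by_occurrence` merely stands in for the ZCatalog indices mentioned above.

```python
from dataclasses import dataclass, field

@dataclass
class Occurrence:
    key: str      # occurrence type
    value: str    # occurrence value

@dataclass
class Topic:
    topic_id: str
    occurrences: list = field(default_factory=list)

@dataclass
class Association:
    type: str
    members: list = field(default_factory=list)  # (role, topic_id) pairs

@dataclass
class TopicMap:
    topics: dict = field(default_factory=dict)
    associations: list = field(default_factory=list)

    def add_topic(self, topic):
        self.topics[topic.topic_id] = topic

    def find_by_occurrence(self, key, value):
        """Return ids of topics holding the given key / value pair."""
        return [
            t.topic_id
            for t in self.topics.values()
            if any(o.key == key and o.value == value for o in t.occurrences)
        ]

tm = TopicMap()
tm.add_topic(Topic("u3432", [Occurrence("ucs", "13362")]))
tm.add_topic(Topic("component-a"))
tm.associations.append(
    Association("ideographic-structure",
                [("whole", "u3432"), ("component", "component-a")])
)
print(tm.find_by_occurrence("ucs", "13362"))
```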

+ +

This approach turned out to induce a large overhead for the data and proved problematic for Topic Maps with more than approximately 1000 topics and associations. For this reason, this approach was abandoned.

+ +

The next logical step was to use a native Python Product instead of the ZClasses. This should give better performance, since less overhead is involved, and it also allows greater flexibility in the data structures. An additional advantage is that a more efficient development environment can be used, because the source is on the file system and not in the Zope database.

+ +

Performance improved slightly, but not as much as hoped for. It also turned out that some flaws in the data structure defined for the Python classes did not allow the full expressive power required for Topic Maps in XTM 1.0.

+ +

Around this time, development activity started once again on the Zope ParsedXML product, which is the Zope product that provides XML functionality. Since an XML Document Object Model (DOM) tree shares some similarity with the Zope DOM (ZDOM) used to store the Zope objects, it was expected that this approach might scale better. An additional advantage was that Zope procedures could be used to directly expose XML elements in DTML (Document Template Markup Language). For this reason, it was decided to start once again, this time with ZClasses using the ParsedXML product.

+ +

Development of this prototype had progressed for quite some while when it was realized that the support for Unicode in Zope, which was introduced in Zope 2.4.0, had some flaws. While UTF-8 could be used without problems in previous versions, the partial support for Unicode meant that Python UnicodeStrings could in some cases be cast as AsciiStrings, which would crash the process. While some patches became available and development of the Zope core continued to address this problem, it remained acute even with the recent 2.5.0 release and will probably only be resolved in the upcoming Zope 3.0 release, which will be a major rewrite.
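The kind of failure described here can be reproduced in a few lines. The sketch uses present-day Python 3, where the conversion has to be requested explicitly; in the Python versions underlying Zope 2.x the coercion to ASCII could happen implicitly, which is what made the problem hard to avoid. It illustrates the failure mode only and is not Zope code.

```python
text = "\u3432"  # the character U+3432 used as the running example

# Encoding as UTF-8 works and yields a three-byte sequence.
utf8_bytes = text.encode("utf-8")

# Forcing the same string into ASCII fails, because the character
# has no ASCII representation.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(len(utf8_bytes), ascii_ok)
```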

+ +

While the improvement of the support for Unicode within Zope is important, it remains outside the scope and timeframe of this project. As a temporary fix, therefore, no Unicode characters can be used in the TopicMap engine. This is unfortunate, since the XML standard explicitly requires conformant XML processors to support at least UTF-8 and UTF-16, but nothing can be done about it at the moment. The situation will improve with the arrival of a fully Unicode compliant version of Zope.

+ +
+ +
+

5.3.4. A browser-based interface to the Topic Map engine +

+ + +

When a new Topic Map has been created or imported into the Zope Topic Map engine, it can be explored on the Topic Map overview screen, as shown in Figure 8.

+ +

Figure 8: The Topic Map overview screen

+ +

+ +

This screen is divided into several parts. The top frame provides a general interface to manage the display of the Topic Map; it is also here that other Topic Maps can be selected. This part also allows the addition of new topics as well as global searches over the Topic Map. The frame on the left is for navigating the Topic Map. By default, it shows a list of topics in the topic map. Since this list can potentially be very long, the default length is set to 20; if there are more topics, the list will be displayed in batches. The list can be narrowed down in various ways:

  • by using the scopes (or themes) defined in the Topic Map
  • by searching the Topic Map; this will limit the list to the search results
  • by defining new scopes (if the user has the appropriate rights, these can also be stored in the Topic Map and be used in the future)
+ +
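The batching and narrowing of the topic list can be sketched in a few lines. The representation of topics as dictionaries with a `scopes` list and the two helper functions are assumptions made for the example; only the default batch length of 20 comes from the description above.

```python
def filter_by_scope(topics, scope):
    """Keep only the topics that carry the given scope (theme)."""
    return [t for t in topics if scope in t["scopes"]]

def batches(items, size=20):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# 45 hypothetical topics, alternately scoped as "even" and "odd".
topics = [
    {"id": f"t{i}", "scopes": ["even"] if i % 2 == 0 else ["odd"]}
    for i in range(45)
]
pages = batches(topics)
even_pages = batches(filter_by_scope(topics, "even"))
print([len(p) for p in pages], [len(p) for p in even_pages])
```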

+ +

The main frame initially shows some brief information about this Topic Map engine; it is also used to display the topic details, as shown in Figure 9.

+ +

Figure 9: The details of a topic

+ +

+ +

The Topic Map engine can be used not only to browse the Topic Map, but also to add new topics, occurrences or associations and to edit existing ones. A click on the `Add' button in the upper right area of Figure 8 will lead to the entry screen in Figure 10.

+ +

Figure 10: The entry form for new topics

+ +

+ +

Occurrences for topics can be added from the topic details screen as shown in Figure 9; associations can be added by checking the topics to be associated in the list of topics in the left frame and then clicking on the `Add Association' button.

+ +

The interface to the Topic Map as developed here is generic and rather primitive. It does, however, make it possible to develop and maintain Topic Maps in a distributed way. Because of its generic nature, it is cumbersome to use for specific Topic Maps, since it is not aware of topics that might be defined as Topic Map templates. Since there is not yet a standardized way to define Topic Map templates, automatic generation of a customized user interface for specific Topic Maps will have to wait until such a definition is finalized.

+ +
+ +
+

5.3.5. The interface to XEmacs UTF-2000 +

+ + +

Besides the browser based user interface described in the previous section, the Zope Topic Map engine can also be accessed from XEmacs UTF-2000. This can be done through XML-RPC, WebDAV or HTTP. The returned values can be formatted as XML, as HTML, or as a list in LISP syntax.
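How a result might be rendered as a list in LISP syntax is sketched below. The exact layout produced by the engine is not documented here, so the association-list form with dotted pairs is an assumption, as is the function name `to_lisp`.

```python
def to_lisp(value):
    """Render nested Python data as a Lisp-readable expression."""
    if isinstance(value, dict):
        # Dictionaries become association lists of dotted pairs.
        pairs = " ".join(f"({k} . {to_lisp(v)})" for k, v in value.items())
        return "(" + pairs + ")"
    if isinstance(value, (list, tuple)):
        return "(" + " ".join(to_lisp(v) for v in value) + ")"
    if isinstance(value, int):
        return str(value)
    # Everything else is written as a double-quoted string.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

result = to_lisp([{"id": "u3432", "ucs": 0x3432}])
print(result)  # (((id . "u3432") (ucs . 13362)))
```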

+ +

Currently, the following commands are implemented (parameters are key/value pairs that are submitted using the appropriate syntax):

Retrieval

+
  • tm-topics: Lists topics. Parameters are:
    • scope: string that specifies the scoping topics
    • name: string that will be used to search for the <baseName> of topics
    • display: scope to be used to select a name to return
    • occurences: type or scope of the occurrences to be returned
    • format: `XML', `HTML' or `LISP'.
  • tm-members: Lists associations that have members as specified in the query. Parameters are:
    • scope: string that specifies the members to look for
    • display: scope to be used to select a name to return
    • occurences: type or scope of the occurrences to be returned for the members
    • format: `XML', `HTML' or `LISP'.
+ +

Maintenance

+
  • tm-add: This command will add a new topic. If the topic already exists, it will replace or add <occurrence> or <baseName> elements as specified in the request. It can also be used to change the list of scoping topics. Parameters:
    • args: A string that gives the items to be added as key/value pairs
  • tm-delete: This command will delete the specified topic.
    • topic: the topic to be deleted
+ +
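The behaviour of these commands can be mimicked against an in-memory dictionary. The sketch below is not the engine's code: the Python function names, the explicit `topic_id` argument of `tm_add`, and the dictionary layout are assumptions, and scopes, display names and output formats are left out.

```python
def tm_add(topic_map, topic_id, args):
    """Create the topic if missing, then add or replace the given items."""
    topic = topic_map.setdefault(
        topic_id, {"baseNames": [], "occurrences": {}}
    )
    for key, value in args.items():
        if key == "baseName":
            if value not in topic["baseNames"]:
                topic["baseNames"].append(value)
        else:
            # An existing occurrence of the same type is replaced.
            topic["occurrences"][key] = value
    return topic

def tm_delete(topic_map, topic_id):
    """Delete the topic; report whether it existed."""
    return topic_map.pop(topic_id, None) is not None

def tm_topics(topic_map, name=None):
    """List topic ids, optionally restricted by a base name substring."""
    return sorted(
        tid for tid, t in topic_map.items()
        if name is None or any(name in b for b in t["baseNames"])
    )

topic_map = {}
tm_add(topic_map, "u3432", {"baseName": "U+3432", "ucs": "13362"})
tm_add(topic_map, "u3432", {"ucs": "0x3432"})   # replaces the occurrence
tm_add(topic_map, "u4e00", {"baseName": "U+4E00"})
print(tm_topics(topic_map, name="3432"))
```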

+ +

This is a very low-level interface that will need to be complemented with higher-level commands to integrate it with the overall workings of XEmacs and the XEmacs UTF-2000 character database.

+ + +
+ +
+

5.3.6. Evaluation +

+ + +

The goal of developing a complete Topic Map engine based on Zope has not been reached. This is partly due to the development process, which had to confront some fundamental and so far unsolved issues of processing Topic Maps. While the goal of developing a generic Topic Map engine is worthwhile and important, it proved to be too ambitious for the context of this project. We therefore had to settle for a solution that works well for this context, and are confident that it will be possible to generalize from there.

+ +

It has also become clear that Zope may not be a suitable platform for holding the potentially very large data of a Topic Map. Using an external database for this purpose would be better.

+ +
+ +
+ +
+

5.4. Other possibilities +

+ + +

The current model of implementing the Topic Map engine and interfacing it with XEmacs UTF-2000 is based on a two-way connection.

+ +

Storing the Topic Map in the Zope object database proved to be a performance bottleneck. The logical way to solve this problem is to move the data to external storage. To test the feasibility of this approach, the Topic Map data structure has been mapped to a set of relational database tables and a Topic Map has been imported into a PostgreSQL database.
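One possible mapping of the Topic Map data structure to relational tables is sketched below. The table and column names are assumptions, not the schema actually used, and the in-memory SQLite database from the Python standard library stands in for PostgreSQL so that the example is self-contained.

```python
import sqlite3

# Hypothetical schema: one table per Topic Map construct.
SCHEMA = """
CREATE TABLE topic (id TEXT PRIMARY KEY);
CREATE TABLE occurrence (
    topic_id TEXT NOT NULL REFERENCES topic(id),
    type     TEXT NOT NULL,
    value    TEXT NOT NULL
);
CREATE TABLE association (id INTEGER PRIMARY KEY, type TEXT NOT NULL);
CREATE TABLE member (
    association_id INTEGER NOT NULL REFERENCES association(id),
    role           TEXT NOT NULL,
    topic_id       TEXT NOT NULL REFERENCES topic(id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Store one topic with one occurrence, then read it back.
conn.execute("INSERT INTO topic (id) VALUES (?)", ("u3432",))
conn.execute(
    "INSERT INTO occurrence (topic_id, type, value) VALUES (?, ?, ?)",
    ("u3432", "ucs", "13362"),
)
rows = conn.execute(
    "SELECT type, value FROM occurrence WHERE topic_id = ?", ("u3432",)
).fetchall()
print(rows)
```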

+ +

The connection between XEmacs UTF-2000, the Topic Map engine within Zope and the storage backend can now be established in a triangular way, as shown in Figure 12. The red arrows symbolize updates to the database, while the green arrows stand for data retrieved from the database. Both XEmacs UTF-2000 and the Zope Topic Map engine will be able to commit updates and retrieve data. While the model employed so far assumed direct communication between XEmacs UTF-2000 and the Zope Topic Map engine, this model provides a far more flexible way of communication by introducing another layer between them. This model is also extensible, since more partners can be connected to the database through a set of well defined interfaces, and a cascade of such layers can be built in a distributed way.

+ +

Figure 12: Communication between XEmacs UTF-2000, Zope and the PostgreSQL database

+ +

+ +

While time did not permit a proper change of the backend of the Topic Map engine, this will be a straightforward task that is not expected to require changes to the other layers of the program.

+ +
+ +
+ +
+
Date: Time-stamp: "02/02/13 17:30:32 chris" +  Author: Christian Wittern. +
+
+ + \ No newline at end of file