X-Git-Url: http://git.chise.org/gitweb/?a=blobdiff_plain;f=papers%2Fmitou-2001-report%2Fmain%2Fnode5.html;fp=papers%2Fmitou-2001-report%2Fmain%2Fnode5.html;h=a5cbfc53aa00c17948d0603cd65ac2ef45f9a9e5;hb=1719044f96038f7b46acada68efffc7c7bb279e8;hp=0000000000000000000000000000000000000000;hpb=0d7256d0f62d646161c6b9348efe69331e8e7fb6;p=www%2Fchise.git diff --git a/papers/mitou-2001-report/main/node5.html b/papers/mitou-2001-report/main/node5.html new file mode 100644 index 0000000..a5cbfc5 --- /dev/null +++ b/papers/mitou-2001-report/main/node5.html @@ -0,0 +1,684 @@ + + +
The Topic Map Standard "provides a standardized notation for interchangeably representing information about the structure of information resources used to define topics, and the relationships between topics." A set of one or more interrelated documents that employs the notation defined by this International Standard is called a `topic map'. In general, the structural information conveyed by topic maps includes:
A topic map defines a multidimensional topic space -- a space in which the locations are topics, and in which the distances between topics are measurable in terms of the number of intervening topics which must be visited in order to get from one topic to another, and the kinds of relationships that define the path from one topic to another, if any, through the intervening topics, if any. (from the document defining Topic Maps as ISO/IEC 13250:2000)
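The notion of distance in the quotation above can be sketched as a breadth-first search over associations. This is a minimal illustration, not part of the standard or of the prototype: the function name `topic_distance` and the representation of associations as plain pairs of topic names are assumptions made for the example.

```python
from collections import deque


def topic_distance(associations, start, goal):
    """Return the number of intervening topics on a shortest path
    between start and goal, or None if no path exists.

    associations is an iterable of (topic, topic) pairs (hypothetical
    representation; real associations also carry a type and roles).
    """
    neighbours = {}
    for a, b in associations:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        topic, hops = queue.popleft()
        for nxt in neighbours.get(topic, ()):
            if nxt == goal:
                # hops counts the topics visited between start and goal
                return hops
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None
```

Two directly associated topics are at distance 0 under this reading, since no intervening topic has to be visited. The kinds of relationships along the path, which the standard also mentions, are ignored here.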
The standard as published in 2000 includes a serialization format specification in the form of a Document Type Definition (DTD), originally using HyTime (ISO/IEC 10744:1992) Architectural Forms and an SGML (ISO/IEC 8879:1986) based syntax. An independent group of vendors started development of an XML-based version, which was published as XTM 1.0 in December 2000 and adopted by ballot as an amendment to the ISO standard in December 2001. The development described here is based on the XML syntax, whose elements and structure differ considerably from those of the SGML syntax.
Since SGML/XML-based formats are verbose (especially XTM 1.0) and awkward to work with, other formats have been suggested, including the `Asymptotic Topic Map Notation, Authoring' (AsTMA, by Robert Barta) and the `Linear Topic Map Notation' (LTM, by Lars Marius Garshol). Both are essentially line-based and can easily be edited in UTF-2000 and other editors.
Besides defining a serialization format for the exchange of information, the Topic Map standard also includes constructs that are intended to facilitate this exchange. One of the most important tasks is to reliably identify identical pieces of information across different sources. To this end, the standard lays down rules for subsetting and merging topic maps. Topics can be defined with reference to Published Subject Indicators (PSI), which function in a similar way to XML Namespaces.
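The role of subject indicators in merging can be sketched as follows. This is a simplified illustration under stated assumptions: topics are represented as plain dictionaries with `indicators` and `names` keys (a hypothetical structure, not the one used by the prototype), and only merging on shared subject indicators is shown; name-based merging is left out.

```python
def merge_by_subject(topics):
    """Merge topics that share at least one subject indicator.

    Each topic is a dict with 'indicators' and 'names' (hypothetical
    keys). Returns a list of merged topics whose indicator sets are
    pairwise disjoint.
    """
    merged = []
    for topic in topics:
        indicators = set(topic["indicators"])
        names = set(topic["names"])
        remaining = []
        for other in merged:
            if other["indicators"] & indicators:
                # Same subject: absorb the earlier topic into this one
                indicators |= other["indicators"]
                names |= other["names"]
            else:
                remaining.append(other)
        remaining.append({"indicators": indicators, "names": names})
        merged = remaining
    return merged
```

With two sources that describe the same subject under different names, the merged topic carries both names and the union of the indicators, while unrelated topics stay separate.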
Characters are used in scripts for the writing of languages, and languages are distributed over different areas. The exact form of these characters, as well as their phonetic representation, changes over time and area. The adaptation of the Topic Map paradigm to a character database tries to use these different axes to organise characters in a way that is appropriate to the domain in which they are encountered. Characters are thus not only objects in their own right; these objects are also organized in `super-class / sub-class' and `class / instance' hierarchies.
The topic map currently contains information along the following axes:
While most of these are organized as occurrences of the
An example may illustrate this. When viewed within the UTF-2000 framework, the character U+03432 might have attributes similar to those shown in Figure 4.
Figure 4: The character U+03432 displayed with the `what-char-definition' function.
Transformed to the topic map notation, the attributes of the same character will look similar to Figure 5. The content has not changed, only the notation: within the <occurrence> element, the attributes resemble key / value pairs. What is not visible here, however, is the underlying structure that has been used to define the topic map.
Figure 5: The attributes of character U+03432 in Topic Map notation.
It should also be noted that the attributes under `ideographic-structure' are not listed as occurrences. These attributes are expressed using separate topics for the character components, with <association> elements connecting them, as shown in Figure 6.
Figure 6: The ideographic-structure of character U+03432 in Topic Map notation.
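The two representations described above, attributes as occurrences and ideographic structure as associations, can be sketched with XTM 1.0 element names. This is an illustration only: the topic identifiers, the attribute name `total-strokes` and its value, and the role names `whole` and `component` are hypothetical and are not taken from the figures.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"


def topic_ref(parent, target):
    """Append a <topicRef> pointing at the topic with the given id."""
    ref = ET.SubElement(parent, "topicRef")
    ref.set("{%s}href" % XLINK, "#" + target)
    return ref


def character_topic(topic_id, attributes):
    """Build a <topic> whose occurrences act as key / value pairs."""
    topic = ET.Element("topic", id=topic_id)
    for key, value in attributes.items():
        occ = ET.SubElement(topic, "occurrence")
        # The key becomes the occurrence type, the value its data
        topic_ref(ET.SubElement(occ, "instanceOf"), key)
        ET.SubElement(occ, "resourceData").text = str(value)
    return topic


def structure_association(whole, components):
    """Build an <association> linking a character to its components."""
    assoc = ET.Element("association")
    topic_ref(ET.SubElement(assoc, "instanceOf"), "ideographic-structure")
    players = [("whole", whole)] + [("component", c) for c in components]
    for role, player in players:
        member = ET.SubElement(assoc, "member")
        topic_ref(ET.SubElement(member, "roleSpec"), role)
        topic_ref(member, player)
    return assoc
```

Calling `ET.tostring(character_topic("u3432", {"total-strokes": 6}))` serializes the topic. Each component would be a topic of its own, which is what allows the same component to take part in the structure of many characters.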
Zope (Z Object Publishing Environment) is an object-oriented Web application server developed by Zope Corporation (formerly Digital Creations) using a community-based open-source development model. It is written in Python, with only a few critical parts in C. Although it is mostly regarded as an environment for rapid development of dynamic Web content, it is originally and foremost an environment for publishing objects. The underlying storage is an object-oriented database, which makes it well suited for storing hierarchical data structures like a Topic Map.
Since Zope acts as a Web server, it can also be seen as a networked database. It can be accessed through HTTP, but also through WebDAV and XML-RPC. One advantage of a Zope-based implementation is thus that it can be used as a distributed editing environment and at the same time act as a backend to be accessed from XEmacs UTF-2000.
Since some of the concepts of Topic Maps are quite new and not yet fully fleshed out in the Topic Map community, some of the more arcane features will not be covered by this prototype. For example, the Topic Maps Query Language (TMQL) is still at the requirements stage, and no consensus has been reached on what it means to query a topic map. Instead of more demanding Topic Map queries, which might involve inferences and other Topic Map calculus, searches will directly access the data in the Topic Map. Merging directives, which are problematic among other things because of the `Topic Map Basename Constraint' (TMBC), are not initially supported.
The prototype should be able to:
Zope is extended in functionality by developing add-on modules, called `Products' in Zope parlance. Products can be developed within the Zope database based on ZClasses, or as file-system based Python classes. In a first implementation, ZClasses have been used.
In this implementation, four classes have been used to represent the different objects of a Topic Map:
Figure 7: The Zope Management screen with the ZClasses under development
This approach turned out to induce a large overhead for the data and proved problematic for Topic Maps with more than approximately 1000 topics and associations. For this reason, it has been abandoned.
The next logical step was to use a native Python Product instead of the ZClasses. This should give better performance, since less overhead is involved, and it also allows greater flexibility in the data structures. An additional advantage is that a more efficient development environment could be used, because the source is on the file system and not in the Zope database.
Performance improved slightly, but not as much as hoped for. It also turned out that some flaws in the data structure defined for the Python classes did not allow the full expressive power required for Topic Maps in XTM 1.0.
Around this time, development activity resumed on the Zope ParsedXML product, which is the Zope product that provides XML functionality. Since an XML Document Object Model (DOM) tree shares some similarity with the Zope DOM (ZDOM) used to store the Zope objects, it was expected that this approach might scale better. An additional advantage was that Zope procedures could be used to directly expose XML elements in DTML (Document Template Markup Language). For this reason, it was decided to start once again, this time with ZClasses using the ParsedXML product.
Development of this prototype had progressed for quite some time when it was realized that the support for Unicode in Zope, which was introduced in Zope 2.4.0, had some flaws. While UTF-8 could be used without problems in previous versions, the partial support for Unicode meant that Python UnicodeStrings could in some cases be cast as AsciiStrings, which would crash the process. While some patches became available and development of the Zope core continued to address this problem, it remained acute even with the recent 2.5.0 release and will probably only be resolved in the upcoming Zope 3.0 release, which will be a major rewrite.
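The class of failure described above can be reproduced in a few lines. The sketch below uses current Python 3 rather than the Python and Zope versions of the time, and makes the conversion explicit, whereas in the situation described it happened implicitly inside Zope; the function names are hypothetical.

```python
def to_ascii_bytes(text):
    """Encode text as ASCII; raises UnicodeEncodeError for any
    character outside the ASCII range."""
    return text.encode("ascii")


def to_utf8_bytes(text):
    """Encode text as UTF-8, which can represent every character."""
    return text.encode("utf-8")


def is_ascii_safe(text):
    """Report whether text would survive a cast to ASCII."""
    try:
        to_ascii_bytes(text)
    except UnicodeEncodeError:
        return False
    return True
```

For the character U+03432 used in the examples, `is_ascii_safe("\u3432")` is false, while `to_utf8_bytes("\u3432")` yields the three-byte sequence E3 90 B2. An unguarded ASCII conversion anywhere in the processing chain is therefore enough to make a request fail.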
While the improvement of the support for Unicode within Zope is important, it remains outside the scope and timeframe of this project. As a temporary measure, therefore, no Unicode characters can be used in the TopicMap engine. This is unfortunate, since the XML standard explicitly requires conformant XML processors to support at least UTF-8 and UTF-16, but nothing can be done about it at the moment. The situation will improve with the arrival of a fully Unicode-compliant version of Zope.
When a new Topic Map has been created or imported into the Zope Topic Map engine, it can be explored on the Topic Map overview screen, as shown in Figure 8.
Figure 8: The Topic Map overview screen
This screen is divided into several parts. The top frame provides a general interface to manage the display of the Topic Map; it is also here that other Topic Maps can be selected. This part also allows the addition of new topics as well as global searches over the Topic Map. The frame on the left is for navigating the Topic Map. By default, it shows a list of topics in the topic map. Since this list can potentially be very long, the default length is set to 20; if there are more topics, the list will be displayed in batches. The list can be narrowed down in various ways:
The main frame initially shows brief information about the Topic Map engine; it is also used to display the topic details, as shown in Figure 9.
Figure 9: The details of a topic
The Topic Map engine can be used not only to browse the Topic Map, but also to add and edit topics, occurrences and associations. A click on the `Add' button in the upper right area of Figure 8 will lead to the entry screen in Figure 10.
Figure 10: The entry form for new topics
Occurrences for topics can be added from the topic details screen, as shown in Figure 9. Associations can be added by checking the topics to be associated in the list of topics in the left frame and then clicking on the `Add Association' button.
The interface to the Topic Map as developed here is generic and rather primitive. It does, however, allow Topic Maps to be developed and maintained in a distributed way. Because of its generic nature, it is cumbersome to use for specific Topic Maps, since it is not aware of topics that might be defined as Topic Map templates. Since there is not yet a standardized way to define Topic Map templates, automatic generation of a customized user interface for specific Topic Maps will have to wait until such a definition is finalized.
Besides the browser-based user interface described in the previous section, the Zope Topic Map engine can also be accessed from XEmacs UTF-2000. This can be done through XML-RPC, WebDAV or HTTP. The returned values can be formatted as XML, as HTML or as a list in LISP syntax.
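The third return format, a list in LISP syntax, can be sketched as a small serializer. This is a hypothetical illustration of what such a formatter might look like on the Python side, not the code of the prototype; it assumes that dictionary keys are valid Lisp symbol names and renders dictionaries as association lists.

```python
def to_lisp(value):
    """Render a Python value as a Lisp expression that an Emacs Lisp
    reader can parse: dicts become association lists, sequences
    become lists, strings are quoted."""
    if isinstance(value, dict):
        pairs = ["(%s . %s)" % (key, to_lisp(val)) for key, val in value.items()]
        return "(" + " ".join(pairs) + ")"
    if isinstance(value, (list, tuple)):
        return "(" + " ".join(to_lisp(item) for item in value) + ")"
    if isinstance(value, bool):
        # Checked before int, because bool is a subclass of int
        return "t" if value else "nil"
    if isinstance(value, (int, float)):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped
```

A result in this form can be consumed on the XEmacs side with the standard `read` function, without an XML parser, which is the practical appeal of this format for an Emacs client.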
Currently, the following commands are implemented (parameters are key/value pairs that are submitted using the appropriate syntax):
Retrieval
Maintenance
This is a very low-level interface that will need to be complemented with higher-level commands to integrate it with the overall workings of XEmacs and the XEmacs UTF-2000 character database.
The goal of developing a complete Topic Map engine based on Zope has not been reached. This has been partly due to the development process, which had to confront some fundamental, so far unsolved issues of processing Topic Maps. While the goal of developing a generic Topic Map engine is worthwhile and important, it proved to be too ambitious for the context of this project. We therefore had to settle for a solution that works well in this context, and we are confident that it will be possible to generalize from there.
It has also been realized that Zope may not be a suitable platform for holding the potentially very large data of a Topic Map. Using an external database for this purpose would be better.
The current model of implementing the Topic Map engine and interfacing it with XEmacs UTF-2000 is based on a two-way connection.
Storing the Topic Map in the Zope object database proved to be a performance bottleneck. The logical way to solve this problem is to move the data to external storage. To test the feasibility of this approach, the Topic Map data structure has been mapped to a set of relational database tables, and a Topic Map has been imported into a PostgreSQL database.
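A mapping of this kind can be sketched as follows. The table and column names are assumptions made for the illustration and do not reproduce the schema actually used; SQLite from the Python standard library stands in for PostgreSQL so that the example is self-contained.

```python
import sqlite3

# Hypothetical schema: one table per Topic Map construct
SCHEMA = """
CREATE TABLE topic (id TEXT PRIMARY KEY, base_name TEXT);
CREATE TABLE occurrence (
    topic_id TEXT NOT NULL REFERENCES topic(id),
    type TEXT NOT NULL,
    value TEXT
);
CREATE TABLE association (id INTEGER PRIMARY KEY, type TEXT NOT NULL);
CREATE TABLE member (
    association_id INTEGER NOT NULL REFERENCES association(id),
    role TEXT NOT NULL,
    topic_id TEXT NOT NULL REFERENCES topic(id)
);
"""


def open_store(path=":memory:"):
    """Open a database and create the Topic Map tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_topic(conn, topic_id, base_name, occurrences):
    """Insert a topic together with its key / value occurrences."""
    conn.execute("INSERT INTO topic VALUES (?, ?)", (topic_id, base_name))
    conn.executemany(
        "INSERT INTO occurrence VALUES (?, ?, ?)",
        [(topic_id, key, str(val)) for key, val in occurrences.items()],
    )


def associate(conn, assoc_type, members):
    """Insert an association; members is a list of (role, topic_id)."""
    cur = conn.execute("INSERT INTO association (type) VALUES (?)", (assoc_type,))
    assoc_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO member VALUES (?, ?, ?)",
        [(assoc_id, role, topic_id) for role, topic_id in members],
    )
    return assoc_id


def components_of(conn, topic_id):
    """Return the topics playing 'component' in associations where
    topic_id plays 'whole' (role names are hypothetical)."""
    rows = conn.execute(
        """SELECT part.topic_id FROM member AS whole
           JOIN member AS part ON part.association_id = whole.association_id
           WHERE whole.topic_id = ? AND whole.role = 'whole'
             AND part.role = 'component'
           ORDER BY part.topic_id""",
        (topic_id,),
    )
    return [row[0] for row in rows]
```

Keeping members in a table of their own lets an association have any number of role players, which matches the n-ary associations of the Topic Map model better than a fixed pair of columns would.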
The connection between XEmacs UTF-2000, the Topic Map engine within Zope and the storage backend can now be established in a triangular way, as shown in Figure 12. The red arrows symbolize updates to the database, while the green arrows stand for data that are retrieved from the database. Both XEmacs UTF-2000 and the Zope Topic Map engine will be able to commit updates and retrieve data. While the model employed so far assumed direct communication between XEmacs UTF-2000 and the Zope Topic Map engine, this model provides a far more flexible way of communication by introducing another layer between them. This model is also extensible, since more partners can be connected to the database through a set of well-defined interfaces, and a cascade of such layers can be built in a distributed way.
Figure 12: Communication between XEmacs UTF-2000, Zope and the PostgreSQL database
While time did not permit a proper change of the backend of the Topic Map engine, this is expected to be a straightforward task that should not require changes to the other layers of the program.