
Topic Navigation Maps

Topic Navigation Maps - An Overview. By Martin Bryan, The SGML Centre, Churchdown, Gloucestershire, United Kingdom. Email: mtbryan@sgml-u-net.com; WWW: http://www.personal.u-net.com/~sgml/.

As more and more text becomes available on-line as part of the world wide web (WWW) of electronic information, finding information about a specific topic becomes harder and harder. The ambiguities found in most languages mean that few terms have a single meaning. Whilst some words have the same meaning in a number of languages, many meanings have different expressions in different languages. Full-text searching typically fails to distinguish between the different meanings of a word. It also often fails to distinguish between uses of the same word in different languages. As a result, full-text searching often returns so many 'hits' that users do not have time to pick out the information they need from the morass of irrelevant material.

The traditional way of finding information stored in large sets is via structured catalogues of the type used in libraries for many centuries. Cataloguing of data held in libraries has become very sophisticated, but in many cases it is difficult to relate items stored in different catalogues. Many schemes for introducing cataloguing metadata into the WWW environment are being studied, but in general these rely only on assigning searchable keywords to files, without providing any means of defining the relationships that might exist between keywords, or between different sets of keywords.

There are a number of search engines on the WWW that provide a catalogue-based approach to finding data. For reasons related to how best to find information in an interactive environment, on-line search catalogues tend to be broad and relatively shallow. Typically such categorized services are designed for use by those who are not familiar with the terms used by a specialist community, and are designed to identify a broad range of sources that provide a starting point for more in-depth study. Because they deliberately adopt such a policy, category-based search engines often make it difficult for experienced users to find information related to specific disciplines, or to specialist areas within disciplines.

Many user communities have set up web sites whose sole purpose is to provide a start point from which interested parties can visit most of the sites that provide information on a particular topic. At present these pages are constrained to use the limited form of link anchor provided by the HyperText Markup Language (HTML) application of SGML. The main problem with this form of link anchor is that it can only identify a single document, and a single point within each document. This means that multiple references to a subject in a document have to be represented by multiple links. Another complication is that HTML anchors can only identify points within a document that have previously been assigned names by their creator. They do not permit arbitrary points in a document to be identified, nor do they allow users to identify all occurrences of a particular term within a document.

One of the key advantages of the Internet that forms the web is that it allows user communities to develop new concepts faster. New concepts typically involve applying new meanings to existing words. As people are introduced to new concepts they need to be made aware of the new meanings assigned to terms they may have thought they already understood if they are to catch the nuances of the debate. For this reason it is vital that the meaning of key terms be clearly identifiable to the user communities using them. This cannot be done by simply referring the terms to a dictionary. It must be done by creating a link between the term and the places at which it was first used/defined by the relevant user community.

In the past, recording the meaning of words has been the role of specialist lexicographers, whilst devising cataloguing schemes was left to experienced librarians. In today's interactive world, however, such delegation is no longer possible. User communities must clearly identify their own definitions of terms and catalogue, in a consistent manner, where they have used these terms. The role of lexicographers then changes to identifying those points at which terms have been assigned a specific meaning so that a single reference source for all possible meanings of a term can be developed. Similarly the role of librarians becomes that of defining the relationships between terms in such a way that it is possible to identify which terms form a sub-category of a given subject.

For this to be possible it is important that the description of the meaning of terms, and the location of references to these terms, are separated from the definition of the relationships between terms. It is also important that the maintenance of the locations at which a term is used is separated from the maintenance of the definition of the term, as a particular user community should only need to define its meaning of a term once, but will continue to use that term in new documents for a long time.

In addition, the web requires us to make other forms of distinction between terms if we are to enable users to distinguish one use of a term from another when doing full-text searching. Users need to know in which universes of discussion (domains) the term is being used, and for which languages the term is relevant. They may also need to know what the equivalent term is in another language, and when the term started to be, or stopped being, employed with the specified meaning in that language.
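The distinctions listed above can be pictured as fields on a record for each meaning of a term. The following is a minimal sketch only: the class name `TermSense`, its field names, the `concept` identifier used to tie together equivalent terms in different languages, and all sample values (including the dates) are assumptions made for illustration and are not taken from ISO 13250.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class TermSense:
    """One meaning of a term (hypothetical model, not from the standard)."""
    concept: str                       # assumed id shared by equivalent terms
    term: str                          # the word as written
    domain: str                        # universe of discussion
    language: str                      # language the term is relevant for
    valid_from: Optional[date] = None  # None: no known start date
    valid_to: Optional[date] = None    # None: still in use

    def applies(self, domain: str, language: str, on: date) -> bool:
        """True if this sense is relevant for the given scope and date."""
        if self.domain != domain or self.language != language:
            return False
        if self.valid_from is not None and on < self.valid_from:
            return False
        if self.valid_to is not None and on > self.valid_to:
            return False
        return True


def senses_in_scope(senses: List[TermSense], term: str, domain: str,
                    language: str, on: date) -> List[TermSense]:
    """Narrow the senses of a term to those matching domain, language, date."""
    return [s for s in senses
            if s.term == term and s.applies(domain, language, on)]


def equivalents(senses: List[TermSense], sense: TermSense,
                language: str) -> List[str]:
    """Terms in another language that share the same assumed concept id."""
    return [s.term for s in senses
            if s.concept == sense.concept and s.language == language]


# Illustrative sample data only.
SENSES = [
    TermSense("link-anchor", "anchor", "hypertext", "en",
              valid_from=date(1990, 1, 1)),
    TermSense("ship-anchor", "anchor", "sailing", "en"),
    TermSense("link-anchor", "ancre", "hypertext", "fr",
              valid_from=date(1990, 1, 1)),
]

hits = senses_in_scope(SENSES, "anchor", "hypertext", "en", date(1997, 6, 1))
french = equivalents(SENSES, hits[0], "fr")
```

A plain full-text search for "anchor" would match both English entries; scoping the query by domain, language and date leaves only the hypertext sense, from which the equivalent term in another language can be looked up.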

ISO's Topic Navigation Map standard (ISO 13250) provides facilities for creating, maintaining and interchanging topic-based navigational aids to large corpora of documents containing inter-related information. The standard makes a distinction between the highly concentrated and independent topic navigation maps - sets of relations between the topics covered in a given corpus - defined within this standard and the addresses of relevant information within the corpora themselves, which are typically defined using facilities provided by ISO/IEC 10744, which defines the Hypermedia/Time-based Structuring Language known as HyTime.

Topic navigation maps can improve the accessibility of information by facilitating, and to some extent automating, the task of providing navigational resources. Topic navigation maps are designed to simplify groupware-supported production of data for which navigational aids such as indexes, glossaries, tables of contents, lists and catalogs need to be generated. Topic navigation maps can also be used to enhance the navigability of very large information bases by providing in-depth sub-categorization of terminology bases.

Topic navigation maps can be considered as a customized view of an information repository. Different views can be developed by different user communities to allow various points of view to be expressed. Several topic navigation maps can be interconnected to form a more general-purpose knowledge base.

To ensure the maximum possible flexibility of use, the topic navigation map standard is defined in terms of an SGML architecture, using the rules specified in the SGML Extended Facilities annex of ISO/IEC 10744, for creating and maintaining data that classifies information in documents according to topic, and classifies topics with respect to each other. The discipline that can be imposed by using such a formally defined architecture will assist those who create and/or collect libraries of documents, and who wish to provide a given collection with a unified, consistent, and minimally redundant topic index. It should be noted, however, that the concepts behind topic navigation maps are applicable in any coding scheme, and do not need to be restricted to use with documents encoded in SGML.

The Standard Generalized Markup Language (SGML) defined in ISO 8879:1986 allows all kinds of documents to become databases. By providing ways to navigate data stores so that parts of documents that are relevant to a particular topic can be easily found and organized rapidly by machine, the topic navigation map standard augments the widely recognized suitability of SGML for electronic document interchange.

The number and complexity of indexable topics, and the relationships between them, greatly exceed the number and complexity of relations normally represented in traditional databases or, for that matter, in the kinds of indexes normally found in books. The number of topic relationships that might usefully be represented with respect to any reasonably large collection of documents is, in fact, for all practical purposes limitless. Moreover, even in archived documents, new kinds of topic relationships can be expected to appear from time to time. This standard, therefore, is specifically designed to allow multiple topic navigation maps to be created over a period of time for any collection of data, and to allow for different topic navigation maps to be inter-related.

Creating and maintaining indexes can be a difficult and expensive proposition. Many indexes are indexes in name only. All too often, even when an index is well thought out, well constructed, and useful, little thought is given to its maintainability. When the time comes to create an updated or corrected index, the original documentation for the topic architecture of the index is no longer available. Indeed, it may never have existed or have been consciously expressed in any abstract way. Even an index on which enormous maintenance effort has been expended can quite easily become self-inconsistent, especially when the size of the indexing task dictates that it must be a cooperative effort, or when there have been changes in the responsible personnel.

An application-neutral, internationally understandable, rigorous, and yet flexible and open way to represent topical indexes, such as the one set forth in the topic navigation map standard, can help to make indexes easier to make, easier to maintain, and easier to use. Creating topic navigation maps is a complex task, similar to planning and constructing a building, involving myriad assumptions and artistic decisions. As new relationships are discovered and included as part of a topic architecture, the architecture changes. Many specialists may have to collaborate and contribute, over a number of years, to an evolving knowledge base built on topic navigation maps, which at any given time must unambiguously and comprehensibly govern all maintenance activities. Unless those who are adding and/or maintaining anchors have clear guidance, the instantiation of those topic navigation maps - the index itself - may become unsound and unsafe. It is important, therefore, that the rules applied for the creation of a topic navigation map are stored as part of the specification.

A topic navigation map defines both topics and the relations that they bear to one another. It must, therefore, permit:

to be represented, universally interchanged, processed, merged, and used for data navigation. An international standard for representing (among many other things) arbitrary relationships between arbitrary pieces of information, wherever they are in situ, exists in ISO/IEC 10744. The topic navigation map standard uses a HyTime-based approach for linking topics with information, and defines an SGML architecture that can support applications that provide:

Topic navigation maps are defined using TNM.SemanticAssignment-form elements whose roles are defined by their user communities, and TNM.TopicRelation-form elements that identify specific relations between topics. Categories of topics may be interactively identified and described by linking suitable topics to other topics belonging to the category.



Figure 1: Defining topics and their relationships

A topic navigation map is created by linking, using HyTime hyperlinks, several pieces of information about a topic through a semantic assignment. Each semantic assignment has an anchor roles (anchrole) attribute that defines the relationship between a topic and the references that are made to it. The first anchor role identified by the anchrole attribute identifies titles by which the topic may be referenced. The second anchor role can be used to identify a formal definition of the topic.

Other anchor roles will identify application-dependent sets of uses of the topic. The role description (roledesc) attribute can be used to associate information about the roles of each anchor with the element defining them to help users in correctly identifying the information to be associated with each role. Users have the option of creating one description covering all of the roles or creating individual descriptions for each role.
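The anchor-role convention described above can be sketched as a small data structure. This is a rough illustration rather than the standard's own model: the class `SemanticAssignment`, its method names, the role names and the locator strings are all hypothetical, and real locators would be HyTime location addresses rather than plain strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SemanticAssignment:
    """Hypothetical in-memory model of a semantic assignment.

    anchrole: ordered role names; following the text, the first role holds
              titles and the second a formal definition.
    roledesc: optional description per role, to guide maintainers.
    anchors:  locators (plain strings in this sketch) filed under each role.
    """
    anchrole: List[str]
    roledesc: Dict[str, str] = field(default_factory=dict)
    anchors: Dict[str, List[str]] = field(default_factory=dict)

    def add_anchor(self, role: str, locator: str) -> None:
        """File a locator under a declared role; reject undeclared roles."""
        if role not in self.anchrole:
            raise ValueError(f"unknown anchor role: {role!r}")
        self.anchors.setdefault(role, []).append(locator)

    def titles(self) -> List[str]:
        """Anchors filed under the first role (titles of the topic)."""
        return self.anchors.get(self.anchrole[0], [])

    def definition(self) -> List[str]:
        """Anchors filed under the second role (formal definition), if any."""
        if len(self.anchrole) < 2:
            return []
        return self.anchors.get(self.anchrole[1], [])


# Illustrative usage with made-up role names and locators.
topic = SemanticAssignment(
    anchrole=["title", "definition", "mention"],
    roledesc={"mention": "Places where the community uses the term"},
)
topic.add_anchor("title", "Topic navigation map")
topic.add_anchor("definition", "glossary.sgm#tnm")
topic.add_anchor("mention", "overview.sgm#para23")
topic.add_anchor("mention", "overview.sgm#para27")
```

Keeping the definition anchor separate from the list of mentions reflects the earlier point that a community defines a term once but goes on adding new occurrences for a long time.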

Topics can be linked together through topic relation elements. Each topic relation element has two anchor roles which identify two attributes containing the locators used to find the topics the relationship is to interconnect. It also has two relationship names (rname1-2 and rname2-1) which describe the relationship between the topics when the relationship is traversed from one topic to another. Optionally a description of the relationship (rel-desc) can be associated with each relationship, or a single description can be used to describe both relationships.
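As a minimal sketch of how a topic relation reads differently depending on the direction of traversal, the following models one relation with its two relationship names. The class and method names are hypothetical, Python identifiers use underscores where the text writes rname1-2, rname2-1 and rel-desc, and topics are represented by plain strings instead of locators.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TopicRelation:
    """Hypothetical model of a topic relation between two topics."""
    topic1: str         # stands in for the locator of the first anchor
    topic2: str         # stands in for the locator of the second anchor
    rname1_2: str       # name of the relationship read from topic1 to topic2
    rname2_1: str       # name of the relationship read from topic2 to topic1
    rel_desc: str = ""  # optional description of the relationship

    def describe_from(self, topic: str) -> Tuple[str, str]:
        """Return (relationship name, other topic) as seen from `topic`."""
        if topic == self.topic1:
            return self.rname1_2, self.topic2
        if topic == self.topic2:
            return self.rname2_1, self.topic1
        raise ValueError(f"{topic!r} is not an anchor of this relation")


# Illustrative relation; the wording of the names is an assumption.
rel = TopicRelation(
    topic1="HyTime",
    topic2="SGML",
    rname1_2="is an application of",
    rname2_1="is the basis of",
)

forward = rel.describe_from("HyTime")
backward = rel.describe_from("SGML")
```

A navigation tool built on such a structure could show "HyTime is an application of SGML" when the user starts from HyTime and "SGML is the basis of HyTime" when starting from SGML, without storing the relation twice.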

Queries can be used to create topics based on relationship types. For example, all elements with a specified value in the rname1-2 or rname2-1 attribute can be treated as a single topic. This feature can be especially useful in creating categories of topics:



Figure 2: Defining multilingual relationships
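The idea of deriving a topic from a query on relationship names, described before Figure 2, can be sketched as follows. This is an illustration under stated assumptions: the `Relation` tuple, the two function names and the sample relations are hypothetical, and a real application would run the query over SGML elements rather than Python tuples.

```python
from typing import Iterable, List, NamedTuple, Set


class Relation(NamedTuple):
    """Hypothetical stand-in for a topic relation element."""
    topic1: str
    topic2: str
    rname1_2: str   # relationship name read from topic1 to topic2
    rname2_1: str   # relationship name read from topic2 to topic1


def select_relations(relations: Iterable[Relation], name: str) -> List[Relation]:
    """Relation elements carrying `name` in either relationship-name slot."""
    return [r for r in relations if name in (r.rname1_2, r.rname2_1)]


def category_members(relations: Iterable[Relation], name: str) -> Set[str]:
    """Topics that stand on the 'from' side of the named relationship."""
    members: Set[str] = set()
    for r in relations:
        if r.rname1_2 == name:
            members.add(r.topic1)
        if r.rname2_1 == name:
            members.add(r.topic2)
    return members


# Illustrative relations only.
RELATIONS = [
    Relation("HyTime", "SGML", "is an application of", "is the basis of"),
    Relation("SGML", "HTML", "is the basis of", "is an application of"),
    Relation("HyTime", "ISO/IEC 10744", "is defined in", "defines"),
]

selected = select_relations(RELATIONS, "is an application of")
members = category_members(RELATIONS, "is an application of")
```

The selected relation elements together play the part of a single derived topic, and the member set gives the category it implies, here the things recorded as applications of something else.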

ISO's Topic Navigation Map standard (ISO 13250) is currently only available in Committee Draft form from national standards bodies. The final standard is expected to be available at the end of 1998.

NOTE: For further information, see the principal Web Site for Topic Navigation Maps.


Copyright © 1997 International SGML Users' Group