International SGML/XML Users' Group


An introduction to markup

SGML and XML are metalanguages, that is, languages for describing other languages. They let users design their own customized markup languages for an unlimited variety of document types.

SGML is very large, powerful, and complex. It has been in heavy industrial and commercial use for over a decade, and there is a significant body of expertise and software to go with it. XML is a lightweight cut-down version of SGML which keeps enough of its functionality to make it useful but removes all the optional features which make SGML too complex to program for in a Web environment.

HTML is just one application of SGML (XHTML is its XML-based counterpart), and it is the one most frequently used on the Web.

The Web is becoming much more than a static library. Increasingly, users are accessing the Web for 'Web pages' that aren't actually on the shelves. Instead, the pages are generated dynamically from information available to the Web server. That information can come from databases on the Web server, from the site owner's enterprise databases, or even from other Web sites.

And that dynamic information needn't be served up raw. It can be analyzed, extracted, sorted, styled, and customized to create a personalized Web experience for the end-user. To coin a phrase, Web pages are evolving into Web services.

For this kind of power and flexibility, XML is the markup language of choice. You can see why by comparing XML and HTML. Both are based on SGML - but the difference is immediately apparent:

In HTML:

    <p>Apple Titanium Notebook
    <br>Local Computer Store
    <br>$1438

In XML:

    <product>
      <model>Apple Titanium Notebook</model>
      <dealer>Local Computer Store</dealer>
      <price>$1438</price>
    </product>

Both of these may look the same in your browser, but the XML data is smart data. HTML tells how the data should look, but XML tells you what it means. With XML, your browser knows there is a product, and it knows the model, dealer, and price. From a group of these it can show you the cheapest product or closest dealer without going back to the server.
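
As a rough illustration of "showing the cheapest product without going back to the server", the following Python sketch parses a small group of product elements and picks the lowest price. The `<catalog>` wrapper element, the second and third dealers, and their prices are assumptions made up for this example; only the first product comes from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: the <catalog> wrapper and the last two products are
# invented for illustration; only the first product appears in the article.
CATALOG = """
<catalog>
  <product>
    <model>Apple Titanium Notebook</model>
    <dealer>Local Computer Store</dealer>
    <price>$1438</price>
  </product>
  <product>
    <model>Apple Titanium Notebook</model>
    <dealer>Mail Order Depot</dealer>
    <price>$1399</price>
  </product>
  <product>
    <model>Apple Titanium Notebook</model>
    <dealer>Campus Shop</dealer>
    <price>$1475</price>
  </product>
</catalog>
"""


def price_of(product):
    """Return the product's price as a float, dropping the leading '$'."""
    return float(product.findtext("price").lstrip("$"))


def cheapest(catalog_xml):
    """Return (model, dealer, price) for the lowest-priced product."""
    root = ET.fromstring(catalog_xml)
    best = min(root.iter("product"), key=price_of)
    return best.findtext("model"), best.findtext("dealer"), price_of(best)


print(cheapest(CATALOG))
```

Because the price sits in its own element, the program never has to guess which line of text is the price, which is the point the comparison with HTML is making.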

Unlike HTML, with XML you create your own tags, so they describe exactly what you need to know. Because of that, your client-side applications can access data sources anywhere on the Web, in any format. New "middle-tier" servers sit between the data sources and the client, translating everything into your own task-specific XML.
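
One way to picture such a middle tier is a small translator that reads a data source in its native format and emits task-specific XML. The sketch below assumes a comma-separated dealer feed; the feed, its column names (`item`, `seller`, `cost`), and the `<catalog>` wrapper are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical upstream feed; the column names are invented for this sketch.
DEALER_FEED = (
    "item,seller,cost\n"
    "Apple Titanium Notebook,Local Computer Store,1438\n"
)


def feed_to_product_xml(feed_text):
    """Translate each CSV row into a <product> element and return XML text."""
    catalog = ET.Element("catalog")
    for row in csv.DictReader(io.StringIO(feed_text)):
        product = ET.SubElement(catalog, "product")
        ET.SubElement(product, "model").text = row["item"]
        ET.SubElement(product, "dealer").text = row["seller"]
        ET.SubElement(product, "price").text = "$" + row["cost"]
    return ET.tostring(catalog, encoding="unicode")


print(feed_to_product_xml(DEALER_FEED))
```

The client only ever sees the `<product>` vocabulary, whatever shape the original source had.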

But XML data isn't just smart data; it's also a smart document. That means when you display the information, the model name can be in a different font from the dealer name, and the lowest price can be highlighted in green. In HTML, text is just text to be rendered in a uniform way; in XML, the text is smart, so its structure can control the rendition.
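
To make the rendition idea concrete, here is a minimal Python sketch that turns product elements into HTML, giving the model and dealer different fonts and marking the lowest price in green. In practice a stylesheet language would normally do this job; the hand-written renderer, the `<catalog>` wrapper, and the second product are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical data: only the first product comes from the article.
PRODUCTS = """<catalog>
  <product><model>Apple Titanium Notebook</model><dealer>Local Computer Store</dealer><price>$1438</price></product>
  <product><model>Apple Titanium Notebook</model><dealer>Mail Order Depot</dealer><price>$1399</price></product>
</catalog>"""


def dollars(product):
    """Return the product's price as a float, dropping the leading '$'."""
    return float(product.findtext("price").lstrip("$"))


def render(catalog_xml):
    """Render each product as one HTML paragraph, styled by element type."""
    products = ET.fromstring(catalog_xml).findall("product")
    lowest = min(dollars(p) for p in products)
    rows = []
    for p in products:
        # Only the lowest price gets the green highlight.
        price_style = ' style="color: green"' if dollars(p) == lowest else ""
        rows.append(
            '<p><span style="font-family: serif">{}</span> '
            '<span style="font-family: sans-serif">{}</span> '
            "<span{}>{}</span></p>".format(
                escape(p.findtext("model")),
                escape(p.findtext("dealer")),
                price_style,
                escape(p.findtext("price")),
            )
        )
    return "\n".join(rows)


print(render(PRODUCTS))
```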

And you don't have to decide whether your information is data or documents; in XML, it is always both at once. You can do data processing or document processing or both at the same time. With that kind of flexibility, it's no wonder that we're starting to see a new Web of smart, structured information. It's a "Semantic Web" in which computers understand the meaning of the data they share.

A DTD (Document Type Definition) is a formal description, in XML Declaration Syntax, of a particular type of document. It sets out what names are to be used for the different types of element, where they may occur, and how they all fit together.
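
For the product example, a DTD might look like the one embedded in the document below. This particular DTD is an assumption written for illustration, not one published with the article. The Python sketch uses the standard library's expat binding, which reads the declarations but does not validate the document against them; checking validity would need a validating parser.

```python
import xml.parsers.expat

# Hypothetical DTD for the article's product example, given as an
# internal subset so that document and declarations travel together.
DOCUMENT = """<!DOCTYPE product [
  <!ELEMENT product (model, dealer, price)>
  <!ELEMENT model   (#PCDATA)>
  <!ELEMENT dealer  (#PCDATA)>
  <!ELEMENT price   (#PCDATA)>
]>
<product>
  <model>Apple Titanium Notebook</model>
  <dealer>Local Computer Store</dealer>
  <price>$1438</price>
</product>
"""


def declared_elements(xml_text):
    """Return the element names declared in the document's DTD."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    # expat calls this handler once per <!ELEMENT ...> declaration;
    # the content model argument is ignored in this sketch.
    parser.ElementDeclHandler = lambda name, model: names.append(name)
    parser.Parse(xml_text, True)
    return names


print(declared_elements(DOCUMENT))
```

The first declaration says that a `product` contains exactly one `model`, one `dealer`, and one `price`, in that order; the other three say that each of those holds plain character data.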

The XML Specification explicitly says XML uses ISO 10646, the international standard 31-bit character repertoire which covers most human (and some non-human) languages. This is currently congruent with Unicode and is planned to be a superset of Unicode.
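
A small Python sketch of what this means in practice: characters outside ASCII can appear directly in a document or be written as numeric character references, and the parser hands back the same characters either way. The element contents here are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical elements: a euro sign written as a numeric character
# reference, and an accented dealer name stored as UTF-8 bytes.
price = ET.fromstring("<price>&#x20AC;1438</price>")
dealer_bytes = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<dealer>Caf\u00e9 Informatique</dealer>"
).encode("utf-8")
dealer = ET.fromstring(dealer_bytes)

# The reference &#x20AC; resolves to the character U+20AC.
print(hex(ord(price.text[0])))
print(dealer.text)
```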

Copyright 2002 ISUG