








What follows is the text of a printed handout cobbled together the night before the release of the first public working draft of the XML specification at the GCA SGML conference in November, 1996. Its intended audience consisted of the SGML experts attending the conference. This flyer was never meant for wider distribution and will probably be a bit puzzling to readers who don't come from the SGML community at which it was aimed. While it's still interesting as the relic of a significant historical moment in the development of XML, its characterization of the differences between XML and SGML is rather dated. A much better comparison between SGML and the final XML 1.0 Recommendation of February, 1998 can be found at http://www.w3.org/TR/NOTE-sgml-xml-971215 . -- Jon Bosak
Q: What does "XML" stand for?
A: "Extensible Markup Language." The name emphasizes the key feature of the language as it will be seen by an HTML user: the ability to define your own tags and attributes, which HTML does not allow.
Q: So XML is just HTML on steroids?
A: No. XML is a true subset of SGML designed for use on the Internet. It supports all the structural and validation features that we expect from SGML. Valid XML documents are valid SGML documents.
Q: Is XML something like "monastic SGML"?
A: Yes, XML bears a strong resemblance to the kind of restricted SGML that a lot of organizations already produce as a matter of policy. The most noticeable feature of XML documents is that all of the end-tags are present and they all contain the GI (generic identifier) of the element that they close. Since many organizations require production SGML documents to conform to this rule anyway, a lot of existing SGML documents are very close to being XML documents just the way they stand.
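As a rough illustration of the end-tag rule, the Python sketch below runs two small fragments through the standard-library parser xml.etree.ElementTree. The element names (doc, para) and the helper is_well_formed are made up for this example, and the parser implements the later XML 1.0 Recommendation rather than the 1996 draft described here, so treat it as an approximation.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as XML, False otherwise (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Every end-tag is present and names the element it closes.
normalized = "<doc><para>First</para><para>Second</para></doc>"

# SGML-style minimization: the </para> end-tags have been omitted.
minimized = "<doc><para>First<para>Second</doc>"

print(is_well_formed(normalized))  # True
print(is_well_formed(minimized))   # False
```

The second fragment is the sort of thing an SGML parser could accept given a suitable DTD with end-tag omission enabled; an XML parser has no such option and rejects it.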
Q: That doesn't sound like much of a difference. What's the big deal?
A: Well, let's put it this way. The standard SGML reference is almost 500 pages long, plus about another 100 pages of annexes. The current XML specification is 26 pages, not counting the list of contributors.
Q: Wow! There must be a lot of stuff missing. What got taken out?
A: Basically, all the SGML features that make SGML client software difficult to implement. The most obvious of these hard-to-implement features are the ones that were put into the standard years ago to minimize keystrokes in manual entry: omitted start- and end-tags and omitted quotes on attribute values. The other ones are CONCUR, LINK, DATATAG, RANK, and SHORTREF; the "and" connector (&) in content models; inclusions and exclusions on content models; CURRENT and CONREF defaults for attributes; the attribute types NAME, NUMBER, NUMBERS, NUTOKEN, and NUTOKENS; the NET construct; abstract syntax; capacities and quantities; comments within other markup declarations; multiple comments in a single comment declaration; and public identifiers.
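To make two of the removed keystroke-saving features concrete, here is a small Python sketch that feeds a quoted attribute, an unquoted attribute, and a NET-style shorthand to the standard-library XML parser. The sample markup (doc, item, rank) and the check function are invented for illustration, and the parser follows the later XML 1.0 Recommendation, not the 1996 draft.

```python
import xml.etree.ElementTree as ET

# Illustrative fragments only; the element and attribute names are assumptions.
samples = {
    "quoted attribute": '<doc><item rank="1">text</item></doc>',
    "unquoted attribute": "<doc><item rank=1>text</item></doc>",
    "NET shorthand": "<doc><item/text/</doc>",
}


def check(fragments):
    """Map each label to 'accepted' or 'rejected' by the XML parser."""
    results = {}
    for label, text in fragments.items():
        try:
            ET.fromstring(text)
            results[label] = "accepted"
        except ET.ParseError:
            results[label] = "rejected"
    return results


results = check(samples)
for label, outcome in results.items():
    print(f"{label}: {outcome}")
```

Only the fully quoted form gets through, which is the point of the answer above: an XML parser never has to consult a DTD or guess at minimization rules to find where an attribute value or element ends.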
Q: But I use some of that stuff in authoring!
A: And you can keep on using it. XML is not designed to replace SGML; it's designed to allow the richness and complexity of SGML documents to be communicated to Web clients.
Q: But we can already send SGML over the Internet to browsers like Panorama and the most recent version of DynaText. Why down-translate to XML?
A: There are four main reasons.
1. In the world of the Internet, support for a feature set has to be ubiquitous or it might as well not be there at all. The big Web browser vendors have made it very clear that they are not going to build full SGML support into their products. They would much rather add proprietary script extensions to HTML and use the complexity of full SGML as an excuse not to support it. XML takes away that excuse and makes support for open, human-readable, extensible markup a reasonable customer requirement.
2. XML is designed to be so easy to implement that independent vendors can provide XML support via homegrown applications or as plug-ins or downloadable applets into existing HTML browsers. If the big guys don't want to go down the easy path to SGML support that we've carved out for them, then XML gives us a fighting chance of accomplishing the same thing at the grass-roots level. XML puts "Web SGML" software at the level of a grad-student project.
3. Browser applications are not the only consumers of structured information. HTML augmented in various ad hoc ways is rapidly finding use as an interprocess communication format in a variety of hidden, lightweight software modules that communicate using Internet protocols. These are functions that should be performed by a standard, extensible language instead. XML provides that language.
4. XML is an easy on-ramp to structured markup for HTML users. It is vastly easier to explain than full SGML and gives us our most effective weapon in winning the hearts and minds of the existing Web community.
Q: Hmm. So to produce XML, I just run my current SGML through a normalizer that expands the start- and end-tags?
A: Wellll... almost. You also have to change the syntax of any EMPTY elements to make them "self-identifying". Thus, if you have an element type GRAPHIC that has been declared EMPTY, then wherever you have in your instance:
<GRAPHIC file="blort.gif">