7 Attributes

In the SGML context, the word 'attribute', like some other words, has a specific technical sense. It describes information which is in some sense descriptive of a specific element occurrence but is not regarded as part of its content. For example, you might wish to add a status attribute to occurrences of some elements in a document to indicate their degree of reliability, or to add an identifier attribute so that you could refer to particular element occurrences from elsewhere within a document. Attributes are useful in precisely such circumstances.

Although different elements may have attributes with the same name (for example, in the TEI scheme, every element is defined as having an id attribute), these attributes are always regarded as different, and may have different values assigned to them. If an element has been defined as having attributes, the attribute values are supplied in the document instance as attribute-value pairs inside the start-tag for the element occurrence. An end-tag may not contain an attribute-value specification, since it would be redundant.

For example:

<poem id=P1 status="draft"> ... </poem>

The <poem> element has been defined as having two attributes: id and status. For the instance of a <poem> in this example, represented here by an ellipsis, the id attribute has the value P1 and the status attribute has the value draft. An SGML processor can use the values of the attributes in any way it chooses; for example, a formatter might print a poem element which has the status attribute set to draft in a different way from one with the same attribute set to revised; another processor might use the same attribute to determine whether or not poem elements are to be processed at all. The id attribute is a slightly special case in that, by convention, it is always used to supply a unique value to identify a particular element occurrence, which can be used for cross reference purposes, as discussed further below.
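
As a rough illustration of how a processor might act on attribute values, the following Python sketch pulls the attribute-value pairs out of a start-tag and lets a toy formatter treat draft poems differently. The function names parse_start_tag and render, the regular expression, and the "[DRAFT]" marker are all hypothetical, and names are folded to lower case purely for convenience; a real SGML parser handles far more syntax than this.

```python
import re

# One attribute-value pair: name = "quoted" | 'quoted' | bare-token.
# This is a simplification assumed for illustration, not the full SGML grammar.
ATTR_RE = re.compile(
    r"""([A-Za-z][A-Za-z0-9.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([A-Za-z0-9.-]+))"""
)


def parse_start_tag(tag):
    """Return (element name, attribute dict) for a simple start-tag string."""
    inner = tag.strip()[1:-1]            # drop the enclosing < and >
    parts = inner.split(None, 1)         # element name, then the rest
    name = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    attrs = {}
    for match in ATTR_RE.finditer(rest):
        attr_name, dq, sq, bare = match.groups()
        value = dq if dq is not None else sq if sq is not None else bare
        attrs[attr_name.lower()] = value
    return name.lower(), attrs


def render(tag, text):
    """Toy formatter: flag poems whose status attribute is 'draft'."""
    _, attrs = parse_start_tag(tag)
    if attrs.get("status") == "draft":
        return "[DRAFT] " + text
    return text


print(parse_start_tag('<poem id=P1 status="draft">'))
print(render('<poem id=P1 status="draft">', "O Rose, thou art sick"))
```

The point of the sketch is only that the markup records the status; what is done with it is left entirely to the processing program.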

Like elements, attributes are declared in the SGML document type declaration, using rather similar syntax. As well as naming the attribute and the element to which it is to be attached, a declaration can specify (within limits) what kind of value is acceptable for the attribute and what its default value is.

The following declarations could be used to define the two attributes we have specified above for the <poem> element:

<!ATTLIST poem
          id       ID                              #IMPLIED
          status   (draft | revised | published)   draft        >

The declaration begins with the symbol ATTLIST, which introduces an attribute list specification. The first part of this specifies the element (or elements) concerned. In our example, attributes have been declared only for the <poem> element. If several elements share the same attributes, they may all be defined in a single declaration; just as with element declarations, several names may be given in a parenthesized list. Following this name (or list of names) is a series of rows, one for each attribute being declared, each containing three parts. These specify, in order, the name of the attribute, the type of value it takes, and a default value.

Attribute names (id and status in this example) are subject to the same restrictions as other names in SGML. They need not be unique across the whole DTD, however, only within the list of attributes for a given element.

The second part of an attribute specification can take one of two forms, both illustrated above. The first case uses one of a number of special keywords to declare what kind of value an attribute may take. In the example above, the special keyword ID is used to indicate that the attribute id will be used to supply a unique identifying value for each poem instance (see further the discussion below). Among the other possible SGML keywords are CDATA, NMTOKEN, and IDREF, each of which is discussed below.

In the example above, a list of the possible values for the status attribute has been supplied. This means that a parser can check that no <poem> is defined for which the status attribute does not have one of draft, revised, or published as its value. Alternatively, if the declared value had been either CDATA or NMTOKEN, a parser would have accepted almost any string of characters (status=awful or status=12345678 if it were NMTOKEN; status="anything goes" or status="well, ALMOST anything" if it were CDATA). Sometimes, of course, the set of possible values cannot be pre-defined. Where it can, as in this case, it is generally better to do so.
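
The difference between these kinds of declared value can be sketched in Python. This is a minimal illustration under stated assumptions, not a description of any real parser: the function is_valid is hypothetical, an enumerated list is modelled as a tuple, a name token is assumed to consist only of letters, digits, full stops, and hyphens, and case folding is ignored.

```python
import re

# Assumed name-token characters: letters, digits, '.', and '-'.
NMTOKEN_RE = re.compile(r"[A-Za-z0-9.-]+\Z")

STATUS_VALUES = ("draft", "revised", "published")


def is_valid(declared, value):
    """Check an attribute value against a (simplified) declared value."""
    if isinstance(declared, tuple):       # enumerated list of permitted values
        return value in declared
    if declared == "NMTOKEN":             # a single name token, no spaces
        return NMTOKEN_RE.match(value) is not None
    if declared == "CDATA":               # any character data at all
        return True
    raise ValueError("declared value not covered by this sketch: " + declared)


for declared in (STATUS_VALUES, "NMTOKEN", "CDATA"):
    for value in ("draft", "awful", "12345678", "anything goes"):
        print(declared, repr(value), is_valid(declared, value))
```

Only the enumerated declaration rejects status=awful, which is why listing the permitted values is the better choice whenever they can be known in advance.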

The last piece of information in each attribute definition specifies how a parser should interpret the absence of the attribute concerned. This can be done by supplying a special keyword or (as in this case) by supplying a specific value, which is then regarded as the value for every element which does not supply a value for the attribute concerned. Using the example above, if a poem is simply tagged <poem>, the parser will treat it exactly as if it were tagged <poem status=draft>. Alternatively, one of the keywords #REQUIRED, #IMPLIED, or #CURRENT may be used in place of a specific default value.

For example, if the attribute definition above were rewritten as

<!ATTLIST poem
          id       ID                             #IMPLIED
          status   (draft | revised | published)  #CURRENT      >

then poems which appear in the anthology simply tagged <poem> would be treated as if they had the same status as the preceding poem. If the keyword were #REQUIRED rather than #CURRENT, the parser would report such poems as erroneously tagged, as it would if any value other than draft, published, or revised were supplied. The use of #CURRENT implies that whatever value is specified for this attribute on the first poem will apply to all subsequent poems, until altered by a new value. Only the status of the first poem need therefore be supplied, if all are the same.
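
The effect of the different defaults can be sketched as a small Python function. It is an illustration only: the name resolve is hypothetical, a missing attribute is modelled as None, and the sketch assumes that under #CURRENT the first occurrence must supply a value.

```python
REQUIRED, IMPLIED, CURRENT = "#REQUIRED", "#IMPLIED", "#CURRENT"


def resolve(default, supplied):
    """Return the effective value of one attribute for each element occurrence.

    `default` is a literal default value or one of the three keywords;
    `supplied` lists the value given on each start-tag in document order,
    with None where the attribute was omitted.
    """
    resolved = []
    current = None                        # most recently supplied value
    for position, value in enumerate(supplied, start=1):
        if value is not None:
            current = value
            resolved.append(value)
        elif default == IMPLIED:
            resolved.append(None)         # absence is acceptable; no value implied here
        elif default == REQUIRED:
            raise ValueError("occurrence %d: attribute is required" % position)
        elif default == CURRENT:
            if current is None:
                raise ValueError("occurrence %d: no earlier value to inherit" % position)
            resolved.append(current)      # inherit the preceding value
        else:
            resolved.append(default)      # literal default such as 'draft'
    return resolved


print(resolve("draft", [None, "revised", None]))
print(resolve(CURRENT, ["draft", None, "revised", None]))
```

With the literal default, every untagged poem is a draft; with #CURRENT, an untagged poem takes whatever status was last stated.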

It is sometimes necessary to refer to an occurrence of one textual element from within another, an obvious example being phrases such as "see note 6" or "as discussed in chapter 5." When a text is being produced, the actual numbers associated with the notes or chapters may not be certain. If we are using descriptive markup, such things as page or chapter numbers, being entirely matters of presentation, will not in any case be present in the marked-up text: they will be assigned by whatever processor is operating on the text (and may indeed differ in different applications). SGML therefore provides a special mechanism by which any element occurrence may be given a special identifier, a kind of label, which may be used to refer to it from anywhere else within the same text. The cross-reference itself is regarded as an element occurrence of a specific kind, which must also be declared in the DTD. In each case, the identifying label (which may be arbitrary) is supplied as the value of a special attribute.

Suppose, for example, we wish to include a reference within the notes on one poem that refers to another poem. We will first need to provide some way of attaching a label to each poem: this is done by defining an attribute for the <poem> element, as suggested above.

<!ATTLIST poem
          id       ID     #IMPLIED >

Here we define an attribute id, the value of which must be of type ID. It is not required that any attribute of type ID have the name id as well; it is however a useful convention almost universally observed. Note that not every poem need carry an id attribute and the parser may safely ignore the lack of one in those which do not. Only poems to which we intend to refer need use this attribute; for each such poem we should now include in its start-tag some unique identifier, for example:

         <POEM ID=Rose>
              Text of poem with identifier 'ROSE'
         </POEM>
 
         <POEM ID=P40>
              Text of poem with identifier 'P40'
         </POEM>
 
         <POEM>
              This poem has no identifier
         </POEM>
 

Next we need to define a new element for the cross reference itself. This will not have any content (it is only a pointer), but it has an attribute, the value of which will be the identifier of the element pointed at. This is achieved by the following declarations:

    <!ELEMENT poemref - O EMPTY                  >
    <!ATTLIST poemref     target IDREF #REQUIRED >

The <poemref> element needs no end-tag because it has no content. It has a single attribute called target. The value of this attribute must be of type IDREF (the keyword used for cross reference pointers of this type) and it must be supplied.

With these declarations in force, we can now encode a reference to the poem with id Rose as follows:

    Blake's poem on the sick rose <POEMREF TARGET=Rose> ...

When an SGML parser encounters this empty element it will simply check that an element exists with the identifier Rose. Different SGML processors could take any number of additional actions: a formatter might construct an exact page and line reference for the location of the poem in the current document and insert it, or just quote the poem's title or first lines. A hypertext-style processor might use this element as a signal to activate a link to the poem being referred to. The purpose of the SGML markup is simply to indicate that a cross reference exists: it does not determine what the processor is to do with it.
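
The check that a parser performs on identifiers and pointers can be sketched in Python as follows. This is a hedged illustration, not a real parser: the function check_references and the (name, attributes) representation are hypothetical, the attribute names id and target are taken from the declarations above, and identifiers are compared after folding to upper case so that Rose and ROSE are treated as the same label.

```python
def check_references(elements):
    """Return a list of error messages for a sequence of (name, attrs) pairs.

    Every 'id' value must be unique, and every 'target' value must match
    the 'id' of some element anywhere in the document.
    """
    errors = []
    seen = set()
    # First pass: collect identifiers, so that forward references also resolve.
    for name, attrs in elements:
        ident = attrs.get("id")
        if ident is None:
            continue                      # id is #IMPLIED, so it may be absent
        key = ident.upper()
        if key in seen:
            errors.append("duplicate identifier: " + ident)
        seen.add(key)
    # Second pass: every pointer must name an identifier collected above.
    for name, attrs in elements:
        target = attrs.get("target")
        if target is not None and target.upper() not in seen:
            errors.append("no element has the identifier: " + target)
    return errors


document = [
    ("poem", {"id": "Rose"}),
    ("poem", {"id": "P40"}),
    ("poem", {}),
    ("poemref", {"target": "Rose"}),
]
print(check_references(document))
```

Anything beyond this check, such as producing a page reference or a hypertext link, is the business of the processor and not of the markup.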

