<!ENTITY tei "Text Encoding Initiative">
defines an entity whose name is tei and whose value is the string ``Text Encoding Initiative.'' [See note 9] This is an instance of an entity declaration, which declares an internal entity. The following declaration, by contrast, declares a system entity:
<!ENTITY ChapTwo SYSTEM "sgmlmkup.txt">
This defines a system entity whose name is ChapTwo and whose value is the text associated with the system identifier --- in this case, the system identifier is the name of an operating system file and the replacement text of the entity is the contents of the file.
Once an entity has been declared, it may be referenced anywhere within a document. This is done by supplying its name prefixed with the ampersand character and followed by the semicolon. The semicolon may be omitted if the entity reference is followed by a space or record end.
When an SGML parser encounters such an entity reference, it immediately substitutes the value declared for the entity name. Thus, the passage ``The work of the &tei has only just begun'' will be interpreted by an SGML processor exactly as if it read ``The work of the Text Encoding Initiative has only just begun''. In the case of a system entity, it is, of course, the contents of the operating system file which are substituted, so that the passage ``The following text has been suppressed: &ChapTwo;'' will be expanded to include the whole of whatever the system finds in the file sgmlmkup.txt. [See note 10]
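The substitution just described can be approximated in a few lines of Python. This is a minimal sketch rather than an SGML parser: the ENTITIES table, the expand function, and the pattern used to recognize names are assumptions made for illustration, and only internal entities are handled. A system entity would require reading the named file first.

```python
import re

# Hypothetical entity table standing in for parsed <!ENTITY ...> declarations.
ENTITIES = {"tei": "Text Encoding Initiative"}

# An ampersand, a name, then either a semicolon or (checked by lookahead)
# white space or the end of the text, in which case the semicolon is omitted.
REFERENCE = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*)(?:;|(?=\s|$))")


def expand(text, entities):
    """Replace each reference to a declared entity with its declared value."""
    def substitute(match):
        name = match.group(1)
        # References to undeclared names are left untouched in this sketch.
        return entities.get(name, match.group(0))
    return REFERENCE.sub(substitute, text)


print(expand("The work of the &tei has only just begun", ENTITIES))
print(expand("The work of the &tei; has only just begun", ENTITIES))
```

Both calls print the same expanded sentence, since the semicolon is optional when a space follows the reference.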
This obviously saves typing, and simplifies the task of maintaining consistency in a set of documents. If the printing of a complex document is to be done at many sites, the document body itself might use an entity reference, such as &site;, wherever the name of the site is required. Different entity declarations could then be added at different sites to supply the appropriate string to be substituted for this name, with no need to change the text of the document itself.
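The site example can be tried with a real parser. The sketch below uses the XML parser in Python's standard library as a stand-in for an SGML processor, so the semicolon after the reference is required. The element name doc, the wording of the body, and the site names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# The document body is identical at every site; only the declaration differs.
BODY = "<doc>Printed and distributed at &site;.</doc>"


def render(site_name):
    """Parse the shared body with a site-specific declaration of 'site'."""
    # Assumes site_name contains no quotation marks, ampersands, or percent
    # signs, any of which would need escaping inside the declaration.
    declarations = '<!DOCTYPE doc [<!ENTITY site "%s">]>' % site_name
    return ET.fromstring(declarations + BODY).text


print(render("Oxford"))   # hypothetical site
print(render("Chicago"))  # hypothetical site
```

The body string is never edited. Each site changes only the declaration placed in front of it.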
This string substitution mechanism has many other applications. It can be used to circumvent the notorious inadequacies of many computer systems for representing the full range of graphic characters needed for the display of modern English (let alone the requirements of other modern scripts or of ancient languages). So-called `special characters' not directly accessible from the keyboard (or if accessible not correctly translated when transmitted) may be represented by an entity reference.
Suppose, for example, that we wish to encode the use of ligatures in early printed texts. The ligatured form of `ct' might be distinguished from the non-ligatured form by encoding it as &ctlig; rather than ct. Other special typographic features such as leafstops or rules could equally well be represented by mnemonic entity references in the text. When processing such texts, an entity declaration would be added giving the desired representation for such textual elements. If, for example, ligatured letters are of no interest, we would simply add a declaration such as
<!ENTITY ctlig "ct" >
and the distinction present in the source document would be removed. If, on the other hand, a formatting program capable of representing ligatured characters is to be used, we might replace the entity declaration to give whatever sequence of characters such a program requires as the expansion.
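The effect of exchanging one declaration for another can be sketched as follows. The sample text, the apply_declaration helper, and the formatter sequence \lig{ct} are all hypothetical. A real formatter would define its own notation.

```python
# Source text that keeps the ligature distinct through a mnemonic reference.
SOURCE = "a perfe&ctlig; example of the chara&ctlig;er"


def apply_declaration(text, name, value):
    """Substitute one declared value for every reference &name; in text."""
    return text.replace("&" + name + ";", value)


# Ligatures are of no interest: the distinction disappears from the output.
plain = apply_declaration(SOURCE, "ctlig", "ct")

# A formatter that can print ligatures; "\lig{ct}" is an invented sequence.
formatted = apply_declaration(SOURCE, "ctlig", r"\lig{ct}")

print(plain)
print(formatted)
```

The source text is the same in both cases. Only the value supplied for ctlig changes.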
A list of entity declarations is known as an entity set. Standard entity sets are provided for use with most SGML processors, in which the names used will normally be taken from the lists of such names published as an annex to the SGML standard and elsewhere, as mentioned above.
The replacement values given in an entity declaration are, of course, highly system dependent. If the characters to be used in them cannot be typed in directly, SGML provides a mechanism to specify characters by their numeric values, known as character references. A character reference is distinguished from other characters in the replacement string by the fact that it begins with a special symbol, conventionally the sequence `&#', and ends with the normal semicolon. For example, if the formatter to be used represents the ligatured form of ct by the characters c and t prefixed by the character with decimal value 102, the entity declaration would read:
<!ENTITY ctlig "&#102;ct" >
Note that character references will generally not make sense if transferred to another hardware or software environment: for this reason, their use is only recommended in situations like this.
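The resolution of a character reference can be illustrated by a short sketch. It assumes that the decimal number indexes the character set known to Python (Unicode, which agrees with ASCII for values below 128). On another system the same number might select a different character, which is the portability problem noted above. The function name resolve_char_refs is an invention for this example.

```python
import re

# '&#', one or more decimal digits, then the closing semicolon.
CHAR_REF = re.compile(r"&#([0-9]+);")


def resolve_char_refs(replacement):
    """Replace each numeric character reference with the character it names."""
    return CHAR_REF.sub(lambda m: chr(int(m.group(1))), replacement)


# Under this assumption, character 102 is the letter f.
print(repr(resolve_char_refs("&#102;ct")))  # prints 'fct'
```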
Useful though the entity reference mechanism is for dealing with occasional departures from the expected character set, no one would consider using it to encode extended passages, such as quotations in Greek or Russian in an English text. In such situations, different mechanisms are appropriate. These are discussed elsewhere in these Guidelines (see chapter 4: Characters and Character sets).
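A special form of entities, parameter entities, may be used within SGML markup declarations; these differ from the entities discussed above (which technically are known as general entities) in two ways: they are referenced only within markup declarations, not normally within the text of the document itself, and their references are delimited by a percent sign and a semicolon rather than by an ampersand and a semicolon. A parameter entity declaration has the same form as a general entity declaration, except that a percent sign, set off by white space, stands between the keyword ENTITY and the entity name. The following declarations define an internal parameter entity named TEI.prose and an external parameter entity named TEI.extensions.dtd: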
<!ENTITY % TEI.prose 'INCLUDE'>
<!ENTITY % TEI.extensions.dtd SYSTEM 'mystuff.dtd'>
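A rough sketch of how such declarations might be read, and a reference of the form %name; expanded, is given below. It is not a DTD parser: the regular expression, the function names, and the marked-section fragment used to show a reference in context are assumptions made for illustration. External entities are recorded but not fetched.

```python
import re

# ENTITY keyword, a percent sign set off by white space, the entity name,
# an optional SYSTEM keyword, then a quoted literal.
PARAM_DECL = re.compile(
    r'''<!ENTITY\s+%\s+([A-Za-z][A-Za-z0-9.-]*)\s+(SYSTEM\s+)?(["'])(.*?)\3\s*>'''
)
PARAM_REF = re.compile(r"%([A-Za-z][A-Za-z0-9.-]*);")


def parse_parameter_entities(dtd_text):
    """Map each parameter entity name to a (kind, value) pair."""
    entities = {}
    for name, system, _quote, value in PARAM_DECL.findall(dtd_text):
        kind = "external" if system else "internal"
        entities[name] = (kind, value)
    return entities


def expand_parameter_refs(dtd_text, entities):
    """Expand references to internal parameter entities; leave others as is."""
    def substitute(match):
        kind, value = entities.get(match.group(1), (None, None))
        return value if kind == "internal" else match.group(0)
    return PARAM_REF.sub(substitute, dtd_text)


DECLARATIONS = """<!ENTITY % TEI.prose 'INCLUDE'>
<!ENTITY % TEI.extensions.dtd SYSTEM 'mystuff.dtd'>"""

entities = parse_parameter_entities(DECLARATIONS)
print(entities)
# Illustrative fragment only: a reference appearing inside DTD markup.
print(expand_parameter_refs("<![ %TEI.prose; [ ... ]]>", entities))
```

Changing the declared value of TEI.prose changes what the expanded fragment says, without any edit to the fragment itself.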
The TEI document type definition makes extensive use of parameter entities to control the selection of different tag sets and to make it easier to modify the TEI DTD. Numerous examples of their use may thus be found in chapter 3: Structure of the TEI Document Type Definition.