DTDs, Entities, and Entity References

The xmlTree API has limited support for reading DTDs, geared towards using entities defined in an XML document. All DTD objects are xmlNode objects with various type values, such as XML_DTD_NODE and XML_ENTITY_DECL.

It's possible for an XML document to have two different DTDs: an internal DTD and an external DTD. The XML_DTD_NODE object for each can be fetched with the xmlTreeGetInternalSubset() and xmlTreeGetExternalSubset() functions, respectively.

Entities are declared in the DTD, and when they are used in a document, XML_ENTITY_REF nodes are used to refer back to the entities. Consider the following XML document:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE top [
<!ENTITY ts "Thunderstone Software, LLC.">
<!ELEMENT top (#PCDATA)>
]>
<top>ts is &ts;!</top>

This defines a single entity, ts, which is referenced in the element top. You don't have to deal with entity references directly if you don't want to:

  • You can permanently substitute them at parse time with the XML_PARSE_NOENT option. This places the entity's value directly in the tree, so in the previous example <top> has a single text child, ts is Thunderstone Software, LLC.!, rather than an entity reference. See xmlTreeNewDocFromFile() and xmlTreeNewDocFromString() for more information.

  • Calling xmlTreeGetContent() on <top> performs the entity substitution for you and returns ts is Thunderstone Software, LLC.! as a single string. It can also be called with NO_INLINE to leave entity references in place. See xmlTreeGetContent() for more information.
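
As a rough illustration of what substitution at parse time looks like, the sketch below parses the example document with Python's standard xml.dom.minidom module rather than the xmlTree API. That parser always expands internal entities, which is assumed here to be comparable to parsing with XML_PARSE_NOENT; the helper name text_of is made up for this example.

```python
from xml.dom import minidom

# The example document, encoded as bytes so the encoding declaration is honored.
XML = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE top [\n'
    '<!ENTITY ts "Thunderstone Software, LLC.">\n'
    '<!ELEMENT top (#PCDATA)>\n'
    ']>\n'
    '<top>ts is &ts;!</top>\n'
).encode("utf-8")


def text_of(node):
    """Concatenate the data of a node's direct text children (hypothetical helper)."""
    return "".join(
        child.data for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    )


doc = minidom.parseString(XML)

# The entity declaration is still reachable through the internal DTD subset.
entity = doc.doctype.entities.getNamedItem("ts")
entity_value = text_of(entity)

# The reference inside <top> has already been replaced by the entity's value.
top_text = text_of(doc.documentElement)

print(entity_value)
print(top_text)
```

The declaration stays available in the DTD even though the reference in <top> no longer exists as a separate node, which is the trade-off of substituting at parse time.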

The example document would have the following hierarchical structure in the xmlTree API:

XML_DOC_NODE
  |  |
  |  |
  | XML_DTD_NODE
  |  |
  |  +-XML_ENTITY_DECL <------\
  |  |                        |
  |  \-XML_ELEMENT_DECL       |
  |                           |
  \-XML_ELEMENT_NODE <top>    |
    |                         |
    +-XML_TEXT_NODE "ts is "  |
    |                         |
    +-XML_ENTITY_REF          |
    |   |                     |
    |   \---------------------/
    |
    \-XML_TEXT_NODE "!"

The element <top> actually has three children: the entity reference and the two text nodes around it. In the diagram the entity reference appears to have a child of its own, but that link simply refers back to the entity declared in the DTD. Calling xmlTreeGetAllContent() on the XML_ENTITY_REF node returns the entity's contents.
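
The relationship between the entity reference and its declaration can be sketched with a toy model in Python. This is not the xmlTree implementation: the Node class, its fields, and the content() function are hypothetical names chosen for illustration, and the substitute flag only loosely mirrors the choice between expanding references and leaving them in place.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Toy stand-in for an xmlNode; not the real xmlTree structure."""
    type: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)
    # For an XML_ENTITY_REF only: the XML_ENTITY_DECL it refers back to.
    target: Optional["Node"] = None


def content(node: Node, substitute: bool = True) -> str:
    """Return the text under node, optionally expanding entity references."""
    if node.type == "XML_TEXT_NODE":
        return node.text
    if node.type == "XML_ENTITY_REF":
        # Follow the link back to the declaration, or keep the "&name;" form.
        return node.target.text if substitute else "&" + node.text + ";"
    return "".join(content(child, substitute) for child in node.children)


# Build the <top> element from the diagram: text, entity reference, text.
decl = Node("XML_ENTITY_DECL", text="Thunderstone Software, LLC.")
ref = Node("XML_ENTITY_REF", text="ts", target=decl)
top = Node("XML_ELEMENT_NODE", children=[
    Node("XML_TEXT_NODE", text="ts is "),
    ref,
    Node("XML_TEXT_NODE", text="!"),
])

expanded = content(top)
unexpanded = content(top, substitute=False)
print(expanded)    # ts is Thunderstone Software, LLC.!
print(unexpanded)  # ts is &ts;!
```

In this model the reference node stores a pointer to the declaration rather than owning a copy of its text, which is why the diagram draws an arrow back into the DTD instead of a real child.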

See the sample xmlTree09-DTD for an example of working with an XML document and DTD like this.


Copyright © Thunderstone Software     Last updated: Apr 15 2024