A common problem when working with the xmlTree API is getting the
first child of the root element and finding that it has no content,
even though the root's first child element does have content.
This is due to an often overlooked aspect of the XML specification:
- All whitespace that occurs between XML elements is
significant.
Consider the following XML document:
<?xml version="1.0" encoding="UTF-8"?>
<top>
    <item>I'm an item!</item>
    <item>So am I!</item>
</top>
It looks like the <top> element has two children, both of which
are <item> elements. But because whitespace is significant, it
actually has five children, including the text between the
elements:
- a text node, containing a newline and 4 spaces (the text between
<top> and the first <item>)
- an element node, the first <item>
- a text node, containing a newline and 4 spaces (the text between
the two <item>s)
- an element node, the second <item>
- a text node, containing only a newline (the text between the
last <item> and the closing </top>)
Going back to the pitfall from the beginning, when we get the first
child of the root element we're actually getting the text node in
between the <top> and <item> elements, instead of the
<item> element itself.
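The same behavior can be reproduced outside the xmlTree API. The sketch below is only an illustration: it uses Python's standard xml.dom.minidom as a stand-in parser, not the xmlTree functions, and the helper name describe_children is hypothetical. It lists the children of <top> and shows that the first one is a whitespace text node rather than an <item>.

```python
from xml.dom import minidom

# The sample document, with each <item> indented by four spaces.
XML = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<top>\n'
    "    <item>I'm an item!</item>\n"
    '    <item>So am I!</item>\n'
    '</top>\n'
)


def describe_children(xml_text):
    """Return a (kind, content) pair for each child of the root element.

    Hypothetical helper for illustration; not part of the xmlTree API.
    """
    root = minidom.parseString(xml_text).documentElement
    described = []
    for child in root.childNodes:
        if child.nodeType == child.TEXT_NODE:
            described.append(("text", child.data))
        elif child.nodeType == child.ELEMENT_NODE:
            described.append(("element", child.tagName))
    return described


children = describe_children(XML)
for kind, content in children:
    # repr() makes the newline and spaces in the text nodes visible.
    print(kind, repr(content))
```

The text nodes print with their newline and spaces visible, which makes it easy to see why the first child appears to be "empty".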
Without those extra text nodes, the document would look like this:
<?xml version="1.0" encoding="UTF-8"?>
<top><item>I'm an item!</item><item>So am I!</item></top>
With many XML documents it is handy to ignore whitespace-only text
and think of <top> as having only two children. This can be done by
passing the option XML_PARSE_NOBLANKS when parsing the XML
data. The parser will then detect text nodes that contain only
whitespace (as defined by the XML spec) and discard them.
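To make the effect concrete, here is a rough sketch of the same idea. It is an approximation under stated assumptions: it uses Python's standard xml.dom.minidom rather than the xmlTree API, it removes whitespace-only text nodes after parsing instead of during it, and the helper name drop_blank_text_nodes is hypothetical. It does not claim to match the exact rules XML_PARSE_NOBLANKS applies.

```python
from xml.dom import minidom

XML = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<top>\n'
    "    <item>I'm an item!</item>\n"
    '    <item>So am I!</item>\n'
    '</top>\n'
)

# Whitespace characters as defined by the XML spec: space, tab, CR, LF.
XML_WHITESPACE = " \t\r\n"


def drop_blank_text_nodes(node):
    """Recursively remove text nodes that contain only XML whitespace.

    Hypothetical helper for illustration; not part of the xmlTree API.
    """
    # Iterate over a copy because childNodes is modified while looping.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE:
            if child.data.strip(XML_WHITESPACE) == "":
                node.removeChild(child)
        else:
            drop_blank_text_nodes(child)


root = minidom.parseString(XML).documentElement
drop_blank_text_nodes(root)

# After cleanup, only the two <item> elements remain under <top>.
print(len(root.childNodes))
print(root.firstChild.tagName)
```

Text nodes that carry real content, such as the text inside each <item>, are left untouched; only nodes made up entirely of whitespace are dropped.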
See xmlTreeNewDocFromString() and xmlTreeNewDocFromFile() for more
information.
Copyright © Thunderstone Software Last updated: Mon Feb 18 10:28:15 EST 2013