Programming

When working with XML, there are two standards you can consider:

  • the XML Document Object Model (XML DOM). With this specification, you access the XML data through a hierarchical, object-oriented interface, so you can traverse the document tree in any order (i.e., you can also step back up to parent nodes).
  • the Simple API for XML (SAX). With this specification, the parser walks through the XML document sequentially, from start to end, and reports each element to your code as it is encountered. You cannot go back, nor can you skip a sub-hierarchy: all elements must be processed.
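
To make the difference concrete, here is a minimal sketch in Python, using only the standard library modules xml.dom.minidom and xml.sax. The sample document and the names DOC, TitleHandler, titles_via_dom and titles_via_sax are made up for this illustration; other DOM and SAX implementations follow the same pattern but differ in detail.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical sample document, used only for this illustration.
DOC = (b"<library>"
       b"<book><title>XML Basics</title></book>"
       b"<book><title>SAX in Depth</title></book>"
       b"</library>")


def titles_via_dom(data):
    """DOM: the whole tree is built first, then we navigate it freely."""
    dom = xml.dom.minidom.parseString(data)
    titles = dom.getElementsByTagName("title")
    # Step back up the hierarchy: from the last <title> to its parent
    # <book>, then sideways to the <book> that precedes it.
    previous_book = titles[-1].parentNode.previousSibling
    return [t.firstChild.data for t in titles], previous_book.tagName


class TitleHandler(xml.sax.ContentHandler):
    """SAX: the parser calls these methods in document order."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect the pieces.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def titles_via_sax(data):
    handler = TitleHandler()
    xml.sax.parseString(data, handler)
    return handler.titles
```

Note that the SAX handler has to keep its own state (the _in_title flag and the text buffer) to know where it is in the document, whereas the DOM version simply asks the tree.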

Each method has its benefits and drawbacks:

XML DOM (cons)
  • The XML document must be well-formed; otherwise the parser will typically refuse to build the DOM tree, and you cannot access the nodes of the document.
  • The XML document must be loaded and parsed, and the complete DOM tree must be built, before you can access any node. This makes DOM rather slow and memory-intensive, especially for large documents.
XML DOM (pros)
  • You can step through the document at will; you can access all nodes at any time.
  • You can easily make changes to the DOM hierarchy, and easily save these changes as well. E.g., you can alter values, or rearrange, add and delete the nodes themselves.
  • You can use XPath (or XSL queries) to find groups of nodes; you do not need to write your own search logic for finding specific data within your application.
  • You can use XSL Transformations (XSLT, often just called XSL) to declaratively alter the XML structure with XSL templates. For example, the following rather useless template
      <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:template match="nest">
          <sort>
            <xsl:for-each select=".//*">
              <xsl:sort select="name()"/>
              <xsl:element name="{name()}"/>
            </xsl:for-each>
          </sort>
        </xsl:template>
      </xsl:stylesheet>
    changes
      <nest>
        <a>
          <A/>
        </a>
        <b>
          <B/>
        </b>
      </nest>
    into
      <sort>
        <A/>
        <a/>
        <B/>
        <b/>
      </sort>
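
The editing and querying points above can be sketched in Python with the standard library. This is only an illustration under a few assumptions: the function names edit_nest and descendant_names are made up, the input is the <nest> document from the example, and xml.etree.ElementTree is a tree API that supports only a subset of XPath rather than a full W3C DOM with XPath.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

NEST = "<nest><a><A/></a><b><B/></b></nest>"


def edit_nest(source):
    """Rearrange, add and delete nodes in a DOM tree, then serialize it."""
    dom = xml.dom.minidom.parseString(source)
    root = dom.documentElement
    a = root.getElementsByTagName("a")[0]
    b = root.getElementsByTagName("b")[0]

    # Rearrange: insertBefore moves <b> because it is already in the tree.
    root.insertBefore(b, a)

    # Add: create a new element with an attribute and append it.
    c = dom.createElement("c")
    c.setAttribute("note", "added")
    root.appendChild(c)

    # Delete: remove the <A/> child of <a>.
    a.removeChild(a.firstChild)

    # Save: serialize the modified tree back to text.
    return root.toxml()


def descendant_names(source):
    """Select all descendants with the same path the stylesheet uses."""
    root = ET.fromstring(source)
    # ".//*" matches every descendant element, in document order.
    return [element.tag for element in root.findall(".//*")]
```

The second function only shows what the select=".//*" expression picks up before the stylesheet sorts the nodes; it does not perform the transformation itself.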
SAX (cons)
  • The SAX(2) specification is not widely known, nor is it well documented. You're pretty much on your own here.
  • As said before, SAX processes the XML document sequentially, so you must provide your own caching mechanism if you want to search the XML document or refer back to data that has already passed.
SAX (pros)
  • SAX is a lot faster than DOM and has far less overhead, because it does not build a tree of the whole document in memory.
  • SAX is (theoretically speaking) better equipped to handle malformed XML documents. When you've got a malformed XML document, the DOM method is likely to be useless, whereas SAX only stumbles over the errors as they occur, after it has already reported everything that precedes them.
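
The last point can be illustrated with a small Python sketch based on the standard library parsers. The broken sample document and the names BROKEN, NameCollector, sax_names_before_error and dom_loads are assumptions made for this example; other parsers may report errors differently.

```python
import xml.dom.minidom
import xml.sax
from xml.parsers.expat import ExpatError

# Hypothetical malformed document: <b> is never closed.
BROKEN = b"<root><a/><b></root>"


class NameCollector(xml.sax.ContentHandler):
    """Records the name of every element the parser reports."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def sax_names_before_error(data):
    """Return the elements seen so far and whether parsing failed."""
    handler = NameCollector()
    failed = False
    try:
        xml.sax.parseString(data, handler)
    except xml.sax.SAXParseException:
        # The error is raised only when the parser reaches the bad tag;
        # everything reported before that point is kept in the handler.
        failed = True
    return handler.names, failed


def dom_loads(data):
    """Return True if a DOM tree could be built, False otherwise."""
    try:
        xml.dom.minidom.parseString(data)
        return True
    except ExpatError:
        # No tree at all is available for a malformed document.
        return False
```

With the sample above, the SAX handler has collected the elements up to the mismatched closing tag by the time the error is raised, while the DOM call yields no tree to work with.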

There are several excellent free implementations of XML parsers and processors, such as the MSXML3 implementation by Microsoft; see the References section.