Web Consortium, DOM is a cross-language set of objects that represent elements, attributes, text, processing instructions, and all the other weird and wonderful things that can appear in XML. For example, the XML document:
corresponds to the object tree shown in Figure 1. Note that:
- The root of the tree must be a Document object, whose single child is the root element of the document.
- All of the text--including the whitespace between elements--is stored.
There are several DOM implementations in Python, such as minidom (which is part of the standard library), and packages, like Fredrik Lundh's ElementTree, that have similar features, but more Pythonic interfaces. There are also special-purpose tools, like XSLT, which are custom-built for working with XML. In practice, though, I've usually found these special-purpose tools to be more trouble than they are worth. Especially since most don't include features like regular expressions and database libraries that my crunching programs need.
For our purposes, minidom will do fine. What we have is a list, each of whose elements is a variable name and a (possibly empty) list of parameters. What we want is some XML. Let's start by creating the document and its root settings element:
For each entry in data, we need to add a new var element to the settings:
for (var, params) in data:
varNode = doc.createElement('var')
. . .
Similarly, for each parameter, we need to add a param element to the var element. We must also add a text node to the param element to store the parameter's value:
Great, except that when this document is converted into text for output, the result is:
There are no newlines or indentation to make it easy for human beings to read. We could easily insert them by adding text nodes in the right places, but there's no point since these files are only going to be read by other programs. In my next column, I'll explain how to merge your new XML data with a database.