Data Crunching

Part 1

Standardized by the World Wide Web Consortium (W3C), the Document Object Model (DOM) is a cross-language set of objects that represent elements, attributes, text, processing instructions, and all the other weird and wonderful things that can appear in XML. For example, the XML document:


corresponds to the object tree shown in Figure 1. Note that:

  • The root of the tree must be a Document object, whose single child is the root element of the document.
  • All of the text, including the whitespace between elements, is stored.
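
As a rough illustration of those two points, the sketch below parses a small document with minidom and looks at the top of the resulting tree. The XML is a made-up sample (its `settings`, `var`, and `param` names are borrowed from later in this article), not the document drawn in Figure 1.

```python
import xml.dom.minidom

# Hypothetical sample document, not the one from the original article.
SAMPLE = """<settings>
  <var name="width">
    <param>640</param>
  </var>
</settings>"""

doc = xml.dom.minidom.parseString(SAMPLE)

# The root of the tree is a Document node...
is_document = doc.nodeType == doc.DOCUMENT_NODE

# ...and the document's root element hangs off it.
root = doc.documentElement

# Whitespace between elements is kept as Text nodes ('#text').
kinds = [child.nodeName for child in root.childNodes]
print(is_document, root.tagName, kinds)
```

In this sample the `var` element is surrounded by `#text` nodes that hold nothing but the newlines and indentation from the source.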

There are several DOM implementations in Python, such as minidom (which is part of the standard library), as well as packages like Fredrik Lundh's ElementTree that have similar features but more Pythonic interfaces. There are also special-purpose tools, like XSLT, which are custom-built for working with XML. In practice, though, I've usually found these special-purpose tools to be more trouble than they are worth, especially since most don't include features like regular expressions and database libraries that my crunching programs need.

For our purposes, minidom will do fine. What we have is a list, each of whose elements is a variable name and a (possibly empty) list of parameters. What we want is some XML. Let's start by creating the document and its root settings element:

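
One way to do this with minidom is sketched below. It is an illustration built from standard library calls rather than the article's original listing; the names `doc` and `root` are chosen to match the code that follows.

```python
import xml.dom.minidom

# Create an empty DOM document.
doc = xml.dom.minidom.Document()

# Create the root <settings> element and attach it to the document.
root = doc.createElement('settings')
doc.appendChild(root)
```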

For each entry in data, we need to add a new var element to the settings:

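    def process(data):
        # doc and root are the Document and its <settings> root element,
        # created in the previous step.
        for (var, params) in data:
            varNode = doc.createElement('var')
            varNode.setAttribute('name', var)
            root.appendChild(varNode)
            # . . . (param elements are added here)
        return doc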

Similarly, for each parameter, we need to add a param element to the var element. We must also add a text node to the param element to store the parameter's value:
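
A sketch of what that inner step might look like is given here. The helper name `add_params` and the sample values are hypothetical; `createElement`, `createTextNode`, and `appendChild` are standard minidom calls.

```python
import xml.dom.minidom

def add_params(doc, varNode, params):
    """Append one <param> child to varNode for each value in params."""
    for value in params:
        paramNode = doc.createElement('param')
        # The parameter's value lives in a text node inside <param>.
        paramNode.appendChild(doc.createTextNode(str(value)))
        varNode.appendChild(paramNode)

# Hypothetical usage with made-up values.
doc = xml.dom.minidom.Document()
varNode = doc.createElement('var')
varNode.setAttribute('name', 'size')
doc.appendChild(varNode)
add_params(doc, varNode, ['640', '480'])
```

In the article's `process` function, this loop would sit inside the loop over `data`, right after `varNode` is appended to `root`.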


Great, except that when this document is converted into text for output, the result is:

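
To see the effect concretely, the sketch below assembles the steps described in this article into one function and serializes the result with `toxml()`. The input data is invented for illustration, and the body of `process` is a reconstruction from the prose rather than the original listing.

```python
import xml.dom.minidom

def process(data):
    """Build a <settings> document from (name, params) pairs."""
    doc = xml.dom.minidom.Document()
    root = doc.createElement('settings')
    doc.appendChild(root)
    for (var, params) in data:
        varNode = doc.createElement('var')
        varNode.setAttribute('name', var)
        root.appendChild(varNode)
        for value in params:
            paramNode = doc.createElement('param')
            paramNode.appendChild(doc.createTextNode(value))
            varNode.appendChild(paramNode)
    return doc

# Made-up input: variable names with (possibly empty) parameter lists.
data = [('size', ['640', '480']), ('debug', [])]
text = process(data).toxml()
print(text)
```

Everything comes out on a single line. If readable output were ever wanted, minidom's `toprettyxml()` is one way to get indentation without adding text nodes by hand.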
There are no newlines or indentation to make it easy for human beings to read. We could easily insert them by adding text nodes in the right places, but there's no point since these files are only going to be read by other programs. In my next column, I'll explain how to merge your new XML data with a database.

Figure 1.

User Comments

12 comments

Greg,

Great article, I look forward to part 2! I do have one question, where is Figure 1?

Thanks,

John Leather

March 1, 2006 - 3:19am

About the author

Greg Wilson

Greg Wilson’s book Data Crunching was published by the Pragmatic Bookshelf in April 2005. He received a PhD in computer science from the University of Edinburgh in 1993 and is now a freelance software developer, a contributing editor at Dr. Dobb's Journal, and an adjunct professor in computer science at the University of Toronto.

AgileConnection is one of the growing communities of the TechWell network.
