Data Crunching, Part 2


languages, database connections are managed by driver classes that, in theory, all offer the same interface. In order to use MySQL or PostgreSQL instead of SQLite, we should only have to change the import statement that loaded the driver and the connect statement that created the link. In practice, each database supports a slightly different dialect of SQL, so it's common to have to tweak our queries as well.
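As a minimal sketch of that driver swap using Python's DB-API (the table and connection arguments here are illustrative), only the two marked lines would change to target a different database:

```python
import sqlite3 as driver   # the only import that names the database

# For PostgreSQL or MySQL, just these two lines would differ, e.g.:
#   import psycopg2 as driver
#   conn = driver.connect(dbname="crunch", user="greg")
conn = driver.connect(":memory:")   # an in-memory SQLite database

conn.close()
```

Everything after the connect call goes through the same cursor interface, which is what makes the swap cheap in theory; dialect differences in the SQL itself are the part that still needs hand-tweaking.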

Once we have a connection, we create a cursor. This marks our place in the database, just as an editor's cursor marks your place in a document. We can then execute the query, which we pass to the cursor as a string. Next comes the fetchall call, which grabs the query's results in one go. If we were expecting a lot of data, we might page through the results instead.
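Those steps look like this in practice; the table and its contents are made up for the sake of a self-contained example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()                      # marks our place in the database

# Set up some illustrative data to query.
cursor.execute("CREATE TABLE vars (name TEXT, value REAL)")
cursor.executemany("INSERT INTO vars VALUES (?, ?)",
                   [("alpha", 1.5), ("beta", 2.5)])

cursor.execute("SELECT name, value FROM vars ORDER BY name")
rows = cursor.fetchall()                    # grab all the results in one go

# If the result set were large, we would page through it instead:
#   while batch := cursor.fetchmany(1000):
#       process(batch)

cursor.close()
conn.close()
```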

Next, we process the results of the query (discussed below). Once that's done, we close the cursor and then close the connection. Forgetting to tidy up is a common mistake, and while it won't cause an immediate crash, your program may eventually fall over because it has run out of connections.
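One way to make that tidy-up automatic, so the cursor and connection are released even if processing raises an exception, is to lean on context managers (the query here is a trivial stand-in):

```python
import sqlite3
from contextlib import closing

# closing() guarantees .close() is called when each block exits,
# whether we leave normally or via an exception.
with closing(sqlite3.connect(":memory:")) as conn:
    with closing(conn.cursor()) as cursor:
        cursor.execute("SELECT 1")
        results = cursor.fetchall()
# Both the cursor and the connection are closed by this point,
# so the program cannot slowly leak connections.
```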

The processing inside the loop separates the variable's name from its parameters and then tries to find a matching node in the XML document. If the node exists, its parameters' values are overwritten. If it doesn't, a new node is created and appended to the document:
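The update-or-append logic can be sketched with xml.dom.minidom. The element and attribute names here (variable, param, name, value) are assumptions, and these helpers are illustrative stand-ins, not the sample code itself:

```python
import xml.dom.minidom as minidom

def find_var_node(doc, name):
    """Stand-in for findVarNode: locate <variable name="..."> or return None."""
    for node in doc.getElementsByTagName("variable"):
        if node.getAttribute("name") == name:
            return node
    return None

def add_or_replace_param(doc, var, pname, pvalue):
    """Stand-in for addOrReplaceParam: overwrite a <param>, or create it."""
    for p in var.getElementsByTagName("param"):
        if p.getAttribute("name") == pname:
            p.setAttribute("value", pvalue)      # node exists: overwrite value
            return
    p = doc.createElement("param")               # no match: create a new param
    p.setAttribute("name", pname)
    p.setAttribute("value", pvalue)
    var.appendChild(p)

def add_var_node(doc, name):
    """Stand-in for addVarNode: append a new <variable> to the document."""
    var = doc.createElement("variable")
    var.setAttribute("name", name)
    doc.documentElement.appendChild(var)
    return var

doc = minidom.parseString('<variables><variable name="alpha"/></variables>')

# Each query row pairs a variable's name with one of its parameters.
for name, pname, pvalue in [("alpha", "mean", "1.5"), ("beta", "mean", "2.0")]:
    var = find_var_node(doc, name)
    if var is None:
        var = add_var_node(doc, name)            # missing: create and append
    add_or_replace_param(doc, var, pname, pvalue)
```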

The full text of the functions findVarNode,
addOrReplaceParam, and
addVarNode can be found in sample code online.

It's 11:30 a.m., and the script is humming away happily in the background. Of the two-and-a-half hours it took to solve this problem, almost an hour involved tracking down information about file formats, making sure files had the right permissions, and so on; coding and testing only took about ninety minutes. Knowing how to use standard data-crunching tools--the Unix command line, regular expressions, DOM, a little SQL, and most importantly a scripting language (in this case Python)--is the reason it took so little time. We can use data-crunching tools to solve many development project problems, and investing a few hours in mastering them pays enormous dividends.


About the author

Greg Wilson

Greg Wilson’s book Data Crunching was published by the Pragmatic Bookshelf in April 2005. He received a PhD in computer science from the University of Edinburgh in 1993 and is now a freelance software developer, a contributing editor at Dr. Dobb's Journal, and an adjunct professor in computer science at the University of Toronto.
