- tools that allow us to take snapshots, store diffs, etc. Databases are more complex
- Databases are managed separately from the code. This does not make sense in many cases, since databases and code are closely related; the code has components that access the database, and the database structure is, in part, determined by what the code does. Working in a database application involves stepping across a chasm between the "Developer view" and the "DBA view" quite frequently.
- Database Administrators (DBAs) have traditionally worked from the mindset of a Systems Administrator. This mindset gets them thinking that they are simply "managing" a growing and changing system versus being involved in creating a reproducible "release" of an application. Many DBAs never create "releases" of their databases, they simply make changes to an existing database. This contrasts with the view one takes when managing source code where we build and release entirely new versions of components rather than alter existing ones. Changes to the database are also associated with the process of migrating existing data to the new schema. This is an important task, but you can view it as one that depends upon tracking changes to the schema rather than the actual act of tracking changes to the schema.
One approach to managing database changes is to have strict processes in place that control who can make changes, and when and how these people can make them. This is not an acceptable solution in an agile environment, since the database and code are often linked closely together, and bottlenecks in changing the database schema make your team less than agile.
What does it all mean?
We need to define what we mean by the phrase "managing versions of databases." This phrase can mean at least two things, and often both at the same time:
- Managing how the "schema" (the logical structure of the databases) changes over time. Answering this question means that you need to understand some of the following issues:
- o When did a table/column/etc get added or removed?
- o What in the schema changed between last week and today and who made the change?
- o How do I change the structure of an existing database to match the current one?
- Managing how to migrate the data in an existing database into the new schema. This is, or course, essential in a production environment, but is arguably something that you can do with the help of a change management system.
A related task is managing changes to the database data at various points in time, but this is the domain of backup and recovery practice, rather than "version management" practice, so we will not address it here.
The most basic function you need is the ability to track and maintain changes to the schema, and to reproduce an earlier version of the database schema, in particular, one which goes with an earlier version of the application code. A straightforward way to do this is to manage the artifacts of the database.
Versioning the Artifacts
One approach to tracking changes to the schema is to track changes to some artifacts that we use to create and manage the database. This seems analogous to versioning the source code of an application, rather than the executable components.
The key artifacts for a database contain Data Definition Language (DDL) statements, which specify how to create the tables/columns etc. From a DDL file you should be able to create a new, empty, database, from nothing. A file containing Data Manipulation Language (DML) commands specifies how to insert important data into the tables when you have data