Thursday, 19 June 2008

Database schema in source control

Getting your database schema under source control is an important part of any development process as your solution isn't just the finely crafted code you've built, it's also the database schemas it uses.

There are a variety of ways of bringing your schema into SCC, such as using Visual Studio Database Professional Edition, or one of the fine offerings from red gate, at an absolute minimum you should keep a script that is capable of recreating your schema, but with a little planning and a simple custom tool, it's reasonably straight forward to manage your schema very well without spending a fortune on tools. I present below the mechanism I'm using for the next release of flux. If you've not yet got your database schemas under source control, I'd recommend doing it now!

The way I approach database schemas in source control is to setup a project with the following directories (if using visual studio, you can setup a simple database project and then create the structure shown below).


The concept is, we define an immutable baseline that represents the initial structure of our database tables that, once committed to source control, will not be changed. We then apply any version updates to the structure to update it to the latest version before finally dropping and re-creating any dynamic elements such as stored procedures, views and functions.

The baseline directory contains a single script to establish the initial structure of the database. At an absolute minimum this establishes a DatabaseVersions table as below and inserts a record into it that states the current version is 1.0.0.


Note that the baseline scripts should not include the creation of any stored procedures, views, functions or other dynamic elements - these are re-created each time we run the update process against our database.

With the baseline in place, the updates directory is then used to apply change scripts to the database as each change is made. Each change script makes all of the adjustments necessary in one atomic unit. Again updates are only to the tables, keys and constraints and not the dynamic elements which are covered shortly. As each update is applied, it sets the version number in the DatabaseVersions table to the new version.

Each update is made up of one single script, and the file is named according to the version of the database that it updates. eg: If your script will update a version 2.0.4 database to 3.0.0, the script will be named update020004.sql.


The dynamic directory contains all of the various scripts that will drop and re-create any views, stored procedures, functions etc. Each object is represented by it's own script and each script should start by querying if the object already exists before dropping it and re-creating.

The dynamic elements allow the database schema to be validated - for example, if a view uses a field that has been dropped in a recent update, the update process will fail as the view cannot be re-created with this erroneous field.

As a manual process this would all be reasonably tiresome, so I suggest the creation of a custom build task, or command line application that you can run against a database, it's objectives being;

  1. Connect to the database (optionally create it)
  2. Does the DatabaseVersions table exist? If not, run the initial baseline script against the database.
  3. Read the current version number from DatabaseVersions.
  4. If a script exists named updateXXYYZZ.sql in the Updates directory, run it and then repeat from step 3.
  5. Execute every script contained in the dynamic directory.

All of the above would be completed within one transaction, so if any part of an update failed, the entire update phase would fail.

Having this process in place allows you to keep your schema as safe as you keep your code, but also keeps the overhead of maintaining it reasonably low.

You can even use the above process during the initial development of your baseline. Your baseline starts out as being nothing more than the creation of the database version table, with each update then adding the tables as you develop them. Once your happy with your database and want to make it the new baseline, simply re-script it's structure, overwrite the current baseline and remove the update scripts.

Your tool to process the scripts should also be capable of generating standalone update scripts that can be shipped with your application. Your app will need a script to create the initial release 1 database, but will also then need a single "update v1 to v2" script, followed by an "update v2 to v3" script and so on. This should be achieved by running the tool in script generation mode. It would then;

1. Spit out a script to create the entire database in it's present version. Incorporating baseline, all updates and all dynamic items.

2. Spit out a script to move from baseline to v2, v2 to v3 and so on - but without the dynamic components. Along with a separate script to update the dynamic components.

This would mean that the user of your app only needs to run the main script if they are installing a new instance, or run the individual update scripts followed by the single dynamic script to update an existing instance.

[Edit: The source code to a tool that implements all of the above is now available from this post or from codeplex here]