Database versioning

Every developers know the importance of source code versioning and how to use SVN, Git etc to achieve but DB versioning seems unfamiliar and more difficult to most of them. But in my opinion, DB versioning is as important as source code versioning, especially when you are working in an Agile project where whole team is responsible to the environment. In this post, I try to describe the common issue with DB and how to solve this.

Problem

Most of software use DB now and of course, it need to be changed during the development and shared with the modules or whole system. And its problem is exact the same as source code where everybody works in only 1 source code repository at the same time:

  • When a developer makes a DB change, how can others aware it? It seems so easy with the source code with SVN or Git because they support the check-in, check-out. So before committing the code, a developer can view the log, see the code changes by other’s check-in.
  • When conflicts occur, how can developers aware and resolve them?

Why do you rarely care?

Let start with the traditional software development process, where the DB:

  • is built in the design stage
  • and it’s mostly fixed during the development.
  • and it’s taken care by a deployment person or team who normally put in charged of environment configuration.

So the developer doesn’t need to care about this. But in the Agile project, its design is unfixed as we may has increments, code or structure refactoring, and the DB schema is often updated sprint by sprint.

What do you usually handle? And the issues?

Follow my experience, there 2 normal ways we are using to handle this problem:

  • Shared environment. The easiest (and sometimes it is the best way) is using shared environment. For example, a developer works on his checked-out source code individually but query on a single shared DB server. It seems a good way but still facing some issues:
    • Performance. Normally it speeds is not as good as using local environment because the shared DB needs to serve all developers’ uses while the local one is more specific on the current module. And it may affect to the productivity.
    • Revision. It so hard to manage the DB revision because it need to go parallel with source code revision where we sometimes roll source code back to the good one. One solution is storing all the DB backup for each commit, but it takes a large amount of storage.
  • Scripting the changes. Whenever we have change on DB, script it and apply to all developer locals. It is the good way to reduce the storage and make easier to version the change. But how team members aware that the scripts are updated and resolve the conflicts?

Solution

DB versioning control

DB versioning control is an approach to keep track the changes on DB schema and data in a shared repository to make sure that team can aware and apply these changes easier.

How does it work?

There are some ways to implement the DB versioning but the most popular one is versioning DB by versioning its change scripts. So the DB versioning problem becomes the source code versioning problem that is solved really good right now.

The details of DB versioning implementation are vary project by project but the key points usually are:

  • DB changes are built by scripts. To reduce the storage and easy to update, we should script DB to files and arrange them in a good structure.
  • DB is under the source control. After scripting DB to files, it makes sense to keep them under the source control because they are the source code now.
  • The DB changes are verified during the project build or application start-up. It makes the developers aware changes that need to be applied. In my opinion, the verification should occur as soon as possible to avoid wasted time. The application start should be failed if the changes haven’t been applied. But it’s better with the failure during the build stage.
  • The DB versions are stored in the DB itself. It’s a good approach to keep the solution simple by for example, a table that stores the run history.

Better solution?

The purpose and a good approach is described above, but there are various specific solutions from project to project. I will have another post to describe an hand-on implementation in a specific project later but you should try to go further by this start as I think DB versioning is very important in Agile projects.

Reference: http://www.infoq.com/articles/db-versioning-scripts