Not All Migrations are Equal: Schema vs. Data
11 Dec 2014
Update 01/26/2015: Check out the nondestructive_migrations gem. It’s similar to dimroc/datafix but simpler because it leverages existing AR code. It does not, however, generate specs… yet.
You’ve been using Active Record Migrations to manage changes in your database and you love it. But then a model’s validations change, and all your existing data becomes invalid.
What do you do? Place it in an AR migration? Depends. Those are primarily for schema migrations and this is not a schema change.
You need to run a data migration.
What are your options?
Stuff it into an AR migration? If it’s simple enough.
It would probably be your first move. Sure, you can get away with it for a few hundred or even a few thousand rows and no one will break a sweat. Let’s look at how you would do that.
Bad Schema Migration
Good Schema Migration
Better Schema Migration
With most people’s usage pattern, everything in db/migrate/ has to live for months due to habitual rake db:migrate invocations, which is why using application code in an AR migration is frowned upon (sure, you can move onto schema:load and delete migrations, but let’s keep things simple for now). That application code will change weeks or even days from now, and then running rake db:migrate will be busted.
Most people get by using the Good and Better schema migration methods, but there comes a time when either the scale or the complexity of the migration warrants its own code. That is the point where pure SQL will only get you so far, or where the runtime of the migration spans days, not seconds.
What do you do?
Create a one off rake task? No.
Perhaps, but the code will be difficult to test and won’t have mechanisms in place to roll back the changes. Even if you refactor the logic out of the rake task into a separate Ruby class, you will now have to maintain code that is ephemeral in nature. It merely exists for this one-off data migration.
One approach is to create a oneshots.rake file, but that ends up being a dumping ground of random tasks with no test coverage that never gets cleaned up.
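To make that trade-off concrete, here is a minimal sketch of the refactoring. The BackfillMissingStatus class, the oneshots:backfill_missing_status task, and the User model it loads are hypothetical names, not part of any gem. The explicit require is only there so the snippet stands alone; a file under lib/tasks/ would not need it.

```ruby
require "rake"

# Plain Ruby object holding the one-off logic, so it can be unit tested
# without going through rake. It accepts any collection of objects that
# respond to #status and #update!.
class BackfillMissingStatus
  def initialize(default: "active")
    @default = default
  end

  # Fills in the default where status is nil; returns how many records changed.
  def call(records)
    changed = 0
    records.each do |record|
      next unless record.status.nil?
      record.update!(status: @default)
      changed += 1
    end
    changed
  end
end

namespace :oneshots do
  desc "One-off: backfill users.status (hypothetical)"
  task backfill_missing_status: :environment do
    # Assumes a Rails app with a User model; only evaluated when the task runs.
    changed = BackfillMissingStatus.new.call(User.where(status: nil))
    puts "Backfilled #{changed} users"
  end
end
```

Even with the logic extracted like this, the class and its spec still need a home, and someone has to remember to delete them later.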
Datafixes! Yes.
Datafixes are basically a mirror of AR migrations, so every Rails user will feel right at home with them.
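To illustrate what “a mirror of AR migrations” means in practice, here is a toy stand-in written in plain Ruby. It is not the datafix gem’s API: the class name, the in-memory rows, and the backfilled flag are all invented for the example. The point is only the shape, a paired up and down that applies and then reverts a data change.

```ruby
# Toy stand-in for a data fix with a migration-style up/down pair.
class BackfillStatusFix
  DEFAULT = "active"

  # Apply: fill in missing statuses and remember which rows were touched.
  def self.up(rows)
    rows.each do |row|
      next unless row[:status].nil?
      row[:status] = DEFAULT
      row[:backfilled] = true
    end
  end

  # Revert: undo only the rows that up changed.
  def self.down(rows)
    rows.each do |row|
      row[:status] = nil if row.delete(:backfilled)
    end
  end
end

rows = [{ id: 1, status: nil }, { id: 2, status: "banned" }]
BackfillStatusFix.up(rows)   # row 1 now has status "active"
BackfillStatusFix.down(rows) # row 1 is back to status nil
```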
Install the gem from my repo:
Run the generator to create the datafix template:
Fill out the datafix with your data migration:
Then just run the rake tasks:
Unlike AR migrations, it generates specs:
And the real kicker: when the code has overstayed its welcome, you can just delete the datafix. That’s not so simple with a schema migration in db/migrate/. The datafix is ephemeral in nature and isn’t worth maintaining months down the road.
This is super handy in scenarios like these:
- Denormalizing values to another table
- Changing data to comply with changing validations
- Long running data migrations that span days
- Migrating from one table to another
Wrap Up
For data migrations, I find datafixes far better than anything else out there, but the gem is still brand new and rough around the edges. It doesn’t even have rake db:datafix:rollback yet! Check it out!
Note
The dimroc fork has many upgrades to the Casecommons version, including the rake tasks that function like rake db:migrate. It will eventually be incorporated into the Casecommons version when they stop sending email and look at the PR.