Not All Migrations are Equal: Schema vs. Data

Update 01/26/2015: Check out the nondestructive_migrations gem. It’s similar to dimroc/datafix but simpler because it leverages existing AR code. It does not however generate specs… yet.


You’ve been using Active Record Migrations to manage changes in your database and you love it. But then a model’s validations change, and all your existing data becomes invalid.

What do you do? Place it in an AR migration? Depends. Those are primarily for schema migrations and this is not a schema change.

You need to run a data migration.

What are your options?

Stuff it into an AR migration? If it’s simple enough.

It would probably be your first move. Sure, you can get away with a few hundred or even thousand of rows and no one will break a sweat. Let’s look at how you would do that.

Bad Schema Migration

class ChangeAdminDefaultToFalseOnUsers < ActiveRecord::Migration
  def up
    change_column_default(:users, :admin, false)
    User.reset_column_information

    # Bad: Use of application code that changes over time.
    User.update_null_to_false! 
  end
end

Good Schema Migration

class ChangeAdminDefaultToFalseOnUsers < ActiveRecord::Migration
  # Create empty AR model that will attach to the users table,
  # and isolate migration from application code.
  class User < ActiveRecord::Base; end

  def up
    change_column_default(:users, :admin, false)
    User.where(admin:nil).update_all(admin: false)
  end
end

Better Schema Migration

class ChangeAdminDefaultToFalseOnUsers < ActiveRecord::Migration
  def up
    change_column_default(:users, :admin, false)
    execute "UPDATE users SET admin = false WHERE admin IS NULL"
  end
end

With most people’s usage pattern, everything in db/migrate/ has to live for months due to habitual rake db:migrate invocations, which is why using application code in an AR migration is frowned upon (sure, you can move onto schema:load and delete migrations, but let’s keep things simple for now).

That application code will change weeks or even days from now, and then running rake db:migrate will be busted.

Most people get by using the Good and Better schema migration methods, but there comes a time when either the scale or the complexity of the migration warrants its own code. The time when pure SQL will only get you so far or when the runtime of the migration spans days not seconds.

What do you do?

Create a one off rake task? No.

Perhaps, but the code will be difficult to test and won’t have mechanisms in place to roll back to changes. Even if you refactor the logic out of the rake task into a separate ruby class, you will now have to maintain code that is ephemeral in nature. It merely exists for this one off data migration.

One approach is to create a oneshots.rake file, but that ends up being a ghetto of random tasks with no test coverage that never gets cleaned up

Datafixes! Yes.

Basically a mirror of AR migrations, every rails user will feel right at home with datafixes.

Install the gem from my repo:

gem 'datafix', github: 'dimroc/datafix' # The changes will eventually be incorporated into the main gem `datafix`

Run the generator to create the datafix template:

> rails g datafix AddValidWholesalePriceToProducts
  create  db/datafixes/20141211143848_add_valid_wholesale_price_to_products.rb
  create  spec/db/datafixes/20141211143848_add_valid_wholesale_price_to_products_spec.rb

Fill out the datafix with your data migration:

class Datafixes::AddValidWholesalePriceToProducts < Datafix
  def self.up
    Product.where(wholesale_price_cents: nil).each do |product|
      product.update_attributes({
        wholesale_price_cents: product.fetch_price_from_amazon
      })
    end
  end

  def self.down
  end
end

Then just run the rake tasks:

> rake db:datafix
  migrating AddValidWholesalePriceToProducts up

> rake db:datafix:status

  database: somedatabase_development

   Status   Datafix ID            Datafix Name
  --------------------------------------------------
     up    20141211143848       AddValidWholesalePriceToProducts

Unlike AR migrations, it generates specs:

require "rails_helper"
require Rails.root.join("db", "datafixes", "20141211143848_add_valid_wholesale_price_to_products")

describe Datafixes::AddValidWholesalePriceToProducts do
  describe ".up" do
    # Fill out the describe block
    let!(:product) do
      product = FactoryGirl.build(:product, wholesale_price_cents: nil)
      product.save(validate: false)
      product
    end

    it "should fix the price and be valid" do
      expect(product).to_not be_valid
      subject.migrate('up')
      expect(product.reload).to be_valid
    end
  end
end

And the real kicker: when the code has overstayed its welcome, you can just delete the datafix. That’s not so simple with a schema migration in db/migrate/. The datafix is ephemeral in nature and isn’t worth maintaining months down the road.

This is super handy in all the scenarios:

  1. Denormalizing values to another table
  2. Changing data to comply with changing validations
  3. Long running data migrations that span days
  4. Migrating from one table to another

Wrap Up

For data migrations, datafixes are far better than anything out there, but it’s still brand new and rough around the edges. It doesn’t even have rake db:datafix:rollback yet! Check it out!

Note

The dimroc fork has many upgrades to the Casecommons version, including the rake tasks that function like *rake db:migrate. It will eventually be incorporated into the Casecommons version when they stop sending email and look at the PR.*

References

comments powered by Disqus