Active Record Migration is one of the greatest features in Rails. It actually help to reduce a very large and complicated amount of stuffs in development process. Most actions in migration such as creating table, add fields, remove fields, add index, remove index to existing table are simple and easy to do since the existing document in Rails Guides is very straight forward.
On the other hand, something like importing large amount of data, or moving data from one to another large table, migration become very complicated and panic.
Given that we have 1 millions record of products in the database. And each product
belongs_to
one global_category
.
As developers, we want to change the relation belongs_to :global_category
to be
has_many :categories
instead.
If the existing product contains no global_category, then product categories
should contain a category named Uncategorized
.
See example below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
|
The code looks fine and simple from the outside.
It is actually not really fine, and good practice for large amount of data migration at all.
The Product.all.each
, will load all products from your database into the
memory. Once you got 1 million record of products, you will be running out of
memory and will cause and error.
The suggestion is to use Product.find_each
instead, which will load only 1000
record continuously and will not consume too much memory.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
|
Next, what will happen when there is expected errors happen? Actually, the program will crash and exit in the middle on the road. We don’t know what cause the problem and where the problem might happen. So we have to update the exisitng migration code to be rerunable, and rerun the whole migration again.
This is not an ideal solution at all. We can’t make sure that the new updated code won’t cause any other problems.
The suggestion for this is to find out any products that cause the problem during migration process, and store those products some where for later debug. Then recue from any error happen on running time and continue the working until all data are migrated.
See the refactorred example below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
|
From the example above, all exceptions raised will be rescued with the code in the
rescue block which will push any error products into the unprocessable_products
that will be writted into the error.txt
file, and the migration still continue to run for the rest of the records.
Once the migration is finished, you just need to check in the error.txt
file
which will list all the error product ids. With the ids in hand, you could now
debug your code easily. Anyway, if you couldn’t find the error.txt
file, it
means there is no error at all, yeee.
Hope this help! Happy migrating!