Renaming models in Django without heavy data migrations

Introduction

In the fast-growing modern world, business definitions and terminology change rapidly.

The main goal of programming is to model the world around us, so when such changes occur, we need to reflect them in our codebase, documentation, database, etc.

In this blog post, I’ll tell you about a huge data migration that we were about to execute. We had to find a way to rename the most fundamental model in our database, and we had to do it as painlessly and effortlessly as we could. If you are not interested in the story, you can skip straight to The Execution.

The problem

Real-world example

Let’s say we have a system that is used to manage all the restaurants of a famous brand.

This system has been in production for 3 years and is used by the brand’s staff members. It uses Django for the backend.

The technologies are not important, though. You can follow the same steps with whatever technologies you prefer.

As you may have guessed already, the main database model/table in this system is called Restaurant, and almost everything in the system is related to it.

New business requirement

During the planning of the next sprint, the business folks came up with a new requirement for the system. It was formulated as follows:

Our local restaurants are doing really well and we gain a lot of profit from them. We’ve decided to expand our business model so we’ll occupy more and more places (restaurants, bars, pubs, etc.). We want to be able to group these places so we can gain information from a certain set of places. This means that we need to change our terminology so from now on “Restaurant” will have a different definition and what we called “Restaurant” before will now be called “Place”

You can imagine how we reacted to this new requirement…

Of course, we tried to suggest other solutions to this problem, and other terminology we could use so we wouldn’t need to rename our main model. The business people rejected all of these proposals.

Understanding

They saw the problem as “Just do some find-and-replace and rename some words.” Honestly, if I were them, I’d think the same way.

We managed to explain to them that it was not that simple and that the change would have a massive impact on the whole system and on the development process.

We knew that all of the developers would have to stop working on new features, as resolving the merge conflicts afterwards would be unmanageable.

Furthermore, how would we verify that everything still worked correctly after the change? Tests? The renaming would affect them as well. So, if a test failed, where would the problem be: in the test itself or in the tested code?

In my opinion, after such a massive change, the project could easily slide into an endless cycle of bug regressions.

Conditions

Once both parties were on the same page, we had to decide whether the change was worth the effort.

There were two major factors that determined the cost of the task:

  • Downtime
  • Development freeze

If the cost wasn’t acceptable, we agreed to drop the renaming and look for another solution to the problem.

By acceptable, we meant:

  • Downtime: at most 8 hours (one night)
  • Development freeze: at most half a week

Estimations

Our first job was to estimate both of these costs.

Estimating a task like this is a huge task in itself!

We gathered the whole team to brainstorm and plan the task. There were several points we had to keep in mind:

  1. Make sure we don’t lose any production data!
  2. Make sure we don’t introduce any bugs that lead to a system outage or failure. A change of this size will almost certainly cause some outages, so this point is more like “Make sure we’ve done our best not to introduce any bugs that lead to outages and failures!”
  3. Do we have the resources to make such a change to the system?
  4. Is it even possible to meet the conditions above?

Brainstorming

After a long brainstorming session in which the whole team participated, we came up with a couple of ideas that I’ll describe in this section.

Frontend-only renaming

One of our first ideas was to rename the terminology only on the frontend side of the project (and maybe in the URLs). We realized we’d be shooting ourselves in the foot, so we rejected it almost immediately.

The problem was that we wouldn’t just be renaming “Restaurant” to “Place”: we would keep using “Restaurant” with a different meaning.

For example, let’s say someone reports strange behavior in the “Restaurants List”. Every such report would be followed by a “By Restaurant, do you mean Place or the new Restaurant?” question from our side, and that is not acceptable by any means.

All data migrations at once

We had already performed several data migrations of this kind in the project, but none of them touched such a large part of the system. Still, it was possible to do something like:

  1. Create proxies for the models so you can rename only the models and the database tables
  2. Rename the models and write several data migrations
  3. Rename the rest of the “restaurant” occurrences in the system

The problem was that it wasn’t just one model and its foreign keys: several models had “Restaurant” in their names (e.g. the RestaurantToProvider M2M model).

If we managed to carry out the data migrations following this strategy, we would most likely stay within the downtime limit, but it would take a lot of effort from the whole team and would affect the development process drastically. The cost was too high!

Migrate models one by one

Another approach was to do the renaming model by model and finally rename the huge Restaurant model to Place. This sounded really good, as we could keep developing while one person renamed the system part by part. This step-by-step approach was definitely feasible, but we decided that the overall effort and time it would require were enormous.

You may ask “Why?” Well, a case-insensitive search for “restaurant” in the backend alone gives ~17K matches in ~400 files.
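A count like this is easy to reproduce. A minimal Python sketch, assuming only .py files count as “backend” (our real search may also have covered templates and other files); count_matches is a hypothetical helper:

```python
import re
from pathlib import Path

# Case-insensitive, like the search described above.
PATTERN = re.compile("restaurant", re.IGNORECASE)


def count_matches(root: str, suffix: str = ".py") -> tuple[int, int]:
    """Return (total matches, files with at least one match) under root."""
    total = files = 0
    for path in Path(root).rglob(f"*{suffix}"):
        # Skip bytecode caches and anything that is not a regular file.
        if "__pycache__" in path.parts or not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        hits = len(PATTERN.findall(text))
        if hits:
            total += hits
            files += 1
    return total, files
```

For example, count_matches("my_project/") returns a (matches, files) tuple for the whole project.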

Yes, that is the size of the change we were estimating!

Just think about resolving merge conflicts in renamed files…

We knew we had to come up with something else, even if it wasn’t the most elegant solution (or not elegant at all).

The solution

Half a day of brainstorming and a dozen cups of coffee later, one of my colleagues stood up and said:

The problem is “renaming”, isn’t it? Why don’t we just rename!? Let’s instead think of the problem as “You need to rename variable X to Y in N files”.

And that was absolutely genius!

Here is the plan that we decided to follow:

  • Rename classes, variables, strings, etc. in the code:
    • restaurant -> place
    • Restaurant -> Place
    • RESTAURANT -> PLACE
  • Rename all files and directories following the same pattern
  • Squash all migrations
  • Dump the database data into a JSON file
  • Rename all “restaurant” occurrences to “place” in the JSON file, following the same pattern
  • Apply the new migrations to a fresh database
  • Load the renamed JSON file into the database
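The renaming rule itself is just three case-sensitive substitutions. A minimal Python sketch of it (RENAMES and rename_text are illustrative names, not code from our project):

```python
import re

# The three case-sensitive variants from the plan above.
RENAMES = {
    "restaurant": "place",
    "Restaurant": "Place",
    "RESTAURANT": "PLACE",
}

# One alternation over the three variants. Matching is case-sensitive,
# so other spellings such as "rEstaurant" are deliberately left alone.
_PATTERN = re.compile("|".join(map(re.escape, RENAMES)))


def rename_text(text: str) -> str:
    """Apply the restaurant -> place renaming to a piece of text."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(0)], text)


if __name__ == "__main__":
    print(rename_text("class RestaurantToProvider: restaurant_id = RESTAURANT_ID"))
    # → class PlaceToProvider: place_id = PLACE_ID
```

Because the match is case-sensitive, other spellings are left alone, but any word that merely contains “restaurant” is renamed too, so the resulting diff still needs a review.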

This plan looked dead simple and would satisfy all of our conditions, so we decided to stick with it.

The execution

Here, I’ll list everything I used to execute the plan above.

Before you start, read the NOTA BENE at the end of this post!

Renaming code

Create a new branch:

git checkout -b rename-restaurant-to-place

Open your text editor (or use a command-line tool) and replace every “restaurant” with “place”.

I’m using Sublime Text as my editor. Press ctrl+shift+f to open Find in Files, where you can find and replace across the whole project.

NOTE: Make the search case-sensitive!

Case-sensitive find-replace in Sublime Text

You need to do the same thing for “Restaurant” & “RESTAURANT”.
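If you prefer a script over your editor, the same three case-sensitive replacements can be applied to a whole tree. A sketch under some assumptions: the helper name, the list of file suffixes and the skipped directories are placeholders to adapt to your project, and files are assumed to be UTF-8:

```python
from pathlib import Path

# Applied in this order; each pair is matched case-sensitively.
REPLACEMENTS = [
    ("restaurant", "place"),
    ("Restaurant", "Place"),
    ("RESTAURANT", "PLACE"),
]
SKIP_DIRS = {"__pycache__", ".git", "node_modules"}


def replace_in_tree(root: str, suffixes=(".py", ".html", ".txt", ".js")) -> list[Path]:
    """Rewrite matching files under root in place and return the changed ones."""
    changed = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        if SKIP_DIRS.intersection(path.parts):
            continue  # ignore caches, VCS metadata and vendored code
        original = path.read_text(encoding="utf-8")
        updated = original
        for old, new in REPLACEMENTS:
            updated = updated.replace(old, new)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

It rewrites files in place, so run it on the new branch with a clean working tree and review the result with git diff.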

When you click Replace, you will be asked to confirm that you want to replace all these occurrences:

Sublime replace alert

Click Replace again, and all of the affected files (377 in our case) will be opened in your editor. Now you need to save all of them.

TIP:

Yes, you can save them all at once with File -> Save All.

If you don’t have a key binding for it, go to Preferences -> Key Bindings and add these entries to the array in your user key bindings (the first saves all files, the second closes all open files):

{ "keys": ["ctrl+shift+s"], "command": "save_all" },
{ "keys": ["ctrl+shift+w"], "command": "close_all" }

Renaming files and directories

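To rename all files and directories, I used find together with the Perl-based rename utility (the util-linux rename that some distributions ship uses a different syntax). Here are the commands that worked in our case:

# If you want to get a list of all files and directories that have restaurant in their names: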
find my_project/ -type f \( -name "*restaurant*" ! -iname "*.pyc" \) -o -type d \( -name "*restaurant*" ! -iname "*__pycache__*" \)

# To rename all directories run:
find my_project/ -type d \( -name "*restaurant*" ! -iname "*__pycache__*" \) -exec rename 's/restaurant/place/' '{}' \;

# To rename all files run:
find my_project/ -type f \( -name "*restaurant*" ! -iname "*.pyc" \) -exec rename 's/restaurant/place/' '{}' \;

NOTE: You cannot find and rename files and directories at the same time.

find renames the paths one by one as it walks the tree. For example, if it renames a directory first, the files inside it won’t be renamed, because their old paths no longer exist.

TIP:

Don’t worry if you see an error such as “<path>: No such file or directory”. It means find renamed a parent directory that still contains files or directories that need renaming.

Just rerun the commands until there are no more errors.
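An alternative that avoids rerunning the commands is to walk the tree bottom-up, so a directory is renamed only after everything inside it, and to rename just the last path component. A Python sketch (the helper names are hypothetical; unlike the find commands above, it also renames the capitalized variants):

```python
import os

RENAMES = [("restaurant", "place"), ("Restaurant", "Place"), ("RESTAURANT", "PLACE")]


def new_name(name: str) -> str:
    """Apply the case-sensitive replacements to a single file or directory name."""
    for old, new in RENAMES:
        name = name.replace(old, new)
    return name


def rename_paths(root: str) -> list[tuple[str, str]]:
    """Rename files and directories under root, deepest entries first."""
    renamed = []
    # topdown=False yields a directory only after all of its subdirectories,
    # so renaming a parent never invalidates paths that are still pending.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        if "__pycache__" in dirpath.split(os.sep):
            continue  # mirror the find filters: leave bytecode caches alone
        for name in dirnames + filenames:
            if name == "__pycache__" or name.endswith(".pyc"):
                continue
            target = new_name(name)
            if target == name:
                continue
            src, dst = os.path.join(dirpath, name), os.path.join(dirpath, target)
            if os.path.exists(dst):
                raise FileExistsError(f"refusing to overwrite {dst}")
            os.rename(src, dst)
            renamed.append((src, dst))
    return renamed
```

It refuses to overwrite an existing path instead of silently clobbering it, which matters if both restaurant.py and place.py already exist.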

Squashing migrations

Once you have renamed your models, you can delete all previous migrations. They have already been applied, and the data will be loaded into a fresh database anyway, so it’s OK to just delete them.
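With many apps, deleting the old migration files by hand gets tedious. A minimal sketch, assuming every app keeps its migrations in a migrations package with an __init__.py (delete_migrations is a hypothetical helper, and the site-packages check is an extra safety guard):

```python
from pathlib import Path


def delete_migrations(root: str) -> list[Path]:
    """Delete every migration module except __init__.py under root.

    Point root at your own apps, never at a tree that contains a virtualenv.
    """
    removed = []
    for migrations_dir in Path(root).rglob("migrations"):
        if "site-packages" in migrations_dir.parts:
            continue  # extra guard against third-party packages
        if not (migrations_dir / "__init__.py").is_file():
            continue  # not a Python migrations package
        for path in sorted(migrations_dir.glob("*.py")):
            if path.name != "__init__.py":
                path.unlink()
                removed.append(path)
    return removed
```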

Strictly speaking, this resets the migrations rather than using Django’s squashmigrations command. Either way, it might not be that simple, and you may hit several unexpected problems. We’ll write more articles on this topic in the future, so stay tuned to our blog.

Once you’ve removed the migrations, create fresh ones for your renamed models. In Django, that just means running makemigrations:

python manage.py makemigrations

Dumping data into a JSON

Go back to the master branch.

If you use Django, you can use the dumpdata command. If you use a different technology, look for an equivalent command.

There are several tables that Django creates and uses internally, and you usually want to exclude them. You can find more on this in the docs.

Here is the command that worked for us:

git checkout master

python manage.py dumpdata --exclude=auth --exclude=contenttypes --exclude=sessions --exclude=admin --exclude=django_celery_results.TaskResult -v 1 --indent=2 -o data.json

Why do we --exclude these tables?

It was a good time to get rid of some tables that didn’t store any information we’d need in the future. The sessions table is one of them (NOTE: dropping it means all your users will be logged out). django_celery_results.TaskResult stores the results of Celery tasks that have already run. We don’t really need it: we store the information about failed tasks in a separate place, and the successful ones don’t hold any useful data for us.

The auth and contenttypes tables need to be excluded too: Django creates content types and permissions automatically when you run migrate, so loading them from the dump would cause collisions. (Keep in mind that excluding the whole auth app also skips groups, and users too if you rely on Django’s default User model.)

Finally, we decided to exclude the admin records (the admin’s action log), as we rarely use the Django admin and there is no need to preserve that history.
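To double-check the exclusions, you can count the dumped records per model. dumpdata writes a JSON list in which every record has “model”, “pk” and “fields” keys; the helpers below and the restaurants.restaurant label are hypothetical, and the whole file is loaded into memory:

```python
import json
from collections import Counter


def count_by_model(path: str) -> Counter:
    """Count dumped records per model label, e.g. "restaurants.restaurant"."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)  # dumpdata writes a JSON list of records
    return Counter(record["model"] for record in records)


def unexpected_apps(counts: Counter, excluded=("auth", "contenttypes", "sessions", "admin")) -> set:
    """Return app labels that were meant to be excluded but are still present."""
    return {label.split(".", 1)[0] for label in counts} & set(excluded)
```

The per-model counts are also a cheap baseline to compare against once the data is loaded into the new database.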

I highly recommend revisiting this part of the Django docs. It’s really useful.

Renaming the JSON

As you can imagine, this data.json file is huge. You may struggle to open it in your favorite text editor (I did!). Instead, you can just use vi to rename your fields and tables:

vi -esnc "%s/restaurant/place/g|:wq" data.json
vi -esnc "%s/Restaurant/Place/g|:wq" data.json
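Note that these vi commands replace every occurrence, including inside your actual data: a place whose name contains “Restaurant” would be renamed as well. If that matters to you, a more targeted Python sketch renames only the model labels and field names and leaves the values alone (the helper names are illustrative, and the file is loaded into memory at once):

```python
import json

RENAMES = [("restaurant", "place"), ("Restaurant", "Place"), ("RESTAURANT", "PLACE")]


def rename(name: str) -> str:
    for old, new in RENAMES:
        name = name.replace(old, new)
    return name


def rename_fixture(records: list[dict]) -> list[dict]:
    """Rename model labels and field names; field values are left untouched."""
    renamed = []
    for record in records:
        fields = {rename(key): value for key, value in record["fields"].items()}
        renamed.append({**record, "model": rename(record["model"]), "fields": fields})
    return renamed


def rename_fixture_file(src: str, dst: str) -> None:
    with open(src, encoding="utf-8") as f:
        records = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(rename_fixture(records), f, indent=2, ensure_ascii=False)
```

The trade-off: stored string values that your code rename changed (for example, a choice value such as 'restaurant') are left as they were, so you would have to handle those separately.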

Loading the JSON

This was the trickiest part of the entire solution. It took us about 2 days to debug and load the data into the new database.

Signals

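You may have a really hard time if your business logic relies heavily on Django’s signals. For example, we had some signals that create an object whenever another object is created (e.g. BaseUser -> Profile). loaddata saves every record it loads, so such a receiver fires for each BaseUser in the dump and creates a Profile that can then collide with the Profile record coming from the same dump.

We commented out the signals while running the loaddata command, and that worked well enough for us.

I’d suggest going through all your signals and checking whether something might go wrong.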
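Instead of commenting signals out, you can disconnect them only for the duration of the load. A minimal sketch, assuming each receiver was connected with signal.connect(receiver, sender=...) and without a dispatch_uid (a receiver registered with a dispatch_uid has to be disconnected using that uid). create_profile and BaseUser in the usage comment are hypothetical names:

```python
from contextlib import contextmanager


@contextmanager
def signals_disconnected(connections):
    """Temporarily disconnect (signal, receiver, sender) triples.

    Each signal only needs connect()/disconnect() methods that accept
    (receiver, sender=...), as django.dispatch.Signal does.
    """
    connections = list(connections)
    for signal, receiver, sender in connections:
        signal.disconnect(receiver, sender=sender)
    try:
        yield
    finally:
        # Reconnect even if the load fails halfway through.
        for signal, receiver, sender in connections:
            signal.connect(receiver, sender=sender)


# Usage (create_profile and BaseUser are hypothetical):
#     from django.core.management import call_command
#     from django.db.models.signals import post_save
#     with signals_disconnected([(post_save, create_profile, BaseUser)]):
#         call_command("loaddata", "data.json")
```

It is also worth knowing that loaddata sends pre_save and post_save with raw=True, so a receiver can simply return early when kwargs.get("raw") is true.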

TIP:

You can’t imagine how many times you will need to dump the data and try to load it back. One thing that really helped me was creating a new database, so I had two “parallel” databases: the original one for master and a test one for the renamed branch. The following commands are for Postgres:

git checkout rename-restaurant-to-place
sudo -u postgres createdb -O <db_owner> <test_db_name>

Open a new terminal and point DATABASE_URL at the new database:

export DATABASE_URL='postgres:///<test_db_name>'

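And here is the Django setting that handles this (env comes from the django-environ package):

# settings.py
import environ

env = environ.Env()

DATABASES = {
    # Falls back to the original database when DATABASE_URL is not set.
    'default': env.db('DATABASE_URL', default='postgres:///db_name'),
}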

Now you need to apply your squashed migrations to the new database. In Django, you just run the migrate command:

python manage.py migrate

You are ready for your first attempt at loading the data:

python manage.py loaddata data.json

NOTE: As I said, this is the hardest step, and the command alone usually takes a long time to finish. Don’t give up if loaddata fails: go back to master, fix the problems, and dump the database again, as many times as it takes.
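When loaddata complains about an invalid model identifier or a field that doesn’t exist, the rename often missed something. Before another full dump-and-load round, a quick scan of the renamed file can help. A sketch with a hypothetical helper that lists model labels and field names still mentioning “restaurant”, ignoring field values:

```python
import json
import re

OLD_NAME = re.compile("restaurant", re.IGNORECASE)


def find_leftovers(path: str) -> dict[str, set[str]]:
    """Map model labels to labels/field names that still mention the old name."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    leftovers: dict[str, set[str]] = {}
    for record in records:
        # Check the model label and the field names, never the field values.
        names = [record["model"], *record.get("fields", {})]
        hits = {name for name in names if OLD_NAME.search(name)}
        if hits:
            leftovers.setdefault(record["model"], set()).update(hits)
    return leftovers
```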

That’s it!

NOTA BENE!

This solution was only tested locally. For better or worse, it never had to go to production: the business people decided they didn’t need the renaming after all, because the change would confuse their staff members rather than help them.

You can try it in production, but at your own risk!

If the model you are renaming is not that crucial and the change doesn’t impact the system on that scale, I highly recommend doing regular data migrations instead (following the Django guide).
