TL;DR This article is targeted at programmers who have very little or no experience with fakers and factories. If you are already skilled in the topic this article may not be that interesting to you.
Introduction
In our Django apps we have the M(odel)T(emplate)V(iew) structure (mostly known as MVC). When we want to test its functionality we usually have to create some model instances and work with them and the database.
A nice and easy approach for doing so is to create fake data in our tests in the form of factories and fakers.
In the following article we are going to look over some practical examples and techniques. This will improve your tests readability and behavior.
What are we going to use?
In our Django projects we use faker to generate fake data and factory_boy to create factories of our models.
Why and Where can I use them?
Why?
- We usually want to generate some random strings and numbers.
- We usually want to generate some unique instances of our models that have no common fields.
Where?
Have you ever hard coded “asdf” or 1 in your tests where you needed something randomly?
Furthermore, let’s say we have the model User and every user has field name. We want to test some functionality connected to it. If our tests need just one user we can easily name it “Adam” and everything will work fine (except all our of tests will have the user “Adam”).
That seems OK but what happens when we want to test anything connected to a list of 10, 100, 1000 users? We have to think of a 1000 unique names? Or we can name them in randomly typed characters?
All of this is possible but will surely block our work flow and will definitely bring us trouble.
That’s where our faker and factories come in!
Simple Django project
For better illustration of my examples, I’ve created a simple Django app. You can find it here.
The project is called MyLibrary and has the following model schema:
That is how it looks in our models.py
file:
# in models.py import uuid from djmoney.models.fields import MoneyField from django.db import models from django.utils import timezone from my_library.users.models import User class Library(models.Model): address = models.CharField(max_length=255) librarian = models.OneToOneField('User', related_name='library', on_delete=models.CASCADE) class Book(models.Model): library = models.ForeignKey(Library, related_name='books', on_delete=models.CASCADE) public_id = models.UUIDField(unique=True, default=uuid.uuid4, editable=False) title = models.CharField(max_length=255) description = models.CharField(max_length=500) class BookBorrow(models.Model): book = models.ForeignKey(Book, related_name='book_borrows', on_delete=models.CASCADE) user = models.ForeignKey(User, related_name='book_borrows', on_delete=models.CASCADE) start_date = models.DateTimeField(default=timezone.now) end_date = models.DateTimeField(null=True, blank=True) charge = MoneyField(max_digits=10, decimal_places=2, default_currency='GBP') returned = models.BooleanField(default=False) class Meta: unique_together = ('book', 'user', 'start_date', )
Functionality
Let’s say we want to have a view where we show all books borrowed by a single user. This is how this simple view would look like:
# in views.py from django.shortcuts import render from django.http import HttpResponseNotAllowed from .models import BookBorrow def book_borrow_list(request, user_id): if request.method == 'GET': object_list = BookBorrow.objects.filter(user__id=user_id) return render(request, 'book_borrow_list.html', locals()) return HttpResponseNotAllowed(['GET'])
Now we have some custom logic in our system. That’s right! It’s time for adding some tests.
Plan our tests
Let’s take a look at our view. It should list all books that are borrowed by a user. This straightforwardly lead us to two test cases:
- Test if all borrowed books by user X are listed;
- Test if listed borrowed books are only for user X (not for user Y, now for user Z and so on).
Test it
Let’s add our first test. For this example we want to test the behavior of our system:
- If we have 5 books borrowed by a single user we have to assure that the view lists them all.
- If we have 5 books borrowed by a single user and 5 books borrowed by another user we have to assure that the view lists the books of the user whose ID is given in the url.
# in tests/test_views.py from test_plus import TestCase from my_library.users.models import User from ..models import Book, BookBorrow, Library class BookBorrowListViewTests(TestCase): def setUp(self): self.librarian = User.objects.create(name='Librarian', email='librarian@test.com', is_superuser=True) self.library = Library.objects.create(address='Test address', librarian=self.librarian) self.user = User.objects.create(name='Tester', email='tester@test.com') self.url = self.reverse('library:book_borrow_list', user_id=self.user.id) def test_with_several_books_borrowed_by_one_user(self): book1 = Book.objects.create( library=self.library, title='Test Title 1', description='asdf' ) book2 = Book.objects.create( library=self.library, title='Test Title 2', description='asdf' ) book3 = Book.objects.create( library=self.library, title='Test Title 3', description='asdf' ) book4 = Book.objects.create( library=self.library, title='Test Title 4', description='asdf' ) book5 = Book.objects.create( library=self.library, title='Test Title 5', description='asdf' ) BookBorrow.objects.create(user=self.user, book=book1, charge=1) BookBorrow.objects.create(user=self.user, book=book2, charge=1) BookBorrow.objects.create(user=self.user, book=book3, charge=1) BookBorrow.objects.create(user=self.user, book=book4, charge=1) BookBorrow.objects.create(user=self.user, book=book5, charge=1) response = self.get(self.url) self.assertEqual(200, response.status_code) self.assertContains(response, book1.title) self.assertContains(response, book2.title) self.assertContains(response, book3.title) self.assertContains(response, book4.title) self.assertContains(response, book5.title) def test_with_several_books_borrowed_by_one_user_and_another_user(self): another_user = User.objects.create(name='Another user', email='another@test.com') user1_book1 = Book.objects.create( library=self.library, title='Test Title 1', description='asdf' ) user1_book2 = Book.objects.create( library=self.library, title='Test Title 2', description='asdf' ) user1_book3 = Book.objects.create( library=self.library, title='Test Title 3', description='asdf' ) user1_book4 = Book.objects.create( library=self.library, title='Test Title 4', description='asdf' ) user1_book5 = Book.objects.create( library=self.library, title='Test Title 5', description='asdf' ) user2_book1 = Book.objects.create( library=self.library, title='Test Title A', description='asdf' ) user2_book2 = Book.objects.create( library=self.library, title='Test Title B', description='asdf' ) user2_book3 = Book.objects.create( library=self.library, title='Test Title C', description='asdf' ) user2_book4 = Book.objects.create( library=self.library, title='Test Title D', description='asdf' ) user2_book5 = Book.objects.create( library=self.library, title='Test Title F', description='asdf' ) BookBorrow.objects.create(user=self.user, book=user1_book1, charge=1) BookBorrow.objects.create(user=self.user, book=user1_book2, charge=1) BookBorrow.objects.create(user=self.user, book=user1_book3, charge=1) BookBorrow.objects.create(user=self.user, book=user1_book4, charge=1) BookBorrow.objects.create(user=self.user, book=user1_book5, charge=1) BookBorrow.objects.create(user=another_user, book=user2_book1, charge=1) BookBorrow.objects.create(user=another_user, book=user2_book2, charge=1) BookBorrow.objects.create(user=another_user, book=user2_book3, charge=1) BookBorrow.objects.create(user=another_user, book=user2_book4, charge=1) BookBorrow.objects.create(user=another_user, book=user2_book5, charge=1) response = self.get(self.url) self.assertEqual(200, response.status_code) self.assertContains(response, user1_book1.title) self.assertContains(response, user1_book2.title) self.assertContains(response, user1_book3.title) self.assertContains(response, user1_book4.title) self.assertContains(response, user1_book5.title) self.assertNotContains(response, user2_book1.title) self.assertNotContains(response, user2_book2.title) self.assertNotContains(response, user2_book3.title) self.assertNotContains(response, user2_book4.title) self.assertNotContains(response, user2_book5.title)
Urgh…
I think that’s what most of you thought when you saw these 2 tests. Yes, they are working correctly but imagine generating 100 objects instead of 5 or having more than 2 tests for a view (which you will most likely have).
Let’s review what is wrong (even though all of you noticed it):
- The tests are so long in code lines that you need a couple of scroll-downs to cover them. This is totally a precondition to not read the test at all (yes, the thing you did a couple of seconds ago)
- Imagine adding a new required field to the model that you use to generate objects…
- We have only 2 tests but in real life projects you will have way more for your views.
- Typing random strings and numbers in your tests is never a good idea.
- R.I.P DRY
Let’s start refactoring
Since we summarized the bad practices that our tests are currently introducing, we are ready to start refactoring them step by step.
Use faker instead of hard coded values
First, what is faker?
Faker is a Python package that generates fake data for you.
The first step we are going to make is to use faker’s powers for our hard coded values. This is how our tests are going to look like:
from test_plus import TestCase from faker import Factory from my_library.users.models import User from my_library.library.models import Book, BookBorrow, Library faker = Factory.create() class BookBorrowListViewTests(TestCase): def setUp(self): self.librarian = User.objects.create(name=faker.name(), email=faker.email(), is_superuser=True) self.library = Library.objects.create(address=faker.street_address(), librarian=self.librarian) self.user = User.objects.create(name=faker.name(), email=faker.email()) self.url = self.reverse('library:book_borrow_list', user_id=self.user.id) def test_with_several_books_borrowed_by_one_user(self): book1 = Book.objects.create( library=self.library, title=faker.word(), description=faker.text() ) book2 = Book.objects.create( library=self.library, title=faker.word(), description=faker.text() ) book3 = Book.objects.create( library=self.library, title=faker.word(), description=faker.text() ) book4 = Book.objects.create( library=self.library, title=faker.word(), description=faker.text() ) book5 = Book.objects.create( library=self.library, title=faker.word(), description=faker.text() ) BookBorrow.objects.create(user=self.user, book=book1, charge=faker.random_number()) BookBorrow.objects.create(user=self.user, book=book2, charge=faker.random_number()) BookBorrow.objects.create(user=self.user, book=book3, charge=faker.random_number()) BookBorrow.objects.create(user=self.user, book=book4, charge=faker.random_number()) BookBorrow.objects.create(user=self.user, book=book5, charge=faker.random_number()) response = self.get(self.url) self.assertEqual(200, response.status_code) self.assertContains(response, book1.title) self.assertContains(response, book2.title) self.assertContains(response, book3.title) self.assertContains(response, book4.title) self.assertContains(response, book5.title) def test_with_several_books_borrowed_by_one_user_and_another_user(self): another_user = User.objects.create(name=faker.name(), email=faker.email()) user1_book1 = Book.objects.create( library=self.library, title=faker.word(), description=faker.text() ) user1_book2 = Book.objects.create( library=self.library, title=faker.word(), description=faker.text() ) user1_book3 = Book.objects.create( library=self.library, title=faker.word(), description=faker.text() ) user1_book4 = Book.objects.create( library=self.library, title=faker.word(), description=faker.text() ) user1_book5 = Book.objects.create( library=self.library, title=faker.word(), description=faker.text() ) user2_book1 = Book.objects.create( library=self.library, title=faker.word(), description=faker.text() ) user2_book2 = Book.objects.create( library=self.library, title=faker.word(), description=faker.text() ) user2_book3 = Book.objects.create( library=self.library, title=faker.word(), description=faker.text() ) user2_book4 = Book.objects.create( library=self.library, title=faker.word(), description=faker.text() ) user2_book5 = Book.objects.create( library=self.library, title=faker.word(), description=faker.text() ) BookBorrow.objects.create(user=self.user, book=user1_book1, charge=faker.random_number()) BookBorrow.objects.create(user=self.user, book=user1_book2, charge=faker.random_number()) BookBorrow.objects.create(user=self.user, book=user1_book3, charge=faker.random_number()) BookBorrow.objects.create(user=self.user, book=user1_book4, charge=faker.random_number()) BookBorrow.objects.create(user=self.user, book=user1_book5, charge=faker.random_number()) BookBorrow.objects.create(user=another_user, book=user2_book1, charge=faker.random_number()) BookBorrow.objects.create(user=another_user, book=user2_book2, charge=faker.random_number()) BookBorrow.objects.create(user=another_user, book=user2_book3, charge=faker.random_number()) BookBorrow.objects.create(user=another_user, book=user2_book4, charge=faker.random_number()) BookBorrow.objects.create(user=another_user, book=user2_book5, charge=faker.random_number()) response = self.get(self.url) self.assertEqual(200, response.status_code) self.assertContains(response, user1_book1.title) self.assertContains(response, user1_book2.title) self.assertContains(response, user1_book3.title) self.assertContains(response, user1_book4.title) self.assertContains(response, user1_book5.title) self.assertNotContains(response, user2_book1.title) self.assertNotContains(response, user2_book2.title) self.assertNotContains(response, user2_book3.title) self.assertNotContains(response, user2_book4.title) self.assertNotContains(response, user2_book5.title)
Check faker documentation for more useful methods.
What did we do?
To be honest, we have the same test functionality as before but now there are no 'asdf'
, 'Test Title F'
, 1
, etc. Faker is generating values like these randomly and we don’t need to bother about similar problems any more.
Now we have no hard coded strings and integers in our code. This is actually a good achievement, especially compared to the previous implementation, but there is still a lot of code that is repeated.
The next step is to create some factories.
What is actually a Factory?
Factories are Python classes that behave similarly to Django models – they write to your database as Django models do. The thing we love about them is that they give us a lot of automation.
In other words, instead of doing <Model>.objects.create(...)
we are going to have ModelFactory()
and it will do the magic for us.
Let’s create factories for our models:
# in factories.py import factory from faker import Factory from my_library.users.models import User from my_library.library.models import ( Book, Library, BookBorrow, ) faker = Factory.create() class UserFactory(factory.DjangoModelFactory): class Meta: model = User name = faker.name() email = faker.email() class LibraryFactory(factory.DjangoModelFactory): class Meta: model = Library address = faker.street_address() librarian = factory.SubFactory(UserFactory) class BookFactory(factory.DjangoModelFactory): class Meta: model = Book library = factory.SubFactory(LibraryFactory) title = faker.word() description = faker.text() class BookBorrowFactory(factory.DjangoModelFactory): class Meta: model = BookBorrow book = factory.SubFactory(BookFactory) user = factory.SubFactory(UserFactory) charge = faker.random_number()
We usually like to store out factories in a different app called seed
. That’s how you can easily create only one faker instance. Second great benefit is that you have all your factories in one place (since models can have foreign keys to other models in different apps) so you can make SubFactories easier.
Little clarification
In simple words:
- In
class Meta
define themodel
of your factory - Use
faker
to generate random values factory.SubFactory(MyModelFactory) == models.ForeignKey(MyModel)
Use them
Since we have our factories, we had better use them in the tests:
from test_plus import TestCase from my_library.seed.factories import ( UserFactory, BookFactory, LibraryFactory, BookBorrowFactory, ) class BookBorrowListViewTests(TestCase): def setUp(self): self.librarian = UserFactory(is_superuser=True) self.library = LibraryFactory(librarian=self.librarian) self.user = UserFactory() self.url = self.reverse('library:book_borrow_list', user_id=self.user.id) def test_with_several_books_borrowed_by_one_user(self): book1 = BookFactory(library=self.library) book2 = BookFactory(library=self.library) book3 = BookFactory(library=self.library) book4 = BookFactory(library=self.library) book5 = BookFactory(library=self.library) BookBorrowFactory(user=self.user, book=book1) BookBorrowFactory(user=self.user, book=book2) BookBorrowFactory(user=self.user, book=book3) BookBorrowFactory(user=self.user, book=book4) BookBorrowFactory(user=self.user, book=book5) response = self.get(self.url) self.assertEqual(200, response.status_code) self.assertContains(response, book1.title) self.assertContains(response, book2.title) self.assertContains(response, book3.title) self.assertContains(response, book4.title) self.assertContains(response, book5.title) def test_with_several_books_borrowed_by_one_user_and_another_user(self): another_user = UserFactory() user1_book1 = BookFactory(library=self.library) user1_book2 = BookFactory(library=self.library) user1_book3 = BookFactory(library=self.library) user1_book4 = BookFactory(library=self.library) user1_book5 = BookFactory(library=self.library) user2_book1 = BookFactory(library=self.library) user2_book2 = BookFactory(library=self.library) user2_book3 = BookFactory(library=self.library) user2_book4 = BookFactory(library=self.library) user2_book5 = BookFactory(library=self.library) BookBorrowFactory(user=self.user, book=user1_book1) BookBorrowFactory(user=self.user, book=user1_book2) BookBorrowFactory(user=self.user, book=user1_book3) BookBorrowFactory(user=self.user, book=user1_book4) BookBorrowFactory(user=self.user, book=user1_book5) BookBorrowFactory(user=another_user, book=user2_book1) BookBorrowFactory(user=another_user, book=user2_book2) BookBorrowFactory(user=another_user, book=user2_book3) BookBorrowFactory(user=another_user, book=user2_book4) BookBorrowFactory(user=another_user, book=user2_book5) response = self.get(self.url) self.assertEqual(200, response.status_code) self.assertContains(response, user1_book1.title) self.assertContains(response, user1_book2.title) self.assertContains(response, user1_book3.title) self.assertContains(response, user1_book4.title) self.assertContains(response, user1_book5.title) self.assertNotContains(response, user2_book1.title) self.assertNotContains(response, user2_book2.title) self.assertNotContains(response, user2_book3.title) self.assertNotContains(response, user2_book4.title) self.assertNotContains(response, user2_book5.title)
Now we don’t really interact directly with the database from the tests and they….
Oops!
Yes, if you have tried it yet you will know that our tests started failing! The reason is that we started using factories and faker.
Why? Since our faker is supposed to generate random values when we need them, it’s pretty common for it (and all other structures that work the same way) to have something like a seed for it’s random generator.
In other words, we have to call faker.<provider>
lazily every time we need new one. Thank God, factory_boy has an easy solution to this – LazyAttribute
!
Become Lazy!
This is how our factories’ fields become lazy where needed:
import factory
from faker import Factory
from my_library.users.models import User
from my_library.library.models import (
Book,
Library,
BookBorrow,
)
faker = Factory.create()
class UserFactory(factory.DjangoModelFactory):
class Meta:
model = User
name = faker.name()
email = faker.email()
class LibraryFactory(factory.DjangoModelFactory):
class Meta:
model = Library
address = factory.LazyAttribute(lambda _: faker.street_address())
librarian = factory.SubFactory(UserFactory)
class BookFactory(factory.DjangoModelFactory):
class Meta:
model = Book
library = factory.SubFactory(LibraryFactory)
title = factory.LazyAttribute(lambda _: faker.word())
description = faker.text()
class BookBorrowFactory(factory.DjangoModelFactory):
class Meta:
model = BookBorrow
book = factory.SubFactory(BookFactory)
user = factory.SubFactory(UserFactory)
charge = faker.random_number()
NOTE: factory.LazyAttribute
expects function as an argument. It’s convenient to use lambda
here
After this fix our tests started succeeding!
DRY is still broken
Although we are completely random and cool in our tests at the moment, we still repeat a lot of code – we have <Model>.objects.create
in a lot of places.
There are people that still underestimate code repetition in their tests but in my opinion if you keep DRY principals in your project you have to preserve them in the tests too!
Factories love DRY
Fortunately, factory developers have given us the ability to run away from this bad practice in such an easy way – create_batch
:
from test_plus import TestCase
from my_library.seed.factories import (
UserFactory,
BookFactory,
LibraryFactory,
BookBorrowFactory,
)
class BookBorrowListViewTests(TestCase):
def setUp(self):
self.librarian = UserFactory(is_superuser=True)
self.library = LibraryFactory(librarian=self.librarian)
self.user = UserFactory()
self.url = self.reverse('library:book_borrow_list',
user_id=self.user.id)
def test_with_several_books_borrowed_by_one_user(self):
books = BookFactory.create_batch(5, library=self.library)
for book in books:
BookBorrowFactory(user=self.user, book=book)
response = self.get(self.url)
self.assertEqual(200, response.status_code)
for book in books:
self.assertContains(response, book.title)
def test_with_several_books_borrowed_by_one_user_and_another_user(self):
another_user = UserFactory()
user1_books = BookFactory.create_batch(5, library=self.library)
user2_books = BookFactory.create_batch(5, library=self.library)
for book in user1_books:
BookBorrowFactory(user=self.user, book=book)
for book in user2_books:
BookBorrowFactory(user=another_user, book=book)
response = self.get(self.url)
self.assertEqual(200, response.status_code)
for book in user1_books:
self.assertContains(response, book.title)
for book in user2_books:
self.assertNotContains(response, book.title)
This is how we literally create a batch of objects without repeating our code!
Even though we are using factories, they are behaving like objects from the ORM – you can easily manipulate and overwrite their values.
Conclusion
Faker and factories are beneficial to have in our tests. If we are smart enough we can use them in other places – while debugging, creating seeds, etc.
This article was just an introduction to factories. If you are interested in the topic or just want to check how we handle problems that occur during testing, , write your questions in the comments!
This is really helpful. Thanks!
You can use Faker directly from factory instead of going through the LazyAttribute. E.g.
factory.Faker('street_address')
instead offactory.LazyAttribute(lambda _: faker.street_address())
Yes, that’s correct. The advantage of using the
LazyAttribute
is that it allows us to instantiate theFaker
instance just once. This allows you to use your custom providers and seeds if you have any.Wouldn’t it make sense to pass the actual faker functions to the lazy attributes instead of wrapping calls to them in lambdas? Like this:
address = factory.LazyAttribute(lambda _: faker.street_address())
becomes:
address = factory.LazyAttribute(faker.street_address)
`
Yes, you are correct.
The
lambda
usage here would be more suitable if we pass any arguments to the faker provide, e.g.age = factory.LazyAttribute(lambda _: faker.pyint(min_value=18, max_value=100, step=1)
I’ve updated the example to illustrate what I mean. I’ll use your suggestion as well, thank you! 🙂
Great article!
Also checkout https://github.com/mimesis-lab/mimesis-factory it is faster (it hits you when you have a lot of tests!) and has more built-in data and languages.
Thanks, will definitely check this out 🙂
Thx this article is Helpful