
Bulk creating related models in Django

Consider two simple related models:

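from django.db import models

class A(models.Model):
    id = models.IntegerField(primary_key=True)

class B(models.Model):
    # on_delete is required since Django 2.0; CASCADE was the implicit default before that
    a = models.ForeignKey(A, on_delete=models.CASCADE)
    # other fields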

I need to do a very large bulk insertion on B:

lots_of_b_objects = [B(a_id=1234), B(a_id=5678), ...] 
B.objects.bulk_create(lots_of_b_objects)

(Note that for performance reasons I never actually hold A objects in the bulk creation; I only refer to each A by its well-known id, whether or not that row exists yet.)

What's a highly performant way to ensure all the related A objects also exist?

Right now the best solution I have is to predetermine the set of related A's and run get_or_create() for each. This isn't fast enough. Is there a better way to create all the A objects before doing the bulk insert?
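The "predetermine the set of related A's" step needs no database access: the ids are already on the unsaved B objects. A minimal pure-Python sketch, where `FakeB` is a hypothetical stand-in for `B` that carries only `a_id`:

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class FakeB:
    """Hypothetical stand-in for an unsaved B instance; only a_id matters here."""
    a_id: int


def referenced_a_ids(b_objects: Iterable[FakeB]) -> Set[int]:
    """Return the distinct A primary keys referenced by the B objects."""
    return {b.a_id for b in b_objects}


objs = [FakeB(1234), FakeB(5678), FakeB(1234)]
ids = referenced_a_ids(objs)
```

Deduplicating first matters because get_or_create() always issues at least one query per call, so the three B objects above cost two lookups rather than three.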

De-normalizing the models is not an option here, since the data model is slightly more complicated than described.

Yuval Adam asked Dec 29 '25 08:12



1 Answer

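It's a somewhat hackish approach, but something like this should be far faster than calling get_or_create() in a loop, since it needs only a couple of queries (one SELECT plus a bulk INSERT) for all the A ids instead of at least one query per id. Results may vary from case to case, so I can't promise it will suit your situation.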

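lots_of_b_objects = [B(a_id=1234), B(a_id=5678), ...]
# The distinct A ids referenced by the B objects about to be inserted
a_ids = {b.a_id for b in lots_of_b_objects}

# One SELECT to find which of those As already exist...
existing_As = A.objects.filter(id__in=a_ids).values_list('id', flat=True)
# ...and a single bulk_create() for the ones that don't
As_to_create = a_ids - set(existing_As)
A.objects.bulk_create([A(id=x) for x in As_to_create])

# Now we are sure all the As exist as we just created them, so
B.objects.bulk_create(lots_of_b_objects)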
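With a very large number of distinct ids, a single `id__in` lookup can exceed a database's limit on bound parameters per statement (older SQLite builds allow only 999). A minimal sketch of the same set difference done in chunks, assuming a caller-supplied `fetch_existing` callable (hypothetical; in Django it could wrap `A.objects.filter(id__in=chunk).values_list('id', flat=True)`) and an arbitrary chunk size of 500. The fake table below is only there so the logic can be checked without a database:

```python
from typing import Callable, Iterable, Iterator, List, Set


def chunked(items: List[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def missing_ids(
    wanted: Iterable[int],
    fetch_existing: Callable[[List[int]], Iterable[int]],
    chunk_size: int = 500,  # assumption: comfortably under common parameter limits
) -> Set[int]:
    """Return the ids in `wanted` that `fetch_existing` does not report as existing."""
    wanted_ids = sorted(set(wanted))
    existing: Set[int] = set()
    for chunk in chunked(wanted_ids, chunk_size):
        existing.update(fetch_existing(chunk))  # one lookup per chunk
    return set(wanted_ids) - existing


# Stand-in for rows already present in A's table (hypothetical data)
existing_in_db = {1234, 42}
queries: List[int] = []


def fake_fetch(chunk: List[int]) -> List[int]:
    queries.append(len(chunk))  # record one simulated query per chunk
    return [i for i in chunk if i in existing_in_db]


to_create = missing_ids([1234, 5678, 5678, 42, 7], fake_fetch, chunk_size=2)
```

Here the four distinct ids are checked in two simulated queries, and only 7 and 5678 are left to pass to `bulk_create()`.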
Muhammad Tahir answered Dec 31 '25 23:12

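One caveat with the filter-then-create approach: if another process inserts the same A between the SELECT and the bulk_create(), the insert fails with an IntegrityError. On Django 2.2 and later, `A.objects.bulk_create([A(id=x) for x in a_ids], ignore_conflicts=True)` skips the existence check and lets the database drop rows whose primary key already exists; on SQLite Django issues this as `INSERT OR IGNORE`. The sketch below shows that idea with the standard-library `sqlite3` module rather than the ORM, so the table names `a` and `b` and the helper function are assumptions for illustration only:

```python
import sqlite3
from typing import List


def ensure_parents_then_insert_children(conn: sqlite3.Connection, a_ids: List[int]) -> None:
    """Insert every referenced A id (skipping existing ones), then the B rows."""
    with conn:  # both inserts commit together, or roll back together on error
        conn.executemany(
            "INSERT OR IGNORE INTO a (id) VALUES (?)",  # existing ids are silently skipped
            [(a_id,) for a_id in set(a_ids)],
        )
        conn.executemany(
            "INSERT INTO b (a_id) VALUES (?)",
            [(a_id,) for a_id in a_ids],
        )


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE b (id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " a_id INTEGER NOT NULL REFERENCES a(id))"
)
conn.execute("INSERT INTO a (id) VALUES (1234)")  # pretend this A already exists
conn.commit()

ensure_parents_then_insert_children(conn, [1234, 5678, 5678, 42])
```

Because the foreign key is enforced, the B inserts would fail if any referenced A were still missing; here all four B rows succeed and only the two new A ids are added.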