Consider two simple related models:
from django.db import models

class A(models.Model):
    id = models.IntegerField(primary_key=True)

class B(models.Model):
    a = models.ForeignKey(A, on_delete=models.CASCADE)
    # other fields
Before doing a very large bulk insertion on B:
lots_of_b_objects = [B(a_id=1234), B(a_id=5678), ...]
B.objects.bulk_create(lots_of_b_objects)
(Note that for performance reasons I never actually hold A objects during the bulk creation; I only refer to them by their well-known id, whether or not a matching row exists.)
What's a highly performant way to ensure all the related A objects also exist?
Right now the best solution I have is to predetermine the set of related A's and run get_or_create() for each. This isn't fast enough. Is there a better way to create all the A objects before doing the bulk insert?
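For reference, the per-id baseline looks something like this (a minimal sketch; a_ids is a name assumed here for the set of A ids the new B rows will reference):

# hypothetical: the distinct A ids referenced by the B rows to be inserted
a_ids = {1234, 5678}

# get_or_create() issues at least one query per id -- this is the slow part
for a_id in a_ids:
    A.objects.get_or_create(id=a_id)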
De-normalizing the models is not an option here, since the data model is slightly more complicated than described.
It's a bit hackish, but something like this should be far faster than calling get_or_create() in a loop (though it may vary case by case, so I can't say whether it is valid for your situation):
# a_ids is the set of A ids referenced by the B rows about to be inserted
existing_As = A.objects.filter(id__in=a_ids).values_list('id', flat=True)
As_to_create = list(set(a_ids) - set(existing_As))
A.objects.bulk_create([A(id=x) for x in As_to_create])

# All the referenced As now exist, since we just created the missing ones, so:
lots_of_b_objects = [B(a_id=1234), B(a_id=5678), ...]
B.objects.bulk_create(lots_of_b_objects)
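One caveat: between the filter() and the bulk_create(), another process could insert some of the same A rows and trigger an IntegrityError. If that can happen in your setup, a sketch of an alternative (not part of the answer above, and requiring Django 2.2+, where bulk_create() accepts ignore_conflicts) is to skip the existence check entirely:

# Django 2.2+: the database silently skips ids that already exist,
# so no pre-filtering is needed and the race window disappears.
A.objects.bulk_create(
    [A(id=x) for x in a_ids],
    ignore_conflicts=True,
)
B.objects.bulk_create(lots_of_b_objects)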