The app I'm developing needs to deliver a list of ORM objects to the user via a REST endpoint. The list of objects is large - up to 500 or 600 at a time (pagination is not an option).
The model looks something like this:
class PositionItem(models.Model):
    account = models.ForeignKey(core_models.Portfolio, on_delete=models.CASCADE)
    macro_category = models.ForeignKey(MacroCategory, on_delete=models.SET_NULL, null=True)
    category = models.ForeignKey(Category, on_delete=models.SET_NULL, null=True)
    sub_category = models.ForeignKey(SubCategory, on_delete=models.SET_NULL, null=True)
I started with a standard ModelSerializer with many=True, but it was very slow - up to 12 seconds to serialize all the objects. I shaved off a few seconds by pre-fetching the foreign keys the endpoint needs with .select_related(), and a couple more by replacing the ModelSerializer with a custom serializer function that just maps the fields to a dictionary, with none of the validation overhead. It's still slow (6-7 seconds), though, and I'd like to optimize further. I thought about parallelizing the serialization, but I'm having trouble implementing it.
My custom serializer looks like this:
def serialize_position_record(record):
    account = record.account
    macro_category = record.macro_category
    category = record.category
    sub_category = record.sub_category
    return {
        'account': account.portfolio_id,
        'macro_category': macro_category.macro_category,
        'category': category.category,
        'sub_category': sub_category.sub_category,
        'sorting': {
            'macro_category': macro_category.sort_order,
            'category': category.sort_order,
            'sub_category': sub_category.sort_order
        }
    }
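For context, the view currently applies this function to the queryset serially, roughly like this (just a sketch of the call pattern, with the variable names assumed):

items = models.PositionItem.objects.select_related().filter(account__user=user)
serialized_items = [utils.serialize_position_record(item) for item in items]

That serial loop is where the remaining 6-7 seconds go.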
I've tried multiprocessing with a Pool:
import multiprocessing
import models
import utils
items = models.PositionItem.objects.select_related().filter(account__user=user)
pool = multiprocessing.Pool(4)
serialized_items = pool.map(utils.serialize_position_record, items)
but this hangs for at least 60 seconds (probably more, I killed it before it returned anything).
I also tried threading using the multiprocessing.dummy API:
import multiprocessing.dummy
import models
import utils
items = models.PositionItem.objects.select_related().filter(account__user=user)
pool = multiprocessing.dummy.Pool(4)
serialized_items = pool.map(utils.serialize_position_record, items)
but I get exceptions:
Traceback (most recent call last):
File "/Users/xx/venvs/web-portal/lib/python3.7/site-packages/django/db/models/fields/related_descriptors.py", line 164, in __get__
rel_obj = self.field.get_cached_value(instance)
File "/Users/xx/venvs/web-portal/lib/python3.7/site-packages/django/db/models/fields/mixins.py", line 13, in get_cached_value
return instance._state.fields_cache[cache_name]
KeyError: 'sub_category'
and
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/xx/venvs/web-portal/lib/python3.7/site-packages/django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
psycopg2.errors.UndefinedTable: relation "reports_subcategory" does not exist
LINE 1: ...", "reports_subcategory"."sort_order" FROM "reports_s...
So I'm at a loss as to what to do. I don't write much parallel code - am I doing something wrong here? Or is there a better way to optimize this besides parallelization? Have I hit the performance ceiling? I'm also using django-tenants, since this is a multi-tenant app - I'm not sure whether that's contributing to the relation-does-not-exist error.
Any ideas?
Consider using a third-party serialization library such as serpy or marshmallow. Both claim significant speed improvements over the native Django REST Framework serializers.
Serpy publishes detailed benchmarks in its docs.

Both libraries are fairly intuitive if you're already familiar with Django REST Framework serializers.
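For example, a serpy version of the serializer in your question might look roughly like this. It's an untested sketch: the attr paths mirror your custom function and, like it, assume the nullable foreign keys are populated.

import serpy

class PositionItemSerializer(serpy.Serializer):
    # Dotted attr paths read values off the related objects.
    account = serpy.Field(attr='account.portfolio_id')
    macro_category = serpy.Field(attr='macro_category.macro_category')
    category = serpy.Field(attr='category.category')
    sub_category = serpy.Field(attr='sub_category.sub_category')
    sorting = serpy.MethodField()

    def get_sorting(self, obj):
        return {
            'macro_category': obj.macro_category.sort_order,
            'category': obj.category.sort_order,
            'sub_category': obj.sub_category.sort_order,
        }

# Serialize the whole queryset in one call.
data = PositionItemSerializer(items, many=True).data

Since serpy does no validation and just builds plain dicts, it should behave much like your hand-written function, only expressed as a declarative class.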