
Parallel serialization in Django REST Framework? Or other methods of speeding up model serialization?

The app I'm developing needs to deliver a list of ORM objects to the user via a REST endpoint. The list of objects is large - up to 500 or 600 at a time (pagination is not an option).

The model looks something like this:

class PositionItem(models.Model):
    account = models.ForeignKey(core_models.Portfolio, on_delete=models.CASCADE)
    macro_category = models.ForeignKey(MacroCategory, on_delete=models.SET_NULL, null=True)
    category = models.ForeignKey(Category, on_delete=models.SET_NULL, null=True)
    sub_category = models.ForeignKey(SubCategory, on_delete=models.SET_NULL, null=True)

I started with a standard ModelSerializer with many=True set, but it was very slow, taking up to 12 seconds to serialize all the objects. I cut the run time by a few seconds by pre-fetching the foreign keys the endpoint needs with the .select_related() method, and by another couple of seconds by replacing the ModelSerializer with a custom serializer function that just maps the fields to a dictionary, with none of the validation overhead. However, it's still slow (6-7 seconds) and I'd like to optimize further. I thought about parallelizing the serialization instead, but I'm having trouble implementing it.

My custom serializer looks like this:

def serialize_position_record(record):

    account = record.account
    macro_category = record.macro_category
    category = record.category
    sub_category = record.sub_category

    return {
        'account': account.portfolio_id,
        'macro_category': macro_category.macro_category,
        'category': category.category,
        'sub_category': sub_category.sub_category,
        'sorting': {
            'macro_category': macro_category.sort_order,
            'category': category.sort_order,
            'sub_category': sub_category.sort_order

        }
    }
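
A side note on the function above: the three category foreign keys are declared with null=True, so record.macro_category and the others can be None, and the attribute lookups would then raise AttributeError. Below is a minimal None-safe sketch of the same mapping. It assumes plain SimpleNamespace stand-ins rather than real model instances, and the _attr helper and the sample values are hypothetical, introduced only for illustration.

```python
from types import SimpleNamespace


def _attr(obj, name):
    """Return obj.<name>, or None when the related object itself is None."""
    return getattr(obj, name) if obj is not None else None


def serialize_position_record(record):
    # The three category relations are nullable, so each may be None.
    macro_category = record.macro_category
    category = record.category
    sub_category = record.sub_category

    return {
        'account': record.account.portfolio_id,
        'macro_category': _attr(macro_category, 'macro_category'),
        'category': _attr(category, 'category'),
        'sub_category': _attr(sub_category, 'sub_category'),
        'sorting': {
            'macro_category': _attr(macro_category, 'sort_order'),
            'category': _attr(category, 'sort_order'),
            'sub_category': _attr(sub_category, 'sort_order'),
        },
    }


# Stand-in record whose sub_category foreign key is unset (NULL).
record = SimpleNamespace(
    account=SimpleNamespace(portfolio_id=7),
    macro_category=SimpleNamespace(macro_category='Equity', sort_order=1),
    category=SimpleNamespace(category='US Large Cap', sort_order=2),
    sub_category=None,
)
result = serialize_position_record(record)
```

With this variant a missing relation comes back as None in the output dictionary instead of aborting the whole request; whether None is the right placeholder depends on what the API consumer expects.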

I've tried multiprocessing with a Pool:

import multiprocessing
import models
import utils

items = models.Item.objects.select_related().filter(account__user=user)
pool = multiprocessing.Pool(4)
serialized_items = pool.map(utils.serialize_position_record, items)

but this hangs for at least 60 seconds (probably more, I killed it before it returned anything).

I also tried threading using the multiprocessing.dummy API:

import multiprocessing.dummy  # the dummy submodule is not loaded by "import multiprocessing" alone
import models
import utils

items = models.Item.objects.select_related().filter(account__user=user)
pool = multiprocessing.dummy.Pool(4)
serialized_items = pool.map(utils.serialize_position_record, items)

but I get exceptions:

Traceback (most recent call last):
  File "/Users/xx/venvs/web-portal/lib/python3.7/site-packages/django/db/models/fields/related_descriptors.py", line 164, in __get__
    rel_obj = self.field.get_cached_value(instance)
  File "/Users/xx/venvs/web-portal/lib/python3.7/site-packages/django/db/models/fields/mixins.py", line 13, in get_cached_value
    return instance._state.fields_cache[cache_name]
KeyError: 'sub_category'

and

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/xx/venvs/web-portal/lib/python3.7/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.UndefinedTable: relation "reports_subcategory" does not exist
LINE 1: ...", "reports_subcategory"."sort_order" FROM "reports_s...

So I'm at a loss as to what to do. I don't write a lot of parallel code - am I writing something incorrectly? Or is there a better way to optimize this process than parallelization? Have I hit the ceiling for performance? I'm also using django-tenants, as this is a multi-tenant app - I'm not sure whether that's contributing to the relation-doesn't-exist error.

Any ideas?

jsxgd asked Nov 25 '25 04:11



1 Answer

Consider using a third-party serialization library such as serpy or marshmallow. Both claim to provide significant speed improvements over the native Django REST Framework serializers.

Serpy provides some detailed benchmarks in its docs.

[Figure: Serialization Times According to Serpy]

Both libraries are fairly intuitive if you're already familiar with serializers in Django REST Framework.

Bitsplease answered Nov 27 '25 16:11
