Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to query Django ORM on models connected with latest record

Hi I have a Django Project with the following models

class Organization(models.Model):
    name = models.CharField("Name", max_length=128, unique=True)
    description = models.CharField("Description", max_length=256)

class Scholar(models.Model):
    organization = models.ForeignKey(Organization, on_delete=models.CASCADE)
    name = models.CharField("Name", max_length=128)
    title = models.CharField("Title", max_length=256)
    
    def get_latest_snapshot(self):
        return self.snapshotscholar_set.latest('date_crawled')

class SnapshotScholar(models.Model):
    scholar = models.ForeignKey(Scholar, on_delete=models.CASCADE)
    date_crawled = models.DateTimeField("Date Crawled", auto_now_add=True, db_index=True)
    title = models.CharField("Title", max_length=256)

class SnapshotScholarPublication(models.Model):
    snapshot_scholar = models.ForeignKey(SnapshotScholar, on_delete=models.CASCADE)
    title = models.CharField("Title", max_length=256)
    citation_count = models.IntegerField()

Now when I want to get a count of the number of scholars per organization I can do Organization.objects.annotate(num_scholars=Count('scholar')). But how do I get the number of publication per organization when I only want to count publications of the latest snapshot. That is, I want it sorted by SnapshotScholar.date_crawled, and I want all the publications on the latest SnapshotScholar in the database.

Going by some of the questions on here I managed to create this SQL -

SELECT COUNT(pub.id) as publications, org.id
FROM (main_snapshotscholarpublication pub, main_snapshotscholar snap, main_scholar scholar, main_organization org)
INNER JOIN (
    SELECT MAX(main_snapshotscholar.date_crawled) as latest_date, main_snapshotscholar.scholar_id as 'id'
    FROM main_snapshotscholar
    GROUP BY main_snapshotscholar.scholar_id
) as latest_snap ON (latest_snap.id = snap.id)
WHERE pub.snapshot_scholar_id = snap.id
AND snap.scholar_id = scholar.id
AND scholar.organization_id = org.id
GROUP BY org.id

The results I'm getting with this raw SQL is an error margin of 1-5% from the actual count numbers. Can someone please help me figure out how to get the right results?

Thanks

EDIT: As per request of one of the people answering the question here is some sample data - https://pastebin.com/4ZJkymeb . Just load the data with python manage.py loaddata data.json

like image 597
arunkumar Avatar asked Oct 22 '25 15:10

arunkumar


2 Answers

You can use Subquery for getting the results. For example:

from django.db.models import OuterRef, Subquery, Q, F, Count

subquery = SnapshotScholar.objects.filter(scholar=OuterRef('pk')).order_by('-date_crawled')
queryset = Scholar.objects.annotate(max_scholar_snapshot=Subquery(subquery.values('pk')[:1])).annotate(publication_count=Count('snapshotscholar__snapshotscholarpublication', filter=Q(snapshotscholar=F('max_scholar_snapshot'))))
queryset.values('organization', 'pk', 'publication_count')

Here I am first annotating the SnapshotScholar information with the queryset by Subquery. I am ordering the SnapshotScholar using date_crawled, then attaching the latest one's id to the queryset. Then I am running the Count for SnapshotScholarPublication based on filtering the value of SnapshotScholar which was found in last step.

like image 65
ruddra Avatar answered Oct 25 '25 04:10

ruddra


Building up on @ruddra's answer, have a try with this:

snapshot_subquery = SnapshotScholar.objects.filter(
    scholar__organization=OuterRef('pk'),
).order_by('-date_crawled')

organizations = Organization.objects.annotate(
    latest_snapshot=Subquery(snapshot_subquery.values('pk')[:1]),
).annotate(
    latest_snapshot_publication_count=Count(
        'scholar__snapshotscholar__snapshotscholarpublication', 
        filter=Q(scholar__snapshotscholar=F('latest_snapshot'))
    ),
)

It is important to first order the snapshot scholar by descending date_crawled, and then limiting the subquery rows to one row..[Django-doc] using slice notation.

like image 37
Brian Destura Avatar answered Oct 25 '25 06:10

Brian Destura



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!