Getting non-unique values from a Django query

I'm writing a script in which I want to get every occurrence of a value from visited sites.

First I get sites visited:

sd = SessionData.objects.filter(session_id__mlsession__platform__exact=2)
result = sd.values('last_page')

I then get the values that I'm expecting:

[{'last_page': 10L}, {'last_page': 4L}, {'last_page': 10L}]

With that, I want the page with 10L as its id to have double the weight of 4L, since it appears twice.

I then try to fetch the keyword data for the pages in that list:

wordData = KeywordData.objects.filter(page_id__in=result)

but the result contains only unique entries:

[<KeywordData: 23>, <KeywordData: 24>, <KeywordData: 8>]

whereas the outcome I want is:

[<KeywordData: 23>, <KeywordData: 24>, <KeywordData: 8>, <KeywordData: 23>, <KeywordData: 24>]

The only way I've managed to keep the duplicates is by iterating in a for loop, but that isn't really an option, since the data I'm dealing with has millions of entries.

Is the "__in" filter in Django designed to return only unique entries? Is there a way to get the right output "the Django way"?

Thank you in advance for your help!

EDIT: The relevant models:

class KeywordData(models.Model):
    page = models.ForeignKey(Page, db_column='page_id', related_name='page_pageid', default=None)
    site = models.ForeignKey(Page, db_column='site_id', related_name='page_siteid', default=None)
    keywords = models.CharField(max_length=255, blank=True, null=True, default=None)

class MLSession(models.Model):
    session = models.ForeignKey(Session, null=True, db_column='session_id')
    platform = models.IntegerField(choices=PLATFORM_CHOICE)
    visitor_type = models.IntegerField(default=1)

class SessionData(models.Model):
    session = models.ForeignKey(Session, db_column='session_id', on_delete=models.CASCADE)
    site = models.ForeignKey(Site, db_column='site_id', db_index=True, default=None, null=True)
    last_page = models.ForeignKey(Page, db_column='last_page_id', default=None, null=True, related_name='session_last_page')
    first_page = models.ForeignKey(Page, db_column='first_page_id', default=None, null=True, related_name='session_first_page')

The Session and Page tables are referenced only by their IDs, which are auto-incremented.

I want to look at the last page of each session (so only last_page_id matters) and get the keywords for that page. If the same page is often the last page, it should get more weight, as stated above.

Let me know if some more information is needed, and thanks again!

Helga Sigurðardóttir asked Jan 01 '26 17:01


1 Answer

Is the "__in" filter in django made to only return unique entries?

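The __in filter in Django maps directly to the IN condition in SQL, so the behavior you've observed is expected. IN is a membership test: a KeywordData row is returned once if its page_id appears anywhere in the list, and repeating a value in the list does not repeat the row.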
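To illustrate, here is a minimal sketch using Python's built-in sqlite3 module rather than Django. The keyword_data table is a hypothetical, stripped-down stand-in for KeywordData (only id and page_id, with made-up rows), not the real schema. Passing the page IDs 10, 4, 10 to IN returns each matching row only once:

```python
import sqlite3

# Hypothetical in-memory table standing in for KeywordData: (id, page_id).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keyword_data (id INTEGER PRIMARY KEY, page_id INTEGER)")
conn.executemany(
    "INSERT INTO keyword_data (id, page_id) VALUES (?, ?)",
    [(23, 10), (24, 10), (8, 4)],
)

# The same last-page IDs as in the question, including the repeated 10.
last_pages = [10, 4, 10]
placeholders = ", ".join("?" for _ in last_pages)

# IN only asks "is page_id in this list?", so the repeated 10
# does not produce any extra rows.
rows = conn.execute(
    f"SELECT id FROM keyword_data WHERE page_id IN ({placeholders}) ORDER BY id",
    last_pages,
).fetchall()
in_ids = [row[0] for row in rows]
print(in_ids)  # [8, 23, 24]
```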

If you want duplicate rows, you should probably reframe your query as an SQL JOIN. You didn't post your models, so I'm forced to guess here, but the following Django query should give you what you want:

KeywordData.objects.filter(page__session_last_page__session_id__mlsession__platform=2)
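A toy sqlite3 sketch can show why the JOIN version keeps the duplicates. This is hand-written SQL, not the SQL Django actually generates; the keyword_data and session_data tables are hypothetical reductions of KeywordData and SessionData (the platform filter is omitted, and the rows are made up). Each keyword row is emitted once per session whose last page matches, and a GROUP BY over the same join turns those repeats into the weight the question asks for:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE keyword_data (id INTEGER PRIMARY KEY, page_id INTEGER);
    CREATE TABLE session_data (id INTEGER PRIMARY KEY, last_page_id INTEGER);
    """
)
# Hypothetical keyword rows: 23 and 24 belong to page 10, 8 to page 4.
conn.executemany("INSERT INTO keyword_data (id, page_id) VALUES (?, ?)",
                 [(23, 10), (24, 10), (8, 4)])
# Three sessions whose last pages are 10, 4 and 10, as in the question.
conn.executemany("INSERT INTO session_data (id, last_page_id) VALUES (?, ?)",
                 [(1, 10), (2, 4), (3, 10)])

# JOIN: one output row per (keyword row, matching session) pair,
# so the keyword rows of page 10 appear twice.
joined = [row[0] for row in conn.execute(
    """
    SELECT k.id FROM keyword_data AS k
    JOIN session_data AS s ON s.last_page_id = k.page_id
    ORDER BY k.id
    """
)]
print(joined)  # [8, 23, 23, 24, 24]

# Aggregating the same join computes each keyword row's weight in the database.
weights = dict(conn.execute(
    """
    SELECT k.id, COUNT(*) FROM keyword_data AS k
    JOIN session_data AS s ON s.last_page_id = k.page_id
    GROUP BY k.id
    ORDER BY k.id
    """
).fetchall())
print(weights)  # {8: 1, 23: 2, 24: 2}
```

In the ORM, the analogous weighting step would presumably be an annotate() with Count() on the filtered queryset; printing str(queryset.query) shows the SQL Django really emits, which is worth checking before relying on it for millions of rows.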
Kevin Christopher Henry answered Jan 03 '26 05:01


