
How to implement Where Exists in Django?

I have two models in Django, one for Songs and one for Albums; an Album has many Songs. I am trying to filter Albums down to those with valid Songs: at least one Song must have an audio file for the Album to be returned by the filter. I am using Postgres.

I am trying to figure out how to express this logic with a Django QuerySet, but I am not certain how to get a WHERE EXISTS clause instead of an annotated EXISTS.

The following is the Django ORM statement I am trying to get to work:

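from django.db.models import Exists, OuterRef

# Correlated subquery: songs of the outer album that have an audio file.
valid_songs = Song.objects.filter(
    album=OuterRef('pk'),
    audio_file__isnull=False).only("album")

# Annotate each album with the EXISTS result, then filter on it.
Album.objects.annotate(
    valid_song=Exists(valid_songs)).filter(
    valid_song=True).query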

This is the query that is generated:

SELECT "api_album"."id", 
       "api_album"."created_at", 
       "api_album"."updated_at", 
       "api_album"."title", 
       "api_album"."artwork_file_id", 
       "api_album"."user_id", 
       "api_album"."description", 
       "api_album"."tags", 
       "api_album"."genres", 
       EXISTS(SELECT U0."id", 
                     U0."album_id" 
              FROM   "api_song" U0 
              WHERE  ( U0."album_id" = ( "api_album"."id" ) 
                       AND U0."audio_file_id" IS NOT NULL )) AS "valid_song" 
FROM   "api_album" 
WHERE  EXISTS(SELECT U0."id", 
                     U0."album_id" 
              FROM   "api_song" U0 
              WHERE  ( U0."album_id" = ( "api_album"."id" ) 
                       AND U0."audio_file_id" IS NOT NULL )) = true 

This is the Postgres query plan for the query above, as generated by Django's QuerySet:

Seq Scan on api_album  (cost=0.00..287.95 rows=60 width=641)
 Filter: (alternatives: SubPlan 3 or hashed SubPlan 4)
 SubPlan 3
   ->  Seq Scan on api_song u0_2  (cost=0.00..1.54 rows=1 width=0)
         Filter: ((audio_file_id IS NOT NULL) AND (album_id = api_album.id))
 SubPlan 4
   ->  Seq Scan on api_song u0_3  (cost=0.00..1.43 rows=10 width=4)
         Filter: (audio_file_id IS NOT NULL)
 SubPlan 1
   ->  Seq Scan on api_song u0  (cost=0.00..1.54 rows=1 width=0)
         Filter: ((audio_file_id IS NOT NULL) AND (album_id = api_album.id))
 SubPlan 2
   ->  Seq Scan on api_song u0_1  (cost=0.00..1.43 rows=10 width=4)
         Filter: (audio_file_id IS NOT NULL)
(14 rows)

However, there is a much more efficient query for this:

SELECT * 
FROM   "api_album" 
WHERE  EXISTS(SELECT U0."id", 
                     U0."album_id" 
              FROM   "api_song" U0 
              WHERE  ( U0."album_id" = ( "api_album"."id" ) 
                       AND U0."audio_file_id" IS NOT NULL )) 

Hash Semi Join  (cost=1.55..13.26 rows=10 width=640)
 Hash Cond: (api_album.id = u0.album_id)
 ->  Seq Scan on api_album  (cost=0.00..11.20 rows=120 width=640)
 ->  Hash  (cost=1.43..1.43 rows=10 width=4)
       ->  Seq Scan on api_song u0  (cost=0.00..1.43 rows=10 width=4)
             Filter: (audio_file_id IS NOT NULL)
(6 rows)

So my questions are as follows:

  1. What is the difference between WHERE EXISTS and EXISTS in this scenario, and why aren't the same query plans created?
  2. How do I get the Django ORM to generate the more efficient query?

Edit: the Django models are as follows:

class Album(BaseModel):
    title = models.CharField(max_length=255, blank=False)
    artwork_file = models.ForeignKey(
        S3File, null=True, on_delete=models.CASCADE,
        related_name="album_artwork_file")
    user = models.ForeignKey(settings.AUTH_USER_MODEL,
                             related_name="albums",
                             on_delete=models.CASCADE)
    description = models.TextField(blank=True)
    tags = ArrayField(models.CharField(
        max_length=16), default=default_arr)
    genres = ArrayField(models.CharField(
        max_length=16), default=default_arr)



class Song(BaseModel):
    title = models.CharField(max_length=255, blank=False)
    album = models.ForeignKey(Album,
                              related_name="songs",
                              on_delete=models.CASCADE)
    audio_file = models.ForeignKey(
        S3File, null=True, on_delete=models.CASCADE,
        related_name="song_audio_file")

The following does NOT work, because calling get() on this QuerySet throws an exception:

Album.objects.filter(songs__audio_file__isnull=False).get(pk=1)
Album.MultipleObjectsReturned: get() returned more than one Album 
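
The exception comes from the join: an album with two valid songs is matched once per song. The sketch below reproduces that with Python's built-in sqlite3 module instead of Postgres (an assumption made only so it runs anywhere); the table and column names follow the generated SQL above, and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory database with a cut-down version of the two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE api_album (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE api_song (
        id INTEGER PRIMARY KEY,
        album_id INTEGER REFERENCES api_album(id),
        audio_file_id INTEGER
    );
    -- Album 1 has two songs with audio, album 2 has none.
    INSERT INTO api_album VALUES (1, 'Two valid songs'), (2, 'No valid songs');
    INSERT INTO api_song VALUES (1, 1, 10), (2, 1, 11), (3, 2, NULL);
""")

# Join-based filter: one output row per matching song.
join_rows = conn.execute("""
    SELECT a.id
    FROM api_album a
    INNER JOIN api_song s ON s.album_id = a.id
    WHERE s.audio_file_id IS NOT NULL AND a.id = 1
""").fetchall()

# EXISTS-based filter: at most one output row per album.
exists_rows = conn.execute("""
    SELECT a.id
    FROM api_album a
    WHERE EXISTS (SELECT 1
                  FROM api_song s
                  WHERE s.album_id = a.id
                    AND s.audio_file_id IS NOT NULL)
      AND a.id = 1
""").fetchall()

print(join_rows)    # [(1,), (1,)]
print(exists_rows)  # [(1,)]
```

The join returns album 1 twice, which is what get() rejects; the EXISTS form returns it once.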

The queryset is used with a Django REST framework ModelViewSet, where it backs the CRUD operations and is passed to the Album serializer. This requires get() to work and return a single object.

from django.db.models import Exists, OuterRef
from rest_framework import viewsets


class AlbumViewSet(viewsets.ModelViewSet):
    serializer_class = AlbumSerializer

    def get_queryset(self):
        # Correlated subquery: songs of the outer album that have an audio file.
        valid_songs = Song.objects.filter(
            album=OuterRef('pk'),
            audio_file__isnull=False).only('album')

        # Slow query posted above
        return Album.objects.annotate(
            valid_song=Exists(valid_songs)
        ).filter(valid_song=True)
asked Jan 27 '26 05:01 by user1152226



1 Answer

I'm not sure why you're doing either of these queries. Finding albums where at least one song has an audio file is expressed simply as:

Album.objects.filter(songs__audio_file__isnull=False)
answered Jan 28 '26 17:01 by Daniel Roseman
