Rails: remove duplicates after ordering a join table

I have a Book that has_many Reviews, and I've added a class method to Book to show 'recently reviewed' books on my home page. I've tried this:

def self.recently_reviewed
  Book.joins(:reviews).order('reviews.created_at DESC').limit(5)
end

This produces many duplicate records, so I tried using distinct like so:

Book.joins(:reviews).order('reviews.created_at DESC').distinct.limit(5)

I also tried putting .distinct both before and after the .order, but I get this error:

ActiveRecord::StatementInvalid: PG::InvalidColumnReference: ERROR:  for SELECT DISTINCT, ORDER BY expressions must appear in select list

I am a bit confused about how to solve this. Should I be dropping down to .select for more flexibility?

asked Sep 08 '25 10:09 by mycellius


2 Answers

In

Book.joins(:reviews).order('reviews.created_at DESC').distinct

you're asking for the distinct books among the joined books/reviews rows, and then ordering those distinct books by reviews.created_at. The SQL is roughly:

SELECT DISTINCT "books".* FROM "books" INNER JOIN "reviews" ON "reviews"."book_id" = "books"."id" ORDER BY reviews.created_at DESC

There is a good reason this is not allowed: the result would be indeterminate. Imagine one book has 100 reviews. The join then produces 100 rows for that book, one per review. SELECT DISTINCT collapses them into a single row for the book, but that row could come from any one of the 100. Ordering by reviews.created_at afterwards would depend on which review was picked, so the order could differ from one run to the next.
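To make the ambiguity concrete, here is a small pure-Ruby sketch (no database involved; the rows, ids and timestamps are hypothetical sample data) that emulates DISTINCT keeping one row per book. Two equally valid choices of which row to keep give two different orders:

```ruby
# Hypothetical joined rows: one row per (book, review) pair, like the
# rows the INNER JOIN produces before DISTINCT is applied.
Row = Struct.new(:book_id, :review_created_at)

ROWS = [
  Row.new(1, Time.utc(2025, 9, 1)),
  Row.new(1, Time.utc(2025, 9, 7)), # book 1's newest review
  Row.new(2, Time.utc(2025, 9, 5)),
].freeze

# Emulate DISTINCT by keeping one row per book (`pick` chooses which one),
# then order the survivors by their review time, newest first.
def order_after_dedup(rows, pick)
  rows.group_by(&:book_id)
      .map { |_book_id, book_rows| book_rows.public_send(pick) }
      .sort_by { |row| -row.review_created_at.to_i }
      .map(&:book_id)
end

ORDER_IF_FIRST_KEPT = order_after_dedup(ROWS, :first) # => [2, 1]
ORDER_IF_LAST_KEPT  = order_after_dedup(ROWS, :last)  # => [1, 2]
```

Both results are "correct" for DISTINCT, which is why PostgreSQL refuses the query instead of guessing.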

This would be perfectly fine:

Book.joins(:reviews).order('books.id DESC').distinct

This works because it doesn't matter which of the 100 rows is kept for that book: books.id is the same in all of them.

Back to your problem: it seems you want the 5 books with the most recent reviews. I don't see a simple way to do it, but here's my solution:

# {book_id => created_at of that book's most recent review}
latest = Review.group(:book_id).maximum(:created_at)
# sort newest first, then keep the ids of the top 5 books
ids = latest.sort_by { |_book_id, created_at| created_at }.reverse.first(5).map(&:first)
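On Ruby 2.2 or later, sorting the whole hash and then taking the first 5 can be replaced by Enumerable#max_by(n), which returns the n largest entries already in descending order. A minimal sketch, run against a hypothetical in-memory hash standing in for what Review.group(:book_id).maximum(:created_at) returns (the ids and dates are made up):

```ruby
# Stand-in (hypothetical sample data) for the hash returned by
# Review.group(:book_id).maximum(:created_at): { book_id => newest review time }
LATEST_REVIEW_AT = {
  10 => Time.utc(2025, 9, 1),
  11 => Time.utc(2025, 9, 8),
  12 => Time.utc(2025, 9, 3),
  13 => Time.utc(2025, 9, 7),
}.freeze

# max_by(n) yields [book_id, time] pairs and returns the n pairs with the
# latest times, newest first, so no separate sort/reverse step is needed.
def most_recently_reviewed_ids(latest, n)
  latest.max_by(n) { |_book_id, reviewed_at| reviewed_at }.map(&:first)
end

most_recently_reviewed_ids(LATEST_REVIEW_AT, 2) # => [11, 13]
```

In the app you would pass the real hash and then load the records with Book.where(id: ids). Since where does not guarantee that rows come back in the order of ids, re-sort them in Ruby (for example by each record's position in ids) if the order matters.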
answered Sep 10 '25 08:09 by EJAg


This is a more complex query than ActiveRecord expresses in its simplest form, because this approach uses a correlated subquery. Here's how I would do it entirely in SQL:

SELECT   books.*
FROM     books
         INNER JOIN reviews ON reviews.book_id = books.id
WHERE    reviews.created_at = ( SELECT  MAX(reviews.created_at)
                                FROM    reviews
                                WHERE   reviews.book_id = books.id)
GROUP BY books.id
ORDER BY MAX(reviews.created_at) DESC

To convert this into ActiveRecord I would do the following:

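class Book < ApplicationRecord
  has_many :reviews
  # Each reviewed book once, matched to its newest review, most recent first
  scope :recently_reviewed, -> {
    joins(:reviews)
      .where('reviews.created_at = (SELECT MAX(reviews.created_at) FROM reviews WHERE reviews.book_id = books.id)')
      .group('books.id')
      .order(Arel.sql('MAX(reviews.created_at) DESC'))
  }
end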

You can then get a list of every book that has at least one review, each listed once:

Book.recently_reviewed

To limit the result to n books:

Book.recently_reviewed.limit(n)
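To see what the correlated subquery and the GROUP BY books.id each contribute, here is a small pure-Ruby emulation on hypothetical in-memory rows (no database; the data and names are illustrative only). The last step also orders books by their newest review, which a "recently reviewed" list needs:

```ruby
# Hypothetical joined (book, review) rows. Book 1 has two reviews sharing
# the same newest timestamp: the tie case that GROUP BY books.id collapses.
JoinedRow = Struct.new(:book_id, :review_created_at)

JOINED = [
  JoinedRow.new(1, Time.utc(2025, 9, 2)),
  JoinedRow.new(1, Time.utc(2025, 9, 6)),
  JoinedRow.new(1, Time.utc(2025, 9, 6)),
  JoinedRow.new(2, Time.utc(2025, 9, 4)),
].freeze

# Emulates the correlated subquery: MAX(created_at) over one book's reviews.
def latest_review_at(rows, book_id)
  rows.select { |r| r.book_id == book_id }.map(&:review_created_at).max
end

# WHERE reviews.created_at = (subquery): keep each book's newest review row(s).
NEWEST_ROWS = JOINED.select { |r| r.review_created_at == latest_review_at(JOINED, r.book_id) }

# Without grouping, the tie leaves book 1 in the result twice.
IDS_WITHOUT_GROUP = NEWEST_ROWS.map(&:book_id) # => [1, 1, 2]

# Grouping by book id leaves one entry per book; sort newest review first.
IDS_WITH_GROUP = NEWEST_ROWS.group_by(&:book_id)
                            .sort_by { |_id, rows| -rows.map(&:review_created_at).max.to_i }
                            .map(&:first) # => [1, 2]
```

The subquery alone already picks each book's newest review; the grouping only matters when two reviews of the same book share that newest timestamp.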
answered Sep 10 '25 08:09 by Ritesh Ranjan