I need to select several rows per each distinct record. Pretty much something like asked here Select first n records for each distinct ID in SQL Server 2008, although I use MySQL.
The purpose can be achieved by running 21 queries in this case: 1 general and 20 to get the subrecords, i.e. something like this:
SELECT DISTINCT `user_id`
FROM `posts`
WHERE `deleted` = '0'
ORDER BY `user_id` ASC
LIMIT 20
... to select all rows needed, and then
SELECT *
FROM `posts`
WHERE `deleted` = '0'
AND `user_id` = ?
ORDER BY `id` DESC
LIMIT 5
... in a loop per each row selected by the first query.
Basically, I sort of need to get 5 posts of each user. I need this done in a single query. The posts setup is just for example, I made this up so it's, hopefully, easier to understand what I need.
I started with the following query:
SELECT *
FROM `posts`
WHERE `user_id`
IN (
SELECT DISTINCT `user_id`
FROM `posts`
ORDER BY `user_id` DESC
LIMIT 4
)
LIMIT 5
But I get the #1235 - This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery' error.
So I've tried the JOIN idea like suggested here:
SELECT posts.id,
posts.user_id,
NULL
FROM (
SELECT posts.user_id
FROM posts
WHERE posts.deleted = 0
LIMIT 20
) q
JOIN posts
ON posts.user_id = q.user_id
I've also tried several nested queries as suggested here:
SELECT *
FROM posts
WHERE user_id IN (
SELECT * FROM (
SELECT user_id
FROM posts
LIMIT 20
)
as t);
And the other solutions found on the Internet. But they either do not work or just simply select the first N rows from the database (regardless of the conditions and joins for some reason). Tried LEFT JOIN, RIGHT JOIN, even INNER JOIN, but still no success.
Please help.
UPDATE Forgot to mention that the table is around 5GB in size.
UPDATE Tried the sub-sub query:
SELECT *
FROM `posts`
WHERE
`user_id` IN ( SELECT `user_id` FROM (
SELECT DISTINCT `user_id`
FROM `posts`
ORDER BY `user_id` DESC
LIMIT 4 ) limit_users
)
LIMIT 5
Same as above, it returns the following:
+----+---------+------+
| id | user_id | post |
+----+---------+------+
| 1 | 1 | a |
+----+---------+------+
| 2 | 1 | b |
+----+---------+------+
| 3 | 1 | c |
+----+---------+------+
| .. | .. | .. |
I.e. the 5 (which is what the outer LIMIT is set to) rows for the same user. The weird thing is that if I run the sub and sub-sub query alone:
SELECT `user_id` FROM (
SELECT DISTINCT `user_id`
FROM `posts`
ORDER BY `user_id` DESC
LIMIT 4 ) limit_users
I get the 4 distinct values:
+---------+
| user_id |
+---------+
| 1 |
+---------+
| 2 |
+---------+
| 3 |
+---------+
| 4 |
+---------+
You have to use variables, doing two different counts on an ordered query: one on the number of the posts for each user, and one for the users:
SELECT posts_counts.*
FROM (
SELECT
posts.*,
@post_count:=case when @prec_user_id=user_id then @post_count+1 else 1 end as pc,
case when @prec_user_id<>user_id then @user_count:=@user_count+1 else @user_count end as uc,
@prec_user_id:=user_id
FROM
posts,
(select @prec_user_id:=0, @user_count:=0, @post_count:=0) counts
ORDER BY
posts.user_id ) posts_counts
WHERE pc<5 and uc<4
EDIT: You might also consider trying this query:
SELECT *
FROM `posts`
WHERE
`user_id` IN ( SELECT user_id FROM (
SELECT DISTINCT `user_id`
FROM `posts`
ORDER BY `user_id` DESC
LIMIT 4 ) limit_users
)
LIMIT 5
(this will justs select 5 posts, out of all posts from each selected user, so it's still not what you need, but it uses a trick to use a LIMIT in a sub-subquery)
EDIT2: Next query will limit 5 posts for each of the 20 users:
select posts_limited.*
from (
select
posts.*,
@row:=if(@last_user=posts.user_id, @row+1, 1) as row,
@last_user:=posts.user_id
from
posts inner join
(select user_id from
(select distinct user_id
from posts
order by user_id desc
LIMIT 20) limit_users
) limit_users
on posts.user_id = limit_users.user_id,
(select @last_user:=0, @row:=0) r
) posts_limited
where row<=5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With