SQL

Question

It looks like we might have some duplicate values in our preference table so I want to delete all the duplicate entries for each user but the more recent one. I've spent quite a while trying to come up with a solution and I just keep running around in circles. I've tried max, grouped by, sub-queries, etc. It's pretty easy to delete the highest, but not everything but the highest. And especially for each user

So in the database I might have:

id  :   name:       value   :   userId
1   :   sortOrder:  Asc     :   1
2   :   sortOrder:  Desc    :   1
3   :   sortOrder:  Asc     :   2
4   :   something:  value2  :   1

So in this case, I want to remove the first row since it's duplicate for User 1 and has the highest id and keep all the rest of the columns.

So I know it's at least a sub-query with group by, but I just can't figure it out. So far I have:

SELECT 
    MAX(id),
    name
FROM 
    preference
GROUP BY
    name

which gives me a list of those I want to keep, BUT it still misses for each user. So I somewhat want the opposite if I had it for each user. However I don't believe I can use <> in a sub-query anyways.

What I want to ultimately do is say delete all those that have a lower ID for the same User that are duplicate entries.

Mitch Wheat · Accepted Answer

One way, assuming Id column is a unique Id:

DELETE FROM preference
where Id NOT IN (SELECT MAX(id) FROM preference GROUP BY name, userId)

[Note: The usual advice applies. don't run in production without testing first]

Test First with:

SELECT * FROM preference
where Id NOT IN (SELECT MAX(id) FROM preference GROUP BY name, userId)

SQLFiddle is here. Works as intended. I suggest testing with a bigger data set though...

SQL - Delete all duplicates but the highest of each group for each user ID

Tags:

select

group-by

Stephane Grenier

1 Answers

Mitch Wheat

Recent Activity

Donate For Us