Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

T-SQL delete duplicates based on a column

The screenshot shows the first 8 rows of a table. For the same id (each id has thousands of rows), based on the same "updatetime", I only want to keep the first row, delete the rest row(s). Here for example, I want to delete 3rd, 5th, 8th rows. All columns of two rows could be exactly the same (here when updatetime are the same, the UpdateMillisec are different, but not necessary). The screenshot is the result of a query, and I don't have the primary key now (the left-most column in the screenshot is not available in the table now). What SQL code should I write? Thanks in advance!

Screeshot

like image 400
Ruixin Avatar asked Mar 11 '26 22:03

Ruixin


1 Answers

There is an easy way to delete duplicate rows.

In a first step, we will sort the records and add a rownumber.
Second step will be deleting rows with rownumber > 1.

WITH CTE AS
(
SELECT  *
       ,ROW_NUMBER() OVER 
                (PARTITION BY id, updatetime
                     ORDER BY id, updatetime, UpdateMillisec ASC
                     ) AS RowNum
  FROM yourtable

)
SELECT * FROM CTE                    -- for checking the result before deleting
-- DELETE FROM CTE WHERE RowNum > 1  -- uncomment this row for the final DELETE

Attention:
To identify, which is the first record and which is a following (second, third,..) record, we have to sort the data.
Before deleting them, always check the resultset with a SELECT * FROM CTE first

In your case i checked the resultset of the above query, which is:

id  lastprice   updatetime          UpdateMillisec  RowNum
211709  51370   09:30:00.0000000    500             1
211709  51380   09:30:01.0000000    0               1
211709  51370   09:30:01.0000000    500             2
211709  51370   09:30:02.0000000    0               1
211709  51370   09:30:02.0000000    500             2
211709  51370   09:30:03.0000000    0               1
211709  51370   09:30:04.0000000    0               1
211709  51370   09:30:04.0000000    500             2

As we can see, exactly those records, which you want to delete, have RowNum = 2. So finally we can change the SELECT * to a DELETE and execute the query again.

like image 182
Esteban P. Avatar answered Mar 13 '26 14:03

Esteban P.