Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merge Duplicate Rows in MySQL

Tags:

mysql

I have a database like this:

users
id    name    email                phone
1     bill    [email protected]
2     bill    [email protected]   123456789
3     susan   [email protected]
4     john    [email protected]   123456789
5     john    [email protected]   987654321

I want to merge records considered duplicates based on the email field.

Trying to figure out how to use the following considerations.

  1. Merge based on duplicate email
  2. If one row has a null value use the row that has the most data.
  3. If 2 rows are duplicates but other fields are different then use the one

with the highest id number (see the [email protected] row for an example.)

Here is a query I tried:

DELETE FROM users WHERE users.id NOT IN 
(SELECT grouped.id FROM (SELECT DISTINCT ON (email) * FROM users) AS grouped)

Getting a syntax error.

I'm trying to get the database to transform to this, I can't figure out the correct query:

users
id   name    email                 phone
2    bill    [email protected]    123456789
3    susan   [email protected]   
5    john    [email protected]    987654321
like image 665
Jordash Avatar asked Sep 06 '25 02:09

Jordash


1 Answers

Here is one option using a delete join:

DELETE
FROM users
WHERE id NOT IN (SELECT id
                 FROM (
                     SELECT CASE WHEN COUNT(*) = 1
                                 THEN MAX(id)
                                 ELSE MAX(CASE WHEN phone IS NOT NULL THEN id END) END AS id
                     FROM users
                     GROUP BY email) t);

The logic of this delete is as follows:

  • Emails where there is only one record are not deleted
  • For emails with two or more records, we delete everything except for the record having the highest id value, where the phone is also defined.
like image 159
Tim Biegeleisen Avatar answered Sep 08 '25 02:09

Tim Biegeleisen