Clean up MySQL tables after duplicate records were created and are referenced by a second table [closed]

Tags: sql, mysql

After a long period of things going wrong, we found out that there is an error in our data set.

In the manufacturers table, manufacturers were often added multiple times, and the products in the products table reference the duplicate manufacturer IDs.

This question is only about fixing these tables; we already prevent this from happening again.

manufacturers:

manufacturers_id  manufacturers_name
1                 Manufacturer #1
2                 Manufacturer #2
3                 Manufacturer #3
4                 Manufacturer #2
5                 Manufacturer #3
6                 Manufacturer #2
7                 Manufacturer #1
8                 Manufacturer #3
9                 Manufacturer #2

products:

products_id  manufacturers_id
1            1
2            2
3            3
4            4
5            5
6            6
7            7
8            8
9            9

We need to achieve two things:

  1. Remove the duplicate entries from the manufacturers table, keeping the first entry for each name
  2. Update the products table so that each duplicate manufacturer ID is replaced by the remaining first ID for that manufacturer

Each step can be done manually, but the number of manufacturers and products makes that impractical. I lack the query knowledge needed, so help would be welcome.

This would be the desired result:

manufacturers:

manufacturers_id  manufacturers_name
1                 Manufacturer #1
2                 Manufacturer #2
3                 Manufacturer #3

products:

products_id  manufacturers_id
1            1
2            2
3            3
4            2
5            3
6            2
7            1
8            3
9            2

Table structure:

CREATE TABLE manufacturers 
(
    `manufacturers_id` int(11) NOT NULL,
    `manufacturers_name` varchar(32) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_unicode_ci;

ALTER TABLE manufacturers
    ADD PRIMARY KEY (`manufacturers_id`),
    ADD KEY `IDX_MANUFACTURERS_NAME` (`manufacturers_name`);
  
INSERT INTO manufacturers (`manufacturers_id`, `manufacturers_name`) 
VALUES (1, 'Manufacturer #1'),
       (2, 'Manufacturer #2'),
       (3, 'Manufacturer #3'),
       (4, 'Manufacturer #2'),
       (5, 'Manufacturer #3'),
       (6, 'Manufacturer #2'),
       (7, 'Manufacturer #1'),
       (8, 'Manufacturer #3'),
       (9, 'Manufacturer #2');

CREATE TABLE `products` 
(
    `products_id` int(11) NOT NULL,
    `manufacturers_id` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_unicode_ci;

ALTER TABLE `products`
    ADD PRIMARY KEY (`products_id`);
 
INSERT INTO `products` (`products_id`, `manufacturers_id`) 
VALUES (1, 1),
       (2, 2),
       (3, 3),
       (4, 4),
       (5, 5),
       (6, 6),
       (7, 7),
       (8, 8),
       (9, 9);
Giancarlo asked Oct 24 '25 14:10

1 Answer

You need to fix the references first and only then remove the duplicates.

You can achieve the first objective via:

update products
join manufacturers
on products.manufacturers_id = manufacturers.manufacturers_id
set products.manufacturers_id = (
    select min(m.manufacturers_id)
    from manufacturers m
    where m.manufacturers_name = manufacturers.manufacturers_name
);

Explanation:

  • it's an update-join
  • we join each product to its current manufacturer row on manufacturers_id
  • we set the product's manufacturers_id to the result of a subquery that finds the minimum id among the manufacturers sharing that name
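Before running the update, it may help to preview which rows it would touch. The following is a minimal read-only sketch, assuming the table and column names from the question; the aliases p, m and keep and the output column names are illustrative only. It changes no data.

```sql
-- Preview only: lists each product whose manufacturers_id would change.
-- The derived table "keep" holds the lowest id per manufacturer name,
-- which is the same value the update's subquery computes.
SELECT p.products_id,
       p.manufacturers_id AS current_manufacturers_id,
       keep.min_id        AS new_manufacturers_id
FROM products p
JOIN manufacturers m
  ON m.manufacturers_id = p.manufacturers_id
JOIN (
    SELECT manufacturers_name,
           MIN(manufacturers_id) AS min_id
    FROM manufacturers
    GROUP BY manufacturers_name
) AS keep
  ON keep.manufacturers_name = m.manufacturers_name
WHERE p.manufacturers_id <> keep.min_id
ORDER BY p.products_id;
```

On the sample data from the question, this should list products 4 to 9, mapped to manufacturers 2, 3, 2, 1, 3 and 2 respectively, which matches the desired result.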

And you can achieve the second objective via:

delete duplicate
from manufacturers duplicate
join manufacturers original
on duplicate.manufacturers_name = original.manufacturers_name and duplicate.manufacturers_id > original.manufacturers_id;

Explanation:

  • it's a delete-join
  • we join each duplicate manufacturer to an original with the same name and a lower id
  • and we remove the duplicate, so only the row with the lowest id per name remains
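Once both statements have run, two read-only checks can confirm the cleanup. This is a sketch assuming the table and column names from the question; the alias names and the copies column are illustrative.

```sql
-- Check 1: manufacturer names that still occur more than once.
-- Expected after the cleanup: no rows.
SELECT manufacturers_name,
       COUNT(*) AS copies
FROM manufacturers
GROUP BY manufacturers_name
HAVING COUNT(*) > 1;

-- Check 2: products that point at a manufacturer row that no longer exists.
-- Products with a NULL manufacturers_id are skipped, since the column is nullable.
-- Expected after the cleanup: no rows.
SELECT p.products_id,
       p.manufacturers_id
FROM products p
LEFT JOIN manufacturers m
  ON m.manufacturers_id = p.manufacturers_id
WHERE p.manufacturers_id IS NOT NULL
  AND m.manufacturers_id IS NULL;
```

Since both tables use InnoDB, one option is to run the update and the delete inside START TRANSACTION, run these checks, and then COMMIT or ROLLBACK depending on what they return.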

Fiddle.

Lajos Arpad answered Oct 28 '25 04:10
