I have an animals
table (~2.7m records) and a breeds
table (~2.7m records) that have a one to many relationship (one animal can have multiple breeds). I'm trying to query all distinct breeds for a specific species. As I'm not a SQL expert, my initial thought was to go with a simple SELECT DISTINCT breed ... JOIN
, but this query took about 10 seconds which seemed much longer than I'd expect. So I changed this to a SELECT DISTINCT ... WHERE EXISTS
subquery and it executed in about 100ms in 5.7, which is much more reasonable. But now I'm migrating to MySQL 8 and this exact query takes anywhere from 10-30 seconds. Here are the table definitions:
CREATE TABLE `animals` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(150) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`species` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`sex` enum('Male','Female') CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`dob` date DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `animals_name_index` (`name`),
KEY `animals_dob_index` (`dob`),
KEY `animals_sex_index` (`sex`),
KEY `animals_species_index` (`species`,`id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=2807152 DEFAULT CHARSET=utf8mb3 COLLATE=utf8_unicode_ci
CREATE TABLE `animal_breeds` (
`animal_id` int unsigned DEFAULT NULL,
`breed` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
UNIQUE KEY `animal_breeds_animal_id_breed_unique` (`animal_id`,`breed`),
KEY `animal_breeds_breed_animal_id_index` (`breed`,`animal_id`) USING BTREE,
CONSTRAINT `animal_breeds_animal_id_foreign` FOREIGN KEY (`animal_id`) REFERENCES `animals` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Here is the query I'm running:
SELECT SQL_NO_CACHE *
FROM
(
SELECT DISTINCT `breed`
FROM `animal_breeds`
) AS `subQuery`
WHERE
EXISTS (
SELECT `breed`
FROM `animal_breeds`
INNER JOIN `animals` ON `animals`.`id` = `animal_breeds`.`animal_id`
WHERE `animals`.`species` = 'Dog'AND `animal_breeds`.`breed` = `subQuery`.`breed`
);
Here are the two EXPLAIN
statements from 5.7 and 8.0
MySQL 5.7
284 rows in set, 1 warning (0.02 sec)
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 7775 | 100.00 | Using where |
3 | DEPENDENT SUBQUERY | animal_breeds | NULL | ref | animal_breeds_animal_id_breed_unique,animal_breeds_breed_animal_id_index | animal_breeds_breed_animal_id_index | 1022 | allBreeds.breed | 348 | 100.00 | Using where; Using index |
3 | DEPENDENT SUBQUERY | animals | NULL | eq_ref | PRIMARY,animals_species_index | PRIMARY | 4 | animal_breeds.animal_id | 1 | 50.00 | Using where |
2 | DERIVED | animal_breeds | NULL | range | animal_breeds_animal_id_breed_unique,animal_breeds_breed_animal_id_index | animal_breeds_breed_animal_id_index | 1022 | NULL | 7775 | 100.00 | Using index for group-by |
MySQL 8.0.27
284 rows in set, 1 warning (27.92 sec)
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 7776 | 100.00 | NULL |
1 | PRIMARY | <subquery3> | NULL | eq_ref | <auto_distinct_key> | <auto_distinct_key> | 1022 | allBreeds.breed | 1 | 100.00 | NULL |
3 | MATERIALIZED | animals | NULL | ref | PRIMARY,animals_species_index | animals_species_index | 153 | const | 1390666 | 100.00 | Using index |
3 | MATERIALIZED | animal_breeds | NULL | ref | animal_breeds_animal_id_breed_unique,animal_breeds_breed_animal_id_index | animal_breeds_animal_id_breed_unique | 5 | animals.id | 1 | 100.00 | Using index |
2 | DERIVED | animal_breeds | NULL | range | animal_breeds_animal_id_breed_unique,animal_breeds_breed_animal_id_index | animal_breeds_breed_animal_id_index | 1022 | NULL | 7776 | 100.00 | Using index for group-by |
Lastly, both of these databases are using the base docker image with no changes to the configuration. Although the query still runs poorly on an VPS running MySQL 8 with some tweaked settings. I also read through a thread about someone having a similar problem but the comments/answer didn't seem to help in my case.
Any help would be much appreciated!
EDIT:
Here is the execution plan for the SELECT DISTINCT ... JOIN
:
SELECT DISTINCT ab.breed
FROM animal_breeds ab
INNER JOIN animals a on a.id=ab.animal_id
WHERE a.species='Dog'
MySQL 5.7
284 rows in set (25.27 sec)
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | a | NULL | ref | PRIMARY,animals_species_index,id_species | animals_species_index | 153 | const | 1385271 | 100.00 | Using index; Using temporary |
1 | SIMPLE | ab | NULL | ref | animal_breeds_animal_id_breed_unique,animal_breeds_breed_animal_id_index | animal_breeds_animal_id_breed_unique | 5 | a.id | 1 | 100.00 | Using index |
MySQL 8.0
284 rows in set (29.45 sec)
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | a | NULL | ref | PRIMARY,animals_species_index,id_species | animals_species_index | 153 | const | 1390666 | 100.00 | Using index; Using temporary |
1 | SIMPLE | ab | NULL | ref | animal_breeds_animal_id_breed_unique,animal_breeds_breed_animal_id_index | animal_breeds_animal_id_breed_unique | 5 | a.id | 1 | 100.00 | Using index |
SELECT ab.breed
FROM animal_breeds ab
INNER JOIN animals a on a.id=ab.animal_id
WHERE a.species='Dog'
MySQL 5.7
2722722 rows in set (26.69 sec)
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | a | ref | PRIMARY,animals_species_index,id_species | animals_species_index | 153 | const | 1385271 | 100.00 | Using index | |
1 | SIMPLE | ab | ref | animal_breeds_animal_id_breed_unique | animal_breeds_animal_id_breed_unique | 5 | a.id | 1 | 100.00 | Using index |
MySQL 8.0
2722722 rows in set (32.49 sec)
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | a | NULL | ref | PRIMARY,animals_species_index,id_species | animals_species_index | 153 | const | 1390666 | 100.00 | Using index |
1 | SIMPLE | ab | NULL | ref | animal_breeds_animal_id_breed_unique | animal_breeds_animal_id_breed_unique | 5 | a.id | 1 | 100.00 | Using index |
Filtering animals
before joining it to breeds
will improve performance (10x faster in some cases):
SELECT DISTINCT ab.breed
FROM animal_breeds ab
WHERE ab.animal_id IN (
SELECT a.id
FROM animals a
WHERE a.species = 'Dog');
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With