Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL WHERE EXISTS subquery performance between 5.7 and 8.0

I have an animals table (~2.7m records) and a breeds table (~2.7m records) that have a one to many relationship (one animal can have multiple breeds). I'm trying to query all distinct breeds for a specific species. As I'm not a SQL expert, my initial thought was to go with a simple SELECT DISTINCT breed ... JOIN, but this query took about 10 seconds which seemed much longer than I'd expect. So I changed this to a SELECT DISTINCT ... WHERE EXISTS subquery and it executed in about 100ms in 5.7, which is much more reasonable. But now I'm migrating to MySQL 8 and this exact query takes anywhere from 10-30 seconds. Here are the table definitions:

CREATE TABLE `animals` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(150) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
  `species` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `sex` enum('Male','Female') CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `dob` date DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `animals_name_index` (`name`),
  KEY `animals_dob_index` (`dob`),
  KEY `animals_sex_index` (`sex`),
  KEY `animals_species_index` (`species`,`id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=2807152 DEFAULT CHARSET=utf8mb3 COLLATE=utf8_unicode_ci
CREATE TABLE `animal_breeds` (
  `animal_id` int unsigned DEFAULT NULL,
  `breed` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  UNIQUE KEY `animal_breeds_animal_id_breed_unique` (`animal_id`,`breed`),
  KEY `animal_breeds_breed_animal_id_index` (`breed`,`animal_id`) USING BTREE,
  CONSTRAINT `animal_breeds_animal_id_foreign` FOREIGN KEY (`animal_id`) REFERENCES `animals` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

Here is the query I'm running:

SELECT SQL_NO_CACHE * 
FROM
    (
        SELECT DISTINCT `breed` 
        FROM `animal_breeds`
    ) AS `subQuery`
WHERE
    EXISTS (
        SELECT `breed`
        FROM `animal_breeds`
        INNER JOIN `animals` ON `animals`.`id` = `animal_breeds`.`animal_id`
        WHERE `animals`.`species` = 'Dog'AND `animal_breeds`.`breed` = `subQuery`.`breed`
    );

Here are the two EXPLAIN statements from 5.7 and 8.0

MySQL 5.7

284 rows in set, 1 warning (0.02 sec)

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> NULL ALL NULL NULL NULL NULL 7775 100.00 Using where
3 DEPENDENT SUBQUERY animal_breeds NULL ref animal_breeds_animal_id_breed_unique,animal_breeds_breed_animal_id_index animal_breeds_breed_animal_id_index 1022 allBreeds.breed 348 100.00 Using where; Using index
3 DEPENDENT SUBQUERY animals NULL eq_ref PRIMARY,animals_species_index PRIMARY 4 animal_breeds.animal_id 1 50.00 Using where
2 DERIVED animal_breeds NULL range animal_breeds_animal_id_breed_unique,animal_breeds_breed_animal_id_index animal_breeds_breed_animal_id_index 1022 NULL 7775 100.00 Using index for group-by

MySQL 8.0.27

284 rows in set, 1 warning (27.92 sec)

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> NULL ALL NULL NULL NULL NULL 7776 100.00 NULL
1 PRIMARY <subquery3> NULL eq_ref <auto_distinct_key> <auto_distinct_key> 1022 allBreeds.breed 1 100.00 NULL
3 MATERIALIZED animals NULL ref PRIMARY,animals_species_index animals_species_index 153 const 1390666 100.00 Using index
3 MATERIALIZED animal_breeds NULL ref animal_breeds_animal_id_breed_unique,animal_breeds_breed_animal_id_index animal_breeds_animal_id_breed_unique 5 animals.id 1 100.00 Using index
2 DERIVED animal_breeds NULL range animal_breeds_animal_id_breed_unique,animal_breeds_breed_animal_id_index animal_breeds_breed_animal_id_index 1022 NULL 7776 100.00 Using index for group-by

Lastly, both of these databases are using the base docker image with no changes to the configuration. Although the query still runs poorly on an VPS running MySQL 8 with some tweaked settings. I also read through a thread about someone having a similar problem but the comments/answer didn't seem to help in my case.

Any help would be much appreciated!

EDIT:

Here is the execution plan for the SELECT DISTINCT ... JOIN:

SELECT DISTINCT ab.breed
FROM animal_breeds ab
INNER JOIN animals a on a.id=ab.animal_id
WHERE a.species='Dog'

MySQL 5.7

284 rows in set (25.27 sec)

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE a NULL ref PRIMARY,animals_species_index,id_species animals_species_index 153 const 1385271 100.00 Using index; Using temporary
1 SIMPLE ab NULL ref animal_breeds_animal_id_breed_unique,animal_breeds_breed_animal_id_index animal_breeds_animal_id_breed_unique 5 a.id 1 100.00 Using index

MySQL 8.0

284 rows in set (29.45 sec)

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE a NULL ref PRIMARY,animals_species_index,id_species animals_species_index 153 const 1390666 100.00 Using index; Using temporary
1 SIMPLE ab NULL ref animal_breeds_animal_id_breed_unique,animal_breeds_breed_animal_id_index animal_breeds_animal_id_breed_unique 5 a.id 1 100.00 Using index
SELECT ab.breed
FROM animal_breeds ab
INNER JOIN animals a on a.id=ab.animal_id
WHERE a.species='Dog'

MySQL 5.7

2722722 rows in set (26.69 sec)

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE a ref PRIMARY,animals_species_index,id_species animals_species_index 153 const 1385271 100.00 Using index
1 SIMPLE ab ref animal_breeds_animal_id_breed_unique animal_breeds_animal_id_breed_unique 5 a.id 1 100.00 Using index

MySQL 8.0

2722722 rows in set (32.49 sec)

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE a NULL ref PRIMARY,animals_species_index,id_species animals_species_index 153 const 1390666 100.00 Using index
1 SIMPLE ab NULL ref animal_breeds_animal_id_breed_unique animal_breeds_animal_id_breed_unique 5 a.id 1 100.00 Using index
like image 915
Jeff Avatar asked Oct 15 '25 23:10

Jeff


1 Answers

Filtering animals before joining it to breeds will improve performance (10x faster in some cases):

SELECT  DISTINCT ab.breed
FROM    animal_breeds ab
WHERE   ab.animal_id IN (
                     SELECT a.id
                     FROM animals a
                     WHERE a.species = 'Dog');
like image 80
Italo Borssatto Avatar answered Oct 18 '25 15:10

Italo Borssatto



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!