I have the folowwing SQL query
SELECT CustomerID FROM sales WHERE `Date` <= '2012-01-01' GROUP BY CustomerID
The query is executed over 11400000 rows and runs very slow. It takes over 3 minutes to execute. If I remove the group-by part, this runs below 1 second. Why is that?
MySQL Server version is '5.0.21-community-nt'
Here is the table schema:
CREATE TABLE `sales` (
  `ID` int(11) NOT NULL auto_increment,
  `DocNo` int(11) default '0',
  `CustomerID` int(11) default '0',
  `OperatorID` int(11) default '0',
  PRIMARY KEY  (`ID`),
  KEY `ID` (`ID`),
  KEY `DocNo` (`DocNo`),
  KEY `CustomerID` (`CustomerID`),
  KEY `Date` (`Date`)
) ENGINE=MyISAM AUTO_INCREMENT=14946509 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Most of your queries are logically invalid, because you are selecting non aggregate columns not mentioned in the GROUP BY clause. That being said, aggregation takes time, and slowness is to be expected, assuming your tables are fairly large.
Conclusion. GROUP BY is a powerful statement, but it tends to slow down queries. Over time, my team and I have used it many times and defined SQL indexes to avoid the performance issues introduced by the GROUP BY clause, especially when dealing with large tables.
The most general way to satisfy a GROUP BY clause is to scan the whole table and create a new temporary table where all rows from each group are consecutive, and then use this temporary table to discover groups and apply aggregate functions (if any).
Try putting an index on (Date,CustomerID).
Have a look at the mysql manual for optimizing group by queries:- Group by optimization
You can find out how mysql is generating the result if you use EXPLAIN as follows:-
EXPLAIN SELECT CustomerID FROM sales WHERE `Date` <= '2012-01-01' GROUP BY CustomerID
This will tell you which indexes (if any) mysql is using to optimize the query. This is very handy when learning which indexes work for which queries as you can try creating an index and see if mysql uses it. So even if you don't fully understand how mysql calculates aggregate queries you can create a useful index by trial and error.
Without knowing what your table schema looks like, it's difficult to be certain, but it would probably help if you added a multiple-column index on Date and CustomerID. That'd save MySQL the hassle of doing a full table scan for the GROUP BY statement. So try ALTER TABLE sales ADD INDEX (Date,CustomerID). 
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With