 

Which is faster: a lookup on a large denormalized table or a join between three smaller tables?

I have a denormalized table with 100,000 records in it. I can normalize this down to a table of fewer than 50 records, plus a many-to-many table of 20,000 records linking that table to another table of 10,000 records. Is it faster to do a lookup in the 100,000 denormalized records, or to join one of the 10,000 records to its related rows through the many-to-many table? Citations are more than welcome, because I don't believe I can test both conditions.
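For concreteness, here is a rough sketch of the two designs being compared. The table and column names (item, tag, item_tag) are placeholders, not the actual schema:

    -- Denormalized: one wide table, ~100,000 rows
    CREATE TABLE item_tag_denormalized (
        item_id   INT NOT NULL,
        item_name VARCHAR(255) NOT NULL,
        tag_id    INT NOT NULL,
        tag_name  VARCHAR(100) NOT NULL,
        KEY idx_item (item_id),
        KEY idx_tag  (tag_id)
    );

    -- Normalized: ~10,000 items, fewer than 50 tags, ~20,000 links
    CREATE TABLE item (
        item_id   INT NOT NULL PRIMARY KEY,
        item_name VARCHAR(255) NOT NULL
    );

    CREATE TABLE tag (
        tag_id   INT NOT NULL PRIMARY KEY,
        tag_name VARCHAR(100) NOT NULL
    );

    CREATE TABLE item_tag (
        item_id INT NOT NULL,
        tag_id  INT NOT NULL,
        PRIMARY KEY (item_id, tag_id),
        KEY idx_tag_item (tag_id, item_id),
        FOREIGN KEY (item_id) REFERENCES item (item_id),
        FOREIGN KEY (tag_id)  REFERENCES tag (tag_id)
    );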

Asked by NobleUplift

2 Answers

Generally, if the proper indices are in place, the denormalized table will be faster for select statements, but there are circumstances where it will perform worse. It depends on the relative row widths. If you factor out columns that take up a large percentage of the denormalized table's row width, and the resulting table has a much smaller row count, then the normalized structure could be faster due to better caching (the tables will have a much smaller memory footprint).
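For illustration, a lookup by tag in each design might look roughly like this, reusing the placeholder item/tag names from the question; the exact indexes worth adding depend on your real queries:

    -- Denormalized: no join, but a covering index duplicates wide
    -- column data and the table itself has wide rows
    CREATE INDEX idx_tag_name_item
        ON item_tag_denormalized (tag_name, item_name);

    SELECT item_name
    FROM item_tag_denormalized
    WHERE tag_name = 'blue';

    -- Normalized: two joins, but each table is narrow and the join
    -- columns are already indexed (primary keys / idx_tag_item)
    CREATE INDEX idx_tag_name ON tag (tag_name);

    SELECT i.item_name
    FROM tag t
    JOIN item_tag it ON it.tag_id = t.tag_id
    JOIN item i      ON i.item_id = it.item_id
    WHERE t.tag_name = 'blue';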

In your case, you should know that 100K records is a pretty small database, and you probably shouldn't let performance be the driving factor behind the change. There are many benefits to normalization besides performance.

Answered by Aheho


It all depends on the particulars of the situation. How big is the result set? Do you have a covering index or indices on the columns required by the query?

The "advantage" of the denormalized model is that all your columns are in one place; the disadvantages are many, but from a performance perspective, it means you have wide rows and therefore fewer rows per page. This means that the query has to fetch more pages from disk to find what it needs.

In general, a properly normalized data model (e.g. 3rd Normal Form) will perform quite well. Yes, your queries will be more complex, but what it brings to the table are narrow rows (more rows per page, meaning fewer reads for a given query). Further, the join criteria the queries will be using are more likely to have covering indices, meaning the joins are likely to perform well.

But without knowing the details, it's impossible to say. The only way to find out is to examine the query plan for your particular query.
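For example, in MySQL you can prefix each candidate query with EXPLAIN and compare what the optimizer plans to do (again using the placeholder names from the question):

    EXPLAIN
    SELECT item_name
    FROM item_tag_denormalized
    WHERE tag_name = 'blue';

    EXPLAIN
    SELECT i.item_name
    FROM tag t
    JOIN item_tag it ON it.tag_id = t.tag_id
    JOIN item i      ON i.item_id = it.item_id
    WHERE t.tag_name = 'blue';

In the output, the type, key, and rows columns show whether each table is reached through an index lookup (ref/eq_ref) over a handful of rows or through a full table scan (ALL).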

It's very easy to denormalize data. It's much more difficult to normalize data, since all the repeated, duplicated data is likely to have discrepancies that will need to be resolved. Get your data model right: applications are transient, but [good] data lasts forever.

Denormalizing before you have a problem is a case of premature optimization.

Answered by Nicholas Carey


