I am researching on the approach to use for a new database which will be used for our new application. The amount of data won't be huge like 100 million records but will be about 10 million record before it gets archived. Will using column based approach for this much data be very useful than row based database? Why are column based database faster to query?
Essentially, the main difference between row oriented and column oriented DBs is the model they use to serialize data. Row oriented DBs store the data in each row as contiguous blocks. Column oriented DBs store the data of each column in contiguous blocks. Further, those blocks are typically compressed and sorted.
So, the difference in performance really depends on the type of query. Queries that use less I/O seeks will perform faster and reading from a contiguous block is a single seek. So if your query relies on aggregations or sorting on a column (or multiple columns) then a column store can perform much faster because reading all values in the column can be a single seek under optimal conditions. If your query is searching for a handful of customer records in a huge customer table based on an indexed column like a customer ID then a row store DB might actually be faster.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With