Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using Group By on Date Column in SQL

Tags:

sql

vertica

I have large historical data and I want to group by it month. I am preparing below query on HP Vertica database to get my data grouped by Month.

SELECT 
      Region, Country,
      TO_TIMESTAMP(TO_CHAR(Order_Date,'m/yyyy'),'m/yyyy') as MONTH_Order_Date, 
      SUM(CAST(Unit_Price as NUMERIC(37,15))) as Sum_Unit_Price, 
      SUM(CAST(Total_Revenue as NUMERIC(37,15))) as Sum_Total_Revenue 
FROM SalesRecords 
GROUP BY Region,Country,TO_TIMESTAMP(TO_CHAR(Order_Date,'m/yyyy'),'m/yyyy')
ORDER BY Region ASC,Country ASC,TO_TIMESTAMP(TO_CHAR(Order_Date,'m/yyyy'),'m/yyyy') ASC

Now problem with this query is performance, over 1 billion data it is taking almost 2 minute and return record set after group by is 0.1 million rows. I need data in below shown format and due to that I need to parse date at DB level to get formatted date and this formatting taking time on DB server.

Please suggest me any other way which I can use because on same table if I am grouping data on some other column except the date type column performance is good, returned record set almost in 40 second over 1 billion record.

I need to do this performance for all major RDBMS.

Region  Country     MONTH_Order_Date    Sum_Unit_Price          Sum_Total_Revenue
Asia    Bangladesh  2010-01-01 00:00:00 186252.349999999999937  920607119.140000001169023
Asia    Bangladesh  2011-01-01 00:00:00 186456.190000000000641  931633189.440000000452752
Asia    Bangladesh  2012-01-01 00:00:00 194925.550000000000312  969718040.969999998663069
Asia    Bangladesh  2013-01-01 00:00:00 194048.289999999999566  1017253078.219999998891571
Asia    Bangladesh  2014-01-01 00:00:00 184143.090000000000413  915867255.449999996567606
Asia    Bangladesh  2015-01-01 00:00:00 193697.769999999999864  959097995.869999999053554
Asia    Bangladesh  2016-01-01 00:00:00 184833.730000000000529  955360230.500000001682568
Asia    Bangladesh  2017-01-01 00:00:00 111476.840000000000014  563824376.189999998257226
Asia    Bhutan      2010-01-01 00:00:00 186506.900000000000474  916963415.479999997623498 
like image 523
sandeep tiwari Avatar asked Jan 25 '26 23:01

sandeep tiwari


1 Answers

Vertica run-length encoding (RLE) allows you to leverage low-cardinality columns for improved query performance and disk space savings, and converting a very granular data type like timestamp to mmyyyy might be perfect for the job.

1) Put the mmyyyy conversion into your data load process

2) create a new projection with a new computed column called month_year and formatted mmyyyy.

3) In the projection column list and in the projection ORDER BY, put this and other low-card columns at the beginning of the list. High-card columns go towards the end of the list. But any high-card join columns (like the key) should be immediately after the low-card columns.

4) Be sure to specify ENCODING_RLE on this and all other low-card columns.

These two links to Vertica docs explain the concepts involved:

Choosing Sort Order: Best Practices

Optimizing Queries for Projections with Predicates

like image 90
Victor Di Leo Avatar answered Jan 27 '26 13:01

Victor Di Leo



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!