I am trying to get the number of request by hour of my CloudFront distribution using athena query.
I created cloudfront_logs table guided by this Link in my sample_db
Below is the query I made to get the number of request by hour
SELECT date_trunc('hour',from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time))) as TIME, count(*) as CNT
FROM "sample_db"."cloudfront_logs"
WHERE
from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time)) >= from_iso8601_timestamp('2019-05-29T00:00:00')
AND
from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time)) < from_iso8601_timestamp('2019-05-30T00:00:00')
GROUP BY TIME
ORDER BY TIME ASC;
However it returns error like this
SYNTAX_ERROR: line 2:8: '"date_trunc"('hour', "from_iso8601_timestamp"("concat"("concat"("date_format"("date", '%Y-%m-%d'), 'T'), "time")))' must be an aggregate expression or appear in GROUP BY clause
Therefore, I replaced the TIME
after GROUP BY
to date_trunc('hour',from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time)))
, and tried again.
SELECT date_trunc('hour',from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time))) as TIME, count(*) as CNT
FROM "sample_db"."cloudfront_logs"
WHERE
from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time)) >= from_iso8601_timestamp('2019-05-29T00:00:00')
AND
from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time)) < from_iso8601_timestamp('2019-05-30T00:00:00')
GROUP BY date_trunc('hour',from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time)))
ORDER BY TIME ASC;
Finally, I got result.
I think, it should work with the first query also. Can I get any advice on the first query? Because It looks more simple.
Athena is based on Presto. In Presto, you cannot use SELECT
clause column aliases in the GROUP BY
clause.
However, you can use identical expression, e.g.:
SELECT some_expression(a) FROM ... GROUP BY some_expression(a)
Also, you can refer to SELECT
clause columns using their position in the SELECT list:
SELECT some_expression(a) FROM ... GROUP BY 1
This is ANSI SQL standard syntax. See Presto GROUP BY documentation for mode details.
This can be used in ORDER BY
as well, so your query would be something like
SELECT date_trunc('hour',from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time))) as TIME, count(*) as CNT
FROM "sample_db"."cloudfront_logs"
WHERE ...
GROUP BY 1
ORDER BY 1 ASC;
This just appears that Athena SQL does not make an alias used in the SELECT
clause available in the GROUP BY
clause at the same level. Many versions of SQL do allow this, though. You may try using a CTE here to make things less verbose:
WITH cte AS (
SELECT from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time)) AS ts
FROM "sample_db"."cloudfront_logs"
)
SELECT
DATE_TRUNC('hour', ts) AS TIME,
COUNT(*) AS CNT
FROM cte
WHERE
ts >= from_iso8601_timestamp('2019-05-29T00:00:00') AND
ts < from_iso8601_timestamp('2019-05-30T00:00:00')
GROUP BY
DATE_TRUNC('hour', ts)
ORDER BY
DATE_TRUNC('hour', ts);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With