Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to fix 'must be an aggregate expression or appear in GROUP BY clause' with 'as' clause

I am trying to get the number of request by hour of my CloudFront distribution using athena query.

I created cloudfront_logs table guided by this Link in my sample_db

Below is the query I made to get the number of request by hour

SELECT date_trunc('hour',from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time))) as TIME, count(*) as CNT
FROM "sample_db"."cloudfront_logs" 
WHERE 
    from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time)) >= from_iso8601_timestamp('2019-05-29T00:00:00')
AND 
    from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time)) < from_iso8601_timestamp('2019-05-30T00:00:00')
GROUP BY TIME
ORDER BY TIME ASC;

However it returns error like this

SYNTAX_ERROR: line 2:8: '"date_trunc"('hour', "from_iso8601_timestamp"("concat"("concat"("date_format"("date", '%Y-%m-%d'), 'T'), "time")))' must be an aggregate expression or appear in GROUP BY clause

Therefore, I replaced the TIME after GROUP BY to date_trunc('hour',from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time))), and tried again.

SELECT date_trunc('hour',from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time))) as TIME, count(*) as CNT
FROM "sample_db"."cloudfront_logs" 
WHERE 
    from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time)) >= from_iso8601_timestamp('2019-05-29T00:00:00')
AND 
    from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time)) < from_iso8601_timestamp('2019-05-30T00:00:00')
GROUP BY date_trunc('hour',from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time)))
ORDER BY TIME ASC;

Finally, I got result.

I think, it should work with the first query also. Can I get any advice on the first query? Because It looks more simple.

like image 582
Bumhwan Kim Avatar asked Oct 18 '25 22:10

Bumhwan Kim


2 Answers

Athena is based on Presto. In Presto, you cannot use SELECT clause column aliases in the GROUP BY clause. However, you can use identical expression, e.g.:

SELECT some_expression(a) FROM ... GROUP BY some_expression(a)

Also, you can refer to SELECT clause columns using their position in the SELECT list:

SELECT some_expression(a) FROM ... GROUP BY 1

This is ANSI SQL standard syntax. See Presto GROUP BY documentation for mode details.

This can be used in ORDER BY as well, so your query would be something like

SELECT date_trunc('hour',from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time))) as TIME, count(*) as CNT
FROM "sample_db"."cloudfront_logs" 
WHERE ...
GROUP BY 1
ORDER BY 1 ASC;
like image 109
Piotr Findeisen Avatar answered Oct 21 '25 18:10

Piotr Findeisen


This just appears that Athena SQL does not make an alias used in the SELECT clause available in the GROUP BY clause at the same level. Many versions of SQL do allow this, though. You may try using a CTE here to make things less verbose:

WITH cte AS (
    SELECT from_iso8601_timestamp(concat(concat(date_format(date, '%Y-%m-%d'), 'T'), time)) AS ts
    FROM "sample_db"."cloudfront_logs"
)

SELECT
    DATE_TRUNC('hour', ts) AS TIME,
    COUNT(*) AS CNT
FROM cte
WHERE
    ts >= from_iso8601_timestamp('2019-05-29T00:00:00') AND
    ts <  from_iso8601_timestamp('2019-05-30T00:00:00')
GROUP BY
    DATE_TRUNC('hour', ts)
ORDER BY
    DATE_TRUNC('hour', ts);
like image 29
Tim Biegeleisen Avatar answered Oct 21 '25 16:10

Tim Biegeleisen