I have a lambda that attempts to find out whether a previously executed athena query has returned any rows or not. To do so I am using the boto3 function get_query_runtime_statistics and then extracting the "Rows" data:
response = athena_client.get_query_runtime_statistics(QueryExecutionId=query_id)
row_count = response["QueryRuntimeStatistics"]["Rows"]["OutputRows"]
However, in a previous execution the response object has not contained the "Rows" data, resulting in a KeyError being thrown. I know I can get around the KeyError by using .get("Rows", {}).get("OutputRows") etc.
I reran the exact same query in the athena console (it returns 0 rows) and then used the query ID to get the runtime statistics of this duplicate query execution. This time it had the "Rows" data in the response. Therefore the behaviour doesn't appear to be consistent for a given query string; however, if I get the statistics for the original query execution the response consistently does not contain the "Rows" data.
What I want to know is whether every time "Rows" data is not present can I assume that the output row count was zero?
I couldn't find anything in the AWS docs explaining why "Rows" may not always be present in the API response.
Thanks :)
PS. If you don't want to follow the link to the documentation, here is the response schema according to boto3:
{
    'QueryRuntimeStatistics': {
        'Timeline': {
            'QueryQueueTimeInMillis': 123,
            'QueryPlanningTimeInMillis': 123,
            'EngineExecutionTimeInMillis': 123,
            'ServiceProcessingTimeInMillis': 123,
            'TotalExecutionTimeInMillis': 123
        },
        'Rows': {
            'InputRows': 123,
            'InputBytes': 123,
            'OutputBytes': 123,
            'OutputRows': 123
        },
        'OutputStage': {
            'StageId': 123,
            'State': 'string',
            'OutputBytes': 123,
            'OutputRows': 123,
            'InputBytes': 123,
            'InputRows': 123,
            'ExecutionTime': 123,
            'QueryStagePlan': {
                'Name': 'string',
                'Identifier': 'string',
                'Children': [
                    {'... recursive ...'},
                ],
                'RemoteSources': [
                    'string',
                ]
            },
            'SubStages': [
                {'... recursive ...'},
            ]
        }
    }
}
I raised a support ticket and got the follwoing responses:
The query finished successfully but it failed as an async process of getting runtime stats. This is an internal issue and internal team is aware about it and is working on it to fix the same.
I asked for clarification whether this issue only happens on queries that produce zero results, this was the response:
The issue could happen regardless of the query. Also as informed by internal team, it may take approximately 15-30 days to know the root cause and fix the issue.
[sent on 2023-02-10]
I hope this is helpful to anyone else who comes across this :)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With