Athena skipping keys starting with underscore

Question

I'm trying to use AWS Athena to query JSON files we have stored in S3. I managed to create a simple schema, and everything seemed fine until I noticed that some of my files are not accounted for.

The keys of the files are user IDs, and some of those start with _. All of those are missing in Athena. They exist in S3, I can get them, and they are similar to the other files, but Athena does not see them.

Apparently it does not like underscores at the beginning of keys. Is there a way around this other than renaming all the files? Underscores elsewhere in the key do not seem to be an issue.

My schema (I simplified it by removing fields):

CREATE EXTERNAL TABLE IF NOT EXISTS db.table (
  `user_id` string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
)
LOCATION 's3://xyz/myfiles/'
TBLPROPERTIES ('has_encrypted_data'='false');

Prabhakar Reddy · Accepted Answer

When you query a table, Amazon Athena uses Presto under the hood. Starting with version 0.60, Presto ignores files whose names start with an underscore (_) or a dot (.). This matches the behavior of Hadoop MapReduce / Hive:

https://prestodb.io/docs/current/release/release-0.60.html

Presto filters out these hidden files using org.apache.hadoop.hive.common.FileUtils.HIDDEN_FILES_PATH_FILTER. Because this filter comes from Hive, the same applies to Hive tables: they also ignore such files in the table location.
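To see which of your keys are affected, the rule can be approximated in a few lines of Python. This is only a sketch of the check described above, not the actual Hive or Presto code: it assumes the filter looks at the last path component of the key, and the helper names (is_hidden, plan_renames) and the "u" prefix are made up for illustration.

```python
import posixpath

# Prefixes that mark a file as hidden, per the behavior described above.
HIDDEN_PREFIXES = ("_", ".")


def is_hidden(key: str) -> bool:
    """Return True if the last component of an S3 key starts with '_' or '.'."""
    name = posixpath.basename(key)
    return name.startswith(HIDDEN_PREFIXES)


def plan_renames(keys, safe_prefix="u"):
    """Build (old_key, new_key) pairs for keys that would be skipped.

    safe_prefix is a hypothetical choice; any prefix that does not start
    with '_' or '.' would serve the same purpose.
    """
    plan = []
    for key in keys:
        if is_hidden(key):
            directory, name = posixpath.split(key)
            plan.append((key, posixpath.join(directory, safe_prefix + name)))
    return plan


if __name__ == "__main__":
    # Hypothetical keys under the table location.
    sample_keys = [
        "myfiles/_alice.json",
        "myfiles/bob_smith.json",
        "myfiles/.tmp",
    ]
    for old, new in plan_renames(sample_keys):
        print(old, "->", new)
```

Running this over a listing of the keys under the table location gives the set of objects that would need a different name before Athena picks them up. Underscores elsewhere in the key, as in bob_smith.json, are not flagged.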

Tags:

amazon-athena

Jilles van Gurp

1 Answers

Prabhakar Reddy
