I want to query s3 csv files to athena. Source csv file desc: (separator '|')
system information
val1|val2|val3|val4|val5|
Base on that I create table in athena:
Create external table dbname.fromcsv
(
col1 string,
col2 string,
col3 string,
col4 string,
col5 string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = '|')
LOCATION
's3://mybucketloc/folder/'
TBLPROPERTIES ("skip.header.line.count"='1');
After creating table and querying athena everything looks fine, data are well separated etc.
First query result:
col1 col2 col3 col4 col5
val1 val2 val3 val4 val5
Next query against the same table is returning values with ',' separator stored in one column...
col1 col2 col3 col4 col5
val1,val1,val,val4,val5
And then In my s3 bucket additional files are added .csv.metadata
How to handle such case? I dont want to delete every single time .csv.metadata file ? Is there are way to keep the original create table definition?
I've tried to change create table to ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
and many others...
I believe your s3://mybucketloc/folder/ is the same as athena query results bucket. This bucket is created by Athena for storing files such as .csv.metadata etc. and is meant for its internal use, hence you have .csv.metadata files present in the same bucket as that of your csv files.
Easiest solution could be, have your csv files in a different s3 bucket and not in the athena-query-results bucket.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With