Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to load CSV data with embedded double quote using CSV serde in Hive. Without updating the incoming data file

Tags:

csv

hadoop

hive

I have text file like below :

1,"TEST"Data","SAMPLE DATA"

and the table structure is like this :

CREATE TABLE test1( id string, col1 string , col2 string )
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
  STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' 
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 
  LOCATION 'mylocation/test1'`

When I am putting the file in concerned HDFS location. 2nd and 03 rd column are populating as null that is because of the double quote in between (TEST"Data).

One way is to update the data file using escape character "/" but we are not allowed to update the incoming data. How can I load data properly and escape these embedded double quotes.

Appreciate the help !!

like image 200
PPPP Avatar asked Dec 21 '25 07:12

PPPP


1 Answers

You can load it using RegexSerDe

Demo

bash

mkdir test1

cat>test1/file.txt
1,"TEST"Data","SAMPLE DATA"
2,"TEST Data","SAMPLE DATA"
3,"TEST","Data","SAMPLE","DATA"

hdfs dfs -put test1 /tmp

hive

create external table test1 
( 
    id      string
   ,col1    string
   ,col2    string 
)
    row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
    with serdeproperties  
    (
        'input.regex' = '^(\\d+?),"(.*)","(.*)"$'
    )
    location '/tmp/test1'
;

select * from test1
;

+----------+----------------------+-------------+
| test1.id |      test1.col1      | test1.col2  |
+----------+----------------------+-------------+
| 1        | TEST"Data            | SAMPLE DATA |
| 2        | TEST Data            | SAMPLE DATA |
| 3        | TEST","Data","SAMPLE | DATA        |
+----------+----------------------+-------------+
like image 116
David דודו Markovitz Avatar answered Dec 24 '25 02:12

David דודו Markovitz



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!