
handling newline character in hive

Tags: hadoop, hive

I have created a table in Hive as follows:

Create table(id int, Description String)  

My data looks like this:

 
1|This will return corrupt data since there is a ',' in the first string.
     some text
     Change the data  
2|There is prob in reading data 
    sometext

Because Hive's default line terminator is \n, each continuation line is treated as a separate record once the data is loaded. The Description column therefore cannot be read correctly, and Hive displays NULL values. Can anyone suggest how to handle the newlines before loading the data into Hive?

pramav asked Jan 24 '26 07:01



1 Answer

I know this question is old, but you have a couple of options. You can't control this with FIELDS TERMINATED BY, because that only controls what terminates the fields, not the records. Records in Hive text tables are hard-coded to be terminated by the newline character (there is a LINES TERMINATED BY clause, but it only accepts '\n', so it does not help here).

  1. Write a custom InputFormat that uses a RecordReader that understands non-newline delimited records. Look at the code for LineReader/LineRecordReader and TextInputFormat.
  2. Use a format other than text/ASCII, like Parquet. I would recommend this regardless, as text is probably the worst format you can store data in anyway.
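
Neither option is spelled out in code above. As a rough illustration of the record-boundary logic that such a reader would need, here is a minimal Python sketch that does the same job as a pre-processing step before the load. It is not the approach from the answer itself, and it rests on assumptions taken only from the sample data in the question: every record starts with an integer id followed by `|`, and any other non-empty line is a continuation of the previous record. The name `merge_multiline_records` is hypothetical.

```python
import re
from typing import Iterable, Iterator

# Assumption: a new record always starts with an integer id followed by '|'.
RECORD_START = re.compile(r"^\d+\|")


def merge_multiline_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield one line per record, folding continuation lines into it."""
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if RECORD_START.match(line):
            # A new record begins: emit the one collected so far.
            if current is not None:
                yield current
            current = line.strip()
        elif current is not None and line.strip():
            # Continuation line: replace the embedded newline with one space.
            current += " " + line.strip()
        # Blank lines, and lines before the first record, are dropped.
    if current is not None:
        yield current


# Sample input copied from the question.
sample = [
    "1|This will return corrupt data since there is a ',' in the first string.\n",
    "     some text\n",
    "     Change the data  \n",
    "2|There is prob in reading data \n",
    "    sometext\n",
]

cleaned = list(merge_multiline_records(sample))
for record in cleaned:
    print(record)
```

Each record now occupies a single line, so a text table with `|` as the field delimiter can read it. The heuristic breaks if a continuation line itself happens to start with digits followed by `|`; in that case a binary format such as Parquet, as suggested in option 2, is the safer route.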
Brian Schrameck answered Jan 26 '26 04:01



