Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to load xml file into Hive

Tags:

xml

hadoop

hive

Im working on Hive tables im having the following problem. I am having more than 1 billion of xml files in my HDFS. What i want to do is, Each xml file having the 4 different sections. Now i want to split and load the each part in the each table for every xml file

Example :

            <?xml version='1.0' encoding='iso-8859-1'?>

            <section1>
                <id> 1233222 </id>
               // having lot of xml tages 
            </section1>

            <section2>
               // having lot of xml tages 
            </section2>

            <section3>
               // having lot of xml tages 
            </section3>

            <section4>
               // having lot of xml tages 
            </section4>

            </xml>

And i have the four tables

        section1Table

        id       section1    // fields 

        section2Table

        id       section2

        section3Table 

        id       section3

        section4Table

        id       section4

Now i want to split and load the data into each table.

How can i achieve this . Can anyone help me

Thanks

UPDATE

I have tried the following

CREATE EXTERNAL TABLE test(name STRING) LOCATION '/user/sornalingam/zipped/output/Tagged/t1';\


SELECT xpath (name, '//section1') FROM test LIMIT 1 ;

but i got the following error

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"name":"<?xml version='1.0' encoding='iso-8859-1'?>"}
like image 357
backtrack Avatar asked Jan 28 '26 23:01

backtrack


1 Answers

You have several options:

  • Load the XML into a Hive table with a string column, one per row (e.g. CREATE TABLE xmlfiles (id int, xmlfile string). Then use an XPath UDF to do work on the XML.
  • Since you know the XPath's of what you want (e.g. //section1), follow the instructions in the second half of this tutorial to ingest directly into Hive via XPath.
  • Map your XML to Avro as described here because a SerDe exists for seamless Avro-to-Hive mapping.
  • Use XPath to store your data in a regular text file in HDFS and then ingest that into Hive.

It depends on your level of experience and comfort with these approaches.

like image 153
Vidya Avatar answered Jan 30 '26 14:01

Vidya



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!