I have a data file and a corresponding schema file stored in separate locations. I would like to load the data using the schema in the schema-file. I tried using
A= LOAD '<file path>' USING PigStorage('\u0001') as '<schema-file path>' 
but get an error.
What is the syntax for correctly loading the file?
The schema file format is something like:
data1 - complex - - - - format - -
data1 event_type - - - - - long - "ends '\001'"
data1 event_id - - - - - varchar(50) - "ends '\001'"
data1 name_format - - - - - varchar(10) - "ends newline"
It is possible to load data using a schema file.
When you store your data with the '-schema' flag, PigStorage writes a .pig_schema file to the output path; it holds the schema as JSON.
You can then use it when loading the data:
B = LOAD '<>' USING PigStorage(',','-schema'); 
You can see the schema by running:
describe B;
Check this good post for more details.
This feature is available beginning with Pig 0.10.
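As a minimal sketch of the full round trip described above (the paths are placeholders, and the field names are taken from the schema file in the question, so treat them as assumptions):

```pig
-- Load the raw data once with an inline schema (placeholder path).
A = LOAD '<input path>' USING PigStorage(',')
    AS (event_type:long, event_id:chararray, name_format:chararray);

-- Store with -schema: PigStorage also writes a hidden .pig_schema JSON file.
STORE A INTO '<output path>' USING PigStorage(',', '-schema');

-- Load the stored data back without an AS clause; the schema comes from .pig_schema.
B = LOAD '<output path>' USING PigStorage(',', '-schema');
DESCRIBE B;
```

If the stored schema is picked up, DESCRIBE should list the three named, typed fields instead of reporting that the schema is unknown.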
The AS clause is for specifying the schema directly, not the path to a schema file. The schema goes in parentheses, without quotes:
A = LOAD '<file path>' USING PigStorage('\u0001') AS (type:long, id:chararray, nameformat:chararray);
Alternatively, a file named .pig_schema containing the schema and located in your input directory could work as well, although I have never tried that. It must be a JSON file with the following syntax, where each "type" value is a numeric Pig DataType code (15 is long, 55 is chararray):
{"fields":[
        {"name":"type","type":15,"description":"Fu","schema":null},
        {"name":"id","type":55,"description":"Bar","schema":null},
        {"name":"nameFormat","type":55,"description":"Xu","schema":null}
    ],"version":0,"sortKeys":[],"sortKeyOrders":[]}
This file is also generated if you specify the -schema option when storing with PigStorage.
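A hedged sketch of how a hand-written schema file might be used, assuming a placeholder directory '<input dir>' that holds both the data files and a .pig_schema file like the one above:

```pig
-- Assumption: '<input dir>' contains the data files and a .pig_schema file.
-- No AS clause is given; PigStorage is expected to read the schema from the JSON file.
A = LOAD '<input dir>' USING PigStorage('\u0001', '-schema');
DESCRIBE A;

-- With the schema applied, fields can be referenced by name.
ids = FOREACH A GENERATE id;
```

If DESCRIBE does not show the expected field names, the JSON file is probably malformed or not in the directory being loaded.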