Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Loading JSON file into BigQuery table: Schema changes

I am trying to load a json file into a BQ table. My schema looks something like:

{"_eventid": "1234", "Keywords":""}
{"_eventid": "4567", "Keywords":{"_text":"abcd"} }

From above, you can see that the schema changes for "Keywords." How do I deal with this? Using something like:

  {
   "name":"Keywords",
   "type":"record",
   "mode":"nullable",
   "fields": [
   {
    "name":"_text",
    "type":"string",
    "mode":"nullable"
   }
   ]
},

Only works for the second entry. For the first, I get the error:

Errors:
file-00000000: JSON table encountered too many errors, giving up.    Rows: 1; errors: 1. (error code: invalid)
JSON parsing error in row starting at position 0 at file: file- 00000000. Flat value specified for record field. Field: Keywords;     Value: (error code: invalid)
like image 257
Feynman27 Avatar asked Mar 25 '26 19:03

Feynman27


2 Answers

Short Answer

Bigquery table is schema bounded. Whenever we try to ingest data which is not according to table schema we get error. In your case in the first record the value of Keywords is string but in the schema it is record with one nullable field which name is _text.

Workaround

You need to preprocess data before loading it to bigquery. If you have small json file you can write script and check if type of Keywords is record or string if it is string create the record first. So after preprocessing the file content would look like:

{"_eventid": "1234", "Keywords":{"_text": ""}}
{"_eventid": "4567", "Keywords":{"_text":"abcd"} }

According to your schema Keywords is nullable record. You can even ignore keywords which value is empty during preprocessing. After this step the input file would become.

{"_eventid": "1234"}
{"_eventid": "4567", "Keywords":{"_text":"abcd"} }
like image 55
Shamshad Alam Avatar answered Mar 27 '26 10:03

Shamshad Alam


BigQuery now supports schema changes on load with

--schema_update_option=ALLOW_FIELD_ADDITION
--schema_update_option=ALLOW_FIELD_RELAXATION

options. See How to insert/append unstructured data to bigquery table for more details and examples with JSON loading.

like image 43
Mosha Pasumansky Avatar answered Mar 27 '26 10:03

Mosha Pasumansky



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!