 

Flexible schema with Google BigQuery

I have around 1000 files with seven columns each. Some of these files have a few rows with an eighth column (when data is present).

What is the best way to load these into BigQuery? Do I have to find and edit all these files to either add an empty eighth column to every file, or remove the eighth column from every file? I don't care about the values in this column.

Is there a way to specify eight columns in the schema and have a null value filled in for the eighth column when no data is available?

I am using the BigQuery APIs to load the data, in case that helps.

asked Oct 19 '25 by Febian Shah

2 Answers

You can use the 'allowJaggedRows' option, which treats missing trailing values at the end of a row as nulls. So your schema can have eight columns, and every row that lacks the eighth value will get null there.

This is documented here: https://developers.google.com/bigquery/docs/reference/v2/jobs#configuration.load.allowJaggedRows

I've filed a doc bug to make this easier to find.
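To make this concrete, here is a minimal sketch of what such a load-job configuration might look like as a REST-style job body. The bucket path and column names are made-up placeholders, not from the original question:

```python
# Sketch of a BigQuery load-job configuration (v2 "jobs" resource) using
# allowJaggedRows. The source URI and field names below are hypothetical
# placeholders for illustration only.
load_config = {
    "configuration": {
        "load": {
            "sourceUris": ["gs://my-bucket/data/*.csv"],  # placeholder path
            "sourceFormat": "CSV",
            "allowJaggedRows": True,  # short rows get NULL for missing trailing columns
            "schema": {
                "fields": (
                    # seven columns that are always present...
                    [{"name": f"col{i}", "type": "STRING", "mode": "NULLABLE"}
                     for i in range(1, 8)]
                    # ...plus the nullable eighth column
                    + [{"name": "col8", "type": "STRING", "mode": "NULLABLE"}]
                )
            },
        }
    }
}
```

With allowJaggedRows set, rows containing only seven values load cleanly and `col8` is simply null for those rows, so none of the 1000 files need editing.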

answered Oct 22 '25 by Jordan Tigani

If your logs are in JSON, you can define a nullable field; if it does not appear in a record, it remains null. I am not sure how this works with CSV, but I believe you have to include all fields (even if empty).
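As a quick illustration of the JSON case (the field names here are invented for the example), a newline-delimited record that omits an optional field simply yields nothing for it when parsed, which maps to NULL in a nullable BigQuery column:

```python
import json

# Two newline-delimited JSON records; the second omits the optional
# "extra" field. Field names are hypothetical, for illustration only.
lines = [
    '{"a": 1, "b": 2, "extra": "x"}',
    '{"a": 3, "b": 4}',
]

records = [json.loads(line) for line in lines]

# A field missing from a record reads back as None, i.e. NULL in a
# NULLABLE BigQuery column.
values = [rec.get("extra") for rec in records]
print(values)  # ['x', None]
```

This is why newline-delimited JSON sidesteps the jagged-rows problem entirely: absent keys need no placeholder at all.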

answered Oct 22 '25 by N.N.