Flexible schema with Google BigQuery

Question

I have around 1000 files with seven columns each. In some of these files, a few rows have an eighth column (when there is data for it).

What is the best way to load this into BigQuery? Do I have to find and edit all these files to either add an empty eighth column to every file or remove the eighth column from every file? I don't care about the value in this column.

Is there a way to specify eight columns in the schema and have the eighth column set to null when there is no data for it?

I am using the BigQuery APIs to load the data, in case that helps.

Jordan Tigani · Accepted Answer

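You can use the 'allowJaggedRows' option of the load job configuration, which treats missing values at the end of a row as nulls. Your schema can then have 8 columns, and the eighth column will be null in every row that has no value for it. The option applies to CSV loads, and the trailing column has to be nullable.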

This is documented here: https://developers.google.com/bigquery/docs/reference/v2/jobs#configuration.load.allowJaggedRows

I've filed a doc bug to make this easier to find.
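
As a rough illustration of the option above, here is a minimal sketch of a request body for the jobs.insert REST call with 'allowJaggedRows' enabled. The column names (col1 to col8), the STRING type, and the project, dataset, table, and bucket names are all placeholders assumed for this example, not values from the question; only the configuration keys come from the load job reference.

```python
import json


def build_load_job(project_id, dataset_id, table_id, source_uris, num_columns=8):
    """Build a jobs.insert request body for a CSV load with allowJaggedRows."""
    # Hypothetical column names col1..colN. Every column is NULLABLE so that
    # a missing trailing value can be stored as null.
    fields = [
        {"name": f"col{i}", "type": "STRING", "mode": "NULLABLE"}
        for i in range(1, num_columns + 1)
    ]
    return {
        "configuration": {
            "load": {
                "sourceUris": source_uris,
                "sourceFormat": "CSV",
                "schema": {"fields": fields},
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                # Rows with only seven values get null in the eighth column.
                "allowJaggedRows": True,
            }
        }
    }


if __name__ == "__main__":
    # Placeholder identifiers; replace with your own before submitting the job.
    body = build_load_job(
        "my-project", "my_dataset", "my_table", ["gs://my-bucket/files/*.csv"]
    )
    print(json.dumps(body, indent=2))
```

The resulting dictionary would be sent as the body of the load job request, so none of the 1000 files needs to be edited beforehand.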

N.N. · Answer

If your logs are in JSON, you can define a nullable field; if it does not appear in a record, it remains null. I am not sure how this works with CSV, but I think every field has to be present (even if empty).
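
To make the JSON variant concrete, here is a small sketch that turns rows of seven or eight values into newline-delimited JSON, leaving the eighth key out when there is no value. The field names col1 to col8 and the helper to_ndjson are hypothetical, chosen only for this example; the load job would then use the source format NEWLINE_DELIMITED_JSON with all eight fields declared as NULLABLE.

```python
import json

# Hypothetical schema: eight nullable string columns named col1..col8.
SCHEMA_FIELDS = [
    {"name": f"col{i}", "type": "STRING", "mode": "NULLABLE"}
    for i in range(1, 9)
]


def to_ndjson(rows):
    """Convert rows (lists of 7 or 8 values) to newline-delimited JSON.

    zip() stops at the shorter sequence, so a seven-value row produces a
    record without the col8 key at all.
    """
    lines = []
    for row in rows:
        record = {field["name"]: value for field, value in zip(SCHEMA_FIELDS, row)}
        lines.append(json.dumps(record))
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        ["a", "b", "c", "d", "e", "f", "g"],
        ["a", "b", "c", "d", "e", "f", "g", "h"],
    ]
    print(to_ndjson(sample))
```

Records that omit col8 would load with a null in that column, which matches the behavior described above for nullable JSON fields.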

Tags:

google-bigquery

Febian Shah
