Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Handling bad lines in bigquery

I am using the google.bigquery.python api in order to automatically load a dataset from a GCS bucket to Bigquery on a daily basis. I specified a schema for this dataset and it worked untill I had one day this error:

ERROR - {u'state': u'DONE', u'errors': [{u'reason': u'invalid', u'message': u'Invalid argument: [email protected]', u'location': u'File: 0 / Line:1283011 / Field:44'},
{u'reason': u'invalid', u'message': u'Invalid argument: [email protected]', u'location': u'File: 0 / Line:1338016 / Field:44'}, {u'reason': u'invalid', u'message': u'Too many errors encountered. Limit is: 0.'}], u'errorResult': {u'reason': u'invalid', u'message': u'Too many errors encountered. Limit is: 0.'}}

My question is: how can I tell bigquery to delete automatically lines which don't fit the schema?

like image 421
sweeeeeet Avatar asked Oct 15 '25 11:10

sweeeeeet


2 Answers

If you are using the bq command line client, the option is --max-bad-records. This tells BQ how many bad records to ignore before failing a load job. The default is zero. I don't recall whether setting this positive lets bad rows into the data or only ignores and suppresses the error. You might want to manually test it.

From: https://cloud.google.com/bigquery/bq-command-line-tool

--max_bad_records Type: integer Description: The maximum number of bad rows to skip before the load job is aborted and no updates are performed. If this value is larger than 0, the job will succeed as long as the number of bad records do not exceed this value. This is useful if you would like to load files that may have bad records. The default value for this parameter is 0 (all rows are required to be valid).

Valid Values: Any integer

Usage: bq load --max_bad_records=3 <destination_table> <data_source_uri> [<table_schema>]

In the python API this field is called maxBadRecords. You can search for maxBadRecords in the python BigQuery API repo on Github and you'll see it is in the jobs API configuration for a load job.

like image 60
Paul Avatar answered Oct 17 '25 00:10

Paul


Sorry can't comment on Paul's answer as I have less than 50 reputations.

Re: whether setting this positive lets bad rows into the data or only ignores and suppresses the error

The bad rows will be ignored and will not go into the loaded data.

like image 37
Hua Zhang Avatar answered Oct 17 '25 01:10

Hua Zhang



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!