The documentation doesn't make this clear, but it looks like BigQueryIO.write performs a streaming write, which in turn limits the row size to <20KB. Is that right?
Is it possible to configure a non-streaming BigQuery write that supports the larger (1MB) row size? My Dataflow job is a batch job, not a streaming one. BigQuery streaming is unnecessary and, in this case, undesirable, since it prevents me from importing my data.
If not, what's the recommended workflow for importing large rows into BigQuery? I guess I could run the Dataflow ETL and write my data to text files using TextIO, but then I'd have to add a manual step outside the pipeline to trigger a BigQuery import.
Batch Dataflow jobs don't stream data to BigQuery. Instead, the data is written to GCS, and we then run BigQuery import jobs to load the GCS files. So the streaming limits shouldn't apply.
Note that the import job is executed by the service, not by the workers, which is why you don't see code for this in BigQueryIO.write.
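If you want to check ahead of time which rows would only fit through the batch (load job) path, a rough sketch like the one below may help. It is not part of Dataflow or BigQueryIO. It assumes rows are staged as newline-delimited JSON, and it uses the 20KB and 1MB figures quoted in the question as the limits, which you should verify against the current BigQuery quotas. The names `row_size_bytes` and `classify_rows` are made up for illustration.

```python
import json

# Limits as quoted in the question above. These are assumptions here,
# not values read from BigQuery; check the current quota documentation.
STREAMING_ROW_LIMIT = 20 * 1024      # streaming insert: row must be under 20KB
LOAD_JOB_ROW_LIMIT = 1024 * 1024     # load (import) job: row up to 1MB


def row_size_bytes(row):
    """Size of one row serialized as a single newline-delimited JSON line."""
    return len(json.dumps(row, separators=(",", ":")).encode("utf-8"))


def classify_rows(rows):
    """Group row indices by which write path their serialized size fits."""
    result = {"streaming_ok": [], "load_only": [], "too_large": []}
    for index, row in enumerate(rows):
        size = row_size_bytes(row)
        if size < STREAMING_ROW_LIMIT:
            result["streaming_ok"].append(index)
        elif size <= LOAD_JOB_ROW_LIMIT:
            result["load_only"].append(index)
        else:
            result["too_large"].append(index)
    return result


if __name__ == "__main__":
    sample = [
        {"id": 1, "payload": "x" * 100},        # small row
        {"id": 2, "payload": "x" * 50_000},     # over the streaming limit
        {"id": 3, "payload": "x" * 2_000_000},  # over both limits
    ]
    print(classify_rows(sample))
```

Rows that land in `load_only` are the ones that would be rejected by a streaming write but should go through when the batch job imports files from GCS, as described above. The size is only an approximation of what BigQuery measures, since the actual staged file format may differ.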