I have a partitioned table with less than 600MB data in the targeted partitions.
When I run an EXPORT query with a subquery that returns ~1000 rows, EXPORT shards the data into 27(!) files.
This is not a big query result. I assume it's happening because the optimizer sees the 600MB, but I'm not sure.
Has anyone else encountered this? I am GZIPing the results so concatenating would involve unzipping, appending and zipping again...
I have run many various export scenarios in BigQuery. I think there is only one use case when BigQuery export would export multiple files where Table size is partitioned and smaller than 1 GB. It's using wildcards during exporting.
BigQuery supports a single wildcard operator (*) in each URI. The wildcard can appear anywhere in the URI except as part of the bucket name. Using the wildcard operator instructs BigQuery to create multiple sharded files based on the supplied pattern.
I have tested this on using Public database with partitioned and normal table:
bigquery-public-data.covid19_geotab_mobility_impact.us_border_wait_timesbigquery-public-data.covid19_google_mobility.mobility_reportScenario 1:
Normal table: bigquery-public-data.covid19_google_mobility.mobility_report
Table size - 771.81 MB
Number of rows - 7,254,927
When I exported this normally (using GZIP) I've got only one file (regularexportnowildcard). When I used wildcard it created 2 files with the same size (regularexport000000000000 and regularexport000000000001).

Scenario 2:
Partitioned table: bigquery-public-data.covid19_geotab_mobility_impact.us_border_wait_times
Table size - 540.25 MB
Number of rows - 870,064
When I have exported a Partitioned table normally, without a wildcard I've got only one file (nowildcardexport).
However, when I tried to export Partitioned Table with Wildcards, I got 586 files as a result.

As stated in the GCP Documentation Introduction to partitioned tables
A partitioned table is a special table that is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance, and you can control costs by reducing the number of bytes read by a query.
It looks like, if you are using wildcards with partitioned table, it creates each file for each partition in the table.
UPDATE
When you are using EXPORT DATA statement you have to use wildcard otherwise you will get an error:
Invalid uri specification. Option 'uri' value must be a wild card URI.
As workaround of this, you could try to use UI or bq command. More details can be found in this Issue Tracker
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With