I have a table with duplicate records. I want to remove them. I've created a column called "hash_code" which is just a sha1 hash of all the columns. Duplicate rows will have the same hash code. Everything is fine except when I tried to create a new table with a query containing GROUP BY. My table has RECORD data type, but the new table created flattens it even when I had specified it to not flatten. Seems like GROUP BY and the "-noflatten_results" flag doesn't place nice.
Here's an example command line I ran:
bq query --allow_large_results --destination_table mydataset.my_events --noflatten_results --replace
"select hash_code, min(event) as event, min(properties.adgroup_name) as properties.adgroup_name,
min(properties.adid) as properties.adid, min(properties.app_id) as properties.app_id,
min(properties.campaign_name) as properties.campaign_name from mydataset.my_orig_events group each
by hash_code "
In the above example, properties is a RECORD data type with nested fields. The resulting table doesn't have properties as RECORD data type. Instead it translated properties.adgroup_name to properties_adgroup_name, etc.
Any way to force BigQuery to treat the result set as RECORD and not flatten in GROUP BY?
Thanks!
There are a few known cases where query results can be flattened despite requesting unflattened results.
GROUP BY clauseORDER BY clauseSELECT record.record.field AS flat_field). Note that this only flattens the specific field with the alias applied, and only flattens the field if it and its parent records are non-repeated.The BigQuery query engine always flattens query results in these cases. As far as I know, there is no workaround for this behavior, other than removing these clauses or aliases from the query.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With