I wanted to create a external table in bigquery which loads data from google cloud storage. During creation of table from Web UI the option of Partitioning Type gets disabled. Is there any way i can create partitioned external table ? My data is already partitioned by date format on GCS Ex: /somepath/data/dt=2018-03-22
Just like the standard tables, you can create and use partitioned tables in BigQuery and your charges will be based on the data stored in the partitions and the queries you run against them.
You can partition BigQuery tables by: Time-unit column: Tables are partitioned based on a TIMESTAMP , DATE , or DATETIME column in the table. Ingestion time: Tables are partitioned based on the timestamp when BigQuery ingests the data. Integer range: Tables are partitioned based on an integer column.
Federated tables in GCS automatically act as partitioned tables when you use the file name as a variable (_FILE_NAME).
For example, this view transforms the file name into a native date:
#standardSQL
CREATE VIEW `fh-bigquery.views.wikipedia_views_test_ddl`
AS SELECT
  PARSE_TIMESTAMP('%Y%m%d-%H%M%S', REGEXP_EXTRACT(_FILE_NAME, '[0-9]+-[0-9]+')) datehour
  , _FILE_NAME filename
  , line
FROM `fh-bigquery.views.wikipedia_views_gcs`
Later I can write queries like:
#standardSQL
SELECT *
FROM `fh-bigquery.views.wikipedia_views_test_ddl`
WHERE EXTRACT(YEAR FROM datehour)=2015
AND EXTRACT(MONTH FROM datehour)=10
AND EXTRACT(DAY FROM datehour)=21
AND EXTRACT(HOUR FROM datehour)=7
... and these queries will only open the files with names that match this pattern.
I wrote a whole story about this at https://medium.com/google-cloud/bigquery-lazy-data-loading-ddl-dml-partitions-and-half-a-trillion-wikipedia-pageviews-cd3eacd657b6.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With