 

How to schedule an export from a BigQuery table to Cloud Storage?

I have successfully scheduled my query in BigQuery, and the result is saved as a table in my dataset. I see a lot of information about scheduling data transfers into BigQuery or Cloud Storage, but I haven't yet found anything about scheduling an export from a BigQuery table to Cloud Storage.

Is it possible to schedule an export of a BigQuery table to Cloud Storage so that I can further schedule having it SFTP-ed to me via the Google BigQuery Data Transfer Service?

asked by brad broyles


2 Answers

Not sure if this was generally available when this question was asked, but there is now an option to run an export to Cloud Storage via a regular SQL query, using the EXPORT DATA statement. See the SQL tab in Exporting table data.

Example:

EXPORT DATA
  OPTIONS (
    uri = 'gs://bucket/folder/*.csv',
    format = 'CSV',
    overwrite = true,
    header = true,
    field_delimiter = ';')
AS (
  SELECT field1, field2
  FROM mydataset.table1
  ORDER BY field1
);

This can also be set up as a Scheduled Query if you need a periodic export (a programmatic sketch follows below). And, of course, you need to make sure the user or service account running it has permission to read the source datasets and tables and to write to the destination bucket.
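
For reference, the scheduled export can also be created programmatically with the BigQuery Data Transfer Service client library (google-cloud-bigquery-datatransfer). The following is only a minimal sketch based on the standard scheduled-query pattern; the project ID, display name, schedule, and URIs are placeholders to replace with your own values:

    # Sketch: create a scheduled query that periodically runs an EXPORT DATA statement.
    # All identifiers below are placeholders.
    from google.cloud import bigquery_datatransfer

    transfer_client = bigquery_datatransfer.DataTransferServiceClient()

    export_query = """
    EXPORT DATA
      OPTIONS (
        uri = 'gs://bucket/folder/*.csv',
        format = 'CSV',
        overwrite = true,
        header = true,
        field_delimiter = ';')
    AS (
      SELECT field1, field2
      FROM mydataset.table1
      ORDER BY field1
    );
    """

    transfer_config = bigquery_datatransfer.TransferConfig(
        display_name="Export table1 to Cloud Storage",
        data_source_id="scheduled_query",   # marks this transfer config as a scheduled query
        params={"query": export_query},
        schedule="every 24 hours",          # adjust to the cadence you need
    )

    transfer_config = transfer_client.create_transfer_config(
        parent=transfer_client.common_project_path("YOUR_PROJECT_ID"),
        transfer_config=transfer_config,
    )

    print("Created scheduled query: {}".format(transfer_config.name))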

Hopefully this is useful for other peeps visiting this question if not for OP :)

answered by Gerard


There isn't a managed service for scheduling BigQuery table exports, but one viable approach is to use Cloud Functions in conjunction with Cloud Scheduler.

The Cloud Function would contain the necessary code to export from the BigQuery table to Cloud Storage. There are multiple programming languages to choose from for that, such as Python, Node.js, and Go.

Cloud Scheduler would periodically send an HTTP call, on a cron schedule, to the Cloud Function, which would in turn be triggered and run the export programmatically.
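
The steps below use the Cloud Console, but for reference the Cloud Scheduler job could also be created programmatically. A minimal sketch with the google-cloud-scheduler client library, where the project, region, job name, and function URL are placeholders:

    # Sketch: create a Cloud Scheduler job that periodically calls the Cloud Function's HTTP URL.
    # All identifiers below are placeholders.
    from google.cloud import scheduler_v1

    client = scheduler_v1.CloudSchedulerClient()
    parent = "projects/YOUR_PROJECT_ID/locations/YOUR_REGION"

    job = scheduler_v1.Job(
        name="{}/jobs/bq-export-trigger".format(parent),
        schedule="0 1 * * 0",    # 1 AM every Sunday
        time_zone="Etc/UTC",
        http_target=scheduler_v1.HttpTarget(
            uri="https://YOUR_REGION-YOUR_PROJECT_ID.cloudfunctions.net/YOUR_FUNCTION",
            http_method=scheduler_v1.HttpMethod.GET,
        ),
    )

    created_job = client.create_job(parent=parent, job=job)
    print("Created scheduler job: {}".format(created_job.name))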

As an example and more specifically, you can follow these steps:

  1. Create a Cloud Function using Python with an HTTP trigger. To interact with BigQuery from within the code you need to use the BigQuery client library. Import it with from google.cloud import bigquery. Then, you can use the following code in main.py to create an export job from BigQuery to Cloud Storage:

        # Imports the BigQuery client library
        from google.cloud import bigquery
    
        def hello_world(request):
            # Replace these values according to your project
            project_name = "YOUR_PROJECT_ID" 
            bucket_name = "YOUR_BUCKET" 
            dataset_name = "YOUR_DATASET" 
            table_name = "YOUR_TABLE" 
            destination_uri = "gs://{}/{}".format(bucket_name, "bq_export.csv.gz")
    
            bq_client = bigquery.Client(project=project_name)
    
            dataset = bq_client.dataset(dataset_name, project=project_name)
            table_to_export = dataset.table(table_name)
    
            job_config = bigquery.job.ExtractJobConfig()
            job_config.compression = bigquery.Compression.GZIP
    
            extract_job = bq_client.extract_table(
                table_to_export,
                destination_uri,
                # Location must match that of the source table.
                location="US",
                job_config=job_config,
            )  
            return "Job with ID {} started exporting data from {}.{} to {}".format(extract_job.job_id, dataset_name, table_name, destination_uri)
    

    Specify the client library dependency in the requirements.txt file by adding this line:

    google-cloud-bigquery
    
  2. Create a Cloud Scheduler job. Set the Frequency with which you want the job to be executed. For instance, setting it to 0 1 * * 0 would run the job once a week, at 1 AM every Sunday morning. The crontab tool is pretty useful for experimenting with cron schedules.

    Choose HTTP as the Target, set the URL to the Cloud Function's URL (it can be found by selecting the Cloud Function and navigating to the Trigger tab), and choose GET as the HTTP method.

    Once created, you can test how the export behaves by pressing the RUN NOW button. However, before doing so, make sure the default App Engine service account has at least the Cloud IAM roles/storage.objectCreator role; otherwise the operation might fail with a permission error. The default App Engine service account has the form PROJECT_ID@appspot.gserviceaccount.com.

    If you wish to export different tables, datasets, and buckets on each execution while using the same Cloud Function, you can use the HTTP POST method instead and configure a Body containing those parameters as data, which would be passed on to the Cloud Function. That would imply some small changes to its code; a sketch of this follows after these steps.
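
As a rough illustration of that last point, the Cloud Function could read those parameters from the request body instead of hard-coding them. A minimal sketch, where the JSON keys (dataset, table, bucket) and the fallback values are arbitrary names chosen for this example:

    # Sketch: read export parameters from the POST body sent by Cloud Scheduler.
    # The JSON keys and the fallback values are placeholders.
    from google.cloud import bigquery

    def hello_world(request):
        params = request.get_json(silent=True) or {}
        project_name = "YOUR_PROJECT_ID"
        dataset_name = params.get("dataset", "YOUR_DATASET")
        table_name = params.get("table", "YOUR_TABLE")
        bucket_name = params.get("bucket", "YOUR_BUCKET")
        destination_uri = "gs://{}/{}".format(bucket_name, "bq_export.csv.gz")

        bq_client = bigquery.Client(project=project_name)

        job_config = bigquery.job.ExtractJobConfig()
        job_config.compression = bigquery.Compression.GZIP

        extract_job = bq_client.extract_table(
            "{}.{}.{}".format(project_name, dataset_name, table_name),
            destination_uri,
            location="US",  # must match the location of the source table
            job_config=job_config,
        )
        return "Job with ID {} started exporting data from {}.{} to {}".format(
            extract_job.job_id, dataset_name, table_name, destination_uri
        )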

Lastly, once the export job has been created, you can use the job ID returned by the Cloud Function and the bq CLI to view the job's status with bq show -j <job_id>, or check it programmatically as sketched below.
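
A small sketch of checking the same job from Python with the BigQuery client library (the project ID, job ID, and location are placeholders):

    # Sketch: look up an extract job by its ID and print its status.
    from google.cloud import bigquery

    bq_client = bigquery.Client(project="YOUR_PROJECT_ID")
    job = bq_client.get_job("YOUR_JOB_ID", location="US")  # location must match the job's location

    print("Job type: {}, state: {}".format(job.job_type, job.state))
    if job.error_result:
        print("Error: {}".format(job.error_result))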

answered by Maxim


