Here is a code snippet I used as a reference from GCP documentation:
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.QueryJobConfig()
# Set the destination table
table_ref = client.dataset(dataset_id).table('your_table_id')
job_config.destination = table_ref
sql = """
    SELECT corpus
    FROM `bigquery-public-data.samples.shakespeare`
    GROUP BY corpus;
"""
# Start the query, passing in the extra configuration.
query_job = client.query(
    sql,
    # Location must match that of the dataset(s) referenced in the query
    # and of the destination table.
    location='US',
    job_config=job_config)  # API request - starts the query
query_job.result()  # Waits for the query to finish
print('Query results loaded to table {}'.format(table_ref.path))
This works fine, but if the table already exists, the query fails with an error. I know how to delete the table first, but I am wondering whether there is a way to make this overwrite the table atomically, so that the table always exists.
You can control how results are persisted by setting a combination of create_disposition and write_disposition. The Python library exposes these options in QueryJobConfig, which links to more details in the REST API docs.
For queries, the default write disposition is WRITE_EMPTY, which causes the job to fail if the table already exists. Switching it to WRITE_TRUNCATE should give you the atomic replacement of data that you're looking for.
TL;DR: just add this to your job config:
job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE