How to upsert data while using Snowpipe?

Context

I like using Snowpipe a lot but I have no way to apply my upsert logic when using it.

This is what my upsert logic looks like:

-- Stage the incoming data into a temporary copy of the target table
create temp table temp_table like target;

copy into temp_table from @snowflake_stage;

begin transaction;

-- Remove rows already present in the target, then insert all staged rows
delete from target using temp_table
where target.pk = temp_table.pk;

insert into target
select * from temp_table;

commit;

drop table temp_table;

However, with Snowpipe I am only allowed to define a single COPY command, so I am not able to execute a sequence of commands.

What I have tried to do

I thought about using Tasks & Streams, but Tasks do not seem to support transactions (i.e. more than a single query per task). I also thought about using MERGE, but then I have to explicitly define the columns I want to INSERT.

For example, I can't do something like this (an insert without defining what to insert):

merge into target using temp_table on target.pk = temp_table.pk
when not matched then insert;
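What I would have to write instead is something along these lines, with every column listed explicitly (the column names here are only illustrative):

merge into target using temp_table
  on target.pk = temp_table.pk
when not matched then
  insert (pk, col1, col2) values (temp_table.pk, temp_table.col1, temp_table.col2);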

Are there any other ways I could upsert my data while still using Snowpipe?

Asked by Reynold Rogers on May 02 '26


1 Answer

Refer to the usage notes in the CREATE PIPE documentation:

https://docs.snowflake.com/en/sql-reference/sql/create-pipe.html

The use case for Snowpipe is low-latency, frequent loading of small files into Snowflake. Its source is a supported file in an external or internal stage and its target is a Snowflake table. The idea is that the data needs only minimal transformation during load, which is what the COPY statement supports. In other words, you cannot run a MERGE INTO at ingestion time; that is not Snowpipe's use case.

Once the data has landed in a Snowflake table you are free to run Tasks on top of it at a frequency of your choosing. Google "Snowflake Data Pipelines" and it will highlight the Stream concept (Snowflake's CDC), which captures the new data arriving in the landed content. The awesome thing about Streams & Tasks is that you can have as many as you want on top of a single table, share or (recently added, in private preview) view. Also note the modes of Streams: standard (which captures inserts, updates and deletes) versus APPEND_ONLY. Then process the data however you want.
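To make that concrete, here is a rough sketch of the landing-table + Stream + Task pattern; all object names (my_pipe, my_stage, landing, landing_stream, upsert_target, my_wh, target and the columns) are placeholders, not anything taken from your setup:

-- 1. Snowpipe keeps copying new files into a landing table
create or replace pipe my_pipe auto_ingest = true as
  copy into landing from @my_stage file_format = (type = 'CSV');

-- 2. A Stream records the rows Snowpipe has inserted since it was last consumed
create or replace stream landing_stream on table landing;

-- 3. A scheduled Task consumes the Stream and merges into the real target table
--    (note that MERGE needs the insert columns spelled out explicitly)
create or replace task upsert_target
  warehouse = my_wh
  schedule = '5 minute'
when
  system$stream_has_data('LANDING_STREAM')
as
  merge into target t
  using landing_stream s
    on t.pk = s.pk
  when matched then update set t.col1 = s.col1, t.col2 = s.col2
  when not matched then insert (pk, col1, col2) values (s.pk, s.col1, s.col2);

alter task upsert_target resume;

Because the MERGE runs inside the Task's transaction, consuming the Stream and updating the target happen together, which replaces the delete + insert transaction you were building by hand.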

A few things to think about, though:

  • Could you consider external tables instead? Depending on the source, the performance might be even better, as Snowflake supports Parquet, CSV, JSON, Avro and ORC.
  • If external tables are used then you can put a Stream on top of them, or even a materialized view (a rough sketch follows after this list).
  • Nothing against Snowpipe: our docs have well-explained steps and guidelines on how to set this up, and we even have tutorials for deploying Snowpipe with an automatic notification service such as AWS SQS + SNS, or by calling the Snowpipe REST API.
  • Also note that MERGE INTO is generally used for UPDATEs and INSERTs (and locks the target table), and Snowflake's micro-partitions are immutable. That means that although you see an update, what happens underneath is that the existing micro-partition is committed to the Time Travel period and a new micro-partition becomes the "active" one. Depending on the table and the length of Time Travel you use for it, you could end up with more Time Travel and Fail-safe micro-partitions than "active" ones. Something to consider as you design this: https://docs.snowflake.com/en/user-guide/tables-storage-considerations.html#managing-costs-for-large-high-churn-tables
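As a rough sketch of the external-table route mentioned above (again, the stage, table and column names are placeholders):

-- External table over the files in the stage; Snowflake reads them in place
create or replace external table ext_landing
  with location = @my_stage
  file_format = (type = 'PARQUET')
  auto_refresh = true;

-- Streams on external tables must be INSERT_ONLY
create or replace stream ext_landing_stream
  on external table ext_landing insert_only = true;

-- Optionally, a materialized view for faster repeated reads of the same data
create or replace materialized view ext_landing_mv as
  select value:pk::number as pk, value:col1::string as col1
  from ext_landing;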
Answered by patrick_at_snowflake on May 06 '26