How do I upsert data into Ducklake?
If I have a simple table definition:
CREATE TABLE ducklakeexample.demo (
"Date" TIMESTAMP WITH TIME ZONE,
"Id" UUID,
"Title" TEXT,
"Quantity" INTEGER
);
Add a row into it:
INSERT INTO ducklakeexample.demo
("Date","Id","Title", "Quantity")
VALUES
('2025-07-01 13:44:58.11+00','f3c21234-8e2b-4e1d-b9d2-a11122334455','Some Name',150),
Then want to add a new row and update the Quantity of the existing one:
INSERT INTO ducklakeexample.demo
("Date","Id","Title", "Quantity")
VALUES
-- New dummy row
('2025-07-02 09:00:00+00', 'abcd1234-5678-90ab-cdef-112233445566', 'Another Title', 75),
-- Qty change for existing row
('2025-07-01 13:44:58.11+00','f3c21234-8e2b-4e1d-b9d2-a11122334455','Some Name',0);
This adds a new row, for 'Some Name', as Duck Lake does not support Primary Keys.
So I cannot use:
INSERT OR REPLACE INTODO UPDATEsuch as:
INSERT INTO tbl VALUES (1, 42);
INSERT INTO tbl VALUES (1, 52), (1, 62) ON CONFLICT DO UPDATE SET j = EXCLUDED.j;
If I know ahead of time that the ID exists, I can do a rudimentary update, which works.
UPDATE ducklakeexample.demo
SET "Quantity" = 10
WHERE "Id" = 'f3c21234-8e2b-4e1d-b9d2-a11122334455'::UUID;
So is the expectation that the application has to handle this check ahead of time, then produce two separate queries, one to bulk insert, and other to bulk update?
Otherwise I have an accountancy/ledger style, with duplicate entries and I have to check for the max row or something to get the latest update. Which makes no sense, as this is what Time Travel is for?
DuckDB currently (version 1.3.1) does not have support for MERGE statements to perform upserts in ducklake. This is planned to be added in a future update according to their roadmap. What you can do right now is to perform a DELETE on all rows in the table that has the same ID as the to-be-inserted records and then INSERT the updated records to the table.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With