(postgresql) I was trying to COPY csv data into a table but I was getting duplicate key violation errors, and there's no way to tell COPY to ignore those, so following internet wisdom I tried adding this rule:
CREATE OR REPLACE RULE ignore_duplicate_inserts AS
   ON INSERT TO mytable
   WHERE (EXISTS ( SELECT mytable.id
           FROM mytable
          WHERE mytable.id = new.id)) DO NOTHING;
to circumvent the problem, but I still get those errors - any ideas why ?
As a solution for your present problem, either delete the conflicting manually inserted rows or reset the sequence to a value higher than any existing id in the table.
The “duplicate key violates unique constraint” error notifies the caller that a retry is needed. This seems like an intuitive approach, but relying on this optimistic insert can quickly have a negative performance impact on your database.
You must have INSERT privilege on a table in order to insert into it. If ON CONFLICT DO UPDATE is present, UPDATE privilege on the table is also required. If a column list is specified, you only need INSERT privilege on the listed columns.
Rules by default add things to the current action:
Roughly speaking, a rule causes additional commands to be executed when a given command on a given table is executed.
But an INSTEAD rule allows you to replace the action:
Alternatively, an INSTEAD rule can replace a given command by another, or cause a command not to be executed at all.
So, I think you want to specify INSTEAD:
CREATE OR REPLACE RULE ignore_duplicate_inserts AS
   ON INSERT TO mytable
   WHERE (EXISTS ( SELECT mytable.id
           FROM mytable
          WHERE mytable.id = new.id)) DO INSTEAD NOTHING;
Without the INSTEAD, your rule is essentially saying "do the INSERT and then do nothing" when you want to say "instead of the INSERT, do nothing" and, AFAIK, the DO INSTEAD NOTHING will do that.
I'm not an expert on PostgreSQL rules but I think adding the "INSTEAD" should work.
UPDATE: Thanks to araqnid we know that:
COPY FROM will invoke any triggers and check constraints on the destination table. However, it will not invoke rules
So a rule isn't going to work in this situation. However, triggers are fired during COPY FROM so you could write a BEFORE INSERT trigger that would return NULL when it detected duplicate rows:
It can return NULL to skip the operation for the current row. This instructs the executor to not perform the row-level operation that invoked the trigger (the insertion or modification of a particular table row).
That said, I think you'd be better off with araqnid's "load it all into a temporary table, clean it up, and copy it to the final destination" would be a more sensible solution for a bulk loading operation like you have.
COPY FROM will not invoke rules (http://www.postgresql.org/docs/9.0/interactive/sql-copy.html#AEN58860)
My approach would be to load the CSV data into a temp table, then use an INSERT...SELECT statement to copy the data into the target table where it doesn't already exist. (If there are duplicates in the CSV data itself, remove those from the temp table first). Something like:
BEGIN;
CREATE TEMP TABLE stage_data(key_column, data_columns...) ON COMMIT DROP;
\copy stage_data from data.csv with csv header
-- prevent any other updates while we are merging input (omit this if you don't need it)
LOCK target_data IN SHARE ROW EXCLUSIVE MODE;
-- insert into target table
INSERT INTO target_data(key_column, data_columns...)
   SELECT key_column, data_columns...
   FROM stage_data
   WHERE NOT EXISTS (SELECT 1 FROM target_data
                     WHERE target_data.key_column = stage_data.key_column)
END;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With