How to remove duplicates in combination of three columns in python with postgresql

Question

I would like to remove the duplicate data only if three columns (name, price and new price) matching with the same data. But in an other python script.

So the data can insert in to the database, but with an other python script, I want to delete this duplicate data by a cron job.

So in this case:

cur.execute("INSERT INTO cars VALUES(8,'Hummer',41400, 49747)")

cur.execute("INSERT INTO cars VALUES(9,'Volkswagen',21600, 36456)")

are duplicates. Example script with inserted data:

import psycopg2
import sys

con = None

try:
    con = psycopg2.connect(database='testdb', user='janbodnar')    
    cur = con.cursor()

    cur.execute("CREATE TABLE cars(id INT PRIMARY KEY, name VARCHAR(20), price INT, new price INT)")
    cur.execute("INSERT INTO cars VALUES(1,'Audi',52642, 98484)")
    cur.execute("INSERT INTO cars VALUES(2,'Mercedes',57127, 874897)")
    cur.execute("INSERT INTO cars VALUES(3,'Skoda',9000, 439788)")
    cur.execute("INSERT INTO cars VALUES(4,'Volvo',29000, 743878)")
    cur.execute("INSERT INTO cars VALUES(5,'Bentley',350000, 434684)")
    cur.execute("INSERT INTO cars VALUES(6,'Citroen',21000, 43874)")
    cur.execute("INSERT INTO cars VALUES(7,'Hummer',41400, 49747)")
    cur.execute("INSERT INTO cars VALUES(8,'Hummer',41400, 49747)")
    cur.execute("INSERT INTO cars VALUES(9,'Volkswagen',21600, 36456)")
    cur.execute("INSERT INTO cars VALUES(10,'Volkswagen',21600, 36456)")

    con.commit()

except psycopg2.DatabaseError, e:
    if con:
        con.rollback()

    print 'Error %s' % e    
    sys.exit(1

finally:    
    if con:
        con.close()

Erwin Brandstetter · Accepted Answer

You can do this in one statement without additional round-trips to the server.

DELETE FROM cars
USING (
    SELECT id, row_number() OVER (PARTITION BY name, price, new_price
                                  ORDER BY id) AS rn
    FROM   cars
    ) x
WHERE cars.id = x.id
AND   x.rn > 1;

Requires PostgreSQL 8.4 or later for the window function row_number().
Out of a set of dupes the smallest id survives.
Note that I changed "new price" to new_price.

Or use the EXISTS semi-join, that @wildplasser posted as comment to the same effect.

Or, to by special request of CTE-devotee @wildplasser, with a CTE instead of the subquery ... :)

WITH x AS (
    SELECT id, row_number() OVER (PARTITION BY name, price, new_price
                                  ORDER BY id) AS rn
    FROM   cars
    )
DELETE FROM cars
USING  x
WHERE  cars.id = x.id
AND    x.rn > 1;

Data modifying CTE requires Postgres 9.1 or later.
This form will perform about the same as the one with the subquery.

How to remove duplicates in combination of three columns in python with postgresql

Tags:

python

database

postgresql

patrick

1 Answers

Erwin Brandstetter

Recent Activity

Donate For Us

How to remove duplicates in combination of three columns in python with postgresql

Tags:

python

database

postgresql

patrick

1 Answers

Erwin Brandstetter

Related questions

Recent Activity

Donate For Us