I would like to remove the duplicate data only if three columns (name, price and new price) matching with the same data. But in an other python script.
So the data can insert in to the database, but with an other python script, I want to delete this duplicate data by a cron job.
So in this case:
cur.execute("INSERT INTO cars VALUES(8,'Hummer',41400, 49747)")
cur.execute("INSERT INTO cars VALUES(9,'Volkswagen',21600, 36456)")
are duplicates. Example script with inserted data:
import psycopg2
import sys
con = None
try:
con = psycopg2.connect(database='testdb', user='janbodnar')
cur = con.cursor()
cur.execute("CREATE TABLE cars(id INT PRIMARY KEY, name VARCHAR(20), price INT, new price INT)")
cur.execute("INSERT INTO cars VALUES(1,'Audi',52642, 98484)")
cur.execute("INSERT INTO cars VALUES(2,'Mercedes',57127, 874897)")
cur.execute("INSERT INTO cars VALUES(3,'Skoda',9000, 439788)")
cur.execute("INSERT INTO cars VALUES(4,'Volvo',29000, 743878)")
cur.execute("INSERT INTO cars VALUES(5,'Bentley',350000, 434684)")
cur.execute("INSERT INTO cars VALUES(6,'Citroen',21000, 43874)")
cur.execute("INSERT INTO cars VALUES(7,'Hummer',41400, 49747)")
cur.execute("INSERT INTO cars VALUES(8,'Hummer',41400, 49747)")
cur.execute("INSERT INTO cars VALUES(9,'Volkswagen',21600, 36456)")
cur.execute("INSERT INTO cars VALUES(10,'Volkswagen',21600, 36456)")
con.commit()
except psycopg2.DatabaseError, e:
if con:
con.rollback()
print 'Error %s' % e
sys.exit(1
finally:
if con:
con.close()
You can do this in one statement without additional round-trips to the server.
DELETE FROM cars
USING (
SELECT id, row_number() OVER (PARTITION BY name, price, new_price
ORDER BY id) AS rn
FROM cars
) x
WHERE cars.id = x.id
AND x.rn > 1;
Requires PostgreSQL 8.4 or later for the window function row_number().
Out of a set of dupes the smallest id survives.
Note that I changed "new price" to new_price.
Or use the EXISTS semi-join, that @wildplasser posted as comment to the same effect.
Or, to by special request of CTE-devotee @wildplasser, with a CTE instead of the subquery ... :)
WITH x AS (
SELECT id, row_number() OVER (PARTITION BY name, price, new_price
ORDER BY id) AS rn
FROM cars
)
DELETE FROM cars
USING x
WHERE cars.id = x.id
AND x.rn > 1;
Data modifying CTE requires Postgres 9.1 or later.
This form will perform about the same as the one with the subquery.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With