I've been getting a handful of errors on Postgresql, that seem to be tied to this race condition.
I have a process/daemon written in Twisted Python. The easiest way to describe it is as a Web Crawler - it pulls a page, parses the links, and logs what it's seen. Because of HTTP blocking, Twisted runs multiple "concurrent" processes deferred to threads.
Here's the race condition...
When I encounter a url shortener , this logic happens:
result= """SELECT * FROM shortened_link WHERE ( url_shortened = %(url)s ) LIMIT 1;"""
if result:
pass
else:
result= """INSERT INTO shortened_link ( url_shortened ..."
A surprising number or psycopg2.IntegrityError's are raised, because the unique index on url_shortened gets violated.
The select/insert does actually run that close together. From what I can tell, it looks like 2 shortened links get queued next to one another.
Process A: Select, returns Null
Process B: Select, returns Null
Process A: Insert , success
Process B: Insert , integrity error
Can anyone suggest any tips/tricks to handle this ? I'd like to avoid explicit locking, because I know that'll open up a whole other set of problems.
Do it all in a single command:
result= """
INSERT INTO shortened_link ( url_shortened ...
SELECT %(url)s
where not exists (
select 1
from shortened_link
WHERE url_shortened = %(url)s
);"""
It will only insert if that link does not exist.
There's really not a solution that avoids the need to be able to handle the possibility of a unique constraint violation error. If your framework can't do it then I'd wrap the SQL in a PL/pgSQL function or procedure that can.
Given that you can handle the error you might as well not test for the existence of the unique value and just attempt the insert, letting any error be handled by the EXCEPTION clause.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With