Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python postgres can I fetchall() 1 million rows?

I am using psycopg2 module in python to read from postgres database, I need to some operation on all rows in a column, that has more than 1 million rows.

I would like to know would cur.fetchall() fail or cause my server to go down? (since my RAM might not be that big to hold all that data)

q="SELECT names from myTable;"
cur.execute(q)
rows=cur.fetchall()
for row in rows:
    doSomething(row)

what is the smarter way to do this?

like image 472
Medya Gh Avatar asked Aug 31 '25 18:08

Medya Gh


2 Answers

The solution Burhan pointed out reduces the memory usage for large datasets by only fetching single rows:

row = cursor.fetchone()

However, I noticed a significant slowdown in fetching rows one-by-one. I access an external database over an internet connection, that might be a reason for it.

Having a server side cursor and fetching bunches of rows proved to be the most performant solution. You can change the sql statements (as in alecxe answers) but there is also pure python approach using the feature provided by psycopg2:

cursor = conn.cursor('name_of_the_new_server_side_cursor')
cursor.execute(""" SELECT * FROM table LIMIT 1000000 """)

while True:
    rows = cursor.fetchmany(5000)
    if not rows:
        break

    for row in rows:
        # do something with row
        pass

you find more about server side cursors in the psycopg2 wiki

like image 123
linqu Avatar answered Sep 02 '25 08:09

linqu


Consider using server side cursor:

When a database query is executed, the Psycopg cursor usually fetches all the records returned by the backend, transferring them to the client process. If the query returned an huge amount of data, a proportionally large amount of memory will be allocated by the client.

If the dataset is too large to be practically handled on the client side, it is possible to create a server side cursor. Using this kind of cursor it is possible to transfer to the client only a controlled amount of data, so that a large dataset can be examined without keeping it entirely in memory.

Here's an example:

cursor.execute("DECLARE super_cursor BINARY CURSOR FOR SELECT names FROM myTable")
while True:
    cursor.execute("FETCH 1000 FROM super_cursor")
    rows = cursor.fetchall()

    if not rows:
        break

    for row in rows:
        doSomething(row)
like image 29
alecxe Avatar answered Sep 02 '25 08:09

alecxe