
Python Polars: How to get the row count of a LazyFrame?

The CSV file I have is 70 GB in size. I want to load it as a DataFrame and count the number of rows in lazy mode. What's the best way to do this?

As far as I can tell from the documentation, there is no equivalent of shape for a LazyFrame. I found this answer, which provides a solution not based on Polars, but I wonder whether this is possible in Polars as well.

roei shlezinger asked Nov 16 '25 03:11



1 Answer

For Polars 0.20.5+

To get the row count with Polars, first load the file into a LazyFrame:

import polars as pl
lzdf = pl.scan_csv("mybigfile.csv")

Then count the rows and collect the result:

lzdf.select(pl.len()).collect()

If you want a Python scalar rather than a one-row DataFrame, extract it with .item():

lzdf.select(pl.len()).collect().item()

For older versions (before 0.20.5)

The steps are the same, but use pl.count() instead of pl.len(). First, load the file into a LazyFrame:

import polars as pl
lzdf = pl.scan_csv("mybigfile.csv")

Then count the rows and collect the result:

lzdf.select(pl.count()).collect()

If you want a Python scalar rather than a one-row DataFrame, extract it with .item():

lzdf.select(pl.count()).collect().item()
Dean MacGregor answered Nov 18 '25 17:11



