
Can I filter a parquet table?

Tags:

python

parquet

I am just starting to work with Parquet files, since some of my data is available in that format. I haven't used the format before, so here's my question.

I open my parquet file like this:

import pyarrow.parquet as pq

table1 = pq.read_table('mydatafile.parquet')

The file has 10 columns. Is it possible to filter out, directly from this table, all rows where e.g. column3 has the value 1?

I mean, I could just do:

df = table1.to_pandas()
df = df[df["column3"] != 1] 

But can this be done natively, without converting to a pandas DataFrame first?

Denver Dang asked Nov 01 '25 20:11


1 Answer

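You can pass the filters argument to pq.read_table, using the predicate syntax from the documentation, so the filter is applied when the file is read: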

import pyarrow.parquet as pq

table1 = pq.read_table('mydatafile.parquet', filters=[('column3', '!=', 1)])

Source:

Using predicates to filter rows from pyarrow.parquet.ParquetDataset

Angelo Canepa answered Nov 03 '25 09:11


