
Drop rows in Pyspark

Tags:

pyspark

How can I drop rows in PySpark based on the row number/row index?

I am new to PySpark (and coding). I have tried coding something, but it is not working.

Shravan K asked Nov 01 '25 13:11


1 Answer

You can't drop specific rows directly, but you can keep only the rows you want by using filter or its alias, where.

Imagine you want "to drop" the rows where the age of a person is lower than 3. You can just keep the opposite rows, like this:

# filter returns a new DataFrame; rows with age < 3 (or a null age) are left out
df = df.filter(df.age >= 3)
Manrique answered Nov 04 '25 20:11