Drop rows in PySpark

Question

How can I drop rows in PySpark based on their row number (row index)?

I am new to PySpark (and coding). I have tried coding something but it is not working.

Manrique · Accepted Answer

You can't drop specific rows directly, but you can keep only the ones you want by using filter or its alias, where.

Imagine you want "to drop" the rows where the age of a person is lower than 3. You can just keep the opposite rows, like this:

df.filter(df.age >= 3)
