Use Regex to filter Columns (by name) of a PySpark dataframe

Tags:

pyspark

I have a Spark dataframe with 3k-4k columns, and I'd like to drop the columns whose names meet certain variable criteria, e.g. where ColumnName LIKE 'foo'.

asked Oct 27 '25 21:10 by DespicableMe


1 Answer

To get the column names, use df.columns; drop() supports dropping many columns in one call. The code below combines the two and does what you need:

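# Predicate: True for column names that contain the substring 'foo'
condition = lambda col: 'foo' in col
# drop() accepts several column names, so unpack the matching ones into it
new_df = df.drop(*filter(condition, df.columns))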
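The snippet above does a plain substring test. Since the title asks about regex, here is a minimal sketch of the same idea using Python's re module. The helper name columns_matching and the sample column names are made up for illustration; the list stands in for df.columns so the example runs without a Spark session.

```python
import re

def columns_matching(columns, pattern):
    """Return the column names in which the regex `pattern` matches anywhere."""
    regex = re.compile(pattern)
    return [name for name in columns if regex.search(name)]

# Hypothetical stand-in for df.columns
columns = ["id", "foo_total", "bar", "my_foo", "FOO_flag"]

# Case-sensitive match anywhere in the name
print(columns_matching(columns, r"foo"))       # ['foo_total', 'my_foo']

# Case-insensitive, anchored at the start of the name
print(columns_matching(columns, r"(?i)^foo"))  # ['foo_total', 'FOO_flag']

# With a real DataFrame `df`, the result can be unpacked into drop():
# new_df = df.drop(*columns_matching(df.columns, r"foo"))
```

re.search matches anywhere in the name, which is closest to LIKE '%foo%'; anchor the pattern with ^ or $ if you need a prefix or suffix match instead.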
answered Oct 29 '25 19:10 by Mariusz


