I was working on a problem set where a Pandas DataFrame has a lot of columns, and many of them have trailing spaces. Is there a better way to remove these spaces than building a dynamic string for each column (passing in the column name as a variable and appending strip() to it) and then executing it?
Without an example it is not fully clear what you want to accomplish, but maybe the following will help:
import pandas as pd
df = pd.DataFrame({'A ': [1, 2], 'B ': [4, 5], 'C': [8,9]})
The column headers do have trailing whitespace:
df.columns
Index([u'A ', u'B ', u'C'], dtype='object')
Now you can use map and strip to get rid of them:
df.columns = df.columns.map(lambda x: x.strip())
or alternatively
df.columns = df.columns.map(str.strip)
or, more simply (the preferred option, since it uses the pandas string accessor directly):
df.columns = df.columns.str.strip()
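A small self-contained sketch of the header case, reusing the toy frame from above. The second half is an extra illustration, not from the original answer: it assumes you only want to drop trailing spaces and uses DataFrame.rename with str.rstrip, which returns a new frame instead of assigning to df.columns.

```python
import pandas as pd

# Same toy frame as above: two labels carry a trailing space
df = pd.DataFrame({'A ': [1, 2], 'B ': [4, 5], 'C': [8, 9]})

# Index.str.strip() removes leading and trailing whitespace from every label
df.columns = df.columns.str.strip()
print(list(df.columns))  # ['A', 'B', 'C']

# rename() accepts any callable for columns=; rstrip touches only trailing spaces
df2 = pd.DataFrame({'A ': [1, 2], ' B ': [4, 5]}).rename(columns=str.rstrip)
print(list(df2.columns))  # ['A', ' B']
```

Use strip() when labels may also have leading spaces, and rstrip() when leading spaces are meaningful and should be kept.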
If you now call df.columns again, it yields:
Index([u'A', u'B', u'C'], dtype='object')
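If it is the values rather than the headers that need stripping, you can use applymap (in pandas 2.1 and later, applymap is deprecated in favor of the equivalent DataFrame.map):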
df = pd.DataFrame({'A': ['1', '2 '], 'B': ['4 ', '5 '], 'C': ['8 ','9']})
A B C
0 1 4 8
1 2 5 9
Then the following removes the trailing whitespace:
df.applymap(lambda x: x.strip())
or, alternatively (the better option, since it skips the extra lambda):
df.applymap(str.strip)
A B C
0 1 4 8
1 2 5 9
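As a column-wise alternative to the element-wise applymap call, here is a minimal sketch that applies the vectorized Series.str.strip() to each column via DataFrame.apply. It assumes, as the answer does, that every column holds only strings; the name stripped is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['1', '2 '], 'B': ['4 ', '5 '], 'C': ['8 ', '9']})

# apply() passes each column to the lambda as a Series; .str.strip()
# then strips every string in that column in one vectorized call
stripped = df.apply(lambda col: col.str.strip())

print(stripped.to_dict(orient='list'))
# {'A': ['1', '2'], 'B': ['4', '5'], 'C': ['8', '9']}
```

Like applymap, this returns a new DataFrame and leaves df itself unchanged, so assign the result back if you want to keep it.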
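Note: this assumes that your columns contain only strings. Both versions fail on non-string values such as integers or NaN: the lambda raises an AttributeError and str.strip raises a TypeError.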
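If the frame mixes strings with numbers or missing values, one option is to strip only the cells that are actually strings and pass everything else through. This is a sketch under that assumption; the helper strip_if_str and the sample columns are hypothetical, chosen only to illustrate the idea.

```python
import pandas as pd

# Hypothetical frame mixing strings, integers and a mixed-type column
df = pd.DataFrame({'name': ['ann ', ' bob'], 'age': [30, 41], 'mixed': ['x ', 7]})

def strip_if_str(value):
    # Only str has .strip(); ints, floats, NaN, None etc. are returned unchanged
    return value.strip() if isinstance(value, str) else value

# Series.map applies the helper element by element within each column
df = df.apply(lambda col: col.map(strip_if_str))

print(df['name'].tolist())   # ['ann', 'bob']
print(df['mixed'].tolist())  # ['x', 7]
```

This runs a Python function per cell, so it is slower than the vectorized .str methods; on pandas 2.1 or later, df.map(strip_if_str) expresses the same element-wise operation in one call.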