Replacing a value in all cells of a DataFrame in Python

Question

I have an example df:

import pandas as pd

df = pd.DataFrame({'A': ['100,100', '200,200'],
                   'B': ['200,100,100', '100']})

        A          B
0   100,100    200,100,100
1   200,200    100

and I want to replace the commas ',' with nothing (basically, remove them). You can probably guess the real-world application, since a lot of data is written with thousands separators. Feel free to suggest a better method.

I read the documentation for df.replace() and tried several versions of the code. None of them raises an error, but none of them modifies my DataFrame.

df = df.replace(',','')
df = df.replace({',': ''})
df = df.replace([','],'')
df = df.replace([','],[''])

I can get it working by specifying the column names and using the ".str.replace()" method for Series, but imagine having 20 columns. I can also get it working by specifying columns in the df.replace() method, but there must be a more convenient way for such an easy task. I could write a custom function, but pandas is such an amazing library that I must be missing something.

This works:

df['A'] = df['A'].str.replace(',','')

Thank you!

Ch3steR · Accepted Answer

df.replace has a regex parameter; set it to True to match substrings inside each cell.

By default, regex is False. When it is False, only cells whose entire value equals to_replace are replaced.

From Pandas docs:

str: string exactly matching to_replace will be replaced with the value.

df.replace(',', '', regex=True)

        A          B
0  100100  200100100
1  200200        100
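To make the difference concrete, here is a minimal sketch using the question's DataFrame. The variable names unchanged, cleaned and numeric are only illustrative, and the final astype step assumes every cell is a valid integer once the commas are gone.

```python
import pandas as pd

df = pd.DataFrame({'A': ['100,100', '200,200'],
                   'B': ['200,100,100', '100']})

# regex=False (the default): a cell is replaced only if its whole value is ','.
# No cell matches, so the result is identical to the input.
unchanged = df.replace(',', '')

# regex=True: ',' is treated as a pattern and matched anywhere inside each string.
cleaned = df.replace(',', '', regex=True)

# The cleaned cells are still strings; convert them if numbers are needed.
# Assumption: every cell is a plain integer after the commas are removed.
numeric = cleaned.astype('int64')

print(cleaned)
print(numeric.dtypes)
```

If some cells might not be numeric after cleaning, the astype call will raise, so that last step is worth treating as optional.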

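In pd.Series.str.replace, the regex parameter defaulted to True before pandas 2.0; since pandas 2.0 the default is False. A comma has no special meaning in a regular expression, so the asker's df['A'].str.replace(',','') works either way.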

From docs:

Equivalent to str.replace() or re.sub(), depending on the regex value.

Determines if assumes the passed-in pattern is a regular expression:

If True, assumes the passed-in pattern is a regular expression.
If False, treats the pattern as a literal string
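Whatever the default is in your pandas version, passing regex explicitly makes the intent clear. A comma is harmless either way, but a separator such as '.' is a regex metacharacter, so it should be matched literally or escaped. The sketch below uses a hypothetical Series with period-separated thousands, which is not part of the original question.

```python
import re
import pandas as pd

# Hypothetical data: thousands separated by periods instead of commas.
s = pd.Series(['1.000', '2.500.000'])

# Treat the pattern as a literal string.
literal = s.str.replace('.', '', regex=False)

# Or keep regex matching but escape the metacharacter first.
escaped = s.str.replace(re.escape('.'), '', regex=True)

# The same escaping applies to DataFrame.replace with regex=True.
frame = s.to_frame('A').replace(re.escape('.'), '', regex=True)

print(literal.tolist())
print(escaped.tolist())
```

Both approaches give the same result here; re.escape is mainly useful when the separator comes from a variable and may or may not be a special character.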

Cameron Riddell · Answer

Though your immediate question has probably been answered, I wanted to mention that if you are reading this data from a CSV file, you can pass thousands="," to pd.read_csv. Commas inside numeric fields are then treated as thousands separators, and the columns are parsed as integers with the commas removed:

import io
import pandas as pd

csv_file = io.StringIO("""
A,B,C
"1,000","2,000","3,000"
1,2,3
"50,000",50,5
""")

df = pd.read_csv(csv_file, thousands=",")

print(df)
       A     B     C
0   1000  2000  3000
1      1     2     3
2  50000    50     5

print(df.dtypes)
A    int64
B    int64
C    int64
dtype: object
