I need to find special characters in an entire dataframe.
In the dataframe below, some columns contain special characters. How can I find which columns contain special characters?
I want to display the text for each column if it contains special characters.
You can set up an alphabet of valid characters, for example
import string
alphabet = string.ascii_letters+string.punctuation
Which is
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
And just use
df.col.str.strip(alphabet).astype(bool).any()
For example,
df = pd.DataFrame({'col1':['abc', 'hello?'], 'col2': ['ÃÉG', 'Ç']})
     col1 col2
0     abc  ÃÉG
1  hello?    Ç
Then, with the above alphabet,
df.col1.str.strip(alphabet).astype(bool).any()
False
df.col2.str.strip(alphabet).astype(bool).any()
True
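To report which columns contain special characters (as asked in the question), you can run the same check over every column. A minimal sketch, assuming all columns hold strings and using the alphabet defined above:
import string
import pandas as pd

alphabet = string.ascii_letters + string.punctuation
df = pd.DataFrame({'col1': ['abc', 'hello?'], 'col2': ['ÃÉG', 'Ç']})

# Print a message for each column that contains at least one character
# outside the allowed alphabet
for col in df.columns:
    if df[col].str.strip(alphabet).astype(bool).any():
        print(f'{col} contains special characters')
For the example dataframe this prints only "col2 contains special characters".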
The notion of special characters can be very tricky, because it depends on your interpretation. For example, you might or might not consider # to be a special character. Also, some languages (such as Portuguese) have characters like ã and é that others (such as English) do not.
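If your interpretation differs, you can simply adjust the alphabet. As an illustrative sketch (the extra characters below are only an assumption about what you might want to allow):
import string

# Treat digits, whitespace and a few accented letters as valid as well
alphabet = string.ascii_letters + string.punctuation
alphabet += string.digits + string.whitespace
alphabet += 'ãéÃÉçÇ'
With this extended alphabet, df.col2.str.strip(alphabet).astype(bool).any() from the example above returns False, since Ã, É and Ç are now considered valid.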
To remove unwanted characters from dataframe columns, you can use a regex:
import re

def strip_character(dataCol):
    # Keep only the characters listed inside the class; everything else is removed.
    # Note that '+-=' forms a character range, which also keeps the digits 0-9
    # (escape the hyphen if that is not intended).
    r = re.compile(r'[^a-zA-Z !@#$%&*_+-=|\:";<>,./()[\]{}\']')
    return r.sub('', dataCol)

# 'dataCol' is the source column and 'resultCol' the cleaned column
df['resultCol'] = df['dataCol'].apply(strip_character)
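As a quick illustration, applied to the example dataframe from the first answer (the output column name 'col2_clean' is just an assumption):
df['col2_clean'] = df['col2'].apply(strip_character)
# 'ÃÉG' becomes 'G' and 'Ç' becomes an empty string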