I have a pandas DataFrame of shape ~[200K, 40]. One of its many categorical columns has over 1000 unique values. I can view the count of each unique value in that column with:
df['column_name'].value_counts()
How do I now club values with:
You can extract the values you want to mask from the index of value_counts and then map them to "miscellaneous" using replace:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0, 10, (2000, 2)), columns=['A', 'B'])
frequencies = df['A'].value_counts()
condition = frequencies<200 # you can define it however you want
mask_obs = frequencies[condition].index
mask_dict = dict.fromkeys(mask_obs, 'miscellaneous')
df['A'] = df['A'].replace(mask_dict) # or you could make a copy not to modify original data
Now, using value_counts will group all the values below your threshold as miscellaneous:
df['A'].value_counts()
Out[18]:
miscellaneous 947
3 226
1 221
0 204
7 201
2 201
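As an alternative to building a replacement dictionary, the same grouping can be sketched with isin and where. This is only a minimal sketch: the helper name lump_rare, its parameters, and the toy data are hypothetical and not part of the original answer.

```python
import pandas as pd

def lump_rare(series, min_count, other="miscellaneous"):
    """Return a copy of `series` where values seen fewer than
    `min_count` times are replaced by `other` (hypothetical helper)."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    # Keep values that are not rare; substitute `other` everywhere else.
    return series.where(~series.isin(rare), other)

# Toy data: "c" and "d" each occur once, so both fall below the threshold.
s = pd.Series(["a"] * 5 + ["b"] * 3 + ["c"] + ["d"])
lumped = lump_rare(s, min_count=3)
print(lumped.value_counts())
```

Because where returns a new Series, the original column is left unmodified unless you assign the result back, for example df['A'] = lump_rare(df['A'], 200).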