How to remove rows from a categorical variable whose value counts do not satisfy a condition?

Question

I am new to ML and Data Science (I recently graduated with a Master's in Business Analytics) and am learning as much as I can on my own while looking for positions in Data Science / Business Analytics.

I am working on a practice dataset with the goal of predicting which customers are likely to miss their scheduled appointment. One of the columns in my dataset is "Neighbourhood", which contains the names of over 30 different neighbourhoods. My dataset has 10,000 observations, and some neighbourhood names appear fewer than 50 times. I think neighbourhoods that appear fewer than 50 times are too rare for machine learning models to analyze properly. Therefore, I want to remove the rows whose "Neighbourhood" value appears fewer than 50 times in that column.

I have been trying to write code for this for several hours, but I am struggling to get it right. So far, I have gotten to the version below:

my_df = my_df.drop(my_df["Neighbourhood"].value_counts() < 50, axis = 0)

I have also tried other versions of code to get rid of the rows in that categorical column, but I keep getting a similar error:

KeyError: '[False False ...  True  True] not found in axis'

I appreciate your help in advance, and thank you for sharing your knowledge and insights with me!

katardin · Accepted Answer

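Try the code below. drop expects index labels, but value_counts() < 50 is a boolean Series indexed by neighbourhood name, which is why your attempt raises a KeyError. Instead, use .loc with a boolean mask to keep only the rows whose neighbourhood appears at least 50 times:

counts = my_df['Neighbourhood'].value_counts()
new_df = my_df.loc[my_df['Neighbourhood'].isin(counts.index[counts >= 50])]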
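
For anyone who wants to check the behaviour on something small, here is a minimal sketch. The toy DataFrame, the "NoShow" column and the MIN_COUNT name are made up for illustration; only the "Neighbourhood" column name and the threshold of 50 come from the question. It also shows a groupby/transform variant, which is an alternative I am suggesting, not something the original code used.

```python
import pandas as pd

# Hypothetical toy data standing in for my_df (names and counts are invented).
my_df = pd.DataFrame({
    "Neighbourhood": ["A"] * 60 + ["B"] * 50 + ["C"] * 10,
    "NoShow": [0, 1] * 60,
})

MIN_COUNT = 50  # threshold from the question; adjust as needed

# Option 1: find the frequent labels, then keep rows whose label is among them.
counts = my_df["Neighbourhood"].value_counts()
frequent = counts.index[counts >= MIN_COUNT]
kept_isin = my_df.loc[my_df["Neighbourhood"].isin(frequent)]

# Option 2: broadcast each group's size back onto its rows and filter on that.
group_size = my_df.groupby("Neighbourhood")["Neighbourhood"].transform("size")
kept_transform = my_df.loc[group_size >= MIN_COUNT]

# Both options should keep the same rows: "A" and "B" stay, "C" is dropped.
print(kept_isin["Neighbourhood"].value_counts())
```

Both variants return a filtered copy and leave my_df untouched. If the column has the pandas category dtype, the dropped names may still be listed as unused categories; calling .cat.remove_unused_categories() on the filtered column should clear them.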

Tags: python, pandas, dataframe, categorical-data, data-cleaning

Arsik36