I have the following column in my store_data dataframe, with many missing values marked as '?':
>>>store_data['trestbps']
0 140
1 130
2 132
3 142
4 110
5 120
6 150
7 180
8 120
9 160
10 126
11 140
12 110
13 ?
I replaced all missing values with -999:
store_data.replace('?', -999, inplace=True)
>>>store_data['trestbps']
0 140
1 130
2 132
3 142
4 110
5 120
6 150
7 180
8 120
9 160
10 126
11 140
12 110
13 -999
Now I want to bin the values. I used this code, but every value in the output is NaN:
trestbps = store_data['trestbps']
trestbps_bins = [-999,120,140,200]
store_data['trestbps'] = pd.cut(trestbps,trestbps_bins)
>>>store_data['trestbps']
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
12 NaN
13 NaN
The binning works fine when there are no missing values. I want rows 0-12 to be binned and only row 13 to be replaced by -999. How can I achieve this?
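IIUC, you can do the following (df here stands for your store_data):
bins = [0, 120, 140, 200]  # set the bin edges
df.trestbps = pd.cut(df.trestbps, bins)  # do the cut; missing values come out as NaN
df.trestbps = df.trestbps.cat.add_categories(999)  # add 999 as a category
df.trestbps.fillna(999)  # fill the NaN rows with 999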
0 (120, 140]
1 (120, 140]
2 (120, 140]
3 (140, 200]
4 (0, 120]
5 (0, 120]
6 (140, 200]
7 (140, 200]
8 (0, 120]
9 (140, 200]
10 (120, 140]
11 (120, 140]
12 (0, 120]
13 999
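For reference, here is a minimal end-to-end sketch of the same idea. It assumes the column was read in as strings with '?' marking missing values (the toy data below is made up to mirror the question), and it uses pd.to_numeric with errors="coerce" as one possible way to turn '?' into NaN before binning. It fills with -999 rather than 999 to match what the question asked for.

```python
import pandas as pd

# Toy data mirroring the question (assumption): numbers stored as strings,
# with '?' marking a missing value.
store_data = pd.DataFrame({"trestbps": ["140", "130", "110", "180", "?"]})

# Convert to numeric first; '?' cannot be parsed and becomes NaN.
trestbps = pd.to_numeric(store_data["trestbps"], errors="coerce")

# Bin the numeric values; NaN inputs stay NaN in the result.
binned = pd.cut(trestbps, bins=[0, 120, 140, 200])

# A categorical only accepts known categories, so register the
# sentinel value as a category before filling the NaN rows with it.
binned = binned.cat.add_categories([-999]).fillna(-999)

store_data["trestbps"] = binned
print(store_data["trestbps"])
```

Note that fillna returns a new Series, so the result has to be assigned back (as above) if you want the change to stick in the dataframe.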