
Label encoding without NaN values

I would like to encode categorical variables without encoding the missing values. So far I have not found a working solution. Here is my code:


import numpy as np
import pandas as pd
from sklearn import preprocessing

# Define the DataFrame:
df = pd.DataFrame({'A': ['X', np.nan, 'Z'], 'B': ['DB', 'AB', 'CA'], 'C': ['KH', 1, np.nan]})
df:

     A   B    C
0    X  DB   KH
1  NaN  AB    1
2    Z  CA  NaN

# Encode only column A:
Le = preprocessing.LabelEncoder()
target = Le.fit_transform(df['A'].astype(str))

# But this method also encodes the NaN values.

# I then tried another approach, but it does not work:

Le = preprocessing.LabelEncoder()

# Select the non-null values of A and try label encoding again:

Anotnull = df.loc[df['A'] != np.nan]
target = Le.fit_transform(Anotnull.astype(str))

The goal is to label-encode the column while leaving the NaN values untouched.

Ib D asked Oct 16 '25 04:10


1 Answer

Strictly speaking, this is not label encoding "without touching the NaNs": the missing values are encoded along with everything else and then restored afterwards. The result is still a label-encoded data frame with the NaNs in their original places.

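import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder


df_raw = pd.DataFrame({"feature1": ["a", "b", "c", np.nan, "e"],
                       "feature2": ["h", "i", np.nan, "k", "l"]})

# 1st possibility: encode every column as strings with LabelEncoder,
# then restore NaN wherever the original value was missing
df_temp = df_raw.astype("str").apply(LabelEncoder().fit_transform)
df_final = df_temp.where(~df_raw.isna(), df_raw)

# 2nd possibility: use pandas category codes (missing values get code -1),
# then restore NaN in the same way
df_temp = df_raw.astype("category").apply(lambda x: x.cat.codes)
df_final = df_temp.where(~df_raw.isna(), df_raw)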
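
If you would rather keep the missing values out of the encoder entirely, one option is to fit only on the non-null entries of each column and write the codes back through a boolean mask. The sketch below is a variation on the approaches above, not a built-in scikit-learn feature; the helper name `encode_non_missing` is made up for illustration, and it reuses the data frame from the question. The result is a float column, since integer columns cannot hold NaN.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def encode_non_missing(series):
    """Label-encode the non-missing entries of a Series; leave NaN in place.

    Hypothetical helper, not part of pandas or scikit-learn.
    """
    mask = series.notna()
    # Start from an all-NaN float column so missing positions stay NaN.
    encoded = pd.Series(np.nan, index=series.index, dtype="float64")
    if mask.any():
        encoder = LabelEncoder()
        # Fit only on observed values; cast to str so mixed types can be sorted.
        encoded[mask] = encoder.fit_transform(series[mask].astype(str))
    return encoded


df = pd.DataFrame({"A": ["X", np.nan, "Z"],
                   "B": ["DB", "AB", "CA"],
                   "C": ["KH", 1, np.nan]})

# Apply the helper column by column.
df_encoded = df.apply(encode_non_missing)
```

Because NaN never reaches the encoder, it does not take up a label of its own, so the codes in each column run from 0 to the number of distinct observed values minus one.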
Scriddie answered Oct 17 '25 18:10


