How can I encode the string values in the columns of a data table as integers? For example, I have two feature variables: color (possible values R, G and B) and skills (possible values C++, Java, SQL and Python). The data table has two columns:
Color -> R, G, B, B, G, R, B, G, G, R, G
Skills -> Java, C++, SQL, Java, Python, Python, SQL, C++, Java, SQL, Java
I want to know which sklearn function or method will transform the two columns above into the following, with R=0, G=1, B=2 and C++=0, Java=1, SQL=2, Python=3:
Color: 0, 1, 2, 2, 1, 0, 2, 1, 1, 0, 1
Skills: 1, 0, 2, 1, 3, 3, 2, 0, 1, 2, 1
Please let me know how to do this.
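Use scikit-learn's LabelEncoder. Note that it numbers the classes in sorted (alphabetical) order, so it yields B=0, G=1, R=2 for colors and C++=0, Java=1, Python=2, SQL=3 for skills rather than the exact mapping in the question.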
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    'colors': ["R", "G", "B", "B", "G", "R", "B", "G", "G", "R", "G"],
    'skills': ["Java", "C++", "SQL", "Java", "Python", "Python", "SQL", "C++", "Java", "SQL", "Java"]
})

def encode_df(dataframe):
    # Replace each column's strings with integer codes (modifies dataframe in place)
    for column in dataframe.columns:
        le = LabelEncoder()  # fresh encoder per column; le.classes_ holds that column's sorted labels
        dataframe[column] = le.fit_transform(dataframe[column])
    return dataframe

# encode the dataframe and show the result
print(encode_df(df))
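Because LabelEncoder always sorts the classes, it cannot reproduce the specific order asked for in the question. A minimal sketch, assuming scikit-learn 0.20 or later (where OrdinalEncoder is available), uses OrdinalEncoder with an explicit categories list per column: each value's position in its list becomes its integer code. The names encoder and df_encoded are illustrative only.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({
    'colors': ["R", "G", "B", "B", "G", "R", "B", "G", "G", "R", "G"],
    'skills': ["Java", "C++", "SQL", "Java", "Python", "Python", "SQL", "C++", "Java", "SQL", "Java"]
})

# One category list per column, in column order; list position = assigned code
encoder = OrdinalEncoder(
    categories=[["R", "G", "B"], ["C++", "Java", "SQL", "Python"]]
)

# OrdinalEncoder returns floats by default, so cast the codes to int
encoded = encoder.fit_transform(df[["colors", "skills"]]).astype(int)
df_encoded = pd.DataFrame(encoded, columns=["colors", "skills"], index=df.index)

print(df_encoded["colors"].tolist())  # [0, 1, 2, 2, 1, 0, 2, 1, 1, 0, 1]
print(df_encoded["skills"].tolist())  # [1, 0, 2, 1, 3, 3, 2, 0, 1, 2, 1]
```

OrdinalEncoder is also the encoder scikit-learn intends for input features (a 2D X), while LabelEncoder is meant for a 1D target y. With the default handle_unknown="error", fitting raises an error if a column contains a value missing from its category list, and encoder.inverse_transform(encoded) maps the codes back to the original strings.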