I am getting the results I want, but I would like to understand whether this is considered the best, or even a correct, way of mapping data codes to descriptors.
I have a dataset where many of the values are stored as numeric codes which represent some attribute - e.g.
Fruit_Type:
1 = Apple,
2 = Orange,
3 = Banana,
4 = Grape
In SAS, I would have used PROC FORMAT to map the numeric code to the descriptor. In SQL, I would typically use a CASE expression, which would let me either keep the original field name or assign a new one.
I am fairly new to Python and am curious what would be considered the best approach. What I have been doing, which seems to work fine, is to create the mapping as a dictionary and then create a new column using the .apply method. This works, but is it the right way to do it?
import pandas as pd

# Create sample dataframe
data = {'Fruit_Type': [1, 2, 2, 3, 1, 2, 4],
        'other_data': ['blah', 'blah', 'blah', 'blah', 'blah', 'blah',
                       'blah']}
df = pd.DataFrame(data)

# Create dictionary mapping each code to its descriptor
Fruit_Type_dictionary = {1: 'Apple',
                         2: 'Orange',
                         3: 'Banana',
                         4: 'Grape'}

df['rpt_Fruit_Type'] = df['Fruit_Type'].apply(lambda x: Fruit_Type_dictionary.get(x))
print(df)
which yields:
   Fruit_Type  other_data  rpt_Fruit_Type
0           1        blah           Apple
1           2        blah          Orange
2           2        blah          Orange
3           3        blah          Banana
4           1        blah           Apple
5           2        blah          Orange
6           4        blah           Grape
which pretty much gives me my desired results.
I would use the Series.map method to improve readability:
df['rpt_Fruit_Type'] = df['Fruit_Type'].map(Fruit_Type_dictionary)
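One thing worth knowing when you switch to map with a dictionary: any code that has no entry in the dictionary comes back as NaN. Here is a minimal sketch of that behaviour, reusing the dictionary from the question. The extra code 9, the 'Unknown' placeholder, and the rpt_Fruit_Type_filled column name are assumptions added purely for illustration, not part of the original data.

```python
import pandas as pd

# Same codes as in the question, plus 9 as a hypothetical unmapped code.
df = pd.DataFrame({'Fruit_Type': [1, 2, 2, 3, 1, 2, 4, 9]})

Fruit_Type_dictionary = {1: 'Apple', 2: 'Orange', 3: 'Banana', 4: 'Grape'}

# map() looks each value up in the dictionary; codes with no entry become NaN.
df['rpt_Fruit_Type'] = df['Fruit_Type'].map(Fruit_Type_dictionary)

# Optional: replace the NaN left by unmapped codes with a placeholder label.
# 'Unknown' is an arbitrary choice for this sketch.
df['rpt_Fruit_Type_filled'] = df['rpt_Fruit_Type'].fillna('Unknown')

print(df)
```

If you would rather keep the original field name, as you can with a SQL CASE expression, assign the result back to df['Fruit_Type'] instead of creating a new column.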