I have a data frame with a number of columns but with three I am interested in. These are name, year and goals_scored. None of these columns are unique in that for example I have lines like the following:
Name Year Goals_scored
John Smith 2014 3
John Smith 2014 2
John Smith 2014 0
John Smith 2015 1
John Smith 2015 1
John Smith 2015 2
John Smith 2015 1
John Smith 2015 0
John Smith 2016 1
John Smith 2016 0
What I am trying to do is create a new data frame where I have 4 columns. One for name, then one for each of the years 2014, 2015 and 2016. The last three columns being a sum of the goals_scored for the year in question. So using the data above it would look like:
Name 2014 2015 2016
John Smith 5 5 1
To make it worse they only want it to include those names that have something for all three years.
Can anyone point me in the right direction?
Need groupby, aggregate sum and reshape by unstack:
df = df.groupby(['Name','Year'])['Goals_scored'].sum().unstack()
print (df)
Year 2014 2015 2016
Name
John Smith 5 5 1
Alternative pivot_table:
df = df.pivot_table(index='Name',columns='Year', values='Goals_scored', aggfunc='sum')
print (df)
Year 2014 2015 2016
Name
John Smith 5 5 1
Last for column from index:
df = df.reset_index().rename_axis(None, 1)
print (df)
Name 2014 2015 2016
0 John Smith 5 5 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With