I'm using Python 2.7 with Pandas 0.19.2
I've looked at some similar questions/answers to this problem but still haven't been able to solve.
I have the following dataframe:
Date name objects
0 2005-11-17 Pete 6
1 2014-02-04 Rob 3
2 2012-02-13 Rob 2
3 2004-12-16 Julia 4
4 2012-02-13 Mike 9
I want to reshape it as follows:
Pete Rob Julia Mike
2005-11-17 6 Nan Nan Nan
2014-02-04 Nan 3 Nan Nan
2012-02-13 Nan 2 Nan 9
2004-12-16 Nan Nan 4 Nan
I have been trying to run df.pivot(index='Date', columns='name', value='objects') but I get an 'index has duplicate value' error.
I thought that pivot or pivot_table work in these situations since there is essentially no 'conflict' between indexes and columns (e.g. multiple values for 'Rob' and '2014-02-04' in the dataframe).
I am not looking to aggregate - just reshape. Also in the example the dates are not in order - but that doesn't matter.
I feel this should be an easy fix but I can't see it. Can someone share some insight?
Thanks in advance.
It seems in your real data are duplicates, see sample:
print (df)
Date name objects
0 2005-11-17 Pete 6
1 2014-02-04 Rob 3
2 2012-02-13 Rob 2
3 2004-12-16 Julia 4
4 2012-02-13 Mike 9 <-duplicates for 2012-02-13 and Mike
5 2012-02-13 Mike 18 <-duplicates for 2012-02-13 and Mike
Solution are pivot_table with some aggregate function, default is np.mean but can be changed to sum, 'meadian', first, last.
df = df.pivot_table(index='Date', columns='name', values='objects', aggfunc=np.mean)
print (df)
name Julia Mike Pete Rob
Date
2004-12-16 4.0 NaN NaN NaN
2005-11-17 NaN NaN 6.0 NaN
2012-02-13 NaN 13.5 NaN 2.0 <-13.5 is mean
2014-02-04 NaN NaN NaN 3.0
Another solution with groupby, aggregate function and unstack:
df = df.groupby(['Date','name'])['objects'].mean().unstack()
print (df)
name Julia Mike Pete Rob
Date
2004-12-16 4.0 NaN NaN NaN
2005-11-17 NaN NaN 6.0 NaN
2012-02-13 NaN 13.5 NaN 2.0
2014-02-04 NaN NaN NaN 3.0
For checking duplicated is possible use duplicated with boolean indexing:
df = df[df.duplicated(subset=['Date','name'], keep=False)]
print (df)
Date name objects
4 2012-02-13 Mike 9
5 2012-02-13 Mike 18
You can use df.pivot():
In [205]: x.pivot(index='Date', columns='name', values='objects')
Out[205]:
name Julia Mike Pete Rob
Date
2004-12-16 4.0 NaN NaN NaN
2005-11-17 NaN NaN 6.0 NaN
2012-02-13 NaN 9.0 NaN 2.0
2014-02-04 NaN NaN NaN 3.0
alternative solution:
In [207]: x.groupby(['Date','name'])['objects'].first().unstack('name')
Out[207]:
name Julia Mike Pete Rob
Date
2004-12-16 4.0 NaN NaN NaN
2005-11-17 NaN NaN 6.0 NaN
2012-02-13 NaN 9.0 NaN 2.0
2014-02-04 NaN NaN NaN 3.0
Actually pivot_table() - does something very similar to mentioned solution
PS tested using Pandas version 0.19.2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With