I have a pandas data frame like this:
ts | thing_0 | qty_0 | thing_1 | qty_1 | thing_2 | qty_2
--------------------------------------------------------
1 | dog | 5 | cat | 3 | mouse | 1
2 | house | 6 | dog | 4 | cat | 2
...
I want to transform this in a way that the things become columns and the qtys become the cell values. Like this:
ts | dog | cat | mouse | house
------------------------------
1 | 5 | 3 | 1 | 0
2 | 4 | 2 | 0 | 6
...
Currently, I'm doing this transformation manually by iterating over the df.values array manually, but this is very slow. Is there a faster way to implement this with pandas means?
I have seen df.pivot, but couldn't find a way to describe the relationship between thing_0 and qty_0.
You could use lreshape to coalesce the thing and qty columns:
In [10]: pd.lreshape(df, {'thing':['thing_0','thing_1','thing_2',], 'qty':['qty_0','qty_1','qty_2']})
Out[10]:
ts thing qty
0 1 dog 5
1 2 house 6
2 1 cat 3
3 2 dog 4
4 1 mouse 1
5 2 cat 2
Then pivot to create the desired DataFrame:
import pandas as pd
df = pd.DataFrame({'qty_0': [5, 6], 'qty_1': [3, 4], 'qty_2': [1, 2], 'thing_0': ['dog', 'house'], 'thing_1': ['cat', 'dog'], 'thing_2': ['mouse', 'cat'], 'ts': [1, 2]})
reshaped = pd.lreshape(df, {'thing':['thing_0','thing_1','thing_2',],
'qty':['qty_0','qty_1','qty_2']})
result = reshaped.pivot(index='ts', columns='thing', values='qty')
print(result)
yields
thing cat dog house mouse
ts
1 3.0 5.0 0.0 1.0
2 2.0 4.0 6.0 0.0
I think jezrael's solution is better since it takes advantage of the regularity of the column names you wish to coalesce. df.columns.str.split('_', expand=True) is more general and less repetitive than
{'thing':['thing_0','thing_1','thing_2',],
'qty':['qty_0','qty_1','qty_2']}
lreshape might be helpful in situations where the column names you wish to coalesce
are irregular.
You can create MultiIndex from columns by str.split and then reshape by stack and unstack:
df = df.set_index('ts')
df.columns = df.columns.str.split('_', expand=True)
df = df.stack().reset_index(level=1, drop=True)
.set_index('thing', append=True)['qty'].unstack(fill_value=0)
print (df)
thing cat dog house mouse
ts
1 3 5 0 1
2 2 4 6 0
Another solution with pivot instead unstack:
df = df.set_index('ts')
df.columns = df.columns.str.split('_', expand=True)
df = df.stack().reset_index()
df = df.pivot(index='ts', columns='thing', values='qty').fillna(0).astype(int)
print (df)
thing cat dog house mouse
ts
1 3 5 0 1
2 2 4 6 0
And third solution - dynamically create dict for lreshape and then use unstack:
Notice - columns have to be sorted, if not add df = df.sort_index(axis=1)
t = [x for x in df.columns if x.startswith('thing')]
q = [x for x in df.columns if x.startswith('qty')]
df = pd.lreshape(df, {'thing':t, 'qty':q})
.set_index(['ts','thing'])['qty'].unstack(fill_value=0)
print (df)
thing cat dog house mouse
ts
1 3 5 0 1
2 2 4 6 0
EDIT:
lreshape is now undocumented, but is possible in future will by removed (with pd.wide_to_long too).
Possible solution is merging all 3 functions to one - maybe melt, but now it is not implementated. Maybe in some new version of pandas. Then my answer will be updated.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With