Pandas: Transform pairs of columns to cells

Question

I have a pandas data frame like this:

ts | thing_0 | qty_0 | thing_1 | qty_1 | thing_2 | qty_2
--------------------------------------------------------
1  | dog     |     5 | cat     |     3 | mouse   |     1
2  | house   |     6 | dog     |     4 | cat     |     2
...

I want to transform this in a way that the things become columns and the qtys become the cell values. Like this:

ts | dog | cat | mouse | house
------------------------------
1  |   5 |   3 |     1 |     0
2  |   4 |   2 |     0 |     6
...

Currently, I'm doing this transformation manually by iterating over the df.values array manually, but this is very slow. Is there a faster way to implement this with pandas means?

I have seen df.pivot, but couldn't find a way to describe the relationship between thing_0 and qty_0.

unutbu · Accepted Answer

You could use lreshape to coalesce the thing and qty columns:

In [10]: pd.lreshape(df, {'thing':['thing_0','thing_1','thing_2',], 'qty':['qty_0','qty_1','qty_2']})
Out[10]: 
   ts  thing  qty
0   1    dog    5
1   2  house    6
2   1    cat    3
3   2    dog    4
4   1  mouse    1
5   2    cat    2

Then pivot to create the desired DataFrame:

import pandas as pd

df = pd.DataFrame({'qty_0': [5, 6], 'qty_1': [3, 4], 'qty_2': [1, 2], 'thing_0': ['dog', 'house'], 'thing_1': ['cat', 'dog'], 'thing_2': ['mouse', 'cat'], 'ts': [1, 2]})

reshaped = pd.lreshape(df, {'thing':['thing_0','thing_1','thing_2',], 
                            'qty':['qty_0','qty_1','qty_2']})

result = reshaped.pivot(index='ts', columns='thing', values='qty')
print(result)

yields

thing  cat  dog  house  mouse
ts                           
1      3.0  5.0    0.0    1.0
2      2.0  4.0    6.0    0.0

I think jezrael's solution is better since it takes advantage of the regularity of the column names you wish to coalesce. df.columns.str.split('_', expand=True) is more general and less repetitive than

{'thing':['thing_0','thing_1','thing_2',], 
 'qty':['qty_0','qty_1','qty_2']}

lreshape might be helpful in situations where the column names you wish to coalesce are irregular.

jezrael · Answer

You can create MultiIndex from columns by str.split and then reshape by stack and unstack:

df = df.set_index('ts')
df.columns = df.columns.str.split('_', expand=True)
df = df.stack().reset_index(level=1, drop=True)
       .set_index('thing', append=True)['qty'].unstack(fill_value=0)
print (df)
thing  cat  dog  house  mouse
ts                           
1        3    5      0      1
2        2    4      6      0

Another solution with pivot instead unstack:

df = df.set_index('ts')
df.columns = df.columns.str.split('_', expand=True)
df = df.stack().reset_index()
df = df.pivot(index='ts', columns='thing', values='qty').fillna(0).astype(int)
print (df)
thing  cat  dog  house  mouse
ts                           
1        3    5      0      1
2        2    4      6      0

And third solution - dynamically create dict for lreshape and then use unstack:

Notice - columns have to be sorted, if not add df = df.sort_index(axis=1)

t = [x for x in df.columns if x.startswith('thing')]
q = [x for x in df.columns if x.startswith('qty')]
df = pd.lreshape(df, {'thing':t, 'qty':q})
       .set_index(['ts','thing'])['qty'].unstack(fill_value=0)

print (df)
thing  cat  dog  house  mouse
ts                           
1        3    5      0      1
2        2    4      6      0

EDIT:

lreshape is now undocumented, but is possible in future will by removed (with pd.wide_to_long too).

Possible solution is merging all 3 functions to one - maybe melt, but now it is not implementated. Maybe in some new version of pandas. Then my answer will be updated.

Pandas: Transform pairs of columns to cells

Tags:

python

pandas

knub

2 Answers

unutbu

jezrael

Recent Activity

Donate For Us

Pandas: Transform pairs of columns to cells

Tags:

python

pandas

knub

2 Answers

unutbu

jezrael

Related questions

Recent Activity

Donate For Us