Stacking in pandas dataframe based on column name

Question

I have following pandas Dataframe:

 ID   Year  Jan_salary  Jan_days  Feb_salary Feb_days Mar_salary Mar_days
  1   2016     4500        22         4200      18        4700       24
  2   2016     3800        23         3600      19        4400       23
  3   2016     5500        21         5200      17        5300       23

I want to convert this dataframe to following dataframe:

ID     Year     month   salary   days
 1      2016      01     4500     22
 1      2016      02     4200     18
 1      2016      03     4700     24
 2      2016      01     3800     23
 2      2016      02     3600     19
 2      2016      03     4400     23
 3      2016      01     5500     21
 3      2016      02     5200     17
 3      2016      03     5300     23

I tried use pandas.DataFrame.stack but couldn't get the expected outcome. I am using Python 2.7 Please guide me to reshape this Pandas dataframe. Thanks.

piRSquared · Accepted Answer

df = df.set_index(['ID', 'Year'])
df.columns = df.columns.str.split('_', expand=True).rename('month', level=0)
df = df.stack(0).reset_index()
md = dict(Jan='01', Feb='02', Mar='03')
df.month = df.month.map(md)


df[['ID', 'Year', 'month', 'salary', 'days']]

enter image description here

mechanical_meat · Answer

I love pd.melt so that's what I used in this long-winded approach:

ldf = pd.melt(df,id_vars=['ID','Year'],
              value_vars=['Jan_salary','Feb_salary','Mar_salary'],
              var_name='month',value_name='salary')
rdf = pd.melt(df,id_vars=['ID','Year'],
              value_vars=['Jan_days','Feb_days','Mar_days'],
              value_name='days')
rdf.drop(['ID','Year','variable'],inplace=True,axis=1)
cdf = pd.concat([ldf,rdf],axis=1)
cdf['month'] = cdf['month'].str.replace('_salary','')
import calendar
def mapper(month_abbr):
    # from http://stackoverflow.com/a/3418092/42346
    d = {v: str(k).zfill(2) for k,v in enumerate(calendar.month_abbr)}
    return d[month_abbr]
cdf['month'] = cdf['month'].apply(mapper)

Result:

>>> cdf
   ID  Year month  salary  days
0   1  2016    01    4500    22
1   2  2016    01    3800    23
2   3  2016    01    5500    21
3   1  2016    02    4200    18
4   2  2016    02    3600    19
5   3  2016    02    5200    17
6   1  2016    03    4700    24
7   2  2016    03    4400    23
8   3  2016    03    5300    23

Stacking in pandas dataframe based on column name

Tags:

pandas

dataframe

python-2.7

reshape

liferacer

2 Answers

piRSquared

mechanical_meat

Recent Activity

Donate For Us

Stacking in pandas dataframe based on column name

Tags:

pandas

dataframe

python-2.7

reshape

liferacer

2 Answers

piRSquared

mechanical_meat

Related questions

Recent Activity

Donate For Us