Python 3 match values based on column name similarity

Question

I have a dataframe of the following form:

Year 1 Grade	Year 2 Grade	Year 3 Grade	Year 4 Grade	Year 1 Students	Year 2 Students	Year 3 Students	Year 4 Students
60	70	80	100	20	32	18	25

I would like to somehow transpose this table to the following format:

Year	Grade	Students
1	60	20
2	70	32
3	80	18
4	100	25

I created a list of years and initiated a new dataframe with the "year" column. I was thinking of matching the year integer to the column name containing it in the original DF, match and assign the correct value, but got stuck there.

mozway · Accepted Answer

You need a manual reshaping using a split of the Index into a MultiIndex:

out = (df
 .set_axis(df.columns.str.split(expand=True), axis=1) # make MultiIndex
 .iloc[0]                                             # select row as Series
 .unstack()                                           # unstack Grade/Students
 .droplevel(0)                                        # remove literal "Year"
 .rename_axis('Year')                                 # set index name
 .reset_index()                                       # index to column
)

output:

  Year  Grade  Students
0    1     60        20
1    2     70        32
2    3     80        18
3    4    100        25

Or using pivot_longer from janitor:

# pip install pyjanitor

import janitor

out = (df.pivot_longer(
           names_to = ('ignore', 'Year', '.value'), 
           names_sep = ' ')
         .drop(columns='ignore')
       )

out 
  Year  Grade  Students
0    1     60        20
1    2     70        32
2    3     80        18
3    4    100        25

The .value determines which parts of the sub labels in the columns are retained; the labels are split apart by names_sep, which can be a string or a regex. Another option is by using a regex, with names_pattern to split and reshape the columns:

df.pivot_longer(names_to = ('Year', '.value'), 
                names_pattern = r'.+(\d)\s(.+)')
  Year  Grade  Students
0    1     60        20
1    2     70        32
2    3     80        18
3    4    100        25

Jred · Answer

Here's one way to do it. Feel free to ask questions about how it works.

import pandas as pd
cols = ["Year 1 Grade", "Year 2 Grade", "Year 3 Grade" , "Year 4 Grade",
        "Year 1 Students", "Year 2 Students", "Year 3 Students", "Year 4 Students"]
vals = [60,70,80,100,20,32,18,25]
vals = [[v] for v in vals]
df = pd.DataFrame({k:v for k,v in zip(cols,vals)})

grades = df.filter(like="Grade").T.reset_index(drop=True).rename(columns={0:"Grades"})
students = df.filter(like="Student").T.reset_index(drop=True).rename(columns={0:"Students"})
pd.concat([grades,students], axis=1)

Python 3 match values based on column name similarity

Tags:

python

python-3.x

pandas

dataframe

nadavb

2 Answers

mozway

Jred

Recent Activity

Donate For Us

Python 3 match values based on column name similarity

Tags:

python

python-3.x

pandas

dataframe

nadavb

2 Answers

mozway

Jred

Related questions

Recent Activity

Donate For Us