Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python 3 match values based on column name similarity

I have a dataframe of the following form:

Year 1 Grade Year 2 Grade Year 3 Grade Year 4 Grade Year 1 Students Year 2 Students Year 3 Students Year 4 Students
60 70 80 100 20 32 18 25

I would like to somehow transpose this table to the following format:

Year Grade Students
1 60 20
2 70 32
3 80 18
4 100 25

I created a list of years and initiated a new dataframe with the "year" column. I was thinking of matching the year integer to the column name containing it in the original DF, match and assign the correct value, but got stuck there.

like image 460
nadavb Avatar asked Oct 15 '25 18:10

nadavb


2 Answers

You need a manual reshaping using a split of the Index into a MultiIndex:

out = (df
 .set_axis(df.columns.str.split(expand=True), axis=1) # make MultiIndex
 .iloc[0]                                             # select row as Series
 .unstack()                                           # unstack Grade/Students
 .droplevel(0)                                        # remove literal "Year"
 .rename_axis('Year')                                 # set index name
 .reset_index()                                       # index to column
)

output:

  Year  Grade  Students
0    1     60        20
1    2     70        32
2    3     80        18
3    4    100        25

Or using pivot_longer from janitor:

# pip install pyjanitor

import janitor

out = (df.pivot_longer(
           names_to = ('ignore', 'Year', '.value'), 
           names_sep = ' ')
         .drop(columns='ignore')
       )

out 
  Year  Grade  Students
0    1     60        20
1    2     70        32
2    3     80        18
3    4    100        25

The .value determines which parts of the sub labels in the columns are retained; the labels are split apart by names_sep, which can be a string or a regex. Another option is by using a regex, with names_pattern to split and reshape the columns:

df.pivot_longer(names_to = ('Year', '.value'), 
                names_pattern = r'.+(\d)\s(.+)')
  Year  Grade  Students
0    1     60        20
1    2     70        32
2    3     80        18
3    4    100        25
like image 140
mozway Avatar answered Oct 17 '25 06:10

mozway


Here's one way to do it. Feel free to ask questions about how it works.

import pandas as pd
cols = ["Year 1 Grade", "Year 2 Grade", "Year 3 Grade" , "Year 4 Grade",
        "Year 1 Students", "Year 2 Students", "Year 3 Students", "Year 4 Students"]
vals = [60,70,80,100,20,32,18,25]
vals = [[v] for v in vals]
df = pd.DataFrame({k:v for k,v in zip(cols,vals)})

grades = df.filter(like="Grade").T.reset_index(drop=True).rename(columns={0:"Grades"})
students = df.filter(like="Student").T.reset_index(drop=True).rename(columns={0:"Students"})
pd.concat([grades,students], axis=1)
like image 42
Jred Avatar answered Oct 17 '25 07:10

Jred



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!