
Tricky conversion of field names to values while performing row by row de-aggregation (using Pandas)

I have a dataset where I would like to convert part of specific field names into values, de-aggregating those values into their own rows (a wide-to-long pivot).

Data

Start       Date        End         Area    Final       Type    Middle Stat Low Stat    High Stat  Middle Stat1 Low Stat1    High Stat1
8/1/2013    9/1/2013    10/1/2013   NY      3/1/2023    CC      226         20          10         0             0            0
8/1/2013    9/1/2013    10/1/2013   CA      3/1/2023    AA      130         50          0          0             0            0






data = {
    "Start": ['8/1/2013', '8/1/2013'],
    "Date": ['9/1/2013', '9/1/2013'],
    "End": ['10/1/2013', '10/1/2013'],
    "Area": ['NY', 'CA'],
    "Final": ['3/1/2023', '3/1/2023'],
    "Type": ['CC', 'AA'],
    "Middle Stat": [226, 130],
    "Low Stat": [20, 50],
    "High Stat": [10, 0],
    "Middle Stat1": [0, 0],
    "Low Stat1": [0, 0],
    "High Stat1": [0, 0]
}

                        

Desired

Start       Date        End         Area    Final       Type    Stat    Range   Stat1
8/1/2013    9/1/2013    10/1/2013   NY      3/1/2023    CC      20      Low     0
8/1/2013    9/1/2013    10/1/2013   CA      3/1/2023    AA      50      Low     0
8/1/2013    9/1/2013    10/1/2013   NY      3/1/2023    CC      226     Middle  0
8/1/2013    9/1/2013    10/1/2013   CA      3/1/2023    AA      130     Middle  0
8/1/2013    9/1/2013    10/1/2013   NY      3/1/2023    CC      10      High    0
8/1/2013    9/1/2013    10/1/2013   CA      3/1/2023    AA      0       High    0

Doing

I believe I need some kind of wide-to-long method (an SO member helped with the attempt below), but I am unsure how to apply it when the columns of interest share the same suffix.

pd.wide_to_long(df, 
                stubnames=['Low','Middle','High'],
                i=['Start','Date','End','Area','Final'],
                j='',
                sep=' ',
                suffix='(stat)'
).unstack(level=-1, fill_value=0).stack(level=0).reset_index()

Any suggestion is appreciated.

#Original Dataset

import pandas as pd

# create DataFrame
data = {'Start': ['9/1/2013', '10/1/2013', '11/1/2013', '12/1/2013'],
        'Date': ['10/1/2016', '11/1/2016', '12/1/2016', '1/1/2017'],
        'End': ['11/1/2016', '12/1/2016', '1/1/2017', '2/1/2017'],
        'Area': ['NY', 'NY', 'NY', 'NY'],
        'Final': ['3/1/2023', '3/1/2023', '3/1/2023', '3/1/2023'],
        'Type': ['CC', 'CC', 'CC', 'CC'],
        'Low Stat': ['', '', '', ''],
        'Low Stat1': ['', '', '', ''],
        'Middle Stat': ['0', '0', '0', '0'],
        'Middle Stat1': ['0', '0', '0', '0'],
        'Re': ['','','',''],
        'Set': ['0', '0', '0', '0'],
        'Set2': ['0', '0', '0', '0'],
        'Set3': ['0', '0', '0', '0'],
        'High Stat': ['', '', '', ''],
        'High Stat1': ['', '', '', '']}

df = pd.DataFrame(data)
Lynn asked Oct 28 '25 22:10



2 Answers

df.melt(id_vars=df.columns[:6], value_name='Values')
      Start      Date        End Area     Final Type     variable  Values
0  8/1/2013  9/1/2013  10/1/2013   NY  3/1/2023   CC  Middle Stat    226
1  8/1/2013  9/1/2013  10/1/2013   CA  3/1/2023   AA  Middle Stat    130
2  8/1/2013  9/1/2013  10/1/2013   NY  3/1/2023   CC    Low Stat      20
3  8/1/2013  9/1/2013  10/1/2013   CA  3/1/2023   AA    Low Stat      50
4  8/1/2013  9/1/2013  10/1/2013   NY  3/1/2023   CC   High Stat      10
5  8/1/2013  9/1/2013  10/1/2013   CA  3/1/2023   AA   High Stat       0
Laurent B. answered Oct 30 '25 14:10
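
The melted frame still keeps the range and the measure fused in one `variable` column, and on the full sample data the melt also picks up the three `Stat1` columns. A minimal sketch of one way to carry it through to the desired layout, assuming every non-id column is named `<Range> <Measure>` with a single space; the names `id_cols`, `long`, `Measure` and `result` are illustrative only:

```python
import pandas as pd

data = {
    "Start": ["8/1/2013", "8/1/2013"],
    "Date": ["9/1/2013", "9/1/2013"],
    "End": ["10/1/2013", "10/1/2013"],
    "Area": ["NY", "CA"],
    "Final": ["3/1/2023", "3/1/2023"],
    "Type": ["CC", "AA"],
    "Middle Stat": [226, 130],
    "Low Stat": [20, 50],
    "High Stat": [10, 0],
    "Middle Stat1": [0, 0],
    "Low Stat1": [0, 0],
    "High Stat1": [0, 0],
}
df = pd.DataFrame(data)

# Assumption: these six columns identify a row and are never melted.
id_cols = ["Start", "Date", "End", "Area", "Final", "Type"]

# Melt every "<Range> <Measure>" column into (variable, Values) pairs.
long = df.melt(id_vars=id_cols, value_name="Values")

# Split e.g. "Middle Stat1" into Range="Middle" and Measure="Stat1".
long[["Range", "Measure"]] = long["variable"].str.split(" ", n=1, expand=True)

# Spread the measure names back out as columns: one row per id/Range pair.
result = (
    long.pivot(index=id_cols + ["Range"], columns="Measure", values="Values")
    .reset_index()
    .rename_axis(columns=None)
)
print(result)
```

`pivot` requires each combination of id columns and `Range` to be unique; if the real data has repeated id rows, adding a row counter to `id_cols` before melting would be one way around that.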



One option is pivot_longer from pyjanitor. Here the special placeholder .value marks the part of each column name that should remain a header, while the remaining part is collected into a new column:

# pip install pyjanitor
import pandas as pd
import janitor

(df
.pivot_longer(
    index = slice('Start', 'Type'), 
    names_to = ("Range", ".value"), 
    names_sep = " ")
)
      Start      Date        End Area     Final Type   Range  Stat  Stat1
0  8/1/2013  9/1/2013  10/1/2013   NY  3/1/2023   CC  Middle   226      0
1  8/1/2013  9/1/2013  10/1/2013   CA  3/1/2023   AA  Middle   130      0
2  8/1/2013  9/1/2013  10/1/2013   NY  3/1/2023   CC     Low    20      0
3  8/1/2013  9/1/2013  10/1/2013   CA  3/1/2023   AA     Low    50      0
4  8/1/2013  9/1/2013  10/1/2013   NY  3/1/2023   CC    High    10      0
5  8/1/2013  9/1/2013  10/1/2013   CA  3/1/2023   AA    High     0      0
sammywemmy answered Oct 30 '25 15:10
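
If adding pyjanitor is not an option, a similar reshape can probably be done with `pd.wide_to_long`, which the question was already reaching for. The catch is that `wide_to_long` expects the stub at the start of each column name, so this sketch first flips `Middle Stat` to `Stat Middle`. It assumes the id columns contain no spaces and uniquely identify each row; `id_cols`, `flipped` and `result` are illustrative names:

```python
import pandas as pd

data = {
    "Start": ["8/1/2013", "8/1/2013"],
    "Date": ["9/1/2013", "9/1/2013"],
    "End": ["10/1/2013", "10/1/2013"],
    "Area": ["NY", "CA"],
    "Final": ["3/1/2023", "3/1/2023"],
    "Type": ["CC", "AA"],
    "Middle Stat": [226, 130],
    "Low Stat": [20, 50],
    "High Stat": [10, 0],
    "Middle Stat1": [0, 0],
    "Low Stat1": [0, 0],
    "High Stat1": [0, 0],
}
df = pd.DataFrame(data)

# Assumption: these columns together identify each row uniquely.
id_cols = ["Start", "Date", "End", "Area", "Final", "Type"]

# Reverse the space-separated parts: "Middle Stat1" becomes "Stat1 Middle".
# Names without a space (the id columns) come through unchanged.
flipped = df.rename(columns=lambda c: " ".join(c.split(" ")[::-1]))

# "Stat" and "Stat1" are now the stubs; the word after the space
# (Low / Middle / High) is the suffix collected into the "Range" column.
result = pd.wide_to_long(
    flipped,
    stubnames=["Stat", "Stat1"],
    i=id_cols,
    j="Range",
    sep=" ",
    suffix=r"\w+",
).reset_index()
print(result)
```

Because the separator must follow the stub directly, the stub `Stat` does not accidentally capture the `Stat1 ...` columns.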



