Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: Unpivot Excel Data Where Category and Children Labels Are In Same Column

Basically, I think the easiest way to explain this is, I am trying expand a multi-indexed table, but the indexes are both in the same column.

My data is structured like this:

Row Labels Sum
Collection1 22
Data 1 10
Data 2 12
Collection2 33
Data 1 33
Collection3 45
Data 1 14
Data 2 31
Total 100

What I would like out is a Dataframe like this:

Row Labels Data1 Data2 Sum
Collection1 10 12 22
Collection2 33 0 33
Collection3 14 31 45
Total 57 43 100

Is there any built in pandas methods, or a straight forward approach to handling this type of translation?

I have tried manually breaking down the table and recreating it by collecting the row labels that are repeated, and making columns from them, with the data from rows with that label, but, the tricky spot is where child data is missing; like in the example above Collection2 Data2 doesn't exist. With this approach I could calculate for each row if Data1 equals Collection1 and if it does, add a 0 to Data2 at that index. But, it seems super ugly, and figured there is probably a much more elegant approach.

like image 639
user3552225 Avatar asked Oct 31 '25 23:10

user3552225


2 Answers

Using a pivot_table:

# identify groups
m = df['Row Labels'].str.match(r'Collection\d+|Total')

# reshape
out = (df
   .assign(index=df['Row Labels'].where(m).ffill(),
           col=df['Row Labels'].mask(m, 'Sum')
          )
   .pivot_table(index='index', columns='col', values='Sum', fill_value=0)
   .rename_axis(columns=None)
)

# recompute Total
out.loc['Total'] = out.drop('Total').sum()

out = out.reset_index()

Output:

         index  Data 1  Data 2  Sum
0  Collection1      10      12   22
1  Collection2      33       0   33
2  Collection3      14      31   45
3        Total      57      43  100
like image 123
mozway Avatar answered Nov 03 '25 14:11

mozway


I'm not sure if exists some straightforward Pandas solution, but you can try this example:

# remove the Total row - will recreate it as last step
df = df[df["Row Labels"] != "Total"]

# find the indices for pivoting
mask = df["Row Labels"].str.startswith("Collection")
df["idx"] = mask.cumsum()

# do the actual transformation here: pivot + merge
df = (
    pd.merge(
        df[mask],
        df[~mask].pivot(index="idx", columns="Row Labels", values="Sum"),
        left_on="idx",
        right_index=True,
    )
    .drop(columns=["idx"])
    .fillna(0)
)

# add Total row back
df = pd.concat(
    [
        df,
        pd.DataFrame(
            {"Row Labels": ["Total"], **{c: [df[c].sum()] for c in df.loc[:, "Sum":]}}
        ),
    ]
)

print(df)

Prints:

    Row Labels  Sum  Data 1  Data 2
0  Collection1   22    10.0    12.0
3  Collection2   33    33.0     0.0
5  Collection3   45    14.0    31.0
0        Total  100    57.0    43.0
like image 30
Andrej Kesely Avatar answered Nov 03 '25 12:11

Andrej Kesely



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!