Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a good way to turn a pandas column multiindex into a column category?

Tags:

python

pandas

Probably easier to understand with an example so here we go:

random = np.random.uniform(size=(3))

what_i_have = pd.DataFrame({ 
    ('a', 'a'): random,
    ('b', 'b1'): np.linspace(3, 5, 3),
    ('b', 'b2'): np.linspace(6, 8, 3),
    ('b', 'b3'): np.linspace(9, 11, 3)
})

what_i_want = pd.DataFrame({
    ('a', 'a'): np.concatenate((random, random, random)),
    ('b', 'b_category'): ['b1']*3 + ['b2']*3 + ['b3']*3,
    ('b', 'b_value'): np.linspace(3, 11, 9)
})

print(what_i_have)
print('----------------------------------')
print(what_i_want)

Output:

          a    b           
          a   b1   b2    b3
0  0.587075  3.0  6.0   9.0
1  0.798710  4.0  7.0  10.0
2  0.206860  5.0  8.0  11.0
----------------------------------
          a          b        
          a b_category b_value
0  0.587075         b1     3.0
1  0.798710         b1     4.0
2  0.206860         b1     5.0
3  0.587075         b2     6.0
4  0.798710         b2     7.0
5  0.206860         b2     8.0
6  0.587075         b3     9.0
7  0.798710         b3    10.0
8  0.206860         b3    11.0

My issue is that my data doesn't just have b1 b2 b3, it also has b4, b5, b6... All the way to about b90. The obvious solution would be to make a loop creating 90 dataframes, one for each category, then concatenating them into one dataframe, but I imagine there must be a better way of doing it.

edit:

what_i_have.unstack() doesn't really solve the issue, as can be seen below. It could be an intermediate step but there's still some work to do with this result before reaching what I want and I don't see much of an advantage in doing this over the loop solution I've previously mentioned:

a  a   0     0.587075
       1     0.798710
       2     0.206860
b  b1  0     3.000000
       1     4.000000
       2     5.000000
   b2  0     6.000000
       1     7.000000
       2     8.000000
   b3  0     9.000000
       1    10.000000
       2    11.000000
like image 212
Janilson Avatar asked Feb 02 '26 21:02

Janilson


1 Answers

Keeping the MultiIndex, might be a better way out there though:

df = df.melt(id_vars=[df.columns[0]], var_name=['b','b_category'], value_name='b_value')
a = df[[('a','a')]]
b = df[['b', 'b_category', 'b_value']].pivot(columns='b').swaplevel(0,1, axis=1)
df = pd.concat([a, b], axis=1)
df.columns = pd.MultiIndex.from_tuples(df.columns)
print(df)

Output:

          a          b
          a b_category b_value
0  0.737076         b1     3.0
1  0.718409         b1     4.0
2  0.269516         b1     5.0
3  0.737076         b2     6.0
4  0.718409         b2     7.0
5  0.269516         b2     8.0
6  0.737076         b3     9.0
7  0.718409         b3    10.0
8  0.269516         b3    11.0
like image 83
BeRT2me Avatar answered Feb 04 '26 09:02

BeRT2me



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!