
Re-arrange 1D pandas DataFrame to 2d by splitting index names

Tags:

python

pandas

I have a 1D DataFrame that is indexed with keys of the form i_n, where i and n are strings (for the sake of this example, i is an integer number and n is a character). This would be a simple example:

       values
0_a  0.583772
1_a  0.782358
2_a  0.766844
3_a  0.072565
4_a  0.576667
0_b  0.503876
1_b  0.352815
2_b  0.512834
3_b  0.070908
4_b  0.074875
0_c  0.361226
1_c  0.526089
2_c  0.299183
3_c  0.895878
4_c  0.874512

Now I would like to re-arrange this DataFrame to be 2D such that the number (the part of the index name before the underscore) serves as column name and the character (the part of the index after the underscore) serves as index:

          0         1         2          3          4
a  0.583772  0.782358  0.766844  0.0725654   0.576667
b  0.503876  0.352815  0.512834  0.0709081  0.0748752
c  0.361226  0.526089  0.299183   0.895878   0.874512

I have a solution for the problem (the function convert_2d below), but I was wondering whether there is a more idiomatic way to achieve this. Here is the code that generates the original DataFrame and converts it to the desired form:

import pandas as pd
import numpy as np

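def convert_2d(df):
    # Split each 'i_n' key into its column part (i) and its row part (n).
    numbers = sorted(set(idx.split('_')[0] for idx in df.index))
    names = sorted(set(idx.split('_')[1] for idx in df.index))

    # Empty frame with one row per name and one column per number.
    df2 = pd.DataFrame(index=names, columns=numbers)

    for i in numbers:
        for n in names:
            # Assign through .loc rather than chained indexing (df2[i][n] = ...).
            df2.loc[n, i] = df['values']['{}_{}'.format(i, n)]

    return df2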



##generating 1d example data:
data = np.random.rand(15)
indices = ['{}_{}'.format(i,n) for n in ['a','b','c'] for i in range(5)]
df = pd.DataFrame(
    data, columns=['values']
).rename(index={i:idx for i,idx in enumerate(indices)})

print(df)

##converting to 2d
print(convert_2d(df))

Some notes about the index keys: it can be assumed (like in my function) that there are no 'missing keys' (i.e. a 2d array can always be achieved) and the only thing that can be taken for granted about the keys is the (single) underscore (i.e. the numbers and letters were only chosen for explanatory reasons, in reality there would be just two arbitrary strings connected by the underscore).
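The two assumptions above (exactly one underscore per key, and no missing keys) can be verified before reshaping. This is a minimal sketch only; check_keys is a hypothetical helper name chosen for illustration and is not part of pandas:

```python
import pandas as pd

def check_keys(index):
    """Hypothetical helper: validate 'left_right' keys, return both label lists."""
    keys = list(index)
    # Every key must contain exactly one underscore.
    bad = [k for k in keys if k.count('_') != 1]
    if bad:
        raise ValueError('keys without exactly one underscore: {}'.format(bad))
    # Duplicate keys would make the 2D layout ambiguous.
    if len(set(keys)) != len(keys):
        raise ValueError('duplicate keys found')
    lefts = {k.split('_')[0] for k in keys}
    rights = {k.split('_')[1] for k in keys}
    # A complete grid has one key for every left/right combination.
    if len(keys) != len(lefts) * len(rights):
        raise ValueError('some left/right combinations are missing')
    return sorted(lefts), sorted(rights)

# Small usage example with a complete 2 x 2 grid of keys.
lefts, rights = check_keys(pd.Index(['0_a', '1_a', '0_b', '1_b']))
print(lefts, rights)
```

If the check passes, any reshaping approach that splits at the underscore should produce a frame without gaps.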

Thomas Kühn asked Sep 03 '25 02:09


1 Answer

IIUC, create a MultiIndex from the split keys, then unstack:

df.index=pd.MultiIndex.from_tuples(df.index.str.split('_').map(tuple))
df['values'].unstack(level=0)
Out[65]: 

          0         1         2         3         4
a  0.583772  0.782358  0.766844  0.072565  0.576667
b  0.503876  0.352815  0.512834  0.070908  0.074875
c  0.361226  0.526089  0.299183  0.895878  0.874512
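
For anyone who wants to run this end to end, here is a self-contained sketch of the same MultiIndex-then-unstack idea. It makes a few assumptions that are not in the original snippet: the example data is rebuilt with a fixed seed (so the values differ from the ones shown above), the work is done on a copy instead of overwriting df.index in place, and the name wide is just illustrative:

```python
import numpy as np
import pandas as pd

# Rebuild a 1D example frame like the one in the question (seeded random values).
rng = np.random.default_rng(0)
indices = ['{}_{}'.format(i, n) for n in ['a', 'b', 'c'] for i in range(5)]
df = pd.DataFrame({'values': rng.random(15)}, index=indices)

# Split each key at the underscore into a (column_part, row_part) tuple.
tuples = [tuple(key.split('_')) for key in df.index]

# Work on a copy so the original frame keeps its flat string index.
df2 = df.copy()
df2.index = pd.MultiIndex.from_tuples(tuples)

# Move level 0 (the part before the underscore) into the columns.
wide = df2['values'].unstack(level=0)

print(wide)
```

Note that the resulting column labels are the strings '0' to '4', not integers, since they come from splitting string keys. As far as I know, unstack also returns the labels in sorted order, so the original key order is not preserved.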
BENY answered Sep 04 '25 16:09