I have the following dataframe:
import pandas as pd

x = pd.DataFrame([
{'F1': 'a', 'F1_D': 'aa', 'F2': 'a', 'F2_D': 'aa', 'F3': 'a', 'F3_D': 'aa'},
{'F1': 'b', 'F1_D': 'bb', 'F2': 'b', 'F2_D': 'bb', 'F3': 'b', 'F3_D': 'bb'},
{'F1': 'c', 'F1_D': 'cc', 'F2': 'c', 'F2_D': 'cc', 'F3': 'c', 'F3_D': 'cc'},
{'F1': 'd', 'F1_D': 'dd', 'F2': 'd', 'F2_D': 'dd', 'F3': 'd', 'F3_D': 'dd'},
])
>>> x
  F1 F1_D F2 F2_D F3 F3_D
0  a   aa  a   aa  a   aa
1  b   bb  b   bb  b   bb
2  c   cc  c   cc  c   cc
3  d   dd  d   dd  d   dd
I want to reshape this dataframe to long form, but with two value columns, like below:
col1 col2 col3
F1 a aa
F2 a aa
F3 a aa
F1 b bb
F2 b bb
. . .
. . .
. . .
F3 d dd
First, use rename to append _col2 to every column name that does not already end with _D:
f = lambda c: c if c.endswith('_D') else f'{c}_col2'
x = x.rename(columns=f)
print (x)
  F1_col2 F1_D F2_col2 F2_D F3_col2 F3_D
0       a   aa       a   aa       a   aa
1       b   bb       b   bb       b   bb
2       c   cc       c   cc       c   cc
3       d   dd       d   dd       d   dd
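The point of the rename is that every column name now has exactly two parts separated by an underscore, so splitting on _ gives a regular two-level column index. A minimal sketch of that idea, using a standalone index named cols (a hypothetical stand-in for the dataframe's columns):

```python
import pandas as pd

# Hypothetical stand-in for the original column names.
cols = pd.Index(['F1', 'F1_D', 'F2', 'F2_D', 'F3', 'F3_D'])

# Same rule as the rename step: append '_col2' unless the name ends with '_D'.
renamed = cols.map(lambda c: c if c.endswith('_D') else f'{c}_col2')

# Splitting on '_' with expand=True turns the flat names into a MultiIndex.
levels = renamed.str.split('_', expand=True)

print(levels.nlevels)
print(list(levels))
```

The first level holds F1, F2, F3 and the second level holds col2 or D, which is what makes the later stack on level 0 possible.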
Then split the column names on _ to get a two-level column MultiIndex and reshape with DataFrame.stack on the first level. Finally, use DataFrame.rename_axis, DataFrame.reset_index and DataFrame.rename to clean up the result:
x.columns = x.columns.str.split('_', expand=True)
df = (x.stack(0)
.rename_axis([None, 'col1'])
.reset_index(level=1)
.reset_index(drop=True)
.rename(columns={'D':'col3'}))
print (df)
   col1 col3 col2
0    F1   aa    a
1    F2   aa    a
2    F3   aa    a
3    F1   bb    b
4    F2   bb    b
5    F3   bb    b
6    F1   cc    c
7    F2   cc    c
8    F3   cc    c
9    F1   dd    d
10   F2   dd    d
11   F3   dd    d
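Note that the result above lists col3 before col2, while the question asks for col1, col2, col3. One way to get that order is to select the columns by name at the end. The following is a self-contained sketch of the whole approach with that extra step. The names wide and long_df are hypothetical, and selecting by name means the result should not depend on the column order that stack happens to produce in a given pandas version:

```python
import pandas as pd

# Same data as in the question, built programmatically.
wide = pd.DataFrame([
    {'F1': v, 'F1_D': v * 2, 'F2': v, 'F2_D': v * 2, 'F3': v, 'F3_D': v * 2}
    for v in ['a', 'b', 'c', 'd']
])

# Step 1: give every column a two-part name.
wide = wide.rename(columns=lambda c: c if c.endswith('_D') else f'{c}_col2')

# Step 2: split names into a two-level column index and stack the first level.
wide.columns = wide.columns.str.split('_', expand=True)
long_df = (wide.stack(0)
               .rename_axis([None, 'col1'])
               .reset_index(level=1)
               .reset_index(drop=True)
               .rename(columns={'D': 'col3'}))

# Step 3: put the columns in the requested order.
long_df = long_df[['col1', 'col2', 'col3']]
print(long_df)
```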