I have a data frame with two MultiIndexes (in what I understand to be "wide format") and I want to reshape it so that the second index becomes a new column (what I understand to be "long format"). Is there any easy way to do this?
This is the data frame now:
A = pd.DataFrame([[1, 2], [1, 2], [1, 2], [1, 2], [1, 2]],
columns=['h', 'k'], index=[1, 2, 3, 4, 5])
B = pd.DataFrame([[3, 4], [3, 4], [3, 4], [3, 4], [3, 4]],
columns=['h', 'k'], index=[1, 2, 3, 4, 5])
C = pd.DataFrame([[5, 6], [5, 6], [5, 6], [5, 6], [5, 6]],
columns=['h', 'k'], index=[1, 2, 3, 4, 5])
A.columns = pd.MultiIndex.from_product([['A'], A.columns])
B.columns = pd.MultiIndex.from_product([['B'], B.columns])
C.columns = pd.MultiIndex.from_product([['C'], C.columns])
ABC = pd.concat([A, B, C], axis = 1)
A B C
h k h k h k
1 1 2 3 4 5 6
2 1 2 3 4 5 6
3 1 2 3 4 5 6
4 1 2 3 4 5 6
5 1 2 3 4 5 6
This is what I want to achieve:
A B C new_col
1 1 3 5 h
2 1 3 5 h
3 1 3 5 h
4 1 3 5 h
5 1 3 5 h
1 2 4 6 k
2 2 4 6 k
3 2 4 6 k
4 2 4 6 k
5 2 4 6 k
(I hope to be able to do this directly on the data frame and not have to transform it into another object, e.g. numpy array, to avoid decreased performance).
Try:
print(
ABC.stack(level=1)
.reset_index(1)
.rename(columns={"level_1": "new_col"})
.sort_values("new_col")
)
Prints:
new_col A B C
1 h 1 3 5
2 h 1 3 5
3 h 1 3 5
4 h 1 3 5
5 h 1 3 5
1 k 2 4 6
2 k 2 4 6
3 k 2 4 6
4 k 2 4 6
5 k 2 4 6
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With