Im working with a database that contains the following structure
| ID | Test | Result |
|---|---|---|
| 12a | Test1 | Normal |
| 12a | Test3 | 678.2 |
| 3s5 | Test2 | <1 |
| 3s5 | Test1 | Normal |
| 8r5 | Test4 | Rectangular(3+) |
As you can see, the different tests have different result formats, and not every ID have all the tests.
I would like to transform this into something as follows:
| ID | Test1 | Test2 | Test3 | Test4 |
|---|---|---|---|---|
| 12a | Normal | NA | 678.2 | NA |
| 3s5 | Normal | <1 | NA | NA |
| 8r5 | NA | NA | NA | Rectangular(3+) |
I've tried with pandas.pivot but encountered the following error
df.pivot(index="ID",columns="Test",values="Result")
ValueError: Index contains duplicate entries, cannot reshape
Changing the Index to ID does not work, neither resetting index.
Any help will be greatly appreciated!
You can try using .pivot_table() instead of .pivot(), as follows:
df.pivot_table(index="ID", columns="Test", values="Result", aggfunc='first')
Result:
Test Test1 Test2 Test3 Test4
ID
12a Normal NaN 678.2 NaN
3s5 Normal <1 NaN NaN
8r5 NaN NaN NaN Rectangular(3+)
Here is a way of doing it :
df = {'ID': ['12a', '12a', '3s5', '3s5', '8r5'],
'Test': ['Test1', 'Test3', 'Test2', 'Test1', 'Test4'],
'Result': ['Normal', '678.2', '<1', 'Normal', 'Rectangular(3+)']}
df=df.groupby(['ID', 'Test'])['Result'].sum().unstack(fill_value="NA")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With