I have a large dataset that is difficult to investigate without analysis tools. Its general form is shown below, but with 16 "ItemType0" columns, plus 16 "ItemType1" columns, 16 "ItemType2" columns, and so on.
Each row records the properties (many of them) of up to 16 different items at a single timestep, followed by properties of that timestep itself.
| Time | ItemType0[0].property | ItemType0[1].property | Property |
|---|---|---|---|
| 1 | 1 | 0 | 2 |
| 2 | 0 | 1 | 2 |
| 3 | 3 | 3 | 2 |
I'd like to receive:
| Time | ItemType0.property | Property |
|---|---|---|
| 1 | 1 | 2 |
| 2 | 0 | 2 |
| 3 | 3 | 2 |
| 1 | 0 | 2 |
| 2 | 1 | 2 |
| 3 | 3 | 2 |
Reproducible input:

```python
import pandas as pd

wide_df = pd.DataFrame({
    "Time": [1, 2, 3],
    "ItemType0[0].property": [1, 0, 3],
    "ItemType0[1].property": [0, 1, 3],
    "Property": [2, 2, 2],
})
```
What I've tried:

Melt:

```python
ids = [col for col in wide_df.columns if "[" not in col]
inter_df = pd.melt(wide_df, id_vars=ids, var_name="Source")
```

which fails on the full dataset with:

```
MemoryError: Unable to allocate 28.3 GiB for an array with shape (15,506831712) and data type uint32
```
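One way to sidestep the single huge allocation is to melt one item slot at a time and concatenate, sketched here under the assumption that every item column follows the `ItemTypeX[n].property` pattern from the question (the loop and names are illustrative, not from the original post):

```python
import re
import pandas as pd

ids = [col for col in wide_df.columns if "[" not in col]
item_cols = [col for col in wide_df.columns if "[" in col]
# Distinct item slots: the n in "ItemTypeX[n].property".
slots = sorted({re.search(r"\[(\d+)\]", c).group(1) for c in item_cols}, key=int)

pieces = []
for n in slots:
    cols = [c for c in item_cols if f"[{n}]" in c]
    # Shared columns plus this slot's columns, with the "[n]" stripped.
    pieces.append(
        wide_df[ids + cols].rename(columns=lambda c: re.sub(r"\[\d+\]", "", c))
    )

# Peak memory is roughly one slot's worth of columns at a time.
long_df = pd.concat(pieces, ignore_index=True)
```

On the toy frame this yields the desired rows: item 0's three timesteps, then item 1's.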
I wouldn't even know where to begin with pd.wide_to_long, since the column names don't all share a common stub.
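For what it's worth, `pd.wide_to_long` can be made to fit by first renaming the columns so they do share a stub; the rename pattern below is an assumption for illustration:

```python
import re
import pandas as pd

# "ItemType0[0].property" -> "ItemType0.property_0", so every item column
# shares the stub "ItemType0.property" with a numeric suffix.
renamed = wide_df.rename(columns=lambda c: re.sub(r"\[(\d+)\]\.(\w+)", r".\2_\1", c))

long_df = pd.wide_to_long(
    renamed,
    stubnames="ItemType0.property",  # pass a list of stubs for ItemType1, ItemType2, ...
    i=["Time", "Property"],
    j="item",
    sep="_",
).reset_index()
```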
IIUC, you can try to group the properties by ItemTypeX and then explode:
```python
df = wide_df.copy()  # the wide frame from the question
# "ItemType0[0].property" -> "ItemType0.property"
df.columns = df.columns.str.replace(r"\[\d+\]", "", regex=True)
df = df.set_index(["Time", "Property"])
# group the now-duplicated column names, collecting each group's values into lists
df = df.T.groupby(df.columns).agg(list).T
# explode all list columns in parallel: one row per original item slot
print(df.reset_index().explode(df.columns.to_list()))
```
Prints:
```
   Time  Property ItemType0.property
0     1         2                  1
0     1         2                  0
1     2         2                  0
1     2         2                  1
2     3         2                  3
2     3         2                  3
```
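Note the row order differs from the desired output: items interleave within each timestep. If the original `[n]` order matters, one variant (an illustrative sketch, not part of the answer above) keeps the item index as a real column by splitting the names into a MultiIndex and stacking:

```python
import re
import pandas as pd

df = wide_df.set_index(["Time", "Property"])
# Split "ItemType0[0].property" into ("ItemType0.property", 0).
df.columns = pd.MultiIndex.from_tuples(
    [
        (f"{m.group(1)}.{m.group(3)}", int(m.group(2)))
        for m in (re.match(r"(\w+)\[(\d+)\]\.(\w+)", c) for c in df.columns)
    ],
    names=[None, "item"],
)
# One row per (Time, Property, item); sort to get all of item 0, then item 1.
out = df.stack("item").reset_index().sort_values(["item", "Time"])
print(out)
```

Dropping the `item` column afterwards gives exactly the ordering shown in the desired output.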