I have an initial data frame that looks like this:
Serial No. Data One Data Two
0 01 0.258625 0.667996
1 01 0.192356 0.723055
2 01 0.738066 0.266488
3 01 0.374525 0.059664
4 01 0.193977 0.104213
5 01 0.213749 0.366608
⁞
49 05 0.948550 0.823888
I want to rearrange it into a multi-index data frame with columns of data type and serial number:
Data Data One Data Two
Serial No. 01 02 03 04 05 01 02 03 04 05
To generate and fill the multi-index Data Frame I've written a small block of code:
serial_numbers = ["01", "02", "03", "04", "05"]
headings = ["Serial No.", "Data One", "Data Two"]
arrays = [[], []]
for ii, heading in enumerate(headings[1:]):
arrays[0] += [heading]*len(serial_numbers)
arrays[1] += serial_numbers
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=["Data", "Serial No."])
group = random_data.groupby(headings[0])
data = pd.DataFrame(np.zeros((len(group.get_group(serial_numbers[0])), len(index))), columns=index)
for heading in headings[1:]:
for serial_number in serial_numbers:
data.loc[:, (heading, serial_number)] = group.get_group(serial_number).loc[:, heading]
The above code creates a Data Frame where the first serial number of each data type is filled with values and the rest are NaNs:
Data Data One Data Two
Serial No. 01 02 03 04 05 01 02 03 04 05
0 0.258625 NaN NaN NaN NaN 0.667996 NaN NaN NaN NaN
1 0.192356 NaN NaN NaN NaN 0.723055 NaN NaN NaN NaN
2 0.738066 NaN NaN NaN NaN 0.266488 NaN NaN NaN NaN
3 0.374525 NaN NaN NaN NaN 0.059664 NaN NaN NaN NaN
4 0.193977 NaN NaN NaN NaN 0.104213 NaN NaN NaN NaN
5 0.213749 NaN NaN NaN NaN 0.366608 NaN NaN NaN NaN
6 0.829126 NaN NaN NaN NaN 0.972882 NaN NaN NaN NaN
7 0.494763 NaN NaN NaN NaN 0.482118 NaN NaN NaN NaN
8 0.024283 NaN NaN NaN NaN 0.538428 NaN NaN NaN NaN
9 0.700613 NaN NaN NaN NaN 0.067831 NaN NaN NaN NaN
Why does this work for the first serial number of each data type but not others?
In the real world the input data comes from a CSV and pandas.read_csv but for the purpose of this question I've used numpy to generate random data and some other code to structure it.
serial_no_col = []
for serial_number in serial_numbers:
for ii in range(10):
serial_no_col.append(serial_number)
random_data = pd.DataFrame(np.random.rand(50, 3), columns=headings)
random_data.loc[:, "Serial No."] = serial_no_col
Use cumcount for new indices created by set_index and then unstack:
print (df)
Serial No. Data One Data Two
0 01 0.258625 0.667996
1 01 0.192356 0.723055
2 02 0.738066 0.266488
3 02 0.374525 0.059664
4 03 0.193977 0.104213
5 03 0.213749 0.366608
49 05 0.948550 0.823888
df = df.set_index([df.groupby('Serial No.').cumcount(), 'Serial No.']).unstack()
print (df)
Data One Data Two \
Serial No. 01 02 03 05 01 02
0 0.258625 0.738066 0.193977 0.94855 0.667996 0.266488
1 0.192356 0.374525 0.213749 NaN 0.723055 0.059664
Serial No. 03 05
0 0.104213 0.823888
1 0.366608 NaN
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With