data = [[1, 'tom', 10, 53, 2, 3, 9, 6 ], [2, 'nick', 1, 53, 2, 23, 4, 7], [3, 'juli', 9, 23, 2, 31, 9, 3]]
df = pd.DataFrame(data, columns = ['ID', 'Name', 'Apple.Food.0', 'Apple.Food.1', 'Apple.Food.2', 'Pear.Food.0', 'Pear.Food.1', 'Pear.Food.2'])
df
| ID | Name | Apple.Food.0 | Apple.Food.1 | Apple.Food.2 | Pear.Food.0 | Pear.Food.1 | Pear.Food.2 |
|---|---|---|---|---|---|---|---|
| 1 | tom | 10 | 53 | 2 | 3 | 9 | 6 |
| 2 | nick | 1 | 53 | 2 | 23 | 4 | 7 |
| 3 | juli | 9 | 23 | 2 | 31 | 9 | 3 |
I want to unpack last 6 columns to like: i.e. Apple.Food.0 - unpack prefix 'Apple' to the rows, unpack suffix '0' to the rows, keep 'Food' as the column name and keep the value there; then regenerate the ID column as the PK.
I tried to use .str.split() to split the column, however, the value will comes with all the spited columns. Any built-in pandas function could achieve this easily?
| ID | Name | Fruit | Time | Food |
|---|---|---|---|---|
| 1 | tom | Apple | 0 | 10 |
| 2 | tom | Apple | 1 | 53 |
| 3 | tom | Apple | 2 | 2 |
| 4 | tom | Pear | 0 | 3 |
| 5 | tom | Pear | 1 | 9 |
| 6 | tom | Pear | 2 | 6 |
| 7 | nick | Apple | 0 | 1 |
| 8 | nick | Apple | 1 | 53 |
| 9 | nick | Apple | 2 | 2 |
| 10 | nick | Pear | 0 | 23 |
| 11 | nick | Pear | 1 | 4 |
| 12 | nick | Pear | 2 | 7 |
| 13 | juli | Apple | 0 | 9 |
| 14 | juli | Apple | 1 | 23 |
| 15 | juli | Apple | 2 | 2 |
| 16 | juli | Pear | 0 | 31 |
| 17 | juli | Pear | 1 | 9 |
| 18 | juli | Pear | 2 | 3 |
Use DataFrame.set_index for MultiIndex with columns without . first, then str.split with columns, reshape by DataFrame.stack, check index names by DataFrame.rename_axis and last add DataFrame.reset_index:
df1 = df.set_index(['ID','Name'])
df1.columns = df1.columns.str.split('.', expand=True)
df1 = df1.stack([0,2]).rename_axis(['ID','Name','Fruit','Time']).reset_index()
print (df1)
ID Name Fruit Time Food
0 1 tom Apple 0 10
1 1 tom Apple 1 53
2 1 tom Apple 2 2
3 1 tom Pear 0 3
4 1 tom Pear 1 9
5 1 tom Pear 2 6
6 2 nick Apple 0 1
7 2 nick Apple 1 53
8 2 nick Apple 2 2
9 2 nick Pear 0 23
10 2 nick Pear 1 4
11 2 nick Pear 2 7
12 3 juli Apple 0 9
13 3 juli Apple 1 23
14 3 juli Apple 2 2
15 3 juli Pear 0 31
16 3 juli Pear 1 9
17 3 juli Pear 2 3
Performance:
#3k rows
df = pd.concat([df] * 1000, ignore_index=True)
def f1():
df1 = df.set_index(['ID','Name'])
df1.columns = df1.columns.str.split('.', expand=True)
df1 = df1.stack([0,2]).rename_axis(['ID','Name','Fruit','Time']).reset_index()
def f2():
x = df.melt(['ID', 'Name'], value_name='Food')
x[['Fruit', 'Time']] = x.variable.str.split('.', expand=True)[[0,2]]
x = x.sort_values(['ID', 'Fruit']).reset_index(drop=True).drop('variable', 1)
In [41]: %timeit f1()
27.4 ms ± 1.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [42]: %timeit f2()
53.1 ms ± 1.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With