Assume I have a sample figure like this:
l = [
['Visitors', '1 February 2020', 'Saturday', 'Shop A', 'In', '100', '20', '30','150', 'Out', '90', '10', '15', '115'],
['Visitors', '1 February 2020', 'Saturday', 'Shop B', 'In', '20', '10', '40', '70', 'Out', '10', '9', '0', '19'],
['Visitors', '1 February 2020', 'Saturday', 'Shop C', 'In', '42', '18', '20', '80', 'Out', '40', '10', '20', '70'],
['Visitors', '1 February 2020', 'Saturday', 'Shop D', 'In', '0', '0', '0', '0', 'Out', '0', '0', '0', '0'],
['Visitors', '1 February 2020', 'Saturday', 'Shop E', 'In', '0', '0', '0', '0', 'Out', '0', '0', '0', '0'],
['Visitors', '1 February 2020', 'Saturday', 'Shop F', 'In', '20', '19', '11', '50', 'Out', '10', '9', '5', '24'],
['Visitors', '1 February 2020', 'Saturday', 'Shop G', 'In', '25', '8', '33', '66', 'Out', '20', '6', '30', '56'],
['Visitors', '1 February 2020', 'Saturday', 'Shop H', 'In', '180', '88', '6', '274', 'Out', '170', '80', '5', '255'],
['Visitors', '1 February 2020', 'Saturday', 'Shop I', 'In', '0', '0', '0', '0', 'Out', '0', '0', '0', '0'],
['Visitors', '1 February 2020', 'Saturday', 'Total', 'In', '387', '163', '140', '690', 'Out', '340', '124', '75', '539'],
]
The figures show about how many men/women/children visit a shop a day and record their in and out record. The figure above can be interpreted as below:
[Ppl_type, Date, Weekday, Shop, In, Men, Women, Children, Total, Out, Men, Women, Children, Total]
enter image description here <-- the expected result
This is the result I would like to see. Generate the above figure into excel with the header below:
header= ['Ppl_type', 'Date', 'Weekday', 'Shop', 'In/Out', 'Visitor_Type', 'Number']
As a result, each shop will have six rows (i.e. three rows for "In" and three rows for "Out") summarizing the figures above.
I would like to know how can I do it by python and generate the result to excel. I have tried worksheet.write but seems only work for the first four columns. Thank you so much.
For a fully programmatic solution, you can use:
header= ['Ppl_type', 'Date', 'Weekday', 'Shop',
'In', 'Men', 'Women', 'Children', 'Total',
'Out', 'Men', 'Women', 'Children', 'Total']
df = pd.DataFrame(l, columns=header)
m1 = df.columns.isin(['In', 'Out'])
grp = df.columns.to_series().where(m1).ffill()
m2 = grp.notna()
m = m2 & ~m1
out = (
df.loc[:, m2==m]
.set_index(list(grp[~m2].index))
.astype(int)
.set_axis(pd.MultiIndex.from_arrays([df.columns[m], grp[m]],
names=('Visitor_Type', 'In/Out')), axis=1)
.stack(['In/Out', 'Visitor_Type']).reset_index(name='Number')
# uncomment the line below to remove the Total
#.loc[lambda d: d['Visitor_Type'].ne('Total')
)
Output:
Ppl_type Date Weekday Shop In/Out Visitor_Type Number
0 Visitors 1 February 2020 Saturday Shop A In Children 30
1 Visitors 1 February 2020 Saturday Shop A In Men 100
2 Visitors 1 February 2020 Saturday Shop A In Total 150
3 Visitors 1 February 2020 Saturday Shop A In Women 20
4 Visitors 1 February 2020 Saturday Shop A Out Children 15
5 Visitors 1 February 2020 Saturday Shop A Out Men 90
6 Visitors 1 February 2020 Saturday Shop A Out Total 115
7 Visitors 1 February 2020 Saturday Shop A Out Women 10
8 Visitors 1 February 2020 Saturday Shop B In Children 40
9 Visitors 1 February 2020 Saturday Shop B In Men 20
10 Visitors 1 February 2020 Saturday Shop B In Total 70
...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With