I have the below code:
import pandas as pd
df=pd.read_csv("https://www.dropbox.com/s/90y07129zn351z9/test_data.csv?dl=1", encoding="latin-1")
pvt_received=df.pivot_table(index=['site'], values = ['received','sent'], aggfunc = { 'received' : 'count' ,'sent': 'count'}, fill_value=0, margins=True)
pvt_received['to_send']=pvt_received['received']-pvt_received['sent']
column_order = ['received', 'sent','to_send']
pvt_received_ordered = pvt_received.reindex(columns=column_order)  # reindex_axis was removed in pandas 1.0
pvt_received_ordered.to_csv("test_pivot.csv")
table_to_send = pd.read_csv('test_pivot.csv', encoding='latin-1')
table_to_send.rename(columns={'site':'Site','received':'Date Received','sent':'Date Sent','to_send':'Date To Send'}, inplace=True)
table_to_send.set_index('Site', inplace=True)
table_to_send
Which generates this table:
      Date Received  Date Sent  Date To Send
Site
2              32.0       27.0           5.0
3              20.0       17.0           3.0
4              33.0       31.0           2.0
5              40.0       31.0           9.0
All           106.0      106.0           0.0
However, the margins=True parameter does not give the correct total for each column. For instance, Date Received should be 125 instead of 106, Date Sent should be 106 (which is correct), and Date To Send should be 19 instead of 0.0. Question: what should I change to get the correct numbers? Also, an All column that sums each row is missing. Thanks a lot in advance.
It seems from your code that you create Date To Send after the pivot table is constructed, so the All row just gives you the result of 106.0 - 106.0. Also, the way margin values are calculated with the default dropna=True means that any row containing a NaN or NaT is dropped before the totals are computed. Setting dropna=False should fix this problem.
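To make the effect of dropping incomplete rows concrete, here is a minimal sketch on made-up data (the five rows below are hypothetical, not taken from test_data.csv). It does not call pivot_table itself; it only mimics by hand the row-dropping described above, so treat it as an illustration rather than a reproduction of pandas internals.

```python
import pandas as pd

# Hypothetical toy data: five received items, only two of them sent.
df = pd.DataFrame({
    "site": [2, 2, 3, 3, 3],
    "received": ["2017-01-01", "2017-01-02", "2017-01-03",
                 "2017-01-04", "2017-01-05"],
    "sent": ["2017-01-02", None, "2017-01-04", None, None],
})

# Non-null count per column over every row (the totals the question expects).
full_counts = df[["received", "sent"]].count()

# Non-null count after discarding rows with any missing value, mimicking
# by hand what the margins do when dropna=True.
complete_counts = (
    df.dropna(subset=["received", "sent"])[["received", "sent"]].count()
)

print(full_counts.to_dict())
print(complete_counts.to_dict())
```

In this toy case the received total falls from 5 to 2 once the incomplete rows are discarded, while the sent total stays at 2. That is the same pattern as 125 versus 106 in the question.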
I refactored your code to convert the received and sent columns to datetime before creating the pivot table and the to_send column.
df2 = pd.read_csv(
    "https://www.dropbox.com/s/90y07129zn351z9/test_data.csv?dl=1",
    encoding="latin-1")
df2['received'] = pd.to_datetime(df2['received'])
df2['sent'] = pd.to_datetime(df2['sent'])
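As a small aside on the conversion step, this sketch (on a made-up three-element series, not the real file) shows that pd.to_datetime turns a missing entry into NaT and that count() skips it, which is why the sent count can be lower than the received count.

```python
import pandas as pd

# Hypothetical values: the middle entry stands for an item not yet sent.
sent = pd.to_datetime(pd.Series(["2017-01-02", None, "2017-01-04"]))

n_missing = int(sent.isna().sum())  # entries that became NaT
n_counted = int(sent.count())       # count() ignores NaT

print(n_missing, n_counted)
```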
Then create the pivot table as originally intended.
pvt_received = df2.pivot_table(index=['site'], values=['received', 'sent'],
                               aggfunc='count', margins=True, dropna=False)
pvt_received['to_send'] = pvt_received['received'] - pvt_received['sent']
pvt_received.rename(columns={'received': 'Date Received',
                             'sent': 'Date Sent',
                             'to_send': 'Date To Send'},
                    inplace=True)
pvt_received.index.name = 'Site'  # 'site' is the index, so rename(columns=...) does not touch it
pvt_received
      Date Received  Date Sent  Date To Send
Site
2                32         27             5
3                20         17             3
4                33         31             2
5                40         31             9
All             125        106            19
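If you would rather not rely on how margins are computed, one alternative is to build the totals row yourself. The sketch below is my own variant, not part of the code above, and it uses hypothetical toy data: count per site with groupby, derive to_send, then append an All row as the column sums.

```python
import pandas as pd

# Hypothetical toy data: five received items, only two of them sent.
df = pd.DataFrame({
    "site": [2, 2, 3, 3, 3],
    "received": ["2017-01-01", "2017-01-02", "2017-01-03",
                 "2017-01-04", "2017-01-05"],
    "sent": ["2017-01-02", None, "2017-01-04", None, None],
})

# Non-null counts per site; missing values are simply not counted.
counts = df.groupby("site")[["received", "sent"]].count()
counts["to_send"] = counts["received"] - counts["sent"]

# Explicit totals row: the sum of each column over all sites.
counts.loc["All"] = counts.sum()

print(counts)
```

Because the All row is the sum of the per-site rows, its to_send value always equals received minus sent for the totals (here 5 - 2 = 3).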