
Python pivot table: margins=True not summing correctly

I have the below code:

import pandas as pd
df=pd.read_csv("https://www.dropbox.com/s/90y07129zn351z9/test_data.csv?dl=1", encoding="latin-1")
pvt_received = df.pivot_table(index=['site'], values=['received', 'sent'], aggfunc={'received': 'count', 'sent': 'count'}, fill_value=0, margins=True)
pvt_received['to_send'] = pvt_received['received'] - pvt_received['sent']
column_order = ['received', 'sent', 'to_send']
# reindex_axis was removed in pandas 1.0; reindex(columns=...) does the same job
pvt_received_ordered = pvt_received.reindex(columns=column_order)
pvt_received_ordered.to_csv("test_pivot.csv")
table_to_send = pd.read_csv('test_pivot.csv', encoding='latin-1')
table_to_send.rename(columns={'site':'Site','received':'Date Received','sent':'Date Sent','to_send':'Date To Send'}, inplace=True)
table_to_send.set_index('Site', inplace=True)
table_to_send

Which generates this table:

      Date Received       Date Sent       Date To Send
Site            
2         32.0             27.0           5.0
3         20.0             17.0           3.0
4         33.0             31.0           2.0
5         40.0             31.0           9.0
All       106.0            106.0          0.0

But the margins=True parameter is not giving the correct total for each column. For instance, Date Received should be 125 instead of 106, Date Sent should be 106 (which is correct), and Date To Send should be 19 instead of 0.0. Question: What do I need to change to get the correct numbers? Also, there is no All column that sums each row. Thanks a lot in advance.

asked Feb 17 '26 18:02 by MGB.py


1 Answer

It seems from your code that you create the Date To Send column after the pivot table is constructed, so its All value is just the difference of the two margins: 106.0 - 106.0 = 0. The margins themselves are wrong because of the default dropna=True: before the All row is computed, any row with a NaN (or NaT) in any of the columns used by the pivot is dropped, so the margin only counts rows where both received and sent are present. Setting dropna=False should fix this problem.
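A minimal sketch of the dropna effect, using a hypothetical three-row DataFrame (not the data from the question) in which one row has no sent value. With the default dropna=True the All row is built only from the complete rows, so it under-counts received; with dropna=False it agrees with the per-site rows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the second row for site 2 has no 'sent' value
df = pd.DataFrame({
    "site": [2, 2, 3],
    "received": ["2018-01-01", "2018-01-02", "2018-01-03"],
    "sent": ["2018-01-04", np.nan, "2018-01-05"],
})

# Default dropna=True: rows with any NaN are dropped before the margins are computed
default = df.pivot_table(index="site", values=["received", "sent"],
                         aggfunc="count", margins=True)

# dropna=False: the All row counts every non-null value in each column
fixed = df.pivot_table(index="site", values=["received", "sent"],
                       aggfunc="count", margins=True, dropna=False)
fixed["to_send"] = fixed["received"] - fixed["sent"]

print(default.loc["All", ["received", "sent"]])  # received=2, sent=2
print(fixed.loc["All", ["received", "sent", "to_send"]])  # received=3, sent=2, to_send=1
```

The per-site rows are the same in both tables; only the All row changes.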

I refactored your code to convert the received and sent columns to datetime format before creating the pivot table and the to_send column.

import pandas as pd

df2 = pd.read_csv(
    "https://www.dropbox.com/s/90y07129zn351z9/test_data.csv?dl=1",
    encoding="latin-1")
# Parse both date columns; missing entries become NaT
df2['received'] = pd.to_datetime(df2['received'])
df2['sent'] = pd.to_datetime(df2['sent'])
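After the conversion, a missing date shows up as NaT, and count skips NaT just as it skips NaN, so the per-site counts still reflect only the dates that are actually present. A toy illustration with made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical raw column: two date strings and one missing value
raw = pd.Series(["2018-01-05", np.nan, "2018-01-07"])
parsed = pd.to_datetime(raw)

print(parsed.isna().tolist())  # [False, True, False]
print(parsed.count())          # 2, because NaT is not counted
```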

Then create the pivot table as originally intended, this time passing dropna=False.

pvt_received = df2.pivot_table(index=['site'], values=['received', 'sent'],
                               aggfunc='count', margins=True, dropna=False)

# to_send is derived from the counts, so its All row now uses the corrected margins
pvt_received['to_send'] = pvt_received['received'] - pvt_received['sent']
# 'site' is the index, not a column, so it is renamed with rename_axis
pvt_received = (pvt_received
                .rename(columns={'received': 'Date Received',
                                 'sent': 'Date Sent',
                                 'to_send': 'Date To Send'})
                .rename_axis('Site'))
pvt_received

        Date Received   Date Sent   Date To Send
Site            
2       32              27          5
3       20              17          3
4       33              31          2
5       40              31          9
All     125             106         19
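As a quick cross-check of the All row: for plain counts, each margin should equal the sum of the per-site rows above it. The sketch below hard-codes the per-site counts from the table (an assumption so the example runs without downloading the CSV) and recomputes the totals.

```python
import pandas as pd

# Per-site counts copied from the output table (hard-coded for illustration)
counts = pd.DataFrame(
    {"Date Received": [32, 20, 33, 40], "Date Sent": [27, 17, 31, 31]},
    index=pd.Index([2, 3, 4, 5], name="Site"),
)
counts["Date To Send"] = counts["Date Received"] - counts["Date Sent"]

# Column totals: what the All row should contain
totals = counts.sum()
print(totals.tolist())  # [125, 106, 19]
```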
answered Feb 20 '26 07:02 by KT12