I have a large df of information that I want to add a 'Total' column to. However, when I use the sum() method the resulting column is populated only with 0 values.
Here is a slice of my df:
COL NAME0 COL NAME1 COL NAME2 COL NAME3 COL NAME4
0 Alabama 4.099099 4.090001 2.042345 NaN
1 Alaska 1.396396 1.390001 1.000000 1.000000
2 Arizona 4.189189 NaN 2.003257 1.537777
3 Arkansas 2.927928 2.920001 2.208723 NaN
4 California 3.378378 3.780001 1.754930 2.012395
To add the Total column, I did the following:
df['Total'] = df.sum(axis=1)
which created the following df:
COL NAME0 COL NAME1 COL NAME2 COL NAME3 COL NAME4 Total
0 Alabama 4.099099 4.090001 2.042345 NaN 0.0
1 Alaska 1.396396 1.390001 1.000000 1.000000 0.0
2 Arizona 4.189189 NaN 2.003257 1.537777 0.0
3 Arkansas 2.927928 2.920001 2.208723 NaN 0.0
4 California 3.378378 3.780001 1.754930 2.012395 0.0
I then tried a different approach, to add each column to the Total column one at a time:
for col in df:
df['Total'] = df['Total'] + df[col]
However, this results in the Total column being populated with only NaN values.
My guess is the existing NaN values in df is causing this behavior in the Total column. This seems to be a simple task so if there's something I'm overlooking let me know. Any suggestions/ solution would be greatly appreciated.
I suggest filter out first column with strings and convert all another columns to floats:
df['Total'] = df.iloc[:, 1:].astype(float).sum(axis=1)
print (df.dtypes)
COL NAME0 object
COL NAME1 object
COL NAME2 object
COL NAME3 object
COL NAME4 object
dtype: object
df['Total'] = df.sum(axis=1)
print (df)
COL NAME0 COL NAME1 COL NAME2 COL NAME3 COL NAME4 Total
0 Alabama 4.099099 4.090001 2.042345 NaN 0.0
1 Alaska 1.396396 1.390001 1.000000 1.000000 0.0
2 Arizona 4.189189 NaN 2.003257 1.537777 0.0
3 Arkansas 2.927928 2.920001 2.208723 NaN 0.0
4 California 3.378378 3.780001 1.754930 2.012395 0.0
df['Total'] = df.iloc[:, 1:].astype(float).sum(axis=1)
print (df)
COL NAME0 COL NAME1 COL NAME2 COL NAME3 COL NAME4 Total
0 Alabama 4.099099 4.090001 2.042345 NaN 10.231445
1 Alaska 1.396396 1.390001 1.000000 1.000000 4.786397
2 Arizona 4.189189 NaN 2.003257 1.537777 7.730223
3 Arkansas 2.927928 2.920001 2.208723 NaN 8.056652
4 California 3.378378 3.780001 1.754930 2.012395 10.925704
If need processing columns later:
df = df.astype(dict.fromkeys(df.columns[1:], 'float'))
print (df.dtypes)
COL NAME0 object
COL NAME1 float64
COL NAME2 float64
COL NAME3 float64
COL NAME4 float64
dtype: object
df['Total'] = df.sum(axis=1)
print (df)
COL NAME0 COL NAME1 COL NAME2 COL NAME3 COL NAME4 Total
0 Alabama 4.099099 4.090001 2.042345 NaN 10.231445
1 Alaska 1.396396 1.390001 1.000000 1.000000 4.786397
2 Arizona 4.189189 NaN 2.003257 1.537777 7.730223
3 Arkansas 2.927928 2.920001 2.208723 NaN 8.056652
4 California 3.378378 3.780001 1.754930 2.012395 10.925704
you could also select the columns by dtypes :
df['Total'] = df.select_dtypes(exclude=['object']).sum(axis=1)
print(df)
COL_NAME0 COL_NAME1 COL_NAME2 COL_NAME3 COL_NAME4 Total
0 Alabama 4.099099 4.090001 2.042345 NaN 10.231445
1 Alaska 1.396396 1.390001 1.000000 1.000000 4.786397
2 Arizona 4.189189 NaN 2.003257 1.537777 7.730223
3 Arkansas 2.927928 2.920001 2.208723 NaN 8.056652
4 California 3.378378 3.780001 1.754930 2.012395 10.925704
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With