When does pandas need to use .values to do manipulations?

Tags: python, pandas
I have a pandas DataFrame where I need to do some simple calculations on particular data points, but the result was coming out as NaN.

In this simplified version of what I was doing, the first attempt works fine, but the second produces NaN:

import pandas as pd
import numpy as np

df_data = {'Location' : ['Denver', 'Boulder', 'San Diego', 'Reno', 'Portland',
    'Eugene', 'San Francisco'], 'State' : ['co', 'co', 'ca', 'nv',
    'or', 'or', 'ca'], 'Rando_num': [18.134, 5, 34, 11, 72, 42, 9],
    'Other_num': [11, 26, 55, 134, 88, 4, 22]}
df = pd.DataFrame(data = df_data)
df['Sum'] = np.nan

print(df.loc[df['Location'] == 'Denver', 'Rando_num'])
print(df.loc[df['Location'] == 'Denver', 'Other_num'])

#This works
df.loc[df['Location'] == 'Denver', 'Sum'] = (
        df.loc[df['Location'] == 'Denver', 'Rando_num'] +
        df.loc[df['Location'] == 'Denver', 'Other_num'])

print(df)

#This doesn't
df.loc[df['Location'] == 'Boulder', 'Sum'] = (
        df.loc[df['Location'] == 'Denver', 'Rando_num'] +
        df.loc[df['Location'] == 'Reno', 'Rando_num'])

print(df)

Using df.loc to pick out the specific data points works fine when both operands come from Denver, but not when they come from two different locations. I don't understand why. Adding .values fixes the problem:

df.loc[df['Location'] == 'Boulder', 'Sum'] = (
        df.loc[df['Location'] == 'Denver', 'Rando_num'].values +
        df.loc[df['Location'] == 'Reno', 'Rando_num'].values)

Does the community know of cases where an operation like this needs .values to work? Put another way, what is fundamentally different once .values is added?

If it helps, all elements are floats and each df.loc selection always returns a single value.

asked Oct 29 '25 19:10 by Tom


1 Answer

1st Case

df.loc[df['Location'] == 'Denver', 'Sum'] = (
        df.loc[df['Location'] == 'Denver', 'Rando_num'] +
        df.loc[df['Location'] == 'Denver', 'Other_num'])

Notice that the same row selection is used on both sides, so both operands carry the same index label. pandas adds Series by matching index labels, and when the labels match the addition works as expected.
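A quick way to confirm this is to compare the index of the two selections. This is a minimal sketch on a trimmed-down copy of the question's frame; the names `mask`, `left`, `right`, `same_index` and `total` are only for illustration.

```python
import pandas as pd

# Trimmed-down copy of the frame from the question.
df = pd.DataFrame({
    'Location': ['Denver', 'Boulder', 'San Diego', 'Reno'],
    'Rando_num': [18.134, 5, 34, 11],
    'Other_num': [11, 26, 55, 134],
})

mask = df['Location'] == 'Denver'
left = df.loc[mask, 'Rando_num']
right = df.loc[mask, 'Other_num']

# Both selections come from the same row, so they share index label 0.
same_index = left.index.equals(right.index)
total = left + right

print(same_index)
print(total)
```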

2nd Case

df.loc[df['Location'] == 'Boulder', 'Sum'] = (
        df.loc[df['Location'] == 'Denver', 'Rando_num'] +
        df.loc[df['Location'] == 'Reno', 'Rando_num'])

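Here the two selections carry different index labels, as shown below. pandas aligns Series on their index labels before adding, so a label that exists on only one side is paired with NaN, and a number plus NaN is NaN. Addition only produces a value where both sides have the same label.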

>>> df.loc[df['Location'] == 'Denver', 'Rando_num']
0    18.134
Name: Rando_num, dtype: float64

>>> df.loc[df['Location'] == 'Reno', 'Rando_num']
3    11.0
Name: Rando_num, dtype: float64

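To make the alignment explicit, only labels 0 and 3 appear in either operand, so the result has exactly those two rows:

Index    Left (Denver)    Right (Reno)    Sum
0        18.134           NaN             NaN
3        NaN              11.0            NaN

Neither operand has label 1, so nothing lines up with the Boulder row when the result is assigned.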
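The same behaviour can be reproduced without the full frame. The sketch below builds two one-element Series by hand with the labels from the question (0 and 3), then assigns their sum into a small hypothetical frame called `target`, whose `Sum` column starts at 0.0 so that the NaN written by the assignment is visible.

```python
import pandas as pd

# Hand-built stand-ins for the Denver and Reno selections.
denver = pd.Series([18.134], index=[0], name='Rando_num')
reno = pd.Series([11.0], index=[3], name='Rando_num')

# Addition aligns on the union of the labels; a label missing on
# either side yields NaN in the result.
aligned_sum = denver + reno
print(aligned_sum)

# Assigning a Series through .loc aligns a second time, against the
# selected target rows. The target is label 1 (Boulder), which the
# right-hand side does not contain, so NaN is written.
target = pd.DataFrame({
    'Location': ['Denver', 'Boulder', 'San Diego', 'Reno'],
    'Sum': [0.0, 0.0, 0.0, 0.0],
})
target.loc[target['Location'] == 'Boulder', 'Sum'] = aligned_sum
print(target)
```

So even if one of the two rows in the sum had held a real number, it would not have reached the Boulder row, because the assignment is matched by label as well.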

3rd Case

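With .values, each selection becomes a plain NumPy array with no index, so the addition is done by position rather than by label:

>>> a = df.loc[df['Location'] == 'Denver', 'Rando_num'].values
>>> a
array([18.134])
>>> b = df.loc[df['Location'] == 'Reno', 'Rando_num'].values
>>> b
array([11.])
>>> a + b
array([29.134])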
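Since the question says each selection always holds a single value, there are a couple of other ways to drop the index before adding. The sketch below is one possible approach, not the only one: `.to_numpy()` is the accessor the pandas documentation recommends over `.values`, and `.item()` extracts a plain scalar. The columns `Sum_array` and `Sum_scalar` are hypothetical names used only to keep the two results apart.

```python
import pandas as pd

df = pd.DataFrame({
    'Location': ['Denver', 'Boulder', 'San Diego', 'Reno'],
    'Rando_num': [18.134, 5, 34, 11],
})
df['Sum_array'] = float('nan')
df['Sum_scalar'] = float('nan')

denver = df.loc[df['Location'] == 'Denver', 'Rando_num']
reno = df.loc[df['Location'] == 'Reno', 'Rando_num']
boulder = df['Location'] == 'Boulder'

# Option 1: convert to NumPy arrays, which carry no index labels,
# so the addition and the assignment are positional.
df.loc[boulder, 'Sum_array'] = denver.to_numpy() + reno.to_numpy()

# Option 2: extract scalars. .item() raises ValueError unless the
# selection holds exactly one element, which guards the
# single-value assumption.
df.loc[boulder, 'Sum_scalar'] = denver.item() + reno.item()

via_array = df.loc[1, 'Sum_array']
via_scalar = df.loc[1, 'Sum_scalar']
print(df)
```

The scalar form fails loudly if a location ever matches zero or several rows, whereas the array form would either raise on a length mismatch or broadcast, depending on the shapes involved.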

answered Oct 31 '25 08:10 by Vishnudev