
What's the difference between set_value and = in pandas?

Tags: python, pandas

When writing to a DataFrame in pandas, there are a couple of ways to do it, as provided by this answer and this answer.

We have the method of

  • df.set_value(r, c, some_value), and the method of
  • df.iloc[r][c] = some_value.

What is the difference? Which is faster? Is either a copy?

user2723494 asked Oct 19 '25 12:10


1 Answer

The difference is that set_value returns an object, while the assignment operator writes the value into the existing DataFrame object.

After calling set_value you will potentially have two DataFrame objects (this does not necessarily mean you'll have two copies of the data, since DataFrame objects can "reference" one another), while the assignment operator changes data in the single existing DataFrame object.

set_value appears to be faster, probably because it is optimized for setting a single scalar, while the chained assignment first builds an intermediate slice of the data (df[10]) and then writes into it:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df=pd.DataFrame(np.random.rand(100,100))

In [4]: %timeit df[10][10]=7
The slowest run took 6.43 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 89.5 µs per loop

In [5]: %timeit df.set_value(10,10,11)
The slowest run took 10.89 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 3.94 µs per loop
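
Note that set_value was deprecated in pandas 0.21.0 and removed in 1.0, with the .at and .iat accessors as the suggested replacements, so the timing above can't be rerun as-is on a current install. Below is a rough sketch of a comparable measurement using the standard timeit module instead of %timeit. The helper names (write_with_at and so on) and the loop count are my own choices for illustration, and the numbers will vary by machine and pandas version, so I'm not quoting any:

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(100, 100))


def write_with_at():
    # Label-based scalar setter (hypothetical helper name).
    df.at[10, 10] = 7.0


def write_with_iat():
    # Position-based scalar setter.
    df.iat[10, 10] = 7.0


def write_with_loc():
    # General-purpose label-based setter, handles far more than scalars.
    df.loc[10, 10] = 7.0


n = 1000  # arbitrary number of repetitions
for fn in (write_with_at, write_with_iat, write_with_loc):
    seconds = timeit.timeit(fn, number=n)
    print(f"{fn.__name__}: {seconds / n * 1e6:.2f} µs per call")
```

All three write to the same cell in a single indexing step, so none of them goes through the intermediate slice that df[10][10] = 7 creates.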

The result of set_value may be a copy, but the documentation is not really clear (to me) on this:

Returns:

frame : DataFrame

If label pair is contained, will be reference to calling DataFrame, otherwise a new object
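
On pandas versions where set_value is no longer available (it was removed in 1.0), the copy question can at least be checked for the .at and .iat accessors. This is only a small sketch under my own assumptions: the 3x3 frame and the name snapshot are made up for the example, and it says nothing about what set_value itself returned on older versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((3, 3)))
snapshot = df.copy()  # explicit, independent copy of the data

df.at[1, 1] = 5.0     # label-based scalar write, mutates df itself
df.iat[2, 2] = 9.0    # position-based scalar write, also mutates df

print(df.at[1, 1], df.iat[2, 2])                # the writes landed in df
print(snapshot.at[1, 1], snapshot.iat[2, 2])    # the copy is untouched
```

The accessor assignments don't return a new frame; they change df directly, and the only second object here is the one created explicitly with copy().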

NirIzr answered Oct 21 '25 03:10


