I have a DataFrame and want to apply a function to a set of its columns, row by row. Something like:
data[["A","B","C","D","E"]].apply(some_func, axis=1)
In the some_func function, the first step is extracting all the column values into separate variables.
def some_func(x):
    a, b, c, d, e = x  # or x.tolist()
    # Some more processing
To reproduce the result, use:
import pandas as pd
x = pd.Series([1,2,3,4,5], index=["A","B","C","D","E"])
Now, my question is, why does
%%timeit
a,b,c,d,e = x.tolist()
Output:
538 ns ± 2.82 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
perform better than
%%timeit
a,b,c,d,e = x
Output:
1.61 µs ± 15.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Let's define two functions and inspect them with dis:
from dis import dis
from pandas import Series
x = Series([1,2,3,4,5], index=["A","B","C","D","E"])
def a():
    a, b, c, d, e = x.tolist()
def b():
    a, b, c, d, e = x
dis(a)
dis(b)
Executing the above will yield:
# dis(a)
  7           0 LOAD_GLOBAL              0 (x)
              2 LOAD_METHOD              1 (tolist)
              4 CALL_METHOD              0
              6 UNPACK_SEQUENCE          5
              8 STORE_FAST               0 (a)
             10 STORE_FAST               1 (b)
             12 STORE_FAST               2 (c)
             14 STORE_FAST               3 (d)
             16 STORE_FAST               4 (e)
             18 LOAD_CONST               0 (None)
             20 RETURN_VALUE
# dis(b)
 10           0 LOAD_GLOBAL              0 (x)
              2 UNPACK_SEQUENCE          5
              4 STORE_FAST               0 (a)
              6 STORE_FAST               1 (b)
              8 STORE_FAST               2 (c)
             10 STORE_FAST               3 (d)
             12 STORE_FAST               4 (e)
             14 LOAD_CONST               0 (None)
             16 RETURN_VALUE
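The two listings can also be compared programmatically. The following is a minimal sketch using dis.get_instructions; the function names unpack_via_tolist and unpack_directly and the helper opnames are made up for illustration, and the exact opcode names vary between CPython versions (for example, LOAD_METHOD and CALL_METHOD were replaced in newer releases), so no particular output is assumed here.

```python
import dis


def unpack_via_tolist():
    # `x` is never defined here: the functions are only disassembled, not called.
    a, b, c, d, e = x.tolist()


def unpack_directly():
    a, b, c, d, e = x


def opnames(func):
    """Return the opcode names of a function, in order (illustrative helper)."""
    return [ins.opname for ins in dis.get_instructions(func)]


via_tolist_ops = opnames(unpack_via_tolist)
direct_ops = opnames(unpack_directly)

# Both functions rely on the same UNPACK_SEQUENCE instruction;
# the tolist() version only adds the method lookup and call in front of it.
print(len(via_tolist_ops), via_tolist_ops)
print(len(direct_ops), direct_ops)
```

Both lists contain UNPACK_SEQUENCE, so the instruction count alone does not explain the timing difference; what matters is the type of object that instruction receives.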
From the above, it seems that, if anything, function (a) has more instructions. So why is it faster?
As explained in this answer, looking at the implementation of UNPACK_SEQUENCE, one can see that there are some special cases: when the right-hand side object is a Python tuple or list whose length is equal to the number of left-hand side variables, the items are taken from it directly.
So, x.tolist() under the hood uses a numpy method to create a list from the array data, which allows Python to use the optimization for this special case (you can check how performance deteriorates by changing the number of arguments on the left-hand side, e.g. a, *b = range(3) will work, but will be slower than a, b, c = range(3)).
When the right-hand side object is not a Python tuple or a list, then Python iterates over the contents of the object, which appears to be less efficient.
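To see the generic path in action without pandas, here is a small sketch. CountingIterable is a hypothetical class invented for this illustration (it is not part of pandas or numpy); it only records how often Python asks it for an iterator. The timings are machine-dependent, so none are quoted.

```python
import timeit


class CountingIterable:
    """Hypothetical non-list container that counts calls to __iter__."""

    def __init__(self, values):
        self._values = list(values)
        self.iter_calls = 0

    def __iter__(self):
        self.iter_calls += 1
        return iter(self._values)


obj = CountingIterable([1, 2, 3, 4, 5])
a, b, c, d, e = obj  # not a list or tuple, so Python falls back to iteration
calls_after_one_unpack = obj.iter_calls
print(calls_after_one_unpack)

# Compare unpacking a plain list with unpacking the custom iterable.
as_list = [1, 2, 3, 4, 5]
timed_obj = CountingIterable(as_list)
t_list = timeit.timeit(
    "a, b, c, d, e = as_list", globals={"as_list": as_list}, number=200_000
)
t_iter = timeit.timeit(
    "a, b, c, d, e = timed_obj", globals={"timed_obj": timed_obj}, number=200_000
)
print(f"list: {t_list:.4f} s, custom iterable: {t_iter:.4f} s")
```

A pandas Series is in the same situation as the custom class: it is neither a list nor a tuple, so unpacking it goes through its iterator instead of the fast path.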
In practice, if you really want the best performance (with the current versions of the modules), you can swap x.tolist() with x._values.tolist(), which should give about a 10-15% boost in performance (you're just removing one layer of the pandas-to-numpy call and doing it directly). The caveat is that these types of optimizations are sensitive to what's happening in lower-level code, so there is no guarantee that the performance gains will be there in future Python/library combinations.
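If you want to check this on your own setup outside of IPython, a rough sketch with the standard timeit module could look like the following. It assumes pandas is installed; note that _values is a private attribute and may change between pandas releases. The labels and the number of loops are arbitrary choices, and the results depend on your Python and library versions.

```python
import timeit

import pandas as pd

x = pd.Series([1, 2, 3, 4, 5], index=["A", "B", "C", "D", "E"])

# The three variants discussed above; `_values` is private pandas API.
statements = {
    "x": "a, b, c, d, e = x",
    "x.tolist()": "a, b, c, d, e = x.tolist()",
    "x._values.tolist()": "a, b, c, d, e = x._values.tolist()",
}

timings = {}
for label, stmt in statements.items():
    # Take the best of 5 repeats to reduce noise from other processes.
    timings[label] = min(
        timeit.repeat(stmt, globals={"x": x}, number=100_000, repeat=5)
    )

for label, seconds in timings.items():
    print(f"{label:<20} {seconds / 100_000 * 1e9:8.1f} ns per loop")
```

All three variants unpack the same values, so which one to use is purely a question of speed versus relying on a private attribute.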