How to determine if numpy.vectorize() is needed?

Tags: python, numpy

Is there a way to determine at runtime if a function requires numpy.vectorize() to behave as expected?

For background, I ask this because I'm using Numpy in a program to calculate phase diagrams from thermodynamic functions available in the literature (CALPHAD-based). For a given temperature, one evaluates the free energy functions and determines the common tangent curves touching the concave-up regions (second derivative > 0) to define the composition ranges of phase coexistence. For this, it was convenient to define the second derivative function directly. All was going well with real free energy functions (whose derivatives are not hard to obtain) until I tried to test with a simple parabolic free energy, which has a constant second derivative. This blew up my algorithm, since I had not anticipated that a function which never uses its array argument simply returns a scalar rather than a broadcast array.

The difficulty comes down to this behavior:

import numpy as np

def f(x):
    return x * x        # genuinely depends on x

def g(x):
    return 3.0          # constant: never touches x

def h(x):
    return 0 * x + 3.0  # constant, but forced through x

def i(x):
    return x - x + 3.0  # same idea, avoiding the multiply

x = np.linspace(1.0, 5.0, 5)

Running in IPython 3.3.2 results in these outputs:

f(x) -> array([ 1., 4., 9., 16., 25.]) -- what I expected

g(x) -> 3.0 (note: a single float, not an ndarray) -- not naively expected

h(x) -> array([ 3., 3., 3., 3., 3.]) -- OK, fooled the broadcasting by having x do something

i(x) -> array([ 3., 3., 3., 3., 3.]) -- same as h(x), avoiding the multiply, but with potential roundoff issues

Now I could use

gv = np.vectorize(g)

and get

gv(x) -> array([ 3., 3., 3., 3., 3.]) -- expected behavior

If my program is to (eventually) accept arbitrary user-entered free energy functions, this will cause problems unless all users understand numpy's broadcasting behavior. Alternatively, I could reflexively np.vectorize everything to prevent this, but that carries a real cost for functions that would otherwise "just work" in numpy.

That is, using %timeit in IPython,

h(x) -> 100000 loops, best of 3: 3.45 µs per loop

If I vectorize h(x) needlessly (i.e. hv = np.vectorize(h)), I get

hv(x) -> 10000 loops, best of 3: 43.2 µs per loop

So, needlessly vectorizing carries a huge penalty (roughly 40 extra microseconds for 5 function evaluations).
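
For what it's worth, a similar comparison can be reproduced outside IPython with the standard timeit module (just a sketch; the absolute numbers will of course differ from machine to machine):

import timeit
import numpy as np

def h(x):
    return 0 * x + 3.0

hv = np.vectorize(h)          # needlessly vectorized version
x = np.linspace(1.0, 5.0, 5)

# Per-call cost in microseconds; expect the vectorized call to be
# roughly an order of magnitude slower for a tiny array like this.
n = 10000
print(timeit.timeit(lambda: h(x), number=n) / n * 1e6, "us per call (plain)")
print(timeit.timeit(lambda: hv(x), number=n) / n * 1e6, "us per call (vectorized)")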

I guess I could do an initial test by evaluating the function on a small ndarray to see whether the return type is an array or a float, and then define a new function if it is a float, like:

def gv(x):
   return(g(x)+0.0*x)

That just seems like a horrible kludge.
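
For what it's worth, that probing idea can be packaged as a small helper that only wraps a function when its return value does not already broadcast. This is only a sketch: ensure_broadcasts is a made-up name, not part of numpy, and the fallback reuses the "+ 0.0*x" trick rather than np.vectorize to keep it cheap.

import numpy as np

def g(x):
    return 3.0                      # constant function that ignores x, as above

def ensure_broadcasts(func):
    # Probe the function on a small array; wrap it only if the result
    # does not come back with the same shape as the input.
    probe = np.linspace(1.0, 2.0, 3)
    result = func(probe)
    if isinstance(result, np.ndarray) and result.shape == probe.shape:
        return func                 # already broadcasts, no wrapping penalty
    def wrapped(x):
        return func(x) + 0.0 * x    # force an array-shaped result
    return wrapped

gb = ensure_broadcasts(g)
print(gb(np.linspace(1.0, 5.0, 5)))  # [3. 3. 3. 3. 3.]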

So - is there a better way to 'fool' numpy into efficiently broadcasting in this case?

1 Answer

To solve the problem shown: if you want a new array, return one explicitly:

def g(x):
    return np.ones_like(x)*3

or if you want to set all elements in an array to 3 in place:

def g(x):
    x[:] = 3

Note there is no return statement here, as you are simply updating the array x in place so that all of its elements become 3.

The issue with def g(x): return(3) as shown is that nothing inside the function refers to x (or to numpy) at all; it simply returns the scalar 3 for any input. Writing x = 3 inside the function would run into a similar problem, because it only rebinds the local name x to the integer 3 instead of modifying the numpy array. The statement x[:] = 3, on the other hand, is slice assignment on the numpy.ndarray: it writes 3 into every element of the existing array (through a view of it) rather than merely updating what the name x points to.
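
To make the distinction concrete, here is a small illustration of the three cases (a sketch; the function names are arbitrary):

import numpy as np

x = np.linspace(1.0, 5.0, 5)

def g_new(x):
    return np.ones_like(x) * 3      # builds and returns a new array

def g_inplace(x):
    x[:] = 3                        # slice assignment writes into the existing array

def g_rebind(x):
    x = 3                           # only rebinds the local name; caller's array untouched

print(g_new(x))                     # [3. 3. 3. 3. 3.]
g_rebind(x)
print(x)                            # still [1. 2. 3. 4. 5.]
g_inplace(x)
print(x)                            # now [3. 3. 3. 3. 3.]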
