
NumPy: 2D array from a list of arrays and scalars

I need to create a 2D numpy array from a list of 1D arrays and scalars so that the scalars are replicated to match the length of the 1D arrays.

Example of desired behaviour

>>> x = np.ones(5)
>>> something([x, 0, x])
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.]])

I know that the array elements of the list will always have the same length (shape), so I can do it "by hand" with something like this:

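import numpy as np

def something(lst):
    # Find the common length from the first array in the list
    for e in lst:
        if isinstance(e, np.ndarray):
            l = len(e)
            break
    tmp = []
    for e in lst:
        if isinstance(e, np.ndarray):
            tmp.append(e)
        else:
            # Replicate the scalar into a new array of that length
            tmp.append(np.empty(l))
            tmp[-1][:] = e
    return np.array(tmp)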

Is there a ready-made solution hidden somewhere in numpy? If not, is there a better (e.g. more general, more reliable, faster) solution than the one above?

zegkljan asked Oct 16 '25 17:10



1 Answer

In [179]: np.column_stack(np.broadcast(x, 0, x))
Out[179]: 
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.]])

or

In [187]: np.row_stack(np.broadcast_arrays(x, 0, x))
Out[187]: 
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.]])
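
For reuse, the second variant can be wrapped in a small helper. This is only a minimal sketch: the name stack_broadcast is made up here and is not a NumPy function, np.vstack is used in place of np.row_stack (which is an alias for it), and the input is assumed to be scalars and arrays with mutually broadcastable shapes.

```python
import numpy as np

def stack_broadcast(items):
    """Stack 1D arrays and scalars as rows, repeating scalars to full length."""
    # broadcast_arrays returns views that all share one common shape
    rows = np.broadcast_arrays(*items)
    # vstack copies those views into a new 2D array
    return np.vstack(rows)

x = np.ones(5)
result = stack_broadcast([x, 0, x])
print(result.shape)  # (3, 5)
```

Because np.vstack copies the data, the result can be modified without touching x.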

Using np.broadcast is faster than np.broadcast_arrays:

In [195]: %timeit np.column_stack(np.broadcast(*[x, 0, x]*10))
10000 loops, best of 3: 46.4 µs per loop

In [196]: %timeit np.row_stack(np.broadcast_arrays(*[x, 0, x]*10))
1000 loops, best of 3: 380 µs per loop

but slower than your something function:

In [201]: %timeit something([x, 0, x]*10)
10000 loops, best of 3: 37.3 µs per loop
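
The timings above come from IPython's %timeit. Outside IPython, a comparable measurement could be sketched with the standard timeit module. The function names below are made up for illustration, np.vstack stands in for np.row_stack, and the np.broadcast iterator is wrapped in tuple(...) so the code does not rely on the stacking function accepting a bare iterator. Absolute numbers will depend on the machine and the NumPy version.

```python
import timeit
import numpy as np

x = np.ones(5)
items = [x, 0, x] * 10  # 30 items, as in the timings above

def via_broadcast():
    # Each iteration of np.broadcast yields one tuple, i.e. one output column
    return np.column_stack(tuple(np.broadcast(*items)))

def via_broadcast_arrays():
    # Each broadcast array becomes one output row
    return np.vstack(np.broadcast_arrays(*items))

for fn in (via_broadcast, via_broadcast_arrays):
    seconds = timeit.timeit(fn, number=1000)
    print(f"{fn.__name__}: {seconds / 1000 * 1e6:.1f} µs per call")
```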

Note that np.broadcast accepts only a limited number of arguments (32 in the NumPy version used here):

In [199]: np.column_stack(np.broadcast(*[x, 0, x]*100))
ValueError: Need at least two and fewer than (32) array objects.

whereas np.broadcast_arrays has no such limit:

In [198]: np.row_stack(np.broadcast_arrays(*[x, 0, x]*100))
Out[198]: 
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.],
       ..., 
       [ 1.,  1.,  1.,  1.,  1.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.]])
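
Another way to sidestep the argument limit is to broadcast each item on its own with np.broadcast_to. The following is a rough sketch rather than a tested replacement: stack_rows is a hypothetical name, and it assumes the list contains at least one 1D array and that all arrays share its length.

```python
import numpy as np

def stack_rows(items):
    # Assumption: at least one item is a 1D array; its length sets the row width
    n = next(len(e) for e in items if isinstance(e, np.ndarray))
    # broadcast_to gives a length-n view of each item; scalars are repeated
    rows = [np.broadcast_to(e, (n,)) for e in items]
    return np.vstack(rows)

x = np.ones(5)
big = stack_rows([x, 0, x] * 100)
print(big.shape)  # (300, 5)
```

An array whose length does not match makes np.broadcast_to raise a ValueError instead of silently producing a ragged result.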

Using np.broadcast or np.broadcast_arrays is a bit more general than something. It will work on arrays of different (but broadcastable) shapes, for instance:

In [209]: np.column_stack(np.broadcast(*[np.atleast_2d(x), 0, x]))
Out[209]: 
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.]])

whereas something([np.atleast_2d(x), 0, x]) returns:

In [211]: something([np.atleast_2d(x), 0, x])
Out[211]: 
array([array([[ 1.,  1.,  1.,  1.,  1.]]), array([ 0.]),
       array([ 1.,  1.,  1.,  1.,  1.])], dtype=object)
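
The same generality holds for the np.broadcast_arrays variant. As a small illustration (using np.vstack in place of its alias np.row_stack), the three inputs below have shapes (1, 5), () and (5,), and all are broadcast to (1, 5) before stacking:

```python
import numpy as np

x = np.ones(5)
# Shapes (1, 5), () and (5,) all broadcast to the common shape (1, 5)
rows = np.broadcast_arrays(np.atleast_2d(x), 0, x)
print([r.shape for r in rows])  # [(1, 5), (1, 5), (1, 5)]

stacked = np.vstack(rows)
print(stacked.shape)  # (3, 5)
```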
unutbu answered Oct 19 '25 09:10
