Creating two concatenated arrays from a generator

Question

Consider the following example in Python 2.7. We have an arbitrary function f() that returns two 1-dimensional numpy arrays. Note that in general f() may return arrays of different sizes and that the size may depend on the input.

Now we would like to call map on f() and concatenate the results into two separate new arrays.

import numpy as np

def f(x):
    return np.arange(x),np.ones(x,dtype=int)   

inputs = np.arange(1,10)
result = map(f,inputs)
x = np.concatenate([i[0] for i in result]) 
y = np.concatenate([i[1] for i in result])

This gives the intended result. However, since result may take up a lot of memory, it may be preferable to use a generator by calling imap instead of map.

from itertools import imap
result = imap(f,inputs)
x = np.concatenate([i[0] for i in result]) 
y = np.concatenate([i[1] for i in result])

However, this gives an error because the generator is empty at the point where we calculate y.
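The failure can be reproduced with any one-shot iterator. Here is a minimal sketch; it uses a generator expression in place of imap (an assumption made so that it runs unchanged on Python 2 and 3), and the names leftover and error are only illustrative:

```python
import numpy as np

def f(x):
    return np.arange(x), np.ones(x, dtype=int)

inputs = np.arange(1, 10)
result = (f(i) for i in inputs)  # one-shot iterator, like imap(f, inputs)

x = np.concatenate([i[0] for i in result])  # consumes every item
leftover = [i[1] for i in result]           # nothing is left to iterate over

try:
    y = np.concatenate(leftover)
    error = None
except ValueError as exc:
    # np.concatenate refuses an empty sequence of arrays
    error = exc
```

So the first comprehension succeeds, and the second one silently produces an empty list that np.concatenate then rejects.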

Is there a way to use the generator only once and still create these two concatenated arrays? I'm looking for a solution without a for loop, since it is rather inefficient to repeatedly concatenate/append arrays.

Thanks in advance.

Sergei Lebedev · Accepted Answer

Is there a way to use the generator only once and still create these two concatenated arrays?

Yes, a generator can be cloned with tee:

import itertools
a, b = itertools.tee(result)

x = np.concatenate([i[0] for i in a]) 
y = np.concatenate([i[1] for i in b])

However, using tee does not help with the memory usage in your case. The above solution would require 5 N memory to run:

N for caching the generator inside tee,
2 N for the list comprehensions inside np.concatenate calls,
2 N for the concatenated arrays.

Clearly, we could do better by dropping the tee:

x_acc = []
y_acc = []
for x_i, y_i in result:
    x_acc.append(x_i)
    y_acc.append(y_i)

x = np.concatenate(x_acc)
y = np.concatenate(y_acc)
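If the explicit loop is the concern, the same single pass can probably be written with zip. This is a minimal sketch, not part of the original answer; the generator expression stands in for imap(f, inputs) so that it runs on both Python 2 and 3, and x_parts and y_parts are illustrative names:

```python
import numpy as np

def f(x):
    return np.arange(x), np.ones(x, dtype=int)

result = (f(i) for i in range(1, 10))  # stands in for imap(f, inputs)

# zip(*...) walks the iterator exactly once and transposes the pairs
# into one tuple of x pieces and one tuple of y pieces.
x_parts, y_parts = zip(*result)

x = np.concatenate(x_parts)
y = np.concatenate(y_parts)
```

This holds every piece in memory until concatenation, just like the two accumulator lists, so it changes the spelling rather than the memory footprint. It also assumes the iterator yields at least one pair; with an empty iterator the unpacking fails.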

Dropping tee shaves off one N, leaving 4 N. Going further means dropping the intermediate lists and preallocating x and y. Note that you needn't know the exact sizes of the arrays, only upper bounds:

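# capacity is any upper bound on the total length; np.empty defaults
# to float64, so pass dtype=... if the pieces have another type.
x = np.empty(capacity)
y = np.empty(capacity)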
right = 0
for x_i, y_i in result:
    left = right
    right += len(x_i)  # == len(y_i)  
    x[left:right] = x_i
    y[left:right] = y_i

x = x[:right].copy()
y = y[:right].copy()
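For the f() from the question, the preallocation approach could look roughly like this. The bound used for capacity is an assumption that holds only because this particular f(x) returns exactly x elements; for another function you would need your own bound. The generator expression again stands in for imap:

```python
import numpy as np

def f(x):
    return np.arange(x), np.ones(x, dtype=int)

inputs = np.arange(1, 10)
result = (f(i) for i in inputs)  # stands in for imap(f, inputs)

# Assumed upper bound: no call returns more than max(inputs) elements.
capacity = len(inputs) * int(inputs.max())

# Match the dtype of the pieces; np.empty would default to float64.
x = np.empty(capacity, dtype=int)
y = np.empty(capacity, dtype=int)

right = 0
for x_i, y_i in result:
    left = right
    right += len(x_i)  # assumes len(x_i) == len(y_i)
    x[left:right] = x_i
    y[left:right] = y_i

# Copy the used prefix so the oversized buffers can be freed.
x = x[:right].copy()
y = y[:right].copy()
```

Only one pair of pieces is alive at any time, at the cost of buffers that may be larger than the final result until the trailing copy.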

In fact, you don't even need an upper bound. Just ensure that x and y are big enough to accommodate the new item:

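for x_i, y_i in result:
    left = right
    right += len(x_i)
    if right > len(x):
        # It would be slightly trickier for >1D, but the idea
        # remains the same: alter the 0-th dimension to fit
        # the new item.
        new_capacity = int(max(right, len(x)) * 1.5)
        # ndarray.resize works in place and returns None, so its
        # result must not be assigned back to x or y.
        x.resize(new_capacity, refcheck=False)
        y.resize(new_capacity, refcheck=False)
    # ...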
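Putting the growing-buffer idea together, a self-contained sketch might look like the following. The names concat_pairs, _grow and initial_capacity are hypothetical, and the sketch copies into a freshly allocated array instead of calling resize, a substitution made here to avoid the in-place method's ownership and reference checks:

```python
import numpy as np

def _grow(buf, used, new_capacity):
    """Return a larger buffer holding the first `used` elements of buf."""
    bigger = np.empty(new_capacity, dtype=buf.dtype)
    bigger[:used] = buf[:used]
    return bigger

def concat_pairs(pairs, initial_capacity=16, dtype=int):
    """Concatenate an iterable of (x_i, y_i) 1-D array pairs in one pass."""
    x = np.empty(initial_capacity, dtype=dtype)
    y = np.empty(initial_capacity, dtype=dtype)
    right = 0
    for x_i, y_i in pairs:
        left = right
        right += len(x_i)  # assumes len(x_i) == len(y_i)
        if right > len(x):
            # Grow by about 1.5x, but never to less than what is needed.
            new_capacity = max(right, int(len(x) * 1.5))
            x = _grow(x, left, new_capacity)
            y = _grow(y, left, new_capacity)
        x[left:right] = x_i
        y[left:right] = y_i
    # Trim the unused tail and release the oversized buffers.
    return x[:right].copy(), y[:right].copy()

def f(n):
    return np.arange(n), np.ones(n, dtype=int)

x, y = concat_pairs(f(i) for i in range(1, 10))
```

Growing geometrically keeps the number of reallocations logarithmic in the final size, so the total copying stays proportional to the size of the result.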
