
Creating two concatenated arrays from a generator

Consider the following example in Python 2.7. We have an arbitrary function f() that returns two 1-dimensional numpy arrays. Note that in general f() may return arrays of different sizes, and that the size may depend on the input.

Now we would like to call map on f() and concatenate the results into two separate new arrays.

import numpy as np

def f(x):
    return np.arange(x),np.ones(x,dtype=int)   

inputs = np.arange(1,10)
result = map(f,inputs)
x = np.concatenate([i[0] for i in result]) 
y = np.concatenate([i[1] for i in result]) 

This gives the intended result. However, since result may take up a lot of memory, it may be preferable to use a generator by calling imap instead of map.

from itertools import imap
result = imap(f,inputs)
x = np.concatenate([i[0] for i in result]) 
y = np.concatenate([i[1] for i in result]) 

However, this raises an error: the first list comprehension exhausts the generator, so nothing is left by the time we calculate y, and np.concatenate receives an empty list.
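
The failure is easy to reproduce in isolation. Below is a minimal sketch, written so that it also runs on Python 3; a generator expression stands in for imap(f, inputs) here, which is an assumption made only to keep the snippet version-independent.

```python
import numpy as np

def f(x):
    return np.arange(x), np.ones(x, dtype=int)

# A generator expression behaves like imap(f, inputs) for this purpose:
# items are produced lazily and can be consumed only once.
result = (f(i) for i in np.arange(1, 10))

first_pass = [pair[0] for pair in result]   # consumes every item
second_pass = [pair[1] for pair in result]  # nothing left to iterate over

x = np.concatenate(first_pass)
try:
    y = np.concatenate(second_pass)
    error = None
except ValueError as exc:
    # np.concatenate refuses an empty sequence of arrays
    error = exc
```

After this runs, second_pass is an empty list and error holds the ValueError raised by np.concatenate.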

Is there a way to use the generator only once and still create these two concatenated arrays? I'm looking for a solution without a for loop, since it is rather inefficient to repeatedly concatenate/append arrays.

Thanks in advance.

Asked by Forzaa, Nov 29 '25 16:11


1 Answer

Is there a way to use the generator only once and still create these two concatenated arrays?

Yes, a generator can be cloned with tee:

import itertools
a, b = itertools.tee(result)

x = np.concatenate([i[0] for i in a]) 
y = np.concatenate([i[1] for i in b]) 

However, using tee does not help with the memory usage in your case. The above solution would require 5 N memory to run:

  • N for caching the generator inside tee,
  • 2 N for the list comprehensions inside np.concatenate calls,
  • 2 N for the concatenated arrays.
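
The first bullet can be checked directly. In the sketch below, the call counter `calls` is a hypothetical addition used purely for illustration: draining `a` forces tee to hold on to every result until `b` has caught up, which is why f is not evaluated again for `b`.

```python
import itertools
import numpy as np

calls = []

def f(x):
    calls.append(x)  # record each evaluation of f
    return np.arange(x), np.ones(x, dtype=int)

result = (f(i) for i in range(1, 10))
a, b = itertools.tee(result)

x = np.concatenate([pair[0] for pair in a])
calls_after_a = len(calls)  # f has already run once for every input

y = np.concatenate([pair[1] for pair in b])
calls_after_b = len(calls)  # unchanged: b was served from tee's buffer
```

Both counters end up equal, so all results were buffered between the two passes.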

Clearly, we could do better by dropping the tee:

x_acc = []
y_acc = []
for x_i, y_i in result:
    x_acc.append(x_i)
    y_acc.append(y_i)

x = np.concatenate(x_acc)
y = np.concatenate(y_acc)
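
Since the question asks for a solution without an explicit loop, the same accumulation can also be spelled with zip. This is an alternative sketch, not part of the original answer; the names `x_parts` and `y_parts` are illustrative, and the memory footprint should be comparable to the list version because all pieces are still held until they are concatenated.

```python
import numpy as np

def f(x):
    return np.arange(x), np.ones(x, dtype=int)

result = (f(i) for i in range(1, 10))  # single-pass iterator

# zip(*...) transposes the stream of (x_i, y_i) pairs into two tuples,
# consuming the iterator exactly once.
x_parts, y_parts = zip(*result)

x = np.concatenate(x_parts)
y = np.concatenate(y_parts)
```

Note that the unpacking on the left-hand side fails if the iterator yields nothing, so an empty input would need separate handling.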

Dropping the tee shaves off one N, leaving 4 N. Going further means dropping the intermediate lists and preallocating x and y. Note that you needn't know the exact sizes of the arrays, only an upper bound on the total length:

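# capacity: any upper bound on the total number of elements.
# dtype=int matches what f() returns; np.empty defaults to float64.
x = np.empty(capacity, dtype=int)
y = np.empty(capacity, dtype=int)
right = 0
for x_i, y_i in result:
    left = right
    right += len(x_i)  # == len(y_i)
    x[left:right] = x_i
    y[left:right] = y_i

# Trim the unused tail; copy() lets the oversized buffers be freed.
x = x[:right].copy()
y = y[:right].copy()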
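
For the f() from the question, a complete run of the preallocation approach might look like the sketch below. The way `capacity` is derived is an assumption that holds only for this particular f, which returns exactly x elements for input x; a different f would need its own bound.

```python
import numpy as np

def f(x):
    return np.arange(x), np.ones(x, dtype=int)

inputs = np.arange(1, 10)
result = (f(i) for i in inputs)  # single-pass iterator

# Assumed bound for this f: each call returns at most inputs.max()
# elements, so len(inputs) * inputs.max() slots are always enough.
capacity = int(len(inputs) * inputs.max())

x = np.empty(capacity, dtype=int)
y = np.empty(capacity, dtype=int)
right = 0
for x_i, y_i in result:
    left = right
    right += len(x_i)
    x[left:right] = x_i
    y[left:right] = y_i

# Keep only the filled prefix of each buffer.
x = x[:right].copy()
y = y[:right].copy()
```

Here the bound is deliberately loose (81 slots for 45 elements), which illustrates that the estimate only has to be safe, not tight.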

In fact, you don't even need an upper bound. Just ensure that x and y are big enough to accommodate the new item:

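for x_i, y_i in result:
    left = right
    right += len(x_i)  # == len(y_i)
    if right > len(x):
        # It would be slightly trickier for >1D, but the idea
        # remains the same: alter the 0th dimension to fit
        # the new item.
        new_capacity = max(right, int(len(x) * 1.5))
        # ndarray.resize works in place and returns None,
        # so its result must not be assigned back.
        x.resize(new_capacity, refcheck=False)
        y.resize(new_capacity, refcheck=False)
    x[left:right] = x_i
    y[left:right] = y_i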
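
Putting the growing-buffer idea together, a self-contained version might look like the sketch below. The function name `concat_pairs` and the parameter `initial_capacity` are hypothetical, and the sketch assumes that x_i and y_i always have the same length and that the buffers own their data, which in-place `ndarray.resize` requires.

```python
import numpy as np

def concat_pairs(pairs, initial_capacity=16, dtype=int):
    """Concatenate a single-pass stream of (x_i, y_i) 1-D array pairs."""
    x = np.empty(initial_capacity, dtype=dtype)
    y = np.empty(initial_capacity, dtype=dtype)
    right = 0
    for x_i, y_i in pairs:
        left = right
        right += len(x_i)  # assumed equal to len(y_i)
        if right > len(x):
            # Grow by at least 1.5x so that the number of
            # reallocations stays small relative to the final size.
            new_capacity = max(right, int(len(x) * 1.5))
            # In-place resize; refcheck=False skips the reference check,
            # which is safe here because no views of x or y exist.
            x.resize(new_capacity, refcheck=False)
            y.resize(new_capacity, refcheck=False)
        x[left:right] = x_i
        y[left:right] = y_i
    # Return only the filled prefix, detached from the larger buffers.
    return x[:right].copy(), y[:right].copy()

def f(x):
    return np.arange(x), np.ones(x, dtype=int)

x, y = concat_pairs(f(i) for i in range(1, 10))
```

The generator is consumed exactly once, and at no point are more than the two buffers and the current pair held in memory, apart from the temporary overhead of a resize.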
Answered by Sergei Lebedev, Dec 02 '25 05:12


