Creating iterators from a generator returns the same object

Let's say I have a large list of data that I want to perform some operation on, and I would like to have multiple iterators performing this operation independently.

data = [1,2,3,4,5]
generator = ((e, 2*e) for e in data)
it1 = iter(generator)
it2 = iter(generator)

I would expect these iterators to be different objects, but it1 is it2 returns True... More confusingly, this is true for the following generators as well:

import copy

# copied data
gen = ((e, 2*e) for e in copy.deepcopy(data))
# temporary object
gen = ((e, 2*e) for e in [1,2,3,4,5])

In practice this means that when I call next(it1), it2 advances as well, which is not the behavior I want.

What is going on here, and is there any way to do what I'm trying to do? I am using Python 2.7 on Ubuntu 14.04.

Edit:

I just tried out the following as well:

gen = (e for e in [1,2,3,4,5])
it = iter(gen)
next(it)
next(it)
for e in gen:
    print e

This prints 3, 4 and 5. Apparently generators are a more constrained concept than I had imagined.

Asked by Jacob Thalman, Oct 24 '25 05:10



1 Answer

Generators are iterators. All well-behaved iterators have an __iter__ method that should simply

return self

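From the Python 3 docs (in Python 2.7, which you are using, the second method is named next() instead of __next__()):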

The iterator objects themselves are required to support the following two methods, which together form the iterator protocol:

iterator.__iter__() Return the iterator object itself. This is required to allow both containers and iterators to be used with the for and in statements. This method corresponds to the tp_iter slot of the type structure for Python objects in the Python/C API.

iterator.__next__() Return the next item from the container. If there are no further items, raise the StopIteration exception. This method corresponds to the tp_iternext slot of the type structure for Python objects in the Python/C API.
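To make the protocol concrete, here is a minimal Python 3 sketch of a hand-written iterator. The class name Countdown is purely hypothetical; the point is that __iter__ returns self (so iter() on it gives back the same object, exactly like a generator) and __next__ raises StopIteration when exhausted.

```python
class Countdown:
    """Hypothetical hand-written iterator counting down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself, so iter(cd) is cd.
        return self

    def __next__(self):
        if self.current <= 0:
            # Signal exhaustion, as the iterator protocol requires.
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


cd = Countdown(3)
print(iter(cd) is cd)  # True
print(next(cd))        # 3
print(list(cd))        # [2, 1]: continues from the shared position
```

Because there is only one object, every "iterator" obtained from it shares the same position, which is the behavior seen in the question.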

So, consider another example of an iterator:

>>> x = [1, 2, 3, 4, 5]
>>> it = iter(x)
>>> it2 = iter(it)
>>> next(it)
1
>>> next(it2)
2
>>> it is it2
True

So a list is iterable because it has an __iter__ method that returns an iterator. That iterator also has an __iter__ method, which always returns the iterator itself, and in addition it has a __next__ method.
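The distinction matters for the question: an iterable's __iter__ builds a new iterator on every call, while an iterator's __iter__ returns itself. A minimal Python 3 sketch, using a hypothetical Paired class whose __iter__ is written as a generator function, shows an iterable that hands out independent iterators each time iter() is called on it:

```python
class Paired:
    """Hypothetical iterable: each iter() call produces a fresh generator."""

    def __init__(self, data):
        self.data = data

    def __iter__(self):
        # __iter__ is a generator function, so every call
        # creates a new, independent generator object.
        for e in self.data:
            yield (e, 2 * e)


x = [1, 2, 3, 4, 5]
a = iter(x)
b = iter(x)
print(a is b)       # False: a list gives a new iterator each time

pairs = Paired(x)
it1 = iter(pairs)
it2 = iter(pairs)
print(it1 is it2)   # False
print(next(it1))    # (1, 2)
print(next(it1))    # (2, 4)
print(next(it2))    # (1, 2): it2 has its own position
```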

So, consider this Python 3 session:

>>> x = [1, 2, 3, 4, 5]
>>> it = iter(x)
>>> hasattr(x, '__iter__')
True
>>> hasattr(x, '__next__')
False
>>> hasattr(it, '__iter__')
True
>>> hasattr(it, '__next__')
True
>>> next(it)
1
>>> next(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'list' object is not an iterator
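The same checks can be written with the abstract base classes in collections.abc (Python 3.3+), which test for __iter__ and __next__ structurally. The is_iterator helper below is hypothetical, a sketch of the "iter(obj) is obj" rule described above rather than a standard function:

```python
from collections.abc import Iterable, Iterator

x = [1, 2, 3, 4, 5]
it = iter(x)

# A list is iterable but is not itself an iterator.
print(isinstance(x, Iterable), isinstance(x, Iterator))    # True False
# The list iterator is both: it has __iter__ and __next__.
print(isinstance(it, Iterable), isinstance(it, Iterator))  # True True


def is_iterator(obj):
    """Hypothetical helper: an iterator's __iter__ returns the object itself."""
    return iter(obj) is obj


print(is_iterator(x), is_iterator(it))  # False True
```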

And for a generator:

>>> g = (x**2 for x in range(10))
>>> g
<generator object <genexpr> at 0x104104390>
>>> hasattr(g, '__iter__')
True
>>> hasattr(g, '__next__')
True
>>> next(g)
0
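Since a generator is its own iterator, it also has only one position and can be consumed only once. This short Python 3 sketch shows both properties, and explains why the edit in the question prints 3, 4 and 5: the for loop picks up where the two next() calls stopped.

```python
g = (x**2 for x in range(5))

# A generator is its own iterator, just like the list iterator above.
print(iter(g) is g)  # True

print(next(g))       # 0
print(list(g))       # [1, 4, 9, 16]: continues where next() left off
print(list(g))       # []: the generator is now exhausted
```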

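Now, you are using a generator expression, which evaluates to a single generator object. Calling iter() on it just hands that same object back, so it1 and it2 share one position (copying the data inside the expression doesn't change this, since the expression still produces just one generator). A generator function is different: each call creates a new generator object. The most straightforward way to accomplish what you are doing is: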

def paired(data):
    for e in data:
        yield (e, 2*e)

Then use:

it1 = paired(data)
it2 = paired(data)

In this case, it1 and it2 are two separate generator objects, each advancing independently.
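A quick Python 3 check of the paired approach, repeating the function so the snippet runs on its own:

```python
def paired(data):
    # Each call to paired() creates a new generator object.
    for e in data:
        yield (e, 2 * e)


data = [1, 2, 3, 4, 5]
it1 = paired(data)
it2 = paired(data)

print(it1 is it2)  # False
print(next(it1))   # (1, 2)
print(next(it1))   # (2, 4)
print(next(it2))   # (1, 2): unaffected by it1
```

Both generators still read from the same data list, so they share the underlying data but not their position in it.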

Answered by juanpa.arrivillaga, Oct 26 '25 18:10



