
For loop versus while and next performance

When looping through a generator, it sometimes seems more natural to use while and next (with a try/except StopIteration) than the simpler for loop. Yet this comes at a significant performance cost.

What is happening here, and what is the right way to approach the choice?

See example code and timing below:

%%timeit
for x in gen():
    pass
# 180 µs ± 8.78 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
_gen = gen()
try:
    while True:
        x = next(_gen)
except StopIteration:
    pass
# 606 µs ± 19.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


%%timeit
# Alternative use of next, but I don't see any good reason to use it.
_gen = gen()
while True:
    try:
        x = next(_gen)
    except StopIteration:
        break
# 676 µs ± 24.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Asked by thorwhalen, Oct 21 '25 01:10


1 Answer

Most of the time you should use the for loop. It does a few things for you that may be tedious to do yourself:

  • Works for iterables and iterators.
  • Handles StopIteration for you (in CPython the StopIteration is handled in C instead of in Python code, which makes it significantly faster).

That means code written with for is both more general and faster, so it should be the default choice.
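As a rough illustration of what the for loop takes care of, the following pure-Python sketch spells out the same steps by hand. The helper name manual_for is made up for this example, and CPython performs the equivalent steps in C rather than in Python code:

```python
def manual_for(iterable, body):
    """Hand-written equivalent of: for item in iterable: body(item)."""
    iterator = iter(iterable)      # works for iterables and iterators alike
    while True:
        try:
            item = next(iterator)
        except StopIteration:      # the for statement handles this for you
            break
        body(item)


collected = []
manual_for([1, 2, 3], collected.append)    # a list is an iterable, not an iterator
manual_for(iter("ab"), collected.append)   # an iterator works as well
print(collected)  # [1, 2, 3, 'a', 'b']
```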

However, in some cases you cannot use a for loop, and then a while loop is a good choice. To make it more general you should also call iter on the argument, so that you can also process iterables that are not iterators:

_gen = iter(gen())
...

The next question you need to ask yourself is: Do you need to handle StopIteration for each next call or is it irrelevant where the StopIteration happens?

Entering and leaving the try doesn't have much overhead (that applies only to the try itself; if execution has to go into the except, else or finally there is significantly more overhead), but it is still overhead. That's why your second example is faster than the third. So if it doesn't matter where the StopIteration comes from, wrapping the whole while True in a single try will be the faster option:

try:
    while True:
        next(_gen)
except StopIteration:
    pass
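A case where the position of the StopIteration does matter is reading more than one item per pass. In the sketch below (the function pairs and its fill parameter are invented for illustration), exhaustion on the first next ends the loop normally, while exhaustion on the second next means there is a leftover item that needs separate handling:

```python
def pairs(iterable, fill=None):
    """Group items two at a time; pad an odd leftover item with `fill`."""
    it = iter(iterable)
    result = []
    while True:
        try:
            first = next(it)
        except StopIteration:
            break                          # clean end: nothing left to pair
        try:
            second = next(it)
        except StopIteration:
            result.append((first, fill))   # odd count: keep the leftover item
            break
        result.append((first, second))
    return result


print(pairs([1, 2, 3, 4]))   # [(1, 2), (3, 4)]
print(pairs([1, 2, 3]))      # [(1, 2), (3, None)]
```

With a single try around the whole loop the two situations could not be told apart, which is the price of the faster variant.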

There are a few options to make the while approach faster. One would be to avoid the global name lookup for next that happens once for each iteration.

By using a local variable this lookup cost happens only once and the local name lookup inside the loop is a bit faster:

def f(gen):
    _gen = iter(gen())
    _next = next
    try:
        while True:
            x = _next(_gen)
    except StopIteration:
        return

That would be my favorite approach if I had to use the while loop approach.
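To check the effect of the cached lookup on your own machine, a standard-library timeit comparison along these lines may help. The function names, the counter, and the input size are arbitrary choices for this sketch. No timings are shown because they depend on the machine and the Python version; recent CPython versions also optimize global lookups, so the gap may be small:

```python
import timeit


def global_next(iterable):
    _gen = iter(iterable)
    count = 0
    try:
        while True:
            next(_gen)          # `next` is resolved as a global/builtin name
            count += 1
    except StopIteration:
        return count


def local_next(iterable):
    _gen = iter(iterable)
    _next = next                # resolved once, then used as a local variable
    count = 0
    try:
        while True:
            _next(_gen)
            count += 1
    except StopIteration:
        return count


data = range(10_000)
for func in (global_next, local_next):
    seconds = timeit.timeit(lambda: func(data), number=100)
    print(f"{func.__name__}: {seconds:.4f} s for 100 runs")
```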

You could even go a step further and avoid the __next__ lookup that happens every time you call next. However that's something that will (in some circumstances) deviate from the pure next behavior and should only be done if you know what you're doing and only if you really need the very small performance boost this gives you. In general you should NOT use that:

def f(gen):
    _gen = iter(gen())
    _next = _gen.__next__
    try:
        while True:
            x = _next()
    except StopIteration:
        return

However I don't recommend that approach. And one shouldn't really call double-underscore functions directly. I just mention it for completeness.
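One way the two can diverge (an illustration, not necessarily the only case): next looks up __next__ on the type of the object, whereas _gen.__next__ is an ordinary attribute access that also sees attributes set on the instance. The Counter class below is a toy iterator written only for this sketch:

```python
class Counter:
    """Toy iterator that counts upwards from 0 and never stops by itself."""

    def __init__(self):
        self.n = 0

    def __iter__(self):
        return self

    def __next__(self):
        self.n += 1
        return self.n - 1


c = Counter()
c.__next__ = lambda: "patched"   # instance attribute shadowing the method

via_builtin = next(c)            # next() uses the __next__ defined on the class
via_attribute = c.__next__()     # attribute access finds the instance attribute
print(via_builtin, via_attribute)  # 0 patched
```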


I also did a benchmark to display the performance of these approaches:


from simple_benchmark import BenchmarkBuilder

b = BenchmarkBuilder()

@b.add_function()
def for_loop(gen):
    for i in gen:
        pass

@b.add_function()
def while_outer_try(gen):
    _gen = iter(gen)
    try:
        while True:
            x = next(_gen)
    except StopIteration:
        pass

@b.add_function()
def while_inner_try(gen):
    _gen = iter(gen)
    while True:
        try:
            x = next(_gen)
        except StopIteration:
            break

@b.add_function()
def while_outer_try_cache_next(gen):
    _gen = iter(gen)
    _next = next
    try:
        while True:
            x = _next(_gen)
    except StopIteration:
        return

@b.add_function()
def while_outer_try_cache_next_method(gen):
    _gen = iter(gen)
    _next = _gen.__next__
    try:
        while True:
            x = _next()
    except StopIteration:
        return

@b.add_arguments('length')
def argument_provider():
    for exp in range(2, 20):
        size = 2**exp
        yield size, range(size)

r = b.run()
r.plot()

Summary:

  • Use the for loop approach whenever possible and feasible.
  • When you use a while approach, make sure you call iter on the iterable. If you want to squeeze out some more performance, put the try and except outside the while (if possible) and cache the next lookup. Don't cache the __next__ lookup unless you really know what you are getting yourself into and you need to squeeze out even more performance.
  • A while approach will always be slower than for (at least in CPython) and require significantly more code. To repeat myself: Only use it if really necessary.
Answered by MSeifert, Oct 23 '25 15:10