
What is the Pythonic way to represent an Iterable that can be iterated over multiple times?

I would like your advice on the most Pythonic way to express the following function in Python with type hints:

I'd like to expose a function as part of a library that accepts an input argument and returns an output. The contract for the input argument should be that:

  • my function can iterate over it
  • it's ok if my function maintains a reference to the input (e.g. by returning an object that keeps that reference)
  • it's ok to iterate over the input more than once

An example could be a function that accepts a sequence of URLs and then issues requests to these URLs, potentially with some retry logic, so it would have to iterate over the original sequence more than once. But my question is more general than just this example.

At first glance a suitable signature would be:

from typing import Iterable

def do_sth(input: Iterable[str]) -> SomeResult:
  ...

However, this violates the third requirement, because Python gives no guarantee that an Iterable can be iterated over more than once: iterators and generators are themselves iterables, and they are exhausted after a single pass.
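To make the problem concrete, here is a minimal sketch showing that a generator satisfies Iterable[str] yet yields nothing on a second pass. The names count_items and gen_urls, and the example URLs, are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator, Tuple


def count_items(items: Iterable[str]) -> Tuple[int, int]:
    """Iterate twice and report how many items each pass produced."""
    first = sum(1 for _ in items)
    second = sum(1 for _ in items)
    return first, second


def gen_urls() -> Iterator[str]:
    # Hypothetical data, for illustration only.
    yield "https://example.com/a"
    yield "https://example.com/b"


list_result = count_items(["https://example.com/a", "https://example.com/b"])
gen_result = count_items(gen_urls())
print(list_result)  # (2, 2): a list hands out a fresh iterator on every pass
print(gen_result)   # (2, 0): the generator is exhausted after the first pass
```

Both calls type-check against Iterable[str], which is exactly why the annotation alone cannot express the third requirement.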

Another attempt might be:

from typing import Sequence

def do_sth(input: Sequence[str]) -> SomeResult:
  ...

But then the Sequence contract demands more than my function requires, because it also includes indexed access and a known length.

A solution that came to my mind is to use the Iterable signature and then make a copy of the input internally. But this seems to introduce a potential memory problem if the source sequence is large.

Is there a solution to this, i.e. does Python have a concept of an Iterable that returns a new iterator each time it is iterated?

Carsten asked Nov 30 '25 19:11


2 Answers

There are two natural ways of representing this that I can think of.

The first would be to use Iterable[str] and mention in the documentation that Iterator and Generator objects should not be passed, since the function may call __iter__ more than once. The whole point of Iterable is that you can get an iterator from it, and arguably it was a mistake to make Iterator support Iterable in the first place. This approach is not perfect, but it is simple, which is usually more "pythonic" than a technically more correct annotation that is very complicated.

You can add some runtime checking that will alert the user that there is a problem if they pass the wrong thing:

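iter1 = iter(input)
for item in iter1:
    do_something(item)
# An iterator returns itself from iter(), so getting the same object back
# means the input is single-pass and has already been consumed.
iter2 = iter(input)
if iter2 is iter1:
    raise ValueError(f"Must pass an iterable that can be iterated multiple times. Got {input!r}.")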
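The check above only fires after the first pass has already consumed the input. A variation, sketched below under the assumption that single-pass inputs are iterators (objects whose __iter__ returns themselves), rejects them before doing any work. require_reiterable is a hypothetical helper name, not part of any library.

```python
from typing import Iterable, TypeVar

T = TypeVar("T")


def require_reiterable(items: Iterable[T]) -> Iterable[T]:
    """Return items unchanged, or raise if it looks like a single-pass iterator."""
    # By the iterator protocol, iter(it) returns it itself for iterators,
    # whereas containers such as list or dict create a new iterator each time.
    if iter(items) is items:
        raise TypeError(
            f"Expected an iterable that can be iterated multiple times, got {items!r}."
        )
    return items


require_reiterable(["a", "b"])  # accepted: a list is re-iterable

try:
    require_reiterable(x for x in ["a", "b"])
    rejected = False
except TypeError:
    rejected = True
print(rejected)
```

This is a heuristic rather than a guarantee: it catches iterators and generators, but it cannot prove that an arbitrary custom iterable produces the same items on every pass.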

Or check if you got Iterator, and handle it with a memory penalty:

from collections.abc import Iterator
from warnings import warn

if isinstance(input, Iterator):
    input = list(input)  # or itertools.tee or whatever
    warn("This may eat up a lot of memory")

The other option is to use io.TextIOBase. A seekable stream can be iterated over multiple times by seeking back to the beginning. Whether this fits depends on your use case. If the input is conceptually a chunked view on a sequence of characters, then io streams are a good fit, even if the iterators don't technically return lines of text. If it is conceptually a sequence of strings that aren't contiguous, then streams aren't a good fit.
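As a sketch of the stream idea, the snippet below reads an in-memory io.StringIO twice, seeking back to the start between passes. It assumes the stream is seekable; read_twice is a hypothetical helper, and a non-seekable stream such as a pipe would not support this.

```python
import io
from typing import List, Tuple


def read_twice(stream: io.TextIOBase) -> Tuple[List[str], List[str]]:
    """Iterate over a seekable text stream twice, rewinding between passes."""
    if not stream.seekable():
        raise ValueError("stream must be seekable to be iterated more than once")
    first = [line.rstrip("\n") for line in stream]
    stream.seek(0)  # rewind so that the next pass starts from the beginning
    second = [line.rstrip("\n") for line in stream]
    return first, second


first_pass, second_pass = read_twice(io.StringIO("alpha\nbeta\n"))
print(first_pass == second_pass)
```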

RecursivelyIronic answered Dec 02 '25 08:12


You could instead accept a function that takes no arguments and returns an iterable. In terms of type hints, you would use a Callable.

From the documentation, if you are unfamiliar with Callable:

Frameworks expecting callback functions of specific signatures might be type hinted using Callable[[Arg1Type, Arg2Type], ReturnType].

Solution:

from typing import Callable, Iterable

def do_sth(get_input: Callable[[], Iterable[str]]) -> SomeResult:
    # ...
    pass

def main():
    do_sth(lambda: (str(i) for i in range(10)))
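A generator function already has the right shape for this signature, so it can be passed directly without a lambda. The sketch below assumes a hypothetical retry-style do_sth that makes two passes and returns a list in place of SomeResult, which the original leaves undefined. The urls function and its data are also hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


def do_sth(get_input: Callable[[], Iterable[str]]) -> List[str]:
    """Make two passes over the input, calling get_input() for each pass."""
    seen: List[str] = []
    for _ in range(2):
        # Each call produces a fresh iterable, so every pass sees all items.
        seen.extend(get_input())
    return seen


def urls() -> Iterator[str]:
    # Hypothetical data, for illustration only.
    yield "https://example.com/a"
    yield "https://example.com/b"


result = do_sth(urls)  # pass the function itself, not urls()
print(len(result))
```

The cost of this design is that callers holding a plain list have to wrap it, for example as lambda: my_list.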

"my function can iterate over it"

def do_sth(get_input: Callable[[], Iterable[str]]) -> SomeResult:
    for item in get_input():
        pass

"it's ok if my function maintains a reference to the input (e.g. by returning an object that keeps that reference)"

I don't see why not; the callable can be stored like any other object:

def do_sth(get_input: Callable[[], Iterable[str]]) -> SomeResult:
    return dict(reference=get_input)

"it's ok to iterate over the input more than once"

def do_sth(get_input: Callable[[], Iterable[str]]) -> SomeResult:
    for i in range(10**82):
        for item in get_input():
            pass
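
If callers would rather keep the plain Iterable[str] signature, the same idea can be packaged as a small wrapper whose __iter__ calls the factory on every pass. This is a sketch of an alternative, not something the standard library provides; Reiterable is a hypothetical class name.

```python
from typing import Callable, Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")


class Reiterable(Generic[T]):
    """Iterable that builds a fresh iterator from a factory on each pass."""

    def __init__(self, factory: Callable[[], Iterable[T]]) -> None:
        self._factory = factory

    def __iter__(self) -> Iterator[T]:
        # Calling the factory again is what makes repeated iteration safe.
        return iter(self._factory())


numbers = Reiterable(lambda: (str(i) for i in range(3)))
first_pass = list(numbers)
second_pass = list(numbers)
print(first_pass == second_pass)
```

An instance of this wrapper can be handed to any function annotated with Iterable[str], and it never holds more than one pass worth of state in memory.
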

oglehb answered Dec 02 '25 10:12