I would like your advice on the most Pythonic way to express the following function in Python with type hints.
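I'd like to expose a function as part of a library that accepts an input argument and returns an output. The contract for the input argument should be that:
- my function can iterate over it
- it's ok if my function maintains a reference to the input (e.g. by returning an object that keeps that reference)
- it's ok to iterate over the input more than once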
An example could be a function that accepts a sequence of URLs and then issues requests to these URLs, potentially with some retry logic, so I'd have to iterate over the original sequence more than once. But my question is more general than this one example.
At first glance a suitable signature would be:
    from typing import Iterable

    def do_sth(input: Iterable[str]) -> SomeResult:
        ...
However, this violates the third requirement, because in Python there is no guarantee that you can iterate over an Iterable more than once: iterators and generators are themselves iterables, and they are exhausted after a single pass.
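To make the problem concrete, here is a minimal sketch. The helper `count_twice` and the example URLs are made up for illustration; the point is only that a list survives two passes while a generator is empty on the second one.

```python
from typing import Iterable, Iterator, Tuple


def count_twice(items: Iterable[str]) -> Tuple[int, int]:
    """Iterate over ``items`` twice and report how many elements each pass saw."""
    first = sum(1 for _ in items)
    second = sum(1 for _ in items)
    return first, second


def url_generator() -> Iterator[str]:
    # Hypothetical URLs, used only for illustration.
    yield "https://example.com/a"
    yield "https://example.com/b"


list_counts = count_twice(["https://example.com/a", "https://example.com/b"])
generator_counts = count_twice(url_generator())

print(list_counts)       # (2, 2): a list hands out a fresh iterator each time
print(generator_counts)  # (2, 0): the generator is exhausted after the first pass
```

Both calls type-check against `Iterable[str]`, which is exactly why the annotation alone does not express the requirement.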
Another attempt might be:
    from typing import Sequence

    def do_sth(input: Sequence[str]) -> SomeResult:
        ...
But then the Sequence contract demands more than my function requires, because it includes indexed access and knowledge of the length.
A solution that came to my mind is to use the Iterable signature and then make a copy of the input internally. But this seems to introduce a potential memory problem if the source sequence is large.
Is there a solution to this, i.e. does Python have the concept of an Iterable that returns a new iterator each time?
There are two natural ways of representing this that I can think of.
The first would be to use Iterable[str] and mention in the documentation that Iterator and Generator objects should not be passed, since the function may call __iter__ more than once. The whole point of Iterable is that you can get an iterator from it, and arguably it was a mistake to make Iterator a subtype of Iterable in the first place. It's not perfect, but it is simple, which is usually more "Pythonic" than a technically more correct annotation that is very complicated.
You can add some runtime checking that will alert the user that there is a problem if they pass the wrong thing:
    iter1 = iter(input)
    for item in iter1:
        do_something(item)
    iter2 = iter(input)
    if iter2 is iter1:
        # An iterator returns itself from iter(), so it cannot be restarted.
        raise ValueError(f"Must pass an iterable that can be iterated multiple times. Got {input!r}.")
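As a self-contained sketch of that check, the snippet below wraps it in a hypothetical function `process_twice`, whose two passes merely upper-case and lower-case the items as stand-ins for real work. It relies on the fact that an iterator returns itself from `iter()`.

```python
from typing import Iterable, List


def process_twice(items: Iterable[str]) -> List[str]:
    """Hypothetical helper: makes two passes and rejects one-shot iterators."""
    first_iterator = iter(items)
    seen = [item.upper() for item in first_iterator]  # first pass

    # Iterators return themselves from iter(); re-iterable containers
    # normally hand out a fresh iterator object on every call.
    second_iterator = iter(items)
    if second_iterator is first_iterator:
        raise ValueError(
            f"Must pass an iterable that can be iterated multiple times. Got {items!r}."
        )
    seen.extend(item.lower() for item in second_iterator)  # second pass
    return seen


ok_result = process_twice(["a", "b"])

try:
    process_twice(iter(["a", "b"]))
except ValueError:
    rejected = True
else:
    rejected = False

print(ok_result)  # ['A', 'B', 'a', 'b']
print(rejected)   # True
```

Note that in this form the error only surfaces after the first pass has already run. If failing early matters, the same identity test (`iter(items) is items`) could be done before any work starts.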
Or check whether you got an Iterator, and handle it at the cost of extra memory:
    from collections.abc import Iterator
    from warnings import warn

    if isinstance(input, Iterator):
        input = list(input)  # or itertools.tee or whatever
        warn("This may eat up a lot of memory")
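A runnable version of this fallback might look like the following. The names `ensure_reiterable` and `lengths_twice` are hypothetical; the sketch copies only one-shot iterators and passes every other iterable through untouched, so callers who already hold a list pay no memory penalty.

```python
import warnings
from collections.abc import Iterator
from typing import Iterable, List


def ensure_reiterable(items: Iterable[str]) -> Iterable[str]:
    """Hypothetical helper: copy one-shot iterators, pass everything else through."""
    if isinstance(items, Iterator):
        warnings.warn("Copying a one-shot iterator into a list; this may use a lot of memory.")
        return list(items)
    return items


def lengths_twice(items: Iterable[str]) -> List[int]:
    """Count the items in two separate passes."""
    items = ensure_reiterable(items)
    first = sum(1 for _ in items)
    second = sum(1 for _ in items)
    return [first, second]


with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the warning for this demo only
    from_generator = lengths_twice(str(i) for i in range(3))

original = ["a", "b"]
same_object = ensure_reiterable(original) is original

print(from_generator)  # [3, 3]
print(same_object)     # True
```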
The other option is to use io.TextIOBase. A seekable stream can be iterated over multiple times by seeking back to the beginning. Whether this fits depends on your use case. If the input is conceptually some kind of chunked view on a sequence of characters, then io streams are a good fit, even if the iterators don't technically return lines of text. If it is conceptually a sequence of strings that aren't contiguous, then streams aren't a good fit.
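As a rough illustration of the stream approach, the sketch below uses `io.StringIO` (a `TextIOBase` subclass) with one made-up URL per line. The function name `read_lines_twice` is hypothetical, and the sketch assumes the stream is seekable.

```python
import io
from typing import List


def read_lines_twice(stream: io.TextIOBase) -> List[List[str]]:
    """Iterate over the stream's lines twice, rewinding before each pass."""
    passes = []
    for _ in range(2):
        stream.seek(0)  # rewind; only valid when stream.seekable() is True
        passes.append([line.rstrip("\n") for line in stream])
    return passes


# Hypothetical input: one URL per line.
stream = io.StringIO("https://example.com/a\nhttps://example.com/b\n")
passes = read_lines_twice(stream)

print(passes[0] == passes[1])  # True: both passes see the same lines
```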
You could use a function which accepts no arguments and returns an iterable. In terms of type hints, you would use a Callable.
From the documentation, if you are unfamiliar with Callable:
Frameworks expecting callback functions of specific signatures might be type hinted using Callable[[Arg1Type, Arg2Type], ReturnType].
Solution:
    from typing import Callable, Iterable

    def do_sth(get_input: Callable[[], Iterable[str]]) -> SomeResult:
        # ...
        pass

    def main():
        do_sth(lambda: (str(i) for i in range(10)))
"my function can iterate over it":
    def do_sth(get_input: Callable[[], Iterable[str]]) -> SomeResult:
        for item in get_input():
            pass
"it's ok if my function maintains a reference to the input (e.g. by returning an object that keeps that reference)":
I don't see why not.
    def do_sth(get_input: Callable[[], Iterable[str]]) -> SomeResult:
        return dict(reference=get_input)
"it's ok to iterate over the input more than once":
    def do_sth(get_input: Callable[[], Iterable[str]]) -> SomeResult:
        for i in range(10**82):
            for item in get_input():
                pass
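Putting the pieces together, here is a small runnable sketch of the factory approach. The function `count_attempts`, the `attempts` parameter, and the URLs are all hypothetical stand-ins for real retry logic. The key point is that every call to `get_input()` produces a fresh iterable, so even a generator function works as input.

```python
from typing import Callable, Dict, Iterable, Iterator


def count_attempts(get_input: Callable[[], Iterable[str]], attempts: int = 2) -> Dict[str, int]:
    """Hypothetical stand-in for retry logic: visit every item once per attempt."""
    seen: Dict[str, int] = {}
    for _ in range(attempts):
        for item in get_input():  # a fresh iterable on every call
            seen[item] = seen.get(item, 0) + 1
    return seen


def urls() -> Iterator[str]:
    # Hypothetical URLs, used only for illustration.
    yield "https://example.com/a"
    yield "https://example.com/b"


# Pass the generator *function*, not a generator object.
counts = count_attempts(urls)

# Callers who already hold a re-iterable container can wrap it in a lambda.
list_counts = count_attempts(lambda: ["x"], attempts=3)

print(counts)       # {'https://example.com/a': 2, 'https://example.com/b': 2}
print(list_counts)  # {'x': 3}
```

The trade-off is a slightly less convenient call site: callers must pass `urls` or `lambda: my_list` rather than the data itself.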