Python Iterator Protocol
Here we cover the iterator protocol in depth. This includes both the modern version, built on `__iter__` and `__next__`, and the older protocol that piggybacks off `__getitem__` for sequence types.

If you have ever wondered how Python's `for` loop actually works, this is where the iterator protocol comes into play. Making your own iterable user-defined types is surprisingly straightforward. Personally, I find the terminology more complex than the logic involved. Let's break it down:
- `collections.abc.Iterator` extends `collections.abc.Iterable`.
- All iterators are iterable.
- Not all iterables are iterators.
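You can check these relationships directly with `isinstance` and `issubclass`. Here is a minimal sketch using a plain list (the sample words are arbitrary):

```python
from collections.abc import Iterable, Iterator

words = ["spam", "eggs"]

print(issubclass(Iterator, Iterable))  # True: Iterator extends Iterable
print(isinstance(words, Iterable))     # True: a list is iterable...
print(isinstance(words, Iterator))     # False: ...but it is not an iterator

it = iter(words)                       # ask the iterable for an iterator
print(isinstance(it, Iterator))        # True
print(isinstance(it, Iterable))        # True: every iterator is also iterable
```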
Iterator Protocol: Iterable ABC
`collections.abc.Iterable` is an abstract base class built into Python. It declares a single abstract method, `__iter__`, which concrete classes should implement. The rule of thumb is that anything iterable will return an iterator when asked for one.
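```python
@abstractmethod
def __iter__(self):
    # The unreachable `yield` makes this a generator, so calling it returns an empty iterator
    while False:
        yield None
```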
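Here is a small sketch of the abstract method in action. It uses two hypothetical subclasses. `Forgetful` omits `__iter__`, so the ABC refuses to instantiate it. `Words` implements the method and works as expected. The exact `TypeError` message varies between Python versions.

```python
from collections.abc import Iterable


class Forgetful(Iterable):
    """Hypothetical subclass that never implements __iter__."""


class Words(Iterable):
    """Hypothetical subclass that implements the one abstract method."""

    def __iter__(self):
        return iter(["hello", "world"])


try:
    Forgetful()
except TypeError as exc:
    # __iter__ is still abstract, so instantiation fails
    print(f"TypeError: {exc}")

print(list(Words()))  # ['hello', 'world']
```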
That's it: iterables really are that simple. If something is iterable, it can return an iterator, and the iterator is what Python actually uses to perform iteration. The built-in `iter()` function calls an object's `__iter__` dunder method.
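Here is a minimal sketch showing that `iter()` defers to `__iter__`. The `Countdown` class is hypothetical. Its `__iter__` simply delegates to a `range` iterator.

```python
class Countdown:
    """Hypothetical iterable: it only implements __iter__."""

    def __init__(self, start: int) -> None:
        self.start = start

    def __iter__(self):
        print("__iter__ called")
        # Hand back a built-in iterator that counts down to 1
        return iter(range(self.start, 0, -1))


it = iter(Countdown(3))              # prints "__iter__ called"
print(next(it), next(it), next(it))  # 3 2 1
```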
Iterator Protocol: Iterator ABC
Once you have grasped that iterables are _responsible_ for returning iterators, things start to make a lot more sense. Next up is the `collections.abc.Iterator` abstract base class provided by Python. There are three core things to note:
- `Iterator` extends `Iterable`.
- `Iterator` implements a simple `__iter__` that returns itself, which keeps `Iterable` happy.
- `Iterator` exposes a new abstract method, `__next__`.
We can see this by inspecting the MRO of `collections.abc.Iterator`:

```python
from collections.abc import Iterator

print(Iterator.__mro__)
# (<class 'collections.abc.Iterator'>, <class 'collections.abc.Iterable'>, <class 'object'>)
```
To satisfy the interface from `collections.abc.Iterable`, it implements a very basic `__iter__`, which returns `self`:

```python
def __iter__(self):
    # An iterator is its own iterator
    return self
```
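Here is a quick sketch of what returning `self` means in practice, using a built-in list iterator:

```python
numbers = [1, 2, 3]
it = iter(numbers)

print(iter(it) is it)  # True: an iterator's __iter__ returns itself
print(next(it))        # 1
print(list(it))        # [2, 3]: list() calls iter(it), gets `it` back, and resumes
```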
However, iterators offer a little extra: the interface exposes a new abstract method, `__next__`. Python calls this method whenever the iterator is consumed (by `for` loops, `map`, list comprehensions, etc.) to exhaust it one element at a time:

```python
@abstractmethod
def __next__(self):
    # Subclasses override this; the default simply signals exhaustion
    raise StopIteration
```
Something worth noting: unlike some other languages, Python occasionally uses exceptions for control flow. When an iterator is exhausted, it should raise a `StopIteration` exception. This is how Python knows internally that there are no more values.
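Here is a simplified sketch of roughly what a `for word in words:` loop does. It is an approximation written in Python for illustration only; the real loop is implemented inside the interpreter.

```python
words = ["iterators", "are", "fun"]
collected = []

# Roughly equivalent to: for word in words: collected.append(word)
iterator = iter(words)            # calls words.__iter__()
while True:
    try:
        word = next(iterator)     # calls iterator.__next__()
    except StopIteration:         # exhausted: leave the loop
        break
    collected.append(word)

print(collected)  # ['iterators', 'are', 'fun']
```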
Iterator Protocol: `__getitem__`
There is another caveat: an object does not have to implement the modern iterable/iterator interfaces to be iterable. If an object defines `__getitem__` and accepts integer indexes starting from 0, Python will happily iterate over it as well. This is known as the older iterator protocol. Because of its semantics, iteration under this approach ends when an `IndexError` is raised, rather than the usual `StopIteration`. Let's demonstrate with an example:
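```python
class ReversedEvenNumbers:
    def __init__(self, maximum):
        # Even numbers from `maximum` down to 2
        self.nums = [n for n in range(1, maximum + 1)[::-1] if n % 2 == 0]

    def __getitem__(self, index):
        # Raises IndexError once index runs past the end, which ends iteration
        return self.nums[index]


for n in ReversedEvenNumbers(15):
    print(n)  # 14, 12, 10, 8, 6, 4, 2 (one per line)
```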
As you can see, we have created something we can iterate over without implementing any of the modern iterator protocol. Indexing past the end of a list raises an `IndexError`, and Python treats that as the end of iteration, so this scenario is handled gracefully.
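Here is a sketch of the legacy protocol without a backing list. It uses a hypothetical `Squares` class that raises `IndexError` itself. It also shows a real quirk: `isinstance(obj, Iterable)` only detects `__iter__` (or explicit registration), so it reports `False` even though `iter()` succeeds. The `collections.abc` documentation recommends calling `iter(obj)` as the reliable way to test whether an object is iterable.

```python
from collections.abc import Iterable


class Squares:
    """Hypothetical sequence-style class: only __getitem__ is defined."""

    def __init__(self, count: int) -> None:
        self.count = count

    def __getitem__(self, index: int) -> int:
        if index >= self.count:
            # Signals the end of iteration under the legacy protocol
            raise IndexError(index)
        return index * index


squares = Squares(4)
print(list(squares))                  # [0, 1, 4, 9]
print(isinstance(squares, Iterable))  # False: the ABC only looks for __iter__
it = iter(squares)                    # ...yet iter() happily wraps it
print(next(it), next(it))             # 0 1
```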
Iterator Protocol: Modern Example
We have seen the older iterator protocol in action; now let's implement something more modern. We will use the abstract base classes to create our own custom iterator. We will also explain some of the magic behind Python's virtual subclassing via `ABCMeta.register` and `__subclasshook__`.
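Here is a minimal sketch of both mechanisms, using two hypothetical classes. `HasIter` is recognised as an `Iterable` purely because `Iterable.__subclasshook__` finds an `__iter__` method. `LegacySequence` is iterable only via `__getitem__`, so it is not recognised until it is explicitly registered with `Iterable.register`. Every ABC inherits this `register` method from `ABCMeta`.

```python
from collections.abc import Iterable


class HasIter:
    """Hypothetical class: defines __iter__ but never inherits from Iterable."""

    def __iter__(self):
        return iter(())


class LegacySequence:
    """Hypothetical class: iterable only through the old __getitem__ protocol."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return index


print(issubclass(HasIter, Iterable))         # True: accepted by __subclasshook__
print(issubclass(LegacySequence, Iterable))  # False: no __iter__ to detect

Iterable.register(LegacySequence)            # declare a virtual subclass
print(issubclass(LegacySequence, Iterable))  # True
print(Iterable in LegacySequence.__mro__)    # False: no real inheritance involved
```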
In this example, we will create a word iterator from a user-provided sentence. Read on past this topic to understand why our `Sentence` class does not have to explicitly inherit from `collections.abc.Iterator` (a little sprinkle of Python magic!):
```python
# A typical first approach (albeit naive)
class Sentence:
    def __init__(self, sentence: str) -> None:
        self.word_list = sentence.split()
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.index >= len(self.word_list):
            raise StopIteration
        value = self.word_list[self.index]
        self.index += 1
        return value
```
When starting out, you might think something like this is pretty good. However, there is a caveat you should be aware of: each time `iter(iterable)` is called, it should return a fresh iterator. Our `Sentence` returns itself instead, so consider what happens when the same object is iterated over more than once.
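Here is a minimal sketch of the problem. It repeats the naive `Sentence` class so the snippet runs on its own, and the sample sentence is arbitrary:

```python
class Sentence:
    def __init__(self, sentence: str) -> None:
        self.word_list = sentence.split()
        self.index = 0

    def __iter__(self):
        return self  # always the same object, with its position already advanced

    def __next__(self):
        if self.index >= len(self.word_list):
            raise StopIteration
        value = self.word_list[self.index]
        self.index += 1
        return value


s = Sentence("the quick brown fox")
print(list(s))             # ['the', 'quick', 'brown', 'fox']
print(list(s))             # []: the shared iterator is already exhausted
print(iter(s) is iter(s))  # True: no fresh iterator is ever created
```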
We need to create a fresh iterator each time Python calls `__iter__` on our object. Let's patch that up first.
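One way to patch it is sketched below, under some assumptions. The name `RevisedSentence` comes from the text that follows, but its body and the helper `SentenceIterator` class are hypothetical. The idea is to split the two roles. The iterable builds a brand-new iterator object on every `__iter__` call, and only that iterator tracks the position.

```python
class RevisedSentence:
    """Iterable: hands out a brand-new iterator on every __iter__ call."""

    def __init__(self, sentence: str) -> None:
        self.word_list = sentence.split()

    def __iter__(self):
        return SentenceIterator(self.word_list)


class SentenceIterator:
    """Hypothetical helper: the iterator, which owns the position state."""

    def __init__(self, word_list: list) -> None:
        self.word_list = word_list
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self) -> str:
        if self.index >= len(self.word_list):
            raise StopIteration
        value = self.word_list[self.index]
        self.index += 1
        return value


s = RevisedSentence("the quick brown fox")
print(list(s))  # ['the', 'quick', 'brown', 'fox']
print(list(s))  # ['the', 'quick', 'brown', 'fox'] again: each pass gets a fresh iterator
```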
Better! Each time we ask our custom `RevisedSentence` class for an iterator, we can access all the values. But can it be improved any further? Well, Python ships with a ton of built-in iterables and iterators, and in this kind of scenario we can simply piggyback off them:
```python
class SuperSentence:
    def __init__(self, sentence: str) -> None:
        self.word_list = sentence.split()

    def __iter__(self):
        # Delegate to the list's own iterator: a fresh one on every call
        return iter(self.word_list)
```
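Here is a quick usage sketch for `SuperSentence`, with the class repeated so it runs standalone (the sample sentence is arbitrary). Because `__iter__` delegates to `iter(self.word_list)`, each call hands back a new list iterator. The ABC checks pass without any explicit inheritance.

```python
from collections.abc import Iterable, Iterator


class SuperSentence:
    def __init__(self, sentence: str) -> None:
        self.word_list = sentence.split()

    def __iter__(self):
        return iter(self.word_list)


s = SuperSentence("the quick brown fox")
print(list(s))                  # ['the', 'quick', 'brown', 'fox']
print(list(s))                  # same again: every pass gets a new list iterator
print(iter(s) is iter(s))       # False
print(isinstance(s, Iterable))  # True, without inheriting from Iterable
print(isinstance(s, Iterator))  # False: SuperSentence has no __next__
```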
Iterator Protocol: Virtual & Subclasshook
…