Iterators and Generators

The yield Statement

We’ve made extensive use of the relationship between the for statement and various kinds of iterable containers without looking too closely at how this really works.

In this chapter, we’ll look at the semantics of iterators in Iterator Semantics; this includes their close relationshp to an iterable container, and the for statement. We can then look at the semantics of generator functions in Generator Function Semantics, and talk about the syntax for defining a generator function in Defining a Generator Function.

We’ll look at other built-in functions we use with iterators in Generator Functions.

We’ll review statements related to the use of iterators in Generator Statements.

We’ll provide more places where iterators are used in Iterators Everywhere, as well as an in-depth example in Generator Function Example.

When we see how to define our own classes of objects, we’ll look at creating our own iterators in Creating or Extending Data Types.

Iterator Semantics

The easiest way to define an iterator (and the closely-related concept of generator function) is to look at the for statement. The for statement makes use of a large number of iterator features. This statement is the core use case for iterators, and we’ll use it to understand the interface an iterator must provide.

Let’s look at the following snippet of code.

for i in ( 1, 2, 3, 4, 5 ):
    print i

Under the hood, the for statement engages in the following sequence of interactions with an iterable object (the tuple (1,2,3,4,5)).

  1. The for statement requests an iterator from the object. The for statement does this by evaluating the iter() function on the object in the in clause.

    The working definition of iterable is that the object responds to the iter() function by returning an iterator. All of the common containers (str, list, tuple, dict, set) will respond to the iter() function by returning an iterator over the items in the container. A dict iterator will yield the keys in the mapping.

  2. The for statement uses the next() function to evaluate the the iterator’s next() method and assigns the resulting object to the target variable. In this case, the variable i is assigned to each object.

  3. The for statement evaluates the suite of statements. In this case, the suite is just a print statement.

  4. The for statement continues steps 2 and 3 until an exception is raised.

    If the exception is a StopIteration, this is handled to indicate that the loop has finished normally.

The for statement is one side of the interface; the other side is the iterator object itself. From the above scenario, we can see that an iterator must define a __next__() method that the for statement can use. This method does one of two things.

  • Returns the next item from a sequence (or other container) or
  • Raises the StopIteration exception.

To do this, an iterator must also maintain some kind of internal state to know which item in the sequence will be delivered next.

When we describe a container as iterable, we mean that it responds to the built-in iter() function by returning an iterator object that can be used by the for statement. All of the sequence containers return iterators; set, dict and files also return iterators. In the case of a dict, the iterator returns the dict keys in no particular order.

Iterators in Python. As noted above, all of the containers we’ve seen so far have the iterable interface. This means that the container will return an iterator object that will visit all the elements (or keys) in the container.

It turns out that there are many other uses of iterators in Python. Many of the functions we’ve looked at work with iterators.

We’ll return to this in Iterators Everywhere.

Defining Our Own Iterators. There are two ways to define our own iterators. We can create an object that has the iterator interface, or we can define a generator function. Under the hood, a generator function will have the iterator interface, but we’re saved from having to create a class with all the right method function names.

We’ll look at Generator Functions in Generator Function Semantics.

We’ll look at defining an Iterator class in Data + Processing = Objects.

Generator Function Semantics

A generator function is a function that can be used by the for statement as if it were an iterator. A generator looks like a conventional function, with one important difference: a generator includes the yield statement.

The essential relationship between a generator function and the for statement is the following.

  1. The for statement calls the generator. The generator begins execution and executes statements in the suite up to the first yield statement. The yield creates the initial value for the for statement to assign.

  2. The for statement applies the built-in next() function to the generator function’s hidden next() method. The value that was returned by the yield statement is assigned to the target variable.

  3. The for statement evaluates it’s suite of statements.

  4. The for statement applies the built-in next() function to the generator function’s hidden next() method. The generator resumes execution after the yield statement. When the generator function gets to another yield statement, this value creates a value for the for statement.

  5. The for statement continues steps 3 and 4 until the generator executes a return statement (or runs past the end of it’s suite). Either situation will raise the StopIteration exception.

    When a StopIteration is raised, it is handled by the for statement as a normal termination of the loop.

What we Provide. Generator function definition is similar to function definition (see Functions). We provide three pieces of information: the name of the generator, a list of zero or more parameters, and a suite of statements that yields the output values. The suite of statements must include at least one yield statement.

We evaluate a generator in a for statement by following the function’s name with () enclosing any argument values. The Python interpreter evaluates the argument values, then applies the generator. This will start execution of the the generator’s suite.

Once the generator has started, the generator and the for statement pass control back and forth. The generator will yield objects and the for statement consumes those objects.

This back-and-forth between the for statement and the generator means that the generator’s local variables are all preserved. In other words, a generator function has a peer relationship with the for statement; it’s local variables are kept when it yields a value. The for suite and the generator suite could be called coroutines.

Example: Using a Generator to Consolidate Information. Lexical scanning and parsing are both tasks that compilers do to discover the higher-level constructs that are present in streams of lower-level elements. A lexical scanner discovers punctuation, literal values, variables, keywords, comments, and the like in a file of characters. A parser discovers expressions and statements in a sequence of lexical elements.

Lexical scanning and parsing algorithms consolidate a number of characters into tokens or a number of tokens into a statement. A characteristic of these algorithms is that some state change is required to consolidate the inputs prior to creating each output. A generator provides these characteristics by preserving the generator’s state each time an output is yielded.

In both lexical scanning and parsing, the generator function will be looping through a sequence of input values, discovering a high-level element, and then yielding that element. The yield statement returns the sequence of results from a generator function, and also saves all the local variables, and even the location of the yield statement so that the generator’s next request will resume processing right after the yield .

Defining a Generator Function

The presence of the yield statement in a function means that the function is actually a generator object, and will have the an iterator-like interface built automatically. In effect it becomes a stateful object with a next() method defined – so it will work with the next() function and for statement – and it will raise the StopIteration exception when it returns.

The syntax for a function definition is in Function Definition: The def and return Statements ; a generator is similar.

def name ( parameter 〈, ... 〉 ):
    suite

The suite of statements must include at least one yield statement.

The yield statement specifies the values emitted by the generator. Note that the expression is required.

yield expression

If a return statement is used in the function, it ends the generator by raising the StopIteration exception to alert the for statement. For obvious reasons, the return statement cannot return a value.

Here’s a complete, but silly example of a generator.

def primes():
    yield 2
    yield 3
    yield 5
    yield 7
    yield 11
    return

In this case, we simply yield a fixed sequence of values. After yielding five values, it exits. Here’s how this generator is used by a for statement.

>>> for p in primes():
...     print p
2
3
5
7
11

Generator Functions

The iter() function can be used to acquire an iterator object associated with a container like a sequence, set, file or dict. We can then manipulate this iterator explicitly to handle some common situations.

iter(iterable) → iterator
Returns the iterator for an object. This iterator interacts with built-in types in obvious ways. For sequences, this will return each element in order. For sets, it will return each element in no particular order. For dictionaries, it will return the keys in no particular order. For files, it will return each line in order.

Gettimg an explicit iterator – outside a for statement – is handy for dealing with data structures (like files) which have a head-body structure. In this case, there are one or more elements (the head) which are processed one way and the remaining elements (the body) which are processed another way.

We’ll return to this in detail in Files. For now, here’s a small example.

>>> someSeq = range(2,20,2)
>>> seq_iter = iter(someSeq)
>>> next(seq_iter)
2
>>> for value in seq_iter:
...     print value,
...
4 6 8 10 12 14 16 18
  1. We create a sequence, someSeq. Any iterable object would work here; any sequence, set, dict or file.
  2. We create the iterator for this sequence, and assign it to seq_iter. This object has a next() method which is used by the next() function and the for statement.
  3. We call next(seq_iter) explicitly to get past one heading item in the sequence.
  4. We then provide the iterator to the for statement. The for statement repeatedly evaluates the next() function on the iterator object and executes its suite of statements.

Generator Statements

What the for statement really does

In Iterative Processing: The for Statement, we defined a for statement using the following summary:

for variable in iterable :
    suite

We glossed over the iterable, showing how to create simple sequence objects using the range() function or explicit list literals.

At this point, we can use a number of data structures that are “iterable”: they respond the the iter() function by creating an iterator.

Also, we can define generator functions, which are also iterable.

The Secret World of for. Once we’ve looked at generator functions and iterators, we can see what the for statement really does. The purpose of the for statement is to visit each value yielded by an iterator, assigning each value to the variable.

Note that there are two parts to this.

First, the for variable in object evalates iter(object) to get an iterator. Objects will return an an iterator all primed and ready to yield. A generator function will – effectively – return itself, all primed and ready to yield.

Second, the iterator object (or generator function) must yield a sequence of values.

Looking forward, we’ll see many additional applications of the way the for statement works. As we look at designing our own objects in Data + Processing = Objects, we’ll want to assure that our objects work well with the for statement, also.

Iterators Everywhere

Iterators are ubiquitous. We have – up to this point – been breezy and casual about all the places iterators are used.

We’ve looked at many functions (max(), min(), any(), all(), sum(), sorted(), reversed(), and enumerate()) which apply to all the various container classes. Actually, these functions all apply to iterators and our containers return the iterators these functions expect.

As a simple example, we can define our own version of enumerate().

def enumerate( iterable, start=0 ):
    for item in iterable:
        yield start, item
        start += 1

That’s all that’s required to write a function which works with an iterable, and is itself an iterator. This style of programming is called functional, and is beyond the scope of this book.

Additionally, we’ve looked at the range() function without looking too closely at it’s sibling xrange().

Important

Python 3 and Range

In Python 3, the range() function – which created a list object – will be replaced with an iterator.

To create a list object, you’ll do this

someList = list( range( 6 ) )

In effect, the xrange() iterator will be renamed range(). The legacy range() function will go away.

It turns out that we’ve been making heavy use of iterators. Functions like sorted(), reversed(), any(), all(), and sum() all work with iterators not simply list objects.

We’ll look at how to create our own iterator objects in Collection Special Method Names for Iterators and Iterable.

Generator Function Example

Let’s look at an example which summarizes some details, yielding the summaries. Assume we have the list of tuples named spins. We want to know how many red spins separate a pair of black spins, on average. We need a function which will yield the count of gaps as it examines the spins. We can then call this function repeatedly to get the gap information.

generator.py

spins = [('red', '18'), ('black', '13'), ('red', '7'),
    ('red', '5'), ('black', '13'), ('red', '25'),
    ('red', '9'), ('black', '26'), ('black', '15'),
    ('black', '20'), ('black', '31'), ('red', '3')]

def countReds( aList ):
    count= 0
    for color,number in aList:
        if color == 'black':
            yield count
            count= 0
        else:
            count += 1
    yield count

gaps= [ gap for gap in countReds(spins) ]
print gaps
  1. The spins variable defines our sample data. This might be an actual record of spins.

  2. We define our gapCount() function. This function initializes count to show the number of non-black’s before a black. It then steps through the individual spins, in the order presented. For non-black’s, the count is incremented.

  3. For black spins, however, we yield the length of the gap between the last black. When we yield a result, the generator produces a result value, and also saves all the other processing information about this function so that it can be continued from this point.

    When the function is continued, it resumes right after the yield statement: the count will be reset, and the for loop will advance to examine the next number in the sequence.

  4. When the sequence is exhausted, we also yield the final count. The first and last gap counts may have to be discarded for certain kinds of statistical analysis.

  5. This gaps statement shows how we use the generator. In this case, we create a list comprehension from the results; the for clause will step through the values yielded by the generator until the it exits normally. This sequence of values is collected into a list that we can the use for statistical analysis.

Generator Exercises

  1. The Sieve of Eratosthones (Again). Look at The Sieve of Eratosthones and The Sieve of Eratosthones. We created a list or a set of candidate prime numbers.

    This exercise has three parts: initialization, generating the list (or set) or prime numbers, then reporting. In the list version, we had to filter the sequence of boolean values to determine the primes. In the set version, the set contained the primes.

    Within the Generate step, there is a point where we know that the value of p is prime. At this point, we can yield p. If we yield each value as we discover it, we eliminate the entire “report” step from the function.

  2. The Generator Version of range(). The range() function creates a sequence. For very large sequences, this consumes a lot of memory. You can write a version of range which does not create the entire sequence, but instead yields the individual values. Using a generator will have the same effect as iterating through a sequence, but won’t consume as much memory.

    Define a generator, genrange(), which generates the same sequence of values as range(), without creating a list object.

    Check the documentation for the built-in function xrange().

  3. Prime Factors. There are two kinds of positive numbers: prime numbers and composite numbers. A composite number is the product of a sequence of prime numbers. You can write a simple function to factor numbers and yield each prime factor of the number.

    Your factor() function can accept a number, n, for factoring. The function will test values, f, between 2 and the square root of n to see if the expression n % f == 0 is true. If this is true. then the factor, f, divides n with no remainder; f is a factor.

    Don’t use a simple-looking for -loop; the prime factor of 128 is 2, repeated 7 times. You’ll need to use a while loop.

Table Of Contents

Previous topic

Exceptions

Next topic

Files

This Page