Common List Design Patterns

Digging a Little Deeper

This chapter presents some common processing patterns for lists. In The One-Two Punch: Lists of Tuples we describe the relatively common Python data structure built from lists of tuples. We’ll cover a powerful list construction method called a list comprehension in List Construction Shortcuts. In Sorting a List: Expanding on the Rules we cover some advanced sequence sorting. In Tables and Matrices – More Multi-Dimensional Loop-the-Loops we cover simple multidimensional sequences.

Even more complex data structures are available. Numerous modules handle the sophisticated representation schemes described in the Internet standards (called Requests for Comments, RFC’s). Later on, we’ll take a quick look at these in Essential Modules : The Python Library.

The One-Two Punch: Lists of Tuples

Lists of tuples are surprisingly common. In other languages, like Java, we are forced to either use the too-complex built-in arrays or create an even more complex class definition to simply keep a few values together. Our canonical examples involve simple coordinate pairs for 2-dimensional or 3-dimensional geometries. Additional examples might includes the 3 codes for red, green and blue that define a color. Or, for printing, the four color tuple of the values for cyan, magenta, yellow and black.

As an example of using red, green, blue tuples, we may have a list of individual colors that looks like the following. Here, we’ve defined three colors – black, a dark grey, a purple – and assigned this list of colors to the variable colorScheme.

colorScheme = [ (0,0,0), (0x20,0x30,0x20), (0x80,0x40,0x80) ]

A interesting form of the for statement uses multiple assignment to work with a list of tuples. Consider the following example which assigns r, g and b from each element of the 3-tuple in the list. We can then do calculations on the three values independently.

colorScheme = [ (0,0,0), (0x20,0x30,0x20), (0x80,0x40,0x80) ]
for r,g,b in colorScheme:
    print "color (%d,%d,%d)" % ( r, g, b )
    print "opposite (%d,%d,%d)" % ( 255-r, 255-g, 255-b )

This is equivalent to the following. In this example, we have the for statement assign each item in the list to the variable color, and then we use a separate multiple assignment to decompose the for tuple in r, g and b.

colorScheme = [ (0,0,0), (0x20,0x30,0x20), (0x80,0x40,0x80) ]
for color in colorScheme:
    r, g, b = color
    print "color (%d,%d,%d)" % ( r, g, b )
    print "opposite (%d,%d,%d)" % ( 255-r, 255-g, 255-b )

The items() function of a dictionary transforms a dictionary to a sequence of tuples. We’ll cover dictionaries in Mappings : The dict. This is a teaser for some of what we’ll see there.

d = { 'red':18, 'black':18, 'green':2 }
for c,f in d.items():
    print "%s occurs %f" % (c, f/38.0)

The zip() built-in function interleaves two or more lists to create a list of tuples from the two lists. This is not terribly useful, but we’ll use it to build dictionaries.

List Construction Shortcuts

Python provides a mechanism to construct a list called a list comprehension or list maker. A list comprehension uses a generator expression (similar to the for and if statements) to create a new list. The generator expression allows us to write a rule rather than write each individual value in a list.

Comprehensions implement the basic map and filter iteration patterns. See Patterns of Iteration for more information on these iteration design patterns. A comprehension doesn’t implement the reduction pattern very well.

Map Processing. A list comprehension looks like list literal. It does this by enclosing a generator expression in [ ]‘s. Here’s the simplest form, used to do a mapping.

[ expression  for-clause  ]

The expression is any expression.

The for-clause mirrors the for statement. The big difference is that a it doesn’t have a complete suite of statements; it is just the target variable and the iterable sequence of values.

for variable  in iterable

The overall generator expression executes the for loop; for each iteration, it evaluates the expression and yields value. The list comprehension uses that sequence of values to create the resulting list.

Here are some examples.

>>> import random

>>> [ 3*x+2 for x in range(12) ]
[2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35]
>>> [ (x,x) for x in (2,3,4,5) ]
[(2, 2), (3, 3), (4, 4), (5, 5)]
>>> [ random.random() for x in range(5) ]
[0.4527184178006578, 0.84888059794845783, 0.21016399448987311, 0.80816095098407259, 0.87693626640363287]

A list comprehension behaves like the following loop:

r= []
for  target in  sequence :
    r.append(  expr )

The basic process, then, is to iterate through the sequence in the for-clause, evaluating the expression, expression. The values that result are assembled into the list.

If the expression depends on the for-clause target variable, the expression is a map from the for-clause variable to the resulting list. If the expression doesn’t depend on the for-clause target value, each time we evaluate the expression we’ll get the same value.

Here’s an example where the expression depends on the for-clause. This is a mappings from the range(10) to the final list.

>>> a= [ v*2+1 for v in range(10) ]
>>> a
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

This creates the first 10 odd numbers. It starts with the sequence created by range(10). The for-clause assigns each value in this sequence to the target variable, v. The expression, v*2+1, is evaluated for each distinct value of v. The expression values are assembled into the resulting list.

Typically, the expression depends on the variable set in the for-clause. Here’s an example, however, where the expression doesn’t depend on the for-clause.

b= [ 0 for i in range(10) ]

This creates a list of 10 zeros. Because the expression doesn’t depend on the for-clause, this could also be done as

b= 10*[0]

Filter Processing. A comprehension can also have an if-clause. This acts as a filter to determine which elements belong to the list and which elements do not belong.

The more complete syntax for a list comprehension is as follows:

[  expr  for-clausefor-clause |  if-clause 〉 ... ]

The expr is any expression. The for-clause mirrors the for statement:

for  variable  in  sequence

The if-clause mirrors the if statement.

if filter

This syntax summary shows that the first for-clause is required. This can be followed by either for-clause‘s or if-clause‘s. The | means that we can use either a for-clause or an if-clause .

This syntax summary shows a ... which means that you can repeat as many for-clause‘s and if-clause‘s as you need. We’ll stick to the most common form, which is a single if-clause to create a filter.

Note that there’s no , or other punctuation; the various for-clauses and if-clauses are simply separated by spaces.

Here is an example that creates the list of hardways rolls, which excludes two 2’s and two 12’s. The for loop creates a sequence of six numbers (from 1 to 6), assigning each value to x . The if filter only keeps values where x+x is not 2 or 12. All other values are used to create a tuple of (x,x).

hardways = [ (x,x) for x in range(1,7) if x+x not in (2, 12) ]

These more complex list comprehensions behave like the following loop:

r= []
for  target in  sequence :
    if  filter :
        r.append(  expr )

The basic process, then, is to iterate through the sequence in the for-clause, evaluating the if-clause. When the if-clause filter is True, evaluate the expression, expr. The values that result are assembled into the list.

>>> v = [ (x,2*x+1) for x in range(10) if x%3==0 ]
>>> v
[(0, 1), (3, 7), (6, 13), (9, 19)]

This works as follows:

  1. The function range(10) creates a sequence of 10 values.
  2. The for-clause iterates through the sequence, assigning each value to the local variable x.
  3. The if-clause evaluates the filter function, x%3==0. If it is false, the value is skipped. If it is true, the expression, at (x,2*x+1), is evaluated.
  4. This expression creates a 2-tuple of the value of x and the value of 2* x +1.
  5. The expression results (a sequence of tuples) are assembled into a list, and assigned to v.

Tip

Debugging List Comprehensions

List comprehensions have rather complex syntax. There are a number of ways to get SyntaxError messages. The expression, the for-clause and the overall structure of the expression, including balancing the [] are all sources of problems. Debugging a list comprehension is most easily done by building up the list comprehension one clause at a time.

A simple comprehension has two clauses: the expression clause and the for-clause. You can try each part out in IDLE individually.

  • If the for-clause is wrong, the original sequence will not be correct.
  • If the expression clause is not correct, the resulting sequence will be incorrect.

The for-clause of a list comprehension can be seen by entering just the for-clause as a separate statement. The expression clause can be evaluated for specific values to be sure that it works correctly.

For example, [ 2*i+1 for i in range(5) ] can be debugged in two parts. First, assure that range(5) produces the source sequence you expected. Second, assure that 2*i+1 works for values of i from 0 to 4.

Sorting a List: Expanding on the Rules

Let’s look at a common processing problem. Our source is a table of raw data in a spreadsheet. We want to do some processing that is a pain in the neck to do in the spreadsheet. We can transform this spreadsheet into a list of tuples for processing by Python. We can then write Python programs to manipulate this data, doing mappings, filterings and reductions as well as sorting and presentation in an easy-to-read report or summary.

For example, we have a spreadsheet with raw census data that looks like the following:

Code County State Jobs
001 Albany NY 162692
002 Allegany NY 11986
...      
121 Wyoming NY 8722
123 Yates NY 5094

We can easily transform this raw data into a sequence of tuples that look like the following. There’s a sidebar on how to do this, if you’re not a spreadsheet wizard.

jobData= [
(001,'Albany','NY',162692),
(003,'Allegany','NY',11986),
...
(121,'Wyoming','NY',8722),
(123,'Yates','NY',5094),
]

Simple Sorting. Sorting this list can be done trivially with the list sort() method.

jobData.sort()

Note that this updates the jobData list in place. The sort() method specifically does not return a result. A common mistake is to say something like: a= b.sort(). This always sets the variable a to None.

This kind of sort will simply compare each tuple with each other tuple. This makes it very easy to use, if your tuple’s elements are in the right order. If you want to compare the elements of your tuple in a different order, however, you’ll need to do something extra.

Sorting By Another Column. Let’s say we wanted to sort by state name, the third element in the tuple. We want don’t want the naive comparison among tuples. We want a smarter comparison that looks at the elements we choose, not the first element in the tuple. We do this by giving a key function to the sort() method.

The key function returns an object or a simple sequence of the key values selected from each element to be sorted. In this case, we want the key function to return the third elements of our county jobs tuples.

def by_state( row ):
    return row[2]
jobData.sort( key=by_state )

Note that we pass the function object to the sort() method. A common mistake is to say jobData.sort( by_state() ). If we include the ()‘s, we evaluate the function by_state() once, which is a mistake.

We don’t want to evaluate the function; we want to provide the function to sort(), so that sort() can evaluate the function as many times as needed to sort the list.

Note that if we say by_state(), we evaluate sort3() without any argument values, which is also a type error. If we say by_state – naming the function instead of evaluating it – then sort() will properly call the function with the expected single argument.

Sorting By Multiple Fields. Another common process is to sort information by several key fields. Continuing this example, lets sort the list by state name and then number of jobs. This is sometimes called a multiple-key sort. We want our data in order by state. Within each state, we want to use the number of jobs to sort the data.

We do this by creating a tuple of the fields we want to use for sorting.

def by_state_jobs( row ):
    return ( a[2], a[3] )
jobData.sort( key=by_state_jobs )

The sort() method must compare elements of the sequence against each other. If the sort() method is given a key function, this function is called to create the sort comparison key for each element.

In our case, we’ve provided a function (by_state_jobs()) that extracts a tuple as the key. The tuple contains the state and the number of jobs from each row.

Tip

Debugging List Sorting

There are three kinds of problems that can prevent a customized sort operation from working correctly.

  • Our key function doesn’t have the right form. It must be a function that extracts the key from an item of the sequence being sorted.

    def key( item ):
        return something based on the item
    
  • The data in your list isn’t regular enough to be sorted. For example, if we have dates that are represented as strings like '1/10/56', '11/19/85', '3/8/87', these strings are irregular and won’t sort very nicely. As humans, we know that they should be sorted into year-month-date order, but the strings that Python sees begin with '1/', '11' and '3/', with an alphabetic order that may not be what you expected.

    To get this data into a usable form, we have to normalize it. Normalizing is a computer science term for getting data into a regular, consistent, usable form. In our example of sorting dates, we’ll need to use the time or datetime modules to parse these strings into proper Python objects that can be compared.

Ascending vs. Descending. The default sort is ascending order. We can sort into descending order by adding the reverse keyword parameter to the sort.

jobData.sort( key=by_state_jobs, reverse=True )

By default, reverse is False, giving us ascending order. When we set it to true, the list is sorted in reverse order; that is, descending.

The Lambda Shorthand. In reading other programs, you may see something like the following:

jobData.sort( key=lambda row: row[2] )

This lambda is a small, anonymous function definition. These are used sometimes because it saves having to create a function which is only used once in a single sort() operation. Mentally, you can rewrite this to the following:

# some unique name for the replacement
def aLambda( row ):
    # return followed by the original lambda expression
    return row[2]

# the original statement replaced with the new function
jobData.sort( key=aLambda  )

Tables and Matrices – More Multi-Dimensional Loop-the-Loops

Some situations demand multi-dimensional sequences. In a business we might have a budget with cost centers and months as two dimensions of a large table. We can put the months across the top of the table, and fill in the cost centers down the rows of the table.

One need for multi-dimensional sequences is the mathematical matrix operations. These are not obvious to the non-mathematical audience, so we’ll cover this later in the section and in some of the exercises.

Let’s look at a simple two-dimensional example that doesn’t involve a matrix. Instead it involves a tabular summary. When rolling two dice, there are 36 possible outcomes. We can tabulate these in a two-dimensional table with one die in the rows and one die in the columns:

  1 2 3 4 5 6
1 2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8 9
4 5 6 7 8 9 10
5 6 7 8 9 10 11
6 7 8 9 10 11 12

In Python, a multi-dimensional table can be done as a sequence of sequences. This table is a sequence of rows.

Each individual row, in turn is a sequence of individual cells. This allows us to use mathematical-like notation. Where the mathematician might say A_{i,j}, in Python we say A[i][j]. We want the row i from table A, and column j from that row.

Building a Table. We can build a table using a nested list comprehension. The following example creates a table as a sequence of sequences and then fills in each cell of the table.

table= [ [ 0 for i in range(6) ] for j in range(6) ]
print table
for d1 in range(6):
    for d2 in range(6):
        table[d1][d2]= d1+d2+2
print table
  1. Use a list comprehension to create a six by six table of zeros. Actually, the table is six rows. Each row has six columns.

    The comprehension can be read from inner to outer, like an ordinary expression. The inner list, [ 0 for i in range(6) ], creates a simple list of six zeros. The outer list, [ [...] for j in range(6) ] creates six copies of these inner lists.

  2. Print the grid of zeroes.

  3. Fill this list of lists with each possible combination of two dice. This is not the most efficient way to do this, but we want to illustrate several techniques with a simple example. We’ll look at each half in detail.

  4. Iterate over all combinations of two dice, filling in each cell of the table. This is done as two nested loops, one loop for each of the two dice. The outer for loop enumerates all values of one die, d1. The inner for loop enumerates all values of a second die, d2.

    Updating each cell involves selecting the row with table[d1]; this is a list of 6 values. The specific cell in this list is selected by ...[d2]. We set this cell to the number rolled on the dice, d1+d2+2. This program produced the following output.

[[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]]
[[2, 3, 4, 5, 6, 7], [3, 4, 5, 6, 7, 8], [4, 5, 6, 7, 8, 9],
[5, 6, 7, 8, 9, 10], [6, 7, 8, 9, 10, 11], [7, 8, 9, 10, 11, 12]]

Better-Looking Output. The printed list of lists is a little hard to read. The following loop would display the table in a more readable form.

>>> for row in table:
...     print row
...
[2, 3, 4, 5, 6, 7]
[3, 4, 5, 6, 7, 8]
[4, 5, 6, 7, 8, 9]
[5, 6, 7, 8, 9, 10]
[6, 7, 8, 9, 10, 11]
[7, 8, 9, 10, 11, 12]

As an exercise, we’ll leave it to the reader to add some features to this to print column and row headings along with the contents. As a hint, the "%2d" % value string operation might be useful to get fixed-size numeric conversions.

Summarizing A Table. Let’s summarize this two-dimensional table into a frequency table. The values of two dice range from 2 to 12. If we use a list with 13 elements, these elements will be identified with indexes from 0 to 12, allowing us to accumulate counts in this list.

fq= 13*[0]
print fq
for row in table:
    for c in row:
        fq[c] += 1
print fq[2:]
  1. We initialize the frequency table, fq, to be a list of 13 zeros.

  2. The outer for loop sets the variable row to each element of the original table variable. This decomposes the table into individual rows, each of which is a 6-element list.

  3. The inner for loop sets the variable c to each column’s value within the row. This decomposes the row into the individual values.

    We count the actual occurrences of each value, c by using the value as an index into the frequency table, fq. The increment the frequency value by 1.

This program produced the following output.

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]

Using Indexes. There is an alternative to this approach. Rather than strip out each row sequence, we could use explicit indexes and look up each individual value with an integer index into the sequence.

for i in range(6):
    for j in range(6):
        c= table[i][j]
        fq[ c ] += 1

The outer loop sets the variable i to the values from 0 to 5. The inner loop sets the variable i to the values from 0 to 5.

We use the index value of i to select a row from the table, and the index value of i to select a column from that row. This is the value, c. We then accumulate the frequency occurrences in the frequency table, fq.

The first version has the advantage of directly manipulating the Python objects, it is somewhat simpler. The second version, however, is more like common mathematical notation, and more like other programming languages. It is more complex because of a level of indirection. Instead of manipulating the Python sequence, we access the objects indirectly via their index in a sequence.

Matrix Addition. We use this latter technique for managing the mathematically defined matrix operations. Matrix operations are done more clearly with this style of explicit index operations. We’ll show matrix addition as an example, here, and leave matrix multiplication as an exercise in a later section.

m1 = [ [1, 2, 3, 0], [4, 5, 6, 0], [7, 8, 9, 0] ]
m2 = [ [2, 4, 6, 0], [1, 3, 5, 0], [0, -1, -2, 0] ]
m3= [ 4*[0] for i in range(3) ]

for i in range(3):
    for j in range(4):
        m3[i][j]= m1[i][j]+m2[i][j]

In this example we created two input matrices, m1 and m2, each three by four. We initialized a third matrix, m3, to three rows of four zeros, using a comprehension. Then we iterated through all rows (using the i variable), and all columns (using the j variable) and computed the sum of m1 and m2.

List Processing Exercises

List Comprehension Exercises

  1. All Dice Combinations.

    Write a list comprehension that uses nested for-clauses to create a single list with all 36 different dice combinations from (1,1) to (6,6).

  2. Temperature Table.

    Write a list comprehension that creates a list of tuples. Each tuple has two values, a temperature in Fahrenheit and a temperature in Celsius.

    Create one list for Fahrenheit values from 0 to 100 in steps of 5 and the matching Celsius values.

    Create another list for Celsius values from -10 to 50 in steps of 2 and the matching Fahrenheit values.

Sorting Exercises

  1. Unique Values In A Sequence.

    In Accumulating Unique Values, we looked at accumulating the unique values in a sequence. Sorting the sequence leads to a purely superficial simplification. Sorting is a relatively expensive operation, but for short sequences, the cost is not much higher than the version already presented.

    Given an input sequence, seq, we can easily sort this sequence. This will put all equal-valued elements together. The comparison for unique values is now done between adjacent values, instead of a lookup in the resulting sequence.

    Unique Values of a Sequence, seq.

    1. Initialize. Set result \gets list().

      Sort the input sequence, seq.

    2. Loop. For each value, v in seq.

      Already in result ? If v is the last element in result: ignore it.

      If v is not the last element in result:

      Append v to the sequence result.

    3. Result. Return list result, which has unique values from seq.

  2. Portfolio Reporting.

    In Blocks of Stock, we presented a stock portfolio as a sequence of tuples. Plus, we wrote two simple functions to evaluate purchase price and total gain or loss for this portfolio.

    Develop a function (or a lambda form) to sort this portfolio into ascending order by current value (current price * number of shares). This function (or lambda) will require comparing the products of two fields instead of simply comparing two fields.

Multidimensional Exercises

  1. Matrix Formatting.

    Given a 6 \times 6 matrix of dice rolls, produce a nicely formatted result. Each cell should be printed with a format like "| %2s" so that vertical lines separate the columns. Each row should end with an '|'. The top and bottom should have rows of “—-“‘s printed to make a complete table.

  2. Three Dimensions.

    If the rolls of two dice can be expressed in a two-dimensional table, then the rolls of three dice can be expressed in a three-dimensional table. Develop a three dimensional table, 6 x 6 x 6, that has all 216 different rolls of three dice.

    Write a loop that extracts the different values and summarizes them in a frequency table. The range of values will be from 3 to 18.

Sequence FAQ’s

How can there even be an immutable data structure? That sounds like a contradiction.

Let’s be sure to separate the data object’s immutability from setting the value of a variable. A variable’s value can be a series of different immutable objects. In many respects, changing the value of a variable is what defines the state of our program, and switching that value from one object to another object is what our program is supposed to do. Other times, the object is large, or complex, and it is somewhat more efficient to alter the object rather than create a new one.

Look at the following example. Here, the variable b changes from "some long" to "some long string".

a= "some"
b= a + " long"
b= b + " string"

None of the string objects ("some", " long" or " string") change. There are two new strings that are built by this program: "some long" and "some long string". Neither of these change after they are built as the program runs.

When the program ends, two strings ("some" and "some long string") are associated with variables a and b. The remaining strings are quietly removed from memory, since they are no longer needed.

While the strings themselves are immutable, the values assigned to our variables reflect our intent to assemble a long string from smaller pieces.

Since lists do everything tuples do and are mutable, why bother with tuples?

Immutable tuples are more efficient than variable-length lists. There are fewer operations to support. Once the tuple is created, it can only be examined. When it is no longer referenced, the normal Python garbage collection will release the storage for the tuple.

Many applications rely on fixed-length tuples. A program that works with coordinate geometry in two dimensions may use two-tuples to represent ( x , y ) coordinate pairs. Another example might be a program that works with colors as three-tuples, ( r , g , b ), of red, green and blue levels. A variable-length list is not appropriate for these kinds of fixed-length tuple.

Wouldn’t it be more efficient to allow mutable strings?

Variable length strings are most commonly implemented by imposing an upper limit on a string’s length. Having this upper limit is unappealing because it leads to the possibility of a program having data larger than this upper limit. Indeed, this “buffer overflow” problem is at the root of many security vulnerabilities.

This fixed upper limit model is embodied in the C string libraries. Strings can vary in length, but require the programmer set a fixed upper bound on the length. This amount of storage is allocated, and the string can vary up to that limit. While this provides excellent performance, it does impose an arbitrary restriction. Some languages (Java for example) stop gracefully when the string limit is exceeded, others (C for example) behave badly when strings exceed their declared length.

In effect, Python has strings of arbitrary size. Python does this by creating new strings instead of attempting to modify existing strings. Python is freed from this security issues associated with variable length strings and the resulting buffer overflow problem.

I noticed map, filter and reduce functions in the Python reference manual. Shouldn’t we cover these?

These functions are actually rather difficult to describe in this context because they reflect a view of programming that is fundamentally different from the approach we’ve taken in this book. We’re covering programming from an imperative point of view. These three functions reflect the functional viewpoint. Both approaches are suitable for newbies. We had to pick one, and the coin toss came up imperative.

In the long run, these functions aren’t that useful. Why? Because the List Comprehension (see List Construction Shortcuts) does everything that the map() and filter() functions do, making them unnecessary. The reduce design is often much more clearly expressed with an explicit for or while statement than with the reduce() function.

Table Of Contents

Previous topic

Flexible Sequences : the list

Next topic

The Unexpected : The try and except statements

This Page