Flexible Sequences : The list¶

A list is a mutable sequence of items. It differs in important ways from an immutable tuple. Yet, being a sequence, it shares a large number of common features with other data structures.

We’ll look at some common patterns of list processing in From Outlines to Bank Lines – Stacks and Queues.

We’ll provide two sets of exercises, because lists are that important. The first set, List Exercises, covers the basics. The second set, More Advanced List Exercises, is a number of more challenging exercises to be sure you have a chance to explore all the things lists can do.

What Does Python Mean by “List?”¶

A list is a variable length sequence of Python objects. This is the most flexible kind of sequence; it can contain any kind of Python data object. It can grow or shrink as needed. We often use lists to collect a set of data elements like cards in a blackjack hand: we get two cards to start, and then more and more cards as we ask for a hit.

Let’s look at this definition in detail.

• Since a list is a sequence, all of the common operations and built-in functions of sequences apply. This includes +, * and [].
• Since a list is mutable, it can be changed. Items can be added to the list or removed from the list. Unlike strings and tuples, these operations do not create a new list, but change the state of the existing list.
• Since lists are an extension to the basic sequence type, lists have unique method functions.

Let’s look at a list of Roulette wheel spins. Here’s a depiction of a list of four items, the Python value is ["red", "red", "black", "red"]. Each item has a position that identifies where it is in the list.

 position 0 1 2 3 item ‘red’ ‘red’ ‘black’ ‘red’

Because a list is mutable, new items can be added to the list. These new items can be inserted in any position. We can append to the end of the list. We can put elements into the list by inserting before any of existing positions. If we insert before position zero, we will extend the list at the beginning. In addition to extending the list, we can replace any of the items in the list.

How We Write Lists¶

Lists are created by surrounding the list of objects with [] and separating the objects with commas (,). An empty list is simply []. As with tuples, an extra comma at the end of the list is graciously ignored.

Examples:

myList = [ 2, 3, 4, 9, 10, 11, 12 ]
history = [ ]

myList: A simple list of seven values; all of which happen to be numbers. An empty list; a list can be expanded by appending new elements.

List elements do not have to be the same type. A list can be a mixture of any Python data types, including lists, tuples, strings and numeric types.

Important

But Wait!

But wait! you say. The [] characters are used to pick items out of tuples and characters out of strings. How can they also be used to define a new list object?

While it is confusing to use a single punctuation mark in two ways, this isn’t the only punctuation mark that suffers from this fate. When we introduced tuples, we looked at ()s and how Python is able to distinguish functions, expressions, and a tuples: math.sqrt(42), (42) and (42,) are a function expression, a simple expression and a tuple, respectively. The ., also serves several purposes: it punctuates floating-point numbers, and it also separates the module name from an object in that module, and it separates an object from the name of an object method.

In the case of []s, the context defines precisely how to interpret these characters.

• When you have something like a[b], this is item selection from a sequence.
• When you have [b] by itself, this is a one-element list.
• When you have [9, 8, 7][2], the [9, 8, 7] is a list; The [2] selects an item from that list.

Lists permit a sophisticated kind of display called a comprehension. We’ll revisit this in some depth in List Construction Shortcuts. We have to mention it now because a comprehension has syntax similar to a literal list.

As a teaser, consider the following:

>>> [ 2*i+1 for i in range(6) ]
[1, 3, 5, 7, 9, 11]


This statement creates a list using a list comprehension. A comprehension starts with a candidate list (range(6), in this example) and derives the list values from the candidate using an expression (2*i+1 in this example). A great deal of power is available in comprehensions.

This is a kind of literal list of valiues, using the [] syntax; it can be used anywhere a literal list is appropriate.

List Factory Functions¶

In addition to literal values, the following function also creates a list object.

list(sequence) → list

Creates a list from the items in sequence. If the sequence is omitted, an empty list is created.

>>> list()
[]
>>> list( ("black","red" ) )
['black', 'red']


In the second example, a two-element tuple ("black","red" )) – a kind of sequence – is transformed into a list of individual elements.

** The range() function is used heavily, primarily to control the for statement. Technically, it generates a list, so we include it here, after we introduced it briefly in The for Statement.

range([start], stop[, step]) → list

The arguments must be plain integers. If the step argument is omitted, it defaults to 1. If the start argument is omitted, it defaults to 0. The full form returns a list of plain integers [ start , start + step , start + 2 * step , ... ]. If step is positive, the last element is the largest start + i * step less than stop. If step is negative, the last element is the largest start + i * step greater than stop . step must not be zero (or else ValueError is raised).

Operations We Perform On Lists¶

Because a list is a sequence, the three standard sequence operations (+, *, []) can be performed with lists.

The + operator. The + operator creates a new list as the concatenation of the arguments.

>>> ["field"] + [2, 3, 4] + [9, 10, 11, 12]
['field', 2, 3, 4, 9, 10, 11, 12]


The * operator. The * operator between lists and numbers (number * list or list * number) creates a new list that is a number of repetitions of the input list.

>>> 2*["pass","don't","pass"]
['pass', "don't", 'pass', 'pass', "don't", 'pass']


The [] operator. The [] operator selects an item or a slice from the list. There are two forms for picking items or slices from a list.

This form extracts a single item.

list[index]


Items are numbered from 0 to len(list)-1. Items are also numbered in reverse from -len(list) to -1.

This extracts a slice, creating a new sequence from a sequence.

list[start:end]


Items from start to end-1 are chosen to create a new list as a slice of the original list; there will be end - start items in the resulting list. If start is omitted it is the beginning of the list (position 0), if end is omitted it is the end of the list (position -1).

For more information on how the numbering works for the [] operator, see Numbering from Zero.

In the following example, we’ve constructed a list, rolls where each of the six items in the list is a tuple object. Each of these tuple objects is a pair of dice. When we say rolls[2], we’re extracting the item at position 2, which is the third item from the list. In this example, it’s a “hard 4”, a pair of 2’s.

>>> rolls=[(6, 2), (5, 4), (2, 2), (1, 3), (6, 5), (1, 4)]
>>> rolls[2]
(2, 2)
>>> print(rolls[:3], 'split', rolls[3:])
[(6, 2), (5, 4), (2, 2)] split [(1, 3), (6, 5), (1, 4)]
>>> rolls[-1]
(1, 4)
>>> rolls[-3:]
[(1, 3), (6, 5), (1, 4)]


The % Operator. The string format operator works between string and list. We prefer to use str.format(), however.

Built-in Functions for Lists¶

A number of built-in functions create or deal with lists. The following functions apply to all sequences, including tuples and strings.

len(iterable) → integer

Return the number of items of a iterable (sequence, set or mapping).

>>> rolls=[(6, 2), (5, 4), (2, 2), (1, 3), (6, 5), (1, 4)]
>>> len(rolls)
6

max(sequence) → value

Returns the largest value in the iterable (sequence, set or mapping).

>>> rolls=[(6, 2), (5, 4), (2, 2), (1, 3), (6, 5), (1, 4)]
>>> max(rolls)
(6, 5)


Recall that tuples are compared element-by-element. The tuple (6, 5) has a first element that is greater than all but one other tuple, (6, 2). If the first elements are the same, then the second element is compared.

min(sequence) → value

Returns the smallest value in the iterable (sequence, set or mapping).

>>> rolls=[(6, 2), (5, 4), (2, 2), (1, 3), (6, 5), (1, 4)]
>>> min(rolls)
(1, 3)


Recall that tuples are compared element-by-element. The tuple (1, 3) has a first element that is less than all but one other tuple, (1, 4). If the first elements are the same, then the second element is compared.

Iteration Functions. These functions are most commonly used with a for statement to process list items.

enumerate(iterable) → iterator

Enumerate the elements of a set, sequence or mapping. This yields a sequence of tuples based on the original list. Each of the tuples has two elements: a sequence number and the item from the original list.

This is generally used with a for statement. Here’s an example:

>>> rolls=[(6, 2), (5, 4), (2, 2), (1, 3), (6, 5), (1, 4)]
>>> for position, roll in enumerate( rolls ):
...     print(position, sum(roll))
...
0 8
1 9
2 4
3 4
4 11
5 5

sorted( iterable [,cmp] [,key] [,reverse] ) → iterator

This iterates through an iterable object like a list in ascending or descending sorted order. Unlike the sort() method function, this does not update the list, but leaves it alone.

This is often used with a for statement. It can also be used with the list() function to create a copy of a list in sorted order.

Here’s an example:

>>> rolls=[(6, 2), (5, 4), (2, 2), (1, 3), (6, 5), (1, 4)]
>>> descending= list( sorted( rolls, reverse=True ) )
>>> descending
[(6, 5), (6, 2), (5, 4), (2, 2), (1, 4), (1, 3)]
>>> rolls
[(6, 2), (5, 4), (2, 2), (1, 3), (6, 5), (1, 4)]


Now there are two copies of the original list: rolls is in the original order; descending is in descending order.

reversed(sequence) → iterator

This iterates through a sequence in reverse order.

This is generally used with a for statement. Here’s an example:

>>> stats = [ (43,'red'), (52, 'black'), (5,'zero') ]
>>> for count, color in reversed( stats ):
...     print(count, color)
...
5 zero
52 black
43 red

zip(sequence, ...) → sequence

This creates a new sequence of tuples. Each tuple in the new sequence has values taken from the input sequences.

>>> color = [ "red", "green", "blue" ]
>>> level = [ 20, 30, 40 ]
>>> zip( color, level )
[('red', 20), ('green', 30), ('blue', 40)]


Aggregation Functions. These functions reduce a list to a single aggregate value.

sum(iterable) → number

Sum the values in the iterable (set, sequence, mapping). All of the values must be numeric.

>>> range(1,8*2,2)
[1, 3, 5, 7, 9, 11, 13, 15]
>>> sum(_)
64

all(iterable) → boolean

Return True if all values in the iterable (set, sequence, mapping) are equivalent to True.

The all() function is often used with List Comprehension, which we’ll look at in List Construction Shortcuts.

>>> compare_1 = [ 2<=3, 5<7, 22%2 == 0 ]
>>> all( compare_1 )
True
>>> compare_2 = [ 2 > 3, 5<7, 22%2 == 0 ]
>>> all( compare_2 )
False
>>> compare_2
(False, True, True)

any(iterable) → boolean

Return True if any value in the iterable (set, sequence, mapping) is equivalent to True.

The any() function is often used with List Comprehension, which we’ll look at in List Construction Shortcuts.

>>> roll = 7
>>> test = [ roll == 2, roll == 3, roll == 12 ]
>>> any( test )
False
>>> test.append( roll == 7 )
>>> test.append( roll == 11 )
>>> any( test )
True
>>> test
[False, False, False, True, False]


Comparing Two Lists¶

The standard comparisons (<, <=, >, >=, ==, !=, in and not in) work the same with all sequences: lists, tuples and strings. The list are compared element by element. If the corresponding elements are the same type, ordinary comparison rules are used. If the corresponding elements are different types, the type names are compared, since there is no other rational basis for comparison.

d1= random.randrange(6)+1
d2= random.randrange(6)+1
if d1+d2 in [2, 12] + [3, 4, 9, 10, 11]:
print("field bet wins on ", d1+d2)
else:
print("field bet loses on ", d1+d2)


This will create two random numbers, simulating a roll of dice. If the number is in the list of field bets, this is printed. Note that we assemble the final list of field bets from two other lists. In a larger application program, we might distinguish between the field bets based on different payout odds.

We have to note that comparing two lists which have very different contents may not be sensible. When we compare two strings, we can use this to put them into alphabetic order. In the case of comparing tuples, we generally compare tuples of the same length. For example, we might compare some three-tuples that encode red-green-blue colors. This is consistent with the ways we use tuples to represent a piece of data that has a fixed number of individual items.

In the case of lists, however, we have to be sure that we have an obvious meaning for the comparison. Python will allow us to compare any two list objects. As designers of programs, we have to be sure we are making a sensible comparison between objects that should be compared in the first place. We don’t want to have programs that do senseless things like compare a list of the 46 highest peaks in New York with the list of ingredients in Fettucini Alfredo.

Methods to Transform Lists¶

A list object has a number of member methods. These can be grouped arbitrarily into transformations, which change the list, and accessors, which returns a fact about a list.

Transformations. The following method functions make changes to the given list. With the exception of pop(), these method functions don not return a value.

Do not do this; it doesn’t do anything useful.

a = ['some','list','of']
a = a.append( 'values' )


The list.append() method function does not return a new value. It modifies the object. The return value happens to be None.

class list
list.append(object)

Update list l by appending object to end of the list.

>>> a=["red","orange","yellow"]
>>> a.append("green")
>>> a
['red', 'orange', 'yellow', 'green']

list.extend(sequence)

Extend the list by appending sequence elements. Note the difference from append(object), which treats the argument as a single list object.

>>> a=["red","orange","yellow"]
>>> a.extend(["green","blue"])
>>> a
['red', 'orange', 'yellow', 'green', 'blue']

list.insert(index, object)

Update list l by inserting sequenceobject before position index. If index is greater than len(list), the object is simply appended. If index is less than zero, the object is prepended.

>>> a=["red","yellow","green"]
>>> a.insert(1,"orange")
>>> a
['red', 'orange', 'yellow', 'green']

list.pop(index) → item

Remove and return item at index (default is the last element, with an index of -1). An exception is raised if the list is already empty. This is the opposite of append(). Further, this is both a transformation of the list as well as an accessor that returns an item from the list.

>>> a=["red","yellow","green","blue"]
>>> a.pop()
'blue'
>>> a
['red', 'yellow', 'green']

list.remove(value)

Remove first occurrence of value from list l. An exception is raised if the value is not in the list.

This example has a list of four initial values, a string, a number, the result of an expression (which will be a number), and a tuple. We’ll remove the tuple (4,3,"craps") from the list.

>>> a=["red",21,6*6,(4,3,"craps")]
>>> a.remove( (4,3,"craps") )
>>> a
['red', 21, 36]

list.reverse()

Reverse the items of the list l. This is done “in place”, it does not create a new list.

>>> a=["red","yellow","green","blue"]
>>> a.reverse()
>>> a
['blue', 'green', 'yellow', 'red']

list.sort([cmp, key, reverse])

Sort the items of the list . This is done “in place”, it does not create a new list. This does not return a value, either; it transforms the list.

If no comparison function (cmp) or key function (key) is provided, the items are simply compare and sorted into the desired order.

You can either provide a compare function, or a key function. It doesn’t make sense to provide both.

If a key function, key is given, it must extract some comparable value from the list element. We’ll return to this when we address lists of lists in Sorting a List: Expanding on the Rules.

If the reverse keyword, reverse, is True, the list is sorted in descending order. If reverse is omitted or set to False, the list is sorted in ascending order.

>>> a=["red","yellow","green","blue"]
>>> a
['red', 'yellow', 'green', 'blue']
>>> a.sort()
>>> a
['blue', 'green', 'red', 'yellow']


The list sort() transformation is very powerful. We’ll look at more sophisticated sorting options in Sorting a List: Expanding on the Rules. For now, let’s just look at the following simple examples. We’ll sort simple lists of numbers and strings just to show you how this works.

>>> a=  [ 10, 1, 3, 9, 4 ]
>>> a.sort()
>>> a
[1, 3, 4, 9, 10]
>>> b= [ "word", "topic", "subject", "part", "section", "chapter" ]
>>> b.sort()
>>> b
['chapter', 'part', 'section', 'subject', 'topic', 'word']


Accessors. The following method functions determine a fact about a list and return that as a value.

class list
list.count(value) → integer

Return number of occurrences of value in list l.

>>> a=["red","red","black","red"]
>>> a.count("red")
3
>>> a.count("green")
0
>>> a.count("black")
1

list.index(value) → integer

Return index of first occurrence of value in the list. If the item is not found, this will raise a ValueError.

If the given value is in the list, then list[ list.index(value) ] is value.

>>> a=["red","yellow","green","blue"]
>>> a.sort()
>>> a.index('red')
2
>>> a[2]
'red'
>>> a
['blue', 'green', 'red', 'yellow']

list.pop(index) → item

Remove and return item at index (default is the last element, with an index of -1). An exception is raised if the list is already empty. This is the opposite of append(). Further, this is both a transformation of the list as well as an accessor that returns an item from the list.

>>> a=["red","yellow","green","blue"]
>>> a.pop()
'blue'
>>> a
['red', 'yellow', 'green']


Statements and Lists¶

There are three kinds of statements that are associated with tuples: the various kinds of assignment statements, and the for statement deals with sequences of all kinds. Additionally the del statement can update a list by removing an element.

The Assignment Statements. The variation on the assignment statement called multiple-assignment statement works with lists as well as tuples. We looked at this in Combining Assignment Statements. Multiple variables are set by decomposing the items in the list.

>> x,y = [1,"hi"]
>>> y
'hi'
>>> x
1

This will only work of the list has a fixed and known number of elements. This kind of multiple assignment makes more sense when working with tuples, which are immutable, rather than lists, which can vary in length.

The for Statement. The for statement works directly with sequences. When we first looked at for statements, we used the range() function to create a list for us. We can also create lists other ways. We’ll see still more list construction techniques in the next chapter.

Here’s the basic syntax for providing a literal sequence of values. We provide the list object that we want the for statement to use as the sequence of values. In this example, the variable i will be set to each value in the list, the prime numbers between 2 and 19.

s= 0
for i in [2,3,5,7,11,13,17,19]:
s += i
print("total", s)


The del Statement. The del statement removes items from a list. For example

>>> i = range(10)
>>> del i[0], i[2], i[4], i[6]
>>> i
[1, 2, 4, 5, 7, 8]


This example reveals how the del statement works.

The i variable starts as the list [0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ].

Remove i[0] and the variable is [1, 2, 3, 4, 5, 6, 7, 8, 9].

Remove i[2] (the value 3) from this new list, and get [1, 2, 4, 5, 6, 7, 8, 9].

Remove i[4] (the value 6) from this new list and get [1, 2, 4, 5, 7, 8, 9].

Finally, remove i[6] and get [1, 2, 4, 5, 7, 8].

From Outlines to Bank Lines – Stacks and Queues¶

Stacks and Queues are two ways that we can use lists. Stacks and queues are mutable sequences: items are put into them and removed from them. They are slight specializations of our Python’s more general list. The difference between a stack and queue is the rules for putting things into the list and getting things out of the list.

Stack. A stack can be described as a last-in-first-out (LIFO) list. The last element inserted into the stack is the first element to be removed from the stack. When you stack dishes: the last dish is on the top of the stack; it will be the first dish removed from the stack.

A number of algorithms use a stack to keep track of nested processing contexts. For example, an outline is a nested structure: parts contain chapters, which contain sections, which contain sub-sections. A part is deeply nested, and has chapters stacked on “top” of it. A chapter, in turn, has sections stacked on top of it.

When we read (or write) we begin the part and set it on our mental stack. We then look at the chapter, opening it and putting it on our mental stack. We look at a section, putting it on the stack while we read the paragraphs, and then removing it from the stack when we are done with the section. The last section onto the stack is the first section off the stack when we get to a new title.

Queue. A queue can be described as a first-in-first-out (FIFO) list. Elements are stored temporarily in a queue, and processed in the order they were received. The line at the coffee shop is a queue: the first person in line is the first person to get their Cappucino.

Queues are often used as buffers to match processing speeds between fast and slow operations. For example, it takes less than a minute for my computer to generate a document with 402 pages, but my printer will take almost an hour to print the document. To balance this speed difference, the operating system creates a queue of print jobs.

Using Lists. Both the stack and queue are essentially a list. In the case of a stack, it is a list that has items added and removed at the last position only. A queue, on the other hand, has items appended at the end, but removed from the front of the list.

The append() and pop() method functions can be used to create a standard stack. The append() function places an item at the end of the list (or top of the stack), where the pop() function can remove it and return it.

>>> stack= []
>>> stack.append("part I")
>>> stack.append("chapter 1")
>>> stack.append("intro section" )
>>> stack.pop()
'intro section'
>>> stack.append("another section" )
>>> stack.pop()
'another section'
>>> stack.pop()
'chapter 1'
>>> stack.pop()
'part I'
>>> stack
[]


The append() and pop(0)() functions can be used to create a standard queue, or first-in-first-out (FIFO) list. The append() function places an item at the end of the queue. Evaluating pop(0)() removes the first item from the queue it and returns it.

>>> queue=[]
>>> queue.append("part I")
>>> queue.append("part II")
>>> queue.pop(0)
'part I'
>>> queue.append("part III")
>>> queue.pop(0)
'part II'
>>> queue.append("part IV")
>>> queue.pop(0)
'part III'
>>> queue.pop(0)
'part IV'
>>> queue
[]


List Exercises¶

1. Creating a Sequence of Outcomes.

Evaluating a betting strategy amounts to collecting a sequence of outcomes from playing a number of games. We want a betting strategy that is better than betting at random. One basis for comparison, then, is truly random results with a very simple betting strategy like “always bet on black”.

We’ll assume that a single round of play in a casino game is an elaborate coin toss. In the case of Roulette, there are a large number of outcomes, but we can focus on just betting red and black. This makes the game almost a coin toss. There are a total of 38 outcomes on an American table, composed of 18 red, 18 black and 2 green. If we play a number of rounds we’ll win some and lose some. If we stay at the table for 200 spins, our results could vary from the really unlikely 200 wins to the equally unlikely 200 losses. In the middle are mixes of wins and losses. For a truly fair coin toss, this range of values is called the Gaussian or normal distribution. The question we need to have answered, is what is the average result of sessions that last for 200 spins of the wheel?

We can create a sequence of values that represents the wheel by assembling a list that has 18 copies of 'red', 18 copies of 'black' and two copies of 'green'. We can then use the random.choice() function to pick one of these values as the result of the spin.

If the chosen result is 'black', we’ve won, and our stake increases by one bet. Otherwise, we’ve lost and our stake decreases by one bet.

To simulate a session of betting, we initialize our table stakes to 100 betting units. This means we go to a $5 Roulette table with$500 in our pocket. We can create a loop which does the following 200 times:

1. Use the random.choice() function to pick one of 38 values as the result of the spin.
2. Increase or decrease the stake depending on the color chosen.

Each session, therefore, will have a result that is a single number, the final amount we left the table with. You can check your result by simulating a few thousand sessions and accumulating a sequence of final amounts.

Compute the average of your sequence of final amounts. You should have an average result of about 89. The standard deviation should be around 14. What does this mean? We can expect to lose 11 betting units over 200 spins of the wheel.

2. Creating a Different Sequence of Outcomes.

In the previous exercise, we created a random sequence of outcomes for Roulette using a simple “always bet on black” betting strategy. What if we want to use a bet with a different payout? For example, the three column bets pay 2:1 when they win. How does this change our results?

In Roulette there are 12 column 1, 12 column 2, 12 column 3, and 2 zero results on an American table. As with the previous exercise (Creating a Sequence of Outcomes), we can construct a sequence that represents the wheel by assembling a list of 38 elements that have the proper number of "col1", "col2", "col3" and "zero" values. We can then use the random.choice() function to pick one of these values as the result of the spin.

We’ll assume a consistent bet on "col3". We’ll choose a random result from the wheel sequence; if this result is "col3", we’ve won, and our stake increases by two bets. Otherwise, we’ve lost and our stake decreases by one bet.

We can revise our previous example to use this wheel, bet and result.

Compute the average of your sequence of final amounts. You should have an average result of about 89. The standard deviation should be around 19. What does this mean? We can expect to lose 11 betting units over 200 spins of the wheel.

3. Creating a Sequence of Really Bad Outcomes.

In the previous exercises, we created a random sequence of outcomes for Roulette using some simple “always bet on black” or “always bet on column three” betting strategy. What if we want to use a bet with a really bad payout? For example, there is a bet that covers zero, double zero, one, two and three. This bet will win 5/38th of the time, but pays as if it won 6.33/38 of the time. How does this change our results?

In Roulette there are 5 '5bet' results and 33 'other' results on an American table. As with the previous exercises (Creating a Sequence of Outcomes), we can construct a sequence that represents the wheel by assembling a list of 38 elements that have the proper number of "5bet", "other" values. We can then use the random.choice() function to pick one of these values as the result of the spin.

We’ll assume a consistent bet on "5bet". We’ll choose a random result from the wheel sequence; if this result is :"5bet", we’ve won, and our stake increases by six bets. Otherwise, we’ve lost and our stake decreases by one bet.

We can revise our previous example to use this wheel, bet and result.

Compute the average of your sequence of final amounts. You should have an average result of about 83. The standard deviation should be around 34. What does this mean? We can expect to lose 17 betting units over 200 spins of the wheel.

4. Random Number Evaluation.

Before using a new random number generator, it is wise to evaluate the degree of randomness in the numbers produced. A variety of clever algorithms look for certain types of expected distributions of numbers, pairs, triples, etc. This is one of many random number tests.

If we generate thousands of random numbers between 0 and 9, we expect that we’ll have the name number of 0’s as 9’s. Specifically, we expect that 1/10th of our numbers are 0’s, 1/10th are 1’s, etc. Actually random numbers are – well – random, so they will deviate slightly from this perfection.

This difference between actual and expected can be used for a more sophisticated statistical test called a Chi-Squared test. The formula is pretty simple, but the statistics beyond this book. The idea is, however, that the Chi-Squared test can help us tell whether our data is too well organized, meets our expectation for randomness, or is too disorganized.

What we’ll do is generate random numbers, and assign them to one of ten different bins. When we’ve done this for a few thousand samples, we’ll compare the count of numbers in each bin with our expectation to see if we’ve got a respectable level of randomness.

Use random.random() to generate an array of 1000 random samples; assign this to the variable u. These numbers will be uniformly distributed between 0 and 1.

Distribution test of a sequence of random samples, U

1. Initialize. Initialize count to a list of 10 zeros.
2. Examine Samples. For each sample value, v, in the original set of 1000 random samples, U.
1. Coerce Into Range. Set . Multiply by 10 and truncate and integer to get a a new value in the range 0 to 9.
2. Count. Increment count [x] by 1.
3. Report. We expect each count to be 1/10th of our available samples. We need to display the actual count and the % of the total. We also need to calculate the difference between the actual count and the expected count, and display this.
1. Accumulating Unique Values.

We’ll use the Bounded Linear Search algorithm to locate just the unique values in a sequence of values. This could, for example, be a sequence of words extracted from a document. We might also use a procedure like this to evaluate a sequence of random numbers to be sure that all of the expected values were actually found in the sequence somewhere.

Let’s assume we have a sequence, seq with 1000 values that are supposed to be random numbers between 0 and 37, inclusive. We created these numbers with something like the following.

import random
[ random.randrange(0,37) for i in range(1000) ]


[You may have already noticed the error in the above statement.]

We can use the following procedure to do a complete evaluation.

Unique Values of a Sequence, seq

1. Initialize. Set .

2. Loop. For each value, v, in seq.

We’ll use the Bounded Linear Search to see if v occurs in uniq.

1. Initialize. Set .

Append v to the list uniq.

2. Search. while : increment i.

At this point . The question is whether or not.

3. New Value?. if : v is unique.

4. Existing Value?. if : v is a duplicate of .

Delete , the value we added.

3. Result. Return array uniq, which has unique values from seq.

You may also notice that this fancy Bounded Linear Search is suspiciously similar to the index() method function of a list. Rewrite this using uniq.index() instead of the Bounded Linear Search in step 2.

When we look the set collection, you’ll see another way to tackle this problem.

1. Binary Search.

This is not as universally useful as the Bounded Linear Search (above) because it requires the data be sorted.

Binary Search a sorted Sequence, seq, for a target value, tgt

1. Initialize. .

. This is the midpoint of the sorted sequence.

2. Divide and Conquer. While .

If : . Move h to the midpoint.

If : . Move l to the midpoint.

. Compute a midpoint of the new, smaller sequence.

3. Result. If : return m

2. Quicksort.

The super-fast sort algorithm.

As a series of loops it is rather complex. As a recursion it is quite short. This is the same basic algorithm in the C libraries.

Quicksort proceeds by partitioning the list into two regions: one has all of the high values, the other has all the low values. Each of these regions is then individually sorted into order using the quicksort algorithm. This means the each region will be subdivided and sorted.

For now, we’ll sort an array of simple numbers. Later, we can generalize this to sort generic objects.

Quicksort a List, a between elements lo and hi

1. Partition

1. Initialize. . Setup for partitioning between ls and hs.

.

2. Swap To Partition. while :

If : increment ls by 1. Move the low boundary of the partitioning.

If : swap the values .

If : decrement hs by 1. Move the high boundary of the partitioning.

If :, swap the values .

2. Quicksort Each Partition.

QuickSort( a , lo, middle )

QuickSort( a , middle+1, hi )

3. Recursive Search.

This is also a binary search: it works using a design called “divide and conquer”. Rather than search the whole list, we divide it in half and search just half the list. This version, however is defined with a recursive function instead of a loop. This can often be faster than the looping version shown above.

Recursive Search a List, seq for a target, tgt, in the region between elements lo and hi.

1. Empty Region? If : return -1 as a code for “not found”.
2. Middle Element. .
3. Found? If : return m.
4. Lower Half? If : return recursiveSearch ( seq, tgt, lo, m )
5. Upper Half? If : return recursiveSearch( seq, tgt, m+1, hi )
4. Sieve of Eratosthenes.

This is an algorithm which locates prime numbers. A prime number can only be divided evenly by 1 and itself. We locate primes by making a table of all numbers, and then crossing out the numbers which are multiples of other numbers. What is left must be prime.

Sieve of Eratosthenes

1. Initialize. Create a list, prime of 5000 booleans, all True, initially.

.

2. Iterate. While .

1. Find Next Prime. While :

Increment p by 1.

2. Remove Multiples. At this point, p is prime.

Set .

while .

.

Increment k by p.

3. Next p. Increment p by 1.

3. Report. At this point, for all p if prime [ p ] is true, p is prime.

while :

if : print p

The reporting step is a “filter” operation. We’re creating a list from a source range and a filter rule. This is ideal for a list comprehension. We’ll look at these in List Construction Shortcuts.

Formally, we can say that the primes are the set of values defined by . This formalism looks a little bit like a list comprehension.

5. Polynomial Arithmetic.

We can represent numbers as polynomials. We can represent polynomials as arrays of their coefficients. This is covered in detail in [Knuth73], section 2.2.4 algorithms A and M.

Example: has the following coefficients: ( 4, 0, 3, 1 ).

The polynomial is represented as ( 2, -3, -4 ).

The sum of these is ; ( 4, 2, 0, -3 ).

The product these is ; ( 8, -12, -10, -7, -15, -4 ).

You can apply this to large decimal numbers. In this case, x is 10, and the coefficients must all be between 0 and x-1. For example, .

1. Result Size. .

Set p1 to a tuple of zeros + p.

Else: Set p1 to p.

Set q1 t a tuple of zeroes + q.

Else, Set q1 to q.

4. Add. Add matching elements from p1 and q1 to create result, r.

5. Result. Return r as the sum of p and q.

Multiply Polynomials, x, y

1. Result Size. .

Initialize the result list, r, to all zeros, with a size of .

2. For all elements of x. while :

For all elements of y. while :

Set .

3. Result. Return a tuple made from r as the product of x and y.

6. Dutch National Flag.

A challenging problem, one of the hardest in this set. This is from Edsger Dijkstra’s book, A Discipline of Programming [Dijkstra76].

Imagine a board with a row of holes filled with red, white, and blue pegs. Develop an algorithm which will swap pegs to make three bands of red, white, and blue (like the Dutch flag). You must also satisfy this additional constraint: each peg must be examined exactly once.

Without the additional constraint, this is a relatively simple sorting problem. The additional constraint requires that instead of a simple sort which passes over the data several times, we need a more clever sort.

Hint: You will need four partitions in the array. Initially, every peg is in the “Unknown” partition. The other three partitions (“Red”, “White” and “Blue”) are empty. As the algorithm proceeds, pegs are swapped into the Red, White or Blue partition from the Unknown partition. When you are done, the unknown partition is reduced to zero elements, and the other three partitions have known numbers of elements.