In this chapter we’ll cover the common features of the various kinds of collections which keep items in sequence. This will set the stage for the chapters that follow.
In this section we’ll define what we mean by sequence in Sequence means “In Order”. We’ll talk about designing programs that use sequences in Working With a Sequence. We’ll compare the four kinds of sequences in Subspecies of Sequences. We’ll look at the common features of sequences in Features of a Sequence.
A sequence is a collection of individual items. A sequence keeps the items in a specific order, which means we can identify each item by its numerical position within the collection. Some sequences (like the tuple) have a fixed number of elements, with static positions in the sequence. Other sequences (like the list) have a variable number of elements, and possibly dynamic positions in the sequence.
Python has other collections which are not ordered. We’ll get to those in More Data Collections.
Here’s a depiction of a sequence of four items. Each item has a position that identifies the item in the sequence.
| position | 0 | 1 | 2 | 3 |
| item | 3.14159 | 'two words' | 2048 | (1+2j) |
Sequences are used internally by Python. A number of statements and functions we have covered have sequence-related features. We’ll revisit a number of functions and statements to add the power of sequences to them. In particular, the for statement is something we glossed over in The for Statement.
There is an important connection between a for statement, which processes elements in a particular order, and a sequence, which stores items in order. As we learn more about these data structures, we’ll see that the processing and the data are almost inseparable.
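As a small illustration of that connection, here is a sketch that walks through a four-item sequence with a for statement. The names items and visited are made up for this example, and print() is written with parentheses so the sketch also runs under Python 3.

```python
# A hypothetical four-item sequence, like the one depicted earlier.
items = (3.14159, 'two words', 2048, (1+2j))

visited = []
for item in items:
    # The for statement hands us the items in stored order, position 0 first.
    visited.append(item)
    print(item)

# The processing order is the same as the storage order.
print(visited == list(items))
```

The loop never mentions a position number; the order comes from the sequence itself.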
It turns out that the range() function that we introduced generates a sequence object. You can see this object when you do the following:
>>> range(6)
[0, 1, 2, 3, 4, 5]
>>> range(1,7)
[1, 2, 3, 4, 5, 6]
>>> range(2,36,3)
[2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35]
We’ll look at the range() function and how it generates list objects in detail in Flexible Sequences : the list.
The typical outline for programs that work with sequences is the following. This is pretty abstract; we’ll follow this outline with a more concrete example.
Let’s say that we have a betting strategy for Roulette that we would like to simulate and collect statistics on the strategy’s performance. The verb collect is a hint that we will have a collection of samples, and a sequence is an appropriate type of collection.
Let’s work backwards from our goal and see how we’ll use collections to do this simulation. Once we have all of the necessary steps that lead to our goal, we can just reverse the order of the steps and write our program.
Print Results. We are done when we have printed the results from our simulation and analysis. In this case, the results are some simple descriptive statistics: the mean (“average”) and the number of samples.
To print the values, we must have computed them.
Compute Mean. The mean is the sum of the samples divided by the count of the samples. The sum is a reduction from the collection of outcomes, as is the count.
To compute the sum and the count, we must have a collection of individual results from playing Roulette.
Create Sample Collection. To create the samples, we have to simulate our betting strategy enough times to have meaningful statistics. We’ll use an iteration to create a collection of 100 individual outcomes of playing our strategy. Each outcome is the result of one session of playing Roulette.
In order to collect 100 outcomes, we’ll need to create each individual outcome. Each outcome is based on placing and resolving bets.
Resolve Bets. We apply the rules of Roulette to determine if the bet was a winner (and how much it won) or if the bet was a loser.
Before we can resolve a bet, we have to spin the wheel. And before we spin the wheel, we have to place a bet.
Spin Wheel. We generate a random result. We increase the number of spins we’ve played.
In order for the spin to have any meaning, of course, we’ll need to have some bets placed.
Place Bets. We use our betting strategy to determine what bet we will make and how much we will bet. For example, in the Martingale system, we bet on just one color. We double our bet when we lose and reset our bet to one unit when we win. Note that there are table limits, also, that will limit the largest bet we can place.
When we reverse these steps, we have a very typical program that creates a sequence of samples and analyzes that sequence of samples.
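The reversed steps can be sketched as a short program. This is only a sketch under stated assumptions, not a finished simulator: the names play_session, TABLE_LIMIT and SPINS_PER_SESSION are invented here, the wheel is assumed to be an American wheel with 38 pockets of which 18 are red, and a session is assumed to last a fixed number of spins.

```python
import random

TABLE_LIMIT = 64        # assumed largest bet the table allows, in betting units
SPINS_PER_SESSION = 20  # assumed length of one session
SESSIONS = 100          # number of outcomes to collect, as in the text

def play_session(rng):
    """Play one session of Martingale betting on red; return the net winnings."""
    bet = 1
    net = 0
    spins = 0
    while spins < SPINS_PER_SESSION:
        # Place Bets: the Martingale bet, capped by the table limit.
        wager = min(bet, TABLE_LIMIT)
        # Spin Wheel: pick one of 38 pockets and count the spin.
        pocket = rng.randrange(38)
        spins += 1
        # Resolve Bets: pockets 0-17 stand in for the 18 red pockets (a simplification).
        if pocket < 18:
            net += wager
            bet = 1            # reset to one unit after a win
        else:
            net -= wager
            bet = wager * 2    # double after a loss
    return net

rng = random.Random(42)  # fixed seed so that runs are repeatable

# Create Sample Collection: one outcome per session.
samples = []
for i in range(SESSIONS):
    samples.append(play_session(rng))

# Compute Mean: both the sum and the count are reductions of the collection.
count = len(samples)
mean = float(sum(samples)) / count

# Print Results.
print("samples: %d" % count)
print("mean: %.2f" % mean)
```

The list named samples is the collection that the verb "collect" hinted at; everything after the loop treats it as a whole.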
Other typical forms for programs may include reading a sequence of data elements from files, something we’ll turn to in later chapters. Some programs may be part of a web application, and process sequences that come from user input on a web form.
There are four subspecies of sequence: str, Unicode, tuple and list.
When we create a tuple, str or Unicode, we’ve created an immutable, or static object. We can examine the object, looking at specific characters or values. We can’t change the object. This means that we can’t put additional data on the end of a str. What we can do, however, is create a new str that is the concatenation of the two original strings.
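A minimal sketch of what immutability means for a str follows. The variable names are made up for the example.

```python
greeting = "hello"
suffix = " world"

# Concatenation builds a new str; the two originals are left unchanged.
combined = greeting + suffix
print(combined)
print(greeting)

# Trying to change a character in place is an error.
try:
    greeting[0] = "J"
    changed = True
except TypeError:
    changed = False
print(changed)
```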
When we create a list, on the other hand, we’ve created a mutable object. A list can have additional objects appended to it. Objects can be removed from a list, also. The order of the objects can be changed.
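By contrast, here is a short sketch of a list being changed in place. The list name and its values are arbitrary.

```python
samples = [3, 1, 2]

samples.append(5)   # add an object at the end:       [3, 1, 2, 5]
samples.remove(1)   # remove the object equal to 1:   [3, 2, 5]
samples.reverse()   # change the order in place:      [5, 2, 3]

print(samples)
```

Each method updates the same list object; no new list is created along the way.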
One other note on str. While str objects are sequences of characters, there is no separate character data type. A character is treated as a str of length one. This relieves programmers from the C or Java burden of remembering which quotes to use for single characters as distinct from multi-character strings. It also eliminates any problems when dealing with Unicode multi-byte characters separate from US-ASCII single-byte characters.
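The point about characters can be checked directly. In this sketch, the name word is an arbitrary example.

```python
word = "spam"
first = word[0]

# A single character is still a str, just one of length one.
print(type(first) is str)
print(len(first))

# Either kind of quote produces the same str.
print('s' == "s")
print(first == 's')
```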
We call these subspecies because, to an extent, they are interchangeable. It may seem like a sequence of individual characters has little in common with a sequence of complex numbers. However, these two sequence objects do have some common kinds of features. In the next section, we’ll look at all of the features that are common among these sequence subspecies.
A great deal of Python’s internals are sequence-based. Here are just a few examples:
All the varieties of sequences (strings, tuples and lists) have some common characteristics. We’ll look at a bunch of Python language aspects of these pieces of data, including:
Inside a Sequence. Our programs talk about sequences in two senses. Sometimes we talk about the sequence as a whole. Other times we talk about individual elements or subsequences. Naming an element or a subsequence is done with a new operator that we haven’t seen before. We’ll introduce it now, and return to it when we talk about each different kind of sequence.
The [] operator is called a subscription. It puts a subscript after the sequence to identify which specific item or items from the sequence will be used. There are two forms for the [] operator:
The single item format is
sequence [ index ]
This identifies one item based on the position number.
The slice format is
sequence [ start : end ]
This identifies a subsequence of items with positions from start to end - 1. This creates a new sequence which is a slice of the original sequence; there will be end - start items in the resulting sequence.
Items are identified by their position numbers. The position numbers start with zero at the beginning of the sequence.
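Here is a brief sketch of both forms of the [] operator, applied to a hypothetical four-item tuple.

```python
items = (3.14159, 'two words', 2048, (1+2j))

# Single item form: one item, chosen by its position number.
single = items[1]
print(single)

# Slice form: positions 1 through 3 - 1, collected into a new tuple.
piece = items[1:3]
print(piece)

# The slice has end - start items.
print(len(piece) == 3 - 1)
```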
Important
Numbering From Zero
Newbies are often tripped up because items in a sequence are numbered from zero. This leads to a small disconnect between cardinal numbers and ordinal names.
The ordinal names are words like “first”, “second” and “third”. The cardinal numbers used for these positions are 0, 1 and 2. We have two choices to try and reconcile these two identifiers:
In this book, we’ll use conventional ordinal names starting with “first”, and emphasize that this is position 0 in the sequence.
Positions are numbered from the end of the sequence as well as from the beginning. Position -1 is the last item of the sequence, and -2 is the next-to-last item.
Important
Numbering In Reverse
Experienced programmers are often tripped up because Python identifies items in a sequence from the right using negative numbers, as well as from the left using positive numbers. This means that each item in a sequence actually has two numeric indexes.
Here’s a depiction of a sequence of four items. Each item has a position that identifies the item in the sequence. We’ll also show the reverse position numbers.
| forward position | 0 | 1 | 2 | 3 |
| reverse position | -4 | -3 | -2 | -1 |
| item | 3.14159 | 'two words' | 2048 | (1+2j) |
Why do we have two different ways of identifying each position in the sequence? If you want, you can think of it as a handy short-hand. The last item in any sequence, S, can be identified by the formula S[ len(S)-1 ]. For example, if we have a sequence with 4 elements, the last item is in position 3. Rather than write S[ len(S)-1 ], Python lets us simplify this to S[-1].
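The relationship between the two numbering schemes can be sketched as follows. The tuple S is a made-up example, and the names forward and reverse are chosen only for this illustration.

```python
S = (3.14159, 'two words', 2048, (1+2j))

# The short-hand and the long formula name the same item.
print(S[-1] == S[len(S) - 1])

# Every item has a forward position and a reverse position.
matches = []
for forward in range(len(S)):
    reverse = forward - len(S)   # 0 -> -4, 1 -> -3, 2 -> -2, 3 -> -1
    matches.append(S[forward] == S[reverse])
print(matches)
```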
Factory Functions. There are also built-in factory (or “conversion”) functions for the sequence objects. These are ways to create sequences from other kinds of data.
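As a sketch of what such conversions look like, the built-in functions list(), tuple() and str() each build a sequence from other data. This assumes they are among the factory functions meant here.

```python
letters = list("abc")        # a str becomes a list of one-character strs
frozen = tuple([1, 2, 3])    # a list becomes a tuple
text = str(2048)             # a number becomes a str of digits

print(letters)
print(frozen)
print(text)
```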
Accessor Functions. There are several built-in accessor functions which return information about a sequence.
These functions apply to all varieties of sequences: lists, strings and tuples.
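For instance, the built-in len(), min() and max() functions work the same way on a tuple, a list and a string. This sketch assumes these three are among the accessor functions meant here; the sample values are arbitrary.

```python
the_tuple = (9, 7, 3, 12)
the_list = [9, 7, 3, 12]
the_string = "spam"

# The same three calls work on each kind of sequence.
print(len(the_tuple))
print(min(the_list))
print(max(the_list))
print(len(the_string))
print(max(the_string))   # characters compare alphabetically here
```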
The enumerate() function enumerates the elements of a sequence, set or mapping. This yields a sequence of tuples based on the original iterable. Each of the tuples has two elements: a sequence number and the item from the original iterable.
This kind of iterator is generally used with a for statement.
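A minimal sketch of the built-in enumerate() function in a for statement follows. The tuple of colors is a made-up example.

```python
colors = ("red", "green", "blue")

pairs = []
for position, color in enumerate(colors):
    # Each step yields a two-element tuple: a sequence number and the item.
    pairs.append((position, color))
    print("%d %s" % (position, color))
```

The sequence numbers start at zero, matching the position numbers of the original sequence.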
The sorted() function iterates through an iterable (sequence, set or mapping) in ascending or descending sorted order. Unlike a list’s sort() method function, this does not update the list, but leaves it alone.
This kind of iterator is generally used with a for statement.
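Here is a short sketch using the built-in sorted() function, including its reverse keyword argument for descending order. The list values are arbitrary.

```python
the_list = [9, 7, 3, 12]

ascending = []
for v in sorted(the_list):
    ascending.append(v)
print(ascending)

descending = sorted(the_list, reverse=True)
print(descending)

# The original list is left alone.
print(the_list)
```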
The reversed() function iterates through a sequence in reverse order.
This kind of iterator is generally used with a for statement. Here’s an example:
>>> the_tuple = ( 9, 7, 3, 12 )
>>> for v in reversed( the_tuple ):
...     print v
...
12
3
7
9
The zip() function creates a new sequence of tuples. Each tuple in the new sequence has values taken from the corresponding positions of the input sequences.
>>> color = ( "red", "green", "blue" )
>>> level = ( 20, 30, 40 )
>>> zip( color, level )
[('red', 20), ('green', 30), ('blue', 40)]
The following functions don’t apply quite so widely. For example, applying any() or all() to a string is silly, since both return True for any non-empty string. Similarly, applying sum() to a sequence that isn’t all numbers is silly and raises a TypeError.
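These cases can be checked with a short sketch. The sample values and the variable names are invented for the example.

```python
# On a non-empty string, any() and all() tell us nothing useful.
any_text = any("abc")
all_text = all("abc")
print(any_text)
print(all_text)

# On numbers, sum() is a sensible reduction.
level = (20, 30, 40)
total = sum(level)
print(total)

# On strings, sum() is an error.
try:
    sum(("red", "green", "blue"))
    failed = False
except TypeError:
    failed = True
print(failed)
```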
Tuples and Lists.
What is the value in having both immutable sequences (tuples) and mutable sequences (lists)? What are the circumstances under which you would want to change a string? What are the problems associated with strings that grow in length? How can storage for variable length strings be managed?
Unicode Strings.
What is the value in making a distinction between Unicode strings and ASCII strings? Does it improve performance to restrict a string to single-byte characters? Should all strings simply be Unicode strings to make programs simpler? How should file reading and writing be handled?
Statements and Data Structures.
In order to introduce the for statement in The for Statement, we had to dance around the sequence issue. Would it make more sense to introduce the various types of collections first, and then describe statements that process the collections later?
Something has to be covered first, and is therefore more fundamental. Is the processing statement more fundamental to programming, or is the data structure?
Try to avoid extraneous spaces in lists and tuples. Python programs should be relatively compact. Prose writing typically keeps ()’s close to their contents, and puts spaces after commas, never before them. This should hold true for Python, also. The preferred formatting for lists and tuples, then, is (1,2,3) or (1, 2, 3). Spaces are not put just inside the enclosing [ ] or ( ). Spaces are not put before ,.