# Collecting Items in Sequence

In this chapter we’ll cover the common features of the various kinds of collections which keep items in sequence. This will set the stage for the chapters that follow.

In this section we’ll define what we mean by sequence in Sequence means “In Order”. We’ll talk about designing programs that use sequences in Working With a Sequence. We’ll compare the four kinds of sequences in Subspecies of Sequences. We’ll look at the common features of sequences in Features of a Sequence.

## Sequence means “In Order”

A sequence is a collection of individual items. A sequence keeps the items in a specific order, which means we can identify each item by its numerical position within the collection. Some sequences (like the tuple) have a fixed number of elements, with static positions in the sequence. Other sequences (like the list) have a variable number of elements, and possibly dynamic positions in the sequence.

Python has other collections which are not ordered. We’ll get to those in More Data Collections.

Here’s a depiction of a sequence of four items. Each item has a position that identifies the item in the sequence.

| position | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| item | 3.14159 | 'two words' | 2048 | (1+2j) |

Sequences are used internally by Python. A number of statements and functions we have covered have sequence-related features. We’ll revisit a number of functions and statements to add the power of sequences to them. In particular, the for statement is something we glossed over in The for Statement.

There is an important connection between a for statement, which processes elements in a particular order, and a sequence, which stores items in order. As we learn more about these data structures, we’ll see that the processing and the data are almost inseparable.
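As a minimal sketch of that connection, the loop below walks a hypothetical tuple named samples (the same four items pictured above) and records each item as it is visited. The names samples and visited are chosen only for illustration.

```python
# Hypothetical data: the four items from the depiction above.
samples = (3.14159, 'two words', 2048, (1+2j))

visited = []
for item in samples:
    # Each pass through the loop sees the next item, in stored order.
    visited.append(item)

# The order of processing matches the order of storage.
print(visited == list(samples))
```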

It turns out that the range() function that we introduced generates a sequence object. You can see this object when you do the following:

```
>>> range(6)
[0, 1, 2, 3, 4, 5]
>>> range(1,7)
[1, 2, 3, 4, 5, 6]
>>> range(2,36,3)
[2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35]
```

We’ll look at the range() function and how it generates list objects in detail in Flexible Sequences : the list.

## Working With a Sequence

The typical outline for programs that work with sequences is the following. This is pretty abstract; we’ll follow this outline with a more concrete example.

1. Create the sequence. This may involve reading it from a file, or creating it with some kind of generator.
2. Transform the sequence. This may involve computing new values, using a filter to select values that match a condition, or reducing the sequence to a summary.
3. Produce a final result.
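The three steps can be sketched in a few lines. This is only an illustration under assumed data: the list readings and the threshold of 15 are invented for the example, not taken from any real program.

```python
# 1. Create the sequence (hypothetical measurements).
readings = [12, 7, 30, 18, 25]

# 2. Transform the sequence.
doubled = []
for r in readings:
    if r >= 15:                 # filter: keep values matching a condition
        doubled.append(2 * r)   # compute a new value from each kept item
total = sum(doubled)            # reduce the sequence to a summary

# 3. Produce a final result.
print(total)
```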

Let’s say that we have a betting strategy for Roulette that we would like to simulate and collect statistics on the strategy’s performance. The verb collect is a hint that we will have a collection of samples, and a sequence is an appropriate type of collection.

Let’s work backwards from our goal and see how we’ll use collections to do this simulation. Once we have all of the necessary steps that lead to our goal, we can just reverse the order of the steps and write our program.

• Print Results. We are done when we have printed the results from our simulation and analysis. In this case, the results are some simple descriptive statistics: the mean (“average”) and the number of samples.

To print the values, we must have computed them.

• Compute Mean. The mean is the sum of the samples divided by the count of the samples. The sum is a reduction from the collection of outcomes, as is the count.

To compute the sum and the count, we must have a collection of individual results from playing Roulette.

• Create Sample Collection. To create the samples, we have to simulate our betting strategy enough times to have meaningful statistics. We’ll use an iteration to create a collection of 100 individual outcomes of playing our strategy. Each outcome is the result of one session of playing Roulette.

In order to collect 100 outcomes, we’ll need to create each individual outcome. Each outcome is based on placing and resolving bets.

• Resolve Bets. We apply the rules of Roulette to determine if the bet was a winner (and how much it won) or if the bet was a loser.

Before we can resolve a bet, we have to spin the wheel. And before we spin the wheel, we have to place a bet.

• Spin Wheel. We generate a random result. We increase the number of spins we’ve played.

In order for the spin to have any meaning, of course, we’ll need to have some bets placed.

• Place Bets. We use our betting strategy to determine what bet we will make and how much we will bet. For example, in the Martingale system, we bet on just one color. We double our bet when we lose and reset our bet to one unit when we win. Note also that table limits cap the largest bet we can place.

When we reverse these steps, we have a very typical program that creates a sequence of samples and analyzes that sequence of samples.
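Here is one way the reversed steps might look, as a rough sketch rather than a finished simulator. Everything specific is an assumption made for illustration: the function name play_session, a starting stake of 100 units, 50 spins per session, a table limit of 64 units, an American wheel with 38 pockets of which 18 are red, and a fixed random seed so the run is repeatable.

```python
import random

def play_session(rng, spins=50, stake=100, table_limit=64):
    """Play one hypothetical Martingale session; return the final stake."""
    bet = 1
    for _ in range(spins):
        if bet > stake:
            break                      # we cannot cover the next bet
        # Place bet: always on red, for the amount in bet.
        # Spin wheel: 38 pockets; pockets 0..17 stand in for the 18 reds.
        pocket = rng.randrange(38)
        # Resolve bet.
        if pocket < 18:
            stake += bet
            bet = 1                    # reset to one unit after a win
        else:
            stake -= bet
            bet = min(2 * bet, table_limit)   # double, up to the limit
    return stake

rng = random.Random(1)                 # assumed seed, for repeatability

# Create sample collection: 100 outcomes, one per session.
outcomes = []
for session in range(100):
    outcomes.append(play_session(rng))

# Compute mean: both the sum and the count are reductions.
count = len(outcomes)
mean = sum(outcomes) / float(count)

# Print results.
print("samples: %d, mean: %.2f" % (count, mean))
```

The sequence outcomes is the collection of samples; the sum and the count are both computed from it after the simulation loop has finished.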

Other typical forms for programs may include reading a sequence of data elements from files, something we’ll turn to in later chapters. Some programs may be part of a web application, and process sequences that come from user input on a web form.

## Subspecies of Sequences

There are four subspecies of sequence:

• The str, which is a collection of the US-ASCII characters. The US-ASCII standard includes the 128 most commonly-used characters.
• The Unicode string, which is a collection of Unicode (or Universal Character Set) characters. The Unicode standard includes just about any character in any of the world’s alphabets.
• The tuple, which is a collection of any kind of Python object. By “any kind of Python object”, we mean any kind of object: numbers, strings, sequences, functions, anything.
• The list, which is a collection of any kind of Python object. The list collection can be altered after it’s created.

When we create a tuple, str or Unicode string, we’ve created an immutable, or static, object. We can examine the object, looking at specific characters or values. We can’t change the object. This means that we can’t put additional data on the end of a str. What we can do, however, is create a new str that is the concatenation of two existing strings.
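A short sketch of what immutability means in practice. The variable names are hypothetical, and the subscript notation and the try statement used here are only a preview.

```python
greeting = "hello"
suffix = " world"

# Concatenation builds a new str; the two originals are untouched.
combined = greeting + suffix

point = (3, 4)
try:
    point[0] = 5          # attempt to change an item of a tuple
    changed = True
except TypeError:
    changed = False       # tuples refuse item assignment

print(combined)
```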

When we create a list, on the other hand, we’ve created a mutable object. A list can have additional objects appended to it. Objects can be removed from a list, also. The order of the objects can be changed.
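By way of contrast, here is a small sketch of a list being altered in place. The list bets is a hypothetical example.

```python
bets = [1, 2, 4]

bets.append(8)    # add an object at the end: [1, 2, 4, 8]
bets.remove(2)    # remove an object:         [1, 4, 8]
bets.reverse()    # change the order:         [8, 4, 1]

print(bets)
```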

One other note on str. While str objects are sequences of characters, there is no separate character data type. A character is treated as a str of length one. This relieves programmers from the C or Java burden of remembering which quotes to use for single characters as distinct from multi-character strings. It also eliminates any problems when dealing with Unicode multi-byte characters separate from US-ASCII single-byte characters.
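A quick check of this point, using a hypothetical string named word: picking out one character gives back an ordinary str whose length happens to be one.

```python
word = "spin"

first = word[0]                   # one character taken from the string
is_str = isinstance(first, str)   # it is a str, not a separate char type
length = len(first)               # a str of length one

# Either kind of quote produces the same object.
same = ('s' == "s")
```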

We call these subspecies because, to an extent, they are interchangeable. It may seem like a sequence of individual characters has little in common with a sequence of complex numbers. However, these two sequence objects do have some common kinds of features. In the next section, we’ll look at all of the features that are common among these sequence subspecies.

A great deal of Python’s internals are sequence-based. Here are just a few examples:

• The for statement, in particular, expects a sequence, and we often create a list with the range() function.
• When we split a str using the split() method, we get a list of substrings.
• When we define a function, we can have positional parameters collected into a sequence, something we’ll cover in a later chapter.
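Two of these can be previewed in a brief sketch. The function count_args is hypothetical, and the *values notation for collecting positional parameters is shown here only as a glimpse of a feature described later.

```python
# Splitting a str gives a list of substrings.
words = "red green blue".split()

def count_args(*values):
    # Extra positional arguments arrive collected into a tuple.
    return len(values), type(values)

n, kind = count_args(10, 20, 30)
print(words)
```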

## Features of a Sequence

All the varieties of sequences (strings, tuples and lists) have some common characteristics. We’ll look at a bunch of Python language aspects of these pieces of data, including:

• There is a syntax for writing the kind of sequence. Strings, for example, are surrounded by quotes.
• There are operations that we can apply to a sequence. Strings, for example, can be concatenated using the + operator.
• Some built-in functions are appropriate for different kinds of sequences. In particular, each kind of sequence has a factory function with an obvious name: str(), unicode(), list() or tuple().
• There are rules for how the comparison operators apply between two sequences.
• A sequence object has specific methods. Some methods are generic, and all sequences offer them. Other methods are unique to that kind of sequence.
• Some of the Python statements interact with sequences. We’ll have to revisit some statement descriptions to explain how the statements make use of sequences.
• In some cases, there are library modules that work with this kind of sequence.
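Two of these features, operations and comparisons, can be sketched with hypothetical tuples a and b. Sequences are compared item by item from the left, and the first difference decides the result.

```python
a = (1, 2, 3)
b = (1, 2, 4)

joined = a + b                    # + concatenates tuples as well as strings
less = a < b                      # first two items tie; 3 < 4 decides it
text_less = "apple" < "banana"    # strings compare character by character
```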

Inside a Sequence. Our programs talk about sequences in two senses. Sometimes we talk about the sequence as a whole. Other times we talk about individual elements or subsequences. Naming an element or a subsequence is done with a new operator that we haven’t seen before. We’ll introduce it now, and return to it when we talk about each different kind of sequence.

The [] operator is called a subscription. It puts a subscript after the sequence to identify which specific item or items from the sequence will be used. There are two forms for the [] operator:

• The single item format is

```
sequence [ index ]
```

This identifies one item based on the position number.

• The slice format is

```
sequence [ start : end ]
```

This identifies a subsequence of items with positions from start to end - 1. This creates a new sequence which is a slice of the original sequence; there will be end - start items in the resulting sequence.

Items are identified by their position numbers. The position numbers start with zero at the beginning of the sequence.
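Both forms can be tried on the four-item sequence pictured earlier in this chapter. The name items is hypothetical.

```python
items = (3.14159, 'two words', 2048, (1+2j))

single = items[1]      # the item in position 1
piece = items[1:3]     # positions 1 and 2; position 3 is not included

# A slice from start to end holds end - start items.
size = len(piece)
```

Here single is 'two words' and piece is the new tuple ('two words', 2048), which has 3 - 1 = 2 items.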

Important

Numbering From Zero

Newbies are often tripped up because items in a sequence are numbered from zero. This leads to a small disconnect between cardinal numbers and ordinal names.

The ordinal names are words like “first”, “second” and “third”. The cardinal numbers used for these positions are 0, 1 and 2. We have two choices to try and reconcile these two identifiers:

• Remember that the ordinal names are always one too big. The “third” item is in position “2”.
• Try to use the word “zeroth” (or “zeroeth”) for the item in position 0.

In this book, we’ll use conventional ordinal names starting with “first”, and emphasize that this is position 0 in the sequence.

Positions are numbered from the end of the sequence as well as from the beginning. Position -1 is the last item of the sequence, -2 is the next-to-last item.

Important

Numbering In Reverse

Experienced programmers are often tripped up because Python identifies items in a sequence from the right using negative numbers, as well as from the left using positive numbers. This means that each item in a sequence actually has two numeric indexes.

Here’s a depiction of a sequence of four items. Each item has a position that identifies the item in the sequence. We’ll also show the reverse position numbers.

| forward position | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| reverse position | -4 | -3 | -2 | -1 |
| item | 3.14159 | 'two words' | 2048 | (1+2j) |

Why do we have two different ways of identifying each position in the sequence? If you want, you can think of it as a handy short-hand. The last item in any sequence, S, can be identified by the formula S[ len(S)-1 ]. For example, if we have a sequence with 4 elements, the last item is in position 3. Rather than write S[ len(S)-1 ], Python lets us simplify this to S[-1].
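The short-hand can be checked directly. In this sketch S is the four-item sequence from the depiction above, and the loop confirms that each forward position i names the same item as the reverse position i - len(S).

```python
S = (3.14159, 'two words', 2048, (1+2j))

last_long = S[len(S)-1]    # the long way to name the last item
last_short = S[-1]         # the short-hand

# Every item has two indexes: i and i - len(S).
pairs_match = True
for i in range(len(S)):
    if S[i] != S[i - len(S)]:
        pairs_match = False
```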

Factory Functions. There are also built-in factory (or “conversion”) functions for the sequence objects. These are ways to create sequences from other kinds of data.

str(object) → string
Creates a string from the object. This provides a human-friendly string representation of really complex objects. There is another string factory function, repr, which creates a Python-friendly representation of an object. We’ll return to this in Sequences of Characters : str and Unicode.
unicode(object) → unicode
Creates a Unicode string from the object.
list(sequence) → list
Return a new list whose items are the same as those of the argument sequence. Generally, this is used to convert immutable tuples to mutable lists.
tuple(sequence) → tuple
Return a new tuple whose items are the same as those of the argument sequence. If the argument is a tuple, the return value is the same object. Generally, this is used to convert mutable lists into immutable tuples.
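A brief sketch of these factory functions at work, using hypothetical values. The unicode() function is left out of this sketch because it exists only in Python 2.

```python
as_text = str(3.14159)              # human-friendly: '3.14159'
as_repr = repr('two words')         # Python-friendly: includes the quotes

as_list = list((9, 7, 3, 12))       # tuple to mutable list
as_tuple = tuple([9, 7, 3, 12])     # list to immutable tuple
chars = list("abc")                 # a string is a sequence of characters
```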

Accessor Functions. There are several built-in accessor functions which return information about a sequence.

These functions apply to all varieties of sequences: lists, strings and tuples.

min(iterable) → item
Return the item which is least in the iterable (sequence, set or mapping).
max(iterable) → item
Return the item which is greatest in the iterable (sequence, set or mapping).
len(iterable) → number
Return the number of elements in the iterable (sequence, set or mapping).
enumerate(iterable) → iterator

Enumerate the elements of a sequence, set or mapping. This yields a sequence of tuples based on the original iterable. Each of the tuples has two elements: a sequence number and the item from the original iterable.

This kind of iterator is generally used with a for statement.
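For example, this sketch numbers the items of a hypothetical tuple named colors. Each two-element tuple from enumerate() is unpacked into position and name.

```python
colors = ("red", "green", "blue")

numbered = []
for position, name in enumerate(colors):
    # position counts from zero, matching the item's index.
    numbered.append("%d: %s" % (position, name))

print(numbered)
```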

sorted(iterable[, cmp][, key][, reverse]) → list

This returns a new list containing the items of an iterable (sequence, set or mapping) in ascending or descending sorted order. Unlike a list’s sort() method function, this does not update the original, but leaves it alone.

The resulting list is often used with a for statement.
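A small sketch with a hypothetical list named scores shows both orders, and shows that the original list is left alone.

```python
scores = [9, 7, 3, 12]

ascending = sorted(scores)                  # [3, 7, 9, 12]
descending = sorted(scores, reverse=True)   # [12, 9, 7, 3]

# scores itself still holds [9, 7, 3, 12].
print(scores)
```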

reversed(sequence) → iterator

This iterates through a sequence in reverse order.

This kind of iterator is generally used with a for statement. Here’s an example:

```
>>> the_tuple = ( 9, 7, 3, 12 )
>>> for v in reversed( the_tuple ):
...     print v
...
12
3
7
9
```
zip(sequence, ...) → sequence

This creates a new sequence of tuples. Each tuple in the new sequence has values taken from the input sequences.

```
>>> color = ( "red", "green", "blue" )
>>> level = ( 20, 30, 40 )
>>> zip( color, level )
[('red', 20), ('green', 30), ('blue', 40)]
```
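The simpler accessors at the top of this list, min(), max() and len(), can be sketched the same way. The tuple level repeats the values from the zip() example; the string "spin" is a hypothetical choice.

```python
level = (20, 30, 40)

smallest = min(level)    # 20
largest = max(level)     # 40
count = len(level)       # 3

# The same functions work on a string, comparing its characters.
first_letter = min("spin")   # 'i' sorts before 'n', 'p' and 's'
```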

The following functions don’t apply quite so widely. For example, applying any() or all() to a string is silly: both return True for any non-empty string. Similarly, applying sum() to a sequence that isn’t all numbers is silly and raises a TypeError.

sum(iterable) → number
Sum the values in the iterable (set, sequence, mapping). All of the values must be numeric.
all(iterable) → boolean
Return True if all values in the iterable (set, sequence, mapping) are equivalent to True.
any(iterable) → boolean
Return True if any value in the iterable (set, sequence, mapping) is equivalent to True.
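These three reductions can be sketched on a hypothetical list of numbers named wins. A value of zero counts as False, so it affects all() but not any(). The last few lines confirm that sum() objects to non-numeric items.

```python
wins = [1, 0, 2, 1]

total = sum(wins)     # 4
every = all(wins)     # False, because one value is zero
some = any(wins)      # True, because at least one value is non-zero

try:
    sum(["a", "b"])   # strings are not numbers
    failed = False
except TypeError:
    failed = True
```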

## Sequence Exercises

1. Tuples and Lists.

What is the value in having both immutable sequences (tuples) and mutable sequences (lists)? What are the circumstances under which you would want to change a string? What are the problems associated with strings that grow in length? How can storage for variable length strings be managed?

2. Unicode Strings.

What is the value in making a distinction between Unicode strings and ASCII strings? Does it improve performance to restrict a string to single-byte characters? Should all strings simply be Unicode strings to make programs simpler? How should file reading and writing be handled?

3. Statements and Data Structures.

In order to introduce the for statement in The for Statement, we had to dance around the sequence issue. Would it make more sense to introduce the various types of collections first, and then describe statements that process the collections later?

Something has to be covered first, and is therefore more fundamental. Is the processing statement more fundamental to programming, or is the data structure?

## Style Notes

Try to avoid extraneous spaces in lists and tuples. Python programs should be relatively compact. Prose writing typically keeps parentheses close to their contents, and puts spaces after commas, never before them. This should hold true for Python, also. The preferred formatting for lists and tuples, then, is (1,2,3) or (1, 2, 3). Spaces are not put just inside the enclosing [ ] or ( ). Spaces are not put before a comma.