Sequences: Strings, Tuples and Lists

The Common Features of Sequences

Before digging into the details, we’ll introduce the common features of three of the data types that are containers for sequences of values.

In Sequence Semantics we provide an overview of the semantics of sequences. We describe the common features of the sequences in Overview of Sequences.

The sequence is central to programming and central to Python. A number of statements and functions we have covered have sequence-related features that we have glossed over, danced around, and generally avoided.

We’ll revisit a number of functions and statements we covered in previous sections, and add the power of sequences to them. In particular, the for statement is something we glossed over in Iterative Processing: For All and There Exists.

In the chapters that follow we’ll look at Strings, Tuples and Lists in detail. In Mappings and Dictionaries, we’ll introduce another structured data type for manipulating mappings between keys and values.

Sequence Semantics

A sequence is a container of objects which are kept in a specific order. We can identify the individual objects in a sequence by their position or index. Positions are numbered from zero in Python; the element at index zero is the first element.

We call these containers because they are a single object which contains (or collects) any number of other objects. The “any number” clause means that they can contain zero other objects, meaning that an empty container is just as valid as a container with one or thousands of objects.
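As a small illustration of the “any number” clause, here is a sketch that builds one empty container of each kind and checks its length. The variable names are made up for this example.

```python
# Empty containers are ordinary, valid sequence objects.
empty_string = ""
empty_tuple = ()
empty_list = []

# Each one reports a length of zero.
lengths = [len(empty_string), len(empty_tuple), len(empty_list)]
print(lengths)
```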

Important

Other Languages

Some programming languages use words like “vector” or “array” to refer to sequential containers. For example, in C or Java, the primitive array has a statically allocated number of positions. In Java, a reference outside that specific number of positions raises an exception. In C, however, a reference outside the defined positions of an array is an error that may never be detected. Really.

There are four commonly-used subspecies of sequence containers.

  • String, called str. A container of single-byte ASCII characters.
  • Unicode String, unicode. A container of multi-byte Unicode (or Universal Character Set) characters.
  • tuple. A container of anything with a fixed number of elements.
  • list. A container of anything with a dynamic number of elements.

Important

Python 3

This mix of types will change slightly.

The String and Unicode types will merge into the str type. This will represent text.

A new container, the “byte array” will be introduced, named bytes. This will represent binary data.

tuple and list won’t change.

When we create a tuple or string, we’ve created an immutable, or static, object. We can examine the object, looking at specific characters or items. We can’t change the object. This means that we can’t put additional data on the end of a string. What we can do, however, is create a new string that is the concatenation of two existing string objects.

When we create a list, on the other hand, we’ve created a mutable object. A list can have additional objects appended to it or inserted in it. Objects can be removed from a list, also. A list can grow and shrink; the order of the objects in the list can be changed without creating a new list object.
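The difference between immutable and mutable sequences can be sketched as follows. This is only an illustration; the names greeting, longer, items and same_list are invented for the example.

```python
# Strings are immutable: concatenation builds a new string object.
greeting = "hello"
longer = greeting + ", world"   # greeting itself is unchanged

# Trying to change a string in place raises a TypeError.
try:
    greeting[0] = "J"
except TypeError as error:
    string_error = str(error)

# Lists are mutable: the same list object grows, shrinks and is reordered.
items = [1, 2, 3]
same_list = items               # a second name for the same list object
items.append(4)                 # [1, 2, 3, 4]
items.insert(0, 0)              # [0, 1, 2, 3, 4]
items.remove(2)                 # [0, 1, 3, 4]
items.reverse()                 # [4, 3, 1, 0]

print(longer)
print(items)
```

Because same_list and items are two names for one list object, every change made through items is also visible through same_list.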

One other note on strings. While strings are sequences of characters, there is no separate character data type. A character is simply a string of length one. This relieves programmers from the C or Java burden of remembering which quotes to use for single characters as distinct from multi-character strings. It also eliminates any problems when dealing with Unicode multi-byte characters.
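A quick check of the “no separate character type” point might look like the following sketch. The names word and first are invented for this example.

```python
word = "Python"
first = word[0]                 # selecting one character gives a string

print(type(first) is str)       # the single character is still a str
print(len(first))               # and its length is one
print('a' == "a")               # either kind of quote makes the same string
```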

Overview of Sequences

All the varieties of sequences (string, tuple and list) have some common characteristics. We’ll identify the common features first, and then move on to cover these in detail for each individual type of sequence. This section is a road-map for the following three sections that cover string, tuple and list in detail.

Literal Values. Each sequence type has a literal representation. The details will be covered in separate sections, but the basics are these:

  • string uses quotes: "string".
  • tuple uses (): (1,'b',3.1).
  • list uses []: [1,'b',3.1].

Operations. Sequences have three common operations: + will concatenate sequences to make longer sequences. * is used with a number and a sequence to repeat the sequence several times. Finally, the [ ] operator is used to select elements from a sequence.
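Here is a short sketch of + and * applied to each kind of sequence. The variable names are invented for the example. Note that a one-item tuple is written with a trailing comma, as in (3,).

```python
# + concatenates two sequences of the same type.
joined_text = "abc" + "def"
joined_tuple = (1, 2) + (3,)
joined_list = [1, 2] + [3, 4]

# * repeats a sequence; the number can be on either side.
repeated_text = "-" * 10
repeated_list = [0] * 3
repeated_tuple = 2 * ("a", "b")

print(joined_text)
print(repeated_tuple)
```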

The [ ] operator can extract a single item, or a subset of items by slicing. There are two forms of [].

  • The single item format is sequence [ index ]. Items are numbered from 0.
  • The slice format is sequence [ start : end ]. Items from start to end -1 are chosen to create a new sequence; it will be a slice of the original sequence. There will be end-start items in the resulting sequence.
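The two forms of [] can be sketched like this. The list days is sample data invented for the example.

```python
days = ["Mon", "Tue", "Wed", "Thu", "Fri"]

# Single item: positions are numbered from 0.
first = days[0]

# Slice: items 1, 2 and 3 are chosen; position 4 is not included.
midweek = days[1:4]

# A slice of a string is a new string.
text_slice = "Wednesday"[0:3]

print(first)
print(midweek)
print(len(midweek) == 4 - 1)    # end-start items in the slice
```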

Positions can be numbered from the end of the sequence as well as the beginning. Position -1 is the last item of the sequence, -2 is the next-to-last item.

Here’s how it works: each item has a positive number position that identifies the item in the sequence. We’ll also show the negative position numbers for each item in the sequence. For this example, we’re looking at a four-element sequence like the tuple (3.14159,"two words",2048,(1+2j)).

forward position   0         1             2      3
reverse position   -4        -3            -2     -1
item               3.14159   "two words"   2048   (1+2j)

Why do we have two different ways of identifying each position in the sequence? If you want, you can think of it as a handy short-hand. The last item in any sequence, S, can be identified by the formula S[len(S)-1]. For example, if we have a sequence with 4 elements, the last item is in position 3. Rather than write S[len(S)-1], Python lets us simplify this to S[-1].

You can see how this works with the following example.

>>> a=(3.14159,"two words",2048,(1+2j))
>>> a[0]
3.1415899999999999
>>> a[-3]
'two words'
>>> a[2]
2048
>>> a[-1]
(1+2j)

Built-in Functions. len(), max() and min() apply to all varieties of sequences. We’ll provide the definitions here and refer to them in various class definitions.

len(sized_collection) → integer

Return the number of items of the collection. This can be any kind of sized collection. All sequences and mappings are subclasses of collections.Sized and provide a length.

Here are some examples.

>>> len("Wednesday")
9
>>> len( (1,1,2,3) )
4

max(iterable_collection) → object

Returns the largest value in the iterable collection. All sequences and mappings are subclasses of collections.Iterable; the max() function can iterate over elements and locate the largest.

>>> max( (1,2,3) )
3
>>> max('abstractly')
'y'

Note that max() can also work with a number of individual arguments instead of a single iterable collection argument value. We looked at this in Collection Functions.

min(iterable_collection) → object

Returns the smallest value in the iterable collection. All sequences and mappings are subclasses of collections.Iterable; the min() function can iterate over elements and locate the smallest.

>>> min( (10,11,2) )
2
>>> min( ('10','11','2') )
'10'

Note that strings are compared alphabetically. The min() and max() functions can’t determine that these are supposed to be evaluated as numbers.
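If we want the numeric interpretation, we have to ask for it. One possible approach, sketched below, is the key keyword argument that min() and max() accept (available since Python 2.5). The name values is invented for the example.

```python
values = ('10', '11', '2')

# Default: strings are compared character by character.
smallest_text = min(values)
largest_text = max(values)

# With key=int, each item is compared by its integer value,
# but the original string is what gets returned.
smallest_by_key = min(values, key=int)
largest_by_key = max(values, key=int)

print(smallest_text)
print(smallest_by_key)
```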

sum(iterable_collection[, start=0]) → number

Return the sum of the items in the iterable collection. All sequences and mappings are subclasses of collections.Iterable.

If start is provided, this is the initial value for the sum, otherwise 0 is used.

If the values being summed are not all numeric values, this will raise a TypeError exception.

>>> sum( (1,1,2,3,5,8) )
20
>>> sum( (), 3 )
3
>>> sum( (1,2,'not good') )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'

any(iterable_collection) → boolean

Return True if there exists an item in the iterable collection which is equivalent to True. All sequences and mappings are subclasses of collections.Iterable.

all(iterable_collection) → boolean

Return True if all items in the iterable collection are equivalent to True. All sequences and mappings are subclasses of collections.Iterable.
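Here is a small sketch of any() and all() applied to tuples. The names are invented for the example. Note the results for an empty collection: any() finds no true item, and all() finds no false item.

```python
flags = (False, True, False)

some_true = any(flags)           # at least one item is True
every_true = all(flags)          # not every item is True

# Items are tested for truth; non-zero numbers and non-empty
# strings and lists count as true.
all_truthy = all((1, 'a', [0]))
none_truthy = any((0, '', ()))

# The empty collection is a special case.
any_of_empty = any(())
all_of_empty = all(())

print(some_true)
print(every_true)
```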

enumerate(iterable_collection) → iterator

Iterates through the iterable collection returning 2-tuples of ( index, item ).

>>> for position, item in enumerate( ('word',3.1415629,(2+3j) ) ):
...     print position, item
...
0 word
1 3.1415629
2 (2+3j)

sorted(sequence[, key=None][, reverse=False]) → list

This returns a new list that contains the elements of the iterable container in ascending order. The original container is not changed.

If the reverse keyword parameter is provided and set to True, the new list is in descending order.

The key parameter is used when the items in the container aren’t simply sorted using the default comparison operators. The key function is applied to each item and must return the value to be compared; for example, it can select one field from each tuple in the container.

We’ll look at this in detail in Functional Programming with Collections.
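As a preview, here is a minimal sketch of sorted() with the reverse and key parameters. The tuple words is sample data invented for the example; sorted() hands back a new list and leaves the tuple alone.

```python
words = ("pear", "Fig", "banana", "apple")

# Default order: uppercase letters sort before lowercase letters.
ascending = sorted(words)

# reverse=True gives descending order.
descending = sorted(words, reverse=True)

# key=len compares the items by their length instead of their text.
by_length = sorted(words, key=len)

print(ascending)
print(descending)
print(by_length)
```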

reversed(sequence) → iterator

This returns an iterator that steps through the elements in the iterable container in reverse order.

>>> tuple( reversed( (9,1,8,2,7,3) ) )
(3, 7, 2, 8, 1, 9)

Comparisons. The standard comparisons (<, <=, >, >=, ==, !=) apply to sequences. These all work by doing item-by-item comparison within the two sequences. The item-by-item rule results in strings being sorted alphabetically, and tuples and lists sorted in a way that is similar to strings.
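The item-by-item rule can be seen in a few small comparisons. This is only a sketch, and the names are invented for the example.

```python
# The first two items match, so the third item decides.
third_item_decides = (1, 2, 3) < (1, 2, 4)

# When one sequence is a prefix of the other, the shorter one is smaller.
prefix_is_smaller = (1, 2) < (1, 2, 0)

# The first differing item decides, no matter how long the sequences are.
first_item_decides = [2] > [1, 99]

# Strings follow the same rule, character by character.
strings_compare = "apple" < "apply"

# Uppercase letters come before lowercase letters.
upper_before_lower = "Zebra" < "apple"

print(third_item_decides)
print(prefix_is_smaller)
```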

There are two additional comparisons: in and not in. These check to see if a single value occurs in the sequence. The in operator returns a True if the item is found, False if the item is not found. The not in operator returns True if the item is not found in the sequence.
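A short sketch of in and not in follows. The tuple primes is sample data invented for the example.

```python
primes = (2, 3, 5, 7)

three_found = 3 in primes          # the item is in the tuple
four_found = 4 in primes           # the item is not in the tuple
four_missing = 4 not in primes     # the opposite question

# The same operators work on strings and lists.
has_e = "e" in "Wednesday"
has_zero = 0 in [1, 2, 3]

print(three_found)
print(four_missing)
```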

Methods. The string and list classes have method functions that operate on the object’s value. For instance "abc".upper() executes the upper() method belonging to the string literal "abc". The result is a new string, 'ABC'. The exact dictionary of methods is unique to each class of sequences.

Statements. The tuple and list classes are central to certain Python statements, like the assignment statement and the for statement. These were details that we skipped over in The Assignment Statement and Iterative Processing: For All and There Exists.
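As a preview of what those chapters skipped, here is a sketch of a tuple on each side of an assignment statement and a list driving a for statement. All of the names are invented for the example.

```python
# Assignment can unpack a tuple into several variables at once.
point = (3, 4)
x, y = point

# The same feature swaps two variables without a temporary.
x, y = y, x

# The for statement visits each item of a sequence in order.
total = 0
for value in [1, 2, 3]:
    total = total + value

print(x)
print(y)
print(total)
```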

Modules. There is a string module with several string specific functions. Most of these functions are now member functions of the string type. Additionally, this module has a number of constants to define various subsets of the ASCII character set, including digits, printable characters, whitespace characters and others.
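A few of these constants can be tried out with the in operator. This sketch uses only string.digits, string.whitespace and string.printable; the other names are invented for the example.

```python
import string

digit_chars = string.digits              # the ten decimal digits

is_digit = "7" in string.digits
tab_is_space = "\t" in string.whitespace
letter_is_digit = "x" in string.digits
letter_is_printable = "x" in string.printable

print(digit_chars)
print(is_digit)
```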

Factory Functions. There are also built-in factory (or conversion) functions for the sequence objects. We’ve looked at some of these already, when we looked at str() and repr().
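Here is a brief sketch of the factory functions converting one kind of sequence into another. The variable names are invented for the example.

```python
# list() and tuple() build a new sequence from any other sequence.
letters = list("abc")
frozen = tuple([1, 2, 3])
from_tuple = list((1, 2))

# str() and repr() build a string that represents an object.
text = str((1, 2))
quoted = repr("abc")

print(letters)
print(frozen)
print(text)
print(quoted)
```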

Exercises

  1. Tuples and Lists. What is the value in having both immutable sequences (tuple) and mutable sequences (list)? What are the circumstances under which you would want to change a string? What are the problems associated with a string that grows in length? How can storage for variable-length strings be managed?

  2. Unicode Strings. What is the value in making a distinction between Unicode strings and ASCII strings? Does it improve performance to restrict a string to single-byte characters? Should all strings simply be Unicode strings to make programs simpler? How should file reading and writing be handled?

  3. Statements and Data Structures. In order to introduce the for statement in Iterative Processing: For All and There Exists, we had to dance around the sequence issue. Would it make more sense to introduce the various sequence data structures first, and then describe statements that process the data structure later?

    Something has to be covered first, and is therefore more fundamental. Is the processing statement more fundamental to programming, or is the data structure?

Style Notes

Try to avoid extraneous spaces in list and tuple displays. Python programs should be relatively compact. Prose writing typically keeps parentheses close to their contents, and puts spaces after commas, never before them. This should hold true for Python, also.

The preferred formatting for a list or tuple, then, is [1, 2, 3] or (1, 2, 3). Spaces are not put after the initial [ or (. Spaces are not put before a comma.
