Mappings and Dictionaries

Many algorithms need to map a key to a data value. This kind of mapping is supported by the Python dictionary, dict. We’ll look at dictionaries from a number of viewpoints: semantics, literal values, operations, comparison operators, statements, built-in functions and methods.

We are then in a position to look at two applications of the dictionary. We’ll look at how Python uses dictionaries along with sequences to handle arbitrary connections of parameters to functions in Advanced Parameter Handling For Functions. This is a very sophisticated set of tools that let us define functions that are very flexible and easy to use.

Dictionary Semantics

A dictionary, called a dict, maps a key to a value. The key can be any type of Python object that computes a consistent hash value. The value referenced by the key can be any type of Python object.

There is a subtle terminology issue here. Python has provisions for creating a variety of different types of mappings. Only one type of mapping comes built-in; that type is the dict. The terms are almost interchangeable. However, you may develop or download other types of mappings, so we’ll be careful to focus on the dict class.

Working with a dict is similar to working with a sequence. Items are inserted into the dict, found in the dict and removed from the dict. A dict object has member methods that return a sequence of keys, or values, or ( key , value ) tuples suitable for use in a for statement.

Unlike a sequence, a dict does not preserve order. Instead of order, a dict uses a hashing algorithm to identify each item’s place in the dict with a rapid calculation of the key’s hash value. The built-in function, hash() is used to do this calculation. Items in the dict are inserted in an position related to their key’s apparently random hash values.

Some Alternate Terminology. A dict can be thought of as a container of ( key : value ) pairs. This can be a helpful way to imagine the information in a mapping. Each pair in the list is the mapping from a key to an associated value.

A dict can be called an associative array. Ordinary arrays, typified by sequences, use a numeric index, but a dict‘s index is made up of the key objects. Each key is associated with (or “mapped to”) the appropriate value.

Some Consequences. Each key object has a hash value, which is used to place the value in the dict. Consequently, the keys must have consistent hash values; they must be immutable objects. You can’t use list, dict or set objects as keys. You can use tuple, string and frozenset objects, since they are immutable. Additionally, when we get to class definitions (in Classes), we can make arrangements for our objects to return an immutable hash value.

A common programming need is a heterogeneous container of data. Database records are an example. A record in a database might have a boat’s name (as a string), the length overall (as a number) and an inventory of sails (a list of strings). Often a record like this will have each element (known as a field) identified by name.

A C or C++ struct accomplishes this. This kind of named collection of data elements may be better handled with a class (someting we’ll cover in Classes) or a named tuple (see collections). However, a mapping is also useful for managing this kind of heterogeneous data with named fields.

Note that many languages make record definitions a statically-defined container of named fields. A Python dict is dynamic, allowing us to add field names at run-time.

A common alternative to hashing is using some kind of ordered structure to maintain the keys. This might be a tree or list, which would lead to other kinds of mappings. For example, there is an ordered dictionary, in the Python collections module.

Dictionary Literal Values

A dict literal is created by surrounding a key-value list with {}; a key is separated from its value with :, and the key : value pairs are separated with commas (,). An empty dict is simply {}. As with list and tuple, an extra , inside the {} is tolerated.

Examples:

diceRoll = { (1,1): "snake eyes", (6,6): "box cars" }
myBoat = { "NAME":"KaDiMa", "LOA":18, "SAILS":["main","jib","spinnaker"] }
theBets = { }
diceRoll:This is a dict with two elements. One element has a key of a tuple (1,1) and a value of a string, "snake eyes". The other element has a key of a tuple (6,6) and a value of a string "box cars".
myBoat:This variable is a dict with three elements. One element has a key of the string "NAME" and a value of the string "KaDiMa". Another element has a key of the string "LOA" and a value of the integer 18. The third element has a key of the string "SAILS" and the value of a list ["main", "jib", "spinnaker"].
theBets:An empty dict.

The values and keys in a dict do not have to be the same type. Keys must be a type that can produce a hash value. Since list s and dict objects are mutable, they are not permitted as keys. All other non-mutable types (especially string, frozenset and tuple) are legal keys.

Dictionary Operations

A dict only permits a single operation: []. This is used to add, change or retrieve items from the dict. The slicing operations that apply to sequences don’t apply to a dict.

Examples of dict operations.

>>> d= {}
>>> d[2] = [ (1,1) ]
>>> d[3] = [ (1,2), (2,1) ]
>>> d
{2: [(1, 1)], 3: [(1, 2), (2, 1)]}
>>> d[2]
[(1, 1)]
>>> d["2 or 3"] = d[2] + d[3]
>>> d
{'2 or 3': [(1, 1), (1, 2), (2, 1)], 2: [(1, 1)], 3: [(1, 2), (2, 1)]}
  1. This example starts by creating an empty dict, d.
  2. Into d[2] we insert a list with a single tuple.
  3. Into d[3] we insert a list with two tuples.
  4. When the entire dict is printed it shows the two key:value pairs, one with a key of 2 and another with a key of 3.
  5. The entry with a key of the string "2 or 3" has a value which is computed from the values of d[2] + d[3]. Since these two entries are lists, the lists can be combined with the + operator. The resulting expression is stored into the dict.
  6. When we print d, we see that there are three key:value pairs: one with a key of 3, one with a key of 2 and one with a key of "2 or 3" .

This ability to use any object as a key is a powerful feature, and can eliminate some needlessly complex programming that might be done in other languages.

Here are some other examples of picking elements out of a dict.

>>> myBoat = { "NAME":"KaDiMa", "LOA":18,
...      "SAILS":["main","jib","spinnaker"] }
>>> myBoat["NAME"]
'KaDiMa'
>>> myBoat["SAILS"].remove("spinnaker")
>>> myBoat
{'LOA': 18, 'NAME': 'KaDiMa', 'SAILS': ['main', 'jib']}

String Formatting with Dictionaries. The string formatting operator, %, can be applied between str and dict as well as str and sequence. When this operator was introduced in Strings, the format specifications were applied to a tuple or other sequence. When used with a dict, each format specification is given an additional option that specifies which dict element to use. The general format for each conversion specification is:

%( element ) [ flags ][
width [ . precision ]] code

The flags, width, precision and code elements are defined in Strings. The element field must be enclosed in ()’s; this is the key to be selected from the dict.

For example:

print "%(NAME)s, %(LOA)d" % myBoat

This will find myBoat[NAME] and use %s formatting; it will find myBoat[LOA] and use %d number formatting.

Dictionary Comparison Operations

Some of the standard comparisons ( < , <= , > , >=, == , != ) don’t have a lot of meaning between two dictionaries. Since there may be no common keys, nor even a common data type for keys, dictionaries are simply compared by length. The dict with fewer elements is considered less than a dict with more elements.

The membership comparisons (in, not in) apply to the keys of the dictionary.

>>> colors = { "blue": (0x30,0x30,0xff), "green": (0x30,0xff,0x97),
... "red": (0xff,0x30,0x97), "yellow": (0xff,0xff,0x30) }
>>> "blue" in colors
True
>>> (0x30,0x30,0xff) in colors
False
>>> "orange" not in colors
True

Dictionary Statements

There are a number of statements that have specific features related to dict objects.

The for Statement. The for statement iterates through the keys of the dictionary.

>>> colors = { "blue": (0x30,0x30,0xff), "green": (0x30,0xff,0x97),
... "red": (0xff,0x30,0x97), "yellow": (0xff,0xff,0x30) }
>>> for c in colors:
...     print c, colors[c]

It’s common to use some slightly different techniques for iterating through the elements of a dict.

  • The key:value pairs. We can use the items() method to iterate through the sequence of 2-tuples that contain each key and the associated value.

    for key, value in someDictionary.items():
        # process key and value
  • The values. We can use the values() method to iterate through the sequence of values in the dictionary.

    for value in someDictionary.values():
        # process the value

    Note that we can’t easily determine the associated key. A dictionary only goes one way: from key to value.

  • The keys. We can use the keys() method to iterate through the sequence of keys. This is what happens when we simply use the dictionary object in the for statement.

Here’s an example of using the key:value pairs.

>>> myBoat = { "NAME":"KaDiMa", "LOA":18,
...     "SAILS":["main","jib","spinnaker"] }
>>> for key, value in myBoat.items():
...     print key, "=", value
...
LOA = 18
NAME = KaDiMa
SAILS = ['main', 'jib', 'spinnaker']

The del Statement. The del statement removes items from a dict . For example

>>> i = { "two":2, "three":3, "quatro":4 }
>>> del i["quatro"]
>>> i
{'two': 2, 'three': 3}

In this example, we use the key to remove the item from the dict.

The member function, pop(), does this also.

>>> i = { "two":2, "three":3, "quatro":4 }
>>> i.pop("quatro")
4
>>> i
{'two': 2, 'three': 3}

Dictionary Built-in Functions

Here are the built-in functions that deal with dictionaries.

dict([values][, key=value...]) → dictionary

Creates a new dictionary. If a positional parameter, values is provided, each element must be a 2-tuple. The values pairs are used to populate the dictionary; the first element of each pair is the key and the second element is the value.

Note that the zip() function produces a list of 2-tuples from two parallel lists.

If any keyword parameters are provided, each keyword becomes a key in the dictionary and the keyword argument becomes the value for that key.

>>> dict( [('first',0), ('second',1),('third',2)] )
{'second': 1, 'third': 2, 'first': 0}
>>> dict( zip(['fourth','fifth','sixth'],[3,4,5]) )
{'sixth': 5, 'fifth': 4, 'fourth': 3}
>>> dict( seventh=7, eighth=8, ninth=9 )
{'seventh': 7, 'eighth': 8, 'ninth': 9}

Functions which apply to dicts, but are defined elsewhere.

  • len(). For dicts, this function returns the number of items.

    >>> len( {1:'first',2:'second',3:'third'} )
    3
    >>> len( {} )
    0
    
  • max(). For dicts, this function returns the maximum key.

    >>> max( {1:'first',2:'second',3:'third'} )
    3
    
  • min(). For dicts, this function returns the minimum key.

  • sum(). For dicts, this function sums the keys.

    >>> sum( {1:'first',2:'second',3:'third'} )
    6
    
  • any(). Equivalent to any( dictionary.keys() ). Return True if any key in the dictionary are True, or equivalent to True. This is almost always true except for empty dictionaries or a peculiar dictionary with keys of 0, False, None, etc.

  • all(). Equivalent to all( dictionary.keys() ). Return True if all keys in the dictionary are True, or equivalent to True.

    >>> all( {1:'first',2:'second',3:'third'} )
    True
    >>> all( {1:'first',2:'second',None:'error'} )
    False
    
  • enumerate(). Iterate through the dictionary returning 2-tuples of ( index, key ). This iterates through the key values. Since dictionaries have no explicit ordering to their keys, this enumeration is in an arbitrary order.

  • sorted(). Iterate through the dictionary keys in sorted order. The keys are actually a list, and this returns a list of the sorted keys.

    >>> sorted( { "two":2, "three":3, "quatro":4 } )
    ['quatro', 'three', 'two']
    

Dictionary Methods

A dict object has a number of member methods. Many of these maintain the values in a dict . Others retrieve parts of the dict as a sequence, for use in a for statement.

The following mutator functions update a dict object. Most of these do not return a value. The dict.pop() and dict.setdefault() methods both update the dictionary and return values.

dict.clear()
Remove all items from the dict.
dict.pop(key[, default]) → value
Remove the given key from the dict, returning the associated value. If the key does not exist, return the default value provided. If the key does not exist and no default value exists, raise a KeyError exception.
dict.setdefault(key[, default]) → value
If the key is in the dictionary, return the associated value. If the key is not in the dictionary, set the given default as the value and return this value. If default is not given, it defaults to None.
dict.update(new[, key=value...])

Merge values from the new new into the original dict, adding or replacing as needed.

It is equivalent to the following Python statement. for k in new.keys(): d[k]= new[k]

If any keyword parameters are provided, each keyword becomes a key in the dictionary and the keyword argument becomes the value for that key.

>>> x= dict( seventh=7, eighth=8, ninth=9 )
>>> x
{'seventh': 7, 'eighth': 8, 'ninth': 9}
>>> x.update( first=1 )
>>> x
{'seventh': 7, 'eighth': 8, 'ninth': 9, 'first': 1}

The following transformer function transforms a dictionary into another object.

dict.copy() → dictionary
Copy the dict to make a new dict. This is a shallow copy. All objects in the new dict are references to the same objects as the original dict.

The following accessor methods provide information about a dict.

dict.get(key[, default]) → value
Get the item with the given key, similar to dict[key]. If the key is not present and default is given, supply default instead. If the key is not present and no default is given, raise the KeyError exception.
dict.items() → sequence
Return all of the items in the dict as a sequence of (key,value) 2-tuples. Note that these are returned in no particular order.
dict.keys() → sequence
Return all of the keys in the dict as a sequence of keys. Note that these are returned in no particular order.
dict.values() → sequence
Return all the values from the dict as a sequence. Note that these are returned in no particular order.

Using Dictionaries as Function Parameter Defaults

It’s very, very important to note that default values must be immutable objects. Recall that numbers, strings, None, and tuple objects are immutable.

We note that dictionaries as well as sets and lists are mutable, and cannot be used as default values for function parameters.

Consider the following example of what not to do.

>>> def default2( someDict={} ):
...     someDict['default']= 2
...     return someDict
...
>>> looks_good= {}
>>> default2(looks_good)
{'default': 2}
>>> default2(looks_good)
{'default': 2}
>>> looks_good
{'default': 2}
>>>
>>>
>>> not_good= default2()
>>> not_good
{'default': 2}
>>> worse= default2()
>>> worse
{'default': 2}
>>> not_good
{'default': 2}
>>>
>>> not_good['surprise']= 'what?'
>>> not_good
{'default': 2, 'surprise': 'what?'}
>>> worse
{'default': 2, 'surprise': 'what?'}
  1. We defined a function which has a default value that’s a mutable object. This is simple a bad programming practice in Python.

  2. We used this function with a dictionary object, looks_good. The function updated the dictionary object as expected.

  3. We used the function’s default value to create not_good. The function inserted a value into an empty dictionary and returned this new dictionary object.

    It turns out that the function updated the mutable default value, also.

  4. When we use the function’s default value again, with worse, the function uses the updated default value and updates it again.

    Both not_good and worse are references to the same mutable object that is being updated.

To avoid this, do not use mutable values as defaults. Do this instead.

def default2( someDict=None ):
    if someDict is None:
        someDict= {}
    someDict['default']= 2
    return someDict

This creates a fresh new mutable object as needed.

Dictionary Exercises

  1. Word Frequencies. Update the exercise in Accumulating Unique Values to count each occurance of the values in aSequence. Change the result from a simple sequence to a dict. The dict key is the value from aSequence. The dict value is the count of the number of occurances.

    If this is done correctly, the input sequence can be words, numbers or any other immutable Python object, suitable for a dict key.

    For example, the program could accept a line of input, discarding punctuation and breaking them into words in space boundaries. The basic string operations should make it possible to create a simple sequence of words.

    Iterate through this sequence, placing the words into a dict. The first time a word is seen, the frequency is 1. Each time the word is seen again, increment the frequency. Produce a frequency table.

    To alphabetize the frequency table, extract just the keys. A sequence can be sorted (see section 6.2). This sorted sequence of keys can be used to extract the counts from the dict.

  2. Stock Reports. A block of publicly traded stock has a variety of attributes, we’ll look at a few of them. A stock has a ticker symbol and a company name. Create a simple dict with ticker symbols and company names.

    For example:

    stockDict = { 'GM': 'General Motors',
     'CAT':'Caterpillar', 'EK':"Eastman Kodak" }
    

    Create a simple list of blocks of stock. These could be tuple s with ticker symbols, prices, dates and number of shares. For example:

    purchases = [ ( 'GE', 100, '10-sep-2001', 48 ),
     ( 'CAT', 100, '1-apr-1999', 24 ),
     ( 'GE', 200, '1-jul-1999', 56 ) ]
    

    Create a purchase history report that computes the full purchase price (shares times dollars) for each block of stock and uses the stockDict to look up the full company name. This is the basic relational database join algorithm between two tables.

    Create a second purchase summary that which accumulates total investment by ticker symbol. In the above sample data, there are two blocks of GE. These can easily be combined by creating a dict where the key is the ticker and the value is the list of blocks purchased. The program makes one pass through the data to create the dict. A pass through the dict can then create a report showing each ticker symbol and all blocks of stock.

  3. Date Decoder. A date of the form 8-MAR-85 includes the name of the month, which must be translated to a number. Create a dict suitable for decoding month names to numbers. Create a function which uses string operations to split the date into 3 items using the “-” character. Translate the month, correct the year to include all of the digits.

    The function will accept a date in the “dd-MMM-yy” format and respond with a tuple of ( y , m, d ).

  4. Dice Odds. There are 36 possible combinations of two dice. A simple pair of loops over range(6)+1 will enumerate all combinations. The sum of the two dice is more interesting than the actual combination. Create a dict of all combinations, using the sum of the two dice as the key.

    Each value in the dict should be a list of tuple s; each tuple has the value of two dice. The general outline is something like the following:

    Enumerate Dice Combinations

    Initialize. combos \gets dict()

    For all d1. Iterate with 1 \leq d_1 < 7.

    For all d2. Iterate with 1 \leq d_2 < 7.

    Create a Tuple. t \gets ( d_1, d_2 ).

    In the Dictionary. Is d_1+d_2 a key in combos?

    Append. Append the tuple, t to the value for item d_1+d_2 in combos.

    Not In the Dictionary. If d_1+d_2 is not a key in combos, then

    Insert. Add a new element d_1+d_2 to the combos; the value is a 1-element list of the tuple, t.

    Report. Display the resulting dictionary.

Advanced Parameter Handling For Functions

In More Function Definition Features we hinted that Python functions can handle a variable number of argument values in addition to supporting optional argument values.

When we define a function, we can have optional parameters. We define a fixed number of parameters, but some (or all) can be omitted because we provided default values for them. This allows us to provide too few positional argument values.

If we provide too many positional argument values to a function, however, Python raises an exception. It turns out that we can also handle this.

Consider the following example. We defined a function of three positional parameters, and then evaluated it with more than three argument values.

>>> def avg(a,b,c): return (a+b+c)/3.0
...
>>> avg(10,11,12)
11.0
>>> avg(10,11,12,13)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: avg() takes exactly 3 arguments (4 given)

First, we’ll look at handling an unlimited number of positional values. Then we’ll look at handling an unlimited number of keyword values.

Unlimited Number of Positional Argument Values

Python lets us define a function that handles an unknown and unlimited number of argument values. Examples of built-in functions with a unlimited number of argument values are max() and min().

Rather than have Python raise an exception for extra argument values, we can request the additional positional argument values be collected into a tuple. To do this, we provide a final parameter definition of the form * extras. The * indicates that this parameter variable is the place to capture the extra argument values. The variable, here called extras, will receive a sequence with all of the extra positional argument values.

You can only provide one such variable (if you provided two, how could Python decide which of these two got the extra argument values?) You must provide this variable after the ordinary positional parameters in the function definition.

The following function accepts an unlimited number of positional arguments; it collects these in a single tuple parameter, args.

def myMax( *args ):
    max= args[0]
    for a in args[1:]:
        if a > max: max= a
    return max

Here’s another example. In this case we have a fixed parameter in the first position and all the extra parameters collected into a tuple called vals.

def printf( format, *vals ):
    print format % vals

This should look familiar to C programmers. Now we can write the following, which may help ease the transition from C to Python.

printf( "%s = %d", "some string", 2 )
printf( "%s, %s, %d %d", "thing1", "thing2", 3, 22 )

Unlimited Number of Keyword Argument Values

In addition to collecting extra positional argument values into a single parameter, Python can also collect extra keyword argument values into a dict.

If you want a container of keyword arguments, you provide a parameter of the form ** extras. Your variable, here called extras, will receive a dict with all of the keyword parameters.

The following function accepts any number of keyword arguments; they are collected into a single parameter.

def rtd( **args ):
    if "rate" in args and "time" in args:
        args['distance'] = args['rate']*args['time']
    elif "rate" in args and "distance" in args:
        args['time']= args['distance']/args['rate']
    elif "time" in args and "distance" in args:
        args['rate']= args['distance']/args['time']
    else:
        raise Exception( "%r does not compute" % ( args, ) )
    return args

Here’s two examples of using this rtd() function.

>>> rtd( rate=60.0, time= .75 )
{'distance': 45.0, 'rate': 60.0, 'time': 0.75}
>>> rtd( distance=173, time=2+50/60.0 )
{'distance': 173, 'rate': 61.058823529411761, 'time': 2.8333333333333335}

The keyword arguments are collected into a dict, named args. We check for combinations of “rate”, “time” and “distance” in the args dictionary. For each combination, we can solve for the remaining value and update the dict by insert the additional key and value into the dict.

Evaluation with a Container Instead of Individual Argument Values

When evaluating a function, we can provide a sequence instead of providing individual positional parameters.

We do this with a special version of the * operator when evaluating a function. Here’s an example of forcing a 3- tuple to be assigned to three positional parameters.

>>> def avg3( a, b, c ):
...     return (a+b+c)/3.0
...
>>> data= ( 4, 3, 2 )
>>> avg3( *data )
3.0

In this example, we told Python to assign each element of our 3-tuple named data, to a separate parameter variables of the function avg3().

As with the * operator, we can use ** to make a dict become a series of keyword parameters to a function.

>>> d={ 'a':5, 'b':6, 'c':9 }
>>> avg3( **d )
6.666666666666667

In this example, we told Python to assign each element of the dict, d , to specific keyword parameters of our function.

We can mix and match this with ordinary parameter assignment, also. Here’s an example.

>>> avg3( 2, b=3, **{'c':4} )
3.0

Here we’ve called our function with three argument values. The parameter a will get its value from a simple positional parameter. The parameter b will get its value from a keyword argument. The parameter c will get its value from having the dict {'c':4} turned into keyword parameter assignment.

We’ll make more use of this in Inheritance .