Creating or Extending Data Types

When we use an operator, like + or *, what happens depends on the types of the objects involved. When we say c*2, the value depends on the type of c . If c is numeric, then 2 may have to be converted to the same type of number as c, and the answer will be a number. If, however, c is a sequence, the result is a new sequence.

>>> c=8.0
>>> c*2
16.0
>>> c="8.0"
>>> c*2
'8.08.0'
>>> c=(8,0)
>>> c*2
(8, 0, 8, 0)

The selection of appropriate behavior is accomplished by the relatively simple mechanism of special method names within Python. Each class of objects, either built-in or created by a programmer, can provide the required special method names to create the intimate relationship between the class, the built-in functions and the mathematical operators.

We’ll look at the general principles in Semantics of Special Methods.

If you provide special methods, you can make your class behave like a built-in class. Your class can participate seamlessly with built-in Python functions like str(), len(), repr(). These are basic special methods, covered in Basic Special Methods. We’ll look at some special attributes in Special Attribute Names.

Your class can also participate with the usual mathematical operators like + and *. We’ll look at this in Numeric Type Special Methods.

Additionally, your class could also use the collection operators in a manner similar to a dict, set, tuple or list. We’ll look at this in Collection Special Method Names, Collection Special Method Names for Iterators and Iterable, Collection Special Method Names for Sequences, Collection Special Method Names for Sets and Collection Special Method Names for Mappings.

We’ll look at a few examples in Mapping Example and Iterator Examples.

We’ll look at special names for handling attributes are handled in Attributes, Properties and Descriptors.

Additionally, we can extend built-in classes. We do this by extending some of the special methods to do additional or different things. We’ll examine this in Extending Built-In Classes.

Semantics of Special Methods

Python has a number of language features that interact with the built-in data types. For example, objects of all built-in types can be converted to strings. You can use the built-in str() function to perform these conversions. The str() function invokes the __str__() special method of the given object. In effect, str(a) is evaluated as a.__str__().

When you create your own class, you must supply the specially-named method, __str__(), that the built-in str() function can use to successfully convert your classes values to strings. The default implementation of __str__() is pretty lame; you’ll always want to override it.

In Special Method Names we introduced a few special method names. We looked at __init__(), which is evaluated implicitly when an object is created. We looked at __str__(), which is used by the str() function and __repr__() that is used by the repr() function.

A huge number of Python features work through these special method names. When you provide appropriate special methods for your class, it behaves more like a built-in class.

You may be suspicious that the special method name __str__() matches the built-in function str(). There is no simple, obvious rule. Many of the built-in functions invoke specially-named methods of the class that are similar. The operators and other special symbols, however, can’t have a simple rule for pairing operators with special method names. You’ll have to actually read the documentation for built-in functions (Library Reference, section 2.1) and special method names (Language Reference, section 3.3) to understand all of the relationships.

Categories of Special Method Names. The special methods fall into several broad categories. The categories are defined by the kind of behavior your class should exhibit.

Basic Object Behaviors:
 

A number of special method names make your object behave like oher built-in objects. These special methods make your class respond to str(), repr() and various comparison operators. This also includes methods that allow your object to respond to the hash() function, which allows instances of your class to be a key to a mapping.

Numeric Behaviors:
 

These special methods allow your class to respond to the artithmetic operators: +, -, *, /, %, **, <<, >>, and, or and not. When you implement these special methods, your class will behave like the built-in numeric types.

Container Behaviors:
 

If your new class is a container (or a collection), there are a number of methods required so that your class can behave like the built-in collection types (sequence, set, mapping). Section 3.4.5 of the Python Language Reference calls them “containers”. However, we’ll call them collections below.

Iterator Behavior:
 

An iterator has a unique protcol. The for statement requires an __iter__() method to product an iterator for an iterable object. The iterator must provide the next() method to actually do the iteration. While this isn’t tied to a collection, it’s commonly used with collections, so we’ll show this with the collection special names below.

Attribute Handling Behavior:
 

Some special methods customize how your class responds to the . operator for manpulating attributes. For example, when you evaluate object.attr. This is commonly used when attribute manipulation is more complex than simply locating an attribute that was defined by __init__().

Function Behavior:
 

You can make your object behave like a function. When you define the method __call__(), your object is callable , and can be used as if it was a function.

Statement Interaction:
 

There are a few special methods required by statements.

The for statement requires an __iter__() method to product an iterator for an iterable object. The iterator object must implement a next() method.

The with statement requires __enter__() and __exit__() methods.

Basic Special Methods

In addition to __init__() and __str__() there are a number of methods which are appropriate for classes of all kinds.

__init__(self, args...)
Called when a new instance of the class is created. Note that this overrides any superclass __init__() method; to do superclass initialization first, you must evaluate the superclass __init__() like this: super( class, self ).__init__( args... ). The super() function identifies the superclass of your class, class.
__del__(self)
Called when the this object is no longer referenced anywhere in the running program; the object is about to be removed by garbage collection. This is rarely used. Note that this is called as part of Python garbage collection; it is not called by the del statement.
__repr__(self) → string
Called by the repr() built-in function. Typically, the string returned by this will look like a valid Python expression to reconstruct the object.
__str__(self) → string
Called by the str() built-in function. This is called implicitly by the print statement (and print() function) to convert an object to a convenient, “pretty” string representation.
__eq__(self, other) → boolean
Called by ==.
__ne__(self, other) → boolean
Called by !=.
__lt__(self, other) → boolean
Called by <.
__le__(self, other) → boolean
Called by <=.
__gt__(self, other) → boolean
Called by >.
__ge__(self, other) → boolean
Called by >=.
__hash__(self) → int
Called during dictionary operations, and by the built-in function hash() to transform an object to a unique 32-bit integer hash value. Objects which compare equal should also have the same hash value. If a class does not define a __eq__() method it should not define a __hash__() operation either. Classes with mutable objects can define __eq__() but should not define __hash__(), or objects would move around in the dictionary.
__nonzero__(self) → boolean
Called during truth value testing; must return 0 or 1. If this method is not defined, and __len__() is defined, then __len__() is called based on the assumption that this is a collection. If neither function is defined, all values are considered True.

Special Attribute Names

As part of creating a class definition, Python adds a number of special attributes. These are informational in nature, and cannot not be easily be changed except by redefining the class or function, or reimporting the module.

__class__:This object’s class name. The class has a __name__ attribute which is the name of the class.
__module__:The module in which the class was defined.
__dict__:The dictionary which contains the object’s attributes and methods.
__bases__:The base classes for this class. These are also called superclasses.
__doc__:The documentation string. This is part of the response produced by the help() function.

Here’s an example of how the class docstring is used to produce the help() results for a class.

import random
print random.Random.__doc__
help(random.Random)

Numeric Type Special Methods

When creating a new numeric data type, you must provide definitions for the essential mathematical and logical operators. When we write an expression using the usual +, -, *, /, //, %, or **, Python transforms this to method function calls.

Consider the following:

v1= MyClass(10,20)
v2= MyClass(20,40)
x = v1 + v2

In this case, Python will evaluate line 3 as if you had written:

x = v1.__add__( v2 )

Every arithmetic operator is transformed into a method function call. By defining the numeric special methods, your class willwork with the built-in arithmetic operators. There are, however, some subtleties to this.

Forward, Reverse and In-Place Method Functions. There are as many as three variant methods required to implement each operation. For example, * is implemented by __mul__(), __rmul__() and __imul__(). There are forward and reverse special methods so that you can assure that your operator is properly commutative. There is an in-place special method so that you can implement augmented assignment efficiently (see Augmented Assignment).

You don’t need to implement all three versions. If you implement just the forward version, and your program does nothing too odd or unusual, everything will work out well. The reverse name is used for special situations that involve objects of multiple classes.

Python makes two attempts to locate an appropriate method function for an operator. First, it tries a class based on the left-hand operand using the “forward” name. If no suitable special method is found, it tries the the right-hand operand, using the “reverse” name.

Additionally, we can return the special value NotImplemented to indicate that a specific version of a method function is not implemented. This will cause Python to skip this method function and move on to an alternative.

Consider the following:

v1= MyClass(10,20)
x = v1 * 14
y = 28 * v1

Both lines 2 and 3 require conversions between the built-in integer type and MyClass. For line 2, the forward name is used. The expression v1*14 is evaluated as if it was

x = v1.__mul__( 14 )

For line 3, the reverse name is used. The expression 28*v1 is evaluated as if it was

y = v1.__rmul__( 28 )

Note

Python 3 and Coercion

Historically, as Python has evolved, so have the ins and outs of argument coercion from data type to data type. We’ve omitted the real details of the coercion rules.

In Python 3.0, the older notion of type coercion and the coerce() function will be dropped altogether, so we’ll focus on the enduring features that will be preserved.

Section 3.4.8 of the Python Language Reference covers this in more detail; along with the caveat that the Python 2 rules have gotten too complex.

The Operator Algorithm. The algorithm for determing what happens with x op y is approximately as follows.

Note that a special method function can return the value NotImplemented. This indicates that the operation can’t work directly only the values, and another operation should be chosen. The rules provide for a number of alternative operations, this allows a class to be designed in a way that will cooperate successfully with potential future subclasses.

  1. The expression string % anything is a special case and is handled first. This assures us that the value of anything is left untouched by any other rules. Generally, it is a tuple or a dictionary, and should be left as such.
  2. If this is an augmented assignment statement (known as an in-place operator, e.g., variable $= anything) for some operator, $. If the left operand implements __iXopX__(), then that __iXopX__() special method is invoked without any coercion. These in-place operators permit you to do an efficient udpate the left operand object instead of creating a new object.
  3. As a special case, the two operators are superclass XopX subclass, then the right operand (the subclass) __rXopX__() method is tried first. If this is not implemented or returns NotImplemented then the the left operand (the superclass) __XopX__() method is used. This is done so that a subclass can completely override binary operators, even for built-in types.
  4. For x op y, x.__op__(y) is tried first. If this is not implemented or returns NotImplemented , y.__rop__(x) is tried second.

The following functions are the “forward” operations, used to implement the associated expressions.

method function original expression
   
__add__(other)() self + other
__sub__(other)() self - other
__mul__(other)() self * other
__div__(other)() self / other classical Python 2 division
__truediv__(other)() self / other when from __future__ import division or Python 3
__floordiv__(other)() self // other
__mod__(other)() self % other
__divmod__(other)() divmod( self, other )
__pow__(other [, modulo] )() self ** other
__lshift__(other)() self << other
__rshift__(other)() self >> other
__and__(other)() self & other
__or__(other)() self | other
__xor__(other)() self ^ other

The method functions in this group are used to resolve operators by attempting them using a reversed sense.

method function original expression
   
__radd__(other)() other + self
__rsub__(other)() other - self
__rmul__(other)() other * self
__rdiv__(other)() other / self classical Python 2 division
__rtruediv__(other)() other / self when from __future__ import division or Python 3
__rfloordiv__(other)() other // self
__rmod__(other)() other % self
__rdivmod__(other)() divmod( other, self )
__rpow__(other [,modulo])() other ** self
__rlshift__(other)() other << self
__rrshift__(other)() other >> self
__rand__(other)() other & self
__ror__(other)() other | self
__rxor__(other)() other ^ self

The method functions in this group are used to resolve operators by attempting them using an incremental sense.

method function original expression
   
__iadd__(other)() self += other
__isub__(other)() self -= other
__imul__(other)() self *= other
__idiv__(other)() self /= other classical Python 2 division
__itruediv__(other)() self /= other when from __future__ import division or Python 3
__ifloordiv__(other)() self //= other
__imod__(other)() self %= other
__ipow__(other [,modulo])() self **= other
__ilshift__(other)() self <<= other
__irshift__(other)() self >>= other
__iand__(other)() self &= other
__ior__(other)() self |= other
__ixor__(other)() self ^= other

The method functions in the following group implement the basic unary operators.

method function original expression
   
__neg__() - self
__pos__() + self
__abs__() abs( self )
__invert__() ~ self
__complex__() complex( self )
__int__() int( self )
__long__() long( self )
__float__() float( self )
__oct__() oct( self )
__hex__() hex( self )
__index__() sequence[self], usually as part of a slicing operation which required integers

Rational Number Example. Consider a small example of a number-like class. The Rational Numbers exercise in Classes describes the basic structure of a class to handle rational math, where every number is represented as a fraction. We’ll add some of the special methods required to make this a proper numeric type. We’ll finish this in the exercises.

class Rational( object ):
    def __init__( self, num, denom= 1L ):
        self.n= long(num)
        self.d= long(denom)
    def __add__( self, other ):
        return Rational( self.n*other.d + other.n*self.d,
        self.d*other.d )
    def __str__( self ):
        return "%d/%d" % ( self.n, self.d )

This class has enough methods defined to allow us to add fractions as follows:

>>> x = Rational( 3, 4 )
>>> y = Rational( 1, 3 )
>>> x + y
7/12

In order to complete this class, we would need to provide most of the rest of the basic special method names (there is almost never a need to provide a definition for __del__()). We would also complete the numeric special method names.

Additionally, we would have to provide correct algorithms that reduced fractions, plus an additional conversion to respond with a mixed number instead of an improper fraction. We’ll revisit this in the exercises.

Conversions From Other Types. For your class to be used successfully, your new numeric type should work in conjunction with existing Python types. You will need to use the isinstance() function to examine the arguments and make appropriate conversions.

Consider the following expressions:

x = Rational( 22, 7 )
y = x+3
z = x+0.5

Our original __add__() method assumed that the other object is a Rational object. But in this case, we’ve provided int and float values for other. Generally, numeric classes must be implemented with tests for various other data types and appropriate conversions.

We have to use the isinstance() function to perform checks like the following: isinstance( other, int ). This allows us to detect the various Python built-in types.

If the result of isinstance( other, types ) is True in any of the following cases, some type of simple conversion should be done, if possible.

  • complex. isinstance( other, complex ). You may want to raise an exception here, since it’s hard to see how to make rational fractions and complex numbers conformable. If this is a common situation in your application, you might need to write an even more sophisticated class that implements complex numbers as a kind of rational fraction. Another choice is to write a version of the abs() function of the complex number, which creates a proper rational fraction for the complex magnitude of the given value.

  • float. isinstance( other, float ). One choice is to truncate the value of other to long , using the built-in long() function and treat it as a whole number, the other choice is to determine a fraction that approximates the floating point value.

  • int or long. isinstance( other, (int,long) ). Any of these means that the other value is clearly the numerator of a fraction, with a denominator of 1.

  • string. isinstance( other, basestring ). We might try to convert the other value to a long using the built-in long() function. If the conversion fails, we could try a float. The exception that’s thrown from any of the attempted conversions will make the error obvious.

    The basestring type, by the way, is the superclass for ASCII strings ( str ) and Unicode strings ( unicode ).

  • Rational. isinstance( other, Rational ). This indicates that the other value is an instance of our Rational class; we can do the processing as expected, knowing that the object has all the methods and attributes we need.

Here is a version of __sub__() with an example of type checking. If the other argument is an instance of the class Rational, we can perform the subtract operation. Otherwise, we attempt to convert the other argument to an instance of Rational and attempt the subtraction between two Rationals.

def __sub__( self, other ):
    if isinstance(other,Rational):
        return Rational( self.n*other.d - other.n*self.d, self.d*other.d )
    else:
        return self - Rational(long(other))

An alternative to the last line of code is the following.

return Rational( self.n-long(other)*self.d, self.d )

While this second version performs somewhat quicker, it expresses the basic rational addition algorithm twice, once in the if: suite and again in the else: suite. A principle of object oriented programming is to maximize reuse and minimize restating an algorithm. My preference is to state the algorithm exactly once and reuse it as much as possible.

Reverse Operators. In many cases, Python will reverse the two operands, and use a function like __rsub__() or __rdiv__(). For example:

def __rsub__( self, other ):
    if isinstance(other,Rational):
       return Rational( other.n*self.d - self.n*other.d,
       self.d*other.d )
    else:
        return Rational(long(other)) - self

You can explore this behavior with short test programs like the following:

x = Rational( 3,4 )
print x-5
print 5-x

Collection Special Method Names

The various collection special method names can be organized several different ways. Above, in Semantics of Special Methods we claimed that a bunch of special method names were related to “container” and “iterator” behavior. These categories from the language reference don’t tell the whole story.

Python gives us additional tools to create classes that behave like the built-in collection classes. We can use the abstract base classes in the collections module to jump-start our definition of new types of collections.

Each abstract base class (ABC) in the collections module provides a common feature (or set of features) with the method functions that are required to implement that feature. In some cases, the features build on each other, and a number of method functions are required.

Since each of the ABC classes is abstract, they’re missing the implementation of one or more methods. To use these classes, you’ll have to provide the necessary methods.

One very important consequence of using the collections base classes is that it creates standardized names for the various features. This simplifies the assertions that might be required when checking the argument values to a function or method function.

For more information, see section 9.3.1 of the Python Library Reference, as well as PEP 3119.

Some Foundational Definitions. We’ll look at some foundational abstract classes first. Each of these defines a small group of fundamental features. We’ll use this in the next section to build more sophisticated classes.

We’ll look at the following:

  • Container. What makes a container? The in test for membership. Extend this class to make objects of your class a container.
  • Hashable. This makes something usable as a key for mappings or sets. Extend this class to make objects of your class a hashable key.
  • Sized. This makes something report how many elements it has. Extend this class to make objects of your class respond to the len() function with a size.
  • Callable. Extend this class to make objects of your class behave like a function.
class Container

To be a Container object, the class must provide the __contains__() method.

__contains__(self, value) → boolean
Return true if the value is in the container.

This example is a little silly, but it shows a tuple-like container that secretly adds an additional item.

class BonusContainer( collections.Container ):
    def __init__( self, \*members ):
        self.members= members + ( 'SecretKey', )
    def __contains__( self, value ):
        return value in self.members
class Hashable

To be a Hashable object, the class must provide the __hash__() method. This is a requirement for any object we might want to use as a key to a dictionary.

__hash__(self) → int
Return a hash for this Hashable object. The easiest wasy to compute a hash is to sume the hashes of the various elements.

Here’s an example of a class that creates a hash, suitable for use in a dictionary.

class StockBlock( collections.Hashable ):
    def __init__( self, name, price, shares ):
        self.name= name
        self.price= price
        self.shares= shares
        self._hash= hash(self.name)+hash(self.price)+hash(self.shares)
    def __hash__( self ):
        return self._hash
class Sized

To be a Sized object, the class must provide the __len__() method.

__len__(self) → int
Return the size of this collection. This is generally understood to be the nunber of items in the collection.
class Callable

To be a Callable object, the class must provie the __call__() method.

Functions are callable objects, but we can also define a class that creates callable objects, similar to a function definition.

__call__(self, parameters...) → return value

This method is called when the object is used as if it were a function. We might create and use callable object with something like the following.

callable_object = MyCallable()
callable_object( argument, values )

Here’s an example of a callable class definition. We can use this to create callable objects that are – essentially – functions.

class TimesX( collections.Callable ):
    def __init__( self, factor ):
        self.factor= factor
    def __call__( self, argument ):
        return argument * self.factor

We can use this class to create callable objects as follows:

>>> times_2= TimesX(2)
>>> times_2( 5 )
10
>>> import math
>>> times_pi= TimesX(math.pi)
>>> times_pi( 3*3 )
28.274333882308138
  1. We created a callable object, times_2, as an instance of TimesX with a factor of 2.
  2. We applied our times_2 function to a value of 5.
  3. We created a callable object, times_pi, as an instance of TimesX with a factor of math.pi.
  4. We applied our times_i function to a value of 3*3.

Collection Special Method Names for Iterators and Iterable

The following collection class definitions introduce special method names to make your class respond to the iterator protocols used by the for statement.

We’ll look at defining iterable contains and iterators in depth in Collection Special Method Names for Iterators and Iterable.

class Iterable

To be an Iterable object, the class must provide the __iter__() method.

__iter__(self) → Iterator
Returns an Iterator for this Iterable object.

Generally, this will look like the following.

import collections
class  MyIterable( collections.Iterable ):

    # all the other methods of the MyIterable collection

    def __iter__( self ):
        return MyIteratorClass( self )

This means that we have a class that extends collections.Iterator which will control the iteration through the given MyIterable collection.

These two classes are sometimes called “friends”, since the the Iterator often has deeper access to the Iterable.

class Iterator

To be an Iterator object, the class must provide the __next__() method. Additionally, an Iterator is itself Iterable, so it must provide a __iter__() method.

Generally, the __iter__() method just does return self.

__next__(self) → object
Advance the iterator and either return the next object from the iterable container or raise an StopIteration exception.

Important

Python 3

The the Iterator method name of __next__() is focused on Python 3.

For Python 2 compatability, you might want to also defined next().

def next( self ):
    return self.__next__()
Iterator.__iter__(self) → self
Additionally an iterator will also provide a definition of the __iter__() special method name. This will simply return the iterator itself (return self). This prevents small problems with redundant calls to the iter() built-in function.

Note there are still more features of iterators. The Python Enhancement Proposal (PEP 342) describes some considerably more advanced features of iterators and the yield statement.

Collection Special Method Names for Sequences

The following collection class definitions introduce special method names to make your class behave like a Sequence (tuple) or a Mutable Sequence (list).

class Sequence

A Sequence object (essentially a tuple) is based on three more fundamental definitions: Sized, Iterable and Container. As such, the class must define a number of methods including __contains__(), __len__(), __iter__().

A Sequence must define a __getitem__() method.

Additionally, methods like __reversed__(), index() and count() are also sensible for this class. The collections.Sequence provides defaults for these methods.

Since we’re talking about an object that’s like a tuple or frozenset, there aren’t any methods for updating or mutating the contents.

Note also, that Hashable isn’t part of a Sequence. You might want to add Hashable if your Sequence can support a fixed hash value, suitable for use as dictionary key.

This class provides a default implementation of __iter__() that is based on using a range of values and calling __getitem__().

Sequence.__getitem__(self, index) → value

Return the value at the given index in the Sequence.

The __getitem__() method function should be prepared for the key to be either a simple integer or a slice object. When called with an integer, it returns an element of the sequence or raises IndexError.

A slice is a simple object with three attributes: start, stop and step. When called with a slice object, it returns another Sequence.

The following examples show common slice situations.

  • The expression someSequence[1:5] is transformed to someSequence.__getitem__( slice(1,5) ). The slice object has the following attribute values: key.start = 1, key.stop = 5, key.step = None.
  • The expression someSequence[2:8:2] is transformed to someSequence.__getitem__( slice(2,8,2) ). The slice object has the following attribute values: key.start = 2, key.stop = 8, key.step = 2.
  • The expression someSequence[1:3,5:8] is transformed into someSequence.__getitem__( ( slice(1,3), slice(5,8) ) ). The key argument will be a tuple of slice objects.

Here’s an example of a simple class that behaves like a tuple, with some restrictions.

import collections
class Point( collections.Sequence ):
    def __init__( self, x, y ):
        self.x= x
        self.y= y
    def __len__( self ):
        return 2
    def __contains__( self, key ):
        return self.x==key or self.y==key
    def __getitem__( self, position ):
        if position in ( 0, -2 ):
            return self.x
        elif position in ( 1, -1 ):
            return self.y
        else:
            raise IndexError
class MutableSequence

A MutableSequence exteds a Sequence. It requires all of the Sequence methods to be defined. Plus it has some additional methods. To be mutable, it must have a way to update and remove items. The methods __setitem__(), __delitem__(), and insert() must be defined to update a MutableSequence.

There are of course, numerous additional methods that are provided by default. Any of the optional methods from Sequence, plus append(), reverse(), extend(), pop(), remove(), and __iadd__(). You can override these definitions if you want to improve their performance.

MutableSequence.__getitem__(self, index) → value
Return the value at the given index in the Sequence.
MutableSequence.__setitem__(self, index, value)
Replace the value at the given index in the Sequence
MutableSequence.__delitem__(self, index, value)
Remove the value at the given index in the Sequence.

The __getitem__(), __setitem__() and __delitem__() method function should be prepared for the index to be either a simple integer, a slice object, or a tuple of slice objects.

MutableSequence.insert(self, index, value)
Insert a new value before the given index in the Sequence.

Collection Special Method Names for Sets

The following collection class definitions introduce special method names to make your class behave like a Set (frozenset) or a MutableSet (set).

class Set

A Set, like a basic Sequence, is based on three more fundamental definitions: Sized, Iterable, Container. The basic collections.Set is an immutable set; it’s the basis for the built-in frozenset. A mutable set will build on this definition.

As such, the class must define a number of methods including __contains__(), __len__(), __iter__().

Since, generally, a set simply checks for membership, we don’t need too much more.

The comparison operations (__le__(), __lt__(), __eq__(), __ne__(), __gt__(), __ge__(), __and__(), __or__(), __sub__(), __xor__(), and isdisjoint()) have default definitions. You can override these for performance reasons.

class MutableSet

A MutableSet extends a Set. It requires all of the Set methods to be defined. Plus it has some additional methods. To be mutable, it must have the methods add() and discard() to update the MutableSet. The collections.MutableSet is is the basis for the built-in set.

The collections.MutableSet provides the following method functions clear(), pop(), remove(). These are based on our supplied add() and discard()

Also, the following operators are provided so that a MutableSet can be updated with another set: __ior__(), __iand__(), __ixor__(), and __isub__().

MutableSet.add(self, item)
Updates the set to add the item, if it was not already a member.
MutableSet.discard(self, item)
Updates the set to remove the item, if it was a member. Does nothing if the member was not already in the set.

Collection Special Method Names for Mappings

The following collection class definitions introduce special method names to make your class behave like a Mapping or a MutableMapping (dict).

class Mapping

A Mapping, like a basic Sequence, is based on three more fundamental definitions: Sized, Iterable, Container. The basic collections.Mapping is the definition of an immutable mapping. A mutable mapping (like a dict) will build on this definition.

As such, the class must define a number of methods including __contains__(), __len__(), __iter__().

A Mapping must define a __getitem__() method.

Additionally, default methods will be provided for __contains__(), keys(), items(), values(), get(), __eq__(), and __ne__(). The equality test compares the list created by items() to assure that each item tuple has the same key and value.

__getitem__(self, key) → object
Returns the value corresponding to key, or raises KeyError.
class MutableMapping

A MutableMapping exteds a Mapping. It requires all of the Set methods to be defined. Plus it has some additional methods. To be mutable, the methods __setitem__() and __setitem__() must be defined.

Also, methods are provided for pop(), popitem(), clear(), update(), and setdefault()

__getitem__(self, key) → object

Returns the value corresponding to key, or raises KeyError.

A collections.defaultdict does not raise an exception. For keys that don’t exist, this version of a MutableMapping creates a default value and then returns that.

__setitem__(self, key, value)
Puts an item into the mapping; the item has the given key and value. If the key did not exist it is added. If the key did exist, the previous value is replaced.
__delitem__(self, key)
Removes an item from the mapping, or raises a KeyError if the item did not exist.

Beyond these two base classes, there are some additional classes that help you to define a “view” that’s based on a mapping.

class KeysView

An object of this class is built from an existing Mapping, and behaves like a Set that contains the keys from the exsting Mapping.

The methods from Sized (__len__()) and Set (__contains__() and __iter__()) are defined.

class ValuesView

An object of this class is built from an existing Mapping, and behaves like a Set that contains the values from the exsting Mapping.

Unlike a proper Set, however, there may appear to be multiple copies of a given value.

The methods from Sized (__len__()) and Set (__contains__() and __iter__()) are defined.

class ItemsView

An object of this class is built from an existing Mapping, and behaves like a Set that contains a sequence of ( key, value ) tuples from the exsting Mapping.

The methods from Sized (__len__()) and Set (__contains__() and __iter__()) are defined.

Note that __contains__() checks for the presence of a ( key, value ) 2-tuple.

Mapping Example

An immutable mapping is a kind of translation from key to value. The mapping is fixed when the object is created and cannot be updated.

Here’s an example of a small class to define an immutable mapping that we’re calling a Translation.

Note that our immutable mapping happens to have a plain old dict under the hood.

class Translation( collections.Mapping ):
    def __init__( self, ** kw ):
        self._map= kw
    def __len__( self ):
        return len(self._map)
    def __contains__( self, key ):
        return key in self._map
    def __iter__( self ):
        return iter(self._map)
    def __getitem__( self, key ):
        return self._map[key]

Here’s a transcript of using our Translation class to create an object that translates some names to numeric values.

>>> c = Translation( red=0, green=1, blue=2 )
>>> c['red']
0
>>> c['white']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 11, in __getitem__
KeyError: 'white'
>>> c['black']= 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'Translation' object does not support item assignment
>>> for nm in c:
...     print nm
...
blue
green
red

Iterator Examples

The built-in sequence types (list, tuple, string) all produce iterator objects for use by the for statement. The set, frozenset, dict and file objects also produce an iterator.

In addition to defining ordinary generator methods by using the yield statement, your classes can also produce iterator objects. This can make a program slightly simpler to read by assuring that loops are simple, obvious for statements.

Easy Iterators. When writing a collection-like class, you can simply write method functions that include the yield statement.

class Deck( object ):
    def __init__( self ):
        # Create the self.cards container.
    def deal( self ):
        self.shuffle()
        for c in self.cards:
            yield c

The deal() method is an iterator. We can use this iterator as follows.

d= Deck()
for card in d.deal():
    print c

This is the same technique covered in Iterators and Generators, except used with method functions instead of stand-alone functions.

Unique Iterator Classes. Generally, an iterator is an object we’ve designed to help us use a more complex container. Consequently, the container will usually contain a factory method which creates iterators associated with the container. A container will implement the special method __iter__() to emit an iterator properly configured to handle the container.

When we evaluate iter( someList ), the object, someList, must return an iterator object ready to be used with a for statement. The way this works is the iter() function evaluates the __iter__() method function of the object, someList. The objects __iter__() method function creates the object to be used as an iterator. We’ll do something similar in our own classes.

In the following example classes, we’ll create a class which wraps a list and provides and a specialized iterator that yields only non-zero values of the collection.

import collections
class DataSamples( collections.Iterable, collections.Sized ):
    def __init__( self, aList=None ):
        self.values= [] if aList is None or aList
    def __iter__( self ):
        return NonZeroIter( self )
    def __len__( self ):
        return len( self.values )
    def __getitem__( self, index ):
        return self.values[index]
  1. When we initialize a DataSamples instance, we save any provided sequence of values. This class behaves like a collection. We haven’t provided all of the methods, however, in order to keep the example short. Clearly, to be list-like, we’ll need to provide an append() method.
  2. When we evaluate the iter() function for a DataSamples object, the DataSamples object will create a new, initialized NonZeroIter. Note that we provide the DataSamples object to the new NonZeroIter, this allows the iterator to process the collection properly.
class NonZeroIter( collections.Iterator ):
    def __init__( self, aDataSamples ):
        self.ds= aDataSamples
        self.pos= -1
    def __next__( self ):
        while self.pos+1 != len(self.ds) and self.ds[self.pos+1] == 0:
            self.pos += 1
        if self.pos+1 == len( self.ds ):
            raise StopIteration
        self.pos += 1
        return self.ds[self.pos]
    def next( self ):
        return self.__next__()
    def __iter__( self ):
        return self
  1. When initialized, the NonZeroIter saves the collection that it works with. It also sets it’s current state; in this instance, we have pos set to -1, just prior to the element we’ll return.
  2. The next() function of the iterator locates the next non-zero value. If there is no next value or no next non-zero value, it raises StopIteration to notify the for statement. Otherwise, it returns the next non-zero value. It updates its state to reflect the value just returned.
  3. The __iter__() function of the iterator typically returns self.

We can make use of this iterator as follows.

ds = DataSamples( [0,1,2,0,3,0] )
for value in ds:
    print value

The for statement calls iter(ds) implicitly, which calls ds.__iter__(), which creates the NonZeroIter instance. The for statement then calls the next() method of this iterator object to get the non-zero values from the DataSamples object. When the iterator finally raises the StopIteration exception, the for statement finishes normally.

Extending Built-In Classes

We can extend all of Python’s built-in classes. This allows us to add or modify features of the data types that come with Python. This may save us from having to build a program from scratch.

Here’s a quick example of a class which extends the built-in list class by adding a new method.

>>> class SumList( list ):
...     def sum( self ):
...             return sum( self )
...
>>> x= SumList( [1, 3, 4] )
>>> x
[1, 3, 4]
>>> x.sum()
8
>>> x.append( 5 )
>>> x.sum()
13

Clearly, we can extend a class like list with additiona; methods for various statistical calculations. Each function can be added as easily as the SumList.sum() function in the above example.

Here’s an exmaple of extending a dictionary with simple statistical functions. We based on dictionary on collections.defaultdict because it makes it very simple to create a frequency table. See Default Dictionaries for more information on default dictionaries.

>>> from collections import defaultdict
>>> class StatDict( defaultdict ):
...     def __init__( self, valueList=None ):
...             super( StatDict, self ).__init__( int, valueList )
...     def sum( self ):
...             return sum(  k*self[k] for k in self  )
...
>>> x = StatDict( [ (2,1), (3,1), (4,2) ] )
>>> x.sum()
13
>>> x[5] += 1
>>> x.sum()
18
>>> x[5] += 1
>>> x.sum()
23

Each time we increment the frequency of a numeric value, the defaultdict will add the integer count automatically.

Special Method Name Exercises

Geometric Points

A 2-dimensional point is a coordinate pair, an x and y value. If we limit the range to the range 0 to 2**16, we can do a few extra operations quickly.

Develop the basic routines for __init__(), __repr__(), __str__(). The __hash__() function can simply combine x and y via x<<16+y.

Develop a test routine that creates a sequence of points.

Also, be sure to develop a test that uses points as keys in a dictionary.

Rational Numbers

Finish the Rational number class by adding all of the required special methods. The Rational Numbers exercise in Classes describes the basic structure of a class to handle rational math, where every number is represented as a fraction.

Currency and the Cash Drawer

Currency comes in denominations. For instance, US currency comes in $100, $50, $20, $10, $5, $1, $.50, $.25, $.10, $.05, and $.01 denominations. Parker Brothers Monopoly™ game has currency in 500, 100, 50, 20, 10, 5 and 1. Prior to 1971, English currency had £ 50, £ 20, £ 10, £ 5, £ 1, shillings (1/12 of a pound) and pence (1/20 of a shilling). An amount of money can be represented as an appropriate tuple of integers, each of which represents the specific numbers of each denomination. For instance, one representation for $12.98 in US currency is (0, 0, 0, 1, 0, 2, 0, 3, 2, 0, 3).

Each subclass of Currency has a specific mix of denominations. We might define subclasses for US currency, Monopoly currency or old English currency. These classes would differ in the list of currencies.

An object of class currency would be created with a specific mix of denominatioons. The superclass should include operations to add and subtract Currency objects. An __iadd__( currency )() method, for example would add the denominations in currency to this object’s various denominations. An __isub__( currency )() method, for example would subtract the denominations in currency to this object’s various denominations; in the event of attempting to subtract more than is available, the object would raise an exception.

Be sure to define the various conversions to float, int and long so that the total value of the collection of bills and coins can be reported easily.

An interesting problem is to translate a decimal amount into appropriate currency. Note that numbers like 0.10 don’t have a precise floating-point representation; floating point numbers are based on powers of 2, and 0.10 can only be approximated by a finite-precision binary fraction. For US currency, it’s best to work in pennies, representing $1.00 as 100.

Develop a method which will translate a given target amount, t, into an appropriate mixture of currency denominations. In this case, we can iterate through the denominations from largest to smallest, determining the largest quantity, q of a denomination, d, such that q \times d \leq t. This version doesn’t depend on the current value of the Currency object.

More Adanced Solution. A more advanced version is to create a Currency object with a given value; this would represent the money in a cash drawer, for example. A method of this object would make an amount of money from only the available currency in the cash drawer, or raise an exception if it could not be done.

In this case, we iterate through the denominations, d, from largest to smallest, determining the largest quantity, q, such that q \times d \leq t, consistent with available money in the cash drawer. If we don’t have enough of a given denomination, it means that we will be using more of the smaller denominations.

One basic test case is to create a currency object with a large amount of money available for making change.

In the following example, we create a cash drawer with $804.55. We accept a payment of $10 as 1 $5, 4 $1, 3 $.25, 2 $.10 and 1 $.05. Then we accept a payment of $20, for a bill of $15.24, meaning we need to pay out $4.76 in change.

drawer = USCurrency( (5,2,6,5,5,5,5,5,5,5,5) )
drawer += USCurrency((0,0,0,0,1,4,0,3,2,1,0))
drawer += USCurrency((0,0,1,0,0,0,0,0,0,0,0))
drawer.payMoney( 4.76 )

Interestingly, if you have $186.91 (one of each bill and coin) you can find it almost impossible to make change. Confronted with impossible situations, this class should raise an UnableToMakeChange exception.

Sequences with Statistical Methods

Create a sequence class, StatSeq that can hold a sequence of data values. This class must be a subclass of collections.MutableSequence and define all of the usual sequence operators, including __add__(), __radd__(), __iadd__(), but not __mul__().

The __init__() function should accept a sequence to initialize the collection. The various __add__() functions should append the values from a StatSeq instance as well as from any subclass of collections.Sequence (which includes list and tuple.)

Most importantly, this class shold define the usual statistical functions like mean and standard deviation, described in the exercises after Tuples, Sequence Processing Functions: map(), filter() and reduce() and Sample Class with Statistical Methods in Classes.

Since this class can be used everywhere a sequence is used, interface should match that of built-in sequences, but extra features are now readily available. For a test, something like the following should be used:

import random
samples = StatSeq( [ random.randrange(6) for i in range(100) ] )
print samples.mean()
s2= StatSeq()
for i in range(100):
    ss.append( random.randrange(6) )
    # Also allow s2 += [ random.randrange(6) ]
print s2.mean()

There are two approaches to this, both of which have pros and cons.

  • Define a subclass of list with a few additional methods. This will be defined as class StatSeq( list ):.
  • Define a new class (a subclass of object) that contains an internal list, and provides all of the sequence special methods. Some (like append) will be delegated to the internal list object. Others (like mean) will be performed by the StatSeq class itself. This will be defined as class StatSeq( object ):.

Note that the value of mean() does not have to be computed when it is requested. It is possible to simply track the changing sum of the sequence and length of the sequence during changes to the values of the sequence.

The sum and length are both set to zero by __init__(). The sum and length are incremented by every __add__(), append(), insert(). They are decremented by pop(), remove(), and and __delitem__(). Finally, there’s a two-part change for __setitem__(): the old value is deducted and the new value is added.

This way the calculation of mean is simply a division operation.

Keeping track of sums and counts can also optimize mode and standard deviation. A similar optimization for median is particularly interesting, as it requires that the sample data is retained by this class in sorted order. This means that each insert must preserve the sorted data set so that the median value can be retrieved without first sorting the entire sequence of data. You can use the bisect module to do this.

There are a number of algorithms for maintaining the data set in sorted order. You can refer to Knuth’s The Art of Computer Programming [Knuth73], and Cormen, Leiserson, Rivest Introduction to Algorithms [Cormen90] which cover this topic completely.

Chessboard Locations

A chessboard can be thought of as a mapping from location names to pieces. There are two common indexing schemes from chessboards: algebraic and descriptive. In algebraic notation the locations have a rank-file address of a number and a letter. In descriptive notation the file is given by the starting piece’s file, rank and player’s color.

See Chess Game Notation for an extension to this exercise.

The algebraic description of the chess board has files from a-h going from white’s left to right. It has ranks from 1-8 going from white’s side (1) to black’s side (8). Board’s are almost always shown with position a1 in the lower left-hand corner and h8 in the upper right, white starts at the bottom of the picture and black starts at the top.

In addition to the simplified algebraic notation, there is also a descriptive notation, which reflects each player’s unique point of view. The descriptive board has a queen’s side (white’s left, files a-d) and a king’s side (white’s right, files e-h). Each rank is numbered starting from the player. White has ranks 1-8 going from white to black. Black, at the same time as ranks 1-8 going back toward white. Each of the 64 spaces on the board has two names, one from white’s point of view and one from black’s.

Translation from descriptive to algebraic is straight-forward. Given the player’s color and a descriptive location, it can be translated to an algebraic location. The files translate through a relatively simple lookup to transform QR to a, QKt to b, QB to c, Q to d, K to e, KB to f, KKt to g, KR to h. The ranks translate through a simple calculation: white’s ranks are already in algebraic notation; for black’s rank of r, 9-r is the algebraic location.

Create a class to represent a chess board. You’ll need to support the special function names to make this a kind of mapping. The __getitem__() function will locate the contents of a space on the board. The __setitem__() function will place a piece at a space on the board. If the key to either function is algebraic (2 characters, lower case file from a-h and digit rank from 1-8), locate the position on the board. If the key is not algebraic, it should be translated to algebraic.

The codes for pieces include the piece name and color. Piece names are traditionally “p” or nothing for pawns, “R” for rook, “N” for knight, “B” for bishop, “Q” for queen and “K” for king. Pawns would be simply the color code “w” or “b”. Other pieces would have two-character names: “Rb” for a black rook, “Qw” for the white queen.

The __init__() method should set the board in the standard starting position:

Chess Starting Position
piece algebraic descriptive piece code
white rooks a1, h1 wQR1, wKR1 Rw
white knights b1, g1 wQKt1, wKKt1 Nw
white bishop c1, f1 wQB1, wKB Bw
white queen d1 wQ1 Qw
white king e1 wK1 Kw
white pawns a2-h2 wQR2-wKR2 w
black rooks a8, h8 bQR1, bKR1 Rb
black knights b8, g8 bQKt1, bKKt1 Nb
black bishops c8, f8 bQB1, bKB1 Bb
black queen d8 bQ1 Qb
black king e8 bK1 Kb
black pawns a7-a7 bQR2-bKR2 b

Here’s a sample five-turn game. It includes a full description of each move, and includes the abbreviated chess game notation.

  1. white pawn from e2 to e4; K2 to K5

    black pawn from e7 to e5; K2 to K5

  2. white knight from g1 to f3; KKt1 to KB3

    black pawn from d7 to d6; Q2 to Q3

  3. white pawn from d2 to d4; Q2 to Q4

    black bishop from c8 to g4; QB1 to KKt5

  4. white pawn at d4 takes pawn at e5; Q4 to K5

    black bishop at g4 takes knight at f3; KKt5 to KB6

  5. white Q at d1 takes bishop at f3; Q1 to KB3

    black pawn at d6 takes e5; Q3 to K4

The main program should be able to place and remove pieces with something like the following:

chess= Board()
# move pawn from white King 2 to King 5
chess['wK5']= chess['wK2']; chess['wK2']= ''
# move pawn from black King 2 to King 5
chess['bK5']= chess['bK2']; chess['bK2']= ''
# algebraic notation to print the board
for rank in [ '8', '7', '6', '5', '4', '3', '2', '1']:
    for file in [ 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']:
        print "%5s" % board[file+rank],
    print

The algebraic output can be changed to the following, which some people find simpler.

for rank in ('8','7','6','5','4','3','2','1'):
    print "".join(
    [ "%5s" % board[file+rank]
        for file in ('a','b','c','d','e','f','g','h') ] )

You should also write a move() function to simplify creating the test game. A move typically consists of the piece name, the from position, the to position, plus optional notes regarding check and pawn promotions.

Relative Positions on a Chess Board

When decoding a log of a chess game in Short Algebraic Notation (SAN), it is often necessary to search for a piece that made a given move. We’ll look at this problem in detail in Chess Game Notation.

There are actually a number of search algorithms, each constrained by the rules for moving a particular piece. For example, the knight makes a short “L”-shaped move and there are only 8 positions on the board from which a knight can start to end up at a given spot. The queen, on the other hand, moves horizontally, vertically or diagonally any distance, and there are as many as 24 starting positions for the queen to end up at a given spot.

This search is simplified by having iterators that know a few rules of chess and an give us a sequence of appropriate rank and file values. We’d like to be able to say something like the following.

piece, move, toPos = ( "Q", "x", "f3" )
for fromPos in aBoard.queenIter( toPos ):
    if aBoard[fromPos] == 'Q':
        print "Queen from", fromPos, "takes", aBoard[toPos], "at", toPos

We’ll review a few chess definitions for this problem. You can also see Chessboard Locations in Collection Special Method Names for some additional background.

The algebraic description of the chess board has files from a-h going from white’s left to right. It has ranks from 1-8 going from white’s side (1) to black’s side (8). Board’s are almost always shown with position a1 in the lower left-hand corner and h8 in the upper right, white starts at the bottom of the picture and black starts at the top.

We need the following collection of special-purpose iterators.

  • The kingIter() method has to enumerate the eight positions that surround the king.

  • The queenIter() method has to enumerate all the positions in the same rank, the same file, and on the diagonals. Each of these must be examined from the queen’s position moving toward the edge of the board. This search from the queen outward allows us to locate blocking pieces that would prevent the queen from making a particular move.

  • The bishopIter() method has to enumerate all the positions on the diagonals. Each of these must be examined from the bishop’s position moving toward the edge of the board.

  • The knightIter() method has to enumerate the eight positions that surround the knight, reflecting the knight’s peculiar move of 2 spaces on one axis and 1 space on the other axis. There are four combinations of two ranks and one file and four more combinations of two files and one rank from the ending position. As with the king, no piece can block a knight’s move, so order doesn’t matter.

  • The rookIter() method has to enumerate all the positions in the same rank and the same file. Each of these must be examined from the rook’s position moving toward the edge of the board.

  • The pawnIter() method has to enumerate a fairly complex set of positions. Most pawn moves are limited to going forward one rank in the same file.

    Since we need to know which direction is forward, we need to know the color of the pawn. For white pawns, forward means the ranks increase from 2 to 8. For black pawns, then, forward means the ranks decrease from 7 down to 1.

    Pawn captures involve going forward one rank in an adjacent file. Further complicating the analysis is the ability for a pawn’s first move to be two ranks instead of one.

We note that the queen’s iterator is really a combination of the bishop and the rook. We’ll look at the rook’s iterator, because it is can be adapted to be a bishop iterator, and then those two combined to create the queen iterator.

Given a starting position with a rank of r and a file of f, we’ll need to examine all ranks starting from r and moving toward the edge of the board. These are r-1, r+1, r-2, r+2, r-3, r+3, .... Similarly, we need to examine all of the files starting from f and moving toward the edge of the board. These are f-1, f+1, f-2, f+2, f-3, f+3, ....

Before doing an comparison, we need to filter the file and rank combinations to assure that they are legal positions. Additionally, we need to stop looking when we’ve encountered a piece of our own color or an opposing piece that isn’t the one we’re searching for. These intervnening pieces “block” the intentended move.