Empirical Tests of Random Number Generators

Steven F. Lott

© 2001, Steven F. Lott

Revised 4/26/2005

Overview

Purpose

There are two ways to establish that a sequence of numbers is random: empirical testing and theoretical testing. Empirical testing creates sequences of random numbers and subjects those numbers to a battery of tests. Theoretical testing examines the random number generator itself.

This document describes a number of basic empirical tests for random number generators as described by Knuth.

The following tests are implemented:

  1. Frequency Test - develops frequency distribution of individual samples.
  2. Serial Test - develops frequency distribution of pairs of samples.
  3. Gap Test - develops frequency distribution of the length of gaps between groups samples in a given range.
  4. Poker Test - develops frequency distribution for 5-card "hands" of samples over a small (16-value) domain.
  5. Coupon Collector's Test - develops frequency distribution for lengths of subsets that contain a complete set of values from a small (8-value) domain.
  6. Permutation Test - develops frequency distribution for the permutations of ordering of 4-sample selections.
  7. Runs Up Test - develops frequency distribution for lengths of "runs up" where each value is larger than the previous value; one variation covers the case where runs are statistically dependent.
  8. Runs Up Test with independent runs and a relatively large domain.
  9. Runs Up Test with a "small domain", that has a slightly different expected distribution.
  10. Maximum of T - develops frequency distribution for the largest value in a group of T values.
  11. Serial Correlation - computes the correlation coefficient between adjacent pairs of values.

Introduction

The Design Considerations section presents an overview of the various design considerations. These range from an overview of the design through some data representation and algorithm considerations. The rngtest Module section describes the implementation of the various tests, embodied in the rngtest.py module. This covers the overall design and implementation for the module and the details of the design and implementation of each test. The chi2stats Module section describes the additional functions that support the statistical processing in the tests, this is embodied in the chi2stats.py module. This includes the higher-level χ² functions, correlation coefficient, and the "special" functions required to support these statistical functions. The Application Script section presents an application main program that evaluates a variety of random number generators.

Background

A simulation's utility is limited by the quality of the random number generator that is used to randomize the simulated events. A necessary step in any simulation is a demonstration that the random numbers are truly random and have not introduced a bias into the results.

The author started out developing a simple casino game simulator (designed to allow exploration of betting strategies). The associated question of random number quality arose, and lead to this random number testing module, rngtest. The statistical measures used to evaluate generators lead, in turn, to development of the chi2stats module.

While these started as digressions, they provide numerous interesting algorithms and examples of relatively simple Python programming. The resulting modules are also good candidates for a literate presentation via pyWeb.

References

Fields, Joe. Online dictionary of Combinatorics, for basic definitions.

Gourdon, Xavier and Pascal Sebah. Numbers, Constants and Computation. This provides the recursive definition of Bernoulli numbers.

Knuth, Donald E. The Art of Computer Programming, Volume 2, Seminumerical Algorithms, Addison-Wesley, 1969. Section 3.3.2. covers empirical testing. Section 3.3.1. covers the Chi-squared test for fit between actual and expected frequency distributions.

The netlib.org web site for the following algorithms: 291 for the ln(Γ(x)) function, 309 for the ln(Γ(x)) function, 435 for the incomplete γ(a,x) function, 654 for the incomplete γ(a,x) function, 294 for a random number generator, 266 for a random number generator, Special Functions, Cheney-Kincaid random number generator.

Oliphant, Travis. cephesmodule.

Press, William H., et. al., Numerical Recipes in C Cambridge University Press, 1993. The C-language edition is closest to Python, since Python uses the C-language libraries. More information at Numerical Recipes. Note that the Cambridge University Press copyright restrictions allow individual use of the C-language source, but preclude posting a "derivative work" based on this source.

Ritter, Terry. Chi-Square Bias in Runs-Up/Down RNG Tests. for additional analysis of the expected distribution that applies to the runs up test.

Strangman, Gary. stats.py.

Design Considerations

Requirements Review

In this module, we are using statistical tests to evaluate the quality of random number generators. The usual sense of "testing" in a software context is to evaluate some software for correct behavior. When we talk about the behavior of a random number generator, we will call this testing. When we talk about behavior of this software, used for testing random number generators, we will call this validation. The intent is to separate the software under test from validation of the test procedure itself.

There a number of simple use cases for this module.

  1. Execute the 11 tests on a random number generator, showing the actual metric and the allowed range of results. A generator may not pass all 11 tests for a single sequence of numbers; and this is consistent with the basic statistical theory that we want our results to establish that the probability of a generator producing random results is between 0.10 and 0.90. A simple extension to this use case is to support testing a number of generators in a batch.
  2. Execute a validation method that demonstrates that the evaluation algorithms are implemented correctly.

Additionally, the following non-functional requirements strongly influence the design.

  1. Optimize adaptability by treating each test as a kind of Command class hierarchy. Each subclass has an identical interface allowing for executing the test, reporting results and performing the software validation.
  2. Package this test module so that any number or combination of generators can be tested by a separate main program that simply imports this module.
  3. Package the test module so that it also stands alone as a test of the built-in random number generator.
  4. Package the statistical functions separately so that a different statistical library could be used for more accuracy or higher performance.
  5. Write the entire suite in pure Python with as few potential copyright infrigements as possible.

The balance of this section reviews the initial design decisions that lead to the basic structure of the application. This is followed by details of data representation and generating the set of samples required for a test.

The following section provides the implementation of the rngtest module. The statistical functions that support the rngrest module are provided in the associated chi2stats module.

Both the test and statstics modules depend on some functions that are described in various sources as "special functions". These are functions beyond the simple logarithmic and trigonometric functions that are provided in the portable C math module supplied with Python. These are also packaged in the chi2stats module. These versions of the special functions are based on sources thought to be free of copyright restrictions that might prevent a derivative work.

Notation

We present the design decisions using the pyWeb source notation. This is a very simple language for preparing a source file. pyWeb then tangles the output program files, plus weaves a document file from the source.

The following pyWeb commands are used below.

@o file @{...@}
This names an output program file and provides a chunk of code that is sent to the file.
@d name @{...@}
This provides a named chunk of program source. The name can be elided with "...". A @d name that matches up to the "..." will reference this chunk.
@<name@>
This references a named chunk of program source, saying "put the chunk here" when tangling a program file.

Initial Design Decisions

There are a number of ways we can structure this collection of eleven nearly identical tests.

The Monolith

One possible design for this program is to write out all eleven tests as a single, long, Python main program. The resulting program would have this kind of outline.

@o monolith.py @{
@<code for test1 actual, expected and χ²@>
@<code for test2 actual, expected and χ²@>
etc. for 9 more tests
@}

This has the advantage of minimal language overhead: it has neither classes nor functions. It has the disadvantge of nine duplicate copies of the χ² function, plus numerous copies of factorial, the binomial coefficient and other special functions.

This "monolithic" approach is now widely deprecated in favor of an approach that reuses elements wherever possible. The most obvious candidates for reuse are those elements that are clearly mathematical functions: χ², factorial, the binomial coefficient, and others.

Many Functions

Using Python function declarations for mathematical functions can easily be extended to a design where each complete test is implemented as a function that creates the actual distributions and an associated function that creates the expected distributions.

This leads to a design like the following:

@o functions.py @{
def test1Actual( samples ):
    @<body of test1@>
def test1Expected():
    @<test1 expected values@>
def test2Actual( samples ):
    @<body of test2@>
def test2Expected():
    @<test2 expected values@>
@<main test procedure to pull all the functions together@>
@}

@d main test procedure... @{
samples= [ SomeRNGFunction() for i in range(10000) ]
a1= test1Actual(samples)
e1= test1Expected()
print "test1"
print chi2( a1, e1 )
a2= test2Actual(samples)
e2= test2Expected()
print "test2"
print chi2( a2. e2 )
etc. for 9 more tests.
@}

Packaged Functions

Python permits using a function as a first-class object; functions can be collected into sequences or tuples. (Other languages, including Java, would require wrapping a function in a class for packaging.) We can exploit this features to create a list of tuples that associates the actual and expected function. This reduces the main program to a simple iteration through the list of actual and expected value functions.

This alternative might look like the following:

@d main test procedure... @{
suiteList= ( ("test1",test1Actual,test1Expected), 
  ("test2",test2Actual,test2Expected) 
  etc. for 9 more tests
  )
samples= [ SomeRNGFunction() for i in range(10000) ]
for n,t,e in suiteList:
    actual= t( samples )
    expected= e()
    print n
    print chi2(actual,expected)
@}

Using simple functions has the disadvantage of creating a forest of 22 functions (two groups of 11), that are loosely tied by their names, or comments into test suites. Since the actual and expected functions are very tightly coupled, it seems best to bind these two functions into a single class.

Class Definitions

Each test has two functions: one to generate actual results, and one to generate expected results. We can combine these two functions into a single class; doing this means that the overall structure of each test can be factored into a method of a superclass. This overall test control can calls the actual and expected functions, and also calls the χ² function.

When examining the individual tests, we note that almost all tests have the same basic test procedure: generate samples, compute actual, compute expected, produce report and χ² metric. We can factor out this outline into a template method for all tests. The functions for performing the test and computing the expected values will be called by the overall template method.

This pattern of an overall test algorithm with the details supplied by each subclass is an example of the Template design pattern. This design pattern provides some further guidance on how to proceed.

This overall test procedure outline is a good candidate for assignment of class responsibilities. We can define an empirical test class that contain the template for running a test, a test algorithm and an expected values algorithm. Not all of these tests use the chi-squared (χ²) metric to compare actual frequencies with expected frequencies. The Runs Up Test with dependent runs and the Serial Correlation Test do not use χ² measurement. The two tests that differ can be defined as subclasses that a different template algorithm, unique to the two tests.

This gives us a next evolution of the design.

@o classes.py @{
class EmpiricalTest:
    @<constructor@>
    def run( self, someSamples ):
        # The template, which relies on testFreq() and expectedFreq()
        self.actual= self.testFreq( someSamples )
        self.expected= self.expectedFreq( someParameters )
        self.chi2= chi2stats.chi2( actual, expected )
    def testFreq( self, someSamples ): 
    	pass
    def expectedFreq( self, someParameters ):
    	pass
    @<other features@>

@<subclass definitions for each test@>

def main():
    samples= [ SomeRNGFunction() for i in range(10000) ]
    t= Test()
    t.run( samples )
@}

Subclass Design

Given the initial EmpiricalTest class design, the following design could be used for each individual test subclass. This reduces the definition of each individual test to the test algorithm and expected value algorithm. These two details are plugged into the overall template.

@d subclass definitions... @{
class ActualTest( EmpiricalTest ):
    def testFreq( self, someSamples ): 
    	Actual Test Algorithm
    def expectedFreq( self, someParameters ):
    	Actual Expected Value Algorithm
@}

For the Serial Correlation Test and the Runs Up Dependent Test, the overall run() method must be replaced. Since these tests don't use the χ² metric, they would have a subclass that looks like the following:

@d subclass definitions... @{
class SpecialTest( EmpiricalTest ):
    def run( self, someSamples ):
        Test, not based on χ² metric
@}

Another reusable design element is the frequency distribution table used for χ² measurement. This object would store actual frequencies and expected frequencies. It could then produce a columnar report comparing actual with expected and compute the χ² fit between the distributions. While this is not used for all tests, this is a good candidate for assignment of class responsibilities.

This adds to our design, seamlessly.

@d constructor @{
    def __init__( self, someParameters ):
    	self.myName= "test name"
        self.myParams= someParameters
        self.actual = []
        self.expected = []
@}

@d other features @{
    def report( self ):
        print self.myName
        for a,e in zip(self.actual,self.expected):
            print a,e
        print self.chi2
@}

Frequency Table Represenation

Given the rough outline for the EmpiricalTest class, above, we now have a data representation decision: what type should a frequency table be? On the one hand, a sequence of frequencies, indexed by the value being counted is a possibility. We can easily index the frequency sequence by the random sample value. For many of the tests, we will have a derived value like a run length that is an obvious numeric index. For poker hands or permutations, we need to construct an integer index for the sequence.

The following is an example of an index used on the value of the random sample. For example, if we have a sequence random samples, samples over the domain 0 to 16, we can create the following simple frequency table.

fq= 16 * [0]
for u in samples:
    fq[u] = fq[u] + 1

This works becuase our frequency table index matches the domain for our random samples. If our domain is poker hands, however, we may not have such a simple numeric index for each poker hand. A dictionary could be used with an explicit value as the dictionary key. This allows us to index by a poker hand name or by a permutation identifier. We would not have to transform these more abstract objects into an integer.

The following is an example of using a dictionary for a frequency table.

fq= {}
for u in samples:
    fq[u]= fq.setdefault(u,0) + 1

A third choice is to create a unique class to handles the various methods for a frequency distribution. However, almost all of these methods would simply delegate operations to an underlying dictionary or sequence.

Note that some tests require an extra bucket is added to carry counts of "all others". For example, the runs test and gap test use this for runs or gaps that are of statistically insignificant lengths. The extra bucket, however, is often treated identically to the other five, making the sequence a good candidate.

When using a dictionary, we can label the "all others" bucket with a key other than a simple integer. We can also use a dictionary's items() function, which extracts a sequence of (key,value) tuples. For example in the runs or gap test, these would be (run-length,frequency) tuples.

Since the Knuth analysis is centered on simple sequences, we will focus on this representation for frequency tables. This simplifies matching this Python program with Knuth's carefully crafted algorithms and MIX programs.

The basic responsibilities for this object is to collect the frequency information, and participate in χ² analysis. A simple, reusable χ² function would most likely operate on sequences. Either a simple sequence or a dictionary can meet these responsibilities without creating a new class.

Generating Samples

Part of each empirical test is creating a sequence of random numbers that are partitioned into a given domain. Different tests, however, work with differently-sized domains. An appropriate domain must be created for each test.

One approach to creating a unique domain for each test, is to start with a single sample of continuous random values and partition these into the domain appropriate to the test. Another approach is to settle on a domain suitable for all of these tests. This application takes a third approach of generating a fresh set of samples in the appropriate domain for each test. The point is to test the generator, not a specific set of samples.

Three different domains are used for these tests:

rngtest Module

Design Overview

This module contains numerous functions used to support the various emprical tests. The empirical tests are defined as a subclass hierarchy. A stand-alone application program is included to test the built-in random number generator.

This module can also be incorporated by a script that tests a number of generators. In that case the calling script would use the various external interface functions of this module.

The external interface functions will support several features:

Packaging

The following header provides some comments for the rngtest.py file

"rngtest.py" (3) =



"""Empirical Tests of Random Number Generators.

Based largely on Knuth, this application provides 11 tests
for a random-number generator.
"""
rngtest edit warning (4) 
rngtest imports, including the chi2stats module (5) 

EmpiricalTest class definition for each test (6) (7) (8) (9) (10) (11) (12) 

# The individual tests, defined as subclasses of Empirical Test
Frequency Test Class: equal frequencies for all values? (13) 
Serial Test Class: equal frequencies for all serial pairs? (18) 
Gap Test Class: random length gaps between values that fall into a given range? (23) 
Poker Test Class: correct distribution of all poker hands? (28) 
Coupon Test Class: random lengths of the runs that contain all values? (33) 
Permutation Test Class: all permutations present in serial groups? (38) 
#Runs Up Tests: random length runs where values are ascending?
Runs Up Dependent Class: where runs are statistically dependent (43) 
Runs Up Independent Class, Large Domain, from Knuth (49) 
Runs Up Independent Class, Small Domain, from Ritter (54) 
Max of T Test: random distribution of local maxima in groups? (59) 
Serial Correlation Test: expected correlation coefficient between pairs of values? (64) 

External Interface: run all tests on a given generator (68) (69) (70) (71) (72) (73) 

"rngtest.py" (3).

The edit warning reminds people that the source .w file should be changed, not the .py file that is created by pyWeb.

rngtest edit warning (4) =



### DO NOT EDIT THIS FILE!
### It was created by ../pyWeb/pyweb.py, __version__='$Revision$'.
### From source rngtest.w modified Sat Mar 15 12:27:50 2003.
### In working directory '/Users/slott/Documents/Projects/rngtest'.

rngtest edit warning (4). Used by rngtest.py (3).

This package depends on very little from the standard Python packages. It makes extensive use of functions in the chi2 module.

rngtest imports, including the chi2stats module (5) =



from math import sqrt
import random
import chi2stats

rngtest imports, including the chi2stats module (5). Used by rngtest.py (3).

Testing Framework Definitions

Design

Each of the tests has a number of common features. All of these common features are collected into this superclass. Each individual test inherits these common features, and introduces algorithms unique to the individual test for computing the actual frequency distributions and expected frequency distributions.

The following common features are provided for all tests.

Each subclass must provide the following methods.

Implementation

There are several attributes of this class.

EmpiricalTest class definition for each test (6) =



class EmpiricalTest:
    """A single empirical test of a random number generator.
    
    EmpiricalTest( aGenerator, samples ) -- creates the test object
    run() -- generate samples, measure, generate expected, compute result
    report() -- print results on stdout
    final() -- tuple of (testname, result, low, high)
    validate() -- 1 if test produces expected results
    """
    def __init__( self, generator, samples=100000 ):
        self.generator= generator; """the sample generator"""
        self.actual= []; """the actual frequencies"""
        self.expected= []; """the expected frequencies"""
        self.range= ( 0, 0 ); """the range of chi-squared values"""
        self.result= None;  """the final metric result"""
        self.titles= ( "", "", "" ); """titles for the metric, low and high"""
        self.testName= "";  """the name of this test"""
        self.nSamples= samples; """number of samples to generate"""
        self.domain= 16; """size of domain for the samples"""

EmpiricalTest class definition for each test (6). Used by rngtest.py (3).

The generate() method creates the samples that are evaluated. This will be a sequence with the correct number of samples of the correct domain from the supplied generator.

The basic generator function is a kind of replaceable strategy for all of these empirical random number tests. Any generator can be supplied; the test will create an evaluate samples from the generator.

This implementation uses a list comprehension to generate the set of values. For large sample sizes, this is an extravagant use of memory; a more efficient algorithm might be to generate one sample at a time, and give each sample to the empirical test.

EmpiricalTest class definition for each test (7) +=



    def generate( self ):
        return [ self.generator(self.domain) for i in range(self.nSamples) ]

EmpiricalTest class definition for each test (7). Used by rngtest.py (3).

The run() method is basic algorithm for most tests. The samples are created in the variable u. The actual frequencies are developed using the testFreq() method; then the expected frequencies are developed using the expectedFreq() method. The two are compared to create the final χ² metric. This must be overridden for one variation on the runs up test and the serial correlation test.

EmpiricalTest class definition for each test (8) +=



    def run( self ):
        """Run the test."""
        u= self.generate( )
        self.actual= self.testFreq( u, self.domain )
        self.expected= self.expectedFreq( self.nSamples, self.domain )
        self.result= chi2stats.chi2( self.actual, self.expected )

EmpiricalTest class definition for each test (8). Used by rngtest.py (3).

The abstract methods testFreq() for executing the test and expectedFreq() for computing the expected values are defined below. These must be overridden by a subclass. In the case of the Serial Correlation test, which doesn't use the χ² metric, the run() method is overridden, and these are not used.

EmpiricalTest class definition for each test (9) +=



    def testFreq( self, u, domain ):
        """Return the actual frequency table for a set of samples over a given domain."""
        return []
    def expectedFreq( self, nSamples, domain ):
        """Return the expected frequency table for a number of samples over a given domain."""
        return []

EmpiricalTest class definition for each test (9). Used by rngtest.py (3).

The report() method displays actual vs. expected values. This must be overridden for one variation of the runs up test and the serial correlation test; these two tests don't use the same kind of expected value lists that the other tests use.

EmpiricalTest class definition for each test (10) +=



    def report( self ):
        """Print the details of the test."""
        print self.testName
        print "%s\t%s\t%s" % self.titles
        for l in range(len(self.expected)):
            a,e= self.actual[l], self.expected[l]
            print "%d\t%8.1f\t%8.1f" % (l,a,e)

EmpiricalTest class definition for each test (10). Used by rngtest.py (3).

The final() function returns a tuple with the final metric value and the acceptable range. If the test has been run, simply return the metric. If the test has not been run yet, execute the test procedure and then return the metric.

EmpiricalTest class definition for each test (11) +=



    def final( self ):
        """Return a tuple of (name,metric,low,high)"""
        if self.result == None:
            self.run()
        return ( self.testName, self.result ) + self.range

EmpiricalTest class definition for each test (11). Used by rngtest.py (3).

The validate() function executes a test procedure using known values to prove that the empirical test function is correct. Each subclass must override this.

EmpiricalTest class definition for each test (12) +=



    def validate( self ):
        """Validate the test algorithm against known results"""
        return 0

EmpiricalTest class definition for each test (12). Used by rngtest.py (3).

Frequency Test

Overview

The frequency test develops the frequency distribution from a sequence of discrete integers over some domain. We compare this actual distribution against the expected distribution, using the χ² test. The expected distribution is the number of samples divided by the size of the domain.

We generate a sequence u with 100,000 samples with a domain of 16 values, 0 <= u[i] < 16. Each bucket has an expected frequency of 100,000/16 occurrences.

With a domain size of 16, we have 15 degrees of freedom, the χ² value must fall between 7.261 and 25.00.

Implementation

The FreqTest class is a subclass of the EmpiricalTest class, with overridden functions for testFreq() and expectedFreq().

Frequency Test Class: equal frequencies for all values? (13) =



class FreqTest( EmpiricalTest ):
    """Test frequency distribution."""
    frequency test constructor (14) 
    frequency test procedure (15) 
    frequency test expected values (16) 
    frequency test validation (17) 

Frequency Test Class: equal frequencies for all values? (13). Used by rngtest.py (3).

The constructor provides the testName, the acceptable range of χ² values, the titles for values in the result tuple, and the domain of random values that is used for evaluation.

frequency test constructor (14) =



def __init__( self, generator, samples ):
    EmpiricalTest.__init__( self, generator, samples )
    self.testName= "Frequency Test"
    self.range= ( 7.261, 25.00 )
    self.titles= ( "value","actual","expected" )
    self.domain= 16

frequency test constructor (14). Used by Frequency Test Class: equal frequencies for all values? (13) :: rngtest.py (3).

The testFreq() function accepts a list of discrete values over a given domain and counts the frequency of those values. The result is a sequence, indexed by the domain values, containing the actual frequencies.

We must be given the size of the domain in order to correctly allocate the frequency table. While it is possible to discover the size of the domain, it increases the processing overhead dramatically with little benefit.

frequency test procedure (15) =



def testFreq( self, list, d ):
    """Return the actual frequency table for a set of samples over a given domain."""
    freqTable = d*[0]
    for i in list:
        freqTable[i] += 1
    return freqTable

frequency test procedure (15). Used by Frequency Test Class: equal frequencies for all values? (13) :: rngtest.py (3).

The expectedFreq() function computes the expected values for this kind of distribution from the total size of the sample set divided by the number of buckets. Since the value is the same in each bucket, we multiply the domain size (d) by a sequence with the expected value (samples/d). This creates a sequence that repeats the single expected value.

frequency test expected values (16) =



def expectedFreq( self, nSamples, domain ):
    """Return the expected frequency table for a number of samples over a given domain."""
    return domain*[ nSamples/float(domain) ]

frequency test expected values (16). Used by Frequency Test Class: equal frequencies for all values? (13) :: rngtest.py (3).

The validate() function executes a test procedure using known values to prove that the empirical test function is correct. In this case, we generate a sequence with 10 of each value from the domain. We then assure that the frequencies are precisely as expected.

frequency test validation (17) =



def validate( self ):
    """Validate the test algorithm against known results"""
    samples = [ i for i in range(self.domain) ] * 10
    act= self.testFreq( samples, self.domain )
    exp= self.expectedFreq( len(samples), self.domain )
    return act == exp

frequency test validation (17). Used by Frequency Test Class: equal frequencies for all values? (13) :: rngtest.py (3).

Serial Test

Overview

The serial test assures that serial pairs of values in the sequence are independent. Each pair of values, (q,r), comes from some domain of size d, 0 <= q < d and 0 <= r < d. The probability of seeing any given pair is 1/d². We compare the actual distribution against this expected distribution, using the χ² test.

We generate a sequence u with 100,000 samples (50,000 pairs) with a domain of 8 values, 0 <= u[i] < 8; there are 64 buckets. Each bucket has a frequency of 50,000/64 occurrences.

With a domain size, d, of 8, d² is 64, therefore, we have 63 degrees of freedom. The χ² value must fall between 45.74 and 82.53.

Implementation

The SerialTest class is a subclass of the EmpiricalTest class, with overridden functions for testFreq() and expectedFreq().

Serial Test Class: equal frequencies for all serial pairs? (18) =



class SerialTest( EmpiricalTest ):
    """Test serial pair frequency distribution."""
    serial test constructor (19) 
    serial test procedure (20) 
    serial test expected values (21) 
    serial test validation (22) 

Serial Test Class: equal frequencies for all serial pairs? (18). Used by rngtest.py (3).

The constructor provides the testName, the acceptable range of χ² values, the titles for values in the result tuple, and the domain of random values that is used for evaluation.

serial test constructor (19) =



def __init__( self, generator, samples ):
    EmpiricalTest.__init__( self, generator, samples )
    self.testName= "Serial Test"
    self.range= ( 45.74, 82.53 )
    self.titles= ( "pair","actual","expected" )
    self.domain= 8

serial test constructor (19). Used by Serial Test Class: equal frequencies for all serial pairs? (18) :: rngtest.py (3).

The testFreq() function accepts a list of discrete values over a given domain with size d. The list is taken as a sequence of pairs; the function counts the frequency of each pair. The result is a sequence indexed by values from 0 to d² with the frequencies.

Since we know the size of the domain is d, the size of the frequency table is d². We have two choices for indexing into the frequency table. One choice is to represent each pair as Python tuple, which can then be the index into a dictionary. Another choice is to compute a unique index for each pair (q,r) in the frequency table. We can evaluate a polynomial that returns a unique value for each combination of q and r where 0 <= q < d and 0 <= r < d. The index, i, is computed as i = q × d + r.

serial test procedure (20) =



def testFreq( self, list, d ):
    """Return the actual frequency table for a set of samples over a given domain."""
    freqTable = (d*d)*[0]
    for i in range(len(list)/2):
        q, r = list[2*i], list[2*i+1]
        freqTable[q*d+r] += 1
    return freqTable

serial test procedure (20). Used by Serial Test Class: equal frequencies for all serial pairs? (18) :: rngtest.py (3).

The expectedFreq() function computes the expected values for this distribution as the total size of the sample set divided by the number of buckets. Since the value is the same in each bucket, we multiply d² by a sequence with the expected value (samples/d²). This creates a sequence that repeats the single expected value.

serial test expected values (21) =



def expectedFreq( self, nSamples, domain ):
    """Return the expected frequency table for a number of samples over a given domain."""
    return (domain**2) * [ nSamples/(2*float(domain)**2) ]

serial test expected values (21). Used by Serial Test Class: equal frequencies for all serial pairs? (18) :: rngtest.py (3).

The validate() function executes a test procedure using known values to prove that the empirical test function is correct. In this case we generate a sequence with 10 copies of each possible pair of values from the domain. When then assure that the frequencies are precisely as expected.

serial test validation (22) =



def validate( self ):
   """Validate the test algorithm against known results"""
   samples = [ ]
   for i in range(self.domain):
     for j in range(self.domain):
        samples.append(i)
        samples.append(j)
   samples = samples * 10
   act= self.testFreq( samples, self.domain )
   exp= self.expectedFreq( len(samples), self.domain )
   return act == exp

serial test validation (22). Used by Serial Test Class: equal frequencies for all serial pairs? (18) :: rngtest.py (3).

Gap Test

Overview

The gap test locates the lengths of gaps between groups of values that lie within a given range. For instance, a run of values u[j], u[j+1], ..., u[j+r] could lie within some defined range, but u[j-1] and u[j+r+1] do not lie within the defined range. This run of values is called a "gap of length r".

We will choose the gap to be the values between zero and the midpoint of a domain. This can also be called a "runs below the mean" test. The various expected value probabilities are .5, .5², .5³, ... Typically, we count only gaps up to some given maximum length. Given the sample size, each gap length frequency bucket should have more than 5 entries.

The length of the frequency table can depend on the number of gaps desired. Since the number of occurrences must be greater than 5, a cutoff is required to pool the long gaps into a single bucket. Since we are using the "below the mean" gaps, the probabilities are simple inverse powers of 2: .5, .25, .125, etc. If we ask for 1000 gaps (nGaps = 1000), a length of 7 occurs 1000/27 = 7.8 times. Longer runs are not frequent enough for this analysis. We will, therefore, use a frequency table with 8 entries for lengths of 0 to 6, with 7 meaning a length of 7 or more.

The probability of a number in the range is p, in our case this is 0.5. The probability of a number outside the range is 1-p.

The probability of a gap of length 0 (r=0) is p. A number in the range is followed by another number in the range.

The probability of a gap of length 1 (r=1) is p × (1-p). Here a number in the range is followed by a number not in the range and a number in the range.

This pattern of a run of numbers not in the range followed by a number in the range leads to the following general formula for a gap of a given length:

The probability of a gap with a run of r values is p(1-p)r.

The probability of a gap with a run of greater than r values is (1-p)r.

Run lengths, r, are 0 <= r < 7. We will use a run length of 7 to mean a gap of length 7 or more.

Since we have 7 degrees of freedom, the χ² values must be between 2.167 and 14.07.

Implementation

The GapTest class is a subclass of the EmpiricalTest class, with overridden functions for testFreq() and expectedFreq().

Gap Test Class: random length gaps between values that fall into a given range? (23) =



class GapTest( EmpiricalTest ):
    """Test distribution of gap lengths between values that fall in a given range."""
    gap test constructor (24) 
    gap test procedure (25) 
    gap test expected values (26) 
    gap test validation (27) 

Gap Test Class: random length gaps between values that fall into a given range? (23). Used by rngtest.py (3).

The constructor provides the testName, the acceptable range of χ² values, the titles for values in the result tuple, and the domain of random values that is used for evaluation. The number of gaps that will be found is nGaps.

gap test constructor (24) =



def __init__( self, generator, samples ):
    EmpiricalTest.__init__( self, generator, samples )
    self.testName= "Gap Test"
    self.range= ( 2.167, 14.07 )
    self.titles= ( "gap size","actual","expected" )
    self.domain= 16
    self.nGaps= self.nSamples/10

gap test constructor (24). Used by Gap Test Class: random length gaps between values that fall into a given range? (23) :: rngtest.py (3).

Knuth gives details in Algorithm G of section C of part 3.3.2.

The testFreq() function takes a random sequence, a domain size (d) and a target number of gaps (nGaps). It locates the lengths of "below the mean" (< d/2) runs. It counts up the runs until it finds the desired number of gaps. It returns a sequence with the frequency of the gaps indexed by the length of the gaps.

The length of the frequency will be 8 entries for lengths of 0 to 6, with 7 meaning a length of 7 or more.

The desired number of gaps should, therefore, be somewhat less than 1/8 the total size of the sample. If nGaps is close to samples/8, it is possible that the list could be exhausted before the desired number of gaps are found. In this case, an exception is raised.

The expected value analysis is based on the number of gaps, not the original size of the sample.

The following algorithm uses the variable j to iterate through the set of random samples. The variable s counts the number of gaps that were found. The sequence count accumulates the frequency of the various gaps.

Within the loop, the r variable accumulates the run length of a gap. While the value from the list is within the gap (< d/2), we advance the j index through the sample set, and the r count of the run length.

When we increment the value in the count sequence, we use the min() function to select the smaller of the run length or 7. In this way, run lengths more than 7 are treated as 7.

gap test procedure (25) =



def testFreq( self, list, d ):
    """Return the actual frequency table for a set of samples over a given domain."""
    j, s, count = -1, 0, 8*[0]
    while s != self.nGaps and j != len(list):
        j, r = j+1, 0
        while j != len(list) and list[j] < d/2:
            j, r = j+1, r+1
        count[min(r,7)] += 1
        s += 1
    if j == len(list):
        raise "exhausted the sequence found %d out of %d gaps" % ( s, self.nGaps )
    return count

gap test procedure (25). Used by Gap Test Class: random length gaps between values that fall into a given range? (23) :: rngtest.py (3).

The expectedFreq() function computes the expected values for this distribution based on the probability of finding values in the given range and the probability of finding a value not in the given range.

We have set the probability of finding a number in the gap, p, to 0.5 by setting the gap to d/2.

The probability of a gap with a run of r values is p(1-p)r.

The probability of a gap with a run of greater than r values is (1-p)r.

Run lengths, r, are 0 <= r < 7. We will use a run length of 7 to mean a gap of length 7 or more.

gap test expected values (26) =



def expectedFreq(self,nSamples,domain):
    """Return the expected frequency table for a number of samples over a given domain."""
    pd= 0.5
    def probGap( pd, r, nGaps=self.nGaps ):
       return nGaps*pd*(1.0-pd)**r
    def probOth( pd, r, nGaps=self.nGaps ):
       return nGaps*(1.0-pd)**r
    return [ probGap(pd,r) for r in range(7) ] + [probOth(pd,7)]

gap test expected values (26). Used by Gap Test Class: random length gaps between values that fall into a given range? (23) :: rngtest.py (3).

The validate() function executes a test procedure using known values to prove that the empirical test function is correct. In this case, we generate sequences that have gaps from 0 to 10 in length, repeated 10 times. This is not the expected frequency for random data, so we compare it against the frequencies for this specific sample data.

gap test validation (27) =



def validate( self ):
   """Validate the test algorithm against known results"""
   samples = [ ]
   for i in range(10):
     for j in range(i):
        samples.append(self.domain/2-1)   # value out of the range
     samples.append(self.domain/2) # value in the gap range
   samples = samples * 10
   t, self.nGaps = self.nGaps, 100
   act= self.testFreq( samples, self.domain )
   exp= self.expectedFreq( len(samples), self.domain )
   self.nGaps= t
   return (act == [10, 10, 10, 10, 10, 10, 10, 30] 
   and exp == [50.0, 25.0, 12.5, 6.25, 3.125, 1.5625, 0.78125, 0.78125])

gap test validation (27). Used by Gap Test Class: random length gaps between values that fall into a given range? (23) :: rngtest.py (3).

Poker Test

Overview

The poker test examines 5-value groups from the sequence for the various combinations. There are 5 simple kinds of combinations: 1 value present is a 5-of-a-kind, 2 values present is a 4-of-a-kind or full house, 3 values present is two pairs or a 3-of-a-kind, 4 values present is a simple pair, and 5 distinct values is the remaining combination.

Knuth, in part D of section 3.3.2, and section 1.2.6 provides the expected values for these various combinations. There are dk different possible k-tuples of values in the domain of 0 to d-1. The number of ordered selections with r different elements is given by the product of d × (d-1) × (d-2) × ... × (d-r+1). When there is one different element, all values are alike, the probability of finding this combination is d/dk. When there are two different elements, this is either four of a kind or a full house. The probability is d × (d-1)/dk.

These basic probabilities are multiplied by the Stirling numbers to show all of the ways that k elements can be partitioned into r parts.

There are five possible outcomes (from all alike to all different). Since we have 4 degrees of freedom, the χ² values must be between 0.7107 and 9.488.

Implementation

The PokerTest class is a subclass of the EmpiricalTest class, with overridden functions for testFreq() and expectedFreq().

Poker Test Class: correct distribution of all poker hands? (28) =



class PokerTest( EmpiricalTest ):
    """Test distribution of poker hands in sample groups."""
    poker test constructor (29) 
    poker test procedure (30) 
    poker test expected values (31) 
    poker test validation (32) 

Poker Test Class: correct distribution of all poker hands? (28). Used by rngtest.py (3).

The constructor provides the testName, the acceptable range of χ² values, the titles for values in the result tuple, and the domain of random values that is used for evaluation.

poker test constructor (29) =



def __init__( self, generator, samples ):
    EmpiricalTest.__init__( self, generator, samples )
    self.testName= "Poker Test"
    self.range= ( 0.7107, 9.488 )
    self.titles= ( "distinct","actual","expected" )
    self.domain= 16

poker test constructor (29). Used by Poker Test Class: correct distribution of all poker hands? (28) :: rngtest.py (3).

The testFreq() function returns a frequency table created by partitioning a sequence into groups of 5 values and determining which kind of combination this group of 5 contains. The kind of combination is the number of different values in the hand. 1 different rank is a 5-of-a-kind hand. 4 different ranks means one pair was found. The domain size, d, is used to optimize identification of the number of distinct values.

If we have a sequence vUnique that has a 0 for each value in the domain, we can then set a 1 for each value that occurs in the hand of five. If all five cards are the same, then vUnique will have a single 1 for the value that is repeated 5 times. if all five cards are different, there will be five 1's in vUnique. The sum of the values will be the number of distinct values in the hand.

There are two approaches to creating this sequence of values. One is to create an explicit list of zeroes as large as the domain using vUnique = d*[0]. Once this has been created, then we use each value in the hand as an index to replace the initial 0's with 1's.

for v in hand:
    vUnique[v]= 1

Another approach is to simply create the entire domain with 0's and 1's using the Python in operator. If the domain value, v, is in the hand, this will be 1, otherwise 0. This can be done in one motion as: vUnique = [ v in hand for v in range(d) ]

poker test procedure (30) =



def testFreq( self, list, d ):
    """Return the actual frequency table for a set of samples over a given domain."""
    freqTable= 5*[0]
    for l in range(len(list)/5):
        hand= list[l*5:l*5+5]
        vUnique= [ v in hand for v in range(d) ]
        distinct= reduce( lambda a,b:a+b, vUnique, 0 )
        freqTable[distinct-1] += 1
    return freqTable

poker test procedure (30). Used by Poker Test Class: correct distribution of all poker hands? (28) :: rngtest.py (3).

The expectedFreq() function computes the expected values using Stirling numbers. The basic probability of r different values in hands of size k (usually 5) from a domain of size d is given by the following function.

poker test expected values (31) =



def expectedFreq( self, n, d ):
    """Return the expected frequency table for a number of samples over a given domain."""
    k= 5
    exp= 5*[0]
    for r in range(1,6):
        p= 1.0
        for i in range(r):
            p *= d-i
        exp[r-1]= (n/5)*(p/float(d)**k)*chi2stats.Stirling(k,r)
    return exp

poker test expected values (31). Used by Poker Test Class: correct distribution of all poker hands? (28) :: rngtest.py (3).

The validate() function executes a test procedure using known values to prove that the empirical test function is correct. In this case, we generate sequences that have each of the types of poker hands, repeated 10 times. This is not the expected frequency for random data, so we compare it against the frequencies for this sample data.

poker test validation (32) =



def validate( self ):
   """Validate the test algorithm against known results"""
   samples = [ ]
   for i in range(5):
     for j in range(5):
        if i < j: samples.append(0)
        else: samples.append(j)
   samples = samples * 10
   act= self.testFreq( samples, self.domain )
   exp= map( int, self.expectedFreq( len(samples), self.domain ) )
   return (act == [10, 10, 10, 10, 10]
   and exp == [0, 0, 4, 20, 24])

poker test validation (32). Used by Poker Test Class: correct distribution of all poker hands? (28) :: rngtest.py (3).

Coupon Collector's Test

Overview

The coupon collector's test determines how long a sequence of random values are required to collect a complete set of values over the domain. Typically, this test will examine sequences of up to approximately 40 values before collecting a complete set of all 8 distinct values in the domain. Clearly, the sequence must be at least as long as the domain, with short sequences being unlikely, and very long sequences also being unlikely.

Knuth, in part E of section 3.3.2 provides the expected values for these various combinations.

We will limit our sequences to 40 values to collect all coupons from a domain of 8 values, 0 <= u[i] < 8. No sequence can be shorter than 8 elements. And we will collect all longer sequences into bucket 40. This gives us 32 different lengths for collecting a complete set of coupons. Since we have 31 degrees of freedom, the χ² values must be between 19.28 and 44.99.

Implementation

The CouponTest class is a subclass of the EmpiricalTest class, with overridden functions for testFreq() and expectedFreq().

Coupon Test Class: random lengths of the runs that contain all values? (33) =



class CouponTest( EmpiricalTest ):
    """Test distribution of lengths of runs that contain all values from the domain."""
    coupon test constructor (34) 
    coupon test procedure (35) 
    coupon test expected values (36) 
    coupon test validation (37) 

Coupon Test Class: random lengths of the runs that contain all values? (33). Used by rngtest.py (3).

The constructor provides the testName, the acceptable range of χ² values, the titles for values in the result tuple, and the domain of random values that is used for evaluation. The number of sequences that will be found is nSeq.

coupon test constructor (34) =



def __init__( self, generator, samples ):
    EmpiricalTest.__init__( self, generator, samples )
    self.testName= "Coupon Collector's Test"
    self.range= ( 19.28, 44.99 )
    self.titles= ( "run","actual","expected" )
    self.domain= 8
    self.nSeq= self.nSamples/25

coupon test constructor (34). Used by Coupon Test Class: random lengths of the runs that contain all values? (33) :: rngtest.py (3).

The testFreq() function performs the coupon collector's test, returning a table of frequencies of runs that contain a complete selection of values from the domain, d. At most nSeq sequences will be examined for the set of values.

Knuth gives details in Algorithm C of section E of part 3.3.2.

We use a sequence, occurs, to check for the occurrence of each value in the domain. As a value is noted in occurs, the variable q is incremented. When q equals d (the size of the domain) all of the flags in occurs have been set.

The variable r determines the length of a run. We increment an element of the count sequence using the min() function of either r and 39, to select the value that is smaller. In this way, runs of length greater than or equal to 39 are collected into a single bucket.

coupon test procedure (35) =



def testFreq( self, list, d ):
    """Return the actual frequency table for a set of samples over a given domain."""
    j, s, count = -1, 0, 40*[0]
    while s != self.nSeq and j != len(list):
        q, r, occurs = 0, 0, d*[0]
        while j != len(list) and q != d:
            j, r = j+1, r+1
            while j != len(list) and occurs[list[j]]:
                j, r = j+1, r+1
            occurs[list[j]], q = 1, q+1
        count[min(r,39)] += 1
        s += 1
    if j == len(list):
        raise "exhausted the sequence"
    return count

coupon test procedure (35). Used by Coupon Test Class: random lengths of the runs that contain all values? (33) :: rngtest.py (3).

The expectedFreq() function computes the expected values using Stirling numbers. The basic probability of a run of length r to accumulate a complete set of values from a domain of size d is given by the following expectedCoupon() function.

coupon test expected values (36) =



def expectedFreq( self, n, d ):
    """Return the expected frequency table for a number of samples over a given domain."""
    def probRun( d, r, nSeq=self.nSeq ):
        return nSeq*(chi2stats.fact(d)/float(d)**r)*chi2stats.Stirling(r-1,d-1)
    def probOth( d, r, nSeq=self.nSeq ):
        return nSeq*(1.0 - (chi2stats.fact(d)/float(d)**(r-1))*chi2stats.Stirling(r-1,d))
    return d*[0] + [ probRun(d,r) for r in range(d,39)] + [probOth(d,39)]

coupon test expected values (36). Used by Coupon Test Class: random lengths of the runs that contain all values? (33) :: rngtest.py (3).

The validate() function executes a test procedure using known values to prove that the empirical test function is correct. In this case, we generate varying length sequences that contain each of the domain values, repeated 10 times. This is not the expected frequency for random data, so we compare it against the frequencies for this sample data.

coupon test validation (37) =



def validate( self ):
   """Validate the test algorithm against known results"""
   samples = [ ]
   for i in range(45):
       j= 0
       while j < self.domain-1:
          samples.append(j)
          j += 1
       while j < i-1:
          samples.append(0)
          j += 1
       samples.append(self.domain-1)
   samples = samples * 10
   t, self.nSeq = self.nSeq, 450
   act= self.testFreq( samples, self.domain )
   exp= map( int, self.expectedFreq( len(samples), self.domain ) )
   self.nSeq= t
   return ( act == 8*[0] + [90] + 30*[10] + [60]
   and exp == [0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 7, 12, 
       16, 20, 23, 25, 26, 26, 25, 24, 23, 21, 20, 18, 16, 
       15, 13, 12, 10, 9, 8, 7, 6, 5, 5, 4, 4, 3, 3, 22] )

coupon test validation (37). Used by Coupon Test Class: random lengths of the runs that contain all values? (33) :: rngtest.py (3).

Permutation Test

Overview

The permutation test examines the ordering of a set of t values selected from the random sequence. The frequency of the various alternative orderings are compared against the t! possible orderings of the values. When t = 3, there are 6 possible orderings. This allows us to check larger sequences without the overhead associated with the serial test. If we did a serial test on a sequence of t values from a domain of d, the possible combinations is dt; a number typically larger than t!. By summarizing the combinations by the order of values, we can check the independence of longer sequences of values.

The algorithm picks a subset of size t of the random sequence and determines the relative ordering the subset. The relative ordering is coded as a number, f, from 0 <= f < t!.

Knuth, in part F of section 3.3.2 provides the expected values for these various combinations. We start with 100,000 samples, in groups of 4. There are 24 possible permutations, leading to an expected frequency of 100,000/4/24.

The permTest program identifies the (4!)=24 permutations of length 4. One underlying assumption is that there is a very low probability of duplicate values, so we choose a suitably large domain of 1024 values. With 23 degrees of freedom the expected χ² values must be between 13.09 and 35.17.

Implementation

The PermutationTest class is a subclass of the EmpiricalTest class, with overridden functions for testFreq() and expectedFreq().

Permutation Test Class: all permutations present in serial groups? (38) =



class PermutationTest( EmpiricalTest ):
    """Test distribution of permutations in sample groups."""
    permutation test constructor (39) 
    permutation test procedure (40) 
    permutation test expected values (41) 
    permutation test validation (42) 
    
next permutation function (74) 

Permutation Test Class: all permutations present in serial groups? (38). Used by rngtest.py (3).

The constructor provides the testName, the acceptable range of χ² values, the titles for values in the result tuple, and the domain of random values that is used for evaluation. The size of each sequence that will have the permutation evaluated is seqSize.

permutation test constructor (39) =



def __init__( self, generator, samples ):
    EmpiricalTest.__init__( self, generator, samples )
    self.testName= "Permutation Test"
    self.range= ( 13.09, 35.17 )
    self.titles= ( "ordering","actual","expected" )
    self.domain= 1024
    self.seqSize= 4

permutation test constructor (39). Used by Permutation Test Class: all permutations present in serial groups? (38) :: rngtest.py (3).

The testFreq() function computes the actual frequencies for each permutation. We partition the input samples into 4-element groups, saving this in a sequence called u. We then locate the order of values in this subsequence, creating a sequence c that encodes the ordering for the values. We transform this ordering information into an integer, f that represents the unique order for the samples.

Knuth gives details in Algorithm P of section F of part 3.3.2.

permutation test procedure (40) =



def testFreq( self, list, d ):
    """Return the actual frequency table for a set of samples over a given domain."""
    t= self.seqSize
    count = chi2stats.fact(t)*[0]
    for i in range(len(list)/t):
        u = list[t*i:t*i+t]
        c = t*[0]
        r = t
        while r > 0:
            s = 0
            for j in range(r):
                if u[j] > u[s]: s = j
            c[r-1]= s
            u[r-1], u[s] = u[s], u[r-1]
            r -= 1
        f= 0
        for j in range(t-1):
            f = (f+c[j])*(j+2)
        f += c[t-1]
        count[f] += 1
    return count    

permutation test procedure (40). Used by Permutation Test Class: all permutations present in serial groups? (38) :: rngtest.py (3).

The expectedFreq() function computes the expected frequencies from the number of sample groups and the number of possible permutations. Since the value is the same in each bucket, we multiply the number of combinations (t!) by a sequence with the expected value (samples/t/t!). This creates a sequence that repeats the single expected value once for each permutation.

permutation test expected values (41) =



def expectedFreq( self, n, d ):
    """Return the expected frequency table for a number of samples over a given domain."""
    perms= chi2stats.fact(self.seqSize)
    return perms*[ n/float(self.seqSize)/perms ]

permutation test expected values (41). Used by Permutation Test Class: all permutations present in serial groups? (38) :: rngtest.py (3).

The validate() function executes a test procedure using known values to prove that the empirical test function is correct. In this case, we generate sequences that contain each of the possible permutations, repeated 10 times. We compare this against the expected frequencies.

permutation test validation (42) =



def validate( self ):
   """Validate the test algorithm against known results"""
   samples = [ ]
   perm = [ 0, 1, 2, 3 ]
   while perm:
       samples += perm
       perm = nextPerm( perm )
   samples = samples * 10
   act= self.testFreq( samples, self.domain )
   exp= self.expectedFreq( len(samples), self.domain )
   return act == exp

permutation test validation (42). Used by Permutation Test Class: all permutations present in serial groups? (38) :: rngtest.py (3).

Runs Up Tests

Overview

A runs up test partitions a sequence of random values into subsequences that are strictly increasing. We count the length of each sequence of increasing values, and create a frequency table. We then compare the actual frequencies against expected frequencies to create a χ² measure of the fit between the frequencies.

We limit the analysis to sequences of length 1 to 5, reserving a sixth place in the frequency table for all longer sequences.

Due to the statistical subtleties with this test, there are three slightly different versions. The first version ("Knuth 1") is the one explored in the main text of section 3.3.2 (and exercise 12). This version starts each run up with the value that terminated the previous run up. This makes each run statistically dependent on the preceding run. The second version ("Knuth 2") is the one explored in exercise 14. This version skips the value that ends a run up, making each run statistically independent from the preceding run. The third version ("Ritter") is a slight change to the expected values to allow for selection of values from a relatively small domain instead of a relatively large (nearly infinite) domain.

Dependent Runs Up

The basic algorithm statement is given in section G of part 3.3.2 (page 60) and exercise 12 (page 67).

"Let U0, U1, ..., Un-1 be n distinct numbers. Write an algorithm which determines the lengths of all ascending runs in the sequence, as follows: When the algorithm terminates, count[r] should be the number of runs of length r, for 1 <= r <= 5, and count[6] should be the number of runs of length 6 or more."

The RunsUpDTest class is a subclass of the EmpiricalTest class, with overridden functions for testFreq(), run() and report(). The overrides of run() and report() are essential because the χ² evaluation is done differently for this test.

Runs Up Dependent Class: where runs are statistically dependent (43) =



class RunsUpDTest( EmpiricalTest ):
    """Runs of ascending values, statistically dependent on each other."""
    dependent runs up constructor (44) 
    dependent runs up procedure (45) 
    dependent runs up chi-squared test (46) 
    dependent runs up overrides for run() and report() (47) 
    dependent runs up validation (48) 

Runs Up Dependent Class: where runs are statistically dependent (43). Used by rngtest.py (3).

The constructor provides the testName, the acceptable range of χ² values, and the domain of random values that is used for evaluation.

dependent runs up constructor (44) =



def __init__( self, generator, samples ):
    EmpiricalTest.__init__( self, generator, samples )
    self.testName= "Runs Up - Dependent"
    self.range= ( 1.145, 11.070 )
    self.domain= 1024

dependent runs up constructor (44). Used by Runs Up Dependent Class: where runs are statistically dependent (43) :: rngtest.py (3).

The testFreq() function partitions the sequence of numbers, returning frequencies as a sequence, indexed by run length.

In order to support indexes from 1 to 6, we ask for 7 elements, numbered 0 through 6. We will ignore the zero element in this sequence.

dependent runs up procedure (45) =



def testFreq( self, list, domain ):
    """Return the actual frequency table for a set of samples over a given domain."""
    last, thisSeqLen= list[0], 1
    seqLen = 7*[0]
    for i in list[1:]:
        if i > last:
            last, thisSeqLen= i, thisSeqLen+1
        else:
            seqLen[min(thisSeqLen,6)] += 1
            last, thisSeqLen= i, 1
        # sum(thisSeqLen) == index in list
    seqLen[min(thisSeqLen,6)] += 1
    return seqLen

dependent runs up procedure (45). Used by Runs Up Dependent Class: where runs are statistically dependent (43) :: rngtest.py (3).

The computation of expected values and the fit between expected and actual is given in section G of part 3.3.2 (page 60) in equations (10) and (11). This calculation gives a statistic that has a χ²-like distribution for six degrees of freedom. Referring to the table on page 39, this means that if the value is between 1.635 and 12.59, we have a reasonably random sequence.

The expectedChi2() function compares actual with expected values, computing the χ² comparison metric.

We represent the matrix, a, as a sequence of sequences with the defined set of values. We represent the vector, b, as a sequence of values. These definitions are taken directly from equation (11).

We perform two nested loops, per equation (10). Since the matrix and the vector are zero-based indexes, we make a small change from the published algorithm, using an index range of 0 <= i,j < 6. Since the sequence of actual frequencies is indexed by the range 1 <= i <= 6, we must add one to i or j when retrieving from the actual frequency sequence.

dependent runs up chi-squared test (46) =



def expectedChi2( self, actual, n ):
    """Return chi-squared metric for runs-up frequencies."""
    a= [ [ 4529.4,  9044.9, 13568.0, 18091.0, 22615.0, 27892.0],
     [ 9044.9, 18097.0, 27139.0, 36187.0, 45234.0, 55789.0],
     [13568.0, 27139.0, 40721.0, 54281.0, 67852.0, 83685.0],
     [18091.0, 36187.0, 54281.0, 72414.0, 90470.0,111580.0],
     [22615.0, 45234.0, 67852.0, 90470.0,113262.0,139476.0],
     [27892.0, 55789.0, 83685.0,111580.0,139476.0,172860.0] ]
    b= [ 1.0/6.0, 5.0/24.0, 11.0/120.0, 19.0/720.0, 29.0/5040.0, 1.0/840.0 ]
    V= 0.0
    for i in range(6):
        for j in range(6):
            V += (actual[i+1] - n*b[i])*(actual[j+1] - n*b[j])*a[i][j]
    return V/n

dependent runs up chi-squared test (46). Used by Runs Up Dependent Class: where runs are statistically dependent (43) :: rngtest.py (3).

Because this expected value function also computes the χ² metric, the superclass run() and report() functions must be overridden.

dependent runs up overrides for run() and report() (47) =



def run( self ):
    u= self.generate( )
    self.actual= self.testFreq( u, self.domain )
    self.result= self.expectedChi2( self.actual, self.nSamples )
def report( self ):
    print self.testName

dependent runs up overrides for run() and report() (47). Used by Runs Up Dependent Class: where runs are statistically dependent (43) :: rngtest.py (3).

The validate() function executes a test procedure using known values to prove that the empirical test function is correct. In this case, we generate variable-length sequences that contain a variety of lengths of runs up, repeated 10 times.

dependent runs up validation (48) =



def validate( self ):
   """Validate the test algorithm against known results"""
   samples = [ ]
   for i in range(10):
     for j in range(i):
        samples.append( j )
   samples = samples * 10
   act= self.testFreq( samples, self.domain )
   return act == [0, 10, 10, 10, 10, 10, 40]

dependent runs up validation (48). Used by Runs Up Dependent Class: where runs are statistically dependent (43) :: rngtest.py (3).

Independent Runs Up (Large Domain)

The basic algorithm is described in part 3.3.2, exercise 14.

First, the basic change here is to make each run up statistically independent from the run that proceeded it. This is done by discarding the value that terminates a run up.

Second, there are two versions of the expected frequency, one presuming a large domain of values, other presuming a small domain of values.

The expected values depend on the number of runs actually found. After computing the actual frequency, and getting an actual number of runs up, we use this to compute the expected distribution for that number of runs up.

Runs Up Independent Class, Large Domain, from Knuth (49) =



class RunsUpIndep1Test( EmpiricalTest ):
    """Runs of ascending values, statistically independent, with a large domain."""
    independent, large domain runs up constructor (50) 
    independent, large domain runs up procedure (51) 
    independent, large domain runs up expected values (52) 
    independent, large domain runs up validation (53) 

Runs Up Independent Class, Large Domain, from Knuth (49). Used by rngtest.py (3).

The constructor provides the testName, the acceptable range of χ² values, and the domain of random values that is used for evaluation and it initializes the number of runs found, nRuns.

independent, large domain runs up constructor (50) =



def __init__( self, generator, samples ):
    EmpiricalTest.__init__( self, generator, samples )
    self.testName= "Runs Up - Independent"
    self.range= ( 1.145, 11.070 )
    self.domain= 1024
    self.nRuns= 0

independent, large domain runs up constructor (50). Used by Runs Up Independent Class, Large Domain, from Knuth (49) :: rngtest.py (3).

The testFreq() function partitions the sequence into independent runs up. This list can be analyzed with relatively simple statistics.

This produces a sequence of frequency values for each length from 1 to 6. We will settle on the sequence representation for this particular set of algorithms. In order to support indexes from 1 to 6, we ask for 7 elements, numbered 0 through 6. We will ignore the zero element in this sequence.

There are three conditions for each value of the sequence. If u[i-1] < u[i], then u[i] is part of a run up. If u[i-1] > u[i], then u[i] is the end of a run up, and will be discarded. If u[i-2] > u[i-1], then u[i] is the first value of a run up, with no relationship known with the prior value. This third condition creates our statistical independence, and makes the body of our loop somewhat more complex than the dependent runs up, shown in the Basic Runs Up test.

To track the three conditions, we use a variable, last, that has two different types of values. When last = -1, we are skipping a value, u[i-1]; and we will keep this value, u[i]. When last is not -1, it is the value of u[i-1], which we can compare with u[i]. When last < u[i], we are still in a run up and can increase the length of the run. When last >= u[i], we have finished a run up, we can accumulate the frequency count and reset last to -1; this will skip the current value, u[i], and the next value will begin another run.

Finally, we save the number of runs actually found. We do this by summing the values in the frequency table (using the reduce() function), and saving this in nRuns.

independent, large domain runs up procedure (51) =



def testFreq( self, list, domain ):
    """Return the actual frequency table for a set of samples over a given domain."""
    last= -1
    seqLen = 7*[0]
    for i in list:
        if last != -1:
            if i > last:
                last, thisSeqLen= i, thisSeqLen+1
            else:
                seqLen[min(thisSeqLen,6)]+= 1
                last= -1
        else:
             last, thisSeqLen= i, 1
    seqLen[min(thisSeqLen,6)]+= 1
    self.nRuns= reduce( lambda a,b: a+b, seqLen, 0 )
    return seqLen

independent, large domain runs up procedure (51). Used by Runs Up Independent Class, Large Domain, from Knuth (49) :: rngtest.py (3).

The expectedFreq() function produces the expected distribution probabilities for independent runs up. The expected values for length of runs up given by the standard formula for a relatively large domain.

We compute the expected probabilities of a sequences of length l, where l various from 1 to 6. This is the Knuth expectation, where n, the domain size is quite large, but the runs are independent. This is the 'exercise 14' version.

independent, large domain runs up expected values (52) =



def expectedFreq( self, nSamples, d ):
    """Return the expected frequency table for a number of samples over a given domain."""
    def prob(l,nRuns=self.nRuns):
        return float(nRuns)*(1.0/chi2stats.fact(l)-1.0/chi2stats.fact(l+1))
    return [0] + [prob(l) for l in range(1,6)] + [float(self.nRuns)/chi2stats.fact(6)]

independent, large domain runs up expected values (52). Used by Runs Up Independent Class, Large Domain, from Knuth (49) :: rngtest.py (3).

The validate() function executes a test procedure using known values to prove that the empirical test function is correct. In this case, we generate variable-length sequences that contain a variety of lengths of runs up, repeated 10 times.

We create a class-level variable, expResult because the only difference between this procedure and the independent, small domain procedure is the expected result distribution. We can then override this variable declaration, and otherwise use this class for the small-domain version of this test.

independent, large domain runs up validation (53) =



expResult= [0, 500, 333, 125, 33, 6, 1]
def validate( self ):
   """Validate the test algorithm against known results"""
   samples = [ ]
   for i in range(10):
     for j in range(i):
        samples.append( j )
     samples.append( i-1 )  # skipped to make runs independent
   samples = samples * 10
   act= self.testFreq( samples, self.domain )
   self.nRuns= 1000
   exp= map( int, self.expectedFreq( samples, self.domain ) )
   return act == [0, 10, 10, 10, 10, 10, 41] and exp == self.expResult

independent, large domain runs up validation (53). Used by Runs Up Independent Class, Large Domain, from Knuth (49) :: rngtest.py (3).

Independent Runs Up (Small Domain)

Referring to Ritter's analysis, we find that the nearly infinite, nearly continuous distribution assumptions of Knuth are not true for a limited range of values of the kind we are using for these specific examples.

This version of the Runs Up test provides an expectedFreq() function that produces the expected distribution probabilities for independent runs up. The expected values for length of runs up given by the Ritter formula for a relatively small domain.

Runs Up Independent Class, Small Domain, from Ritter (54) =



class RunsUpIndep2Test( RunsUpIndep1Test ):
    """Runs of ascending values, statistically independent, with a small domain."""
    independent, small domain runs up constructor (55) 
    independent, small domain runs up procedure (56) 
    independent, small domain runs up expected results (57) 
    independent, small domain runs up additional test (58) 

Runs Up Independent Class, Small Domain, from Ritter (54). Used by rngtest.py (3).

The constructor provides the testName, and the domain of random values that is used for evaluation. It inherits all of the other starting values from the parent class, RunsUpIndep1Test.

independent, small domain runs up constructor (55) =



def __init__( self, generator, samples ):
    RunsUpIndep1Test.__init__( self, generator, samples )
    self.testName= "Runs Up - Independent (Ritter)"
    self.domain= 16

independent, small domain runs up constructor (55). Used by Runs Up Independent Class, Small Domain, from Ritter (54) :: rngtest.py (3).

The expectedFreq() function produces the expected distribution for independent runs up. The expected values for length of runs up given by the Ritter formula for a relatively small domain.

independent, small domain runs up procedure (56) =



def expectedFreq( self, nSamples, d ):
    """Return the expected frequency table for a number of samples over a given domain."""
    def prob(d,l):
        return chi2stats.bico(d,l)/float(d)**l-chi2stats.bico(d,l+1)/float(d)**(l+1)
    def probOth(d,l=6):
        return chi2stats.bico(d,l)/float(d)**float(l)
    return [0] + [self.nRuns*prob(d,l) for l in range(1,6)] + [self.nRuns*probOth(d,6)]

independent, small domain runs up procedure (56). Used by Runs Up Independent Class, Small Domain, from Ritter (54) :: rngtest.py (3).

The validate() function executes a the superclass test procedure. The only difference is the different expected results.

independent, small domain runs up expected results (57) =



expResult= [0, 531, 332, 108, 23, 3, 0]

independent, small domain runs up expected results (57). Used by Runs Up Independent Class, Small Domain, from Ritter (54) :: rngtest.py (3).

Testing the Ritter formula

The following driver exercises the Ritter expected values formula. The correct answers are given in the web site.

independent, small domain runs up additional test (58) =



def testRitter():
    """Test the Ritter expected value function."""
    self.nRuns= 1.0
    for r in range(15):
        print expectedFreq(256l,r+1l)

independent, small domain runs up additional test (58). Used by Runs Up Independent Class, Small Domain, from Ritter (54) :: rngtest.py (3).

Maximum of T

Overview

The "Maximum of T" test partitions the original sequence into subsequences of length t, and finds the maximum value in each of these subsequences. This resulting sequence of values can be tested against the expected distribution F(x) = xt. The probability a value being less than or equal to x is P(x). The probability of all t values in a subsequence being less than or equal to x is P(x)t.

The probability of find a maximum at or below the value l in the domain is (l/d)t. The difference (l/d)t - ((l-1)/d)t is the probability of finding a maximum between l and l-1.

Our domain has 16 values that can be maxima in the partitions of size T. In this case, we have 14 degrees of freedom, the χ² value must fall between 6.570 and 23.68.

Implementation

The MaxOfTTest class is a subclass of the EmpiricalTest class, with overridden functions for testFreq() and expectedFreq().

Max of T Test: random distribution of local maxima in groups? (59) =



class MaxOfTTest( EmpiricalTest ):
    """Test the distribution of local maxima of groups of size T."""
    max of t test constructor (60) 
    max of t test procedure (61) 
    max of t test expected values (62) 
    max of t test validation (63) 

Max of T Test: random distribution of local maxima in groups? (59). Used by rngtest.py (3).

The constructor provides the testName, the acceptable range of χ² values, the titles for values in the result tuple, the domain of random values that is used for evaluation, and the size, T, of each group to be tested.

max of t test constructor (60) =



def __init__( self, generator, samples ):
    EmpiricalTest.__init__( self, generator, samples )
    self.testName= "Max of T Test"
    self.range= ( 6.570, 23.68 )
    self.titles= ( "value","actual","expected" )
    self.domain= 16
    self.T= 8

max of t test constructor (60). Used by Max of T Test: random distribution of local maxima in groups? (59) :: rngtest.py (3).

The testFreq() function partitions the list into subsequences, creating a list of the max of each subsequence.

A new list is created from the original list by partitioning into subsets and taking maximum of each subset. The built-in max() function performs this function for us. We use a list comprehension to assemble a list of the maxima of a range of subsets. Once we have the list of maxima, we can then create the frequency table for this new distribution. Note that this test and the frequency test both share a common algorithm that should be factored into the superclass.

max of t test procedure (61) =



def testFreq( self, list, d ):
    """Return the actual frequency table for a set of samples over a given domain."""
    t= self.T
    newList= [ max( list[i*t:i*t+t] ) for i in range( len(list)/t ) ]
    freqTable= d*[0]
    for v in newList:
        freqTable[v]+=1
    return freqTable

max of t test procedure (61). Used by Max of T Test: random distribution of local maxima in groups? (59) :: rngtest.py (3).

The expectedFreq() function computes the expected frequencies from the probabilities of finding values below a given value in the domain.

max of t test expected values (62) =



def expectedFreq( self, nSamples, d ):
  t= float(self.T)
  return [ (nSamples/t)*((l/float(d))**t-((l-1)/float(d))**t) for l in range(1,d+1) ]

max of t test expected values (62). Used by Max of T Test: random distribution of local maxima in groups? (59) :: rngtest.py (3).

The validate() function executes a test procedure using known values to prove that the empirical test function is correct. In this case, we generate fixed-length sequences that has each of the different maximum values, repeated 10 times.

max of t test validation (63) =



def validate( self ):
   """Validate the test algorithm against known results"""
   samples = [ ]
   for i in range(self.domain-self.T+1):
      for j in range(self.T):
          samples.append( j+i )
   samples = samples * 10
   act= self.testFreq( samples, self.domain )
   exp= map( int, self.expectedFreq( len(samples), self.domain ) )
   return (act == [0]*7+[10]*9
   and exp == [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 4, 8, 13, 22, 36])

max of t test validation (63). Used by Max of T Test: random distribution of local maxima in groups? (59) :: rngtest.py (3).

Serial Correlation Test

Overview

The serial tests computes the correlation coefficient between pairs of numbers in the random sequence. The correlation coefficient varies from 1.0 (correlated) to -1.0 (inversely correlated). Around zero means the relationship is random.

The correlation coefficient, C, will be close to zero, but not exactly zero. The desired range is m - 2s <= C <= m + 2s; where m = -1/(n-1) and s = (1/(n-1)) × sqrt( (n(n-3))/(n+1) ).

Implementation

The RunsUpDTest class is a subclass of the EmpiricalTest class, with overridden functions for testFreq(), run() and report(). The overrides of run() and report() are essential because this test doesn't use χ² metric, but instead uses correlation coefficient.

Serial Correlation Test: expected correlation coefficient between pairs of values? (64) =



class SerialCorrelationTest( EmpiricalTest ):
    """Compute the correlation coefficient between subsequences."""
    serial correlation test constructor (65) 
    serial correlation test run and report overrides (66) 
    serial correlation test validation (67) 

Serial Correlation Test: expected correlation coefficient between pairs of values? (64). Used by rngtest.py (3).

The constructor provides the testName, and the domain of random values that is used for evaluation.

serial correlation test constructor (65) =



def __init__( self, generator, samples ):
    EmpiricalTest.__init__( self, generator, samples )
    self.testName= "Serial Correlation"
    self.domain= 16

serial correlation test constructor (65). Used by Serial Correlation Test: expected correlation coefficient between pairs of values? (64) :: rngtest.py (3).

The correlation coefficient is computed by the chi2stats module correlate() function, part of the statistical library supporting this application.

Because this test does not use the χ² evaluation, the superclass functions run() and report() functions must be overridden, also.

serial correlation test run and report overrides (66) =



def run( self ):
    u= self.generate( )
    c= chi2stats.correlate( u[:-1], u[1:] )
    n= float(len(u))
    m= -1/(n-1)
    s= 1/(n-1)*sqrt( (n*(n-3))/(n+1) )
    self.result= c
    self.range= ( m-2*s, m+2*s )
def report( self ):
    print self.testName

serial correlation test run and report overrides (66). Used by Serial Correlation Test: expected correlation coefficient between pairs of values? (64) :: rngtest.py (3).

The validate() function executes a test procedure using known values to prove that the empirical test function is correct. In this case, we generate several fixed-length sequences and exercise the correlation function. We correlate two identical sequences, two sequences that have every number of the domain paired with an opposite value.

serial correlation test validation (67) =



def validate( self ):
  """validate() -> number"""
  s1 = [ i for i in range(self.domain) ]*10
  c1= chi2stats.correlate( s1, s1 )

  s2 = [ self.domain-i for i in range(self.domain) ]*10
  c2= chi2stats.correlate( s1, s2 )
      
  s3 = [ i for i in range(self.domain) ]*10
  s3.sort()
  c3= chi2stats.correlate( s1, s3 )

  return c1==1 and c2==-1 and abs(c3)<0.1  

serial correlation test validation (67). Used by Serial Correlation Test: expected correlation coefficient between pairs of values? (64) :: rngtest.py (3).

External Interface

We can create a sequence that contains the class definitions for each test. Given this sequence, we can then iterate through each EmpiricalTest subclass, creating an instance of the subclass and exercising the specific subclass run(), report() and final() functions. This sequence of EmpiricalTest subclasses is called theTestList.

External Interface: run all tests on a given generator (68) =



theTestList = [ FreqTest, SerialTest, GapTest, PokerTest,
    CouponTest, PermutationTest, RunsUpDTest,
    RunsUpIndep1Test, RunsUpIndep2Test,
    MaxOfTTest, SerialCorrelationTest ]

External Interface: run all tests on a given generator (68). Used by rngtest.py (3).

The validate() method of each EmpiricalTest subclass performs the internal validation operation on that specific test class. An instance of the class is created and the class validate() method is called to confirm that the class operates correctly.

External Interface: run all tests on a given generator (69) +=



def validate( aTestClass, aGenerator, samples ):
   ti= aTestClass( aGenerator, samples )
   print ti.testName, ti.validate()

External Interface: run all tests on a given generator (69). Used by rngtest.py (3).

The detail() function creates a detailed report of results from using a specific test class with a generator. The class is passed as an argument to this function. The detail() function creates the instance of the test, using the generator being tested and the number of samples.

External Interface: run all tests on a given generator (70) +=


         
def detail( aTestClass, aGenerator, samples ):
   ti= aTestClass( aGenerator, samples )
   ti.run()
   ti.report()
   print ti.final()

External Interface: run all tests on a given generator (70). Used by rngtest.py (3).

When comparing multiple generators, it is often desirable to simply collect the summary statistics from each generator. The following interface function to this module makes it importable from another module. A client module can then run the entire suite of tests on a number of generators, gathering the statistical summary.

External Interface: run all tests on a given generator (71) +=



def summary( name="builtin", gen= random.randrange, samples=100000 ):
   s= [name]
   for t in theTestList:
      ti= t( gen, samples )
      s.append( ti.final() )
   return s

External Interface: run all tests on a given generator (71). Used by rngtest.py (3).

For purposes of comparing generators, the following function returns the list of tests and acceptable ranges of results.

External Interface: run all tests on a given generator (72) +=



def tests( ):
   r = []
   for t in theTestList:
      ti= t( lambda x:0, 0 )
      r.append( (ti.testName,) + ti.range )
   return r

External Interface: run all tests on a given generator (72). Used by rngtest.py (3).

The main() function exercises the various empirical tests. It imports the random module and uses the randrange() function as the default generator to test.

As an alternative, this function can be changed to call validate(), and validate that all of the tests are implemented correctly.

External Interface: run all tests on a given generator (73) +=



import random
def main( gen=random.randrange, samples=100000 ):
   for t in theTestList:
      detail( t, gen, samples )

if __name__ == "__main__":
   main()

External Interface: run all tests on a given generator (73). Used by rngtest.py (3).

Special Functions

There is one "special functions" required to support the Permutation Test.

Next Permutation

Synopsis

nextPerm(p) -> sequence

Return the next permutation of sequence p.

Arguments

p
Sequence (list or tuple) of unique elements.

Description

This is the standard next permutation algorithm, as explained by Djikstra in A Discipline of Programming.

next permutation function (74) =



def nextPerm( p ):
   # 1. find right end of increasing order indices
   i= len(p)-2
   while i >= 0 and p[i] >= p[i+1]: i -= 1
   # 2. are we at the end of the permutation cycle?
   if i >= 0:
      # 3. set j to index of largest element between i and the end
      j= i+1
      k= i+1
      while k != len(p):
         if p[k] > p[i] and p[k] < p[j]: j= k
         k += 1
      # 4. swap i (first out of order) and j (largest)
      p[i], p[j] = p[j], p[i]
      # 5. ripple the swap from i+1 to end
      k= 1
      while k+i<len(p)-k:
         p[k+i], p[len(p)-k]= p[len(p)-k], p[k+i]
         k += 1
      return p
   return None

next permutation function (74). Used by Permutation Test Class: all permutations present in serial groups? (38) :: rngtest.py (3).

chi2stats (χ²) Module

Design Overview

The chi2stats module provides the necessary functions for computing a variety of χ² metric values. This includes the basic χ² metric for two frequency distributions, the probability of a χ² metric for given degrees of freedom, and the χ² metric for given degrees of freedom and probability.

This is supported by functions to compute factorials, binomial coefficients, Bernoulli numbers, Stirling numbers, gamma and partial gamma values. These are sometimes referred to as transcendental functions, or special functions.

Additionally, two functions are provided to locate zero crossings in well-behaved functions, like the chi-squared probabilty function. One of these brackets a zero in a function with a range where the sign of the function switches from negative to positive. The other bisects a range that contains a zero crossing to narrow the range of the crossing.

This module is written in pure Python. It duplicates functionality in Travis Oliphant's cephesmodule, but doesn't depend on C-language binaries. It also duplicates functions in Gary Strangman's stats module, but isn't derived from Numerical Recipes in C.

The χ² test compares two frequency distributions. It computes a metric that indicates the probability that the two distributions are the same. The actual value of the metric is compared to the possible vales using the incomplete gamma function, γ(a,x).

The factorial function, which tells how many permutations are possible from a domain of values, is a special case of the complete Gamma function, Γx.

Packaging

The chi2stats module defines a library of useful functions that can be imported by other applications. These functions are organized into four separate areas. They are packaged as a single module to simplify the use of these more advanced math functions.

"chi2stats.py" (75) =


chi2 doc string (76) (77) 
chi2 imports (78) 

special functions: factorial, binomial coefficient, etc. (86) 
zero-finding functions: bracket a zero and find the zero (99) 
chi-squared functions (79) 
correlation function (83) 

Test scenarios (102) 

"chi2stats.py" (75).

The first section defines the "special functions": factorial, binomial coefficient, Stirling numbers, Bernoulli numbers, the log-gamma function and the partial gamma function.

The second section provides two functions that are used to find a zero of another function. One of these brackets a guess at the zero to return a tuple describing a range in which a zero definitely occurs. The other bisects a range, narrowing it around a zero.

The third section provides the implementation of functions that directly apply to problems working with the chi-squared metric. These include a function to compute the chi-squared metric, and two functions for evaluating the results of a chi-squared computation.

The fourth section is the correlation coefficient function.

The module includes a test suite, executed when the module is the main script, that is used to certify the results for a specific port of Python.

The following module header provides the essential __doc__ string for the .py file.

chi2 doc string (76) =


"""Chi-Squared Stats Library.

Functions for handling chi-squared statistics:

chi2(actual,expected)      compute the chi-squared metric for two 
                           distributions.
chi2P(value,degF)          probability that this observed 
                           chi-squared metric value indicates a 
                           difference between actual and 
                           expected
chi2Val(degF,prob)         the chi-squared metric value for the 
                           given degrees of freedom and 
                           probability

Function for computing correlation coefficient:

correlate(u,v)             compute the correlation coefficient
                           for two sample sequences
                           
Special functions:

fact(n) -> number          factorial: permutations of of n things
bico(n,r) -> number        binomial coefficient: combinations 
                           n things taken r at a time
Stirling(n,k) -> number    Stirling number: k-sized subsets of 
                           n elements
Bernoulli(p) -> number     Bernoulli number p
gammln(x) -> number        log of gamma function for x
gammp(a,x) -> number       incomplete gamma function P(a,x)

Root-finding functions:

bracket(f,g) -> (low,high) bracket a zero of the function near g
bisect(f,r,EPS) -> number  find a zero of a function, f, in the 
                           range, r, with tolerance of EPS
"""

chi2 doc string (76). Used by chi2stats.py (75).

This is a good place to slide in the warnings.

chi2 doc string (77) +=



### DO NOT EDIT THIS FILE!
### It was created by ../pyWeb/pyweb.py, __version__='$Revision$'.
### From source chi2stats.w modified Sat Mar 15 12:28:02 2003.
### In working directory '/Users/slott/Documents/Projects/rngtest'.

chi2 doc string (77). Used by chi2stats.py (75).

This module depends on the math module.

chi2 imports (78) =



import math

chi2 imports (78). Used by chi2stats.py (75).

Chi-Squared Functions

There are three chi-squared related functions in this module.

chi-squared functions (79) =



chi-squared value: compare actual vs. expected frequencies (80) 
chi-squared probability: what are odds of a given chi-squared value? (81) 
chi-squared value of a given probability (82) 

chi-squared functions (79). Used by chi2stats.py (75).

Chi-Squared (χ²)

Synopsis

chi2(actual,expected) -> float

Return χ² metric between actual and expected distributions.

Arguments

act
Sequence of actual frequencies
exp
Sequence of expected frequencies

Description

The χ² metric comparse an actual distribution with a theoretical distribution. If the χ² metric is too low, there is a low probability of the distributions having any random variation; a high probability that the distributions are identical. If the metric is too high, there is a low probability of the distributions being the same; a high probability that the distributions have more than simple random differences.

The chi2() function will accept a sequence of actual frequencies and a sequence of expected frequencies. It will return the χ² value for comparison of these two sequences. Both sequences must be the same length and represent frequencies in the same order.

To simplify some of the test algorithms, we skip entries where the expected value is zero. This allows us to have unused entries in a frequency table. For example, the coupon collector's test has a large number of unused entries in the frequency table.

We use the built-in zip() function to interleave the two lists, creating a sequence of tuples of ( actual, expected ). This sequence of tuples is used with the multiple-assignment for loop to assign a value from actual to the variable a, and a value from expected to the variable e. This allows a simple, elegant statement to drive the basic comparison function.

This algorithm can be viewed as a variation based on the built in reduce() function. It could be coded with something like the following:

reduce( lambda v,x:v+x[1] and (x[0]-x[1])**2/x[1], zip(act,exp), 0.0 )

It could also be coded using a list comprehension, as follows:

reduce( lambda a,b:a+b, [ e and (a-e)**2/e for a,e in zip(act,ext) ], 0.0 )

However, these do not seem to simplify or clarify the operation. In particular, the e and ... (and a[1] and...) construct that is used to skip unused entries in the expected frequency distribution is particularly opaque.

chi-squared value: compare actual vs. expected frequencies (80) =



def chi2( act, exp ):
    """Return chi-squared metric between actual and expected observations."""
    v= 0
    for a,e in zip(act,exp):
        if e > 0:
            v += (a-e)**2/e
    return v

chi-squared value: compare actual vs. expected frequencies (80). Used by chi-squared functions (79) :: chi2stats.py (75).

Chi-Squared Probability

Synopsis

chi2P( value, degF ) -> probability

Return the probability of a given χ² value for a number of degrees of freedom, degF, P(value|degF).

This probability, P(value|degF), is defined as the incomplete gamma function, γ(degF/2,value/2).

Arguments

value
The χ² metric value
degF
The degrees of freedom

Description

The chi2P() function evaluates a χ² value, for a given number of degrees of freedom, and returns the probability that the observed distribution matched the expected distribution.

chi-squared probability: what are odds of a given chi-squared value? (81) =



def chi2P( chi2V, degF ):
   """Return P(value|degF) = P(degF/2,value/2)."""
   return gammp( degF/2.0, chi2V/2.0 )

chi-squared probability: what are odds of a given chi-squared value? (81). Used by chi-squared functions (79) :: chi2stats.py (75).

Dependencies

This uses the Incomplete Gamma function, γ(x).

Chi-Squared Value, Given Probability

Synopsis

chi2Val( degF, prob ) -> float

Return the χ² value that corresponds with given degrees of freedom, degF, and probability, prob.

Arguments

degF
Degrees of freedom
prob Probability

Description

The chi2Val() function determines the χ² value at a given probability level. It does this by first bracketing the zeo of the chi2P() function, then bisecting the range looking for the given probability level. This uses a standard bisection algorithm.

chi-squared value of a given probability (82) =



def chi2Val( degF, prob ):
   """Return the chi-squared value that corresponds with given degrees of freedom and probability."""
   theFunction= lambda x,d=degF,p=prob: chi2P(x,d)-p
   return bisect( theFunction, bracket( theFunction, 1 ) )

chi-squared value of a given probability (82). Used by chi-squared functions (79) :: chi2stats.py (75).

Dependencies

This uses the two zero-finding functions: bracket a zero, and bisect a region to find a zero. It also uses the Chi-squared probability function.

Correlation Function

This module provides the standard correlation coefficient function.

Correlation Coefficient

Synopsis

correlate(u,v) -> float

Return the correlation coefficient between sample sets

Arguments

u
Sequence of values
v
Sequence of values

Description

The correlate() function computes the correlation coefficient between two lists, using the standard statistical algorithm.

This function has two internal "helper" functions.

The sum() function computes the sum of a list of values returning a floating point number. The individual values are summed using long numbers to get a precise answer, which is then transformed to a floating point number for further calculation.

The mul() function multiplies corresponding elements in two lists of values, returning a third list.

correlation function (83) =



from math import sqrt
def correlate( u, v ):
    """Return correlation coefficient between sequences u and v."""
    sum of a sequence definition (84) 
    mul of two sequences definition (85) 
    n = float(len(u))
    top = n * sum( mul(u,v) ) - sum(u)*sum(v)
    bottom = sqrt( ( n*sum( mul(u,u) ) - sum(u)**2 )*( n*sum( mul(v,v) ) - sum(v)**2 ) )
    return top/bottom

correlation function (83). Used by chi2stats.py (75).

sum of a sequence definition (84) =



def sum( v ):
    return float(reduce( lambda a,b:a+b, v, 0L ))

sum of a sequence definition (84). Used by correlation function (83) :: chi2stats.py (75).

mul of two sequences definition (85) =



def mul( u, v ):
    return [ u[i]*v[i] for i in range(len(u)) ]

mul of two sequences definition (85). Used by correlation function (83) :: chi2stats.py (75).

Special Functions

There are a number of "special functions" required to support the analysis of expected values for the various empirical tests. These functions are for computing factorial, Stirling numbers, Bernoulli numbers, the binomial coefficient, log of the gamma function and the partial gamma function.

special functions: factorial, binomial coefficient, etc. (86) =


Factorial Function, fact(n) (87) 
Binomial Coefficient, bico(n,r) (88) 
Stirling Numbers, Stirling(n,k) (89) 
Bernoulli Numbers, Bernoulli(p) (90) 
Log-Gamma Function, gammln(x), and two implementations gammln1(x) and gammln2(x) (91) (92) (93) (94) (95) 
Incomplete Gamma Function, gammp(a,x) (96) (97) (98) 

special functions: factorial, binomial coefficient, etc. (86). Used by chi2stats.py (75).

Factorial

Synopsis

fact(n) -> number

Return the number of permutations of n things.

Arguments

n
number to compute factorial of

Description

The factorial function computes the number of permutations of a set.

The fact() function will accept a number, n, and compute the factorial of that number. It uses the standard recursive algorithm and long values.

As a speed-up, it caches previously computed factorial values in a dictionary called cache_fact to save recomputing them. We seed this cache with a few base values for factorial.

Factorial Function, fact(n) (87) =



cache_fact = { 0:1L }
def fact( nn ):
    """Return the number of permutations of nn things."""
    n= long(nn)
    if n < 0: raise Exception("invalid argument %s to fact()"%nn)
    if cache_fact.has_key( n ): return cache_fact[n]
    return cache_fact.setdefault(n,n*fact(n-1))

Factorial Function, fact(n) (87). Used by special functions: factorial, binomial coefficient, etc. (86) :: chi2stats.py (75).

Binomial Coefficient

Synopsis

bico(n,r) -> number

Return the number of combinations of n things taken r at a time.

Arguments

n
set size
r
selections

Description

The combinations function computes the number of possible combinations of selections of a given size from a larger set.

The bico() function will accept a set size and a selection size and compute the number of combinations of selection from the larger set. It uses the standard algorithm and long numbers.

Binomial Coefficient, bico(n,r) (88) =



def bico( n, r ):
    """Return the number of combinations of n things taken r at a time."""
    if n < 0 or r < 0: raise Exception("invalid arguments %s, %s to bico()"%(n,r))
    return fact(n)/(fact(r)*fact(n-r))

Binomial Coefficient, bico(n,r) (88). Used by special functions: factorial, binomial coefficient, etc. (86) :: chi2stats.py (75).

Dependencies

This depends on the Factorial function.

Stirling Numbers

Synopsis

Stirling(n,k) -> long

Return the Stirling number of the second kind for a set with n elements partitioned into k disjoint non-empty sets.

Arguments

n
number of elements
k
number of disjoint, non-empty sets

Description

Knuth, section 1.2.6, provides the definition of Stirling numbers. Essentially this is the number of ways that a set can be partitioned. Formula (42) in section 1.2.6 provides a recursive definition, and formulas (44) and (46) provide some additional information required.

The Stirling() function will accept two parameters, the number of elements, n, and the number of partitions, k, and computes the Stirling number. This uses long values and the recursive algorithm described by Knuth.

As a simple speed-up, any Stirling number that is computed is cached to save recomputing it. Since a dictionary can be indexed by a tuple, we can use the parameters as a tuple to index into the cache, cache_stirl, and locate a previously computed Stirling number.

We break the function into two parts. The external part checks the cache, returning the Stirling number if it is known. Otherwise, it uses an internal function, stirlf(), that uses the recursive algorithm to compute a new Stirling number.

Stirling Numbers, Stirling(n,k) (89) =



cache_stirl = { }
def Stirling(n,k):
    """Return the Stirling number of the second kind
    for a set with n elements partitioned into k 
    disjoint non-empty sets."""

    def stirlf(n,k):
        """Actual Stirling computation, not using cache"""
        if k == n: return 1L
        if k == 1: return 1L
        return long(Stirling(n-1,k-1) + k*Stirling(n-1,k))

    if cache_stirl.has_key( (n,k) ): return cache_stirl[ (n,k) ]
    return cache_stirl.setdefault( (n,k), stirlf(n,k) )

Stirling Numbers, Stirling(n,k) (89). Used by special functions: factorial, binomial coefficient, etc. (86) :: chi2stats.py (75).

Bernoulli Numbers

Synopsis

Bernoulli(p) -> float

Return the Bernoulli number p.

Arguments

p
Bernoulli number to compute
Description

The Bernoulli() function will accept a parameter, p, the Bernoulli number to compute.

As a simple speed-up, any Bernoulli number that is computed is cached to save recomputing it. The variable cache_bern, keeps all of the previously computed Bernoulli numbers.

We break the function into two parts. The external part checks the cache, returning the Bernoulli number if it is known. Otherwise, it uses an internal function, bernf(), that uses the recursive algorithm to compute a new Bernoulli number.

Bernoulli Numbers, Bernoulli(p) (90) =



cache_bern = { 0:1.0 }
def Bernoulli(p):
    """Return the Bernoulli number p."""

    def bernf(p):
        """Actual Bernoulli computation, not using cache"""
        sum= 0.0
        for k in range(p):
         sum += bico(p+1,k)*Bernoulli(k)
        return -(1.0/(p+1))*sum

    if cache_bern.has_key( p ): return cache_bern[p]
    return cache_bern.setdefault( p, bernf(p) )

Bernoulli Numbers, Bernoulli(p) (90). Used by special functions: factorial, binomial coefficient, etc. (86) :: chi2stats.py (75).

Dependencies

This depends on the Binomial Coefficient function.

Log-Gamma Function 1

Synopsis

gammln1(x) -> float

Return the log of the gamma function, Γ, at x.

Arguments

x
value for which to compute log of the gamma function.

Description

This is from the ACM's collected algorithms, number 291.

The gamma function of an integer, n, has the property that log(n!)=log(Γ(n+1)).

A variation on Stirling's approximation is used for values larger than 7.0. For smaller values, the following equivalence is used to evaluate this function with an argument > 7.0.

k = 8 - int(xx)
log(gamma(xx)) = log(gamma(xx+k)) - log( prod( [ xx+i for i in range(k) ] ) )

Note that the Python version above (with the list comprehension) is slower than the explicit loops used below.

Log-Gamma Function, gammln(x), and two implementations gammln1(x) and gammln2(x) (91) =



def gammln1( xx ):
   """Return the log of the gamma function at xx>0."""
   if xx < 7.0:
      # compute the product for the equivalence
      f= xx
      while xx < 7.0:
        xx = xx + 1.0
        f = f * xx
      return lgm1( xx+1.0 ) - math.log(f) 
   else:
      return lgm1( xx )

Log-Gamma Function, gammln(x), and two implementations gammln1(x) and gammln2(x) (91). Used by special functions: factorial, binomial coefficient, etc. (86) :: chi2stats.py (75).

The lgm1() function evaluates a version of the Stirling approximation for the gamma function, Γ.

Log-Gamma Function, gammln(x), and two implementations gammln1(x) and gammln2(x) (92) +=



def lgm1( x ):
   """Return Stirling approximation of log of gamma at x>7.0."""
   A=   0.918938533204673  # log(sqrt(2*pi))
   c0= -0.000595238095238
   c1=  0.000793650793651
   c2= -0.002777777777778
   c3=  0.083333333333333
   z= 1.0/x**2
   return (x-0.5)*math.log(x) - x + A + (((c0*z+c1)*z+c2)*z+c3)/x

Log-Gamma Function, gammln(x), and two implementations gammln1(x) and gammln2(x) (92). Used by special functions: factorial, binomial coefficient, etc. (86) :: chi2stats.py (75).

Log-Gamma Function 2

Synopsis

gammln2(x) -> float

Return the log of the gamma function, Γ at x.

Arguments

x
value for which to compute log of the gamma function.

Description

This is from the ACM's collected algorithms, number 309.

The gamma function of an integer, n, has the property that log(n!)=log(Γ(n+1)).

A variation on Stirling's approximation is used for values larger than 7.0. For smaller values, the following equivalence is used to evaluate this function with an argument > 7.0.

k = 8 - int(xx)
log(gamma(xx)) = log(gamma(xx+k)) - log( prod( [ xx+i for i in range(k) ] ) )

Log-Gamma Function, gammln(x), and two implementations gammln1(x) and gammln2(x) (93) +=



def gammln2( xx ):
   """Return the log of the gamma function at xx>0."""
   if xx < 7.0:
      f= xx
      while xx < 7.0:
        xx = xx + 1.0
        f = f * xx
      return lgm2(xx+1.0) - math.log(f)
   else:
      return lgm2(xx)

Log-Gamma Function, gammln(x), and two implementations gammln1(x) and gammln2(x) (93). Used by special functions: factorial, binomial coefficient, etc. (86) :: chi2stats.py (75).

The lgm2() function evaluates the Stirling approximation for the gamma function, Γ.

Log-Gamma Function, gammln(x), and two implementations gammln1(x) and gammln2(x) (94) +=



def lgm2( w ):
   """Return Stirling approximation of log of gamma at x>7.0."""
   A=   0.918938533204673  # log(sqrt(2*pi))
   # c[i] = Bernoulli(2*i)/(2*i*(2*i-1))
   c = [ 0, 1./12., -1./360., 
      1./1260.,   -1./1680., 
      1./1188., -691./360360.,
      1./156., -3617./122400., 
      43867./244188., -174611./125400.,
      77683./5796., -236364091./1506960.,
      657931./300., -3392780147./93960. ]
   den= float(w)
   w2 = float(w*w)
   presum= (w-0.5)*math.log(w)-w+A
   for ci in c[1:]:
      sum = presum + ci/den
      if sum == presum: break
      den = den * w2
      presum = sum
   return sum

Log-Gamma Function, gammln(x), and two implementations gammln1(x) and gammln2(x) (94). Used by special functions: factorial, binomial coefficient, etc. (86) :: chi2stats.py (75).

Note that we have hard-coded 14 Bernoulli numbers. A somewhat nicer solution is to have a set of Bernoulli numbers initalized when the module is imported. This would eliminate mystery constants and allow fine-tuning the desired accuracy.

c = [ Bernoulli(2*i)/(2*i*(2*i-1)) for i in range 20 ]

Also, since our Bernoulli function caches previous results, we could consider replacing the c array with direct calls to the Bernoulli function.

Log-Gamma Function

Synopsis

gammln(x) -> float

Return the log of the gamma function at x.

Arguments

x
value for which to compute log of the gamma function.

Description

The gammln() function chooses either the gammln1() function or the gammln2() function as the final implementation. The results are similar, but there is some very slight difference in the performance.

Log-Gamma Function, gammln(x), and two implementations gammln1(x) and gammln2(x) (95) +=



def gammln(x,method=gammln1):
   """Return the log of the gamma function at xx>0."""
   return method(x)

Log-Gamma Function, gammln(x), and two implementations gammln1(x) and gammln2(x) (95). Used by special functions: factorial, binomial coefficient, etc. (86) :: chi2stats.py (75).

Dependencies

This depends on the two alternative implementations: Log-Gamma 1, ACM algorithm 291 and Log-Gamma 2, ACM algorithm 309.

Incomplete Gamma Function

Synopsis

gammp( a, x ) -> float

Return the partial (or incomplete) gamma function at a from 0 to x, γ(a,x).

Arguments

a
"degrees of freedom", the value for which to compute the partial gamma function
x
"chi-squared value", actually the range over which the integration occurs

Description

Depending on the relationship between a and x, there are two appropriate algorithms for computing the incomplete gamma function. When x < a+1, the series converges rapidly. When x >= a+1, the continued fraction is useful.

Incomplete Gamma Function, gammp(a,x) (96) =



def gammp( a, x ):
   """Return the incomplete gamma function, P, at a from 0 to x."""
   # Choose the series or continued fraction approximations.
   if x < 0 or a <= 0:
      raise Exception("invalid arguments %s,%s to gammp()"%a,x)
   if x == 0: return 0.0
   if x < a+1.0:
      gamser, gln = gser2( a, x )
      return gamser
   else:
      gammcf, gln = gcf2( a, x )
      return 1.0 - gammcf

Incomplete Gamma Function, gammp(a,x) (96). Used by special functions: factorial, binomial coefficient, etc. (86) :: chi2stats.py (75).

These algorithms come from CALGO 435 and 654.

This is the series, for x < a+1. The original FORTRAN is as follows:

   11 T1 = A*ALOG(X) - X
      R = EXP(T1)/GAMMA(A)

   50 APN = A + 1.0
      T = X/APN
      WK(1) = T
      DO 51 N = 2,20
         APN = APN + 1.0
         T = T*(X/APN)
         IF (T .LE. 1.E-3) GO TO 60
   51 WK(N) = T
      N = 20
C
   60 SUM = T
      TOL = 0.5*ACC
   61 APN = APN + 1.0
      T = T*(X/APN)
      SUM = SUM + T
      IF (T .GT. TOL) GO TO 61
C
      MAX = N - 1
      DO 70 M = 1,MAX
         N = N - 1
   70 SUM = SUM + WK(N)
      ANS = (R/A)*(1.0 + SUM)
      QANS = 0.5 + (0.5 - ANS)
      RETURN

The overall approach develops a series of values for the high-order bits (down to 0.001) of the answer, which are saved in the WK array. Then a series of low-order bits (from 0.0001 to the TOL value of 1.E-15) are computed into the SUM variable. The high-order bits are added to the SUM variable last. The apparent objective is to somehow preserve the low-order bits.

This approach doesn't seem defensible, since the high-order values contain low-order bits that will change the low-order bits accumulated into the SUM variable.

Incomplete Gamma Function, gammp(a,x) (97) +=



def gser2( a, x ):
   """Return the incomplete gamma function at a from 0 to x where x<a+1."""
   # series approximation for partial gamma when x < a+1.0
   gln= gammln(a)
   r= math.exp( a*math.log(x) - x - gln )
   a1 = a + 1.0
   t= x / a1
   sum= t
   while t > 1.0E-15:
      a1 = a1 + 1.0
      t = t * (x/a1)
      sum = sum + t
   ans = (r/a)*(1.0+sum)
   return ( ans, gln )

Incomplete Gamma Function, gammp(a,x) (97). Used by special functions: factorial, binomial coefficient, etc. (86) :: chi2stats.py (75).

This is the continued fraction, for x >= a+1. The original FORTRAN is as follows:

   11 T1 = A*ALOG(X) - X
      R = EXP(T1)/GAMMA(A)

  170 TOL = AMAX1(5.0*E,ACC)
      A2NM1 = 1.0
      A2N = 1.0
      B2NM1 = X
      B2N = X + (1.0 - A)
      C = 1.0
  171 A2NM1 = X*A2N + C*A2NM1
      B2NM1 = X*B2N + C*B2NM1
      AM0 = A2NM1/B2NM1
      C = C + 1.0
      CMA = C - A
      A2N = A2NM1 + CMA*A2N
      B2N = B2NM1 + CMA*B2N
      AN0 = A2N/B2N
      IF (ABS(AN0 - AM0) .GE. TOL*AN0) GO TO 171
C
      QANS = R*AN0
      ANS = 0.5 + (0.5 - QANS)
      RETURN

Incomplete Gamma Function, gammp(a,x) (98) +=



def gcf2( a, x ):
   """Return the incomplete gamma function at a from x to infinity where x>=a+1."""
   # continued fraction approximation for partial gamma when x >= a+1.0
   gln= gammln2(a)
   r= math.exp( a*math.log(x) - x - gln )
   a2nm1 = 1.0
   a2n = 1.0
   b2nm1 = x
   b2n = x + (1.0 - a)
   c = 1.0
   for i in range(100):
      a2nm1 = x*a2n + c*a2nm1
      b2nm1 = x*b2n + c*b2nm1
      am0 = a2nm1/b2nm1
      c = c + 1.0
      cma = c - a
      a2n = a2nm1 + cma*a2n
      b2n = b2nm1 + cma*b2n
      an0 = a2n/b2n
      if abs(an0-am0) < 1.0E-015*an0: break
   ans= r*an0
   return ( ans, gln )

Incomplete Gamma Function, gammp(a,x) (98). Used by special functions: factorial, binomial coefficient, etc. (86) :: chi2stats.py (75).

Dependencies

This depends on the Log-Gamma function.

Zero-Finding

The problem of locating a zero of a function can be very difficult. A general procedure is almost an impossibility. For a specific function, or a family of functions that are reasonably well-behaved, zero finding is possible.

In this package, we are locating the zero-crossing of the γ(a,x) function where the a value ("degrees of freedom") is fixed, and the range over which we integrate ("chi-squared metric") is allowed to vary. We locate the chi-squared value that provides a specific probability.

This zero-finding is a two-step process. The first steps locates a range that brackets the zero. The second step then narrows the range (by bisection) to locate the zero. Both of these are primitive, but reliable tools for finding zero-values of a function.

This implementation is built on two general-purpose functions that accept a target function and initial guesses at the range that brackets the zero. These general-purpose functions are not suitable for all zero-finding, but are suitable for a wide variety of continuous functions.

zero-finding functions: bracket a zero and find the zero (99) =



bracket a zero of a given function around a guess at the zero (100) 
bisect a range to locate a zero (101) 

zero-finding functions: bracket a zero and find the zero (99). Used by chi2stats.py (75).

Bracket A Zero

Synopsis

bracket(function,guess) -> (low,high)

Return a range which brackets a zero of function around the initial guess value for the root.

Arguments

function
Reference to a function; an example is lambda n,b=someValue:n**2 - b: this will locate n as the square root of someValue.
guess
Guess at possible value of zero

Description

bracket a zero of a given function around a guess at the zero (100) =



def bracket( function, guess ):
   """Return a range which brackets a zero of function around the initial guess."""
   # locate a point where f(c1) < 0
   tries= 144
   c1= guess
   while function(c1) > 0 and tries > 0:
      c1=c1/2.0
      tries -= 1
   if tries == 0: return None
   # locate a point where f(c2) > 0
   tries= 144
   c2= guess
   while function(c2) < 0 and tries > 0:
      c2=c2*2.0
      tries -= 1
   if tries == 0: return None
   return ( c1, c2 )

bracket a zero of a given function around a guess at the zero (100). Used by zero-finding functions: bracket a zero and find the zero (99) :: chi2stats.py (75).

Bisect to Find Zero

Synopsis

bisect(function,range,EPS=0.00001) -> number

Bisect the range (a tuple of low and high) to find a zero of function until the range is narrower then some small number, EPS.

Arguments

function
Reference to a function; an example is lambda n,b=someValue:n**2 - b: this will locate n as the square root of someValue.
range
Tuple with the low and high end of a range of values that contain a root.

Description

bisect a range to locate a zero (101) =



def bisect( func, range_, EPS=0.000001 ):
   """Bisect the range (a tuple of low and high) to find a zero of func
   until the range is narrower then some small number, EPS."""
   tries= 144
   c1, c2 = range_
   if c1 == c2: raise Exception("invalid range (%d,%d) in bisect()"%range_)
   while abs( (c2-c1)/c2 ) > EPS and tries > 0:
      mid= (c1+c2)/2
      f_atmid= func(mid)
      if f_atmid < 0:
         c1= mid
      elif f_atmid > 0:
         c2= mid
      else:
         return mid
      tries -= 1
   return c1

bisect a range to locate a zero (101). Used by zero-finding functions: bracket a zero and find the zero (99) :: chi2stats.py (75).

Tests

A number of tests are provided:

Test scenarios (102) =



test 1: log-gamma, gammln() (103) 
test 2: parial gamma, gammp() (104) 
test 3: chi-squared probability for a given value, chi2p() (105) 
test 4: chi-squared value at a given probability, chi2Val() (106) 
test 5: Bernoulli numbers, Bernoulli() (107) 

Test scenarios (102). Used by chi2stats.py (75).

The _test1() function tests the basic gammln() function to be sure that it produces log-gamma values. Since (n!)=Γ(n+1), we can easily check these values against n! values.

test 1: log-gamma, gammln() (103) =



def _test1():
   print "test gammln() function"
   for n in range(10):
      f= fact(n)
      f1= math.exp( gammln1(n+1) )
      f2= math.exp( gammln2(n+1) )
      print "%2d %10d %10.0f %10.0f" % (n, f, f1, f2)
   for n in range(1,21):
      x= n/2.0
      print "%4.1f %10f %10f" % ( x, gammln1(x), gammln2(x) )

test 1: log-gamma, gammln() (103). Used by Test scenarios (102) :: chi2stats.py (75).

The _test2() function tests the basic gammp() function from a picture in Numerical Recipes in C. This test is somewhat subjective, as reading the picture is difficult.

test 2: parial gamma, gammp() (104) =



def _test2():
   print "test gammp() function"
   for a in [ 0.5, 1.0, 3.0, 10.0 ]:
      print "a=",a
      for x in [ .25, .5, 1, 2, 4, 6, 8, 10, 12, 14 ]:
         print x,gammp(a,x)

test 2: parial gamma, gammp() (104). Used by Test scenarios (102) :: chi2stats.py (75).

The _test3() function tests the basic Chi-squared probability function. Values were extracted from the Microsoft Excel program to compare the results of this function.

test 3: chi-squared probability for a given value, chi2p() (105) =



def _test3():
   print "test chi2P() function"
   print "%.2f %.2f" % (chi2P( 0.114831620490197,3),0.01)
   print "%.2f %.2f" % (chi2P( 0.351845959260105,3),0.05)
   print "%.2f %.2f" % (chi2P( 0.584375459244772,3),0.10)
   print "%.2f %.2f" % (chi2P( 2.3659727453222,  3),0.50)
   print "%.2f %.2f" % (chi2P( 6.2513944516967,  3),0.90)
   print "%.2f %.2f" % (chi2P( 7.8147247029009,  3),0.95)
   print "%.2f %.2f" % (chi2P(11.344882119471,   3),0.99)

test 3: chi-squared probability for a given value, chi2p() (105). Used by Test scenarios (102) :: chi2stats.py (75).

The _test4() function tests the ability to produce an accurate table of chi-squared values.

test 4: chi-squared value at a given probability, chi2Val() (106) =



def _test4():
   print "test chi2Val() function"
   probSet= [ 0.01, 0.05, 0.1, 0.5, 0.9, 0.95, 0.99 ]
   print "DegF"," ".join( map(lambda x:'%-9g'%x,probSet) )
   for f in range(1,11):
      print "%4d" % f,
      for p in probSet:
         v= "%.4g" % chi2Val( f, p )
         print "%-9s" % v,
      print

test 4: chi-squared value at a given probability, chi2Val() (106). Used by Test scenarios (102) :: chi2stats.py (75).

The _test5() function prints the first few Bernoulli numbers.

test 5: Bernoulli numbers, Bernoulli() (107) =



def _test5():
   actual= [ 1.0, -1/2.0, 1/6.0, 0, -1/30.0, 0, 1/42.0, 0, -1/30.0, 0, 5/66.0,
         0, -691/2730., 0, 7/6.0, 0, -3617/510.0, 0, 43867/798.0, 0, -174611/330.0]
   print "actual Bernoulli numbers, computed Bernoulli numbers"
   for p in range(len(actual)):
      print "%2d %12.7f %12.7f" % (p,Bernoulli(p),actual[p])
   print "Bernoulli numbers used in log-gamma approximation"
   for i in range(1,21):
      print "%2d %2d %24.8f" % (i,2*i,Bernoulli(2*i)/(2*i*(2*i-1)))

test 5: Bernoulli numbers, Bernoulli() (107). Used by Test scenarios (102) :: chi2stats.py (75).

This is the main program, used when this is the only module.

"chi2stats.py" (108) +=



if __name__ == "__main__":
   _test1()
   _test3()
   _test4()
   _test5()

"chi2stats.py" (108).

Sample Output

The following section is sample output from running this application on a Dell Latitude CPiA with Windows 98 and Python 2.3 (#2, Jul 30 2003, 11:45:28) [GCC 3.1 20020420 (prerelease)].

test gammln() function
 0          1          1          1
 1          1          1          1
 2          2          2          2
 3          6          6          6
 4         24         24         24
 5        120        120        120
 6        720        720        720
 7       5040       5040       5040
 8      40320      40320      40320
 9     362880     362880     362880
 0.5   0.572365   0.572365
 1.0  -0.000000   0.000000
 1.5  -0.120782  -0.120782
 2.0  -0.000000   0.000000
 2.5   0.284683   0.284683
 3.0   0.693147   0.693147
 3.5   1.200974   1.200974
 4.0   1.791759   1.791759
 4.5   2.453737   2.453737
 5.0   3.178054   3.178054
 5.5   3.957814   3.957814
 6.0   4.787492   4.787492
 6.5   5.662562   5.662562
 7.0   6.579251   6.579251
 7.5   7.534364   7.534364
 8.0   8.525161   8.525161
 8.5   9.549267   9.549267
 9.0  10.604603  10.604603
 9.5  11.689333  11.689333
10.0  12.801827  12.801827
test chi2P() function
0.01 0.01
0.05 0.05
0.10 0.10
0.50 0.50
0.90 0.90
0.95 0.95
0.99 0.99
test chi2Val() function
DegF 0.01      0.05      0.1       0.5       0.9       0.95      0.99     
   1 0.0001571 0.003932  0.01579   0.4549    2.706     3.841     6.635    
   2 0.0201    0.1026    0.2107    1.386     4.605     5.991     9.21     
   3 0.1148    0.3518    0.5844    2.366     6.251     7.815     11.34    
   4 0.2971    0.7107    1.064     3.357     7.779     9.488     13.28    
   5 0.5543    1.145     1.61      4.351     9.236     11.07     15.09    
   6 0.8721    1.635     2.204     5.348     10.64     12.59     16.81    
   7 1.239     2.167     2.833     6.346     12.02     14.07     18.48    
   8 1.646     2.733     3.49      7.344     13.36     15.51     20.09    
   9 2.088     3.325     4.168     8.343     14.68     16.92     21.67    
  10 2.558     3.94      4.865     9.342     15.99     18.31     23.21    
actual Bernoulli numbers, computed Bernoulli numbers
 0    1.0000000    1.0000000
 1   -0.5000000   -0.5000000
 2    0.1666667    0.1666667
 3    0.0000000    0.0000000
 4   -0.0333333   -0.0333333
 5   -0.0000000    0.0000000
 6    0.0238095    0.0238095
 7    0.0000000    0.0000000
 8   -0.0333333   -0.0333333
 9    0.0000000    0.0000000
10    0.0757576    0.0757576
11   -0.0000000    0.0000000
12   -0.2531136   -0.2531136
13    0.0000000    0.0000000
14    1.1666667    1.1666667
15   -0.0000000    0.0000000
16   -7.0921569   -7.0921569
17   -0.0000000    0.0000000
18   54.9711779   54.9711779
19   -0.0000000    0.0000000
20 -529.1242424 -529.1242424
Bernoulli numbers used in log-gamma approximation
 1  2               0.08333333
 2  4              -0.00277778
 3  6               0.00079365
 4  8              -0.00059524
 5 10               0.00084175
 6 12              -0.00191753
 7 14               0.00641026
 8 16              -0.02955065
 9 18               0.17964437
10 20              -1.39243222
11 22              13.40286404
12 24            -156.84828463
13 26            2193.10333333
14 28          -36108.77125373
15 30          691472.26885132
16 32       -15238221.53940764
17 34       382900751.39142388
18 36    -10882266035.78448296
19 38    347320283765.00860596
20 40 -12369602142269.54296875

Application Script

Implementation

The following main script exercises a variety of published random number generators.

"rngmain.py" (109) =


"""Compare several random number generators."""
rngmain edit warning (111) 
rngmain imports (110) 

Algorithm 294 (112) 
Algorithm 266 (113) 
Cheney-Kincaid (114) 
Variation on Algorithm 266, from netlib.org (115) 

Main Report (116) 

"rngmain.py" (109).

This module imports the random module, for the builtin random number generator. It imports the rngtest module with the testing framework. It also imports pprint, to pretty-print sequences.

rngmain imports (110) =



import random
import rngtest
import pprint

rngmain imports (110). Used by rngmain.py (109).

The edit warning reminds people that the source should be changed, not the .py file that is created by pyWeb.

rngmain edit warning (111) =



### DO NOT EDIT THIS FILE!
### It was created by ../pyWeb/pyweb.py, __version__='$Revision$'.
### From source rngdoc.w modified Tue Apr 26 06:58:52 2005.
### In working directory '/Users/slott/Documents/Projects/rngtest'.

rngmain edit warning (111). Used by rngmain.py (109).

The

The rng294() generator is documented in ACM Collected Algorithms (CALGO) number 294.

Algorithm 294 (112) =



m = 2**28
c = 2**14 - 3
x = float(3*5*11)/float(m)
def rng294(b):
    """ACM CALGO 294 from netlib.org"""
    global x
    x = x * c
    x = x - int(x)
    return int(x*b)

Algorithm 294 (112). Used by rngmain.py (109).

The rng266() generator is documented in ACM Collected Algorithms (CALGO) number 266.

Algorithm 266 (113) =



iy = 1
def rng266(b):
    """ACM CALGO 266 from netlib.org"""
    global iy
    iy = (iy * 125) % 2796203
    r = float(iy)/2796203.0*(1.0+1.0E-6+1.0E-12)
    return int(r*b)

Algorithm 266 (113). Used by rngmain.py (109).

The rngck() generator is documented netlib.org, and is the Cheney-Kincaid generator.

Cheney-Kincaid (114) =



l = 256
def rngck(b):
    """Cheney-Kincaid from netlib.org"""
    global l
    l = (l * 16807L) % 2147483647L
    r= float(l)*4.6566128752458E-10
    return int(r*b)

Cheney-Kincaid (114). Used by rngmain.py (109).

The rng266a() generator is documented netlib.org, and references ACM Collected Algorithms (CALGO) number 266.

Variation on Algorithm 266, from netlib.org (115) =



xa = 32767
def rng266a(b):
    """ACM CALGO 266 from netlib.org"""
    global xa
    mult = [ 25, 25, 5 ]
    for m in mult:
        xa = (xa*m) % 67108864
    r= float(xa)*2.0**(-26)
    return int(r*b)

def check266a():
    for i in range(5000):
        rng266a(16)
    if xa == 40145055:
        print "Function rng266a() implemented correctly."
    else:
        print "Problem with function rng266a()."

Variation on Algorithm 266, from netlib.org (115). Used by rngmain.py (109).

The following script exercises the above generators. It imports the random module to make the builtin generator available. It imports the rngtest module that contains the empirical testing classes and functions. For debugging, it also imports the pprint module to make some sequences easier to read on output.

A sequence called generators is created to list a name and the actual function call for each generator. This makes it easy to iterate through the generators, calling the rngtest.summary() or rngtest.detail() functions for each generator.

The rngtest.tests() function returns background information for each test. This is used to print a three-line heading on the output. The background information includes the test name, the lowest acceptable metric value and the highest acceptable metric value.

Main Report (116) =




samples=1000
generators = [ ("random.randrange", random.randrange),
    ("rng294", rng294), ("rng266", rng266),
    ("rng266a", rng266a), ("rngck", rngck) ]
    
# headings
heading= rngtest.tests()
print "low,",
print ",".join( [ "%.4g" % l for n,l,h in heading ] )
print "high,",
print ",".join( [ "%.4g" % h for n,l,h in heading ] )
print "test,",
print ",".join( [ '"%s"' % n for n,l,h in heading ] )

# test results
for n,g in generators:
    r= rngtest.summary( n, g, samples )
    #pprint.pprint( r )
    print "%s," % r[0],
    print ",".join( [ "%.4g" % m for n,m,l,h in r[1:] ] )

Main Report (116). Used by rngmain.py (109).

Sample Output

The following section is sample output from running this application on an iMac with a PowerPC G4 (2.1) and Python 2.3 (#2, Jul 30 2003, 11:45:28) [GCC 3.1 20020420 (prerelease)].

Expected

low, 7.261,45.74,2.167,0.7107,19.28,13.09,1.145,1.145,1.145,6.57,0
high, 25,82.53,14.07,9.488,44.99,35.17,11.07,11.07,11.07,23.68,0
test, "Frequency Test","Serial Test","Gap Test","Poker Test","Coupon Collector's Test","Permutation Test","Runs Up - Dependent","Runs Up - Independent","Runs Up - Independent (Ritter)","Max of T Test","Serial Correlation"
random.randrange, 22.98,70.62,4.02,1.139,23.83,28.78,5.103,5.381,4.616,31.21,-0.01877
rng294, 8.992,65.76,7.46,4.682,31.1,20.14,7.357,1.505,1.162,6.918,-0.01056
rng266, 8.544,60.38,6.88,2.413,23.11,22.06,5.242,3.398,6.121,6.492,0.03033
rng266a, 10.24,53.73,6.52,1.345,38.22,29.36,7.921,5.75,8.304,3.424,0.03727
rngck, 10.75,64.22,7.76,2.667,43.14,30.51,1.749,4.214,3.827,16.45,-0.04266

Actual

low, 7.261,45.74,2.167,0.7107,19.28,13.09,1.145,1.145,1.145,6.57,0
high, 25,82.53,14.07,9.488,44.99,35.17,11.07,11.07,11.07,23.68,0
test, "Frequency Test","Serial Test","Gap Test","Poker Test","Coupon Collector's Test","Permutation Test","Runs Up - Dependent","Runs Up - Independent","Runs Up - Independent (Ritter)","Max of T Test","Serial Correlation"
random.randrange, 10.27,64.74,5.42,4.26,59.88,22.06,4.834,2.831,1.959,9.62,-0.01728
rng294, 8.992,65.76,7.46,4.682,31.1,20.14,7.357,1.505,1.162,6.918,-0.01056
rng266, 8.544,60.38,6.88,2.413,23.11,22.06,5.242,3.398,6.121,6.492,0.03033
rng266a, 10.24,53.73,6.52,1.345,38.22,29.36,7.921,5.75,8.304,3.424,0.03727
rngck, 10.75,64.22,7.76,2.667,43.14,30.51,1.749,4.214,3.827,16.45,-0.04266

Created by ../pyWeb/pyweb.py at Tue Apr 26 06:59:12 2005.

pyweb.__version__ '$Revision$'.

From source rngdoc.w modified Tue Apr 26 06:58:52 2005.

Working directory '/Users/slott/Documents/Projects/rngtest'.