We looked at general features of the file system in Files. In this chapter we’ll look at Python techniques for handling files in a few of the innumerable formats that are in common use. Most file formats are relatively easy to handle with Python techniques we’ve already seen. Comma-Separated Values (CSV) files, XML files and packed binary files, however, are a little more sophisticated.
This is only the tip of the iceberg in the far larger problem called persistence. In addition to simple file system persistence, we also have the possibility of object persistence using an object database. In this case, the database processing lies between our program and the file system on which the database resides. This area also includes object-relational mapping, where our program relies on a mapper; the mapper uses the database, and the database manages the file system. We can’t explore the whole persistence problem in this chapter.
In this chapter we’ll present a conceptual overview of the various approaches to reading and writing files in Overview. We’ll look at reading and writing CSV files in Comma-Separated Values: The csv Module, and tab-delimited files in Tab Files: Nothing Special. We’ll look at reading property files in Property Files and Configuration (or .INI ) Files: The ConfigParser Module. We’ll look at the subtleties of processing legacy COBOL files in Fixed Format Files, A COBOL Legacy: The codecs Module. We’ll cover the basics of reading XML files in XML Files: The xml.etree and xml.sax Modules.
Most programs need a way to write sophisticated, easy-to-control log files that contain status and debugging information. For simple one-page programs, the print statement is fine. As soon as we have multiple modules, where we need more sophisticated debugging, we find a need for the logging module. Of course, any program that requires careful auditing will benefit from the logging module. We’ll look at creating standard logs in Log Files: The logging Module.
When we introduced the concept of a file, we mentioned that we could look at a file on two levels.
A file format is the processing rules required to translate between usable Python objects and sequences of bytes. People have invented innumerable distinct file formats. We’ll look at some techniques which should cover most of the bases.
We’ll look at three broad families of files: text, binary and pickled objects. Each has some advantages and processing complexities.
Often, we have data that is in Comma-Separated Value (CSV) format. This format is used by many spreadsheets and is a widely-used standard for data files.
In Reading a CSV File the Hard Way we parsed CSV files using simple string manipulations. The csv module does a far better job at parsing and creating CSV files than the programming we showed in those examples.
About CSV Files. CSV files are text files organized around data that has rows and columns. This format is used to exchange data between spread-sheet programs or databases. A CSV file uses a number of punctuation rules to encode the data.
In the ideal case, a CSV file will have the same number of columns in each row, and the first row will be column titles. Almost as pleasant is a file without column titles, but with a known sequence of columns. In the more complex cases, the number of columns per row varies.
The csv Module. The CSV module provides you with readers or writers; these are objects which use an existing file object, created with the file() or open() function. A CSV reader will read a file, parsing the commas and quotes, delivering you the data elements of each row in a sequence or mapping. A CSV writer will create a file, adding the necessary commas and quotes to create a valid CSV file.
The following constructors within the csv module are used to create a reader, DictReader, writer or DictWriter.
Creates a reader object which can parse the given file, returning a sequence of values for each line of the file. The csvfile can be any iterable object.
This can be used as follows.
rdr= csv.reader( open( "file.csv", "rb" ) )
for row in rdr:
    print row
Creates a writer object which can format a sequence of values and write them to a line of the file. The csvfile can be any object which supports a write() method.
This can be used as follows.
target= open( "file.csv", "wb" )
wtr= csv.writer( target )
wtr.writerow( ["some","list","of","values"] )
target.close()
It’s very handy to use the with statement to assure that the file is properly closed.
with open( "file.csv", "wb" ) as target:
    wtr= csv.writer( target )
    wtr.writerow( ["some","list","of","values"] )
Reader Functions. The following functions within a reader (or DictReader) object will read and parse the CSV file.
Writer Functions. The following functions within a writer (or DictWriter) object will format and write a CSV file.
Basic CSV Reading Example.
The basic CSV reader processing treats each line of the file as data. This is typical for files which lack column titles, or files which have such a complex format that special parsing and analysis is required. In some cases, a file has a simple, regular format with a single row of column titles, which can be processed by a special reader we’ll look at below.
We’ll revise the readquotes.py program from Reading a CSV File the Hard Way. This will properly handle all of the quoting rules, eliminating a number of irritating problems with the example in the previous chapter.
import csv
qFile= file( "quotes.csv", "rb" )
csvReader= csv.reader( qFile )
for q in csvReader:
    try:
        stock, price, date, time, change, opPrc, dHi, dLo, vol = q
        print stock, float(price), date, time, change, vol
    except ValueError:
        pass
qFile.close()
Column Headers as Dictionary Keys. In some cases, you have a simple, regular file with a single line of column titles. In this case, you can transform each line of the file into a dictionary. The key for each field is the column title. This can lead to programs which are clearer and more flexible. The flexibility comes from not assuming a specific order to the columns.
We’ll revise the readportfolio.py program from Reading “Records”. This will properly handle all of the quoting rules, eliminating a number of irritating problems with the example in the previous chapter. It will make use of the column titles in the file.
import csv
quotes= open( "display.csv", "rb" )
csvReader= csv.DictReader( quotes )
invest= 0
current= 0
for data in csvReader:
    print data
    invest += float(data["Purchase Price"])*float(data["# Shares"])
    current += float(data["Price"])*float(data["# Shares"])
print invest, current, (current-invest)/invest
We open our portfolio file, display.csv, for reading, creating a file object named quotes.
We create a csv.DictReader object from our quotes file. This will read the first line of the file to get the column titles; each subsequent line will be parsed and transformed into a dictionary.
We initialize two counters, invest and current to zero. These will accumulate our initial investment and the current value of this portfolio.
We use a for statement to iterate through the lines in quotes file. Each line is parsed, and the column titles are used to create a dictionary, which is assigned to data.
Each value of data is a dictionary that maps the column titles to the field values from one line of the file. Printing each dictionary lets us see the raw data before we do any calculations with it.
We perform some simple calculations on each dict. In this case, we convert the purchase price to a number, convert the number of shares to a number and multiply to determine how much we spent on this stock. We accumulate the sum of these products into invest.
We also convert the current price to a number and multiply this by the number of shares to get the current value of this stock. We accumulate the sum of these products into current.
When the loop has terminated, we can write out the two numbers, and compute the percent change.
Writing CSV Files. The most general case for writing CSV is shown in the following example. Assume we’ve got a list of objects, named someList. Further, let’s assume that each object has three attributes: this, that and aKey.
import csv
myFile= open( "result", "wb" )
wtr= csv.writer( myFile )
for someData in someList:
    aRow= [ someData.this, someData.that, someData.aKey, ]
    wtr.writerow( aRow )
myFile.close()
In this case, we assemble the list of values that becomes a row in the CSV file.
In some cases we can provide two methods to allow our classes to participate in CSV writing. We can define a csvRow() method as well as a csvHeading() method. These methods will provide the necessary tuples of heading or data to be written to the CSV file.
For example, let’s look at the following class definition for a small database of sailboats. This class shows how the csvRow() and csvHeading() methods might look.
class Boat( object ):
    csvHeading= [ "name", "rig", "sails" ]
    def __init__( self, name, rig, sails ):
        self.name= name
        self.rig= rig
        self.sails= sails
    def __str__( self ):
        return "%s (%s, %r)" % ( self.name, self.rig, self.sails )
    def csvRow( self ):
        return [ self.name, self.rig, self.sails ]
Including these methods in our class definitions simplifies the loop that writes the objects to a CSV file. Instead of building each row as a list, we can do the following: wtr.writerow( someData.csvRow() ) .
Here’s an example that leverages each object’s internal dictionary (__dict__) to dump objects to a CSV file.
db= [
    Boat( "KaDiMa", "sloop", ( "main", "jib" ) ),
    Boat( "Glinda", "sloop", ( "main", "jib", "spinnaker" ) ),
    Boat( "Eillean Glas", "sloop", ( "main", "genoa" ) ),
]
test= file( "boats.csv", "wb" )
wtr= csv.DictWriter( test, Boat.csvHeading )
wtr.writerow( dict( zip( Boat.csvHeading, Boat.csvHeading ) ) )
for d in db:
    wtr.writerow( d.__dict__ )
test.close()
Tab-delimited files are text files organized around data that has rows and columns. This format is used to exchange data between spread-sheet programs or databases. A tab-delimited file uses just two punctuation rules to encode the data.
In the ideal case, a tab-delimited file will have the same number of columns in each row, and the first row will be column titles. Almost as pleasant is a file without column titles, but with a known sequence of columns. In the more complex cases, the number of columns per row varies.
When we have a single, standard punctuation mark, we can simply use two operations in the string and list classes to process files. We use the split() method of a string to parse the rows. We use the join() method of a list to assemble the rows.
We don’t actually need a separate module to handle tab-delimited files.
Reading. The most general case for reading Tab-delimited data is shown in the following example.
myFile= open( "somefile", "rU" )
for aRow in myFile:
    print aRow.split('\t')
myFile.close()
Each row will be a list of column values.
Writing. The writing case is the inverse of the reading case. Essentially, we use "\t".join( someList ) to create the tab-delimited row. Here’s our sailboat example, done as tab-delimited data.
test= file( "boats.tab", "w" )
test.write( "\t".join( Boat.csvHeading ) )
test.write( "\n" )
for d in db:
    test.write( "\t".join( map( str, d.csvRow() ) ) )
    test.write( "\n" )
test.close()
Note that some elements of our data objects aren’t string values. In this case, the value for sails is a tuple, which needs to be converted to a proper string. The expression map(str, someList ) applies the str() function to each element of the original list, creating a new list which will have all string values. See Sequence Processing Functions: map(), filter() and reduce().
A property file, also known as a configuration (or .INI) file, defines property or configuration values. It is usually just a collection of settings. The essential property-file format has a simple row-oriented format with only two values in each row. A configuration (or .INI) file organizes a simple list of properties into one or more named sections.
A property file uses a few punctuation rules to encode the data.
Some property file dialects allow a value to continue on to the next line. In this case, a line that ends with \ (the two-character sequence \ \n) escapes the usual meaning of \n. Rather than being the end of a line, \n is demoted to just another whitespace character.
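As an illustration of this rule, here is a small sketch of how continuation lines might be joined before parsing. The logical_lines() helper is our own invention, not part of any standard module.

```python
def logical_lines( physical_lines ):
    # Join physical lines that end with a backslash into one logical line.
    # The backslash is dropped and replaced with a single space.
    buffer = ''
    for line in physical_lines:
        line = line.rstrip( '\n' )
        if line.endswith( '\\' ):
            buffer += line[:-1].rstrip() + ' '
        else:
            yield buffer + line
            buffer = ''
    if buffer:
        yield buffer

for line in logical_lines( [ "handlers = console, \\\n", "logfile\n" ] ):
    print( line )
```

The generator can be wrapped around any open file object, so the property-file parser sees only complete logical lines.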
A property file is an extension to the basic tab-delimited file. It has just two columns per line, and some space-stripping is done. However, it doesn’t have a consistent separator, so it is slightly more complex to parse.
The extra feature introduced in a configuration file is named sections.
Reading a Simple Property File. Here’s an example of reading the simplest kind of property file. In this case, we’ll turn the entire file into a dictionary. Python doesn’t provide a module for doing this. The processing is a sequence of string manipulations to parse the file.
propFile= file( r"C:\Java\jdk1.5.0_06\jre\lib\logging.properties", "rU" )
propDict= dict()
for propLine in propFile:
    propDef= propLine.strip()
    if len(propDef) == 0: continue
    if propDef[0] in ( '!', '#' ): continue
    punctuation= [ propDef.find(c) for c in ':= ' ] + [ len(propDef) ]
    found= min( [ pos for pos in punctuation if pos != -1 ] )
    name= propDef[:found].rstrip()
    value= propDef[found:].lstrip(":= ").rstrip()
    propDict[name]= value
propFile.close()
print propDict
print propDict['handlers']
The input line is subject to a number of processing steps.
Reading a Config File. The ConfigParser module has a number of classes for processing configuration files. You initialize a ConfigParser object with default values. The object can then read one or more configuration files. You can then use methods to determine which sections were present and what options were defined in a given section.
import ConfigParser
cp= ConfigParser.RawConfigParser( )
cp.read( r"C:\Program Files\Mozilla Firefox\updater.ini" )
print cp.sections()
print cp.options('Strings')
print cp.get('Strings','info')
Eschewing Obfuscation. While a property file is rather simple, it is possible to simplify property files further. The essential property definition syntax is so close to Python’s own syntax that some applications use a simple file of Python variable settings. In this case, the settings file would look like this.
# Some Properties
TITLE = "The Title String"
INFO = """The information string.
Which uses Python's ordinary techniques for long lines."""
This file can be introduced in your program with one statement: import settings . This statement will create module-level variables, settings.TITLE and settings.INFO.
Files that come from COBOL programs have three characteristic features:
The first problem requires figuring the starting position and size of each field. In some cases, there are no gaps (or filler) between fields; in this case the sizes of each field are all that are required. Once we have the position and size, however, we can use a string slice operation to pick those characters out of a record. The code is simply aLine[start:start+size].
We can tackle the second problem using the codecs module to decode the EBCDIC characters. The result of codecs.getdecoder('cp037') is a function that you can use as an EBCDIC decoder.
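For example, here’s a small sketch of that decoder in use. Note that the function returned by codecs.getdecoder() actually returns a pair: the decoded text and the number of bytes consumed.

```python
import codecs

decode_ebcdic = codecs.getdecoder( 'cp037' )

# The EBCDIC (code page 037) bytes for "HELLO".
raw = b'\xc8\xc5\xd3\xd3\xd6'
text, consumed = decode_ebcdic( raw )
print( text )      # HELLO
print( consumed )  # 5
```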
The third problem requires that our program know the data type as well as the position and offset of each field. If we know the data type, then we can do EBCDIC conversion or packed decimal conversion as appropriate. This is a much more subtle algorithm, since we have two strategies for converting the data fields. See Strategy for some reasons why we’d do it this way.
In order to mirror COBOL’s largely decimal world-view, we will need to use the decimal module for all numbers and arithmetic.
We note that the presence of packed decimal data changes the file from text to binary. We’ll begin with techniques for handling a text file with a fixed layout. However, since this often slides over to binary file processing, we’ll move on to that topic, also.
Reading an All-Text File. If we ignore the EBCDIC and packed decimal problems, we can easily process a fixed-layout file. The way to do this is to define a handy structure that defines our record layout. We can use this structure to parse each record, transforming the record from a string into a dictionary that we can use for further processing.
In this example, we also use a generator function, yieldRecords(), to break the file into individual records. We separate this functionality out so that our processing loop is a simple for statement, as it is with other kinds of files. In principle, this generator function can also check the length of recBytes before it yields it. If the block of data isn’t the expected size, the file was damaged and an exception should be raised.
layout = [
    ( 'field1', 0, 12 ),
    ( 'field2', 12, 4 ),
    ( 'anotherField', 16, 20 ),
    ( 'lastField', 36, 8 ),
]
reclen= 44

def yieldRecords( aFile, recSize ):
    recBytes= aFile.read(recSize)
    while recBytes:
        yield recBytes
        recBytes= aFile.read(recSize)

cobolFile= file( 'my.cobol.file', 'rb' )
for recBytes in yieldRecords(cobolFile, reclen):
    record = dict()
    for name, start, size in layout:
        record[name]= recBytes[start:start+size]
Reading Mixed Data Types. If we have to tackle the complete EBCDIC and packed decimal problem, we have to use a slightly more sophisticated structure for our file layout definition. First, we need some data conversion functions, then we can use those functions as part of picking apart a record.
We may need several conversion functions, depending on the kind of data that’s present in our file. Minimally, we’ll need the following two functions.
This function is used to get character data. In COBOL, this is called display data. It will be in EBCDIC if our files originated on a mainframe.
def display( bytes ):
    return bytes
This function is used to get packed decimal data. In COBOL, this is called COMP-3 data. In our example, we have not dealt with the insertion of the decimal point prior to the creation of a decimal.Decimal object.
import codecs

def display( bytes ):
    return codecs.getdecoder( 'cp037' )( bytes )[0]

def packed( bytes ):
    digits = [ ]
    for b in bytes[:-1]:
        hi, lo = divmod( ord(b), 16 )
        digits.append( str(hi) )
        digits.append( str(lo) )
    digit, sign = divmod( ord(bytes[-1]), 16 )
    digits.append( str(digit) )
    if sign in ( 0x0b, 0x0d ):
        return '-' + ''.join( digits )
    return '+' + ''.join( digits )
Given these two functions, we can expand our handy record layout structure.
layout = [
    ( 'field1', 0, 12, display ),
    ( 'field2', 12, 4, packed ),
    ( 'anotherField', 16, 20, display ),
    ( 'lastField', 36, 8, packed ),
]
reclen= 44
This changes our record decoding to the following.
cobolFile= file( 'my.cobol.file', 'rb' )
for recBytes in yieldRecords(cobolFile, reclen):
    record = dict()
    for name, start, size, convert in layout:
        record[name]= convert( recBytes[start:start+size] )
This example underscores some of the key values of Python. Simple things can be kept simple. The layout structure, which describes the data, is both easy to read, and written in Python itself. The evolution of this example shows how adding a sophisticated feature can be done simply and cleanly.
At some point, our record layout will have to evolve from a simple tuple to a proper class definition. We’ll need to take this evolutionary step when we want to convert packed decimal numbers into values that we can use for further processing.
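A sketch of what that evolutionary step might look like. The Field class and its decimals parameter are our own illustration; each Field bundles a name, position, size and conversion function, and inserts the implied decimal point when asked.

```python
from decimal import Decimal

class Field( object ):
    # One field of a fixed-layout record: name, position, size,
    # conversion function, and number of implied decimal places.
    def __init__( self, name, start, size, convert, decimals=0 ):
        self.name = name
        self.start = start
        self.size = size
        self.convert = convert
        self.decimals = decimals
    def extract( self, recBytes ):
        raw = self.convert( recBytes[self.start:self.start+self.size] )
        if self.decimals:
            # Insert the implied decimal point: '+1234' with 2 decimals is 12.34
            return Decimal( raw ) / ( 10 ** self.decimals )
        return raw
```

A layout then becomes a list of Field objects, and the record loop calls field.extract(recBytes) instead of unpacking a tuple.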
XML files are text files, intended for human consumption, that mix markup with content. The markup uses a number of relatively simple rules. Additionally, there are structural requirements that assure that an XML file has a minimal level of validity. There are additional rules (either a Document Type Definition, DTD, or an XML Schema Definition, XSD) that provide additional structural rules.
There are several XML parsers available with Python.
xml.sax Parsing. The Simple API for XML (SAX) parser is described as an event parser. The parser recognizes different elements of an XML document and invokes methods in a handler which you provide. Your handler will be given pieces of the document, and can do appropriate processing with those pieces.
For most XML processing, your program will have the following outline: define a subclass of ContentHandler; create a parser with xml.sax.make_parser(); attach your handler with the parser’s setContentHandler() method; and call the parser’s parse() method. The parser will then use your ContentHandler as it parses.
Here’s a short example that shows the essentials of building a simple XML parser with the xml.sax module. This example defines a simple ContentHandler that prints the tags as well as counting the occurrences of the <informaltable> tag.
import xml.sax

class DumpDetails( xml.sax.ContentHandler ):
    def __init__( self ):
        self.depth= 0
        self.tableCount= 0
    def startElement( self, aName, someAttrs ):
        print self.depth*' ' + aName
        self.depth += 1
        if aName == 'informaltable':
            self.tableCount += 1
    def endElement( self, aName ):
        self.depth -= 1
    def characters( self, content ):
        pass # ignore the actual data

p= xml.sax.make_parser()
myHandler= DumpDetails()
p.setContentHandler( myHandler )
p.parse( "../p5-projects.xml" )
print myHandler.tableCount, "tables"
Since the parsing is event-driven, your handler must accumulate any context required to determine where the individual tags occur. In some content models (like XHTML and DocBook) there are two levels of markup: structural and semantic. The structural markup includes books, parts, chapters, sections, lists and the like. The semantic markup is sometimes called “inline” markup, and it includes tags to identify function names, class names, exception names, variable names, and the like. When processing this kind of document, your application must determine which tag is which.
A ContentHandler Subclass. The heart of a SAX parser is the subclass of ContentHandler that you define in your application. There are a number of methods which you may want to override. Minimally, you’ll override the startElement() and characters() methods. There are other methods of this class described in section 20.10.1 of the Python Library Reference.
The parser calls this method with each tag that is found, in non-namespace mode. The name is the string with the tag name.
The attrs parameter is an xml.sax.Attributes object. This object is reused by the parser; your handler cannot save this object.
The xml.sax.Attributes object behaves somewhat like a mapping. It doesn’t support the [] operator for getting values, but does support get(), has_key(), items(), keys(), and values() methods.
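Here’s a small sketch of that mapping-like behavior, using xml.sax.parseString() with some inline XML. The handler copies each attrs object into an ordinary dictionary, since the parser may reuse it.

```python
import xml.sax

class ShowAttrs( xml.sax.ContentHandler ):
    def __init__( self ):
        xml.sax.ContentHandler.__init__( self )
        self.seen = []
    def startElement( self, name, attrs ):
        # attrs supports get(), keys(), items() and values(),
        # but must be copied if we want to keep its contents.
        self.seen.append( ( name, dict( attrs.items() ) ) )

handler = ShowAttrs()
xml.sax.parseString( b'<doc id="d1"><item n="1"/></doc>', handler )
print( handler.seen )
```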
The parser calls this method with each tag that is found, in namespace mode. You set namespace mode by using the parser’s p.setFeature( xml.sax.handler.feature_namespaces, True ). The name is a tuple with the URI for the namespace and the tag name. The qname is the fully qualified text name.
The attrs is described above under ContentHandler.startElementNS().
xml.etree Parsing. The Document Object Model (DOM) parser creates a document object model from your XML document. The parser transforms the text of an XML document into a DOM object. Once your program has the DOM object, you can examine that object.
Here’s a short example that shows the essentials of building a simple XML parser with the xml.etree module. This example locates all instances of the <informaltable> tag in the XML document and prints parts of this tag’s content.
#!/usr/bin/env python
from xml.etree import ElementTree

dom1 = ElementTree.parse("../PythonBook-2.5/p5-projects.xml")
for t in dom1.getiterator("informaltable"):
    print t.attrib
    for row in t.find('thead').getiterator('tr'):
        print "head row"
        for header_col in row.getiterator('th'):
            print header_col.text
    for row in t.find('tbody').getiterator('tr'):
        for body_col in row.getiterator('td'):
            print body_col.text
The DOM Object Model. The heart of a DOM parser is the DOM class hierarchy.
There is a widely-used XML Document Object Model definition. This standard applies to both Java programs as well as Python. The xml.dom package provides definitions which meet this standard.
The standard doesn’t address how XML is parsed to create this structure. Consequently, the xml.dom package has no official parser. You could, for example, use a SAX parser to produce a DOM structure. Your handler would create objects from the classes defined in xml.dom.
The xml.dom.minidom package is an implementation of the DOM standard, which is slightly simplified. This implementation of the standard is extended to include a parser. The essential class definitions, however, come from xml.dom.
The standard element hierarchy is rather complex. There’s an overview of the DOM model in The DOM Class Hierarchy.
The ElementTree Document Object Model. When using xml.etree your program will work with a number of xml.etree.ElementTree objects. We’ll look at a few essential classes of the DOM. There are other classes in this model, described in section 20.13 of the Python Library Reference. We’ll focus on the most commonly-used features of this class.
Generally, ElementTree processing starts with parsing an XML document. The source can either be a filename or an object that contains XML text.
The result of parsing is an object that fits the ElementTree interface, and has a number of methods for examining the structure of the document.
Locate all child elements matching match. This is a handy shortcut for self.getroot().findall(match). See Element.findall().
Returns an iterable yielding all matching elements in document order.
Creates a tree iterator with the current element as the root. The iterator iterates over this element and all elements below it, in document (depth first) order. If tag is not None or ‘*’, only elements whose tag equals tag are returned from the iterator.
The ElementTree is a collection of individual Elements. Each Element is either an Element, a Comment, or a Processing Instruction. Generally, Comments and Processing Instructions behave like Elements.
Locate all child elements matching match. The match may be a simple tag name or an XPath expression.
Returns an iterable yielding all matching elements in document order.
Match queries can have the form "tag/tag/tag" to specify a specific grandparent-parent-child nesting of tags. Additionally, “*” can be used as a wildcard.
For example, here’s a query that looks for a specific nesting of tags.
from xml.etree import ElementTree

dom1 = ElementTree.parse("../PythonBook-2.5/p5-projects.xml")
for t in dom1.findall("chapter/section/informaltable"):
    print t
Note that although full XPath syntax is accepted, most of it is ignored; only a limited subset is actually used for matching.
Most programs need a way to write sophisticated, easy-to-control log files that contain status and debugging information. Any program that requires careful auditing will benefit from using the logging module to create an easy-to-read permanent log. Also, when we have programs with multiple modules, and need more sophisticated debugging, we’ll find a need for the logging module.
There are several closely related concepts that define a log.
Your program will have a hierarchical tree of Loggers. Each Logger is used to do two things: it creates LogRecord objects with your messages about errors or debugging information, and it provides these LogRecords to Handlers.
Generally, each major component will have its own logger. The various loggers can have separate filter levels so that debugging or warning messages can be selectively enabled or disabled.
Your program will have a small number of Handlers, which are given LogRecords. A Handler can ignore the records, write them to a file or insert them into a database.
It’s common to have a handler which creates a very detailed log in a persistent file, and a second handler that simply reports errors and exceptions to the system’s stderr file.
Each Handler can make use of a Formatter to provide a nice, readable version of each LogRecord message.
Also, you can build sophisticated Filters if you need to handle complex situations.
The default configuration gives you a single Logger , named "", which uses a StreamHandler configured to write to standard error file, stderr.
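Here is a minimal sketch of these pieces wired together by hand; basicConfig() does much the same thing for you. The logger name and format string here are arbitrary.

```python
import logging
import sys

# A named Logger creates LogRecords...
log = logging.getLogger( "demo" )
log.setLevel( logging.DEBUG )

# ...a Handler decides where the records go...
handler = logging.StreamHandler( sys.stderr )

# ...and a Formatter renders each record as readable text.
handler.setFormatter( logging.Formatter( "%(name)s %(levelname)s %(message)s" ) )
log.addHandler( handler )

log.info( "processing %d records", 42 )
```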
Advantages. While the logging module can appear complex, it gives us a number of distinct advantages.
Multiple Loggers. We can easily create a large number of separate loggers. This helps us to manage large, complex programs. Each component of the program can have its own, independent logger.
We can configure the collection of loggers centrally, however, supporting sophisticated auditing and debugging which is independent of each individual component.
Also, all the loggers can feed a single, common log file.
Each logger can also have a severity level filter. This allows us to selectively enable debugging or disable warnings on a logger-by-logger basis.
Hierarchy of Loggers. Each Logger instance has a name, which is a .-separated string of names. For example, 'myapp.stock', 'myapp.portfolio'.
This forms a natural hierarchy of Loggers. Each child inherits the configuration from its parent, which simplifies configuration.
If, for example, we have a program which does stock portfolio analysis, we might have a component which does stock prices and another component which does overall portfolio calculations. Each component, then, could have a separate Logger which uses the component name. Both of these Loggers are children of the "" Logger; the configuration for the top-most Logger would apply to both children.
Some components define their own Loggers. For example, SQLAlchemy has a set of Loggers with 'sqlalchemy' as the first part of their name. You can configure all of them by using that top-level name. For specific debugging, you might alter the configuration of just one Logger, for example, 'sqlalchemy.orm.sync'.
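For example, here’s a sketch of that inheritance at work; the myapp component names are just illustrations.

```python
import logging

# Child loggers inherit configuration from their parents.
stock = logging.getLogger( "myapp.stock" )
portfolio = logging.getLogger( "myapp.portfolio" )

# One setting on the shared parent applies to both children.
logging.getLogger( "myapp" ).setLevel( logging.INFO )

print( stock.getEffectiveLevel() == logging.INFO )   # True
print( stock.parent.name )                           # myapp
```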
Multiple Handlers. Each Logger can feed a number of Handlers. This allows you to assure that a single important log message can go to multiple destinations. A common setup is to have two Handlers for log messages: a FileHandler which records everything, and a StreamHandler which writes only severe error messages to stderr.
For some kinds of applications, you may also want to add the SysLogHandler (in conjunction with a Filter) to send some messages to the operating system-maintained system log as well as the application’s internal log.
Another example is using the SMTPHandler to send selected log messages via email as well as to the application’s log and stderr.
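A sketch of the common two-handler setup described above; the myapp.log file name is an assumption.

```python
import logging

log = logging.getLogger( "myapp" )
log.setLevel( logging.DEBUG )

# Everything, including debugging detail, goes to the file...
detail = logging.FileHandler( "myapp.log" )
detail.setLevel( logging.DEBUG )
log.addHandler( detail )

# ...but only errors and worse reach the console.
console = logging.StreamHandler()
console.setLevel( logging.ERROR )
log.addHandler( console )

log.debug( "recorded in the file only" )
log.error( "recorded in both places" )
```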
Level Numbers and Filters. Each LogRecord includes a message level number, and a destination Logger name (as well as the text of the message and arguments with values to insert into the message). There are a number of predefined level numbers which are used for filtering. Additionally, a Filter object can be created to filter by destination Logger name, or any other criteria.
The predefined levels are CRITICAL, ERROR, WARNING, INFO, and DEBUG. These are coded with numeric values from 50 (CRITICAL) down to 10 (DEBUG).
Critical messages usually indicate a complete failure of the application; they are often the last message sent before it stops running. Error messages indicate problems which are not fatal, but which preclude the creation of usable results. Warnings are questions or notes about the results being produced. Information messages are the standard messages that describe successful processing, and debug messages provide additional details.
By default, all Loggers will show only messages which have a level number greater than or equal to WARNING, which is generally 30. When enabling debugging, we rarely want to debug an entire application. Instead, we usually enable debugging on specific modules. We do this by changing the level of a specific Logger.
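For example, here’s a sketch of enabling debugging for a single component while the rest of the application stays at the default level; the component names are just illustrations.

```python
import logging
logging.basicConfig( level=logging.WARNING )

# Enable debugging for one component only; every other logger
# still shows just warnings and above.
logging.getLogger( "myapp.stock" ).setLevel( logging.DEBUG )

print( logging.getLogger( "myapp.stock" ).isEnabledFor( logging.DEBUG ) )      # True
print( logging.getLogger( "myapp.portfolio" ).isEnabledFor( logging.DEBUG ) )  # False
```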
You can create additional level numbers or change the level numbers. Programmers familiar with Java, for example, might want to change the levels to SEVERE, WARNING, INFO, CONFIG, FINE, FINER, FINEST, using level numbers from 70 through 10.
Module-Level Functions. The following module-level functions will get a Logger that can be used for logging. Additionally, there are functions that can be used to create Handlers, Filters and Formatters to configure a Logger.
Configures the logging system. By default this creates a StreamHandler directed to stderr, and a default Formatter. Also, by default, all Loggers show only WARNING or higher messages. There are a number of keyword parameters that can be given to basicConfig().
Typically, you’ll use this in the following form: logging.basicConfig( level=logging.INFO ).
Logger Method Functions. The following functions are used to create a LogRecord in a Logger; a LogRecord is then processed by the Handlers associated with the Logger.
Many of these functions have essentially the same signature. They accept the text for a message as the first argument. This message can have string conversion specifications, which are filled in from the various arguments. In effect, the logger does message % ( args ) for you.
You can provide a number of argument values, or you can provide a single argument which is a dictionary. This gives us two principal methods for producing log messages.
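The two methods look like this in practice; the messages themselves are illustrative:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("demo")

# Method 1: positional argument values; the logger does msg % args lazily,
# only when the message will actually be emitted.
log.info("processed %d records from %s", 42, "data.csv")

# Method 2: a single dictionary argument with named conversions.
log.info("processed %(count)d records from %(file)s",
         {"count": 42, "file": "data.csv"})
```

Both calls produce the same text; the dictionary form is easier to read when a message has many insertions.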
These functions also have an optional argument, exc_info, which can take either of two values. You can provide the keyword argument exc_info=sys.exc_info(). As an alternative, you can provide exc_info=True, in which case the logging module will call sys.exc_info() for you.
Creates a LogRecord with level ERROR on this logger. The positional arguments fill in the message; a single positional argument can be a dictionary.
Exception info is added to the logging message, as if the keyword parameter exc_info=True. This method should only be called from an exception handler.
Returns True if this Logger will handle messages of this level or higher. This can be handy to prevent creating really complex debugging output that would only get ignored by the logger. This is rarely needed, and is used in the following structure:
if log.isEnabledFor( logging.DEBUG ):
    log.debug( "some complex message" )
The following method functions are used to configure a Logger. Generally, you’ll configure Loggers using the module level basicConfig() and fileConfig() functions. However, in some specialized circumstances (like unit testing), you may want finer control without the overhead of a configuration file.
When set to True, the message is also offered to the handlers of all the parents of a given Logger. This assures consistency for audit purposes.
When False, the parents will not handle the message. A False value might be used for keeping debugging messages separate from other messages. By default this is a True value.
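A sketch of the False case; the Logger name app.trace is hypothetical, and NullHandler stands in for the FileHandler you would use in practice:

```python
import logging

# A noisy tracing Logger whose output should stay out of the main log.
trace = logging.getLogger("app.trace")
trace.propagate = False                     # don't pass records to "app" or root
trace.addHandler(logging.NullHandler())     # this Logger's own destination
trace.debug("this never reaches the parent's handlers")
```

With propagate left at its default of True, the same record would also be handled by the "app" Logger and the root Logger.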
There are also some functions which would be used if you were creating your own subclass of Logger for more specialized logging purposes. These methods include log.filter(), log.handle() and log.findCaller().
Using a Logger. Generally, there are a number of ways of using a Logger. In a module that is part of a larger application, we will get an instance of a Logger, and trust that it was configured correctly by the overall application. In the top-level application we may both configure and use a Logger.
This example shows a simple module file which uses a Logger.
import logging, logging.config, sys

logger= logging.getLogger(__name__)

def someFunc( a, b ):
    logger.debug( "someFunc( %r, %r )", a, b )
    try:
        return 2*int(a) + int(b)
    except ValueError, e:
        logger.warning( "ValueError in someFunc( %r, %r )", a, b, exc_info=True )

def mainFunc( args ):
    logger.info( "Starting mainFunc" )
    z= someFunc( args[0], args[1] )
    print z
    logger.info( "Ending mainFunc" )

if __name__ == "__main__":
    logging.config.fileConfig( "logmodule_log.ini" )
    mainFunc( sys.argv[1:] )
    logging.shutdown()
We import the logging module and the sys module.
We ask the logging module to create a Logger with the given name. We use the Python assigned __name__ name. This works well for all imported library modules and packages.
We do this through a factory function to assure that the logger is configured correctly. The logging module actually keeps a pool of Loggers, and will assure that there is only one instance of each named Logger.
This function has a debugging message and a warning message. This is typical of most function definitions. Ordinarily, the debug message will not show up in the log; we can only see it if we provide a configuration which sets the log level to DEBUG for the root logger or the logmodule Logger.
This function has a pair of informational messages. This is typical of “main” functions which drive an overall application program. Applications which have several logical steps might have informational messages for each step. Since informational messages have a lower level than warnings, they don’t show up by default; however, the main program that uses this module will often set the overall level to logging.INFO to enable them.
Create An Office Suite Result. Back in Iteration Exercises we used the for statement to produce tabular displays of data in a number of exercises. This included “How Much Effort to Produce Software?”, “Wind Chill Table”, “Celsius to Fahrenheit Conversion Tables” and “Dive Planning Table”. Update one of these programs to produce a CSV file. If you have a desktop office suite, load the CSV file into a spreadsheet program to verify that it looks correct.
Proper File Parsing. Back in File Module Exercises we built a quick and dirty CSV parser. Fix these programs to use the csv module properly.
Configuration Processing. In Stock Valuation, we looked at a program which processed blocks of stock. One of the specific programs was an analysis report which showed the value of the portfolio on a given date at a given price. Make this program more flexible by having it read a configuration file with the current date and stock prices.
Office Suite Extraction. Most office suite software can save files in XML format as well as their own proprietary format. The XML is complex, but you can examine it in pieces using Python programs. It helps to work with highly structured data, like an XML version of a spreadsheet. For example, your spreadsheet may use tags like <Table>, <Row> and <Cell> to organize the content of the spreadsheet.
First, write a simple program to show the top-level elements of the document. It often helps to show the text within those elements so that you can correlate the XML structure with the original document contents.
Once you can display the top-level elements, you can focus on the elements that have meaningful data. For example, if you are parsing spreadsheet XML, you can assemble the values of all of the <Cell> elements in a <Row> into a proper row of data, perhaps using a simple Python list.
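The row-assembly step can be sketched with xml.etree.ElementTree; the unqualified <Table>/<Row>/<Cell> tags and the sample data here are simplifications, since a real office-suite file uses namespace-qualified tags you would discover by inspection:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<Table>"
    "<Row><Cell>name</Cell><Cell>qty</Cell></Row>"
    "<Row><Cell>widget</Cell><Cell>42</Cell></Row>"
    "</Table>"
)

# Collect each Row's Cell text values into a Python list.
rows = [[cell.text for cell in row.findall("Cell")]
        for row in doc.findall("Row")]
print(rows)   # [['name', 'qty'], ['widget', '42']]
```

From here, converting selected columns to numbers or writing the rows out with the csv module is straightforward.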
This is some supplemental information on the xml.dom and xml.minidom object models for XML documents.
The Node class is the superclass for all of the various DOM classes. It defines a number of attributes and methods which are common to all of the various subclasses. This class should be thought of as abstract: it is not used directly; it exists to provide common features to all of the subclasses.
Here are the attributes which are common to all of the various kinds of Node objects.
This is an integer code that discriminates among the subclasses of Node. There are a number of helpful symbolic constants which are class variables in xml.dom.Node. These constants define the various types of Nodes.
ELEMENT_NODE, ATTRIBUTE_NODE, TEXT_NODE, CDATA_SECTION_NODE, ENTITY_NODE, PROCESSING_INSTRUCTION_NODE, COMMENT_NODE, DOCUMENT_NODE, DOCUMENT_TYPE_NODE, NOTATION_NODE.
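A short sketch showing how these constants discriminate the node types in a parsed document, using the <para> example from later in this section:

```python
from xml.dom import Node, minidom

doc = minidom.parseString('<para id="sample">Text</para>')
para = doc.documentElement

# The nodeType code identifies each subclass of Node.
print(doc.nodeType == Node.DOCUMENT_NODE)          # True
print(para.nodeType == Node.ELEMENT_NODE)          # True
print(para.firstChild.nodeType == Node.TEXT_NODE)  # True
```

A recursive walk over a document typically dispatches on nodeType in exactly this way.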
Here are some attributes which are overridden in each subclass of Node. They have slightly different meanings for each node type.
Here are some methods of a Node.
This is the top-level document, the object returned by the parser. It is a subclass of Node, so it inherits all of those attributes and methods. The Document class adds some attributes and method functions to the Node definition.
This is a specific element within an XML document. An element is surrounded by XML tags. In <para id="sample">Text</para>, the tag is <para>, which provides the name for the Element. Most Elements will have children; some will have Attributes as well as children. The Element class adds some attributes and method functions to the Node definition.
This is an attribute within an Element. In <para id="sample">Text</para>, the tag is <para>; this tag has an attribute named id with a value of sample. Generally, the nodeType, nodeName and nodeValue attributes are all that are used. The Attr class adds some attributes to the Node definition.
This is the text within an element. In <para id="sample">Text</para>, the text is Text. Note that end of line characters and indentation also count as Text nodes. Further, the parser may break up a large piece of text into a number of smaller Text nodes. The Text class adds an attribute to the Node definition.
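The Attr and Text attributes described above can be seen with minidom directly:

```python
from xml.dom import minidom

doc = minidom.parseString('<para id="sample">Text</para>')
para = doc.documentElement

# The Attr node for the id attribute.
attr = para.getAttributeNode("id")
print(attr.nodeName, attr.nodeValue)   # id sample

# The Text node child of <para>; its data attribute holds the characters.
print(para.firstChild.data)            # Text
```

Remember that whitespace between elements in a pretty-printed document also appears as Text nodes, so real documents usually need the whitespace-only nodes filtered out.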