There is considerable overlap between a library module and a main program
script. The significant difference should be embodied in a small piece of
programming we’ll examine in Script or Library? The Main Program Switch. Once we’ve
looked at that, we can talk about the remaining features of a complete
command-line program in The Standard Command-Line Interface.
Script or Library? The Main Program Switch
Back in Thinking In Modules, and the Declaration of Dependence, we identified two general species
of Python modules: “main program” scripts and library modules.
Some Python files do the main work of a program, while other files provide the
definitions of classes and functions.
The library vs. script distinction is part of
our intent when designing a module; there’s no formal way to state this in
Python. Library modules can do some processing while being imported; a main
module can provide some definitions as well as the main script.
While this distinction is informal, the
overall intent should be clear: a module either provides definitions or
knits definitions together to do useful work.
The biggest and most obvious distinction is that the main program is
the file run by the user. This can be an icon the user double-clicked, or
a command the user typed at a command prompt. In either case, a single
Python file initiates the processing. This is what makes a given Python
file the “main” application.
If you look at Python application programs, you’ll see that the name
of the application almost always matches one of the file names. For
example, the IDLE application is launched by a
file named idle.py. This file contains the main part
of the application. IDLE has numerous other files, which contain class and
function definitions.
Program Varieties.
There are several subspecies of programs. We touched on this concept
in Files are the Plumbing of a Software Architecture.
In this book, we’ve focused
exclusively on command-line interface (CLI) programs
because they are simpler to create. A richly interactive Graphical User
Interface (GUI) program is generally more complex to build. Further, the core
functionality for a GUI is often easiest to develop and
debug as a CLI program. Once you have the
CLI program working, you can wrap it up with a
GUI.
To some programmers it seems more logical to design the user
experience of a GUI first, and get the windows, menus, and buttons working
before anything else. “After all,” they argue, “the user’s interaction
is the most important part of the software.” As a practical matter,
however, this doesn’t work out well. It turns out to be far better to get
the essential data and processing defined and working first. Once this
works reliably and correctly, it’s easy to add a GUI to an already working
program.
What this usually means is that we have the following structure.
- One or more modules that define the essential work of the program.
This is a “model” of the real world defined with Python objects.
- We often write a command-line application script that imports the
model.
- We can also write a GUI application script that imports the model.
This includes the graphical “view” and the “control” logic.
This clean
separation between the modules that do the work and the modules that
provide the user experience makes our life simpler in the long run because
each side of the application can be focused on a particular part of the task.
We’ll return to these “varieties” of main programs in Architectural Patterns – A Family Tree.
Evolution.
Programs are built up from modules. In some cases, a program
evolves as a series of modules. First, we start with something really
basic. Then we write a module that imports our first module, and
implements better input and output. Then we figure out how the
optparse module works and we write a module which
imports the second and adds a better CLI. Then we write a GUI in GTK,
which imports all of our previous modules. At each step, we are building
additional features around the original small core of data or
processing.
Sometimes, we create a program using someone else’s complete
program. We might expand on someone else’s program or we might be knitting
two programs together to make something new.
In all of these cases, we will have modules which can be used as main
programs, but are also absorbed into a larger and more complex
program. Python gives us a very elegant mechanism for turning a main
program into a module that can be imported into a larger program.
The __name__ variable.
The global __name__ variable is the name of the currently executing
module. It helps us determine if a module is the main module – the module being run
by Python – or a library module being imported.
When the
__name__ variable is equal to
'__main__', the initial (or top-level or
outermost) file is being processed. When a module is being imported, the
__name__ variable is the name of the module being
imported.
If a module is the main program, it must do the useful work. If it is
being imported, on the other hand, it is merely providing definitions to
some other main program, and should do no work except provide class and
function definitions.
You can examine this variable at the interactive prompt in
IDLE: evaluating __name__ there shows '__main__'. If you want to
experiment, create a file with just one line, print(__name__), and
import it to see what it does.
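The experiment can also be sketched as a single script. This is only an illustration: the module name name_probe and the variable observed_name are hypothetical, and the module records __name__ in a variable instead of printing it so that both cases can be compared side by side.

```python
import importlib
import os
import runpy
import sys
import tempfile

# A hypothetical one-line module, written to a temporary directory.
workdir = tempfile.mkdtemp()
probe_path = os.path.join(workdir, "name_probe.py")
with open(probe_path, "w") as probe_file:
    probe_file.write("observed_name = __name__\n")

# Case 1: imported as a library, __name__ is the module's own name.
sys.path.insert(0, workdir)
imported = importlib.import_module("name_probe")
print(imported.observed_name)

# Case 2: run as the top-level script, __name__ is "__main__".
as_script = runpy.run_path(probe_path, run_name="__main__")
print(as_script["observed_name"])
```

The same file reports two different names, depending only on how it was started.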
This __name__ variable allows a module to be used
as both a main program and as a library for another program. This can be
called the “main-import switch”, as it helps a module
determine if it is the main program or it is an import into another main
program. It gives us the ultimate flexibility to expand, refine and reuse
our modules for a variety of purposes.
A main program script generally looks like the following.
#!/usr/bin/env python
"""Module docstring"""
import someModule  # stands in for the library modules this program needs

def main():
    pass  # the real work goes here

if __name__ == "__main__":
    main()
Tip
Debugging the Main Program Switch
There are two sides to the main program switch. When a module is
executed from the command line, you want it to do useful things. When a
module is imported by another module, you want it to provide
definitions, but not actually do anything.
Command-Line Behavior.
If you get a NameError, you misspelled
__name__. If, on the other hand, nothing seems to
happen, then you may have misspelled "__main__".
Another common problem is providing all of the class and function
definitions, but omitting the main script entirely. The
class and def statements all
execute silently. If there’s no main script to create the objects and
call the functions, then nothing will happen.
Import Behavior.
If things happen when you import a module, it’s missing the main
program switch. When a module is evolving from main program to library
that is used by a new main program, we sometimes leave the old main
program in place.
The best way to handle the change from main program to library is
to put the old main program into a function with a name like
main(), and then put in the simple main program
switch that calls this main() function when the
module name is "__main__".
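As a rough sketch of that conversion, suppose the old script rolled some dice at the top level of the file. The names roll_dice and main, and the fixed seed, are our own assumptions for illustration; they don't come from any particular program.

```python
import random


def roll_dice(rng):
    """Library-style definition: return the total of two six-sided dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)


def main():
    """The old top-level script, now wrapped in a function."""
    rng = random.Random(42)  # fixed seed so repeated runs match
    rolls = [roll_dice(rng) for _ in range(5)]
    print(rolls)
    return rolls


# The main program switch: runs main() only when this file is the
# top-level script, not when another module imports it.
if __name__ == "__main__":
    main()
```

A new main program can now import this file and call roll_dice() without triggering the old script as a side effect.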
The Standard Command-Line Interface
The glitzy desktop applications from big-name companies like Apple
and Microsoft are the most visible parts of our computer system. Many
programs, however, have minimal user interaction. They are run from
a command-line prompt, perform their function, and exit gracefully.
Almost all of the core GNU/Linux utilities (cp, rm, mv, ln, ls, df,
du, etc.) are programs that decode command-line
parameters, perform their processing function and return a status code.
Except for some explicitly interactive programs like editors (ex, vi,
emacs, etc.), the core elements of GNU/Linux
are command-line programs that lack a glitzy GUI.
In a way, we do interact with programs like
ls (Windows dir).
When we run the commands from the command prompt, we provide options and operands (or “arguments”).
The options begin
with - (Windows uses /). The operands are not
decorated with punctuation; usually they are file names, but could be
permissions or user names.
For example, we might do an ls -s /usr, which provides
an option of -s and an argument of /usr. (For
Windows, an example is dir /o:s "C:\Documents and Settings",
which has an option of /o:s and an argument of
"C:\Documents and Settings".)
When the program runs, we see two kinds of output, usually
intermixed into one stream. We see the output plus any error messages. We
can use some redirection operators like > to capture the
output and send it to a file. We can use 2> to capture the
errors and send them to a file.
This redirection is beyond the scope of this book, but is covered
in all of the books on GNU/Linux programming.
Command-Line Interface (CLI) programs.
There are two critical features that make a CLI program
well-behaved. First, the program should accept parameters (options and arguments) in a
standard manner. Second, the program should generally limit output to
the standard output and standard error files created by the operating
system. Any other files should be written only at the user’s request,
possibly after interactive confirmation. It’s bad behavior to silently
overwrite a file.
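A minimal sketch of the second feature follows. The function name summarize and its messages are hypothetical; the point is only that results go to standard output, complaints go to standard error, and the return value can serve as the status code.

```python
import sys


def summarize(values, out=sys.stdout, err=sys.stderr):
    """Write the result to standard output and problems to standard error."""
    good = []
    for text in values:
        try:
            good.append(float(text))
        except ValueError:
            # Diagnostics never get mixed into the real output.
            err.write("skipping bad value: %r\n" % (text,))
    out.write("total: %.2f\n" % sum(good))
    # Zero means success; nonzero tells the shell something went wrong.
    return 0 if len(good) == len(values) else 1


status = summarize(["1.5", "oops", "2.5"])
```

Because the two streams are separate, a user can redirect the total into a file and still see the complaint about the bad value on the terminal.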
The standard handling of command-line parameters is given as 13 rules
for UNIX commands, as shown in the intro section of UNIX man pages. These
rules describe the program name (rules 1-2), simple
options (rules 3-5), options that take argument
values (rules 6-8), the placement of operands relative to the
options (rules 9 and 10), and the ordering of options and operands,
including the special - operand (rules 11-13).
- The program name should be between two and nine characters. This
is consistent with most file systems where the program name is a file
name. In the Python environment, the program file is typically the
program name plus an extension of .py. Example:
python, idle.py.
- The program name should include only lower-case letters and
digits. The objective is to keep names relatively simple and easy to
type correctly. Mixed-case names and names with punctuation marks can
introduce difficulties in typing the program name correctly.
- Option names should be one character long. This is difficult to
achieve in complex programs. Often, options have two forms: a
single-character short form and a multi-character long form. Example:
ls -a, rm -i *.pyc.
- Single-character options are preceded by -.
Multiple-character options are preceded by --. All
options have a flag that indicates that this is an option, not an
operand. Single character options, again, are easier to type, but may
be hard to remember for new users of a program.
- Options with no arguments may be grouped after a single
-. This allows a series of one-character options to be
given in a simple cluster. Example: ls -ldai clusters the
-l, -d, -a and -i options.
- Options that accept an argument value use a space separator. The
option arguments are not run together with the option. Without this
rule, it might be difficult to tell an option cluster from an option
with arguments. Example: cut -d s supplies an argument value of
s for the -d option.
- The argument value to an option cannot be optional. If an option
requires an argument value, presence of the option means that an
argument value will follow. The option is already optional; having an
optional argument doesn’t make much sense.
- Groups of option-arguments following an option must be a single
word; either separated by commas or quoted. A space would mean another
option or the beginning of the operands. Example: -d "9,10,56":
three numbers separated by commas form the argument
value for the -d option.
- All options must precede any operands on the command line. This
basic principle assures a simple, easy to understand uniformity to
command processing.
- The string -- may be used to indicate the end of
the options. This is particularly important when any of the operands
begin with - and might be mistaken for an option.
- The order of the options relative to one another should not
matter. Generally, a program should absorb all of the options to set
up the processing. Example: ls -l -a is the
same as ls -a -l and ls -la.
- The relative order of the operands may be significant. This
depends on what the operands mean and what the program does. The
operands are often file names, and the order in which the files are
processed may be significant.
- The operand - preceded and followed by a space
character should only be used to mean standard input. This may be
passed as an operand, to indicate that the standard input file is
processed at this time. Example: cat file1 - file2 will
process file1, standard input and
file2 in that order.
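Several of these rules can be seen in action with the standard getopt module. The argument list below is made up for illustration; the option string "lad:" declares -l and -a as simple options and -d as an option that requires an argument value.

```python
import getopt

# A made-up command line: a cluster (-la), an option with a
# comma-separated argument value, the -- marker, and two operands,
# one of which begins with - and one of which is the lone - operand.
argv = ["-la", "-d", "9,10,56", "--", "-odd-name", "-"]

options, operands = getopt.getopt(argv, "lad:")

print(options)   # list of (option, value) pairs
print(operands)  # everything after the options
```

The cluster is split into separate -l and -a options, -d picks up the single word that follows it, and the -- marker keeps -odd-name from being mistaken for an option.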
Parsing Command-Line Options.
These rules are handled by the getopt module, the
optparse module and the sys.argv variable in the
sys module.
Important
But Wait! This is fine for GNU/Linux, but what about Windows?
Windows programmers have several choices. The most common solution
is to use the UNIX rules. They are compatible with Windows, simple and –
most important – standardized by POSIX. This means that your program
will use the - character for options, where the
Microsoft-supplied programs will use /. How often do you
use the Microsoft-supplied programs?
Another choice is to extend the getopt or
optparse modules to handle Windows punctuation
rules. This would allow you to seamlessly fit with the Microsoft
command-line programs.
And, of course, you can always write your own option parser that
looks for arguments which begin with /.
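A home-grown parser of this kind might start out like the following sketch. The function name split_windows_args and the choice of a dictionary for the options are our own assumptions; this handles only the simple /name and /name:value forms, not the full range of Windows conventions.

```python
def split_windows_args(argv):
    """Separate /name and /name:value options from the operands."""
    options = {}
    operands = []
    for arg in argv:
        if arg.startswith("/") and len(arg) > 1:
            # "/o:s" becomes name "o" with value "s"; "/w" has no value.
            name, _, value = arg[1:].partition(":")
            options[name.lower()] = value if value else True
        else:
            operands.append(arg)
    return options, operands


options, operands = split_windows_args(
    ["/o:s", "/W", "C:\\Documents and Settings"])
print(options)
print(operands)
```

Note that this approach would misread a GNU/Linux path such as /usr as an option, which is one reason the UNIX rules are usually the simpler choice.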
The command line arguments used to start Python are put into the
sys.argv variable of the sys module as a sequence of strings.
For example, suppose we run the following command.
python casinosim.py -g craps
The operating system (Linux or Windows) sees the
python command and runs the Python interpreter, passing
the remaining arguments to the Python interpreter as a list of strings:
["casinosim.py", "-g", "craps"].
The first operand to the Python interpreter is always the top-level
script to run. Python sets __name__ to
"__main__" and executes the file,
casinosim.py. The script name and the other argument values are placed
into sys.argv, with the script name as sys.argv[0].
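A small sketch shows how a program might pick sys.argv apart. The helper name split_argv is hypothetical, and the list passed to it simulates the value sys.argv would have for the command above; a real script would pass sys.argv itself.

```python
def split_argv(argv):
    """Separate the script name (argv[0]) from the values that follow it."""
    return argv[0], list(argv[1:])


# Simulated sys.argv for: python casinosim.py -g craps
script, values = split_argv(["casinosim.py", "-g", "craps"])
print(script)
print(values)
```

The values after the script name are what an option parser works through.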
Overview of optparse.
First, of course, we have to think about our main program and how we want
to use it. Once we’ve figured out the arguments and options, we can
then use optparse to transform the arguments in sys.argv
into options and arguments our program can use.
The optparse module parses the command-line options in a three-step process.
- Create an empty parser.
- Define the options that this parser will handle.
- Parse the arguments. This gives you a tuple with two objects. One object
has the options as attributes. The other object is a list of the arguments
that followed the options.
Once we have the options and arguments, we can then do the real work of our program.
Parameter Parsing.
Let’s say we polished up some of our exercises to create a
complete program with the following synopsis.
portfolio.py -v -h -d mm/dd/yy -s symbol file
This program has the following options and operand.
- -v
  Verbosity. This can be repeated to increase the detail of the logging.
- -h
  Help. Provides a summary of portfolio.py.
- -d mm/dd/yy
  A particular sale date at which to evaluate the portfolio.
- -s symbol
  A particular symbol to select from the portfolio.
- file
  The name of a file with the portfolio data in CSV format.
These options can be processed as follows:
import optparse
parser = optparse.OptionParser()
# -h is automatically added by default
parser.add_option("-v", action="count", dest="verbosity")
parser.add_option("-d", action="store", dest="date")
parser.add_option("-s", action="store", dest="symbol")
options, filenames = parser.parse_args()
# options.verbosity is the count of -v options
# options.date is a string that must be further parsed
# options.symbol is a symbol string
# filenames is a list of files to process
Often, this option processing is packaged into a function called
main().
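One way to package this, sketched under our own assumptions: build_parser and the argv parameter of main() are hypothetical names, the default of 0 for verbosity is our choice, and we assume the date uses a two-digit year as in the synopsis. Passing the argument list in explicitly makes the function easy to try out without a real command line.

```python
import datetime
import optparse


def build_parser():
    """Define the options for the hypothetical portfolio.py program."""
    parser = optparse.OptionParser(
        usage="%prog [-v] [-d mm/dd/yy] [-s symbol] file")
    parser.add_option("-v", action="count", dest="verbosity", default=0)
    parser.add_option("-d", action="store", dest="date")
    parser.add_option("-s", action="store", dest="symbol")
    return parser


def main(argv=None):
    """Parse argv (sys.argv[1:] when argv is None) and return the settings."""
    options, filenames = build_parser().parse_args(argv)
    sale_date = None
    if options.date is not None:
        # The date arrives as a string and must be parsed further.
        sale_date = datetime.datetime.strptime(
            options.date, "%m/%d/%y").date()
    return options.verbosity, sale_date, options.symbol, filenames


settings = main(["-vv", "-d", "02/14/09", "-s", "AAPL", "portfolio.csv"])
print(settings)
```

In a finished script, the last two lines would be replaced by the main program switch calling main() with no arguments, so that the real command line is parsed.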
Formal Definitions.
Here are some formal definitions for parts of optparse.
- optparse.OptionParser(...) → OptionParser
Create a parser with the default options of -h and --help that provide
help on the command.
You can also override the program name, version number,
usage text and description that optparse will deduce from the
context in which it’s run.
You can provide the argument value of add_help_option=False to suppress
creating the -h and --help options.
If you provide version="someString", this will automatically add
a --version option that displays the version number.
- class optparse.OptionParser
- parser.add_option(option_string, action, ...)
Add an option to the parser. You can provide any combination of
short or long option strings of the form "-o" or "--option".
You must provide at least one; you can provide both.
The keyword parameter action is essential for determining
what is to be done with that option.
Most options have a dest, which names the destination attribute
of the options object that gets created.
You’ll define the option with a collection of keyword arguments.
There are a number of common cases.
- Positive Flags. add_option( "-f", "--flag", action="store_true", dest="flag", default=False )
In this case options.flag will be created and set to True when -f is present.
- Negative Flags. add_option( "-f", "--flag", action="store_false", dest="flag", default=True )
In this case options.flag will be created and set to False when -f is present.
- Options with String Values. add_option( "-s", "--string", action="store", dest="string", type="string")
In this case options.string will be created and set to the value of the -s option.
- Options with Numeric Values. add_option( "-i", "--int", action="store", dest="int", type="int")
In this case options.int will be created and set to the integer value of the -i option.
- Options that are Objects. add_option( "-c", "--command", action="store_const", const=SomeObject, dest="command")
In this case options.command will be created and set to the value SomeObject.
- Options that are Counted. add_option( "-v", "--verbose", action="count", dest="verbosity")
In this case options.verbosity will be created and set to the number of -v options present.
- parser.parse_args() → options, arguments
Parse the sys.argv options and arguments, creating
an options object with all of the options and an
arguments list with all of the argument strings.
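The common cases can be tried together in one small sketch. The option letters and destination names here are our own choices for illustration (for example, -q with dest="chatty" stands in for the negative flag so it doesn't collide with -f), the string "craps" stands in for SomeObject, and the argument list is made up.

```python
import optparse

parser = optparse.OptionParser()
parser.add_option("-f", "--flag", action="store_true",
                  dest="flag", default=False)
parser.add_option("-q", "--quiet", action="store_false",
                  dest="chatty", default=True)
parser.add_option("-s", "--string", action="store",
                  dest="string", type="string")
parser.add_option("-i", "--int", action="store",
                  dest="int", type="int")
parser.add_option("-c", "--command", action="store_const",
                  const="craps", dest="command")
parser.add_option("-v", "--verbose", action="count",
                  dest="verbosity", default=0)

# Pass an explicit list instead of relying on sys.argv.
options, arguments = parser.parse_args(
    ["-f", "-q", "-s", "abc", "--int", "42", "-c", "-vvv",
     "file1", "file2"])

print(options.flag, options.chatty, options.string, options.int)
print(options.command, options.verbosity, arguments)
```

Each option lands in the attribute named by its dest, and the operands that follow the options come back as a plain list.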
BTW – The exec Statement
The import statement, in effect, executes the
module file. Typically, a library-oriented module is a simple sequence of definitions.
The import statement executes all of those definitions.
It also creates a module object. Different
variations on import add to this by introducing different names into the global namespace.
Further, Python also
optimizes the modules brought in by the import statement so that they are only imported once.
The exec statement is similar to import, except it does not create
a module object. Consequently, it doesn’t do any optimization to execute a module
file just once.
The exec statement executes a suite of Python statements.
exec expression
The expression can be an open file (created with the open() function),
a string value which contains Python language statements, as well as a
code object created by the compile() function.
Additionally, this form of the exec statement executes in a given namespace.
exec expression in namespace
The namespace is a dictionary that will be used for any global
variables created by the statements executed.
>>> code="""a= 3
... b= 5
... c= a*b
... """
>>> a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined
>>> results = {}
>>> exec code in results
>>> results['a']
3
>>> results['c']
15
The functions eval() and execfile() do similar things.
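The session above uses the statement form of exec. As a hedged sketch, here is the same experiment written in the function-call form, exec(code, namespace), which is the form a Python 3 interpreter requires; the variable names follow the session above.

```python
code = """a = 3
b = 5
c = a * b
"""

# The dictionary receives every global variable the executed code creates.
results = {}
exec(code, results)

print(results["a"])
print(results["c"])
```

The variables a, b and c live only in the results dictionary; they are not added to the namespace of the program that called exec.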
Warning
These are potentially dangerous tools. These break something we call the
Fundamental Assumption: the source you are reading is
the source that is being executed. A program that uses the
exec statement or eval() function
is incorporating other source statements into the program dynamically.
This can be hard to follow, maintain or enhance.
Generally, the exec statement is something that
should be avoided. There are almost always more suitable solutions that
involve extensible class design patterns.