pyWeb 2.1

In Python, Yet Another Literate Programming Tool

Table of Contents

Introduction
To Do
Change Log
Architecture and Design Overview
Implementation
Indices

Introduction

Literate programming was pioneered by Knuth as a method for developing readable, understandable presentations of programs. These would present a program in a literate fashion for people to read and understand; this would be in parallel with presentation as source text for a compiler to process and both would be generated from a common source file.

One intent is to synchronize the program source with the documentation about that source. If the program and the documentation have a common origin, then the traditional gaps between intent (expressed in the documentation) and action (expressed in the working program) are significantly reduced.

pyWeb is a literate programming tool that combines the actions of weaving a document with tangling source files. It is independent of any particular document markup or source language. It uses a simple set of markup tags to define chunks of code and documentation.

Background

The following is an almost verbatim quote from Briggs' nuweb documentation, and provides an apt summary of Literate Programming.

In 1984, Knuth introduced the idea of literate programming and described a pair of tools to support the practise (Donald E. Knuth, Literate Programming, The Computer Journal 27 (1984), no. 2, 97-111.) His approach was to combine Pascal code with TeX documentation to produce a new language, WEB, that offered programmers a superior approach to programming. He wrote several programs in WEB, including weave and tangle, the programs used to support literate programming. The idea was that a programmer wrote one document, the web file, that combined documentation written in TeX (Donald E. Knuth, The TeXbook, Computers and Typesetting, 1986) with code (written in Pascal).

Running tangle on the web file would produce a complete Pascal program, ready for compilation by an ordinary Pascal compiler. The primary function of tangle is to allow the programmer to present elements of the program in any desired order, regardless of the restrictions imposed by the programming language. Thus, the programmer is free to present his program in a top-down fashion, bottom-up fashion, or whatever seems best in terms of promoting understanding and maintenance.

Running weave on the web file would produce a TeX file, ready to be processed by TeX. The resulting document included a variety of automatically generated indices and cross-references that made it much easier to navigate the code. Additionally, all of the code sections were automatically prettyprinted, resulting in a quite impressive document.

Knuth also wrote the programs for TeX and METAFONT entirely in WEB, eventually publishing them in book form. These are probably the largest programs ever published in a readable form.

Other Tools

Numerous tools have been developed based on Knuth's initial work. A relatively complete survey is available at sites like Literate Programming, and the OASIS XML Cover Pages: Literate Programming with SGML and XML.

The immediate predecessors to this pyWeb tool are FunnelWeb, noweb and nuweb. The ideas lifted from these other tools created the foundation for pyWeb.

There are several Python-oriented literate programming tools. These include LEO, interscript, lpy, py2html, PyLit.

The FunnelWeb tool is independent of any programming language and only mildly dependent on TeX. It has 19 commands, many of which duplicate features of HTML or LaTeX.

The noweb tool was written by Norman Ramsey. This tool uses a sophisticated multi-processing framework, via Unix pipes, to permit flexible manipulation of the source file to tangle and weave the programming language and documentation markup files.

The nuweb Simple Literate Programming Tool was developed by Preston Briggs (preston@tera.com). His work was supported by ARPA, through ONR grant N00014-91-J-1989. It is written in C, and very focused on producing LaTeX documents. It can produce HTML, but this is clearly added after the fact. It cannot be easily extended, and is not object-oriented.

The LEO tool is a structured GUI editor for creating source. It uses XML and noweb-style chunk management. It is more than a simple weave and tangle tool.

The interscript tool is very large and sophisticated, but doesn't gracefully tolerate HTML markup in the document. It can create a variety of markup languages from the interscript source, making it suitable for creating HTML as well as LaTeX.

The lpy tool can produce very complex HTML representations of a Python program. It works by locating documentation markup embedded in Python comments and docstrings. This is called "inverted literate programming".

The py2html tool does very sophisticated syntax coloring.

The PyLit tool is perhaps the very best approach to simple literate programming, since it leverages an existing lightweight markup language and its output formatting.

pyWeb

pyWeb works with any programming language and any markup language. This philosophy comes from FunnelWeb, noweb, nuweb and interscript. The primary differences between pyWeb and other tools are the following.

pyWeb is object-oriented, permitting easy extension. noweb extensions are separate processes that communicate through a sophisticated protocol. nuweb is not easily extended without rewriting and recompiling the C programs.
pyWeb is built in the very portable Python programming language. This allows it to run anywhere that Python 2.6 runs, with no additional tool or compiler dependencies. This makes it a useful tool for programmers in any language.
pyWeb is much simpler than FunnelWeb, LEO or Interscript. It has a very limited selection of commands, but can still produce complex programs and HTML documents.
pyWeb does not invent its own markup language like Interscript. Because Interscript has its own markup, it can generate LaTeX or HTML or other output formats from a unique input format. While powerful, it seems simpler to avoid inventing yet another sophisticated markup language. The language pyWeb uses is very simple, and authors use their preferred markup language almost exclusively.
pyWeb supports the forward literate programming philosophy, where a source document creates programming language and markup language. The alternative, deriving the document from markup embedded in program comments ("inverted literate programming"), seems less appealing. The disadvantage of inverted literate programming is that the final document can't reflect the original author's preferred order of exposition, since that information generally isn't part of the source code.
pyWeb also specifically rejects some features of nuweb and FunnelWeb. These include the macro capability with parameter substitution, and multiple references to a chunk. These two capabilities can be used to grow object-like applications from non-object programming languages (e.g. C or Pascal). Since most modern languages (Python, Java, C++) are object-oriented, this macro capability is more of a problem than a help.
Since pyWeb is built in the Python interpreter, a source document can include Python expressions that are evaluated during weave operation to produce time stamps, source file descriptions or other information in the woven or tangled output.

pyWeb works with any programming language and any markup language. The initial release supports HTML and LaTeX via simple templates.

The following is extensively quoted from Briggs' nuweb documentation, and provides an excellent background in the advantages of the very simple approach started by nuweb and adopted by pyWeb.

The need to support arbitrary programming languages has many consequences:

No prettyprinting: Both WEB and CWEB are able to prettyprint the code sections of their documents because they understand the language well enough to parse it. Since we want to use any language, we've got to abandon this feature. However, we do allow particular individual formulas or fragments of LaTeX or HTML code to be formatted and still be part of the output files.
Limited index of identifiers: Because WEB knows about Pascal, it is able to construct an index of all the identifiers occurring in the code sections (filtering out keywords and the standard type identifiers). Unfortunately, this isn't as easy in our case. We don't know what an identifier looks like in each language and we certainly don't know all the keywords. We provide a mechanism to mark identifiers, and we use a pretty standard pattern for recognizing identifiers in most programming languages.

Of course, we've got to have some compensation for our losses or the whole idea would be a waste. Here are the advantages I [Briggs] can see:

Simplicity: The majority of the commands in WEB are concerned with control of the automatic prettyprinting. Since we don't prettyprint, many commands are eliminated. A further set of commands is subsumed by LaTeX and may also be eliminated. As a result, our set of commands is reduced to only about seven members (explained in the next section). This simplicity is also reflected in the size of this tool, which is quite a bit smaller than the tools used with other approaches.
No prettyprinting: Everyone disagrees about how their code should look, so automatic formatting annoys many people. One approach is to provide ways to control the formatting. Our approach is simpler -- we perform no automatic formatting and therefore allow the programmer complete control of code layout.
Control: We also offer the programmer reasonably complete control of the layout of his output files (the files generated during tangling). Of course, this is essential for languages that are sensitive to layout; but it is also important in many practical situations, e.g., debugging.
Speed: Since [pyWeb] doesn't do too much, it runs very quickly. It combines the functions of tangle and weave into a single program that performs both functions at once.
Chunk numbers: Inspired by the example of noweb, [pyWeb] refers to all program code chunks by a simple, ascending sequence number through the file. This becomes the HTML anchor name, also.
Multiple file output: The programmer may specify more than one output file in a single [pyWeb] source file. This is required when constructing programs in a combination of languages (say, Fortran and C). It's also an advantage when constructing very large programs.

Use Cases

pyWeb supports two use cases, Tangle Source Files and Weave Documentation. These are often combined into a single request of the application that will both weave and tangle.

Tangle Source Files

A user initiates this process when they have a complete .w file that contains a description of source files. These source files are described with @o commands in the .w file.

The use case is successful when the source files are produced.

Outside this use case, the user will debug those source files, possibly updating the .w file. This will lead to a need to restart this use case.

The use case is a failure when the source files cannot be produced, due to errors in the .w file. These must be corrected based on information in log messages.

The sequence is simply ./pyweb.py theFile.w.

Weave Documentation

A user initiates this process when they have a .w file that contains a description of a document to produce. The document is described by the entire .w file.

The use case is successful when the documentation file is produced.

Outside this use case, the user will edit the documentation file, possibly updating the .w file. This will lead to a need to restart this use case.

The use case is a failure when the documentation file cannot be produced, due to errors in the .w file. These must be corrected based on information in log messages.

The sequence is simply ./pyweb.py theFile.w.

Tangle, Regression Test and Weave

A user initiates this process when they have a .w file that contains a description of a document to produce. The document is described by the entire .w file. Further, their final document should include regression test output from the source files created by the tangle operation.

The use case is successful when the documentation file is produced, including current regression test output.

Outside this use case, the user will edit the documentation file, possibly updating the .w file. This will lead to a need to restart this use case.

The use case is a failure when the documentation file cannot be produced, due to errors in the .w file. These must be corrected based on information in log messages.

The use case is a failure when the documentation file does not include current regression test output.

The sequence is as follows:

./pyweb.py -xw -pi theFile.w
python theTest >aLog
./pyweb.py -xt theFile.w

The first step excludes weaving and permits errors on the @i command. The -pi option is necessary in the event that the log file does not yet exist. The second step runs the regression test, creating a log file. The third step weaves the final document, including the regression test output.
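
The three steps above could be driven from a small script. The following is only a sketch, assuming Python 3 and that pyweb.py is executable in the current directory; the function names regression_weave_steps and run_steps are hypothetical and are not part of pyWeb.

```python
import subprocess
import sys


def regression_weave_steps(web_file, test_script, log_file):
    """Return the three steps as (argv, stdout_path) pairs.

    stdout_path is None when the step's output is not redirected.
    """
    return [
        # Step 1: tangle only (-xw) and tolerate a missing @i file (-pi).
        (["./pyweb.py", "-xw", "-pi", web_file], None),
        # Step 2: run the regression test, capturing its output as the log.
        ([sys.executable, test_script], log_file),
        # Step 3: weave only (-xt), now that the log file exists.
        (["./pyweb.py", "-xt", web_file], None),
    ]


def run_steps(steps):
    """Run each step in order, stopping at the first failure."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.check_call(argv)
        else:
            with open(stdout_path, "w") as log:
                subprocess.check_call(argv, stdout=log)


# Example call (not executed here):
# run_steps(regression_weave_steps("theFile.w", "theTest", "aLog"))
```

Because check_call raises on a non-zero exit status, a failed tangle or a failed test stops the sequence before a stale log can be woven into the document.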

Writing pyWeb .w Files

The input to pyWeb is a .w file that consists of a series of Chunks. Each Chunk is either program source code to be tangled or it is documentation to be woven. The bulk of the file is typically documentation chunks that describe the program in some human-oriented markup language like HTML or LaTeX.

The pyWeb tool parses the input, and performs the tangle and weave operations. It tangles each individual output file from the program source chunks. It weaves a final documentation file from the entire sequence of chunks provided, mixing the author's original documentation with some markup around the embedded program source.

pyWeb defines a very simple markup system in which the code chunks are surrounded with tags. The tags are used to assemble the tangled output into the requested file(s). The tags are replaced with markup so that a resulting woven document will process correctly through a browser or LaTeX tool.

The non-code chunks are not marked up in any way. Everything that's not explicitly a code chunk is simply output without modification.

All of the pyWeb tags begin with @. This can be changed with the -c command line option.

The Structural tags (historically called "major commands") partition the input and define the various chunks. The Inline tags (historically called "minor commands") are used to control the woven and tangled output from those chunks.

Structure Tags

There are two definitional tags; these define the various chunks in an input file.

@o file @{ text @}: The @o (output) command defines a named output file chunk. The text is tangled to the named file with no alteration. It is woven into the document in an appropriate fixed-width font.
@d name @{ text @}: The @d (define) command defines a named chunk of program source. This text is tangled or woven when it is referenced by the reference inline tag.

Each @o and @d tag is followed by a chunk which is delimited by @{ and @} tags. At the end of that chunk, there is an optional "major" tag.

@|: A chunk may define user identifiers. The list of defined identifiers is placed in the chunk, separated by the @| separator.

Additionally, these tags provide for the inclusion of additional input files. This is necessary for decomposing a long document into easy-to-edit sections.

@i file: The @i (include) command includes another file. The previous chunk is ended. The file is processed completely, then a new chunk is started for the text after the @i command.

All material that is not explicitly in a @o or @d named chunk is implicitly collected into a sequence of anonymous document source chunks. These anonymous chunks form the backbone of the document that is woven. The anonymous chunks are never tangled into output program source files. They are woven into the document without any alteration.

Note that white space (line breaks ('\n'), tabs and spaces) has no effect on the input parsing. It is completely preserved on output.

The following example has three chunks: an anonymous chunk of documentation, a named output chunk, and an anonymous chunk of documentation.


<p>Some HTML documentation that describes the following piece of the
program.</p>
@o myFile.py 
@{
import math
print math.pi
@| math math.pi
@}
<p>Some more HTML documentation.</p>

Inline Tags

There are several tags that are replaced by content in the woven output.

@@: The @@ command creates a single @ in the output file. This is replaced in tangled as well as woven output.
@<name@>: The name references a named chunk. When tangling, the referenced chunk replaces the reference command. When weaving, a reference marker is used. For example, in HTML, this can be replaced with <A HREF=...> markup. Note that the indentation of the @< tag is preserved for the tangled chunk that replaces the tag.
@(Python expression@): The Python expression is evaluated and the result is tangled or woven in place. A few global variables and modules are available. These are described below.

There are three index creation tags that are replaced by content in the woven output.

@f: The @f command inserts a file cross reference. This lists the name of each file created by an @o command, and all of the various chunks that are concatenated to create this file.
@m: The @m command inserts a named chunk ("macro") cross reference. This lists the name of each chunk created by an @d command, and all of the various chunks that are concatenated to create the complete chunk.
@u: The @u command inserts a user identifier cross reference. This lists each identifier defined after an @| separator, along with the chunks in which that identifier is defined or used.

Document Overhead

The documents generally need some minimal overheads to work correctly.

The RST weaver requires that you have .. include:: <isoamsa.txt>

The LaTeX weaver requires that you have \usepackage{fancyvrb}

Additional Features

The named chunks (from both @o and @d commands) are assigned unique sequence numbers to simplify cross references. In LaTeX it is possible to determine the page breaks and assign the sequence numbers based on the physical pages.

Chunk names and file names are case sensitive.

Chunk names can be abbreviated. A partial name can have a trailing ellipsis (...); this will be resolved to the full name. The most typical use for this is shown in the following example.


<p>Some HTML documentation.</p>
@o myFile.py 
@{
@<imports of the various packages used@>
print math.pi,time.time()
@}
<p>Some notes on the packages used.</p>
@d imports...
@{
import math,time
@| math time
@}
<p>Some more HTML documentation.</p>

An anonymous chunk of documentation.
A named chunk that tangles the myFile.py output. It has a reference to the imports of the various packages used chunk. Note that the full name of the chunk is essentially a line of documentation, traditionally done as a comment line in a non-literate programming environment.
An anonymous chunk of documentation.
A named chunk with an abbreviated name. The imports... matches the complete name. Set off after the @| separator is the list of identifiers defined in this chunk.
An anonymous chunk of documentation.

Note that the first time a name appears (in a reference or definition), it must be the full name. All subsequent uses can be elisions. Also note that ambiguous elision is an annoying problem when you first start creating a document.
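
The resolution rule for abbreviated names can be sketched as a prefix match against the full names seen so far. This is an illustration of the rule as described here, not pyWeb's actual code; the function resolve_name and its error messages are hypothetical.

```python
def resolve_name(name, known_names):
    """Resolve a possibly abbreviated chunk name against known full names.

    A name ending in '...' is treated as a prefix. It must match exactly
    one full name; zero or several matches are reported as errors.
    """
    if not name.endswith("..."):
        return name  # Already a full name.
    prefix = name[:-3].rstrip()
    matches = [full for full in known_names if full.startswith(prefix)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise KeyError("no chunk name starts with %r" % prefix)
    raise KeyError("ambiguous abbreviation %r: %s" % (prefix, sorted(matches)))


known = ["imports of the various packages used", "body of the aFunction"]
full = resolve_name("imports...", known)
```

Under this rule, two chunk names that share a leading word, such as "imports of the standard library" and "imports of the various packages used", make the elision imports... ambiguous. That is the situation the note above warns about.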

Named chunks are concatenated from their various pieces. This allows a named chunk to be broken into several pieces, simplifying the description. This is most often used when producing fairly complex output files.


<p>An anonymous chunk with some HTML documentation.</p>
@o myFile.py 
@{
import math,time
@}
<p>Some notes on the packages used.</p>
@o myFile.py
@{
print math.pi,time.time()
@}
<p>Some more HTML documentation.</p>

An anonymous chunk of documentation.
A named chunk that tangles the myFile.py output. It has the first part of the file. In the woven document this is marked with "=".
An anonymous chunk of documentation.
A named chunk that also tangles the myFile.py output. This chunk's content is appended to the first chunk. In the woven document this is marked with "+=".
An anonymous chunk of documentation.
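
The concatenation rule can be illustrated with a few lines of Python. This is a sketch under simplifying assumptions: chunks are modeled as plain (name, text) pairs, and the helper names collect_named_chunks and weave_marker are hypothetical.

```python
from collections import OrderedDict


def collect_named_chunks(chunks):
    """Group (name, text) pairs by name, keeping definition order."""
    named = OrderedDict()
    for name, text in chunks:
        named.setdefault(name, []).append(text)
    return named


def weave_marker(position):
    """The first piece of a name is marked '=', later pieces '+='."""
    return "=" if position == 0 else "+="


# The two @o myFile.py chunks from the example, newlines included.
pieces = [
    ("myFile.py", "\nimport math,time\n"),
    ("myFile.py", "\nprint math.pi,time.time()\n"),
]
named = collect_named_chunks(pieces)
tangled = "".join(named["myFile.py"])
markers = [weave_marker(i) for i in range(len(named["myFile.py"]))]
```

The tangled file is the pieces joined in the order they appear in the .w file, so the author controls the final layout by ordering the chunks.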

Newline characters are preserved on input. Because of this the output may appear to have excessive newlines. In all of the above examples, each named chunk was defined with the following.


@{
import math,time
@}

This puts a newline character before and after the import line.

One transformation is performed when tangling output. The indentation of a chunk reference is applied to the entire chunk. This makes it simpler to prepare source for languages (like Python) where indentation is important. It also gives the author control over how the final tangled output looks.

In the following example, note also that the myFile.py chunk uses the @| command to show that it defines the identifier aFunction.


<p>An anonymous chunk with some HTML documentation.</p>
@o myFile.py 
@{
def aFunction( a, b ):
    @<body of the aFunction@>
@| aFunction @}
<p>Some notes on the packages used.</p>
@d body...
@{
"""doc string"""
return a + b
@}
<p>Some more HTML documentation.</p>

The tangled output from this will look like the following. All of the newline characters are preserved, and the reference to body of the aFunction is indented to match the prevailing indent where it was referenced. In the following example, explicit line markers of ~ are provided to make the blank lines more obvious.


~
~def aFunction( a, b ):
~        
~    """doc string"""
~    return a + b
~
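
The indentation rule can be sketched as a small recursive expansion. This is a simplified illustration, not pyWeb's implementation: it assumes each @< reference sits alone on its line, it looks names up by their full text, and it leaves whitespace-only lines unindented, so those lines may differ from the tool's real output. The function tangle and the pattern REFERENCE are hypothetical names.

```python
import re

# A reference alone on a line: leading indent, @<name@>, optional spaces.
REFERENCE = re.compile(r"([ \t]*)@<(.+?)@>\s*$")


def tangle(text, chunks, indent=""):
    """Expand references in text, indenting each referenced chunk."""
    out = []
    for line in text.split("\n"):
        match = REFERENCE.match(line)
        if match:
            ref_indent, name = match.group(1), match.group(2)
            # The referenced chunk inherits the indent of the reference.
            out.append(tangle(chunks[name], chunks, indent + ref_indent))
        else:
            # Blank lines are passed through without added indentation.
            out.append(indent + line if line.strip() else line)
    return "\n".join(out)


chunks = {"body of the aFunction": '\n"""doc string"""\nreturn a + b\n'}
output_chunk = "\ndef aFunction( a, b ):\n    @<body of the aFunction@>\n"
result = tangle(output_chunk, chunks)
```

Because the indent is accumulated through the recursion, a reference nested inside an already indented chunk is indented by the sum of both offsets.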

There are two possible implementations for evaluation of a Python expression in the input.

Create an ExpressionCommand, and append this to the current Chunk. This will allow evaluation during weave processing and during tangle processing. This makes the entire weave (or tangle) context available to the expression, including completed cross reference information.
Evaluate the expression during input parsing, and append the resulting text as a TextCommand to the current Chunk. This provides a common result available to both weave and tangle, but the only context available is the WebReader and the incomplete Web, built up to that point.

In this implementation, we adopt the latter approach, and evaluate expressions immediately. A simple global context is created with the following variables defined.

time: This is the standard time module.
os: This is the standard os module.
theLocation: A tuple with the file name, first line number and last line number for the original expression's location
theWebReader: The WebReader instance doing the parsing.
thisApplication: The name of the running pyWeb application.
__version__: The version string in the pyWeb application.
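
Immediate evaluation with this context might look like the following sketch. The variable names in the dictionary come from the list above; the function evaluate_expression, its parameters, and the placeholder values "pyweb.py" and "2.1" are assumptions made for illustration.

```python
import os
import time


def evaluate_expression(expression, file_name, line_number, reader=None):
    """Evaluate an @(...@) expression and return the text to insert."""
    context = {
        "time": time,
        "os": os,
        # File name, first line and last line of the expression.
        "theLocation": (file_name, line_number, line_number),
        "theWebReader": reader,
        "thisApplication": "pyweb.py",  # placeholder value
        "__version__": "2.1",           # placeholder value
    }
    # eval() runs arbitrary code, so this suits only trusted .w files.
    return str(eval(expression, context))


stamp_year = evaluate_expression("time.strftime('%Y')", "theFile.w", 12)
location = evaluate_expression("theLocation[0]", "theFile.w", 12)
```

Since the result is converted to text at parse time, both the woven and tangled outputs see the same value, which is the property that motivated the second approach.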

Running pyWeb to Tangle and Weave

Assuming that you have marked pyweb.py as executable, you do the following.

./pyweb.py file...

This will tangle the @o commands in each file. It will also weave the output, and create file.html.

Command Line Options

Currently, the following command line options are accepted.

-v: Verbose logging. The default is changed by updating the constructor for theLog from Logger(standard) to Logger(verbose).
-s: Silent operation. The default is changed by updating the constructor for theLog from Logger(standard) to Logger(silent).
-c x: Change the command character from @ to x. The default is changed by updating the constructor for theWebReader from WebReader(f,'@') to WebReader(f,'x').
-w weaver: Choose a particular documentation weaver, for instance 'rst', 'html', 'latex'. The default is based on the first few characters of the input file. The default is changed by updating the language determination call in the application main function from l= w.language() to l= HTML().
-xw: Exclude weaving. This does tangling of source program files only.
-xt: Exclude tangling. This does weaving of the document file only.
-pcommand: Permit errors in the given list of commands. The most common version is -pi to permit errors in locating an include file. This is done in the following scenario: pass 1 uses -xw -pi to exclude weaving and permit include-file errors; the tangled program is run to create test results; pass 2 uses -xt to exclude tangling and include the test results.
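
One way to read these options is that -x and -p each take an attached value (w or t for -x, a list of command letters for -p). The following sketch parses them with the standard optparse module under that reading. The destination names (verbosity, command, weaver, exclude, permit) and the function build_parser are hypothetical, and pyWeb's own parser may be organized differently.

```python
from optparse import OptionParser


def build_parser():
    """Build a parser for the options listed above."""
    parser = OptionParser(usage="%prog [options] file...")
    parser.set_defaults(verbosity="standard")
    parser.add_option("-v", dest="verbosity", action="store_const",
                      const="verbose", help="verbose logging")
    parser.add_option("-s", dest="verbosity", action="store_const",
                      const="silent", help="silent operation")
    parser.add_option("-c", dest="command", default="@",
                      help="command character")
    parser.add_option("-w", dest="weaver", default=None,
                      help="weaver name, for instance rst, html, latex")
    # -xw and -xt are parsed as -x with an attached value.
    parser.add_option("-x", dest="exclude", action="append",
                      type="choice", choices=["w", "t"],
                      help="exclude weaving (w) or tangling (t)")
    # -pi is parsed as -p with the attached value 'i'.
    parser.add_option("-p", dest="permit", default="",
                      help="commands whose errors are permitted")
    return parser


options, files = build_parser().parse_args(["-xw", "-pi", "theFile.w"])
excluded = options.exclude or []
```

With this layout, the first pass of the regression scenario yields an exclusion list containing w and a permit string containing i, while the input files are left in the positional list.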

Restrictions

pyWeb requires any Python that supports from __future__ import print_function. Generally, this means version 2.6 or newer.

Currently, input is not detabbed; Python users generally are discouraged from using tab characters in their files.

Installation

You must have Python 2.6 or newer.

Download and expand pyweb.zip. You will get pyweb.css, pyweb.html, pyweb.pdf, pyweb.py and pyweb.w.
Except on Windows, chmod +x pyweb.py.
If you like, cp pyweb.py /usr/local/bin/pyweb to make a global command.
Make a bootstrap copy of pyweb.py (I copy it to pyweb-2.1.py). You can run ./pyweb.py pyweb.w to generate the latest and greatest pyweb.py file, as well as this documentation, pyweb.html.

Be sure to save a bootstrap copy of pyweb.py before changing pyweb.w. Should your changes to pyweb.w introduce a bug into pyweb.py, you will need a fall-back version of pyWeb that you can use in place of the one you just damaged.

Acknowledgements

This application is very directly based on (derived from?) work that preceded this, particularly the following:

Ross N. Williams' FunnelWeb
Norman Ramsey's noweb
Preston Briggs' nuweb, currently supported by Charles Martin and Marc W. Mengel

Also, after using John Skaller's interscript for two large development efforts, I finally understood the feature set I really needed.

Jason Fruit contributed the current LaTeX template segments being used.

To Do

Fix OutputChunk to also include the comment convention for the file being produced. While it's possible to guess from the file extension, this can be unwise. '.py' is "#", '.java' or '.cpp' is '//', etc.
Offer an HTML template with a code-quoting filter like PyFontify or Pygments to add Syntax coloring to a Python-specific HTML weaver. See Syntax Highlight for more information.
Rethink the MacroAction. Is this really necessary? Wouldn't the Application be simpler without it?
Consider getting templates from a "header" section in the .w file. This removes any weaver command-line option; it's defined within the file. Also, setting the command character can be done in the header. To support multiple projects, the header would probably be included with @i, indicating that embedding in the .w file isn't as useful as keeping it separate. See the weave.py example.
Offer a basic HTML template that uses CDATA sections instead of quoting. Does require the standard quoting for the CDATA end tag.
The createUsedBy() method can be done incrementally by accumulating a list of forward references to chunks; as each new chunk is added, any references to the chunk are removed from the forward references list, and a call is made to the Web's setUsage method. References backward to already existing chunks are easily resolved with a simple lookup.
Use a Builder pattern to plug an explicit WebBuilder instance into the WebReader class to build the parse tree. This can be overridden to, for example, do incremental building in one pass.
Note that the Web is a lot like a NamedChunk; this could be factored out. This will create a more proper Composition pattern implementation.

Change Log

Changes since version 1.4.

Removed home-brewed logger.
Replaced getopt with optparse.
Replaced LaTeX markup.
Corrected significant problems in cross-reference resolution.
Replaced all HTML and LaTeX-specific features with a much simpler template engine which applies a template to a Chunk. The Templates are separate configuration items. The big issue with templates is conditional processing and the use of loops to handle multiple references in a transitive closure. While it's nice to depend on Jinja2, it's also nice to be totally stand-alone. Sigh. Choices include the no-logic string.Template in the standard library and the Templite+ Recipe 576663.
Looked at SCons API. Renamed "Operation" to "Action"; renamed "perform" to "__call__". Consider having "__call__" which does logging, then call "execute". Weaver fits nicely with SCons Builder since we can see Weave( "someFile.w" ) as sensible. Tangling is tougher because the @o commands define the dependencies there.
Eliminated the EmitterFactory; replace this with simple injection of the proper template configuration.
Removed the @O command; it was essentially a variant template for LaTeX.
Disentangled indentation and quoting in the codeBlock. Everyone needs indentation -- it's a lower-level feature of write. Quoting, however, is unique to a woven codeBlock. Fix referenceTo to write indented without code quoting.
Offer an RST template. Note that colorizing may be easier to handle with an RST template. The weaving markup template degenerates to .. parsed-literal:: and indent. By doing this, the RST output from pyWeb can be run through DocUtils rst2html.py or perhaps Sphinx to create final HTML. The hard part is the indent.
Fixed ReferenceCommand tangle and all setIndent/clrIndent operations. Only a ReferenceCommand actually cares about indentation. And that indentation is totally based on the "context" plus the text in the Command immediately in front of the ReferenceCommand.

Architecture and Design Overview

This application breaks the overall problem into the following sub-problems.

Representation of the Web as Chunks and Commands.
Reading and parsing the input.
Weaving a document file.
Tangling the desired program source files.

Representation

The basic "parse tree" is actually quite flat. The source document can be decomposed into a simple sequence of Chunks. Each Chunk is a simple sequence of Commands.

Chunks and commands cannot be nested, leading to delightful simplification.

The overall parse "tree" is contained in the overall Web. The web includes the sequence of Chunks as well as an index for the Named chunks.

Note that a named chunk may be created through a number of @d commands. This means that each named chunk may be a sequence of Chunks with a common name.

Each chunk is composed of a sequence of instances of Command. Because of this uniform composition, the several operations (particularly weave and tangle) can be delegated to each Chunk, and in turn, delegated to each Command that composes a Chunk.
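
The delegation described above can be sketched with a handful of small classes. The class names Web, Chunk, Command, TextCommand and ReferenceCommand appear in this document; the method signatures, attribute names and the use of a list as the output sink are assumptions made for illustration, and indentation handling is left out.

```python
class Command(object):
    """One piece of a Chunk; subclasses decide how to tangle."""

    def tangle(self, web, out):
        raise NotImplementedError


class TextCommand(Command):
    def __init__(self, text):
        self.text = text

    def tangle(self, web, out):
        out.append(self.text)


class ReferenceCommand(Command):
    def __init__(self, name):
        self.name = name

    def tangle(self, web, out):
        # A name may map to several Chunks; tangle them in order.
        for chunk in web.named[self.name]:
            chunk.tangle(web, out)


class Chunk(object):
    def __init__(self, name=None):
        self.name = name  # None marks an anonymous documentation chunk.
        self.commands = []

    def tangle(self, web, out):
        for command in self.commands:
            command.tangle(web, out)


class Web(object):
    def __init__(self):
        self.chunks = []  # Every chunk, in source order.
        self.named = {}   # Index: name -> list of chunks with that name.

    def add(self, chunk):
        self.chunks.append(chunk)
        if chunk.name is not None:
            self.named.setdefault(chunk.name, []).append(chunk)


web = Web()
main = Chunk("myFile.py")
main.commands = [ReferenceCommand("imports"), TextCommand("print(math.pi)\n")]
imports = Chunk("imports")
imports.commands = [TextCommand("import math\n")]
web.add(main)
web.add(imports)
out = []
main.tangle(web, out)
source = "".join(out)
```

Because neither Chunks nor Commands nest, each tangle method is a plain loop. The only recursion comes from a ReferenceCommand following a name back into the Web's index.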

Reading and Parsing

A solution to the reading and parsing problem depends on a convenient tool for breaking up the input stream and a representation for the chunks of input. Input decomposition is done with the Python Splitter pattern.

The Splitter pattern is widely used in text processing, and has a long legacy in a variety of languages and libraries. A Splitter decomposes a string into a sequence of strings using the split pattern. There are many variant implementations. One variant locates only a single occurrence (usually the left-most); this is commonly implemented as a Find or Search string function. Another variant locates all occurrences of a specific string or character, and discards the matching string or character.

The variation on Splitter that we use in this application creates each element in the resulting sequence as either (1) an instance of the split regular expression or (2) the text between split patterns. By preserving the actual split text, we can define our splitting pattern with the regular expression '@.'. This will split on any @ followed by a single character. We can then examine the instances of the split RE to locate pyWeb commands.

We could be a tad more specific and use the following as a split pattern: '@[doOifmu|<>(){}[\]]'. This would silently ignore unknown commands, merging them in with the surrounding text. This would leave the '@@' sequences completely alone, allowing us to replace '@@' with '@' in every text chunk.
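
Both split patterns can be tried with the standard re module, since re.split keeps the separators when the pattern contains a capturing group. The following is a sketch; the names ANY_COMMAND, KNOWN_COMMANDS and split_tokens are hypothetical, and the second pattern escapes the square brackets inside the character class but is otherwise the pattern given above.

```python
import re

ANY_COMMAND = re.compile(r"(@.)")
KNOWN_COMMANDS = re.compile(r"(@[doOifmu|<>(){}\[\]])")


def split_tokens(text, pattern=ANY_COMMAND):
    """Return (is_command, token) pairs, keeping the split text itself."""
    pieces = pattern.split(text)
    # With one capturing group, re.split puts separators at odd indices.
    return [(index % 2 == 1, piece) for index, piece in enumerate(pieces)]


source = "<p>doc</p>\n@o myFile.py\n@{\nimport math\n@}\n<p>more</p>"
tokens = split_tokens(source)
commands = [token for is_command, token in tokens if is_command]

# With the narrower pattern, '@@' stays inside the surrounding text.
specific = split_tokens("a@@b @<x@>", KNOWN_COMMANDS)
```

Joining the tokens back together reproduces the input exactly, which is what allows white space to be preserved on output.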

Weaving

The weaving operation depends on the target document markup language. There are several approaches to this problem. One is to use a markup language unique to pyWeb, and emit markup in the desired target language. Another is to use a standard markup language and use converters to transform the standard markup to the desired target markup. The problem with the second method is specifying the markup for actual source code elements in the document. These must be emitted in the proper markup language.

Since the application must transform input into a specific markup language, we opt to use the Strategy pattern to encapsulate markup language details. Each alternative markup strategy is then a subclass of Weaver. This simplifies adding additional markup languages without inventing a markup language unique to pyWeb. The author uses their preferred markup, and their preferred toolset to convert to other output languages.

Tangling

The tangling operation produces output files. In earlier tools, some care was taken to understand the source code context for tangling, and provide a correct indentation. This required a command-line parameter to turn off indentation for languages like Fortran, where indentation is not used. In pyWeb, the indent of the actual @< command is used to set the indent of the material that follows. If all @< commands are presented at the left margin, no indentation will be done. This is a helpful simplification, particularly for users of Python, where indentation is significant.

The standard Emitter class handles this basic indentation. A subclass can be created, if necessary, to handle more elaborate indentation rules.

Implementation

The implementation is contained in a file that both defines the base classes and provides an overall main() function. The main() function uses these base classes to weave and tangle the output files.

The broad outline of the presentation is as follows:

Base Class Definitions. This includes the web structure, the emitters (Weavers and Tanglers) and the high-level actions.
pyWeb Module File, including Module Initialization, Application Class and main function.
Additional Scripts
Administrative Elements

Base Class Definitions

There are three major class hierarchies that compose the base of this application. These are families of related classes that express the basic relationships among entities.

Emitters - An Emitter creates an output file, either source code, LaTeX or HTML from the chunks that make up the source file. Two major subclasses are Weaver, which has a focus on markup output, and Tangler which has a focus on pure source output. HTML and LaTeX are further specializations of the Weaver class. The TanglerMake subclass of the Tangler class is a make-friendly source-code emitter.
Chunks - A Chunk is a collection of Command instances. This can be either an anonymous chunk that will be sent directly to the output, or one of the classes of named chunks delimited by the major @d or @o commands.
Commands - A Command contains user input and creates output. This can be a block of text from the input file, one of the various kinds of cross reference commands (@f, @m, @u) or a reference to a chunk (via the @<name@> sequence).

Additionally, there are several supporting classes:

a Web class for the interconnected web of Chunks.
a WebReader class that parses the input, creating the Commands and Chunks.
an Error class for exceptions that are unique to this application.

Base Class Definitions (1) =


→Error class - defines the errors raised (90)
→Command class hierarchy - used to describe individual commands (75)
→Chunk class hierarchy - used to describe input chunks (51)
→Web class - describes the overall "web" of chunks (94)
→Emitter class hierarchy - used to control output files (2)
→Reference class hierarchy - references to a chunk (91)(92)(93) 
→WebReader class - parses the input file, building the Web structure (112)
→Action class hierarchy - used to describe basic actions of the application (131)

◊ Base Class Definitions (1). Used by pyweb.py (150).

Emitters

An Emitter instance is responsible for control of an output file format. This includes the necessary file naming, opening, writing and closing operations. It also includes providing the correct markup for the file type.

There are several subclasses of the Emitter superclass, specialized for various file formats.

Emitter class hierarchy - used to control output files (2) =


→Emitter superclass (3)
→Weaver subclass of Emitter to create documentation with fancy markup and escapes (13)
→LaTeX subclass of Weaver (23)
→HTML subclass of Weaver (31)(32)
→Tangler subclass of Emitter to create source files with no markup (43)
→Tangler subclass which is make-sensitive (48)

◊ Emitter class hierarchy - used to control output files (2). Used by Base Class Definitions (1); pyweb.py (150).

An Emitter instance is created to contain the various details of writing an output file. Emitters are created as follows:

A Web object will create an Emitter to weave the final document.
A Web object will create an Emitter to tangle each file.

Since each Emitter instance is responsible for the details of one file type, different subclasses of Emitter are used when tangling source code files (Tangler) and weaving files that include source code plus markup (Weaver).

Further specialization is required when weaving HTML or LaTeX. Generally, this is a matter of providing two things:

Boilerplate text to replace various pyWeb constructs
Escape rules to make source code amenable to the markup language

An additional part of the escape rules can include using a syntax coloring toolset instead of simply applying escapes.

In the case of tangling, the following algorithm is used:

Visit each output Chunk (@o), doing the following:
1. Open the Tangler instance using the target file name.
2. Visit each Chunk directed to the file, calling the chunk's tangle() method.
  1. Call the Tangler's docBegin() method. This sets the Tangler's indents.
  2. Visit each Command, call the command's tangle() method. For the text of the chunk, the text is written to the tangler using the codeBlock() method. For references to other chunks, the referenced chunk is tangled using the referenced chunk's tangle() method.
  3. Call the Tangler's docEnd() method. This clears the Tangler's indents.

In the case of weaving, the following algorithm is used:

If no Weaver is given, examine the first Command of the first Chunk and create a weaver appropriate for the output format. A leading '<' indicates HTML, otherwise assume LaTeX.
Open the Weaver instance using the source file name. This name is transformed by the weaver to an output file name appropriate to the language.
Visit each sequential Chunk (anonymous, @d or @o), doing the following:
1. Visit each Chunk, calling the Chunk's weave() method.
  1. Call the Weaver's docBegin(), fileBegin() or codeBegin() method, depending on the subclass of Chunk. For fileBegin() and codeBegin(), this writes the header for a code chunk in the weaver's markup language. A slightly different decoration is applied by fileBegin() and codeBegin().
  2. Visit each Command, call the Command's weave() method. For ordinary text, the text is written to the Weaver using the codeBlock() method. For references to other chunks, the referenced chunk is woven using the Weaver's referenceTo() method.
  3. Call the Weaver's docEnd(), fileEnd() or codeEnd() method. For fileEnd() or codeEnd(), this writes a trailer for a code chunk in the Weaver's markup language.
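The default weaver selection in the first step above can be sketched as a small function. This is only an illustration of the stated rule (a leading '<' means HTML, anything else means LaTeX); the function name guess_markup is hypothetical, and skipping leading whitespace before the test is an assumption rather than something the text confirms.

```python
def guess_markup(first_text):
    """Return 'HTML' when the text starts with '<', otherwise 'LaTeX'."""
    # Assumption: leading whitespace is skipped before testing the first character.
    if first_text.lstrip().startswith("<"):
        return "HTML"
    return "LaTeX"

html_choice = guess_markup("<html><head>")
latex_choice = guess_markup("\\documentclass{article}")
```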

Emitter Superclass

Usage

The Emitter class is not a concrete class; it is never instantiated. It contains common features factored out of the Weaver and Tangler subclasses.

Inheriting from the Emitter class generally requires overriding one or more of the core methods: doOpen(), doClose() and doWrite(). A subclass of Tangler might override the code writing methods: codeLine(), codeBlock() or codeFinish().

Design

The Emitter class is an abstract superclass for all emitters. It defines the basic framework used to create and write to an output file. This class follows the Template Method design pattern. This design pattern directs us to factor the basic open(), close() and write() methods into three-step algorithms.

def open( self, aFile ):
    # common preparation
    self.doOpen( aFile ) # overridden by subclasses
    # common finish-up tasks

The common preparation and common finish-up sections are generally internal housekeeping. The doOpen() method is overridden by subclasses to change the basic behavior.
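The three-step structure described above can be tried out with a small, self-contained sketch. The class names BaseEmitter and FileEmitter and the events list are hypothetical stand-ins used only to make the order of the steps visible; they are not part of pyWeb.

```python
class BaseEmitter:
    """Hypothetical stand-in for the abstract superclass."""
    def __init__(self):
        self.events = []

    def open(self, aFile):
        self.events.append("prepare")    # common preparation
        self.doOpen(aFile)               # step supplied by the subclass
        self.events.append("finish")     # common finish-up tasks

    def doOpen(self, aFile):
        raise NotImplementedError

class FileEmitter(BaseEmitter):
    """Hypothetical subclass that only fills in the variable step."""
    def doOpen(self, aFile):
        self.events.append("open " + aFile)

emitter = FileEmitter()
emitter.open("example.py")
```

The subclass never repeats the housekeeping; it supplies only the one step that differs.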

Implementation

The class has the following attributes:

fileName, the name of the current open file created by the open method;
theFile, the current open file created by the open method;
context, the indentation context stack, updated by setIndent, clrIndent and resetIndent methods;
indent, the current indentation, the topmost value on the context stack;
lastIndent, the last indent used when writing a line of source code;
linesWritten, the total number of '\n' characters written to the file.

Emitter superclass (3) =


class Emitter( object ):
    """Emit an output file; handling indentation context."""
    def __init__( self ):
        self.fileName= ""
        self.theFile= None
        self.context= [0]
        self.indent= 0
        self.lastIndent= 0
        self.linesWritten= 0
        self.totalFiles= 0
        self.totalLines= 0
        self.log_indent= logging.getLogger( "pyweb.%s.indent" % self.__class__.__name__ )
    def __str__( self ):
        return self.__class__.__name__
    →Emitter core open, close and write (4)
    →Emitter write a block of code (8)(9)(10)
    →Emitter indent control: set, clear and reset (11)

◊ Emitter superclass (3). Used by Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

The core open() method tracks the open files. A subclass overrides a doOpen() method to name the output file, and then actually open the file. The Weaver will create an output file with a name that's based on the overall project. The Tangler will open the given file name.

The close() method closes the file. As with open(), a doClose() method actually closes the file. This allows subclasses to do overrides on the actual file processing.

The write() method is the lowest-level, unadorned write. This does some additional counting as well as moving the characters to the file. Any further processing could be added in a function that overrides doWrite().

The default doWrite() method prints to the standard output file.

Emitter core open, close and write (4) =


def open( self, aFile ):
    """Open a file."""
    self.fileName= aFile
    self.doOpen( aFile )
    self.linesWritten= 0
→Emitter doOpen, to be overridden by subclasses (5)
def close( self ):
    self.codeFinish()
    self.doClose()
    self.totalFiles += 1
    self.totalLines += self.linesWritten
→Emitter doClose, to be overridden by subclasses (6)
def write( self, text ):
    if text is None: return
    self.linesWritten += text.count('\n')
    self.doWrite( text )
→Emitter doWrite, to be overridden by subclasses (7)

◊ Emitter core open, close and write (4). Used by Emitter superclass (3); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

The doOpen(), doClose() and doWrite() methods are overridden by the various subclasses to perform the unique operation for the subclass.

Emitter doOpen, to be overridden by subclasses (5) =


def doOpen( self, aFile ):
    self.fileName= aFile
    logger.debug( "creating %r", self.fileName )

◊ Emitter doOpen, to be overridden by subclasses (5). Used by Emitter core open, close and write (4); Emitter superclass (3); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

Emitter doClose, to be overridden by subclasses (6) =


def doClose( self ):
    logger.debug( "wrote %d lines to %s",
        self.linesWritten, self.fileName )

◊ Emitter doClose, to be overridden by subclasses (6). Used by Emitter core open, close and write (4); Emitter superclass (3); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

Emitter doWrite, to be overridden by subclasses (7) =


def doWrite( self, text ):
    print( text, end='' ) # end=None would append the default newline

◊ Emitter doWrite, to be overridden by subclasses (7). Used by Emitter core open, close and write (4); Emitter superclass (3); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).
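To see how write() and doWrite() divide the work, the following sketch trims the write path down to a self-contained class and overrides doWrite() to collect output in memory. MiniEmitter and ListEmitter are hypothetical names, and the base doWrite() here writes to sys.stdout as a stand-in.

```python
import sys

class MiniEmitter:
    """Trimmed stand-in for the write path; not the full Emitter."""
    def __init__(self):
        self.linesWritten = 0

    def write(self, text):
        if text is None:
            return
        self.linesWritten += text.count('\n')   # common counting step
        self.doWrite(text)                      # step supplied by the subclass

    def doWrite(self, text):
        sys.stdout.write(text)

class ListEmitter(MiniEmitter):
    """Hypothetical subclass that collects output in a list."""
    def __init__(self):
        super().__init__()
        self.buffer = []

    def doWrite(self, text):
        self.buffer.append(text)

e = ListEmitter()
e.write("a = 1\nb = 2\n")
e.write(None)        # ignored by write()
e.write("c = 3")     # no newline, so the line count is unchanged
```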

The codeBlock() method writes several lines of code. It calls the codeLine() method for each line of code after doing the correct indentation. Often, the last line of code is incomplete, so it is left unterminated. This last line of code also shows the indentation for any additional code to be tangled into this section.

Note that tab characters confuse the indent algorithm. Tabs are not expanded to spaces in this application. They should be expanded prior to creating a .w file.

The algorithm is as follows:

Save the topmost value of the context stack as the current indent.
Split the block of text on '\n' boundaries.
For each line (except the last), call codeLine() with the indented text, ending with a newline.
The string split() method will put a trailing zero-length element in the list if the original block ended with a newline. We drop this zero length piece to prevent writing a useless fragment of indent-only after the final '\n'. If the last line has content, call codeLine with the indented text, but do not write a trailing '\n'.
Save the length of the last line as the most recent indent.

Emitter write a block of code (8) =


def codeBlock( self, text ):
    """Indented write of a block of code."""
    self.indent= self.context[-1]
    lines= text.split( '\n' )
    for l in lines[:-1]:
        self.write( '%s%s\n' % (self.indent*' ',l) )
    if lines[-1]:
        self.write( '%s%s' % (self.indent*' ',lines[-1]) )
    self.lastIndent= len(lines[-1]) + self.indent

◊ Emitter write a block of code (8). Used by Emitter superclass (3); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).
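The effect of the indentation steps is easier to see on concrete input. The sketch below restates the same steps as a free function that returns the text it would have written together with the resulting last indent; indent_block is a hypothetical helper, not a pyWeb method.

```python
def indent_block(text, indent):
    """Return (indented text, last indent) following the steps listed above."""
    pad = indent * ' '
    lines = text.split('\n')
    # Every line except the last is complete and gets its newline back.
    out = [pad + line + '\n' for line in lines[:-1]]
    if lines[-1]:
        out.append(pad + lines[-1])      # unterminated final line
    last_indent = len(lines[-1]) + indent
    return ''.join(out), last_indent

# A block that ends with a newline leaves only the prevailing indent.
complete, after_complete = indent_block("x = 1\ny = 2\n", 4)

# A block that ends with an incomplete line records where that line stopped.
partial, after_partial = indent_block("if x:\n    ", 4)
```

In the second call the trailing four spaces are the incomplete last line, so a chunk referenced at that point would start at column eight.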

The codeLine() method writes a single line of source code. This is often overridden by Weaver subclasses to transform source into a form acceptable by the final weave file format.

In the case of an HTML weaver, the HTML reserved characters (<, >, &, and ") must be replaced in the output of code. However, since the author's original document sections contain HTML these will not be altered.

Emitter write a block of code (9) +=


quoted_chars = [
    # Must be empty for tangling to work.
]

def quote( self, aLine ):
    """Each individual line of code; often overridden by weavers to quote the code."""
    clean= aLine
    for from_, to_ in self.quoted_chars:
        clean= clean.replace( from_, to_ )
    return clean

◊ Emitter write a block of code (9). Used by Emitter superclass (3); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).
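A short sketch can show how the quoted_chars table drives quote(). The function below mirrors the loop above as a free function. The html_chars table is a hypothetical example of rules for the four HTML reserved characters and is an assumption, not the table used by pyWeb's HTML weaver.

```python
def quote(aLine, quoted_chars):
    """Apply each (from, to) replacement in order."""
    clean = aLine
    for from_, to_ in quoted_chars:
        clean = clean.replace(from_, to_)
    return clean

# Hypothetical HTML rules. '&' comes first so that the ampersands
# introduced by the later rules are not escaped a second time.
html_chars = [('&', '&amp;'), ('<', '&lt;'), ('>', '&gt;'), ('"', '&quot;')]

woven = quote('if a < b and c > "d": x &= 1', html_chars)
tangled = quote('if a < b:', [])   # an empty table leaves code untouched
```

Because the replacements are applied in sequence, the order of the table matters whenever one replacement's output contains another rule's input.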

The codeFinish() method finishes writing any cached lines when the emitter is closed.

Emitter write a block of code (10) +=


def codeFinish( self ):
    if self.lastIndent > 0:
        self.write('\n')

◊ Emitter write a block of code (10). Used by Emitter superclass (3); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

The setIndent() method pushes the last indent on the context stack. This is used when tangling source to be sure that the included text is indented correctly with respect to the surrounding text.

The clrIndent() method discards the most recent indent from the context stack. This is used when finished tangling a source chunk. This restores the indent to the prevailing indent.

The resetIndent() method removes all indent context information.

TODO: Note that setIndent() should be refactored, since tangling uses the command option and weaving uses the fixed option.

Emitter indent control: set, clear and reset (11) =


def setIndent( self, fixed=None, command=None ):
    """Either use a fixed indent (for weaving) or the previous command (for tangling)."""
    self.context.append( self.context[-1]+command.indent() if fixed is None else fixed )
    self.log_indent.debug( "setIndent %s: %r", fixed, self.context )
def clrIndent( self ):
    if len(self.context) > 1:
        self.context.pop()
    self.indent= self.context[-1]
    self.log_indent.debug( "clrIndent %r", self.context )
def resetIndent( self ):
    self.context= [0]
    self.log_indent.debug( "resetIndent %r", self.context )

◊ Emitter indent control: set, clear and reset (11). Used by Emitter superclass (3); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).
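The context stack can be exercised on its own to see how nested references accumulate indentation. This sketch keeps only the stack logic; IndentContext and FakeCommand are hypothetical names, and FakeCommand stands in for any command object that offers an indent() method.

```python
class IndentContext:
    """Trimmed stand-in for the indent-control part of an emitter."""
    def __init__(self):
        self.context = [0]

    def setIndent(self, fixed=None, command=None):
        if fixed is None:
            # Tangling: indent relative to the prevailing indent.
            self.context.append(self.context[-1] + command.indent())
        else:
            # Weaving: use the fixed indent as given.
            self.context.append(fixed)

    def clrIndent(self):
        if len(self.context) > 1:
            self.context.pop()

class FakeCommand:
    """Hypothetical command that reports a fixed indent."""
    def __init__(self, n):
        self.n = n

    def indent(self):
        return self.n

ctx = IndentContext()
ctx.setIndent(command=FakeCommand(4))   # reference indented 4 spaces
ctx.setIndent(command=FakeCommand(4))   # nested reference, 4 more
nested = list(ctx.context)
ctx.setIndent(fixed=0)                  # weaving-style fixed indent
with_fixed = list(ctx.context)
for _ in range(4):
    ctx.clrIndent()                     # the bottom entry is never popped
```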

Weaver subclass of Emitter

Usage

A Weaver is an Emitter that produces the final user-focused document. This will include the source document with the code blocks surrounded by markup to present that code properly. In effect, the pyWeb @ commands are replaced by markup.

The Weaver class uses a simple set of templates to produce RST markup.

Most weaver languages don't rely on special indentation rules. The woven code samples usually start right on the left margin of the source document. However, the RST markup language does rely on indentation of code blocks. For that reason, the weavers have a fixed indent for code blocks. This is generally set to zero, except when generating RST.

Design

The Weaver subclass defines an Emitter used to weave the final documentation. This involves decorating source code to make it displayable. It also involves creating references and cross references among the various chunks.

The Weaver class adds several methods to the basic Emitter methods. These additional methods are used exclusively when weaving, never when tangling.

Implementation

This class hierarchy depends heavily on the string module.

Imports (12) =

import string

◊ Imports (12). Used by pyweb.py (150).

Weaver subclass of Emitter to create documentation with fancy markup and escapes (13) =


class Weaver( Emitter ):
    """Format various types of XRef's and code blocks when weaving."""
    extension= ".rst" # A subclass will provide their preferred extension
    code_indent= 4
    →Weaver doOpen, doClose and doWrite overrides (14)
    
    # Template Expansions.
    →Weaver quoted characters (15)
    →Weaver document chunk begin-end (16)
    →Weaver reference summary, used by code chunk and file chunk (17)
    →Weaver code chunk begin-end (18)
    →Weaver file chunk begin-end (19)
    →Weaver reference command output (20)
    →Weaver cross reference output methods (21)(22)

◊ Weaver subclass of Emitter to create documentation with fancy markup and escapes (13). Used by Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

The doOpen() method opens the file for writing. For weavers, the file extension is specified as part of the target markup language being created.

The doClose() method overrides the Emitter class doClose() method by closing the actual file created by the doOpen() method.

The doWrite() method overrides the Emitter class doWrite() method by writing to the actual file created by the doOpen() method.

Weaver doOpen, doClose and doWrite overrides (14) =


def doOpen( self, aFile ):
    src, _ = os.path.splitext( aFile )
    self.fileName= src + self.extension
    self.theFile= open( self.fileName, "w" )
    logger.info( "Weaving %r", self.fileName )
def doClose( self ):
    self.theFile.close()
    logger.info( "Wrote %d lines to %r", 
        self.linesWritten, self.fileName )
def doWrite( self, text ):
    self.theFile.write( text )

◊ Weaver doOpen, doClose and doWrite overrides (14). Used by Weaver subclass of Emitter to create documentation with fancy markup and escapes (13); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

The remaining methods apply a chunk to a template.

Weaver quoted characters (15) =


quoted_chars = [
    # prevent some RST markup from being recognized
    ('`',r'\`'),
    ('_',r'\_'), 
    ('*',r'\*'),
    ('|',r'\|'),
]

◊ Weaver quoted characters (15). Used by Weaver subclass of Emitter to create documentation with fancy markup and escapes (13); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

The docBegin() and docEnd() methods are used when weaving a document text chunk. Typically, nothing is done before emitting these kinds of chunks. However, emitting a comment is an example of possible additional processing.

Weaver document chunk begin-end (16) =


def docBegin( self, aChunk ):
    pass
def docEnd( self, aChunk ):
    pass

◊ Weaver document chunk begin-end (16). Used by Weaver subclass of Emitter to create documentation with fancy markup and escapes (13); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

Each code chunk includes the places where the chunk is referenced.

Weaver reference summary, used by code chunk and file chunk (17) =


ref_template = string.Template( "\nUsed by: ${refList}\n" )
ref_item_template = string.Template( "$fullName (`${seq}`_)" )
def references( self, aChunk ):
    if aChunk.references_list:
        refList= [ 
            self.ref_item_template.substitute( seq=s, fullName=n )
            for n,s in aChunk.references_list ]
        return self.ref_template.substitute( refList="; ".join( refList ) ) # HTML Separator
    return ""

◊ Weaver reference summary, used by code chunk and file chunk (17). Used by Weaver subclass of Emitter to create documentation with fancy markup and escapes (13); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).
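To make the output of these two templates concrete, the sketch below applies them to a made-up reference list. The (name, sequence) pairs are hypothetical values standing in for aChunk.references_list.

```python
import string

ref_template = string.Template("\nUsed by: ${refList}\n")
ref_item_template = string.Template("$fullName (`${seq}`_)")

# Hypothetical (name, sequence) pairs for illustration only.
references_list = [("Emitter superclass", 3), ("pyweb.py", 150)]

items = [ref_item_template.substitute(seq=s, fullName=n)
         for n, s in references_list]
summary = ref_template.substitute(refList="; ".join(items))
```

Each item becomes an RST hyperlink reference to the chunk's sequence number, and the items are joined with "; " into a single "Used by:" line.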

The codeBegin() method emits the necessary material prior to a chunk of source code, defined with the @d command.

The codeEnd() method emits the necessary material subsequent to a chunk of source code, defined with the @d command. Links or cross references to chunks that refer to this chunk can be emitted.

Weaver code chunk begin-end (18) =


cb_template = string.Template( "\n..  _`${seq}`:\n..  rubric:: ${fullName} (${seq})\n..  parsed-literal::\n    " )
def codeBegin( self, aChunk ):
    tex = self.cb_template.substitute( 
        seq= aChunk.seq,
        lineNumber= aChunk.lineNumber, 
        fullName= aChunk.fullName,
        concat= "=" if aChunk.initial else "+=", # LaTeX Separator
    )
    self.write( tex )
ce_template = string.Template( "\n${references}\n" )
def codeEnd( self, aChunk ):
    tex = self.ce_template.substitute( 
        seq= aChunk.seq,
        lineNumber= aChunk.lineNumber, 
        fullName= aChunk.fullName,
        references= self.references( aChunk ),
    )
    self.write(tex)

◊ Weaver code chunk begin-end (18). Used by Weaver subclass of Emitter to create documentation with fancy markup and escapes (13); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).
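The codeBegin() call passes more keywords (lineNumber, concat) than the default template actually uses. string.Template.substitute() accepts unused keywords, so a subclass can introduce them in its own template without changing the method. A small sketch with made-up values:

```python
import string

cb_template = string.Template(
    "\n..  _`${seq}`:\n..  rubric:: ${fullName} (${seq})\n..  parsed-literal::\n    ")

# Hypothetical chunk attributes; lineNumber and concat are not used
# by this template and are silently ignored by substitute().
header = cb_template.substitute(
    seq=8,
    lineNumber=120,
    fullName="Emitter write a block of code",
    concat="=",
)
```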

The fileBegin() method emits the necessary material prior to a chunk of source code, defined with the @o command. A subclass would override this to provide specific text for the intended file type.

The fileEnd() method emits the necessary material subsequent to a chunk of source code, defined with the @o command. The list of references is also provided so that links or cross references to chunks that refer to this chunk can be emitted. A subclass would override this to provide specific text for the intended file type.

Weaver file chunk begin-end (19) =


fb_template = string.Template( "\n..  _`${seq}`:\n..  rubric:: ${fullName} (${seq})\n..  parsed-literal::\n    " )
def fileBegin( self, aChunk ):
    txt= self.fb_template.substitute(
        seq= aChunk.seq, 
        lineNumber= aChunk.lineNumber, 
        fullName= aChunk.fullName,
        concat= "=" if aChunk.initial else "+=", # HTML Separator
    )
    self.write( txt )
fe_template= string.Template( "\n${references}\n" )
def fileEnd( self, aChunk ):
    txt= self.fe_template.substitute(
        seq= aChunk.seq, 
        lineNumber= aChunk.lineNumber, 
        fullName= aChunk.fullName,
        references= self.references( aChunk ) )
    self.write( txt )

◊ Weaver file chunk begin-end (19). Used by Weaver subclass of Emitter to create documentation with fancy markup and escapes (13); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

The referenceTo() method emits a reference to a chunk of source code. The reference is made with a @<...@> reference within a @d or @o chunk. The references are defined with the @d or @o commands. A subclass would override this to provide specific text for the intended file type.

Weaver reference command output (20) =


refto_name_template= string.Template("""|srarr| ${fullName} (`${seq}`_)""")
refto_seq_template= string.Template("""|srarr| (`${seq}`_)""")
def referenceTo( self, aName, seq ):
    """Weave a reference to a chunk."""
    # Provide name to get a full reference.
    # Omit name to get a short reference.
    if aName:
        return self.refto_name_template.substitute( fullName= aName, seq= seq )
    else:
        return self.refto_seq_template.substitute( seq= seq )

◊ Weaver reference command output (20). Used by Weaver subclass of Emitter to create documentation with fancy markup and escapes (13); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

The xrefHead() method puts decoration in front of cross-reference output. A subclass may override this to change the look of the final woven document.

The xrefFoot() method puts decoration after cross-reference output. A subclass may override this to change the look of the final woven document.

The xrefLine() method is used for both file and chunk ("macro") cross-references to show a name (either file name or chunk name) and a list of chunks that reference the file or chunk.

The xrefDefLine() method is used for the user identifier cross-reference. This shows a name and a list of chunks that reference or define the name. One of the chunks is identified as the defining chunk, all others are referencing chunks.

The default behavior simply writes the Python data structure used to represent cross reference information. A subclass may override this to change the look of the final woven document.

Weaver cross reference output methods (21) =


xref_head_template = string.Template( "\n" )
xref_foot_template = string.Template( "\n" )
xref_item_template = string.Template( ":${fullName}:\n    ${refList}\n" )
def xrefHead( self ):
    txt = self.xref_head_template.substitute()
    self.write( txt )
def xrefFoot( self ):
    txt = self.xref_foot_template.substitute()
    self.write( txt )
def xrefLine( self, name, refList ):
    refList= [ self.referenceTo( None, r ) for r in refList ]
    txt= self.xref_item_template.substitute( fullName= name, refList = " ".join(refList) ) # HTML Separator
    self.write( txt )

◊ Weaver cross reference output methods (21). Used by Weaver subclass of Emitter to create documentation with fancy markup and escapes (13); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

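The xrefDefLine() method marks the defining chunk with square brackets; every other chunk is shown as a plain reference.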

Weaver cross reference output methods (22) +=


name_def_template = string.Template( '[`${seq}`_]' )
name_ref_template = string.Template( '`${seq}`_' )
def xrefDefLine( self, name, defn, refList ):
    templates = { defn: self.name_def_template }
    refTxt= [ templates.get(r,self.name_ref_template).substitute( seq= r )
        for r in sorted( refList + [defn] ) 
        ]
    txt= self.xref_item_template.substitute( fullName= name, refList = " ".join(refTxt) ) # HTML Separator
    self.write( txt )

◊ Weaver cross reference output methods (22). Used by Weaver subclass of Emitter to create documentation with fancy markup and escapes (13); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).
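The way the defining chunk is singled out is easier to follow with sample data. The sketch below restates the method as a free function that returns the text instead of writing it; the function name xref_def_line and the sequence numbers are hypothetical.

```python
import string

name_def_template = string.Template('[`${seq}`_]')
name_ref_template = string.Template('`${seq}`_')
xref_item_template = string.Template(":${fullName}:\n    ${refList}\n")

def xref_def_line(name, defn, refList):
    """Format one identifier entry, bracketing the defining chunk."""
    # Only the defining sequence number maps to the bracketed template.
    templates = {defn: name_def_template}
    refTxt = [templates.get(r, name_ref_template).substitute(seq=r)
              for r in sorted(refList + [defn])]
    return xref_item_template.substitute(fullName=name, refList=" ".join(refTxt))

line = xref_def_line("Emitter", 3, [13, 2, 43])
```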

LaTeX subclass of Weaver

Usage

An instance of LaTeX can be used by the Web object to weave an output document. The instance is created outside the Web, and given to the weave() method of the Web.

w= Web( "someName.w" )
WebReader().web(w).load()
weave_latex= LaTeX()
w.weave( weave_latex )

Note that the template language and LaTeX both use $. This means that all $ that are intended to be output to LaTeX must appear as $$ in the template.
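The effect of the doubled $ can be checked in isolation with string.Template. The values passed to substitute() below are made up for illustration.

```python
import string

# '$$' is the Template escape for a literal '$', which LaTeX then
# reads as a math-mode delimiter.
template = string.Template(
    "$$\\triangleright$$ Code Example ${fullName} (${seq})")

text = template.substitute(fullName="Emitter superclass", seq=3)
```

The result contains single $ characters around \triangleright, which is what LaTeX needs.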

Design

The LaTeX subclass defines a Weaver that is customized to produce LaTeX output of code sections and cross reference information. Its markup is pretty rudimentary, but it's also distinctive enough to function pretty well in most LaTeX documents.

Implementation

LaTeX subclass of Weaver (23) =


class LaTeX( Weaver ):
    """LaTeX formatting for XRef's and code blocks when weaving.
    Requires \\usepackage{fancyvrb}
    """
    extension= ".tex"
    code_indent= 0
    →LaTeX code chunk begin (24)
    →LaTeX code chunk end (25)
    →LaTeX file output begin (26)
    →LaTeX file output end (27)
    →LaTeX references summary at the end of a chunk (28)
    →LaTeX write a line of code (29)
    →LaTeX reference to a chunk (30)

◊ LaTeX subclass of Weaver (23). Used by Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

The LaTeX open() method opens the woven file by replacing the source file's suffix with ".tex" and creating the resulting file.

The LaTeX codeBegin() template writes the header prior to a chunk of source code. It aligns the block to the left, prints an italicised header, and opens a preformatted block.

LaTeX code chunk begin (24) =


cb_template = string.Template( """\\label{pyweb${seq}}
\\begin{flushleft}
\\textit{Code example ${fullName} (${seq})}
\\begin{Verbatim}[commandchars=\\\\\\{\\},codes={\\catcode`$$=3\\catcode`^=7},frame=single]\n""") # Prevent indent

◊ LaTeX code chunk begin (24). Used by LaTeX subclass of Weaver (23); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

The LaTeX codeEnd() template writes the trailer subsequent to a chunk of source code. This first closes the preformatted block and then calls the references() method to write a reference to the chunk that invokes this chunk; finally, it restores paragraph indentation.

LaTeX code chunk end (25) =


ce_template= string.Template("""
\\end{Verbatim}
${references}
\\end{flushleft}\n""") # Prevent indentation

◊ LaTeX code chunk end (25). Used by LaTeX subclass of Weaver (23); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

The LaTeX fileBegin() template writes the header prior to the creation of a tangled file. Its formatting is identical to the start of a code chunk.

LaTeX file output begin (26) =


fb_template= cb_template

◊ LaTeX file output begin (26). Used by LaTeX subclass of Weaver (23); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

The LaTeX fileEnd() template writes the trailer subsequent to a tangled file. This closes the preformatted block, calls the LaTeX references() method to write a reference to the chunk that invokes this chunk, and restores normal indentation.

LaTeX file output end (27) =


fe_template= ce_template

◊ LaTeX file output end (27). Used by LaTeX subclass of Weaver (23); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

The references() template writes a list of references after a chunk of code. Each reference includes the example number, the title, and a reference to the LaTeX section and page numbers on which the referring block appears.

LaTeX references summary at the end of a chunk (28) =


ref_item_template = string.Template( """
\\item Code example ${fullName} (${seq}) (Sect. \\ref{pyweb${seq}}, p. \\pageref{pyweb${seq}})\n""")
ref_template = string.Template( """
\\footnotesize
Used by:
\\begin{list}{}{}
${refList}
\\end{list}
\\normalsize\n""")

◊ LaTeX references summary at the end of a chunk (28). Used by LaTeX subclass of Weaver (23); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

The codeLine() method quotes a single line of code to the weaver; since these lines are always in preformatted blocks, no special formatting is needed, except to avoid ending the preformatted block. Our one compromise is a thin space if the phrase \\end{Verbatim} is used in a code block.

LaTeX write a line of code (29) =


quoted_chars = [
    ("\\end{Verbatim}", "\\end\,{Verbatim}"), # Allow \end{Verbatim}
    ("\\{","\\\,{"), # Prevent unexpected commands in Verbatim
    ("$","\\$"), # Prevent unexpected math in Verbatim
]

◊ LaTeX write a line of code (29). Used by LaTeX subclass of Weaver (23); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

The referenceTo() template writes a reference to another chunk of code. It uses write() directly so that the reference follows the current indentation on the current line of code.

LaTeX reference to a chunk (30) =


refto_name_template= string.Template("""$$\\triangleright$$ Code Example ${fullName} (${seq})""")
refto_seq_template= string.Template("""(${seq})""")

◊ LaTeX reference to a chunk (30). Used by LaTeX subclass of Weaver (23); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).
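
As a minimal illustration of how these templates behave, the sketch below substitutes into the same template text. The chunk name and sequence number are made-up values. In a string.Template, $$ is an escaped literal dollar sign, so the woven LaTeX receives ordinary math-mode delimiters around \triangleright.

```python
import string

# Same template text as refto_name_template above.
refto_name_template = string.Template(
    """$$\\triangleright$$ Code Example ${fullName} (${seq})""")

# Hypothetical values, chosen only for illustration.
rendered = refto_name_template.substitute(fullName="Imports", seq=47)
print(rendered)
```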

HTML subclasses of Weaver

Usage

An instance of HTML can be used by the Web object to weave an output document. The instance is created outside the Web, and given to the weave() method of the Web.

w= Web( "someName.w" )
WebReader().web(w).load()
weave_html= HTML()
w.weave( weave_html )

Variations in the output formatting are accomplished by having variant subclasses of HTML. In this implementation, we have two variations: full path references, and short references. The base class produces complete reference paths; a subclass produces abbreviated references.

Design

The HTML subclass defines a Weaver that is customized to produce HTML output of code sections and cross reference information.

All HTML chunks are identified by anchor names of the form pywebn. Each n is the unique chunk number, in sequential order.

An HTMLShort subclass defines a Weaver that produces HTML output with abbreviated (no name) cross references at the end of the chunk.

Implementation

HTML subclass of Weaver (31) =


class HTML( Weaver ):
    """HTML formatting for XRef's and code blocks when weaving."""
    extension= ".html"
    code_indent= 0
    →HTML code chunk begin (33)
    →HTML code chunk end (34)
    →HTML output file begin (35)
    →HTML output file end (36)
    →HTML references summary at the end of a chunk (37)
    →HTML write a line of code (38)
    →HTML reference to a chunk (39)
    →HTML simple cross reference markup (40)

◊ HTML subclass of Weaver (31). Used by Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

HTML subclass of Weaver (32) +=


class HTMLShort( HTML ):
    """HTML formatting for XRef's and code blocks when weaving with short references."""
    →HTML short references summary at the end of a chunk (42)

◊ HTML subclass of Weaver (32). Used by Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

The codeBegin() template starts a chunk of code, defined with @d, providing a label and HTML tags necessary to set the code off visually.

HTML code chunk begin (33) =


cb_template= string.Template("""
<a name="pyweb${seq}"></a>
<!--line number ${lineNumber}-->
<p><em>${fullName}</em> (${seq})&nbsp;${concat}</p>
<code><pre>\n""")

◊ HTML code chunk begin (33). Used by HTML subclass of Weaver (31); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

The codeEnd() template ends a chunk of code, providing the HTML tags necessary to finish the code block visually. This calls the references method to write the list of chunks that reference this chunk.

HTML code chunk end (34) =


ce_template= string.Template("""
</pre></code>
<p>&loz; <em>${fullName}</em> (${seq}).
${references}
</p>\n""")

◊ HTML code chunk end (34). Used by HTML subclass of Weaver (31); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

The fileBegin() template starts a chunk of code, defined with @o, providing a label and HTML tags necessary to set the code off visually.

HTML output file begin (35) =


fb_template= string.Template("""<a name="pyweb${seq}"></a>
<!--line number ${lineNumber}-->
<p><tt>${fullName}</tt> (${seq})&nbsp;${concat}</p>
<code><pre>\n""") # Prevent indent

◊ HTML output file begin (35). Used by HTML subclass of Weaver (31); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

The fileEnd() template ends a chunk of code, providing the HTML tags necessary to finish the code block visually. This calls the references method to write the list of chunks that reference this chunk.

HTML output file end (36) =


fe_template= string.Template( """</pre></code>
<p>&loz; <tt>${fullName}</tt> (${seq}).
${references}
</p>\n""")

◊ HTML output file end (36). Used by HTML subclass of Weaver (31); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

The references() template writes the list of chunks that refer to this chunk. Note that this list could be rather long because of the possibility of transitive references.

HTML references summary at the end of a chunk (37) =


ref_item_template = string.Template(
'<a href="#pyweb${seq}"><em>${fullName}</em>&nbsp;(${seq})</a>'
)
ref_template = string.Template( '  Used by ${refList}.'  )

◊ HTML references summary at the end of a chunk (37). Used by HTML subclass of Weaver (31); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).
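
The following sketch shows one way the two templates might be combined into the "Used by" summary. The referring chunks are made-up (fullName, seq) pairs, and the "; " separator is an assumption based on the woven output shown in this document; the actual references() method may assemble the list differently.

```python
import string

ref_item_template = string.Template(
    '<a href="#pyweb${seq}"><em>${fullName}</em>&nbsp;(${seq})</a>')
ref_template = string.Template('  Used by ${refList}.')

# Hypothetical (fullName, seq) pairs standing in for referring chunks.
referrers = [("HTML subclass of Weaver", 31), ("pyweb.py", 150)]

# One link per referring chunk, then the whole list wrapped in the summary.
items = [ref_item_template.substitute(fullName=n, seq=s) for n, s in referrers]
summary = ref_template.substitute(refList="; ".join(items))
print(summary)
```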

The codeLine() method writes an individual line of code for HTML purposes. This encodes the four basic HTML entities (<, >, &, ") to prevent code from being interpreted as HTML.

HTML write a line of code (38) =


quoted_chars = [
    ("&", "&amp;"), # Must be first
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
]

◊ HTML write a line of code (38). Used by HTML subclass of Weaver (31); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).
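
To see why the ampersand rule must come first, here is a small sketch that applies the pairs in list order. The quote() helper is hypothetical; it only illustrates the effect of ordering and is not the codeLine() implementation.

```python
quoted_chars = [
    ("&", "&amp;"),  # Must be first
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
]

def quote(line, rules=quoted_chars):
    """Apply each (old, new) pair in list order (hypothetical helper)."""
    for old, new in rules:
        line = line.replace(old, new)
    return line

escaped = quote('if a < b and s == "x&y":')

# With "&" handled last, the entities produced by earlier rules get escaped again.
wrong = quote("a < b", rules=list(reversed(quoted_chars)))
```

With the correct order, each special character is encoded exactly once. With the reversed order, the "&" introduced by "&lt;" is itself rewritten, which corrupts the output.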

The referenceTo() template writes a reference to another chunk. It uses the direct write() method so that the reference is indented properly with the surrounding source code.

HTML reference to a chunk (39) =


refto_name_template = string.Template(
'<a href="#pyweb${seq}">&rarr;<em>${fullName}</em> (${seq})</a>'
)
refto_seq_template = string.Template(
'<a href="#pyweb${seq}">(${seq})</a>'
)

◊ HTML reference to a chunk (39). Used by HTML subclass of Weaver (31); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

The xrefHead() method writes the heading for any of the cross reference blocks created by @f, @m, or @u. In this implementation, the cross references are simply definition lists (<dl>).

The xrefFoot() method writes the footing for any of the cross reference blocks created by @f, @m, or @u. In this implementation, the cross references are simply definition lists (<dl>).

The xrefLine() method writes a line for the file or macro cross reference blocks created by @f or @m. In this implementation, the cross references are simply definition lists (<dl>).

HTML simple cross reference markup (40) =


xref_head_template = string.Template( "<dl>\n" )
xref_foot_template = string.Template( "</dl>\n" )
xref_item_template = string.Template( "<dt>${fullName}</dt><dd>${refList}</dd>\n" )
→HTML write user id cross reference line (41)

◊ HTML simple cross reference markup (40). Used by HTML subclass of Weaver (31); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

The xrefDefLine() method writes a line for the user identifier cross reference blocks created by @u. In this implementation, the cross references are simply definition lists (<dl>). The defining instance is included in the correct order with the other instances, but is bold and marked with a bullet (•).

HTML write user id cross reference line (41) =


name_def_template = string.Template( '<a href="#pyweb${seq}"><b>&bull;${seq}</b></a>' )
name_ref_template = string.Template( '<a href="#pyweb${seq}">${seq}</a>' )

◊ HTML write user id cross reference line (41). Used by HTML simple cross reference markup (40); HTML subclass of Weaver (31); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).
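
A rough sketch of how a user identifier line could be assembled from these templates follows. The user_id_line() helper, the sample sequence numbers, and the single-space separator are all assumptions made for illustration; only the three templates come from the chunks above.

```python
import string

xref_item_template = string.Template("<dt>${fullName}</dt><dd>${refList}</dd>\n")
name_def_template = string.Template('<a href="#pyweb${seq}"><b>&bull;${seq}</b></a>')
name_ref_template = string.Template('<a href="#pyweb${seq}">${seq}</a>')

def user_id_line(name, defn, refs):
    """Hypothetical helper: list the defining chunk in order with the references."""
    links = []
    for seq in sorted(set(refs) | {defn}):
        # The defining instance gets the bold, bulleted form.
        template = name_def_template if seq == defn else name_ref_template
        links.append(template.substitute(seq=seq))
    return xref_item_template.substitute(fullName=name, refList=" ".join(links))

line = user_id_line("Tangler", defn=43, refs=[48, 2])
```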

The HTMLShort subclass enhances the HTML class to provide short cross references. The references() method writes the list of chunks that refer to this chunk. Note that this list could be rather long because of the possibility of transitive references.

HTML short references summary at the end of a chunk (42) =


ref_item_template = string.Template( '<a href="#pyweb${seq}">(${seq})</a>' )

◊ HTML short references summary at the end of a chunk (42). Used by HTML subclass of Weaver (32); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).
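
The effect of overriding a single class-level template can be sketched with two stand-in classes. HTMLSketch, HTMLShortSketch, and reference_item() are hypothetical names; the point is only that attribute lookup through self picks up the subclass template, so no method needs to be redefined.

```python
import string

class HTMLSketch(object):
    """Stand-in for HTML; only the reference item template is modelled."""
    ref_item_template = string.Template(
        '<a href="#pyweb${seq}"><em>${fullName}</em>&nbsp;(${seq})</a>')

    def reference_item(self, fullName, seq):
        # Lookup through self finds any subclass override first.
        return self.ref_item_template.substitute(fullName=fullName, seq=seq)

class HTMLShortSketch(HTMLSketch):
    """Stand-in for HTMLShort; overrides one template and nothing else."""
    ref_item_template = string.Template('<a href="#pyweb${seq}">(${seq})</a>')

full = HTMLSketch().reference_item("Chunk class", 52)
short = HTMLShortSketch().reference_item("Chunk class", 52)
```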

Tangler subclass of Emitter

Usage

The Tangler class is concrete, and can tangle source files. An instance of Tangler is given to the Web class tangle() method.

w= Web( "someFile.w" )
WebReader().web(w).load()
t= Tangler()
w.tangle( t )

Design

The Tangler subclass defines an Emitter used to tangle the various program source files. The superclass is used to simply emit correctly indented source code and do very little else that could corrupt or alter the output.

Language-specific subclasses could be used to provide additional decoration. For example, inserting #line directives showing the line number in the original source file.

For Python, where indentation matters, the indent rules are relatively simple. The whitespace before a @< command is preserved as the prevailing indent for the block tangled as a replacement for the @<...@>.
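
The indent rule can be sketched in a few lines. This is an illustration under assumptions, not the Emitter's actual mechanism: expand_reference() and its parameter names are hypothetical, and empty lines are left unindented here purely as a design choice of the sketch.

```python
def expand_reference(prefix, body_lines):
    """Indent every non-empty line of a referenced chunk by the given prefix.

    prefix     -- whitespace found before the @< command (hypothetical name)
    body_lines -- lines of the chunk being substituted (hypothetical name)
    """
    return [prefix + line if line else line for line in body_lines]

referenced = ["for c in self.commands:", "    c.tangle()", ""]
expanded = expand_reference("        ", referenced)
```

The referenced chunk keeps its own relative indentation; the prevailing indent is simply added in front of each line.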

Implementation

Tangler subclass of Emitter to create source files with no markup (43) =


class Tangler( Emitter ):
    """Tangle output files."""
    def __init__( self ):
        super( Tangler, self ).__init__()
        self.comment_start= ""
        self.comment_end= ""
        self.debug= False
    →Tangler doOpen, doClose and doWrite overrides (44)
    →Tangler code chunk begin (45)
    →Tangler code chunk end (46)

◊ Tangler subclass of Emitter to create source files with no markup (43). Used by Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

The default for all tanglers is to create the named file.

This doClose() method overrides the Emitter class doClose() method by closing the actual file created by open.

This doWrite() method overrides the Emitter class doWrite() method by writing to the actual file created by open.

Tangler doOpen, doClose and doWrite overrides (44) =


def doOpen( self, aFile ):
    self.fileName= aFile
    self.theFile= open( aFile, "w" )
    logger.info( "Tangling %r", aFile )
def doClose( self ):
    self.theFile.close()
    logger.info( "Wrote %d lines to %r",
        self.linesWritten, self.fileName )
def doWrite( self, text ):
    self.theFile.write( text )

◊ Tangler doOpen, doClose and doWrite overrides (44). Used by Tangler subclass of Emitter to create source files with no markup (43); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

The codeBegin() method starts emitting a new chunk of code. It does this by setting the Tangler's indent to the prevailing indent at the start of the @< reference command.

Tangler code chunk begin (45) =


def codeBegin( self, aChunk ):
    self.log_indent.debug( "<tangle %s:", aChunk.fullName )
    if self.debug:
        self.write( "\n%s %s (%d) -- %s %s\n" % ( 
            self.comment_start, aChunk.fullName, aChunk.seq, aChunk.lineNumber, self.comment_end ) )

◊ Tangler code chunk begin (45). Used by Tangler subclass of Emitter to create source files with no markup (43); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

The codeEnd() method ends emitting a new chunk of code. It does this by resetting the Tangler's indent to the previous setting.

Tangler code chunk end (46) =


def codeEnd( self, aChunk ):
    self.log_indent.debug( ">%s", aChunk.fullName )

◊ Tangler code chunk end (46). Used by Tangler subclass of Emitter to create source files with no markup (43); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

TanglerMake subclass of Tangler

Usage

The TanglerMake class can tangle source files. An instance of TanglerMake is given to the Web class tangle() method.

w= Web( "someFile.w" )
WebReader().web(w).load()
t= TanglerMake()
w.tangle( t )

Design

The TanglerMake subclass makes the Tangler used to tangle the various program source files more make-friendly. This subclass of Tangler does not touch an output file where there is no change. This is helpful when pyWeb's output is sent to make. Using TanglerMake assures that only files with real changes are rewritten, minimizing recompilation of an application for changes to the associated documentation.

Implementation

This subclass of Tangler changes how files are opened and closed.

Imports (47) +=

import tempfile
import filecmp

◊ Imports (47). Used by pyweb.py (150).

Tangler subclass which is make-sensitive (48) =


class TanglerMake( Tangler ):
    """Tangle output files, leaving files untouched if there are no changes."""
    def __init__( self ):
        Tangler.__init__( self )
        self.tempname= None
    →TanglerMake doOpen override, using a temporary file (49)
    →TanglerMake doClose override, comparing temporary to original (50)

◊ Tangler subclass which is make-sensitive (48). Used by Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

A TanglerMake creates a temporary file to collect the tangled output. When this file is completed, we can compare it with the original file in this directory, avoiding a "touch" if the new file is the same as the original.

TanglerMake doOpen override, using a temporary file (49) =


def doOpen( self, aFile ):
    self.tempname= tempfile.mktemp()
    self.theFile= open( self.tempname, "w" )
    logger.info( "Tangling %r", aFile )

◊ TanglerMake doOpen override, using a temporary file (49). Used by Tangler subclass which is make-sensitive (48); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).

If there is a previous file: compare the temporary file and the previous file. If there was no previous file or the files are different: rename temporary to replace previous; else: unlink temporary and discard it. This preserves the original (with the original date and time) if nothing has changed.

TanglerMake doClose override, comparing temporary to original (50) =


def doClose( self ):
    self.theFile.close()
    try:
        same= filecmp.cmp( self.tempname, self.fileName )
    except OSError,e:
        same= 0
    if same:
        logger.info( "No change to %r", self.fileName )
        os.remove( self.tempname )
    else:
        # Note that Windows requires the original file to be removed first.
        try: 
            os.remove( self.fileName )
        except OSError,e:
            pass
        os.rename( self.tempname, self.fileName )
        logger.info( "Wrote %d lines to %r",
            self.linesWritten, self.fileName )

◊ TanglerMake doClose override, comparing temporary to original (50). Used by Tangler subclass which is make-sensitive (48); Emitter class hierarchy - used to control output files (2); Base Class Definitions (1); pyweb.py (150).
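
For comparison, the same compare-then-replace idea can be sketched as a standalone function for current Python. This is not the TanglerMake code: write_if_changed() is a hypothetical name, tempfile.mkstemp() is swapped in for mktemp() because it creates the file rather than only a name, and os.replace() (Python 3.3 or later) is assumed so that the original need not be removed first.

```python
import filecmp
import os
import tempfile

def write_if_changed(target, text):
    """Write text to target only when the content differs; return True if written."""
    directory = os.path.dirname(os.path.abspath(target))
    # mkstemp creates and opens the file, avoiding the race inherent in mktemp.
    fd, tempname = tempfile.mkstemp(dir=directory, text=True)
    with os.fdopen(fd, "w") as temp:
        temp.write(text)
    try:
        # shallow=False forces a comparison of the file contents.
        same = filecmp.cmp(tempname, target, shallow=False)
    except OSError:
        same = False  # No previous file to compare against.
    if same:
        os.remove(tempname)  # Leave the original and its timestamp untouched.
        return False
    os.replace(tempname, target)  # Overwrites an existing target.
    return True
```

Creating the temporary file in the target's directory keeps the final rename on one filesystem.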

Chunks

A Chunk is a piece of the input file. It is a collection of Command instances. A chunk can be woven or tangled to create output.

The two most important methods are the weave() and tangle() methods. These visit the commands of this chunk, producing the required output file.

Additional methods (startswith(), searchForRE() and usedBy()) are used to examine the text of the Command instances within the chunk.

A Chunk instance is created by the WebReader as the input file is parsed. Each Chunk instance has one or more pieces of the original input text. This text can be program source, a reference command, or the documentation source.

Chunk class hierarchy - used to describe input chunks (51) =


→Chunk class (52)
→NamedChunk class (63)
→OutputChunk class (68)
→NamedDocumentChunk class (72)

◊ Chunk class hierarchy - used to describe input chunks (51). Used by Base Class Definitions (1); pyweb.py (150).

The Chunk class is both the superclass for this hierarchy and the implementation for anonymous chunks. An anonymous chunk is always documentation in the target markup language. No transformation is ever done on anonymous chunks.

A NamedChunk is a chunk created with a @d command. This is a chunk of source programming language, bracketed with @{ and @}.

An OutputChunk is a named chunk created with a @o command. This must be a chunk of source programming language, bracketed with @{ and @}.

A NamedDocumentChunk is a named chunk created with a @d command. This is a chunk of documentation in the target markup language, bracketed with @[ and @].

Chunk Superclass

Usage

An instance of the Chunk class has a life that includes four important events: creation, cross-reference, weave and tangle.

A Chunk is created by a WebReader, and associated with a Web. There are several web append methods, depending on the exact subclass of Chunk. The WebReader calls the chunk's webAdd() method to select the correct method for appending and indexing the chunk. Individual instances of Command are appended to the chunk. The basic outline for creating a Chunk instance is as follows:

w= Web( "someFile.w" )
c= Chunk()
c.webAdd( w )
c.append( ...some Command... )
c.append( ...some Command... )

Before weaving or tangling, a cross reference is created for all user identifiers in all of the Chunk instances. This is done by (1) visiting each Chunk and calling the getUserIDRefs() method to gather all identifiers, and (2) for each identifier, visiting each Chunk and calling the searchForRE() method to find uses of the identifier.

ident= []
for c in the Web's named chunk list:
    ident.extend( c.getUserIDRefs() )
for i in ident:
    pattern= re.compile('\W%s\W' % i)
    for c in the Web's named chunk list:
        c.searchForRE( pattern )
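
The outline above can be made runnable with a stand-in chunk class. SketchChunk is hypothetical and searches a single text string instead of a list of Command instances; the use of a raw string and re.escape() is an addition of this sketch, intended to keep the identifier literal inside the pattern.

```python
import re

class SketchChunk(object):
    """Hypothetical stand-in for a named chunk: a name, its text, its identifiers."""
    def __init__(self, name, text, user_ids=()):
        self.name, self.text, self.user_ids = name, text, list(user_ids)
    def getUserIDRefs(self):
        return self.user_ids
    def searchForRE(self, pattern):
        return self if pattern.search(self.text) else None

chunks = [
    SketchChunk("Imports", "import tempfile\n"),
    SketchChunk("Tangler", "class Tangler( Emitter ):\n    pass\n", ["Tangler"]),
    SketchChunk("TanglerMake", "class TanglerMake( Tangler ):\n    pass\n",
                ["TanglerMake"]),
]

# Step 1: gather every user identifier.
ident = []
for c in chunks:
    ident.extend(c.getUserIDRefs())

# Step 2: for each identifier, find the chunks whose text uses it.
xref = {}
for i in ident:
    # \W on both sides requires a non-word character around the identifier.
    pattern = re.compile(r'\W%s\W' % re.escape(i))
    xref[i] = [c.name for c in chunks if c.searchForRE(pattern)]
```

Because of the surrounding \W, the identifier Tangler does not match inside the longer name TanglerMake.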

A Chunk is woven or tangled by the Web. The basic outline for weaving is as follows. The tangling action is essentially the same.

for c in the Web's chunk list:
    c.weave( aWeaver )

Design

The Chunk class contains the overall definitions for all of the various specialized subclasses. In particular, it contains the append() and appendText() methods used by all of the various Chunk subclasses.

When a @@ construct is located in the input stream, the stream contains three text tokens: material before the @@, the @@, and the material after the @@. These three tokens are reassembled into a single block of text. This reassembly is accomplished by changing the chunk's state so that the next TextCommand is appended onto the previous TextCommand.

The appendText() method either:

appends to a previous TextCommand instance,
or finds that there are no commands at all, and creates a TextCommand instance,
or finds that the last Command isn't a subclass of TextCommand and creates a TextCommand instance.

Each subclass of Chunk has a particular type of text that it will process. Anonymous chunks only handle document text. The NamedChunk subclass that handles program source will override this method to create a different command type. The makeContent() method creates the appropriate Command instance for this Chunk subclass.

The weave() method of an anonymous Chunk uses the weaver's docBegin() and docEnd() methods to insert text that is source markup. Other subclasses will override this to use different Weaver methods for different kinds of text.

A Chunk has a Strategy object which is a subclass of Reference. This is either an instance of SimpleReference or TransitiveReference. A SimpleReference does no additional processing, and locates the proximate reference to this chunk. The TransitiveReference walks "up" the web toward top-level file definitions that reference this Chunk.
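
The difference between the two strategies can be sketched with stand-in classes. The bodies below are assumptions, not the Reference implementations: only the chunkReferencedBy() method name and the referencedBy attribute are taken from this document, and the class names carry a Sketch suffix to mark them as hypothetical.

```python
class SketchChunk(object):
    """Hypothetical stand-in carrying only a name and its referrers."""
    def __init__(self, name):
        self.name = name
        self.referencedBy = []

class SimpleReferenceSketch(object):
    def chunkReferencedBy(self, aChunk):
        # Only the chunks that directly reference aChunk.
        return list(aChunk.referencedBy)

class TransitiveReferenceSketch(object):
    def chunkReferencedBy(self, aChunk):
        # Direct referrers first, then their referrers, and so on upward.
        found = []
        pending = list(aChunk.referencedBy)
        while pending:
            parent = pending.pop(0)
            if parent not in found:
                found.append(parent)
                pending.extend(parent.referencedBy)
        return found

top = SketchChunk("pyweb.py")
base = SketchChunk("Base Class Definitions")
leaf = SketchChunk("Chunk class")
base.referencedBy.append(top)
leaf.referencedBy.append(base)

simple = [c.name for c in SimpleReferenceSketch().chunkReferencedBy(leaf)]
transitive = [c.name for c in TransitiveReferenceSketch().chunkReferencedBy(leaf)]
```

The transitive form yields the kind of chain seen in the "Used by" trailers of this document, ending at the top-level file.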

Implementation

The Chunk constructor initializes the following instance variables:

commands is a sequence of the various Command instances that comprise this chunk.
user_id_list is the list of user identifiers associated with this chunk. This attribute is always None for this class. The NamedChunk subclass, however, can have user identifiers.
initial is True if this is the first definition (displayed with '=') and False if it is a subsequent definition (displayed with '+=').
name has the name of the chunk. This is '' for anonymous chunks.
seq has the sequence number associated with this chunk. This is None for anonymous chunks.
referencedBy is the list of Chunks which reference this chunk.
references is the list of Chunks this chunk references.

This variable is deprecated.

_lastCommand is used to force a character to be appended to the last command (which must be a TextCommand instance) instead of appending a new command. This needs to be removed. If each Command has trailing text, then this isn't necessary.

Chunk class (52) =


class Chunk( object ):
    """Anonymous piece of input file: will be output through the weaver only."""
    # construction and insertion into the web
    def __init__( self ):
        self.commands= [ ] # The list of children of this chunk
        self.user_id_list= None
        self.initial= None
        self.name= ''
        self.fullName= None
        self.seq= None
        self.referencedBy= [] # Chunks which reference this chunk.  Ideally just one.
        self.references= [] # Names that this chunk references
        
        self.reference_style= None # Instance of Reference 
        
        self._lastCommand= None
    def __str__( self ):
        return "\n".join( map( str, self.commands ) )
    def __repr__( self ):
        return "%s('%s')" % ( self.__class__.__name__, self.name )
    →Chunk append a command (53)
    →Chunk append text (54)
    →Chunk add to the web (55)
    →Chunk generate references from this Chunk (59)
    →Chunk superclass make Content definition (56)
    →Chunk examination: starts with, matches pattern (57)
    →Chunk references to this Chunk (60)
    →Chunk weave this Chunk into the documentation (61)
    →Chunk tangle this Chunk into a code file (62)

◊ Chunk class (52). Used by Chunk class hierarchy - used to describe input chunks (51); Base Class Definitions (1); pyweb.py (150).

The append() method simply appends a Command instance to this chunk.

Chunk append a command (53) =


def append( self, command ):
    """Add another Command to this chunk."""
    self.commands.append( command )
    command.chunk= self

◊ Chunk append a command (53). Used by Chunk class (52); Chunk class hierarchy - used to describe input chunks (51); Base Class Definitions (1); pyweb.py (150).

The appendText() method appends a TextCommand to this chunk, or it concatenates it to the most recent TextCommand.

When an @@ construct is located, the appendText() method is used to accumulate this character. This means that it will be appended to any previous TextCommand, or a new TextCommand will be built.

The reason for appending is that a TextCommand has an implicit indentation. The "@" cannot be a separate TextCommand because it will wind up indented.

Chunk append text (54) =


def appendText( self, text, lineNumber=0 ):
    """Append a single character to the most recent TextCommand."""
    try:
        # Works for TextCommand, otherwise breaks
        self.commands[-1].text += text
    except IndexError, e:
        # First command?  Then the list will have been empty.
        self.commands.append( self.makeContent(text,lineNumber) )
    except AttributeError, e:
        # Not a TextCommand?  Then there won't be a text attribute.
        self.commands.append( self.makeContent(text,lineNumber) )
    self._lastCommand= self.commands[-1]

◊ Chunk append text (54). Used by Chunk class (52); Chunk class hierarchy - used to describe input chunks (51); Base Class Definitions (1); pyweb.py (150).
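
The three cases handled by appendText() can be exercised with stand-in classes. This sketch is written for current Python and merges the two exception handlers into one; TextCommand, ReferenceCommand, and SketchChunk here are simplified, hypothetical versions that model only the text attribute.

```python
class TextCommand(object):
    """Stand-in: a command that carries a text attribute."""
    def __init__(self, text, lineNumber=0):
        self.text, self.lineNumber = text, lineNumber

class ReferenceCommand(object):
    """Stand-in: a command with no text attribute."""
    def __init__(self, refTo):
        self.refTo = refTo

class SketchChunk(object):
    def __init__(self):
        self.commands = []
    def appendText(self, text, lineNumber=0):
        try:
            # Works when the last command has a text attribute.
            self.commands[-1].text += text
        except (IndexError, AttributeError):
            # Empty list, or the last command is not text: start a new one.
            self.commands.append(TextCommand(text, lineNumber))

c = SketchChunk()
c.appendText("user")          # empty list: new TextCommand
c.appendText("@")             # appended to the previous TextCommand
c.appendText("example.com")   # appended again
c.commands.append(ReferenceCommand("some chunk"))
c.appendText(" trailing")     # last command is not text: new TextCommand
```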

The webAdd() method adds this chunk to the given document web. Each subclass of the Chunk class must override this to be sure that the various Chunk subclasses are indexed properly. The Chunk class uses the add() method of the Web class to append an anonymous, unindexed chunk.

Chunk add to the web (55) =


def webAdd( self, web ):
    """Add self to a Web as anonymous chunk."""
    web.add( self )

◊ Chunk add to the web (55). Used by Chunk class (52); Chunk class hierarchy - used to describe input chunks (51); Base Class Definitions (1); pyweb.py (150).

This superclass creates a specific Command for a given piece of content. A subclass can override this to change the underlying assumptions of that Chunk. The generic chunk doesn't contain code; it contains text and can only be woven, never tangled. A Named Chunk using @{ and @} creates code. A Named Chunk using @[ and @] creates text.

Chunk superclass make Content definition (56) =


def makeContent( self, text, lineNumber=0 ):
    return TextCommand( text, lineNumber )

◊ Chunk superclass make Content definition (56). Used by Chunk class (52); Chunk class hierarchy - used to describe input chunks (51); Base Class Definitions (1); pyweb.py (150).

The startswith() method examines the first Command instance in this Chunk instance to see if it starts with the given prefix string.

The lineNumber property returns the line number of the first Command in this chunk. This provides some context for where the chunk occurs in the original input file.

A NamedChunk instance may define one or more identifiers. This parent class provides a dummy version of the getUserIDRefs method. The NamedChunk subclass overrides this to provide actual results. By providing this at the superclass level, the Web can easily gather identifiers without knowing the actual subclass of Chunk.

The searchForRE() method examines each Command instance to see if it matches with the given regular expression. If so, this can be reported to the Web instance and accumulated as part of a cross reference for this Chunk.

Chunk examination: starts with, matches pattern (57) =


def startswith( self, prefix ):
    """Examine the first command's starting text."""
    return len(self.commands) >= 1 and self.commands[0].startswith( prefix )
def searchForRE( self, rePat ):
    """Visit each command, applying the pattern."""
    →Chunk search for user identifiers in each child command (58)
@property
def lineNumber( self ):
    """Return the first command's line number or None."""
    return self.commands[0].lineNumber if len(self.commands) >= 1 else None
def getUserIDRefs( self ):
    return []

◊ Chunk examination: starts with, matches pattern (57). Used by Chunk class (52); Chunk class hierarchy - used to describe input chunks (51); Base Class Definitions (1); pyweb.py (150).

The chunk search in the searchForRE() method parallels weaving and tangling a Chunk. The operation is delegated to each Command instance within the Chunk instance.

Chunk search for user identifiers in each child command (58) =


for c in self.commands:
    if c.searchForRE( rePat ):
        return self
return None

◊ Chunk search for user identifiers in each child command (58). Used by Chunk examination: starts with, matches pattern (57); Chunk class (52); Chunk class hierarchy - used to describe input chunks (51); Base Class Definitions (1); pyweb.py (150).

The genReferences() method visits each Command instance inside this chunk; a Command will yield the references.

Note that an exception may be raised by this operation if a referenced Chunk does not actually exist. If a reference Command does raise an error, we append this Chunk information and reraise the error with the additional context information.

Chunk generate references from this Chunk (59) =


def genReferences( self, aWeb ):
    """Generate references from this Chunk."""
    try:
        for t in self.commands:
            ref= t.ref( aWeb )
            if ref is not None:
                yield ref
    except Error,e:
        raise Error,e.args+(self,)

◊ Chunk generate references from this Chunk (59). Used by Chunk class (52); Chunk class hierarchy - used to describe input chunks (51); Base Class Definitions (1); pyweb.py (150).
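
The re-raise with added context can be sketched as follows, using the exception syntax of current Python. Error, the three command classes, and SketchChunk are hypothetical stand-ins; only the shape of genReferences() follows the chunk above.

```python
class Error(Exception):
    """Stand-in for the application's Error class."""

class GoodCommand(object):
    def ref(self, aWeb):
        return "Some Chunk"

class TextOnly(object):
    def ref(self, aWeb):
        return None  # Plain text refers to nothing.

class BadCommand(object):
    def ref(self, aWeb):
        raise Error("No chunk named 'missing'")

class SketchChunk(object):
    def __init__(self, name, commands):
        self.name, self.commands = name, commands
    def genReferences(self, aWeb):
        try:
            for t in self.commands:
                ref = t.ref(aWeb)
                if ref is not None:
                    yield ref
        except Error as e:
            # Re-raise with this chunk appended to the original arguments.
            raise Error(*(e.args + (self,)))

ok = list(SketchChunk("ok", [GoodCommand(), TextOnly()]).genReferences(None))
bad = SketchChunk("bad", [GoodCommand(), BadCommand()])
try:
    list(bad.genReferences(None))
except Error as e:
    context = e.args
```

The caller receives both the original message and the Chunk in which the failing reference was found.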

The list of references to a Chunk uses a Strategy plug-in to either generate a simple parent or a transitive closure of all parents.

Chunk references to this Chunk (60) =


@property
def references_list( self ):
    """This should return chunks themselves, not (name,seq) pairs."""
    return self.reference_style.chunkReferencedBy( self )

◊ Chunk references to this Chunk (60). Used by Chunk class (52); Chunk class hierarchy - used to describe input chunks (51); Base Class Definitions (1); pyweb.py (150).

The weave() method weaves this chunk into the final document as follows:

call the Weaver class docBegin() method. This method does nothing for document content.
visit each Command instance: call the Command instance weave() method to emit the content of the Command instance
call the Weaver class docEnd() method. This method does nothing for document content.

Note that an exception may be raised by this action if a referenced Chunk does not actually exist. If a reference Command does raise an error, we append this Chunk information and reraise the error with the additional context information.

Chunk weave this Chunk into the documentation (61) =


def weave( self, aWeb, aWeaver ):
    """Create the nicely formatted document from an anonymous chunk."""
    aWeaver.docBegin( self )
    try:
        for t in self.commands:
            t.weave( aWeb, aWeaver )
    except Error, e:
        raise Error,e.args+(self,)
    aWeaver.docEnd( self )
def weaveReferenceTo( self, aWeb, aWeaver ):
    """Create a reference to this chunk -- except for anonymous chunks."""
    raise Exception( "Cannot reference an anonymous chunk." )
def weaveShortReferenceTo( self, aWeb, aWeaver ):
    """Create a short reference to this chunk -- except for anonymous chunks."""
    raise Exception( "Cannot reference an anonymous chunk." )

◊ Chunk weave this Chunk into the documentation (61). Used by Chunk class (52); Chunk class hierarchy - used to describe input chunks (51); Base Class Definitions (1); pyweb.py (150).

Anonymous chunks cannot be tangled. Any attempt indicates a serious problem with this program or the input file.

Chunk tangle this Chunk into a code file (62) =


def tangle( self, aWeb, aTangler ):
    """Create source code -- except anonymous chunks should not be tangled"""
    raise Error( 'Cannot tangle an anonymous chunk', self )

◊ Chunk tangle this Chunk into a code file (62). Used by Chunk class (52); Chunk class hierarchy - used to describe input chunks (51); Base Class Definitions (1); pyweb.py (150).

NamedChunk class

Usage

A NamedChunk is created and used almost identically to an anonymous Chunk. The most significant difference is that a name is provided when the NamedChunk is created. This name is used by the Web to organize the chunks.

Design

A NamedChunk is created with a @d or @o command. A NamedChunk contains programming language source when the brackets are @{ and @}. A separate subclass, NamedDocumentChunk, is used when the brackets are @[ and @].

A NamedChunk can be both tangled into the output program files, and woven into the output document file.

The weave() method of a NamedChunk uses the Weaver's codeBegin() and codeEnd() methods to insert text that is program source and requires additional markup to make it stand out from documentation. Other subclasses can override this to use different Weaver methods for different kinds of text.

Implementation

This class introduces some additional attributes.

fullName is the full name of the chunk. It's possible for a chunk to be an abbreviated forward reference; full names cannot be resolved until all chunks have been seen.
user_id_list is the list of user identifiers associated with this chunk.
refCount is the count of references to this chunk. If this is zero, the chunk is unused; if this is more than one, this chunk is multiply used. Either of these conditions is a possible error in the input. This is set by the usedBy() method.
name has the name of the chunk. Names can be abbreviated.
seq has the sequence number associated with this chunk. This is set by the Web by the webAdd() method.
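
The refCount rule described above can be illustrated with a small sketch. The function name usage_warnings and the plain dictionary of counts are assumptions made for this illustration only; pyWeb itself keeps the count on each NamedChunk instance.

```python
def usage_warnings(ref_counts):
    """Turn a {chunk name: reference count} mapping into warning strings.

    A count of zero means the chunk is unused; a count above one means
    the chunk is multiply used. Both are possible input errors.
    """
    warnings = []
    for name in sorted(ref_counts):
        count = ref_counts[name]
        if count == 0:
            warnings.append("%r is never used" % name)
        elif count > 1:
            warnings.append("%r is used %d times" % (name, count))
    return warnings

# Hypothetical chunk names, for illustration only.
report = usage_warnings({"imports": 1, "helpers": 0, "logging setup": 3})
```

A chunk with exactly one reference produces no warning, which matches the description of refCount above.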

NamedChunk class (63) =


class NamedChunk( Chunk ):
    """Named piece of input file: will be output as both tangler and weaver."""
    def __init__( self, name ):
        Chunk.__init__( self )
        self.name= name
        self.user_id_list= []
        self.refCount= 0
    def __str__( self ):
        return "%r: %s" % ( self.name, Chunk.__str__(self) )
    def makeContent( self, text, lineNumber=0 ):
        return CodeCommand( text, lineNumber )
    →NamedChunk user identifiers set and get (64)
    →NamedChunk add to the web (65)
    →NamedChunk weave (66)
    →NamedChunk tangle into the source file (67)

◊ NamedChunk class (63). Used by Chunk class hierarchy - used to describe input chunks (51); Base Class Definitions (1); pyweb.py (150).

The setUserIDRefs() method accepts a list of user identifiers that are associated with this chunk. These are provided after the @| separator in a @d named chunk. These are used by the @u cross reference generator.

NamedChunk user identifiers set and get (64) =


def setUserIDRefs( self, text ):
    """Save user ID's associated with this chunk."""
    self.user_id_list= text.split()
def getUserIDRefs( self ):
    return self.user_id_list

◊ NamedChunk user identifiers set and get (64). Used by NamedChunk class (63); Chunk class hierarchy - used to describe input chunks (51); Base Class Definitions (1); pyweb.py (150).

The webAdd() method adds this chunk to the given document Web instance. Each class of Chunk must override this to be sure that the various Chunk classes are indexed properly. This class uses the addNamed() method of the Web class to append a named chunk.

NamedChunk add to the web (65) =


def webAdd( self, web ):
    """Add self to a Web as named chunk, update xrefs."""
    web.addNamed( self )

◊ NamedChunk add to the web (65). Used by NamedChunk class (63); Chunk class hierarchy - used to describe input chunks (51); Base Class Definitions (1); pyweb.py (150).

The weave() method weaves this chunk into the final document as follows:

call the Weaver class codeBegin() method. This method emits the necessary markup for code appearing in the woven output.
visit each Command, calling the command's weave() method to emit the command's content
call the Weaver class codeEnd() method. This method emits the necessary markup for code appearing in the woven output.

The weaveReferenceTo() method weaves a reference to a chunk using both name and sequence number. The weaveShortReferenceTo() method weaves a reference to a chunk using only the sequence number. These references are created by ReferenceCommand instances within a chunk being woven.

If a ReferenceCommand does raise an error during weaving, we append this Chunk information and reraise the error with the additional context information.

NamedChunk weave (66) =


def weave( self, aWeb, aWeaver ):
    """Create the nicely formatted document from a chunk of code."""
    # format as <pre> in a different-colored box
    self.fullName= aWeb.fullNameFor( self.name )
    aWeaver.codeBegin( self )
    aWeaver.setIndent( aWeaver.code_indent )
    for t in self.commands:
        try:
            t.weave( aWeb, aWeaver )
        except Error as e:
            raise Error( *(e.args+(self,)) )
    aWeaver.clrIndent( )
    aWeaver.codeEnd( self )
def weaveReferenceTo( self, aWeb, aWeaver ):
    """Create a reference to this chunk."""
    self.fullName= aWeb.fullNameFor( self.name )
    txt= aWeaver.referenceTo( self.fullName, self.seq )
    aWeaver.codeBlock( txt )
def weaveShortReferenceTo( self, aWeb, aWeaver ):
    """Create a shortened reference to this chunk."""
    txt= aWeaver.referenceTo( None, self.seq )
    aWeaver.codeBlock( txt )

◊ NamedChunk weave (66). Used by NamedChunk class (63); Chunk class hierarchy - used to describe input chunks (51); Base Class Definitions (1); pyweb.py (150).
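
The order of Weaver calls made by weave() can be exercised with stand-in objects. RecordingWeaver, StubCommand and weave_named are hypothetical names used only for this sketch; the indent handling and error wrapping of the real method are left out.

```python
class RecordingWeaver(object):
    """Hypothetical stand-in for a Weaver; it only records the calls made."""
    def __init__(self):
        self.calls = []
    def codeBegin(self, chunk):
        self.calls.append("codeBegin")
    def codeBlock(self, text):
        self.calls.append("codeBlock:" + text)
    def codeEnd(self, chunk):
        self.calls.append("codeEnd")

class StubCommand(object):
    """Hypothetical command that weaves its text as a code block."""
    def __init__(self, text):
        self.text = text
    def weave(self, aWeb, aWeaver):
        aWeaver.codeBlock(self.text)

def weave_named(commands, aWeb, aWeaver):
    """Begin markup, visit each command, end markup."""
    aWeaver.codeBegin(None)
    for command in commands:
        command.weave(aWeb, aWeaver)
    aWeaver.codeEnd(None)

weaver = RecordingWeaver()
weave_named([StubCommand("x = 1"), StubCommand("y = 2")], None, weaver)
```

After the call, weaver.calls holds the begin marker, one entry per command, and the end marker, in that order.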

The tangle() method tangles this chunk into the output source file as follows:

call the Tangler class codeBegin() method to set indents properly.
visit each Command, calling the Command's tangle() method to emit the Command's content
call the Tangler class codeEnd() method to restore indents.

If a ReferenceCommand does raise an error during tangling, we append this Chunk information and reraise the error with the additional context information.

NamedChunk tangle into the source file (67) =


def tangle( self, aWeb, aTangler ):
    """Create source code."""
    # use aWeb to resolve @<namedChunk@>
    # format as correctly indented source text
    self.previous_command= TextCommand( "", self.commands[0].lineNumber )
    aTangler.codeBegin( self )
    for t in self.commands:
        try:
            t.tangle( aWeb, aTangler )
        except Error as e:
            raise Error( *(e.args+(self,)) )
        self.previous_command= t
    aTangler.codeEnd( self )

◊ NamedChunk tangle into the source file (67). Used by NamedChunk class (63); Chunk class hierarchy - used to describe input chunks (51); Base Class Definitions (1); pyweb.py (150).

OutputChunk class

Usage

An OutputChunk is created and used identically to a NamedChunk. The difference between this class and the parent class is the decoration of the markup when weaving.

Design

The OutputChunk class is a subclass of NamedChunk that handles file output chunks defined with @o.

The weave() method of an OutputChunk uses the Weaver's fileBegin() and fileEnd() methods to insert text that is program source and requires additional markup to make it stand out from documentation. Other subclasses could override this to use different Weaver methods for different kinds of text.

All other methods are inherited from NamedChunk. The tangle() method only sets the Tangler's comment delimiters before delegating to the NamedChunk implementation.

Implementation

OutputChunk class (68) =


class OutputChunk( NamedChunk ):
    """Named piece of input file, defines an output tangle."""
    def __init__( self, name, comment_start="", comment_end="" ):
        super( OutputChunk, self ).__init__( name )
        self.comment_start= comment_start
        self.comment_end= comment_end
    →OutputChunk add to the web (69)
    →OutputChunk weave (70)
    →OutputChunk tangle (71)

◊ OutputChunk class (68). Used by Chunk class hierarchy - used to describe input chunks (51); Base Class Definitions (1); pyweb.py (150).

The webAdd() method adds this chunk to the given document Web. Each class of Chunk must override this to be sure that the various Chunk classes are indexed properly. This class uses the addOutput() method of the Web class to append a file output chunk.

OutputChunk add to the web (69) =


def webAdd( self, web ):
    """Add self to a Web as output chunk, update xrefs."""
    web.addOutput( self )

◊ OutputChunk add to the web (69). Used by OutputChunk class (68); Chunk class hierarchy - used to describe input chunks (51); Base Class Definitions (1); pyweb.py (150).

The weave() method weaves this chunk into the final document as follows:

call the Weaver class fileBegin() method to emit proper markup for an output file chunk.
visit each Command, calling the Command's weave() method to emit the Command's content
call the Weaver class fileEnd() method to emit proper markup for an output file chunk.

Unlike named document chunks, output chunks are also tangled; the tangle() method follows the weave() method.

If a ReferenceCommand does raise an error during weaving, we append this Chunk information and reraise the error with the additional context information.

OutputChunk weave (70) =


def weave( self, aWeb, aWeaver ):
    """Create the nicely formatted document from a chunk of code."""
    # format as <pre> in a different-colored box
    self.fullName= aWeb.fullNameFor( self.name )
    aWeaver.fileBegin( self )
    try:
        for t in self.commands:
            t.weave( aWeb, aWeaver )
    except Error as e:
        raise Error( *(e.args+(self,)) )
    aWeaver.fileEnd( self )

◊ OutputChunk weave (70). Used by OutputChunk class (68); Chunk class hierarchy - used to describe input chunks (51); Base Class Definitions (1); pyweb.py (150).

OutputChunk tangle (71) =


def tangle( self, aWeb, aTangler ):
    aTangler.comment_start= self.comment_start
    aTangler.comment_end= self.comment_end
    super( OutputChunk, self ).tangle( aWeb, aTangler )

◊ OutputChunk tangle (71). Used by OutputChunk class (68); Chunk class hierarchy - used to describe input chunks (51); Base Class Definitions (1); pyweb.py (150).

NamedDocumentChunk class

Usage

A NamedDocumentChunk is created and used identically to a NamedChunk. The difference between this class and the parent class is that this chunk is only woven when referenced. The original definition is silently skipped.

Design

The NamedDocumentChunk class is a subclass of NamedChunk that handles named chunks defined with @d and the @[...@] delimiters. These are woven slightly differently, since they are document source, not programming language source.

We're not as interested in the cross reference of named document chunks. They can be used multiple times or never. They are often referenced by anonymous chunks. While this chunk subclass participates in this data gathering, it is ignored for reporting purposes.

All other methods are identical to NamedChunk, except tangle(), which raises an Error because document chunks are never tangled.

Implementation

NamedDocumentChunk class (72) =


class NamedDocumentChunk( NamedChunk ):
    """Named piece of input file with document source, defines an output tangle."""
    def makeContent( self, text, lineNumber=0 ):
        return TextCommand( text, lineNumber )
    →NamedDocumentChunk weave (73)
    →NamedDocumentChunk tangle (74)

◊ NamedDocumentChunk class (72). Used by Chunk class hierarchy - used to describe input chunks (51); Base Class Definitions (1); pyweb.py (150).

The weave() method quietly ignores this chunk in the document. A named document chunk is only included when it is referenced during weaving of another chunk (usually an anonymous document chunk).

The weaveReferenceTo() method inserts the content of this chunk into the output document. This is done in response to a ReferenceCommand in another chunk. The weaveShortReferenceTo() method calls the weaveReferenceTo() to insert the entire chunk.

NamedDocumentChunk weave (73) =


def weave( self, aWeb, aWeaver ):
    """Ignore this when producing the document."""
    pass
def weaveReferenceTo( self, aWeb, aWeaver ):
    """On a reference to this chunk, expand the body in place."""
    try:
        for t in self.commands:
            t.weave( aWeb, aWeaver )
    except Error as e:
        raise Error( *(e.args+(self,)) )
def weaveShortReferenceTo( self, aWeb, aWeaver ):
    """On a reference to this chunk, expand the body in place."""
    self.weaveReferenceTo( aWeb, aWeaver )

◊ NamedDocumentChunk weave (73). Used by NamedDocumentChunk class (72); Chunk class hierarchy - used to describe input chunks (51); Base Class Definitions (1); pyweb.py (150).
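
The effect of these three methods can be sketched without any pyWeb classes. The tuple encoding of chunks and the function name weave_document are assumptions for illustration; the point is only that a definition produces no output where it appears, while a reference expands the body in place.

```python
def weave_document(chunks, named_docs):
    """Weave a simplified chunk sequence into one string.

    chunks is a list of (kind, value) pairs, where kind is one of
    "text" (anonymous document text), "define" (a named document
    chunk definition) or "ref" (a reference to a named document chunk).
    """
    out = []
    for kind, value in chunks:
        if kind == "text":
            out.append(value)
        elif kind == "define":
            pass                           # the definition is silently skipped
        elif kind == "ref":
            out.append(named_docs[value])  # the body is expanded in place
    return "".join(out)

# Hypothetical chunk name and content.
named_docs = {"license": "Released under a BSD-style license."}
woven = weave_document(
    [("define", "license"), ("text", "Notes: "), ("ref", "license")],
    named_docs,
)
```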

NamedDocumentChunk tangle (74) =


def tangle( self, aWeb, aTangler ):
    """Raise an exception on an attempt to tangle."""
    raise Error( "Cannot tangle a chunk defined with @[." )

◊ NamedDocumentChunk tangle (74). Used by NamedDocumentChunk class (72); Chunk class hierarchy - used to describe input chunks (51); Base Class Definitions (1); pyweb.py (150).

Commands

The input stream is broken into individual commands, based on the various @x strings in the file. There are several subclasses of Command, each used to describe a different command or block of text in the input.

All instances of the Command class are created by a WebReader instance. In this case, a WebReader can be thought of as a factory for Command instances. Each Command instance is appended to the sequence of commands that belong to a Chunk. A chunk may be as small as a single command, or a long sequence of commands.

Each command instance responds to methods to examine the content, gather cross reference information and tangle a file or weave the final document.

Command class hierarchy - used to describe individual commands (75) =


→Command superclass (76)
→TextCommand class to contain a document text block (79)
→CodeCommand class to contain a program source code block (80)
→XrefCommand superclass for all cross-reference commands (81)
→FileXrefCommand class for an output file cross-reference (82)
→MacroXrefCommand class for a named chunk cross-reference (83)
→UserIdXrefCommand class for a user identifier cross-reference (84)
→ReferenceCommand class for chunk references (85)

◊ Command class hierarchy - used to describe individual commands (75). Used by Base Class Definitions (1); pyweb.py (150).

Command Superclass

Usage

A Command is created by the WebReader, and attached to a Chunk. The Command participates in cross reference creation, weaving and tangling.

The Command superclass is abstract, and has default methods factored out of the various subclasses. When a subclass is created, it will override some of the methods provided in this superclass.

class MyNewCommand( Command ):
    ... overrides for various methods ...

Additionally, a subclass of WebReader must be defined to parse the new command syntax. The main process() function must also be updated to use this new subclass of WebReader.

Design

The Command superclass provides the parent class definition for all of the various command types. The most common command is a block of text, which is woven or tangled. The next most common command is a reference to a chunk, which is woven as a mark-up reference, but tangled as an expansion of the source code.

The startswith() method examines any source text to see if it begins with the given prefix text.
The searchForRE() method examines any source text to see if it matches the given regular expression, usually a match for a user identifier.
The ref() method is ignored by all but the ReferenceCommand subclass, which returns the reference made by the command to the parent chunk.
The weave() method weaves this into the output. If a document text command, it is emitted directly; if a program source code command, markup is applied. In the case of cross-reference commands, the actual cross-reference content is emitted. In the case of reference commands, they are woven as a reference to a named chunk.
The tangle() method tangles this into the output. If this is a document text command, it is ignored; if this is a program source code command, it is indented and emitted. In the case of cross-reference commands, no output is produced. In the case of reference commands, the named chunk is indented and emitted.

The attributes of a Command instance include the line number on which the command began, in lineNumber.
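
As a rough illustration of the override pattern, the following sketch defines a trimmed stand-in for the Command superclass and one subclass. TimestampCommand and ListWriter are hypothetical names invented for this example; they are not part of pyWeb.

```python
class Command(object):
    """Trimmed stand-in for the Command superclass described above."""
    def __init__(self, fromLine=0):
        self.lineNumber = fromLine
        self.chunk = None
    def ref(self, aWeb):
        return None
    def weave(self, aWeb, aWeaver):
        pass
    def tangle(self, aWeb, aTangler):
        pass

class TimestampCommand(Command):
    """Hypothetical command: woven into the document, never tangled."""
    def __init__(self, stamp, fromLine=0):
        super(TimestampCommand, self).__init__(fromLine)
        self.stamp = stamp
    def weave(self, aWeb, aWeaver):
        aWeaver.write(self.stamp)
    # tangle() and ref() are inherited, so they do nothing.

class ListWriter(object):
    """Hypothetical stand-in for a Weaver or Tangler with a write() method."""
    def __init__(self):
        self.parts = []
    def write(self, text):
        self.parts.append(text)

command = TimestampCommand("2010-01-01", fromLine=12)
weaver, tangler = ListWriter(), ListWriter()
command.weave(None, weaver)
command.tangle(None, tangler)
```

Only weave() is overridden here; the inherited defaults cover every method the subclass does not care about.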

Implementation

Command superclass (76) =


class Command( object ):
    """A Command is the lowest level of granularity in the input stream."""
    def __init__( self, fromLine=0 ):
        self.lineNumber= fromLine
        self.chunk= None
    def __str__( self ):
        return "at %r" % self.lineNumber
    →Command analysis features: starts-with and Regular Expression search (77)
    →Command tangle and weave functions (78)

◊ Command superclass (76). Used by Command class hierarchy - used to describe individual commands (75); Base Class Definitions (1); pyweb.py (150).

Command analysis features: starts-with and Regular Expression search (77) =


def startswith( self, prefix ):
    return None
def searchForRE( self, rePat ):
    return None
def indent( self ):
    return None

◊ Command analysis features: starts-with and Regular Expression search (77). Used by Command superclass (76); Command class hierarchy - used to describe individual commands (75); Base Class Definitions (1); pyweb.py (150).

Command tangle and weave functions (78) =


def ref( self, aWeb ):
    return None
def weave( self, aWeb, aWeaver ):
    pass
def tangle( self, aWeb, aTangler ):
    pass

◊ Command tangle and weave functions (78). Used by Command superclass (76); Command class hierarchy - used to describe individual commands (75); Base Class Definitions (1); pyweb.py (150).

TextCommand class

Usage

A TextCommand is created by a Chunk or a NamedDocumentChunk when a WebReader calls the chunk's appendText() method. This Command participates in cross reference creation, weaving and tangling. When it is created, the source line number is provided so that this text can be tied back to the source document.

Design

An instance of the TextCommand class is a block of document text. It can originate in an anonymous block or a named chunk delimited with @[ and @].

This subclass provides a concrete implementation for all of the methods. Since text is the author's original markup language, it is emitted directly to the weaver or tangler.

Implementation

TextCommand class to contain a document text block (79) =


class TextCommand( Command ):
    """A piece of document source text."""
    def __init__( self, text, fromLine=0 ):
        super( TextCommand, self ).__init__( fromLine )
        self.text= text
    def __str__( self ):
        return "at %r: %r..." % (self.lineNumber,self.text[:32])
    def startswith( self, prefix ):
        return self.text.startswith( prefix )
    def searchForRE( self, rePat ):
        return rePat.search( self.text )
    def indent( self ):
        if self.text.endswith('\n'):
            return 0
        try:
            last_line = self.text.splitlines()[-1]
            return len(last_line)
        except IndexError:
            return 0
    def weave( self, aWeb, aWeaver ):
        aWeaver.write( self.text )
    def tangle( self, aWeb, aTangler ):
        aTangler.write( self.text )

◊ TextCommand class to contain a document text block (79). Used by Command class hierarchy - used to describe individual commands (75); Base Class Definitions (1); pyweb.py (150).
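
The indent() rule is easy to misread, so here is the same logic as a standalone function with a few inputs. The name trailing_indent is an assumption for this sketch. Note that the result is the full length of the unfinished last line, not only its leading whitespace.

```python
def trailing_indent(text):
    """Mirror of the indent() logic: width of the unfinished last line.

    Text that ends with a newline leaves the next command at column 0.
    Otherwise the next command starts after the last line's characters.
    """
    if text.endswith("\n"):
        return 0
    lines = text.splitlines()
    return len(lines[-1]) if lines else 0

after_newline = trailing_indent("x = 1\n")
after_spaces = trailing_indent("def f():\n    ")
after_partial = trailing_indent("a\n  b = ")
after_empty = trailing_indent("")
```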

CodeCommand class

Usage

A CodeCommand is created by a NamedChunk when a WebReader calls the appendText() method. The Command participates in cross reference creation, weaving and tangling. When it is created, the source line number is provided so that this text can be tied back to the source document.

Design

An instance of the CodeCommand class is a block of program source code text. It can originate in a named chunk (@d) with a @{ and @} delimiter. Or it can be a file output chunk (@o).

It uses the codeBlock() methods of a Weaver or Tangler. The weaver will insert appropriate markup for this code. The tangler will assure that the prevailing indentation is maintained.

Implementation

CodeCommand class to contain a program source code block (80) =


class CodeCommand( TextCommand ):
    """A piece of program source code."""
    def weave( self, aWeb, aWeaver ):
        aWeaver.codeBlock( aWeaver.quote( self.text ) )
    def tangle( self, aWeb, aTangler ):
        aTangler.codeBlock( self.text )

◊ CodeCommand class to contain a program source code block (80). Used by Command class hierarchy - used to describe individual commands (75); Base Class Definitions (1); pyweb.py (150).

XrefCommand superclass

Usage

An XrefCommand is created by the WebReader when any of the @f, @m, @u commands are found in the input stream. The Command is then appended to the current Chunk being built by the WebReader.

Design

The XrefCommand superclass defines any common features of the various cross-reference commands (@f, @m, @u).

The formatXref() method creates the body of a cross-reference by the following algorithm:

Use the Weaver class xrefHead() method to emit the cross-reference header.
Sort the keys in the cross-reference mapping.
Use the Weaver class xrefLine() method to emit each line of the cross-reference mapping.
Use the Weaver class xrefFoot() method to emit the cross-reference footer.

If this command winds up in a tangle action, that use is illegal. An exception is raised and processing stops.

Implementation

XrefCommand superclass for all cross-reference commands (81) =


class XrefCommand( Command ):
    """Any of the Xref-goes-here commands in the input."""
    def __str__( self ):
        return "at %r: cross reference" % (self.lineNumber)
    def formatXref( self, xref, aWeaver ):
        aWeaver.xrefHead()
        for n in sorted(xref):
            aWeaver.xrefLine( n, xref[n] )
        aWeaver.xrefFoot()
    def tangle( self, aWeb, aTangler ):
        raise Error('Illegal tangling of a cross reference command.')

◊ XrefCommand superclass for all cross-reference commands (81). Used by Command class hierarchy - used to describe individual commands (75); Base Class Definitions (1); pyweb.py (150).
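
The formatXref() algorithm can be tried in isolation with a stand-in weaver. ListWeaver is hypothetical, and the shape of the mapping (names mapped to lists of sequence numbers) is an assumption made for this sketch; the real Weaver subclasses emit markup rather than plain strings.

```python
class ListWeaver(object):
    """Hypothetical stand-in for a Weaver; it collects plain text lines."""
    def __init__(self):
        self.lines = []
    def xrefHead(self):
        self.lines.append("HEAD")
    def xrefLine(self, name, refList):
        numbers = ", ".join(str(n) for n in refList)
        self.lines.append("%s: %s" % (name, numbers))
    def xrefFoot(self):
        self.lines.append("FOOT")

def formatXref(xref, aWeaver):
    """Header, one line per key in sorted order, then footer."""
    aWeaver.xrefHead()
    for name in sorted(xref):
        aWeaver.xrefLine(name, xref[name])
    aWeaver.xrefFoot()

weaver = ListWeaver()
# Hypothetical file names and sequence numbers.
formatXref({"pyweb.py": [150], "README": [151, 152]}, weaver)
```

Sorting the keys gives a stable order regardless of how the mapping was built; with the default string ordering, uppercase names sort before lowercase ones.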

FileXrefCommand class

Usage

A FileXrefCommand is created by the WebReader when the @f command is found in the input stream. The Command is then appended to the current Chunk being built by the WebReader.

Design

The FileXrefCommand class weave method gets the file cross reference from the overall web instance, and uses the formatXref() method of the XrefCommand superclass to format this result.

Implementation

FileXrefCommand class for an output file cross-reference (82) =


class FileXrefCommand( XrefCommand ):
    """A FileXref command."""
    def weave( self, aWeb, aWeaver ):
        """Weave a File Xref from @o commands."""
        self.formatXref( aWeb.fileXref(), aWeaver )

◊ FileXrefCommand class for an output file cross-reference (82). Used by Command class hierarchy - used to describe individual commands (75); Base Class Definitions (1); pyweb.py (150).

MacroXrefCommand class

Usage

A MacroXrefCommand is created by the WebReader when the @m command is found in the input stream. The Command is then appended to the current Chunk being built by the WebReader.

Design

The MacroXrefCommand class weave method gets the named chunk (macro) cross reference from the overall web instance, and uses the formatXref() method of the XrefCommand superclass to format this result.

Implementation

MacroXrefCommand class for a named chunk cross-reference (83) =


class MacroXrefCommand( XrefCommand ):
    """A MacroXref command."""
    def weave( self, aWeb, aWeaver ):
        """Weave the Macro Xref from @d commands."""
        self.formatXref( aWeb.chunkXref(), aWeaver )

◊ MacroXrefCommand class for a named chunk cross-reference (83). Used by Command class hierarchy - used to describe individual commands (75); Base Class Definitions (1); pyweb.py (150).

UserIdXrefCommand class

Usage

A UserIdXrefCommand is created by the WebReader when the @u command is found in the input stream. The Command is then appended to the current Chunk being built by the WebReader.

Design

The UserIdXrefCommand class weave method gets the user identifier cross reference information from the overall web instance. It then formats this line using the following algorithm, which is similar to the algorithm in the XrefCommand superclass.

Use the Weaver class xrefHead() method to emit the cross-reference header.
Sort the keys in the cross-reference mapping.
Use the Weaver class xrefDefLine() method to emit each line of the cross-reference definition mapping.
Use the Weaver class xrefFoot() method to emit the cross-reference footer.

Implementation

UserIdXrefCommand class for a user identifier cross-reference (84) =


class UserIdXrefCommand( XrefCommand ):
    """A UserIdXref command."""
    def weave( self, aWeb, aWeaver ):
        """Weave a user identifier Xref from @d commands."""
        ux= aWeb.userNamesXref()
        aWeaver.xrefHead()
        for u in sorted(ux):
            defn, refList= ux[u]
            aWeaver.xrefDefLine( u, defn, refList )
        aWeaver.xrefFoot()

◊ UserIdXrefCommand class for a user identifier cross-reference (84). Used by Command class hierarchy - used to describe individual commands (75); Base Class Definitions (1); pyweb.py (150).

ReferenceCommand class

Usage

A ReferenceCommand instance is created by a WebReader when a @<name@> construct is found in the input stream. This is attached to the current Chunk being built by the WebReader.

Design

During a weave, this creates a markup reference to another NamedChunk. During tangle, this actually includes the NamedChunk at this point in the tangled output file.

The constructor creates several attributes of an instance of a ReferenceCommand.

refTo, the name of the chunk to which this refers, possibly elided with a trailing '...'.
fullName, the full name of the chunk to which this refers.
chunkList, the list of the chunks to which the name refers.

Implementation

ReferenceCommand class for chunk references (85) =


class ReferenceCommand( Command ):
    """A reference to a named chunk, via @<name@>."""
    def __init__( self, refTo, fromLine=0 ):
        Command.__init__( self, fromLine )
        self.refTo= refTo
        self.fullName= None
        self.sequenceList= None
        self.chunkList= []
    def __str__( self ):
        return "at %r: reference to chunk %r" % (self.lineNumber,self.refTo)
    →ReferenceCommand resolve a referenced chunk name (86)
    →ReferenceCommand refers to a chunk (87)
    →ReferenceCommand weave a reference to a chunk (88)
    →ReferenceCommand tangle a referenced chunk (89)

◊ ReferenceCommand class for chunk references (85). Used by Command class hierarchy - used to describe individual commands (75); Base Class Definitions (1); pyweb.py (150).

The resolve() method queries the overall Web instance for the full name and sequence number for this chunk reference. This is used by the Weaver class referenceTo() method to write the markup reference to the chunk.

ReferenceCommand resolve a referenced chunk name (86) =


def resolve( self, aWeb ):
    """Expand the referenced chunk name into a full name and list of parts"""
    self.fullName= aWeb.fullNameFor( self.refTo )
    self.chunkList= [ c.seq for c in aWeb.getchunk( self.refTo ) ]

◊ ReferenceCommand resolve a referenced chunk name (86). Used by ReferenceCommand class for chunk references (85); Command class hierarchy - used to describe individual commands (75); Base Class Definitions (1); pyweb.py (150).
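
The Web methods used by resolve() are defined elsewhere. As a rough sketch of what expanding a name elided with a trailing '...' could look like, the function below does a unique-prefix match over known chunk names. The function name full_name_for, the prefix rule and the KeyError are assumptions for illustration, not the Web class implementation.

```python
def full_name_for(name, known_names):
    """Expand a name ending in '...' to the single full name it prefixes.

    Names without the trailing '...' are returned unchanged.
    """
    if not name.endswith("..."):
        return name
    prefix = name[:-3]
    matches = [n for n in known_names if n.startswith(prefix)]
    if len(matches) != 1:
        raise KeyError("%r matches %d chunk names" % (name, len(matches)))
    return matches[0]

names = ["NamedChunk weave", "NamedChunk class", "OutputChunk weave"]
resolved = full_name_for("NamedChunk we...", names)
unchanged = full_name_for("OutputChunk weave", names)
```

An abbreviation that matches several names, such as "NamedChunk ..." here, is rejected rather than guessed at.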

The ref() method is a request that is delegated by a Chunk; it resolves the reference this Command makes within the containing Chunk. When the Chunk iterates through the Commands, it can accumulate a list of Chunks to which it refers.

ReferenceCommand refers to a chunk (87) =


def ref( self, aWeb ):
    """Find and return the full name for this reference."""
    self.resolve( aWeb )
    return self.fullName

◊ ReferenceCommand refers to a chunk (87). Used by ReferenceCommand class for chunk references (85); Command class hierarchy - used to describe individual commands (75); Base Class Definitions (1); pyweb.py (150).

The weave() method inserts a markup reference to a named chunk. It uses the Weaver class referenceTo() method to format this appropriately for the document type being woven.

ReferenceCommand weave a reference to a chunk (88) =


def weave( self, aWeb, aWeaver ):
    """Create the nicely formatted reference to a chunk of code."""
    self.resolve( aWeb )
    aWeb.weaveChunk( self.fullName, aWeaver )

◊ ReferenceCommand weave a reference to a chunk (88). Used by ReferenceCommand class for chunk references (85); Command class hierarchy - used to describe individual commands (75); Base Class Definitions (1); pyweb.py (150).

The tangle() method inserts the resolved chunk in this place. When a chunk is tangled, it sets the indent, inserts the chunk and resets the indent.

ReferenceCommand tangle a referenced chunk (89) =


def tangle( self, aWeb, aTangler ):
    """Create source code."""
    self.resolve( aWeb )
    # Update indent based on last line of previous command. 
    if self.chunk is None or self.chunk.previous_command is None:
        logger.error( "Command disconnected from Chunk." )
        raise Error( "Serious problem in WebReader." )
    logger.debug( "Indent %s + %r", aTangler.context, self.chunk.previous_command.indent() )
    aTangler.setIndent( command=self.chunk.previous_command )
    aWeb.tangleChunk( self.fullName, aTangler )
    aTangler.clrIndent()

◊ ReferenceCommand tangle a referenced chunk (89). Used by ReferenceCommand class for chunk references (85); Command class hierarchy - used to describe individual commands (75); Base Class Definitions (1); pyweb.py (150).
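
The effect of setting the indent from the previous command can be shown with a simplified, string-based expansion. The names expand and trailing_indent and the list-of-parts encoding are assumptions for this sketch; the real Tangler keeps an indent context instead of re-indenting finished text, and only the most recent text part is consulted here.

```python
def trailing_indent(text):
    """Width of the unfinished last line, as in the indent() method."""
    if text.endswith("\n"):
        return 0
    lines = text.splitlines()
    return len(lines[-1]) if lines else 0

def expand(name, chunks):
    """Return the text of chunk `name` with references expanded in place.

    Each chunk is a list of parts: a str is code, a ("ref", other_name)
    tuple is a reference to another chunk.
    """
    out = []
    previous = ""                    # most recent code part seen
    for part in chunks[name]:
        if isinstance(part, tuple):
            pad = " " * trailing_indent(previous)
            lines = expand(part[1], chunks).splitlines(True)
            # The first line continues the current line; the rest are padded.
            out.append("".join(lines[:1] + [pad + line for line in lines[1:]]))
        else:
            out.append(part)
            previous = part
    return "".join(out)

# Hypothetical chunk names and content.
chunks = {
    "main": ["def main():\n    ", ("ref", "body"), "\n"],
    "body": ["print('a')\nprint('b')"],
}
tangled = expand("main", chunks)
```

Because the text before the reference ends with four spaces, every line of the referenced chunk lands at column four in the result.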

Error class

Usage

An Error is raised whenever processing cannot continue. Since it is a subclass of Exception, it takes an arbitrary number of arguments. The first should be the basic message text. Subsequent arguments provide additional details. We will try to be sure that all of our internal exceptions reference a specific chunk, if possible. This means either including the chunk as an argument, or catching the exception and appending the current chunk to the exception's arguments.

The Python raise statement takes an instance of Error and passes it to the enclosing try/except statement for processing.

The typical creation is as follows:

raise Error("No full name for %r" % chunk.name, chunk)

A typical exception-handling suite might look like this:

try:
    ...something that may raise an Error or Exception...
except Error as e:
    print( e.args ) # this is a pyWeb internal Error
except Exception as w:
    print( w.args ) # this is some other Python Exception
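
The practice of appending the current chunk to an exception's arguments can be sketched as follows. The function weave_commands, the callable commands and the chunk name are hypothetical; the Error class is a stand-in with the same shape as the one defined in this section.

```python
class Error(Exception):
    """Stand-in for the pyWeb Error class."""

def weave_commands(chunk_name, commands):
    """Run each command; on Error, re-raise with the chunk name appended."""
    for command in commands:
        try:
            command()
        except Error as e:
            # Keep the original arguments and add the current chunk as context.
            raise Error(*(e.args + (chunk_name,)))

def failing_command():
    raise Error("No full name for 'abbrev...'")

try:
    weave_commands("hypothetical chunk", [failing_command])
except Error as e:
    context = e.args
```

The handler at the outermost level then sees both the original message and the chunk in which the failure occurred.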

Design

The Error class is a subclass of Exception used to differentiate application-specific exceptions from other Python exceptions. It does no additional processing, but merely creates a distinct class to facilitate writing except statements.

Implementation

Error class - defines the errors raised (90) =


class Error( Exception ): pass

◊ Error class - defines the errors raised (90). Used by Base Class Definitions (1); pyweb.py (150).

The Reference Strategy Hierarchy

The Reference Strategy has two implementations. An instance of this is injected into each Chunk by the Web. The transitive closure of references requires walking through the web. By injecting this algorithm, we ensure that (1) each Chunk can produce all relevant information and (2) a simple configuration change can be applied to the document.

Reference Superclass

The superclass is an abstract class that defines the interface for this object.

Reference class hierarchy - references to a chunk (91) =


class Reference( object ):
    def __init__( self, aWeb ):
        self.web = aWeb
    def chunkReferencedBy( self, aChunk ):
        """Return a list of Chunks."""
        pass

◊ Reference class hierarchy - references to a chunk (91). Used by Base Class Definitions (1); pyweb.py (150).

SimpleReference Class

The SimpleReference subclass does the simplest version of resolution.

Reference class hierarchy - references to a chunk (92) +=


class SimpleReference( Reference ):
    def __init__( self, aWeb ):
        self.web = aWeb
    def chunkReferencedBy( self, aChunk ):
        """:todo: Return the chunks themselves."""
        refBy= aChunk.referencedBy
        return [ (c.fullName, c.seq) for c in refBy ]

◊ Reference class hierarchy - references to a chunk (92). Used by Base Class Definitions (1); pyweb.py (150).

TransitiveReference Class

The TransitiveReference subclass does a transitive closure of all references to this Chunk.

Reference class hierarchy - references to a chunk (93) +=


class TransitiveReference( Reference ):
    def __init__( self, aWeb ):
        self.web = aWeb
    def chunkReferencedBy( self, aChunk ):
        """:todo: Return the chunks themselves."""
        refBy= aChunk.referencedBy
        logger.debug( "References: %r(%d) %r", aChunk.name, aChunk.seq, refBy )
        closure= self.allParentsOf( refBy )
        return [ (c.fullName, c.seq) for c in closure ]
    def allParentsOf( self, chunkList, depth=0 ):
        """Transitive closure of parents.
        :todo: Return the chunks themselves.
        """
        final = []
        for c in chunkList:
            final.append( c )
            final.extend( self.allParentsOf( c.referencedBy, depth+1 ) )
        logger.debug( "References: %*s %r", 2*depth, '--', final )
        return final

◊ Reference class hierarchy - references to a chunk (93). Used by Base Class Definitions (1); pyweb.py (150).
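To make the difference between the two strategies concrete, here is a minimal sketch. FakeChunk is a hypothetical stand-in that carries only the fullName, seq and referencedBy attributes used above; the chunk names and sequence numbers are invented for illustration, and the two functions mirror the chunkReferencedBy() methods without the web or the logging.

```python
class FakeChunk:
    """Hypothetical stand-in for a Chunk: only the attributes used here."""
    def __init__(self, fullName, seq):
        self.fullName = fullName
        self.seq = seq
        self.referencedBy = []

def simple_refs(chunk):
    """Direct parents only, as in SimpleReference."""
    return [(c.fullName, c.seq) for c in chunk.referencedBy]

def all_parents_of(chunk_list):
    """Each parent followed by that parent's own parents, recursively."""
    final = []
    for c in chunk_list:
        final.append(c)
        final.extend(all_parents_of(c.referencedBy))
    return final

def transitive_refs(chunk):
    """Parents, grandparents and so on, as in TransitiveReference."""
    return [(c.fullName, c.seq) for c in all_parents_of(chunk.referencedBy)]

# An invented three-level web: top references middle, middle references leaf.
top = FakeChunk("pyweb.py", 1)
middle = FakeChunk("Base Class Definitions", 2)
leaf = FakeChunk("Error class", 3)
middle.referencedBy.append(top)
leaf.referencedBy.append(middle)

print(simple_refs(leaf))
print(transitive_refs(leaf))
```

The simple strategy reports only the immediate parent of the leaf chunk, while the transitive strategy also reports the parent's parent. This is the shape of the "Used by" lines that follow each chunk in this document.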

The Web Class

The overall web of chunks is carried in a single instance of the Web class that drives the weaving and tangling actions. Fundamentally, a Web is a hybrid list-dictionary. It's a list of chunks that also offers a moderately sophisticated lookup, including exact match for a chunk name and an approximate match for a chunk name. It's a dictionary that also retains anonymous chunks in order. Additionally, there are some methods for resolving references among chunks that could be refactored into the WebReader. Broadly, the functionality of a Web can be separated into several areas:

construction methods used by Chunks and WebReader
Chunk name resolution methods
enrichment of the web, once all the Chunks are known; each Chunk is updated with Chunk references it makes as well as Chunks which reference it.
Chunk cross reference methods
miscellaneous access
tangle
weave

A web instance has a number of attributes.

webFileName, the name of the original .w file.
chunkSeq, the sequence of Chunk instances as seen in the input file. To support anonymous chunks, and to assure that the original input document order is preserved, we keep all chunks in a master sequential list.
output, the @o named OutputChunk chunks. Each element of this dictionary is a sequence of chunks that have the same name. The first is the initial definition (marked with "="), all others are secondary definitions (marked with "+=").
named, the @d named NamedChunk chunks. Each element of this dictionary is a sequence of chunks that have the same name. The first is the initial definition (marked with "="), all others are secondary definitions (marked with "+=").
usedBy, the cross reference of chunks referenced by commands in other chunks.
sequence, used to assign a unique sequence number to each named chunk.

Web class - describes the overall "web" of chunks (94) =


class Web( object ):
    """The overall Web of chunks."""
    def __init__( self, name ):
        self.webFileName= name
        self.chunkSeq= [] 
        self.output= {} # Map filename to a list of OutputChunks
        self.named= {} # Map chunkname to a list of NamedChunks
        self.sequence= 0
        self.reference_style = TransitiveReference(self)
    def __str__( self ):
        return "Web %r" % ( self.webFileName, )
    →Web construction methods used by Chunks and WebReader (95)
    →Web Chunk name resolution methods (100)(101)
    →Web Chunk cross reference methods (102)(104)(105)(106)
    →Web determination of the language from the first chunk (109)
    →Web tangle the output files (110)
    →Web weave the output document (111)

◊ Web class - describes the overall "web" of chunks (94). Used by Base Class Definitions (1); pyweb.py (150).

During web construction, it is convenient to capture information about the individual Chunk instances being appended to the web. This is done using a Callback design pattern. Each subclass of Chunk provides an override for the Chunk class webAdd() method. This override calls one of the appropriate web construction methods.

Also note that the full name for a chunk can be given either as part of the definition, or as part of a reference. Typically, the first reference has the full name and the definition has the elided name. This allows a reference to a chunk to contain a more complete description of the chunk.

Web construction methods used by Chunks and WebReader (95) =


→Web add full chunk names, ignoring abbreviated names (96)
→Web add an anonymous chunk (97)
→Web add a named macro chunk (98)
→Web add an output file definition chunk (99)

◊ Web construction methods used by Chunks and WebReader (95). Used by Web class - describes the overall "web" of chunks (94); Base Class Definitions (1); pyweb.py (150).

A name is only added to the known names when it is a full name, not an abbreviation ending with "...". Abbreviated names are quietly skipped until the full name is seen.

The algorithm for the addDefName() method, then, is as follows:

Use the fullNameFor() method to locate the full name.
If no full name was found (the result of fullNameFor() ends with '...'), ignore this name as an abbreviation with no definition.
If this is a full name and the name was not in the named mapping, add this full name to the mapping.

This name resolution approach presents a problem when a chunk is defined before it is referenced and the first definition uses an abbreviated name. This is an atypical construction of an input document, however, since the intent is to provide high-level summaries that have forward references to supporting details.

Web add full chunk names, ignoring abbreviated names (96) =


def addDefName( self, name ):
    """Reference to or definition of a chunk name."""
    nm= self.fullNameFor( name )
    if nm is None: return None
    if nm[-3:] == '...':
        logger.debug( "Abbreviated reference %r", name )
        return None # first occurrence is a forward reference using an abbreviation
    if nm not in self.named:
        self.named[nm]= []
        logger.debug( "Adding empty chunk %r", name )
    return nm

◊ Web add full chunk names, ignoring abbreviated names (96). Used by Web construction methods used by Chunks and WebReader (95); Web class - describes the overall "web" of chunks (94); Base Class Definitions (1); pyweb.py (150).

An anonymous Chunk is kept in a sequence of chunks, used for tangling.

Web add an anonymous chunk (97) =


def add( self, chunk ):
    """Add an anonymous chunk."""
    self.chunkSeq.append( chunk )

◊ Web add an anonymous chunk (97). Used by Web construction methods used by Chunks and WebReader (95); Web class - describes the overall "web" of chunks (94); Base Class Definitions (1); pyweb.py (150).

A named Chunk is defined with a @d command. It is collected into a mapping of NamedChunk instances. An entry in the mapping is a sequence of chunks that have the same name. This sequence of chunks is used to produce the weave or tangle output.

All chunks are also placed in the overall sequence of chunks. This overall sequence is used for weaving the document.

The addDefName() method is used to resolve this name if it is an abbreviation, or add it to the mapping if this is the first occurrence of the name. If the name cannot be added, an instance of our Error class is raised. If the name exists or was added, the chunk is appended to the chunk list associated with this name.

The web's sequence counter is incremented, and this unique sequence number sets the seq attribute of the Chunk. The initial flag is set to True when the chunk is the only element in the list for its name, that is, when it is the first definition; otherwise it is False.

Web add a named macro chunk (98) =


def addNamed( self, chunk ):
    """Add a named chunk to a sequence with a given name."""
    chunk.reference_style= self.reference_style
    self.chunkSeq.append( chunk )
    nm= self.addDefName( chunk.name )
    if nm:
        # We found the full name for this chunk
        self.sequence += 1
        chunk.seq= self.sequence
        chunk.fullName= nm
        self.named[nm].append( chunk )
        chunk.initial= len(self.named[nm]) == 1
        logger.debug( "Extending chunk %r from %r", nm, chunk.name )
    else:
        raise Error("No full name for %r" % chunk.name, chunk)

◊ Web add a named macro chunk (98). Used by Web construction methods used by Chunks and WebReader (95); Web class - describes the overall "web" of chunks (94); Base Class Definitions (1); pyweb.py (150).
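The interplay of addDefName() and addNamed() can be sketched with a reduced model. MiniWeb and FakeChunk below are hypothetical stand-ins, not the real classes: they keep only the named mapping and the sequence counter, and the ambiguity check of fullNameFor() is left out for brevity. The chunk names are invented.

```python
class Error(Exception):
    pass

class FakeChunk:
    """Hypothetical stand-in carrying only the attributes set by the web."""
    def __init__(self, name):
        self.name = name
        self.fullName = None
        self.seq = None
        self.initial = None

class MiniWeb:
    """Simplified model of the named-chunk bookkeeping; not the real Web."""
    def __init__(self):
        self.named = {}
        self.sequence = 0
    def fullNameFor(self, name):
        if name in self.named:
            return name
        if name.endswith("..."):
            best = [n for n in self.named if n.startswith(name[:-3])]
            if len(best) == 1:
                return best[0]
        return name
    def addDefName(self, name):
        nm = self.fullNameFor(name)
        if nm.endswith("..."):
            return None  # abbreviation with no known full name yet
        self.named.setdefault(nm, [])
        return nm
    def addNamed(self, chunk):
        nm = self.addDefName(chunk.name)
        if nm is None:
            raise Error("No full name for %r" % chunk.name, chunk)
        self.sequence += 1
        chunk.seq = self.sequence
        chunk.fullName = nm
        self.named[nm].append(chunk)
        chunk.initial = len(self.named[nm]) == 1

web = MiniWeb()
web.addDefName("Imports for the tool")      # a reference supplies the full name
first = FakeChunk("Imports...")             # the definition may then abbreviate
second = FakeChunk("Imports for the tool")  # a later "+=" definition
web.addNamed(first)
web.addNamed(second)
print(first.fullName, first.seq, first.initial)
print(second.fullName, second.seq, second.initial)
```

In this model the abbreviated definition picks up the full name registered by the earlier reference, and only the first chunk in the list has initial set to True. Adding a chunk whose abbreviation matches nothing raises Error, which is the problem case described above.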

An output file definition Chunk is defined with an @o command. It is collected into a mapping of OutputChunk instances. An entry in the mapping is a sequence of chunks that have the same name. This sequence of chunks is used to produce the weave or tangle output.

Note that file names cannot be abbreviated.

All chunks are also placed in the overall sequence of chunks. This overall sequence is used for weaving the document.

If the name does not exist in the output mapping, the name is added with an empty sequence of chunks. In all cases, the chunk is appended to the chunk list associated with this name.

The web's sequence counter is incremented, and this unique sequence number sets the Chunk's seq attribute. The initial flag is True if this is the first chunk in the list for this file name; otherwise it is False.

Web add an output file definition chunk (99) =


def addOutput( self, chunk ):
    """Add an output chunk to a sequence with a given name."""
    chunk.reference_style= self.reference_style
    self.chunkSeq.append( chunk )
    if chunk.name not in self.output:
        self.output[chunk.name] = []
        logger.debug( "Adding chunk %r", chunk.name )
    self.sequence += 1
    chunk.seq= self.sequence
    chunk.fullName= chunk.name
    self.output[chunk.name].append( chunk )
    chunk.initial = len(self.output[chunk.name]) == 1

◊ Web add an output file definition chunk (99). Used by Web construction methods used by Chunks and WebReader (95); Web class - describes the overall "web" of chunks (94); Base Class Definitions (1); pyweb.py (150).

Web chunk name resolution has three aspects. The first is resolving elided names (those ending with ...) to their actual full names. The second is finding the named chunk in the web structure. The third is returning a reference to a specific chunk including the name and sequence number.

Note that a chunk name actually refers to a sequence of chunks. Multiple definitions for a chunk are allowed, and all of the definitions are concatenated to create the complete chunk. This complexity makes it unwise to return the sequence of same-named chunks; therefore, we put the burden on the Web to process all chunks with a given name, in sequence.

The fullNameFor() method resolves the full name for a chunk as follows:

If the string is already in the named mapping, this is the full name.
If the string ends in '...', visit each key in the dictionary to see if the key starts with the string up to the trailing '...'. If a match is found, the dictionary key is the full name.
Otherwise, treat this as a full name.

Web Chunk name resolution methods (100) =


def fullNameFor( self, name ):
    """Resolve "..." names into the full name."""
    if name in self.named: return name
    if name[-3:] == '...':
        best= [ n for n in self.named.keys()
            if n.startswith( name[:-3] ) ]
        if len(best) > 1:
            raise Error("Ambiguous abbreviation %r, matches %r" % ( name, best ) )
        elif len(best) == 1: 
            return best[0]
    return name

◊ Web Chunk name resolution methods (100). Used by Web class - describes the overall "web" of chunks (94); Base Class Definitions (1); pyweb.py (150).
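The three resolution rules can be tried out in isolation. The sketch below restates the method as a free function over a plain dictionary; the function name full_name_for, the local Error class and the sample chunk names are assumptions made for the example.

```python
class Error(Exception):
    pass

def full_name_for(named, name):
    """Resolve a name ending in '...' against the keys of named."""
    if name in named:
        return name
    if name.endswith("..."):
        best = [n for n in named if n.startswith(name[:-3])]
        if len(best) > 1:
            raise Error("Ambiguous abbreviation %r, matches %r" % (name, best))
        elif len(best) == 1:
            return best[0]
    return name  # no match: hand the name back unchanged

named = {
    "Web tangle the output files": [],
    "Web weave the output document": [],
}

print(full_name_for(named, "Web tangle..."))  # unique prefix match
print(full_name_for(named, "Unknown..."))     # no match, still abbreviated
try:
    full_name_for(named, "Web...")            # matches both keys
except Error as e:
    print(e.args[0])
```

A unique prefix resolves to the full key, an unmatched abbreviation comes back still ending in '...' (which is what addDefName() tests for), and a prefix shared by two keys raises Error.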

The getchunk() method locates a named sequence of chunks by first determining the full name for the identifying string. If the full name is in the named mapping, the sequence of chunks is returned. Otherwise, an instance of our Error class is raised because the name is unresolvable.

It might be more helpful for debugging to emit this as an error in the weave and tangle results and keep processing. This would allow an author to catch multiple errors in a single run of pyWeb.

Web Chunk name resolution methods (101) +=


def getchunk( self, name ):
    """Locate a named sequence of chunks."""
    nm= self.fullNameFor( name )
    if nm in self.named:
        return self.named[nm]
    raise Error( "Cannot resolve %r in %r" % (name,self.named.keys()) )

◊ Web Chunk name resolution methods (101). Used by Web class - describes the overall "web" of chunks (94); Base Class Definitions (1); pyweb.py (150).

Cross-reference support includes creating and reporting on the various cross-references available in a web. This includes creating the list of chunks that reference a given chunk; and returning the file, macro and user identifier cross references.

Each Chunk has a list of Reference commands that shows the chunks to which a chunk refers. These relationships must be reversed to show the chunks that refer to a given chunk. This is done by traversing the entire web of named chunks and recording each chunk-to-chunk reference. This mapping has the referred-to chunk as the key, and a sequence of referring chunks as the value.

The accumulation is initiated by the web's createUsedBy() method. This method visits a Chunk, calling the genReferences() method, passing in the Web instance as an argument. Each Chunk class genReferences() method, in turn, invokes the usedBy() method of each Command instance in the chunk. Most commands do nothing, but a ReferenceCommand will resolve the name to which it refers.

When the createUsedBy() method has accumulated the entire cross reference, it also assures that all chunks are used exactly once.

Web Chunk cross reference methods (102) =


def createUsedBy( self ):
    """Update every piece of a Chunk to show how the chunk is referenced.
    Each piece can then report where it's used in the web.
    """
    for aChunk in self.chunkSeq:
        #usage = (self.fullNameFor(aChunk.name), aChunk.seq)
        for aRefName in aChunk.genReferences( self ):
            for c in self.getchunk( aRefName ):
                c.referencedBy.append( aChunk )
                c.refCount += 1
    →Web Chunk check reference counts are all one (103)

◊ Web Chunk cross reference methods (102). Used by Web class - describes the overall "web" of chunks (94); Base Class Definitions (1); pyweb.py (150).
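The reversal of references can be sketched with plain dictionaries. This is only an illustration: the refers_to mapping and its chunk names are invented, and the real createUsedBy() works on Chunk instances and their referencedBy lists rather than on names.

```python
# Forward direction: each chunk name maps to the names it refers to.
refers_to = {
    "pyweb.py": ["Imports", "Base Class Definitions"],
    "Base Class Definitions": ["Error class"],
    "Imports": [],
    "Error class": [],
}

def invert_references(refers_to):
    """Build the reverse mapping: referred-to name -> list of referrers."""
    referenced_by = {name: [] for name in refers_to}
    for referrer, targets in refers_to.items():
        for target in targets:
            referenced_by.setdefault(target, []).append(referrer)
    return referenced_by

referenced_by = invert_references(refers_to)
for name, referrers in referenced_by.items():
    print(name, "<-", referrers)
```

The length of each list in the reverse mapping plays the role of refCount: an empty list means a chunk nobody references, and a list with more than one entry means multiple references.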

We verify that the reference count for a chunk is exactly one. We don't gracefully tolerate multiple references to a chunk or unreferenced chunks.

Web Chunk check reference counts are all one (103) =


for nm in self.no_reference():
    logger.warning( "No reference to %r", nm )
for nm in self.multi_reference():
    logger.warning( "Multiple references to %r", nm )
for nm in self.no_definition():
    logger.warning( "No definition for %r", nm )

◊ Web Chunk check reference counts are all one (103). Used by Web Chunk cross reference methods (102); Web class - describes the overall "web" of chunks (94); Base Class Definitions (1); pyweb.py (150).

A one-pass version of the same check looks like this:

for nm,cl in self.named.items():
    if len(cl) > 0:
        if cl[0].refCount == 0:
            logger.warning( "No reference to %r", nm )
        elif cl[0].refCount > 1:
            logger.warning( "Multiple references to %r", nm )
    else:
        logger.warning( "No definition for %r", nm )

We use three methods to filter chunk names into the various warning categories. The no_reference list is a list of chunks defined but never referenced. The multi_reference list is a list of chunks defined and referenced more than once. The no_definition list is a list of chunks referenced but not defined.

Web Chunk cross reference methods (104) +=


def no_reference( self ):
    return [ nm for nm,cl in self.named.items() if len(cl)>0 and cl[0].refCount == 0 ]
def multi_reference( self ):
    return [ nm for nm,cl in self.named.items() if len(cl)>0 and cl[0].refCount > 1 ]
def no_definition( self ):
    return [ nm for nm,cl in self.named.items() if len(cl) == 0 ]

◊ Web Chunk cross reference methods (104). Used by Web class - describes the overall "web" of chunks (94); Base Class Definitions (1); pyweb.py (150).
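As a small illustration of the three filters, the sketch below applies the same list comprehensions to a hand-built mapping. The FakeChunk tuple and the chunk names are hypothetical; only the refCount attribute of the first chunk in each list matters here.

```python
from collections import namedtuple

# Hypothetical stand-in: the filters only look at refCount.
FakeChunk = namedtuple("FakeChunk", ["refCount"])

named = {
    "used once": [FakeChunk(refCount=1)],
    "never used": [FakeChunk(refCount=0)],
    "used twice": [FakeChunk(refCount=2)],
    "only referenced": [],  # name seen in a reference, never defined
}

def no_reference(named):
    return [nm for nm, cl in named.items() if len(cl) > 0 and cl[0].refCount == 0]

def multi_reference(named):
    return [nm for nm, cl in named.items() if len(cl) > 0 and cl[0].refCount > 1]

def no_definition(named):
    return [nm for nm, cl in named.items() if len(cl) == 0]

print(no_reference(named))
print(multi_reference(named))
print(no_definition(named))
```

Each name lands in at most one category, and a chunk referenced exactly once appears in none of them.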

The fileXref() method visits all named file output chunks in output and collects the sequence numbers of each section in the sequence of chunks.

The chunkXref() method uses the same algorithm as the fileXref() method, but applies it to the named mapping.

Web Chunk cross reference methods (105) +=


def fileXref( self ):
    fx= {}
    for f,cList in self.output.items():
        fx[f]= [ c.seq for c in cList ]
    return fx
def chunkXref( self ):
    mx= {}
    for n,cList in self.named.items():
        mx[n]= [ c.seq for c in cList ]
    return mx

◊ Web Chunk cross reference methods (105). Used by Web class - describes the overall "web" of chunks (94); Base Class Definitions (1); pyweb.py (150).

The userNamesXref() method creates a mapping for each user identifier. The value for this mapping is a tuple with the chunk that defined the identifier (via a @| command), and a sequence of chunks that reference the identifier.

For example: { 'Web': ( 87, (88,93,96,101,102,104) ), 'Chunk': ( 53, (54,55,56,60,57,58,59) ) }, shows that the identifier 'Web' is defined in the chunk with a sequence number of 87, and referenced in the sequence of chunks that follow.

This works in two passes:

_gatherUserId() gathers all user identifiers
_updateUserId() searches all text commands for the identifiers and updates the Web class cross reference information.

Web Chunk cross reference methods (106) +=


def userNamesXref( self ):
    ux= {}
    self._gatherUserId( self.named, ux )
    self._gatherUserId( self.output, ux )
    self._updateUserId( self.named, ux )
    self._updateUserId( self.output, ux )
    return ux
def _gatherUserId( self, chunkMap, ux ):
    →collect all user identifiers from a given map into ux (107)
def _updateUserId( self, chunkMap, ux ):
    →find user identifier usage and update ux from the given map (108)

◊ Web Chunk cross reference methods (106). Used by Web class - describes the overall "web" of chunks (94); Base Class Definitions (1); pyweb.py (150).

User identifiers are collected by visiting each of the sequence of Chunks that share the same name; within each component chunk, if the chunk has identifiers assigned by the @| command, these are seeded into the dictionary. If the chunk does not permit identifiers, it simply returns an empty list as a default action.

collect all user identifiers from a given map into ux (107) =


for n,cList in chunkMap.items():
    for c in cList:
        for id in c.getUserIDRefs():
            ux[id]= ( c.seq, [] )

◊ collect all user identifiers from a given map into ux (107). Used by Web Chunk cross reference methods (106); Web class - describes the overall "web" of chunks (94); Base Class Definitions (1); pyweb.py (150).

User identifiers are cross-referenced by visiting each of the sequence of Chunks that share the same name; within each component chunk, visit each user identifier; if the Chunk class searchForRE() method matches an identifier, this is appended to the sequence of chunks that reference the original user identifier.

find user identifier usage and update ux from the given map (108) =


# examine source for occurrences of all names in ux.keys()
for id in ux.keys():
    logger.debug( "References to %r", id )
    idpat= re.compile( r'\W%s\W' % id )
    for n,cList in chunkMap.items():
        for c in cList:
            if c.seq != ux[id][0] and c.searchForRE( idpat ):
                ux[id][1].append( c.seq )

◊ find user identifier usage and update ux from the given map (108). Used by Web Chunk cross reference methods (106); Web class - describes the overall "web" of chunks (94); Base Class Definitions (1); pyweb.py (150).
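It may help to see what the identifier pattern does and does not match. The sketch below builds the same regular expression for the identifier 'Web' and applies it directly with search(); how searchForRE() applies the pattern inside a Chunk is not shown in this section, so treat the direct call and the sample strings as assumptions.

```python
import re

# Same construction as above: a non-word character on each side of the name.
idpat = re.compile(r'\W%s\W' % "Web")

samples = [
    "class Web( object ):",   # surrounded by ' ' and '('
    "p= WebReader( name )",   # part of a longer identifier
    "Web",                    # nothing on either side
]
results = [bool(idpat.search(s)) for s in samples]
for s, r in zip(samples, results):
    print("%-24r %s" % (s, r))
```

The pattern finds the identifier only when it is delimited by non-word characters on both sides. It rejects longer identifiers that merely contain the name, and it also rejects a name that sits at the very start or very end of the searched text.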

The language() method determines the output language. The determination of the language can be done in a variety of ways. One is to use command line parameters, another is to use the filename extension on the input file.

We examine the first few characters of input. A proper HTML, XHTML or XML file begins with '<!', '<?' or '<H'. LaTeX files typically begin with '%' or '\'.

Web determination of the language from the first chunk (109) =


def language( self, preferredWeaverClass=None ):
    """Construct a weaver appropriate to the document's language"""
    if preferredWeaverClass:
        return preferredWeaverClass()
    if self.chunkSeq[0].startswith('<'): return HTML()
    if self.chunkSeq[0].startswith('%') or self.chunkSeq[0].startswith('\\'):  return LaTeX()
    return Weaver()

◊ Web determination of the language from the first chunk (109). Used by Web class - describes the overall "web" of chunks (94); Base Class Definitions (1); pyweb.py (150).
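The first-character test can be sketched on plain strings. The function guess_language and the labels it returns are hypothetical; the real method inspects the first Chunk and returns a weaver instance rather than a name.

```python
def guess_language(first_text):
    """Pick a markup family from the first character of the document."""
    if first_text.startswith('<'):
        return "HTML"
    if first_text.startswith('%') or first_text.startswith('\\'):
        return "LaTeX"
    return "default"

print(guess_language("<!DOCTYPE html>"))
print(guess_language("\\documentclass{article}"))
print(guess_language("% a LaTeX comment"))
print(guess_language("Plain text introduction"))
```

Because only the very first character is examined, leading whitespace or a blank first line would fall through to the default case in this sketch.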

The tangle() method of the Web class performs the tangle() method for each Chunk of each named output file. Note that several chunks may share the file name, requiring the file be composed of material in each chunk.

During tangling of a chunk, the chunk may reference another chunk. This transitive tangling of an individual chunk is handled by the tangleChunk() method.

Web tangle the output files (110) =


def tangle( self, aTangler ):
    for f,c in self.output.items():
        aTangler.open( f )
        for p in c:
            p.tangle( self, aTangler )
        aTangler.close()
def tangleChunk( self, name, aTangler ):
    logger.debug( "Tangling chunk %r", name )
    chunkList= self.getchunk(name)
    if len(chunkList) == 0:
        raise Error( "Attempt to tangle an undefined Chunk, %s." % ( name, ) )
    for p in chunkList:
        p.tangle( self, aTangler )

◊ Web tangle the output files (110). Used by Web class - describes the overall "web" of chunks (94); Base Class Definitions (1); pyweb.py (150).
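Transitive tangling can be illustrated with a toy model in which a chunk is a list of parts and a part is either literal text or a reference to another chunk. The chunks dictionary, the ("ref", name) tuples and the use of LookupError are assumptions for the example; the real code delegates to Chunk and Command objects and raises the pyWeb Error class.

```python
# Hypothetical web: each name maps to a list of parts; a part is either
# literal text or a ("ref", name) tuple standing in for a reference command.
chunks = {
    "main": ["import sys\n", ("ref", "body"), "main()\n"],
    "body": ["def main():\n", ("ref", "details")],
    "details": ["    print('hello')\n"],
}

def tangle_chunk(name, out):
    """Append the fully expanded text of the named chunk to out."""
    parts = chunks.get(name, [])
    if not parts:
        raise LookupError("Attempt to tangle an undefined chunk, %s." % (name,))
    for part in parts:
        if isinstance(part, tuple):
            tangle_chunk(part[1], out)  # follow the reference transitively
        else:
            out.append(part)

out = []
tangle_chunk("main", out)
print("".join(out), end="")
```

The references are expanded in place, depth first, so the tangled text follows the order the compiler needs rather than the order in which the chunks were presented.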

The weave() method of the Web class creates the final documentation. This is done by stepping through each Chunk in sequence and weaving the chunk into the resulting file via the Chunk class weave() method.

During weaving of a chunk, the chunk may reference another chunk. When weaving a reference to a named chunk (output or ordinary programming source defined with @{), this does not lead to transitive weaving: only a reference is put in from one chunk to another. However, when weaving a chunk defined with @[, the chunk is expanded when weaving. The decision is delegated to the referenced chunk.

Web weave the output document (111) =


def weave( self, aWeaver ):
    aWeaver.open( self.webFileName )
    for c in self.chunkSeq:
        c.weave( self, aWeaver )
    aWeaver.close()
def weaveChunk( self, name, aWeaver ):
    logger.debug( "Weaving chunk %r", name )
    chunkList= self.getchunk(name)
    if not chunkList:
        raise Error( "No Definition for %s" % ( name, ) )
    chunkList[0].weaveReferenceTo( self, aWeaver )
    for p in chunkList[1:]:
        p.weaveShortReferenceTo( self, aWeaver )

◊ Web weave the output document (111). Used by Web class - describes the overall "web" of chunks (94); Base Class Definitions (1); pyweb.py (150).

The WebReader Class

Usage

There are two forms of the constructor for a WebReader. The initial WebReader instance is created with code like the following:

p= WebReader( command=aCommandCharacter ).source( aFileName )

This will define the initial input file and the command character, both of which are command-line parameters to the application.

When processing an include file (with the @i command), a child WebReader instance is created with code like the following:

c= WebReader( parent=parentWebReader ).source( anIncludeName )

This will define the included file, but will inherit the command character from the parent WebReader. This will also include a reference from child to parent so that embedded Python expressions can view the entire input context.

Design

The WebReader class parses the input file into command blocks. These are assembled into Chunks, and the Chunks are assembled into the document Web. Once this input pass is complete, the resulting Web can be tangled or woven.

The parser works by reading the entire file and splitting on @. patterns. The split() method of the Python re module will separate the input and preserve the actual character sequence on which the input was split. This breaks the input into blocks of text separated by the @. characters.
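The effect of splitting with a capturing group can be seen on a short input. The sample text below is invented; the pattern is built the same way as parsePat, from the default '@' command character.

```python
import re

command = '@'
parsePat = '(%s.)' % command   # '@' followed by any one character, captured

text = "Intro text\n@d name @{ body @}\ntail"
tokens = re.split(parsePat, text)
for tok in tokens:
    print(repr(tok))
```

Because the pattern is wrapped in parentheses, re.split() returns the two-character commands as tokens of their own, alternating with the blocks of text between them.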

"Major" commands partition the input into Chunks. The major commands are @d and @o, as well as the @{, @}, @[, @] brackets, and the @i command to include another file.

"Minor" commands are inserted into a Chunk as a Command. Blocks of text are minor commands, as well as the @<name@> references, the various cross-reference commands (@f, @m and @u). The @@ escape is also handled here so that all further processing is independent of any parsing.

Implementation

The class has the following attributes:

fileName is used to pass the file name to the Web instance.
tokenList is the completely tokenized input file.
token is the most recently examined token.
tokenIndex is an index through the tokenList.
lineNumber is the count of '\n' characters seen in the tokens.
aChunk is the current open Chunk.
parent is the outer WebReader when processing a @i command.
theWeb is the current open Web.
permitList is the list of commands that are permitted to fail. This is generally an empty list or ('@i',).
command is the command character; a WebReader will use the parent command character if the parent is not None.
parsePat is generated from the command character, and is used to parse the input into tokens.

WebReader class - parses the input file, building the Web structure (112) =


class WebReader( object ):
    """Parse an input file, creating Commands and Chunks."""
    def __init__( self, parent=None, command='@', permit=None ):
        # Configuration of this reader.
        self._source= None
        self.fileName= None
        self.parent= parent
        self.theWeb= None
        if self.parent: 
            self.command= self.parent.command
            self.permitList= self.parent.permitList
        else:
            self.command= command
            self.permitList= [] if permit is None else permit
            
        self.log_reader= logging.getLogger( "pyweb.%s" % self.__class__.__name__ )

        # State of reading and parsing.
        self.tokenList= []
        self.token= ""
        self.tokenIndex= 0
        self.tokenPushback= []
        self.lineNumber= 0
        self.aChunk= None
        self.totalLines= 0
        self.totalFiles= 0
        self.parsePat= '(%s.)' % self.command
        →WebReader command literals (130)
    def __str__( self ):
        return self.__class__.__name__
    →WebReader fluent property-like methods (113)
    →WebReader tokenize the input (114)
    →WebReader location in the input stream (115)
    →WebReader handle a command string (116)
    →WebReader load the web (128)

◊ WebReader class - parses the input file, building the Web structure (112). Used by Base Class Definitions (1); pyweb.py (150).

A few fluent property-like methods help set the attributes of a WebReader.

WebReader fluent property-like methods (113) =


def web( self, aWeb ):
    self.theWeb= aWeb
    return self
def source( self, name, source=None ):
    """Set a name to display with error messages; also set the actual file-like source.
    if no source is given, the name is treated as a filename and opened.
    """
    self.fileName= name
    self._source= source
    return self

◊ WebReader fluent property-like methods (113). Used by WebReader class - parses the input file, building the Web structure (112); Base Class Definitions (1); pyweb.py (150).

This tokenizer centralizes a single call to nextToken(). This assures that every token is examined by nextToken(), which permits accurate counting of the '\n' characters and determining the line numbers of the input file. This line number information can then be attached to each Command, directing the user back to the correct line of the original input file.

The tokenizer supports lookahead by allowing the parser to examine tokens and then push them back into a pushBack stack. Generally this is used for the special case of parsing the @i command, which has no @-command terminator or separator. It ends with the following '\n'.

Python permits a simplified double-ended queue for this kind of token stream processing. Ordinary tokens are fetched with a pop(0), and a pushback is done by prepending the pushback token with a tokenList = [ token ] + tokenList. For this application, however, we need to keep a count of '\n's seen, and we want to avoid double-counting '\n' pushed back into the token stream. So we use a queue of tokens and a stack for pushback.

WebReader tokenize the input (114) =


def openSource( self ):
    if self._source is None:
        self._source= open( self.fileName, "r" )
    text= self._source.read()
    self.tokenList= re.split(self.parsePat, text )
    self.lineNumber= 1
    self.totalLines= 0
    self.totalFiles += 1
    self.tokenPushback= []
def nextToken( self ):
    lines=  self.token.count('\n')
    self.lineNumber += lines
    self.totalLines += lines
    if self.tokenPushback:
        self.token= self.tokenPushback.pop()
    else:
        self.token= self.tokenList.pop(0)
    return self.token
def moreTokens( self ):
    return self.tokenList or self.tokenPushback
def pushBack( self, token ):
    self.tokenPushback.append( token )

◊ WebReader tokenize the input (114). Used by WebReader class - parses the input file, building the Web structure (112); Base Class Definitions (1); pyweb.py (150).
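A reduced model of the tokenizer shows how the line number advances as tokens are consumed. MiniTokenizer is a hypothetical stand-in that keeps only the token queue, the pushback stack and the line counter; the input text is invented.

```python
import re

class MiniTokenizer:
    """Simplified model of the queue-plus-pushback-stack tokenizer."""
    def __init__(self, text, command='@'):
        self.tokenList = re.split('(%s.)' % command, text)
        self.tokenPushback = []
        self.token = ""
        self.lineNumber = 1
    def nextToken(self):
        # Count the newlines of the token we are leaving behind.
        self.lineNumber += self.token.count('\n')
        if self.tokenPushback:
            self.token = self.tokenPushback.pop()
        else:
            self.token = self.tokenList.pop(0)
        return self.token
    def moreTokens(self):
        return bool(self.tokenList or self.tokenPushback)
    def pushBack(self, token):
        self.tokenPushback.append(token)

t = MiniTokenizer("line one\nline two @i other.w\nline three")
seen = []
while t.moreTokens():
    tok = t.nextToken()
    seen.append((t.lineNumber, tok))
for line, tok in seen:
    print(line, repr(tok))

t.pushBack("@x")          # a pushed-back token is served before the queue
print(repr(t.nextToken()))
```

The line number reported with each token is the line on which that token starts, since the newlines inside a token are only added when the following token is fetched.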

The location() method provides the file name and range of lines for a particular command. This allows error messages as well as tangled or woven output to correctly reference the original input files.

WebReader location in the input stream (115) =


def location( self ):
    return ( self.fileName, self.lineNumber, self.lineNumber+self.token.count("\n") )

◊ WebReader location in the input stream (115). Used by WebReader class - parses the input file, building the Web structure (112); Base Class Definitions (1); pyweb.py (150).

Command recognition is done via a Chain of Command-like design. There are two conditions: the command string is recognized or it is not recognized.

If the command is recognized, handleCommand() either:

(for major commands) attaches the current Chunk (self.aChunk) to the current Web (self.theWeb), or
(for minor commands) creates a Command and attaches it to the current Chunk (self.aChunk)

and returns a true result.

If the command is not recognized, handleCommand() returns false.

A subclass can override handleCommand() to (1) call this superclass version; (2) if the command is unknown to the superclass, then the subclass can attempt to process it; (3) if the command is unknown to both classes, then return false. Either a subclass will handle it, or the default activity taken by load() is to treat the command as text, but also issue a warning.

WebReader handle a command string (116) =


def handleCommand( self, token ):
    self.log_reader.debug( "Reading %r", token )
    →major commands segment the input into separate Chunks (117)
    →minor commands add Commands to the current Chunk (123)
    elif token[:2] in (self.cmdlcurl,self.cmdlbrak):
        # These should be consumed as part of @o and @d parsing
        raise Error('Extra %r (possibly missing chunk name)' % token, self.aChunk)
    else:
        return None # did not recognize the command
    return True # did recognize the command

◊ WebReader handle a command string (116). Used by WebReader class - parses the input file, building the Web structure (112); Base Class Definitions (1); pyweb.py (150).
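The override protocol described above might look like the following sketch. BaseReader stands in for WebReader and recognizes only two commands; ExtendedReader and its '@z' command are purely hypothetical and are not part of pyWeb.

```python
class BaseReader:
    """Hypothetical stand-in for WebReader; recognizes only '@d' and '@o'."""
    def __init__(self):
        self.handled = []
    def handleCommand(self, token):
        if token[:2] in ('@d', '@o'):
            self.handled.append(('base', token))
            return True   # recognized
        return None       # not recognized

class ExtendedReader(BaseReader):
    """Adds an invented '@z' command on top of the base commands."""
    def handleCommand(self, token):
        # (1) give the superclass the first chance
        if super().handleCommand(token):
            return True
        # (2) try the extension's own commands
        if token[:2] == '@z':
            self.handled.append(('extension', token))
            return True
        # (3) unknown to both classes
        return None

r = ExtendedReader()
for token in ('@d', '@z', '@q'):
    print(token, r.handleCommand(token))
```

A caller only needs the true or false result: a false result for '@q' is the signal to fall back to treating the token as text and issuing a warning.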

The following sequence of if-elif statements identifies the major commands that partition the input into separate Chunks.

major commands segment the input into separate Chunks (117) =


if token[:2] == self.cmdo:
    →start an OutputChunk, adding it to the web (119)
elif token[:2] == self.cmdd:
    →start a NamedChunk or NamedDocumentChunk, adding it to the web (120)
elif token[:2] == self.cmdi:
    →import another file (121)
elif token[:2] in (self.cmdrcurl,self.cmdrbrak):
    →finish a chunk, start a new Chunk adding it to the web (122)

◊ major commands segment the input into separate Chunks (117). Used by WebReader handle a command string (116); WebReader class - parses the input file, building the Web structure (112); Base Class Definitions (1); pyweb.py (150).

An output chunk has the form @o name @{ content @}. We use the first two tokens to name the OutputChunk. We simply expect the @{ separator. We then attach all subsequent commands to this chunk while waiting for the final @} token to end the chunk.

TODO The file name information can be split into parts on a ' '. We can add escaping ('\ ') and quoting to allow more flexibility. If there's one part, it's the file name. If there is more than one part, it will provide comment characters. The shlex module will handle the parsing into quoted fields.

Imports (118) +=

import shlex

◊ Imports (118). Used by pyweb.py (150).

start an OutputChunk, adding it to the web (119) =


args= self.nextToken().strip()
values = shlex.split( args )
if len(values) == 1:
    self.aChunk= OutputChunk( values[0], "", "" )
elif len(values) == 2:
    self.aChunk= OutputChunk( values[0], values[1], "" )
else:
    self.aChunk= OutputChunk( values[0], values[1], values[2] )
self.aChunk.webAdd( self.theWeb )
self.expect( (self.cmdlcurl,) )
# capture an OutputChunk up to @}

◊ start an OutputChunk, adding it to the web (119). Used by major commands segment the input into separate Chunks (117); WebReader handle a command string (116); WebReader class - parses the input file, building the Web structure (112); Base Class Definitions (1); pyweb.py (150).
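To show how shlex handles quoting and escaping in the file name information, here is a small sketch. The helper name parse_output_args and the padding with empty strings are assumptions made for the example; the chunk above uses explicit length checks instead.

```python
import shlex


def parse_output_args(args):
    """Split '@o' arguments into (file name, comment start, comment end)."""
    values = shlex.split(args.strip())
    # Pad with empty strings so that missing comment delimiters default to "".
    return tuple((values + ["", "", ""])[:3])


plain = parse_output_args("pyweb.py")
quoted = parse_output_args('"my file.c" "/*" "*/"')
escaped = parse_output_args(r"my\ file.py #")
```

Quoting and backslash escaping both keep an embedded space inside a single field.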

A named chunk has the form @d name @{ content @} for code and @d name @[ content @] for document source. We use the first two tokens to name the NamedChunk or NamedDocumentChunk. We expect either the @{ or @[ separator, and use the actual token found to choose which subclass of Chunk to create. We then attach all subsequent commands to this chunk while waiting for the final @} or @] token to end the chunk.

start a NamedChunk or NamedDocumentChunk, adding it to the web (120) =


name= self.nextToken().strip()
# next token is @{ or @[
brack= self.expect( (self.cmdlcurl,self.cmdlbrak) )
if brack == self.cmdlcurl: 
    self.aChunk= NamedChunk( name )
else: 
    self.aChunk= NamedDocumentChunk( name )
self.aChunk.webAdd( self.theWeb )
# capture a NamedChunk up to @} or @]

◊ start a NamedChunk or NamedDocumentChunk, adding it to the web (120). Used by major commands segment the input into separate Chunks (117); WebReader handle a command string (116); WebReader class - parses the input file, building the Web structure (112); Base Class Definitions (1); pyweb.py (150).

An import command has the unusual form of @i name, with no trailing separator. When we encounter the @i token, the next token will start with the file name, but may continue with an anonymous chunk. We require that all @i commands occur at the end of a line, and break on the '\n' which must occur after the file name. This permits file names with embedded spaces.

Once we have split the file name away from the rest of the following anonymous chunk, we push the following token back into the token stream, so that it will be the first token examined at the top of the load() loop.

We create a child WebReader instance to process the included file. The entire file is loaded into the current Web instance. A new, empty Chunk is created at the end of the file so that processing can resume with an anonymous Chunk.

import another file (121) =


# break this token on the '\n' and pushback the new token.
next= self.nextToken().split('\n',1)
self.pushBack('\n')
if len(next) > 1:
    self.pushBack( '\n'.join(next[1:]) )
incFile= next[0].strip()
try:
    with open(incFile,"r") as source:
        logger.info( "Including %r", incFile )
        include= WebReader( parent=self )
        include.source( incFile, source ).web( self.theWeb )
        include.load()
    self.totalLines += include.totalLines
    self.totalFiles += include.totalFiles
except (Error,IOError) as e:
    logger.error( 
        "Problems with included file %s, output is incomplete.",
        incFile )
    # Discretionary - sometimes we want total failure
    if self.cmdi in self.permitList: pass
    else: raise
self.aChunk= Chunk()
self.aChunk.webAdd( self.theWeb )

◊ import another file (121). Used by major commands segment the input into separate Chunks (117); WebReader handle a command string (116); WebReader class - parses the input file, building the Web structure (112); Base Class Definitions (1); pyweb.py (150).
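The token splitting step can be looked at in isolation. This is a minimal sketch, assuming a hypothetical helper split_import_token that returns the file name together with the strings that the chunk above hands to pushBack(), in the same order.

```python
def split_import_token(token):
    """Separate an '@i' file name from the text that follows it.

    Returns (file name, strings to push back).
    """
    parts = token.split("\n", 1)       # break on the first newline only
    push_back = ["\n"]
    if len(parts) > 1:
        push_back.append(parts[1])     # the rest belongs to the next chunk
    return parts[0].strip(), push_back


name, pending = split_import_token(" my included file.w\nSome text.\nMore text.")
```

Because only the first newline is significant, the file name may contain embedded spaces.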

When a @} or @] is found, it finishes a named chunk. The text that follows is therefore part of an anonymous chunk.

Note that no check is made to ensure that the previous Chunk was indeed a named chunk or output chunk started with @{ or @[. To do this, an attribute would be needed for each Chunk subclass that indicates whether a trailing bracket is necessary. For the base Chunk class, this would be false, but for all other subclasses of Chunk, this would be true.

finish a chunk, start a new Chunk adding it to the web (122) =


self.aChunk= Chunk()
self.aChunk.webAdd( self.theWeb )

◊ finish a chunk, start a new Chunk adding it to the web (122). Used by major commands segment the input into separate Chunks (117); WebReader handle a command string (116); WebReader class - parses the input file, building the Web structure (112); Base Class Definitions (1); pyweb.py (150).
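The check described above is not implemented, but one possible shape for it is sketched here. The needs_close attribute, the check_close function and the use of ValueError (in place of pyWeb's Error class) are all assumptions made for this illustration.

```python
class Chunk(object):
    """Stand-in for the anonymous Chunk: no closing bracket is expected."""
    needs_close = False


class NamedChunk(Chunk):
    """Stand-in for a chunk opened with @{ or @[."""
    needs_close = True


def check_close(current_chunk, token):
    """Reject a closing bracket when the current chunk was never opened."""
    if not current_chunk.needs_close:
        raise ValueError("Unexpected %r: no open chunk to close" % (token,))
    return Chunk()  # processing resumes with a fresh anonymous chunk


after_close = check_close(NamedChunk(), "@}")
try:
    check_close(Chunk(), "@}")
    stray_detected = False
except ValueError:
    stray_detected = True
```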

The following sequence of elif statements identifies the minor commands that add Command instances to the current open Chunk.

minor commands add Commands to the current Chunk (123) =


elif token[:2] == self.cmdpipe:
    →assign user identifiers to the current chunk (124)
elif token[:2] == self.cmdf:
    self.aChunk.append( FileXrefCommand(self.lineNumber) )
elif token[:2] == self.cmdm:
    self.aChunk.append( MacroXrefCommand(self.lineNumber) )
elif token[:2] == self.cmdu:
    self.aChunk.append( UserIdXrefCommand(self.lineNumber) )
elif token[:2] == self.cmdlangl:
    →add a reference command to the current chunk (125)
elif token[:2] == self.cmdlexpr:
    →add an expression command to the current chunk (126)
elif token[:2] == self.cmdcmd:
    →double at-sign replacement, append this character to previous TextCommand (127)

◊ minor commands add Commands to the current Chunk (123). Used by WebReader handle a command string (116); WebReader class - parses the input file, building the Web structure (112); Base Class Definitions (1); pyweb.py (150).

User identifiers occur after a @| in a NamedChunk.

Note that no check is made to ensure that the previous Chunk was indeed a named chunk or output chunk started with @{. To do this, an attribute would be needed for each Chunk subclass that indicates whether user identifiers are permitted. For the base Chunk class, this would be false, but for the NamedChunk class and OutputChunk class, this would be true.

assign user identifiers to the current chunk (124) =


# variable references at the end of a NamedChunk
# aChunk must be subclass of NamedChunk
# These are accumulated and expanded by @u reference
try:
    self.aChunk.setUserIDRefs( self.nextToken().strip() )
except AttributeError:
    # Out of place user identifier command
    raise Error("Unexpected references near %s: %s" % (self.location(),token) )

◊ assign user identifiers to the current chunk (124). Used by minor commands add Commands to the current Chunk (123); WebReader handle a command string (116); WebReader class - parses the input file, building the Web structure (112); Base Class Definitions (1); pyweb.py (150).

A reference command has the form @< name @>. We accept three tokens from the input; the middle token is the referenced name.

add a reference command to the current chunk (125) =


# get the name, introduce into the named Chunk dictionary
expand= self.nextToken().strip()
self.expect( (self.cmdrangl,) )
self.theWeb.addDefName( expand )
self.aChunk.append( ReferenceCommand( expand, self.lineNumber ) )
self.aChunk.appendText( "", self.lineNumber ) # to collect following text
self.log_reader.debug( "Reading %r %r", expand, self.token )

◊ add a reference command to the current chunk (125). Used by minor commands add Commands to the current Chunk (123); WebReader handle a command string (116); WebReader class - parses the input file, building the Web structure (112); Base Class Definitions (1); pyweb.py (150).

An expression command has the form @( Python Expression @). We accept three tokens from the input; the middle token is the expression.

There are two alternative semantics for an embedded expression.

Deferred Execution. This requires definition of a new subclass of Command, ExpressionCommand, and appends it into the current Chunk. At weave and tangle time, this expression is evaluated. The insert might look something like this: aChunk.append( ExpressionCommand( expression, self.lineNumber ) ).
Immediate Execution. This simply creates a context and evaluates the Python expression. The output from the expression becomes a TextCommand, and is appended to the current Chunk.

We use the Immediate Execution semantics.

add an expression command to the current chunk (126) =


# get the Python expression, create the expression command
expression= self.nextToken()
self.expect( (self.cmdrexpr,) )
try:
    theLocation= self.location()
    theWebReader= self
    theFile= self.theWeb.webFileName
    thisApplication= sys.argv[0]
    result= str(eval( expression ))
except Exception as e:
    result= '!!!Exception: %s' % e
    logger.exception( 'Failure to process %r: result is %s', expression, e )
self.aChunk.appendText( result, self.lineNumber )

◊ add an expression command to the current chunk (126). Used by minor commands add Commands to the current Chunk (123); WebReader handle a command string (116); WebReader class - parses the input file, building the Web structure (112); Base Class Definitions (1); pyweb.py (150).
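The Immediate Execution semantics can be sketched as a standalone function. This is an illustration only: the function name evaluate_expression, the explicit context dictionary and the version entry are assumptions, whereas the chunk above relies on local variables being visible to eval().

```python
def evaluate_expression(expression, context):
    """Evaluate a Python expression immediately and return text for the chunk."""
    try:
        # The context dictionary supplies the names visible to the expression.
        return str(eval(expression, {}, context))
    except Exception as e:
        # A failure becomes visible text rather than stopping the load.
        return "!!!Exception: %s" % e


context = {"theFile": "example.w", "version": "2.1"}
good = evaluate_expression("theFile.upper() + ' ' + version", context)
bad = evaluate_expression("undefined_name", context)
```

Passing the context explicitly keeps the set of available names in one place, at the cost of having to build the dictionary.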

A double command sequence ('@@', when the command is an '@') has the usual meaning of '@' in the input stream. We do this via the appendText() method of the current Chunk. This will append the character on the end of the most recent TextCommand; if this fails, it will create a new, empty TextCommand.

double at-sign replacement, append this character to previous TextCommand (127) =


# replace with '@' here and now!
# Put this at the end of the previous chunk
# AND make sure the next chunk is appended to this.
self.aChunk.appendText( self.command, self.lineNumber )

◊ double at-sign replacement, append this character to previous TextCommand (127). Used by minor commands add Commands to the current Chunk (123); WebReader handle a command string (116); WebReader class - parses the input file, building the Web structure (112); Base Class Definitions (1); pyweb.py (150).
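The effect of appendText() on the most recent TextCommand can be sketched with stand-in classes. SketchChunk and this simplified TextCommand are hypothetical, and the line number argument is omitted for brevity.

```python
class TextCommand(object):
    """Stand-in for a block of text in a chunk."""
    def __init__(self, text):
        self.text = text


class SketchChunk(object):
    """Stand-in chunk holding a list of commands."""
    def __init__(self):
        self.commands = []

    def appendText(self, text):
        # Extend the most recent TextCommand; otherwise start a new one.
        if self.commands and isinstance(self.commands[-1], TextCommand):
            self.commands[-1].text += text
        else:
            self.commands.append(TextCommand(text))


chunk = SketchChunk()
# The middle piece stands for the '@' produced by a '@@' token.
for piece in ("user", "@", "example.com"):
    chunk.appendText(piece)
```

The three pieces end up in a single TextCommand, so the replaced '@' is not separated from the text around it.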

The expect() method examines the next token to see if it is the expected string. If this is not found, a standard type of error message is written.

The load() method reads the entire input file as a sequence of tokens, split up by the openSource() method. Each token that appears to be a command is passed to the handleCommand() method. If the handleCommand() method returns a true result, the command was recognized and placed in the Web. If handleCommand() returns a false result, the command was unknown, and some default behavior is used.

The load() method takes an optional permit variable. This encodes commands where failure is permitted. Currently, only the @i command can be set to permit failure. This allows including a file that does not yet exist. The primary application of this option is when weaving test output. The first pass of pyWeb tangles the program source files; they are then run to create test output; the second pass of pyWeb weaves this test output into the final document via the @i command.

WebReader load the web (128) =


def expect( self, tokens ):
    if not self.moreTokens():
        raise Error("At %r: end of input, %r not found" % (self.location(),tokens) )
    t= self.nextToken()
    if t not in tokens:
        raise Error("At %r: expected %r, found %r" % (self.location(),tokens,t) )
    return t
    
def load( self ):
    self.aChunk= Chunk() # Initial anonymous chunk of text.
    self.aChunk.webAdd( self.theWeb )
    self.openSource()
    while self.moreTokens():
        token= self.nextToken()
        if len(token) >= 2 and token.startswith(self.command):
            if self.handleCommand( token ):
                continue
            else:
                →other command-like sequences are appended as a TextCommand (129)
        elif token:
            # accumulate non-empty block of text in the current chunk
            self.aChunk.appendText( token, self.lineNumber )

◊ WebReader load the web (128). Used by WebReader class - parses the input file, building the Web structure (112); Base Class Definitions (1); pyweb.py (150).

other command-like sequences are appended as a TextCommand (129) =


logger.warning( 'Unknown @-command in input: %r', token )
self.aChunk.appendText( token, self.lineNumber )

◊ other command-like sequences are appended as a TextCommand (129). Used by WebReader load the web (128); WebReader class - parses the input file, building the Web structure (112); Base Class Definitions (1); pyweb.py (150).
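The routing decision made by the load() loop can be sketched on its own. The tokenizing pattern below is an assumption (a command character followed by any one character); the tokenizer that pyWeb actually uses is not shown in this section. The names tokenize and classify are hypothetical.

```python
import re


def tokenize(text, command="@"):
    """Split text into alternating text and two-character command tokens."""
    pattern = re.compile("(%s.)" % re.escape(command))
    return [t for t in pattern.split(text) if t]


def classify(tokens, known, command="@"):
    """Label each token the way the load() loop would route it."""
    routed = []
    for token in tokens:
        if len(token) >= 2 and token.startswith(command):
            routed.append(("command" if token in known else "unknown", token))
        else:
            routed.append(("text", token))
    return routed


tokens = tokenize("Intro @d name @{ code @} tail @z")
routed = classify(tokens, known={"@d", "@{", "@}"})
```

Tokens labelled "unknown" correspond to the case above in which a warning is logged and the token is kept as text.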

The command character can be changed to permit some flexibility when working with languages that make extensive use of the @ symbol, e.g., Perl. The initialization of the WebReader is based on the selected command character.

WebReader command literals (130) =


# major commands
self.cmdo= self.command+'o'
self.cmdd= self.command+'d'
self.cmdlcurl= self.command+'{'
self.cmdrcurl= self.command+'}'
self.cmdlbrak= self.command+'['
self.cmdrbrak= self.command+']'
self.cmdi= self.command+'i'
# minor commands
self.cmdlangl= self.command+'<'
self.cmdrangl= self.command+'>'
self.cmdpipe= self.command+'|'
self.cmdlexpr= self.command+'('
self.cmdrexpr= self.command+')'
self.cmdf= self.command+'f'
self.cmdm= self.command+'m'
self.cmdu= self.command+'u'
self.cmdcmd= self.command+self.command

◊ WebReader command literals (130). Used by WebReader class - parses the input file, building the Web structure (112); Base Class Definitions (1); pyweb.py (150).
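To see the effect of changing the command character, the same literals can be built by a small function. The function name command_literals and the choice of '!' as an alternate character are assumptions for this sketch; the dictionary keys mirror the attribute names above.

```python
def command_literals(command="@"):
    """Build the two-character command strings for a given command character."""
    suffixes = {
        "cmdo": "o", "cmdd": "d", "cmdi": "i",
        "cmdlcurl": "{", "cmdrcurl": "}",
        "cmdlbrak": "[", "cmdrbrak": "]",
        "cmdlangl": "<", "cmdrangl": ">",
        "cmdpipe": "|", "cmdlexpr": "(", "cmdrexpr": ")",
        "cmdf": "f", "cmdm": "m", "cmdu": "u",
    }
    literals = dict((name, command + s) for name, s in suffixes.items())
    literals["cmdcmd"] = command + command   # the doubled command character
    return literals


default = command_literals()
alternate = command_literals("!")
```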

Action Class Hierarchy

This application performs three major actions: loading the document web, weaving and tangling. Generally, the use case is to perform a load, weave and tangle. However, a less common use case is to first load and tangle output files, run a regression test and then load and weave a result that includes the test output file.

The -x option excludes one of the two output actions: -xw excludes the weave pass, doing only the tangle action; -xt excludes the tangle pass, doing only the weave action.

This two-pass action might be embedded in the following type of Python program.

import pyweb, os
pyweb.tangle( "source.w" )
os.system( "python source.py >source.log" )
pyweb.weave( "source.w" )

The first step runs pyWeb, excluding the final weaving pass. The second step runs the tangled program, source.py, and produces test results in a log file, source.log. The third step runs pyWeb excluding the tangle pass. This produces a final document that includes the source.log test results.

To accomplish this, we provide a class hierarchy that defines the various actions of the pyWeb application. This class hierarchy defines an extensible set of fundamental actions. This gives us the flexibility to create a simple sequence of actions and execute any combination of these. It eliminates the need for a forest of if-statements to determine precisely what will be done.

Each action has the potential to update the state of the overall application. The partner of this action hierarchy is the Application class, which defines the application options, inputs and results.

Action class hierarchy - used to describe basic actions of the application (131) =


→Action superclass has common features of all actions (132)
→ActionSequence subclass that holds a sequence of other actions (135)
→WeaveAction subclass initiates the weave action (139)
→TangleAction subclass initiates the tangle action (142)
→LoadAction subclass loads the document web (145)

◊ Action class hierarchy - used to describe basic actions of the application (131). Used by Base Class Definitions (1); pyweb.py (150).

Action Class

The Action class embodies the basic operations of pyWeb. The intent of this hierarchy is to provide both an easily expanded method of adding new actions and an easily specified list of actions for a particular run of pyWeb.

Usage

The overall process of the application is defined by an instance of Action. This instance may be the WeaveAction instance, the TangleAction instance or an ActionSequence instance.

The instance is constructed during parsing of the input parameters. The instance is then called, which invokes its __call__() method to actually perform the action. There are three standard Action instances available: an instance that is a macro and does both tangling and weaving, an instance that excludes tangling, and an instance that excludes weaving. These correspond to the command-line options.

anOp= SomeAction( parameters )
anOp.options= parsed options
anOp.web = Current web
anOp()

Design

The Action is the superclass for all actions.

An Action has a number of common attributes.

name A name for this action.
options The optparse options object.
web The current web that's being processed.
start The time at which the action started.

Implementation

Action superclass has common features of all actions (132) =


class Action( object ):
    """An action performed by pyWeb."""
    def __init__( self, name ):
        self.name= name
        self.web= None
        self.start= None
    def __str__( self ):
        return "%s [%s]" % ( self.name, self.web )
    →Action call method actually performs the action (133)
    →Action final summary method (134)

◊ Action superclass has common features of all actions (132). Used by Action class hierarchy - used to describe basic actions of the application (131); Base Class Definitions (1); pyweb.py (150).

The __call__() method does the real work of the action. For the superclass, it merely logs a message and records the start time. Subclasses override it to do the actual work.

Action call method actually performs the action (133) =


def __call__( self ):
    logger.info( "Starting %s", self )
    self.start= time.clock()

◊ Action call method actually performs the action (133). Used by Action superclass has common features of all actions (132); Action class hierarchy - used to describe basic actions of the application (131); Base Class Definitions (1); pyweb.py (150).

The summary() method returns some basic processing statistics for this action.

Action final summary method (134) =


def duration( self ):
    """Return duration of the action."""
    return (self.start and time.clock()-self.start) or 0
def summary( self, *args ):
    return "%s in %0.1f sec." % ( self.name, self.duration() )

◊ Action final summary method (134). Used by Action superclass has common features of all actions (132); Action class hierarchy - used to describe basic actions of the application (131); Base Class Definitions (1); pyweb.py (150).
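The start and duration bookkeeping can be sketched separately. Two points in this sketch are deliberate departures from the chunks above and should be read as assumptions: it uses time.perf_counter(), because time.clock() was removed in Python 3.8, and it tests for None explicitly, because the "start and ..." idiom also treats a start time of exactly 0 as "not started". The class name Timed is hypothetical.

```python
import time


class Timed(object):
    """Stand-in showing the start/duration bookkeeping of an action."""
    def __init__(self):
        self.start = None

    def begin(self):
        self.start = time.perf_counter()

    def duration(self):
        # 0 until begin() has been called, elapsed seconds afterwards.
        if self.start is None:
            return 0
        return time.perf_counter() - self.start


timer = Timed()
before = timer.duration()
timer.begin()
after = timer.duration()
```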

ActionSequence Class

An ActionSequence defines a composite action; it is a sequence of other actions. When the macro is performed, it delegates to the sub-actions.

Usage

The instance is created during parsing of input parameters. An instance of this class is one of the three standard actions available; it generally is the default, "do everything" action.

Design

This class overrides the __call__() method of the superclass. It also adds an append() method that is used to construct the sequence of actions.

Implementation

ActionSequence subclass that holds a sequence of other actions (135) =


class ActionSequence( Action ):
    """An action composed of a sequence of other actions."""
    def __init__( self, name, opSequence=None ):
        super( ActionSequence, self ).__init__( name )
        if opSequence: self.opSequence= opSequence
        else: self.opSequence= []
    def __str__( self ):
        return "; ".join( [ str(x) for x in self.opSequence ] )
    →ActionSequence call method delegates the sequence of actions (136)
    →ActionSequence append adds a new action to the sequence (137)
    →ActionSequence summary summarizes each step (138)

◊ ActionSequence subclass that holds a sequence of other actions (135). Used by Action class hierarchy - used to describe basic actions of the application (131); Base Class Definitions (1); pyweb.py (150).

The macro __call__() method delegates to each sub-action in turn, first giving it the current web.

ActionSequence call method delegates the sequence of actions (136) =


def __call__( self ):
    for o in self.opSequence:
        o.web= self.web
        o()

◊ ActionSequence call method delegates the sequence of actions (136). Used by ActionSequence subclass that holds a sequence of other actions (135); Action class hierarchy - used to describe basic actions of the application (131); Base Class Definitions (1); pyweb.py (150).

Since this class is essentially a wrapper around the built-in sequence type, we delegate sequence-related actions directly to the underlying sequence.

ActionSequence append adds a new action to the sequence (137) =


def append( self, anAction ):
    self.opSequence.append( anAction )

◊ ActionSequence append adds a new action to the sequence (137). Used by ActionSequence subclass that holds a sequence of other actions (135); Action class hierarchy - used to describe basic actions of the application (131); Base Class Definitions (1); pyweb.py (150).

The summary() method returns some basic processing statistics for each step of this action.

ActionSequence summary summarizes each step (138) =


def summary( self, *args ):
    return ", ".join( [ x.summary(*args) for x in self.opSequence ] )

◊ ActionSequence summary summarizes each step (138). Used by ActionSequence subclass that holds a sequence of other actions (135); Action class hierarchy - used to describe basic actions of the application (131); Base Class Definitions (1); pyweb.py (150).
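The composite behavior can be exercised with a pair of stand-in classes. StepAction, Sequence, the shared log list and the order of the steps are assumptions for this sketch; they mimic the shape of Action and ActionSequence without doing any real work.

```python
class StepAction(object):
    """Stand-in for a simple Action that records when it runs."""
    def __init__(self, name, log):
        self.name = name
        self.web = None
        self.log = log

    def __call__(self):
        self.log.append((self.name, self.web))

    def summary(self):
        return "%s done" % self.name


class Sequence(StepAction):
    """Stand-in for ActionSequence: runs its sub-actions in order."""
    def __init__(self, name, log, steps=None):
        super(Sequence, self).__init__(name, log)
        self.steps = list(steps) if steps else []

    def append(self, action):
        self.steps.append(action)

    def __call__(self):
        for step in self.steps:
            step.web = self.web   # each step sees the same web
            step()

    def summary(self):
        return ", ".join(step.summary() for step in self.steps)


log = []
everything = Sequence("all", log)
for name in ("Load", "Tangle", "Weave"):
    everything.append(StepAction(name, log))
everything.web = "example.w"
everything()
```

The caller treats the composite exactly like a single action, which is what removes the need for a forest of if-statements.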

WeaveAction Class

The WeaveAction defines the action of weaving. This action logs a message, and invokes the weave() method of the Web instance. This method also includes the basic decision on which weaver to use. If a Weaver was specified on the command line, this instance is used. Otherwise, the first few characters are examined and a weaver is selected.

Usage

An instance is created during parsing of input parameters. The instance of this class is one of the standard actions available; it is the "exclude tangling" option and it is also an element of the "do everything" macro.

Design

This class overrides the __call__() method of the superclass.

Implementation

WeaveAction subclass initiates the weave action (139) =


class WeaveAction( Action ):
    """An action that weaves a document."""
    def __init__( self ):
        super(WeaveAction, self).__init__( "Weave" )
        self.theWeaver= None
    def __str__( self ):
        return "%s [%s, %s]" % ( self.name, self.web, self.theWeaver )

    →WeaveAction call method does weaving of the document file (140)
    →WeaveAction summary method provides line counts (141)

◊ WeaveAction subclass initiates the weave action (139). Used by Action class hierarchy - used to describe basic actions of the application (131); Base Class Definitions (1); pyweb.py (150).

The language is picked just prior to weaving. It is either (1) the language specified on the command line, or, (2) if no language was specified, a language is selected based on the first few characters of the input.

Weaving can only raise an exception when there is a reference to a chunk that is never defined.

WeaveAction call method does weaving of the document file (140) =


def __call__( self ):
    super( WeaveAction, self ).__call__()
    if not self.theWeaver: 
        # Examine first few chars of first chunk of web to determine language
        self.theWeaver= self.web.language() 
    try:
        self.web.weave( self.theWeaver )
    except Error as e:
        logger.error(
            "Problems weaving document from %s (weave file is faulty).",
            self.web.webFileName )
        raise

◊ WeaveAction call method does weaving of the document file (140). Used by WeaveAction subclass initiates the weave action (139); Action class hierarchy - used to describe basic actions of the application (131); Base Class Definitions (1); pyweb.py (150).
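The two-way choice of weaver can be sketched as follows. The prefix rules in this example are invented purely for illustration, since the rules used by the Web class language() method are not shown in this section; the function name pick_weaver and the string stand-ins for weaver instances are also assumptions.

```python
def pick_weaver(specified, first_text, weavers):
    """Return the specified weaver, or guess one from the start of the text."""
    if specified is not None:
        return specified
    start = first_text.lstrip()[:1]
    # Illustrative rules only: markup that opens with '<' looks like HTML,
    # a leading backslash or '%' looks like LaTeX, anything else is RST.
    if start == "<":
        return weavers["html"]
    if start in ("\\", "%"):
        return weavers["latex"]
    return weavers["rst"]


weavers = {"html": "HTML", "latex": "LaTeX", "rst": "RST"}
guessed_html = pick_weaver(None, "<html>", weavers)
guessed_latex = pick_weaver(None, "\\documentclass{article}", weavers)
guessed_rst = pick_weaver(None, "Title\n=====", weavers)
forced = pick_weaver("LaTeX", "<html>", weavers)
```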

The summary() method returns some basic processing statistics for the weave action.

WeaveAction summary method provides line counts (141) =


def summary( self, *args ):
    if self.theWeaver and self.theWeaver.linesWritten > 0:
        return "%s %d lines in %0.1f sec." % ( self.name, self.theWeaver.linesWritten, self.duration() )
    return "did not %s" % ( self.name, )

◊ WeaveAction summary method provides line counts (141). Used by WeaveAction subclass initiates the weave action (139); Action class hierarchy - used to describe basic actions of the application (131); Base Class Definitions (1); pyweb.py (150).

TangleAction Class

The TangleAction defines the action of tangling. This operation logs a message, and invokes the tangle() method of the Web instance, passing it this action's theTangler attribute.

Usage

An instance is created during parsing of input parameters. The instance of this class is one of the standard actions available; it is the "exclude weaving" option, and it is also an element of the "do everything" macro.

Design

This class overrides the __call__() method of the superclass.

Implementation

TangleAction subclass initiates the tangle action (142) =


class TangleAction( Action ):
    """An action that tangles the output files."""
    def __init__( self ):
        super( TangleAction, self ).__init__( "Tangle" )
        self.theTangler= None
    →TangleAction call method does tangling of the output files (143)
    →TangleAction summary method provides total lines tangled (144)

◊ TangleAction subclass initiates the tangle action (142). Used by Action class hierarchy - used to describe basic actions of the application (131); Base Class Definitions (1); pyweb.py (150).

Tangling can only raise an exception when a cross reference request (@f, @m or @u) occurs in a program code chunk. Program code chunks are defined with either @d or @o and use @{ @} brackets.

TangleAction call method does tangling of the output files (143) =


def __call__( self ):
    super( TangleAction, self ).__call__()
    try:
        self.web.tangle( self.theTangler )
    except Error as e:
        logger.error( 
            "Problems tangling outputs from %s (tangle files are faulty).",
            self.web.webFileName )
        raise

◊ TangleAction call method does tangling of the output files (143). Used by TangleAction subclass initiates the tangle action (142); Action class hierarchy - used to describe basic actions of the application (131); Base Class Definitions (1); pyweb.py (150).

The summary() method returns some basic processing statistics for the tangle action.

TangleAction summary method provides total lines tangled (144) =


def summary( self, *args ):
    if self.theTangler and self.theTangler.linesWritten > 0:
        return "%s %d lines in %0.1f sec." % ( self.name, self.theTangler.linesWritten, self.duration() )
    return "did not %s" % ( self.name, )

◊ TangleAction summary method provides total lines tangled (144). Used by TangleAction subclass initiates the tangle action (142); Action class hierarchy - used to describe basic actions of the application (131); Base Class Definitions (1); pyweb.py (150).

LoadAction Class

The LoadAction defines the action of loading the web structure. This action uses the application's webReader to actually do the load.

Usage

An instance is created during parsing of the input parameters. An instance of this class is part of each of the weave, tangle and "do everything" actions.

Design

This class overrides the __call__() method of the superclass.

Implementation

LoadAction subclass loads the document web (145) =


class LoadAction( Action ):
    """An action that loads the source web for a document."""
    def __init__( self ):
        super( LoadAction, self ).__init__( "Load" )
        self.web= None
        self.webReader= None
    def __str__( self ):
        return "Load [%s, %s]" % ( self.webReader, self.web )
    →LoadAction call method loads the input files (146)
    →LoadAction summary provides lines read (147)

◊ LoadAction subclass loads the document web (145). Used by Action class hierarchy - used to describe basic actions of the application (131); Base Class Definitions (1); pyweb.py (150).

Trying to load the web involves two steps, either of which can raise exceptions due to incorrect inputs.

The WebReader class load() method can raise exceptions for a number of syntax errors.
- Missing closing brackets (@}, @] or @>).
- Missing opening bracket (@{ or @[) after a chunk name (@d or @o).
- Extra brackets (@{, @[, @}, @]).
- Extra @|.
- The input file does not exist or is not readable.
The Web class createUsedBy() method can raise an exception when a chunk reference cannot be resolved to a named chunk.

LoadAction call method loads the input files (146) =


def __call__( self ):
    super( LoadAction, self ).__call__()
    try:
        self.webReader.web(self.web).load()
        self.web.createUsedBy()
    except (Error,IOError) as e:
        logger.error(
            "Problems with source file %s, no output produced.",
            self.web.webFileName )
        raise

◊ LoadAction call method loads the input files (146). Used by LoadAction subclass loads the document web (145); Action class hierarchy - used to describe basic actions of the application (131); Base Class Definitions (1); pyweb.py (150).
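The log-and-re-raise pattern shared by the load, weave and tangle actions can be sketched in isolation. WebError is a stand-in for pyWeb's Error class, and run_step, failing_step and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("sketch")


class WebError(Exception):
    """Stand-in for pyWeb's Error class."""


def run_step(step, source_name):
    """Run one step; log a contextual message on failure and re-raise."""
    try:
        return step()
    except (WebError, IOError):
        logger.error("Problems with source file %s, no output produced.", source_name)
        raise   # a bare raise keeps the original exception and traceback


def failing_step():
    raise WebError("missing @}")


try:
    run_step(failing_step, "example.w")
    outcome = "completed"
except WebError as e:
    outcome = "failed: %s" % e
```

The action adds context to the log, while the decision about how to stop is left to whoever called it.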

The summary() method returns some basic processing statistics for the load action.

LoadAction summary provides lines read (147) =


def summary( self, *args ):
    return "%s %d lines from %d files in %0.1f sec." % ( 
        self.name, self.webReader.totalLines, 
        self.webReader.totalFiles, self.duration() )

◊ LoadAction summary provides lines read (147). Used by LoadAction subclass loads the document web (145); Action class hierarchy - used to describe basic actions of the application (131); Base Class Definitions (1); pyweb.py (150).

Module Components

Globals

It's convenient for a module, as a whole, to have a master logger. Individual classes may also have loggers, but it's helpful to have a global, default, logger.

Module Initialization of global variables (148) =


import logging
logger= logging.getLogger( "pyweb" )

◊ Module Initialization of global variables (148). Used by pyweb.py (150).

Additionally, the global list of weavers will be used by the Application.

Module Initialization of global variables (149) +=


# Module global list of available weavers.
weavers = {
    'html':  HTML(),
    'htmls': HTMLShort(),
    'latex': LaTeX(),
    'rst': Weaver(), # Generic Weaver produces RST.
}

◊ Module Initialization of global variables (149). Used by pyweb.py (150).
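
A name-to-instance registry of this kind can be sketched as follows. The Weaver and HTML classes here are hypothetical stand-ins with a single attribute, and pick_weaver() with its fallback to a default is an assumption of the sketch; pyWeb itself indexes the dictionary directly.

```python
class Weaver(object):
    """Hypothetical stand-in for pyweb.Weaver."""
    extension = ".rst"

class HTML(Weaver):
    """Hypothetical stand-in for pyweb.HTML."""
    extension = ".html"

# Registry keyed by the name given to the -w option.
weavers = {
    'html': HTML(),
    'rst': Weaver(),
}

def pick_weaver(name, default='rst'):
    """Return the registered weaver for name, or the default one if unknown."""
    return weavers.get(name, weavers[default])

class MyWeaver(HTML):
    """Hypothetical customized weaver."""
    extension = ".xhtml"

# Registering a new instance makes it selectable by name.
weavers['myweaver'] = MyWeaver()
```

Because the registry is an ordinary module-level dictionary, another script can add entries before the command line is parsed.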

pyWeb Module File

The pyWeb application file is shown below:

pyweb.py (150) =

→Overheads (152)(153)(154)(155)
→Imports (12)(47)(118)(151)(156)
→Base Class Definitions (1)
→Application Class (157)
→Module Initialization of global variables (148)(149)
→Interface Functions (161)

◊ pyweb.py (150).

The overhead elements are described in separate subsections as follows:

shell escape
from future imports
doc string
CVS cruft
imports

The more important elements are described in separate sections:

Base Class Definitions
Application Class and Main Functions
Module Initialization
Interface Functions

Python Library Imports

The following Python library modules are used by this application.

The sys module provides access to the command line arguments.
The os module provides OS-specific file and path manipulations; it is used to transform the input file name into the output file name as well as track down file modification times.
The re module provides regular expressions; these are used to parse the input file.
The time module provides a handy current-time string; this is used by the HTML Weaver to write a closing timestamp on generated HTML files, as well as in log messages.

Imports (151) +=

import sys
import os
import re
import time

◊ Imports (151). Used by pyweb.py (150).

Overheads

The shell escape is provided so that the user can mark this file as executable and launch it directly from their shell. When the file is executed, the operating system reads its first line; when it finds the '#!' shell escape, the remainder of the line is taken as the path to the binary program that should be run. That binary is run with the path of this file as an argument.

Overheads (152) =

#!/usr/bin/env python

◊ Overheads (152). Used by pyweb.py (150).

The from-future imports allow us to get ready for Python 3.0 compatibility. They also limit us to versions of Python that support these __future__ features, which means at least Python 2.6.

Overheads (153) +=

from __future__ import print_function

◊ Overheads (153). Used by pyweb.py (150).

A Python __doc__ string provides a standard vehicle for documenting the module or the application program. The usual style is to provide a one-sentence summary on the first line. This is followed by more detailed usage information.

Overheads (154) +=

"""pyWeb Literate Programming - tangle and weave tool.

Yet another simple literate programming tool derived from nuweb, 
implemented entirely in Python.  
This produces any markup for any programming language.

Usage:
    pyweb.py [-dvs] [-c x] [-w format] file.w

Options:
    -v           verbose output (the default)
    -s           silent output
    -d           debugging output
    -c x         change the command character from '@' to x
    -w format    Use the given weaver for the final document.
                 The default is based on the input file, a leading '<'
                 indicates HTML, otherwise LaTeX.
                 choices are 'html', 'htmls', 'latex', 'rst'.
    -xw          Exclude weaving
    -xt          Exclude tangling
    -pi          Permit include-command errors
    
    file.w       The input file, with @o, @d, @i, @[, @{, @|, @<, @f, @m, @u commands.
"""

◊ Overheads (154). Used by pyweb.py (150).

The keyword cruft is a standard way of placing version control information into a Python module so it is preserved. See PEP (Python Enhancement Proposal) #8 for information on recommended styles.

We also sneak in a "DO NOT EDIT" warning that belongs in all generated application source files.

Overheads (155) +=

__version__ = """$Revision$"""

### DO NOT EDIT THIS FILE!
### It was created by /Users/slott/Documents/Projects/pyWeb-2.1/pyweb/pyweb.py, __version__='$Revision$'.
### From source pyweb.w modified Sat Feb 27 07:18:21 2010.
### In working directory '/Users/slott/Documents/Projects/pyWeb-2.1/pyweb'.

◊ Overheads (155). Used by pyweb.py (150).

The Application Class

Design

The Application class is provided so that the Action instances have an overall application to update. This allows the WeaveAction to provide the selected Weaver instance to the application. It also provides a central location for the various options and alternatives that might be accepted from the command line.

The constructor sets the default options for weaving and tangling.

The parseArgs() method uses the sys.argv sequence to parse the command line arguments and update the options. This allows a program to pre-process the arguments, passing other arguments to this module.

The process() method processes a list of files. This is either the list of files passed as an argument, or it is the list of files parsed by the parseArgs() method.

The parseArgs() and process() methods are separated so that another application can import pyweb, bypass command-line parsing, yet still perform the basic actions simply and consistently.

For example:

import pyweb, optparse
p= optparse.OptionParser()
# ... option definitions for the wrapping application go here ...
options, args = p.parse_args()
a= pyweb.Application()
# ... configure the Application based on options ...
a.process( args )

The main() function creates an Application instance and calls the parseArgs() and process() methods to provide the expected default behavior for this module when it is used as the main program.

Implementation

Imports (156) +=

import optparse

◊ Imports (156). Used by pyweb.py (150).

Application Class (157) =


class Application( object ):
    def __init__( self ):
        →Application default options (158)
    →Application parse command line (159)
    →Application class process all files (160)

◊ Application Class (157). Used by pyweb.py (150).

The first part of parsing the command line is setting default values that apply when parameters are omitted. The default values are set as follows:

theTangler is set to a TanglerMake instance to create the output files.
theWeaver is set to None so that the input language will be used to select an appropriate weaver.
commandChar is set to @ as the default command introducer.
doWeave and doTangle are ActionSequence instances that describe two use cases: load and weave only, and load and tangle only.
theAction is an instance of Action that describes the default overall action: load, tangle and weave. This is the default unless overridden by an option.
permitList provides a list of commands that are permitted to fail. Typically this is empty, or contains @i to allow the include command to fail.
files is the final list of argument files from the command line; these will be processed unless overridden in the call to process().
webReader is the WebReader instance created for the current input file; it is not set here, but is created by process() for each file.

Application default options (158) =


self.theTangler= TanglerMake()
self.theWeaver= None
self.permitList= []
self.commandChar= '@'
self.loadOp= LoadAction()
self.weaveOp= WeaveAction()
self.tangleOp= TangleAction()
self.doWeave= ActionSequence( "load and weave", [self.loadOp, self.weaveOp] )
self.doTangle= ActionSequence( "load and tangle", [self.loadOp, self.tangleOp] )
self.theAction= ActionSequence( "load, tangle and weave", [self.loadOp, self.tangleOp, self.weaveOp] )
self.files= []

◊ Application default options (158). Used by Application Class (157); pyweb.py (150).
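
The way the three basic operations are combined into alternative sequences can be sketched with simplified classes. The Action and ActionSequence classes below are hypothetical reductions written for this illustration (the real ones are defined in earlier chunks); in particular, returning a list of step names is an assumption made so the ordering is easy to see.

```python
class Action(object):
    """Hypothetical minimal action: a named, callable step."""
    def __init__(self, name):
        self.name = name
        self.web = None
    def __call__(self):
        return [self.name]

class ActionSequence(Action):
    """Hypothetical composite action: runs its child actions in order."""
    def __init__(self, name, opSequence=None):
        super(ActionSequence, self).__init__(name)
        self.opSequence = list(opSequence or [])
    def __call__(self):
        performed = []
        for op in self.opSequence:
            op.web = self.web  # every step works on the same web
            performed.extend(op())
        return performed

loadOp, tangleOp, weaveOp = Action("load"), Action("tangle"), Action("weave")
doWeave = ActionSequence("load and weave", [loadOp, weaveOp])
doTangle = ActionSequence("load and tangle", [loadOp, tangleOp])
theAction = ActionSequence("load, tangle and weave", [loadOp, tangleOp, weaveOp])
```

Sharing the same loadOp, tangleOp and weaveOp instances across the sequences means that configuring one operation affects every sequence that includes it.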

The algorithm for parsing the command line parameters uses the built-in optparse module. We have to build a parser, define the options, provide default values, and then parse the command-line arguments.

Application parse command line (159) =


def parseArgs( self ):
    p = optparse.OptionParser()
    p.add_option( "-v", "--verbose", dest="verbosity", action="store_const", const=logging.INFO )
    p.add_option( "-s", "--silent", dest="verbosity", action="store_const", const=logging.WARN )
    p.add_option( "-d", "--debug", dest="verbosity", action="store_const", const=logging.DEBUG )
    p.add_option( "-c", "--command", dest="command", action="store" )
    p.add_option( "-w", "--weaver", dest="weaver", action="store" )
    p.add_option( "-x", "--except", dest="skip", action="store" )
    p.add_option( "-p", "--permit", dest="permit", action="store" )
    opts, self.files= p.parse_args()
    if opts.command:
        logger.info( "Setting command character to %r", opts.command )
        self.commandChar= opts.command
    if opts.weaver:
        self.theWeaver= weavers[ opts.weaver ]
        logger.info( "Setting weaver to %s", self.theWeaver )
    if opts.skip:
        if opts.skip.lower().startswith('w'): # skip weaving
            self.theAction= self.doTangle
        elif opts.skip.lower().startswith('t'): # skip tangling
            self.theAction= self.doWeave
        else:
            raise Exception( "Unknown -x option %r" % opts.skip )
    if opts.permit:
        # save permitted errors, usual case is -pi to permit include errors
        self.permitList= [ '%s%s' % ( self.commandChar, c ) for c in opts.permit ]
    if opts.verbosity:
        logger.setLevel( opts.verbosity )
    self.options= opts

◊ Application parse command line (159). Used by Application Class (157); pyweb.py (150).
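
The option definitions can be exercised without touching sys.argv by passing an explicit argument list to parse_args(). The sketch below repeats a subset of the options from the chunk above; build_parser() and the sample argument list are assumptions of the example.

```python
import logging
import optparse

def build_parser():
    """Build a parser with a subset of the options defined above."""
    p = optparse.OptionParser()
    # -v, -s and -d share one destination; the last one given wins.
    p.add_option("-v", "--verbose", dest="verbosity",
                 action="store_const", const=logging.INFO)
    p.add_option("-s", "--silent", dest="verbosity",
                 action="store_const", const=logging.WARNING)
    p.add_option("-d", "--debug", dest="verbosity",
                 action="store_const", const=logging.DEBUG)
    p.add_option("-x", "--except", dest="skip", action="store")
    p.add_option("-p", "--permit", dest="permit", action="store")
    return p

# An explicit list is parsed instead of sys.argv[1:].
opts, files = build_parser().parse_args(["-d", "-xw", "-pi", "pyweb.w"])
```

Here "-xw" is the -x option with the attached value "w", and "-pi" is the -p option with the value "i"; this is how the -xw, -xt and -pi forms in the usage text are handled by two ordinary options.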

The process() function uses the current Application settings to process each file as follows:

Create a new WebReader for the Application, providing the parameters required to process the input file.
Create a Web instance, w, and set the Web's sourceFileName from the WebReader's fileName.
Perform the given command, typically an ActionSequence, which does some combination of loading the web, tangling the output files and weaving the final document in the target language; if necessary, examine the Web to determine the documentation language.
Print a performance summary line that shows lines processed per second.

In the event of failure in any of the major processing steps, a summary message is produced to clarify the state of the output files, and the exception is re-raised by the action. The re-raising is done so that all exceptions reach process(), which logs Error and IOError exceptions and then moves on to the next file.

Application class process all files (160) =


def process( self, theFiles=None ):
    self.weaveOp.theWeaver= self.theWeaver
    self.tangleOp.theTangler= self.theTangler
    for f in theFiles or self.files:
        w= Web( f ) # A web to work on.
        try:
            with open(f,"r") as source:
                logger.info( "Reading %r", f )
                webReader= WebReader( command=self.commandChar, permit=self.permitList )
                webReader.source( f, source ).web( w )
                self.loadOp.webReader= webReader
                self.theAction.web= w
                self.theAction()
        except (Error, IOError) as e:
            logger.exception( e )
        logger.info( 'pyWeb: %s', self.theAction.summary(w,self) )

◊ Application class process all files (160). Used by Application Class (157); pyweb.py (150).
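
The per-file error handling in process() can be sketched separately from the rest of the application. In this illustration the Error class, process_all() and sample_action() are hypothetical names, and returning the list of completed files is an assumption added to make the behavior observable.

```python
import logging

logger = logging.getLogger("pyweb.example")

class Error(Exception):
    """Hypothetical stand-in for pyWeb's Error class."""

def process_all(file_names, action):
    """Apply action to each file; log failures and continue with the next."""
    completed = []
    for name in file_names:
        try:
            action(name)
            completed.append(name)
        except (Error, IOError) as e:
            # logger.exception logs at ERROR level and adds the traceback.
            logger.exception(e)
    return completed

def sample_action(name):
    """Hypothetical action that fails for files ending in .bad."""
    if name.endswith(".bad"):
        raise Error("cannot resolve a chunk reference in %s" % name)
```

A failure in one file is reported and does not prevent the remaining files from being processed.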

The Main Function

The top-level interface is the main() function. This function creates an Application instance.

The Application object parses the command-line arguments. Then the Application object does the requested processing. This two-step process allows for some dependency injection to customize argument processing.

Interface Functions (161) =


def main():
    logging.basicConfig( stream=sys.stderr, level=logging.INFO )
    logging.getLogger( "pyweb.TanglerMake" ).setLevel( logging.WARN )
    logging.getLogger( "pyweb.WebReader" ).setLevel( logging.WARN )
    a= Application()
    a.parseArgs()
    a.process()
    logging.shutdown()

if __name__ == "__main__":
    main( )

◊ Interface Functions (161). Used by pyweb.py (150).

This can be extended by doing something like the following.

Subclass Weaver to create a variant with different templates.
Update the pyweb.weavers dictionary.
Call pyweb.main() to run the existing main program with extra classes available to it.

import pyweb
class MyWeaver( pyweb.HTML ):
    pass # Any template changes go here.

pyweb.weavers['myweaver']= MyWeaver()
pyweb.main()

This will create a variant on pyWeb that will handle a different weaver via the command-line option -w myweaver.

Additional Scripts

Two additional scripts are provided as examples which can be customized.

`tangle.py` Script

This script shows a simple version of Tangling. This has a permitted error for '@i' commands to allow an include file (for example test results) to be omitted from the tangle operation.

tangle.py (162) =

#!/usr/bin/env python
"""Sample tangle.py script."""
import pyweb
import logging, sys

logging.basicConfig( stream=sys.stderr, level=logging.INFO )
logger= logging.getLogger(__file__)

w= pyweb.Web( "pyweb.w" ) # The web we'll work on.

permitList= ['@i']
commandChar= '@'
load= pyweb.LoadAction()
load.webReader= pyweb.WebReader( command=commandChar, permit=permitList )
load.webReader.web( w ).source( "pyweb.w" )
load.web= w
load()
logger.info( load.summary() )

tangle= pyweb.TangleAction()
tangle.theTangler= pyweb.TanglerMake()
tangle.web= w
tangle()
logger.info( tangle.summary() )

◊ tangle.py (162).

`weave.py` Script

This script shows a simple version of Weaving. This shows how to define a customized set of templates for a different markup language.

A customized weaver generally has three parts.

weave.py (163) =

→weave.py overheads for correct operation of a script (164)
→weave.py weaver definition to customize the Weaver being used (165)
→weaver.py actions to load and weave the document (166)

◊ weave.py (163).

weave.py overheads for correct operation of a script (164) =

#!/usr/bin/env python
"""Sample weave.py script."""
import pyweb
import logging, sys, string

logging.basicConfig( stream=sys.stderr, level=logging.INFO )
logger= logging.getLogger(__file__)

◊ weave.py overheads for correct operation of a script (164). Used by weave.py (163).

weave.py weaver definition to customize the Weaver being used (165) =


class MyHTML( pyweb.HTML ):
    """HTML formatting templates."""
    extension= ".html"
    
    cb_template= string.Template("""<a name="pyweb${seq}"></a>
    <!--line number ${lineNumber}-->
    <p><em>${fullName}</em> (${seq})&nbsp;${concat}</p>
    <code><pre>\n""")

    ce_template= string.Template("""
    </pre></code>
    <p>&loz; <em>${fullName}</em> (${seq}).
    ${references}
    </p>\n""")
        
    fb_template= string.Template("""<a name="pyweb${seq}"></a>
    <!--line number ${lineNumber}-->
    <p><tt>${fullName}</tt> (${seq})&nbsp;${concat}</p>
    <code><pre>\n""") # Prevent indent
        
    fe_template= string.Template( """</pre></code>
    <p>&loz; <tt>${fullName}</tt> (${seq}).
    ${references}
    </p>\n""")
        
    ref_item_template = string.Template(
    '<a href="#pyweb${seq}"><em>${fullName}</em>&nbsp;(${seq})</a>'
    )
    
    ref_template = string.Template( '  Used by ${refList}.'  )
            
    refto_name_template = string.Template(
    '<a href="#pyweb${seq}">&rarr;<em>${fullName}</em>&nbsp;(${seq})</a>'
    )
    refto_seq_template = string.Template( '<a href="#pyweb${seq}">(${seq})</a>' )
 
    xref_head_template = string.Template( "<dl>\n" )
    xref_foot_template = string.Template( "</dl>\n" )
    xref_item_template = string.Template( "<dt>${fullName}</dt><dd>${refList}</dd>\n" )
    
    name_def_template = string.Template( '<a href="#pyweb${seq}"><b>&bull;${seq}</b></a>' )
    name_ref_template = string.Template( '<a href="#pyweb${seq}">${seq}</a>' )

◊ weave.py weaver definition to customize the Weaver being used (165). Used by weave.py (163).
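
Each of these templates is a standard string.Template, filled in by keyword substitution. The sketch below uses the same text as ref_item_template above; the sample values for seq and fullName are chosen for the example.

```python
import string

# Same text as ref_item_template in the class above.
ref_item_template = string.Template(
    '<a href="#pyweb${seq}"><em>${fullName}</em>&nbsp;(${seq})</a>'
)

# substitute() raises KeyError when a placeholder has no value;
# safe_substitute() leaves unknown placeholders in place instead.
link = ref_item_template.substitute(seq=150, fullName="pyweb.py")
partial = ref_item_template.safe_substitute(seq=150)
```

Only the ${...} placeholders are interpreted, so HTML entities such as &nbsp; pass through unchanged.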

weaver.py actions to load and weave the document (166) =


w= pyweb.Web( "pyweb.w" ) # The web we'll work on.

permitList= []
commandChar= '@'
load= pyweb.LoadAction()
load.webReader=  pyweb.WebReader( command=commandChar, permit=permitList )
load.webReader.web( w ).source( "pyweb.w" )
load.web= w
load()
logger.info( load.summary() )

weave= pyweb.WeaveAction()
weave.theWeaver= MyHTML()
weave.web= w
weave()
logger.info( weave.summary() )

◊ weaver.py actions to load and weave the document (166). Used by weave.py (163).

Administrative Elements

In order to support a pleasant installation, the setup.py file is helpful.

setup.py (167) =

#!/usr/bin/env python
"""Setup for pyWeb."""

from distutils.core import setup

setup(name='pyweb',
      version='2.1',
      description='pyWeb 2.1: In Python, Yet Another Literate Programming Tool',
      author='S. Lott',
      author_email='s_lott@yahoo.com',
      url='http://slott-softwarearchitect.blogspot.com/',
      py_modules=['pyweb'],
      classifiers=[
      'Intended Audience :: Developers',
      'Topic :: Documentation',
      'Topic :: Software Development :: Documentation', 
      'Topic :: Text Processing :: Markup',
      ]
   )

◊ setup.py (167).

In order to build a source distribution kit, the setup.py sdist command requires a MANIFEST. We can either list all files or provide a MANIFEST.in that specifies additional rules. We use a simple inclusion to augment the default manifest rules.

MANIFEST.in (168) =

include *.w *.css *.html
include test/*.w test/*.css test/*.html test/*.py

◊ MANIFEST.in (168).

Generally, a README is also considered to be good form.

README (169) =

pyWeb 2.1: In Python, Yet Another Literate Programming Tool

Literate programming is an attempt to reconcile the opposing needs
of clear presentation to people with the technical issues of 
creating code that will work with our current set of tools.

Presentation to people requires extensive and sophisticated typesetting
techniques.  Further, the "narrative arc" of a presentation may not 
follow the source code as laid out for the compiler.

pyWeb is a literate programming tool that combines the actions
of weaving a document with tangling source files.
It is independent of any particular document markup or source language.
It uses a simple set of markup tags to define chunks of code and 
documentation.

The pyweb.w file is the source for the various pyweb module and script files, plus
the pyweb.html file.  The various source code files are created by applying a 
tangle operation to the .w file.  The final documentation is created by
applying a weave operation to the .w file.

Installation
-------------

::

    python setup.py install

This will install the pyweb module.  

Authoring
---------

The pyweb document describes the simple markup used to define code chunks
and assemble those code chunks into a coherent document as well as working code.

Operation
---------

You can then run pyweb with

::

    python -m pyweb pyweb.w 

This will create the various output files from the source .w file.

-   pyweb.html is the final woven document.

-   pyweb.py, tangle.py, weave.py, README, setup.py and MANIFEST.in are tangled output files.

Testing
-------

The test directory includes pyweb_test.w, which will create a 
complete test suite.

This weaves a pyweb_test.html file.

This tangles several test modules:  test.py, test_tangler.py, test_weaver.py,
test_loader.py and test_unit.py.  Running the test.py module will include and
execute all 71 tests.

◊ README (169).

Indices

Files

MANIFEST.in: (168)
README: (169)
pyweb.py: (150)
setup.py: (167)
tangle.py: (162)
weave.py: (163)

Macros

Action call method actually performs the action: (133)
Action class hierarchy - used to describe basic actions of the application: (131)
Action final summary method: (134)
Action superclass has common features of all actions: (132)
ActionSequence append adds a new action to the sequence: (137)
ActionSequence call method delegates the sequence of ations: (136)
ActionSequence subclass that holds a sequence of other actions: (135)
ActionSequence summary summarizes each step: (138)
Application Class: (157)
Application class process all files: (160)
Application default options: (158)
Application parse command line: (159)
Base Class Definitions: (1)
Chunk add to the web: (55)
Chunk append a command: (53)
Chunk append text: (54)
Chunk class: (52)
Chunk class hierarchy - used to describe input chunks: (51)
Chunk examination: starts with, matches pattern: (57)
Chunk generate references from this Chunk: (59)
Chunk references to this Chunk: (60)
Chunk search for user identifiers in each child command: (58)
Chunk superclass make Content definition: (56)
Chunk tangle this Chunk into a code file: (62)
Chunk weave this Chunk into the documentation: (61)
CodeCommand class to contain a program source code block: (80)
Command analysis features: starts-with and Regular Expression search: (77)
Command class hierarchy - used to describe individual commands: (75)
Command superclass: (76)
Command tangle and weave functions: (78)
Emitter class hierarchy - used to control output files: (2)
Emitter core open, close and write: (4)
Emitter doClose, to be overridden by subclasses: (6)
Emitter doOpen, to be overridden by subclasses: (5)
Emitter doWrite, to be overridden by subclasses: (7)
Emitter indent control: set, clear and reset: (11)
Emitter superclass: (3)
Emitter write a block of code: (8) (9) (10)
Error class - defines the errors raised: (90)
FileXrefCommand class for an output file cross-reference: (82)
HTML code chunk begin: (33)
HTML code chunk end: (34)
HTML output file begin: (35)
HTML output file end: (36)
HTML reference to a chunk: (39)
HTML references summary at the end of a chunk: (37)
HTML short references summary at the end of a chunk: (42)
HTML simple cross reference markup: (40)
HTML subclass of Weaver: (31) (32)
HTML write a line of code: (38)
HTML write user id cross reference line: (41)
Imports: (12) (47) (118) (151) (156)
Interface Functions: (161)
LaTeX code chunk begin: (24)
LaTeX code chunk end: (25)
LaTeX file output begin: (26)
LaTeX file output end: (27)
LaTeX reference to a chunk: (30)
LaTeX references summary at the end of a chunk: (28)
LaTeX subclass of Weaver: (23)
LaTeX write a line of code: (29)
LoadAction call method loads the input files: (146)
LoadAction subclass loads the document web: (145)
LoadAction summary provides lines read: (147)
MacroXrefCommand class for a named chunk cross-reference: (83)
Module Initialization of global variables: (148) (149)
NamedChunk add to the web: (65)
NamedChunk class: (63)
NamedChunk tangle into the source file: (67)
NamedChunk user identifiers set and get: (64)
NamedChunk weave: (66)
NamedDocumentChunk class: (72)
NamedDocumentChunk tangle: (74)
NamedDocumentChunk weave: (73)
OutputChunk add to the web: (69)
OutputChunk class: (68)
OutputChunk tangle: (71)
OutputChunk weave: (70)
Overheads: (152) (153) (154) (155)
Reference class hierarchy - references to a chunk: (91) (92) (93)
ReferenceCommand class for chunk references: (85)
ReferenceCommand refers to a chunk: (87)
ReferenceCommand resolve a referenced chunk name: (86)
ReferenceCommand tangle a referenced chunk: (89)
ReferenceCommand weave a reference to a chunk: (88)
TangleAction call method does tangling of the output files: (143)
TangleAction subclass initiates the tangle action: (142)
TangleAction summary method provides total lines tangled: (144)
Tangler code chunk begin: (45)
Tangler code chunk end: (46)
Tangler doOpen, doClose and doWrite overrides: (44)
Tangler subclass of Emitter to create source files with no markup: (43)
Tangler subclass which is make-sensitive: (48)
TanglerMake doClose override, comparing temporary to original: (50)
TanglerMake doOpen override, using a temporary file: (49)
TextCommand class to contain a document text block: (79)
UserIdXrefCommand class for a user identifier cross-reference: (84)
WeaveAction call method does weaving of the document file: (140)
WeaveAction subclass initiates the weave action: (139)
WeaveAction summary method provides line counts: (141)
Weaver code chunk begin-end: (18)
Weaver cross reference output methods: (21) (22)
Weaver doOpen, doClose and doWrite overrides: (14)
Weaver document chunk begin-end: (16)
Weaver file chunk begin-end: (19)
Weaver quoted characters: (15)
Weaver reference command output: (20)
Weaver reference summary, used by code chunk and file chunk: (17)
Weaver subclass of Emitter to create documentation with fancy markup and escapes: (13)
Web Chunk check reference counts are all one: (103)
Web Chunk cross reference methods: (102) (104) (105) (106)
Web Chunk name resolution methods: (100) (101)
Web add a named macro chunk: (98)
Web add an anonymous chunk: (97)
Web add an output file definition chunk: (99)
Web add full chunk names, ignoring abbreviated names: (96)
Web class - describes the overall "web" of chunks: (94)
Web construction methods used by Chunks and WebReader: (95)
Web determination of the language from the first chunk: (109)
Web tangle the output files: (110)
Web weave the output document: (111)
WebReader class - parses the input file, building the Web structure: (112)
WebReader command literals: (130)
WebReader fluent property-like methods: (113)
WebReader handle a command string: (116)
WebReader load the web: (128)
WebReader location in the input stream: (115)
WebReader tokenize the input: (114)
XrefCommand superclass for all cross-reference commands: (81)
add a reference command to the current chunk: (125)
add an expression command to the current chunk: (126)
assign user identifiers to the current chunk: (124)
collect all user identifiers from a given map into ux: (107)
double at-sign replacement, append this character to previous TextCommand: (127)
find user identifier usage and update ux from the given map: (108)
finish a chunk, start a new Chunk adding it to the web: (122)
import another file: (121)
major commands segment the input into separate Chunks: (117)
minor commands add Commands to the current Chunk: (123)
other command-like sequences are appended as a TextCommand: (129)
start a NamedChunk or NamedDocumentChunk, adding it to the web: (120)
start an OutputChunk, adding it to the web: (119)
weave.py overheads for correct operation of a script: (164)
weave.py weaver definition to customize the Weaver being used: (165)
weaver.py actions to load and weave the document: (166)

User Identifiers

Action: •132 135 139 142 145
ActionSequence: •135 158
Application: •157 161
Chunk: •52 59 63 89 94 102 110 121 122 125 128
CodeCommand: 63 •80
Command: 53 •76 79 81 85 89
Emitter: •3 13 43
Error: 59 61 62 66 67 70 73 74 81 89 •90 98 100 101 110 111 116 121 124 128 140 143 146 160
FileXrefCommand: •82 123
HTML: 17 19 21 22 31 •32 109 149 154 165
LaTeX: 18 •23 109 149 154
LoadAction: •145 146 158 162 166
MacroXrefCommand: •83 123
NamedChunk: •63 68 72 120 124
NamedDocumentChunk: •72 120
OutputChunk: •68 71 119
ReferenceCommand: •85 125
TangleAction: •142 143 158 162
Tangler: •43 48
TanglerMake: •48 158 161 162
TextCommand: 54 56 67 72 •79 80
UserIdXrefCommand: •84 123
WeaveAction: •139 140 158 166
Weaver: •13 23 31 109 149
Web: 55 65 69 •94 160 162 166
WebReader: 89 •112 121 160 161 162 166
XrefCommand: •81 82 83 84
__version__: •155
_gatherUserId: •106
_updateUserId: •106
add: 55 •97
addDefName: •96 98 125
addNamed: 65 •98
addOutput: 69 •99
append: 11 53 54 93 97 98 99 102 108 114 123 125 •137
appendText: •54 125 126 127 128 129
chunkXref: 83 •105
close: •4 14 44 50 110 111
clrIndent: •11 66 89
codeBegin: •18 45 66 67
codeBlock: •8 66 80
codeEnd: •18 46 66 67
codeFinish: 4 •10
codeLine: •9
createUsedBy: •102 146
doClose: 4 6 14 •44 50
doOpen: 4 5 14 •44 49
doWrite: 4 7 14 •44
docBegin: •16 61
docEnd: •16 61
duration: •134 141 144 147
expect: 119 120 125 126 •128
fileBegin: 19 •26 70
fileEnd: 19 •36 70
fileXref: 82 •105
filecmp: •47 50
formatXref: •81 82 83
fullNameFor: 66 70 86 96 •100 101 102
genReferences: •59 102
getUserIDRefs: •57 64 107
getchunk: 86 •101 102 110 111
handleCommand: •116 128
language: •109 140 154 169
lineNumber: 18 19 33 35 45 54 56 •57 63 67 72 76 79 81 85 112 114 115 123 125 126 127 128 129 165
load: 121 •128 146 158 162 166
location: •115 124 126 128
main: •161
makeContent: 54 56 •63 72
moreTokens: •114 128
multi_reference: 103 •104
nextToken: •114 119 120 121 124 125 126 128
no_definition: 103 •104
no_reference: 103 •104
open: •4 14 44 49 110 111 114 121 160
openSource: •114 128
optparse: •156 159
os: 14 50 •151
parseArgs: •159 161
perform: •146
process: 126 •160 161
pushBack: •114 121
quoted_chars: 9 15 29 •38
re: 108 114 •151
ref: 28 59 •78 87
referenceTo: 20 21 •39 66
references: •17 18 19 25 32 34 36 52 59 103 124 165
resetIndent: •11
resolve: 67 •86 87 88 89 101
searchForRE: 57 58 •77 79 108
setIndent: •11 66 89
setUserIDRefs: •64 124
shlex: •118 119
startswith: 57 •77 79 100 109 128 159
string: •12 17 18 19 20 21 22 24 25 28 30 33 34 35 36 37 39 40 41 42 164 165
summary: 134 138 141 144 •147 160 162 166
sys: 126 •151 161 162 164
tangle: 45 62 67 68 71 72 74 •78 79 80 81 89 110 143 154 158 162 169
tangleChunk: 89 •110
tempfile: •47 49
time: 133 134 •151
totalLines: 3 4 112 •114 121 147
usedBy: •87
userNamesXref: 84 •106
weave: 61 66 70 73 •78 79 80 82 83 84 88 111 140 154 158 164 166 169
weaveChunk: 88 •111
weaveReferenceTo: •61 66 73 111
weaveShortReferenceTo: •61 66 73 111
webAdd: 55 65 •69 119 120 121 122 128
write: •4 8 10 14 18 19 21 22 44 45 79
xrefDefLine: 22 •41 84
xrefFoot: 21 •40 81 84
xrefHead: 21 •40 81 84
xrefLine: 21 •40 81

Created by /Users/slott/Documents/Projects/pyWeb-2.1/pyweb/pyweb.py at Mon Mar 1 07:58:19 2010.

pyweb.__version__ '$Revision$'.

Source pyweb.w modified Sat Feb 27 07:18:21 2010.

Working directory '/Users/slott/Documents/Projects/pyWeb-2.1/pyweb'.