##########################################################################
What I love about Python == What I hate about the HTML mixed-content model
##########################################################################

:date: 2007-07-03 00:01:05
:category: `Open Source Projects `_

The `mixed content `__ model, defined succinctly in the XML standards, is pleasant enough for human communication, but leaves a lot to be desired. For example, mapping a `mixed content model to a relational database `__ is a hard problem.

The problem is made worse when the document is HTML. HTML doesn't have many constraints to begin with; it mixes structural and presentational markup; unless the content is prepared by a simple piece of software it may be wickedly inconsistent.

Python has the tools to make the problem solvable. It also has a world-view that facilitates solving the kind of problem where there are wicked little inconsistencies.

:strong:`Enter Python.` The problem is to scrape the content of some web pages to make a regularly structured database out of the stuff floating around in HTML. The :strong:`Cut, Revise and Paste` ™ (CRAP) technology, while available, is error-prone and hard to perform repeatably.

The information is mostly structured: the interesting content on the page has a