| date: | 2006-12-28 15:39:47 |
|---|---|
| category: | The Lure of XML |
I harp on Design Goal 6 in the XML 1.0 Standard: “XML documents should be human-legible and reasonably clear”. Over at Kontrawize, the response is that XML editors help meet this design goal: “There are plenty of good XML-aware editors around, some of which are free.” That's true, but I think it adheres to the letter of the standard while violating its spirit.
If we allow tools to mediate “human-legible and reasonably clear”, then too many things meet this standard. We could hand over a pile of SQL DDL and DML and claim it is a reasonably clear document. We could equally claim that an MS-Word .DOC file is reasonably clear because we have a copy of Word.
While it’s true that “anything written in XML is a first class piece of data,” I’m not clear on the origin of the distinction that follows: “Textual scripting languages are at best second class data.” I can’t discern why – precisely – a Domain Specific Language (DSL) or scripting solution isn’t first class. Presumably, the second-class status comes from one of these origins:
Dependence on tools
Parsing ease
Semantic richness
It is clear that dependence on a tool isn’t the reason for second-class status: we’re allowed to use tools to make XML legible, and we’re equally allowed to use tools to process the XML or the script. What’s left are the parsing-ease and semantic-richness advantages of XML.
Parsing Ease.
This is an interesting point, especially in light of the posting “When do you use XML, again?”, where XML was used principally for ease of parsing. This is – in a way – a little silly. A scripting language has its own parser: the script interpreter. So ease of parsing isn’t a great reason for using XML. Other DSLs, however, may require additional software for parsing.
Python, Ruby, and every other object-oriented scripting language has its own parser, every bit as good as Expat or Xerces. If you choose a free scripting language, you also get the parser and substantial libraries for free. A DSL that isn’t script-based gets no such handy parser.
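To make the “free parser” point concrete, here is a minimal Python sketch. The definition text and the `Target` name are hypothetical. The only real machinery is the standard library’s `ast.parse`, which returns a syntax tree for the script without any grammar or parser of our own. The script is only parsed here, not executed, so `Target` doesn’t even need to exist.

```python
import ast
from typing import List

# A hypothetical build definition written as ordinary Python statements.
DEFINITION = '''
compile_step = Target("classes", depends=[], action="javac src/*.java")
jar_step = Target("app.jar", depends=[compile_step], action="jar cf app.jar classes")
'''


def declared_targets(source: str) -> List[str]:
    """List the names passed to Target(...) in top-level assignments."""
    tree = ast.parse(source)  # Python's own parser; no grammar of our own needed
    names = []
    for stmt in tree.body:
        if not isinstance(stmt, ast.Assign):
            continue
        call = stmt.value
        if (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
                and call.func.id == "Target" and call.args
                and isinstance(call.args[0], ast.Constant)):
            names.append(call.args[0].value)
    return names


print(declared_targets(DEFINITION))  # ['classes', 'app.jar']
```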
Here’s the bonus for me: my definitions are part of the application. They aren’t input to an application that reads, parses, and then performs some functionality based on that input. The definitions are the application: essentially a specialization of the framework, directly executable.
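Here is a minimal sketch of “the definitions are the application”, assuming a hypothetical framework made of one `Target` class and a `build` function (neither comes from SCons or any other real library). The definition section is nothing but object-creation statements, and running the module runs the build, with no separate read-and-parse step.

```python
from typing import Callable, List, Optional, Set


class Target:
    """Hypothetical framework class: a named step, its action, and its dependencies."""

    def __init__(self, name: str, action: Callable[[], None],
                 depends: Optional[List["Target"]] = None):
        self.name = name
        self.action = action
        self.depends = depends or []


def build(goal: Target, done: Optional[Set[str]] = None) -> List[str]:
    """Run dependencies depth-first, each target at most once; return the run order.

    This sketch does no cycle detection.
    """
    done = set() if done is None else done
    if goal.name in done:
        return []
    order: List[str] = []
    for dep in goal.depends:
        order.extend(build(dep, done))
    goal.action()
    done.add(goal.name)
    order.append(goal.name)
    return order


# --- The definitions: plain object creation that specializes the framework. ---
log: List[str] = []
compile_java = Target("compile", lambda: log.append("javac"))
package_jar = Target("jar", lambda: log.append("jar"), depends=[compile_java])
deploy_war = Target("war", lambda: log.append("war"),
                    depends=[package_jar, compile_java])

if __name__ == "__main__":
    print(build(deploy_war))  # ['compile', 'jar', 'war']
    print(log)                # ['javac', 'jar', 'war']
```

The actions here only append to a list. In a real framework they would compile, package, or copy files, and a new kind of step is just another callable or subclass.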
Semantic Depth.
Indeed, one of the strong points of script-based tools is that the data structure describing targets, dependencies and actions is often a simple bunch of object-creation statements. The resulting objects have precisely the same semantics as the XML used by Ant. The difference is that Python (or Ruby) does the parsing, not Expat or Xerces.
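As a sketch of the “same semantics” claim, the code below derives one dependency graph from a small Ant-style build file and another from plain Python object creation, then compares them. The `Target` class is hypothetical, and the XML uses only Ant’s `target` element with its `name` and comma-separated `depends` attributes. The XML side needs an XML parser (here the standard library’s `xml.etree.ElementTree`) plus code to interpret the attribute strings. The script side needs neither.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List

# A small Ant-style build file: targets and their comma-separated dependencies.
ANT_XML = """
<project name="demo" default="dist">
  <target name="compile"/>
  <target name="jar" depends="compile"/>
  <target name="dist" depends="compile,jar"/>
</project>
"""


def graph_from_xml(text: str) -> Dict[str, List[str]]:
    """Build a target -> dependency-names map from Ant-style XML."""
    root = ET.fromstring(text.strip())
    graph = {}
    for t in root.iter("target"):
        deps = t.get("depends", "")
        graph[t.get("name")] = [d.strip() for d in deps.split(",") if d.strip()]
    return graph


class Target:
    """Hypothetical script-side equivalent: creating the object is the declaration."""

    def __init__(self, name: str, depends: Iterable["Target"] = ()):
        self.name = name
        self.depends = list(depends)


compile_t = Target("compile")
jar_t = Target("jar", depends=[compile_t])
dist_t = Target("dist", depends=[compile_t, jar_t])


def graph_from_objects(targets: Iterable[Target]) -> Dict[str, List[str]]:
    """Build the same map directly from the Python objects."""
    return {t.name: [d.name for d in t.depends] for t in targets}


same = graph_from_xml(ANT_XML) == graph_from_objects([compile_t, jar_t, dist_t])
print(same)  # True
```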
I find that a bunch of Python objects serving as surrogates for other objects makes compelling sense. SCons is well suited to building things like Java, where compiling and building jar, ear and war files predominate. Each source is an object, as is each target. But SCons also describes static web content, where HTML files are built from Cheetah templates. It also describes a Data Warehouse load, where logs and exception reports are built from source application extracts.
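A rough sketch of “each source is an object, as is each target”, using hypothetical `Node` and `Build` classes rather than the real SCons API. It uses a make-style timestamp rule to decide when a target is out of date. SCons itself defaults to comparing content signatures, so treat this as an illustration of the object model, not of SCons internals. The `render` action merely concatenates its sources, standing in for a real Cheetah template expansion.

```python
import os
import tempfile
from typing import Callable, List


class Node:
    """Hypothetical stand-in for a build node: one file, either a source or a target."""

    def __init__(self, path: str):
        self.path = path

    def mtime(self) -> float:
        # A missing file reports -1.0 so that it always counts as out of date.
        return os.path.getmtime(self.path) if os.path.exists(self.path) else -1.0


class Build:
    """Hypothetical build step: produce one target node from source nodes via an action."""

    def __init__(self, target: Node, sources: List[Node],
                 action: Callable[[Node, List[Node]], None]):
        self.target = target
        self.sources = sources
        self.action = action

    def is_stale(self) -> bool:
        # make-style timestamp rule (SCons defaults to content signatures instead).
        built = self.target.mtime()
        return built < 0 or any(src.mtime() > built for src in self.sources)

    def run(self) -> bool:
        """Run the action only if the target is stale; report whether it ran."""
        if self.is_stale():
            self.action(self.target, self.sources)
            return True
        return False


def render(target: Node, sources: List[Node]) -> None:
    # Placeholder for a template expansion: just concatenates the sources.
    with open(target.path, "w") as out:
        for src in sources:
            with open(src.path) as f:
                out.write(f.read())


def demo() -> List[bool]:
    with tempfile.TemporaryDirectory() as d:
        tmpl = Node(os.path.join(d, "page.tmpl"))
        html = Node(os.path.join(d, "page.html"))
        with open(tmpl.path, "w") as f:
            f.write("<h1>Hello</h1>")
        step = Build(html, [tmpl], render)
        first = step.run()    # page.html does not exist yet
        second = step.run()   # page.html is at least as new as its source
        later = html.mtime() + 10
        os.utime(tmpl.path, (later, later))  # simulate editing the template
        third = step.run()    # template is now newer than page.html
        return [first, second, third]


if __name__ == "__main__":
    print(demo())  # [True, False, True]
```

Swapping `render` for a compiler call or an extract-and-load step changes nothing else, which is the point: web pages, jars and warehouse loads are all just target objects built from source objects.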
In short, a script-based SCons control file has the same semantics as a non-script XML control file.
What I Learned.
The “XML is first class” argument seems to mean that XML has an independent, widely agreed-upon existence, separate from any language community. From this I learned that there are several dimensions of comparison:
Tool Complexity
Clarity
Extensibility
Recommendations.
For my money, the low complexity, good clarity and immediate extensibility of a scripting solution make it an award-winning application of technology. The XML solution runs a distant second, and a purpose-built DSL has little to offer. A purpose-built DSL comes in dead last because its main asset, “clarity”, isn’t worth much: as Kontrawize points out, we can solve the obscurity problem with more tools. We can’t create extensibility or reduce complexity the same way.
Between the original “Stamp on the ants” (referencing the “Raven 1.1: Build Java with Ruby” thread) and Kontrawize’s “XML is first class, scripting languages are second class”, the lessons are similar.
In one case, more than one person suggested that we extract 20 million customer accounts as XML. The idea was to use XSLT transformations to implement a number of business rules for standardizing data representations. We could also link business entities with dimensions and identify the facts through another series of XSLT transformations. Finally, we would load the relational tables from the XML documents. Sigh. All that XML parsing and marshaling will paralyze processing. We’ll get nothing done – the heaviest CPU user will be Xalan, and our disks will be tied up with terabytes of XML source files that yield mere gigabytes of usable database.
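A back-of-the-envelope sketch of why the XML route inflates the data. It serializes one made-up five-field customer record both as XML (via `xml.etree.ElementTree`) and as a single CSV line (via the `csv` module) and compares the byte counts. The field names and values are invented for illustration. Real account records have many more fields, and the ratio depends entirely on tag-name length versus value length.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Invented customer record; field names and values are illustrative only.
RECORD = {
    "account_id": "000123456",
    "name": "J. Smith",
    "city": "Springfield",
    "state": "IL",
    "status": "ACTIVE",
}
ACCOUNTS = 20_000_000  # account count from the anecdote


def as_xml(rec: dict) -> bytes:
    """Serialize one record as <customer><field>value</field>...</customer>."""
    cust = ET.Element("customer")
    for key, value in rec.items():
        ET.SubElement(cust, key).text = value
    return ET.tostring(cust)


def as_delimited(rec: dict) -> bytes:
    """Serialize the same record as a single CSV line."""
    buf = io.StringIO()
    csv.writer(buf).writerow(rec.values())
    return buf.getvalue().encode("ascii")


xml_size = len(as_xml(RECORD))
flat_size = len(as_delimited(RECORD))

if __name__ == "__main__":
    print(f"per record: {xml_size} bytes of XML vs {flat_size} bytes delimited")
    print(f"x {ACCOUNTS:,} accounts: {xml_size * ACCOUNTS / 1e9:.1f} GB "
          f"vs {flat_size * ACCOUNTS / 1e9:.1f} GB")
```

Whatever the exact ratio, every one of those extra bytes has to be written, read back, and run through an XML parser before a single row reaches the database.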