The realization that programs are just another kind of data is fundamental to computing. However, while data is stored in uniform, extensible, easy-to-process formats like XML, programs are stored as more-or-less arbitrary sequences of ASCII tokens. This primitive representation makes programs, and new programming tools, needlessly difficult to create. Switching to a richer storage format would facilitate better development practices, and allow new ideas to enter mainstream use more quickly.
Mainstream programming languages are stuck in a rut
Inject information into tool chain at a limited number of points
Type of information that can be injected is not extensible
This is already changing
Unevenly, haphazardly, and un-self-consciously
Getting this right will revolutionize programming
And help research ideas move into the mainstream more quickly
Programs are data, data (can be) programs
1. Running programs are just bytes in memory
2. Source code can be---and should be---manipulated like any other text
3. Every moderately complex macro or configuration language is really a programming language
We're making some progress on #1
Java's biggest intellectual contribution is to bring reflection into the mainstream
But very little on #2
Most C/C++ programmers don't think of CPP as a text-to-text transformer
View code generators with suspicion and/or superstitious awe
CASE tools are the world's best selling shelfware
And we keep trying to pretend that #3 isn't true
Every tool configuration syntax eventually needs conditionals, repetition, functions, etc.
But our initial designs repeatedly try to avoid including them
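Reflection is the part of #1 that Java made mainstream: a running program can treat its own classes as data. A minimal sketch using the standard java.lang.reflect API; the class Sample and its members are hypothetical, chosen only to give the program something to inspect.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class to inspect; any class would do.
class Sample {
    private int count;
    String label;
    void reset() { count = 0; }
    int getCount() { return count; }
}

class ReflectDemo {
    // Names of the fields declared directly in cls, sorted because
    // getDeclaredFields() makes no ordering guarantee.
    static List<String> fieldNames(Class<?> cls) {
        List<String> names = new ArrayList<>();
        for (Field f : cls.getDeclaredFields()) names.add(f.getName());
        Collections.sort(names);
        return names;
    }

    // Names of the methods declared directly in cls, sorted.
    static List<String> methodNames(Class<?> cls) {
        List<String> names = new ArrayList<>();
        for (Method m : cls.getDeclaredMethods()) names.add(m.getName());
        Collections.sort(names);
        return names;
    }

    // Looks up a no-argument method by name at run time and calls it.
    static Object call(Object target, String methodName) {
        try {
            Method m = target.getClass().getDeclaredMethod(methodName);
            return m.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("fields:  " + fieldNames(Sample.class));
        System.out.println("methods: " + methodNames(Sample.class));
        System.out.println("getCount() = " + call(new Sample(), "getCount"));
    }
}
```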
Most programming tools just don't get it
Have you seen a debugger that understands C++ templates?
Do you expect to get one for Java generics any time soon?
How easy is it to program your debugger?
Better question: why isn't it as easy as writing macros for your editor?
Java Server Pages (JSPs)
Embed code fragments and directives in web pages
Process JSP to create a Java servlet
Pro: reduce cognitive gap between programmers' mental model and textual artifact
Con:
JSP syntax is even harder to parse than that of its parents, HTML and Java
When something goes wrong, programmers have to reverse engineer the translation in their heads
<HTML>
<BODY>
<TABLE BORDER=2>
<%
    int n = 5;  // number of rows; value chosen for illustration
    for (int i = 0; i < n; i++) {
%>
  <TR>
    <TD>Number</TD>
    <TD><%= i+1 %></TD>
  </TR>
<%
    }
%>
</TABLE>
</BODY>
</HTML>
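The JSP processor essentially turns the page inside out: template text becomes output statements, and the scriptlet code is pasted in between them. The sketch below is a hand-written approximation of that translation, assuming a plain StringBuilder in place of the servlet API's output writer; code generated by a real JSP container looks different and is considerably longer. It also shows why debugging is awkward: errors are reported against code like this, not against the page the programmer actually wrote.

```java
// A hand-written approximation of the Java a JSP processor might generate
// for the page above. A StringBuilder stands in for the servlet's writer.
class TablePage {
    static String render(int n) {
        StringBuilder out = new StringBuilder();
        // Template text outside <% %> becomes literal output...
        out.append("<HTML>\n<BODY>\n<TABLE BORDER=2>\n");
        // ...and the scriptlet becomes ordinary Java control flow.
        for (int i = 0; i < n; i++) {
            out.append("<TR>\n<TD>Number</TD>\n<TD>");
            out.append(i + 1); // <%= i+1 %> becomes an expression written to the output
            out.append("</TD>\n</TR>\n");
        }
        out.append("</TABLE>\n</BODY>\n</HTML>\n");
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(2));
    }
}
```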
JavaDoc
Embed HTML in Java source using specially-marked comments
Along with special shorthand directives that aren't HTML
(Yet another) processor pulls them out and formats them
Pro: the closer documentation is to code, the more likely it is to be up to date
Con:
Makes source code ugly
Human beings should not have to type, or see, <b> in the early 21st Century
No guarantee that it's actually correct
Inextensible
A pale shadow of Knuth's Literate Programming
Debugging requires still more mental reverse engineering
/**
 * This class prints <em>odd</em> numbers.
 * See the <a href="copyright.html">copyright</a>.
 * @author Greg Wilson
 * @version 1.2
 */
public class Odds {
    public static void main(String[] args) {
        for (int i=0; i<10; ++i) {
            if (i % 2 == 1) {
                System.out.println(i);
            }
        }
    }
}
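A documentation processor first has to dig the comments back out of the source text. A minimal sketch of that step, assuming a simplified comment layout (block tags at the start of a line, one value per tag, no nested comment markers); the real javadoc tool handles far more, including inline tags such as {@docRoot}.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DocExtractor {
    // A /** ... */ block; DOTALL lets '.' cross line breaks, '?' keeps it non-greedy.
    private static final Pattern DOC = Pattern.compile("/\\*\\*(.*?)\\*/", Pattern.DOTALL);
    // A block tag such as "@author Greg Wilson".
    private static final Pattern TAG = Pattern.compile("@(\\w+)\\s+(.*)");

    // Returns the block tags of every doc comment in the source, in order of
    // appearance. A repeated tag overwrites the earlier value in this sketch.
    static Map<String, String> tags(String source) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher doc = DOC.matcher(source);
        while (doc.find()) {
            for (String line : doc.group(1).split("\\R")) {
                // Strip the leading "*" decoration from each comment line.
                String text = line.trim().replaceFirst("^\\*\\s?", "").trim();
                Matcher tag = TAG.matcher(text);
                if (tag.matches()) {
                    result.put(tag.group(1), tag.group(2).trim());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String source = "/**\n * This class prints <em>odd</em> numbers.\n"
            + " * @author Greg Wilson\n * @version 1.2\n */\npublic class Odds {}\n";
        System.out.println(tags(source));
    }
}
```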
Ant
A replacement for Make developed as part of Apache's Jakarta efforts
Build specification written in XML
Invokes plugins written in Java to perform actions
Pro:
Real platform independence
Real extensibility
Con:
XML has a lower signal-to-noise ratio than Makefiles
Have to drop down one level of abstraction in order to debug
Many attributes require extra parsing ($ expansion, value lists, etc.)
Which makes their content inaccessible to generic tools
<project name="MyProject" default="dist" basedir=".">
  <property name="src" location="src"/>
  <property name="build" location="build"/>
  <property name="dist" location="dist"/>
  <target name="init" description="setup">
    <tstamp/>
    <mkdir dir="${build}"/>
  </target>
  <target name="compile" depends="init">
    <javac srcdir="${src}" destdir="${build}"/>
  </target>
  <target name="dist" depends="compile">
    <mkdir dir="${dist}/lib"/>
    <jar jarfile="${dist}/lib/MyProject-${DSTAMP}.jar" basedir="${build}"/>
  </target>
  <target name="clean">
    <delete dir="${build}"/>
    <delete dir="${dist}"/>
  </target>
</project>
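Attribute values such as ${dist}/lib/MyProject-${DSTAMP}.jar have a mini-syntax of their own, so an XML parser hands them over as opaque strings and every tool that wants to understand them must parse them again. A minimal sketch of that extra step, assuming simple ${name} references with no nesting and leaving undefined names as written (an assumption of this sketch). The DSTAMP value in the example is made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PropertyExpander {
    // ${name}, where name is anything up to the closing brace.
    private static final Pattern REF = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each ${name} with its value; undefined names are left as written.
    static String expand(String text, Map<String, String> props) {
        Matcher m = REF.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.get(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement stops '$' and '\' in values being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("dist", "dist");
        props.put("DSTAMP", "20030919"); // hypothetical timestamp value
        System.out.println(expand("${dist}/lib/MyProject-${DSTAMP}.jar", props));
    }
}
```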
XSLT
Language for specifying XML-to-text transformations
Output is usually XML or HTML, but flat text is possible
A declarative language (sort of)
Match/replace with iteration (for-each) and conditionals
Pro:
Better than the alternatives
And there are debuggers!
Con:
Did this wheel really need reinventing?
Most of the program isn't available via XML
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <BODY bgcolor="{/FitnessCenter/Member/FavoriteColor}">
      Welcome <xsl:value-of select="/FitnessCenter/Member/Name"/>!
      <xsl:if test="/FitnessCenter/Member/@level='platinum'">
        Our special offer to platinum members today is now open.
      </xsl:if>
      Your phone numbers are:
      <TABLE border="1" width="25%">
        <TR><TH>Type</TH><TH>Number</TH></TR>
        <xsl:for-each select="/FitnessCenter/Member/Phone">
          <TR>
            <TD><xsl:value-of select="@type"/></TD>
            <TD><xsl:value-of select="."/></TD>
          </TR>
        </xsl:for-each>
      </TABLE>
    </BODY>
  </xsl:template>
</xsl:stylesheet>
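An XSLT processor ships with the JDK through the standard javax.xml.transform API. The sketch below runs a cut-down, text-producing version of the stylesheet against a hypothetical member record. Note that the XPath expressions in the select attributes are plain strings as far as the XML parser is concerned, the same problem noted above for Ant's attributes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class PhoneList {
    // A cut-down stylesheet in the spirit of the example above, emitting plain text.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='/FitnessCenter/Member/Phone'>"
      + "<xsl:value-of select='@type'/><xsl:text>: </xsl:text>"
      + "<xsl:value-of select='.'/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML document held in a string.
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical member record.
        String xml = "<FitnessCenter><Member level='platinum'><Name>Jeff</Name>"
            + "<Phone type='home'>555-1234</Phone><Phone type='work'>555-4321</Phone>"
            + "</Member></FitnessCenter>";
        System.out.print(transform(xml));
    }
}
```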
Many other relevant examples
JSR-175: Adding Metadata to Java
Allows programmers to insert small tags of the form @something into code
E.g. to specify that a class needs a remote stub, or that a field is a property
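The metadata facility proposed in JSR-175 eventually shipped as annotations in J2SE 5.0. A minimal sketch of the "a field is a property" case; the annotation Property and the class Member are hypothetical, and a tool finds the tagged fields through reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical tag marking a field as a property. RUNTIME retention keeps it
// in the class file so that tools can read it back through reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Property {
    String label() default "";
}

class Member {
    @Property(label = "Full name") String name;
    @Property int level;
    int internalCounter; // not tagged, so tools should ignore it
}

class PropertyScanner {
    // Names of the fields tagged with @Property, sorted for stable output.
    static List<String> properties(Class<?> cls) {
        List<String> names = new ArrayList<>();
        for (Field f : cls.getDeclaredFields()) {
            if (f.isAnnotationPresent(Property.class)) names.add(f.getName());
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(properties(Member.class));
    }
}
```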
C++ Template Metaprogramming
C++ template expansion mechanism is Turing-complete
Recursion, conditionals, integer arithmetic
So cleverly-written template classes can be used to turn the compiler into a code generator
Todd likes his music Baroque, too
SUIF (Stanford University Intermediate Format)
Compiler loads a set of optimization modules
Each module reads and writes a uniform format
It's a pity they stopped with code generation...
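The important property of SUIF's design, as described above, is that every module reads and writes the same format, so modules compose freely. A minimal sketch of that shape, assuming a deliberately tiny stand-in for the uniform format (a list of instruction strings) and two hypothetical passes; SUIF's real intermediate representation and APIs are not modeled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

class PassPipeline {
    // Every pass maps the shared representation to a new one of the same type,
    // so passes can be added, removed, or reordered freely.
    interface Pass extends UnaryOperator<List<String>> {}

    // Hypothetical pass: drop no-op instructions.
    static final Pass REMOVE_NOPS = code -> {
        List<String> out = new ArrayList<>();
        for (String insn : code) {
            if (!insn.equals("nop")) out.add(insn);
        }
        return out;
    };

    // Hypothetical pass: replace "mul x 2" with the cheaper "add x x".
    static final Pass STRENGTH_REDUCE = code -> {
        List<String> out = new ArrayList<>();
        for (String insn : code) {
            String[] parts = insn.split(" ");
            if (parts.length == 3 && parts[0].equals("mul") && parts[2].equals("2")) {
                out.add("add " + parts[1] + " " + parts[1]);
            } else {
                out.add(insn);
            }
        }
        return out;
    };

    // Runs the passes in order, each consuming the previous one's output.
    static List<String> run(List<String> code, List<Pass> passes) {
        for (Pass p : passes) code = p.apply(code);
        return code;
    }

    public static void main(String[] args) {
        List<String> code = Arrays.asList("load x", "nop", "mul x 2", "store x");
        System.out.println(run(code, Arrays.asList(REMOVE_NOPS, STRENGTH_REDUCE)));
    }
}
```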
Apache
Increasingly a framework for invoking dynamically-loaded plugins
"We'll take care of communication, you take care of content"
Configuration files are, well, challenging...
So what's really going on here?
Trying to squeeze new types of content into old formats
Blurring the distinction between code and data
Programming our programming tools
Unix: an improvement on all its successors
The "lots of little tools" paradigm made it an extremely productive environment
But what made LOLT work?
Common data format: newline-separated strings
Common communications protocol: stdin, stdout, exit(0)
The world's first component-based programming system
Many important aspects of programs cannot be captured by the "stream of strings" paradigm
Proof: Java doesn't have a preprocessor at all, and Bjarne would like you to use C++'s preprocessor less
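What made the little tools composable was the shared format and protocol, not the tools themselves. A minimal sketch of grep | sort | uniq in Java, assuming each "tool" is a function from lines to lines; real pipes stream bytes between separate processes, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

class LittleTools {
    // Each "tool" has the same shape: lines in, lines out.
    static Function<List<String>, List<String>> grep(String needle) {
        return lines -> {
            List<String> out = new ArrayList<>();
            for (String line : lines) if (line.contains(needle)) out.add(line);
            return out;
        };
    }

    static List<String> sort(List<String> lines) {
        List<String> out = new ArrayList<>(lines);
        Collections.sort(out);
        return out;
    }

    // Like uniq: collapses adjacent duplicates only.
    static List<String> uniq(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (out.isEmpty() || !out.get(out.size() - 1).equals(line)) out.add(line);
        }
        return out;
    }

    // Equivalent in spirit to: grep needle | sort | uniq
    static List<String> pipeline(List<String> input, String needle) {
        return grep(needle).andThen(LittleTools::sort).andThen(LittleTools::uniq).apply(input);
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("b error", "a ok", "b error", "c error");
        System.out.println(pipeline(log, "error"));
    }
}
```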
There's a new universal data format in town
(Just about) everything either is, or can pretend to be, XML
Yes, there's some bandwagoneering going on
But there's a lot to be said for being able to process everything in a uniform way
Scheme: possibly the cleanest programming language ever invented
Represented data and code in a single format
Which in practice meant that it didn't really distinguish between them
And which made powerful programming tools easier to build
That uniformity also allowed Scheme to provide a workable syntax extension mechanism
Safely translate user-defined forms into established forms
As far ahead of its time as Smalltalk
Which may be one of the reasons it never became more than a boutique language
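Because a Scheme program is itself a list, a syntax extension is just a function from lists to lists that runs before evaluation. A minimal sketch in Java, assuming nested Lists as the shared representation, a hypothetical unless form rewritten into if, and integers standing in for booleans; it does not attempt the hygiene that makes Scheme's mechanism safe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TinyLisp {
    // Code and data share one representation: an expression is an Integer,
    // a String (symbol), or a List of expressions.
    static List<Object> list(Object... items) {
        return new ArrayList<>(Arrays.asList(items));
    }

    // Syntax extension: rewrite the user-defined form (unless test body)
    // into the established form (if test 0 body) before evaluation.
    // 0 plays the role of "no value" (an assumption of this sketch).
    static Object expand(Object expr) {
        if (!(expr instanceof List)) return expr;
        List<Object> expanded = new ArrayList<>();
        for (Object sub : (List<?>) expr) expanded.add(expand(sub));
        if (!expanded.isEmpty() && "unless".equals(expanded.get(0))) {
            return list("if", expanded.get(1), 0, expanded.get(2));
        }
        return expanded;
    }

    // Evaluates the established forms only: integers, +, *, <, and if.
    // Booleans are represented as 1 (true) and 0 (false).
    static int eval(Object expr) {
        if (expr instanceof Integer) return (Integer) expr;
        List<?> form = (List<?>) expr;
        Object op = form.get(0);
        if ("if".equals(op)) {
            return eval(form.get(1)) != 0 ? eval(form.get(2)) : eval(form.get(3));
        }
        int a = eval(form.get(1));
        int b = eval(form.get(2));
        switch ((String) op) {
            case "+": return a + b;
            case "*": return a * b;
            case "<": return a < b ? 1 : 0;
            default: throw new IllegalArgumentException("unknown operator: " + op);
        }
    }

    public static void main(String[] args) {
        // (unless (< 5 3) (+ 1 (* 2 3)))
        Object program = list("unless", list("<", 5, 3), list("+", 1, list("*", 2, 3)));
        Object expanded = expand(program);
        System.out.println(expanded);       // the rewritten program is still just a list
        System.out.println(eval(expanded));
    }
}
```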
Store programs as XML documents
That is, store all program structure explicitly in a single format
Create smart tools that:
Support WYSIWYG editing
Just like everyone else's editors these days
Real programmers don't read tags
Support extension via namespace-protected metadata
Which will often itself be programs to be executed by tools
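XML namespaces are one way to give each tool a protected corner of the file: a namespace-aware parser lets a tool ask for its own elements and nothing else. A minimal sketch using the JDK's DOM API; the namespace URIs and the display and sample elements are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class NamespacedMetadata {
    // Hypothetical namespace owned by a debugger plugin.
    static final String DEBUGGER_NS = "urn:example:debugger";

    static final String PROGRAM =
        "<program xmlns:dbg='" + DEBUGGER_NS + "' xmlns:prof='urn:example:profiler'>"
      + "<dbg:display var='record' as='table'/>"
      + "<prof:sample rate='100'/>"
      + "<dbg:display var='threshold' as='number'/>"
      + "</program>";

    // Returns the 'var' attribute of every element with the given namespace and
    // local name; elements in other namespaces are invisible to the caller.
    static List<String> vars(String xml, String ns, String localName) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // off by default in JAXP
            Document doc = f.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagNameNS(ns, localName);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(((Element) nodes.item(i)).getAttribute("var"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(vars(PROGRAM, DEBUGGER_NS, "display"));
    }
}
```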
What does this buy us?
Can view source however we want
Just as we customize views of web pages, CAD drawings, and other documents
More important: freely mix program source and other types of content
Documentation
Diagrams
My secretary can put a sketch of the new floor plan in email
Why can't I put a class diagram in source code?
Processing instructions
What was that last one?
Using a uniform storage format gives us an extensible way to embed arbitrary metadata in programs
Free from the (many) limitations of the examples given above
Each tool in the chain can inspect, inject, modify, or process whatever it wants
LOLT taken to the next level
This includes tools that run after the compiler is done
BLOB on disk includes instructions for debugger, profiler, logger, etc.
So what does this look like?
<program>
  <doc>
    ...something like XHTML...
  </doc>
  <codegen>
    ...stuff telling code generator how to generate extra code...
  </codegen>
  <staticcode>
    ...invariant stuff to be compiled...
    <doc>
      ...which may itself contain documentation blocks...
    </doc>
  </staticcode>
  <debugger>
    ...code to be executed by debugger, e.g. to customize display...
    <runtime-help>
      ...which may itself contain documentation blocks...
    </runtime-help>
  </debugger>
  <profiler>
    ...and so on...
  </profiler>
</program>
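With a program stored this way, each tool in the chain needs only a generic parser to find its own section, and can add sections for tools further down the chain. A minimal sketch using the JDK's DOM API, assuming the top-level element names above; the section contents are placeholders.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ToolChain {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Names of the top-level sections, in document order.
    static List<String> sections(Document doc) {
        List<String> names = new ArrayList<>();
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) names.add(n.getNodeName());
        }
        return names;
    }

    // What one tool sees: the text of its own section, or null if absent.
    static String sectionText(Document doc, String name) {
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    // A tool may also inject a section of its own for later tools to read.
    static void inject(Document doc, String name, String text) {
        Element e = doc.createElement(name);
        e.setTextContent(text);
        doc.getDocumentElement().appendChild(e);
    }

    public static void main(String[] args) {
        Document doc = parse("<program><doc>Prints odd numbers</doc>"
            + "<staticcode>...</staticcode><debugger>show i in hex</debugger></program>");
        System.out.println(sectionText(doc, "debugger"));
        inject(doc, "logger", "log every call to main");
        System.out.println(sections(doc));
    }
}
```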
Instructions themselves
<for-loop>
  <for-loop-head>
  </for-loop-head>
  <stmt-seq>
    <doc>Only replace below threshold</doc>
    <cond>
      <test>
        <compare-expr operator="less">
          <field-expr field="age"><evaluate>record</evaluate></field-expr>
          <evaluate>threshold</evaluate>
        </compare-expr>
      </test>
      <body>
        <invoke-expr method="release"><evaluate>record</evaluate></invoke-expr>
      </body>
    </cond>
  </stmt-seq>
</for-loop>
Good gracious, that's ugly!
But who cares? This is just a model
As computer scientists, we ought to understand (and be comfortable with) the difference between models and views
One possible view of the code block above:
// Only replace below threshold
for (record in candidates) {
    if (record.age < threshold) {
        record.release();
    }
}
Another view of the same model:
;;; Only replace below threshold
(foreach record candidates
  (if (< (field record 'age) threshold)
      (record 'release)))
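Producing a view is an ordinary tree walk over the model. A minimal sketch, assuming only the element names used in the cond block above (and only the less operator), that renders the same stored model in both of the styles just shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class Views {
    // The <cond> part of the model above, as stored.
    static final String MODEL =
        "<cond><test><compare-expr operator='less'>"
      + "<field-expr field='age'><evaluate>record</evaluate></field-expr>"
      + "<evaluate>threshold</evaluate>"
      + "</compare-expr></test>"
      + "<body><invoke-expr method='release'><evaluate>record</evaluate></invoke-expr></body>"
      + "</cond>";

    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Child elements only, skipping any text nodes.
    static List<Element> kids(Element e) {
        List<Element> out = new ArrayList<>();
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) out.add((Element) n);
        }
        return out;
    }

    static String symbol(String operator) {
        if (operator.equals("less")) return "<";
        throw new IllegalArgumentException("operator not handled in this sketch: " + operator);
    }

    // View 1: C-like syntax.
    static String cView(Element e) {
        List<Element> k = kids(e);
        switch (e.getTagName()) {
            case "evaluate":     return e.getTextContent().trim();
            case "field-expr":   return cView(k.get(0)) + "." + e.getAttribute("field");
            case "compare-expr": return cView(k.get(0)) + " " + symbol(e.getAttribute("operator")) + " " + cView(k.get(1));
            case "invoke-expr":  return cView(k.get(0)) + "." + e.getAttribute("method") + "()";
            case "cond":         return "if (" + cView(kids(k.get(0)).get(0)) + ") { " + cView(kids(k.get(1)).get(0)) + "; }";
            default: throw new IllegalArgumentException(e.getTagName());
        }
    }

    // View 2: Lisp-like syntax.
    static String lispView(Element e) {
        List<Element> k = kids(e);
        switch (e.getTagName()) {
            case "evaluate":     return e.getTextContent().trim();
            case "field-expr":   return "(field " + lispView(k.get(0)) + " '" + e.getAttribute("field") + ")";
            case "compare-expr": return "(" + symbol(e.getAttribute("operator")) + " " + lispView(k.get(0)) + " " + lispView(k.get(1)) + ")";
            case "invoke-expr":  return "(" + lispView(k.get(0)) + " '" + e.getAttribute("method") + ")";
            case "cond":         return "(if " + lispView(kids(k.get(0)).get(0)) + " " + lispView(kids(k.get(1)).get(0)) + ")";
            default: throw new IllegalArgumentException(e.getTagName());
        }
    }

    public static void main(String[] args) {
        Element model = parse(MODEL);
        System.out.println(cView(model));
        System.out.println(lispView(model));
    }
}
```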
Examples
Design by contract
Use of certain tags triggers a transformation module
Converts expressions tagged pre and post into legal Java
Injects records formatted for the debugger's API so that it can map the generated code back to the original source
Debugger customization
A standard way to ask the compiler to preserve information in the generated code
So that the debugger can display templated variables using the original semantics
Round-trip CASE
CASE tool can store model descriptions and link to them from code
Standard (XML) editors can be told what to show and hide
Model checker plugin for tool chain will come bundled with the syntax extensions, formats, etc.
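For the design-by-contract example, the core job of the transformation module is to emit ordinary Java that checks the conditions. A minimal sketch, assuming the pre- and postconditions arrive as expression strings (as if read from hypothetical pre and post elements), a method body reduced to a single expression, and a local variable named result for the return value; the debugger records mentioned above are not generated here.

```java
// Hypothetical transformation: turns contract conditions into plain Java checks.
class ContractWeaver {
    static String weave(String signature, String returnType, String pre, String post, String bodyExpr) {
        StringBuilder out = new StringBuilder();
        out.append(signature).append(" {\n");
        // Precondition is checked before the body runs.
        out.append("    if (!(").append(pre).append(")) throw new IllegalArgumentException(\"precondition failed: ")
           .append(escape(pre)).append("\");\n");
        out.append("    ").append(returnType).append(" result = ").append(bodyExpr).append(";\n");
        // Postcondition may refer to the return value as 'result'.
        out.append("    if (!(").append(post).append(")) throw new IllegalStateException(\"postcondition failed: ")
           .append(escape(post)).append("\");\n");
        out.append("    return result;\n");
        out.append("}\n");
        return out.toString();
    }

    // Makes a condition safe to embed inside a Java string literal.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        System.out.print(weave("static int isqrt(int n)", "int", "n >= 0",
            "result * result <= n", "(int) Math.sqrt(n)"));
    }
}
```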
We're already doing this
JAR (Java ARchive) and WAR (Web Application Archive) files already contain heterogeneous content
Executable code destined for final product may be only a fraction of content
"I want to see my programs as they actually are!"
1. You can
Text representation of XML is the equivalent of assembly language
2. You never really have
Remember, it takes over 300,000 lines of C to make magnetized regions on disk appear as formatted text on your screen
/ * * \r \n * T h i s c l a
s s p r i n t s < e m > o d
d n u m b e r s < / e m > . \r
\n * S e e t h e < a h
r e f = " { @ d o c R o o t } /
c o p y r i g h t . h t m l " >
C o p y r i g h t < / a > . \r \n
* @ a u t h o r G r e g
W i l s o n \r \n * @ v e r s
i o n 1 . 2 \r \n * / \r \n \r \n
p u b l i c c l a s s O d d
s { \r \n p u b l i c s t
a t i c v o i d m a i n ( S
t r i n g [ ] a r g s ) { \r
\n f o r ( i n t i =
0 ; i < 1 0 ; + + i ) { \r
\n i f ( i % 2
) { \r \n S y s
t e m . o u t . p r i n t l n (
i ) ; \r \n } \r \n
} \r \n } \r \n } \r \n
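The same point, made with actual byte values: a few lines of Java show source text the way the disk holds it. A minimal sketch, assuming UTF-8 encoding and 16 bytes per row.

```java
import java.nio.charset.StandardCharsets;

class HexDump {
    // One byte per cell in hexadecimal, 16 cells per row.
    static String dump(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) out.append(i % 16 == 0 ? '\n' : ' ');
            out.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump("/**\r\n * This class prints <em>odd</em> numbers.\r\n"));
    }
}
```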
"Vee Eye or Die!"
Do you use lynx?
More importantly, do you think a generation that has grown up with Word and IE is going to put up with Emacs?
"This can all be done with existing tools"
Sure---and you can write a web server in Fortran-77
The real question is, is it feasible with existing tools?
Answer by inference is "no"
"You don't need XML to do this"
S-expressions would work just as well
Difference is, people will actually use XML...
"This kind of extensibility will make programs harder to understand"
Harder than what?
Scattered directives in a variety of syntaxes?
Or the contortions programmers go through to squeeze metaprogramming, lazy execution, and unification into existing syntaxes?
Remember, every new idea in programming is an amplifier
Allows good programmers to be better
Allows bad programmers to be worse
"'Big Bang' changes never work"
Unless the whole tool chain is owned by a single vendor
MATLAB 7.0 could do this
VB.NET could have used XML as its storage format
If it had, we would be hailing Anders Hejlsberg as a visionary
And hundreds of companies would be building productivity tools right now
This is already happening
All the examples above
Superx++
Jelly
Many (like Water) that only pretend to be XML
Won't (can't) take off until general-purpose WYSIWYG XML editors appear
But they're on their way, because everyone else needs them
It is the nature of revolutions to begin where no-one is looking
And it would be a lot of fun
$Id: xmlprog.html,v 1.3 2003/09/19 17:54:08 gvwilson Exp $