I used to find it supremely ironic that some of the languages which receive the most praise for placing the programmer on equal footing with the language designer, which give the programmer the most power to extend the language in any way the programmer wants, seem to have some of the dumbest parsers. Now I'm starting to wonder if a dumb parser is a necessary ingredient in a highly extensible language.
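Consider, for example, Lisp, Forth, and Smalltalk. None of these languages supports Fortran-style formulas. None of them will evaluate "10-2*3" and give you 4. Lisp will reject it outright. Forth, given the same formula with spaces between the tokens (Forth splits words on whitespace), will leave (x-10)*2 and 3 on the stack, where x is whatever was on top of the stack before the expression was evaluated. Smalltalk will give you 24, because it evaluates binary operators strictly from left to right.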
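To make the three behaviors concrete, here is a minimal Python sketch that evaluates the same token sequence under three rules: conventional precedence, Smalltalk's single precedence level, and Forth's stack discipline. The function names are hypothetical, and the sketch only models integers and the operators +, - and *.

```python
import operator

# Only the three operators the formula needs.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
PRECEDENCE = {"+": 1, "-": 1, "*": 2}  # conventional: * binds tighter than + and -


def conventional(tokens):
    """Fortran-style evaluation with a small shunting-yard pass."""
    values, ops = [], []

    def apply_top():
        op = ops.pop()
        right = values.pop()
        left = values.pop()
        values.append(OPS[op](left, right))

    for tok in tokens:
        if tok in OPS:
            # Left-associative: reduce while the stacked operator binds at least as tightly.
            while ops and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                apply_top()
            ops.append(tok)
        else:
            values.append(int(tok))
    while ops:
        apply_top()
    return values[0]


def smalltalk_like(tokens):
    """One precedence level for every binary operator: strictly left to right."""
    result = int(tokens[0])
    for i in range(1, len(tokens), 2):
        result = OPS[tokens[i]](result, int(tokens[i + 1]))
    return result


def forth_like(tokens, stack):
    """Numbers are pushed; each operator immediately pops two values and pushes one."""
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(int(tok))
    return stack


tokens = "10 - 2 * 3".split()
print(conventional(tokens))     # 4
print(smalltalk_like(tokens))   # 24
print(forth_like(tokens, [7]))  # [-6, 3]: with x = 7, (7 - 10) * 2 = -6, then 3
```

Only the first function needs a precedence table; the other two get by with a rule that knows nothing about what the operators mean.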
Lisp macros allow you to add your own syntax to Lisp, but what makes them so much cooler than Perl's source filters or the C preprocessor is that they operate on the syntax trees of your already parsed source code instead of on raw text. But how can the parser ("reader" in Lisp terminology) parse your code before it knows what it means? It can, because the parser doesn't know anything about what any code means. It follows some fairly simple rules for parsing s-expressions and leaves the meaning ("semantics") of those s-expressions entirely up to the macro expander and the interpreter or code generator.
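As an illustration of that split, here is a minimal Python sketch of a reader and a macro expander. The names `read`, `expand`, and the `unless` macro are hypothetical, and a real Lisp reader also handles numbers, strings, quote characters and much more. The point is that `read` builds the tree without knowing what any symbol means; only `expand` attaches meaning, and it works on lists rather than text.

```python
import re


def tokenize(src):
    # Parentheses are their own tokens; anything else is split on whitespace.
    return re.findall(r"[()]|[^\s()]+", src)


def read(tokens):
    """Build a nested-list tree from tokens without knowing what any symbol means."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == "(":
        form = []
        while tokens and tokens[0] != ")":
            form.append(read(tokens))
        if not tokens:
            raise SyntaxError("missing )")
        tokens.pop(0)  # discard the closing ")"
        return form
    if tok == ")":
        raise SyntaxError("unexpected )")
    return tok  # atoms stay plain strings in this sketch


def expand(form, macros):
    """Give meaning to the tree: rewrite any form whose head names a macro."""
    if not isinstance(form, list) or not form:
        return form
    head = form[0]
    if isinstance(head, str) and head in macros:
        # Expand again, since a macro may produce another macro call.
        return expand(macros[head](*form[1:]), macros)
    return [expand(sub, macros) for sub in form]


# Hypothetical user-defined macro: (unless c body) becomes (if (not c) body).
MACROS = {"unless": lambda cond, body: ["if", ["not", cond], body]}

tree = read(tokenize("(unless (empty? xs) (print (first xs)))"))
print(tree)                  # ['unless', ['empty?', 'xs'], ['print', ['first', 'xs']]]
print(expand(tree, MACROS))  # ['if', ['not', ['empty?', 'xs']], ['print', ['first', 'xs']]]
```

Adding a new macro means adding an entry to `MACROS`; the reader never changes.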
Forth's defining words and immediate words allow you to extend the compiler in arbitrary ways. Defining words are words you write that create new words when they are run. Immediate words let you use the full power of the language during compilation. Both features are made possible by the fact that the Forth compiler is incredibly simple-minded. It has a simple rule for figuring out what the next word in the input is, and a simple rule for deciding what to do with it while compiling (run it if it's immediate, and compile its address if it's not).
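The two rules can be sketched as a toy outer interpreter in Python. This is a hypothetical model, not any real Forth: `ToyForth` compiles references to Python callables instead of addresses, and it has no control-flow words. `:` is a defining word that reads the next input word as a name, and `;` is an immediate word that runs during compilation to close the definition.

```python
class ToyForth:
    """Toy outer interpreter: read the next word, then run it or compile it."""

    def __init__(self):
        self.stack = []
        self.words = {}        # name -> (callable taking the interpreter, immediate?)
        self.tokens = []
        self.compiling = False
        self.new_name = None
        self.new_body = []     # callables collected for the word being defined
        self.define("+", lambda f: f.stack.append(f.stack.pop() + f.stack.pop()))
        self.define("*", lambda f: f.stack.append(f.stack.pop() * f.stack.pop()))
        self.define("dup", lambda f: f.stack.append(f.stack[-1]))
        self.define(":", ToyForth.colon)
        self.define(";", ToyForth.semicolon, immediate=True)

    def define(self, name, fn, immediate=False):
        self.words[name] = (fn, immediate)

    def colon(self):
        # Defining word: take the next input word as the new name and start compiling.
        self.new_name = self.tokens.pop(0)
        self.new_body = []
        self.compiling = True

    def semicolon(self):
        # Immediate word: it runs *during* compilation and ends the definition.
        body = self.new_body

        def run(f):
            for fn in body:
                fn(f)

        self.define(self.new_name, run)
        self.compiling = False

    def interpret(self, source):
        self.tokens = source.split()  # the whole "parser": split on whitespace
        while self.tokens:
            tok = self.tokens.pop(0)
            if tok in self.words:
                fn, immediate = self.words[tok]
                if self.compiling and not immediate:
                    self.new_body.append(fn)  # a real Forth compiles the word's address
                else:
                    fn(self)                  # run it now
            else:
                n = int(tok)  # anything that is not a known word must be a number
                if self.compiling:
                    self.new_body.append(lambda f, n=n: f.stack.append(n))
                else:
                    self.stack.append(n)
        return self.stack


forth = ToyForth()
print(forth.interpret(": square dup * ; 7 square 1 +"))  # [50]
```

Because `;` is just an entry in the dictionary flagged as immediate, a programmer could add more immediate words the same way, and the interpreter loop would treat them exactly like the built-in one.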
Smalltalk automatically turns any method whose name meets some simple criteria (one or two non-alphanumeric characters) into a binary operator, allowing you to implement any binary operator you like. It can do this because the parser treats all binary operators the same: they all share a single precedence level and are all left-associative. This simple-minded approach puts user-created operators on the same level as those that come with the language. The programmer can implement any desired control structure by writing methods that expect blocks as arguments, and the built-in control structures are also just methods that take blocks as arguments. Programmer-defined control structures look exactly like the built-in ones because the parser doesn't know about any control structures.
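Here is a rough Python model of that arrangement, assuming integer operands only. `SELECTORS`, `add_binary_selector`, `evaluate`, and `times_repeat` are hypothetical names; a dictionary of functions stands in for methods on the receiver's class. Any run of punctuation is accepted as a binary selector, and user-added selectors are parsed exactly like `+`. The `<>` operator is a user-defined example, not a standard Smalltalk selector; `@` mirrors Smalltalk's point-building operator.

```python
import re


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return f"Point({self.x}, {self.y})"


# One table for built-in and user-added binary selectors; the parser never looks inside it.
SELECTORS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def add_binary_selector(name, fn):
    # Hypothetical stand-in for defining a method whose name is punctuation.
    SELECTORS[name] = fn


def evaluate(src):
    # Operands are integers; any run of punctuation is a binary selector.
    tokens = re.findall(r"\d+|[^\w\s]+", src)
    result = int(tokens[0])
    for selector, operand in zip(tokens[1::2], tokens[2::2]):
        # No precedence table: every binary message is sent left to right.
        result = SELECTORS[selector](result, int(operand))
    return result


def times_repeat(n, block):
    """A programmer-defined control structure: an ordinary function taking a block."""
    for _ in range(n):
        block()


add_binary_selector("@", Point)                 # builds a point, as in Smalltalk
add_binary_selector("<>", lambda a, b: a != b)  # user-defined "not equal"
print(evaluate("10 - 2 * 3"))  # 24
print(evaluate("1 + 2 @ 3"))   # Point(3, 3)
print(evaluate("3 <> 4"))      # True

log = []
times_repeat(3, lambda: log.append("tick"))
print(log)                     # ['tick', 'tick', 'tick']
```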
In all three cases, it is a simple syntax, one the compiler can parse without any knowledge of semantics, that lets the parser handle programmer-created extensions to the language without knowing what they mean.
When a parser knows too much about the meaning of a program (for example, that + means addition and * means multiplication, knowledge that is implicit in the common assumption that * has higher precedence than +), maybe that "freezes" the language in a way. Maybe a really simple syntax makes it easier to separate syntax completely from semantics, and maybe a strong separation between syntax and semantics is what makes a highly extensible language possible.
Monday, December 24, 2007
22 comments:
Counter-example: Ruby. A fairly smart language with one of the hairiest parsers known to man.
I love Ruby, but it’s really not at all extensible the way, say, Lisp is. It simply doesn’t have Lisp-style macros, and adding them would be much harder because Ruby is so difficult to parse.
Where does metalua sit? I haven't used it, just curious, as it's been highly touted.
Prolog lets you define your own operators, like this:
:- op(500,xfx,'has_color').
a has_color red.
b has_color blue.
?- X has_color blue.
X = b
It even lets you specify left/right associativity as well as precedence. That's pretty cool, but rare in other languages. I'd love to see a Forth or a Smalltalk that had that kind of feature; it would make them both (Forth especially) much more usable.
In general, though, I agree: the more complex the parser is, the less control it gives to the programmer. A simple parser is a powerful parser, maybe because the lack of complexity means a lack of limits.
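A parser with Prolog-style user-defined operators can be sketched in Python with precedence climbing over a table that changes at run time. This is a hypothetical `OpParser`, not Prolog's actual reader: it follows Prolog's conventions (a lower priority number binds tighter; `yfx` is left-associative, `xfy` right-associative, `xfx` non-associative), but operands are single tokens and there are no parentheses.

```python
class OpParser:
    """Precedence climbing driven by an operator table that can change at run time.

    Prolog conventions: a lower priority number binds tighter, and the type says
    how tightly each side may bind: yfx = left-assoc, xfy = right-assoc, xfx = neither.
    """

    def __init__(self):
        self.ops = {}
        self.tokens, self.pos = [], 0

    def op(self, priority, kind, name):
        self.ops[name] = (priority, kind)

    def parse(self, src):
        self.tokens, self.pos = src.split(), 0
        tree = self.expr(1200)
        if self.pos != len(self.tokens):
            raise SyntaxError(f"unexpected {self.tokens[self.pos]!r}")
        return tree

    def expr(self, max_prio):
        if self.pos >= len(self.tokens):
            raise SyntaxError("unexpected end of input")
        left = self.tokens[self.pos]  # operands are single tokens in this sketch
        self.pos += 1
        left_prio = 0
        while self.pos < len(self.tokens) and self.tokens[self.pos] in self.ops:
            name = self.tokens[self.pos]
            prio, kind = self.ops[name]
            # 'y' allows an argument of equal priority, 'x' requires strictly lower.
            left_max = prio if kind[0] == "y" else prio - 1
            if prio > max_prio or left_prio > left_max:
                break
            self.pos += 1
            right = self.expr(prio if kind[2] == "y" else prio - 1)
            left, left_prio = (name, left, right), prio
        return left


p = OpParser()
p.op(500, "yfx", "+")
p.op(400, "yfx", "*")
p.op(200, "xfy", "^")
p.op(500, "xfx", "has_color")       # the declaration from the comment above
print(p.parse("X has_color blue"))  # ('has_color', 'X', 'blue')
print(p.parse("a + b * c"))         # ('+', 'a', ('*', 'b', 'c'))
print(p.parse("a ^ b ^ c"))         # ('^', 'a', ('^', 'b', 'c'))
```

Since `xfx` forbids an argument of equal priority on either side, `a has_color b has_color c` is rejected, much as Prolog reports a priority clash.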
@adam: Re: metalua.
metalua gives you complete dynamic control over the parser. To take only the infix operators example, you can create, delete, or change operators on the fly, and of course give them whatever semantics suit you. Moreover, the strict (some say psycho-rigid :)) separation of meta-levels actively encourages you to keep semantics separate from syntax when writing extensions.
However, the parser remains limited on purpose; an overly flexible parser would have these drawbacks:
- it would encourage people to design poor syntaxes that won't compose with other extensions, assuming they work at all on their own.
- it would be harder to keep the dynamic behavior, i.e. the ability to modify the parser in the middle of a file. Or at least, it would be harder to keep it without regularly violating the principle of least surprise.
You can perform arbitrary transformations with the parser (it's trivial to turn it into a full parser combinator, and users can do it), but if you do dangerous things with syntax, you'll have to cross a couple of red lines, so you'll be warned and have occasion to wonder whether you chose the best approach.
Compared to Lisp macros, Metalua macros are on average more complex and powerful, but you write fewer of them. The rationale is that Metalua's design assumes a higher maintenance cost for macros than Lisp's does. Using an existing macro is encouraged, but writing a new one is already something special: the language encourages you to ask yourself twice whether a pure runtime approach would do the trick.
Well, it wouldn't be hard to define mechanisms for making syntax on the fly in the style of Algol, but why bother? It's not as if you can't just change the compiler to extend the language anyway. And there you can do it using a parser generator as the input language, which makes it easier. Unless, of course, all you have is a hammer.
I would call one of these parsers that doesn't know the language a "reader," in honor of Lisp. I spent literally months pondering this question: "Can I build a reader that produces familiar-looking syntax?" Eventually I came up with something along these lines:
expression := identifier
expression := parenlist
expression := expression expression
expression := expression infixop expression
parenlist := ( expression , expression , ... )
parenlist := [ expression , expression , ... ]
parenlist := { expression ; expression ; ... }
I don't know if that was exactly it, but it's close. I don't remember the precedence rules, but they were tricky. The really interesting bit was the precedence rules for infix operators. Any sequence of punctuation marks (other than the ones used to delimit a parenlist) counted as an infixop. The precedence was determined by using a string hash on the infix op. The hash function was carefully chosen such that the usual operators (+, *, etc) had the usual relative precedences. Net result: with a little searching, you can usually find an operator that looks sensible for whatever language construct you want to design, and which has about the right precedence.
Never did build a language around this reader. :)
P.S. When I say "familiar syntax," I mean something that would be comfortable for all those people who say "Lisp has horrible syntax, I like Java."
P.P.S. One thing I should mention is that I was aiming for a reader that isn't itself alterable. I feel that altering the reader (i.e., by defining reader macros in Lisp) creates incompatibilities between language extensions.
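The comment doesn't give the hash function that derived precedence from an operator's spelling, so this Python sketch substitutes a different, real rule with the same spirit: Scala ranks an infix operator by its first character and makes operators ending in `:` right-associative (Scala's special case for assignment-like operators such as `+=` is left out). Any run of punctuation is accepted as an operator, so an invented operator like `|>` or `+~` gets a plausible precedence without being declared anywhere. The function names are hypothetical, and the parenlist forms of the grammar are omitted.

```python
import re

TOKEN = re.compile(r"\w+|[^\w\s]+")  # identifiers, or runs of punctuation
OPERATOR = re.compile(r"[^\w\s]+")

# Stand-in for the unknown hash: Scala's rule, which ranks an operator by its
# FIRST character (higher rank binds tighter).
RANK_BY_FIRST_CHAR = {"|": 1, "^": 2, "&": 3, "=": 4, "!": 4, "<": 5, ">": 5,
                      ":": 6, "+": 7, "-": 7, "*": 8, "/": 8, "%": 8}


def precedence(op):
    return RANK_BY_FIRST_CHAR.get(op[0], 9)  # any other punctuation binds tightest


def right_assoc(op):
    return op.endswith(":")  # Scala: operators ending in ':' associate to the right


def parse(src):
    tokens = TOKEN.findall(src)
    pos = 0

    def expr(min_prec):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        left = tokens[pos]
        pos += 1
        while pos < len(tokens) and OPERATOR.fullmatch(tokens[pos]):
            op = tokens[pos]
            prec = precedence(op)
            if prec < min_prec:
                break
            pos += 1
            # Left-assoc operators demand a tighter right operand; right-assoc allow equal.
            right = expr(prec if right_assoc(op) else prec + 1)
            left = (op, left, right)
        return left

    return expr(0)


print(parse("a + b * c"))    # ('+', 'a', ('*', 'b', 'c'))
print(parse("x |> f +~ y"))  # ('|>', 'x', ('+~', 'f', 'y'))
print(parse("a :: b :: c"))  # ('::', 'a', ('::', 'b', 'c'))
```

Like the reader described above, this parser can't be altered: the precedence of any operator follows from how it is spelled.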
@josh: indeed, the trickiest part of extensible syntaxes is that you want to balance users' freedom with the ability to use different extensions together. At the very least:
- it must be easy to write a 'composing' extension, that won't interfere with other composing extensions.
- such extensions should integrate nicely into the language (you don't want the core language to be Algol-like and the extensions to be Lisp-like syntactically).
- when people write non-composing extensions, they should be warned about that serious limitation of their macro. Or, if you're into bondage & discipline, you can forbid writing them altogether.
Another problem with extensible parsers is that you want them to produce decent syntax error messages. The more dynamic the parser, the harder it is to provide helpful diagnostics.
Josh: you went to all that trouble and then misplaced the language? :)
The distinction between macros and reader macros is interesting. As is the connection with C++-style operator-overloading.
Add Tcl to your list. Its minimal syntax is both a source of flexibility and power, and the reason why many people dismiss it as weird. See e.g. http://wiki.tcl.tk/2401 , http://www.equi4.com/moam/strength , http://antirez.com/articoli/tclmisunderstood.html
Yes, I totally agree with the central premise of this post, which is that making the basic rules clear and simple facilitates complexity on higher levels; whereas complexity on this fundamental level impedes progress later on.
But sadly, the alienness of the syntax, as well as the unsexy language names, are some of the likelier reasons why Lisp and Smalltalk are unpopular. People don't like "weird".
Lisp might be salvageable with a sexier name, perhaps Scheme, and with curly braces...
But Smalltalk, that's simply alien. Even with a more attractive name, it's too different from what we are used to and what we expect. The chasm is too great.
You can learn Smalltalk/Lisp syntax in a day, the syntaxes are amazingly simple. If you think the chasm is too great, you're just plain fucking lazy.
Languages that want to be popular need to appeal to millions of people who are "just plain fucking lazy".
Anyone who can't spend one day learning a new syntax needs to find a different career; programming isn't for them. One day is not a high barrier to entry; in fact, it's absurdly low.
(Smalltalker here -- so don't flame me...)
I don't think the "weird" factor of Smalltalk rests in its extremely simple syntax (see the "Entire Language Example" of http://www.eli.sdsu.edu/courses/fall01/cs535/notes/basicSyntax/basicSyntax.html).
You can spend (even less than) a day to learn the syntax of Smalltalk, and still not be able to really use it. [Everything is an object], [actions are performed through message sends], and [a far superior programming and execution environment that is "living"] are all things that most programmers today are not used to -- I blame the curly-brace-languages for that. Objects there are hybrids at best.
You don't get it. It's not the syntax. It's the environment.
Syntax is simple. But the environment takes years to master.
It might be that the Smalltalk environment would allow one to be more productive (or would it?), but why spend years to find out, if you already master the ins and outs of the Win32 API, the Java or .NET libraries, or the Unix environment?
You'd be abandoning years of experience just to be a novice again.
The only reason you would seriously consider something like that is if you're not yet an expert in any of these environments, or your job requires you to master the new one.
Which is why you don't see millions of programmers moving into Smalltalk. It's not just a day's worth of learning new syntax.
Unless your experience is so small that you're still a novice in everything.
P.S.
Nice post. Thanks.
I get it; I'm not the one blaming syntax, you are. Syntax is simple. Learning libraries absolutely takes time; that's life. However, it doesn't take that much time, and picking up new languages and libraries is something you should always be doing. As The Pragmatic Programmer says, attempt one new language a year.
No matter the language or library, there are going to be lots of things you always do; you just find out how to do them with this library. They're not all that different.
Anyone who plans on making software development a career simply cannot pick one language and one library and stop there; you'll be obsolete if you do. This field is about continual learning, period. You can't really survive in this industry over the long haul if you don't enjoy continual learning.
Try getting that point across to the millions of programmers who avoid learning Lisp and Smalltalk every day because (1) the syntax is peculiar, and (2) the environment is alien.
You can have your own personal convictions about what makes a good programmer, but that doesn't change the fact that PHP, C#, Java, C and C++ have millions of users, and Lisp and Smalltalk have, well, ten.
You're mistaken in assuming that a community needs to number in the thousands or more to be alive and well.
As for those who choose to learn only one language, I don't need to convince them; they'll simply be outcompeted in the market by smarter people.
Haskell lets you specify associativity and precedence. Otherwise, though, its parser is pretty bare-bones.