Parser-friendly Syntax ― Andreas Zwinkau

Modern programming languages usually have a parser-friendly syntax. Some people wonder why: shouldn't we optimize the syntax to be human-friendly instead? After all, computers should make life easier for the programmer. I argue that parser-friendly is the right choice and effectively leads to more human-friendliness. The cost for the programmer is negligible.

Parser-friendly is human-friendly

First, what is parser-unfriendly? For example, C and C++ have the well-known problem that a * b; is ambiguous without type information. If a is a type, then this declares a variable b of type "pointer to a". In contrast, if a is a variable, then this is a multiplication. For a compiler writer, this means parsing and semantic analysis become entangled: the parser must know which names are types before it can build a syntax tree. The compiler is harder to write and less modular, so a parser-friendly language will tend to have fewer bugs in its compiler. For example, probably every C++ compiler still has at least some edge-case bugs in its parser.
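To make the entanglement concrete, here is a minimal sketch in Python. It is not how a real C compiler works; the function name classify_with_types and the known_types set are made up for illustration, with the set standing in for the symbol table that semantic analysis would normally build.

```python
def classify_with_types(statement, known_types):
    """Classify a C-like statement of the exact form 'a * b;'.

    known_types is a hypothetical stand-in for the compiler's symbol
    table. Without it, the function cannot give an answer at all.
    """
    tokens = statement.rstrip(";").split()
    if len(tokens) != 3 or tokens[1] != "*":
        raise ValueError("this sketch only handles statements like 'a * b;'")
    # The same three tokens mean two different things, depending on
    # whether the first identifier currently names a type.
    if tokens[0] in known_types:
        return "declaration"
    return "multiplication"


# Identical input text, different result, depending on semantic context.
print(classify_with_types("a * b;", known_types={"a"}))  # a is a type
print(classify_with_types("a * b;", known_types=set()))  # a is a variable
```

The point of the sketch is the second parameter: any tool that wants to parse such code has to compute or approximate that set first.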

In addition, easy-to-parse is not only good for the compiler. There are lots of other applications that need to parse a language: syntax highlighters, style checkers, reformatters, static program analyzers, and automatic refactoring tools. This means a parser-friendly language will get better tool support faster. For example, compare refactoring tools for C++ with those for Java, which is much easier to parse.

The cost of parser-friendly

Now what is the cost of parser-friendly syntax? Usually it only means adding a few keywords or adapting some punctuation. For example, we can introduce a special keyword for declarations. Then we convert a declaration that can be mistaken for an expression,

a * b;

into an unambiguous declaration like

var a * b;

which only has the downside of four more characters to type. To mitigate this, modern languages usually employ (local) type inference, so there are fewer of these declarations. Alternatively, we could remove the ambiguity of the asterisk, so the declaration cannot be mistaken for an expression:

ref a b;

Or provide an additional token to separate variable name from type:

b : a*;

Which solution is best is mostly a matter of aesthetics. However, all of them have little overhead and do not make the language less human-friendly. On the contrary, syntactic clarity is itself a benefit for the programmer.
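As a rough illustration of why these variants are parser-friendly, the following Python sketch classifies statements by looking at tokens alone. The function name classify_by_syntax is hypothetical, and the sketch accepts all three alternatives at once only for demonstration; a real language would pick one of them.

```python
def classify_by_syntax(statement):
    """Classify a statement using only its tokens, with no symbol table."""
    tokens = statement.rstrip(";").split()
    if not tokens:
        raise ValueError("empty statement")
    # Variant 1: a dedicated declaration keyword, as in 'var a * b;'.
    if tokens[0] == "var":
        return "declaration"
    # Variant 2: a keyword replaces the asterisk, as in 'ref a b;'.
    if tokens[0] == "ref":
        return "declaration"
    # Variant 3: a separator between name and type, as in 'b : a*;'.
    if len(tokens) >= 2 and tokens[1] == ":":
        return "declaration"
    # Everything else, including 'a * b;', is now always an expression.
    return "expression"


for source in ["var a * b;", "ref a b;", "b : a*;", "a * b;"]:
    print(source, "->", classify_by_syntax(source))
```

In each variant the decision is made from the first one or two tokens, so a syntax highlighter or reformatter can reach the same answer as the compiler without any type information.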

Effectively, there is no argument against parser-friendly syntax.