The parser’s job is to give semantic structure to the syntactic tokens bestowed upon it by the lexical analyzer. There are, as always, automatic tools like yacc that create from a BNF grammar a program that parses tokens in a certain language. However, it is often more efficient and certainly more educational to write a parser by hand.
We’ll be dealing with a very special kind of parser—a recursive descent predictive parser. As you might remember from the last installment, Jack is almost an LL(1) language, meaning that a single look-ahead token is sufficient for parsing almost all Jack statements and expressions. (This predictive nature is a property of deterministic finite automata.)
When you look at a Jack program fragment (such as the following one, taken from the previous installment), you probably see a certain structure, but translating that structure into a set of parsing procedures is usually non-trivial without some sort of grammar representation.
let prime = true;
while (i < numbers[j] & prime) {
if (numbers[j] % i = 0) {
do System.printInt(numbers[j]);
do System.print(" is composite.");
do System.println();
let prime = false;
}
let i = i + 1;
}
One such convenient representation is Backus-Naur Form (BNF).
If you’re familiar with regular expressions, you should have no difficulty understanding the following simplified BNF definitions of some of Jack’s grammar:
stmt ::= while-stmt | if-stmt | let-stmt
while-stmt ::= while '(' expr ')' { stmt* }
if-stmt ::= if '(' expr ')' { stmt* }
let-stmt ::= let var ( [ expr ] )? = expr ;
var ::= identifier
identifier ::= letter_ ( letter_ | digit )*
letter_ ::= A | B | … | Z | a | b | … | z | _
digit ::= 0 | 1 | … | 9
The entire Jack grammar can be expressed in less than a single printed page of BNF definitions. I hope you’re convinced by now that BNF is a concise form of expression; it remains for me to convince you that it’s also a convenient form.
Read more: All Your Base Are Belong To Us