This is a mirror of official site: http://jasper-net.blogspot.com/

Writing a Compiler in C#: Parsing, Part 1

| Monday, October 18, 2010
In the previous installment we saw the core of a lexical analyzer, a module that generates from a stream of characters a set of tokens for symbols, identifiers, keywords, integer constants, and string constants. Today, we move to parsing.

The parser’s job is to give semantic structure to the syntactic tokens bestowed upon it by the lexical analyzer. There are, as always, automatic tools like yacc that create from a BNF grammar a program that parses tokens in a certain language. However, it is often more efficient and certainly more educational to write a parser by hand.

We’ll be dealing with a very special kind of parser—a recursive descent predictive parser. As you might remember from the last installment, Jack is almost an LL(1) language, meaning that a single look-ahead token is sufficient for parsing almost all Jack statements and expressions. (This predictive nature is a property of deterministic finite automata.)

When you look at a Jack program fragment (such as the following one, taken from the previous installment), you probably see a certain structure, but translating that structure into a set of parsing procedures is usually non-trivial without some sort of grammar representation.

let prime = true;
while (i < numbers[j] & prime) {
   if (numbers[j] % i = 0) {
       do System.printInt(numbers[j]);
       do System.print(" is composite.");
       do System.println();
       let prime = false;
   }
   let i = i + 1;
}
One such convenient representation is Backus-Naur Form (BNF).

If you’re familiar with regular expressions, you should have no difficulty understanding the following simplified BNF definitions of some of Jack’s grammar:

stmt       ::= while-stmt | if-stmt | let-stmt
while-stmt ::= while '(' expr ')' { stmt* }
if-stmt    ::= if '(' expr ')' { stmt* }
let-stmt   ::= let var ( [ expr ] )? = expr ;
var        ::= identifier
identifier ::= letter_ ( letter_ | digit )*
letter_    ::= A | B | … | Z | a | b | … | z | _
digit      ::= 0 | 1 | … | 9

The entire Jack grammar can be expressed in less than a single printed page of BNF definitions. I hope you’re convinced by now that BNF is a concise form of expression; it remains for me to convince you that it’s also a convenient form.

Read more: All Your Base Are Belong To Us

Posted via email from .NET Info

0 comments: