How a simple compiler works
A simple compiler might use a four-step process: a lexer, a parser, a translator and a bytecode interpreter. Each step is sketched in code after the list below.
- The lexer, or lexical analyser (also known as a scanner or tokeniser), scans your source code and turns it into atomic units called tokens. This is most commonly achieved by pattern matching with regular expressions.
- The tokenised code is then passed through a parser to identify and encode its structure and scope into what’s called a syntax tree.
- This graph-like structure is then passed through a translator to be turned into bytecode. The simplest implementation would be a huge switch statement mapping each node of the tree to its bytecode equivalent.
- The bytecode is then passed to a bytecode interpreter, which executes it directly at runtime.
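
To make the first two steps concrete, here is a minimal sketch in TypeScript of a front end for simple arithmetic expressions. Everything in it is invented for illustration (the token shapes, the toy grammar, the `tokenize` and `parse` functions); it shows the shape of the technique, not the implementation of any real engine.

```typescript
// Token: the atomic units produced by the lexer.
type Token =
  | { kind: "num"; value: number }
  | { kind: "op"; value: "+" | "*" }
  | { kind: "lparen" }
  | { kind: "rparen" };

// Lexer: pattern matching with a regular expression turns source
// text into tokens. Unknown characters simply end the scan here.
function tokenize(source: string): Token[] {
  const tokens: Token[] = [];
  const pattern = /\s*(?:(\d+)|([+*])|(\()|(\)))/y; // sticky flag tracks position
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    if (match[1] !== undefined) tokens.push({ kind: "num", value: Number(match[1]) });
    else if (match[2] !== undefined) tokens.push({ kind: "op", value: match[2] as "+" | "*" });
    else if (match[3] !== undefined) tokens.push({ kind: "lparen" });
    else tokens.push({ kind: "rparen" });
  }
  return tokens;
}

// Parser: recursive descent encodes the tokens' structure as a tree,
// capturing grouping and the fact that "*" binds tighter than "+".
type AstNode =
  | { kind: "literal"; value: number }
  | { kind: "add" | "mul"; left: AstNode; right: AstNode };

function parse(tokens: Token[]): AstNode {
  let pos = 0;
  const peekOp = (op: "+" | "*") => {
    const t = tokens[pos];
    return t !== undefined && t.kind === "op" && t.value === op;
  };
  function factor(): AstNode { // factor := number | "(" expr ")"
    const t = tokens[pos++];
    if (t.kind === "num") return { kind: "literal", value: t.value };
    if (t.kind === "lparen") {
      const inner = expr();
      pos++; // consume the closing ")"
      return inner;
    }
    throw new Error("unexpected token");
  }
  function term(): AstNode { // term := factor ("*" factor)*
    let left = factor();
    while (peekOp("*")) { pos++; left = { kind: "mul", left, right: factor() }; }
    return left;
  }
  function expr(): AstNode { // expr := term ("+" term)*
    let left = term();
    while (peekOp("+")) { pos++; left = { kind: "add", left, right: term() }; }
    return left;
  }
  return expr();
}
```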
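
Continuing the sketch, the back end: a `translate` function built around the kind of switch statement mentioned above, and a bytecode interpreter that executes the resulting instructions one at a time. The instruction set (`PUSH`, `ADD`, `MUL`) is again made up for the example.

```typescript
// Instruction: bytecode for an imaginary stack machine.
type Instruction =
  | { op: "PUSH"; value: number }
  | { op: "ADD" }
  | { op: "MUL" };

// Translator: a switch statement maps each syntax-tree node to its
// bytecode equivalent, emitting operands before their operator.
function translate(node: AstNode, code: Instruction[] = []): Instruction[] {
  switch (node.kind) {
    case "literal":
      code.push({ op: "PUSH", value: node.value });
      break;
    case "add":
      translate(node.left, code);
      translate(node.right, code);
      code.push({ op: "ADD" });
      break;
    case "mul":
      translate(node.left, code);
      translate(node.right, code);
      code.push({ op: "MUL" });
      break;
  }
  return code;
}

// Bytecode interpreter: a loop that dispatches on each instruction
// in turn. This per-instruction dispatch is what makes it slow.
function interpret(code: Instruction[]): number {
  const stack: number[] = [];
  for (const instr of code) {
    switch (instr.op) {
      case "PUSH": stack.push(instr.value); break;
      case "ADD": { const b = stack.pop()!, a = stack.pop()!; stack.push(a + b); break; }
      case "MUL": { const b = stack.pop()!, a = stack.pop()!; stack.push(a * b); break; }
    }
  }
  return stack[0];
}

// interpret(translate(parse(tokenize("1 + 2 * 3")))) === 7
```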
This is a classic compiler design, and it has been around for many years. The requirements of the browser, however, are very different from those of the desktop, and this classic architecture falls short in a number of ways. The innovative ways in which these issues were resolved are the story of the race for speed in the browser.
Fast, Slim, Correct
“Fast, Slim, Correct. Pick any two, so long as one is ‘Correct’”
The principal problem with the classic architecture is that runtime bytecode interpretation is slow. Performance can be improved by adding a compilation step that converts the bytecode into machine code. Unfortunately, waiting several minutes for a web page to fully compile isn’t going to make your browser very popular.
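
To illustrate what that extra compilation step buys, here is a hypothetical `compile` function for the same toy bytecode from the sketch above. A real engine emits machine code at this point; since that can’t be shown portably, this sketch cheats by generating JavaScript source and handing it to the host engine via `new Function`, so the instruction-dispatch cost is paid once at compile time rather than on every run.

```typescript
// A stand-in for the compile step: turn bytecode into a callable
// function once, instead of dispatching on every instruction at runtime.
// Real engines emit machine code here; emitting JavaScript source and
// letting the host engine compile it is only an analogy.
function compile(code: Instruction[]): () => number {
  const exprs: string[] = []; // a stack of source fragments
  for (const instr of code) {
    switch (instr.op) {
      case "PUSH": exprs.push(String(instr.value)); break;
      case "ADD": { const b = exprs.pop()!, a = exprs.pop()!; exprs.push(`(${a} + ${b})`); break; }
      case "MUL": { const b = exprs.pop()!, a = exprs.pop()!; exprs.push(`(${a} * ${b})`); break; }
    }
  }
  return new Function(`return ${exprs[0]};`) as () => number;
}

// const run = compile(translate(parse(tokenize("1 + 2 * 3"))));
// run(); // 7, with no interpreter loop left at runtime
```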
More recently, browser vendors have introduced optimising compilers with an additional step. Once the data flow graph (DFG) or syntax tree has been generated, the compiler can use this richer view of the program to perform further optimisations before generating machine code. Mozilla’s IonMonkey and Google’s Crankshaft are examples of these DFG compilers.
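
As a small illustration of the kind of work an optimising compiler can do once it has the whole tree or graph in hand, here is a constant-folding pass over the toy syntax tree from the earlier sketch. The real passes in engines like IonMonkey and Crankshaft are far more sophisticated (type specialisation, inlining and so on), but the pattern is the same: analyse the structure, rewrite it, then generate code from the improved version.

```typescript
// Optimisation pass: constant folding. Wherever both operands of a
// node are literals, evaluate the operation at compile time so the
// translator emits one PUSH instead of two PUSHes and an ADD or MUL.
function foldConstants(node: AstNode): AstNode {
  if (node.kind === "literal") return node;
  const left = foldConstants(node.left);
  const right = foldConstants(node.right);
  if (left.kind === "literal" && right.kind === "literal") {
    const value = node.kind === "add" ? left.value + right.value
                                      : left.value * right.value;
    return { kind: "literal", value };
  }
  return { kind: node.kind, left, right };
}

// translate(foldConstants(parse(tokenize("1 + 2 * 3"))))
// emits a single PUSH 7 in place of the five instructions above.
```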