Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Writing compilers ... what's right and what's wrong? [closed]

Okay, in my quest to figure out the necessary stuff to write a compiler, I've reached a bit of a roadblock. It seems that every technology or tool that I find has some opposition somewhere.

I use Bison and Flex right now but I'm getting the feeling that this method is outdated. Is this true? Is this a good forward-compatible way to proceed with writing a full fledged programming language?

In a sea of different concepts and tools (ANTLR, LL(k), GLR, LALR, LLVM, Flex, Bison) What's the current trend and best practices for writing compilers? Is the dragon book out of date?


2 Answers

Unless you want to write a truly simple compiler, your focus is wrong.

Writing compilers is only a tiny bit about writing parsers. Having a parser is like climbing the foothills of the Himalayas when the problem is climbing Everest. You get to the top of the foothill and look up ... only 20,000 feet to go and you've only done the truly easy part. And you'll note the technology required to get to the top of the foothills is radically easier than the technology you need to go the rest of the way.

(FYI: the best present parsing technology is GLR, which easily accepts ambiguous grammars without hacking the grammar. GLR even easily parses C++, which violates the folk theorem that C++ is hard to parse. The folk theorem came from people trying to use YACC and ANTLR to parse it).

To build a compiler you need lots of machinery:

  • AST building
  • Symbol table construction
  • Control flow analysis
  • Data flow analysis
  • Representation of program code essentially as a data flow computation (SSA or triples)
  • A model of the target machine
  • A means to map program code to machine instructions
  • Register allocation
  • Optimizations: constant propagation, loop unrolling, ...

We haven't even gotten near global flow analysis, global optimizations, or special handling for modern day instruction sets involving SIMD instructions or cache optimizations. ... The list goes on and on. The Dragon book gives a nice introduction to the basic topics, but doesn't address any of the advanced ones. You'll want Cooper's "Engineering a Compiler" and Muchnick's "Advanced Compiler Design" as references and it would be good if you had skimmed them well before you start.

Building a modern compiler is quite a feat of engineering.

like image 164
Ira Baxter Avatar answered Sep 14 '25 11:09

Ira Baxter


Parsing, although heavily studied, is the least important part of compiling. (Exception: you're designing your own concrete syntax and you are continually refining and changing the language.)

Yacc, Bison, and friends were designed for an era of machines with 64K of memory. They're great for running fast on machines with limited memory. But the amount of human engineering required to force a grammar into LALR(1) form is ridiculous today. Ira Baxter is right that GLR is probably the best, most flexible parsing technology, but PEG (Parsing Expression Grammars) are also good. In both cases the human engineering is light-years ahead of the older tools.

Having dismissed parsing, I will now start another technology food fight :-) Compiling mostly consists of rewriting a program over and over from one form into another, until eventually you reach assembly code or machine code. For this kind of problem you don't really want to use C or C++:

Q: (Asked of Dave Hanson when he published his amazing book on lcc with Chris Fraser) "You and Chris have spent ten years building what may be one of the most carefully engineered compilers ever made. What did you learn from the experience?"

A: "Well, C is a lousy language to write a compiler in."

I urge you to try one of the popular functional languages, like Haskell or Standard ML. People who work in this field widely believe that compilers are the "killer app" for functional languages. Algebraic data types and pattern matching are tailor-made for writing abstract syntax into intermediate code into machine code. A good place to see the power of these techniques is Andrew Appel's book Compiling With Continuations. (Appel's compiler textbook is also a good read and a very elegant design, but he doesn't always explain why the design is the way it is.)

like image 36
Norman Ramsey Avatar answered Sep 14 '25 12:09

Norman Ramsey