There are some excellent books out there. In its own way, the dragon book is excellent, but it is a terrible starting place.
Here are a bunch of references from the same vintage as OP. I recommend starting with a book that actually walks through the process of building a compiler and doesn't spend its time exclusively with theory.
Some years later I (re-)discovered Forth, and I thought "why not?" and built my own Forth in 32-bit Intel assembly; _that_ brought back the wonder and "magical" feeling of compilers again. All in less than 4KB.
I guess I wasn't the right audience for the dragon book.
The Tiger book (with C, Standard ML, and Java variants)
https://www.cs.princeton.edu/~appel/modern/
Compiler Design in C (freely available nowadays, beware this is between K&R C and C89)
lcc, A Retargetable Compiler for ANSI C
Or if one wants to go with more clever stuff,
Compiling with Continuations
Lisp in Small Pieces
The book is famous for its SSA treatment. Chapters 1-8 are not required to understand SSA. This allows you to walk away with a clear win. Refer to 9.2 if you're struggling with dominance + liveness.
http://www.r-5.org/files/books/computers/compilers/writing/K...
But then, pushing the theory of regular languages into the curriculum just to rush over it so you can use it for parsing is way worse.
At least in the typical curriculum of German universities, students already know the theory of regular languages quite well from their Theoretical Computer Science lectures, so in a compiler lecture the lecturer can indeed rush over this topic because it is just repetition.
The dragon book almost convinced me never to try to write a compiler.
That was the point. That's why it's not a cute beaver on the cover :)
It taught me to think very differently, but I am sure I am still not ready to write a compiler :D
A lot of people say the dragon book is difficult, so I suppose there must be something there. But I don't see what it is, I thought it was quite accessible.
I'm curious, what parts/aspects of the dragon book make it difficult to start with?
I repeatedly skip parts that are not important to me when reading books like this. I grabbed a book about embedded design and skipped about half of it (the bus protocols), as I knew I wouldn't need it. There is no need to read the dragon book from front to back.
> But there's a reason most modern resources skip over all of that and just make the reader write a recursive descent parser.
Unless the reason is explicitly stated, there is no way to verify it's any good.
There's a reason people use AI to do their homework - it just doesn't mean it's a good one.
I can think of plenty of arguments for why you wouldn't look into the pros and cons of different parsing strategies in an introduction to compilers; "everyone is (or isn't) doing it" is not among them.
In the end, it has to be written down somewhere, and if no other book is doing it for whatever reason, then the dragon book it shall be. You can always recommend skipping that part if someone asks what book to use.
The first edition was my first CS textbook, back in the '90s, and as a young programmer I learned a lot from it. A couple of years ago, however, I started on a modern compiler back-end and found that my knowledge needed quite a lot of updating.
The 2nd ed. covers data-flow analysis, which is very important. However, modern compilers (GCC, LLVM, Cranelift, ...) are built around an intermediate representation in Static Single Assignment (SSA) form. The 2nd ed. has only a single page about SSA, and you'd also need to learn a lot of theory about its properties to actually use it properly.
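For a feel of what that means, here is a minimal sketch (my own, not from the book): a tiny C fragment and, in the comment, a hand-written LLVM-style rendering of it in SSA form, where every assignment defines a fresh name and a phi node merges the values at the join point.

    /* Source fragment: x is assigned on two different paths. */
    int example(int a, int b, int c) {
        int x = a;
        if (c)
            x = b;
        return x + 1;
    }

    /* In SSA form each assignment gets a fresh name, and the join
     * point selects between them with a phi node (LLVM-style pseudo-IR):
     *
     *   entry:  x1 = a
     *           br c, then, join
     *   then:   x2 = b
     *           br join
     *   join:   x3 = phi [x1, entry], [x2, then]
     *           ret x3 + 1
     */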
Most of the work is actually the backend, and people sort of delude themselves into "creating a language" just because they have an AST.
It is also only the case for some compilers that most of the work is in the backend, though of course all of this depends on how backend is defined. Is the backend just codegen, or is it all of the analysis between parsing and codegen? If you target a high-level language, which is very appropriate for one's first few compilers, the backend can be quite simple. At the simplest, no AST is even necessary and the compiler can just mechanically translate one syntax into another in a single pass.
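To make that simplest case concrete, here is a hypothetical sketch in C: a one-pass translator for a made-up toy language whose only statement is "print <number>", emitting C directly with no AST at all (the toy language and the names are mine, not from any of the books above).

    /* Hypothetical single-pass "compiler": translates lines of the form
     *   print <number>
     * into a C program, statement by statement, with no AST. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        char word[32];
        long n;

        puts("#include <stdio.h>");
        puts("int main(void) {");
        /* Read one statement at a time and emit its translation immediately. */
        while (scanf("%31s %ld", word, &n) == 2) {
            if (strcmp(word, "print") != 0) {
                fprintf(stderr, "unknown statement: %s\n", word);
                return 1;
            }
            printf("    printf(\"%%ld\\n\", %ldL);\n", n);
        }
        puts("    return 0;");
        puts("}");
        return 0;
    }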
It's actually the reverse, in my opinion. Semantics can change much more easily than syntax. You can see this in how small changes in syntax can cause massive changes in a recursive-descent parser, while the semantics can change from pass-by-reference to pass-by-value and barely make it budge.
There is a reason practically every modern language has adopted syntax sigils like (choosing Zig):
pub fn is_list(arg: arg_t, len: ui_t) bool {
This allows the identification of the various parts and types without referencing or compiling the universe. That's super important and something that must be baked into the syntax from the start, or there is nothing you can do about it.
Another alternative is basing the language on S-expressions, for which a parser is extremely simple to write.
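As an illustration of how little code that takes, here is a minimal S-expression reader in C, itself a tiny recursive-descent parser. It is only a sketch under obvious simplifying assumptions (no quoting, no real error handling, no memory management), not production code.

    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct Sexp {
        char *atom;               /* non-NULL for an atom */
        struct Sexp *head, *tail; /* used when atom is NULL (a list cell) */
    } Sexp;

    static Sexp *parse(const char **s);

    static void skip_ws(const char **s) {
        while (isspace((unsigned char)**s))
            (*s)++;
    }

    static Sexp *parse_atom(const char **s) {
        const char *start = *s;
        while (**s && !isspace((unsigned char)**s) && **s != '(' && **s != ')')
            (*s)++;
        Sexp *e = calloc(1, sizeof *e);
        size_t len = (size_t)(*s - start);
        e->atom = malloc(len + 1);
        memcpy(e->atom, start, len);
        e->atom[len] = '\0';
        return e;
    }

    static Sexp *parse_list(const char **s) {
        (*s)++;                             /* consume '(' */
        Sexp *first = NULL, **next = &first;
        for (;;) {
            skip_ws(s);
            if (**s == ')') { (*s)++; return first; }
            if (**s == '\0') { fprintf(stderr, "unclosed list\n"); exit(1); }
            Sexp *cell = calloc(1, sizeof *cell);
            cell->head = parse(s);          /* recurse for each element */
            *next = cell;
            next = &cell->tail;
        }
    }

    static Sexp *parse(const char **s) {
        skip_ws(s);
        return **s == '(' ? parse_list(s) : parse_atom(s);
    }

    static void print_sexp(const Sexp *e) {
        if (!e) { printf("()"); return; }   /* empty list */
        if (e->atom) { printf("%s", e->atom); return; }
        printf("(");
        for (const Sexp *c = e; c; c = c->tail) {
            print_sexp(c->head);
            if (c->tail) printf(" ");
        }
        printf(")");
    }

    int main(void) {
        const char *src = "(define (square x) (* x x))";
        print_sexp(parse(&src));            /* echoes the input back */
        printf("\n");
        return 0;
    }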
This one? https://people.inf.ethz.ch/wirth/CompilerConstruction/Compil...
This ( https://github.com/tpn/pdfs/blob/master/Compiler%20Construct... ) seems to be a previous version (2005) and it's 131 pages long
> And after Volumes 1--5 are done, God willing, I plan to publish Volume 6 (the theory of context-free languages) and Volume 7 (Compiler techniques), but only if the things I want to say about those topics are still relevant and still haven't been said.
Admittedly, volumes 5-7 wouldn't be as massive as volume 4 (it sort of turns out that almost all interesting algorithms end up being categorized as belonging in volume 4), so you probably wouldn't have a half-dozen subvolumes per topic, but it's still too many books down the line, especially if he plans to revise volumes 1-3 before working on anything else.