The Dragon book is not very good, to be honest.

It was probably decent when all you had was something like Pascal and you wanted to write a C compiler.

Parsing, compiling, interpreting, etc. are all much more at home in functional languages, and much easier to understand there. Once you do understand them, you can translate back into imperative code.

For parsing: by default you should be using parser combinators.
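
To make the suggestion concrete, here is a minimal sketch of the parser-combinator idea in Python. None of these names come from a real library; the point is just that a parser is a plain function and that combinators build bigger parsers out of smaller ones.

```python
# Minimal parser-combinator sketch (illustrative, not a real library).
# A "parser" is a function: (string, position) -> (value, new_position) or None.

def char(c):
    """Match a single literal character."""
    def p(s, i):
        return (c, i + 1) if i < len(s) and s[i] == c else None
    return p

def seq(*parsers):
    """Run parsers in order; succeed only if all succeed."""
    def p(s, i):
        vals = []
        for q in parsers:
            r = q(s, i)
            if r is None:
                return None
            v, i = r
            vals.append(v)
        return (vals, i)
    return p

def alt(*parsers):
    """Try parsers in order; return the first success."""
    def p(s, i):
        for q in parsers:
            r = q(s, i)
            if r is not None:
                return r
        return None
    return p

def many(parser):
    """Zero or more repetitions."""
    def p(s, i):
        vals = []
        while True:
            r = parser(s, i)
            if r is None:
                return (vals, i)
            v, i = r
            vals.append(v)
    return p

# Combinators compose: one or more 'a's followed by a 'b'.
ab = seq(char('a'), many(char('a')), char('b'))
```

A real combinator library adds mapping over results, error reporting, and so on, but the compositional shape is the whole idea.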

reply
Is there a production compiler out there that doesn't use recursive descent, preferably constructed from combinators? Table-driven parsers seem now to be a "tell" of an old compiler or a hobby project.
reply
Oh, I was talking much more about how you can first learn how to write a compiler. I wasn't talking about how you write a production industry-strength compiler.

Btw, I mentioned parser combinators: those are basically just a front-end, similar to regular expressions. The implementation can be all kinds of things, e.g. recursive descent or a table or backtracking or whatever. (Even finite automata, if your combinators are suitably restricted.)

reply
I used a small custom parser combinator library to parse Fortran from raw characters (since tokenization is so context-dependent), and it's worked well.
reply
The thing about LR parsers is that since they parse bottom-up, you have no idea what larger syntactic structure is being built, so error recovery is ugly, and giving the user a sensible error message is a fool’s errand.

In the end, all the hard work in a compiler is in the back-end optimization phases. Put your mental energy there.

reply
Some people appreciate that an LR/LALR parser generator can prove non-ambiguity and linear-time parsability of a grammar. A couple of examples are the creator of the Oil shell and one of the people responsible for Rust.

It does make me wonder though about why grammars have to be so complicated that such high-powered tools are needed. Isn't the gist of LR/LALR that the states of an automaton that can parse CFGs can be serialised to strings, and the set of those strings forms a regular language? Once you have that, many desirable "infinitary" properties of a parsing automaton can be automatically checked in finite time. LR and LALR fall out of this, in some way.

reply
Production compilers must have robust error recovery and great error messages, and those are pretty straightforward in recursive descent, even if ad hoc.
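
A sketch of the classic "panic mode" flavor of this, in Python: on a parse error, report it, then skip ahead to a synchronizing token (here `;`) so the statements after the error can still be checked. The tiny `let <name> ;` grammar and all names are made up for illustration, not taken from any particular compiler.

```python
# Panic-mode error recovery in a hand-written recursive-descent style loop:
# on error, record a message and skip tokens until a synchronizing ';'.

SYNC_TOKENS = {";"}

def parse_statements(tokens):
    errors, stmts = [], []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "let":
            # expect: let <identifier> ;
            if i + 2 < len(tokens) and tokens[i + 1].isidentifier() and tokens[i + 2] == ";":
                stmts.append(("let", tokens[i + 1]))
                i += 3
                continue
            errors.append(f"malformed 'let' at token {i}")
        else:
            errors.append(f"unexpected token {tok!r} at token {i}")
        # panic mode: skip to the next synchronizing token and resume there
        while i < len(tokens) and tokens[i] not in SYNC_TOKENS:
            i += 1
        i += 1  # consume the ';' itself
    return stmts, errors
```

Because recovery happens at a point where the parser knows exactly what it was trying to build, the error messages can name that structure, which is the part LR tables make awkward.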
reply
I was just going into the second quarter of compiler design when the dragon book came out. My copy was literally “hot off the press”, still warm from the ink baking ovens. It was worlds better than anything else available at the time.
reply
The Dragon book wasn't good for me either, but I'd disagree about using parser combinators. The problem I see with the Dragon book is that it starts using concepts (phases of compilation) before it introduces and motivates them in the abstract. I can see how people who already know these concepts can look at the Dragon book and say "oh, that's a good treatment of this", so perhaps it's a good reference, but it's problematic for a class and terrible to pick up and try to read standalone (which I did back at Berkeley in the 80s).

As far as I can tell, parser combinators are just one approach that promises to let you "write a compiler without understanding abstract languages", but all these methods actually wind up being libraries that are far more complicated than gp's "recursive descent + Pratt parsing", which is easy once you understand the idea of an abstract language.

reply
Until you need to do more than all-or-nothing parsing :) see tree-sitter for example, or any other efficient LSP implementation of incremental parsing.
reply
It is easily possible to parse at > 1MM lines per second with a well designed grammar and handwritten parser. If I'm editing a file with 100k+ lines, I likely have much bigger problems than the need for incremental parsing.
reply
It's not just speed - incremental parsing allows for better error recovery. In practice, this means that your editor can highlight the code as-you-type, even though what you're typing has broken the parse tree (especially the code after your edit point).
reply
I am a compiler guy, and I completely agree. Parsing is not that hard and not that important. Recursive descent + pratt expressions is almost always the practical choice.
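
For readers who haven't seen the combination: the Pratt part handles infix expressions with a binding-power table and one recursive function. A minimal sketch in Python, with made-up operators and powers (this is the general technique, not any specific compiler's code):

```python
# Minimal Pratt-style expression parser for + - * / and parentheses.
# Higher binding power binds tighter; tokens are pre-split strings.

BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20}

def parse_expr(tokens, pos=0, min_bp=0):
    """Parse tokens[pos:] into a nested-tuple AST; returns (ast, next_pos)."""
    tok = tokens[pos]
    if tok == "(":
        lhs, pos = parse_expr(tokens, pos + 1, 0)
        assert tokens[pos] == ")", "expected ')'"
        pos += 1
    else:
        lhs, pos = tok, pos + 1  # a number or identifier
    while pos < len(tokens):
        op = tokens[pos]
        bp = BINDING_POWER.get(op)
        if bp is None or bp < min_bp:
            break
        # left-associative: the right operand must bind strictly tighter
        rhs, pos = parse_expr(tokens, pos + 1, bp + 1)
        lhs = (op, lhs, rhs)
    return lhs, pos
```

The surrounding recursive-descent parser calls this wherever the grammar expects an expression; everything else (statements, declarations) stays as ordinary mutually recursive functions.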
reply
It's not for toy languages. Most big compilers use recursive descent parsing.
reply
Language design benefits from parser generators that can point out ambiguities and verify a language is easy to parse.
reply
It does not follow that a generated parser would make sense in production code.
reply
Not to step on anyone's toes, I just don't feel that formal grammar theory is that important in practice. :^)

Well, it depends how formal you're talking about. I have to say that the standard you mention, recursive descent parsing + Pratt for expressions, actually requires you to understand what a formal language is: that it's a "thing" that can't (or shouldn't) be an object or a data structure but exists abstractly before any objects created by the program.

Moreover, the standard way of producing a recursive descent parser is to begin with your language in Chomsky normal form or some human-understandable format, then convert to Greibach normal form, and that specification converts readily to your series of recursive functions. So all the language transforms are useful to know (though you can skip steps if you have a good intuition for your language).

reply
> Not to step on anyone's toes, I just don't feel that formal grammar theory is that important in practice. :^)

Exactly this! A thousand times this!

reply
I think even the theory of Regular Languages is somewhat overdone: You can get the essence of what NFAs are without really needing NFAs. You can get O(n) string matching without formally implementing NFAs, or using any other formal model like regex-derivatives. In fact, thinking in terms of NFAs makes it harder to see how to implement negation (or "complement" if you prefer to call it that) efficiently. It's still only linear time!

NFA/DFA/derivative models are mostly unnecessary because, ultimately, REG is just DSPACE(O(1)). That's it. Thinking in any other way is confusing the map with the territory. Furthermore, REG is extremely robust, because we also have REG = DSPACE(o(log log n)) = NSPACE(o(log log n)) = 1-DSPACE(o(log n)). For help with the notation, see here: https://en.wikipedia.org/wiki/DSPACE

reply
Another quick one: to parse infix expressions, when you see something like "x·y | (z | w)", find the operator of least binding power; in my example, I've given "|" less binding power than "·". That visually breaks the expression into two halves, "x·y" and "(z | w)". Recursively parse those two subexpressions. Essentially, that's it.

The symbols "·" and "|" don't mean anything - I've chosen them to be visually intuitive: The "|" is supposed to look like a physical divider. Also, bracketed expressions "(...)" or "{...}" should be parsed first.

Wikipedia mentions that a variant of this got used in FORTRAN I. You could also speed up my naive O(n^2) approach by using Cartesian trees, which you can build using something suspiciously resembling precedence climbing.
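
The naive approach above can be sketched in a few lines of Python. The operators "·" and "|" and their binding powers are just the illustrative ones from the comment; scanning the whole token list at every split is what makes it O(n^2).

```python
# Naive "split at the weakest operator" infix parser: find the operator of
# least binding power at bracket depth 0, split there, recurse on the halves.

POWER = {"|": 1, "·": 2}

def parse(tokens):
    # find the rightmost operator of least binding power at depth 0
    # (rightmost, so equal-power operators come out left-associative)
    best_i, best_p, depth = None, None, 0
    for i, t in enumerate(tokens):
        if t == "(":
            depth += 1
        elif t == ")":
            depth -= 1
        elif depth == 0 and t in POWER and (best_p is None or POWER[t] <= best_p):
            best_i, best_p = i, POWER[t]
    if best_i is None:
        # no top-level operator: an atom, or a fully bracketed expression
        if tokens[0] == "(":
            return parse(tokens[1:-1])
        return tokens[0]
    op = tokens[best_i]
    return (op, parse(tokens[:best_i]), parse(tokens[best_i + 1:]))
```

Note how the bracket-depth counter is what makes "(...)" groups get parsed first: operators inside brackets are never candidates for the split.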

reply
An even easier approach is to give all infix operators the same precedence and force the programmer to group subexpressions.
reply
You can always write Lisp, but most people can read code better that doesn't have that many (((()))))))
reply
I'm sure there's a middle ground which still gives you some of the metaprogramming power of Lisp. OTOH this: https://www.gingerbill.org/article/2026/02/21/does-syntax-ma...
reply