undefined

points

[-]

It seems to me LL and LR parser generators are overrated, and hand-written recursive descent is best in practice. I understand why academics teach them, but not why some spend so long on different parsing techniques, nor why hobbyists who just want to compile their toy language are directed to them.

I work in PL, and from my first compiler to today, have always found recursive descent easiest, most effective (less bugs, better error diagnostics, fast enough), and intuitive. Many popular language compilers use recursive descent: I know at least C# (Roslyn) and Rust, but I believe most except Haskell (GHC) and ocaml.

The LR algorithm was simple once I learned it, and yacc-like LR (and antlr-like LL) parser generators were straightforward once I learned how to resolve conflicts. But recursive descent (at least to me) is simpler and more straightforward.

LR being more expressive than LL has never mattered. A hand-written recursive descent parser is most expressive: it has unlimited lookahead, and can modify parsed AST nodes (e.g. reordering for precedence, converting if into if-else).

The only solution that comes close is tree-sitter, because it implements GLR, provides helpful conflict messages, and provides basic IDE support (e.g. syntax highlighting) almost for free. But it’s a build dependency, while recursive descent parsers can be written in most languages with zero dependencies and minimal boilerplate.

by wglb12 hours ago|

parent|

[-]

> It seems to me LL and LR parser generators are overrated, and hand-written recursive descent is best in practice

I would now agree with that. My compiler experience was on a team that happened to have a LALR expert, Jeanne Musinski PhD, a student of Jeffrey Ullman. She invented a better error recovery for the language. Recursive descent would have been perfectly suited to the task.

> LR being more expressive than LL has never mattered.

Quite agree. One might guess that a language that needs that might be hard to program in.

by _old_dude_13 hours ago|

parent|

prev|

[-]

Parser generators are great in Python (Lark for me) so you can iterate fast and get a runtime spec of your grammar.

A hand-written recursive descent parser is something you do later when you start to industrialize your code, to get better error messages, make the parser incremental, etc.

Bison/ANTLR are code generators, they do not fit well in that model.

by mrkeen14 hours ago|

prev|

[-]

I'll push back and say that the lexer/parser split is well worth it.

And the best thing about the parser combinator approach is that each is just a kind of parser, something like

  type Lexer = ParsecT e ByteString m [Token]

  type Parser = ParsecT e [Token] Expr

All the usual helper functions like many or sepBy work equally well in the lexing and parsing phases.

It really beats getting to the parentheses-interacting-with-ordering-of-division-operations stage and still having to think "have I already trimmed off the whitespace here or not?"

by omcnoe4 hours ago|

parent|

[-]

I am writing a whitespace sensitive parser - trimming whitespace matters because whitespace consumption is used to implement the indentation rules/constraints.

For example, doing things like passing an indentation sensitive whitespace consumer to a parser inside `many` for consuming all of an indented child block. If I split lexing/parsing I think I'd have to do things like insert indentation tokens into the stream, and end up with the same indentation logic (but instead matching on those indentation tokens) in the parser regardless.

I have found that order-of-operations is somewhat trivially solved by `makeExprParser` from `Control.Monad.Combinators.Expr`.