upvote
A common problem I noticed is that if you took certain courses in computer science, you may have a pre-conceived notion of how to parse programming languages, and the shell language doesn't quite fit that model

I have seen this misconception many times

In Oils, we have some pretty minor elaborations of the standard model, and it makes things a lot easier

How to Parse Shell Like a Programming Language - https://www.oilshell.org/blog/2019/02/07.html

Everything I wrote there still holds, although that post could use some minor updates (and OSH is the most bash-compatible shell, and more POSIX-compatible than /bin/sh on Debian - e.g. https://pages.oils.pub/spec-compat/2025-11-02/renamed-tmp/sp... )

---

To summarize that, I'd say that doing as much work as possible in the lexer, with regular languages and "lexer modes", drastically reduces the complexity of writing a shell parser

And it's not just one parser -- shell actually has 5 to 15 different parsers, depending on how you count

I often show this file to make that point: https://oils.pub/release/0.37.0/pub/src-tree.wwz/_gen/_tmp/m...

(linked from https://oils.pub/release/0.37.0/quality.html)

Fine-grained heterogenous algebraic data types also help. Shells in C tend to use a homogeneous command* and word* kind of representation

https://oils.pub/release/0.37.0/pub/src-tree.wwz/frontend/sy... (~700 lines of type definitions)

reply
Author here, and yeah, I agree. I skipped writing a parser altogether and just split on whitespace and `|` so that I could get to the interesting bits.

For side-projects, I have to ask myself if I'm writing a parser, or if I'm building something else; e.g. for a toy programming language, it's way more fun to start with an AST and play around, and come back to the parser if you really fall in love with it.

reply
Can say the same for control characters in terminals. I even think maybe it's just easier to ditch them all and use QT to build a "terminal" with clickable urls, something similar to what TempleOS does.
reply