An interesting idea I saw long ago in some book (I thought it was K&P's "Software Tools," and my second guess was K&R1, but neither of those panned out; a strong Mandela effect) was a whole-document spellchecker that works purely probabilistically, by histograms: you feed it a document, it tallies the trigraphs, and any word containing a trigraph that appears only rarely is flagged as a likely typo. This approach lets through unknown-but-realistic words like "antithematory" while flagging unrealistic words like "prisencolinensinainciusol" (because of its unlikely "ciu" and "ius" clusters) and "antthemaory" (because of "ntt" and "aor").

To make this approach work better, feed it a bunch of English text (or whatever language your document is in) before the document you really want to "spellcheck."
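
Here's a minimal sketch of how I picture it, not the original book's code (which I still can't find). The function names, the `min_count` threshold, and the decision to split on runs of ASCII letters are all my own assumptions:

```python
import re
from collections import Counter

def words(text):
    """Lowercase the text and split it into runs of ASCII letters (an assumption)."""
    return re.findall(r"[a-z]+", text.lower())

def trigraphs(word):
    """Return the overlapping three-letter sequences in a word."""
    return [word[i:i + 3] for i in range(len(word) - 2)]

def tally(text, counts=None):
    """Add every trigraph in the text to a histogram and return it."""
    counts = Counter() if counts is None else counts
    for w in words(text):
        counts.update(trigraphs(w))
    return counts

def flag_suspects(document, background="", min_count=2):
    """Map each suspicious word to the rare trigraphs that got it flagged.

    The histogram covers the optional background text plus the document
    itself, so a cluster the document uses often is not flagged.
    min_count is a hypothetical threshold; tune it for your corpus size.
    """
    counts = tally(background)
    tally(document, counts)
    suspects = {}
    for w in words(document):
        rare = [t for t in trigraphs(w) if counts[t] < min_count]
        if rare:
            suspects[w] = rare
    return suspects
```

With a tiny background like "the cat sat on the mat. the cat sat on the hat." and the document "the cat sat on teh mat", only "teh" gets flagged. With real text you'd want a much bigger background and probably a threshold relative to the total count rather than a fixed number.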

Essentially this isn't a spell "checker" so much as a spell "linter" — it looks for antipatterns statistically associated with bugs, and reports the patterns for further investigation.

If anyone knows where this trigraph-based "spellchecker" was first presented, I'd love to track it down again.

reply
I like that. Might some of that logic have made it to consumer word processing programs?
reply
That's uhh...a language model?
reply
It's a modeling of language, it's not structurally anything like an LLM.
reply
It’s literally a trigram (character) language model. Check any NLP book from before 2015 or so.

LLMs have more stuff bolted onto them (embeddings, RLHF) but the autoregressive core is a direct descendant of that sort of language model.
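
To make the "it's a language model" point concrete, here's a rough sketch of a character trigram model: the same kind of counts, read as the probability of the next character given the previous two. The names, the two-space padding, the add-one smoothing, and the 27-symbol alphabet (a-z plus space) are my assumptions, not anything from the parent comment:

```python
import math
from collections import Counter

def train(text):
    """Count each trigram and the two-character context that precedes its last char."""
    tri, ctx = Counter(), Counter()
    padded = "  " + text.lower()  # two spaces stand in for "start of text"
    for i in range(len(padded) - 2):
        tri[padded[i:i + 3]] += 1
        ctx[padded[i:i + 2]] += 1
    return tri, ctx

def next_char_prob(tri, ctx, context, ch, alphabet_size=27):
    """P(ch | previous two chars) with add-one smoothing.

    alphabet_size=27 assumes the text only uses a-z and space.
    """
    return (tri[context + ch] + 1) / (ctx[context] + alphabet_size)

def log_prob(tri, ctx, word, alphabet_size=27):
    """Score a word left to right, one character at a time (autoregressively)."""
    padded = "  " + word.lower()
    return sum(
        math.log(next_char_prob(tri, ctx, padded[i:i + 2], padded[i + 2], alphabet_size))
        for i in range(len(padded) - 2)
    )
```

Trained on a few sentences of "the cat sat on the mat" style text, `log_prob` scores "the" higher than "teh". Flagging rare trigraphs is roughly the same thing as flagging words where one of these conditional probabilities is very low.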

reply
You used to be able to add your own words to spellcheckers, but somehow that went out the window. I rarely see the option in the context menu for a red-lined word now, and when it does appear, adding the word seems to make no difference at all.
reply
Human copy editors are less than perfect too. I hired one copy editor who I could not trust to be the last person who touched a document before it went out.

I had a friend who wrote an article for the New York Times. The article made a lot of sense before she submitted it, but after it was edited for length and style, it definitely read like a New York Times piece yet didn't completely make sense.

reply