Pasting something directly into the chat interface seems weird, but if you could somehow just see where P(token | context) falls off a cliff, that's a pretty good hint that your writing has a problem.
Even Haiku is massive overkill for this use case.
In comparison to non-AI traditional tools, AI has the advantage of "understanding" the text, reducing the number of "stupid" mis-corrections. And its spelling correctness is usually already impeccable, so what is there to gain by interfacing it with traditional solutions, and how can it be achieved?
To make this approach work better, feed it a bunch of English text (or whatever language your document is in) before the document you really want to "spellcheck."
Essentially this isn't a spell "checker" so much as a spell "linter" — it looks for antipatterns statistically associated with bugs, and reports the patterns for further investigation.
If anyone knows where this trigraph-based "spellchecker" was first presented, I'd love to find out again.
LLMs have more stuff bolted onto them (embeddings, RLHF) but the autoregressive core is a direct descendant of that sort of language model.
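To make the "spell linter" idea concrete, here is a rough sketch of a character-trigram version. It is not the original tool (I don't know where that was presented either); the function names, the `^`/`$` word-boundary padding, and the `min_count` threshold are all assumptions made for illustration. It counts trigrams in a reference text and then reports any word in the document that contains a trigram the reference text never (or rarely) produced.

```python
import re
from collections import Counter

WORD = re.compile(r"[A-Za-z']+")

def trigrams(word):
    # Pad with boundary markers so word starts and endings count as context.
    padded = f"^{word.lower()}$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def train(reference_text):
    # Count every character trigram seen in the reference text.
    counts = Counter()
    for word in WORD.findall(reference_text):
        counts.update(trigrams(word))
    return counts

def lint(document, counts, min_count=1):
    # Report words containing at least one trigram seen fewer than min_count times.
    flagged = []
    for word in WORD.findall(document):
        rare = [t for t in trigrams(word) if counts[t] < min_count]
        if rare:
            flagged.append((word, rare))
    return flagged
```

With a tiny reference text this flags almost everything, which is why feeding it a lot of text in the right language first matters. It also only reports suspicious patterns; it has no idea what the correct word would be.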
I had a friend who wrote an article for the New York Times: the article made a lot of sense before she submitted it, but it was edited for length and style and it definitely read like a New York Times piece but didn't completely make sense.
The problem being misspelling, hence, "spell checker". Like, this seems pretty straightforward? Grammar checking if you cannot use the language properly is a pretty different problem space, and indeed has long existed and is exposed as a separate thing. And not just in fancy word processors either, if you go to something as simple as macOS TextEdit you'll see separate check boxes for "Check spelling as you type" vs "Check grammar with spelling". If someone wants to try out using LLMs for grammar no problem, but spell checking is purely about the mechanical and, importantly, deterministic aspect of typos or outright non-words.
>As the classic joke goes, "Me spell chucker work grate. Need grandma chicken."
There is a genuine touch of irony/meta in you using that here in this context. That sentence has no misspelled words, and importantly gets across the exact humorous meaning the human who wrote it intended. The joke literally only works because a human was able to make creative use of language. If you had an LLM agent posting for you to HN and it automatically changed that to:
>As the classic joke goes, "My spellchecker works great but could use some grammar checking."
Well, where would the joke be now!? This goes to the exact concern people have with powerful non-deterministic meaning-changing tools replacing deterministic meaning-preserving ones.
>I immediately disable spellchecking on every avenue it tries to approach because managing a bunch of dictionaries on every browser/device/application that has its own spellchecker for some godforsaken reason to not have squigglies spammed over every piece of jargon, slang, and slightly atypical spelling is incredibly annoying.
But this is a logic fail, is it not? LLMs are irrelevant to this. Your stated problem is "not all software/devices I use have a single shared dictionary/grammar tool to my preferences". That's a very, very reasonable complaint. I agree with you that it's always been tremendously irritating that so many applications won't even make use of operating system dictionaries but rather recreate their own, really that the entire infrastructure around spelling or grammar dictionaries is so primitive.
But how do you think LLMs help? Even setting aside quality concerns they don't magically retroactively make every software/device use them, they're just another tool in the space something could use, or not. So you're still stuck with the exact same problem. You still don't have something sync'd/shared universally across your entire experience. I can see how you could just live within some single environment to avoid that (do everything in a browser, use the same browser company's products across platforms with sync supported, so you can use the browser language tools for everything), but again that's not unique to LLMs. That approach would work for conventional tools as well.
>I just fed this entire thread to an LLM
This is a second logic fail. The entire point and meaning of "non-determinism" is precisely that you can't just do something once and then have that be evidence. If we all did the "same thing", feeding every thread to an LLM, thousands of times we wouldn't all get identical results every time. Sometimes we'd get something else. And the very fact it's rare is one of the core challenges of this entire space, because humans are very, very bad at dealing with things where it works 99% of the time and fails 1% of the time. This has always been true.
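The "once is not evidence" point can be put in numbers. As a sketch, assuming each run is independent and fails with the same probability p (a simplification), n clean runs in a row happen with probability (1 - p)^n, so the largest p still consistent with seeing them at a given confidence level is 1 - alpha^(1/n). The function name below is hypothetical.

```python
def failure_rate_upper_bound(clean_runs, confidence=0.95):
    # One-sided upper bound on the per-run failure rate after observing
    # `clean_runs` consecutive runs with no failure.
    # Assumes independent runs with a constant failure probability.
    if clean_runs < 1:
        raise ValueError("need at least one run")
    alpha = 1 - confidence
    return 1 - alpha ** (1 / clean_runs)
```

Under those assumptions a single clean run only rules out failure rates above about 95%, and 100 clean runs still leave room for a failure rate just under 3%.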
It is not. The LLM approach is not dependent on system configurations. You can expect that it probably works the same from any device or application, because it can surmise slang/jargon from training and context rather than needing to be fed every little individual case as a per-user configuration. There are advantages to making a program more sophisticated than a literal == check against a list of pre-programmed words.
And even if there were an easy and satisfying way to unify dictionaries cross-device, it still wouldn't be a pleasant experience. That first time adding every single jargon term you use is not enjoyable. If there was a solution that just... didn't require that, it would solve a problem current spellcheckers do not solve. And what do you know, it appears there is one!
> This is a second logic fail.
Saying things are logic fails doesn't make them logic fails, all the more so when the failure is your own reading comprehension. I explicitly noted that non-determinism doesn't need to be flawless, only better than the deterministic solution on average. If the non-deterministic error rate of LLMs is below 1%, that still puts it far, far, far ahead of the deterministic tool's error rate.
It may be possible to create a deterministic tool that is better on average, but I haven't seen one. The current tooling is so fucking horrendously bad that after decades they cannot handle pluralising any uncommon word that is pluralised with "ies", for example squiggly is recognised and squigglies is not. That is fucking shamefully bad technology.
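For what it's worth, the "ies" case does not need an LLM. A minimal sketch of a suffix fallback, assuming the dictionary is just a set of lowercase singular words (the function name and rule table are made up for illustration, not taken from any real spellchecker):

```python
def is_known(word, dictionary):
    # Accept exact matches first.
    w = word.lower()
    if w in dictionary:
        return True
    # Then try regular plural endings: strip the suffix, add the singular ending.
    for suffix, singular_ending in (("ies", "y"), ("es", ""), ("s", "")):
        if w.endswith(suffix) and len(w) > len(suffix):
            if w[: -len(suffix)] + singular_ending in dictionary:
                return True
    return False
```

With "squiggly" in the dictionary this accepts "squigglies". The tradeoff is that naive rules over-accept: the bare "s" rule also lets "squigglys" through.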
How is it not dependent? Like, help me out here: I'm writing something up in vim on my FreeBSD system using the built-in dictionary capability, maybe I've got grammar too with LanguageTool via the ALE plugin. I've added various words to my good words list over time. I save it to a network drive and want to keep working on it and do some graphical formatting as well for output to a different audience with a different tool on an iPad for a flight. How does "the LLM approach" uniquely slot into vim and the iPad app? "Uniquely" as in a way that you couldn't slot in a shared sync'd dictionary file or whatever else. What if one of the developers doesn't want to and I don't have time or (if it's closed source) can't? How does it help all the other different software I use that are still using their own thing?
If by "LLM approach" you specifically mean "I copy/paste into this whole other software, and that software is what I use from different platforms" well, that's nice but it's not an "LLM approach" it's a "copy/paste into different software" approach which again could be done with whatever.
>I explicitly noted that non-determinism doesn't need to be flawless, only better than the deterministic solution on average.
But how do you know what the "average" is? You can't get that from a single shot. And what's the upside vs downside of false positives or false negatives or meaning changes/hallucinations? That's also a point of contention, particularly when it comes to any problem space (coding of course, but also law, medicine etc) where precision in language is important even 1% of the time. And you clearly have an intense personal issue here around grammar/spelling that is not universally shared. Which is fine, but the tradeoffs you're willing to make are also going to be personal. It's also going to vary, just as with using LLMs for coding, based on the user. Some people are sufficiently capable with language to realistically be able to expect to double check an LLM and mostly do fine. It's a lot riskier though for someone with a weak grasp to depend on.
> But how do you know what the "average" is? You can't get that from a single shot.
I don't know what the average is. I never made a claim that LLMs are categorically better than spellcheckers; I simply said it's hard to imagine they'd be worse, given how bad spellcheckers already are, and that I understand why people would be willing to give a non-deterministic tool a try, contrary to it being stated like doing so was the dumbest thing imaginable and that spellchecking was a 'solved problem'.
You're correct that one shot is not a statistical analysis, but multiple people were throwing around assertive claims that LLMs rewrite entire sentences and change their meaning when prompted to spellcheck, or that LLMs were incapable of handling a joke with intentional misspellings being integral to the joke, both of which seemed incorrect on their face to me, so I gave it a try. LLMs are typically conditioned to a high degree of mode collapse, so I do expect that if I retried the same prompt and context on the same model 100 times, it probably would give approximately the same output at least 90/100 times, if not 99, but I'm not presenting a thesis here.
> And what's the upside vs downside of false positives or false negatives or meaning changes/hallucinations?
Sure, these are valid considerations. I would not, under any circumstances, let an LLM touch my legal documents for any reason. However, the stakes for spellchecking an internet comment are non-existent, so one could easily imagine trading the downsides for the benefit of not being nagged by squigglies.
> And you clearly have an intense personal issue here around grammar/spelling
I really don't, actually. As I mentioned, I disable spellcheckers on sight, and I don't use LLMs for spellchecking myself. I rely on my own two eyes for spellchecking, and sometimes I miss things, which is an outcome I'm okay with. Spellcheckers, then, are not something I ever think about, beyond the time it takes to disable them after being nagged on a new device or application. I do take offense to calling such a laughably poor state of technology a "solved problem", though, and the sneering at people attempting to find new solutions to it. There is absolutely nothing wrong with attempting to iterate on a bad status quo.
I would also note that I think the non-determinism could also be solved to an appreciable degree by simply having the integrated LLM tool offer suggestions, which require human approval to correct, much as current squigglies operate but perhaps with a lower failure rate on average. Or not! But it's an area I can see value in exploring, anyways.
An example of a sentence like this with correct spelling but bad grammar would be "my spell checker works good." All of the words are what they're meant to be, but the last word is not the correct part of speech.
But because computers are good at detecting "this doesn't match any known word" and bad at detecting "this matches a word but isn't the word you meant to use here," we've redefined "spell checking" to mean "find words that don't match any known word."
Your point about the joke is not correct. If I put my comment into ChatGPT and ask for a grammar check, it recognizes that it's a joke with deliberately bad grammar and suggests leaving it alone. If I put my comment into a grammar checker, it flags multiple errors in the joke. And "deterministic meaning-preserving ones"? Traditional spell/grammar checkers may be deterministic, but at no point have they ever been guaranteed to preserve meaning, or even been particularly good at it.
It actually is clear, because words have meaning. "Spelling" refers specifically to the order of letters forming a given word [0, 1]. The proper use of words within a sentence, "the study of the classes of words, their inflections, and their functions and relations in the sentence" [2], is the definition of "grammar"!
>I'd argue it's a spelling mistake.
Perhaps so, you're welcome to invent your own special snowflake definitions for words without much relation to decades/centuries of usage. It's a free country. But I would and will argue you are incorrect to do so and then expect to communicate with other humans.
----
0: https://www.merriam-webster.com/dictionary/spell
Right. And "the given word" in that particular example means "well" and is spelled G R E A T. G R A T E is a misspelling of that word.
Your position doesn't make any sense when you boil it down. I write some word as some sequence of letters. Whether it's correctly spelled depends not only on how that word is spelled, but how all other words, completely unrelated, are also spelled?
Let's say someone meant to write "bite" but wrote "byte" back in 1950. That's a misspelling. Did it retroactively become a grammar error when the word "byte" was coined in 1956? Or does the word have to exist at the time of writing for it to be a grammar error instead of a spelling error?
It's a lot more consistent if you consider the spelling relative to the word that's supposed to be there and accept that computer spell checkers miss the case where a misspelling happens to match a different word.
"Grate" is a real word, and it is correctly spelled. In fact, within the purpose of the joke, it's even correctly used! But even if someone were to write that sentence out with no joke meaning, because perhaps they had learned English as a second language purely phonetically and were just trying to write things as they sounded, it'd be a grammar issue not a spelling one. Same as more common IRL hiccups like their/there, or its/it's. We even have words in the English language specifically to describe that, like "homonym".
>Your position doesn't make any sense when you boil it down.
No, it's your position that makes no sense. You are effectively arguing that the word "grammar" shouldn't exist! There is in fact an objective difference between mechanically misspelling words and incorrectly using a homonym.
>I write some word as some sequence of letters. Whether it's correctly spelled depends not only on how that word is spelled, but how all other words, completely unrelated, are also spelled?
As I said, you're free to invent your own special snowflake definitions. But what you are writing is not in fact the shared definition at all. You for some reason are very determined to conflate "spelling" with "grammar". I linked you a few major sources, but this is not an area of contention, it has been consistently used for a very long time including in computers. It's even had plenty of attention over the decades. I still remember when a grammar checker was added for the first time to Microsoft Word and the debates about its quality (or lack thereof). There are even whole UX patterns around this, like coloring the squiggly lines below writing differently depending on if it's a spelling check error (commonly red) or grammar check (often blue). Precisely because grammar checking is harder and has often been iffier many people will disable it but leave spell checking on, because they're confident enough in their grammar and don't trust the computer, but don't want to accidentally post or send a message with "great" or "grate" as "graeyte".
Edit: in reply to wat10000 doing the ol' virtual "good day to you SIR!" below:
I said it was a snowflake definition, given it's completely contrary to every dictionary and historical usage. I didn't call you yourself a snowflake.
And what's actually really fucking infuriating is when people like you simply refuse to use standard, shared dictionary definitions of words and widely used established software tools in your conversation and then further refuse to acknowledge it when corrected. And also refuse to engage with any substance and instead storm off in a virtual huff. You could have just gone "right I meant grammar correction, present grammar correction really kind of sucks and that's what I think people need most vs spelling correction" and that'd be that.
Of course, LLMs are non-deterministic and do occasionally make mistakes, so you have to use them correctly and review their output. You shouldn't paste a doc into the web UI and tell it "fix all the mistakes and write the output to a new file." You should instead have it present each mistake and fix to the user as a diff and let the user approve or deny, either within the application or allowing the user to make their own edits. Never let it "rewrite" the whole document, that's the document-editing equivalent of giving OpenClaw root on your personal computer. Nothing good will come of it.
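A rough sketch of that review loop, using Python's standard `difflib`. The "corrected" text is assumed to come from whatever model you use; nothing here calls one. `propose_edits`, `apply_edits`, and the `approve` callback are hypothetical names, and splitting on whitespace is a simplification that would flatten real document formatting.

```python
import difflib

def propose_edits(original, corrected):
    # Word-level diff: each edit is (start, end, replacement_words) against the original.
    a, b = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [(i1, i2, b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

def apply_edits(original, edits, approve):
    # Apply only the edits the user approves; everything else stays untouched.
    words = original.split()
    # Work from the end so earlier indices remain valid after each splice.
    for i1, i2, replacement in sorted(edits, key=lambda e: e[0], reverse=True):
        if approve(words[i1:i2], replacement):
            words[i1:i2] = replacement
    return " ".join(words)
```

In a real tool `approve` would show each change and wait for a yes or no. The point of the design is that the model's output is never written anywhere directly; it only ever becomes a list of small, individually rejectable changes.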
Classic spell checkers can't detect homophones. E.g. "there" and "their." Grammar checkers can, but at least the ones that I have used also like to change the tone of my writing to sterile corporate PC speak. LLMs used for grammar checking have not, in my experience, meddled with my tone. (Although sometimes they try to admonish me for it!)
Most grammar checker packages also include style checking, and the default options tend toward that style (because that’s the big market for them.) Most of them are also configurable, so you can disable style checking entirely while still checking grammar, or tweak which style rules are applied.
Traditional methods might not be perfect, but they also easily fit in the memory of even low power devices. Perhaps it isn't a problem worth burning a dollar of tokens for every spelling mistake.
Don't do a stupid thing like that in the first place.
> In comparison to non-AI traditional tools, AI has the advantage of "understanding" the text, reducing the number of "stupid" mis-corrections.
I doubt it, but if that's true, run a normal spell checker, and then give the output to your LLM to filter.
> what is there to gain by interfacing it with traditional solutions,
About a billionfold improvement in compute efficiency, and a lower error rate.
> and how can it be achieved?
10 seconds of actual thought.
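Spelled out, the "normal spell checker first, LLM as a filter" pipeline might look something like this. It is only a sketch: the word list is a plain set, and `keep_flag` is a hypothetical callback standing in for the model call (no real API is used or implied). The model only ever sees the handful of words the cheap pass flagged, and can only suppress a flag, never rewrite text.

```python
import re

def dictionary_flags(text, dictionary):
    # Cheap deterministic pass: every token that is not in the word list.
    return [(m.start(), m.group())
            for m in re.finditer(r"[A-Za-z']+", text)
            if m.group().lower() not in dictionary]

def filtered_flags(text, dictionary, keep_flag):
    # Second pass: keep_flag(text, position, word) decides whether a flag is
    # a real mistake or just jargon, a name, or slang. Assumed to wrap an LLM.
    return [(pos, word)
            for pos, word in dictionary_flags(text, dictionary)
            if keep_flag(text, pos, word)]
```

If the filter is unavailable or too slow, you fall back to the plain dictionary flags and lose nothing compared to today.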
Check out left pad or the two dozen other "utility" packages that could be done in a single line of code.
I guess that works if you aren't a programmer or don't want to hire somebody, but then wtf would I pay for your service or product?