You can learn how to use it, or you can put it down if you think it doesn't bring you any benefit.
So are animals, but we've used dogs and falcons and truffle-hunting pigs as tools for thousands of years.
Non-deterministic tools are still tools, they just take a bunch more work to figure out.
https://simonwillison.net/2025/Dec/10/html-tools/ is the 37th post in my series about this: https://simonwillison.net/series/using-llms/
https://simonwillison.net/2025/Mar/11/using-llms-for-code/ is probably still my most useful of those.
I know you absolutely hate being told you're holding them wrong... but you're holding them wrong.
They're not nearly as unpredictable as you appear to think they are.
One of us is misleading people here, and I don't think it's me.
Firstly, I am not the one with an LLM-influencer side gig.
Secondly: no, sorry, please don't move the goalposts. You did not answer my main argument, which is: how does a "tool" which constantly changes its behaviour deserve to be called a tool at all? If a tailor had scissors which sometimes cut the fabric just a bit, and sometimes cut it completely differently, every time they used them, would you tell the tailor he is not using them right too?
Thirdly, you are now contradicting yourself. First you said we need to live with the fact that they are unpredictable. Now you are sugarcoating it into being "a bit unpredictable", or "not nearly as unpredictable". I am not sure whether you are doing this intentionally or you really want to believe in the "magic", but either way you are ignoring the basic tenets of how this technology works. I'd be fine if they were used to generate cheap holiday novels or erotica, but clearly four years of experimenting with the crap machines to write code has created a huge pushback in the community. We don't need the proverbial scissors which cut our fabric differently each time!
Let's go with blast furnaces. They're definitely tools. They change over time - a team might constantly run one for twenty years but still need to monitor and adjust how they use it as the furnace itself changes behavior due to wear and tear (I think they call this "drift".)
The same is true of plenty of other tools - pottery kilns, cast iron pans, knife sharpening stones. Expert tool users frequently use tools that change over time and need to be monitored and adjusted.
I do think dogs, horses and other animal tools remain an excellent example here as well. They're unpredictable and you have to constantly adapt to their latest behaviors.
I agree that LLMs are unpredictable in that they are non-deterministic by nature. I also think that this is something you can learn to account for as you build experience.
I just fed this prompt to Claude Code:
Add to_text() and to_markdown() features to justhtml.html - for the whole document or for CSS selectors against it
Consult a fresh clone of the justhtml Python library (in /tmp) if you need to
It did exactly what I expected it would do, based on my hundreds of previous similar interactions with that tool: https://github.com/simonw/tools/pull/162
I wrote about another solid case study this morning: https://simonwillison.net/2025/Dec/14/justhtml/
I genuinely don't understand how you can look at all of this evidence and still conclude that they aren't useful for people who learn how to use them.
Now let's make the analogy more accurate: let's imagine the blast furnace often ignores the operator controls, and just does what it "wants" instead. Additionally, there are no gauges and there is no telemetry you can trust (it might have some that the furnace will occasionally falsify, but you won't know when it's doing that).
Let's also imagine that the blast furnace changes behavior minute-to-minute (usually in the middle of the process) between useful output, useless output (requires scrapping), and counterproductive output (requires rework which exceeds the productivity gains of using the blast furnace to begin with).
Furthermore, the only way to tell which one of those 3 options you got is to manually inspect every detail of every piece of every output. If you don't do this, the output might leak secrets (or worse) and bankrupt your company.
Finally, the operator would be charged for usage regardless of how often the furnace actually worked. At least this part of the analogy already fits.
What a weird blast furnace! Would anyone try to use this tool in such a scenario? Not most experienced metalworkers. Maybe a few people with money to burn. In particular, those who sing the highest praises of such a tool would likely be ignorant of all these pitfalls, or have a vested interest in the tool selling.
Absolutely wrong. If this blast furnace cost a fraction of what other blast furnaces cost, and allowed you to produce certain metals that were previously too expensive to produce (even with a high error rate), almost everyone would use it.
Which is exactly what we're seeing right now.
Yes, you have to distinguish marketing message vs real value. But in terms of bang for buck, Claude Code is an absolute blast (pun intended)!
Totally incorrect: as we already mentioned, this blast furnace actually costs just as much as every other blast furnace to run all the time (which they do). The difference is only in the outputs, which I described in my post and now repeat below, with emphasis this time.
Let's also imagine that the blast furnace changes behavior minute-to-minute (usually in the middle of the process) between useful output, useless output (requires scrapping), and counterproductive output ——>(requires rework which exceeds the productivity gains of using the blast furnace to begin with)<——
Does this describe any currently-operating blast furnaces you are aware of? Like I said, probably not, for good reason.
I couldn't agree more.
I did not say that. I said that most metalworkers familiar with all the downsides (only 1 of which you are referring to here) would avoid using such an unpredictable, uncontrollable, uneconomical blast furnace entirely.
A regular blast furnace requires the user to be careful. A blast furnace which randomly does whatever it wants from minute to minute, producing bad output more often than good, including bad output that costs more to fix than the furnace cost to run, more than any cost savings, with no way to tell or meaningfully control it, is pretty useless.
Saying "be careful" when using a machine with no effective observability, predictability or controls is a silly misnomer, when no amount of care will give the machine any of them.
What other tools work this way, and are in widespread use? You mentioned horses, for example: What do you think usually happens to a deranged, rabid, syphilitic working horse which cannot effectively perform any job with any degree of reliability, and which often unpredictably acts out in dangerous and damaging ways? Is it usually kept on the job and 'run carefully'? Of course not.
Wow, was that a shark just then?
Dogs learn their jobs way faster, more consistently and more expressively than any AI tool.
Trivially, dogs understand "good dog" and "bad dog" for example.
Reinforcement learning with AI tooling clearly does not seem to work.
That doesn't match my experience with dogs or LLMs at all.
They fully understand their limitations. Users of accessibility technology are extremely good at understanding the precise capabilities of the tools they use - which reminds me that screenreaders themselves are a great example of unreliable tools due to the shockingly bad web apps that exist today.
I've also discussed the analogy to service dogs with them, which they found very apt given how easily their assistive tool could be distracted by a nearby steak.
The one thing people who use assistive technology do not appreciate is being told that they shouldn't try a technology out themselves because it's unreliable and hence unsafe for them to use!