> Not really, those are exactly the things said by people who dabble with LLMs a little...

From the footnote in section 1.5:

  The examples I give in this essay are mainly from major commercial models (e.g. ChatGPT GPT-5.4, Gemini 3.1 Pro, or Claude Opus 4.6) in the last three months; several are from late March. Several of them come from experienced software engineers who use LLMs professionally in their work. Modern ML models are astonishingly capable, and they are also blithering idiots. This should not be even slightly controversial.

I wonder just how Scottish the Scotsman has to be before you'll let him order a drink.

> And in my experience it takes a surprising amount of time and practice to learn how to leverage AI effectively.

Let's ignore, for a minute, the fact that people who actually use these things as part of their day jobs were consulted, which moots this complaint.

Every six-ish months we hear "Wow. All the past commentary on LLMs is completely invalid. These new models aren't just a step change — they're a whole new way of working.".

If we consider only that datapoint, it's pretty obvious that you're not missing out on much by choosing to just work on skills that are universally applicable and "evergreen". But when you add to that the fact that every six-ish months we also hear "Wow. These new revs of the LLM products are just as stupid and nondeterministic as the old ones. They also still make the same classes of stupid mistakes, are pretty much as dangerously unreliable as they always have been [0], and, just like previous versions, have 'capability rot' that cannot be anticipated, but might be caused by an inability to handle current demand, deliberate shifting of backend resources to serve newer, more-hyped LLM products, or even errors in the vibecoded vendor-supplied tooling that interfaces with the backend.", the decision to ignore the FOMO and hype becomes pretty obviously correct.

> I mean, the series literally ends with "maybe I'll try to code with it."

Well, this is how the series ends:

   The security consequences are minimal, it’s a constrained use case that I can verify by hand, and I wouldn’t be pushing tech debt on anyone else. I still write plenty of code, and I could stop any time. What would be the harm?
   
   Right?
   
   ...Right?

There's a certain subtlety to this that you missed. [2]

If we ignore that subtlety, I expect that your retort to a report that goes "Wow. They suck just as hard at coding for me as they do for everything else I've attempted to use them for. I'm not surprised, because I've talked to professional programmers who regularly use these things in their day jobs and I'm getting results that are similar to what they've been reporting to me." will be "Bro. You didn't spend enough time learning how to use it, bro!".

By way of analogy, I'll also mention, somewhat crassly, that one doesn't have to have an enormous bosom to understand that all that weight can cause substantial back pain. One can rely on both one's informed understanding of the fundamentals behind the system under consideration and first-hand testimony from enormous-bosom-equipped people to arrive at that conclusion.

[0] E.g. [1], and many, many other examples.

[1] <https://github.com/anthropics/claude-code/issues/39201>

[2] Your failure to notice that subtlety makes me wonder how often you use LLMs to summarize lengthy technical articles that you read.

reply
> Several of them come from experienced software engineers who use LLMs professionally in their work.

So, not from personal experience. And we don't know which examples came from which users or what they used them for. We get enough hearsay on HN and, again, there's nothing in this series that has not been discussed here. There is, however, a ton of other hearsay missing from the series, which is the utility so many people are finding (in many cases, along with actual data or open source projects).

> Every six-ish months we hear ...

I've been yelling about LLMs since early 2024 [0]! They needed much more "holding it right" back then. Now it's way easier, but the massive potential was clear way back then.

> They also still make the same classes of stupid mistakes, are pretty much as dangerously unreliable as they always have been.

Yes, and this is where a lot of the skill in managing them comes into play. Hint: people are dangerously unreliable too.

> One can rely on both one's informed understanding of the fundamentals behind the system under consideration, as well as first-hand testimony from enormous-bosom-equipped people to arrive at that conclusion.

Of course, but when faced with many contradictory opinions, I prefer data. And the preponderance of data I've looked at and discussed [0] paints a very different picture.

> There's a certain subtlety to this that you missed.

From TFA:

> I want to use them. I probably will at some point.

My complaint is that he is speaking entirely from second-hand information and provides no new insight of his own. That he has trepidation about actually getting his hands dirty with them does not change that, and it only makes it worse that he spent 10 pages going on about them! He's a technologist, not a journalist! So, I'm genuinely curious: what subtlety did I miss?

[0] Available in my comment history. To allay suspicion that I only engage in breathless boosterism, some relevant comments about the negatives: https://news.ycombinator.com/item?id=47405189 or https://news.ycombinator.com/item?id=46830919

reply
[1] is so bad, like the worst imaginable thing you can think of... like, if this is a possible fuckup, all bets are off as to what other fuckups you might need to deal with. I got hit with this problem several times and I was like "well this is just impossible..." Absolutely mind-blown.