That hasn't, universally, been my experience. Sometimes the code is fine. Sometimes it is functional, but organized poorly, or does things in a very unusual way that is hard to understand. And sometimes it produces code that might work sometimes but misses important edge cases and isn't robust at all, or does things in an incredibly slow way.
> They have no problem writing tedious guards against edge cases that humans brush off.
The flip side of that is that instead of coming up with a good design that doesn't have as many edge cases, it will write verbose code that handles many different cases in similar, but not quite the same ways.
> They also keep comments up to date and obsess over tests.
Sure but they will often make comments or tests that aren't actually useful, or modify tests to succeed instead of fixing the code.
One significant danger of LLMs is that the quality of the output is higly variable and unpredictable.
That's ok, if you have someone knowledgeable reviewing and correcting it. But if you blindly trust it, because it produced decent results a few times, you'll probably be sorry.
> Sure but they will often make comments or tests that aren't actually useful, or modify tests to succeed instead of fixing the code.
I've been deeply concerned that there's been a rise of TDD. I thought we already went through this and saw its failure. But we're back to we're people cannot differentiate "tests aren't enough" from "tests are useless". The amount of faith people put into tests is astounding. Especially when they aren't spending much time analyzing the tests and understanding their coverage. > They don't take shortcuts or resort to ugly hacks.
My experience is quite different > They have no problem writing tedious guards against edge cases that humans brush off.
Ditto.I have a hard time getting them to write small and flexible functions. Even with explicit instructions about how a specific routine should be done. (Really easy to produce in bash scripts as they seem to avoid using functions, but so do people, but most people suck at bash) IME they're fixated on the end goal and do not grasp the larger context (which is often implicit though I still find difficulty when I'm highly explicit. Which at that point it's usually faster to write myself)
It also makes me question context. Are humans not doing this because they don't think about it or because we've been training people to ignore things? How often do we hear "I just care that it works?" I've only heard that phrase from those that also love to talk about minimum viable products because... frankly, who is not concerned if it works? That's always been a disagreement about what is sufficient. Only very junior people believe in perfection. It's why we have sayings like "there's no solution more permanent than a temporary fix that works". It's the same people who believe tests are proof of correctness rather than a bound on correctness. The same people who read that last sentence and think I'm suggesting to not write tests or believe tests are useless.
I'd be concerned with the LLM operator quite a bit because of this. Subtle things are important when instructing LLMs. Subtle things in the prompts can wildly change the output
It gave up, removed the code it had written directly accessing the correct property, and replaced it with a new function that did a BFS to walk through every single field in the API response object while applying a regex "looksLikeHttpsUrl" and hoping the first valid URL that had https:// would be the correct key to use.
On the contrary, the shift from pretraining driving most gains to RL driving most gains is pressuring these models resort to new hacks and shortcuts that are increasingly novel and disturbing!