I can understand a junior developer, or a team of developers with varying levels of experience and a lot of pressure to deliver, producing crummy code — but not the very tool that's supposed to be the state-of-the-art coder.
Why not? It is subject to the same pressures; in fact, it faces more time pressure than most corporate code out there. Also, it's the model that's doing the coding, not the frontend tool.
As a user of terrible products, I only care about code quality insofar as the product is crap (Spotify, I'm looking at you), or it takes forever for it to evolve/improve.
Biz people don't care about quality, and they're notoriously short-sighted. Whoever nerfed Google's search is angering millions of people as we speak.
I wouldn't say that customers are indifferent, but it wouldn't be the first time that investor expectations are prioritized far above customer satisfaction.
I don't actually think it's a solved problem, I'm saying that the fact that it generates terrible code doesn't necessarily mean that it doesn't have parity with humans.
Yeah, we even have an idiom for this - "Temporary is always permanent"
But as a great man once said: Later == Never.
Absolutely. The difference is that the amount of bad code that could be generated had an upper limit on it — how fast a human can type it out. With LLMs bad code can be shat out at warp speed.
I think the better unit to commit and work with is the prompt itself; the prompt is the thing that should be PR'd at this point, because ultimately the spec is what's important.
The fundamental problem there is the code generation step is non-deterministic. You might make a two sentence change to the prompt to fix a bug and the generation introduces two more. Generate again and everything is fine. Way too much uncertainty to have confidence in that approach.
Also, people aren't actually reading through most of the code that is generated or merged, so if there's a fear of deploying buggy code generated by AI, then I assure you that's already happening. A lot.