by ryanjshaw 16 hours ago

by symfrog 14 hours ago

The closer you get to releasing software, the less useful LLMs become. They tend to go into loops of 'Fixed it!' without having fixed anything.

In my opinion, attempting to hold the LLM's hand via English prompts for the 'last mile' to production-ready code runs into a fundamental problem: the ambiguity of natural language.

In my experience, developers who believe LLMs are good enough for production are either building systems that are not critical (e.g. 80% correct is good enough), or they lack the experience to detect how LLM-generated code would fail in production beyond the 'happy path'.

by Tanjreeve 4 hours ago

The amount of "apps" I've had dumped on my team, everything from un-releasable to deployed on some random shit-cloud we haven't approved (Vercel comes up a lot). If you needed hand-holding to release things, or had to throw software over the fence to others to "productionise" it, then you probably don't know what you're talking about.

by empath75 14 hours ago

This is not my experience with Claude Code. It does forget big-picture things, but if you scope your changes well it’s fine.

by symfrog 14 hours ago

I would estimate that out of every 200 lines of code that Claude Code produces, I notice at least 1 issue that would cause severe problems in production.

In my opinion these discussions should include MREs (minimal reproducible examples) in the form of prompts to ground the discussion.

For example, take this prompt and put it into Claude Code. Can you see the problematic ways it handles transactions?

---

The invoicing system is being merged into the core system that uses Postgres as its database. The core system has a table for users with columns user_id, username, creation_date. The invoicing data is available in a json file with columns user_id, invoice_id, amount, description.

The data is too big to fit in memory.

Your role is to create a Python program that creates a table for the invoices in Postgres and then inserts the data from the json file. Users will be accessing the system while the invoices are being inserted.

---
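
As a reference point for the prompt above, here is a minimal sketch of one way the load could be written. It is not what Claude Code produces, and not necessarily the issue symfrog has in mind. It assumes the file is newline-delimited JSON (one invoice per line), which the prompt does not state, and it uses Python's built-in sqlite3 module as a stand-in so it runs without a server. The names read_invoices, batched, load_invoices and BATCH_SIZE are hypothetical. Each batch is committed in its own short transaction, so a failure rolls back only the current batch and locks are not held for the whole load, and ON CONFLICT DO NOTHING lets a rerun skip rows that were already committed.

```python
import itertools
import json
import sqlite3
from typing import Iterable, Iterator

BATCH_SIZE = 1000  # assumed value; tune for the real workload


def read_invoices(path: str) -> Iterator[dict]:
    """Yield one invoice per line so the file never has to fit in memory.

    Assumes newline-delimited JSON; a single huge JSON array would need
    a streaming parser instead.
    """
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def batched(rows: Iterable[dict], size: int) -> Iterator[list]:
    """Group an iterator of rows into lists of at most `size` rows."""
    it = iter(rows)
    while batch := list(itertools.islice(it, size)):
        yield batch


def load_invoices(conn: sqlite3.Connection, rows: Iterable[dict],
                  batch_size: int = BATCH_SIZE) -> int:
    """Create the invoices table and insert rows; return committed batches."""
    # Assumes users.user_id is the primary key of the existing users table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices ("
        " invoice_id INTEGER PRIMARY KEY,"
        " user_id INTEGER NOT NULL REFERENCES users(user_id),"
        " amount NUMERIC NOT NULL,"
        " description TEXT)"
    )
    conn.commit()

    committed = 0
    for batch in batched(rows, batch_size):
        try:
            conn.executemany(
                "INSERT INTO invoices (invoice_id, user_id, amount, description)"
                " VALUES (:invoice_id, :user_id, :amount, :description)"
                " ON CONFLICT (invoice_id) DO NOTHING",
                batch,
            )
            conn.commit()  # one short transaction per batch
            committed += 1
        except Exception:
            conn.rollback()  # discard only the failed batch, then surface the error
            raise
    return committed
```

Against Postgres the same shape should carry over with a driver such as psycopg, whose named placeholders are written %(invoice_id)s rather than :invoice_id.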

by zozbot234 9 hours ago

And that's why you ask for a high-level plan for something like that before you let the agent write any code. Then you review the plan for flaws, revise it, and prompt the system to fill out more details for each step. Repeat as necessary. Yes, it's slow, but it's the best way of using this "glorified autocomplete" to ease and speed up real work.

by snackerblues 6 hours ago

People that have never written their own code won't know what the flaws are.

by edgyquant 12 hours ago

What he’s saying is: split this up into multiple tasks (create the table, insert the data, etc.).

by cmiles74 11 hours ago

Isn’t that the hard part? If the tasks are small enough and well defined, where’s the win over just writing the code right there and then?

by flagos10 10 hours ago

You can use an LLM to generate that list of tasks.

by snackerblues 6 hours ago

And how does a new grad that's never actually programmed know whether that list of tasks makes sense?

by empath75 10 hours ago

Well, Claude can also refine it into smaller tasks, and that’s where you can fix the issues that would cause major problems in production.

by ajshahH 13 hours ago

Yes, but knowing how to scope your changes requires a lot of expertise.

by Roark66 15 hours ago

After 2 years of using all of these tools (Claude Code, Gemini CLI, opencode with all models available) I can tell you it is a huge enabler, but you have to provide these "expert guardrails" by monitoring every single deliverable.

For someone who is able to design an end-to-end system by themselves these tools offer a big time saving, but they come with dangers too.

Yesterday I had a mid-level dev on my team proudly present a web tool he "wrote" in Python (to be run on localhost) that runs kubectl in the background and presents things like the versions of images running in various namespaces. It looked very slick; I can already imagine the product managers asking for it to be put on the network.

So what's the problem? For one, no threading whatsoever, no auth, all queries run in a single thread, and on and on. A maintenance nightmare waiting to happen. That is the risk of a person who knows something, but not enough, building tools by themselves.

by ryanjshaw 15 hours ago

Yup. I’m no expert so maybe I’m completely off base, but if I were OpenAI or Anthropic I’d likely just hire 1000 highly skilled engineers across multiple disciplines, tell them to build something in their domain of expertise, then critique the model’s output, iteratively work on guardrails for a month or two until the model one-shots the problem, and package that into the new release.

by LiamPowell 14 hours ago

That's exactly what they are doing via dataannotation.tech and other services.

by kopirgan 13 hours ago

Any comments on how the copyright issues are handled in corporate settings? I mean both in terms of staying clear of lawsuits and ensuring what we produce remains safe from copying.

by cmiles74 11 hours ago

I can take a verbal description from a meeting with five to ten people and put together something they can interact with in two weeks. That is a lot slower than Claude Code! Yet everywhere I’ve worked, this is more than fast enough.

Over two more weeks I can work with those same five to ten people (who often disagree or have different goals) and get a first draft of a feature or small, targeted product together. In those latter two weeks, writing code isn’t what takes time; working through what people think they mean versus what they are actually saying, and mediating between groups when they disagree (or mostly agree), is the work. And then, after that, we introduce a customer. Along the way I learn to become something of an expert in whatever the thing is and continue to grow the product, handing chunks of responsibility to other developers, at which point it turns into a real thing.

I work with AI tooling and leverage AI as part of products, where it makes sense. There are parts of this cycle where it is helpful and time saving, but it certainly can’t replace me. It can speed up coding in the first version but, today, I end up going back and rewriting chunks and, so far, that eats up the wins. The middle bit it clearly can’t do, and even at the end when changes are more directed it tends toward weirdly complicated solutions that aren’t really practical.

by geraneum 12 hours ago

> poor adherence to high-level design principles and consistency. This can be solved with expert guardrails, I believe.

That’s a bit… handwavy…!

by bakugo 14 hours ago

I've been hearing this for several years. How much longer is "it won't be long"?

by bluGill 9 hours ago

I've heard the same "it won't be long" from UML and 4GL - until the industry finally gave up. Both of those are still used a lot in industry and they do well in their place, but nobody pretends they will ever be everything to everyone anymore.
