This. Once you're building something that genuinely hasn't been built before, LLMs cannot be trusted with any architectural decisions. I'm building a product based around various physics simulations, so it's purely first principles, but without active research, thinking, and challenging, it produces computational code literally hundreds of orders of magnitude slower WHILE implementing absurd fallbacks and shortcuts that effectively result in a useless calculation.

This is the case perhaps 95% of the time.

Oversight is very important, and architectural thinking cannot yet be outsourced, only execution.

by physicsguy1 hours ago|

I have had a similar experience when trying it too. I couldn't even drive Claude Opus 4.7 to get PETSc to compile properly (with all the optional dependencies).

by ex-aws-dude1 hours ago|

I find that too. Claude Code is constantly trying to break the architecture patterns and do hacky stuff.

Like it's only focused on solving the local problem as easily as possible.

by MagicMoonlight2 hours ago|

[dead]

by mellosouls2 hours ago|

LLMs routinely fail at our business specifics: Local tax regulations, particularities of the accounting process, specifics of our ledger implementations.

This is domain expertise, and software engineers are not needed for that. Of course, senior SWEs are often experts in it, but they aren't necessary.

Traditionally it's been useful for frictionless production to have engineers able to do maybe 90% of their work without consulting the business experts, but this is the whole crux of the moment TFA discusses: "tradition" is over.

In this new world it's now the job of a senior engineer not to have this domain expertise themselves, but to know how to ensure the agents have it, or can acquire it, and that it is verifiably correct.

Senior engineers who hang on to the idea that their advanced business domain expertise makes them safe will soon be as dead in the water as juniors who haven't pivoted.

by causal3 hours ago|

I can't even get Claude or GPT-5 to consistently produce good flows for common use cases, much less domain-specific shit. They have a deep vocabulary, though, which makes them sound better informed than they are.

They are very good at writing code and debugging visible errors, but that's like 50% the harness.

by worldthruword4 hours ago|

> LLMs routinely fail at our business specifics: Local tax regulations, particularities of the accounting process, specifics of our ledger implementations.

Would a skill that forces you and the LLM to reach a shared understanding of the product features, and of the regulations those features are supposed to capture, be of help here? The main idea is that we provide documents to the LLM and it asks a lot of questions, which clears up ambiguity and any misconceptions the LLM might have. I would suggest taking a look at skills. They are really helpful.

https://www.youtube.com/watch?v=6BB6exR8Zd8

by rdedev2 hours ago|

> The main idea is we provide documents to the LLM and it asks lot of questions which clear ambiguity and possible misconceptions the LLM might have

This kind of works, but the difficulty is that you have to be very explicit about everything. A spec document mentioned that a particular Excel file is the source of truth throughout the whole company and is treated as an append-only database. The agent still decided to add a check to see if a previous row was modified. It pushed back when asked why it had done so: "What if someone entered it wrong and had to correct it?" Valid question, but it's not my team's responsibility to check for it.

This check makes sense from a traditional development viewpoint, and that's why the agent did it. I would say it's good practice too, but it's beyond the scope of the project it was working on. If what you are doing is beyond the norm, you have to watch out for things like this.
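
To make the scope difference concrete, here is a minimal sketch of what "treat it as append-only" looks like in code. It assumes the sheet has been exported to CSV so that only the standard library is needed; `read_new_rows`, `rows_already_seen`, and the `id`/`amount` columns are hypothetical names for illustration, not anything from the actual spec.

```python
import csv
import io


def read_new_rows(source, rows_already_seen):
    """Return rows appended since the last run, trusting the file as append-only."""
    reader = csv.reader(source)
    header = next(reader)  # first line holds the column names
    rows = list(reader)
    # In scope: only look at rows past the ones already processed.
    # Deliberately no hash or diff of earlier rows; under the stated
    # assumption, whoever owns the file guarantees they never change.
    return [dict(zip(header, row)) for row in rows[rows_already_seen:]]


# Hypothetical data: two rows were processed on a previous run, one is new.
ledger = io.StringIO("id,amount\n1,100\n2,250\n3,75\n")
new_rows = read_new_rows(ledger, rows_already_seen=2)
print(new_rows)
```

The out-of-scope version would also re-read and compare the first two rows on every run, which is the extra check the agent added on its own.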

by causal3 hours ago|

Sure, but finding their shortcomings and patching them with skills takes real trial and error. They are incapable of identifying their own shortcomings for you.

by enraged_camel4 hours ago|

>> LLMs routinely fail at our business specifics: Local tax regulations, particularities of the accounting process, specifics of our ledger implementations.

My company also deals with a lot of complex regulations and domain-specific system implementations, which AIs used to struggle with. We were able to solve the problem with well-organized claude.md/agents.md files. On top of that we also implemented supermemory.ai, so newly made decisions are always recalled by AI agents when starting new sessions.