This is the way.

I am doing something similar: I have a parser that looks for changes in documentation, matches them against the GraphQL schema, and generates code using Apollo. In a nutshell, it's a code generator written with Claude that generates more code; on failure it goes back to Claude to fix the generator, and asks a human for review.
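For the curious, the loop is roughly this (just a sketch; every function name and body here is a stand-in, and the real versions call the docs parser, GraphQL schema introspection, and the Claude API):

```python
# Sketch of the doc-driven codegen loop: diff docs against schema,
# run the generator, and on failure hand the error back to Claude.
# All functions are hypothetical stubs standing in for real services.

def diff_docs_against_schema(docs: dict, schema: dict) -> list[str]:
    """Return fields documented but missing from the schema (stub)."""
    return [field for field in docs if field not in schema]

def run_generator(changed_fields: list[str]) -> tuple[bool, str]:
    """Run the Apollo codegen step; return (success, output) (stub)."""
    if not changed_fields:
        return True, "nothing to generate"
    return True, "generated stubs for " + ", ".join(changed_fields)

def ask_claude_to_fix_generator(error: str) -> str:
    """Would send the failing output to Claude for a patch (stub)."""
    return "proposed patch for: " + error

def codegen_cycle(docs: dict, schema: dict, max_retries: int = 2) -> str:
    changed = diff_docs_against_schema(docs, schema)
    for _ in range(max_retries):
        ok, output = run_generator(changed)
        if ok:
            return output  # from here it goes to human review
        ask_claude_to_fix_generator(output)  # patch, then retry
    return "escalate to human"
```

The human-review step is the important part: the generated code never ships without someone signing off.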

reply
There are lots of APIs with poor or nonexistent documentation. I'm talking about internal systems where one programmer who kinda knew what he was doing built a proof of concept, and now it's a core business requirement.
reply
I’m sorry, but this is a caveman mentality. What about tests, payloads, integrating into the existing system, logs, etc.? An LLM is perfect for that: you can point a skill at the docs to harvest and learn from them, which saves tokens.

I’m not going to trust a scripted codegen without any logic for something like API integration.

reply
Define "it" in the context of "doing it wrong".

The post provides a lot of good food for thought based on experience, which is exactly what the title conveys.

reply
> There are two obvious approaches: start with lots of guardrails, or start with very few and learn what the models actually do.

> We chose the second because we didn’t want to overfit our assumptions.

> Some of it went better than expected.

> But they also broke in very unexpected ways, sometimes spectacularly.

You clearly missed the whole point of the article, which is to experiment with agents and explore the limits of having them run wild.

Efficient use of tokens and choosing which tasks to delegate are secondary to the experiment. Optimizing these is in any case premature if you don't understand the limits of the models.

reply
> which is to experiment with agents

I think you completely missed the point - they built a product purely using agents and deployed it to production for others to use. Read what the product actually does first.

reply
Why shouldn't they ship it to production if the experiment was a success? You say the only way to code is to "learn to appropriate the correct usage of algorithms and AI", which for you means coding a generator and only using "dumb" generators to produce code. That's fine, but they just showed that for 20 bucks and a few minutes you can get very far, so their evidence is simply stronger than yours.
reply
> their evidence is just stronger than yours.

What evidence? There is 0 evidence. It's deployed to production, but that doesn't mean it works fine or is free of bugs - which is exactly my point and why you use algorithms for these types of things. They're testable, repeatable and scalable.

With LLM slop it's just that - slop.

reply
Have you seen the code to write it off as slop?
reply