> However, I think what they’re really worried about is that a person needs to design and implement that stuff… It throws a wet blanket on their insistence that this will replace entire people in entire workflows or even projects, and I just don’t buy it.

I think you are on to something. But I also think this sort of system lends itself to not needing really good LLMs to do impressive things. I've noticed that the quality of a lot of these LLMs just gets worse the more datapoints they need to track. But if you break the work up into smaller, easier-to-consume chunks, all of a sudden you need a much less capable LLM to get results comparable to or better than the SOTA.

Why pay extra money for Opus 4.7 when you could run Qwen 3.6 35b for free and get similar results?
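
A rough sketch of what "break it up into smaller chunks" can look like in practice: each model call sees only a few datapoints, and one last call merges the partial results. Everything here is an assumption for illustration; `ask_model` is a hypothetical stand-in for whatever model you run (local or hosted), and the prompts are placeholders.

```python
from typing import Callable, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def summarize_in_chunks(
    items: List[str],
    ask_model: Callable[[str], str],  # hypothetical: prompt in, text out
    chunk_size: int = 5,
) -> str:
    """Summarize many items without giving any single call all of them."""
    # Map step: each call only has to track `chunk_size` datapoints.
    partials = [
        ask_model("Summarize these items:\n" + "\n".join(chunk))
        for chunk in chunked(items, chunk_size)
    ]
    if not partials:
        return ""
    if len(partials) == 1:
        return partials[0]
    # Reduce step: one final call combines the short partial summaries.
    return ask_model("Combine these summaries:\n" + "\n".join(partials))
```

Whether a small model really matches a big one this way depends on the task; the sketch only shows the shape of the decomposition.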

by devin8 hours ago

And then you realize that what you’re using the smaller models for is ALSO decomposable, and part of it is just a few if statements. And then you realize that for this feature you don’t actually need or want a model at all, because plain code is cheaper and gives you and your users better performance, reliability, and reproducibility.
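
To make the "just a few if statements" point concrete, here is a minimal sketch that handles the obvious cases with plain rules and only falls back to a model when no rule matches. The ticket-routing scenario, the labels, and the `ask_model` callable are all hypothetical.

```python
import re
from typing import Callable, Optional


def route_by_rules(ticket: str) -> Optional[str]:
    """Deterministic routing for the easy cases; None means no rule matched."""
    text = ticket.lower()
    if re.search(r"\b(refund|chargeback)\b", text):
        return "billing"
    if "password" in text or "log in" in text:
        return "account"
    return None


def route(ticket: str, ask_model: Callable[[str], str]) -> str:
    """Try the rules first; call the (hypothetical) model only as a fallback."""
    label = route_by_rules(ticket)
    if label is not None:
        return label  # fast, free, and reproducible
    return ask_model(ticket)
```

If the fallback branch is rarely hit, that is a hint the feature may not need a model at all.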

by jimbokun8 hours ago

So you have the model write the if statements and put itself out of a job.

by aleqs5 hours ago

Indeed, I've been experimenting with agent workflows for complicated tasks, where I essentially have a graph of agents with different roles and capabilities, including agents whose job is to break complex tasks down into simpler ones. There seems to be a point where a complex enough task is better performed by a group of cheaper agents/models than by one agent using one of the big SOTA models, in terms of both quality and cost.
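
For anyone wondering what a "graph of agents" might look like in code, here is a small sketch, not the setup described above. It assumes each agent is just a function from its upstream outputs to a string and uses the standard library's `graphlib` to run them in dependency order; the role names in the usage note are made up.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set

# An agent takes the outputs of the agents it depends on and returns text.
Agent = Callable[[Dict[str, str]], str]


def run_graph(agents: Dict[str, Agent], deps: Dict[str, Set[str]]) -> Dict[str, str]:
    """Run every agent after its dependencies and collect all outputs."""
    results: Dict[str, str] = {}
    # static_order() yields each node only after all of its predecessors.
    for name in TopologicalSorter(deps).static_order():
        inputs = {dep: results[dep] for dep in deps.get(name, set())}
        results[name] = agents[name](inputs)
    return results
```

With hypothetical roles, `deps = {"planner": set(), "coder": {"planner"}, "reviewer": {"coder"}}` would run the planner first, hand its output to the coder, then pass the coder's output to the reviewer. In a real setup each function would wrap a call to some cheaper model.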

by tempest_11 hours ago

It is also interesting because you get people arguing about the effectiveness of various models while doing very different things with them.

It's one thing for a model to be very clearly instructed to add a REST endpoint to an existing Django app and wire a button to it on the front end; it's another to be told "Design me a youtube". The smaller models can pretty dependably do the first and fall flat on the second.

by pishpash14 hours ago

Aren't they just buying time to build you whatever harness you need? They want to be the only software engineering shop in the world.

by user3428312 hours ago

The designing and implementing of a code harness in your workflow can be as simple as running something like /skill-builder.

You prompt for what you want it to do, and it will write e.g. Python scripts as needed for the looping part, and use something like claude -p for the LLM call.

You can build this in 10 minutes.
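
A minimal sketch of the kind of script meant here, assuming the `claude` CLI is installed and on your PATH and that `claude -p "<prompt>"` prints a single response to stdout. The prompt wording and the task list are placeholders, and the `llm` parameter is only there so you can swap in a different backend or a stub.

```python
import subprocess
from typing import Callable, List


def call_claude(prompt: str) -> str:
    """Run `claude -p <prompt>` once and return its stdout.

    Assumes the claude CLI is on PATH; raises CalledProcessError on failure.
    """
    done = subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True,
        text=True,
        check=True,
    )
    return done.stdout.strip()


def run_loop(tasks: List[str], llm: Callable[[str], str] = call_claude) -> List[str]:
    """The 'looping part': one small, independent LLM call per task."""
    return [llm(f"Do this task and reply with the result only: {task}") for task in tasks]
```

Calling `run_loop(["rename foo to bar in utils.py"])` would shell out once per task; passing a different `llm` function lets you point the same loop at a local model instead.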

I don’t use a cloud platform, so I can’t comment on that part. I’d say just run it on your own hardware; it’s probably cheaper too.