(* explained at https://news.ycombinator.com/item?id=26998308)
The power dynamics are also vastly against me. I represent a fraction of my employer's labour, but my employer represents 100% of my income.
That dynamic is totally inverted with AI. You are a rounding error on their revenue sheet, they have a monopoly on your work throughput. How do you budget an workforce that could turn 20% more expensive overnight?
If you're talking about output quality, then yeah, that's not as easy. But for product outputs (building a customer service agent or something like that), having a well-designed eval harness and doing testing and iteration can get you some degree of convergence between the models of similar generations. Coding is similar, but probably not easy to eval.
It is transferable-yes, you will get issues if you take prompts and workflows tuned for one model and send them to another unchanged. But, most of the time, fixing it is just tinkering with some prompt templates
People port solutions between models all the time. It takes some work, but the amount of work involved is tractable
Plus: this is absolutely the kind of task a coding agent can accelerate
The biggest risk is if your solution is at the frontier of capability, and a competing model (even another frontier model) just can’t do it. But a lot of use cases, that isn’t the case. And even if that is the case today, decent odds in a few more months it won’t be
This is why there are a ton of corps running the open source models in house... Known costs, known performance, upgrade as you see fit. The consumer backlash against 4o was noted by a few orgs, and they saw the writing on the wall... they didnt want to develop against a platform built on quicksand (see openweb, apps on Facebook and a host of other examples).
There are people out there making smart AI business decisions, to have control over performance and costs.
If you've got something to share I'd love to see it.
This is an architecture that people are increasing begging to give network connectivity that can't differentiate its system prompt from user input
I'd also flip your framing on its head. One of the advantages of human labor over agents is accountability. Someone needs to own the work at the end of the day, and the incentive alignment is stronger for humans given that there is a real cost to being fired.