That's not a problem, that's a feature; I have something like 8 tabs open to different free-tier providers. ChatGPT, Claude and Gemini are the SOTA ones.
I have no problem maxing one out, then moving to the next. I can do this all day, have them implement specific functions (or classes) in my code. The things is, because I actually know how to write and design software, I don't need to run an agent in a loop to produce everything in a day, I can use the web chatbots with copy/paste to literally generate thousands of lines of code per hour while still having a strong mental model of the code that I can go in and change whatever I need to.[1]
---------------------
[1] Just did that this morning on a Python project: because I designed what I needed, each generation was me prompting for a single function. So when I needed to add something this morning I didn't even bother asking an chatbot to do it, I just went ahead directly to the correct place and did it.
You can't do that if you generate the entire thing from specs.
The feature of using all these SOTAs to exhaustion on the free tiers is burning their VC money!
The more I use for free, the more of their money I burn, the closer we'll get to actual 3rd-party and independent setups (local or otherwise).
I have a sneaking suspicion this is kinda like the situation with Linux in the 90s, where it kinda worked but it reeeeeally wasn't ready for the home user, but you had a lot of people who would insist to your face everything was fine, mostly for ideological reasons.
I'm currently running both Sonnet 4.6 and Qwen 3.6-27b on the same codebase (via OpenCode, the parameters were carefully tuned to have a good quality/context size ratio), and on this project, they both struggle with complex non-trivial tasks, and both work flawlessly otherwise. Sonnet 4.6 understands the intent better if my task is ambiguously formulated, but otherwise the gap is pretty small for coding under a harness.
Different usage patterns - you want to issue a single spec then walk away and come back later (when it has consumed $10k worth of API tokens inside your $200/m subscription) to a finished product.
Many people issue a spec for a single function, a single class or similar. When you break it down like that, the advantages of SOTA models shrinks.
What do you mean "trust it"? It sounds like you want to vibe-code (never look at the output), and maybe for that you need SOTA, but like I said in a different comment, I can easily generate 1000s of lines of code per hour just prompting the chatbots.
I don't, because I actually review everything, but I can, and some of those chatbots are actually SOTA anyway.
With subpar models I must be more careful on providing instructions and check it step by step because the path it chose is wrong, or I didn't ask for or the agent stuck in a loop somewhere.
I’ve begun to suspect that most people are probably running different hardware. Sure, you run the latest deep flash on your brand new M5 128G maybe you get acceptable performance?
But honestly, how many people have an extra $9000 laying around these days?
Right now, running with acceptable performance is kind of a luxury. I wish the people who always say - “This is great!” - would realize that not everyone has their hardware.