I have a suite of tools I've built for myself on top of the OpenRouter API for very specific tasks. Press button and the LLM does (one) useful thing, not press button and let the LLM run tool calls in a loop for 5 minutes and hope it does things in the correct order.
If multiple tools need to be called to do a useful thing, I chain them together deterministically in my code. This is much more reliable, as I can check the output of A before proceeding to task B or C, and it's also more time- and token-efficient. Agentic loops are a huge scam.
Granted, I've let it mostly vibecode those tools, so they might be garbage. I should perhaps have it do a refactoring round to make the tools more composable.
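A minimal sketch of that kind of deterministic chaining, not the actual tools. Everything named here is hypothetical: `llm` stands in for whatever function wraps the OpenRouter request and returns the model's text, and the step names, prompts, and checks are made up for illustration.

```python
from typing import Callable

# Hypothetical type: any function that takes a prompt and returns the model's text.
LLM = Callable[[str], str]


class StepFailed(Exception):
    """Raised when a step's output fails its check, so the chain stops early."""


def run_step(llm: LLM, prompt: str, check: Callable[[str], bool], name: str) -> str:
    # One call, one job. The output is validated before anything else runs.
    out = llm(prompt).strip()
    if not check(out):
        raise StepFailed(f"{name}: unexpected output {out!r}")
    return out


def summarize_then_tag(llm: LLM, text: str) -> dict:
    # Step A: summarize. The check is a plain length bound (illustrative).
    summary = run_step(
        llm,
        f"Summarize in one sentence:\n{text}",
        check=lambda s: 0 < len(s) <= 300,
        name="summarize",
    )
    # Step B only runs if A passed its check; the order is fixed in code,
    # not chosen by the model.
    label = run_step(
        llm,
        f"Answer with exactly one word, bug or feature:\n{summary}",
        check=lambda s: s.lower() in {"bug", "feature"},
        name="classify",
    )
    return {"summary": summary, "label": label.lower()}
```

Because the model is passed in as a plain function, each step can be exercised with a stub before spending any tokens, and a bad output from A raises `StepFailed` instead of silently feeding B.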
> though the new models (GPT-5.5 and Opus 4.6) seem to suffer from this less
> My takeaway was that
> haven’t found Gemini to be
For the love of all that's holy, folks, please stop investing your time in filling the gaps that the Slop Corporations are leaving wide open in their "tooling". Why should you strain yourself trying to "make it work" one way or another? Google, MS, Meta, OpenAI etc. are all now subtly pushing to call their tooling "Intelligence" (not even Artificial Intelligence), so why is it not intelligent? Why does it not work? 1T+ in investments, and we are still supposed to come up with the best magic chants and configurations to make the slop generators produce half-valid output? All while some of the tech leaders are openly threatening to subdue us in their weird visions of "civilisation"? We have better uses for our superior brains; let's not degrade ourselves into being helpless helpers to the magic oracle (if only it were at least a magic oracle!)