As I understand it, no LLM is miles ahead of the others right now, especially when it comes to simple agentic stuff. Hell, Qwen3.6-35B-A3 quantized to 3 bits and running on an 8-year-old consumer GPU handles most agentic stuff fine, if a bit slowly.
Differences between LLMs mostly boil down to the harness and the compute available to run the models. Even for high-complexity tasks like coding, the differences between OpenAI, Anthropic, Google, and the bigger Qwen models aren’t that dramatic.