For example, a lot of work is being done right now on improving tool calling and agentic workflows, and tool calling only started popping up for local LLMs around the end of 2023.
This is putting aside the standard benchmarks, which get "benchmaxxed" by local LLMs and show impressive numbers, while the same models rarely meet expectations when used with OpenCode. In theory, Qwen3.5-397B-A17B should be close to Sonnet 4.6, but it is not.
Local LLMs don't make sense for most people compared to "cloud" services, even more so for coding.
I wonder why people rave so much about local LLMs on "small devices" compared with what Codex or Claude Code are capable of.
Sadly, there is too much hype around local LLMs; they look great in 5-minute tests and that's it.