upvote
Great name, but ironically hard to reason about from a role perspective, at least at the read me.

Does this interfere with cache hits? Could a single conversation or task span multiple roles?

Why are you building this? Does this maximize my toxen value by saving the hard tasks for the hard model? Does it maximize cache hits as part of its scoring? Does it help agents develop a specialist mindset? Are you anticipating users will have many local models hot, or is this also a model load/unload controller?

reply
I'm building this to achieve a state where I can, as a user and on my own device, decide that certain type of workloads should be handled by my Qwen model and keep the data on my device, while other workloads should be handled by more capable models.

For this we don't just need a router, because the information to make detailed and accurate routing decisions currently doesn't exist. And there are no standards but every lab and maybe even inference providers have their own way of implementing reasoning, chat templates, cache, tool use and so on. All issues that make models non-interoperable.

What we need is applications that clearly specify their requests so they can be accurately routed to a provider, whether local or remote. And for that they need to use a standard protocol for model requests and intent.

I wrote a longer piece here: https://news.ycombinator.com/item?id=48706181

reply