I'm building this so that I can, as a user and on my own device, decide that certain types of workloads should be handled by my Qwen model, keeping the data on my device, while other workloads go to more capable models.

A router alone isn't enough for this, because the information needed to make detailed and accurate routing decisions doesn't currently exist. There are no standards either: every lab, and maybe even every inference provider, has its own way of implementing reasoning, chat templates, caching, tool use and so on. All of these issues make models non-interoperable.

What we need are applications that clearly specify their requests so that they can be routed accurately to a provider, whether local or remote. For that, they need a standard protocol for expressing model requests and intent.
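
To make that concrete, here is a rough sketch of what a request carrying its own intent could look like, and how a router could act on it. Everything in it is an assumption made up for illustration: the field names (task, data_sensitivity, needs_tools, min_capability), the capability scale, and the provider names are hypothetical and not part of any existing standard or of my project.

```python
from dataclasses import dataclass


# Hypothetical request envelope: none of these fields come from a real standard.
@dataclass(frozen=True)
class ModelRequest:
    prompt: str
    task: str                 # e.g. "summarize", "plan"
    data_sensitivity: str     # "private" or "public"
    needs_tools: bool = False
    min_capability: int = 1   # assumed scale: 1 = small local model, 3 = large remote model


# Hypothetical provider description, also invented for this sketch.
@dataclass(frozen=True)
class Provider:
    name: str
    local: bool
    capability: int
    supports_tools: bool


def route(request, providers):
    """Pick a provider that satisfies what the request declares about itself."""
    candidates = [
        p for p in providers
        if p.capability >= request.min_capability
        and (p.supports_tools or not request.needs_tools)
        # Private data may only go to providers running on the device.
        and (p.local or request.data_sensitivity != "private")
    ]
    if not candidates:
        raise LookupError("no provider satisfies the declared requirements")
    # Prefer local providers, then the least capable one that still qualifies.
    return min(candidates, key=lambda p: (not p.local, p.capability))


PROVIDERS = [
    Provider("local-qwen", local=True, capability=1, supports_tools=False),
    Provider("remote-large", local=False, capability=3, supports_tools=True),
]

private_job = ModelRequest("Summarize my notes", "summarize", "private")
heavy_job = ModelRequest("Plan a refactor", "plan", "public",
                         needs_tools=True, min_capability=3)

print(route(private_job, PROVIDERS).name)
print(route(heavy_job, PROVIDERS).name)
```

The point is that the router here only reads what the application declared. It never has to guess from the prompt whether the data is private or whether tool use is needed, and a request that no provider can satisfy fails loudly instead of silently leaving the device.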

I wrote a longer piece here: https://news.ycombinator.com/item?id=48706181
