by nok22kon 8 hours ago

by dd8601fn 4 hours ago

Mine does this, except that I don’t pass the whole context to the router, because that’s wildly resource-intensive and slow.

Then, a bit like OpenRouter, it runs a classification step with a fast model to choose which model should process the turn.

In my case I usually don’t route local vs. remote, although it can. These days I use it to choose thinking vs. no-think on my preferred local model, which is a huge time saver even with the added classification step.
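A minimal sketch of the per-turn routing described above, assuming a small window of recent turns is enough for the classifier. The names `Turn`, `recent_window`, `route_turn`, and `keyword_classifier` are hypothetical, and the keyword check is only a stand-in for a call to a small, fast model; this is not the commenter's actual setup.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    role: str
    content: str


def recent_window(history: List[Turn], max_turns: int = 2, max_chars: int = 2000) -> str:
    """Build a short classifier prompt from the last few turns only,
    instead of sending the whole conversation to the router."""
    tail = history[-max_turns:]
    text = "\n".join(f"{t.role}: {t.content}" for t in tail)
    return text[-max_chars:]


def route_turn(
    history: List[Turn],
    classify: Callable[[str], str],
    default: str = "no-think",
) -> str:
    """Ask the fast classifier for a label and fall back to the default
    if it returns anything other than the two known labels."""
    label = classify(recent_window(history)).strip().lower()
    return label if label in {"think", "no-think"} else default


def keyword_classifier(prompt: str) -> str:
    """Stand-in for a fast model call (assumption): flags turns that
    look like they need extended reasoning."""
    hard_markers = ("prove", "debug", "refactor", "why")
    lowered = prompt.lower()
    return "think" if any(m in lowered for m in hard_markers) else "no-think"


history = [
    Turn("user", "hi"),
    Turn("assistant", "hello"),
    Turn("user", "Please debug this stack trace"),
]
mode = route_turn(history, keyword_classifier)
```

The returned label would then decide whether the turn goes to the model with thinking enabled or disabled. Falling back to the default label keeps a misbehaving classifier from blocking the turn.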

by cyanydeez 7 hours ago

I'm still waiting for an isolated protocol so we don't have to run the harness directly on any of the codebase's infrastructure. Something as simple as piping everything into and out of an ssh shell would be better than anything I've tested so far.

by try-working 6 hours ago

I've created the protocol, role-model: https://github.com/try-works/role-model

by niles 6 hours ago

Great name, but ironically it's hard to reason about from a role perspective, at least going by the README.

Does this interfere with cache hits? Could a single conversation or task span multiple roles?

Why are you building this? Does this maximize my token value by saving the hard tasks for the hard model? Does it maximize cache hits as part of its scoring? Does it help agents develop a specialist mindset? Are you anticipating users will have many local models hot, or is this also a model load/unload controller?

by try-working 5 hours ago

I'm building this to achieve a state where I can, as a user and on my own device, decide that certain types of workloads should be handled by my Qwen model, keeping the data on my device, while other workloads should be handled by more capable models.

For this we need more than a router, because the information required to make detailed and accurate routing decisions currently doesn't exist. There are no standards; every lab, and maybe even every inference provider, has its own way of implementing reasoning, chat templates, caching, tool use and so on. All of these issues make models non-interoperable.

What we need are applications that clearly specify their requests so they can be accurately routed to a provider, whether local or remote. For that, they need to use a standard protocol for model requests and intent.

I wrote a longer piece here: https://news.ycombinator.com/item?id=48706181
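
To make the idea of applications declaring intent a bit more concrete, here is a rough sketch of what such a request and a user-side routing policy could look like. Everything in it is an assumption for illustration: `ModelRequest`, `Provider`, `select_provider`, and all field names are hypothetical and are not taken from role-model or any existing standard.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelRequest:
    """Hypothetical request in which the application states its intent."""
    prompt: str
    task: str                           # e.g. "summarize", "code-review"
    data_must_stay_local: bool = False  # user policy: data may not leave the device
    needs_reasoning: bool = False
    needs_tools: bool = False


@dataclass(frozen=True)
class Provider:
    """Hypothetical description of what a local or remote provider offers."""
    name: str
    local: bool
    supports_reasoning: bool
    supports_tools: bool


def select_provider(
    req: ModelRequest, providers: Sequence[Provider]
) -> Optional[Provider]:
    """Return the first provider (in the user's preference order) that
    satisfies every declared requirement, or None if none does."""
    for p in providers:
        if req.data_must_stay_local and not p.local:
            continue
        if req.needs_reasoning and not p.supports_reasoning:
            continue
        if req.needs_tools and not p.supports_tools:
            continue
        return p
    return None


providers = [
    Provider("local-qwen", local=True, supports_reasoning=False, supports_tools=False),
    Provider("remote-large", local=False, supports_reasoning=True, supports_tools=True),
]
private = ModelRequest("Summarize my notes", task="summarize", data_must_stay_local=True)
hard = ModelRequest("Plan this migration", task="planning", needs_reasoning=True)
chosen_private = select_provider(private, providers)
chosen_hard = select_provider(hard, providers)
```

Because the request carries its own requirements, the routing decision here needs no classifier at all. A request that no provider can satisfy, such as one that must stay local but also needs reasoning, returns `None` rather than silently going to a remote model.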