Agree that routing is becoming the critical layer here. vLLM's Iris looks really promising for this: https://blog.vllm.ai/2026/01/05/vllm-sr-iris.html

There's already some good work on router benchmarking, which is pretty interesting.

reply
At 16k tokens/s, why bother routing? We're talking about execution that's multiple orders of magnitude faster and cheaper.

Abundance supports different strategies. One approach: set a deadline for a response, send the turn to every AI that could possibly answer, and when the deadline arrives, cancel any request that hasn't completed yet. You know a priori which models have the highest quality in aggregate; of the ones that finished, pick the best.
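
A minimal sketch of that deadline strategy in Python, assuming an async client. call_model, the model names, and the ranking are illustrative stand-ins, not a real API:

    import asyncio

    # Known a priori, best first (illustrative names).
    MODELS_BY_QUALITY = ["model-a", "model-b", "model-c"]

    async def call_model(model: str, prompt: str) -> str:
        # Placeholder: swap in a real HTTP call to your provider here.
        await asyncio.sleep(0.5)  # simulate network + generation latency
        return f"{model} answer to: {prompt}"

    async def race_with_deadline(prompt: str, deadline_s: float = 2.0) -> str:
        tasks = {m: asyncio.create_task(call_model(m, prompt))
                 for m in MODELS_BY_QUALITY}
        # Wait until the deadline, then cancel anything still running.
        done, pending = await asyncio.wait(tasks.values(), timeout=deadline_s)
        for t in pending:
            t.cancel()
        # Among the models that finished, take the highest-ranked one.
        for model in MODELS_BY_QUALITY:
            task = tasks[model]
            if task in done and task.exception() is None:
                return task.result()
        raise TimeoutError("no model finished before the deadline")

    print(asyncio.run(race_with_deadline("What is 2 + 2?")))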

reply
The best coding model won’t be the best roleplay model, which won’t be the best at tool use. Which model is "best" depends on what you want to do.
reply
I'm not saying you're wrong, but why is this the case?

I'm out of the loop on training LLMs, but to me it's just pure data input. Are they choosing to include more code rather than, say, fiction books?

reply
First there's pre-training, where the model passively reads text from the web.

From there you go to RL training, where humans grade model responses, or the AI writes code to try to pass tests and learns what makes them pass, etc. The RL phase is pretty important because it's not passive: it can target the model's weaker areas, so you can effectively train on more data than the sum of recorded human knowledge.
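
To make the "writing code to pass tests" part concrete, the reward signal can be as crude as whether the generated solution passes a test suite. A toy sketch (assumes pytest is installed; nothing here is any lab's actual pipeline):

    import subprocess
    import tempfile
    from pathlib import Path

    def verifiable_reward(generated_code: str, test_code: str) -> float:
        # Reward = 1.0 if the model's code passes the tests, else 0.0.
        # A real RL setup runs this sandboxed, at scale, over many tasks.
        with tempfile.TemporaryDirectory() as tmp:
            Path(tmp, "solution.py").write_text(generated_code)
            Path(tmp, "test_solution.py").write_text(test_code)
            result = subprocess.run(
                ["python", "-m", "pytest", "test_solution.py", "-q"],
                cwd=tmp, capture_output=True, timeout=30,
            )
        return 1.0 if result.returncode == 0 else 0.0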

reply
I’ll go ahead and say they’re wrong (source: I build and maintain an LLM client with llama.cpp integrated and 40+ third-party models via HTTP).

I desperately want there to be differentiation. Reality has shown over and over that it doesn’t matter. Even if you run the same query across N models and then apply some form of consensus, the improvement on benchmarks is marginal and the UX is worse (more time, more expense, and a final answer that is muddied and still bounded by the quality of the best model).
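
For reference, the consensus approach being dismissed is roughly a majority vote across models. Toy sketch, where ask is a hypothetical single-model call:

    from collections import Counter

    def consensus(prompt: str, models: list[str], ask) -> str:
        # Ask every model the same question; return the most common answer.
        # Cost and latency scale with len(models), and the result is still
        # bounded by the quality of the best model in the pool.
        answers = [ask(model, prompt) for model in models]
        top_answer, _count = Counter(answers).most_common(1)[0]
        return top_answer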

reply
I came across this yesterday. Haven't tried it, but it looks interesting:

https://agent-relay.com/

reply