A company called Taalas is working on something like that. It's not Opus 4.6 quality, but I'm sure they're targeting larger models. Currently they're using a Llama 8B model. It runs at ~17k tokens per second, and you can test it at https://chatjimmy.ai/.
I'm curious how the hardware and power costs would stack up against the subscription cost.
Can you give an example of such a problem?