A company called Taalas is working on something like that. It's not Opus 4.6 quality, but I'm sure they're targeting larger models. Currently they're using a Llama 8B model. It runs at ~17k tokens per second, and you can test it at https://chatjimmy.ai/.

by neals50 minutes ago

I'm curious how hardware and power costs would stack up against subscription costs.

by bigmadshoe1 hours ago

Can you give an example of such a problem?