Sure, 26B models on beefy desktop silicon are finally nipping at the heels of commercial APIs, but this is a mobile thread. On a phone with 8GB of RAM and passive cooling, your tokens per second (t/s) are going to fall off a cliff after the first minute of sustained compute.
There's a 31B dense model in the Gemma 4 series that's obviously going to be smarter (though a whole lot slower) than the MoE 26A4B.
I tried it and it was unusably slow at ~5-6 TPS. 26A4B gets close to 40 TPS, which is faster than you can read, and it's still pretty quick with reasoning enabled.
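Back-of-the-envelope on the "faster than you can read" claim, assuming roughly 0.75 English words per token and an average reading pace of ~250 words per minute (both ballpark figures, not from the thread):

```python
# Rough sanity check: convert generation speed (tokens/s) to an
# equivalent words-per-minute figure and compare to reading speed.
WORDS_PER_TOKEN = 0.75   # assumed average for English text
READING_WPM = 250        # assumed average silent reading speed

def tokens_per_sec_to_wpm(tps: float) -> float:
    """Convert a generation rate in tokens/s to words per minute."""
    return tps * WORDS_PER_TOKEN * 60

for tps in (5.5, 40.0):
    wpm = tokens_per_sec_to_wpm(tps)
    print(f"{tps:>5} t/s ~= {wpm:.0f} wpm ({wpm / READING_WPM:.1f}x reading speed)")
```

Under those assumptions, ~5-6 TPS lands right around reading speed (so any reasoning tokens make you wait), while 40 TPS is several times faster than you can read.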