Not to mention the text-only 0.8GB version. Just crazy. You can now have basic real-time conversations on-device that are video- and audio-aware.
reply
0.8GB is for text only. It's more like ~1.1GB if you include the video/audio encoders.
reply
And your point is what? That it’s more than the 0.8GB text-only figure if you include more than text-only?
reply
Have you seen a 0.8GB model file floating around yet? I couldn't find one earlier.
reply
I think this is the one, but it’s 0.8GB of VRAM, not 0.8GB file size.

https://huggingface.co/google/gemma-4-E2B-it-qat-mobile-ct

But they could be cooking up a smaller one: the model card lists the Q_4 quants as bigger than the mobile or text-only versions, so I think we’ll need to wait for the Q_2_Distilled_Mobile_Textformer version. Still, just amazing work.

reply
I'll be honest with you. My main ask for on device AI is that when I am typing "Going out for a quick j" it corrects to "jog" and not "Jonathan". I don't think it needs that many gigabytes.
reply
Who doesn't enjoy a quick Jonathan now and then?

But seriously, wouldn't predictive text on a 90s cell phone pass this test?

reply
The autocomplete of a decade ago is better than what we have now.

It’s harder now because of emojis, draw-to-type, and pen input. We didn’t have these things 14 years ago, when “I’ll be right back” could be expanded from “I’ll b ri ba”.
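
For anyone curious what that kind of expansion amounts to, here is a rough sketch, assuming each typed token is just a prefix of the matching word in a stored phrase. The phrase list and function names are made up for illustration; this is not how any particular keyboard actually did it.

```python
# Hypothetical phrase list; a real keyboard would use a much larger, ranked vocabulary.
PHRASES = [
    "I'll be right back",
    "I'll be right there",
    "Going out for a quick jog",
]


def matches(abbrev: str, phrase: str) -> bool:
    """True if every typed token is a case-insensitive prefix of the word in the same position."""
    parts = abbrev.lower().split()
    words = phrase.lower().split()
    if len(parts) != len(words):
        return False
    return all(word.startswith(part) for part, word in zip(parts, words))


def expand(abbrev: str, phrases=PHRASES) -> list:
    """Return every stored phrase that the abbreviation could stand for."""
    return [phrase for phrase in phrases if matches(abbrev, phrase)]


print(expand("I'll b ri ba"))
```

Nothing here needs gigabytes: it is a prefix check per word, and ranking the candidates by how often you type them would cover the "jog" versus "Jonathan" case too.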

reply
Where is it? On Ollama I only see the bigger one.
reply
I don’t use Ollama. Can you pull from HF?
reply
Is that actually QAT? The MLX Community models have that in their names, but these don't, and the upload dates don't quite line up.
reply
As an aside, uvx is so pleasant to use... I wish Nvidia supported it as first-class rather than making folks jump through Docker hoops.
reply
I wish people would stop using Python for AI.

It's slow and the package resolution is way too flat.

reply
What do you use?
reply