I haven't seen anybody else point this out in this thread, but this is running on 8GB of RAM. It's not the full Gemma 4 32B model. It's a completely different experience from running the flagship Gemma 4 model, almost to the point of being misleading.

It's their E2B and E4B variants (so 2B and 4B, but also quantized).

https://ai.google.dev/gemma/docs/core/model_card_4#dense_mod...
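To put rough numbers on why an 8GB phone points at the small variants, a weight-only estimate is parameter count × bits per weight ÷ 8 bytes. This is a minimal sketch that assumes nominal 2B/4B/32B parameter counts taken from the model names and compares 16-bit with 4-bit weights. It ignores KV cache, activations and runtime overhead, and the actual E2B/E4B parameter totals and quantization formats may differ from these assumptions.

```python
# Rough weight-only memory estimate: params x bits per weight / 8 bytes.
# Assumption: nominal parameter counts read off the model names; real
# E2B/E4B totals and quantization schemes may differ. KV cache,
# activations and runtime overhead are NOT included.

GIB = 1024 ** 3


def weight_gib(params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB."""
    return params * bits_per_weight / 8 / GIB


# Hypothetical nominal sizes, for illustration only.
models = {"2B": 2e9, "4B": 4e9, "32B": 32e9}

for name, params in models.items():
    for bits in (16, 4):
        print(f"{name} @ {bits}-bit: {weight_gib(params, bits):.1f} GiB")
```

Under these assumptions, the 2B and 4B models at 4-bit need roughly 0.9 and 1.9 GiB for weights, while a 32B model at 4-bit needs about 14.9 GiB. That already exceeds 8 GB of total RAM before the OS or the app takes anything.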

by zozbot234, 22 hours ago

The relevant constraint when running on a phone is power, not really RAM footprint. Running the tiny E2B/E4B models makes sense, this is essentially what they're designed for.

by Shawnj2, 4 hours ago

Depends on the phone. I have trouble fitting models into memory on my iPhone 13 before iOS kills the app. I imagine newer phones with more RAM don't have this issue, especially since some new flagship phones have 16+ GB of memory.

by trvz, 18 hours ago

It absolutely is RAM…

So much so that this was what made Apple increase the base RAM in its phones.

by bigyabai, 15 hours ago

Between the GPU, NPU and big.LITTLE cores, many phones have no fewer than 4 different power profiles they can run inference at. It's about as solved as it will get without an architectural overhaul.

by 1f60c, 22 hours ago

Strangely, reasoning is not on by default. If you enable it, it answers as you'd expect.

by shtack, 19 hours ago

With reasoning on I found E4B to be solid, but E2B was completely unusable across several tests.