undefined

points

[-]

For Qwen 3.5 0.8B presumably you're running it unquantized, because it's so small. Get at least the Q8 of Gemma 4 12B with the F32 mmproj and use an f16 kv cache.

Then run it with the latest llama.cpp that contains the Gemma 4 12B unified bug fixes, using --image-min-tokens 560 --image-max-tokens 2240 --batch-size 4096 --ubatch-size 4096 --temp 1.0 --top-p 0.95 --top-k 64 --jinja

It's understanding far more complex things for me and can reliably handle tiny text, so it should be easily understanding an image that only contains the text "This is a test".

by usef-20 hours ago|

prev|

[-]

That sounds like a bug. They're very common for open model releases on the first day. If I wasn't on mobile I'd try it on Google's own app.

by staticman22 hours ago|

prev|

[-]

Test it on a professional inference provider to rule out trouble on your end.

by JacobAsmuth13 hours ago|

prev|

[-]

Sounds like you're doing it wrong, to be honest.

by ma2kx23 hours ago|

prev|

[-]

I guess Google implements more / stronger guard rails than Alibaba and thus confuses these small models. At least this was my impression with Gemma3 models where it often said that the image contains some nudity / sex scenes and therefore it cannot give a description of the image. Never understood the point of this behavior....

by jimmy7661521 hours ago|

parent|

[-]

The biggest problem with all the Google models has always been RLHF, particularly safety training. They take a good, smart model and make it behave like a corporate person that has been to far to many forced anti-{sexism, racism...} seminars so that it is now living in fear of saying something that could be construed as wrong by some moral standard.

by staticman221 hours ago|

parent|

[-]

This is almost certainly not true.

If it was, they wouldn't need to be using the classifiers they are using to warn Gemini about problematic prompts.

by ai_fry_ur_brain20 hours ago|

parent|

prev|

[-]

[flagged]

by thot_experiment23 hours ago|

prev|

[-]

I've always found the Gemma models to vastly under-perform on vision tasks compared to Qwen so that's nothing new.

by mountainriver21 hours ago|

parent|

[-]

The Qwen series adopted vision wayyy earlier than anyone else. No idea why the other labs were sleeping on it but they had about 2 years of experimentation without any competition.