> MobileNet-V5-300M
Which makes sense, as it's 300M parameters and probably far less complex; not a multi-billion-parameter transformer.
I guess there's benefit to running that step without subsampling beyond the initial 256 tokens per image/frame ( https://ai.google.dev/gemma/docs/gemma-3n/model_card#inputs_... ) and working from those. https://github.com/antimatter15/reverse-engineering-gemma-3n suggests these are 2048-dimensional tokens, which means this 60 Hz frame digestion rate produces just under 31.5 million floats-of-your-chosen-precision per second. At least at the high (768x768) input resolution, that's a bit less than one float per pixel.
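For the curious, a quick sanity check of that arithmetic (all figures are assumptions taken from the links above, not measurements):

    # Assumptions: 256 tokens/frame, 2048 dims/token, 60 fps, 768x768 input
    # (all taken from the model card and reverse-engineering links above).
    tokens_per_frame = 256
    dims_per_token = 2048
    fps = 60
    pixels_per_frame = 768 * 768

    floats_per_second = tokens_per_frame * dims_per_token * fps
    floats_per_pixel = (tokens_per_frame * dims_per_token) / pixels_per_frame

    print(f"{floats_per_second:,} floats/s")       # 31,457,280 -> just under 31.5M
    print(f"{floats_per_pixel:.3f} floats/pixel")  # ~0.889 -> a bit less than 1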
I guess that, with very heavy quantization to something like 4 bits, this could maybe beat sufficiently-artifact-free video coding for streaming the tokenized vision to a (potentially cloud) system that can keep up with the 15,360 tokens/s at the (streaming) prefill stage?
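Rough bandwidth math for that idea, under the same assumed figures plus 4 bits per dimension and no entropy coding on top:

    tokens_per_second = 256 * 60          # 15,360 tokens/s at 60 fps
    bits_per_token = 2048 * 4             # 2048 dims at 4 bits each
    bits_per_second = tokens_per_second * bits_per_token

    print(f"{bits_per_second / 1e6:.0f} Mbit/s")    # ~126 Mbit/s
    print(f"{bits_per_second / 8 / 1e6:.1f} MB/s")  # ~15.7 MB/s of tokenized vision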
Or I could imagine purely local, on-device visual semantic search: expand the search query into a bunch of tokens, each carrying a signed desire/want-ness score; attend those search tokens to the frame's encoded tokens; apply an activation function; scale the result (positive/negative) by each search token's desire score; and then sum over each frame to get a frame score that can be used for ranking and other search-related tasks.
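A minimal sketch of what that per-frame scoring could look like (NumPy; all shapes and names are hypothetical, and the activation here is just ReLU as a placeholder):

    import numpy as np

    def frame_score(frame_tokens, query_tokens, desire):
        """Hypothetical relevance score for one frame.

        frame_tokens: (256, 2048) encoded vision tokens for the frame
        query_tokens: (Q, 2048) tokens the search query was expanded into
        desire:       (Q,) signed desire/want-ness score per query token
        """
        sim = query_tokens @ frame_tokens.T   # attend query tokens to frame tokens, (Q, 256)
        act = np.maximum(sim, 0.0)            # placeholder activation function (ReLU)
        weighted = act * desire[:, None]      # scale by signed desire score
        return float(weighted.sum())          # sum over the frame -> one scalar for ranking

    # Toy usage: rank 5 random "frames" against a 3-token expanded query.
    rng = np.random.default_rng(0)
    frames = [rng.standard_normal((256, 2048)) for _ in range(5)]
    query = rng.standard_normal((3, 2048))
    desire = np.array([+1.0, +0.5, -1.0])     # last token expresses "don't want"
    ranked = sorted(range(len(frames)), key=lambda i: -frame_score(frames[i], query, desire))
    print(ranked)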
(For that last thought, I asked Gemini 2.5 Pro to estimate the FLOPS load, and it came out to 1.05 MFLOPS per frame per search token; Reddit suggests the current Pixel's TPU does around 50 TOPS, so if those terminologies reasonably match up, and assuming we spend about 20% of its compute on the search/match aspect, it comes out to an unreasonable-seeming ~190k tokens that the search query could get expanded to. I interpret this result to imply that quality/accuracy issues in the searching/filtering mechanism would hit before throughput issues would.)
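For transparency, my reconstruction of that budget (the 1.05 MFLOP/frame/token and 50 TOPS figures are the assumptions quoted above; the result is sensitive to how many frames per second you assume get scored, and ~190k corresponds to roughly 50):

    flops_per_frame_per_token = 1.05e6   # Gemini 2.5 Pro's estimate, per above
    npu_ops_per_second = 50e12           # "around 50 TOPS", per Reddit
    budget_fraction = 0.20               # ~20% of the NPU spent on search/match
    frames_scored_per_second = 50        # assumption; at 60 this drops to ~160k

    ops_budget = npu_ops_per_second * budget_fraction
    max_query_tokens = ops_budget / (flops_per_frame_per_token * frames_scored_per_second)
    print(f"~{max_query_tokens:,.0f} query-expansion tokens")   # ~190,000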
> I guess that, with very heavy quantization to something like 4 bits, this could maybe beat sufficiently-artifact-free video coding for streaming the tokenized vision to a (potentially cloud) system that can keep up with the 15,360 tokens/s at the (streaming) prefill stage?
The 6-7s I am seeing is what it costs to run the image model, even running on GPU on an M4 Max with 64GB of GPU RAM. This repros with my llama.cpp wrapper and with the llama.cpp demo of it.
It is simply getting tokens that is taking that long.
Given that reality, we can ignore it, of course. We could assume the image model does run on Pixel at 60 fps and there's just no demo APK available, or just say it's all not noteworthy because, as the Google employee points out, they can do it inside Google and external use hasn't been prioritized.
The problem is that the blog post announces this runs on device at up to 60 fps today, and announces $150K in prizes for work built on that premise. We have 0 evidence of this externally, the most plausible demo of it released externally by Google runs at 1/500th of this speed, and 1 likely Google employee is saying "yup, it doesn't; we haven't prioritized external users!" The best steelman we can come up with is "well, if the image model eventually runs at 60 fps, we could stream it to an LLM in the cloud with about 4 seconds of initiate + prefill latency!"
That's bad.
- Are there APK(s) that run on Tensor?
- Is it possible to run on Tensor if you're not Google?
- Is there anything at all from anyone I can download that'll run it on Tensor?
- If there isn't, why not? (i.e. this isn't the first on device model release by any stretch, so I can't give benefit of the doubt at this point)
No. The AICore service internally uses the inference on Tensor (http://go/android-dev/ai/gemini-nano)
> Is there anything at all from anyone I can download that'll run it on Tensor?
No.
> If there isn't, why not? (i.e. this isn't the first on device model release by any stretch, so I can't give benefit of the doubt at this point)
Mostly because 3P support has not been an engineering priority.
Got it: assuming you're at Google, in eng. parlance, it's okay if it's not Prioritized™, but then product/marketing/whoever shouldn't be publishing posts built around the premise that it's running 60 fps multimodal experiences on device.
They're very, very lucky that the ratio of people vaguely interested in this to people who follow through on using it is high, so comments like mine end up at -1.
https://ai.google.dev/edge/litert/android/npu/overview has been identical for a year+ now.
In practice, Qualcomm and MediaTek ship working NPU SDKs for third-party developers; NNAPI doesn't count and is deprecated anyway.
(n.b. to readers: if you click through, the Google Pixel Tensor API is listed as coming soon. So why in the world has Google been selling Tensor chips in Pixel as some big AI play since... idk, at least 2019?)
On third-party model workloads, this is what you will get:
https://ai-benchmark.com/ranking.html
https://browser.geekbench.com/ai-benchmarks (NPU tab, sort w/ quantisation and/or half precision)
Google is clearly not serious about this on Pixels in practice, and the GPU performance is also behind flagships by quite a lot, which really doesn't help. The CPUs are behind by quite a lot too...
The only one we have works as described; TL;DR: 0.1 fps.