This just exposes an API for sites to use. If they wanted to do the types of spying you're cynically suggesting, they could just add it without an API and you'd be none the wiser. Chrome contains closed source components so you wouldn't even know.
I appreciate that you feel this is a cynical take. But have you seen the class action lawsuits against Google over the last five years? As far as I can remember, they exceed a billion dollars, and they are for more blatant things than this.
Why would adding an ML API or library require an EULA change?
With MoE models, you could fetch expert layers from the network on demand by issuing HTTP range requests for the corresponding offsets, similar to how BitTorrent downloads file chunks from multiple hosts. You'd still have to download the shared layers, but time to first token would then be proportional to active size rather than total size. Of course this wouldn't be totally "offline" inference anymore, but for a web browser feature that's not a key consideration.
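To make the idea concrete, here's a minimal sketch of the range-query part. The manifest format, expert IDs, and function names are all hypothetical; the only real mechanism is the HTTP `Range` header, which lets one request pull several byte spans at once.

```typescript
// Hypothetical manifest mapping expert IDs to byte spans inside a
// single model file on a server.
interface ExpertSpan {
  offset: number; // byte offset of this expert's weights in the file
  length: number; // size in bytes
}

type Manifest = Record<string, ExpertSpan>;

// Build a multi-range header value like "bytes=0-99,200-349" so a
// single request can fetch several experts at once.
function rangeHeaderFor(manifest: Manifest, experts: string[]): string {
  const parts = experts.map((id) => {
    const { offset, length } = manifest[id];
    return `${offset}-${offset + length - 1}`; // Range ends are inclusive
  });
  return `bytes=${parts.join(",")}`;
}

// Toy manifest for a 3-expert model file.
const manifest: Manifest = {
  "expert-0": { offset: 0, length: 100 },
  "expert-1": { offset: 100, length: 150 },
  "expert-2": { offset: 250, length: 150 },
};

const header = rangeHeaderFor(manifest, ["expert-0", "expert-2"]);
console.log(header); // bytes=0-99,250-399
```

The actual fetch would then be something like `fetch(modelUrl, { headers: { Range: header } })`; note that the server has to support multi-range requests (it answers with `multipart/byteranges`), otherwise you'd issue one request per expert.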
This is a common misconception, probably due to the unfortunate naming. Expert layers are not "expert" at any particular subject, and active size only refers to the layers activated per token. You'd still need all (or most) of the layers for any particular query, even if some layers have a very low chance of being activated.
All in all, you'd be better off lazily loading the entire model; at least then you'd know you have the capability to run inference from that point on.
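A toy simulation makes the point: even if each token only activates k experts, across a modest sequence nearly every expert gets touched, so nearly all the weights need to be resident. The expert counts and routing here are made up (uniform random routing, which real routers are not), purely to illustrate the coverage effect.

```typescript
// Toy MoE routing simulation: E experts, k activated per token.
const E = 64;       // experts in one layer (illustrative)
const k = 2;        // experts activated per token
const tokens = 200; // length of a modest query + response

// Tiny deterministic LCG so the run is reproducible.
let seed = 42;
function rand(): number {
  seed = (seed * 1664525 + 1013904223) >>> 0;
  return seed / 2 ** 32;
}

// Count how many distinct experts any token in the sequence touched.
const touched = new Set<number>();
for (let t = 0; t < tokens; t++) {
  for (let i = 0; i < k; i++) {
    touched.add(Math.floor(rand() * E));
  }
}

console.log(`${touched.size} of ${E} experts touched after ${tokens} tokens`);
```

Per token the "active size" is only k/E of the layer, but the set of experts you'd have had to fetch approaches all of them well before the sequence ends.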
Maybe the next big thing will be premium software subscriptions that bundle a bunch of 5090s as an extra.
Here's hoping that dystopia never happens.
fantastic!
> the model download is orders of magnitude greater than downloading the browser itself, and something that needs to happen before you get your first token back
sure, but does this mean the model is lazily downloaded? that is, if mine were the first call to the model, would the user be waiting until the download finished at that point?
that sounds like a horrible user experience - maybe Chrome reduces the confusion by showing a download progress dialog or similar?
also, any idea what the on-disk impact is?
So it's once per browser, not once per site.
You can track the download state yourself and pop whatever UI you want.
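For example, a sketch of tracking the download yourself, assuming the shape of Chrome's Prompt API (`LanguageModel.create` taking a `monitor` callback that fires `downloadprogress` events with a 0..1 `loaded` fraction) — check the current docs, since this API has changed across Chrome versions. The `formatProgress` helper and UI hook are my own illustrative additions.

```typescript
// Pure helper: turn a 0..1 progress fraction into a display string.
function formatProgress(loaded: number): string {
  return `Downloading model: ${Math.round(loaded * 100)}%`;
}

async function createWithProgressUI() {
  // `LanguageModel` only exists in supporting browsers; bail out elsewhere.
  const LM = (globalThis as any).LanguageModel;
  if (!LM) return null;

  return LM.create({
    monitor(m: any) {
      m.addEventListener("downloadprogress", (e: any) => {
        // Hook your own progress UI here instead of console.log.
        console.log(formatProgress(e.loaded));
      });
    },
  });
}
```

Since the download is once per browser, the monitor usually fires only for the first site to trigger it; everyone after that gets a session immediately.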
Then consider that the model you get might scale with the CPU/GPU/RAM available: if you have a 12 GB GPU you'd probably get a better model, perhaps a 10-11 GB one. At 2x that's 22 GB.
Also, a machine is not static: GPUs and other hardware come and go, VRAM allocation in integrated graphics changes, etc. You end up just needing to pick one number so as not to confuse users.
This is part of it, and also we just didn't want to use up the last of the user's disk space! It's disrespectful to use up 3 GB if the user only has 4 GB left; it's sketchy if the user only has 10 GB. At 22 GB, we felt there was more room to breathe.
One could argue that users should have more agency and transparency into these decisions, and for power users I agree... some kind of neato model management UI in chrome://settings would have been cool. But 99% of users would never see that, so I don't think it ever got built.
> Built-in models should be significantly smaller. The exact size may vary slightly with updates.

Yes, I can read and comprehend English, and you should assume I read the page. Because of the "At least" wording, I was curious what someone who has actually used the feature has noticed, i.e., learning from people who have already done it.
If it turns out useful enough I'm sure browsers will just start including it as (perhaps optional?) part of installation.