Do you know if these actually preserve the structure of Gemma 3n that makes these models more memory efficient on consumer devices? I feel like the modified inference architecture described in the article is what makes this possible, but it probably needs additional software support.
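For what it's worth, one rough way to sanity-check this (just a sketch, not authoritative) is to pull the checkpoint's config.json and see whether it still declares a Gemma 3n model type, rather than having been re-exported as a generic architecture. I don't know the exact config keys Google uses for the per-layer/elastic setup, so this only shows whatever happens to be in the file:

```python
# Hedged sketch: inspect a Hugging Face checkpoint's config.json to see which
# architecture it declares. The repo id is the one linked in my comment; swap
# in whichever upload you're checking.
import json
from huggingface_hub import hf_hub_download

repo_id = "mlx-community/gemma-3n-E4B-it-bf16"
config_path = hf_hub_download(repo_id=repo_id, filename="config.json")

with open(config_path) as f:
    config = json.load(f)

# "model_type" / "architectures" hint at whether a runtime would load it with a
# Gemma 3n-specific code path or fall back to a generic implementation.
print(config.get("model_type"))
print(config.get("architectures"))
```

Even if the structure is preserved in the weights, the memory savings presumably still depend on the inference stack actually implementing that code path.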
But given that they were uploaded a day ago (together with the blog post), maybe these are actually the real deal? In that case, I wish Google could just link to these instead of to https://huggingface.co/mlx-community/gemma-3n-E4B-it-bf16.
Edit: Ah, these are just non-MLX models. I might give them a try, but they're not what I was looking for. Still, thank you!