A more accurate headline would be: Apple starts creating images using 4-year-old techniques.
> short: both Apple and OpenAI are moving beyond diffusion, but while OpenAI is building for its data centers, Apple is clearly building for our pockets.
https://arstechnica.com/apple/2024/11/apple-intelligence-not...
Or, you know, just posting an article based on an Apple press release about a new technique that falls squarely within their target audience's interests (people reading Apple-centric news) and fits neatly with the currently fashionable technology (AI) people will show interest in.
Without giving a fuck about "positioning the company in the AI race". They'd post about Apple's sewers having an issue at HQ, if that news story were available.
Besides, when did Apple ever come first in some particular tech race (say, the MP3 player, the smartphone, the app store, the tablet, the smartwatch, maybe VR now)? What they typically do is wait for the dust to settle and then sweep up the end-user end of that market.
A glance through the comments also shows HNers doing their best too. The mind still boggles as to why this site is so willing to perform mental gymnastics on behalf of a corporation.
1. NVAE: A Deep Hierarchical Variational Autoencoder https://arxiv.org/pdf/2007.03898
Also, if we're being nitpicky, diffusion model inference has been proven equivalent to (and is often used as) a particular normalizing flow, so... shrug.
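To spell that out (a sketch of the standard result, not something stated in the thread): in the score-SDE formulation of diffusion models, the deterministic sampler follows the probability-flow ODE, which shares its time-marginals with the diffusion SDE and, being an invertible deterministic map, is exactly a continuous normalizing flow:

```latex
% Forward diffusion SDE:
%   dx = f(x,t)\,dt + g(t)\,dw
% Probability-flow ODE (same marginals p_t; deterministic and invertible,
% hence a continuous normalizing flow):
\frac{dx}{dt} \;=\; f(x,t) \;-\; \tfrac{1}{2}\,g(t)^{2}\,\nabla_x \log p_t(x)
```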
The appendix goes on to explain, "We apply simple volume-preserving normalizing flows of the form z′ = z + b(z) to the samples generated by the encoder at each level".
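For illustration only (this is an additive coupling layer, not necessarily NVAE's exact parameterization): one way to realize a volume-preserving map of the quoted form z′ = z + b(z) is to let b read a fixed subset of z and shift only the complement, so the Jacobian is triangular with ones on the diagonal and the log-determinant is zero.

```python
import torch
import torch.nn as nn


class AdditiveCoupling(nn.Module):
    """Volume-preserving step z' = z + b(z): b reads only the first half of z
    and shifts only the second half, so the Jacobian is triangular with unit
    diagonal and log|det J| = 0."""

    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.half = dim // 2
        self.b = nn.Sequential(
            nn.Linear(self.half, hidden),
            nn.ReLU(),
            nn.Linear(hidden, dim - self.half),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        z1, z2 = z[:, :self.half], z[:, self.half:]
        return torch.cat([z1, z2 + self.b(z1)], dim=1)

    def inverse(self, z_out: torch.Tensor) -> torch.Tensor:
        # Exactly invertible: subtract the same shift computed from z1.
        z1, z2 = z_out[:, :self.half], z_out[:, self.half:]
        return torch.cat([z1, z2 - self.b(z1)], dim=1)
```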
It's a different set of trade-offs.
* Theoretically; I don't own an iPhone.
Server side means shared resources, shared upgrades and shared costs. The privacy aspect matters, but at what cost?
The cost, so far, is greater.
How so, if efficiency is key for data centers to be competitive? If anything, it's the other way around.
Local inference doesn't cost the company anything, and it increases the minimum hardware customers need to buy.
To get deterministic results, you fix the seed for your pseudorandom number generator and make sure not to execute any operations that produce different results on different hardware. There's no difference between the approaches in that respect.
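A minimal sketch of that recipe, assuming a PyTorch setup (the function name is mine; the same idea applies to any framework):

```python
import os
import random

import numpy as np
import torch


def make_deterministic(seed: int = 0) -> None:
    """Fix every PRNG in play and refuse nondeterministic kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU and all CUDA generators
    # Error out on ops that have no deterministic implementation
    # (e.g. some CUDA kernels relying on atomic adds).
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.benchmark = False
    # Required by some cuBLAS paths when deterministic algorithms are enabled.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
```

Even then, bit-exact reproducibility only holds on the same hardware and library versions, which is the second caveat above.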
Let's see how this turns out in the long term.
That they didn't scale beyond AFHQ (high-quality animal faces: cats, dogs and big cats) at 256×256 is probably not due to an explicit preference for small models at the expense of output resolution, but because this is basic research to test the viability of the approach. If this ever makes it into a product, it'll be a much bigger model trained on more data.
EDIT: I missed the second paper https://arxiv.org/abs/2506.06276 where they scale up to 1024×1024 with a 3.8-billion-parameter model. It seems to do about as well as diffusion models of similar size.
If they approach things this way, and transistor progress continues linearly (relative to the last few years), maybe they can ship their first devices that meet these goals in… 2-3 years?