A more accurate headline would be: Apple starts creating images using 4-year-old techniques.
> short: both Apple and OpenAI are moving beyond diffusion, but while OpenAI is building for its data centers, Apple is clearly building for our pockets.
https://arstechnica.com/apple/2024/11/apple-intelligence-not...
Or, you know, just posting an article based on an Apple press release about a new technique that falls squarely within their target audience's interests (people reading Apple-centric news) and fits neatly with the currently fashionable technology (AI) people will show interest in.
Without giving a fuck about "positioning the company in the AI race". They'd post about Apple's sewers having an issue at HQ, if that news story were available.
Besides, when did Apple ever come first in some particular tech race (say, the MP3 player, the smartphone, the app store, the tablet, the smartwatch, maybe VR now)? What they typically do is wait for the dust to settle and then sweep up the end-user end of that market.
A glance through the comments also shows HNers doing their best too. The mind still boggles as to why this site is so willing to perform mental gymnastics on behalf of a corporation.
1. NVAE: A Deep Hierarchical Variational Autoencoder https://arxiv.org/pdf/2007.03898
Also, if we're being nitpicky, diffusion model inference has been proven equivalent to (and is often used as) a particular normalizing flow, so... shrug.
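To spell that out (a sketch of the standard result, not something stated in the thread): in the score-SDE formulation of diffusion models, the deterministic sampler follows the probability-flow ODE, which shares its time-marginals with the diffusion SDE and, being an invertible deterministic map, is exactly a continuous normalizing flow:

```latex
% Forward diffusion SDE:
%   dx = f(x,t)\,dt + g(t)\,dw
% Probability-flow ODE (same marginals p_t; deterministic and invertible,
% hence a continuous normalizing flow):
\frac{dx}{dt} \;=\; f(x,t) \;-\; \tfrac{1}{2}\,g(t)^{2}\,\nabla_x \log p_t(x)
```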
The appendix goes on to explain, "We apply simple volume-preserving normalizing flows of the form z′ = z + b(z) to the samples generated by the encoder at each level".
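For illustration only (this is an additive coupling layer, not necessarily NVAE's exact parameterization): one way to realize a volume-preserving map of the quoted form z′ = z + b(z) is to let b read a fixed subset of z and shift only the complement, so the Jacobian is triangular with ones on the diagonal and the log-determinant is zero.

```python
import torch
import torch.nn as nn


class AdditiveCoupling(nn.Module):
    """Volume-preserving step z' = z + b(z): b reads only the first half of z
    and shifts only the second half, so the Jacobian is triangular with unit
    diagonal and log|det J| = 0."""

    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.half = dim // 2
        self.b = nn.Sequential(
            nn.Linear(self.half, hidden),
            nn.ReLU(),
            nn.Linear(hidden, dim - self.half),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        z1, z2 = z[:, :self.half], z[:, self.half:]
        return torch.cat([z1, z2 + self.b(z1)], dim=1)

    def inverse(self, z_out: torch.Tensor) -> torch.Tensor:
        # Exactly invertible: subtract the same shift computed from z1.
        z1, z2 = z_out[:, :self.half], z_out[:, self.half:]
        return torch.cat([z1, z2 - self.b(z1)], dim=1)
```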
It's a different set of trade-offs.
* Theoretically; I don't own an iPhone.
Server side means shared resources, shared upgrades and shared costs. The privacy aspect matters, but at what cost?
The cost, so far, is greater.
How so, if efficiency is key for data centers to be competitive? If anything, it's the other way around.
Local inference doesn't cost the company anything, and it increases the minimum hardware customers need to buy.
To get deterministic results, you fix the seed for your pseudorandom number generator and make sure not to execute any operations that produce different results on different hardware. There's no difference between the approaches in that respect.
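A minimal sketch of that recipe, assuming a PyTorch setup (the function name is mine; the same idea applies to any framework):

```python
import os
import random

import numpy as np
import torch


def make_deterministic(seed: int = 0) -> None:
    """Fix every PRNG in play and refuse nondeterministic kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU and all CUDA generators
    # Error out on ops that have no deterministic implementation
    # (e.g. some CUDA kernels relying on atomic adds).
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.benchmark = False
    # Required by some cuBLAS paths when deterministic algorithms are enabled.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
```

Even then, bit-exact reproducibility only holds on the same hardware and library versions, which is the second caveat above.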
Let's see how this turns out in the long term.
That they didn't scale beyond AFHQ (high-quality animal faces: cats, dogs and big cats) at 256×256 is probably not due to an explicit preference for small models at the expense of output resolution, but because this is basic research to test the viability of the approach. If this ever makes it into a product, it'll be a much bigger model trained on more data.
EDIT: I missed the second paper https://arxiv.org/abs/2506.06276 where they scale up to 1024×1024 with a 3.8-billion-parameter model. It seems to do about as well as diffusion models of similar size.
If they approach things this way, and transistor progress continues linearly (relative to the last few years), maybe they can ship their first devices that meet these goals in… 2-3 years?