In my (very personal) opinion, they're part of a very small group of organizations that sell inference under a sane and successful business model.
I was a mod on MJ for its first few years and got to know MJ's founder through discussions there. He already had "enough" money for himself from his prior sale of Leap Motion to do whatever he wanted, and he decided that what he wanted was to do cool research with fun people. So he started MJ. Now he has far more money than before, and what he wants to do with it is have more fun doing more cool research.
1. Real-time world models for the "holodeck". These have to be fast, high quality, and inexpensive for lots of users. They started on this two years ago, before "world model" hype was even a thing.
2. Some kind of hardware to support this.
David Holz talks about this on Twitter occasionally.
Midjourney still has incredible revenue. It's still the best looking image model, even if it's hard to prompt, can't edit, and has artifacting. Every generation looks like it came out of a magazine, which is something the other leading commercial models lack.
Even something like Flux.1 Dev, which can be run entirely locally and was released back in August of 2024, has significantly better prompt understanding.
Though in some cases, it is a bit easier to fix visual artifacts (using second-pass refiners, Img2Img, ultimate upscale, stylistic LoRAs, etc.) than a fundamental coherency problem.
How. By magic? You fell for 'Deepseek V3 is as good as SOTA'?
What's interesting is that the bottleneck is no longer the model; it's the person directing it. Knowing what to ask for and recognizing when the output is good enough matters more than which model you use. It's the same pattern we're seeing in code generation.
The fight right now, outside of the API SOTA, is over who will replace SDXL as the “community preference”.
It’s now a three-way race between Flux2 Klein, Z-Image, and Qwen2.
I want the ability to lean into any image and tweak it like clay.
I've been building open source software to orchestrate the frontier editing models (skip to halfway down), but it would be nice if the models were built around the software manipulation workflows: