These large MoE models can work quite well on consumer or prosumer platforms; they'll just be slow, and you have to offset that by running them unattended around the clock. (Something you can't really do with large SOTA models without spending far too much on tokens.) This works especially well for the DeepSeek V4 series, which has comparatively tiny KV-cache sizes, so even a consumer platform can run big batches in parallel.
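A back-of-envelope sketch of why a small KV cache translates into big batches. All numbers below are illustrative assumptions, not published model specs:

```python
# Rough capacity estimate: how many sequences can be decoded in parallel
# once the weights are resident and the remaining memory holds KV cache.
# All numbers are illustrative assumptions, not published model specs.

def concurrent_sequences(mem_gb, weights_gb, layers, kv_floats_per_layer,
                         ctx_len, dtype_bytes=2):
    """Floor of (free memory) / (KV-cache footprint of one full-context sequence)."""
    kv_bytes_per_token = layers * kv_floats_per_layer * dtype_bytes
    free_bytes = (mem_gb - weights_gb) * 10**9
    return int(free_bytes // (kv_bytes_per_token * ctx_len))

# A model with a compressed per-layer KV latent needs far less cache per
# token than one storing full per-head K and V, so the same leftover
# memory fits many more concurrent sequences:
print(concurrent_sequences(96, 80, layers=60, kv_floats_per_layer=576,
                           ctx_len=8192))  # -> 28
```

With a compact cache, throughput per hour stays reasonable even when single-stream latency is poor, which is exactly the unattended-batch usage pattern described above.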
reply
I don’t fully understand what open weights unlocks that cannot be accomplished via API from a product standpoint.

Open weights is great if you want to do additional training, or if you need on-prem for security.

reply
Multiple providers of the same model. That means competition for price, reliability, latency, etc. It also means you can use the same model as long as you want, instead of having it silently change behaviour.
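One concrete consequence: the same checkpoint can sit behind several OpenAI-compatible endpoints, so a client pins the model and fails over between hosts. A minimal sketch; the provider names and URLs here are made up:

```python
# Minimal failover between providers serving the same open-weight checkpoint.
# Provider names/URLs are hypothetical; the point is that identical weights
# mean identical behaviour regardless of which host answers.

PROVIDERS = [
    {"name": "host-a", "base_url": "https://a.example/v1"},
    {"name": "host-b", "base_url": "https://b.example/v1"},
]

def pick_provider(providers, is_healthy):
    """Return the first provider that passes a health check."""
    for p in providers:
        if is_healthy(p):
            return p
    raise RuntimeError("no healthy provider for this model")

# e.g. fail over when host-a is down:
chosen = pick_provider(PROVIDERS, lambda p: p["name"] != "host-a")
print(chosen["name"])  # -> host-b
```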
reply
Those open-weight providers were found nerfing models too.
reply
> Open weights is great if you want to do additional training, or if you need on-prem for security.

The power of giving universities, companies, and hackers "full" models should not be underestimated.

Here are just a few ideas for image, video, and creative media models:

- Suddenly you're not "blocked" for entirely innocuous prompts. This is a huge issue.

- You can fine-tune the model to learn and do new things: a lighting-adjustment model, a pose-adjustment model. You can hook the model up to mocap, train it to generate plates, etc.

- You can fine tune it on your brand aesthetic and not have it washed out.
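Fine-tunes like the ones above often take the form of low-rank adapters (LoRA): the frozen weight matrix W gets a learned update B·A. A toy pure-Python sketch of the arithmetic only, not a training pipeline:

```python
# Toy illustration of a LoRA-style low-rank update: W' = W + scale * (B @ A).
# Pure Python on tiny matrices; a real fine-tune would use a framework
# such as torch with an adapter library.

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def apply_lora(W, B, A, scale=1.0):
    """Merge a rank-r adapter (B: d x r, A: r x d) into the base weights W."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weights
B = [[1.0], [1.0]]             # rank-1 adapter factors
A = [[2.0, 3.0]]
print(apply_lora(W, B, A))     # -> [[3.0, 3.0], [2.0, 4.0]]
```

Only the tiny B and A matrices are trained, which is why a hacker with open weights and one GPU can produce the specialized models listed above.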

reply
Or try to beat Anthropic's uptime.
reply