The comments trashing this are justified skeptics who remember the benchmaxxing of Llama 4. This model was reportedly finished as early as a couple of months ago, but they didn't release it because it was only at Gemini 2.5 Pro levels.
reply
The Llama 4 series was one of the earliest large MoEs to be made publicly available. People just ignored it because they were focused on running smaller, denser models at the time; we should know better these days.
reply
DeepSeek R1 was a publicly available MoE model that was getting a ton of attention before Llama 4. Llama 4 didn't get much attention because it wasn't good.
reply
the models were objectively horrible
reply
They really weren't horrible. They were roughly GPT-4o level, with the added benefit that you could run them on-premise. Just "regular," non-"thinking" models. Inefficient architecture (active parameters relative to total), but otherwise "decent" models. They got trashed online by bots and Chinese shills (I was online that weekend when it happened; it was something to behold). Just because they were non-thinking when thinking was clearly the future doesn't make them horrible. Not SotA by any means, but still.
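To make the "active out of total" aside concrete, here is a minimal sketch of the sparsity ratio that critique refers to, using the publicly reported parameter counts for the two Llama 4 releases (treat the numbers as approximate):

```python
# Sketch: fraction of weights active per token in an MoE model.
# Counts below are the commonly reported figures for Llama 4;
# they are illustrative, not authoritative.
models = {
    "Llama 4 Scout":    {"active": 17e9, "total": 109e9},
    "Llama 4 Maverick": {"active": 17e9, "total": 400e9},
}

for name, p in models.items():
    frac = p["active"] / p["total"]
    print(f"{name}: {frac:.1%} of weights active per token")
```

The complaint is essentially that Maverick pays the memory cost of 400B parameters while only ~4% of them do work on any given token, a worse ratio than some contemporary MoEs.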
reply
Wrote a longer comment steel-manning this and posted it as a reply, then realized you might like to know they had a reasoning model on deck, ready for release in the next 2-4 weeks.

It got shitcanned due to bad PR and God-King Zuck terraforming the org, so there'd be a year's delay to the next release.

Real tragicomedy, and you have no idea how happy it makes me to see someone in the wild saying this. It sounds bizarre to people given the conventional wisdom, but it's what happened.

reply
Nah, I remember how disgusted I felt trying Llama 4 Maverick and Scout. They were both DOA; they couldn't even beat much smaller local models.
reply
They were failing non-stop at tool calls on top of that.
reply
I'll cosign what you said; simultaneously, your interlocutor's point is also well-founded, and it depresses me that it's not better known and sounds so... off... due to conventional wisdom combined with God-King Zuck misunderstanding his own company and overreacting as a result.

They beat Gemini 2.5 Flash and Pro handily on my benchmark suite. (tl;dr: tool calling and agentic coding).

Llama 4 on Groq was ~GPT 4.1 on the benchmark at ~50% the cost.

They shouldn't have released it on a Saturday.

They should have spent a month with it in private prerelease, working with providers.[1]

The rushed launch and ensuing quality issues got rolled into the hypebeast narrative of "DeepSeek will take over the world."

I bet it was super fucking annoying to talk to due to LMArena maxxing.

[1] My understanding is that the longest heads-up any provider got was single-digit days, if any. Most modellers have arrived at 2+ weeks now; there's a lot between spitting out logits and parsing and delivering a response.

reply
Your comments seem to imply the engineers made a great product but Zuck intervened, so now it's shit.
reply
I don't know how Zuck intervening could change float32s in a trained model, so I don't think I think that, but maybe I'm parsing your words incorrectly.
reply
It's a decent model if the benchmarks are to be believed, but it won't be close to Opus in usefulness for programming. None of these benchmarks completely capture what makes a model useful for day-to-day coding tasks, unfortunately. It will take time for them to catch up, and Opus will keep improving in the meantime. But it's good to have more competition.
reply
> If it slightly beats or even matches Opus 4.6

It doesn't though

reply
Curious why you think this. Any data points that led you to it?
reply
The benchmarks they released
reply
What do you mean? In most cases, the benchmarks show a larger number for Muse and a smaller number for Opus.
reply
In multimodal, yes, but Opus is definitely edging it out on the text/reasoning and agentic benchmarks.

I think the general skepticism is because they are late to the race, and they are releasing an Opus-4.6-equivalent model now, when Anthropic is already teasing Mythos.

reply
> I don't get the comments trashing this.

People like to hate on Meta regardless of anything, and regardless of whether it's justified or not. Not saying it isn't justified, just that it's many people's default bias.

reply
Because of bots, trillion-dollar IPOs, and even bigger stakes. People need to better appreciate the level of manipulation going on. Social media has an outsized impact, and bots, and even people, are getting paid to post and to upvote/downvote narratives.
reply
> people are getting paid to post and upvote/downvote narratives

This problem will be solved shortly with better AI (if it hasn't essentially been solved already).

No more humans in the loop, much lower costs for social media manipulation. Welcome to the future!

reply