Per BalatroBench, gemini-3-pro-preview makes it to round (not ante) 19.3 ± 6.8 on the lowest difficulty on the deck aimed at new players. Round 24 is ante 8's final round. Per BalatroBench, this includes giving the LLM a strategy guide, which first-time players do not have. Gemini isn't even emitting legal moves 100% of the time.
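For reference, the round-to-ante arithmetic behind those numbers can be sketched as below. This assumes three rounds per ante (small blind, big blind, boss blind) and that no blinds are skipped; the function names are made up for illustration and are not part of BalatroBench.

```python
# Assumption: every ante has exactly three played rounds (small, big, boss)
# and no blind is skipped, so the round counter advances by 3 per ante.
ROUNDS_PER_ANTE = 3


def ante_for_round(round_number: int) -> int:
    """Return the ante that a 1-indexed round number falls in."""
    return (round_number - 1) // ROUNDS_PER_ANTE + 1


def final_round_of_ante(ante: int) -> int:
    """Return the round number of the boss blind of the given ante."""
    return ante * ROUNDS_PER_ANTE


if __name__ == "__main__":
    print(final_round_of_ante(8))  # final round of ante 8
    print(ante_for_round(19))      # ante containing round 19
```

Under those assumptions, a mean of round 19.3 lands at the start of ante 7, about five rounds short of the ante 8 boss blind.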

by ankit2191 hours ago

Agreed. To me, Gemini 3 Pro has always felt like it has a pretraining alpha, if you will, and many data points continue to support that. Even Flash, which was post-trained with different techniques than Pro, is as good or equivalent at tasks that require post-training, occasionally even beating Pro (e.g., in APEX bench from Mercor, which is basically a tool-calling test, to simplify, Flash beats Pro). The score on ARC-AGI-2 is another data point in the same direction. Deep Think is sort of parallel test-time compute with some level of distilling and refinement from certain trajectories (guessing based on my usage and understanding), same as gpt-5.2-pro, and it can extract more because of the pretraining datasets.

(I am sort of basing this on papers like the limits of RLVR, and on pass@k and pass@1 differences in RL post-training of models; this score just shows how "skilled" the base model was, or how strong the priors were. I apologize if this is not super clear; happy to expand on what I am thinking.)

by ebiester 3 hours ago

It's trained on YouTube data. It's going to get roffle and drspectred at the very least.

by silver_sun 3 hours ago

Google has a library of millions of scanned books from their Google Books project that started in 2004. I think we have reason to believe that there are more than a few books about effectively playing different traditional card games in there, and that an LLM trained with that dataset could generalize to understand how to play Balatro from a text description.

Nonetheless I still think it's impressive that we have LLMs that can just do this now.

by mjamesaustin 2 hours ago

Winning in Balatro has very little to do with understanding how to play traditional poker. Yes, you do need a basic knowledge of different types of poker hands, but the strategy for succeeding in the game is almost entirely unrelated to poker strategy.

by gilrain 3 hours ago

If it tried to play Balatro using knowledge of, e.g., poker, it would lose badly rather than win. Have you played?

by gcr 2 hours ago

I think I weakly disagree. Poker players have intuitive sense of the statistics of various hand types showing up, for instance, and that can be a useful clue as to which build types are promising.
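As a rough illustration of the kind of statistics I mean, here is a sketch that counts hand types for exactly five cards dealt from a standard 52-card deck. That is an assumption made for simplicity: Balatro draws a larger hand and lets you pick five, and the deck changes over a run, so the real in-game frequencies differ.

```python
from math import comb

# Assumption: plain 52-card deck, exactly 5 cards dealt, no discards or jokers.
TOTAL_HANDS = comb(52, 5)


def five_card_counts() -> dict[str, int]:
    """Count 5-card hands by their best poker category."""
    straight_flush = 10 * 4  # 10 rank runs x 4 suits, royal flush included
    counts = {
        "straight_flush": straight_flush,
        "four_of_a_kind": 13 * 48,
        "full_house": 13 * comb(4, 3) * 12 * comb(4, 2),
        "flush": 4 * comb(13, 5) - straight_flush,
        "straight": 10 * 4**5 - straight_flush,
        "three_of_a_kind": 13 * comb(4, 3) * comb(12, 2) * 4**2,
        "two_pair": comb(13, 2) * comb(4, 2) ** 2 * 44,
        "pair": 13 * comb(4, 2) * comb(12, 3) * 4**3,
    }
    # Everything left over is a high-card hand.
    counts["high_card"] = TOTAL_HANDS - sum(counts.values())
    return counts


if __name__ == "__main__":
    for name, count in five_card_counts().items():
        print(f"{name:16s} {count / TOTAL_HANDS:.5%}")
```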

by barnas22 hours ago

>Poker players have intuitive sense of the statistics of various hand types showing up, for instance, and that can be a useful clue as to which build types are promising.

Maybe in the early rounds, but deck fixing (e.g. Hanged Man, Immolate, Trading Card, DNA, etc) quickly changes that. Especially when pushing for "secret" hands like the 5 of a kind, flush 5, or flush house.

by winstonp 4 hours ago

DeepSeek hasn't been SotA in at least 12 calendar months, which might as well be a decade in LLM years

by cachius 4 hours ago

What about Kimi and GLM?

by zozbot234 3 hours ago

These are well behind the general state of the art (1yr or so), though they're arguably the best openly-available models.

by epolanski 31 minutes ago

Idk man, GLM 5 in my tests matches opus 4.5 which is what, two months old?

by tgrowazay 2 hours ago

According to the Artificial Analysis ranking, GLM-5 is at #4, after Claude Opus 4.5, GPT-5.2-xhigh and Claude Opus 4.6.

by dudisubekti 4 hours ago

But... there's Deepseek v3.2 in your link (rank 7)

by tehsauce 2 hours ago

How does it do on gold stake?

by littlestymaar 4 hours ago

> I don't think there are many people who posted their Balatro playthroughs in text form online

There is *tons* of Balatro content on YouTube though, and there is absolutely zero doubt that Google is using YouTube content to train their model.

by sdwr 4 hours ago

Yeah, or just the steam text guides would be a huge advantage.

I really doubt it's playing completely blind.

by acid__ 3 hours ago

> Most (probably >99.9%) players can't do that at the first attempt

Eh, both myself and my partner did this. To be fair, we weren’t going in completely blind, and my partner hit a Legendary joker, but I think you might be slightly overstating the difficulty. I’m still impressed that Gemini did it.

by Falsintio 4 hours ago

[dead]