(i am sort of basing this on papers like limits of rlvr, and pass@k and pass@1 differences in rl posttraining of models, and this score just shows how "skilled" the base model was or how strong the priors were. i apologize if this is not super clear, happy to expand on what i am thinking)
Nonetheless I still think it's impressive that we have LLMs that can just do this now.
Maybe in the early rounds, but deck fixing (e.g. Hanged Man, Immolate, Trading Card, DNA, etc) quickly changes that. Especially when pushing for "secret" hands like the 5 of a kind, flush 5, or flush house.
There are *tons* of balatro content on YouTube though, and it makes absolutely zero doubt that Google is using YouTube content to train their model.
I really doubt it's playing completely blind
Eh, both myself and my partner did this. To be fair, we weren’t going in completely blind, and my partner hit a Legendary joker, but I think you might be slightly overstating the difficulty. I’m still impressed that Gemini did it.