Have your LLM write a simulation of the deck instead, so it can play 10,000 games in a second. I think that is a lot better for goldfishing and not nearly as expensive :)

https://github.com/spullara/mtg-reanimator
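
For anyone who hasn't seen a goldfish simulation before, here is a minimal sketch of the idea. It is not taken from the repo above: the deck model (cards are just "land" or "spell"), the goal of getting four lands into play, and the names `goldfish_turn` and `simulate` are all assumptions made up for illustration. A real simulation would model the actual cards and the deck's win condition.

```python
import random
from statistics import mean


def goldfish_turn(deck, lands_needed, rng, max_turns=20):
    """Play one solitaire game and return the turn on which
    `lands_needed` lands are in play, or None if it never happens."""
    library = deck[:]
    rng.shuffle(library)
    hand = [library.pop() for _ in range(7)]  # opening hand, no mulligans
    lands_in_play = 0
    for turn in range(1, max_turns + 1):
        # Assumption: we are on the play, so there is no draw on turn 1.
        if turn > 1 and library:
            hand.append(library.pop())
        # One land drop per turn, if we have a land to play.
        if "land" in hand:
            hand.remove("land")
            lands_in_play += 1
        if lands_in_play >= lands_needed:
            return turn
    return None


def simulate(games=10_000, lands=24, deck_size=60, lands_needed=4, seed=0):
    """Run many games; return (average turn, fraction of games that got there)."""
    rng = random.Random(seed)
    deck = ["land"] * lands + ["spell"] * (deck_size - lands)
    results = [goldfish_turn(deck, lands_needed, rng) for _ in range(games)]
    finished = [t for t in results if t is not None]
    average = mean(finished) if finished else None
    return average, len(finished) / games


if __name__ == "__main__":
    avg_turn, hit_rate = simulate()
    print(f"average turn with 4 lands in play: {avg_turn:.2f} "
          f"(reached in {hit_rate:.1%} of games)")
```

Swapping the land check for the deck's real goal (for a reanimator deck, something like "reanimation target in the graveyard plus a reanimation spell castable") is where the actual work is.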

I have also tried evaluating LLMs at playing the game and have found them to be really terrible at it, even the SoTA ones. They would probably do a lot better inside an environment that enforces the rules strictly, like MTG Arena, rather than having to understand the rules and play correctly on their own. A third LLM acting as judge helps, but even it is wrong a lot of the time.

https://github.com/spullara/mtgeval

by GregorStocks 5 hours ago

Yeah, that's why I'm using XMage for my project - it has real rules enforcement.

by spullara 4 hours ago

I was really hoping they could play the game like a human does. Sadly they aren't that close :)

by GregorStocks 6 hours ago

XMage has built-in AIs that aren't LLM-based, just regular old if-then logic. Getting them to play against each other with no human interaction is the first thing I built. https://www.youtube.com/watch?v=a1W5VmbpwmY is an example with two of those guys plus Sleepy and Potato no-op players - they do a fine job with straightforward decks.

You could clone mage-bench https://github.com/GregorStocks/mage-bench and add a new config like https://github.com/GregorStocks/mage-bench/blob/master/confi... pointing at the deck you want to test, and then do `make run CONFIG=my-config`. The logs will get dumped in ~/.mage-bench/logs and you can do analysis on them after the fact with Python or whatever. https://github.com/GregorStocks/mage-bench/tree/master/scrip... has various examples of varying quality levels.
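
A rough sketch of what that after-the-fact analysis could start from. The log format isn't described in this thread, so nothing here assumes one: `count_matches` is a hypothetical helper that just counts lines matching whatever regex you give it, and the `wins the game` pattern in the example is a made-up placeholder, not a string mage-bench is known to emit. Only the `~/.mage-bench/logs` path comes from the comment above.

```python
import re
from collections import Counter
from pathlib import Path


def count_matches(log_dir, pattern, glob="**/*"):
    """Count lines matching `pattern` in every regular file under `log_dir`.

    Returns a Counter keyed by each file's path relative to `log_dir`.
    """
    base = Path(log_dir).expanduser()
    regex = re.compile(pattern)
    counts = Counter()
    for path in sorted(base.glob(glob)):
        if not path.is_file():
            continue
        # errors="replace" keeps going if a log contains odd bytes.
        with path.open(errors="replace") as fh:
            counts[str(path.relative_to(base))] = sum(
                1 for line in fh if regex.search(line)
            )
    return counts


if __name__ == "__main__":
    # Placeholder pattern: replace it with a string that really
    # appears in your logs once you have looked at one.
    for name, n in count_matches("~/.mage-bench/logs", r"wins the game").items():
        print(f"{name}: {n}")
```

Open one log file first, find the lines that mark whatever you care about (game end, turn count, a specific card), and then adjust the pattern. The scripts directory linked above is the better reference for what the logs actually contain.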

You could also use LLMs, just passing a different `type` in the config file. But then you'd be spending real money for slower gameplay and probably-worse results.

by benbayard 4 hours ago

This is super helpful, thank you!