Not that it's a big lead as of yet. I expect to see more action within the next few months, as people tune the harnesses and better models roll in.
At its core, this is far more of a "VLA" (vision-language-action) task than an "LLM" task, but I guess ARC-AGI-3 is making an argument that human intelligence is VLA-shaped.
When running, the grids are represented in JSON, so the visual component is nullified, but it still requires pretty heavy spatial understanding to parse a big old JSON array of cell values. Given Gemini's image understanding, I do wonder if it would perform better with a harness that renders the grid visually.
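For what it's worth, the rendering half of such a harness wouldn't take much. Here's a minimal sketch (stdlib only) that turns a JSON grid of cell values into a scaled-up PPM image you could hand to a multimodal model. The palette and the toy grid are my own inventions, not the actual ARC-AGI-3 payload format:

```python
import json

# Hypothetical palette: integer cell values 0-9 mapped to RGB colors.
# These are ARC-ish colors I picked, not an official mapping.
PALETTE = [
    (0, 0, 0), (0, 116, 217), (255, 65, 54), (46, 204, 64), (255, 220, 0),
    (170, 170, 170), (240, 18, 190), (255, 133, 27), (127, 219, 255), (135, 12, 37),
]
FALLBACK = (128, 128, 128)  # grey for any out-of-range value

def grid_to_ppm(grid, scale=20):
    """Render a 2D grid of ints as a plain-text PPM (P3) image string.

    Each cell becomes a scale x scale block of pixels so small grids
    are legible to an image model rather than a handful of pixels.
    """
    h, w = len(grid), len(grid[0])
    lines = [f"P3 {w * scale} {h * scale} 255"]
    for row in grid:
        color = lambda v: PALETTE[v] if 0 <= v < len(PALETTE) else FALLBACK
        # Repeat each cell's color `scale` times horizontally...
        pixel_row = " ".join(
            " ".join(map(str, color(v))) for v in row for _ in range(scale)
        )
        # ...and repeat the whole row `scale` times vertically.
        lines.extend([pixel_row] * scale)
    return "\n".join(lines)

# Toy 2x2 grid standing in for whatever the real JSON payload looks like.
grid = json.loads("[[0, 1], [2, 0]]")
with open("grid.ppm", "w") as f:
    f.write(grid_to_ppm(grid, scale=4))
```

PPM is an awkward choice for a real harness (most APIs want PNG), but it keeps the sketch dependency-free; swapping in Pillow's `Image.save` would be the obvious upgrade.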