
by tantalor 8 hours ago


by et1337 8 hours ago


Asking random people to write SVG gives even worse results

by lxgr 7 hours ago


Especially without being able to look at the rendered output! (At least I'd be surprised if modern server-side tool calls regularly include an SVG renderer that can show a rasterized version to the model to iterate on it.)

by gpm 3 hours ago


One of the many things Google was pitching today is that they're going to run things like Google Search with access to Linux container environments for tool calls, which will presumably be able to rasterize SVGs and show them to the model.

But Simon says he runs these through the API without tool access, specifically to prevent that sort of "cheating". I.e., it's an LLM benchmark, not an LLM+harness benchmark.

by Eji1700 5 hours ago


Although every single render of those has the pedals on the correct side, as opposed to Gemini's optical-illusion back pedal, which tries to be both on the other side of the central gear and in front of the back wheel.

Not really a criticism, but it's interesting that you would never expect a human to make that mistake, even in a bad drawing.