Given how primitive that image is, what's the point of even having an image model at this size?
reply
This isn't an image model. It's a text model, but text models can output SVG, so you can challenge them to generate an image and see how well they do.
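To make that concrete (my own sketch, not something from the model): SVG is just markup text, so a "drawing" from a text-only model is a string you can parse and inspect like any other output. The shapes and comments below are hypothetical examples of what a model might emit for a prompt like "draw a pelican".

```python
# A minimal sketch: SVG is plain text, so an LLM's "drawing" can be
# emitted, parsed, and inspected like any other string output.
import xml.etree.ElementTree as ET

# Hypothetical model output: shapes described as markup, not pixels.
# Models often annotate shapes with comments like these.
svg_text = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <!-- pelican body -->
  <ellipse cx="50" cy="60" rx="25" ry="15" fill="white"/>
  <!-- beak -->
  <polygon points="70,55 95,58 70,65" fill="orange"/>
</svg>"""

# Parse it to see what the model actually "drew".
root = ET.fromstring(svg_text)
ns = "{http://www.w3.org/2000/svg}"
shapes = [child.tag.replace(ns, "") for child in root]
print(shapes)  # ['ellipse', 'polygon']
```

Because the output is structured text, you can check it mechanically (does it parse? which shapes did it use?) even before looking at the rendered image.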
reply
>Multimodal by design: Gemma 3n natively supports image, audio, video, and text inputs and text outputs.

But I understand your point: Simon asked it to output SVG (text) instead of a raster image, so it's more difficult.

reply
It can handle image and audio inputs, but it cannot produce those as outputs - it's purely a text output model.
reply
Yeah you're right. Also, you're Simon :)
reply
Is that actually a useful benchmark, or is it just for the laughs? I've never really understood that.
reply
It was supposed to be a joke. But weirdly, it turns out there's a correlation between how good a model is and how good it is at my stupid joke benchmark.

I didn't realize quite how strong the correlation was until I put together this talk: https://simonwillison.net/2025/Jun/6/six-months-in-llms/

reply
Always loved this example, what do you think of ASCII art vs SVG?

Since it's not a formal encoding of geometric shapes, it's fundamentally different, but I guess it shares some challenges with the SVG task: correlating phrases/concepts with an encoded visual representation, but without using imagegen.

Do you think that "image encoding" is less useful?

It's a thing I love to try with various models for fun, too.

I'm talking about illustration-like content, not text-based ASCII art banners or abusing ASCII for rasterization.

The results have been interesting, too, but I guess it's less predictable than SVG.

reply
I've had disappointing results with ASCII art so far. Something I really like about SVG is that most models include comments, which give you an idea of what they were trying to do.
reply
Yes, the comments part makes sense, you also included it in the talk (I read the transcript but forgot to mention it in my comment, sorry :)

It makes sense, since it adds associations between descriptions and individual shapes/paths etc., similar to other code.

reply
For me, it shows whether LLMs are generalising from their training data. LLMs understand all of the words in the prompt. They understand the SVG spec better than any human. They know what a bird is. They know what a bike is. They know how to draw (and given access to computer use could probably ace this test). They can plan and execute on those plans.

Everything here should be trivial for LLMs, but they’re quite poor at it because there’s almost no “how to draw complex shapes in SVG” type content in their training set.

reply
It’s been useful, though given the author’s popularity I suspect it’s only a matter of time before new LLMs become “more aware” of it.
reply
It's useful because it's SVG, so it's different from other image generation methods.
reply
I think in 5 years we might have some ultra-realistic pelicans and this benchmark will turn out quite interesting.
reply
And then the author will try the "Pelican tries to swallow the capybara as-is". And it will fall apart again.
reply
That's the part where it gets interesting... how exactly it falls apart :D
reply
[flagged]
reply
Kinda feel like the content is a much better reason to visit than the pelicans. Though I suppose the pelicans are part of the content.

I'm quite happy that there's someone with both the time to keep up with all the LLM/AI stuff, that is also good enough at writing amusing stuff that I want to keep reading it.

reply
> Kinda feel like the content is a much better reason to visit than the pelicans.

That's how the pelicans get ya.

reply
Scored a whole two upvotes here, my scheme is clearly working great!
reply
Leave some upvotes for the rest of us
reply