by ValdikSS 10 hours ago

by FeepingCreature 6 hours ago

That's actually how vision language models already work, pretty much.

by wongarsu 3 hours ago

And there's a reason nobody uses them for face recognition.

Vision language models are an incredible achievement in generality and usability, but they pay a hefty price in fidelity and speed.

by stingraycharles 6 hours ago

Huh? Images are tokenized the same way language is, and it’s all just fed into one single model, not multiple smaller expert models.

The image gets cut into smaller pieces (e.g. 4x4 pixels) and each of those is assigned a token, similar to how text is broken up into tokens. Then the whole thing is fed into a single model.
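A rough sketch of that patch step, not any particular model's tokenizer: the hypothetical `patchify` helper below cuts an image into non-overlapping square pieces and flattens each one into a single vector. The function name and the 8x8 test frame are made up for illustration. In practice the pieces tend to be larger than 4x4, and each vector is usually projected to an embedding before it reaches the model.

```python
def patchify(image, patch):
    """Split an H x W x C image (nested lists) into flattened patches.

    Patches are returned in row-major order; each one is a flat list of
    patch * patch * C channel values.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image dimensions must be divisible by the patch size")

    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    flat.extend(image[y][x])  # append the pixel's channels
            patches.append(flat)
    return patches


# Hypothetical 8x8 RGB frame, filled with arbitrary values.
image = [[[(x + y) % 256, x, y] for x in range(8)] for y in range(8)]

# One "token" (here: one flat vector) per 4x4 patch.
tokens = patchify(image, 4)
```

With a 4x4 patch size the 8x8 frame yields four vectors of 48 values each (4 * 4 pixels * 3 channels), which then play the role that text tokens play for a sentence.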

by FeepingCreature 5 hours ago

Yes, I'm saying

> Imagine face recognition to work like a text chat, where the PC gets the frame from the camera and writes in the chat: "Who's that? Here's the RGB888 image in hex: ...".

that's pretty much how it works.

by stingraycharles 3 hours ago

But that isn’t a specialized model like the grandparent claimed; it’s a single multi-modal model.

by Dylan16807 2 hours ago

Yes, the "imagine" was describing the opposite of a specialized model in order to call it a bad idea.

by stingraycharles 6 hours ago

Do you know that MoE is a thing?

by jampekka 6 hours ago

The experts in MoEs aren't specialized in any meaningful task sense. At the level of what we would think of as tasks, the experts are selected essentially arbitrarily, per token and per block.
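To make "per token" concrete, here is a minimal sketch of top-k routing, assuming one common variant in which the router scores every expert for a token, keeps the k best, and normalizes their weights with a softmax. The `route_token` name and the logit values are hypothetical. A real MoE block learns its router, and each block has its own.

```python
import math


def route_token(logits, k):
    """Return [(expert_index, weight)] for the k highest-scoring experts.

    Weights are a softmax over the selected experts' logits, so they sum to 1.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router logits: one row per token, one column per expert.
router_logits = [
    [2.0, 0.1, -1.0, 0.5],
    [-0.3, 1.5, 1.4, 0.0],
]

# Each token is routed independently of its neighbours.
routes = [route_token(row, k=2) for row in router_logits]
```

Here the first token goes to experts 0 and 3 and the second to experts 1 and 2. Nothing in the routing step refers to a task or a domain, only to per-token scores.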

by stingraycharles 6 hours ago

It’s unsupervised, yes, but “unspecialized in any meaningful task sense” is incorrect; specialization is the whole point. It’s just not specialization in the sense of “this is a legal expert, this is a software developer”.

by orbital-decay 23 minutes ago

Optimal expert separation depends on the goal and can be pretty arbitrary. For example, DeepSeek v4 separates them more or less by domain, if I remember correctly.