by WarmWash 15 hours ago

by gruez 14 hours ago

I thought all the recent models were "multimodal"? Is the image part just an image recognizer stuck in front of the text model?
