It can handle image and audio inputs, but it cannot produce those as outputs - it's purely a text-output model.
Yeah you're right. Also, you're Simon :)