upvote
Not aware of any that do native video-to-vector embedding the way Gemini Embedding 2 does. There are CLIP-based models (like VideoCLIP) that embed frames individually, but they don't process temporal video. you'd need to average frame embeddings which loses a lot.

Would love to see open-weight models with this capability since it would eliminate the API cost and the privacy concern of uploading footage.

reply