The presence of cameras everywhere is considerably more concerning than the status quo, to me at least, when there is an AI watching and indexing every second of every feed, and when camera owners, manufacturers, or governments can set simple natural-language parameters for the highly specific people or activities they want to be notified about. There are obviously compelling, easy-to-sell use cases that will surely drive adoption as this becomes cost-effective: get an alert about a crime in progress, get an alert when a neighbor doesn't clean up after his dog, get an alert when someone has fallen... but the potential implications of living in a panopticon like this, if it is not well regulated, are pretty ugly.
[0]: https://www.axon.com/products/axon-fusus [1]: https://citizen.com/
The problems start cropping up when you get things like Flock where governments start deploying cameras on a massive scale, or Ring where a single company has unrestricted access to everyone's private cameras.
I don't think it's a good thing, but it seems the limiting factor has been technological feasibility rather than any kind of principle against it.
Thanks for sharing!
Imagine a Premiere plugin where you could say "remove all scenes containing cats" and it'll spit out an EDL (Edit Decision List) that you can still manually adjust.
This very well might be a reality in a couple years though!
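A rough sketch of the "remove all scenes containing X" idea: once some detector has produced time ranges to cut, the kept ranges are just the complement within the clip. The function name `keep_ranges` and the seconds-based tuples are made up for illustration; this is not a real EDL writer or a Premiere API, only the interval step that such a plugin would need before emitting edit events.

```python
def keep_ranges(duration, cut_ranges):
    """Return the (start, end) spans left after removing cut_ranges from [0, duration].

    duration   -- clip length in seconds (hypothetical unit choice)
    cut_ranges -- iterable of (start, end) spans to remove; may overlap
    """
    keep = []
    cursor = 0.0  # end of the last removed span seen so far
    for start, end in sorted(cut_ranges):
        # Clamp each cut to the clip so out-of-range input cannot leak through.
        start = min(max(start, 0.0), duration)
        end = min(max(end, 0.0), duration)
        if start > cursor:
            keep.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        keep.append((cursor, duration))
    return keep


# Two overlapping "cat" detections in a 60-second clip.
segments = keep_ranges(60.0, [(10.0, 20.0), (15.0, 30.0)])
```

Each kept span could then be translated into one event in whatever edit list format the editor accepts, which leaves room for manual adjustment afterwards.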
Would love to see open-weight models with this capability since it would eliminate the API cost and the privacy concern of uploading footage.
For example, right now if I search "cybertruck" in my indexed dashcam footage, which doesn't contain any Cybertrucks, it returns a clip of the next-best match: a big truck, but not a Cybertruck.
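One common way to soften the "next best match" behavior is to drop results whose similarity falls below a cutoff, so a query with no real match returns nothing. This is a minimal sketch assuming clips are stored as plain embedding vectors and compared by cosine similarity; the `search` function, the toy two-dimensional vectors, and the `min_score` value are all made up for illustration, and a usable cutoff would have to be tuned for whatever embedding model is in use.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, index, min_score=0.8, top_k=3):
    """Return up to top_k (clip_id, score) pairs scoring at least min_score.

    index     -- list of (clip_id, vector) pairs (hypothetical storage layout)
    min_score -- assumed cutoff; the right value depends on the model
    """
    scored = [(cosine(query_vec, vec), clip_id) for clip_id, vec in index]
    scored.sort(reverse=True)  # best match first
    return [(clip_id, score) for score, clip_id in scored[:top_k] if score >= min_score]


# Toy index: the closest clip is only a loose match for the query.
index = [("big_truck", [0.6, 0.8]), ("sedan", [0.0, 1.0])]
query = [1.0, 0.0]

strict = search(query, index, min_score=0.8)  # nothing clears the cutoff
loose = search(query, index, min_score=0.5)   # the loose match is returned
```

The trade-off is that a cutoff set too high will also hide legitimate but imperfect matches, so showing the score next to each result may be a gentler alternative.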
If there is text in the video (like a caption or whatever), will the embedding capture that? Never thought about this before.
If the video has audio, does the embedding capture that too?
Cool project, thanks for sharing!
It's a bit expensive right now, so it's not as practical at scale. But once the embedding model comes out of public preview, and we hopefully get a local equivalent, this will be a lot more practical.