It's being built as we speak. I attended a city council meeting yesterday where they discussed approving a contract for ALPR cameras. I learned about a product from the camera vendor called Fusus[0], a dashboard that integrates various camera systems, ALPRs, alerts, etc. Two things stood out to me: natural-language querying of video feeds, and planned future integration with civilian-deployed cameras. The city only had the budget for 50 ALPRs, and they stressed that they're only deploying them on main streets, but it seems like only a matter of time before your neighbor can install a camera that feeds right into the local PD's AI-enabled systems. One council member raised concerns about integrations with the Citizen app[1] specifically (and a few others whose names I didn't catch). I'm very worried about where all this is heading.

[0]: https://www.axon.com/products/axon-fusus
[1]: https://citizen.com/

Totally valid concern. Right now the cost ($2.50/hr) and latency make continuous real-time indexing impractical, but that won't always be the case. This is one of the reasons I'd want to see open-weight local models for this: they keep the indexing on your own hardware, with no footage leaving your machine. But you're right that the broader trajectory here is worth thinking carefully about.
It's $2.50 an hour because Google has margins. A nation-state could do it at cost, and even if that's not a huge difference, a year's worth of embeddings for one feed costs just $21,900. That's a rounding error, especially since indexing the footage is a one-time cost.
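For anyone checking the math: $2.50/hr running around the clock is 24 × 365 = 8,760 hours a year. A minimal sketch, assuming the quoted $2.50 is a per-feed price and ignoring leap years (the function name is just for illustration):

```python
HOURLY_RATE_USD = 2.50      # per-feed price quoted in this thread
HOURS_PER_YEAR = 24 * 365   # 8,760 hours; leap years ignored


def yearly_cost(hourly_rate: float, hours: int = HOURS_PER_YEAR) -> float:
    """Cost of indexing a single feed continuously for `hours` hours."""
    return hourly_rate * hours


print(f"${yearly_cost(HOURLY_RATE_USD):,.0f}")  # $21,900
```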
Right? $2.50 an hour is trivial to a government that can vote to invent a trillion dollars. Even just $1 million covers monitoring 45 real-time feeds for a year. I'm sure many very rich people would pay that for the safety of their compound.
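The 45-feed figure is just the budget divided by the yearly per-feed cost, rounded down: $1,000,000 / $21,900 ≈ 45.7. A minimal sketch, assuming the same $2.50/hr per-feed rate and a 8,760-hour year (the helper name is hypothetical):

```python
HOURLY_RATE_USD = 2.50      # per-feed price quoted in this thread
HOURS_PER_YEAR = 24 * 365   # 8,760 hours; leap years ignored


def feeds_for_budget(budget: float, hourly_rate: float = HOURLY_RATE_USD) -> int:
    """How many feeds can be indexed around the clock for a year on `budget`."""
    per_feed_per_year = hourly_rate * HOURS_PER_YEAR  # $21,900 at $2.50/hr
    return int(budget // per_feed_per_year)           # whole feeds only


print(feeds_for_budget(1_000_000))  # 45
```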
All the major cloud providers offer some form of face detection and number-plate reading, and many cameras now support object detection (e.g., package, vehicle, person) on the device itself.
Most cameras are also not queryable by any one person or organization. They are owned by different companies and if the government wants access they have to subpoena them after the fact.

The problems start cropping up when you get things like Flock where governments start deploying cameras on a massive scale, or Ring where a single company has unrestricted access to everyone's private cameras.

I think Flock is just a symptom of the underlying tech becoming so cheap that "just blanket the city in cameras" starts to sound like a viable solution when police rely so heavily on camera footage.

I don't think it's a good thing, but it seems the limiting factor has been technological feasibility rather than any kind of principle against it.

Yeah, the panopticon is now very technically feasible; it's just expensive to implement (for now).
For specific people they probably wouldn’t use general embeddings. These embeddings let you search for “tall man in a trench coat”, but if you want a specific person, you would use facial recognition.
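Description search over embeddings typically works by embedding the text query and ranking precomputed clip embeddings by similarity. A minimal sketch using cosine similarity, assuming tiny hand-made 3-dimensional vectors; the clip IDs, vectors, and function names are all hypothetical, and a real system would get its vectors from an embedding model not shown here:

```python
import math

# Hypothetical toy "embeddings" for illustration only; a real index would hold
# model-generated vectors with hundreds of dimensions per clip.
CLIP_INDEX = {
    "cam3 09:14": [0.9, 0.1, 0.0],  # pretend: a person in a long coat
    "cam7 11:02": [0.1, 0.8, 0.2],  # pretend: a delivery van
    "cam1 17:45": [0.0, 0.2, 0.9],  # pretend: a dog on a leash
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def search(query_vec: list[float], index: dict, k: int = 1) -> list[str]:
    """Return the IDs of the k clips most similar to the query vector."""
    ranked = sorted(index.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [clip_id for clip_id, _ in ranked[:k]]


# Pretend this is the embedding of the text "tall man in a trench coat".
query = [0.85, 0.15, 0.05]
print(search(query, CLIP_INDEX))  # ['cam3 09:14']
```

Facial recognition, by contrast, matches against a specific enrolled face rather than a free-text description, which is why the two approaches suit different tasks.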
I think a general description is better for surveillance/tracking like this, no? If they're at a weird angle or intentionally concealing their face, facial recognition falls apart, but being able to describe them naturally would result in better tracking, IMO.