If you need something less tied to a fixed set of labels (say, finding all the red apples or all the cardboard signs), SAM3 is great, as the sibling comment says.
A quick note: this is also a task you can hand to general-purpose models like Gemini.
Large general models have taken over in NLP, and (outside of embedded/low-latency applications) it seems like they are coming for CV next.
So you should soon be able to use a large generic model that detects whatever you ask it for.
It's already pretty much possible with open-vocabulary detectors like SAM3, which you can just prompt with "Apple": https://ai.meta.com/research/sam3/
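
For anyone who wants to try the "just prompt it with a word" workflow, here is a rough sketch. To be clear about the assumptions: this does not use SAM3's own API (which I won't guess at). It swaps in the Hugging Face `zero-shot-object-detection` pipeline with the OWL-ViT checkpoint `google/owlvit-base-patch32` as a stand-in open-vocabulary detector. The helper names `keep_confident` and `detect`, the file name in the usage note, and the 0.3 score cutoff are all made up for illustration.

```python
from typing import Any


def keep_confident(
    detections: list[dict[str, Any]], label: str, min_score: float = 0.3
) -> list[dict[str, Any]]:
    """Keep detections whose label matches `label` (case-insensitive)
    and whose score is at least `min_score`."""
    wanted = label.strip().lower()
    return [
        d
        for d in detections
        if d["label"].strip().lower() == wanted and d["score"] >= min_score
    ]


def detect(image_path: str, prompt: str, min_score: float = 0.3) -> list[dict[str, Any]]:
    """Run an open-vocabulary detector on one image with a free-text prompt.

    Assumption: OWL-ViT via transformers stands in for SAM3 here.
    """
    # Imported lazily so keep_confident works without the heavy dependencies.
    from PIL import Image
    from transformers import pipeline

    detector = pipeline(
        "zero-shot-object-detection", model="google/owlvit-base-patch32"
    )
    image = Image.open(image_path).convert("RGB")
    # Each result is a dict with "score", "label" and a "box" of pixel coords.
    raw = detector(image, candidate_labels=[prompt])
    return keep_confident(raw, prompt, min_score)
```

Usage would be something like `detect("fruit_bowl.jpg", "apple")`, which downloads the model weights on first run. The cutoff is worth tuning per task, since open-vocabulary detectors tend to return many low-confidence boxes.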