I'm not sure there's a one-stop shop for this at the moment. I think the process is:

* Have a box with sufficient spare (V)RAM -- probably 8G for simple categorization with qwen3.5-4b, and 24G or more for more intelligent categorization with qwen3.6-27b or gemma4-31b.

* Download or compile llama.cpp. Choose a model, then choose one of the "quantized" builds that will actually fit on your hardware. There are literally hundreds to thousands of these per model on Hugging Face.

* Spend half a day tuning command-line parameters until llama.cpp doesn't crash.

* Watch llama.cpp regularly OOM itself, then put it in a systemd service with a memory limit so it doesn't take the entire machine down when it dies.
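A unit file along these lines caps memory so a crash takes out the service, not the box. Paths, flags, and the limit are placeholders to adapt:

```ini
# /etc/systemd/system/llama-server.service -- illustrative, adjust paths/flags
[Unit]
Description=llama.cpp server
After=network.target

[Service]
# model path, GPU offload layers, and context size are placeholders
ExecStart=/usr/local/bin/llama-server -m /models/model.gguf --mmproj /models/mmproj.gguf -ngl 99 -c 4096 --port 8080
Restart=on-failure
RestartSec=5
# hard cap so the OOM killer reaps the service, not the whole machine
MemoryMax=20G

[Install]
WantedBy=multi-user.target
```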

* Download all your photos to a folder.

* Start vibing a Python script to categorize your images by prompting the LLM with each image in turn.
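That script ends up looking roughly like this sketch: it assumes you launched llama-server with a vision projector (`--mmproj`) and that it exposes the OpenAI-compatible `/v1/chat/completions` endpoint on localhost:8080; the category names and prompt wording are my own placeholders.

```python
# Sketch: classify each photo by POSTing it (base64-inlined) to a local
# llama.cpp server's OpenAI-compatible chat endpoint. Categories, prompt,
# and the server URL are illustrative assumptions.
import base64
import json
import urllib.request
from pathlib import Path

CATEGORIES = ["people", "pets", "landscapes", "documents", "other"]

def build_payload(image_bytes: bytes, categories: list[str]) -> dict:
    """Build an OpenAI-style chat payload with the image inlined as a data URI."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    prompt = (
        "Classify this photo into exactly one of: "
        + ", ".join(categories)
        + ". Reply with the category name only."
    )
    return {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        "temperature": 0,
    }

def categorize(path: Path, url: str = "http://localhost:8080/v1/chat/completions") -> str:
    """Send one image to the local server and return its raw text reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(path.read_bytes(), CATEGORIES)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"].strip()

# usage, once the server is up:
#   for photo in sorted(Path("photos").glob("*.jpg")):
#       print(photo.name, "->", categorize(photo))
```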

* Spend days tweaking/refining the prompt to try to get the LLM to actually do what you want.
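One thing that shortens that loop is refusing to trust the raw reply: make the model choose from a fixed list, then normalize whatever comes back into a known bucket. A sketch, where the fallback category and matching rules are my own choices:

```python
# Sketch: coerce a free-form model reply into one of your categories.
# The "other" fallback and the matching heuristics are illustrative.

def normalize_reply(reply: str, categories: list[str], fallback: str = "other") -> str:
    """Map a model reply onto a known category, else a fallback bucket."""
    cleaned = reply.strip().lower().rstrip(".!")
    for cat in categories:
        if cat.lower() == cleaned:      # model answered with just the category
            return cat
    for cat in categories:
        if cat.lower() in cleaned:      # category buried in a longer sentence
            return cat
    return fallback
```

llama.cpp can also constrain generation with a GBNF grammar, so the model literally cannot emit anything outside your list, which removes a whole class of prompt fiddling.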

The endgame is one of:

* The local model categorizes your images. Yay.

* The local model is too slow and you give up. Boo.

* The local model is too slow, so you spend $1k-$10k on hardware. Your image categorization task becomes a cover story for buying new gear. Yay.

* The local model can't understand your categorization metric, so you give up. Boo.

* You eagerly await news of the next open model being released. Yay?

* You consider replacing your local model with a frontier model, but then you realize you'd be spending $500 to categorize your photos. Boo.

* You refuse to allow Google/Gemini/Anthropic to train on your nudes. Boo.

I'm also interested in how to do this