> The out-of-the-box Shotwell manages photos quite well without any intelligence.

This piqued my interest in how it does that, and after briefly checking the project, it seems it has only two features for automatic photo categorization: 1) it can group photos by date, and 2) it has face detection and recognition that uses trained weights (so some ML "intelligence").

reply
The Mac Studio's disappearance is related to the fact that people now want them for running local models. Supply and demand. On top of that, Apple doesn't shift prices for released products, so the Studio essentially became underpriced when large RAM quantities exploded in price. For the price of 512GB of RAM alone you could get an M3 Ultra with 512GB of unified memory in a nice, quiet, power-efficient package. Buying the RAM separately, you'd still need to spend a few thousand more on CPU/GPU, power supply, storage, and a case.

There's also the fact that an M5 version is coming, and they likely know it will sell out on day one (I expect a price correction from Apple on the higher-end M5 Studio configs; the base price will probably stay the same), so they need to build up stock reserves.

reply
Do we even have decent OCR nowadays? Any free solutions?
reply
The latest rounds of open-weights vision language models are incredibly good. Like, massively good. Open-weights vision capabilities trade blows with frontier models. Over the last few months I'd roughly rank capabilities as Gemini -> {ChatGPT and SoTA open-weights models} -> Claude.

qwen3.5-2b and qwen3.5-4b are great at document parsing. They can run on CPU.

qwen3.6-27b and gemma4-31b are borderline better than the human eye in some cases. Their OCR isn't perfect, but they're seriously good. They can still run on the CPU but you'll be waiting minutes per document.

You can demand JSON, YAML, MD, or freeform text just by varying the prompt. Even if you have a custom template, you can just put that in the prompt and they'll do an OK-ish job.

There are also models that aren't in the r/LocalLLaMA zeitgeist. IBM released a new 4b-parameter model for structured text extraction last week, and there's a sea of recent Chinese OCR models too.

IMO the open-weights models are so good that in a lot of cases it's not worth paying frontier labs for OCR. The only barriers to entry are the effort of setting up a pipeline and having the spare CPU/GPU capacity.

reply
Many of the open-weights LLMs accept either text or images as input.

Besides those, there are a few smaller open-weights models dedicated to OCR tasks, for instance DeepSeek-OCR-2 and IBM granite-vision-4.1-4b. (They can be found on huggingface.co.)

The dedicated vision models can run on much cheaper hardware than the big models that process images alongside text, including smartphones.

Similarly, besides the bigger multimodal models that accept audio, images, or text as input, there are smaller open-weights models dedicated to speech recognition, e.g. Xiaomi MiMo-V2.5-ASR and IBM granite-speech-4.1-2b.

reply
The qwen models not only have good OCR, they will also describe pictures to you.
reply
>Also, have you noticed as top-end Mac Studios got downgraded recently? They don't want you to have access to frontier models. And you will not have it.

Isn't that a function of RAM supply not being available now?

reply
OpenAI did buy out the RAM supply to block competition. Arguably local models are one of its (smaller) competitors.

Even if that weren't the case, every corp _needs_ you to be on a subscription.

reply
Huh? Why would Apple not want you to be able to run local models? They have very deliberately stayed the hell away from this space.
reply
The conspiracy angle here is not really relevant. RAM is expensive and they're gearing up for M5 Studios. It's not the Illuminati keeping better LLMs out of your hands.
reply
You think Apple doesn't want you to use local models?

That's an interesting way to view the world. I mean, utterly stupid as it is, but interesting.

But the previous sentence is even stupider (could a Perl script 10 years ago write code like Qwen does now?), so I guess at least it's consistent.

reply