This piqued my interest on how it does it and after briefly checking the project it seems it only has two features for automatic photo categorization. 1) it can group photos by date and 2) It has face detection and recognition that uses trained weights (so ML "intelligence").
Also the fact that an M5 version will be coming, and they likely know they are going to sell out on day one (I expect we'll see a price correction from Apple for higher end configs of M5 studios, base price will probably stay the same), so they need to build up stock reserves.
qwen3.5-2b and qwen3.5-4b are great at document parsing. They can run on CPU
qwen3.6-27b and gemma4-31b are borderline better than the human eye in some cases. Their OCR isn't perfect, but they're seriously good. They can still run on the CPU but you'll be waiting minutes per document.
You can demand JSON, YAML, MD, or freeform text just by varying the prompt. Even if you have a custom template, you can just put that in the prompt and they'll do an OK-ish job.
There's also models that aren't in the r/locallama zeitgeist. IBM released a new 4b parameter model for structured text extraction last week, and there's a sea of recent chinese OCR models too.
IMO the open wights models are so good that in a lot of cases it's not worth paying frontier labs for OCR purposes. The only barrier to entry is the effort to set up a pipeline, and havin the spare CPU/GPU capacity.
Besides those, there are a few smaller open-weights models that are dedicated for OCR tasks, for instance DeepSeek-OCR-2 and IBM granite-vision-4.1-4b. (They can be found on huggingface.co)
The dedicated vision models can be run on much cheaper hardware, including smartphones, than the big models that can process images besides text.
Similarly, besides bigger multimodal models, that can accept audio, images or text as imput, there are smaller open-weights models that are dedicated for speech recognition, e.g. Xiaomi MiMo-V2.5-ASR and IBM granite-speech-4.1-2b.
Isn't that a function of RAM supply not being available now?
Even if that weren't the case, every corp _needs_ you to be on a subscription.
That's an interesting way to view the world. I mean, utterly stupid as it is, but interesting.
But the previous sentence is even stupider (a Perl script 10 years ago could write code like Qwen does now?), so I guess at least it's consistent.