It seems to depend heavily on what exactly you're transcribing; the performance/quality between models is really uneven. Some work really well for old cursive but then fail at reading 8-bit segment LCD digital fonts, or vice versa, or any other combination out there.

Basically, to find the answer you really need your own benchmark, run on real examples of what you want to do. The same goes for anything ML nowadays, as the public benchmarks can't really be trusted to give you any indication of how well it'd work for you.
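
For what it's worth, a personal benchmark doesn't need to be fancy. Here's a minimal sketch, assuming you have a handful of images with hand-typed ground truth, grouped by category (cursive, LCD readouts, etc.), and that each model can be wrapped as a function from image path to text. The names `benchmark`, `cer`, and the `models`/`samples` shapes are made up for illustration, not from any particular library. It scores with character error rate (edit distance divided by reference length), which is one common choice but not the only one.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate relative to the ground-truth text."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def benchmark(models, samples):
    """Score every model on every sample.

    models:  {name: callable(image_path) -> transcribed text}  (hypothetical wrappers)
    samples: [(category, image_path, ground_truth), ...]
    Returns  {model_name: {category: mean CER}}.
    """
    results = {}
    for name, transcribe in models.items():
        per_category = defaultdict(list)
        for category, path, truth in samples:
            per_category[category].append(cer(truth, transcribe(path)))
        results[name] = {c: sum(v) / len(v) for c, v in per_category.items()}
    return results
```

Breaking the score out per category is the point: an overall average would hide exactly the "great at cursive, bad at LCD digits" unevenness described above.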

reply
It's really good. I didn't do any type of statistical evaluation or comparison to other models, but it's so good that it doesn't matter to me if there's an option that might be even better.
reply
This just dropped: https://huggingface.co/baidu/Unlimited-OCR

It can run comfortably on 12 GB of VRAM. I gave it a whirl and it does seem pretty competitive. I wonder how it compares for your use case.

reply
Curious whether you've tried local LLMs for OCR, like Gemma4, or is your volume too much for that?
reply
Haven't tried them in a while, so I can't comment on current performance.
reply