This just dropped: https://huggingface.co/baidu/Unlimited-OCR

It runs comfortably on 12 GB of VRAM. I gave it a whirl and it does seem pretty competitive. I wonder how it compares for your use case.

Curious whether you've tried local LLMs for OCR, like a Gemma4, or is your volume too high for that?

Haven't tried them in a while, so I can't comment on current performance.