Chrome ships a local OCR model for extracting text from PDFs, and it's better than any of the VLM or open-source OCR models I've tried. I had a few hundred gigs of old newspaper scans, and after trying all the other options I ended up building a wrapper around the DLL it uses to get the text and bboxes. Performance and accuracy are on another level compared to Tesseract, and while VLMs sometimes produced good results, they just seemed unreliable.
I've thought about open-sourcing the wrapper but haven't gotten around to it yet. I bet Claude Code could build a functioning prototype if you just pointed it at the "screen_ai" dir under Chrome's user data.