Chrome ships a local OCR model for extracting text from PDFs, and it is better than any of the VLMs or open source OCR models I've tried. I had a few hundred gigs of old newspaper scans, and after trying all the other options I ended up building a wrapper around the DLL it uses to get the text and bboxes. Performance and accuracy are on another level compared to Tesseract, and while VLMs sometimes produced good results, they just seemed unreliable.

I've thought of open sourcing the wrapper but haven't gotten around to it yet. I bet Claude Code can build a functioning prototype if you just point it to the "screen_ai" dir under Chrome's user data.

by alvibo 2 hours ago

Is there a chance you'll open source the wrapper after all? It would help a lot of people like me. No pressure, but now I really want to try it on a bunch of Japanese scans I have lying around. Unfortunately, finding good OCR for Japanese scans is still a huge problem in 2026.

by zzleeper 7 hours ago

Surprisingly, I have a few hundred gigs of old newspaper scans so am very curious.

How fast was it per page? Do you recall if it's CPU or GPU based? TY!

by Stagnant 4 hours ago

It is CPU-based, somewhere between 1 and 2 seconds per page on a single core. I ran 20 instances in parallel to use 20 CPU cores, so the average time per page came down nicely.
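For anyone wanting to reproduce that fan-out, here is a minimal Python sketch of the idea, not the actual wrapper. `ocr_page` is a hypothetical placeholder you would swap for a call into whatever OCR engine you use, and the worker count of 20 just mirrors the numbers above. The last helper does the back-of-the-envelope math and assumes ideal linear scaling, which real workloads rarely hit exactly.

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def ocr_page(image_path: str) -> str:
    """Hypothetical placeholder: replace the body with a real OCR call."""
    return f"text extracted from {Path(image_path).name}"


def ocr_all(image_paths, workers=20, executor_cls=ProcessPoolExecutor):
    """OCR every page, running one single-threaded worker per CPU core."""
    with executor_cls(max_workers=workers) as pool:
        # map() preserves input order, so results line up with image_paths.
        # chunksize batches pages per task to cut inter-process overhead.
        return list(pool.map(ocr_page, image_paths, chunksize=8))


def effective_seconds_per_page(seconds_per_page: float, workers: int) -> float:
    """Ideal average time per page when all workers run fully in parallel."""
    return seconds_per_page / workers
```

With 1 to 2 seconds per page on one core, `effective_seconds_per_page` gives roughly 0.05 to 0.1 seconds per page across 20 workers, i.e. about 10 to 20 pages per second in the best case. When calling `ocr_all` from a script, put the call under an `if __name__ == "__main__":` guard so worker processes can import the module cleanly.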

by mwcampbell 6 hours ago

What's the name of this DLL? I assume it's separate from the monster chrome.dll, and that the model is proprietary.

by Stagnant 5 hours ago

chrome_screen_ai.dll is the name of the DLL (libchromescreenai.so on Linux), and yes, it is proprietary. It isn't included by default; Chrome uses its component service to download it automatically when you open a PDF file that doesn't already have OCR'd text in it. You can download it separately from here: https://chrome-infra-packages.appspot.com/p/chromium/third_p...
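If Chrome has already fetched the component, a quick way to check is to look for those file names under the "screen_ai" directory in the user data dir mentioned upthread. The Python sketch below only locates the file; it does not load or call it. The default user data paths are the usual Google Chrome defaults and may differ for Chromium or custom profiles, and the layout below "screen_ai" is an assumption, so the search is recursive.

```python
import os
import sys
from pathlib import Path

# File names as given in this thread (Windows and Linux only).
LIBRARY_NAMES = ("chrome_screen_ai.dll", "libchromescreenai.so")


def default_user_data_dir() -> Path:
    """Usual Google Chrome user data directory for the current platform."""
    if sys.platform.startswith("win"):
        base = Path(os.environ.get("LOCALAPPDATA", ""))
        return base / "Google" / "Chrome" / "User Data"
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Application Support" / "Google" / "Chrome"
    return Path.home() / ".config" / "google-chrome"


def find_screen_ai_library(user_data_dir: Path):
    """Return a matching library under <user_data_dir>/screen_ai, or None."""
    component_dir = user_data_dir / "screen_ai"
    if not component_dir.is_dir():
        return None
    for name in LIBRARY_NAMES:
        # Subdirectory layout is assumed unknown, so search recursively.
        matches = sorted(component_dir.rglob(name))
        if matches:
            return matches[0]
    return None
```

Calling `find_screen_ai_library(default_user_data_dir())` returns `None` if the component has not been downloaded yet, for example because no un-OCR'd PDF has been opened in that profile.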

by ghrl 8 hours ago

I remember someone building a meme search engine for millions of images using a cluster of used iPhone SEs because of Apple's very good and fast OCR capabilities. Quite an interesting read as well: https://news.ycombinator.com/item?id=34315782

by fzysingularity 8 hours ago

Apple OCR, even on the Mac, is insanely good, in fact way better than AWS Textract or GCP Cloud Vision OCR.

Any idea what model is being used?

by AlphaSite 8 hours ago

Probably some custom model built for their hardware.