It works with relatively good scans. With bad or skewed scans, and especially with documents that have many label/value pairs that aren't nicely tucked inside sentences, the more context you have, the better your chances of finding the correct words and fixing the errors.

There is a whole class of tricky documents. A decent (if you ignore the marketing bias) post about this problem can be found here:

https://getomni.ai/blog/ocr-benchmark

reply
How do you know where to slice an image? What if you slice an image mid-word?
reply
I calculate* the appropriate overlap and the slicer overlaps a certain amount of the previous slice. There is some post-processing assembly required, but it's trivial.

[*] SWAG line height, trial and error to figure out the right amount of overlap given LLM error rates, etc.
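
For anyone wondering what the overlap slicing could look like, here is a minimal sketch in Python. It is not the actual code from the comment above: `slice_bounds`, `merge_lines`, and all parameter names are hypothetical, and the exact-match merge is a simplifying assumption (real OCR/LLM output would likely need fuzzy matching on the overlapping lines).

```python
def slice_bounds(image_height, slice_height, overlap):
    """Return (top, bottom) pixel rows for horizontal strips.

    Each strip starts `overlap` rows before the previous strip ends, so a
    text line cut by one boundary appears whole in the neighboring strip.
    """
    if not 0 <= overlap < slice_height:
        raise ValueError("overlap must be >= 0 and smaller than slice_height")
    if image_height <= 0:
        return []
    step = slice_height - overlap
    bounds = []
    top = 0
    while True:
        bottom = min(top + slice_height, image_height)
        bounds.append((top, bottom))
        if bottom >= image_height:
            break
        top += step
    return bounds


def merge_lines(prev_lines, next_lines):
    """Join text lines from two adjacent strips, dropping the duplicated overlap.

    Assumption: the overlapping lines were transcribed identically in both
    strips. The longest suffix of prev_lines that equals a prefix of
    next_lines is treated as the overlap and kept only once.
    """
    max_k = min(len(prev_lines), len(next_lines))
    for k in range(max_k, 0, -1):
        if prev_lines[-k:] == next_lines[:k]:
            return prev_lines + next_lines[k:]
    # No common lines found: fall back to plain concatenation.
    return prev_lines + next_lines
```

With a guessed line height, the overlap might be set to a couple of line heights (again an assumption, tuned by trial and error), e.g. `slice_bounds(page_height, 400, 2 * line_height)`. Each `(top, bottom)` pair can then be used to crop the page before sending the strip to the model.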

reply
Interesting. Do you have a uniform data set, e.g. documents of a specific type that you know consistently have similar formats, or is this tuning something you need to do per document?