I don’t quite understand: what makes it reverse typesetting?

My understanding is that you're typesetting books for responsive e-ink readers.

You're inferring the structure of the document from the printed result. If typesetting takes a set of layout directives and outputs a page, this is taking a finished page and guessing what layout directives could create it. Then you can take that inferred structure and reflow the page in a new layout.
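
To make the "guessing the layout directives" step concrete, here is a minimal sketch of one such inference: grouping detected bounding boxes back into text lines. It is only an illustration, not the actual method used here; the `Box` type, the `group_into_lines` function, and the `tolerance` parameter are all hypothetical names, and it assumes something upstream has already produced the boxes.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical bounding box for one detected element on the page."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height
    label: str = ""  # illustrative tag only, not recognized text


def group_into_lines(boxes, tolerance=0.5):
    """Cluster boxes into lines by vertical position.

    A box joins the current line when its vertical centre lies within
    `tolerance` times the height of that line's first box, measured from
    that first box's centre. This is an assumed heuristic.
    """
    lines = []
    for box in sorted(boxes, key=lambda b: b.y + b.h / 2):
        centre = box.y + box.h / 2
        if lines:
            ref = lines[-1][0]
            ref_centre = ref.y + ref.h / 2
            if abs(centre - ref_centre) <= tolerance * ref.h:
                lines[-1].append(box)
                continue
        lines.append([box])
    # Within each line, restore left-to-right reading order.
    return [sorted(line, key=lambda b: b.x) for line in lines]
```

The recovered line structure is what a reflow step would then consume; real pages would also need column, paragraph, and heading detection, which this sketch leaves out.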
So it's like OCR, except that instead of recognizing characters and words, it recognizes the layout structure and transforms it into content markup and layout markup?
That's a way to view it!

The reason I'm not falling back on OCR is that the general case is full of things, like math equations and inset graphics/diagrams, that can't be OCR'd. The only robust way to deal with those is to treat them as graphical atoms: "this bounding box can be moved around, but should not be split up into pieces".
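
A rough sketch of what "can be moved around, but should not be split" could look like in code: a greedy reflow that places each atom whole and wraps to a new line when the next one would not fit. This is an assumption-laden illustration rather than the real implementation; `Atom`, `reflow`, `page_width`, and `gap` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    """Hypothetical unsplittable unit: a word image, an equation, a figure."""
    w: float  # width
    h: float  # height
    label: str = ""


def reflow(atoms, page_width, gap=0.0):
    """Greedily place atoms left to right, wrapping when one would overflow.

    Returns a list of (label, x, y) positions. Atoms are only ever moved,
    never cut. An atom wider than the page still gets its own line and
    simply overflows; scaling it down is left out of this sketch.
    """
    placed = []
    x = y = 0.0
    line_height = 0.0
    for atom in atoms:
        # Wrap only if the line already holds something and the atom won't fit.
        if x > 0 and x + atom.w > page_width:
            y += line_height
            x = 0.0
            line_height = 0.0
        placed.append((atom.label, x, y))
        x += atom.w + gap
        # A tall atom (e.g. an equation) pushes the next line further down.
        line_height = max(line_height, atom.h)
    return placed
```

For example, `reflow([Atom(40, 10, "word"), Atom(50, 30, "equation"), Atom(40, 10, "word2")], page_width=100)` keeps the equation intact on the first line and moves `word2` down to the next one.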
