My understanding is that you're typesetting books for responsive e-ink readers.
I'm not falling back on OCR because the general case is full of things, like math equations and inset graphics/diagrams, that can't be OCR'd. The only robust way to deal with those is to treat them as graphical atoms: "this bounding box can be moved around, but should not be split up into pieces".
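To make the "graphical atom" idea concrete, here is a rough Python sketch of a greedy reflow over bounding boxes. Everything in it is an assumption for illustration: the `Atom` class, the `reflow` function, the `gap` parameter, and the pixel widths are made-up names and values, not part of any existing tool. The only point is that each box is placed whole on a line or moved whole to the next one, and never cut.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Hypothetical indivisible box: a word image, an equation, a diagram."""
    label: str
    width: int
    height: int


def reflow(atoms, page_width, gap=0):
    """Greedily pack atoms into lines no wider than page_width.

    An atom is moved to the next line when it does not fit, never split.
    An atom wider than the page still gets a line of its own (it overflows).
    """
    lines = []
    current = []
    used = 0
    for atom in atoms:
        # Width the line would have if this atom were appended to it.
        needed = atom.width if not current else used + gap + atom.width
        if current and needed > page_width:
            lines.append(current)
            current = [atom]
            used = atom.width
        else:
            current.append(atom)
            used = needed
    if current:
        lines.append(current)
    return lines


atoms = [
    Atom("word-1", 40, 12),
    Atom("word-2", 50, 12),
    Atom("word-3", 30, 12),
    Atom("equation", 120, 40),  # wider than the page: kept whole anyway
    Atom("word-4", 20, 12),
]
layout = [[a.label for a in line] for line in reflow(atoms, page_width=100)]
print(layout)
```

A real reader would also have to scale oversized atoms and handle line heights and page breaks; this sketch only shows the "move, don't split" rule.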