upvote
- Azure Document Intelligence (has an option to return markdown too including headers and footers).

- AWS Textract

reply
Exactly. They're both very expensive and prone to surprising you. Sometimes in a good way, sometimes in a bad way. I'd rate them 85%, but you have to run a test because they both fail in different ways on the 15%.
reply
poma-ai has really great chunking techniques that chunk the document based on the document structure/heirarchy.

We use it on 200 page IEEE standards that are notoriously complex, filled with tables and diagram. Highly reccomend.

reply