I'll need to check it out!

We made the same observation: the possible space is almost endless, and even the same file type may need different kinds of processing (e.g. an Excel file can be database-style, small and narrative-heavy, or both).

We have baked in some ground processing rules for different kinds of documents, and we allow custom instructions for dealing with specific cases (e.g. translations, particular format layouts). The best write-up I have at the moment is https://www.parsewise.ai/doc-processing-pipelines but we're working on something that goes into more detail :)
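
As a rough illustration only (this is not Parsewise's actual implementation, and every name, rule string, and style label below is hypothetical), the combination of built-in rules per document kind plus user-supplied instructions could be sketched like this:

```python
from dataclasses import dataclass, field

# Hypothetical built-in rules, keyed by (file type, layout style).
# The same file type can map to different rules depending on its style.
DEFAULT_RULES = {
    ("xlsx", "tabular"): "treat each row as a record, keep header names",
    ("xlsx", "narrative"): "read cells as free text, preserve reading order",
    ("pdf", "narrative"): "split by headings, keep paragraphs intact",
}

FALLBACK_RULE = "fall back to plain-text extraction"


@dataclass
class ProcessingPlan:
    base_rule: str
    custom_instructions: list = field(default_factory=list)


def build_plan(file_type, style, custom_instructions=None):
    """Pick the built-in rule for this document kind, then attach any
    user-supplied instructions (e.g. translation, layout hints)."""
    base = DEFAULT_RULES.get((file_type, style), FALLBACK_RULE)
    return ProcessingPlan(base, list(custom_instructions or []))


plan = build_plan("xlsx", "narrative", ["translate German cells to English"])
print(plan.base_rule)
print(plan.custom_instructions)
```

The point of the sketch is just that the document kind selects a default, and custom instructions are layered on top rather than replacing it.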