I tried something similar where I needed a bunch of tables extracted from a PDF over around 40 pages. It was crazy slow on my MacBook and inaccurate.
https://github.com/zai-org/GLM-OCR
Use mlx-vlm for inference:
https://github.com/zai-org/GLM-OCR/blob/main/examples/mlx-de...
Then you can run a single command to process your PDF:
glmocr parse example.pdf
Loading images: example.pdf
Found 1 file(s)
Starting Pipeline...
Pipeline started!
GLM-OCR initialized in self-hosted mode
Using Pipeline (enable_layout=true)...
=== Parsing: example.pdf (1/1) ===
My test document contains scanned pages from a law textbook: two columns of text with a lot of footnotes. It took 60 seconds to process 5 pages on a MBP with an M4 Max chip.

After it's done, you'll have a directory output/example/ that contains .md and .json files. The .md file contains a markdown rendition of the complete document. The .json file contains the individual labeled regions from the document along with their transcriptions. If you get all the JSON objects with
"label": "table"
from the JSON file, you can get an HTML-formatted table from each object's "content" field.

It might still be inaccurate -- I don't know how challenging your original tables are -- but it shouldn't be terribly slow. The tables it produced for me were good.
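That filtering step is a few lines of Python. A minimal sketch, assuming the .json file is a list of region dicts with "label" and "content" keys (check your actual output, since the exact top-level structure may differ):

```python
import json

def extract_tables(regions):
    """Return the HTML "content" of every region labeled "table"."""
    return [r["content"] for r in regions if r.get("label") == "table"]

# In practice you would load the file glmocr wrote, e.g.:
#   with open("output/example/example.json") as f:
#       regions = json.load(f)
regions = [
    {"label": "text", "content": "Some paragraph."},
    {"label": "table", "content": "<table><tr><td>1</td></tr></table>"},
]
print(extract_tables(regions))
```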
I have also built more complex workflows that use a mixture of OCR-specialized models and general-purpose VLMs like Qwen 3.5, along with software to coordinate and reconcile their operations, but GLM-OCR by itself is the best first thing to try locally.
2. The n8n workflow passes a given binary PDF to Gemma, which (based on a detailed prompt) analyzes it and produces JSON output.
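If you wanted to reproduce that step outside n8n, the request body it sends to a locally served Gemma is simple to build. A sketch assuming an Ollama-style generate API (the model name, prompt, and "format": "json" flag follow Ollama's conventions; adapt them to your own serving setup):

```python
import base64

def build_gemma_request(image_bytes: bytes, prompt: str) -> dict:
    """Build a request body for a local Ollama server running Gemma.

    The page is sent as a base64-encoded image, and "format": "json"
    asks the server to constrain the reply to valid JSON.
    """
    return {
        "model": "gemma3",  # assumed model tag -- match your local install
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode()],
        "format": "json",
        "stream": False,
    }

body = build_gemma_request(b"\x89PNG...", "Extract every table on this page as JSON.")
```

You would then POST this to http://localhost:11434/api/generate (Ollama's default endpoint) and parse the "response" field of the reply.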
See https://github.com/LinkedInLearning/build-with-ai-running-lo... if you want more details. :)