I'm reminded of the Xerox JBIG2 bug back in ~2013, where certain scan settings could silently replace numbers inside documents, and bad construction-plans were one of the cases that led to it being discovered. [0]
It wasn't overt OCR per se: end users weren't intending to convert pixels to characters or vice versa.
Full context and details: https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres...
How can we describe OCR that wouldn't match this definition exactly?
If the image is actually text, both of them can end up finding things. Binning will identify "these things look almost the same", while OCR will identify "these look like the letter M".
It also gives a false sense of security when it displays dirty pixels that still clearly show a specific digit, since you think you're basically looking at the original.
JBIG2 is an OCR algorithm that doesn't assume the document comes from a pre-existing alphabet.
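To make the binning idea from this thread concrete, here is a toy sketch of alphabet-free symbol matching. It is not the JBIG2 codec: the 3x5 glyphs, the Hamming-distance metric, and the `max_diff` threshold are all made up for illustration. It only shows how a similarity threshold that is too loose can replace one digit with another.

```python
# Toy illustration of symbol binning (pattern matching and substitution).
# NOT the JBIG2 codec: glyphs, distance metric, and threshold are hypothetical.

def hamming(a, b):
    """Count differing pixels between two equal-sized bitmaps (lists of strings)."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def bin_glyphs(glyphs, max_diff):
    """Assign each glyph to the first stored representative within max_diff pixels."""
    reps = []        # representative bitmaps kept in the symbol dictionary
    assignment = []  # index of the representative drawn for each input glyph
    for g in glyphs:
        for i, r in enumerate(reps):
            if hamming(g, r) <= max_diff:
                assignment.append(i)  # reuse an existing symbol
                break
        else:
            reps.append(g)            # nothing close enough: store a new symbol
            assignment.append(len(reps) - 1)
    return reps, assignment

SIX = ["###", "#..", "###", "#.#", "###"]
EIGHT = ["###", "#.#", "###", "#.#", "###"]
glyphs = [SIX, EIGHT, SIX]

print(bin_glyphs(glyphs, max_diff=0)[1])  # [0, 1, 0]: the 8 keeps its own symbol
print(bin_glyphs(glyphs, max_diff=1)[1])  # [0, 0, 0]: the 8 is drawn as a 6
```

Note that no alphabet is involved: the matcher never knows these are a 6 and an 8, only that the bitmaps differ by one pixel.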
Take another look at my comment.
What is the maximum resolution you support for PDFs? The max Gemini will do is 3072x3072. We have plans that are 10x that size.
To think that everything was digitized a long time ago, yet contract law still cannot properly delineate responsibilities between GCs and architects, who are still sending 2D drawings to each other.
Imagine, all this information about quantities and door types (and everything else) is already available and produced by the architect's team, BUT they cannot share it! Because if they do, they are responsible for the numbers in case something is wrong.
So now there is this circus of: Arch technologist making the base drawing with doors. GC receives documents, counts doors for verification, and sends them to the sub. Subcontractor looks at these drawings, counts them again, and sends data to the supplier. Guess what, the supplier also looks, counts, confirms, and back we go.
Though I think robotics will change all of that. And when we have some sort of bot assistance, big tech players will have more leverage in this, which will lead to proper change management architecture.
Anyway, cool product. Anything to help with estimation. Really hope it gets traction.
They even gave me a big desk at Trondheim/Tyholt so I could help them with the software during my studies.
- Counting all the doors: https://www.getanchorgrid.com/developer/docs/endpoints/drawi...
- Extracting schedules in architectural drawings: https://www.getanchorgrid.com/developer/docs/endpoints/drawi...
and use Claude or any other AI tool to wire up the UI
We're releasing toilets (division 10) later this week, then floors and pipes next.
The challenge we kept running into is that construction drawings in the wild aren’t always that clean. Unresolved xrefs, exploded dynamic blocks, version incompatibilities, SHX font substitutions — by the time a PDF hits a GC’s desk it’s often the only reliable artifact left. The CAD source may not even be available.
That’s why we see vision as the more pragmatic path: not because it’s more precise than structured CAD parsing, but because PDFs are the actual lingua franca of construction. Every firm, every trade, every discipline hands off PDFs. So we made a bet on meeting the document where it actually lives.
On consistency and reproducibility — that’s a real challenge with vision models. Our approach is to keep detection scope narrow and validate confidence scores on every output rather than trying to generalize broadly. Happy to go deeper on that if useful.
There already is a format that is plain text and preserves the semantics: IFC. That's what it was made for.
We're thinking of adding a parameter that exposes the ROC curve, so that you can pick your own optimal threshold depending on what trade-off between false positive rate and true positive rate is acceptable to you.
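For readers unfamiliar with picking a threshold from a ROC curve, here is a minimal sketch. It is not AnchorGrid's API: `roc_points`, `pick_threshold`, and the sample scores are hypothetical, and it assumes each candidate detection has already been labeled as a real hit (1) or a false alarm (0) by a reviewer.

```python
# Hypothetical sketch: choose a confidence threshold from labeled detections.

def roc_points(scores, labels):
    """Return (threshold, fpr, tpr) for each distinct score, highest first."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    points = []
    for t in sorted(set(scores), reverse=True):
        # A detection is accepted when its score is at or above the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points

def pick_threshold(scores, labels, max_fpr):
    """Best-TPR point whose FPR stays at or below max_fpr, or None."""
    ok = [p for p in roc_points(scores, labels) if p[1] <= max_fpr]
    if not ok:
        return None
    # Prefer higher TPR; break ties with the stricter (higher) threshold.
    return max(ok, key=lambda p: (p[2], p[0]))

scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40]  # made-up model confidences
labels = [1, 1, 0, 1, 0, 0]                    # made-up reviewer labels

print(pick_threshold(scores, labels, max_fpr=0.0)[0])   # 0.9
print(pick_threshold(scores, labels, max_fpr=0.34)[0])  # 0.7
```

A takeoff where a missed door is expensive might accept the looser 0.7 threshold and review the extra false alarms by hand, while an automated pipeline might insist on zero false positives and accept lower recall.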
I hope you succeed because it would be great to have a standard API for this data, but I would advise on one of two directions: become the standard by being close to 100% accurate at finding symbols (one symbol doesn't seem to cut it in our testing) or make a great, comprehensive workflow for a small subset of the market and become standard that way.
In both cases, you cannot do a broad 'market test', you need to spend many hours with a specific sub-set of users in construction.
Disclaimer: I'm a co-founder of Provision.
The generalization problem you're pointing at is real and it's the hardest part of this. Our approach is to keep the detection scope tight — rather than trying to generalize across every firm's conventions, we train on a small but high-quality set of fixtures and optimize for precision within that scope.
The result is high confidence outputs on the elements we support, rather than mediocre coverage across everything.
We're expanding the detection surface incrementally as we validate accuracy division by division!
Tailscale’s article about NAT traversal is an example of how to write “how we did it”: https://tailscale.com/blog/how-nat-traversal-works
The world in which metadata is a common thing attached to any file doesn't exist, and probably never will, no matter how much you try to improve CAD work flow.
I know you're just repeating a phrase from a TV show but do you know how incredibly condescending this comes across to most people?
I have to make a BOM and oh boy I hate my job
A lot of them are "archival" so I'm pretty OOL
It is telling that so many of the comments here assume the person with a thing that is not the most practical would be easily able to request thing in a different format. The assumption that the person with the inconvenient thing would never have thought to ask if more convenient thing was available and just willfully toiling with the inconvenient thing is kind of insulting.
Also do doors, windows, and mechanical equipment.
dm, and I can include you in the next preview.
Let me know if you find it useful or have any questions, happy to help.
Love to give it to an arc client, not sure who the right person to implement this would be? Hmm…
https://cal.com/anchorgrid/anchorgrid-external-meeting?durat...
So you would want these documents translated, let's say, to German, Mandarin, etc.?
There's nothing about PDFs or image formats that prevents anyone from doing OCR. Construction documents are difficult to OCR because OCR models are not well trained for them, and because they're very technical documents where small details are significant. It doesn't have anything to do with the file format.
For example, add this to the content stream of a PDF page and it'll put "Hello World" on the page:
BT
/myfont 50 Tf
100 200 Td
(Hello World) Tj
ET
(Note: a bit more is required to select the font etc)
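To show roughly what that "bit more" looks like, here is a minimal sketch that wraps the content stream above in a complete one-page PDF. The object layout, the US Letter page size, and the choice of Helvetica (one of the standard 14 fonts, so it needs no embedding) are my assumptions, and `build_pdf` is a hypothetical helper, not part of any library.

```python
# Minimal sketch: wrap a content stream in a one-page PDF using only the stdlib.

CONTENT = b"BT\n/myfont 50 Tf\n100 200 Td\n(Hello World) Tj\nET"

def build_pdf(content: bytes) -> bytes:
    """Assemble a one-page PDF around the given content stream."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        # The page maps the name /myfont to the font object (5 0 R).
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /myfont 5 0 R >> >> /Contents 4 0 R >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, for the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"       # entry for the reserved object 0
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each xref entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)

pdf = build_pdf(CONTENT)

if __name__ == "__main__":
    with open("hello.pdf", "wb") as f:
        f.write(pdf)
```

Because the text is stored as a string operand of `Tj`, it can be read back from a file like this without any OCR, which is the point being made about the format.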