There's nothing about PDFs or image formats that prevents anyone from doing OCR. Construction documents are difficult to OCR because OCR models are not well trained on them, and because they're very technical documents where small details are significant. It has nothing to do with the file format.
For example, add this to the content stream of a PDF page and it'll put "Hello World" on the page:
BT
/myfont 50 Tf
100 200 Td
(Hello World) Tj
ET
(Note: a bit more is required around it, e.g. /myfont has to be defined as a font in the page's resources.)
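
To give a rough idea of what the "bit more" looks like, here's a minimal sketch in Python that wraps that content stream in just enough PDF structure to open in a viewer. The font (Helvetica), the page size (US Letter), the object numbering, the function name build_pdf and the output filename hello.pdf are all my own assumptions for illustration, and it is not meant as a complete or robust PDF writer.

```python
def build_pdf() -> bytes:
    """Build a one-page PDF whose content stream draws "Hello World"."""
    # The content stream from the comment above, unchanged.
    content = (
        b"BT\n"
        b"/myfont 50 Tf\n"
        b"100 200 Td\n"
        b"(Hello World) Tj\n"
        b"ET\n"
    )

    # Objects 1..5. Assumptions: Helvetica (one of the standard 14 fonts,
    # so nothing needs embedding) and a US Letter page (612 x 792 points).
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        # The page maps the name /myfont to the font object (4 0 R).
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /myfont 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"endstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, for the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    # Cross-reference table: each entry must be exactly 20 bytes long.
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off

    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


if __name__ == "__main__":
    # hello.pdf is an arbitrary output name chosen for this example.
    with open("hello.pdf", "wb") as f:
        f.write(build_pdf())
```

The text ends up in the file as a plain string, which is why extracting it from a PDF like this doesn't need OCR at all.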