This presumably is going to be cheap and effective. Its much easier to wrap a prompt round this and know it works that mess around with crawling it all yourself.
You'll still be hand-rolling it if you want to disrespect crawling requirements though.
I’ve actually written a crawler like that before, and still ended up going with Firecrawl for a more recent project. There’s just so many headaches at scale: OOMs from heavy pages, proxies for sites that block cloud IPs, handling nested iframes, etc.