One thing I'm curious about is a hybrid approach: LLMs working in conjunction with vision models (and probes that can query and manipulate the DOM) to generate Playwright code that wraps browser access to the site in a local, programmable API. Agents would then use that API to access the site instead of going through the vision agents for everything.
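A minimal sketch of what such a generated wrapper might look like, in Python against Playwright's sync API. Everything site-specific is hypothetical: `WikiClient`, its methods, the `https://wiki.example.org` URL, and the `#searchInput`/`#firstHeading` selectors stand in for whatever the LLM and DOM probes would actually discover. The wrapper depends only on a small `PageLike` protocol (`goto`, `fill`, `press`, `text_content`), which a real `playwright.sync_api.Page` provides, so it can be exercised against the recording `FakePage` here without launching a browser.

```python
from typing import Optional, Protocol


class PageLike(Protocol):
    """The slice of Playwright's sync `Page` API this sketch uses.

    A real `playwright.sync_api.Page` has these methods, so the wrapper
    can be driven by a browser in production and by a fake in tests.
    """

    def goto(self, url: str) -> object: ...
    def fill(self, selector: str, value: str) -> None: ...
    def press(self, selector: str, key: str) -> None: ...
    def text_content(self, selector: str) -> Optional[str]: ...


class WikiClient:
    """Hypothetical LLM-generated wrapper exposing a site as plain methods.

    Agents call these methods instead of driving the browser (or a vision
    model) click by click. The selectors below are assumptions standing in
    for whatever the DOM probes discovered.
    """

    SEARCH_BOX = "#searchInput"  # assumed selector
    TITLE = "#firstHeading"      # assumed selector

    def __init__(self, page: PageLike, base_url: str) -> None:
        self._page = page
        self._base_url = base_url.rstrip("/")

    def _title(self) -> str:
        # Playwright types text_content as Optional[str]; treat None as empty.
        return (self._page.text_content(self.TITLE) or "").strip()

    def article_title(self, slug: str) -> str:
        """Open an article by slug and return its visible heading."""
        self._page.goto(f"{self._base_url}/wiki/{slug}")
        return self._title()

    def search(self, query: str) -> str:
        """Type a query into the search box and return the resulting heading.

        A production version would also wait for the results page (for
        example with `page.wait_for_url`) before reading the heading.
        """
        self._page.goto(self._base_url)
        self._page.fill(self.SEARCH_BOX, query)
        self._page.press(self.SEARCH_BOX, "Enter")
        return self._title()


class FakePage:
    """Records calls so the wrapper can be checked without a browser."""

    def __init__(self, heading: str) -> None:
        self.heading = heading
        self.calls: list = []

    def goto(self, url: str) -> None:
        self.calls.append(("goto", url))

    def fill(self, selector: str, value: str) -> None:
        self.calls.append(("fill", selector, value))

    def press(self, selector: str, key: str) -> None:
        self.calls.append(("press", selector, key))

    def text_content(self, selector: str) -> Optional[str]:
        self.calls.append(("text_content", selector))
        return self.heading


fake = FakePage("  Playwright (software)  ")
client = WikiClient(fake, "https://wiki.example.org/")
```

With a real browser you would pass a page from `sync_playwright()` (e.g. `p.chromium.launch().new_page()`) instead of `FakePage`. The design point is that the expensive vision/LLM work happens once, when the wrapper is generated, and agents afterwards only see a few typed methods.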
https://playwright.dev/docs/getting-started-mcp#accessibilit...
I've mentioned several times, and gotten snarky remarks for it, that restructuring your code so it fits in your head, and in the LLM's context, helps the LLM code better. People complain about rewriting code just for an LLM, not realizing the suggestion is to follow better coding principles so the LLM codes better, with the net effect that humans code better too! Well, looks like the same applies here: if you support accessibility correctly in your web apps, Playwright MCP will work correctly for you.
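To make the accessibility point concrete, here is a deliberately simplified check (Python standard library only) for one common failure: icon-only `<button>`s with no accessible name, which would show up as unlabeled buttons to an agent working from the accessibility tree. The `UnnamedButtonFinder`/`unnamed_buttons` names and the sample markup are made up for illustration, and the naming rule is an approximation: the real W3C accessible-name computation also considers `aria-labelledby`, `title`, and more, so treat this as a sketch, not a linter.

```python
from html.parser import HTMLParser
from typing import List, Optional


class UnnamedButtonFinder(HTMLParser):
    """Flags <button> elements that would have no accessible name.

    Simplified rule: a button counts as named if it has a non-empty
    aria-label, non-whitespace text content, or an <img> child with a
    non-empty alt. This is an approximation of the accname algorithm.
    """

    def __init__(self) -> None:
        super().__init__()
        self.unnamed: List[int] = []      # source line of each unnamed button
        self._open: Optional[int] = None  # line of the button being read
        self._named = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # valueless attributes map to None
        if tag == "button":
            self._open = self.getpos()[0]
            self._named = bool((a.get("aria-label") or "").strip())
        elif tag == "img" and self._open is not None:
            if (a.get("alt") or "").strip():
                self._named = True

    def handle_data(self, data):
        # Visible text inside the button gives it a name.
        if self._open is not None and data.strip():
            self._named = True

    def handle_endtag(self, tag):
        if tag == "button" and self._open is not None:
            if not self._named:
                self.unnamed.append(self._open)
            self._open = None


def unnamed_buttons(html: str) -> List[int]:
    """Return the 1-based line numbers of buttons with no accessible name."""
    finder = UnnamedButtonFinder()
    finder.feed(html)
    finder.close()
    return finder.unnamed


SAMPLE = """
<button aria-label="Delete row"><svg></svg></button>
<button><svg></svg></button>
<button>Save</button>
<button><img src="search.png" alt="Search"></button>
"""
```

Here `unnamed_buttons(SAMPLE)` flags only the second button (line 3 of the string, since the string starts with a newline). Adding an `aria-label` or visible text fixes it for agents and screen-reader users alike.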
Amazing.
Harder to scale if it's doing a lot of them, I suppose.
Most wikis you can mirror locally if you really need to hammer them.
And now the fact that interfaces need to be accessible to agents, not just humans, ironically increases accessibility for humans in return.