by merlindru 18 hours ago

by ctoth 17 hours ago

If agents are what it finally takes to get good a11y, I'll take it. I'll bitch about it, but I'll take it.

by tomjakubowski 16 hours ago

Playwright, the end-to-end testing framework for the web, provides a strong incentive to give sites good a11y: Playwright tests are an absolute delight to read, write, and maintain on properly accessible sites when using the accessibility locators. Somewhat less so when using a soup of CSS selectors and getByText()-style locators.

One thing I am curious about is a hybrid approach where LLMs work in conjunction with vision models (and probes which can query/manipulate the DOM) to generate Playwright code which wraps browser access to the site in a local, programmable API. Then you'd have agents use that API to access the site rather than going through the vision agents for everything.
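
To make the locator point concrete, here is a minimal stdlib-only sketch of why role-plus-name lookups survive markup changes that break CSS selectors. It is a toy, not Playwright's implementation: `RoleIndex`, `get_by_role`, and the tiny implicit-role table are hypothetical names invented for illustration, and the accessible-name rule (aria-label, else text content) is a simplification. In real Playwright for Python the equivalent call is `page.get_by_role("button", name="Save")`.

```python
from html.parser import HTMLParser

# Tiny, incomplete subset of implicit ARIA roles (assumption for the sketch).
IMPLICIT_ROLES = {"button": "button", "nav": "navigation", "main": "main"}
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class RoleIndex(HTMLParser):
    """Collects (role, accessible name) pairs from an HTML string."""

    def __init__(self):
        super().__init__()
        self.found = []   # (role, name) for every closed element with a role
        self._open = []   # stack of [tag, role, aria_label, text_parts]

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        role = attr.get("role") or IMPLICIT_ROLES.get(tag)
        entry = [tag, role, attr.get("aria-label"), []]
        if tag in VOID_TAGS:
            self._record(entry)       # void elements never get an end tag
        else:
            self._open.append(entry)

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for entry in self._open:
            entry[3].append(data)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag; ignore stray closers.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                for entry in reversed(self._open[i:]):
                    self._record(entry)
                del self._open[i:]
                break

    def _record(self, entry):
        _, role, label, parts = entry
        if role:
            # Simplified name rule: aria-label wins, else collapsed text content.
            name = label or " ".join("".join(parts).split())
            self.found.append((role, name))


def get_by_role(html, role, name):
    """Return every (role, name) match; elements left unclosed are skipped."""
    index = RoleIndex()
    index.feed(html)
    index.close()
    return [(r, n) for r, n in index.found if r == role and n == name]
```

The same `get_by_role(html, "button", "Save")` call matches both `<button class="x1">Save</button>` and a restyled `<span role="button" aria-label="Save">`, whereas a selector such as `.x1` would only match the first.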

by giancarlostoro 12 hours ago

This is precisely how the Playwright MCP works, which lets something like Claude directly test a website.

https://playwright.dev/docs/getting-started-mcp#accessibilit...

I've suggested several times that rewriting your code so it fits in your head, and in the LLM's context, helps the LLM code better, and I've gotten snarky remarks from people complaining about rewriting code just for an LLM. They don't realize the suggestion is to follow better coding principles, which has the net benefit of letting humans code better too! Well, it looks like if you support accessibility in your web apps correctly, Playwright MCP will work correctly for you.

Amazing.

by tyingq 13 hours ago

Was looking for this comment. I'd like to see this approach in the comparison: having the LLM build a Playwright script and use it. I suspect it would beat the API on time-to-market, and be close-ish in elapsed time per transaction.

Harder to scale if it's doing a lot of them, I suppose.

by lsaferite 13 hours ago

Using playwright-cli with Claude Code is highly effective for debugging locally deployed web apps with essentially zero setup.

by pjc50 15 hours ago

Very real risk of this going in reverse: people building inaccessible websites to prevent AI use.

by sciencejerk 8 hours ago

Or human engineers limiting AI-consumable documentation to improve job security!

by solenoid0937 14 hours ago

Those people probably aren't working on anything useful anyway, so it's no big deal.

by 20k 13 hours ago

I've found that by far the most useful websites as a programmer are also the ones most resistant to AI. Making them inaccessible would be a huge loss for anyone vision impaired.

by claytonjy 13 hours ago

What sorts of sites are you thinking of? To me, “most useful to a programmer” evokes docs and blogs and GitHub issues and forum posts. I suppose some forums might be AI-resistant (login wall), but the others are trivially AI accessible.

by Rebelgecko 11 hours ago

Plenty of Linux-y websites use Anubis. Arch Wiki and IIRC some other distros too.

by fc417fc802 10 hours ago

That's less a value judgment, more a necessary evil due to the plethora of bad actors out there. I doubt it will get in the way of a local model used in a reasonable manner.

Most wikis you can mirror locally if you really need to hammer them.

by irishcoffee 13 hours ago

GitHub is naturally LLM resistant via its new uptime feature… I’ll show myself out.

by stingraycharles 13 hours ago

Examples, please.

by stingraycharles 13 hours ago

That’s such an extremely small niche of people it’s not a real risk.

by blurbleblurble 15 hours ago

"AI" is a made up hype thing. It's just computers and computer programs. For real!

by merlindru 17 hours ago

i think this goes both ways too :) agents have been a boon for everyone with disabilities, carpal tunnel, RSI, ADHD, anything

and now the fact that interfaces need to be accessible to agents, not just humans, ironically increases accessibility for humans in return

by lopis 15 hours ago

And let's not forget that not all disabilities are chronic. Many disabilities are situational or temporary. AI is a great assist for a hangover day, for example...

by linkjuice4all 15 hours ago

I mean…I guess. But this is ridiculous - how many layers does our technology need to bash through to update two records on remote systems? I get that value is being added at some point - but just charge some micropayment for transactions. This is just too much.

by lazide 14 hours ago

Ever read Vernor Vinge’s A Deepness in the Sky? Digital archaeologist, coming right up.

by btown 12 hours ago

If you're on macOS and interested in this space, I highly recommend you open up the system-provided Accessibility Inspector.app and play around with apps and browsers. See how the green cells might guide an LLM to only need to read/OCR specific parts of a screen, how much text is already natively available to the accessibility engine, and how this could lead to really effective hybrid systems - not just MCPs, but code generators that can build and run their own scripts to crawl your accessibility hierarchy for your workflow!

I think this is very fertile ground - big labs need to use approaches that can work on multiple platforms and arbitrary workflows, and full-page vision is the lowest common denominator. Platform-specific approaches are a really exciting open space!
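
As a rough illustration of the hybrid idea above, the sketch below walks a toy accessibility tree and separates text the accessibility layer already exposes from leaf elements that would still need vision/OCR. The `Node` type, the role strings, and `split_for_ocr` are all hypothetical; a real crawler would read this hierarchy from the platform accessibility API rather than from hand-built objects.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    role: str                       # hypothetical role label, e.g. "button"
    text: str = ""                  # text already exposed by the accessibility layer
    children: list = field(default_factory=list)


def split_for_ocr(root):
    """Depth-first walk returning (known_texts, roles_needing_vision).

    A node with text contributes that text directly. A leaf without text
    is the only kind of element that would still have to be read visually.
    """
    known, needs_vision = [], []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.text:
            known.append(node.text)
        elif not node.children:
            needs_vision.append(node.role)
        # Reverse so children are visited in their original order.
        stack.extend(reversed(node.children))
    return known, needs_vision
```

With a window containing a labelled button, an unlabelled image, and a group holding a text element, only the image ends up on the vision list; everything else is read straight from the tree.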

by jasomill 6 hours ago

Windows has similar APIs and tools, see, e.g.,

https://accessibilityinsights.io/

https://learn.microsoft.com/en-us/windows/win32/winauto/insp...

https://github.com/FlaUI/FlaUInspect

and for WPF applications specifically,

https://github.com/snoopwpf/snoopwpf

by merlindru 12 hours ago

That's how I got into this thing in the first place, hah. Golden advice. It's incredibly cool to see what some apps offer. More of them have great accessibility support than you think (or at least than I thought!)

by willwade 6 hours ago

Take a peek at https://github.com/willwade/app-automate?tab=readme-ov-file#... It's early and needs some work, but this is the idea behind it. (My use case is not agents but actual disabled people who need tooling to provide better access to the desktop.)

by drob518 12 hours ago

Great idea.

by gbriel 18 hours ago

This is a good solution: instead of everyone blowing tokens on repeating the same computer-use task, come up with a way to share the workflows. I think you'd need to make sure that no shared workflow extracts user information (passwords).

by merlindru 18 hours ago

this is protected against at the OS level, provided the applications declare the input correctly as a SecureTextField.

i so far haven't found any application that doesn't.

all you're able to get out, as far as i can tell, is the length of the entered password.
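
Independently of the OS-level protection described above, a workflow-sharing tool could also scrub recorded steps before publishing them. The sketch below is only an illustration under stated assumptions: the step dictionaries, their keys, and `scrub_workflow` are hypothetical and are not invoke's format, and `"AXSecureTextField"` is assumed here to be the subrole string macOS reports for password fields.

```python
import copy

# Assumed subrole string for macOS secure (password) text fields.
SECURE_SUBROLE = "AXSecureTextField"


def scrub_workflow(steps):
    """Return a deep copy of the steps with secure-field values removed.

    The original list is left untouched, so the local recording can still
    be replayed while only the scrubbed copy is shared.
    """
    cleaned = copy.deepcopy(steps)
    for step in cleaned:
        if step.get("subrole") == SECURE_SUBROLE and "value" in step:
            step["value"] = None      # drop whatever was typed
            step["redacted"] = True   # mark it so a replay can prompt the user
    return cleaned
```

A replay tool would then need to ask the user for the redacted value at run time instead of reading it from the shared script.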

by jasomill 6 hours ago

From applications that capture the screen or use accessibility APIs, perhaps, but what about Windows applications that capture window messages, e.g.,

https://devblogs.microsoft.com/cppblog/spy-internals/

Obviously, if you can inject code into a process that receives sensitive data, you're already running in a context where all security bets are off.

But with processes you yourself create, you probably can, even without elevated privileges, unless the application takes measures to prevent injection (akin to game anticheat mechanisms), so it seems worth pointing out that there are simple mechanisms to subvert such "protected" fields that don't require application-specific reverse engineering.

by willwade 6 hours ago

Interesting! I started something, nowhere near as complete as that and quite different, but again using accessibility UI elements. The BIG problem I've found is that SOOOO much stuff does really poorly at exposing these elements. Here was my approach: https://github.com/willwade/app-automate?tab=readme-ov-file#... What I do here is build UI templates, either using UIAccess or using a single pass with a vision model.

Now, the argument against this, from [reddit](https://www.reddit.com/r/openclaw/comments/1s1dzxq/comment/o...):

"my experience is the opposite actually. UIA looks uniform on paper but WPF, WinForms, and Win32 all expose different control patterns and you end up writing per-toolkit handlers anyway. Qt only exposes anything if QAccessible was compiled in and the accessibility plugin is loaded at runtime, which on shipped binaries is basically never. Electron is just as opaque on Windows as on macOS because it's the same chromium underneath drawing into a canvas. the real split isn't OS vs OS, it's native toolkit vs everything else."

by teej 18 hours ago

You should call it Braille

by merlindru 18 hours ago

shit, why didn't i think of that

i tend to think of invoke as "an API over macOS apps" tho...

doesn't `invoke finder shareAndCopyLink` read very nicely? :P

by hellojimbo 16 hours ago

Isn't that basically what Browserbase does? I've found the hardest part of browser use to be stealth first, then client change management, then browser comprehension (which gets better with every new model).

by merlindru 16 hours ago

i'm not too familiar with browserbase, but invoke works with any macOS app (or at least the accessible ones), i think browserbase is only for browser usage.

in the context of this blog post, the conclusion looks similar though!

"use the whole web like it's an API"

works much better than

"figure out similar or identical tasks from a clean slate every single time you do them"

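The contrast drawn above can be sketched as a small cache: explore a task once (the expensive, agent-driven path), store the resulting steps, and replay them on every later request. `WorkflowCache`, `steps_for`, and the `explore` callback are hypothetical names for illustration only and do not describe how invoke or Browserbase actually store workflows.

```python
class WorkflowCache:
    """Stores the steps discovered for a task so they are explored only once."""

    def __init__(self, explore):
        self._explore = explore   # expensive fallback: task name -> list of steps
        self._scripts = {}        # task name -> stored steps
        self.explorations = 0     # how many times the fallback actually ran

    def steps_for(self, task):
        if task not in self._scripts:
            # Clean-slate path: only taken the first time a task is seen.
            self.explorations += 1
            self._scripts[task] = self._explore(task)
        return self._scripts[task]
```

Calling `steps_for` twice with the same task runs the exploration callback once; the second call is a plain dictionary lookup, which is where the token savings would come from.
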
by izend 15 hours ago

Does https://github.com/webmachinelearning/webmcp overlap?

by merlindru 15 hours ago

Not really IMO, webmcp has devs change their apps. invoke just works with existing apps, especially ones that are accessible

invoke has more overlap with Claude's and Codex's computer-use, except the steps are stored/scripted.

webmcp is bottom-up. computer-use & invoke are top-down