No.

Computer use (to Anthropic, as in the article) is an LLM controlling a computer via a video feed of the display, and controlling it with the mouse and keyboard.

by dbbk 2 hours ago

That sounds weird. Why does it need a video feed? The computer can already generate an accessibility tree, same as Playwright uses it for webpages.
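To make the contrast with pixel-based control concrete, here is a minimal Python sketch of locating a control by role and accessible name in a tree. The tree contents, its `role`/`name`/`children` fields, and the `walk`/`find_node` helpers are hypothetical stand-ins for illustration, not Playwright's or any operating system's actual API.

```python
from typing import Iterator, Optional

# Hypothetical accessibility tree for a small dialog. Real trees come from
# the OS or browser and carry many more properties per node.
TREE = {
    "role": "dialog",
    "name": "Export video",
    "children": [
        {"role": "textbox", "name": "File name", "children": []},
        {"role": "button", "name": "Cancel", "children": []},
        {"role": "button", "name": "Export", "children": []},
    ],
}


def walk(node: dict) -> Iterator[dict]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def find_node(tree: dict, role: str, name: str) -> Optional[dict]:
    """Return the first node matching role and accessible name, if any."""
    for node in walk(tree):
        if node["role"] == role and node["name"] == name:
            return node
    return None


target = find_node(TREE, "button", "Export")
print(target["name"] if target else "not found")
```

An agent working this way addresses the control by what it is rather than by where it is drawn, which only works when the application exposes such a tree in the first place.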

by 0sdi 2 hours ago

So that it can use GUIs and other interfaces designed for humans. Think of a video editing program, for example.

by dbbk 1 hour ago

Yes. GUIs expose an accessibility tree.

by lsaferite 1 hour ago

I feel like a legion of blind computer users could attest to how bad accessibility is online. If you added AI agents to the users of accessibility features, you might even see a purposeful regression in the space.

by chasd00 2 hours ago

> controlling a computer via a video feed of the display, and controlling it with the mouse and keyboard.

I guess that's one way to get around robots.txt. Claim that you would respect it but since the bot is not technically a crawler it doesn't apply. It's also an easier sell to not identify the bot in the user agent string because, hey, it's not a script, it's using the computer like a human would!

by cowboylowrez 2 hours ago

oh hell no haha maybe with THEIR login hahaha

by michaelt 3 hours ago

> Almost every organization has software it can’t easily automate: specialized systems and tools built before modern interfaces like APIs existed. [...]

> hundreds of tasks across real software (Chrome, LibreOffice, VS Code, and more) running on a simulated computer. There are no special APIs or purpose-built connectors; the model sees the computer and interacts with it in much the same way a person would: clicking a (virtual) mouse and typing on a (virtual) keyboard.

https://www.anthropic.com/news/claude-sonnet-4-6

by jpalepu 3 hours ago

Interesting question! In this context, "computer use" means the model is manipulating a full graphical interface, using a virtual mouse and keyboard to interact with applications (like Chrome or LibreOffice), rather than simply operating in a shell environment.

by mentalgear 2 hours ago

Indeed, "GUI use" would have been the better name.

by zmmmmm 3 hours ago

No, their definition of "computer use" now means:

> where the model interacts with the GUI (graphical user interface) directly.

by lukev 2 hours ago

This is being downvoted but it shouldn't be.

If the ultimate goal is having an LLM control a computer, round-tripping through a UX designed for bipedal bags of meat with weird jelly-filled optical sensors is wildly inefficient.

Just stay in the computer! You're already there! Vision-driven computer use is a dead end.

by zmmmmm 1 hour ago

You could say that about natural language as well, but it seems that having computers learn to interface with natural language at scale is easier than teaching humans to interface using computer languages at scale. Even most qualified people who work as software programmers produce such buggy piles of garbage that we need entire software methodologies and testing frameworks to deal with how bad it is. It won't surprise me if visual computer use follows a similar pattern: we are so bad at describing what we want the computer to do that it's easier if it just looks at the screen and figures it out.

by ashirviskas 1 hour ago

Someone ping me in 5 years; I want to see if this aged like milk or wine.

by JSR_FDED 14 minutes ago

“Computer, respond to this guy in 5 years”

by chasd00 2 hours ago

I replied as much to a sibling comment, but I think this is a way to wiggle out of robots.txt, identifying user agent strings, and other traditional ways for sites to filter for bots.

by lukev 1 hour ago

Right, but those things exist to prevent bots, which this is.

So at this point we're talking about participating in the (very old) arms race between scrapers & content providers.

If enough people want agents, then services should (or will) provide agent-compatible APIs. The video round-trip remains stupid from a whole-system perspective.

by mvdtnz 1 hour ago

I mean, if they want to "wriggle out" of robots.txt, they can just ignore it. It's entirely voluntary.
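As a rough illustration of why it is voluntary, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleBot` and `SomeBrowser` agent names, and the example.com URLs are made up for the example. The point is that the check runs entirely in the client: a polite bot calls something like `is_allowed` before fetching, and nothing on the server side forces it to.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is kept out of /private/,
# every other user agent is allowed everywhere.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Report whether the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# The answer depends only on the user agent string the client chooses to send.
print(is_allowed("ExampleBot", "https://example.com/private/data"))
print(is_allowed("SomeBrowser", "https://example.com/private/data"))
```

A client that presents itself under a different user agent string, or never runs the check at all, is not stopped by the file itself.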