I don't think Claude has this part yet:

> With background computer use, Codex can now use all of the apps on your computer by seeing, clicking, and typing with its own cursor. Multiple agents can work on your Mac in parallel, without interfering with your own work in other apps.

reply
>background computer use

How does that even work technically? macOS doesn't support multiple cursors. On native Cocoa apps you can pass input to a window without raising via command+click so possibly they synthesized those events, but fewer and fewer apps support that these days. And AppleScript is basically dead, so they can't be using that either.

I also read they acquired the Sky team (who I think were former Apple employees). No wonder they were able to pull off something so slick.

reply
I remember trying to build something like this 6 years ago[0]. There are some interesting APIs for injecting click/keystroke events directly into Cocoa, and other APIs for reading framebuffers for apps that aren't in the foreground.

In particular there was some prior art that I found for doing it from the OpenQwaQ project, which was a GPLv2 3D virtual world project in Squeak/Smalltalk started by Alan Kay[1] back in 2011.

If I recall correctly, it worked well for native apps, but didn't work well for Chromium/Electron apps because they would use an API for grabbing the global mouse position rather than reading coordinates from events.
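
To make the failure mode concrete, here is a toy model (not real Cocoa or Chromium code; every name in it is made up for illustration). A synthetic click carries its own coordinates, but a handler that asks for the global pointer position instead will see wherever the user's real cursor happens to be:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ClickEvent:
    """A synthetic click that carries its own target coordinates."""
    x: int
    y: int


# Hypothetical stand-in for the user's real pointer position.
GLOBAL_CURSOR = (900, 40)


def hit_from_event(event: ClickEvent) -> Tuple[int, int]:
    """Handler that trusts the coordinates stored in the event."""
    return (event.x, event.y)


def hit_from_global_cursor(event: ClickEvent) -> Tuple[int, int]:
    """Handler that ignores the event and reads the global pointer instead."""
    return GLOBAL_CURSOR


synthetic = ClickEvent(x=120, y=300)  # injected click aimed at a button
print(hit_from_event(synthetic))          # lands on the intended target
print(hit_from_global_cursor(synthetic))  # lands under the real cursor
```

The first handler behaves the way I remember native apps behaving; the second is the pattern that would break injected clicks.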

[0]: https://github.com/antimatter15/microtask/blob/master/cocoa/...
[1]: https://github.com/OpenFora/openqwaq/blob/189d6b0da1fb136118...

reply
Probably accessibility APIs
reply
Which specific ones though allow you to send input to a window without raising it? People have been trying to do "focus follows mouse [without auto raise]" for a long time on mac, and the synthetic event equivalent to command+click is the only discovered method I'm aware of, e.g. used in https://github.com/sbmpost/AutoRaise

There is also this old blog post by Yegge [1] which mentions `AXUIElementPostKeyboardEvent` but there were plenty of bugs with that, and I haven't seen anyone else build on it. I guess the modern equivalent is `CGEventPostToPSN`/`CGEventPostToPid`. I guess it's a good candidate though, perhaps the Sky team they acquired knows the right private APIs to use to get this working.
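
For anyone who wants to experiment, here is a minimal sketch of the `CGEventPostToPid` route, called from Python through `ctypes`. This is only a guess at one possible approach, not a claim about how Codex does it. The `KEYCODES` table and the helper names are my own, posting synthetic events usually requires the calling process to have Accessibility permission, and whether an unfocused app actually honors the event depends on the app:

```python
import ctypes
import ctypes.util
import sys

# A few macOS virtual key codes, just enough for illustration.
KEYCODES = {"a": 0x00, "s": 0x01, "d": 0x02, "return": 0x24, "space": 0x31}


def key_sequence(key):
    """Return the (keycode, is_down) pairs for one press and release."""
    code = KEYCODES[key]
    return [(code, True), (code, False)]


def post_key_to_pid(pid, key):
    """Post a key press/release to one process without raising its window.

    Returns False on non-macOS systems or if an event cannot be created.
    """
    if sys.platform != "darwin":
        return False
    cg = ctypes.CDLL(ctypes.util.find_library("ApplicationServices"))
    cf = ctypes.CDLL(ctypes.util.find_library("CoreFoundation"))
    # Declare C signatures so pointers are not truncated to 32-bit ints.
    cg.CGEventCreateKeyboardEvent.restype = ctypes.c_void_p
    cg.CGEventCreateKeyboardEvent.argtypes = [
        ctypes.c_void_p, ctypes.c_uint16, ctypes.c_bool]
    cg.CGEventPostToPid.restype = None
    cg.CGEventPostToPid.argtypes = [ctypes.c_int32, ctypes.c_void_p]
    cf.CFRelease.restype = None
    cf.CFRelease.argtypes = [ctypes.c_void_p]
    for code, is_down in key_sequence(key):
        event = cg.CGEventCreateKeyboardEvent(None, code, is_down)
        if not event:
            return False
        cg.CGEventPostToPid(pid, event)  # deliver to the target process only
        cf.CFRelease(event)
    return True
```

Calling `post_key_to_pid(some_pid, "return")` with the pid of a background app is the whole experiment; `some_pid` is a placeholder you would look up yourself.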

Edit: The thread at [2] also has some interesting tidbits, such as Automator.app having "Watch Me Do" which can also do this, and a CLI tool that claims to use the CGEventPostToPid API [3]. Maybe there's more ways to do it than I realized.

[1] https://steve-yegge.blogspot.com/2008/04/settling-osx-focus-...
[2] https://www.macscripter.net/t/keystroke-to-background-app-as...
[3] https://github.com/socsieng/sendkeys

reply
Maybe they used Claude to come up with a good method to do this. /s

But I was also wondering how this even works. The AI agent can have its own cursor, and none of its actions interrupt my own workflow at all? Maybe I need to try this.

Also, this sounds like it would be very expensive since from my understanding each app frame needs to be analysed as an image first, which is pretty token intensive.

reply
They acquired Vercept, and their older agent Vy did have a background agent. IIRC the recent computer-use agent in Claude is based on Vy, so I'm kinda surprised that feature didn't carry over to the Claude desktop app.
reply
Imagine where we’d be if the restrictive iOS model was dominant in all computing. We’d never get anything like this
reply
It mostly feels like they’re just converging on each other. The latest Claude Mac app release pushed a new UI that looks almost exactly like Codex’s.
reply
IMHO no one is really pioneering. A lot more is possible than what is being done. I wrote a blog post about useful agents in a business setting (https://www.generativestorytelling.ai/blog/posts/useful-corp...) that highlights AI being proactive.

I mean table-stakes stuff: why isn't an agent going through all my Slack channels and giving me a morning summary of what I should be paying attention to? Why aren't all those meeting transcriptions being joined together into something actually useful? I should be given pre-meeting prep notes about what was discussed last time and who had which to-do items assigned. Basic stuff that is already possible but that no one is doing.

I swear none of the AI companies have any sense of human centric design.

> pull relevant context from Slack, Notion, and your codebase, then provide you with a prioritized list of actions.

This is an improvement, but it isn't the central focus. It should be more than just on a single work item basis, more than on just code.

If we are going to be managing swarms of AI agents going forward, attention becomes our most valuable resource. AI should be laser focused on helping us decide where to be focused.

reply
THANK YOU. I keep thinking this as well. I'm rolling my own skills to actually make my job easier, which is all about gathering, surfacing, and synthesizing information so I can make quick informed decisions. I feel like nobody is thinking this way and it's bizarre.
reply
Disclaimer: I work at Zapier, but we're doing a ton of this. I have an agent that runs every morning and creates prep documents for my calls. Then a separate one runs at the end of every week to give me feedback.
reply
In the full blog post I actually go into more detail about automatically creating a knowledge graph of what is being worked on throughout the whole company. There are some really powerful transformative efforts that can be accomplished right now, but that no one is doing.

Basic things, from detecting common pain points to automatically figuring out who the SME for a topic is. AIs are really good at categorization and tagging; heck, even before modern LLMs this was something ML could do.
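
As a rough illustration of how little machinery the "who is the SME" part needs, here is a toy sketch: tag each message with topics, then count which author shows up most per topic. The keyword table, the sample messages, and the function names are all invented for the example; a real system would presumably swap the keyword match for a classifier or an LLM:

```python
from collections import Counter, defaultdict

# Hypothetical topic keywords, standing in for a real tagger.
TOPIC_KEYWORDS = {
    "billing": {"invoice", "refund", "billing"},
    "deploys": {"deploy", "rollback", "pipeline"},
}


def tag_topics(text):
    """Return the set of topics whose keywords appear in the message."""
    words = set(text.lower().split())
    return {topic for topic, kws in TOPIC_KEYWORDS.items() if words & kws}


def likely_sme(messages):
    """Map each topic to the author who mentions it in the most messages."""
    counts = defaultdict(Counter)
    for author, text in messages:
        for topic in tag_topics(text):
            counts[topic][author] += 1
    return {topic: c.most_common(1)[0][0] for topic, c in counts.items()}


messages = [
    ("ana", "refund failed again on the invoice page"),
    ("ana", "billing export is fixed"),
    ("raj", "deploy is stuck so starting rollback"),
    ("raj", "invoice totals look off"),
]
print(likely_sme(messages))
```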

But instead we have AI-driven code reviews.

Code reviews are rarely the blocker for productivity! As an industry, we need to stop automating the easy stuff and start helping people accomplish the hard stuff!

reply
You should check out https://pieces.app/. I've been using it for months, and I am surprised I have never seen anyone talk about it.

It does exactly what you are asking for, and it can do it completely locally or with a mixture of frontier models.

reply
Agreed. It is ironic that in the AI race, the real differentiation may not come from how smart the model is, but from who builds the best application layer on top of it. And that application layer is built with the same kind of software these models are supposed to commoditize.
reply
This feels like *nix.

Developers built themselves really good OSes for doing developer things. Actually using it to do things was secondary.

Want to run a web server? Awesome choice. Want to write networking code? Great. Set up a reliable DB with automated backups? Easy peasy.

Want a stable desktop environment? Well, after almost 30 years we just about have one. Kind of. It isn't consistent and I need to have a Post-it note on my monitor with the command to restart Plasma shell, but things kind of work.

Current AI tools are so damn focused on building developer experiences that everything else is secondary. I get it: developers know how to fix developer pain points, and it monetizes well.

But holy shit. Other things are possible. Someone please do them. Or hell, give me 20 or 30 million and I'll do it.

But just... the obvious is sitting out there for anyone who has spent 10 minutes not being just a developer.

reply
Claude Cowork is unusably slow on my M1 MacBook Pro. I wonder if Codex is any better; a quick search indicates that it is also an Electron app.
reply
At least when I tried it last, Claude Cowork tried to spin up an entire virtual machine to sandbox itself properly - and not only is that sandboxing slow to start up, it also makes it difficult to actually interact freely across your filesystem. (Perhaps a feature, not a bug.)

Claude Code, on the other hand, has no such issues, if you've done some setup to allow all commands by default (perhaps then setting "ask" for rm, etc.).

reply
Codex is a Rust TUI app, and it's available as open source. It has nothing to do with Electron.
reply
Codex CLI is a TUI app, but Codex App is an actual desktop GUI app. If you actually look at TFA, you'll see that all of the videos are of the desktop app.
reply
> Codex is a Rust TUI app, and it's available as open source. It has nothing to do with Electron.

I just updated Codex and looked inside the macOS app package. It is most definitely still an Electron app.

reply
Codex is both a macOS app and a CLI/TUI app.

Their naming is not very clear. The Codex desktop app is somewhat of a frontend for the Codex CLI.

By the look and feel of it I would guess it is written with Electron.

reply
The Codex desktop app is Electron, as is Claude's.
reply
??? Codex has more features than Claude Cowork (background computer use, etc)
reply
Antigravity off in the corner feeling sad about itself rn.
reply
I love poor forgotten Antigravity. For one, you can use your Gemini account to churn Opus credits until they run out then switch to Gemini 3.1 to finish off.
reply
The first time I tried Anthropic's version, it burned up all its tokens in like 10 minutes and left me stuck in a broken state. So I uninstalled it.
reply
Yeah, it’s probably very similar to my experience: I tried Codex only because I had a ChatGPT subscription and found it to be quite powerful. Then, because I was used to it, I ended up getting the Pro subscription. So I am guessing folks like me have never really used Claude.
reply
Clicking UI elements can also be done in GitHub Copilot for VS Code, and in Cursor.
reply
Didn't the original ChatGPT desktop app have computer use first?
reply
I think you're making assumptions without reading the entire thread and processing the general theme. This isn't about catching up or who's better. It really comes down to two things: first, how far your money goes, and second, which political narrative you subscribe to. Up until they started their beef with the U.S. government I was a subscriber. Between that and how fast my tokens depleted, I switched to Codex. Best decision of my life, and now I never run out of tokens.

It was the perfect storm; I would never have switched otherwise, since the first AI I started with was Claude.

reply
You want to use the model that is potentially giving your data to the government vs the one that’s openly rejecting that partnership?
reply
At this point you gotta pick and choose your morality. Claude is screwing people on credits and tokens; OpenAI is selling the three molecules left of your privacy to the government. Are those three molecules worth fighting for when your budget is really tight or you are unemployed? Everyone has different priorities.
reply
It's not like Claude is pioneering those. All of that was done before any of them by some random startup.
reply
It's not x, It's y.

:^)

reply