> With background computer use, Codex can now use all of the apps on your computer by seeing, clicking, and typing with its own cursor. Multiple agents can work on your Mac in parallel, without interfering with your own work in other apps.
How does that even work technically? macOS doesn't support multiple cursors. In native Cocoa apps you can pass a click to a background window without raising it via Command-click, so possibly they synthesized those events, but fewer and fewer apps support that these days. And AppleScript is basically dead, so they can't be using that either.
I also read they acquired the Sky team (who I think were former Apple employees). No wonder they were able to pull off something so slick.
In particular, I found some prior art for doing this in the OpenQwaQ project, a GPLv2 3D virtual-world project in Squeak/Smalltalk started by Alan Kay[1] back in 2011.
If I recall correctly, it worked well for native apps, but not for Chromium/Electron apps, because those query the global mouse position through an API rather than reading coordinates from the events they receive.
[0]: https://github.com/antimatter15/microtask/blob/master/cocoa/... [1]: https://github.com/OpenFora/openqwaq/blob/189d6b0da1fb136118...
There is also an old blog post by Yegge [1] which mentions `AXUIElementPostKeyboardEvent`, but that had plenty of bugs, and I haven't seen anyone else build on it. I guess the modern equivalent is `CGEventPostToPSN`/`CGEventPostToPid`, which seems like a good candidate; perhaps the Sky team they acquired knows the right private APIs to get this working.
Edit: The thread at [2] also has some interesting tidbits, such as Automator.app's "Watch Me Do" being able to do this, and a CLI tool that claims to use the CGEventPostToPid API [3]. Maybe there are more ways to do it than I realized.
[1] https://steve-yegge.blogspot.com/2008/04/settling-osx-focus-... [2] https://www.macscripter.net/t/keystroke-to-background-app-as... [3] https://github.com/socsieng/sendkeys
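For what it's worth, here is a minimal sketch of what posting input to a background process might look like with the public `CGEventPostToPid` API. The pid and keycode are placeholders, the calling process needs Accessibility permission, and whether this is anything like what Codex actually does is pure speculation on my part:

```c
// Build with: clang post_key.c -framework ApplicationServices
#include <ApplicationServices/ApplicationServices.h>

// Post a key-down/key-up pair for one virtual keycode directly to a
// process, without changing which app has focus or moving the cursor.
static void send_key_to_pid(pid_t pid, CGKeyCode keycode) {
    CGEventRef down = CGEventCreateKeyboardEvent(NULL, keycode, true);
    CGEventRef up   = CGEventCreateKeyboardEvent(NULL, keycode, false);
    CGEventPostToPid(pid, down);
    CGEventPostToPid(pid, up);
    CFRelease(down);
    CFRelease(up);
}

int main(void) {
    pid_t target = 12345;          // placeholder: pid of the background app
    send_key_to_pid(target, 0x00); // 0x00 is kVK_ANSI_A on ANSI layouts
    return 0;
}
```

Note that apps which read the global cursor position (the Chromium/Electron problem mentioned earlier) would still see the real cursor, which is presumably where the private APIs would have to come in.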
But I was also wondering how this even works. The AI agent has its own cursor, and none of its actions interrupt my own workflow at all? Maybe I need to try this.
Also, this sounds like it would be very expensive since from my understanding each app frame needs to be analysed as an image first, which is pretty token intensive.
/s
I mean table-stakes stuff: why isn't an agent going through all my Slack channels and giving me a morning summary of what I should be paying attention to? Why aren't all those meeting transcriptions being joined together into something actually useful? I should be given pre-meeting prep notes about what was discussed last time and who had which to-do items assigned. Basic stuff that is already possible but that no one is doing.
I swear none of the AI companies have any sense of human centric design.
> pull relevant context from Slack, Notion, and your codebase, then provide you with a prioritized list of actions.
This is an improvement, but it isn't the central focus. It should go beyond a single work item at a time, and beyond just code.
If we are going to be managing swarms of AI agents going forward, attention becomes our most valuable resource. AI should be laser focused on helping us decide where to be focused.
Basic things, from detecting common pain points to automatically figuring out who the SME for a topic is. AIs are really good at categorization and tagging; heck, even before modern LLMs this was something ML could do.
But instead we have AI driven code reviews.
Code reviews are rarely the blocker for productivity! As an industry, we need to stop automating the easy stuff and start helping people accomplish the hard stuff!
It does exactly what you are asking for, and it can do it completely locally or with a mixture of frontier models.
Developers built themselves really good OSes for doing developer things. Actually using them to do anything else was secondary.
Want to run a web server? Awesome choice. Want to write networking code? Great. Setup a reliable DB with automated backups? Easy peasy.
Want a stable desktop environment? Well, after almost 30 years we just about have one. Kind of. It isn't consistent, and I need a Post-it note on my monitor with the command to restart Plasma Shell, but things kind of work.
Current AI tools are so damn focused on building developer experiences that everything else is secondary. I get it: developers know how to fix developer pain points, and it monetizes well.
But holy shit, other things are possible. Someone please do them. Or hell, give me 20 or 30 million and I'll do it.
But just.... The obvious is sitting out there for anyone who has spent 10 minutes not being just a developer.
Claude Code, on the other hand, has no such issues if you've done some setup to allow all commands by default (perhaps then setting "ask" for rm, etc.).
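For anyone curious, that kind of setup lives in Claude Code's settings file, e.g. `.claude/settings.json` in a project. A rough sketch of the permissions config (rule syntax per recent versions of the schema, which may change; the `git push` rule is just an illustration):

```json
{
  "permissions": {
    "allow": ["Bash"],
    "ask": ["Bash(rm:*)", "Bash(git push:*)"]
  }
}
```

The idea is that the broad `allow` covers shell commands by default, while the narrower `ask` rules pull specific destructive commands back behind a confirmation prompt.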
I just updated Codex and looked inside the macOS app package. It is most definitely still an Electron app.
Their naming is not very clear. The Codex desktop app is essentially a frontend for the Codex CLI.
By the look and feel of it I would guess it is written with Electron.
It was the perfect storm; I would never have switched otherwise, since the first AI I started with was Claude.
:^)