There is also this old blog post by Yegge [1] which mentions `AXUIElementPostKeyboardEvent` but there were plenty of bugs with that, and I haven't seen anyone else build on it. I guess the modern equivalent is `CGEventPostToPSN`/`CGEventPostToPid`. I guess it's a good candidate though, perhaps the Sky team they acquired knows the right private APIs to use to get this working.
Edit: The thread at [2] also has some interesting tidbits, such as Automator.app having "Watch Me Do" which can also do this, and a CLI tool that claims to use the CGEventPostToPid API [3]. Maybe there's more ways to do it than I realized.
[1] https://steve-yegge.blogspot.com/2008/04/settling-osx-focus-... [2] https://www.macscripter.net/t/keystroke-to-background-app-as... [3] https://github.com/socsieng/sendkeys
But I was also wondering, how this even works. The AI agent can have its own cursors and none of its actions interrupt my own workflow at all? Maybe I need to try this.
Also, this sounds like it would be very expensive since from my understanding each app frame needs to be analysed as an image first, which is pretty token intensive.