Show HN: Continue? Y/N: A 60-second game about AI agent permission fatigue

(llmgame.scalex.dev)

31 points

by Wirbelwind 3 hours ago

by axod 1 minute ago

Fun little game, but I think the questions jump context so much that it's a little unrepresentative. It might be better to group things into "packs" with a more realistic structure. For example, a long run of "editing something.js" file permission requests followed by an "npm publish" is far more typical, and it's also more of a risk: you're used to pressing Y a lot and then suddenly, out of the blue...
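A minimal sketch of what one such "pack" could look like, assuming a pack is just a run of routine edit prompts followed by a single risky command. The `make_pack` function and the file names are hypothetical and only illustrate the ordering; they are not taken from the game.

```shell
# Sketch of one "pack": several routine edit prompts, then one risky prompt.
# make_pack and the src/fileN.js names are hypothetical, for illustration only.
make_pack() {
  pack_routine_count="$1"   # number of low-risk prompts before the risky one
  pack_risky_cmd="$2"       # the command that should make the player pause
  pack_i=1
  while [ "$pack_i" -le "$pack_routine_count" ]; do
    printf 'Allow edit to src/file%d.js? [Y/N]\n' "$pack_i"
    pack_i=$((pack_i + 1))
  done
  printf 'Allow command: %s? [Y/N]\n' "$pack_risky_cmd"
}

# Three routine prompts, then the one that matters.
make_pack 3 "npm publish"
```

The point of the ordering is that the risky prompt arrives only after the player has built up a rhythm of pressing Y.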

by Wirbelwind 4 minutes ago

Thanks all for checking it out and your suggestions!

If anyone is curious about the actual underlying risks and the problems with some mitigations (like the 17% false-negative rate of Auto Mode), I wrote up a quick summary of some of the approaches here:

https://scalex.dev/blog/ai-agent-permissions/

by cobbal 29 minutes ago

That's funny. It told me that blocking "npm run build" was the wrong answer. Maybe it doesn't really understand the threat model.

by zackify 28 minutes ago

I vibe coded a TUI that just shows running lxd containers

I hit 'n' to toggle all network access minus anthropic and openai URLs.

I use pi (sometimes claude, always on bypass) and I auto allow everything. I only toggle manual approval in rare cases like running a script or command that needs to touch a production system and I need to validate everything.

Normally my container has full write access to staging so it can debug and validate everything on its own

by Liftyee 37 minutes ago

I haven't used local agentic AI for programming projects yet. Hence the -187 score.

The filters for "commands I would run myself" and "commands I would let an agent run" are very different, it seems.

by soanvig 13 minutes ago

Fun game. Can somebody run an agent against those questions to see how it performs? :)

by bspammer 4 minutes ago

[delayed]

by ghrl 39 minutes ago

I mostly use OpenCode and barely ever see a permission prompt. While they do enforce one for reads and writes outside the workspace, the agent can just bypass that with the bash tool. I'm not quite sure why it is that way, and it certainly isn't a very good solution, but it's likely no worse than asking about everything, which just trains the user to always accept and then provides a false sense of security.

by MeetingsBrowser 50 minutes ago

It would be cool to see the distribution of all player scores.

by Wirbelwind 8 minutes ago

That's a great idea, stay tuned

by sevenseacat 47 minutes ago

Continue? Y/N ── SCORE: 2,343 Security-Conscious Engineer

Caught 8/8 threats "Not a single secret leaked"

→ llmgame.scalex.dev

by carterschonwald 53 minutes ago

Some of the sandboxing I've been playing with gives me the best of both YOLO mode and, like, logic-programming-tier perms on LLM actions in the env. Still not ready for prime time though ;)

[comment deleted 54 minutes ago]

by cadwell 56 minutes ago

1,640 points on my first try—I fell into a few traps, but it was really interesting. Thanks for the little game! I'm sharing it with my coworkers :)

by nardib 2 hours ago

Use this and save yourself:

claude --dangerously-skip-permissions

by tasuki 49 minutes ago

Just make sure to run it in an isolated environment where it's ok to mess things up, and make sure it doesn't have access to any secrets.
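A rough pre-flight check along those lines, assuming that secrets mostly show up as exported environment variables with telltale names. The function names and the name patterns are assumptions and far from exhaustive; this does not look at files, SSH agents, or cloud credential stores.

```shell
# Sketch: flag environment variable names that look like secrets.
# The patterns below are an assumption, not a complete list.
secret_like_names() {
  # Reads variable names on stdin, prints the ones matching the patterns.
  # "|| true" keeps the exit status zero when nothing matches.
  grep -E -i 'token|secret|password|api_?key' || true
}

env_secret_names() {
  # Takes the text before the first "=" on each line of the environment.
  # Multi-line values can produce a few spurious names.
  env | cut -d= -f1 | secret_like_names
}

# Usage (not run here): refuse to start the agent if anything is flagged.
# if [ -n "$(env_secret_names)" ]; then
#   echo "Unset these before launching: $(env_secret_names)" >&2
# fi
```

An empty result only means nothing matched by name; it is not proof that the environment is clean.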

by wildpeaks 57 minutes ago

This is why having a human in the loop isn't enough: they will cut corners and skip reviewing what they should review.

by preciousoo 3 minutes ago

I created a watcher for this problem. It watches my PRs for unfinished scope and has a fresh Claude review them.

It uses tmux and gh: https://github.com/Kyu/claude-pr-watch

by chuckadams 52 minutes ago

A tool that pushes people into permissions fatigue is in fact the proper recipient of the blame. The tool in question here is the entire system though, including the OS with its insufficient permission boundaries in userspace, not just the agent.

by qsxfthnkp232259 minutes ago|

I love it when Claude is dangerous

by dheera 31 minutes ago

I got tired of typing that and just do

    alias claude="claude --dangerously-skip-permissions"

I do have a separate "claude" user on my system without sudo access and without access to my main user home dir

And yeah I know that's not perfect but I'm trying to get shit done
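One way to keep that alias from applying under the main account is to emit the flag only when the current user is the dedicated one. This is a sketch under assumptions: `AGENT_USER` and `agent_flags_for` are hypothetical names, and the account name "claude" is taken from the comment above.

```shell
# Sketch: print the bypass flag only for the dedicated low-privilege account.
# AGENT_USER and agent_flags_for are hypothetical names for illustration.
AGENT_USER="${AGENT_USER:-claude}"

agent_flags_for() {
  # $1: username to check. Prints nothing for any other account.
  if [ "$1" = "$AGENT_USER" ]; then
    printf '%s\n' '--dangerously-skip-permissions'
  fi
}

# Usage (not run here):
#   claude $(agent_flags_for "$(id -un)")
```

Under the main account the function prints nothing, so the agent starts with its normal permission prompts.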

by franze 6 minutes ago

    alias claude+="claude --dangerously-skip-permissions"
    alias claude++="claude --dangerously-skip-permissions --continue"
