Of course this requires being fortunate enough to have one of those AI-positive employers where you can spend lots of money on clankers.
I don't review every move it makes; instead, my workflow is to first ask it questions about the code, and it looks around and explores various design choices. Then I nudge it towards the design choice I think is best, and so on. That asking around about the code also loads up the context in the right way, so the AI knows how to make the change well.
It's a me-in-the-loop workflow, but it prevents a lot of bugs, makes me aware of the design choices, and, thanks to fast mode, it's more pleasant and much faster than doing it manually.
On the one hand, reviewing and micromanaging everything it does is tedious and unrewarding. Unlike reviewing a colleague's code, you're never going to teach it anything; maybe you'll get some skills out of it, if you find something that comes up often enough to be worth writing a skill for. And this gets you, at best, a slight speedup over writing it yourself, since you have to stay engaged and think about everything that's going on.
On the other hand, you can just let it grind away agentically and only test the final output. That gets you those huge gains at first, but it can easily start accumulating more and more cruft, bad design decisions, and hacks on top of hacks. And you increasingly don't know what it's doing or why; you're losing the skill of even being able to tell, because you're not exercising it.
You're just building yourself a huge pile of technical debt. You might delete your prod database without realizing it. You might end up with an auth system that doesn't actually check the auth, so someone can log in just by setting an admin username in a cookie. Or whatever; you have no idea, and even if the model gets it right 95% of the time, do you want to be periodically rolling a d20 where a 1 means you lose everything?
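To make that auth failure mode concrete, here's a hypothetical sketch of the broken pattern (Express with cookie-parser; the route and cookie names are made up): the server trusts a client-controlled cookie as the entire identity check.

```ts
// Hypothetical sketch of the broken pattern; names are illustrative.
import express from "express";
import cookieParser from "cookie-parser";

const app = express();
app.use(cookieParser());

app.get("/admin", (req, res) => {
  // BUG: anyone can send "Cookie: user=admin" and get in. Nothing here
  // verifies a session, a signature, or a password.
  if (req.cookies.user === "admin") {
    res.send("welcome, admin");
  } else {
    res.status(403).send("forbidden");
  }
});

app.listen(3000);
```

It passes every happy-path test and looks plausible in a diff, which is exactly why you won't catch it by only testing the final output.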
Maybe I’m just weird (actually, that’s a given), but I don’t mind babysitting the clanker while it works.
The agent only has access to exactly what it needs, whether it's an implementation agent, an analysis agent, or a review agent.
That makes it very easy to stay in command without having to sit and approve tons of random things the agent wants to do.
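A minimal sketch of that per-role gating; the role names and tool names here are illustrative, not any particular product's configuration schema.

```ts
// Hypothetical per-role tool allowlists; role and tool names are made up.
type Role = "implementation" | "analysis" | "review";

const allowedTools: Record<Role, ReadonlySet<string>> = {
  implementation: new Set(["read_file", "write_file", "build", "run_tests"]),
  analysis: new Set(["read_file", "search", "web_search"]),
  review: new Set(["read_file", "git_diff"]),
};

function authorize(role: Role, tool: string): void {
  if (!allowedTools[role].has(tool)) {
    // Hard reject instead of prompting: the agent simply never gets the tool.
    throw new Error(`tool "${tool}" is not available to the ${role} agent`);
  }
}

authorize("review", "git_diff");   // ok
authorize("review", "write_file"); // throws: review agents can't write
```

The point is that the rejection happens before the model ever sees the tool, so there's nothing left to approve interactively.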
I do not allow bash or any kind of shell. I don't want to have to constantly figure out what some random Python script it's made up is supposed to do.
Both OpenCode and VS Code support this. I think in Claude Code you can do it with skills now.
The other benefit is that the MCP tool can mediate noisy build tool output and reduce token usage by showing only errors or test failures and nothing else, or simply an OK response with the build result or test count.
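A rough sketch of what that mediation can look like with the TypeScript MCP SDK; the `npm test` command and the failure-line filter are assumptions, not the setup described above.

```ts
// Sketch of an MCP tool that runs the test suite and reports only failures.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);
const server = new McpServer({ name: "build-tools", version: "0.1.0" });

server.tool(
  "run_tests",
  { pattern: z.string().optional().describe("optional test name filter") },
  async ({ pattern }) => {
    try {
      await run("npm", ["test", ...(pattern ? ["--", "-t", pattern] : [])]);
      // Green run: one short line, not pages of runner output.
      return { content: [{ type: "text", text: "OK: all tests passed" }] };
    } catch (err: any) {
      // Non-zero exit: surface only the lines that look like failures.
      const out = `${err.stdout ?? ""}${err.stderr ?? ""}`;
      const failures = out
        .split("\n")
        .filter((line) => /FAIL|✕|Error/.test(line))
        .join("\n");
      return { content: [{ type: "text", text: failures || "tests failed" }] };
    }
  },
);

await server.connect(new StdioServerTransport());
```

On a green run the model sees one short line instead of the runner's full chatter; on a red run it sees only the failing lines.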
So far, I have not needed to give them access to more than build tools, git, and a project/knowledge system (e.g. Obsidian) for the work I have them doing. Well, that and file read/write and web search.
I use Cursor, but it's getting expensive lately, so I'm trying to reduce context size and move to OpenCode or something like it that I can use with a cheaper provider and Kimi 2.5 or whatever.
BTW, one tip is to look at the size of the codebase. When you see 100KLOC for a first draft of a C compiler, you know something has gone horribly wrong. I would suggest that you at least compare the number of lines the agent produced to what you think the project should take. If it's more than double, the code is in serious, serious trouble. If it's in the <1.5x range, there's a chance it could be saved.
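If you want that check to be more than eyeballing, a crude line count against your own budget is enough; a sketch, where the directory, extensions, and budget are placeholders:

```ts
// Crude LOC check: compare what the agent produced against your own estimate.
import { readdirSync, readFileSync } from "node:fs";
import { join, extname } from "node:path";

function countLines(dir: string, exts: Set<string>): number {
  let total = 0;
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const p = join(dir, entry.name);
    if (entry.isDirectory()) {
      if (entry.name !== "node_modules" && entry.name !== ".git") {
        total += countLines(p, exts);
      }
    } else if (exts.has(extname(entry.name))) {
      total += readFileSync(p, "utf8").split("\n").length;
    }
  }
  return total;
}

const budget = 10_000; // what you think the project should take
const actual = countLines("src", new Set([".c", ".h", ".ts"]));
console.log(`${actual} lines, ${(actual / budget).toFixed(2)}x budget`);
// More than 2x the budget: serious trouble. Under ~1.5x: maybe salvageable.
```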
Asking the agent questions is good, as an aid to a review, not as a substitute. The agents lie frequently enough to be a serious problem.
The models don't yet write code anywhere near human quality, so they require much closer supervision than a human programmer.
You could have it build something that takes fewer lines of code, but you aren’t going to find much with that level of specification and guardrails.
It has about doubled my development pace. An absolutely incredible gain in a vacuum, though tiny compared to what people seem to manage without these self-constraints. But in exchange, my understanding of the code is as comprehensive as if I had paired on it, or merged a direct report's branch into a project I was responsible for. A reasonable enough tradeoff, for me.
anonu has explicitly said that they've wiped a database twice as a result of agents doing stuff. What sort of diff would help against an agent running commands without your approval?
$ main-app git:(main) kubectl get pods | grep agent | head -n 1 | sed -E 's/[a-z]+-agent(.*)/app-agent\1/'
app-agent-656c6ff85d-p86t8 1/1 Running 0 13d
The agent is fully capable of making PRs etc. if you provide appropriate tooling. It wipes the DB, but the DB is just a separate ephemeral pod. One day perhaps it will find a 0-day and break out, but so far it has not. The diff: +8000 -4000
I also don't find the permissions it prompts for very meaningful. Permission to use a file search tool? Permission to make a web request? It's a clumsy way to slow it down enough for me to catch up.