You guys are describing wonderful things, but I've yet to see any implementation. I tried coding my own agents, yet the results were disappointing.

What kind of setup do you use? Can you share it? How much does it cost?

by throwaway77838 minutes ago|

We have a very simple setup with Claude Code: a CLAUDE.md with instructions and notes about the repo and how to run things. We also do code reviews with Claude Code, but in a separate session.

It works wonderfully well. It costs about $200 USD per developer per month as of now.
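A minimal Python sketch of the "review in a separate session" step, assuming Claude Code's `claude` CLI is on PATH and that its `-p` (print) mode accepts the diff on stdin. `REVIEW_PROMPT`, `get_diff`, `review_command` and `run_review` are hypothetical names, not part of any tool; the only point is that each review starts a fresh, non-interactive process, so it shares no conversation with the session that wrote the code.

```python
import subprocess
from typing import Callable

# Hypothetical prompt: adapt it to your own review checklist.
REVIEW_PROMPT = (
    "You are reviewing a teammate's change. The diff is on stdin. "
    "List bugs, missing tests and anything that contradicts CLAUDE.md. "
    "Do not modify any files."
)


def get_diff(base_ref: str = "main") -> str:
    """Return `git diff <base_ref>` for the current working tree."""
    result = subprocess.run(
        ["git", "diff", base_ref], capture_output=True, text=True, check=True
    )
    return result.stdout


def review_command(prompt: str = REVIEW_PROMPT) -> list[str]:
    """One-shot Claude Code call: -p prints the response and exits."""
    return ["claude", "-p", prompt]


def run_review(
    diff: str,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Run the review as a brand-new process so nothing from the coding session carries over.

    `runner` is injectable so the function can be tested without calling the CLI.
    """
    result = runner(
        review_command(), input=diff, capture_output=True, text=True, check=True
    )
    return result.stdout
```

Typical use from the repo root would be `print(run_review(get_diff()))`; a CI job could run the same call on every pull request.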

by aprdm59 minutes ago|

If you are not spending $5-10k a month on interesting projects, you likely won't see interesting results.

by mrbungie28 minutes ago|

I can't really tell if this is sarcasm or not.

by dworks2 hours ago|

rlm-workflow does all that TDD for you: https://skills.sh/doubleuuser/rlm-workflow/rlm-workflow

(I built it)

by cheema3328 minutes ago|

Why make PowerShell a requirement? I like PowerShell, but Python is very common and already installed on many dev systems.

by _ink_2 hours ago|

Thanks for sharing. What does RLM stand for? Any idea why the socket security test fails?

by stavros2 hours ago|

Recursive language models: https://github.com/doubleuuser/rlm-workflow

by canadiantim1 hours ago|

Check out Matt Pocock's work. He has written excellent material on red-green-refactor and has a GitHub repo for his skills. Read his TDD skill, take what you need, and incorporate it into your own TDD skill tailored to your project.

by nojito43 minutes ago|

This is just AI slop. What the actual designers of Claude/GPT tell you flies in the face of building out over-engineered harnesses for agents.

by throwaway77837 minutes ago|

I agree with this. Claude Code doesn't need much in the way of harnesses or wrapping.

by canadiantim8 minutes ago|

Works better than standard Claude/GPT, which doesn't do red-green-refactor. It doesn't seem like slop when it meaningfully and consistently changes the results for the better. It really is a game-changer; you should consider trying it.
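The rule that red-green-refactor adds, and that plain prompting tends to skip, is that a new test has to fail before the implementation exists and still pass after cleanup. A minimal Python sketch of that ordering check follows; it illustrates the technique only, not how rlm-workflow or any particular skill implements it, and `red_green_refactor`, `CycleResult` and the toy `slugify` feature are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CycleResult:
    red: bool          # the new test failed before any implementation existed
    green: bool        # the test passed once the implementation was written
    still_green: bool  # the test still passed after refactoring


def red_green_refactor(
    run_test: Callable[[], bool],
    implement: Callable[[], None],
    refactor: Optional[Callable[[], None]] = None,
) -> CycleResult:
    # RED: a test that already passes proves nothing about the new behaviour.
    if run_test():
        return CycleResult(red=False, green=False, still_green=False)
    # GREEN: write just enough code to make the failing test pass.
    implement()
    if not run_test():
        return CycleResult(red=True, green=False, still_green=False)
    # REFACTOR: clean up, then re-run the same test to show nothing broke.
    if refactor is not None:
        refactor()
    return CycleResult(red=True, green=True, still_green=run_test())


# Toy example: a dict stands in for the codebase the agent edits.
codebase: dict[str, Callable[[str], str]] = {}


def test_slugify() -> bool:
    fn = codebase.get("slugify")
    return fn is not None and fn("Hello World") == "hello-world"


def implement_slugify() -> None:
    codebase["slugify"] = lambda s: "-".join(s.lower().split())


print(red_green_refactor(test_slugify, implement_slugify))
# CycleResult(red=True, green=True, still_green=True)
```

In an agent workflow the two callables would shell out to the real test runner and to the agent; the design choice is that the harness, not the model, decides whether the red step actually happened.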

by tomtom13374 hours ago|

This is very interesting, but like sibling comments, I'm very curious as to how you run this in practice. Do you just tell Claude/Copilot to do what you describe?

And do you have any prompts to share?

by throwaway77833 minutes ago|

You don't need most of this. Prompts are also normally what you would say to another engineer.

* There is a lot of duplication between A & B. Refactor this.

* Look at ticket X and give me a root cause

* Add support for three new types of credentials - Basic Auth, Bearer Token and OAuth Client Creds

CLAUDE.md has stuff like "Here's how you run the frontend. Here's how you run the backend. This module supports the frontend. That module is batch jobs. Always start commit messages with the ticket number. Always run compile at the top level. When you make code changes, always add tests", etc.
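A rule like "always start commit messages with the ticket number" can also be enforced outside the prompt, so it holds even when the agent forgets. Below is a minimal sketch of a git `commit-msg` hook in Python, assuming tickets look like `PROJ-123`; the pattern and function names are hypothetical and should be adjusted to your tracker. Git runs `.git/hooks/commit-msg` with the path of the proposed message file as its only argument and aborts the commit on a non-zero exit.

```python
#!/usr/bin/env python3
"""Hypothetical commit-msg hook: require a ticket number at the start of the message."""
import re
import sys

# Assumption: JIRA-style keys such as "PROJ-123". Change this to match your tracker.
TICKET_RE = re.compile(r"^[A-Z][A-Z0-9]+-\d+\b")


def subject_line(message: str) -> str:
    """First non-blank line that is not a git comment (lines starting with '#')."""
    for line in message.splitlines():
        if line.strip() and not line.startswith("#"):
            return line.strip()
    return ""


def starts_with_ticket(message: str) -> bool:
    return TICKET_RE.match(subject_line(message)) is not None


def main(argv: list[str]) -> int:
    # git passes the path of the file holding the proposed message as argv[1]
    with open(argv[1], encoding="utf-8") as fh:
        message = fh.read()
    if starts_with_ticket(message):
        return 0
    print("commit-msg: start the message with a ticket number, e.g. 'PROJ-123 ...'",
          file=sys.stderr)
    return 1


# Only act when git supplies the message file path.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Save it as `.git/hooks/commit-msg` and make it executable (`chmod +x`); the rule in CLAUDE.md then becomes a hint, and the hook is what actually enforces it.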

by xienze2 hours ago|

This seems like a tremendous amount of planning, babysitting, verification, and token cost just to avoid writing code and tests yourself.

by habinero2 hours ago|

It's assigning yourself the literal worst parts of the job - writing specs, docs, tests and reading someone else's code.

by gedy1 hours ago|

Yes, with the reward of: I don't understand this code and didn't incrementally learn anything about the feature I "planned".

by skybrian4 hours ago|

How do you define visibility rules? Is that possible for subagents?

by egeozcan4 hours ago|

AFAIK Claude doesn't support it, but if you're willing to go the extra mile, you can get creative with a bash script: https://pastebin.com/raw/m9YQ8MyS (I generated this a second ago, just to get the point across).

To be clear, I don't do this. I never saw an agent cheat by peeking or something. I really did look through their logs.

I'd be very interested to see Claude Code and other tools support this pattern when dispatching agents, to be really sure.

by achierius3 hours ago|

> To be clear, I don't do this.

How do you know that it works then? Are you using a different tool that does support it?

by skybrian3 hours ago|

So what do you do? Do you define roles somewhere and tell the agent to assign these roles to subagents?

by ssk422 hours ago|

Fun to see you outside of Tildes.

Setting up a clean room is one of the only ways to do evals on agentic harnesses. It's especially relevant for Windsurf, which doesn't have an easy way to start from the CLI.

So how? The easiest answer, when allowed, is Docker: literally a new image per prompt. There are also flags for Claude to not use memory, and from there you can use -p to run it like a normal CLI tool. Windsurf requires the manual effort of starting it up in a new directory.
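A minimal Python sketch of that clean-room loop, assuming Docker and a hypothetical image `agent-eval:latest` that has Claude Code installed and authenticates through the `ANTHROPIC_API_KEY` environment variable. It starts a new `--rm` container per prompt from one image rather than building a new image each time, and gives every run a fresh copy of a fixture repo. The memory-related Claude flags mentioned in the comment are deliberately left out, since their names aren't given there; `clean_room_command` and `run_eval` are illustrative names.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical image with the agent CLI preinstalled.
IMAGE = "agent-eval:latest"


def clean_room_command(prompt: str, workdir: Path, image: str = IMAGE) -> list[str]:
    """Build a `docker run` call for one prompt.

    --rm discards the container afterwards, and only `workdir` is mounted,
    so no files or agent state survive into the next prompt.
    """
    return [
        "docker", "run", "--rm",
        "-e", "ANTHROPIC_API_KEY",   # forward the key from the host environment
        "-v", f"{workdir}:/work",
        "-w", "/work",
        image,
        "claude", "-p", prompt,      # non-interactive: print the answer and exit
    ]


def run_eval(
    prompts: Iterable[str],
    fixture: Path,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> list[str]:
    """Run every prompt against its own fresh copy of `fixture`; return the outputs."""
    outputs = []
    for prompt in prompts:
        with tempfile.TemporaryDirectory() as tmp:
            workdir = Path(tmp) / "repo"
            shutil.copytree(fixture, workdir)
            result = runner(clean_room_command(prompt, workdir),
                            capture_output=True, text=True)
            outputs.append(result.stdout)
    return outputs
```

Scoring the outputs is left to the eval itself; the design choice is that isolation comes from the throwaway container and the fresh directory, not from asking the agent to ignore what it saw before.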

by skybrian25 minutes ago|

Sounds interesting, but I'm not quite getting the relevance for people writing code with an agent. Should I be doing evals?