Yep I have an autonomous task where it has been running for 8 hours now and counting. It compacts context all the time. I’m pretty skeptical of the quality in long sessions like this so I have to run a follow on session to critically examine everything that was done. Long context will be great for this.
reply
Are those long unsupervised sessions useful? In the sense, do they produce useful code or do you throw most of it away?
reply
I get very useful code from long sessions. It’s all about having a framework of clear documentation, a clear multi-step plan including validation against docs and critical code reviews, acceptance criteria, and closed-loop debugging (it can launch/restart the app, control it, and monitor logs).

I am heavily involved in developing those, and then routinely let Opus run overnight and have either a flawless or nearly flawless product in the morning.

reply
I haven't figured out how to make use of tasks running that long yet, or maybe I just don't have a good use case for it yet. Or maybe I'm too cheap to pay for that many API calls.
reply
My change cuts across multiple systems with many tests/static analysis/AI code reviews happening in CI. The agent keeps pushing new versions and waits for results until all of them come up clean, taking several iterations.
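
The push/wait/iterate loop described above can be sketched roughly like this (a minimal sketch; `fix_and_push` and `ci_is_green` are hypothetical hooks into your agent and CI provider, not real APIs):

```python
def iterate_until_green(fix_and_push, ci_is_green, max_iterations=10):
    """Drive the loop the comment describes: push a revision, wait for
    CI/static analysis/review results, let the agent patch failures."""
    for attempt in range(1, max_iterations + 1):
        fix_and_push(attempt)      # agent commits and pushes a new revision
        if ci_is_green():          # poll until all checks come up clean
            return attempt         # number of pushes it took
    return None                    # budget exhausted; hand back to a human
```

The iteration cap matters: without it, a thrashing agent will happily burn API budget forever.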
reply
I mean, if you don't have your company paying for it I wouldn't bother... We are talking about sessions costing $500-1000.
reply
Right. At Opus 4.6 rates, once you're at 700k context, each tool call costs ~$1 just for cache reads alone. 100 tool calls = $100+ before you even count outputs. 'Standard pricing' is doing a lot of work here lol
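
The arithmetic works out under assumed rates (a sketch; the prices here are assumptions for illustration, not confirmed Opus 4.6 pricing):

```python
# Assumed rates; substitute the current numbers from the provider's
# pricing page before trusting any totals.
INPUT_PER_MTOK = 15.00                        # assumed $ per 1M input tokens
CACHE_READ_PER_MTOK = 0.10 * INPUT_PER_MTOK   # cache reads often ~10% of input

def cache_read_cost(context_tokens: int, tool_calls: int) -> float:
    """Cost of re-reading a fully cached context on every tool call."""
    per_call = context_tokens / 1_000_000 * CACHE_READ_PER_MTOK
    return per_call * tool_calls

# 700k of cached context: 0.7 MTok * $1.50/MTok = ~$1.05 per tool call,
# so 100 tool calls run ~$105 before any output tokens are billed.
```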
reply
Cache reads don’t count as input tokens you pay for lol.

https://www.claudecodecamp.com/p/how-prompt-caching-actually...

reply
All of those things are smells imo, you should be very weary of any code output from a task that causes that much thrashing to occur. In most cases it’s better to rewind or reset and adapt your prompt to avoid the looping (which usually means a more narrowly defined scope)
reply
A person has a supervision budget. They can supervise one agent in a hands-on way or many mostly-hands-off agents. Even though there’s some thrashing, assistants still get farther as a team than a single micromanaged agent. At least that’s my experience.
reply
Just curious, what kind of work are you doing where agentic workflows are consistently able to make notable progress semi-autonomously in parallel? Hearing people are doing this, supposedly productively/successfully, kind of blows my mind given my near-daily in-depth LLM usage on complex codebases spanning the full stack from backend to frontend. It's rare for me to have a conversation where the LLM (usually Opus 4.6 these days) lasts 30 minutes without losing the plot. And when it does last that long, I usually become the bottleneck in terms of having to think about design/product/engineering decisions; having more agents wouldn't be helpful even if they all functioned perfectly.
reply
I've passed that bottleneck with a review task that produces engineering recommendations along six axes (encapsulation, decoupling, simplification, deduplication, security, reducing documentation drift) and an ideation task that gives each component a new feature idea, an idea to improve an existing feature, and an idea to expand a feature to be more useful. These two generate constant bulk work that I move into a new chat, where it's grouped by changeset and sent to a subagent to protect the context window.

What I'm doing mostly these days is maintaining a goal.md (project direction) and spec.md (coding and process standards, global across projects). As for new macro task development, I have one in the works that is meant to automatically build PNG mockups and self-review.
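
The grouping step can be sketched as follows (a minimal sketch; the axis names and tuple shape are assumptions based on the description above, not the author's actual pipeline):

```python
from collections import defaultdict

# Hypothetical shape for the bulk work the review/ideation passes emit:
# (changeset, axis, note) tuples, one per recommendation.
def group_by_changeset(recommendations):
    """Batch recommendations by changeset so each batch can be handed
    to a subagent with a fresh context window."""
    batches = defaultdict(list)
    for changeset, axis, note in recommendations:
        batches[changeset].append((axis, note))
    return dict(batches)
```

Each batch then becomes one delegated task, which is what keeps the orchestrating chat's context small.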

reply
What are you using to orchestrate/apply changes? Claude CLI?
reply
I prefer in IDE tools because I can review changes and pull in context faster.

At home I use Roo Code, at work Kiro. Tbh, as long as it has task delegation I'm happy with it.

reply
weary (tired) -> wary (cautious)
reply
Wary, not weary. Wary: cautious. Weary: tired.
reply
This is really common, I think because there’s also “leery” - cautious, distrustful, suspicious.
reply