by bobjordan 2 days ago

by majormajor 2 days ago

This feels like the "prompt engineering" wave of 2023 all over again. A bunch of hype about a specific point-in-time activity based on a lot of manual setup of prompts compared to naive "do this thing for me" that eventually faded as the tooling started integrating all the lessons learned directly.

I'd expect that if these approaches produce output of usable quality, they will get rolled into existing tools similarly, the way multi-agent setups using worktrees already were.

by eddythompson80 2 days ago

2023 was the year of “look at this dank prompt I wrote yo”-weekly demos.

by lxgr 1 day ago

And 2026 is shaping up to be the year of "look at this prompt my middle manager agent wrote for his direct reports" :)

by overgard 22 hours ago

Maybe this is just a skill issue on my part, but I'm still trying to wrap my head around the workflow of running multiple Claude agents at once. How do they not conflict with each other? Also, how do you have a project well specified enough that you can have these agents working heads down for hours on end? My experience as a developer (even pre-AI) has mostly been that writing code fast has rarely been the progress limiter. Usually the obstacles are more like underspecified projects, needing user testing, disagreements on the value of specific features, subtle hard-to-fix bugs, communication issues, dealing with other teams and their tech, etc. If I have days where I can just be heads down writing a ton of code, I'm very happy.

by vanviegen 2 days ago

I can't imagine letting a current gen LLM supervise Claude Code instances. How could that possibly lead to even remotely acceptable software quality?

by bobjordan 2 days ago

I spec out everything in excruciating detail with spec docs. Then I actually read them. Finally, we create granular tasks called "beads" (see https://github.com/steveyegge/beads). Beads lets us create epics/tasks/subtasks and the associated dependency structure down to a granular bead, and then the agents pull a "bead" to implement. So, mostly we're either creating spec docs and beads, or implementing, quality checking, and testing the code created by an agent implementing a bead. I can say this produces better code than I could write myself after 10 years of focused daily coding. However, I don't think "vibe coders" who have never truly learned to code have any realistic chance of creating decent code in a large, complex code base that requires a complex backend schema to be built. They can only build relatively trivial apps. But I do believe what I am building is as solid as if I had millions of dollars of staff doing it with me.
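To make the "agents pull a bead" idea concrete, here is a minimal Python sketch of a dependency-aware task board. It is not the beads tool or its API; the `Bead` and `BeadBoard` names, the status strings, and the example task ids are all hypothetical, chosen only to illustrate that a task becomes pullable once everything it depends on is done.

```python
from dataclasses import dataclass, field


@dataclass
class Bead:
    """Hypothetical stand-in for one granular task (not the real beads schema)."""
    id: str
    title: str
    depends_on: set[str] = field(default_factory=set)
    status: str = "open"  # "open" -> "in_progress" -> "done"


class BeadBoard:
    """Hypothetical in-memory board that hands out unblocked tasks."""

    def __init__(self, beads):
        self.beads = {b.id: b for b in beads}

    def ready(self):
        """Open beads whose dependencies are all done, in stable id order."""
        return sorted(
            (
                b for b in self.beads.values()
                if b.status == "open"
                and all(self.beads[d].status == "done" for d in b.depends_on)
            ),
            key=lambda b: b.id,
        )

    def pull(self):
        """Claim the next ready bead for an agent, or None if all are blocked."""
        ready = self.ready()
        if not ready:
            return None
        ready[0].status = "in_progress"
        return ready[0]

    def finish(self, bead_id):
        """Mark a bead done so that its dependents become pullable."""
        self.beads[bead_id].status = "done"


board = BeadBoard([
    Bead("1-schema", "Define backend schema"),
    Bead("2-api", "Implement API endpoints", depends_on={"1-schema"}),
    Bead("3-tests", "Write integration tests", depends_on={"2-api"}),
])

first = board.pull()     # the schema task: it has no dependencies
blocked = board.pull()   # None: everything else waits on the schema
board.finish(first.id)
second = board.pull()    # the API task is now unblocked
```

The point of the structure is that a second agent asking for work while the schema task is still in progress gets nothing, rather than starting on something that depends on unfinished work.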

by Dzugaru 2 days ago

But how is that less work, and how does it allow you to do that at Disneyland with your kids? For me, personally, there is little difference between "speccing out everything in excruciating detail in spec docs" and "writing the actual implementation in high-level code". Speccing in detail requires deep thought, a whiteboard, experimentation, etc. None of this can be done at Disneyland, and no AI can do it at a good level (that's why you "spec out everything in detail", create "beads" and so on?)

by bobjordan 2 days ago

Yes, I normally draft spec docs in the office at my desk, this is true. However, when I have the spec ready for implementation with clear "beads", I can reasonably plan to leave my office and work from my phone. It's not at a point where I can just work 100% remote from my phone (I probably could, but this is all still new to me too). But it does give me the option to be vastly more productive away from my desk.

by embedding-shape 2 days ago

Do you have any code publicly available so we could see what kind of code this sort of setup produces?

by bobjordan 2 days ago

Not yet, but I can tell you that producing "good" code is another layer altogether. I have custom linters, code standardization docs, custom prompts, and a strictly enforced test architecture (enforced by the custom linters in pre-commit hooks, which run before an agent tries to commit). Ultimately, it's a lot of work to get all the agents, each with a limited context, writing code in the way you want. In the main large, complex project I am generally working on now, I have hand-held and struggled for over a year getting it all set up the way I need it. So I can't say it's been a weekend setup for me. It's been a long, arduous process to get where I am now in the 2-3 main repos that I work on. However, the workflow I just shared above can help people get there a lot faster.

by embedding-shape 2 days ago

> but I can tell you that producing "good" code is another layer altogether.

I feel like it isn't. If the fundamental approach is good, "good" code should be created out of necessity, because there wouldn't be another way. If it's already a mess with leaking abstractions and architecture that doesn't actually enforce any design, then it feels unlikely you'll be able to stack anything on top of or below it to actually fix that.

And then you end up with some spaghetti that the agent takes longer and longer to edit as things get more and more messy.

by bobjordan 2 days ago

Here is my view after putting in my 10,000+ hours learning to code in the pre-LLM days, while also building a pretty complex design + contract manufacturing company, with a lot of processes in place for that to happen. If you have a bunch of human junior devs and even a senior dev or two that join your org to help you build an app, and you don't have any dev/ops structure in place for them, then you will end up with "spaghetti" throughout all your code/systems, from those relatively bright humans. It's the same when managing agents. You cannot expect to build a complex system from simple "one shot me a <x> feature" requests to a bunch of different agents, each with a finite ~150k token context limit. It must be done in the context of the system you have in place. If you have a poor system structure, or none, you'll end up with garbage for code. Everything that I said I needed to guide the agents is also useful for human devs. I'm sure that all the FAANGs and various advanced software companies also use custom linters, etc., for every code check-in. It's just now become easier to have these advanced code quality structures in place, and it is absolutely necessary when managing/coordinating agents to build a complex application.

by embedding-shape 2 days ago

I've clocked some hours too, and I think as soon as you let something messy in, you're already losing. The trick isn't "how to manage spaghetti" with LLMs (nor humans), because the context gets all wired up, but how to avoid it in the first place. You can definitely do "one-shot" over and over again with a small context and build something complex, as long as you take great care about what goes into the context; more isn't better.

Anyways, feels like we have pretty opposite perspectives, I'm glad we're multiple people attacking similar problems but from seemingly pretty different angles, helps to find the best solutions. I wish you well regardless and hope you manage to achieve what you set out to do :)

by isatty 2 days ago

I don’t get it, and that doesn’t necessarily mean it’s a bad thing. I’ve been doing systems things for a long time and I’m quite good at it, but this is the first time none of this excites me.

by bobjordan 2 days ago

Instead of sitting in my office for 12 hours working with 20 open terminals (exactly what I have open right now on my machine), I can take my kids to Disneyland (I live in Southern California and it's nearby) and work on my iPhone talking to "Patch" while we stand in line for an hour to get on a ride. Meanwhile, my openclaw agent "Patch" manages my 20 open terminals on my development workstation in my office. Patch updates me and I can make decisions, away from my desk. That should excite anyone. It gives me back more of my time on earth, while getting about the same (or more) work done. There is literally nothing more valuable to me than being able to spend more time away from my desk.

by wussboy 2 days ago

If this is actually true, then what will soon happen is you will be expected to manage more separate “Patch” instances until you are once again chained to your desk.

Maybe the next bottleneck will be the time needed to understand what features actually bring value?

by DANmode 1 day ago

What if he works for himself?

Not a $DayJob?

by engineer_22 22 hours ago

Then he is in competition with everyone else who is at their desk managing ### of open terminals

by johnh-hn 2 days ago

I appreciate your insight, even if the workflow seems alien to me. I admit I like the idea of freeing myself from a desk though. If you don't mind me asking, how much does this all cost per month?

Edit: I see you've answered this here: https://news.ycombinator.com/item?id=46839725 Thanks for being open about it.

by bobjordan 2 days ago

Thanks. As I just mentioned elsewhere, right now I spend $200 on the Claude Code 20x plan + $200 on OpenAI's similar plan, per month. I probably have a few more small conveniences that cost ~$10-$20 in a few places, like an Obsidian vault sync for documentation vaults on both my dev workstation and my phone. Most weeks I could cut one of the $200 plans, but Claude Code and Codex have different strengths, and I like to have them double-check each other's work, so to me that's worth carrying both subscriptions.

by ricktdotorg 1 day ago

i have been recently quite enamoured with using both the ChatGPT mobile app (specifically the Codex part) and the Github mobile app, along with Codex. with an appropriate workflow, i've been able to deploy features to some [simple] customer-facing apps while on the go. it's very liberating!

GP's setup sounds like the logical extension to what i'm doing. not just code, but sessions within servers? are sysadmins letting openclawd out and about on their boxes these days?

by bobjordan 1 day ago

Yes, I've also used that Codex workflow and it's pretty useful, but the "real time" interactivity and control is just not at the same level.

by what 1 day ago

Please show us something you’ve produced this way.

by dmd 2 days ago

> MULTIPLE CLAUDE CODE INSTANCES

a lotta yall still dont get it

molt holders can use multiple claude code instances on a single molt

by dispersed 1 day ago

Slurp Juice is still the only good thing to come out of crypto. I hope AI leaves us with at least one good meme.

by bobjordan 1 day ago

You are absolutely right that I probably still "don't" get it. I am still shocking myself on a daily basis with all the stuff I didn't fully grasp. I recently updated Claude Code, and yesterday I had one agent that used the new task system and blew my mind with what he got accomplished. This tech is all moving so fast!

by cadamsdotcom 1 day ago

My multitasking operating system would like a word..

/s

by ryanackley 2 days ago

What are you coding with this? Is it a product you're trying to launch, an existing product with customers or custom work for someone else?

by woeirua 2 days ago

This just sounds ridiculously expensive. Burning hundreds of dollars a day to generate code of questionable utility.

by bobjordan 2 days ago

Personally, I spend $200 on the Claude Code 20x plan + $200 on OpenAI's similar plan, per month. So, yeah, I spend $400 per month. I buy and use both because they have different and complementary strengths. I have only very rarely come close to the weekly capacity limit on either of those plans. Usually I don't need to worry about how much I use them. The $400 may be expensive to some people, but frankly I pay some employees a lot more each month and get a lot less for my money.

by GardenLetter27 2 days ago

Automated usage like you described violates Anthropic's terms of service.

It's just a matter of time until they ban your account.

by bobjordan 2 days ago

They make it easy to spin up parallel agents. Managing them efficiently through a shared tmux instance isn't banned anywhere in the TOS, AFAIK. I'd worry more about it if I had to use multiple accounts or something using a round-the-clock "automated" workflow. I'm using one account. Hell, in the workflow I described, I am even actively logged in to my dev workstation with tmux and able to see and interact with each instance and "micro-manage" them myself, individually. The main benefit of this workflow is that I also have a single shared LLM instance that has access to all the instances, together with me. I have plenty of other things to worry about besides a banned account from an efficient workflow I've set up.
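As a rough illustration of what "a shared tmux instance" gives both a human and a supervising process, here is a Python sketch that only builds the tmux command lines involved. The session names and the `echo ready` placeholder command are hypothetical, and this is not the setup described above; the only real pieces are the standard tmux subcommands `new-session -d -s`, `send-keys -t`, and `capture-pane -p -t`.

```python
import shlex


def new_session_cmd(name: str) -> list[str]:
    """Command that starts a detached tmux session with the given name."""
    return ["tmux", "new-session", "-d", "-s", name]


def send_keys_cmd(name: str, text: str) -> list[str]:
    """Command that types `text` into the session and presses Enter."""
    return ["tmux", "send-keys", "-t", name, text, "Enter"]


def capture_cmd(name: str) -> list[str]:
    """Command that prints the visible contents of the session's pane."""
    return ["tmux", "capture-pane", "-p", "-t", name]


# Hypothetical session names; one session per agent terminal.
sessions = [f"agent-{i}" for i in range(1, 4)]

plan = []
for session in sessions:
    plan.append(new_session_cmd(session))
    # Placeholder command; a real setup would launch whatever tool runs there.
    plan.append(send_keys_cmd(session, "echo ready"))

for cmd in plan:
    print(shlex.join(cmd))
```

On a machine with tmux installed, each list could be passed to `subprocess.run(cmd, check=True)`, and `capture_cmd` output could be read back with `capture_output=True`. Because the sessions are ordinary tmux sessions, a person can attach to any of them at the same time, which matches the "see and interact with each instance" point.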

by audg 21 hours ago

just throwing it out there that yesterday Boris (lead engineer for Claude Code) literally told everyone on Twitter that the CC team's number one recommendation for users is that they should be kicking off multiple instances / agents in parallel. not sure if that's what you're referring to, but if so I'd be very surprised if they ban someone for heavy use of that workflow

by someguyiguess 1 day ago

Gastown also had a supervisor “mayor”. How is this one different?
