My main complaints with Gastown are that (1) it's expensive, partly because (2) it refuses to use anything but Claude models, in spite of my configuration attempts, (3) I can't figure out how to back up or add a remote to its beads/dolt bug database, which makes me afraid to touch the installation, and (4) upgrading it often causes yak shaving and lost context. These might all be my own skill issues, but I do RTFM.
But wow, Gastown gets results. There's something magic about the dialogue and coordination between the mayor and the polecats that leads to an even better experience than Claude Code alone.
Out the other end, over about 3-4 five-hour sessions, comes roughly 85% functional code for every single listed item. I'd guess you'd be looking at a team working for months, give or take, without the automation. Total cost was around $50 in VM time (not counting Claude, since I would be subscribed anyway). I'm not letting that thing anywhere near a computer I care about, and Rust compiles are resource intensive, so I paid for a nice VM that I could smash the abort button on if it started looking at me funny.
So I liken it to buying an enormous bulldozer. If you're a skilled operator you can move mountains, but there's still a lot of manual work and planning involved. It's very clearly the direction the industry will go once the models improve and the harnesses and orchestration mature past "30% of the development effort is fixing the harness and orchestration itself", plus an additional "20% of your personal time is knocking two robots' heads together to get them to actually do work".
Edit: some more details of other knock-on work - I asked for a complexity metadata field to automatically dispatch work to cheaper/faster models, set up harnesses to make opencode and codex work similarly to how Claude works, and troubleshot some bugs in the underlying Gastown system. My Gastown fork is public if you'd like to have a look.
Does it deliver on the "realistic" part? My experience with most models is they make something that technically fulfills the ask, but often in a way that doesn't really capture my intent (this is with regular Claude Code though).
We also added pi-mono, and started using more and more other models for different tasks (Gemini, K2.5, GLM-5, you name it).
I think the problem is that most are building solutions that rely on one provider, instead of focusing self-learning capabilities on improving the cost-quality-speed ratio.
For reference: https://github.com/desplega-ai/agent-swarm
Work is divided into individual tasks. I could have used Plan Mode or the TodoWrite tool to implement tasks - all agents have them nowadays. But instead I chose to plan in task.md files because they can be edited iteratively: a file starts as a user request and develops into a plan with checkbox-able steps; the plan is reviewed by a judge agent (in yolo mode, with fresh context); then a worker agent clears the gates. The gates enforce a workflow of testing early and testing extensively. There is another implementation judge, again in yolo mode. And at the end we update the memory/bootstrap document.
Task files go into the git repo. I also log all user messages and implement intent validation with the judge agents. The judges validate intent along the chain "chat -> task -> plan -> code -> tests". Nothing is lost; the project remembers and understands its history. In fact, I like to run retrospective tasks where a task.md 'eats' previous tasks and produces a general project perspective not visible locally.
In my system everything is an md file, logged and versioned in git. You have no trouble extracting your memories; in fact, I made reflection on past work a primitive operation of this harness. I am using it primarily for coding, but it is just as good for deep research, literature reviews, organizing subject matter and tutoring me on topics, investment planning, and orchestrating agent experiment loops like autoresearch. That is because the task.md is just a generic programming pipeline; gates are instructions in natural language, so you can use it for any cognitive work. The longest task.md I ran was 700 steps; it took hours to complete, but worked reliably.
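To make the gate mechanics concrete, here is a minimal sketch of how checkbox-able steps in a task.md could be parsed and dispatched to a worker. This is a hypothetical harness: the function names, the regex, and the sample task are my assumptions, not the commenter's actual code.

```python
import re

def parse_gates(task_md: str):
    """Extract checkbox steps ('- [ ] ...' / '- [x] ...') from a task.md."""
    gates = []
    for line in task_md.splitlines():
        m = re.match(r"- \[( |x)\] (.+)", line.strip())
        if m:
            gates.append({"done": m.group(1) == "x", "step": m.group(2)})
    return gates

def next_gate(gates):
    """The worker always attacks the first unchecked gate."""
    for g in gates:
        if not g["done"]:
            return g
    return None

task = """\
# Task: add rate limiter
- [x] write failing test for burst traffic
- [ ] implement token bucket
- [ ] run full test suite
"""

gates = parse_gates(task)
print(len(gates))                # 3
print(next_gate(gates)["step"])  # implement token bucket
```

Because gates are just natural-language lines, a judge agent can edit, reorder, or add them between runs while the loop itself stays trivially simple.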
Gastown goes further than Scion in that it chains agents together into an ecosystem. My sense is that Gastown or similar could be built as a layer on top of Scion.
Dan Shapiro helped shape my thinking on the two most important capabilities for agent orchestration as concurrency and loops. Scion provides concurrency only at present, and Gastown is also more concurrency-oriented than loops.
Fabro is a new OSS project I am working on which attempts to do both loops and concurrency well: https://github.com/fabro-sh/fabro (Maybe someday it should be built on top of Scion.)
I was much more focused on integrating with ticketing systems (Notion, GitHub Issues, Jira, Linear), and then having coding agents specifically work towards merging a PR. Scion's support for long-running agents and inter-container communication looks really interesting, though. I think I'll have to go plan some features around that. Some of their concepts make less sense to me: I chose to build on top of k8s, whereas they seem to be trying to recreate the control plane. I'm somewhat skeptical that the recreation and grove/hub are needed, but maybe they'll make more sense once I see them in action for the first time.
ADK was (and is) exceptional, but nobody is actually making noise and pushing for it as they should. It feels like Microsoft .net back in the day.
Let's see how it goes. I'm rooting for y'all
If it supports OCI runtimes though then maybe kata containers can be plugged in, I'll have to dig in after work and see.
If you look at this orchestration example
https://github.com/ptone/scion-athenaeum
it's just markdown - Scion is the game engine
(a port of gastown to run on scion is in progress)
i guess gastown is a better choice for now? idk i don't feel good about "relatively stable"
> https://en.wikipedia.org/wiki/SCION_(Internet_architecture)
and also wrote about it https://s2.dev/blog/distributed-ai-agents
When agents process EU user data (names, emails, IBANs) and route it to US model providers, that's a GDPR violation. I open sourced a routing layer that detects PII in prompts and forces EU-only inference when personal data is found: https://github.com/mahadillahm4di-cyber/mh-gdpr-ai.eu
I've not been impressed with any of them. I do use their ADK in my custom agent stack for the core runtime. That one, I think, is good and has legs for longevity.
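The core idea is simple enough to sketch. The patterns and endpoint URLs below are illustrative assumptions, not the linked project's implementation; a real detector needs far broader PII coverage than two regexes.

```python
import re

# Illustrative patterns only; production detection needs much more coverage.
PII_PATTERNS = [
    re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),  # IBAN-shaped strings
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),       # email addresses
]

EU_ENDPOINT = "https://eu.inference.example/v1"  # hypothetical
US_ENDPOINT = "https://us.inference.example/v1"  # hypothetical

def contains_pii(prompt: str) -> bool:
    """True if any pattern matches anywhere in the prompt."""
    return any(p.search(prompt) for p in PII_PATTERNS)

def route(prompt: str) -> str:
    """Force EU-only inference when personal data is detected."""
    return EU_ENDPOINT if contains_pii(prompt) else US_ENDPOINT
```

The routing decision happens before the request leaves your infrastructure, so the US provider never sees the personal data at all.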
The main enterprise problem here is getting the various agent frameworks to play nice. How should one have shared runtimes, session clones, sandboxes, memory, etc between the tooling and/or employees?
I modified file_read/write/edit to put the contents in the system prompt. This saves context space, e.g. when the agent rereads a file after a failed edit even though it already has the most recent contents. The agent also does not need to infer modified content from reads plus edits. It still sees the edits as messages, but the current actual contents are always there.
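A hedged sketch of what that bookkeeping might look like (a hypothetical harness, not the commenter's code): keep one authoritative copy of each touched file and re-render it into the system prompt every turn, instead of letting stale copies pile up in the message history.

```python
# Hypothetical harness state: latest contents per file, kept out of the
# message history and re-rendered into the system prompt each turn.
open_files: dict[str, str] = {}

def file_read(path: str) -> str:
    """Read from disk and remember the authoritative contents."""
    with open(path) as f:
        open_files[path] = f.read()
    return open_files[path]

def file_edit(path: str, old: str, new: str) -> None:
    """Apply an edit and keep the tracked copy in sync with disk."""
    open_files[path] = open_files[path].replace(old, new, 1)
    with open(path, "w") as f:
        f.write(open_files[path])

def render_system_prompt(base: str) -> str:
    """Current file contents live here, so re-reads cost no extra context."""
    sections = [f"=== {p} ===\n{body}" for p, body in open_files.items()]
    return base + "\n\n" + "\n\n".join(sections)
```

Since the system prompt is rebuilt per turn, an edited file appears exactly once at its latest state, no matter how many read/edit messages precede it.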
My AGENTS.md loader: the agent does not decide which docs to load; loading is deterministic, based on which other files/dirs it has interacted with. It can still ask to read them, but it rarely does this now.
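For example, a deterministic loader could walk up from every file the agent has touched and collect each AGENTS.md on the way to the repo root. This is a sketch under assumptions; the commenter's actual loader may pick files differently.

```python
from pathlib import Path

def agents_docs_for(touched: list[str], root: str) -> list[Path]:
    """Deterministically select AGENTS.md files: every ancestor directory
    of a touched path, up to the repo root, contributes its AGENTS.md
    (nearest first, duplicates skipped)."""
    root_p = Path(root).resolve()
    found: list[Path] = []
    for t in touched:
        d = Path(t).resolve().parent
        while True:
            candidate = d / "AGENTS.md"
            if candidate.is_file() and candidate not in found:
                found.append(candidate)
            if d == root_p or d == d.parent:
                break
            d = d.parent
    return found
```

Because the selection depends only on the set of touched paths, the same session history always loads the same docs - there is no model judgment call involved.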
I've also backed the agents environment or sandbox with Dagger, which brings a number of capabilities like being able to drop into a shell in the same environment, make changes, and have those propagate back to the session. Time travel, clone/fork, and a VS Code virtual FS are some others. I can go into a shell at any point in the session history. If my agent deletes a file it shouldn't, I can undo it with the click of a button.
I can also interact with the same session, at the same time, from VS Code, the TUI, or the API. Different modalities are ideal for different tasks (e.g. VS Code multi-diff for code review / edits; TUI for session management / cleanup).
You...do have all the same abstraction layers, right? No? Oh. Well, don't worry, Google/Amazon/Microsoft can sell you those if you don't want to pay your IT staff to prop it up for you.
---
Look, snark aside, yours is the correct take. Google's solutions are amazing, but they're also built for an organization as large and complex as Google. Time will tell if this is an industry-standard abstraction (a la S3 APIs) or just a Google product for Google-like orgs/functions (a la K8s).
One that I retired was used for serving FTP (among other transfer stuff). FTP, of all things; it needs ports open and routed back from the client. And for extra points, they had the pods capped at 1 CPU. And I had to explain the thing to the perpetrator and their boss. Madness.
It must have been a while ago - FTP was practically killed the moment browsers stopped supporting it.
I have no love for the original bash scripts that booted the cluster from your dev machine.
Now we also have k3s, which is an easy option for self-hosting something simple (like a homelab).
I would place Google ADK in alignment with Kubernetes more than with this project, for its well-designed abstractions, the control plane, and handling the boring parts that every alternative will need at maturity.
I can see the container analogy in agent frameworks' ignorance of what's running inside. ADK lacks the ability to run arbitrary agent tools, but you can build most of this project's control plane on top of it with minimal effort; most of the bookkeeping is there already. It's more about what experience you want to have.
If someone wants production K8s, I'm steering them (and their budget) to a managed control plane from one of the major cloud providers. Trying to prop it up locally when it really hates having to work directly with bare metal does not spark joy.