Context is going to be the next big advancement.

When a model is trained on multiple contexts, some growing over time like conversations do now, and some rolling at various sizes (as in, always on), such as a clock, a video feed, an audio feed, data streams, or tool calls, we no longer have to 'pollute' the main context with a bunch of repetitive data.

But this is going in the direction of 1 agent = 1 mind, when human cognition (and maybe all cognition) much more likely requires 'ghosts' and subprocesses. An agent is much more likely a configurable building block for a(n alien) mind.

reply
I've been working on a coding agent that does this, on and off, for about a year. Here's my latest attempt: https://github.com/vanviegen/maca#maca - This one allows agents to request (and later drop) 'views' on functions and other logical pieces of code, and to always see the latest version of them. (With some heuristics to avoid destroying kv-caches at every turn.)
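
The 'views' idea above could be sketched roughly like this. This is not maca's actual API, just an illustrative mock-up: `ViewManager`, `request`, `drop`, and `render` are made-up names, and a real implementation would need the cache-preserving heuristics mentioned.

```python
# Hypothetical sketch of 'views' on code: the agent holds references to
# logical pieces (here, top-level functions), and every time the prompt is
# rebuilt, the viewed code is re-read from disk so the agent always sees
# the latest version. Names are illustrative, not maca's real interface.
import ast
from pathlib import Path

class ViewManager:
    def __init__(self):
        self.views = set()  # (path, function_name) pairs the agent requested

    def request(self, path, name):
        self.views.add((path, name))

    def drop(self, path, name):
        self.views.discard((path, name))

    def render(self):
        """Re-read every viewed function from disk on each turn."""
        chunks = []
        for path, name in sorted(self.views):
            source = Path(path).read_text()
            for node in ast.walk(ast.parse(source)):
                if isinstance(node, ast.FunctionDef) and node.name == name:
                    chunks.append(ast.get_source_segment(source, node))
        return "\n\n".join(chunks)
```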

The problem is that the models are not trained for this, nor for any other non-standard agentic approach. It's like fighting their 'instincts' at every step, and the results I've been getting were not great.

reply
So we agree on a message system having potential. But why the vectors? In any case, interesting stuff.
reply
A smalltalk or Erlang for AI agents is an interesting thought. Smalltalk for the design in terms of message passing and object-oriented holding of state (agents are stateful and are reached via their public interfaces), Erlang for the elegant execution of it with actors and mailboxes (agents have inboxes and outboxes and can work concurrently at scale). Might as well go the whole hog and put a supervisor AI agent in as a switchboard.
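
A minimal actor-style sketch of that design, in Python for self-containment: each agent is stateful, reached only through its mailbox, and a supervisor acts as the switchboard. Class and method names (`AgentActor`, `Supervisor`, `spawn`, `send`) are made up for illustration.

```python
# Smalltalk/Erlang-flavored sketch: agents as actors with inboxes,
# a supervisor routing messages between them. State lives inside each
# agent's handler; agents run concurrently on their own threads.
import queue
import threading

class AgentActor:
    def __init__(self, name, handler):
        self.name = name
        self.inbox = queue.Queue()
        self.handler = handler  # stateful closure: the agent's "mind"
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            sender, msg = self.inbox.get()
            if msg is None:  # poison pill: shut the actor down
                break
            self.handler(sender, msg)

class Supervisor:
    """Switchboard: the only component that knows every agent's address."""
    def __init__(self):
        self.agents = {}

    def spawn(self, name, handler):
        self.agents[name] = AgentActor(name, handler)

    def send(self, sender, target, msg):
        self.agents[target].inbox.put((sender, msg))
```

In a real system the handler would wrap an LLM call, and the supervisor itself could be an AI agent deciding where each message should go.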
reply
> "and which ones are no longer relevant."

This is absolutely the hardest bit.

I guess the shortcut is to include the entire chat history, and then if the history contains "do X" followed by "no, actually do Y instead", the LLM can figure that out. But isn't it fairly tricky for the agent harness to figure that out, to work out relevancy and what context to keep? Perhaps this is why the industry defaults to concatenating messages into a conversation stream?

reply
The shortcut works sometimes. But if X is common in training and Y is rare, the model regresses on the next turn even with 'do Y, not X' right there in the history. That's @vanviegen's 'fighting instincts' problem: you can't trust the model to read the correction. Gate it before the model runs instead of inferring it from context.
reply
That's what the embedding model is for. It's like a tack-on LLM that works out the relevancy and context to grab.
reply
God knows why you think this is possible. If I don't even know what might be relevant to the conversation in several turns, there's no way an agent could either.
reply
One of us is confusing prediction with retrieval. The embedding model doesn't predict what is going to be relevant in several turns, just on the turn at hand. Each turn gets a fresh semantic search against the full body of memory/agent comms. If the conversation or prompt changes the next query surfaces different context automatically.

As you build up a "body of work" it gets better at handling massive, disparate tasks in my admittedly short experience. Been running this for two weeks. Trying to improve it.
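
The retrieval-not-prediction point can be shown with a toy sketch: each turn embeds the current query fresh and ranks the whole memory store against it, so a changed conversation automatically surfaces different context. A real system would use a learned embedding model; a bag-of-words vector stands in here so the sketch stays self-contained, and all names are illustrative.

```python
# Per-turn retrieval sketch: nothing is predicted ahead of time.
# Each query is embedded fresh and matched against all stored memory.
import math
from collections import Counter

def embed(text):
    # Stand-in for a real embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class MemoryStore:
    def __init__(self):
        self.entries = []  # (text, vector) pairs

    def add(self, text):
        self.entries.append((text, embed(text)))

    def retrieve(self, query, k=3):
        """Fresh search every turn: ranking depends only on the current query."""
        q = embed(query)
        scored = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in scored[:k]]
```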

reply
As you noted briefly, a big drawback is not getting to take advantage of the cache. Seems like a pretty big drawback.
reply
Yes, it will destroy most of the caching potential. On the other hand, the average context window needed to achieve the same type of task may be much smaller. This might make up for it. And with a better harness, fewer rounds may be needed. Plus, hopefully costs will go down. There is a lot of hope in this comment though.
reply
Yeah, opencode was/is like this and they never got caching right. Caching is a BIG DEAL to get right.
reply
Now I see why Anthropic isn't too happy with third-party clients. Those clients may not be as gentle on Anthropic's capacity as its own client, whose interests are aligned with minimizing token consumption. A tricky dynamic.
reply
> the industry obsession

Or maybe they haven't thought about it?

Or they tried some simple alternatives and didn't find clear benefits?

> The key is to give the agent not just the ability to pull things into context, but also remove from it.

But then you need rules to figure out what to remove. Which probably involves feeding the whole thing to a(nother?) model anyway, to do that fuzzy heuristic judgment of what's important and what's a distraction. And simply removing messages doesn't add any structure, you still just have a sequence of whatever remains.

reply
What I'm thinking is: when the agent wants to open more files or messages, eventually there will be no context left. The agent is then essentially forced to hide some files and messages in order to proceed. Any other commands are refused until the agent makes room in the context. Maybe the best models will be able to handle this responsibility; a bad model will just hide everything and then forget what it was working on.
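
That forced-eviction rule is simple enough to sketch: the harness tracks a token budget, and once opening something would exceed it, every command except `hide` is refused. The class name, method names, and budget numbers are all made up for illustration.

```python
# Sketch of harness-enforced eviction: the agent cannot open anything new
# once the context budget is exhausted; it must hide something first.
class ContextBudget:
    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.items = {}  # item name -> token cost while open in context

    def used(self):
        return sum(self.items.values())

    def open(self, name, cost):
        free = self.max_tokens - self.used()
        if cost > free:
            # Refuse the command; the agent must make room before proceeding.
            return f"REFUSED: opening {name} needs {cost} tokens, only {free} free. Hide something first."
        self.items[name] = cost
        return f"opened {name}"

    def hide(self, name):
        self.items.pop(name, None)
        return f"hid {name}"
```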
reply
> The key is to give the agent not just the ability to pull things into context, but also remove from it

Of course Anthropic/OpenAI can do it. And the next day everyone will be complaining about how much Claude/Codex has been dumbed down. They don't even comply with the context anymore!

reply
To be utterly shameless, this what I've been building: https://github.com/ASIXicle/persMEM

Three persistent Claude instances share AMQ, with an additional Memory Index queried via an embedding model (which I'm literally upgrading to Voyage 4 nano as I type). It's working well so far: I have an instance, Wren, "alive" and functioning very well for 12 days and counting, swapping in and out of context via the MCP without relying on any of Anthropic's tools.

And it's on a cheap LXC, 8GB of RAM, N97.

reply
Why is shame a factor at all in sharing your work?
reply
Good point. I guess because I'm new here I'm not sure about the decorum policy for self-promotion.

I just make stuff to share with others, so yeah, good point.

reply
Hmm.

Maybe there’s a way to play around with this idea in pi. I’ll dig into it.

reply