> If LLMs stop improving at the pace of the last few years (I believe they already are slowing down)

Depending on how you measure "improvement" they already have or they never will :-/

Measuring the model's capability as a function of context length, you reach the limit at around 300k-400k tokens of context; after that you get diminishing returns. We have already passed this point.

Measuring capability purely by output, smarter harnesses in the future may unlock even more improvements in outputs; basically a twist on the "Sufficiently Smart Compiler" (https://wiki.c2.com/?SufficientlySmartCompiler=)

Those are the two extremes, but there's more on the spectrum in between.

reply
300k-400k isn’t the current limit if you create modules and/or organize the code reasonably, for the same reason we do this for humans: it allows us to interact with a component without loading the internals into our context.

you can also execute larger tasks than this using subagents to divide the work so each segment doesn’t exceed the usable context window. i regularly execute tasks that require hundreds of subagents, for example.

in practice the context window is effectively unlimited or at least exceptionally high — 100m+ tokens. it just requires you to structure the work so it can be done effectively — not so dissimilar to what you would do for a person
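
A rough sketch of the "divide the work across subagents" idea above, in Python. Everything here is an assumption for illustration: the module names, the token estimates, and the greedy packing strategy are hypothetical, and the 300k budget is just the figure from upthread. It only shows how work could be grouped so that no single subagent exceeds a usable window, not how any particular agent framework does it.

```python
from dataclasses import dataclass

# Assumption: usable window per subagent, taken from the 300k figure in the thread.
USABLE_TOKENS = 300_000


@dataclass
class Module:
    name: str    # hypothetical module name
    tokens: int  # estimated token count of the module's source (made-up numbers)


def plan_batches(modules, budget=USABLE_TOKENS):
    """Greedily pack modules, in order, into batches that each fit the budget."""
    batches, current, used = [], [], 0
    for m in modules:
        if m.tokens > budget:
            # A module that does not fit on its own has to be split along
            # some internal boundary first; this sketch does not attempt that.
            raise ValueError(f"{m.name} alone exceeds the budget; split it further")
        if used + m.tokens > budget:
            # Close the current batch and start a new one for this module.
            batches.append(current)
            current, used = [], 0
        current.append(m)
        used += m.tokens
    if current:
        batches.append(current)
    return batches


modules = [
    Module("auth", 120_000),
    Module("billing", 200_000),
    Module("search", 90_000),
    Module("ui", 250_000),
]
batches = plan_batches(modules)
for i, batch in enumerate(batches):
    print(i, [m.name for m in batch], sum(m.tokens for m in batch))
```

Each batch would go to one subagent. The hard part the sketch skips is the `ValueError` branch: a unit that does not fit on its own still has to be broken up along some boundary.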

reply
That makes it not a context window.

How to organize code the way you describe, and how agents interact with it so that the actual context window stays small, is the fundamental challenge.

reply
I keep getting surprised that people who are all-in on this ("i regularly execute tasks that require hundreds of subagents") don't have any idea of what is happening even a single layer below their interface to the LLM ("in practice the context window is effectively unlimited or at least exceptionally high, 100m+ tokens").

I looked at that response by GP (rgbrenner) and refrained from replying because if someone is both running hundreds of agents at a time AND oblivious to what "context window" means, there is no possible sane discourse that would result from any engagement.

reply
OK, "series of context windows spread across many agents"... sure, much clearer.

Doesn't change my point: the amount of code the agent can operate on is very large, if not unlimited, as long as you put even a little bit of thought into structuring things so it can be divided along a boundary.

If you let the codebase degrade into spaghetti, then the LLM is going to have the same problem any engineer would have with that. The rules for good code didn't disappear.

reply
Context windows don't necessarily divide cleanly. Getting each agent to complete its task within a single context window is a hard problem.

It's not as if a context window of n with one agent becomes n/10 per agent with 10 agents. It takes some skill, but that is also where a lot of the advances are coming in.
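
A toy model of why the split is not clean, in Python. The model is my own assumption, not something measured: it supposes every agent has to carry some shared context (task description, interfaces, conventions) on top of its slice of the work. The numbers are made up.

```python
def per_agent_tokens(work_tokens, agents, shared_tokens):
    """Tokens each agent holds: shared context plus an even slice of the work.

    Assumed model: the work divides evenly, the shared context does not divide.
    """
    return shared_tokens + work_tokens / agents


WORK = 250_000    # hypothetical: tokens that can be partitioned
SHARED = 50_000   # hypothetical: tokens every agent needs regardless

one_agent = per_agent_tokens(WORK, 1, SHARED)
ten_agents = per_agent_tokens(WORK, 10, SHARED)
naive = one_agent / 10  # what "n/10" would predict

print(one_agent, ten_agents, naive)
```

Under these assumptions one agent needs 300k tokens, but each of ten agents needs 75k rather than the naive 30k, and the shared part sets a floor no matter how many agents are added.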

reply
300k tokens--the usable context window of a single agent--is about 40k lines of code, and you can't figure out a natural breakpoint within that code to divide up the task?
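
The arithmetic behind that estimate, as a quick Python check. The 300k and 40k figures are the ones in the comment; the other tokens-per-line values are assumptions, since the real ratio depends on the language, the tokenizer, and the code style.

```python
USABLE_TOKENS = 300_000  # figure from the comment
LINES_OF_CODE = 40_000   # figure from the comment

# Tokens per line implied by the two figures above.
implied_tokens_per_line = USABLE_TOKENS / LINES_OF_CODE
print(implied_tokens_per_line)  # 7.5


def lines_that_fit(tokens_per_line, budget=USABLE_TOKENS):
    """How many lines fit in the budget at a given average tokens per line."""
    return int(budget // tokens_per_line)


# Assumed alternative averages, only to show how sensitive the estimate is.
for tpl in (7.5, 10, 12):
    print(tpl, lines_that_fit(tpl))
```

So the 40k figure assumes about 7.5 tokens per line; at 10 or 12 tokens per line the same window holds 30k or 25k lines.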
reply
I wish I got to hallucinate at work, and just get a pat on the head for constantly doing the wrong thing.
reply
Maybe I am unlucky, but I have worked with too many developers who couldn't make a good decision if their lives depended on it. LLMs at least know how to convince you of their decisions with strong arguments.
reply
Mmm, I feel it’s more common for them to just blindly agree with whatever you say.

Assistant: “I propose A”

User: “Actually B is better”

Assistant: “you’re absolutely right”

User: “actually let’s go with C”

Assistant: “Good choice, reasons”

User: “wait A is better”

Assistant: “Great decision!”

reply
The title for that is Director, VP, or CTO at any given large enterprise company.
reply
People downvoted you, but I actually know a few of these people.
reply
I mean you can do that, but the job probably doesn't pay too much. Might enrich your spirituality though.
reply