How long can you keep adding novel material to the start of every session's context and still get good performance before the model loses track of which parts of that context are relevant to which tasks?
IMO, for work on large codebases, sticking to "what the out-of-the-box training does" is going to scale better to larger amounts of business logic than creating ever more not-in-the-training-data context that has to be bootstrapped for every task. Every "here's an example to think about" takes away space that could be used for "here is the specific code I want modified."
The sort of framework you mention in another reply ("No, it was created by our team of engineers over the last three years based on years of previous PhD research.") is likely a bit special: you may gain a lot of expressiveness for the up-front cost. But that is very much not the common situation for in-house framework development, and it could well become even rarer over time given current trends.
Today, yes. I assume it will be integrated differently in the future; maybe we'll have JIT fine-tuning. This is where the innovation from the foundation model providers will come in: figuring out how to quickly add new knowledge to the model.
Or maybe we'll have lots of small fine-tuned models. But the point is, we already have ways to "teach" models new things, and those ways will get better. Just as we have ways to teach humans new things, and we keep getting better at that too.
A human seeing a new programming language still has to apply previous knowledge of other programming languages to the problem before they can really understand it. We're making LLMs do the same thing.
LLMs are really good at doing that. Arguably better than humans at RTFM and then applying what's there.