I do think we need another layer, but it should be a routing layer. I am finalizing my pi-brains extension for Pi (https://github.com/earendil-works/pi) which does this:
https://github.com/gitsense/pi-brains
Right now, humans need to define the routing rules for how information is accessed, but I plan to support what I call "knowledge agents" that monitor conversations and inject context when needed.
What value do you expect to get out of this that the existing options don't already offer?
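To make "routing rules" a bit more concrete, here is a minimal sketch of the idea in Python. This is not the pi-brains rule format; the `RoutingRule` class, the `route` function, and the `notes/*.md` paths are all made up for illustration. It just shows a human-written rule mapping a pattern in the conversation to a place where the relevant context lives.

```python
import re
from dataclasses import dataclass


@dataclass
class RoutingRule:
    pattern: str  # regex matched against the user's message
    source: str   # hypothetical location of the context to inject


# Hand-written rules: a human decides which topics map to which notes.
RULES = [
    RoutingRule(r"\b(deploy|release)\b", "notes/deploy.md"),
    RoutingRule(r"\b(auth|login|token)\b", "notes/auth.md"),
]


def route(message: str, rules: list[RoutingRule] = RULES) -> list[str]:
    """Return the context sources whose pattern matches the message."""
    return [
        rule.source
        for rule in rules
        if re.search(rule.pattern, message, re.IGNORECASE)
    ]
```

A "knowledge agent" would presumably replace the hand-written `RULES` list with something that watches the conversation and decides what to inject, but the shape of the output (a short list of sources to pull in) could stay the same.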
If this works, it means we can probably get by with smaller models (since they don't need to know everything). LLMs are pattern matchers, and if you can provide them with the right shape (context), they should produce the expected output.
For my solution to work, you need business buy-in, which I don't think will be a problem. Enterprises want to know how tokens are being spent, so I can see them wanting structured analysis during code reviews.
What may also not be obvious is that the information is ultimately designed to live with your code. Lessons and notes are designed to be mapped to files, so if you want to know why a piece of code is implemented in a certain way, you can have the LLM filter by files to help find the needle in the haystack.
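As a rough sketch of what "mapped to files" could look like (assumed structure, not the actual storage format): each note carries a list of file patterns, and you filter by the file you are looking at. The `Note` class, `notes_for` function, and the example paths are hypothetical.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class Note:
    text: str
    category: str  # e.g. "lesson" or "note"
    files: list[str] = field(default_factory=list)  # glob patterns this note applies to


def notes_for(path: str, notes: list[Note]) -> list[Note]:
    """Return only the notes mapped to the given file path."""
    return [
        note
        for note in notes
        if any(fnmatch(path, pattern) for pattern in note.files)
    ]


NOTES = [
    Note("Retries are capped at 3 because the provider rate-limits us.",
         "lesson", ["src/billing/*"]),
    Note("Session cookies must stay HttpOnly.", "note", ["src/auth/session.py"]),
]
```

With something like this, asking why `src/billing/retry.py` is written the way it is only needs the one matching note in context instead of the whole haystack.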
It is a hard problem, but the only missing piece is discipline, which I believe business leaders will not have an issue with enforcing since we are ultimately talking about eliminating/significantly reducing the bus factor in our code.
If you look at https://github.com/gitsense/smart-ripgrep, you can get a better sense of how context can be injected when it is needed.
Properly categorizing lessons and notes should make them easy to scrub and keep up to date.
I also suggest mapping lessons and notes to files when possible to make discovery and cleanup easier.
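One way the file mapping could help with cleanup, sketched under the assumption that notes are stored as an ID plus a list of exact file paths (the `stale_notes` helper and the IDs below are hypothetical): any note whose files have all been deleted or moved is a candidate for review.

```python
def stale_notes(notes: dict[str, list[str]], existing: set[str]) -> list[str]:
    """Return IDs of notes whose mapped files are all missing from the repo.

    `notes` maps a note ID to the file paths it was attached to.
    `existing` is the set of paths currently in the repo.
    Notes with no file mapping are skipped, since there is nothing to check.
    """
    return sorted(
        note_id
        for note_id, files in notes.items()
        if files and not any(path in existing for path in files)
    )


NOTE_FILES = {
    "lesson-retry-cap": ["src/billing/retry.py"],
    "note-old-cache": ["src/cache/legacy.py"],
    "note-general": [],
}
REPO_FILES = {"src/billing/retry.py", "src/auth/session.py"}
```

In a real setup `existing` might come from something like `git ls-files`, but that part is left out here.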
Also, if context runs out, you can just do "cat todo.md | agent" and you're off to the races again.
That is a sophisticated memory system though -- maybe not to you experienced humans!