You could, but trying to build that knowledge into the model weights is heading in the wrong direction, because a small model will always hit its capacity limit sooner than a larger one. What the model is specialised for is linguistic understanding and the reasoning process itself, and you max that out at the expense of domain-specific knowledge. If you take "as few weights as possible" as a given, I think the interesting question is how small you can make the model once its memory is externalised. The openclaw and hermes people are all over this sort of memory problem: using the local filesystem or a local database of some sort is exactly a "very fast local memory", where the more you use it, the more knowledge it gathers. Whether that translates to it being "smarter" is a deeper question than it looks.
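To make the "local database as memory" idea concrete, here is a rough sketch of what I mean. It is not how openclaw or hermes actually do it; `LocalMemory`, `remember`, `recall` and `build_prompt` are names I made up for illustration, and the keyword match is a stand-in for whatever retrieval you would really use. It assumes only Python's standard `sqlite3` module.

```python
import sqlite3


class LocalMemory:
    """Hypothetical external memory: facts live in SQLite, not in the weights."""

    def __init__(self, path=":memory:"):
        # Pass a file path instead of ":memory:" to keep facts across runs.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS facts (topic TEXT, content TEXT)"
        )

    def remember(self, topic, content):
        """Store one fact; the store grows the more it is used."""
        self.db.execute("INSERT INTO facts VALUES (?, ?)", (topic, content))
        self.db.commit()

    def recall(self, query, limit=3):
        """Return stored facts sharing a keyword with the query (naive match)."""
        words = [w.strip("?.,!").lower() for w in query.split()]
        words = [w for w in words if len(w) > 3]
        if not words:
            return []
        clause = " OR ".join("topic LIKE ? OR content LIKE ?" for _ in words)
        params = [p for w in words for p in (f"%{w}%", f"%{w}%")]
        rows = self.db.execute(
            f"SELECT content FROM facts WHERE {clause} LIMIT ?",
            params + [limit],
        ).fetchall()
        return [row[0] for row in rows]


def build_prompt(memory, question):
    """Assemble the text a small model would see: recalled facts, then the question."""
    notes = memory.recall(question)
    context = "\n".join(f"- {n}" for n in notes) or "- (nothing stored yet)"
    return f"Known facts:\n{context}\n\nQuestion: {question}"
```

The model itself never changes here. Only the store grows, and the prompt gets more informed as a result, which is why I think "does it get smarter?" is the harder question.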