Couldn't you build up some internal knowledge that persists, and teach the model that way? A very fast local memory of some sort. You could also specialize the model this way so it becomes very skilled in your domain. The more you use it, the smarter it gets. I guess the hard part is getting the model to decide whether the information stored in memory is sufficient or not.
You could, but trying to build that knowledge into the model weights is driving in the wrong direction, because you'll always run into a capacity limit sooner with a small model than with a larger one. The thing the model is specialised for is linguistic understanding and the reasoning process itself, and you max that out at the expense of domain-specific knowledge. If you take "as few weights as possible" as a given, I think the interesting question is how small you can make the model once memory is externalised. The openclaw and hermes people are all over this sort of memory problem: using the local filesystem or a local database of some sort is exactly a "very fast local memory" where the more you use it, the more knowledge it gathers. Whether that translates to it being "smarter" is a deeper question than it looks.
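To make the "local database as memory" idea concrete, here is a minimal sketch, assuming a single SQLite table and a crude word-overlap score standing in for the "is what's in memory sufficient?" decision. The names (`open_memory`, `remember`, `recall`, `min_overlap`) are made up for illustration and are not taken from openclaw or hermes; a real system would more likely use embeddings, or ask the model itself to judge sufficiency.

```python
import sqlite3


def open_memory(path=":memory:"):
    """Open (or create) the note store. Pass a file path to persist across runs."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
    )
    return db


def remember(db, text):
    """Append one fact to memory; the store only ever grows with use."""
    db.execute("INSERT INTO notes (text) VALUES (?)", (text,))
    db.commit()


def _words(s):
    # Lowercased words with surrounding punctuation stripped.
    return {w.strip(".,?!\"'").lower() for w in s.split()} - {""}


def recall(db, query, min_overlap=0.5):
    """Return (best_note, score, sufficient).

    score is the fraction of query words found in the best-matching note.
    sufficient is a hypothetical stand-in for the model's own decision about
    whether memory is enough or it needs to look elsewhere.
    """
    q = _words(query)
    best, best_score = None, 0.0
    for (text,) in db.execute("SELECT text FROM notes"):
        score = len(q & _words(text)) / len(q) if q else 0.0
        if score > best_score:
            best, best_score = text, score
    return best, best_score, best_score >= min_overlap


db = open_memory()
remember(db, "The staging database runs on port 5433")
remember(db, "Deploys happen on Tuesdays")

note, score, ok = recall(db, "which port does the staging database use")
print(note, ok)
```

Note what this does and doesn't buy you: the store accumulates facts without touching the weights, but the threshold is doing none of the reasoning. Deciding when a retrieved note actually answers the question is still the model's job, which is the part that's harder than it looks.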