That doesn't fix the "you don't know what you don't know" problem which is huge with smaller models. A bigger model with more world knowledge really is a lot smarter in practice, though at a huge cost in efficiency.
Is there already some research or experimentation done into this area?
Picking a model that's juuust smart enough to know it doesn't know is the key.