Given the incredible progress of local models, on the present trajectory I think we will see performance comparable to today's frontier models within two years, running on 128GB of unified RAM with 6-bit quantisation. Note how the frontier models are now posting superior benchmark results with only 200,000 tokens. I think we still have a long way to go with distillation.
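For a rough sense of what 128GB and 6-bit weights allow, here is a back-of-envelope sketch. The function name and the 75% usable-memory fraction are my own assumptions (leaving headroom for the OS, KV cache and activations), and it ignores the per-block scale overhead that real quantisation formats add, so treat the numbers as an upper bound rather than a measurement.

```python
def max_params_billions(ram_gb: float, bits_per_weight: float,
                        usable_fraction: float = 0.75) -> float:
    """Estimate how many billions of parameters fit in memory.

    usable_fraction is an assumed share of RAM available for weights;
    the rest is left for the OS, KV cache and activations.
    """
    usable_bytes = ram_gb * 1e9 * usable_fraction
    bytes_per_weight = bits_per_weight / 8  # 6 bits -> 0.75 bytes
    return usable_bytes / bytes_per_weight / 1e9


if __name__ == "__main__":
    # 128GB machine, 6-bit weights, assumed 75% of RAM usable for weights
    print(f"{max_params_billions(128, 6):.0f}B parameters")
    # Same machine if every byte could hold weights (unrealistic ceiling)
    print(f"{max_params_billions(128, 6, usable_fraction=1.0):.0f}B parameters")
```

Under those assumptions the budget works out to roughly 128B parameters, with about 171B as the hard ceiling if all of the RAM went to weights.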