Since Kimi did a huge amount of Claude distillation, it seems to be somewhat grounded in data:
https://www.anthropic.com/news/detecting-and-preventing-dist...
I'm curious how the bang-for-buck ratio compares. My initial tests on coding tasks have been positive, and I can run it at home. I assume bigger models are still better on harder tasks.