The tricky part is that the "number of tokens to good result" does absolutely vary, and you need a decent harness to make it work without too much manual intervention, so figuring out which model is most cost-effective for which tasks is becoming increasingly hard, but several are cost-effective enough.