Of course, param count and context length are also important because they increase the model's overall fidelity, but a base model without SFT, RLHF, etc. is effectively useless.
Scale was really the unlock; the new pre- and post-training techniques and architectures are very cool and useful, but they definitely aren't the differentiators compared to the previous era of NLP.
They were allegedly massive, but the returns didn't justify the cost.