Some of the benchmarks appear to back this up [0].
Of course, a lot depends on how you are using it (inference parameters, harness, prompting, etc.), but the model itself is quite important too.
[0]: https://artificialanalysis.ai/models/open-source/small?model...