I think so. The last few months have shown us that it isn't necessarily the models themselves that produce good results, but the tooling / harness around them. Codex, Opus, GLM 5, Kimi 2.5, etc. each have their quirks, but put them in a harness like opencode and give the model the right amount of context, and they'll all perform well and reliably get you a correct answer.
So in my opinion, in a scenario like this where token output is near instant but you're running a lower-tier model, good tooling can close the gap with a frontier cloud model.