A single DGX Spark can serve a whole department of mathematicians (or programmers), and you can cluster up to 4 of them to fit very large models like GLM-5 and quants of Kimi K2.5. That is approaching frontier-level model size.
I understand the value proposition of the frontier cloud models, but self-hosting is not as far off as you think, and it's becoming more viable for domain-specific models.
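
For anyone who wants to sanity-check the clustering claim, here is a rough back-of-the-envelope sketch. It assumes 128 GB of unified memory per DGX Spark and uses a hypothetical 1-trillion-parameter model as a stand-in (not the actual parameter count of any model named above). It counts weights only, so KV cache, activations, and OS overhead would all eat into the headroom.

```python
# Rough weights-only memory estimate. All constants here are assumptions
# for illustration, not measured figures.

SPARK_MEMORY_GB = 128  # assumed unified memory per DGX Spark


def weight_gb(params: int, bits_per_weight: int) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def fits(params: int, bits_per_weight: int, num_sparks: int) -> bool:
    """True if the weights alone fit in the pooled memory of the cluster."""
    return weight_gb(params, bits_per_weight) <= num_sparks * SPARK_MEMORY_GB


# Hypothetical 1-trillion-parameter model, used only as an example size.
HYPOTHETICAL_PARAMS = 1_000_000_000_000

if __name__ == "__main__":
    for bits in (4, 8, 16):
        need = weight_gb(HYPOTHETICAL_PARAMS, bits)
        ok = fits(HYPOTHETICAL_PARAMS, bits, num_sparks=4)
        print(f"{bits}-bit: {need:.0f} GB of weights, fits on 4 Sparks: {ok}")
```

Under these assumptions, a 4-bit quant of a model that size needs about 500 GB for weights against 512 GB pooled across 4 units, so it only just squeezes in, and anything at 8-bit or above does not. That is why the quantized versions are the realistic target here.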