Yes, this is exactly what I'm doing. I isolated the actual math question and sent it to my two servers to process, and that's what's taking 10+ minutes to return. I'm asking them to solve the question and return the full answer along with their steps. I care about correctness, so taking some time is okay, but I can't spend 10 minutes per solution.
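A minimal sketch of one way to bound the wait when sending the same question to two servers: run both requests in parallel and stop waiting after a fixed time budget. The `fan_out` helper, the `budget_s` parameter, and the solver callables are all hypothetical names; each callable stands in for whatever request actually goes to a server, which isn't specified here. This doesn't make any single solve faster, it only caps how long the caller waits.

```python
import concurrent.futures as cf
import time
from typing import Callable, Dict


def fan_out(
    question: str,
    solvers: Dict[str, Callable[[str], str]],
    budget_s: float,
) -> Dict[str, str]:
    """Send one question to every solver in parallel.

    Returns only the answers that arrived within budget_s seconds.
    `solvers` maps a label to a callable (hypothetical stand-in for a
    request to one server).
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(solvers))
    futures = {pool.submit(fn, question): name for name, fn in solvers.items()}

    # Block until every solver finishes or the budget runs out.
    done, _not_done = cf.wait(futures, timeout=budget_s)

    answers = {}
    for fut in done:
        # Skip solvers that raised instead of returning an answer.
        if fut.exception() is None:
            answers[futures[fut]] = fut.result()

    # Don't block on stragglers; calls already running finish in the background.
    pool.shutdown(wait=False, cancel_futures=True)
    return answers


if __name__ == "__main__":
    def quick(q: str) -> str:
        return f"answer to {q!r}"

    def sluggish(q: str) -> str:
        time.sleep(0.5)  # simulate a server that overruns the budget
        return "too late"

    print(fan_out("integrate x^2", {"server_a": quick, "server_b": sluggish}, 0.1))
```

If both servers return in time, the two step-by-step answers can be compared against each other as a cheap correctness check; if only one does, the caller still gets something back within the budget.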
I saw your heat comments about the RTX PRO 6000 as well. I bought a few of them recently and I'm running two of them in a 2U case in a colo. You need a lot of active airflow to keep them cool. Mine range from 23 °C to 80 °C.
After my last run, I'm going to wait for the new case I ordered to come in and cannibalize my kid's PC, which we built at the beginning of this year, to form an entirely separate computer. Then I'll figure out better ways to deal with the heat, especially with summer coming up. I'll have to play around with undervolting and venting directly outside my house to see if that helps.
But the trend here is interesting. I think by 2030 you'll be able to buy hardware that currently costs $10k+ fairly cheaply. I don't know what this does to the trillions invested in AI data centers, because the next NVIDIA architecture after Blackwell will essentially halve the value of purchased cards overnight.
I'm not convinced Apple has pivoted the Mac Studio line towards this market yet, and the M5 Ultras expected in Q3 2026 will likely be an incremental improvement rather than a big leap forward, but I'd like to be proven wrong.
I feel that the open-weight models pale in comparison to the frontier models, and I believe that if the gap closes quickly, the open-weight vendors will stop releasing them for free.