To be cost-competitive with inference providers, you have to find some way to keep your own hardware busy 24/7.
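For a rough sense of how close to 24/7 you'd need to run, here's a back-of-envelope sketch. Everything in it is an assumption for illustration: the function names are made up, the numbers in the example are placeholders rather than real prices, and it ignores things like your time, cooling, and batching.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_ownership_cost(hardware_cost, resale_value, lifetime_days,
                         power_watts, price_per_kwh):
    """Cost per day of owning the box: depreciation plus electricity.

    Assumes (pessimistically) that the machine draws power_watts around the clock.
    """
    depreciation = (hardware_cost - resale_value) / lifetime_days
    electricity = power_watts / 1000 * 24 * price_per_kwh
    return depreciation + electricity


def breakeven_utilization(daily_cost, api_price_per_million_tokens,
                          tokens_per_second):
    """Fraction of the day the machine must be generating tokens to match
    what the same tokens would cost from a provider.

    A result above 1.0 means you cannot break even at this throughput.
    """
    breakeven_tokens = daily_cost / api_price_per_million_tokens * 1_000_000
    max_tokens_per_day = tokens_per_second * SECONDS_PER_DAY
    return breakeven_tokens / max_tokens_per_day


if __name__ == "__main__":
    # Placeholder inputs; substitute your own hardware and provider pricing.
    cost = daily_ownership_cost(hardware_cost=1000, resale_value=500,
                                lifetime_days=730, power_watts=350,
                                price_per_kwh=0.15)
    share = breakeven_utilization(cost, api_price_per_million_tokens=0.50,
                                  tokens_per_second=40)
    print(f"daily cost: {cost:.2f}, required utilization: {share:.0%}")
```

If the required utilization comes out near or above 100%, that's the "24/7" problem in numbers.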
If they decided to collude, they could absolutely say, "from now on you no longer have access to model X because you're an asshole."
The commercial inference offerings are also downstream of one of those 3 projects (or trt-LLM if they're NVIDIA). It would impact Ollama and Fireworks together, and everyone else.
Don't tempt fate.
You're better off setting a budget and buying the best machine you can afford in that range, or picking a VRAM target and accepting the class of models you can run on it. Those models will almost certainly improve over time, and your skills will adapt to the limitations. Hardware is so valuable right now that you're unlikely to take a significant loss even if you have to sell.
Right now I think 24 GB (a used 3090) is probably the best bang for your buck, because you also get a high-end gaming/GPGPU device, which is nice anyway. You can do 32 GB with AMD or Intel, but NVIDIA is megabucks, and at this point you're really paying for RAM. Unfortunately, the ship has sailed on "reasonably" priced RTX 6000s, which at one point were about $7k and are now being listed at $10k++.
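If you go the "pick a VRAM target" route, a quick way to see which class of models fits is the usual rule of thumb that weight memory is roughly parameter count times bytes per weight. The sketch below is only that rule of thumb: the helper names are made up, and the flat `headroom_gb` for KV cache and runtime overhead is an assumed placeholder that in practice depends heavily on context length and the runtime you use.

```python
def weights_gb(params_billion, bits_per_weight):
    """Approximate size of the model weights in GB.

    1e9 parameters at 8 bits each is about 1 GB, so scale from there.
    """
    return params_billion * bits_per_weight / 8


def fits(params_billion, bits_per_weight, vram_gb, headroom_gb=2.0):
    """True if the weights plus an assumed flat headroom fit in vram_gb.

    headroom_gb is a guess standing in for KV cache and runtime overhead.
    """
    return weights_gb(params_billion, bits_per_weight) + headroom_gb <= vram_gb


if __name__ == "__main__":
    # Compare a few model sizes at 4-bit quantization against 24 GB and 32 GB cards.
    for size in (8, 14, 32, 70):
        print(size, "B @ 4-bit:",
              "24 GB" if fits(size, 4, 24) else
              "32 GB" if fits(size, 4, 32) else "neither")
```

Treat the answer as a first filter, not a guarantee; long contexts can eat far more than the assumed headroom.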