If you want true reliability, you need redundant physical locations, power, and networking. That’s extremely easy to achieve on cloud providers.
It doesn't make sense if you only have a few servers, but if you are renting the equivalent of multiple racks of servers from a cloud provider and running them for most of the day, on-prem is staggeringly cheaper.
We have a few racks, and we do the "move to cloud" calculation every few years. Without fail, the cloud comes out at least 3x the cost.
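For anyone who wants to run that kind of comparison themselves, here is a rough sketch of the arithmetic. It is not the calculation the commenter above actually ran; the function names, the cost categories, and every number in the example are made-up placeholders, so plug in your own quotes and invoices.

```python
def monthly_cloud_cost(instances: int, hourly_rate: float,
                       hours_per_day: float, days: int = 30) -> float:
    """Rental cost for a fleet of identical instances (hypothetical model).

    Ignores egress, storage, and support plans, which a real comparison
    would need to add.
    """
    return instances * hourly_rate * hours_per_day * days


def monthly_onprem_cost(hardware_capex: float, amortization_months: int,
                        colo_and_power: float, staff: float) -> float:
    """Hardware spread over its useful life plus recurring monthly costs."""
    return hardware_capex / amortization_months + colo_and_power + staff


def cloud_to_onprem_ratio(cloud: float, onprem: float) -> float:
    """How many times more expensive the cloud option is than on-prem."""
    return cloud / onprem


if __name__ == "__main__":
    # Placeholder inputs only; none of these figures come from the thread.
    cloud = monthly_cloud_cost(instances=40, hourly_rate=0.50, hours_per_day=24)
    onprem = monthly_onprem_cost(hardware_capex=180_000, amortization_months=36,
                                 colo_and_power=2_000, staff=5_000)
    print(f"cloud: {cloud:.0f}/month, on-prem: {onprem:.0f}/month, "
          f"ratio: {cloud_to_onprem_ratio(cloud, onprem):.2f}x")
```

The ratio is very sensitive to `hours_per_day` and `amortization_months`, which fits the point above that the comparison only favors on-prem when the machines are busy most of the day.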
And before the "but you need to do more work" whining I hear from people who have never done it: it's not much more work than navigating a forest of cloud APIs and dealing with random black-box issues in the cloud that you can't really debug, only work around.
On cloud, it's out of your control when an AZ goes down. When it's your server, you can do things to increase reliability. Most colos have redundant power feeds and internet. On-prem that's a bit harder, but you can buy a UPS.
If your head office is hit by a meteor, your business is over. You don't need to prepare for that.
It is a different skillset. SRE is also undervalued and underpaid (unless one is in FAANGO).
It’s also nontrivial once you go past some level of complexity and volume. I have made my career building software, and part of that requires understanding the limitations and specifics of the underlying hardware, but at the end of the day I simply want to provision and run a container. I don’t want to think about the security and networking setup; it’s not worth my time.
Because those services solve the problem for them. It is the same thing with GitHub.
However, as predicted half a decade ago, with GitHub becoming unreliable [0] and price increases beginning to happen, you can see that self-hosting begins to make more sense: you have complete control of the infrastructure, and it has never been easier to self-host and bring costs under control.
> its also fun to solve technical issues you may have.
What you have just seen with coding agents is going to have the same effect on "developers": their skills will decline the moment they become over-reliant on coding agents, and they won't be able to write a single line of code to fix a problem they don't fully understand.
I agree that solving technical issues is very fun, and hosting services is usually easy, but having resilient infrastructure is costly and I simply don't like to be woken up at night to fix stuff while the company is bleeding money and customers.