No backups?
I once worked at a company that had a wealth of backups: a backup generator, backup batteries because the generator takes a few seconds to start, a contract for emergency fuel deliveries, a complete failover data centre full of hot standby hardware, 24/7 ops presence, UPSes on the ops PCs just in case, weekly checks that the generators start, quarterly checks by turning off the breakers to the data centre, and so on.

It wasn't until a real incident that we learned that (a) the system wasn't resilient to utility power going on-off-on-off-on-off, as each 'off' drained the batteries while the generator started and each 'on' made the generator shut down again; (b) the ops PCs were on UPSes but their monitors weren't (C13 vs C5 power connector); and (c) the generator couldn't be refuelled while running.

Even if you've got backup systems and you test them, you can never be 100% sure.

> 24/7 ops presence [...] utility power going on-off-on-off-on-off

Wouldn't the 24/7 ops staff have switched the generator to manual once they hit the second or third utility outage?

A plan that has never been executed is really just hope and wishful thinking.
What happens when the backup breaks?
At a certain point, Earth is a single point of failure.
You have a backup for the backup's backup.

Turtles all the way down.

At AWS scale, even unlikely hardware events become common, I guess.

Each turtle gives them another 9. How many 9s are they down due to incidents over the past year?
They've definitely been down more than half a day now, which is only two and a half nines.
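To make the nines arithmetic concrete, here's a rough sketch. It assumes a 365-day year, and for the "each turtle gives another 9" part it assumes every backup layer fails independently of the others, which is exactly the assumption the generator story upthread breaks. The helpers `nines` and `combined_unavailability` are made up for illustration.

```python
import math

HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, 8760 hours


def nines(downtime_hours: float) -> float:
    """Availability expressed as a count of nines: -log10(fraction of time down)."""
    unavailability = downtime_hours / HOURS_PER_YEAR
    return -math.log10(unavailability)


def combined_unavailability(*layer_unavailabilities: float) -> float:
    """Probability that every redundant layer is down at the same time.

    Only valid if the layers fail independently (the big caveat)."""
    result = 1.0
    for u in layer_unavailabilities:
        result *= u
    return result


# Three nines (99.9%) allows about 8.76 hours of downtime per year.
print(f"{HOURS_PER_YEAR * 0.001:.2f} hours")  # 8.76 hours
# Half a day of downtime lands between two and three nines.
print(f"{nines(12):.2f}")  # 2.86
# Two independent layers, each down 10% of the time (one nine each), give two nines.
print(f"{-math.log10(combined_unavailability(0.1, 0.1)):.2f}")  # 2.00
```

So half a day a year is already past the three-nines budget, and stacking turtles only adds nines if the turtles don't all trip over the same flapping utility feed.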
They absolutely have backups, I presume they were ineffective or also down for _reasons_.
The point of being "cloud native" is you build redundancy at higher levels. Instead of having extra pipes and wires, you have extra software that handles physical failures.