The system had RAID, so disks could be replaced without taking anything down.

We never had a catastrophic failure, but yeah, an hour of lost data wouldn't be the end of the world. For regular maintenance we would just coordinate with the QA team to pause testing temporarily. (Testing was only around-the-clock near a product release.)
