Kubernetes adds a lot of complexity initially, but it does make it easier to add fault tolerance for hardware failures, especially in conjunction with a replicating filesystem provider like Longhorn. I only knew that I had a failed node because some services didn't come back up until I drained and cordoned the node from the cluster (looks like there are various projects to automate this—I should look into those).