Other than the firewall (itself a mini PC), I only have one server where a failure would cause issues: it's connected to the HDDs I use for high-capacity storage, and it has the GPU that Jellyfin uses for transcoding. Even then, only Jellyfin would stop working; the other services, which have lower storage needs, would keep running, since their storage is replicated across multiple nodes using Longhorn.

Kubernetes adds a lot of complexity initially, but it does make it easier to add fault tolerance for hardware failures, especially in conjunction with a replicated storage provider like Longhorn. I only knew that I had a failed node because some services didn't come back up until I cordoned and drained the node (it looks like there are various projects to automate this; I should look into those).
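
For what it's worth, here is a rough sketch of what automating that manual step might look like. This is not one of the existing projects, just an illustration under some assumptions: `kubectl` is on the PATH and already pointed at the cluster, and the function names (`not_ready_nodes`, `cordon_and_drain`, `remediate`) are made up for this example. It reads the node list as JSON, picks out any node whose `Ready` condition is not `"True"`, then cordons and drains it.

```python
import json
import subprocess


def not_ready_nodes(node_list: dict) -> list[str]:
    """Return names of nodes whose Ready condition is not "True"."""
    failed = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        # A missing Ready condition, or a status of "False"/"Unknown",
        # is treated as a failed node.
        if ready is None or ready.get("status") != "True":
            failed.append(node["metadata"]["name"])
    return failed


def cordon_and_drain(name: str) -> None:
    """Mark the node unschedulable, then evict its pods.

    Hypothetical helper: assumes kubectl is installed and configured.
    """
    subprocess.run(["kubectl", "cordon", name], check=True)
    subprocess.run(
        [
            "kubectl", "drain", name,
            "--ignore-daemonsets",
            "--delete-emptydir-data",
            # Assumption: give up after two minutes rather than waiting forever.
            "--timeout=120s",
        ],
        check=True,
    )


def remediate() -> None:
    """Query the cluster and drain every node that is not Ready."""
    raw = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    for name in not_ready_nodes(json.loads(raw)):
        print(f"node {name} is not Ready; cordoning and draining")
        cordon_and_drain(name)
```

Calling `remediate()` from a cron job or a small loop would be the simplest way to use it. A drain against an unreachable node can stall, which is why the sketch sets a timeout; if the timeout is hit, `check=True` raises `subprocess.CalledProcessError`. A real setup would probably also want to wait a few minutes before acting, so a node that is merely rebooting doesn't get drained.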