Item 42254486

merpkz • 2 months ago

Why would those brick everything? You update node one by one and take it slow, so issues will become apparent after upgrade and you have time to solve those - whole point of having clusters comprised of many redundand nodes.

karmarepellent • 2 months ago

I think it depends on the definition of "bricking the cluster". When you start to upgrade your control plane, your control plane pods restart one after one, and not only those on the specific control plane node. So at this point your control plane might not respond anymore if you happen to run into a bug or some other issue. You might call it "bricking the cluster", since it is not possible to interact with the control plane for some time. Personally I would not call it "bricked", since your production workloads on worker nodes continue to function.

Edit: And even when you "brick" it and cannot roll back, there is still a way to bring your control plane back by using an etcd backup, right?

mrweasel • 2 months ago

Not sure if this has changed, but there have been companies admitting to simply nuking Kubernetes clusters if they fail, because it does happens. The argument, which I completely believe, is that it's faster to build a brand new cluster than debugging a failed one.

1 reply

nasmorn • 2 months ago

I had this happen on a small scale and it scared me a lot. It felt like your executable suddenly falling apart and you now need to fix it in assembly. My takeaway was that the k8s abstraction is way leakier than it is made out to be