pm90 2 months ago

There are genuine reasons for running multiple clusters. It sometimes helps to keep stateful workloads (generally databases) on one cluster and stateless workloads on another. Sometimes customers demand complete isolation, so they get their own cluster (although somehow it's ok that the nodes are still VMs that are probably running on shared hosts… these requirements can be arbitrary sometimes).

raverbashing 2 months ago

This doesn't make sense to me, and I feel like this is "holding it wrong"

But this is also a caveat with "managing it yourself": you get a lot of people having ideas and shooting themselves in the foot with it.

oofbey 2 months ago

Honest question - why would you want stateful workloads in a separate cluster from stateless? Why not just use a namespace?

pm90 2 months ago

because namespaces aren’t a failure boundary.

If the API of a stateless cluster gets hosed, you can create a new cluster, tear down the old one, and call it a day.

With a stateful cluster, you can’t do that. As such, you put in a lot more care with e.g. k8s upgrades, or when introducing new controllers or admission/mutating webhooks.

yjftsjthsd-h 2 months ago

> It helps to sometimes keep stateful (databases generally) workloads on one cluster, have another for stateless workloads etc. Sometimes customers demand complete isolation

Are these not covered by taint/toleration? I guess maybe isolation depending on what exactly they're demanding but even then I'd think it could work.

pm90 2 months ago

Yes, but only at the node (data plane) level, not at the API (control plane) level.
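
To make that concrete, here is a rough Python sketch of how I understand taint/toleration matching to work. It is a simplified model, not the scheduler's actual code, and the `workload=stateful` taint is a made-up example. The thing to notice is that it only decides which nodes a pod may land on; both pods below would still talk to the same API server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key with "Exists" tolerates any taint key
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches any effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if this single toleration covers this single taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.operator == "Equal" and tol.value == taint.value


def schedulable(tolerations: list[Toleration], taints: list[Taint]) -> bool:
    """A pod fits a node only if every hard taint is tolerated.

    PreferNoSchedule is a soft preference, so it is ignored here.
    """
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


# Hypothetical setup: database nodes carry workload=stateful:NoSchedule.
db_node_taints = [Taint(key="workload", value="stateful", effect="NoSchedule")]
db_pod = [Toleration(key="workload", operator="Equal", value="stateful",
                     effect="NoSchedule")]
web_pod: list[Toleration] = []  # a stateless pod with no tolerations

print(schedulable(db_pod, db_node_taints))   # database pod may use the node
print(schedulable(web_pod, db_node_taints))  # stateless pod is kept off it
```

So the nodes are partitioned, but a bad upgrade or a broken webhook on the shared control plane still hits both workloads.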

lenkite 2 months ago

The most important reason is the ridiculous etcd limit of 8GB. That alone is the reason for most k8s cluster splits.

mdaniel 2 months ago

I hate etcd probably more than most, but that 8GB seems to just be a warning, unless you have information otherwise: https://etcd.io/docs/v3.5/dev-guide/limit/#storage-size-limi...
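
For reference, my reading of that docs page is that the backend quota defaults to 2GB, is set with the `--quota-backend-bytes` flag, and that 8GB is the suggested maximum etcd warns about at startup if you go over it. A quick Python sketch of the arithmetic, assuming "GB" here means GiB (1024^3 bytes); the `quota_flag` helper is just for illustration:

```python
GIB = 1024 ** 3  # assumption: etcd's "GB" is 1024^3 bytes


def quota_flag(gib: int) -> str:
    """Build the etcd flag string for a backend quota given in GiB."""
    return f"--quota-backend-bytes={gib * GIB}"


print(quota_flag(2))  # the documented default size
print(quota_flag(8))  # the suggested maximum
```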

I'll take this opportunity to once again bitch and moan that Kubernetes just fucking refuses to allow the KV store to be pluggable, unlike damn near everything else in their world, because they think that's funny or something

lenkite 2 months ago

It isn't a mere warning. It is strongly recommended as the upper limit.

https://www.perfectscale.io/blog/etcd-8gb

https://github.com/etcd-io/etcd/issues/9771

And yes, I agree not allowing a pluggable replacement is really stupid.

mdaniel 2 months ago

> https://github.com/etcd-io/etcd/issues/9771

> stale bot marked this as completed (by fucking closing it)

Ah, yes, what would a Kubernetes-adjacent project be without a fucking stale bot to close issues willy nilly

wasmitnetzen 2 months ago

Yes, a few, maybe even 10, 12, but 47? It's also a prime number, so it's not something like running each thing three times for dev, stage and prod.

pm90 2 months ago

yeahhhh 47 seems insane

rvense 2 months ago

> It helps

Can you elaborate on that? What does it do?