>Often you'd need production, staging, test, development, something like that.
Normally in K8s, segregating environments is done via namespaces, not clusters (unless there are some very specific resource constraints).
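(For anyone who hasn't set that up: a minimal sketch with made-up namespace and group names. One namespace per environment, with access scoped per namespace via RBAC.)

```sh
# One namespace per environment (names are hypothetical).
kubectl create namespace dev
kubectl create namespace staging
kubectl create namespace prod

# Give the dev team edit rights in dev only, using the "edit"
# ClusterRole that ships with Kubernetes ("dev-team" is made up).
kubectl create rolebinding dev-team-edit \
  --clusterrole=edit \
  --group=dev-team \
  --namespace=dev
```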
That would in many cases break SOC2 compliance (commingling of development and customer resources), and it goes against the basic advice in the K8s docs. Beyond that, it limits your ability to test control plane upgrades against your stack, though those have generally been very stable in my experience.
To be clear, I'm not defending the OP's 47-cluster setup, just the practice of separating development and production.
Why would you commingle development and customer resources? A k8s cluster is just a control plane that controls where things run, and if you tell it that dev and prod workloads can't share resources, that's the end of that.
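As a sketch of what that looks like (node names, labels, and the image are all made up): taint the prod nodes so nothing lands there by default, then have prod pods tolerate the taint and select those nodes.

```sh
# Reserve a node for prod; nothing schedules there without a toleration.
kubectl taint nodes prod-node-1 env=prod:NoSchedule
kubectl label nodes prod-node-1 env=prod

# A prod pod that tolerates the taint and pins itself to prod nodes.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: api            # hypothetical workload
  namespace: prod
spec:
  nodeSelector:
    env: prod
  tolerations:
    - key: env
      operator: Equal
      value: prod
      effect: NoSchedule
  containers:
    - name: api
      image: registry.example.com/api:latest   # placeholder image
EOF
```

Dev pods get neither the toleration nor the selector, so the scheduler can't put them on prod hardware, period.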
If you say that sharing the same control plane is commingling… then what do you think a cloud console is? And if you're using different accounts there… then I hope you're using dedicated resources for absolutely everything in prod (can't imagine what you'd pay for dedicated S3 and SQS), because god forbid those two accounts end up on the same machine. Heh, you're probably violating compliance and didn't even know it!
Sigh. I digress.
The frustrating thing with SOC2, or pretty much most compliance requirements, is that they are less about what’s “technically true”, and more about minimizing raised eyebrows.
It does make some sense though. People are not perfect, especially in large organizations, so there is value in just following the masses rather than doing everything your own way.
Yes. But it also isn’t a regulation. It is pretty much whatever you say it is.
I would want at least dev + prod clusters: sometimes people want to test controllers, or they have badly behaved workloads that k8s doesn't isolate well (e.g., creating lots of giant etcd objects). You can also rehearse k8s version upgrades in non-prod (sketch below).
That said, it sounds like these people just made a cluster per service, which adds a ton of complexity and loses all the benefits of k8s.
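For the upgrade rehearsal, something like this (assuming EKS and eksctl; the cluster/context name and version are made up):

```sh
# Upgrade the dev control plane first and smoke-test against it.
eksctl upgrade cluster --name dev --version 1.29 --approve

# Sanity-check workloads before repeating the same upgrade on prod.
kubectl --context dev get pods --all-namespaces
```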
In this case, I use a script to spin up another production cluster, apply my changes, and send some traffic to it. If everything looks good, we shift all traffic over to the new cluster and shut down the old one. Easy peasy. Have you turned your pets into cattle only to create a pet ranch?
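Roughly this, as a sketch (assuming EKS via eksctl; the cluster names and the traffic-shift helper are hypothetical, since that part depends entirely on your DNS/load balancer setup):

```sh
#!/usr/bin/env bash
set -euo pipefail

NEW=prod-green   # hypothetical names
OLD=prod-blue

# 1. Spin up the replacement cluster and deploy the existing manifests.
eksctl create cluster --name "$NEW" --region us-east-1
kubectl --context "$NEW" apply -f manifests/

# 2. Send a slice of traffic to the new cluster (placeholder helper;
#    in reality this is weighted DNS or load balancer config).
./set-traffic-weight.sh "$NEW" 10

# 3. If error rates and latency look good, cut over and tear down.
./set-traffic-weight.sh "$NEW" 100
eksctl delete cluster --name "$OLD" --region us-east-1
```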
You always want lots of very specific resource constraints between those environments.
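Concretely, per-namespace ResourceQuota objects are the usual mechanism (the numbers here are made up):

```sh
# Cap what dev can consume so a runaway dev workload can't starve prod.
kubectl apply -f - <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: dev
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 64Gi
    limits.cpu: "40"
    limits.memory: 128Gi
    pods: "200"
EOF
```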