mrweasel 1 day ago

> Isn’t the point of kubernetes that you can run your entire infra in a single cluster

I've never seen that, but yes, 47 seems like a lot. Often you'd need production, staging, test, development, something like that. Then you'd add an additional cluster for running auxiliary services, that is, services that have special network access or are not related to your "main product". Maybe a few of these. Still, that's a long way from 47.

Out in the real world I've frequently seen companies build a cluster per service, or group of services, to better control load and scaling, and again to control network access. It could also be as simple as not all staff being allowed to access the same cluster, due to regulatory concerns. Also, you might not want internal tooling running on the public-facing production cluster.

You also don't want one service, either due to misconfiguration or design flaws, taking down everything because you placed all your infrastructure in one cluster. I've seen Kubernetes crash because some service spun out of control and caused the networking pods to crash, taking out the entire cluster. You don't really want that.

Kubernetes doesn't really provide the same type of isolation as something like VMware, or at least it's not trusted to the same extent.

anthonybsd 1 day ago

>Often you'd need production, staging, test, development, something like that.

Normally in K8s, segregating environments is done via namespaces, not clusters (unless there are some very specific resource constraints).
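A minimal sketch of that pattern (names and numbers below are made up): one Namespace per environment, with a ResourceQuota so one environment can't starve the others:

    # One namespace per environment, each with its own quota
    apiVersion: v1
    kind: Namespace
    metadata:
      name: staging
    ---
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: staging-quota
      namespace: staging
    spec:
      hard:
        requests.cpu: "8"        # cap total CPU requested by staging pods
        requests.memory: 16Gi    # cap total memory requested
        pods: "50"               # cap number of pods

RBAC and NetworkPolicy then scope who can touch and reach each namespace.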

Elidrake24 1 day ago

Which in many cases would break SOC2 compliance (commingling of development and customer resources), and even goes against the basic advice offered in the Kubernetes documentation. Beyond that, it limits your ability to test control plane upgrades against your stack, though those have generally been very stable in my experience.

To be clear I'm not defending the 47 Cluster setup of the OP, just the practice of separating Development/Production.

withinboredom 22 hours ago

Why would you commingle development and customer resources? A k8s cluster is just a control plane that controls where things run, and if you specify that they can’t share resources, that’s the end of that.

If you say sharing the same control plane is commingling… then what do you think a cloud console is? And if you’re using different accounts there… then I hope you’re using dedicated resources for absolutely everything in prod (can’t imagine what you’d pay for dedicated S3 or SQS), because god forbid those two accounts end up on the same machine. Heh, you’re probably violating compliance and didn’t even know it!

Sigh. I digress.

_hl_ 8 hours ago

The frustrating thing with SOC2, or pretty much any compliance requirement, is that it’s less about what’s “technically true” and more about minimizing raised eyebrows.

It does make some sense though. People are not perfect, especially in large organizations, so there is value in just following the masses rather than doing everything your own way.

withinboredom 8 hours ago

Yes. But it also isn’t a regulation. It is pretty much whatever you say it is.

bdndndndbve 1 day ago

I would want to have at least dev + prod clusters; sometimes people want to test controllers, or they have badly behaved workloads that k8s doesn't isolate well (e.g. creating lots of giant etcd objects). You can also test k8s version upgrades in non-prod.

That said it sounds like these people just made a cluster per service which adds a ton of complexity and loses all the benefits of k8s.

withinboredom 22 hours ago

In this case, I use a script to spin up another production cluster, perform my changes, and send some traffic to it. If everything looks good, we shift all traffic over to the new cluster and shut down the old one. Easy peasy. Have you turned your pets into cattle only to create a pet ranch?

mmcnl 20 hours ago

Sometimes there are requirements to separate clusters at the network level.

marcosdumay 23 hours ago

You always want lots of very specific resource constraints between those.

nwatson 1 day ago

The constraint often would be regulatory. Even if isolation is technically possible, management won't risk SOC2 or GDPR non-compliance.

Zambyte 1 day ago

SOC2 is voluntary, not regulatory.

whstl 1 day ago

Indeed. My previous company did this due to regulatory concerns.

One cluster per country in prod, one cluster per team in staging, plus individual clusters for some important customers.

A DevOps engineer famously pointed out that it was stupid since they could access everything with the same SSO user anyway, so the CISO demanded individual accounts with separate passwords and separate SSO keys.

rootlocus 1 day ago

> Then you'd add an additional cluster for running auxiliary service, this is services that has special network access or are not related to you "main product". Maybe a few of these. Still there's a long way to 47.

Why couldn't you do that with a dedicated node pool, namespaces, taints and affinities? This is how we run our simulators and analytics within the same k8s cluster.
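For anyone who hasn't set that up: roughly, you taint the dedicated nodes and give the auxiliary workloads a matching toleration plus node affinity (labels, names and image below are made up, just a sketch of the pattern):

    # Nodes in the dedicated pool are assumed to be tainted and labeled, e.g.:
    #   kubectl taint nodes <node> workload=auxiliary:NoSchedule
    #   kubectl label nodes <node> workload=auxiliary
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: analytics
    spec:
      replicas: 2
      selector:
        matchLabels: { app: analytics }
      template:
        metadata:
          labels: { app: analytics }
        spec:
          tolerations:
            - key: workload
              operator: Equal
              value: auxiliary
              effect: NoSchedule        # allowed onto the tainted pool
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: workload
                        operator: In
                        values: [auxiliary]   # and pinned to it
          containers:
            - name: analytics
              image: example.com/analytics:latest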

mrweasel 1 day ago

You could use a dedicated node pool and limit egress from those nodes, but a separate cluster seems simpler, in the sense that someone is less likely to provision something incorrectly.

In my experience, companies do not trust Kubernetes to the same extent as they'd trust VLANs and VMs. That's probably not entirely fair, but as you can see from many of the other comments, people find Kubernetes extremely difficult to get right.

For some special cases you also have regulatory requirements that could maybe be fulfilled by some combination of node pools, namespacing and so on, but it's not really worth the risk.

From dealing with clients wanting hosted Kubernetes, I can only say that 100% of them have been running multiple clusters. Sometimes for good reason, other times because hosting costs were per project and it's just easier to price out a cluster, compared to buying X% of the capacity on an existing cluster.

One customer I've worked with even ran an entire cluster for a single container, but that was because no one told the developers not to use that service as an excuse to play with Kubernetes. That was its own kind of stupid.

supersixirene 1 day ago

What you just described with one bad actor bringing the entire cluster down is yet another really good reason I’ll never put any serious app on that platform.

AtlasBarfed 1 day ago

K8s requires a flat network addressability model across all containers, meaning anyone can see and call anyone else?

I can see security teams getting uppity about that.

Also budgetary and org boundaries, cloud providers, disaster recovery/hot spares/redundancy/AB hotswap, avoid single tank point of failure.

wbl 1 day ago

Addressability is not accessibility. It's easy to control how services talk to each other through NetworkPolicy.
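A minimal sketch (namespace and labels are made up): once the payments pods are selected by this policy, any ingress not matching the rule is denied, so only checkout pods can reach them on one port:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: payments-allow-checkout
      namespace: prod
    spec:
      podSelector:
        matchLabels:
          app: payments          # the policy protects these pods
      policyTypes: [Ingress]
      ingress:
        - from:
            - podSelector:
                matchLabels:
                  app: checkout  # only checkout pods may connect
          ports:
            - protocol: TCP
              port: 8080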

withinboredom 21 hours ago

This… sounds remarkably like the problems kubernetes solves.

AtlasBarfed 21 hours ago

single tank point of failure should be

single YAML point of failure

mobile autocorrect is super "helpful"

withinboredom 8 hours ago

I have completely tanked a Kubernetes cluster before. Everything kept working. The only problem was that we couldn’t spin up new containers, and if any of the running ones stopped, DNS/networking wouldn’t get updated. So for the few hours while we figured out how to fix what I broke, not many issues happened.

So sure, I can kinda see your point, but it feels rather moot. In the cluster, there isn’t much that is a single point of failure that wouldn’t also be a point of failure across multiple clusters.

mschuster91 1 day ago

> Out in the real world I've frequently seen companies build a cluster per service, or group of services, to better control load and scaling and again to control network access.

Network Policies have solved that at least for ingress traffic.

Egress traffic is another beast: you can't allow egress traffic to a Service, only to pods or IP ranges.
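An egress rule can only point at pod selectors or CIDR blocks, roughly like this (labels and the CIDR are made up), so "allow egress to service X" has to be approximated by the pods behind it or their IP range:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: restrict-egress
    spec:
      podSelector:
        matchLabels:
          app: internal-tool
      policyTypes: [Egress]
      egress:
        - to:
            - podSelector:          # the pods behind the target service
                matchLabels:
                  app: billing-api
        - to:
            - ipBlock:
                cidr: 10.20.0.0/16  # or an IP range for anything external
          ports:
            - protocol: TCP
              port: 443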

mrweasel 1 day ago

I was thinking egress, but you're correct on ingress.