If your application doesn't need, and likely won't need, to scale to large clusters or multiple clusters, then there's nothing wrong per se with your solution. I don't think k8s is that hard, but there are a lot of moving pieces and there's a bit to learn. Finding someone with experience to help you can make a ton of difference.
Questions worth asking:
- Do you need a load balancer?
- TLS certs and rotation?
- Horizontal scalability.
- HA/DR
- dev/stage/production + being able to test/stage your complete stack on demand.
- CI/CD integrations, tools like ArgoCD or Spinnaker
- Monitoring and/or alerting with Prometheus and Grafana
- Would you benefit from being able to deploy a lot of off-the-shelf software (let's say Elasticsearch, or some random database, or a monitoring stack) via Helm quickly and easily?
- "Ingress"/proxy.
- DNS integrations.
If you answer yes to many of those questions there's really no better alternative than k8s. If you're building large enough scale web applications, the answer to most of these will end up being yes at some point.
Every item on that list is "boring" tech. Approximately everyone has used load balancers, test environments and monitoring since the 90s just fine. What is it that you think makes Kubernetes especially suited for this compared to every other solution during the past three decades?
There are good reasons to use Kubernetes, mainly if you are using public clouds and want to avoid lock-in. I may be partial, since managing it pays my bills. But it is complex, mostly unnecessarily so, and no one should be able to say with a straight face that it achieves better uptime or requires less personnel than any alternative. That's just sales talk, and should be a big warning sign.
It's the way things work together. If you want to add a new service you just annotate that service and DNS gets updated, your ingress gets the route added, cert-manager gets you the certs from let's encrypt. You want Prometheus to monitor your pod you just add the right annotation. When your server goes down k8s will move your pod around. k8s storage will take care of having the storage follow your pod. Your entire configuration is highly available and replicated in etcd.
It's just very different than your legacy "standard" technology.
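To make the "just annotate it" idea concrete, here is a rough sketch in Python that builds the kind of Service and Ingress objects being described and prints them as JSON (which `kubectl apply -f -` accepts). The annotation keys are the ones commonly used by cert-manager, external-dns and Prometheus scrape configs, but whether they do anything depends on those add-ons being installed and configured in your cluster. The service name, hostname, port and the `letsencrypt-prod` issuer name are all made up for illustration.

```python
import json


def annotated_manifests(name: str, host: str, port: int) -> list[dict]:
    """Build a Service and an Ingress whose annotations ask cluster add-ons to act."""
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "annotations": {
                # Widely used convention; only works if your Prometheus
                # scrape config is set up to look for it (assumption).
                "prometheus.io/scrape": "true",
                "prometheus.io/port": str(port),
            },
        },
        "spec": {"selector": {"app": name}, "ports": [{"port": port}]},
    }
    ingress = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": name,
            "annotations": {
                # Hypothetical ClusterIssuer name; cert-manager must be installed.
                "cert-manager.io/cluster-issuer": "letsencrypt-prod",
                # Read by external-dns, if it is running in the cluster.
                "external-dns.alpha.kubernetes.io/hostname": host,
            },
        },
        "spec": {
            "tls": [{"hosts": [host], "secretName": f"{name}-tls"}],
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": name, "port": {"number": port}}},
                }]},
            }],
        },
    }
    return [service, ingress]


if __name__ == "__main__":
    for manifest in annotated_manifests("billing", "billing.example.com", 8080):
        print(json.dumps(manifest, indent=2))
```

The point is less the specific keys and more that DNS, TLS and monitoring are all driven from the same declarative object.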
None of this is difficult to do or automate, and we've done it for years. Kubernetes simply makes it more complex by adding additional abstractions in the pursuit of pretending hardware doesn't exist.
There are, maybe, a dozen companies in the world with a large enough physical footprint where Kubernetes might make sense. Everyone else is either engaged in resume-driven development, or has gone down some profoundly wrong path with their application architecture to where it is somehow the lesser evil.
I used to feel the same way, but have come around. I think it's great for small companies for a few reasons. I can spin up effectively identical dev/ci/stg/prod clusters for a new project in an hour for a medium sized project, with CD in addition to everything GP mentioned.
I basically don't have to think about ops anymore until something exotic comes up, it's nice. I agree that it feels clunky, and it was annoying to learn, but once you have something working it's a huge time saver. The ability to scale without drastically changing the system is a bonus.
> I can spin up effectively identical dev/ci/stg/prod clusters for a new project in an hour for a medium sized project, with CD in addition to everything GP mentioned.
I can do the same thing with `make local` invoking a few bash commands. If the complexity increases beyond that, a mistake has been made.
You could say the same thing about Ansible or Vagrant or Nomad or Salt or anything else.
I can say with complete confidence however, that if you are running Kubernetes and not thinking about ops, you are simply not operating it yourself. You are paying someone else to think about it for you. Which is fine, but says nothing about the technology.
You always have to think about ops, regardless of tooling. I agree that you can have a very nice, reproducible setup with any of those tools though. Personally, I haven't found those alternatives to be significantly easier to use (though I don't have experience with Salt).
For me personally, self hosted k3s on Hetzner with FluxCD is the least painful option I've found.
Managed k8s is great if you are already in the cloud; self-hosting it as a small company is a waste of money.
I've found self hosted k3s to be about the same effort as EKS for my workloads, and maybe 20-30% of the cost for similar capability.
> Every item on that list is "boring" tech. Approximately everyone has used load balancers, test environments and monitoring since the 90s just fine. What is it that you think makes Kubernetes especially suited for this compared to every other solution during the past three decades?
You could make the same argument against using cloud at all, or against using CI. The point of Kubernetes isn't to make those things possible, it's to make them easy and consistent.
> The point of Kubernetes isn't to make those things possible, it's to make them easy and consistent.
Kubernetes definitely makes things consistent, but I do not think that it makes them easy.
There’s certainly a lot to learn from Kubernetes, but I strongly believe that a more tasteful successor is possible, and I hope that it is inevitable.
I haven't worked in k8s, but really what is being argued is that it is a cross cloud standardization API, largely because the buzzword became big enough that the cloud providers conformed to it rather than keep their API moat.
However all clouds will want API moats.
It is also true that k8s appears too complex for the low end, and there is a strong lack of a cross cloud standardization (maybe docker but that is too low) for that use case.
K8s is bad at databases. So k8s is incomplete as well. It also seems to lack good UIs, but that impression/claim may be only lack of exposure.
What is blindingly true to me is that the building blocks at a cli level for running and manipulating processes/programs/servers in a data center, what was once kind of called a "dc os" is really lacking.
Remote command exec needs ugly ssh wrapping assuming the network boundaries are free enough (k8s requires an open network between all servers iirc), and of course ssh is under attack by teleport and other enterprise fiefdom builders.
Docker was a great start. Parallel ssh is a crude tool.
I've tried multiple times to make a swarm admin tool that was cross cloud and cross framework and cross command and stdin/stdout/stderr transport agnostic. It's hard.
But none of those things are easy. All cloud environments are fairly complex and kubernetes is not something that you just do in an afternoon. You need to learn about how it works, which takes about the same time as using 'simpler' means to do things directly.
Sure, it means that two people that already understand k8s can easily exchange or handover a project, which might be harder to understand if done with other means. But that's about the only bonus it brings in most situations.
> All cloud environments are fairly complex and kubernetes is not something that you just do in an afternoon. You need to learn about how it works, which takes about the same time as using 'simpler' means to do things directly.
The first time you do it, sure, like any other tool. But once you're comfortable with it and have a working setup, you can bash out "one more service deployment" in a few minutes. That's the key capability.
The other bonus is that most open source software supports a Kubernetes deployment. This means I can find software and have it deployed pretty quickly.
Kubernetes is boring tech as well.
And the advantage of it is one way to manage resources, scaling, logging, observability, hardware etc.
All of which is stored in Git and so audited, reviewed, versioned, tested etc in exactly the same way.
> But it is complex, mostly unnecessarily so
Unnecessary complexity sounds like something that should be fixed. Can you give an example?
Kubernetes is a great example of the "second-system effect".
Kubernetes only works if you have a webapp written in a slow interpreted language. For anything else it is a huge impedance mismatch with what you're actually trying to do.
P.S. In the real world, Kubernetes isn't used to solve technical problems. It's used as a buffer between the dev team and the ops team, who usually have different schedules/budgets, and might even be different corporate entities. I'm sure there might be an easier way to solve that problem without dragging in Google's ridiculous and broken tech stack.
> It's used as a buffer between the dev team and the ops team, who usually have different schedules/budgets
That depends on your definition. If the ops team is solely responsible for running the Kubernetes cluster, then yes. In reality that's rarely how things turn out. Developers want Kubernetes, because.... I don't know. Ops doesn't even want Kubernetes in many cases. Kubernetes is amazing, for those few organisations that really need it.
My rule of thumb is: If your worker nodes aren't entire physical hosts, then you might not need Kubernetes. I've seen some absolutely crazy setups where developers had designed an entire solution around Kubernetes, only to run one or two containers. The reasoning is pretty much always the same: they know absolutely nothing about operations, and fail to understand that load balancers exist outside of Kubernetes, or that their solution could be an nginx configuration, 100 lines of Python and some systemd configuration.
I accept that I lost the fight that Kubernetes is overly complex and a nightmare to debug. In my current position I can even see some advantages to Kubernetes, so I was at least a little off in my criticism. Still, I don't think Kubernetes should be your default deployment platform, unless you have very specific needs.
I think I live in the real world and your statement is not true for any of the projects I've been involved in. Kubernetes is absolutely used to solve real technical problems that would otherwise require a lot of work to solve. I would say as a rule it's not a webapp in a slow interpreted language that's hosted in k8s. It truly is about decoupling from the need to manage machines and operating systems at a lower level and being able to scale seamlessly.
I'm really not following on the impedance mismatch from what you're actually trying to do. Where is that impedance mismatch? Let's take a simple example, Elastic Search and the k8s operator. You can edit a single line in your yaml and grow your cluster. That takes care of resources, storage, network etc. Can you do this manually with Elastic running on bare metal or in containers or in VM? Absolutely, it's a nightmare, non-replicable, process that will take you days. You don't need elastic search, or you never need to scale it, fine. You can run it on a single machine and lose all your data if that machine dies - fine.
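As a rough illustration of the "edit a single line" claim, here is a Python sketch that models an Elasticsearch custom resource as a dict and bumps the node count. The `apiVersion`, `kind` and `spec.nodeSets[].count` layout follow what I recall of the Elastic operator (ECK) schema, so treat the field names as an assumption to check against the operator docs; the cluster name and version are made up. It only shows the declarative change itself; the operator is what would actually provision pods and storage.

```python
import copy


def scale_node_set(manifest: dict, node_set: str, count: int) -> dict:
    """Return a copy of the manifest with one nodeSet's count changed."""
    updated = copy.deepcopy(manifest)
    for ns in updated["spec"]["nodeSets"]:
        if ns["name"] == node_set:
            ns["count"] = count  # the "single line" that changes
            return updated
    raise KeyError(f"no nodeSet named {node_set!r}")


# Hypothetical cluster definition; names and version are illustrative only.
cluster = {
    "apiVersion": "elasticsearch.k8s.elastic.co/v1",
    "kind": "Elasticsearch",
    "metadata": {"name": "logs"},
    "spec": {
        "version": "8.13.0",
        "nodeSets": [{"name": "default", "count": 3}],
    },
}

bigger = scale_node_set(cluster, "default", 5)
```

Whether the resulting rollout is painless on your storage layer is a separate question from how small the diff is.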
I’m curious if you’ve ever built and maintained a k8s cluster capable of reliably hosting an ES cluster? Because I have, and it was painful enough that we swapped to provisioning real HW with ansible. It is much easier to manage.
I should note, we still manage a K8s cluster, but not for anything using persistent storage.
> In the real world, Kubernetes isn't used to solve technical problems. It's used as a buffer between the dev team and the ops team, who usually have different schedules/budgets, and might even be different corporate entities.
At my company I’m both the dev and the ops team, and I’ve used Kubernetes and found it pleasant and intuitive? I’m able to have confidence that situations that arise in production can be recreated in dev, updates are easy, I can tie services together in a way that makes sense. I arrived at K8s after rolling my own scripts and deployment methods for years and I like its well-considered approach.
So maybe resist passing off your opinions as sweeping generalizations about “the real world”.
Contrary to popular belief, k8s is not Google's tech stack.
My understanding is that it was initially sold as Google's tech to benefit from Google's tech reputation (exploiting the confusion caused by the fact that some of the original k8s devs were ex-googlers), and today it's also Google trying to pose as k8s inventor, to benefit from its popularity. Interesting case of host/parasite symbiosis, it seems.
Just my impression though, I can be wrong, please comment if you know more about the history of k8s.
Is there anyone that works at Google that can confirm this?
What's left of Borg at Google? Did the company switch to the open source Kubernetes distribution at any point? I'd love to know more about this as well.
> exploiting the confusion caused by the fact that some of the original k8s devs were ex-googlers
What about the fact that many active Kubernetes developers, are also active Googlers?
I'm an Ex-Google SRE. Kubernetes is not Borg, will never be Borg, and Borg does not need to borrow from k8s - Most of the "New Features" in K8s were things Google had been doing internally for 5+ years before k8s launched. Many of the current new features being added to k8s are things that Google has already studied and rejected - It breaks my heart to see k8s becoming actively worse on each release.
A ton of the experience of Borg is in k8s. Most of the concepts translate directly. The specifics about how borg works have changed over the years, and will continue to change, but have never really matched K8s - Google is in the business of deploying massive fleets, and k8s has never really supported cluster sizes above a few thousand. Google's service naming and service authentication is fully custom, and k8s is... fine, but makes a lot of concessions to more general ideas. Google was doing containerization before containerization was a thing - See https://lkml.org/lkml/2006/9/14/370 ( https://lwn.net/Articles/199643/ doesn't elide the e-mail address) for the introduction of the term to the kernel.
The point of k8s was to make "The Cloud" an attractive platform to deploy to, instead of EC2. Amazon EC2 had huge mindshare, and Google wanted some of those dollars. Google Cloud sponsored K8s because it was a way to a) apply Google learnings to the wider developer community and b) reduce AWS lock-in, by reducing the number of applications that relied on EC2 APIs specifically. K8s genericized the "launch me a machine" process. The whole goal was making it easier for Google to sell its cloud services, because the difference in deployment models (mostly around lifetimes of processes, but also around how applications were coupled to infrastructure) was a huge impediment to migrating to the "cloud". Kubernetes was an attempt to make an attractive target, one that would work on AWS but commoditized it, so that you could easily migrate to, or simply target first, GCP.
Thank you for the exhaustive depiction of the situation. Also an ex SRE from long ago, although not for borg. One of the learnings I took with me is that there is no technical solution that is good for several orders of magnitude. The tool you need for 10 servers is not the one you need for 1000, etc.
kubernetes is an API for your cluster, that is portable between providers, more or less. there are other abstractions, but they are not portable, e.g. fly.io, DO etc. so unless you want vendor lock-in, you need it. for one of my products, I had to migrate 4 times for business reasons into different kube flavors, from self-managed (2 times) to GKE and EKS.
> there are other abstractions, but they are not portable
Not true. Unix itself is an API for your cluster too, like the original post implies.
Personally, as a "tech lead" I use NixOS. (Yes, I am that guy.)
The point is, k8s is a shitty API because it's built only for Google's "run a huge webapp built on shitty Python scripts" use case.
Most people don't need this, what they actually want is some way for dev to pass the buck to ops in some way that PM's can track on a Gantt chart.
I'm not an insider but afaik anything heavy lifting in Google is C++ or Go. There's no way you can use Python for anything heavy at Google scale, it's just too slow and bloated.
Most stuff I've seen run on k8s is not crappy webapp in Python. If anything that is less likely to be hosted in k8s.
I'm not sure why you call the k8s API shitty. What is the NixOS API for "deploy an auto-scaling application with load balancing and storage"? Does NixOS manage clusters?
How much experience do you have with k8s?
There is no such thing as "auto-scaling".
You can only "auto-scale" something that is horizontally scalable and trivially depends on the number of incoming requests. I.e., "a shitty web-app". (A well designed web-app doesn't need to be "auto-scaled" because you can serve the world from three modern servers. StackOverflow only uses nine and has done so for years.)
As an obvious example, no database can be "auto-scaled". Neither can numeric methods.
If you think StackOverflow is the epitome of scale, then your view of the world is somewhat limited. I worked for a flash sale site in 2008 that had to handle 3 million users, all trying to connect to your site simultaneously to buy a minimal supply of inventory. After 15 minutes of peak scale, traffic will scale back down by 80-90%. I am pretty sure StackOverflow never had to deal with such a problem.
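For reference, the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler is roughly `desired = ceil(current * currentMetric / targetMetric)`, skipped when the ratio is within a small tolerance of 1. Below is a minimal Python sketch of that arithmetic applied to a flash-sale style spike. The metric (requests per second per pod), the numbers, and the min/max bounds are invented for illustration, and the real controller has more behaviour (stabilization windows, missing metrics, pod readiness) that this ignores.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 100,
                     tolerance: float = 0.1) -> int:
    """Replica count suggested by the basic HPA ratio formula."""
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


# Spike: 10 pods each seeing 900 req/s against a 100 req/s target.
peak = desired_replicas(10, 900, 100)
# Afterwards: 90 pods each seeing 15 req/s against the same target.
after = desired_replicas(90, 15, 100)
```

With these made-up numbers the rule asks for 90 pods at peak and 14 once traffic drops. It only helps if the workload is actually horizontally scalable, which is the earlier commenter's objection.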
> If you answer yes to many of those questions there's really no better alternative than k8s.
This is not even close to true with even a small number of resources. The notion that k8s somehow is the only choice is right along the lines of “Java Enterprise Edition is the only choice” — ie a real failure of the imagination.
For startups and teams with limited resources, DO, fly.io and render are doing lots of interesting work. But what if you can’t use them? Is k8s your only choice?
Let’s say you’re a large org with good engineering leadership, and you have high-revenue systems where downtime isn’t okay. Also, for compliance reasons, public cloud isn’t okay.
DNS in a tightly controlled large enterprise internal network can be handled with relatively simple microservices. Your org will likely have something already though.
Dev/Stage/Production: if you can spin up instances on demand this is trivial. Also financial services and other regulated biz have been doing this for eons before k8s.
Load Balancers: lots of non-k8s options exist (software and hardware appliances).
Prometheus / Grafana (and things like Netdata) work very well even without k8s.
Load balancing and ingress are definitely the most interesting pieces of the puzzle. Some choose nginx or Envoy, but there are also teams that use their own ingress solution (sometimes open-sourced!).
But why would a team do this? Or more appropriately, why would their management spend on this? Answer: many don’t! But for those that do — the driver is usually cost*, availability and accountability, along with engineering capability as a secondary driver.
(*cost because it’s easy to set up a mixed ability team with experienced, mid-career and new engineers for this. You don’t need a team full of kernel hackers.)
It costs less than you think, it creates real accountability throughout the stack and most importantly you’ve now got a team of engineers who can rise to any reasonable challenge, and who can be cross pollinated throughout the org. In brief the goal is to have engineers not “k8s implementers” or “OpenShift implementers” or “Cloud Foundry implementers”.
> DNS in a tightly controlled large enterprise internal network can be handled with relatively simple microservices. Your org will likely have something already though.
And it will likely be buggy with all sorts of edge cases.
> Dev/Stage/Production: if you can spin up instances on demand this is trivial. Also financial services and other regulated biz have been doing this for eons before k8s.
In my experience financial services have been notably not doing it.
> Load Balancers: lots of non-k8s options exist (software and hardware appliances).
The problem isn't running a load balancer with a given configuration at a given point in time. It's how you manage the required changes to load balancers and configuration as time goes on. It's very common for that to be a pile of perl scripts that add up to an ad-hoc informally specified bug-ridden implementation of half of kubernetes.
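A toy sketch of what "managing the required changes" means in the declarative style: keep a desired state, diff it against what is actually configured, and apply only the difference until the two match. This is plain Python over dicts of backend pools; the pool names and addresses are invented, and a real system (Kubernetes or otherwise) also has to deal with ordering, failures and health checks, which is where the ad-hoc scripts tend to go wrong.

```python
def plan_changes(desired: dict[str, set[str]],
                 actual: dict[str, set[str]]) -> list[tuple[str, str, str]]:
    """Diff desired vs actual backend pools into (action, pool, backend) steps."""
    steps = []
    for pool in sorted(desired.keys() | actual.keys()):
        want = desired.get(pool, set())
        have = actual.get(pool, set())
        steps += [("add", pool, b) for b in sorted(want - have)]
        steps += [("remove", pool, b) for b in sorted(have - want)]
    return steps


def apply_changes(actual: dict[str, set[str]],
                  steps: list[tuple[str, str, str]]) -> dict[str, set[str]]:
    """Return the state that results from applying the steps to `actual`."""
    result = {pool: set(backends) for pool, backends in actual.items()}
    for action, pool, backend in steps:
        if action == "add":
            result.setdefault(pool, set()).add(backend)
        else:
            result[pool].discard(backend)
            if not result[pool]:
                del result[pool]  # drop pools that end up empty
    return result


# Hypothetical pools and addresses.
desired = {"web": {"10.0.0.1", "10.0.0.2"}, "api": {"10.0.1.1"}}
actual = {"web": {"10.0.0.2", "10.0.0.3"}}

steps = plan_changes(desired, actual)
converged = apply_changes(actual, steps)
```

Running the plan a second time against `converged` yields no steps, which is the property you want from any tool in this space.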
> And it will likely be buggy with all sorts of edge cases.
I have seen this view in corporate IT teams who’re happy to be “implementers” rather than engineers.
In real life, many orgs will in fact have third party vendor products for internal DNS and cert authorities. Writing bridge APIs to these isn’t difficult and it keeps the IT guys happy.
Relatively few orgs have written their own APIs, typically to manage a delegated zone. Again, you can say these must be buggy, but here’s the thing: everything’s buggy. Including k8s. As long as bugs are understood and fixed, no one cares. The proof of the pudding is how well it works.
Internal DNS in particular is easy enough to control and test if you have engineers (vs implementers) in your team.
> manage changes to load balancers … perl
That’s a very black and white view, that teams are either on k8s (which to you is the bees knees) or a pile of Perl (presumably unmaintainable). Speaks to interesting unconscious bias.
Perhaps it comes from personal experience, in which case I’m sorry you had to be part of such a team. But it’s not particularly difficult to follow modern best practices and operate your own stack.
But if your starter stance is that “k8s is the only way”, no one can talk you out of your own mental hard lines.
> Again, you can say these must be buggy, but here’s the thing — everything’s buggy. Including k8s. As long as bugs are understood and fixed, no one cares.
Agreed, but internal products are generally buggier, because an internal product is in a kind of monopoly position. You generally want to be using a product that is subject to competition, that is a profit center rather than a cost center for the people who are making it.
> Internal DNS in particular is easy enough to control and test if you have engineers (vs implementers) in your team.
Your team probably aren't DNS experts, and why should they be? You're not a DNS company. If you could make a better DNS - or a better DNS-deployment integration - than the pros, you'd be selling it. (The exception is if you really are a DNS company, either because you actually do sell it, or because you have some deep integration with DNS that enables your competitive advantage)
> Perhaps it comes from personal experience, in which case I’m sorry you had to be part of such a team. But it’s not particularly difficult to follow modern best practices and operate your own stack.
I'd say that's a contradiction in terms, because modern best practice is to not run your own stack.
I don't particularly like kubernetes qua kubernetes (indeed I'd generally pick nomad instead). But I absolutely do think you need a declarative, single-source-of-truth way of managing your full deployment, end-to-end. And if your deployment is made up of a standard load balancer (or an equivalent of one), a standard DNS, and prometheus or grafana, then you've either got one of these products or you've got an internal product that does the same thing, which is something I'm extremely skeptical of for the same reason as above - if your company was capable of creating a better solution to this standard problem, why wouldn't you be selling it? (And if an engineer was capable of creating a better solution to this standard problem, why would they work for you rather than one of the big cloud corps?)
In the same way I'm very skeptical of any company with an "internal cloud" - in my experience such a thing is usually a significantly worse implementation of AWS, and, yes, is usually held together with some flaky Perl scripts. Or an internal load balancer. It's generally NIH, or at best a cost-cutting exercise which tends to show; a company might have an internal cloud that's cheaper than AWS (I've worked for one), but you'll notice the cheapness.
Now again, if you really are gaining a competitive advantage from your things then it may make sense to not use a standard solution. But in that case you'll have something deeply integrated, i.e. monolithic, and that's precisely the case where you're not deploying separate standard DNS, separate standard load balancers, separate standard monitoring etc.. And in that case, as grandparent said, not using k8s makes total sense.
But if you're just deploying a standard Rails (or what have you) app with a standard database, load balancer, DNS, monitoring setup? Then 95% of the time your company can't solve that problem better than the companies that are dedicated to solving that problem. Either you don't have a solution at all (beyond doing it manually), you use k8s or similar, or you NIH it. Writing custom code to solve custom problems can be smart, but writing custom code to solve standard problems usually isn't.
> if your company was capable of creating a better solution to this standard problem, why wouldn't you be selling it?
Let's pretend I'm the greatest DevOps software engineer ever, and I write a Kubernetes replacement that's 100x better. Since it's 100x better, I simply charge 100x as much as a Kubernetes license costs per CPU/RAM to 1,000 customers, take all of that money to the bank, and I deposit my check for $0.
I don't disagree with the rest of the comment, but the market for the software to host a web app is a weird market.
> and I deposit my check for $0.
Given the number of Nomad fans that show up to every one of these threads, I don't think that's the whole story given https://www.hashicorp.com/products/nomad/pricing (and I'll save everyone the click: it's not $0)
Reasonable people can 100% disagree about approaches, but I don't think the TAM for "software to host a web app" is as small as you implied (although it certainly would be if we took your description literally)
fly.io, vercel, and heroku show you're right about the TAM for the broader problem, and that it's possible to capture some value somewhere, but that's a different beast entirely than just selling a standard solution to a standard problem.
Developers are a hard market to sell to, and deployment software is no exception.
> If you answer yes to many of those questions there's really no better alternative than k8s.
Nah, most of that list is basically free for any company that uses an Amazon load balancer and an autoscaling group. In terms of likelihood of incidents, time, and cost, those will each be an order of magnitude higher with a team of Kubernetes engineers than with a less complex setup.
Oz Nova nailed it nicely in "You Are Not Google"
https://blog.bradfieldcs.com/you-are-not-google-84912cf44afb
If you were Google k8s wouldn't cut it. I have experience of both options in multiple projects. Managing containers yourself and the surrounding infrastructure vs. using k8s. k8s just works. It's a mature ecosystem. It's really not as hard as people make of it. Replicating all the functionality of k8s and the ecosystem yourself is a ton more work.
There are definitely wide swaths of applications that don't need containers, don't need high availability, don't need load balancing, don't need monitoring, don't need any of this stuff or need some simpler subset. Then by all means don't use k8s, don't use containers etc.
If I need "some" of the above, Kubernetes forces me to grapple with "all" of the above. I think that is the issue.
Containerization and orchestration of containers vs learning how to configure HaProxy, how to use Certbot, hmmmm
The questions you pose are legit skills web developers need to have. Nothing you mentioned is obviated by K8s or containerization.
"oh but you can get someone else's pre-configured image" uh huh... sure, you can also install malware. You will also need to one day maintain or configure the software running in them. You may even need to address issues your running software causes. You can't do that without mastering the software you are running!