For what it's worth, I've worked at multiple places that ran shell scripts just fine for their deploys.
- One had only 2 services [php] and ran over 1 billion requests a day. Deploy was trivial, ssh some new files to the server and run a migration, 0 downtime.
- One was in an industry that didn't need "Webscale" (retirement accounts). Prod deploys were just Docker commands run by Jenkins. We ran two servers per service from the day I joined to the day I left 4 years later (3x growth), and ultimately removed one service and one database during all that growth.
Another outstanding thing about both of these places was that we had all the testing environments you need, on-demand, in minutes.
The place I'm at now is trying to do kubernetes and is failing miserably (an ongoing nightmare 4 months in, with probably at least 8 to go, when it was allegedly supposed to take only 3 total). It has one shared test environment, and it takes 3 hours to see your changes in it.
I don't fault kubernetes directly, I fault the overall complexity. But at the end of the day kubernetes feels like complexity trying to abstract over complexity, and often I find that's less successful than removing complexity in the first place.
If your application doesn't need, and likely won't need, to scale to large clusters or multiple clusters, then there's nothing wrong, per se, with your solution. I don't think k8s is that hard, but there are a lot of moving pieces and there's a bit to learn. Finding someone with experience to help you can make a ton of difference.
Questions worth asking:
- Do you need a load balancer?
- TLS certs and rotation?
- Horizontal scalability.
- HA/DR
- dev/stage/production + being able to test/stage your complete stack on demand.
- CI/CD integrations, tools like ArgoCD or Spinnaker
- Monitoring and/or alerting with Prometheus and Grafana
- Would you benefit from being able to deploy a lot of off-the-shelf software (let's say Elasticsearch, or some random database, or a monitoring stack) via Helm quickly/easily?
- "Ingress"/proxy.
- DNS integrations.
If you answer yes to many of those questions there's really no better alternative than k8s. If you're building web applications at a large enough scale, the answer to most of these will end up being yes at some point.
Every item on that list is "boring" tech. Approximately everyone has used load balancers, test environments and monitoring since the 90s just fine. What is it that you think makes Kubernetes especially suited for this compared to every other solution from the past three decades?
There are good reasons to use Kubernetes, mainly if you are using public clouds and want to avoid lock-in. I may be partial, since managing it pays my bills. But it is complex, mostly unnecessarily so, and no one should be able to say with a straight face that it achieves better uptime or requires less personnel than any alternative. That's just sales talk, and should be a big warning sign.
It's the way things work together. If you want to add a new service, you just annotate that service and DNS gets updated, your ingress gets the route added, and cert-manager gets you the certs from Let's Encrypt. If you want Prometheus to monitor your pod, you just add the right annotation. When your server goes down, k8s will move your pod around. k8s storage will take care of having the storage follow your pod. Your entire configuration is highly available and replicated in etcd.
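For illustration, a minimal sketch of that "annotate and the platform reacts" flow, assuming external-dns, cert-manager and an annotation-based Prometheus scrape config are already installed in the cluster; all names and hostnames here are invented:

```bash
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-service
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod           # cert-manager obtains/renews the cert
    external-dns.alpha.kubernetes.io/hostname: app.example.com # external-dns publishes the DNS record
spec:
  tls:
    - hosts: [app.example.com]
      secretName: my-service-tls
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service
                port:
                  number: 80
---
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    prometheus.io/scrape: "true"   # common annotation convention, if your Prometheus is configured for it
    prometheus.io/port: "8080"
spec:
  selector:
    app: my-service
  ports:
    - port: 80
      targetPort: 8080
EOF
```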
It's just very different than your legacy "standard" technology.
None of this is difficult to do or automate, and we've done it for years. Kubernetes simply makes it more complex by adding additional abstractions in the pursuit of pretending hardware doesn't exist.
There are, maybe, a dozen companies in the world with a large enough physical footprint where Kubernetes might make sense. Everyone else is either engaged in resume-driven development, or has gone down some profoundly wrong path with their application architecture to where it is somehow the lesser evil.
I used to feel the same way, but have come around. I think it's great for small companies for a few reasons. I can spin up effectively identical dev/ci/stg/prod clusters for a new project in an hour for a medium sized project, with CD in addition to everything GP mentioned.
I basically don't have to think about ops anymore until something exotic comes up, it's nice. I agree that it feels clunky, and it was annoying to learn, but once you have something working it's a huge time saver. The ability to scale without drastically changing the system is a bonus.
> I can spin up effectively identical dev/ci/stg/prod clusters for a new project in an hour for a medium sized project, with CD in addition to everything GP mentioned.
I can do the same thing with `make local` invoking a few bash commands. If the complexity increases beyond that, a mistake has been made.
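A sketch of what such a target might wrap (hosts, paths and the service name here are invented):

```bash
#!/usr/bin/env bash
# deploy.sh - invoked by `make local`; everything below is illustrative
set -euo pipefail

HOSTS=(app1.internal app2.internal)
RELEASE=$(git rev-parse --short HEAD)

for host in "${HOSTS[@]}"; do
  # copy the build artifacts into a new release directory
  rsync -az ./build/ "deploy@${host}:/srv/app/releases/${RELEASE}/"
  # flip the symlink and reload the service, which is the whole "zero-downtime" trick
  ssh "deploy@${host}" "ln -sfn /srv/app/releases/${RELEASE} /srv/app/current && sudo systemctl reload app.service"
done
```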
You could say the same thing about Ansible or Vagrant or Nomad or Salt or anything else.
I can say with complete confidence, however, that if you are running Kubernetes and not thinking about ops, you are simply not operating it yourself. You are paying someone else to think about it for you. Which is fine, but it says nothing about the technology.
You always have to think about ops, regardless of tooling. I agree that you can have a very nice, reproducible setup with any of those tools though. Personally, I haven't found those alternatives to be significantly easier to use (though I don't have experience with Salt).
For me personally, self hosted k3s on Hetzner with FluxCD is the least painful option I've found.
Managed k8s is great if you're already in the cloud; self-hosting it as a small company is a waste of money.
I've found self hosted k3s to be about the same effort as EKS for my workloads, and maybe 20-30% of the cost for similar capability.
> Every item on that list is "boring" tech. Approximately everyone has used load balancers, test environments and monitoring since the 90s just fine. What is it that you think makes Kubernetes especially suited for this compared to every other solution from the past three decades?
You could make the same argument against using cloud at all, or against using CI. The point of Kubernetes isn't to make those things possible, it's to make them easy and consistent.
> The point of Kubernetes isn't to make those things possible, it's to make them easy and consistent.
Kubernetes definitely makes things consistent, but I do not think that it makes them easy.
There’s certainly a lot to learn from Kubernetes, but I strongly believe that a more tasteful successor is possible, and I hope that it is inevitable.
I haven't worked in k8s, but really what is being argued is that it's a cross-cloud standardization API, largely because the buzzword became big enough that the cloud providers conformed to it rather than keeping their API moats.
However all clouds will want API moats.
It is also true that k8s appears too complex for the low end, and there is a strong lack of a cross-cloud standard for that use case (maybe Docker, but that is too low-level).
K8s is bad at databases, so k8s is incomplete as well. It also seems to lack good UIs, but that impression/claim may just be lack of exposure on my part.
What is blindingly true to me is that the building blocks at a CLI level for running and manipulating processes/programs/servers in a data center, what was once kind of called a "DC OS", are really lacking.
Remote command exec needs ugly ssh wrapping, assuming the network boundaries are free enough (k8s requires an open network between all servers, iirc), and of course ssh is under attack by Teleport and other enterprise fiefdom builders.
Docker was a great start. Parallel ssh is a crude tool.
I've tried multiple times to make a swarm admin tool that was cross-cloud, cross-framework, cross-command, and agnostic about the stdin/stdout/stderr transport. It's hard.
But none of those things are easy. All cloud environments are fairly complex and kubernetes is not something that you just do in an afternoon. You need to learn about how it works, which takes about the same time as using 'simpler' means to do things directly.
Sure, it means that two people that already understand k8s can easily exchange or handover a project, which might be harder to understand if done with other means. But that's about the only bonus it brings in most situations.
> All cloud environments are fairly complex and kubernetes is not something that you just do in an afternoon. You need to learn about how it works, which takes about the same time as using 'simpler' means to do things directly.
The first time you do it, sure, like any other tool. But once you're comfortable with it and have a working setup, you can bash out "one more service deployment" in a few minutes. That's the key capability.
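As a sketch of that, assuming the cluster, ingress and registry plumbing already exist (image name, service name and sizes are placeholders):

```bash
kubectl create deployment orders-api --image=registry.example.com/orders-api:1.4.2 --replicas=2
kubectl set resources deployment orders-api --limits=cpu=500m,memory=256Mi
kubectl expose deployment orders-api --port=80 --target-port=8080
# plus an Ingress/annotations for routing and TLS, along the lines discussed upthread
```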
The other bonus is that most open-source software supports a Kubernetes deployment. This means I can find software and have it deployed pretty quickly.
Kubernetes is boring tech as well.
And the advantage of it is one way to manage resources, scaling, logging, observability, hardware etc.
All of which is stored in Git and so audited, reviewed, versioned, tested etc in exactly the same way.
> But it is complex, mostly unnecessarily so
Unnecessary complexity sounds like something that should be fixed. Can you give an example?
Kubernetes is great example of the "second-system effect".
Kubernetes only works if you have a webapp written in a slow interpreted language. For anything else it is a huge impedance mismatch with what you're actually trying to do.
P.S. In the real world, Kubernetes isn't used to solve technical problems. It's used as a buffer between the dev team and the ops team, who usually have different schedules/budgets, and might even be different corporate entities. I'm sure there might be an easier way to solve that problem without dragging in Google's ridiculous and broken tech stack.
> It's used as a buffer between the dev team and the ops team, who usually have different schedules/budgets
That depends on your definition. If the ops team is solely responsible for running the Kubernetes cluster, then yes. In reality that's rarely how things turn out. Developers want Kubernetes, because... I don't know. Ops doesn't even want Kubernetes in many cases. Kubernetes is amazing, for those few organisations that really need it.
My rule of thumb is: if your worker nodes aren't entire physical hosts, then you might not need Kubernetes. I've seen some absolutely crazy setups where developers had designed this entire solution around Kubernetes, only to run one or two containers. The reasoning is pretty much always the same: they know absolutely nothing about operations, and fail to understand that load balancers exist outside of Kubernetes, or that their solution could be an nginx configuration, 100 lines of Python and some systemd configuration.
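For a concrete picture of that last option, a rough sketch (service name, user and paths invented); nginx then just proxies to the local port:

```bash
cat > /etc/systemd/system/orders.service <<'EOF'
[Unit]
Description=Orders API (one Python process behind nginx)
After=network.target

[Service]
User=app
WorkingDirectory=/srv/orders
ExecStart=/usr/bin/python3 /srv/orders/server.py
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload && systemctl enable --now orders.service
# nginx side: location / { proxy_pass http://127.0.0.1:8080; }
```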
I accept that I lost the fight over Kubernetes being overly complex and a nightmare to debug. In my current position I can even see some advantages to Kubernetes, so I was at least a little off in my criticism. Still, I don't think Kubernetes should be your default deployment platform unless you have very specific needs.
I think I live in the real world and your statement is not true for any of the projects I've been involved in. Kubernetes is absolutely used to solve real technical problems that would otherwise require a lot of work to solve. I would say as a rule it's not a webapp in a slow interpreted language that's hosted in k8s. It truly is about decoupling from the need to manage machines and operating systems at a lower level and being able to scale seamlessly.
I'm really not following on the impedance mismatch with what you're actually trying to do. Where is that impedance mismatch? Let's take a simple example: Elasticsearch and the k8s operator. You can edit a single line in your yaml and grow your cluster. That takes care of resources, storage, network, etc. Can you do this manually with Elastic running on bare metal, or in containers, or in VMs? Absolutely, but it's a nightmare of a non-replicable process that will take you days. Maybe you don't need Elasticsearch, or you never need to scale it, fine. You can run it on a single machine and lose all your data if that machine dies - fine.
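To make the example concrete, a sketch of an ECK (Elastic Cloud on Kubernetes) resource; the version and sizes are illustrative, and scaling out is the one `count` line:

```bash
kubectl apply -f - <<'EOF'
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: logs
spec:
  version: 8.14.0
  nodeSets:
    - name: default
      count: 3              # bump this to grow the cluster; the operator handles the rest
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes: [ReadWriteOnce]
            resources:
              requests:
                storage: 100Gi
EOF
```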
I’m curious if you’ve ever built and maintained a k8s cluster capable of reliably hosting an ES cluster? Because I have, and it was painful enough that we swapped to provisioning real HW with ansible. It is much easier to manage.
I should note, we still manage a K8s cluster, but not for anything using persistent storage.
> In the real world, Kubernetes isn't used to solve technical problems. It's used as a buffer between the dev team and the ops team, who usually have different schedules/budgets, and might even be different corporate entities.
At my company I’m both the dev and the ops team, and I’ve used Kubernetes and found it pleasant and intuitive? I’m able to have confidence that situations that arise in production can be recreated in dev, updates are easy, I can tie services together in a way that makes sense. I arrived at K8s after rolling my own scripts and deployment methods for years and I like its well-considered approach.
So maybe resist passing off your opinions as sweeping generalizations about “the real world”.
Contrary to popular belief, k8s is not Google's tech stack.
My understanding is that it was initially sold as Google's tech to benefit from Google's tech reputation (exploiting the confusion caused by the fact that some of the original k8s devs were ex-Googlers), and today it's also Google trying to pose as k8s's inventor, to benefit from its popularity. Interesting case of host/parasite symbiosis, it seems.
Just my impression though, I can be wrong, please comment if you know more about the history of k8s.
Is there anyone that works at Google that can confirm this?
What's left of Borg at Google? Did the company switch to the open source Kubernetes distribution at any point? I'd love to know more about this as well.
> exploiting the confusion caused by the fact that some of the original k8s devs were ex-Googlers
What about the fact that many active Kubernetes developers are also active Googlers?
I'm an Ex-Google SRE. Kubernetes is not Borg, will never be Borg, and Borg does not need to borrow from k8s - Most of the "New Features" in K8s were things Google had been doing internally for 5+ years before k8s launched. Many of the current new features being added to k8s are things that Google has already studied and rejected - It breaks my heart to see k8s becoming actively worse on each release.
A ton of the experience of Borg is in k8s. Most of the concepts translate directly. The specifics about how borg works have changed over the years, and will continue to change, but have never really matched K8s - Google is in the business of deploying massive fleets, and k8s has never really supported cluster sizes above a few thousand. Google's service naming and service authentication is fully custom, and k8s is... fine, but makes a lot of concessions to more general ideas. Google was doing containerization before containerization was a thing - See https://lkml.org/lkml/2006/9/14/370 ( https://lwn.net/Articles/199643/ doesn't elide the e-mail address) for the introduction of the term to the kernel.
The point of k8s was to make "The Cloud" an attractive platform to deploy to, instead of EC2. Amazon EC2 had huge mindshare, and Google wanted some of those dollars. Google Cloud sponsored K8s because it was a way to a) Apply Google learnings to the wider developer community and b) Reduce AWS lock-in, by reducing the number of applications that relied on EC2 APIs specifically - K8s genericized the "launch me a machine" process. The whole goal was making it easier for Google to sell its cloud services, because the difference in deployment models (Mostly around lifetimes of processes, but also around how applications were coupled to infrastructure) was a huge impediment to migrating to the "cloud". Kubernetes was an attempt to make an attractive target - That would work on AWS, but commoditized it, so that you could easily migrate to, or simply target first, GCP.
Thank you for the exhaustive depiction of the situation. Also an ex SRE from long ago, although not for borg. One of the learnings I took with me is that there is no technical solution that is good for several orders of magnitude. The tool you need for 10 servers is not the one you need for 1000, etc.
kubernetes is an API for your cluster that is portable between providers, more or less. there are other abstractions, but they are not portable, e.g. fly.io, DO etc. so unless you want vendor lock-in, you need it. for one of my products, I had to migrate due to business reasons 4 times into different kube flavors, from self-managed (2 times) to GKE and EKS.
> there are other abstractions, but they are not portable
Not true. Unix itself is an API for your cluster too, like the original post implies.
Personally, as a "tech lead" I use NixOS. (Yes, I am that guy.)
The point is, k8s is a shitty API because it's built only for Google's "run a huge webapp built on shitty Python scripts" use case.
Most people don't need this, what they actually want is some way for dev to pass the buck to ops in some way that PM's can track on a Gantt chart.
I'm not an insider, but afaik anything doing heavy lifting at Google is C++ or Go. There's no way you can use Python for anything heavy at Google scale; it's just too slow and bloated.
Most stuff I've seen run on k8s is not a crappy webapp in Python. If anything, that is less likely to be hosted in k8s.
I'm not sure why you call the k8s API shitty. What is the NixOS API for "deploy an auto-scaling application with load balancing and storage"? Does NixOS manage clusters?
How much experience do you have with k8s?
There is no such thing as "auto-scaling".
You can only "auto-scale" something that is horizontally scalable and trivially depends on the number of incoming requests. I.e., "a shitty web-app". (A well designed web-app doesn't need to be "auto-scaled" because you can serve the world from three modern servers. StackOverflow only uses nine and has done so for years.)
As an obvious example, no database can be "auto-scaled". Neither can numeric methods.
If you think StackOverflow is the epitome of scale, then your view of the world is somewhat limited. I worked for a flash sale site in 2008 that had to handle 3 million users, all trying to connect to your site simultaneously to buy a minimal supply of inventory. After 15 minutes of peak scale, traffic will scale back down by 80-90%. I am pretty sure StackOverflow never had to deal with such a problem.
> If you answer yes to many of those questions there's really no better alternative than k8s.
This is not even close to true with even a small number of resources. The notion that k8s somehow is the only choice is right along the lines of “Java Enterprise Edition is the only choice” — ie a real failure of the imagination.
For startups and teams with limited resources, DO, fly.io and render are doing lots of interesting work. But what if you can’t use them? Is k8s your only choice?
Let's say you're a large org with good engineering leadership, and you have high-revenue systems where downtime isn't okay. Also, for compliance reasons, public cloud isn't okay.
DNS in a tightly controlled large enterprise internal network can be handled with relatively simple microservices. Your org will likely have something already though.
Dev/Stage/Production: if you can spin up instances on demand this is trivial. Also financial services and other regulated biz have been doing this for eons before k8s.
Load Balancers: lots of non-k8s options exist (software and hardware appliances).
Prometheus / Grafana (and things like Netdata) work very well even without k8s.
Load Balancing and Ingress are definitely the most interesting pieces of the puzzle. Some choose nginx or Envoy, but there are also teams that use their own ingress solution (sometimes open-sourced!)
But why would a team do this? Or more appropriately, why would their management spend on this? Answer: many don’t! But for those that do — the driver is usually cost*, availability and accountability, along with engineering capability as a secondary driver.
(*cost because it’s easy to set up a mixed ability team with experienced, mid-career and new engineers for this. You don’t need a team full of kernel hackers.)
It costs less than you think, it creates real accountability throughout the stack and most importantly you’ve now got a team of engineers who can rise to any reasonable challenge, and who can be cross pollinated throughout the org. In brief the goal is to have engineers not “k8s implementers” or “OpenShift implementers” or “Cloud Foundry implementers”.
> DNS in a tightly controlled large enterprise internal network can be handled with relatively simple microservices. Your org will likely have something already though.
And it will likely be buggy with all sorts of edge cases.
> Dev/Stage/Production: if you can spin up instances on demand this is trivial. Also financial services and other regulated biz have been doing this for eons before k8s.
In my experience financial services have been notably not doing it.
> Load Balancers: lots of non-k8s options exist (software and hardware appliances).
The problem isn't running a load balancer with a given configuration at a given point in time. It's how you manage the required changes to load balancers and configuration as time goes on. It's very common for that to be a pile of perl scripts that add up to an ad-hoc informally specified bug-ridden implementation of half of kubernetes.
> And it will likely be buggy with all sorts of edge cases.
I have seen this view in corporate IT teams who’re happy to be “implementers” rather than engineers.
In real life, many orgs will in fact have third party vendor products for internal DNS and cert authorities. Writing bridge APIs to these isn’t difficult and it keeps the IT guys happy.
A relatively few orgs have written their own APIs, typically to manage a delegated zone. Again, you can say these must be buggy, but here’s the thing — everything’s buggy. Including k8s. As long as bugs are understood and fixed, no one cares. The proof of the pudding is how well it works.
Internal DNS in particular is easy enough to control and test if you have engineers (vs implementers) in your team.
> manage changes to load balancers … perl
That's a very black and white view, that teams are either on k8s (which to you is the bee's knees) or a pile of Perl (presumably unmaintainable). Speaks to an interesting unconscious bias.
Perhaps it comes from personal experience, in which case I’m sorry you had to be part of such a team. But it’s not particularly difficult to follow modern best practices and operate your own stack.
But if your starter stance is that “k8s is the only way”, no one can talk you out of your own mental hard lines.
> Again, you can say these must be buggy, but here’s the thing — everything’s buggy. Including k8s. As long as bugs are understood and fixed, no one cares.
Agreed, but internal products are generally buggier, because an internal product is in a kind of monopoly position. You generally want to be using a product that is subject to competition, that is a profit center rather than a cost center for the people who are making it.
> Internal DNS in particular is easy enough to control and test if you have engineers (vs implementers) in your team.
Your team probably aren't DNS experts, and why should they be? You're not a DNS company. If you could make a better DNS - or a better DNS-deployment integration - than the pros, you'd be selling it. (The exception is if you really are a DNS company, either because you actually do sell it, or because you have some deep integration with DNS that enables your competitive advantage)
> Perhaps it comes from personal experience, in which case I’m sorry you had to be part of such a team. But it’s not particularly difficult to follow modern best practices and operate your own stack.
I'd say that's a contradiction in terms, because modern best practice is to not run your own stack.
I don't particularly like kubernetes qua kubernetes (indeed I'd generally pick nomad instead). But I absolutely do think you need a declarative, single-source-of-truth way of managing your full deployment, end-to-end. And if your deployment is made up of a standard load balancer (or an equivalent of one), a standard DNS, and prometheus or grafana, then you've either got one of these products or you've got an internal product that does the same thing, which is something I'm extremely skeptical of for the same reason as above - if your company was capable of creating a better solution to this standard problem, why wouldn't you be selling it? (And if an engineer was capable of creating a better solution to this standard problem, why would they work for you rather than one of the big cloud corps?)
In the same way I'm very skeptical of any company with an "internal cloud" - in my experience such a thing is usually a significantly worse implementation of AWS, and, yes, is usually held together with some flaky Perl scripts. Or an internal load balancer. It's generally NIH, or at best a cost-cutting exercise which tends to show; a company might have an internal cloud that's cheaper than AWS (I've worked for one), but you'll notice the cheapness.
Now again, if you really are gaining a competitive advantage from your things then it may make sense to not use a standard solution. But in that case you'll have something deeply integrated, i.e. monolithic, and that's precisely the case where you're not deploying separate standard DNS, separate standard load balancers, separate standard monitoring etc.. And in that case, as grandparent said, not using k8s makes total sense.
But if you're just deploying a standard Rails (or what have you) app with a standard database, load balancer, DNS, monitoring setup? Then 95% of the time your company can't solve that problem better than the companies that are dedicated to solving that problem. Either you don't have a solution at all (beyond doing it manually), you use k8s or similar, or you NIH it. Writing custom code to solve custom problems can be smart, but writing custom code to solve standard problems usually isn't.
> if your company was capable of creating a better solution to this standard problem, why wouldn't you be selling it?
Let's pretend I'm the greatest DevOps engineer ever, and I write a Kubernetes replacement that's 100x better. Since it's 100x better, I simply charge 100x as much as it costs per CPU/RAM for a Kubernetes license to 1,000 customers, and take all of that money to the bank and I deposit my check for $0.
I don't disagree with the rest of the comment, but the market for the software to host a web app is a weird market.
> and I deposit my check for $0.
Given the number of Nomad fans that show up to every one of these threads, I don't think that's the whole story given https://www.hashicorp.com/products/nomad/pricing (and I'll save everyone the click: it's not $0)
Reasonable people can 100% disagree about approaches, but I don't think the TAM for "software to host a web app" is as small as you implied (although it certainly would be if we took your description literally)
fly.io, vercel, and heroku show you're right about the TAM for the broader problem, and that it's possible to capture some value somewhere, but that's a different beast entirely than just selling a standard solution to a standard problem.
Developers are a hard market to sell to, and deployment software is no exception.
> If you answer yes to many of those questions there's really no better alternative than k8s.
Nah, most of that list is basically free for any company that uses an Amazon load balancer and an autoscaling group. In terms of likelihood of incidents, time, and cost, each of those will be an order of magnitude higher with a team of Kubernetes engineers than with a less complex setup.
Oz Nova nailed it nicely in "You Are Not Google"
https://blog.bradfieldcs.com/you-are-not-google-84912cf44afb
If you were Google, k8s wouldn't cut it. I have experience with both options across multiple projects: managing containers yourself and the surrounding infrastructure vs. using k8s. k8s just works. It's a mature ecosystem. It's really not as hard as people make it out to be. Replicating all the functionality of k8s and the ecosystem yourself is a ton more work.
There are definitely wide swaths of applications that don't need containers, don't need high availability, don't need load balancing, don't need monitoring, don't need any of this stuff or need some simpler subset. Then by all means don't use k8s, don't use containers, etc.
If I need "some" of the above, Kubernetes forces me to grapple with "all" of the above. I think that is the issue.
Containerization and orchestration of containers vs. learning how to configure HAProxy and how to use Certbot, hmmmm
The questions you pose are legit skills web developers need to have. Nothing you mentioned is obviated by K8s or containerization.
"oh but you can get someone elses pre-configured image" uh huh... sure, you can also install malware. You will also need to one day maintain or configure the software running in them. You may even need to address issues your running software causes. You can't do that without mastering the software you are running!
On the other hand, my team slapped 3 servers down in a datacenter, had each of them configured in a Proxmox cluster within a few hours. Some 8-10 hours later we had a fully configured kubernetes cluster running within Proxmox VMs, where the VMs and k8s cluster are created and configured using an automation workflow that we have running in GitHub Actions. An hour or two worth of work later we had several deployments running on it and serving requests.
Kubernetes is not simple. In fact it's even more complex than just running an executable with your linux distro's init system. The difference in my mind is that it's more complex for the system maintainer, but less complex for the person deploying workloads to it.
And that's before exploring all the benefits of kubernetes-ecosystem tooling like the Prometheus operator for k8s, or the horizontally scalable Loki deployments, for centrally collecting infrastructure and application metrics, and logs. In my mind, making the most of these kinds of tools, things start to look a bit easier even for the systems maintainers.
Not trying to discount your workplace too much. But I'd wager there's a few people that are maybe not owning up to the fact that it's their first time messing around with kubernetes.
As long as your organisation can cleanly either a) split the responsibility for the platform from the responsibility for the apps that run on it, and fund it properly, or b) do the exact opposite and fold all the responsibility for the platform into the app team, I can see it working.
The problems start when you're somewhere between those two points. If you've got a "throw it over the wall to ops" type organisation, it's going to go bad. If you've got an underfunded platform team so the app team has to pick up some of the slack, it's going to go bad. If the app team have to ask permission from the platform team before doing anything interesting, it's going to go bad.
The problem is that a lot of organisations will look at k8s and think it means something it doesn't. If you weren't willing to fund a platform team before k8s, I'd be sceptical that moving to it is going to end well.
People really underestimate the power of shell scripts and ssh and trusted developers.
> People really underestimate the power of shell scripts and ssh and trusted developers.
On the other hand, you seem to be underestimating the fact that even the best, most trusted developer can make a mistake from time to time. It's no disgrace, it's just life.
Besides the fact that shell scripts aren't scalable (in terms of horizontal scalability, like the actor model), I would also like to point out that shell scripts should be simple; if you want to handle something that big, you are essentially using the shell as a programming language in disguise -- not ideal, and I would rather go with Go or Rust instead.
We don't live in 1999 any more. A big machine with a database can serve everyone in the US, and I can fit it in my closet.
It's like people are stuck in the early 2000s when they start thinking about computer capabilities. Today I have more flops in a single GPU under my desk than did the worlds largest super computer in 2004.
> It's like people are stuck in the early 2000s when they start thinking about computer capabilities.
This makes sense, because the code people write makes machines feel like they're from the early 2000's.
This is partially a joke, of course, but I think there is a massive chasm between the people who think you immediately need several computers to do things for anything other than redundancy, and the people who see how ridiculously much you can do with one.
> It's like people are stuck in the early 2000s when they start thinking about computer capabilities.
Partly because the "cloud" makes all its money renting you 2010s-era hardware at inflated prices, and people are either too naive to notice or their career is so invested in it that they can't admit to being ripped off and complicit in the scam.
That's what gets me about AWS.
When it came out in 2006, the m1.small was about what you'd get in a mid-range desktop at that point. It cost $876 a year [0]. Today, for an 8-core machine with 32 GB of RAM, you'll pay $3,145.19 a year [1].
It used to take 12-24 months of paying AWS bills before it would have made sense to buy the hardware outright. Now it's 3 months or less for every category, and people still defend this. For ML workstations it's weeks.
[0] https://aws.amazon.com/blogs/aws/dropping-prices-again-ec2-r...
[1] https://instances.vantage.sh/aws/ec2/m8g.2xlarge?region=us-e...
Hardware has gotten so much cheaper and easier and yet everyone is happy nobody has "raised prices".....
I added performance testing to all our endpoints from the start, so that people don’t start to normalize those 10s response times that our last system had (cry)
Well that's what happens when you move away from compiled languages to interpreted.
> Besides the fact that shell scripts aren't scalable…
What are you trying to say there? My understanding is that, way under the hood, a set of shell scripts is in fact enabling the scalable nature of… the internet.
...that's only for early internet, and the early internet is effing broken at best
> My understanding is that, way under the hood, a set of shell scripts is in fact enabling the scalable nature of… the internet.
I sure hope not. The state of error handling in shell scripts alone is enough to disqualify them for serious production systems.
If you're extremely smart and disciplined it's theoretically possible to write a shell script that handles error states correctly. But there are better things to spend your discipline budget on.
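The kind of boilerplate that discipline buys you, as a sketch (and even this has well-known blind spots):

```bash
#!/usr/bin/env bash
set -euo pipefail              # stop on errors, unset variables, and failures inside pipelines

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT  # clean up even when something fails
trap 'echo "failed at line $LINENO" >&2' ERR

# ...actual work here. Note that `set -e` is quietly ignored inside `if` conditions and
# `cmd && other` lists, and (without `shopt -s inherit_errexit`) doesn't propagate into
# command substitutions, which is exactly the discipline problem being described.
```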
My half tongue-in-cheek comment was implying things like "you can't boot a linux/bsd box without shell scripts" which would make the whole "serving a website" bit hard.
I realize that there exist OSes that are an exception to this rule. I didn't understand the comment about scripts scaling. It's a script, it can do whatever you want.
Shell scripts don't scale up to implementing complex things, IME. If I needed to do something complex that had to be a shell script for some reason, I'd probably write a program to generate that shell script rather than writing it by hand, and I think many of those system boot scripts etc. are generated rather than written directly.
Are you self hosting kubernetes or running it managed?
I've only used it managed. There is a bit of a learning curve but it's not so bad. I can't see how it can take 4 months to figure it out.
We are using EKS
> I can't see how it can take 4 months to figure it out.
Well have you ever tried moving a company with a dozen services onto kubernetes piece-by-piece, with zero downtime? How long would it take you to correctly move and test every permission, environment variable, and issue you run into?
Then if you get a single setting wrong (e.g. memory size) and don't load-test with realistic traffic, you bring down production, potentially lose customers, and have to do a public post-mortem about your mistakes? [true story for current employer]
I don't see how anybody says they'd move a large company to kubernetes in such an environment in a few months with no screwups and solid testing.
Took us three to four years to go from self-hosted multi-DC to getting the main product almost fully in k8s (some parts didn't make sense in k8s and were pushed to our geo-distributed edge nodes). Dozens of services and teams, and keeping the old stuff working while changing the tire on the car while driving. All while the company continues to grow and scale doubles every year or so. It takes maturity in testing and monitoring, and it takes longer than everyone estimates.
It sounds like it's not easy to figure out the permissions, envvars, memory size, etc. of your existing system, and that's why the migration is so difficult? That's not really one of Kubernetes' (many) failings.
Yes, and now we are back at the ancestor comment’s original point: “at the end of the day kubernetes feels like complexity trying to abstract over complexity, and often I find that's less successful than removing complexity in the first place”
Which I understand to mean “some people think using Kubernetes will make managing a system easier, but it often will not do that”
Can you elaborate on other things you think Kubernetes gets wrong? Asking out of curiosity because I haven't delved deep into it.
It's good you asked, but I'm not ready to answer it in a useful way. It depends entirely on your use cases.
Some un-nuanced observations as starting points:
- Helm sucks, but so does Kustomize
- Cluster networking and security is annoying to set up
- Observability is awkward. Some things aren't exposed as cluster metrics/events, so you need to look at, say, service and pod state. It's not easy to see, e.g. how many times your app OOMed in the last hour.
- There's a lot of complexity you can avoid for a while, but eventually some "simple" use case will only be solvable that way, and now you're doing service meshes.
Maybe "wrong" is the wrong word, but there are spots that feel overkill, and spots that feel immature.
> - Helm sucks, but so does Kustomize
Helm != Kubernetes, FWIW
I'd argue that Kustomize is the bee's knees but editor support for it sucks (or, I'd also accept that the docs suck, and/or are missing a bazillion examples so us mere mortals could enlighten ourselves to what all nouns and verbs are supported in the damn thing)
> how many times your app OOMed in the last hour.
heh, I'd love to hear those "shell scripts are all I need" folks chime in on how they'd get metrics for such a thing :-D (or Nomad, for that matter)
That said, one of the other common themes in this discussion is how Kubernetes jams people up because there are a bazillion ways of doing anything, with wildly differing levels of "it just works" versus "someone's promo packet that was abandoned". Monitoring falls squarely in the bazillion-ways category, in that it for sure does not come batteries included but there are a lot of cool toys if one has the cluster headroom to install them
https://github.com/google/cadvisor/blob/v0.51.0/metrics/prom... which is allegedly exposed in kubelet since 2022 https://github.com/kubernetes/kubernetes/pull/108004 (I don't have a cluster in front of me to test it, though)
https://github.com/kubernetes/kube-state-metrics/blob/v2.14.... shows the metric that KSM exposes in kube_pod_container_status_terminated_reason
https://opentelemetry.io/docs/specs/semconv/attributes-regis... shows the OTel version of what I suspect is that same one
And then in the "boil the ocean" version one could egress actual $(kubectl get events -w) payloads if using something where one is not charged by the metric: https://github.com/open-telemetry/opentelemetry-collector-co...
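For the concrete OOM question upthread, one approximate way to ask it with those kube-state-metrics series (the Prometheus URL is a placeholder, and the join just counts restarts of containers whose last termination reason was OOMKilled):

```bash
curl -sG 'http://prometheus.monitoring.svc:9090/api/v1/query' \
  --data-urlencode 'query=sum by (namespace, pod, container) (
    increase(kube_pod_container_status_restarts_total[1h])
    * on (namespace, pod, container) group_left
    kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}
  )'
```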
It largely depends on how customized each microservice is, and how many people are working on the project.
I've seen migrations of thousands of microservices happening within the span of two years. Longer timeline, yes, but the number of microservices is orders of magnitude larger.
Though I suppose the organization works differently at this level. The Kubernetes team built a tool to migrate the microservices, and each owner was asked to perform the migration themselves. Small microservices could be migrated in less than three days, while the large and risk-critical ones took a couple of weeks. This all happened in less than two years, but it took more than that in terms of engineer-weeks.
The project was very successful though. The company spends way less money now because of the autoscaling features, and the ability to run multiple microservices in the same node.
Regardless, if the company is running 12 microservices and this number is expected to grow, this is probably a good time to migrate. How did they account for the different shapes of services (stateful, stateless, leader-elected, cron, etc.), networking settings, styles of deployment (blue-green, rolling updates, etc.), secret management, load testing, bug bashing, gradual rollouts, dockerizing the services, etc.? If it's taking 4x longer than originally anticipated, it seems like there was a massive failure in project design.
2000 products sounds like you made 2000 engineers learn kubernetes (a week, optimistically, 2000/52 = 38 engineer years, or roughly one wasted career).
Similarly, the actual migration times you estimate add up to decades of engineer time.
It’s possible kubernetes saves more time than using the alternative costs, but that definitely wasn’t the case at my previous two jobs. The jury is out at the current job.
I see the opportunity cost of this stuff every day at work, and am patiently waiting for a replacement.
> 2000 products sounds like you made 2000 engineers learn kubernetes (a week, optimistically, 2000/52 = 38 engineer years, or roughly one wasted career).
Learning k8s enough to be able to work with it isn't that hard. Have a centralized team write up a decent template for a CI/CD pipeline, a Dockerfile for the most common stacks you use, and a Helm chart with an example Deployment, PersistentVolumeClaim, Service and Ingress; distribute that, and be available for support should a team's needs go beyond "we need 1-N pods for this service, they get some environment variables from which they are configured, and maybe a Secret/ConfigMap if the application would rather have configuration done in files". That is enough in my experience.
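What the app team actually touches then tends to be a small values file plus one Helm command; a hypothetical example (the chart name and keys are whatever your central template defines, nothing standard):

```bash
cat > values.yaml <<'EOF'
image:
  repository: registry.example.com/payments-api
  tag: "1.7.3"
replicaCount: 2
env:
  DATABASE_URL: postgres://payments-db:5432/payments
ingress:
  host: payments.internal.example.com
persistence:
  enabled: false
EOF
helm upgrade --install payments-api ./company-service-chart -f values.yaml
```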
> Learning k8s enough to be able to work with it isn't that hard.
I’ve seen a lot of people learn enough k8s to be dangerous.
Learning it well enough to not get wrapped around the axle with some networking or storage details is quite a bit harder.
For sure, but that's the job of a good ops department. Where I work, for example, every project's CI/CD pipeline has its own IAM user mapping to a Kubernetes role that only has explicitly defined capabilities: create, modify and delete just the utter basics. Even if they'd commit something into the Helm chart that could cause an annoyance, the service account wouldn't be able to call the required APIs. And the templates themselves come with security built in - privileges are all explicitly dropped, pod UIDs/GIDs are hardcoded to non-root, and we're deploying network policies, at least for ingress, as well now. Only egress network policies aren't in place yet; we haven't been able to make those work with our services.
Anyone wishing to do stuff like use the RDS database provisioner gets an introduction from us on how to use it and what the pitfalls are, and regular reviews of their code. They're flexible, but we keep tabs on what they're doing, and when they have done something useful we aren't shy about integrating whatever they have done into our shared template repository.
> 2000 products sounds like you made 2000 engineers learn kubernetes (a week, optimistically, 2000/52 = 38 engineer years, or roughly one wasted career).
Not really, they only had to use the tool to run the migration and then validate that it worked properly. As the other commenter said, a very basic setup for kubernetes is not that hard; the difficult set up is left to the devops team, while the service owners just need to see the basics.
But sure, we can estimate it at 38 engineering years. That's still 38 years for 2,000 microservices; it's way better than 1 year for 12 microservices like in OP's case. The savings we got were enough to offset these 38 years of work, so this project is now paying dividends.
Comparing the simplicity of two PHP servers against a setup with a dozen services is always going to be one-sided. The difference in complexity alone is massive, regardless of whether you use k8s or not.
My current employer did something similar, but with fewer services. The upshot is that with terraform and helm and all the other yaml files defining our cluster, we have test environments on demand, and our uptime is 100x better.
Fair enough that sounds hard.
Memory size is an interesting example. A typical Kubernetes deployment has much more control over this than a typical non-container setup. It costs you something to figure out the right setting, but in the long term you are rewarded with a more robust and more re-deployable application.
> has much more control over this than a typical non-container setup
Actually not true, k8s uses the exact same cgroups API for this under the hood that systemd does.
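As an illustration (binary path and numbers invented), the same knob through either frontend:

```bash
# via systemd: a transient unit with a cgroup memory ceiling
systemd-run --unit=demo-app -p MemoryMax=256M -p CPUQuota=50% /usr/bin/myapp

# via Kubernetes: the container spec expresses the same thing as
#   resources:
#     limits:
#       memory: 256Mi
#       cpu: 500m
```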
> I don't see how anybody says they'd move a large company to kubernetes in such an environment in a few months with no screwups and solid testing.
Unfortunately, I do. Somebody says that when the culture of the organization expects to hear what it wants to hear rather than the cold hard truth. And likely the person saying it says it from a perch up high, not responsible for the day-to-day work of actually implementing the change. I see this happen when the person, management/leadership, lacks the skills and knowledge to perform the work themselves. They've never been in the trenches and had to actually deal face to face with the devil in the details.
Canary deploy, dude (or dude-ette): route 0.001% of service traffic and then slowly move it over. Then set error budgets. Then a bad service won't "bring down production".
That's how we did it at Google (I was part of the core team responsible for ad serving infra - billions of ads to billions of users a day)
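Outside Google, a rough equivalent with off-the-shelf pieces, e.g. the ingress-nginx canary annotations (this assumes a second Ingress pointing at the new Deployment, and weights are integer percentages, so the granularity is coarser than 0.001%):

```bash
kubectl annotate ingress orders-api-canary \
  nginx.ingress.kubernetes.io/canary="true" \
  nginx.ingress.kubernetes.io/canary-weight="1"    # send ~1% of traffic to the new version

# watch error rates against the budget, then raise the weight step by step
kubectl annotate --overwrite ingress orders-api-canary \
  nginx.ingress.kubernetes.io/canary-weight="10"
```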
Using microk8s or k3s on one node works fine. As the author of "one big server," I am now working on an application that needs some GPUs and needs to be able to deploy on customer hardware, so k8s is natural. Our own hosted product runs on 2 servers, but it's ~10 containers (including databases, etc).
Yup, I like this approach a lot. With cloud providers considering VMs durable these days (they get new hardware for your VM if the hardware it's on dies, without dropping any TCP connections), I think a 1 node approach is enough for small things. You can get like 192 vCPUs per node. This is enough for a lot of small companies.
I occasionally try non-k8s approaches to see what I'm missing. I have a small ARM machine that runs Home Assistant and some other stuff. My first instinct was to run k8s (probably kind honestly), but didn't really want to write a bunch of manifests and let myself scope creep to running ArgoCD. I decided on `podman generate systemd` instead (with nightly re-pulls of the "latest" tag; I live and die by the bleeding edge). This was OK, until I added zwavejs, and now the versions sometimes get out of sync, which I notice by a certain light switch not working anymore. What I should have done instead was have some sort of git repository where I have the versions of these two things, and to update them atomically both at the exact same time. Oh wow, I really did need ArgoCD and Kubernetes ;)
I get by with podman by angrily ssh-ing in in my winter jacket when I'm trying to leave my house but can't turn the lights off. Maybe this can be blamed on auto-updates, but frankly anything exposed to a network that is out of date is also a risk, so, I don't think you can ever really win.
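A hypothetical "poor man's GitOps" fix for that two-container drift, pinning both tags in one file that lives in git (image names and tags are illustrative):

```bash
#!/usr/bin/env bash
set -euo pipefail
source ./versions.env   # e.g. HASS_TAG=2024.6.4 and ZWAVEJS_TAG=9.14.1

podman pull "ghcr.io/home-assistant/home-assistant:${HASS_TAG}"
podman pull "docker.io/zwavejs/zwave-js-ui:${ZWAVEJS_TAG}"

# recreate both containers from the pinned tags, regenerate units with
# `podman generate systemd --new`, and restart the two services together
# so Home Assistant and zwavejs can never drift apart between updates.
```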
Yea but that doesn't sound shiny on your resume.
I never chose any single thing in my job just because of how it would look on my resume.
After 20+ years of Linux sysadmin/devops work, and because of a spinal disc herniation last year, I'm now looking for a job.
99% of job offers will ask for EKS/Kubernetes now.
It's like the VMware of the years 200[1-9], or like the "Cloud" of the years 201[1-9].
I've always specialized in physical datacenters and servers, be it on-premises, colocation, embedded, etc., so I'm out of the market now, at least in Spain (which always runs about 8 years behind the market).
You can try to avoid it, and it's nice when you save your company thousands of operational/performance/security/etc. issues and dollars over the years, and you look like a guru who stays ahead of industry issues in your boss's eyes, but it will make finding a job... 99% harder.
It doesn't matter if you demonstrate the highest level of skill in Linux, scripting, Ansible, networking, security, hardware, performance tuning, high availability, all kinds of load balancers, switching, routing, firewalls, encryption, backups, monitoring, log management, compliance, architecture, isolation, budget management, team management, provider/customer management, debugging, automation, full-stack programming, and a long etc. If you say "I never worked with Kubernetes, but I learn fast", with your best sincerity at the interview, then you're automatically out of the process. No matter if you're talking with human resources, a helper of the CTO, or the CTO. You're out.
If you say "I never worked with X, but I learn fast", with your best sincerity at the interview, then you're automatically out of the process.
Where X can be not just k8s but any other bullet point on the job req.
It's interesting that the very things that people used to say to get the job 20 years ago -- and not as a platitude (it's a perfectly reasonable and intelligent thing to say, and in a rational world, exactly what one would hope to hear from a candidate) -- are now considered red flags that immediately disqualify one for the job.
Very sorry to hear about your current situation - best of luck.
I've never heard of this - has this been your direct experience?
It's somewhat speculative (because no one ever tells you the reason for dropping your application or not contacting you in the first place) but the impression I have, echoed by what many others seem to be saying, is that the process has shifted greatly from "Is this a strong, reliable, motivated person?" (with toolchain overlap being mostly gravy) to "Do they have 5-8 recent years of X, Y and Z?".
As if years of doing anything is a reliable predictor of anything, or can even be effectively measured.
Depends on what kind of company you want to join. Some value simplicity and efficiency more.
I think porting to k8s can succeed or fail, like any other project. I switched an app that I alone worked on, from Elastic Beanstalk (with Bash), to Kubernetes (with Babashka/Clojure). It didn't seem bad. I think k8s is basically a well-designed solution. I think of it as a declarative language which is sent to interpreters in k8s's control plane.
Obviously, some parts of it took a while to figure out. For example, I needed to figure out an AWS security group problem with Ingress objects that I recall wasn't well documented. So I think parts of that declarative language can suck if the declarative parts aren't well factored out from the imperative parts, or if the log messages don't help you diagnose errors, or if there isn't some kind of (dynamic?) linter that helps you notice problems quickly.
In your team's case, more information seems needed to help us evaluate the problems. Why was it easier before to make testing environments, and harder now?
So, my current experience somewhere most old apps are very old school:
- Most server software is waaaaaaay out of date, so getting a dev/test env is a little harder (the last problem we hit was that the HAProxy version does not do ECDSA keys for TLS certs, which is the default with certbot).
- Yeah, pushing to prod is "easy": FTP directly. But now, which version of which files are really in prod? No idea. When I say old school, it's old school from before things like Jenkins.
- Need something done around the servers? That's the ops team's job. A team which also has too much other work to do, so now you'll have to wait a week or two for this simple "add an upload file" endpoint to this old API, because you need somewhere to put those files.
Now we've started setting up some on-prem k8s nodes for the new developments. Not because we need crazy scaling, but so the dev team can do most of the ops they need. It takes time to get everything set up, but once it started chugging along it felt good to be able to just declare whatever we need and get it. You still need to get the devs to learn k8s, which is not fun, but that's the life of a dev: learning new things every day.
Also, k8s does not do data. If you want a database or anything managing files, you'll want to do most of that job outside k8s.