For what it's worth, I've worked at multiple places that ran shell scripts just fine for their deploys.
- One had only 2 services [php] and ran over 1 billion requests a day. Deploy was trivial, ssh some new files to the server and run a migration, 0 downtime.
- One was in an industry that didn't need "Webscale" (retirement accounts). Prod deploys were just docker commands run by Jenkins. We ran two servers per service from the day I joined to the day I left 4 years later (3x growth), and ultimately removed one service and one database during all that growth.
Another outstanding thing about both of these places was that we had all the testing environments you need, on-demand, in minutes.
The place I'm at now is trying to do kubernetes and is failing miserably (ongoing nightmare 4 months in and probably at least 8 to go, when it was allegedly supposed to only take 3 total). It has one shared test environment where it takes 3 hours to see your changes.
I don't fault kubernetes directly, I fault the overall complexity. But at the end of the day kubernetes feels like complexity trying to abstract over complexity, and often I find that's less successful than removing complexity in the first place.
If your application doesn't need and likely won't need to scale to large clusters, or multiple clusters, then there's nothing wrong, per se, with your solution. I don't think k8s is that hard but there are a lot of moving pieces and there's a bit to learn. Finding someone with experience to help you can make a ton of difference.
Questions worth asking:
- Do you need a load balancer?
- TLS certs and rotation?
- Horizontal scalability.
- HA/DR
- dev/stage/production + being able to test/stage your complete stack on demand.
- CI/CD integrations, tools like ArgoCD or Spinnaker
- Monitoring and/or alerting with Prometheus and Grafana
- Would you benefit from being able to deploy a lot of off-the-shelf software (let's say Elasticsearch, or some random database, or a monitoring stack) via Helm quickly and easily?
- "Ingress"/proxy.
- DNS integrations.
If you answer yes to many of those questions there's really no better alternative than k8s. If you're building web applications at large enough scale, the answer to most of these will end up being yes at some point.
Every item on that list is "boring" tech. Approximately everyone has used load balancers, test environments and monitoring since the 90s just fine. What is it that you think makes Kubernetes especially suited for this compared to every other solution during the past three decades?
There are good reasons to use Kubernetes, mainly if you are using public clouds and want to avoid lock-in. I may be partial, since managing it pays my bills. But it is complex, mostly unnecessarily so, and no one should be able to say with a straight face that it achieves better uptime or requires less personnel than any alternative. That's just sales talk, and should be a big warning sign.
It's the way things work together. If you want to add a new service you just annotate that service and DNS gets updated, your ingress gets the route added, cert-manager gets you the certs from Let's Encrypt. You want Prometheus to monitor your pod, you just add the right annotation. When your server goes down k8s will move your pod around. k8s storage will take care of having the storage follow your pod. Your entire configuration is highly available and replicated in etcd.
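Roughly what that "just annotate it" workflow looks like in manifest form -- a sketch, assuming cert-manager, external-dns and a Prometheus that honors the prometheus.io/* scrape annotations are already installed in the cluster; names, hosts and ports are placeholders:

```yaml
# Service: the prometheus.io/* annotations only work if your Prometheus
# scrape config is set up to honor them (a common but not universal setup).
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
spec:
  selector:
    app: my-service
  ports:
    - port: 80
      targetPort: 8080
---
# Ingress: cert-manager requests the Let's Encrypt cert, external-dns
# publishes the DNS record for the host below.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-service
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod   # assumed ClusterIssuer name
spec:
  rules:
    - host: my-service.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service
                port:
                  number: 80
  tls:
    - hosts: [my-service.example.com]
      secretName: my-service-tls
```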
It's just very different than your legacy "standard" technology.
None of this is difficult to do or automate, and we've done it for years. Kubernetes simply makes it more complex by adding additional abstractions in the pursuit of pretending hardware doesn't exist.
There are, maybe, a dozen companies in the world with a large enough physical footprint where Kubernetes might make sense. Everyone else is either engaged in resume-driven development, or has gone down some profoundly wrong path with their application architecture to where it is somehow the lesser evil.
I used to feel the same way, but have come around. I think it's great for small companies for a few reasons. I can spin up effectively identical dev/ci/stg/prod clusters for a new project in an hour for a medium sized project, with CD in addition to everything GP mentioned.
I basically don't have to think about ops anymore until something exotic comes up, it's nice. I agree that it feels clunky, and it was annoying to learn, but once you have something working it's a huge time saver. The ability to scale without drastically changing the system is a bonus.
> I can spin up effectively identical dev/ci/stg/prod clusters for a new project in an hour for a medium sized project, with CD in addition to everything GP mentioned.
I can do the same thing with `make local` invoking a few bash commands. If the complexity increases beyond that, a mistake has been made.
You could say the same thing about Ansible or Vagrant or Nomad or Salt or anything else.
I can say with complete confidence, however, that if you are running Kubernetes and not thinking about ops, you are simply not operating it yourself. You are paying someone else to think about it for you. Which is fine, but says nothing about the technology.
You always have to think about ops, regardless of tooling. I agree that you can have a very nice, reproducible setup with any of those tools though. Personally, I haven't found those alternatives to be significantly easier to use (though I don't have experience with Salt).
For me personally, self hosted k3s on Hetzner with FluxCD is the least painful option I've found.
Managed k8s is great if you're already in the cloud; self-hosting it as a small company is a waste of money.
I've found self hosted k3s to be about the same effort as EKS for my workloads, and maybe 20-30% of the cost for similar capability.
> Every item on that list is "boring" tech. Approximately everyone has used load balancers, test environments and monitoring since the 90s just fine. What is it that you think makes Kubernetes especially suited for this compared to every other solution during the past three decades?
You could make the same argument against using cloud at all, or against using CI. The point of Kubernetes isn't to make those things possible, it's to make them easy and consistent.
> The point of Kubernetes isn't to make those things possible, it's to make them easy and consistent.
Kubernetes definitely makes things consistent, but I do not think that it makes them easy.
There’s certainly a lot to learn from Kubernetes, but I strongly believe that a more tasteful successor is possible, and I hope that it is inevitable.
I haven't worked in k8s, but really what is being argued is that it is a cross cloud standardization API, largely because the buzzword became big enough that the cloud providers conformed to it rather than keep their API moat.
However all clouds will want API moats.
It is also true that k8s appears too complex for the low end, and there is a strong lack of a cross-cloud standard (maybe Docker, but that is too low-level) for that use case.
K8s is bad at databases. So k8s is incomplete as well. It also seems to lack good UIs, but that impression/claim may only be due to my own lack of exposure.
What is blindingly true to me is that the building blocks at a cli level for running and manipulating processes/programs/servers in a data center, what was once kind of called a "dc os", are really lacking.
Remote command exec needs ugly ssh wrapping assuming the network boundaries are free enough (k8s requires an open network between all servers iirc), and of course ssh is under attack by teleport and other enterprise fiefdom builders.
Docker was a great start. Parallel ssh is a crude tool.
I've tried multiple times to make a swarm admin tool that was cross cloud and cross framework and cross command and stdin/stdout/stderr transport agnostic. It's hard.
But none of those things are easy. All cloud environments are fairly complex and kubernetes is not something that you just do in an afternoon. You need to learn about how it works, which takes about the same time as using 'simpler' means to do things directly.
Sure, it means that two people that already understand k8s can easily exchange or handover a project, which might be harder to understand if done with other means. But that's about the only bonus it brings in most situations.
> All cloud environments are fairly complex and kubernetes is not something that you just do in an afternoon. You need to learn about how it works, which takes about the same time as using 'simpler' means to do things directly.
The first time you do it, sure, like any other tool. But once you're comfortable with it and have a working setup, you can bash out "one more service deployment" in a few minutes. That's the key capability.
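For what it's worth, once the cluster and its ingress/DNS/TLS plumbing exist, "one more service deployment" is often little more than a couple of manifests like these (a sketch; image, names and ports are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: another-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: another-service
  template:
    metadata:
      labels:
        app: another-service
    spec:
      containers:
        - name: app
          image: registry.example.com/another-service:1.0.0   # placeholder image
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: another-service
spec:
  selector:
    app: another-service
  ports:
    - port: 80
      targetPort: 8080
```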
The other bonus is that most open-source software supports a Kubernetes deployment. This means I can find software and have it deployed pretty quickly.
Kubernetes is boring tech as well.
And the advantage of it is one way to manage resources, scaling, logging, observability, hardware etc.
All of which is stored in Git and so audited, reviewed, versioned, tested etc in exactly the same way.
> But it is complex, mostly unnecessarily so
Unnecessary complexity sounds like something that should be fixed. Can you give an example?
Kubernetes is great example of the "second-system effect".
Kubernetes only works if you have a webapp written in a slow interpreted language. For anything else it is a huge impedance mismatch with what you're actually trying to do.
P.S. In the real world, Kubernetes isn't used to solve technical problems. It's used as a buffer between the dev team and the ops team, who usually have different schedules/budgets, and might even be different corporate entities. I'm sure there might be an easier way to solve that problem without dragging in Google's ridiculous and broken tech stack.
> It's used as a buffer between the dev team and the ops team, who usually have different schedules/budgets
That depends on your definition. If the ops team is solely responsible for running the Kubernetes cluster, then yes. In reality that's rarely how things turn out. Developers want Kubernetes, because.... I don't know. Ops doesn't even want Kubernetes in many cases. Kubernetes is amazing, for those few organisations that really need it.
My rule of thumb is: If your worker nodes aren't entire physical hosts, then you might not need Kubernetes. I've seen some absolutely crazy setups where developers had designed an entire solution around Kubernetes, only to run one or two containers. The reasoning is pretty much always the same: they know absolutely nothing about operations, and fail to understand that load balancers exist outside of Kubernetes, or that their solution could be an nginx configuration, 100 lines of Python and some systemd configuration.
I accept that I lost the fight that Kubernetes is overly complex and a nightmare to debug. In my current position I can even see some advantages to Kubernetes, so I was at least a little off in my criticism. Still, I don't think Kubernetes should be your default deployment platform, unless you have very specific needs.
I think I live in the real world and your statement is not true for any of the projects I've been involved in. Kubernetes is absolutely used to solve real technical problems that would otherwise require a lot of work to solve. I would say as a rule it's not a webapp in a slow interpreted language that's hosted in k8s. It truly is about decoupling from the need to manage machines and operating systems at a lower level and being able to scale seamlessly.
I'm really not following on the impedance mismatch from what you're actually trying to do. Where is that impedance mismatch? Let's take a simple example, Elasticsearch and the k8s operator. You can edit a single line in your yaml and grow your cluster. That takes care of resources, storage, network etc. Can you do this manually with Elastic running on bare metal or in containers or in VMs? Absolutely, but it's a nightmare of a non-replicable process that will take you days. You don't need Elasticsearch, or you never need to scale it? Fine. You can run it on a single machine and lose all your data if that machine dies - fine.
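For reference, this is roughly what that looks like with the ECK operator's Elasticsearch resource (a sketch; version and sizes are illustrative) -- growing the cluster is a one-line change to `count`:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: logging
spec:
  version: 8.14.0          # illustrative version
  nodeSets:
    - name: default
      count: 3             # change 3 -> 5 and the operator adds two nodes,
                           # storage, networking and all
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 100Gi
```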
I’m curious if you’ve ever built and maintained a k8s cluster capable of reliably hosting an ES cluster? Because I have, and it was painful enough that we swapped to provisioning real HW with ansible. It is much easier to manage.
I should note, we still manage a K8s cluster, but not for anything using persistent storage.
> In the real world, Kubernetes isn't used to solve technical problems. It's used as a buffer between the dev team and the ops team, who usually have different schedules/budgets, and might even be different corporate entities.
At my company I’m both the dev and the ops team, and I’ve used Kubernetes and found it pleasant and intuitive? I’m able to have confidence that situations that arise in production can be recreated in dev, updates are easy, I can tie services together in a way that makes sense. I arrived at K8s after rolling my own scripts and deployment methods for years and I like its well-considered approach.
So maybe resist passing off your opinions as sweeping generalizations about “the real world”.
Contrary to popular belief, k8s is not Google's tech stack.
My understanding is that it was initially sold as Google's tech to benefit from Google's tech reputation (exploiting the confusion caused by the fact that some of the original k8s devs were ex-Googlers), and today it's also Google trying to pose as k8s inventor, to benefit from its popularity. Interesting case of host/parasite symbiosis, it seems.
Just my impression though, I can be wrong, please comment if you know more about the history of k8s.
Is there anyone that works at Google that can confirm this?
What's left of Borg at Google? Did the company switch to the open source Kubernetes distribution at any point? I'd love to know more about this as well.
> exploiting the confusion caused by the fact that some of the original k8s devs were ex-Googlers
What about the fact that many active Kubernetes developers are also active Googlers?
I'm an Ex-Google SRE. Kubernetes is not Borg, will never be Borg, and Borg does not need to borrow from k8s - Most of the "New Features" in K8s were things Google had been doing internally for 5+ years before k8s launched. Many of the current new features being added to k8s are things that Google has already studied and rejected - It breaks my heart to see k8s becoming actively worse on each release.
A ton of the experience of Borg is in k8s. Most of the concepts translate directly. The specifics about how borg works have changed over the years, and will continue to change, but have never really matched K8s - Google is in the business of deploying massive fleets, and k8s has never really supported cluster sizes above a few thousand. Google's service naming and service authentication is fully custom, and k8s is... fine, but makes a lot of concessions to more general ideas. Google was doing containerization before containerization was a thing - See https://lkml.org/lkml/2006/9/14/370 ( https://lwn.net/Articles/199643/ doesn't elide the e-mail address) for the introduction of the term to the kernel.
The point of k8s was to make "The Cloud" an attractive platform to deploy to, instead of EC2. Amazon EC2 had huge mindshare, and Google wanted some of those dollars. Google Cloud sponsored K8s because it was a way to a) Apply Google learnings to the wider developer community and b) Reduce AWS lock-in, by reducing the amount of applications that relied on EC2 APIs specifically - K8s genericized the "launch me a machine" process. The whole goal was making it easier for Google to sell its cloud services, because the difference in deployment models (Mostly around lifetimes of processes, but also around how applications were coupled to infrastructure) was a huge impediment to migrating to the "cloud". Kubernetes was an attempt to make an attractive target - One that would work on AWS, but commoditized it, so that you could easily migrate to, or simply target first, GCP.
Thank you for the exhaustive depiction of the situation. Also an ex SRE from long ago, although not for borg. One of the learnings I took with me is that there is no technical solution that is good for several orders of magnitude. The tool you need for 10 servers is not the one you need for 1000, etc.
kubernetes is an API for your cluster, that is portable between providers, more or less. there are other abstractions, but they are not portable, e.g. fly.io, DO etc. so unless you want a vendor lock-in, you need it. for one of my products, I had to migrate due to business reasons 4 times into different kube flavors, from self-managed (2 times) to GKE and EKS.
> there are other abstractions, but they are not portable
Not true. Unix itself is an API for your cluster too, like the original post implies.
Personally, as a "tech lead" I use NixOS. (Yes, I am that guy.)
The point is, k8s is a shitty API because it's built only for Google's "run a huge webapp built on shitty Python scripts" use case.
Most people don't need this, what they actually want is some way for dev to pass the buck to ops in some way that PM's can track on a Gantt chart.
I'm not an insider but afaik anything heavy lifting in Google is C++ or Go. There's no way you can use Python for anything heavy at Google scale, it's just too slow and bloated.
Most stuff I've seen run on k8s is not crappy webapp in Python. If anything that is less likely to be hosted in k8s.
I'm not sure why you call k8s api shitty. What is the NixOS API for "deploy an auto-scaling application with load balancing and storage"? Does NixOS manage clusters?
How much experience do you have with k8s?
There is no such thing as "auto-scaling".
You can only "auto-scale" something that is horizontally scalable and trivially depends on on the number of incoming requests. I.e., "a shitty web-app". (A well designed web-app doesn't need to be "auto-scaled" because you can serve the world from three modern servers. StackOverflow only uses nine and has done so for years.)
As an obvious example, no database can be "auto-scaled". Neither can numeric methods.
If you think StackOverflow is the epitome of scale, then your view of the world is somewhat limited. I worked for a flash sale site in 2008 that had to handle 3 million users, all trying to connect to your site simultaneously to buy a minimal supply of inventory. After 15 minutes of peak scale, traffic will scale back down by 80-90%. I am pretty sure StackOverflow never had to deal with such a problem.
> If you answer yes to many of those questions there's really no better alternative than k8s.
This is not even close to true with even a small number of resources. The notion that k8s somehow is the only choice is right along the lines of “Java Enterprise Edition is the only choice” — ie a real failure of the imagination.
For startups and teams with limited resources, DO, fly.io and render are doing lots of interesting work. But what if you can’t use them? Is k8s your only choice?
Let’s say you’re a large orgs with good engineering leadership, and you have high-revenue systems where downtime isn’t okay. Also for compliance reasons public cloud isn’t okay.
DNS in a tightly controlled large enterprise internal network can be handled with relatively simple microservices. Your org will likely have something already though.
Dev/Stage/Production: if you can spin up instances on demand this is trivial. Also financial services and other regulated biz have been doing this for eons before k8s.
Load Balancers: lots of non-k8s options exist (software and hardware appliances).
Prometheus / Grafana (and things like Netdata) work very well even without k8s.
Load Balancing and Ingress is definitely the most interesting piece of the puzzle. Some choose nginx or Envoy, but there’s also teams that use their own ingress solution (sometimes open-sourced!)
But why would a team do this? Or more appropriately, why would their management spend on this? Answer: many don’t! But for those that do — the driver is usually cost*, availability and accountability, along with engineering capability as a secondary driver.
(*cost because it’s easy to set up a mixed ability team with experienced, mid-career and new engineers for this. You don’t need a team full of kernel hackers.)
It costs less than you think, it creates real accountability throughout the stack and most importantly you’ve now got a team of engineers who can rise to any reasonable challenge, and who can be cross pollinated throughout the org. In brief the goal is to have engineers not “k8s implementers” or “OpenShift implementers” or “Cloud Foundry implementers”.
> DNS in a tightly controlled large enterprise internal network can be handled with relatively simple microservices. Your org will likely have something already though.
And it will likely be buggy with all sorts of edge cases.
> Dev/Stage/Production: if you can spin up instances on demand this is trivial. Also financial services and other regulated biz have been doing this for eons before k8s.
In my experience financial services have been notably not doing it.
> Load Balancers: lots of non-k8s options exist (software and hardware appliances).
The problem isn't running a load balancer with a given configuration at a given point in time. It's how you manage the required changes to load balancers and configuration as time goes on. It's very common for that to be a pile of perl scripts that add up to an ad-hoc informally specified bug-ridden implementation of half of kubernetes.
> And it will likely be buggy with all sorts of edge cases.
I have seen this view in corporate IT teams who’re happy to be “implementers” rather than engineers.
In real life, many orgs will in fact have third party vendor products for internal DNS and cert authorities. Writing bridge APIs to these isn’t difficult and it keeps the IT guys happy.
A relatively few orgs have written their own APIs, typically to manage a delegated zone. Again, you can say these must be buggy, but here’s the thing — everything’s buggy. Including k8s. As long as bugs are understood and fixed, no one cares. The proof of the pudding is how well it works.
Internal DNS in particular is easy enough to control and test if you have engineers (vs implementers) in your team.
> manage changes to load balancers … perl
That’s a very black and white view, that teams are either on k8s (which to you is the bee's knees) or a pile of Perl (presumably unmaintainable). Speaks to interesting unconscious bias.
Perhaps it comes from personal experience, in which case I’m sorry you had to be part of such a team. But it’s not particularly difficult to follow modern best practices and operate your own stack.
But if your starter stance is that “k8s is the only way”, no one can talk you out of your own mental hard lines.
> Again, you can say these must be buggy, but here’s the thing — everything’s buggy. Including k8s. As long as bugs are understood and fixed, no one cares.
Agreed, but internal products are generally buggier, because an internal product is in a kind of monopoly position. You generally want to be using a product that is subject to competition, that is a profit center rather than a cost center for the people who are making it.
> Internal DNS in particular is easy enough to control and test if you have engineers (vs implementers) in your team.
Your team probably aren't DNS experts, and why should they be? You're not a DNS company. If you could make a better DNS - or a better DNS-deployment integration - than the pros, you'd be selling it. (The exception is if you really are a DNS company, either because you actually do sell it, or because you have some deep integration with DNS that enables your competitive advantage)
> Perhaps it comes from personal experience, in which case I’m sorry you had to be part of such a team. But it’s not particularly difficult to follow modern best practices and operate your own stack.
I'd say that's a contradiction in terms, because modern best practice is to not run your own stack.
I don't particularly like kubernetes qua kubernetes (indeed I'd generally pick nomad instead). But I absolutely do think you need a declarative, single-source-of-truth way of managing your full deployment, end-to-end. And if your deployment is made up of a standard load balancer (or an equivalent of one), a standard DNS, and prometheus or grafana, then you've either got one of these products or you've got an internal product that does the same thing, which is something I'm extremely skeptical of for the same reason as above - if your company was capable of creating a better solution to this standard problem, why wouldn't you be selling it? (And if an engineer was capable of creating a better solution to this standard problem, why would they work for you rather than one of the big cloud corps?)
In the same way I'm very skeptical of any company with an "internal cloud" - in my experience such a thing is usually a significantly worse implementation of AWS, and, yes, is usually held together with some flaky Perl scripts. Or an internal load balancer. It's generally NIH, or at best a cost-cutting exercise which tends to show; a company might have an internal cloud that's cheaper than AWS (I've worked for one), but you'll notice the cheapness.
Now again, if you really are gaining a competitive advantage from your things then it may make sense to not use a standard solution. But in that case you'll have something deeply integrated, i.e. monolithic, and that's precisely the case where you're not deploying separate standard DNS, separate standard load balancers, separate standard monitoring etc.. And in that case, as grandparent said, not using k8s makes total sense.
But if you're just deploying a standard Rails (or what have you) app with a standard database, load balancer, DNS, monitoring setup? Then 95% of the time your company can't solve that problem better than the companies that are dedicated to solving that problem. Either you don't have a solution at all (beyond doing it manually), you use k8s or similar, or you NIH it. Writing custom code to solve custom problems can be smart, but writing custom code to solve standard problems usually isn't.
> if your company was capable of creating a better solution to this standard problem, why wouldn't you be selling it?
Let's pretend I'm the greatest DevOps engineer ever, and I write a Kubernetes replacement that's 100x better. Since it's 100x better, I simply charge 100x as much per CPU/RAM as a Kubernetes license costs to 1,000 customers, take all of that money to the bank, and deposit my check for $0.
I don't disagree with the rest of the comment, but the market for the software to host a web app is a weird market.
> and I deposit my check for $0.
Given the number of Nomad fans that show up to every one of these threads, I don't think that's the whole story given https://www.hashicorp.com/products/nomad/pricing (and I'll save everyone the click: it's not $0)
Reasonable people can 100% disagree about approaches, but I don't think the TAM for "software to host a web app" is as small as you implied (although it certainly would be if we took your description literally)
fly.io, vercel, and heroku show you're right about the TAM for the broader problem, and that it's possible to capture some value somewhere, but that's a different beast entirely than just selling a standard solution to a standard problem.
Developers are a hard market to sell to, and deployment software is no exception.
> If you answer yes to many of those questions there's really no better alternative than k8s.
Nah, most of that list is basically free for any company that uses an Amazon load balancer and an autoscaling group. In terms of likelihood of incidents, time, and cost, each of those will be an order of magnitude higher with a team of kubernetes engineers than with a less complex setup.
Oz Nova nailed it nicely in "You Are Not Google"
https://blog.bradfieldcs.com/you-are-not-google-84912cf44afb
If you were Google k8s wouldn't cut it. I have experience of both options in multiple projects. Managing containers yourself and the surrounding infrastructure vs. using k8s. k8s just works. It's a mature ecosystem. It's really not as hard as people make of it. Replicating all the functionality of k8s and the ecosystem yourself is a ton more work.
There are definitely wide swaths of applications that don't need containers, don't need high availability, don't need load balancing, don't need monitoring, don't need any of this stuff or need some simpler subset. Then by all means don't use k8s, don't use containers, etc.
If I need "some" of the above, Kubernetes forces me to grapple with "all" of the above. I think that is the issue.
Containerization and orchestration of containers vs learning how to configure HaProxy, how to use Certbot, hmmmm
The questions you pose are legit skills web developers need to have. Nothing you mentioned is obviated by K8s or containerization.
"oh but you can get someone elses pre-configured image" uh huh... sure, you can also install malware. You will also need to one day maintain or configure the software running in them. You may even need to address issues your running software causes. You can't do that without mastering the software you are running!
On the other hand, my team slapped 3 servers down in a datacenter, had each of them configured in a Proxmox cluster within a few hours. Some 8-10 hours later we had a fully configured kubernetes cluster running within Proxmox VMs, where the VMs and k8s cluster are created and configured using an automation workflow that we have running in GitHub Actions. An hour or two worth of work later we had several deployments running on it and serving requests.
Kubernetes is not simple. In fact it's even more complex than just running an executable with your linux distro's init system. The difference in my mind is that it's more complex for the system maintainer, but less complex for the person deploying workloads to it.
And that's before exploring all the benefits of kubernetes-ecosystem tooling like the Prometheus operator for k8s, or the horizontally scalable Loki deployments, for centrally collecting infrastructure and application metrics, and logs. In my mind, making the most of these kinds of tools, things start to look a bit easier even for the systems maintainers.
Not trying to discount your workplace too much. But I'd wager there's a few people that are maybe not owning up to the fact that it's their first time messing around with kubernetes.
As long as your organisation can cleanly either a) split the responsibility for the platform from the responsibility for the apps that run on it, and fund it properly, or b) do the exact opposite and fold all the responsibility for the platform into the app team, I can see it working.
The problems start when you're somewhere between those two points. If you've got a "throw it over the wall to ops" type organisation, it's going to go bad. If you've got an underfunded platform team so the app team has to pick up some of the slack, it's going to go bad. If the app team have to ask permission from the platform team before doing anything interesting, it's going to go bad.
The problem is that a lot of organisations will look at k8s and think it means something it doesn't. If you weren't willing to fund a platform team before k8s, I'd be sceptical that moving to it is going to end well.
People really underestimate the power of shell scripts and ssh and trusted developers.
> People really underestimate the power of shell scripts and ssh and trusted developers.
On the other hand, you seem to be underestimating the fact that even the best, most trusted developer can make a mistake from time to time. It's no disgrace, it's just life.
Besides the fact that shell scripts aren't scalable (in terms of horizontal scalability, like the actor model), I would also like to point out that shell scripts should be simple; if you want to handle something that big, you are essentially using the shell as a programming language in disguise -- not ideal, and I would rather go with Go or Rust instead.
We don't live in 1999 any more. A big machine with a database can serve everyone in the US and I can fit it in my closet.
It's like people are stuck in the early 2000s when they start thinking about computer capabilities. Today I have more flops in a single GPU under my desk than did the worlds largest super computer in 2004.
> It's like people are stuck in the early 2000s when they start thinking about computer capabilities.
This makes sense, because the code people write makes machines feel like they're from the early 2000's.
This is partially a joke, of course, but I think there is a massive chasm between the people who think you immediately need several computers to do things for anything other than redundancy, and the people who see how ridiculously much you can do with one.
> It's like people are stuck in the early 2000s when they start thinking about computer capabilities.
Partly because the "cloud" makes all its money renting you 2010s-era hardware at inflated prices, and people are either too naive or their career is so invested in it that they can't admit to being ripped off and complicit of the scam.
That's what gets me about AWS.
When it came out in 2006 the m1.small was about what you'd get on a mid range desktop at that point. It cost $876 a year [0]. Today for an 8 core machine with 32 gb ram you'll pay $3145.19 [1].
It used to take 12-24 months for you to pay enough AWS bills that it would make sense to buy the hardware outright. Now it's 3 months or less for every category and people still defend this. For ML work stations it's weeks.
[0] https://aws.amazon.com/blogs/aws/dropping-prices-again-ec2-r...
[1] https://instances.vantage.sh/aws/ec2/m8g.2xlarge?region=us-e...
Hardware has gotten so much cheaper and easier and yet everyone is happy nobody has "raised prices".....
I added performance testing to all our endpoints from the start, so that people don’t start to normalize those 10s response times that our last system had (cry)
Well that's what happens when you move away from compiled languages to interpreted.
> Besides the fact that shell scripts aren't scalable…
What are you trying to say there? My understanding is that, way under the hood, a set of shell scripts is in fact enabling the scalable nature of… the internet.
...that's only for early internet, and the early internet is effing broken at best
> My understanding is that, way under the hood, a set of shell scripts is in fact enabling the scalable nature of… the internet.
I sure hope not. The state of error handling in shell scripts alone is enough to disqualify them for serious production systems.
If you're extremely smart and disciplined it's theoretically possible to write a shell script that handles error states correctly. But there are better things to spend your discipline budget on.
My half tongue-in-cheek comment was implying things like "you can't boot a linux/bsd box without shell scripts" which would make the whole "serving a website" bit hard.
I realize that there exist OSes that are an exception to this rule. I didn't understand the comment about scripts scaling. It's a script, it can do whatever you want.
Shell scripts don't scale up to implementing complex things, IME. If I needed to do something complex that had to be a shell script for some reason, I'd probably write a program to generate that shell script rather than writing it by hand, and I think many of those system boot scripts etc. are generated rather than written directly.
Are you self hosting kubernetes or running it managed?
I've only used it managed. There is a bit of a learning curve but it's not so bad. I can't see how it can take 4 months to figure it out.
We are using EKS
> I can't see how it can take 4 months to figure it out.
Well have you ever tried moving a company with a dozen services onto kubernetes piece-by-piece, with zero downtime? How long would it take you to correctly move and test every permission, environment variable, and issue you run into?
Then if you get a single setting wrong (e.g. memory size) and don't load-test with realistic traffic, you bring down production, potentially lose customers, and have to do a public post-mortem about your mistakes? [true story for current employer]
I don't see how anybody says they'd move a large company to kubernetes in such an environment in a few months with no screwups and solid testing.
Took us three-four years to go from self hosted multi-dc to getting the main product almost fully in k8s (some parts didn't make sense in k8s and were pushed to our geo-distributed edge nodes). Dozens of services and teams, and keeping the old stuff working while changing the tire on the car while driving. All while the company continues to grow and scale doubles every year or so. It takes maturity in testing and monitoring and it takes longer than everyone estimates.
It sounds like it's not easy to figure out the permissions, envvars, memory size, etc. of your existing system, and that's why the migration is so difficult? That's not really one of Kubernetes' (many) failings.
Yes, and now we are back at the ancestor comment’s original point: “at the end of the day kubernetes feels like complexity trying to abstract over complexity, and often I find that's less successful than removing complexity in the first place”
Which I understand to mean “some people think using Kubernetes will make managing a system easier, but it often will not do that”
Can you elaborate on other things you think Kubernetes gets wrong? Asking out of curiosity because I haven't delved deep into it.
It's good you asked, but I'm not ready to answer it in a useful way. It depends entirely on your use cases.
Some un-nuanced observations as starting points:
- Helm sucks, but so does Kustomize
- Cluster networking and security is annoying to set up
- Observability is awkward. Some things aren't exposed as cluster metrics/events, so you need to look at, say, service and pod state. It's not easy to see, e.g. how many times your app OOMed in the last hour.
- There's a lot of complexity you can avoid for a while, but eventually some "simple" use case will only be solvable that way, and now you're doing service meshes.
Maybe "wrong" is the wrong word, but there are spots that feel overkill, and spots that feel immature.
> - Helm sucks, but so does Kustomize
Helm != Kubernetes, FWIW
I'd argue that Kustomize is the bee's knees but editor support for it sucks (or, I'd also accept that the docs suck, and/or are missing a bazillion examples so us mere mortals could enlighten ourselves to what all nouns and verbs are supported in the damn thing)
> how many times your app OOMed in the last hour.
heh, I'd love to hear those "shell scripts are all I need" folks chime in on how they'd get metrics for such a thing :-D (or Nomad, for that matter)
That said, one of the other common themes in this discussion is how Kubernetes jams people up because there are a bazillion ways of doing anything, with wildly differing levels of "it just works" versus "someone's promo packet that was abandoned". Monitoring falls squarely in the bazillion-ways category, in that it for sure does not come batteries included but there are a lot of cool toys if one has the cluster headroom to install them
https://github.com/google/cadvisor/blob/v0.51.0/metrics/prom... which is allegedly exposed in kubelet since 2022 https://github.com/kubernetes/kubernetes/pull/108004 (I don't have a cluster in front of me to test it, though)
https://github.com/kubernetes/kube-state-metrics/blob/v2.14.... shows the metric that KSM exposes in kube_pod_container_status_terminated_reason
https://opentelemetry.io/docs/specs/semconv/attributes-regis... shows the OTel version of what I suspect is that same one
And then in the "boil the ocean" version one could egress actual $(kubectl get events -w) payloads if using something where one is not charged by the metric: https://github.com/open-telemetry/opentelemetry-collector-co...
It largely depends how customized each microservice is, and how many people are working on this project.
I've seen migrations of thousands of microservices happening within the span of two years. Longer timeline, yes, but the number of microservices is orders of magnitude larger.
Though I suppose the organization works differently at this level. The Kubernetes team build a tool to migrate the microservices, and each owner was asked to perform the migration themselves. Small microservices could be migrated in less than three days, while the large and risk-critical ones took a couple weeks. This all happened in less than two years, but it took more than that in terms of engineer/weeks.
The project was very successful though. The company spends way less money now because of the autoscaling features, and the ability to run multiple microservices in the same node.
Regardless, if the company is running 12 microservices and this number is expected to grow, this is probably a good time to migrate. How did they account for the different shape of services (stateful, stateless, leader elected, cron, etc), networking settings, styles of deployment (blue-green, rolling updates, etc), secret management, load testing, bug bashing, gradual rollouts, dockerizing the containers, etc? If it's taking 4x longer than originally anticipated, it seems like there was a massive failure in project design.
2000 products sounds like you made 2000 engineers learn kubernetes (a week, optimistically, 2000/52 = 38 engineer years, or roughly one wasted career).
Similarly, the actual migration times you estimate add up to decades of engineer time.
It’s possible kubernetes saves more time than using the alternative costs, but that definitely wasn’t the case at my previous two jobs. The jury is out at the current job.
I see the opportunity cost of this stuff every day at work, and am patiently waiting for a replacement.
> 2000 products sounds like you made 2000 engineers learn kubernetes (a week, optimistically, 2000/52 = 38 engineer years, or roughly one wasted career).
Learning k8s enough to be able to work with it isn't that hard. Have a centralized team write up a decent template for a CI/CD pipeline, a Dockerfile for the most common stacks you use, and a Helm chart with an example for a Deployment, PersistentVolumeClaim, Service and Ingress, distribute that, and be available for support when the need goes beyond "we need 1-N pods for this service, they get some environment variables from which they are configured, and maybe a Secret/ConfigMap if the application would rather have its configuration in files". That is enough in my experience.
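For the "environment variables plus maybe a Secret/ConfigMap" case, the template body is mostly boilerplate along these lines (a sketch; all names and the image are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: app
          image: registry.example.com/my-service:1.2.3   # placeholder
          envFrom:
            - configMapRef:
                name: my-service-config     # plain settings as env vars
            - secretRef:
                name: my-service-secrets    # credentials as env vars
          volumeMounts:
            - name: config-files
              mountPath: /etc/my-service    # for apps that want config files instead
      volumes:
        - name: config-files
          configMap:
            name: my-service-config
```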
> Learning k8s enough to be able to work with it isn't that hard.
I’ve seen a lot of people learn enough k8s to be dangerous.
Learning it well enough to not get wrapped around the axle with some networking or storage details is quite a bit harder.
For sure, but that's the job of a good ops department - where I work, for example, every project's CI/CD pipeline has its own IAM user mapping to a Kubernetes role that only has explicitly defined capabilities: create, modify and delete just the utter basics. Even if they'd commit something into the Helm chart that could cause an annoyance, the service account wouldn't be able to call the required APIs. And the templates themselves come with security built in - privileges are all explicitly dropped, pod UIDs/GIDs hardcoded to non-root, and we're deploying Network Policies at least for ingress as well now. Only egress network policies aren't in place yet; we haven't been able to make those work with services.
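Concretely, those guardrails boil down to a few lines in the templates -- a sketch with placeholder names; the UID and the ingress-controller namespace label are assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-service
  labels:
    app: my-service
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001        # hardcoded non-root UID, as described above
    runAsGroup: 10001
  containers:
    - name: app
      image: registry.example.com/my-service:1.2.3   # placeholder
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]     # privileges explicitly dropped
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: my-service-ingress-only
spec:
  podSelector:
    matchLabels:
      app: my-service
  policyTypes: [Ingress]    # ingress only; no egress policy, as noted above
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx   # assumes this namespace
      ports:
        - port: 8080
```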
Anyone wishing to do stuff like use the RDS database provisioner gets an introduction from us on how to use it and what the pitfalls are, and regular reviews of their code. They're flexible but we keep tabs on what they're doing, and when they have done something useful we aren't shy from integrating whatever they have done to our shared template repository.
> 2000 products sounds like you made 2000 engineers learn kubernetes (a week, optimistically, 2000/52 = 38 engineer years, or roughly one wasted career).
Not really, they only had to use the tool to run the migration and then validate that it worked properly. As the other commenter said, a very basic setup for kubernetes is not that hard; the difficult set up is left to the devops team, while the service owners just need to see the basics.
But sure, we can estimate it at 38 engineering years. That's still 38 years for 2,000 microservices; it's way better than 1 year for 12 microservices like in OP's case. Savings that we got was enough to offset these 38 years of work, so this project is now paying dividends.
Comparing the simplicity of two PHP servers against a setup with a dozen services is always going to be one sided. The difference in complexity alone is massive, regardless of whether you use k8s or not.
My current employer did something similar, but with fewer services. The upshot is that with terraform and helm and all the other yaml files defining our cluster, we have test environments on demand, and our uptime is 100x better.
Fair enough that sounds hard.
Memory size is an interesting example. A typical Kubernetes deployment has much more control over this than a typical non-container setup. It is costing you to figure out the right setting but in the long term you are rewarded with a more robust and more re-deployable application.
> has much more control over this than a typical non-container setup
Actually not true, k8s uses the exact same cgroups API for this under the hood that systemd does.
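A small illustration of that point: in Kubernetes the limit is declared per container and ends up as the same cgroup memory limit that, say, MemoryMax= in a systemd unit would set (values and names are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-service
spec:
  containers:
    - name: app
      image: registry.example.com/my-service:1.2.3   # placeholder
      resources:
        requests:
          memory: "256Mi"   # what the scheduler uses for bin-packing
        limits:
          memory: "512Mi"   # enforced via the memory cgroup; exceed it and the
                            # container is OOMKilled
```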
> I don't see how anybody says they'd move a large company to kubernetes in such an environment in a few months with no screwups and solid testing.
Unfortunately, I do. Somebody says that when the culture of the organization expects to be told and hear what they want to hear rather than the cold hard truth. And likely the person saying that says it from a perch up high and not responsible for the day to day work of actually implementing the change. I see this happen when the person, management/leadership, lacks the skills and knowledge to perform the work themselves. They've never been in the trenches and had to actually deal face to face with the devil in the details.
Canary deploy dude (or dude-ette), route 0.001% of service traffic and then slowly move it over. Then set error budgets. Then a bad service won't "bring down production".
That's how we did it at Google (I was part of the core team responsible for ad serving infra - billions of ads to billions of users a day)
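Not claiming this is how Google does it internally, but the same idea is expressible today with the Kubernetes Gateway API (a sketch, assuming an implementation such as Istio or Envoy Gateway is installed; names are placeholders) -- traffic splits by backend weight, so the canary can start at roughly 0.001%:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-service
spec:
  parentRefs:
    - name: public-gateway          # hypothetical Gateway
  rules:
    - backendRefs:
        - name: my-service-stable   # existing Service
          port: 80
          weight: 99999             # ~99.999% of requests
        - name: my-service-canary   # new version behind its own Service
          port: 80
          weight: 1                 # ~0.001%; ramp up while watching the error budget
```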
Using microk8s or k3s on one node works fine. As the author of "one big server," I am now working on an application that needs some GPUs and needs to be able to deploy on customer hardware, so k8s is natural. Our own hosted product runs on 2 servers, but it's ~10 containers (including databases, etc).
Yup, I like this approach a lot. With cloud providers considering VMs durable these days (they get new hardware for your VM if the hardware it's on dies, without dropping any TCP connections), I think a 1 node approach is enough for small things. You can get like 192 vCPUs per node. This is enough for a lot of small companies.
I occasionally try non-k8s approaches to see what I'm missing. I have a small ARM machine that runs Home Assistant and some other stuff. My first instinct was to run k8s (probably kind honestly), but didn't really want to write a bunch of manifests and let myself scope creep to running ArgoCD. I decided on `podman generate systemd` instead (with nightly re-pulls of the "latest" tag; I live and die by the bleeding edge). This was OK, until I added zwavejs, and now the versions sometimes get out of sync, which I notice by a certain light switch not working anymore. What I should have done instead was have some sort of git repository where I have the versions of these two things, and to update them atomically both at the exact same time. Oh wow, I really did need ArgoCD and Kubernetes ;)
I get by with podman by angrily ssh-ing in in my winter jacket when I'm trying to leave my house but can't turn the lights off. Maybe this can be blamed on auto-updates, but frankly anything exposed to a network that is out of date is also a risk, so, I don't think you can ever really win.
Yea but that doesn't sound shiny on your resume.
I never chose any single thing in my job just because of how it would look on my resume.
After 20+ years of Linux sysadmin/devops work, and because of a spinal disc herniation last year, I'm now looking for a job.
99% of job offers will ask for EKS/Kubernetes now.
It's like the VMware of the years 200[1-9], or like the "Cloud" of the years 201[1-9].
I've always specialized in physical datacenters and servers, being it on-premises, colocation, embedded, etc... so I'm out of the market now, at least in Spain (which always goes like 8 years behind the market).
You can try to avoid it, and it's nice when you save your company thousands of operational/performance/security/etc. issues and dollars across the years, and you look like a guru who stays ahead of industry issues in your boss's eyes, but it will make finding a job... 99% harder.
It doesn't matter if you demonstrate the highest level on Linux, scripting, ansible, networking, security, hardware, performance tuning, high availability, all kind of balancers, switching, routing, firewalls, encryption, backups, monitoring, log management, compliance, architecture, isolation, budget management, team management, provider/customer management, debugging, automation, programming full stack, and a long etc. If you say "I never worked with Kubernetes, but I learn fast", with your best sincerity at the interview, then you're automatically out of the process. No matter if you're talking with human resources, a helper of the CTO, or the CTO. You're out.
If you say "I never worked with X, but I learn fast", with your best sincerity at the interview, then you're automatically out of the process.
Where X can be not just k8s but any other bullet point on the job req.
It's interesting that the very things that people used to say to get the job 20 years ago -- and not as a platitude (it's a perfectly reasonable and intelligent thing to say, and in a rational world, exactly what one would hope to hear from a candidate) -- are now considered red flags that immediately disqualify one for the job.
Very sorry to hear about your current situation - best of luck.
I've never heard of this - has this been your direct experience?
It's somewhat speculative (because no one ever tells you the reason for dropping your application or not contacting you in the first place) but the impression I have, echoed by what many others seem to be saying, is that the process has shifted greatly from "Is this a strong, reliable, motivated person?" (with toolchain overlap being mostly gravy) to "Do they have 5-8 recent years of X, Y and Z?".
As if years of doing anything is a reliable predictor of anything, or can even be effectively measured.
Depends on what kind of company you want to join. Some value simplicity and efficiency more.
I think porting to k8s can succeed or fail, like any other project. I switched an app that I alone worked on, from Elastic Beanstalk (with Bash), to Kubernetes (with Babashka/Clojure). It didn't seem bad. I think k8s is basically a well-designed solution. I think of it as a declarative language which is sent to interpreters in k8s's control plane.
Obviously, some parts of it took a while to figure out. For example, I needed to figure out an AWS security group problem with Ingress objects, that I recall wasn't well-documented. So I think parts of that declarative language can suck, if the declarative parts aren't well factored-out from the imperative parts. Or if the log messages don't help you diagnose errors, or if there isn't some kind of (dynamic?) linter that helps you notice problems quickly.
In your team's case, more information seems needed to help us evaluate the problems. Why was it easier before to make testing environments, and harder now?
So, my current experience somewhere most old apps are very old school:
- most server software is waaaaaaay out of date so getting a dev / test env is a little harder (like the last problem we got was the HAProxy version does not do ECDSA keys for ssl certs, which is the default with certbot)
- yeah, pushing to prod is "easy": FTP directly. But now which version of which files are really in prod? No idea. Yeah, when I say old school it's old school before things like Jenkins.
- need something done around the servers? That's the OPS team's job. Team which also has too much different work to do, so now you'll have to wait a week or two for this simple "add an upload file" endpoint to this old API because you need somewhere to put those files.
Now we've started setting up some on-prem k8s nodes for the new developments. Not because we need crazy scaling but so the dev team can do most OPS they need. It takes time to have everything setup but once it started chugging along it felt good to be able to just declare whatever we need and get it. You still need to get the devs to learn k8s which is not fun but that's the life of a dev: learning new things every day.
Also k8s does not do data. You want a database or anything managing files: you want to do most of the job outside k8s.
Kubernetes is so easy that you only need two or three dedicated full-time employees to keep the mountains of YAML from collapsing in on themselves before cutting costs and outsourcing your cluster management to someone else.
Sure, it can be easy, just pick one of the many cloud providers that fix all the complicated parts for you. Though, when you do that, expect to pay extra for the privilege, and maybe take a look at the much easier proprietary alternatives. In theory the entire thing is portable enough that you can just switch hosting providers, in practice you're never going to be able to do that without seriously rewriting part of your stack anyway.
The worst part is that the mountains of YAML were never supposed to be written by humans anyway, they're readable configuration your tooling is supposed to generate for you. You still need your bash scripts and your complicated deployment strategies, but rather than using them directly you're supposed to compile them into YAML first.
Kubernetes is nice and all but it's not worth the effort for the vast majority of websites and services. WordPress works just fine without automatic replication and end-to-end microservice TLS encryption.
I went down the Kubernetes path. The product I picked 4 years ago is no longer maintained :(
The biggest breaking change to docker compose since it was introduced was that the docker-compose command stopped working and I had to switch to «docker compose» with a space. Had I stuck with docker and docker-compose I could have trivially kept everything up to date and running smoothly.
I ran a small bootstrapped startup, I used GKE. Everything was templated.
each app has its own template, e.g. nodejs-worker, and you don't change the template unless you really need to.
I spent ~2% of my manager + eng leader + hiring manager + god-knows-what-else-people-do-at-a-startup time on managing 100+ microservices, because they were templated.
We may have a different understanding of "small" if you say 100+ services.
Did each employee have 2 to 3 services to maintain? If so, that sounds like an architectural mistake to me.
That works great until you want to change something low-level and have to apply it to all those templates.
That's when you go a level deeper and have every template use another template (e.g. a Helm subchart or Helm library) only to realize scoping and templating is completely fucked in Helm.
Nonono, you missed the part where you add Infra as Code, which requires another 2-3 full-time yaml engineers to be really vendor agnostic.
> to be really vendor agnostic.
This is an anti-pattern in my opinion. If you're on cloud provider A, might as well just write code for cloud provider A. If and when you'll be asked to switch to B you'll change the code to work on both A and B.
This is so unnuanced that it reads like rationalization to me. People seem to get stuck on mantras that simple things are inherently fragile which isn't really true, or at least not particularly more fragile than navigating a jungle of yaml files and k8s cottage industry products that link together in arcane ways and tend to be very hard to debug, or just to understand all the moving parts involved in the flow of a request and thus what can go wrong. I get the feeling that they mostly just don't like that it doesn't have professional aesthetics.
This reminds me of the famous Taco Bell Programming post [1]. Simple can surprisingly often be good enough.
[1] http://widgetsandshit.com/teddziuba/2010/10/taco-bell-progra...
woah, that's a fantastic blog post!
"""After all, functionality is an asset, but code is a liability."""
I say this all the time! But i've not heard others saying this also. Great to see some like minded developers!!
> People seem to get stuck on mantras that simple things are inherently fragile which isn't really true...
Ofc it isn't true.
Kubernetes was designed at Google at a time when Google was already a behemoth. 99.99% of all startups and SMEs out there shall never ever have the same scaling issues and automation needs that Google has.
Now that said... When you begin running VMs and containers, even only a very few of them, you immediately run into issues and then you begin to think: "Kubernetes is the solution". And it is. But it is also, in many cases, a solution to a problem you created. Still... the justification for creating that problem, if you're not Google scale, is highly disputable.
And, deep down, there's another very fundamental issue IMO: many of those "let's have only one process in one container" solutions actually mean "we're totally unable to write portable software working on several configs, so let's start with a machine with zero libs and dependencies and install exactly the minimum deps needed to make our ultra-fragile piece of shit of a software kinda work. And because it's still going to be a brittle piece of shit, let's make sure we use heartbeats and try to shut it down and back up again once it'll invariably have memory leaked and/or whatnots".
Then you also gained the right to be sloppy in the software you write: not respecting it. Treating it as cattle to be slaughtered, so it can be shitty. But you've now added an insane layer of complexity.
How do you like your uninitialized var when a container launches but then silently doesn't work as expected? How do you like them logs in that case? Someone here has described the lack of instant failure on any uninitialized var as the "billion dollar mistake of the devops world".
Meanwhile look at some proper software like, say, the Linux kernel or a distro like Debian. Or compile Emacs or a browser from source and marvel at what's happening. Sure, there may be hiccups but it works. On many configs. On many different hardware. On many different architectures. These are robust software that don't need to be "pid 1 on a pristine filesystem" to work properly.
In a way this whole "let's have all our software each as pid 1 each on a pristine OS and filesystem" is an admission of a very deep and profound failure of our entire field.
I don't think it's something to be celebrated.
And don't get me started on security: you now have ultra complicated LANs and VLANs, with near impossible to monitor traffic, with shitloads of ports open everywhere, the most gigantic attack surface of them all, and heartbeats and whatnots constantly polluting the network, where nobody even knows anymore what's going on. Where the only actual security seems to rely on the firewall being up and correctly configured, which is incredibly complicated to do given the insane network complexity you added to your stack. "Oh wait, I have an idea, let's make configuring the firewall a service!" (and make sure not to forget to initialize one of the countless vars or it'll all silently break and just not be configuring firewalling for anything).
Now though, love is true love: even at home I'm running a hypervisor with VMs and OCI containers ; )
> Meanwhile look at some proper software like, say, the Linux kernel or a distro like Debian. Or compile Emacs or a browser from source and marvel at what's happening. Sure, there may be hiccups but it works. On many configs. On many different hardware. On many different architectures. These are robust software
Lol no. The build systems flake out if you look at them funny. The build requirements are whatever Joe in Nebraska happened to have installed on his machine that day (I mean sure there's a text file supposedly listing them, but it hasn't been accurate for 6 years). They list systems that they haven't actually supported for years, because no-one's actually testing them.
I hate containers as much as anyone, but the state of "native" unix software is even worse.
+1 for talking about attack surface. Every service is a potential gateway for bad people. Locking them all down is incredibly difficult to get right.
99.99% of startups and SMEs should not be writing microservices.
But "I wrote a commercial system that served thousands of users, it ran on a single process on a spare box out the back" doesn't look good on resumes.
I love that the only alternative is a "pile of shell scripts". Nobody has posted a legitimate alternative to the complexity of K8S or the simplicity of docker compose. Certainly feels like there's a gap in the market for an opinionated deployment solution that works locally and on the cloud, with less functionality than K8S and a bit more complexity than docker compose.
K8s just drowns out all other options. Hashicorp Nomad is great, https://www.nomadproject.io/
I am puzzled by the fact that no successful forks of Nomad and Consul have emerged since the licence change and acquisition of Hashicorp.
If you need a quick scheduler, orchestrator and services control pane without fully embracing containers you might soon be out of luck.
Nomad was amazing at every step of my experiments on it, except one. Simply including a file from the Nomad control to the Nomad host is... impossible? I saw indications of how to tell the host to get it from a file host, and I saw people complaining that they had to do it through the file host, with the response being security (I have thoughts about this and so did the complainants).
I was rather baffled to an extent. I was just trying to push a configuration file that would be the primary difference between a couple otherwise samey apps.
https://github.com/hashicorp/nomad/blob/v1.6.0/website/conte... seems to have existed since before the license rug-pull. However I'm open to there being some miscommunication because https://developer.hashicorp.com/nomad/docs/glossary doesn't mention the word "control" and the word "host" could mean any number of things in this context
+1 to miscommunication, but host_volume is indeed what I’ve used to allow host files into the chroot. Not all drivers support it, and there are some nomad config implications, but it otherwise works great for storing db’s or configurations.
Thumbs up for Nomad. We've been running it for about 3 years in prod now and it hasn't failed us a single time.
I coined a term for this because I see it so often.
“People will always defend complexity, stating that the only alternative is shell scripts”.
I saw people defending docker this way, ansible this way and most recently systemd this way.
Now we’re on to kubernetes.
>and most recently systemd this way.
To be fair, most people attacking systemd say they want to return to shell scripts.
No, there are alternatives like runit and SMF that do not use shell scripts.
Its conveniently ignored by systemd-supporters and the conversation always revolves around the fact that we used to use shell scripts. Despite the fact that there are sensible inits that predate systemd that did not use shell languages.
Hey, systemd supporter here and yes, I do ignore runit and SMF.
systemd is great and has essentially solved the system management problem once and for all. Its license is open enough not to worry about it.
SMF is proprietary oracle stuff.
Runit... tried a few years ago on void linux (I think?) and was largely unimpressed.
Runit absolutely uses shell scripts. All services are started via a shell script that exec's the final process with the right environment / arguments. If you use runit as your system init, the early stages are also shell scripts.
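For reference, a typical runit service is just a directory with an executable ./run script; something like this (paths, user, and flags are made up for illustration):

  #!/bin/sh
  # /etc/sv/myapp/run -- runit supervises this; no PID files, no daemonizing
  exec 2>&1
  exec chpst -u appuser /usr/local/bin/myapp --config /etc/myapp/config.toml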
At least I never saw anyone arguing that the only alternative to git was shell scripts.
Wait. Wouldn't that be a good idea?
Kamal was also built with that purpose in mind.
This looks cool and +1 for the 37Signals and Basecamp folks. I need to verify that I'll be able to spin up GPU enabled containers, but I can't imagine why that wouldn't work...
Docker Swarm is exactly what tried to fill that niche. It's basically an extension to Docker Compose that adds clustering support and overlay networks.
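The jump from Compose really is small; roughly this (untested sketch, addresses and names made up):

  docker swarm init                                         # on the first manager node
  docker swarm join --token <worker-token> 10.0.0.1:2377    # on each additional node
  docker stack deploy -c docker-compose.yml myapp           # reuse the existing compose file
  docker service scale myapp_web=3                          # spread replicas across the cluster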
Docker Swarm is a good idea that sorely needs a revival. There are lots of places that need something more structured than a homemade deploy.sh, but less than... K8s.
Completely anecdotal, but I see more and more people using it in /r/selfhosted.
This is basically exactly what we needed at the start up I worked at, with the added need of being able to host open source projects (airbyte, metabase) with a reasonable level of confidence.
We ended up migrating from Heroku to Kubernetes. I tried to take some of the learnings to build https://github.com/czhu12/canine
It basically wraps Kubernetes and tries to hide as much complexity from Kubernetes as possible, and only expose the good parts that will be enough for 95% of web application work loads.
I've personally been investing heavily in [Incus](https://linuxcontainers.org/incus/), which is the Linux Containers project fork and continuation of LXD post Canonical takeover of the LXD codebase. The mainline branch has been seeing some rapid growth, with the ability to deploy OCI Application Containers in addition to the System containers (think Xen paravirtualized systems if you know about those) and VMs, complete with clustering and SDN. There's work by others in the community to create [incus-compose](https://github.com/bketelsen/incus-compose), a way to use Compose spec manifests to define application stacks. I'm personally working on middleware to expose instance options under the user keyspace to a Redis API compliant KV store for use with Traefik as an ingress controller.
Too much to go into with what Incus does to tell you everything in a comment, but for me, Incus really feels like the right level of "old school" infrastructure platform tooling with "new school" cloud tech to deploy and manage application stacks, the odd Windows VM that accounting/HR/whoever needs to do that thing that can't be done anywhere else, and a great deal more.
For others interested in such things, colima also supports it: https://github.com/abiosoft/colima/tree/v0.8.0#incus
While not opinionated but you can go with cloud specific tools (e.g. ECS in AWS).
Sure, but those don't support local deployment, at least not in any sort of easy way.
That very much depends on what you're doing. ECS works great if your developers can start a couple of containers, but if they need a dense thicket of microservices and cloud infrastructure, you're probably going to need remote development environments once you outgrow what you can do with localstack. But that's not really something Kubernetes fixes, and it really means you want to reconsider your architecture.
I remember using shell scripts to remove some insane node/js-brain-thonk hints; it was easier than trying to reverse engineer how the project was supposed to be "compiled" to properly use those hints.
Docker Swarm mode? I know it’s not as well maintained, but I think it’s exactly what you talk about here (forget K3s, etc). I believe smaller companies run it still and it’s perfect for personal projects. I myself run mostly docker compose + shell scripts though because I don’t really need zero-downtime deployments or redundancy/fault tolerance.
Somebody gave me the advice that we shouldn't start our new project on k8s, but should instead adopt it only after its value became apparent.
So we started by using docker swarm mode for our dev env, and made it all the way to production using docker swarm. Still using it happily.
I hate to shill my own company, but I took the job because I believe in it.
You should check out DBOS and see if it meets your middle ground requirements.
Works locally and in the cloud, has all the things you’d need to build a reliable and stateful application.
[0] https://dbos.dev
Looking at your page, it looks like Lambdas/Functions but on your system, not Amazon/Microsoft/Google.
Every company I've ever had try to do this has ended in crying after some part of the system doesn't fit neatly into the Serverless box and it becomes painful to extract from your system into "Run FastAPI in containers."
We run on bare metal in AWS, so you get access to all your other AWS services. We can also run on bare metal in whatever cloud you want.
Sure, but I'm still wrapped around your library, no? So if your "Process Kafka events" decorator in Python doesn't quite do what I need, I'm forced to grab the Kafka library, write my code, and then learn to build my own container since I assume you were handling the build part. Finally, figure out which of the 17 ways to run containers on AWS (https://www.lastweekinaws.com/blog/the-17-ways-to-run-contai...) is proper for me and away I go?
That's my SRE recommendation of "These serverless are a trap, it's quick to get going but you can quickly get locked into a bad place."
No, not at all. We run standard python, so we can build with any kafka library. Our decorator is just a subclass of the default decorator to add some kafka stuff, but you can use the generic decorator around whatever kafka library you want. We can build and run any arbitrary Python.
But yes, if you find there is something you can't do, you would have to build a container for it or deploy it to an instance however you want. Although I'd say that most likely we'd work with you to make whatever it is you want to do possible.
I'd also consider that an advantage. You aren't locked into the platform, you can expand it to do whatever you want. The whole point of serverless is to make most things easy, not all things. If you can get your POC working without doing anything, isn't that a great advantage to your business?
Let's be real, if you start with containers, it will be a lot harder to get started and then still hard to add whatever functionality you want. Containers doesn't really make anything easier, it just makes things more consistent.
Nice, but I like my servers and find serverless difficult to debug.
That's the beauty of this system. You build it all locally, test it locally, debug it locally. Only then do you deploy to the cloud. And since you can build the whole thing with one file, it's really easy to reason about.
And if somehow you get a bug in production, you have the time travel debugger to replay exactly what the state of the cloud was at the time.
Great to hear you've improved serverless debugging. What if my endpoint wants to run ffmpeg and extract frames from video. How does that work on serverless?
That particular use case requires some pretty heavy binaries and isn't really suited to serverless. However, you could still use DBOS to manage chunking the work and managing to workflows to make sure every frame is only processed once. Then you could call out to some of the existing serverless offerings that do exactly what you suggest (extract frames from video).
Or you could launch an EC2 instance that is running ffmpeg and takes in videos and spits out frames, and then use DBOS to manage launching and closing down those instances as well as the workflows of getting the work done.
Looks interesting, but this is a bit worrying:
> ... build reliable AI agents with automatic retries and no limit on how long they can run for.
It's pretty easy to see how that could go badly wrong. ;) (And yeah, obviously "don't deploy that stuff" is the solution.)
---
That being said, is it all OSS? I can see some stuff here that seems to be, but it mostly seems to be the client side stuff?
Maybe that is worded poorly. :). It's supposed to mean there are no timeouts -- you can wait as long as you want between retries.
> That being said, is it all OSS?
The Transact library is open source and always will be. That is what gets you the durability, statefulness, some observability, and local testing.
We also offer a hosted cloud product that adds in the reliability, scalability, more observability, and a time travel debugger.
Capistrano, Ansible et al. have existed this whole time if you want to do that.
The real difference in approaches is between short lived environments that you redeploy from scratch all the time and long lived environments we nurse back to health with runbooks.
You can use lambda, kube, etc. or chef, puppet etc. but you end up at this same crossroad.
Just starting a process and keeping it alive for a long time is easy to get started with but eventually you have to pay the runbook tax. Instead you could pay the kubernetes tax or the nomad tax at the start instead of the 12am ansible tax later.
Powershell or even TypeScript are better suited for deploying stuff, but for some reason the industry sticks to bash and python spaghetti.
> some reason
Probably because except in specific niche industries, every Linux box you ever experience is extremely likely to have Bash and Python installed.
Also, because Powershell is hideously verbose and obnoxious, and JS and its ilk belong on a frontend, not running servers.
> The inscrutable iptables rules?
You mean the list of calls right there in the shell script?
> Who will know about those undocumented sysctl edits you made on the VM?
You mean those calls to `sysctl` conveniently right there in the shell script?
> your app needs to programmatically spawn other containers
Or you could run a job queue and push tasks to it (gaining all the usual benefits of observability, concurrency limits, etc), instead of spawning ad-hoc containers and hoping for the best.
"We don't know how to learn/read code we are unfamiliar with... Nor do we know how to grok and learn things quickly. Heck, we don't know what grok means "
Who do you quote?
This quote mostly applies to people who don't want to spend the time learning existing tooling, making improvements and instead create a slightly different wheel but with different problems. It also applies to people trying to apply "google" solutions to a non-google company.
Kubernetes and all tooling in the cloud native computing foundation(CNCF) were created to have people adopt the cloud and build communities that then created jobs roles that facilitated hiring people to maintain cloud presences that then fund cloud providers.
This is the same playbook that Microsoft ran at universities. They would give the entire suite of tools in the MSDN library away, and then roughly 4 years later collect when another seat needs to be purchased for a new hire who has only used Microsoft tools for the last 4 years.
> You mean the list of calls right there in the shell script?
This is about the worst encoding for network rules I can think of.
Worse than yaml generated by string interpolation?
You'd have to give me an example. YAML is certainly better at representing tables of data than a shell script is.
Not entirely a fair comparison, but here. Can you honestly tell me you'd take the yaml over the shell script?
(If you've never had to use Helm, I envy you. And if you have, I genuinely look forward to you showing me an easier way to do this, since it would make my life easier.)
-------------------------------------
Shell script:

  iptables -A INPUT -p tcp --dport 8080 -j ACCEPT

Multiple ports:

  for port in 80 443 8080; do
    iptables -A INPUT -p tcp --dport "$port" -j ACCEPT
  done

Easy and concise.
-------------------------------------
Kubernetes (disclaimer: untested, obviously):

  apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    name: allow-port-8080
  spec:
    podSelector:
      matchLabels:
        app.kubernetes.io/name: my-app
    policyTypes:
      - Ingress
    ingress:
      - ports:
          - port: 8080
            protocol: TCP

Multiple ports (just the changed part):

    ingress:
      - ports:
          - port: 80
            protocol: TCP
          - port: 443
            protocol: TCP
          - port: 8080
            protocol: TCP

And the Helm-templated version:

  apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    name: firewall
  spec:
    podSelector:
      matchLabels:
        app.kubernetes.io/name: my-app
    policyTypes:
      - Ingress
    ingress:
      - ports:
        {{- range .Values.firewall.ports }}
          - port: {{ .port }}
            protocol: {{ .protocol }}
        {{- end }}
I don't know why on earth you'd use mustache with yaml, but the unmustached yaml is much more readable. The reviewer doesn't even need to know iptables. (Which is good; i've only ever worked with nftables (which has the same issue of leaning in to serializing tables as commands) and pf.) Concision is not working in your favor here.
I would take the YAML any day.
Because if one of those iptables calls above fails, you're in an inconsistent state.
Also if I want to swap from iptables to something like Istio then it's basically the same YAML.
> Because if one of those iptables fails above you're in an inconsistent state.
These days iptables is a legacy interface implemented on top of nftables. And nftables does provide atomic rule replacement: https://wiki.nftables.org/wiki-nftables/index.php/Atomic_rul...
So you would have a file with something like:

  table inet filter {
    chain input {
      type filter hook input priority 0;
      tcp dport 8080 accept
    }
  }

Then you would atomically apply it with:

  $ nft -f input_file
You obviously didn't use k8s (or k3s or any other implementation) a lot, because it also messed up iptables randomly sometimes due to bugs, version mismatches, etc.
I have been using Kubernetes for the last decade across multiple implementations.
Never had an iptables issue, and these days eBPF is the standard.
Everyone is talking about Kubernetes as if it were merely a "hyperscaler" thing, when the biggest benefit of Kubernetes over a bunch of custom scripts is consistency and the ability to have everyone work on an industry standard, which makes it easier to onboard new hires, to write scripts and documentation against, etc…
Repeat with me: K8s is but an API.
I'm giggling at the idea you'd need Kubernetes for a mere two servers. We don't run any application with less than two instances for redundancy.
We've just never seen the need for Kubernetes. We're not against it as much as the need to replace our working setup just never arrived. We run EC2 instances with a setup shell script under 50loc. We autoscale up to 40-50 web servers at peak load of a little over 100k concurrent users.
Different strokes for different folks but moreso if it ain't broke, don't fix it
It's a highly amateurish take to call shell spaghetti a Kubernetes, especially if we compare the complexity of both...
You know what would be even worse? Introducing Kubernetes for your non-Google/Netflix/WhateverPlanetaryScale app instead of just writing a few scripts...
Hell, I’m a fan of k8s even for sub-planetary scale (assuming that scale is ultimately a goal of your business, it’s nice to build for success). But I agree that saying “well, it’s either k8s or you will build k8s yourself” is just ignorant. There are a lot of options between the two poles that can be both cheap and easy and offload the ugly bits of server management for the right price and complexity that your business needs.
Both this piece and the piece it’s imitating seem to have 2 central implicit axioms that in my opinion don’t hold. The first, that the constraints of the home grown systems are all cost and the second that the flexibility of the general purpose solution is all benefit.
You generally speaking do not want a code generation or service orchestration system that will support the entire universe of choices. You want your programs and idioms to follow similar patterns across your codebase and you want your services architected and deployed the same way. You want to know when outliers get introduced and similarly you want to make it costly enough to require introspection on if the value of the benefit out ways the cost of oddity.
The compiler one read to me like a reminder to not ignore the lessons of compiler design. The premise being that even though you have small scope project compared to a "real" compiler, you will evolve towards analogues of those design ideas. The databases and k8s pieces are more like don't even try a small scope project because you'll want the same features eventually.
I suppose I can see how people are taking this piece that way, but I don't see it like that. It is snarky and ranty, which makes it hard to express or perceive nuance. They do explicitly acknowledge that "a single server can go a long way" though.
I think the real point, better expressed, is that if you find yourself building a system with like a third of the features of K8s but composed of hand-rolled scripts and random third-party tools kludged together, maybe you should have just bit the bullet and moved to K8s instead.
You probably shouldn't start your project on it unless you have a dedicated DevOps department maintaining your cluster for you, but don't be afraid to move to it if your needs start getting more complex.
Author here. Yes there were many times while writing this that I wanted to insert nuance, but couldn't without breaking the format too much.
I appreciate the wide range of interpretations! I don't necessarily think you should always move to k8s in those situations. I just want people to not dismiss k8s outright for being overly-complex without thinking too hard about it. "You will evolve towards analogues of those design ideas" is a good way to put it.
That's also how I interpreted the original post about compilers. The reader is stubbornly refusing to acknowledge that compilers have irreducible complexity. They think they can build something simpler, but end up rediscovering the same path that led to the creation of compilers in the first place.
I had a hard time putting my finger on what was so annoying about the follow-ons to the compiler post, and this nails it for me. Thanks!
> You generally speaking do not want a code generation or service orchestration system that will support the entire universe of choices.
This. I will gladly give up the universe of choices for a one size fits most solution that just works. I will bend my use cases to fit the mold if it means not having to write k8s configuration in a twisty maze of managed services.
I like to say, you can make anything look good by considering only the benefits and anything look bad by considering only the costs.
It's a fun philosophy for online debates, but an expensive one to use in real engineering.
outweighs*
Only offering the correction because I was confused at what you meant by “out ways” until I figured it out.
The elephant in the room: People who have gotten over the K8s learning curve almost all tell you it isn't actually that bad. Most people who have not attempted the learning curve, or have just dipped their toe in, will tell you they are scared of the complexity.
An anecdotal datapoint: My standard lecture teaching developers how to interact with K8s takes almost precisely 30 minutes to have them writing Helm charts for themselves. I have given it a whole bunch of times and it seems to do the job.
> People who have gotten over the K8s learning curve almost all tell you it isn't actually that bad.
I have been using K8s for nearly a decade. I use it both professionally and personally. I chose to use it personally. I appreciate why it exists, and I appreciate what it does. I believe that I have gotten over the learning curve.
And I will tell you: it really is that bad. I mean, it’s not worse than childhood cancer. But it is a terrible, resource-heavy, misbegotten system. Its data structures are diseased. Its architecture is baroque. It is a disaster.
But it’s also useful, and there is currently no real alternative. There really should be, though. I strongly believe that there can be, and I hope that there will be.
The first time someone started writing templated YAML should have been the moment of clarity.
> My standard lecture teaching developers how to interact with K8s takes almost precisely 30 minutes to have them writing Helm charts for themselves
And I can teach someone to write "hello world" in 10 languages in 30 minutes, but that doesn't mean they're qualified to develop or fix production software.
One has to start from somewhere I guess. I doubt anyone would learn K8s thoroughly before getting any such job. Tried once and the whole thing bored me out in the fourth video.
I personally know many k8s experts that vehemently recommend against using it unless you have no other option.
Dear Amazon Elastic Beanstalk, Google App Engine, Heroku, Digital Ocean App Platform, and friends,
Thank you for building "a kubernetes" for me so I don't have to muck with that nonsense, or have to hire people that do.
I don't know what that other guy is talking about.
Big fan of Digital Ocean App Platform here. Deploying, running and scaling containers is so easy and their very good abstractions and tooling are worth the money.
Much like Javascript, the problem isn't Kubernetes, its the zillions of half-tested open-source libraries that promise to make things easier but actually completely obfuscate what the system is doing while injecting fantastic amounts of bugs.
Author presumes we need Docker in the first paragraph. I suppose this is why they think we need Kubernetes. I propose using "the operating system" as the basic-unit here. It already runs on shared hardware thanks to a hypervisor. Operating systems know how to network.
The entire desire for Docker in a production app comes down to willful ignorance of how software one depends upon is configured.
Most of the complaints in this fun post are just bad practice, and really nothing to do with “making a Kubernetes”.
Sans bad engineering practices, if you built a system that did the same things as kubernetes I would have no problem with it.
In reality I don’t want everybody to use k8s. I want people finding different solutions to solve similar problems. Homogenized ecosystems create walls that block progress.
One of the big things that is overlooked when people move to k8s, and why things get better when moving to k8s, is that k8s made a set of rules that forced service owners to fix all of their bad practices.
Most deployment systems would work fine if the same work to remove bad practices from their stack occurred.
K8s is the hot thing today, but mark my words, it will be replaced with something far more simple and much nicer to integrate with. And this will come from some engineer “creating a kubernetes”
Don’t even get me started on how crappy the culture of “you are doing something hard that I think is already a solved problem” is. This goes for compilers and databases too. None of these are hard, and neither is k8s, and any good engineer tasked with making one should be able to do so.
I welcome a k8s replacement! Just how there are better compilers and better databases than we had 10-20 years ago, we need better deployment methods. I just believe those better methods came from really understanding the compilers and databases that came before, rather than dismissing them out of hand.
Can you give examples of what "bad practices" does k8s force to fix?
To name a few:
K8s really kills the urge to say “oh well I guess we can just drop that file onto the server as a part of startup rather than use a db/config system/etc.” No more “oh shit, the VM died and we lost the file that was supposed to be static except for that thing John wrote to update it only if X happened, but now X happens every day and the file is gone”.. or worse: it’s in git but now you have 3 different versions that have all drifted due to the John code change.
K8s makes you use containers, which makes you not run things on your machine, which makes you better at CI, which.. (the list goes on, containers are industry standard for a lot of reasons). In general the 12 Factor App is a great set of ideas, and k8s lets you do them (this is not exclusive, though). Containers alone are a huge game changer compared to “cp a JAR to the server and restart it”
K8s makes it really really really easy to just split off that one weird cronjob part of the codebase that Mike needed and man, it would be really nice to just use the same code and dependencies rather than boilerplating a whole new app and deploy, CI, configs, and yamls to make that run. See points about containerization.
K8s doesn’t assume that your business will always be a website/mobile app. See the whole “edge computing” trend.
I do want to stress that k8s is not the only thing in the world that can do these or promote good development practices, and I do think it’s overkill to say that it MAKES you do things well - a foolhardy person can mess any well-intentioned system up.
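To make the cronjob point above concrete, the split-off is roughly a dozen lines reusing the image you already ship; a rough, untested sketch with made-up names, schedule, and entry point:

  kubectl apply -f - <<'EOF'
  apiVersion: batch/v1
  kind: CronJob
  metadata:
    name: mikes-nightly-report            # hypothetical
  spec:
    schedule: "0 3 * * *"
    jobTemplate:
      spec:
        template:
          spec:
            restartPolicy: OnFailure
            containers:
              - name: report
                image: registry.example.com/backend:latest    # same image the main app already uses
                command: ["python", "-m", "reports.nightly"]  # hypothetical entry point
  EOF

No new repo, no new CI pipeline, no new deploy scripts for Mike's one weird job.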
So you're saying companies should move to k8s and then immediately move to bash scripts
No. I am saying that companies should have their engineers understand why k8s works and make those reasons an engineering practice.
As it is today, the pattern is: spend a ton of money moving to k8s (mostly costly managed solutions), fixing all the bad engineering patterns along the way because k8s forces you to. Then have an engineer save the company money by moving back to a more home-grown solution, one that fits the company's needs and saves money, something that would only be possible once the engineering practices were fixed.
I think one thing that is under appreciated with kubernetes is how massive the package library is. It becomes trivial to stand up basically every open source project with a single command via helm. It gets a lot of hate but for medium sized deployments, it’s fantastic.
Before helm, just trying to run third party containers on bare metal resulted in constant downtime when the process would just hang for no reason, and an engineer would have to SSH in and manually restart the instance.
We used this at a previous start up to host metabase, sentry and airbyte seamlessly, on our own cluster. Which let us break out of the constant price increases we faced for hosted versions of these products.
Shameless plug: I’ve been building https://github.com/czhu12/canine to try to make Kubernetes easier to use for solo developers. Would love any feedback from anyone looking to deploy something new to K8s!
Right, but this isn't a post about why K8s is good, it's a post about why K8s is effectively mandatory, and it isn't, which is why the post rankles some people.
Exactly, an alternative reading here is "metabase, sentry and airbyte are so complicated to self-host you'll need Kubernetes for it".
Yeah, I mostly agree. I'd add that even K8s YAMLs are not trivial to maintain, especially if you need them to be produced by a templating engine.
They become trivial once you stop templating them with a text templating engine.
They are serialized JSON objects; the YAML is there just because raw JSON is not user friendly when you need something done quick and dirty or want to include comments.
Proper templating should never use text templating on manifests.
Kubernetes biggest competitor isn’t a pile of bash scripts and docker running on a server, it’s something like ECS which comes with a lot of the benefits but a hell of a lot less complexity
FWIW I’ve been using ECS at my current work (previously K8s) and to me it feels just flat worse:
- only some of the features
- none of the community
- all of the complexity but none of the upsides.
It was genuinely a bit shocking that it was considered a serious product seeing as how chaotic it was.
Can you elaborate on some of the issues you faced? I was considering deploying to ECS fargate as we are all-in on AWS.
Any kind of git-ops style deployment was out.
ECS merges "AWS config" and "app/deployment config" together, so it was difficult to separate what should go in TF from what is runtime app configuration. In comparison, this is basically trivial ootb with K8s.
I personally found a lot of the moving parts and names needlessly confusing. Tasks e.g. were not your equivalent to “Deployment”.
Want to just deploy something like Prometheus Agent? Well, too bad, the networking doesn’t work the same, so here’s some overly complicated guide where you have to deploy some extra stuff which will no doubt not work right the first dozen times you try. Admittedly, Prom can be a right pain to manage, but the fact that ECS makes you do _extra_ work on top of an already fiddly piece of software left a bad taste in my mouth.
I think ECS gets a lot of airtime because of Fargate, but you can use Fargate on K8s these days, or, if you can afford the small increase in initial setup complexity, you can just have Fargate's less-expensive, less-restrictive, better sibling: Karpenter on Spot instances.
I think the initial setup complexity is less with ECS personally, and the ongoing maintenance cost is significantly worse on K8s when you run anything serious which leads to people taking shortcuts.
Every time you have a cluster upgrade with K8s there’s a risk something breaks. For any product at scale, you’re likely to be using things like Istio and Metricbeat. You have a whole level of complexity in adding auth to your cluster on top of your existing SSO for the cloud provider. We’ve had to spend quite some time changing the plugin for AKS/EntraID recently which has also meant a change in workflow for users. Upgrading clusters can break things since plenty of stuff (less these days) lives in beta namespaces, and there’s no LTS.
Again, it’s less bad than it was, but many core things live(d) in plugins for clusters which have a risk of breaking when you upgrade cluster.
My view was that the initial startup cost for ECS is lower and once it’s done, that’s kind of it - it’s stable and doesn’t change. With K8s it’s much more a moving target, and it requires someone to actively be maintaining it, which takes time.
In a small team I don’t think that cost and complexity is worth it - there are so many more concepts that you have to learn even on top of the cloud specific ones. It requires a real level of expertise so if you try and adopt it without someone who’s already worked with it for some time you can end up in a real mess
If your workloads are fairly static, ECS is fine. Bringing up new containers and nodes takes ages with very little feedback as to what's going on. It's very frustrating when iterating on workloads.
Also fargate is very expensive and inflexible. If you fit the narrow particular use case it's quicker for bringing up workloads, but you pay extra for it.
Can confirm. I've used ECS with Fargate successfully at multiple companies. Some eventually outgrew it. Some failed first. Some continue to use ECS happily.
Regardless of the outcome, it always felt more important to keep things simple and focus on product and business needs.
Like, okay, if that's how you see it, but what's with the tone and content?
The tone's vapidity is only comparable to the content's.
This reads like mocking the target audience rather than showing them how you can help.
A write up that took said "pile of shell scripts that do not work" and showed how to "make it work" with your technology of choice would have been more interesting than whatever this is.
> Spawning containers, of course, requires you to mount the Docker socket in your web app, which is wildly insecure
Dear friend, you are not a systems programmer
To expand on this, the author is describing the so-called "Docker-out-of-Docker (DooD) pattern", i.e. exposing Docker's Unix socket into the container. Since Docker was designed to work remotely (CLI on another machine than DOCKER_HOST), this works fine, but essentially negates all isolation.
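For the unfamiliar, the DooD pattern is literally just this one bind mount (image name made up); anything the container starts through that socket is a sibling container with root-equivalent control of the host:

  # hand the host's Docker daemon to the container -- convenient, but zero isolation
  docker run -v /var/run/docker.sock:/var/run/docker.sock some-ci-runner-image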
For many years now, all major container runtimes support nesting. Some make it easy (podman and runc just work), some hard (systemd-nspawn requires setting many flags to work nested). This is called "Docker-in-Docker" (DinD).
FreeBSD has supported nesting of jails natively since version 8.0, which dates back to 2009.
I prefer FreeBSD to K8s.
It sometimes blows my mind how reductionist and simplistic a world-view it's possible to have and yet still attain some degree of success.
Shovels and mechanical excavators both exist and have a place on a building site. If you talk to a workman he may well tell you he has regular hammer with him at all times but will use a sledgehammer and even rent a pile driver on occasion if the task demands it.
And yet somehow we as software engineers are supposed to restrict ourselves to The One True Tool[tm] (which varies based on time and fashion) and use it for everything. It's such an obviously dumb approach that even people who do basic manual labour realise its shortcomings. Sometimes they will use a forklift truck to move things, sometimes an HGV, sometimes they will put things in a wheelbarrow and sometimes they will carry them by hand. But us? No. Sophisticated engineers as we are there is One Way and it doesn't matter if you're a 3 person startup or you're Google, if you deploy once per year to a single big server or multiple times per day to a farm of thousands of hosts you're supposed to do it that one way no matter what.
The real rule is this: Use your judgement.
You're supposed to be smart. You're supposed to be good. Be good. Figure out what's actually going on and how best to solve the problems in your situation. Don't rely on everyone else to tell you what to do or blindly apply "best practises" invented by someone who doesn't know a thing about what you're trying to do. Yes consider the experiences of others and learn from their mistakes where possible, but use your own goddamn brain and skill. That's why they pay you the big bucks.
“Everything is a tradeoff” is the popular (if not the most popular) maxim in distributed computing.
I think we need to distinguish between two cases:
For a hobby project, using Docker Compose or Podman combined with systemd and some shell scripts is perfectly fine. You’re the only one responsible, and you have the freedom to choose whatever works best for you.
However, in a company setting, things are quite different. Your boss may assign you new tasks that could require writing a lot of custom scripts. This can become a problem for other team members and contractors, as such scripts are often undocumented and don’t follow industry standards.
In this case, I would recommend using Kubernetes (k8s), but only if the company has a dedicated Kubernetes team with an established on-call rotation. Alternatively, I suggest leveraging a managed cloud service like ECS Fargate to handle container orchestration.
There’s also strong competition in the "Container as a Service" (CaaS) space, with smaller and more cost-effective options available if you prefer to avoid the major cloud providers. Overall, these CaaS solutions require far less maintenance compared to managing your own cluster.
> dedicated Kubernetes team with an established on-call rotation.
Using EKS or GKS is basically this. K8s is much nicer than ECS in terms of development and packaging your own apps.
Up until a few thousand instances, a well designed setup should be a part time job for a couple of people.
To that scale you can write a custom orchestrator that is likely to be smaller and simpler than the equivalent K8S setup. Been there, done that.
How would you feel if bash scripts were replaced with Ansible playbooks?
At a previous job at a teeny startup, each instance of the environment is a docker-compose instance on a VPS. It works great, but they’re starting to get a bunch of new clients, and some of them need fully independent instances of the app.
Deployment gets harder with every instance because it’s just a pile of bash scripts on each server. My old coworkers have to run a build for each instance for every deploy.
None of us had used ansible, which seems like it could be a solution. It would be a new headache to learn, but it seems like less of a headache than kubernetes!
Ansible is better than Bash if your goals include:
* Automating repetitive tasks across many servers.
* Ensuring idempotent configurations (e.g., setting up web servers, installing packages consistently).
* Managing infrastructure as code for better version control and collaboration.
* Orchestrating complex workflows that involve multiple steps or dependencies.
However, Ansible is not a container orchestrator.
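As a taste of the idempotent, many-servers style (inventory and group name are made up): the same ad-hoc command can be re-run safely across a whole fleet.

  # installs nginx on every host in the "webservers" group; re-running is a no-op
  ansible webservers -i inventory.ini -m ansible.builtin.apt -a "name=nginx state=present" --become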
Kubernetes (K8s) provides capabilities that Ansible or Docker-Compose cannot match. While Docker-Compose only supports a basic subset, Kubernetes offers:
* Advanced orchestration features, such as rolling updates, health checks, scaling, and self-healing.
* Automatic maintenance of the desired state for running workloads.
* Restarting failed containers, rescheduling pods, and replacing unhealthy nodes.
* Horizontal pod auto-scaling based on metrics (e.g., CPU, memory, or custom metrics).
* Continuous monitoring and reconciliation of the actual state with the desired state.
* Immediate application of changes to bring resources to the desired configuration.
* Service discovery via DNS and automatic load balancing across pods.
* Native support for Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) for storage management.
* Abstraction of storage providers, supporting local, cloud, and network storage.
If you need these features but are concerned about the complexity of Kubernetes, consider using a managed Kubernetes service like GKE or EKS to simplify deployment and management. Alternatively, and this is my preferred option, combining Terraform with a Container-as-a-Service (CaaS) platform allows the provider to handle most of the operational complexity for you.
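Several of the items above map to one-liners once a cluster exists; a rough, untested sketch with made-up image names:

  kubectl create deployment web --image=registry.example.com/web:1.2.3 --replicas=3
  kubectl expose deployment web --port=80 --target-port=8080            # stable DNS name + load balancing across pods
  kubectl autoscale deployment web --min=3 --max=10 --cpu-percent=70    # horizontal pod auto-scaling (needs metrics-server)
  kubectl set image deployment/web web=registry.example.com/web:1.2.4   # rolling update
  kubectl rollout undo deployment/web                                   # rollback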
Ansible ultimately runs scripts, in parallel, in a defined order across machines. It can help a lot, but it's subject to a lot of the same state bitrot issues as a pile of shell scripts.
I was using some ansible playbook scripts to deploy some web app to production. One day the scripts stopped working because of a boring error about a python version mismatch.
I rewrote all the deployment scripts in bash (took less than an hour) and never had a problem since.
Moral of the story: it's hard to find the right tool for the job.
I was very scared of K8s for a long time then we started using it and it's actually great. Much less complex than its reputation suggests.
I had the exact opposite experience. I had a cloud run app in gcp and experimented with moving it to k8s and I was astonished with the amount of new complexity I had to manage
Infra person here, this is such the wrong take.
> Do I really need a separate solution for deployment, rolling updates, rollbacks, and scaling.
Yes it's called an ASG.
> Inevitably, you find a reason to expand to a second server.
ALB, target group, ASG, done.
> Who will know about those undocumented sysctl edits you made on the VM
You put all your modifications and CIS benchmark tweaks in a repo and build a new AMI off it every night. Patching is switching the AMI and triggering a rolling update.
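The rolling update itself is a couple of CLI calls; a sketch, not battle-tested, with made-up resource names:

  # point the launch template at last night's AMI...
  aws ec2 create-launch-template-version \
      --launch-template-name web-lt \
      --launch-template-data '{"ImageId":"ami-0123456789abcdef0"}'
  # ...then let the ASG replace instances in place
  # (assumes the ASG is configured to track the $Latest template version)
  aws autoscaling start-instance-refresh --auto-scaling-group-name web-asg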
> The inscrutable iptables rules
These are security groups, lord have mercy on anyone who thinks k8s network policy is simple.
> One of your team members suggests connecting the servers with Tailscale: an overlay network with service discovery
Nobody does this, you're in AWS. If you use separate VPCs you can peer them but generally it's just editing some security groups and target groups. k8s is forced into needing to overlay on an already virtual network because they need to address pods rather than VMs, when VMs are your unit you're just doing basic networking.
You reach for k8s when you need control loops beyond what ASGs can provide. The magic of k8s is "continuous terraform," you will know when you need it and you likely never will. If your infra moves from one static config to another static config on deploy (by far the usual case) then no k8s is fine.
You’d be swapping an open-source vendor independent API for a cloud-specific vendor locked one. And paying more for the “privilege”
I mean that's the sales pitch but it's really not vendor independent in practice. We have a mountain of EKS specific code. It would be easier for me to migrate our apps that use ASGs than to migrate our charts. AWS's API isn't actually all that special, they're just modeling the datacenter in code. Anywhere you migrate to will have all the same primitives because the underlying infrastructure is basically the same.
EKS isn't any cheaper either from experience and in hindsight of course it isn't, it's backed by the same things you would deploy without EKS just with another layer. The dream of gains from "OS overhead" and efficient tight-packed pod scheduling doesn't match the reality that our VMs are right-sized for our workloads already and aren't sitting idle. You can't squeeze that much water from the stone even in theory and in practice k8s comes with its own overhead.
Another reason to use k8s is the original:
When you deploy on physical hardware, not VMs, or have to otherwise optimize maximum utilization out of gear you have.
Especially since sometimes Cloud just means hemorrhaging money in comparison to something else, especially with ASGs
We found that the savings from switching from VMs in ASGs to k8s never really materialized. OS overhead wasn't actually that much and once you're requesting cpu / memory you can't fit as many pods per host as you think.
Plus you're competing with hypervisors for maxing out hardware which is rock solid stable.
My experience was quite the opposite, but it depends very much on the workload.
That is, I didn't say the competition was between AWS ASGs and k8s running on EC2, but between that and already having a certain amount of capacity that you want to max out in flexible ways.
You don't need to use an overlay network. Calico works just fine without an overlay.
I'm sure the American Sewing Guild is fantastic, but how do they help here?
I don’t think scale is the only consideration for using Kubernetes. The ops overhead in managing traditional infrastructure, especially if you’re a large enterprise, drops massively if you really buy into cloud native. Kubernetes converges application orchestration, job scheduling, scaling, monitoring/observability, networking, load balancing, certificate management, storage management, compute provisioning - and more. In a typical enterprise, doing all this requires multiple teams. Changes are request driven and take forever. Operating systems need to be patched. This all happens after hours and costs time and money. When properly implemented and backed by the right level of stakeholder, I’ve seen orgs move to business day maintenance, while gaining the confidence to release during peak times. It’s not just about scale, it’s about converging traditional infra practices into a single, declarative and eventually consistent platform that handles it all for you.
> Inevitably, you find a reason to expand to a second server.
The author has some good points, but not every project needs multiple servers for the same reasons as a typical Kubernetes setup. In many scenarios those servers are dedicated to separate tasks.
For example, you can have a separate server for a redundant copy of your application layer, one server for load balancing and caching, one or more servers for the database, another for backups, and none of these servers requires anything more than separate Docker Compose configs for each server.
I'm not saying that Kubernetes is a bad idea, even for the hypothetical setup above, but you don't necessarily need advanced service discovery tools for every workload.
Yes, but people just cannot comprehend the complexity of it. Even my academic professor for my FYP back when I was an undergrad has now reverted to Docker Compose, citing that the integration is so convoluted that developing for it is very difficult. That's why I'm aiming to cut down the complexity of Kubernetes with a low-friction, turnkey solution, but I guess the angel investors in Hong Kong aren't buying into it yet. I'm still aiming to try again after 2 years when I can at least get a complete MVP (I don't like to present imperfect stuff: either you just have the idea or you give me the full product, not half-baked shit).
I thought k8s might be a solution so I decided to learn through doing. It quickly became obvious that we didn't need 90% of its capabilities but more important it'd put undue load/training on the rest of the team. It would be a lot more sensible to write custom orchestration using the docker API - that was straightforward.
Experimenting with k8s was very much worthwhile. It's an amazing thing and was in many ways inspirational. But using it would have been swimming against the tide so to speak. So sure I built a mini-k8s-lite, it's better for us, it fits better than wrapping docker compose.
My only doubt is whether I should have used podman instead but at the time podman seemed to be in an odd place (3-4 years ago now). Though it'd be quite easy to switch now it hardly seems worthwhile.
One can build a better container orchestration than kubernetes; things don't need to be that complex.
Why do I feel this is not so simple as the compiler scenario?
I've seen a lot of "piles of YAML", even contributed to some. There were some good projects that didn't end up in disaster, but to me the same could be said for the shell.
For my own websites, I host everything on a single $20/month hetzner instance using https://dokploy.com/ and I'm never going back.
It was horrible advice from Big Tech professionals when I asked whether I should use docker compose or Kubernetes for a project. The thing is, they were so condescending and belittling about the fact that I wasn't using Kubernetes. After switching over to Kubernetes, I realized it was a huge mistake, as I had another layer of complexity, installs, and domain-specific languages that infected every part of the code base.
> In the last leg of your journey to avoid building a Kubernetes, your manager tells you that your app needs to programmatically spawn other containers. Spawning containers, of course, requires you to mount the Docker socket in your web app, which is wildly insecure.
This was true ten years ago, it's not been true for at least 2-3 years.
You can run rootless podman in kubernetes (I did) and you can launch pods from there. Securely.
You did a no-SQL, you did a serverless, you did a micro-services. This makes it abundantly clear you do not understand the nature of your architectural patterns and the multiplicity of your offenses.
What I really like are solutions that are simple to both install and maintain.
For example, look at Docker Swarm: https://docs.docker.com/engine/swarm/swarm-tutorial/create-s... or Hashicorp Nomad: https://developer.hashicorp.com/nomad/tutorials/get-started/... or Kubernetes distros like K3s: https://docs.k3s.io/quick-start
Any of those would work better than a bespoke solution a lot of the time, for anyone who is not the original person writing that hodgepodge of shell scripts, because of how much information and documentation exists out there about those solutions.
Any of those would also work better than a very complex Kubernetes cluster that was built with HA in mind and tries to be web scale, for anyone who isn't a DevOps engineer ready to spend a bunch of time maintaining it, given how complex things can get.
Curiously, a previous article by the author calls out Docker Compose as not being good enough https://www.macchaffee.com/blog/2024/docker-compose/ and also critiques Docker Swarm without much elaboration (because that's not the main focus):
> Single-node only. Many apps never reach a point where they need more than one node, but having to either rip out your entire existing deployment method or invest in Swarm are not good options.
I've seen both Docker Compose and Docker Swarm be used in production with good results - in problem spaces where minutes or hours of downtime (e.g. non-global audience, evenings) was acceptable and having zero downtime deployments wasn't required, though usually there were no big outages anyways. Admittedly, that's a pretty good place to be in, even if not that much interesting happens there.
Just pick whatever fits your requirements, there are also very sane Kubernetes distros out there, even for something like test environments or running locally.
We chose Docker Swarm for a new project that we started two years ago, mostly because our developers liked its learning curve more than k8s's.
We are happily running on Docker Swarm in production now, and don't see that changing in the foreseeable future.
Started with a large shell script; the next iteration was written in Go and less specific. I still think that for some things, k8s is just too much.
For the uninitiated: how does k8s handle OS upgrades? If development moves to next version of Debian, because it should eventually, are upgrades, for example, 2x harder vs docker-compose? 2x easier? About the same? Is it even right question to ask?
It doesn't. The usual approach is to create new nodes with the updated OS, migrate all workloads over, and then throw away the old ones.
Your cluster consists of multiple machines ('nodes'). Upgrading is as simple as adding a new, upgraded node, then evicting everything from one of the existing nodes, then taking it down. Repeat until every node is replaced.
Downtime is the same as with a deployment, so if you run at least 2 copies of everything there should be no downtime.
As for updating the images of your containers, you build them again with the newer base image, then deploy.
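To make the "no downtime while draining" point above concrete, the usual guardrail is a PodDisruptionBudget: it tells the eviction machinery how many replicas must stay up while a node is being emptied. A minimal sketch, assuming a hypothetical app labelled app: my-web-app that runs 2+ replicas:

    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: my-web-app
    spec:
      minAvailable: 1          # keep at least one replica running during a drain
      selector:
        matchLabels:
          app: my-web-app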
Are you talking about upgrades of the host OS or the base of the image? I think you are talking about the latter. Others covered updating the host.
Upgrades of the Docker image are done by pushing a new image, updating the Deployment to use it, and applying the change. Kubernetes will start new containers for the new image and, once they are running, kill off the old ones. There should be no interruption. It isn't any different from a normal deploy.
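Roughly, the Deployment fields that make that zero-interruption rollout happen look like this (a sketch with hypothetical names, ports, and image tags, not a drop-in config):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-web-app
    spec:
      replicas: 2
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 0    # never kill an old pod before its replacement is ready
          maxSurge: 1          # start at most one extra pod during the rollout
      selector:
        matchLabels:
          app: my-web-app
      template:
        metadata:
          labels:
            app: my-web-app
        spec:
          containers:
            - name: web
              image: registry.example.com/my-web-app:1.0.1   # bump the tag and re-apply
              readinessProbe:        # traffic only shifts once the new pod reports ready
                httpGet:
                  path: /healthz
                  port: 8080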
i'm at this crossroads right now. somebody talk me out of deploying a dagster etl on azure kubernetes service rather than deploying all of the pieces onto azure container apps with my own bespoke scripts / config
writing this out helped me re-validate what i need to do
what did you decide to do?
kubernetes. it's well documented and better designed than whatever i could put together instead.
I am 100% sure that the author of this post has never "built a kubernetes", holds at least one kubernetes cert, and maybe even works for a company that sells kubernetes products and services. Never been more certain of anything in my life. You could go point by point, but it's just so tiring arguing with these people. Like the whole "who will maintain these scripts when you go on vacation" thing: my brother in christ, have you seen the kubernetes setups some of these people invent? They are not easier to be read into, this much is absolute. At least a shell script has a chance of encoding all of its behavior in the one file, versus putting a third of its behavior in helm variables, a third in poorly-named and documented YAML keys, and a third in some "manifest orchestrator reconciler service deployment system" that's six major versions behind an open source project that no one knows who maintains anymore because their critical developer was a Belarusian 10x'er who got mad about a code of conduct that asked him to stop mispronouning contributors.
I wish the world hadn't consolidated around Kubernetes. Rancher was fantastic. Did what 95% of us need, and dead simple to add and manage services.
Did you find Rancher v2 (which uses Kubernetes instead of their own Cattle system) is worse?
Dear friend, you just built a giant YAML-based RPC system to manage your symbol table when you should have just used a linker.
>Tired, you parameterize your deploy script and configure firewall rules, distracted from the crucial features you should be working on and shipping.
Where's your Sysop?
Dear friend, you have made a slippery slope argument.
Yes, because the whole situation is a slippery slope (only upwards). In the initial state, k8s is obviously overkill; in the end state, k8s is obviously adequate.
The problem is choosing the point of transition, and allocating resources for said transition. Sometimes it's easier to allocate a small chunk to update your bespoke script right now instead of sinking more to a proper migration. It's a typical dilemma of taking debt vs paying upfront.
(BTW the same dilemma exists with running in the cloud vs running on bare metal; the only time when a migration from the cloud is easy is the beginning, when it does not make financial sense.)
Odds are you have 100 DAUs and your "end state" is an "our incredible journey" blog post. I understand that people want to pad their resume with buzzwords on the way, but I don't accept making a virtue out of it.
Exactly. Don't start with k8s unless you're already comfortable troubleshooting it at 3am half asleep. Start with one of the things you're comfortable with. Among these things, apply YAGNI liberally, only making certain that you're not going to paint yourself into a corner.
Then, if and when you've become so large that the previous thing has become painful and k8s started looking like a really right tool for the job, allocate time and resources, plan a transition, implement it smoothly. If you have grown to such a size, you must have had a few such transitions in your architecture and infrastructure already, and learned to handle them.
Dear friend, you should first look into using Nomad or Kamal deploy instead of K8S
You mean the rugpull-stack? "Pray we do not alter the deal further when the investors really grumble" https://github.com/hashicorp/nomad/blob/v1.9.3/LICENSE
As for Kamal, I shudder to think of the hubris required to say "pfft, haproxy is for lamez, how hard can it be to make my own lb?!" https://github.com/basecamp/kamal-proxy
>I am afraid to inform you that you have built a Kubernetes. I know you wanted to "choose boring tech" to just run some containers. You said that "Kubernetes is overkill" and "it's just way too complex for a simple task" and yet, six months later, you have a pile of shell scripts that do not work—breaking every time there's a slight shift in the winds of production.
Or you know, I have a set of scripts that work all the time, with 1/1000 the feature-creep and complexity of Kubernetes, tailored precisely to my needs!
why add complexity when many services don't even need horizontal scaling? servers are powerful enough that, if you're not writing horrible code, they're fine for millions of requests a day without much work
Even without needing to spawn additional Docker containers, I think people are more afraid of Kubernetes than is warranted. If you use a managed K8s service like the ones Azure, AWS, GCP, and tons of others provide, it's... pretty simple and pretty bulletproof, assuming you're doing simple stuff with it (i.e. running a standard web app).
The docs for K8s are incredibly bad for solo devs or small teams, and introduce you to a lot of unnecessary complexity upfront that you just don't need: the docs seem to be written with megacorps in mind who have teams managing large infrastructure migrations with existing, complex needs. To get started on a new project with K8s, you just need a pretty simple set of YAML files:
1. An "ingress" YAML file that defines the ports you listen to for the outside world (typically port 80), and how you listen to them. Using Helm, the K8s package manager, you can install a simple default Nginx-based ingress with minimal config. You probably were going to put Nginx/Caddy/etc in front of your app anyway, so why not do it this way?
2. A "service" YAML file that allocates some internal port mapping used for your web application (i.e. what port do you listen on within the cluster's network, and what port should that map to for the container).
3. A "deployment" YAML file that sets up some number of containers inside your service.
And that's it. As necessary you can start opting into more features; for example, you can add health checks to your deployment file, so that K8s auto-restarts your containers when they die, and you can add deployment strategies there as well, such as rolling deployments and limits on how many new containers can be started before old ones are killed during the deploy, etc. You can add resource requests and limits, e.g. make sure my app has at least 500MB RAM, and kill+restart it if it crosses 1GB. But it's actually really simple to get started! I think it compares pretty well even to the modern Heroku-replacements like Fly.io... It's just that the docs are bad and the reputation is that it's complicated — and a large part of that reputation comes from existing teams who try to do a large migration, and who have very complex needs that have evolved over time. K8s generally is flexible enough to support even those complex needs, but... it's gonna be complex if you have them. For new projects, it really isn't. Part of the reason other platforms are viewed as simpler IMO is just that they lack so many features that teams with complex needs don't bother trying to migrate (and thus never complain about how complicated it is to do complicated things with them).
You can have Claude or ChatGPT walk you through a lot of this stuff though, and thereby get an easier introduction than having to pore through the pretty corporate official docs. And since K8s supports both YAML and JSON, in my opinion it's worth just generating JSON using whatever programming language you already use for your app; it'll help reduce some of the verbosity of YAML.
What you’re saying is that starting a service in Kubernetes as a dev is OK; what other people are saying is that operating a k8s cluster is hard.
Unless I’m mistaken the managed kubernetes instances were introduced by cloud vendors because regular people couldn’t run kubernetes clusters reliably, and when they went wrong they couldn’t fix them.
Where I am, since cloud is not an option (large megacorp with regulatory constraints), they’ve decided to run their own k8s cluster. It doesn’t work well, it’s hard to debug, and they don’t know why it doesn’t work.
Now if you have the right people or can have your cluster managed for you, I guess it’s a different story.
Most megacorps use AWS. It's regrettable that your company can't, but that's pretty atypical. Using AWS Kubernetes is easy and simple.
Not sure why you think this is just "as a dev" rather than operating in production — K8s is much more battle-hardened than someone's random shell scripts.
Personally, I've run K8s clusters for a Very Large tech megacorp (not using managed clusters; we ran the clusters ourselves). It was honestly pretty easy, but we were very experienced infra engineers, and I wouldn't recommend doing it for startups or new projects. However, most startups and new projects will be running in the cloud, and you might as well use managed K8s: it's simple.
Sorry I wasn’t clear.
By "as a dev" I meant that listing three files is not going to help you when things go wrong in production and you have bizarre network problems because of your CNI, kernel vs Kubernetes issues, etc. I fully understand it can be made to work, but not by everyone, and not everyone has the time to do so.
So I take from your explanations that as a k8s user I have two options : have very good infra engineers, or rent k8s from a cloud provider.
(Side note: I work primarily in HPC, and have found that having strong infra engineers makes a world of difference, and many software engineers don’t appreciate it enough.)
> Most megacorps use AWS. It's regrettable that your company can't, but that's pretty atypical.
Even then, it seems like you can run EKS yourself:
https://github.com/aws/eks-anywhere
"EKS Anywhere is free, open source software that you can download, install on your existing hardware, and run in your own data centers."
(Never done it myself, no idea if it's a good option)
Dear Friend,
This fascination with this new garbage-collected language from a Santa Clara vendor is perplexing. You’ve built yourself a COBOL system by another name.
/s
I love the “untested” criticism in a lot of these use-k8s screeds, and also the suggestion that the scripts are only hanging together because of one guy. The implicit criticism is that doing your own engineering is bad and that, really, you should follow the crowd.
Here’s a counterpoint.
Sometimes just writing YAML is enough. Sometimes it’s not. For example, there are times when managed k8s is just not on the table, e.g. because of compliance or business issues. Then you have to think about self-managed k8s. That’s rather hard to do well. And often, you don’t need all of that complexity.
Yet — sometimes availability and accountability reasons mean that you need to have a really deep understanding of your stack.
And in those cases, having the engineering capability to orchestrate isolated workloads, move them around, resize them, monitor them, etc is imperative — and engineering capability means understanding the code, fixing bugs, improving the system. Not just writing YAML.
It’s shockingly inexpensive to get this started with a two-pizza team that understands Linux well. You do need a couple really good, experienced engineers to start this off though. Onboarding newcomers is relatively easy — there’s plenty of mid-career candidates and you’ll find talent at many LUGs.
But yes, a lot of orgs won’t want to commit to this because they don’t want that engineering capability. But a few do - and having that capability really pays off in the ownership the team can take for the platform.
For the orgs that do invest in the engineering capability, the benefit isn’t just a well-running platform, it’s having access to a team of engineers who feel they can deal with anything the business throws at them. And really, creating that high-performing trusted team is the end-goal, it really pays off for all sorts of things. Especially when you start cross-pollinating your other teams.
This is definitely not for everyone though!