tl;dr "we had our vLLM fork and it's unmaintainable now; guess we are going to rebuild it, in public this time"
I get the impression their setup is very hard to maintain, but worth every penny. They've done optimizations that wring incredible performance out of the hardware they have, but they also have specific machine configurations, and I wouldn't be surprised if they have complicated hacks that get 100% speedups for some things, speedups that disappear if you have a slightly different motherboard configuration. There's also a suggestion that they've made firmware hacks, which are worth it at their scale but might be very dangerous and difficult to apply, especially at a small scale. (And some of their hacks might involve both firmware and cluster-level optimizations, which would be useless or counterproductive independently.)
And even if you have somewhat similar hardware, the code might not be that helpful; you might be better off taking a sketch of the solution and implementing it yourself. If you've got a large enough cluster, it's going to pay for itself anyway.
"Unmaintainable" seems unduly harsh. There is a big gap between "maintainable internally" and "ready for public consumption".
> Codebase Divergence: Our engine is based on an early fork of vLLM from over a year ago
If you are in the same boat, you'll see how much has changed in vLLM over the past year. This also means they haven't rebased for over a year. I don't believe that's because they don't want to; it's because they effectively can't.
Yeah, surely they can maintain it as-is. But it will be increasingly hard to port over anything the community has.
They're going to spend time and effort making their optimizations public. Would you rather have them keep their changes internal?