Item 42743076

dban • 8 hours ago

This is our first post about building out data centers. If you have any questions, we're happy to answer them here :)

coder543 • 3 hours ago

I thought it was an interesting post, so I tried to add Railway's blog to my RSS reader... but it didn't work. I tried searching the page source for RSS and also found nothing. Eventually, I noticed the RSS icon in the top right, but it's some kind of special button that I can't right click and copy the link from, and Safari prevents me from knowing what the URL is... so I had to open that from Firefox to find it.

Could be worth adding a <meta> tag to the <head> so that RSS readers can autodiscover the feed. A random link I found on Google: https://www.petefreitag.com/blog/rss-autodiscovery/

gschier • 7 hours ago

How do you deal with drive failures? How often does a Railway team member need to visit a DC? What's it like inside?

1 reply

justjake • 6 hours ago

Everything is dual redundancy. We run RAID so if a drive fails it's fine; alerting will page oncall which will trigger remote hands onsite, where we have spares for everything in each datacenter

1 reply

gschier • 6 hours ago

How much additional overhead is there for managing the bare-metal vs cloud? Is it mostly fine after the big effort for initial setup?

1 reply

ca508 • 6 hours ago

We built some internal tooling to help manage the hosts. Once a host is onboarded onto it, it's a few button clicks on an internal dashboard to provision a QEMU VM. We made a custom ansible inventory plugin so we can manage these VMs the same as we do machines on GCP.

The host runs a custom daemon that programs FRR (an OSS routing stack), so that it advertises addresses assigned to a VM to the rest of the cluster via BGP. So zero config of network switches, etc... required after initial setup.

We'll blog about this system at some point in the coming months.