My fly.io-hosted website went down for 5 minutes (6 hours ago), but then came right back up, and has been up ever since. I use a free monitoring service that checks it every 5 minutes, so it's possible it missed another short bit of downtime. But fly.io has been pretty reliable overall for me!
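As a back-of-the-envelope check on how much a 5-minute polling interval can miss, here is a small sketch. It assumes the checks are instantaneous and evenly spaced, and that an outage starts at a uniformly random moment. The function name is made up for illustration.

```python
def detection_probability(outage_min, interval_min):
    """Chance that at least one evenly spaced check lands inside an outage.

    Assumes instantaneous checks and an outage whose start time is
    uniformly random relative to the check schedule.
    """
    if interval_min <= 0:
        raise ValueError("interval_min must be positive")
    return min(1.0, max(0.0, outage_min) / interval_min)
```

Under those assumptions, a 1-minute blip is caught only about one time in five with 5-minute checks, while anything lasting 5 minutes or longer is always seen at least once.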
Would be fascinated to see your data over a period of months.
Application uptime is flaky, but what was worse were fly deploys failing for no clear reason. Sometimes layers would just hang and eventually fail; I'd run the same command an hour or two later without any changes and it would just work as expected.
I'd love to make a monitoring service to deploy a basic app (i.e. run the fly deploy command) every 5 minutes and see how often those deploys fail or hang. I'd guess ~5% inexplicably fail, which is frustrating unless you've got a lot of spare time.
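A rough sketch of what such a prober might look like, assuming the deploy is a plain command (for example `["fly", "deploy"]`) whose exit code signals success, and that a run exceeding some timeout counts as hung. The function names, the timeout and the interval defaults are all made up for illustration.

```python
import subprocess
import sys
import time

def probe(cmd, timeout_s):
    """Run cmd once and classify the result as 'ok', 'failed', or 'hung'."""
    start = time.monotonic()
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
        outcome = "ok" if result.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising on timeout.
        outcome = "hung"
    return outcome, time.monotonic() - start

def failure_rate(outcomes):
    """Fraction of probes that did not succeed (failed or hung)."""
    if not outcomes:
        return 0.0
    return sum(1 for o in outcomes if o != "ok") / len(outcomes)

def run_prober(cmd, rounds, interval_s=300.0, timeout_s=240.0):
    """Probe cmd `rounds` times, roughly once per interval_s seconds."""
    outcomes = []
    for i in range(rounds):
        outcome, elapsed = probe(cmd, timeout_s)
        outcomes.append(outcome)
        print(f"probe {i + 1}: {outcome} after {elapsed:.1f}s", file=sys.stderr)
        if i + 1 < rounds:
            # Sleep only for what is left of the interval.
            time.sleep(max(0.0, interval_s - elapsed))
    return outcomes
```

Something like `failure_rate(run_prober(["fly", "deploy"], rounds=288))` would then cover about a day at one deploy every 5 minutes. It would have to run from the directory of a throwaway app, since every round is a real deploy.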
I used to run a service that created k8s clusters on GCP for our customers. We wanted to check that the functionality kept working, so we had a prober test it periodically. It was actually broken a lot.
Always good to monitor your dependencies if you have the time. Then when someone complains about an issue in your service, you can check your monitoring to see if your upstream services are broken. If they are, at least you know where to start debugging.
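A minimal sketch of that kind of upstream check, assuming each dependency exposes some URL that answers with a 2xx or 3xx status when healthy. The function names and the name-to-URL mapping are hypothetical.

```python
import http.client
import urllib.error
import urllib.request

def http_ok(url, timeout_s=5.0):
    """True if url answers with a 2xx/3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # 4xx/5xx responses, timeouts, DNS failures, and malformed URLs
        # all count as "not healthy" here.
        return False

def broken_upstreams(upstreams, check=http_ok):
    """Return the names of upstream dependencies whose health check fails.

    `upstreams` maps a human-readable name to a health-check URL.
    """
    return [name for name, url in upstreams.items() if not check(url)]
```

When a complaint comes in, a non-empty result from `broken_upstreams` points at where to start debugging. The `check` parameter is there so the HTTP probe can be swapped for something service-specific.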
My downtimes from fly are pretty rare but generally global when they happen. In this outage we had no downtime but couldn't deploy for a few hours. I have issues with deploying about once per quarter (I deploy most days across a few apps).
If that’s the case, I suspect fly is getting a lot more reliable. I stopped using them about a year ago, so I haven’t kept up on their reliability since. Glad to hear it; it’s good for a competitive market to have many providers, and fly might have issues but hopefully has a bright future.
They are definitely getting more reliable. I was an early user and moved off them to self-hosted for quite a while because of the frequent downtime in the early days.
Their support still leaves a lot to be desired, even as someone who pays for it, but the ease of running and deploying a distributed front end keeps bringing me back.
I externally monitor fly.io and its docs here: https://flyio.onlineornot.com/
Looks like it lasted 16 minutes for them.
It wasn't a request routing outage; apps running on Fly.io didn't stop running. It was a deployments outage. For reasons passing understanding (I am reliably informed I'm wrong to complain about this), our website is the same Elixir app as our dashboard, and the dashboard got redeployed at one point. Our website being down is not the same as the whole service being down, though I guess there's a truth-in-advertising poetry to it being down when deployments are busted.
A lot of apps did stop running - https://community.fly.io/t/fly-io-site-is-currently-inaccess...
The entire API was also unusable, not just deployments.
Sorry, you're right: pretty much any time I'm saying deployments are blocked, I'm really saying the API was down.
I'm not sure if your explanation is comforting or disconcerting.
Why not both? Tell me what's comforting and I'll tell you why you shouldn't be comforted; tell me why you're disconcerted and I'll tell you maybe something else. All we can do is be straight about things.
Same for us: down for ~5 mins, then back up and fine. The error was 501.
Do you mind if I ask what monitoring service that is?