benhoyt 3 days ago

My fly.io-hosted website went down for 5 minutes (6 hours ago), but then came right back up, and has been up ever since. I use a free monitoring service that checks it every 5 minutes, so it's possible it missed another short bit of downtime. But fly.io has been pretty reliable overall for me!
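To put a rough number on "it's possible it missed" something: a minimal sketch, assuming checks fire at a fixed interval and an outage starts at a uniformly random moment between two checks. The function name is made up for illustration and is not part of any monitoring service's API.

```python
def detection_probability(outage_s, interval_s):
    """Chance that at least one periodic check lands inside an outage.

    Assumes the outage starts at a uniformly random point between checks
    and that a check inside the outage always reports it.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return min(1.0, max(0.0, outage_s) / interval_s)


# A 1-minute blip against a 5-minute check interval.
print(detection_probability(60, 300))
```

Under those assumptions, an outage shorter than the check interval is caught only a fraction of the time, and anything at least as long as the interval is always caught.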

nomilk 3 days ago

Would be fascinated to see your data over a period of months.

Application uptime was flaky, but what was worse were fly deploys failing for no clear reason. Sometimes layers would just hang and eventually fail; I'd run the same command an hour or two later without any changes and it would just work as expected.

I'd love to make a monitoring service to deploy a basic app (i.e. run the fly deploy command) every 5 minutes and see how often those deploys fail or hang. I'd guess ~5% inexplicably fail, which is frustrating unless you've got a lot of spare time.
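A rough sketch of what such a prober could look like, not a finished service. It assumes `fly deploy` is run from the directory of a throwaway app with flyctl already installed and authenticated; the names `probe`, `failure_rate`, and `run_forever` and the 15-minute hang timeout are made up for illustration.

```python
import subprocess
import time
from dataclasses import dataclass


@dataclass
class ProbeResult:
    outcome: str    # "ok", "failed", or "hung"
    seconds: float  # wall-clock duration of the attempt


def probe(cmd, timeout_s):
    """Run cmd once and classify it as ok, failed, or hung (timed out)."""
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
        outcome = "ok" if proc.returncode == 0 else "failed"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising on timeout.
        outcome = "hung"
    return ProbeResult(outcome, time.monotonic() - start)


def failure_rate(results):
    """Fraction of probes that did not succeed (failed or hung)."""
    if not results:
        return 0.0
    bad = sum(1 for r in results if r.outcome != "ok")
    return bad / len(results)


def run_forever(cmd, interval_s=300, timeout_s=900):
    """Probe every interval_s seconds and print the running failure rate."""
    results = []
    while True:
        result = probe(cmd, timeout_s)
        results.append(result)
        print(f"{result.outcome} in {result.seconds:.1f}s, "
              f"failure rate so far {failure_rate(results):.1%}")
        time.sleep(interval_s)


# Assumed invocation, every 5 minutes:
# run_forever(["fly", "deploy"])
```

Counting a timeout separately from a non-zero exit keeps "hung" and "failed" deploys distinguishable, which is the difference described above.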

jrockway 2 days ago

I used to run a service that created k8s clusters on GCP for our customers. We did want to check that that functionality kept working and had a prober test it periodically. It was actually broken a lot.

Always good to monitor your dependencies if you have the time. Then when someone complains about an issue in your service, you can check your monitoring to see if your upstream services are broken. If they are, at least you know where to start debugging.

sanswork 2 days ago

My downtimes from fly are pretty rare but generally global when they happen. In this outage we had no downtime but couldn't deploy for a few hours. I have issues with deploying about once per quarter (I deploy most days across a few apps).

nomilk 2 days ago

If that’s the case, I suspect fly is getting a lot more reliable. I stopped using them about a year ago, so I haven’t kept up on their reliability since. Glad to hear it; it’s good for a competitive market to have many providers, and fly might have issues but hopefully has a bright future.

sanswork 2 days ago

They are definitely getting more reliable. I was an early user and moved off them to self-hosted for quite a while because of the frequent downtime in the early days.

Their support still leaves a lot to be desired, even as someone who pays for it, but the ease of running and deploying a distributed front end keeps bringing me back.

rozenmd 2 days ago

This may be of interest to you: https://news.ycombinator.com/item?id=42243282

rozenmd 2 days ago

I externally monitor fly.io and its docs here: https://flyio.onlineornot.com/

Looks like it lasted 16 minutes for them.

tptacek 2 days ago

It wasn't a request routing outage; apps running on Fly.io didn't stop running. It was a deployments outage. For reasons passing understanding (I am reliably informed I'm wrong to complain about this), our website is the same Elixir app as our dashboard, and the dashboard got redeployed at one point. Our website being down is not the same as the whole service being down, though I guess there's a truth-in-advertising poetry to it being down when deployments are busted.

sevenseacat 1 day ago

A lot of apps did stop running - https://community.fly.io/t/fly-io-site-is-currently-inaccess...

The entire API was also unusable, not just deployments.

tptacek 1 day ago

Sorry, you're right: pretty much any time I'm saying deployments are blocked, I'm really saying the API was down.

itbeho 1 day ago

I'm not sure if your explanation is comforting or disconcerting.

tptacek 1 day ago

Why not both? Tell me what's comforting and I'll tell you why you shouldn't be comforted; tell me why you're disconcerted and I'll tell you maybe something else. All we can do is be straight about things.

davidgl 2 days ago

Same for us: down for ~5 mins, then back up and fine. The error was 501.

TacticalCoder 2 days ago

Someone said 16 minutes, so it's not even a five-nines service.
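The arithmetic behind that, as a rough sketch. It assumes a one-year measurement window and that the reported 16 minutes was the only downtime in it; the helper names are made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_budget_minutes(nines, period_minutes=MINUTES_PER_YEAR):
    """Allowed downtime for an availability of N nines (5 means 99.999%)."""
    return period_minutes * 10 ** -nines


def availability(downtime_minutes, period_minutes=MINUTES_PER_YEAR):
    """Fraction of the period the service was up."""
    return 1 - downtime_minutes / period_minutes


print(f"five nines budget: {downtime_budget_minutes(5):.2f} min/year")
print(f"four nines budget: {downtime_budget_minutes(4):.2f} min/year")
print(f"16 min down: {availability(16):.5%} available")
```

Five nines allows roughly 5.26 minutes of downtime per year, so a single 16-minute outage already exceeds it, while still fitting inside the roughly 52.6 minutes that four nines allows.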

beezlewax 2 days ago

Do you mind if I ask what monitoring service that is?

benhoyt 2 days ago

Sure, it's UptimeRobot: https://uptimerobot.com/

vextea 2 days ago

Is it your service?

dprotaso 2 days ago

What free monitoring tool do you use?