No dog in this fight, all props to the Fly.io team for having the gumption to do what they are doing, I genuinely hope they are successful...
> It's still 99.99+% SLA
But this is simply not accurate. 99.99% uptime allows less than about 52.6 minutes of downtime per year. They apparently blew well past that today. Looks like they burned through roughly four years' worth of 99.99% downtime budget this evening.
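For anyone who wants to check the arithmetic, here is a minimal Python sketch of the downtime budget. The function names are made up for illustration, the year is assumed to be 365 days (so the exact seconds can differ slightly from what online calculators show), and the 3.5-hour outage is a hypothetical figure, not a reported duration.

```python
from datetime import timedelta


def downtime_budget(availability_pct: float,
                    period: timedelta = timedelta(days=365)) -> timedelta:
    """Maximum downtime allowed in `period` at the given availability."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return period * (1 - availability_pct / 100)


def years_of_budget_spent(outage: timedelta, availability_pct: float) -> float:
    """How many years of downtime budget a single outage consumes."""
    return outage / downtime_budget(availability_pct)


if __name__ == "__main__":
    # Yearly budget at four nines: roughly 52.5 minutes.
    print(downtime_budget(99.99))

    # Hypothetical outage length, chosen only to illustrate the scale.
    hypothetical_outage = timedelta(hours=3.5)
    print(round(years_of_budget_spent(hypothetical_outage, 99.99), 2))
```

Under those assumptions, an outage of a few hours eats several years of four-nines budget in one go.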
Four nines is so unforgiving that if people have to be in the loop at any point during an incident, a single incident will very likely blow the fourth nine for the whole year.
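A rough sketch of why a human in the loop is so costly, assuming a 365-day year. The phase names and durations below are entirely hypothetical, picked only to show how quickly ordinary response steps add up against the yearly budget.

```python
# Yearly downtime budget at 99.99% availability, in minutes (365-day year).
YEAR_MINUTES = 365 * 24 * 60
BUDGET_MINUTES = YEAR_MINUTES * (1 - 0.9999)  # about 52.56 minutes

# Hypothetical incident timeline, in minutes. Not taken from any real incident.
incident_phases = {
    "alert fires": 5,
    "on-call acknowledges": 10,
    "diagnosis": 30,
    "fix and rollout": 20,
}


def budget_used(phases: dict, budget_minutes: float = BUDGET_MINUTES) -> float:
    """Fraction of the yearly budget consumed by one incident."""
    return sum(phases.values()) / budget_minutes


if __name__ == "__main__":
    used = budget_used(incident_phases)
    print(f"{used:.0%} of the yearly four-nines budget")
```

With these made-up numbers a single 65-minute incident already exceeds the whole year's allowance, which is why hitting four nines tends to require automated detection and remediation.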
Again, I know it's hard. I would not want to be in the space. That fourth nine is really difficult to earn.
In the meantime, <hugops> to the Fly team as they work to resolve this (and hopefully get some rest).
A 99.99+% SLA typically means you get some billing credits when availability falls below 99.99+%. So technically you do get a "99.99+% SLA", but you don't get 99.99+% availability.
Other circles use "SLO" (where the O stands for objective).
(Anyone know what the details of the fly.io SLA are?)
Answering myself, https://fly.io/legal/sla-uptime/ says you get some credits for under 99.9% uptime "provided that Customer reports to Fly.io such failure to meet the Uptime Commitment". So at least currently there's no talk of 99.99%.
You are correct in the legal/technical sense!
Technically, anyone could offer five- or six-nines and just depend on most customers not to claim the credits :-D
Actually hitting/exceeding four nines is still tough.
My app didn't go down yesterday; this downtime was related to the internal API and some specific regions.