Only tangentially related to the article, but I’ve never understood how R2 offers 11 9s of durability. I trust that S3 offers 11 9s because Amazon has shown, publicly, that they care a ton about designing reliable, fault-tolerant, correct systems (e.g. Shardstore and Shuttle).
Cloudflare’s documentation just says “we offer 11 9s, same as S3”, and that’s that. It’s not that I don’t believe them but… how can a smaller organization make the same guarantees?
It implies to me that either Amazon is wasting a ton of money on their reliability work (possible) or that Cloudflare’s 11 9s guarantee comes with some asterisks.
What makes you think it cost AWS so much money, at their scale, to achieve 11 9s that Cloudflare can’t afford the same?
Minimally, the two examples I cited: Shardstore and Shuttle. The former is a (lightweight) formally verified key-value store used by S3, and the latter is a model checker for concurrent Rust code.
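For anyone curious what the latter looks like in practice, here’s a minimal sketch of a Shuttle test, modeled on the examples in Shuttle’s README (the racy-counter program itself is made up for illustration):

    use shuttle::sync::{Arc, Mutex};
    use shuttle::thread;

    fn main() {
        // Explore 1000 randomly chosen thread interleavings of this program.
        shuttle::check_random(
            || {
                let counter = Arc::new(Mutex::new(0u64));
                let handles: Vec<_> = (0..2)
                    .map(|_| {
                        let counter = Arc::clone(&counter);
                        thread::spawn(move || {
                            *counter.lock().unwrap() += 1;
                        })
                    })
                    .collect();
                for h in handles {
                    h.join().unwrap();
                }
                // If this assertion can fail under any explored schedule,
                // Shuttle reports the failing interleaving.
                assert_eq!(*counter.lock().unwrap(), 2);
            },
            1000,
        );
    }

The trick is that Shuttle’s thread and Mutex types are drop-in replacements for std’s, so the test scheduler controls every context switch and can hunt for interleavings that ordinary unit tests essentially never hit.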
Amazon has an entire automated reasoning group (researchers who mostly work on formal methods) working specifically on S3.
As far as I’m aware, nobody at Cloudflare is doing similar work for R2. If they are, they’re certainly not publishing!
Money might not be the bottleneck for Cloudflare though; you’re totally right.
S3 has touted 11 9's for many years, so definitely since before Shardstore existed.
The 11 9's is for durability, which is really more about the redundancy setup, erasure coding, etc. (https://cloud.google.com/blog/products/storage-data-transfer...)
fwiw availability is 4 9's (https://aws.amazon.com/s3/storage-classes/)
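To put some numbers on the redundancy/erasure-coding point, here’s a back-of-the-envelope model of how nines of durability fall out of a k-of-n coding scheme. Every parameter below is a made-up assumption for illustration; these are not S3’s or R2’s actual numbers.

    // An erasure-coded object survives unless more than (n - k) of its
    // n shards are lost within one repair window.
    fn binomial(n: u64, k: u64) -> f64 {
        (0..k).fold(1.0, |acc, i| acc * (n - i) as f64 / (i + 1) as f64)
    }

    fn main() {
        let n = 17u64; // total shards per object (hypothetical)
        let k = 12u64; // shards needed to reconstruct it (hypothetical)
        let p = 5e-4;  // assumed P(a given shard is lost in one repair window)

        // P(object lost in one window) = P(more than n - k shards fail).
        let mut p_loss_window = 0.0;
        for failures in (n - k + 1)..=n {
            p_loss_window += binomial(n, failures)
                * p.powi(failures as i32)
                * (1.0 - p).powi((n - failures) as i32);
        }

        // Union bound over hourly repair windows in a year (fine for tiny p).
        let annual_loss = p_loss_window * (24.0 * 365.0);
        println!("P(lose an object in a year) ≈ {:e}", annual_loss);
        println!("nines of durability ≈ {:.1}", -annual_loss.log10());
    }

With those assumptions the annual loss probability comes out around 1e-12, i.e. between 11 and 12 nines: the nines are a function of shard count, parity, and repair speed, not something one vendor uniquely owns.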
That’s a good point!
I think I overstated the case a little. I definitely don’t think automated reasoning is some “secret reliability sauce” that nobody else can replicate; it just gives me more confidence that Amazon takes reliability very seriously and is less likely to ship a terrible bug that messes up my data.