Very good article and interesting read. I did want to clarify some misconceptions I noted while reading (working from memory so hopefully I don’t get anything wrong myself).
> As explained here, Durable Objects are single threaded and thus limited by nature in the throughput they can offer.
R2 bucket operations do not use single-threaded Durable Objects; a one-off change was made just for R2 to let it run multiple instances. That's why the limits were lifted in the open beta.
> they mentioned that each zone's assets are sharded across multiple R2 buckets to distribute load, which may indicate that a single R2 bucket was not able to handle the load for user-facing traffic. Things may have improved since then, though.
I would not take this as general advice. Cache Reserve was architected to serve an absurd amount of traffic that almost no customer or application will ever see. If you have that much traffic, I'd expect you to be an ENT customer working with their solutions engineers to design your application.
> First, R2 is not 100% compatible with the S3 API. One notable missing feature is data-integrity checks with SHA256 checksums.
This doesn’t sound right. I distinctly remember when this was implemented for uploading objects. SHA-1 and SHA-256 should be supported (I don’t remember about CRC). For some reason it’s missing from the docs, though. The trailer version isn’t supported and likely won’t be for a while for technical reasons (the Workers platform doesn’t support HTTP trailers because it uses HTTP/1 internally). Overall compatibility should be pretty decent.
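For what it’s worth, here’s a rough sketch of what I mean (using the AWS SDK v3 for JavaScript with a placeholder account ID, bucket, and credentials; treat it as illustrative rather than confirmed against R2): compute the SHA-256 yourself and send it via the ChecksumSHA256 field, which goes out as a plain request header, instead of relying on the SDK’s trailer-based streaming checksums.

    import { createHash } from "node:crypto";
    import { PutObjectCommand, S3Client } from "@aws-sdk/client-s3";

    // Placeholder endpoint/credentials -- substitute your own account ID and keys.
    const s3 = new S3Client({
      region: "auto",
      endpoint: "https://<ACCOUNT_ID>.r2.cloudflarestorage.com",
      credentials: {
        accessKeyId: process.env.R2_ACCESS_KEY_ID!,
        secretAccessKey: process.env.R2_SECRET_ACCESS_KEY!,
      },
    });

    const body = Buffer.from("hello from r2");

    // Compute the checksum locally and pass it up front as a header
    // (x-amz-checksum-sha256), so no HTTP trailers are involved.
    const sha256 = createHash("sha256").update(body).digest("base64");

    await s3.send(
      new PutObjectCommand({
        Bucket: "my-bucket",
        Key: "hello.txt",
        Body: body,
        ChecksumSHA256: sha256,
      })
    );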
The section on “The problem with cross-datacenter traffic” seems to rest on flawed assumptions rather than data. Their own graphs only show that public buckets, apart from some occasional weird spikes, deliver pretty consistent performance, while the S3 API is spikier, and the time-of-day variability is much more muted than the CPU variability. The same goes for the assumptions about bandwidth or other datacenter limitations. The more likely explanation is the S3 auth layer; the time-of-day variability they experienced matches more closely with how that layer works. I don’t know enough about the particulars of this author’s zones to hypothesize, but the S3 auth layer was always challenging from a perf perspective.
> This is really, really, really annoying. For example you know that all your compute instances are in Paris, and you know that Cloudflare has a big datacenter in Paris, so you want your bucket to be in Paris, but you can't. If you are unlucky when creating your bucket, it will be placed in Warsaw or some other place far away and you will have huge latencies for every request.
I understand the frustration, but there are very good technical and UX reasons this wasn’t done. For example, while you may think that “Paris datacenter” is well defined, it isn’t for R2: your metadata is stored regionally across multiple datacenters, whereas S3, if I recall correctly, uses what they call a region, which is a single location broken up into multiple availability zones, i.e. basically isolated power and connectivity domains. This is an availability tradeoff - us-east-1 will never go offline on Cloudflare because it just doesn’t exist - and the location hint is the size of the availability region. This is done at both the metadata and storage layers. The location hint should definitely be followed when you create the bucket, but maybe there are bugs or other issues.
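If the hint seems to be ignored, it may be worth setting it explicitly at creation time. A minimal sketch, assuming the AWS SDK v3 for JavaScript, placeholder credentials, and that R2 accepts location hints through the S3 CreateBucket LocationConstraint field (double-check the docs for the current hint values):

    import { CreateBucketCommand, S3Client } from "@aws-sdk/client-s3";

    // Placeholder endpoint/credentials -- same shape as the upload example above.
    const s3 = new S3Client({
      region: "auto",
      endpoint: "https://<ACCOUNT_ID>.r2.cloudflarestorage.com",
      credentials: {
        accessKeyId: process.env.R2_ACCESS_KEY_ID!,
        secretAccessKey: process.env.R2_SECRET_ACCESS_KEY!,
      },
    });

    await s3.send(
      new CreateBucketCommand({
        Bucket: "my-bucket",
        // "weur" (Western Europe) is a hint for the availability region,
        // not a pin to a specific datacenter like Paris.
        CreateBucketConfiguration: { LocationConstraint: "weur" },
      })
    );

Note the hint only narrows placement to that availability region, which is consistent with the tradeoff described above.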
As others noted throughput data would also have been interesting.
> First, R2 is not 100% compatible with the S3 API. One notable missing feature is data-integrity checks with SHA256 checksums.
Maybe it was an old thing? The changelog [0] for 2023-06-16 says:
"S3 putObject now supports sha256 and sha1 checksums."
[0]: https://developers.cloudflare.com/r2/platform/changelog/#2023-06-16
I suspect the author is going by the documentation rather than having tried it themselves.