breckognize 19 hours ago

To measure performance the author looked at latency, but most S3 workloads are throughput oriented. The magic of S3 is that it's cheap because it's built on spinning HDDs, which are slow and unreliable individually, but when you have millions of them, you can mask the tail and deliver multi TBs/sec of throughput.

It's misleading to look at S3 as a CDN. It's fine for that, but it's real strength is backing the world's data lakes and cloud data warehouses. Those workloads have a lot of data that's often cold, but S3 can deliver massive throughout when you need it. R2 can't do that, and as far as I can tell, isn't trying to.

Source: I used to work on S3

3
JoshTriplett 19 hours ago

Yeah, I'd be interested in the bandwidth as well. Can R2 saturate 10/25/50 gigabit links? Can it do so with single requests, or if not, how many parallel requests does that require?

JoshTriplett 18 hours ago

That's unrelated to the performance of (for instance) the R2 storage layer. All the bandwidth in the world won't help you if you're blocked on storage. It isn't clear whether the overall performance of R2 is capable of saturating user bandwidth, or whether it'll be blocked on something.

S3 can't saturate user bandwidth unless you make many parallel requests. I'd be (pleasantly) surprised if R2 can.

moralestapia 18 hours ago

I'm confused, I assumed we were talking about the network layer.

If we are talking about storage, well, SATA can't give you more than ~5Gbps so I guess the answer is no? But also no one else can do it, unless they're using super exotic HDD tech (hint: they're not, it's actually the opposite).

What a weird thing to argue about, btw, literally everybody is running a network layer on top of storage that lets you have much higher throughput. When one talks about R2/S3 throughput no one (on my circle, ofc.) would think we are referring to the speed of their HDDs, lmao. But it's nice to see this, it's always amusing to stumble upon people with a wildly different point of view on things.

JoshTriplett 17 hours ago

We're talking about the user-visible behavior. You argued that because Cloudflare's CDN has an obscene amount of bandwidth, R2 will be able to saturate user bandwidth; that doesn't follow, hence my counterpoint that it could be bottlenecked on storage rather than network. The question at hand is what performance R2 offers, and that hasn't been answered.

There are any number of ways they could implement R2 that would allow it to run at full wire speed, but S3 doesn't run at full wire speed by default (unless you make many parallel requests) and I'd be surprised if R2 does.

aipatselarom 16 hours ago

n = 1 aside.

I have some large files stored in R2 and a 50Gbps interface to the world.

curl to Linode's speed test is ~200MB/sec.

curl to R2 is also ~200MB/sec.

I'm only getting 1Gbps but given that Linode's speed is pretty much the same I would think the bottleneck is somewhere else. Dually, R2 gives you at least 1Gbps.

renewiltord 17 hours ago

No, most people aren’t interested in subcomponent performance, just in total performance. A trivial example is that even a 4-striped U2 NVMe disk array exported over Ethernet can deliver a lot more data than 5 Gbps and store mucho TiB.

moralestapia 17 hours ago

Thanks for +1 what I just said. So, apparently, it's not just me and my peers who think like this.

fragmede 18 hours ago

Cloudflare's paid DDoS protection product being able to soak up insane L3/4 DDoS attacks doesn't answer the question as to whether or not the specific product, R2 from Cloudflare which has free egress is able to saturate a pipe.

Cloudflare has the network to do that, but they charge money to do so with their other offerings, so why would they give that to you for free? R2 is not a CDN.

moralestapia 18 hours ago

>Can do 3.8 Tbps

>Can't do 10 Gbps

k

fragmede 18 hours ago

> can't read CDN

> Can't read R2

k

bananapub 18 hours ago

that's completely unrelated. the way to soak up a ddos at scale is just "have lots of peering and a fucking massive amount of ingress".

neither of these tell you how fast you can serve static data.

moralestapia 18 hours ago

>that's completely unrelated

Yeah, I'm sure they use a completely different network infrastructure to serve R2 requests.

vtuulos 18 hours ago

yes, this. In case you are interested in seeing some numbers backing this claim, see here https://outerbounds.com/blog/metaflow-fast-data

Source: I used to work at Netflix, building systems that pull TBs from S3 hourly

michaelt 19 hours ago

I mean, it may be true in practice that most S3 workloads are throughput oriented and unconcerned with latency.

But if you look at https://aws.amazon.com/s3/ it says things like:

"Object storage built to retrieve any amount of data from anywhere"

"any amount of data for virtually any use case"

"S3 delivers the resiliency, flexibility, latency, and throughput, to ensure storage never limits performance"

So if S3 is not intended for low-latency applications, the marketing team haven't gotten the message :)

troyvit 18 hours ago

lol I think the only reason you're being downvoted is because the common belief at HN is, "of course marketing is lying and/or doesn't know what they're talking about."

Personally I think you have a point.

mikeshi42 18 hours ago

I didn’t downvote but s3 does have low latency offerings (express). Which has reasonable latency compared to EFS iirc. I’d be shocked if it was as popular as the other higher latency s3 tiers though.