Item 42257105

breckognize • 19 hours ago

To measure performance, the author looked at latency, but most S3 workloads are throughput oriented. The magic of S3 is that it's cheap because it's built on spinning HDDs, which are slow and unreliable individually, but when you have millions of them, you can mask the tail and deliver multiple TB/sec of throughput.

It's misleading to look at S3 as a CDN. It's fine for that, but its real strength is backing the world's data lakes and cloud data warehouses. Those workloads have a lot of data that's often cold, but S3 can deliver massive throughput when you need it. R2 can't do that, and as far as I can tell, isn't trying to.

Source: I used to work on S3

JoshTriplett • 19 hours ago

Yeah, I'd be interested in the bandwidth as well. Can R2 saturate 10/25/50 gigabit links? Can it do so with single requests, or if not, how many parallel requests does that require?

1 reply

moralestapia • 19 hours ago

Yes, they absolutely can [1].

1: https://blog.cloudflare.com/how-cloudflare-auto-mitigated-wo...

3 replies

JoshTriplett • 18 hours ago

That's unrelated to the performance of (for instance) the R2 storage layer. All the bandwidth in the world won't help you if you're blocked on storage. It isn't clear whether the overall performance of R2 is capable of saturating user bandwidth, or whether it'll be blocked on something.

S3 can't saturate user bandwidth unless you make many parallel requests. I'd be (pleasantly) surprised if R2 can.
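For anyone unsure what "many parallel requests" looks like in practice, here is a minimal sketch, not anything S3- or R2-specific: split the object into byte ranges and fetch them concurrently. It assumes the URL is publicly readable or presigned, that the endpoint honors HTTP Range headers, and that you already know the object size (e.g. from a HEAD request). The names `split_ranges`, `http_range_get`, and `parallel_fetch`, plus the part size and worker count, are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen


def split_ranges(size, part_size):
    """Return inclusive (start, end) byte ranges covering `size` bytes."""
    return [(start, min(start + part_size, size) - 1)
            for start in range(0, size, part_size)]


def http_range_get(url, start, end):
    """Fetch bytes start..end (inclusive) with a single HTTP Range request."""
    req = Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urlopen(req) as resp:
        return resp.read()


def parallel_fetch(url, size, part_size=8 * 1024 * 1024, workers=16,
                   fetch=http_range_get):
    """Download `size` bytes of `url` as concurrent ranged requests.

    `part_size` and `workers` are arbitrary starting points, not tuned values.
    """
    ranges = split_ranges(size, part_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the parts join correctly.
        parts = pool.map(lambda r: fetch(url, r[0], r[1]), ranges)
        return b"".join(parts)
```

This holds the whole object in memory, which is fine for a benchmark but not for very large files. Timing `parallel_fetch` at different `workers` values is one way to see where throughput stops scaling.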

1 reply

moralestapia • 18 hours ago

I'm confused, I assumed we were talking about the network layer.

If we are talking about storage, well, SATA can't give you more than ~5Gbps so I guess the answer is no? But also no one else can do it, unless they're using super exotic HDD tech (hint: they're not, it's actually the opposite).

What a weird thing to argue about, btw, literally everybody is running a network layer on top of storage that lets you have much higher throughput. When one talks about R2/S3 throughput no one (in my circle, ofc.) would think we are referring to the speed of their HDDs, lmao. But it's nice to see this, it's always amusing to stumble upon people with a wildly different point of view on things.

2 replies

JoshTriplett • 17 hours ago

We're talking about the user-visible behavior. You argued that because Cloudflare's CDN has an obscene amount of bandwidth, R2 will be able to saturate user bandwidth; that doesn't follow, hence my counterpoint that it could be bottlenecked on storage rather than network. The question at hand is what performance R2 offers, and that hasn't been answered.

There are any number of ways they could implement R2 that would allow it to run at full wire speed, but S3 doesn't run at full wire speed by default (unless you make many parallel requests) and I'd be surprised if R2 does.

1 reply

aipatselarom • 16 hours ago

n = 1 aside.

I have some large files stored in R2 and a 50Gbps interface to the world.

curl to Linode's speed test is ~200MB/sec.

curl to R2 is also ~200MB/sec.

I'm only getting about 1.6Gbps (200MB/sec × 8), but given that Linode's speed is pretty much the same, I would think the bottleneck is somewhere else. Dually, R2 gives you at least 1Gbps.
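As a quick sanity check on the units, here is a small sketch that converts the ~200MB/sec curl figure to gigabits per second and estimates how many such streams it would take to fill a 50Gbps interface. It assumes decimal units (1 MB = 10^6 bytes, 1 Gbps = 10^9 bits/sec) and, optimistically, that throughput scales linearly with the number of parallel streams, which nothing in this thread confirms. The function names are hypothetical.

```python
import math


def mb_per_s_to_gbps(mb_per_s):
    """Convert megabytes/sec to gigabits/sec (decimal units assumed)."""
    return mb_per_s * 8 / 1000


def streams_to_fill(link_gbps, per_stream_gbps):
    """Streams needed to fill a link, assuming perfectly linear scaling."""
    return math.ceil(link_gbps / per_stream_gbps)


per_stream = mb_per_s_to_gbps(200)
print(per_stream)                       # 1.6
print(streams_to_fill(50, per_stream))  # 32
```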

renewiltord • 17 hours ago

No, most people aren’t interested in subcomponent performance, just in total performance. A trivial example is that even a 4-striped U.2 NVMe disk array exported over Ethernet can deliver a lot more data than 5 Gbps and store mucho TiB.

1 reply

moralestapia • 17 hours ago

Thanks for +1'ing what I just said. So, apparently, it's not just me and my peers who think like this.

fragmede • 18 hours ago

Cloudflare's paid DDoS protection product being able to soak up insane L3/4 DDoS attacks doesn't answer the question of whether the specific product, Cloudflare's R2, which has free egress, is able to saturate a pipe.

Cloudflare has the network to do that, but they charge money to do so with their other offerings, so why would they give that to you for free? R2 is not a CDN.

1 reply

moralestapia • 18 hours ago

>Can do 3.8 Tbps

>Can't do 10 Gbps

1 reply

fragmede • 18 hours ago

> can't read CDN

> Can't read R2

bananapub • 18 hours ago

that's completely unrelated. the way to soak up a ddos at scale is just "have lots of peering and a fucking massive amount of ingress".

neither of these tells you how fast you can serve static data.

1 reply

moralestapia • 18 hours ago

>that's completely unrelated

Yeah, I'm sure they use a completely different network infrastructure to serve R2 requests.

vtuulos • 18 hours ago

yes, this. In case you are interested in seeing some numbers backing this claim, see here https://outerbounds.com/blog/metaflow-fast-data

Source: I used to work at Netflix, building systems that pull TBs from S3 hourly

michaelt • 19 hours ago

I mean, it may be true in practice that most S3 workloads are throughput oriented and unconcerned with latency.

But if you look at https://aws.amazon.com/s3/ it says things like:

"Object storage built to retrieve any amount of data from anywhere"

"any amount of data for virtually any use case"

"S3 delivers the resiliency, flexibility, latency, and throughput, to ensure storage never limits performance"

So if S3 is not intended for low-latency applications, the marketing team haven't gotten the message :)

1 reply

troyvit • 18 hours ago

lol I think the only reason you're being downvoted is because the common belief at HN is, "of course marketing is lying and/or doesn't know what they're talking about."

Personally I think you have a point.

1 reply

mikeshi42 • 18 hours ago

I didn’t downvote, but S3 does have a low-latency offering (Express), which has reasonable latency compared to EFS, iirc. I’d be shocked if it was as popular as the other higher-latency S3 tiers, though.