mherrmann 27 minutes ago

Reading around online, it seems egress is free until Cloudflare asks for big bucks or threatens to terminate your service. It's probably fine for hobbyists or small web sites. But as a business, I would never trust something that is "free". There has to be a catch somewhere.

A business contact of mine asked Cloudflare for a quote for setting up a CDN for 7B requests per month. They dragged him through five sales calls to eventually deliver an offer with request costs 30% above CloudFront's public pricing. He said the costs per GB were ok though.

The cheapest reliable CDN I've found is bunny.net. Unlike OP, who is selling a book about Cloudflare, I have no conflict of interest by recommending Bunny, other than being a customer. I serve around 100 TB per month through them, at $0.005/GB. You can put it in front of a bucket to have cheap (globally distributed) egress with all the benefits of your favorite object storage provider. In my case, the buckets lie on Linode/Akamai. Bunny uses CDN77's infrastructure, which I have also heard good things about.

MobileVet 13 hours ago

One thing that I haven't seen discussed in the comments is the inherent vulnerability of S3 pricing. Like all things AWS, if something goes sideways, you are suddenly on the wrong side of a very large bill. For instance, someone can easily blow your egress charges through the roof by making a massive number of requests for your assets hosted there.

While Cloudflare may reach out and say 'you should be on enterprise' when that happens on R2, the fact they also handle DDoS and similar attacks as part of their offering means the likelihood of success is much lower (as is the final bill).

akira2501 12 hours ago

Typically you would use S3 with CloudFront for hosting. S3 provides no protections because it's meant to be a durable and global service. CloudFront provides DDoS and other types of protection while making it easy to get prepaid bandwidth discounts.

danielheath 10 hours ago

Just one data point, but adding Cloudflare to our stack (in front of "CloudFront with bandwidth discounts") took about $30k USD per year off our bandwidth bill.

kmos17 7 hours ago

In my experience the AWS WAF and DDoS mitigation are really expensive (min $40k per year contract) and are missing really basic DDoS handling capabilities (last I evaluated it, they did not have the ability to enforce client-side JS validation, which can be very effective against some bot networks). Maybe it has evolved since, but Cloudflare Enterprise was cheaper and more capable out of the box.

jgalt212 8 hours ago

Yes, obviously. But just as obviously, there's rarely an easy and safe path with AWS. By default, R2 is easy, safe, and cheaper.

jonathannorris 12 hours ago

Also, once you are on Enterprise, they will not bug/charge you for contracted overages very often (like once a year) and will forgive significant overages if you resolve them quickly, in my experience.

tengbretson 7 hours ago

It's a welcome change from Vercel, who, if your site is under attack, will email you saying congrats on using 75% of your quota in a day.

sroussey 13 hours ago

Cloudflare has DDoS roots and it plays well here.

lopkeny12ko 12 hours ago

I'm not really sure what point you're trying to make here. S3 bills you on, essentially, serving files to your customers. So yes, if your customers download more files then you get charged more. What exactly is the surprising part here?

karmakaze 12 hours ago

There was a backlash about being billed for unauthorized requests. It's since been updated[0]. I don't know whether everyone affected was retroactively refunded.

[0] https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3...

zaptheimpaler 12 hours ago

The surprise is any ne'er-do-well can DDoS your bucket even if they aren't a customer. Genuine customer traffic volume will probably be known and expected, but putting an S3 bucket in the open is something like leaving a blank check on the internet.

lopkeny12ko 12 hours ago

It's a bit unfair to characterize that as a surprise on how much S3 bills you, no? The surprising part here is lack of DDoS protection on your end or leaving a bucket public and exposed. AWS is just charging you for how much it served, it doesn't make sense to hold them to a fault here.

Dylan16807 10 hours ago

> The surprising part here is lack of DDoS protection on your end or leaving a bucket public and exposed.

It doesn't take anything near DDoS. If you dare to put up a website that serves images from S3, and one guy on one normal connection decides to cause you problems, they can pull down a hundred terabytes in a month.

Is serving images from S3 a crazy use case? Even if you have signed and expiring URLs it's hard to avoid someone visiting your site every half hour and then using the URL over and over.

> AWS is just charging you for how much it served, it doesn't make sense to hold them to a fault here.

Even if it's not their fault, it's still an "inherent vulnerability of S3 pricing". But since they charge so much per byte with bad controls over it, I think it does make sense to hold them to a good chunk of fault.
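
For illustration of the signed, expiring URL point above, here is a minimal sketch with the AWS SDK for JavaScript v3; the bucket, key and region are placeholders. Until the URL expires, anyone who holds it can keep re-fetching it, and every fetch is billed as egress.

  import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
  import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

  // Bucket, key and region are placeholders. Anyone holding this URL can re-fetch
  // the object as often as they like until expiresIn elapses.
  const s3 = new S3Client({ region: "us-east-1" });
  const url = await getSignedUrl(
    s3,
    new GetObjectCommand({ Bucket: "my-images", Key: "photo.jpg" }),
    { expiresIn: 1800 } // 30 minutes
  );
  console.log(url);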

zaptheimpaler 9 hours ago

I don't know about fair or unfair, but it's just a problem you don't have to worry about if there's no egress fees.

bobthebutcher 11 hours ago

If you want to hire someone to walk your dog you probably won't put an ad in the New york times to a head hunter that you will pay by the hour with no oversight and it would be totally unfair to that head hunter when you don't want to pay them for the time of all those interviews. But an infinitely scalable service you somehow can't put immediately terminal limits on is somehow fine on the cloud.

bippihippi1 11 hours ago

it loses trust with customers when the simple setup is flawed. S3 is rightly built to support as much egress as any customer would want, but wrong to make it complex to set up rules to limit the bandwidth and price.

It should be possible to use the service, especially common ones like S3 with little knowledge of architecture and stuff.

fnikacevic 11 hours ago

AWS will also forgive mistakes or negligence based bills, in my case 3 times.

tlarkworthy 12 hours ago

I've benchmarked R2 and S3 and S3 is well ahead in terms of latency, especially on ListObject requests. I think R2 has some kind of concurrency limit, as concurrent ListObject requests seem to have an increased failure rate when serving simultaneous requests.

I have a few of the S3-likes wired up live over the internet that you can try yourself in your browser. Backblaze is surprisingly performant, which I did not expect (S3 is still king though).

https://observablehq.com/@tomlarkworthy/mps3-vendor-examples

kmos17 7 hours ago

thanks for sharing, that’s a good data point in comparing those services.

matteocontrini 34 minutes ago

I don't know what people use object storage for, but R2 is missing lots of features and it's not a replacement for S3 in many cases.

For example: no regions, no replication (and no AZs either), limited lifecycle management, no versioning, no MFA protection, no intelligent tiering, no customer encryption, no IAM, etc.

jsheard 16 hours ago

Is R2 egress actually free, or is it like CF's CDN egress which is "free" until they arbitrarily decide you're using it too much or using it for the wrong things, so now you have to pay $undisclosed per GB?

steelbrain 16 hours ago

Do you have any examples of the latter? From what I remember reading, the most recent case was a gambling website and cloudflare wanted them to upgrade to a tier where they’d have their own IPs. This makes sense because some countries blanket ban gambling website IPs.

So apart from ToS abuse cases, do you know any other cases? I ask as a genuine curiosity because I’m currently paying for Cloudflare to host a bunch of our websites at work.

jsheard 16 hours ago

Here are some anecdotes I dug up: https://news.ycombinator.com/item?id=38960189

Put another way, if Cloudflare really had free unlimited CDN egress then every ultra-bandwidth-intensive service like Imgur or Steam would use them, but they rarely do, because at their scale they get shunted onto the secret real pricing that often ends up being more expensive than something like Fastly or Akamai. Those competitors would be out of business if CF were really as cheap as they want you to think they are.

The point where it stops being free seems to depend on a few factors, obviously how much data you're moving is one, but also the type of data (1GB of images or other binary data is considered more harshly than 1GB of HTML/JS/CSS) and where the data is served to (1GB of data served to Australia or New Zealand is considered much more harshly than 1GB to EU/NA). And how much the salesperson assigned to your account thinks they can shake you down for, of course.

hiatus 11 hours ago

Their terms specifically address video/images:

> Cloudflare’s content delivery network (the “CDN”) Service can be used to cache and serve web pages and websites. Unless you are an Enterprise customer, Cloudflare offers specific Paid Services (e.g., the Developer Platform, Images, and Stream) that you must use in order to serve video and other large files via the CDN. Cloudflare reserves the right to disable or limit your access to or use of the CDN, or to limit your End Users’ access to certain of your resources through the CDN, if you use or are suspected of using the CDN without such Paid Services to serve video or a disproportionate percentage of pictures, audio files, or other large files. We will use reasonable efforts to provide you with notice of such action.

https://www.cloudflare.com/service-specific-terms-applicatio...

Aachen 11 hours ago

I was going to say that it's odd, then, that reddit doesn't serve all the posts' json via a free account at cloudflare and save a ton of money, but maybe actually it's just peanuts on the total costs? So cloudflare is basically only happy to host the peanuts for you to get you on their platform, but once you want to serve things where CDNs (and especially "free" bandwidth) really help, it stops being allowed?

Aperocky 15 hours ago

I think the comment section of that story is a gold mine: https://robindev.substack.com/p/cloudflare-took-down-our-web.... Not necessarily authentic, but apply your own judgement.

akira2501 12 hours ago

Their ToS enforcement seems weak and/or arbitrary. There are a lot of scummy and criminal sites that use their services without any issues it seems. At least they generally cooperate with law enforcement when requested to do so but they otherwise don't seem to notice on their own.

shivasaxena 16 hours ago

I would say don't run a casino on cloudflare

MortyWaves 16 hours ago

I am also surprised that 4chan is using Cloudflare captcha and bot protection

byyll 16 hours ago

What is surprising about that? Cloudflare also provides services to terrorists, CSAM websites and more.

telgareith 12 hours ago

Nice job painting CF as the bad guy. They do NOT provide services to such sites; again and again they have terminated them for breach of ToS and cooperated with the legal system.

dmd 16 hours ago

Good to know. Please make an uncontroversial list of all the human activities that you think shouldn't be allowed on cloudflare (or perhaps in general). Then we can all agree to abide by it, and human conflict will end!

troyvit 16 hours ago

Cloudflare is a company, not a public utility. If they want to disallow any sites that make fun of cuttlefish they get to do that. If you want a CDN that follows the rules of a public utility I think you're out of luck on this planet.

neom 15 hours ago

In addition to this, if CF's, say, payment provider hated people making fun of cuttlefish, it might make sense for CF to ban mocking marine molluscs there also.

dmd 15 hours ago

I agree with you. I'm saying that cloudflare gets to decide that, not a random HN commenter.

troyvit 12 hours ago

Doh! Sorry I misunderstood you dmd

tshaddox 15 hours ago

It's not unreasonable for a service provider to describe their service as "free" even though they will throttle or ban you for excessive use.

machinekob 15 hours ago

Happened before, will happen again. CF is a publicly traded company and when the squeeze comes, they’ll just tax your egress as hard as Amazon.

nickjj 15 hours ago

One thing to think about with S3 is there are use cases where the price is very low, which the article didn't mention.

For example maybe you have ~500 GB of data across millions of objects that has accumulated over 10 years. You don't even know how many reads or writes you have on a monthly basis because your S3 bill is $11 while your total AWS bill is orders of magnitude more.

If you're in a spot like this, moving to R2 to potentially save $7 or whatever it ends up being would cost a lot more in engineering time than it saves. Plus there are old links, such as email campaign links, that might point to a public S3 object and would break if you moved it to another location.
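
For reference, a rough sanity check of a bill like that, assuming S3 Standard's list price of about $0.023/GB-month in us-east-1 and ignoring request and egress charges:

  // Rough sanity check of the "~$11 for ~500 GB" figure (assumed S3 Standard list
  // price in us-east-1; request and egress charges excluded).
  const storedGb = 500;
  const pricePerGbMonth = 0.023;
  console.log(`~$${(storedGb * pricePerGbMonth).toFixed(2)}/month`); // ~$11.50/month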

philistine 14 hours ago

Even simpler: I'm using Glacier Deep Archive for my personal backups, and I don't see how R2 would be cheaper for me.

Dylan16807 10 hours ago

I think the most reasonable way to analyze this puts non-instant-access Glacier in a separate category from the rest of S3. R2 doesn't beat it, but R2 is not a competitor in the first place.

telgareith 12 hours ago

Have you priced the retrieval cost? You quickly run into high 3 and then 4 figures worth of bandwidth.

seized 11 hours ago

Yes, but if it's your third location of 3-2-1 then it can also make sense to weigh it against data recovery costs on damaged hardware.

I backup to Glacier as well. For me to need to pull from it (and pay that $90/TB or so) means I've lost more than two drives in a historically very reliable RAIDZ2 pool, or lost my NAS entirely.

I'll pay $90/TB over unknown $$$$ for a data recovery from burned/flooded/fried/failed disks.

philistine 12 hours ago

Retrieval? For an external backup? If I need to restore and my local backup is completely down, it either means I lost two drives (very unlikely) or the house is a calcinated husk and at this point I'm insured.

And let's be honest. If the house burns down, the computers are the third thing I get out of there after the wife and the dog. My external backup is peace of mind, nothing more. I don't ever expect to need it in my lifetime.

kiwijamo 10 hours ago

High 3 and 4 figures wouldn't occur for personal backups though. I've done a big retrieval once and the cost was literally just single-digit dollars for me. So the total lifetime cost (including retrievals) is cheaper on S3 than R2 for my personal backup use case. This is why I struggle to take seriously any analysis that says S3 is expensive -- it is only expensive if you use the most expensive (default) S3 product. S3 has more options to offer than R2 or other competitors, which is why I stay with S3 and pay <$1.00 a month for my entire backup. Most competitors (including R2) would have me pay significantly more than I spend on the appropriate S3 product.

mimischi 9 hours ago

Curious, did you go through the math of figuring out how much the initial file transfer and ongoing cost will set you back? (Not a lot, from the sounds of it.) Should be easy to do, but I’ve just not found the time yet to do that for a backup I’m intending to send to S3 as well.
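
As a back-of-envelope sketch, using what I believe are the us-east-1 list prices for Glacier Deep Archive (verify against current pricing; bandwidth into AWS is free, so the initial transfer mostly costs time):

  // Back-of-envelope for 1 TB in Glacier Deep Archive; assumed us-east-1 list prices.
  const gb = 1000;
  const storagePerGbMonth = 0.00099; // Deep Archive storage
  const bulkRetrievalPerGb = 0.0025; // slowest restore tier
  const egressPerGb = 0.09;          // internet egress once restored
  console.log(`storage: ~$${(gb * storagePerGbMonth).toFixed(2)}/month`);                     // ~$0.99/month
  console.log(`one full restore: ~$${(gb * (bulkRetrievalPerGb + egressPerGb)).toFixed(2)}`); // ~$92.50

Which lines up with the "$90/TB or so" restore figure mentioned upthread.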

asteroidburger 12 hours ago

“Here’s a bunch of great things about CloudFlare R2 - and please buy my book about it” leaves a bad taste in my mouth.

Also, has CF improved their stance around hosting hate groups? They have strongly resisted pressure to stop hosting/supporting hate sites like 8chan and Kiwifarms, and only stopped reluctantly.

gjsman-1000 11 hours ago

I don’t have to support 8chan or KiwiFarms to say that Cloudflare has absolutely no role in policing the internet. The job of policing the internet is for the police. If it’s illegal, let them investigate.

asteroidburger 11 hours ago

There is a difference between policing the internet and supplying resources and services to known bad actors.

Their job isn’t to investigate and punish harassment and criminal behavior, but they certainly don’t have to condone it via their support.

gjsman-1000 11 hours ago

> known bad actors

If they are known bad actors, let the police do the job of policing the internet. Otherwise, all bad actors are ultimately arbitrarily defined. Who said they are known bad actors? What does that even mean? Why does that person determining bad actors get their authority? Were they duly elected? Or did one of hundreds of partisan NGOs claim this? Who elected the NGO? Does PETA get a say on bad actors?

Be careful what you wish for. In some US States, I am sure the attorney general would send a letter saying to shut down the marijuana dispensary - they're known bad actors, after all. They might not win a lawsuit, but winning the support of private organizations would be just as good.

> they certainly don’t have to condone it via their support

Wow, what a great argument. Hacker News supports all arguments here by tolerating people speaking and not deleting everything they could possibly disagree with.

Or maybe, providing a service to someone, should not be seen as condoning all possible uses of the service. Just because water can be used to waterboard someone, doesn't mean Walmart should be checking IDs for water purchasers. Just because YouTube has information on how to pick locks, does not mean YouTube should be restricted to adults over 21 on a licensed list of people entrusted with lock-picking knowledge.

johnklos 8 hours ago

Bull.

Cloudflare protects these organizations. Cloudflare goes far above and beyond what most other companies do, and I personally can't wait to see Cloudflare held liable for content they host for which they ignore legitimate complaints.

Imagine if I were to call your phone every hour, on the hour, and my position was that it's not illegal until someone reports it as a crime and the police or a court contacts me to tell me to not do it. That's Cloudflare. They're assholes, and insisting that they're not responsible for anything, any time, until there's a court order is just them being assholes.

gjsman-1000 8 hours ago

> Cloudflare goes far above and beyond what most other companies do

Is this reflective of Cloudflare, or instead reflective of other companies’ willingness to play judge, jury, and executioner?

I like knowing my business on Cloudflare won’t be subject to extrajudicial punishment regardless of the grounds.

> They're assholes

Being an asshole is your constitutional right.

jgalt212 8 hours ago

Has AWS improved their stance around hosting resource draining robots.txt defying scrapers and spiders?

Every large company does business with many people / orgs I don't like. I'm not defending or attacking AWS or CF, but merely stating the deeper you dig the more objectionable stuff you'll find everywhere. There are shades of gray of course, but at the end of the day we're all sinners.

jjeaff 34 minutes ago

I don't have a judgement in this particular case. But I really hate the nobody is perfect argument, especially considering the current political climate. We are all sinners, but we are not all equally bad sinners. Some are deplorable and some are just imperfect and there are a million shades of grey in between. And I don't know where to draw the line, but I can at least make a judgement if you are obviously far, far over that line.

kevlened 16 hours ago

It's not mentioned, but important to note, that R2 lacks object versioning.

https://community.cloudflare.com/t/r2-object-versioning-and-...

yawnxyz 16 hours ago

I built a thin Cloudflare workers script for object versioning and it works great

jw1224 16 hours ago

Is this something you’d consider sharing? I know many of us would find it really useful!

yawnxyz 5 hours ago

sure! I'll clean it up a bit and show it on HN. For some reason I figured Cloudflare would have built that by now?!

UltraSane 16 hours ago

Ouch. Object versioning is one of the best features of object storage. It provides excellent protection from malware and human error. My company makes extensive use of versioning and Object Lock for protection from malware and data retention purposes.

CharlesW 12 hours ago

As @yawnxyz mentioned, versioning is straightforward to do via Workers (untested sample: https://gist.github.com/CharlesWiltgen/84ab145ceda1a972422a8...), and you can also configure things so any deletes and other modifications must happen through Workers.
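
A minimal sketch of the idea (not CharlesW's gist; the BUCKET binding name and key scheme are illustrative): every PUT also writes an immutable timestamped copy, so overwrites never destroy the previous version.

  // Assumes an R2 bucket bound as BUCKET in wrangler.toml.
  export default {
    async fetch(request: Request, env: { BUCKET: R2Bucket }): Promise<Response> {
      const key = new URL(request.url).pathname.slice(1);

      if (request.method === "PUT") {
        const body = await request.arrayBuffer();
        // Keep an immutable, timestamped copy alongside the "live" object.
        const versionKey = `__versions/${key}/${Date.now()}`;
        await env.BUCKET.put(versionKey, body);
        await env.BUCKET.put(key, body);
        return new Response(`stored ${key} as ${versionKey}`, { status: 201 });
      }

      if (request.method === "GET") {
        const object = await env.BUCKET.get(key);
        if (!object) return new Response("not found", { status: 404 });
        return new Response(object.body);
      }

      return new Response("method not allowed", { status: 405 });
    },
  };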

UltraSane 11 hours ago

Interesting, thanks!

kansi 10 hours ago

I have tried to find a CDN provider which would offer access control similar to CloudFront's signed cookies but failed to find something that would match it. This is a major drawback with these providers offering S3-style bucket storage, because most of the time you would want to serve the content from a CDN, and offloading access control to the CDN via cookies makes life so much easier. You only need to set the cookies for the user's session once and they are automatically sent (by the web browser) to the CDN with no additional work needed.

saurik 7 hours ago

This is supported by Google Cloud using literally the same wording:

https://cloud.google.com/cdn/docs/using-signed-cookies

As far as I can tell, this feature is also supported by Akamai here:

https://techdocs.akamai.com/property-mgr/docs/cookie-authz

I am pretty sure you can implement this on CDNetworks using eval_func:

https://docs.cdnetworks.com/en/cdn/docs/recipes/secure-deliv...

With AWS Cloudfront, I'd think you--worst case--pull out Lambda@Edge?

donavanm 1 hour ago

Cloudfront “signed cookie” auth should “just work”: https://docs.aws.amazon.com/AmazonCloudFront/latest/Develope...

IIRC it's essentially the same as the aws style signed urls and header bearer token auth. I _think_ lambda@edge is only relevant if you want to do the initial sig generation in the cdn instead of your api/app endpoint.

Edit: actually GP mentioned Cloudfront already, so yes, it works as they're asking for AFAICT
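
For completeness, a sketch of issuing CloudFront signed cookies from your app endpoint, assuming the @aws-sdk/cloudfront-signer helper; the distribution URL, key pair ID and private key are placeholders.

  import { getSignedCookies } from "@aws-sdk/cloudfront-signer";

  // The key pair belongs to a CloudFront trusted key group attached to the behavior.
  const cookies = getSignedCookies({
    url: "https://dxxxxxxxxxxxxx.cloudfront.net/private/report.pdf",
    keyPairId: "KXXXXXXXXXXXXX",
    privateKey: process.env.CF_SIGNING_KEY ?? "", // PEM-encoded private key
    dateLessThan: new Date(Date.now() + 60 * 60 * 1000).toISOString(), // valid ~1 hour
  });

  // Set these CloudFront-* cookies once on the user's session; the browser then
  // sends them automatically with every request to the distribution.
  console.log(cookies);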

viraptor 13 hours ago

IAM gets only a tiny mention as not present, therefore making R2 simpler. But also... IAM is missing and a lot of interesting use cases are not possible there. No access by path, no 2FA enforcement, no easy SSO management, no blast radius limits - just: would you like a token which can write a file, but also delete everything? This is also annoying for their zone management for the same reason.

cube2222 15 hours ago

R2 and its pricing is quite fantastic.

We’re using it to power the OpenTofu Provider&Modules Registry[0][1] and it’s honestly been nothing but a great experience overall.

[0]: https://registry.opentofu.org

[1]: https://github.com/opentofu/registry

Disclaimer: CloudFlare did sponsor us with their Business plan, so we got access to higher-tier functionality

orf 14 hours ago

My experience: I put parquet files on R2, but HTTP Range requests were failing. 50% of the time it would work, and 50% of the time it would return all of the content and not the subset requested. That’s a nightmare to debug, given that software expects it to work consistently or not work at all.

Seems like a bug. Had to crawl through documentation to find out the only support is on Discord (??), so I had to sign up.

Go through some more hoops and eventually get to a channel where I received a prompt reply: it’s not an R2 issue, it’s "expected behaviour" due to an issue with "the CDN service".

I mean, sure. On a technical level. But I shoved some data into your service and basic standard HTTP semantics were intermittently not respected: that’s a bug in your service, even if the root cause is another team.

None of this is documented anywhere, even if it is “expected”. Searching for [1] “r2 http range” shows I’m not the only one surprised

Not impressed, especially as R2 seems ideal for serving Parquet data for small projects. This and the janky UI plus weird restrictions makes the entire product feel distinctly half finished and not a serious competitor.

1. https://www.google.com/search?q=r2+http+range&ie=UTF-8&oe=UT...
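
A quick way to check for the behaviour described above is to send a Range request yourself and look at the status code; the public bucket URL below is a placeholder.

  // Ask for the first 10 KiB and check whether the Range header was honoured.
  const res = await fetch("https://pub-example.r2.dev/data.parquet", {
    headers: { Range: "bytes=0-10239" },
  });

  if (res.status === 206) {
    console.log("partial content:", res.headers.get("Content-Range"));
  } else if (res.status === 200) {
    // Full object returned; a Parquet reader expecting a small footer slice
    // has little choice but to abort here.
    console.warn("range ignored, got", res.headers.get("Content-Length"), "bytes");
  }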

saurik 12 hours ago

> given that software expects it to work consistently or not work at all

I mean... that's wrong? If you come across such software, do you at least file a bug?

orf 12 hours ago

Of course not, and it’s completely correct behaviour: if a server advertises it supports Range requests for a given URL, it’s expected to support it. Garbage in, garbage out.

It’s not clear how you’d expect to handle a webserver trying to send you 1Gb of data after you asked for a specific 10kb range other than aborting.

saurik 11 hours ago

"Conversely, a client MUST NOT assume that receiving an Accept-Ranges field means that future range requests will return partial responses. The content might change, the server might only support range requests at certain times or under certain conditions, or a different intermediary might process the next request." -- RFC 9110

orf 11 hours ago

Sure, but that’s utterly useless in practice because there is no way to handle that gracefully.

To be clear: most software does handle it, because it detects this case and aborts.

But to a user who is explicitly asking to read a parquet file without buffering the entire file into memory, there is no distinction between a server that cannot handle any range requests and a server that can occasionally handle range requests.

Other than one being much, much more annoying.

jzelinskie 16 hours ago

This is a great comparison and a great step towards pressure to improve cloud service pricing.

The magic that moves the region sounds like a dealbreaker for any use cases that aren't public, internet-facing. I use $CLOUD_PROVIDER because I can be in the same regions as customers and know the latency will (for the most part) remain consistent. Has anyone measured latencies from R2 -> AWS/GCP/Azure regions similar to this[0]?

Also, does anyone know if R2 supports the CAS operations that so many people are hyped about right now?

[0]: https://www.cloudping.co/grid

xhkkffbf 16 hours ago

This really is a good article. My only issue is that it pretends that the only competition is between Cloudflare and AWS. There are several other low rent storage providers that offer an S3 compatible API. It's also worth looking at Backblaze and Wasabi, for instance. But I don't want to take anything away from this article.

denysvitali 10 hours ago

No mention of Backblaze's B2? It's cheaper than these two at just $6/TB

russelg 8 hours ago

It's also dogshit slow, and only available in 2 regions. At least you can choose the region, unlike R2...

bassp 15 hours ago

Only tangentially related to the article, but I’ve never understood how R2 offers 11 9s of durability. I trust that S3 offers 11 9s because Amazon has shown, publicly, that they care a ton about designing reliable, fault tolerant, correct systems (eg Shardstore and Shuttle)

Cloudflare’s documentation just says “we offer 11 9s, same as S3”, and that’s that. It’s not that I don’t believe them but… how can a smaller organization make the same guarantees?

It implies to me that either Amazon is wasting a ton of money on their reliability work (possible) or that cloudflare’s 11 9s guarantee comes with some asterisks.

rat9988 15 hours ago

What makes you think it cost AWS so much money at their scale to achieve 11 9s that Cloudflare cannot afford it?

bassp 15 hours ago

Minimally, the two examples I cited: Shardstore and Shuttle. The former is a (lightweight) formally verified key value store used by S3, and the latter is a model checker for concurrent rust code.

Amazon has an entire automated reasoning group (researchers who mostly work on formal methods) working specifically on S3.

As far as I’m aware, nobody at cloudflare is doing similar work for R2. If they are, they’re certainly not publishing!

Money might not be the bottleneck for cloudflare though, you’re totally right

zild3d 14 hours ago

S3 has touted 11 9's for many years, so before shardstore definitely.

The 11 9's is for durability, which is really more about the redundancy setup, erasure coding, etc. (https://cloud.google.com/blog/products/storage-data-transfer...)

fwiw availability is 4 9's (https://aws.amazon.com/s3/storage-classes/)

bassp 14 hours ago

That’s a good point!

I think I overstated the case a little, I definitely don’t think automated reasoning is some “secret reliability sauce” that nobody else can replicate; it does give me more confidence that Amazon takes reliability very seriously, and is less likely to ship a terrible bug that messes up my data.

suryao 16 hours ago

Great article. Do you have throughput comparisons? I've found r2 to be highly variable in throughput, especially with concurrent downloads. s3 feels very consistent, but I haven't measured the difference.

pier25 12 hours ago

I'm also interested in upload speeds.

I've seen complaints of users about R2 having erratic upload speeds.

postatic 12 hours ago

I do mostly CRUD apps with Laravel and Vue. Nothing too complicated. Allows users to post stuff with images and files. I’ve moved ALL of my files from S3 to R2 in the past 2 years. It’s been slow as any migrations are but painless.

But most importantly for an indie dev like me the cost became $0.

karmakaze 11 hours ago

At one company we were uploading videos to S3 and finding a lot of errors or stalls in the process. That led to evaluating GCP and Azure. I found that Azure had the most consistent (least variance) in upload durations and better pricing. We ended up using GCP for other reasons like resumable uploads (IIRC). AWS now supports appending to S3 objects which might have worked to avoid upload stalls. CloudFront for us at the time was overpriced.

breckognize 16 hours ago

To measure performance the author looked at latency, but most S3 workloads are throughput oriented. The magic of S3 is that it's cheap because it's built on spinning HDDs, which are slow and unreliable individually, but when you have millions of them, you can mask the tail and deliver multi TBs/sec of throughput.

It's misleading to look at S3 as a CDN. It's fine for that, but its real strength is backing the world's data lakes and cloud data warehouses. Those workloads have a lot of data that's often cold, but S3 can deliver massive throughput when you need it. R2 can't do that, and as far as I can tell, isn't trying to.

Source: I used to work on S3

JoshTriplett 16 hours ago

Yeah, I'd be interested in the bandwidth as well. Can R2 saturate 10/25/50 gigabit links? Can it do so with single requests, or if not, how many parallel requests does that require?

JoshTriplett 15 hours ago

That's unrelated to the performance of (for instance) the R2 storage layer. All the bandwidth in the world won't help you if you're blocked on storage. It isn't clear whether the overall performance of R2 is capable of saturating user bandwidth, or whether it'll be blocked on something.

S3 can't saturate user bandwidth unless you make many parallel requests. I'd be (pleasantly) surprised if R2 can.
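
The "many parallel requests" trick usually looks something like this sketch: split the object into byte ranges and fetch them concurrently (the URL is a placeholder and the object size would come from a prior HEAD request).

  async function parallelGet(url: string, size: number, parts = 16): Promise<Uint8Array> {
    const chunk = Math.ceil(size / parts);
    const starts = Array.from({ length: parts }, (_, i) => i * chunk).filter((s) => s < size);
    const pieces = await Promise.all(
      starts.map(async (start, i) => {
        const end = Math.min(size - 1, start + chunk - 1);
        const res = await fetch(url, { headers: { Range: `bytes=${start}-${end}` } });
        if (res.status !== 206) throw new Error(`range not honoured for part ${i}`);
        return new Uint8Array(await res.arrayBuffer());
      })
    );
    // Reassemble the ranges in order.
    const out = new Uint8Array(size);
    pieces.forEach((p, i) => out.set(p, starts[i]));
    return out;
  }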

moralestapia 15 hours ago

I'm confused, I assumed we were talking about the network layer.

If we are talking about storage, well, SATA can't give you more than ~5Gbps so I guess the answer is no? But also no one else can do it, unless they're using super exotic HDD tech (hint: they're not, it's actually the opposite).

What a weird thing to argue about, btw, literally everybody is running a network layer on top of storage that lets you have much higher throughput. When one talks about R2/S3 throughput no one (in my circle, ofc.) would think we are referring to the speed of their HDDs, lmao. But it's nice to see this, it's always amusing to stumble upon people with a wildly different point of view on things.

JoshTriplett 14 hours ago

We're talking about the user-visible behavior. You argued that because Cloudflare's CDN has an obscene amount of bandwidth, R2 will be able to saturate user bandwidth; that doesn't follow, hence my counterpoint that it could be bottlenecked on storage rather than network. The question at hand is what performance R2 offers, and that hasn't been answered.

There are any number of ways they could implement R2 that would allow it to run at full wire speed, but S3 doesn't run at full wire speed by default (unless you make many parallel requests) and I'd be surprised if R2 does.

aipatselarom 13 hours ago

n = 1 aside.

I have some large files stored in R2 and a 50Gbps interface to the world.

curl to Linode's speed test is ~200MB/sec.

curl to R2 is also ~200MB/sec.

I'm only getting 1Gbps but given that Linode's speed is pretty much the same I would think the bottleneck is somewhere else. Dually, R2 gives you at least 1Gbps.

renewiltord 14 hours ago

No, most people aren’t interested in subcomponent performance, just in total performance. A trivial example is that even a 4-striped U.2 NVMe disk array exported over Ethernet can deliver a lot more data than 5 Gbps and store mucho TiB.

moralestapia 14 hours ago

Thanks for +1 what I just said. So, apparently, it's not just me and my peers who think like this.

fragmede 16 hours ago

Cloudflare's paid DDoS protection product being able to soak up insane L3/4 DDoS attacks doesn't answer the question as to whether or not the specific product, R2 from Cloudflare which has free egress is able to saturate a pipe.

Cloudflare has the network to do that, but they charge money to do so with their other offerings, so why would they give that to you for free? R2 is not a CDN.

moralestapia 16 hours ago

>Can do 3.8 Tbps

>Can't do 10 Gbps

k

fragmede 15 hours ago

> can't read CDN

> Can't read R2

k

bananapub 16 hours ago

that's completely unrelated. the way to soak up a ddos at scale is just "have lots of peering and a fucking massive amount of ingress".

neither of these tell you how fast you can serve static data.

moralestapia 15 hours ago

>that's completely unrelated

Yeah, I'm sure they use a completely different network infrastructure to serve R2 requests.

vtuulos 15 hours ago

yes, this. In case you are interested in seeing some numbers backing this claim, see here https://outerbounds.com/blog/metaflow-fast-data

Source: I used to work at Netflix, building systems that pull TBs from S3 hourly

michaelt 16 hours ago

I mean, it may be true in practice that most S3 workloads are throughput oriented and unconcerned with latency.

But if you look at https://aws.amazon.com/s3/ it says things like:

"Object storage built to retrieve any amount of data from anywhere"

"any amount of data for virtually any use case"

"S3 delivers the resiliency, flexibility, latency, and throughput, to ensure storage never limits performance"

So if S3 is not intended for low-latency applications, the marketing team haven't gotten the message :)

troyvit 15 hours ago

lol I think the only reason you're being downvoted is because the common belief at HN is, "of course marketing is lying and/or doesn't know what they're talking about."

Personally I think you have a point.

mikeshi42 15 hours ago

I didn’t downvote, but S3 does have low-latency offerings (Express), which has reasonable latency compared to EFS IIRC. I’d be shocked if it was as popular as the other higher-latency S3 tiers though.

dxxvi 5 hours ago

I have 250GB on S3. To get them out and store in R2, AWS will charge me 9 cents/GB * 250 GB ~ $22.50: ouch.

mythz 2 hours ago

Thanks to the EU AWS now offers "Free data transfer out to internet when moving out of AWS" [1]

[1] https://aws.amazon.com/blogs/aws/free-data-transfer-out-to-i...

JOnAgain 16 hours ago

I _love_ articles like this. Hacker News peeps, please make more!

vlovich123 16 hours ago

Very good article and interesting read. I did want to clarify some misconceptions I noted while reading (working from memory so hopefully I don’t get anything wrong myself).

> As explained here, Durable Objects are single threaded and thus limited by nature in the throughput they can offer.

R2 bucket operations do not use single-threaded Durable Objects; a one-off change was made just for R2 to let it run multiple instances. That’s why the limits were lifted in the open beta.

> they mentioned that each zone's assets are sharded across multiple R2 buckets to distribute load which may indicated that a single R2 bucket was not able to handle the load for user-facing traffic. Things may have improve since thought.

I would not use this as general advice. Cache Reserve was architected to serve an absurd amount of traffic that almost no customer or application will see. If you’re having that much traffic I’d expect you to be an ENT customer working with their solutions engineers to design your application.

> First, R2 is not 100% compatible with the S3 API. One notable missing feature are data-integrity checks with SHA256 checksums.

This doesn’t sound right. I distinctly remember when this was implemented for uploading objects. SHA-1 and SHA-256 should be supported (don’t remember about CRC). For some reason it’s missing from the docs though. The trailer version isn’t supported and likely won’t be for a while for technical reasons (the Workers platform doesn’t support HTTP trailers as it uses HTTP/1 internally). Overall compatibility should be pretty decent.

The section on “The problem with cross-datacenter traffic” seems to rest on flawed assumptions rather than being data driven. Their own graphs only show that, while public buckets have some occasional weird spikes, performance is pretty constant, whereas the S3 API has more spikiness, and the time-of-day variability is much more muted than the CPU variability. Same with the assumption on bandwidth or other limitations of data centers. The more likely explanation would be the S3 auth layer, and the time-of-day variability experienced matches more closely with how that layer works. I don’t know enough of the particulars of this author’s zones to hypothesize, but the S3 auth layer was always challenging from a perf perspective.

> This is really, really, really annoying. For example you know that all your compute instances are in Paris, and you know that Cloudflare has a big datacenter in Paris, so you want your bucket to be in Paris, but you can't. If you are unlucky when creating your bucket, it will be placed in Warsaw or some other place far away and you will have huge latencies for every request.

I understand the frustration but there are very good technical and UX reasons this wasn’t done. For example while you may think that “Paris datacenter” is well defined, it isn’t for R2 because unlike S3 your metadata is stored regionally across multiple data centers whereas S3 if I recall correctly uses what they call a region which is a single location broken up into multiple availability zones which are basically isolated power and connectivity domains. This is an availability tradeoff - us-east-1 will never go offline on Cloudflare because it just doesn’t exist - the location hint is the size of the availability region. This is done at both the metadata and storage layers too. The location hint should definitely be followed when you create the bucket but maybe there are bugs or other issues.

As others noted throughput data would also have been interesting.

tecleandor 15 hours ago

> First, R2 is not 100% compatible with the S3 API. One notable missing feature are data-integrity checks with SHA256 checksums.

Maybe it was an old thing? The changelog [0] for 2023-06-16 says:

"S3 putObject now supports sha256 and sha1 checksums."

  [0]: https://developers.cloudflare.com/r2/platform/changelog/#2023-06-16
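
If that changelog entry is accurate, a checksummed upload against R2's S3 endpoint would look roughly like this sketch; the account ID, credentials and bucket are placeholders, and ChecksumSHA256 is the base64-encoded digest the server verifies.

  import { createHash } from "node:crypto";
  import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

  // R2's S3 endpoint is https://<ACCOUNT_ID>.r2.cloudflarestorage.com with region "auto".
  const s3 = new S3Client({
    region: "auto",
    endpoint: "https://ACCOUNT_ID.r2.cloudflarestorage.com",
    credentials: { accessKeyId: "KEY_ID", secretAccessKey: "SECRET" },
  });

  const body = Buffer.from("hello world");
  await s3.send(
    new PutObjectCommand({
      Bucket: "my-bucket",
      Key: "hello.txt",
      Body: body,
      // Client-computed SHA-256, base64-encoded; the upload fails if it doesn't match.
      ChecksumSHA256: createHash("sha256").update(body).digest("base64"),
    })
  );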

vlovich123 13 hours ago

I suspect the author is going by the documentation rather than having tried themselves

snihalani 12 hours ago

>Generally, R2's user experience is way better and simpler than S3. As always with AWS, you need 5 certifications and 3 months to securely deploy a bucket.

+1

pier25 12 hours ago

> you can't choose the location of your R2 bucket!

Yeah this is really annoying. That and replication to multiple regions is the reason we're not using R2.

Global replication was a feature announced in 2021 but still hasn't happened:

> R2 will replicate data across multiple regions and support jurisdictional restrictions, giving businesses the ability to control where their data is stored to meet their local and global needs.

https://www.cloudflare.com/press-releases/2021/cloudflare-an...

deanCommie 15 hours ago

The innovator's dilemma is really interesting.

Whenever a new challenger gets on the scene offering the same thing as some entrenched leader only better, faster, and cheaper, the standard response is "Yeah but it's less reliable. This may be fine for startups but if you're <enterprise|government|military|medical|etc>, you gotta stick with the tried, tested and true <leader>"

You see this in almost every discussion of Cloudflare, which seems to be rapidly rebuilding a full cloud, in direct competition with AWS specifically. (I guess it wants to be evaluated as a fellow leader, not an also-ran like GCP/Azure fighting for 2nd place)

The thing is, all the points are right. Cloudflare IS different - by using exclusively edge networks and tying everything to CDNs, it's both a strength and a weakness. There's dozens of reasons to be critical of them and dozens more to explain why you'd trust AWS more.

But I can't help but wonder that surely the same happened (i wasn't on here, or really tech-aware enough) when S3 and EC2 came on the scene. I'm sure everyone said it was unreliable, uncertain, and had dozens of reasons why people should stick with (I can only presume - VMWare, IBM, Oracle, etc?)

This is all a shallow observation though.

Here's my real question, though. How does one go deeper and evaluate what is real disruption and what is fluff. Does Cloudflare have something that's unique and different that demonstrates a new world for cloud services I can't even imagine right now, as AWS did before it. Or does AWS have a durable advantage and benefits that will allow it to keep being #1 indefinitely? (GCP and Azure, as I see it, are trying to compete on specific slices of merit. GCP is all-in on 'portability', that's why they came up with Kubernetes to devalue the idea of any one public cloud, and make workloads cross-platform across all clouds and on-prem. Azure seems to be competitive because of Microsoft's otherwise vertical integration with business/windows/office, and now AI services).

Cloudflare is the only one that seems to show up over and over again and say "hey you know that thing that you think is the best cloud service? We made it cheaper, faster, and with nicer developer experience." That feels really hard to ignore. But also seems really easy to market only-semi-honestly by hand-waving past the hard stuff at scale.

everfrustrated 14 hours ago

Cloudflare's architecture is driven purely by their history of being a CDN and trying to find new product lines to generate new revenue streams to keep the share price up.

You wouldn't build a cloud from scratch in this way.

youngtaff 13 hours ago

Maybe Cloudflare will even be profitable in the next year or two…

theryanteodoro 16 hours ago

love a good comparison!