Item 42231620

blibble • 4 days ago

7-10us for what is essentially a hashtable set/get is really, really bad

I can get a packet out to a switch and back to another machine in 1-2us

gtirloni • 4 days ago

Do you mean 1-2ms?

1 reply

eqvinox • 4 days ago

No, 1-2us is correct for that — in a datacenter, with cut-through switching.

2 replies

gtirloni • 4 days ago

That's really impressive. I need to update myself on this topic. Thanks.

1 reply

mickg10 • 4 days ago

In reality, with decent switches at 25G and no FEC, node to node is reliably under 300ns (0.3us)

2 replies

znyboy • 3 days ago

Considering that 300 light-nanoseconds is about 90m, getting a response (or even just one-way) in that time is essentially running right at the limits of physics/causality.
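To put a number on that, here is a quick back-of-the-envelope check of how far a signal gets in 300ns. The 0.67 velocity factor for fibre is an assumed ballpark, not a figure from this thread; real cables vary.

```python
# Distance a signal covers in a given time, in vacuum and in a typical cable.
C_VACUUM_M_PER_S = 299_792_458  # exact, by definition of the metre


def light_distance_m(seconds: float, velocity_factor: float = 1.0) -> float:
    """Metres travelled in `seconds` at `velocity_factor` times c."""
    return C_VACUUM_M_PER_S * velocity_factor * seconds


vacuum_m = light_distance_m(300e-9)
# Assumption: signals in fibre propagate at roughly 0.67c.
fibre_m = light_distance_m(300e-9, velocity_factor=0.67)

print(f"{vacuum_m:.1f} m in vacuum, {fibre_m:.1f} m at 0.67c")
```

So the 90m figure is the vacuum case; under the assumed velocity factor the cable budget for 300ns is closer to 60m, before counting any time spent in NICs or switches.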

davekeck • 4 days ago

Out of curiosity, how is that measured across machines?

(The first thing that comes to my mind would be to use an oscilloscope with two probes, one to each machine, but I’m guessing that’s not it.)

1 reply

toast0 • 3 days ago

Measure the round trip and divide by two for the approximate one-way time. It'd be really neat to measure the time it takes for a packet to travel in one direction, but it's somewhere between hard and impossible[1]; a very short path has less room to be asymmetric though.

[1] If the clocks are synchronized, you can measure send time on one end, and receive time on the other. But synchronizing clocks involves estimating the time it takes for signals to pass in each direction, typically assuming each direction takes half the round trip.
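A minimal sketch of why the asymmetry is invisible, using the usual four-timestamp exchange (the same arithmetic NTP uses). The `simulate` helper and all the delay values are made up for illustration; times are integer nanoseconds.

```python
def estimate(t0, t1, t2, t3):
    """Four-timestamp estimate.

    t0: client send (client clock)   t1: server receive (server clock)
    t2: server send (server clock)   t3: client receive (client clock)
    Returns (round_trip, estimated_clock_offset, estimated_one_way).
    """
    rtt = (t3 - t0) - (t2 - t1)           # time on the wire, both directions
    offset = ((t1 - t0) + (t2 - t3)) / 2  # assumes both directions are equal
    return rtt, offset, rtt / 2


def simulate(true_offset_ns, fwd_ns, back_ns, server_proc_ns=100):
    """Hypothetical exchange with known (normally unknowable) path delays."""
    t0 = 1_000
    t1 = t0 + fwd_ns + true_offset_ns      # read on the server's clock
    t2 = t1 + server_proc_ns
    t3 = (t2 - true_offset_ns) + back_ns   # read on the client's clock
    return estimate(t0, t1, t2, t3)


# Clocks actually agree (offset 0) in both runs; only the path split differs.
print(simulate(0, 400, 400))  # symmetric path
print(simulate(0, 300, 500))  # asymmetric path, same round trip
```

Both runs report the same round trip and the same one-way estimate. The asymmetric one additionally reports a clock offset of half the difference between the two directions, even though the clocks agree, and nothing in the four timestamps lets you tell the two cases apart.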

1 reply

pkhuong • 3 days ago

You can use something like White Rabbit (https://en.wikipedia.org/wiki/White_Rabbit_Project) to keep clocks in sync. That still involves estimates, but a dedicated time sync network can do things like make sure all the cables are the same length.

1 reply

namibj • 1 day ago

Copper White Rabbit is special: it uses the same wire in both directions (1000BASE-T PHY with added carrier phase lock to and from outside clocks).

jiggawatts • 3 days ago

Meanwhile the best network I’ve ever benchmarked was AWS and measured about 55µs for a round trip!

What on earth are you using that gets you down to single digits!?

4 replies

Galanwe • 3 days ago

> the best network I’ve ever benchmarked was AWS and measured about 55µs for a round trip

What is "a network" here?

Few infrastructures are optimised for latency; most are geared toward providing high throughput instead.

In fact, apart from HFT, I don't think most businesses are all that latency sensitive. Most infrastructure providers will give you SLAs of high single-digit or low double-digit microseconds from Mahwah/Carteret to NY4, but these are private/dedicated links. There's little point to optimising latency when your network ends up on the internet where the smallest hops are milliseconds away.

1 reply

jiggawatts • 3 days ago

> There's little point to optimising latency when your network ends up on the internet where the smallest hops are milliseconds away.

That's just plain wrong. Lower latency always improves everything. Not just responsiveness, but also bandwidth! Because of TCP slow-start and congestion control algorithms, lower latency directly results in higher throughput.
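As a rough illustration of the latency/throughput link: a sender that can have at most one window of data in flight per round trip is capped at window / RTT. The sketch below assumes a fixed 64 KiB window (no window scaling) and ignores loss and slow-start ramp-up, so it is an upper bound, not a prediction.

```python
def max_throughput_gbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput when one window is in flight per RTT."""
    return window_bytes * 8 / rtt_seconds / 1e9


WINDOW = 64 * 1024  # assumption: 64 KiB window, i.e. no window scaling

for label, rtt in [("55 us", 55e-6), ("1 ms", 1e-3), ("50 ms", 50e-3)]:
    bound = max_throughput_gbps(WINDOW, rtt)
    print(f"RTT {label:>6}: at most {bound:.3f} Gb/s")
```

With the window held constant, the bound scales inversely with the round trip, which is why the same connection settings go much further inside a data centre than across the internet.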

Not to mention that these latencies add up, which is especially important with chatty microservices applications. Don't forget that typical TCP+HTTPS connections require something like 5 round trips, and that's assuming that the DNS record is already cached! Add in firewalls, load balancers, proxies, side-cars, ingress, and who knows what else, suddenly you're staring down the barrel of 15 millisecond latencies before the data can exit the data centre.

The threshold for "instant" response is 16.7 ms end-to-end, including refreshing the HTML DOM and painting pixels to the screen.

Google and AWS know this, which is why their data centre networks have ~50µs latencies, some of the best in the industry.

Everyone else: "Nah, don't bother!"

1 reply

Galanwe • 2 days ago

I think you're getting pissed off at a strawman. Everyone obviously _cares_ about latency. All things being equal, better latency always makes things better, there is no arguing with that.

Yet, that doesn't mean latency is in the same priority spot on everyone's list. If you're using TCP on the internet, you have already put latency far down in your concerns. That doesn't make you _not want_ better latency, but that does make it a _nice to have_.

There's no obvious shortcut to latency that doesn't involve either losing reliability (not requiring ordered messages, not re-requesting dropped messages), or losing throughput (not assembling small messages into bigger ones), or limiting yourself to private links.

If you do all the above (as in TCP over the internet), then you've made no sacrifice for latency over throughput and resiliency, which to me makes latency a nice to have, but certainly not a primary concern.

dahfizz • 3 days ago

The key is that blibble is talking about switches. Modern switches can process packets at line rate.

If you're working in AWS, you almost certainly are hitting a router, which is comparatively slower. Not to mention you are dealing with virtualized hardware, and you are probably sharing all the switches & routers along your path (if someone else's packet is ahead of yours in the queue, you have to wait).

crest • 3 days ago

I assume 1-3 hops of modern switches without congestion. Given 100Gb/s lanes these numbers are possible if you get all the bottlenecks out of the way. The moment you hit a deep queue the latency explodes.

1 reply

jiggawatts • 3 days ago

So, are you talking about theoretical latencies here based on bandwidths and cable lengths, or actual measured latencies end-to-end between hosts?

I know that "in principle" the physics of the cabling allows single digit microseconds, but I've never seen it anywhere near that low even with cross-over cables with zero switches in-path!

1 reply

eqvinox • 3 days ago

You need high-bandwidth links (time to get the entire packet across starts to matter), to run on bare metal (or have very well working HW virtualisation support), and to tune NIC parameters and OS processing appropriately. But it's practically achievable.
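For a sense of the "time to get the entire packet across" part, here is the serialization delay of one frame at a few link speeds. The 1500-byte frame size and the list of speeds are illustrative choices; preamble and inter-frame gap are ignored.

```python
def serialization_delay_us(frame_bytes: int, link_gbps: float) -> float:
    """Microseconds needed to clock `frame_bytes` onto a `link_gbps` link."""
    bits = frame_bytes * 8
    bits_per_us = link_gbps * 1e3  # 1 Gb/s is 1000 bits per microsecond
    return bits / bits_per_us


FRAME_BYTES = 1500  # assumption: a full-size Ethernet payload

for gbps in (1, 10, 25, 100):
    delay = serialization_delay_us(FRAME_BYTES, gbps)
    print(f"{gbps:>3} Gb/s: {delay:.2f} us per {FRAME_BYTES}-byte frame")
```

At 1 Gb/s that single frame already costs more than the whole 1-2us budget discussed above, while at 25 Gb/s and up it drops well under a microsecond. A store-and-forward hop pays this cost again before forwarding, which is the part cut-through switching largely avoids.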

Switches in these scenarios (e.g. DC-targeted 25GE) are pretty predictable and add <1μs (unless misconfigured).

1 reply

jiggawatts • 3 days ago

> But it's practically achievable.

I've never seen this in practice. Maaaaybe with InfiniBand and custom-written apps that use a proprietary SDK.

I'd love to see references to actual benchmarks.

blibble • 3 days ago

that's because cloud networks are complete shit

this is xilinx/mellanox cards with kernel bypass and cut-through switches with busy-waiting

in reality, in a prod system

1 reply

jiggawatts • 2 days ago

Both Azure and AWS have kernel-bypass, and they use 100 to 200 Gbps NICs that are either bespoke silicon or have onboard FPGAs for offloading various things such as encryption and packet header rewrites.

I wouldn't rate them as "complete shit".