You need high-bandwidth links (at that point the time to serialize the entire packet onto the wire starts to matter), you need to run on bare metal (or have very good hardware virtualisation support), and you need to tune NIC parameters and OS packet processing appropriately. But it's practically achievable.
Switches in these scenarios (e.g. ones targeted at 25GbE data centres) are pretty predictable and add <1μs of latency (unless misconfigured).
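To put rough numbers on the serialization point: the time to clock one frame onto the wire is just its size in bits divided by the link rate. A minimal sketch, assuming a 1500-byte frame and ignoring preamble, inter-frame gap, propagation delay and switch latency:

```python
def serialization_delay_us(frame_bytes: int, link_gbps: float) -> float:
    """Time to clock one frame onto the wire, in microseconds.

    1 Gbit/s is 1,000 bits per microsecond, so bits / (Gbit/s * 1000) gives us.
    """
    bits = frame_bytes * 8
    return bits / (link_gbps * 1000)


if __name__ == "__main__":
    # Assumed frame size: 1500 bytes, ignoring preamble and inter-frame gap.
    for gbps in (1, 10, 25, 100):
        print(f"{gbps:>3} Gbit/s: {serialization_delay_us(1500, gbps):.2f} us")
```

At 1 Gbit/s that is 12 μs per frame per store-and-forward hop; at 25 Gbit/s it drops to 0.48 μs, the same order as the <1μs switch figure above.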
> But it's practically achievable.
I've never seen this in practice. Maaaaybe with InfiniBand and custom-written apps that use a proprietary SDK.
I'd love to see references to actual benchmarks.