ryao 3 days ago

On Zen 3, I am able to use nearly the full 51.2GB/sec from a single CPU core. I have not tried using two as I got so close to 51.2GB/sec that I had assumed that going higher was not possible. Off the top of my head, I got 49-50GB/sec, but I last measured a couple years ago.

By the way, if the cores were able to load things at full speed, they would be able to use 640GB/sec each. That is 2 AVX-512 loads per cycle at 5GHz. Of course, they never are able to do this due to memory bottlenecks. Maybe Intel’s Xeon Max series with HBM can, but I would not be surprised to see an unadvertised internal bottleneck there too. That said, it is so expensive and rare that few people will ever run code on one.

1
buildbot 3 days ago

People have studied the Xeon Max! Spoiler - yes, it's limited to ~23GB/s per core. It can't achieve anywhere close to the theoretical bandwidth of the HBM even, with all cores active. It's a pretty bad design in my opinion.

https://www.ixpug.org/images/docs/ISC23/McCalpin_SPR_BW_limi...

electricshampo1 3 days ago

It is integer factors better overall total BW than ddr5 spr; I think they went for minimal investment + time to market for the spr w/ hbm product rather than heavy investment to hit full bw utilization. Which may have made sense for intel overall given business context etc