Item 42239911

jmb99 • 3 days ago

> The cited system has 2x89.6 GB/s bandwidth.

The following applies for certain only to the Zen4 system; I have no experience with Zen5.

That is the theoretical max bandwidth of the DDR5 memory (/controller) running at 5600 MT/s (roughly: 5600 MT/s ÷ 2 transfers/clock × 32 bytes/clock = 89.6GB/s). There is also a bandwidth limitation between the memory controller (IO die) and the cores themselves (CCDs), along the Infinity Fabric. Infinity Fabric runs at a different clock speed than the cores, their cache(s), and the memory controller; by default, 2/3 of the memory controller's clock. So, if the Memory controller's CLocK (MCLK) is 2800MHz (for 5600MT/s), the FCLK (infinity Fabric CLocK) will run at 1866.66MHz. With 32 bytes per clock of read bandwidth, you get 59.7GB/s maximum sequential memory read bandwidth per CCD<->IOD interconnect.
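To make that arithmetic concrete, here is a rough sketch of both calculations. The function names are mine, and the 16 bytes per transfer figure assumes a dual-channel (128-bit total) DDR5 setup; the 32 bytes per FCLK cycle and the 2/3 ratio are the figures quoted above.

```python
def dram_peak_gbs(mt_per_s: float, bus_bytes: int = 16) -> float:
    """Theoretical DRAM bandwidth in GB/s.

    bus_bytes=16 is an assumption: two 64-bit DDR5 channels (128 bits total).
    """
    return mt_per_s * 1e6 * bus_bytes / 1e9


def fabric_read_gbs(fclk_mhz: float, bytes_per_clock: int = 32) -> float:
    """Theoretical per-CCD read bandwidth over the CCD<->IOD link in GB/s."""
    return fclk_mhz * 1e6 * bytes_per_clock / 1e9


mt_per_s = 5600
mclk_mhz = mt_per_s / 2          # DDR: two transfers per memory clock
fclk_mhz = mclk_mhz * 2 / 3      # default FCLK:MCLK ratio described above

dram_gbs = dram_peak_gbs(mt_per_s)
ccd_gbs = fabric_read_gbs(fclk_mhz)

print(f"MCLK {mclk_mhz:.0f} MHz, FCLK {fclk_mhz:.2f} MHz")
print(f"DRAM peak: {dram_gbs:.1f} GB/s, single-CCD read peak: {ccd_gbs:.1f} GB/s")
```

With these inputs the single-CCD fabric limit comes out well below the DRAM limit, which is the whole point of the paragraph above.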

Many systems (read: motherboard manufacturers) will overclock the FCLK when applying automatic overclocking (such as when selecting XMP/EXPO profiles; I believe some EXPO profiles include overclocking the FCLK as well). (Note that 5600MT/s RAM is overclocked; the fastest officially supported Zen4 memory speed is 5200MT/s, and most memory kits are 3600MT/s or less until overclocked with their built-in profiles.) In my experience, Zen4 will happily accept FCLK up to 2000MHz, while Zen4 Threadripper (7000 series) seems happy up to 2200MHz. This particular system has the FCLK overclocked to 2000MHz, which will hurt latency[0] (due to not being 2/3 of MCLK) but increase bandwidth. 2000MHz × 32 bytes/cycle = 64GB/s read bandwidth, as quoted in the article.
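A quick way to see the tradeoff is to tabulate a few FCLK settings against a 2800MHz MCLK. This is only a sketch: the helper name and the list of clocks are my own choices, and it only reports the ratio, since how much latency an off-ratio FCLK actually costs is not something a formula like this can tell you.

```python
BYTES_PER_FCLK_READ = 32  # per-CCD read width per FCLK cycle, as quoted above
MCLK_MHZ = 2800           # memory clock for 5600MT/s


def per_ccd_read_gbs(fclk_mhz: float) -> float:
    """Theoretical per-CCD read bandwidth in GB/s for a given FCLK in MHz."""
    return fclk_mhz * BYTES_PER_FCLK_READ / 1000


# Default 2/3 ratio, the 2000MHz overclock, and the 2200MHz Threadripper setting
fclk_settings = [MCLK_MHZ * 2 / 3, 2000.0, 2200.0]

rows = []
for fclk in fclk_settings:
    ratio = fclk / MCLK_MHZ
    on_default_ratio = abs(ratio - 2 / 3) < 1e-9
    rows.append((fclk, ratio, on_default_ratio, per_ccd_read_gbs(fclk)))

for fclk, ratio, on_default_ratio, gbs in rows:
    note = "default 2/3 ratio" if on_default_ratio else "off-ratio"
    print(f"FCLK {fclk:7.2f} MHz  FCLK/MCLK {ratio:.3f} ({note})  {gbs:5.1f} GB/s")
```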

First: these are theoretical maximums. Even the most "perfect" benchmark won't hit these, and if one does, there are other variables at play not being taken into account (likely lower level caches). You will never, ever see theoretical maximum memory bandwidth in any real application.

Second: no, it is not possible to see maximum memory bandwidth on Zen4 from only one CCD, assuming you have sufficiently fast DDR5 that the FCLK cannot be equal to the MCLK. This is an architecture limitation, although rarely hit in practice for most of the target market. A dual-CCD chip has sufficient memory bandwidth to saturate the memory before the Infinity Fabric (but as alluded to in the article, unless tuned incredibly well, you'll likely run into contention issues and either hit a latency or bandwidth wall in real applications). My quad-CCD Threadripper can achieve nearly 300GB/s, due to having 8 (technically 16) DDR5 channels operating at 5800MT/s and FCLK at 2200MHz; I would need an octo-CCD chip to achieve maximum memory bandwidth utilization.
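The bottleneck argument boils down to taking the smaller of two ceilings. A minimal sketch, assuming 8 bytes per transfer per 64-bit DDR5 channel and that fabric read bandwidth scales linearly with CCD count (contention, writes, and real-world efficiency are ignored; the function name is hypothetical):

```python
def read_ceiling_gbs(channels: int, mt_per_s: float, ccds: int, fclk_mhz: float,
                     channel_bytes: int = 8, fabric_bytes_per_clock: int = 32):
    """Return (ceiling, dram_peak, fabric_peak) in GB/s for sequential reads.

    channel_bytes=8 assumes 64-bit DDR5 channels; linear scaling across
    CCDs is an idealisation.
    """
    dram_peak = channels * mt_per_s * channel_bytes / 1000
    fabric_peak = ccds * fclk_mhz * fabric_bytes_per_clock / 1000
    return min(dram_peak, fabric_peak), dram_peak, fabric_peak


configs = {
    "desktop, 1 CCD": (2, 5600, 1, 2000),
    "desktop, 2 CCDs": (2, 5600, 2, 2000),
    "Threadripper, 4 CCDs": (8, 5800, 4, 2200),
    "Threadripper, 8 CCDs": (8, 5800, 8, 2200),
}

results = {name: read_ceiling_gbs(*cfg) for name, cfg in configs.items()}

for name, (ceiling, dram_peak, fabric_peak) in results.items():
    limiter = "fabric" if fabric_peak < dram_peak else "DRAM"
    print(f"{name:22s} DRAM {dram_peak:6.1f}  fabric {fabric_peak:6.1f}  "
          f"ceiling {ceiling:6.1f} GB/s ({limiter}-bound)")
```

Under these assumptions the single-CCD desktop part and the quad-CCD Threadripper are both fabric-bound, while the dual-CCD desktop part and a hypothetical octo-CCD Threadripper are DRAM-bound.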

Third: no, claims like "Apple Silicon having 400GB/s" are not meaningless. Those numbers are achieved the exact same way as above, and the same way Nvidia determines their maximum memory bandwidth on their GPUs. Platform differences (especially CPU vs GPU, but even CPU vs CPU since Apple, AMD, and Intel all have very different topologies) make the numbers incomparable to each other directly. As an example, Apple Silicon can probably achieve higher per-core memory bandwidth than Zen4 (or 5), but also shares bandwidth with the GPU; this may not be great for gaming applications, for instance, where memory bandwidth requirements will be high for both the CPU and GPU, but may be fine for ML inference since the CPU sits mostly idle while the GPU does most of the work.

[0] I'm surprised the author didn't mention this. I can only assume they didn't know this, and haven't tested over frequencies or read much on the overclocking forums about Zen4. Which is fair enough, it's a very complicated topic with a lot of hidden nuances.

bpye • 3 days ago

> Note that 5600MT/s RAM is overclocked; the fastest officially supported Zen4 memory speed is 5200MT/s

This specifically did change in Zen 5. The max supported is now 5600MT/s.