majke 3 days ago

I don't think a single M1 CPU can do 100GB/s. This source says 68GB/s peak: https://www.anandtech.com/show/16252/mac-mini-apple-m1-teste...

jeffbee 3 days ago

That's the plain M1. The Max can do a bit more. Same site since you favor it: https://www.anandtech.com/show/17024/apple-m1-max-performanc...

majke 3 days ago

> From a single core perspective, meaning from a single software thread, things are quite impressive for the chip, as it’s able to stress the memory fabric to up to 102GB/s. This is extremely impressive and outperforms any other design in the industry by multiple factors, we had already noted that the M1 chip was able to fully saturate its memory bandwidth with a single core and that the bottleneck had been on the DRAM itself. On the M1 Max, it seems that we’re hitting the limit of what a core can do – or more precisely, a limit to what the CPU cluster can do.

Wow

wizzard0 3 days ago

btw, what's about as important is that in practice you don't need super clever code to do that: these 68GB/s are easy to reach with textbook code.

zamadatix 3 days ago

68 GB/s of memory read/write can easily be reached (assuming the memory bandwidth is there) on any current architecture by running a basic loop adding 64-bit scalars. What could be even less clever than that?

namibj 2 days ago

Needs to be more than one accumulator.
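
For context, the accumulator remark applies to a reduction, where every add depends on the result of the previous one. A minimal sketch of the difference, assuming a plain sum over a `uint64_t` array (the names `sum_one_acc` and `sum_four_acc` are made up for illustration, and whether the split helps in practice depends on the add latency and on what the compiler does with it):

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: each add waits on the previous one
   (a single loop-carried dependency chain). */
uint64_t sum_one_acc(const uint64_t *a, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++) {
        s += a[i];
    }
    return s;
}

/* Four accumulators: four independent dependency chains,
   combined once at the end. */
uint64_t sum_four_acc(const uint64_t *a, size_t n) {
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++) { /* leftovers when n is not a multiple of 4 */
        s0 += a[i];
    }
    return s0 + s1 + s2 + s3;
}
```

Both functions return the same value, since unsigned addition wraps and is associative. An element-wise loop like `c[i] = a[i] + b[i]` has no accumulator at all, so it carries no such dependency between iterations.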

zamadatix 2 days ago

I mean:

  const uint64_t size = // Some large value
  uint64_t a[size] = // Some random values
  uint64_t b[size] = // Some random values
  uint64_t c[size] = {0};

  uint64_t i = 0;
  while(i < size) {
    c[i] = a[i] + b[i];
    i++;
  }

  // Disable all optimizations so the above isn't optimized away/vectorized

That's the world's simplest loop, with 16 bytes of memory read per iteration, so even if your core is a piece of crap that averages a single increment and addition per cycle, it only needs to run at ~4.3 GHz (68 GB/s ÷ 16 bytes) to clear the bar anyway. Running this code on my MacBook and my x86 desktop with compiler optimizations off, I'm not seeing either fail to reach 64 GB/s.
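
A rough sketch of how the loop above could be timed, for anyone who wants to check the number on their own machine. This is not the code that was actually run; the function names, the heap allocation, and the use of `clock()` (which reports CPU time, not wall time) are all assumptions made for illustration. It counts only the 16 bytes read per iteration, matching the arithmetic above, and ignores the 8 bytes written.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* c[i] = a[i] + b[i]: reads 16 bytes and writes 8 bytes per iteration. */
void add_arrays(uint64_t *c, const uint64_t *a, const uint64_t *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Iterations per second needed to read target_gbps GB/s at 16 bytes each. */
double required_iters_per_sec(double target_gbps) {
    return target_gbps * 1e9 / 16.0;
}

/* Times `reps` passes over three n-element arrays and returns the read
   bandwidth in GB/s. Returns -1.0 on bad arguments, allocation failure,
   or when the elapsed time is too short for clock() to resolve. */
double measure_read_gbps(size_t n, int reps) {
    if (n == 0 || reps <= 0) {
        return -1.0;
    }
    uint64_t *a = malloc(n * sizeof *a);
    uint64_t *b = malloc(n * sizeof *b);
    uint64_t *c = malloc(n * sizeof *c);
    if (!a || !b || !c) {
        free(a);
        free(b);
        free(c);
        return -1.0;
    }
    /* Touch every page up front so page faults are not part of the timing. */
    for (size_t i = 0; i < n; i++) {
        a[i] = i;
        b[i] = 2 * i;
        c[i] = 0;
    }

    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        add_arrays(c, a, b, n);
    }
    clock_t stop = clock();

    volatile uint64_t sink = c[n - 1]; /* keep the result observable */
    (void)sink;
    free(a);
    free(b);
    free(c);

    double seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0) {
        return -1.0;
    }
    return 16.0 * (double)n * (double)reps / seconds / 1e9;
}
```

Calling something like `measure_read_gbps((size_t)1 << 25, 10)` from `main` allocates about 768 MiB across the three arrays, which is far larger than any cache, so the result reflects DRAM rather than cache bandwidth. `required_iters_per_sec(68.0)` gives 4.25e9, which is where the ~4.3 GHz figure comes from.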