wizzard0 3 days ago

btw, what's about as important is that in practice you don't need to write super clever code to do that: these 68 GB/s are easy to reach with plain textbook code

zamadatix 3 days ago

68 GB/s of memory read/write can be easily reached (assuming the memory bandwidth is there to reach it with) on any current architecture by running a basic loop adding 64-bit scalars. What could be even less clever than that?

namibj 2 days ago

Needs to be more than one accumulator.

zamadatix 2 days ago

I mean:

  const uint64_t size = // Some large value
  uint64_t a[size] = // Some random values
  uint64_t b[size] = // Some random values
  uint64_t c[size] = {0};

  uint64_t i = 0;
  while(i < size) {
    c[i] = a[i] + b[i];
    ++i;  // advance the index, otherwise the loop never terminates
  }

  // Disable all optimizations so the above isn't optimized away/vectorized

That's the world's simplest loop, with 16 bytes of memory read per iteration. So even if your core is a piece of crap that averages a single increment and addition per cycle, it just needs to run at ~4.3 GHz to pass the bar anyway (16 bytes × 4.3 GHz ≈ 68.8 GB/s). Running this code on my MacBook and my x86 desktop with compiler optimizations off, I'm not seeing either fail to reach 64 GB/s.
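
For anyone who wants to try it, here is a minimal compilable sketch of the loop above with timing wrapped around it. The function names (`add_arrays`, `measure_read_gbps`), the fill pattern, heap allocation instead of stack arrays, and counting only the 16 bytes loaded per iteration are all assumptions made for illustration, not part of the snippet above. Timing uses C11 `timespec_get`, which is wall-clock time and therefore only a rough measure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* c[i] = a[i] + b[i]: two 8-byte loads and one 8-byte store per iteration. */
void add_arrays(uint64_t *c, const uint64_t *a, const uint64_t *b, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}

/* Runs the loop `reps` times over arrays of `n` elements and returns the
 * read bandwidth in GB/s (1e9 bytes per second), counting only the 16 bytes
 * loaded per iteration. Returns -1.0 if allocation fails. */
double measure_read_gbps(size_t n, int reps) {
    assert(n > 0 && reps > 0);

    uint64_t *a = malloc(n * sizeof *a);
    uint64_t *b = malloc(n * sizeof *b);
    uint64_t *c = calloc(n, sizeof *c);
    if (a == NULL || b == NULL || c == NULL) {
        free(a);
        free(b);
        free(c);
        return -1.0;
    }

    /* Arbitrary fill (assumption); it also touches every page before timing. */
    for (size_t i = 0; i < n; ++i) {
        a[i] = (uint64_t)i * 2654435761u;
        b[i] = ~(uint64_t)i;
    }

    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    for (int r = 0; r < reps; ++r) {
        add_arrays(c, a, b, n);
    }
    timespec_get(&t1, TIME_UTC);

    /* Read one result through a volatile so the work counts as used. */
    volatile uint64_t sink = c[n / 2];
    (void)sink;

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    double bytes_read = 16.0 * (double)n * (double)reps;

    free(a);
    free(b);
    free(c);
    return bytes_read / secs / 1e9;
}
```

To mirror the setup described above, build with optimizations off (e.g. `cc -O0`) and call something like `measure_read_gbps((size_t)1 << 26, 10)` from `main`. The arrays should be far larger than the last-level cache, otherwise the number reflects cache bandwidth rather than memory bandwidth.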