Isn't that 400 GB/s a figure which only applies when the GPU, with its much wider interface, is accessing the memory?
That figure is at the memory controller.
It applies as an upper limit all the time, but it's unlikely that the CPU alone would drive the memory controller to it. It matters because the limit causes increased latency whenever other bus controllers are competing for bandwidth, but I don't think Apple has documented the internal bus architecture or the performance counters needed to see how.
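If you want a rough feel for how far a single CPU thread sits below the headline number, here is a minimal sketch that times a large buffer copy. The function name, buffer size, and repeat count are my own choices for illustration, not anything Apple documents. It only measures one thread's copy throughput as seen from Python, so it says nothing about contention with the GPU or other bus controllers.

```python
import time


def measure_copy_bandwidth(size_bytes=64 * 1024 * 1024, repeats=5):
    """Estimate single-thread copy bandwidth in GB/s (hypothetical helper).

    Counts both the bytes read from the source and the bytes written to
    the destination, and reports the best (fastest) of several runs.
    """
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # same-length slice assignment: one bulk copy
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    best = max(best, 1e-9)  # guard against a zero reading from the timer
    return 2 * size_bytes / best / 1e9


if __name__ == "__main__":
    print(f"approx. copy bandwidth: {measure_copy_bandwidth():.1f} GB/s")
```

Make the buffer much larger than the last-level cache, otherwise you are mostly measuring cache rather than the memory controller.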
Another POV is that the max memory bandwidth figure may be too vague to guide people optimizing libraries. It would be nice if Apple Silicon were as fast as "400 GB/s" sounds. Grounded closer to reality, these are 65 W parts.