> a repeated convolution
I really wonder what's the field of reference of "quickly" there. To me convolution is one of the last resort techniques in signal processing given how expensive it is (O(size of input data * size of convolution kernel)). It's of course still much faster than gaussian blur which is still non-trivial to manage at a barely decent 120fps even on huge Nvidia GPUs but still.
How are we supposed to think about SIMD in Big-O? Because this is still linear time if the kernel width is less than the max SIMD width (which is 16 I think on x64?)