I'll start off admitting to not digging into the article yet, but is this something that can be broken up and done in parallel? I'm an Elixir fanboy so I try to parallelize anything I can, either to speed it up or because I'm an Elixir fanboy.
Most image processing can be multithreaded: you just chunk the image and send each chunk to a thread. You might have to pad each chunk depending on the processing you're doing, e.g. a convolution, which needs the neighboring pixels around each pixel to work.
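
Here's a rough sketch of what that chunk-and-pad idea could look like, assuming a grayscale image stored as a list of rows and a 3x3 box blur standing in for the convolution. The names `blur_chunk`, `parallel_blur`, and `HALO` are made up for illustration and aren't from the article. One caveat: in standard CPython the GIL keeps pure-Python loops like this from actually running faster on threads, so treat it as a picture of the structure rather than a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor

HALO = 1  # a 3x3 kernel needs 1 extra row of neighbors on each side


def blur_chunk(chunk, top_pad, bottom_pad):
    """3x3 box blur over a padded strip; returns only the strip's own rows."""
    h, w = len(chunk), len(chunk[0])
    out = []
    # Skip the padding rows: they exist only to supply neighbors.
    for y in range(top_pad, h - bottom_pad):
        row = []
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Clamping only kicks in at the true image border,
                    # because interior strips carry their halo rows.
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += chunk[yy][xx]
            row.append(total / 9)
        out.append(row)
    return out


def parallel_blur(image, n_chunks=4):
    """Split the image into row strips, pad each one, blur them on threads."""
    height = len(image)
    step = max(1, -(-height // n_chunks))  # ceiling division
    jobs = []
    for start in range(0, height, step):
        end = min(start + step, height)
        top = min(HALO, start)             # no padding above the first strip
        bottom = min(HALO, height - end)   # none below the last strip
        jobs.append((image[start - top:end + bottom], top, bottom))
    with ThreadPoolExecutor() as pool:
        # map() returns results in submission order, so strips stay sorted.
        strips = list(pool.map(lambda job: blur_chunk(*job), jobs))
    return [row for strip in strips for row in strip]
```

Without the halo rows, pixels along each strip boundary would be blurred against clamped copies of themselves instead of their real neighbors, and you'd see seams where the strips meet. With the padding, the stitched result should match blurring the whole image in one pass.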