It doesn't seem improbable, considering it's a huge performance win and perhaps many won't notice?
It is the performance win for similar-looking results that I find improbable. For a box blur to look like a gaussian blur, you need multiple passes. Even though each pass is then O(1) per pixel instead of O(n) (with n the blur radius), I suspect that, due to caching effects, a direct gaussian kernel would still be faster, especially for the small blur radii described in the article.
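For context, here's a minimal sketch of the sliding-window trick behind the O(1)-per-pixel claim: each box pass keeps a running sum, so advancing the window costs one add and one subtract no matter the radius, and repeating the pass about three times approximates a gaussian (central limit theorem). The function name and the edge-clamping choice are my own illustration, not from the article.

    #include <stddef.h>

    /* One 1D box-blur pass using a running sum: sliding the window costs
       one add and one subtract per output sample, i.e. O(1) regardless of
       the radius. Repeating the pass ~3 times approximates a gaussian. */
    static void box_blur_pass(const float *src, float *dst, size_t n, size_t radius)
    {
        size_t window = 2 * radius + 1;
        float sum = 0.0f;

        /* Prime the running sum for the window centred on sample 0,
           clamping indices that fall outside [0, n-1]. */
        for (size_t k = 0; k < window; k++) {
            size_t j = (k < radius) ? 0 : k - radius; /* clamp left edge  */
            if (j > n - 1) j = n - 1;                 /* clamp right edge */
            sum += src[j];
        }

        for (size_t i = 0; i < n; i++) {
            dst[i] = sum / (float)window;
            /* Slide: add the sample entering on the right, drop the one
               leaving on the left (both clamped at the borders). */
            size_t enter = (i + radius + 1 < n) ? i + radius + 1 : n - 1;
            size_t leave = (i >= radius) ? i - radius : 0;
            sum += src[enter] - src[leave];
        }
    }

The counterpoint above is that a direct gaussian convolution costs O(radius) per sample, but for a small radius that inner loop fits in registers and cache and vectorises well, so the asymptotic advantage of the multi-pass box blur may not show up in practice.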