> My understanding is that the proposed method is faster in the sense of sampling efficiency (of the cost function to construct the Taylor series), but not in the sense of FLOPS. The higher derivatives do not come for free.
Sure, but as long as the extra per-step cost stays cheaper than the work of computing the next step, this would still be a net win. For example, the article talks about how AI training uses gradient descent, and I'm pretty sure the gradient-descent update itself is a tiny fraction of training time compared with evaluating the math kernels in all the layers; therefore taking fewer steps should be a substantial win. A toy illustration of the trade-off is sketched below.
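As a rough sketch of that trade-off (a toy Python/NumPy example, not the paper's new algorithm), here is plain gradient descent versus classic Newton's method on the 2D Rosenbrock function: each Newton step is costlier (it builds a Hessian and solves a linear system), but it typically needs orders of magnitude fewer steps than gradient descent to reach the same tolerance.

```python
import numpy as np

# Rosenbrock test function, its gradient, and its Hessian.
def f(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

def grad(x):
    return np.array([
        -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
        200 * (x[1] - x[0]**2),
    ])

def hess(x):
    return np.array([
        [2 - 400 * (x[1] - 3 * x[0]**2), -400 * x[0]],
        [-400 * x[0], 200.0],
    ])

def gradient_descent(x0, lr=1e-3, tol=1e-6, max_iter=200_000):
    # Cheap per step: one gradient evaluation and a scaled subtraction.
    x = np.array(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            return x, k
        x -= lr * g
    return x, max_iter

def newton(x0, tol=1e-6, max_iter=100):
    # Costlier per step: gradient + Hessian + a linear solve.
    x = np.array(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            return x, k
        x -= np.linalg.solve(hess(x), g)
    return x, max_iter

x0 = [-1.2, 1.0]
print("gradient descent:", gradient_descent(x0))  # tens of thousands of steps
print("newton:          ", newton(x0))            # a handful of steps
```

Whether the extra per-step work pays off in real training depends on how large the Hessian-like terms are relative to the forward/backward passes, which is exactly the point being argued here.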
> Like the original version of Newton’s method, each iteration of this new algorithm is still computationally more expensive than methods such as gradient descent. As a result, for the moment, the new work won’t change the way self-driving cars, machine learning algorithms or air traffic control systems work. The best bet in these cases is still gradient descent.
Unfortunately not.