In the paper[1], they note that their method seems much better behaved for multivariate problems. In the case they examined numerically, regular Newton's method had a fractal pattern of which starting points converged, while their method had a nice contiguous region with a fairly sharp boundary.
That is, with Newton's method you can start close to the solution and still fail to converge, whereas their method doesn't appear to suffer from that.
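To see the kind of fractal being contrasted against, here is a minimal sketch using plain Newton's method on the toy equation z^3 = 1 (not the paper's method or its test problem); each starting point is labeled by the root it converges to, and the basin boundaries form the well-known Newton fractal:

```python
import numpy as np

roots = np.exp(2j * np.pi * np.arange(3) / 3)  # cube roots of unity

def newton_basin(z, iters=50, tol=1e-8):
    for _ in range(iters):
        z = z - (z**3 - 1) / (3 * z**2)  # Newton step for f(z) = z^3 - 1
        hits = np.abs(roots - z) < tol
        if hits.any():
            return int(np.argmax(hits))  # index of the root reached
    return -1                            # no convergence within budget

# Coarse ASCII picture: '.' marks non-convergence, A/B/C the three basins.
# The tiny offset keeps the grid off z = 0, where the Newton step divides by 0.
grid = np.linspace(-2, 2, 33) + 1e-9
for y in grid:
    print("".join(".ABC"[1 + newton_basin(complex(x, y))] for x in grid))
```

Even at this resolution you can see adjacent starting points landing in different basins along the boundaries, which is exactly the sensitivity the paper's plots show Newton's method having and their method avoiding.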
I guess that might be a more attractive property than "just" being faster. I'm not an expert, though, so I'm not sure whether other, cheaper alternatives also have this property.
That said, the whole point of their method is to use higher-order derivatives. Admittedly I don't do optimization problems that often, but when I do, my function is seldom analytical, so getting good higher-order derivatives is usually expensive at best and problematic at worst.
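For instance, if the function can only be evaluated with some error (a solver tolerance, measurement noise), finite-difference estimates of a d-th derivative amplify that error by roughly 1/h^d, so shrinking the step eventually makes things worse. A hypothetical sketch (the test function and noise level are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
noise = 1e-10  # simulated evaluation error

def f(x):
    # stand-in for a non-analytic objective we can only evaluate noisily
    return np.sin(x) + noise * rng.standard_normal()

def third_derivative(x, h):
    # standard central finite-difference stencil for f'''(x)
    return (f(x + 2*h) - 2*f(x + h) + 2*f(x - h) - f(x - 2*h)) / (2 * h**3)

# Noise is amplified by ~1/h^3: past some point, smaller h means larger error.
for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    est = third_derivative(1.0, h)
    print(f"h={h:.0e}  f'''(1) ~ {est:+.4f}  (true {-np.cos(1.0):+.4f})")
```

With even this tiny noise level the third-derivative estimate falls apart around h = 1e-3, and the problem only gets worse at higher orders.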
They do touch on this in the section on future work, wondering whether a quasi-Newton method[2] could be created in a similar vein, using only sparse sampling of the higher-order derivatives.
I think the result is primarily relevant from a complexity-theory perspective, showing that this is possible at all in time polynomial in the number n of variables. A big hint is that the paper never explicitly states the degree of that polynomial, but it would need to be at least as large as the degree d of the Taylor expansion used.
The convergence speed depends in a complicated way on the Lipschitz constant of the d-th derivative, but if you ignore that, the number of iterations to reach a fixed target precision is proportional to 1/ln(d). So to halve the number of iterations you would need to square d (since ln(d^2) = 2 ln(d)), which is bad news if the runtime is already at least exponential in d. It would only make sense if the function you're trying to minimize is extremely costly to compute but the cost of differentiating it d times is negligible in comparison. Even then, there'll be an optimal d beyond which further increases only slow things down.
In the univariate case, if you're doing 1 + d work per iteration for 1/ln(d) iterations, the optimal integer value of d is 4. So at least in that case, going a bit higher than Newton's method can be beneficial.
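Under that toy cost model (work per iteration proportional to 1 + d, iteration count proportional to 1/ln(d)) this is easy to check numerically; a quick sketch:

```python
import math

# Total work to a fixed precision ~ (work per iteration) x (iterations)
def total_work(d):
    return (1 + d) / math.log(d)

for d in range(2, 9):
    print(f"d={d}: relative work {total_work(d):.3f}")
# d=2: 4.328, d=3: 3.641, d=4: 3.607, d=5: 3.728, ...
# so d=4 minimizes the total, slightly beating Newton's method (d=2)
```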
Related: 3B1B's investigation of fractals derived from Newton's method.