dawnofdusk 8 hours ago

Are there any results about the "optimality" of backpropagation? Can one show that it emerges naturally from some Bayesian optimality criterion or a dynamic programming principle? This is a significant advantage that the "free energy principle" people have.

For example, let's say instead of gradient descent you want to do a Newton descent. Then maybe there's a better way to compute the needed weight updates besides backprop?
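To make the contrast concrete, here's a minimal sketch (toy quadratic, all names illustrative) of why a Newton step is a fundamentally different update than a gradient step: it solves against the Hessian rather than just following the gradient.

```python
import numpy as np

# Toy quadratic loss f(w) = 0.5 w^T A w - b^T w,
# with gradient A w - b and constant Hessian A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad(w):
    return A @ w - b

def hessian(w):
    return A  # constant for a quadratic

w0 = np.zeros(2)

# One Newton step: w <- w - H^{-1} grad(w).
# For a quadratic this lands exactly on the minimizer.
w_newton = w0 - np.linalg.solve(hessian(w0), grad(w0))

# Gradient descent needs a step size and many iterations.
w_gd = np.zeros(2)
for _ in range(200):
    w_gd = w_gd - 0.1 * grad(w_gd)

w_star = np.linalg.solve(A, b)  # exact minimizer, for comparison
```

For neural nets the catch is that the Hessian is huge, which is exactly why the question of whether backprop-style recursions can compute (or approximate) Newton updates efficiently is interesting.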

roenxi 5 hours ago

I'd be willing to be proven wrong, but as a starting point I'd suggest it obviously isn't optimal for what it's being used for. AI performance on tasks seems quite poor relative to the time spent training. For example, by the time an AI overtakes humans at Baduk, it has typically played several orders of magnitude more games than elite human players.

The important thing is that backprop does work, so we're just scaling it up to absurd levels to get good results. A big step change in training will be found sooner or later. Maybe there's some threshold effect, where a better trick only works for models with lots of parameters, and we just haven't stumbled on it yet; but if evolution can do it, so will researchers.

mrfox321 8 hours ago

Second-order methods, and their approximations, can be used for weight updates too.
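One common way the approximation works is a Hessian-vector product computed from gradients alone, so second-order methods can piggyback on the same machinery as ordinary gradient computation without ever forming the full Hessian. A hedged sketch via finite differences (the toy loss and all names are illustrative; autodiff frameworks do this exactly with double backprop):

```python
import numpy as np

def loss(w):
    # Toy non-quadratic loss, illustrative only.
    return np.sum(w**4) + np.sum(w**2)

def grad(w):
    # Gradient of the toy loss above.
    return 4 * w**3 + 2 * w

def hvp(w, v, eps=1e-5):
    # H v ~= (grad(w + eps v) - grad(w - eps v)) / (2 eps):
    # a Hessian-vector product from two gradient evaluations,
    # never materializing the full Hessian.
    return (grad(w + eps * v) - grad(w - eps * v)) / (2 * eps)

w = np.array([1.0, -2.0])
v = np.array([0.5, 1.0])

# The exact Hessian here is diagonal (12 w^2 + 2), so we can check.
exact = (12 * w**2 + 2) * v
approx = hvp(w, v)
```

Products like this are what power methods such as Newton-CG or K-FAC-style approximations at neural-net scale, where the full Hessian would be far too large to store.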