Coming from a layman's perspective, a genuine question regarding: "Implements SAE training with auxiliary loss to prevent and revive dead latents, and gradient projection to stabilize training dynamics".
I struggle to understand the phrase "to prevent and revive". Perhaps it's plain speak to those who understand SAEs, but it feels a bit self-contradictory to me. Could anyone elaborate?
Just bad wording on my part; I tried to fit too much information into one sentence. The auxiliary loss is supposed to prevent dead latents from occurring in the first place (hence "prevent dead latents"), and it is also supposed to revive latents that are already dead (hence "revive dead latents").
Looking at that sentence again, I see that I applied two verbs to the same object, so it can be read differently depending on which verb you focus on. Mea culpa. I hope you still gained some insight from it =)
Thanks for sharing! It's certainly interesting to an outsider like me, and I'm sure your intended audience understood what you meant.
A dead latent is one that is never active and hence doesn't (seem to) represent anything. The auxiliary loss term reduces how often that happens and, if it does happen, pushes the latent back to being active some of the time.
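A minimal sketch of the two pieces described above, under stated assumptions: the thread doesn't say which auxiliary loss the original implementation uses, so this follows the general shape of the AuxK loss from Gao et al. (2024), "Scaling and evaluating sparse autoencoders". A latent is flagged dead once it hasn't fired for a number of steps (`dead_after` is a hypothetical threshold), and the auxiliary term asks the top `k_aux` dead latents to reconstruct whatever the main SAE failed to reconstruct. All function names are illustrative. The code uses plain Python lists and only computes the loss value; in a real training loop this term would be scaled by a small coefficient, added to the main loss, and backpropagated, which is what gives dead latents a gradient signal.

```python
from typing import List


def update_steps_since_active(steps: List[int], acts: List[float]) -> List[int]:
    """Reset the counter for latents that fired this step, increment the rest.

    `acts` is assumed to be each latent's max activation over the batch.
    """
    return [0 if a > 0.0 else s + 1 for s, a in zip(steps, acts)]


def dead_mask(steps: List[int], dead_after: int) -> List[bool]:
    """A latent counts as dead once it hasn't fired for `dead_after` steps."""
    return [s >= dead_after for s in steps]


def aux_k_loss(
    residual: List[float],        # x - x_hat, the main SAE's reconstruction error
    pre_acts: List[float],        # encoder pre-activations, one per latent
    dead: List[bool],             # dead-latent mask, one per latent
    decoder: List[List[float]],   # one decoder vector (length d) per latent
    k_aux: int,
) -> float:
    """AuxK-style loss (sketch): reconstruct the residual from dead latents only."""
    dead_idx = [i for i, is_dead in enumerate(dead) if is_dead]
    if not dead_idx:
        return 0.0
    # Keep the k_aux dead latents with the largest pre-activations.
    top = sorted(dead_idx, key=lambda i: pre_acts[i], reverse=True)[:k_aux]
    recon = [0.0] * len(residual)
    for i in top:
        a = max(pre_acts[i], 0.0)  # clamp at zero (assumption: ReLU-style)
        for j in range(len(residual)):
            recon[j] += a * decoder[i][j]
    # Plain squared error between the residual and its auxiliary reconstruction.
    return sum((r - e) ** 2 for r, e in zip(residual, recon))


if __name__ == "__main__":
    steps = [0, 0, 0]
    for acts in ([1.2, 0.0, 0.0], [0.7, 0.0, 0.3], [0.9, 0.0, 0.0]):
        steps = update_steps_since_active(steps, acts)
    dead = dead_mask(steps, dead_after=3)  # latent 1 never fired
    decoder = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    loss = aux_k_loss([0.5, -0.2], [2.0, 0.4, -1.0], dead, decoder, k_aux=1)
    print(steps, dead, round(loss, 6))
```

This covers the "revive" half: dead latents get a training signal only through this term, and as soon as one fires again its counter resets to zero and it drops out of the auxiliary loss.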
So basically, the auxiliary loss term in the loss function prevents dead latents from occurring, and whenever they do occur, it can potentially revive them? Thanks, by the way.
I imagine this kind of algorithm is like a derivative: it gives a unit response, so you would need another filter to stabilize the system, e.g. some dropout to remove spurious revived latents.