cgadski 10 days ago

Writing x for the final hidden state of the base model, the idea here is to steer outputs in some direction by adding a vector y to it. More specifically, y is an exponential moving average of a sequence of vectors W(z_t), where the z_t are some sort of context vectors and W is a linear map.
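
For concreteness, that recipe amounts to something like this (my own sketch, with made-up names and shapes, not the project's actual code):

```python
import torch

class EMASteering:
    """Steer a model by adding an EMA of randomly projected context
    vectors to its final hidden state. Note that W is left at random
    initialization, which is exactly the problem."""

    def __init__(self, d_context: int, d_hidden: int, beta: float = 0.99):
        # random projection, never trained
        self.W = torch.randn(d_hidden, d_context) / d_context**0.5
        self.beta = beta
        self.y = torch.zeros(d_hidden)

    def update(self, z_t: torch.Tensor) -> None:
        # y_t = beta * y_{t-1} + (1 - beta) * W z_t
        self.y = self.beta * self.y + (1 - self.beta) * (self.W @ z_t)

    def apply(self, x: torch.Tensor) -> torch.Tensor:
        # add the steering vector to the final hidden state
        return x + self.y
```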

Except the linear map W is just left at a random initialization, so it won't work for obvious reasons in its current form. (I guess this is why there is no example of its output. I'm guessing it was vibe-coded?) Also, since the intervention only happens at the last hidden layer, I can't imagine this would really change how the model "thinks" in an interesting way. Like, yeah, you can absolutely make a model talk about dogs by adding in a control vector for "dogness" somewhere (rough sketch below).
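
For comparison, the standard activation-steering version of that looks roughly like the following. Everything here is assumed rather than taken from the project: a Hugging Face-style model/tokenizer interface and invented prompt sets.

```python
import torch

@torch.no_grad()
def control_vector(model, tokenizer, pos_prompts, neg_prompts, layer):
    """The 'dogness' direction as the mean difference of layer
    activations between dog-related and neutral prompts."""
    def mean_activation(prompts):
        acts = []
        for p in prompts:
            inputs = tokenizer(p, return_tensors="pt")
            out = model(**inputs, output_hidden_states=True)
            # average over token positions at the chosen layer
            acts.append(out.hidden_states[layer][0].mean(dim=0))
        return torch.stack(acts).mean(dim=0)
    return mean_activation(pos_prompts) - mean_activation(neg_prompts)

# At generation time, register a forward hook on that layer and add
# alpha * v to its output; alpha sets the steering strength.
```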

Basically, this method is "inspired by graffiti art of tagging and the neuroplastic nature of living brains" in the same way that taking an exponential moving average of a time series would be "informed by state-space dynamics techniques utilized in deep learning, reservoir computing, and quantum mechanics." Really tired of the amount of insincere/pointless language in deep learning nowadays.

vessenes 10 days ago

The author said the original liquid paper specifies random starting weights. I think what would happen is you'd get a bit of a random personality each time you redo the randomization, and then it would self-referentially update over time. I mean, you have to start somewhere. You could start with all 1s, I guess, if you're going to normalize anyway.

Update: Even if this is a good idea, and I’m not sure it is, it probably makes sense to have a pretty fast early move away from the random weights, and then slow down.
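
Something as simple as a decaying update coefficient would give you that (pure sketch, constants invented):

```python
def step_size(t: int, a0: float = 0.5, a_min: float = 0.01, tau: float = 100.0) -> float:
    """Large early updates to escape the random init, decaying toward a floor."""
    return max(a_min, a0 / (1.0 + t / tau))

# y_t = (1 - a_t) * y_{t-1} + a_t * (W @ z_t), with a_t = step_size(t)
```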

nullbio 10 days ago

Was definitely vibe coded.