It’s mind-blowing that LeCun is listed as one of the authors.
I would expect model size to correlate with alignment score, because model size usually correlates with hidden dimension. But the opposite could also be true: bigger models might shift more of the basic token-classification logic into later layers, so embedding-level alignment could go down. Regardless, it feels like pretty useless research…
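For what it’s worth, here’s a minimal numpy sketch of that dimensionality confound, not anything from the paper: if “alignment” is scored by how well a linear map sends one embedding space onto another, a wider hidden dimension inflates the in-sample fit even for pure-noise embeddings. All the sizes and the scoring below are illustrative assumptions.

```python
# Sketch (illustrative, not the paper's protocol): in-sample linear-fit
# "alignment" grows with hidden dimension even for random embeddings,
# roughly as hidden_dim / n_tokens.
import numpy as np

rng = np.random.default_rng(0)
n_tokens = 300                                 # shared token sample (assumed size)
target = rng.normal(size=(n_tokens, 32))       # reference embedding space

for hidden_dim in (16, 64, 256):
    emb = rng.normal(size=(n_tokens, hidden_dim))   # random "model" embeddings
    # Least-squares map W so that emb @ W ≈ target, scored by in-sample R^2.
    W, *_ = np.linalg.lstsq(emb, target, rcond=None)
    resid = target - emb @ W
    r2 = 1.0 - resid.var() / target.var()
    print(f"hidden_dim={hidden_dim:4d}  in-sample R^2 = {r2:.3f}")
```

The R² climbs with width purely because a bigger space has more regressors to fit noise with, which is why you’d want the score cross-validated or dimension-controlled before reading anything into a size/alignment correlation.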
It leaves a bit of a bad taste, considering LeCun’s famously critical stance on auto-regressive transformer LLMs.