>The internal model of a LLM is statistical text. Which is linear and fixed.
Not at all. Like seriously, not in the slightest.
What does it encode, then? Images? Scent? Touch? Some higher-dimensional qualia?
Well, a simple description is that they discover circuits that reproduce the training sequences. It turns out that, in the process, they recover computational structures that generalize beyond the training data. How far they generalize is certainly up for debate, but you can't reasonably deny that they generalize to some degree: most prompts they see are brand new, and they mostly respond sensibly.
Their representation of the input isn't linear either. Transformers use self-attention, which relies on the softmax function, and softmax is non-linear.
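You can check this in a few lines. Here's a minimal sketch of single-head scaled dot-product attention in NumPy (random weights, no masking, purely illustrative): a linear map f must satisfy f(x + y) = f(x) + f(y), and attention doesn't.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention(X, Wq, Wk, Wv):
    # Single-head scaled dot-product attention over a sequence X.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V  # the softmax is where linearity breaks

rng = np.random.default_rng(0)
d = 4
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
X1 = rng.standard_normal((3, d))  # two arbitrary length-3 input sequences
X2 = rng.standard_normal((3, d))

# A linear map f would satisfy f(X1 + X2) == f(X1) + f(X2).
lhs = attention(X1 + X2, Wq, Wk, Wv)
rhs = attention(X1, Wq, Wk, Wv) + attention(X2, Wq, Wk, Wv)
print(np.allclose(lhs, rhs))  # False: additivity fails, so attention is non-linear
```

The softmax turns raw dot products into input-dependent mixing weights, which is exactly where the linearity breaks.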