sixo 6 days ago

When I play chess I filter out all kinds of illegal moves, and I also filter out bad moves. Humans work more like "recursively thinking of ideas and then evaluating them with another part of your model", so why not let LLMs do the same?

skydhash 6 days ago

Because that’s not what happens? We learn through symbolic meaning and rules, which then form a consistent system. Then we can have a goal and continuously evaluate whether we’re within the system and transitioning towards that goal. The nice thing is that we don’t have to compute the whole simulation in our brains; we can always restart from the real world. The more you train, the better your heuristics become and the more your efficiency increases.

The internal model of an LLM is statistical text, which is linear and fixed. Not great for anything other than generating text similar to what was ingested.

fl7305 6 days ago

> The internal model of an LLM is statistical text, which is linear and fixed. Not great for anything other than generating text similar to what was ingested.

The internal model of a CPU is linear and fixed. Yet a CPU can still generate output that is very different from its input. It is not a simple lookup table; instead, it executes complex algorithms.

An LLM has large amounts of input processing power. It has a large internal state. It executes "cycle by cycle", processing the inputs and internal state to generate output data and a new internal state.
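To make the analogy concrete, here is a toy sketch of that loop. The `llm_step` function is a hypothetical stand-in for one forward pass, not any real API; the point is just that each "cycle" reads the accumulated state and feeds its output back in.

```python
# Toy sketch of the CPU analogy: one "cycle" reads the current state
# (prompt plus everything generated so far) and produces the next token,
# which then becomes part of the state for the next cycle.

def llm_step(context):
    # Hypothetical stand-in for a real forward pass: here it just counts,
    # but in a real model this would be a full transformer evaluation.
    return f"tok{len(context)}" if len(context) < 8 else "<eos>"

def run(prompt, max_steps=100):
    context = list(prompt)
    for _ in range(max_steps):      # one "cycle" per generated token
        nxt = llm_step(context)     # output depends on all state so far
        if nxt == "<eos>":
            break
        context.append(nxt)         # output feeds back into the state
    return context

print(run(["hello", "world"]))
```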

So why shouldn't LLMs be capable of executing complex algorithms?

skydhash 6 days ago

It probably can, but how will those algorithms be created? And how will the input and output be represented? If it’s text, the most efficient way is to construct a formal system, or a statistical model if ambiguous and incorrect results are OK in the grand scheme of things.

The issue is always input consumption and output correctness. With a CPU, we take great care with data representation and protocol definition, then we do formal verification on the algorithms, and we can be pretty sure that the outputs are correct. So the issue is that the internal model (for a given task) of an LLM is not consistent enough, and the referential window (keeping track of each item in the system) is always too small.

fl7305 6 days ago

Neural networks can be evolved to implement all sorts of algorithms; for example, controlling an inverted pendulum so that it stays balanced.
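As a toy illustration of the idea (deliberately crude physics and a hand-rolled hill-climb rather than any real neuroevolution library), evolving just two controller weights is enough to keep a simulated pendulum upright:

```python
import math, random

# Minimal sketch: evolve a tiny "network" (two weights) that maps the
# pendulum's angle and angular velocity to a corrective torque.

def simulate(weights, steps=500, dt=0.02):
    """Return how many steps the controller keeps the pendulum near upright."""
    theta, omega = 0.1, 0.0                 # small initial tilt (rad), angular velocity
    w1, w2 = weights
    for t in range(steps):
        torque = -(w1 * theta + w2 * omega)          # the evolved policy
        alpha = 9.8 * math.sin(theta) + torque       # toy dynamics (g/l = 9.8)
        omega += alpha * dt
        theta += omega * dt
        if abs(theta) > 0.5:                         # fell over
            return t
    return steps

def evolve(generations=300):
    best = [random.uniform(-1, 1), random.uniform(-1, 1)]
    best_score = simulate(best)
    for _ in range(generations):
        child = [w + random.gauss(0, 0.5) for w in best]   # mutate
        score = simulate(child)
        if score >= best_score:                            # keep if no worse
            best, best_score = child, score
    return best, best_score

weights, score = evolve()
print(f"evolved weights {weights}, balanced for {score} steps")
```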

> In a CPU, we take great care with data representation and protocol definition, then we do formal verification on the algorithms, and we can be pretty sure that the output are correct.

Sure, intelligent design makes for a better design in many ways.

That doesn't mean that an evolved design doesn't work at all, right?

hackinthebochs 6 days ago

> The internal model of an LLM is statistical text, which is linear and fixed.

Not at all. Like seriously, not in the slightest.

skydhash 6 days ago

What does it encode? Images? Scent? Touch? Some higher dimensional qualia?

hackinthebochs 6 days ago

Well, a simple description is that they discover circuits that reproduce the training sequences. It turns out that in the process, they recover computational structures that generalize beyond the training data. How far they generalize is certainly up for debate, but you can't reasonably deny that they generalize to some degree. After all, most sentences they are prompted on are brand new, and they mostly respond sensibly.

Their representation of the input is also not linear. Transformers use self-attention, which relies on the softmax function, which is non-linear.
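A quick check with made-up attention scores shows the non-linearity: scaling the inputs does not simply scale the outputs.

```python
import numpy as np

# Softmax, as used to turn attention scores into attention weights.
def softmax(x):
    e = np.exp(x - x.max())      # subtract max for numerical stability
    return e / e.sum()

scores = np.array([1.0, 2.0, 3.0])   # made-up attention scores

print(softmax(scores))           # roughly [0.09, 0.24, 0.67]
print(softmax(2 * scores))       # roughly [0.02, 0.12, 0.87]
print(2 * softmax(scores))       # not the same: softmax is not a linear map
```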