Item 42208053

Majromax • 7 days ago

But the LLM isn't "using next-token prediction" to solve the problem, that's only how it's evaluated.

The "real processing" happens through the various transformer layers (and token-wise nonlinear networks), where it seems as if progressively richer meanings are added to each token. That rich feature set then decodes to the next predicted token, but that decoding step is throwing away a lot of information contained in the latent space.

If language models (per Anthropic's work) can have a direction in latent space correspond to the concept of the Golden Gate Bridge, then I think it's reasonable (albeit far from certain) to say that LLMs are performing some kind of symbolic-ish reasoning.
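
To make the point about the decoding step concrete, here is a minimal sketch with made-up numbers. The vocabulary, the 2-dimensional "latent space", and the names `VOCAB`, `UNEMBED`, and `decode` are all hypothetical and not taken from any real model. It only shows that two different latent vectors can collapse to the same predicted token, so the token alone does not reveal everything the latent state contained.

```python
# Hypothetical 3-token vocabulary and a 2-dimensional "latent space".
# All numbers are invented for illustration.
VOCAB = ["bridge", "river", "road"]
UNEMBED = [
    [1.0, 0.0],    # row for "bridge"
    [0.0, 1.0],    # row for "river"
    [-1.0, -1.0],  # row for "road"
]

def decode(hidden):
    """Project a hidden vector to one logit per token and return the top token."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in UNEMBED]
    return VOCAB[logits.index(max(logits))]

# Two clearly different latent states...
a = [2.0, 0.5]
b = [0.9, 0.1]

# ...that decode to the same single token.
print(decode(a), decode(b))
```

The magnitudes and the second coordinate of `a` and `b` differ, but that difference is gone once only the top token is kept.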

griomnib • 7 days ago

Anthropic has a vested interest in people thinking Claude is reasoning.

However, in coding tasks I’ve been able to find it directly regurgitating Stack Overflow answers (like literally a Google search turns up the code).

Given that coding is supposed to be Claude’s strength, and it’s clearly just parroting web data, I’m not seeing any sort of “reasoning”.

LLMs may be useful but they don’t think. They’ve already plateaued, and given the absurd energy requirements I think they will prove to be far less impactful than people think.

1 reply

DiogenesKynikos • 7 days ago

The claim that Claude is just regurgitating answers from Stack Overflow is not tenable if you've spent time interacting with it.

You can give Claude a complex, novel problem, and it will give you a reasonable solution, which it will be able to explain to you and discuss with you.

You're getting hung up on the fact that LLMs are trained on next-token prediction. I could equally dismiss human intelligence: "The human brain is just a biological neural network that is adapted to maximize the chance of creating successful offspring." Sure, but the way it solves that task is clearly intelligent.

1 reply

griomnib • 7 days ago

I’ve literally spent 100s of hours with it. I’m mystified why so many people use the “you’re holding it wrong” explanation when somebody points out real limitations.

3 replies

int_19h • 6 days ago

You might consider that other people have also spent hundreds of hours with it, and have seen it correctly solve tasks that cannot be explained by regurgitating something from the training set.

I'm not saying that your observations aren't correct, but this is not a binary. It is entirely possible that the tasks you observe the models on are exactly the kind where they tend to regurgitate. But that doesn't mean that it is all they can do.

Ultimately, the question is whether there is a "there" there at all. Even if the model regurgitates 9 times out of 10, but that one other time it actually reasons, that means it is capable of reasoning in principle.

vidarh • 6 days ago

When we've spent time with it and gotten novel code, and you then claim that doesn't happen, it is natural to say "you're holding it wrong". If you're just arguing it doesn't happen often enough to be useful to you, that likely depends on your expectations and on how complex the tasks are that you need it to carry out.

gonab • 6 days ago

In many ways, Claude feels like a miracle to me. I no longer have to stress over semantics or search for patterns that I can recognize and work with but have never actually coded myself in that language. Now I don’t have to waste energy looking up things that I find boring.

vrighter • 6 days ago

The LLM isn't solving the problem. The LLM is just predicting the next word. It's not "using next-token prediction to solve a problem". It has no concept of "problem". All it can do is predict 1 (one) token that follows another provided set. That running this in a loop provides you with bullshit (with bullshit defined here as things someone or something says neither with good nor bad intent, but just with complete disregard for any factual accuracy or lack thereof, and so the information is unreliable for everyone) does not mean it is thinking.
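
For readers unfamiliar with the loop being described, here is a minimal sketch of autoregressive generation. A toy bigram table stands in for the model, and `BIGRAMS`, `predict_next`, and `generate` are hypothetical names for illustration, not any real library's API. A real LLM conditions on the whole context rather than only the last token, so this shows the shape of the loop and nothing about what happens inside the predictor.

```python
# Toy stand-in for a language model: maps the last token to one successor.
# Assumption: a real LLM conditions on the entire context, not just the last token.
BIGRAMS = {
    "the": "cat",
    "cat": "sat",
    "sat": "on",
    "on": "the",
}

def predict_next(context):
    """Predict exactly one token given the tokens so far."""
    return BIGRAMS.get(context[-1], "<eos>")

def generate(prompt, max_new_tokens):
    """Run single-token prediction in a loop, feeding each output back in."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        nxt = predict_next(tokens)
        if nxt == "<eos>":
            break
        tokens.append(nxt)
    return tokens

print(generate(["the"], 3))
```

Each iteration produces a single token. Whether that per-token step amounts to "solving" anything is exactly what this thread is arguing about.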

3 replies

DiogenesKynikos • 6 days ago

All the human brain does is determine how to fire some motor neurons. No, it does not reason.

No, the human brain does not "understand" language. It just knows how to control the firing of neurons that control the vocal cords, in order to maximize an endocrine reward function that has evolved to maximize biological fitness.

I can speak about human brains the same way you speak about LLMs. I'm sure you can spot the problem in my conclusions: even though the human brain is "only" firing neurons, it does actually develop an understanding of the world. The same goes for LLMs and next-word prediction.

quacker • 6 days ago

I agree with you as far as the current state of LLMs, but I also feel like we humans have preconceived notions of “thought” and “reasoning”, and are a bit prideful of them.

We see the LLM sometimes do sort of well at a whole bunch of tasks. But it makes silly mistakes that seem obvious to us. We say, “Ah ha! So it can’t reason after all”.

Say LLMs get a bit better, to the point they can beat chess grandmasters 55% of the time. This is quite good. Low-level chess players rarely ever beat grandmasters, after all. But the LLM spits out illegal moves sometimes and sometimes blunders nonsensically. So we say, “Ah ha! So it can’t reason after all”.

But what would it matter if it can reason? Beating grandmasters 55% of the time would make it among the best chess players in the world.

For now, LLMs just aren’t that good. They are too error prone and inconsistent and nonsensical. But they are also sort of weirdly capable at lots of things in strange inconsistent ways, and assuming they continue to improve, I think they will tend to defy our typical notions of human intelligence.

1 reply

vrighter • 1 day ago

Beating grandmasters 55% of the time is not good. We've been beating almost 100% of grandmasters since the '90s.

And even for LLMs that are supposedly good at chess: I recently read an article by someone who examined them from this angle. They said that if the LLM fails to come up with a legal move within 10 tries, a random move is played instead. And this was common.

Not even a beginner would attempt 10 illegal moves in a row, after their first few games. The state of LLM chess is laughably bad, honestly. You would guess that even if it didn't play well, it'd at least consistently make legal moves. It doesn't even get to that level.
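
The retry-then-random rule described above can be sketched roughly as follows. This is an assumption about how such a harness might look, not the code from the article: `choose_move`, `propose`, and the return shape are hypothetical, and moves are plain strings so no chess library is needed.

```python
import random

def choose_move(propose, legal_moves, max_tries=10, rng=random):
    """Ask `propose` for a move up to `max_tries` times.

    Returns (move, attempts_used, used_fallback). If every proposal is
    illegal, a random legal move is played instead.
    """
    for attempt in range(1, max_tries + 1):
        move = propose()
        if move in legal_moves:
            return move, attempt, False
    # All tries were illegal: fall back to a random legal move.
    return rng.choice(sorted(legal_moves)), max_tries, True

# Hypothetical "model" that gives two illegal moves before a legal one.
answers = iter(["e9", "Kx7", "e4"])
legal = {"e4", "d4", "Nf3"}
print(choose_move(lambda: next(answers), legal))
```

Logging `used_fallback` per move is what would let someone measure how often the fallback fires, which is the statistic the comment is pointing at.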

mhh__ • 6 days ago

I don't see why this isn't a good model for how human reasoning happens either, certainly as a first-order assumption (at least).