Anthropic has a vested interest in people thinking Claude is reasoning.
However, in coding tasks I’ve caught it directly regurgitating Stack Overflow answers (literally, a Google search turns up the same code).
Given that coding is supposed to be Claude’s strength, and it’s clearly just parroting web data, I’m not seeing any sort of “reasoning”.
LLMs may be useful, but they don’t think. They’ve already plateaued, and given the absurd energy requirements I think they will prove to be far less impactful than people think.
The claim that Claude is just regurgitating answers from Stack Overflow is not tenable if you've spent time interacting with it.
You can give Claude a complex, novel problem, and it will give you a reasonable solution, which it will be able to explain to you and discuss with you.
You're getting hung up on the fact that LLMs are trained on next-token prediction. I could equally dismiss human intelligence: "The human brain is just a biological neural network that is adapted to maximize the chance of creating successful offspring." Sure, but the way it solves that task is clearly intelligent.
I’ve literally spent hundreds of hours with it. I’m mystified why so many people reach for the “you’re holding it wrong” explanation when somebody points out real limitations.
You might consider that other people have also spent hundreds of hours with it, and have seen it correctly solve tasks that cannot be explained by regurgitating something from the training set.
I'm not saying that your observations aren't correct, but this is not a binary. It is entirely possible that the tasks on which you observe the models are exactly the kind where they tend to regurgitate. But that doesn't mean it is all they can do.
Ultimately, the question is whether there is a "there" there at all. Even if the model regurgitates nine times out of ten, if that one other time it can actually reason, then it is capable of reasoning in principle.
When we've spent time with it and gotten novel code, and you claim that doesn't happen, it is natural to say "you're holding it wrong". If you're just arguing it doesn't happen often enough to be useful to you, that likely depends on your expectations and on how complex the tasks are that you need it to carry out.
In many ways, Claude feels like a miracle to me. I no longer have to stress over semantics, or over searching for patterns that I can recognize and work with but have never actually coded myself in that language. Now I don’t have to waste energy looking up things that I find boring.