lynguist 4 days ago

https://www.anthropic.com/research/tracing-thoughts-language...

The section about hallucinations is deeply relevant.

Namely, Claude sometimes provides plausible but incorrect chain-of-thought reasoning when its “true” computational path isn’t available. The model genuinely believes it’s giving a correct reasoning chain, but the interpretability microscope reveals that it is constructing symbolic arguments backward from a conclusion.

https://en.wikipedia.org/wiki/On_Bullshit

This empirically confirms the “theory of bullshit” as a category distinct from lying. It suggests that “truth” emerges secondarily to symbolic coherence and plausibility.

This means knowledge itself is fundamentally symbolic-social, not merely correspondence to external fact.

Knowledge emerges from symbolic coherence, linguistic agreement, and social plausibility rather than purely from logical coherence or factual correctness.

emn13 4 days ago

While some of what you say is an interesting thought experiment, I think the second half of this argument has, as you'd put it, low symbolic coherence and low plausibility.

Recognizing the relevance of coherence and plausibility need not imply that other aspects are any less relevant. Redefining truth merely because coherence is important and sometimes misinterpreted is not at all reasonable.

Logically, a falsehood can be validly derived from assumptions when those assumptions are false; see the sketch below. That simple reasoning step alone is enough to explain how a coherent-looking reasoning chain can arrive at incorrect conclusions, and there are other ways a coherent-looking chain can fail besides. What you're saying is just not a convincing argument that we need to redefine what truth is.
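
To make that first step concrete, here's a minimal sketch in Lean (a toy example of my own, not something from the article): a perfectly valid inference from a false premise still lands on a false conclusion.

    -- Valid inference, false premise, false conclusion: from the (false)
    -- assumption that every natural number is even, it follows validly
    -- that 3 is even, which is plainly wrong.
    example (h : ∀ n : Nat, n % 2 = 0) : 3 % 2 = 0 := h 3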

dcow 3 days ago

For this to be true, everyone must be logically on the same page. They must share the same axioms. Everyone must be operating off the same data and must not make mistakes or have biases when evaluating it. Otherwise people will inevitably, at times, arrive at conflicting truths.

In reality it’s messy, and it’s not possible to discern falsehoods from truths with 100% certainty. Our scientific method does a pretty good job. But it’s not perfect.

You can’t retcon reality and say “well, retrospectively we know what happened and one side was just wrong”. That’s called history. It’s not a useful or practical working definition of truth when trying to evaluate your possible actions (individually, communally, socially, etc.) and make a decision in the moment.

I don’t think it’s accurate to say that we want to redefine truth. More accurately, truth has inconvenient limitations, and it’s arguably really nice to ignore them most of the time.

jimbokun 3 days ago

> Knowledge emerges from symbolic coherence, linguistic agreement, and social plausibility rather than purely from logical coherence or factual correctness.

This just seems like a redefinition of the word "knowledge" different from how it's commonly used. When most people say "knowledge" they mean beliefs that are also factually correct.

indigo945 3 days ago

As a digression, the definition of knowledge as justified true belief runs into the Gettier problems:

    > Smith [...] has a justified belief that "Jones owns a Ford". Smith 
    > therefore (justifiably) concludes [...] that "Jones owns a Ford, or Brown 
    > is in Barcelona", even though Smith has no information whatsoever about 
    > the location of Brown. In fact, Jones does not own a Ford, but by sheer 
    > coincidence, Brown really is in Barcelona. Again, Smith had a belief that
    > was true and justified, but not knowledge.

Or, from the 8th-century Indian philosopher Dharmottara:

    > Imagine that we are seeking water on a hot day. We suddenly see water, or so we 
    > think. In fact, we are not seeing water but a mirage, but when we reach the 
    > spot, we are lucky and find water right there under a rock. Can we say that we 
    > had genuine knowledge of water? The answer seems to be negative, for we were 
    > just lucky.

More to the point, the definition of knowledge as linguistic agreement is convincingly supported by much of what has historically been common knowledge, such as the meddling of deities in human affairs, or that the people of Springfield are eating the cats.

dcow 3 days ago

I don’t think it’s so clear cut… Even the most adamant “facts are immutable” person can agree that we’ve had trouble “fact checking” social media objectively. Take fluoride: the consensus says fluoride is healthy, yet a meta-analysis of the evidence suggests it may be unhealthy. The truth of the matter is by and large whatever is socially cohesive for the doctors’ and dentists’ narrative: “fluoride is fine, and any argument to the contrary, even the published meta-analysis, is politically motivated nonsense”.

jimbokun 3 days ago

You are just saying that distinguishing "knowledge" from "opinion" is difficult.

dcow 3 days ago

No, I’m saying I’ve seen reasonably minded experts in a field disagree over things-generally-considered-facts. I’ve seen social impetus and context shape the understanding of where to draw the line between fact and opinion. I do not believe there is an objective answer. I fundamentally believe Anthropic’s explanation is rooted in real phenomena and not just a self-serving statement meant to explain AI hallucination in a positive, quasi-intellectual light.

CodesInChaos 4 days ago

> The model genuinely believes it’s giving a correct reasoning chain, but the interpretability microscope reveals it is constructing symbolic arguments backward from a conclusion.

Sounds very human. It's quite common that we make a decision based on intuition, and the reasons we give are just post-hoc justification (for ourselves and others).

RansomStark 4 days ago

> Sounds very human

well yes, of course it does, that article goes out of its way to anthropomorphize LLMs, while providing very little substance

jimbokun 3 days ago

Isn't the point of computers to have machines that improve on default human weaknesses, not just reproduce them at scale?

canadaduane 3 days ago

They've largely had complementary strengths, with little overlap. But human language is state-of-the-art, after hundreds of thousands of years of "development". Reproducing SOTA (i.e. the current ongoing effort) seems like a good milestone for a computer algorithm as it gains language overlap with us.

floydnoel 3 days ago

Why would computers have just one “point”? They have been used for endless purposes and those uses will expand forever

throwway120385 3 days ago

The other very human thing to do is to invent disciplines of thought so that we don't just constantly spew bullshit all the time. For example, you could have a discipline about the "pursuit of facts", which means that before you say something you mentally check yourself and make sure it's actually factually correct. This is how large portions of the populace avoid walking around spewing made-up facts and bullshit. In our rush to anthropomorphize ML systems we often forget that there are a lot of disciplines that humans are painstakingly taught from birth, and those disciplines often give rise to behaviors that the ML-based system is incapable of, like saying "I don't know the answer to that" or "I think that might be an unanswerable question."

dcow 3 days ago

Are they incapable? Or are they just not taught the discipline?

jerf 3 days ago

In a way, the main problem with LLMs isn't that they are wrong sometimes. We humans are used to that. We encounter people who are professionally wrong all the time: politicians, con men, scammers, even people who are just honestly wrong. We have evaluation metrics for those things. Those metrics are flawed, because there are humans on the other end intelligently gaming them too, but generally speaking we're all at least trying.

LLMs don't fit those signals properly. They always sound like an intelligent person who knows what they are talking about, even when spewing absolute garbage. Even very intelligent people, even very intelligent people in the field of AI research are routinely bamboozled by the sheer swaggering confidence these models convey in their own results.

My personal opinion is that any AI researcher who was shocked by the paper lynguist mentioned ought to be ashamed of themselves and their credulity. That was all obvious to me; I couldn't have told you the exact mechanism by which the arithmetic was being performed (though what it was doing was well within the realm of what I would have expected from a linguistic AI trying to do math), but the fact that its chain of reasoning bore no particular resemblance to how it drew its conclusions was always obvious. A neural net has no introspection into itself. It doesn't have any idea "why" it is doing what it is doing. It can't. There's no mechanism for that to even exist. We humans are not directly introspecting our own neural nets either; we're building models of our own behavior and then consulting the models, and anyone with any practice doing that should be well aware of how those models can still completely fail to predict reality!

Does that mean the chain of reasoning is "false"? How do we account for it improving performance on certain tasks, then? No: it means that the reasoning is occurring at a higher, different level. It is quite like humans imputing reasons to their gut impulses. With training, combining gut impulses with careful reasoning is actually a very, very potent way to solve problems. The reasoning system needs training, or it flies around like an unconstrained fire hose uncontrollably spraying everything around, but brought under control it is the most powerful system we know. But the models should always have been read as providing a rationalization rather than an explanation of something they couldn't possibly have been explaining. I'm also not convinced the models have that "training" either, nor is it obvious to me how to give it to them.

(You can't just prompt it into acting human; it's going to be more complicated than just telling a model to "be carefully rational". Intensive and careful RLHF is a bare minimum, but finding humans who can get it right will itself be a challenge, and it's possible that what we're looking for simply doesn't exist in the bias-set of the LLM technology, which is my base case at this point.)

jmaker 4 days ago

I haven’t used Cursor yet. Some colleagues have and seemed happy. I’ve had GitHub Copilot on for what feels like a couple of years; a few days ago VS Code was extended to provide an agentic workflow, MCP, bring-your-own-key, and it now interprets instructions in the codebase. But the UX and the outputs are bad in over 3/4 of cases. It’s a nuisance to me. It injects bad code even though it has the full context. Is Cursor genuinely any better?

To me it feels like the people who benefit from, or at least enjoy, that sort of assistance and I solve vastly different problems and code very differently.

I’ve done exhausting code reviews on juniors’ and mid-level engineers’ PRs, but what I’ve been feeling lately is that I’m reviewing changes introduced by a very naive poster. It doesn’t even type-check. Regardless of whether it’s Claude 3.7, o1, o3-mini, or a few models from Hugging Face.

I don’t understand how people find that useful. Yesterday I literally wasted half an hour on a test-suite setup a colleague of mine had introduced to the codebase that wasn’t good, and I tried delegating the fix to several of the Copilot models. All of them missed the point, some even introduced security vulnerabilities in the process by breaking the JWT validation, and I tried “vibe coding” it till it worked, until I gave up in frustration and just used an ordinary search engine, which led me to the docs, in which I immediately found the right knob. I reverted all that crap and did the simple and correct thing. So my conclusion was simple: vibe coding and LLMs made the codebase unnecessarily more complicated and wasted my time. How on earth do people code whole apps with that?

trilbyglens 4 days ago

I think it works until it doesn't. The nature of technical debt of this kind means you can sort of coast on things until the complexity of the system reaches such a level that it's effectively painted into a corner, and nothing but a massive teardown will do as a fix.

nickledave 3 days ago

Yes

https://link.springer.com/article/10.1007/s10676-024-09775-5

> # ChatGPT is bullshit

> Recently, there has been considerable interest in large language models: machine learning systems which produce human-like text and dialogue. Applications of these systems have been plagued by persistent inaccuracies in their output; these are often called “AI hallucinations”. We argue that these falsehoods, and the overall activity of large language models, is better understood as bullshit in the sense explored by Frankfurt (On Bullshit, Princeton, 2005): the models are in an important way indifferent to the truth of their outputs. We distinguish two ways in which the models can be said to be bullshitters, and argue that they clearly meet at least one of these definitions. We further argue that describing AI misrepresentations as bullshit is both a more useful and more accurate way of predicting and discussing the behaviour of these systems.

ScottBurson 1 day ago

> The model genuinely believes it’s giving a correct reasoning chain

The model doesn't "genuinely believe" anything.

skrebbel 4 days ago

Offtopic, but I'm still sad that "On Bullshit" didn't go for that highest form of book titles, the single noun, like "Capital", "Sapiens", etc.

mvieira38 3 days ago

Starting with "On" is cooler in the philosophical tradition, though, going back to classical and medieval times: e.g. On Interpretation, On the Heavens, etc. by Aristotle, and De Veritate, De Malo, etc. by Aquinas. Capital is actually "Das Kapital", too.

pas 3 days ago

It's very hipster, Das Kapital. (with the dot/period, check the cover https://en.wikipedia.org/wiki/Das_Kapital#/media/File:Zentra... )

But in English it would be just "Capital", right? (Uncountable nouns are rarely used with articles; it's "happiness", not "the happiness". See also https://old.reddit.com/r/writing/comments/12hf5wd/comment/jf... )

skrebbel 3 days ago

Yeah so I meant the Piketty book, not Marx. But I googled it and turns out it's actually named "Capital in the Twenty-First Century", which disappoints me even more than "On Bullshit"

pas 3 days ago

And, for the full picture, it's probably important to consider that the main claim of the book is based on very unreliable data/methodology. (Though note that this does not necessarily make the claim false! See [1].)

https://marginalrevolution.com/marginalrevolution/2017/10/pi...

And then later, similar claims about inequality were likewise made using bad methodology (data).

https://marginalrevolution.com/marginalrevolution/2023/12/th...

[1] "Indeed, in some cases, Sutch argues that it has risen more than Piketty claims. Sutch is rather a journeyman of economic history upset not about Piketty’s conclusions but about the methods Piketty used to reach those conclusions."

skrebbel 2 days ago

You misunderstand. I never read it. I simply liked the title, at least before I understood that "Capital" wasn't actually the title.