>That’s holding LLMs to a significantly higher standard than humans. When I realize there’s a flaw in my reasoning, I don’t know that it was caused by specific incorrect neuron connections or action potentials in my brain; I think of the flaw in domain-specific terms, using language or something like it.
LLMs should be held to a higher standard. Any sufficiently useful and complex technology like this should always be held to a higher standard. I also agree with calls for transparency around the training data and models, because this technology is rapidly making its way into sensitive areas of our lives, and when it is wrong the consequences can be disastrous.
The context is whether this capability is required to qualify as AGI. To hold AGI to a higher standard than our own human capability means you must also accept that, by that standard, neither of us is intelligent.
No, it just means you have a stronger prior that a human being is generally intelligent. We don't ask that question of each other because it's obvious.
It doesn't make sense to hold you to the same standard I hold a model to. We scrutinize test scores for hints of ourselves and dress up the process with rigor, formalisms, and operationalizations. A machine beating you on a test, or on your favorite battery of tests, is not very convincing evidence that it is generally capable in anything like the way you are, much less sentient. Similarly, it would be silly to conclude from your failure on the same battery of tests that you are not generally intelligent. Maybe you were tired or drunk.