It's almost as if this is a non-human intelligence, which presents different strengths and weaknesses than human intelligence.
Is that really so surprising, considering the tremendous differences in underlying hardware and training process?
I think one cause of this (and of some other issues with LLM use) is that people see the model exhibit one human-level trait, its ability to use language, and assume it therefore comes with other human-level capabilities, such as our ability to reason.
Do you not think it is interesting, though? If I had asked you three years ago which of counting rocks or GeoGuessr ChatGPT would beat humans at, would you have answered correctly?