fcantournet 1 day ago

"Participants had 5 minute conversations simultaneously with another human participant and one of these systems before judging which conversational partner they thought was human. When prompted to adopt a humanlike persona, GPT-4.5 was judged to be the human 73% of the time: significantly more often than interrogators selected the real human participant."

That's the opposite of a Turing test pass : it shows a very clear bias in selection is present, which means the LLM is significantly different from humans (at least in this test setting).

If the test setting was : 1 humans talk to chatbot and after 5m decides yes/no on human, then yeah that would be a very impressive result.

But in the test setting of this paper, surely a success would be as close as possible to a 50%, i.e: statistically impossible to separate humans and LLMs.

1
svnt 1 day ago

It is interesting, what does it mean? Perhaps it discloses chatgpt is created to align to our idea of a human more than to an actual human.

andai 1 day ago

It means machines are becoming more human and humans are becoming less human.

bluefirebrand 1 day ago

My unscientific wild ass guess would be that because of how LLMs are built to be pleasing, people wind up liking them more and thus lowering their guard with them and therefore judging them less harshly

For a concrete example of what I'm talking about

Imagine if you are really into older movies, like 60s and 70s movies

You start talking to two chat windows about your love for movies

One chat partner shares your love for old movies and is very enthusiastic and wants to talk all about them. In reality, this chat partner is the LLM

The other is lukewarm and maybe tries to steer you away from that conversation because they don't know much about older movies. Maybe they still love movies but they want to talk about more recent movies. In reality, this one is the human

But which one do you think is the human?

If you are self aware that your love for old movies is not really universal, and you are aware that LLMs have a tendency to match enthusiasm, you can probably guess which one is which

If you are less self aware, you are probably just going to guess that the conversation you enjoyed more is the one with the human