hamilyon2 6 days ago

The discussion in this thread is amazing. People, even renowned experts in their field, make mistakes in their own craft: a lot of mistakes, sometimes very costly and very obvious in retrospect.

Yet when an LLM, trained on a corpus of human stupidity, no less, makes illegal moves in chess, our brain immediately goes: I don't make illegal moves in chess, so how can a computer play chess if it does?

Perfect examples of metacognitive bias and the fundamental attribution error, at the very least.

stonemetal12 6 days ago

It isn't a binary does/doesn't question. It is a question of the frequency and "quality" of the mistakes. If it is making illegal moves 0.1% of the time, then sure, everybody makes mistakes. If it is 30% of the time, then it isn't doing so well. If the illegal moves it tries to make are basic "pieces don't move like that" errors, then predict-the-next-token isn't predicting so well. If the legality of the moves is more subtle, then maybe it isn't too bad.
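
For what it's worth, that frequency is cheap to measure. Here is a minimal sketch, assuming the python-chess package and a hypothetical ask_llm_for_move(fen) helper that returns the model's suggested SAN move: replay real games, ask the model for a move at each position, and count how often the reply is not legal there.

    # Minimal sketch: measure an LLM's illegal-move rate with python-chess.
    # ask_llm_for_move(fen) is a hypothetical helper, not part of any library.
    import chess

    def illegal_move_rate(games_san, ask_llm_for_move):
        """games_san: list of games, each a list of SAN move strings."""
        total, illegal = 0, 0
        for moves in games_san:
            board = chess.Board()
            for san in moves:
                reply = ask_llm_for_move(board.fen())  # model's suggested move
                total += 1
                try:
                    board.parse_san(reply)  # raises ValueError if not legal here
                except ValueError:
                    illegal += 1
                board.push_san(san)  # continue along the actual game
        return illegal / total if total else 0.0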

But more than just being able to make moves: if we claim it understands chess, shouldn't it be able to explain why it chose one move over another?

sourcepluck 6 days ago

You would be correct to be amazed if someone were arguing:

"Look! It made mistakes, therefore it's definitely not reasoning!"

That's certainly not what I'm saying, anyway. I was responding to the argument actually being made by many here, which is:

"Look! It plays pretty poorly, but not totally crap, and it wasn't trained for playing just-above-poor chess, therefore, it understands chess and definitely is reasoning!"

I myself find this - and much of the surrounding discussion - to be quite an amazing display of people's biases. People want to believe LLMs are reasoning, and so we're treated to these merry-go-round "investigations".

stefan_ 6 days ago

No, my brain goes that a machine constantly suggesting "jump off now!" in between the occasional legal chess move probably isn't quite right in the head. And the people insisting this is all perfectly fine because we can decide post hoc which moves count as legal, while not even being willing to entertain the notion that this invalidates their little experiment, well, maybe those aren't the people we want deploying this kind of thing.