Item 42214978

wavemode • 6 days ago

As I wrote in another comment - you can write scripts that correct bad math, too. But we don't use that to claim that LLMs have a good understanding of math.

ben_w • 6 days ago

I'd say that's because we don't understand what we mean by "understand".

Hardware that accurately performs maths faster than all of humanity combined is so cheap as to be disposable, but I've yet to see anyone claim that a Pi Zero has "understanding" of anything.

An LLM can display the viva voce approach that Turing suggested[0], and do it well. Ironically for all those now talking about "stochastic parrots", the passage reads:

"""… The game (with the player B omitted) is frequently used in practice under the name of viva voce to discover whether some one really understands something or has ‘learnt it parrot fashion’. …"

Showing that not much has changed on the philosophy of this topic since it was invented.

[0] https://academic.oup.com/mind/article/LIX/236/433/986238

1 reply

danparsonson • 6 days ago

> I'd say that's because we don't understand what we mean by "understand".

I'll have a stab at it. The idea of LLMs 'understanding' maths is that, once having been trained on a set of maths-related material, the LLM will be able to generalise to solve other maths problems that it hasn't encountered before. If an LLM sees all the multiplication tables up to 10x10, and then is correctly able to compute 23x31, we might surmise that it 'understands' multiplication - i.e. that it has built some generalised internal representation of what multiplication is, rather than just memorising all possible answers. Obviously we don't expect generalisation from a Pi Zero without specifically being coded for it, because it's a fixed function piece of hardware.

Personally I think this is highly unlikely given that maths and natural language are very different things, and being good at the latter does not bear any relationship to being good at the former (just ask anyone who struggles with maths - plenty of people do!). Not to mention that it's also much easier to test for understanding of maths because there is (usually!) a single correct answer regardless of how convoluted the problem - compared to natural language where imitation and understanding are much more difficult to tell apart.

SpaceManNabs • 6 days ago

I don't know. I have talked to a few math professors, and they think LLMs are as good as a lot of their peers when it comes hallucinations and being able to discuss ideas on very niche topics, as long as the context is fed in. If Tao is calling some models "a mediocre, but not completely incompetent [...] graduate student", then they seem to understand math to some degree to me.

2 replies

lupire • 6 days ago

Tao said that about a model brainstorming ideas that might be useful, not explaining complex ideas or generating new ideas or selecting a correct idea from a list of brainstormed ideas. Not replacing a human.

1 reply

adelineJoOs • 6 days ago

> Not replacing a human.

Obviously not, but that is tangential to this discussion, I think. A hammer might be a useful tool in certain situations, and surely it does not replace a human (but it might make a human in those situations more productive, compared to a human without a hammer).

> generating new ideas

Is brainstorming not an instance of generating new ideas? I would strongly argue so. And whether the LLM does "understand" (or whatever ill-defined, ill-measurable concept one wants to use here) anything about the ideas if produces, and how they might be novel - that is not important either.

If we assume that Tao is adequately assessing the situation and truthfully reporting his findings, then LLMs can, at the current state, at least occasionally be useful in generating new ideas, at least in mathematics.

fijiaarone • 6 days ago

Being as good as a professor at confidently hallucinating nonsense when you don't know the answer is a very high level skill.

fijiaarone • 6 days ago

Actually, LLMs do call scripts that correct bad math, and have gotten progressively better because of it. It's another special case example.