You can write scripts that correct bad math, too. In fact, most of the time ChatGPT will just call out to a calculator function. This is a smart solution, and very useful for end users! But still, we shouldn't use that to claim that LLMs have a good understanding of math.
If a script were applied that corrected "bad math" and now the LLM could solve complex math problems that you can't one-shot throw at a calculator, what would you call it?
It's a good point.
But this math analogy isn't quite appropriate: there's abstract math and there's arithmetic. A good math practitioner (LLM or human) can be bad at arithmetic yet good at abstract reasoning. The latter doesn't (necessarily) require the former.
In chess, I don't think that you can build a good strategy if it relies on illegal moves, because tactics and strategies are tied.
If I had wings, I'd be a bird.
Applying a corrective script to weed out bad answers is also not "one-shot" solving anything, so I would call your example an elaborate guessing machine. That doesn't mean it's not useful, but that's not how a human being does maths when they understand what they're doing - in fact you can readily program a computer to solve general maths problems correctly the first time. This is also exactly the problem with saying that LLMs can write software - a series of elaborate guesses is undeniably useful and impressive, but without a corrective guiding hand it's ultimately useless, and doesn't demonstrate generalised understanding of the problem space. The dream of AI is surely that the corrective hand is unnecessary?
Then you could replace the LLM with a much cheaper RNG and let it guess until the "bad math filter" let something through.
I was once asked by one of the Clueless Admin types if we couldn't just "fix" various sites such that people couldn't input anything wrong. Same principle.
Agreed. It's not the same thing and we should strive for precision (LLMs are already opaque enough as it is).
An LLM that recognizes an input as "math" and calls out to a NON-LLM to solve the problem vs an LLM that recognizes an input as "math" and also uses next-token prediction to produce an accurate response ARE DIFFERENT.
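For what it's worth, the "delegate to a NON-LLM" half is easy to sketch. This is a toy deterministic evaluator the model could hand an expression string to, instead of predicting the digits token by token (purely illustrative - not any real tool-calling API):

```python
import ast
import operator

# Map AST operator nodes to actual arithmetic functions.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def calculator(expr: str) -> float:
    """Evaluate an arithmetic expression deterministically (no eval())."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError("not an arithmetic expression")
    return walk(ast.parse(expr, mode="eval").body)

print(calculator("2 * (3 + 4)"))  # 14, every time
```

The point being: this path gives the right answer 100% of the time by construction, while the next-token-prediction path gives an answer that merely looks right most of the time. Those are different things.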
At what point does "knows how to use a calculator" equate to knowing how to do math? Feels pretty close to me...
Well, LLMs are bad at math but they're ok at detecting math and delegating it to a calculator program.
It's kind of like humans.