Yesterday I asked o1-pro what 99490126816810951552 * 23977364624054235203 is. It took 16 minutes to produce an answer that was off by eight orders of magnitude.
https://chatgpt.com/share/67e1eba1-c658-800e-9161-a0b8b7b683...
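For reference, any language with arbitrary-precision integers gives the exact product in one line; here's a quick Python check (the ~2.3855e39 figure is the one quoted downthread):

    a = 99490126816810951552
    b = 23977364624054235203
    p = a * b
    print(p)            # exact product
    print(f"{p:.6e}")   # 2.385511e+39, the correct order of magnitude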
What in the world is that supposed to prove? Let's see you do that in your head.
Tell it to use code if you want an exact answer. It should do that automatically, of course, and obviously it eventually will, but jeez, that's not a bad Fermi guess for something that wasn't designed to attempt such problems.
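The Fermi arithmetic is easy to replicate: round each factor to three significant figures and add the exponents.

    # Back-of-envelope check: 9.95e19 * 2.40e19 ≈ 2.39e39,
    # which agrees with the model's ~2.3855e39 estimate to two figures.
    print(9.95e19 * 2.40e19)   # 2.388e+39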
Sorry, I'm in a rush and could only spend a couple of minutes looking at it, so maybe I'm missing something:
Google: 2.385511e+39
Your chat: "Numerically, that’s about 2.3855 × 10^39"
Also curious how you think about LLM-as-calculator in relation to tool calls.
If you look at the precise answer, it has 8 too many digits, even though the estimate you quoted (2.3855 × 10^39) has the right exponent.
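Counting digits makes the scale error easy to see: the true product has 40 digits, so an "exact" answer with 8 extra digits is about 10^8 too large.

    p = 99490126816810951552 * 23977364624054235203
    print(len(str(p)))   # 40 digits, i.e. ~1e39
    # an answer with 8 extra digits would be ~1e47: eight orders of magnitude off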
> Also curious how you think about LLM-as-calculator in relation to tool calls.
I tried this because I'd heard all existing models are bad at this kind of problem, and I wanted to see how the most powerful one I have access to would do. I think it shows that you really want an AI to be able to use computational tools in appropriate circumstances.
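For concreteness, here's a minimal sketch of what that could look like: a hypothetical exact-arithmetic tool the model calls instead of multiplying in-weights. The function name and schema below are illustrative, not any particular vendor's tool-calling API.

    # Hypothetical tool for exact integer arithmetic; name and schema
    # are made up for illustration, not a real vendor API.
    def multiply_exact(a: str, b: str) -> str:
        """Return the exact product of two decimal integer strings."""
        return str(int(a) * int(b))

    # A tool description the model could be handed, in spirit:
    multiply_exact_spec = {
        "name": "multiply_exact",
        "description": "Exact product of two arbitrary-precision integers",
        "parameters": {"a": "decimal integer string", "b": "decimal integer string"},
    }

    print(multiply_exact("99490126816810951552", "23977364624054235203"))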