"Humanity’s Last Exam" may be a rigorous test of fact-check and pattern recognition, but does it truly measure intelligence, or just a model's ability to absorb and regurgitate structured knowledge?
I mean, it's one thing to do maths and look up historical facts and figures (Google made that very easy), but a whole other thing to ask meaningful, novel questions.
I think true AGI isn’t about scoring an A on an exam, but about the moment an AI starts asking questions humans never thought to ask.
Claude already does this sometimes. It follows up with a question, either to gather more info or sometimes even to lead me towards a solution. It may be synthetic, but it's still very useful.