"LLM can reason" is trivially provable - all you need to do is give it a novel task (e.g. a logical puzzle) that requires reasoning, and observe it solving that puzzle.
How do you intend to show your task is novel?
"Novel" here simply means that the exact sequence of moves that is the solution cannot possibly be in the training set (mutatis mutandis). You can easily write a program that generates these kinds of puzzles at random, and feed them to the model.