"I’m not sure, because OpenAI doesn’t deign to share gpt-4-base, nor to allow queries of gpt-4o in completion mode."
I would guess GPT-4o wasn't first pre-trained and then instruct-tuned, but trained directly on refined instruction-following material. That material probably contains far fewer chess games.
Why do you think that? InstructGPT was predominantly trained as a next-token predictor on whatever soup of data OpenAI curated at the time; the alignment signal (both the RL part and the supervised prompt/answer pairs) contributes only a tiny fraction of the gradient.