Very interesting. Have you tried `o1` yet? I made a program that has LLMs solve Wordle puzzles, and the difference between `4o` and `o1` is absolutely astonishing.
4o-mini: 16%, 4o: 50%, o1-mini: 97%, o1: 100%*
* Disclaimer: only n=7 for o1. The others are around n=100-300 each.
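For anyone wanting to try something similar, here is a rough sketch of what a Wordle harness for LLMs could look like. It is not the program described above: `score_guess`, `play`, `solve_rate`, and the `guesser` callback are all hypothetical names, and `guesser` stands in for whatever model call you would actually make (it receives the guess history and returns the next guess).

```python
from collections import Counter
from typing import Callable, List, Tuple

# (guess, feedback) pairs seen so far in one game.
History = List[Tuple[str, str]]


def score_guess(guess: str, answer: str) -> str:
    """Return Wordle feedback: G = right spot, Y = wrong spot, - = absent."""
    feedback = ["-"] * len(guess)
    remaining = Counter()  # answer letters not matched by a green
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            feedback[i] = "G"
        else:
            remaining[a] += 1
    # Second pass: hand out yellows, respecting duplicate-letter counts.
    for i, g in enumerate(guess):
        if feedback[i] == "-" and remaining[g] > 0:
            feedback[i] = "Y"
            remaining[g] -= 1
    return "".join(feedback)


def play(answer: str, guesser: Callable[[History], str], max_turns: int = 6) -> bool:
    """Run one game; `guesser` is a placeholder for the LLM call."""
    history: History = []
    for _ in range(max_turns):
        guess = guesser(history).strip().lower()
        history.append((guess, score_guess(guess, answer)))
        if guess == answer:
            return True
    return False


def solve_rate(answers: List[str], guesser: Callable[[History], str]) -> float:
    """Fraction of puzzles solved within the turn limit."""
    return sum(play(a, guesser) for a in answers) / len(answers)
```

In a real run, `guesser` would format the history into a prompt and parse a five-letter word out of the model's reply. That parsing step is left out here and is probably where most of the fiddly work is.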
OK, that was fun. I just tried o1-preview on today's Wordle and it got it on the third guess: https://chatgpt.com/share/673f9169-3654-8006-8c0b-07c53a2c58...
With some transcribing (using another LLM instance) I’ve even gotten it to solve NYT mini crosswords.