As I was building up my understanding/intuition for the internals of transformers + attention, I found 3Blue1Brown's series of videos (specifically on attention) to be super helpful.
This has also been good for me, though it covers the foundations more than the latest work: https://www.mattprd.com/p/openai-cofounder-27-papers-read-kn...