> I don't see how it's any different than optimizing for new CPU/GPU architectures
I mean, that seems like a wild thing to say to me. Those architectures have documentation; they aren't magic black boxes that we chuck inputs at and hope for the best. With LLMs, that's pretty much exactly what we do.
If that's how you optimise, I'm genuinely shocked.