I think they will, though. I think the enormous corpus of video data and the supercluster that powers self-driving development are the machine-vision analog of the internet-scale text data that gave rise to LLMs. Once the data is there, we'll see the same moment for vision models that text prediction models had, where an enormous foundation model becomes much, much better, especially at zero-shot tasks.
FSD is already using the fruits of this today with their end-to-end NN.
And based on what we've seen, the results haven't improved enough to put them close to Waymo.