Item 42245945

stormfather • 2 days ago

I think they will though, I think the enormous corpus of video data and the supercluster that powers self driving development are the machine vision analog of internet scale text data that gave rise to LLMs. We'll see the same moment for vision models that text prediction models had once the data is there, where an enormous foundation model becomes much much better, especially at zero-shot tasks.

threeseed • 1 day ago

FSD is already using the fruits of this today with their end to end NN.

And based on what we've seen the results haven't improved enough to put them close to Waymo.