There are quite a few monocular depth estimation models out there, and have been for years. This one looks pretty good. That said, the temporal stability looks pretty wobbly from frame to frame, so I don't think I'd use it for a self-driving car.
The most impressive example was the point cloud they generated from the extreme fisheye lens. That was nice.
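For anyone wondering what turning a fisheye depth map into a point cloud involves, here's a rough sketch. It assumes an equidistant fisheye model (r = f * theta) and that each depth value is a distance along the viewing ray. Both are my assumptions for illustration, and `fisheye_unproject` is a made-up name, not anything from the project.

```python
import math

def fisheye_unproject(depth, f, cx, cy):
    """Back-project a per-pixel ray-distance map to 3D points.

    Assumes an equidistant fisheye model (r = f * theta); this is an
    illustrative choice, not necessarily what the model itself uses.
    depth[v][u] is the distance along the viewing ray, in any length unit.
    f is the focal length in pixels, (cx, cy) the principal point.
    """
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            dx, dy = u - cx, v - cy
            r = math.hypot(dx, dy)      # pixel distance from the principal point
            theta = r / f               # angle off the optical axis
            phi = math.atan2(dy, dx)    # azimuth around the optical axis
            ray = (math.sin(theta) * math.cos(phi),
                   math.sin(theta) * math.sin(phi),
                   math.cos(theta))     # unit-length viewing direction
            points.append(tuple(d * c for c in ray))
    return points
```

With this model a pixel 90 degrees off-axis ends up directly to the side of the camera (z close to 0), which is exactly the case a plain pinhole model can't represent.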
Predicting that the background on Cloud City was a flat matte painting is also impressive in a way. It does seem to collapse all far-field objects into a single plane, though. That's a decent compromise for many uses.
Another leading monocular depth estimation model, Marigold [1], is also from ETH.