LMArena isn't that useful anymore lol
I actually agree with that, but it's still generally more useful than other benchmark scores. Also, that quote is about a year old at this point.
In practice you have to evaluate the models yourself for any non-trivial task.