As a user, I'm getting so confused about what's "best" in various categories. I don't have the time (or desire) to dig into benchmarks for each category, or to look through the example data to see which model best maps onto my current problems.
The graphs presented don't even show a clear winner across all categories. The one with the biggest "number", GPT-4.5, isn't even the best in most categories; actually, it's like 3rd in a lot of them.
This is quite confusing as a user.
Otherwise, big fan of OAI products thus far. I keep paying $20/mo, and they keep improving across the board.
I think "best" is slightly subjective per user. But I understand your gripe. I think the only way is to use them iteratively and settle on the one that best fits you and your use case, while reading other people's experiences and getting a general vibe.