Some sources mention that o3 scores 63.8 on SWE-bench, while Gemini 2.5 Pro scores 69.1.
On most other benchmarks, they seem to perform about the same. That's bad news for o3: it's much more expensive and slower than Gemini 2.5 Pro, and it hides its reasoning while Gemini shows everything.
We can probably just stick with Gemini 2.5 Pro, since it offers the best combination of price, quality, and speed. No need to worry about finding a replacement (for now).
> Some sources mention that o3 scores 63.8 on SWE-bench, while Gemini 2.5 Pro scores 69.1.
It's the opposite. o3 scores higher.