Item 43712041

achierius • 3 days ago

How is this a notable release? It's strictly worse than Gemini 2.5 on coding etc., and only an iterative improvement over their own models. The only thing that struck me as particularly interesting was the native visual reasoning.

og_kalu • 3 days ago

It's not worse on coding. SWE-bench, Aider, and LiveBench coding all show noticeably better results.