Jarwain 1 day ago

Aider's benchmarks show that 4.1 (and 4o) work better in its architect mode, for planning the changes, with o3 making the actual edits.

SparkyMcUnicorn 23 hours ago

You have that backwards. The leaderboard results have the thinking model as the architect.

In this case, o3 is the architect and 4.1 is the editor.

drewnick 18 hours ago

I see o3 (high) + gpt-4.1 at 82.7% -- the highest on the benchmark currently.