I just finished updating the aider polyglot leaderboard [0] with GPT-4.1, mini and nano. My results basically agree with OpenAI's published numbers.
Results, with other models for comparison:
Model                           Score    Cost
Gemini 2.5 Pro Preview 03-25    72.9%  $ 6.32
claude-3-7-sonnet-20250219      64.9%  $36.83
o3-mini (high)                  60.4%  $18.16
Grok 3 Beta                     53.3%  $11.03
* gpt-4.1                       52.4%  $ 9.86
Grok 3 Mini Beta (high)         49.3%  $ 0.73
* gpt-4.1-mini                  32.4%  $ 1.99
gpt-4o-2024-11-20               18.2%  $ 6.74
* gpt-4.1-nano                   8.9%  $ 0.43
Aider v0.82.0 is also out with support for these new models [1]. Aider wrote 92% of the code in this release, a tie with v0.78.0 from 3 weeks ago.

Did you benchmark the combo: DeepSeek R1 + DeepSeek V3 (0324)? There is a combo in 3rd place, DeepSeek R1 + claude-3-5-sonnet-20241022, and the new V3 also beats Claude 3.5, so in theory R1 + V3 should land around 2nd place. Just curious if that would be the case.
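For reference, the leaderboard's "combo" entries come from aider's architect mode, where one model plans the change and a second model applies the edits. And for anyone who wants to drive the newly supported models from a script rather than the CLI, aider also has a small Python scripting API; the sketch below is only illustrative (the "gpt-4.1" model string comes from the table above, and hello.py is a placeholder file name):

    # Minimal sketch of aider's Python scripting API; the model name and
    # target file below are illustrative assumptions, not a fixed recipe.
    from aider.coders import Coder
    from aider.models import Model

    model = Model("gpt-4.1")  # one of the newly supported models
    coder = Coder.create(main_model=model, fnames=["hello.py"])

    # Execute a single instruction against the listed files, then return.
    coder.run("add a docstring to the main function")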
What model are you personally using in your aider coding? :)
Mostly Gemini 2.5 Pro lately.
I get asked this often enough that I have a FAQ entry with automatically updating statistics [0].
Model            Tokens     Pct
Gemini 2.5 Pro   4,027,983  88.1%
Sonnet 3.7         518,708  11.3%
gpt-4.1-mini        11,775   0.3%
gpt-4.1             10,687   0.2%
[0] https://aider.chat/docs/faq.html#what-llms-do-you-use-to-bui...