Huh, it seems Aider added a special mode specifically for Gemini[1] some time after Google's announcement blog post with official performance numbers. I'm still not sure it makes sense to quote that new score alongside the others. In any case, Gemini's 69% is the top score even without the special mode.
[1] https://aider.chat/docs/more/edit-formats.html#diff-fenced:~...
The mode wasn't added after the announcement; Aider has had it for almost a year: https://aider.chat/HISTORY.html#aider-v0320
This benchmark has an authoritative source of results (the leaderboard), so that seems like the obvious number to use.
OK, but it was still added specifically to improve Gemini's results, and nobody else on the leaderboard uses it. Google themselves don't use it when benchmarking their own models against others; they use the regular diff mode that everyone else uses: https://blog.google/technology/google-deepmind/gemini-model-...