jsnell 5 days ago

The mode wasn't added after the announcement; Aider has had it for almost a year: https://aider.chat/HISTORY.html#aider-v0320

This benchmark has an authoritative source of results (the leaderboard), so it seems obvious that the leaderboard number is the one that should be used.

modeless 5 days ago

OK, but it was still added specifically to improve Gemini, and nobody else on the leaderboard uses it. Google themselves do not use it when they benchmark their own models against others; they use the regular diff mode that everyone else uses. https://blog.google/technology/google-deepmind/gemini-model-...