cahaya 5 days ago

Wondering how this compares to the Gemini (preview) embeddings, as they seem to perform significantly better than OpenAI's text-embedding-3-large. I don't see any MTEB scores, so it's hard to compare.

mahjongmen 5 days ago

Hey Cahaya,

While we benchmarked internally on BEIR, we opted not to report our model on MTEB for the following reasons:

1) MTEB has been gamed - if you look at this model (https://huggingface.co/voyageai/voyage-3-m-exp) on the MTEB leaderboard, it's an intermediate checkpoint of Voyage-3-Large that was fine-tuned on datasets resembling the MTEB datasets.

2) If you look at the recent datasets in MMTEB, you'll find quite a lot of machine-translated or otherwise "weird" datasets that are quite noisy.

In general, for our search models we benchmark on these public academic datasets, but we definitely do not try to hill-climb in this direction, as we find it has little correlation with real use cases.
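
For readers unfamiliar with how BEIR-style retrieval benchmarks score an embedding model, here is a minimal sketch (not Voyage's actual pipeline): embed queries and documents, rank documents by cosine similarity, and compute nDCG@10 against graded relevance judgments. The embeddings and qrels below are toy stand-ins for illustration.

    import numpy as np

    def ndcg_at_k(ranked_doc_ids, relevance, k=10):
        """nDCG@k for one query; `relevance` maps doc_id -> graded relevance."""
        gains = [relevance.get(d, 0) for d in ranked_doc_ids[:k]]
        dcg = sum(g / np.log2(i + 2) for i, g in enumerate(gains))
        ideal = sorted(relevance.values(), reverse=True)[:k]
        idcg = sum(g / np.log2(i + 2) for i, g in enumerate(ideal))
        return dcg / idcg if idcg > 0 else 0.0

    # Toy corpus: 4 documents, 1 query; pretend these vectors came from an embedding model.
    rng = np.random.default_rng(0)
    doc_ids = ["d1", "d2", "d3", "d4"]
    doc_emb = rng.normal(size=(4, 8))
    query_emb = rng.normal(size=(8,))

    # Rank documents by cosine similarity to the query.
    doc_norm = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    q_norm = query_emb / np.linalg.norm(query_emb)
    scores = doc_norm @ q_norm
    ranking = [doc_ids[i] for i in np.argsort(-scores)]

    # Graded relevance judgments (qrels) for this query.
    qrels = {"d2": 2, "d4": 1}
    print(f"nDCG@10 = {ndcg_at_k(ranking, qrels):.3f}")

The benchmark score is just this metric averaged over all queries in a dataset, which is part of why it can be hill-climbed without the model getting better on real workloads.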