layer8 4 days ago

Good point, but the similarity score between mutual matches is still different, so it doesn’t seem to be a symmetric measure?

antirez 4 days ago

Your observation is really acute: the small difference is due to quantization. When we search for element A, which is int8 quantized by default, the code path de-quantizes it, then re-quantizes it and searches. This produces a small loss of precision, like this:

redis-cli -3 VSIM hn_fingerprint ELE pg WITHSCORES | grep montrose

montrose 0.8640020787715912

redis-cli -3 VSIM hn_fingerprint ELE montrose WITHSCORES | grep pg

pg 0.8639097809791565

So while cosine similarity is commutative, the quantization steps lead to a slightly different result. But the difference is 0.000092, which in practical terms is not important. Redis can store non-quantized vectors using the NOQUANT option in VADD, but this makes each vector use 4 bytes per component: given that the recall difference is minimal, it is almost always not worth it.
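
To make the effect concrete, here is a toy sketch in Python. It is not the Redis code: the per-vector max-abs int8 scheme, the helper names (`store`, `score`) and the 3-component vectors are all assumptions chosen for illustration. In this toy, the query goes through a de-quantize, re-normalize, re-quantize round trip while the stored target does not, and that alone is enough to make score(A, B) differ from score(B, A).

```python
import math

def normalize(v):
    """Scale v to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def quantize(v):
    """Toy int8 quantization (assumed scheme): per-vector max-abs scaling."""
    scale = max(abs(x) for x in v) / 127.0
    return [round(x / scale) for x in v], scale

def dequantize(q, scale):
    """Map the int8 components back to floats."""
    return [x * scale for x in q]

def store(v):
    """What the toy 'index' keeps for an element: normalized, then quantized."""
    return quantize(normalize(v))

def score(query_stored, target_stored):
    """Similarity when the query is an element already in the index.

    The query is de-quantized, re-normalized and re-quantized, while the
    target is used exactly as stored. Because the stored (quantized) vectors
    are no longer exactly unit length, the round trip rescales the query
    slightly, and by a different amount for each element.
    """
    q, q_scale = quantize(normalize(dequantize(*query_stored)))
    t, t_scale = target_stored
    dot = sum(a * b for a, b in zip(q, t))
    return dot * q_scale * t_scale

a = [1.0, 2.0, 3.0]
b = [1.0, 1.0, 3.0]

# Exact cosine similarity in floating point: symmetric by construction.
exact = sum(x * y for x, y in zip(normalize(a), normalize(b)))

sa, sb = store(a), store(b)
s_ab = score(sa, sb)  # query with a, find b
s_ba = score(sb, sa)  # query with b, find a

print("exact cosine:", exact)
print("a -> b      :", s_ab)
print("b -> a      :", s_ba)
print("difference  :", abs(s_ab - s_ba))
```

With only 3 components the gap is much larger than the 0.000092 above; with many more components per vector the per-vector rescaling is much smaller, which is consistent with the difference being negligible in practice.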