Item 42241391

revnode • 3 days ago

MD5 hash collisions are unlikely to happen at random. The defect is that an attacker can produce them deliberately, which makes MD5 useless for security.

aphantastic • 3 days ago

Sure, but theoretically you could have a system where a distributed log of user-generated content is built via this CAS/MD5 primitive. A malicious actor could craft the data such that entries are dropped.

revnode • 2 days ago

My understanding of the feature, and correct me if I'm wrong, is that you are not granted write access based on a hash. You already have write access. You can use the hash to avoid overwriting someone else's data that was appended to the file in between you checking the file and writing to it. If you already have write access, the hash is irrelevant. As a bad actor, you can corrupt the data without it.
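To make the compare-and-swap idea concrete, here is a minimal sketch. `BlobStore`, `get`, and `put_if_match` are hypothetical names for an in-memory stand-in, not any real storage API; the only assumption is that the store compares an MD5 of the current contents against the hash the writer saw earlier.

```python
import hashlib


class BlobStore:
    """Hypothetical in-memory stand-in for a store with conditional writes."""

    def __init__(self):
        self._blobs = {}

    def get(self, key):
        # Return the current contents together with their MD5 tag.
        data = self._blobs.get(key, b"")
        return data, hashlib.md5(data).hexdigest()

    def put_if_match(self, key, data, expected_tag):
        # Write only if the blob still hashes to what the caller saw.
        current = self._blobs.get(key, b"")
        if hashlib.md5(current).hexdigest() != expected_tag:
            return False
        self._blobs[key] = data
        return True


store = BlobStore()

# Writers A and B both read the same (empty) blob.
data_a, tag_a = store.get("log")
data_b, tag_b = store.get("log")

# B writes first and succeeds.
b_ok = store.put_if_match("log", data_b + b"line from B\n", tag_b)

# A's write is now stale: the hash no longer matches, so it is rejected
# instead of silently overwriting B's line.
a_ok = store.put_if_match("log", data_a + b"line from A\n", tag_a)
```

Both writers already hold write access here; the hash only decides whether a write is stale.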

MD5 should not be used for anything security related. Granting write access based on an MD5 hash would be a huge no-no.

aphantastic • 1 day ago

Right, the issue comes when a trusted writer is logging data that is sourced from an untrusted party.

Imagine a transaction log being a blob per customer with many lines corresponding to price, SKU, etc., that additionally have some “memo” field provided by the customer. A trusted distributed worker process is responsible for taking incoming requests by the user, pulling their blob down, appending the line based on the request, and CAS’ing it back in (retrying on failure). With enough effort, a particularly devious user could issue many requests with “memo” values engineered to not alter the MD5 of their log. This would cause some lines to be lost. An audit of their account transaction log would be unable to accurately reflect the requests they made to the service, and the failure would be invisible.
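A rough simulation of that failure mode, under loud assumptions: `weak_tag` is a deliberately trivial checksum standing in for MD5 so that a "collision" can be brute-forced in microseconds, and `BlobStore`, `append_line`, and `craft_line` are hypothetical names. Nothing here demonstrates an actual MD5 collision; it only shows what the CAS loop does once the tag fails to change.

```python
import itertools
import string


def weak_tag(data: bytes) -> int:
    # Toy checksum standing in for MD5 (assumption for illustration only).
    return sum(data) % 97


class BlobStore:
    """Hypothetical single-blob store with a tag-based conditional write."""

    def __init__(self):
        self._data = b""

    def get(self):
        return self._data, weak_tag(self._data)

    def put_if_match(self, data, expected_tag):
        if weak_tag(self._data) != expected_tag:
            return False
        self._data = data
        return True


def append_line(store, line, max_retries=5):
    # Trusted worker: pull the blob, append, CAS it back, retry on failure.
    for _ in range(max_retries):
        data, tag = store.get()
        if store.put_if_match(data + line, tag):
            return True
    return False


def craft_line(prefix: bytes) -> bytes:
    # Attacker: search for a memo whose line leaves the blob's tag unchanged.
    for memo in itertools.product(string.ascii_letters.encode(), repeat=2):
        line = prefix + bytes(memo) + b"\n"
        if weak_tag(line) == 0:
            return line
    raise ValueError("no suitable memo found")


store = BlobStore()
append_line(store, b"sku=1,memo=first\n")

# Worker A reads the blob, then stalls before writing.
seen_data, seen_tag = store.get()

# Meanwhile worker B appends the attacker's crafted line.
crafted = craft_line(b"sku=2,memo=")
append_line(store, crafted)

# Worker A resumes. The tag is unchanged, so its stale write is accepted
# and the crafted line disappears without any error being raised.
accepted = store.put_if_match(seen_data + b"sku=3,memo=third\n", seen_tag)
final, _ = store.get()
```

After this runs, `final` holds only the first and third lines, and neither worker saw a failed write.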

This is obviously a bit contrived, I’ll be the first to admit. But if some system gave someone enough incentive to make this worth their time, I think it would likely come up eventually.