The probability of a hash collision

learnbyexample@programming.dev · 1 year ago

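As a rough guide to the title question: for an idealized hash with N equally likely outputs, the chance that n random inputs produce at least one collision is the classic birthday problem. This is a minimal sketch that assumes the hash behaves like a uniform random function, which real hash functions only approximate. The function names are illustrative.

```python
import math

def collision_probability_exact(n: int, N: int) -> float:
    """Exact P(at least one collision) for n items hashed uniformly into N values."""
    if n > N:
        return 1.0  # pigeonhole: more items than hash values guarantees a collision
    p_no_collision = 1.0
    for i in range(n):
        p_no_collision *= (N - i) / N  # i-th item must avoid the i values already used
    return 1.0 - p_no_collision

def collision_probability_approx(n: int, N: int) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / (2N)); expm1 keeps small values accurate."""
    return -math.expm1(-n * (n - 1) / (2 * N))

# Classic case: 23 people, 365 possible birthdays.
print(round(collision_probability_exact(23, 365), 3))         # 0.507
# A 64-bit hash after 2**32 inputs (approximation only; the exact loop would be too slow).
print(round(collision_probability_approx(2**32, 2**64), 3))   # 0.393
```

In other words, collisions become likely once the number of inputs approaches roughly the square root of the output space, far sooner than intuition suggests.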

squaresinger@lemmy.world · 1 year ago

I see what you are saying. But if you aren’t using a cryptographic hash function, then collisions don’t matter in your use case anyway; otherwise you’d be using a cryptographic hash function.

For example, you’d use a non-cryptographic hash function for a hashmap. While collisions aren’t exactly desirable in that use case, they also aren’t bad; in fact, the whole data structure is designed with them in mind. And it doesn’t matter at all that the distribution might not be perfect.
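One common way a hashmap is "designed with collisions in mind" is separate chaining: each bucket holds a small list, and keys that land in the same bucket are told apart by an equality check. This is a minimal sketch of a hypothetical `ChainedHashMap` with a fixed bucket count and no resizing, not any particular language's implementation (CPython's `dict`, for example, uses open addressing instead).

```python
from typing import Any, Callable, Hashable, List, Tuple

class ChainedHashMap:
    """Illustrative separate-chaining hash map (fixed bucket count, no resizing)."""

    def __init__(self, num_buckets: int = 8,
                 hash_fn: Callable[[Hashable], int] = hash) -> None:
        self.buckets: List[List[Tuple[Hashable, Any]]] = [[] for _ in range(num_buckets)]
        self.hash_fn = hash_fn

    def _bucket(self, key: Hashable) -> List[Tuple[Hashable, Any]]:
        # Different keys may map to the same index: a collision, and an expected one.
        return self.buckets[self.hash_fn(key) % len(self.buckets)]

    def put(self, key: Hashable, value: Any) -> None:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:  # same key already present: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))  # new key, possibly sharing the bucket with others

    def get(self, key: Hashable) -> Any:
        for k, v in self._bucket(key):
            if k == key:  # the equality check resolves collisions within a bucket
                return v
        raise KeyError(key)

# With only 2 buckets, the 10 keys collide heavily, yet every lookup is still correct.
m = ChainedHashMap(num_buckets=2)
for i in range(10):
    m.put(i, i * i)
print(m.get(7))  # 49
```

Collisions here only make a bucket's list longer; they never make the map return a wrong value.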

So when we are talking about a context where collisions matter, there’s no question whether you should use a cryptographic hash or not.

tyler@programming.dev · 1 year ago

Why wouldn’t collisions matter in a hash map? They directly affect the speed of the hash map. In fact, I would venture to say that collisions directly affect speed in all situations. That matters, right? Especially at the language level.
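The speed cost of collisions can be made concrete by counting how many key comparisons lookups need in a chained table. This is a minimal sketch that assumes separate chaining and uses two illustrative hash functions: Python's built-in `hash` and a deliberately degenerate, hypothetical `constant_hash` that sends every key to the same bucket.

```python
from typing import Callable, Hashable, List

def total_comparisons(keys: List[Hashable], num_buckets: int,
                      hash_fn: Callable[[Hashable], int]) -> int:
    """Build a chained table from unique `keys`, then count comparisons to look up each key."""
    buckets: List[List[Hashable]] = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[hash_fn(key) % num_buckets].append(key)

    comparisons = 0
    for key in keys:
        for candidate in buckets[hash_fn(key) % num_buckets]:
            comparisons += 1  # one equality check per entry scanned
            if candidate == key:
                break
    return comparisons

def constant_hash(key: Hashable) -> int:
    return 0  # worst case: every key collides

keys = list(range(1000))
print(total_comparisons(keys, 1024, hash))           # 1000: one check per lookup
print(total_comparisons(keys, 1024, constant_hash))  # 500500: one long chain, O(n) per lookup
```

Both tables return correct results; the difference is purely in how much work each lookup does.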

squaresinger@lemmy.world · 1 year ago

If you have a hash collision in a cryptography context, you have a broken system. E.g. MD5 became useless for validating files, because anyone can create collisions without a ton of effort, and thus comparing an MD5 sum doesn’t tell you whether you have an unmodified file or not.
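Why a weak or short hash becomes useless for verification can be illustrated without MD5 itself. If only the first few bits of a digest are compared, a brute-force birthday search finds two different inputs with matching digests almost immediately. This is a minimal sketch that assumes SHA-256 truncated to a hypothetical 20 bits. It is not the actual MD5 attack, which exploits structural weaknesses in the algorithm rather than brute force.

```python
import hashlib
from typing import Dict, Tuple

def truncated_hash(data: bytes, bits: int) -> int:
    """First `bits` bits of SHA-256, standing in for a hash that is too short."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)

def find_collision(bits: int) -> Tuple[bytes, bytes]:
    """Birthday search: hash distinct messages until two share a truncated digest."""
    seen: Dict[int, bytes] = {}
    counter = 0
    while True:  # terminates by pigeonhole within 2**bits + 1 messages
        message = f"file version {counter}".encode()
        h = truncated_hash(message, bits)
        if h in seen:
            return seen[h], message
        seen[h] = message
        counter += 1

a, b = find_collision(20)
print(a, b)  # two different messages whose 20-bit hashes are identical
```

Each extra bit doubles the output space, but the expected work for a birthday search grows only with its square root, roughly 2^(bits/2) attempts.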

In a hash map, collisions are part of the system. Sure, you’d like to avoid collisions if possible, but if you can’t, you’ll just have two values in the same bucket; no big issue.

In fact, having a more complex hashing algorithm that would guarantee that there are no collisions will likely hurt your performance more because calculating the hash will take so long.