MinHash — How To Deal With Finding Similarity At Scale With Python Code To Get Started
MinHash is a technique for estimating the similarity between two sets. It was first introduced in information retrieval to evaluate the similarity between documents quickly. The basic idea is to hash the elements of the sets and then take the minimum hash value as a representation of the set. Because the minimum value is used, the technique is called MinHash
more details