Untitled

a guest
Mar 28th, 2017
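    # Requires `from hashlib import md5` at module level and a list in `self.shs`.
    def simhash(self, texts):
        """Append one 128-bit SimHash bit string to self.shs for each text."""
        for text in texts:
            # One 128-character binary string per whitespace-separated word.
            # md5() needs bytes on Python 3, so each word is encoded first.
            hashes = [bin(int(md5(word.encode("utf-8")).hexdigest(), 16))[2:].zfill(128)
                      for word in text.split()]

            # Per bit position: (number of 1s) - (number of 0s) across all words.
            sh = [2 * column.count("1") - len(column) for column in zip(*hashes)]
            # Majority vote: 1 where ones are at least as frequent as zeros.
            sh = ''.join('1' if weight >= 0 else '0' for weight in sh)

            self.shs.append(sh)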
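
The method above depends on a surrounding class that the paste does not show. As a rough, self-contained sketch of the same idea, assuming Python 3 and UTF-8 encoding of words, it could be written as a standalone function. The names `simhash_bits` and `hamming` are hypothetical and not part of the original paste; `hamming` is only added here to show one way the resulting bit strings might be compared.

```python
from hashlib import md5


def simhash_bits(text):
    """Return a 128-character SimHash bit string for `text` (hypothetical helper)."""
    # Assumption: words are encoded as UTF-8 before hashing.
    hashes = [bin(int(md5(word.encode("utf-8")).hexdigest(), 16))[2:].zfill(128)
              for word in text.split()]
    # Per-bit weight: number of 1s minus number of 0s across all word hashes.
    weights = [2 * column.count("1") - len(column) for column in zip(*hashes)]
    # Majority vote per bit; ties resolve to 1, as in the method above.
    return "".join("1" if w >= 0 else "0" for w in weights)


def hamming(a, b):
    """Count differing positions between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


a = simhash_bits("the quick brown fox jumps over the lazy dog")
b = simhash_bits("the quick brown fox jumps over the lazy cat")
print(len(a), hamming(a, a), hamming(a, b))
```

Texts that share most of their words would be expected to produce a small Hamming distance, though the exact value depends on the MD5 digests of the differing words. Note that an empty text yields an empty string rather than 128 bits, which matches the behavior of the original method.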