Untitled

a guest
May 24th, 2017
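from lsirina.lsi.similarity import calc_similarity

# Three toy documents and a query to rank them against.
d1 = "Shipment of gold damaged in a fire"
d2 = "Delivery of silver arrived in a silver truck"
d3 = "Shipment of gold arrived in a truck"
query = "gold silver truck"

docs = [d1, d2, d3]
# Whitespace tokenization; case and punctuation are left untouched.
tokenized_docs = [d.split() for d in docs]

# The code below treats the result as one score per document, in docs order.
similarities = calc_similarity(query, tokenized_docs)

# Rank (index, score) pairs by descending score and keep only positive scores.
ranked = sorted(enumerate(similarities), key=lambda item: -item[1])
ranked = [(index, score) for index, score in ranked if score > 0]

for index, score in ranked:
    # A single parenthesized argument keeps this line valid on Python 2 and 3.
    print("document %s: %s, similarity score: %s" % (index + 1, docs[index], score))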
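
The snippet above depends on `lsirina`, whose internals are not shown in this paste. To try the ranking step without that package, here is a minimal sketch that swaps in a plain bag-of-words cosine similarity. This is not LSI and not the library's method: `cosine_scores` is a hypothetical stand-in for `calc_similarity`, and lowercasing the tokens is an assumption of this sketch.

```python
import math
from collections import Counter


def cosine_scores(query, tokenized_docs):
    """Hypothetical stand-in for calc_similarity: bag-of-words cosine, not LSI."""
    query_counts = Counter(query.lower().split())
    query_norm = math.sqrt(sum(c * c for c in query_counts.values()))
    scores = []
    for tokens in tokenized_docs:
        doc_counts = Counter(token.lower() for token in tokens)
        doc_norm = math.sqrt(sum(c * c for c in doc_counts.values()))
        # Dot product over the query terms; absent terms count as zero.
        dot = sum(count * doc_counts[term] for term, count in query_counts.items())
        norm = query_norm * doc_norm
        scores.append(dot / norm if norm else 0.0)
    return scores


docs = [
    "Shipment of gold damaged in a fire",
    "Delivery of silver arrived in a silver truck",
    "Shipment of gold arrived in a truck",
]
scores = cosine_scores("gold silver truck", [d.split() for d in docs])

# Same ranking logic as above: descending score, positive scores only.
ranked = [(i, s) for i, s in sorted(enumerate(scores), key=lambda item: -item[1]) if s > 0]

for index, score in ranked:
    print("document %s: %s, similarity score: %.3f" % (index + 1, docs[index], score))
```

With raw term counts this ranks document 2 first, then document 3, then document 1. An LSI-based `calc_similarity` projects documents into a reduced space first, so its scores (and possibly its ordering) may differ.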