Untitled

a guest
Dec 8th, 2014
0        http://m.facebook.com/l.php?u=http%3A%2F%2Fwww...
1        http://m.facebook.com/l.php?u=http%3A%2F%2Fwww...
2        http://m.facebook.com/l.php?u=http%3A%2F%2Fwww...
3        http://m.facebook.com/l.php?u=http%3A%2F%2Fwww...
4        http://m.facebook.com/l.php?u=http%3A%2F%2Fwww...
5        http://m.facebook.com/l.php?u=http%3A%2F%2Fwww...
6        http://m.facebook.com/l.php?u=http%3A%2F%2Fwww...
7        http://m.facebook.com/l.php?u=http%3A%2F%2Fwww...
8        http://m.jnu.ac.kr/board/mboard.aspx?boardID=13
9        http://m.jnu.ac.kr/board/mboard.aspx?boardID=1...
10       http://m.jnu.ac.kr/board/mboard.aspx?boardID=13
11       http://m.jnu.ac.kr/subPage/subMenuList.aspx?ta...
12       http://m.jnu.ac.kr/
13       http://m.jnu.ac.kr/board/mboard.aspx?boardID=1...
14       http://m.jnu.ac.kr/board/mboard.aspx?boardID=13
...
18544    http://cr.naver.com/rd?m=1&px=40&py=41&sx=40&s...
18545    http://m.blog.naver.com/snakeshower/140209383453
18546    http://m.blog.naver.com/snakeshower/140209383453
18547    http://m.search.naver.com/search.naver?where=m...
18548    http://m.search.naver.com/search.naver?where=m...
18549    http://m.search.naver.com/search.naver?sm=mtb_...
18550    http://cr.naver.com/rd?m=1&px=40&py=41&sx=40&s...
18551    http://m.blog.naver.com/snakeshower/140209383453
18552    http://cr.naver.com/rd?m=1&px=40&py=41&sx=40&s...
18553    http://m.search.naver.com/search.naver?where=m...
18554    http://m.search.naver.com/search.naver?where=m...
18555    http://m.search.naver.com/search.naver?where=m...
18556    http://cr.naver.com/rd?m=1&px=40&py=41&sx=40&s...
18557    http://m.blog.naver.com/snakeshower/140209383453
18558    http://m.search.naver.com/search.naver?sm=mtb_...
Name: url, Length: 18559, dtype: object
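
import multiprocessing
import pprint
from difflib import SequenceMatcher

from cluster import HierarchicalClustering  # provided by the python-cluster package


def distance(url1, url2):
    # Dissimilarity in [0, 1]: 0.0 for identical URLs, 1.0 for no matching characters.
    ratio = SequenceMatcher(None, url1, url2).ratio()
    return 1.0 - ratio


def clusters(urls):
    # Cluster one list of URLs and cut the hierarchy at distance 0.2.
    hc = HierarchicalClustering(list(urls), distance)
    return hc.getlevel(0.2)


def multiProcessing(urls, workers=4):
    # Pool.map calls clusters() once per item, so each item must be a list of
    # URLs, not a single URL string. Splitting into one chunk per worker is an
    # assumption: URLs that land in different chunks are never compared.
    chunks = [urls[i::workers] for i in range(workers)]
    p = multiprocessing.Pool(workers)
    try:
        return p.map(clusters, chunks)
    finally:
        p.close()
        p.join()


if __name__ == '__main__':
    # `url` is the pandas Series printed above; the paste does not show how it was loaded.
    urls = url.tolist()
    pprint.pprint(clusters(urls))         # single-process clustering
    pprint.pprint(multiProcessing(urls))  # chunked clustering in four worker processes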
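
To get a feel for what the 0.2 cut-off passed to getlevel means, the following is a minimal standard-library sketch rather than part of the original paste. The helper name pairwise_distances and the sample list are illustrative assumptions; the sample holds only the untruncated URLs visible in the Series above. It drops duplicates before comparing, since identical URLs are always at distance 0.0 and the Series repeats many of them.

```python
from difflib import SequenceMatcher
from itertools import combinations


def distance(url1, url2):
    """Return 1 - SequenceMatcher ratio; 0.0 means the strings are identical."""
    return 1.0 - SequenceMatcher(None, url1, url2).ratio()


def pairwise_distances(urls):
    """Map each unordered pair of distinct URLs to its distance (hypothetical helper)."""
    unique = sorted(set(urls))  # compare each distinct URL only once
    return {(a, b): distance(a, b) for a, b in combinations(unique, 2)}


# Untruncated URLs copied from the Series printed above.
sample = [
    "http://m.jnu.ac.kr/board/mboard.aspx?boardID=13",
    "http://m.jnu.ac.kr/board/mboard.aspx?boardID=13",
    "http://m.jnu.ac.kr/",
    "http://m.blog.naver.com/snakeshower/140209383453",
]

if __name__ == "__main__":
    for (a, b), d in pairwise_distances(sample).items():
        # Pairs with d <= 0.2 are the ones a 0.2 threshold would treat as close.
        print("%.3f  %s  %s" % (d, a, b))
```

Because the number of comparisons grows quadratically with the number of distinct URLs, deduplicating first may noticeably reduce the work on a Series of this length, though how much depends on how many URLs repeat.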