- 0 http://m.facebook.com/l.php?u=http%3A%2F%2Fwww...
- 1 http://m.facebook.com/l.php?u=http%3A%2F%2Fwww...
- 2 http://m.facebook.com/l.php?u=http%3A%2F%2Fwww...
- 3 http://m.facebook.com/l.php?u=http%3A%2F%2Fwww...
- 4 http://m.facebook.com/l.php?u=http%3A%2F%2Fwww...
- 5 http://m.facebook.com/l.php?u=http%3A%2F%2Fwww...
- 6 http://m.facebook.com/l.php?u=http%3A%2F%2Fwww...
- 7 http://m.facebook.com/l.php?u=http%3A%2F%2Fwww...
- 8 http://m.jnu.ac.kr/board/mboard.aspx?boardID=13
- 9 http://m.jnu.ac.kr/board/mboard.aspx?boardID=1...
- 10 http://m.jnu.ac.kr/board/mboard.aspx?boardID=13
- 11 http://m.jnu.ac.kr/subPage/subMenuList.aspx?ta...
- 12 http://m.jnu.ac.kr/
- 13 http://m.jnu.ac.kr/board/mboard.aspx?boardID=1...
- 14 http://m.jnu.ac.kr/board/mboard.aspx?boardID=13
- ...
- 18544 http://cr.naver.com/rd?m=1&px=40&py=41&sx=40&s...
- 18545 http://m.blog.naver.com/snakeshower/140209383453
- 18546 http://m.blog.naver.com/snakeshower/140209383453
- 18547 http://m.search.naver.com/search.naver?where=m...
- 18548 http://m.search.naver.com/search.naver?where=m...
- 18549 http://m.search.naver.com/search.naver?sm=mtb_...
- 18550 http://cr.naver.com/rd?m=1&px=40&py=41&sx=40&s...
- 18551 http://m.blog.naver.com/snakeshower/140209383453
- 18552 http://cr.naver.com/rd?m=1&px=40&py=41&sx=40&s...
- 18553 http://m.search.naver.com/search.naver?where=m...
- 18554 http://m.search.naver.com/search.naver?where=m...
- 18555 http://m.search.naver.com/search.naver?where=m...
- 18556 http://cr.naver.com/rd?m=1&px=40&py=41&sx=40&s...
- 18557 http://m.blog.naver.com/snakeshower/140209383453
- 18558 http://m.search.naver.com/search.naver?sm=mtb_...
- Name: url, Length: 18559, dtype: object
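- from difflib import SequenceMatcher
- import multiprocessing
- import pprint
- from cluster import HierarchicalClustering  # python-cluster package
- def distance(url1, url2):
-     # 0.0 for identical strings, approaching 1.0 as they share less
-     ratio = SequenceMatcher(None, url1, url2).ratio()
-     return 1.0 - ratio
- def cluster_urls(urls):
-     # Cluster one list of URLs; 0.2 is the distance threshold for a cluster
-     hc = HierarchicalClustering(urls, distance)
-     return hc.getlevel(0.2)
- def multiProcessing(urls, workers=4):
-     # Pool.map calls cluster_urls once per item, so pass it chunks of URLs,
-     # not single URLs. Each chunk is clustered independently of the others.
-     chunks = [urls[i::workers] for i in range(workers)]
-     with multiprocessing.Pool(workers) as p:
-         return p.map(cluster_urls, chunks)
- if __name__ == '__main__':
-     # url is the pandas Series printed above (not defined in this paste)
-     urls = url.tolist()
-     # Single-process version
-     pprint.pprint(cluster_urls(urls))
-     # Multi-process version: one list of clusters per chunk
-     pprint.pprint(multiProcessing(urls))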
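The Series above repeats many URLs (for example, indices 8, 10 and 14 are identical), and hierarchical clustering compares items pairwise, so collapsing duplicates first may shrink the work considerably. The following is only a sketch under that assumption, using the standard library alone; `unique_with_counts` and the `sample` list are hypothetical names introduced here, and the sample uses only URLs that appear untruncated in the printout.

```python
from collections import Counter
from difflib import SequenceMatcher


def distance(url1, url2):
    # Same distance as in the paste: 1 minus the difflib similarity ratio.
    return 1.0 - SequenceMatcher(None, url1, url2).ratio()


def unique_with_counts(urls):
    # Hypothetical helper: returns the distinct URLs in first-seen order
    # plus a Counter recording how often each one occurred.
    counts = Counter(urls)
    return list(counts), counts


# Small illustrative sample taken from the untruncated rows above.
sample = [
    "http://m.jnu.ac.kr/board/mboard.aspx?boardID=13",
    "http://m.jnu.ac.kr/",
    "http://m.jnu.ac.kr/board/mboard.aspx?boardID=13",
    "http://m.blog.naver.com/snakeshower/140209383453",
]

unique_urls, counts = unique_with_counts(sample)

# Pairwise distances over the distinct URLs only.
pairs = {
    (a, b): distance(a, b)
    for i, a in enumerate(unique_urls)
    for b in unique_urls[i + 1:]
}
```

The distinct URLs could then be passed to the clustering step in place of the full list, with `counts` kept alongside to recover how many original rows each cluster represents.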