scrape 1000 random movies from imdb, create graph (actors ..

dirknbr, Oct 4th, 2012
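import requests
import re
import random

st=1000000   # lowest numeric title ID to sample
en=1345836   # upper bound of the ID range (exclusive)
n=1000       # number of titles to fetch
done=set()   # IDs already drawn, so that no title is fetched twice

f1=open('data2.csv','w')

# matches links to genre, name, keyword, country, language and company pages
s='href="/genre/[A-Za-z0-9/]+"|href="/name/[A-Za-z0-9/]+"|href="/keyword/[A-Za-z0-9/]+"|href="/country/[A-Za-z0-9/]+"|href="/language/[A-Za-z0-9/]+"|href="/company/[A-Za-z0-9/]+"'

for i in range(n):
    # draw a random ID, redrawing until it is one that has not been used
    r=int(st+(en-st)*random.random())
    while r in done:
        r=int(st+(en-st)*random.random())
    # record the ID after the loop; inside the loop it was never reached on a fresh draw
    done.add(r)
    url='http://www.imdb.com/title/tt'+str(r)
    resp=requests.get(url)
    print(url,resp.status_code)
    m=re.findall(s,resp.text)
    li=[]
    for m2 in m:
        # strip the leading 'href="/' and the closing quote
        m3=m2[7:len(m2)-1]
        # keep only the first occurrence of each link
        if m3 not in li:
            li.append(m3)
    # one line per title: url:item item item ...
    f1.write(url+':'+' '.join(li)+'\n')

f1.close()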
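
The draw-and-retry loop in the script could also be written with `random.sample`, which returns distinct values by construction. This is only a sketch of an alternative, not what the paste does; the names `ids` and `urls` are made up for illustration, and `st`, `en`, `n` are copied from the script.

```python
import random

# Values copied from the script above.
st = 1000000
en = 1345836
n = 1000

# random.sample picks n distinct values from the range,
# so no separate "done" set is needed to avoid repeats.
ids = random.sample(range(st, en), n)

# Hypothetical helper list: one title URL per sampled ID.
urls = ['http://www.imdb.com/title/tt' + str(r) for r in ids]
```

The range excludes `en` itself, which matches the original expression, since `random.random()` is always below 1.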
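
The paste title mentions creating a graph, but the script stops at writing `data2.csv`. One possible next step, sketched under the assumption that the graph simply links each title to the genre, name, keyword, country, language and company entries found on its page, might look like this. `parse_line` and `build_graph` are hypothetical helpers, and the sample lines are invented, not real scrape output.

```python
from collections import defaultdict

def parse_line(line):
    """Split one 'url:item item ...' line into (url, [items])."""
    # The URL contains a ':' of its own (after 'http'), so split on the last one.
    # The items never contain a colon because the regex only allows [A-Za-z0-9/].
    url, _, rest = line.rstrip('\n').rpartition(':')
    return url, rest.split()

def build_graph(lines):
    """Return an undirected adjacency map between titles and their attributes."""
    graph = defaultdict(set)
    for line in lines:
        url, items = parse_line(line)
        for item in items:
            graph[url].add(item)
            graph[item].add(url)
    return graph

# Invented sample lines in the same format the script writes.
sample = [
    'http://www.imdb.com/title/tt1000001:genre/Drama name/nm0000001/\n',
    'http://www.imdb.com/title/tt1000002:genre/Drama country/us\n',
]
graph = build_graph(sample)

# Titles that share the 'genre/Drama' node.
print(sorted(graph['genre/Drama']))
```

To run it on the real output, an open file can be passed directly, for example `build_graph(open('data2.csv'))`, since iterating a file yields its lines.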