Untitled

a guest
Dec 27th, 2014
Traceback (most recent call last):
  File "scrapewaybackblog.py", line 17, in <module>
    daypos = byline.find(re.compile("[A-Z][a-z]*s"))
TypeError: expected a character buffer object

import re
import urllib2

from bs4 import BeautifulSoup


def regex_find(pattern, s):
    """Like str.find, but for a regex: start index of the first match, or -1."""
    match = re.search(pattern, s)
    return match.start() if match else -1


output_files_pathname = 'Data/'  # path where output will go

for i in xrange(3, 1, -1):
    page = urllib2.urlopen("http://web.archive.org/web/20090204221349/http://www.americansforprosperity.org/nationalblog?page={}".format(i))
    soup = BeautifulSoup(page.read())
    snippet = soup.find_all('div', attrs={'class': 'blog-box'})
    for div in snippet:
        byline = div.find('div', attrs={'class': 'date'}).text.encode('utf-8')
        text = div.find('div', attrs={'class': 'right-box'}).text.encode('utf-8')

        # str.find() only accepts a plain string; passing a compiled regex
        # raises the TypeError above, so regex positions come from re.search().
        monthpos = byline.find(",")
        daypos = regex_find(r"[A-Z][a-z]*\s", byline)
        yearpos = regex_find(r"[A-Z][a-z]*\D\d*\w*\s", byline)
        endpos = monthpos + len(byline)

        # These slice offsets depend on the exact byline layout; adjust as needed.
        month = byline[monthpos+1:daypos]
        day = byline[daypos+0:yearpos]
        year = byline[yearpos+2:endpos]

        new_filename = year + month + day + ".txt"
        outfile = open(output_files_pathname + new_filename, 'w')
        outfile.write(byline)  # was write(date), but no variable named date exists
        outfile.write("\n")
        outfile.write(text)
        outfile.close()
    print "finished another url from page {}".format(i)
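
The byline parsing above locates the month, day, and year with separate position lookups. A possible alternative is to pull all three out with one regex and named groups. This is only a sketch: the paste never shows a real byline, so the sample string, the "Month D, YYYY" layout, and the names parse_byline and filename_for are assumptions, not part of the original script. The code avoids version-specific syntax, so it should run under both Python 2.7 and Python 3.

```python
import re

# ASSUMPTION: the paste never shows a real byline, so this sample string and
# the "Month D, YYYY" layout it implies are hypothetical.
SAMPLE_BYLINE = "Posted on February 4, 2009"

# One pattern with named groups replaces the separate position lookups.
DATE_PATTERN = re.compile(
    r"(?P<month>[A-Z][a-z]+)\s+(?P<day>\d{1,2}),\s*(?P<year>\d{4})"
)


def parse_byline(byline):
    """Return (year, month, day) strings, or None if no date is found."""
    match = DATE_PATTERN.search(byline)
    if match is None:
        return None
    return match.group("year"), match.group("month"), match.group("day")


def filename_for(byline):
    """Build a name like '2009February04.txt', or None if no date is found."""
    parts = parse_byline(byline)
    if parts is None:
        return None
    year, month, day = parts
    # Zero-pad the day so filenames for the same month sort in date order.
    return "{}{}{}.txt".format(year, month, day.zfill(2))
```

Inside the loop, new_filename could then come from filename_for(byline), skipping any div where it returns None instead of writing a badly named file.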