WebScraping

Posted by sbmonzur on Mar 10th, 2021
# I scraped the links using Beautiful Soup (code not included here), then downloaded from
# each link the specific HTML content of the articles I was interested in (titles, dates,
# contributor names, main texts) and stored that information in a list. I then saved the
# list to a text file.

import requests
from bs4 import BeautifulSoup

# urlsPA: list of article URLs collected in the earlier step (not shown)
PAlist = []  # one find_all() result per article page

for link in urlsPA:
    specificpagePA = requests.get(link, timeout=30)  # make a GET request (timeout added as a safeguard) and store the response object
    rawAddPagePA = specificpagePA.text  # read the content of the server's response
    PASoup2 = BeautifulSoup(rawAddPagePA, "html.parser")  # parse the response into an HTML tree (explicit parser avoids a warning)
    PAcontent = PASoup2.find_all(class_=["story-element story-element-text", "time-social-share-wrapper storyPageMetaData-m__time-social-share-wrapper__2-RAX", "headline headline-type-9 story-headline bn-story-headline headline-m__headline__3vaq9 headline-m__headline-type-9__3gT8S", "contributor-name contributor-m__contributor-name__1-593"])
    print(PAcontent)
    PAlist.append(PAcontent)
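The `find_all(class_=[...])` call above passes whole space-separated class strings. Per the Beautiful Soup documentation, such a string matches only an element whose `class` attribute is exactly that string (same classes in the same order), while a single class name matches any element that carries it. The small offline sketch below illustrates this on made-up HTML (the markup is hypothetical, not taken from the scraped site); a CSS selector through `select()` is shown as one alternative that requires both classes but ignores their order.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three paragraphs with overlapping classes
html = """
<p class="story-element story-element-text">first</p>
<p class="story-element-text story-element">second</p>
<p class="story-element">third</p>
"""
soup = BeautifulSoup(html, "html.parser")

# A space-separated string matches only the exact attribute value (same order)
exact = soup.find_all(class_="story-element story-element-text")

# A single class name matches every element that has that class
single = soup.find_all(class_="story-element")

# A list matches elements that satisfy any one of its entries
either = soup.find_all(class_=["story-element story-element-text", "no-such-class"])

# A CSS selector requires both classes but does not care about their order
css = soup.select(".story-element.story-element-text")

print([p.get_text() for p in exact])   # ['first']
print([p.get_text() for p in single])  # ['first', 'second', 'third']
print([p.get_text() for p in either])  # ['first']
print([p.get_text() for p in css])     # ['first', 'second']
```

This is why the long class strings in the loop work only as long as the site keeps emitting exactly those class lists; matching on one stable class name, or using a CSS selector, is usually less brittle.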
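The opening comment says the list was then saved to a text file, but that code is not included. One possible way is sketched below under assumptions: the helper name `save_articles`, the file name `articles.txt`, and the choice to flatten each element with `get_text` are all hypothetical and may differ from what the author actually did. It writes one plain-text block per article, with one line per extracted element.

```python
from bs4 import BeautifulSoup


def save_articles(articles, path):
    """Write each article's extracted elements as plain text, one block per article.

    `articles` is shaped like PAlist: one find_all() result per article page.
    The blank-line separator and UTF-8 encoding are assumptions.
    """
    with open(path, "w", encoding="utf-8") as f:
        for elements in articles:
            # A separator keeps words from running together across nested tags
            lines = [el.get_text(" ", strip=True) for el in elements]
            f.write("\n".join(lines))
            f.write("\n\n")  # blank line between articles


# Usage with made-up markup standing in for one downloaded page
page = BeautifulSoup(
    '<h1 class="headline">Title</h1><div class="story-element">Body <b>text</b></div>',
    "html.parser",
)
example_list = [page.find_all(class_=["headline", "story-element"])]
save_articles(example_list, "articles.txt")
```

Writing `str(PAcontent)` instead would keep the raw HTML, which is useful if the tags still need to be parsed later.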