# I scraped the links using Beautiful Soup (code not included here), and then from
# those links downloaded the specific HTML content of the articles I was interested
# in (titles, dates, names of contributors, main texts) and stored that information
# in a list. I then saved the list to a text file.
import requests
from bs4 import BeautifulSoup
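# A minimal sketch of the link-scraping step mentioned above (not included in this
# paste); the index URL and the href filter below are placeholder assumptions for
# illustration only.
indexpagePA = requests.get('https://www.example.com/news')           # hypothetical listing page
indexSoupPA = BeautifulSoup(indexpagePA.text, 'html.parser')         # parse the listing page
urlsPA = [a['href'] for a in indexSoupPA.find_all('a', href=True)]   # collect candidate article links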
PAlist = []
for link in urlsPA:
    specificpagePA = requests.get(link)                    # make a GET request and store the response in an object
    rawAddPagePA = specificpagePA.text                     # read the content of the server's response
    PASoup2 = BeautifulSoup(rawAddPagePA, 'html.parser')   # parse the response into an HTML tree
    PAcontent = PASoup2.find_all(class_=[
        "story-element story-element-text",
        "time-social-share-wrapper storyPageMetaData-m__time-social-share-wrapper__2-RAX",
        "headline headline-type-9 story-headline bn-story-headline headline-m__headline__3vaq9 headline-m__headline-type-9__3gT8S",
        "contributor-name contributor-m__contributor-name__1-593",
    ])                                                     # extract title, date, contributor name, and main text elements
    print(PAcontent)
    PAlist.append(PAcontent)

with open('listfile.txt', 'w') as filehandle:
    for listitem in PAlist:
        filehandle.write('%s\n' % listitem)
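# A minimal sketch, assuming the file written above, of reading the saved items back
# later (not part of the original paste); each line holds only the str() of a
# ResultSet, so this recovers plain strings rather than parsed tags.
with open('listfile.txt', 'r') as filehandle:
    PAlines = [line.rstrip('\n') for line in filehandle]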