Untitled

By: a guest on Aug 12th, 2012  |  syntax: None  |  size: 0.96 KB  |  hits: 15  |  expires: Never
How do I get raw text with beautifulsoup?
<link>
www.link1.com
</link>
<link>
www.link2.com
</link>

from BeautifulSoup import BeautifulStoneSoup
soup = BeautifulStoneSoup(results2)     # Beautiful Soup 3 XML parser
linklist = soup.findAll('link')
print linklist

[<link>www.link1.com</link>,<link>www.link2.com</link>]

[www.link1.com, www.link2.com]

linklist = [el.string for el in soup.findAll('link')]

from bs4 import BeautifulSoup

xml = """<html><link>
www.link1.com
</link>
<link>
www.link2.com
</link></html>"""

# features="xml" selects the XML parser, which requires lxml to be installed
soup = BeautifulSoup(xml, features="xml")
# .string is the text inside each <link> tag
linklist = [link.string for link in soup.find_all('link')]
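
Because each URL in the sample sits on its own line, the text inside each <link> tag includes the surrounding newlines. As a rough cross-check that does not need lxml installed, the same extraction can be sketched with the standard library's xml.etree.ElementTree. This swaps in a different parser and is not BeautifulSoup; it assumes the input is well-formed XML with a single root element, as in the sample above.

```python
import xml.etree.ElementTree as ET

# Same sample document as above; the <html> wrapper gives it a single root.
xml = """<html><link>
www.link1.com
</link>
<link>
www.link2.com
</link></html>"""

root = ET.fromstring(xml)

# el.text is the raw text inside each <link>, newlines included,
# so strip() is applied to get the bare URLs.
linklist = [el.text.strip() for el in root.iter("link")]
print(linklist)
```

With BeautifulSoup the equivalent cleanup would be calling .strip() on each .string value.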

links = soup.find_all('link')
# iterate over the result list itself; the list has no .string attribute
link_strings = [s.string for s in links]

for l in linklist:
    # convert the tag to its markup string before splitting
    print str(l).split('>')[1].split('<')[0]

>>> linklist=["<link>www.google.com</link>", "<link>www.yahoo.com</link>"]
>>> for l in linklist:
...      print str(l.split('>')[1].split('<')[0])
...
www.google.com
www.yahoo.com
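
The split approach above can be wrapped in a small helper. This is a minimal Python 3 sketch: text_between_tags is a hypothetical name, and the second list entry is an assumed variation with newlines around the URL, as in the question's sample. It only suits flat markup like this, since nested tags or a '>' inside an attribute would break the split.

```python
def text_between_tags(markup):
    """Return the text between the first '>' and the next '<', stripped."""
    # str() lets the helper accept tag objects as well as plain strings.
    return str(markup).split('>')[1].split('<')[0].strip()


linklist = ["<link>www.google.com</link>", "<link>\nwww.yahoo.com\n</link>"]
for l in linklist:
    print(text_between_tags(l))
```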