Untitled

By: a guest on Apr 30th, 2012
RSS feed scraping with Python

from urllib.request import urlopen  # Python 3; Python 2 used "from urllib import urlopen"
import re

# Download the feed and decode it (UTF-8 assumed) so the str patterns below can match
source = urlopen('http://feeds.huffingtonpost.com/huffingtonpost/raw_feed').read().decode('utf-8')

# Non-greedy groups so each match stops at the first closing tag
title = re.compile('<title>(.*?)</title>')
link = re.compile('<link>(.*?)</link>')

find_title = title.findall(source)
find_link = link.findall(source)

# Skip index 0, as the original loop did, and take at most 15 items;
# slicing plus zip cannot raise IndexError when the feed is shorter
for item_title, item_link in zip(find_title[1:16], find_link[1:16]):
    print(item_title)
    print(item_link)

>>> link = re.compile('<link rel="alternate" type="text/html" href=(.*)')
>>> find_link = re.findall(link, source)
>>> find_link[1].strip()
'"http://www.huffingtonpost.com/andrew-brandt/the-peyton-predicament-pa_b_1271834.html" />'
>>> len(find_link)
15
>>>

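The match in the session above still carries the opening quote and the trailing `" />`, because `(.*)` captures everything after `href=`. A minimal sketch of a tighter pattern follows. It assumes the links always appear as double-quoted `href` attributes in exactly this attribute order. The sample string is copied from the session output, and `extract_links` is a hypothetical helper name, not part of any library.

```python
import re

# Capture only the text between the double quotes of the href attribute.
# Assumption: attributes always appear in this exact order and quoting style.
LINK_RE = re.compile(r'<link rel="alternate" type="text/html" href="([^"]*)"')


def extract_links(source):
    """Return every href value that LINK_RE matches in the feed text."""
    return LINK_RE.findall(source)


# Sample element built from the URL shown in the session output.
sample = ('<link rel="alternate" type="text/html" '
          'href="http://www.huffingtonpost.com/andrew-brandt/'
          'the-peyton-predicament-pa_b_1271834.html" />')

for url in extract_links(sample):
    print(url)
```

A regular expression like this is brittle against small markup changes (single quotes, reordered attributes), so it is best read as a quick fix rather than a general parser.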
#!/usr/bin/env python
import feedparser  # pip install feedparser

d = feedparser.parse('http://feeds.huffingtonpost.com/huffingtonpost/latestnews')
# .. skipped handling http errors, caching ..

for e in d.entries:
    print(e.title)
    print(e.link)
    print(e.description)
    print("\n")  # 2 newlines

Even Critics Of Safety Net Increasingly Depend On It
http://www.huffingtonpost.com/2012/02/12/safety-net-benefits_n_1271867.html
<p>Ki Gulbranson owns a logo apparel shop, deals in
<!-- ... snip ... -->

Christopher Cain, Atlanta Anti-Gay Attack Suspect, Arrested And
Charged With Aggravated Assault And Robbery
http://www.huffingtonpost.com/2012/02/12/atlanta-anti-gay-suspect-christopher-cain-arrested_n_1271811.html
<p>ATLANTA -- Atlanta police have arrested a suspect
<!-- ... snip ... -->
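
If installing feedparser is not an option, roughly the same title/link listing can be sketched with the standard library alone. This assumes the feed is well-formed Atom in the standard `http://www.w3.org/2005/Atom` namespace, which the `<link rel="alternate" ...>` elements seen earlier suggest but do not prove. `parse_atom_entries` and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for the Atom namespace (assumed feed format).
ATOM = "{http://www.w3.org/2005/Atom}"


def parse_atom_entries(xml_text):
    """Return a list of (title, link) tuples, one per <entry> element."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        link = ""
        # Take the first link whose rel is "alternate"; a missing rel
        # attribute is treated as "alternate".
        for node in entry.findall(ATOM + "link"):
            if node.get("rel", "alternate") == "alternate":
                link = node.get("href", "")
                break
        entries.append((title.strip(), link))
    return entries


# Made-up sample feed so the sketch runs without network access.
sample_feed = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <title>First story</title>
    <link rel="alternate" type="text/html" href="http://example.com/first.html" />
  </entry>
  <entry>
    <title>Second story</title>
    <link rel="alternate" type="text/html" href="http://example.com/second.html" />
  </entry>
</feed>"""

for title, link in parse_atom_entries(sample_feed):
    print(title)
    print(link)
```

Unlike the regex approach, this only looks inside `<entry>` elements, so the feed-level title and link are skipped without any index arithmetic. It does not handle RSS 2.0 or malformed XML, both of which feedparser tolerates.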