gronke

scraper

May 9th, 2014
Python
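# Scrape the TDCJ unit directory and print each unit's name and link.
# Needs requests, lxml and cssselect (lxml's .cssselect() relies on it).
from urllib.parse import urljoin

import requests
import lxml.html

URL = 'http://tdcj.state.tx.us/unit_directory/'

req = requests.get(URL)
req.raise_for_status()
root = lxml.html.fromstring(req.content)

# Take the first <table> on the page (assumed to hold the directory).
table = root.cssselect('table')[0]

# Skip the header row.
rows = table.cssselect('tr')[1:]

for row in rows:
    cells = row.cssselect('td')
    if not cells:
        continue
    print(cells[0].text_content().strip())
    # A <td> has no href attribute; the link is on the <a> inside the cell.
    links = cells[0].cssselect('a')
    if links:
        print(urljoin(URL, links[0].get('href')))
    input()  # pause until Enter is pressed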
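The script above needs network access and stops on every row. For comparison, here is a sketch of the same row-walking logic (first table, skip the header row, read the first cell's text and link) written with the standard library's html.parser instead of lxml/cssselect. This is a swapped-in technique, not what the original uses, and it can run offline on saved HTML. It assumes a well-formed table with explicit closing tags and no nested tables; the names UnitTableParser, parse_units, and the SAMPLE markup are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = 'http://tdcj.state.tx.us/unit_directory/'


class UnitTableParser(HTMLParser):
    """Hypothetical helper: collect (name, link) from the first cell of each
    data row in the first <table>. Assumes explicit </td> tags, no nesting."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.tables_seen = 0
        self.in_first_table = False
        self.row = -1            # current <tr> index within the first table
        self.col = -1            # current <td> index within the row
        self.in_target = False   # inside the first <td> of a data row
        self.text = []
        self.href = None
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == 'table':
            self.tables_seen += 1
            self.in_first_table = self.tables_seen == 1
        elif not self.in_first_table:
            return
        elif tag == 'tr':
            self.row += 1
            self.col = -1
        elif tag == 'td':
            self.col += 1
            # Row 0 is the header row, skipped like rows[1:] in the script.
            if self.row >= 1 and self.col == 0:
                self.in_target = True
                self.text, self.href = [], None
        elif tag == 'a' and self.in_target and self.href is None:
            href = dict(attrs).get('href')
            if href:
                # Resolve relative links against the page URL.
                self.href = urljoin(self.base_url, href)

    def handle_endtag(self, tag):
        if tag == 'table' and self.in_first_table:
            self.in_first_table = False
        elif tag == 'td' and self.in_target:
            self.in_target = False
            name = ' '.join(''.join(self.text).split())  # normalize whitespace
            self.records.append((name, self.href))

    def handle_data(self, data):
        if self.in_target:
            self.text.append(data)


def parse_units(html, base_url):
    parser = UnitTableParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.records


# Hypothetical markup standing in for a saved copy of the page.
SAMPLE = """
<table>
  <tr><th>Unit</th><th>Location</th></tr>
  <tr><td><a href="unit_a.html">Unit A</a></td><td>Town 1</td></tr>
  <tr><td>Unit B</td><td>Town 2</td></tr>
</table>
<table><tr><td>ignored</td></tr></table>
"""

if __name__ == '__main__':
    for name, link in parse_units(SAMPLE, BASE_URL):
        print(name, link)
```

Rows whose first cell has no link come back with None, rather than printing a meaningless value as `cells[0].get('href')` did.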