andy_shev

Yandex (vesna.yandex.ru) parser example

Feb 26th, 2013
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
# vim: ts=4 sw=4 et ai si

from lxml import etree

url = "http://vesna.yandex.ru/all.xml?mix=astronomy%2Cgeology%2Cgyroscope%2Cliterature%2Cmarketing%2Cmathematics%2Cmusic%2Cpolit%2Cagrobiologia%2Claw%2Cpsychology%2Cgeography%2Cphysics%2Cphilosophy%2Cchemistry%2Cestetica&astronomy=on&geology=on&gyroscope=on&literature=on&marketing=on&mathematics=on&music=on&polit=on&agrobiologia=on&law=on&psychology=on&geography=on&physics=on&philosophy=on&chemistry=on&estetica=on"

# recover=True lets libxml2 tolerate malformed real-world HTML
parser = etree.HTMLParser(recover=True)
# lxml can fetch an HTTP URL directly when given it as the source
doc = etree.parse(url, parser)

# The generated text sits inside:
# <td colspan="9" class="text"><div style="min-height:333px; height:expression('333px');">
matches = doc.xpath('//td[@colspan="9"]/div')
if not matches:
    raise SystemExit("text container not found; the page layout may have changed")
e = matches[0]

# Only the text of each direct child
for x in e.iterchildren():
    print(x.text)

# The entire subtree, serialized as pretty-printed markup
print(etree.tounicode(e, pretty_print=True))
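The long query string in `url` follows a visible pattern: `mix` carries the comma-separated topic names (`%2C` is an encoded comma), and every topic also appears again as its own `name=on` parameter. A minimal sketch that rebuilds the same URL from a topic list with the standard library's `urllib.parse.urlencode` is below. It assumes, from the URL alone and not from any documented Yandex API, that the service expects both forms; `TOPICS` and `build_url` are hypothetical names.

```python
from urllib.parse import urlencode

BASE_URL = "http://vesna.yandex.ru/all.xml"

# Topic keys copied from the URL in the script above, in the same order.
TOPICS = ["astronomy", "geology", "gyroscope", "literature", "marketing",
          "mathematics", "music", "polit", "agrobiologia", "law",
          "psychology", "geography", "physics", "philosophy", "chemistry",
          "estetica"]


def build_url(topics):
    """Hypothetical helper: mix=<comma-joined topics> plus <topic>=on for each."""
    params = [("mix", ",".join(topics))]
    params += [(topic, "on") for topic in topics]
    # urlencode percent-encodes the commas as %2C, matching the original URL
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_url(TOPICS))
```

Passing a shorter list, such as `build_url(["physics"])`, yields `...all.xml?mix=physics&physics=on`; whether the service honours an arbitrary subset is an untested assumption.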
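To try the extraction step without the network (the vesna.yandex.ru endpoint may no longer respond), the same XPath query can be run against a local string. The sketch below uses markup modelled only on the HTML comment in the script, not on the real page; `SAMPLE_HTML` and `extract_child_texts` are hypothetical names.

```python
from lxml import etree

# Hypothetical sample markup shaped like the container described in the
# script's comment; the real page's contents are not reproduced here.
SAMPLE_HTML = """
<html><body><table><tr>
<td colspan="9" class="text"><div style="min-height:333px">
<h1>Title</h1>
<p>First paragraph.</p>
<p>Second paragraph.</p>
</div></td>
</tr></table></body></html>
"""


def extract_child_texts(html_text):
    """Return the .text of each direct child of td[@colspan="9"]/div, or []."""
    parser = etree.HTMLParser(recover=True)
    root = etree.fromstring(html_text, parser)
    matches = root.xpath('//td[@colspan="9"]/div')
    if not matches:
        return []
    return [child.text for child in matches[0].iterchildren()]


if __name__ == "__main__":
    for text in extract_child_texts(SAMPLE_HTML):
        print(text)
```

Returning an empty list instead of indexing `[0]` keeps a layout change from surfacing as an `IndexError`.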