- [<div class="views-field views-field-field-overigeonderdelen"> <span class="views-label views-label-field-overigeonderdelen">Nevenvestiging: </span> <div class="field-content"><div class="wrapper hidden">
- <p>Hak Industrial Services B.V., Hoogeveen<br/>Nederland<br/> blabla useless data<br/></p><hr/>
- Hak Industrial Services B.V., Nieuw Heeten<br/>Nederland<br/>blabla useless data<br/><hr/>
- Hak Industrial Services Middle East LLC, Abu Dhabi<br/>Verenigde Arabische Emiraten<br/>blabla useless data<br/><hr/>
- Hak Industrial Services SEA Sdn. Bhd., Petaling Jaya, Selangor<br/>Maleisië<br/>blabla useless data<br/><hr/>
- Hak Industrial Services USLLC, Houston<br/>Verenigde Staten van Amerika<br/>blabla useless data<br/><hr/>
- </div>
- <a class="toggle" href="#">Toon nevenvestigingen</a></div> </div>]
- [Hak Industrial Services B.V., Hak Industrial Services B.V., Hak Industrial Services Middle East LLC, Hak Industrial Services SEA Sdn. Bhd., Hak Industrial Services USLLC]
- [Nederland, Nederland, Verenigde Arabische Emiraten, Maleisië, Verenigde Staten van Amerika]
- from bs4 import BeautifulSoup as bs
- def parser(data):
-     # Parse the HTML from the ticket and build a list of its text fragments.
-     html = data
-     parsed = bs(html, "lxml")
-     data = [line.strip() for line in parsed.stripped_strings]
-     print(data)
- [u'[', u'Nevenvestiging:', u'Hak Industrial Services B.V., Hoogeveen', u'Nederland', u'blabla useless data', u'Hak Industrial Services B.V., Nieuw Heeten', u'Nederland', u'blabla useless data', u'Hak Industrial Services Middle East LLC, Abu Dhabi', u'Verenigde Arabische Emiraten', u'blabla useless data', u'Hak Industrial Services SEA Sdn. Bhd., Petaling Jaya, Selangor', u'Maleisi\xeb', u'blabla useless data', u'Hak Industrial Services USLLC, Houston', u'Verenigde Staten van Amerika', u'blabla useless data', u'Toon nevenvestigingen', u']']
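The flat list above still mixes labels, cities, and filler text, while the desired output is one list of company names and one list of countries. A minimal sketch of one way to get there follows. It is not necessarily the intended approach, and it rests on several assumptions taken from the sample only: every branch inside `div.wrapper` yields exactly three text fragments (name and city, country, filler), and the company name ends at the first `", "`. The names `extract_branches` and `SAMPLE_HTML` are made up for illustration, the sample is shortened to three branches, and the built-in `html.parser` is used instead of `lxml` to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Shortened copy of the HTML from the paste (three of the five branches).
SAMPLE_HTML = """[<div class="views-field views-field-field-overigeonderdelen"> <span class="views-label views-label-field-overigeonderdelen">Nevenvestiging: </span> <div class="field-content"><div class="wrapper hidden">
<p>Hak Industrial Services B.V., Hoogeveen<br/>Nederland<br/> blabla useless data<br/></p><hr/>
Hak Industrial Services SEA Sdn. Bhd., Petaling Jaya, Selangor<br/>Maleisië<br/>blabla useless data<br/><hr/>
Hak Industrial Services USLLC, Houston<br/>Verenigde Staten van Amerika<br/>blabla useless data<br/><hr/>
</div>
<a class="toggle" href="#">Toon nevenvestigingen</a></div> </div>]"""


def extract_branches(html):
    """Return (names, countries) for the branches listed in div.wrapper."""
    soup = BeautifulSoup(html, "html.parser")
    # Restricting the search to the wrapper skips the "Nevenvestiging:" label,
    # the toggle link text, and the surrounding brackets.
    wrapper = soup.find("div", class_="wrapper")
    strings = list(wrapper.stripped_strings)

    names, countries = [], []
    # Assumption: fragments come in groups of three per branch.
    for i in range(0, len(strings) - 2, 3):
        name_and_city, country = strings[i], strings[i + 1]
        # Assumption: the company name itself contains no ", ".
        names.append(name_and_city.split(", ", 1)[0])
        countries.append(country)
    return names, countries


names, countries = extract_branches(SAMPLE_HTML)
print(names)
print(countries)
```

If a branch ever has more or fewer than three fragments, the fixed stride will misalign; starting a new record at each `<hr/>` would be a sturdier variant.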