Untitled

By: a guest on Jun 13th, 2012
Finding a tag based on what it surrounds (using BeautifulSoup)
<td class="1">test1</td>
<td>test2</td>
<td class="3"><a href="/">test3</a></td>
<td><div class="test4"><a class="test4" href="/">test4</a></div></td>
<td><div class="test4"><a class="test4" href="/">test4</a></div></td>

soup.findAll("td")

soup.findAll("a", {"class":"test4"})

>>> # BeautifulSoup 3 import path
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup('''<td class="1">test1</td>
... <td>test2</td>
... <td class="3"><a href="/">test3</a></td>
... <td><div class="test4"><a class="test4" href="/">test4</a></div></td>
... <td><div class="test4"><a class="test4" href="/">test4</a></div></td>
... ''')
>>> # Keep the parent of every class="test4" <a> or <div> whose direct parent is a <td>
>>> [tag.parent for tag in soup.findAll(attrs = {"class": "test4"})
...  if tag.name in ['a', 'div'] and tag.parent.name == 'td']
[<td><div class="test4"><a class="test4" href="/">test4</a></div></td>, <td><div class="test4"><a class="test4" href="/">test4</a></div></td>]

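For readers on current Python, here is a minimal sketch of the same filter ported to Beautiful Soup 4. It assumes the `bs4` package is installed and uses the standard-library `html.parser` backend; the function name `cells_wrapping_class` and the `HTML` constant are hypothetical, introduced only for illustration.

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 (bs4) is installed

# Same sample markup as in the session above.
HTML = '''<td class="1">test1</td>
<td>test2</td>
<td class="3"><a href="/">test3</a></td>
<td><div class="test4"><a class="test4" href="/">test4</a></div></td>
<td><div class="test4"><a class="test4" href="/">test4</a></div></td>
'''


def cells_wrapping_class(html, css_class):
    """Return the <td> parents of <a>/<div> tags that carry css_class.

    Hypothetical helper: only tags whose *direct* parent is a <td> count,
    mirroring the list comprehension in the original session.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [
        tag.parent
        for tag in soup.find_all(class_=css_class)
        if tag.name in ("a", "div") and tag.parent.name == "td"
    ]


cells = cells_wrapping_class(HTML, "test4")
```

With the sample markup only the two `<div class="test4">` tags pass the filter, because each `<a class="test4">` has a `<div>` as its direct parent rather than a `<td>`.
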
>>> # Alternative: walk each <td> and collect the parent of any <div class="test4"> inside it
>>> tdList = []
>>> for td in soup.findAll('td'):
...     for div in td.findAll('div',{'class':'test4'}):
...         tdList.append(div.parent)
...
>>> tdList
[<td><div class="test4"><a class="test4" href="/">test4</a></div></td>, <td><div class="test4"><a class="test4" href="/">test4</a></div></td>]
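
Both approaches above rely on `.parent`, so they depend on how deeply the matched tag is nested. As a hedged alternative, the sketch below starts from the links and climbs to the nearest enclosing `<td>` with Beautiful Soup 4's `find_parent`. It assumes `bs4` is installed; `cells_around_links` and `SAMPLE` are hypothetical names, not part of the original paste.

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 (bs4) is installed

SAMPLE = '''<td class="1">test1</td>
<td>test2</td>
<td class="3"><a href="/">test3</a></td>
<td><div class="test4"><a class="test4" href="/">test4</a></div></td>
<td><div class="test4"><a class="test4" href="/">test4</a></div></td>
'''


def cells_around_links(html, css_class):
    """Return each <td> that encloses an <a> with css_class, at any depth."""
    soup = BeautifulSoup(html, "html.parser")
    found = []
    for link in soup.find_all("a", class_=css_class):
        td = link.find_parent("td")  # nearest enclosing <td>, or None
        # Compare by identity: bs4 tags with identical markup compare equal
        # with ==, which would wrongly merge the two look-alike cells here.
        if td is not None and not any(td is seen for seen in found):
            found.append(td)
    return found


surrounding = cells_around_links(SAMPLE, "test4")
```

The identity check matters for this particular input, since the last two cells contain exactly the same markup.
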