
Untitled
By: a guest on
Jun 13th, 2012 | syntax:
None | size: 1.23 KB | hits: 14 | expires: Never
finding a tag based on what it surrounds (using beautifulsoup)
<td class="1">test1</td>
<td>test2</td>
<td class="3"><a href="/">test3</a></td>
<td><div class="test4"><a class="test4" href="/">test4</a></div></td>
<td><div class="test4"><a class="test4" href="/">test4</a></div></td>
soup.findAll("td")
soup.findAll("a", {"class":"test4"})
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup('''<td class="1">test1</td>
... <td>test2</td>
... <td class="3"><a href="/">test3</a></td>
... <td><div class="test4"><a class="test4" href="/">test4</a></div></td>
... <td><div class="test4"><a class="test4" href="/">test4</a></div></td>
... ''')
>>> [tag.parent for tag in soup.findAll(attrs = {"class": "test4"})
... if tag.name in ['a', 'div'] and tag.parent.name == 'td']
[<td><div class="test4"><a class="test4" href="/">test4</a></div></td>, <td><div class="test4"><a class="test4" href="/">test4</a></div></td>]
>>> tdList = []
>>> for td in soup.findAll('td'):
... for div in td.findAll('div',{'class':'test4'}):
... tdList.append(div.parent)
...
>>> tdList
[<td><div class="test4"><a class="test4" href="/">test4</a></div></td>, <td><div class="test4"><a class="test4" href="/">test4</a></div></td>]