
Untitled
By: a guest on
Aug 1st, 2012 | syntax:
None | size: 1.85 KB | hits: 11 | expires: Never
Regular expression pattern for content within HTML tags
import urllib2
import re
request = urllib2.urlopen('http://www.securitytube.net/')
content = request.read()
match = re.findall(r'<a href=".w+.d+">.+</a>', content)
if match:
for i in match:
print i + "n"
else:
print 'Not Found!'
<a href="/video/3878"><img class="corner iradius20 ishadow33" width="100" heigh
t="75" src="http://videothumbs.securitytube.net.s3.amazonaws.com/3878.jpg" alt=
"avatar" /></a>
<a href="/video/3878">NodeZero Linux Review</a>
<a href="/video/3877"><img class="corner iradius20 ishadow33" width="100" heigh
t="75" src="http://videothumbs.securitytube.net.s3.amazonaws.com/3877.jpg" alt=
"avatar" /></a>
<a href="/video/3877">Post Attack Uploading Shell in Real Time</a>
<a href="/video/3867"><img class="corner iradius20 ishadow33" width="100" heigh
t="75" src="http://videothumbs.securitytube.net.s3.amazonaws.com/3867.jpg" alt=
"avatar" /></a>
<a href="/video/3867">Using SQLMAP in Real Time (SQLinjection WEB)</a>
<a href="/video/3866"><img class="corner iradius20 ishadow33" width="100" heigh
t="75" src="http://videothumbs.securitytube.net.s3.amazonaws.com/3866.jpg" alt=
"avatar" /></a>
....
...
...
(...)
request = urllib2.urlopen('http://www.securitytube.net/')
content = request.read()
match = re.findall(r'<a href="(.*?)".*>(.*)</a>', content)
if match:
for link, title in match:
print "link %s -> %s" % (link, title)
link /video/3822 -> SecurityTube SpeakUp: Cloud Computing
link /video/3587 ->
link /video/3587 -> Securitytube Speak Up: Antivirus Evasion attacks
link /video/3489 ->
link /video/3489 -> SecurityTube SpeakUp: ThePirateBay LOSS
link /video/3375 ->
link /video/3375 -> SecurityTube SpeakUp: .COM and .NET Domain Seizures
link /video/3130 ->
link /video/3130 -> SecurityTube Speak Up: The MS12-020 Fiasco!
...etc