Untitled

a guest
Nov 8th, 2019
Python
import re

# Matches http, ftp and https URLs: scheme, dotted host name, optional path/query.
# Raw string so that \w and \. reach the regex engine unchanged.
pattern = re.compile(r"(http|ftp|https)://([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?")

webList = []

##########################
## Change Input File Here
##

INPUT_PATH = "test.html"

##
## Change Output File Here
##

OUTPUT_PATH = "results.txt"

##
## Change Character Limit Here
##

CHAR_LIMIT = 37

##
##########################

# Collect every URL in the input file, truncated to CHAR_LIMIT characters
with open(INPUT_PATH) as infile:
    for line in infile:
        for match in pattern.finditer(line):
            webList.append(match.group()[0:CHAR_LIMIT])

# Append the results to the output file, one URL per line
# (the with block closes the file, so no explicit close() is needed)
with open(OUTPUT_PATH, "a") as myfile:
    for url in webList:
        myfile.write(url + '\n')
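
To try the pattern without touching any files, the extraction step can be sketched as a small function that works on an in-memory string. This is only an illustration: the names URL_PATTERN, extract_urls and sample are hypothetical and not part of the original script, and the sample HTML is made up. The regex and the 37-character default are taken from the script.

```python
import re

# Same regex as in the script, written as a raw string (hypothetical name).
URL_PATTERN = re.compile(
    r"(http|ftp|https)://([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?"
)


def extract_urls(text, char_limit=37):
    """Return every URL found in text, each cut to at most char_limit characters."""
    return [m.group()[:char_limit] for m in URL_PATTERN.finditer(text)]


# Made-up input: one long https link inside a tag and one bare ftp link.
sample = (
    '<a href="https://example.com/a/very/long/path/to/some/page.html">x</a> '
    "ftp://files.example.org"
)
urls = extract_urls(sample)
print(urls)
```

With this sample the first URL is longer than 37 characters and comes back truncated to "https://example.com/a/very/long/path/", while the shorter ftp URL is returned whole. Truncation can cut a URL mid-path, so the results are not guaranteed to be valid links.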