
Untitled

a guest
Apr 23rd, 2019
'''
Scrapes the 300 HTTP(S) proxies listed on https://free-proxy-list.net
and prints them as ip:port, one per line.
'''

from requests_html import HTMLSession

session = HTMLSession()  # requests_html makes its requests through an HTMLSession

cells = session.get('https://free-proxy-list.net').html.find('td')  # every table cell on the page

proxies = []  # collected 'ip:port' strings

for cell in cells:
    c = cell.text
    # A lowercased string passes islower() only if it contains a letter,
    # so this keeps the letter-free cells: IP addresses and ports.
    if c and not c.lower().islower():
        if '.' in c:
            proxies.append(c + ':')  # IP cells contain dots; start a new entry
        elif proxies:
            proxies[-1] += c  # port cell; attach it to the preceding IP

print('\n'.join(proxies))  # ip:port\nip:port\n...
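
The filtering step can be tried without touching the network. The sketch below is a minimal illustration, not part of the original paste: `pair_proxies` is a hypothetical helper name, and the sample cell texts are invented (the addresses come from the reserved documentation ranges). It applies the same letter-free test as the script and pairs each IP cell with the port cell that follows it.

```python
def pair_proxies(cell_texts):
    """Pair each letter-free cell containing a dot (IP) with the next letter-free cell (port)."""
    proxies = []
    pending_ip = None
    for text in cell_texts:
        # Skip empty cells and any cell that contains a letter.
        if not text or text.lower().islower():
            continue
        if '.' in text:
            pending_ip = text  # looks like an IP address
        elif pending_ip is not None:
            proxies.append(f'{pending_ip}:{text}')  # port for the pending IP
            pending_ip = None
    return proxies


# Invented sample rows shaped like the table's columns (assumption).
sample = [
    '203.0.113.7', '8080', 'US', 'United States', 'anonymous', 'no', 'yes', '1 minute ago',
    '198.51.100.23', '3128', 'DE', 'Germany', 'elite proxy', 'no', 'no', '2 minutes ago',
]

print(pair_proxies(sample))
```

Holding the IP in `pending_ip` until its port arrives means a stray numeric cell elsewhere on the page is ignored rather than glued onto the previous entry.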