Untitled

a guest
Jul 16th, 2018
require 'socket'
require 'open-uri'

require 'rubygems'
require 'servolux'
require 'nokogiri'

PATH = '/tmp/web_fetch.socket'

# Create a UNIX socket at the tmp PATH.
# Runs once in the parent; all forked children inherit the socket's
# file descriptor.
$acceptor = UNIXServer.new(PATH)

# This module defines the process our forked workers will run. It listens on
# the socket and expects a single URL. It will then fetch this URL and parse
# the contents using nokogiri.
module WebFetch
  def execute
    # Wait up to 2 seconds for a pending connection on the shared socket.
    if IO.select([$acceptor], nil, nil, 2)
      socket = $acceptor.accept_nonblock
      # Strip the trailing newline added by the sender's `puts`.
      url = socket.gets.to_s.strip
      socket.close
      return if url.empty?

      # URI.open needs Ruby 2.5+; Kernel#open no longer opens URLs in Ruby 3.0.
      doc = Nokogiri::HTML(URI.open(url)) { |config| config.noblanks.noent }
      $stderr.puts "child #$$ processed #{url}"
      $stderr.flush
    end
  rescue Errno::EAGAIN, Errno::ECONNABORTED, Errno::EPROTO, Errno::EINTR
    # Another worker accepted the connection first, or the accept was
    # interrupted; do nothing and let the next execute call try again.
  end

  def after_executing
    $acceptor.close
  end
end

# Spin up a pool of these workers
pool = Servolux::Prefork.new(:module => WebFetch)
pool.start 3

# 'urls.txt' is a simple text file with one URL per line
urls = File.readlines('urls.txt')

begin
  # Keep sending URLs to the workers until we have run out of URLs
  until urls.empty?
    client = UNIXSocket.open(PATH)
    client.puts urls.shift
    client.close
  end

rescue Errno::ECONNREFUSED
  retry

ensure
  # Give the workers time to complete their current task and then stop the pool
  sleep 5

  pool.stop
  $acceptor.close

  # Remove the socket file so the next run can bind to PATH again
  File.unlink(PATH) if File.socket?(PATH)
end
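
The listing above leans on servolux to manage the worker pool. As a rough illustration of the underlying idea only (one listening UNIX socket created in the parent and shared by forked children), here is a stdlib-only sketch. It is not how servolux is implemented. The helper name run_prefork_demo, the temporary socket path, the one-second idle timeout, and the pipe used to report results are all assumptions made for this example, and it needs a platform that supports fork.

```ruby
require 'socket'
require 'tmpdir'

# Hypothetical helper (not part of servolux): forks `workers` children that
# all accept on one inherited UNIXServer, sends each entry of `lines` over a
# fresh client connection, and returns what the children reported.
def run_prefork_demo(lines, workers: 2)
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'demo.socket')
    server = UNIXServer.new(path)      # created once, inherited by children
    reader, writer = IO.pipe
    writer.sync = true                 # children exit via exit!, so no buffering

    pids = Array.new(workers) do
      fork do
        reader.close
        loop do
          # Stop once no connection has been pending for one second.
          break unless IO.select([server], nil, nil, 1)
          begin
            conn = server.accept_nonblock
          rescue IO::WaitReadable, Errno::ECONNABORTED, Errno::EINTR
            next                       # a sibling accepted it first; wait again
          end
          line = conn.gets.to_s.strip
          conn.close
          writer.puts "#{Process.pid} #{line}" unless line.empty?
        end
        exit!(0)                       # skip the parent's cleanup blocks
      end
    end

    writer.close                       # parent only reads from the pipe
    lines.each do |line|
      client = UNIXSocket.new(path)
      client.puts line
      client.close
    end

    results = reader.read.lines.map(&:chomp)  # returns once all children exit
    pids.each { |pid| Process.wait(pid) }
    server.close
    results
  end
end

if __FILE__ == $0
  run_prefork_demo(%w[alpha beta gamma]).each { |entry| puts entry }
end
```

Each returned entry has the form "<pid> <line>", so the output shows which child handled which item. The order of entries and the split of work between children can differ from run to run.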