archive_api.rb

a guest
Aug 12th, 2018
require 'open-uri'

module ArchiveAPI

    # Give up after this many failed attempts instead of retrying forever.
    MAX_ATTEMPTS = 5

    # Browser-like headers; caching is disabled and the body is requested uncompressed.
    REQUEST_HEADERS = {
        "Pragma" => "no-cache",
        "Cache-Control" => "no-cache",
        "User-Agent" => "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36",
        "Upgrade-Insecure-Requests" => "1",
        "Accept" => "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        "Accept-Encoding" => "identity"
    }

    # Fetches one page of CDX results for `url` and returns the raw response body.
    def get_raw_list_from_api url, page_index
        request_url = "http://web.archive.org/cdx/search/cdx?url="
        request_url += url
        request_url += parameters_for_api page_index

        attempts = 0
        begin
            attempts += 1
            puts "\n" + request_url + "\n"
            # URI.open replaces the bare Kernel#open call, which no longer accepts URLs in Ruby 3.
            URI.open(request_url, REQUEST_HEADERS).read
        rescue OpenURI::HTTPError => e
            # Re-raise once the attempt limit is reached.
            raise if attempts >= MAX_ATTEMPTS
            puts "retry #{attempts}: #{e.message}"
            retry
        end
    end

    # Builds the query-string suffix from the instance settings
    # (@all, @from_timestamp, @to_timestamp) and the page index.
    def parameters_for_api page_index
        parameters = "&fl=timestamp,original&collapse=digest&matchType=domain&gzip=false"
        # Unless every capture was requested, keep only HTTP 200 responses.
        parameters += "&filter=statuscode:200" unless @all
        if @from_timestamp && @from_timestamp != 0
            parameters += "&from=" + @from_timestamp.to_s
        end
        if @to_timestamp && @to_timestamp != 0
            parameters += "&to=" + @to_timestamp.to_s
        end
        if page_index
            parameters += "&page=#{page_index}"
        end
        parameters
    end

end
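
The module above only returns the raw response text. Because the request asks for fl=timestamp,original, each non-empty line of that text should hold a timestamp and the original URL separated by a space. The following is a minimal sketch of how a caller might split that text into pairs. The helper name parse_cdx_lines and the sample string are hypothetical and are not part of the original paste.

```ruby
# Hypothetical helper (not in the original paste): turns the raw CDX text
# into an array of [timestamp, original_url] pairs.
def parse_cdx_lines(raw_list)
    raw_list.each_line.map(&:strip).reject(&:empty?).map do |line|
        # Split on the first space only, so the URL part stays in one piece.
        timestamp, original = line.split(' ', 2)
        [timestamp, original]
    end
end

# Made-up sample input for illustration; real output comes from the API.
sample = "20180101000000 http://example.com/\n" \
         "20180102000000 http://example.com/about\n"

parse_cdx_lines(sample).each do |timestamp, original|
    puts "#{timestamp} #{original}"
end
```

Splitting with a limit of 2 is a deliberate choice here: it keeps anything after the first space together as the URL field rather than discarding it.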