Guest User

Untitled

a guest
Nov 6th, 2013
93
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
Ruby 0.68 KB | None | 0 0
  1. def crawl(lin,level)
  2.   begin
  3.     if ((level <= 4) and !(lin.include? "javascript") and !(lin.to_s.include? "https") and (lin.to_s.include? "http://www.mysite.it") and (lin.include? "http") and filter_by_accent(lin))
  4.       puts "visiting link: " + lin
  5.       page = @agent.get(lin)
  6.       if (page.parser.xpath("//*[@id=\"headerTempoConsegna\"]/text()").to_s.include?("Tempo di consegna medio"))
  7.         @crawler.getDataFrom(lin,page)
  8.       else
  9.         page.links.each { |a|
  10.           crawl(a.href.to_s,(level+1))                
  11.         }
  12.      
  13.      
  14.       end
  15.  
  16.     end
  17.  
  18.   rescue Net::HTTPNotFound => e
  19.       puts "Exception raised... continue on another link"
  20.    
  21.   end
  22. end
Advertisement
Add Comment
Please, Sign In to add comment