Untitled

By: a guest on Jul 29th, 2012
require 'pdfkit'
require 'nokogiri'
require 'open-uri'

# Plain value object describing one favorited article.
class Article
  attr_reader :url, :title, :content, :blog, :author, :published, :tags, :comments, :favorites, :rating

  def initialize(options)
    @url       = options[:url]
    @title     = options[:title]
    @content   = options[:content]
    @blog      = options[:blog]
    @author    = options[:author]
    @published = options[:published]
    @tags      = options[:tags]
    @comments  = options[:comments]
    @favorites = options[:favorites]
    @rating    = options[:rating]
  end
end

# Scrapes the favorites listing of a single habrahabr.ru user.
class Habr
  attr_reader :user

  def initialize(user)
    raise 'User is not set' unless user

    @user = user
  end

  def url
    'http://habrahabr.ru'
  end

  def favorites_url
    "#{url}/users/#{user}/favorites/"
  end

  # Number of favorites pages, read from the last pager link and memoized.
  def pages
    @pages ||= begin
      document = Nokogiri::HTML(URI.open(favorites_url, 'User-Agent' => ''))

      document.css('#nav-pages a:last-child').last['href'].match(/(\d+)\/$/)[1].to_i
    end
  end

  # Every favorited article across all pages, as Article objects (memoized).
  def favorites
    @favorites ||= (1..pages).each_with_object([]) do |page, articles|
      document = Nokogiri::HTML(URI.open(favorites_url + "page#{page}/", 'User-Agent' => ''))

      document.css('.post').each do |article|
        header = article.css('.title')
        bottom = article.css('.infopanel')

        options = {
          url:       header.at_css('.post_title')['href'], # could it be simply #href here?
          title:     header.at_css('.post_title').text,
          content:   nil,
          blog:      header.at_css('.blog_title').text,
          author:    bottom.at_css('.author').text,
          published: bottom.at_css('.published').text,
          tags:      article.css('.tags li').map { |li| li.text }.join(', '),
          comments:  bottom.at_css('.comments .all').text,
          favorites: bottom.at_css('.favs_count').text,
          rating:    bottom.at_css('.score').text
        }

        articles << Article.new(options)
      end
    end
  end
end

username = 'gmile'
habr     = Habr.new(username)

puts "Fetching links from #{habr.pages} pages..."

# Collect the URL of every favorited article.
links = habr.favorites.map(&:url)

puts links.inspect

x = Nokogiri::HTML::Builder.new(:encoding => 'UTF-8') { |doc|
  doc.html {
    doc.head {
      doc.title "Interesting"
    }

    doc.body {
      # Only the second link is rendered here.
      links[1..1].each { |link|
        article = Nokogiri::HTML(URI.open(link, 'User-Agent' => ''))

        doc.h2 article.at_css('.title').text.strip
        doc.h4 article.at_css('.author').text.strip
        doc.h4 article.at_css('.published').text.strip
        doc.parent << article.at_css('.content')
      }
    }
  }
}

puts 'HTML generation is done'

PDFKit.new(x.doc.to_s, :page_size => 'A4').to_file('out.pdf')
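
# The page count above comes from the trailing number in the last pager link.
# A minimal sketch of that step on its own, so it can be checked without
# touching the network. The helper name last_page_number and the fallback
# to 1 when no number is found are assumptions, not part of the original script.

```ruby
# Hypothetical helper: pull the trailing page number out of a pager href
# such as "/users/gmile/favorites/page7/".
def last_page_number(href)
  match = href.to_s.match(%r{(\d+)/\z})
  # Assumption: a listing without a numbered pager link has a single page.
  match ? match[1].to_i : 1
end

last_page_number('/users/gmile/favorites/page7/') # => 7
last_page_number(nil)                             # => 1
```

# Guarding against a missing match avoids a NoMethodError on nil when the
# pager markup is absent or has changed.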