demondownload

Scrape Given and Family Names from Wikipedia

Feb 8th, 2023
require 'uri'
require 'net/http'
require 'json'

API_URL = 'https://en.wikipedia.org/w/api.php'

# Appends the first word of every page title in a Wikipedia category to
# filepath, following the API's continuation tokens until none remain.
def fetch_names(filepath:, category:, query: {})
  loop do
    uri = URI.parse(API_URL)
    uri.query = URI.encode_www_form({
      action: 'query',
      list: 'categorymembers',
      cmtitle: category,
      cmlimit: '500',
      cmtype: 'page',
      format: 'json'
    }.merge(query))

    req = Net::HTTP::Get.new(uri)
    res = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) { |http| http.request(req) }
    raise "Request failed: #{res.code} #{res.message}" unless res.is_a?(Net::HTTPSuccess)

    body = JSON.parse(res.body)
    members = body.dig('query', 'categorymembers') || []

    # Keep only the text before the first space, which also drops
    # disambiguation suffixes such as " (given name)".
    # Note: uniq only removes duplicates within this batch of up to 500 titles.
    names = members.map { |member| member['title'].sub(/ .*/, '') }.uniq
    File.open(filepath, 'a') { |file| file.puts(names) } unless names.empty?

    # The API includes a 'continue' object while more results remain;
    # its parameters are sent back unchanged with the next request.
    continuation = body['continue']
    break unless continuation

    query = query.merge(continuation)
  end
end

fetch_names(filepath: './given_names.txt', category: 'Category:Given_names')
fetch_names(filepath: './surnames.txt', category: 'Category:Surnames')
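The script opens its output files in append mode and calls uniq on one batch of titles at a time, so the same name can appear more than once across batches or across repeated runs. A minimal sketch of a clean-up pass follows. The helper name dedupe_names is hypothetical and not part of the original paste, and sorting the output is an assumption about what is wanted.

```ruby
# Hypothetical helper, not part of the original paste: rewrites a file of
# names so that each non-empty line appears once, in sorted order.
# Returns the number of distinct names written.
def dedupe_names(filepath)
  names = File.readlines(filepath, chomp: true).reject(&:empty?).uniq.sort
  File.write(filepath, names.map { |name| "#{name}\n" }.join)
  names.size
end
```

It could be run once after both fetch_names calls finish, for example dedupe_names('./given_names.txt') followed by dedupe_names('./surnames.txt').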