MyNamesNotReallyDave

u/aggiefury101

Jan 22nd, 2020
#! python

"""Simple script to iterate over a csv file of URLs and convert those pages to PDF.
Created for u/aggiefury101 on Reddit by u/MyNamesNotReallyDave.
Tested with a csv file of randomly generated Google searches ([0,https://www.google.com/search?q=833]).
"""

# Necessary imports; dependent on wkhtmltopdf being installed
import csv
import pdfkit

# Additionally import os for filepath management
import os

# Input file assumes the csv is in the same folder; provide the absolute path if necessary
INPUT_FILE = 'urls.csv'

# Downloaded PDFs will be stored in the 'PDFs' sub-directory; provide the absolute path if necessary
OUTPUT_PATH = os.path.abspath('PDFs')

# Create the output directory if it does not already exist
os.makedirs(OUTPUT_PATH, exist_ok=True)

# Context manager for the input file (newline='' as the csv module recommends)
with open(INPUT_FILE, 'r', newline='') as file:

    # Create the csv reader object
    reader = csv.reader(file, delimiter=',')

    # Iterate over each row in the csv
    for row in reader:

        # Column 1 holds the URL; change the index as required
        url = row[1]

        # Unique filename from the last three characters of the URL;
        # the unique part of the URL is recommended, but a counter etc. would also work
        output_filename = os.path.join(OUTPUT_PATH, f'{url[-3:]}.pdf')

        # Run the pdfkit converter on the given URL
        pdfkit.from_url(url, output_filename)
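One caveat with naming files from the last three characters of the URL: any two URLs that end the same way silently overwrite each other. A minimal sketch of a sturdier alternative, hashing the full URL so each distinct URL gets a distinct, filesystem-safe name (the `pdf_filename` helper is an illustration, not part of the original script):

```python
import hashlib

def pdf_filename(url: str) -> str:
    """Derive a stable, collision-resistant PDF filename from a URL."""
    # Short SHA-1 digest of the full URL; the same URL always maps
    # to the same name, and distinct URLs virtually never collide
    digest = hashlib.sha1(url.encode('utf-8')).hexdigest()[:10]
    return f'{digest}.pdf'

# Drop-in replacement inside the loop above:
#     output_filename = os.path.join(OUTPUT_PATH, pdf_filename(url))
print(pdf_filename('https://www.google.com/search?q=833'))
```

Hex digests contain only `[0-9a-f]`, so the result is safe on any filesystem, unlike raw URL fragments that may contain `?`, `/`, or `:`.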