szabozoltan69

scrape_pdfs.log

Sep 8th, 2021 (edited)
2021-09-06 02:10:05 INFO     Starting PDF scraping.
[2021-09-06 02:10:05,767] scrape_pdfs: INFO - Starting PDF scraping.
2021-09-06 02:10:05 INFO     Getting document list.
[2021-09-06 02:10:05,769] scrape_pdfs: INFO - Getting document list.
2021-09-06 02:10:06 INFO     Count of new epoa documents: 0
[2021-09-06 02:10:06,239] scrape_pdfs: INFO - Count of new epoa documents: 0
2021-09-06 02:10:06 INFO     Starting to process PDFs.
[2021-09-06 02:10:06,239] scrape_pdfs: INFO - Starting to process PDFs.
2021-09-06 02:10:06 INFO     Getting document list.
[2021-09-06 02:10:06,240] scrape_pdfs: INFO - Getting document list.
2021-09-06 02:10:06 INFO     Count of new ou documents: 0
[2021-09-06 02:10:06,520] scrape_pdfs: INFO - Count of new ou documents: 0
2021-09-06 02:10:06 INFO     Starting to process PDFs.
[2021-09-06 02:10:06,521] scrape_pdfs: INFO - Starting to process PDFs.
2021-09-06 02:10:06 INFO     Getting document list.
[2021-09-06 02:10:06,521] scrape_pdfs: INFO - Getting document list.
2021-09-06 02:10:06 INFO     Count of new fr documents: 0
[2021-09-06 02:10:06,753] scrape_pdfs: INFO - Count of new fr documents: 0
2021-09-06 02:10:06 INFO     Starting to process PDFs.
[2021-09-06 02:10:06,753] scrape_pdfs: INFO - Starting to process PDFs.
2021-09-06 02:10:06 INFO     Getting document list.
[2021-09-06 02:10:06,753] scrape_pdfs: INFO - Getting document list.
2021-09-06 02:10:07 INFO     Count of new ea documents: 0
[2021-09-06 02:10:07,010] scrape_pdfs: INFO - Count of new ea documents: 0
2021-09-06 02:10:07 INFO     Starting to process PDFs.
[2021-09-06 02:10:07,013] scrape_pdfs: INFO - Starting to process PDFs.
2021-09-06 02:10:07 INFO     Processing PDFs finished.
[2021-09-06 02:10:07,014] scrape_pdfs: INFO - Processing PDFs finished.
2021-09-06 02:10:07 INFO     Starting data-cleaning.
[2021-09-06 02:10:07,014] scrape_pdfs: INFO - Starting data-cleaning.
2021-09-06 02:10:07 INFO     Adding new EPoA records to DB (count: 0)
[2021-09-06 02:10:07,015] scrape_pdfs: INFO - Adding new EPoA records to DB (count: 0)
2021-09-06 02:10:07 INFO     Adding new OU records to DB (count: 0)
[2021-09-06 02:10:07,015] scrape_pdfs: INFO - Adding new OU records to DB (count: 0)
2021-09-06 02:10:07 INFO     Adding new FR records to DB (count: 0)
[2021-09-06 02:10:07,015] scrape_pdfs: INFO - Adding new FR records to DB (count: 0)
2021-09-06 02:10:07 INFO     Adding new EA records to DB (count: 0)
[2021-09-06 02:10:07,016] scrape_pdfs: INFO - Adding new EA records to DB (count: 0)
2021-09-06 02:10:07 INFO     Finished the PDF scraping.
[2021-09-06 02:10:07,031] scrape_pdfs: INFO - Finished the PDF scraping.
2021-09-07 02:10:07 INFO     Starting PDF scraping.
[2021-09-07 02:10:07,870] scrape_pdfs: INFO - Starting PDF scraping.
2021-09-07 02:10:07 INFO     Getting document list.
[2021-09-07 02:10:07,871] scrape_pdfs: INFO - Getting document list.
2021-09-07 02:10:08 WARNING  instance does not have an es_id() method
[2021-09-07 02:10:08,305] elasticsearch: WARNING - instance does not have an es_id() method
2021-09-07 02:10:08 WARNING  instance does not have an es_id() method
[2021-09-07 02:10:08,308] elasticsearch: WARNING - instance does not have an es_id() method
Traceback (most recent call last):
  File "/home/ifrc/go-api/manage.py", line 22, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/home/ifrc/go-api/api/management/commands/scrape_pdfs.py", line 621, in handle
    urls_with_filenames = self.get_documents(pdf_type)
  File "/home/ifrc/go-api/api/management/commands/scrape_pdfs.py", line 135, in get_documents
    return get_documents_for(TYPE_URLS[pdf_type], pdf_type, db_set)
  File "/home/ifrc/go-api/api/management/commands/scrape_pdfs.py", line 112, in get_documents_for
    items = xmltodict.parse(response.content)['rss']['channel']['item']
  File "/usr/local/lib/python3.6/site-packages/xmltodict.py", line 330, in parse
    parser.Parse(xml_input, True)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 101, column 76
2021-09-08 02:10:06 INFO     Starting PDF scraping.
[2021-09-08 02:10:06,319] scrape_pdfs: INFO - Starting PDF scraping.
2021-09-08 02:10:06 INFO     Getting document list.
[2021-09-08 02:10:06,320] scrape_pdfs: INFO - Getting document list.
Traceback (most recent call last):
  File "/home/ifrc/go-api/manage.py", line 22, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/home/ifrc/go-api/api/management/commands/scrape_pdfs.py", line 621, in handle
    urls_with_filenames = self.get_documents(pdf_type)
  File "/home/ifrc/go-api/api/management/commands/scrape_pdfs.py", line 135, in get_documents
    return get_documents_for(TYPE_URLS[pdf_type], pdf_type, db_set)
  File "/home/ifrc/go-api/api/management/commands/scrape_pdfs.py", line 112, in get_documents_for
    items = xmltodict.parse(response.content)['rss']['channel']['item']
  File "/usr/local/lib/python3.6/site-packages/xmltodict.py", line 330, in parse
    parser.Parse(xml_input, True)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 101, column 76
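
Both tracebacks end in the same place: the XML parse of the feed response inside get_documents_for raises xml.parsers.expat.ExpatError, and the uncaught exception aborts the whole run. The following is a minimal sketch of a well-formedness pre-check, assuming the feed body is available as bytes. The function name check_feed_well_formed and its feed_name parameter are hypothetical and not taken from scrape_pdfs.py, and the sketch calls the standard-library expat parser directly instead of going through xmltodict.

```python
import logging
from xml.parsers import expat

logger = logging.getLogger("scrape_pdfs")


def check_feed_well_formed(xml_bytes, feed_name):
    """Return True if xml_bytes is well-formed XML, otherwise log and return False.

    Hypothetical helper for illustration; feed_name is only used in the log line.
    """
    parser = expat.ParserCreate()
    try:
        # Second argument True marks this as the final (only) chunk of input.
        parser.Parse(xml_bytes, True)
    except expat.ExpatError as exc:
        # exc.lineno and exc.offset are the position reported in the message,
        # e.g. "line 101, column 76" in the log above.
        logger.error(
            "Feed %s is not well-formed XML (line %d, column %d): %s",
            feed_name,
            exc.lineno,
            exc.offset,
            expat.ErrorString(exc.code),
        )
        return False
    return True
```

A caller could use such a check to skip one broken feed and continue with the remaining document types, rather than letting a single malformed response stop the nightly job. Whether skipping is acceptable depends on how the scraper is expected to behave when a feed is unavailable.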