- 2021-09-06 02:10:05 INFO Starting PDF scraping.
- [2021-09-06 02:10:05,767] scrape_pdfs: INFO - Starting PDF scraping.
- 2021-09-06 02:10:05 INFO Getting document list.
- [2021-09-06 02:10:05,769] scrape_pdfs: INFO - Getting document list.
- 2021-09-06 02:10:06 INFO Count of new epoa documents: 0
- [2021-09-06 02:10:06,239] scrape_pdfs: INFO - Count of new epoa documents: 0
- 2021-09-06 02:10:06 INFO Starting to process PDFs.
- [2021-09-06 02:10:06,239] scrape_pdfs: INFO - Starting to process PDFs.
- 2021-09-06 02:10:06 INFO Getting document list.
- [2021-09-06 02:10:06,240] scrape_pdfs: INFO - Getting document list.
- 2021-09-06 02:10:06 INFO Count of new ou documents: 0
- [2021-09-06 02:10:06,520] scrape_pdfs: INFO - Count of new ou documents: 0
- 2021-09-06 02:10:06 INFO Starting to process PDFs.
- [2021-09-06 02:10:06,521] scrape_pdfs: INFO - Starting to process PDFs.
- 2021-09-06 02:10:06 INFO Getting document list.
- [2021-09-06 02:10:06,521] scrape_pdfs: INFO - Getting document list.
- 2021-09-06 02:10:06 INFO Count of new fr documents: 0
- [2021-09-06 02:10:06,753] scrape_pdfs: INFO - Count of new fr documents: 0
- 2021-09-06 02:10:06 INFO Starting to process PDFs.
- [2021-09-06 02:10:06,753] scrape_pdfs: INFO - Starting to process PDFs.
- 2021-09-06 02:10:06 INFO Getting document list.
- [2021-09-06 02:10:06,753] scrape_pdfs: INFO - Getting document list.
- 2021-09-06 02:10:07 INFO Count of new ea documents: 0
- [2021-09-06 02:10:07,010] scrape_pdfs: INFO - Count of new ea documents: 0
- 2021-09-06 02:10:07 INFO Starting to process PDFs.
- [2021-09-06 02:10:07,013] scrape_pdfs: INFO - Starting to process PDFs.
- 2021-09-06 02:10:07 INFO Processing PDFs finished.
- [2021-09-06 02:10:07,014] scrape_pdfs: INFO - Processing PDFs finished.
- 2021-09-06 02:10:07 INFO Starting data-cleaning.
- [2021-09-06 02:10:07,014] scrape_pdfs: INFO - Starting data-cleaning.
- 2021-09-06 02:10:07 INFO Adding new EPoA records to DB (count: 0)
- [2021-09-06 02:10:07,015] scrape_pdfs: INFO - Adding new EPoA records to DB (count: 0)
- 2021-09-06 02:10:07 INFO Adding new OU records to DB (count: 0)
- [2021-09-06 02:10:07,015] scrape_pdfs: INFO - Adding new OU records to DB (count: 0)
- 2021-09-06 02:10:07 INFO Adding new FR records to DB (count: 0)
- [2021-09-06 02:10:07,015] scrape_pdfs: INFO - Adding new FR records to DB (count: 0)
- 2021-09-06 02:10:07 INFO Adding new EA records to DB (count: 0)
- [2021-09-06 02:10:07,016] scrape_pdfs: INFO - Adding new EA records to DB (count: 0)
- 2021-09-06 02:10:07 INFO Finished the PDF scraping.
- [2021-09-06 02:10:07,031] scrape_pdfs: INFO - Finished the PDF scraping.
- 2021-09-07 02:10:07 INFO Starting PDF scraping.
- [2021-09-07 02:10:07,870] scrape_pdfs: INFO - Starting PDF scraping.
- 2021-09-07 02:10:07 INFO Getting document list.
- [2021-09-07 02:10:07,871] scrape_pdfs: INFO - Getting document list.
- 2021-09-07 02:10:08 WARNING instance does not have an es_id() method
- [2021-09-07 02:10:08,305] elasticsearch: WARNING - instance does not have an es_id() method
- 2021-09-07 02:10:08 WARNING instance does not have an es_id() method
- [2021-09-07 02:10:08,308] elasticsearch: WARNING - instance does not have an es_id() method
- Traceback (most recent call last):
- File "/home/ifrc/go-api/manage.py", line 22, in <module>
- execute_from_command_line(sys.argv)
- File "/usr/local/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
- utility.execute()
- File "/usr/local/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
- self.fetch_command(subcommand).run_from_argv(self.argv)
- File "/usr/local/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
- self.execute(*args, **cmd_options)
- File "/usr/local/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
- output = self.handle(*args, **options)
- File "/home/ifrc/go-api/api/management/commands/scrape_pdfs.py", line 621, in handle
- urls_with_filenames = self.get_documents(pdf_type)
- File "/home/ifrc/go-api/api/management/commands/scrape_pdfs.py", line 135, in get_documents
- return get_documents_for(TYPE_URLS[pdf_type], pdf_type, db_set)
- File "/home/ifrc/go-api/api/management/commands/scrape_pdfs.py", line 112, in get_documents_for
- items = xmltodict.parse(response.content)['rss']['channel']['item']
- File "/usr/local/lib/python3.6/site-packages/xmltodict.py", line 330, in parse
- parser.Parse(xml_input, True)
- xml.parsers.expat.ExpatError: not well-formed (invalid token): line 101, column 76
- 2021-09-08 02:10:06 INFO Starting PDF scraping.
- [2021-09-08 02:10:06,319] scrape_pdfs: INFO - Starting PDF scraping.
- 2021-09-08 02:10:06 INFO Getting document list.
- [2021-09-08 02:10:06,320] scrape_pdfs: INFO - Getting document list.
- Traceback (most recent call last):
- File "/home/ifrc/go-api/manage.py", line 22, in <module>
- execute_from_command_line(sys.argv)
- File "/usr/local/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
- utility.execute()
- File "/usr/local/lib/python3.6/site-packages/django/core/management/__init__.py", line 375, in execute
- self.fetch_command(subcommand).run_from_argv(self.argv)
- File "/usr/local/lib/python3.6/site-packages/django/core/management/base.py", line 323, in run_from_argv
- self.execute(*args, **cmd_options)
- File "/usr/local/lib/python3.6/site-packages/django/core/management/base.py", line 364, in execute
- output = self.handle(*args, **options)
- File "/home/ifrc/go-api/api/management/commands/scrape_pdfs.py", line 621, in handle
- urls_with_filenames = self.get_documents(pdf_type)
- File "/home/ifrc/go-api/api/management/commands/scrape_pdfs.py", line 135, in get_documents
- return get_documents_for(TYPE_URLS[pdf_type], pdf_type, db_set)
- File "/home/ifrc/go-api/api/management/commands/scrape_pdfs.py", line 112, in get_documents_for
- items = xmltodict.parse(response.content)['rss']['channel']['item']
- File "/usr/local/lib/python3.6/site-packages/xmltodict.py", line 330, in parse
- parser.Parse(xml_input, True)
- xml.parsers.expat.ExpatError: not well-formed (invalid token): line 101, column 76