Th3NiKo

PDF - EN and HR to text

Dec 9th, 2018
import fitz  # PyMuPDF 1.14.2

# Write the plain text of each PDF to a UTF-8 file: EN.pdf -> EN, HR.pdf -> HR.
# Mode "w" replaces any earlier output, so running the script twice does not
# append a second copy of the text.
for pdf_path, txt_path in (("EN.pdf", "EN"), ("HR.pdf", "HR")):
    doc = fitz.open(pdf_path)
    with open(txt_path, "w", encoding="utf-8") as f:
        for page in doc:
            # getText() is the 1.14.2 name; newer PyMuPDF calls it get_text()
            f.write(page.getText("text"))
    doc.close()
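The script above hard-codes two file pairs. Below is a minimal sketch for extracting every PDF in a folder, assuming each NAME.pdf should produce a text file called NAME in the same folder. The helper names (`find_pdfs`, `output_path_for`, `page_text`, `extract_all`) are hypothetical. `page_text` assumes only that a PyMuPDF page offers `get_text()` (newer releases) or `getText()` (1.14.2).

```python
import glob
import os


def find_pdfs(directory="."):
    """Return the PDF files in `directory`, sorted for a stable order."""
    # "*.pdf" is case-sensitive on most Linux file systems (EN.PDF is skipped).
    return sorted(glob.glob(os.path.join(directory, "*.pdf")))


def output_path_for(pdf_path):
    """Hypothetical naming rule: 'EN.pdf' -> 'EN', as in the script above."""
    return os.path.splitext(pdf_path)[0]


def page_text(page):
    """Plain text of one page, on both old and new PyMuPDF versions."""
    # Newer PyMuPDF releases provide get_text(); 1.14.2 provides getText().
    get_text = getattr(page, "get_text", None) or page.getText
    return get_text("text")


def extract_all(directory="."):
    """Write the text of every PDF in `directory` next to it, UTF-8 encoded."""
    import fitz  # PyMuPDF; imported here so the helpers above work without it

    for pdf_path in find_pdfs(directory):
        doc = fitz.open(pdf_path)
        try:
            with open(output_path_for(pdf_path), "w", encoding="utf-8") as f:
                for page in doc:
                    f.write(page_text(page))
        finally:
            doc.close()  # release the file even if writing fails
```

Calling `extract_all()` from the folder that holds EN.pdf and HR.pdf should write EN and HR, as the two-file script does. Importing `fitz` inside `extract_all` keeps the path helpers usable without PyMuPDF installed.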