Oct 20th, 2018
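#!/usr/bin/env python3
"""
Convert STDIN to UTF-8
based on character encoding detection
"""

import itertools
import sys

from chardet.universaldetector import UniversalDetector

# Work on raw bytes: chardet needs undecoded input, and the output
# must be written as already-encoded UTF-8.
stdin = sys.stdin.buffer
stdout = sys.stdout.buffer

detector = UniversalDetector()
lines = []
for line in stdin:
    # Buffer every line consumed during detection so it can be replayed.
    lines.append(line)
    detector.feed(line)
    if detector.done:
        break
detector.close()

# Report the detection result without polluting the converted output.
print(detector.result, file=sys.stderr)

# chardet reports None when it cannot decide; fall back to UTF-8.
encoding = detector.result['encoding'] or 'utf-8'
# Replay the buffered lines, then stream whatever is left on STDIN.
for line in itertools.chain(lines, stdin):
    converted = line.decode(encoding, 'replace').encode('utf-8')
    stdout.write(converted)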