Split wrong words in Calibre (for the French language)

a guest, Sep 21st, 2016
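import regex
from calibre import replace_entities, prepare_string_for_xml

# Function-mode callback for calibre's Search & Replace.
# The search expression is not part of this paste; the return value below
# implies that it captures the text between '>' and '<' in group 1.
def replace(match, number, file_name, metadata, dictionaries, data, functions, *args, **kwargs):
    def fix_word(m):
        word = m.group()
        # Leave words the spell-check dictionaries already accept.
        if dictionaries.recognized(word):
            return word
        # Try each split point and keep the first one that yields two known words.
        # range (not xrange) so the loop also runs under Python 3.
        for i in range(1, len(word) - 1):
            a, b = word[:i], word[i:]
            if dictionaries.recognized(a) and dictionaries.recognized(b):
                return a + ' ' + b
        # Otherwise detach a French elided form (l', d', qu', jusqu', ...)
        # that is glued to the preceding word.
        elided = regex.match(r"(\w+)((?:[dlnmts]|qu(?:oi|el)qu|puisqu|lorsqu|jusqu|qu)[’'`]\w+)", word)
        if elided:
            return elided.group(1) + " " + elided.group(2)
        return word
    # Decode entities so that words are matched as plain text.
    text = replace_entities(match.group(1))
    text = regex.sub(r"\b\w(?:[\w’'`-]*\w|\w+)\b", fix_word, text, flags=regex.VERSION1)
    # Re-escape for XML and restore the delimiters consumed by the match.
    text = prepare_string_for_xml(text)
    return '>' + text + '<'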
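
The word-splitting logic can be tried outside Calibre with a small stand-in for the dictionaries object. What follows is only a sketch under stated assumptions: `StubDictionaries`, `fix_text`, and the word list are hypothetical names introduced for illustration, and the standard-library `re` module is swapped in for `regex` so that nothing needs installing. Only the `recognized()` method used in the paste is imitated.

```python
import re

class StubDictionaries:
    """Hypothetical stand-in for the `dictionaries` argument (assumption)."""
    def __init__(self, words):
        self.words = {w.lower() for w in words}

    def recognized(self, word):
        return word.lower() in self.words

# Same patterns as in the paste, compiled with the stdlib `re` module.
ELISION = re.compile(r"(\w+)((?:[dlnmts]|qu(?:oi|el)qu|puisqu|lorsqu|jusqu|qu)[’'`]\w+)")
WORD = re.compile(r"\b\w(?:[\w’'`-]*\w|\w+)\b")

def fix_word(word, dictionaries):
    # Known words are returned unchanged.
    if dictionaries.recognized(word):
        return word
    # First split point that produces two known words wins.
    for i in range(1, len(word) - 1):
        a, b = word[:i], word[i:]
        if dictionaries.recognized(a) and dictionaries.recognized(b):
            return a + ' ' + b
    # Fall back to detaching an elided form such as l' or qu'.
    m = ELISION.match(word)
    if m:
        return m.group(1) + ' ' + m.group(2)
    return word

def fix_text(text, dictionaries):
    # Apply fix_word to every word-like token in the text.
    return WORD.sub(lambda m: fix_word(m.group(), dictionaries), text)

if __name__ == "__main__":
    d = StubDictionaries(["le", "chat", "dort", "mange"])
    print(fix_text("lechat dort", d))    # le chat dort
    print(fix_text("mangel'herbe", d))   # mange l'herbe
```

The result depends entirely on the word list: a split is only made when both halves are recognized, so a richer dictionary changes which joined words get separated.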