rplantiko

Typographic approximation to Latin-1

Feb 10th, 2021 (edited)
ABAP
* Projection of Unicode to Latin-1 (typographic approximation)

* Given a Unicode string,
* 1.) apply a given replacement mapping, constructed by typographic similarity
* 2.) raise an exception if non-Latin-1 characters remain in the result;
*     the exception object has an attribute special_chars,
*     a list of all remaining non-Latin-1 characters with their offsets in the string
* 3.) The check for "only Latin-1" has been extracted into a method of its own
*     (check_only_latin1), so that it can be used elsewhere.
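
* The paste shows only the method implementations. As a sketch, a class
* definition consistent with how the methods are called could look as
* follows. The class name ZCL_LATIN1_PROJECTION, the use of static methods,
* the types TY_SPECIAL_CHAR / TT_SPECIAL_CHARS and all parameter typings
* are assumptions, not the author's declarations; the exception attribute
* SPECIAL_CHARS is assumed to have type TT_SPECIAL_CHARS.
class zcl_latin1_projection definition public create public.
  public section.
* Hypothetical row type for the (offset, length, text) triples
    types:
      begin of ty_special_char,
        offset type i,
        length type i,
        text   type string,
      end of ty_special_char,
      tt_special_chars type standard table of ty_special_char
                       with empty key.
* Changeable mapping string, set in the class constructor
    class-data gv_charmap_latin1 type string.
    class-methods class_constructor.
    class-methods project_to_latin1
      changing  cv_string type string
      raising   zcx_special_chars.
    class-methods check_only_latin1
      importing iv_text type clike
      raising   zcx_special_chars.
* RETURNING, because CHECK_ONLY_LATIN1 calls it functionally;
* IV_SUBMATCH has a default, because that call omits it
    class-methods extract_regex_results
      importing iv_string         type clike
                it_results        type match_result_tab
                iv_submatch       type i default 0
      returning value(et_results) type tt_special_chars.
endclass.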

* The projection is performed with the TRANSLATE statement,
* using a mapping string set in the class constructor.
* TRANSLATE ... USING reads this string in pairs: each character at an
* odd position is replaced by the character that follows it.
* Currently, we have
*   gv_charmap_latin1 = `“"”"‘'’'‐-‑-‒-–-—-―-⸺-⸻-﹘-♯#♭b`.
* This attribute has been made changeable (as a kind of parameter)
* so that further characters can be added.
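
* A small demonstration of the pairwise reading of the TRANSLATE mask.
* The sample text and the added pair `•·` (bullet U+2022 to middle dot
* U+00B7, which is in Latin-1) are hypothetical, for illustration only.
* Appending such a pair is one way the mapping string could be extended.
data(lv_demo) = `“Quote” – ♯ •`.
translate lv_demo using `“"”"–-♯#`.
* lv_demo is now `"Quote" - # •`: the bullet has no pair yet
data(lv_map) = `“"”"–-♯#` && `•·`.
translate lv_demo using lv_map.
* lv_demo is now `"Quote" - # ·`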

* The Latin-1 test is based on the special character class
* [:unicode:]
* of ABAP regular expressions. This class contains all characters
* with a Unicode code point > 255,
* which are precisely the non-Latin-1 characters.
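
* For illustration (sample text hypothetical): ü (U+00FC) and ß (U+00DF)
* lie inside Latin-1 and are not matched, whereas the en dash (U+2013)
* and the euro sign (U+20AC, not part of ISO 8859-1) are matched.
data(lv_sample) = `Grüße – 10 €`.
find all occurrences of regex `[[:unicode:]]`
  in lv_sample
  results data(lt_hits).
* lt_hits holds two matches, at offsets 6 (–) and 11 (€)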

method project_to_latin1.

* Performance: first check whether there are any non-Latin-1
* characters at all; if not, CHECK leaves the method immediately
  find regex `[[:unicode:]]` in cv_string.
  check sy-subrc eq 0.

* Mapping rules for single characters using TRANSLATE
  if gv_charmap_latin1 is not initial.
    translate cv_string using gv_charmap_latin1.
  endif.

* Check whether non-Latin-1 characters remain.
* If so, raise an exception with a list of all pairs (offset, text)
* of the non-Latin-1 characters found
  check_only_latin1( cv_string ).

endmethod.
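
* A possible caller, sketched under assumptions: a hypothetical class
* ZCL_LATIN1_PROJECTION with static methods, CV_STRING passed by reference,
* and a public attribute SPECIAL_CHARS of ZCX_SPECIAL_CHARS holding
* (offset, length, text) rows. Since TRANSLATE replaces character by
* character, the reported offsets are also valid for the original string.
data(lv_text) = `“Grüße” – 10 €`.
try.
    zcl_latin1_projection=>project_to_latin1( changing cv_string = lv_text ).
  catch zcx_special_chars into data(lx_chars).
* lv_text is now `"Grüße" - 10 €`; the euro sign has no mapping,
* so SPECIAL_CHARS holds one row: offset 13, text `€`
    loop at lx_chars->special_chars assigning field-symbol(<ls_char>).
      write: / <ls_char>-offset, <ls_char>-text.
    endloop.
endtry.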

method check_only_latin1.

* Check whether a clike data object given in IV_TEXT contains only Latin-1 characters.
* The character class [:unicode:] is the set of all characters with a
* Unicode code point > 255, i.e. of all non-Latin-1 characters
  find all occurrences of regex `[[:unicode:]]`
    in iv_text
    results data(lt_results).
  if sy-subrc eq 0.
* Found non-Latin-1 characters:
* raise exception ZCX_SPECIAL_CHARS, whose attribute SPECIAL_CHARS
* contains a list of all non-Latin-1 characters and their offsets
    data(lt_chars) = extract_regex_results(
      iv_string  = iv_text
      it_results = lt_results ).
    raise exception type zcx_special_chars
      exporting
        special_chars = lt_chars.
  endif.

endmethod.

method extract_regex_results.
* Extract the substrings actually found
* from the result set IT_RESULTS of a regex search
* (IV_SUBMATCH = 0: the whole match, n > 0: the n-th submatch)
  loop at it_results assigning field-symbol(<ls_res>).
    if iv_submatch = 0.
      data(lv_off) = <ls_res>-offset.
      data(lv_len) = <ls_res>-length.
    else.
      lv_off = <ls_res>-submatches[ iv_submatch ]-offset.
      lv_len = <ls_res>-submatches[ iv_submatch ]-length.
    endif.
    append value #(
      offset = lv_off
      length = lv_len
      text   = iv_string+lv_off(lv_len) )
    to et_results.
  endloop.
endmethod.
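
* How IV_SUBMATCH selects a submatch, shown with the same offset/length
* logic as the loop in EXTRACT_REGEX_RESULTS on hypothetical sample data:
* with the pattern (\w+)=(\w+), submatch 2 is the value part of each
* key=value pair.
data(lv_line) = `a=1;bb=22`.
find all occurrences of regex `(\w+)=(\w+)`
  in lv_line
  results data(lt_matches).
data lt_values type string_table.
loop at lt_matches assigning field-symbol(<ls_match>).
  data(lv_off) = <ls_match>-submatches[ 2 ]-offset.
  data(lv_len) = <ls_match>-submatches[ 2 ]-length.
  append lv_line+lv_off(lv_len) to lt_values.
endloop.
* lt_values now holds `1` and `22`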