Typographic approximation to Latin-1

* Projection of Unicode to Latin-1 (typographic approximation)

* Given a Unicode character string:
* 1.) Apply a given replacement mapping, constructed by typographic similarity.
* 2.) Raise an exception if the projection result still contains non-Latin-1 characters.
*     The exception object has an attribute special_chars,
*     a list of all non-Latin-1 characters and their offsets in the string.
* 3.) The check for "only Latin-1" has been extracted into its own method
*     (check_only_latin1) so that it can be used elsewhere.
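
* Usage sketch for the steps above. Assumptions, not taken from the source:
* the methods are static methods of a hypothetical class ZCL_LATIN1,
* CV_STRING is a CHANGING parameter of type STRING, and the rows of
* special_chars have the components OFFSET, LENGTH and TEXT that
* extract_regex_results fills.
*   data(lv_text) = `“Quoted” – text`.
*   try.
*       zcl_latin1=>project_to_latin1( changing cv_string = lv_text ).
*       " With the default mapping, lv_text is now `"Quoted" - text`
*     catch zcx_special_chars into data(lx_special).
*       " Reached only if characters without a mapping remain
*       loop at lx_special->special_chars assigning field-symbol(<ls_char>).
*         data(lv_msg) = |{ <ls_char>-text } at offset { <ls_char>-offset }|.
*       endloop.
*   endtry.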

* The projection mapping is performed using the TRANSLATE statement,
* with a mapping string defined in the class constructor. The string
* consists of character pairs: each character is followed by its replacement.
* The current default is
*   gv_charmap_latin1 = `“"”"‘'’'‐-‑-‒-–-—-―-⸺-⸻-﹘-♯#♭b`.
* This attribute is deliberately changeable (it acts as a kind of parameter)
* so that mappings for further characters can be added.
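
* Extension sketch (assumption: gv_charmap_latin1 is a public static
* attribute of type STRING in a hypothetical class ZCL_LATIN1). Two further
* pairs map the low double quotation mark „ to " and the prime ′ to ':
*   zcl_latin1=>gv_charmap_latin1 = zcl_latin1=>gv_charmap_latin1 && `„"′'`.
* TRANSLATE ... USING replaces character by character, so a one-to-many
* replacement such as … to ... cannot be expressed in this string.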

* The Latin-1 test is based on the special character class
* [:unicode:]
* for regular expressions. This class contains all characters
* with a Unicode code point > 255,
* which are precisely the non-Latin-1 characters.
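
* Illustration of the character class (a sketch; the variable names are
* made up for this example):
*   data(lv_latin1) = `Größe`.
*   find regex `[[:unicode:]]` in lv_latin1.
*   " sy-subrc = 4: ö (U+00F6) and ß (U+00DF) are Latin-1
*   data(lv_other) = `10 €`.
*   find regex `[[:unicode:]]` in lv_other.
*   " sy-subrc = 0: € is U+20AC, beyond code point 255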

method project_to_latin1.

* Performance: first check whether there are any non-Latin-1
* characters at all; if there are none, leave the method immediately
  find regex `[[:unicode:]]` in cv_string.
  check sy-subrc eq 0.

* Mapping rules for single characters using "TRANSLATE"
  if gv_charmap_latin1 is not initial.
    translate cv_string using gv_charmap_latin1.
  endif.

* Check whether there are still non-Latin-1 characters.
* If so, check_only_latin1 raises an exception with a list of all
* (offset, text) pairs of the non-Latin-1 characters found
  check_only_latin1( cv_string ).

endmethod.

method check_only_latin1.

* Check that the clike data object IV_TEXT contains only Latin-1 characters.
* Character class [:unicode:] matches all characters with a code point > 255,
* i.e. all non-Latin-1 characters
  find all occurrences of regex `[[:unicode:]]`
    in iv_text
    results data(lt_results).
  if sy-subrc eq 0.
* Found non-Latin-1 characters
* raise exception ZCX_SPECIAL_CHARS
* attribute SPECIAL_CHARS contains a list of all non-Latin-1 chars and their offset
    data(lt_chars) = extract_regex_results(
      iv_string  = iv_text
      it_results = lt_results ).
    raise exception type zcx_special_chars
      exporting
        special_chars = lt_chars.
  endif.

endmethod.

method extract_regex_results.
* Extract the strings actually found
* from the result set IT_RESULTS of a regex search.
* IV_SUBMATCH = 0 selects the whole match, n > 0 the n-th submatch
  loop at it_results assigning field-symbol(<ls_res>).
    if iv_submatch = 0.
      data(lv_off) = <ls_res>-offset.
      data(lv_len) = <ls_res>-length.
    else.
      lv_off = <ls_res>-submatches[ iv_submatch ]-offset.
      lv_len = <ls_res>-submatches[ iv_submatch ]-length.
    endif.
    append value #(
      offset = lv_off
      length = lv_len
      text   = iv_string+lv_off(lv_len) )
      to et_results.
  endloop.
endmethod.