- * Projection of Unicode to Latin-1 (typographic approximation)
- * Given a string of Unicode characters,
- * 1.) apply a given replacement mapping, constructed by typographic similarity
- * 2.) raise an exception if there still are non-Latin-1 characters in the projection result
- * the exception object has an attribute special_chars,
- * consisting of a list of all non-Latin-1 characters and their offset in the string
- * 3.) The method which checks for "only Latin-1" has been extracted
- * so that it can be used elsewhere (method check_only_latin1).
- * The projection mapping is performed using the TRANSLATE statement,
- * with a mapping string defined in the class constructor.
- * Currently, we have
- * gv_charmap_latin1 = `“"”"‘'’'‐-‑-‒-–-—-―-⸺-⸻-﹘-♯#♭b`.
- * This attribute has been made changeable (as a kind of parameter)
- * so that it can be extended by further characters.
- * The Latin-1 test is based on the special character class
- * [:unicode:]
- * for regular expressions. This class contains all characters
- * with a Unicode code point > 255,
- * which are precisely the non-Latin-1 characters.
- method project_to_latin1.
- * Performance: it is worth checking first whether
- * there are any non-Latin-1 characters at all;
- * if there are none, the method is left immediately
- find regex `[[:unicode:]]` in cv_string.
- check sy-subrc eq 0.
- * Mapping rules for single characters using "TRANSLATE"
- if gv_charmap_latin1 is not initial.
- translate cv_string using gv_charmap_latin1.
- endif.
- * Check whether there are still non-Latin-1 characters
- * If yes, raise exception with a list of all pairs (offset, text)
- * of non-Latin-1 characters found
- check_only_latin1( cv_string ).
- endmethod.
- method check_only_latin1.
- * Check if a clike data object given in IV_TEXT contains only Latin-1 characters
- * Character class [:unicode:] is the set of all chars with a Unicode code point > 255 = all non-Latin-1 chars
- find all occurrences of regex `[[:unicode:]]`
- in iv_text
- results data(lt_results).
- if sy-subrc eq 0.
- * Found non-Latin-1 characters
- * raise exception ZCX_SPECIAL_CHARS
- * attribute SPECIAL_CHARS contains a list of all non-Latin-1 chars and their offset
- data(lt_chars) = extract_regex_results(
- iv_string = iv_text
- it_results = lt_results ).
- raise exception type zcx_special_chars
- exporting
- special_chars = lt_chars.
- endif.
- endmethod.
- method extract_regex_results.
- * Extract the strings actually found
- * from the result set IT_RESULTS of a regex search.
- * IV_SUBMATCH = 0 takes the whole match, otherwise the submatch with that index
- loop at it_results assigning field-symbol(<ls_res>).
- if iv_submatch = 0.
- data(lv_off) = <ls_res>-offset.
- data(lv_len) = <ls_res>-length.
- else.
- lv_off = <ls_res>-submatches[ iv_submatch ]-offset.
- lv_len = <ls_res>-submatches[ iv_submatch ]-length.
- endif.
- append value #(
- offset = lv_off
- length = lv_len
- text = iv_string+lv_off(lv_len) )
- to et_results.
- endloop.
- endmethod.
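A minimal sketch of how a caller might use the projection and evaluate the exception. The class name `zcl_latin1_projection`, the static call style, and `cv_string` being a CHANGING parameter are assumptions; the listing above only shows the method implementations, not the class definition. The method, parameter, exception, and attribute names are taken from the listing.

```abap
* Hypothetical caller: ZCL_LATIN1_PROJECTION and the static call are assumed,
* the class definition is not part of the listing above.
data(lv_text) = `“Quoted” text – with a dash`.

try.
    zcl_latin1_projection=>project_to_latin1( changing cv_string = lv_text ).
    " With the mapping from the header comment, the typographic quotes and
    " the en dash in lv_text are replaced by their ASCII counterparts.
  catch zcx_special_chars into data(lx_special).
    " Reached only if characters without a mapping entry remain.
    " Each row of SPECIAL_CHARS carries OFFSET, LENGTH and TEXT.
    loop at lx_special->special_chars assigning field-symbol(<ls_char>).
      write: / <ls_char>-offset, <ls_char>-text.
    endloop.
endtry.
```

Because `gv_charmap_latin1` is described as changeable, a caller could presumably append further character pairs to it before the call instead of handling those characters in the CATCH block.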