Advertisement
rplantiko

Pattern Matching rebuilt with ABAP

Dec 1st, 2012
659
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
ABAP 11.43 KB | None | 0 0
  1. report zz_patterns_with_backtracking.
  2.  
  3. * An ABAP implementation of a regex search strategy
  4. * Supports backtracking
  5.  
  6. * The basic interface is lif_pattern, with only one
  7. * method match(). Returns the length of the matching substring
  8. * or raises lcx_invalid if the given string does not match
  9.  
  10. * lcl_pattern_charset is a superclass for [---]{n,m}
  11. * = Find a string built from some characters, n <= length <= m
  12.  
  13. * lcl_pattern_sequence combines several lif_pattern instances
  14. * and requires the matches one after another.
  15. * It is here where backtracking is implemented.
  16.  
  17. * lcl_matcher is the object which applies a pattern to a given
  18. * string, and searches for single hits, or all hits.
  19.  
  20. * Confer SCN discussion http://scn.sap.com/message/13690938#13690938 for the
  21. * motivation.
  22.  
  23. types: begin of ty_match_result,
  24.     line type i,
  25.     offset type i,
  26.     length type i,
  27.     text type string,
  28.   end of ty_match_result,
  29.   ty_match_result_tab type standard table of ty_match_result.
  30.  
  31. * Stop condition for matching
  32. class lcx_invalid definition inheriting from cx_static_check.
  33. endclass.                    "lcx_invalid DEFINITION
  34.  
  35. *----------------------------------------------------------------------*
  36. * INTERFACE lif_pattern DEFINITION
  37. *----------------------------------------------------------------------*
  38. * An "atom" of a pattern
  39. *----------------------------------------------------------------------*
  40. interface lif_pattern.
  41.   methods match
  42.     importing
  43.       iv_text type csequence
  44.       iv_max type i default -1 " for backtracking
  45.     returning value(ev_length) type i
  46.     raising lcx_invalid.
  47. endinterface.                    "lif_pattern DEFINITION
  48.  
  49. *----------------------------------------------------------------------*
  50. * CLASS lcl_matcher DEFINITION
  51. *----------------------------------------------------------------------*
  52. class lcl_matcher definition create private.
  53.   public section.
  54.     class-methods: create
  55.                      importing io_pattern type ref to lif_pattern
  56.                      returning value(eo_matcher) type ref to lcl_matcher.
  57.     methods: match_one importing iv_text type csequence
  58.                        exporting es_match type ty_match_result
  59.                        raising lcx_invalid,
  60.              match_all importing iv_text type csequence
  61.                        exporting et_matches type ty_match_result_tab,
  62.              match_all_in_table
  63.                importing it_text type table
  64.                exporting et_matches type ty_match_result_tab.
  65.   private section.
  66.     data: go_pattern type ref to lif_pattern.
  67. endclass.                    "lcl_matcher DEFINITION
  68.  
  69. *----------------------------------------------------------------------*
  70. * CLASS lcl_pattern_sequence DEFINITION
  71. *----------------------------------------------------------------------*
  72. class lcl_pattern_sequence definition create private.
  73.   public section.
  74.     interfaces lif_pattern.
  75.     class-methods:
  76.       create importing io_pattern type ref to lif_pattern
  77.              returning value(eo_sequence) type ref to lcl_pattern_sequence.
  78.     methods:
  79.       add importing io_pattern type ref to lif_pattern
  80.           returning value(eo_sequence) type ref to lcl_pattern_sequence.
  81.   private section.
  82.     data: go_left type ref to lif_pattern,
  83.           go_right type ref to lif_pattern.
  84. endclass.                    "lcl_pattern_sequence DEFINITION
  85.  
  86. *----------------------------------------------------------------------*
  87. * CLASS lcl_pattern_charset DEFINITION
  88. *----------------------------------------------------------------------*
  89. class lcl_pattern_charset definition
  90.     create private.
  91.   public section.
  92.     interfaces lif_pattern.
  93.     class-methods:
  94.       create
  95.         importing
  96.           iv_min type i default 1
  97.           iv_max type i default -1
  98.           iv_charset type csequence
  99.         returning value(eo_pattern) type ref to lif_pattern,
  100.       word
  101.         importing
  102.           iv_min type i default 1
  103.           iv_max type i default -1
  104.         returning value(eo_pattern) type ref to lif_pattern,
  105.       alpha
  106.         importing
  107.           iv_min type i default 1
  108.           iv_max type i default -1
  109.         returning value(eo_pattern) type ref to lif_pattern,
  110.       char
  111.         importing
  112.           iv_min type i default 1
  113.           iv_max type i default 1
  114.           iv_char type csequence
  115.         returning value(eo_pattern) type ref to lif_pattern.
  116.   private section.
  117.     data:
  118.       gv_min type i,
  119.       gv_max type i,
  120.       gv_charset type string.
  121. endclass.                    "lcl_pattern_charset DEFINITION
  122.  
  123. start-of-selection.
  124.   perform start.
  125.  
  126. * ---
  127. form start.
  128.  
  129.   data: lt_text type tttext255,
  130.         lo_matcher type ref to lcl_matcher,
  131.         lo_pattern type ref to lcl_pattern_sequence,
  132.         lt_matches type ty_match_result_tab.
  133.  
  134.   field-symbols: <ls_match> type ty_match_result.
  135.  
  136. * Make a test table
  137.   perform make_test_text changing lt_text.
  138.  
  139.   lo_pattern = lcl_pattern_sequence=>create(
  140.     lcl_pattern_charset=>alpha( iv_max = 1 ) ).
  141.   lo_pattern->add(
  142.     lcl_pattern_charset=>word(  iv_min = 3 iv_max = 13 ) )->add(
  143.     lcl_pattern_charset=>char(  '-' )                    )->add(
  144.     lcl_pattern_charset=>word(  iv_min = 1 iv_max = 30 ) ).
  145.  
  146.   lo_matcher = lcl_matcher=>create( lo_pattern ).
  147.   lo_matcher->match_all_in_table(
  148.     exporting it_text    = lt_text
  149.     importing et_matches = lt_matches ).
  150.  
  151. * Print the found texts
  152.   loop at lt_matches assigning <ls_match>.
  153.     write: / <ls_match>-text.
  154.   endloop.
  155.  
  156. endform.                    "start
  157.  
  158.  
  159. *----------------------------------------------------------------------*
  160. *       CLASS lcl_matcher IMPLEMENTATION
  161. *----------------------------------------------------------------------*
  162. class lcl_matcher implementation.
  163.   method create.
  164.     create object eo_matcher.
  165.     eo_matcher->go_pattern = io_pattern.
  166.   endmethod.                    "add_pattern
  167.  
  168.   method match_one.
  169.     data: lv_pos type i,
  170.           lv_strlen type i.
  171.     lv_strlen = strlen( iv_text ).
  172.     while lv_pos < lv_strlen.
  173.       try.
  174.           es_match-offset = lv_pos.
  175.           es_match-length = go_pattern->match( iv_text+lv_pos(*) ).
  176.           es_match-text   = iv_text+lv_pos(es_match-length).
  177.           return.
  178.         catch lcx_invalid.
  179.           add 1 to lv_pos.
  180.       endtry.
  181.     endwhile.
  182.     raise exception type lcx_invalid.
  183.   endmethod.                    "match_one
  184.  
  185.   method match_all.
  186.     data: ls_match type ty_match_result,
  187.           lv_pos type i,
  188.           lv_strlen type i.
  189.     lv_strlen = strlen( iv_text ).
  190.     clear et_matches.
  191.     while lv_pos < lv_strlen.
  192.       try.
  193.           call method match_one
  194.             exporting
  195.               iv_text  = iv_text+lv_pos(*)
  196.             importing
  197.               es_match = ls_match.
  198.           append ls_match to et_matches.
  199.           lv_pos = lv_pos + ls_match-offset + ls_match-length.
  200.           if ls_match-length is initial.
  201.             return. " for zero-width assertions
  202.           endif.
  203.         catch lcx_invalid.
  204.           return.
  205.       endtry.
  206.     endwhile.
  207.   endmethod.                    "match_all
  208.  
  209.   method match_all_in_table.
  210.     data: lt_matches type ty_match_result_tab,
  211.           lv_line type i.
  212.     field-symbols: <lv_text> type csequence,
  213.                    <ls_match> type ty_match_result.
  214.     loop at it_text assigning <lv_text>.
  215.       lv_line = sy-tabix.
  216.       call method match_all
  217.         exporting
  218.           iv_text    = <lv_text>
  219.         importing
  220.           et_matches = lt_matches.
  221.       loop at lt_matches assigning <ls_match>.
  222.         <ls_match>-line = lv_line.
  223.         append <ls_match> to et_matches.
  224.       endloop.
  225.     endloop.
  226.   endmethod.                    "match_all_in_table
  227.  
  228. endclass.  " lcl_matcher
  229.  
  230.  
  231. *----------------------------------------------------------------------*
  232. *       CLASS lcl_pattern_charset IMPLEMENTATION
  233. *----------------------------------------------------------------------*
  234. class lcl_pattern_charset implementation.
  235.   method lif_pattern~match.
  236.     data: lv_strlen type i,
  237.           lv_pos type i,
  238.           lv_matches type flag,
  239.           lv_max type i.
  240.     if iv_max > -1.
  241.       lv_max = iv_max.
  242.     else.
  243.       lv_max = gv_max.
  244.     endif.
  245.     lv_strlen = strlen( iv_text ).
  246.     lv_pos    = gv_min.
  247. * Greedy match
  248.     while ( lv_pos <= lv_max or lv_max = -1 ) and lv_pos < lv_strlen.
  249.       if iv_text(lv_pos) cn gv_charset.
  250.         exit.
  251.       endif.
  252.       lv_matches = abap_true.
  253.       add 1 to lv_pos.
  254.     endwhile.
  255.     if lv_matches eq abap_true.
  256.       ev_length = lv_pos - 1.
  257.     else.
  258.       raise exception type lcx_invalid.
  259.     endif.
  260.   endmethod.                    "lif_pattern~match
  261.   method create.
  262.     data: lo_pattern type ref to lcl_pattern_charset.
  263.     create object lo_pattern.
  264.     lo_pattern->gv_min = iv_min.
  265.     lo_pattern->gv_max = iv_max.
  266.     lo_pattern->gv_charset = iv_charset.
  267.     eo_pattern = lo_pattern.
  268.   endmethod.                    "create
  269.   method word.
  270.     eo_pattern = create( iv_min = iv_min
  271.                          iv_max = iv_max
  272.                          iv_charset =
  273.       `_0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ` ).
  274.   endmethod.                    "word
  275.   method char.
  276.     eo_pattern = create( iv_min = iv_min
  277.                          iv_max = iv_max
  278.                          iv_charset = iv_char ).
  279.   endmethod.                    "char
  280.   method alpha.
  281.     eo_pattern = create( iv_min = iv_min
  282.                          iv_max = iv_max
  283.                          iv_charset =
  284.       `abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ` ).
  285.   endmethod.                    "alpha
  286.  
  287. endclass.  " lcl_pattern_charset
  288.  
  289. *----------------------------------------------------------------------*
  290. *       CLASS lcl_pattern_sequence IMPLEMENTATION
  291. *----------------------------------------------------------------------*
  292. class lcl_pattern_sequence implementation.
  293.   method create.
  294.     create object eo_sequence.
  295.     eo_sequence->go_left = io_pattern.
  296.   endmethod.                    "create
  297.   method add.
  298.     go_right = eo_sequence = create( io_pattern ).
  299.   endmethod.                    "add
  300.   method lif_pattern~match.
  301.     data: lv_pos type i,
  302.           lv_max type i value -1.
  303.     while lv_pos >= 0.
  304.       lv_pos = go_left->match( iv_text = iv_text iv_max = lv_max ).
  305.       if go_right is bound.
  306.         try.
  307.             lv_pos = lv_pos + go_right->match( iv_text+lv_pos(*) ).
  308.           catch lcx_invalid.
  309.             lv_max = lv_pos - 1. " Release 1 char (backtracking)
  310.             continue.
  311.         endtry.
  312.       endif.
  313.       ev_length = lv_pos.
  314.       return.
  315.     endwhile.
  316.   endmethod.                    "lif_pattern~match
  317. endclass.                    "lcl_pattern_sequence IMPLEMENTATION
  318.  
  319.  
  320. * Make a test text
  321. form make_test_text changing ct_text type tttext255.
  322.   define _append.
  323.     append &1 to ct_text.
  324.   end-of-definition.
  325.   _append :
  326.     `Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy`,
  327.     `eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam`,
  328.     `voluptua. At VBAP-MATNR accusam et justo duo dolores et ea rebum. Stet `,
  329.     `clita kasd gubergren, no sea sanctus est Lorem ipsum dolor sit`,
  330.     `amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam `,
  331.     `nonumy eirmod tempor in EKPO-PSTYP labore et dolore magna aliquyam erat,s`,
  332.     `ed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum`.
  333. endform.                    "make_test_text
Advertisement
Add Comment
Please, Sign In to add comment
Advertisement