dipBRUR

All appear location query

Aug 25th, 2017
58
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
ABAP 3.52 KB | None | 0 0
  1. 6. All appear location query :
  2.                               Given a text T, the query format is as follows :
  3.    Given the pattern string P, it is required to give all occurences of P in T
  4.    (The interval can intesect).
  5.    
  6.    Complexity requirements : Pre-process O(length(T)), single pass O(length(P) +
  7.    answer(P)), where answer(P) is the size of the answer collection. i.e requires
  8.    time complexity and input and output rank.
  9.    
  10.    Algorithm : for the text T to establish a suffix automaton, similar to the prev-
  11.    ious problem, in the process of building automaton for each state to calculate
  12.    the first occurrence of the end of the first pos.
  13.    Suppose we have received an inquiry - the string P. We found the corresponding
  14.    state t.
  15.    Obviously should return firstpos(t). What are the locations? We consider the states
  16.    of the automaton that contain the string P, that is, those whose P is its suffix.
  17.    In other words, we need to find out all the state through the suffix link to the
  18.    state t.
  19.    Therefore, in order to solve this problem, we need to store for each node all the
  20.    suffixes points to it. In order to find the answer, we need to carry out DFS/BFS
  21.    along with these rollover suffix links starting from state t.
  22.    This traversal will end in O (answer (P)) time because we will not access the same
  23.    state twice (since each suffix link points to only one point, so there are no two
  24.    paths to the same state).
  25.    However, the firstpos value of the two states may be the same if a state is copied
  26.    from another copy. But this does not affect progressive complexity because each
  27.    non-replicated node will only have one copy.
  28.    In addition, you can easily remove those repetitive locations if we do not consider
  29.    those copies of the state of the firstpos. In fact, all copies of the state are the
  30.    "parent" state of the suffix link point. So, we record the tag is_clon for each node,
  31.    and we do not consider the firstpos that is the state of is_clon = true. So that we
  32.    get the answer (P) non-repetitive state.
  33.  
  34.    Give an offline version of the implementation:
  35. struct state {
  36.     ....
  37.     bool is_clon;
  38.     int first_pos;
  39.     vector <int> inv_link;
  40. };
  41.     .... suffix automato build
  42.     for (int v=1; v<sz; v++)
  43.         st[st[v].link].inv_link.push_back(v);
  44. ...ans : return all occurrences (may be overlap)
  45. void output_all_occurences(int v, int P_length)
  46. {
  47.     if (!st[v].is_clon)
  48.         cout << st[v].first_pos - P_length + 1 << endl;
  49.     for (size_t i=0; i<st[v].inv_link.size(); i++)
  50.         output_all_occurences(st[v[.inv_link[i], P_length);
  51. }
  52.  
  53.  
  54. 8. The query is not the shortest string that appears in the text :
  55.    Problem : Given string S and alphabet. Ask to find a shortest string, so it is not a
  56.    string of S.
  57.    Complexity requirements. O (length (s)).
  58.    Algorithm : Dynamic programming is performed on the suffix automaton of the string S.
  59.  
  60.    Let d [v] be the answer to node v, that is, we have entered a part of the string,
  61.    matching v, and we want to find the minimum number of characters to be added to reach
  62.    a nonexistent transfer.
  63.    Calculating d [v] is very simple. If a transfer does not exist at v, then d [v] = 1: use
  64.    this character to "jump out" the automaton to get the string.
  65.    Otherwise, a string can not meet the requirements, so we have to take the smallest of all
  66.    the characters in the answer:     
  67.                   d[v] = 1 + min(d[w])
  68.    The answer to the original question is equal to d [t_0], and the desired string can be
  69.    obtained by the method of recording the traversing path.
Add Comment
Please, Sign In to add comment