Advertisement
Guest User

Untitled

a guest
Jul 23rd, 2019
110
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
text 1.91 KB | None | 0 0
  1. 1. Find Pattern in Text
  2. 2. Brute-force
  3. 1. Check for pattern starting at each text position
  4. 2. Worst case ~MN
  5. 3. Need backup of the text to go backwards
  6. 3. Knuth-Morris-Pratt
  7. 1. Don’t need to back up text pointer
  8. 2. DFA
  9. 1. Deterministic Finite State Automaton
  10. 2. Exactly one transition for each possibility
  11. 3. Accept if sequence of transitions leads to halt state
  12. 4. State: number of characters in pattern that have been matched
  13. 5. Need to precompute dfa[][], text pointer i never decrements, could use input stream
  14. 3. Build DFA table
  15. 1. if mismatch, the last j-1 characters of input are pat[1…j-1] followed by mismatch c
  16. 2. Simulate pat[1…j-1] on DFA and take transition c
  17. 3. require j steps
  18. 4. Take only constant time if we maintain state X
  19. 5. For each state j if mismatch, set dfa[c][j] = dfa[c][X], then update X = dfa[pat.charAt(j)][X]
  20. 6. Time space (RM), M+N chacter search accesses
  21. 4. Boyer-Moore
  22. 1. Scan character in pattern from right to left
  23. 2. Can skip as many as M text chars when finding one not in the pattern
  24. 3. How much to skip
  25. 1. Precompute index of rightmost occurence of character c in pattern (-1 if character not in pattern)
  26. 2. skip the length of the pattern - (the above)
  27. 4. N/M character compares
  28. 5. Worst case ~MN
  29. 5. Rabin-Karp
  30. 1. Modular hashing
  31. 2. Computer a hash of pattern characters (take a big prime and compute reminder)
  32. 3. for each i computer a hash of text character i to M+i-1
  33. 4. If pattern hash = text substring hash, check for a match
  34. 5. Honer’s method: linear time method to evaluate degree M polynomial
  35. 1. (ti*10^(M-1) + t(i+m-1))%Q
  36. 2. Reuse the previous hashed result, subtract the unused trailing character, to compute the next character
  37. 6. Monte-carlo version:no check
  38. 7. Las-vegas version:check even hash matches
  39. 8. Linear time
  40. 9. Easy to extended to complex situations (2d)
  41. 10. Direct suffix search may be slow (arithmetic ops slower than char compares)
Advertisement
Add Comment
Please, Sign In to add comment
Advertisement