bobdodds

dedupe and norepeat and lastrepeat - Sed

Aug 9th, 2020 (edited)
# dedupe: the first line in a set of consecutive duplicate lines is kept, the rest are deleted.
# Emulate human eyes on trailing spaces and tabs by trimming those before comparing.
# Can be used after norepeat() to dedupe blank lines.

# My answer to https://stackoverflow.com/questions/1444406/how-to-delete-duplicate-lines-in-a-file-without-sorting-it-in-unix/63322817#63322817

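dedupe() {
 sed -E '
 # trim the current line (this also covers the very first line of the input)
 s/[ \t]+$//;
 $!{
  # append the next line and trim it
  N;
  s/[ \t]+$//;
  # print the first line of the pair unless the second one is identical
  /^(.*)\n\1$/!P;
  # drop the first line and restart with the second, without reading input
  D;
 }
';
}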
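
# For comparison, a minimal sketch of the same idea with standard tools:
# trim trailing blanks, then let uniq collapse consecutive duplicates.
# uniq_trimmed is a hypothetical name, not part of the original paste.

```shell
uniq_trimmed() {
  # uniq compares lines byte for byte, so trim trailing spaces and tabs first
  sed -E 's/[[:blank:]]+$//' | uniq
}

# sample input: "a" three times (once with a trailing space), then "b"
printf 'a\na \na\nb\n' | uniq_trimmed
```

# On this sample, uniq_trimmed prints "a" and "b"; dedupe should give the same result.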

# norepeat: delete repeated lines from a file, even when the repeats are not
# consecutive; the first occurrence is kept. Blank lines are never treated as
# repeats, but runs of blank lines are squeezed to one. Trailing spaces and
# tabs are trimmed to humanize comparisons.

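norepeat() {
 sed -n -E '
 s/[ \t]+$//;
 # append the lines kept so far (hold space) to the current line
 G;
 # blank line right after a kept blank line: squeeze it away
 /^(\n){2,}/d;
 # the whole current line was already kept earlier: drop it
 # (group 1 must end at a newline, so "abc" is not dropped because of "ab")
 /^([^\n]+)\n(.*\n)?\1(\n|$)/d;
 # remember the line, then print it
 h;
 P;
 ';
}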
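
# For comparison, a minimal awk sketch of the same behavior (keep the first
# occurrence, trim trailing blanks, squeeze blank lines). norepeat_awk and the
# variables seen and blank are hypothetical names, not part of the original paste.

```shell
norepeat_awk() {
  awk '
    # trim trailing spaces and tabs
    { sub(/[ \t]+$/, "") }
    # blank line: print it only if the previous printed line was not blank
    $0 == "" { if (!blank) print; blank = 1; next }
    # non-blank line: print it only the first time it is seen
    !seen[$0]++ { print; blank = 0 }
  '
}

# sample input: "ab" repeats (once with a trailing space), "abc" only shares a prefix
printf 'ab\nabc\n\n\nab \nc\n' | norepeat_awk
```

# awk keeps whole lines as array keys, so a line such as "abc" is never
# mistaken for a repeat of "ab".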
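
# lastrepeat: like norepeat(), but the last occurrence of a repeated line is
# kept instead of the first. Nothing is printed until the last line is read.

lastrepeat() {
 sed -n -E '
 s/[ \t]+$//;
 # blank line: append it to the hold space; stop here unless it is the last line
 /^$/{
  H;
  $!d;
  g;
  b squeeze
 };
 G;
 # delete previous repeated line if found (group 1 must be the whole line)
 s/^([^\n]+)(\n.*)\n\1(\n.*|$)/\1\2\3/;
 # after searching for previous repeat, move tested last line to end
 s/^([^\n]+)(\n)(.*)/\3\2\1/;
 $!{
  h;
  d;
 };
 :squeeze
 # squeeze blank lines to one
 s/(\n){3,}/\n\n/g;
 s/\n\n$/\n/;
 s/^\n//;
 p;
';
}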
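
# For comparison, a minimal awk sketch that keeps the last occurrence of each
# line. It holds the whole input in memory and, unlike lastrepeat, treats blank
# lines like any other line. lastrepeat_awk and the arrays line and last are
# hypothetical names, not part of the original paste.

```shell
lastrepeat_awk() {
  awk '
    # trim, store the line, and record the position of its latest occurrence
    { sub(/[ \t]+$/, ""); line[NR] = $0; last[$0] = NR }
    END {
      # print a line only at the position of its last occurrence
      for (i = 1; i <= NR; i++)
        if (last[line[i]] == i) print line[i]
    }
  '
}

# sample input without blank lines: "a" and "b" each appear twice
printf 'a\nb\na\nc\nb\n' | lastrepeat_awk
```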