homer512

unique columns sed

Apr 29th, 2014
#!/bin/sed -f

# Removes duplicate fields in a |-separated file
# e.g.    foo|bar|foo|quz|bar
# becomes foo|bar|quz
# Requires GNU sed (uses the \|, \+, \? and 2g extensions)

: restart

# The s instruction needs some explanation.
# The regular expression consists of the following parts:
# \1: \(^\||\)
#     Beginning of line or the separator that ends the previous field
#     Note that we use | as field separator
# \2: \([^|]\+\)
#     A complete, non-empty field
#     \+ is a GNU extension; that is fine because \1 and \5 already
#     need the GNU \| alternation
# \3: \(|.*\)\?
#     Everything between \2 and \4: either nothing, or text that starts
#     at a field separator. A plain \(.*\) would let \2 match only the
#     prefix of a field, so foobar|foo would wrongly become foobar
# \4: \(|\2\)
#     Field separator followed by a field identical to \2
# \5: \(|\|$\)
#     Field separator closing \4 or end of line
#
# The replacement \1\2\3\5 omits \4, so the duplicated field is removed
s/\(^\||\)\([^|]\+\)\(|.*\)\?\(|\2\)\(|\|$\)/\1\2\3\5/

# Loop for as long as the s instruction matches, until all duplicates are gone
# s///g does not work here because the matches may overlap
t restart

# Repeated empty fields have to be handled separately
# The regex matches || or a | followed by end of line
# The replacement is a single |, unless we matched the end of line;
# then it is the empty string matched by $
#
# The suffix 2g is a GNU extension: it replaces every match except the first
# For non-GNU sed, it may be replaced with another loop
s/|\(|\|$\)/\1/2g
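
A quick way to try the script without saving it to a file is to pass the same commands to sed with -e. This is only a sketch and assumes GNU sed. The function name dedupe_fields is made up for illustration and is not part of the paste. The sketch writes the middle group as \(|.*\)\? so that a field such as foobar is not treated as containing a duplicate of foo.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed (needs \+, \|, \? and the 2g flag).
# dedupe_fields is a hypothetical helper name used for this example.
dedupe_fields() {
    printf '%s\n' "$1" | sed \
        -e ': restart' \
        -e 's/\(^\||\)\([^|]\+\)\(|.*\)\?\(|\2\)\(|\|$\)/\1\2\3\5/' \
        -e 't restart' \
        -e 's/|\(|\|$\)/\1/2g'
}

dedupe_fields 'foo|bar|foo|quz|bar'   # prints foo|bar|quz
dedupe_fields 'foobar|foo'            # prints foobar|foo (no true duplicate)
dedupe_fields 'a||b||c'               # prints a||b|c (first empty field kept)
```

With the script saved to a file, the equivalent call would be sed -f followed by the file name and the input file.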