BlackBeltPanda

RegexSwear

May 15th, 2016
  1. Example swear filter that matches many variants of "ass" and "asshole", including several common filter bypasses:
  2.  
  3. #Ass(es)|(hole(s))
  4. match (?i)\b[a@]+?[\W\d_]*?[sz5$]+?[\W\d_]*?[sz5$]+?[\W\d_e3sz5$]*?(?:h+?[\W\d_]*?[o0]+?[\W\d_]*?l+?[\W\d_]*?[e3]+?[\W\d_sz5$]*?)?\b
  5. id swear
  6. handle as swear
  7.  
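A minimal Java sketch for trying the rule's pattern outside the filter, assuming it is applied with a "find anywhere in the message" search. The class name `SwearRuleDemo` and the helper `containsSwear` are hypothetical and not part of the original rule; only the pattern itself comes from the "match" line above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and helper method are illustrative only.
class SwearRuleDemo {
    // The pattern from the "match" line, with backslashes doubled for a Java string literal.
    static final Pattern SWEAR = Pattern.compile(
        "(?i)\\b[a@]+?[\\W\\d_]*?[sz5$]+?[\\W\\d_]*?[sz5$]+?[\\W\\d_e3sz5$]*?"
        + "(?:h+?[\\W\\d_]*?[o0]+?[\\W\\d_]*?l+?[\\W\\d_]*?[e3]+?[\\W\\d_sz5$]*?)?\\b");

    // Assumption: the filter searches for the pattern anywhere in the message.
    static boolean containsSwear(String message) {
        return SWEAR.matcher(message).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "ass", "ASSHOLES", "ass_hole", "as.s.h ole", "a55",
            "assassination", "brass", "mass holes", "@ss"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + containsSwear(sample));
        }
    }
}
```

Under these assumptions the first five samples are reported as matches, while "assassination", "brass", and "mass holes" are not. One thing worth noting: a message that starts with "@ss" is not matched either, because "\b" needs a word character on one side and "@" is not a word character.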
  8. Explanation:
  9. "(?i)" is the Java regex flag for case-insensitive matching, so we don't have to list both upper-case and lower-case letters.
  10. "\b" is the regex word boundary anchor. It matches the position between a word character (letter, digit, or underscore) and a non-word character or the start/end of the string. We use it at the beginning and end of the regex to prevent the above example from matching inside other words, like "assassination", "brass", "mass holes", etc.
  11. "[a@]" is a regex character class that matches either "a" or "@". This allows us to check for "l33t sp34k". When checking for only a single letter we can write it simply as "a", without the brackets.
  12. "+" is a regex quantifier that matches 1 or more repetitions of the preceding element.
  13. "?" is placed after the "+" quantifier to make it "lazy". A lazy quantifier matches as few repetitions as it can and only takes more when the rest of the pattern requires it; the default "greedy" behavior takes as many as it can and then backs off. Laziness does not change whether the pattern matches, only how much each piece consumes. It can reduce backtracking on some inputs, but it is not a guaranteed speed-up.
  14. "[\W\d_]" is another character class. It matches any one of: "\W", which is any non-word character (anything other than a letter, digit, or underscore), "\d", which is any digit, or "_", which is a literal underscore.
  15. "*" is a regex quantifier that matches 0 or more repetitions of the preceding element. This means that the preceding element doesn't have to be present. Useful for catching swear bypasses like "ass_hole" or "as.s.h ole". As before, the "?" is placed after it to make it lazy; otherwise it would first take as many separator characters as it can and then back off until the rest of the pattern matches.
  16. "(?:<regex_here>)" is a non-capturing (passive) group. Unlike regular regex groups ("(<regex_here>)"), it doesn't capture the matched text for use later on. With this example, there's no point in capturing anything, so we use a non-capturing group to avoid the small overhead of storing the capture. In this case the group is used to check for a series of matches resulting in "hole".
  17. "?" is placed after the passive (non-capturing) group as a regex quantifier that matches 0 or 1 occurrences. This means the "hole" is optional, allowing us to match both "ass" and "asshole". Using "?" makes more sense in this case than "*" (which matches 0 or more) because we don't need to match "assholehole...".
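The individual constructs explained above can be tried in isolation with a short Java sketch. The class `RegexConstructsDemo` and its helpers `firstMatch` and `captureCount` are hypothetical names introduced only for illustration, and the small patterns are simplified stand-ins, not pieces of the actual rule.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample patterns are illustrative only.
class RegexConstructsDemo {
    // Returns the first substring of input matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Returns the number of capturing groups declared in regex.
    static int captureCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // Greedy "+" takes as many characters as it can; lazy "+?" takes as few as it can.
        System.out.println(firstMatch("s+", "sss"));
        System.out.println(firstMatch("s+?", "sss"));
        // When something must follow, the lazy form expands until the rest matches.
        System.out.println(firstMatch("s+?\\b", "sss"));

        // "\b" keeps the pattern from matching inside a longer word.
        System.out.println(firstMatch("\\bass\\b", "brass"));
        System.out.println(firstMatch("\\bass\\b", "an ass here"));

        // "(...)" declares a capturing group; "(?:...)" does not.
        System.out.println(captureCount("ass(hole)?"));
        System.out.println(captureCount("ass(?:hole)?"));

        // "?" allows the group once at most; "*" allows any number of repeats.
        System.out.println(firstMatch("ass(?:hole)?", "assholehole"));
        System.out.println(firstMatch("ass(?:hole)*", "assholehole"));
    }
}
```

The greedy and lazy forms differ only in how much text each piece takes; once a trailing "\b" is required, both end up matching the same span here.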