bitwise regexps

a guest
Jul 20th, 2010
DISCLAIMER

To make this whole thing easier, there are very few examples. Note that I am
not responsible for any mistakes in my examples if they get copied into
production code. You are free to take my word for it that I've got it right,
but do so at your own risk! (Not that it makes much difference anyway.)





BASIC NOTATION

Bitwise regular expressions are expressed as sequences of zeroes, backslashes,
and parentheses.

All operators take two arguments (although the NOT operator ignores the first
one). Notation is postfix. Use parentheses to make your regular expressions
unambiguous if needed, although keep in mind that both of these characters
must be properly backslash-escaped in order to avoid confusion with the
unrelated yet indispensable, vitally important ( and ) operators.

operator    meaning      example
----------------------------------------------------------------------------
\           NOT          00\          --> not 0
\\          OR           00\\         --> 0 or 0
\\\\        GROUP        00\\\\       --> 00
\\\\\\\\    REPEAT       00\\\\\\\\   --> 0 repeated 0 times
(           HANG         00(          --> (interpreter enters infinite loop)
)           SYNTAX ERR   00)          --> (interpreter rejects code)

It was decided to violate one of the most important principles of INTERCAL by
giving the operators an order of operations (because hey, why not?). For any
sequence of backslashes, the interpreter tries to separate them out in the
order in which they are listed above:
    0000\\\\\\\ --> \( 0
                        \( 0
                            \( 00\ \)
                        \\ \)
                    \\\\ \)
Also, if an opening parenthesis is preceded by a backslash, it is assumed that
the backslash is escaping the paren unless this is completely impossible. If a
( is backslash-escaped, then the matching ) is assumed to be backslash-escaped
as well unless there is actually no backslash before it, or unless the
assumption simply cannot be made, in which case the interpreter will simply
ignore the ( and only parse the ).
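
For anyone who would rather see the order of operations as code, here is a
rough Python sketch of how the run of seven backslashes in the example above
could be split and grouped. It is one possible reading, not a reference
interpreter. The names OPERATORS, split_run and group are made up for this
sketch, the single pass over the operator table is an assumption, and so is
the choice to report backslashes that fit no operator as a leftover, since
the text does not say what happens to them. Parenthesis escaping is not
handled at all.

```python
# Operators in the order the table lists them, with their backslash counts.
OPERATORS = [("NOT", 1), ("OR", 2), ("GROUP", 4), ("REPEAT", 8)]


def split_run(count):
    """Split a run of `count` backslashes into operators, in listed order.

    ASSUMPTION: one pass over the table, top to bottom. Backslashes that
    cannot be consumed are returned as a leftover count.
    """
    ops = []
    for name, width in OPERATORS:
        if count >= width:
            ops.append((name, width))
            count -= width
    return ops, count


def group(expression):
    """Fully parenthesise a string of zeroes followed by one backslash run."""
    body = expression.rstrip("\\")
    run_length = len(expression) - len(body)
    if set(body) - {"0"}:
        raise ValueError("this sketch only handles zeroes plus one trailing run")
    ops, leftover = split_run(run_length)
    if leftover:
        raise ValueError("backslashes left over; the text does not cover this")

    stack = list(body)  # postfix: operands are pushed first
    for _name, width in ops:
        if len(stack) < 2:
            raise ValueError("every operator takes two arguments")
        right = stack.pop()
        left = stack.pop()
        op = "\\" * width
        if left == "0" and right == "0":
            # Two bare zeroes are written together, as in the table: 00\
            stack.append(f"\\( 00{op} \\)")
        else:
            stack.append(f"\\( {left} {right} {op} \\)")
    return " ".join(stack)


if __name__ == "__main__":
    print(split_run(7))
    print(group("0000" + "\\" * 7))
```

Under these assumptions the second call prints the same nesting as the
example above, written on one line.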





EMULATING TRADITIONAL REGEXP OPERATORS

? and * are easy enough:
    ? --> \(00\0\\\)\\\\\\\\
    * --> \(\(00\0\\\)\(\(00\00\\\\\00\\\\\00\\\\\\)\(00\0\\\\0\\\\\)\\\\\\\\\)
          \\\\\\\\\)\\\\\\\\\
This implementation of * is only designed for matching 16-bit integers, but it
can be modified for more general-purpose use. These could probably be written
with fewer parentheses as well, God forbid.

The "+" operator is extra-tricky. Usually, you'll want to just group one
instance with an arbitrary number of instances using the * implementation:
    n+ --> nn*
If that way is unacceptable or over-acceptable, you can use this:
    + --> \(\(\(\(00\0\\\)\(\(00\0\\\)\(00\00\\\\\00\\\\\00\\\\\\)\\\\\\\\\)
          \\\\\\\\\)\(00\0\\\)\\\\\)\(\(00\0\\\)\(\(00\0\\\)\
          (00\00\\\\\00\\\\\00\\\\\\)\\\\\\\\\)\\\\\\\\\)\\\\\)\\\\\\\\
The logic behind it can be expressed this way: "repeat N times where N has at
least one binary place value equal to 1, surrounded on either side by up to 15
place values which could optionally also be 1." Once again, it would require
generalization if it were applied to anything besides a 16-bit integer. I only
wrote the simple form here because of how much I value brevity.
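
The quoted logic for + can be checked in ordinary Python, without any
bitwise regexps. The sketch below assumes that N is a 16-bit repeat count,
and uses Python's re module as a stand-in for the traditional operators.
BITS, MAX_COUNT and the three helper names are invented for this sketch and
come from nowhere else.

```python
import re

BITS = 16
MAX_COUNT = (1 << BITS) - 1  # largest repeat count a 16-bit N can hold


def has_set_bit(n):
    """True if at least one of the 16 binary place values of n is 1."""
    return any((n >> place) & 1 for place in range(BITS))


def plus_pattern(unit):
    """Traditional regexp for `unit+`, with the count capped at 16 bits."""
    return re.compile(f"(?:{re.escape(unit)}){{1,{MAX_COUNT}}}")


def unit_then_star_pattern(unit):
    """Traditional regexp for the n+ --> nn* rewrite, same 16-bit cap on *."""
    escaped = re.escape(unit)
    return re.compile(f"(?:{escaped})(?:{escaped}){{0,{MAX_COUNT}}}")


if __name__ == "__main__":
    # "At least one place value equal to 1" rules out exactly N = 0.
    print([n for n in range(4) if has_set_bit(n)])
    for text in ["", "ab", "ababab"]:
        print(repr(text),
              bool(plus_pattern("ab").fullmatch(text)),
              bool(unit_then_star_pattern("ab").fullmatch(text)))
```

On short inputs like these the two patterns agree: both reject the empty
string and accept one or more copies of the unit.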





PROPOSED NEW "SWAP" EXPRESSION

This is adapted from sed, and takes the form:
    As\B\C\
Where A is a variable, constant, or array of either, B is a regexp, and C is
a variable or constant. The whole expression evaluates to a version of A
derived by swapping areas matching B for C. If C is longer than a pattern
matched by B, then the result will be promoted from 16 bits, to 32 bits, to an
array, or to a longer array as is needed. A 32-bit value and an array of two
16-bit values will be treated identically: as one 32-bit length of bits. To
avoid mistaking the backslashes within B for delimiters, they must be properly
backslash-escaped. A itself is not modified, so the "swap" expression should
be used in an assignment statement (unless, of course, you're just being
obtuse).

This will be the only expression where the regexps see usage.
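
As a rough illustration of the swap expression, here is a Python sketch
under some loud assumptions: values are modelled as strings of "0" and "1"
characters, B is an ordinary Python regexp standing in for a bitwise one,
and the promotion rule is reduced to counting bits in the result. The names
swap and storage_for are hypothetical, as are the labels storage_for returns.
The sketch does not say how a shortened result should be handled, because
the text does not either.

```python
import re

WORD = 16  # bits in the smallest value size


def swap(a_bits, b_pattern, c_bits):
    """Return a copy of a_bits with every match of b_pattern replaced by c_bits.

    a_bits itself is left untouched, so the result has to be assigned
    somewhere to be of any use.
    """
    # A function replacement keeps re.sub from interpreting c_bits in any way.
    return re.sub(b_pattern, lambda match: c_bits, a_bits)


def storage_for(bits):
    """Name the smallest storage that holds the result (ASSUMED labels)."""
    length = len(bits)
    if length <= WORD:
        return "16-bit value"
    if length <= 2 * WORD:
        # The text treats this the same as an array of two 16-bit values.
        return "32-bit value"
    words = -(-length // WORD)  # ceiling division
    return f"array of {words} 16-bit values"


if __name__ == "__main__":
    a = "0000000000001011"
    result = swap(a, "1011", "11110000")
    print(a, storage_for(a))
    print(result, storage_for(result))
```

Here C is four bits longer than the area matched by B, so the 16-bit input
grows to 20 bits and is reported as a 32-bit value, while a still holds its
original 16 bits.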