- DISCLAIMER
- To make this whole thing easier, there are very few examples. Note that I am
- not responsible for any mistakes in my examples if they get copied into
- production code. You are free to take my word for it that I've got it right,
- but do so at your own risk! (Not that it makes much difference anyway.)
- BASIC NOTATION
- Bitwise regular expressions are expressed as sequences of zeroes, backslashes,
- and parentheses.
- All operators take two arguments (although the NOT operator ignores the first
- one). Notation is postfix. Use parentheses to make your regular expressions
- unambiguous if needed, although keep in mind that both of these characters
- must be properly backslash-escaped in order to avoid confusion with the
- unrelated yet indispensable, vitally important ( and ) operators.
- operator   meaning      example
- ----------------------------------------------------------------------------
- \          NOT          00\         --> not 0
- \\         OR           00\\        --> 0 or 0
- \\\\       GROUP        00\\\\      --> 00
- \\\\\\\\   REPEAT       00\\\\\\\\  --> 0 repeated 0 times
- (          HANG         00(         --> (interpreter enters infinite loop)
- )          SYNTAX ERR   00)         --> (interpreter rejects code)
- It was decided to violate one of the most important principles of INTERCAL by
- giving the operators an order of operations (because hey, why not?). For any
- sequence of backslashes, the interpreter tries to separate them out in the
- order in which they are listed above:
- 0000\\\\\\\ --> \( 0
-                    \( 0
-                       \( 00\ \)
-                    \\ \)
-                 \\\\ \)
- Also, if an opening parenthesis is preceded by a backslash, it is assumed that
- the backslash is escaping the paren unless this is completely impossible. If a
- ( is backslash-escaped, then the matching ) is assumed to be backslash-escaped
- as well unless there is actually no backslash before it, or unless the
- assumption simply cannot be made, in which case the interpreter will simply
- ignore the ( and only parse the ).
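- The order-of-operations example above (seven backslashes read as NOT, then OR,
- then GROUP) can be sketched roughly as follows. This is only a guess at the
- rule: the names OPERATORS and split_run are made up for illustration, and the
- text does not say what happens when a run does not divide cleanly (for
- example, whether eight backslashes mean REPEAT on its own), so the sketch just
- reports the leftover instead of deciding.

```python
# Operators in the order the table lists them: (name, number of backslashes).
# Both names below are hypothetical; the original text defines no such API.
OPERATORS = [("NOT", 1), ("OR", 2), ("GROUP", 4), ("REPEAT", 8)]


def split_run(length):
    """Split a run of `length` backslashes in table order.

    Each operator is tried once, in listed order, and taken if enough
    backslashes remain. Returns (operator names, leftover backslashes).
    The leftover is left unassigned on purpose, since the text does not
    say how it should be resolved.
    """
    names = []
    remaining = length
    for name, width in OPERATORS:
        if remaining >= width:
            names.append(name)
            remaining -= width
    return names, remaining


# The run of seven backslashes from the worked example.
print(split_run(7))  # (['NOT', 'OR', 'GROUP'], 0)
```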
- EMULATING TRADITIONAL REGEXP OPERATORS
- ? and * are easy enough:
- ? --> \(00\0\\\)\\\\\\\\
- * --> \(\(00\0\\\)\(\(00\00\\\\\00\\\\\00\\\\\\)\(00\0\\\\0\\\\\)\\\\\\\\\)
- \\\\\\\\\)\\\\\\\\\
- This implementation of * is only designed for matching 16-bit integers, but it
- can be modified for more general-purpose use. These could probably be written
- with fewer parentheses as well, God forbid.
- The "+" operator is extra-tricky. Usually, you'll want to just group one
- instance with an arbitrary number of instances using the * implementation:
- n+ --> nn*
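- For the skeptical, the n+ --> nn* rewrite can be spot-checked with ordinary
- regular expressions. The sketch below uses Python's re module as a stand-in
- (it is not the bitwise notation described here), and same_matches is a helper
- invented for the occasion.

```python
import re


def same_matches(pattern_a, pattern_b, candidates):
    """Return True if both patterns fully match exactly the same candidates."""
    return all(
        bool(re.fullmatch(pattern_a, s)) == bool(re.fullmatch(pattern_b, s))
        for s in candidates
    )


# Candidate inputs: runs of "n" of length 0..5, plus two strings neither should match.
candidates = ["n" * k for k in range(6)] + ["m", "nm"]
print(same_matches(r"n+", r"nn*", candidates))  # True
```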
- If that way is unacceptable or over-acceptable, you can use this:
- + --> \(\(\(\(00\0\\\)\(\(00\0\\\)\(00\00\\\\\00\\\\\00\\\\\\)\\\\\\\\\)
- \\\\\\\\\)\(00\0\\\)\\\\\)\(\(00\0\\\)\(\(00\0\\\)\
- (00\00\\\\\00\\\\\00\\\\\\)\\\\\\\\\)\\\\\\\\\)\\\\\)\\\\\\\\
- The logic behind it can be expressed this way: "repeat N times where N has at
- least one binary place value equal to 1, surrounded on either side by up to 15
- place values which could optionally also be 1." Once again, it would require
- generalization if it were applied to anything besides a 16-bit integer. I only
- wrote the simple form here because of how much I value brevity.
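- Read literally, the condition quoted above ("at least one binary place value
- equal to 1" in a 16-bit N) is just a long way of saying N is at least 1. A
- small check of that reading, assuming N is an unsigned 16-bit count (WIDTH
- and has_a_one_bit are illustrative names, not part of the notation):

```python
WIDTH = 16  # assumed width of the repeat count, per the 16-bit caveat in the text


def has_a_one_bit(n, width=WIDTH):
    """True if any of the low `width` binary place values of n is 1."""
    return any((n >> place) & 1 for place in range(width))


# Compare the bitwise condition with the plain "N >= 1" reading for every 16-bit N.
all_agree = all(has_a_one_bit(n) == (n >= 1) for n in range(1 << WIDTH))
print(all_agree)  # True
```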
- PROPOSED NEW "SWAP" EXPRESSION
- This is adapted from sed, and takes the form:
- As\B\C\
- Where A is a variable, constant, or array of either, B is a regexp, and C is
- a variable or constant. The whole expression evaluates to a version of A
- derived by swapping areas matching B for C. If C is longer than a pattern
- matched by B, then the result will be promoted from 16 bits, to 32 bits, to an
- array, or to a longer array as is needed. A 32-bit value and an array of two
- 16-bit values will be treated identically: as one 32-bit length of bits. To
- avoid mistaking the backslashes within B for delimiters, they must be properly
- backslash-escaped. A itself is not modified, so the "swap" expression should
- be used in an assignment statement (unless, of course, you're just being
- obtuse).
- This will be the only expression where the regexps see usage.
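- A rough model of the "swap" expression, for anyone who wants to poke at the
- promotion rule. Everything here is an assumption layered on the description:
- values are modelled as strings of 0 and 1, Python's re stands in for the
- bitwise regexp B, arrays are taken to be made of 16-bit elements, and short
- results are zero-padded on the left. The names swap and promoted_width are
- hypothetical.

```python
import re


def promoted_width(bit_count):
    """Smallest assumed width for bit_count bits: 16, then whole 16-bit elements.

    32 bits and an array of two 16-bit values come out the same, as the text asks.
    """
    if bit_count <= 16:
        return 16
    return -(-bit_count // 16) * 16  # round up to a multiple of 16


def swap(a_bits, b_pattern, c_bits):
    """Return a new bit string with each match of b_pattern in a_bits replaced by c_bits.

    a_bits is not modified (Python strings are immutable), mirroring the rule
    that A itself is left alone.
    """
    # A function replacement keeps c_bits literal, with no template escapes.
    replaced = re.sub(b_pattern, lambda match: c_bits, a_bits)
    # Left-padding with zeroes is an assumption; the text does not specify it.
    return replaced.zfill(promoted_width(len(replaced)))


a = "0000000000000101"            # a 16-bit value
result = swap(a, "101", "11111")  # C is longer than the matched area
print(len(result))                # 32: promoted from 16 bits to 32 bits
```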