Advertisement
Not a member of Pastebin yet?
Sign Up,
it unlocks many cool features!
- I went offline and proved the original algorithm subject to the
- conjecture that the XOR tricks worked. As it happens, the XOR tricks
- don't work, but the following argument may still interest some
- people. (I re-did it in Haskell because I find proofs much easier when
- I have recursive functions instead of loops and I can use data
- structures. But for the Pythonistas in the audience I tried to use
- list comprehensions wherever possible.)
- Beautiful theory slain by an ugly fact
- ======================================
- Problem: we're given a sequence of *n* nonzero 32-bit words in which
- every element is either *singleton* or *doubleton*:
- - If a word appears exactly once, it is *singleton*.
- - If a word appears exactly twice, it is *doubleton*.
- - No word appears three or more times.
- The problem is to find the singletons. If there are three singletons,
- we should use linear time and constant space. More generally, if
- there are *k* singletons, we should use _O(k*n)_ time and _O(k)_
- space. Unfortunately, the algorithm rests on an unproven conjecture
- about exclusive or, and the conjecture has been shown to be false.
- I've paid close attention to proof of correctness. Arguments about
- resource bounds are hand-waving.
- We begin with these basics:
- > module Singleton where
- > import Data.Bits
- > import Data.List
- > import Data.Word
- > import Test.QuickCheck hiding ((.&.))
- Key abstraction: Partial specification of a word
- -----------------------------------------
- To tackle the problem I'm going to introduce an abstraction:
- to describe the least significant $w$ bits of a 32-bit word,
- I introduce a `Spec`:
- > data Spec = Spec { w :: Int, bits :: Word32 }
- > deriving Show
- > width = w -- width of a Spec
- A `Spec` matches a word if the least significant `w` bits are equal
- to `bits`. If `w` is zero, by definition all words match:
- > matches :: Spec -> Word32 -> Bool
- > matches spec word = width spec == 0 ||
- > ((word `shiftL` n) `shiftR` n) == bits spec
- > where n = 32 - width spec
- >
- > universalSpec = Spec { w = 0, bits = 0 }
- Here are some claims about `Spec`s:
- - All words match the `universalSpec`, which has width 0
- - If `matches spec word` and `width spec == 32`, then `word == bits spec`
- Key idea: "extend" a partial specification
- ------------------------------------------
- Here is the key idea of the algorithm: we can *extend* a `Spec` by
- adding another bit to the specification. Extending a `Spec` produces
- a list of two `Spec`s
- > extend :: Spec -> [Spec]
- > extend spec = [ Spec { w = w', bits = bits spec .|. (bit `shiftL` width spec) }
- > | bit <- [0, 1] ]
- > where w' = width spec + 1
- And here's the crucial claim: if `spec` matches `word` and if `width spec`
- is less than 32, then exactly one of the two specs from `extend spec`
- match `word`. The proof is by case analysis on the relevant bit of `word`.
- This claim is so important that I'm going to call it Lemma One
- Here's a test:
- > lemmaOne :: Spec -> Word32 -> Property
- > lemmaOne spec word =
- > width spec < 32 && (spec `matches` word) ==>
- > isSingletonList [s | s <- extend spec, s `matches` word]
- >
- > isSingletonList :: [a] -> Bool
- > isSingletonList [a] = True
- > isSingletonList _ = False
- We're going to define a function which given a `Spec` and a sequence
- of 32-bit words, returns a list of the singleton words that match the
- spec. The function will take time proportional to the length of the
- input times the size of the answer times 32, and extra space
- proportional to the size of the answer times 32. Before we tackle the
- main functio, we define some constant-space XOR functions.
- XOR ideas that are broken
- -------------
- Function `xorWith f ws` applies function `f` to every word in `ws` and
- returns the exclusive or of the result.
- > xorWith :: (Word32 -> Word32) -> [Word32] -> Word32
- > xorWith f ws = reduce xor 0 [f w | w <- ws]
- > where reduce = foldl'
- Thanks to *stream fusion* (see ICFP 2007), function `xorWith` takes
- constant space.
- A list of nonzero words has a singleton if and only if either the
- exclusive or is nonzero, or if the exclusive or of `3 * w + 1` is
- nonzero. (The "if" direction is trivial. The "only if" direction is
- a conjecture that Evgeny Kluev has disproven; for a counterexample,
- see array `testb` below. I can make Evgeny's example work by adding
- a third function `g`, but obviously this situation calls for a proof,
- and I don't have one.)
- > hasSingleton :: [Word32] -> Bool
- > hasSingleton ws = xorWith id ws /= 0 || xorWith f ws /= 0 || xorWith g ws /= 0
- > where f w = 3 * w + 1
- > g w = 31 * w + 17
- Efficient search for singletons
- -------------------------------
- Our main function returns a list of all the singletons matching a spec.
- > singletonsMatching :: Spec -> [Word32] -> [Word32]
- > singletonsMatching spec words =
- > if hasSingleton [w | w <- words, spec `matches` w] then
- > if width spec == 32 then
- > [bits spec]
- > else
- > concat [singletonsMatching spec' words | spec' <- extend spec]
- > else
- > []
- We'll prove its correctness by induction on the width of the `spec`.
- - The base case is that `spec` has width 32. In this case, the list
- comprehension will give the list of words that are exactly equal
- to `bits spec`. Function `hasSingleton` will return `True` if and
- only if this list has exactly one element, which will be true
- exactly when `bits spec` is singleton in `words`.
- - Now let's prove that if `singletonsMatching` is correct for with
- *m+1*, it is also correct for width *m*, where *m < 32$. (This is
- the opposite direction as usual for induction, but it doesn't
- matter.)
- Calling `extend spec` on a `spec` of width *m* returns two specs
- that have width $m+1$. By hypothesis, `singletonsMatching` is
- correct on these specs. To prove: that the result contains
- exactly those singletons that match `spec`. By Lemma One, any
- word that matches `spec` matches exactly one of the extended
- specs. By hypothesis, the recursive calls return exactly the
- singletons matching the extend specs. When we combine the results
- of these calls with `concat`, we get exactly the matching
- singletons, with no duplicates and no omissions.
- Actually solving the problem is anticlimactic: the singletons are all
- the singletons that match the empty spec:
- > singletons :: [Word32] -> [Word32]
- > singletons words = singletonsMatching universalSpec words
- Testing code
- ------------
- > testa, testb :: [Word32]
- > testa = [10, 1, 1, 2, 3, 4, 2, 5, 6, 7, 10]
- > testb = [ 0x0000
- > , 0x0010
- > , 0x0100
- > , 0x0110
- > , 0x1000
- > , 0x1010
- > , 0x1100
- > , 0x1110
- > ]
- Beyond this point, if you want to follow what's going on, you need to
- know [QuickCheck](http://en.wikipedia.org/wiki/QuickCheck).
- Here's a random generator for specs:
- > instance Arbitrary Spec where
- > arbitrary = do width <- choose (0, 32)
- > b <- arbitrary
- > return (randomSpec width b)
- > shrink spec = [randomSpec w' (bits spec) | w' <- shrink (width spec)] ++
- > [randomSpec (width spec) b | b <- shrink (bits spec)]
- > randomSpec width bits = Spec { w = width, bits = mask bits }
- > where mask b = if width == 32 then b
- > else (b `shiftL` n) `shiftR` n
- > n = 32 - width
- Using this generator, we can test Lemma One using `quickCheck lemmaOne`.
- We can test to see that any word claimed to be a singleton is in fact singleton:
- > singletonsAreSingleton nzwords =
- > not (hasTriple words) ==> all (`isSingleton` words) (singletons words)
- > where isSingleton w words = isSingletonList [w' | w' <- words, w' == w]
- > words = [w | NonZero w <- nzwords]
- > hasTriple :: [Word32] -> Bool
- > hasTriple words = hasTrip (sort words)
- > hasTrip (w1:w2:w3:ws) = (w1 == w2 && w2 == w3) || hasTrip (w2:w3:ws)
- > hasTrip _ = False
- Here's another property that tests the fast `singletons` against a
- slower algorithm that uses sorting.
- > singletonsOK :: [NonZero Word32] -> Property
- > singletonsOK nzwords = not (hasTriple words) ==>
- > sort (singletons words) == sort (slowSingletons words)
- > where words = [w | NonZero w <- nzwords ]
- > slowSingletons words = stripDoubletons (sort words)
- > stripDoubletons (w1:w2:ws) | w1 == w2 = stripDoubletons ws
- > | otherwise = w1 : stripDoubletons (w2:ws)
- > stripDoubletons as = as
Advertisement
Add Comment
Please, Sign In to add comment
Advertisement