This week only. Pastebin PRO Accounts Christmas Special! Don't miss out!Want more features on Pastebin? Sign Up, it's FREE!
Guest

finding singleton elements in a sequence

By: nrnrnr on Aug 13th, 2012  |  syntax: Haskell  |  size: 8.56 KB  |  views: 57  |  expires: Never
download  |  raw  |  embed  |  report abuse  |  print
Text below is selected. Please press Ctrl+C to copy to your clipboard. (⌘+C on Mac)
  1. I went offline and proved the original algorithm subject to the
  2. conjecture that the XOR tricks worked. As it happens, the XOR tricks
  3. don't work, but the following argument may still interest some
  4. people. (I re-did it in Haskell because I find proofs much easier when
  5. I have recursive functions instead of loops and I can use data
  6. structures. But for the Pythonistas in the audience I tried to use
  7. list comprehensions wherever possible.)
  8.  
  9.  
  10. Beautiful theory slain by an ugly fact
  11. ======================================
  12.  
  13. Problem: we're given a sequence of *n* nonzero 32-bit words in which
  14. every element is either *singleton* or *doubleton*:
  15.  
  16.   - If a word appears exactly once, it is *singleton*.
  17.  
  18.   - If a word appears exactly twice, it is *doubleton*.
  19.  
  20.   - No word appears three or more times.
  21.  
  22. The problem is to find the singletons.  If there are three singletons,
  23. we should use linear time and constant space.  More generally, if
  24. there are *k* singletons, we should use _O(k*n)_ time and _O(k)_
  25. space.  Unfortunately, the algorithm rests on an unproven conjecture
  26. about exclusive or, and the conjecture has been shown to be false.
  27.  
  28. I've paid close attention to proof of correctness.  Arguments about
  29. resource bounds are hand-waving.
  30.  
  31. We begin with these basics:
  32.  
  33. > module Singleton where
  34. > import Data.Bits
  35. > import Data.List
  36. > import Data.Word
  37. > import Test.QuickCheck hiding ((.&.))
  38.  
  39.  
  40. Key abstraction: Partial specification of a word
  41. -----------------------------------------
  42.  
  43.  
  44. To tackle the problem I'm going to introduce an abstraction:
  45. to describe the least significant $w$ bits of a 32-bit word,
  46. I introduce a `Spec`:
  47.  
  48. > data Spec = Spec { w :: Int, bits :: Word32 }
  49. >    deriving Show
  50. > width = w -- width of a Spec
  51.  
  52. A `Spec` matches a word if the least significant `w` bits are equal
  53. to `bits`.  If `w` is zero, by definition all words match:
  54.  
  55. > matches :: Spec -> Word32 -> Bool
  56. > matches spec word = width spec == 0 ||
  57. >                     ((word `shiftL` n) `shiftR` n) == bits spec
  58. >   where n = 32 - width spec
  59. >
  60. > universalSpec = Spec { w = 0, bits = 0 }
  61.  
  62. Here are some claims about `Spec`s:
  63.          
  64.   - All words match the `universalSpec`, which has width 0
  65.          
  66.   - If `matches spec word` and `width spec == 32`, then `word == bits spec`
  67.  
  68. Key idea: "extend" a partial specification
  69. ------------------------------------------
  70.  
  71. Here is the key idea of the algorithm: we can *extend* a `Spec` by
  72. adding another bit to the specification.  Extending a `Spec` produces
  73. a list of two `Spec`s
  74.  
  75. > extend :: Spec -> [Spec]
  76. > extend spec = [ Spec { w = w', bits = bits spec .|. (bit `shiftL` width spec) }
  77. >               | bit <- [0, 1] ]
  78. >   where w' = width spec + 1
  79.  
  80. And here's the crucial claim: if `spec` matches `word` and if `width spec`
  81. is less than 32, then exactly one of the two specs from `extend spec`
  82. match `word`.  The proof is by case analysis on the relevant bit of `word`.
  83. This claim is so important that I'm going to call it Lemma One
  84. Here's a test:
  85.  
  86. > lemmaOne :: Spec -> Word32 -> Property
  87. > lemmaOne spec word =
  88. >   width spec < 32 && (spec `matches` word) ==>
  89. >       isSingletonList [s | s <- extend spec, s `matches` word]
  90. >
  91. > isSingletonList :: [a] -> Bool
  92. > isSingletonList [a] = True
  93. > isSingletonList _   = False
  94.  
  95. We're going to define a function which given a `Spec` and a sequence
  96. of 32-bit words, returns a list of the singleton words that match the
  97. spec.  The function will take time proportional to the length of the
  98. input times the size of the answer times 32, and extra space
  99. proportional to the size of the answer times 32.  Before we tackle the
  100. main functio, we define some constant-space XOR functions.
  101.  
  102. XOR ideas that are broken
  103. -------------
  104.  
  105. Function `xorWith f ws` applies function `f` to every word in `ws` and
  106. returns the exclusive or of the result.
  107.  
  108. > xorWith :: (Word32 -> Word32) -> [Word32] -> Word32
  109. > xorWith f ws = reduce xor 0 [f w | w <- ws]
  110. >   where reduce = foldl'
  111.  
  112. Thanks to *stream fusion* (see ICFP 2007), function `xorWith` takes
  113. constant space.
  114.  
  115. A list of nonzero words has a singleton if and only if either the
  116. exclusive or is nonzero, or if the exclusive or of `3 * w + 1` is
  117. nonzero.  (The "if" direction is trivial.  The "only if" direction is
  118. a conjecture that Evgeny Kluev has disproven; for a counterexample,
  119. see array `testb` below.  I can make Evgeny's example work by adding
  120. a third function `g`, but obviously this situation calls for a proof,
  121. and I don't have one.)
  122.  
  123. > hasSingleton :: [Word32] -> Bool
  124. > hasSingleton ws = xorWith id ws /= 0 || xorWith f ws /= 0 || xorWith g ws /= 0
  125. >   where f w = 3 * w + 1
  126. >         g w = 31 * w + 17
  127.  
  128. Efficient search for singletons
  129. -------------------------------
  130.  
  131.  
  132. Our main function returns a list of all the singletons matching a spec.          
  133.          
  134. > singletonsMatching :: Spec -> [Word32] -> [Word32]
  135. > singletonsMatching spec words =
  136. >  if hasSingleton [w | w <- words, spec `matches` w] then
  137. >    if width spec == 32 then
  138. >      [bits spec]      
  139. >    else
  140. >      concat [singletonsMatching spec' words | spec' <- extend spec]
  141. >  else
  142. >    []
  143.  
  144. We'll prove its correctness by induction on the width of the `spec`.
  145.  
  146.   - The base case is that `spec` has width 32.  In this case, the list
  147.     comprehension will give the list of words that are exactly equal
  148.     to `bits spec`.  Function `hasSingleton` will return `True` if and
  149.     only if this list has exactly one element, which will be true
  150.     exactly when `bits spec` is singleton in `words`.
  151.  
  152.   - Now let's prove that if `singletonsMatching` is correct for with
  153.     *m+1*, it is also correct for width *m*, where *m < 32$.  (This is
  154.     the opposite direction as usual for induction, but it doesn't
  155.     matter.)
  156.  
  157.     Calling `extend spec` on a `spec` of width *m* returns two specs
  158.     that have width $m+1$.  By hypothesis, `singletonsMatching` is
  159.     correct on these specs.  To prove: that the result contains
  160.     exactly those singletons that match `spec`.  By Lemma One, any
  161.     word that matches `spec` matches exactly one of the extended
  162.     specs.  By hypothesis, the recursive calls return exactly the
  163.     singletons matching the extend specs.  When we combine the results
  164.     of these calls with `concat`, we get exactly the matching
  165.     singletons, with no duplicates and no omissions.
  166.  
  167. Actually solving the problem is anticlimactic: the singletons are all
  168. the singletons that match the empty spec:
  169.  
  170. > singletons :: [Word32] -> [Word32]
  171. > singletons words = singletonsMatching universalSpec words
  172.  
  173. Testing code
  174. ------------
  175.  
  176. > testa, testb :: [Word32]
  177. > testa = [10, 1, 1, 2, 3, 4, 2, 5, 6, 7, 10]
  178. > testb = [ 0x0000
  179. >         , 0x0010
  180. >         , 0x0100
  181. >         , 0x0110
  182. >         , 0x1000
  183. >         , 0x1010
  184. >         , 0x1100
  185. >         , 0x1110
  186. >         ]
  187.  
  188. Beyond this point, if you want to follow what's going on, you need to
  189. know [QuickCheck](http://en.wikipedia.org/wiki/QuickCheck).  
  190.  
  191. Here's a random generator for specs:
  192.  
  193. > instance Arbitrary Spec where
  194. >   arbitrary = do width <- choose (0, 32)
  195. >                  b <- arbitrary
  196. >                  return (randomSpec width b)
  197. >   shrink spec = [randomSpec w' (bits spec) | w' <- shrink (width spec)] ++
  198. >                 [randomSpec (width spec) b | b  <- shrink (bits spec)]
  199. > randomSpec width bits = Spec { w = width, bits = mask bits }    
  200. >   where mask b = if width == 32 then b
  201. >                  else (b `shiftL` n) `shiftR` n
  202. >         n = 32 - width
  203.  
  204. Using this generator, we can test Lemma One using `quickCheck lemmaOne`.
  205.  
  206. We can test to see that any word claimed to be a singleton is in fact singleton:
  207.  
  208. > singletonsAreSingleton nzwords =
  209. >   not (hasTriple words) ==> all (`isSingleton` words) (singletons words)
  210. >   where isSingleton w words = isSingletonList [w' | w' <- words, w' == w]
  211. >         words = [w | NonZero w <- nzwords]
  212.  
  213. > hasTriple :: [Word32] -> Bool
  214. > hasTriple words = hasTrip (sort words)
  215. > hasTrip (w1:w2:w3:ws) = (w1 == w2 && w2 == w3) || hasTrip (w2:w3:ws)
  216. > hasTrip _ = False
  217.  
  218.  
  219. Here's another property that tests the fast `singletons` against a
  220. slower algorithm that uses sorting.
  221.  
  222. > singletonsOK :: [NonZero Word32] -> Property
  223. > singletonsOK nzwords = not (hasTriple words) ==>
  224. >   sort (singletons words) == sort (slowSingletons words)
  225. >  where words = [w | NonZero w <- nzwords ]
  226. >        slowSingletons words = stripDoubletons (sort words)
  227. >        stripDoubletons (w1:w2:ws) | w1 == w2 = stripDoubletons ws
  228. >                                   | otherwise = w1 : stripDoubletons (w2:ws)
  229. >        stripDoubletons as = as
clone this paste RAW Paste Data