• API
• FAQ
• Tools
• Trends
• Archive
SHARE
TWEET

# finding singleton elements in a sequence

nrnrnr Aug 13th, 2012 89 Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
1. I went offline and proved the original algorithm subject to the
2. conjecture that the XOR tricks worked. As it happens, the XOR tricks
3. don't work, but the following argument may still interest some
4. people. (I re-did it in Haskell because I find proofs much easier when
5. I have recursive functions instead of loops and I can use data
6. structures. But for the Pythonistas in the audience I tried to use
7. list comprehensions wherever possible.)
8.
9.
10. Beautiful theory slain by an ugly fact
11. ======================================
12.
13. Problem: we're given a sequence of *n* nonzero 32-bit words in which
14. every element is either *singleton* or *doubleton*:
15.
16.   - If a word appears exactly once, it is *singleton*.
17.
18.   - If a word appears exactly twice, it is *doubleton*.
19.
20.   - No word appears three or more times.
21.
22. The problem is to find the singletons.  If there are three singletons,
23. we should use linear time and constant space.  More generally, if
24. there are *k* singletons, we should use _O(k*n)_ time and _O(k)_
25. space.  Unfortunately, the algorithm rests on an unproven conjecture
26. about exclusive or, and the conjecture has been shown to be false.
27.
28. I've paid close attention to proof of correctness.  Arguments about
29. resource bounds are hand-waving.
30.
31. We begin with these basics:
32.
33. > module Singleton where
34. > import Data.Bits
35. > import Data.List
36. > import Data.Word
37. > import Test.QuickCheck hiding ((.&.))
38.
39.
40. Key abstraction: Partial specification of a word
41. -----------------------------------------
42.
43.
44. To tackle the problem I'm going to introduce an abstraction:
45. to describe the least significant \$w\$ bits of a 32-bit word,
46. I introduce a `Spec`:
47.
48. > data Spec = Spec { w :: Int, bits :: Word32 }
49. >    deriving Show
50. > width = w -- width of a Spec
51.
52. A `Spec` matches a word if the least significant `w` bits are equal
53. to `bits`.  If `w` is zero, by definition all words match:
54.
55. > matches :: Spec -> Word32 -> Bool
56. > matches spec word = width spec == 0 ||
57. >                     ((word `shiftL` n) `shiftR` n) == bits spec
58. >   where n = 32 - width spec
59. >
60. > universalSpec = Spec { w = 0, bits = 0 }
61.
62. Here are some claims about `Spec`s:
63.
64.   - All words match the `universalSpec`, which has width 0
65.
66.   - If `matches spec word` and `width spec == 32`, then `word == bits spec`
67.
68. Key idea: "extend" a partial specification
69. ------------------------------------------
70.
71. Here is the key idea of the algorithm: we can *extend* a `Spec` by
72. adding another bit to the specification.  Extending a `Spec` produces
73. a list of two `Spec`s
74.
75. > extend :: Spec -> [Spec]
76. > extend spec = [ Spec { w = w', bits = bits spec .|. (bit `shiftL` width spec) }
77. >               | bit <- [0, 1] ]
78. >   where w' = width spec + 1
79.
80. And here's the crucial claim: if `spec` matches `word` and if `width spec`
81. is less than 32, then exactly one of the two specs from `extend spec`
82. match `word`.  The proof is by case analysis on the relevant bit of `word`.
83. This claim is so important that I'm going to call it Lemma One
84. Here's a test:
85.
86. > lemmaOne :: Spec -> Word32 -> Property
87. > lemmaOne spec word =
88. >   width spec < 32 && (spec `matches` word) ==>
89. >       isSingletonList [s | s <- extend spec, s `matches` word]
90. >
91. > isSingletonList :: [a] -> Bool
92. > isSingletonList [a] = True
93. > isSingletonList _   = False
94.
95. We're going to define a function which given a `Spec` and a sequence
96. of 32-bit words, returns a list of the singleton words that match the
97. spec.  The function will take time proportional to the length of the
98. input times the size of the answer times 32, and extra space
99. proportional to the size of the answer times 32.  Before we tackle the
100. main functio, we define some constant-space XOR functions.
101.
102. XOR ideas that are broken
103. -------------
104.
105. Function `xorWith f ws` applies function `f` to every word in `ws` and
106. returns the exclusive or of the result.
107.
108. > xorWith :: (Word32 -> Word32) -> [Word32] -> Word32
109. > xorWith f ws = reduce xor 0 [f w | w <- ws]
110. >   where reduce = foldl'
111.
112. Thanks to *stream fusion* (see ICFP 2007), function `xorWith` takes
113. constant space.
114.
115. A list of nonzero words has a singleton if and only if either the
116. exclusive or is nonzero, or if the exclusive or of `3 * w + 1` is
117. nonzero.  (The "if" direction is trivial.  The "only if" direction is
118. a conjecture that Evgeny Kluev has disproven; for a counterexample,
119. see array `testb` below.  I can make Evgeny's example work by adding
120. a third function `g`, but obviously this situation calls for a proof,
121. and I don't have one.)
122.
123. > hasSingleton :: [Word32] -> Bool
124. > hasSingleton ws = xorWith id ws /= 0 || xorWith f ws /= 0 || xorWith g ws /= 0
125. >   where f w = 3 * w + 1
126. >         g w = 31 * w + 17
127.
128. Efficient search for singletons
129. -------------------------------
130.
131.
132. Our main function returns a list of all the singletons matching a spec.
133.
134. > singletonsMatching :: Spec -> [Word32] -> [Word32]
135. > singletonsMatching spec words =
136. >  if hasSingleton [w | w <- words, spec `matches` w] then
137. >    if width spec == 32 then
138. >      [bits spec]
139. >    else
140. >      concat [singletonsMatching spec' words | spec' <- extend spec]
141. >  else
142. >    []
143.
144. We'll prove its correctness by induction on the width of the `spec`.
145.
146.   - The base case is that `spec` has width 32.  In this case, the list
147.     comprehension will give the list of words that are exactly equal
148.     to `bits spec`.  Function `hasSingleton` will return `True` if and
149.     only if this list has exactly one element, which will be true
150.     exactly when `bits spec` is singleton in `words`.
151.
152.   - Now let's prove that if `singletonsMatching` is correct for with
153.     *m+1*, it is also correct for width *m*, where *m < 32\$.  (This is
154.     the opposite direction as usual for induction, but it doesn't
155.     matter.)
156.
157.     Calling `extend spec` on a `spec` of width *m* returns two specs
158.     that have width \$m+1\$.  By hypothesis, `singletonsMatching` is
159.     correct on these specs.  To prove: that the result contains
160.     exactly those singletons that match `spec`.  By Lemma One, any
161.     word that matches `spec` matches exactly one of the extended
162.     specs.  By hypothesis, the recursive calls return exactly the
163.     singletons matching the extend specs.  When we combine the results
164.     of these calls with `concat`, we get exactly the matching
165.     singletons, with no duplicates and no omissions.
166.
167. Actually solving the problem is anticlimactic: the singletons are all
168. the singletons that match the empty spec:
169.
170. > singletons :: [Word32] -> [Word32]
171. > singletons words = singletonsMatching universalSpec words
172.
173. Testing code
174. ------------
175.
176. > testa, testb :: [Word32]
177. > testa = [10, 1, 1, 2, 3, 4, 2, 5, 6, 7, 10]
178. > testb = [ 0x0000
179. >         , 0x0010
180. >         , 0x0100
181. >         , 0x0110
182. >         , 0x1000
183. >         , 0x1010
184. >         , 0x1100
185. >         , 0x1110
186. >         ]
187.
188. Beyond this point, if you want to follow what's going on, you need to
189. know [QuickCheck](http://en.wikipedia.org/wiki/QuickCheck).
190.
191. Here's a random generator for specs:
192.
193. > instance Arbitrary Spec where
194. >   arbitrary = do width <- choose (0, 32)
195. >                  b <- arbitrary
196. >                  return (randomSpec width b)
197. >   shrink spec = [randomSpec w' (bits spec) | w' <- shrink (width spec)] ++
198. >                 [randomSpec (width spec) b | b  <- shrink (bits spec)]
199. > randomSpec width bits = Spec { w = width, bits = mask bits }
200. >   where mask b = if width == 32 then b
201. >                  else (b `shiftL` n) `shiftR` n
202. >         n = 32 - width
203.
204. Using this generator, we can test Lemma One using `quickCheck lemmaOne`.
205.
206. We can test to see that any word claimed to be a singleton is in fact singleton:
207.
208. > singletonsAreSingleton nzwords =
209. >   not (hasTriple words) ==> all (`isSingleton` words) (singletons words)
210. >   where isSingleton w words = isSingletonList [w' | w' <- words, w' == w]
211. >         words = [w | NonZero w <- nzwords]
212.
213. > hasTriple :: [Word32] -> Bool
214. > hasTriple words = hasTrip (sort words)
215. > hasTrip (w1:w2:w3:ws) = (w1 == w2 && w2 == w3) || hasTrip (w2:w3:ws)
216. > hasTrip _ = False
217.
218.
219. Here's another property that tests the fast `singletons` against a
220. slower algorithm that uses sorting.
221.
222. > singletonsOK :: [NonZero Word32] -> Property
223. > singletonsOK nzwords = not (hasTriple words) ==>
224. >   sort (singletons words) == sort (slowSingletons words)
225. >  where words = [w | NonZero w <- nzwords ]
226. >        slowSingletons words = stripDoubletons (sort words)
227. >        stripDoubletons (w1:w2:ws) | w1 == w2 = stripDoubletons ws
228. >                                   | otherwise = w1 : stripDoubletons (w2:ws)
229. >        stripDoubletons as = as
RAW Paste Data
Top