Untitled

a guest
Jun 18th, 2018
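(ns clj-etl-utils.sequences)

;; Selection sampling: walk the population once and keep each item with
;; probability remaining-samples-needed / population-size, so exactly
;; remaining-samples-needed items are returned when population-size
;; matches the real length of the sequence.
(def random-sample-seq
  (let [rnd (java.util.Random.)]
    (fn self [[item & population :as population-seq] population-size remaining-samples-needed]
      ;; lazy-seq defers every step, including skipped items, so long runs
      ;; of rejected items do not grow the call stack.
      (lazy-seq
        (if (or (zero? remaining-samples-needed) (empty? population-seq))
          nil
          (if (< (.nextInt rnd population-size) remaining-samples-needed)
            (cons item
                  (self population
                        (dec population-size)
                        (dec remaining-samples-needed)))
            (self population
                  (dec population-size)
                  remaining-samples-needed)))))))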

(comment

  ;; Assumes `ds` aliases an I/O namespace that provides `writer` and
  ;; `read-lines` (for example clojure.contrib.duck-streams).
  ;; random-sample-seq takes three arguments: the population sequence,
  ;; the population size, and the number of samples to draw.
  (with-open [outp (ds/writer "/tmp/20k-sample.tab")]
    (doseq [line (clj-etl-utils.sequences/random-sample-seq
                  (ds/read-lines "data-to-be-sampled.tab")
                  390000000
                  20000)]
      (.println outp line)))

  )