
Untitled

a guest, May 28th, 2013
Tyler Camp: my custom implementation is the same as it was before, but instead of the nodes being allocated off the heap I just make a giant vector of blank nodes and pull new nodes out of it
Tyler Camp: it has the same structure; only how I allocate the nodes is different
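
A minimal sketch of that allocation scheme in C++ (all names here are hypothetical; the actual code isn't shown in the log):

    #include <cstddef>
    #include <vector>

    struct Node {
        int   value = 0;
        Node* left  = nullptr;
        Node* right = nullptr;
    };

    // Preallocate one contiguous block of blank nodes and hand them out
    // in order, instead of heap-allocating each node individually.
    struct NodePool {
        std::vector<Node> nodes;
        std::size_t       next = 0;

        explicit NodePool(std::size_t capacity) : nodes(capacity) {}

        Node* allocate() {
            // Sketch only: returns nullptr when exhausted. The vector must
            // never grow after pointers are handed out, or those pointers
            // are invalidated -- hence sizing a "giant" vector up front.
            return next < nodes.size() ? &nodes[next++] : nullptr;
        }
    };
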
Allen Webster: right, so there should be fewer cache misses
Tyler Camp: the tree's still there; the nodes that make up the tree are in contiguous memory
Tyler Camp: and maybe
Tyler Camp: I don't really know
Tyler Camp: I didn't really expect there to be much of a difference, but I knew it was a possibility
Tyler Camp: since cache only goes up to about 16 MB (lumping L1, L2, and L3 together), there's no way all of the relevant data could be held in cache for either algorithm
Tyler Camp: EXCEPT
Tyler Camp: that fucking quicksort
Tyler Camp: right
Tyler Camp: all of my nodes are in contiguous memory, but they're not accessed linearly; they're accessed (almost) randomly
Tyler Camp: quicksort iterates directly over a group of elements, and in this case all of that memory is contiguous
Tyler Camp: and therefore it is the perfect candidate for acceleration via cache
Tyler Camp: my code, not so much
Allen Webster: wait...
Allen Webster: right
Allen Webster: I forgot how quicksort worked
Allen Webster: I wasn't thinking exactly correctly
Tyler Camp: roit
Tyler Camp: IIRC, it's along the lines of
Tyler Camp: pick some reasonable value (the pivot); everything under that value goes in one bin and everything over it goes in another
Tyler Camp: then, on each bin, repeat the process
Tyler Camp: it's essentially the same process as insertion into a binary tree, except quicksort iterates over the contents of an entire bin to do its comparisons
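
A minimal sketch of the scheme being described, assuming a basic Lomuto-style in-place partition in C++ (the log doesn't show which quicksort variant was actually benchmarked):

    #include <utility>
    #include <vector>

    // Partition around a pivot ("some reasonable value"): everything under
    // it ends up in the left bin, everything else in the right bin, then
    // the same process repeats on each bin.
    static void quicksort(std::vector<int>& a, int lo, int hi) {
        if (lo >= hi) return;
        int pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; ++j) {
            // Every comparison is against the same pivot value, and the
            // scan walks the range linearly -- exactly the access pattern
            // the cache is good at.
            if (a[j] < pivot) { std::swap(a[i], a[j]); ++i; }
        }
        std::swap(a[i], a[hi]);  // pivot lands between the two bins
        quicksort(a, lo, i - 1);
        quicksort(a, i + 1, hi);
    }

    // Usage: quicksort(v, 0, static_cast<int>(v.size()) - 1);
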
Allen Webster: ok I guess I was thinking correctly
Tyler Camp: the comparisons are against a single constant value (the pivot), which sits at one fixed place in memory
Tyler Camp: since the comparison target is a single address, and all of the other elements being referenced are contiguous and accessed linearly, the cache can be used effectively
Tyler Camp: with my code, the comparison value changes every time (it's the current value being inserted), and the values it's compared against keep changing too, since the route through the nodes is different for each item being inserted
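
For contrast, a sketch of the insertion path being described, reusing the hypothetical Node/NodePool types from the earlier sketch: every comparison dereferences a node whose address depends on the data, so the walk hops around the pool instead of streaming through it.

    // Walk from the root to a leaf, then grab a fresh node from the pool.
    // Each step's target address depends on the comparison result, so the
    // memory accesses are effectively random even though the nodes
    // themselves live in one contiguous vector.
    Node* insert(Node* root, NodePool& pool, int value) {
        if (root == nullptr) {
            Node* n = pool.allocate();
            if (n != nullptr) n->value = value;
            return n;
        }
        if (value < root->value)
            root->left = insert(root->left, pool, value);
        else
            root->right = insert(root->right, pool, value);
        return root;
    }
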
Allen Webster: anyway, you're beating it in the <4000 range?
Tyler Camp: and past that is most likely about the point where all of the data can no longer be held in cache
Tyler Camp: well, it would take a lot more data than that to fill the whole cache, but at that 4000 point it's possible that more is being pushed into L3 than into L1 because of other processes running in the background, and quicksort makes better use of locally contiguous data. My application would be more likely to dip back and forth between L1, L3, and RAM, while quicksort would do bursts of L1 work and then maybe some loading time from RAM. (AFAIK.)
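
Rough numbers for that 4000 figure, assuming a node shaped like the earlier sketch (an int plus two pointers, about 24 bytes on a typical 64-bit build): 4000 nodes is roughly 96 KB, which already overflows a typical 32 KB L1 data cache but still fits easily in L2/L3. A crossover landing near 4000 is at least consistent with the working set outgrowing L1.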