Nov 21st, 2017
  1. From https://hornet.org/code/effects/rotozoom/pasroto.zip
  2.  
  3. -------------------------------------------------------------------
  4.  
  5. history and background:
  6. Second Reality: you remember the head with the pentagram and the
  7. lens effect... then a flash and whooopsy, what's that? This is 160x100...
  8. Why? Otherwise it'd be too slow! But why? This is such a simple effect!
  9. When I (some time later) recoded this effect I noticed that the framerate
  10. drops at a certain angle, and the only reason could be the cache.
  11. The processor cache is organized in a special way to give fast access
  12. to its memory. So you have cache lines of 16 (32) bytes on a 486 (Pentium)
  13. which are atoms. They all have a tag address field which stores the
  14. position of the 16 (32) bytes in memory. Then you have 4 (2) ways, which
  15. are, so to speak, 4 (2) equal caches that are searched at the same time.
  16. Finally there are 128 sets of one cache line per way (8k in total). Bits
  17. 4-10 (5-11) of the address determine the used set for a memory access.
  18. At a memory access, the address is split into 3 parts: bits 0-3 (0-4)
  19. determine the byte in a line, bits 4-10 (5-11) determine the set, and
  20. bits 11-31 (12-31) are the tag address. The tag address is then compared
  21. to the tag addresses of the 4 (2) lines of the set. If one matches, it is
  22. a cache hit; if not, you get a cache miss and 16 (32) bytes are read from
  23. memory into the least recently used cache line of that set. This takes
  24. about 23 cycles on a 486dx2-66, while a cache hit takes no extra cycles.
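The address split described above can be sketched in C like this. It is only an illustration: the names cache_geom, cache_addr and split_address are made up here, and the line size and set count are parameters rather than fixed values, so you can plug in whatever geometry your CPU really has.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cache geometry; both fields must be powers of two. */
typedef struct {
    uint32_t line_bytes; /* bytes per cache line, e.g. 16 or 32 */
    uint32_t num_sets;   /* number of sets (lines per way)      */
} cache_geom;

/* The three parts an address is split into on a cache lookup. */
typedef struct {
    uint32_t offset; /* byte inside the cache line            */
    uint32_t set;    /* which set the line can live in        */
    uint32_t tag;    /* compared against every way of the set */
} cache_addr;

/* log2 of a power of two */
static uint32_t log2_u32(uint32_t v)
{
    uint32_t n = 0;
    while (v > 1) {
        v >>= 1;
        n++;
    }
    return n;
}

static cache_addr split_address(cache_geom g, uint32_t addr)
{
    uint32_t offset_bits = log2_u32(g.line_bytes);
    uint32_t set_bits = log2_u32(g.num_sets);
    cache_addr a;

    /* the masks below only work for power-of-two sizes */
    assert(g.line_bytes != 0 && (g.line_bytes & (g.line_bytes - 1)) == 0);
    assert(g.num_sets != 0 && (g.num_sets & (g.num_sets - 1)) == 0);

    a.offset = addr & (g.line_bytes - 1);
    a.set = (addr >> offset_bits) & (g.num_sets - 1);
    a.tag = addr >> (offset_bits + set_bits);
    return a;
}
```

For example, assuming 16-byte lines and 128 sets (an 8k, 4-way layout), address 0x12345 splits into offset 5, set 52 and tag 36. Two addresses can only fight over the same cache lines if their set fields are equal.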
  25. A cache is most effective if you read the memory in a linear order like
  26. you do it in a rotozoomer at low angles. You then get one cache miss
  27. out of 16 (32) memory accesses. Now imagine the angle is exactly 90°.
  28. You would then read the memory in steps of 256; after 8k the first
  29. cache line is overwritten, so when you process the next line, it is a
  30. cache miss again. This results in 100% cache misses...
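To see the difference between the two angles in numbers, here is a tiny cache simulator in C. It is a rough sketch, not a model of the real hardware: it uses true LRU replacement (real chips approximate it), it ignores everything except texture reads, and the names cache_sim, cache_init, cache_access and walk_miss_rate are invented for this example. The geometry is passed in as parameters.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_SETS 256
#define MAX_WAYS 4

/* Hypothetical set-associative cache with true LRU replacement. */
typedef struct {
    uint32_t line_bytes, num_sets, num_ways;
    uint32_t tag[MAX_SETS][MAX_WAYS];
    uint32_t age[MAX_SETS][MAX_WAYS]; /* time of last use, 0 = empty */
    uint32_t clock;
    uint32_t hits, misses;
} cache_sim;

static void cache_init(cache_sim *c, uint32_t line_bytes,
                       uint32_t num_sets, uint32_t num_ways)
{
    assert(num_sets <= MAX_SETS && num_ways <= MAX_WAYS);
    memset(c, 0, sizeof *c);
    c->line_bytes = line_bytes;
    c->num_sets = num_sets;
    c->num_ways = num_ways;
}

/* Returns 1 on a hit, 0 on a miss (the line is then loaded). */
static int cache_access(cache_sim *c, uint32_t addr)
{
    uint32_t line = addr / c->line_bytes;
    uint32_t set = line % c->num_sets;
    uint32_t tag = line / c->num_sets;
    uint32_t victim = 0, w;

    c->clock++;
    for (w = 0; w < c->num_ways; w++) {
        if (c->age[set][w] != 0 && c->tag[set][w] == tag) {
            c->age[set][w] = c->clock;
            c->hits++;
            return 1;
        }
        /* remember the least recently used (or still empty) way */
        if (c->age[set][w] < c->age[set][victim])
            victim = w;
    }
    c->tag[set][victim] = tag;
    c->age[set][victim] = c->clock;
    c->misses++;
    return 0;
}

/* Reads `rows` screen lines of `width` texels each. du is the address
   step between two texels of a screen line, dv the step between two
   screen lines. Returns misses / accesses. */
static double walk_miss_rate(cache_sim *c, uint32_t width, uint32_t rows,
                             uint32_t du, uint32_t dv)
{
    uint32_t x, y;
    for (y = 0; y < rows; y++)
        for (x = 0; x < width; x++)
            cache_access(c, y * dv + x * du);
    return (double)c->misses / (double)(c->hits + c->misses);
}
```

Assuming 16-byte lines, 128 sets and 4 ways, a 160 pixel wide screen line read with du=1, dv=256 (angle 0) gives a miss rate of 1/16, while du=256, dv=1 (angle 90°) gives 1.0: every column touches far more lines per set than there are ways, so nothing survives until the next screen line.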
  31. How to optimize it?
  32. I had several discussions with Scholar / $eeN on this topic. (hiho!)
  33. We thought about rendering the screen in a different order, so that
  34. the texture is read in a linear fashion. This would be diagonal lines
  35. instead of h-lines. But this is not a fast solution either, and more
  36. complicated anyway. You could also keep prerotated versions of the
  37. texture, but this would require 2x or more the amount of memory,
  38. and you are limited to a fixed texture if you do not want to modify
  39. 2 textures all the time.
  40. The 8x8 block approach was a good compromise. :) You can write dwords,
  41. and do not need too much memory while the cache contents are not
  42. destroyed.
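A rotozoomer that walks the screen in 8x8 blocks could look roughly like the C sketch below. This is not the code from the zip, just one possible reading of the idea: the function name rotozoom_blocks, the 8.8 fixed point format and the 256x256 8-bit texture are assumptions made for the example, and width and height are assumed to be multiples of 8.

```c
#include <stdint.h>

#define TEX_SIZE 256 /* assumed 256x256 texture, one byte per texel */
#define BLOCK 8

/* u0/v0: texture position of the top-left screen pixel (8.8 fixed point).
   dudx/dvdx: texture step per screen pixel to the right.
   dudy/dvdy: texture step per screen line down.
   The 16-bit coordinates wrap around, so the texture tiles for free. */
static void rotozoom_blocks(uint8_t *screen, int width, int height,
                            const uint8_t *tex,
                            uint16_t u0, uint16_t v0,
                            int16_t dudx, int16_t dvdx,
                            int16_t dudy, int16_t dvdy)
{
    int bx, by, x, y;

    for (by = 0; by < height; by += BLOCK) {
        for (bx = 0; bx < width; bx += BLOCK) {
            /* texture position of the block's top-left pixel */
            uint16_t ur = (uint16_t)(u0 + bx * dudx + by * dudy);
            uint16_t vr = (uint16_t)(v0 + bx * dvdx + by * dvdy);

            for (y = 0; y < BLOCK; y++) {
                uint16_t u = ur, v = vr;
                uint8_t *dst = screen + (by + y) * width + bx;

                /* 8 pixels per block row; these could be packed
                   and written as two dwords */
                for (x = 0; x < BLOCK; x++) {
                    dst[x] = tex[(v & 0xFF00u) | (u >> 8)];
                    u = (uint16_t)(u + dudx);
                    v = (uint16_t)(v + dvdx);
                }
                ur = (uint16_t)(ur + dudy);
                vr = (uint16_t)(vr + dvdy);
            }
        }
    }
}
```

The point of the block order is that one 8x8 block only touches a small patch of the texture, whatever the angle is, so the lines loaded for the first block row are most likely still cached for the next seven. With dudx=256, dvdx=0, dudy=0, dvdy=256 the texture is copied unrotated; swapping the two pairs gives the 90° case.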
  43. You can also use this 8x8 block approach to optimize movelist-tunnels:
  44. keep the movelist linear while you go through the 8x8 blocks.
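One way to read this (an interpretation, not something the text spells out): store the movelist in the same order in which the blocks are drawn, so the table itself is read front to back. The C sketch below assumes a movelist of 16-bit offsets into a 64k texture and a screen whose width and height are multiples of 8; movelist_to_blocks and tunnel_blocks are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK 8

/* Reorder a scanline-ordered movelist into 8x8-block order (done once). */
static void movelist_to_blocks(const uint16_t *linear, uint16_t *blocked,
                               int width, int height)
{
    size_t n = 0;
    int bx, by, x, y;

    for (by = 0; by < height; by += BLOCK)
        for (bx = 0; bx < width; bx += BLOCK)
            for (y = 0; y < BLOCK; y++)
                for (x = 0; x < BLOCK; x++)
                    blocked[n++] = linear[(by + y) * width + bx + x];
}

/* Draw one frame: the movelist is read linearly while the screen is
   filled block by block. `shift` moves the texture to animate it. */
static void tunnel_blocks(uint8_t *screen, int width, int height,
                          const uint8_t *tex, const uint16_t *blocked,
                          uint16_t shift)
{
    const uint16_t *m = blocked;
    int bx, by, x, y;

    for (by = 0; by < height; by += BLOCK)
        for (bx = 0; bx < width; bx += BLOCK)
            for (y = 0; y < BLOCK; y++) {
                uint8_t *dst = screen + (by + y) * width + bx;
                for (x = 0; x < BLOCK; x++)
                    dst[x] = tex[(uint16_t)(*m++ + shift)];
            }
}
```

The picture is the same as with a scanline-ordered movelist; only the order of the reads changes, which is what matters for the cache.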
  45. And you can do other nice things with 8x8 blocks... ;))))))))))
  46. Which I cannot tell you yet. Probably later! =}
  47. Cache optimizing seems to be quite stupid for vector engines, but
  48. it is ESSENTIAL for fast bitmap effects.