- From https://hornet.org/code/effects/rotozoom/pasroto.zip
- -------------------------------------------------------------------
- history and background:
- Second Reality: you remember the head with the pentagram and the
- lens effect... then a flash and whooopsy, what's that? This is 160x100...
- Why? Otherwise it'd be too slow! But why? This is such a simple effect!
- When I (some time later) recoded this effect I noticed that the framerate
- drops at a certain angle, and the only reason could be the cache.
- The processor cache is organized in a special way to give fast access
- to its memory. It is made of cache lines of 16 (32) bytes on a 486 (Pentium),
- which are the smallest units it handles. Each line has a tag address field
- which stores the position of its 16 (32) bytes in memory. Then there are
- 4 (2) ways, which are in effect 4 (2) equal caches that are searched at
- the same time. Finally there are 128 sets of one cache line per way, which
- gives 8k of cache in total.
- On a memory access, the address is split into 3 parts: bits 0-3 (0-4)
- determine the byte in a line, bits 4-10 (5-11) determine the set, and
- bits 11-31 (12-31) are the tag address. The tag address is then compared
- to the tag addresses of the 4 (2) lines of the set. If one matches, it is
- a cache hit. If not, you get a cache miss and 16 (32) bytes are read from
- memory into the least recently used cache line of that set. This takes
- about 23 cycles on a 486dx2-66, while a cache hit takes no extra cycles.
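- As a rough illustration of the address split, here is a minimal Python
- sketch. The name split_address and its parameters are made up for this
- example. The values used below (4 offset bits, 7 set bits) assume an 8k,
- 4-way cache with 16-byte lines; pass other widths for other caches.

```python
def split_address(addr, line_bits, set_bits):
    """Split an address into (tag, set_index, offset).

    line_bits: log2 of the cache line size in bytes (assumed parameter)
    set_bits:  log2 of the number of sets (assumed parameter)
    """
    offset = addr & ((1 << line_bits) - 1)
    set_index = (addr >> line_bits) & ((1 << set_bits) - 1)
    tag = addr >> (line_bits + set_bits)
    return tag, set_index, offset

# Assumed geometry: 16-byte lines (4 bits), 128 sets (7 bits).
base = split_address(0x12345, 4, 7)
next_row = split_address(0x12345 + 256, 4, 7)    # one 256-byte texture row later
same_set = split_address(0x12345 + 2048, 4, 7)   # 128 sets * 16 bytes later

print(base)      # (36, 52, 5)
print(next_row)  # (36, 68, 5)
print(same_set)  # (37, 52, 5)
```

- With this geometry, two texels one 256-byte row apart keep the same byte
- offset but land 16 sets apart, and addresses 2k apart compete for the
- same set.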
- A cache is most effective if you read memory in linear order, as
- a rotozoomer does at low angles. You then get one cache miss
- out of 16 (32) memory accesses. Now imagine the angle is exactly 90°.
- You would then read the memory in steps of 256 bytes (one texture row).
- After 8k the first cache line is overwritten, so when you process the next
- screen line, it is a cache miss again. This results in 100% cache misses...
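- The two extreme cases can be checked with a small cache model. This is
- only a sketch: Cache and miss_rate are hypothetical names, the geometry
- (16-byte lines, 4 ways, 128 sets, strict LRU) is an assumption, and so
- are the 256x256 byte texture and the 256x64 pixel screen window.

```python
from collections import OrderedDict

class Cache:
    """Tiny set-associative cache model with LRU replacement (assumed geometry)."""

    def __init__(self, line_size=16, ways=4, num_sets=128):
        self.line_size = line_size
        self.ways = ways
        self.num_sets = num_sets
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        line = addr // self.line_size
        tag, index = divmod(line, self.num_sets)
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)          # mark as most recently used
            self.hits += 1
        else:
            if len(lines) == self.ways:
                lines.popitem(last=False)   # evict the least recently used line
            lines[tag] = True
            self.misses += 1

def miss_rate(du, dv, width=256, height=64):
    """Walk the texture in scanline order.

    (du, dv) is the texture step per screen pixel; the step per screen
    line is the same vector rotated by 90 degrees, i.e. (-dv, du).
    """
    cache = Cache()
    for sy in range(height):
        for sx in range(width):
            u = (sx * du - sy * dv) & 255
            v = (sx * dv + sy * du) & 255
            cache.access(v * 256 + u)
    return cache.misses / (cache.hits + cache.misses)

rate_0 = miss_rate(1, 0)    # angle 0: linear reads
rate_90 = miss_rate(0, 1)   # angle 90: reads in steps of 256 bytes
print(rate_0, rate_90)      # 0.0625 1.0
```

- In this model the linear walk misses once per 16 reads, while the 90°
- walk misses on every single read.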
- How to optimize it?
- I had several discussions with Scholar / $eeN on this topic. (hiho!)
- We thought about rendering the screen in a different order, so that
- the texture is read in a linear fashion. This would be diagonal lines
- instead of h-lines. But this is not a fast solution either, and more
- complicated anyway. You could also keep prerotated versions of the
- texture, but this would require 2x or more the amount of memory,
- and you are limited to a fixed texture if you do not want to modify
- 2 textures all the time.
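- The 8x8 block approach was a good compromise. :) The screen is rendered
- in blocks of 8x8 pixels instead of whole h-lines, so each block only
- touches a small patch of the texture. You can still write dwords and do
- not need too much memory, while the cache contents are not destroyed.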
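- To get a feeling for why the blocks help, the read order can be modelled
- in Python. This is a sketch of the access pattern only, not the code from
- the archive. The names simulate, texel, by_scanlines and by_blocks are
- invented here, and the cache geometry (16-byte lines, 4 ways, 128 sets,
- strict LRU), the 256x256 byte texture and the 256x64 pixel window are
- assumptions.

```python
from collections import OrderedDict

def simulate(addresses, line_size=16, ways=4, num_sets=128):
    """Return the miss rate of an LRU set-associative cache for a read sequence."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    total = 0
    for addr in addresses:
        tag, index = divmod(addr // line_size, num_sets)
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)          # hit: refresh LRU position
        else:
            if len(lines) == ways:
                lines.popitem(last=False)   # evict the least recently used line
            lines[tag] = True
            misses += 1
        total += 1
    return misses / total

def texel(sx, sy):
    """Texture address of screen pixel (sx, sy) at exactly 90 degrees."""
    u = (-sy) & 255
    v = sx & 255
    return v * 256 + u

def by_scanlines(width=256, height=64):
    """Classic order: one full h-line after another."""
    for sy in range(height):
        for sx in range(width):
            yield texel(sx, sy)

def by_blocks(width=256, height=64, block=8):
    """Block order: finish each 8x8 screen block before moving on."""
    for by in range(0, height, block):
        for bx in range(0, width, block):
            for sy in range(by, by + block):
                for sx in range(bx, bx + block):
                    yield texel(sx, sy)

scanline_rate = simulate(by_scanlines())
block_rate = simulate(by_blocks())
print(scanline_rate, block_rate)
```

- Under these assumptions the h-line order misses on every read at 90°,
- while the block order misses on fewer than a quarter of the reads,
- because each block only touches a handful of cache lines and reuses
- them before they are thrown out.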
- You can also use this 8x8 block approach to optimize movelist-tunnels:
- keep the movelist linear, while you go through the 8x8 blocks.
- And you can do other nice things with 8x8 blocks... ;))))))))))
- Which I cannot tell you yet. Probably later! =}
- Cache optimizing seems to be quite stupid for vector engines, but
- it is ESSENTIAL for fast bitmap effects.