
Untitled

a guest
Nov 2nd, 2017
/* Computed GOTOs, or
the-optimization-commonly-but-improperly-known-as-"threaded code"
using gcc's labels-as-values extension
(http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html).

The traditional bytecode evaluation loop uses a "switch" statement, which
decent compilers will optimize into a single indirect branch instruction
combined with a lookup table of jump addresses. However, since that
indirect jump instruction is shared by all opcodes, the CPU will have a
hard time predicting where to jump next (in fact, the prediction will
always be wrong except in the uncommon case of a sequence of several
identical opcodes).

"Threaded code", in contrast, uses an explicit jump table and an explicit
indirect jump instruction at the end of each opcode. Since the jump
instruction is at a different address for each opcode, the CPU will make a
separate prediction for each of these instructions, which is equivalent to
predicting the second opcode of each opcode pair. These predictions have
a much better chance of turning out valid, especially in small bytecode
loops.

A mispredicted branch on a modern CPU flushes the whole pipeline and
can cost several CPU cycles (depending on the pipeline depth),
and potentially many more instructions (depending on the pipeline width).
A correctly predicted branch, however, is nearly free.

At the time of this writing, the "threaded code" version is up to 15-20%
faster than the normal "switch" version, depending on the compiler and the
CPU architecture.

We disable the optimization if DYNAMIC_EXECUTION_PROFILE is defined,
because it would render the profiling measurements invalid.

NOTE: care must be taken that the compiler doesn't try to "optimize" the
indirect jumps by sharing them between all opcodes. Such optimizations
can be disabled with gcc by using the -fno-gcse flag (or possibly
-fno-crossjumping).
*/
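
The difference between the two dispatch styles described in the comment can be sketched with a toy interpreter. This is only an illustration, not the actual evaluation loop the comment belongs to: the opcode names, `demo_program`, `run_switch`, `run_threaded` and `check_demo` are all hypothetical, and the code assumes a compiler that supports the labels-as-values extension (GCC or Clang).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes; their values index the jump table below. */
enum { OP_LOAD_COUNT, OP_ADD, OP_DEC, OP_JNZ, OP_HALT };

/* counter = 5; do { acc += 3; counter--; } while (counter != 0); */
const int demo_program[] = {
    OP_LOAD_COUNT, 5,   /* index 0 */
    OP_ADD, 3,          /* index 2: loop body starts here */
    OP_DEC,             /* index 4 */
    OP_JNZ, 2,          /* index 5: jump back to index 2 if counter != 0 */
    OP_HALT             /* index 7 */
};

/* Traditional loop: every opcode returns to the one shared switch,
   so a single indirect jump serves all opcodes. */
int run_switch(const int *code)
{
    int acc = 0, counter = 0;
    size_t pc = 0;
    for (;;) {
        switch (code[pc++]) {
        case OP_LOAD_COUNT: counter = code[pc++]; break;
        case OP_ADD:        acc += code[pc++];    break;
        case OP_DEC:        counter--;            break;
        case OP_JNZ: {
            int target = code[pc++];
            if (counter != 0)
                pc = (size_t)target;
            break;
        }
        case OP_HALT:       return acc;
        default:            return -1;  /* unknown opcode */
        }
    }
}

/* Computed-goto loop: each opcode body ends with its own indirect jump.
   No bounds check on the opcode; input is assumed to be well formed. */
int run_threaded(const int *code)
{
    /* Explicit jump table, in the same order as the enum above. */
    static void *const targets[] = {
        &&do_load_count, &&do_add, &&do_dec, &&do_jnz, &&do_halt
    };
    int acc = 0, counter = 0;
    size_t pc = 0;

#define DISPATCH() goto *targets[code[pc++]]

    DISPATCH();
do_load_count:
    counter = code[pc++];
    DISPATCH();
do_add:
    acc += code[pc++];
    DISPATCH();
do_dec:
    counter--;
    DISPATCH();
do_jnz: {
        int target = code[pc++];
        if (counter != 0)
            pc = (size_t)target;
    }
    DISPATCH();
do_halt:
    return acc;

#undef DISPATCH
}

/* Sanity check: both interpreters must agree on the demo program. */
void check_demo(void)
{
    assert(run_switch(demo_program) == run_threaded(demo_program));
}
```

Both functions return 15 for `demo_program` (five iterations adding 3 each). When building such a loop with gcc, the NOTE in the comment applies: a flag such as -fno-gcse may be needed to keep the compiler from merging the per-opcode indirect jumps back into one. Whether the threaded version is actually faster depends on the compiler and CPU, so it should be measured rather than assumed.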