Instruction misses

Jan 9th, 2020
Matt P. Dziubinski:

Instruction cache misses are fairly common nowadays, and for larger applications they are in fact a source of significant optimization opportunities (from the AsmDB paper: "13.8% of total performance potential wasted due to front-end latency, dominated by i-cache misses"). Addressing them is generally not automatic (although there are tools that can assist with select use cases), and you may have to spend some time on this.

Here are some resources that may help. Specifically for iTLB misses, see "Runtime Performance Optimization Blueprint: Large Code Pages" (

You may be able to manually induce the compiler to choose a basic block placement that favors laying out, consecutively, the blocks that are likely to execute in sequence, using (C++20) or (pre-C++20).
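
A minimal sketch of such hints, assuming the ones meant here are the C++20 `[[likely]]`/`[[unlikely]]` attributes and, before C++20, the GCC/Clang builtin `__builtin_expect`. The `[[gnu::cold, gnu::noinline]]` attribute on the rare path is an additional GCC/Clang-specific hint, and `sum_abs`, `handle_negative`, and `UNLIKELY_HINT` are hypothetical names for illustration. Whether a given compiler actually changes block placement in response is not guaranteed, so it is worth checking the generated assembly.

```cpp
#include <cassert>
#include <vector>

// Pre-C++20 fallback: GCC/Clang's __builtin_expect(expr, expected) tells the
// compiler which outcome to treat as the common (fall-through) path.
// UNLIKELY_HINT is a hypothetical macro name chosen for this sketch.
#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY_HINT(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY_HINT(x) (x)
#endif

// Rarely executed path, marked cold and kept out of line (GCC/Clang
// attributes; other compilers are expected to ignore unknown attribute
// namespaces) so its instructions are not interleaved with the hot loop.
// Assumes v != LONG_MIN, since negating LONG_MIN overflows.
[[gnu::cold, gnu::noinline]] static long handle_negative(long v) {
    assert(v < 0);
    return -v;
}

// Sums absolute values; negative inputs are assumed to be rare.
long sum_abs(const std::vector<long>& values) {
    long total = 0;
    for (long v : values) {
#if __cplusplus >= 202002L
        // C++20: the standard attribute marks this branch as unlikely.
        if (v < 0) [[unlikely]] {
            total += handle_negative(v);
        } else {
            total += v;
        }
#else
        // Pre-C++20: the builtin-based hint (or no hint on other compilers).
        if (UNLIKELY_HINT(v < 0)) {
            total += handle_negative(v);
        } else {
            total += v;
        }
#endif
    }
    return total;
}
```

For example, `sum_abs({1, -2, 3})` returns 6 either way; the hints are only meant to influence code layout, never the result.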

There are also tools for this, e.g., BOLT (Binary Optimization and Layout Tool) and PROPELLER (Profile Guided Optimizing Large Scale LLVM-based Relinker).
You may also be able to derive some benefit (even if indirectly) from link-time optimization (LTO) in your compiler (e.g.,

Other than that, see:
- AsmDB: Understanding and Mitigating Front-end Stalls in Warehouse-Scale Computers
- Avoiding instruction cache misses
- Causes of Performance Instability due to Code Placement in X86
- Improving LLVM Generated Code Size for X86 Processors

Binary optimization tools (which include basic block layout optimizations):
- BOLT (Binary Optimization and Layout Tool): a Linux command-line utility used for optimizing the performance of binaries
- Accelerate large-scale applications with BOLT
- Building Binary Optimizer with LLVM; 2016 EuroLLVM Developers' Meeting; Maksim Panchenko
- BOLT: A Practical Binary Optimizer for Data Centers and Beyond; Maksim Panchenko, Rafael Auler, Bill Nell, Guilherme Ottoni
- PROPELLER: Profile Guided Optimizing Large Scale LLVM-based Relinker