thezxtreme

Instruction misses

Jan 9th, 2020
  1. Matt P. Dziubinski: Instruction cache misses are fairly common nowadays, and for larger applications they are in fact a significant source of optimization opportunities (from the AsmDB paper: "13.8% of total performance potential wasted due to front-end latency, dominated by i-cache misses"). Fixing them is generally not automatic (although there are tools that can assist with select use cases), so you may have to spend some time on this.
  2.  
  3. Here are some resources that may help. Specifically for iTLB misses, see "Runtime Performance Optimization Blueprint: Large Code Pages" (https://software.intel.com/.../runtime-performance...).
  4.  
  5. You may be able to manually nudge the compiler toward a basic block placement that keeps blocks likely to execute one after another adjacent in memory, using https://en.cppreference.com/w/cpp/language/attributes/likely (C++20) or https://nemequ.github.io/hedley/api-reference.html... (pre-C++20).
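
A minimal sketch of the attribute approach, in case a concrete shape helps. The macro names HINT_LIKELY / HINT_UNLIKELY and the function sum_valid are made up for illustration. The sketch assumes the attributes are only enabled when compiling as C++20 and expand to nothing otherwise. The attributes are hints: whether the compiler actually lays out the common path as fall-through code is something to check in the disassembly or a profile.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical shim: use the C++20 attributes when compiling as C++20 or
// later; otherwise the macros expand to nothing and the code is unchanged.
#if __cplusplus >= 202002L
#  define HINT_LIKELY [[likely]]
#  define HINT_UNLIKELY [[unlikely]]
#else
#  define HINT_LIKELY
#  define HINT_UNLIKELY
#endif

// Sums the non-negative entries of data. Negative entries are assumed to be
// rare error markers; they are counted in errors instead of being summed.
std::int64_t sum_valid(const int* data, std::size_t n, std::size_t& errors) {
    std::int64_t total = 0;
    errors = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] >= 0) HINT_LIKELY {
            // Expected common case: ideally placed on the fall-through path.
            total += data[i];
        } else HINT_UNLIKELY {
            // Expected rare case: a candidate for being moved out of the hot path.
            ++errors;
        }
    }
    return total;
}
```

Only annotate branches whose bias you have actually measured; a wrong hint can push the hot path out of line and make things worse.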
  6.  
  7. There are also tools for that, e.g., BOLT (Binary Optimization and Layout Tool) and PROPELLER (Profile Guided Optimizing Large Scale LLVM-based Relinker).
  8. You may also derive some benefits, even if indirectly, from LTO in your compiler (e.g., https://hubicka.blogspot.com/.../gcc-9-link-time-and...).
  9.  
  10. Other than that, see:
  11. - AsmDB: Understanding and Mitigating Front-end Stalls in Warehouse-Scale Computers: https://ai.google/research/pubs/pub48320/
  12. - Avoiding instruction cache misses: https://pdziepak.github.io/.../06/21/avoiding-icache-misses/
  13. - https://easyperf.net/blog/2018/01/18/Code_alignment_issues
  14. - https://easyperf.net/.../Improving-performance-by-better...
  15. - https://easyperf.net/.../Understanding-IDQ_UOPS_NOT...
  16. - https://easyperf.net/.../Machine-code-layout-optimizatoins
  17. - Causes of Performance Instability due to Code Placement in X86: https://www.youtube.com/watch?v=IX16gcX4vDQ
  18. - Improving LLVM Generated Code Size for X86 Processors: https://www.youtube.com/watch?v=yHexQSFud3w
  19.  
  20. Binary optimization tools (which include basic block layout optimizations):
  21.  
  22. - BOLT (Binary Optimization and Layout Tool)
  23.   - A Linux command-line utility used for optimizing performance of binaries
  24.   - https://github.com/facebookincubator/BOLT
  25.   - Accelerate large-scale applications with BOLT
  26.     - https://code.fb.com/.../accelerate-large-scale.../
  27.   - Building Binary Optimizer with LLVM
  28.     - 2016 EuroLLVM Developers' Meeting; Maksim Panchenko
  29.     - https://llvm.org/.../Presentations/BOLT_EuroLLVM_2016.pdf
  30.     - https://www.youtube.com/watch?v=gw3iDO3By5Y
  31.   - BOLT: A Practical Binary Optimizer for Data Centers and Beyond
  32.     - Maksim Panchenko, Rafael Auler, Bill Nell, Guilherme Ottoni
  33.     - https://arxiv.org/abs/1807.06735
  34. - PROPELLER: Profile Guided Optimizing Large Scale LLVM-based Relinker
  35.   - https://github.com/google/llvm-propeller
  36.   - https://github.com/.../llv.../blob/plo-dev/Propeller_RFC.pdf
  37.   - http://lists.llvm.org/.../llvm.../2019-September/135393.html
  38.   - https://www.youtube.com/watch?v=DySuXFGmB40