- Matt P. Dziubinski: Instruction cache misses are fairly common nowadays, and for larger applications they are in fact a significant source of optimization opportunities (from the AsmDB paper: "13.8% of total performance potential wasted due to front-end latency, dominated by i-cache misses"). Reducing them is generally not automatic (although there are tools that can assist with select use cases), so you may have to spend some time on this.
- Here are some resources that may help. Specifically for iTLB misses, see "Runtime Performance Optimization Blueprint: Large Code Pages" (https://software.intel.com/.../runtime-performance...).
- You may be able to manually steer the compiler toward a basic block placement that keeps blocks likely to execute one after another adjacent in memory, using https://en.cppreference.com/w/cpp/language/attributes/likely (C++20) or https://nemequ.github.io/hedley/api-reference.html... (pre-C++20).
- There are also tools for that, e.g., BOLT (Binary Optimization and Layout Tool) & PROPELLER (Profile Guided Optimizing Large Scale LLVM-based Relinker).
- You may also derive some benefit (even if indirectly) from LTO in your compiler (e.g., https://hubicka.blogspot.com/.../gcc-9-link-time-and...).
- Other than that, see:
- - AsmDB: Understanding and Mitigating Front-end Stalls in Warehouse-Scale Computers: https://ai.google/research/pubs/pub48320/
- - Avoiding instruction cache misses: https://pdziepak.github.io/.../06/21/avoiding-icache-misses/
- - https://easyperf.net/blog/2018/01/18/Code_alignment_issues
- - https://easyperf.net/.../Improving-performance-by-better...
- - https://easyperf.net/.../Understanding-IDQ_UOPS_NOT...
- - https://easyperf.net/.../Machine-code-layout-optimizatoins
- - Causes of Performance Instability due to Code Placement in X86: https://www.youtube.com/watch?v=IX16gcX4vDQ
- - Improving LLVM Generated Code Size for X86 Processors: https://www.youtube.com/watch?v=yHexQSFud3w
- Binary optimization tools (which include basic block layout optimizations):
- - BOLT (Binary Optimization and Layout Tool)
- - A Linux command-line utility used for optimizing performance of binaries
- - https://github.com/facebookincubator/BOLT
- - Accelerate large-scale applications with BOLT
- - https://code.fb.com/.../accelerate-large-scale.../
- - Building Binary Optimizer with LLVM
- - 2016 EuroLLVM Developers' Meeting; Maksim Panchenko
- - https://llvm.org/.../Presentations/BOLT_EuroLLVM_2016.pdf
- - https://www.youtube.com/watch?v=gw3iDO3By5Y
- - BOLT: A Practical Binary Optimizer for Data Centers and Beyond
- - Maksim Panchenko, Rafael Auler, Bill Nell, Guilherme Ottoni
- - https://arxiv.org/abs/1807.06735
- - PROPELLER: Profile Guided Optimizing Large Scale LLVM-based Relinker
- - https://github.com/google/llvm-propeller
- - https://github.com/.../llv.../blob/plo-dev/Propeller_RFC.pdf
- - http://lists.llvm.org/.../llvm.../2019-September/135393.html
- - https://www.youtube.com/watch?v=DySuXFGmB40
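
As a rough illustration of the manual block-placement hint mentioned above, here is a minimal C++20 sketch using the `[[likely]]` and `[[unlikely]]` attributes. The function name `sum_valid` and the "negative values are rare errors" scenario are hypothetical, made up for this example. The attributes are only hints: whether the compiler actually keeps the hot branch on the fall-through path and moves the cold one out of line depends on the compiler and optimization level, so it is worth checking the generated assembly or a profile.

```cpp
#include <vector>

// Hypothetical example: sum the non-negative values and count the negative
// ones, which we assume (for illustration only) to be rare error markers.
long long sum_valid(const std::vector<int>& values, int& errors) {
    long long total = 0;
    for (int v : values) {
        if (v >= 0) [[likely]] {
            // Hot path: hinted as the common case, so the compiler may lay
            // it out contiguously with the loop body.
            total += v;
        } else [[unlikely]] {
            // Cold path: hinted as rare, so the compiler may place it away
            // from the hot instructions.
            ++errors;
        }
    }
    return total;
}
```

For example, calling `sum_valid` on `{1, 2, 3, -1, 4}` with `errors` initialized to 0 returns 10 and leaves `errors` at 1. The hints do not change the result, only (potentially) the code layout.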