- ===== IACA output - first option =====
- Intel(R) Architecture Code Analyzer Version - 2.1
- Analyzed File - /Users/jonny/Library/Developer/Xcode/DerivedData/scatter-drouuljnuqacmubvnwtsdswhoaif/Build/Products/Release/scatter
- Binary Format - 32Bit
- Architecture - SNB
- Analysis Type - Throughput
- Throughput Analysis Report
- --------------------------
- Block Throughput: 8.20 Cycles Throughput Bottleneck: Port0, Port1, Port2_DATA, Port3_DATA
- Port Binding In Cycles Per Iteration:
- -------------------------------------------------------------------------
- | Port | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 |
- -------------------------------------------------------------------------
- | Cycles | 8.0 0.0 | 8.0 | 6.5 8.0 | 6.5 8.0 | 4.0 | 6.0 |
- -------------------------------------------------------------------------
- N - port number or number of cycles resource conflict caused delay, DV - Divider pipe (on port 0)
- D - Data fetch pipe (on ports 2 and 3), CP - on a critical path
- F - Macro Fusion with the previous instruction occurred
- * - instruction micro-ops not bound to a port
- ^ - Micro Fusion happened
- # - ESP Tracking sync uop was issued
- @ - SSE instruction followed an AVX256 instruction, dozens of cycles penalty is expected
- ! - instruction not supported, was not accounted in Analysis
- | Num Of | Ports pressure in cycles | |
- | Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | |
- ---------------------------------------------------------------------
- | 1 | | | 0.5 0.5 | 0.5 0.5 | | | CP | mov ecx, dword ptr [ebp+0x8]
- | 1 | | | 0.5 0.5 | 0.5 0.5 | | | CP | mov ecx, dword ptr [ecx]
- | 1^ | | | 0.5 0.5 | 0.5 0.5 | | | CP | vbroadcastsd ymm0, qword ptr [edx-0x18]
- | 1 | | | 0.5 1.0 | 0.5 1.0 | | | CP | vmovapd ymm1, ymmword ptr [ecx+ebx*1-0x20]
- | 1 | 1.0 | | | | | | CP | vmulpd ymm2, ymm1, ymm0
- | 1 | | | | | | 1.0 | | vperm2f128 ymm3, ymm1, ymm0, 0x1
- | 1^ | | | 0.5 0.5 | 0.5 0.5 | | | CP | vbroadcastsd ymm4, qword ptr [edx-0x8]
- | 1 | 1.0 | | | | | | CP | vmulpd ymm5, ymm3, ymm4
- | 1 | | 1.0 | | | | | CP | vaddpd ymm2, ymm2, ymm5
- | 1 | | 1.0 | | | | | CP | vaddpd ymm7, ymm7, ymm2
- | 2^ | | | 0.5 | 0.5 | 2.0 | | | vmovapd ymmword ptr [esp+0xc0], ymm7
- | 1^ | | | 0.5 0.5 | 0.5 0.5 | | | CP | vbroadcastsd ymm2, qword ptr [edx-0x10]
- | 1 | 1.0 | | | | | | CP | vmulpd ymm1, ymm1, ymm2
- | 1^ | | | 0.5 0.5 | 0.5 0.5 | | | CP | vbroadcastsd ymm5, qword ptr [edx]
- | 1 | 1.0 | | | | | | CP | vmulpd ymm3, ymm3, ymm5
- | 1 | | 1.0 | | | | | CP | vaddpd ymm1, ymm1, ymm3
- | 1 | | | 0.5 1.0 | 0.5 1.0 | | | CP | vmovapd ymm3, ymmword ptr [ecx+ebx*1]
- | 1 | | 1.0 | | | | | CP | vaddpd ymm6, ymm6, ymm1
- | 1 | 1.0 | | | | | | CP | vmulpd ymm0, ymm0, ymm3
- | 1 | | | | | | 1.0 | | vperm2f128 ymm1, ymm3, ymm0, 0x1
- | 1 | 1.0 | | | | | | CP | vmulpd ymm4, ymm1, ymm4
- | 1 | | 1.0 | | | | | CP | vsubpd ymm0, ymm0, ymm4
- | 1 | | | 0.5 1.0 | 0.5 1.0 | | | CP | vmovapd ymm4, ymmword ptr [esp+0xe0]
- | 1 | | | | | | 1.0 | | vmovapd ymm7, ymm6
- | 1 | | | 0.5 1.0 | 0.5 1.0 | | | CP | vmovapd ymm6, ymmword ptr [esp+0x100]
- | 1 | | 1.0 | | | | | CP | vaddpd ymm6, ymm6, ymm0
- | 2^ | | | 0.5 | 0.5 | 2.0 | | | vmovapd ymmword ptr [esp+0x100], ymm6
- | 1 | | | | | | 1.0 | | vmovapd ymm6, ymm7
- | 1 | | | 0.5 1.0 | 0.5 1.0 | | | CP | vmovapd ymm7, ymmword ptr [esp+0xc0]
- | 1 | 1.0 | | | | | | CP | vmulpd ymm0, ymm3, ymm2
- | 1 | 1.0 | | | | | | CP | vmulpd ymm1, ymm1, ymm5
- | 1 | | 1.0 | | | | | CP | vsubpd ymm0, ymm0, ymm1
- | 1 | | 1.0 | | | | | CP | vaddpd ymm4, ymm4, ymm0
- | 1 | | | | | | 1.0 | | add eax, 0x2
- | 1 | | | | | | 1.0 | | inc edi
- Total Num Of Uops: 37
- ===== IACA output - second option =====
- Throughput Analysis Report
- --------------------------
- Block Throughput: 13.15 Cycles Throughput Bottleneck: Port5
- Port Binding In Cycles Per Iteration:
- -------------------------------------------------------------------------
- | Port | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 |
- -------------------------------------------------------------------------
- | Cycles | 8.9 0.0 | 8.9 | 5.5 8.0 | 5.5 8.0 | 4.0 | 13.1 |
- -------------------------------------------------------------------------
- N - port number or number of cycles resource conflict caused delay, DV - Divider pipe (on port 0)
- D - Data fetch pipe (on ports 2 and 3), CP - on a critical path
- F - Macro Fusion with the previous instruction occurred
- * - instruction micro-ops not bound to a port
- ^ - Micro Fusion happened
- # - ESP Tracking sync uop was issued
- @ - SSE instruction followed an AVX256 instruction, dozens of cycles penalty is expected
- ! - instruction not supported, was not accounted in Analysis
- | Num Of | Ports pressure in cycles | |
- | Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | |
- ---------------------------------------------------------------------
- | 1 | | | 0.5 0.5 | 0.5 0.5 | | | | mov ecx, dword ptr [ebp+0x8]
- | 1 | | | 0.5 0.5 | 0.5 0.5 | | | | mov ecx, dword ptr [ecx]
- | 1 | | | 0.5 1.0 | 0.5 1.0 | | | | vmovapd ymm0, ymmword ptr [ebx]
- | 1 | | | | | | 1.0 | CP | vpermilpd xmm1, xmm0, 0x0
- | 1 | | | | | | 1.0 | CP | vinsertf128 ymm1, ymm1, xmm1, 0x1
- | 2^ | | | 0.5 | 0.5 | 2.0 | | | vmovapd ymmword ptr [esp+0x80], ymm1
- | 1 | | | 0.5 1.0 | 0.5 1.0 | | | | vmovapd ymm2, ymmword ptr [ecx+edx*1-0x20]
- | 1 | 1.0 | | | | | | | vmulpd ymm3, ymm2, ymm1
- | 1 | | | | | | 1.0 | CP | vperm2f128 ymm4, ymm2, ymm0, 0x1
- | 1 | | | | | | 1.0 | CP | vextractf128 xmm5, ymm0, 0x1
- | 1 | | | | | | 1.0 | CP | vpermilpd xmm6, xmm5, 0x0
- | 1 | | | | | | 1.0 | CP | vinsertf128 ymm6, ymm6, xmm6, 0x1
- | 1 | | | | | | 1.0 | CP | vmovapd ymm1, ymm7
- | 1 | 1.0 | | | | | | | vmulpd ymm7, ymm4, ymm6
- | 1 | | 1.0 | | | | | | vaddpd ymm3, ymm3, ymm7
- | 1 | | | | | | 1.0 | CP | vmovapd ymm7, ymm1
- | 1 | | | 0.5 1.0 | 0.5 1.0 | | | | vmovapd ymm1, ymmword ptr [esp+0xe0]
- | 1 | | 1.0 | | | | | | vaddpd ymm1, ymm1, ymm3
- | 2^ | | | 0.5 | 0.5 | 2.0 | | | vmovapd ymmword ptr [esp+0xe0], ymm1
- | 1 | | | | | | 1.0 | CP | vpermilpd xmm0, xmm0, 0x3
- | 1 | | | | | | 1.0 | CP | vinsertf128 ymm0, ymm0, xmm0, 0x1
- | 1 | 1.0 | | | | | | | vmulpd ymm2, ymm2, ymm0
- | 1 | | | | | | 1.0 | CP | vpermilpd xmm3, xmm5, 0x3
- | 1 | | | | | | 1.0 | CP | vinsertf128 ymm3, ymm3, xmm3, 0x1
- | 1 | 1.0 | | | | | | | vmulpd ymm4, ymm4, ymm3
- | 1 | | 1.0 | | | | | | vaddpd ymm2, ymm2, ymm4
- | 1 | | | 0.5 1.0 | 0.5 1.0 | | | | vmovapd ymm4, ymmword ptr [ecx+edx*1]
- | 1 | | 1.0 | | | | | | vaddpd ymm7, ymm7, ymm2
- | 1 | | | 0.5 1.0 | 0.5 1.0 | | | | vmovapd ymm1, ymmword ptr [esp+0x80]
- | 1 | 1.0 | | | | | | | vmulpd ymm1, ymm1, ymm4
- | 1 | | | | | | 1.0 | CP | vperm2f128 ymm2, ymm4, ymm0, 0x1
- | 1 | 1.0 | | | | | | | vmulpd ymm5, ymm6, ymm2
- | 1 | | 1.0 | | | | | | vsubpd ymm1, ymm1, ymm5
- | 1 | | | 0.5 1.0 | 0.5 1.0 | | | | vmovapd ymm5, ymmword ptr [esp+0xa0]
- | 1 | | 1.0 | | | | | | vaddpd ymm5, ymm5, ymm1
- | 1 | 1.0 | | | | | | | vmulpd ymm0, ymm0, ymm4
- | 1 | | | 0.5 1.0 | 0.5 1.0 | | | | vmovapd ymm4, ymmword ptr [esp+0xc0]
- | 1 | 1.0 | | | | | | | vmulpd ymm1, ymm3, ymm2
- | 1 | | 1.0 | | | | | | vsubpd ymm0, ymm0, ymm1
- | 1 | | 1.0 | | | | | | vaddpd ymm4, ymm4, ymm0
- | 1 | 0.1 | 0.9 | | | | 0.1 | CP | add eax, 0x2
- | 1 | 0.9 | 0.1 | | | | 0.1 | CP | inc edi
- Total Num Of Uops: 44
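A rough cross-check of the two reports, assuming that the block throughput is bounded below by the busiest execution port. This is a sketch for reading the numbers, not a description of how IACA computes its result. The values are copied by hand from the two "| Cycles |" rows above. For ports 2 and 3 the second number of each pair (the D, data fetch, figure) is used, since it is the larger one in both reports. The names `option_1`, `option_2` and `busiest_ports` are made up for this example.

```python
# Per-port cycles per iteration, copied from the "| Cycles |" rows above.
# Ports 2 and 3 use the data-fetch (D) figure, the larger of each pair.
option_1 = {"0": 8.0, "1": 8.0, "2-D": 8.0, "3-D": 8.0, "4": 4.0, "5": 6.0}
option_2 = {"0": 8.9, "1": 8.9, "2-D": 8.0, "3-D": 8.0, "4": 4.0, "5": 13.1}


def busiest_ports(pressure):
    """Return (peak cycles, list of ports that reach the peak)."""
    peak = max(pressure.values())
    return peak, [port for port, cycles in pressure.items() if cycles == peak]


peak_1, ports_1 = busiest_ports(option_1)
peak_2, ports_2 = busiest_ports(option_2)

print("first option :", peak_1, ports_1)
print("second option:", peak_2, ports_2)
```

Under that assumption the first option is limited by ports 0, 1 and the data-fetch pipes on ports 2 and 3 at 8.0 cycles (IACA reports 8.20), while the second option is limited by port 5 alone at 13.1 cycles (IACA reports 13.15). The extra port 5 pressure comes from the vpermilpd / vinsertf128 / vextractf128 shuffles that replace the vbroadcastsd loads of the first option.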