IACA output

a guest
May 20th, 2015
===== IACA output - first option =====

Intel(R) Architecture Code Analyzer Version - 2.1
Analyzed File - /Users/jonny/Library/Developer/Xcode/DerivedData/scatter-drouuljnuqacmubvnwtsdswhoaif/Build/Products/Release/scatter
Binary Format - 32Bit
Architecture - SNB
Analysis Type - Throughput

Throughput Analysis Report
--------------------------
Block Throughput: 8.20 Cycles Throughput Bottleneck: Port0, Port1, Port2_DATA, Port3_DATA

Port Binding In Cycles Per Iteration:
-------------------------------------------------------------------------
| Port | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 |
-------------------------------------------------------------------------
| Cycles | 8.0 0.0 | 8.0 | 6.5 8.0 | 6.5 8.0 | 4.0 | 6.0 |
-------------------------------------------------------------------------

N - port number or number of cycles resource conflict caused delay, DV - Divider pipe (on port 0)
D - Data fetch pipe (on ports 2 and 3), CP - on a critical path
F - Macro Fusion with the previous instruction occurred
* - instruction micro-ops not bound to a port
^ - Micro Fusion happened
# - ESP Tracking sync uop was issued
@ - SSE instruction followed an AVX256 instruction, dozens of cycles penalty is expected
! - instruction not supported, was not accounted in Analysis

| Num Of | Ports pressure in cycles | |
| Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | |
---------------------------------------------------------------------
| 1 | | | 0.5 0.5 | 0.5 0.5 | | | CP | mov ecx, dword ptr [ebp+0x8]
| 1 | | | 0.5 0.5 | 0.5 0.5 | | | CP | mov ecx, dword ptr [ecx]
| 1^ | | | 0.5 0.5 | 0.5 0.5 | | | CP | vbroadcastsd ymm0, qword ptr [edx-0x18]
| 1 | | | 0.5 1.0 | 0.5 1.0 | | | CP | vmovapd ymm1, ymmword ptr [ecx+ebx*1-0x20]
| 1 | 1.0 | | | | | | CP | vmulpd ymm2, ymm1, ymm0
| 1 | | | | | | 1.0 | | vperm2f128 ymm3, ymm1, ymm0, 0x1
| 1^ | | | 0.5 0.5 | 0.5 0.5 | | | CP | vbroadcastsd ymm4, qword ptr [edx-0x8]
| 1 | 1.0 | | | | | | CP | vmulpd ymm5, ymm3, ymm4
| 1 | | 1.0 | | | | | CP | vaddpd ymm2, ymm2, ymm5
| 1 | | 1.0 | | | | | CP | vaddpd ymm7, ymm7, ymm2
| 2^ | | | 0.5 | 0.5 | 2.0 | | | vmovapd ymmword ptr [esp+0xc0], ymm7
| 1^ | | | 0.5 0.5 | 0.5 0.5 | | | CP | vbroadcastsd ymm2, qword ptr [edx-0x10]
| 1 | 1.0 | | | | | | CP | vmulpd ymm1, ymm1, ymm2
| 1^ | | | 0.5 0.5 | 0.5 0.5 | | | CP | vbroadcastsd ymm5, qword ptr [edx]
| 1 | 1.0 | | | | | | CP | vmulpd ymm3, ymm3, ymm5
| 1 | | 1.0 | | | | | CP | vaddpd ymm1, ymm1, ymm3
| 1 | | | 0.5 1.0 | 0.5 1.0 | | | CP | vmovapd ymm3, ymmword ptr [ecx+ebx*1]
| 1 | | 1.0 | | | | | CP | vaddpd ymm6, ymm6, ymm1
| 1 | 1.0 | | | | | | CP | vmulpd ymm0, ymm0, ymm3
| 1 | | | | | | 1.0 | | vperm2f128 ymm1, ymm3, ymm0, 0x1
| 1 | 1.0 | | | | | | CP | vmulpd ymm4, ymm1, ymm4
| 1 | | 1.0 | | | | | CP | vsubpd ymm0, ymm0, ymm4
| 1 | | | 0.5 1.0 | 0.5 1.0 | | | CP | vmovapd ymm4, ymmword ptr [esp+0xe0]
| 1 | | | | | | 1.0 | | vmovapd ymm7, ymm6
| 1 | | | 0.5 1.0 | 0.5 1.0 | | | CP | vmovapd ymm6, ymmword ptr [esp+0x100]
| 1 | | 1.0 | | | | | CP | vaddpd ymm6, ymm6, ymm0
| 2^ | | | 0.5 | 0.5 | 2.0 | | | vmovapd ymmword ptr [esp+0x100], ymm6
| 1 | | | | | | 1.0 | | vmovapd ymm6, ymm7
| 1 | | | 0.5 1.0 | 0.5 1.0 | | | CP | vmovapd ymm7, ymmword ptr [esp+0xc0]
| 1 | 1.0 | | | | | | CP | vmulpd ymm0, ymm3, ymm2
| 1 | 1.0 | | | | | | CP | vmulpd ymm1, ymm1, ymm5
| 1 | | 1.0 | | | | | CP | vsubpd ymm0, ymm0, ymm1
| 1 | | 1.0 | | | | | CP | vaddpd ymm4, ymm4, ymm0
| 1 | | | | | | 1.0 | | add eax, 0x2
| 1 | | | | | | 1.0 | | inc edi
Total Num Of Uops: 37
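
The "Port Binding In Cycles Per Iteration" row of the report above can be read mechanically. The following is a minimal sketch, not part of IACA: the function names `parse_port_cycles` and `busiest` and the `_D` / `_DV` key suffixes are made up for illustration, and the two row strings are copied from the first report.

```python
def parse_port_cycles(header_row, cycles_row):
    """Map each port (and its sub-pipe, if any) to its cycle count."""
    def cells(row):
        # Drop the outer pipes, split on the inner ones, skip the row label.
        return [c.strip() for c in row.strip().strip("|").split("|")][1:]

    cycles = {}
    for name, cell in zip(cells(header_row), cells(cycles_row)):
        parts = [p.strip() for p in name.split("-")]   # "2 - D" -> ["2", "D"]
        values = [float(v) for v in cell.split()]      # "6.5 8.0" -> [6.5, 8.0]
        port = "Port" + parts[0]
        cycles[port] = values[0]
        if len(parts) > 1 and len(values) > 1:
            # Second number belongs to the sub-pipe (DV or D); key name is hypothetical.
            cycles[port + "_" + parts[1]] = values[1]
    return cycles


def busiest(cycles):
    """Return the names of all resources tied for the highest cycle count."""
    peak = max(cycles.values())
    return [name for name, value in cycles.items() if value == peak]


HEADER = "| Port | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 |"
FIRST = "| Cycles | 8.0 0.0 | 8.0 | 6.5 8.0 | 6.5 8.0 | 4.0 | 6.0 |"

print(busiest(parse_port_cycles(HEADER, FIRST)))
```

For the first option this lists Port0, Port1, Port2_D and Port3_D, all at 8.0 cycles, which lines up with the "Throughput Bottleneck" line and sits just under the reported block throughput of 8.20 cycles.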


===== IACA output - second option =====

Throughput Analysis Report
--------------------------
Block Throughput: 13.15 Cycles Throughput Bottleneck: Port5

Port Binding In Cycles Per Iteration:
-------------------------------------------------------------------------
| Port | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 |
-------------------------------------------------------------------------
| Cycles | 8.9 0.0 | 8.9 | 5.5 8.0 | 5.5 8.0 | 4.0 | 13.1 |
-------------------------------------------------------------------------

N - port number or number of cycles resource conflict caused delay, DV - Divider pipe (on port 0)
D - Data fetch pipe (on ports 2 and 3), CP - on a critical path
F - Macro Fusion with the previous instruction occurred
* - instruction micro-ops not bound to a port
^ - Micro Fusion happened
# - ESP Tracking sync uop was issued
@ - SSE instruction followed an AVX256 instruction, dozens of cycles penalty is expected
! - instruction not supported, was not accounted in Analysis

| Num Of | Ports pressure in cycles | |
| Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | |
---------------------------------------------------------------------
| 1 | | | 0.5 0.5 | 0.5 0.5 | | | | mov ecx, dword ptr [ebp+0x8]
| 1 | | | 0.5 0.5 | 0.5 0.5 | | | | mov ecx, dword ptr [ecx]
| 1 | | | 0.5 1.0 | 0.5 1.0 | | | | vmovapd ymm0, ymmword ptr [ebx]
| 1 | | | | | | 1.0 | CP | vpermilpd xmm1, xmm0, 0x0
| 1 | | | | | | 1.0 | CP | vinsertf128 ymm1, ymm1, xmm1, 0x1
| 2^ | | | 0.5 | 0.5 | 2.0 | | | vmovapd ymmword ptr [esp+0x80], ymm1
| 1 | | | 0.5 1.0 | 0.5 1.0 | | | | vmovapd ymm2, ymmword ptr [ecx+edx*1-0x20]
| 1 | 1.0 | | | | | | | vmulpd ymm3, ymm2, ymm1
| 1 | | | | | | 1.0 | CP | vperm2f128 ymm4, ymm2, ymm0, 0x1
| 1 | | | | | | 1.0 | CP | vextractf128 xmm5, ymm0, 0x1
| 1 | | | | | | 1.0 | CP | vpermilpd xmm6, xmm5, 0x0
| 1 | | | | | | 1.0 | CP | vinsertf128 ymm6, ymm6, xmm6, 0x1
| 1 | | | | | | 1.0 | CP | vmovapd ymm1, ymm7
| 1 | 1.0 | | | | | | | vmulpd ymm7, ymm4, ymm6
| 1 | | 1.0 | | | | | | vaddpd ymm3, ymm3, ymm7
| 1 | | | | | | 1.0 | CP | vmovapd ymm7, ymm1
| 1 | | | 0.5 1.0 | 0.5 1.0 | | | | vmovapd ymm1, ymmword ptr [esp+0xe0]
| 1 | | 1.0 | | | | | | vaddpd ymm1, ymm1, ymm3
| 2^ | | | 0.5 | 0.5 | 2.0 | | | vmovapd ymmword ptr [esp+0xe0], ymm1
| 1 | | | | | | 1.0 | CP | vpermilpd xmm0, xmm0, 0x3
| 1 | | | | | | 1.0 | CP | vinsertf128 ymm0, ymm0, xmm0, 0x1
| 1 | 1.0 | | | | | | | vmulpd ymm2, ymm2, ymm0
| 1 | | | | | | 1.0 | CP | vpermilpd xmm3, xmm5, 0x3
| 1 | | | | | | 1.0 | CP | vinsertf128 ymm3, ymm3, xmm3, 0x1
| 1 | 1.0 | | | | | | | vmulpd ymm4, ymm4, ymm3
| 1 | | 1.0 | | | | | | vaddpd ymm2, ymm2, ymm4
| 1 | | | 0.5 1.0 | 0.5 1.0 | | | | vmovapd ymm4, ymmword ptr [ecx+edx*1]
| 1 | | 1.0 | | | | | | vaddpd ymm7, ymm7, ymm2
| 1 | | | 0.5 1.0 | 0.5 1.0 | | | | vmovapd ymm1, ymmword ptr [esp+0x80]
| 1 | 1.0 | | | | | | | vmulpd ymm1, ymm1, ymm4
| 1 | | | | | | 1.0 | CP | vperm2f128 ymm2, ymm4, ymm0, 0x1
| 1 | 1.0 | | | | | | | vmulpd ymm5, ymm6, ymm2
| 1 | | 1.0 | | | | | | vsubpd ymm1, ymm1, ymm5
| 1 | | | 0.5 1.0 | 0.5 1.0 | | | | vmovapd ymm5, ymmword ptr [esp+0xa0]
| 1 | | 1.0 | | | | | | vaddpd ymm5, ymm5, ymm1
| 1 | 1.0 | | | | | | | vmulpd ymm0, ymm0, ymm4
| 1 | | | 0.5 1.0 | 0.5 1.0 | | | | vmovapd ymm4, ymmword ptr [esp+0xc0]
| 1 | 1.0 | | | | | | | vmulpd ymm1, ymm3, ymm2
| 1 | | 1.0 | | | | | | vsubpd ymm0, ymm0, ymm1
| 1 | | 1.0 | | | | | | vaddpd ymm4, ymm4, ymm0
| 1 | 0.1 | 0.9 | | | | 0.1 | CP | add eax, 0x2
| 1 | 0.9 | 0.1 | | | | 0.1 | CP | inc edi
Total Num Of Uops: 44
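
To see which instructions account for the pressure on a single port, the per-instruction rows can be summed by mnemonic. This is a rough sketch under stated assumptions: `port_pressure_by_mnemonic` is a hypothetical helper, not an IACA feature, and it assumes each row keeps the pipe-delimited layout shown in this paste (uop count, six port columns, marker, instruction).

```python
from collections import defaultdict


def port_pressure_by_mnemonic(rows, port_index):
    """Sum the pressure on one port column, grouped by instruction mnemonic.

    rows: instruction lines from the per-instruction table.
    port_index: 0..5, position of the port column after the uop count.
    """
    totals = defaultdict(float)
    for row in rows:
        cells = [c.strip() for c in row.strip().split("|")]
        # cells[0] is the empty text before the first "|", cells[1] the uop
        # count, cells[2:8] the six port columns, cells[8] the marker column,
        # cells[9] the instruction text.
        if len(cells) < 10 or not cells[1][:1].isdigit():
            continue  # header, separator, or summary line
        cell = cells[2 + port_index]
        if not cell:
            continue  # instruction puts no pressure on this port
        mnemonic = cells[9].split()[0]
        # For ports 2 and 3 the first number is the port, the second the data pipe.
        totals[mnemonic] += float(cell.split()[0])
    return dict(totals)


# A few rows copied from the second option, for illustration only.
SAMPLE = [
    "| 1 | | | | | | 1.0 | CP | vpermilpd xmm1, xmm0, 0x0",
    "| 1 | | | | | | 1.0 | CP | vinsertf128 ymm1, ymm1, xmm1, 0x1",
    "| 1 | 1.0 | | | | | | | vmulpd ymm3, ymm2, ymm1",
    "| 1 | | | | | | 1.0 | CP | vpermilpd xmm6, xmm5, 0x0",
]

print(port_pressure_by_mnemonic(SAMPLE, 5))
```

Run over the full second listing with `port_index=5`, the totals should be dominated by `vpermilpd`, `vinsertf128`, `vperm2f128`, `vextractf128` and the register-to-register `vmovapd` copies. In the first option the same scalars arrive through `vbroadcastsd` loads, which the report binds to ports 2 and 3 instead.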