Nov 24th, 2017
==16832== NVPROF is profiling process 16832, command: ./standalone

+ + + + + + + + + + + + + + + +
+ RUNNING IN DOUBLE PRECISION +
+ + + + + + + + + + + + + + + +


STANDALONE PHYSICS
RUNNING ON CPU
RADIATION TEST

Iteration 1
Initialize test
*****************************************************
* Radiative transfer calculations employ data *
* provided in routine rad_aibi *
*****************************************************
Run test
Finalize test
Iteration 2
Initialize test
*****************************************************
* Radiative transfer calculations employ data *
* provided in routine rad_aibi *
*****************************************************
Run test
Finalize test
Iteration 3
Initialize test
*****************************************************
* Radiative transfer calculations employ data *
* provided in routine rad_aibi *
*****************************************************
Run test
Finalize test
Iteration 4
Initialize test
*****************************************************
* Radiative transfer calculations employ data *
* provided in routine rad_aibi *
*****************************************************
Run test
Finalize test
Domain size, ie,je,ke : 80 60 60
nproma : 4800
data_set type :full
--------------------------------------------------------------------------
Local timers:
NCOMP_PE= 1
--------------------------------------------------------------------------
Id  Tag         Ncalls  min[s]  max[s]  mean[s]
 1  Total Phys       4  2.7110  2.7110   2.7110
 2  Copy block       8  0.0770  0.0770   0.0770
 3  Radiation        4  2.6280  2.6280   2.6280
--------------------------------------------------------------------------
==16832== Generated result file: /scratch/snx1600/siddhart/playground/standalone/run/standalone-nvprof-output.prof
+ nvprof -i standalone-nvprof-output.prof
======== Profiling result:
Time(%)      Time  Calls       Avg       Min       Max  Name
 44.66%  245.55ms    116  2.1168ms  1.9131ms  3.5222ms  FUNC___radiation_rg_MOD_inv_so_SCOP_0_KERNEL_2
  7.87%  43.243ms     12  3.6036ms  603.49us  9.8776ms  FUNC___radiation_rg_MOD_opt_so_SCOP_0_KERNEL_1
  7.42%  40.778ms      4  10.194ms  8.9761ms  12.233ms  FUNC___radiation_rg_MOD_fesft_dp_SCOP_0_KERNEL_1
  6.89%  37.882ms     20  1.8941ms  1.6174ms  2.4078ms  FUNC___radiation_rg_MOD_fesft_dp_SCOP_24_KERNEL_0
  5.23%  28.756ms    216  133.13us  81.290us  1.5316ms  FUNC___radiation_rg_MOD_fesft_dp_SCOP_17_KERNEL_0
  4.09%  22.483ms    132  170.33us  158.87us  180.05us  FUNC___radiation_rg_MOD_fesft_dp_SCOP_7_KERNEL_0
  3.48%  19.118ms    192  99.575us  12.829us  1.1363ms  FUNC___radiation_rg_MOD_inv_th_SCOP_0_KERNEL_0
  3.39%  18.640ms     12  1.5533ms  690.67us  2.1805ms  FUNC___radiation_rg_MOD_fesft_dp_SCOP_3_KERNEL_0
  2.77%  15.230ms      4  3.8074ms  3.5440ms  3.9241ms  FUNC___radiation_rg_org_MOD_radiation_rg_organize_SCOP_4_KERNEL_0
  2.45%  13.495ms      4  3.3738ms  2.9762ms  3.5508ms  FUNC___radiation_rg_MOD_fesft_dp_SCOP_0_KERNEL_3
  1.38%  7.5798ms    116  65.342us  17.691us  1.3666ms  FUNC___radiation_rg_MOD_inv_so_SCOP_0_KERNEL_0
  1.05%  5.7774ms     12  481.45us  298.07us  770.61us  FUNC___radiation_rg_MOD_fesft_dp_SCOP_9_KERNEL_0
  0.80%  4.3813ms     28  156.48us  149.24us  167.76us  FUNC___radiation_rg_MOD_fesft_dp_SCOP_8_KERNEL_0
  0.79%  4.3189ms      4  1.0797ms  1.0215ms  1.1682ms  FUNC___radiation_rg_MOD_fesft_dp_SCOP_26_KERNEL_0
  0.78%  4.2897ms     40  107.24us  100.84us  113.28us  FUNC___radiation_rg_MOD_fesft_dp_SCOP_18_KERNEL_0
  0.74%  4.0645ms     28  145.16us  115.01us  182.54us  FUNC___radiation_rg_MOD_fesft_dp_SCOP_5_KERNEL_0
  0.68%  3.7169ms     28  132.75us  68.654us  524.28us  FUNC___radiation_rg_MOD_fesft_dp_SCOP_6_KERNEL_1
  0.59%  3.2666ms     20  163.33us  83.754us  491.68us  FUNC___radiation_rg_MOD_fesft_dp_SCOP_25_KERNEL_0
  0.50%  2.7589ms     16  172.43us  148.44us  224.93us  FUNC___radiation_rg_MOD_fesft_dp_SCOP_19_KERNEL_0
  0.48%  2.6412ms      4  660.30us  625.60us  727.71us  FUNC___radiation_rg_org_MOD_radiation_rg_organize_SCOP_0_KERNEL_0
  0.46%  2.5118ms     12  209.32us  110.69us  437.42us  FUNC___radiation_rg_MOD_fesft_dp_SCOP_3_KERNEL_1
  0.44%  2.4347ms     44  55.334us  21.402us  449.74us  FUNC___radiation_rg_MOD_fesft_dp_SCOP_16_KERNEL_0
  0.43%  2.3590ms     44  53.612us  30.072us  102.95us  FUNC___radiation_rg_MOD_fesft_dp_SCOP_16_KERNEL_1
  0.42%  2.3215ms     20  116.07us  24.793us  547.06us  FUNC___radiation_rg_MOD_fesft_dp_SCOP_14_KERNEL_0
  0.34%  1.8966ms     44  43.104us  35.958us  60.688us  FUNC___radiation_rg_MOD_fesft_dp_SCOP_15_KERNEL_0
  0.31%  1.6846ms      4  421.16us  377.18us  463.59us  FUNC___radiation_rg_MOD_fesft_dp_SCOP_20_KERNEL_0
  0.27%  1.5017ms      4  375.43us  314.99us  445.07us  FUNC___radiation_rg_org_MOD_radiation_rg_organize_SCOP_4_KERNEL_1
  0.27%  1.4910ms     28  53.250us  28.728us  82.730us  FUNC___radiation_rg_MOD_fesft_dp_SCOP_6_KERNEL_0
  0.27%  1.4708ms      4  367.71us  326.31us  397.69us  FUNC___radiation_rg_MOD_fesft_dp_SCOP_0_KERNEL_2
  0.23%  1.2819ms      4  320.48us  275.13us  348.87us  FUNC___radiation_rg_MOD_fesft_dp_SCOP_0_KERNEL_0
  0.18%  963.39us     12  80.282us  77.868us  86.057us  FUNC___radiation_rg_MOD_fesft_dp_SCOP_4_KERNEL_0
  0.15%  807.21us     20  40.360us  34.007us  57.777us  FUNC___radiation_rg_MOD_fesft_dp_SCOP_13_KERNEL_0
  0.12%  667.09us     12  55.590us  27.129us  174.55us  FUNC___radiation_rg_MOD_opt_so_SCOP_0_KERNEL_0
  0.08%  415.06us    116  3.5780us  2.8470us  55.282us  FUNC___radiation_rg_MOD_inv_so_SCOP_0_KERNEL_1
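
The kernel summary above lists each generated kernel separately, so the cost of a whole routine (for example `inv_so`) is spread over several rows. The following is a minimal sketch, not part of the original run, of how those rows could be summed per routine. The helper names `to_ms`, `routine_of` and `total_by_routine` are hypothetical, and the name pattern `..._MOD_<routine>_SCOP_<n>_KERNEL_<m>` is an assumption read off the kernel names in this log.

```python
import re
from collections import defaultdict

# Unit suffixes as they appear in the nvprof summary, expressed in milliseconds.
_UNITS_MS = {"ns": 1e-6, "us": 1e-3, "ms": 1.0, "s": 1e3}


def to_ms(text):
    """Convert an nvprof time string such as '245.55ms' or '963.39us' to milliseconds."""
    m = re.fullmatch(r"([0-9.]+)(ns|us|ms|s)", text)
    if m is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    return float(m.group(1)) * _UNITS_MS[m.group(2)]


def routine_of(kernel):
    """Return the routine part of a kernel name (assumed naming pattern, see lead-in)."""
    m = re.search(r"_MOD_(.+)_SCOP_\d+_KERNEL_\d+$", kernel)
    return m.group(1) if m else kernel


def total_by_routine(lines):
    """Sum the 'Time' column per routine over summary rows; other lines are skipped."""
    totals = defaultdict(float)
    for line in lines:
        fields = line.split()
        # Data rows have 7 fields: Time(%) Time Calls Avg Min Max Name
        if len(fields) != 7 or not fields[0].endswith("%"):
            continue
        totals[routine_of(fields[6])] += to_ms(fields[1])
    return dict(totals)


# The three inv_so rows copied from the summary above.
sample = [
    "Time(%) Time Calls Avg Min Max Name",
    "44.66% 245.55ms 116 2.1168ms 1.9131ms 3.5222ms FUNC___radiation_rg_MOD_inv_so_SCOP_0_KERNEL_2",
    "1.38% 7.5798ms 116 65.342us 17.691us 1.3666ms FUNC___radiation_rg_MOD_inv_so_SCOP_0_KERNEL_0",
    "0.08% 415.06us 116 3.5780us 2.8470us 55.282us FUNC___radiation_rg_MOD_inv_so_SCOP_0_KERNEL_1",
]

if __name__ == "__main__":
    for routine, ms in sorted(total_by_routine(sample).items()):
        print(f"{routine}: {ms:.3f} ms")
```

Feeding the full table to `total_by_routine` instead of `sample` would give one total per routine; the header line is skipped because its first field does not end in `%`.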

======== Unified Memory profiling result:
Device "Tesla P100-PCIE-16GB (0)"
   Count  Avg Size  Min Size  Max Size  Total Size  Total Time  Name
    6620  71.153KB  4.0000KB  0.9961MB  459.9961MB  46.56854ms  Host To Device
    3966  82.778KB  4.0000KB  0.9961MB  320.6055MB  28.41315ms  Device To Host
    3398         -         -         -           -  95.05756ms  GPU Page fault groups
Total CPU Page faults: 2913
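
The Unified Memory rows above relate count, average size, total size and total time by simple arithmetic. As a rough sketch (the function names `avg_size_kb` and `throughput_mb_per_s` are hypothetical, and treating 1 MB as 1024 KB is an assumption that happens to reproduce the reported average sizes), the figures can be cross-checked like this:

```python
def avg_size_kb(total_mb, count):
    """Average migration size in KB, assuming 1 MB = 1024 KB."""
    return total_mb * 1024.0 / count


def throughput_mb_per_s(total_mb, total_ms):
    """Effective migration rate in MB/s from a total size (MB) and total time (ms)."""
    return total_mb / (total_ms / 1000.0)


# Values copied from the Unified Memory table above: (total MB, count, total ms).
rows = {
    "Host To Device": (459.9961, 6620, 46.56854),
    "Device To Host": (320.6055, 3966, 28.41315),
}

if __name__ == "__main__":
    for name, (total_mb, count, total_ms) in rows.items():
        print(
            f"{name}: avg {avg_size_kb(total_mb, count):.3f} KB, "
            f"{throughput_mb_per_s(total_mb, total_ms):.1f} MB/s"
        )
```

The rate computed this way only covers the time nvprof attributes to the transfers themselves; the time listed under GPU page fault groups is reported separately in the table.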

======== API calls:
Time(%)      Time  Calls       Avg       Min       Max  Name
 58.02%  1.37953s     34  40.574ms  4.5246ms  714.77ms  cuLinkAddData
 23.45%  557.48ms   1376  405.15us  7.0080us  12.239ms  cuCtxSynchronize
 16.15%  383.99ms      1  383.99ms  383.99ms  383.99ms  cuCtxCreate
  1.02%  24.350ms   1376  17.696us  11.487us  736.82us  cuLaunchKernel
  0.88%  21.005ms      2  10.503ms  196.82us  20.808ms  cuMemAllocManaged
  0.26%  6.1225ms     34  180.07us  100.93us  524.05us  cuLinkComplete
  0.17%  4.1600ms     34  122.35us  68.177us  427.87us  cuModuleLoadData
  0.05%  1.0799ms     34  31.761us  17.647us  412.87us  cuLinkCreate
  0.00%  21.485us     34     631ns     551ns     810ns  cuModuleGetFunction
  0.00%  19.444us     34     571ns     380ns  2.8220us  cuLinkDestroy
  0.00%  17.558us      1  17.558us  17.558us  17.558us  cuDeviceGetName
  0.00%  3.0770us      3  1.0250us     271ns  2.3150us  cuDeviceGetCount
  0.00%  1.3200us      4     330ns     192ns     546ns  cuDeviceGetAttribute
  0.00%     994ns      3     331ns     194ns     561ns  cuDeviceGet
  0.00%     436ns      1     436ns     436ns     436ns  cuCtxGetCurrent
  0.00%     236ns      1     236ns     236ns     236ns  cuDeviceComputeCapability
+ /project/c01/install_old/daint/serialbox/gnu/bin/compare Field_rank0.json radiation-standalone_rank0.json