- ==16832== NVPROF is profiling process 16832, command: ./standalone
- + + + + + + + + + + + + + + + +
- + RUNNING IN DOUBLE PRECISION +
- + + + + + + + + + + + + + + + +
- STANDALONE PHYSICS
- RUNNING ON CPU
- RADIATION TEST
- Iteration 1
- Initialize test
- *****************************************************
- * Radiative transfer calculations employ data *
- * provided in routine rad_aibi *
- *****************************************************
- Run test
- Finalize test
- Iteration 2
- Initialize test
- *****************************************************
- * Radiative transfer calculations employ data *
- * provided in routine rad_aibi *
- *****************************************************
- Run test
- Finalize test
- Iteration 3
- Initialize test
- *****************************************************
- * Radiative transfer calculations employ data *
- * provided in routine rad_aibi *
- *****************************************************
- Run test
- Finalize test
- Iteration 4
- Initialize test
- *****************************************************
- * Radiative transfer calculations employ data *
- * provided in routine rad_aibi *
- *****************************************************
- Run test
- Finalize test
- Domain size, ie,je,ke : 80 60 60
- nproma : 4800
- data_set type :full
- --------------------------------------------------------------------------
- Local timers:
- NCOMP_PE= 1
- --------------------------------------------------------------------------
- Id Tag Ncalls min[s] max[s] mean[s]
- 1 Total Phys 4 2.7110 2.7110 2.7110
- 2 Copy block 8 0.0770 0.0770 0.0770
- 3 Radiation 4 2.6280 2.6280 2.6280
- --------------------------------------------------------------------------
- ==16832== Generated result file: /scratch/snx1600/siddhart/playground/standalone/run/standalone-nvprof-output.prof
- + nvprof -i standalone-nvprof-output.prof
- ======== Profiling result:
- Time(%) Time Calls Avg Min Max Name
- 44.66% 245.55ms 116 2.1168ms 1.9131ms 3.5222ms FUNC___radiation_rg_MOD_inv_so_SCOP_0_KERNEL_2
- 7.87% 43.243ms 12 3.6036ms 603.49us 9.8776ms FUNC___radiation_rg_MOD_opt_so_SCOP_0_KERNEL_1
- 7.42% 40.778ms 4 10.194ms 8.9761ms 12.233ms FUNC___radiation_rg_MOD_fesft_dp_SCOP_0_KERNEL_1
- 6.89% 37.882ms 20 1.8941ms 1.6174ms 2.4078ms FUNC___radiation_rg_MOD_fesft_dp_SCOP_24_KERNEL_0
- 5.23% 28.756ms 216 133.13us 81.290us 1.5316ms FUNC___radiation_rg_MOD_fesft_dp_SCOP_17_KERNEL_0
- 4.09% 22.483ms 132 170.33us 158.87us 180.05us FUNC___radiation_rg_MOD_fesft_dp_SCOP_7_KERNEL_0
- 3.48% 19.118ms 192 99.575us 12.829us 1.1363ms FUNC___radiation_rg_MOD_inv_th_SCOP_0_KERNEL_0
- 3.39% 18.640ms 12 1.5533ms 690.67us 2.1805ms FUNC___radiation_rg_MOD_fesft_dp_SCOP_3_KERNEL_0
- 2.77% 15.230ms 4 3.8074ms 3.5440ms 3.9241ms FUNC___radiation_rg_org_MOD_radiation_rg_organize_SCOP_4_KERNEL_0
- 2.45% 13.495ms 4 3.3738ms 2.9762ms 3.5508ms FUNC___radiation_rg_MOD_fesft_dp_SCOP_0_KERNEL_3
- 1.38% 7.5798ms 116 65.342us 17.691us 1.3666ms FUNC___radiation_rg_MOD_inv_so_SCOP_0_KERNEL_0
- 1.05% 5.7774ms 12 481.45us 298.07us 770.61us FUNC___radiation_rg_MOD_fesft_dp_SCOP_9_KERNEL_0
- 0.80% 4.3813ms 28 156.48us 149.24us 167.76us FUNC___radiation_rg_MOD_fesft_dp_SCOP_8_KERNEL_0
- 0.79% 4.3189ms 4 1.0797ms 1.0215ms 1.1682ms FUNC___radiation_rg_MOD_fesft_dp_SCOP_26_KERNEL_0
- 0.78% 4.2897ms 40 107.24us 100.84us 113.28us FUNC___radiation_rg_MOD_fesft_dp_SCOP_18_KERNEL_0
- 0.74% 4.0645ms 28 145.16us 115.01us 182.54us FUNC___radiation_rg_MOD_fesft_dp_SCOP_5_KERNEL_0
- 0.68% 3.7169ms 28 132.75us 68.654us 524.28us FUNC___radiation_rg_MOD_fesft_dp_SCOP_6_KERNEL_1
- 0.59% 3.2666ms 20 163.33us 83.754us 491.68us FUNC___radiation_rg_MOD_fesft_dp_SCOP_25_KERNEL_0
- 0.50% 2.7589ms 16 172.43us 148.44us 224.93us FUNC___radiation_rg_MOD_fesft_dp_SCOP_19_KERNEL_0
- 0.48% 2.6412ms 4 660.30us 625.60us 727.71us FUNC___radiation_rg_org_MOD_radiation_rg_organize_SCOP_0_KERNEL_0
- 0.46% 2.5118ms 12 209.32us 110.69us 437.42us FUNC___radiation_rg_MOD_fesft_dp_SCOP_3_KERNEL_1
- 0.44% 2.4347ms 44 55.334us 21.402us 449.74us FUNC___radiation_rg_MOD_fesft_dp_SCOP_16_KERNEL_0
- 0.43% 2.3590ms 44 53.612us 30.072us 102.95us FUNC___radiation_rg_MOD_fesft_dp_SCOP_16_KERNEL_1
- 0.42% 2.3215ms 20 116.07us 24.793us 547.06us FUNC___radiation_rg_MOD_fesft_dp_SCOP_14_KERNEL_0
- 0.34% 1.8966ms 44 43.104us 35.958us 60.688us FUNC___radiation_rg_MOD_fesft_dp_SCOP_15_KERNEL_0
- 0.31% 1.6846ms 4 421.16us 377.18us 463.59us FUNC___radiation_rg_MOD_fesft_dp_SCOP_20_KERNEL_0
- 0.27% 1.5017ms 4 375.43us 314.99us 445.07us FUNC___radiation_rg_org_MOD_radiation_rg_organize_SCOP_4_KERNEL_1
- 0.27% 1.4910ms 28 53.250us 28.728us 82.730us FUNC___radiation_rg_MOD_fesft_dp_SCOP_6_KERNEL_0
- 0.27% 1.4708ms 4 367.71us 326.31us 397.69us FUNC___radiation_rg_MOD_fesft_dp_SCOP_0_KERNEL_2
- 0.23% 1.2819ms 4 320.48us 275.13us 348.87us FUNC___radiation_rg_MOD_fesft_dp_SCOP_0_KERNEL_0
- 0.18% 963.39us 12 80.282us 77.868us 86.057us FUNC___radiation_rg_MOD_fesft_dp_SCOP_4_KERNEL_0
- 0.15% 807.21us 20 40.360us 34.007us 57.777us FUNC___radiation_rg_MOD_fesft_dp_SCOP_13_KERNEL_0
- 0.12% 667.09us 12 55.590us 27.129us 174.55us FUNC___radiation_rg_MOD_opt_so_SCOP_0_KERNEL_0
- 0.08% 415.06us 116 3.5780us 2.8470us 55.282us FUNC___radiation_rg_MOD_inv_so_SCOP_0_KERNEL_1
- ======== Unified Memory profiling result:
- Device "Tesla P100-PCIE-16GB (0)"
- Count Avg Size Min Size Max Size Total Size Total Time Name
- 6620 71.153KB 4.0000KB 0.9961MB 459.9961MB 46.56854ms Host To Device
- 3966 82.778KB 4.0000KB 0.9961MB 320.6055MB 28.41315ms Device To Host
- 3398 - - - - 95.05756ms GPU Page fault groups
- Total CPU Page faults: 2913
- ======== API calls:
- Time(%) Time Calls Avg Min Max Name
- 58.02% 1.37953s 34 40.574ms 4.5246ms 714.77ms cuLinkAddData
- 23.45% 557.48ms 1376 405.15us 7.0080us 12.239ms cuCtxSynchronize
- 16.15% 383.99ms 1 383.99ms 383.99ms 383.99ms cuCtxCreate
- 1.02% 24.350ms 1376 17.696us 11.487us 736.82us cuLaunchKernel
- 0.88% 21.005ms 2 10.503ms 196.82us 20.808ms cuMemAllocManaged
- 0.26% 6.1225ms 34 180.07us 100.93us 524.05us cuLinkComplete
- 0.17% 4.1600ms 34 122.35us 68.177us 427.87us cuModuleLoadData
- 0.05% 1.0799ms 34 31.761us 17.647us 412.87us cuLinkCreate
- 0.00% 21.485us 34 631ns 551ns 810ns cuModuleGetFunction
- 0.00% 19.444us 34 571ns 380ns 2.8220us cuLinkDestroy
- 0.00% 17.558us 1 17.558us 17.558us 17.558us cuDeviceGetName
- 0.00% 3.0770us 3 1.0250us 271ns 2.3150us cuDeviceGetCount
- 0.00% 1.3200us 4 330ns 192ns 546ns cuDeviceGetAttribute
- 0.00% 994ns 3 331ns 194ns 561ns cuDeviceGet
- 0.00% 436ns 1 436ns 436ns 436ns cuCtxGetCurrent
- 0.00% 236ns 1 236ns 236ns 236ns cuDeviceComputeCapability
- + /project/c01/install_old/daint/serialbox/gnu/bin/compare Field_rank0.json radiation-standalone_rank0.json