- ==========================================
- ## bin/dgemm_nn_sm35-60_clang_9.0
- ==========================================
- ------------------------------------------------------------
- 7168x4096x4096, GEMM_nn, 29360128 C elements, 10 timing iterations
- Avg runtime: 53.706 ms, total flops: 240576888832, GFLOP/s: 4479.52
- Final wave_efficiency 1.0000, tiling_efficiency 8.0000
- Invoking kernel<<<(448, 256, 1), (1.y,64.x), 0, 0>>>(), 13 SM occupancy, 4096 split_k
- Avg runtime: 162.993 ms, total flops: 240576888832, GFLOP/s: 1476.00
- Final wave_efficiency 1.0000, tiling_efficiency 16.0000
- Invoking kernel<<<(224, 128, 1), (1.y,64.x), 0, 0>>>(), 7 SM occupancy, 4096 split_k
- Avg runtime: 103.393 ms, total flops: 240576888832, GFLOP/s: 2326.83
- Final wave_efficiency 1.0000, tiling_efficiency 32.0000
- Invoking kernel<<<(112, 64, 1), (1.y,256.x), 0, 0>>>(), 2 SM occupancy, 4096 split_k
- Avg runtime: 68.634 ms, total flops: 240576888832, GFLOP/s: 3505.24
- Final wave_efficiency 1.0000, tiling_efficiency 25.6000
- Invoking kernel<<<(56, 128, 1), (1.y,128.x), 0, 0>>>(), 3 SM occupancy, 4096 split_k
- Avg runtime: 83.297 ms, total flops: 240576888832, GFLOP/s: 2888.18
- Final wave_efficiency 1.0000, tiling_efficiency 25.6000
- Invoking kernel<<<(224, 32, 1), (1.y,128.x), 0, 0>>>(), 3 SM occupancy, 4096 split_k
- Avg runtime: 59.055 ms, total flops: 240576888832, GFLOP/s: 4073.79
- Final wave_efficiency 1.0000, tiling_efficiency 42.6667
- Invoking kernel<<<(112, 32, 1), (1.y,128.x), 0, 0>>>(), 2 SM occupancy, 4096 split_k
- Avg runtime: 56.380 ms, total flops: 240576888832, GFLOP/s: 4267.04
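The header numbers shared by every section can be reproduced from the problem size. This sketch assumes 7168x4096x4096 means M x N x K. Under that reading, `C elements` is M*N = 29,360,128, and the logged `total flops` of 240,576,888,832 equals exactly 2*M*N*K + 2*M*N. Interpreting the extra 2*M*N term as the alpha/beta scaling of C is an assumption; the log does not say so. GFLOP/s then works out to total flops divided by the average runtime.

```python
# Reproduce the header arithmetic from the log.
# Assumption: "7168x4096x4096" is M x N x K.
M, N, K = 7168, 4096, 4096

def gemm_flops(m: int, n: int, k: int) -> int:
    """2*m*n*k for the multiply-adds, plus 2*m*n (assumed here to cover
    the alpha/beta scaling of C)."""
    return 2 * m * n * k + 2 * m * n

def gflops_per_s(total_flops: int, avg_runtime_ms: float) -> float:
    """Throughput in GFLOP/s from an average runtime in milliseconds."""
    return total_flops / (avg_runtime_ms * 1e-3) / 1e9

c_elements = M * N
total = gemm_flops(M, N, K)
print(c_elements)                              # 29360128
print(total)                                   # 240576888832
print(round(gflops_per_s(total, 53.706), 2))   # 4479.52 (matches the log)
```

Because the logged runtimes are rounded to three decimals, recomputed GFLOP/s values can differ from the logged ones in the last digits.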
- ==========================================
- ## bin/dgemm_nn_sm35-60_nvcc_9.0
- ==========================================
- ------------------------------------------------------------
- 7168x4096x4096, GEMM_nn, 29360128 C elements, 10 timing iterations
- Avg runtime: 53.666 ms, total flops: 240576888832, GFLOP/s: 4482.85
- Final wave_efficiency 1.0000, tiling_efficiency 8.0000
- Invoking kernel<<<(448, 256, 1), (1.y,64.x), 0, 0>>>(), 13 SM occupancy, 4096 split_k
- Avg runtime: 107.486 ms, total flops: 240576888832, GFLOP/s: 2238.21
- Final wave_efficiency 1.0000, tiling_efficiency 16.0000
- Invoking kernel<<<(224, 128, 1), (1.y,64.x), 0, 0>>>(), 7 SM occupancy, 4096 split_k
- Avg runtime: 61.130 ms, total flops: 240576888832, GFLOP/s: 3935.50
- Final wave_efficiency 1.0000, tiling_efficiency 32.0000
- Invoking kernel<<<(112, 64, 1), (1.y,256.x), 0, 0>>>(), 3 SM occupancy, 4096 split_k
- Avg runtime: 59.699 ms, total flops: 240576888832, GFLOP/s: 4029.82
- Final wave_efficiency 1.0000, tiling_efficiency 25.6000
- Invoking kernel<<<(56, 128, 1), (1.y,128.x), 0, 0>>>(), 3 SM occupancy, 4096 split_k
- Avg runtime: 57.979 ms, total flops: 240576888832, GFLOP/s: 4149.35
- Final wave_efficiency 1.0000, tiling_efficiency 25.6000
- Invoking kernel<<<(224, 32, 1), (1.y,128.x), 0, 0>>>(), 3 SM occupancy, 4096 split_k
- Avg runtime: 57.273 ms, total flops: 240576888832, GFLOP/s: 4200.53
- Final wave_efficiency 1.0000, tiling_efficiency 42.6667
- Invoking kernel<<<(112, 32, 1), (1.y,128.x), 0, 0>>>(), 2 SM occupancy, 4096 split_k
- Avg runtime: 55.135 ms, total flops: 240576888832, GFLOP/s: 4363.44
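The `tiling_efficiency` values can also be related to the launch grids. Each one is printed just before an `Invoking kernel` line. Dividing M = 7168 by grid.x and N = 4096 by grid.y gives a per-block tile. For example, (448, 256) gives 16x16 and (112, 32) gives 64x128. This mapping is inferred from the numbers and is not stated in the log. For every configuration in these logs, the printed value equals tile_m*tile_n / (tile_m + tile_n). That is the multiply-adds per k-step divided by the A and B elements loaded per k-step. Whether the benchmark computes the value exactly this way is an assumption. The helper names below are hypothetical.

```python
# Hypothetical helpers relating logged grid sizes to tile shapes.
M, N = 7168, 4096  # assumed: grid.x tiles M, grid.y tiles N

def tile_shape(grid_x: int, grid_y: int) -> tuple[int, int]:
    """Infer the per-block tile (tile_m, tile_n) from the launch grid."""
    assert M % grid_x == 0 and N % grid_y == 0
    return M // grid_x, N // grid_y

def tiling_efficiency(tile_m: int, tile_n: int) -> float:
    """FMAs per k-step (tile_m*tile_n) over elements loaded per k-step
    (tile_m from A plus tile_n from B)."""
    return tile_m * tile_n / (tile_m + tile_n)

# (grid.x, grid.y) and the tiling_efficiency printed before that launch
observed = [((448, 256), 8.0), ((224, 128), 16.0), ((112, 64), 32.0),
            ((56, 128), 25.6), ((224, 32), 25.6), ((112, 32), 42.6667),
            ((448, 128), 10.6667), ((56, 64), 42.6667), ((56, 32), 64.0)]

for grid, logged in observed:
    tm, tn = tile_shape(*grid)
    print(grid, (tm, tn), round(tiling_efficiency(tm, tn), 4), logged)
```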
- ==========================================
- ## bin/igemm_nn_sm35-60_clang_9.0
- ==========================================
- ------------------------------------------------------------
- 7168x4096x4096, GEMM_nn, 29360128 C elements, 10 timing iterations
- CUDA error 30 [gemm.cu, 199]: unknown error
- CUDA error 30 [gemm.cu, 253]: unknown error
- CUDA error 30 [gemm.cu, 253]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- Avg runtime: 0.000 ms, total flops: 240576888832, GFLOP/s: 544784609.36
- Final wave_efficiency 1.0000, tiling_efficiency 10.6667
- Invoking kernel<<<(448, 128, 1), (1.y,32.x), 0, 0>>>(), 28 SM occupancy, 4096 split_k
- Avg runtime: 54.287 ms, total flops: 240576888832, GFLOP/s: 4431.56
- Final wave_efficiency 1.0000, tiling_efficiency 16.0000
- Invoking kernel<<<(224, 128, 1), (1.y,64.x), 0, 0>>>(), 20 SM occupancy, 4096 split_k
- Avg runtime: 48.548 ms, total flops: 240576888832, GFLOP/s: 4955.47
- Final wave_efficiency 1.0000, tiling_efficiency 32.0000
- Invoking kernel<<<(112, 64, 1), (1.y,128.x), 0, 0>>>(), 7 SM occupancy, 4096 split_k
- Avg runtime: 41.715 ms, total flops: 240576888832, GFLOP/s: 5767.18
- Final wave_efficiency 1.0000, tiling_efficiency 42.6667
- Invoking kernel<<<(56, 64, 1), (1.y,256.x), 0, 0>>>(), 2 SM occupancy, 4096 split_k
- Avg runtime: 40.127 ms, total flops: 240576888832, GFLOP/s: 5995.43
- Final wave_efficiency 1.0000, tiling_efficiency 42.6667
- Invoking kernel<<<(112, 32, 1), (1.y,256.x), 0, 0>>>(), 2 SM occupancy, 4096 split_k
- Avg runtime: 41.407 ms, total flops: 240576888832, GFLOP/s: 5810.05
- Final wave_efficiency 1.0000, tiling_efficiency 64.0000
- Invoking kernel<<<(56, 32, 1), (1.y,256.x), 0, 0>>>(), 2 SM occupancy, 4096 split_k
- Avg runtime: 36.196 ms, total flops: 240576888832, GFLOP/s: 6646.46
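In both igemm_nn builds, the first measurement follows a run of `CUDA error 30 ... unknown error` lines. It reports `Avg runtime: 0.000 ms`, so its GFLOP/s figure (hundreds of millions) comes from dividing by a near-zero time and is not a real result. A minimal sketch of a plausibility filter for such lines follows. `peak_gflops` is a hypothetical, user-supplied upper bound, not a specification of the GPU used here.

```python
import re

# Matches lines like "Avg runtime: 53.706 ms, total flops: 240576888832, GFLOP/s: 4479.52"
AVG_RE = re.compile(
    r"Avg runtime: ([\d.]+) ms, total flops: (\d+), GFLOP/s: ([\d.]+)")

def is_plausible(line: str, peak_gflops: float) -> bool:
    """Return False for timings that cannot be real: a runtime printed as
    zero, or a throughput above the caller's (hypothetical) peak bound."""
    m = AVG_RE.search(line)
    if m is None:
        raise ValueError("not an Avg runtime line")
    runtime_ms, gflops = float(m.group(1)), float(m.group(3))
    return runtime_ms > 0.0 and gflops <= peak_gflops

bad = "Avg runtime: 0.000 ms, total flops: 240576888832, GFLOP/s: 544784609.36"
good = "Avg runtime: 36.196 ms, total flops: 240576888832, GFLOP/s: 6646.46"
print(is_plausible(bad, peak_gflops=1e5))   # False
print(is_plausible(good, peak_gflops=1e5))  # True
```

The bound of 1e5 GFLOP/s is only illustrative. Any realistic ceiling for the device would reject the 0.000 ms line just as well.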
- ==========================================
- ## bin/igemm_nn_sm35-60_nvcc_9.0
- ==========================================
- ------------------------------------------------------------
- 7168x4096x4096, GEMM_nn, 29360128 C elements, 10 timing iterations
- CUDA error 30 [gemm.cu, 199]: unknown error
- CUDA error 30 [gemm.cu, 253]: unknown error
- CUDA error 30 [gemm.cu, 253]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- Avg runtime: 0.000 ms, total flops: 240576888832, GFLOP/s: 1503605593.18
- Final wave_efficiency 1.0000, tiling_efficiency 10.6667
- Invoking kernel<<<(448, 128, 1), (1.y,32.x), 0, 0>>>(), 28 SM occupancy, 4096 split_k
- Avg runtime: 50.958 ms, total flops: 240576888832, GFLOP/s: 4721.12
- Final wave_efficiency 1.0000, tiling_efficiency 16.0000
- Invoking kernel<<<(224, 128, 1), (1.y,64.x), 0, 0>>>(), 20 SM occupancy, 4096 split_k
- Avg runtime: 49.173 ms, total flops: 240576888832, GFLOP/s: 4892.42
- Final wave_efficiency 1.0000, tiling_efficiency 32.0000
- Invoking kernel<<<(112, 64, 1), (1.y,128.x), 0, 0>>>(), 8 SM occupancy, 4096 split_k
- Avg runtime: 44.977 ms, total flops: 240576888832, GFLOP/s: 5348.92
- Final wave_efficiency 1.0000, tiling_efficiency 42.6667
- Invoking kernel<<<(56, 64, 1), (1.y,256.x), 0, 0>>>(), 3 SM occupancy, 4096 split_k
- Avg runtime: 42.101 ms, total flops: 240576888832, GFLOP/s: 5714.31
- Final wave_efficiency 1.0000, tiling_efficiency 42.6667
- Invoking kernel<<<(112, 32, 1), (1.y,256.x), 0, 0>>>(), 3 SM occupancy, 4096 split_k
- Avg runtime: 41.762 ms, total flops: 240576888832, GFLOP/s: 5760.67
- Final wave_efficiency 1.0000, tiling_efficiency 64.0000
- Invoking kernel<<<(56, 32, 1), (1.y,256.x), 0, 0>>>(), 2 SM occupancy, 4096 split_k
- Avg runtime: 36.201 ms, total flops: 240576888832, GFLOP/s: 6645.63
- ==========================================
- ## bin/sgemm_nn_sm35-60_clang_9.0
- ==========================================
- ------------------------------------------------------------
- 7168x4096x4096, GEMM_nn, 29360128 C elements, 10 timing iterations
- Avg runtime: 26.767 ms, total flops: 240576888832, GFLOP/s: 8987.94
- Final wave_efficiency 1.0000, tiling_efficiency 8.0000
- Invoking kernel<<<(448, 256, 1), (1.y,64.x), 0, 0>>>(), 14 SM occupancy, 4096 split_k
- Avg runtime: 84.903 ms, total flops: 240576888832, GFLOP/s: 2833.56
- Final wave_efficiency 1.0000, tiling_efficiency 16.0000
- Invoking kernel<<<(224, 128, 1), (1.y,64.x), 0, 0>>>(), 16 SM occupancy, 4096 split_k
- Avg runtime: 43.737 ms, total flops: 240576888832, GFLOP/s: 5500.53
- Final wave_efficiency 1.0000, tiling_efficiency 32.0000
- Invoking kernel<<<(112, 64, 1), (1.y,64.x), 0, 0>>>(), 8 SM occupancy, 4096 split_k
- Avg runtime: 30.143 ms, total flops: 240576888832, GFLOP/s: 7981.22
- Final wave_efficiency 1.0000, tiling_efficiency 25.6000
- Invoking kernel<<<(56, 128, 1), (1.y,128.x), 0, 0>>>(), 5 SM occupancy, 4096 split_k
- Avg runtime: 34.104 ms, total flops: 240576888832, GFLOP/s: 7054.30
- Final wave_efficiency 1.0000, tiling_efficiency 25.6000
- Invoking kernel<<<(224, 32, 1), (1.y,128.x), 0, 0>>>(), 5 SM occupancy, 4096 split_k
- Avg runtime: 30.888 ms, total flops: 240576888832, GFLOP/s: 7788.80
- Final wave_efficiency 1.0000, tiling_efficiency 64.0000
- Invoking kernel<<<(56, 32, 1), (1.y,256.x), 0, 0>>>(), 2 SM occupancy, 4096 split_k
- Avg runtime: 27.776 ms, total flops: 240576888832, GFLOP/s: 8661.31
- ==========================================
- ## bin/sgemm_nn_sm35-60_nvcc_9.0
- ==========================================
- ------------------------------------------------------------
- 7168x4096x4096, GEMM_nn, 29360128 C elements, 10 timing iterations
- Avg runtime: 26.773 ms, total flops: 240576888832, GFLOP/s: 8985.93
- Final wave_efficiency 1.0000, tiling_efficiency 8.0000
- Invoking kernel<<<(448, 256, 1), (1.y,64.x), 0, 0>>>(), 24 SM occupancy, 4096 split_k
- Avg runtime: 98.990 ms, total flops: 240576888832, GFLOP/s: 2430.31
- Final wave_efficiency 1.0000, tiling_efficiency 16.0000
- Invoking kernel<<<(224, 128, 1), (1.y,64.x), 0, 0>>>(), 16 SM occupancy, 4096 split_k
- Avg runtime: 35.311 ms, total flops: 240576888832, GFLOP/s: 6813.08
- Final wave_efficiency 1.0000, tiling_efficiency 32.0000
- Invoking kernel<<<(112, 64, 1), (1.y,64.x), 0, 0>>>(), 8 SM occupancy, 4096 split_k
- Avg runtime: 27.999 ms, total flops: 240576888832, GFLOP/s: 8592.41
- Final wave_efficiency 1.0000, tiling_efficiency 25.6000
- Invoking kernel<<<(56, 128, 1), (1.y,128.x), 0, 0>>>(), 6 SM occupancy, 4096 split_k
- Avg runtime: 29.564 ms, total flops: 240576888832, GFLOP/s: 8137.48
- Final wave_efficiency 1.0000, tiling_efficiency 25.6000
- Invoking kernel<<<(224, 32, 1), (1.y,128.x), 0, 0>>>(), 6 SM occupancy, 4096 split_k
- Avg runtime: 30.145 ms, total flops: 240576888832, GFLOP/s: 7980.67
- Final wave_efficiency 1.0000, tiling_efficiency 64.0000
- Invoking kernel<<<(56, 32, 1), (1.y,256.x), 0, 0>>>(), 2 SM occupancy, 4096 split_k
- Avg runtime: 27.201 ms, total flops: 240576888832, GFLOP/s: 8844.31
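To compare configurations across the six binaries, the lines can be paired mechanically. In each section, the first `Avg runtime` line has no `Invoking kernel` line before it. Every later `Avg runtime` line follows exactly one `Invoking kernel` line. The sketch below relies on that ordering. It skips the unlabelled first measurement rather than guessing what it measures. The function name and tuple layout are hypothetical.

```python
import re

# e.g. "Invoking kernel<<<(56, 32, 1), (1.y,256.x), 0, 0>>>(), 2 SM occupancy, 4096 split_k"
KERNEL_RE = re.compile(
    r"Invoking kernel<<<\((\d+), (\d+), (\d+)\), \(1\.y,(\d+)\.x\).*?>>>\(\), "
    r"(\d+) SM occupancy")
GFLOPS_RE = re.compile(r"GFLOP/s: ([\d.]+)")

def parse_section(lines):
    """Pair each 'Invoking kernel' line with the next GFLOP/s line.
    Returns (grid, threads_x, occupancy, gflops) tuples."""
    results, pending = [], None
    for line in lines:
        k = KERNEL_RE.search(line)
        if k:
            grid = tuple(int(k.group(i)) for i in (1, 2, 3))
            pending = (grid, int(k.group(4)), int(k.group(5)))
            continue
        g = GFLOPS_RE.search(line)
        if g and pending is not None:  # unlabelled first runtime is skipped
            results.append((*pending, float(g.group(1))))
            pending = None
    return results

# Excerpt from the sgemm_nn nvcc section above
sample = """\
Avg runtime: 26.773 ms, total flops: 240576888832, GFLOP/s: 8985.93
Final wave_efficiency 1.0000, tiling_efficiency 32.0000
Invoking kernel<<<(112, 64, 1), (1.y,64.x), 0, 0>>>(), 8 SM occupancy, 4096 split_k
Avg runtime: 27.999 ms, total flops: 240576888832, GFLOP/s: 8592.41
Final wave_efficiency 1.0000, tiling_efficiency 64.0000
Invoking kernel<<<(56, 32, 1), (1.y,256.x), 0, 0>>>(), 2 SM occupancy, 4096 split_k
Avg runtime: 27.201 ms, total flops: 240576888832, GFLOP/s: 8844.31
""".splitlines()

rows = parse_section(sample)
best = max(rows, key=lambda r: r[3])
print(best)  # ((56, 32, 1), 256, 2, 8844.31)
```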