- ==========================================
- ## bin/dgemm_nn_sm35-60_clang_cuda_9.0
- ==========================================
- ------------------------------------------------------------
- 7168x4096x4096, GEMM_nn, 29360128 C elements, 10 timing iterations
- Avg runtime: 53.224 ms, total flops: 240576888832, GFLOP/s: 4520.08
- Final wave_efficiency 1.0000, tiling_efficiency 8.0000
- Invoking kernel<<<(448, 256, 1), (1.y,64.x), 0, 0>>>(), 13 SM occupancy, 4096 split_k
- Avg runtime: 163.195 ms, total flops: 240576888832, GFLOP/s: 1474.17
- Final wave_efficiency 1.0000, tiling_efficiency 16.0000
- Invoking kernel<<<(224, 128, 1), (1.y,64.x), 0, 0>>>(), 7 SM occupancy, 4096 split_k
- Avg runtime: 101.542 ms, total flops: 240576888832, GFLOP/s: 2369.24
- Final wave_efficiency 1.0000, tiling_efficiency 32.0000
- Invoking kernel<<<(112, 64, 1), (1.y,256.x), 0, 0>>>(), 2 SM occupancy, 4096 split_k
- Avg runtime: 67.952 ms, total flops: 240576888832, GFLOP/s: 3540.41
- Final wave_efficiency 1.0000, tiling_efficiency 25.6000
- Invoking kernel<<<(56, 128, 1), (1.y,128.x), 0, 0>>>(), 3 SM occupancy, 4096 split_k
- Avg runtime: 81.973 ms, total flops: 240576888832, GFLOP/s: 2934.82
- Final wave_efficiency 1.0000, tiling_efficiency 25.6000
- Invoking kernel<<<(224, 32, 1), (1.y,128.x), 0, 0>>>(), 3 SM occupancy, 4096 split_k
- Avg runtime: 58.602 ms, total flops: 240576888832, GFLOP/s: 4105.26
- Final wave_efficiency 1.0000, tiling_efficiency 42.6667
- Invoking kernel<<<(112, 32, 1), (1.y,128.x), 0, 0>>>(), 2 SM occupancy, 4096 split_k
- Avg runtime: 55.507 ms, total flops: 240576888832, GFLOP/s: 4334.18
- ==========================================
- ## bin/dgemm_nn_sm35-60_nvcc_9.0
- ==========================================
- ------------------------------------------------------------
- 7168x4096x4096, GEMM_nn, 29360128 C elements, 10 timing iterations
- Avg runtime: 53.219 ms, total flops: 240576888832, GFLOP/s: 4520.53
- Final wave_efficiency 1.0000, tiling_efficiency 8.0000
- Invoking kernel<<<(448, 256, 1), (1.y,64.x), 0, 0>>>(), 13 SM occupancy, 4096 split_k
- Avg runtime: 106.596 ms, total flops: 240576888832, GFLOP/s: 2256.91
- Final wave_efficiency 1.0000, tiling_efficiency 16.0000
- Invoking kernel<<<(224, 128, 1), (1.y,64.x), 0, 0>>>(), 7 SM occupancy, 4096 split_k
- Avg runtime: 61.048 ms, total flops: 240576888832, GFLOP/s: 3940.79
- Final wave_efficiency 1.0000, tiling_efficiency 32.0000
- Invoking kernel<<<(112, 64, 1), (1.y,256.x), 0, 0>>>(), 3 SM occupancy, 4096 split_k
- Avg runtime: 59.593 ms, total flops: 240576888832, GFLOP/s: 4036.99
- Final wave_efficiency 1.0000, tiling_efficiency 25.6000
- Invoking kernel<<<(56, 128, 1), (1.y,128.x), 0, 0>>>(), 3 SM occupancy, 4096 split_k
- Avg runtime: 57.847 ms, total flops: 240576888832, GFLOP/s: 4158.82
- Final wave_efficiency 1.0000, tiling_efficiency 25.6000
- Invoking kernel<<<(224, 32, 1), (1.y,128.x), 0, 0>>>(), 3 SM occupancy, 4096 split_k
- Avg runtime: 57.236 ms, total flops: 240576888832, GFLOP/s: 4203.25
- Final wave_efficiency 1.0000, tiling_efficiency 42.6667
- Invoking kernel<<<(112, 32, 1), (1.y,128.x), 0, 0>>>(), 2 SM occupancy, 4096 split_k
- Avg runtime: 54.754 ms, total flops: 240576888832, GFLOP/s: 4393.77
- ==========================================
- ## bin/igemm_nn_sm35-60_clang_cuda_9.0
- ==========================================
- ------------------------------------------------------------
- 7168x4096x4096, GEMM_nn, 29360128 C elements, 10 timing iterations
- CUDA error 30 [gemm.cu, 199]: unknown error
- CUDA error 30 [gemm.cu, 253]: unknown error
- CUDA error 30 [gemm.cu, 253]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- Avg runtime: 0.000 ms, total flops: 240576888832, GFLOP/s: 1474123120.06
- Final wave_efficiency 1.0000, tiling_efficiency 10.6667
- Invoking kernel<<<(448, 128, 1), (1.y,32.x), 0, 0>>>(), 28 SM occupancy, 4096 split_k
- Avg runtime: 54.396 ms, total flops: 240576888832, GFLOP/s: 4422.68
- Final wave_efficiency 1.0000, tiling_efficiency 16.0000
- Invoking kernel<<<(224, 128, 1), (1.y,64.x), 0, 0>>>(), 20 SM occupancy, 4096 split_k
- Avg runtime: 48.562 ms, total flops: 240576888832, GFLOP/s: 4954.06
- Final wave_efficiency 1.0000, tiling_efficiency 32.0000
- Invoking kernel<<<(112, 64, 1), (1.y,128.x), 0, 0>>>(), 7 SM occupancy, 4096 split_k
- Avg runtime: 41.002 ms, total flops: 240576888832, GFLOP/s: 5867.48
- Final wave_efficiency 1.0000, tiling_efficiency 42.6667
- Invoking kernel<<<(56, 64, 1), (1.y,256.x), 0, 0>>>(), 2 SM occupancy, 4096 split_k
- Avg runtime: 40.773 ms, total flops: 240576888832, GFLOP/s: 5900.44
- Final wave_efficiency 1.0000, tiling_efficiency 42.6667
- Invoking kernel<<<(112, 32, 1), (1.y,256.x), 0, 0>>>(), 2 SM occupancy, 4096 split_k
- Avg runtime: 40.614 ms, total flops: 240576888832, GFLOP/s: 5923.45
- Final wave_efficiency 1.0000, tiling_efficiency 64.0000
- Invoking kernel<<<(56, 32, 1), (1.y,256.x), 0, 0>>>(), 1 SM occupancy, 4096 split_k
- Avg runtime: 42.676 ms, total flops: 240576888832, GFLOP/s: 5637.26
- ==========================================
- ## bin/igemm_nn_sm35-60_nvcc_9.0
- ==========================================
- ------------------------------------------------------------
- 7168x4096x4096, GEMM_nn, 29360128 C elements, 10 timing iterations
- CUDA error 30 [gemm.cu, 199]: unknown error
- CUDA error 30 [gemm.cu, 253]: unknown error
- CUDA error 30 [gemm.cu, 253]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- CUDA error 30 [gemm.cu, 275]: unknown error
- Avg runtime: 0.001 ms, total flops: 240576888832, GFLOP/s: 170476816.05
- Final wave_efficiency 1.0000, tiling_efficiency 10.6667
- Invoking kernel<<<(448, 128, 1), (1.y,32.x), 0, 0>>>(), 28 SM occupancy, 4096 split_k
- Avg runtime: 50.941 ms, total flops: 240576888832, GFLOP/s: 4722.66
- Final wave_efficiency 1.0000, tiling_efficiency 16.0000
- Invoking kernel<<<(224, 128, 1), (1.y,64.x), 0, 0>>>(), 20 SM occupancy, 4096 split_k
- Avg runtime: 49.127 ms, total flops: 240576888832, GFLOP/s: 4897.09
- Final wave_efficiency 1.0000, tiling_efficiency 32.0000
- Invoking kernel<<<(112, 64, 1), (1.y,128.x), 0, 0>>>(), 8 SM occupancy, 4096 split_k
- Avg runtime: 45.015 ms, total flops: 240576888832, GFLOP/s: 5344.39
- Final wave_efficiency 1.0000, tiling_efficiency 42.6667
- Invoking kernel<<<(56, 64, 1), (1.y,256.x), 0, 0>>>(), 3 SM occupancy, 4096 split_k
- Avg runtime: 42.090 ms, total flops: 240576888832, GFLOP/s: 5715.73
- Final wave_efficiency 1.0000, tiling_efficiency 42.6667
- Invoking kernel<<<(112, 32, 1), (1.y,256.x), 0, 0>>>(), 3 SM occupancy, 4096 split_k
- Avg runtime: 41.746 ms, total flops: 240576888832, GFLOP/s: 5762.85
- Final wave_efficiency 1.0000, tiling_efficiency 64.0000
- Invoking kernel<<<(56, 32, 1), (1.y,256.x), 0, 0>>>(), 2 SM occupancy, 4096 split_k
- Avg runtime: 36.177 ms, total flops: 240576888832, GFLOP/s: 6650.04
- ==========================================
- ## bin/sgemm_nn_sm35-60_clang_cuda_9.0
- ==========================================
- ------------------------------------------------------------
- 7168x4096x4096, GEMM_nn, 29360128 C elements, 10 timing iterations
- Avg runtime: 26.732 ms, total flops: 240576888832, GFLOP/s: 8999.53
- Final wave_efficiency 1.0000, tiling_efficiency 8.0000
- Invoking kernel<<<(448, 256, 1), (1.y,64.x), 0, 0>>>(), 14 SM occupancy, 4096 split_k
- Avg runtime: 72.281 ms, total flops: 240576888832, GFLOP/s: 3328.37
- Final wave_efficiency 1.0000, tiling_efficiency 16.0000
- Invoking kernel<<<(224, 128, 1), (1.y,64.x), 0, 0>>>(), 16 SM occupancy, 4096 split_k
- Avg runtime: 42.854 ms, total flops: 240576888832, GFLOP/s: 5613.85
- Final wave_efficiency 1.0000, tiling_efficiency 32.0000
- Invoking kernel<<<(112, 64, 1), (1.y,64.x), 0, 0>>>(), 8 SM occupancy, 4096 split_k
- Avg runtime: 30.098 ms, total flops: 240576888832, GFLOP/s: 7993.04
- Final wave_efficiency 1.0000, tiling_efficiency 25.6000
- Invoking kernel<<<(56, 128, 1), (1.y,128.x), 0, 0>>>(), 5 SM occupancy, 4096 split_k
- Avg runtime: 34.436 ms, total flops: 240576888832, GFLOP/s: 6986.27
- Final wave_efficiency 1.0000, tiling_efficiency 25.6000
- Invoking kernel<<<(224, 32, 1), (1.y,128.x), 0, 0>>>(), 6 SM occupancy, 4096 split_k
- Avg runtime: 31.245 ms, total flops: 240576888832, GFLOP/s: 7699.77
- Final wave_efficiency 1.0000, tiling_efficiency 64.0000
- Invoking kernel<<<(56, 32, 1), (1.y,256.x), 0, 0>>>(), 2 SM occupancy, 4096 split_k
- Avg runtime: 27.501 ms, total flops: 240576888832, GFLOP/s: 8748.00
- ==========================================
- ## bin/sgemm_nn_sm35-60_nvcc_9.0
- ==========================================
- ------------------------------------------------------------
- 7168x4096x4096, GEMM_nn, 29360128 C elements, 10 timing iterations
- Avg runtime: 26.723 ms, total flops: 240576888832, GFLOP/s: 9002.61
- Final wave_efficiency 1.0000, tiling_efficiency 8.0000
- Invoking kernel<<<(448, 256, 1), (1.y,64.x), 0, 0>>>(), 24 SM occupancy, 4096 split_k
- Avg runtime: 98.982 ms, total flops: 240576888832, GFLOP/s: 2430.50
- Final wave_efficiency 1.0000, tiling_efficiency 16.0000
- Invoking kernel<<<(224, 128, 1), (1.y,64.x), 0, 0>>>(), 16 SM occupancy, 4096 split_k
- Avg runtime: 35.309 ms, total flops: 240576888832, GFLOP/s: 6813.55
- Final wave_efficiency 1.0000, tiling_efficiency 32.0000
- Invoking kernel<<<(112, 64, 1), (1.y,64.x), 0, 0>>>(), 8 SM occupancy, 4096 split_k
- Avg runtime: 27.971 ms, total flops: 240576888832, GFLOP/s: 8600.93
- Final wave_efficiency 1.0000, tiling_efficiency 25.6000
- Invoking kernel<<<(56, 128, 1), (1.y,128.x), 0, 0>>>(), 6 SM occupancy, 4096 split_k
- Avg runtime: 29.545 ms, total flops: 240576888832, GFLOP/s: 8142.77
- Final wave_efficiency 1.0000, tiling_efficiency 25.6000
- Invoking kernel<<<(224, 32, 1), (1.y,128.x), 0, 0>>>(), 6 SM occupancy, 4096 split_k
- Avg runtime: 30.165 ms, total flops: 240576888832, GFLOP/s: 7975.34
- Final wave_efficiency 1.0000, tiling_efficiency 64.0000
- Invoking kernel<<<(56, 32, 1), (1.y,256.x), 0, 0>>>(), 2 SM occupancy, 4096 split_k
- Avg runtime: 27.154 ms, total flops: 240576888832, GFLOP/s: 8859.79
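
The reported numbers can be cross-checked by hand. The sketch below is an inference from the figures in the log, not something the benchmark source confirms: the printed "total flops" value matches 2*M*N*K + 2*M*N for the 7168x4096x4096 problem, and GFLOP/s matches that count divided by the average runtime. The function names `gemm_flops` and `gflops` are hypothetical helpers introduced only for this illustration.

```python
# Sanity check for the "total flops" and "GFLOP/s" columns in the log.
# Assumption (inferred from the numbers, not stated by the benchmark):
#   total flops = 2*M*N*K (multiply-adds) + 2*M*N (per-element scale/accumulate of C)

def gemm_flops(m: int, n: int, k: int) -> int:
    """Operation count that reproduces the log's 'total flops' value."""
    return 2 * m * n * k + 2 * m * n


def gflops(total_flops: int, avg_runtime_ms: float) -> float:
    """Convert an operation count and a runtime in milliseconds to GFLOP/s."""
    seconds = avg_runtime_ms * 1e-3
    return total_flops / seconds / 1e9


if __name__ == "__main__":
    flops = gemm_flops(7168, 4096, 4096)
    print("total flops:", flops)
    # Runtimes taken from the first "Avg runtime" line of the dgemm and sgemm clang runs.
    for label, ms in [("dgemm clang", 53.224), ("sgemm clang", 26.732)]:
        print(f"{label}: {gflops(flops, ms):.2f} GFLOP/s")
```

The same conversion shows why the igemm runs that begin with `CUDA error 30` print implausible rates: a runtime recorded as 0.000 ms or 0.001 ms divides the same operation count by a near-zero time, so those first GFLOP/s figures are not meaningful measurements.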