Untitled

a guest Aug 19th, 2019 55 Never
import torch
import time

# Enables cuDNN autotuning (relevant to convolutions; matmul goes through cuBLAS).
torch.backends.cudnn.benchmark = True


# 1a) all dimensions are multiples of 8
I, J, K = 64, 1024, 1024
A = torch.randn(I, J, device='cuda', dtype=torch.half)
B = torch.randn(J, K, device='cuda', dtype=torch.half)

# warmup
for _ in range(50):
    C = torch.matmul(A, B)
torch.cuda.synchronize()

# CUDA kernels launch asynchronously, so synchronize before reading the clock.
nb_iters = 1000
torch.cuda.synchronize()
t0 = time.perf_counter()
for _ in range(nb_iters):
    C = torch.matmul(A, B)
torch.cuda.synchronize()
t1 = time.perf_counter()
print('1a) {:.3f}us per iteration'.format((t1 - t0) / nb_iters * 1e6))

# 1b) single-row A; J and K are multiples of 8
I, J, K = 1, 1024, 1024
A = torch.randn(I, J, device='cuda', dtype=torch.half)
B = torch.randn(J, K, device='cuda', dtype=torch.half)

# warmup
for _ in range(50):
    C = torch.matmul(A, B)
torch.cuda.synchronize()

nb_iters = 1000
torch.cuda.synchronize()
t0 = time.perf_counter()
for _ in range(nb_iters):
    C = torch.matmul(A, B)
torch.cuda.synchronize()
t1 = time.perf_counter()
print('1b) {:.3f}us per iteration'.format((t1 - t0) / nb_iters * 1e6))



# 2a) no dimension is a multiple of 8
I, J, K = 63, 1023, 1023
A = torch.randn(I, J, device='cuda', dtype=torch.half)
B = torch.randn(J, K, device='cuda', dtype=torch.half)

# warmup
for _ in range(50):
    C = torch.matmul(A, B)
torch.cuda.synchronize()

nb_iters = 1000
torch.cuda.synchronize()
t0 = time.perf_counter()
for _ in range(nb_iters):
    C = torch.matmul(A, B)
torch.cuda.synchronize()
t1 = time.perf_counter()
print('2a) {:.3f}us per iteration'.format((t1 - t0) / nb_iters * 1e6))


# 2b) single-row A; J and K are not multiples of 8
I, J, K = 1, 1023, 1023
A = torch.randn(I, J, device='cuda', dtype=torch.half)
B = torch.randn(J, K, device='cuda', dtype=torch.half)

# warmup
for _ in range(50):
    C = torch.matmul(A, B)
torch.cuda.synchronize()

nb_iters = 1000
torch.cuda.synchronize()
t0 = time.perf_counter()
for _ in range(nb_iters):
    C = torch.matmul(A, B)
torch.cuda.synchronize()
t1 = time.perf_counter()
print('2b) {:.3f}us per iteration'.format((t1 - t0) / nb_iters * 1e6))
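
The four cases above repeat the same warm-up and timing loop, so the pattern could be factored into one helper. The sketch below is only an illustration: `time_op` and its parameters (`fn`, `sync`, `warmup`, `iters`) are hypothetical names that do not appear in the paste. It takes the synchronization call as an argument so the helper itself does not depend on a GPU being present.

```python
import time


def time_op(fn, sync=lambda: None, warmup=50, iters=1000):
    """Return the mean wall-clock time of fn() in microseconds.

    `sync` is called before the clock starts and before it stops, so that
    asynchronously queued work (e.g. CUDA kernels) is included in the result.
    """
    # Warm-up runs are excluded from the measurement.
    for _ in range(warmup):
        fn()
    sync()  # drain pending work before starting the clock

    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    sync()  # wait for all timed work to finish before stopping the clock
    t1 = time.perf_counter()

    return (t1 - t0) / iters * 1e6


# Possible usage with the tensors from the paste (requires a CUDA device):
#   us = time_op(lambda: torch.matmul(A, B), sync=torch.cuda.synchronize)
#   print('1a) {:.3f}us per iteration'.format(us))
```

With the default no-op `sync` the helper times ordinary synchronous Python callables; passing `torch.cuda.synchronize` reproduces the measurement the paste performs by hand.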