Untitled

a guest
Oct 20th, 2015
  1. args: deepcl_unittests.exe --gtest_filter=-SLOW*
  2. Note: Google Test filter = -SLOW*
  3. [==========] Running 159 tests from 29 test cases.
  4. [----------] Global test environment set-up.
  5. [----------] 7 tests from testClBlas
  6. [ RUN ] testClBlas.basic
  7. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  8. Using OpenCL device: Tahiti
  9. initializing clblas
  10. clblas teardown
  11. [ OK ] testClBlas.basic (430 ms)
  12. [ RUN ] testClBlas.transA
  13. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  14. Using OpenCL device: Tahiti
  15. 1 2 9
  16. 3 7 5
  17. initializing clblas
  18. clblas teardown
  19. [ OK ] testClBlas.transA (90 ms)
  20. [ RUN ] testClBlas.transB
  21. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  22. Using OpenCL device: Tahiti
  23. 3
  24. -1
  25. initializing clblas
  26. clblas teardown
  27. [ OK ] testClBlas.transB (100 ms)
  28. [ RUN ] testClBlas.colMajor
  29. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  30. Using OpenCL device: Tahiti
  31. initializing clblas
  32. clblas teardown
  33. [ OK ] testClBlas.colMajor (80 ms)
  34. [ RUN ] testClBlas.colMajor2
  35. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  36. Using OpenCL device: Tahiti
  37. initializing clblas
  38. clblas teardown
  39. [ OK ] testClBlas.colMajor2 (90 ms)
  40. [ RUN ] testClBlas.colMajorTransA
  41. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  42. Using OpenCL device: Tahiti
  43. initializing clblas
  44. clblas teardown
  45. [ OK ] testClBlas.colMajorTransA (100 ms)
  46. [ RUN ] testClBlas.colMajorTransB
  47. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  48. Using OpenCL device: Tahiti
  49. initializing clblas
  50. clblas teardown
  51. [ OK ] testClBlas.colMajorTransB (90 ms)
  52. [----------] 7 tests from testClBlas (980 ms total)
  53.  
  54. [----------] 1 test from testDeepCL
  55. [ RUN ] testDeepCL.basic
  56. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  57. Using OpenCL device: Tahiti
  58. initializing clblas
  59. expected number of output: 4
  60. clblas teardown
  61. [ OK ] testDeepCL.basic (430 ms)
  62. [----------] 1 test from testDeepCL (430 ms total)
  63.  
  64. [----------] 23 tests from testupdateweights
  65. [ RUN ] testupdateweights.conv1
  66. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  67. Using OpenCL device: Tahiti
  68. initializing clblas
  69. layer 0:InputLayer{ outputPlanes=2 outputSize=5 }
  70. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=5 numFilters=2 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
  71. layer 2:SquareLossLayer{}
  72.  
  73. layer 0:InputLayer{ outputPlanes=2 outputSize=5 }
  74. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=5 numFilters=2 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
  75. layer 2:SquareLossLayer{}
  76.  
  77. batchSize: 4
  78. inputtotalsize=200 outputTotalSize=72
  79. layer ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=5 numFilters=2 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
  80. weightsize=36 biassize=0
  81. statefultimer v0.7
  82. forward try kernel 0
  83. ... not plausibly optimal, skipping
  84. forward try kernel 1
  85. ... seems valid
  86. ForwardAuto: kernel 1 0ms
  87. layer 0:InputLayer{ outputPlanes=2 outputSize=5 }
  88. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=5 numFilters=2 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
  89. layer 2:SquareLossLayer{}
  90. Parameters overview: (skipping 2 layers with 0 params)
  91. layer 1: params=36 100.0%
  92. TOTAL : params=36
  93. calcGradWeights try kernel 0
  94. ... not plausibly optimal, skipping
  95. calcGradWeights try kernel 1
  96. ... seems valid
  97. BackpropWeightsAuto: kernel 1 0ms
  98. forward try kernel 2
  99. ... seems valid
  100. ForwardAuto: kernel 2 0ms
  101. idx=8 predicted losschange=0.000111445 actual=0.000112534
  102. forward try kernel 3
  103. ... seems valid
  104. ForwardAuto: kernel 3 0ms
  105. idx=13 predicted losschange=-0.000886715 actual=-0.000884056
  106. forward try kernel 4
  107. ... seems valid
  108. ForwardAuto: kernel 4 0ms
  109. idx=0 predicted losschange=0.000210491 actual=0.000212669
  110. forward try kernel 5
  111. ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
  112. ... not valid
  113. forward try kernel 6
  114. ... seems valid
  115. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  116. forward try kernel 7
  117. ... seems valid
  118. ForwardAuto: kernel 7 440ms
  119. idx=22 predicted losschange=-0.000164224 actual=0.000212669
  120. forward kernel 0: cannot be used
  121. forward kernel 1 time: 0ms
  122. forward kernel 2 time: 0ms
  123. forward kernel 3 time: 0ms
  124. forward kernel 4 time: 0ms
  125. forward kernel 5: cannot be used
  126. forward kernel 6: cannot be used
  127. forward kernel 7 time: 440ms
  128. forward layer selected kernel 1
  129. idx=22 predicted losschange=-0.000164224 actual=-0.000163078
  130. idx=35 predicted losschange=-0.000391028 actual=-0.000391006
  131. idx=26 predicted losschange=2.23142e-05 actual=2.57492e-05
  132. idx=27 predicted losschange=9.38328e-05 actual=9.44138e-05
  133. idx=27 predicted losschange=9.38328e-05 actual=9.44138e-05
  134. idx=10 predicted losschange=0.00186697 actual=0.00187111
  135. clblas teardown
  136. [ OK ] testupdateweights.conv1 (1380 ms)
  137. [ RUN ] testupdateweights.conv1z
  138. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  139. Using OpenCL device: Tahiti
  140. initializing clblas
  141. layer 0:InputLayer{ outputPlanes=2 outputSize=3 }
  142. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=3 numFilters=2 filterSize=3 outputSize=3 padZeros=1 biased=0 skip=0} }
  143. layer 2:SquareLossLayer{}
  144.  
  145. layer 0:InputLayer{ outputPlanes=2 outputSize=3 }
  146. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=3 numFilters=2 filterSize=3 outputSize=3 padZeros=1 biased=0 skip=0} }
  147. layer 2:SquareLossLayer{}
  148.  
  149. batchSize: 4
  150. inputtotalsize=72 outputTotalSize=72
  151. layer ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=3 numFilters=2 filterSize=3 outputSize=3 padZeros=1 biased=0 skip=0} }
  152. weightsize=36 biassize=0
  153. forward try kernel 0
  154. ... not plausibly optimal, skipping
  155. forward try kernel 1
  156. ... seems valid
  157. ForwardAuto: kernel 1 0ms
  158. layer 0:InputLayer{ outputPlanes=2 outputSize=3 }
  159. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=3 numFilters=2 filterSize=3 outputSize=3 padZeros=1 biased=0 skip=0} }
  160. layer 2:SquareLossLayer{}
  161. Parameters overview: (skipping 2 layers with 0 params)
  162. layer 1: params=36 100.0%
  163. TOTAL : params=36
  164. calcGradWeights try kernel 0
  165. ... not plausibly optimal, skipping
  166. calcGradWeights try kernel 1
  167. ... seems valid
  168. BackpropWeightsAuto: kernel 1 0ms
  169. forward try kernel 2
  170. ... seems valid
  171. ForwardAuto: kernel 2 0ms
  172. idx=8 predicted losschange=0.00039831 actual=0.000397682
  173. forward try kernel 3
  174. ... seems valid
  175. ForwardAuto: kernel 3 0ms
  176. idx=13 predicted losschange=-0.000426502 actual=-0.000426292
  177. forward try kernel 4
  178. ... seems valid
  179. ForwardAuto: kernel 4 0ms
  180. idx=0 predicted losschange=0.000143287 actual=0.000144005
  181. forward try kernel 5
  182. ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, padzeros must be disabled
  183. ... not valid
  184. forward try kernel 6
  185. ... seems valid
  186. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  187. forward try kernel 7
  188. ... seems valid
  189. ForwardAuto: kernel 7 430ms
  190. idx=22 predicted losschange=-1.7916e-06 actual=0.000144005
  191. forward kernel 0: cannot be used
  192. forward kernel 1 time: 0ms
  193. forward kernel 2 time: 0ms
  194. forward kernel 3 time: 0ms
  195. forward kernel 4 time: 0ms
  196. forward kernel 5: cannot be used
  197. forward kernel 6: cannot be used
  198. forward kernel 7 time: 430ms
  199. forward layer selected kernel 1
  200. idx=22 predicted losschange=-1.7916e-06 actual=0
  201. idx=35 predicted losschange=-2.82565e-05 actual=-2.76566e-05
  202. idx=26 predicted losschange=3.62191e-05 actual=3.71933e-05
  203. idx=27 predicted losschange=-0.000319862 actual=-0.000317574
  204. idx=27 predicted losschange=-0.000319862 actual=-0.000317574
  205. idx=10 predicted losschange=-0.000883857 actual=-0.000883102
  206. clblas teardown
  207. [ OK ] testupdateweights.conv1z (1390 ms)
  208. [ RUN ] testupdateweights.numericallytest
  209. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  210. Using OpenCL device: Tahiti
  211. forward try kernel 0
  212. ... not plausibly optimal, skipping
  213. forward try kernel 1
  214. ... seems valid
  215. ForwardAuto: kernel 1 10ms
  216. layer 0:InputLayer{ outputPlanes=1 outputSize=1 }
  217. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=1 numFilters=1 filterSize=1 outputSize=1 padZeros=0 biased=0 skip=0} }
  218. layer 2:ActivationLayer{ TANH }
  219. layer 3:SquareLossLayer{}
  220. Parameters overview: (skipping 3 layers with 0 params)
  221. layer 1: params=1 100.0%
  222. TOTAL : params=1
  223. forward try kernel 2
  224. ... seems valid
  225. ForwardAuto: kernel 2 0ms
  226. calcGradWeights try kernel 0
  227. ... not plausibly optimal, skipping
  228. calcGradWeights try kernel 1
  229. ... seems valid
  230. BackpropWeightsAuto: kernel 1 0ms
  231. forward try kernel 3
  232. ... seems valid
  233. ForwardAuto: kernel 3 0ms
  234. layer 0:InputLayer{ outputPlanes=1 outputSize=1 }
  235. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=1 numFilters=1 filterSize=1 outputSize=1 padZeros=0 biased=0 skip=0} }
  236. layer 2:ActivationLayer{ TANH }
  237. layer 3:SquareLossLayer{}
  238. Parameters overview: (skipping 3 layers with 0 params)
  239. layer 1: params=1 100.0%
  240. TOTAL : params=1
  241. loss 0.0367983 loss2 0.0367913 change: 7.01472e-06
  242. sumweightsdiff -0.000264842
  243. loss change 7.01472e-06
  244. estimatedLossChangeFromW 7.01413e-06
  245. [ OK ] testupdateweights.numericallytest (850 ms)
  246. [ RUN ] testupdateweights.numericallytest_imagesize3
  247. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  248. Using OpenCL device: Tahiti
  249. forward try kernel 0
  250. ... not plausibly optimal, skipping
  251. forward try kernel 1
  252. ... seems valid
  253. ForwardAuto: kernel 1 0ms
  254. layer 0:InputLayer{ outputPlanes=1 outputSize=3 }
  255. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=3 numFilters=1 filterSize=1 outputSize=3 padZeros=0 biased=0 skip=0} }
  256. layer 2:ActivationLayer{ TANH }
  257. layer 3:SquareLossLayer{}
  258. Parameters overview: (skipping 3 layers with 0 params)
  259. layer 1: params=1 100.0%
  260. TOTAL : params=1
  261. forward try kernel 2
  262. ... seems valid
  263. ForwardAuto: kernel 2 0ms
  264. calcGradWeights try kernel 0
  265. ... not plausibly optimal, skipping
  266. calcGradWeights try kernel 1
  267. ... seems valid
  268. BackpropWeightsAuto: kernel 1 0ms
  269. forward try kernel 3
  270. ... seems valid
  271. ForwardAuto: kernel 3 0ms
  272. layer 0:InputLayer{ outputPlanes=1 outputSize=3 }
  273. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=3 numFilters=1 filterSize=1 outputSize=3 padZeros=0 biased=0 skip=0} }
  274. layer 2:ActivationLayer{ TANH }
  275. layer 3:SquareLossLayer{}
  276. Parameters overview: (skipping 3 layers with 0 params)
  277. layer 1: params=1 100.0%
  278. TOTAL : params=1
  279. loss 1.23358 loss2 1.21612 change: 0.0174606
  280. sumweightsdiff -0.0132709
  281. loss change 0.0174606
  282. estimatedLossChangeFromW 0.0176118
  283. [ OK ] testupdateweights.numericallytest_imagesize3 (860 ms)
  284. [ RUN ] testupdateweights.numericallytest_imagesize5
  285. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  286. Using OpenCL device: Tahiti
  287. forward try kernel 0
  288. ... not plausibly optimal, skipping
  289. forward try kernel 1
  290. ... seems valid
  291. ForwardAuto: kernel 1 0ms
  292. layer 0:InputLayer{ outputPlanes=1 outputSize=5 }
  293. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=5 numFilters=1 filterSize=1 outputSize=5 padZeros=0 biased=0 skip=0} }
  294. layer 2:ActivationLayer{ TANH }
  295. layer 3:SquareLossLayer{}
  296. Parameters overview: (skipping 3 layers with 0 params)
  297. layer 1: params=1 100.0%
  298. TOTAL : params=1
  299. forward try kernel 2
  300. ... seems valid
  301. ForwardAuto: kernel 2 0ms
  302. calcGradWeights try kernel 0
  303. ... not plausibly optimal, skipping
  304. calcGradWeights try kernel 1
  305. ... seems valid
  306. BackpropWeightsAuto: kernel 1 0ms
  307. forward try kernel 3
  308. ... seems valid
  309. ForwardAuto: kernel 3 0ms
  310. layer 0:InputLayer{ outputPlanes=1 outputSize=5 }
  311. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=5 numFilters=1 filterSize=1 outputSize=5 padZeros=0 biased=0 skip=0} }
  312. layer 2:ActivationLayer{ TANH }
  313. layer 3:SquareLossLayer{}
  314. Parameters overview: (skipping 3 layers with 0 params)
  315. layer 1: params=1 100.0%
  316. TOTAL : params=1
  317. loss 4.12958 loss2 4.11952 change: 0.0100665
  318. sumweightsdiff -0.0101708
  319. loss change 0.0100665
  320. estimatedLossChangeFromW 0.0103444
  321. [ OK ] testupdateweights.numericallytest_imagesize5 (890 ms)
  322. [ RUN ] testupdateweights.numericallytest_imagesize9
  323. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  324. Using OpenCL device: Tahiti
  325. forward try kernel 0
  326. ... not plausibly optimal, skipping
  327. forward try kernel 1
  328. ... seems valid
  329. ForwardAuto: kernel 1 0ms
  330. layer 0:InputLayer{ outputPlanes=1 outputSize=9 }
  331. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=9 numFilters=1 filterSize=1 outputSize=9 padZeros=0 biased=0 skip=0} }
  332. layer 2:ActivationLayer{ TANH }
  333. layer 3:SquareLossLayer{}
  334. Parameters overview: (skipping 3 layers with 0 params)
  335. layer 1: params=1 100.0%
  336. TOTAL : params=1
  337. forward try kernel 2
  338. ... seems valid
  339. ForwardAuto: kernel 2 0ms
  340. calcGradWeights try kernel 0
  341. ... not plausibly optimal, skipping
  342. calcGradWeights try kernel 1
  343. ... seems valid
  344. BackpropWeightsAuto: kernel 1 0ms
  345. forward try kernel 3
  346. ... seems valid
  347. ForwardAuto: kernel 3 0ms
  348. layer 0:InputLayer{ outputPlanes=1 outputSize=9 }
  349. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=9 numFilters=1 filterSize=1 outputSize=9 padZeros=0 biased=0 skip=0} }
  350. layer 2:ActivationLayer{ TANH }
  351. layer 3:SquareLossLayer{}
  352. Parameters overview: (skipping 3 layers with 0 params)
  353. layer 1: params=1 100.0%
  354. TOTAL : params=1
  355. loss 13.4341 loss2 13.4339 change: 0.000207901
  356. sumweightsdiff 0.00153953
  357. loss change 0.000207901
  358. estimatedLossChangeFromW 0.000237015
  359. [ OK ] testupdateweights.numericallytest_imagesize9 (890 ms)
  360. [ RUN ] testupdateweights.numericallytest_imagesize9_filtersize9
  361. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  362. Using OpenCL device: Tahiti
  363. forward try kernel 0
  364. ... not plausibly optimal, skipping
  365. forward try kernel 1
  366. ... seems valid
  367. ForwardAuto: kernel 1 0ms
  368. layer 0:InputLayer{ outputPlanes=1 outputSize=9 }
  369. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=9 numFilters=1 filterSize=9 outputSize=1 padZeros=0 biased=0 skip=0} }
  370. layer 2:ActivationLayer{ TANH }
  371. layer 3:SquareLossLayer{}
  372. Parameters overview: (skipping 3 layers with 0 params)
  373. layer 1: params=81 100.0%
  374. TOTAL : params=81
  375. forward try kernel 2
  376. ... seems valid
  377. ForwardAuto: kernel 2 0ms
  378. calcGradWeights try kernel 0
  379. ... not plausibly optimal, skipping
  380. calcGradWeights try kernel 1
  381. ... seems valid
  382. BackpropWeightsAuto: kernel 1 0ms
  383. forward try kernel 3
  384. ... seems valid
  385. ForwardAuto: kernel 3 0ms
  386. layer 0:InputLayer{ outputPlanes=1 outputSize=9 }
  387. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=9 numFilters=1 filterSize=9 outputSize=1 padZeros=0 biased=0 skip=0} }
  388. layer 2:ActivationLayer{ TANH }
  389. layer 3:SquareLossLayer{}
  390. Parameters overview: (skipping 3 layers with 0 params)
  391. layer 1: params=81 100.0%
  392. TOTAL : params=81
  393. loss 0.135896 loss2 0.0848782 change: 0.0510182
  394. sumweightsdiff -0.0322406
  395. loss change 0.0510182
  396. estimatedLossChangeFromW 0.0555841
  397. [ OK ] testupdateweights.numericallytest_imagesize9_filtersize9 (930 ms)
  398. [ RUN ] testupdateweights.numericallytest_imagesize9_filtersize3
  399. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  400. Using OpenCL device: Tahiti
  401. forward try kernel 0
  402. ... not plausibly optimal, skipping
  403. forward try kernel 1
  404. ... seems valid
  405. ForwardAuto: kernel 1 0ms
  406. layer 0:InputLayer{ outputPlanes=1 outputSize=9 }
  407. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=9 numFilters=1 filterSize=3 outputSize=7 padZeros=0 biased=0 skip=0} }
  408. layer 2:ActivationLayer{ TANH }
  409. layer 3:SquareLossLayer{}
  410. Parameters overview: (skipping 3 layers with 0 params)
  411. layer 1: params=9 100.0%
  412. TOTAL : params=9
  413. forward try kernel 2
  414. ... seems valid
  415. ForwardAuto: kernel 2 0ms
  416. calcGradWeights try kernel 0
  417. ... not plausibly optimal, skipping
  418. calcGradWeights try kernel 1
  419. ... seems valid
  420. BackpropWeightsAuto: kernel 1 0ms
  421. forward try kernel 3
  422. ... seems valid
  423. ForwardAuto: kernel 3 0ms
  424. layer 0:InputLayer{ outputPlanes=1 outputSize=9 }
  425. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=9 numFilters=1 filterSize=3 outputSize=7 padZeros=0 biased=0 skip=0} }
  426. layer 2:ActivationLayer{ TANH }
  427. layer 3:SquareLossLayer{}
  428. Parameters overview: (skipping 3 layers with 0 params)
  429. layer 1: params=9 100.0%
  430. TOTAL : params=9
  431. loss 7.70633 loss2 7.41581 change: 0.290529
  432. sumweightsdiff -0.0898812
  433. loss change 0.290529
  434. estimatedLossChangeFromW 0.316231
  435. [ OK ] testupdateweights.numericallytest_imagesize9_filtersize3 (940 ms)
  436. [ RUN ] testupdateweights.numericallytest_imagesize3_filtersize3
  437. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  438. Using OpenCL device: Tahiti
  439. forward try kernel 0
  440. ... not plausibly optimal, skipping
  441. forward try kernel 1
  442. ... seems valid
  443. ForwardAuto: kernel 1 0ms
  444. layer 0:InputLayer{ outputPlanes=1 outputSize=3 }
  445. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=3 numFilters=1 filterSize=3 outputSize=1 padZeros=0 biased=0 skip=0} }
  446. layer 2:ActivationLayer{ TANH }
  447. layer 3:SquareLossLayer{}
  448. Parameters overview: (skipping 3 layers with 0 params)
  449. layer 1: params=9 100.0%
  450. TOTAL : params=9
  451. forward try kernel 2
  452. ... seems valid
  453. ForwardAuto: kernel 2 0ms
  454. calcGradWeights try kernel 0
  455. ... not plausibly optimal, skipping
  456. calcGradWeights try kernel 1
  457. ... seems valid
  458. BackpropWeightsAuto: kernel 1 0ms
  459. forward try kernel 3
  460. ... seems valid
  461. ForwardAuto: kernel 3 0ms
  462. layer 0:InputLayer{ outputPlanes=1 outputSize=3 }
  463. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=3 numFilters=1 filterSize=3 outputSize=1 padZeros=0 biased=0 skip=0} }
  464. layer 2:ActivationLayer{ TANH }
  465. layer 3:SquareLossLayer{}
  466. Parameters overview: (skipping 3 layers with 0 params)
  467. layer 1: params=9 100.0%
  468. TOTAL : params=9
  469. loss 0.0719101 loss2 0.0694461 change: 0.00246406
  470. sumweightsdiff -0.0110647
  471. loss change 0.00246406
  472. estimatedLossChangeFromW 0.00248372
  473. [ OK ] testupdateweights.numericallytest_imagesize3_filtersize3 (880 ms)
  474. [ RUN ] testupdateweights.numericallytest_imagesize5_filtersize3
  475. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  476. Using OpenCL device: Tahiti
  477. forward try kernel 0
  478. ... not plausibly optimal, skipping
  479. forward try kernel 1
  480. ... seems valid
  481. ForwardAuto: kernel 1 0ms
  482. layer 0:InputLayer{ outputPlanes=1 outputSize=5 }
  483. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=5 numFilters=1 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
  484. layer 2:ActivationLayer{ TANH }
  485. layer 3:SquareLossLayer{}
  486. Parameters overview: (skipping 3 layers with 0 params)
  487. layer 1: params=9 100.0%
  488. TOTAL : params=9
  489. forward try kernel 2
  490. ... seems valid
  491. ForwardAuto: kernel 2 0ms
  492. calcGradWeights try kernel 0
  493. ... not plausibly optimal, skipping
  494. calcGradWeights try kernel 1
  495. ... seems valid
  496. BackpropWeightsAuto: kernel 1 10ms
  497. forward try kernel 3
  498. ... seems valid
  499. ForwardAuto: kernel 3 0ms
  500. layer 0:InputLayer{ outputPlanes=1 outputSize=5 }
  501. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=5 numFilters=1 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
  502. layer 2:ActivationLayer{ TANH }
  503. layer 3:SquareLossLayer{}
  504. Parameters overview: (skipping 3 layers with 0 params)
  505. layer 1: params=9 100.0%
  506. TOTAL : params=9
  507. loss 1.20022 loss2 1.17241 change: 0.0278131
  508. sumweightsdiff -0.0203888
  509. loss change 0.0278131
  510. estimatedLossChangeFromW 0.0280929
  511. [ OK ] testupdateweights.numericallytest_imagesize5_filtersize3 (910 ms)
  512. [ RUN ] testupdateweights.numericallytest_imagesize5_filtersize3_batchsize3
  513. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  514. Using OpenCL device: Tahiti
  515. forward try kernel 0
  516. ... not plausibly optimal, skipping
  517. forward try kernel 1
  518. ... seems valid
  519. ForwardAuto: kernel 1 0ms
  520. layer 0:InputLayer{ outputPlanes=1 outputSize=5 }
  521. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=5 numFilters=1 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
  522. layer 2:ActivationLayer{ TANH }
  523. layer 3:SquareLossLayer{}
  524. Parameters overview: (skipping 3 layers with 0 params)
  525. layer 1: params=9 100.0%
  526. TOTAL : params=9
  527. forward try kernel 2
  528. ... seems valid
  529. ForwardAuto: kernel 2 0ms
  530. calcGradWeights try kernel 0
  531. ... not plausibly optimal, skipping
  532. calcGradWeights try kernel 1
  533. ... seems valid
  534. BackpropWeightsAuto: kernel 1 0ms
  535. forward try kernel 3
  536. ... seems valid
  537. ForwardAuto: kernel 3 0ms
  538. layer 0:InputLayer{ outputPlanes=1 outputSize=5 }
  539. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=5 numFilters=1 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
  540. layer 2:ActivationLayer{ TANH }
  541. layer 3:SquareLossLayer{}
  542. Parameters overview: (skipping 3 layers with 0 params)
  543. layer 1: params=9 100.0%
  544. TOTAL : params=9
  545. loss 4.97142 loss2 4.78768 change: 0.183744
  546. sumweightsdiff -0.056004
  547. loss change 0.183744
  548. estimatedLossChangeFromW 0.193264
  549. [ OK ] testupdateweights.numericallytest_imagesize5_filtersize3_batchsize3 (900 ms)
  550. [ RUN ] testupdateweights.numericallytest_imagesize5_filtersize3_planes3
  551. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  552. Using OpenCL device: Tahiti
  553. forward try kernel 0
  554. ... not plausibly optimal, skipping
  555. forward try kernel 1
  556. ... seems valid
  557. ForwardAuto: kernel 1 0ms
  558. layer 0:InputLayer{ outputPlanes=3 outputSize=5 }
  559. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=3 inputSize=5 numFilters=1 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
  560. layer 2:ActivationLayer{ TANH }
  561. layer 3:SquareLossLayer{}
  562. Parameters overview: (skipping 3 layers with 0 params)
  563. layer 1: params=27 100.0%
  564. TOTAL : params=27
  565. forward try kernel 2
  566. ... seems valid
  567. ForwardAuto: kernel 2 0ms
  568. calcGradWeights try kernel 0
  569. ... not plausibly optimal, skipping
  570. calcGradWeights try kernel 1
  571. ... seems valid
  572. BackpropWeightsAuto: kernel 1 0ms
  573. forward try kernel 3
  574. ... seems valid
  575. ForwardAuto: kernel 3 0ms
  576. layer 0:InputLayer{ outputPlanes=3 outputSize=5 }
  577. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=3 inputSize=5 numFilters=1 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
  578. layer 2:ActivationLayer{ TANH }
  579. layer 3:SquareLossLayer{}
  580. Parameters overview: (skipping 3 layers with 0 params)
  581. layer 1: params=27 100.0%
  582. TOTAL : params=27
  583. loss 1.08887 loss2 0.9575 change: 0.13137
  584. sumweightsdiff -0.00764531
  585. loss change 0.13137
  586. estimatedLossChangeFromW 0.134379
  587. [ OK ] testupdateweights.numericallytest_imagesize5_filtersize3_planes3 (940 ms)
  588. [ RUN ] testupdateweights.numericallytest_imagesize5_filtersize3_planes3_batchsize3
  589. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  590. Using OpenCL device: Tahiti
  591. forward try kernel 0
  592. ... not plausibly optimal, skipping
  593. forward try kernel 1
  594. ... seems valid
  595. ForwardAuto: kernel 1 10ms
  596. layer 0:InputLayer{ outputPlanes=3 outputSize=5 }
  597. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=3 inputSize=5 numFilters=1 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
  598. layer 2:ActivationLayer{ TANH }
  599. layer 3:SquareLossLayer{}
  600. Parameters overview: (skipping 3 layers with 0 params)
  601. layer 1: params=27 100.0%
  602. TOTAL : params=27
  603. forward try kernel 2
  604. ... seems valid
  605. ForwardAuto: kernel 2 0ms
  606. calcGradWeights try kernel 0
  607. ... not plausibly optimal, skipping
  608. calcGradWeights try kernel 1
  609. ... seems valid
  610. BackpropWeightsAuto: kernel 1 0ms
  611. forward try kernel 3
  612. ... seems valid
  613. ForwardAuto: kernel 3 0ms
  614. layer 0:InputLayer{ outputPlanes=3 outputSize=5 }
  615. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=3 inputSize=5 numFilters=1 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
  616. layer 2:ActivationLayer{ TANH }
  617. layer 3:SquareLossLayer{}
  618. Parameters overview: (skipping 3 layers with 0 params)
  619. layer 1: params=27 100.0%
  620. TOTAL : params=27
  621. loss 4.76631 loss2 4.18154 change: 0.584769
  622. sumweightsdiff 0.029606
  623. loss change 0.584769
  624. estimatedLossChangeFromW 0.620442
  625. [ OK ] testupdateweights.numericallytest_imagesize5_filtersize3_planes3_batchsize3 (940 ms)
  626. [ RUN ] testupdateweights.backprop_weights_2
  627. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  628. Using OpenCL device: Tahiti
  629. options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  630. mismatch for i 0
  631. [ OK ] testupdateweights.backprop_weights_2 (110 ms)
  632. [ RUN ] testupdateweights.backprop_weights_2_upstreamimagesize2
  633. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  634. Using OpenCL device: Tahiti
  635. options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=2 -D gInputSizeSquared=4 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=2 -D gOutputSizeSquared=4 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=2 -DgInputStripeOuterNumRows=2 -DgInputStripeInnerSize=4 -DgInputStripeOuterSize=4 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=2 -DgOutputStripeSize=4
  636. mismatch for i 0
  637. [ OK ] testupdateweights.backprop_weights_2_upstreamimagesize2 (110 ms)
  638. [ RUN ] testupdateweights.backprop_weights_2_upstreamimagesize3_filtersize3
  639. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  640. Using OpenCL device: Tahiti
  641. options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=1 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=21 -DgInputStripeMarginSize=6 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  642. mismatch for i 0
  643. mismatch for i 1
  644. mismatch for i 2
  645. mismatch for i 3
  646. mismatch for i 4
  647. mismatch for i 5
  648. mismatch for i 6
  649. mismatch for i 7
  650. mismatch for i 8
  651. [ OK ] testupdateweights.backprop_weights_2_upstreamimagesize3_filtersize3 (110 ms)
  652. [ RUN ] testupdateweights.backprop_weights_2_upstreamimagesize4_filtersize3
  653. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  654. Using OpenCL device: Tahiti
  655. options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=4 -D gInputSizeSquared=16 -D gNumFilters=1 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=2 -D gOutputSizeSquared=4 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=4 -DgInputStripeOuterNumRows=8 -DgInputStripeInnerSize=16 -DgInputStripeOuterSize=32 -DgInputStripeMarginSize=8 -DgOutputStripeNumRows=2 -DgOutputStripeSize=4
  656. mismatch for i 0
  657. mismatch for i 8
  658. [ OK ] testupdateweights.backprop_weights_2_upstreamimagesize4_filtersize3 (120 ms)
  659. [ RUN ] testupdateweights.backprop_weights_2_upstreamimagesize5_filtersize3
  660. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  661. Using OpenCL device: Tahiti
  662. options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=1 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=9 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=45 -DgInputStripeMarginSize=10 -DgOutputStripeNumRows=3 -DgOutputStripeSize=9
  663. mismatch for i 0
  664. mismatch for i 4
  665. mismatch for i 8
  666. [ OK ] testupdateweights.backprop_weights_2_upstreamimagesize5_filtersize3 (120 ms)
  667. [ RUN ] testupdateweights.backprop_weights_2_upstreamimagesize3_filtersize1
  668. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  669. Using OpenCL device: Tahiti
  670. options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=3 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=9 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=3 -DgOutputStripeSize=9
  671. mismatch for i 0
  672. [ OK ] testupdateweights.backprop_weights_2_upstreamimagesize3_filtersize1 (110 ms)
  673. [ RUN ] testupdateweights.backprop_weights_2_upstreamimagesize16_filtersize1
  674. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  675. Using OpenCL device: Tahiti
  676. options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=16 -D gInputSizeSquared=256 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=16 -D gOutputSizeSquared=256 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=16 -DgInputStripeOuterNumRows=16 -DgInputStripeInnerSize=256 -DgInputStripeOuterSize=256 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=16 -DgOutputStripeSize=256
  677. mismatch for i 0
  678. [ OK ] testupdateweights.backprop_weights_2_upstreamimagesize16_filtersize1 (240 ms)
  679. [ RUN ] testupdateweights.backprop_weights_2_upstreamimagesize17_filtersize1
  680. LayerDimensions{ inputPlanes=1 inputSize=17 numFilters=1 filterSize=1 outputSize=17 padZeros=0 biased=0 skip=0}
  681. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  682. Using OpenCL device: Tahiti
  683. options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=17 -D gInputSizeSquared=289 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=17 -D gOutputSizeSquared=289 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=17 -DgInputStripeOuterNumRows=17 -DgInputStripeInnerSize=289 -DgInputStripeOuterSize=289 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=17 -DgOutputStripeSize=289
  684. mismatch for i 0
  685. [ OK ] testupdateweights.backprop_weights_2_upstreamimagesize17_filtersize1 (390 ms)
  686. [ RUN ] testupdateweights.backprop_weights_2_upstreamimagesize17_filtersize1_moredata
  687. expectedresult: -958.715
  688. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  689. Using OpenCL device: Tahiti
  690. options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=17 -D gInputSizeSquared=289 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=17 -D gOutputSizeSquared=289 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=17 -DgInputStripeOuterNumRows=17 -DgInputStripeInnerSize=289 -DgInputStripeOuterSize=289 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=17 -DgOutputStripeSize=289
  691. mismatch for i 0
  692. [ OK ] testupdateweights.backprop_weights_2_upstreamimagesize17_filtersize1_moredata (380 ms)
  693. [ RUN ] testupdateweights.backprop_instance3_smaller2
  694. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  695. Using OpenCL device: Tahiti
  696. numweights: 36
  697. options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=96 -D gInputSizeSquared=9216 -D gNumFilters=1 -D gFilterSize=6 -D gHalfFilterSize=3 -D gFilterSizeSquared=36 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=91 -D gOutputSizeSquared=8281 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0 -DgNumStripes=16 -DgInputStripeMarginRows=5 -DgInputStripeInnerNumRows=6 -DgInputStripeOuterNumRows=16 -DgInputStripeInnerSize=576 -DgInputStripeOuterSize=1536 -DgInputStripeMarginSize=480 -DgOutputStripeNumRows=6 -DgOutputStripeSize=546
  698. 138 0 0 0 0 0
  699. 132 0 0 0 0 0
  700. 138 0 0 0 0 0
  701. 138 0 0 0 0 0
  702. 138 0 0 0 0 0
  703. 132 0 0 0 0 0
  704.  
  705. 138 0 0 0 0 0
  706. 132 0 0 0 0 0
  707. 138 0 0 0 0 0
  708. 138 0 0 0 0 0
  709. 138 0 0 0 0 0
  710. 132 0 0 0 0 0
  711.  
  712. ......
  713. ......
  714. ......
  715. ......
  716. ......
  717. ......
  718.  
  719. 0=0 0 0 0 0 0 0 0
  720. 1=0 0 0 0 0 0 0 0
  721. 2=0 0 0 0 0 0 0 0
  722. 3=0 0 0 0 0 0 0 0
  723. 4=0 0 0 0 0 0 0 0
  724. 5=0 0 0 0 0 0 0 0
  725. 6=0 0 0 0 0 0 0 0
  726. 7=0 0 0 0 0 0 0 0
  727. 8=0 0 0 0 0 0 0 0
  728. 9=0 0 0 0 0 0 0 0
  729. 10=0 0 0 0 0 0 0 0
  730. 11=0 0 0 0 0 0 0 0
  731.  
  732. 0=0 0 0 0 0 0 0 0
  733. 1=0 0 0 0 0 0 0 0
  734. 2=0 0 0 0 0 0 0 0
  735. 3=0 0 0 0 0 0 0 0
  736. 4=0 0 0 0 0 0 0 0
  737. 5=0 0 0 0 0 0 0 0
  738. 6=0 0 0 0 0 0 0 0
  739. 7=0 0 0 0 0 0 0 0
  740. 8=0 0 0 0 0 0 0 0
  741. 9=0 0 0 0 0 0 0 0
  742. 10=0 0 0 0 0 0 0 0
  743. 11=0 0 0 0 0 0 0 0
  744. 12=0 0 0 0 0 0 0 0
  745. 13=0 0 0 0 0 0 0 0
  746. 14=0 0 0 0 0 0 0 0
  747. 15=0 0 0 0 0 0 0 0
  748. 16=0 0 0 0 0 0 0 0
  749. 17=0 0 0 0 0 0 0 0
  750. 18=0 0 0 0 0 0 0 0
  751. 19=0 0 0 0 0 0 0 0
  752. [ OK ] testupdateweights.backprop_instance3_smaller2 (800 ms)
  753. [----------] 23 tests from testupdateweights (15190 ms total)
  754.  
  755. [----------] 17 tests from testforward
  756. [ RUN ] testforward.imagesize2_nopadzeros
  757. expected number of output: 4
  758. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  759. Using OpenCL device: Tahiti
  760. [ OK ] testforward.imagesize2_nopadzeros (430 ms)
  761. [ RUN ] testforward.imagesize2_padzeros
  762. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  763. Using OpenCL device: Tahiti
  764. checking result[0]=0 expecting: 0
  765. checking result[1]=0 expecting: 0
  766. checking result[2]=0 expecting: 0
  767. checking result[3]=0.2 expecting: 0.2
  768. checking result[4]=-0.13 expecting: -0.13
  769. checking result[5]=-0.15 expecting: -0.15
  770. checking result[6]=0 expecting: 0
  771. checking result[7]=0 expecting: 0
  772. checking result[8]=0 expecting: 0
  773. checking result[9]=0 expecting: 0
  774. checking result[10]=0 expecting: 0
  775. checking result[11]=0 expecting: 0
  776. checking result[12]=-0.55 expecting: -0.55
  777. checking result[13]=0.02 expecting: 0.02
  778. checking result[14]=0.21 expecting: 0.21
  779. checking result[27]=-14.3 expecting: -14.3
  780. checking result[28]=-9.6 expecting: -9.6
  781. checking result[29]=11.9 expecting: 11.9
  782. checking result[35]=0.46 expecting: 0.46
  783. [ OK ] testforward.imagesize2_padzeros (170 ms)
  784. [ RUN ] testforward.imagesize3
  785. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  786. Using OpenCL device: Tahiti
  787. test1 ok
  788. [ OK ] testforward.imagesize3 (180 ms)
  789. [ RUN ] testforward.test2
  790. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  791. Using OpenCL device: Tahiti
  792. [ OK ] testforward.test2 (160 ms)
  793. [ RUN ] testforward.test3
  794. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  795. Using OpenCL device: Tahiti
  796. [ OK ] testforward.test3 (170 ms)
  797. [ RUN ] testforward.compare_0_1_biased_nopad
  798. LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=15 padZeros=0 biased=1 skip=0}
  799. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  800. Using OpenCL device: Tahiti
  801. initializing clblas
  802. batch 0 batchsize 4
  803. dump enabled=0
  804. batch 0 batchsize 4
  805. dump enabled=0
  806. LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=15 padZeros=0 biased=1 skip=0}
  807. clblas teardown
  808. [ OK ] testforward.compare_0_1_biased_nopad (340 ms)
  809. [ RUN ] testforward.compare_0_1_biased_pad
  810. LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=19 padZeros=1 biased=1 skip=0}
  811. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  812. Using OpenCL device: Tahiti
  813. initializing clblas
  814. batch 0 batchsize 4
  815. dump enabled=0
  816. batch 0 batchsize 4
  817. dump enabled=0
  818. LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=19 padZeros=1 biased=1 skip=0}
  819. clblas teardown
  820. [ OK ] testforward.compare_0_1_biased_pad (360 ms)
  821. [ RUN ] testforward.compare_1_n_biased_nopad
  822. instance: 2
  823. LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=15 padZeros=0 biased=1 skip=0}
  824. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  825. Using OpenCL device: Tahiti
  826. initializing clblas
  827. batch 0 batchsize 4
  828. dump enabled=0
  829. batch 0 batchsize 4
  830. dump enabled=0
  831. LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=15 padZeros=0 biased=1 skip=0}
  832. clblas teardown
  833. instance: 3
  834. LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=15 padZeros=0 biased=1 skip=0}
  835. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  836. Using OpenCL device: Tahiti
  837. initializing clblas
  838. batch 0 batchsize 4
  839. dump enabled=0
  840. batch 0 batchsize 4
  841. dump enabled=0
  842. LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=15 padZeros=0 biased=1 skip=0}
  843. clblas teardown
  844. instance: 4
  845. LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=15 padZeros=0 biased=1 skip=0}
  846. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  847. Using OpenCL device: Tahiti
  848. initializing clblas
  849. batch 0 batchsize 4
  850. dump enabled=0
  851. batch 0 batchsize 4
  852. dump enabled=0
  853. LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=15 padZeros=0 biased=1 skip=0}
  854. clblas teardown
  855. instance: 6
  856. LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=15 padZeros=0 biased=1 skip=0}
  857. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  858. Using OpenCL device: Tahiti
  859. initializing clblas
  860. batch 0 batchsize 4
  861. dump enabled=0
  862. batch 0 batchsize 4
  863. clblas teardown
  864. unknown file: error: C++ exception with description "memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size" thrown in the test body.
  865. [ FAILED ] testforward.compare_1_n_biased_nopad (2230 ms)
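Note on the exception above: the figure "-1984MB" is what would appear if a maximum allocation size of 2112 MB were narrowed to a signed 32-bit integer before being converted to megabytes. This is an inference from the arithmetic only; the log does not confirm it, and the 2112 MB value and both function names below are hypothetical. A minimal sketch of that wrap-around, assuming the usual two's-complement behaviour of the narrowing cast:

```cpp
#include <cstdint>

// Hypothetical reproduction of the suspected bug: the byte count is squeezed
// into a signed 32-bit integer before being divided down to megabytes.
// Values above 2 GiB wrap to a negative number on two's-complement targets
// (implementation-defined before C++20, guaranteed modular from C++20 on).
int32_t toMegabytesNarrowed(uint64_t bytes) {
    int32_t narrowed = static_cast<int32_t>(static_cast<uint32_t>(bytes));
    return narrowed / (1024 * 1024);
}

// Same conversion kept in 64 bits throughout, so no wrap occurs.
int64_t toMegabytesWide(uint64_t bytes) {
    return static_cast<int64_t>(bytes / (1024ULL * 1024ULL));
}
```

Calling `toMegabytesNarrowed(2112ULL * 1024 * 1024)` yields -1984, matching the message, while `toMegabytesWide` returns 2112 for the same input. OpenCL reports `CL_DEVICE_MAX_MEM_ALLOC_SIZE` as a 64-bit `cl_ulong`, so keeping the value in a 64-bit type would avoid this kind of wrap.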
  866. [ RUN ] testforward.compare_1_n_biased_pad
  867. instance: 2
  868. LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=19 padZeros=1 biased=1 skip=0}
  869. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  870. Using OpenCL device: Tahiti
  871. initializing clblas
  872. clblas teardown
  873. unknown file: error: C++ exception with description "cannot use forward2, since outputimagesize * outputimagesize > maxworkgroupsize" thrown in the test body.
  874. [ FAILED ] testforward.compare_1_n_biased_pad (310 ms)
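Note on the exception above: with padZeros=1 the layer keeps outputSize=19, so each output plane holds 19 * 19 = 361 values, and the message says this exceeds the device's maximum work-group size. The sketch below restates that check. The limit of 256 is an assumption (the log does not print it), the function names are hypothetical, and the output-size rule is only what the `LayerDimensions` lines in this log show for the filter sizes used here.

```cpp
// Output edge length as it appears in the LayerDimensions lines of this log:
// unchanged when zero-padding is on, inputSize - filterSize + 1 otherwise.
// Only the odd filter sizes seen with padZeros=1 in the log are covered.
int outputSize(int inputSize, int filterSize, bool padZeros) {
    return padZeros ? inputSize : inputSize - filterSize + 1;
}

// Hypothetical restatement of the failing condition: one work-item per
// output value of a plane, all of them inside a single work-group.
bool fitsInOneWorkgroup(int outSize, int maxWorkgroupSize) {
    return outSize * outSize <= maxWorkgroupSize;
}
```

With an assumed limit of 256, `fitsInOneWorkgroup(outputSize(19, 5, true), 256)` is false (361 values), whereas the unpadded case `outputSize(19, 5, false)` gives 15 and 225 values, which fits.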
  875. [ RUN ] testforward.compare_1_5_biased_nopad
  876. LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=19 outputSize=1 padZeros=0 biased=1 skip=0}
  877. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  878. Using OpenCL device: Tahiti
  879. initializing clblas
  880. cl/forward_fc_wgperrow.cl build log:
  881. "C:\Users\pz\AppData\Local\Temp\OCLB4E.tmp.cl", line 75: warning: variable
  882. "loopsPerExample" was declared but never referenced
  883. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  884. ^
  885.  
  886.  
  887. batch 0 batchsize 4
  888. dump enabled=0
  889. batch 0 batchsize 4
  890. dump enabled=0
  891. LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=19 outputSize=1 padZeros=0 biased=1 skip=0}
  892. clblas teardown
  893. [ OK ] testforward.compare_1_5_biased_nopad (820 ms)
  894. [ RUN ] testforward.compare_1_4_fcscenario
  895. LayerDimensions{ inputPlanes=10 inputSize=24 numFilters=10 filterSize=24 outputSize=1 padZeros=0 biased=1 skip=0}
  896. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  897. Using OpenCL device: Tahiti
  898. initializing clblas
  899. batch 0 batchsize 4
  900. dump enabled=0
  901. batch 0 batchsize 4
  902. dump enabled=0
  903. LayerDimensions{ inputPlanes=10 inputSize=24 numFilters=10 filterSize=24 outputSize=1 padZeros=0 biased=1 skip=0}
  904. clblas teardown
  905. [ OK ] testforward.compare_1_4_fcscenario (1200 ms)
  906. [ RUN ] testforward.compare_break1_0_1
  907. LayerDimensions{ inputPlanes=1 inputSize=33 numFilters=1 filterSize=1 outputSize=33 padZeros=0 biased=0 skip=0}
  908. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  909. Using OpenCL device: Tahiti
  910. initializing clblas
  911. batch 0 batchsize 1
  912. dump enabled=0
  913. batch 0 batchsize 1
  914. dump enabled=0
  915. LayerDimensions{ inputPlanes=1 inputSize=33 numFilters=1 filterSize=1 outputSize=33 padZeros=0 biased=0 skip=0}
  916. clblas teardown
  917. [ OK ] testforward.compare_break1_0_1 (170 ms)
  918. [ RUN ] testforward.compare_break1_0_4
  919. LayerDimensions{ inputPlanes=1 inputSize=33 numFilters=1 filterSize=1 outputSize=33 padZeros=0 biased=0 skip=0}
  920. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  921. Using OpenCL device: Tahiti
  922. initializing clblas
  923. batch 0 batchsize 1
  924. dump enabled=0
  925. batch 0 batchsize 1
  926. dump enabled=0
  927. LayerDimensions{ inputPlanes=1 inputSize=33 numFilters=1 filterSize=1 outputSize=33 padZeros=0 biased=0 skip=0}
  928. clblas teardown
  929. [ OK ] testforward.compare_break1_0_4 (160 ms)
  930. [ RUN ] testforward.comparespecific_break2
  931. LayerDimensions{ inputPlanes=64 inputSize=19 numFilters=64 filterSize=19 outputSize=1 padZeros=0 biased=0 skip=0}
  932. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  933. Using OpenCL device: Tahiti
  934. initializing clblas
  935. cl/forward_fc_wgperrow.cl build log:
  936. "C:\Users\pz\AppData\Local\Temp\OCL14AC.tmp.cl", line 75: warning: variable
  937. "loopsPerExample" was declared but never referenced
  938. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  939. ^
  940.  
  941.  
  942. batch 0 batchsize 4
  943. dump enabled=0
  944. batch 0 batchsize 4
  945. dump enabled=0
  946. LayerDimensions{ inputPlanes=64 inputSize=19 numFilters=64 filterSize=19 outputSize=1 padZeros=0 biased=0 skip=0}
  947. clblas teardown
  948. [ OK ] testforward.comparespecific_break2 (850 ms)
  949. [ RUN ] testforward.softmax
  950. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  951. Using OpenCL device: Tahiti
  952. output[0]=0.0320586
  953. output[1]=0.0871443
  954. output[2]=0.643914
  955. output[3]=0.236883
  956. loss 0.44019
  957. loss 3.44019
  958. loss 2.44019
  959. loss 1.44019
  960. [ OK ] testforward.softmax (20 ms)
  961. [ RUN ] testforward.softmax_byplane
  962. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  963. Using OpenCL device: Tahiti
  964. output[0]=0.0320586
  965. output[1]=0.0871443
  966. output[2]=0.643914
  967. output[3]=0.236883
  968. loss 0.44019
  969. loss 3.44019
  970. loss 2.44019
  971. loss 1.44019
  972. [ OK ] testforward.softmax_byplane (10 ms)
  973. [ RUN ] testforward.crash_from_jm
  974. -D gNumInputPlanes=32 -D gInputPlanes=32 -D gInputSize=28 -D gInputSizeSquared=784 -D gNumFilters=20 -D gFilterSize=28 -D gHalfFilterSize=14 -D gFilterSizeSquared=784 -D gNumOutputPlanes=20 -D gOutputPlanes=20 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0
  975. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  976. Using OpenCL device: Tahiti
  977. dump enabled=0
  978. [ OK ] testforward.crash_from_jm (410 ms)
  979. [----------] 17 tests from testforward (7990 ms total)
  980.  
  981. [----------] 2 tests from testfilehelper
  982. [ RUN ] testfilehelper.testfilehelper
  983. [ OK ] testfilehelper.testfilehelper (10 ms)
  984. [ RUN ] testfilehelper.testreadchunk
  985. [ OK ] testfilehelper.testreadchunk (0 ms)
  986. [----------] 2 tests from testfilehelper (10 ms total)
  987.  
  988. [----------] 12 tests from testsimpleconvolvenet
  989. [ RUN ] testsimpleconvolvenet.imagesize1_planes2_filters2_unbiased_tanh
  990. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  991. Using OpenCL device: Tahiti
  992. initializing clblas
  993. forward try kernel 0
  994. ... not plausibly optimal, skipping
  995. forward try kernel 1
  996. ... seems valid
  997. ForwardAuto: kernel 1 0ms
  998. calcGradWeights try kernel 0
  999. ... not plausibly optimal, skipping
  1000. calcGradWeights try kernel 1
  1001. ... seems valid
  1002. BackpropWeightsAuto: kernel 1 0ms
  1003. loss, E, 0.141046
  1004. accuracy: 2/2 100%
  1005. forward try kernel 2
  1006. ... seems valid
  1007. ForwardAuto: kernel 2 0ms
  1008. calcGradWeights try kernel 2
  1009. ... seems valid
  1010. BackpropWeightsAuto: kernel 2 0ms
  1011. forward try kernel 3
  1012. ... seems valid
  1013. ForwardAuto: kernel 3 0ms
  1014. calcGradWeights try kernel 3
  1015. options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  1016. ... seems valid
  1017. BackpropWeightsAuto: kernel 3 0ms
  1018. forward try kernel 4
  1019. ... seems valid
  1020. ForwardAuto: kernel 4 0ms
  1021. calcGradWeights try kernel 4
  1022. ... seems valid
  1023. BackpropWeightsAuto: kernel 4 290ms
  1024. forward try kernel 5
  1025. cl/forward_fc_wgperrow.cl build log:
  1026. "C:\Users\pz\AppData\Local\Temp\OCL1C51.tmp.cl", line 75: warning: variable
  1027. "loopsPerExample" was declared but never referenced
  1028. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  1029. ^
  1030.  
  1031.  
  1032. ... seems valid
  1033. ForwardAuto: kernel 5 10ms
  1034. calcGradWeights kernel 0: cannot be used
  1035. calcGradWeights kernel 1 time: 0ms
  1036. calcGradWeights kernel 2 time: 0ms
  1037. calcGradWeights kernel 3 time: 0ms
  1038. calcGradWeights kernel 4 time: 290ms
  1039. calcGradWeights layer selected kernel 1
  1040. forward try kernel 6
  1041. ... seems valid
  1042. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  1043. forward try kernel 7
  1044. ... seems valid
  1045. ForwardAuto: kernel 7 200ms
  1046. forward kernel 0: cannot be used
  1047. forward kernel 1 time: 0ms
  1048. forward kernel 2 time: 0ms
  1049. forward kernel 3 time: 0ms
  1050. forward kernel 4 time: 0ms
  1051. forward kernel 5 time: 10ms
  1052. forward kernel 6: cannot be used
  1053. forward kernel 7 time: 200ms
  1054. forward layer selected kernel 1
  1055. loss, E, 0.0733092
  1056. accuracy: 2/2 100%
  1057. loss, E, 0.0426809
  1058. accuracy: 2/2 100%
  1059. loss, E, 0.0262452
  1060. accuracy: 2/2 100%
  1061. loss, E, 0.0164245
  1062. accuracy: 2/2 100%
  1063. loss, E, 0.0107573
  1064. accuracy: 2/2 100%
  1065. accuracy: 2/2
  1066. clblas teardown
  1067. [ OK ] testsimpleconvolvenet.imagesize1_planes2_filters2_unbiased_tanh (1920 ms)
  1068. [ RUN ] testsimpleconvolvenet.imagesize1_planes2_filters2_tanh
  1069. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  1070. Using OpenCL device: Tahiti
  1071. initializing clblas
  1072. forward try kernel 0
  1073. ... not plausibly optimal, skipping
  1074. forward try kernel 1
  1075. ... seems valid
  1076. ForwardAuto: kernel 1 0ms
  1077. calcGradWeights try kernel 0
  1078. ... not plausibly optimal, skipping
  1079. calcGradWeights try kernel 1
  1080. ... seems valid
  1081. BackpropWeightsAuto: kernel 1 0ms
  1082. loss, E, 0.964924
  1083. accuracy: 1/2 50%
  1084. forward try kernel 2
  1085. ... seems valid
  1086. ForwardAuto: kernel 2 0ms
  1087. calcGradWeights try kernel 2
  1088. ... seems valid
  1089. BackpropWeightsAuto: kernel 2 0ms
  1090. forward try kernel 3
  1091. ... seems valid
  1092. ForwardAuto: kernel 3 0ms
  1093. calcGradWeights try kernel 3
  1094. options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  1095. ... seems valid
  1096. BackpropWeightsAuto: kernel 3 0ms
  1097. forward try kernel 4
  1098. ... seems valid
  1099. ForwardAuto: kernel 4 0ms
  1100. calcGradWeights try kernel 4
  1101. ... seems valid
  1102. BackpropWeightsAuto: kernel 4 420ms
  1103. forward try kernel 5
  1104. cl/forward_fc_wgperrow.cl build log:
  1105. "C:\Users\pz\AppData\Local\Temp\OCL247A.tmp.cl", line 75: warning: variable
  1106. "loopsPerExample" was declared but never referenced
  1107. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  1108. ^
  1109.  
  1110.  
  1111. ... seems valid
  1112. ForwardAuto: kernel 5 0ms
  1113. calcGradWeights kernel 0: cannot be used
  1114. calcGradWeights kernel 1 time: 0ms
  1115. calcGradWeights kernel 2 time: 0ms
  1116. calcGradWeights kernel 3 time: 0ms
  1117. calcGradWeights kernel 4 time: 420ms
  1118. calcGradWeights layer selected kernel 1
  1119. forward try kernel 6
  1120. ... seems valid
  1121. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  1122. forward try kernel 7
  1123. ... seems valid
  1124. ForwardAuto: kernel 7 200ms
  1125. forward kernel 0: cannot be used
  1126. forward kernel 1 time: 0ms
  1127. forward kernel 2 time: 0ms
  1128. forward kernel 3 time: 0ms
  1129. forward kernel 4 time: 0ms
  1130. forward kernel 5 time: 0ms
  1131. forward kernel 6: cannot be used
  1132. forward kernel 7 time: 200ms
  1133. forward layer selected kernel 1
  1134. loss, E, 0.00570461
  1135. accuracy: 2/2 100%
  1136. loss, E, 1.34828e-05
  1137. accuracy: 2/2 100%
  1138. loss, E, 3.62078e-08
  1139. accuracy: 2/2 100%
  1140. accuracy: 2/2
  1141. clblas teardown
  1142. [ OK ] testsimpleconvolvenet.imagesize1_planes2_filters2_tanh (2050 ms)
  1143. [ RUN ] testsimpleconvolvenet.imagesize3_n4_filtersize3_tanh
  1144. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  1145. Using OpenCL device: Tahiti
  1146. initializing clblas
  1147. forward try kernel 0
  1148. ... not plausibly optimal, skipping
  1149. forward try kernel 1
  1150. ... seems valid
  1151. ForwardAuto: kernel 1 10ms
  1152. calcGradWeights try kernel 0
  1153. ... not plausibly optimal, skipping
  1154. calcGradWeights try kernel 1
  1155. ... seems valid
  1156. BackpropWeightsAuto: kernel 1 0ms
  1157. loss, E, 1.13283
  1158. accuracy: 3/4 75%
  1159. forward try kernel 2
  1160. ... seems valid
  1161. ForwardAuto: kernel 2 0ms
  1162. calcGradWeights try kernel 2
  1163. ... seems valid
  1164. BackpropWeightsAuto: kernel 2 0ms
  1165. forward try kernel 3
  1166. ... seems valid
  1167. ForwardAuto: kernel 3 0ms
  1168. calcGradWeights try kernel 3
  1169. options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=21 -DgInputStripeMarginSize=6 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  1170. ... seems valid
  1171. BackpropWeightsAuto: kernel 3 0ms
  1172. forward try kernel 4
  1173. ... seems valid
  1174. ForwardAuto: kernel 4 0ms
  1175. calcGradWeights try kernel 4
  1176. ... seems valid
  1177. BackpropWeightsAuto: kernel 4 490ms
  1178. forward try kernel 5
  1179. cl/forward_fc_wgperrow.cl build log:
  1180. "C:\Users\pz\AppData\Local\Temp\OCL2D1F.tmp.cl", line 75: warning: variable
  1181. "loopsPerExample" was declared but never referenced
  1182. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  1183. ^
  1184.  
  1185.  
  1186. ... seems valid
  1187. ForwardAuto: kernel 5 0ms
  1188. calcGradWeights kernel 0: cannot be used
  1189. calcGradWeights kernel 1 time: 0ms
  1190. calcGradWeights kernel 2 time: 0ms
  1191. calcGradWeights kernel 3 time: 0ms
  1192. calcGradWeights kernel 4 time: 490ms
  1193. calcGradWeights layer selected kernel 1
  1194. forward try kernel 6
  1195. ... seems valid
  1196. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  1197. forward try kernel 7
  1198. ... seems valid
  1199. ForwardAuto: kernel 7 260ms
  1200. loss, E, 0.00996342
  1201. accuracy: 4/4 100%
  1202. forward kernel 0: cannot be used
  1203. forward kernel 1 time: 10ms
  1204. forward kernel 2 time: 0ms
  1205. forward kernel 3 time: 0ms
  1206. forward kernel 4 time: 0ms
  1207. forward kernel 5 time: 0ms
  1208. forward kernel 6: cannot be used
  1209. forward kernel 7 time: 260ms
  1210. forward layer selected kernel 2
  1211. loss, E, 4.70669e-05
  1212. accuracy: 4/4 100%
  1213. loss, E, 4.0975e-07
  1214. accuracy: 4/4 100%
  1215. accuracy: 4/4
  1216. clblas teardown
  1217. [ OK ] testsimpleconvolvenet.imagesize3_n4_filtersize3_tanh (2250 ms)
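Note on the "selected kernel" lines in the test above: the reported timings are consistent with choosing the first usable kernel that has the lowest measured time (kernel 1 took 10ms, kernels 2 to 5 took 0ms, and kernel 2 was selected). The sketch below illustrates that rule only; the struct and function names are hypothetical and the logic is inferred from the log, not taken from the DeepCL source.

```cpp
#include <vector>

// One timing entry per candidate kernel, as printed in the log.
struct KernelTiming {
    int index;         // kernel number
    bool usable;       // false for "cannot be used"
    int milliseconds;  // measured time, ignored when not usable
};

// Returns the index of the first usable kernel with the smallest time,
// or -1 when no kernel is usable. Ties go to the lower kernel index
// because only a strictly smaller time replaces the current best.
int selectKernel(const std::vector<KernelTiming>& timings) {
    int best = -1;
    int bestMs = 0;
    for (const auto& t : timings) {
        if (!t.usable) {
            continue;
        }
        if (best == -1 || t.milliseconds < bestMs) {
            best = t.index;
            bestMs = t.milliseconds;
        }
    }
    return best;
}
```

Fed the forward timings from this test (kernels 0 and 6 unusable, kernel 1 at 10ms, kernels 2 to 5 at 0ms, kernel 7 at 260ms), `selectKernel` returns 2, the same kernel the log reports.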
  1218. [ RUN ] testsimpleconvolvenet.imagesize1_2planes_filtersize1
  1219. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  1220. Using OpenCL device: Tahiti
  1221. initializing clblas
  1222. forward try kernel 0
  1223. ... not plausibly optimal, skipping
  1224. forward try kernel 1
  1225. ... seems valid
  1226. ForwardAuto: kernel 1 0ms
  1227. calcGradWeights try kernel 0
  1228. ... not plausibly optimal, skipping
  1229. calcGradWeights try kernel 1
  1230. ... seems valid
  1231. BackpropWeightsAuto: kernel 1 0ms
  1232. loss, E, 0.751601
  1233. accuracy: 2/2 100%
  1234. forward try kernel 2
  1235. ... seems valid
  1236. ForwardAuto: kernel 2 0ms
  1237. calcGradWeights try kernel 2
  1238. ... seems valid
  1239. BackpropWeightsAuto: kernel 2 0ms
  1240. forward try kernel 3
  1241. ... seems valid
  1242. ForwardAuto: kernel 3 0ms
  1243. calcGradWeights try kernel 3
  1244. options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  1245. ... seems valid
  1246. BackpropWeightsAuto: kernel 3 0ms
  1247. forward try kernel 4
  1248. ... seems valid
  1249. ForwardAuto: kernel 4 0ms
  1250. calcGradWeights try kernel 4
  1251. ... seems valid
  1252. BackpropWeightsAuto: kernel 4 430ms
  1253. forward try kernel 5
  1254. cl/forward_fc_wgperrow.cl build log:
  1255. "C:\Users\pz\AppData\Local\Temp\OCL3507.tmp.cl", line 75: warning: variable
  1256. "loopsPerExample" was declared but never referenced
  1257. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  1258. ^
  1259.  
  1260.  
  1261. ... seems valid
  1262. ForwardAuto: kernel 5 0ms
  1263. calcGradWeights kernel 0: cannot be used
  1264. calcGradWeights kernel 1 time: 0ms
  1265. calcGradWeights kernel 2 time: 0ms
  1266. calcGradWeights kernel 3 time: 0ms
  1267. calcGradWeights kernel 4 time: 430ms
  1268. calcGradWeights layer selected kernel 1
  1269. forward try kernel 6
  1270. ... seems valid
  1271. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  1272. forward try kernel 7
  1273. ... seems valid
  1274. ForwardAuto: kernel 7 210ms
  1275. loss, E, 0.195916
  1276. accuracy: 2/2 100%
  1277. forward kernel 0: cannot be used
  1278. forward kernel 1 time: 0ms
  1279. forward kernel 2 time: 0ms
  1280. forward kernel 3 time: 0ms
  1281. forward kernel 4 time: 0ms
  1282. forward kernel 5 time: 0ms
  1283. forward kernel 6: cannot be used
  1284. forward kernel 7 time: 210ms
  1285. forward layer selected kernel 1
  1286. loss, E, 0.0679117
  1287. accuracy: 2/2 100%
  1288. loss, E, 0.023677
  1289. accuracy: 2/2 100%
  1290. loss, E, 0.00825563
  1291. accuracy: 2/2 100%
  1292. loss, E, 0.00287856
  1293. accuracy: 2/2 100%
  1294. loss, E, 0.00100369
  1295. accuracy: 2/2 100%
  1296. loss, E, 0.000349964
  1297. accuracy: 2/2 100%
  1298. accuracy: 2/2 100%
  1299. accuracy: 2/2
  1300. loss, E, 0.000150648
  1301. clblas teardown
  1302. [ OK ] testsimpleconvolvenet.imagesize1_2planes_filtersize1 (1950 ms)
  1303. [ RUN ] testsimpleconvolvenet.imagesize3_n4_filtersize3_relu
  1304. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  1305. Using OpenCL device: Tahiti
  1306. initializing clblas
  1307. forward try kernel 0
  1308. ... not plausibly optimal, skipping
  1309. forward try kernel 1
  1310. ... seems valid
  1311. ForwardAuto: kernel 1 0ms
  1312. calcGradWeights try kernel 0
  1313. ... not plausibly optimal, skipping
  1314. calcGradWeights try kernel 1
  1315. ... seems valid
  1316. BackpropWeightsAuto: kernel 1 0ms
  1317. loss, E, 1.48951
  1318. accuracy: 2/4 50%
  1319. forward try kernel 2
  1320. ... seems valid
  1321. ForwardAuto: kernel 2 0ms
  1322. calcGradWeights try kernel 2
  1323. ... seems valid
  1324. BackpropWeightsAuto: kernel 2 0ms
  1325. forward try kernel 3
  1326. ... seems valid
  1327. ForwardAuto: kernel 3 0ms
  1328. calcGradWeights try kernel 3
  1329. options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=21 -DgInputStripeMarginSize=6 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  1330. ... seems valid
  1331. BackpropWeightsAuto: kernel 3 0ms
  1332. forward try kernel 4
  1333. ... seems valid
  1334. ForwardAuto: kernel 4 0ms
  1335. calcGradWeights try kernel 4
  1336. ... seems valid
  1337. BackpropWeightsAuto: kernel 4 500ms
  1338. forward try kernel 5
  1339. cl/forward_fc_wgperrow.cl build log:
  1340. "C:\Users\pz\AppData\Local\Temp\OCL3DDA.tmp.cl", line 75: warning: variable
  1341. "loopsPerExample" was declared but never referenced
  1342. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  1343. ^
  1344.  
  1345.  
  1346. ... seems valid
  1347. ForwardAuto: kernel 5 0ms
  1348. calcGradWeights kernel 0: cannot be used
  1349. calcGradWeights kernel 1 time: 0ms
  1350. calcGradWeights kernel 2 time: 0ms
  1351. calcGradWeights kernel 3 time: 0ms
  1352. calcGradWeights kernel 4 time: 500ms
  1353. calcGradWeights layer selected kernel 1
  1354. forward try kernel 6
  1355. ... seems valid
  1356. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  1357. forward try kernel 7
  1358. ... seems valid
  1359. ForwardAuto: kernel 7 250ms
  1360. loss, E, 1.12957
  1361. accuracy: 2/4 50%
  1362. forward kernel 0: cannot be used
  1363. forward kernel 1 time: 0ms
  1364. forward kernel 2 time: 0ms
  1365. forward kernel 3 time: 0ms
  1366. forward kernel 4 time: 0ms
  1367. forward kernel 5 time: 0ms
  1368. forward kernel 6: cannot be used
  1369. forward kernel 7 time: 250ms
  1370. forward layer selected kernel 1
  1371. loss, E, 0.070782
  1372. accuracy: 4/4 100%
  1373. loss, E, 0.003026
  1374. accuracy: 4/4 100%
  1375. loss, E, 0.00021158
  1376. accuracy: 4/4 100%
  1377. loss, E, 1.96858e-05
  1378. accuracy: 4/4 100%
  1379. loss, E, 2.03002e-06
  1380. accuracy: 4/4 100%
  1381. loss, E, 2.15572e-07
  1382. accuracy: 4/4 100%
  1383. loss, E, 2.3083e-08
  1384. accuracy: 4/4 100%
  1385. loss, E, 2.48239e-09
  1386. accuracy: 4/4 100%
  1387. loss, E, 4.14442e-10
  1388. accuracy: 4/4 100%
  1389. accuracy: 4/4
  1390. loss, E, 4.14442e-10
  1391. clblas teardown
  1392. [ OK ] testsimpleconvolvenet.imagesize3_n4_filtersize3_relu (2330 ms)
  1393. [ RUN ] testsimpleconvolvenet.imagesize3_n4_filtersize3_linear
  1394. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  1395. Using OpenCL device: Tahiti
  1396. initializing clblas
  1397. forward try kernel 0
  1398. ... not plausibly optimal, skipping
  1399. forward try kernel 1
  1400. ... seems valid
  1401. ForwardAuto: kernel 1 10ms
  1402. calcGradWeights try kernel 0
  1403. ... not plausibly optimal, skipping
  1404. calcGradWeights try kernel 1
  1405. ... seems valid
  1406. BackpropWeightsAuto: kernel 1 0ms
  1407. loss, E, 0.50604
  1408. accuracy: 4/4 100%
  1409. forward try kernel 2
  1410. ... seems valid
  1411. ForwardAuto: kernel 2 0ms
  1412. calcGradWeights try kernel 2
  1413. ... seems valid
  1414. BackpropWeightsAuto: kernel 2 0ms
  1415. forward try kernel 3
  1416. ... seems valid
  1417. ForwardAuto: kernel 3 0ms
  1418. calcGradWeights try kernel 3
  1419. options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=21 -DgInputStripeMarginSize=6 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  1420. ... seems valid
  1421. BackpropWeightsAuto: kernel 3 0ms
  1422. forward try kernel 4
  1423. ... seems valid
  1424. ForwardAuto: kernel 4 0ms
  1425. calcGradWeights try kernel 4
  1426. ... seems valid
  1427. BackpropWeightsAuto: kernel 4 510ms
  1428. forward try kernel 5
  1429. cl/forward_fc_wgperrow.cl build log:
  1430. "C:\Users\pz\AppData\Local\Temp\OCL466E.tmp.cl", line 75: warning: variable
  1431. "loopsPerExample" was declared but never referenced
  1432. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  1433. ^
  1434.  
  1435.  
  1436. ... seems valid
  1437. ForwardAuto: kernel 5 0ms
  1438. calcGradWeights kernel 0: cannot be used
  1439. calcGradWeights kernel 1 time: 0ms
  1440. calcGradWeights kernel 2 time: 0ms
  1441. calcGradWeights kernel 3 time: 0ms
  1442. calcGradWeights kernel 4 time: 510ms
  1443. calcGradWeights layer selected kernel 1
  1444. forward try kernel 6
  1445. ... seems valid
  1446. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  1447. forward try kernel 7
  1448. ... seems valid
  1449. ForwardAuto: kernel 7 260ms
  1450. loss, E, 0.0565529
  1451. accuracy: 4/4 100%
  1452. forward kernel 0: cannot be used
  1453. forward kernel 1 time: 10ms
  1454. forward kernel 2 time: 0ms
  1455. forward kernel 3 time: 0ms
  1456. forward kernel 4 time: 0ms
  1457. forward kernel 5 time: 0ms
  1458. forward kernel 6: cannot be used
  1459. forward kernel 7 time: 260ms
  1460. forward layer selected kernel 2
  1461. loss, E, 0.00777245
  1462. accuracy: 4/4 100%
  1463. loss, E, 0.00106831
  1464. accuracy: 4/4 100%
  1465. loss, E, 0.000218376
  1466. accuracy: 4/4 100%
  1467. accuracy: 4/4
  1468. loss, E, 0.000218376
  1469. clblas teardown
  1470. [ OK ] testsimpleconvolvenet.imagesize3_n4_filtersize3_linear (2110 ms)
  1471. [ RUN ] testsimpleconvolvenet.imagesize1_n2_2layers_unbiased
  1472. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  1473. Using OpenCL device: Tahiti
  1474. initializing clblas
  1475. forward try kernel 0
  1476. ... not plausibly optimal, skipping
  1477. forward try kernel 1
  1478. ... seems valid
  1479. ForwardAuto: kernel 1 0ms
  1480. forward try kernel 0
  1481. ... not plausibly optimal, skipping
  1482. forward try kernel 1
  1483. ... seems valid
  1484. ForwardAuto: kernel 1 0ms
  1485. backward try kernel 0
  1486. ... not plausibly optimal, skipping
  1487. backward try kernel 1
  1488. ... seems valid
  1489. BackwardAuto: kernel 1 0ms
  1490. calcGradWeights try kernel 0
  1491. ... not plausibly optimal, skipping
  1492. calcGradWeights try kernel 1
  1493. ... seems valid
  1494. BackpropWeightsAuto: kernel 1 0ms
  1495. calcGradWeights try kernel 0
  1496. ... not plausibly optimal, skipping
  1497. calcGradWeights try kernel 1
  1498. ... seems valid
  1499. BackpropWeightsAuto: kernel 1 0ms
  1500. epoch 0 loss, E, 0.0559531
  1501. forward try kernel 2
  1502. ... seems valid
  1503. ForwardAuto: kernel 2 0ms
  1504. forward try kernel 2
  1505. ... seems valid
  1506. ForwardAuto: kernel 2 0ms
  1507. backward try kernel 2
  1508. ... seems valid
  1509. BackwardAuto: kernel 2 0ms
  1510. calcGradWeights try kernel 2
  1511. ... seems valid
  1512. BackpropWeightsAuto: kernel 2 0ms
  1513. calcGradWeights try kernel 2
  1514. ... seems valid
  1515. BackpropWeightsAuto: kernel 2 0ms
  1516. epoch 1 loss, E, 0.0254554
  1517. forward try kernel 3
  1518. ... seems valid
  1519. ForwardAuto: kernel 3 0ms
  1520. forward try kernel 3
  1521. ... seems valid
  1522. ForwardAuto: kernel 3 0ms
  1523. backward try kernel 3
  1524. ... seems valid
  1525. BackwardAuto: kernel 3 223ms
  1526. calcGradWeights try kernel 3
  1527. options: -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  1528. ... seems valid
  1529. BackpropWeightsAuto: kernel 3 0ms
  1530. calcGradWeights try kernel 3
  1531. options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  1532. ... seems valid
  1533. BackpropWeightsAuto: kernel 3 0ms
  1534. epoch 2 loss, E, 0.0172943
  1535. forward try kernel 4
  1536. ... seems valid
  1537. ForwardAuto: kernel 4 0ms
  1538. forward try kernel 4
  1539. ... seems valid
  1540. ForwardAuto: kernel 4 0ms
  1541. backward kernel 0: cannot be used
  1542. backward kernel 1 time: 0ms
  1543. backward kernel 2 time: 0ms
  1544. backward kernel 3 time: 223ms
  1545. backward layer selected kernel 1
  1546. calcGradWeights try kernel 4
  1547. ... seems valid
  1548. BackpropWeightsAuto: kernel 4 430ms
  1549. calcGradWeights try kernel 4
  1550. ... seems valid
  1551. BackpropWeightsAuto: kernel 4 70ms
  1552. epoch 3 loss, E, 0.0138013
  1553. forward try kernel 5
  1554. cl/forward_fc_wgperrow.cl build log:
  1555. "C:\Users\pz\AppData\Local\Temp\OCL52A9.tmp.cl", line 75: warning: variable
  1556. "loopsPerExample" was declared but never referenced
  1557. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  1558. ^
  1559.  
  1560.  
  1561. ... seems valid
  1562. ForwardAuto: kernel 5 0ms
  1563. forward try kernel 5
  1564. cl/forward_fc_wgperrow.cl build log:
  1565. "C:\Users\pz\AppData\Local\Temp\OCL52F8.tmp.cl", line 75: warning: variable
  1566. "loopsPerExample" was declared but never referenced
  1567. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  1568. ^
  1569.  
  1570.  
  1571. ... seems valid
  1572. ForwardAuto: kernel 5 0ms
  1573. calcGradWeights kernel 0: cannot be used
  1574. calcGradWeights kernel 1 time: 0ms
  1575. calcGradWeights kernel 2 time: 0ms
  1576. calcGradWeights kernel 3 time: 0ms
  1577. calcGradWeights kernel 4 time: 430ms
  1578. calcGradWeights layer selected kernel 1
  1579. calcGradWeights kernel 0: cannot be used
  1580. calcGradWeights kernel 1 time: 0ms
  1581. calcGradWeights kernel 2 time: 0ms
  1582. calcGradWeights kernel 3 time: 0ms
  1583. calcGradWeights kernel 4 time: 70ms
  1584. calcGradWeights layer selected kernel 1
  1585. epoch 4 loss, E, 0.0115848
  1586. forward try kernel 6
  1587. ... seems valid
  1588. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  1589. forward try kernel 7
  1590. ... seems valid
  1591. ForwardAuto: kernel 7 200ms
  1592. forward try kernel 6
  1593. ... seems valid
  1594. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  1595. forward try kernel 7
  1596. ... seems valid
  1597. ForwardAuto: kernel 7 80ms
  1598. epoch 5 loss, E, 0.00987036
  1599. forward kernel 0: cannot be used
  1600. forward kernel 1 time: 0ms
  1601. forward kernel 2 time: 0ms
  1602. forward kernel 3 time: 0ms
  1603. forward kernel 4 time: 0ms
  1604. forward kernel 5 time: 0ms
  1605. forward kernel 6: cannot be used
  1606. forward kernel 7 time: 200ms
  1607. forward layer selected kernel 1
  1608. forward kernel 0: cannot be used
  1609. forward kernel 1 time: 0ms
  1610. forward kernel 2 time: 0ms
  1611. forward kernel 3 time: 0ms
  1612. forward kernel 4 time: 0ms
  1613. forward kernel 5 time: 0ms
  1614. forward kernel 6: cannot be used
  1615. forward kernel 7 time: 80ms
  1616. forward layer selected kernel 1
  1617. epoch 6 loss, E, 0.00844797
  1618. epoch 7 loss, E, 0.00724182
  1619. epoch 8 loss, E, 0.00621212
  1620. epoch 9 loss, E, 0.00533106
  1621. epoch 10 loss, E, 0.00457645
  1622. epoch 11 loss, E, 0.00392979
  1623. epoch 12 loss, E, 0.00337539
  1624. epoch 13 loss, E, 0.00289992
  1625. epoch 14 loss, E, 0.002492
  1626. epoch 15 loss, E, 0.00214191
  1627. epoch 16 loss, E, 0.00184138
  1628. epoch 17 loss, E, 0.00158331
  1629. epoch 18 loss, E, 0.00136164
  1630. epoch 19 loss, E, 0.0011712
  1631. epoch 20 loss, E, 0.00100754
  1632. epoch 21 loss, E, 0.000866877
  1633. epoch 22 loss, E, 0.000745946
  1634. epoch 23 loss, E, 0.000641966
  1635. epoch 24 loss, E, 0.000552543
  1636. epoch 25 loss, E, 0.000475625
  1637. epoch 26 loss, E, 0.000409454
  1638. epoch 27 loss, E, 0.000352522
  1639. epoch 28 loss, E, 0.000303531
  1640. epoch 29 loss, E, 0.00026137
  1641. epoch 30 loss, E, 0.000225082
  1642. epoch 31 loss, E, 0.000193845
  1643. epoch 32 loss, E, 0.000166954
  1644. epoch 33 loss, E, 0.000143801
  1645. epoch 34 loss, E, 0.000123866
  1646. epoch 35 loss, E, 0.000106699
  1647. epoch 36 loss, E, 9.19176e-05
  1648. epoch 37 loss, E, 7.91864e-05
  1649. epoch 38 loss, E, 6.82211e-05
  1650. epoch 39 loss, E, 5.87767e-05
  1651. layer 0:InputLayer{ outputPlanes=1 outputSize=1 }
  1652. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=1 numFilters=2 filterSize=1 outputSize=1 padZeros=0 biased=1 skip=0} }
  1653. layer 2:ActivationLayer{ RELU }
  1654. layer 3:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=1 numFilters=2 filterSize=1 outputSize=1 padZeros=0 biased=1 skip=0} }
  1655. layer 4:SquareLossLayer{}
  1656. Parameters overview: (skipping 3 layers with 0 params)
  1657. layer 1: params=4 40.0%
  1658. layer 3: params=6 60.0%
  1659. TOTAL : params=10
  1660. loss, E, 5.87767e-05
  1661. accuracy: 2/2 100%
  1662. accuracy: 2/2
  1663. loss, E, 5.87767e-05
  1664. loss, E, 5.87767e-05
  1665. layer 0:InputLayer{ outputPlanes=1 outputSize=1 }
  1666. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=1 numFilters=2 filterSize=1 outputSize=1 padZeros=0 biased=1 skip=0} }
  1667. layer 2:ActivationLayer{ RELU }
  1668. layer 3:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=1 numFilters=2 filterSize=1 outputSize=1 padZeros=0 biased=1 skip=0} }
  1669. layer 4:SquareLossLayer{}
  1670. Parameters overview: (skipping 3 layers with 0 params)
  1671. layer 1: params=4 40.0%
  1672. layer 3: params=6 60.0%
  1673. TOTAL : params=10
  1674. float weights1[] = {-0.303866f, -1.59823f};
  1675. float weights3[] = {0.426358f, -0.719592f, -0.420361f, 0.719566f};
  1676. float bias1[] = {-0.324465f, 0.60279f};
  1677. float bias3[] = {0.506862f, -0.506837f};
  1678. clblas teardown
  1679. [ OK ] testsimpleconvolvenet.imagesize1_n2_2layers_unbiased (3450 ms)
  1680. [ RUN ] testsimpleconvolvenet.imagesize1_n2_2layers_biased
  1681. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  1682. Using OpenCL device: Tahiti
  1683. initializing clblas
  1684. forward try kernel 0
  1685. ... not plausibly optimal, skipping
  1686. forward try kernel 1
  1687. ... seems valid
  1688. ForwardAuto: kernel 1 0ms
  1689. forward try kernel 0
  1690. ... not plausibly optimal, skipping
  1691. forward try kernel 1
  1692. ... seems valid
  1693. ForwardAuto: kernel 1 0ms
  1694. backward try kernel 0
  1695. ... not plausibly optimal, skipping
  1696. backward try kernel 1
  1697. ... seems valid
  1698. BackwardAuto: kernel 1 0ms
  1699. calcGradWeights try kernel 0
  1700. ... not plausibly optimal, skipping
  1701. calcGradWeights try kernel 1
  1702. ... seems valid
  1703. BackpropWeightsAuto: kernel 1 0ms
  1704. calcGradWeights try kernel 0
  1705. ... not plausibly optimal, skipping
  1706. calcGradWeights try kernel 1
  1707. ... seems valid
  1708. BackpropWeightsAuto: kernel 1 0ms
  1709. loss, E, 1.19067
  1710. forward try kernel 2
  1711. ... seems valid
  1712. ForwardAuto: kernel 2 0ms
  1713. forward try kernel 2
  1714. ... seems valid
  1715. ForwardAuto: kernel 2 0ms
  1716. backward try kernel 2
  1717. ... seems valid
  1718. BackwardAuto: kernel 2 0ms
  1719. calcGradWeights try kernel 2
  1720. ... seems valid
  1721. BackpropWeightsAuto: kernel 2 0ms
  1722. calcGradWeights try kernel 2
  1723. ... seems valid
  1724. BackpropWeightsAuto: kernel 2 0ms
  1725. forward try kernel 3
  1726. ... seems valid
  1727. ForwardAuto: kernel 3 0ms
  1728. forward try kernel 3
  1729. ... seems valid
  1730. ForwardAuto: kernel 3 0ms
  1731. backward try kernel 3
  1732. ... seems valid
  1733. BackwardAuto: kernel 3 217ms
  1734. calcGradWeights try kernel 3
  1735. options: -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  1736. ... seems valid
  1737. BackpropWeightsAuto: kernel 3 0ms
  1738. calcGradWeights try kernel 3
  1739. options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  1740. ... seems valid
  1741. BackpropWeightsAuto: kernel 3 16ms
  1742. forward try kernel 4
  1743. ... seems valid
  1744. ForwardAuto: kernel 4 0ms
  1745. forward try kernel 4
  1746. ... seems valid
  1747. ForwardAuto: kernel 4 0ms
  1748. backward kernel 0: cannot be used
  1749. backward kernel 1 time: 0ms
  1750. backward kernel 2 time: 0ms
  1751. backward kernel 3 time: 217ms
  1752. backward layer selected kernel 1
  1753. calcGradWeights try kernel 4
  1754. ... seems valid
  1755. BackpropWeightsAuto: kernel 4 421ms
  1756. calcGradWeights try kernel 4
  1757. ... seems valid
  1758. BackpropWeightsAuto: kernel 4 78ms
  1759. forward try kernel 5
  1760. cl/forward_fc_wgperrow.cl build log:
  1761. "C:\Users\pz\AppData\Local\Temp\OCL6040.tmp.cl", line 75: warning: variable
  1762. "loopsPerExample" was declared but never referenced
  1763. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  1764. ^
  1765.  
  1766.  
  1767. ... seems valid
  1768. ForwardAuto: kernel 5 0ms
  1769. forward try kernel 5
  1770. cl/forward_fc_wgperrow.cl build log:
  1771. "C:\Users\pz\AppData\Local\Temp\OCL607F.tmp.cl", line 75: warning: variable
  1772. "loopsPerExample" was declared but never referenced
  1773. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  1774. ^
  1775.  
  1776.  
  1777. ... seems valid
  1778. ForwardAuto: kernel 5 0ms
  1779. calcGradWeights kernel 0: cannot be used
  1780. calcGradWeights kernel 1 time: 0ms
  1781. calcGradWeights kernel 2 time: 0ms
  1782. calcGradWeights kernel 3 time: 0ms
  1783. calcGradWeights kernel 4 time: 421ms
  1784. calcGradWeights layer selected kernel 1
  1785. calcGradWeights kernel 0: cannot be used
  1786. calcGradWeights kernel 1 time: 0ms
  1787. calcGradWeights kernel 2 time: 0ms
  1788. calcGradWeights kernel 3 time: 16ms
  1789. calcGradWeights kernel 4 time: 78ms
  1790. calcGradWeights layer selected kernel 1
  1791. forward try kernel 6
  1792. ... seems valid
  1793. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  1794. forward try kernel 7
  1795. ... seems valid
  1796. ForwardAuto: kernel 7 218ms
  1797. forward try kernel 6
  1798. ... seems valid
  1799. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  1800. forward try kernel 7
  1801. ... seems valid
  1802. ForwardAuto: kernel 7 78ms
  1803. loss, E, 0.0667568
  1804. forward kernel 0: cannot be used
  1805. forward kernel 1 time: 0ms
  1806. forward kernel 2 time: 0ms
  1807. forward kernel 3 time: 0ms
  1808. forward kernel 4 time: 0ms
  1809. forward kernel 5 time: 0ms
  1810. forward kernel 6: cannot be used
  1811. forward kernel 7 time: 218ms
  1812. forward layer selected kernel 1
  1813. forward kernel 0: cannot be used
  1814. forward kernel 1 time: 0ms
  1815. forward kernel 2 time: 0ms
  1816. forward kernel 3 time: 0ms
  1817. forward kernel 4 time: 0ms
  1818. forward kernel 5 time: 0ms
  1819. forward kernel 6: cannot be used
  1820. forward kernel 7 time: 78ms
  1821. forward layer selected kernel 1
  1822. loss, E, 0.00923595
  1823. loss, E, 0.00112611
  1824. loss, E, 0.0001174
  1825. loss, E, 1.15642e-05
  1826. dump enabled=0
  1827. loss, E, 1.78564e-06
  1828. accuracy: 2/2 100%
  1829. accuracy: 2/2
  1830. loss, E, 1.78564e-06
  1831. clblas teardown
  1832. [ OK ] testsimpleconvolvenet.imagesize1_n2_2layers_biased (3385 ms)
  1833. [ RUN ] testsimpleconvolvenet.imagesize_5_4_2layers_filtersize_2_4_biased_n3
  1834. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  1835. Using OpenCL device: Tahiti
  1836. initializing clblas
  1837. forward try kernel 0
  1838. ... not plausibly optimal, skipping
  1839. forward try kernel 1
  1840. ... seems valid
  1841. ForwardAuto: kernel 1 0ms
  1842. forward try kernel 0
  1843. ... not plausibly optimal, skipping
  1844. forward try kernel 1
  1845. ... seems valid
  1846. ForwardAuto: kernel 1 0ms
  1847. backward try kernel 0
  1848. ... not plausibly optimal, skipping
  1849. backward try kernel 1
  1850. ... seems valid
  1851. BackwardAuto: kernel 1 0ms
  1852. calcGradWeights try kernel 0
  1853. ... not plausibly optimal, skipping
  1854. calcGradWeights try kernel 1
  1855. ... seems valid
  1856. BackpropWeightsAuto: kernel 1 0ms
  1857. calcGradWeights try kernel 0
  1858. ... not plausibly optimal, skipping
  1859. calcGradWeights try kernel 1
  1860. ... seems valid
  1861. BackpropWeightsAuto: kernel 1 0ms
  1862. loss, E, 1.6207
  1863. forward try kernel 2
  1864. ... seems valid
  1865. ForwardAuto: kernel 2 0ms
  1866. forward try kernel 2
  1867. ... seems valid
  1868. ForwardAuto: kernel 2 0ms
  1869. backward try kernel 2
  1870. ... seems valid
  1871. BackwardAuto: kernel 2 0ms
  1872. calcGradWeights try kernel 2
  1873. ... seems valid
  1874. BackpropWeightsAuto: kernel 2 0ms
  1875. calcGradWeights try kernel 2
  1876. ... seems valid
  1877. BackpropWeightsAuto: kernel 2 0ms
  1878. forward try kernel 3
  1879. ... seems valid
  1880. ForwardAuto: kernel 3 0ms
  1881. forward try kernel 3
  1882. ... seems valid
  1883. ForwardAuto: kernel 3 0ms
  1884. backward try kernel 3
  1885. ... seems valid
  1886. BackwardAuto: kernel 3 249ms
  1887. calcGradWeights try kernel 3
  1888. options: -D BIASED -D gNumInputPlanes=3 -D gInputPlanes=3 -D gInputSize=4 -D gInputSizeSquared=16 -D gNumFilters=3 -D gFilterSize=4 -D gHalfFilterSize=2 -D gFilterSizeSquared=16 -D gNumOutputPlanes=3 -D gOutputPlanes=3 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=3 -DgInputStripeInnerNumRows=4 -DgInputStripeOuterNumRows=10 -DgInputStripeInnerSize=16 -DgInputStripeOuterSize=40 -DgInputStripeMarginSize=12 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  1889. ... seems valid
  1890. BackpropWeightsAuto: kernel 3 0ms
  1891. calcGradWeights try kernel 3
  1892. options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=3 -D gFilterSize=2 -D gHalfFilterSize=1 -D gFilterSizeSquared=4 -D gNumOutputPlanes=3 -D gOutputPlanes=3 -D gOutputSize=4 -D gOutputSizeSquared=16 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=1 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=35 -DgInputStripeMarginSize=5 -DgOutputStripeNumRows=4 -DgOutputStripeSize=16
  1893. ... seems valid
  1894. BackpropWeightsAuto: kernel 3 0ms
  1895. forward try kernel 4
  1896. ... seems valid
  1897. ForwardAuto: kernel 4 0ms
  1898. forward try kernel 4
  1899. ... seems valid
  1900. ForwardAuto: kernel 4 0ms
  1901. backward kernel 0: cannot be used
  1902. backward kernel 1 time: 0ms
  1903. backward kernel 2 time: 0ms
  1904. backward kernel 3 time: 249ms
  1905. backward layer selected kernel 1
  1906. calcGradWeights try kernel 4
  1907. ... seems valid
  1908. BackpropWeightsAuto: kernel 4 468ms
  1909. calcGradWeights try kernel 4
  1910. ... seems valid
  1911. BackpropWeightsAuto: kernel 4 437ms
  1912. forward try kernel 5
  1913. ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
  1914. ... not valid
  1915. forward try kernel 6
  1916. ... seems valid
  1917. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  1918. forward try kernel 7
  1919. ... seems valid
  1920. ForwardAuto: kernel 7 234ms
  1921. forward try kernel 5
  1922. cl/forward_fc_wgperrow.cl build log:
  1923. "C:\Users\pz\AppData\Local\Temp\OCL7234.tmp.cl", line 75: warning: variable
  1924. "loopsPerExample" was declared but never referenced
  1925. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  1926. ^
  1927.  
  1928.  
  1929. ... seems valid
  1930. ForwardAuto: kernel 5 0ms
  1931. calcGradWeights kernel 0: cannot be used
  1932. calcGradWeights kernel 1 time: 0ms
  1933. calcGradWeights kernel 2 time: 0ms
  1934. calcGradWeights kernel 3 time: 0ms
  1935. calcGradWeights kernel 4 time: 468ms
  1936. calcGradWeights layer selected kernel 1
  1937. calcGradWeights kernel 0: cannot be used
  1938. calcGradWeights kernel 1 time: 0ms
  1939. calcGradWeights kernel 2 time: 0ms
  1940. calcGradWeights kernel 3 time: 0ms
  1941. calcGradWeights kernel 4 time: 437ms
  1942. calcGradWeights layer selected kernel 1
  1943. forward kernel 0: cannot be used
  1944. forward kernel 1 time: 0ms
  1945. forward kernel 2 time: 0ms
  1946. forward kernel 3 time: 0ms
  1947. forward kernel 4 time: 0ms
  1948. forward kernel 5: cannot be used
  1949. forward kernel 6: cannot be used
  1950. forward kernel 7 time: 234ms
  1951. forward layer selected kernel 1
  1952. forward try kernel 6
  1953. ... seems valid
  1954. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  1955. forward try kernel 7
  1956. ... seems valid
  1957. ForwardAuto: kernel 7 234ms
  1958. forward kernel 0: cannot be used
  1959. forward kernel 1 time: 0ms
  1960. forward kernel 2 time: 0ms
  1961. forward kernel 3 time: 0ms
  1962. forward kernel 4 time: 0ms
  1963. forward kernel 5 time: 0ms
  1964. forward kernel 6: cannot be used
  1965. forward kernel 7 time: 234ms
  1966. forward layer selected kernel 1
  1967. loss, E, 0.000427028
  1968. loss, E, 8.40991e-08
  1969. loss, E, 3.03482e-11
  1970. loss, E, 2.59792e-13
  1971. loss, E, 1.15907e-13
  1972. loss, E, 8.03801e-14
  1973. loss, E, 7.14984e-14
  1974. loss, E, 6.08402e-14
  1975. loss, E, 6.9722e-14
  1976. loss, E, 6.9722e-14
  1977. accuracy: 3/3 100%
  1978. accuracy: 3/3
  1979. loss, E, 6.9722e-14
  1980. clblas teardown
  1981. [ OK ] testsimpleconvolvenet.imagesize_5_4_2layers_filtersize_2_4_biased_n3 (6365 ms)
  1982. [ RUN ] testsimpleconvolvenet.imagesize_5_4_2layers_filtersize_2_4_biased_n6
  1983. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  1984. Using OpenCL device: Tahiti
  1985. initializing clblas
  1986. forward try kernel 0
  1987. ... not plausibly optimal, skipping
  1988. forward try kernel 1
  1989. ... seems valid
  1990. ForwardAuto: kernel 1 16ms
  1991. forward try kernel 0
  1992. ... not plausibly optimal, skipping
  1993. forward try kernel 1
  1994. ... seems valid
  1995. ForwardAuto: kernel 1 0ms
  1996. backward try kernel 0
  1997. ... not plausibly optimal, skipping
  1998. backward try kernel 1
  1999. ... seems valid
  2000. BackwardAuto: kernel 1 0ms
  2001. calcGradWeights try kernel 0
  2002. ... not plausibly optimal, skipping
  2003. calcGradWeights try kernel 1
  2004. ... seems valid
  2005. BackpropWeightsAuto: kernel 1 0ms
  2006. calcGradWeights try kernel 0
  2007. ... not plausibly optimal, skipping
  2008. calcGradWeights try kernel 1
  2009. ... seems valid
  2010. BackpropWeightsAuto: kernel 1 0ms
  2011. loss, E, 3.64011
  2012. forward try kernel 2
  2013. ... seems valid
  2014. ForwardAuto: kernel 2 0ms
  2015. forward try kernel 2
  2016. ... seems valid
  2017. ForwardAuto: kernel 2 0ms
  2018. backward try kernel 2
  2019. ... seems valid
  2020. BackwardAuto: kernel 2 0ms
  2021. calcGradWeights try kernel 2
  2022. ... seems valid
  2023. BackpropWeightsAuto: kernel 2 0ms
  2024. calcGradWeights try kernel 2
  2025. ... seems valid
  2026. BackpropWeightsAuto: kernel 2 0ms
  2027. forward try kernel 3
  2028. ... seems valid
  2029. ForwardAuto: kernel 3 0ms
  2030. forward try kernel 3
  2031. ... seems valid
  2032. ForwardAuto: kernel 3 0ms
  2033. backward try kernel 3
  2034. ... seems valid
  2035. BackwardAuto: kernel 3 265ms
  2036. calcGradWeights try kernel 3
  2037. options: -D BIASED -D gNumInputPlanes=3 -D gInputPlanes=3 -D gInputSize=4 -D gInputSizeSquared=16 -D gNumFilters=3 -D gFilterSize=4 -D gHalfFilterSize=2 -D gFilterSizeSquared=16 -D gNumOutputPlanes=3 -D gOutputPlanes=3 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=3 -DgInputStripeInnerNumRows=4 -DgInputStripeOuterNumRows=10 -DgInputStripeInnerSize=16 -DgInputStripeOuterSize=40 -DgInputStripeMarginSize=12 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  2038. ... seems valid
  2039. BackpropWeightsAuto: kernel 3 0ms
  2040. calcGradWeights try kernel 3
  2041. options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=3 -D gFilterSize=2 -D gHalfFilterSize=1 -D gFilterSizeSquared=4 -D gNumOutputPlanes=3 -D gOutputPlanes=3 -D gOutputSize=4 -D gOutputSizeSquared=16 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=1 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=35 -DgInputStripeMarginSize=5 -DgOutputStripeNumRows=4 -DgOutputStripeSize=16
  2042. ... seems valid
  2043. BackpropWeightsAuto: kernel 3 0ms
  2044. forward try kernel 4
  2045. ... seems valid
  2046. ForwardAuto: kernel 4 16ms
  2047. forward try kernel 4
  2048. ... seems valid
  2049. ForwardAuto: kernel 4 0ms
  2050. backward kernel 0: cannot be used
  2051. backward kernel 1 time: 0ms
  2052. backward kernel 2 time: 0ms
  2053. backward kernel 3 time: 265ms
  2054. backward layer selected kernel 1
  2055. calcGradWeights try kernel 4
  2056. ... seems valid
  2057. BackpropWeightsAuto: kernel 4 452ms
  2058. calcGradWeights try kernel 4
  2059. ... seems valid
  2060. BackpropWeightsAuto: kernel 4 452ms
  2061. forward try kernel 5
  2062. ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
  2063. ... not valid
  2064. forward try kernel 6
  2065. ... seems valid
  2066. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  2067. forward try kernel 7
  2068. ... seems valid
  2069. ForwardAuto: kernel 7 234ms
  2070. forward try kernel 5
  2071. cl/forward_fc_wgperrow.cl build log:
  2072. "C:\Users\pz\AppData\Local\Temp\OCL8B45.tmp.cl", line 75: warning: variable
  2073. "loopsPerExample" was declared but never referenced
  2074. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  2075. ^
  2076.  
  2077.  
  2078. ... seems valid
  2079. ForwardAuto: kernel 5 0ms
  2080. calcGradWeights kernel 0: cannot be used
  2081. calcGradWeights kernel 1 time: 0ms
  2082. calcGradWeights kernel 2 time: 0ms
  2083. calcGradWeights kernel 3 time: 0ms
  2084. calcGradWeights kernel 4 time: 452ms
  2085. calcGradWeights layer selected kernel 1
  2086. calcGradWeights kernel 0: cannot be used
  2087. calcGradWeights kernel 1 time: 0ms
  2088. calcGradWeights kernel 2 time: 0ms
  2089. calcGradWeights kernel 3 time: 0ms
  2090. calcGradWeights kernel 4 time: 452ms
  2091. calcGradWeights layer selected kernel 1
  2092. forward kernel 0: cannot be used
  2093. forward kernel 1 time: 16ms
  2094. forward kernel 2 time: 0ms
  2095. forward kernel 3 time: 0ms
  2096. forward kernel 4 time: 16ms
  2097. forward kernel 5: cannot be used
  2098. forward kernel 6: cannot be used
  2099. forward kernel 7 time: 234ms
  2100. forward layer selected kernel 2
  2101. forward try kernel 6
  2102. ... seems valid
  2103. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  2104. forward try kernel 7
  2105. ... seems valid
  2106. ForwardAuto: kernel 7 234ms
  2107. forward kernel 0: cannot be used
  2108. forward kernel 1 time: 0ms
  2109. forward kernel 2 time: 0ms
  2110. forward kernel 3 time: 0ms
  2111. forward kernel 4 time: 0ms
  2112. forward kernel 5 time: 0ms
  2113. forward kernel 6: cannot be used
  2114. forward kernel 7 time: 234ms
  2115. forward layer selected kernel 1
  2116. loss, E, 4.07297e-10
  2117. loss, E, 2.30926e-14
  2118. loss, E, 3.9968e-15
  2119. loss, E, 3.9968e-15
  2120. loss, E, 1.55431e-14
  2121. accuracy: 6/6 100%
  2122. accuracy: 6/6
  2123. loss, E, 1.55431e-14
  2124. clblas teardown
  2125. [ OK ] testsimpleconvolvenet.imagesize_5_4_2layers_filtersize_2_4_biased_n6 (5398 ms)
  2126. [ RUN ] testsimpleconvolvenet.imagesize_5_3_2layers_filtersize_3_3_biased_n6
  2127. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  2128. Using OpenCL device: Tahiti
  2129. initializing clblas
  2130. forward try kernel 0
  2131. ... not plausibly optimal, skipping
  2132. forward try kernel 1
  2133. ... seems valid
  2134. ForwardAuto: kernel 1 0ms
  2135. forward try kernel 0
  2136. ... not plausibly optimal, skipping
  2137. forward try kernel 1
  2138. ... seems valid
  2139. ForwardAuto: kernel 1 0ms
  2140. backward try kernel 0
  2141. ... not plausibly optimal, skipping
  2142. backward try kernel 1
  2143. ... seems valid
  2144. BackwardAuto: kernel 1 0ms
  2145. calcGradWeights try kernel 0
  2146. ... not plausibly optimal, skipping
  2147. calcGradWeights try kernel 1
  2148. ... seems valid
  2149. BackpropWeightsAuto: kernel 1 0ms
  2150. calcGradWeights try kernel 0
  2151. ... not plausibly optimal, skipping
  2152. calcGradWeights try kernel 1
  2153. ... seems valid
  2154. BackpropWeightsAuto: kernel 1 0ms
  2155. loss, E, 4.00796
  2156. forward try kernel 2
  2157. ... seems valid
  2158. ForwardAuto: kernel 2 0ms
  2159. forward try kernel 2
  2160. ... seems valid
  2161. ForwardAuto: kernel 2 0ms
  2162. backward try kernel 2
  2163. ... seems valid
  2164. BackwardAuto: kernel 2 0ms
  2165. calcGradWeights try kernel 2
  2166. ... seems valid
  2167. BackpropWeightsAuto: kernel 2 0ms
  2168. calcGradWeights try kernel 2
  2169. ... seems valid
  2170. BackpropWeightsAuto: kernel 2 0ms
  2171. forward try kernel 3
  2172. ... seems valid
  2173. ForwardAuto: kernel 3 0ms
  2174. forward try kernel 3
  2175. ... seems valid
  2176. ForwardAuto: kernel 3 0ms
  2177. backward try kernel 3
  2178. ... seems valid
  2179. BackwardAuto: kernel 3 280ms
  2180. calcGradWeights try kernel 3
  2181. options: -D BIASED -D gNumInputPlanes=3 -D gInputPlanes=3 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=3 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=3 -D gOutputPlanes=3 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=21 -DgInputStripeMarginSize=6 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  2182. ... seems valid
  2183. BackpropWeightsAuto: kernel 3 16ms
  2184. calcGradWeights try kernel 3
  2185. options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=3 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=3 -D gOutputPlanes=3 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=9 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=45 -DgInputStripeMarginSize=10 -DgOutputStripeNumRows=3 -DgOutputStripeSize=9
  2186. ... seems valid
  2187. BackpropWeightsAuto: kernel 3 0ms
  2188. forward try kernel 4
  2189. ... seems valid
  2190. ForwardAuto: kernel 4 0ms
  2191. forward try kernel 4
  2192. ... seems valid
  2193. ForwardAuto: kernel 4 0ms
  2194. backward kernel 0: cannot be used
  2195. backward kernel 1 time: 0ms
  2196. backward kernel 2 time: 0ms
  2197. backward kernel 3 time: 280ms
  2198. backward layer selected kernel 1
  2199. calcGradWeights try kernel 4
  2200. ... seems valid
  2201. BackpropWeightsAuto: kernel 4 499ms
  2202. calcGradWeights try kernel 4
  2203. ... seems valid
  2204. BackpropWeightsAuto: kernel 4 640ms
  2205. forward try kernel 5
  2206. ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
  2207. ... not valid
  2208. forward try kernel 6
  2209. ... seems valid
  2210. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  2211. forward try kernel 7
  2212. ... seems valid
  2213. ForwardAuto: kernel 7 437ms
  2214. forward try kernel 5
  2215. cl/forward_fc_wgperrow.cl build log:
  2216. "C:\Users\pz\AppData\Local\Temp\OCLA243.tmp.cl", line 75: warning: variable
  2217. "loopsPerExample" was declared but never referenced
  2218. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  2219. ^
  2220.  
  2221.  
  2222. ... seems valid
  2223. ForwardAuto: kernel 5 0ms
  2224. calcGradWeights kernel 0: cannot be used
  2225. calcGradWeights kernel 1 time: 0ms
  2226. calcGradWeights kernel 2 time: 0ms
  2227. calcGradWeights kernel 3 time: 16ms
  2228. calcGradWeights kernel 4 time: 499ms
  2229. calcGradWeights layer selected kernel 1
  2230. calcGradWeights kernel 0: cannot be used
  2231. calcGradWeights kernel 1 time: 0ms
  2232. calcGradWeights kernel 2 time: 0ms
  2233. calcGradWeights kernel 3 time: 0ms
  2234. calcGradWeights kernel 4 time: 640ms
  2235. calcGradWeights layer selected kernel 1
  2236. forward kernel 0: cannot be used
  2237. forward kernel 1 time: 0ms
  2238. forward kernel 2 time: 0ms
  2239. forward kernel 3 time: 0ms
  2240. forward kernel 4 time: 0ms
  2241. forward kernel 5: cannot be used
  2242. forward kernel 6: cannot be used
  2243. forward kernel 7 time: 437ms
  2244. forward layer selected kernel 1
  2245. forward try kernel 6
  2246. ... seems valid
  2247. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  2248. forward try kernel 7
  2249. ... seems valid
  2250. ForwardAuto: kernel 7 249ms
  2251. forward kernel 0: cannot be used
  2252. forward kernel 1 time: 0ms
  2253. forward kernel 2 time: 0ms
  2254. forward kernel 3 time: 0ms
  2255. forward kernel 4 time: 0ms
  2256. forward kernel 5 time: 0ms
  2257. forward kernel 6: cannot be used
  2258. forward kernel 7 time: 249ms
  2259. forward layer selected kernel 1
  2260. loss, E, 1.87712e-08
  2261. loss, E, 5.01821e-14
  2262. loss, E, 8.88178e-15
  2263. accuracy: 6/6 100%
  2264. accuracy: 6/6
  2265. loss, E, 8.88178e-15
  2266. clblas teardown
  2267. [ OK ] testsimpleconvolvenet.imagesize_5_3_2layers_filtersize_3_3_biased_n6 (5304 ms)
  2268. [ RUN ] testsimpleconvolvenet.imagesize_5_3_2layers_filtersize_3_3_biased_n18
  2269. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  2270. Using OpenCL device: Tahiti
  2271. initializing clblas
  2272. forward try kernel 0
  2273. ... not plausibly optimal, skipping
  2274. forward try kernel 1
  2275. ... seems valid
  2276. ForwardAuto: kernel 1 0ms
  2277. forward try kernel 0
  2278. ... not plausibly optimal, skipping
  2279. forward try kernel 1
  2280. ... seems valid
  2281. ForwardAuto: kernel 1 0ms
  2282. backward try kernel 0
  2283. ... not plausibly optimal, skipping
  2284. backward try kernel 1
  2285. ... seems valid
  2286. BackwardAuto: kernel 1 0ms
  2287. calcGradWeights try kernel 0
  2288. ... not plausibly optimal, skipping
  2289. calcGradWeights try kernel 1
  2290. ... seems valid
  2291. BackpropWeightsAuto: kernel 1 0ms
  2292. calcGradWeights try kernel 0
  2293. ... not plausibly optimal, skipping
  2294. calcGradWeights try kernel 1
  2295. ... seems valid
  2296. BackpropWeightsAuto: kernel 1 0ms
  2297. loss, E, 17.9931
  2298. forward try kernel 2
  2299. ... seems valid
  2300. ForwardAuto: kernel 2 0ms
  2301. forward try kernel 2
  2302. ... seems valid
  2303. ForwardAuto: kernel 2 15ms
  2304. backward try kernel 2
  2305. ... seems valid
  2306. BackwardAuto: kernel 2 0ms
  2307. calcGradWeights try kernel 2
  2308. ... seems valid
  2309. BackpropWeightsAuto: kernel 2 16ms
  2310. calcGradWeights try kernel 2
  2311. ... seems valid
  2312. BackpropWeightsAuto: kernel 2 0ms
  2313. forward try kernel 3
  2314. ... seems valid
  2315. ForwardAuto: kernel 3 0ms
  2316. forward try kernel 3
  2317. ... seems valid
  2318. ForwardAuto: kernel 3 0ms
  2319. backward try kernel 3
  2320. ... seems valid
  2321. BackwardAuto: kernel 3 281ms
  2322. calcGradWeights try kernel 3
  2323. options: -D BIASED -D gNumInputPlanes=3 -D gInputPlanes=3 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=3 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=3 -D gOutputPlanes=3 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=21 -DgInputStripeMarginSize=6 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  2324. ... seems valid
  2325. BackpropWeightsAuto: kernel 3 0ms
  2326. calcGradWeights try kernel 3
  2327. options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=3 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=3 -D gOutputPlanes=3 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=9 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=45 -DgInputStripeMarginSize=10 -DgOutputStripeNumRows=3 -DgOutputStripeSize=9
  2328. ... seems valid
  2329. BackpropWeightsAuto: kernel 3 0ms
  2330. forward try kernel 4
  2331. ... seems valid
  2332. ForwardAuto: kernel 4 0ms
  2333. forward try kernel 4
  2334. ... seems valid
  2335. ForwardAuto: kernel 4 0ms
  2336. backward kernel 0: cannot be used
  2337. backward kernel 1 time: 0ms
  2338. backward kernel 2 time: 0ms
  2339. backward kernel 3 time: 281ms
  2340. backward layer selected kernel 1
  2341. calcGradWeights try kernel 4
  2342. ... seems valid
  2343. BackpropWeightsAuto: kernel 4 484ms
  2344. calcGradWeights try kernel 4
  2345. ... seems valid
  2346. BackpropWeightsAuto: kernel 4 655ms
  2347. forward try kernel 5
  2348. ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
  2349. ... not valid
  2350. forward try kernel 6
  2351. ... seems valid
  2352. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  2353. forward try kernel 7
  2354. ... seems valid
  2355. ForwardAuto: kernel 7 437ms
  2356. forward try kernel 5
  2357. cl/forward_fc_wgperrow.cl build log:
  2358. "C:\Users\pz\AppData\Local\Temp\OCLB730.tmp.cl", line 75: warning: variable
  2359. "loopsPerExample" was declared but never referenced
  2360. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  2361. ^
  2362.  
  2363.  
  2364. ... seems valid
  2365. ForwardAuto: kernel 5 0ms
  2366. calcGradWeights kernel 0: cannot be used
  2367. calcGradWeights kernel 1 time: 0ms
  2368. calcGradWeights kernel 2 time: 16ms
  2369. calcGradWeights kernel 3 time: 0ms
  2370. calcGradWeights kernel 4 time: 484ms
  2371. calcGradWeights layer selected kernel 1
  2372. calcGradWeights kernel 0: cannot be used
  2373. calcGradWeights kernel 1 time: 0ms
  2374. calcGradWeights kernel 2 time: 0ms
  2375. calcGradWeights kernel 3 time: 0ms
  2376. calcGradWeights kernel 4 time: 655ms
  2377. calcGradWeights layer selected kernel 1
  2378. forward kernel 0: cannot be used
  2379. forward kernel 1 time: 0ms
  2380. forward kernel 2 time: 0ms
  2381. forward kernel 3 time: 0ms
  2382. forward kernel 4 time: 0ms
  2383. forward kernel 5: cannot be used
  2384. forward kernel 6: cannot be used
  2385. forward kernel 7 time: 437ms
  2386. forward layer selected kernel 1
  2387. forward try kernel 6
  2388. ... seems valid
  2389. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  2390. forward try kernel 7
  2391. ... seems valid
  2392. ForwardAuto: kernel 7 249ms
  2393. forward kernel 0: cannot be used
  2394. forward kernel 1 time: 0ms
  2395. forward kernel 2 time: 15ms
  2396. forward kernel 3 time: 0ms
  2397. forward kernel 4 time: 0ms
  2398. forward kernel 5 time: 0ms
  2399. forward kernel 6: cannot be used
  2400. forward kernel 7 time: 249ms
  2401. forward layer selected kernel 1
  2402. loss, E, 2.93736
  2403. loss, E, 2.74045
  2404. loss, E, 2.72813
  2405. loss, E, 2.72734
  2406. loss, E, 2.72728
  2407. loss, E, 2.72727
  2408. loss, E, 2.72727
  2409. loss, E, 2.72727
  2410. loss, E, 2.72727
  2411. loss, E, 2.72727
  2412. loss, E, 2.72727
  2413. loss, E, 2.72727
  2414. loss, E, 2.72727
  2415. loss, E, 2.72727
  2416. loss, E, 2.72727
  2417. loss, E, 2.72727
  2418. loss, E, 2.72727
  2419. loss, E, 2.72727
  2420. loss, E, 2.72727
  2421. loss, E, 2.72727
  2422. loss, E, 2.72727
  2423. loss, E, 2.72727
  2424. loss, E, 2.72727
  2425. loss, E, 2.72727
  2426. loss, E, 2.72727
  2427. loss, E, 2.72727
  2428. loss, E, 2.72727
  2429. loss, E, 2.72727
  2430. loss, E, 2.72727
  2431. layer 0:InputLayer{ outputPlanes=1 outputSize=5 }
  2432. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=5 numFilters=3 filterSize=3 outputSize=3 padZeros=0 biased=1 skip=0} }
  2433. layer 2:ActivationLayer{ RELU }
  2434. layer 3:ConvolutionalLayer{ LayerDimensions{ inputPlanes=3 inputSize=3 numFilters=3 filterSize=3 outputSize=1 padZeros=0 biased=1 skip=0} }
  2435. layer 4:SquareLossLayer{}
  2436. Parameters overview: (skipping 3 layers with 0 params)
  2437. layer 1: params=30 26.3%
  2438. layer 3: params=84 73.7%
  2439. TOTAL : params=114
  2440. loss, E, 2.72727
  2441. accuracy: 13/18 72.2222%
  2442. accuracy: 13/18
  2443. C:\Users\pz\Documents\ml\DeepCL\test\testsimpleconvolvenet.cpp(1055): error: Value of: N
  2444. Actual: 18
  2445. Expected: numCorrect
  2446. Which is: 13
  2447. loss, E, 2.72727
  2448. C:\Users\pz\Documents\ml\DeepCL\test\testsimpleconvolvenet.cpp(1059): error: Expected: (0.1f) >= (loss), actual: 0.1 vs 2.72727
  2449. clblas teardown
  2450. [ FAILED ] testsimpleconvolvenet.imagesize_5_3_2layers_filtersize_3_3_biased_n18 (11310 ms)
  2451. [----------] 12 tests from testsimpleconvolvenet (47822 ms total)
  2452.  
  2453. [----------] 3 tests from testlogicaloperators
  2454. [ RUN ] testlogicaloperators.Convolve_1layer_biased_And
  2455. And
  2456. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  2457. Using OpenCL device: Tahiti
  2458. initializing clblas
  2459. forward try kernel 0
  2460. ... not plausibly optimal, skipping
  2461. forward try kernel 1
  2462. ... seems valid
  2463. ForwardAuto: kernel 1 0ms
  2464. calcGradWeights try kernel 0
  2465. ... not plausibly optimal, skipping
  2466. calcGradWeights try kernel 1
  2467. ... seems valid
  2468. BackpropWeightsAuto: kernel 1 0ms
  2469. Loss L 2.13088
  2470. forward try kernel 2
  2471. ... seems valid
  2472. ForwardAuto: kernel 2 0ms
  2473. calcGradWeights try kernel 2
  2474. ... seems valid
  2475. BackpropWeightsAuto: kernel 2 0ms
  2476. forward try kernel 3
  2477. ... seems valid
  2478. ForwardAuto: kernel 3 0ms
  2479. calcGradWeights try kernel 3
  2480. options: -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  2481. ... seems valid
  2482. BackpropWeightsAuto: kernel 3 0ms
  2483. forward try kernel 4
  2484. ... seems valid
  2485. ForwardAuto: kernel 4 0ms
  2486. calcGradWeights try kernel 4
  2487. ... seems valid
  2488. BackpropWeightsAuto: kernel 4 421ms
  2489. forward try kernel 5
  2490. cl/forward_fc_wgperrow.cl build log:
  2491. "C:\Users\pz\AppData\Local\Temp\OCLD91E.tmp.cl", line 75: warning: variable
  2492. "loopsPerExample" was declared but never referenced
  2493. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  2494. ^
  2495.  
  2496.  
  2497. ... seems valid
  2498. ForwardAuto: kernel 5 0ms
  2499. calcGradWeights kernel 0: cannot be used
  2500. calcGradWeights kernel 1 time: 0ms
  2501. calcGradWeights kernel 2 time: 0ms
  2502. calcGradWeights kernel 3 time: 0ms
  2503. calcGradWeights kernel 4 time: 421ms
  2504. calcGradWeights layer selected kernel 1
  2505. forward try kernel 6
  2506. ... seems valid
  2507. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  2508. forward try kernel 7
  2509. ... seems valid
  2510. ForwardAuto: kernel 7 203ms
  2511. Loss L 0.679527
  2512. forward kernel 0: cannot be used
  2513. forward kernel 1 time: 0ms
  2514. forward kernel 2 time: 0ms
  2515. forward kernel 3 time: 0ms
  2516. forward kernel 4 time: 0ms
  2517. forward kernel 5 time: 0ms
  2518. forward kernel 6: cannot be used
  2519. forward kernel 7 time: 203ms
  2520. forward layer selected kernel 1
  2521. Loss L 0.398398
  2522. Loss L 0.301735
  2523. accuracy: 4/4
  2524. loss, E, 0.27227
  2525. clblas teardown
  2526. [ OK ] testlogicaloperators.Convolve_1layer_biased_And (1919 ms)
  2527. [ RUN ] testlogicaloperators.Convolve_1layerbiased_Or
  2528. Or, convolve
  2529. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  2530. Using OpenCL device: Tahiti
  2531. initializing clblas
  2532. forward try kernel 0
  2533. ... not plausibly optimal, skipping
  2534. forward try kernel 1
  2535. ... seems valid
  2536. ForwardAuto: kernel 1 15ms
  2537. calcGradWeights try kernel 0
  2538. ... not plausibly optimal, skipping
  2539. calcGradWeights try kernel 1
  2540. ... seems valid
  2541. BackpropWeightsAuto: kernel 1 0ms
  2542. Loss L 5.59056
  2543. forward try kernel 2
  2544. ... seems valid
  2545. ForwardAuto: kernel 2 0ms
  2546. calcGradWeights try kernel 2
  2547. ... seems valid
  2548. BackpropWeightsAuto: kernel 2 0ms
  2549. forward try kernel 3
  2550. ... seems valid
  2551. ForwardAuto: kernel 3 0ms
  2552. calcGradWeights try kernel 3
  2553. options: -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  2554. ... seems valid
  2555. BackpropWeightsAuto: kernel 3 0ms
  2556. forward try kernel 4
  2557. ... seems valid
  2558. ForwardAuto: kernel 4 0ms
  2559. calcGradWeights try kernel 4
  2560. ... seems valid
  2561. BackpropWeightsAuto: kernel 4 422ms
  2562. forward try kernel 5
  2563. cl/forward_fc_wgperrow.cl build log:
  2564. "C:\Users\pz\AppData\Local\Temp\OCLE0B6.tmp.cl", line 75: warning: variable
  2565. "loopsPerExample" was declared but never referenced
  2566. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  2567. ^
  2568.  
  2569.  
  2570. ... seems valid
  2571. ForwardAuto: kernel 5 0ms
  2572. calcGradWeights kernel 0: cannot be used
  2573. calcGradWeights kernel 1 time: 0ms
  2574. calcGradWeights kernel 2 time: 0ms
  2575. calcGradWeights kernel 3 time: 0ms
  2576. calcGradWeights kernel 4 time: 422ms
  2577. calcGradWeights layer selected kernel 1
  2578. forward try kernel 6
  2579. ... seems valid
  2580. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  2581. forward try kernel 7
  2582. ... seems valid
  2583. ForwardAuto: kernel 7 203ms
  2584. Loss L 1.22162
  2585. forward kernel 0: cannot be used
  2586. forward kernel 1 time: 15ms
  2587. forward kernel 2 time: 0ms
  2588. forward kernel 3 time: 0ms
  2589. forward kernel 4 time: 0ms
  2590. forward kernel 5 time: 0ms
  2591. forward kernel 6: cannot be used
  2592. forward kernel 7 time: 203ms
  2593. forward layer selected kernel 2
  2594. Loss L 0.583397
  2595. Loss L 0.366216
  2596. accuracy: 4/4 100%
  2597. loss, E, 0.300027
  2598. clblas teardown
  2599. [ OK ] testlogicaloperators.Convolve_1layerbiased_Or (1934 ms)
  2600. [ RUN ] testlogicaloperators.Convolve_2layers_relu_Xor
  2601. Xor, convolve
  2602. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  2603. Using OpenCL device: Tahiti
  2604. initializing clblas
  2605. hand-setting weights...
  2606. forward try kernel 0
  2607. ... not plausibly optimal, skipping
  2608. forward try kernel 1
  2609. ... seems valid
  2610. ForwardAuto: kernel 1 0ms
  2611. forward try kernel 0
  2612. ... not plausibly optimal, skipping
  2613. forward try kernel 1
  2614. ... seems valid
  2615. ForwardAuto: kernel 1 0ms
  2616. backward try kernel 0
  2617. ... not plausibly optimal, skipping
  2618. backward try kernel 1
  2619. ... seems valid
  2620. BackwardAuto: kernel 1 0ms
  2621. calcGradWeights try kernel 0
  2622. ... not plausibly optimal, skipping
  2623. calcGradWeights try kernel 1
  2624. ... seems valid
  2625. BackpropWeightsAuto: kernel 1 0ms
  2626. calcGradWeights try kernel 0
  2627. ... not plausibly optimal, skipping
  2628. calcGradWeights try kernel 1
  2629. ... seems valid
  2630. BackpropWeightsAuto: kernel 1 0ms
  2631. Loss L 0.152638
  2632. forward try kernel 2
  2633. ... seems valid
  2634. ForwardAuto: kernel 2 0ms
  2635. forward try kernel 2
  2636. ... seems valid
  2637. ForwardAuto: kernel 2 15ms
  2638. backward try kernel 2
  2639. ... seems valid
  2640. BackwardAuto: kernel 2 0ms
  2641. calcGradWeights try kernel 2
  2642. ... seems valid
  2643. BackpropWeightsAuto: kernel 2 0ms
  2644. calcGradWeights try kernel 2
  2645. ... seems valid
  2646. BackpropWeightsAuto: kernel 2 0ms
  2647. forward try kernel 3
  2648. ... seems valid
  2649. ForwardAuto: kernel 3 0ms
  2650. forward try kernel 3
  2651. ... seems valid
  2652. ForwardAuto: kernel 3 0ms
  2653. backward try kernel 3
  2654. ... seems valid
  2655. BackwardAuto: kernel 3 219ms
  2656. calcGradWeights try kernel 3
  2657. options: -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  2658. ... seems valid
  2659. BackpropWeightsAuto: kernel 3 0ms
  2660. calcGradWeights try kernel 3
  2661. options: -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  2662. ... seems valid
  2663. BackpropWeightsAuto: kernel 3 0ms
  2664. forward try kernel 4
  2665. ... seems valid
  2666. ForwardAuto: kernel 4 0ms
  2667. forward try kernel 4
  2668. ... seems valid
  2669. ForwardAuto: kernel 4 0ms
  2670. backward kernel 0: cannot be used
  2671. backward kernel 1 time: 0ms
  2672. backward kernel 2 time: 0ms
  2673. backward kernel 3 time: 219ms
  2674. backward layer selected kernel 1
  2675. calcGradWeights try kernel 4
  2676. ... seems valid
  2677. BackpropWeightsAuto: kernel 4 421ms
  2678. calcGradWeights try kernel 4
  2679. ... seems valid
  2680. BackpropWeightsAuto: kernel 4 78ms
  2681. forward try kernel 5
  2682. cl/forward_fc_wgperrow.cl build log:
  2683. "C:\Users\pz\AppData\Local\Temp\OCLECF3.tmp.cl", line 75: warning: variable
  2684. "loopsPerExample" was declared but never referenced
  2685. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  2686. ^
  2687.  
  2688.  
  2689. ... seems valid
  2690. ForwardAuto: kernel 5 0ms
  2691. forward try kernel 5
  2692. cl/forward_fc_wgperrow.cl build log:
  2693. "C:\Users\pz\AppData\Local\Temp\OCLED42.tmp.cl", line 75: warning: variable
  2694. "loopsPerExample" was declared but never referenced
  2695. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  2696. ^
  2697.  
  2698.  
  2699. ... seems valid
  2700. ForwardAuto: kernel 5 0ms
  2701. calcGradWeights kernel 0: cannot be used
  2702. calcGradWeights kernel 1 time: 0ms
  2703. calcGradWeights kernel 2 time: 0ms
  2704. calcGradWeights kernel 3 time: 0ms
  2705. calcGradWeights kernel 4 time: 421ms
  2706. calcGradWeights layer selected kernel 1
  2707. calcGradWeights kernel 0: cannot be used
  2708. calcGradWeights kernel 1 time: 0ms
  2709. calcGradWeights kernel 2 time: 0ms
  2710. calcGradWeights kernel 3 time: 0ms
  2711. calcGradWeights kernel 4 time: 78ms
  2712. calcGradWeights layer selected kernel 1
  2713. forward try kernel 6
  2714. ... seems valid
  2715. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  2716. forward try kernel 7
  2717. ... seems valid
  2718. ForwardAuto: kernel 7 187ms
  2719. forward try kernel 6
  2720. ... seems valid
  2721. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  2722. forward try kernel 7
  2723. ... seems valid
  2724. ForwardAuto: kernel 7 78ms
  2725. Loss L 0.00640068
  2726. forward kernel 0: cannot be used
  2727. forward kernel 1 time: 0ms
  2728. forward kernel 2 time: 0ms
  2729. forward kernel 3 time: 0ms
  2730. forward kernel 4 time: 0ms
  2731. forward kernel 5 time: 0ms
  2732. forward kernel 6: cannot be used
  2733. forward kernel 7 time: 187ms
  2734. forward layer selected kernel 1
  2735. forward kernel 0: cannot be used
  2736. forward kernel 1 time: 0ms
  2737. forward kernel 2 time: 15ms
  2738. forward kernel 3 time: 0ms
  2739. forward kernel 4 time: 0ms
  2740. forward kernel 5 time: 0ms
  2741. forward kernel 6: cannot be used
  2742. forward kernel 7 time: 78ms
  2743. forward layer selected kernel 1
  2744. Loss L 0.00139435
  2745. Loss L 0.000383307
  2746. Loss L 0.000117079
  2747. Loss L 4.63626e-05
  2748. Loss L 1.8873e-05
  2749. Loss L 7.15534e-06
  2750. Loss L 2.83958e-06
  2751. Loss L 1.12727e-06
  2752. Loss L 4.44109e-07
  2753. Loss L 1.72233e-07
  2754. Loss L 6.82345e-08
  2755. Loss L 2.76343e-08
  2756. Loss L 1.04286e-08
  2757. Loss L 4.13357e-09
  2758. Loss L 1.67201e-09
  2759. Loss L 6.29148e-10
  2760. Loss L 2.4837e-10
  2761. Loss L 1.00833e-10
  2762. Loss L 3.80673e-11
  2763. Loss L 1.5131e-11
  2764. Loss L 5.84421e-12
  2765. Loss L 2.16893e-12
  2766. Loss L 9.52127e-13
  2767. Loss L 3.58824e-13
  2768. Loss L 1.56319e-13
  2769. Loss L 9.9476e-14
  2770. Loss L 9.9476e-14
  2771. Loss L 9.9476e-14
  2772. Loss L 9.9476e-14
  2773. Loss L 9.23706e-14
  2774. Loss L 9.23706e-14
  2775. Loss L 9.41469e-14
  2776. Loss L 8.70415e-14
  2777. Loss L 9.41469e-14
  2778. Loss L 8.52651e-14
  2779. Loss L 8.52651e-14
  2780. Loss L 8.52651e-14
  2781. Loss L 8.52651e-14
  2782. layer 0:InputLayer{ outputPlanes=2 outputSize=1 }
  2783. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=1 numFilters=2 filterSize=1 outputSize=1 padZeros=0 biased=1 skip=0} }
  2784. layer 2:ActivationLayer{ RELU }
  2785. layer 3:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=1 numFilters=2 filterSize=1 outputSize=1 padZeros=0 biased=1 skip=0} }
  2786. layer 4:ActivationLayer{ RELU }
  2787. layer 5:SquareLossLayer{}
  2788. Parameters overview: (skipping 4 layers with 0 params)
  2789. layer 1: params=6 50.0%
  2790. layer 3: params=6 50.0%
  2791. TOTAL : params=12
  2792. accuracy: 4/4 100%
  2793. loss, E, 8.52651e-14
  2794. clblas teardown
  2795. [ OK ] testlogicaloperators.Convolve_2layers_relu_Xor (3916 ms)
  2796. [----------] 3 tests from testlogicaloperators (7769 ms total)
  2797.  
  2798. [----------] 12 tests from testbackward
  2799. [ RUN ] testbackward.squareloss
  2800. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  2801. Using OpenCL device: Tahiti
  2802. layer 0:InputLayer{ outputPlanes=3 outputSize=5 }
  2803. layer 1:ForceBackpropLayer{ outputPlanes=3 outputSize=5 }
  2804. layer 2:SquareLossLayer{}
  2805.  
  2806. inputtotalsize=2400 outputTotalSize=2400
  2807. layer 0:InputLayer{ outputPlanes=3 outputSize=5 }
  2808. layer 1:ForceBackpropLayer{ outputPlanes=3 outputSize=5 }
  2809. layer 2:SquareLossLayer{}
  2810. Parameters overview: (skipping 3 layers with 0 params)
  2811. TOTAL : params=0
  2812. idx=44 predicted losschange=-0.000912508 actual=-0.000976563
  2813. idx=2245 predicted losschange=0.00785823 actual=0.00805664
  2814. idx=648 predicted losschange=0.00965759 actual=0.00976563
  2815. idx=586 predicted losschange=0.0136895 actual=0.0136719
  2816. idx=730 predicted losschange=0.00117897 actual=0.00146484
  2817. idx=611 predicted losschange=0.00152302 actual=0.00195313
  2818. idx=1130 predicted losschange=0.0159167 actual=0.0161133
  2819. idx=15 predicted losschange=0.0434798 actual=0.0439453
  2820. idx=1923 predicted losschange=-0.00790002 actual=-0.0078125
  2821. idx=670 predicted losschange=0.0335141 actual=0.0336914
  2822. [ OK ] testbackward.squareloss (15 ms)
  2823. [ RUN ] testbackward.crossentropyloss
  2824. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  2825. Using OpenCL device: Tahiti
  2826. layer 0:InputLayer{ outputPlanes=3 outputSize=5 }
  2827. layer 1:ForceBackpropLayer{ outputPlanes=3 outputSize=5 }
  2828. layer 2:Layer{}
  2829.  
  2830. inputtotalsize=300 outputTotalSize=300
  2831. layer 0:InputLayer{ outputPlanes=3 outputSize=5 }
  2832. layer 1:ForceBackpropLayer{ outputPlanes=3 outputSize=5 }
  2833. layer 2:Layer{}
  2834. Parameters overview: (skipping 3 layers with 0 params)
  2835. TOTAL : params=0
  2836. idx=44 predicted losschange=0.000274935 actual=0.000274658
  2837. idx=145 predicted losschange=-0.000885784 actual=-0.00088501
  2838. idx=48 predicted losschange=-0.000859834 actual=-0.000854492
  2839. idx=286 predicted losschange=0.00713042 actual=0.00717163
  2840. idx=130 predicted losschange=-0.000264829 actual=-0.000244141
  2841. idx=11 predicted losschange=-1.98163e-05 actual=0
  2842. idx=230 predicted losschange=-0.000594819 actual=-0.000610352
  2843. idx=15 predicted losschange=-0.0006499 actual=-0.000640869
  2844. idx=123 predicted losschange=-0.000846121 actual=-0.000823975
  2845. idx=70 predicted losschange=0.000790196 actual=0.000793457
  2846. [ OK ] testbackward.crossentropyloss (16 ms)
  2847. [ RUN ] testbackward.softmaxloss
  2848. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  2849. Using OpenCL device: Tahiti
  2850. layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
  2851. layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
  2852. layer 2:SoftMaxLayer{ perPlane=0 numPlanes=5 imageSize=1 }
  2853.  
  2854. inputtotalsize=10 outputTotalSize=10
  2855. layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
  2856. layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
  2857. layer 2:SoftMaxLayer{ perPlane=0 numPlanes=5 imageSize=1 }
  2858. Parameters overview: (skipping 3 layers with 0 params)
  2859. TOTAL : params=0
  2860. idx=4 predicted losschange=0.000113075 actual=0.00011301
  2861. idx=5 predicted losschange=0.000145627 actual=0.000145674
  2862. idx=8 predicted losschange=3.16699e-05 actual=3.19481e-05
  2863. idx=6 predicted losschange=4.89271e-06 actual=5.24521e-06
  2864. idx=0 predicted losschange=2.29469e-05 actual=2.28882e-05
  2865. idx=1 predicted losschange=-8.26119e-05 actual=-8.27312e-05
  2866. idx=0 predicted losschange=2.29469e-05 actual=2.28882e-05
  2867. idx=5 predicted losschange=0.000145627 actual=0.000145674
  2868. idx=3 predicted losschange=-5.50179e-05 actual=-5.50747e-05
  2869. idx=0 predicted losschange=2.29469e-05 actual=2.28882e-05
  2870. [ OK ] testbackward.softmaxloss (0 ms)
  2871. [ RUN ] testbackward.squareloss2
  2872. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  2873. Using OpenCL device: Tahiti
  2874. layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
  2875. layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
  2876. layer 2:SquareLossLayer{}
  2877.  
  2878. layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
  2879. layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
  2880. layer 2:SquareLossLayer{}
  2881.  
  2882. batchSize: 32
  2883. inputtotalsize=160 outputTotalSize=160
  2884. layer SquareLossLayer{}
  2885. layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
  2886. layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
  2887. layer 2:SquareLossLayer{}
  2888. Parameters overview: (skipping 3 layers with 0 params)
  2889. TOTAL : params=0
  2890. idx=44 predicted losschange=0.000126406 actual=0.000125885
  2891. idx=5 predicted losschange=0.00461891 actual=0.00464439
  2892. idx=8 predicted losschange=0.000356787 actual=0.000356674
  2893. idx=106 predicted losschange=0.00716324 actual=0.00719643
  2894. idx=90 predicted losschange=0.000474759 actual=0.000480652
  2895. idx=131 predicted losschange=0.000979017 actual=0.000984192
  2896. idx=10 predicted losschange=0.000660134 actual=0.000663757
  2897. idx=15 predicted losschange=0.00961313 actual=0.00965118
  2898. idx=3 predicted losschange=0.00264732 actual=0.00267029
  2899. idx=30 predicted losschange=0.00865312 actual=0.00868607
  2900. [ OK ] testbackward.squareloss2 (31 ms)
  2901. [ RUN ] testbackward.crossentropy2
  2902. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  2903. Using OpenCL device: Tahiti
  2904. layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
  2905. layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
  2906. layer 2:Layer{}
  2907.  
  2908. layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
  2909. layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
  2910. layer 2:Layer{}
  2911.  
  2912. batchSize: 2
  2913. inputtotalsize=10 outputTotalSize=10
  2914. layer Layer{}
  2915. layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
  2916. layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
  2917. layer 2:Layer{}
  2918. Parameters overview: (skipping 3 layers with 0 params)
  2919. TOTAL : params=0
  2920. idx=4 predicted losschange=0.00258649 actual=-nan(ind)
  2921. idx=5 predicted losschange=0.0227095 actual=-nan(ind)
  2922. idx=8 predicted losschange=-0.00202714 actual=-nan(ind)
  2923. idx=6 predicted losschange=-0.000846508 actual=-nan(ind)
  2924. idx=0 predicted losschange=-0.000424821 actual=-nan(ind)
  2925. idx=1 predicted losschange=-0.00171216 actual=-nan(ind)
  2926. idx=0 predicted losschange=-0.000424821 actual=-nan(ind)
  2927. idx=5 predicted losschange=0.0227095 actual=-nan(ind)
  2928. idx=3 predicted losschange=0.0123444 actual=-nan(ind)
  2929. idx=0 predicted losschange=-0.000424821 actual=-nan(ind)
  2930. [ OK ] testbackward.crossentropy2 (31 ms)
  2931. [ RUN ] testbackward.softmax2
  2932. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  2933. Using OpenCL device: Tahiti
  2934. layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
  2935. layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
  2936. layer 2:SoftMaxLayer{ perPlane=0 numPlanes=5 imageSize=1 }
  2937.  
  2938. layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
  2939. layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
  2940. layer 2:SoftMaxLayer{ perPlane=0 numPlanes=5 imageSize=1 }
  2941.  
  2942. batchSize: 2
  2943. inputtotalsize=10 outputTotalSize=10
  2944. layer SoftMaxLayer{ perPlane=0 numPlanes=5 imageSize=1 }
  2945. layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
  2946. layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
  2947. layer 2:SoftMaxLayer{ perPlane=0 numPlanes=5 imageSize=1 }
  2948. Parameters overview: (skipping 3 layers with 0 params)
  2949. TOTAL : params=0
  2950. idx=4 predicted losschange=0.00035729 actual=0.000357628
  2951. idx=5 predicted losschange=0.0015055 actual=0.00151086
  2952. idx=8 predicted losschange=-5.63632e-05 actual=-5.65052e-05
  2953. idx=6 predicted losschange=-1.48864e-05 actual=-1.4782e-05
  2954. idx=0 predicted losschange=1.96542e-05 actual=1.97887e-05
  2955. idx=1 predicted losschange=-0.000287167 actual=-0.000287056
  2956. idx=0 predicted losschange=1.96542e-05 actual=1.97887e-05
  2957. idx=5 predicted losschange=0.0015055 actual=0.00151086
  2958. idx=3 predicted losschange=-0.000152824 actual=-0.00014782
  2959. idx=0 predicted losschange=1.96542e-05 actual=1.97887e-05
  2960. [ OK ] testbackward.softmax2 (63 ms)
  2961. [ RUN ] testbackward.conv1
  2962. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  2963. Using OpenCL device: Tahiti
  2964. initializing clblas
  2965. layer 0:InputLayer{ outputPlanes=2 outputSize=4 }
  2966. layer 1:ForceBackpropLayer{ outputPlanes=2 outputSize=4 }
  2967. layer 2:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=4 numFilters=2 filterSize=3 outputSize=2 padZeros=0 biased=0 skip=0} }
  2968. layer 3:SquareLossLayer{}
  2969.  
  2970. layer 0:InputLayer{ outputPlanes=2 outputSize=4 }
  2971. layer 1:ForceBackpropLayer{ outputPlanes=2 outputSize=4 }
  2972. layer 2:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=4 numFilters=2 filterSize=3 outputSize=2 padZeros=0 biased=0 skip=0} }
  2973. layer 3:SquareLossLayer{}
  2974.  
  2975. batchSize: 4
  2976. inputtotalsize=128 outputTotalSize=32
  2977. layer ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=4 numFilters=2 filterSize=3 outputSize=2 padZeros=0 biased=0 skip=0} }
  2978. forward try kernel 0
  2979. ... not plausibly optimal, skipping
  2980. forward try kernel 1
  2981. ... seems valid
  2982. ForwardAuto: kernel 1 0ms
  2983. layer 0:InputLayer{ outputPlanes=2 outputSize=4 }
  2984. layer 1:ForceBackpropLayer{ outputPlanes=2 outputSize=4 }
  2985. layer 2:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=4 numFilters=2 filterSize=3 outputSize=2 padZeros=0 biased=0 skip=0} }
  2986. layer 3:SquareLossLayer{}
  2987. Parameters overview: (skipping 3 layers with 0 params)
  2988. layer 2: params=36 100.0%
  2989. TOTAL : params=36
  2990. backward try kernel 0
  2991. ... not plausibly optimal, skipping
  2992. backward try kernel 1
  2993. ... seems valid
  2994. BackwardAuto: kernel 1 0ms
  2995. calcGradWeights try kernel 0
  2996. ... not plausibly optimal, skipping
  2997. calcGradWeights try kernel 1
  2998. ... seems valid
  2999. BackpropWeightsAuto: kernel 1 0ms
  3000. forward try kernel 2
  3001. ... seems valid
  3002. ForwardAuto: kernel 2 16ms
  3003. idx=44 predicted losschange=0.000198655 actual=0.000199318
  3004. forward try kernel 3
  3005. ... seems valid
  3006. ForwardAuto: kernel 3 0ms
  3007. idx=37 predicted losschange=-0.00664573 actual=-0.00663185
  3008. forward try kernel 4
  3009. ... seems valid
  3010. ForwardAuto: kernel 4 0ms
  3011. idx=40 predicted losschange=0.00305358 actual=0.00306416
  3012. forward try kernel 5
  3013. ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
  3014. ... not valid
  3015. forward try kernel 6
  3016. ... seems valid
  3017. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  3018. forward try kernel 7
  3019. ... seems valid
  3020. ForwardAuto: kernel 7 266ms
  3021. idx=106 predicted losschange=0.000651619 actual=0.00306416
  3022. forward kernel 0: cannot be used
  3023. forward kernel 1 time: 0ms
  3024. forward kernel 2 time: 16ms
  3025. forward kernel 3 time: 0ms
  3026. forward kernel 4 time: 0ms
  3027. forward kernel 5: cannot be used
  3028. forward kernel 6: cannot be used
  3029. forward kernel 7 time: 266ms
  3030. forward layer selected kernel 1
  3031. idx=122 predicted losschange=0.0040653 actual=0.00407314
  3032. idx=99 predicted losschange=-0.000240484 actual=-0.00024128
  3033. idx=10 predicted losschange=0.00158175 actual=0.00158405
  3034. idx=47 predicted losschange=0.00140132 actual=0.00140285
  3035. idx=67 predicted losschange=-0.00154732 actual=-0.00154686
  3036. idx=126 predicted losschange=-0.000393638 actual=-0.000391006
  3037. clblas teardown
  3038. [ OK ] testbackward.conv1 (1294 ms)
  3039. [ RUN ] testbackward.fc1
  3040. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  3041. Using OpenCL device: Tahiti
  3042. initializing clblas
  3043. layer 0:InputLayer{ outputPlanes=2 outputSize=4 }
  3044. layer 1:ForceBackpropLayer{ outputPlanes=2 outputSize=4 }
  3045. layer 2:FullyConnectedLayer{ numPlanes=4 imageSize=1 }
  3046. layer 3:SquareLossLayer{}
  3047.  
  3048. layer 0:InputLayer{ outputPlanes=2 outputSize=4 }
  3049. layer 1:ForceBackpropLayer{ outputPlanes=2 outputSize=4 }
  3050. layer 2:FullyConnectedLayer{ numPlanes=4 imageSize=1 }
  3051. layer 3:SquareLossLayer{}
  3052.  
  3053. batchSize: 4
  3054. inputtotalsize=128 outputTotalSize=16
  3055. layer FullyConnectedLayer{ numPlanes=4 imageSize=1 }
  3056. forward try kernel 0
  3057. ... not plausibly optimal, skipping
  3058. forward try kernel 1
  3059. ... seems valid
  3060. ForwardAuto: kernel 1 0ms
  3061. layer 0:InputLayer{ outputPlanes=2 outputSize=4 }
  3062. layer 1:ForceBackpropLayer{ outputPlanes=2 outputSize=4 }
  3063. layer 2:FullyConnectedLayer{ numPlanes=4 imageSize=1 }
  3064. layer 3:SquareLossLayer{}
  3065. Parameters overview: (skipping 3 layers with 0 params)
  3066. layer 2: params=128 100.0%
  3067. TOTAL : params=128
  3068. backward try kernel 0
  3069. ... not plausibly optimal, skipping
  3070. backward try kernel 1
  3071. ... seems valid
  3072. BackwardAuto: kernel 1 0ms
  3073. calcGradWeights try kernel 0
  3074. ... not plausibly optimal, skipping
  3075. calcGradWeights try kernel 1
  3076. ... seems valid
  3077. BackpropWeightsAuto: kernel 1 0ms
  3078. forward try kernel 2
  3079. ... seems valid
  3080. ForwardAuto: kernel 2 0ms
  3081. idx=44 predicted losschange=-2.78137e-06 actual=-2.86102e-06
  3082. forward try kernel 3
  3083. ... seems valid
  3084. ForwardAuto: kernel 3 0ms
  3085. idx=37 predicted losschange=-0.000552869 actual=-0.000545502
  3086. forward try kernel 4
  3087. ... seems valid
  3088. ForwardAuto: kernel 4 0ms
  3089. idx=40 predicted losschange=0.00245549 actual=0.00246334
  3090. forward try kernel 5
  3091. cl/forward_fc_wgperrow.cl build log:
  3092. "C:\Users\pz\AppData\Local\Temp\OCLFB2C.tmp.cl", line 75: warning: variable
  3093. "loopsPerExample" was declared but never referenced
  3094. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  3095. ^
  3096.  
  3097.  
  3098. ... seems valid
  3099. ForwardAuto: kernel 5 0ms
  3100. idx=106 predicted losschange=0.00259146 actual=0.00259662
  3101. forward try kernel 6
  3102. ... seems valid
  3103. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  3104. forward try kernel 7
  3105. ... seems valid
  3106. ForwardAuto: kernel 7 250ms
  3107. idx=122 predicted losschange=0.000431057 actual=0.00259662
  3108. forward kernel 0: cannot be used
  3109. forward kernel 1 time: 0ms
  3110. forward kernel 2 time: 0ms
  3111. forward kernel 3 time: 0ms
  3112. forward kernel 4 time: 0ms
  3113. forward kernel 5 time: 0ms
  3114. forward kernel 6: cannot be used
  3115. forward kernel 7 time: 250ms
  3116. forward layer selected kernel 1
  3117. idx=99 predicted losschange=-0.00116097 actual=-0.00116014
  3118. idx=10 predicted losschange=-0.000360866 actual=-0.00036025
  3119. idx=47 predicted losschange=0.000165997 actual=0.000166655
  3120. idx=67 predicted losschange=-0.000468417 actual=-0.000465631
  3121. idx=126 predicted losschange=3.95745e-05 actual=4.1008e-05
  3122. clblas teardown
  3123. [ OK ] testbackward.fc1 (1389 ms)
  3124. [ RUN ] testbackward.act1
  3125. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  3126. Using OpenCL device: Tahiti
  3127. layer 0:InputLayer{ outputPlanes=1 outputSize=2 }
  3128. layer 1:ForceBackpropLayer{ outputPlanes=1 outputSize=2 }
  3129. layer 2:ActivationLayer{ RELU }
  3130. layer 3:SquareLossLayer{}
  3131.  
  3132. layer 0:InputLayer{ outputPlanes=1 outputSize=2 }
  3133. layer 1:ForceBackpropLayer{ outputPlanes=1 outputSize=2 }
  3134. layer 2:ActivationLayer{ RELU }
  3135. layer 3:SquareLossLayer{}
  3136.  
  3137. batchSize: 1
  3138. inputtotalsize=4 outputTotalSize=4
  3139. layer ActivationLayer{ RELU }
  3140. layer 0:InputLayer{ outputPlanes=1 outputSize=2 }
  3141. layer 1:ForceBackpropLayer{ outputPlanes=1 outputSize=2 }
  3142. layer 2:ActivationLayer{ RELU }
  3143. layer 3:SquareLossLayer{}
  3144. Parameters overview: (skipping 4 layers with 0 params)
  3145. TOTAL : params=0
  3146. idx=0 predicted losschange=-0.000880961 actual=-0.00088048
  3147. idx=1 predicted losschange=-0.00151209 actual=-0.00151044
  3148. idx=0 predicted losschange=-0.000880961 actual=-0.00088048
  3149. idx=2 predicted losschange=-0.00245153 actual=-0.0024423
  3150. idx=2 predicted losschange=-0.00245153 actual=-0.0024423
  3151. idx=3 predicted losschange=-0.00214455 actual=-0.00212085
  3152. idx=2 predicted losschange=-0.00245153 actual=-0.0024423
  3153. idx=3 predicted losschange=-0.00214455 actual=-0.00212085
  3154. idx=3 predicted losschange=-0.00214455 actual=-0.00212085
  3155. idx=2 predicted losschange=-0.00245153 actual=-0.0024423
  3156. [ OK ] testbackward.act1 (140 ms)
  3157. [ RUN ] testbackward.checknumerically
  3158. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  3159. Using OpenCL device: Tahiti
  3160. initializing clblas
  3161. forward try kernel 0
  3162. ... not plausibly optimal, skipping
  3163. forward try kernel 1
  3164. ... seems valid
  3165. ForwardAuto: kernel 1 0ms
  3166. forward try kernel 0
  3167. ... not plausibly optimal, skipping
  3168. forward try kernel 1
  3169. ... seems valid
  3170. ForwardAuto: kernel 1 0ms
  3171. forward try kernel 2
  3172. ... seems valid
  3173. ForwardAuto: kernel 2 0ms
  3174. forward try kernel 2
  3175. ... seems valid
  3176. ForwardAuto: kernel 2 0ms
  3177. backward try kernel 0
  3178. ... not plausibly optimal, skipping
  3179. backward try kernel 1
  3180. ... seems valid
  3181. BackwardAuto: kernel 1 0ms
  3182. calcGradWeights try kernel 0
  3183. ... not plausibly optimal, skipping
  3184. calcGradWeights try kernel 1
  3185. ... seems valid
  3186. BackpropWeightsAuto: kernel 1 0ms
  3187. calcGradWeights try kernel 0
  3188. ... not plausibly optimal, skipping
  3189. calcGradWeights try kernel 1
  3190. ... seems valid
  3191. BackpropWeightsAuto: kernel 1 0ms
  3192. forward try kernel 3
  3193. ... seems valid
  3194. ForwardAuto: kernel 3 0ms
  3195. forward try kernel 3
  3196. ... seems valid
  3197. ForwardAuto: kernel 3 0ms
  3198. loss 0.0986296 loss2 0.0984814 change: 0.000148199
  3199. sumweightsdiff 0.0038507
  3200. loss change 0.000148199
  3201. estimatedLossChangeFromW 0.000148279
  3202. forward try kernel 4
  3203. ... seems valid
  3204. ForwardAuto: kernel 4 0ms
  3205. forward try kernel 4
  3206. ... seems valid
  3207. ForwardAuto: kernel 4 0ms
  3208. forward try kernel 5
  3209. cl/forward_fc_wgperrow.cl build log:
  3210. "C:\Users\pz\AppData\Local\Temp\OCL3D3.tmp.cl", line 75: warning: variable
  3211. "loopsPerExample" was declared but never referenced
  3212. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  3213. ^
  3214.  
  3215.  
  3216. ... seems valid
  3217. ForwardAuto: kernel 5 15ms
  3218. forward try kernel 5
  3219. cl/forward_fc_wgperrow.cl build log:
  3220. "C:\Users\pz\AppData\Local\Temp\OCL422.tmp.cl", line 75: warning: variable
  3221. "loopsPerExample" was declared but never referenced
  3222. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  3223. ^
  3224.  
  3225.  
  3226. ... seems valid
  3227. ForwardAuto: kernel 5 0ms
  3228. backward try kernel 2
  3229. ... seems valid
  3230. BackwardAuto: kernel 2 0ms
  3231. calcGradWeights try kernel 2
  3232. ... seems valid
  3233. BackpropWeightsAuto: kernel 2 0ms
  3234. calcGradWeights try kernel 2
  3235. ... seems valid
  3236. BackpropWeightsAuto: kernel 2 0ms
  3237. forward try kernel 6
  3238. ... seems valid
  3239. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  3240. forward try kernel 7
  3241. ... seems valid
  3242. ForwardAuto: kernel 7 156ms
  3243. forward try kernel 6
  3244. ... seems valid
  3245. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  3246. forward try kernel 7
  3247. ... seems valid
  3248. ForwardAuto: kernel 7 78ms
  3249. loss 0.0984814 loss2 0.0983336 change: 0.000147872
  3250. sumweightsdiff 0.00384641
  3251. loss change 0.000147872
  3252. estimatedLossChangeFromW 0.000147948
  3253. forward kernel 0: cannot be used
  3254. forward kernel 1 time: 0ms
  3255. forward kernel 2 time: 0ms
  3256. forward kernel 3 time: 0ms
  3257. forward kernel 4 time: 0ms
  3258. forward kernel 5 time: 15ms
  3259. forward kernel 6: cannot be used
  3260. forward kernel 7 time: 156ms
  3261. forward layer selected kernel 1
  3262. forward kernel 0: cannot be used
  3263. forward kernel 1 time: 0ms
  3264. forward kernel 2 time: 0ms
  3265. forward kernel 3 time: 0ms
  3266. forward kernel 4 time: 0ms
  3267. forward kernel 5 time: 0ms
  3268. forward kernel 6: cannot be used
  3269. forward kernel 7 time: 78ms
  3270. forward layer selected kernel 1
  3271. backward try kernel 3
  3272. ... seems valid
  3273. BackwardAuto: kernel 3 140ms
  3274. calcGradWeights try kernel 3
  3275. options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  3276. ... seems valid
  3277. BackpropWeightsAuto: kernel 3 0ms
  3278. calcGradWeights try kernel 3
  3279. options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  3280. ... seems valid
  3281. BackpropWeightsAuto: kernel 3 0ms
  3282. loss 0.0983336 loss2 0.098186 change: 0.000147544
  3283. sumweightsdiff 0.00384223
  3284. loss change 0.000147544
  3285. estimatedLossChangeFromW 0.000147628
  3286. backward kernel 0: cannot be used
  3287. backward kernel 1 time: 0ms
  3288. backward kernel 2 time: 0ms
  3289. backward kernel 3 time: 140ms
  3290. backward layer selected kernel 1
  3291. calcGradWeights try kernel 4
  3292. ... seems valid
  3293. BackpropWeightsAuto: kernel 4 218ms
  3294. calcGradWeights try kernel 4
  3295. ... seems valid
  3296. BackpropWeightsAuto: kernel 4 78ms
  3297. loss 0.098186 loss2 0.0980388 change: 0.000147216
  3298. sumweightsdiff 0.00383794
  3299. loss change 0.000147216
  3300. estimatedLossChangeFromW 0.000147298
  3301. calcGradWeights kernel 0: cannot be used
  3302. calcGradWeights kernel 1 time: 0ms
  3303. calcGradWeights kernel 2 time: 0ms
  3304. calcGradWeights kernel 3 time: 0ms
  3305. calcGradWeights kernel 4 time: 218ms
  3306. calcGradWeights layer selected kernel 1
  3307. calcGradWeights kernel 0: cannot be used
  3308. calcGradWeights kernel 1 time: 0ms
  3309. calcGradWeights kernel 2 time: 0ms
  3310. calcGradWeights kernel 3 time: 0ms
  3311. calcGradWeights kernel 4 time: 78ms
  3312. calcGradWeights layer selected kernel 1
  3313. loss 0.0980388 loss2 0.0978919 change: 0.000146888
  3314. sumweightsdiff 0.00383377
  3315. loss change 0.000146888
  3316. estimatedLossChangeFromW 0.000146978
  3317. clblas teardown
  3318. [ OK ] testbackward.checknumerically (3027 ms)
  3319. [ RUN ] testbackward.checknumerically_imagesize5_filter3_relu
  3320. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  3321. Using OpenCL device: Tahiti
  3322. initializing clblas
  3323. forward try kernel 0
  3324. ... not plausibly optimal, skipping
  3325. forward try kernel 1
  3326. ... seems valid
  3327. ForwardAuto: kernel 1 0ms
  3328. forward try kernel 0
  3329. ... not plausibly optimal, skipping
  3330. forward try kernel 1
  3331. ... seems valid
  3332. ForwardAuto: kernel 1 0ms
  3333. forward try kernel 2
  3334. ... seems valid
  3335. ForwardAuto: kernel 2 0ms
  3336. forward try kernel 2
  3337. ... seems valid
  3338. ForwardAuto: kernel 2 0ms
  3339. backward try kernel 0
  3340. ... not plausibly optimal, skipping
  3341. backward try kernel 1
  3342. ... seems valid
  3343. BackwardAuto: kernel 1 0ms
  3344. calcGradWeights try kernel 0
  3345. ... not plausibly optimal, skipping
  3346. calcGradWeights try kernel 1
  3347. ... seems valid
  3348. BackpropWeightsAuto: kernel 1 0ms
  3349. calcGradWeights try kernel 0
  3350. ... not plausibly optimal, skipping
  3351. calcGradWeights try kernel 1
  3352. ... seems valid
  3353. BackpropWeightsAuto: kernel 1 0ms
  3354. forward try kernel 3
  3355. ... seems valid
  3356. ForwardAuto: kernel 3 0ms
  3357. forward try kernel 3
  3358. ... seems valid
  3359. ForwardAuto: kernel 3 0ms
  3360. loss 630.466 loss2 608.021 change: 22.4443
  3361. sumweightsdiff -0.035685
  3362. loss change 22.4443
  3363. estimatedLossChangeFromW 22.6629
  3364. forward try kernel 4
  3365. ... seems valid
  3366. ForwardAuto: kernel 4 0ms
  3367. forward try kernel 4
  3368. ... seems valid
  3369. ForwardAuto: kernel 4 0ms
  3370. forward try kernel 5
  3371. ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
  3372. ... not valid
  3373. forward try kernel 6
  3374. ... seems valid
  3375. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  3376. forward try kernel 7
  3377. ... seems valid
  3378. ForwardAuto: kernel 7 281ms
  3379. forward try kernel 5
  3380. ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
  3381. ... not valid
  3382. forward try kernel 6
  3383. ... seems valid
  3384. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  3385. forward try kernel 7
  3386. ... seems valid
  3387. ForwardAuto: kernel 7 94ms
  3388. backward try kernel 2
  3389. ... seems valid
  3390. BackwardAuto: kernel 2 0ms
  3391. calcGradWeights try kernel 2
  3392. ... seems valid
  3393. BackpropWeightsAuto: kernel 2 0ms
  3394. calcGradWeights try kernel 2
  3395. ... seems valid
  3396. BackpropWeightsAuto: kernel 2 0ms
  3397. forward kernel 0: cannot be used
  3398. forward kernel 1 time: 0ms
  3399. forward kernel 2 time: 0ms
  3400. forward kernel 3 time: 0ms
  3401. forward kernel 4 time: 0ms
  3402. forward kernel 5: cannot be used
  3403. forward kernel 6: cannot be used
  3404. forward kernel 7 time: 281ms
  3405. forward layer selected kernel 1
  3406. forward kernel 0: cannot be used
  3407. forward kernel 1 time: 0ms
  3408. forward kernel 2 time: 0ms
  3409. forward kernel 3 time: 0ms
  3410. forward kernel 4 time: 0ms
  3411. forward kernel 5: cannot be used
  3412. forward kernel 6: cannot be used
  3413. forward kernel 7 time: 94ms
  3414. forward layer selected kernel 1
  3415. loss 608.021 loss2 586.349 change: 21.672
  3416. sumweightsdiff -0.0350289
  3417. loss change 21.672
  3418. estimatedLossChangeFromW 21.7974
  3419. backward try kernel 3
  3420. ... seems valid
  3421. BackwardAuto: kernel 3 406ms
  3422. calcGradWeights try kernel 3
  3423. options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=1 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=5 -D gOutputSizeSquared=25 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=9 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=45 -DgInputStripeMarginSize=10 -DgOutputStripeNumRows=5 -DgOutputStripeSize=25
  3424. ... seems valid
  3425. BackpropWeightsAuto: kernel 3 0ms
  3426. calcGradWeights try kernel 3
  3427. options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=1 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=5 -D gOutputSizeSquared=25 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=9 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=45 -DgInputStripeMarginSize=10 -DgOutputStripeNumRows=5 -DgOutputStripeSize=25
  3428. ... seems valid
  3429. BackpropWeightsAuto: kernel 3 0ms
  3430. loss 586.349 loss2 565.324 change: 21.025
  3431. sumweightsdiff -0.0345262
  3432. loss change 21.025
  3433. estimatedLossChangeFromW 21.2378
  3434. backward kernel 0: cannot be used
  3435. backward kernel 1 time: 0ms
  3436. backward kernel 2 time: 0ms
  3437. backward kernel 3 time: 406ms
  3438. backward layer selected kernel 1
  3439. calcGradWeights try kernel 4
  3440. ... seems valid
  3441. BackpropWeightsAuto: kernel 4 359ms
  3442. calcGradWeights try kernel 4
  3443. ... seems valid
  3444. BackpropWeightsAuto: kernel 4 109ms
  3445. loss 565.324 loss2 545.133 change: 20.1916
  3446. sumweightsdiff -0.0338754
  3447. loss change 20.1916
  3448. estimatedLossChangeFromW 20.3956
  3449. calcGradWeights kernel 0: cannot be used
  3450. calcGradWeights kernel 1 time: 0ms
  3451. calcGradWeights kernel 2 time: 0ms
  3452. calcGradWeights kernel 3 time: 0ms
  3453. calcGradWeights kernel 4 time: 359ms
  3454. calcGradWeights layer selected kernel 1
  3455. calcGradWeights kernel 0: cannot be used
  3456. calcGradWeights kernel 1 time: 0ms
  3457. calcGradWeights kernel 2 time: 0ms
  3458. calcGradWeights kernel 3 time: 0ms
  3459. calcGradWeights kernel 4 time: 109ms
  3460. calcGradWeights layer selected kernel 1
  3461. loss 545.133 loss2 525.742 change: 19.3912
  3462. sumweightsdiff -0.0332378
  3463. loss change 19.3912
  3464. estimatedLossChangeFromW 19.5872
  3465. loss 525.742 loss2 507.119 change: 18.6229
  3466. sumweightsdiff -0.0326132
  3467. loss change 18.6229
  3468. estimatedLossChangeFromW 18.8111
  3469. loss 507.119 loss2 489.233 change: 17.8853
  3470. sumweightsdiff -0.032001
  3471. loss change 17.8853
  3472. estimatedLossChangeFromW 18.066
  3473. loss 489.233 loss2 472.056 change: 17.1772
  3474. sumweightsdiff -0.0314012
  3475. loss change 17.1772
  3476. estimatedLossChangeFromW 17.3506
  3477. loss 472.056 loss2 455.559 change: 16.4975
  3478. sumweightsdiff -0.0308135
  3479. loss change 16.4975
  3480. estimatedLossChangeFromW 16.6639
  3481. loss 455.559 loss2 439.714 change: 15.8447
  3482. sumweightsdiff -0.0302379
  3483. loss change 15.8447
  3484. estimatedLossChangeFromW 16.0046
  3485. loss 439.714 loss2 424.416 change: 15.2976
  3486. sumweightsdiff -0.0296733
  3487. loss change 15.2976
  3488. estimatedLossChangeFromW 15.3717
  3489. loss 424.416 loss2 409.545 change: 14.871
  3490. sumweightsdiff -0.0299227
  3491. loss change 14.871
  3492. estimatedLossChangeFromW 15.0234
  3493. loss 409.545 loss2 395.271 change: 14.274
  3494. sumweightsdiff -0.0293575
  3495. loss change 14.274
  3496. estimatedLossChangeFromW 14.4202
  3497. loss 395.271 loss2 381.57 change: 13.7013
  3498. sumweightsdiff -0.0288033
  3499. loss change 13.7013
  3500. estimatedLossChangeFromW 13.8415
  3501. loss 381.57 loss2 368.418 change: 13.1519
  3502. sumweightsdiff -0.0282608
  3503. loss change 13.1519
  3504. estimatedLossChangeFromW 13.2864
  3505. loss 368.418 loss2 355.794 change: 12.6248
  3506. sumweightsdiff -0.0277294
  3507. loss change 12.6248
  3508. estimatedLossChangeFromW 12.7538
  3509. loss 355.794 loss2 343.675 change: 12.119
  3510. sumweightsdiff -0.027209
  3511. loss change 12.119
  3512. estimatedLossChangeFromW 12.2429
  3513. loss 343.675 loss2 332.041 change: 11.634
  3514. sumweightsdiff -0.0266991
  3515. loss change 11.634
  3516. estimatedLossChangeFromW 11.7526
  3517. loss 332.041 loss2 320.872 change: 11.1684
  3518. sumweightsdiff -0.0261997
  3519. loss change 11.1684
  3520. estimatedLossChangeFromW 11.2823
  3521. loss 320.872 loss2 310.15 change: 10.7218
  3522. sumweightsdiff -0.0257105
  3523. loss change 10.7218
  3524. estimatedLossChangeFromW 10.8312
  3525. clblas teardown
  3526. [ OK ] testbackward.checknumerically_imagesize5_filter3_relu (3915 ms)
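Note on the `loss change` / `estimatedLossChangeFromW` pairs above: each pair appears to compare the measured drop in loss after a weight update with a first-order estimate of that drop computed from the weight gradients. The sketch below is one plausible reading of that check, not DeepCL's own code. The quadratic toy loss, the `learning_rate` value and every function name are assumptions made for illustration.

```python
def loss(weights, targets):
    """Toy quadratic loss: 0.5 * sum((w - t)^2). Hypothetical stand-in for the network loss."""
    return 0.5 * sum((w - t) ** 2 for w, t in zip(weights, targets))


def gradient(weights, targets):
    """Analytic gradient of the toy loss with respect to each weight."""
    return [w - t for w, t in zip(weights, targets)]


def check_step(weights, targets, learning_rate):
    """Return (measured loss drop, first-order estimate of the drop) for one SGD step."""
    grad = gradient(weights, targets)
    delta = [-learning_rate * g for g in grad]          # weight change applied
    new_weights = [w + d for w, d in zip(weights, delta)]
    measured = loss(weights, targets) - loss(new_weights, targets)
    estimated = -sum(g * d for g, d in zip(grad, delta))  # -grad . delta
    return measured, estimated


measured, estimated = check_step([1.0, -2.0, 3.0], [0.0, 0.0, 0.0], 0.01)
relative_gap = abs(estimated - measured) / estimated
```

In this toy, as in the logged pairs, the estimate comes out slightly larger than the measured change, because a first-order estimate ignores the curvature of the loss; a small relative gap is what such a check would look for.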
  3527. [ RUN ] testbackward.compare_1_n_kgsgo_32c5
  3528. -D BIASED -D gNumInputPlanes=32 -D gInputPlanes=32 -D gInputSize=19 -D gInputSizeSquared=361 -D gNumFilters=32 -D gFilterSize=5 -D gHalfFilterSize=2 -D gFilterSizeSquared=25 -D gNumOutputPlanes=32 -D gOutputPlanes=32 -D gOutputSize=19 -D gOutputSizeSquared=361 -D gPadZeros=1 -D gMargin=2 -D gEven=0 -D gSkip=0
  3529. batchsize=8 LayerDimensions{ inputPlanes=32 inputSize=19 numFilters=32 filterSize=5 outputSize=19 padZeros=1 biased=1 skip=0}
  3530. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  3531. Using OpenCL device: Tahiti
  3532. initializing clblas
  3533. LayerDimensions{ inputPlanes=32 inputSize=19 numFilters=32 filterSize=5 outputSize=19 padZeros=1 biased=1 skip=0}
  3534. output[0]=-0.0308112 -0.0308112 SAME || -0.129603 || -0.048413 || 0.07916 || -0.118675 || 0.0416933 || 0.100887 || -0.106013
  3535. output[1]=-0.0574008 -0.0574008 SAME || 0.099984 || 0.0155394 || 0.00411644 || 0.131031 || -0.0107744 || 0.121347 || 0.0437087
  3536. output[2]=-0.0227139 -0.0227139 SAME || -0.0115189 || -0.190989 || -0.0445787 || -0.013341 || -0.04953 || -0.109186 || 0.104814
  3537. output[3]=-0.0805896 -0.0805896 SAME || 0.0216207 || -0.128649 || -0.0159031 || 0.0534839 || 0.0301581 || 0.104269 || -0.0841106
  3538. output[4]=-0.0723994 -0.0723994 SAME || -0.0164838 || -0.00649171 || -0.042007 || 0.147102 || -0.0702085 || -0.0120931 || 0.0597854
  3539. output[5]=0.130336 0.130336 SAME || -0.0816751 || -0.272227 || 0.0707071 || 0.133967 || 0.0323092 || 0.124248 || -0.0138626
  3540. output[6]=-0.00415662 -0.00415662 SAME || -0.0920411 || 0.0352436 || 0.0541946 || 0.00491123 || -0.0805987 || 0.0834764 || 0.0631893
  3541. output[7]=-0.0915931 -0.0915931 SAME || -0.0358497 || 0.0445722 || -0.0472172 || 0.0778742 || -0.0550363 || -0.179262 || -0.0812755
  3542. output[8]=0.0556533 0.0556533 SAME || -0.0684331 || -0.0243033 || -0.0822076 || -0.0104788 || -0.043145 || -0.0481164 || 0.0538944
  3543. output[9]=-0.0725742 -0.0725742 SAME || 0.0486592 || -0.0286811 || -0.0249626 || 0.0394469 || -0.144496 || 0.0909432 || -0.0152857
  3544. output[10]=-0.0153476 -0.0153476 SAME || -0.0677297 || -0.140709 || -0.0161164 || 0.131645 || 0.0545684 || -0.0210541 || 0.0611338
  3545. output[11]=-0.0212713 -0.0212713 SAME || 0.100494 || 0.2122 || -0.0812487 || 0.0532493 || -0.0183774 || -0.0937923 || -0.069912
  3546. output[12]=0.0389741 0.0389741 SAME || 0.0809882 || 0.0370538 || 0.0241565 || -0.0582968 || 0.0437625 || 0.139931 || -0.065007
  3547. output[13]=0.0349705 0.0349705 SAME || -0.0251775 || -0.0759114 || 0.0945214 || 0.00389841 || -0.0377205 || 0.17624 || -0.114476
  3548. output[14]=0.0366689 0.0366689 SAME || -0.0348694 || -0.0581568 || 0.0376178 || -0.0298947 || -0.0299259 || -0.0913825 || -0.0745193
  3549. output[15]=0.0186965 0.0186965 SAME || 0.0281147 || 0.00937999 || 0.108983 || -0.0505074 || -0.0573388 || 0.067382 || 0.0387854
  3550. output[16]=0.0658136 0.0658136 SAME || -0.0412163 || -0.128719 || 0.150029 || 0.0555238 || -0.0203267 || -0.0795422 || -0.123847
  3551. output[17]=0.0705919 0.0705919 SAME || 0.147334 || 0.151016 || -0.0122364 || 0.0360484 || -0.0609187 || 0.0166715 || -0.141399
  3552. output[18]=-0.0508929 -0.0508929 SAME || 0.0131358 || -0.0101773 || -0.120741 || -0.00821514 || 0.00894922 || -0.117651 || 0.0631629
  3553. output[19]=-0.0110406 -0.0110406 SAME || 0.189081 || 0.0665268 || 0.0622702 || 0.151629 || -0.0172241 || -0.0215623 || 0.0457666
  3554. clblas teardown
  3555. batchsize=8 LayerDimensions{ inputPlanes=32 inputSize=19 numFilters=32 filterSize=5 outputSize=19 padZeros=1 biased=1 skip=0}
  3556. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  3557. Using OpenCL device: Tahiti
  3558. initializing clblas
  3559. clblas teardown
  3560. unknown file: error: C++ exception with description "
  3561. kernel source:
  3562. 1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
  3563. 2: //
  3564. 3: // This Source Code Form is subject to the terms of the Mozilla Public License,
  3565. 4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
  3566. 5: // obtain one at http://mozilla.org/MPL/2.0/.
  3567. 6:
  3568. 7: void copyLocal(local float *target, global float const *source, int N) {
  3569. 8: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
  3570. 9: for (int loop = 0; loop < numLoops; loop++) {
  3571. 10: int offset = loop * get_local_size(0) + get_local_id(0);
  3572. 11: if (offset < N) {
  3573. 12: target[offset] = source[offset];
  3574. 13: }
  3575. 14: }
  3576. 15: }
  3577. 16:
  3578. 17: // as calcGradInput, but with local cache
  3579. 18: // convolve weights with gradOutput to produce gradInput
  3580. 19: // workgroupid: [n][inputPlane]
  3581. 20: // localid: [upstreamrow][upstreamcol]
  3582. 21: // per-thread aggregation: [outPlane][filterRow][filterCol]
  3583. 22: // need to store locally:
  3584. 23: // - _gradOutputPlane. size = outputSizeSquared
  3585. 24: // - _filterPlane. size = filtersizesquared
  3586. 25: // note: currently doesnt use bias as input. thats probably an error?
  3587. 26: // inputs: gradOutput :convolve: filters => gradInput
  3588. 27: //
  3589. 28: // global:
  3590. 29: // gradOutput: [n][outPlane][outRow][outCol] 128 * 32 * 19 * 19 * 4
  3591. 30: // weights: [filterId][upstreamplane][filterRow][filterCol] 32 * 32 * 5 * 5 * 4
  3592. 31: // per workgroup:
  3593. 32: // gradOutput: [outPlane][outRow][outCol] 32 * 19 * 19 * 4 = 46KB
  3594. 33: // weights: [filterId][filterRow][filterCol] 32 * 5 * 5 * 4 = 3.2KB
  3595. 34: // gradOutputforupstream: [n][upstreamPlane][upstreamRow][upstreamCol]
  3596. 35: void kernel calcGradInputCached(
  3597. 36: const int batchSize,
  3598. 37: global const float *gradOutputGlobal,
  3599. 38: global const float *filtersGlobal,
  3600. 39: global float *gradInput,
  3601. 40: local float *_gradOutputPlane,
  3602. 41: local float *_filterPlane) {
  3603. 42:
  3604. 43: #define globalId get_global_id(0)
  3605. 44: #define localId get_local_id(0)
  3606. 45: #define workgroupId get_group_id(0)
  3607. 46: #define workgroupSize get_local_size(0)
  3608. 47:
  3609. 48: const int n = workgroupId / gInputPlanes;
  3610. 49: const int upstreamPlane = workgroupId % gInputPlanes;
  3611. 50:
  3612. 51: const int upstreamRow = localId / gInputSize;
  3613. 52: const int upstreamCol = localId % gInputSize;
  3614. 53:
  3615. 54: float sumWeightTimesOutError = 0;
  3616. 55: for (int outPlane = 0; outPlane < gNumFilters; outPlane++) {
  3617. 56: barrier(CLK_LOCAL_MEM_FENCE);
  3618. 57: copyLocal(_filterPlane, filtersGlobal + (outPlane * gInputPlanes + upstreamPlane) * gFilterSizeSquared, gFilterSizeSquared);
  3619. 58: copyLocal(_gradOutputPlane, gradOutputGlobal + (n * gNumFilters + outPlane) * gOutputSizeSquared, gOutputSizeSquared);
  3620. 59: barrier(CLK_LOCAL_MEM_FENCE);
  3621. 60: for (int filterRow = 0; filterRow < gFilterSize; filterRow++) {
  3622. 61: int outRow = upstreamRow + gMargin - filterRow;
  3623. 62: for (int filterCol = 0; filterCol < gFilterSize; filterCol++) {
  3624. 63: int outCol = upstreamCol + gMargin - filterCol;
  3625. 64: if (outCol >= 0 && outCol < gOutputSize && outRow >= 0 && outRow < gOutputSize) {
  3626. 65: float thisWeightTimesError =
  3627. 66: _gradOutputPlane[outRow * gOutputSize + outCol] *
  3628. 67: _filterPlane[filterRow * gFilterSize + filterCol];
  3629. 68: sumWeightTimesOutError += thisWeightTimesError;
  3630. 69: }
  3631. 70: }
  3632. 71: }
  3633. 72: }
  3634. 73: const int upstreamImageGlobalOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
  3635. 74: if (localId < gInputSizeSquared) {
  3636. 75: gradInput[upstreamImageGlobalOffset + localId] = sumWeightTimesOutError;
  3637. 76: }
  3638. 77: }
  3639. 78:
  3640. 79:
  3641.  
  3642. Something went wrong, code -55" thrown in the test body.
  3643. [ FAILED ] testbackward.compare_1_n_kgsgo_32c5 (843 ms)
  3644. [----------] 12 tests from testbackward (10764 ms total)
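Note on `code -55` in the failure above: in the Khronos `cl.h` header, -55 is `CL_INVALID_WORK_ITEM_SIZE`, which an enqueue returns when a requested local work size exceeds the device limit in that dimension. The kernel comments say the local id indexes `[upstreamrow][upstreamcol]`, so the workgroup needs at least `gInputSizeSquared` work-items (361 for this 19x19 layer). The sketch below only reproduces that arithmetic. The device limit of 256 is an assumption (the log never prints it), and the helper names are hypothetical; any rounding the host code applies to the workgroup size is not modelled.

```python
# Subset of OpenCL status codes, as defined in the Khronos cl.h header.
OPENCL_ERRORS = {
    -5: "CL_OUT_OF_RESOURCES",
    -52: "CL_INVALID_KERNEL_ARGS",
    -53: "CL_INVALID_WORK_DIMENSION",
    -54: "CL_INVALID_WORK_GROUP_SIZE",
    -55: "CL_INVALID_WORK_ITEM_SIZE",
}

# Assumption: the device limit is not shown in the log; 256 is a guess.
ASSUMED_MAX_WORK_GROUP_SIZE = 256


def describe(code):
    """Translate a numeric OpenCL status code into its symbolic name."""
    return OPENCL_ERRORS.get(code, "unknown (%d)" % code)


def required_local_size(input_size):
    """One work-item per upstream pixel, per the kernel's localid comment."""
    return input_size * input_size


def fits(input_size, max_work_group_size=ASSUMED_MAX_WORK_GROUP_SIZE):
    """True if a cached-backward workgroup for this image size fits the limit."""
    return required_local_size(input_size) <= max_work_group_size


for size in (5, 19, 28):
    print(size, required_local_size(size), fits(size))
```

Under that assumed limit, the 5x5 layers fit while the 19x19 and 28x28 layers do not, which matches the pattern of which runs in this log report code -55.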
  3645.  
  3646. [----------] 6 tests from testsinglebatch
  3647. [ RUN ] testsinglebatch.imagesize5_filtersize3_batchsize2
  3648. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  3649. Using OpenCL device: Tahiti
  3650. initializing clblas
  3651. layer 0:InputLayer{ outputPlanes=1 outputSize=5 }
  3652. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=5 numFilters=5 filterSize=3 outputSize=3 padZeros=0 biased=1 skip=0} }
  3653. layer 2:ActivationLayer{ LINEAR }
  3654. layer 3:FullyConnectedLayer{ numPlanes=5 imageSize=1 }
  3655. layer 4:ActivationLayer{ TANH }
  3656. layer 5:SquareLossLayer{}
  3657. Parameters overview: (skipping 4 layers with 0 params)
  3658. layer 1: params=50 17.9%
  3659. layer 3: params=230 82.1%
  3660. TOTAL : params=280
  3661. weightsTotalSize=280
  3662. forward try kernel 0
  3663. ... not plausibly optimal, skipping
  3664. forward try kernel 1
  3665. ... seems valid
  3666. ForwardAuto: kernel 1 0ms
  3667. forward try kernel 0
  3668. ... not plausibly optimal, skipping
  3669. forward try kernel 1
  3670. ... seems valid
  3671. ForwardAuto: kernel 1 0ms
  3672. backward try kernel 0
  3673. ... not plausibly optimal, skipping
  3674. backward try kernel 1
  3675. ... seems valid
  3676. BackwardAuto: kernel 1 0ms
  3677. calcGradWeights try kernel 0
  3678. ... not plausibly optimal, skipping
  3679. calcGradWeights try kernel 1
  3680. ... seems valid
  3681. BackpropWeightsAuto: kernel 1 0ms
  3682. calcGradWeights try kernel 0
  3683. ... not plausibly optimal, skipping
  3684. calcGradWeights try kernel 1
  3685. ... seems valid
  3686. BackpropWeightsAuto: kernel 1 0ms
  3687. forward try kernel 2
  3688. ... seems valid
  3689. ForwardAuto: kernel 2 0ms
  3690. forward try kernel 2
  3691. ... seems valid
  3692. ForwardAuto: kernel 2 0ms
  3693. forward try kernel 3
  3694. ... seems valid
  3695. ForwardAuto: kernel 3 0ms
  3696. forward try kernel 3
  3697. ... seems valid
  3698. ForwardAuto: kernel 3 0ms
  3699. backward try kernel 2
  3700. ... seems valid
  3701. BackwardAuto: kernel 2 0ms
  3702. calcGradWeights try kernel 2
  3703. ... seems valid
  3704. BackpropWeightsAuto: kernel 2 16ms
  3705. calcGradWeights try kernel 2
  3706. ... seems valid
  3707. BackpropWeightsAuto: kernel 2 0ms
  3708. forward try kernel 4
  3709. ... seems valid
  3710. ForwardAuto: kernel 4 0ms
  3711. forward try kernel 4
  3712. ... seems valid
  3713. ForwardAuto: kernel 4 15ms
  3714. forward try kernel 5
  3715. ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
  3716. ... not valid
  3717. forward try kernel 6
  3718. ... seems valid
  3719. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  3720. forward try kernel 7
  3721. ... seems valid
  3722. ForwardAuto: kernel 7 702ms
  3723. forward try kernel 5
  3724. cl/forward_fc_wgperrow.cl build log:
  3725. "C:\Users\pz\AppData\Local\Temp\OCL2857.tmp.cl", line 75: warning: variable
  3726. "loopsPerExample" was declared but never referenced
  3727. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  3728. ^
  3729.  
  3730.  
  3731. ... seems valid
  3732. ForwardAuto: kernel 5 0ms
  3733. backward try kernel 3
  3734. ... seems valid
  3735. BackwardAuto: kernel 3 421ms
  3736. calcGradWeights try kernel 3
  3737. options: -D BIASED -D gNumInputPlanes=5 -D gInputPlanes=5 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=5 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=5 -D gOutputPlanes=5 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=21 -DgInputStripeMarginSize=6 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  3738. ... seems valid
  3739. BackpropWeightsAuto: kernel 3 0ms
  3740. calcGradWeights try kernel 3
  3741. options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=5 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=5 -D gOutputPlanes=5 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=9 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=45 -DgInputStripeMarginSize=10 -DgOutputStripeNumRows=3 -DgOutputStripeSize=9
  3742. ... seems valid
  3743. BackpropWeightsAuto: kernel 3 0ms
  3744. forward kernel 0: cannot be used
  3745. forward kernel 1 time: 0ms
  3746. forward kernel 2 time: 0ms
  3747. forward kernel 3 time: 0ms
  3748. forward kernel 4 time: 0ms
  3749. forward kernel 5: cannot be used
  3750. forward kernel 6: cannot be used
  3751. forward kernel 7 time: 702ms
  3752. forward layer selected kernel 1
  3753. forward try kernel 6
  3754. ... seems valid
  3755. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  3756. forward try kernel 7
  3757. ... seems valid
  3758. ForwardAuto: kernel 7 312ms
  3759. forward kernel 0: cannot be used
  3760. forward kernel 1 time: 0ms
  3761. forward kernel 2 time: 0ms
  3762. forward kernel 3 time: 0ms
  3763. forward kernel 4 time: 15ms
  3764. forward kernel 5 time: 0ms
  3765. forward kernel 6: cannot be used
  3766. forward kernel 7 time: 312ms
  3767. forward layer selected kernel 1
  3768. backward kernel 0: cannot be used
  3769. backward kernel 1 time: 0ms
  3770. backward kernel 2 time: 0ms
  3771. backward kernel 3 time: 421ms
  3772. backward layer selected kernel 1
  3773. calcGradWeights try kernel 4
  3774. ... seems valid
  3775. BackpropWeightsAuto: kernel 4 843ms
  3776. calcGradWeights try kernel 4
  3777. ... seems valid
  3778. BackpropWeightsAuto: kernel 4 1092ms
  3779. calcGradWeights kernel 0: cannot be used
  3780. calcGradWeights kernel 1 time: 0ms
  3781. calcGradWeights kernel 2 time: 16ms
  3782. calcGradWeights kernel 3 time: 0ms
  3783. calcGradWeights kernel 4 time: 843ms
  3784. calcGradWeights layer selected kernel 1
  3785. calcGradWeights kernel 0: cannot be used
  3786. calcGradWeights kernel 1 time: 0ms
  3787. calcGradWeights kernel 2 time: 0ms
  3788. calcGradWeights kernel 3 time: 0ms
  3789. calcGradWeights kernel 4 time: 1092ms
  3790. calcGradWeights layer selected kernel 1
  3791. batch time 5694 ms
  3792. dump enabled=0
  3793. clblas teardown
  3794. [ OK ] testsinglebatch.imagesize5_filtersize3_batchsize2 (6130 ms)
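Note on `only have: -1984MB max alloc size` above: a negative allocation limit suggests that a byte count larger than 2^31 - 1 was stored in a signed 32-bit integer before being converted to megabytes. The sketch below shows that wrap-around. That the test harness does this is an assumption, and the 2112 MB input is simply one value that reproduces the logged figure; the device's real limit is not shown in the log.

```python
def to_int32(value):
    """Wrap an integer into the signed 32-bit range, as a narrowing cast typically would."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def reported_mb(byte_count):
    """Megabytes as they would print if the byte count passed through a 32-bit int."""
    wrapped = to_int32(byte_count)
    megabytes = abs(wrapped) // (1024 * 1024)   # truncate toward zero, like C division
    return -megabytes if wrapped < 0 else megabytes


print(reported_mb(2112 * 1024 * 1024))   # a 2112 MB limit wraps to a negative value
print(reported_mb(1024 * 1024 * 1024))   # 1024 MB fits and is reported unchanged
```

If this reading is right, forward kernel 6 is being rejected because of the overflowed comparison rather than a genuine shortage of memory; keeping the value in a 64-bit type would avoid the wrap.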
  3795. [ RUN ] testsinglebatch.imagesize5_filtersize3_batchsize2_10filters
  3796. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  3797. Using OpenCL device: Tahiti
  3798. initializing clblas
  3799. layer 0:InputLayer{ outputPlanes=1 outputSize=5 }
  3800. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=5 numFilters=10 filterSize=3 outputSize=3 padZeros=0 biased=1 skip=0} }
  3801. layer 2:ActivationLayer{ RELU }
  3802. layer 3:FullyConnectedLayer{ numPlanes=10 imageSize=1 }
  3803. layer 4:ActivationLayer{ TANH }
  3804. layer 5:SquareLossLayer{}
  3805. Parameters overview: (skipping 4 layers with 0 params)
  3806. layer 1: params=100 9.9%
  3807. layer 3: params=910 90.1%
  3808. TOTAL : params=1010
  3809. weightsTotalSize=1010
  3810. forward try kernel 0
  3811. ... not plausibly optimal, skipping
  3812. forward try kernel 1
  3813. ... seems valid
  3814. ForwardAuto: kernel 1 0ms
  3815. forward try kernel 0
  3816. ... not plausibly optimal, skipping
  3817. forward try kernel 1
  3818. ... seems valid
  3819. ForwardAuto: kernel 1 0ms
  3820. backward try kernel 0
  3821. ... not plausibly optimal, skipping
  3822. backward try kernel 1
  3823. ... seems valid
  3824. BackwardAuto: kernel 1 0ms
  3825. calcGradWeights try kernel 0
  3826. ... not plausibly optimal, skipping
  3827. calcGradWeights try kernel 1
  3828. ... seems valid
  3829. BackpropWeightsAuto: kernel 1 0ms
  3830. calcGradWeights try kernel 0
  3831. ... not plausibly optimal, skipping
  3832. calcGradWeights try kernel 1
  3833. ... seems valid
  3834. BackpropWeightsAuto: kernel 1 0ms
  3835. forward try kernel 2
  3836. ... seems valid
  3837. ForwardAuto: kernel 2 0ms
  3838. forward try kernel 2
  3839. ... seems valid
  3840. ForwardAuto: kernel 2 0ms
  3841. forward try kernel 3
  3842. ... seems valid
  3843. ForwardAuto: kernel 3 0ms
  3844. forward try kernel 3
  3845. ... seems valid
  3846. ForwardAuto: kernel 3 0ms
  3847. backward try kernel 2
  3848. ... seems valid
  3849. BackwardAuto: kernel 2 0ms
  3850. calcGradWeights try kernel 2
  3851. ... seems valid
  3852. BackpropWeightsAuto: kernel 2 0ms
  3853. calcGradWeights try kernel 2
  3854. ... seems valid
  3855. BackpropWeightsAuto: kernel 2 0ms
  3856. forward try kernel 4
  3857. ... seems valid
  3858. ForwardAuto: kernel 4 0ms
  3859. forward try kernel 4
  3860. ... seems valid
  3861. ForwardAuto: kernel 4 0ms
  3862. forward try kernel 5
  3863. ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
  3864. ... not valid
  3865. forward try kernel 6
  3866. ... seems valid
  3867. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  3868. forward try kernel 7
  3869. ... seems valid
  3870. ForwardAuto: kernel 7 702ms
  3871. forward try kernel 5
  3872. cl/forward_fc_wgperrow.cl build log:
  3873. "C:\Users\pz\AppData\Local\Temp\OCL410C.tmp.cl", line 75: warning: variable
  3874. "loopsPerExample" was declared but never referenced
  3875. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  3876. ^
  3877.  
  3878.  
  3879. ... seems valid
  3880. ForwardAuto: kernel 5 0ms
  3881. backward try kernel 3
  3882. ... seems valid
  3883. BackwardAuto: kernel 3 374ms
  3884. calcGradWeights try kernel 3
  3885. options: -D BIASED -D gNumInputPlanes=10 -D gInputPlanes=10 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=10 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=10 -D gOutputPlanes=10 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=21 -DgInputStripeMarginSize=6 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  3886. ... seems valid
  3887. BackpropWeightsAuto: kernel 3 0ms
  3888. calcGradWeights try kernel 3
  3889. options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=10 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=10 -D gOutputPlanes=10 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=9 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=45 -DgInputStripeMarginSize=10 -DgOutputStripeNumRows=3 -DgOutputStripeSize=9
  3890. ... seems valid
  3891. BackpropWeightsAuto: kernel 3 0ms
  3892. forward kernel 0: cannot be used
  3893. forward kernel 1 time: 0ms
  3894. forward kernel 2 time: 0ms
  3895. forward kernel 3 time: 0ms
  3896. forward kernel 4 time: 0ms
  3897. forward kernel 5: cannot be used
  3898. forward kernel 6: cannot be used
  3899. forward kernel 7 time: 702ms
  3900. forward layer selected kernel 1
  3901. forward try kernel 6
  3902. ... seems valid
  3903. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  3904. forward try kernel 7
  3905. ... seems valid
  3906. ForwardAuto: kernel 7 312ms
  3907. forward kernel 0: cannot be used
  3908. forward kernel 1 time: 0ms
  3909. forward kernel 2 time: 0ms
  3910. forward kernel 3 time: 0ms
  3911. forward kernel 4 time: 0ms
  3912. forward kernel 5 time: 0ms
  3913. forward kernel 6: cannot be used
  3914. forward kernel 7 time: 312ms
  3915. forward layer selected kernel 1
  3916. backward kernel 0: cannot be used
  3917. backward kernel 1 time: 0ms
  3918. backward kernel 2 time: 0ms
  3919. backward kernel 3 time: 374ms
  3920. backward layer selected kernel 1
  3921. calcGradWeights try kernel 4
  3922. ... seems valid
  3923. BackpropWeightsAuto: kernel 4 827ms
  3924. calcGradWeights try kernel 4
  3925. ... seems valid
  3926. BackpropWeightsAuto: kernel 4 1076ms
  3927. calcGradWeights kernel 0: cannot be used
  3928. calcGradWeights kernel 1 time: 0ms
  3929. calcGradWeights kernel 2 time: 0ms
  3930. calcGradWeights kernel 3 time: 0ms
  3931. calcGradWeights kernel 4 time: 827ms
  3932. calcGradWeights layer selected kernel 1
  3933. calcGradWeights kernel 0: cannot be used
  3934. calcGradWeights kernel 1 time: 0ms
  3935. calcGradWeights kernel 2 time: 0ms
  3936. calcGradWeights kernel 3 time: 0ms
  3937. calcGradWeights kernel 4 time: 1076ms
  3938. calcGradWeights layer selected kernel 1
  3939. batch time 6037 ms
  3940. dump enabled=0
  3941. clblas teardown
  3942. [ OK ] testsinglebatch.imagesize5_filtersize3_batchsize2_10filters (6490 ms)
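Note on the `... layer selected kernel N` lines above: the auto-selection pass times each candidate kernel, skips those marked `cannot be used`, and keeps the fastest. The sketch below mirrors that behaviour using the forward timings printed for this test. Breaking ties by lowest kernel index is an assumption that happens to agree with the log (kernels 1 to 4 all report 0ms and kernel 1 is chosen); `select_kernel` is a hypothetical name, not a DeepCL function.

```python
def select_kernel(timings_ms):
    """Pick the fastest usable kernel.

    timings_ms maps kernel index -> elapsed milliseconds, or None when the
    kernel cannot be used. Ties go to the lowest index (assumed).
    """
    usable = [(elapsed, index)
              for index, elapsed in timings_ms.items()
              if elapsed is not None]
    if not usable:
        raise ValueError("no usable kernel")
    return min(usable)[1]   # tuples compare by time first, then by index


# Forward timings as printed in the log for the 10-filter test.
forward_timings = {0: None, 1: 0, 2: 0, 3: 0, 4: 0, 5: None, 6: None, 7: 702}
print(select_kernel(forward_timings))
```

Because most candidates tie at 0ms on inputs this small, the choice between them rests entirely on the tie-break rather than on measured speed.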
  3943. [ RUN ] testsinglebatch.imagesize28
  3944. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  3945. Using OpenCL device: Tahiti
  3946. initializing clblas
  3947. layer 0:InputLayer{ outputPlanes=1 outputSize=28 }
  3948. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=28 numFilters=10 filterSize=3 outputSize=26 padZeros=0 biased=1 skip=0} }
  3949. layer 2:ActivationLayer{ RELU }
  3950. layer 3:FullyConnectedLayer{ numPlanes=10 imageSize=1 }
  3951. layer 4:ActivationLayer{ TANH }
  3952. layer 5:SquareLossLayer{}
  3953. Parameters overview: (skipping 4 layers with 0 params)
  3954. layer 1: params=100 0.1%
  3955. layer 3: params=67610 99.9%
  3956. TOTAL : params=67710
  3957. weightsTotalSize=67710
  3958. forward try kernel 0
  3959. ... not plausibly optimal, skipping
  3960. forward try kernel 1
  3961. ... seems valid
  3962. ForwardAuto: kernel 1 0ms
  3963. forward try kernel 0
  3964. ... not plausibly optimal, skipping
  3965. forward try kernel 1
  3966. ... seems valid
  3967. ForwardAuto: kernel 1 0ms
  3968. backward try kernel 0
  3969. ... not plausibly optimal, skipping
  3970. backward try kernel 1
  3971. ... seems valid
  3972. BackwardAuto: kernel 1 0ms
  3973. calcGradWeights try kernel 0
  3974. ... not plausibly optimal, skipping
  3975. calcGradWeights try kernel 1
  3976. ... seems valid
  3977. BackpropWeightsAuto: kernel 1 0ms
  3978. calcGradWeights try kernel 0
  3979. ... not plausibly optimal, skipping
  3980. calcGradWeights try kernel 1
  3981. ... seems valid
  3982. BackpropWeightsAuto: kernel 1 0ms
  3983. forward try kernel 2
  3984. ForwardAuto: kernel 2: this instance cant be used: cannot use forward2, since outputimagesize * outputimagesize > maxworkgroupsize
  3985. ... not valid
  3986. forward try kernel 3
  3987. ForwardAuto: kernel 3: this instance cant be used: cannot use forward3, since outputimagesize * outputimagesize > maxworkgroupsize
  3988. ... not valid
  3989. forward try kernel 4
  3990. ... seems valid
  3991. ForwardAuto: kernel 4 0ms
  3992. forward try kernel 2
  3993. ... seems valid
  3994. ForwardAuto: kernel 2 0ms
  3995. forward try kernel 5
  3996. ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
  3997. ... not valid
  3998. forward try kernel 6
  3999. ... seems valid
  4000. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  4001. forward try kernel 7
  4002. ... seems valid
  4003. ForwardAuto: kernel 7 562ms
  4004. forward try kernel 3
  4005. ... seems valid
  4006. ForwardAuto: kernel 3 0ms
  4007. backward try kernel 2
  4008. ... seems valid
  4009. BackwardAuto: kernel 2 this instance cant be used:
  4010. kernel source:
  4011. 1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
  4012. 2: //
  4013. 3: // This Source Code Form is subject to the terms of the Mozilla Public License,
  4014. 4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
  4015. 5: // obtain one at http://mozilla.org/MPL/2.0/.
  4016. 6:
  4017. 7: void copyLocal(local float *target, global float const *source, int N) {
  4018. 8: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
  4019. 9: for (int loop = 0; loop < numLoops; loop++) {
  4020. 10: int offset = loop * get_local_size(0) + get_local_id(0);
  4021. 11: if (offset < N) {
  4022. 12: target[offset] = source[offset];
  4023. 13: }
  4024. 14: }
  4025. 15: }
  4026. 16:
  4027. 17: // as calcGradInput, but with local cache
  4028. 18: // convolve weights with gradOutput to produce gradInput
  4029. 19: // workgroupid: [n][inputPlane]
  4030. 20: // localid: [upstreamrow][upstreamcol]
  4031. 21: // per-thread aggregation: [outPlane][filterRow][filterCol]
  4032. 22: // need to store locally:
  4033. 23: // - _gradOutputPlane. size = outputSizeSquared
  4034. 24: // - _filterPlane. size = filtersizesquared
  4035. 25: // note: currently doesnt use bias as input. thats probably an error?
  4036. 26: // inputs: gradOutput :convolve: filters => gradInput
  4037. 27: //
  4038. 28: // global:
  4039. 29: // gradOutput: [n][outPlane][outRow][outCol] 128 * 32 * 19 * 19 * 4
  4040. 30: // weights: [filterId][upstreamplane][filterRow][filterCol] 32 * 32 * 5 * 5 * 4
  4041. 31: // per workgroup:
  4042. 32: // gradOutput: [outPlane][outRow][outCol] 32 * 19 * 19 * 4 = 46KB
  4043. 33: // weights: [filterId][filterRow][filterCol] 32 * 5 * 5 * 4 = 3.2KB
  4044. 34: // gradOutputforupstream: [n][upstreamPlane][upstreamRow][upstreamCol]
  4045. 35: void kernel calcGradInputCached(
  4046. 36: const int batchSize,
  4047. 37: global const float *gradOutputGlobal,
  4048. 38: global const float *filtersGlobal,
  4049. 39: global float *gradInput,
  4050. 40: local float *_gradOutputPlane,
  4051. 41: local float *_filterPlane) {
  4052. 42:
  4053. 43: #define globalId get_global_id(0)
  4054. 44: #define localId get_local_id(0)
  4055. 45: #define workgroupId get_group_id(0)
  4056. 46: #define workgroupSize get_local_size(0)
  4057. 47:
  4058. 48: const int n = workgroupId / gInputPlanes;
  4059. 49: const int upstreamPlane = workgroupId % gInputPlanes;
  4060. 50:
  4061. 51: const int upstreamRow = localId / gInputSize;
  4062. 52: const int upstreamCol = localId % gInputSize;
  4063. 53:
  4064. 54: float sumWeightTimesOutError = 0;
  4065. 55: for (int outPlane = 0; outPlane < gNumFilters; outPlane++) {
  4066. 56: barrier(CLK_LOCAL_MEM_FENCE);
  4067. 57: copyLocal(_filterPlane, filtersGlobal + (outPlane * gInputPlanes + upstreamPlane) * gFilterSizeSquared, gFilterSizeSquared);
  4068. 58: copyLocal(_gradOutputPlane, gradOutputGlobal + (n * gNumFilters + outPlane) * gOutputSizeSquared, gOutputSizeSquared);
  4069. 59: barrier(CLK_LOCAL_MEM_FENCE);
  4070. 60: for (int filterRow = 0; filterRow < gFilterSize; filterRow++) {
  4071. 61: int outRow = upstreamRow + gMargin - filterRow;
  4072. 62: for (int filterCol = 0; filterCol < gFilterSize; filterCol++) {
  4073. 63: int outCol = upstreamCol + gMargin - filterCol;
  4074. 64: if (outCol >= 0 && outCol < gOutputSize && outRow >= 0 && outRow < gOutputSize) {
  4075. 65: float thisWeightTimesError =
  4076. 66: _gradOutputPlane[outRow * gOutputSize + outCol] *
  4077. 67: _filterPlane[filterRow * gFilterSize + filterCol];
  4078. 68: sumWeightTimesOutError += thisWeightTimesError;
  4079. 69: }
  4080. 70: }
  4081. 71: }
  4082. 72: }
  4083. 73: const int upstreamImageGlobalOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
  4084. 74: if (localId < gInputSizeSquared) {
  4085. 75: gradInput[upstreamImageGlobalOffset + localId] = sumWeightTimesOutError;
  4086. 76: }
  4087. 77: }
  4088. 78:
  4089. 79:
  4090.  
  4091. Something went wrong, code -55
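Note on the status code: in the Khronos `cl.h` header, -55 is `CL_INVALID_WORK_ITEM_SIZE`. One plausible reading, not confirmed by this log, is that the launch asked for more work-items per group than the device allows, since `calcGradInputCached` maps one work-item to each input pixel (`gInputSizeSquared`, for example 784 for a 28x28 input). The sketch below is a minimal host-side check; the function names are hypothetical and the 256 limit used in the usage note is an assumption, so query `CL_DEVICE_MAX_WORK_ITEM_SIZES` for the real value.

```c
#include <assert.h>
#include <stddef.h>

/* Names for a few OpenCL status codes, as defined in the Khronos cl.h header. */
static const char *cl_error_name(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -52: return "CL_INVALID_KERNEL_ARGS";
        case -54: return "CL_INVALID_WORK_GROUP_SIZE";
        case -55: return "CL_INVALID_WORK_ITEM_SIZE";
        default:  return "(not in this table)";
    }
}

/* Hypothetical helper: returns 1 if a 1-D work-group of `requested` items
   fits within the device limit passed in by the caller. */
static int local_size_fits(size_t requested, size_t max_work_item_size) {
    assert(max_work_item_size > 0);
    return requested <= max_work_item_size;
}
```

With an assumed limit of 256, `local_size_fits(784, 256)` returns 0, which would be consistent with the kernel being rejected at launch rather than at compile time.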
  4092. backward try kernel 3
  4093. ... seems valid
  4094. BackwardAuto: kernel 3 889ms
  4095. calcGradWeights try kernel 2
  4096. ... seems valid
  4097. BackpropWeightsAuto: kernel 2 this instance cant be used:
  4098. kernel source:
  4099. 1: // Copyright Hugh Perkins 2014,2015 hughperkins at gmail
  4100. 2: //
  4101. 3: // This Source Code Form is subject to the terms of the Mozilla Public License,
  4102. 4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
  4103. 5: // obtain one at http://mozilla.org/MPL/2.0/.
  4104. 6:
  4105. 7: // expected defines:
  4106. 8: // BIASED (or not)
  4107. 9:
  4108. 10: // including cl/copyLocal.cl:
  4109. 11: // Copyright Hugh Perkins 2015 hughperkins at gmail
  4110. 12: //
  4111. 13: // This Source Code Form is subject to the terms of the Mozilla Public License,
  4112. 14: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
  4113. 15: // obtain one at http://mozilla.org/MPL/2.0/.
  4114. 16:
  4115. 17: void copyLocal(local float *target, global float const *source, int N) {
  4116. 18: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
  4117. 19: for (int loop = 0; loop < numLoops; loop++) {
  4118. 20: int offset = loop * get_local_size(0) + get_local_id(0);
  4119. 21: if (offset < N) {
  4120. 22: target[offset] = source[offset];
  4121. 23: }
  4122. 24: }
  4123. 25: }
  4124. 26:
  4125. 27: void copyGlobal(global float *target, local float const *source, int N) {
  4126. 28: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
  4127. 29: for (int loop = 0; loop < numLoops; loop++) {
  4128. 30: int offset = loop * get_local_size(0) + get_local_id(0);
  4129. 31: if (offset < N) {
  4130. 32: target[offset] = source[offset];
  4131. 33: }
  4132. 34: }
  4133. 35: }
  4134. 36:
  4135. 37:
  4136. 38: // including cl/ids.cl:
  4137. 39: // Copyright Hugh Perkins 2015 hughperkins at gmail
  4138. 40: //
  4139. 41: // This Source Code Form is subject to the terms of the Mozilla Public License,
  4140. 42: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
  4141. 43: // obtain one at http://mozilla.org/MPL/2.0/.
  4142. 44:
  4143. 45: #define globalId (get_global_id(0))
  4144. 46: #define localId (get_local_id(0) )
  4145. 47: #define workgroupId (get_group_id(0))
  4146. 48: #define workgroupSize (get_local_size(0))
  4147. 49:
  4148. 50:
  4149. 51:
  4150. 52:
  4151. 53: // workgroupId: [outputPlane][inputPlane]
  4152. 54: // localId: [filterRow][filterCol]
  4153. 55: // per-thread iteration: [n][outputRow][outputCol]
  4154. 56: // local: errorimage: outputSize * outputSize
  4155. 57: // imageimage: inputSize * inputSize
  4156. 58: void kernel backprop_floats_withscratch_dobias(
  4157. 59: const float learningRateMultiplier, const int batchSize,
  4158. 60: global const float *gradOutput, global const float *images,
  4159. 61: global float *gradWeights,
  4160. 62: #ifdef BIASED
  4161. 63: global float *gradBiasWeights,
  4162. 64: #endif
  4163. 65: local float *_errorImage, local float *_imageImage
  4164. 66: ) {
  4165. 67: const int filterRow = localId / gFilterSize;
  4166. 68: const int filterCol = localId % gFilterSize;
  4167. 69:
  4168. 70: #define outPlane (workgroupId / gInputPlanes)
  4169. 71: #define upstreamPlane (workgroupId % gInputPlanes)
  4170. 72:
  4171. 73: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
  4172. 74: // aggregate over: [outRow][outCol][n]
  4173. 75: float thiswchange = 0;
  4174. 76: #ifdef BIASED
  4175. 77: float thisbiaschange = 0;
  4176. 78: #endif
  4177. 79: for (int n = 0; n < batchSize; n++) {
  4178. 80: barrier(CLK_LOCAL_MEM_FENCE);
  4179. 81: copyLocal(_imageImage, images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared, gInputSizeSquared);
  4180. 82: copyLocal(_errorImage, gradOutput + (n * gNumFilters + outPlane) * gOutputSizeSquared, gOutputSizeSquared);
  4181. 83: barrier(CLK_LOCAL_MEM_FENCE);
  4182. 84: if (localId < gFilterSizeSquared) {
  4183. 85: for (int outRow = 0; outRow < gOutputSize; outRow++) {
  4184. 86: int upstreamRow = outRow - gMargin + filterRow;
  4185. 87: for (int outCol = 0; outCol < gOutputSize; outCol++) {
  4186. 88: const int upstreamCol = outCol - gMargin + filterCol;
  4187. 89: #define proceed (upstreamRow >= 0 && upstreamCol >= 0 && upstreamRow < gInputSize && upstreamCol < gInputSize)
  4188. 90: if (proceed) {
  4189. 91: // these defines reduce register pressure, compared to const
  4190. 92: // giving a 40% speedup on nvidia :-)
  4191. 93: #define resultIndex (outRow * gOutputSize + outCol)
  4192. 94: #define error (_errorImage[resultIndex])
  4193. 95: //const float error = _errorImage[resultIndex];
  4194. 96: #define upstreamDataIndex (upstreamRow * gInputSize + upstreamCol)
  4195. 97: #define upstreamResult (_imageImage[upstreamDataIndex])
  4196. 98: thiswchange += upstreamResult * error;
  4197. 99: #ifdef BIASED
  4198. 100: thisbiaschange += error;
  4199. 101: #endif
  4200. 102: }
  4201. 103: }
  4202. 104: }
  4203. 105: }
  4204. 106: }
  4205. 107: if (localId < gFilterSizeSquared) {
  4206. 108: gradWeights[ workgroupId * gFilterSizeSquared + localId ] = learningRateMultiplier * thiswchange;
  4207. 109: }
  4208. 110: #ifdef BIASED
  4209. 111: #define writeBias (upstreamPlane == 0 && filterRow == gMargin && filterCol == gMargin)
  4210. 112: if (writeBias) {
  4211. 113: gradBiasWeights[outPlane] = learningRateMultiplier * thisbiaschange;
  4212. 114: }
  4213. 115: #endif
  4214. 116: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
  4215. 117: // aggregate over: [outRow][outCol][n]
  4216. 118: }
  4217. 119:
  4218. 120:
  4219.  
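The `copyLocal` helper in the dump above has every work-item copy a strided subset of the buffer: element `localId`, then `localId + workgroupSize`, and so on, with a bounds guard for the ragged last pass. The following is a host-side emulation in plain C (the name `emulate_copy_local` is hypothetical) that runs the work-items one after another to show that each element is written exactly once.

```c
#include <assert.h>

/* Host-side emulation of copyLocal for one work-group. Each simulated
   work-item `lid` copies elements lid, lid + local_size, lid + 2*local_size, ...
   Returns the total number of elements written. */
static int emulate_copy_local(float *target, const float *source,
                              int n, int local_size) {
    assert(local_size > 0);
    int num_loops = (n + local_size - 1) / local_size; /* ceil(n / local_size) */
    int writes = 0;
    for (int lid = 0; lid < local_size; lid++) {       /* parallel on a GPU */
        for (int loop = 0; loop < num_loops; loop++) {
            int offset = loop * local_size + lid;
            if (offset < n) {                          /* guard the last pass */
                target[offset] = source[offset];
                writes++;
            }
        }
    }
    return writes;
}
```

For `n = 676` and an assumed work-group size of 256, the loop count is 3 and the guard skips the 92 out-of-range offsets in the final pass.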
  4220. Something went wrong, code -55
  4221. calcGradWeights try kernel 3
  4222. options: -D BIASED -D gNumInputPlanes=10 -D gInputPlanes=10 -D gInputSize=26 -D gInputSizeSquared=676 -D gNumFilters=10 -D gFilterSize=26 -D gHalfFilterSize=13 -D gFilterSizeSquared=676 -D gNumOutputPlanes=10 -D gOutputPlanes=10 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=25 -DgInputStripeInnerNumRows=26 -DgInputStripeOuterNumRows=76 -DgInputStripeInnerSize=676 -DgInputStripeOuterSize=1976 -DgInputStripeMarginSize=650 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  4223. ... seems valid
  4224. BackpropWeightsAuto: kernel 3 this instance cant be used:
  4225. kernel source:
  4226. 1: // Copyright Hugh Perkins 2014,2015 hughperkins at gmail
  4227. 2: //
  4228. 3: // This Source Code Form is subject to the terms of the Mozilla Public License,
  4229. 4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
  4230. 5: // obtain one at http://mozilla.org/MPL/2.0/.
  4231. 6:
  4232. 7: // expected defines:
  4233. 8: // BIASED (or not)
  4234. 9:
  4235. 10: // workgroupId: [outputPlane][inputPlane]
  4236. 11: // localId: [filterRow][filterCol]
  4237. 12: // per-thread iteration: [n][outputRow][outputCol]
  4238. 13: // local: errorimage: outputSize * outputSize
  4239. 14: // imageimage: inputSize * inputSize
  4240. 15: // specific characteristic: load one stripe of each image at a time,
  4241. 16: // so we don't run out of memory
  4242. 17: // number of stripes set in: gNumStripes
  4243. 18: // note that whilst we can stripe the gradOutput simply,
  4244. 19: // we need to add a few additional rows, about half a filter in height,
  4245. 20: // onto each image stripe, otherwise we will be missing data
  4246. 21: // we will call the size of the non-overlapping image stripes: gInputStripeInnerSize
  4247. 22: // the outer size, including the two margins, is: gInputStripeOuterSize
  4248. 23: // of course, the first and last stripes will be missing a bit off the top/bottom, where the
  4249. 24: // corresponding outer margin would be
  4250. 25: void kernel backprop_floats_withscratch_dobias_striped(
  4251. 26: const float learningRateMultiplier, const int batchSize,
  4252. 27: global const float *gradOutput, global const float *images,
  4253. 28: global float *gradWeights,
  4254. 29: #ifdef BIASED
  4255. 30: global float *gradBiasWeights,
  4256. 31: #endif
  4257. 32: local float *_errorStripe, local float *_imageStripe
  4258. 33: ) {
  4259. 34: // gHalfFilterSize
  4260. 35: // gInputSize
  4261. 36: //
  4262. 37: // gInputStripeMarginRows => basically equal to gHalfFilterSize
  4263. 38: // gInputStripeInnerNumRows = gInputSize / gNumStripes
  4264. 39: // gInputStripeOuterNumRows = gInputStripeInnerNumRows + 2 * gHalfFilterSize (note: one row less than
  4265. 40: // if we just added gFilterSize)
  4266. 41: // gInputStripeInnerSize = gInputStripeInnerNumRows * gInputSize
  4267. 42: // gInputStripeOuterSize = gInputStripeOuterNumRows * gInputSize
  4268. 43: // gInputStripeMarginSize = gInputStripeMarginRows * gInputSize
  4269. 44: //
  4270. 45: // gOutputStripeNumRows
  4271. 46: // gOutputStripeSize
  4272. 47:
  4273. 48: const int globalId = get_global_id(0);
  4274. 49: const int localId = get_local_id(0);
  4275. 50: const int workgroupId = get_group_id(0);
  4276. 51: const int workgroupSize = get_local_size(0);
  4277. 52:
  4278. 53: const int filterRow = localId / gFilterSize;
  4279. 54: const int filterCol = localId % gFilterSize;
  4280. 55:
  4281. 56: const int outPlane = workgroupId / gInputPlanes;
  4282. 57: const int upstreamPlane = workgroupId % gInputPlanes;
  4283. 58:
  4284. 59: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
  4285. 60: // aggregate over: [outRow][outCol][n]
  4286. 61: float thiswchange = 0;
  4287. 62: #ifdef BIASED
  4288. 63: float thisbiaschange = 0;
  4289. 64: #endif
  4290. 65: const int numLoopsForImageStripe = (gInputStripeOuterSize + workgroupSize - 1) / workgroupSize;
  4291. 66: const int numLoopsForErrorStripe = (gOutputSizeSquared + workgroupSize - 1) / workgroupSize;
  4292. 67: for (int n = 0; n < batchSize; n++) {
  4293. 68: const int imageImageGlobalOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
  4294. 69: const int imageImageGlobalOffsetAfter = imageImageGlobalOffset + gInputSizeSquared;
  4295. 70: const int errorImageGlobalOffset = (n * gNumFilters + outPlane) * gOutputSizeSquared;
  4296. 71: const int errorImageGlobalOffsetAfter = errorImageGlobalOffset + gOutputSizeSquared;
  4297. 72: for (int stripe = 0; stripe < gNumStripes; stripe++) {
  4298. 73: const int imageStripeInnerOffset = imageImageGlobalOffset + stripe * gInputStripeInnerSize;
  4299. 74: const int imageStripeOuterOffset = imageStripeInnerOffset - gInputStripeMarginSize;
  4300. 75: // need to fetch the image, but it's bigger than us, so will need to loop...
  4301. 76: barrier(CLK_LOCAL_MEM_FENCE);
  4302. 77: for (int i = 0; i < numLoopsForImageStripe; i++) {
  4303. 78: int thisOffset = i * workgroupSize + localId;
  4304. 79: int thisGlobalImagesOffset = imageStripeOuterOffset + thisOffset;
  4305. 80: bool process = thisOffset < gInputStripeOuterSize
  4306. 81: && thisGlobalImagesOffset >= imageImageGlobalOffset
  4307. 82: && thisGlobalImagesOffset < imageImageGlobalOffsetAfter;
  4308. 83: if (process) {
  4309. 84: _imageStripe[thisOffset] = images[ thisGlobalImagesOffset ];
  4310. 85: }
  4311. 86: }
  4312. 87: int errorStripeOffset = errorImageGlobalOffset + stripe * gOutputStripeSize;
  4313. 88: for (int i = 0; i < numLoopsForErrorStripe; i++) {
  4314. 89: int thisOffset = i * workgroupSize + localId;
  4315. 90: int globalErrorsOffset = errorStripeOffset + thisOffset;
  4316. 91: bool process = thisOffset < gOutputStripeSize
  4317. 92: && globalErrorsOffset < errorImageGlobalOffsetAfter;
  4318. 93: if (process) {
  4319. 94: _errorStripe[thisOffset ] = gradOutput[globalErrorsOffset];
  4320. 95: }
  4321. 96: }
  4322. 97: const int stripeOutRowStart = stripe * gOutputStripeNumRows;
  4323. 98: const int stripeOutRowEndExcl = stripeOutRowStart + gOutputStripeNumRows;
  4324. 99: barrier(CLK_LOCAL_MEM_FENCE);
  4325. 100: // if (localId == 13) {
  4326. 101: // for (int i = 0; i < 12; i++) {
  4327. 102: // gradWeights[100 + stripe * 12 + i ] = _errorStripe[i * gOutputSize];
  4328. 103: // }
  4329. 104: // for (int i = 0; i < 20; i++) {
  4330. 105: // gradWeights[200 + stripe * 20 + i ] = _imageStripe[i * gInputSize];
  4331. 106: // }
  4332. 107: // }
  4333. 108: if (localId < gFilterSizeSquared) {
  4334. 109: for (int outRow = stripeOutRowStart; outRow < stripeOutRowEndExcl; outRow++) {
  4335. 110: int upstreamRow = outRow - gMargin + filterRow;
  4336. 111: for (int outCol = 0; outCol < gOutputSize; outCol++) {
  4337. 112: int upstreamCol = outCol - gMargin + filterCol;
  4338. 113: bool proceed =
  4339. 114: upstreamRow >= 0 && upstreamCol >= 0
  4340. 115: && upstreamRow < gInputSize && upstreamCol < gInputSize
  4341. 116: && outRow < gOutputSize;
  4342. 117: if (proceed) {
  4343. 118: int resultIndex = outRow * gOutputSize + outCol;
  4344. 119: float error = _errorStripe[resultIndex - stripe * gOutputStripeSize];
  4345. 120: int upstreamDataIndex = upstreamRow * gInputSize + upstreamCol;
  4346. 121: float upstreamResult = _imageStripe[upstreamDataIndex + gInputStripeMarginSize
  4347. 122: - stripe * gInputStripeInnerSize ];
  4348. 123: thiswchange += upstreamResult * error;
  4349. 124: #ifdef BIASED
  4350. 125: thisbiaschange += error;
  4351. 126: #endif
  4352. 127: }
  4353. 128: }
  4354. 129: }
  4355. 130: }
  4356. 131: }
  4357. 132: }
  4358. 133: if (localId < gFilterSizeSquared) {
  4359. 134: gradWeights[ workgroupId * gFilterSizeSquared + localId ] = learningRateMultiplier * thiswchange;
  4360. 135: // weightChanges[ workgroupId * gFilterSizeSquared + localId ] = workgroupId;
  4361. 136: }
  4362. 137: #ifdef BIASED
  4363. 138: bool writeBias = upstreamPlane == 0 && filterRow == gMargin && filterCol == gMargin;
  4364. 139: if (writeBias) {
  4365. 140: gradBiasWeights[outPlane] = learningRateMultiplier * thisbiaschange;
  4366. 141: }
  4367. 142: #endif
  4368. 143: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
  4369. 144: // aggregate over: [outRow][outCol][n]
  4370. 145: }
  4371. 146:
  4372. 147:
  4373.  
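The stripe defines passed on the `options:` lines of this log can be reproduced with a little arithmetic. The kernel comments describe the margin as roughly `gHalfFilterSize`, but the logged values (for example `gFilterSize=26` with `gInputStripeMarginRows=25`, and `gFilterSize=3` with `gInputStripeMarginRows=2`) match `gFilterSize - 1`. The sketch below uses that inferred rule; it is an assumption drawn from the logged numbers, not taken from the DeepCL host code, and the struct and function names are hypothetical.

```c
#include <assert.h>

typedef struct {
    int margin_rows, inner_rows, outer_rows;
    int inner_size, outer_size, margin_size;
} StripeDims;

/* Recompute the gInputStripe* defines from the layer geometry. */
static StripeDims stripe_dims(int input_size, int filter_size, int num_stripes) {
    assert(num_stripes > 0 && filter_size > 0);
    StripeDims d;
    d.margin_rows = filter_size - 1;                   /* inferred from the log */
    d.inner_rows  = input_size / num_stripes;          /* per the kernel comments */
    d.outer_rows  = d.inner_rows + 2 * d.margin_rows;  /* margin above and below */
    d.inner_size  = d.inner_rows * input_size;
    d.outer_size  = d.outer_rows * input_size;
    d.margin_size = d.margin_rows * input_size;
    return d;
}
```

For `input_size = 26`, `filter_size = 26`, `num_stripes = 1` this yields 25, 26, 76, 676, 1976 and 650, the same values as the logged defines for that layer.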
  4374. Something went wrong, code -55
  4375. calcGradWeights try kernel 4
  4376. ... seems valid
  4377. BackpropWeightsAuto: kernel 4 1341ms
  4378. calcGradWeights try kernel 2
  4379. ... seems valid
  4380. BackpropWeightsAuto: kernel 2 0ms
  4381. forward kernel 0: cannot be used
  4382. forward kernel 1 time: 0ms
  4383. forward kernel 2: cannot be used
  4384. forward kernel 3: cannot be used
  4385. forward kernel 4 time: 0ms
  4386. forward kernel 5: cannot be used
  4387. forward kernel 6: cannot be used
  4388. forward kernel 7 time: 562ms
  4389. forward layer selected kernel 1
  4390. forward try kernel 4
  4391. ... seems valid
  4392. ForwardAuto: kernel 4 0ms
  4393. forward try kernel 5
  4394. cl/forward_fc_wgperrow.cl build log:
  4395. "C:\Users\pz\AppData\Local\Temp\OCL7D02.tmp.cl", line 75: warning: variable
  4396. "loopsPerExample" was declared but never referenced
  4397. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  4398. ^
  4399.  
  4400.  
  4401. ... seems valid
  4402. ForwardAuto: kernel 5 0ms
  4403. backward kernel 0: cannot be used
  4404. backward kernel 1 time: 0ms
  4405. backward kernel 2: cannot be used
  4406. backward kernel 3 time: 889ms
  4407. backward layer selected kernel 1
  4408. calcGradWeights kernel 0: cannot be used
  4409. calcGradWeights kernel 1 time: 0ms
  4410. calcGradWeights kernel 2: cannot be used
  4411. calcGradWeights kernel 3: cannot be used
  4412. calcGradWeights kernel 4 time: 1341ms
  4413. calcGradWeights layer selected kernel 1
  4414. calcGradWeights try kernel 3
  4415. options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=28 -D gInputSizeSquared=784 -D gNumFilters=10 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=10 -D gOutputPlanes=10 -D gOutputSize=26 -D gOutputSizeSquared=676 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=28 -DgInputStripeOuterNumRows=32 -DgInputStripeInnerSize=784 -DgInputStripeOuterSize=896 -DgInputStripeMarginSize=56 -DgOutputStripeNumRows=26 -DgOutputStripeSize=676
  4416. ... seems valid
  4417. BackpropWeightsAuto: kernel 3 0ms
  4418. forward try kernel 6
  4419. ... seems valid
  4420. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
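The negative "max alloc size" in the message above looks like a byte count that was squeezed through a signed 32-bit integer before being converted to MB. This is a guess; the log does not state the device's real limit. The sketch below shows the effect with a hypothetical value of 2112 MB, chosen only because it wraps to the logged -1984 MB. The function name is likewise hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Keep only the low 32 bits of a byte count, reinterpret them as a signed
   32-bit integer, then convert to megabytes. This mimics storing a 64-bit
   size in an int before printing it. */
static int32_t wrapped_megabytes(uint64_t bytes) {
    uint32_t low = (uint32_t)bytes;              /* modular, well-defined */
    int32_t as_signed;
    memcpy(&as_signed, &low, sizeof as_signed);  /* reinterpret the bits */
    return as_signed / (1024 * 1024);
}
```

Sizes below 2 GiB pass through unchanged, which is why a bug of this kind would only show up on devices with a large allocation limit. Keeping the value in a 64-bit unsigned type until it is printed avoids the wrap.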
  4421. forward try kernel 7
  4422. ... seems valid
  4423. ForwardAuto: kernel 7 827ms
  4424. forward kernel 0: cannot be used
  4425. forward kernel 1 time: 0ms
  4426. forward kernel 2 time: 0ms
  4427. forward kernel 3 time: 0ms
  4428. forward kernel 4 time: 0ms
  4429. forward kernel 5 time: 0ms
  4430. forward kernel 6: cannot be used
  4431. forward kernel 7 time: 827ms
  4432. forward layer selected kernel 1
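The summary above lists one time per candidate kernel and then names the selected one. A minimal sketch of such a selection follows, assuming the fastest usable kernel wins and ties go to the lowest index. That tie rule is an assumption that happens to match this log (kernels 1 to 5 all report 0ms and kernel 1 is selected); the actual DeepCL logic may differ, and the function name is hypothetical.

```c
#include <assert.h>

/* times_ms[i] < 0 marks a kernel that "cannot be used".
   Returns the index of the fastest usable kernel, preferring the lowest
   index on ties, or -1 if no kernel is usable. */
static int select_fastest(const int *times_ms, int count) {
    int best = -1;
    for (int i = 0; i < count; i++) {
        if (times_ms[i] < 0) continue;                 /* unusable, skip */
        if (best < 0 || times_ms[i] < times_ms[best]) {
            best = i;                                  /* strictly faster */
        }
    }
    return best;
}
```

Applied to the forward timings above, encoded as `{-1, 0, 0, 0, 0, 0, -1, 827}`, the function returns 1. Note that a 0ms reading carries little information, so a real tuner would probably need a finer timer or repeated runs to rank such kernels.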
  4433. calcGradWeights try kernel 4
  4434. ... seems valid
  4435. BackpropWeightsAuto: kernel 4 889ms
  4436. calcGradWeights kernel 0: cannot be used
  4437. calcGradWeights kernel 1 time: 0ms
  4438. calcGradWeights kernel 2 time: 0ms
  4439. calcGradWeights kernel 3 time: 0ms
  4440. calcGradWeights kernel 4 time: 889ms
  4441. calcGradWeights layer selected kernel 1
  4442. batch time 15116 ms
  4443. dump enabled=0
  4444. clblas teardown
  4445. [ OK ] testsinglebatch.imagesize28 (15553 ms)
  4446. [ RUN ] testsinglebatch.imagesize28_filtersize5
  4447. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  4448. Using OpenCL device: Tahiti
  4449. initializing clblas
  4450. layer 0:InputLayer{ outputPlanes=1 outputSize=28 }
  4451. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=28 numFilters=10 filterSize=5 outputSize=24 padZeros=0 biased=1 skip=0} }
  4452. layer 2:ActivationLayer{ RELU }
  4453. layer 3:FullyConnectedLayer{ numPlanes=10 imageSize=1 }
  4454. layer 4:ActivationLayer{ TANH }
  4455. layer 5:SquareLossLayer{}
  4456. Parameters overview: (skipping 4 layers with 0 params)
  4457. layer 1: params=260 0.4%
  4458. layer 3: params=57610 99.6%
  4459. TOTAL : params=57870
  4460. weightsTotalSize=57870
  4461. forward try kernel 0
  4462. ... not plausibly optimal, skipping
  4463. forward try kernel 1
  4464. ... seems valid
  4465. ForwardAuto: kernel 1 15ms
  4466. forward try kernel 0
  4467. ... not plausibly optimal, skipping
  4468. forward try kernel 1
  4469. ... seems valid
  4470. ForwardAuto: kernel 1 0ms
  4471. backward try kernel 0
  4472. ... not plausibly optimal, skipping
  4473. backward try kernel 1
  4474. ... seems valid
  4475. BackwardAuto: kernel 1 0ms
  4476. calcGradWeights try kernel 0
  4477. ... not plausibly optimal, skipping
  4478. calcGradWeights try kernel 1
  4479. ... seems valid
  4480. BackpropWeightsAuto: kernel 1 0ms
  4481. calcGradWeights try kernel 0
  4482. ... not plausibly optimal, skipping
  4483. calcGradWeights try kernel 1
  4484. ... seems valid
  4485. BackpropWeightsAuto: kernel 1 0ms
  4486. forward try kernel 2
  4487. ForwardAuto: kernel 2: this instance cant be used: cannot use forward2, since outputimagesize * outputimagesize > maxworkgroupsize
  4488. ... not valid
  4489. forward try kernel 3
  4490. ForwardAuto: kernel 3: this instance cant be used: cannot use forward3, since outputimagesize * outputimagesize > maxworkgroupsize
  4491. ... not valid
  4492. forward try kernel 4
  4493. ... seems valid
  4494. ForwardAuto: kernel 4 0ms
  4495. forward try kernel 2
  4496. ... seems valid
  4497. ForwardAuto: kernel 2 0ms
  4498. forward try kernel 5
  4499. ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
  4500. ... not valid
  4501. forward try kernel 6
  4502. ... seems valid
  4503. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  4504. forward try kernel 7
  4505. ... seems valid
  4506. ForwardAuto: kernel 7 562ms
  4507. forward try kernel 3
  4508. ... seems valid
  4509. ForwardAuto: kernel 3 0ms
  4510. backward try kernel 2
  4511. ... seems valid
  4512. BackwardAuto: kernel 2 this instance cant be used:
  4513. kernel source:
  4514. 1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
  4515. 2: //
  4516. 3: // This Source Code Form is subject to the terms of the Mozilla Public License,
  4517. 4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
  4518. 5: // obtain one at http://mozilla.org/MPL/2.0/.
  4519. 6:
  4520. 7: void copyLocal(local float *target, global float const *source, int N) {
  4521. 8: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
  4522. 9: for (int loop = 0; loop < numLoops; loop++) {
  4523. 10: int offset = loop * get_local_size(0) + get_local_id(0);
  4524. 11: if (offset < N) {
  4525. 12: target[offset] = source[offset];
  4526. 13: }
  4527. 14: }
  4528. 15: }
  4529. 16:
  4530. 17: // as calcGradInput, but with local cache
  4531. 18: // convolve weights with gradOutput to produce gradInput
  4532. 19: // workgroupid: [n][inputPlane]
  4533. 20: // localid: [upstreamrow][upstreamcol]
  4534. 21: // per-thread aggregation: [outPlane][filterRow][filterCol]
  4535. 22: // need to store locally:
  4536. 23: // - _gradOutputPlane. size = outputSizeSquared
  4537. 24: // - _filterPlane. size = filtersizesquared
  4538. 25: // note: currently doesnt use bias as input. thats probably an error?
  4539. 26: // inputs: gradOutput :convolve: filters => gradInput
  4540. 27: //
  4541. 28: // global:
  4542. 29: // gradOutput: [n][outPlane][outRow][outCol] 128 * 32 * 19 * 19 * 4
  4543. 30: // weights: [filterId][upstreamplane][filterRow][filterCol] 32 * 32 * 5 * 5 * 4
  4544. 31: // per workgroup:
  4545. 32: // gradOutput: [outPlane][outRow][outCol] 32 * 19 * 19 * 4 = 46KB
  4546. 33: // weights: [filterId][filterRow][filterCol] 32 * 5 * 5 * 4 = 3.2KB
  4547. 34: // gradOutputforupstream: [n][upstreamPlane][upstreamRow][upstreamCol]
  4548. 35: void kernel calcGradInputCached(
  4549. 36: const int batchSize,
  4550. 37: global const float *gradOutputGlobal,
  4551. 38: global const float *filtersGlobal,
  4552. 39: global float *gradInput,
  4553. 40: local float *_gradOutputPlane,
  4554. 41: local float *_filterPlane) {
  4555. 42:
  4556. 43: #define globalId get_global_id(0)
  4557. 44: #define localId get_local_id(0)
  4558. 45: #define workgroupId get_group_id(0)
  4559. 46: #define workgroupSize get_local_size(0)
  4560. 47:
  4561. 48: const int n = workgroupId / gInputPlanes;
  4562. 49: const int upstreamPlane = workgroupId % gInputPlanes;
  4563. 50:
  4564. 51: const int upstreamRow = localId / gInputSize;
  4565. 52: const int upstreamCol = localId % gInputSize;
  4566. 53:
  4567. 54: float sumWeightTimesOutError = 0;
  4568. 55: for (int outPlane = 0; outPlane < gNumFilters; outPlane++) {
  4569. 56: barrier(CLK_LOCAL_MEM_FENCE);
  4570. 57: copyLocal(_filterPlane, filtersGlobal + (outPlane * gInputPlanes + upstreamPlane) * gFilterSizeSquared, gFilterSizeSquared);
  4571. 58: copyLocal(_gradOutputPlane, gradOutputGlobal + (n * gNumFilters + outPlane) * gOutputSizeSquared, gOutputSizeSquared);
  4572. 59: barrier(CLK_LOCAL_MEM_FENCE);
  4573. 60: for (int filterRow = 0; filterRow < gFilterSize; filterRow++) {
  4574. 61: int outRow = upstreamRow + gMargin - filterRow;
  4575. 62: for (int filterCol = 0; filterCol < gFilterSize; filterCol++) {
  4576. 63: int outCol = upstreamCol + gMargin - filterCol;
  4577. 64: if (outCol >= 0 && outCol < gOutputSize && outRow >= 0 && outRow < gOutputSize) {
  4578. 65: float thisWeightTimesError =
  4579. 66: _gradOutputPlane[outRow * gOutputSize + outCol] *
  4580. 67: _filterPlane[filterRow * gFilterSize + filterCol];
  4581. 68: sumWeightTimesOutError += thisWeightTimesError;
  4582. 69: }
  4583. 70: }
  4584. 71: }
  4585. 72: }
  4586. 73: const int upstreamImageGlobalOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
  4587. 74: if (localId < gInputSizeSquared) {
  4588. 75: gradInput[upstreamImageGlobalOffset + localId] = sumWeightTimesOutError;
  4589. 76: }
  4590. 77: }
  4591. 78:
  4592. 79:
  4593.  
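For checking the kernel above on the host, the per-work-item loop of `calcGradInputCached` can be written as a plain C reference. This is a sketch under stated assumptions: the function name is hypothetical, the filters are assumed to be already sliced to a single `upstreamPlane`, and the buffers hold one example `n`.

```c
#include <assert.h>

/* CPU reference for one gradInput pixel.
   grad_output: [num_filters][output_size][output_size]
   filters:     [num_filters][filter_size][filter_size]  (one upstream plane) */
static float grad_input_pixel(const float *grad_output, const float *filters,
                              int num_filters, int output_size, int filter_size,
                              int margin, int upstream_row, int upstream_col) {
    float sum = 0.0f;
    for (int out_plane = 0; out_plane < num_filters; out_plane++) {
        const float *go = grad_output + out_plane * output_size * output_size;
        const float *f  = filters + out_plane * filter_size * filter_size;
        for (int fr = 0; fr < filter_size; fr++) {
            int out_row = upstream_row + margin - fr;
            for (int fc = 0; fc < filter_size; fc++) {
                int out_col = upstream_col + margin - fc;
                /* skip output positions that fall outside the output plane */
                if (out_row >= 0 && out_row < output_size &&
                    out_col >= 0 && out_col < output_size) {
                    sum += go[out_row * output_size + out_col]
                         * f[fr * filter_size + fc];
                }
            }
        }
    }
    return sum;
}
```

On the GPU each work-item computes one such pixel, which is why the kernel needs `gInputSizeSquared` work-items per group.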
  4594. Something went wrong, code -55
  4595. backward try kernel 3
  4596. ... seems valid
  4597. BackwardAuto: kernel 3 562ms
  4598. calcGradWeights try kernel 2
  4599. ... seems valid
  4600. BackpropWeightsAuto: kernel 2 this instance cant be used:
  4601. kernel source:
  4602. 1: // Copyright Hugh Perkins 2014,2015 hughperkins at gmail
  4603. 2: //
  4604. 3: // This Source Code Form is subject to the terms of the Mozilla Public License,
  4605. 4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
  4606. 5: // obtain one at http://mozilla.org/MPL/2.0/.
  4607. 6:
  4608. 7: // expected defines:
  4609. 8: // BIASED (or not)
  4610. 9:
  4611. 10: // including cl/copyLocal.cl:
  4612. 11: // Copyright Hugh Perkins 2015 hughperkins at gmail
  4613. 12: //
  4614. 13: // This Source Code Form is subject to the terms of the Mozilla Public License,
  4615. 14: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
  4616. 15: // obtain one at http://mozilla.org/MPL/2.0/.
  4617. 16:
  4618. 17: void copyLocal(local float *target, global float const *source, int N) {
  4619. 18: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
  4620. 19: for (int loop = 0; loop < numLoops; loop++) {
  4621. 20: int offset = loop * get_local_size(0) + get_local_id(0);
  4622. 21: if (offset < N) {
  4623. 22: target[offset] = source[offset];
  4624. 23: }
  4625. 24: }
  4626. 25: }
  4627. 26:
  4628. 27: void copyGlobal(global float *target, local float const *source, int N) {
  4629. 28: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
  4630. 29: for (int loop = 0; loop < numLoops; loop++) {
  4631. 30: int offset = loop * get_local_size(0) + get_local_id(0);
  4632. 31: if (offset < N) {
  4633. 32: target[offset] = source[offset];
  4634. 33: }
  4635. 34: }
  4636. 35: }
  4637. 36:
  4638. 37:
  4639. 38: // including cl/ids.cl:
  4640. 39: // Copyright Hugh Perkins 2015 hughperkins at gmail
  4641. 40: //
  4642. 41: // This Source Code Form is subject to the terms of the Mozilla Public License,
  4643. 42: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
  4644. 43: // obtain one at http://mozilla.org/MPL/2.0/.
  4645. 44:
  4646. 45: #define globalId (get_global_id(0))
  4647. 46: #define localId (get_local_id(0) )
  4648. 47: #define workgroupId (get_group_id(0))
  4649. 48: #define workgroupSize (get_local_size(0))
  4650. 49:
  4651. 50:
  4652. 51:
  4653. 52:
  4654. 53: // workgroupId: [outputPlane][inputPlane]
  4655. 54: // localId: [filterRow][filterCol]
  4656. 55: // per-thread iteration: [n][outputRow][outputCol]
  4657. 56: // local: errorimage: outputSize * outputSize
  4658. 57: // imageimage: inputSize * inputSize
  4659. 58: void kernel backprop_floats_withscratch_dobias(
  4660. 59: const float learningRateMultiplier, const int batchSize,
  4661. 60: global const float *gradOutput, global const float *images,
  4662. 61: global float *gradWeights,
  4663. 62: #ifdef BIASED
  4664. 63: global float *gradBiasWeights,
  4665. 64: #endif
  4666. 65: local float *_errorImage, local float *_imageImage
  4667. 66: ) {
  4668. 67: const int filterRow = localId / gFilterSize;
  4669. 68: const int filterCol = localId % gFilterSize;
  4670. 69:
  4671. 70: #define outPlane (workgroupId / gInputPlanes)
  4672. 71: #define upstreamPlane (workgroupId % gInputPlanes)
  4673. 72:
  4674. 73: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
  4675. 74: // aggregate over: [outRow][outCol][n]
  4676. 75: float thiswchange = 0;
  4677. 76: #ifdef BIASED
  4678. 77: float thisbiaschange = 0;
  4679. 78: #endif
  4680. 79: for (int n = 0; n < batchSize; n++) {
  4681. 80: barrier(CLK_LOCAL_MEM_FENCE);
  4682. 81: copyLocal(_imageImage, images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared, gInputSizeSquared);
  4683. 82: copyLocal(_errorImage, gradOutput + (n * gNumFilters + outPlane) * gOutputSizeSquared, gOutputSizeSquared);
  4684. 83: barrier(CLK_LOCAL_MEM_FENCE);
  4685. 84: if (localId < gFilterSizeSquared) {
  4686. 85: for (int outRow = 0; outRow < gOutputSize; outRow++) {
  4687. 86: int upstreamRow = outRow - gMargin + filterRow;
  4688. 87: for (int outCol = 0; outCol < gOutputSize; outCol++) {
  4689. 88: const int upstreamCol = outCol - gMargin + filterCol;
  4690. 89: #define proceed (upstreamRow >= 0 && upstreamCol >= 0 && upstreamRow < gInputSize && upstreamCol < gInputSize)
  4691. 90: if (proceed) {
  4692. 91: // these defines reduce register pressure, compared to const
  4693. 92: // giving a 40% speedup on nvidia :-)
  4694. 93: #define resultIndex (outRow * gOutputSize + outCol)
  4695. 94: #define error (_errorImage[resultIndex])
  4696. 95: //const float error = _errorImage[resultIndex];
  4697. 96: #define upstreamDataIndex (upstreamRow * gInputSize + upstreamCol)
  4698. 97: #define upstreamResult (_imageImage[upstreamDataIndex])
  4699. 98: thiswchange += upstreamResult * error;
  4700. 99: #ifdef BIASED
  4701. 100: thisbiaschange += error;
  4702. 101: #endif
  4703. 102: }
  4704. 103: }
  4705. 104: }
  4706. 105: }
  4707. 106: }
  4708. 107: if (localId < gFilterSizeSquared) {
  4709. 108: gradWeights[ workgroupId * gFilterSizeSquared + localId ] = learningRateMultiplier * thiswchange;
  4710. 109: }
  4711. 110: #ifdef BIASED
  4712. 111: #define writeBias (upstreamPlane == 0 && filterRow == gMargin && filterCol == gMargin)
  4713. 112: if (writeBias) {
  4714. 113: gradBiasWeights[outPlane] = learningRateMultiplier * thisbiaschange;
  4715. 114: }
  4716. 115: #endif
  4717. 116: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
  4718. 117: // aggregate over: [outRow][outCol][n]
  4719. 118: }
  4720. 119:
  4721. 120:
  4722.  
  4723. Something went wrong, code -55
  4724. calcGradWeights try kernel 3
  4725. options: -D BIASED -D gNumInputPlanes=10 -D gInputPlanes=10 -D gInputSize=24 -D gInputSizeSquared=576 -D gNumFilters=10 -D gFilterSize=24 -D gHalfFilterSize=12 -D gFilterSizeSquared=576 -D gNumOutputPlanes=10 -D gOutputPlanes=10 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=23 -DgInputStripeInnerNumRows=24 -DgInputStripeOuterNumRows=70 -DgInputStripeInnerSize=576 -DgInputStripeOuterSize=1680 -DgInputStripeMarginSize=552 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  4726. ... seems valid
  4727. BackpropWeightsAuto: kernel 3 this instance cant be used:
  4728. kernel source:
// Copyright Hugh Perkins 2014,2015 hughperkins at gmail
//
// This Source Code Form is subject to the terms of the Mozilla Public License,
// v. 2.0. If a copy of the MPL was not distributed with this file, You can
// obtain one at http://mozilla.org/MPL/2.0/.

// expected defines:
// BIASED (or not)

// workgroupId: [outputPlane][inputPlane]
// localId: [filterRow][filterCol]
// per-thread iteration: [n][outputRow][outputCol]
// local: errorimage: outputSize * outputSize
// imageimage: inputSize * inputSize
// specific characteristic: load one stripe of each image at a time,
// so we dont run out of memory
// number of stripes set in: gNumStripes
// note that whilst we can stripe the gradOutput simply,
// we actually need to add a half-filter widthed additional few rows
// onto the images stripe, otherwise we will be missing data
// we will call the size of the non-overlapping image stripes: gInputStripeInnerSize
// the outersize, including the two margins is: gInputStripeOuterSize
// of course, the first and last stripes will be missing a bit off the top/bottom, where the
// corresponding outer margin would be
void kernel backprop_floats_withscratch_dobias_striped(
        const float learningRateMultiplier, const int batchSize,
        global const float *gradOutput, global const float *images,
        global float *gradWeights,
        #ifdef BIASED
        global float *gradBiasWeights,
        #endif
        local float *_errorStripe, local float *_imageStripe
) {
    // gHalfFilterSize
    // gInputSize
    //
    // gInputStripeMarginRows => basically equal to gHalfFilterSize
    // gInputStripeInnerNumRows = gInputSize / gNumStripes
    // gInputStripeOuterNumRows = gInputStripeInnerNumRows + 2 * gHalfFilterSize (note: one row less than
    // if we just added gFilterSize)
    // gInputStripeInnerSize = gInputStripeInnerNumRows * gInputSize
    // gInputStripeOuterSize = gInputStripeOuterNumRows * gInputSize
    // gInputStripeMarginSize = gInputStripeMarginRows * gInputSize
    //
    // gOutputStripeNumRows
    // gOutputStripeSize

    const int globalId = get_global_id(0);
    const int localId = get_local_id(0);
    const int workgroupId = get_group_id(0);
    const int workgroupSize = get_local_size(0);

    const int filterRow = localId / gFilterSize;
    const int filterCol = localId % gFilterSize;

    const int outPlane = workgroupId / gInputPlanes;
    const int upstreamPlane = workgroupId % gInputPlanes;

    // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
    // aggregate over: [outRow][outCol][n]
    float thiswchange = 0;
    #ifdef BIASED
    float thisbiaschange = 0;
    #endif
    const int numLoopsForImageStripe = (gInputStripeOuterSize + workgroupSize - 1) / workgroupSize;
    const int numLoopsForErrorStripe = (gOutputSizeSquared + workgroupSize - 1) / workgroupSize;
    for (int n = 0; n < batchSize; n++) {
        const int imageImageGlobalOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
        const int imageImageGlobalOffsetAfter = imageImageGlobalOffset + gInputSizeSquared;
        const int errorImageGlobalOffset = (n * gNumFilters + outPlane) * gOutputSizeSquared;
        const int errorImageGlobalOffsetAfter = errorImageGlobalOffset + gOutputSizeSquared;
        for (int stripe = 0; stripe < gNumStripes; stripe++) {
            const int imageStripeInnerOffset = imageImageGlobalOffset + stripe * gInputStripeInnerSize;
            const int imageStripeOuterOffset = imageStripeInnerOffset - gInputStripeMarginSize;
            // need to fetch the image, but it's bigger than us, so will need to loop...
            barrier(CLK_LOCAL_MEM_FENCE);
            for (int i = 0; i < numLoopsForImageStripe; i++) {
                int thisOffset = i * workgroupSize + localId;
                int thisGlobalImagesOffset = imageStripeOuterOffset + thisOffset;
                bool process = thisOffset < gInputStripeOuterSize
                    && thisGlobalImagesOffset >= imageImageGlobalOffset
                    && thisGlobalImagesOffset < imageImageGlobalOffsetAfter;
                if (process) {
                    _imageStripe[thisOffset] = images[ thisGlobalImagesOffset ];
                }
            }
            int errorStripeOffset = errorImageGlobalOffset + stripe * gOutputStripeSize;
            for (int i = 0; i < numLoopsForErrorStripe; i++) {
                int thisOffset = i * workgroupSize + localId;
                int globalErrorsOffset = errorStripeOffset + thisOffset;
                bool process = thisOffset < gOutputStripeSize
                    && globalErrorsOffset < errorImageGlobalOffsetAfter;
                if (process) {
                    _errorStripe[thisOffset ] = gradOutput[globalErrorsOffset];
                }
            }
            const int stripeOutRowStart = stripe * gOutputStripeNumRows;
            const int stripeOutRowEndExcl = stripeOutRowStart + gOutputStripeNumRows;
            barrier(CLK_LOCAL_MEM_FENCE);
            // if (localId == 13) {
            // for (int i = 0; i < 12; i++) {
            // gradWeights[100 + stripe * 12 + i ] = _errorStripe[i * gOutputSize];
            // }
            // for (int i = 0; i < 20; i++) {
            // gradWeights[200 + stripe * 20 + i ] = _imageStripe[i * gInputSize];
            // }
            // }
            if (localId < gFilterSizeSquared) {
                for (int outRow = stripeOutRowStart; outRow < stripeOutRowEndExcl; outRow++) {
                    int upstreamRow = outRow - gMargin + filterRow;
                    for (int outCol = 0; outCol < gOutputSize; outCol++) {
                        int upstreamCol = outCol - gMargin + filterCol;
                        bool proceed =
                            upstreamRow >= 0 && upstreamCol >= 0
                            && upstreamRow < gInputSize && upstreamCol < gInputSize
                            && outRow < gOutputSize;
                        if (proceed) {
                            int resultIndex = outRow * gOutputSize + outCol;
                            float error = _errorStripe[resultIndex - stripe * gOutputStripeSize];
                            int upstreamDataIndex = upstreamRow * gInputSize + upstreamCol;
                            float upstreamResult = _imageStripe[upstreamDataIndex + gInputStripeMarginSize
                                - stripe * gInputStripeInnerSize ];
                            thiswchange += upstreamResult * error;
                            #ifdef BIASED
                            thisbiaschange += error;
                            #endif
                        }
                    }
                }
            }
        }
    }
    if (localId < gFilterSizeSquared) {
        gradWeights[ workgroupId * gFilterSizeSquared + localId ] = learningRateMultiplier * thiswchange;
        // weightChanges[ workgroupId * gFilterSizeSquared + localId ] = workgroupId;
    }
    #ifdef BIASED
    bool writeBias = upstreamPlane == 0 && filterRow == gMargin && filterCol == gMargin;
    if (writeBias) {
        gradBiasWeights[outPlane] = learningRateMultiplier * thisbiaschange;
    }
    #endif
    // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
    // aggregate over: [outRow][outCol][n]
}
  4876.  
  4877. Something went wrong, code -55
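Note on "code -55": in the Khronos CL/cl.h header, -55 is CL_INVALID_WORK_ITEM_SIZE. The sketch below decodes the code and checks the work-group size this kernel appears to need (one work item per filter element, so gFilterSizeSquared=576 from the kernel 3 options) against a device limit. The limit of 256 is an assumption for illustration; the log does not print the device's actual maximum. The helper names are hypothetical and not part of DeepCL.

```python
# A few OpenCL status codes, with the values defined in the Khronos CL/cl.h header.
CL_ERROR_NAMES = {
    0: "CL_SUCCESS",
    -5: "CL_OUT_OF_RESOURCES",
    -11: "CL_BUILD_PROGRAM_FAILURE",
    -54: "CL_INVALID_WORK_GROUP_SIZE",
    -55: "CL_INVALID_WORK_ITEM_SIZE",
}


def describe_cl_error(code):
    """Return the symbolic name for an OpenCL status code, if known."""
    return CL_ERROR_NAMES.get(code, "unknown OpenCL error %d" % code)


def fits_device(work_items_needed, max_work_items):
    """True if a work-group of this size is within the device limit."""
    return work_items_needed <= max_work_items


# ASSUMPTION: 256 is a placeholder limit, not a value read from this log.
ASSUMED_MAX_WORK_ITEMS = 256

print(describe_cl_error(-55))
print(fits_device(24 * 24, ASSUMED_MAX_WORK_ITEMS))  # gFilterSize=24
print(fits_device(5 * 5, ASSUMED_MAX_WORK_ITEMS))    # gFilterSize=5
```

Under that assumed limit, the 24x24 filter would not fit in one work-group while the 5x5 filter would, which is consistent with kernel 3 failing here and running in 0ms for the gFilterSize=5 cases later in the log.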
  4878. calcGradWeights try kernel 4
  4879. ... seems valid
  4880. BackpropWeightsAuto: kernel 4 1014ms
  4881. calcGradWeights try kernel 2
  4882. ... seems valid
  4883. BackpropWeightsAuto: kernel 2 0ms
  4884. forward kernel 0: cannot be used
  4885. forward kernel 1 time: 15ms
  4886. forward kernel 2: cannot be used
  4887. forward kernel 3: cannot be used
  4888. forward kernel 4 time: 0ms
  4889. forward kernel 5: cannot be used
  4890. forward kernel 6: cannot be used
  4891. forward kernel 7 time: 562ms
  4892. forward layer selected kernel 4
  4893. forward try kernel 4
  4894. ... seems valid
  4895. ForwardAuto: kernel 4 0ms
  4896. forward try kernel 5
  4897. cl/forward_fc_wgperrow.cl build log:
  4898. "C:\Users\pz\AppData\Local\Temp\OCLB324.tmp.cl", line 75: warning: variable
  4899. "loopsPerExample" was declared but never referenced
  4900. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  4901. ^
  4902.  
  4903.  
  4904. ... seems valid
  4905. ForwardAuto: kernel 5 0ms
  4906. backward kernel 0: cannot be used
  4907. backward kernel 1 time: 0ms
  4908. backward kernel 2: cannot be used
  4909. backward kernel 3 time: 562ms
  4910. backward layer selected kernel 1
  4911. calcGradWeights kernel 0: cannot be used
  4912. calcGradWeights kernel 1 time: 0ms
  4913. calcGradWeights kernel 2: cannot be used
  4914. calcGradWeights kernel 3: cannot be used
  4915. calcGradWeights kernel 4 time: 1014ms
  4916. calcGradWeights layer selected kernel 1
  4917. calcGradWeights try kernel 3
  4918. options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=28 -D gInputSizeSquared=784 -D gNumFilters=10 -D gFilterSize=5 -D gHalfFilterSize=2 -D gFilterSizeSquared=25 -D gNumOutputPlanes=10 -D gOutputPlanes=10 -D gOutputSize=24 -D gOutputSizeSquared=576 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=4 -DgInputStripeInnerNumRows=28 -DgInputStripeOuterNumRows=36 -DgInputStripeInnerSize=784 -DgInputStripeOuterSize=1008 -DgInputStripeMarginSize=112 -DgOutputStripeNumRows=24 -DgOutputStripeSize=576
  4919. ... seems valid
  4920. BackpropWeightsAuto: kernel 3 0ms
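Note on the stripe defines: the -DgInputStripe... values in the options line above can be reproduced from gInputSize, gFilterSize, gOutputSize and gNumStripes. The sketch below is inferred from the logged numbers only, not from the DeepCL host code. The logged values match a margin of gFilterSize - 1 rows, whereas the kernel comment describes the margin as roughly gHalfFilterSize. Only gNumStripes=1 appears in this log, so the division by the stripe count is untested for other values.

```python
def stripe_defines(input_size, filter_size, output_size, num_stripes=1):
    """Reproduce the stripe-related -D values seen in the log.

    ASSUMPTION: margin rows = filter_size - 1, inferred from the logged
    options; only num_stripes == 1 has been checked against the log.
    """
    margin_rows = filter_size - 1
    inner_rows = input_size // num_stripes
    outer_rows = inner_rows + 2 * margin_rows
    output_rows = output_size // num_stripes
    return {
        "gInputStripeMarginRows": margin_rows,
        "gInputStripeInnerNumRows": inner_rows,
        "gInputStripeOuterNumRows": outer_rows,
        "gInputStripeInnerSize": inner_rows * input_size,
        "gInputStripeOuterSize": outer_rows * input_size,
        "gInputStripeMarginSize": margin_rows * input_size,
        "gOutputStripeNumRows": output_rows,
        "gOutputStripeSize": output_rows * output_size,
    }


# gInputSize=28, gFilterSize=5, gOutputSize=24 (the options line above)
for name, value in stripe_defines(28, 5, 24).items():
    print("-D%s=%d" % (name, value))
```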
  4921. forward try kernel 6
  4922. ... seems valid
  4923. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
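Note on "-1984MB max alloc size": a negative size is what one would expect if a byte count above 2 GiB were held in a signed 32-bit integer before being converted to MB. The sketch below shows that wraparound. The 2112 MiB figure is simply the value that reproduces the logged number under this assumption; it is not read from the device, and the log does not confirm that this is the cause.

```python
MIB = 1024 * 1024


def as_int32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


# HYPOTHETICAL device limit chosen to match the logged figure.
candidate_bytes = 2112 * MIB

wrapped = as_int32(candidate_bytes)
# The wrapped value is an exact multiple of MIB, so floor division and
# C-style truncating division give the same result here.
print(wrapped // MIB)
```

If this is what happened, holding the size in a 64-bit unsigned type (OpenCL reports CL_DEVICE_MAX_MEM_ALLOC_SIZE as a cl_ulong) would avoid the negative value and the "memallocsize too small" rejection that follows from it.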
  4924. forward try kernel 7
  4925. ... seems valid
  4926. ForwardAuto: kernel 7 499ms
  4927. forward kernel 0: cannot be used
  4928. forward kernel 1 time: 0ms
  4929. forward kernel 2 time: 0ms
  4930. forward kernel 3 time: 0ms
  4931. forward kernel 4 time: 0ms
  4932. forward kernel 5 time: 0ms
  4933. forward kernel 6: cannot be used
  4934. forward kernel 7 time: 499ms
  4935. forward layer selected kernel 1
  4936. calcGradWeights try kernel 4
  4937. ... seems valid
  4938. BackpropWeightsAuto: kernel 4 702ms
  4939. calcGradWeights kernel 0: cannot be used
  4940. calcGradWeights kernel 1 time: 0ms
  4941. calcGradWeights kernel 2 time: 0ms
  4942. calcGradWeights kernel 3 time: 0ms
  4943. calcGradWeights kernel 4 time: 702ms
  4944. calcGradWeights layer selected kernel 1
  4945. batch time 12714 ms
  4946. dump enabled=0
  4947. clblas teardown
  4948. [ OK ] testsinglebatch.imagesize28_filtersize5 (13151 ms)
  4949. [ RUN ] testsinglebatch.imagesize5_filtersize3_batchsize2_softmax
  4950. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  4951. Using OpenCL device: Tahiti
  4952. initializing clblas
  4953. layer 0:InputLayer{ outputPlanes=1 outputSize=5 }
  4954. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=5 numFilters=5 filterSize=3 outputSize=5 padZeros=1 biased=1 skip=0} }
  4955. layer 2:ActivationLayer{ RELU }
  4956. layer 3:ConvolutionalLayer{ LayerDimensions{ inputPlanes=5 inputSize=5 numFilters=5 filterSize=3 outputSize=5 padZeros=1 biased=1 skip=0} }
  4957. layer 4:ActivationLayer{ RELU }
  4958. layer 5:FullyConnectedLayer{ numPlanes=5 imageSize=1 }
  4959. layer 6:SoftMaxLayer{ perPlane=0 numPlanes=5 imageSize=1 }
  4960. Parameters overview: (skipping 4 layers with 0 params)
  4961. layer 1: params=50 5.5%
  4962. layer 3: params=230 25.3%
  4963. layer 5: params=630 69.2%
  4964. TOTAL : params=910
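Note on the parameter overview: the counts above match numFilters * (inputPlanes * filterSize^2 + 1) for biased layers. The sketch below recomputes them. Treating the fully connected layer as a convolution whose filter covers the whole 5x5 input is an assumption, though it agrees with the gFilterSize=5, gInputSize=5 options logged later for this network. The function name is hypothetical.

```python
def conv_params(input_planes, num_filters, filter_size, biased=True):
    """Weights plus one bias per filter for a convolutional layer."""
    weights = num_filters * input_planes * filter_size * filter_size
    return weights + (num_filters if biased else 0)


params = {
    1: conv_params(input_planes=1, num_filters=5, filter_size=3),
    3: conv_params(input_planes=5, num_filters=5, filter_size=3),
    # ASSUMPTION: fully connected layer modelled as a 5x5 filter over a 5x5 input.
    5: conv_params(input_planes=5, num_filters=5, filter_size=5),
}
total = sum(params.values())

for layer, count in params.items():
    print("layer %d: params=%d %.1f%%" % (layer, count, 100.0 * count / total))
print("TOTAL : params=%d" % total)
```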
  4965. forward try kernel 0
  4966. ... not plausibly optimal, skipping
  4967. forward try kernel 1
  4968. ... seems valid
  4969. ForwardAuto: kernel 1 0ms
  4970. forward try kernel 0
  4971. ... not plausibly optimal, skipping
  4972. forward try kernel 1
  4973. ... seems valid
  4974. ForwardAuto: kernel 1 0ms
  4975. forward try kernel 0
  4976. ... not plausibly optimal, skipping
  4977. forward try kernel 1
  4978. ... seems valid
  4979. ForwardAuto: kernel 1 0ms
  4980. forward try kernel 2
  4981. ... seems valid
  4982. ForwardAuto: kernel 2 0ms
  4983. forward try kernel 2
  4984. ... seems valid
  4985. ForwardAuto: kernel 2 0ms
  4986. forward try kernel 2
  4987. ... seems valid
  4988. ForwardAuto: kernel 2 0ms
  4989. backward try kernel 0
  4990. ... not plausibly optimal, skipping
  4991. backward try kernel 1
  4992. ... seems valid
  4993. BackwardAuto: kernel 1 0ms
  4994. calcGradWeights try kernel 0
  4995. ... not plausibly optimal, skipping
  4996. calcGradWeights try kernel 1
  4997. ... seems valid
  4998. BackpropWeightsAuto: kernel 1 0ms
  4999. backward try kernel 0
  5000. ... not plausibly optimal, skipping
  5001. backward try kernel 1
  5002. ... seems valid
  5003. BackwardAuto: kernel 1 0ms
  5004. calcGradWeights try kernel 0
  5005. ... not plausibly optimal, skipping
  5006. calcGradWeights try kernel 1
  5007. ... seems valid
  5008. BackpropWeightsAuto: kernel 1 0ms
  5009. calcGradWeights try kernel 0
  5010. ... not plausibly optimal, skipping
  5011. calcGradWeights try kernel 1
  5012. ... seems valid
  5013. BackpropWeightsAuto: kernel 1 16ms
  5014. layer 1 offset: 0
  5015. forward try kernel 3
  5016. ... seems valid
  5017. ForwardAuto: kernel 3 0ms
  5018. forward try kernel 3
  5019. ... seems valid
  5020. ForwardAuto: kernel 3 0ms
  5021. forward try kernel 3
  5022. ... seems valid
  5023. ForwardAuto: kernel 3 0ms
  5024. layer 1
  5025. from w: 0
  5026. actual: -3.3898
  5027. layer 2 offset: 50
  5028. layer 3 offset: 50
  5029. forward try kernel 4
  5030. ... seems valid
  5031. ForwardAuto: kernel 4 0ms
  5032. forward try kernel 4
  5033. ... seems valid
  5034. ForwardAuto: kernel 4 0ms
  5035. forward try kernel 4
  5036. ... seems valid
  5037. ForwardAuto: kernel 4 16ms
  5038. layer 3
  5039. from w: 0
  5040. actual: -3.3898
  5041. layer 4 offset: 280
  5042. layer 5 offset: 280
  5043. forward try kernel 5
  5044. ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
  5045. ... not valid
  5046. forward try kernel 6
  5047. ... seems valid
  5048. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  5049. forward try kernel 7
  5050. ... seems valid
  5051. ForwardAuto: kernel 7 687ms
  5052. forward try kernel 5
  5053. ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
  5054. ... not valid
  5055. forward try kernel 6
  5056. ... seems valid
  5057. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  5058. forward try kernel 7
  5059. ... seems valid
  5060. ForwardAuto: kernel 7 109ms
  5061. forward try kernel 5
  5062. cl/forward_fc_wgperrow.cl build log:
  5063. "C:\Users\pz\AppData\Local\Temp\OCLCC8F.tmp.cl", line 75: warning: variable
  5064. "loopsPerExample" was declared but never referenced
  5065. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  5066. ^
  5067.  
  5068.  
  5069. ... seems valid
  5070. ForwardAuto: kernel 5 0ms
  5071. layer 5
  5072. from w: 0
  5073. actual: -3.38981
  5074. layer 6 offset: 910
  5075. forward kernel 0: cannot be used
  5076. forward kernel 1 time: 0ms
  5077. forward kernel 2 time: 0ms
  5078. forward kernel 3 time: 0ms
  5079. forward kernel 4 time: 0ms
  5080. forward kernel 5: cannot be used
  5081. forward kernel 6: cannot be used
  5082. forward kernel 7 time: 687ms
  5083. forward layer selected kernel 1
  5084. forward kernel 0: cannot be used
  5085. forward kernel 1 time: 0ms
  5086. forward kernel 2 time: 0ms
  5087. forward kernel 3 time: 0ms
  5088. forward kernel 4 time: 0ms
  5089. forward kernel 5: cannot be used
  5090. forward kernel 6: cannot be used
  5091. forward kernel 7 time: 109ms
  5092. forward layer selected kernel 1
  5093. forward try kernel 6
  5094. ... seems valid
  5095. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  5096. forward try kernel 7
  5097. ... seems valid
  5098. ForwardAuto: kernel 7 328ms
  5099. full thisloss: 3.3898
  5100. forward kernel 0: cannot be used
  5101. forward kernel 1 time: 0ms
  5102. forward kernel 2 time: 0ms
  5103. forward kernel 3 time: 0ms
  5104. forward kernel 4 time: 16ms
  5105. forward kernel 5 time: 0ms
  5106. forward kernel 6: cannot be used
  5107. forward kernel 7 time: 328ms
  5108. forward layer selected kernel 1
  5109. backward try kernel 2
  5110. ... seems valid
  5111. BackwardAuto: kernel 2 0ms
  5112. calcGradWeights try kernel 2
  5113. ... seems valid
  5114. BackpropWeightsAuto: kernel 2 0ms
  5115. backward try kernel 2
  5116. ... seems valid
  5117. BackwardAuto: kernel 2 0ms
  5118. calcGradWeights try kernel 2
  5119. ... seems valid
  5120. BackpropWeightsAuto: kernel 2 0ms
  5121. calcGradWeights try kernel 2
  5122. ... seems valid
  5123. BackpropWeightsAuto: kernel 2 0ms
  5124. layer 1 offset: 0
  5125. layer 1
  5126. from w: 0
  5127. actual: 0
  5128. layer 2 offset: 50
  5129. layer 3 offset: 50
  5130. layer 3
  5131. from w: 0
  5132. actual: 0
  5133. layer 4 offset: 280
  5134. layer 5 offset: 280
  5135. layer 5
  5136. from w: 0
  5137. actual: 0
  5138. layer 6 offset: 910
  5139. full thisloss: 3.3898
  5140. backward try kernel 3
  5141. ... seems valid
  5142. BackwardAuto: kernel 3 436ms
  5143. calcGradWeights try kernel 3
  5144. options: -D BIASED -D gNumInputPlanes=5 -D gInputPlanes=5 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=5 -D gFilterSize=5 -D gHalfFilterSize=2 -D gFilterSizeSquared=25 -D gNumOutputPlanes=5 -D gOutputPlanes=5 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=4 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=13 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=65 -DgInputStripeMarginSize=20 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  5145. ... seems valid
  5146. BackpropWeightsAuto: kernel 3 0ms
  5147. backward try kernel 3
  5148. ... seems valid
  5149. BackwardAuto: kernel 3 1108ms
  5150. calcGradWeights try kernel 3
  5151. options: -D BIASED -D gNumInputPlanes=5 -D gInputPlanes=5 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=5 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=5 -D gOutputPlanes=5 -D gOutputSize=5 -D gOutputSizeSquared=25 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=9 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=45 -DgInputStripeMarginSize=10 -DgOutputStripeNumRows=5 -DgOutputStripeSize=25
  5152. ... seems valid
  5153. BackpropWeightsAuto: kernel 3 15ms
  5154. calcGradWeights try kernel 3
  5155. options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=5 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=5 -D gOutputPlanes=5 -D gOutputSize=5 -D gOutputSizeSquared=25 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=9 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=45 -DgInputStripeMarginSize=10 -DgOutputStripeNumRows=5 -DgOutputStripeSize=25
  5156. ... seems valid
  5157. BackpropWeightsAuto: kernel 3 0ms
  5158. layer 1 offset: 0
  5159. layer 1
  5160. from w: 0
  5161. actual: 0
  5162. layer 2 offset: 50
  5163. layer 3 offset: 50
  5164. layer 3
  5165. from w: 0
  5166. actual: 0
  5167. layer 4 offset: 280
  5168. layer 5 offset: 280
  5169. layer 5
  5170. from w: 0
  5171. actual: 0
  5172. layer 6 offset: 910
  5173. full thisloss: 3.3898
  5174. backward kernel 0: cannot be used
  5175. backward kernel 1 time: 0ms
  5176. backward kernel 2 time: 0ms
  5177. backward kernel 3 time: 436ms
  5178. backward layer selected kernel 1
  5179. calcGradWeights try kernel 4
  5180. ... seems valid
  5181. BackpropWeightsAuto: kernel 4 858ms
  5182. backward kernel 0: cannot be used
  5183. backward kernel 1 time: 0ms
  5184. backward kernel 2 time: 0ms
  5185. backward kernel 3 time: 1108ms
  5186. backward layer selected kernel 1
  5187. calcGradWeights try kernel 4
  5188. ... seems valid
  5189. BackpropWeightsAuto: kernel 4 1076ms
  5190. calcGradWeights try kernel 4
  5191. ... seems valid
  5192. BackpropWeightsAuto: kernel 4 109ms
  5193. layer 1 offset: 0
  5194. layer 1
  5195. from w: 0
  5196. actual: 0
  5197. layer 2 offset: 50
  5198. layer 3 offset: 50
  5199. layer 3
  5200. from w: 0
  5201. actual: 0
  5202. layer 4 offset: 280
  5203. layer 5 offset: 280
  5204. layer 5
  5205. from w: 0
  5206. actual: 0
  5207. layer 6 offset: 910
  5208. full thisloss: 3.3898
  5209. calcGradWeights kernel 0: cannot be used
  5210. calcGradWeights kernel 1 time: 0ms
  5211. calcGradWeights kernel 2 time: 0ms
  5212. calcGradWeights kernel 3 time: 0ms
  5213. calcGradWeights kernel 4 time: 858ms
  5214. calcGradWeights layer selected kernel 1
  5215. calcGradWeights kernel 0: cannot be used
  5216. calcGradWeights kernel 1 time: 0ms
  5217. calcGradWeights kernel 2 time: 0ms
  5218. calcGradWeights kernel 3 time: 15ms
  5219. calcGradWeights kernel 4 time: 1076ms
  5220. calcGradWeights layer selected kernel 1
  5221. calcGradWeights kernel 0: cannot be used
  5222. calcGradWeights kernel 1 time: 16ms
  5223. calcGradWeights kernel 2 time: 0ms
  5224. calcGradWeights kernel 3 time: 0ms
  5225. calcGradWeights kernel 4 time: 109ms
  5226. calcGradWeights layer selected kernel 2
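Note on kernel selection: the timing summaries in this log are consistent with picking the usable kernel with the lowest time and, on a tie, the lowest index. The sketch below expresses that rule. It is inferred from the log lines only and may not match how DeepCL actually chooses. Timings have millisecond resolution, so ties at 0ms are common.

```python
def select_kernel(timings):
    """Pick the fastest usable kernel; ties go to the lowest index.

    timings maps kernel index to a time in ms, or None for "cannot be used".
    """
    usable = [(ms, index) for index, ms in timings.items() if ms is not None]
    if not usable:
        raise ValueError("no usable kernel")
    # Tuples compare by time first, then by index.
    return min(usable)[1]


# The calcGradWeights summary immediately above.
print(select_kernel({0: None, 1: 16, 2: 0, 3: 0, 4: 109}))
```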
  5227. layer 1 offset: 0
  5228. layer 1
  5229. from w: 0
  5230. actual: 0
  5231. layer 2 offset: 50
  5232. layer 3 offset: 50
  5233. layer 3
  5234. from w: 0
  5235. actual: 0
  5236. layer 4 offset: 280
  5237. layer 5 offset: 280
  5238. layer 5
  5239. from w: 0
  5240. actual: 0
  5241. layer 6 offset: 910
  5242. full thisloss: 3.3898
  5243. layer 1 offset: 0
  5244. layer 1
  5245. from w: 0
  5246. actual: 0
  5247. layer 2 offset: 50
  5248. layer 3 offset: 50
  5249. layer 3
  5250. from w: 0
  5251. actual: 0
  5252. layer 4 offset: 280
  5253. layer 5 offset: 280
  5254. layer 5
  5255. from w: 0
  5256. actual: 0
  5257. layer 6 offset: 910
  5258. full thisloss: 3.3898
  5259. layer 1 offset: 0
  5260. layer 1
  5261. from w: 0
  5262. actual: 0
  5263. layer 2 offset: 50
  5264. layer 3 offset: 50
  5265. layer 3
  5266. from w: 0
  5267. actual: 0
  5268. layer 4 offset: 280
  5269. layer 5 offset: 280
  5270. layer 5
  5271. from w: 0
  5272. actual: 0
  5273. layer 6 offset: 910
  5274. full thisloss: 3.3898
  5275. layer 1 offset: 0
  5276. layer 1
  5277. from w: 0
  5278. actual: 0
  5279. layer 2 offset: 50
  5280. layer 3 offset: 50
  5281. layer 3
  5282. from w: 0
  5283. actual: 0
  5284. layer 4 offset: 280
  5285. layer 5 offset: 280
  5286. layer 5
  5287. from w: 0
  5288. actual: 0
  5289. layer 6 offset: 910
  5290. full thisloss: 3.3898
  5291. layer 1 offset: 0
  5292. layer 1
  5293. from w: 0
  5294. actual: 0
  5295. layer 2 offset: 50
  5296. layer 3 offset: 50
  5297. layer 3
  5298. from w: 0
  5299. actual: 0
  5300. layer 4 offset: 280
  5301. layer 5 offset: 280
  5302. layer 5
  5303. from w: 0
  5304. actual: 0
  5305. layer 6 offset: 910
  5306. full thisloss: 3.3898
  5307. layer 1 offset: 0
  5308. layer 1
  5309. from w: 0
  5310. actual: 0
  5311. layer 2 offset: 50
  5312. layer 3 offset: 50
  5313. layer 3
  5314. from w: 0
  5315. actual: 0
  5316. layer 4 offset: 280
  5317. layer 5 offset: 280
  5318. layer 5
  5319. from w: 0
  5320. actual: 0
  5321. layer 6 offset: 910
  5322. full thisloss: 3.3898
  5323. layer 1 offset: 0
  5324. layer 1
  5325. from w: 0
  5326. actual: 0
  5327. layer 2 offset: 50
  5328. layer 3 offset: 50
  5329. layer 3
  5330. from w: 0
  5331. actual: 0
  5332. layer 4 offset: 280
  5333. layer 5 offset: 280
  5334. layer 5
  5335. from w: 0
  5336. actual: 0
  5337. layer 6 offset: 910
  5338. full thisloss: 3.3898
  5339. layer 1 offset: 0
  5340. layer 1
  5341. from w: 0
  5342. actual: 0
  5343. layer 2 offset: 50
  5344. layer 3 offset: 50
  5345. layer 3
  5346. from w: 0
  5347. actual: 0
  5348. layer 4 offset: 280
  5349. layer 5 offset: 280
  5350. layer 5
  5351. from w: 0
  5352. actual: 0
  5353. layer 6 offset: 910
  5354. full thisloss: 3.3898
  5355. layer 1 offset: 0
  5356. layer 1
  5357. from w: 0
  5358. actual: 0
  5359. layer 2 offset: 50
  5360. layer 3 offset: 50
  5361. layer 3
  5362. from w: 0
  5363. actual: 0
  5364. layer 4 offset: 280
  5365. layer 5 offset: 280
  5366. layer 5
  5367. from w: 0
  5368. actual: 0
  5369. layer 6 offset: 910
  5370. full thisloss: 3.3898
  5371. layer 1 offset: 0
  5372. layer 1
  5373. from w: 0
  5374. actual: 0
  5375. layer 2 offset: 50
  5376. layer 3 offset: 50
  5377. layer 3
  5378. from w: 0
  5379. actual: 0
  5380. layer 4 offset: 280
  5381. layer 5 offset: 280
  5382. layer 5
  5383. from w: 0
  5384. actual: 0
  5385. layer 6 offset: 910
  5386. full thisloss: 3.3898
  5387. layer 1 offset: 0
  5388. layer 1
  5389. from w: 0
  5390. actual: 0
  5391. layer 2 offset: 50
  5392. layer 3 offset: 50
  5393. layer 3
  5394. from w: 0
  5395. actual: 0
  5396. layer 4 offset: 280
  5397. layer 5 offset: 280
  5398. layer 5
  5399. from w: 0
  5400. actual: 0
  5401. layer 6 offset: 910
  5402. full thisloss: 3.3898
  5403. layer 1 offset: 0
  5404. layer 1
  5405. from w: 0
  5406. actual: 0
  5407. layer 2 offset: 50
  5408. layer 3 offset: 50
  5409. layer 3
  5410. from w: 0
  5411. actual: 0
  5412. layer 4 offset: 280
  5413. layer 5 offset: 280
  5414. layer 5
  5415. from w: 0
  5416. actual: 0
  5417. layer 6 offset: 910
  5418. full thisloss: 3.3898
  5419. layer 1 offset: 0
  5420. layer 1
  5421. from w: 0
  5422. actual: 0
  5423. layer 2 offset: 50
  5424. layer 3 offset: 50
  5425. layer 3
  5426. from w: 0
  5427. actual: 0
  5428. layer 4 offset: 280
  5429. layer 5 offset: 280
  5430. layer 5
  5431. from w: 0
  5432. actual: 0
  5433. layer 6 offset: 910
  5434. full thisloss: 3.3898
  5435. layer 1 offset: 0
  5436. layer 1
  5437. from w: 0
  5438. actual: 0
  5439. layer 2 offset: 50
  5440. layer 3 offset: 50
  5441. layer 3
  5442. from w: 0
  5443. actual: 0
  5444. layer 4 offset: 280
  5445. layer 5 offset: 280
  5446. layer 5
  5447. from w: 0
  5448. actual: 0
  5449. layer 6 offset: 910
  5450. full thisloss: 3.3898
  5451. layer 1 offset: 0
  5452. layer 1
  5453. from w: 0
  5454. actual: 0
  5455. layer 2 offset: 50
  5456. layer 3 offset: 50
  5457. layer 3
  5458. from w: 0
  5459. actual: 0
  5460. layer 4 offset: 280
  5461. layer 5 offset: 280
  5462. layer 5
  5463. from w: 0
  5464. actual: 0
  5465. layer 6 offset: 910
  5466. full thisloss: 3.3898
  5467. layer 1 offset: 0
  5468. layer 1
  5469. from w: 0
  5470. actual: 0
  5471. layer 2 offset: 50
  5472. layer 3 offset: 50
  5473. layer 3
  5474. from w: 0
  5475. actual: 0
  5476. layer 4 offset: 280
  5477. layer 5 offset: 280
  5478. layer 5
  5479. from w: 0
  5480. actual: 0
  5481. layer 6 offset: 910
  5482. full thisloss: 3.3898
  5483. batch time 8252 ms
  5484. dump enabled=0
  5485. clblas teardown
  5486. [ OK ] testsinglebatch.imagesize5_filtersize3_batchsize2_softmax (8705 ms)
  5487. [ RUN ] testsinglebatch.imagesize4_filtersize3_batchsize2_pooling
  5488. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  5489. Using OpenCL device: Tahiti
  5490. initializing clblas
  5491. layer 0:InputLayer{ outputPlanes=1 outputSize=12 }
  5492. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=12 numFilters=5 filterSize=3 outputSize=12 padZeros=1 biased=1 skip=0} }
  5493. layer 2:ActivationLayer{ RELU }
  5494. layer 3:PoolingLayer{ inputPlanes=5 inputSize=12 poolingSize=2 }
  5495. layer 4:ConvolutionalLayer{ LayerDimensions{ inputPlanes=5 inputSize=6 numFilters=5 filterSize=3 outputSize=6 padZeros=1 biased=1 skip=0} }
  5496. layer 5:ActivationLayer{ RELU }
  5497. layer 6:PoolingLayer{ inputPlanes=5 inputSize=6 poolingSize=2 }
  5498. layer 7:FullyConnectedLayer{ numPlanes=5 imageSize=1 }
  5499. layer 8:SoftMaxLayer{ perPlane=0 numPlanes=5 imageSize=1 }
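Note on layer sizes: the inputSize and outputSize values in the layer list above follow from the 12x12 input. The sketch below walks the sizes through the network. The convolution formula assumes an odd filter size and skip=0, which is all this log shows; the function names are hypothetical.

```python
def conv_output_size(input_size, filter_size, pad_zeros):
    """Output edge length of a convolution.

    ASSUMPTION: odd filter_size and skip=0, as in every layer in this log.
    """
    if pad_zeros:
        return input_size
    return input_size - filter_size + 1


def pool_output_size(input_size, pooling_size):
    """Output edge length of a non-overlapping pooling layer."""
    return input_size // pooling_size


size = 12                                      # layer 0: InputLayer
size = conv_output_size(size, 3, True)         # layer 1: stays 12
size = pool_output_size(size, 2)               # layer 3: 12 -> 6
size = conv_output_size(size, 3, True)         # layer 4: stays 6
size = pool_output_size(size, 2)               # layer 6: 6 -> 3
print(size)
```

The unpadded case can be checked against an earlier test in this log: conv_output_size(28, 5, False) gives 24, matching gInputSize=28, gFilterSize=5, gOutputSize=24.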
  5500. Parameters overview: (skipping 6 layers with 0 params)
  5501. layer 1: params=50 9.8%
  5502. layer 4: params=230 45.1%
  5503. layer 7: params=230 45.1%
  5504. TOTAL : params=510
  5505. forward try kernel 0
  5506. ... not plausibly optimal, skipping
  5507. forward try kernel 1
  5508. ... seems valid
  5509. ForwardAuto: kernel 1 0ms
  5510. forward try kernel 0
  5511. ... not plausibly optimal, skipping
  5512. forward try kernel 1
  5513. ... seems valid
  5514. ForwardAuto: kernel 1 16ms
  5515. forward try kernel 0
  5516. ... not plausibly optimal, skipping
  5517. forward try kernel 1
  5518. ... seems valid
  5519. ForwardAuto: kernel 1 0ms
  5520. forward try kernel 2
  5521. ... seems valid
  5522. ForwardAuto: kernel 2 0ms
  5523. forward try kernel 2
  5524. ... seems valid
  5525. ForwardAuto: kernel 2 0ms
  5526. forward try kernel 2
  5527. ... seems valid
  5528. ForwardAuto: kernel 2 0ms
  5529. backward try kernel 0
  5530. ... not plausibly optimal, skipping
  5531. backward try kernel 1
  5532. ... seems valid
  5533. BackwardAuto: kernel 1 0ms
  5534. calcGradWeights try kernel 0
  5535. ... not plausibly optimal, skipping
  5536. calcGradWeights try kernel 1
  5537. ... seems valid
  5538. BackpropWeightsAuto: kernel 1 0ms
  5539. backward try kernel 0
  5540. ... not plausibly optimal, skipping
  5541. backward try kernel 1
  5542. ... seems valid
  5543. BackwardAuto: kernel 1 0ms
  5544. calcGradWeights try kernel 0
  5545. ... not plausibly optimal, skipping
  5546. calcGradWeights try kernel 1
  5547. ... seems valid
  5548. BackpropWeightsAuto: kernel 1 0ms
  5549. calcGradWeights try kernel 0
  5550. ... not plausibly optimal, skipping
  5551. calcGradWeights try kernel 1
  5552. ... seems valid
  5553. BackpropWeightsAuto: kernel 1 0ms
  5554. layer 1 offset: 0
  5555. forward try kernel 3
  5556. ... seems valid
  5557. ForwardAuto: kernel 3 0ms
  5558. forward try kernel 3
  5559. ... seems valid
  5560. ForwardAuto: kernel 3 0ms
  5561. forward try kernel 3
  5562. ... seems valid
  5563. ForwardAuto: kernel 3 0ms
  5564. layer 1
  5565. from w: 0
  5566. actual: -3.3063
  5567. layer 2 offset: 50
  5568. layer 3 offset: 50
  5569. layer 4 offset: 50
  5570. forward try kernel 4
  5571. ... seems valid
  5572. ForwardAuto: kernel 4 0ms
  5573. forward try kernel 4
  5574. ... seems valid
  5575. ForwardAuto: kernel 4 0ms
  5576. forward try kernel 4
  5577. ... seems valid
  5578. ForwardAuto: kernel 4 0ms
  5579. layer 4
  5580. from w: 0
  5581. actual: -3.3063
  5582. layer 5 offset: 280
  5583. layer 6 offset: 280
  5584. layer 7 offset: 280
  5585. forward try kernel 5
  5586. ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
  5587. ... not valid
  5588. forward try kernel 6
  5589. ... seems valid
  5590. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  5591. forward try kernel 7
  5592. ... seems valid
  5593. ForwardAuto: kernel 7 561ms
  5594. forward try kernel 5
  5595. ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
  5596. ... not valid
  5597. forward try kernel 6
  5598. ... seems valid
  5599. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  5600. forward try kernel 7
  5601. ... seems valid
  5602. ForwardAuto: kernel 7 562ms
  5603. forward try kernel 5
  5604. cl/forward_fc_wgperrow.cl build log:
  5605. "C:\Users\pz\AppData\Local\Temp\OCLF1F7.tmp.cl", line 75: warning: variable
  5606. "loopsPerExample" was declared but never referenced
  5607. const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
  5608. ^
  5609.  
  5610.  
  5611. ... seems valid
  5612. ForwardAuto: kernel 5 0ms
  5613. layer 7
  5614. from w: 0
  5615. actual: -3.3063
  5616. layer 8 offset: 510
  5617. forward kernel 0: cannot be used
  5618. forward kernel 1 time: 0ms
  5619. forward kernel 2 time: 0ms
  5620. forward kernel 3 time: 0ms
  5621. forward kernel 4 time: 0ms
  5622. forward kernel 5: cannot be used
  5623. forward kernel 6: cannot be used
  5624. forward kernel 7 time: 561ms
  5625. forward layer selected kernel 1
  5626. forward kernel 0: cannot be used
  5627. forward kernel 1 time: 16ms
  5628. forward kernel 2 time: 0ms
  5629. forward kernel 3 time: 0ms
  5630. forward kernel 4 time: 0ms
  5631. forward kernel 5: cannot be used
  5632. forward kernel 6: cannot be used
  5633. forward kernel 7 time: 562ms
  5634. forward layer selected kernel 2
  5635. forward try kernel 6
  5636. ... seems valid
  5637. ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
  5638. forward try kernel 7
  5639. ... seems valid
  5640. ForwardAuto: kernel 7 328ms
  5641. full thisloss: 3.3063
  5642. forward kernel 0: cannot be used
  5643. forward kernel 1 time: 0ms
  5644. forward kernel 2 time: 0ms
  5645. forward kernel 3 time: 0ms
  5646. forward kernel 4 time: 0ms
  5647. forward kernel 5 time: 0ms
  5648. forward kernel 6: cannot be used
  5649. forward kernel 7 time: 328ms
  5650. forward layer selected kernel 1
  5651. backward try kernel 2
  5652. ... seems valid
  5653. BackwardAuto: kernel 2 0ms
  5654. calcGradWeights try kernel 2
  5655. ... seems valid
  5656. BackpropWeightsAuto: kernel 2 0ms
  5657. backward try kernel 2
  5658. ... seems valid
  5659. BackwardAuto: kernel 2 0ms
  5660. calcGradWeights try kernel 2
  5661. ... seems valid
  5662. BackpropWeightsAuto: kernel 2 0ms
  5663. calcGradWeights try kernel 2
  5664. ... seems valid
  5665. BackpropWeightsAuto: kernel 2 0ms
  5666. layer 1 offset: 0
  5667. layer 1
  5668. from w: 0
  5669. actual: 0
  5670. layer 2 offset: 50
  5671. layer 3 offset: 50
  5672. layer 4 offset: 50
  5673. layer 4
  5674. from w: 0
  5675. actual: 0
  5676. layer 5 offset: 280
  5677. layer 6 offset: 280
  5678. layer 7 offset: 280
  5679. layer 7
  5680. from w: 0
  5681. actual: 0
  5682. layer 8 offset: 510
  5683. full thisloss: 3.3063
  5684. backward try kernel 3
  5685. ... seems valid
  5686. BackwardAuto: kernel 3 421ms
  5687. calcGradWeights try kernel 3
  5688. options: -D BIASED -D gNumInputPlanes=5 -D gInputPlanes=5 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=5 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=5 -D gOutputPlanes=5 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=21 -DgInputStripeMarginSize=6 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
  5689. ... seems valid
  5690. BackpropWeightsAuto: kernel 3 0ms
  5691. backward try kernel 3
  5692. ... seems valid
  5693. BackwardAuto: kernel 3 920ms
  5694. calcGradWeights try kernel 3
  5695. options: -D BIASED -D gNumInputPlanes=5 -D gInputPlanes=5 -D gInputSize=6 -D gInputSizeSquared=36 -D gNumFilters=5 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=5 -D gOutputPlanes=5 -D gOutputSize=6 -D gOutputSizeSquared=36 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=6 -DgInputStripeOuterNumRows=10 -DgInputStripeInnerSize=36 -DgInputStripeOuterSize=60 -DgInputStripeMarginSize=12 -DgOutputStripeNumRows=6 -DgOutputStripeSize=36
  5696. ... seems valid
  5697. BackpropWeightsAuto: kernel 3 0ms
  5698. calcGradWeights try kernel 3
  5699. options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=12 -D gInputSizeSquared=144 -D gNumFilters=5 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=5 -D gOutputPlanes=5 -D gOutputSize=12 -D gOutputSizeSquared=144 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=12 -DgInputStripeOuterNumRows=16 -DgInputStripeInnerSize=144 -DgInputStripeOuterSize=192 -DgInputStripeMarginSize=24 -DgOutputStripeNumRows=12 -DgOutputStripeSize=144
  5700. ... seems valid
  5701. BackpropWeightsAuto: kernel 3 0ms
  5702. layer 1 offset: 0
  5703. layer 1
  5704. from w: 0
  5705. actual: 0
  5706. layer 2 offset: 50
  5707. layer 3 offset: 50
  5708. layer 4 offset: 50
  5709. layer 4
  5710. from w: 0
  5711. actual: 0
  5712. layer 5 offset: 280
  5713. layer 6 offset: 280
  5714. layer 7 offset: 280
  5715. layer 7
  5716. from w: 0
  5717. actual: 0
  5718. layer 8 offset: 510
  5719. full thisloss: 3.3063
  5720. backward kernel 0: cannot be used
  5721. backward kernel 1 time: 0ms
  5722. backward kernel 2 time: 0ms
  5723. backward kernel 3 time: 421ms
  5724. backward layer selected kernel 1
  5725. calcGradWeights try kernel 4
  5726. ... seems valid
  5727. BackpropWeightsAuto: kernel 4 842ms
  5728. backward kernel 0: cannot be used
  5729. backward kernel 1 time: 0ms
  5730. backward kernel 2 time: 0ms
  5731. backward kernel 3 time: 920ms
  5732. backward layer selected kernel 1
  5733. calcGradWeights try kernel 4
  5734. ... seems valid
  5735. BackpropWeightsAuto: kernel 4 889ms
  5736. calcGradWeights try kernel 4
  5737. ... seems valid
  5738. BackpropWeightsAuto: kernel 4 546ms
  5739. layer 1 offset: 0
  5740. layer 1
  5741. from w: 0
  5742. actual: 0
  5743. layer 2 offset: 50
  5744. layer 3 offset: 50
  5745. layer 4 offset: 50
  5746. layer 4
  5747. from w: 0
  5748. actual: 0
  5749. layer 5 offset: 280
  5750. layer 6 offset: 280
  5751. layer 7 offset: 280
  5752. layer 7
  5753. from w: 0
  5754. actual: 0
  5755. layer 8 offset: 510
  5756. full thisloss: 3.3063
  5757. calcGradWeights kernel 0: cannot be used
  5758. calcGradWeights kernel 1 time: 0ms
  5759. calcGradWeights kernel 2 time: 0ms
  5760. calcGradWeights kernel 3 time: 0ms
  5761. calcGradWeights kernel 4 time: 842ms
  5762. calcGradWeights layer selected kernel 1
  5763. calcGradWeights kernel 0: cannot be used
  5764. calcGradWeights kernel 1 time: 0ms
  5765. calcGradWeights kernel 2 time: 0ms
  5766. calcGradWeights kernel 3 time: 0ms
  5767. calcGradWeights kernel 4 time: 889ms
  5768. calcGradWeights layer selected kernel 1
  5769. calcGradWeights kernel 0: cannot be used
  5770. calcGradWeights kernel 1 time: 0ms
  5771. calcGradWeights kernel 2 time: 0ms
  5772. calcGradWeights kernel 3 time: 0ms
  5773. calcGradWeights kernel 4 time: 546ms
  5774. calcGradWeights layer selected kernel 1
  5775. layer 1 offset: 0
  5776. layer 1
  5777. from w: 0
  5778. actual: 0
  5779. layer 2 offset: 50
  5780. layer 3 offset: 50
  5781. layer 4 offset: 50
  5782. layer 4
  5783. from w: 0
  5784. actual: 0
  5785. layer 5 offset: 280
  5786. layer 6 offset: 280
  5787. layer 7 offset: 280
  5788. layer 7
  5789. from w: 0
  5790. actual: 0
  5791. layer 8 offset: 510
  5792. full thisloss: 3.3063
  5793. layer 1 offset: 0
  5794. layer 1
  5795. from w: 0
  5796. actual: 0
  5797. layer 2 offset: 50
  5798. layer 3 offset: 50
  5799. layer 4 offset: 50
  5800. layer 4
  5801. from w: 0
  5802. actual: 0
  5803. layer 5 offset: 280
  5804. layer 6 offset: 280
  5805. layer 7 offset: 280
  5806. layer 7
  5807. from w: 0
  5808. actual: 0
  5809. layer 8 offset: 510
  5810. full thisloss: 3.3063
  5811. layer 1 offset: 0
  5812. layer 1
  5813. from w: 0
  5814. actual: 0
  5815. layer 2 offset: 50
  5816. layer 3 offset: 50
  5817. layer 4 offset: 50
  5818. layer 4
  5819. from w: 0
  5820. actual: 0
  5821. layer 5 offset: 280
  5822. layer 6 offset: 280
  5823. layer 7 offset: 280
  5824. layer 7
  5825. from w: 0
  5826. actual: 0
  5827. layer 8 offset: 510
  5828. full thisloss: 3.3063
  5829. layer 1 offset: 0
  5830. layer 1
  5831. from w: 0
  5832. actual: 0
  5833. layer 2 offset: 50
  5834. layer 3 offset: 50
  5835. layer 4 offset: 50
  5836. layer 4
  5837. from w: 0
  5838. actual: 0
  5839. layer 5 offset: 280
  5840. layer 6 offset: 280
  5841. layer 7 offset: 280
  5842. layer 7
  5843. from w: 0
  5844. actual: 0
  5845. layer 8 offset: 510
  5846. full thisloss: 3.3063
  5847. layer 1 offset: 0
  5848. layer 1
  5849. from w: 0
  5850. actual: 0
  5851. layer 2 offset: 50
  5852. layer 3 offset: 50
  5853. layer 4 offset: 50
  5854. layer 4
  5855. from w: 0
  5856. actual: 0
  5857. layer 5 offset: 280
  5858. layer 6 offset: 280
  5859. layer 7 offset: 280
  5860. layer 7
  5861. from w: 0
  5862. actual: 0
  5863. layer 8 offset: 510
  5864. full thisloss: 3.3063
  5865. layer 1 offset: 0
  5866. layer 1
  5867. from w: 0
  5868. actual: 0
  5869. layer 2 offset: 50
  5870. layer 3 offset: 50
  5871. layer 4 offset: 50
  5872. layer 4
  5873. from w: 0
  5874. actual: 0
  5875. layer 5 offset: 280
  5876. layer 6 offset: 280
  5877. layer 7 offset: 280
  5878. layer 7
  5879. from w: 0
  5880. actual: 0
  5881. layer 8 offset: 510
  5882. full thisloss: 3.3063
  5883. layer 1 offset: 0
  5884. layer 1
  5885. from w: 0
  5886. actual: 0
  5887. layer 2 offset: 50
  5888. layer 3 offset: 50
  5889. layer 4 offset: 50
  5890. layer 4
  5891. from w: 0
  5892. actual: 0
  5893. layer 5 offset: 280
  5894. layer 6 offset: 280
  5895. layer 7 offset: 280
  5896. layer 7
  5897. from w: 0
  5898. actual: 0
  5899. layer 8 offset: 510
  5900. full thisloss: 3.3063
  5901. layer 1 offset: 0
  5902. layer 1
  5903. from w: 0
  5904. actual: 0
  5905. layer 2 offset: 50
  5906. layer 3 offset: 50
  5907. layer 4 offset: 50
  5908. layer 4
  5909. from w: 0
  5910. actual: 0
  5911. layer 5 offset: 280
  5912. layer 6 offset: 280
  5913. layer 7 offset: 280
  5914. layer 7
  5915. from w: 0
  5916. actual: 0
  5917. layer 8 offset: 510
  5918. full thisloss: 3.3063
  5919. layer 1 offset: 0
  5920. layer 1
  5921. from w: 0
  5922. actual: 0
  5923. layer 2 offset: 50
  5924. layer 3 offset: 50
  5925. layer 4 offset: 50
  5926. layer 4
  5927. from w: 0
  5928. actual: 0
  5929. layer 5 offset: 280
  5930. layer 6 offset: 280
  5931. layer 7 offset: 280
  5932. layer 7
  5933. from w: 0
  5934. actual: 0
  5935. layer 8 offset: 510
  5936. full thisloss: 3.3063
  5937. layer 1 offset: 0
  5938. layer 1
  5939. from w: 0
  5940. actual: 0
  5941. layer 2 offset: 50
  5942. layer 3 offset: 50
  5943. layer 4 offset: 50
  5944. layer 4
  5945. from w: 0
  5946. actual: 0
  5947. layer 5 offset: 280
  5948. layer 6 offset: 280
  5949. layer 7 offset: 280
  5950. layer 7
  5951. from w: 0
  5952. actual: 0
  5953. layer 8 offset: 510
  5954. full thisloss: 3.3063
  5955. layer 1 offset: 0
  5956. layer 1
  5957. from w: 0
  5958. actual: 0
  5959. layer 2 offset: 50
  5960. layer 3 offset: 50
  5961. layer 4 offset: 50
  5962. layer 4
  5963. from w: 0
  5964. actual: 0
  5965. layer 5 offset: 280
  5966. layer 6 offset: 280
  5967. layer 7 offset: 280
  5968. layer 7
  5969. from w: 0
  5970. actual: 0
  5971. layer 8 offset: 510
  5972. full thisloss: 3.3063
  5973. layer 1 offset: 0
  5974. layer 1
  5975. from w: 0
  5976. actual: 0
  5977. layer 2 offset: 50
  5978. layer 3 offset: 50
  5979. layer 4 offset: 50
  5980. layer 4
  5981. from w: 0
  5982. actual: 0
  5983. layer 5 offset: 280
  5984. layer 6 offset: 280
  5985. layer 7 offset: 280
  5986. layer 7
  5987. from w: 0
  5988. actual: 0
  5989. layer 8 offset: 510
  5990. full thisloss: 3.3063
  5991. layer 1 offset: 0
  5992. layer 1
  5993. from w: 0
  5994. actual: 0
  5995. layer 2 offset: 50
  5996. layer 3 offset: 50
  5997. layer 4 offset: 50
  5998. layer 4
  5999. from w: 0
  6000. actual: 0
  6001. layer 5 offset: 280
  6002. layer 6 offset: 280
  6003. layer 7 offset: 280
  6004. layer 7
  6005. from w: 0
  6006. actual: 0
  6007. layer 8 offset: 510
  6008. full thisloss: 3.3063
  6009. layer 1 offset: 0
  6010. layer 1
  6011. from w: 0
  6012. actual: 0
  6013. layer 2 offset: 50
  6014. layer 3 offset: 50
  6015. layer 4 offset: 50
  6016. layer 4
  6017. from w: 0
  6018. actual: 0
  6019. layer 5 offset: 280
  6020. layer 6 offset: 280
  6021. layer 7 offset: 280
  6022. layer 7
  6023. from w: 0
  6024. actual: 0
  6025. layer 8 offset: 510
  6026. full thisloss: 3.3063
  6027. layer 1 offset: 0
  6028. layer 1
  6029. from w: 0
  6030. actual: 0
  6031. layer 2 offset: 50
  6032. layer 3 offset: 50
  6033. layer 4 offset: 50
  6034. layer 4
  6035. from w: 0
  6036. actual: 0
  6037. layer 5 offset: 280
  6038. layer 6 offset: 280
  6039. layer 7 offset: 280
  6040. layer 7
  6041. from w: 0
  6042. actual: 0
  6043. layer 8 offset: 510
  6044. full thisloss: 3.3063
  6045. layer 1 offset: 0
  6046. layer 1
  6047. from w: 0
  6048. actual: 0
  6049. layer 2 offset: 50
  6050. layer 3 offset: 50
  6051. layer 4 offset: 50
  6052. layer 4
  6053. from w: 0
  6054. actual: 0
  6055. layer 5 offset: 280
  6056. layer 6 offset: 280
  6057. layer 7 offset: 280
  6058. layer 7
  6059. from w: 0
  6060. actual: 0
  6061. layer 8 offset: 510
  6062. full thisloss: 3.3063
  6063. batch time 8954 ms
  6064. dump enabled=0
  6065. clblas teardown
  6066. [ OK ] testsinglebatch.imagesize4_filtersize3_batchsize2_pooling (9734 ms)
  6067. [----------] 6 tests from testsinglebatch (59763 ms total)
  6068.  
  6069. [----------] 9 tests from testpoolingforward
  6070. [ RUN ] testpoolingforward.basic
  6071. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6072. Using OpenCL device: Tahiti
  6073. [ OK ] testpoolingforward.basic (94 ms)
  6074. [ RUN ] testpoolingforward.basic_2plane_batchsize2
  6075. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6076. Using OpenCL device: Tahiti
  6077. [ OK ] testpoolingforward.basic_2plane_batchsize2 (78 ms)
  6078. [ RUN ] testpoolingforward.fromwrappers
  6079. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6080. Using OpenCL device: Tahiti
  6081. [ OK ] testpoolingforward.fromwrappers (93 ms)
  6082. [ RUN ] testpoolingforward.comparespecific_0_1_pooling2
  6083. instance0: 0
  6084. instance1: 1
  6085. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6086. Using OpenCL device: Tahiti
  6087. [ OK ] testpoolingforward.comparespecific_0_1_pooling2 (94 ms)
  6088. [ RUN ] testpoolingforward.comparespecific_0_1_pooling3
  6089. instance0: 0
  6090. instance1: 1
  6091. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6092. Using OpenCL device: Tahiti
  6093. [ OK ] testpoolingforward.comparespecific_0_1_pooling3 (94 ms)
  6094. [ RUN ] testpoolingforward.comparespecific_0_1_pooling2_pz
  6095. instance0: 0
  6096. instance1: 1
  6097. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6098. Using OpenCL device: Tahiti
  6099. [ OK ] testpoolingforward.comparespecific_0_1_pooling2_pz (93 ms)
  6100. [ RUN ] testpoolingforward.comparespecific_0_1_pooling3_pz
  6101. instance0: 0
  6102. instance1: 1
  6103. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6104. Using OpenCL device: Tahiti
  6105. [ OK ] testpoolingforward.comparespecific_0_1_pooling3_pz (109 ms)
  6106. [ RUN ] testpoolingforward.comparespecific_0_1_pooling3_small
  6107. instance0: 0
  6108. instance1: 1
  6109. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6110. Using OpenCL device: Tahiti
  6111. [ OK ] testpoolingforward.comparespecific_0_1_pooling3_small (78 ms)
  6112. [ RUN ] testpoolingforward.comparespecific_0_1_pooling3_small2
  6113. instance0: 0
  6114. instance1: 1
  6115. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6116. Using OpenCL device: Tahiti
  6117. [ OK ] testpoolingforward.comparespecific_0_1_pooling3_small2 (78 ms)
  6118. [----------] 9 tests from testpoolingforward (811 ms total)
  6119.  
  6120. [----------] 2 tests from testpoolingbackward
  6121. [ RUN ] testpoolingbackward.basic
  6122. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6123. Using OpenCL device: Tahiti
  6124. [ OK ] testpoolingbackward.basic (16 ms)
  6125. [ RUN ] testpoolingbackward.basic_2plane_batchsize2
  6126. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6127. Using OpenCL device: Tahiti
  6128. [ OK ] testpoolingbackward.basic_2plane_batchsize2 (16 ms)
  6129. [----------] 2 tests from testpoolingbackward (32 ms total)
  6130.  
  6131. [----------] 1 test from testNorbLoader
  6132. [ RUN ] testNorbLoader.load1000
  6133. unknown file: error: C++ exception with description "failed to open file: ..\data\norb\training-shuffled-dat.mat" thrown in the test body.
  6134. [ FAILED ] testNorbLoader.load1000 (0 ms)
  6135. [----------] 1 test from testNorbLoader (0 ms total)
  6136.  
  6137. [----------] 7 tests from teststringhelper
  6138. [ RUN ] teststringhelper.split
  6139. [ OK ] teststringhelper.split (0 ms)
  6140. [ RUN ] teststringhelper.split2
  6141. [ OK ] teststringhelper.split2 (0 ms)
  6142. [ RUN ] teststringhelper.split3
  6143. [ OK ] teststringhelper.split3 (0 ms)
  6144. [ RUN ] teststringhelper.tolower
  6145. [ OK ] teststringhelper.tolower (0 ms)
  6146. [ RUN ] teststringhelper.replace
  6147. [ OK ] teststringhelper.replace (0 ms)
  6148. [ RUN ] teststringhelper.replaceglobal
  6149. [ OK ] teststringhelper.replaceglobal (0 ms)
  6150. [ RUN ] teststringhelper.strcpy_safe
  6151. [ OK ] teststringhelper.strcpy_safe (0 ms)
  6152. [----------] 7 tests from teststringhelper (0 ms total)
  6153.  
  6154. [----------] 1 test from testGtestGlobals
  6155. [ RUN ] testGtestGlobals.basic
  6156. There are 1 parameters:
  6157. argv[0]=deepcl_unittests.exe
  6158. [ OK ] testGtestGlobals.basic (0 ms)
  6159. [----------] 1 test from testGtestGlobals (0 ms total)
  6160.  
  6161. [----------] 1 test from testMemset
  6162. [ RUN ] testMemset.basic
  6163. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6164. Using OpenCL device: Tahiti
  6165. myArray[0]=99
  6166. myArray[1]=99
  6167. myArray[2]=99
  6168. myArray[3]=99
  6169. myArray[4]=99
  6170. myArray[5]=99
  6171. myArray[6]=99
  6172. myArray[7]=99
  6173. myArray[8]=99
  6174. myArray[9]=99
  6175. [ OK ] testMemset.basic (78 ms)
  6176. [----------] 1 test from testMemset (78 ms total)
  6177.  
  6178. [----------] 2 tests from testCopyBuffer
  6179. [ RUN ] testCopyBuffer.floats
  6180. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6181. Using OpenCL device: Tahiti
  6182. 3
  6183. 4
  6184. 5
  6185. 6
  6186. 7
  6187. 8
  6188. 9
  6189. 10
  6190. 11
  6191. 12
  6192. 3
  6193. 4
  6194. 5
  6195. 6
  6196. 7
  6197. 8
  6198. 9
  6199. 10
  6200. 11
  6201. 12
  6202. 3
  6203. 4
  6204. 5
  6205. 6
  6206. 7
  6207. 8
  6208. 9
  6209. 10
  6210. 11
  6211. 12
  6212. [ OK ] testCopyBuffer.floats (187 ms)
  6213. [ RUN ] testCopyBuffer.nits
  6214. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6215. Using OpenCL device: Tahiti
  6216. 3
  6217. 4
  6218. 5
  6219. 6
  6220. 7
  6221. 8
  6222. 9
  6223. 10
  6224. 11
  6225. 12
  6226. 3
  6227. 4
  6228. 5
  6229. 6
  6230. 7
  6231. 8
  6232. 9
  6233. 10
  6234. 11
  6235. 12
  6236. 3
  6237. 4
  6238. 5
  6239. 6
  6240. 7
  6241. 8
  6242. 9
  6243. 10
  6244. 11
  6245. 12
  6246. [ OK ] testCopyBuffer.nits (171 ms)
  6247. [----------] 2 tests from testCopyBuffer (358 ms total)
  6248.  
  6249. [----------] 2 tests from testCopyBlock
  6250. [ RUN ] testCopyBlock.testPos
  6251. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6252. Using OpenCL device: Tahiti
  6253. in[0]=3076
  6254. in[1]=8
  6255. in[2]=14
  6256. res[0]=3
  6257. res[1]=4
  6258. res[2]=8206
  6259. res[3]=8
  6260. res[4]=14
  6261. [ OK ] testCopyBlock.testPos (110 ms)
  6262. [ RUN ] testCopyBlock.basic
  6263. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6264. Using OpenCL device: Tahiti
  6265. 2 3 4
  6266. 6 7 8
  6267.  
  6268. 0 0 0 0
  6269.  
  6270. 5 6 7
  6271. 9 10 11
  6272.  
  6273. 0 0 0 0
  6274.  
  6275. [ OK ] testCopyBlock.basic (93 ms)
  6276. [----------] 2 tests from testCopyBlock (203 ms total)
  6277.  
  6278. [----------] 1 test from testCopyLocal
  6279. [ RUN ] testCopyLocal.basic
  6280. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6281. Using OpenCL device: Tahiti
  6282. 0 0 0 0
  6283. 1 2 3 4
  6284. 5 6 7 8
  6285. 9 10 11 12
  6286.  
  6287. 0 0 0 0
  6288. [ OK ] testCopyLocal.basic (78 ms)
  6289. [----------] 1 test from testCopyLocal (78 ms total)
  6290.  
  6291. [----------] 8 tests from testNetdefToNet
  6292. [ RUN ] testNetdefToNet.empty
  6293. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6294. Using OpenCL device: Tahiti
  6295. [ OK ] testNetdefToNet.empty (16 ms)
  6296. [ RUN ] testNetdefToNet.onefc
  6297. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6298. Using OpenCL device: Tahiti
  6299. [ OK ] testNetdefToNet.onefc (171 ms)
  6300. [ RUN ] testNetdefToNet.onefclinear
  6301. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6302. Using OpenCL device: Tahiti
  6303. [ OK ] testNetdefToNet.onefclinear (156 ms)
  6304. [ RUN ] testNetdefToNet.150n_10n
  6305. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6306. Using OpenCL device: Tahiti
  6307. [ OK ] testNetdefToNet.150n_10n (172 ms)
  6308. [ RUN ] testNetdefToNet.3xfclinear
  6309. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6310. Using OpenCL device: Tahiti
  6311. nnString: [3]
  6312. repeatNum 3
  6313. remainderString [150n]
  6314. inner [150n]
  6315. multiplied string: 150n-150n-150n
  6316. layer 0:InputLayer{ outputPlanes=1 outputSize=19 }
  6317. layer 1:FullyConnectedLayer{ numPlanes=150 imageSize=1 }
  6318. layer 2:FullyConnectedLayer{ numPlanes=150 imageSize=1 }
  6319. layer 3:FullyConnectedLayer{ numPlanes=150 imageSize=1 }
  6320. layer 4:SoftMaxLayer{ perPlane=0 numPlanes=150 imageSize=1 }
  6321. Parameters overview: (skipping 2 layers with 0 params)
  6322. layer 1: params=54300 54.5%
  6323. layer 2: params=22650 22.7%
  6324. layer 3: params=22650 22.7%
  6325. TOTAL : params=99600
  6326. [ OK ] testNetdefToNet.3xfclinear (156 ms)
  6327. [ RUN ] testNetdefToNet.mp2_3x32c5z_10n
  6328. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6329. Using OpenCL device: Tahiti
  6330. prefix: [mp2]
  6331. nnString: [3]
  6332. repeatNum 3
  6333. remainderString [32c5z-10n ]
  6334. postfix [10n ]
  6335. inner [32c5z]
  6336. multiplied string: mp2-32c5z-32c5z-32c5z-10n
  6337. layer 0:InputLayer{ outputPlanes=1 outputSize=19 }
  6338. layer 1:PoolingLayer{ inputPlanes=1 inputSize=19 poolingSize=2 }
  6339. layer 2:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=9 numFilters=32 filterSize=5 outputSize=9 padZeros=1 biased=1 skip=0} }
  6340. layer 3:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputSize=9 numFilters=32 filterSize=5 outputSize=9 padZeros=1 biased=1 skip=0} }
  6341. layer 4:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputSize=9 numFilters=32 filterSize=5 outputSize=9 padZeros=1 biased=1 skip=0} }
  6342. layer 5:FullyConnectedLayer{ numPlanes=10 imageSize=1 }
  6343. layer 6:SoftMaxLayer{ perPlane=0 numPlanes=10 imageSize=1 }
  6344. Parameters overview: (skipping 3 layers with 0 params)
  6345. layer 2: params=832 1.1%
  6346. layer 3: params=25632 32.9%
  6347. layer 4: params=25632 32.9%
  6348. layer 5: params=25930 33.2%
  6349. TOTAL : params=78026
  6350. [ OK ] testNetdefToNet.mp2_3x32c5z_10n (343 ms)
  6351. [ RUN ] testNetdefToNet.3x32c5zmp2
  6352. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6353. Using OpenCL device: Tahiti
  6354. nnString: [3]
  6355. repeatNum 3
  6356. remainderString [(32c5z-mp2)-10n]
  6357. inner [32c5z-mp2]
  6358. newRemainder [-10n]
  6359. postfix [10n]
  6360. multiplied string: 32c5z-mp2-32c5z-mp2-32c5z-mp2-10n
  6361. layer 0:InputLayer{ outputPlanes=1 outputSize=128 }
  6362. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=128 numFilters=32 filterSize=5 outputSize=128 padZeros=1 biased=1 skip=0} }
  6363. layer 2:PoolingLayer{ inputPlanes=32 inputSize=128 poolingSize=2 }
  6364. layer 3:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputSize=64 numFilters=32 filterSize=5 outputSize=64 padZeros=1 biased=1 skip=0} }
  6365. layer 4:PoolingLayer{ inputPlanes=32 inputSize=64 poolingSize=2 }
  6366. layer 5:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputSize=32 numFilters=32 filterSize=5 outputSize=32 padZeros=1 biased=1 skip=0} }
  6367. layer 6:PoolingLayer{ inputPlanes=32 inputSize=32 poolingSize=2 }
  6368. layer 7:FullyConnectedLayer{ numPlanes=10 imageSize=1 }
  6369. layer 8:SoftMaxLayer{ perPlane=0 numPlanes=10 imageSize=1 }
  6370. Parameters overview: (skipping 5 layers with 0 params)
  6371. layer 1: params=832 0.6%
  6372. layer 3: params=25632 19.1%
  6373. layer 5: params=25632 19.1%
  6374. layer 7: params=81930 61.1%
  6375. TOTAL : params=134026
  6376. [ OK ] testNetdefToNet.3x32c5zmp2 (702 ms)
  6377. [ RUN ] testNetdefToNet.2x32c7_3x32c5z
  6378. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6379. Using OpenCL device: Tahiti
  6380. nnString: [2]
  6381. repeatNum 2
  6382. remainderString [32c7z-3*32c5z-10n]
  6383. postfix [3*32c5z-10n]
  6384. inner [32c7z]
  6385. nnString: [3]
  6386. repeatNum 3
  6387. remainderString [32c5z-10n]
  6388. postfix [10n]
  6389. inner [32c5z]
  6390. multiplied string: 32c5z-32c5z-32c5z-10n
  6391. multiplied string: 32c7z-32c7z-32c5z-32c5z-32c5z-10n
  6392. layer 0:InputLayer{ outputPlanes=1 outputSize=19 }
  6393. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=19 numFilters=32 filterSize=7 outputSize=19 padZeros=1 biased=1 skip=0} }
  6394. layer 2:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputSize=19 numFilters=32 filterSize=7 outputSize=19 padZeros=1 biased=1 skip=0} }
  6395. layer 3:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputSize=19 numFilters=32 filterSize=5 outputSize=19 padZeros=1 biased=1 skip=0} }
  6396. layer 4:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputSize=19 numFilters=32 filterSize=5 outputSize=19 padZeros=1 biased=1 skip=0} }
  6397. layer 5:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputSize=19 numFilters=32 filterSize=5 outputSize=19 padZeros=1 biased=1 skip=0} }
  6398. layer 6:FullyConnectedLayer{ numPlanes=10 imageSize=1 }
  6399. layer 7:SoftMaxLayer{ perPlane=0 numPlanes=10 imageSize=1 }
  6400. Parameters overview: (skipping 2 layers with 0 params)
  6401. layer 1: params=1600 0.7%
  6402. layer 2: params=50208 20.6%
  6403. layer 3: params=25632 10.5%
  6404. layer 4: params=25632 10.5%
  6405. layer 5: params=25632 10.5%
  6406. layer 6: params=115530 47.3%
  6407. TOTAL : params=244234
  6408. [ OK ] testNetdefToNet.2x32c7_3x32c5z (172 ms)
  6409. [----------] 8 tests from testNetdefToNet (1888 ms total)
  6410.  
  6411. [----------] 10 tests from testactivationforward
  6412. [ RUN ] testactivationforward.basic
  6413. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6414. Using OpenCL device: Tahiti
  6415. [ OK ] testactivationforward.basic (15 ms)
  6416. [ RUN ] testactivationforward.basic_2plane_batchsize2
  6417. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6418. Using OpenCL device: Tahiti
  6419. [ OK ] testactivationforward.basic_2plane_batchsize2 (16 ms)
  6420. [ RUN ] testactivationforward.fromwrappers
  6421. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6422. Using OpenCL device: Tahiti
  6423. [ OK ] testactivationforward.fromwrappers (78 ms)
  6424. [ RUN ] testactivationforward.comparespecific_0_1_activation2
  6425. instance0: 0
  6426. instance1: 1
  6427. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6428. Using OpenCL device: Tahiti
  6429. [ OK ] testactivationforward.comparespecific_0_1_activation2 (78 ms)
  6430. [ RUN ] testactivationforward.comparespecific_0_1_activation3
  6431. instance0: 0
  6432. instance1: 1
  6433. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6434. Using OpenCL device: Tahiti
  6435. [ OK ] testactivationforward.comparespecific_0_1_activation3 (94 ms)
  6436. [ RUN ] testactivationforward.comparespecific_0_1_activation2_pz
  6437. instance0: 0
  6438. instance1: 1
  6439. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6440. Using OpenCL device: Tahiti
  6441. [ OK ] testactivationforward.comparespecific_0_1_activation2_pz (78 ms)
  6442. [ RUN ] testactivationforward.comparespecific_0_1_activation3_pz
  6443. instance0: 0
  6444. instance1: 1
  6445. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6446. Using OpenCL device: Tahiti
  6447. [ OK ] testactivationforward.comparespecific_0_1_activation3_pz (78 ms)
  6448. [ RUN ] testactivationforward.comparespecific_0_1_activation3_small
  6449. instance0: 0
  6450. instance1: 1
  6451. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6452. Using OpenCL device: Tahiti
  6453. [ OK ] testactivationforward.comparespecific_0_1_activation3_small (78 ms)
  6454. [ RUN ] testactivationforward.comparespecific_0_1_activation3_small2
  6455. instance0: 0
  6456. instance1: 1
  6457. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6458. Using OpenCL device: Tahiti
  6459. [ OK ] testactivationforward.comparespecific_0_1_activation3_small2 (78 ms)
  6460. [ RUN ] testactivationforward.comparespecific_0_1_activation3_small2_tanh
  6461. instance0: 0
  6462. instance1: 1
  6463. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6464. Using OpenCL device: Tahiti
  6465. [ OK ] testactivationforward.comparespecific_0_1_activation3_small2_tanh (109 ms)
  6466. [----------] 10 tests from testactivationforward (702 ms total)
  6467.  
  6468. [----------] 2 tests from testactivationbackward
  6469. [ RUN ] testactivationbackward.basic
  6470. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6471. Using OpenCL device: Tahiti
  6472. gradInput=3
  6473. gradInput=0
  6474. gradInput=-2.7
  6475. gradInput=2
  6476. gradInput=-0
  6477. gradInput=2.1
  6478. gradInput=0
  6479. gradInput=-1.1
  6480. gradInput=0
  6481. [ OK ] testactivationbackward.basic (0 ms)
  6482. [ RUN ] testactivationbackward.basic_2plane_batchsize2
  6483. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6484. Using OpenCL device: Tahiti
  6485. gradInput=3
  6486. gradInput=0
  6487. gradInput=0
  6488. gradInput=9
  6489. [ OK ] testactivationbackward.basic_2plane_batchsize2 (15 ms)
  6490. [----------] 2 tests from testactivationbackward (15 ms total)
  6491.  
  6492. [----------] 1 test from testRandomSingleton
  6493. [ RUN ] testRandomSingleton.testMockRandom
  6494. 0.549356
  6495. 0.634521
  6496. 0.5968
  6497. 0.863601
  6498. 0.982891
  6499. 0.637683
  6500. 0.248837
  6501. 0.351605
  6502. 0.225401
  6503. 0.220224
  6504. [ OK ] testRandomSingleton.testMockRandom (0 ms)
  6505. [----------] 1 test from testRandomSingleton (0 ms total)
  6506.  
  6507. [----------] 10 tests from testdropoutforward
  6508. [ RUN ] testdropoutforward.basic
  6509. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6510. Using OpenCL device: Tahiti
  6511. [ OK ] testdropoutforward.basic (16 ms)
  6512. [ RUN ] testdropoutforward.basic_2plane_batchsize2
  6513. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6514. Using OpenCL device: Tahiti
  6515. [ OK ] testdropoutforward.basic_2plane_batchsize2 (16 ms)
  6516. [ RUN ] testdropoutforward.fromwrappers
  6517. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6518. Using OpenCL device: Tahiti
  6519. [ OK ] testdropoutforward.fromwrappers (15 ms)
  6520. [ RUN ] testdropoutforward.comparespecific_0_1_dropout2
  6521. instance0: 0
  6522. instance1: 1
  6523. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6524. Using OpenCL device: Tahiti
  6525. [ OK ] testdropoutforward.comparespecific_0_1_dropout2 (78 ms)
  6526. [ RUN ] testdropoutforward.comparespecific_0_1_dropout3
  6527. instance0: 0
  6528. instance1: 1
  6529. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6530. Using OpenCL device: Tahiti
  6531. [ OK ] testdropoutforward.comparespecific_0_1_dropout3 (78 ms)
  6532. [ RUN ] testdropoutforward.comparespecific_0_1_dropout2_pz
  6533. instance0: 0
  6534. instance1: 1
  6535. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6536. Using OpenCL device: Tahiti
  6537. [ OK ] testdropoutforward.comparespecific_0_1_dropout2_pz (94 ms)
  6538. [ RUN ] testdropoutforward.comparespecific_0_1_dropout3_pz
  6539. instance0: 0
  6540. instance1: 1
  6541. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6542. Using OpenCL device: Tahiti
  6543. [ OK ] testdropoutforward.comparespecific_0_1_dropout3_pz (78 ms)
  6544. [ RUN ] testdropoutforward.comparespecific_0_1_dropout3_small
  6545. instance0: 0
  6546. instance1: 1
  6547. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6548. Using OpenCL device: Tahiti
  6549. [ OK ] testdropoutforward.comparespecific_0_1_dropout3_small (93 ms)
  6550. [ RUN ] testdropoutforward.comparespecific_0_1_dropout3_small2
  6551. instance0: 0
  6552. instance1: 1
  6553. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6554. Using OpenCL device: Tahiti
  6555. [ OK ] testdropoutforward.comparespecific_0_1_dropout3_small2 (78 ms)
  6556. [ RUN ] testdropoutforward.comparespecific_0_1_dropout3_small2_tanh
  6557. instance0: 0
  6558. instance1: 1
  6559. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6560. Using OpenCL device: Tahiti
  6561. [ OK ] testdropoutforward.comparespecific_0_1_dropout3_small2_tanh (78 ms)
  6562. [----------] 10 tests from testdropoutforward (624 ms total)
  6563.  
  6564. [----------] 3 tests from testdropoutbackward
  6565. [ RUN ] testdropoutbackward.basic
  6566. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6567. Using OpenCL device: Tahiti
  6568. [ OK ] testdropoutbackward.basic (94 ms)
  6569. [ RUN ] testdropoutbackward.basic_2plane_batchsize2
  6570. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6571. Using OpenCL device: Tahiti
  6572. [ OK ] testdropoutbackward.basic_2plane_batchsize2 (78 ms)
  6573. [ RUN ] testdropoutbackward.compare_args
  6574. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6575. Using OpenCL device: Tahiti
  6576. [ OK ] testdropoutbackward.compare_args (78 ms)
  6577. [----------] 3 tests from testdropoutbackward (250 ms total)
  6578.  
  6579. [----------] 1 test from testsgd
  6580. [ RUN ] testsgd.basic
  6581. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6582. Using OpenCL device: Tahiti
  6583. layer 0:InputLayer{ outputPlanes=1 outputSize=5 }
  6584. layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=5 numFilters=1 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
  6585. layer 2:SquareLossLayer{}
  6586.  
  6587. inputtotalsize=50 outputTotalSize=18
  6588. forward try kernel 0
  6589. ... not plausibly optimal, skipping
  6590. forward try kernel 1
  6591. ... seems valid
  6592. ForwardAuto: kernel 1 0ms
  6593. calcGradWeights try kernel 0
  6594. ... not plausibly optimal, skipping
  6595. calcGradWeights try kernel 1
  6596. ... seems valid
  6597. BackpropWeightsAuto: kernel 1 0ms
  6598. [ OK ] testsgd.basic (577 ms)
  6599. [----------] 1 test from testsgd (577 ms total)
  6600.  
  6601. [----------] 9 tests from testCLMathWrapper
  6602. [ RUN ] testCLMathWrapper.assign
  6603. a[0]=4
  6604. a[1]=2.1
  6605. a[2]=5
  6606. a[3]=3
  6607. a[4]=9.2
  6608. [ OK ] testCLMathWrapper.assign (78 ms)
  6609. [ RUN ] testCLMathWrapper.assignScalar
  6610. a[0]=3.4
  6611. a[1]=3.4
  6612. a[2]=3.4
  6613. a[3]=3.4
  6614. a[4]=3.4
  6615. [ OK ] testCLMathWrapper.assignScalar (78 ms)
  6616. [ RUN ] testCLMathWrapper.addinplace
  6617. a[0]=5
  6618. a[1]=5.1
  6619. a[2]=14
  6620. a[3]=15.5
  6621. a[4]=11.7
  6622. [ OK ] testCLMathWrapper.addinplace (78 ms)
  6623. [ RUN ] testCLMathWrapper.multiplyinplace
  6624. a[0]=1.5
  6625. a[1]=4.5
  6626. a[2]=13.5
  6627. a[3]=18.75
  6628. a[4]=3.75
  6629. [ OK ] testCLMathWrapper.multiplyinplace (78 ms)
  6630. [ RUN ] testCLMathWrapper.addscalar
  6631. a[0]=2.5
  6632. a[1]=4.5
  6633. a[2]=10.5
  6634. a[3]=14
  6635. a[4]=4
  6636. [ OK ] testCLMathWrapper.addscalar (78 ms)
  6637. [ RUN ] testCLMathWrapper.sqrt
  6638. a[0]=1
  6639. a[1]=1.73205
  6640. a[2]=3
  6641. a[3]=3.53553
  6642. a[4]=1.58114
  6643. [ OK ] testCLMathWrapper.sqrt (78 ms)
  6644. [ RUN ] testCLMathWrapper.squared
  6645. a[0]=1
  6646. a[1]=9
  6647. a[2]=81
  6648. a[3]=156.25
  6649. a[4]=6.25
  6650. [ OK ] testCLMathWrapper.squared (78 ms)
  6651. [ RUN ] testCLMathWrapper.inverse
  6652. a[0]=1
  6653. a[1]=0.333333
  6654. a[2]=0.111111
  6655. a[3]=0.08
  6656. a[4]=0.4
  6657. [ OK ] testCLMathWrapper.inverse (78 ms)
  6658. [ RUN ] testCLMathWrapper.perelementmult
  6659. a[0]=4
  6660. a[1]=6.3
  6661. a[2]=45
  6662. a[3]=37.5
  6663. a[4]=23
  6664. [ OK ] testCLMathWrapper.perelementmult (78 ms)
  6665. [----------] 9 tests from testCLMathWrapper (702 ms total)
  6666.  
  6667. [----------] 1 test from testreducesegments
  6668. [ RUN ] testreducesegments.basic
  6669. Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
  6670. Using OpenCL device: Tahiti
  6671. [ OK ] testreducesegments.basic (78 ms)
  6672. [----------] 1 test from testreducesegments (78 ms total)
  6673.  
  6674. [----------] 4 tests from testGpuOp
  6675. [ RUN ] testGpuOp.addinplace
  6676. a[0]=5
  6677. a[1]=5.1
  6678. a[2]=14
  6679. a[3]=15.5
  6680. a[4]=11.7
  6681. [ OK ] testGpuOp.addinplace (78 ms)
  6682. [ RUN ] testGpuOp.addoutofplace
  6683. a[0]=1
  6684. a[1]=3
  6685. a[2]=9
  6686. a[3]=12.5
  6687. a[4]=2.5
  6688. c[0]=5
  6689. c[1]=5.1
  6690. c[2]=14
  6691. c[3]=15.5
  6692. c[4]=11.7
  6693. [ OK ] testGpuOp.addoutofplace (94 ms)
  6694. [ RUN ] testGpuOp.inverse
  6695. a[0]=1
  6696. a[1]=0.333333
  6697. a[2]=0.111111
  6698. a[3]=0.08
  6699. a[4]=0.4
  6700. [ OK ] testGpuOp.inverse (78 ms)
  6701. [ RUN ] testGpuOp.addscalarinplace
  6702. a[0]=5.2
  6703. a[1]=7.2
  6704. a[2]=13.2
  6705. a[3]=16.7
  6706. a[4]=6.7
  6707. [ OK ] testGpuOp.addscalarinplace (78 ms)
  6708. [----------] 4 tests from testGpuOp (328 ms total)
  6709.  
  6710. [----------] 1 test from testjpeghelper
  6711. [ RUN ] testjpeghelper.writeread