- args: deepcl_unittests.exe --gtest_filter=-SLOW*
- Note: Google Test filter = -SLOW*
- [==========] Running 159 tests from 29 test cases.
- [----------] Global test environment set-up.
- [----------] 7 tests from testClBlas
- [ RUN ] testClBlas.basic
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- clblas teardown
- [ OK ] testClBlas.basic (430 ms)
- [ RUN ] testClBlas.transA
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- 1 2 9
- 3 7 5
- initializing clblas
- clblas teardown
- [ OK ] testClBlas.transA (90 ms)
- [ RUN ] testClBlas.transB
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- 3
- -1
- initializing clblas
- clblas teardown
- [ OK ] testClBlas.transB (100 ms)
- [ RUN ] testClBlas.colMajor
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- clblas teardown
- [ OK ] testClBlas.colMajor (80 ms)
- [ RUN ] testClBlas.colMajor2
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- clblas teardown
- [ OK ] testClBlas.colMajor2 (90 ms)
- [ RUN ] testClBlas.colMajorTransA
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- clblas teardown
- [ OK ] testClBlas.colMajorTransA (100 ms)
- [ RUN ] testClBlas.colMajorTransB
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- clblas teardown
- [ OK ] testClBlas.colMajorTransB (90 ms)
- [----------] 7 tests from testClBlas (980 ms total)
- [----------] 1 test from testDeepCL
- [ RUN ] testDeepCL.basic
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- expected number of output: 4
- clblas teardown
- [ OK ] testDeepCL.basic (430 ms)
- [----------] 1 test from testDeepCL (430 ms total)
- [----------] 23 tests from testupdateweights
- [ RUN ] testupdateweights.conv1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- layer 0:InputLayer{ outputPlanes=2 outputSize=5 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=5 numFilters=2 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
- layer 2:SquareLossLayer{}
- layer 0:InputLayer{ outputPlanes=2 outputSize=5 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=5 numFilters=2 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
- layer 2:SquareLossLayer{}
- batchSize: 4
- inputtotalsize=200 outputTotalSize=72
- layer ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=5 numFilters=2 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
- weightsize=36 biassize=0
- statefultimer v0.7
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- layer 0:InputLayer{ outputPlanes=2 outputSize=5 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=5 numFilters=2 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
- layer 2:SquareLossLayer{}
- Parameters overview: (skipping 2 layers with 0 params)
- layer 1: params=36 100.0%
- TOTAL : params=36
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- idx=8 predicted losschange=0.000111445 actual=0.000112534
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- idx=13 predicted losschange=-0.000886715 actual=-0.000884056
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- idx=0 predicted losschange=0.000210491 actual=0.000212669
- forward try kernel 5
- ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
- ... not valid
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 440ms
- idx=22 predicted losschange=-0.000164224 actual=0.000212669
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5: cannot be used
- forward kernel 6: cannot be used
- forward kernel 7 time: 440ms
- forward layer selected kernel 1
- idx=22 predicted losschange=-0.000164224 actual=-0.000163078
- idx=35 predicted losschange=-0.000391028 actual=-0.000391006
- idx=26 predicted losschange=2.23142e-05 actual=2.57492e-05
- idx=27 predicted losschange=9.38328e-05 actual=9.44138e-05
- idx=27 predicted losschange=9.38328e-05 actual=9.44138e-05
- idx=10 predicted losschange=0.00186697 actual=0.00187111
- clblas teardown
- [ OK ] testupdateweights.conv1 (1380 ms)
- [ RUN ] testupdateweights.conv1z
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- layer 0:InputLayer{ outputPlanes=2 outputSize=3 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=3 numFilters=2 filterSize=3 outputSize=3 padZeros=1 biased=0 skip=0} }
- layer 2:SquareLossLayer{}
- layer 0:InputLayer{ outputPlanes=2 outputSize=3 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=3 numFilters=2 filterSize=3 outputSize=3 padZeros=1 biased=0 skip=0} }
- layer 2:SquareLossLayer{}
- batchSize: 4
- inputtotalsize=72 outputTotalSize=72
- layer ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=3 numFilters=2 filterSize=3 outputSize=3 padZeros=1 biased=0 skip=0} }
- weightsize=36 biassize=0
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- layer 0:InputLayer{ outputPlanes=2 outputSize=3 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=3 numFilters=2 filterSize=3 outputSize=3 padZeros=1 biased=0 skip=0} }
- layer 2:SquareLossLayer{}
- Parameters overview: (skipping 2 layers with 0 params)
- layer 1: params=36 100.0%
- TOTAL : params=36
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- idx=8 predicted losschange=0.00039831 actual=0.000397682
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- idx=13 predicted losschange=-0.000426502 actual=-0.000426292
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- idx=0 predicted losschange=0.000143287 actual=0.000144005
- forward try kernel 5
- ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, padzeros must be disabled
- ... not valid
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 430ms
- idx=22 predicted losschange=-1.7916e-06 actual=0.000144005
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5: cannot be used
- forward kernel 6: cannot be used
- forward kernel 7 time: 430ms
- forward layer selected kernel 1
- idx=22 predicted losschange=-1.7916e-06 actual=0
- idx=35 predicted losschange=-2.82565e-05 actual=-2.76566e-05
- idx=26 predicted losschange=3.62191e-05 actual=3.71933e-05
- idx=27 predicted losschange=-0.000319862 actual=-0.000317574
- idx=27 predicted losschange=-0.000319862 actual=-0.000317574
- idx=10 predicted losschange=-0.000883857 actual=-0.000883102
- clblas teardown
- [ OK ] testupdateweights.conv1z (1390 ms)
- [ RUN ] testupdateweights.numericallytest
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 10ms
- layer 0:InputLayer{ outputPlanes=1 outputSize=1 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=1 numFilters=1 filterSize=1 outputSize=1 padZeros=0 biased=0 skip=0} }
- layer 2:ActivationLayer{ TANH }
- layer 3:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 1: params=1 100.0%
- TOTAL : params=1
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- layer 0:InputLayer{ outputPlanes=1 outputSize=1 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=1 numFilters=1 filterSize=1 outputSize=1 padZeros=0 biased=0 skip=0} }
- layer 2:ActivationLayer{ TANH }
- layer 3:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 1: params=1 100.0%
- TOTAL : params=1
- loss 0.0367983 loss2 0.0367913 change: 7.01472e-06
- sumweightsdiff -0.000264842
- loss change 7.01472e-06
- estimatedLossChangeFromW 7.01413e-06
- [ OK ] testupdateweights.numericallytest (850 ms)
- [ RUN ] testupdateweights.numericallytest_imagesize3
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- layer 0:InputLayer{ outputPlanes=1 outputSize=3 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=3 numFilters=1 filterSize=1 outputSize=3 padZeros=0 biased=0 skip=0} }
- layer 2:ActivationLayer{ TANH }
- layer 3:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 1: params=1 100.0%
- TOTAL : params=1
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- layer 0:InputLayer{ outputPlanes=1 outputSize=3 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=3 numFilters=1 filterSize=1 outputSize=3 padZeros=0 biased=0 skip=0} }
- layer 2:ActivationLayer{ TANH }
- layer 3:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 1: params=1 100.0%
- TOTAL : params=1
- loss 1.23358 loss2 1.21612 change: 0.0174606
- sumweightsdiff -0.0132709
- loss change 0.0174606
- estimatedLossChangeFromW 0.0176118
- [ OK ] testupdateweights.numericallytest_imagesize3 (860 ms)
- [ RUN ] testupdateweights.numericallytest_imagesize5
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- layer 0:InputLayer{ outputPlanes=1 outputSize=5 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=5 numFilters=1 filterSize=1 outputSize=5 padZeros=0 biased=0 skip=0} }
- layer 2:ActivationLayer{ TANH }
- layer 3:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 1: params=1 100.0%
- TOTAL : params=1
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- layer 0:InputLayer{ outputPlanes=1 outputSize=5 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=5 numFilters=1 filterSize=1 outputSize=5 padZeros=0 biased=0 skip=0} }
- layer 2:ActivationLayer{ TANH }
- layer 3:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 1: params=1 100.0%
- TOTAL : params=1
- loss 4.12958 loss2 4.11952 change: 0.0100665
- sumweightsdiff -0.0101708
- loss change 0.0100665
- estimatedLossChangeFromW 0.0103444
- [ OK ] testupdateweights.numericallytest_imagesize5 (890 ms)
- [ RUN ] testupdateweights.numericallytest_imagesize9
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- layer 0:InputLayer{ outputPlanes=1 outputSize=9 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=9 numFilters=1 filterSize=1 outputSize=9 padZeros=0 biased=0 skip=0} }
- layer 2:ActivationLayer{ TANH }
- layer 3:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 1: params=1 100.0%
- TOTAL : params=1
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- layer 0:InputLayer{ outputPlanes=1 outputSize=9 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=9 numFilters=1 filterSize=1 outputSize=9 padZeros=0 biased=0 skip=0} }
- layer 2:ActivationLayer{ TANH }
- layer 3:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 1: params=1 100.0%
- TOTAL : params=1
- loss 13.4341 loss2 13.4339 change: 0.000207901
- sumweightsdiff 0.00153953
- loss change 0.000207901
- estimatedLossChangeFromW 0.000237015
- [ OK ] testupdateweights.numericallytest_imagesize9 (890 ms)
- [ RUN ] testupdateweights.numericallytest_imagesize9_filtersize9
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- layer 0:InputLayer{ outputPlanes=1 outputSize=9 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=9 numFilters=1 filterSize=9 outputSize=1 padZeros=0 biased=0 skip=0} }
- layer 2:ActivationLayer{ TANH }
- layer 3:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 1: params=81 100.0%
- TOTAL : params=81
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- layer 0:InputLayer{ outputPlanes=1 outputSize=9 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=9 numFilters=1 filterSize=9 outputSize=1 padZeros=0 biased=0 skip=0} }
- layer 2:ActivationLayer{ TANH }
- layer 3:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 1: params=81 100.0%
- TOTAL : params=81
- loss 0.135896 loss2 0.0848782 change: 0.0510182
- sumweightsdiff -0.0322406
- loss change 0.0510182
- estimatedLossChangeFromW 0.0555841
- [ OK ] testupdateweights.numericallytest_imagesize9_filtersize9 (930 ms)
- [ RUN ] testupdateweights.numericallytest_imagesize9_filtersize3
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- layer 0:InputLayer{ outputPlanes=1 outputSize=9 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=9 numFilters=1 filterSize=3 outputSize=7 padZeros=0 biased=0 skip=0} }
- layer 2:ActivationLayer{ TANH }
- layer 3:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 1: params=9 100.0%
- TOTAL : params=9
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- layer 0:InputLayer{ outputPlanes=1 outputSize=9 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=9 numFilters=1 filterSize=3 outputSize=7 padZeros=0 biased=0 skip=0} }
- layer 2:ActivationLayer{ TANH }
- layer 3:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 1: params=9 100.0%
- TOTAL : params=9
- loss 7.70633 loss2 7.41581 change: 0.290529
- sumweightsdiff -0.0898812
- loss change 0.290529
- estimatedLossChangeFromW 0.316231
- [ OK ] testupdateweights.numericallytest_imagesize9_filtersize3 (940 ms)
- [ RUN ] testupdateweights.numericallytest_imagesize3_filtersize3
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- layer 0:InputLayer{ outputPlanes=1 outputSize=3 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=3 numFilters=1 filterSize=3 outputSize=1 padZeros=0 biased=0 skip=0} }
- layer 2:ActivationLayer{ TANH }
- layer 3:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 1: params=9 100.0%
- TOTAL : params=9
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- layer 0:InputLayer{ outputPlanes=1 outputSize=3 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=3 numFilters=1 filterSize=3 outputSize=1 padZeros=0 biased=0 skip=0} }
- layer 2:ActivationLayer{ TANH }
- layer 3:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 1: params=9 100.0%
- TOTAL : params=9
- loss 0.0719101 loss2 0.0694461 change: 0.00246406
- sumweightsdiff -0.0110647
- loss change 0.00246406
- estimatedLossChangeFromW 0.00248372
- [ OK ] testupdateweights.numericallytest_imagesize3_filtersize3 (880 ms)
- [ RUN ] testupdateweights.numericallytest_imagesize5_filtersize3
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- layer 0:InputLayer{ outputPlanes=1 outputSize=5 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=5 numFilters=1 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
- layer 2:ActivationLayer{ TANH }
- layer 3:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 1: params=9 100.0%
- TOTAL : params=9
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 10ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- layer 0:InputLayer{ outputPlanes=1 outputSize=5 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=5 numFilters=1 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
- layer 2:ActivationLayer{ TANH }
- layer 3:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 1: params=9 100.0%
- TOTAL : params=9
- loss 1.20022 loss2 1.17241 change: 0.0278131
- sumweightsdiff -0.0203888
- loss change 0.0278131
- estimatedLossChangeFromW 0.0280929
- [ OK ] testupdateweights.numericallytest_imagesize5_filtersize3 (910 ms)
- [ RUN ] testupdateweights.numericallytest_imagesize5_filtersize3_batchsize3
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- layer 0:InputLayer{ outputPlanes=1 outputSize=5 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=5 numFilters=1 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
- layer 2:ActivationLayer{ TANH }
- layer 3:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 1: params=9 100.0%
- TOTAL : params=9
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- layer 0:InputLayer{ outputPlanes=1 outputSize=5 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=5 numFilters=1 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
- layer 2:ActivationLayer{ TANH }
- layer 3:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 1: params=9 100.0%
- TOTAL : params=9
- loss 4.97142 loss2 4.78768 change: 0.183744
- sumweightsdiff -0.056004
- loss change 0.183744
- estimatedLossChangeFromW 0.193264
- [ OK ] testupdateweights.numericallytest_imagesize5_filtersize3_batchsize3 (900 ms)
- [ RUN ] testupdateweights.numericallytest_imagesize5_filtersize3_planes3
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- layer 0:InputLayer{ outputPlanes=3 outputSize=5 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=3 inputSize=5 numFilters=1 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
- layer 2:ActivationLayer{ TANH }
- layer 3:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 1: params=27 100.0%
- TOTAL : params=27
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- layer 0:InputLayer{ outputPlanes=3 outputSize=5 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=3 inputSize=5 numFilters=1 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
- layer 2:ActivationLayer{ TANH }
- layer 3:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 1: params=27 100.0%
- TOTAL : params=27
- loss 1.08887 loss2 0.9575 change: 0.13137
- sumweightsdiff -0.00764531
- loss change 0.13137
- estimatedLossChangeFromW 0.134379
- [ OK ] testupdateweights.numericallytest_imagesize5_filtersize3_planes3 (940 ms)
- [ RUN ] testupdateweights.numericallytest_imagesize5_filtersize3_planes3_batchsize3
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 10ms
- layer 0:InputLayer{ outputPlanes=3 outputSize=5 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=3 inputSize=5 numFilters=1 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
- layer 2:ActivationLayer{ TANH }
- layer 3:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 1: params=27 100.0%
- TOTAL : params=27
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- layer 0:InputLayer{ outputPlanes=3 outputSize=5 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=3 inputSize=5 numFilters=1 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
- layer 2:ActivationLayer{ TANH }
- layer 3:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 1: params=27 100.0%
- TOTAL : params=27
- loss 4.76631 loss2 4.18154 change: 0.584769
- sumweightsdiff 0.029606
- loss change 0.584769
- estimatedLossChangeFromW 0.620442
- [ OK ] testupdateweights.numericallytest_imagesize5_filtersize3_planes3_batchsize3 (940 ms)
- [ RUN ] testupdateweights.backprop_weights_2
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- mismatch for i 0
- [ OK ] testupdateweights.backprop_weights_2 (110 ms)
- [ RUN ] testupdateweights.backprop_weights_2_upstreamimagesize2
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=2 -D gInputSizeSquared=4 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=2 -D gOutputSizeSquared=4 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=2 -DgInputStripeOuterNumRows=2 -DgInputStripeInnerSize=4 -DgInputStripeOuterSize=4 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=2 -DgOutputStripeSize=4
- mismatch for i 0
- [ OK ] testupdateweights.backprop_weights_2_upstreamimagesize2 (110 ms)
- [ RUN ] testupdateweights.backprop_weights_2_upstreamimagesize3_filtersize3
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=1 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=21 -DgInputStripeMarginSize=6 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- mismatch for i 0
- mismatch for i 1
- mismatch for i 2
- mismatch for i 3
- mismatch for i 4
- mismatch for i 5
- mismatch for i 6
- mismatch for i 7
- mismatch for i 8
- [ OK ] testupdateweights.backprop_weights_2_upstreamimagesize3_filtersize3 (110 ms)
- [ RUN ] testupdateweights.backprop_weights_2_upstreamimagesize4_filtersize3
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=4 -D gInputSizeSquared=16 -D gNumFilters=1 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=2 -D gOutputSizeSquared=4 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=4 -DgInputStripeOuterNumRows=8 -DgInputStripeInnerSize=16 -DgInputStripeOuterSize=32 -DgInputStripeMarginSize=8 -DgOutputStripeNumRows=2 -DgOutputStripeSize=4
- mismatch for i 0
- mismatch for i 8
- [ OK ] testupdateweights.backprop_weights_2_upstreamimagesize4_filtersize3 (120 ms)
- [ RUN ] testupdateweights.backprop_weights_2_upstreamimagesize5_filtersize3
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=1 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=9 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=45 -DgInputStripeMarginSize=10 -DgOutputStripeNumRows=3 -DgOutputStripeSize=9
- mismatch for i 0
- mismatch for i 4
- mismatch for i 8
- [ OK ] testupdateweights.backprop_weights_2_upstreamimagesize5_filtersize3 (120 ms)
- [ RUN ] testupdateweights.backprop_weights_2_upstreamimagesize3_filtersize1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=3 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=9 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=3 -DgOutputStripeSize=9
- mismatch for i 0
- [ OK ] testupdateweights.backprop_weights_2_upstreamimagesize3_filtersize1 (110 ms)
- [ RUN ] testupdateweights.backprop_weights_2_upstreamimagesize16_filtersize1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=16 -D gInputSizeSquared=256 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=16 -D gOutputSizeSquared=256 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=16 -DgInputStripeOuterNumRows=16 -DgInputStripeInnerSize=256 -DgInputStripeOuterSize=256 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=16 -DgOutputStripeSize=256
- mismatch for i 0
- [ OK ] testupdateweights.backprop_weights_2_upstreamimagesize16_filtersize1 (240 ms)
- [ RUN ] testupdateweights.backprop_weights_2_upstreamimagesize17_filtersize1
- LayerDimensions{ inputPlanes=1 inputSize=17 numFilters=1 filterSize=1 outputSize=17 padZeros=0 biased=0 skip=0}
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=17 -D gInputSizeSquared=289 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=17 -D gOutputSizeSquared=289 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=17 -DgInputStripeOuterNumRows=17 -DgInputStripeInnerSize=289 -DgInputStripeOuterSize=289 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=17 -DgOutputStripeSize=289
- mismatch for i 0
- [ OK ] testupdateweights.backprop_weights_2_upstreamimagesize17_filtersize1 (390 ms)
- [ RUN ] testupdateweights.backprop_weights_2_upstreamimagesize17_filtersize1_moredata
- expectedresult: -958.715
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=17 -D gInputSizeSquared=289 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=17 -D gOutputSizeSquared=289 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=17 -DgInputStripeOuterNumRows=17 -DgInputStripeInnerSize=289 -DgInputStripeOuterSize=289 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=17 -DgOutputStripeSize=289
- mismatch for i 0
- [ OK ] testupdateweights.backprop_weights_2_upstreamimagesize17_filtersize1_moredata (380 ms)
- [ RUN ] testupdateweights.backprop_instance3_smaller2
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- numweights: 36
- options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=96 -D gInputSizeSquared=9216 -D gNumFilters=1 -D gFilterSize=6 -D gHalfFilterSize=3 -D gFilterSizeSquared=36 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=91 -D gOutputSizeSquared=8281 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0 -DgNumStripes=16 -DgInputStripeMarginRows=5 -DgInputStripeInnerNumRows=6 -DgInputStripeOuterNumRows=16 -DgInputStripeInnerSize=576 -DgInputStripeOuterSize=1536 -DgInputStripeMarginSize=480 -DgOutputStripeNumRows=6 -DgOutputStripeSize=546
- 138 0 0 0 0 0
- 132 0 0 0 0 0
- 138 0 0 0 0 0
- 138 0 0 0 0 0
- 138 0 0 0 0 0
- 132 0 0 0 0 0
- 138 0 0 0 0 0
- 132 0 0 0 0 0
- 138 0 0 0 0 0
- 138 0 0 0 0 0
- 138 0 0 0 0 0
- 132 0 0 0 0 0
- ......
- ......
- ......
- ......
- ......
- ......
- 0=0 0 0 0 0 0 0 0
- 1=0 0 0 0 0 0 0 0
- 2=0 0 0 0 0 0 0 0
- 3=0 0 0 0 0 0 0 0
- 4=0 0 0 0 0 0 0 0
- 5=0 0 0 0 0 0 0 0
- 6=0 0 0 0 0 0 0 0
- 7=0 0 0 0 0 0 0 0
- 8=0 0 0 0 0 0 0 0
- 9=0 0 0 0 0 0 0 0
- 10=0 0 0 0 0 0 0 0
- 11=0 0 0 0 0 0 0 0
- 0=0 0 0 0 0 0 0 0
- 1=0 0 0 0 0 0 0 0
- 2=0 0 0 0 0 0 0 0
- 3=0 0 0 0 0 0 0 0
- 4=0 0 0 0 0 0 0 0
- 5=0 0 0 0 0 0 0 0
- 6=0 0 0 0 0 0 0 0
- 7=0 0 0 0 0 0 0 0
- 8=0 0 0 0 0 0 0 0
- 9=0 0 0 0 0 0 0 0
- 10=0 0 0 0 0 0 0 0
- 11=0 0 0 0 0 0 0 0
- 12=0 0 0 0 0 0 0 0
- 13=0 0 0 0 0 0 0 0
- 14=0 0 0 0 0 0 0 0
- 15=0 0 0 0 0 0 0 0
- 16=0 0 0 0 0 0 0 0
- 17=0 0 0 0 0 0 0 0
- 18=0 0 0 0 0 0 0 0
- 19=0 0 0 0 0 0 0 0
- [ OK ] testupdateweights.backprop_instance3_smaller2 (800 ms)
- [----------] 23 tests from testupdateweights (15190 ms total)
- [----------] 17 tests from testforward
- [ RUN ] testforward.imagesize2_nopadzeros
- expected number of output: 4
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testforward.imagesize2_nopadzeros (430 ms)
- [ RUN ] testforward.imagesize2_padzeros
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- checking result[0]=0 expecting: 0
- checking result[1]=0 expecting: 0
- checking result[2]=0 expecting: 0
- checking result[3]=0.2 expecting: 0.2
- checking result[4]=-0.13 expecting: -0.13
- checking result[5]=-0.15 expecting: -0.15
- checking result[6]=0 expecting: 0
- checking result[7]=0 expecting: 0
- checking result[8]=0 expecting: 0
- checking result[9]=0 expecting: 0
- checking result[10]=0 expecting: 0
- checking result[11]=0 expecting: 0
- checking result[12]=-0.55 expecting: -0.55
- checking result[13]=0.02 expecting: 0.02
- checking result[14]=0.21 expecting: 0.21
- checking result[27]=-14.3 expecting: -14.3
- checking result[28]=-9.6 expecting: -9.6
- checking result[29]=11.9 expecting: 11.9
- checking result[35]=0.46 expecting: 0.46
- [ OK ] testforward.imagesize2_padzeros (170 ms)
- [ RUN ] testforward.imagesize3
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- test1 ok
- [ OK ] testforward.imagesize3 (180 ms)
- [ RUN ] testforward.test2
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testforward.test2 (160 ms)
- [ RUN ] testforward.test3
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testforward.test3 (170 ms)
- [ RUN ] testforward.compare_0_1_biased_nopad
- LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=15 padZeros=0 biased=1 skip=0}
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- batch 0 batchsize 4
- dump enabled=0
- batch 0 batchsize 4
- dump enabled=0
- LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=15 padZeros=0 biased=1 skip=0}
- clblas teardown
- [ OK ] testforward.compare_0_1_biased_nopad (340 ms)
- [ RUN ] testforward.compare_0_1_biased_pad
- LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=19 padZeros=1 biased=1 skip=0}
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- batch 0 batchsize 4
- dump enabled=0
- batch 0 batchsize 4
- dump enabled=0
- LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=19 padZeros=1 biased=1 skip=0}
- clblas teardown
- [ OK ] testforward.compare_0_1_biased_pad (360 ms)
- [ RUN ] testforward.compare_1_n_biased_nopad
- instance: 2
- LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=15 padZeros=0 biased=1 skip=0}
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- batch 0 batchsize 4
- dump enabled=0
- batch 0 batchsize 4
- dump enabled=0
- LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=15 padZeros=0 biased=1 skip=0}
- clblas teardown
- instance: 3
- LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=15 padZeros=0 biased=1 skip=0}
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- batch 0 batchsize 4
- dump enabled=0
- batch 0 batchsize 4
- dump enabled=0
- LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=15 padZeros=0 biased=1 skip=0}
- clblas teardown
- instance: 4
- LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=15 padZeros=0 biased=1 skip=0}
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- batch 0 batchsize 4
- dump enabled=0
- batch 0 batchsize 4
- dump enabled=0
- LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=15 padZeros=0 biased=1 skip=0}
- clblas teardown
- instance: 6
- LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=15 padZeros=0 biased=1 skip=0}
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- batch 0 batchsize 4
- dump enabled=0
- batch 0 batchsize 4
- clblas teardown
- unknown file: error: C++ exception with description "memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size" thrown in the test body.
- [ FAILED ] testforward.compare_1_n_biased_nopad (2230 ms)
- [ RUN ] testforward.compare_1_n_biased_pad
- instance: 2
- LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=5 outputSize=19 padZeros=1 biased=1 skip=0}
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- clblas teardown
- unknown file: error: C++ exception with description "cannot use forward2, since outputimagesize * outputimagesize > maxworkgroupsize" thrown in the test body.
- [ FAILED ] testforward.compare_1_n_biased_pad (310 ms)
- [ RUN ] testforward.compare_1_5_biased_nopad
- LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=19 outputSize=1 padZeros=0 biased=1 skip=0}
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCLB4E.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- batch 0 batchsize 4
- dump enabled=0
- batch 0 batchsize 4
- dump enabled=0
- LayerDimensions{ inputPlanes=8 inputSize=19 numFilters=8 filterSize=19 outputSize=1 padZeros=0 biased=1 skip=0}
- clblas teardown
- [ OK ] testforward.compare_1_5_biased_nopad (820 ms)
- [ RUN ] testforward.compare_1_4_fcscenario
- LayerDimensions{ inputPlanes=10 inputSize=24 numFilters=10 filterSize=24 outputSize=1 padZeros=0 biased=1 skip=0}
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- batch 0 batchsize 4
- dump enabled=0
- batch 0 batchsize 4
- dump enabled=0
- LayerDimensions{ inputPlanes=10 inputSize=24 numFilters=10 filterSize=24 outputSize=1 padZeros=0 biased=1 skip=0}
- clblas teardown
- [ OK ] testforward.compare_1_4_fcscenario (1200 ms)
- [ RUN ] testforward.compare_break1_0_1
- LayerDimensions{ inputPlanes=1 inputSize=33 numFilters=1 filterSize=1 outputSize=33 padZeros=0 biased=0 skip=0}
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- batch 0 batchsize 1
- dump enabled=0
- batch 0 batchsize 1
- dump enabled=0
- LayerDimensions{ inputPlanes=1 inputSize=33 numFilters=1 filterSize=1 outputSize=33 padZeros=0 biased=0 skip=0}
- clblas teardown
- [ OK ] testforward.compare_break1_0_1 (170 ms)
- [ RUN ] testforward.compare_break1_0_4
- LayerDimensions{ inputPlanes=1 inputSize=33 numFilters=1 filterSize=1 outputSize=33 padZeros=0 biased=0 skip=0}
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- batch 0 batchsize 1
- dump enabled=0
- batch 0 batchsize 1
- dump enabled=0
- LayerDimensions{ inputPlanes=1 inputSize=33 numFilters=1 filterSize=1 outputSize=33 padZeros=0 biased=0 skip=0}
- clblas teardown
- [ OK ] testforward.compare_break1_0_4 (160 ms)
- [ RUN ] testforward.comparespecific_break2
- LayerDimensions{ inputPlanes=64 inputSize=19 numFilters=64 filterSize=19 outputSize=1 padZeros=0 biased=0 skip=0}
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCL14AC.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- batch 0 batchsize 4
- dump enabled=0
- batch 0 batchsize 4
- dump enabled=0
- LayerDimensions{ inputPlanes=64 inputSize=19 numFilters=64 filterSize=19 outputSize=1 padZeros=0 biased=0 skip=0}
- clblas teardown
- [ OK ] testforward.comparespecific_break2 (850 ms)
- [ RUN ] testforward.softmax
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- output[0]=0.0320586
- output[1]=0.0871443
- output[2]=0.643914
- output[3]=0.236883
- loss 0.44019
- loss 3.44019
- loss 2.44019
- loss 1.44019
- [ OK ] testforward.softmax (20 ms)
- [ RUN ] testforward.softmax_byplane
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- output[0]=0.0320586
- output[1]=0.0871443
- output[2]=0.643914
- output[3]=0.236883
- loss 0.44019
- loss 3.44019
- loss 2.44019
- loss 1.44019
- [ OK ] testforward.softmax_byplane (10 ms)
- [ RUN ] testforward.crash_from_jm
- -D gNumInputPlanes=32 -D gInputPlanes=32 -D gInputSize=28 -D gInputSizeSquared=784 -D gNumFilters=20 -D gFilterSize=28 -D gHalfFilterSize=14 -D gFilterSizeSquared=784 -D gNumOutputPlanes=20 -D gOutputPlanes=20 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- dump enabled=0
- [ OK ] testforward.crash_from_jm (410 ms)
- [----------] 17 tests from testforward (7990 ms total)
- [----------] 2 tests from testfilehelper
- [ RUN ] testfilehelper.testfilehelper
- [ OK ] testfilehelper.testfilehelper (10 ms)
- [ RUN ] testfilehelper.testreadchunk
- [ OK ] testfilehelper.testreadchunk (0 ms)
- [----------] 2 tests from testfilehelper (10 ms total)
- [----------] 12 tests from testsimpleconvolvenet
- [ RUN ] testsimpleconvolvenet.imagesize1_planes2_filters2_unbiased_tanh
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- loss, E, 0.141046
- accuracy: 2/2 100%
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- calcGradWeights try kernel 3
- options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 290ms
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCL1C51.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 10ms
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 290ms
- calcGradWeights layer selected kernel 1
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 200ms
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5 time: 10ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 200ms
- forward layer selected kernel 1
- loss, E, 0.0733092
- accuracy: 2/2 100%
- loss, E, 0.0426809
- accuracy: 2/2 100%
- loss, E, 0.0262452
- accuracy: 2/2 100%
- loss, E, 0.0164245
- accuracy: 2/2 100%
- loss, E, 0.0107573
- accuracy: 2/2 100%
- accuracy: 2/2
- clblas teardown
- [ OK ] testsimpleconvolvenet.imagesize1_planes2_filters2_unbiased_tanh (1920 ms)
- [ RUN ] testsimpleconvolvenet.imagesize1_planes2_filters2_tanh
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- loss, E, 0.964924
- accuracy: 1/2 50%
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 420ms
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCL247A.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 0ms
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 420ms
- calcGradWeights layer selected kernel 1
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 200ms
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5 time: 0ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 200ms
- forward layer selected kernel 1
- loss, E, 0.00570461
- accuracy: 2/2 100%
- loss, E, 1.34828e-05
- accuracy: 2/2 100%
- loss, E, 3.62078e-08
- accuracy: 2/2 100%
- accuracy: 2/2
- clblas teardown
- [ OK ] testsimpleconvolvenet.imagesize1_planes2_filters2_tanh (2050 ms)
- [ RUN ] testsimpleconvolvenet.imagesize3_n4_filtersize3_tanh
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 10ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- loss, E, 1.13283
- accuracy: 3/4 75%
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=21 -DgInputStripeMarginSize=6 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 490ms
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCL2D1F.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 0ms
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 490ms
- calcGradWeights layer selected kernel 1
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 260ms
- loss, E, 0.00996342
- accuracy: 4/4 100%
- forward kernel 0: cannot be used
- forward kernel 1 time: 10ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5 time: 0ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 260ms
- forward layer selected kernel 2
- loss, E, 4.70669e-05
- accuracy: 4/4 100%
- loss, E, 4.0975e-07
- accuracy: 4/4 100%
- accuracy: 4/4
- clblas teardown
- [ OK ] testsimpleconvolvenet.imagesize3_n4_filtersize3_tanh (2250 ms)
- [ RUN ] testsimpleconvolvenet.imagesize1_2planes_filtersize1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- loss, E, 0.751601
- accuracy: 2/2 100%
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 430ms
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCL3507.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 0ms
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 430ms
- calcGradWeights layer selected kernel 1
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 210ms
- loss, E, 0.195916
- accuracy: 2/2 100%
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5 time: 0ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 210ms
- forward layer selected kernel 1
- loss, E, 0.0679117
- accuracy: 2/2 100%
- loss, E, 0.023677
- accuracy: 2/2 100%
- loss, E, 0.00825563
- accuracy: 2/2 100%
- loss, E, 0.00287856
- accuracy: 2/2 100%
- loss, E, 0.00100369
- accuracy: 2/2 100%
- loss, E, 0.000349964
- accuracy: 2/2 100%
- accuracy: 2/2 100%
- accuracy: 2/2
- loss, E, 0.000150648
- clblas teardown
- [ OK ] testsimpleconvolvenet.imagesize1_2planes_filtersize1 (1950 ms)
- [ RUN ] testsimpleconvolvenet.imagesize3_n4_filtersize3_relu
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- loss, E, 1.48951
- accuracy: 2/4 50%
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=21 -DgInputStripeMarginSize=6 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 500ms
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCL3DDA.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 0ms
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 500ms
- calcGradWeights layer selected kernel 1
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 250ms
- loss, E, 1.12957
- accuracy: 2/4 50%
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5 time: 0ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 250ms
- forward layer selected kernel 1
- loss, E, 0.070782
- accuracy: 4/4 100%
- loss, E, 0.003026
- accuracy: 4/4 100%
- loss, E, 0.00021158
- accuracy: 4/4 100%
- loss, E, 1.96858e-05
- accuracy: 4/4 100%
- loss, E, 2.03002e-06
- accuracy: 4/4 100%
- loss, E, 2.15572e-07
- accuracy: 4/4 100%
- loss, E, 2.3083e-08
- accuracy: 4/4 100%
- loss, E, 2.48239e-09
- accuracy: 4/4 100%
- loss, E, 4.14442e-10
- accuracy: 4/4 100%
- accuracy: 4/4
- loss, E, 4.14442e-10
- clblas teardown
- [ OK ] testsimpleconvolvenet.imagesize3_n4_filtersize3_relu (2330 ms)
- [ RUN ] testsimpleconvolvenet.imagesize3_n4_filtersize3_linear
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 10ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- loss, E, 0.50604
- accuracy: 4/4 100%
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=2 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=21 -DgInputStripeMarginSize=6 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 510ms
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCL466E.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 0ms
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 510ms
- calcGradWeights layer selected kernel 1
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 260ms
- loss, E, 0.0565529
- accuracy: 4/4 100%
- forward kernel 0: cannot be used
- forward kernel 1 time: 10ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5 time: 0ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 260ms
- forward layer selected kernel 2
- loss, E, 0.00777245
- accuracy: 4/4 100%
- loss, E, 0.00106831
- accuracy: 4/4 100%
- loss, E, 0.000218376
- accuracy: 4/4 100%
- accuracy: 4/4
- loss, E, 0.000218376
- clblas teardown
- [ OK ] testsimpleconvolvenet.imagesize3_n4_filtersize3_linear (2110 ms)
- [ RUN ] testsimpleconvolvenet.imagesize1_n2_2layers_unbiased
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- backward try kernel 0
- ... not plausibly optimal, skipping
- backward try kernel 1
- ... seems valid
- BackwardAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- epoch 0 loss, E, 0.0559531
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- backward try kernel 2
- ... seems valid
- BackwardAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- epoch 1 loss, E, 0.0254554
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- backward try kernel 3
- ... seems valid
- BackwardAuto: kernel 3 223ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- epoch 2 loss, E, 0.0172943
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- backward kernel 0: cannot be used
- backward kernel 1 time: 0ms
- backward kernel 2 time: 0ms
- backward kernel 3 time: 223ms
- backward layer selected kernel 1
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 430ms
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 70ms
- epoch 3 loss, E, 0.0138013
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCL52A9.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 0ms
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCL52F8.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 0ms
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 430ms
- calcGradWeights layer selected kernel 1
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 70ms
- calcGradWeights layer selected kernel 1
- epoch 4 loss, E, 0.0115848
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 200ms
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 80ms
- epoch 5 loss, E, 0.00987036
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5 time: 0ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 200ms
- forward layer selected kernel 1
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5 time: 0ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 80ms
- forward layer selected kernel 1
- epoch 6 loss, E, 0.00844797
- epoch 7 loss, E, 0.00724182
- epoch 8 loss, E, 0.00621212
- epoch 9 loss, E, 0.00533106
- epoch 10 loss, E, 0.00457645
- epoch 11 loss, E, 0.00392979
- epoch 12 loss, E, 0.00337539
- epoch 13 loss, E, 0.00289992
- epoch 14 loss, E, 0.002492
- epoch 15 loss, E, 0.00214191
- epoch 16 loss, E, 0.00184138
- epoch 17 loss, E, 0.00158331
- epoch 18 loss, E, 0.00136164
- epoch 19 loss, E, 0.0011712
- epoch 20 loss, E, 0.00100754
- epoch 21 loss, E, 0.000866877
- epoch 22 loss, E, 0.000745946
- epoch 23 loss, E, 0.000641966
- epoch 24 loss, E, 0.000552543
- epoch 25 loss, E, 0.000475625
- epoch 26 loss, E, 0.000409454
- epoch 27 loss, E, 0.000352522
- epoch 28 loss, E, 0.000303531
- epoch 29 loss, E, 0.00026137
- epoch 30 loss, E, 0.000225082
- epoch 31 loss, E, 0.000193845
- epoch 32 loss, E, 0.000166954
- epoch 33 loss, E, 0.000143801
- epoch 34 loss, E, 0.000123866
- epoch 35 loss, E, 0.000106699
- epoch 36 loss, E, 9.19176e-05
- epoch 37 loss, E, 7.91864e-05
- epoch 38 loss, E, 6.82211e-05
- epoch 39 loss, E, 5.87767e-05
- layer 0:InputLayer{ outputPlanes=1 outputSize=1 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=1 numFilters=2 filterSize=1 outputSize=1 padZeros=0 biased=1 skip=0} }
- layer 2:ActivationLayer{ RELU }
- layer 3:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=1 numFilters=2 filterSize=1 outputSize=1 padZeros=0 biased=1 skip=0} }
- layer 4:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 1: params=4 40.0%
- layer 3: params=6 60.0%
- TOTAL : params=10
- loss, E, 5.87767e-05
- accuracy: 2/2 100%
- accuracy: 2/2
- loss, E, 5.87767e-05
- loss, E, 5.87767e-05
- layer 0:InputLayer{ outputPlanes=1 outputSize=1 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=1 numFilters=2 filterSize=1 outputSize=1 padZeros=0 biased=1 skip=0} }
- layer 2:ActivationLayer{ RELU }
- layer 3:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=1 numFilters=2 filterSize=1 outputSize=1 padZeros=0 biased=1 skip=0} }
- layer 4:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 1: params=4 40.0%
- layer 3: params=6 60.0%
- TOTAL : params=10
- float weights1[] = {-0.303866f, -1.59823f};
- float weights3[] = {0.426358f, -0.719592f, -0.420361f, 0.719566f};
- float bias1[] = {-0.324465f, 0.60279f};
- float bias3[] = {0.506862f, -0.506837f};
- clblas teardown
- [ OK ] testsimpleconvolvenet.imagesize1_n2_2layers_unbiased (3450 ms)
- [ RUN ] testsimpleconvolvenet.imagesize1_n2_2layers_biased
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- backward try kernel 0
- ... not plausibly optimal, skipping
- backward try kernel 1
- ... seems valid
- BackwardAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- loss, E, 1.19067
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- backward try kernel 2
- ... seems valid
- BackwardAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- backward try kernel 3
- ... seems valid
- BackwardAuto: kernel 3 217ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 16ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- backward kernel 0: cannot be used
- backward kernel 1 time: 0ms
- backward kernel 2 time: 0ms
- backward kernel 3 time: 217ms
- backward layer selected kernel 1
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 421ms
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 78ms
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCL6040.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 0ms
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCL607F.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 0ms
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 421ms
- calcGradWeights layer selected kernel 1
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 16ms
- calcGradWeights kernel 4 time: 78ms
- calcGradWeights layer selected kernel 1
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 218ms
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 78ms
- loss, E, 0.0667568
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5 time: 0ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 218ms
- forward layer selected kernel 1
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5 time: 0ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 78ms
- forward layer selected kernel 1
- loss, E, 0.00923595
- loss, E, 0.00112611
- loss, E, 0.0001174
- loss, E, 1.15642e-05
- dump enabled=0
- loss, E, 1.78564e-06
- accuracy: 2/2 100%
- accuracy: 2/2
- loss, E, 1.78564e-06
- clblas teardown
- [ OK ] testsimpleconvolvenet.imagesize1_n2_2layers_biased (3385 ms)
- [ RUN ] testsimpleconvolvenet.imagesize_5_4_2layers_filtersize_2_4_biased_n3
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- backward try kernel 0
- ... not plausibly optimal, skipping
- backward try kernel 1
- ... seems valid
- BackwardAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- loss, E, 1.6207
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- backward try kernel 2
- ... seems valid
- BackwardAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- backward try kernel 3
- ... seems valid
- BackwardAuto: kernel 3 249ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=3 -D gInputPlanes=3 -D gInputSize=4 -D gInputSizeSquared=16 -D gNumFilters=3 -D gFilterSize=4 -D gHalfFilterSize=2 -D gFilterSizeSquared=16 -D gNumOutputPlanes=3 -D gOutputPlanes=3 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=3 -DgInputStripeInnerNumRows=4 -DgInputStripeOuterNumRows=10 -DgInputStripeInnerSize=16 -DgInputStripeOuterSize=40 -DgInputStripeMarginSize=12 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=3 -D gFilterSize=2 -D gHalfFilterSize=1 -D gFilterSizeSquared=4 -D gNumOutputPlanes=3 -D gOutputPlanes=3 -D gOutputSize=4 -D gOutputSizeSquared=16 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=1 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=35 -DgInputStripeMarginSize=5 -DgOutputStripeNumRows=4 -DgOutputStripeSize=16
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- backward kernel 0: cannot be used
- backward kernel 1 time: 0ms
- backward kernel 2 time: 0ms
- backward kernel 3 time: 249ms
- backward layer selected kernel 1
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 468ms
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 437ms
- forward try kernel 5
- ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
- ... not valid
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 234ms
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCL7234.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 0ms
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 468ms
- calcGradWeights layer selected kernel 1
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 437ms
- calcGradWeights layer selected kernel 1
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5: cannot be used
- forward kernel 6: cannot be used
- forward kernel 7 time: 234ms
- forward layer selected kernel 1
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 234ms
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5 time: 0ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 234ms
- forward layer selected kernel 1
- loss, E, 0.000427028
- loss, E, 8.40991e-08
- loss, E, 3.03482e-11
- loss, E, 2.59792e-13
- loss, E, 1.15907e-13
- loss, E, 8.03801e-14
- loss, E, 7.14984e-14
- loss, E, 6.08402e-14
- loss, E, 6.9722e-14
- loss, E, 6.9722e-14
- accuracy: 3/3 100%
- accuracy: 3/3
- loss, E, 6.9722e-14
- clblas teardown
- [ OK ] testsimpleconvolvenet.imagesize_5_4_2layers_filtersize_2_4_biased_n3 (6365 ms)
- [ RUN ] testsimpleconvolvenet.imagesize_5_4_2layers_filtersize_2_4_biased_n6
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 16ms
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- backward try kernel 0
- ... not plausibly optimal, skipping
- backward try kernel 1
- ... seems valid
- BackwardAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- loss, E, 3.64011
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- backward try kernel 2
- ... seems valid
- BackwardAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- backward try kernel 3
- ... seems valid
- BackwardAuto: kernel 3 265ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=3 -D gInputPlanes=3 -D gInputSize=4 -D gInputSizeSquared=16 -D gNumFilters=3 -D gFilterSize=4 -D gHalfFilterSize=2 -D gFilterSizeSquared=16 -D gNumOutputPlanes=3 -D gOutputPlanes=3 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=3 -DgInputStripeInnerNumRows=4 -DgInputStripeOuterNumRows=10 -DgInputStripeInnerSize=16 -DgInputStripeOuterSize=40 -DgInputStripeMarginSize=12 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=3 -D gFilterSize=2 -D gHalfFilterSize=1 -D gFilterSizeSquared=4 -D gNumOutputPlanes=3 -D gOutputPlanes=3 -D gOutputSize=4 -D gOutputSizeSquared=16 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=1 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=35 -DgInputStripeMarginSize=5 -DgOutputStripeNumRows=4 -DgOutputStripeSize=16
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 16ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- backward kernel 0: cannot be used
- backward kernel 1 time: 0ms
- backward kernel 2 time: 0ms
- backward kernel 3 time: 265ms
- backward layer selected kernel 1
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 452ms
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 452ms
- forward try kernel 5
- ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
- ... not valid
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 234ms
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCL8B45.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 0ms
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 452ms
- calcGradWeights layer selected kernel 1
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 452ms
- calcGradWeights layer selected kernel 1
- forward kernel 0: cannot be used
- forward kernel 1 time: 16ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 16ms
- forward kernel 5: cannot be used
- forward kernel 6: cannot be used
- forward kernel 7 time: 234ms
- forward layer selected kernel 2
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 234ms
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5 time: 0ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 234ms
- forward layer selected kernel 1
- loss, E, 4.07297e-10
- loss, E, 2.30926e-14
- loss, E, 3.9968e-15
- loss, E, 3.9968e-15
- loss, E, 1.55431e-14
- accuracy: 6/6 100%
- accuracy: 6/6
- loss, E, 1.55431e-14
- clblas teardown
- [ OK ] testsimpleconvolvenet.imagesize_5_4_2layers_filtersize_2_4_biased_n6 (5398 ms)
- [ RUN ] testsimpleconvolvenet.imagesize_5_3_2layers_filtersize_3_3_biased_n6
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- backward try kernel 0
- ... not plausibly optimal, skipping
- backward try kernel 1
- ... seems valid
- BackwardAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- loss, E, 4.00796
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- backward try kernel 2
- ... seems valid
- BackwardAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- backward try kernel 3
- ... seems valid
- BackwardAuto: kernel 3 280ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=3 -D gInputPlanes=3 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=3 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=3 -D gOutputPlanes=3 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=21 -DgInputStripeMarginSize=6 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 16ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=3 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=3 -D gOutputPlanes=3 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=9 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=45 -DgInputStripeMarginSize=10 -DgOutputStripeNumRows=3 -DgOutputStripeSize=9
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- backward kernel 0: cannot be used
- backward kernel 1 time: 0ms
- backward kernel 2 time: 0ms
- backward kernel 3 time: 280ms
- backward layer selected kernel 1
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 499ms
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 640ms
- forward try kernel 5
- ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
- ... not valid
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 437ms
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCLA243.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 0ms
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 16ms
- calcGradWeights kernel 4 time: 499ms
- calcGradWeights layer selected kernel 1
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 640ms
- calcGradWeights layer selected kernel 1
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5: cannot be used
- forward kernel 6: cannot be used
- forward kernel 7 time: 437ms
- forward layer selected kernel 1
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 249ms
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5 time: 0ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 249ms
- forward layer selected kernel 1
- loss, E, 1.87712e-08
- loss, E, 5.01821e-14
- loss, E, 8.88178e-15
- accuracy: 6/6 100%
- accuracy: 6/6
- loss, E, 8.88178e-15
- clblas teardown
- [ OK ] testsimpleconvolvenet.imagesize_5_3_2layers_filtersize_3_3_biased_n6 (5304 ms)
- [ RUN ] testsimpleconvolvenet.imagesize_5_3_2layers_filtersize_3_3_biased_n18
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- backward try kernel 0
- ... not plausibly optimal, skipping
- backward try kernel 1
- ... seems valid
- BackwardAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- loss, E, 17.9931
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 15ms
- backward try kernel 2
- ... seems valid
- BackwardAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 16ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- backward try kernel 3
- ... seems valid
- BackwardAuto: kernel 3 281ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=3 -D gInputPlanes=3 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=3 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=3 -D gOutputPlanes=3 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=21 -DgInputStripeMarginSize=6 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=3 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=3 -D gOutputPlanes=3 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=9 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=45 -DgInputStripeMarginSize=10 -DgOutputStripeNumRows=3 -DgOutputStripeSize=9
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- backward kernel 0: cannot be used
- backward kernel 1 time: 0ms
- backward kernel 2 time: 0ms
- backward kernel 3 time: 281ms
- backward layer selected kernel 1
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 484ms
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 655ms
- forward try kernel 5
- ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
- ... not valid
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 437ms
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCLB730.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 0ms
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 16ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 484ms
- calcGradWeights layer selected kernel 1
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 655ms
- calcGradWeights layer selected kernel 1
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5: cannot be used
- forward kernel 6: cannot be used
- forward kernel 7 time: 437ms
- forward layer selected kernel 1
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 249ms
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 15ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5 time: 0ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 249ms
- forward layer selected kernel 1
- loss, E, 2.93736
- loss, E, 2.74045
- loss, E, 2.72813
- loss, E, 2.72734
- loss, E, 2.72728
- loss, E, 2.72727
- loss, E, 2.72727
- loss, E, 2.72727
- loss, E, 2.72727
- loss, E, 2.72727
- loss, E, 2.72727
- loss, E, 2.72727
- loss, E, 2.72727
- loss, E, 2.72727
- loss, E, 2.72727
- loss, E, 2.72727
- loss, E, 2.72727
- loss, E, 2.72727
- loss, E, 2.72727
- loss, E, 2.72727
- loss, E, 2.72727
- loss, E, 2.72727
- loss, E, 2.72727
- loss, E, 2.72727
- loss, E, 2.72727
- loss, E, 2.72727
- loss, E, 2.72727
- loss, E, 2.72727
- loss, E, 2.72727
- layer 0:InputLayer{ outputPlanes=1 outputSize=5 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=5 numFilters=3 filterSize=3 outputSize=3 padZeros=0 biased=1 skip=0} }
- layer 2:ActivationLayer{ RELU }
- layer 3:ConvolutionalLayer{ LayerDimensions{ inputPlanes=3 inputSize=3 numFilters=3 filterSize=3 outputSize=1 padZeros=0 biased=1 skip=0} }
- layer 4:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 1: params=30 26.3%
- layer 3: params=84 73.7%
- TOTAL : params=114
- loss, E, 2.72727
- accuracy: 13/18 72.2222%
- accuracy: 13/18
- C:\Users\pz\Documents\ml\DeepCL\test\testsimpleconvolvenet.cpp(1055): error: Value of: N
- Actual: 18
- Expected: numCorrect
- Which is: 13
- loss, E, 2.72727
- C:\Users\pz\Documents\ml\DeepCL\test\testsimpleconvolvenet.cpp(1059): error: Expected: (0.1f) >= (loss), actual: 0.1 vs 2.72727
- clblas teardown
- [ FAILED ] testsimpleconvolvenet.imagesize_5_3_2layers_filtersize_3_3_biased_n18 (11310 ms)
- [----------] 12 tests from testsimpleconvolvenet (47822 ms total)
- [----------] 3 tests from testlogicaloperators
- [ RUN ] testlogicaloperators.Convolve_1layer_biased_And
- And
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- Loss L 2.13088
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 421ms
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCLD91E.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 0ms
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 421ms
- calcGradWeights layer selected kernel 1
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 203ms
- Loss L 0.679527
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5 time: 0ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 203ms
- forward layer selected kernel 1
- Loss L 0.398398
- Loss L 0.301735
- accuracy: 4/4
- loss, E, 0.27227
- clblas teardown
- [ OK ] testlogicaloperators.Convolve_1layer_biased_And (1919 ms)
- [ RUN ] testlogicaloperators.Convolve_1layerbiased_Or
- Or, convolve
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 15ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- Loss L 5.59056
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 422ms
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCLE0B6.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 0ms
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 422ms
- calcGradWeights layer selected kernel 1
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 203ms
- Loss L 1.22162
- forward kernel 0: cannot be used
- forward kernel 1 time: 15ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5 time: 0ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 203ms
- forward layer selected kernel 2
- Loss L 0.583397
- Loss L 0.366216
- accuracy: 4/4 100%
- loss, E, 0.300027
- clblas teardown
- [ OK ] testlogicaloperators.Convolve_1layerbiased_Or (1934 ms)
- [ RUN ] testlogicaloperators.Convolve_2layers_relu_Xor
- Xor, convolve
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- hand-setting weights...
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- backward try kernel 0
- ... not plausibly optimal, skipping
- backward try kernel 1
- ... seems valid
- BackwardAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- Loss L 0.152638
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 15ms
- backward try kernel 2
- ... seems valid
- BackwardAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- backward try kernel 3
- ... seems valid
- BackwardAuto: kernel 3 219ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=2 -D gInputPlanes=2 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=2 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=2 -D gOutputPlanes=2 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- backward kernel 0: cannot be used
- backward kernel 1 time: 0ms
- backward kernel 2 time: 0ms
- backward kernel 3 time: 219ms
- backward layer selected kernel 1
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 421ms
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 78ms
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCLECF3.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 0ms
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCLED42.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 0ms
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 421ms
- calcGradWeights layer selected kernel 1
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 78ms
- calcGradWeights layer selected kernel 1
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 187ms
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 78ms
- Loss L 0.00640068
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5 time: 0ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 187ms
- forward layer selected kernel 1
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 15ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5 time: 0ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 78ms
- forward layer selected kernel 1
- Loss L 0.00139435
- Loss L 0.000383307
- Loss L 0.000117079
- Loss L 4.63626e-05
- Loss L 1.8873e-05
- Loss L 7.15534e-06
- Loss L 2.83958e-06
- Loss L 1.12727e-06
- Loss L 4.44109e-07
- Loss L 1.72233e-07
- Loss L 6.82345e-08
- Loss L 2.76343e-08
- Loss L 1.04286e-08
- Loss L 4.13357e-09
- Loss L 1.67201e-09
- Loss L 6.29148e-10
- Loss L 2.4837e-10
- Loss L 1.00833e-10
- Loss L 3.80673e-11
- Loss L 1.5131e-11
- Loss L 5.84421e-12
- Loss L 2.16893e-12
- Loss L 9.52127e-13
- Loss L 3.58824e-13
- Loss L 1.56319e-13
- Loss L 9.9476e-14
- Loss L 9.9476e-14
- Loss L 9.9476e-14
- Loss L 9.9476e-14
- Loss L 9.23706e-14
- Loss L 9.23706e-14
- Loss L 9.41469e-14
- Loss L 8.70415e-14
- Loss L 9.41469e-14
- Loss L 8.52651e-14
- Loss L 8.52651e-14
- Loss L 8.52651e-14
- Loss L 8.52651e-14
- layer 0:InputLayer{ outputPlanes=2 outputSize=1 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=1 numFilters=2 filterSize=1 outputSize=1 padZeros=0 biased=1 skip=0} }
- layer 2:ActivationLayer{ RELU }
- layer 3:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=1 numFilters=2 filterSize=1 outputSize=1 padZeros=0 biased=1 skip=0} }
- layer 4:ActivationLayer{ RELU }
- layer 5:SquareLossLayer{}
- Parameters overview: (skipping 4 layers with 0 params)
- layer 1: params=6 50.0%
- layer 3: params=6 50.0%
- TOTAL : params=12
- accuracy: 4/4 100%
- loss, E, 8.52651e-14
- clblas teardown
- [ OK ] testlogicaloperators.Convolve_2layers_relu_Xor (3916 ms)
- [----------] 3 tests from testlogicaloperators (7769 ms total)
- [----------] 12 tests from testbackward
- [ RUN ] testbackward.squareloss
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- layer 0:InputLayer{ outputPlanes=3 outputSize=5 }
- layer 1:ForceBackpropLayer{ outputPlanes=3 outputSize=5 }
- layer 2:SquareLossLayer{}
- inputtotalsize=2400 outputTotalSize=2400
- layer 0:InputLayer{ outputPlanes=3 outputSize=5 }
- layer 1:ForceBackpropLayer{ outputPlanes=3 outputSize=5 }
- layer 2:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- TOTAL : params=0
- idx=44 predicted losschange=-0.000912508 actual=-0.000976563
- idx=2245 predicted losschange=0.00785823 actual=0.00805664
- idx=648 predicted losschange=0.00965759 actual=0.00976563
- idx=586 predicted losschange=0.0136895 actual=0.0136719
- idx=730 predicted losschange=0.00117897 actual=0.00146484
- idx=611 predicted losschange=0.00152302 actual=0.00195313
- idx=1130 predicted losschange=0.0159167 actual=0.0161133
- idx=15 predicted losschange=0.0434798 actual=0.0439453
- idx=1923 predicted losschange=-0.00790002 actual=-0.0078125
- idx=670 predicted losschange=0.0335141 actual=0.0336914
- [ OK ] testbackward.squareloss (15 ms)
- [ RUN ] testbackward.crossentropyloss
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- layer 0:InputLayer{ outputPlanes=3 outputSize=5 }
- layer 1:ForceBackpropLayer{ outputPlanes=3 outputSize=5 }
- layer 2:Layer{}
- inputtotalsize=300 outputTotalSize=300
- layer 0:InputLayer{ outputPlanes=3 outputSize=5 }
- layer 1:ForceBackpropLayer{ outputPlanes=3 outputSize=5 }
- layer 2:Layer{}
- Parameters overview: (skipping 3 layers with 0 params)
- TOTAL : params=0
- idx=44 predicted losschange=0.000274935 actual=0.000274658
- idx=145 predicted losschange=-0.000885784 actual=-0.00088501
- idx=48 predicted losschange=-0.000859834 actual=-0.000854492
- idx=286 predicted losschange=0.00713042 actual=0.00717163
- idx=130 predicted losschange=-0.000264829 actual=-0.000244141
- idx=11 predicted losschange=-1.98163e-05 actual=0
- idx=230 predicted losschange=-0.000594819 actual=-0.000610352
- idx=15 predicted losschange=-0.0006499 actual=-0.000640869
- idx=123 predicted losschange=-0.000846121 actual=-0.000823975
- idx=70 predicted losschange=0.000790196 actual=0.000793457
- [ OK ] testbackward.crossentropyloss (16 ms)
- [ RUN ] testbackward.softmaxloss
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
- layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
- layer 2:SoftMaxLayer{ perPlane=0 numPlanes=5 imageSize=1 }
- inputtotalsize=10 outputTotalSize=10
- layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
- layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
- layer 2:SoftMaxLayer{ perPlane=0 numPlanes=5 imageSize=1 }
- Parameters overview: (skipping 3 layers with 0 params)
- TOTAL : params=0
- idx=4 predicted losschange=0.000113075 actual=0.00011301
- idx=5 predicted losschange=0.000145627 actual=0.000145674
- idx=8 predicted losschange=3.16699e-05 actual=3.19481e-05
- idx=6 predicted losschange=4.89271e-06 actual=5.24521e-06
- idx=0 predicted losschange=2.29469e-05 actual=2.28882e-05
- idx=1 predicted losschange=-8.26119e-05 actual=-8.27312e-05
- idx=0 predicted losschange=2.29469e-05 actual=2.28882e-05
- idx=5 predicted losschange=0.000145627 actual=0.000145674
- idx=3 predicted losschange=-5.50179e-05 actual=-5.50747e-05
- idx=0 predicted losschange=2.29469e-05 actual=2.28882e-05
- [ OK ] testbackward.softmaxloss (0 ms)
- [ RUN ] testbackward.squareloss2
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
- layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
- layer 2:SquareLossLayer{}
- layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
- layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
- layer 2:SquareLossLayer{}
- batchSize: 32
- inputtotalsize=160 outputTotalSize=160
- layer SquareLossLayer{}
- layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
- layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
- layer 2:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- TOTAL : params=0
- idx=44 predicted losschange=0.000126406 actual=0.000125885
- idx=5 predicted losschange=0.00461891 actual=0.00464439
- idx=8 predicted losschange=0.000356787 actual=0.000356674
- idx=106 predicted losschange=0.00716324 actual=0.00719643
- idx=90 predicted losschange=0.000474759 actual=0.000480652
- idx=131 predicted losschange=0.000979017 actual=0.000984192
- idx=10 predicted losschange=0.000660134 actual=0.000663757
- idx=15 predicted losschange=0.00961313 actual=0.00965118
- idx=3 predicted losschange=0.00264732 actual=0.00267029
- idx=30 predicted losschange=0.00865312 actual=0.00868607
- [ OK ] testbackward.squareloss2 (31 ms)
- [ RUN ] testbackward.crossentropy2
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
- layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
- layer 2:Layer{}
- layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
- layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
- layer 2:Layer{}
- batchSize: 2
- inputtotalsize=10 outputTotalSize=10
- layer Layer{}
- layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
- layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
- layer 2:Layer{}
- Parameters overview: (skipping 3 layers with 0 params)
- TOTAL : params=0
- idx=4 predicted losschange=0.00258649 actual=-nan(ind)
- idx=5 predicted losschange=0.0227095 actual=-nan(ind)
- idx=8 predicted losschange=-0.00202714 actual=-nan(ind)
- idx=6 predicted losschange=-0.000846508 actual=-nan(ind)
- idx=0 predicted losschange=-0.000424821 actual=-nan(ind)
- idx=1 predicted losschange=-0.00171216 actual=-nan(ind)
- idx=0 predicted losschange=-0.000424821 actual=-nan(ind)
- idx=5 predicted losschange=0.0227095 actual=-nan(ind)
- idx=3 predicted losschange=0.0123444 actual=-nan(ind)
- idx=0 predicted losschange=-0.000424821 actual=-nan(ind)
- [ OK ] testbackward.crossentropy2 (31 ms)
- [ RUN ] testbackward.softmax2
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
- layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
- layer 2:SoftMaxLayer{ perPlane=0 numPlanes=5 imageSize=1 }
- layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
- layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
- layer 2:SoftMaxLayer{ perPlane=0 numPlanes=5 imageSize=1 }
- batchSize: 2
- inputtotalsize=10 outputTotalSize=10
- layer SoftMaxLayer{ perPlane=0 numPlanes=5 imageSize=1 }
- layer 0:InputLayer{ outputPlanes=5 outputSize=1 }
- layer 1:ForceBackpropLayer{ outputPlanes=5 outputSize=1 }
- layer 2:SoftMaxLayer{ perPlane=0 numPlanes=5 imageSize=1 }
- Parameters overview: (skipping 3 layers with 0 params)
- TOTAL : params=0
- idx=4 predicted losschange=0.00035729 actual=0.000357628
- idx=5 predicted losschange=0.0015055 actual=0.00151086
- idx=8 predicted losschange=-5.63632e-05 actual=-5.65052e-05
- idx=6 predicted losschange=-1.48864e-05 actual=-1.4782e-05
- idx=0 predicted losschange=1.96542e-05 actual=1.97887e-05
- idx=1 predicted losschange=-0.000287167 actual=-0.000287056
- idx=0 predicted losschange=1.96542e-05 actual=1.97887e-05
- idx=5 predicted losschange=0.0015055 actual=0.00151086
- idx=3 predicted losschange=-0.000152824 actual=-0.00014782
- idx=0 predicted losschange=1.96542e-05 actual=1.97887e-05
- [ OK ] testbackward.softmax2 (63 ms)
- [ RUN ] testbackward.conv1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- layer 0:InputLayer{ outputPlanes=2 outputSize=4 }
- layer 1:ForceBackpropLayer{ outputPlanes=2 outputSize=4 }
- layer 2:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=4 numFilters=2 filterSize=3 outputSize=2 padZeros=0 biased=0 skip=0} }
- layer 3:SquareLossLayer{}
- layer 0:InputLayer{ outputPlanes=2 outputSize=4 }
- layer 1:ForceBackpropLayer{ outputPlanes=2 outputSize=4 }
- layer 2:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=4 numFilters=2 filterSize=3 outputSize=2 padZeros=0 biased=0 skip=0} }
- layer 3:SquareLossLayer{}
- batchSize: 4
- inputtotalsize=128 outputTotalSize=32
- layer ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=4 numFilters=2 filterSize=3 outputSize=2 padZeros=0 biased=0 skip=0} }
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- layer 0:InputLayer{ outputPlanes=2 outputSize=4 }
- layer 1:ForceBackpropLayer{ outputPlanes=2 outputSize=4 }
- layer 2:ConvolutionalLayer{ LayerDimensions{ inputPlanes=2 inputSize=4 numFilters=2 filterSize=3 outputSize=2 padZeros=0 biased=0 skip=0} }
- layer 3:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 2: params=36 100.0%
- TOTAL : params=36
- backward try kernel 0
- ... not plausibly optimal, skipping
- backward try kernel 1
- ... seems valid
- BackwardAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 16ms
- idx=44 predicted losschange=0.000198655 actual=0.000199318
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- idx=37 predicted losschange=-0.00664573 actual=-0.00663185
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- idx=40 predicted losschange=0.00305358 actual=0.00306416
- forward try kernel 5
- ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
- ... not valid
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 266ms
- idx=106 predicted losschange=0.000651619 actual=0.00306416
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 16ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5: cannot be used
- forward kernel 6: cannot be used
- forward kernel 7 time: 266ms
- forward layer selected kernel 1
- idx=122 predicted losschange=0.0040653 actual=0.00407314
- idx=99 predicted losschange=-0.000240484 actual=-0.00024128
- idx=10 predicted losschange=0.00158175 actual=0.00158405
- idx=47 predicted losschange=0.00140132 actual=0.00140285
- idx=67 predicted losschange=-0.00154732 actual=-0.00154686
- idx=126 predicted losschange=-0.000393638 actual=-0.000391006
- clblas teardown
- [ OK ] testbackward.conv1 (1294 ms)
- [ RUN ] testbackward.fc1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- layer 0:InputLayer{ outputPlanes=2 outputSize=4 }
- layer 1:ForceBackpropLayer{ outputPlanes=2 outputSize=4 }
- layer 2:FullyConnectedLayer{ numPlanes=4 imageSize=1 }
- layer 3:SquareLossLayer{}
- layer 0:InputLayer{ outputPlanes=2 outputSize=4 }
- layer 1:ForceBackpropLayer{ outputPlanes=2 outputSize=4 }
- layer 2:FullyConnectedLayer{ numPlanes=4 imageSize=1 }
- layer 3:SquareLossLayer{}
- batchSize: 4
- inputtotalsize=128 outputTotalSize=16
- layer FullyConnectedLayer{ numPlanes=4 imageSize=1 }
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- layer 0:InputLayer{ outputPlanes=2 outputSize=4 }
- layer 1:ForceBackpropLayer{ outputPlanes=2 outputSize=4 }
- layer 2:FullyConnectedLayer{ numPlanes=4 imageSize=1 }
- layer 3:SquareLossLayer{}
- Parameters overview: (skipping 3 layers with 0 params)
- layer 2: params=128 100.0%
- TOTAL : params=128
- backward try kernel 0
- ... not plausibly optimal, skipping
- backward try kernel 1
- ... seems valid
- BackwardAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- idx=44 predicted losschange=-2.78137e-06 actual=-2.86102e-06
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- idx=37 predicted losschange=-0.000552869 actual=-0.000545502
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- idx=40 predicted losschange=0.00245549 actual=0.00246334
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCLFB2C.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 0ms
- idx=106 predicted losschange=0.00259146 actual=0.00259662
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 250ms
- idx=122 predicted losschange=0.000431057 actual=0.00259662
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5 time: 0ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 250ms
- forward layer selected kernel 1
- idx=99 predicted losschange=-0.00116097 actual=-0.00116014
- idx=10 predicted losschange=-0.000360866 actual=-0.00036025
- idx=47 predicted losschange=0.000165997 actual=0.000166655
- idx=67 predicted losschange=-0.000468417 actual=-0.000465631
- idx=126 predicted losschange=3.95745e-05 actual=4.1008e-05
- clblas teardown
- [ OK ] testbackward.fc1 (1389 ms)
- [ RUN ] testbackward.act1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- layer 0:InputLayer{ outputPlanes=1 outputSize=2 }
- layer 1:ForceBackpropLayer{ outputPlanes=1 outputSize=2 }
- layer 2:ActivationLayer{ RELU }
- layer 3:SquareLossLayer{}
- layer 0:InputLayer{ outputPlanes=1 outputSize=2 }
- layer 1:ForceBackpropLayer{ outputPlanes=1 outputSize=2 }
- layer 2:ActivationLayer{ RELU }
- layer 3:SquareLossLayer{}
- batchSize: 1
- inputtotalsize=4 outputTotalSize=4
- layer ActivationLayer{ RELU }
- layer 0:InputLayer{ outputPlanes=1 outputSize=2 }
- layer 1:ForceBackpropLayer{ outputPlanes=1 outputSize=2 }
- layer 2:ActivationLayer{ RELU }
- layer 3:SquareLossLayer{}
- Parameters overview: (skipping 4 layers with 0 params)
- TOTAL : params=0
- idx=0 predicted losschange=-0.000880961 actual=-0.00088048
- idx=1 predicted losschange=-0.00151209 actual=-0.00151044
- idx=0 predicted losschange=-0.000880961 actual=-0.00088048
- idx=2 predicted losschange=-0.00245153 actual=-0.0024423
- idx=2 predicted losschange=-0.00245153 actual=-0.0024423
- idx=3 predicted losschange=-0.00214455 actual=-0.00212085
- idx=2 predicted losschange=-0.00245153 actual=-0.0024423
- idx=3 predicted losschange=-0.00214455 actual=-0.00212085
- idx=3 predicted losschange=-0.00214455 actual=-0.00212085
- idx=2 predicted losschange=-0.00245153 actual=-0.0024423
- [ OK ] testbackward.act1 (140 ms)
- [ RUN ] testbackward.checknumerically
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- backward try kernel 0
- ... not plausibly optimal, skipping
- backward try kernel 1
- ... seems valid
- BackwardAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- loss 0.0986296 loss2 0.0984814 change: 0.000148199
- sumweightsdiff 0.0038507
- loss change 0.000148199
- estimatedLossChangeFromW 0.000148279
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCL3D3.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 15ms
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCL422.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 0ms
- backward try kernel 2
- ... seems valid
- BackwardAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 156ms
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 78ms
- loss 0.0984814 loss2 0.0983336 change: 0.000147872
- sumweightsdiff 0.00384641
- loss change 0.000147872
- estimatedLossChangeFromW 0.000147948
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5 time: 15ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 156ms
- forward layer selected kernel 1
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5 time: 0ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 78ms
- forward layer selected kernel 1
- backward try kernel 3
- ... seems valid
- BackwardAuto: kernel 3 140ms
- calcGradWeights try kernel 3
- options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- calcGradWeights try kernel 3
- options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=1 -D gInputSizeSquared=1 -D gNumFilters=1 -D gFilterSize=1 -D gHalfFilterSize=0 -D gFilterSizeSquared=1 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=0 -DgInputStripeInnerNumRows=1 -DgInputStripeOuterNumRows=1 -DgInputStripeInnerSize=1 -DgInputStripeOuterSize=1 -DgInputStripeMarginSize=0 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- loss 0.0983336 loss2 0.098186 change: 0.000147544
- sumweightsdiff 0.00384223
- loss change 0.000147544
- estimatedLossChangeFromW 0.000147628
- backward kernel 0: cannot be used
- backward kernel 1 time: 0ms
- backward kernel 2 time: 0ms
- backward kernel 3 time: 140ms
- backward layer selected kernel 1
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 218ms
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 78ms
- loss 0.098186 loss2 0.0980388 change: 0.000147216
- sumweightsdiff 0.00383794
- loss change 0.000147216
- estimatedLossChangeFromW 0.000147298
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 218ms
- calcGradWeights layer selected kernel 1
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 78ms
- calcGradWeights layer selected kernel 1
- loss 0.0980388 loss2 0.0978919 change: 0.000146888
- sumweightsdiff 0.00383377
- loss change 0.000146888
- estimatedLossChangeFromW 0.000146978
- clblas teardown
- [ OK ] testbackward.checknumerically (3027 ms)
- [ RUN ] testbackward.checknumerically_imagesize5_filter3_relu
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- backward try kernel 0
- ... not plausibly optimal, skipping
- backward try kernel 1
- ... seems valid
- BackwardAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- loss 630.466 loss2 608.021 change: 22.4443
- sumweightsdiff -0.035685
- loss change 22.4443
- estimatedLossChangeFromW 22.6629
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- forward try kernel 5
- ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
- ... not valid
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 281ms
- forward try kernel 5
- ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
- ... not valid
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 94ms
- backward try kernel 2
- ... seems valid
- BackwardAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5: cannot be used
- forward kernel 6: cannot be used
- forward kernel 7 time: 281ms
- forward layer selected kernel 1
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5: cannot be used
- forward kernel 6: cannot be used
- forward kernel 7 time: 94ms
- forward layer selected kernel 1
- loss 608.021 loss2 586.349 change: 21.672
- sumweightsdiff -0.0350289
- loss change 21.672
- estimatedLossChangeFromW 21.7974
- backward try kernel 3
- ... seems valid
- BackwardAuto: kernel 3 406ms
- calcGradWeights try kernel 3
- options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=1 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=5 -D gOutputSizeSquared=25 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=9 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=45 -DgInputStripeMarginSize=10 -DgOutputStripeNumRows=5 -DgOutputStripeSize=25
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- calcGradWeights try kernel 3
- options: -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=1 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=1 -D gOutputPlanes=1 -D gOutputSize=5 -D gOutputSizeSquared=25 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=9 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=45 -DgInputStripeMarginSize=10 -DgOutputStripeNumRows=5 -DgOutputStripeSize=25
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- loss 586.349 loss2 565.324 change: 21.025
- sumweightsdiff -0.0345262
- loss change 21.025
- estimatedLossChangeFromW 21.2378
- backward kernel 0: cannot be used
- backward kernel 1 time: 0ms
- backward kernel 2 time: 0ms
- backward kernel 3 time: 406ms
- backward layer selected kernel 1
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 359ms
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 109ms
- loss 565.324 loss2 545.133 change: 20.1916
- sumweightsdiff -0.0338754
- loss change 20.1916
- estimatedLossChangeFromW 20.3956
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 359ms
- calcGradWeights layer selected kernel 1
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 109ms
- calcGradWeights layer selected kernel 1
- loss 545.133 loss2 525.742 change: 19.3912
- sumweightsdiff -0.0332378
- loss change 19.3912
- estimatedLossChangeFromW 19.5872
- loss 525.742 loss2 507.119 change: 18.6229
- sumweightsdiff -0.0326132
- loss change 18.6229
- estimatedLossChangeFromW 18.8111
- loss 507.119 loss2 489.233 change: 17.8853
- sumweightsdiff -0.032001
- loss change 17.8853
- estimatedLossChangeFromW 18.066
- loss 489.233 loss2 472.056 change: 17.1772
- sumweightsdiff -0.0314012
- loss change 17.1772
- estimatedLossChangeFromW 17.3506
- loss 472.056 loss2 455.559 change: 16.4975
- sumweightsdiff -0.0308135
- loss change 16.4975
- estimatedLossChangeFromW 16.6639
- loss 455.559 loss2 439.714 change: 15.8447
- sumweightsdiff -0.0302379
- loss change 15.8447
- estimatedLossChangeFromW 16.0046
- loss 439.714 loss2 424.416 change: 15.2976
- sumweightsdiff -0.0296733
- loss change 15.2976
- estimatedLossChangeFromW 15.3717
- loss 424.416 loss2 409.545 change: 14.871
- sumweightsdiff -0.0299227
- loss change 14.871
- estimatedLossChangeFromW 15.0234
- loss 409.545 loss2 395.271 change: 14.274
- sumweightsdiff -0.0293575
- loss change 14.274
- estimatedLossChangeFromW 14.4202
- loss 395.271 loss2 381.57 change: 13.7013
- sumweightsdiff -0.0288033
- loss change 13.7013
- estimatedLossChangeFromW 13.8415
- loss 381.57 loss2 368.418 change: 13.1519
- sumweightsdiff -0.0282608
- loss change 13.1519
- estimatedLossChangeFromW 13.2864
- loss 368.418 loss2 355.794 change: 12.6248
- sumweightsdiff -0.0277294
- loss change 12.6248
- estimatedLossChangeFromW 12.7538
- loss 355.794 loss2 343.675 change: 12.119
- sumweightsdiff -0.027209
- loss change 12.119
- estimatedLossChangeFromW 12.2429
- loss 343.675 loss2 332.041 change: 11.634
- sumweightsdiff -0.0266991
- loss change 11.634
- estimatedLossChangeFromW 11.7526
- loss 332.041 loss2 320.872 change: 11.1684
- sumweightsdiff -0.0261997
- loss change 11.1684
- estimatedLossChangeFromW 11.2823
- loss 320.872 loss2 310.15 change: 10.7218
- sumweightsdiff -0.0257105
- loss change 10.7218
- estimatedLossChangeFromW 10.8312
- clblas teardown
- [ OK ] testbackward.checknumerically_imagesize5_filter3_relu (3915 ms)
- [ RUN ] testbackward.compare_1_n_kgsgo_32c5
- -D BIASED -D gNumInputPlanes=32 -D gInputPlanes=32 -D gInputSize=19 -D gInputSizeSquared=361 -D gNumFilters=32 -D gFilterSize=5 -D gHalfFilterSize=2 -D gFilterSizeSquared=25 -D gNumOutputPlanes=32 -D gOutputPlanes=32 -D gOutputSize=19 -D gOutputSizeSquared=361 -D gPadZeros=1 -D gMargin=2 -D gEven=0 -D gSkip=0
- batchsize=8 LayerDimensions{ inputPlanes=32 inputSize=19 numFilters=32 filterSize=5 outputSize=19 padZeros=1 biased=1 skip=0}
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- LayerDimensions{ inputPlanes=32 inputSize=19 numFilters=32 filterSize=5 outputSize=19 padZeros=1 biased=1 skip=0}
- output[0]=-0.0308112 -0.0308112 SAME || -0.129603 || -0.048413 || 0.07916 || -0.118675 || 0.0416933 || 0.100887 || -0.106013
- output[1]=-0.0574008 -0.0574008 SAME || 0.099984 || 0.0155394 || 0.00411644 || 0.131031 || -0.0107744 || 0.121347 || 0.0437087
- output[2]=-0.0227139 -0.0227139 SAME || -0.0115189 || -0.190989 || -0.0445787 || -0.013341 || -0.04953 || -0.109186 || 0.104814
- output[3]=-0.0805896 -0.0805896 SAME || 0.0216207 || -0.128649 || -0.0159031 || 0.0534839 || 0.0301581 || 0.104269 || -0.0841106
- output[4]=-0.0723994 -0.0723994 SAME || -0.0164838 || -0.00649171 || -0.042007 || 0.147102 || -0.0702085 || -0.0120931 || 0.0597854
- output[5]=0.130336 0.130336 SAME || -0.0816751 || -0.272227 || 0.0707071 || 0.133967 || 0.0323092 || 0.124248 || -0.0138626
- output[6]=-0.00415662 -0.00415662 SAME || -0.0920411 || 0.0352436 || 0.0541946 || 0.00491123 || -0.0805987 || 0.0834764 || 0.0631893
- output[7]=-0.0915931 -0.0915931 SAME || -0.0358497 || 0.0445722 || -0.0472172 || 0.0778742 || -0.0550363 || -0.179262 || -0.0812755
- output[8]=0.0556533 0.0556533 SAME || -0.0684331 || -0.0243033 || -0.0822076 || -0.0104788 || -0.043145 || -0.0481164 || 0.0538944
- output[9]=-0.0725742 -0.0725742 SAME || 0.0486592 || -0.0286811 || -0.0249626 || 0.0394469 || -0.144496 || 0.0909432 || -0.0152857
- output[10]=-0.0153476 -0.0153476 SAME || -0.0677297 || -0.140709 || -0.0161164 || 0.131645 || 0.0545684 || -0.0210541 || 0.0611338
- output[11]=-0.0212713 -0.0212713 SAME || 0.100494 || 0.2122 || -0.0812487 || 0.0532493 || -0.0183774 || -0.0937923 || -0.069912
- output[12]=0.0389741 0.0389741 SAME || 0.0809882 || 0.0370538 || 0.0241565 || -0.0582968 || 0.0437625 || 0.139931 || -0.065007
- output[13]=0.0349705 0.0349705 SAME || -0.0251775 || -0.0759114 || 0.0945214 || 0.00389841 || -0.0377205 || 0.17624 || -0.114476
- output[14]=0.0366689 0.0366689 SAME || -0.0348694 || -0.0581568 || 0.0376178 || -0.0298947 || -0.0299259 || -0.0913825 || -0.0745193
- output[15]=0.0186965 0.0186965 SAME || 0.0281147 || 0.00937999 || 0.108983 || -0.0505074 || -0.0573388 || 0.067382 || 0.0387854
- output[16]=0.0658136 0.0658136 SAME || -0.0412163 || -0.128719 || 0.150029 || 0.0555238 || -0.0203267 || -0.0795422 || -0.123847
- output[17]=0.0705919 0.0705919 SAME || 0.147334 || 0.151016 || -0.0122364 || 0.0360484 || -0.0609187 || 0.0166715 || -0.141399
- output[18]=-0.0508929 -0.0508929 SAME || 0.0131358 || -0.0101773 || -0.120741 || -0.00821514 || 0.00894922 || -0.117651 || 0.0631629
- output[19]=-0.0110406 -0.0110406 SAME || 0.189081 || 0.0665268 || 0.0622702 || 0.151629 || -0.0172241 || -0.0215623 || 0.0457666
- clblas teardown
- batchsize=8 LayerDimensions{ inputPlanes=32 inputSize=19 numFilters=32 filterSize=5 outputSize=19 padZeros=1 biased=1 skip=0}
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- clblas teardown
- unknown file: error: C++ exception with description "
- kernel source:
- 1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
- 2: //
- 3: // This Source Code Form is subject to the terms of the Mozilla Public License,
- 4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
- 5: // obtain one at http://mozilla.org/MPL/2.0/.
- 6:
- 7: void copyLocal(local float *target, global float const *source, int N) {
- 8: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
- 9: for (int loop = 0; loop < numLoops; loop++) {
- 10: int offset = loop * get_local_size(0) + get_local_id(0);
- 11: if (offset < N) {
- 12: target[offset] = source[offset];
- 13: }
- 14: }
- 15: }
- 16:
- 17: // as calcGradInput, but with local cache
- 18: // convolve weights with gradOutput to produce gradInput
- 19: // workgroupid: [n][inputPlane]
- 20: // localid: [upstreamrow][upstreamcol]
- 21: // per-thread aggregation: [outPlane][filterRow][filterCol]
- 22: // need to store locally:
- 23: // - _gradOutputPlane. size = outputSizeSquared
- 24: // - _filterPlane. size = filtersizesquared
- 25: // note: currently doesnt use bias as input. thats probably an error?
- 26: // inputs: gradOutput :convolve: filters => gradInput
- 27: //
- 28: // global:
- 29: // gradOutput: [n][outPlane][outRow][outCol] 128 * 32 * 19 * 19 * 4
- 30: // weights: [filterId][upstreamplane][filterRow][filterCol] 32 * 32 * 5 * 5 * 4
- 31: // per workgroup:
- 32: // gradOutput: [outPlane][outRow][outCol] 32 * 19 * 19 * 4 = 46KB
- 33: // weights: [filterId][filterRow][filterCol] 32 * 5 * 5 * 4 = 3.2KB
- 34: // gradOutputforupstream: [n][upstreamPlane][upstreamRow][upstreamCol]
- 35: void kernel calcGradInputCached(
- 36: const int batchSize,
- 37: global const float *gradOutputGlobal,
- 38: global const float *filtersGlobal,
- 39: global float *gradInput,
- 40: local float *_gradOutputPlane,
- 41: local float *_filterPlane) {
- 42:
- 43: #define globalId get_global_id(0)
- 44: #define localId get_local_id(0)
- 45: #define workgroupId get_group_id(0)
- 46: #define workgroupSize get_local_size(0)
- 47:
- 48: const int n = workgroupId / gInputPlanes;
- 49: const int upstreamPlane = workgroupId % gInputPlanes;
- 50:
- 51: const int upstreamRow = localId / gInputSize;
- 52: const int upstreamCol = localId % gInputSize;
- 53:
- 54: float sumWeightTimesOutError = 0;
- 55: for (int outPlane = 0; outPlane < gNumFilters; outPlane++) {
- 56: barrier(CLK_LOCAL_MEM_FENCE);
- 57: copyLocal(_filterPlane, filtersGlobal + (outPlane * gInputPlanes + upstreamPlane) * gFilterSizeSquared, gFilterSizeSquared);
- 58: copyLocal(_gradOutputPlane, gradOutputGlobal + (n * gNumFilters + outPlane) * gOutputSizeSquared, gOutputSizeSquared);
- 59: barrier(CLK_LOCAL_MEM_FENCE);
- 60: for (int filterRow = 0; filterRow < gFilterSize; filterRow++) {
- 61: int outRow = upstreamRow + gMargin - filterRow;
- 62: for (int filterCol = 0; filterCol < gFilterSize; filterCol++) {
- 63: int outCol = upstreamCol + gMargin - filterCol;
- 64: if (outCol >= 0 && outCol < gOutputSize && outRow >= 0 && outRow < gOutputSize) {
- 65: float thisWeightTimesError =
- 66: _gradOutputPlane[outRow * gOutputSize + outCol] *
- 67: _filterPlane[filterRow * gFilterSize + filterCol];
- 68: sumWeightTimesOutError += thisWeightTimesError;
- 69: }
- 70: }
- 71: }
- 72: }
- 73: const int upstreamImageGlobalOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
- 74: if (localId < gInputSizeSquared) {
- 75: gradInput[upstreamImageGlobalOffset + localId] = sumWeightTimesOutError;
- 76: }
- 77: }
- 78:
- 79:
- Something went wrong, code -55" thrown in the test body.
- [ FAILED ] testbackward.compare_1_n_kgsgo_32c5 (843 ms)
- [----------] 12 tests from testbackward (10764 ms total)
- [----------] 6 tests from testsinglebatch
- [ RUN ] testsinglebatch.imagesize5_filtersize3_batchsize2
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- layer 0:InputLayer{ outputPlanes=1 outputSize=5 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=5 numFilters=5 filterSize=3 outputSize=3 padZeros=0 biased=1 skip=0} }
- layer 2:ActivationLayer{ LINEAR }
- layer 3:FullyConnectedLayer{ numPlanes=5 imageSize=1 }
- layer 4:ActivationLayer{ TANH }
- layer 5:SquareLossLayer{}
- Parameters overview: (skipping 4 layers with 0 params)
- layer 1: params=50 17.9%
- layer 3: params=230 82.1%
- TOTAL : params=280
- weightsTotalSize=280
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- backward try kernel 0
- ... not plausibly optimal, skipping
- backward try kernel 1
- ... seems valid
- BackwardAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- backward try kernel 2
- ... seems valid
- BackwardAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 16ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 15ms
- forward try kernel 5
- ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
- ... not valid
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 702ms
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCL2857.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 0ms
- backward try kernel 3
- ... seems valid
- BackwardAuto: kernel 3 421ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=5 -D gInputPlanes=5 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=5 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=5 -D gOutputPlanes=5 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=21 -DgInputStripeMarginSize=6 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=5 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=5 -D gOutputPlanes=5 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=9 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=45 -DgInputStripeMarginSize=10 -DgOutputStripeNumRows=3 -DgOutputStripeSize=9
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5: cannot be used
- forward kernel 6: cannot be used
- forward kernel 7 time: 702ms
- forward layer selected kernel 1
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 312ms
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 15ms
- forward kernel 5 time: 0ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 312ms
- forward layer selected kernel 1
- backward kernel 0: cannot be used
- backward kernel 1 time: 0ms
- backward kernel 2 time: 0ms
- backward kernel 3 time: 421ms
- backward layer selected kernel 1
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 843ms
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 1092ms
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 16ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 843ms
- calcGradWeights layer selected kernel 1
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 1092ms
- calcGradWeights layer selected kernel 1
- batch time 5694 ms
- dump enabled=0
- clblas teardown
- [ OK ] testsinglebatch.imagesize5_filtersize3_batchsize2 (6130 ms)
- [ RUN ] testsinglebatch.imagesize5_filtersize3_batchsize2_10filters
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- layer 0:InputLayer{ outputPlanes=1 outputSize=5 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=5 numFilters=10 filterSize=3 outputSize=3 padZeros=0 biased=1 skip=0} }
- layer 2:ActivationLayer{ RELU }
- layer 3:FullyConnectedLayer{ numPlanes=10 imageSize=1 }
- layer 4:ActivationLayer{ TANH }
- layer 5:SquareLossLayer{}
- Parameters overview: (skipping 4 layers with 0 params)
- layer 1: params=100 9.9%
- layer 3: params=910 90.1%
- TOTAL : params=1010
- weightsTotalSize=1010
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- backward try kernel 0
- ... not plausibly optimal, skipping
- backward try kernel 1
- ... seems valid
- BackwardAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- backward try kernel 2
- ... seems valid
- BackwardAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- forward try kernel 5
- ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
- ... not valid
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 702ms
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCL410C.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 0ms
- backward try kernel 3
- ... seems valid
- BackwardAuto: kernel 3 374ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=10 -D gInputPlanes=10 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=10 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=10 -D gOutputPlanes=10 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=21 -DgInputStripeMarginSize=6 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=10 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=10 -D gOutputPlanes=10 -D gOutputSize=3 -D gOutputSizeSquared=9 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=9 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=45 -DgInputStripeMarginSize=10 -DgOutputStripeNumRows=3 -DgOutputStripeSize=9
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5: cannot be used
- forward kernel 6: cannot be used
- forward kernel 7 time: 702ms
- forward layer selected kernel 1
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 312ms
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5 time: 0ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 312ms
- forward layer selected kernel 1
- backward kernel 0: cannot be used
- backward kernel 1 time: 0ms
- backward kernel 2 time: 0ms
- backward kernel 3 time: 374ms
- backward layer selected kernel 1
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 827ms
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 1076ms
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 827ms
- calcGradWeights layer selected kernel 1
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 1076ms
- calcGradWeights layer selected kernel 1
- batch time 6037 ms
- dump enabled=0
- clblas teardown
- [ OK ] testsinglebatch.imagesize5_filtersize3_batchsize2_10filters (6490 ms)
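Note on the `-1984MB max alloc size` reported for forward kernel 6 above: a negative size is what a byte count just over 2 GB looks like after it has been stored in a signed 32-bit integer. The sketch below reproduces that arithmetic. It assumes, and the log does not confirm, that the device reported 2112 MB and that the value passed through a 32-bit `int`. `wrapToInt32` and `wrappedMegabytes` are hypothetical helpers written for this note, not DeepCL functions.

```cpp
#include <cstdint>

// Hypothetical helper: interpret the low 32 bits of a byte count as a
// two's-complement signed value, as a 32-bit int would hold it.
std::int64_t wrapToInt32(std::uint64_t bytes) {
    const std::uint64_t low = bytes & 0xFFFFFFFFull;
    return low >= 0x80000000ull
        ? static_cast<std::int64_t>(low) - 0x100000000ll
        : static_cast<std::int64_t>(low);
}

// Convert the wrapped byte count to whole megabytes, as the log message does.
std::int64_t wrappedMegabytes(std::uint64_t bytes) {
    return wrapToInt32(bytes) / (1024 * 1024);
}
```

With these assumptions, `wrappedMegabytes(2112ull * 1024 * 1024)` gives -1984, matching the message. OpenCL returns `CL_DEVICE_MAX_MEM_ALLOC_SIZE` as a `cl_ulong`, so keeping the value in a 64-bit unsigned type would likely avoid the wrap.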
- [ RUN ] testsinglebatch.imagesize28
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- layer 0:InputLayer{ outputPlanes=1 outputSize=28 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=28 numFilters=10 filterSize=3 outputSize=26 padZeros=0 biased=1 skip=0} }
- layer 2:ActivationLayer{ RELU }
- layer 3:FullyConnectedLayer{ numPlanes=10 imageSize=1 }
- layer 4:ActivationLayer{ TANH }
- layer 5:SquareLossLayer{}
- Parameters overview: (skipping 4 layers with 0 params)
- layer 1: params=100 0.1%
- layer 3: params=67610 99.9%
- TOTAL : params=67710
- weightsTotalSize=67710
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- backward try kernel 0
- ... not plausibly optimal, skipping
- backward try kernel 1
- ... seems valid
- BackwardAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- forward try kernel 2
- ForwardAuto: kernel 2: this instance cant be used: cannot use forward2, since outputimagesize * outputimagesize > maxworkgroupsize
- ... not valid
- forward try kernel 3
- ForwardAuto: kernel 3: this instance cant be used: cannot use forward3, since outputimagesize * outputimagesize > maxworkgroupsize
- ... not valid
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- forward try kernel 5
- ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
- ... not valid
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 562ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- backward try kernel 2
- ... seems valid
- BackwardAuto: kernel 2 this instance cant be used:
- kernel source:
- 1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
- 2: //
- 3: // This Source Code Form is subject to the terms of the Mozilla Public License,
- 4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
- 5: // obtain one at http://mozilla.org/MPL/2.0/.
- 6:
- 7: void copyLocal(local float *target, global float const *source, int N) {
- 8: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
- 9: for (int loop = 0; loop < numLoops; loop++) {
- 10: int offset = loop * get_local_size(0) + get_local_id(0);
- 11: if (offset < N) {
- 12: target[offset] = source[offset];
- 13: }
- 14: }
- 15: }
- 16:
- 17: // as calcGradInput, but with local cache
- 18: // convolve weights with gradOutput to produce gradInput
- 19: // workgroupid: [n][inputPlane]
- 20: // localid: [upstreamrow][upstreamcol]
- 21: // per-thread aggregation: [outPlane][filterRow][filterCol]
- 22: // need to store locally:
- 23: // - _gradOutputPlane. size = outputSizeSquared
- 24: // - _filterPlane. size = filtersizesquared
- 25: // note: currently doesnt use bias as input. thats probably an error?
- 26: // inputs: gradOutput :convolve: filters => gradInput
- 27: //
- 28: // global:
- 29: // gradOutput: [n][outPlane][outRow][outCol] 128 * 32 * 19 * 19 * 4
- 30: // weights: [filterId][upstreamplane][filterRow][filterCol] 32 * 32 * 5 * 5 * 4
- 31: // per workgroup:
- 32: // gradOutput: [outPlane][outRow][outCol] 32 * 19 * 19 * 4 = 46KB
- 33: // weights: [filterId][filterRow][filterCol] 32 * 5 * 5 * 4 = 3.2KB
- 34: // gradOutputforupstream: [n][upstreamPlane][upstreamRow][upstreamCol]
- 35: void kernel calcGradInputCached(
- 36: const int batchSize,
- 37: global const float *gradOutputGlobal,
- 38: global const float *filtersGlobal,
- 39: global float *gradInput,
- 40: local float *_gradOutputPlane,
- 41: local float *_filterPlane) {
- 42:
- 43: #define globalId get_global_id(0)
- 44: #define localId get_local_id(0)
- 45: #define workgroupId get_group_id(0)
- 46: #define workgroupSize get_local_size(0)
- 47:
- 48: const int n = workgroupId / gInputPlanes;
- 49: const int upstreamPlane = workgroupId % gInputPlanes;
- 50:
- 51: const int upstreamRow = localId / gInputSize;
- 52: const int upstreamCol = localId % gInputSize;
- 53:
- 54: float sumWeightTimesOutError = 0;
- 55: for (int outPlane = 0; outPlane < gNumFilters; outPlane++) {
- 56: barrier(CLK_LOCAL_MEM_FENCE);
- 57: copyLocal(_filterPlane, filtersGlobal + (outPlane * gInputPlanes + upstreamPlane) * gFilterSizeSquared, gFilterSizeSquared);
- 58: copyLocal(_gradOutputPlane, gradOutputGlobal + (n * gNumFilters + outPlane) * gOutputSizeSquared, gOutputSizeSquared);
- 59: barrier(CLK_LOCAL_MEM_FENCE);
- 60: for (int filterRow = 0; filterRow < gFilterSize; filterRow++) {
- 61: int outRow = upstreamRow + gMargin - filterRow;
- 62: for (int filterCol = 0; filterCol < gFilterSize; filterCol++) {
- 63: int outCol = upstreamCol + gMargin - filterCol;
- 64: if (outCol >= 0 && outCol < gOutputSize && outRow >= 0 && outRow < gOutputSize) {
- 65: float thisWeightTimesError =
- 66: _gradOutputPlane[outRow * gOutputSize + outCol] *
- 67: _filterPlane[filterRow * gFilterSize + filterCol];
- 68: sumWeightTimesOutError += thisWeightTimesError;
- 69: }
- 70: }
- 71: }
- 72: }
- 73: const int upstreamImageGlobalOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
- 74: if (localId < gInputSizeSquared) {
- 75: gradInput[upstreamImageGlobalOffset + localId] = sumWeightTimesOutError;
- 76: }
- 77: }
- 78:
- 79:
- Something went wrong, code -55
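Note on `copyLocal` in the kernel source above: every work-item in the work-group copies a strided subset of the `N` elements, and `numLoops` is the ceiling of `N / get_local_size(0)`, so the whole buffer is covered even when `N` is not a multiple of the work-group size. The sketch below is a host-side C++ simulation of that pattern, not device code. `simulateCopyLocal` is a hypothetical name, and the outer loop over `localId` stands in for work-items that would run concurrently on the device.

```cpp
#include <vector>

// Simulates copyLocal(): work-item `localId` copies the elements at
// localId, localId + localSize, localId + 2 * localSize, ...
// Returns how many times each of the N target slots was written.
// `target` and `source` must each hold at least N elements.
std::vector<int> simulateCopyLocal(std::vector<float>& target,
                                   const std::vector<float>& source,
                                   int N, int localSize) {
    std::vector<int> writes(N, 0);
    const int numLoops = (N + localSize - 1) / localSize;  // ceil(N / localSize)
    for (int localId = 0; localId < localSize; ++localId) {  // one pass per work-item
        for (int loop = 0; loop < numLoops; ++loop) {
            const int offset = loop * localSize + localId;
            if (offset < N) {  // guard the ragged final pass
                target[offset] = source[offset];
                ++writes[offset];
            }
        }
    }
    return writes;
}
```

Running it with `N = 10` and `localSize = 4` writes each slot exactly once, which is why the kernel only needs a barrier after the copy rather than any per-element synchronisation.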
- backward try kernel 3
- ... seems valid
- BackwardAuto: kernel 3 889ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 this instance cant be used:
- kernel source:
- 1: // Copyright Hugh Perkins 2014,2015 hughperkins at gmail
- 2: //
- 3: // This Source Code Form is subject to the terms of the Mozilla Public License,
- 4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
- 5: // obtain one at http://mozilla.org/MPL/2.0/.
- 6:
- 7: // expected defines:
- 8: // BIASED (or not)
- 9:
- 10: // including cl/copyLocal.cl:
- 11: // Copyright Hugh Perkins 2015 hughperkins at gmail
- 12: //
- 13: // This Source Code Form is subject to the terms of the Mozilla Public License,
- 14: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
- 15: // obtain one at http://mozilla.org/MPL/2.0/.
- 16:
- 17: void copyLocal(local float *target, global float const *source, int N) {
- 18: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
- 19: for (int loop = 0; loop < numLoops; loop++) {
- 20: int offset = loop * get_local_size(0) + get_local_id(0);
- 21: if (offset < N) {
- 22: target[offset] = source[offset];
- 23: }
- 24: }
- 25: }
- 26:
- 27: void copyGlobal(global float *target, local float const *source, int N) {
- 28: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
- 29: for (int loop = 0; loop < numLoops; loop++) {
- 30: int offset = loop * get_local_size(0) + get_local_id(0);
- 31: if (offset < N) {
- 32: target[offset] = source[offset];
- 33: }
- 34: }
- 35: }
- 36:
- 37:
- 38: // including cl/ids.cl:
- 39: // Copyright Hugh Perkins 2015 hughperkins at gmail
- 40: //
- 41: // This Source Code Form is subject to the terms of the Mozilla Public License,
- 42: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
- 43: // obtain one at http://mozilla.org/MPL/2.0/.
- 44:
- 45: #define globalId (get_global_id(0))
- 46: #define localId (get_local_id(0) )
- 47: #define workgroupId (get_group_id(0))
- 48: #define workgroupSize (get_local_size(0))
- 49:
- 50:
- 51:
- 52:
- 53: // workgroupId: [outputPlane][inputPlane]
- 54: // localId: [filterRow][filterCol]
- 55: // per-thread iteration: [n][outputRow][outputCol]
- 56: // local: errorimage: outputSize * outputSize
- 57: // imageimage: inputSize * inputSize
- 58: void kernel backprop_floats_withscratch_dobias(
- 59: const float learningRateMultiplier, const int batchSize,
- 60: global const float *gradOutput, global const float *images,
- 61: global float *gradWeights,
- 62: #ifdef BIASED
- 63: global float *gradBiasWeights,
- 64: #endif
- 65: local float *_errorImage, local float *_imageImage
- 66: ) {
- 67: const int filterRow = localId / gFilterSize;
- 68: const int filterCol = localId % gFilterSize;
- 69:
- 70: #define outPlane (workgroupId / gInputPlanes)
- 71: #define upstreamPlane (workgroupId % gInputPlanes)
- 72:
- 73: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
- 74: // aggregate over: [outRow][outCol][n]
- 75: float thiswchange = 0;
- 76: #ifdef BIASED
- 77: float thisbiaschange = 0;
- 78: #endif
- 79: for (int n = 0; n < batchSize; n++) {
- 80: barrier(CLK_LOCAL_MEM_FENCE);
- 81: copyLocal(_imageImage, images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared, gInputSizeSquared);
- 82: copyLocal(_errorImage, gradOutput + (n * gNumFilters + outPlane) * gOutputSizeSquared, gOutputSizeSquared);
- 83: barrier(CLK_LOCAL_MEM_FENCE);
- 84: if (localId < gFilterSizeSquared) {
- 85: for (int outRow = 0; outRow < gOutputSize; outRow++) {
- 86: int upstreamRow = outRow - gMargin + filterRow;
- 87: for (int outCol = 0; outCol < gOutputSize; outCol++) {
- 88: const int upstreamCol = outCol - gMargin + filterCol;
- 89: #define proceed (upstreamRow >= 0 && upstreamCol >= 0 && upstreamRow < gInputSize && upstreamCol < gInputSize)
- 90: if (proceed) {
- 91: // these defines reduce register pressure, compared to const
- 92: // giving a 40% speedup on nvidia :-)
- 93: #define resultIndex (outRow * gOutputSize + outCol)
- 94: #define error (_errorImage[resultIndex])
- 95: //const float error = _errorImage[resultIndex];
- 96: #define upstreamDataIndex (upstreamRow * gInputSize + upstreamCol)
- 97: #define upstreamResult (_imageImage[upstreamDataIndex])
- 98: thiswchange += upstreamResult * error;
- 99: #ifdef BIASED
- 100: thisbiaschange += error;
- 101: #endif
- 102: }
- 103: }
- 104: }
- 105: }
- 106: }
- 107: if (localId < gFilterSizeSquared) {
- 108: gradWeights[ workgroupId * gFilterSizeSquared + localId ] = learningRateMultiplier * thiswchange;
- 109: }
- 110: #ifdef BIASED
- 111: #define writeBias (upstreamPlane == 0 && filterRow == gMargin && filterCol == gMargin)
- 112: if (writeBias) {
- 113: gradBiasWeights[outPlane] = learningRateMultiplier * thisbiaschange;
- 114: }
- 115: #endif
- 116: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
- 117: // aggregate over: [outRow][outCol][n]
- 118: }
- 119:
- 120:
- Something went wrong, code -55
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=10 -D gInputPlanes=10 -D gInputSize=26 -D gInputSizeSquared=676 -D gNumFilters=10 -D gFilterSize=26 -D gHalfFilterSize=13 -D gFilterSizeSquared=676 -D gNumOutputPlanes=10 -D gOutputPlanes=10 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=25 -DgInputStripeInnerNumRows=26 -DgInputStripeOuterNumRows=76 -DgInputStripeInnerSize=676 -DgInputStripeOuterSize=1976 -DgInputStripeMarginSize=650 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 this instance cant be used:
- kernel source:
- 1: // Copyright Hugh Perkins 2014,2015 hughperkins at gmail
- 2: //
- 3: // This Source Code Form is subject to the terms of the Mozilla Public License,
- 4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
- 5: // obtain one at http://mozilla.org/MPL/2.0/.
- 6:
- 7: // expected defines:
- 8: // BIASED (or not)
- 9:
- 10: // workgroupId: [outputPlane][inputPlane]
- 11: // localId: [filterRow][filterCol]
- 12: // per-thread iteration: [n][outputRow][outputCol]
- 13: // local: errorimage: outputSize * outputSize
- 14: // imageimage: inputSize * inputSize
- 15: // specific characteristic: load one stripe of each image at a time,
- 16: // so we dont run out of memory
- 17: // number of stripes set in: gNumStripes
- 18: // note that whilst we can stripe the gradOutput simply,
- 19: // we actually need to add a half-filter widthed additional few rows
- 20: // onto the images stripe, otherwise we will be missing data
- 21: // we will call the size of the non-overlapping image stripes: gInputStripeInnerSize
- 22: // the outersize, including the two margins is: gInputStripeOuterSize
- 23: // of course, the first and last stripes will be missing a bit off the top/bottom, where the
- 24: // corresponding outer margin would be
- 25: void kernel backprop_floats_withscratch_dobias_striped(
- 26: const float learningRateMultiplier, const int batchSize,
- 27: global const float *gradOutput, global const float *images,
- 28: global float *gradWeights,
- 29: #ifdef BIASED
- 30: global float *gradBiasWeights,
- 31: #endif
- 32: local float *_errorStripe, local float *_imageStripe
- 33: ) {
- 34: // gHalfFilterSize
- 35: // gInputSize
- 36: //
- 37: // gInputStripeMarginRows => basically equal to gHalfFilterSize
- 38: // gInputStripeInnerNumRows = gInputSize / gNumStripes
- 39: // gInputStripeOuterNumRows = gInputStripeInnerNumRows + 2 * gHalfFilterSize (note: one row less than
- 40: // if we just added gFilterSize)
- 41: // gInputStripeInnerSize = gInputStripeInnerNumRows * gInputSize
- 42: // gInputStripeOuterSize = gInputStripeOuterNumRows * gInputSize
- 43: // gInputStripeMarginSize = gInputStripeMarginRows * gInputSize
- 44: //
- 45: // gOutputStripeNumRows
- 46: // gOutputStripeSize
- 47:
- 48: const int globalId = get_global_id(0);
- 49: const int localId = get_local_id(0);
- 50: const int workgroupId = get_group_id(0);
- 51: const int workgroupSize = get_local_size(0);
- 52:
- 53: const int filterRow = localId / gFilterSize;
- 54: const int filterCol = localId % gFilterSize;
- 55:
- 56: const int outPlane = workgroupId / gInputPlanes;
- 57: const int upstreamPlane = workgroupId % gInputPlanes;
- 58:
- 59: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
- 60: // aggregate over: [outRow][outCol][n]
- 61: float thiswchange = 0;
- 62: #ifdef BIASED
- 63: float thisbiaschange = 0;
- 64: #endif
- 65: const int numLoopsForImageStripe = (gInputStripeOuterSize + workgroupSize - 1) / workgroupSize;
- 66: const int numLoopsForErrorStripe = (gOutputSizeSquared + workgroupSize - 1) / workgroupSize;
- 67: for (int n = 0; n < batchSize; n++) {
- 68: const int imageImageGlobalOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
- 69: const int imageImageGlobalOffsetAfter = imageImageGlobalOffset + gInputSizeSquared;
- 70: const int errorImageGlobalOffset = (n * gNumFilters + outPlane) * gOutputSizeSquared;
- 71: const int errorImageGlobalOffsetAfter = errorImageGlobalOffset + gOutputSizeSquared;
- 72: for (int stripe = 0; stripe < gNumStripes; stripe++) {
- 73: const int imageStripeInnerOffset = imageImageGlobalOffset + stripe * gInputStripeInnerSize;
- 74: const int imageStripeOuterOffset = imageStripeInnerOffset - gInputStripeMarginSize;
- 75: // need to fetch the image, but it's bigger than us, so will need to loop...
- 76: barrier(CLK_LOCAL_MEM_FENCE);
- 77: for (int i = 0; i < numLoopsForImageStripe; i++) {
- 78: int thisOffset = i * workgroupSize + localId;
- 79: int thisGlobalImagesOffset = imageStripeOuterOffset + thisOffset;
- 80: bool process = thisOffset < gInputStripeOuterSize
- 81: && thisGlobalImagesOffset >= imageImageGlobalOffset
- 82: && thisGlobalImagesOffset < imageImageGlobalOffsetAfter;
- 83: if (process) {
- 84: _imageStripe[thisOffset] = images[ thisGlobalImagesOffset ];
- 85: }
- 86: }
- 87: int errorStripeOffset = errorImageGlobalOffset + stripe * gOutputStripeSize;
- 88: for (int i = 0; i < numLoopsForErrorStripe; i++) {
- 89: int thisOffset = i * workgroupSize + localId;
- 90: int globalErrorsOffset = errorStripeOffset + thisOffset;
- 91: bool process = thisOffset < gOutputStripeSize
- 92: && globalErrorsOffset < errorImageGlobalOffsetAfter;
- 93: if (process) {
- 94: _errorStripe[thisOffset ] = gradOutput[globalErrorsOffset];
- 95: }
- 96: }
- 97: const int stripeOutRowStart = stripe * gOutputStripeNumRows;
- 98: const int stripeOutRowEndExcl = stripeOutRowStart + gOutputStripeNumRows;
- 99: barrier(CLK_LOCAL_MEM_FENCE);
- 100: // if (localId == 13) {
- 101: // for (int i = 0; i < 12; i++) {
- 102: // gradWeights[100 + stripe * 12 + i ] = _errorStripe[i * gOutputSize];
- 103: // }
- 104: // for (int i = 0; i < 20; i++) {
- 105: // gradWeights[200 + stripe * 20 + i ] = _imageStripe[i * gInputSize];
- 106: // }
- 107: // }
- 108: if (localId < gFilterSizeSquared) {
- 109: for (int outRow = stripeOutRowStart; outRow < stripeOutRowEndExcl; outRow++) {
- 110: int upstreamRow = outRow - gMargin + filterRow;
- 111: for (int outCol = 0; outCol < gOutputSize; outCol++) {
- 112: int upstreamCol = outCol - gMargin + filterCol;
- 113: bool proceed =
- 114: upstreamRow >= 0 && upstreamCol >= 0
- 115: && upstreamRow < gInputSize && upstreamCol < gInputSize
- 116: && outRow < gOutputSize;
- 117: if (proceed) {
- 118: int resultIndex = outRow * gOutputSize + outCol;
- 119: float error = _errorStripe[resultIndex - stripe * gOutputStripeSize];
- 120: int upstreamDataIndex = upstreamRow * gInputSize + upstreamCol;
- 121: float upstreamResult = _imageStripe[upstreamDataIndex + gInputStripeMarginSize
- 122: - stripe * gInputStripeInnerSize ];
- 123: thiswchange += upstreamResult * error;
- 124: #ifdef BIASED
- 125: thisbiaschange += error;
- 126: #endif
- 127: }
- 128: }
- 129: }
- 130: }
- 131: }
- 132: }
- 133: if (localId < gFilterSizeSquared) {
- 134: gradWeights[ workgroupId * gFilterSizeSquared + localId ] = learningRateMultiplier * thiswchange;
- 135: // weightChanges[ workgroupId * gFilterSizeSquared + localId ] = workgroupId;
- 136: }
- 137: #ifdef BIASED
- 138: bool writeBias = upstreamPlane == 0 && filterRow == gMargin && filterCol == gMargin;
- 139: if (writeBias) {
- 140: gradBiasWeights[outPlane] = learningRateMultiplier * thisbiaschange;
- 141: }
- 142: #endif
- 143: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
- 144: // aggregate over: [outRow][outCol][n]
- 145: }
- 146:
- 147:
- Something went wrong, code -55
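Note on `code -55`: in the Khronos OpenCL headers, -55 is `CL_INVALID_WORK_ITEM_SIZE`. The options line for this kernel sets `gFilterSizeSquared=676`, and the kernel maps `localId` to `[filterRow][filterCol]`, so it wants at least 676 work-items in one work-group. Forward kernels 2 and 3 were rejected earlier for a 26 x 26 output because `outputimagesize * outputimagesize > maxworkgroupsize`, which suggests the device limit is below 676. The sketch below names the error code and shows the kind of pre-check that would reject such a launch before enqueueing. `fitsInOneWorkgroup` is a hypothetical helper, and the limit of 256 used in the usage note is an assumption, not a value from the log.

```cpp
#include <cstddef>
#include <string>

// Names for a few OpenCL status codes, as defined in CL/cl.h.
std::string clErrorName(int code) {
    switch (code) {
        case -54: return "CL_INVALID_WORK_GROUP_SIZE";
        case -55: return "CL_INVALID_WORK_ITEM_SIZE";
        case -56: return "CL_INVALID_GLOBAL_OFFSET";
        default:  return "unknown (" + std::to_string(code) + ")";
    }
}

// Hypothetical pre-check: a kernel that assigns one work-item to each cell
// of a planeSize x planeSize plane needs planeSize * planeSize work-items
// in a single work-group.
bool fitsInOneWorkgroup(int planeSize, std::size_t maxWorkgroupSize) {
    return static_cast<std::size_t>(planeSize) * planeSize <= maxWorkgroupSize;
}
```

Under the assumed limit, `fitsInOneWorkgroup(26, 256)` is false while `fitsInOneWorkgroup(3, 256)` is true, which lines up with kernel 3 failing here but reporting `0ms` for the 3 x 3 filter cases.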
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 1341ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2: cannot be used
- forward kernel 3: cannot be used
- forward kernel 4 time: 0ms
- forward kernel 5: cannot be used
- forward kernel 6: cannot be used
- forward kernel 7 time: 562ms
- forward layer selected kernel 1
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCL7D02.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 0ms
- backward kernel 0: cannot be used
- backward kernel 1 time: 0ms
- backward kernel 2: cannot be used
- backward kernel 3 time: 889ms
- backward layer selected kernel 1
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2: cannot be used
- calcGradWeights kernel 3: cannot be used
- calcGradWeights kernel 4 time: 1341ms
- calcGradWeights layer selected kernel 1
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=28 -D gInputSizeSquared=784 -D gNumFilters=10 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=10 -D gOutputPlanes=10 -D gOutputSize=26 -D gOutputSizeSquared=676 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=28 -DgInputStripeOuterNumRows=32 -DgInputStripeInnerSize=784 -DgInputStripeOuterSize=896 -DgInputStripeMarginSize=56 -DgOutputStripeNumRows=26 -DgOutputStripeSize=676
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
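Note on the `-DgInputStripe...` values in the `options:` lines: the four option lines logged so far are all reproduced by the arithmetic sketched below. The relation `marginRows = filterSize - 1` is inferred from those logged values (2 for `gFilterSize=3`, 25 for `gFilterSize=26`) and not taken from DeepCL's source; the kernel comment describes the margin as "basically equal to gHalfFilterSize", which does not match the logged numbers. Only `gNumStripes=1` appears in the log, so the divisions by `numStripes` are untested assumptions for larger stripe counts. `StripeParams` and `stripeParams` are hypothetical names.

```cpp
// Derived stripe dimensions, named after the -D defines in the log.
struct StripeParams {
    int marginRows, innerNumRows, outerNumRows;
    int innerSize, outerSize, marginSize;
    int outputStripeNumRows, outputStripeSize;
};

// Sketch only: marginRows = filterSize - 1 is inferred from the logged
// option lines, not confirmed against the program that produced them.
StripeParams stripeParams(int inputSize, int filterSize,
                          int outputSize, int numStripes) {
    StripeParams p;
    p.marginRows = filterSize - 1;
    p.innerNumRows = inputSize / numStripes;
    p.outerNumRows = p.innerNumRows + 2 * p.marginRows;
    p.innerSize = p.innerNumRows * inputSize;
    p.outerSize = p.outerNumRows * inputSize;
    p.marginSize = p.marginRows * inputSize;
    p.outputStripeNumRows = outputSize / numStripes;
    p.outputStripeSize = p.outputStripeNumRows * outputSize;
    return p;
}
```

For example, `stripeParams(28, 3, 26, 1)` yields 32 outer rows, an outer size of 896 and a margin size of 56, the same values as the option line for the 28 x 28 input.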
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 827ms
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5 time: 0ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 827ms
- forward layer selected kernel 1
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 889ms
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 889ms
- calcGradWeights layer selected kernel 1
- batch time 15116 ms
- dump enabled=0
- clblas teardown
- [ OK ] testsinglebatch.imagesize28 (15553 ms)
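Note on the `selected kernel` lines: each summary lists one time per candidate kernel, marks some as `cannot be used`, and then reports a selection. The results in this test are consistent with choosing the fastest usable kernel and keeping the lowest index on ties, as sketched below. This is an assumption: every selection here is kernel 1 at `0ms`, so the log cannot distinguish that rule from others. `KernelTiming` and `selectKernel` are hypothetical names.

```cpp
#include <vector>

// One entry per candidate kernel: whether it could run, and its measured time.
struct KernelTiming {
    bool usable;
    int milliseconds;
};

// Returns the index of the fastest usable kernel, or -1 if none is usable.
// Ties go to the lowest index because the comparison is a strict '<'.
int selectKernel(const std::vector<KernelTiming>& timings) {
    int best = -1;
    for (int i = 0; i < static_cast<int>(timings.size()); ++i) {
        if (!timings[i].usable) continue;
        if (best == -1 || timings[i].milliseconds < timings[best].milliseconds) {
            best = i;
        }
    }
    return best;
}
```

Feeding it the forward summary from this test (kernels 1 and 4 at 0 ms, kernel 7 at 562 ms, the rest unusable) returns 1, matching `forward layer selected kernel 1`.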
- [ RUN ] testsinglebatch.imagesize28_filtersize5
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- layer 0:InputLayer{ outputPlanes=1 outputSize=28 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=28 numFilters=10 filterSize=5 outputSize=24 padZeros=0 biased=1 skip=0} }
- layer 2:ActivationLayer{ RELU }
- layer 3:FullyConnectedLayer{ numPlanes=10 imageSize=1 }
- layer 4:ActivationLayer{ TANH }
- layer 5:SquareLossLayer{}
- Parameters overview: (skipping 4 layers with 0 params)
- layer 1: params=260 0.4%
- layer 3: params=57610 99.6%
- TOTAL : params=57870
- weightsTotalSize=57870
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 15ms
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- backward try kernel 0
- ... not plausibly optimal, skipping
- backward try kernel 1
- ... seems valid
- BackwardAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- forward try kernel 2
- ForwardAuto: kernel 2: this instance cant be used: cannot use forward2, since outputimagesize * outputimagesize > maxworkgroupsize
- ... not valid
- forward try kernel 3
- ForwardAuto: kernel 3: this instance cant be used: cannot use forward3, since outputimagesize * outputimagesize > maxworkgroupsize
- ... not valid
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- forward try kernel 5
- ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
- ... not valid
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 562ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- backward try kernel 2
- ... seems valid
- BackwardAuto: kernel 2 this instance cant be used:
- kernel source:
- 1: // Copyright Hugh Perkins 2014, 2015 hughperkins at gmail
- 2: //
- 3: // This Source Code Form is subject to the terms of the Mozilla Public License,
- 4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
- 5: // obtain one at http://mozilla.org/MPL/2.0/.
- 6:
- 7: void copyLocal(local float *target, global float const *source, int N) {
- 8: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
- 9: for (int loop = 0; loop < numLoops; loop++) {
- 10: int offset = loop * get_local_size(0) + get_local_id(0);
- 11: if (offset < N) {
- 12: target[offset] = source[offset];
- 13: }
- 14: }
- 15: }
- 16:
- 17: // as calcGradInput, but with local cache
- 18: // convolve weights with gradOutput to produce gradInput
- 19: // workgroupid: [n][inputPlane]
- 20: // localid: [upstreamrow][upstreamcol]
- 21: // per-thread aggregation: [outPlane][filterRow][filterCol]
- 22: // need to store locally:
- 23: // - _gradOutputPlane. size = outputSizeSquared
- 24: // - _filterPlane. size = filtersizesquared
- 25: // note: currently doesnt use bias as input. thats probably an error?
- 26: // inputs: gradOutput :convolve: filters => gradInput
- 27: //
- 28: // global:
- 29: // gradOutput: [n][outPlane][outRow][outCol] 128 * 32 * 19 * 19 * 4
- 30: // weights: [filterId][upstreamplane][filterRow][filterCol] 32 * 32 * 5 * 5 * 4
- 31: // per workgroup:
- 32: // gradOutput: [outPlane][outRow][outCol] 32 * 19 * 19 * 4 = 46KB
- 33: // weights: [filterId][filterRow][filterCol] 32 * 5 * 5 * 4 = 3.2KB
- 34: // gradOutputforupstream: [n][upstreamPlane][upstreamRow][upstreamCol]
- 35: void kernel calcGradInputCached(
- 36: const int batchSize,
- 37: global const float *gradOutputGlobal,
- 38: global const float *filtersGlobal,
- 39: global float *gradInput,
- 40: local float *_gradOutputPlane,
- 41: local float *_filterPlane) {
- 42:
- 43: #define globalId get_global_id(0)
- 44: #define localId get_local_id(0)
- 45: #define workgroupId get_group_id(0)
- 46: #define workgroupSize get_local_size(0)
- 47:
- 48: const int n = workgroupId / gInputPlanes;
- 49: const int upstreamPlane = workgroupId % gInputPlanes;
- 50:
- 51: const int upstreamRow = localId / gInputSize;
- 52: const int upstreamCol = localId % gInputSize;
- 53:
- 54: float sumWeightTimesOutError = 0;
- 55: for (int outPlane = 0; outPlane < gNumFilters; outPlane++) {
- 56: barrier(CLK_LOCAL_MEM_FENCE);
- 57: copyLocal(_filterPlane, filtersGlobal + (outPlane * gInputPlanes + upstreamPlane) * gFilterSizeSquared, gFilterSizeSquared);
- 58: copyLocal(_gradOutputPlane, gradOutputGlobal + (n * gNumFilters + outPlane) * gOutputSizeSquared, gOutputSizeSquared);
- 59: barrier(CLK_LOCAL_MEM_FENCE);
- 60: for (int filterRow = 0; filterRow < gFilterSize; filterRow++) {
- 61: int outRow = upstreamRow + gMargin - filterRow;
- 62: for (int filterCol = 0; filterCol < gFilterSize; filterCol++) {
- 63: int outCol = upstreamCol + gMargin - filterCol;
- 64: if (outCol >= 0 && outCol < gOutputSize && outRow >= 0 && outRow < gOutputSize) {
- 65: float thisWeightTimesError =
- 66: _gradOutputPlane[outRow * gOutputSize + outCol] *
- 67: _filterPlane[filterRow * gFilterSize + filterCol];
- 68: sumWeightTimesOutError += thisWeightTimesError;
- 69: }
- 70: }
- 71: }
- 72: }
- 73: const int upstreamImageGlobalOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
- 74: if (localId < gInputSizeSquared) {
- 75: gradInput[upstreamImageGlobalOffset + localId] = sumWeightTimesOutError;
- 76: }
- 77: }
- 78:
- 79:
- Something went wrong, code -55
- backward try kernel 3
- ... seems valid
- BackwardAuto: kernel 3 562ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 this instance cant be used:
- kernel source:
- 1: // Copyright Hugh Perkins 2014,2015 hughperkins at gmail
- 2: //
- 3: // This Source Code Form is subject to the terms of the Mozilla Public License,
- 4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
- 5: // obtain one at http://mozilla.org/MPL/2.0/.
- 6:
- 7: // expected defines:
- 8: // BIASED (or not)
- 9:
- 10: // including cl/copyLocal.cl:
- 11: // Copyright Hugh Perkins 2015 hughperkins at gmail
- 12: //
- 13: // This Source Code Form is subject to the terms of the Mozilla Public License,
- 14: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
- 15: // obtain one at http://mozilla.org/MPL/2.0/.
- 16:
- 17: void copyLocal(local float *target, global float const *source, int N) {
- 18: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
- 19: for (int loop = 0; loop < numLoops; loop++) {
- 20: int offset = loop * get_local_size(0) + get_local_id(0);
- 21: if (offset < N) {
- 22: target[offset] = source[offset];
- 23: }
- 24: }
- 25: }
- 26:
- 27: void copyGlobal(global float *target, local float const *source, int N) {
- 28: int numLoops = (N + get_local_size(0) - 1) / get_local_size(0);
- 29: for (int loop = 0; loop < numLoops; loop++) {
- 30: int offset = loop * get_local_size(0) + get_local_id(0);
- 31: if (offset < N) {
- 32: target[offset] = source[offset];
- 33: }
- 34: }
- 35: }
- 36:
- 37:
- 38: // including cl/ids.cl:
- 39: // Copyright Hugh Perkins 2015 hughperkins at gmail
- 40: //
- 41: // This Source Code Form is subject to the terms of the Mozilla Public License,
- 42: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
- 43: // obtain one at http://mozilla.org/MPL/2.0/.
- 44:
- 45: #define globalId (get_global_id(0))
- 46: #define localId (get_local_id(0) )
- 47: #define workgroupId (get_group_id(0))
- 48: #define workgroupSize (get_local_size(0))
- 49:
- 50:
- 51:
- 52:
- 53: // workgroupId: [outputPlane][inputPlane]
- 54: // localId: [filterRow][filterCol]
- 55: // per-thread iteration: [n][outputRow][outputCol]
- 56: // local: errorimage: outputSize * outputSize
- 57: // imageimage: inputSize * inputSize
- 58: void kernel backprop_floats_withscratch_dobias(
- 59: const float learningRateMultiplier, const int batchSize,
- 60: global const float *gradOutput, global const float *images,
- 61: global float *gradWeights,
- 62: #ifdef BIASED
- 63: global float *gradBiasWeights,
- 64: #endif
- 65: local float *_errorImage, local float *_imageImage
- 66: ) {
- 67: const int filterRow = localId / gFilterSize;
- 68: const int filterCol = localId % gFilterSize;
- 69:
- 70: #define outPlane (workgroupId / gInputPlanes)
- 71: #define upstreamPlane (workgroupId % gInputPlanes)
- 72:
- 73: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
- 74: // aggregate over: [outRow][outCol][n]
- 75: float thiswchange = 0;
- 76: #ifdef BIASED
- 77: float thisbiaschange = 0;
- 78: #endif
- 79: for (int n = 0; n < batchSize; n++) {
- 80: barrier(CLK_LOCAL_MEM_FENCE);
- 81: copyLocal(_imageImage, images + (n * gInputPlanes + upstreamPlane) * gInputSizeSquared, gInputSizeSquared);
- 82: copyLocal(_errorImage, gradOutput + (n * gNumFilters + outPlane) * gOutputSizeSquared, gOutputSizeSquared);
- 83: barrier(CLK_LOCAL_MEM_FENCE);
- 84: if (localId < gFilterSizeSquared) {
- 85: for (int outRow = 0; outRow < gOutputSize; outRow++) {
- 86: int upstreamRow = outRow - gMargin + filterRow;
- 87: for (int outCol = 0; outCol < gOutputSize; outCol++) {
- 88: const int upstreamCol = outCol - gMargin + filterCol;
- 89: #define proceed (upstreamRow >= 0 && upstreamCol >= 0 && upstreamRow < gInputSize && upstreamCol < gInputSize)
- 90: if (proceed) {
- 91: // these defines reduce register pressure, compared to const
- 92: // giving a 40% speedup on nvidia :-)
- 93: #define resultIndex (outRow * gOutputSize + outCol)
- 94: #define error (_errorImage[resultIndex])
- 95: //const float error = _errorImage[resultIndex];
- 96: #define upstreamDataIndex (upstreamRow * gInputSize + upstreamCol)
- 97: #define upstreamResult (_imageImage[upstreamDataIndex])
- 98: thiswchange += upstreamResult * error;
- 99: #ifdef BIASED
- 100: thisbiaschange += error;
- 101: #endif
- 102: }
- 103: }
- 104: }
- 105: }
- 106: }
- 107: if (localId < gFilterSizeSquared) {
- 108: gradWeights[ workgroupId * gFilterSizeSquared + localId ] = learningRateMultiplier * thiswchange;
- 109: }
- 110: #ifdef BIASED
- 111: #define writeBias (upstreamPlane == 0 && filterRow == gMargin && filterCol == gMargin)
- 112: if (writeBias) {
- 113: gradBiasWeights[outPlane] = learningRateMultiplier * thisbiaschange;
- 114: }
- 115: #endif
- 116: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
- 117: // aggregate over: [outRow][outCol][n]
- 118: }
- 119:
- 120:
- Something went wrong, code -55
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=10 -D gInputPlanes=10 -D gInputSize=24 -D gInputSizeSquared=576 -D gNumFilters=10 -D gFilterSize=24 -D gHalfFilterSize=12 -D gFilterSizeSquared=576 -D gNumOutputPlanes=10 -D gOutputPlanes=10 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=1 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=23 -DgInputStripeInnerNumRows=24 -DgInputStripeOuterNumRows=70 -DgInputStripeInnerSize=576 -DgInputStripeOuterSize=1680 -DgInputStripeMarginSize=552 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 this instance cant be used:
- kernel source:
- 1: // Copyright Hugh Perkins 2014,2015 hughperkins at gmail
- 2: //
- 3: // This Source Code Form is subject to the terms of the Mozilla Public License,
- 4: // v. 2.0. If a copy of the MPL was not distributed with this file, You can
- 5: // obtain one at http://mozilla.org/MPL/2.0/.
- 6:
- 7: // expected defines:
- 8: // BIASED (or not)
- 9:
- 10: // workgroupId: [outputPlane][inputPlane]
- 11: // localId: [filterRow][filterCol]
- 12: // per-thread iteration: [n][outputRow][outputCol]
- 13: // local: errorimage: outputSize * outputSize
- 14: // imageimage: inputSize * inputSize
- 15: // specific characteristic: load one stripe of each image at a time,
- 16: // so we dont run out of memory
- 17: // number of stripes set in: gNumStripes
- 18: // note that whilst we can stripe the gradOutput simply,
- 19: // we actually need to add a half-filter widthed additional few rows
- 20: // onto the images stripe, otherwise we will be missing data
- 21: // we will call the size of the non-overlapping image stripes: gInputStripeInnerSize
- 22: // the outersize, including the two margins is: gInputStripeOuterSize
- 23: // of course, the first and last stripes will be missing a bit off the top/bottom, where the
- 24: // corresponding outer margin would be
- 25: void kernel backprop_floats_withscratch_dobias_striped(
- 26: const float learningRateMultiplier, const int batchSize,
- 27: global const float *gradOutput, global const float *images,
- 28: global float *gradWeights,
- 29: #ifdef BIASED
- 30: global float *gradBiasWeights,
- 31: #endif
- 32: local float *_errorStripe, local float *_imageStripe
- 33: ) {
- 34: // gHalfFilterSize
- 35: // gInputSize
- 36: //
- 37: // gInputStripeMarginRows => basically equal to gHalfFilterSize
- 38: // gInputStripeInnerNumRows = gInputSize / gNumStripes
- 39: // gInputStripeOuterNumRows = gInputStripeInnerNumRows + 2 * gHalfFilterSize (note: one row less than
- 40: // if we just added gFilterSize)
- 41: // gInputStripeInnerSize = gInputStripeInnerNumRows * gInputSize
- 42: // gInputStripeOuterSize = gInputStripeOuterNumRows * gInputSize
- 43: // gInputStripeMarginSize = gInputStripeMarginRows * gInputSize
- 44: //
- 45: // gOutputStripeNumRows
- 46: // gOutputStripeSize
- 47:
- 48: const int globalId = get_global_id(0);
- 49: const int localId = get_local_id(0);
- 50: const int workgroupId = get_group_id(0);
- 51: const int workgroupSize = get_local_size(0);
- 52:
- 53: const int filterRow = localId / gFilterSize;
- 54: const int filterCol = localId % gFilterSize;
- 55:
- 56: const int outPlane = workgroupId / gInputPlanes;
- 57: const int upstreamPlane = workgroupId % gInputPlanes;
- 58:
- 59: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
- 60: // aggregate over: [outRow][outCol][n]
- 61: float thiswchange = 0;
- 62: #ifdef BIASED
- 63: float thisbiaschange = 0;
- 64: #endif
- 65: const int numLoopsForImageStripe = (gInputStripeOuterSize + workgroupSize - 1) / workgroupSize;
- 66: const int numLoopsForErrorStripe = (gOutputSizeSquared + workgroupSize - 1) / workgroupSize;
- 67: for (int n = 0; n < batchSize; n++) {
- 68: const int imageImageGlobalOffset = (n * gInputPlanes + upstreamPlane) * gInputSizeSquared;
- 69: const int imageImageGlobalOffsetAfter = imageImageGlobalOffset + gInputSizeSquared;
- 70: const int errorImageGlobalOffset = (n * gNumFilters + outPlane) * gOutputSizeSquared;
- 71: const int errorImageGlobalOffsetAfter = errorImageGlobalOffset + gOutputSizeSquared;
- 72: for (int stripe = 0; stripe < gNumStripes; stripe++) {
- 73: const int imageStripeInnerOffset = imageImageGlobalOffset + stripe * gInputStripeInnerSize;
- 74: const int imageStripeOuterOffset = imageStripeInnerOffset - gInputStripeMarginSize;
- 75: // need to fetch the image, but it's bigger than us, so will need to loop...
- 76: barrier(CLK_LOCAL_MEM_FENCE);
- 77: for (int i = 0; i < numLoopsForImageStripe; i++) {
- 78: int thisOffset = i * workgroupSize + localId;
- 79: int thisGlobalImagesOffset = imageStripeOuterOffset + thisOffset;
- 80: bool process = thisOffset < gInputStripeOuterSize
- 81: && thisGlobalImagesOffset >= imageImageGlobalOffset
- 82: && thisGlobalImagesOffset < imageImageGlobalOffsetAfter;
- 83: if (process) {
- 84: _imageStripe[thisOffset] = images[ thisGlobalImagesOffset ];
- 85: }
- 86: }
- 87: int errorStripeOffset = errorImageGlobalOffset + stripe * gOutputStripeSize;
- 88: for (int i = 0; i < numLoopsForErrorStripe; i++) {
- 89: int thisOffset = i * workgroupSize + localId;
- 90: int globalErrorsOffset = errorStripeOffset + thisOffset;
- 91: bool process = thisOffset < gOutputStripeSize
- 92: && globalErrorsOffset < errorImageGlobalOffsetAfter;
- 93: if (process) {
- 94: _errorStripe[thisOffset ] = gradOutput[globalErrorsOffset];
- 95: }
- 96: }
- 97: const int stripeOutRowStart = stripe * gOutputStripeNumRows;
- 98: const int stripeOutRowEndExcl = stripeOutRowStart + gOutputStripeNumRows;
- 99: barrier(CLK_LOCAL_MEM_FENCE);
- 100: // if (localId == 13) {
- 101: // for (int i = 0; i < 12; i++) {
- 102: // gradWeights[100 + stripe * 12 + i ] = _errorStripe[i * gOutputSize];
- 103: // }
- 104: // for (int i = 0; i < 20; i++) {
- 105: // gradWeights[200 + stripe * 20 + i ] = _imageStripe[i * gInputSize];
- 106: // }
- 107: // }
- 108: if (localId < gFilterSizeSquared) {
- 109: for (int outRow = stripeOutRowStart; outRow < stripeOutRowEndExcl; outRow++) {
- 110: int upstreamRow = outRow - gMargin + filterRow;
- 111: for (int outCol = 0; outCol < gOutputSize; outCol++) {
- 112: int upstreamCol = outCol - gMargin + filterCol;
- 113: bool proceed =
- 114: upstreamRow >= 0 && upstreamCol >= 0
- 115: && upstreamRow < gInputSize && upstreamCol < gInputSize
- 116: && outRow < gOutputSize;
- 117: if (proceed) {
- 118: int resultIndex = outRow * gOutputSize + outCol;
- 119: float error = _errorStripe[resultIndex - stripe * gOutputStripeSize];
- 120: int upstreamDataIndex = upstreamRow * gInputSize + upstreamCol;
- 121: float upstreamResult = _imageStripe[upstreamDataIndex + gInputStripeMarginSize
- 122: - stripe * gInputStripeInnerSize ];
- 123: thiswchange += upstreamResult * error;
- 124: #ifdef BIASED
- 125: thisbiaschange += error;
- 126: #endif
- 127: }
- 128: }
- 129: }
- 130: }
- 131: }
- 132: }
- 133: if (localId < gFilterSizeSquared) {
- 134: gradWeights[ workgroupId * gFilterSizeSquared + localId ] = learningRateMultiplier * thiswchange;
- 135: // weightChanges[ workgroupId * gFilterSizeSquared + localId ] = workgroupId;
- 136: }
- 137: #ifdef BIASED
- 138: bool writeBias = upstreamPlane == 0 && filterRow == gMargin && filterCol == gMargin;
- 139: if (writeBias) {
- 140: gradBiasWeights[outPlane] = learningRateMultiplier * thisbiaschange;
- 141: }
- 142: #endif
- 143: // gradWeights: [outPlane][upstreamPlane][filterRow][filterCol]
- 144: // aggregate over: [outRow][outCol][n]
- 145: }
- 146:
- 147:
- Something went wrong, code -55
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 1014ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- forward kernel 0: cannot be used
- forward kernel 1 time: 15ms
- forward kernel 2: cannot be used
- forward kernel 3: cannot be used
- forward kernel 4 time: 0ms
- forward kernel 5: cannot be used
- forward kernel 6: cannot be used
- forward kernel 7 time: 562ms
- forward layer selected kernel 4
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCLB324.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 0ms
- backward kernel 0: cannot be used
- backward kernel 1 time: 0ms
- backward kernel 2: cannot be used
- backward kernel 3 time: 562ms
- backward layer selected kernel 1
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2: cannot be used
- calcGradWeights kernel 3: cannot be used
- calcGradWeights kernel 4 time: 1014ms
- calcGradWeights layer selected kernel 1
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=28 -D gInputSizeSquared=784 -D gNumFilters=10 -D gFilterSize=5 -D gHalfFilterSize=2 -D gFilterSizeSquared=25 -D gNumOutputPlanes=10 -D gOutputPlanes=10 -D gOutputSize=24 -D gOutputSizeSquared=576 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=4 -DgInputStripeInnerNumRows=28 -DgInputStripeOuterNumRows=36 -DgInputStripeInnerSize=784 -DgInputStripeOuterSize=1008 -DgInputStripeMarginSize=112 -DgOutputStripeNumRows=24 -DgOutputStripeSize=576
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 499ms
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5 time: 0ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 499ms
- forward layer selected kernel 1
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 702ms
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 702ms
- calcGradWeights layer selected kernel 1
- batch time 12714 ms
- dump enabled=0
- clblas teardown
- [ OK ] testsinglebatch.imagesize28_filtersize5 (13151 ms)
- [ RUN ] testsinglebatch.imagesize5_filtersize3_batchsize2_softmax
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- layer 0:InputLayer{ outputPlanes=1 outputSize=5 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=5 numFilters=5 filterSize=3 outputSize=5 padZeros=1 biased=1 skip=0} }
- layer 2:ActivationLayer{ RELU }
- layer 3:ConvolutionalLayer{ LayerDimensions{ inputPlanes=5 inputSize=5 numFilters=5 filterSize=3 outputSize=5 padZeros=1 biased=1 skip=0} }
- layer 4:ActivationLayer{ RELU }
- layer 5:FullyConnectedLayer{ numPlanes=5 imageSize=1 }
- layer 6:SoftMaxLayer{ perPlane=0 numPlanes=5 imageSize=1 }
- Parameters overview: (skipping 4 layers with 0 params)
- layer 1: params=50 5.5%
- layer 3: params=230 25.3%
- layer 5: params=630 69.2%
- TOTAL : params=910
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- backward try kernel 0
- ... not plausibly optimal, skipping
- backward try kernel 1
- ... seems valid
- BackwardAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- backward try kernel 0
- ... not plausibly optimal, skipping
- backward try kernel 1
- ... seems valid
- BackwardAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 16ms
- layer 1 offset: 0
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- layer 1
- from w: 0
- actual: -3.3898
- layer 2 offset: 50
- layer 3 offset: 50
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 16ms
- layer 3
- from w: 0
- actual: -3.3898
- layer 4 offset: 280
- layer 5 offset: 280
- forward try kernel 5
- ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
- ... not valid
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 687ms
- forward try kernel 5
- ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
- ... not valid
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 109ms
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCLCC8F.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 0ms
- layer 5
- from w: 0
- actual: -3.38981
- layer 6 offset: 910
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5: cannot be used
- forward kernel 6: cannot be used
- forward kernel 7 time: 687ms
- forward layer selected kernel 1
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5: cannot be used
- forward kernel 6: cannot be used
- forward kernel 7 time: 109ms
- forward layer selected kernel 1
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 328ms
- full thisloss: 3.3898
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 16ms
- forward kernel 5 time: 0ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 328ms
- forward layer selected kernel 1
- backward try kernel 2
- ... seems valid
- BackwardAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- backward try kernel 2
- ... seems valid
- BackwardAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 3
- from w: 0
- actual: 0
- layer 4 offset: 280
- layer 5 offset: 280
- layer 5
- from w: 0
- actual: 0
- layer 6 offset: 910
- full thisloss: 3.3898
- backward try kernel 3
- ... seems valid
- BackwardAuto: kernel 3 436ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=5 -D gInputPlanes=5 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=5 -D gFilterSize=5 -D gHalfFilterSize=2 -D gFilterSizeSquared=25 -D gNumOutputPlanes=5 -D gOutputPlanes=5 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=4 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=13 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=65 -DgInputStripeMarginSize=20 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- backward try kernel 3
- ... seems valid
- BackwardAuto: kernel 3 1108ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=5 -D gInputPlanes=5 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=5 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=5 -D gOutputPlanes=5 -D gOutputSize=5 -D gOutputSizeSquared=25 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=9 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=45 -DgInputStripeMarginSize=10 -DgOutputStripeNumRows=5 -DgOutputStripeSize=25
- ... seems valid
- BackpropWeightsAuto: kernel 3 15ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=5 -D gInputSizeSquared=25 -D gNumFilters=5 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=5 -D gOutputPlanes=5 -D gOutputSize=5 -D gOutputSizeSquared=25 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=5 -DgInputStripeOuterNumRows=9 -DgInputStripeInnerSize=25 -DgInputStripeOuterSize=45 -DgInputStripeMarginSize=10 -DgOutputStripeNumRows=5 -DgOutputStripeSize=25
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 3
- from w: 0
- actual: 0
- layer 4 offset: 280
- layer 5 offset: 280
- layer 5
- from w: 0
- actual: 0
- layer 6 offset: 910
- full thisloss: 3.3898
- backward kernel 0: cannot be used
- backward kernel 1 time: 0ms
- backward kernel 2 time: 0ms
- backward kernel 3 time: 436ms
- backward layer selected kernel 1
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 858ms
- backward kernel 0: cannot be used
- backward kernel 1 time: 0ms
- backward kernel 2 time: 0ms
- backward kernel 3 time: 1108ms
- backward layer selected kernel 1
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 1076ms
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 109ms
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 3
- from w: 0
- actual: 0
- layer 4 offset: 280
- layer 5 offset: 280
- layer 5
- from w: 0
- actual: 0
- layer 6 offset: 910
- full thisloss: 3.3898
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 858ms
- calcGradWeights layer selected kernel 1
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 15ms
- calcGradWeights kernel 4 time: 1076ms
- calcGradWeights layer selected kernel 1
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 16ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 109ms
- calcGradWeights layer selected kernel 2
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 3
- from w: 0
- actual: 0
- layer 4 offset: 280
- layer 5 offset: 280
- layer 5
- from w: 0
- actual: 0
- layer 6 offset: 910
- full thisloss: 3.3898
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 3
- from w: 0
- actual: 0
- layer 4 offset: 280
- layer 5 offset: 280
- layer 5
- from w: 0
- actual: 0
- layer 6 offset: 910
- full thisloss: 3.3898
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 3
- from w: 0
- actual: 0
- layer 4 offset: 280
- layer 5 offset: 280
- layer 5
- from w: 0
- actual: 0
- layer 6 offset: 910
- full thisloss: 3.3898
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 3
- from w: 0
- actual: 0
- layer 4 offset: 280
- layer 5 offset: 280
- layer 5
- from w: 0
- actual: 0
- layer 6 offset: 910
- full thisloss: 3.3898
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 3
- from w: 0
- actual: 0
- layer 4 offset: 280
- layer 5 offset: 280
- layer 5
- from w: 0
- actual: 0
- layer 6 offset: 910
- full thisloss: 3.3898
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 3
- from w: 0
- actual: 0
- layer 4 offset: 280
- layer 5 offset: 280
- layer 5
- from w: 0
- actual: 0
- layer 6 offset: 910
- full thisloss: 3.3898
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 3
- from w: 0
- actual: 0
- layer 4 offset: 280
- layer 5 offset: 280
- layer 5
- from w: 0
- actual: 0
- layer 6 offset: 910
- full thisloss: 3.3898
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 3
- from w: 0
- actual: 0
- layer 4 offset: 280
- layer 5 offset: 280
- layer 5
- from w: 0
- actual: 0
- layer 6 offset: 910
- full thisloss: 3.3898
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 3
- from w: 0
- actual: 0
- layer 4 offset: 280
- layer 5 offset: 280
- layer 5
- from w: 0
- actual: 0
- layer 6 offset: 910
- full thisloss: 3.3898
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 3
- from w: 0
- actual: 0
- layer 4 offset: 280
- layer 5 offset: 280
- layer 5
- from w: 0
- actual: 0
- layer 6 offset: 910
- full thisloss: 3.3898
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 3
- from w: 0
- actual: 0
- layer 4 offset: 280
- layer 5 offset: 280
- layer 5
- from w: 0
- actual: 0
- layer 6 offset: 910
- full thisloss: 3.3898
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 3
- from w: 0
- actual: 0
- layer 4 offset: 280
- layer 5 offset: 280
- layer 5
- from w: 0
- actual: 0
- layer 6 offset: 910
- full thisloss: 3.3898
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 3
- from w: 0
- actual: 0
- layer 4 offset: 280
- layer 5 offset: 280
- layer 5
- from w: 0
- actual: 0
- layer 6 offset: 910
- full thisloss: 3.3898
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 3
- from w: 0
- actual: 0
- layer 4 offset: 280
- layer 5 offset: 280
- layer 5
- from w: 0
- actual: 0
- layer 6 offset: 910
- full thisloss: 3.3898
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 3
- from w: 0
- actual: 0
- layer 4 offset: 280
- layer 5 offset: 280
- layer 5
- from w: 0
- actual: 0
- layer 6 offset: 910
- full thisloss: 3.3898
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 3
- from w: 0
- actual: 0
- layer 4 offset: 280
- layer 5 offset: 280
- layer 5
- from w: 0
- actual: 0
- layer 6 offset: 910
- full thisloss: 3.3898
- batch time 8252 ms
- dump enabled=0
- clblas teardown
- [ OK ] testsinglebatch.imagesize5_filtersize3_batchsize2_softmax (8705 ms)
- [ RUN ] testsinglebatch.imagesize4_filtersize3_batchsize2_pooling
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- initializing clblas
- layer 0:InputLayer{ outputPlanes=1 outputSize=12 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=12 numFilters=5 filterSize=3 outputSize=12 padZeros=1 biased=1 skip=0} }
- layer 2:ActivationLayer{ RELU }
- layer 3:PoolingLayer{ inputPlanes=5 inputSize=12 poolingSize=2 }
- layer 4:ConvolutionalLayer{ LayerDimensions{ inputPlanes=5 inputSize=6 numFilters=5 filterSize=3 outputSize=6 padZeros=1 biased=1 skip=0} }
- layer 5:ActivationLayer{ RELU }
- layer 6:PoolingLayer{ inputPlanes=5 inputSize=6 poolingSize=2 }
- layer 7:FullyConnectedLayer{ numPlanes=5 imageSize=1 }
- layer 8:SoftMaxLayer{ perPlane=0 numPlanes=5 imageSize=1 }
- Parameters overview: (skipping 6 layers with 0 params)
- layer 1: params=50 9.8%
- layer 4: params=230 45.1%
- layer 7: params=230 45.1%
- TOTAL : params=510
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 16ms
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- forward try kernel 2
- ... seems valid
- ForwardAuto: kernel 2 0ms
- backward try kernel 0
- ... not plausibly optimal, skipping
- backward try kernel 1
- ... seems valid
- BackwardAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- backward try kernel 0
- ... not plausibly optimal, skipping
- backward try kernel 1
- ... seems valid
- BackwardAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- layer 1 offset: 0
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- forward try kernel 3
- ... seems valid
- ForwardAuto: kernel 3 0ms
- layer 1
- from w: 0
- actual: -3.3063
- layer 2 offset: 50
- layer 3 offset: 50
- layer 4 offset: 50
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- forward try kernel 4
- ... seems valid
- ForwardAuto: kernel 4 0ms
- layer 4
- from w: 0
- actual: -3.3063
- layer 5 offset: 280
- layer 6 offset: 280
- layer 7 offset: 280
- forward try kernel 5
- ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
- ... not valid
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 561ms
- forward try kernel 5
- ForwardAuto: kernel 5: this instance cant be used: For ForwardFc, filtersize and inputimagesize must be identical
- ... not valid
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 562ms
- forward try kernel 5
- cl/forward_fc_wgperrow.cl build log:
- "C:\Users\pz\AppData\Local\Temp\OCLF1F7.tmp.cl", line 75: warning: variable
- "loopsPerExample" was declared but never referenced
- const int loopsPerExample = (gInputSize + workgroupSize - 1) / workgroupSize;
- ^
- ... seems valid
- ForwardAuto: kernel 5 0ms
- layer 7
- from w: 0
- actual: -3.3063
- layer 8 offset: 510
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5: cannot be used
- forward kernel 6: cannot be used
- forward kernel 7 time: 561ms
- forward layer selected kernel 1
- forward kernel 0: cannot be used
- forward kernel 1 time: 16ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5: cannot be used
- forward kernel 6: cannot be used
- forward kernel 7 time: 562ms
- forward layer selected kernel 2
- forward try kernel 6
- ... seems valid
- ForwardAuto: kernel 6 this instance cant be used: memallocsize too small to use this kernel on this device. Need: 0MB, but only have: -1984MB max alloc size
- forward try kernel 7
- ... seems valid
- ForwardAuto: kernel 7 328ms
- full thisloss: 3.3063
- forward kernel 0: cannot be used
- forward kernel 1 time: 0ms
- forward kernel 2 time: 0ms
- forward kernel 3 time: 0ms
- forward kernel 4 time: 0ms
- forward kernel 5 time: 0ms
- forward kernel 6: cannot be used
- forward kernel 7 time: 328ms
- forward layer selected kernel 1
- backward try kernel 2
- ... seems valid
- BackwardAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- backward try kernel 2
- ... seems valid
- BackwardAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- calcGradWeights try kernel 2
- ... seems valid
- BackpropWeightsAuto: kernel 2 0ms
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 4 offset: 50
- layer 4
- from w: 0
- actual: 0
- layer 5 offset: 280
- layer 6 offset: 280
- layer 7 offset: 280
- layer 7
- from w: 0
- actual: 0
- layer 8 offset: 510
- full thisloss: 3.3063
- backward try kernel 3
- ... seems valid
- BackwardAuto: kernel 3 421ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=5 -D gInputPlanes=5 -D gInputSize=3 -D gInputSizeSquared=9 -D gNumFilters=5 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=5 -D gOutputPlanes=5 -D gOutputSize=1 -D gOutputSizeSquared=1 -D gPadZeros=0 -D gMargin=0 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=3 -DgInputStripeOuterNumRows=7 -DgInputStripeInnerSize=9 -DgInputStripeOuterSize=21 -DgInputStripeMarginSize=6 -DgOutputStripeNumRows=1 -DgOutputStripeSize=1
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- backward try kernel 3
- ... seems valid
- BackwardAuto: kernel 3 920ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=5 -D gInputPlanes=5 -D gInputSize=6 -D gInputSizeSquared=36 -D gNumFilters=5 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=5 -D gOutputPlanes=5 -D gOutputSize=6 -D gOutputSizeSquared=36 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=6 -DgInputStripeOuterNumRows=10 -DgInputStripeInnerSize=36 -DgInputStripeOuterSize=60 -DgInputStripeMarginSize=12 -DgOutputStripeNumRows=6 -DgOutputStripeSize=36
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- calcGradWeights try kernel 3
- options: -D BIASED -D gNumInputPlanes=1 -D gInputPlanes=1 -D gInputSize=12 -D gInputSizeSquared=144 -D gNumFilters=5 -D gFilterSize=3 -D gHalfFilterSize=1 -D gFilterSizeSquared=9 -D gNumOutputPlanes=5 -D gOutputPlanes=5 -D gOutputSize=12 -D gOutputSizeSquared=144 -D gPadZeros=1 -D gMargin=1 -D gEven=0 -D gSkip=0 -DgNumStripes=1 -DgInputStripeMarginRows=2 -DgInputStripeInnerNumRows=12 -DgInputStripeOuterNumRows=16 -DgInputStripeInnerSize=144 -DgInputStripeOuterSize=192 -DgInputStripeMarginSize=24 -DgOutputStripeNumRows=12 -DgOutputStripeSize=144
- ... seems valid
- BackpropWeightsAuto: kernel 3 0ms
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 4 offset: 50
- layer 4
- from w: 0
- actual: 0
- layer 5 offset: 280
- layer 6 offset: 280
- layer 7 offset: 280
- layer 7
- from w: 0
- actual: 0
- layer 8 offset: 510
- full thisloss: 3.3063
- backward kernel 0: cannot be used
- backward kernel 1 time: 0ms
- backward kernel 2 time: 0ms
- backward kernel 3 time: 421ms
- backward layer selected kernel 1
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 842ms
- backward kernel 0: cannot be used
- backward kernel 1 time: 0ms
- backward kernel 2 time: 0ms
- backward kernel 3 time: 920ms
- backward layer selected kernel 1
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 889ms
- calcGradWeights try kernel 4
- ... seems valid
- BackpropWeightsAuto: kernel 4 546ms
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 4 offset: 50
- layer 4
- from w: 0
- actual: 0
- layer 5 offset: 280
- layer 6 offset: 280
- layer 7 offset: 280
- layer 7
- from w: 0
- actual: 0
- layer 8 offset: 510
- full thisloss: 3.3063
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 842ms
- calcGradWeights layer selected kernel 1
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 889ms
- calcGradWeights layer selected kernel 1
- calcGradWeights kernel 0: cannot be used
- calcGradWeights kernel 1 time: 0ms
- calcGradWeights kernel 2 time: 0ms
- calcGradWeights kernel 3 time: 0ms
- calcGradWeights kernel 4 time: 546ms
- calcGradWeights layer selected kernel 1
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 4 offset: 50
- layer 4
- from w: 0
- actual: 0
- layer 5 offset: 280
- layer 6 offset: 280
- layer 7 offset: 280
- layer 7
- from w: 0
- actual: 0
- layer 8 offset: 510
- full thisloss: 3.3063
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 4 offset: 50
- layer 4
- from w: 0
- actual: 0
- layer 5 offset: 280
- layer 6 offset: 280
- layer 7 offset: 280
- layer 7
- from w: 0
- actual: 0
- layer 8 offset: 510
- full thisloss: 3.3063
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 4 offset: 50
- layer 4
- from w: 0
- actual: 0
- layer 5 offset: 280
- layer 6 offset: 280
- layer 7 offset: 280
- layer 7
- from w: 0
- actual: 0
- layer 8 offset: 510
- full thisloss: 3.3063
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 4 offset: 50
- layer 4
- from w: 0
- actual: 0
- layer 5 offset: 280
- layer 6 offset: 280
- layer 7 offset: 280
- layer 7
- from w: 0
- actual: 0
- layer 8 offset: 510
- full thisloss: 3.3063
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 4 offset: 50
- layer 4
- from w: 0
- actual: 0
- layer 5 offset: 280
- layer 6 offset: 280
- layer 7 offset: 280
- layer 7
- from w: 0
- actual: 0
- layer 8 offset: 510
- full thisloss: 3.3063
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 4 offset: 50
- layer 4
- from w: 0
- actual: 0
- layer 5 offset: 280
- layer 6 offset: 280
- layer 7 offset: 280
- layer 7
- from w: 0
- actual: 0
- layer 8 offset: 510
- full thisloss: 3.3063
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 4 offset: 50
- layer 4
- from w: 0
- actual: 0
- layer 5 offset: 280
- layer 6 offset: 280
- layer 7 offset: 280
- layer 7
- from w: 0
- actual: 0
- layer 8 offset: 510
- full thisloss: 3.3063
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 4 offset: 50
- layer 4
- from w: 0
- actual: 0
- layer 5 offset: 280
- layer 6 offset: 280
- layer 7 offset: 280
- layer 7
- from w: 0
- actual: 0
- layer 8 offset: 510
- full thisloss: 3.3063
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 4 offset: 50
- layer 4
- from w: 0
- actual: 0
- layer 5 offset: 280
- layer 6 offset: 280
- layer 7 offset: 280
- layer 7
- from w: 0
- actual: 0
- layer 8 offset: 510
- full thisloss: 3.3063
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 4 offset: 50
- layer 4
- from w: 0
- actual: 0
- layer 5 offset: 280
- layer 6 offset: 280
- layer 7 offset: 280
- layer 7
- from w: 0
- actual: 0
- layer 8 offset: 510
- full thisloss: 3.3063
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 4 offset: 50
- layer 4
- from w: 0
- actual: 0
- layer 5 offset: 280
- layer 6 offset: 280
- layer 7 offset: 280
- layer 7
- from w: 0
- actual: 0
- layer 8 offset: 510
- full thisloss: 3.3063
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 4 offset: 50
- layer 4
- from w: 0
- actual: 0
- layer 5 offset: 280
- layer 6 offset: 280
- layer 7 offset: 280
- layer 7
- from w: 0
- actual: 0
- layer 8 offset: 510
- full thisloss: 3.3063
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 4 offset: 50
- layer 4
- from w: 0
- actual: 0
- layer 5 offset: 280
- layer 6 offset: 280
- layer 7 offset: 280
- layer 7
- from w: 0
- actual: 0
- layer 8 offset: 510
- full thisloss: 3.3063
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 4 offset: 50
- layer 4
- from w: 0
- actual: 0
- layer 5 offset: 280
- layer 6 offset: 280
- layer 7 offset: 280
- layer 7
- from w: 0
- actual: 0
- layer 8 offset: 510
- full thisloss: 3.3063
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 4 offset: 50
- layer 4
- from w: 0
- actual: 0
- layer 5 offset: 280
- layer 6 offset: 280
- layer 7 offset: 280
- layer 7
- from w: 0
- actual: 0
- layer 8 offset: 510
- full thisloss: 3.3063
- layer 1 offset: 0
- layer 1
- from w: 0
- actual: 0
- layer 2 offset: 50
- layer 3 offset: 50
- layer 4 offset: 50
- layer 4
- from w: 0
- actual: 0
- layer 5 offset: 280
- layer 6 offset: 280
- layer 7 offset: 280
- layer 7
- from w: 0
- actual: 0
- layer 8 offset: 510
- full thisloss: 3.3063
- batch time 8954 ms
- dump enabled=0
- clblas teardown
- [ OK ] testsinglebatch.imagesize4_filtersize3_batchsize2_pooling (9734 ms)
- [----------] 6 tests from testsinglebatch (59763 ms total)
- [----------] 9 tests from testpoolingforward
- [ RUN ] testpoolingforward.basic
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testpoolingforward.basic (94 ms)
- [ RUN ] testpoolingforward.basic_2plane_batchsize2
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testpoolingforward.basic_2plane_batchsize2 (78 ms)
- [ RUN ] testpoolingforward.fromwrappers
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testpoolingforward.fromwrappers (93 ms)
- [ RUN ] testpoolingforward.comparespecific_0_1_pooling2
- instance0: 0
- instance1: 1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testpoolingforward.comparespecific_0_1_pooling2 (94 ms)
- [ RUN ] testpoolingforward.comparespecific_0_1_pooling3
- instance0: 0
- instance1: 1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testpoolingforward.comparespecific_0_1_pooling3 (94 ms)
- [ RUN ] testpoolingforward.comparespecific_0_1_pooling2_pz
- instance0: 0
- instance1: 1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testpoolingforward.comparespecific_0_1_pooling2_pz (93 ms)
- [ RUN ] testpoolingforward.comparespecific_0_1_pooling3_pz
- instance0: 0
- instance1: 1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testpoolingforward.comparespecific_0_1_pooling3_pz (109 ms)
- [ RUN ] testpoolingforward.comparespecific_0_1_pooling3_small
- instance0: 0
- instance1: 1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testpoolingforward.comparespecific_0_1_pooling3_small (78 ms)
- [ RUN ] testpoolingforward.comparespecific_0_1_pooling3_small2
- instance0: 0
- instance1: 1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testpoolingforward.comparespecific_0_1_pooling3_small2 (78 ms)
- [----------] 9 tests from testpoolingforward (811 ms total)
- [----------] 2 tests from testpoolingbackward
- [ RUN ] testpoolingbackward.basic
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testpoolingbackward.basic (16 ms)
- [ RUN ] testpoolingbackward.basic_2plane_batchsize2
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testpoolingbackward.basic_2plane_batchsize2 (16 ms)
- [----------] 2 tests from testpoolingbackward (32 ms total)
- [----------] 1 test from testNorbLoader
- [ RUN ] testNorbLoader.load1000
- unknown file: error: C++ exception with description "failed to open file: ..\data\norb\training-shuffled-dat.mat" thrown in the test body.
- [ FAILED ] testNorbLoader.load1000 (0 ms)
- [----------] 1 test from testNorbLoader (0 ms total)
- [----------] 7 tests from teststringhelper
- [ RUN ] teststringhelper.split
- [ OK ] teststringhelper.split (0 ms)
- [ RUN ] teststringhelper.split2
- [ OK ] teststringhelper.split2 (0 ms)
- [ RUN ] teststringhelper.split3
- [ OK ] teststringhelper.split3 (0 ms)
- [ RUN ] teststringhelper.tolower
- [ OK ] teststringhelper.tolower (0 ms)
- [ RUN ] teststringhelper.replace
- [ OK ] teststringhelper.replace (0 ms)
- [ RUN ] teststringhelper.replaceglobal
- [ OK ] teststringhelper.replaceglobal (0 ms)
- [ RUN ] teststringhelper.strcpy_safe
- [ OK ] teststringhelper.strcpy_safe (0 ms)
- [----------] 7 tests from teststringhelper (0 ms total)
- [----------] 1 test from testGtestGlobals
- [ RUN ] testGtestGlobals.basic
- There are 1 parameters:
- argv[0]=deepcl_unittests.exe
- [ OK ] testGtestGlobals.basic (0 ms)
- [----------] 1 test from testGtestGlobals (0 ms total)
- [----------] 1 test from testMemset
- [ RUN ] testMemset.basic
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- myArray[0]=99
- myArray[1]=99
- myArray[2]=99
- myArray[3]=99
- myArray[4]=99
- myArray[5]=99
- myArray[6]=99
- myArray[7]=99
- myArray[8]=99
- myArray[9]=99
- [ OK ] testMemset.basic (78 ms)
- [----------] 1 test from testMemset (78 ms total)
- [----------] 2 tests from testCopyBuffer
- [ RUN ] testCopyBuffer.floats
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- [ OK ] testCopyBuffer.floats (187 ms)
- [ RUN ] testCopyBuffer.nits
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- [ OK ] testCopyBuffer.nits (171 ms)
- [----------] 2 tests from testCopyBuffer (358 ms total)
- [----------] 2 tests from testCopyBlock
- [ RUN ] testCopyBlock.testPos
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- in[0]=3076
- in[1]=8
- in[2]=14
- res[0]=3
- res[1]=4
- res[2]=8206
- res[3]=8
- res[4]=14
- [ OK ] testCopyBlock.testPos (110 ms)
- [ RUN ] testCopyBlock.basic
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- 2 3 4
- 6 7 8
- 0 0 0 0
- 5 6 7
- 9 10 11
- 0 0 0 0
- [ OK ] testCopyBlock.basic (93 ms)
- [----------] 2 tests from testCopyBlock (203 ms total)
- [----------] 1 test from testCopyLocal
- [ RUN ] testCopyLocal.basic
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- 0 0 0 0
- 1 2 3 4
- 5 6 7 8
- 9 10 11 12
- 0 0 0 0
- [ OK ] testCopyLocal.basic (78 ms)
- [----------] 1 test from testCopyLocal (78 ms total)
- [----------] 8 tests from testNetdefToNet
- [ RUN ] testNetdefToNet.empty
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testNetdefToNet.empty (16 ms)
- [ RUN ] testNetdefToNet.onefc
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testNetdefToNet.onefc (171 ms)
- [ RUN ] testNetdefToNet.onefclinear
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testNetdefToNet.onefclinear (156 ms)
- [ RUN ] testNetdefToNet.150n_10n
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testNetdefToNet.150n_10n (172 ms)
- [ RUN ] testNetdefToNet.3xfclinear
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- nnString: [3]
- repeatNum 3
- remainderString [150n]
- inner [150n]
- multiplied string: 150n-150n-150n
- layer 0:InputLayer{ outputPlanes=1 outputSize=19 }
- layer 1:FullyConnectedLayer{ numPlanes=150 imageSize=1 }
- layer 2:FullyConnectedLayer{ numPlanes=150 imageSize=1 }
- layer 3:FullyConnectedLayer{ numPlanes=150 imageSize=1 }
- layer 4:SoftMaxLayer{ perPlane=0 numPlanes=150 imageSize=1 }
- Parameters overview: (skipping 2 layers with 0 params)
- layer 1: params=54300 54.5%
- layer 2: params=22650 22.7%
- layer 3: params=22650 22.7%
- TOTAL : params=99600
- [ OK ] testNetdefToNet.3xfclinear (156 ms)
- [ RUN ] testNetdefToNet.mp2_3x32c5z_10n
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- prefix: [mp2]
- nnString: [3]
- repeatNum 3
- remainderString [32c5z-10n ]
- postfix [10n ]
- inner [32c5z]
- multiplied string: mp2-32c5z-32c5z-32c5z-10n
- layer 0:InputLayer{ outputPlanes=1 outputSize=19 }
- layer 1:PoolingLayer{ inputPlanes=1 inputSize=19 poolingSize=2 }
- layer 2:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=9 numFilters=32 filterSize=5 outputSize=9 padZeros=1 biased=1 skip=0} }
- layer 3:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputSize=9 numFilters=32 filterSize=5 outputSize=9 padZeros=1 biased=1 skip=0} }
- layer 4:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputSize=9 numFilters=32 filterSize=5 outputSize=9 padZeros=1 biased=1 skip=0} }
- layer 5:FullyConnectedLayer{ numPlanes=10 imageSize=1 }
- layer 6:SoftMaxLayer{ perPlane=0 numPlanes=10 imageSize=1 }
- Parameters overview: (skipping 3 layers with 0 params)
- layer 2: params=832 1.1%
- layer 3: params=25632 32.9%
- layer 4: params=25632 32.9%
- layer 5: params=25930 33.2%
- TOTAL : params=78026
- [ OK ] testNetdefToNet.mp2_3x32c5z_10n (343 ms)
- [ RUN ] testNetdefToNet.3x32c5zmp2
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- nnString: [3]
- repeatNum 3
- remainderString [(32c5z-mp2)-10n]
- inner [32c5z-mp2]
- newRemainder [-10n]
- postfix [10n]
- multiplied string: 32c5z-mp2-32c5z-mp2-32c5z-mp2-10n
- layer 0:InputLayer{ outputPlanes=1 outputSize=128 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=128 numFilters=32 filterSize=5 outputSize=128 padZeros=1 biased=1 skip=0} }
- layer 2:PoolingLayer{ inputPlanes=32 inputSize=128 poolingSize=2 }
- layer 3:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputSize=64 numFilters=32 filterSize=5 outputSize=64 padZeros=1 biased=1 skip=0} }
- layer 4:PoolingLayer{ inputPlanes=32 inputSize=64 poolingSize=2 }
- layer 5:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputSize=32 numFilters=32 filterSize=5 outputSize=32 padZeros=1 biased=1 skip=0} }
- layer 6:PoolingLayer{ inputPlanes=32 inputSize=32 poolingSize=2 }
- layer 7:FullyConnectedLayer{ numPlanes=10 imageSize=1 }
- layer 8:SoftMaxLayer{ perPlane=0 numPlanes=10 imageSize=1 }
- Parameters overview: (skipping 5 layers with 0 params)
- layer 1: params=832 0.6%
- layer 3: params=25632 19.1%
- layer 5: params=25632 19.1%
- layer 7: params=81930 61.1%
- TOTAL : params=134026
- [ OK ] testNetdefToNet.3x32c5zmp2 (702 ms)
- [ RUN ] testNetdefToNet.2x32c7_3x32c5z
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- nnString: [2]
- repeatNum 2
- remainderString [32c7z-3*32c5z-10n]
- postfix [3*32c5z-10n]
- inner [32c7z]
- nnString: [3]
- repeatNum 3
- remainderString [32c5z-10n]
- postfix [10n]
- inner [32c5z]
- multiplied string: 32c5z-32c5z-32c5z-10n
- multiplied string: 32c7z-32c7z-32c5z-32c5z-32c5z-10n
- layer 0:InputLayer{ outputPlanes=1 outputSize=19 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=19 numFilters=32 filterSize=7 outputSize=19 padZeros=1 biased=1 skip=0} }
- layer 2:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputSize=19 numFilters=32 filterSize=7 outputSize=19 padZeros=1 biased=1 skip=0} }
- layer 3:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputSize=19 numFilters=32 filterSize=5 outputSize=19 padZeros=1 biased=1 skip=0} }
- layer 4:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputSize=19 numFilters=32 filterSize=5 outputSize=19 padZeros=1 biased=1 skip=0} }
- layer 5:ConvolutionalLayer{ LayerDimensions{ inputPlanes=32 inputSize=19 numFilters=32 filterSize=5 outputSize=19 padZeros=1 biased=1 skip=0} }
- layer 6:FullyConnectedLayer{ numPlanes=10 imageSize=1 }
- layer 7:SoftMaxLayer{ perPlane=0 numPlanes=10 imageSize=1 }
- Parameters overview: (skipping 2 layers with 0 params)
- layer 1: params=1600 0.7%
- layer 2: params=50208 20.6%
- layer 3: params=25632 10.5%
- layer 4: params=25632 10.5%
- layer 5: params=25632 10.5%
- layer 6: params=115530 47.3%
- TOTAL : params=244234
- [ OK ] testNetdefToNet.2x32c7_3x32c5z (172 ms)
- [----------] 8 tests from testNetdefToNet (1888 ms total)
- [----------] 10 tests from testactivationforward
- [ RUN ] testactivationforward.basic
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testactivationforward.basic (15 ms)
- [ RUN ] testactivationforward.basic_2plane_batchsize2
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testactivationforward.basic_2plane_batchsize2 (16 ms)
- [ RUN ] testactivationforward.fromwrappers
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testactivationforward.fromwrappers (78 ms)
- [ RUN ] testactivationforward.comparespecific_0_1_activation2
- instance0: 0
- instance1: 1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testactivationforward.comparespecific_0_1_activation2 (78 ms)
- [ RUN ] testactivationforward.comparespecific_0_1_activation3
- instance0: 0
- instance1: 1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testactivationforward.comparespecific_0_1_activation3 (94 ms)
- [ RUN ] testactivationforward.comparespecific_0_1_activation2_pz
- instance0: 0
- instance1: 1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testactivationforward.comparespecific_0_1_activation2_pz (78 ms)
- [ RUN ] testactivationforward.comparespecific_0_1_activation3_pz
- instance0: 0
- instance1: 1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testactivationforward.comparespecific_0_1_activation3_pz (78 ms)
- [ RUN ] testactivationforward.comparespecific_0_1_activation3_small
- instance0: 0
- instance1: 1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testactivationforward.comparespecific_0_1_activation3_small (78 ms)
- [ RUN ] testactivationforward.comparespecific_0_1_activation3_small2
- instance0: 0
- instance1: 1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testactivationforward.comparespecific_0_1_activation3_small2 (78 ms)
- [ RUN ] testactivationforward.comparespecific_0_1_activation3_small2_tanh
- instance0: 0
- instance1: 1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testactivationforward.comparespecific_0_1_activation3_small2_tanh (109 ms)
- [----------] 10 tests from testactivationforward (702 ms total)
- [----------] 2 tests from testactivationbackward
- [ RUN ] testactivationbackward.basic
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- gradInput=3
- gradInput=0
- gradInput=-2.7
- gradInput=2
- gradInput=-0
- gradInput=2.1
- gradInput=0
- gradInput=-1.1
- gradInput=0
- [ OK ] testactivationbackward.basic (0 ms)
- [ RUN ] testactivationbackward.basic_2plane_batchsize2
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- gradInput=3
- gradInput=0
- gradInput=0
- gradInput=9
- [ OK ] testactivationbackward.basic_2plane_batchsize2 (15 ms)
- [----------] 2 tests from testactivationbackward (15 ms total)
- [----------] 1 test from testRandomSingleton
- [ RUN ] testRandomSingleton.testMockRandom
- 0.549356
- 0.634521
- 0.5968
- 0.863601
- 0.982891
- 0.637683
- 0.248837
- 0.351605
- 0.225401
- 0.220224
- [ OK ] testRandomSingleton.testMockRandom (0 ms)
- [----------] 1 test from testRandomSingleton (0 ms total)
- [----------] 10 tests from testdropoutforward
- [ RUN ] testdropoutforward.basic
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testdropoutforward.basic (16 ms)
- [ RUN ] testdropoutforward.basic_2plane_batchsize2
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testdropoutforward.basic_2plane_batchsize2 (16 ms)
- [ RUN ] testdropoutforward.fromwrappers
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testdropoutforward.fromwrappers (15 ms)
- [ RUN ] testdropoutforward.comparespecific_0_1_dropout2
- instance0: 0
- instance1: 1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testdropoutforward.comparespecific_0_1_dropout2 (78 ms)
- [ RUN ] testdropoutforward.comparespecific_0_1_dropout3
- instance0: 0
- instance1: 1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testdropoutforward.comparespecific_0_1_dropout3 (78 ms)
- [ RUN ] testdropoutforward.comparespecific_0_1_dropout2_pz
- instance0: 0
- instance1: 1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testdropoutforward.comparespecific_0_1_dropout2_pz (94 ms)
- [ RUN ] testdropoutforward.comparespecific_0_1_dropout3_pz
- instance0: 0
- instance1: 1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testdropoutforward.comparespecific_0_1_dropout3_pz (78 ms)
- [ RUN ] testdropoutforward.comparespecific_0_1_dropout3_small
- instance0: 0
- instance1: 1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testdropoutforward.comparespecific_0_1_dropout3_small (93 ms)
- [ RUN ] testdropoutforward.comparespecific_0_1_dropout3_small2
- instance0: 0
- instance1: 1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testdropoutforward.comparespecific_0_1_dropout3_small2 (78 ms)
- [ RUN ] testdropoutforward.comparespecific_0_1_dropout3_small2_tanh
- instance0: 0
- instance1: 1
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testdropoutforward.comparespecific_0_1_dropout3_small2_tanh (78 ms)
- [----------] 10 tests from testdropoutforward (624 ms total)
- [----------] 3 tests from testdropoutbackward
- [ RUN ] testdropoutbackward.basic
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testdropoutbackward.basic (94 ms)
- [ RUN ] testdropoutbackward.basic_2plane_batchsize2
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testdropoutbackward.basic_2plane_batchsize2 (78 ms)
- [ RUN ] testdropoutbackward.compare_args
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testdropoutbackward.compare_args (78 ms)
- [----------] 3 tests from testdropoutbackward (250 ms total)
- [----------] 1 test from testsgd
- [ RUN ] testsgd.basic
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- layer 0:InputLayer{ outputPlanes=1 outputSize=5 }
- layer 1:ConvolutionalLayer{ LayerDimensions{ inputPlanes=1 inputSize=5 numFilters=1 filterSize=3 outputSize=3 padZeros=0 biased=0 skip=0} }
- layer 2:SquareLossLayer{}
- inputtotalsize=50 outputTotalSize=18
- forward try kernel 0
- ... not plausibly optimal, skipping
- forward try kernel 1
- ... seems valid
- ForwardAuto: kernel 1 0ms
- calcGradWeights try kernel 0
- ... not plausibly optimal, skipping
- calcGradWeights try kernel 1
- ... seems valid
- BackpropWeightsAuto: kernel 1 0ms
- [ OK ] testsgd.basic (577 ms)
- [----------] 1 test from testsgd (577 ms total)
- [----------] 9 tests from testCLMathWrapper
- [ RUN ] testCLMathWrapper.assign
- a[0]=4
- a[1]=2.1
- a[2]=5
- a[3]=3
- a[4]=9.2
- [ OK ] testCLMathWrapper.assign (78 ms)
- [ RUN ] testCLMathWrapper.assignScalar
- a[0]=3.4
- a[1]=3.4
- a[2]=3.4
- a[3]=3.4
- a[4]=3.4
- [ OK ] testCLMathWrapper.assignScalar (78 ms)
- [ RUN ] testCLMathWrapper.addinplace
- a[0]=5
- a[1]=5.1
- a[2]=14
- a[3]=15.5
- a[4]=11.7
- [ OK ] testCLMathWrapper.addinplace (78 ms)
- [ RUN ] testCLMathWrapper.multiplyinplace
- a[0]=1.5
- a[1]=4.5
- a[2]=13.5
- a[3]=18.75
- a[4]=3.75
- [ OK ] testCLMathWrapper.multiplyinplace (78 ms)
- [ RUN ] testCLMathWrapper.addscalar
- a[0]=2.5
- a[1]=4.5
- a[2]=10.5
- a[3]=14
- a[4]=4
- [ OK ] testCLMathWrapper.addscalar (78 ms)
- [ RUN ] testCLMathWrapper.sqrt
- a[0]=1
- a[1]=1.73205
- a[2]=3
- a[3]=3.53553
- a[4]=1.58114
- [ OK ] testCLMathWrapper.sqrt (78 ms)
- [ RUN ] testCLMathWrapper.squared
- a[0]=1
- a[1]=9
- a[2]=81
- a[3]=156.25
- a[4]=6.25
- [ OK ] testCLMathWrapper.squared (78 ms)
- [ RUN ] testCLMathWrapper.inverse
- a[0]=1
- a[1]=0.333333
- a[2]=0.111111
- a[3]=0.08
- a[4]=0.4
- [ OK ] testCLMathWrapper.inverse (78 ms)
- [ RUN ] testCLMathWrapper.perelementmult
- a[0]=4
- a[1]=6.3
- a[2]=45
- a[3]=37.5
- a[4]=23
- [ OK ] testCLMathWrapper.perelementmult (78 ms)
- [----------] 9 tests from testCLMathWrapper (702 ms total)
- [----------] 1 test from testreducesegments
- [ RUN ] testreducesegments.basic
- Using Advanced Micro Devices, Inc. , OpenCL platform: AMD Accelerated Parallel Processing
- Using OpenCL device: Tahiti
- [ OK ] testreducesegments.basic (78 ms)
- [----------] 1 test from testreducesegments (78 ms total)
- [----------] 4 tests from testGpuOp
- [ RUN ] testGpuOp.addinplace
- a[0]=5
- a[1]=5.1
- a[2]=14
- a[3]=15.5
- a[4]=11.7
- [ OK ] testGpuOp.addinplace (78 ms)
- [ RUN ] testGpuOp.addoutofplace
- a[0]=1
- a[1]=3
- a[2]=9
- a[3]=12.5
- a[4]=2.5
- c[0]=5
- c[1]=5.1
- c[2]=14
- c[3]=15.5
- c[4]=11.7
- [ OK ] testGpuOp.addoutofplace (94 ms)
- [ RUN ] testGpuOp.inverse
- a[0]=1
- a[1]=0.333333
- a[2]=0.111111
- a[3]=0.08
- a[4]=0.4
- [ OK ] testGpuOp.inverse (78 ms)
- [ RUN ] testGpuOp.addscalarinplace
- a[0]=5.2
- a[1]=7.2
- a[2]=13.2
- a[3]=16.7
- a[4]=6.7
- [ OK ] testGpuOp.addscalarinplace (78 ms)
- [----------] 4 tests from testGpuOp (328 ms total)
- [----------] 1 test from testjpeghelper
- [ RUN ] testjpeghelper.writeread