- &&&& RUNNING RNN-T_Harness # /work/./build/bin/harness_rnnt
- I1103 19:58:43.096913 866 main_rnnt.cc:2903] Found 1 GPUs
- [I] Starting creating QSL.
- [I] Finished creating QSL.
- [I] Starting creating SUT.
- [I] Set to device 0
- Dali pipeline creating..
- Dali pipeline created
- [I] Creating stream 0/1
- [I] [TRT] [MemUsageChange] Init CUDA: CPU +531, GPU +0, now: CPU 966, GPU 2541 (MiB)
- [I] [TRT] Loaded engine size: 81 MiB
- [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1239, GPU +348, now: CPU 2388, GPU 2891 (MiB)
- [I] [TRT] [MemUsageChange] Init cuDNN: CPU +178, GPU +56, now: CPU 2566, GPU 2947 (MiB)
- [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
- [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 2593, GPU 3007 (MiB)
- [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2593, GPU 3015 (MiB)
- [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +232, now: CPU 0, GPU 232 (MiB)
- [I] Created RnntEncoder runner: encoder
- [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 2593, GPU 3249 (MiB)
- [I] [TRT] Loaded engine size: 3 MiB
- [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2599, GPU 3257 (MiB)
- [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 2599, GPU 3267 (MiB)
- [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +0, now: CPU 0, GPU 232 (MiB)
- [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2599, GPU 3271 (MiB)
- [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2599, GPU 3279 (MiB)
- [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +1, now: CPU 0, GPU 233 (MiB)
- [I] Created RnntDecoder runner: decoder
- [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 2600, GPU 3279 (MiB)
- [I] [TRT] Loaded engine size: 1 MiB
- [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +1, now: CPU 0, GPU 234 (MiB)
- [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 234 (MiB)
- [I] Created RnntJointFc1 runner: fc1_a
- [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 2600, GPU 3279 (MiB)
- [I] [TRT] Loaded engine size: 0 MiB
- [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 2601, GPU 3287 (MiB)
- [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +0, now: CPU 0, GPU 234 (MiB)
- [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 2601, GPU 3287 (MiB)
- [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 234 (MiB)
- [I] Created RnntJointFc1 runner: fc1_b
- [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 2600, GPU 3287 (MiB)
- [I] [TRT] Loaded engine size: 0 MiB
- [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 2601, GPU 3295 (MiB)
- [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +0, now: CPU 0, GPU 234 (MiB)
- [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 2601, GPU 3295 (MiB)
- [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 234 (MiB)
- [I] Created RnntJointBackend runner: joint_backend
- [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 2600, GPU 3295 (MiB)
- [I] [TRT] Loaded engine size: 0 MiB
- [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2601, GPU 3303 (MiB)
- [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 2601, GPU 3313 (MiB)
- [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +0, now: CPU 0, GPU 234 (MiB)
- [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2601, GPU 3305 (MiB)
- [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2601, GPU 3313 (MiB)
- [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 234 (MiB)
- [I] Created RnntIsel runner: isel
- [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 2601, GPU 3313 (MiB)
- [I] [TRT] Loaded engine size: 0 MiB
- [I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +0, now: CPU 0, GPU 234 (MiB)
- [I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +2, now: CPU 0, GPU 236 (MiB)
- [I] Created RnntIgather runner: igather
- [I] Instantiated RnntEngineContainer runner
- cudaMemcpy blocking
- cudaMemcpy blocking
- [I] Instantiated RnntTensorContainer host memory
- Stream::Stream sampleSize: 61440
- Stream::Stream singleSampleSize: 480
- Stream::Stream fullseqSampleSize: 61440
- Stream::Stream mBatchSize: 16
- [I] Finished creating SUT.
- [I] Starting warming up SUT.
- [I] Finished warming up SUT.
- [I] Starting running actual test.
- Generating '/tmp/nsys-report-ecc6.qdstrm'
- [1/1] [========================100%] nsys_rnnt.nsys-rep
- Generated:
- /work/nsys_rnnt.nsys-rep