Untitled

a guest
Dec 1st, 2018
  1. ** beam_size 12
  2. RETURNN starting up, version 20181130.185608--git-d30181f, date/time 2018-12-01-09-58-31 (UTC+0100), pid 22317, cwd /work/smt2/makarov/NMT, Python /usr/bin/python3
  3. RETURNN command line options: ['hmm-factorization/en-de/transformer-hmm', '++load_epoch', '114', '++device', 'gpu', '--task', 'search', '++search_data', 'config:dev', '++beam_size', '12', '++need_data', 'False', '++max_seq_length', '0', '++search_output_file', 'hmm-factorization/en-de/hyp/transformer-hmm', '++batch_size', '2000']
  4. Hostname: cluster-cn-258
  5. TensorFlow: 1.9.0 (v1.9.0-0-g25c197e023) (<site-package> in /u/makarov/.local/lib/python3.5/site-packages/tensorflow)
  6. Setup TF inter and intra global thread pools, num_threads None, session opts {'device_count': {'GPU': 0}, 'log_device_placement': False}.
  7. 2018-12-01 09:58:32.562859: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
  8. 2018-12-01 09:58:32.978259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties:
  9. name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
  10. pciBusID: 0000:02:00.0
  11. totalMemory: 10.92GiB freeMemory: 10.76GiB
  12. 2018-12-01 09:58:32.978317: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
  13. 2018-12-01 09:58:32.978337: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
  14. 2018-12-01 09:58:32.978348: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
  15. 2018-12-01 09:58:32.978358: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
  16. CUDA_VISIBLE_DEVICES is set to '0'.
  17. 2018-12-01 09:58:33.282635: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
  18. 2018-12-01 09:58:33.828527: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
  19. 2018-12-01 09:58:33.828579: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
  20. 2018-12-01 09:58:33.828588: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
  21. 2018-12-01 09:58:33.828955: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/device:GPU:0 with 10409 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
  22. Collecting TensorFlow device list...
  23. Local devices available to TensorFlow:
  24. 1/2: name: "/device:CPU:0"
  25. device_type: "CPU"
  26. memory_limit: 268435456
  27. locality {
  28. }
  29. incarnation: 616944120252845792
  30. 2/2: name: "/device:GPU:0"
  31. device_type: "GPU"
  32. memory_limit: 10915220685
  33. locality {
  34. bus_id: 1
  35. links {
  36. }
  37. }
  38. incarnation: 955148772328989222
  39. physical_device_desc: "device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1"
  40. Using gpu device 0: GeForce GTX 1080 Ti
  41. Setup tf.Session with options {'device_count': {'GPU': 1}, 'log_device_placement': False} ...
  42. 2018-12-01 09:58:38.902018: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
  43. 2018-12-01 09:58:38.902091: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
  44. 2018-12-01 09:58:38.902105: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
  45. 2018-12-01 09:58:38.902115: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
  46. 2018-12-01 09:58:38.902372: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10409 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
  47. layer root/'data' output: Data(name='data', shape=(None,), dtype='int32', sparse=True, dim=46300)
  48. layer root/'source_embed_raw' output: Data(name='source_embed_raw_output', shape=(None, 512))
  49. debug_add_check_numerics_on_output: add for layer 'source_embed_raw': <tf.Tensor 'source_embed_raw/linear/embedding_lookup:0' shape=(?, ?, 512) dtype=float32>
  50. layer root/'source_embed_weighted' output: Data(name='source_embed_weighted_output', shape=(None, 512))
  51. debug_add_check_numerics_on_output: add for layer 'source_embed_weighted': <tf.Tensor 'source_embed_weighted/mul:0' shape=(?, ?, 512) dtype=float32>
  52. layer root/'source_embed_with_pos' output: Data(name='source_embed_with_pos_output', shape=(None, 512))
  53. debug_add_check_numerics_on_output: add for layer 'source_embed_with_pos': <tf.Tensor 'source_embed_with_pos/add:0' shape=(?, ?, 512) dtype=float32>
  54. layer root/'source_embed' output: Data(name='source_embed_output', shape=(None, 512))
  55. debug_add_check_numerics_on_output: add for layer 'source_embed': <tf.Tensor 'source_embed_with_pos/source_embed_with_pos_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
  56. layer root/'enc_01_self_att_laynorm' output: Data(name='enc_01_self_att_laynorm_output', shape=(None, 512))
  57. debug_add_check_numerics_on_output: add for layer 'enc_01_self_att_laynorm': <tf.Tensor 'enc_01_self_att_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
  58. layer root/'enc_01_self_att_att' output: Data(name='enc_01_self_att_att_output', shape=(None, 512))
  59. debug_add_check_numerics_on_output: add for layer 'enc_01_self_att_att': <tf.Tensor 'enc_01_self_att_att/merge_vdim:0' shape=(?, ?, 512) dtype=float32>
  60. layer root/'enc_01_self_att_lin' output: Data(name='enc_01_self_att_lin_output', shape=(None, 512))
  61. debug_add_check_numerics_on_output: add for layer 'enc_01_self_att_lin': <tf.Tensor 'enc_01_self_att_lin/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
  62. layer root/'enc_01_self_att_drop' output: Data(name='enc_01_self_att_drop_output', shape=(None, 512))
  63. debug_add_check_numerics_on_output: add for layer 'enc_01_self_att_drop': <tf.Tensor 'enc_01_self_att_lin/enc_01_self_att_lin_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
  64. layer root/'enc_01_self_att_out' output: Data(name='enc_01_self_att_out_output', shape=(None, 512))
  65. debug_add_check_numerics_on_output: add for layer 'enc_01_self_att_out': <tf.Tensor 'enc_01_self_att_out/Add:0' shape=(?, ?, 512) dtype=float32>
  66. layer root/'enc_01_ff_laynorm' output: Data(name='enc_01_ff_laynorm_output', shape=(None, 512))
  67. debug_add_check_numerics_on_output: add for layer 'enc_01_ff_laynorm': <tf.Tensor 'enc_01_ff_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
  68. layer root/'enc_01_ff_conv1' output: Data(name='enc_01_ff_conv1_output', shape=(None, 2048))
  69. debug_add_check_numerics_on_output: add for layer 'enc_01_ff_conv1': <tf.Tensor 'enc_01_ff_conv1/activation/Relu:0' shape=(?, ?, 2048) dtype=float32>
  70. layer root/'enc_01_ff_conv2' output: Data(name='enc_01_ff_conv2_output', shape=(None, 512))
  71. debug_add_check_numerics_on_output: add for layer 'enc_01_ff_conv2': <tf.Tensor 'enc_01_ff_conv2/linear/add_bias:0' shape=(?, ?, 512) dtype=float32>
  72. layer root/'enc_01_ff_drop' output: Data(name='enc_01_ff_drop_output', shape=(None, 512))
  73. debug_add_check_numerics_on_output: add for layer 'enc_01_ff_drop': <tf.Tensor 'enc_01_ff_conv2/enc_01_ff_conv2_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
  74. layer root/'enc_01_ff_out' output: Data(name='enc_01_ff_out_output', shape=(None, 512))
  75. debug_add_check_numerics_on_output: add for layer 'enc_01_ff_out': <tf.Tensor 'enc_01_ff_out/Add:0' shape=(?, ?, 512) dtype=float32>
  76. layer root/'enc_01' output: Data(name='enc_01_output', shape=(None, 512))
  77. debug_add_check_numerics_on_output: add for layer 'enc_01': <tf.Tensor 'enc_01_ff_out/enc_01_ff_out_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
  78. layer root/'enc_02_self_att_laynorm' output: Data(name='enc_02_self_att_laynorm_output', shape=(None, 512))
  79. debug_add_check_numerics_on_output: add for layer 'enc_02_self_att_laynorm': <tf.Tensor 'enc_02_self_att_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
  80. layer root/'enc_02_self_att_att' output: Data(name='enc_02_self_att_att_output', shape=(None, 512))
  81. debug_add_check_numerics_on_output: add for layer 'enc_02_self_att_att': <tf.Tensor 'enc_02_self_att_att/merge_vdim:0' shape=(?, ?, 512) dtype=float32>
  82. layer root/'enc_02_self_att_lin' output: Data(name='enc_02_self_att_lin_output', shape=(None, 512))
  83. debug_add_check_numerics_on_output: add for layer 'enc_02_self_att_lin': <tf.Tensor 'enc_02_self_att_lin/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
  84. layer root/'enc_02_self_att_drop' output: Data(name='enc_02_self_att_drop_output', shape=(None, 512))
  85. debug_add_check_numerics_on_output: add for layer 'enc_02_self_att_drop': <tf.Tensor 'enc_02_self_att_lin/enc_02_self_att_lin_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
  86. layer root/'enc_02_self_att_out' output: Data(name='enc_02_self_att_out_output', shape=(None, 512))
  87. debug_add_check_numerics_on_output: add for layer 'enc_02_self_att_out': <tf.Tensor 'enc_02_self_att_out/Add:0' shape=(?, ?, 512) dtype=float32>
  88. layer root/'enc_02_ff_laynorm' output: Data(name='enc_02_ff_laynorm_output', shape=(None, 512))
  89. debug_add_check_numerics_on_output: add for layer 'enc_02_ff_laynorm': <tf.Tensor 'enc_02_ff_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
  90. layer root/'enc_02_ff_conv1' output: Data(name='enc_02_ff_conv1_output', shape=(None, 2048))
  91. debug_add_check_numerics_on_output: add for layer 'enc_02_ff_conv1': <tf.Tensor 'enc_02_ff_conv1/activation/Relu:0' shape=(?, ?, 2048) dtype=float32>
  92. layer root/'enc_02_ff_conv2' output: Data(name='enc_02_ff_conv2_output', shape=(None, 512))
  93. debug_add_check_numerics_on_output: add for layer 'enc_02_ff_conv2': <tf.Tensor 'enc_02_ff_conv2/linear/add_bias:0' shape=(?, ?, 512) dtype=float32>
  94. layer root/'enc_02_ff_drop' output: Data(name='enc_02_ff_drop_output', shape=(None, 512))
  95. debug_add_check_numerics_on_output: add for layer 'enc_02_ff_drop': <tf.Tensor 'enc_02_ff_conv2/enc_02_ff_conv2_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
  96. layer root/'enc_02_ff_out' output: Data(name='enc_02_ff_out_output', shape=(None, 512))
  97. debug_add_check_numerics_on_output: add for layer 'enc_02_ff_out': <tf.Tensor 'enc_02_ff_out/Add:0' shape=(?, ?, 512) dtype=float32>
  98. layer root/'enc_02' output: Data(name='enc_02_output', shape=(None, 512))
  99. debug_add_check_numerics_on_output: add for layer 'enc_02': <tf.Tensor 'enc_02_ff_out/enc_02_ff_out_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
  100. layer root/'enc_03_self_att_laynorm' output: Data(name='enc_03_self_att_laynorm_output', shape=(None, 512))
  101. debug_add_check_numerics_on_output: add for layer 'enc_03_self_att_laynorm': <tf.Tensor 'enc_03_self_att_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
  102. layer root/'enc_03_self_att_att' output: Data(name='enc_03_self_att_att_output', shape=(None, 512))
  103. debug_add_check_numerics_on_output: add for layer 'enc_03_self_att_att': <tf.Tensor 'enc_03_self_att_att/merge_vdim:0' shape=(?, ?, 512) dtype=float32>
  104. layer root/'enc_03_self_att_lin' output: Data(name='enc_03_self_att_lin_output', shape=(None, 512))
  105. debug_add_check_numerics_on_output: add for layer 'enc_03_self_att_lin': <tf.Tensor 'enc_03_self_att_lin/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
  106. layer root/'enc_03_self_att_drop' output: Data(name='enc_03_self_att_drop_output', shape=(None, 512))
  107. debug_add_check_numerics_on_output: add for layer 'enc_03_self_att_drop': <tf.Tensor 'enc_03_self_att_lin/enc_03_self_att_lin_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
  108. layer root/'enc_03_self_att_out' output: Data(name='enc_03_self_att_out_output', shape=(None, 512))
  109. debug_add_check_numerics_on_output: add for layer 'enc_03_self_att_out': <tf.Tensor 'enc_03_self_att_out/Add:0' shape=(?, ?, 512) dtype=float32>
  110. layer root/'enc_03_ff_laynorm' output: Data(name='enc_03_ff_laynorm_output', shape=(None, 512))
  111. debug_add_check_numerics_on_output: add for layer 'enc_03_ff_laynorm': <tf.Tensor 'enc_03_ff_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
  112. layer root/'enc_03_ff_conv1' output: Data(name='enc_03_ff_conv1_output', shape=(None, 2048))
  113. debug_add_check_numerics_on_output: add for layer 'enc_03_ff_conv1': <tf.Tensor 'enc_03_ff_conv1/activation/Relu:0' shape=(?, ?, 2048) dtype=float32>
  114. layer root/'enc_03_ff_conv2' output: Data(name='enc_03_ff_conv2_output', shape=(None, 512))
  115. debug_add_check_numerics_on_output: add for layer 'enc_03_ff_conv2': <tf.Tensor 'enc_03_ff_conv2/linear/add_bias:0' shape=(?, ?, 512) dtype=float32>
  116. layer root/'enc_03_ff_drop' output: Data(name='enc_03_ff_drop_output', shape=(None, 512))
  117. debug_add_check_numerics_on_output: add for layer 'enc_03_ff_drop': <tf.Tensor 'enc_03_ff_conv2/enc_03_ff_conv2_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
  118. layer root/'enc_03_ff_out' output: Data(name='enc_03_ff_out_output', shape=(None, 512))
  119. debug_add_check_numerics_on_output: add for layer 'enc_03_ff_out': <tf.Tensor 'enc_03_ff_out/Add:0' shape=(?, ?, 512) dtype=float32>
  120. layer root/'enc_03' output: Data(name='enc_03_output', shape=(None, 512))
  121. debug_add_check_numerics_on_output: add for layer 'enc_03': <tf.Tensor 'enc_03_ff_out/enc_03_ff_out_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
  122. layer root/'enc_04_self_att_laynorm' output: Data(name='enc_04_self_att_laynorm_output', shape=(None, 512))
  123. debug_add_check_numerics_on_output: add for layer 'enc_04_self_att_laynorm': <tf.Tensor 'enc_04_self_att_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
  124. layer root/'enc_04_self_att_att' output: Data(name='enc_04_self_att_att_output', shape=(None, 512))
  125. debug_add_check_numerics_on_output: add for layer 'enc_04_self_att_att': <tf.Tensor 'enc_04_self_att_att/merge_vdim:0' shape=(?, ?, 512) dtype=float32>
  126. layer root/'enc_04_self_att_lin' output: Data(name='enc_04_self_att_lin_output', shape=(None, 512))
  127. debug_add_check_numerics_on_output: add for layer 'enc_04_self_att_lin': <tf.Tensor 'enc_04_self_att_lin/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
  128. layer root/'enc_04_self_att_drop' output: Data(name='enc_04_self_att_drop_output', shape=(None, 512))
  129. debug_add_check_numerics_on_output: add for layer 'enc_04_self_att_drop': <tf.Tensor 'enc_04_self_att_lin/enc_04_self_att_lin_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
  130. layer root/'enc_04_self_att_out' output: Data(name='enc_04_self_att_out_output', shape=(None, 512))
  131. debug_add_check_numerics_on_output: add for layer 'enc_04_self_att_out': <tf.Tensor 'enc_04_self_att_out/Add:0' shape=(?, ?, 512) dtype=float32>
  132. layer root/'enc_04_ff_laynorm' output: Data(name='enc_04_ff_laynorm_output', shape=(None, 512))
  133. debug_add_check_numerics_on_output: add for layer 'enc_04_ff_laynorm': <tf.Tensor 'enc_04_ff_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
  134. layer root/'enc_04_ff_conv1' output: Data(name='enc_04_ff_conv1_output', shape=(None, 2048))
  135. debug_add_check_numerics_on_output: add for layer 'enc_04_ff_conv1': <tf.Tensor 'enc_04_ff_conv1/activation/Relu:0' shape=(?, ?, 2048) dtype=float32>
  136. layer root/'enc_04_ff_conv2' output: Data(name='enc_04_ff_conv2_output', shape=(None, 512))
  137. debug_add_check_numerics_on_output: add for layer 'enc_04_ff_conv2': <tf.Tensor 'enc_04_ff_conv2/linear/add_bias:0' shape=(?, ?, 512) dtype=float32>
  138. layer root/'enc_04_ff_drop' output: Data(name='enc_04_ff_drop_output', shape=(None, 512))
  139. debug_add_check_numerics_on_output: add for layer 'enc_04_ff_drop': <tf.Tensor 'enc_04_ff_conv2/enc_04_ff_conv2_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
  140. layer root/'enc_04_ff_out' output: Data(name='enc_04_ff_out_output', shape=(None, 512))
  141. debug_add_check_numerics_on_output: add for layer 'enc_04_ff_out': <tf.Tensor 'enc_04_ff_out/Add:0' shape=(?, ?, 512) dtype=float32>
  142. layer root/'enc_04' output: Data(name='enc_04_output', shape=(None, 512))
  143. debug_add_check_numerics_on_output: add for layer 'enc_04': <tf.Tensor 'enc_04_ff_out/enc_04_ff_out_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
  144. layer root/'enc_05_self_att_laynorm' output: Data(name='enc_05_self_att_laynorm_output', shape=(None, 512))
  145. debug_add_check_numerics_on_output: add for layer 'enc_05_self_att_laynorm': <tf.Tensor 'enc_05_self_att_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
  146. layer root/'enc_05_self_att_att' output: Data(name='enc_05_self_att_att_output', shape=(None, 512))
  147. debug_add_check_numerics_on_output: add for layer 'enc_05_self_att_att': <tf.Tensor 'enc_05_self_att_att/merge_vdim:0' shape=(?, ?, 512) dtype=float32>
  148. layer root/'enc_05_self_att_lin' output: Data(name='enc_05_self_att_lin_output', shape=(None, 512))
  149. debug_add_check_numerics_on_output: add for layer 'enc_05_self_att_lin': <tf.Tensor 'enc_05_self_att_lin/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
  150. layer root/'enc_05_self_att_drop' output: Data(name='enc_05_self_att_drop_output', shape=(None, 512))
  151. debug_add_check_numerics_on_output: add for layer 'enc_05_self_att_drop': <tf.Tensor 'enc_05_self_att_lin/enc_05_self_att_lin_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
  152. layer root/'enc_05_self_att_out' output: Data(name='enc_05_self_att_out_output', shape=(None, 512))
  153. debug_add_check_numerics_on_output: add for layer 'enc_05_self_att_out': <tf.Tensor 'enc_05_self_att_out/Add:0' shape=(?, ?, 512) dtype=float32>
  154. layer root/'enc_05_ff_laynorm' output: Data(name='enc_05_ff_laynorm_output', shape=(None, 512))
  155. debug_add_check_numerics_on_output: add for layer 'enc_05_ff_laynorm': <tf.Tensor 'enc_05_ff_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
  156. layer root/'enc_05_ff_conv1' output: Data(name='enc_05_ff_conv1_output', shape=(None, 2048))
  157. debug_add_check_numerics_on_output: add for layer 'enc_05_ff_conv1': <tf.Tensor 'enc_05_ff_conv1/activation/Relu:0' shape=(?, ?, 2048) dtype=float32>
  158. layer root/'enc_05_ff_conv2' output: Data(name='enc_05_ff_conv2_output', shape=(None, 512))
  159. debug_add_check_numerics_on_output: add for layer 'enc_05_ff_conv2': <tf.Tensor 'enc_05_ff_conv2/linear/add_bias:0' shape=(?, ?, 512) dtype=float32>
  160. layer root/'enc_05_ff_drop' output: Data(name='enc_05_ff_drop_output', shape=(None, 512))
  161. debug_add_check_numerics_on_output: add for layer 'enc_05_ff_drop': <tf.Tensor 'enc_05_ff_conv2/enc_05_ff_conv2_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
  162. layer root/'enc_05_ff_out' output: Data(name='enc_05_ff_out_output', shape=(None, 512))
  163. debug_add_check_numerics_on_output: add for layer 'enc_05_ff_out': <tf.Tensor 'enc_05_ff_out/Add:0' shape=(?, ?, 512) dtype=float32>
  164. layer root/'enc_05' output: Data(name='enc_05_output', shape=(None, 512))
  165. debug_add_check_numerics_on_output: add for layer 'enc_05': <tf.Tensor 'enc_05_ff_out/enc_05_ff_out_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
  166. layer root/'enc_06_self_att_laynorm' output: Data(name='enc_06_self_att_laynorm_output', shape=(None, 512))
  167. debug_add_check_numerics_on_output: add for layer 'enc_06_self_att_laynorm': <tf.Tensor 'enc_06_self_att_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
  168. layer root/'enc_06_self_att_att' output: Data(name='enc_06_self_att_att_output', shape=(None, 512))
  169. debug_add_check_numerics_on_output: add for layer 'enc_06_self_att_att': <tf.Tensor 'enc_06_self_att_att/merge_vdim:0' shape=(?, ?, 512) dtype=float32>
  170. layer root/'enc_06_self_att_lin' output: Data(name='enc_06_self_att_lin_output', shape=(None, 512))
  171. debug_add_check_numerics_on_output: add for layer 'enc_06_self_att_lin': <tf.Tensor 'enc_06_self_att_lin/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
  172. layer root/'enc_06_self_att_drop' output: Data(name='enc_06_self_att_drop_output', shape=(None, 512))
  173. debug_add_check_numerics_on_output: add for layer 'enc_06_self_att_drop': <tf.Tensor 'enc_06_self_att_lin/enc_06_self_att_lin_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
  174. layer root/'enc_06_self_att_out' output: Data(name='enc_06_self_att_out_output', shape=(None, 512))
  175. debug_add_check_numerics_on_output: add for layer 'enc_06_self_att_out': <tf.Tensor 'enc_06_self_att_out/Add:0' shape=(?, ?, 512) dtype=float32>
  176. layer root/'enc_06_ff_laynorm' output: Data(name='enc_06_ff_laynorm_output', shape=(None, 512))
  177. debug_add_check_numerics_on_output: add for layer 'enc_06_ff_laynorm': <tf.Tensor 'enc_06_ff_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
  178. layer root/'enc_06_ff_conv1' output: Data(name='enc_06_ff_conv1_output', shape=(None, 2048))
  179. debug_add_check_numerics_on_output: add for layer 'enc_06_ff_conv1': <tf.Tensor 'enc_06_ff_conv1/activation/Relu:0' shape=(?, ?, 2048) dtype=float32>
  180. layer root/'enc_06_ff_conv2' output: Data(name='enc_06_ff_conv2_output', shape=(None, 512))
  181. debug_add_check_numerics_on_output: add for layer 'enc_06_ff_conv2': <tf.Tensor 'enc_06_ff_conv2/linear/add_bias:0' shape=(?, ?, 512) dtype=float32>
  182. layer root/'enc_06_ff_drop' output: Data(name='enc_06_ff_drop_output', shape=(None, 512))
  183. debug_add_check_numerics_on_output: add for layer 'enc_06_ff_drop': <tf.Tensor 'enc_06_ff_conv2/enc_06_ff_conv2_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
  184. layer root/'enc_06_ff_out' output: Data(name='enc_06_ff_out_output', shape=(None, 512))
  185. debug_add_check_numerics_on_output: add for layer 'enc_06_ff_out': <tf.Tensor 'enc_06_ff_out/Add:0' shape=(?, ?, 512) dtype=float32>
  186. layer root/'enc_06' output: Data(name='enc_06_output', shape=(None, 512))
  187. debug_add_check_numerics_on_output: add for layer 'enc_06': <tf.Tensor 'enc_06_ff_out/enc_06_ff_out_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
  188. layer root/'encoder' output: Data(name='encoder_output', shape=(None, 512))
  189. debug_add_check_numerics_on_output: add for layer 'encoder': <tf.Tensor 'encoder/add:0' shape=(?, ?, 512) dtype=float32>
  190. layer root/'dec_01_att_key0' output: Data(name='dec_01_att_key0_output', shape=(None, 512))
  191. debug_add_check_numerics_on_output: add for layer 'dec_01_att_key0': <tf.Tensor 'dec_01_att_key0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
  192. layer root/'dec_01_att_key' output: Data(name='dec_01_att_key_output', shape=(None, 8, 64))
  193. debug_add_check_numerics_on_output: add for layer 'dec_01_att_key': <tf.Tensor 'dec_01_att_key/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
  194. layer root/'dec_03_att_key0' output: Data(name='dec_03_att_key0_output', shape=(None, 512))
  195. debug_add_check_numerics_on_output: add for layer 'dec_03_att_key0': <tf.Tensor 'dec_03_att_key0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
  196. layer root/'dec_03_att_key' output: Data(name='dec_03_att_key_output', shape=(None, 8, 64))
  197. debug_add_check_numerics_on_output: add for layer 'dec_03_att_key': <tf.Tensor 'dec_03_att_key/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
  198. layer root/'dec_04_att_value0' output: Data(name='dec_04_att_value0_output', shape=(None, 512))
  199. debug_add_check_numerics_on_output: add for layer 'dec_04_att_value0': <tf.Tensor 'dec_04_att_value0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
  200. layer root/'dec_04_att_value' output: Data(name='dec_04_att_value_output', shape=(None, 8, 64))
  201. debug_add_check_numerics_on_output: add for layer 'dec_04_att_value': <tf.Tensor 'dec_04_att_value/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
  202. layer root/'dec_06_att_key0' output: Data(name='dec_06_att_key0_output', shape=(None, 512))
  203. debug_add_check_numerics_on_output: add for layer 'dec_06_att_key0': <tf.Tensor 'dec_06_att_key0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
  204. layer root/'dec_06_att_key' output: Data(name='dec_06_att_key_output', shape=(None, 8, 64))
  205. debug_add_check_numerics_on_output: add for layer 'dec_06_att_key': <tf.Tensor 'dec_06_att_key/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
  206. layer root/'dec_06_att_value0' output: Data(name='dec_06_att_value0_output', shape=(None, 512))
  207. debug_add_check_numerics_on_output: add for layer 'dec_06_att_value0': <tf.Tensor 'dec_06_att_value0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
  208. layer root/'dec_06_att_value' output: Data(name='dec_06_att_value_output', shape=(None, 8, 64))
  209. debug_add_check_numerics_on_output: add for layer 'dec_06_att_value': <tf.Tensor 'dec_06_att_value/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
  210. layer root/'dec_03_att_value0' output: Data(name='dec_03_att_value0_output', shape=(None, 512))
  211. debug_add_check_numerics_on_output: add for layer 'dec_03_att_value0': <tf.Tensor 'dec_03_att_value0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
  212. layer root/'dec_03_att_value' output: Data(name='dec_03_att_value_output', shape=(None, 8, 64))
  213. debug_add_check_numerics_on_output: add for layer 'dec_03_att_value': <tf.Tensor 'dec_03_att_value/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
  214. layer root/'dec_05_att_value0' output: Data(name='dec_05_att_value0_output', shape=(None, 512))
  215. debug_add_check_numerics_on_output: add for layer 'dec_05_att_value0': <tf.Tensor 'dec_05_att_value0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
  216. layer root/'dec_05_att_value' output: Data(name='dec_05_att_value_output', shape=(None, 8, 64))
  217. debug_add_check_numerics_on_output: add for layer 'dec_05_att_value': <tf.Tensor 'dec_05_att_value/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
  218. layer root/'dec_02_att_key0' output: Data(name='dec_02_att_key0_output', shape=(None, 512))
  219. debug_add_check_numerics_on_output: add for layer 'dec_02_att_key0': <tf.Tensor 'dec_02_att_key0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
  220. layer root/'dec_02_att_key' output: Data(name='dec_02_att_key_output', shape=(None, 8, 64))
  221. debug_add_check_numerics_on_output: add for layer 'dec_02_att_key': <tf.Tensor 'dec_02_att_key/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
  222. layer root/'dec_01_att_value0' output: Data(name='dec_01_att_value0_output', shape=(None, 512))
  223. debug_add_check_numerics_on_output: add for layer 'dec_01_att_value0': <tf.Tensor 'dec_01_att_value0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
  224. layer root/'dec_01_att_value' output: Data(name='dec_01_att_value_output', shape=(None, 8, 64))
  225. debug_add_check_numerics_on_output: add for layer 'dec_01_att_value': <tf.Tensor 'dec_01_att_value/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
  226. layer root/'dec_04_att_key0' output: Data(name='dec_04_att_key0_output', shape=(None, 512))
  227. debug_add_check_numerics_on_output: add for layer 'dec_04_att_key0': <tf.Tensor 'dec_04_att_key0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
  228. layer root/'dec_04_att_key' output: Data(name='dec_04_att_key_output', shape=(None, 8, 64))
  229. debug_add_check_numerics_on_output: add for layer 'dec_04_att_key': <tf.Tensor 'dec_04_att_key/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
  230. layer root/'dec_05_att_key0' output: Data(name='dec_05_att_key0_output', shape=(None, 512))
  231. debug_add_check_numerics_on_output: add for layer 'dec_05_att_key0': <tf.Tensor 'dec_05_att_key0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
  232. layer root/'dec_05_att_key' output: Data(name='dec_05_att_key_output', shape=(None, 8, 64))
  233. debug_add_check_numerics_on_output: add for layer 'dec_05_att_key': <tf.Tensor 'dec_05_att_key/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
  234. layer root/'dec_02_att_value0' output: Data(name='dec_02_att_value0_output', shape=(None, 512))
  235. debug_add_check_numerics_on_output: add for layer 'dec_02_att_value0': <tf.Tensor 'dec_02_att_value0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
  236. layer root/'dec_02_att_value' output: Data(name='dec_02_att_value_output', shape=(None, 8, 64))
  237. debug_add_check_numerics_on_output: add for layer 'dec_02_att_value': <tf.Tensor 'dec_02_att_value/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
  238. layer root/'output' output: Data(name='output_output', shape=(None,), dtype='int32', sparse=True, dim=34908, batch_dim_axis=1, beam_size=12)
  239. Rec layer sub net:
  240. Input layers moved out of loop: (#: 1)
  241. encoder_int
  242. Output layers moved out of loop: (#: 0)
  243. None
  244. Layers in loop: (#: 142)
  245. end
  246. output
  247. output_prob
  248. dec_06_att_weights
  249. dec_06_att_energy
  250. dec_06_att_query
  251. dec_06_att_query0
  252. dec_06_att_laynorm
  253. dec_06_self_att_out
  254. dec_05
  255. dec_05_ff_out
dec_05_att_out
dec_05_att_drop
dec_05_att_lin
dec_05_att_att
dec_05_att0
dec_05_att_weights_drop
dec_05_att_weights
dec_05_att_energy
dec_05_att_query
dec_05_att_query0
dec_05_att_laynorm
dec_05_self_att_out
dec_04
dec_04_ff_out
dec_04_att_out
dec_04_att_drop
dec_04_att_lin
dec_04_att_att
dec_04_att0
dec_04_att_weights_drop
dec_04_att_weights
dec_04_att_energy
dec_04_att_query
dec_04_att_query0
dec_04_att_laynorm
dec_04_self_att_out
dec_03
dec_03_ff_out
dec_03_att_out
dec_03_att_drop
dec_03_att_lin
dec_03_att_att
dec_03_att0
dec_03_att_weights_drop
dec_03_att_weights
dec_03_att_energy
dec_03_att_query
dec_03_att_query0
dec_03_att_laynorm
dec_03_self_att_out
dec_02
dec_02_ff_out
dec_02_att_out
dec_02_att_drop
dec_02_att_lin
dec_02_att_att
dec_02_att0
dec_02_att_weights_drop
dec_02_att_weights
dec_02_att_energy
dec_02_att_query
dec_02_att_query0
dec_02_att_laynorm
dec_02_self_att_out
dec_01
dec_01_ff_out
dec_01_att_out
dec_01_att_drop
dec_01_att_lin
dec_01_att_att
dec_01_att0
dec_01_att_weights_drop
dec_01_att_weights
dec_01_att_energy
dec_01_att_query
dec_01_att_query0
dec_01_att_laynorm
dec_01_self_att_out
dec_01_self_att_drop
dec_01_self_att_lin
dec_01_self_att_att
dec_01_self_att_laynorm
target_embed
target_embed_with_pos
target_embed_weighted
target_embed_raw
dec_01_ff_drop
dec_01_ff_conv2
dec_01_ff_conv1
dec_01_ff_laynorm
dec_02_self_att_drop
dec_02_self_att_lin
dec_02_self_att_att
dec_02_self_att_laynorm
dec_02_ff_drop
dec_02_ff_conv2
dec_02_ff_conv1
dec_02_ff_laynorm
dec_03_self_att_drop
dec_03_self_att_lin
dec_03_self_att_att
dec_03_self_att_laynorm
dec_03_ff_drop
dec_03_ff_conv2
dec_03_ff_conv1
dec_03_ff_laynorm
dec_04_self_att_drop
dec_04_self_att_lin
dec_04_self_att_att
dec_04_self_att_laynorm
dec_04_ff_drop
dec_04_ff_conv2
dec_04_ff_conv1
dec_04_ff_laynorm
dec_05_self_att_drop
dec_05_self_att_lin
dec_05_self_att_att
dec_05_self_att_laynorm
dec_05_ff_drop
dec_05_ff_conv2
dec_05_ff_conv1
dec_05_ff_laynorm
dec_06_self_att_drop
dec_06_self_att_lin
dec_06_self_att_att
dec_06_self_att_laynorm
decoder_int
decoder
dec_06
dec_06_ff_out
dec_06_att_out
dec_06_att_drop
dec_06_att_lin
dec_06_att_att
dec_06_att0
dec_06_att_weights_drop
dec_06_ff_drop
dec_06_ff_conv2
dec_06_ff_conv1
dec_06_ff_laynorm
prev_outputs_int
Unused layers: (#: 0)
None
layer root/output:rec-subnet-input/'encoder_int' output: Data(name='encoder_int_output', shape=(None, 1000))
debug_add_check_numerics_on_output: add for layer 'encoder_int': <tf.Tensor 'output/rec/encoder_int/linear/dot/Reshape_1:0' shape=(?, ?, 1000) dtype=float32>
Exception creating layer root/'output' of class RecLayer with opts:
{'max_seq_len': <tf.Tensor 'mul:0' shape=() dtype=int32>,
'n_out': None,
'name': 'output',
'network': <TFNetwork 'root' train=False search>,
'output': Data(name='output_output', shape=(None,), dtype='int32', sparse=True, dim=34908, batch_dim_axis=1, beam_size=12),
'sources': [],
'target': 'classes',
'unit': {'dec_01': {'class': 'copy', 'from': ['dec_01_ff_out']},
'dec_01_att0': {'base': 'base:dec_01_att_value',
'class': 'generic_attention',
'weights': 'dec_01_att_weights_drop'},
'dec_01_att_att': {'axes': 'static',
'class': 'merge_dims',
'from': ['dec_01_att0']},
'dec_01_att_drop': {'class': 'dropout',
'dropout': 0.1,
'from': ['dec_01_att_lin']},
'dec_01_att_energy': {'class': 'dot',
'from': ['base:dec_01_att_key',
'dec_01_att_query'],
'red1': -1,
'red2': -1,
'var1': 'T',
'var2': 'T?'},
'dec_01_att_laynorm': {'class': 'layer_norm',
'from': ['dec_01_self_att_out']},
'dec_01_att_lin': {'activation': None,
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_01_att_att'],
'n_out': 512,
'with_bias': False},
'dec_01_att_out': {'class': 'combine',
'from': ['dec_01_self_att_out', 'dec_01_att_drop'],
'kind': 'add',
'n_out': 512},
'dec_01_att_query': {'axis': 'F',
'class': 'split_dims',
'dims': (8, 64),
'from': ['dec_01_att_query0']},
'dec_01_att_query0': {'activation': None,
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_01_att_laynorm'],
'n_out': 512,
'with_bias': False},
'dec_01_att_weights': {'class': 'softmax_over_spatial',
'energy_factor': 0.125,
'from': ['dec_01_att_energy']},
'dec_01_att_weights_drop': {'class': 'dropout',
'dropout': 0.1,
'dropout_noise_shape': {'*': None},
'from': ['dec_01_att_weights']},
'dec_01_ff_conv1': {'activation': 'relu',
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_01_ff_laynorm'],
'n_out': 2048,
'with_bias': True},
'dec_01_ff_conv2': {'activation': None,
'class': 'linear',
'dropout': 0.1,
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_01_ff_conv1'],
'n_out': 512,
'with_bias': True},
'dec_01_ff_drop': {'class': 'dropout',
'dropout': 0.1,
'from': ['dec_01_ff_conv2']},
'dec_01_ff_laynorm': {'class': 'layer_norm',
'from': ['dec_01_att_out']},
'dec_01_ff_out': {'class': 'combine',
'from': ['dec_01_att_out', 'dec_01_ff_drop'],
'kind': 'add',
'n_out': 512},
'dec_01_self_att_att': {'attention_dropout': 0.1,
'attention_left_only': True,
'class': 'self_attention',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_01_self_att_laynorm'],
'n_out': 512,
'num_heads': 8,
'total_key_dim': 512},
'dec_01_self_att_drop': {'class': 'dropout',
'dropout': 0.1,
'from': ['dec_01_self_att_lin']},
'dec_01_self_att_laynorm': {'class': 'layer_norm',
'from': ['target_embed']},
'dec_01_self_att_lin': {'activation': None,
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_01_self_att_att'],
'n_out': 512,
'with_bias': False},
'dec_01_self_att_out': {'class': 'combine',
'from': ['target_embed',
'dec_01_self_att_drop'],
'kind': 'add',
'n_out': 512},
'dec_02': {'class': 'copy', 'from': ['dec_02_ff_out']},
'dec_02_att0': {'base': 'base:dec_02_att_value',
'class': 'generic_attention',
'weights': 'dec_02_att_weights_drop'},
'dec_02_att_att': {'axes': 'static',
'class': 'merge_dims',
'from': ['dec_02_att0']},
'dec_02_att_drop': {'class': 'dropout',
'dropout': 0.1,
'from': ['dec_02_att_lin']},
'dec_02_att_energy': {'class': 'dot',
'from': ['base:dec_02_att_key',
'dec_02_att_query'],
'red1': -1,
'red2': -1,
'var1': 'T',
'var2': 'T?'},
'dec_02_att_laynorm': {'class': 'layer_norm',
'from': ['dec_02_self_att_out']},
'dec_02_att_lin': {'activation': None,
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_02_att_att'],
'n_out': 512,
'with_bias': False},
'dec_02_att_out': {'class': 'combine',
'from': ['dec_02_self_att_out', 'dec_02_att_drop'],
'kind': 'add',
'n_out': 512},
'dec_02_att_query': {'axis': 'F',
'class': 'split_dims',
'dims': (8, 64),
'from': ['dec_02_att_query0']},
'dec_02_att_query0': {'activation': None,
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_02_att_laynorm'],
'n_out': 512,
'with_bias': False},
'dec_02_att_weights': {'class': 'softmax_over_spatial',
'energy_factor': 0.125,
'from': ['dec_02_att_energy']},
'dec_02_att_weights_drop': {'class': 'dropout',
'dropout': 0.1,
'dropout_noise_shape': {'*': None},
'from': ['dec_02_att_weights']},
'dec_02_ff_conv1': {'activation': 'relu',
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_02_ff_laynorm'],
'n_out': 2048,
'with_bias': True},
'dec_02_ff_conv2': {'activation': None,
'class': 'linear',
'dropout': 0.1,
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_02_ff_conv1'],
'n_out': 512,
'with_bias': True},
'dec_02_ff_drop': {'class': 'dropout',
'dropout': 0.1,
'from': ['dec_02_ff_conv2']},
'dec_02_ff_laynorm': {'class': 'layer_norm',
'from': ['dec_02_att_out']},
'dec_02_ff_out': {'class': 'combine',
'from': ['dec_02_att_out', 'dec_02_ff_drop'],
'kind': 'add',
'n_out': 512},
'dec_02_self_att_att': {'attention_dropout': 0.1,
'attention_left_only': True,
'class': 'self_attention',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_02_self_att_laynorm'],
'n_out': 512,
'num_heads': 8,
'total_key_dim': 512},
'dec_02_self_att_drop': {'class': 'dropout',
'dropout': 0.1,
'from': ['dec_02_self_att_lin']},
'dec_02_self_att_laynorm': {'class': 'layer_norm',
'from': ['dec_01']},
'dec_02_self_att_lin': {'activation': None,
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_02_self_att_att'],
'n_out': 512,
'with_bias': False},
'dec_02_self_att_out': {'class': 'combine',
'from': ['dec_01', 'dec_02_self_att_drop'],
'kind': 'add',
'n_out': 512},
'dec_03': {'class': 'copy', 'from': ['dec_03_ff_out']},
'dec_03_att0': {'base': 'base:dec_03_att_value',
'class': 'generic_attention',
'weights': 'dec_03_att_weights_drop'},
'dec_03_att_att': {'axes': 'static',
'class': 'merge_dims',
'from': ['dec_03_att0']},
'dec_03_att_drop': {'class': 'dropout',
'dropout': 0.1,
'from': ['dec_03_att_lin']},
'dec_03_att_energy': {'class': 'dot',
'from': ['base:dec_03_att_key',
'dec_03_att_query'],
'red1': -1,
'red2': -1,
'var1': 'T',
'var2': 'T?'},
'dec_03_att_laynorm': {'class': 'layer_norm',
'from': ['dec_03_self_att_out']},
'dec_03_att_lin': {'activation': None,
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_03_att_att'],
'n_out': 512,
'with_bias': False},
'dec_03_att_out': {'class': 'combine',
'from': ['dec_03_self_att_out', 'dec_03_att_drop'],
'kind': 'add',
'n_out': 512},
'dec_03_att_query': {'axis': 'F',
'class': 'split_dims',
'dims': (8, 64),
'from': ['dec_03_att_query0']},
'dec_03_att_query0': {'activation': None,
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_03_att_laynorm'],
'n_out': 512,
'with_bias': False},
'dec_03_att_weights': {'class': 'softmax_over_spatial',
'energy_factor': 0.125,
'from': ['dec_03_att_energy']},
'dec_03_att_weights_drop': {'class': 'dropout',
'dropout': 0.1,
'dropout_noise_shape': {'*': None},
'from': ['dec_03_att_weights']},
'dec_03_ff_conv1': {'activation': 'relu',
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_03_ff_laynorm'],
'n_out': 2048,
'with_bias': True},
'dec_03_ff_conv2': {'activation': None,
'class': 'linear',
'dropout': 0.1,
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_03_ff_conv1'],
'n_out': 512,
'with_bias': True},
'dec_03_ff_drop': {'class': 'dropout',
'dropout': 0.1,
'from': ['dec_03_ff_conv2']},
'dec_03_ff_laynorm': {'class': 'layer_norm',
'from': ['dec_03_att_out']},
'dec_03_ff_out': {'class': 'combine',
'from': ['dec_03_att_out', 'dec_03_ff_drop'],
'kind': 'add',
'n_out': 512},
'dec_03_self_att_att': {'attention_dropout': 0.1,
'attention_left_only': True,
'class': 'self_attention',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_03_self_att_laynorm'],
'n_out': 512,
'num_heads': 8,
'total_key_dim': 512},
'dec_03_self_att_drop': {'class': 'dropout',
'dropout': 0.1,
'from': ['dec_03_self_att_lin']},
'dec_03_self_att_laynorm': {'class': 'layer_norm',
'from': ['dec_02']},
'dec_03_self_att_lin': {'activation': None,
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_03_self_att_att'],
'n_out': 512,
'with_bias': False},
'dec_03_self_att_out': {'class': 'combine',
'from': ['dec_02', 'dec_03_self_att_drop'],
'kind': 'add',
'n_out': 512},
'dec_04': {'class': 'copy', 'from': ['dec_04_ff_out']},
'dec_04_att0': {'base': 'base:dec_04_att_value',
'class': 'generic_attention',
'weights': 'dec_04_att_weights_drop'},
'dec_04_att_att': {'axes': 'static',
'class': 'merge_dims',
'from': ['dec_04_att0']},
'dec_04_att_drop': {'class': 'dropout',
'dropout': 0.1,
'from': ['dec_04_att_lin']},
'dec_04_att_energy': {'class': 'dot',
'from': ['base:dec_04_att_key',
'dec_04_att_query'],
'red1': -1,
'red2': -1,
'var1': 'T',
'var2': 'T?'},
'dec_04_att_laynorm': {'class': 'layer_norm',
'from': ['dec_04_self_att_out']},
'dec_04_att_lin': {'activation': None,
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_04_att_att'],
'n_out': 512,
'with_bias': False},
'dec_04_att_out': {'class': 'combine',
'from': ['dec_04_self_att_out', 'dec_04_att_drop'],
'kind': 'add',
'n_out': 512},
'dec_04_att_query': {'axis': 'F',
'class': 'split_dims',
'dims': (8, 64),
'from': ['dec_04_att_query0']},
'dec_04_att_query0': {'activation': None,
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_04_att_laynorm'],
'n_out': 512,
'with_bias': False},
'dec_04_att_weights': {'class': 'softmax_over_spatial',
'energy_factor': 0.125,
'from': ['dec_04_att_energy']},
'dec_04_att_weights_drop': {'class': 'dropout',
'dropout': 0.1,
'dropout_noise_shape': {'*': None},
'from': ['dec_04_att_weights']},
'dec_04_ff_conv1': {'activation': 'relu',
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_04_ff_laynorm'],
'n_out': 2048,
'with_bias': True},
'dec_04_ff_conv2': {'activation': None,
'class': 'linear',
'dropout': 0.1,
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_04_ff_conv1'],
'n_out': 512,
'with_bias': True},
'dec_04_ff_drop': {'class': 'dropout',
'dropout': 0.1,
'from': ['dec_04_ff_conv2']},
'dec_04_ff_laynorm': {'class': 'layer_norm',
'from': ['dec_04_att_out']},
'dec_04_ff_out': {'class': 'combine',
'from': ['dec_04_att_out', 'dec_04_ff_drop'],
'kind': 'add',
'n_out': 512},
'dec_04_self_att_att': {'attention_dropout': 0.1,
'attention_left_only': True,
'class': 'self_attention',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_04_self_att_laynorm'],
'n_out': 512,
'num_heads': 8,
'total_key_dim': 512},
'dec_04_self_att_drop': {'class': 'dropout',
'dropout': 0.1,
'from': ['dec_04_self_att_lin']},
'dec_04_self_att_laynorm': {'class': 'layer_norm',
'from': ['dec_03']},
'dec_04_self_att_lin': {'activation': None,
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_04_self_att_att'],
'n_out': 512,
'with_bias': False},
'dec_04_self_att_out': {'class': 'combine',
'from': ['dec_03', 'dec_04_self_att_drop'],
'kind': 'add',
'n_out': 512},
'dec_05': {'class': 'copy', 'from': ['dec_05_ff_out']},
'dec_05_att0': {'base': 'base:dec_05_att_value',
'class': 'generic_attention',
'weights': 'dec_05_att_weights_drop'},
'dec_05_att_att': {'axes': 'static',
'class': 'merge_dims',
'from': ['dec_05_att0']},
'dec_05_att_drop': {'class': 'dropout',
'dropout': 0.1,
'from': ['dec_05_att_lin']},
'dec_05_att_energy': {'class': 'dot',
'from': ['base:dec_05_att_key',
'dec_05_att_query'],
'red1': -1,
'red2': -1,
'var1': 'T',
'var2': 'T?'},
'dec_05_att_laynorm': {'class': 'layer_norm',
'from': ['dec_05_self_att_out']},
'dec_05_att_lin': {'activation': None,
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_05_att_att'],
'n_out': 512,
'with_bias': False},
'dec_05_att_out': {'class': 'combine',
'from': ['dec_05_self_att_out', 'dec_05_att_drop'],
'kind': 'add',
'n_out': 512},
'dec_05_att_query': {'axis': 'F',
'class': 'split_dims',
'dims': (8, 64),
'from': ['dec_05_att_query0']},
'dec_05_att_query0': {'activation': None,
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_05_att_laynorm'],
'n_out': 512,
'with_bias': False},
'dec_05_att_weights': {'class': 'softmax_over_spatial',
'energy_factor': 0.125,
'from': ['dec_05_att_energy']},
'dec_05_att_weights_drop': {'class': 'dropout',
'dropout': 0.1,
'dropout_noise_shape': {'*': None},
'from': ['dec_05_att_weights']},
'dec_05_ff_conv1': {'activation': 'relu',
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_05_ff_laynorm'],
'n_out': 2048,
'with_bias': True},
'dec_05_ff_conv2': {'activation': None,
'class': 'linear',
'dropout': 0.1,
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_05_ff_conv1'],
'n_out': 512,
'with_bias': True},
'dec_05_ff_drop': {'class': 'dropout',
'dropout': 0.1,
'from': ['dec_05_ff_conv2']},
'dec_05_ff_laynorm': {'class': 'layer_norm',
'from': ['dec_05_att_out']},
'dec_05_ff_out': {'class': 'combine',
'from': ['dec_05_att_out', 'dec_05_ff_drop'],
'kind': 'add',
'n_out': 512},
'dec_05_self_att_att': {'attention_dropout': 0.1,
'attention_left_only': True,
'class': 'self_attention',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_05_self_att_laynorm'],
'n_out': 512,
'num_heads': 8,
'total_key_dim': 512},
'dec_05_self_att_drop': {'class': 'dropout',
'dropout': 0.1,
'from': ['dec_05_self_att_lin']},
'dec_05_self_att_laynorm': {'class': 'layer_norm',
'from': ['dec_04']},
'dec_05_self_att_lin': {'activation': None,
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_05_self_att_att'],
'n_out': 512,
'with_bias': False},
'dec_05_self_att_out': {'class': 'combine',
'from': ['dec_04', 'dec_05_self_att_drop'],
'kind': 'add',
'n_out': 512},
'dec_06': {'class': 'copy', 'from': ['dec_06_ff_out']},
'dec_06_att0': {'base': 'base:dec_06_att_value',
'class': 'generic_attention',
'weights': 'dec_06_att_weights_drop'},
'dec_06_att_att': {'axes': 'static',
'class': 'merge_dims',
'from': ['dec_06_att0']},
'dec_06_att_drop': {'class': 'dropout',
'dropout': 0.1,
'from': ['dec_06_att_lin']},
'dec_06_att_energy': {'class': 'dot',
'from': ['base:dec_06_att_key',
'dec_06_att_query'],
'red1': -1,
'red2': -1,
'var1': 'T',
'var2': 'T?'},
'dec_06_att_laynorm': {'class': 'layer_norm',
'from': ['dec_06_self_att_out']},
'dec_06_att_lin': {'activation': None,
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_06_att_att'],
'n_out': 512,
'with_bias': False},
'dec_06_att_out': {'class': 'combine',
'from': ['dec_06_self_att_out', 'dec_06_att_drop'],
'kind': 'add',
'n_out': 512},
'dec_06_att_query': {'axis': 'F',
'class': 'split_dims',
'dims': (8, 64),
'from': ['dec_06_att_query0']},
'dec_06_att_query0': {'activation': None,
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_06_att_laynorm'],
'n_out': 512,
'with_bias': False},
'dec_06_att_weights': {'class': 'softmax_over_spatial',
'energy_factor': 0.125,
'from': ['dec_06_att_energy']},
'dec_06_att_weights_drop': {'class': 'dropout',
'dropout': 0.1,
'dropout_noise_shape': {'*': None},
'from': ['dec_06_att_weights']},
'dec_06_ff_conv1': {'activation': 'relu',
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_06_ff_laynorm'],
'n_out': 2048,
'with_bias': True},
'dec_06_ff_conv2': {'activation': None,
'class': 'linear',
'dropout': 0.1,
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_06_ff_conv1'],
'n_out': 512,
'with_bias': True},
'dec_06_ff_drop': {'class': 'dropout',
'dropout': 0.1,
'from': ['dec_06_ff_conv2']},
'dec_06_ff_laynorm': {'class': 'layer_norm',
'from': ['dec_06_att_out']},
'dec_06_ff_out': {'class': 'combine',
'from': ['dec_06_att_out', 'dec_06_ff_drop'],
'kind': 'add',
'n_out': 512},
'dec_06_self_att_att': {'attention_dropout': 0.1,
'attention_left_only': True,
'class': 'self_attention',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_06_self_att_laynorm'],
'n_out': 512,
'num_heads': 8,
'total_key_dim': 512},
'dec_06_self_att_drop': {'class': 'dropout',
'dropout': 0.1,
'from': ['dec_06_self_att_lin']},
'dec_06_self_att_laynorm': {'class': 'layer_norm',
'from': ['dec_05']},
'dec_06_self_att_lin': {'activation': None,
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['dec_06_self_att_att'],
'n_out': 512,
'with_bias': False},
'dec_06_self_att_out': {'class': 'combine',
'from': ['dec_05', 'dec_06_self_att_drop'],
'kind': 'add',
'n_out': 512},
'decoder': {'class': 'layer_norm', 'from': ['dec_06'], 'n_out': 512},
'decoder_int': {'activation': None,
'class': 'linear',
'from': ['decoder'],
'n_out': 1000,
'with_bias': False},
'encoder_int': {'activation': None,
'class': 'linear',
'from': ['base:encoder'],
'n_out': 1000,
'with_bias': False},
'end': {'class': 'compare', 'from': ['output'], 'value': 0},
'output': {'beam_size': 12,
'class': 'choice',
'from': ['output_prob'],
'initial_output': 0,
'target': 'classes'},
'output_prob': {'attention_weights': 'dec_06_att_weights',
'base_encoder_transformed': 'encoder_int',
'class': 'hmm_factorization',
'debug': False,
'from': 'dec_06_att_weights',
'loss': 'ce',
'n_out': 34908,
'prev_outputs': 'prev_outputs_int',
'prev_state': 'decoder_int',
'target': 'classes',
'threshold': None,
'transpose_and_average_att_weights': True},
'prev_outputs_int': {'activation': None,
'class': 'linear',
'from': ['prev:target_embed_raw'],
'n_out': 1000,
'with_bias': False},
'target_embed': {'class': 'dropout',
'dropout': 0.0,
'from': ['target_embed_with_pos']},
'target_embed_raw': {'activation': None,
'class': 'linear',
'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
"distribution='uniform', "
'scale=0.78)',
'from': ['prev:output'],
'n_out': 512,
'with_bias': False},
'target_embed_weighted': {'class': 'eval',
'eval': 'source(0) * 22.627417',
'from': ['target_embed_raw']},
'target_embed_with_pos': {'add_to_input': True,
'class': 'positional_encoding',
'from': ['target_embed_weighted']}}}
EXCEPTION
Traceback (most recent call last):
File "/u/makarov/returnn-hmm-fac/rnn.py", line 591, in <module>
 line: main(sys.argv)
 locals:
main = <local> <function main at 0x7fe7b0180bf8>
sys = <local> <module 'sys' (built-in)>
sys.argv = <local> ['/u/makarov/returnn-hmm-fac/rnn.py', 'hmm-factorization/en-de/transformer-hmm', '++load_epoch', '114', '++device', 'gpu', '--task', 'search', '++search_data', 'config:dev', '++beam_size', '12', '++need_data', 'False', '++max_seq_length', '0', '++search_output_file', 'hmm-factorization/en-de/hyp/..., len = 20, _[0]: {len = 33}
File "/u/makarov/returnn-hmm-fac/rnn.py", line 579, in main
 line: executeMainTask()
 locals:
executeMainTask = <global> <function executeMainTask at 0x7fe7b0180ae8>
File "/u/makarov/returnn-hmm-fac/rnn.py", line 434, in executeMainTask
 line: engine.init_network_from_config(config)
 locals:
engine = <global> <TFEngine.Engine object at 0x7fe8124a5e48>
engine.init_network_from_config = <global> <bound method Engine.init_network_from_config of <TFEngine.Engine object at 0x7fe8124a5e48>>
config = <global> <Config.Config object at 0x7fe810aca080>
File "/u/makarov/returnn-hmm-fac/TFEngine.py", line 936, in init_network_from_config
 line: self._init_network(net_desc=net_dict, epoch=self.epoch)
 locals:
self = <local> <TFEngine.Engine object at 0x7fe8124a5e48>
self._init_network = <local> <bound method Engine._init_network of <TFEngine.Engine object at 0x7fe8124a5e48>>
net_desc = <not found>
net_dict = <local> {'enc_05_self_att_att': {'total_key_dim': 512, 'from': ['enc_05_self_att_laynorm'], 'class': 'self_attention', 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=0.78)", 'num_heads': 8, 'attention_left_only': False, 'attention_dropout': 0.1, 'n_out'..., len = 97
epoch = <local> 114
self.epoch = <local> 114
File "/u/makarov/returnn-hmm-fac/TFEngine.py", line 1059, in _init_network
 line: self.network, self.updater = self.create_network(
config=self.config,
rnd_seed=net_random_seed,
train_flag=train_flag, eval_flag=self.use_eval_flag, search_flag=self.use_search_flag,
initial_learning_rate=getattr(self, "initial_learning_rate", None),
net_dict=net_desc)
 locals:
self = <local> <TFEngine.Engine object at 0x7fe8124a5e48>
self.network = <local> None
self.updater = <local> None
self.create_network = <local> <bound method Engine.create_network of <class 'TFEngine.Engine'>>
config = <not found>
self.config = <local> <Config.Config object at 0x7fe810aca080>
rnd_seed = <not found>
net_random_seed = <local> 114
train_flag = <local> False
eval_flag = <not found>
self.use_eval_flag = <local> True
search_flag = <not found>
self.use_search_flag = <local> True
initial_learning_rate = <not found>
getattr = <builtin> <built-in function getattr>
net_dict = <not found>
net_desc = <local> {'enc_05_self_att_att': {'total_key_dim': 512, 'from': ['enc_05_self_att_laynorm'], 'class': 'self_attention', 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=0.78)", 'num_heads': 8, 'attention_left_only': False, 'attention_dropout': 0.1, 'n_out'..., len = 97
File "/u/makarov/returnn-hmm-fac/TFEngine.py", line 1090, in create_network
 line: network.construct_from_dict(net_dict)
 locals:
network = <local> <TFNetwork 'root' train=False search>
network.construct_from_dict = <local> <bound method TFNetwork.construct_from_dict of <TFNetwork 'root' train=False search>>
net_dict = <local> {'enc_05_self_att_att': {'total_key_dim': 512, 'from': ['enc_05_self_att_laynorm'], 'class': 'self_attention', 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=0.78)", 'num_heads': 8, 'attention_left_only': False, 'attention_dropout': 0.1, 'n_out'..., len = 97
File "/u/makarov/returnn-hmm-fac/TFNetwork.py", line 338, in construct_from_dict
 line: self.construct_layer(net_dict, name)
 locals:
self = <local> <TFNetwork 'root' train=False search>
self.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root' train=False search>>
net_dict = <local> {'enc_05_self_att_att': {'total_key_dim': 512, 'from': ['enc_05_self_att_laynorm'], 'class': 'self_attention', 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=0.78)", 'num_heads': 8, 'attention_left_only': False, 'attention_dropout': 0.1, 'n_out'..., len = 97
name = <local> 'decision', len = 8
File "/u/makarov/returnn-hmm-fac/TFNetwork.py", line 407, in construct_layer
 line: layer_class.transform_config_dict(layer_desc, network=self, get_layer=get_layer)
 locals:
layer_class = <local> <class 'TFNetworkRecLayer.DecideLayer'>
layer_class.transform_config_dict = <local> <bound method LayerBase.transform_config_dict of <class 'TFNetworkRecLayer.DecideLayer'>>
layer_desc = <local> {'loss_opts': {}, 'target': 'classes', 'loss': 'edit_distance'}
network = <not found>
self = <local> <TFNetwork 'root' train=False search>
get_layer = <local> <function TFNetwork.construct_layer.<locals>.get_layer at 0x7fe7603061e0>
File "/u/makarov/returnn-hmm-fac/TFNetworkLayer.py", line 358, in transform_config_dict
 line: for src_name in src_names
 locals:
src_name = <not found>
src_names = <local> ['output'], _[0]: {len = 6}
File "/u/makarov/returnn-hmm-fac/TFNetworkLayer.py", line 359, in <listcomp>
 line: d["sources"] = [
get_layer(src_name)
for src_name in src_names
if not src_name == "none"]
 locals:
d = <not found>
get_layer = <local> <function TFNetwork.construct_layer.<locals>.get_layer at 0x7fe7603061e0>
src_name = <local> 'output', len = 6
src_names = <not found>
File "/u/makarov/returnn-hmm-fac/TFNetwork.py", line 397, in get_layer
 line: return self.construct_layer(net_dict=net_dict, name=src_name)
 locals:
self = <local> <TFNetwork 'root' train=False search>
self.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root' train=False search>>
net_dict = <local> {'enc_05_self_att_att': {'total_key_dim': 512, 'from': ['enc_05_self_att_laynorm'], 'class': 'self_attention', 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=0.78)", 'num_heads': 8, 'attention_left_only': False, 'attention_dropout': 0.1, 'n_out'..., len = 97
name = <not found>
src_name = <local> 'output', len = 6
File "/u/makarov/returnn-hmm-fac/TFNetwork.py", line 410, in construct_layer
  1167.  line: return add_layer(name=name, layer_class=layer_class, **layer_desc)
  1168.  locals:
  1169. add_layer = <local> <bound method TFNetwork.add_layer of <TFNetwork 'root' train=False search>>
  1170. name = <local> 'output', len = 6
  1171. layer_class = <local> <class 'TFNetworkRecLayer.RecLayer'>
  1172. layer_desc = <local> {'max_seq_len': <tf.Tensor 'mul:0' shape=() dtype=int32>, 'unit': {'dec_06_att_out': {'from': ['dec_06_self_att_out', 'dec_06_att_drop'], 'kind': 'add', 'class': 'combine', 'n_out': 512}, 'dec_05_att_weights_drop': {'dropout_noise_shape': {'*': None}, 'from': ['dec_05_att_weights'], 'class': 'dro...
  1173. File "/u/makarov/returnn-hmm-fac/TFNetwork.py", line 497, in add_layer
  1174.  line: layer = self._create_layer(name=name, layer_class=layer_class, **layer_desc)
  1175.  locals:
  1176. layer = <not found>
  1177. self = <local> <TFNetwork 'root' train=False search>
  1178. self._create_layer = <local> <bound method TFNetwork._create_layer of <TFNetwork 'root' train=False search>>
  1179. name = <local> 'output', len = 6
  1180. layer_class = <local> <class 'TFNetworkRecLayer.RecLayer'>
  1181. layer_desc = <local> {'unit': {'dec_06_att_out': {'from': ['dec_06_self_att_out', 'dec_06_att_drop'], 'kind': 'add', 'class': 'combine', 'n_out': 512}, 'dec_05_att_weights_drop': {'dropout_noise_shape': {'*': None}, 'from': ['dec_05_att_weights'], 'class': 'dropout', 'dropout': 0.1}, 'dec_01_att_energy': {'from': ['b...
  1182. File "/u/makarov/returnn-hmm-fac/TFNetwork.py", line 456, in _create_layer
  1183.  line: layer = layer_class(**layer_desc)
  1184.  locals:
  1185. layer = <not found>
  1186. layer_class = <local> <class 'TFNetworkRecLayer.RecLayer'>
  1187. layer_desc = <local> {'max_seq_len': <tf.Tensor 'mul:0' shape=() dtype=int32>, 'network': <TFNetwork 'root' train=False search>, 'name': 'output', 'unit': {'dec_06_att_out': {'from': ['dec_06_self_att_out', 'dec_06_att_drop'], 'kind': 'add', 'class': 'combine', 'n_out': 512}, 'dec_05_att_weights_drop': {'dropout_nois..., len = 8
  1188. File "/u/makarov/returnn-hmm-fac/TFNetworkRecLayer.py", line 179, in __init__
  1189.  line: y = self._get_output_subnet_unit(self.cell)
  1190.  locals:
  1191. y = <not found>
  1192. self = <local> <RecLayer 'output' out_type=Data(shape=(None,), dtype='int32', sparse=True, dim=34908, batch_dim_axis=1, beam_size=12)>
  1193. self._get_output_subnet_unit = <local> <bound method RecLayer._get_output_subnet_unit of <RecLayer 'output' out_type=Data(shape=(None,), dtype='int32', sparse=True, dim=34908, batch_dim_axis=1, beam_size=12)>>
  1194. self.cell = <local> <TFNetworkRecLayer._SubnetworkRecCell object at 0x7fe37643e7b8>
  1195. File "/u/makarov/returnn-hmm-fac/TFNetworkRecLayer.py", line 703, in _get_output_subnet_unit
  1196.  line: output, search_choices = cell.get_output(rec_layer=self)
  1197.  locals:
  1198. output = <not found>
  1199. search_choices = <not found>
  1200. cell = <local> <TFNetworkRecLayer._SubnetworkRecCell object at 0x7fe37643e7b8>
  1201. cell.get_output = <local> <bound method _SubnetworkRecCell.get_output of <TFNetworkRecLayer._SubnetworkRecCell object at 0x7fe37643e7b8>>
  1202. rec_layer = <not found>
  1203. self = <local> <RecLayer 'output' out_type=Data(shape=(None,), dtype='int32', sparse=True, dim=34908, batch_dim_axis=1, beam_size=12)>
  1204. File "/u/makarov/returnn-hmm-fac/TFNetworkRecLayer.py", line 1459, in get_output
  1205.  line: assert fixed_seq_len is not None
  1206.  locals:
  1207. fixed_seq_len = <local> None
  1208. AssertionError
  1209. Unhandled exception <class 'AssertionError'> in thread <_MainThread(MainThread, started 140634717132544)>, proc 22317.
  1210.  
  1211. Thread current, main, <_MainThread(MainThread, started 140634717132544)>:
  1212. (Excluded thread.)
  1213.  
  1214. That were all threads.