** beam_size 12
RETURNN starting up, version 20181130.185608--git-d30181f, date/time 2018-12-01-09-58-31 (UTC+0100), pid 22317, cwd /work/smt2/makarov/NMT, Python /usr/bin/python3
RETURNN command line options: ['hmm-factorization/en-de/transformer-hmm', '++load_epoch', '114', '++device', 'gpu', '--task', 'search', '++search_data', 'config:dev', '++beam_size', '12', '++need_data', 'False', '++max_seq_length', '0', '++search_output_file', 'hmm-factorization/en-de/hyp/transformer-hmm', '++batch_size', '2000']
Hostname: cluster-cn-258
TensorFlow: 1.9.0 (v1.9.0-0-g25c197e023) (<site-package> in /u/makarov/.local/lib/python3.5/site-packages/tensorflow)
Setup TF inter and intra global thread pools, num_threads None, session opts {'device_count': {'GPU': 0}, 'log_device_placement': False}.
2018-12-01 09:58:32.562859: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-01 09:58:32.978259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:02:00.0
totalMemory: 10.92GiB freeMemory: 10.76GiB
2018-12-01 09:58:32.978317: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2018-12-01 09:58:32.978337: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-01 09:58:32.978348: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0
2018-12-01 09:58:32.978358: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   N
CUDA_VISIBLE_DEVICES is set to '0'.
2018-12-01 09:58:33.282635: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2018-12-01 09:58:33.828527: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-01 09:58:33.828579: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0
2018-12-01 09:58:33.828588: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   N
2018-12-01 09:58:33.828955: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/device:GPU:0 with 10409 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
Collecting TensorFlow device list...
Local devices available to TensorFlow:
  1/2: name: "/device:CPU:0"
       device_type: "CPU"
       memory_limit: 268435456
       locality {
       }
       incarnation: 616944120252845792
  2/2: name: "/device:GPU:0"
       device_type: "GPU"
       memory_limit: 10915220685
       locality {
         bus_id: 1
         links {
         }
       }
       incarnation: 955148772328989222
       physical_device_desc: "device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1"
Using gpu device 0: GeForce GTX 1080 Ti
Setup tf.Session with options {'device_count': {'GPU': 1}, 'log_device_placement': False} ...
2018-12-01 09:58:38.902018: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2018-12-01 09:58:38.902091: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-01 09:58:38.902105: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0
2018-12-01 09:58:38.902115: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   N
2018-12-01 09:58:38.902372: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10409 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
layer root/'data' output: Data(name='data', shape=(None,), dtype='int32', sparse=True, dim=46300)
layer root/'source_embed_raw' output: Data(name='source_embed_raw_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'source_embed_raw': <tf.Tensor 'source_embed_raw/linear/embedding_lookup:0' shape=(?, ?, 512) dtype=float32>
layer root/'source_embed_weighted' output: Data(name='source_embed_weighted_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'source_embed_weighted': <tf.Tensor 'source_embed_weighted/mul:0' shape=(?, ?, 512) dtype=float32>
layer root/'source_embed_with_pos' output: Data(name='source_embed_with_pos_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'source_embed_with_pos': <tf.Tensor 'source_embed_with_pos/add:0' shape=(?, ?, 512) dtype=float32>
layer root/'source_embed' output: Data(name='source_embed_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'source_embed': <tf.Tensor 'source_embed_with_pos/source_embed_with_pos_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_01_self_att_laynorm' output: Data(name='enc_01_self_att_laynorm_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_01_self_att_laynorm': <tf.Tensor 'enc_01_self_att_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_01_self_att_att' output: Data(name='enc_01_self_att_att_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_01_self_att_att': <tf.Tensor 'enc_01_self_att_att/merge_vdim:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_01_self_att_lin' output: Data(name='enc_01_self_att_lin_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_01_self_att_lin': <tf.Tensor 'enc_01_self_att_lin/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_01_self_att_drop' output: Data(name='enc_01_self_att_drop_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_01_self_att_drop': <tf.Tensor 'enc_01_self_att_lin/enc_01_self_att_lin_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_01_self_att_out' output: Data(name='enc_01_self_att_out_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_01_self_att_out': <tf.Tensor 'enc_01_self_att_out/Add:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_01_ff_laynorm' output: Data(name='enc_01_ff_laynorm_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_01_ff_laynorm': <tf.Tensor 'enc_01_ff_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_01_ff_conv1' output: Data(name='enc_01_ff_conv1_output', shape=(None, 2048))
debug_add_check_numerics_on_output: add for layer 'enc_01_ff_conv1': <tf.Tensor 'enc_01_ff_conv1/activation/Relu:0' shape=(?, ?, 2048) dtype=float32>
layer root/'enc_01_ff_conv2' output: Data(name='enc_01_ff_conv2_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_01_ff_conv2': <tf.Tensor 'enc_01_ff_conv2/linear/add_bias:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_01_ff_drop' output: Data(name='enc_01_ff_drop_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_01_ff_drop': <tf.Tensor 'enc_01_ff_conv2/enc_01_ff_conv2_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_01_ff_out' output: Data(name='enc_01_ff_out_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_01_ff_out': <tf.Tensor 'enc_01_ff_out/Add:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_01' output: Data(name='enc_01_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_01': <tf.Tensor 'enc_01_ff_out/enc_01_ff_out_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_02_self_att_laynorm' output: Data(name='enc_02_self_att_laynorm_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_02_self_att_laynorm': <tf.Tensor 'enc_02_self_att_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_02_self_att_att' output: Data(name='enc_02_self_att_att_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_02_self_att_att': <tf.Tensor 'enc_02_self_att_att/merge_vdim:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_02_self_att_lin' output: Data(name='enc_02_self_att_lin_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_02_self_att_lin': <tf.Tensor 'enc_02_self_att_lin/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_02_self_att_drop' output: Data(name='enc_02_self_att_drop_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_02_self_att_drop': <tf.Tensor 'enc_02_self_att_lin/enc_02_self_att_lin_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_02_self_att_out' output: Data(name='enc_02_self_att_out_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_02_self_att_out': <tf.Tensor 'enc_02_self_att_out/Add:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_02_ff_laynorm' output: Data(name='enc_02_ff_laynorm_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_02_ff_laynorm': <tf.Tensor 'enc_02_ff_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_02_ff_conv1' output: Data(name='enc_02_ff_conv1_output', shape=(None, 2048))
debug_add_check_numerics_on_output: add for layer 'enc_02_ff_conv1': <tf.Tensor 'enc_02_ff_conv1/activation/Relu:0' shape=(?, ?, 2048) dtype=float32>
layer root/'enc_02_ff_conv2' output: Data(name='enc_02_ff_conv2_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_02_ff_conv2': <tf.Tensor 'enc_02_ff_conv2/linear/add_bias:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_02_ff_drop' output: Data(name='enc_02_ff_drop_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_02_ff_drop': <tf.Tensor 'enc_02_ff_conv2/enc_02_ff_conv2_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_02_ff_out' output: Data(name='enc_02_ff_out_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_02_ff_out': <tf.Tensor 'enc_02_ff_out/Add:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_02' output: Data(name='enc_02_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_02': <tf.Tensor 'enc_02_ff_out/enc_02_ff_out_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_03_self_att_laynorm' output: Data(name='enc_03_self_att_laynorm_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_03_self_att_laynorm': <tf.Tensor 'enc_03_self_att_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_03_self_att_att' output: Data(name='enc_03_self_att_att_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_03_self_att_att': <tf.Tensor 'enc_03_self_att_att/merge_vdim:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_03_self_att_lin' output: Data(name='enc_03_self_att_lin_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_03_self_att_lin': <tf.Tensor 'enc_03_self_att_lin/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_03_self_att_drop' output: Data(name='enc_03_self_att_drop_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_03_self_att_drop': <tf.Tensor 'enc_03_self_att_lin/enc_03_self_att_lin_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_03_self_att_out' output: Data(name='enc_03_self_att_out_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_03_self_att_out': <tf.Tensor 'enc_03_self_att_out/Add:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_03_ff_laynorm' output: Data(name='enc_03_ff_laynorm_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_03_ff_laynorm': <tf.Tensor 'enc_03_ff_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_03_ff_conv1' output: Data(name='enc_03_ff_conv1_output', shape=(None, 2048))
debug_add_check_numerics_on_output: add for layer 'enc_03_ff_conv1': <tf.Tensor 'enc_03_ff_conv1/activation/Relu:0' shape=(?, ?, 2048) dtype=float32>
layer root/'enc_03_ff_conv2' output: Data(name='enc_03_ff_conv2_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_03_ff_conv2': <tf.Tensor 'enc_03_ff_conv2/linear/add_bias:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_03_ff_drop' output: Data(name='enc_03_ff_drop_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_03_ff_drop': <tf.Tensor 'enc_03_ff_conv2/enc_03_ff_conv2_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_03_ff_out' output: Data(name='enc_03_ff_out_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_03_ff_out': <tf.Tensor 'enc_03_ff_out/Add:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_03' output: Data(name='enc_03_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_03': <tf.Tensor 'enc_03_ff_out/enc_03_ff_out_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_04_self_att_laynorm' output: Data(name='enc_04_self_att_laynorm_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_04_self_att_laynorm': <tf.Tensor 'enc_04_self_att_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_04_self_att_att' output: Data(name='enc_04_self_att_att_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_04_self_att_att': <tf.Tensor 'enc_04_self_att_att/merge_vdim:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_04_self_att_lin' output: Data(name='enc_04_self_att_lin_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_04_self_att_lin': <tf.Tensor 'enc_04_self_att_lin/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_04_self_att_drop' output: Data(name='enc_04_self_att_drop_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_04_self_att_drop': <tf.Tensor 'enc_04_self_att_lin/enc_04_self_att_lin_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_04_self_att_out' output: Data(name='enc_04_self_att_out_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_04_self_att_out': <tf.Tensor 'enc_04_self_att_out/Add:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_04_ff_laynorm' output: Data(name='enc_04_ff_laynorm_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_04_ff_laynorm': <tf.Tensor 'enc_04_ff_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_04_ff_conv1' output: Data(name='enc_04_ff_conv1_output', shape=(None, 2048))
debug_add_check_numerics_on_output: add for layer 'enc_04_ff_conv1': <tf.Tensor 'enc_04_ff_conv1/activation/Relu:0' shape=(?, ?, 2048) dtype=float32>
layer root/'enc_04_ff_conv2' output: Data(name='enc_04_ff_conv2_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_04_ff_conv2': <tf.Tensor 'enc_04_ff_conv2/linear/add_bias:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_04_ff_drop' output: Data(name='enc_04_ff_drop_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_04_ff_drop': <tf.Tensor 'enc_04_ff_conv2/enc_04_ff_conv2_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_04_ff_out' output: Data(name='enc_04_ff_out_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_04_ff_out': <tf.Tensor 'enc_04_ff_out/Add:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_04' output: Data(name='enc_04_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_04': <tf.Tensor 'enc_04_ff_out/enc_04_ff_out_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_05_self_att_laynorm' output: Data(name='enc_05_self_att_laynorm_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_05_self_att_laynorm': <tf.Tensor 'enc_05_self_att_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_05_self_att_att' output: Data(name='enc_05_self_att_att_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_05_self_att_att': <tf.Tensor 'enc_05_self_att_att/merge_vdim:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_05_self_att_lin' output: Data(name='enc_05_self_att_lin_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_05_self_att_lin': <tf.Tensor 'enc_05_self_att_lin/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_05_self_att_drop' output: Data(name='enc_05_self_att_drop_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_05_self_att_drop': <tf.Tensor 'enc_05_self_att_lin/enc_05_self_att_lin_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_05_self_att_out' output: Data(name='enc_05_self_att_out_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_05_self_att_out': <tf.Tensor 'enc_05_self_att_out/Add:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_05_ff_laynorm' output: Data(name='enc_05_ff_laynorm_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_05_ff_laynorm': <tf.Tensor 'enc_05_ff_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_05_ff_conv1' output: Data(name='enc_05_ff_conv1_output', shape=(None, 2048))
debug_add_check_numerics_on_output: add for layer 'enc_05_ff_conv1': <tf.Tensor 'enc_05_ff_conv1/activation/Relu:0' shape=(?, ?, 2048) dtype=float32>
layer root/'enc_05_ff_conv2' output: Data(name='enc_05_ff_conv2_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_05_ff_conv2': <tf.Tensor 'enc_05_ff_conv2/linear/add_bias:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_05_ff_drop' output: Data(name='enc_05_ff_drop_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_05_ff_drop': <tf.Tensor 'enc_05_ff_conv2/enc_05_ff_conv2_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_05_ff_out' output: Data(name='enc_05_ff_out_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_05_ff_out': <tf.Tensor 'enc_05_ff_out/Add:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_05' output: Data(name='enc_05_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_05': <tf.Tensor 'enc_05_ff_out/enc_05_ff_out_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_06_self_att_laynorm' output: Data(name='enc_06_self_att_laynorm_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_06_self_att_laynorm': <tf.Tensor 'enc_06_self_att_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_06_self_att_att' output: Data(name='enc_06_self_att_att_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_06_self_att_att': <tf.Tensor 'enc_06_self_att_att/merge_vdim:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_06_self_att_lin' output: Data(name='enc_06_self_att_lin_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_06_self_att_lin': <tf.Tensor 'enc_06_self_att_lin/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_06_self_att_drop' output: Data(name='enc_06_self_att_drop_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_06_self_att_drop': <tf.Tensor 'enc_06_self_att_lin/enc_06_self_att_lin_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_06_self_att_out' output: Data(name='enc_06_self_att_out_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_06_self_att_out': <tf.Tensor 'enc_06_self_att_out/Add:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_06_ff_laynorm' output: Data(name='enc_06_ff_laynorm_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_06_ff_laynorm': <tf.Tensor 'enc_06_ff_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_06_ff_conv1' output: Data(name='enc_06_ff_conv1_output', shape=(None, 2048))
debug_add_check_numerics_on_output: add for layer 'enc_06_ff_conv1': <tf.Tensor 'enc_06_ff_conv1/activation/Relu:0' shape=(?, ?, 2048) dtype=float32>
layer root/'enc_06_ff_conv2' output: Data(name='enc_06_ff_conv2_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_06_ff_conv2': <tf.Tensor 'enc_06_ff_conv2/linear/add_bias:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_06_ff_drop' output: Data(name='enc_06_ff_drop_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_06_ff_drop': <tf.Tensor 'enc_06_ff_conv2/enc_06_ff_conv2_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_06_ff_out' output: Data(name='enc_06_ff_out_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_06_ff_out': <tf.Tensor 'enc_06_ff_out/Add:0' shape=(?, ?, 512) dtype=float32>
layer root/'enc_06' output: Data(name='enc_06_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'enc_06': <tf.Tensor 'enc_06_ff_out/enc_06_ff_out_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
layer root/'encoder' output: Data(name='encoder_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'encoder': <tf.Tensor 'encoder/add:0' shape=(?, ?, 512) dtype=float32>
layer root/'dec_01_att_key0' output: Data(name='dec_01_att_key0_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'dec_01_att_key0': <tf.Tensor 'dec_01_att_key0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
layer root/'dec_01_att_key' output: Data(name='dec_01_att_key_output', shape=(None, 8, 64))
debug_add_check_numerics_on_output: add for layer 'dec_01_att_key': <tf.Tensor 'dec_01_att_key/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
layer root/'dec_03_att_key0' output: Data(name='dec_03_att_key0_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'dec_03_att_key0': <tf.Tensor 'dec_03_att_key0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
layer root/'dec_03_att_key' output: Data(name='dec_03_att_key_output', shape=(None, 8, 64))
debug_add_check_numerics_on_output: add for layer 'dec_03_att_key': <tf.Tensor 'dec_03_att_key/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
layer root/'dec_04_att_value0' output: Data(name='dec_04_att_value0_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'dec_04_att_value0': <tf.Tensor 'dec_04_att_value0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
layer root/'dec_04_att_value' output: Data(name='dec_04_att_value_output', shape=(None, 8, 64))
debug_add_check_numerics_on_output: add for layer 'dec_04_att_value': <tf.Tensor 'dec_04_att_value/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
layer root/'dec_06_att_key0' output: Data(name='dec_06_att_key0_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'dec_06_att_key0': <tf.Tensor 'dec_06_att_key0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
layer root/'dec_06_att_key' output: Data(name='dec_06_att_key_output', shape=(None, 8, 64))
debug_add_check_numerics_on_output: add for layer 'dec_06_att_key': <tf.Tensor 'dec_06_att_key/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
layer root/'dec_06_att_value0' output: Data(name='dec_06_att_value0_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'dec_06_att_value0': <tf.Tensor 'dec_06_att_value0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
layer root/'dec_06_att_value' output: Data(name='dec_06_att_value_output', shape=(None, 8, 64))
debug_add_check_numerics_on_output: add for layer 'dec_06_att_value': <tf.Tensor 'dec_06_att_value/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
layer root/'dec_03_att_value0' output: Data(name='dec_03_att_value0_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'dec_03_att_value0': <tf.Tensor 'dec_03_att_value0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
layer root/'dec_03_att_value' output: Data(name='dec_03_att_value_output', shape=(None, 8, 64))
debug_add_check_numerics_on_output: add for layer 'dec_03_att_value': <tf.Tensor 'dec_03_att_value/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
layer root/'dec_05_att_value0' output: Data(name='dec_05_att_value0_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'dec_05_att_value0': <tf.Tensor 'dec_05_att_value0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
layer root/'dec_05_att_value' output: Data(name='dec_05_att_value_output', shape=(None, 8, 64))
debug_add_check_numerics_on_output: add for layer 'dec_05_att_value': <tf.Tensor 'dec_05_att_value/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
layer root/'dec_02_att_key0' output: Data(name='dec_02_att_key0_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'dec_02_att_key0': <tf.Tensor 'dec_02_att_key0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
layer root/'dec_02_att_key' output: Data(name='dec_02_att_key_output', shape=(None, 8, 64))
debug_add_check_numerics_on_output: add for layer 'dec_02_att_key': <tf.Tensor 'dec_02_att_key/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
layer root/'dec_01_att_value0' output: Data(name='dec_01_att_value0_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'dec_01_att_value0': <tf.Tensor 'dec_01_att_value0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
layer root/'dec_01_att_value' output: Data(name='dec_01_att_value_output', shape=(None, 8, 64))
debug_add_check_numerics_on_output: add for layer 'dec_01_att_value': <tf.Tensor 'dec_01_att_value/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
layer root/'dec_04_att_key0' output: Data(name='dec_04_att_key0_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'dec_04_att_key0': <tf.Tensor 'dec_04_att_key0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
layer root/'dec_04_att_key' output: Data(name='dec_04_att_key_output', shape=(None, 8, 64))
debug_add_check_numerics_on_output: add for layer 'dec_04_att_key': <tf.Tensor 'dec_04_att_key/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
layer root/'dec_05_att_key0' output: Data(name='dec_05_att_key0_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'dec_05_att_key0': <tf.Tensor 'dec_05_att_key0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
layer root/'dec_05_att_key' output: Data(name='dec_05_att_key_output', shape=(None, 8, 64))
debug_add_check_numerics_on_output: add for layer 'dec_05_att_key': <tf.Tensor 'dec_05_att_key/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
layer root/'dec_02_att_value0' output: Data(name='dec_02_att_value0_output', shape=(None, 512))
debug_add_check_numerics_on_output: add for layer 'dec_02_att_value0': <tf.Tensor 'dec_02_att_value0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
layer root/'dec_02_att_value' output: Data(name='dec_02_att_value_output', shape=(None, 8, 64))
debug_add_check_numerics_on_output: add for layer 'dec_02_att_value': <tf.Tensor 'dec_02_att_value/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
layer root/'output' output: Data(name='output_output', shape=(None,), dtype='int32', sparse=True, dim=34908, batch_dim_axis=1, beam_size=12)
Rec layer sub net:
  Input layers moved out of loop: (#: 1)
    encoder_int
  Output layers moved out of loop: (#: 0)
    None
  Layers in loop: (#: 142)
    end
    output
    output_prob
    dec_06_att_weights
    dec_06_att_energy
    dec_06_att_query
    dec_06_att_query0
    dec_06_att_laynorm
    dec_06_self_att_out
    dec_05
    dec_05_ff_out
    dec_05_att_out
    dec_05_att_drop
    dec_05_att_lin
    dec_05_att_att
    dec_05_att0
    dec_05_att_weights_drop
    dec_05_att_weights
    dec_05_att_energy
    dec_05_att_query
    dec_05_att_query0
    dec_05_att_laynorm
    dec_05_self_att_out
    dec_04
    dec_04_ff_out
    dec_04_att_out
    dec_04_att_drop
    dec_04_att_lin
    dec_04_att_att
    dec_04_att0
    dec_04_att_weights_drop
    dec_04_att_weights
    dec_04_att_energy
    dec_04_att_query
    dec_04_att_query0
    dec_04_att_laynorm
    dec_04_self_att_out
    dec_03
    dec_03_ff_out
    dec_03_att_out
    dec_03_att_drop
    dec_03_att_lin
    dec_03_att_att
    dec_03_att0
    dec_03_att_weights_drop
    dec_03_att_weights
    dec_03_att_energy
    dec_03_att_query
    dec_03_att_query0
    dec_03_att_laynorm
    dec_03_self_att_out
    dec_02
    dec_02_ff_out
    dec_02_att_out
    dec_02_att_drop
    dec_02_att_lin
    dec_02_att_att
    dec_02_att0
    dec_02_att_weights_drop
    dec_02_att_weights
    dec_02_att_energy
    dec_02_att_query
    dec_02_att_query0
    dec_02_att_laynorm
    dec_02_self_att_out
    dec_01
    dec_01_ff_out
    dec_01_att_out
    dec_01_att_drop
    dec_01_att_lin
    dec_01_att_att
    dec_01_att0
    dec_01_att_weights_drop
    dec_01_att_weights
    dec_01_att_energy
    dec_01_att_query
    dec_01_att_query0
    dec_01_att_laynorm
    dec_01_self_att_out
    dec_01_self_att_drop
    dec_01_self_att_lin
    dec_01_self_att_att
    dec_01_self_att_laynorm
    target_embed
    target_embed_with_pos
    target_embed_weighted
    target_embed_raw
    dec_01_ff_drop
    dec_01_ff_conv2
    dec_01_ff_conv1
    dec_01_ff_laynorm
    dec_02_self_att_drop
    dec_02_self_att_lin
    dec_02_self_att_att
    dec_02_self_att_laynorm
    dec_02_ff_drop
    dec_02_ff_conv2
    dec_02_ff_conv1
    dec_02_ff_laynorm
    dec_03_self_att_drop
    dec_03_self_att_lin
    dec_03_self_att_att
    dec_03_self_att_laynorm
    dec_03_ff_drop
    dec_03_ff_conv2
    dec_03_ff_conv1
    dec_03_ff_laynorm
    dec_04_self_att_drop
    dec_04_self_att_lin
    dec_04_self_att_att
    dec_04_self_att_laynorm
    dec_04_ff_drop
    dec_04_ff_conv2
    dec_04_ff_conv1
    dec_04_ff_laynorm
    dec_05_self_att_drop
    dec_05_self_att_lin
    dec_05_self_att_att
    dec_05_self_att_laynorm
    dec_05_ff_drop
    dec_05_ff_conv2
    dec_05_ff_conv1
    dec_05_ff_laynorm
    dec_06_self_att_drop
    dec_06_self_att_lin
    dec_06_self_att_att
    dec_06_self_att_laynorm
    decoder_int
    decoder
    dec_06
    dec_06_ff_out
    dec_06_att_out
    dec_06_att_drop
    dec_06_att_lin
    dec_06_att_att
    dec_06_att0
    dec_06_att_weights_drop
    dec_06_ff_drop
    dec_06_ff_conv2
    dec_06_ff_conv1
    dec_06_ff_laynorm
    prev_outputs_int
  Unused layers: (#: 0)
    None
layer root/output:rec-subnet-input/'encoder_int' output: Data(name='encoder_int_output', shape=(None, 1000))
debug_add_check_numerics_on_output: add for layer 'encoder_int': <tf.Tensor 'output/rec/encoder_int/linear/dot/Reshape_1:0' shape=(?, ?, 1000) dtype=float32>
Exception creating layer root/'output' of class RecLayer with opts:
{'max_seq_len': <tf.Tensor 'mul:0' shape=() dtype=int32>,
 'n_out': None,
 'name': 'output',
 'network': <TFNetwork 'root' train=False search>,
 'output': Data(name='output_output', shape=(None,), dtype='int32', sparse=True, dim=34908, batch_dim_axis=1, beam_size=12),
 'sources': [],
 'target': 'classes',
 'unit': {'dec_01': {'class': 'copy', 'from': ['dec_01_ff_out']},
          'dec_01_att0': {'base': 'base:dec_01_att_value',
                          'class': 'generic_attention',
                          'weights': 'dec_01_att_weights_drop'},
          'dec_01_att_att': {'axes': 'static',
                             'class': 'merge_dims',
                             'from': ['dec_01_att0']},
          'dec_01_att_drop': {'class': 'dropout',
                              'dropout': 0.1,
                              'from': ['dec_01_att_lin']},
          'dec_01_att_energy': {'class': 'dot',
                                'from': ['base:dec_01_att_key',
                                         'dec_01_att_query'],
                                'red1': -1,
                                'red2': -1,
                                'var1': 'T',
                                'var2': 'T?'},
          'dec_01_att_laynorm': {'class': 'layer_norm',
                                 'from': ['dec_01_self_att_out']},
          'dec_01_att_lin': {'activation': None,
                             'class': 'linear',
                             'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                     "distribution='uniform', "
                                                     'scale=0.78)',
                             'from': ['dec_01_att_att'],
                             'n_out': 512,
                             'with_bias': False},
          'dec_01_att_out': {'class': 'combine',
                             'from': ['dec_01_self_att_out', 'dec_01_att_drop'],
                             'kind': 'add',
                             'n_out': 512},
          'dec_01_att_query': {'axis': 'F',
                               'class': 'split_dims',
                               'dims': (8, 64),
                               'from': ['dec_01_att_query0']},
          'dec_01_att_query0': {'activation': None,
                                'class': 'linear',
                                'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                        "distribution='uniform', "
                                                        'scale=0.78)',
                                'from': ['dec_01_att_laynorm'],
                                'n_out': 512,
                                'with_bias': False},
          'dec_01_att_weights': {'class': 'softmax_over_spatial',
                                 'energy_factor': 0.125,
                                 'from': ['dec_01_att_energy']},
          'dec_01_att_weights_drop': {'class': 'dropout',
                                      'dropout': 0.1,
                                      'dropout_noise_shape': {'*': None},
                                      'from': ['dec_01_att_weights']},
          'dec_01_ff_conv1': {'activation': 'relu',
                              'class': 'linear',
                              'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                      "distribution='uniform', "
                                                      'scale=0.78)',
                              'from': ['dec_01_ff_laynorm'],
                              'n_out': 2048,
                              'with_bias': True},
          'dec_01_ff_conv2': {'activation': None,
                              'class': 'linear',
                              'dropout': 0.1,
                              'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                      "distribution='uniform', "
                                                      'scale=0.78)',
                              'from': ['dec_01_ff_conv1'],
                              'n_out': 512,
                              'with_bias': True},
          'dec_01_ff_drop': {'class': 'dropout',
                             'dropout': 0.1,
                             'from': ['dec_01_ff_conv2']},
          'dec_01_ff_laynorm': {'class': 'layer_norm',
                                'from': ['dec_01_att_out']},
          'dec_01_ff_out': {'class': 'combine',
                            'from': ['dec_01_att_out', 'dec_01_ff_drop'],
                            'kind': 'add',
                            'n_out': 512},
          'dec_01_self_att_att': {'attention_dropout': 0.1,
                                  'attention_left_only': True,
                                  'class': 'self_attention',
                                  'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                          "distribution='uniform', "
                                                          'scale=0.78)',
                                  'from': ['dec_01_self_att_laynorm'],
                                  'n_out': 512,
                                  'num_heads': 8,
                                  'total_key_dim': 512},
          'dec_01_self_att_drop': {'class': 'dropout',
                                   'dropout': 0.1,
                                   'from': ['dec_01_self_att_lin']},
          'dec_01_self_att_laynorm': {'class': 'layer_norm',
                                      'from': ['target_embed']},
          'dec_01_self_att_lin': {'activation': None,
                                  'class': 'linear',
                                  'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                          "distribution='uniform', "
                                                          'scale=0.78)',
                                  'from': ['dec_01_self_att_att'],
                                  'n_out': 512,
                                  'with_bias': False},
          'dec_01_self_att_out': {'class': 'combine',
                                  'from': ['target_embed',
                                           'dec_01_self_att_drop'],
                                  'kind': 'add',
                                  'n_out': 512},
          'dec_02': {'class': 'copy', 'from': ['dec_02_ff_out']},
          'dec_02_att0': {'base': 'base:dec_02_att_value',
                          'class': 'generic_attention',
                          'weights': 'dec_02_att_weights_drop'},
          'dec_02_att_att': {'axes': 'static',
                             'class': 'merge_dims',
                             'from': ['dec_02_att0']},
          'dec_02_att_drop': {'class': 'dropout',
                              'dropout': 0.1,
                              'from': ['dec_02_att_lin']},
          'dec_02_att_energy': {'class': 'dot',
                                'from': ['base:dec_02_att_key',
                                         'dec_02_att_query'],
                                'red1': -1,
                                'red2': -1,
                                'var1': 'T',
                                'var2': 'T?'},
          'dec_02_att_laynorm': {'class': 'layer_norm',
                                 'from': ['dec_02_self_att_out']},
          'dec_02_att_lin': {'activation': None,
                             'class': 'linear',
                             'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                     "distribution='uniform', "
                                                     'scale=0.78)',
                             'from': ['dec_02_att_att'],
                             'n_out': 512,
                             'with_bias': False},
          'dec_02_att_out': {'class': 'combine',
                             'from': ['dec_02_self_att_out', 'dec_02_att_drop'],
                             'kind': 'add',
                             'n_out': 512},
          'dec_02_att_query': {'axis': 'F',
                               'class': 'split_dims',
                               'dims': (8, 64),
                               'from': ['dec_02_att_query0']},
          'dec_02_att_query0': {'activation': None,
                                'class': 'linear',
                                'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                        "distribution='uniform', "
                                                        'scale=0.78)',
                                'from': ['dec_02_att_laynorm'],
                                'n_out': 512,
                                'with_bias': False},
          'dec_02_att_weights': {'class': 'softmax_over_spatial',
                                 'energy_factor': 0.125,
                                 'from': ['dec_02_att_energy']},
          'dec_02_att_weights_drop': {'class': 'dropout',
                                      'dropout': 0.1,
                                      'dropout_noise_shape': {'*': None},
                                      'from': ['dec_02_att_weights']},
          'dec_02_ff_conv1': {'activation': 'relu',
                              'class': 'linear',
                              'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                      "distribution='uniform', "
                                                      'scale=0.78)',
                              'from': ['dec_02_ff_laynorm'],
                              'n_out': 2048,
                              'with_bias': True},
          'dec_02_ff_conv2': {'activation': None,
                              'class': 'linear',
                              'dropout': 0.1,
                              'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                      "distribution='uniform', "
                                                      'scale=0.78)',
                              'from': ['dec_02_ff_conv1'],
                              'n_out': 512,
                              'with_bias': True},
          'dec_02_ff_drop': {'class': 'dropout',
                             'dropout': 0.1,
                             'from': ['dec_02_ff_conv2']},
          'dec_02_ff_laynorm': {'class': 'layer_norm',
                                'from': ['dec_02_att_out']},
          'dec_02_ff_out': {'class': 'combine',
                            'from': ['dec_02_att_out', 'dec_02_ff_drop'],
                            'kind': 'add',
                            'n_out': 512},
          'dec_02_self_att_att': {'attention_dropout': 0.1,
                                  'attention_left_only': True,
                                  'class': 'self_attention',
                                  'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                          "distribution='uniform', "
                                                          'scale=0.78)',
                                  'from': ['dec_02_self_att_laynorm'],
                                  'n_out': 512,
                                  'num_heads': 8,
                                  'total_key_dim': 512},
          'dec_02_self_att_drop': {'class': 'dropout',
                                   'dropout': 0.1,
                                   'from': ['dec_02_self_att_lin']},
          'dec_02_self_att_laynorm': {'class': 'layer_norm',
                                      'from': ['dec_01']},
          'dec_02_self_att_lin': {'activation': None,
                                  'class': 'linear',
                                  'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                          "distribution='uniform', "
                                                          'scale=0.78)',
                                  'from': ['dec_02_self_att_att'],
                                  'n_out': 512,
                                  'with_bias': False},
          'dec_02_self_att_out': {'class': 'combine',
                                  'from': ['dec_01', 'dec_02_self_att_drop'],
                                  'kind': 'add',
                                  'n_out': 512},
          'dec_03': {'class': 'copy', 'from': ['dec_03_ff_out']},
          'dec_03_att0': {'base': 'base:dec_03_att_value',
                          'class': 'generic_attention',
                          'weights': 'dec_03_att_weights_drop'},
          'dec_03_att_att': {'axes': 'static',
                             'class': 'merge_dims',
                             'from': ['dec_03_att0']},
          'dec_03_att_drop': {'class': 'dropout',
                              'dropout': 0.1,
                              'from': ['dec_03_att_lin']},
          'dec_03_att_energy': {'class': 'dot',
                                'from': ['base:dec_03_att_key',
                                         'dec_03_att_query'],
                                'red1': -1,
                                'red2': -1,
                                'var1': 'T',
                                'var2': 'T?'},
          'dec_03_att_laynorm': {'class': 'layer_norm',
                                 'from': ['dec_03_self_att_out']},
          'dec_03_att_lin': {'activation': None,
                             'class': 'linear',
                             'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                     "distribution='uniform', "
                                                     'scale=0.78)',
                             'from': ['dec_03_att_att'],
                             'n_out': 512,
                             'with_bias': False},
          'dec_03_att_out': {'class': 'combine',
                             'from': ['dec_03_self_att_out', 'dec_03_att_drop'],
                             'kind': 'add',
                             'n_out': 512},
          'dec_03_att_query': {'axis': 'F',
                               'class': 'split_dims',
                               'dims': (8, 64),
                               'from': ['dec_03_att_query0']},
          'dec_03_att_query0': {'activation': None,
                                'class': 'linear',
                                'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                        "distribution='uniform', "
                                                        'scale=0.78)',
                                'from': ['dec_03_att_laynorm'],
                                'n_out': 512,
                                'with_bias': False},
          'dec_03_att_weights': {'class': 'softmax_over_spatial',
                                 'energy_factor': 0.125,
                                 'from': ['dec_03_att_energy']},
          'dec_03_att_weights_drop': {'class': 'dropout',
                                      'dropout': 0.1,
                                      'dropout_noise_shape': {'*': None},
                                      'from': ['dec_03_att_weights']},
          'dec_03_ff_conv1': {'activation': 'relu',
                              'class': 'linear',
                              'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                      "distribution='uniform', "
                                                      'scale=0.78)',
                              'from': ['dec_03_ff_laynorm'],
                              'n_out': 2048,
                              'with_bias': True},
          'dec_03_ff_conv2': {'activation': None,
                              'class': 'linear',
                              'dropout': 0.1,
                              'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                      "distribution='uniform', "
                                                      'scale=0.78)',
                              'from': ['dec_03_ff_conv1'],
                              'n_out': 512,
                              'with_bias': True},
          'dec_03_ff_drop': {'class': 'dropout',
                             'dropout': 0.1,
                             'from': ['dec_03_ff_conv2']},
          'dec_03_ff_laynorm': {'class': 'layer_norm',
                                'from': ['dec_03_att_out']},
          'dec_03_ff_out': {'class': 'combine',
                            'from': ['dec_03_att_out', 'dec_03_ff_drop'],
                            'kind': 'add',
                            'n_out': 512},
          'dec_03_self_att_att': {'attention_dropout': 0.1,
                                  'attention_left_only': True,
                                  'class': 'self_attention',
                                  'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                          "distribution='uniform', "
                                                          'scale=0.78)',
                                  'from': ['dec_03_self_att_laynorm'],
                                  'n_out': 512,
                                  'num_heads': 8,
                                  'total_key_dim': 512},
          'dec_03_self_att_drop': {'class': 'dropout',
                                   'dropout': 0.1,
                                   'from': ['dec_03_self_att_lin']},
          'dec_03_self_att_laynorm': {'class': 'layer_norm',
                                      'from': ['dec_02']},
          'dec_03_self_att_lin': {'activation': None,
                                  'class': 'linear',
                                  'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                          "distribution='uniform', "
                                                          'scale=0.78)',
                                  'from': ['dec_03_self_att_att'],
                                  'n_out': 512,
                                  'with_bias': False},
          'dec_03_self_att_out': {'class': 'combine',
                                  'from': ['dec_02', 'dec_03_self_att_drop'],
                                  'kind': 'add',
                                  'n_out': 512},
          'dec_04': {'class': 'copy', 'from': ['dec_04_ff_out']},
          'dec_04_att0': {'base': 'base:dec_04_att_value',
                          'class': 'generic_attention',
                          'weights': 'dec_04_att_weights_drop'},
          'dec_04_att_att': {'axes': 'static',
                             'class': 'merge_dims',
                             'from': ['dec_04_att0']},
          'dec_04_att_drop': {'class': 'dropout',
                              'dropout': 0.1,
                              'from': ['dec_04_att_lin']},
          'dec_04_att_energy': {'class': 'dot',
                                'from': ['base:dec_04_att_key',
                                         'dec_04_att_query'],
                                'red1': -1,
                                'red2': -1,
                                'var1': 'T',
                                'var2': 'T?'},
          'dec_04_att_laynorm': {'class': 'layer_norm',
                                 'from': ['dec_04_self_att_out']},
          'dec_04_att_lin': {'activation': None,
                             'class': 'linear',
                             'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                     "distribution='uniform', "
                                                     'scale=0.78)',
                             'from': ['dec_04_att_att'],
                             'n_out': 512,
                             'with_bias': False},
          'dec_04_att_out': {'class': 'combine',
                             'from': ['dec_04_self_att_out', 'dec_04_att_drop'],
                             'kind': 'add',
                             'n_out': 512},
          'dec_04_att_query': {'axis': 'F',
                               'class': 'split_dims',
                               'dims': (8, 64),
                               'from': ['dec_04_att_query0']},
          'dec_04_att_query0': {'activation': None,
                                'class': 'linear',
                                'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                        "distribution='uniform', "
                                                        'scale=0.78)',
                                'from': ['dec_04_att_laynorm'],
                                'n_out': 512,
                                'with_bias': False},
          'dec_04_att_weights': {'class': 'softmax_over_spatial',
                                 'energy_factor': 0.125,
                                 'from': ['dec_04_att_energy']},
          'dec_04_att_weights_drop': {'class': 'dropout',
                                      'dropout': 0.1,
                                      'dropout_noise_shape': {'*': None},
                                      'from': ['dec_04_att_weights']},
          'dec_04_ff_conv1': {'activation': 'relu',
                              'class': 'linear',
                              'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                      "distribution='uniform', "
                                                      'scale=0.78)',
                              'from': ['dec_04_ff_laynorm'],
                              'n_out': 2048,
                              'with_bias': True},
          'dec_04_ff_conv2': {'activation': None,
                              'class': 'linear',
                              'dropout': 0.1,
                              'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                      "distribution='uniform', "
                                                      'scale=0.78)',
                              'from': ['dec_04_ff_conv1'],
                              'n_out': 512,
                              'with_bias': True},
          'dec_04_ff_drop': {'class': 'dropout',
                             'dropout': 0.1,
                             'from': ['dec_04_ff_conv2']},
          'dec_04_ff_laynorm': {'class': 'layer_norm',
                                'from': ['dec_04_att_out']},
          'dec_04_ff_out': {'class': 'combine',
                            'from': ['dec_04_att_out', 'dec_04_ff_drop'],
                            'kind': 'add',
                            'n_out': 512},
          'dec_04_self_att_att': {'attention_dropout': 0.1,
                                  'attention_left_only': True,
                                  'class': 'self_attention',
                                  'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                          "distribution='uniform', "
                                                          'scale=0.78)',
                                  'from': ['dec_04_self_att_laynorm'],
                                  'n_out': 512,
                                  'num_heads': 8,
                                  'total_key_dim': 512},
          'dec_04_self_att_drop': {'class': 'dropout',
                                   'dropout': 0.1,
                                   'from': ['dec_04_self_att_lin']},
          'dec_04_self_att_laynorm': {'class': 'layer_norm',
                                      'from': ['dec_03']},
          'dec_04_self_att_lin': {'activation': None,
                                  'class': 'linear',
                                  'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                          "distribution='uniform', "
                                                          'scale=0.78)',
                                  'from': ['dec_04_self_att_att'],
                                  'n_out': 512,
                                  'with_bias': False},
          'dec_04_self_att_out': {'class': 'combine',
                                  'from': ['dec_03', 'dec_04_self_att_drop'],
                                  'kind': 'add',
                                  'n_out': 512},
          'dec_05': {'class': 'copy', 'from': ['dec_05_ff_out']},
          'dec_05_att0': {'base': 'base:dec_05_att_value',
                          'class': 'generic_attention',
                          'weights': 'dec_05_att_weights_drop'},
          'dec_05_att_att': {'axes': 'static',
                             'class': 'merge_dims',
                             'from': ['dec_05_att0']},
          'dec_05_att_drop': {'class': 'dropout',
                              'dropout': 0.1,
                              'from': ['dec_05_att_lin']},
          'dec_05_att_energy': {'class': 'dot',
                                'from': ['base:dec_05_att_key',
                                         'dec_05_att_query'],
                                'red1': -1,
                                'red2': -1,
                                'var1': 'T',
                                'var2': 'T?'},
          'dec_05_att_laynorm': {'class': 'layer_norm',
                                 'from': ['dec_05_self_att_out']},
          'dec_05_att_lin': {'activation': None,
                             'class': 'linear',
                             'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                     "distribution='uniform', "
                                                     'scale=0.78)',
                             'from': ['dec_05_att_att'],
                             'n_out': 512,
                             'with_bias': False},
          'dec_05_att_out': {'class': 'combine',
                             'from': ['dec_05_self_att_out', 'dec_05_att_drop'],
                             'kind': 'add',
                             'n_out': 512},
          'dec_05_att_query': {'axis': 'F',
                               'class': 'split_dims',
                               'dims': (8, 64),
                               'from': ['dec_05_att_query0']},
          'dec_05_att_query0': {'activation': None,
                                'class': 'linear',
                                'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                        "distribution='uniform', "
                                                        'scale=0.78)',
                                'from': ['dec_05_att_laynorm'],
                                'n_out': 512,
                                'with_bias': False},
          'dec_05_att_weights': {'class': 'softmax_over_spatial',
                                 'energy_factor': 0.125,
                                 'from': ['dec_05_att_energy']},
          'dec_05_att_weights_drop': {'class': 'dropout',
                                      'dropout': 0.1,
                                      'dropout_noise_shape': {'*': None},
                                      'from': ['dec_05_att_weights']},
          'dec_05_ff_conv1': {'activation': 'relu',
                              'class': 'linear',
                              'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                      "distribution='uniform', "
                                                      'scale=0.78)',
                              'from': ['dec_05_ff_laynorm'],
                              'n_out': 2048,
                              'with_bias': True},
          'dec_05_ff_conv2': {'activation': None,
                              'class': 'linear',
                              'dropout': 0.1,
                              'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                      "distribution='uniform', "
                                                      'scale=0.78)',
                              'from': ['dec_05_ff_conv1'],
                              'n_out': 512,
                              'with_bias': True},
          'dec_05_ff_drop': {'class': 'dropout',
                             'dropout': 0.1,
                             'from': ['dec_05_ff_conv2']},
          'dec_05_ff_laynorm': {'class': 'layer_norm',
                                'from': ['dec_05_att_out']},
          'dec_05_ff_out': {'class': 'combine',
                            'from': ['dec_05_att_out', 'dec_05_ff_drop'],
                            'kind': 'add',
                            'n_out': 512},
          'dec_05_self_att_att': {'attention_dropout': 0.1,
                                  'attention_left_only': True,
                                  'class': 'self_attention',
                                  'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                          "distribution='uniform', "
                                                          'scale=0.78)',
                                  'from': ['dec_05_self_att_laynorm'],
                                  'n_out': 512,
                                  'num_heads': 8,
                                  'total_key_dim': 512},
          'dec_05_self_att_drop': {'class': 'dropout',
                                   'dropout': 0.1,
                                   'from': ['dec_05_self_att_lin']},
          'dec_05_self_att_laynorm': {'class': 'layer_norm',
                                      'from': ['dec_04']},
          'dec_05_self_att_lin': {'activation': None,
                                  'class': 'linear',
                                  'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                          "distribution='uniform', "
                                                          'scale=0.78)',
                                  'from': ['dec_05_self_att_att'],
                                  'n_out': 512,
                                  'with_bias': False},
          'dec_05_self_att_out': {'class': 'combine',
                                  'from': ['dec_04', 'dec_05_self_att_drop'],
                                  'kind': 'add',
                                  'n_out': 512},
          'dec_06': {'class': 'copy', 'from': ['dec_06_ff_out']},
          'dec_06_att0': {'base': 'base:dec_06_att_value',
                          'class': 'generic_attention',
                          'weights': 'dec_06_att_weights_drop'},
          'dec_06_att_att': {'axes': 'static',
                             'class': 'merge_dims',
                             'from': ['dec_06_att0']},
          'dec_06_att_drop': {'class': 'dropout',
                              'dropout': 0.1,
                              'from': ['dec_06_att_lin']},
          'dec_06_att_energy': {'class': 'dot',
                                'from': ['base:dec_06_att_key',
                                         'dec_06_att_query'],
                                'red1': -1,
                                'red2': -1,
                                'var1': 'T',
                                'var2': 'T?'},
          'dec_06_att_laynorm': {'class': 'layer_norm',
                                 'from': ['dec_06_self_att_out']},
          'dec_06_att_lin': {'activation': None,
                             'class': 'linear',
                             'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                     "distribution='uniform', "
                                                     'scale=0.78)',
                             'from': ['dec_06_att_att'],
                             'n_out': 512,
                             'with_bias': False},
          'dec_06_att_out': {'class': 'combine',
                             'from': ['dec_06_self_att_out', 'dec_06_att_drop'],
                             'kind': 'add',
                             'n_out': 512},
          'dec_06_att_query': {'axis': 'F',
                               'class': 'split_dims',
                               'dims': (8, 64),
                               'from': ['dec_06_att_query0']},
          'dec_06_att_query0': {'activation': None,
                                'class': 'linear',
                                'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                        "distribution='uniform', "
                                                        'scale=0.78)',
                                'from': ['dec_06_att_laynorm'],
                                'n_out': 512,
                                'with_bias': False},
          'dec_06_att_weights': {'class': 'softmax_over_spatial',
                                 'energy_factor': 0.125,
                                 'from': ['dec_06_att_energy']},
          'dec_06_att_weights_drop': {'class': 'dropout',
                                      'dropout': 0.1,
                                      'dropout_noise_shape': {'*': None},
                                      'from': ['dec_06_att_weights']},
          'dec_06_ff_conv1': {'activation': 'relu',
                              'class': 'linear',
                              'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                      "distribution='uniform', "
                                                      'scale=0.78)',
                              'from': ['dec_06_ff_laynorm'],
                              'n_out': 2048,
                              'with_bias': True},
          'dec_06_ff_conv2': {'activation': None,
                              'class': 'linear',
                              'dropout': 0.1,
                              'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                      "distribution='uniform', "
                                                      'scale=0.78)',
                              'from': ['dec_06_ff_conv1'],
                              'n_out': 512,
                              'with_bias': True},
          'dec_06_ff_drop': {'class': 'dropout',
                             'dropout': 0.1,
                             'from': ['dec_06_ff_conv2']},
          'dec_06_ff_laynorm': {'class': 'layer_norm',
                                'from': ['dec_06_att_out']},
          'dec_06_ff_out': {'class': 'combine',
                            'from': ['dec_06_att_out', 'dec_06_ff_drop'],
                            'kind': 'add',
                            'n_out': 512},
          'dec_06_self_att_att': {'attention_dropout': 0.1,
                                  'attention_left_only': True,
                                  'class': 'self_attention',
                                  'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                          "distribution='uniform', "
                                                          'scale=0.78)',
                                  'from': ['dec_06_self_att_laynorm'],
                                  'n_out': 512,
                                  'num_heads': 8,
                                  'total_key_dim': 512},
          'dec_06_self_att_drop': {'class': 'dropout',
                                   'dropout': 0.1,
                                   'from': ['dec_06_self_att_lin']},
          'dec_06_self_att_laynorm': {'class': 'layer_norm',
                                      'from': ['dec_05']},
          'dec_06_self_att_lin': {'activation': None,
                                  'class': 'linear',
                                  'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                          "distribution='uniform', "
                                                          'scale=0.78)',
                                  'from': ['dec_06_self_att_att'],
                                  'n_out': 512,
                                  'with_bias': False},
          'dec_06_self_att_out': {'class': 'combine',
                                  'from': ['dec_05', 'dec_06_self_att_drop'],
                                  'kind': 'add',
                                  'n_out': 512},
          'decoder': {'class': 'layer_norm', 'from': ['dec_06'], 'n_out': 512},
          'decoder_int': {'activation': None,
                          'class': 'linear',
                          'from': ['decoder'],
                          'n_out': 1000,
                          'with_bias': False},
          'encoder_int': {'activation': None,
                          'class': 'linear',
                          'from': ['base:encoder'],
                          'n_out': 1000,
                          'with_bias': False},
          'end': {'class': 'compare', 'from': ['output'], 'value': 0},
          'output': {'beam_size': 12,
                     'class': 'choice',
                     'from': ['output_prob'],
                     'initial_output': 0,
                     'target': 'classes'},
          'output_prob': {'attention_weights': 'dec_06_att_weights',
                          'base_encoder_transformed': 'encoder_int',
                          'class': 'hmm_factorization',
                          'debug': False,
                          'from': 'dec_06_att_weights',
                          'loss': 'ce',
                          'n_out': 34908,
                          'prev_outputs': 'prev_outputs_int',
                          'prev_state': 'decoder_int',
                          'target': 'classes',
                          'threshold': None,
                          'transpose_and_average_att_weights': True},
          'prev_outputs_int': {'activation': None,
                               'class': 'linear',
                               'from': ['prev:target_embed_raw'],
                               'n_out': 1000,
                               'with_bias': False},
          'target_embed': {'class': 'dropout',
                           'dropout': 0.0,
                           'from': ['target_embed_with_pos']},
          'target_embed_raw': {'activation': None,
                               'class': 'linear',
                               'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
                                                       "distribution='uniform', "
                                                       'scale=0.78)',
                               'from': ['prev:output'],
                               'n_out': 512,
                               'with_bias': False},
          'target_embed_weighted': {'class': 'eval',
                                    'eval': 'source(0) * 22.627417',
                                    'from': ['target_embed_raw']},
          'target_embed_with_pos': {'add_to_input': True,
                                    'class': 'positional_encoding',
                                    'from': ['target_embed_weighted']}}}
EXCEPTION
Traceback (most recent call last):
  File "/u/makarov/returnn-hmm-fac/rnn.py", line 591, in <module>
    line: main(sys.argv)
    locals:
      main = <local> <function main at 0x7fe7b0180bf8>
      sys = <local> <module 'sys' (built-in)>
      sys.argv = <local> ['/u/makarov/returnn-hmm-fac/rnn.py', 'hmm-factorization/en-de/transformer-hmm', '++load_epoch', '114', '++device', 'gpu', '--task', 'search', '++search_data', 'config:dev', '++beam_size', '12', '++need_data', 'False', '++max_seq_length', '0', '++search_output_file', 'hmm-factorization/en-de/hyp/..., len = 20, _[0]: {len = 33}
  File "/u/makarov/returnn-hmm-fac/rnn.py", line 579, in main
    line: executeMainTask()
    locals:
      executeMainTask = <global> <function executeMainTask at 0x7fe7b0180ae8>
  File "/u/makarov/returnn-hmm-fac/rnn.py", line 434, in executeMainTask
    line: engine.init_network_from_config(config)
    locals:
      engine = <global> <TFEngine.Engine object at 0x7fe8124a5e48>
      engine.init_network_from_config = <global> <bound method Engine.init_network_from_config of <TFEngine.Engine object at 0x7fe8124a5e48>>
      config = <global> <Config.Config object at 0x7fe810aca080>
  File "/u/makarov/returnn-hmm-fac/TFEngine.py", line 936, in init_network_from_config
    line: self._init_network(net_desc=net_dict, epoch=self.epoch)
    locals:
      self = <local> <TFEngine.Engine object at 0x7fe8124a5e48>
      self._init_network = <local> <bound method Engine._init_network of <TFEngine.Engine object at 0x7fe8124a5e48>>
      net_desc = <not found>
      net_dict = <local> {'enc_05_self_att_att': {'total_key_dim': 512, 'from': ['enc_05_self_att_laynorm'], 'class': 'self_attention', 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=0.78)", 'num_heads': 8, 'attention_left_only': False, 'attention_dropout': 0.1, 'n_out'..., len = 97
      epoch = <local> 114
      self.epoch = <local> 114
  File "/u/makarov/returnn-hmm-fac/TFEngine.py", line 1059, in _init_network
    line: self.network, self.updater = self.create_network(
            config=self.config,
            rnd_seed=net_random_seed,
            train_flag=train_flag, eval_flag=self.use_eval_flag, search_flag=self.use_search_flag,
            initial_learning_rate=getattr(self, "initial_learning_rate", None),
            net_dict=net_desc)
    locals:
      self = <local> <TFEngine.Engine object at 0x7fe8124a5e48>
      self.network = <local> None
      self.updater = <local> None
      self.create_network = <local> <bound method Engine.create_network of <class 'TFEngine.Engine'>>
      config = <not found>
      self.config = <local> <Config.Config object at 0x7fe810aca080>
      rnd_seed = <not found>
      net_random_seed = <local> 114
      train_flag = <local> False
      eval_flag = <not found>
      self.use_eval_flag = <local> True
      search_flag = <not found>
      self.use_search_flag = <local> True
      initial_learning_rate = <not found>
      getattr = <builtin> <built-in function getattr>
      net_dict = <not found>
      net_desc = <local> {'enc_05_self_att_att': {'total_key_dim': 512, 'from': ['enc_05_self_att_laynorm'], 'class': 'self_attention', 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=0.78)", 'num_heads': 8, 'attention_left_only': False, 'attention_dropout': 0.1, 'n_out'..., len = 97
  File "/u/makarov/returnn-hmm-fac/TFEngine.py", line 1090, in create_network
    line: network.construct_from_dict(net_dict)
    locals:
      network = <local> <TFNetwork 'root' train=False search>
      network.construct_from_dict = <local> <bound method TFNetwork.construct_from_dict of <TFNetwork 'root' train=False search>>
      net_dict = <local> {'enc_05_self_att_att': {'total_key_dim': 512, 'from': ['enc_05_self_att_laynorm'], 'class': 'self_attention', 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=0.78)", 'num_heads': 8, 'attention_left_only': False, 'attention_dropout': 0.1, 'n_out'..., len = 97
  File "/u/makarov/returnn-hmm-fac/TFNetwork.py", line 338, in construct_from_dict
    line: self.construct_layer(net_dict, name)
    locals:
      self = <local> <TFNetwork 'root' train=False search>
      self.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root' train=False search>>
      net_dict = <local> {'enc_05_self_att_att': {'total_key_dim': 512, 'from': ['enc_05_self_att_laynorm'], 'class': 'self_attention', 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=0.78)", 'num_heads': 8, 'attention_left_only': False, 'attention_dropout': 0.1, 'n_out'..., len = 97
      name = <local> 'decision', len = 8
  File "/u/makarov/returnn-hmm-fac/TFNetwork.py", line 407, in construct_layer
    line: layer_class.transform_config_dict(layer_desc, network=self, get_layer=get_layer)
    locals:
      layer_class = <local> <class 'TFNetworkRecLayer.DecideLayer'>
      layer_class.transform_config_dict = <local> <bound method LayerBase.transform_config_dict of <class 'TFNetworkRecLayer.DecideLayer'>>
      layer_desc = <local> {'loss_opts': {}, 'target': 'classes', 'loss': 'edit_distance'}
      network = <not found>
      self = <local> <TFNetwork 'root' train=False search>
      get_layer = <local> <function TFNetwork.construct_layer.<locals>.get_layer at 0x7fe7603061e0>
  File "/u/makarov/returnn-hmm-fac/TFNetworkLayer.py", line 358, in transform_config_dict
    line: for src_name in src_names
    locals:
      src_name = <not found>
      src_names = <local> ['output'], _[0]: {len = 6}
  File "/u/makarov/returnn-hmm-fac/TFNetworkLayer.py", line 359, in <listcomp>
    line: d["sources"] = [
            get_layer(src_name)
            for src_name in src_names
            if not src_name == "none"]
    locals:
      d = <not found>
      get_layer = <local> <function TFNetwork.construct_layer.<locals>.get_layer at 0x7fe7603061e0>
      src_name = <local> 'output', len = 6
      src_names = <not found>
  File "/u/makarov/returnn-hmm-fac/TFNetwork.py", line 397, in get_layer
    line: return self.construct_layer(net_dict=net_dict, name=src_name)
    locals:
      self = <local> <TFNetwork 'root' train=False search>
      self.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root' train=False search>>
      net_dict = <local> {'enc_05_self_att_att': {'total_key_dim': 512, 'from': ['enc_05_self_att_laynorm'], 'class': 'self_attention', 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=0.78)", 'num_heads': 8, 'attention_left_only': False, 'attention_dropout': 0.1, 'n_out'..., len = 97
      name = <not found>
      src_name = <local> 'output', len = 6
  File "/u/makarov/returnn-hmm-fac/TFNetwork.py", line 410, in construct_layer
    line: return add_layer(name=name, layer_class=layer_class, **layer_desc)
    locals:
      add_layer = <local> <bound method TFNetwork.add_layer of <TFNetwork 'root' train=False search>>
      name = <local> 'output', len = 6
      layer_class = <local> <class 'TFNetworkRecLayer.RecLayer'>
      layer_desc = <local> {'max_seq_len': <tf.Tensor 'mul:0' shape=() dtype=int32>, 'unit': {'dec_06_att_out': {'from': ['dec_06_self_att_out', 'dec_06_att_drop'], 'kind': 'add', 'class': 'combine', 'n_out': 512}, 'dec_05_att_weights_drop': {'dropout_noise_shape': {'*': None}, 'from': ['dec_05_att_weights'], 'class': 'dro...
  File "/u/makarov/returnn-hmm-fac/TFNetwork.py", line 497, in add_layer
    line: layer = self._create_layer(name=name, layer_class=layer_class, **layer_desc)
    locals:
      layer = <not found>
      self = <local> <TFNetwork 'root' train=False search>
      self._create_layer = <local> <bound method TFNetwork._create_layer of <TFNetwork 'root' train=False search>>
      name = <local> 'output', len = 6
      layer_class = <local> <class 'TFNetworkRecLayer.RecLayer'>
      layer_desc = <local> {'unit': {'dec_06_att_out': {'from': ['dec_06_self_att_out', 'dec_06_att_drop'], 'kind': 'add', 'class': 'combine', 'n_out': 512}, 'dec_05_att_weights_drop': {'dropout_noise_shape': {'*': None}, 'from': ['dec_05_att_weights'], 'class': 'dropout', 'dropout': 0.1}, 'dec_01_att_energy': {'from': ['b...
  File "/u/makarov/returnn-hmm-fac/TFNetwork.py", line 456, in _create_layer
    line: layer = layer_class(**layer_desc)
    locals:
      layer = <not found>
      layer_class = <local> <class 'TFNetworkRecLayer.RecLayer'>
      layer_desc = <local> {'max_seq_len': <tf.Tensor 'mul:0' shape=() dtype=int32>, 'network': <TFNetwork 'root' train=False search>, 'name': 'output', 'unit': {'dec_06_att_out': {'from': ['dec_06_self_att_out', 'dec_06_att_drop'], 'kind': 'add', 'class': 'combine', 'n_out': 512}, 'dec_05_att_weights_drop': {'dropout_nois..., len = 8
  File "/u/makarov/returnn-hmm-fac/TFNetworkRecLayer.py", line 179, in __init__
    line: y = self._get_output_subnet_unit(self.cell)
    locals:
      y = <not found>
      self = <local> <RecLayer 'output' out_type=Data(shape=(None,), dtype='int32', sparse=True, dim=34908, batch_dim_axis=1, beam_size=12)>
      self._get_output_subnet_unit = <local> <bound method RecLayer._get_output_subnet_unit of <RecLayer 'output' out_type=Data(shape=(None,), dtype='int32', sparse=True, dim=34908, batch_dim_axis=1, beam_size=12)>>
      self.cell = <local> <TFNetworkRecLayer._SubnetworkRecCell object at 0x7fe37643e7b8>
  File "/u/makarov/returnn-hmm-fac/TFNetworkRecLayer.py", line 703, in _get_output_subnet_unit
    line: output, search_choices = cell.get_output(rec_layer=self)
    locals:
      output = <not found>
      search_choices = <not found>
      cell = <local> <TFNetworkRecLayer._SubnetworkRecCell object at 0x7fe37643e7b8>
      cell.get_output = <local> <bound method _SubnetworkRecCell.get_output of <TFNetworkRecLayer._SubnetworkRecCell object at 0x7fe37643e7b8>>
      rec_layer = <not found>
      self = <local> <RecLayer 'output' out_type=Data(shape=(None,), dtype='int32', sparse=True, dim=34908, batch_dim_axis=1, beam_size=12)>
  File "/u/makarov/returnn-hmm-fac/TFNetworkRecLayer.py", line 1459, in get_output
    line: assert fixed_seq_len is not None
    locals:
      fixed_seq_len = <local> None
AssertionError
Unhandled exception <class 'AssertionError'> in thread <_MainThread(MainThread, started 140634717132544)>, proc 22317.

Thread current, main, <_MainThread(MainThread, started 140634717132544)>:
(Excluded thread.)

That were all threads.