- ** beam_size 12
- RETURNN starting up, version 20181130.185608--git-d30181f, date/time 2018-12-01-09-58-31 (UTC+0100), pid 22317, cwd /work/smt2/makarov/NMT, Python /usr/bin/python3
- RETURNN command line options: ['hmm-factorization/en-de/transformer-hmm', '++load_epoch', '114', '++device', 'gpu', '--task', 'search', '++search_data', 'config:dev', '++beam_size', '12', '++need_data', 'False', '++max_seq_length', '0', '++search_output_file', 'hmm-factorization/en-de/hyp/transformer-hmm', '++batch_size', '2000']
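For readability, the argv above corresponds to the following invocation. This is a hedged reconstruction only: the interpreter and rnn.py paths are taken from the startup line and the traceback at the bottom of this log, and every argument is copied verbatim from the "RETURNN command line options" line.

import subprocess

# Reconstruction of the logged RETURNN search run (paths/arguments copied from this log).
subprocess.run([
    "/usr/bin/python3", "/u/makarov/returnn-hmm-fac/rnn.py",
    "hmm-factorization/en-de/transformer-hmm",
    "++load_epoch", "114",
    "++device", "gpu",
    "--task", "search",
    "++search_data", "config:dev",
    "++beam_size", "12",
    "++need_data", "False",
    "++max_seq_length", "0",
    "++search_output_file", "hmm-factorization/en-de/hyp/transformer-hmm",
    "++batch_size", "2000",
], check=True)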
- Hostname: cluster-cn-258
- TensorFlow: 1.9.0 (v1.9.0-0-g25c197e023) (<site-package> in /u/makarov/.local/lib/python3.5/site-packages/tensorflow)
- Setup TF inter and intra global thread pools, num_threads None, session opts {'device_count': {'GPU': 0}, 'log_device_placement': False}.
- 2018-12-01 09:58:32.562859: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
- 2018-12-01 09:58:32.978259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties:
- name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
- pciBusID: 0000:02:00.0
- totalMemory: 10.92GiB freeMemory: 10.76GiB
- 2018-12-01 09:58:32.978317: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
- 2018-12-01 09:58:32.978337: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
- 2018-12-01 09:58:32.978348: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
- 2018-12-01 09:58:32.978358: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
- CUDA_VISIBLE_DEVICES is set to '0'.
- 2018-12-01 09:58:33.282635: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
- 2018-12-01 09:58:33.828527: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
- 2018-12-01 09:58:33.828579: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
- 2018-12-01 09:58:33.828588: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
- 2018-12-01 09:58:33.828955: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/device:GPU:0 with 10409 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
- Collecting TensorFlow device list...
- Local devices available to TensorFlow:
- 1/2: name: "/device:CPU:0"
- device_type: "CPU"
- memory_limit: 268435456
- locality {
- }
- incarnation: 616944120252845792
- 2/2: name: "/device:GPU:0"
- device_type: "GPU"
- memory_limit: 10915220685
- locality {
- bus_id: 1
- links {
- }
- }
- incarnation: 955148772328989222
- physical_device_desc: "device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1"
- Using gpu device 0: GeForce GTX 1080 Ti
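The device listing above is simply what TensorFlow enumerates at startup. A minimal sketch of how to reproduce that listing outside RETURNN with the same TensorFlow 1.9 install (assuming the same CUDA_VISIBLE_DEVICES=0 environment):

from tensorflow.python.client import device_lib

# Enumerate local devices, similar to the "Local devices available to TensorFlow" section above.
devs = device_lib.list_local_devices()
for i, dev in enumerate(devs, start=1):
    print("%d/%d: name: %r  device_type: %r  memory_limit: %d"
          % (i, len(devs), dev.name, dev.device_type, dev.memory_limit))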
- Setup tf.Session with options {'device_count': {'GPU': 1}, 'log_device_placement': False} ...
- 2018-12-01 09:58:38.902018: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
- 2018-12-01 09:58:38.902091: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
- 2018-12-01 09:58:38.902105: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
- 2018-12-01 09:58:38.902115: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
- 2018-12-01 09:58:38.902372: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10409 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
- layer root/'data' output: Data(name='data', shape=(None,), dtype='int32', sparse=True, dim=46300)
- layer root/'source_embed_raw' output: Data(name='source_embed_raw_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'source_embed_raw': <tf.Tensor 'source_embed_raw/linear/embedding_lookup:0' shape=(?, ?, 512) dtype=float32>
- layer root/'source_embed_weighted' output: Data(name='source_embed_weighted_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'source_embed_weighted': <tf.Tensor 'source_embed_weighted/mul:0' shape=(?, ?, 512) dtype=float32>
- layer root/'source_embed_with_pos' output: Data(name='source_embed_with_pos_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'source_embed_with_pos': <tf.Tensor 'source_embed_with_pos/add:0' shape=(?, ?, 512) dtype=float32>
- layer root/'source_embed' output: Data(name='source_embed_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'source_embed': <tf.Tensor 'source_embed_with_pos/source_embed_with_pos_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_01_self_att_laynorm' output: Data(name='enc_01_self_att_laynorm_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_01_self_att_laynorm': <tf.Tensor 'enc_01_self_att_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_01_self_att_att' output: Data(name='enc_01_self_att_att_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_01_self_att_att': <tf.Tensor 'enc_01_self_att_att/merge_vdim:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_01_self_att_lin' output: Data(name='enc_01_self_att_lin_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_01_self_att_lin': <tf.Tensor 'enc_01_self_att_lin/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_01_self_att_drop' output: Data(name='enc_01_self_att_drop_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_01_self_att_drop': <tf.Tensor 'enc_01_self_att_lin/enc_01_self_att_lin_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_01_self_att_out' output: Data(name='enc_01_self_att_out_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_01_self_att_out': <tf.Tensor 'enc_01_self_att_out/Add:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_01_ff_laynorm' output: Data(name='enc_01_ff_laynorm_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_01_ff_laynorm': <tf.Tensor 'enc_01_ff_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_01_ff_conv1' output: Data(name='enc_01_ff_conv1_output', shape=(None, 2048))
- debug_add_check_numerics_on_output: add for layer 'enc_01_ff_conv1': <tf.Tensor 'enc_01_ff_conv1/activation/Relu:0' shape=(?, ?, 2048) dtype=float32>
- layer root/'enc_01_ff_conv2' output: Data(name='enc_01_ff_conv2_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_01_ff_conv2': <tf.Tensor 'enc_01_ff_conv2/linear/add_bias:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_01_ff_drop' output: Data(name='enc_01_ff_drop_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_01_ff_drop': <tf.Tensor 'enc_01_ff_conv2/enc_01_ff_conv2_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_01_ff_out' output: Data(name='enc_01_ff_out_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_01_ff_out': <tf.Tensor 'enc_01_ff_out/Add:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_01' output: Data(name='enc_01_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_01': <tf.Tensor 'enc_01_ff_out/enc_01_ff_out_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_02_self_att_laynorm' output: Data(name='enc_02_self_att_laynorm_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_02_self_att_laynorm': <tf.Tensor 'enc_02_self_att_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_02_self_att_att' output: Data(name='enc_02_self_att_att_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_02_self_att_att': <tf.Tensor 'enc_02_self_att_att/merge_vdim:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_02_self_att_lin' output: Data(name='enc_02_self_att_lin_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_02_self_att_lin': <tf.Tensor 'enc_02_self_att_lin/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_02_self_att_drop' output: Data(name='enc_02_self_att_drop_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_02_self_att_drop': <tf.Tensor 'enc_02_self_att_lin/enc_02_self_att_lin_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_02_self_att_out' output: Data(name='enc_02_self_att_out_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_02_self_att_out': <tf.Tensor 'enc_02_self_att_out/Add:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_02_ff_laynorm' output: Data(name='enc_02_ff_laynorm_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_02_ff_laynorm': <tf.Tensor 'enc_02_ff_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_02_ff_conv1' output: Data(name='enc_02_ff_conv1_output', shape=(None, 2048))
- debug_add_check_numerics_on_output: add for layer 'enc_02_ff_conv1': <tf.Tensor 'enc_02_ff_conv1/activation/Relu:0' shape=(?, ?, 2048) dtype=float32>
- layer root/'enc_02_ff_conv2' output: Data(name='enc_02_ff_conv2_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_02_ff_conv2': <tf.Tensor 'enc_02_ff_conv2/linear/add_bias:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_02_ff_drop' output: Data(name='enc_02_ff_drop_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_02_ff_drop': <tf.Tensor 'enc_02_ff_conv2/enc_02_ff_conv2_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_02_ff_out' output: Data(name='enc_02_ff_out_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_02_ff_out': <tf.Tensor 'enc_02_ff_out/Add:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_02' output: Data(name='enc_02_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_02': <tf.Tensor 'enc_02_ff_out/enc_02_ff_out_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_03_self_att_laynorm' output: Data(name='enc_03_self_att_laynorm_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_03_self_att_laynorm': <tf.Tensor 'enc_03_self_att_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_03_self_att_att' output: Data(name='enc_03_self_att_att_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_03_self_att_att': <tf.Tensor 'enc_03_self_att_att/merge_vdim:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_03_self_att_lin' output: Data(name='enc_03_self_att_lin_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_03_self_att_lin': <tf.Tensor 'enc_03_self_att_lin/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_03_self_att_drop' output: Data(name='enc_03_self_att_drop_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_03_self_att_drop': <tf.Tensor 'enc_03_self_att_lin/enc_03_self_att_lin_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_03_self_att_out' output: Data(name='enc_03_self_att_out_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_03_self_att_out': <tf.Tensor 'enc_03_self_att_out/Add:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_03_ff_laynorm' output: Data(name='enc_03_ff_laynorm_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_03_ff_laynorm': <tf.Tensor 'enc_03_ff_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_03_ff_conv1' output: Data(name='enc_03_ff_conv1_output', shape=(None, 2048))
- debug_add_check_numerics_on_output: add for layer 'enc_03_ff_conv1': <tf.Tensor 'enc_03_ff_conv1/activation/Relu:0' shape=(?, ?, 2048) dtype=float32>
- layer root/'enc_03_ff_conv2' output: Data(name='enc_03_ff_conv2_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_03_ff_conv2': <tf.Tensor 'enc_03_ff_conv2/linear/add_bias:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_03_ff_drop' output: Data(name='enc_03_ff_drop_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_03_ff_drop': <tf.Tensor 'enc_03_ff_conv2/enc_03_ff_conv2_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_03_ff_out' output: Data(name='enc_03_ff_out_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_03_ff_out': <tf.Tensor 'enc_03_ff_out/Add:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_03' output: Data(name='enc_03_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_03': <tf.Tensor 'enc_03_ff_out/enc_03_ff_out_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_04_self_att_laynorm' output: Data(name='enc_04_self_att_laynorm_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_04_self_att_laynorm': <tf.Tensor 'enc_04_self_att_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_04_self_att_att' output: Data(name='enc_04_self_att_att_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_04_self_att_att': <tf.Tensor 'enc_04_self_att_att/merge_vdim:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_04_self_att_lin' output: Data(name='enc_04_self_att_lin_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_04_self_att_lin': <tf.Tensor 'enc_04_self_att_lin/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_04_self_att_drop' output: Data(name='enc_04_self_att_drop_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_04_self_att_drop': <tf.Tensor 'enc_04_self_att_lin/enc_04_self_att_lin_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_04_self_att_out' output: Data(name='enc_04_self_att_out_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_04_self_att_out': <tf.Tensor 'enc_04_self_att_out/Add:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_04_ff_laynorm' output: Data(name='enc_04_ff_laynorm_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_04_ff_laynorm': <tf.Tensor 'enc_04_ff_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_04_ff_conv1' output: Data(name='enc_04_ff_conv1_output', shape=(None, 2048))
- debug_add_check_numerics_on_output: add for layer 'enc_04_ff_conv1': <tf.Tensor 'enc_04_ff_conv1/activation/Relu:0' shape=(?, ?, 2048) dtype=float32>
- layer root/'enc_04_ff_conv2' output: Data(name='enc_04_ff_conv2_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_04_ff_conv2': <tf.Tensor 'enc_04_ff_conv2/linear/add_bias:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_04_ff_drop' output: Data(name='enc_04_ff_drop_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_04_ff_drop': <tf.Tensor 'enc_04_ff_conv2/enc_04_ff_conv2_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_04_ff_out' output: Data(name='enc_04_ff_out_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_04_ff_out': <tf.Tensor 'enc_04_ff_out/Add:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_04' output: Data(name='enc_04_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_04': <tf.Tensor 'enc_04_ff_out/enc_04_ff_out_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_05_self_att_laynorm' output: Data(name='enc_05_self_att_laynorm_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_05_self_att_laynorm': <tf.Tensor 'enc_05_self_att_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_05_self_att_att' output: Data(name='enc_05_self_att_att_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_05_self_att_att': <tf.Tensor 'enc_05_self_att_att/merge_vdim:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_05_self_att_lin' output: Data(name='enc_05_self_att_lin_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_05_self_att_lin': <tf.Tensor 'enc_05_self_att_lin/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_05_self_att_drop' output: Data(name='enc_05_self_att_drop_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_05_self_att_drop': <tf.Tensor 'enc_05_self_att_lin/enc_05_self_att_lin_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_05_self_att_out' output: Data(name='enc_05_self_att_out_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_05_self_att_out': <tf.Tensor 'enc_05_self_att_out/Add:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_05_ff_laynorm' output: Data(name='enc_05_ff_laynorm_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_05_ff_laynorm': <tf.Tensor 'enc_05_ff_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_05_ff_conv1' output: Data(name='enc_05_ff_conv1_output', shape=(None, 2048))
- debug_add_check_numerics_on_output: add for layer 'enc_05_ff_conv1': <tf.Tensor 'enc_05_ff_conv1/activation/Relu:0' shape=(?, ?, 2048) dtype=float32>
- layer root/'enc_05_ff_conv2' output: Data(name='enc_05_ff_conv2_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_05_ff_conv2': <tf.Tensor 'enc_05_ff_conv2/linear/add_bias:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_05_ff_drop' output: Data(name='enc_05_ff_drop_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_05_ff_drop': <tf.Tensor 'enc_05_ff_conv2/enc_05_ff_conv2_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_05_ff_out' output: Data(name='enc_05_ff_out_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_05_ff_out': <tf.Tensor 'enc_05_ff_out/Add:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_05' output: Data(name='enc_05_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_05': <tf.Tensor 'enc_05_ff_out/enc_05_ff_out_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_06_self_att_laynorm' output: Data(name='enc_06_self_att_laynorm_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_06_self_att_laynorm': <tf.Tensor 'enc_06_self_att_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_06_self_att_att' output: Data(name='enc_06_self_att_att_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_06_self_att_att': <tf.Tensor 'enc_06_self_att_att/merge_vdim:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_06_self_att_lin' output: Data(name='enc_06_self_att_lin_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_06_self_att_lin': <tf.Tensor 'enc_06_self_att_lin/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_06_self_att_drop' output: Data(name='enc_06_self_att_drop_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_06_self_att_drop': <tf.Tensor 'enc_06_self_att_lin/enc_06_self_att_lin_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_06_self_att_out' output: Data(name='enc_06_self_att_out_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_06_self_att_out': <tf.Tensor 'enc_06_self_att_out/Add:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_06_ff_laynorm' output: Data(name='enc_06_ff_laynorm_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_06_ff_laynorm': <tf.Tensor 'enc_06_ff_laynorm/add:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_06_ff_conv1' output: Data(name='enc_06_ff_conv1_output', shape=(None, 2048))
- debug_add_check_numerics_on_output: add for layer 'enc_06_ff_conv1': <tf.Tensor 'enc_06_ff_conv1/activation/Relu:0' shape=(?, ?, 2048) dtype=float32>
- layer root/'enc_06_ff_conv2' output: Data(name='enc_06_ff_conv2_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_06_ff_conv2': <tf.Tensor 'enc_06_ff_conv2/linear/add_bias:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_06_ff_drop' output: Data(name='enc_06_ff_drop_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_06_ff_drop': <tf.Tensor 'enc_06_ff_conv2/enc_06_ff_conv2_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_06_ff_out' output: Data(name='enc_06_ff_out_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_06_ff_out': <tf.Tensor 'enc_06_ff_out/Add:0' shape=(?, ?, 512) dtype=float32>
- layer root/'enc_06' output: Data(name='enc_06_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'enc_06': <tf.Tensor 'enc_06_ff_out/enc_06_ff_out_identity_with_check_numerics_output/Identity:0' shape=(?, ?, 512) dtype=float32>
- layer root/'encoder' output: Data(name='encoder_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'encoder': <tf.Tensor 'encoder/add:0' shape=(?, ?, 512) dtype=float32>
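Each of the enc_01 … enc_06 blocks constructed above consists of the same eleven layers. A hedged sketch of what one such block looks like as RETURNN layer-dict entries, inferred from the layer names and shapes logged above, from the enc_05_self_att_att options visible in the traceback locals at the bottom, and from the analogous decoder entries dumped below; exact option values are assumptions carried over from those sources, and forward_weights_init is omitted for brevity.

# Hedged sketch of one Transformer encoder block (enc_01, fed by 'source_embed');
# option values assumed to mirror the decoder entries dumped further below.
enc_01_block = {
    'enc_01_self_att_laynorm': {'class': 'layer_norm', 'from': ['source_embed']},
    'enc_01_self_att_att': {'class': 'self_attention', 'from': ['enc_01_self_att_laynorm'],
                            'n_out': 512, 'num_heads': 8, 'total_key_dim': 512,
                            'attention_left_only': False, 'attention_dropout': 0.1},
    'enc_01_self_att_lin': {'class': 'linear', 'activation': None, 'with_bias': False,
                            'from': ['enc_01_self_att_att'], 'n_out': 512},
    'enc_01_self_att_drop': {'class': 'dropout', 'dropout': 0.1, 'from': ['enc_01_self_att_lin']},
    'enc_01_self_att_out': {'class': 'combine', 'kind': 'add', 'n_out': 512,
                            'from': ['source_embed', 'enc_01_self_att_drop']},
    'enc_01_ff_laynorm': {'class': 'layer_norm', 'from': ['enc_01_self_att_out']},
    'enc_01_ff_conv1': {'class': 'linear', 'activation': 'relu', 'with_bias': True,
                        'from': ['enc_01_ff_laynorm'], 'n_out': 2048},
    'enc_01_ff_conv2': {'class': 'linear', 'activation': None, 'with_bias': True,
                        'dropout': 0.1, 'from': ['enc_01_ff_conv1'], 'n_out': 512},
    'enc_01_ff_drop': {'class': 'dropout', 'dropout': 0.1, 'from': ['enc_01_ff_conv2']},
    'enc_01_ff_out': {'class': 'combine', 'kind': 'add', 'n_out': 512,
                      'from': ['enc_01_self_att_out', 'enc_01_ff_drop']},
    'enc_01': {'class': 'copy', 'from': ['enc_01_ff_out']},
}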
- layer root/'dec_01_att_key0' output: Data(name='dec_01_att_key0_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'dec_01_att_key0': <tf.Tensor 'dec_01_att_key0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
- layer root/'dec_01_att_key' output: Data(name='dec_01_att_key_output', shape=(None, 8, 64))
- debug_add_check_numerics_on_output: add for layer 'dec_01_att_key': <tf.Tensor 'dec_01_att_key/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
- layer root/'dec_03_att_key0' output: Data(name='dec_03_att_key0_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'dec_03_att_key0': <tf.Tensor 'dec_03_att_key0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
- layer root/'dec_03_att_key' output: Data(name='dec_03_att_key_output', shape=(None, 8, 64))
- debug_add_check_numerics_on_output: add for layer 'dec_03_att_key': <tf.Tensor 'dec_03_att_key/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
- layer root/'dec_04_att_value0' output: Data(name='dec_04_att_value0_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'dec_04_att_value0': <tf.Tensor 'dec_04_att_value0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
- layer root/'dec_04_att_value' output: Data(name='dec_04_att_value_output', shape=(None, 8, 64))
- debug_add_check_numerics_on_output: add for layer 'dec_04_att_value': <tf.Tensor 'dec_04_att_value/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
- layer root/'dec_06_att_key0' output: Data(name='dec_06_att_key0_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'dec_06_att_key0': <tf.Tensor 'dec_06_att_key0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
- layer root/'dec_06_att_key' output: Data(name='dec_06_att_key_output', shape=(None, 8, 64))
- debug_add_check_numerics_on_output: add for layer 'dec_06_att_key': <tf.Tensor 'dec_06_att_key/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
- layer root/'dec_06_att_value0' output: Data(name='dec_06_att_value0_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'dec_06_att_value0': <tf.Tensor 'dec_06_att_value0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
- layer root/'dec_06_att_value' output: Data(name='dec_06_att_value_output', shape=(None, 8, 64))
- debug_add_check_numerics_on_output: add for layer 'dec_06_att_value': <tf.Tensor 'dec_06_att_value/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
- layer root/'dec_03_att_value0' output: Data(name='dec_03_att_value0_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'dec_03_att_value0': <tf.Tensor 'dec_03_att_value0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
- layer root/'dec_03_att_value' output: Data(name='dec_03_att_value_output', shape=(None, 8, 64))
- debug_add_check_numerics_on_output: add for layer 'dec_03_att_value': <tf.Tensor 'dec_03_att_value/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
- layer root/'dec_05_att_value0' output: Data(name='dec_05_att_value0_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'dec_05_att_value0': <tf.Tensor 'dec_05_att_value0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
- layer root/'dec_05_att_value' output: Data(name='dec_05_att_value_output', shape=(None, 8, 64))
- debug_add_check_numerics_on_output: add for layer 'dec_05_att_value': <tf.Tensor 'dec_05_att_value/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
- layer root/'dec_02_att_key0' output: Data(name='dec_02_att_key0_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'dec_02_att_key0': <tf.Tensor 'dec_02_att_key0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
- layer root/'dec_02_att_key' output: Data(name='dec_02_att_key_output', shape=(None, 8, 64))
- debug_add_check_numerics_on_output: add for layer 'dec_02_att_key': <tf.Tensor 'dec_02_att_key/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
- layer root/'dec_01_att_value0' output: Data(name='dec_01_att_value0_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'dec_01_att_value0': <tf.Tensor 'dec_01_att_value0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
- layer root/'dec_01_att_value' output: Data(name='dec_01_att_value_output', shape=(None, 8, 64))
- debug_add_check_numerics_on_output: add for layer 'dec_01_att_value': <tf.Tensor 'dec_01_att_value/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
- layer root/'dec_04_att_key0' output: Data(name='dec_04_att_key0_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'dec_04_att_key0': <tf.Tensor 'dec_04_att_key0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
- layer root/'dec_04_att_key' output: Data(name='dec_04_att_key_output', shape=(None, 8, 64))
- debug_add_check_numerics_on_output: add for layer 'dec_04_att_key': <tf.Tensor 'dec_04_att_key/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
- layer root/'dec_05_att_key0' output: Data(name='dec_05_att_key0_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'dec_05_att_key0': <tf.Tensor 'dec_05_att_key0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
- layer root/'dec_05_att_key' output: Data(name='dec_05_att_key_output', shape=(None, 8, 64))
- debug_add_check_numerics_on_output: add for layer 'dec_05_att_key': <tf.Tensor 'dec_05_att_key/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
- layer root/'dec_02_att_value0' output: Data(name='dec_02_att_value0_output', shape=(None, 512))
- debug_add_check_numerics_on_output: add for layer 'dec_02_att_value0': <tf.Tensor 'dec_02_att_value0/linear/dot/Reshape_1:0' shape=(?, ?, 512) dtype=float32>
- layer root/'dec_02_att_value' output: Data(name='dec_02_att_value_output', shape=(None, 8, 64))
- debug_add_check_numerics_on_output: add for layer 'dec_02_att_value': <tf.Tensor 'dec_02_att_value/Reshape:0' shape=(?, ?, 8, 64) dtype=float32>
- layer root/'output' output: Data(name='output_output', shape=(None,), dtype='int32', sparse=True, dim=34908, batch_dim_axis=1, beam_size=12)
- Rec layer sub net:
- Input layers moved out of loop: (#: 1)
- encoder_int
- Output layers moved out of loop: (#: 0)
- None
- Layers in loop: (#: 142)
- end
- output
- output_prob
- dec_06_att_weights
- dec_06_att_energy
- dec_06_att_query
- dec_06_att_query0
- dec_06_att_laynorm
- dec_06_self_att_out
- dec_05
- dec_05_ff_out
- dec_05_att_out
- dec_05_att_drop
- dec_05_att_lin
- dec_05_att_att
- dec_05_att0
- dec_05_att_weights_drop
- dec_05_att_weights
- dec_05_att_energy
- dec_05_att_query
- dec_05_att_query0
- dec_05_att_laynorm
- dec_05_self_att_out
- dec_04
- dec_04_ff_out
- dec_04_att_out
- dec_04_att_drop
- dec_04_att_lin
- dec_04_att_att
- dec_04_att0
- dec_04_att_weights_drop
- dec_04_att_weights
- dec_04_att_energy
- dec_04_att_query
- dec_04_att_query0
- dec_04_att_laynorm
- dec_04_self_att_out
- dec_03
- dec_03_ff_out
- dec_03_att_out
- dec_03_att_drop
- dec_03_att_lin
- dec_03_att_att
- dec_03_att0
- dec_03_att_weights_drop
- dec_03_att_weights
- dec_03_att_energy
- dec_03_att_query
- dec_03_att_query0
- dec_03_att_laynorm
- dec_03_self_att_out
- dec_02
- dec_02_ff_out
- dec_02_att_out
- dec_02_att_drop
- dec_02_att_lin
- dec_02_att_att
- dec_02_att0
- dec_02_att_weights_drop
- dec_02_att_weights
- dec_02_att_energy
- dec_02_att_query
- dec_02_att_query0
- dec_02_att_laynorm
- dec_02_self_att_out
- dec_01
- dec_01_ff_out
- dec_01_att_out
- dec_01_att_drop
- dec_01_att_lin
- dec_01_att_att
- dec_01_att0
- dec_01_att_weights_drop
- dec_01_att_weights
- dec_01_att_energy
- dec_01_att_query
- dec_01_att_query0
- dec_01_att_laynorm
- dec_01_self_att_out
- dec_01_self_att_drop
- dec_01_self_att_lin
- dec_01_self_att_att
- dec_01_self_att_laynorm
- target_embed
- target_embed_with_pos
- target_embed_weighted
- target_embed_raw
- dec_01_ff_drop
- dec_01_ff_conv2
- dec_01_ff_conv1
- dec_01_ff_laynorm
- dec_02_self_att_drop
- dec_02_self_att_lin
- dec_02_self_att_att
- dec_02_self_att_laynorm
- dec_02_ff_drop
- dec_02_ff_conv2
- dec_02_ff_conv1
- dec_02_ff_laynorm
- dec_03_self_att_drop
- dec_03_self_att_lin
- dec_03_self_att_att
- dec_03_self_att_laynorm
- dec_03_ff_drop
- dec_03_ff_conv2
- dec_03_ff_conv1
- dec_03_ff_laynorm
- dec_04_self_att_drop
- dec_04_self_att_lin
- dec_04_self_att_att
- dec_04_self_att_laynorm
- dec_04_ff_drop
- dec_04_ff_conv2
- dec_04_ff_conv1
- dec_04_ff_laynorm
- dec_05_self_att_drop
- dec_05_self_att_lin
- dec_05_self_att_att
- dec_05_self_att_laynorm
- dec_05_ff_drop
- dec_05_ff_conv2
- dec_05_ff_conv1
- dec_05_ff_laynorm
- dec_06_self_att_drop
- dec_06_self_att_lin
- dec_06_self_att_att
- dec_06_self_att_laynorm
- decoder_int
- decoder
- dec_06
- dec_06_ff_out
- dec_06_att_out
- dec_06_att_drop
- dec_06_att_lin
- dec_06_att_att
- dec_06_att0
- dec_06_att_weights_drop
- dec_06_ff_drop
- dec_06_ff_conv2
- dec_06_ff_conv1
- dec_06_ff_laynorm
- prev_outputs_int
- Unused layers: (#: 0)
- None
- layer root/output:rec-subnet-input/'encoder_int' output: Data(name='encoder_int_output', shape=(None, 1000))
- debug_add_check_numerics_on_output: add for layer 'encoder_int': <tf.Tensor 'output/rec/encoder_int/linear/dot/Reshape_1:0' shape=(?, ?, 1000) dtype=float32>
- Exception creating layer root/'output' of class RecLayer with opts:
- {'max_seq_len': <tf.Tensor 'mul:0' shape=() dtype=int32>,
- 'n_out': None,
- 'name': 'output',
- 'network': <TFNetwork 'root' train=False search>,
- 'output': Data(name='output_output', shape=(None,), dtype='int32', sparse=True, dim=34908, batch_dim_axis=1, beam_size=12),
- 'sources': [],
- 'target': 'classes',
- 'unit': {'dec_01': {'class': 'copy', 'from': ['dec_01_ff_out']},
- 'dec_01_att0': {'base': 'base:dec_01_att_value',
- 'class': 'generic_attention',
- 'weights': 'dec_01_att_weights_drop'},
- 'dec_01_att_att': {'axes': 'static',
- 'class': 'merge_dims',
- 'from': ['dec_01_att0']},
- 'dec_01_att_drop': {'class': 'dropout',
- 'dropout': 0.1,
- 'from': ['dec_01_att_lin']},
- 'dec_01_att_energy': {'class': 'dot',
- 'from': ['base:dec_01_att_key',
- 'dec_01_att_query'],
- 'red1': -1,
- 'red2': -1,
- 'var1': 'T',
- 'var2': 'T?'},
- 'dec_01_att_laynorm': {'class': 'layer_norm',
- 'from': ['dec_01_self_att_out']},
- 'dec_01_att_lin': {'activation': None,
- 'class': 'linear',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_01_att_att'],
- 'n_out': 512,
- 'with_bias': False},
- 'dec_01_att_out': {'class': 'combine',
- 'from': ['dec_01_self_att_out', 'dec_01_att_drop'],
- 'kind': 'add',
- 'n_out': 512},
- 'dec_01_att_query': {'axis': 'F',
- 'class': 'split_dims',
- 'dims': (8, 64),
- 'from': ['dec_01_att_query0']},
- 'dec_01_att_query0': {'activation': None,
- 'class': 'linear',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_01_att_laynorm'],
- 'n_out': 512,
- 'with_bias': False},
- 'dec_01_att_weights': {'class': 'softmax_over_spatial',
- 'energy_factor': 0.125,
- 'from': ['dec_01_att_energy']},
- 'dec_01_att_weights_drop': {'class': 'dropout',
- 'dropout': 0.1,
- 'dropout_noise_shape': {'*': None},
- 'from': ['dec_01_att_weights']},
- 'dec_01_ff_conv1': {'activation': 'relu',
- 'class': 'linear',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_01_ff_laynorm'],
- 'n_out': 2048,
- 'with_bias': True},
- 'dec_01_ff_conv2': {'activation': None,
- 'class': 'linear',
- 'dropout': 0.1,
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_01_ff_conv1'],
- 'n_out': 512,
- 'with_bias': True},
- 'dec_01_ff_drop': {'class': 'dropout',
- 'dropout': 0.1,
- 'from': ['dec_01_ff_conv2']},
- 'dec_01_ff_laynorm': {'class': 'layer_norm',
- 'from': ['dec_01_att_out']},
- 'dec_01_ff_out': {'class': 'combine',
- 'from': ['dec_01_att_out', 'dec_01_ff_drop'],
- 'kind': 'add',
- 'n_out': 512},
- 'dec_01_self_att_att': {'attention_dropout': 0.1,
- 'attention_left_only': True,
- 'class': 'self_attention',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_01_self_att_laynorm'],
- 'n_out': 512,
- 'num_heads': 8,
- 'total_key_dim': 512},
- 'dec_01_self_att_drop': {'class': 'dropout',
- 'dropout': 0.1,
- 'from': ['dec_01_self_att_lin']},
- 'dec_01_self_att_laynorm': {'class': 'layer_norm',
- 'from': ['target_embed']},
- 'dec_01_self_att_lin': {'activation': None,
- 'class': 'linear',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_01_self_att_att'],
- 'n_out': 512,
- 'with_bias': False},
- 'dec_01_self_att_out': {'class': 'combine',
- 'from': ['target_embed',
- 'dec_01_self_att_drop'],
- 'kind': 'add',
- 'n_out': 512},
- 'dec_02': {'class': 'copy', 'from': ['dec_02_ff_out']},
- 'dec_02_att0': {'base': 'base:dec_02_att_value',
- 'class': 'generic_attention',
- 'weights': 'dec_02_att_weights_drop'},
- 'dec_02_att_att': {'axes': 'static',
- 'class': 'merge_dims',
- 'from': ['dec_02_att0']},
- 'dec_02_att_drop': {'class': 'dropout',
- 'dropout': 0.1,
- 'from': ['dec_02_att_lin']},
- 'dec_02_att_energy': {'class': 'dot',
- 'from': ['base:dec_02_att_key',
- 'dec_02_att_query'],
- 'red1': -1,
- 'red2': -1,
- 'var1': 'T',
- 'var2': 'T?'},
- 'dec_02_att_laynorm': {'class': 'layer_norm',
- 'from': ['dec_02_self_att_out']},
- 'dec_02_att_lin': {'activation': None,
- 'class': 'linear',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_02_att_att'],
- 'n_out': 512,
- 'with_bias': False},
- 'dec_02_att_out': {'class': 'combine',
- 'from': ['dec_02_self_att_out', 'dec_02_att_drop'],
- 'kind': 'add',
- 'n_out': 512},
- 'dec_02_att_query': {'axis': 'F',
- 'class': 'split_dims',
- 'dims': (8, 64),
- 'from': ['dec_02_att_query0']},
- 'dec_02_att_query0': {'activation': None,
- 'class': 'linear',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_02_att_laynorm'],
- 'n_out': 512,
- 'with_bias': False},
- 'dec_02_att_weights': {'class': 'softmax_over_spatial',
- 'energy_factor': 0.125,
- 'from': ['dec_02_att_energy']},
- 'dec_02_att_weights_drop': {'class': 'dropout',
- 'dropout': 0.1,
- 'dropout_noise_shape': {'*': None},
- 'from': ['dec_02_att_weights']},
- 'dec_02_ff_conv1': {'activation': 'relu',
- 'class': 'linear',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_02_ff_laynorm'],
- 'n_out': 2048,
- 'with_bias': True},
- 'dec_02_ff_conv2': {'activation': None,
- 'class': 'linear',
- 'dropout': 0.1,
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_02_ff_conv1'],
- 'n_out': 512,
- 'with_bias': True},
- 'dec_02_ff_drop': {'class': 'dropout',
- 'dropout': 0.1,
- 'from': ['dec_02_ff_conv2']},
- 'dec_02_ff_laynorm': {'class': 'layer_norm',
- 'from': ['dec_02_att_out']},
- 'dec_02_ff_out': {'class': 'combine',
- 'from': ['dec_02_att_out', 'dec_02_ff_drop'],
- 'kind': 'add',
- 'n_out': 512},
- 'dec_02_self_att_att': {'attention_dropout': 0.1,
- 'attention_left_only': True,
- 'class': 'self_attention',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_02_self_att_laynorm'],
- 'n_out': 512,
- 'num_heads': 8,
- 'total_key_dim': 512},
- 'dec_02_self_att_drop': {'class': 'dropout',
- 'dropout': 0.1,
- 'from': ['dec_02_self_att_lin']},
- 'dec_02_self_att_laynorm': {'class': 'layer_norm',
- 'from': ['dec_01']},
- 'dec_02_self_att_lin': {'activation': None,
- 'class': 'linear',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_02_self_att_att'],
- 'n_out': 512,
- 'with_bias': False},
- 'dec_02_self_att_out': {'class': 'combine',
- 'from': ['dec_01', 'dec_02_self_att_drop'],
- 'kind': 'add',
- 'n_out': 512},
- 'dec_03': {'class': 'copy', 'from': ['dec_03_ff_out']},
- 'dec_03_att0': {'base': 'base:dec_03_att_value',
- 'class': 'generic_attention',
- 'weights': 'dec_03_att_weights_drop'},
- 'dec_03_att_att': {'axes': 'static',
- 'class': 'merge_dims',
- 'from': ['dec_03_att0']},
- 'dec_03_att_drop': {'class': 'dropout',
- 'dropout': 0.1,
- 'from': ['dec_03_att_lin']},
- 'dec_03_att_energy': {'class': 'dot',
- 'from': ['base:dec_03_att_key',
- 'dec_03_att_query'],
- 'red1': -1,
- 'red2': -1,
- 'var1': 'T',
- 'var2': 'T?'},
- 'dec_03_att_laynorm': {'class': 'layer_norm',
- 'from': ['dec_03_self_att_out']},
- 'dec_03_att_lin': {'activation': None,
- 'class': 'linear',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_03_att_att'],
- 'n_out': 512,
- 'with_bias': False},
- 'dec_03_att_out': {'class': 'combine',
- 'from': ['dec_03_self_att_out', 'dec_03_att_drop'],
- 'kind': 'add',
- 'n_out': 512},
- 'dec_03_att_query': {'axis': 'F',
- 'class': 'split_dims',
- 'dims': (8, 64),
- 'from': ['dec_03_att_query0']},
- 'dec_03_att_query0': {'activation': None,
- 'class': 'linear',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_03_att_laynorm'],
- 'n_out': 512,
- 'with_bias': False},
- 'dec_03_att_weights': {'class': 'softmax_over_spatial',
- 'energy_factor': 0.125,
- 'from': ['dec_03_att_energy']},
- 'dec_03_att_weights_drop': {'class': 'dropout',
- 'dropout': 0.1,
- 'dropout_noise_shape': {'*': None},
- 'from': ['dec_03_att_weights']},
- 'dec_03_ff_conv1': {'activation': 'relu',
- 'class': 'linear',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_03_ff_laynorm'],
- 'n_out': 2048,
- 'with_bias': True},
- 'dec_03_ff_conv2': {'activation': None,
- 'class': 'linear',
- 'dropout': 0.1,
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_03_ff_conv1'],
- 'n_out': 512,
- 'with_bias': True},
- 'dec_03_ff_drop': {'class': 'dropout',
- 'dropout': 0.1,
- 'from': ['dec_03_ff_conv2']},
- 'dec_03_ff_laynorm': {'class': 'layer_norm',
- 'from': ['dec_03_att_out']},
- 'dec_03_ff_out': {'class': 'combine',
- 'from': ['dec_03_att_out', 'dec_03_ff_drop'],
- 'kind': 'add',
- 'n_out': 512},
- 'dec_03_self_att_att': {'attention_dropout': 0.1,
- 'attention_left_only': True,
- 'class': 'self_attention',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_03_self_att_laynorm'],
- 'n_out': 512,
- 'num_heads': 8,
- 'total_key_dim': 512},
- 'dec_03_self_att_drop': {'class': 'dropout',
- 'dropout': 0.1,
- 'from': ['dec_03_self_att_lin']},
- 'dec_03_self_att_laynorm': {'class': 'layer_norm',
- 'from': ['dec_02']},
- 'dec_03_self_att_lin': {'activation': None,
- 'class': 'linear',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_03_self_att_att'],
- 'n_out': 512,
- 'with_bias': False},
- 'dec_03_self_att_out': {'class': 'combine',
- 'from': ['dec_02', 'dec_03_self_att_drop'],
- 'kind': 'add',
- 'n_out': 512},
- 'dec_04': {'class': 'copy', 'from': ['dec_04_ff_out']},
- 'dec_04_att0': {'base': 'base:dec_04_att_value',
- 'class': 'generic_attention',
- 'weights': 'dec_04_att_weights_drop'},
- 'dec_04_att_att': {'axes': 'static',
- 'class': 'merge_dims',
- 'from': ['dec_04_att0']},
- 'dec_04_att_drop': {'class': 'dropout',
- 'dropout': 0.1,
- 'from': ['dec_04_att_lin']},
- 'dec_04_att_energy': {'class': 'dot',
- 'from': ['base:dec_04_att_key',
- 'dec_04_att_query'],
- 'red1': -1,
- 'red2': -1,
- 'var1': 'T',
- 'var2': 'T?'},
- 'dec_04_att_laynorm': {'class': 'layer_norm',
- 'from': ['dec_04_self_att_out']},
- 'dec_04_att_lin': {'activation': None,
- 'class': 'linear',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_04_att_att'],
- 'n_out': 512,
- 'with_bias': False},
- 'dec_04_att_out': {'class': 'combine',
- 'from': ['dec_04_self_att_out', 'dec_04_att_drop'],
- 'kind': 'add',
- 'n_out': 512},
- 'dec_04_att_query': {'axis': 'F',
- 'class': 'split_dims',
- 'dims': (8, 64),
- 'from': ['dec_04_att_query0']},
- 'dec_04_att_query0': {'activation': None,
- 'class': 'linear',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_04_att_laynorm'],
- 'n_out': 512,
- 'with_bias': False},
- 'dec_04_att_weights': {'class': 'softmax_over_spatial',
- 'energy_factor': 0.125,
- 'from': ['dec_04_att_energy']},
- 'dec_04_att_weights_drop': {'class': 'dropout',
- 'dropout': 0.1,
- 'dropout_noise_shape': {'*': None},
- 'from': ['dec_04_att_weights']},
- 'dec_04_ff_conv1': {'activation': 'relu',
- 'class': 'linear',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_04_ff_laynorm'],
- 'n_out': 2048,
- 'with_bias': True},
- 'dec_04_ff_conv2': {'activation': None,
- 'class': 'linear',
- 'dropout': 0.1,
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_04_ff_conv1'],
- 'n_out': 512,
- 'with_bias': True},
- 'dec_04_ff_drop': {'class': 'dropout',
- 'dropout': 0.1,
- 'from': ['dec_04_ff_conv2']},
- 'dec_04_ff_laynorm': {'class': 'layer_norm',
- 'from': ['dec_04_att_out']},
- 'dec_04_ff_out': {'class': 'combine',
- 'from': ['dec_04_att_out', 'dec_04_ff_drop'],
- 'kind': 'add',
- 'n_out': 512},
- 'dec_04_self_att_att': {'attention_dropout': 0.1,
- 'attention_left_only': True,
- 'class': 'self_attention',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_04_self_att_laynorm'],
- 'n_out': 512,
- 'num_heads': 8,
- 'total_key_dim': 512},
- 'dec_04_self_att_drop': {'class': 'dropout',
- 'dropout': 0.1,
- 'from': ['dec_04_self_att_lin']},
- 'dec_04_self_att_laynorm': {'class': 'layer_norm',
- 'from': ['dec_03']},
- 'dec_04_self_att_lin': {'activation': None,
- 'class': 'linear',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_04_self_att_att'],
- 'n_out': 512,
- 'with_bias': False},
- 'dec_04_self_att_out': {'class': 'combine',
- 'from': ['dec_03', 'dec_04_self_att_drop'],
- 'kind': 'add',
- 'n_out': 512},
- 'dec_05': {'class': 'copy', 'from': ['dec_05_ff_out']},
- 'dec_05_att0': {'base': 'base:dec_05_att_value',
- 'class': 'generic_attention',
- 'weights': 'dec_05_att_weights_drop'},
- 'dec_05_att_att': {'axes': 'static',
- 'class': 'merge_dims',
- 'from': ['dec_05_att0']},
- 'dec_05_att_drop': {'class': 'dropout',
- 'dropout': 0.1,
- 'from': ['dec_05_att_lin']},
- 'dec_05_att_energy': {'class': 'dot',
- 'from': ['base:dec_05_att_key',
- 'dec_05_att_query'],
- 'red1': -1,
- 'red2': -1,
- 'var1': 'T',
- 'var2': 'T?'},
- 'dec_05_att_laynorm': {'class': 'layer_norm',
- 'from': ['dec_05_self_att_out']},
- 'dec_05_att_lin': {'activation': None,
- 'class': 'linear',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_05_att_att'],
- 'n_out': 512,
- 'with_bias': False},
- 'dec_05_att_out': {'class': 'combine',
- 'from': ['dec_05_self_att_out', 'dec_05_att_drop'],
- 'kind': 'add',
- 'n_out': 512},
- 'dec_05_att_query': {'axis': 'F',
- 'class': 'split_dims',
- 'dims': (8, 64),
- 'from': ['dec_05_att_query0']},
- 'dec_05_att_query0': {'activation': None,
- 'class': 'linear',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_05_att_laynorm'],
- 'n_out': 512,
- 'with_bias': False},
- 'dec_05_att_weights': {'class': 'softmax_over_spatial',
- 'energy_factor': 0.125,
- 'from': ['dec_05_att_energy']},
- 'dec_05_att_weights_drop': {'class': 'dropout',
- 'dropout': 0.1,
- 'dropout_noise_shape': {'*': None},
- 'from': ['dec_05_att_weights']},
- 'dec_05_ff_conv1': {'activation': 'relu',
- 'class': 'linear',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_05_ff_laynorm'],
- 'n_out': 2048,
- 'with_bias': True},
- 'dec_05_ff_conv2': {'activation': None,
- 'class': 'linear',
- 'dropout': 0.1,
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_05_ff_conv1'],
- 'n_out': 512,
- 'with_bias': True},
- 'dec_05_ff_drop': {'class': 'dropout',
- 'dropout': 0.1,
- 'from': ['dec_05_ff_conv2']},
- 'dec_05_ff_laynorm': {'class': 'layer_norm',
- 'from': ['dec_05_att_out']},
- 'dec_05_ff_out': {'class': 'combine',
- 'from': ['dec_05_att_out', 'dec_05_ff_drop'],
- 'kind': 'add',
- 'n_out': 512},
- 'dec_05_self_att_att': {'attention_dropout': 0.1,
- 'attention_left_only': True,
- 'class': 'self_attention',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_05_self_att_laynorm'],
- 'n_out': 512,
- 'num_heads': 8,
- 'total_key_dim': 512},
- 'dec_05_self_att_drop': {'class': 'dropout',
- 'dropout': 0.1,
- 'from': ['dec_05_self_att_lin']},
- 'dec_05_self_att_laynorm': {'class': 'layer_norm',
- 'from': ['dec_04']},
- 'dec_05_self_att_lin': {'activation': None,
- 'class': 'linear',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_05_self_att_att'],
- 'n_out': 512,
- 'with_bias': False},
- 'dec_05_self_att_out': {'class': 'combine',
- 'from': ['dec_04', 'dec_05_self_att_drop'],
- 'kind': 'add',
- 'n_out': 512},
- 'dec_06': {'class': 'copy', 'from': ['dec_06_ff_out']},
- 'dec_06_att0': {'base': 'base:dec_06_att_value',
- 'class': 'generic_attention',
- 'weights': 'dec_06_att_weights_drop'},
- 'dec_06_att_att': {'axes': 'static',
- 'class': 'merge_dims',
- 'from': ['dec_06_att0']},
- 'dec_06_att_drop': {'class': 'dropout',
- 'dropout': 0.1,
- 'from': ['dec_06_att_lin']},
- 'dec_06_att_energy': {'class': 'dot',
- 'from': ['base:dec_06_att_key',
- 'dec_06_att_query'],
- 'red1': -1,
- 'red2': -1,
- 'var1': 'T',
- 'var2': 'T?'},
- 'dec_06_att_laynorm': {'class': 'layer_norm',
- 'from': ['dec_06_self_att_out']},
- 'dec_06_att_lin': {'activation': None,
- 'class': 'linear',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_06_att_att'],
- 'n_out': 512,
- 'with_bias': False},
- 'dec_06_att_out': {'class': 'combine',
- 'from': ['dec_06_self_att_out', 'dec_06_att_drop'],
- 'kind': 'add',
- 'n_out': 512},
- 'dec_06_att_query': {'axis': 'F',
- 'class': 'split_dims',
- 'dims': (8, 64),
- 'from': ['dec_06_att_query0']},
- 'dec_06_att_query0': {'activation': None,
- 'class': 'linear',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_06_att_laynorm'],
- 'n_out': 512,
- 'with_bias': False},
- 'dec_06_att_weights': {'class': 'softmax_over_spatial',
- 'energy_factor': 0.125,
- 'from': ['dec_06_att_energy']},
- 'dec_06_att_weights_drop': {'class': 'dropout',
- 'dropout': 0.1,
- 'dropout_noise_shape': {'*': None},
- 'from': ['dec_06_att_weights']},
- 'dec_06_ff_conv1': {'activation': 'relu',
- 'class': 'linear',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_06_ff_laynorm'],
- 'n_out': 2048,
- 'with_bias': True},
- 'dec_06_ff_conv2': {'activation': None,
- 'class': 'linear',
- 'dropout': 0.1,
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_06_ff_conv1'],
- 'n_out': 512,
- 'with_bias': True},
- 'dec_06_ff_drop': {'class': 'dropout',
- 'dropout': 0.1,
- 'from': ['dec_06_ff_conv2']},
- 'dec_06_ff_laynorm': {'class': 'layer_norm',
- 'from': ['dec_06_att_out']},
- 'dec_06_ff_out': {'class': 'combine',
- 'from': ['dec_06_att_out', 'dec_06_ff_drop'],
- 'kind': 'add',
- 'n_out': 512},
- 'dec_06_self_att_att': {'attention_dropout': 0.1,
- 'attention_left_only': True,
- 'class': 'self_attention',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_06_self_att_laynorm'],
- 'n_out': 512,
- 'num_heads': 8,
- 'total_key_dim': 512},
- 'dec_06_self_att_drop': {'class': 'dropout',
- 'dropout': 0.1,
- 'from': ['dec_06_self_att_lin']},
- 'dec_06_self_att_laynorm': {'class': 'layer_norm',
- 'from': ['dec_05']},
- 'dec_06_self_att_lin': {'activation': None,
- 'class': 'linear',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['dec_06_self_att_att'],
- 'n_out': 512,
- 'with_bias': False},
- 'dec_06_self_att_out': {'class': 'combine',
- 'from': ['dec_05', 'dec_06_self_att_drop'],
- 'kind': 'add',
- 'n_out': 512},
- 'decoder': {'class': 'layer_norm', 'from': ['dec_06'], 'n_out': 512},
- 'decoder_int': {'activation': None,
- 'class': 'linear',
- 'from': ['decoder'],
- 'n_out': 1000,
- 'with_bias': False},
- 'encoder_int': {'activation': None,
- 'class': 'linear',
- 'from': ['base:encoder'],
- 'n_out': 1000,
- 'with_bias': False},
- 'end': {'class': 'compare', 'from': ['output'], 'value': 0},
- 'output': {'beam_size': 12,
- 'class': 'choice',
- 'from': ['output_prob'],
- 'initial_output': 0,
- 'target': 'classes'},
- 'output_prob': {'attention_weights': 'dec_06_att_weights',
- 'base_encoder_transformed': 'encoder_int',
- 'class': 'hmm_factorization',
- 'debug': False,
- 'from': 'dec_06_att_weights',
- 'loss': 'ce',
- 'n_out': 34908,
- 'prev_outputs': 'prev_outputs_int',
- 'prev_state': 'decoder_int',
- 'target': 'classes',
- 'threshold': None,
- 'transpose_and_average_att_weights': True},
- 'prev_outputs_int': {'activation': None,
- 'class': 'linear',
- 'from': ['prev:target_embed_raw'],
- 'n_out': 1000,
- 'with_bias': False},
- 'target_embed': {'class': 'dropout',
- 'dropout': 0.0,
- 'from': ['target_embed_with_pos']},
- 'target_embed_raw': {'activation': None,
- 'class': 'linear',
- 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', "
- "distribution='uniform', "
- 'scale=0.78)',
- 'from': ['prev:output'],
- 'n_out': 512,
- 'with_bias': False},
- 'target_embed_weighted': {'class': 'eval',
- 'eval': 'source(0) * 22.627417',
- 'from': ['target_embed_raw']},
- 'target_embed_with_pos': {'add_to_input': True,
- 'class': 'positional_encoding',
- 'from': ['target_embed_weighted']}}}
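The opts above include an 'output_prob' layer of class 'hmm_factorization' that combines the layer-6 attention weights with the 1000-dim encoder_int, decoder_int and prev_outputs_int projections. As rough orientation only, below is a minimal NumPy sketch of one common attention-as-alignment factorization, p(y_i | y_<i, x) = sum_j alpha_{i,j} * p(y_i | enc_j, dec_i, y_{i-1}); this formulation and the additive feature combination are assumptions, not taken from this log, and W_lex is a hypothetical placeholder projection. The sketch assumes the 8-head attention weights have already been averaged, which the 'transpose_and_average_att_weights': True option above suggests.

import numpy as np

# Hedged sketch (not the actual hmm_factorization layer code): marginalize
# per-source-position word distributions with the attention weights.
def hmm_factorized_step(att_weights, encoder_int, decoder_int, prev_outputs_int, W_lex):
    # att_weights:      (T, B)        averaged attention weights for the current step
    # encoder_int:      (T, B, 1000)  projected encoder states
    # decoder_int:      (B, 1000)     projected decoder state for this step
    # prev_outputs_int: (B, 1000)     projected previous target embedding
    # W_lex:            (1000, 34908) hypothetical output projection (34908 = target vocab above)
    feats = encoder_int + decoder_int[None, :, :] + prev_outputs_int[None, :, :]   # (T, B, 1000)
    logits = feats @ W_lex                                                         # (T, B, 34908)
    lex = np.exp(logits - logits.max(axis=-1, keepdims=True))
    lex /= lex.sum(axis=-1, keepdims=True)                  # per-source-position word distributions
    return (att_weights[:, :, None] * lex).sum(axis=0)      # marginalize over source positions -> (B, 34908)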
- EXCEPTION
- Traceback (most recent call last):
- File "/u/makarov/returnn-hmm-fac/rnn.py", line 591, in <module>
- line: main(sys.argv)
- locals:
- main = <local> <function main at 0x7fe7b0180bf8>
- sys = <local> <module 'sys' (built-in)>
- sys.argv = <local> ['/u/makarov/returnn-hmm-fac/rnn.py', 'hmm-factorization/en-de/transformer-hmm', '++load_epoch', '114', '++device', 'gpu', '--task', 'search', '++search_data', 'config:dev', '++beam_size', '12', '++need_data', 'False', '++max_seq_length', '0', '++search_output_file', 'hmm-factorization/en-de/hyp/..., len = 20, _[0]: {len = 33}
- File "/u/makarov/returnn-hmm-fac/rnn.py", line 579, in main
- line: executeMainTask()
- locals:
- executeMainTask = <global> <function executeMainTask at 0x7fe7b0180ae8>
- File "/u/makarov/returnn-hmm-fac/rnn.py", line 434, in executeMainTask
- line: engine.init_network_from_config(config)
- locals:
- engine = <global> <TFEngine.Engine object at 0x7fe8124a5e48>
- engine.init_network_from_config = <global> <bound method Engine.init_network_from_config of <TFEngine.Engine object at 0x7fe8124a5e48>>
- config = <global> <Config.Config object at 0x7fe810aca080>
- File "/u/makarov/returnn-hmm-fac/TFEngine.py", line 936, in init_network_from_config
- line: self._init_network(net_desc=net_dict, epoch=self.epoch)
- locals:
- self = <local> <TFEngine.Engine object at 0x7fe8124a5e48>
- self._init_network = <local> <bound method Engine._init_network of <TFEngine.Engine object at 0x7fe8124a5e48>>
- net_desc = <not found>
- net_dict = <local> {'enc_05_self_att_att': {'total_key_dim': 512, 'from': ['enc_05_self_att_laynorm'], 'class': 'self_attention', 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=0.78)", 'num_heads': 8, 'attention_left_only': False, 'attention_dropout': 0.1, 'n_out'..., len = 97
- epoch = <local> 114
- self.epoch = <local> 114
- File "/u/makarov/returnn-hmm-fac/TFEngine.py", line 1059, in _init_network
- line: self.network, self.updater = self.create_network(
- config=self.config,
- rnd_seed=net_random_seed,
- train_flag=train_flag, eval_flag=self.use_eval_flag, search_flag=self.use_search_flag,
- initial_learning_rate=getattr(self, "initial_learning_rate", None),
- net_dict=net_desc)
- locals:
- self = <local> <TFEngine.Engine object at 0x7fe8124a5e48>
- self.network = <local> None
- self.updater = <local> None
- self.create_network = <local> <bound method Engine.create_network of <class 'TFEngine.Engine'>>
- config = <not found>
- self.config = <local> <Config.Config object at 0x7fe810aca080>
- rnd_seed = <not found>
- net_random_seed = <local> 114
- train_flag = <local> False
- eval_flag = <not found>
- self.use_eval_flag = <local> True
- search_flag = <not found>
- self.use_search_flag = <local> True
- initial_learning_rate = <not found>
- getattr = <builtin> <built-in function getattr>
- net_dict = <not found>
- net_desc = <local> {'enc_05_self_att_att': {'total_key_dim': 512, 'from': ['enc_05_self_att_laynorm'], 'class': 'self_attention', 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=0.78)", 'num_heads': 8, 'attention_left_only': False, 'attention_dropout': 0.1, 'n_out'..., len = 97
- File "/u/makarov/returnn-hmm-fac/TFEngine.py", line 1090, in create_network
- line: network.construct_from_dict(net_dict)
- locals:
- network = <local> <TFNetwork 'root' train=False search>
- network.construct_from_dict = <local> <bound method TFNetwork.construct_from_dict of <TFNetwork 'root' train=False search>>
- net_dict = <local> {'enc_05_self_att_att': {'total_key_dim': 512, 'from': ['enc_05_self_att_laynorm'], 'class': 'self_attention', 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=0.78)", 'num_heads': 8, 'attention_left_only': False, 'attention_dropout': 0.1, 'n_out'..., len = 97
- File "/u/makarov/returnn-hmm-fac/TFNetwork.py", line 338, in construct_from_dict
- line: self.construct_layer(net_dict, name)
- locals:
- self = <local> <TFNetwork 'root' train=False search>
- self.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root' train=False search>>
- net_dict = <local> {'enc_05_self_att_att': {'total_key_dim': 512, 'from': ['enc_05_self_att_laynorm'], 'class': 'self_attention', 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=0.78)", 'num_heads': 8, 'attention_left_only': False, 'attention_dropout': 0.1, 'n_out'..., len = 97
- name = <local> 'decision', len = 8
- File "/u/makarov/returnn-hmm-fac/TFNetwork.py", line 407, in construct_layer
- line: layer_class.transform_config_dict(layer_desc, network=self, get_layer=get_layer)
- locals:
- layer_class = <local> <class 'TFNetworkRecLayer.DecideLayer'>
- layer_class.transform_config_dict = <local> <bound method LayerBase.transform_config_dict of <class 'TFNetworkRecLayer.DecideLayer'>>
- layer_desc = <local> {'loss_opts': {}, 'target': 'classes', 'loss': 'edit_distance'}
- network = <not found>
- self = <local> <TFNetwork 'root' train=False search>
- get_layer = <local> <function TFNetwork.construct_layer.<locals>.get_layer at 0x7fe7603061e0>
- File "/u/makarov/returnn-hmm-fac/TFNetworkLayer.py", line 358, in transform_config_dict
- line: for src_name in src_names
- locals:
- src_name = <not found>
- src_names = <local> ['output'], _[0]: {len = 6}
- File "/u/makarov/returnn-hmm-fac/TFNetworkLayer.py", line 359, in <listcomp>
- line: d["sources"] = [
- get_layer(src_name)
- for src_name in src_names
- if not src_name == "none"]
- locals:
- d = <not found>
- get_layer = <local> <function TFNetwork.construct_layer.<locals>.get_layer at 0x7fe7603061e0>
- src_name = <local> 'output', len = 6
- src_names = <not found>
- File "/u/makarov/returnn-hmm-fac/TFNetwork.py", line 397, in get_layer
- line: return self.construct_layer(net_dict=net_dict, name=src_name)
- locals:
- self = <local> <TFNetwork 'root' train=False search>
- self.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root' train=False search>>
- net_dict = <local> {'enc_05_self_att_att': {'total_key_dim': 512, 'from': ['enc_05_self_att_laynorm'], 'class': 'self_attention', 'forward_weights_init': "variance_scaling_initializer(mode='fan_in', distribution='uniform', scale=0.78)", 'num_heads': 8, 'attention_left_only': False, 'attention_dropout': 0.1, 'n_out'..., len = 97
- name = <not found>
- src_name = <local> 'output', len = 6
- File "/u/makarov/returnn-hmm-fac/TFNetwork.py", line 410, in construct_layer
- line: return add_layer(name=name, layer_class=layer_class, **layer_desc)
- locals:
- add_layer = <local> <bound method TFNetwork.add_layer of <TFNetwork 'root' train=False search>>
- name = <local> 'output', len = 6
- layer_class = <local> <class 'TFNetworkRecLayer.RecLayer'>
- layer_desc = <local> {'max_seq_len': <tf.Tensor 'mul:0' shape=() dtype=int32>, 'unit': {'dec_06_att_out': {'from': ['dec_06_self_att_out', 'dec_06_att_drop'], 'kind': 'add', 'class': 'combine', 'n_out': 512}, 'dec_05_att_weights_drop': {'dropout_noise_shape': {'*': None}, 'from': ['dec_05_att_weights'], 'class': 'dro...
- File "/u/makarov/returnn-hmm-fac/TFNetwork.py", line 497, in add_layer
- line: layer = self._create_layer(name=name, layer_class=layer_class, **layer_desc)
- locals:
- layer = <not found>
- self = <local> <TFNetwork 'root' train=False search>
- self._create_layer = <local> <bound method TFNetwork._create_layer of <TFNetwork 'root' train=False search>>
- name = <local> 'output', len = 6
- layer_class = <local> <class 'TFNetworkRecLayer.RecLayer'>
- layer_desc = <local> {'unit': {'dec_06_att_out': {'from': ['dec_06_self_att_out', 'dec_06_att_drop'], 'kind': 'add', 'class': 'combine', 'n_out': 512}, 'dec_05_att_weights_drop': {'dropout_noise_shape': {'*': None}, 'from': ['dec_05_att_weights'], 'class': 'dropout', 'dropout': 0.1}, 'dec_01_att_energy': {'from': ['b...
- File "/u/makarov/returnn-hmm-fac/TFNetwork.py", line 456, in _create_layer
- line: layer = layer_class(**layer_desc)
- locals:
- layer = <not found>
- layer_class = <local> <class 'TFNetworkRecLayer.RecLayer'>
- layer_desc = <local> {'max_seq_len': <tf.Tensor 'mul:0' shape=() dtype=int32>, 'network': <TFNetwork 'root' train=False search>, 'name': 'output', 'unit': {'dec_06_att_out': {'from': ['dec_06_self_att_out', 'dec_06_att_drop'], 'kind': 'add', 'class': 'combine', 'n_out': 512}, 'dec_05_att_weights_drop': {'dropout_nois..., len = 8
- File "/u/makarov/returnn-hmm-fac/TFNetworkRecLayer.py", line 179, in __init__
- line: y = self._get_output_subnet_unit(self.cell)
- locals:
- y = <not found>
- self = <local> <RecLayer 'output' out_type=Data(shape=(None,), dtype='int32', sparse=True, dim=34908, batch_dim_axis=1, beam_size=12)>
- self._get_output_subnet_unit = <local> <bound method RecLayer._get_output_subnet_unit of <RecLayer 'output' out_type=Data(shape=(None,), dtype='int32', sparse=True, dim=34908, batch_dim_axis=1, beam_size=12)>>
- self.cell = <local> <TFNetworkRecLayer._SubnetworkRecCell object at 0x7fe37643e7b8>
- File "/u/makarov/returnn-hmm-fac/TFNetworkRecLayer.py", line 703, in _get_output_subnet_unit
- line: output, search_choices = cell.get_output(rec_layer=self)
- locals:
- output = <not found>
- search_choices = <not found>
- cell = <local> <TFNetworkRecLayer._SubnetworkRecCell object at 0x7fe37643e7b8>
- cell.get_output = <local> <bound method _SubnetworkRecCell.get_output of <TFNetworkRecLayer._SubnetworkRecCell object at 0x7fe37643e7b8>>
- rec_layer = <not found>
- self = <local> <RecLayer 'output' out_type=Data(shape=(None,), dtype='int32', sparse=True, dim=34908, batch_dim_axis=1, beam_size=12)>
- File "/u/makarov/returnn-hmm-fac/TFNetworkRecLayer.py", line 1459, in get_output
- line: assert fixed_seq_len is not None
- locals:
- fixed_seq_len = <local> None
- AssertionError
- Unhandled exception <class 'AssertionError'> in thread <_MainThread(MainThread, started 140634717132544)>, proc 22317.
- Thread current, main, <_MainThread(MainThread, started 140634717132544)>:
- (Excluded thread.)
- That were all threads.
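Note on the failure: the traceback ends in the guard "assert fixed_seq_len is not None" inside _SubnetworkRecCell.get_output while the RecLayer 'output' is being built in search mode (beam_size=12). The snippet below is only a minimal, hypothetical sketch that mirrors that guard to make the failure mode concrete; it is not RETURNN's implementation, and the function name get_output_sketch is made up for illustration.

    # Minimal sketch (assumption: the rec loop needs a concrete sequence
    # length before the search-mode decoding loop can be constructed).
    def get_output_sketch(fixed_seq_len):
        # Mirrors the check at TFNetworkRecLayer.py:1459: if no sequence
        # length was derived, construction cannot continue.
        assert fixed_seq_len is not None
        return list(range(fixed_seq_len))

    try:
        get_output_sketch(None)  # reproduces the AssertionError seen in the log
    except AssertionError:
        print("AssertionError: fixed_seq_len is None")

In the log, fixed_seq_len is None at this point even though the layer received a max_seq_len tensor, so the sequence-length source the rec loop expects was never established for this search configuration.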