Untitled

a guest
Dec 1st, 2018
  1. RETURNN starting up, version 20180918.113416--git-ef519a3, date/time 2018-10-03-05-22-57 (UTC+0200), pid 339, cwd /work/smt2/makarov/NMT, Python /usr/bin/python3
  2. RETURNN command line options: ['hmm-factorization/en-de/transformer-hmm']
  3. Hostname: cluster-cn-216
  4. TensorFlow: 1.9.0 (v1.9.0-0-g25c197e023) (<site-package> in /u/makarov/.local/lib/python3.5/site-packages/tensorflow)
  5. Error while getting SGE num_proc: FileNotFoundError(2, "No such file or directory: 'qstat'")
  6. Setup TF inter and intra global thread pools, num_threads None, session opts {'device_count': {'GPU': 0}, 'log_device_placement': False}.
  7. CUDA_VISIBLE_DEVICES is set to '1'.
  8. Local devices available to TensorFlow:
  9. 1/2: name: "/device:CPU:0"
  10. device_type: "CPU"
  11. memory_limit: 268435456
  12. locality {
  13. }
  14. incarnation: 3957494146875968502
  15. 2/2: name: "/device:GPU:0"
  16. device_type: "GPU"
  17. memory_limit: 10915220685
  18. locality {
  19. bus_id: 3
  20. numa_node: 2
  21. links {
  22. }
  23. }
  24. incarnation: 12079591820331675797
  25. physical_device_desc: "device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:41:00.0, compute capability: 6.1"
  26. Using gpu device 1: GeForce GTX 1080 Ti
  27. <TranslationDataset 'dev' epoch=1>: waiting for data length info...
  28. <TranslationDataset 'train' epoch=1>: waiting for data length info...
  29. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (176536 loaded so far)...
  30. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (337293 loaded so far)...
  31. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (499005 loaded so far)...
  32. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (652725 loaded so far)...
  33. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (813324 loaded so far)...
  34. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (974394 loaded so far)...
  35. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (1128281 loaded so far)...
  36. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (1289733 loaded so far)...
  37. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (1450179 loaded so far)...
  38. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (1611445 loaded so far)...
  39. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (1765489 loaded so far)...
  40. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (1919088 loaded so far)...
  41. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (2072864 loaded so far)...
  42. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (2233664 loaded so far)...
  43. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (2387009 loaded so far)...
  44. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (2548141 loaded so far)...
  45. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (2709416 loaded so far)...
  46. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (2863560 loaded so far)...
  47. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (3017322 loaded so far)...
  48. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (3162975 loaded so far)...
  49. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (3324726 loaded so far)...
  50. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (3478511 loaded so far)...
  51. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (3631523 loaded so far)...
  52. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (3792573 loaded so far)...
  53. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (3946154 loaded so far)...
  54. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (4107273 loaded so far)...
  55. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (4260136 loaded so far)...
  56. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (4413473 loaded so far)...
  57. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (4574533 loaded so far)...
  58. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (4727686 loaded so far)...
  59. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (4881106 loaded so far)...
  60. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (5034970 loaded so far)...
  61. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (5195904 loaded so far)...
  62. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (5349291 loaded so far)...
  63. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (5502423 loaded so far)...
  64. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (5663958 loaded so far)...
  65. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (5817471 loaded so far)...
  66. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (5970659 loaded so far)...
  67. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (6123912 loaded so far)...
  68. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (6285104 loaded so far)...
  69. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (6438822 loaded so far)...
  70. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (6599880 loaded so far)...
  71. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (6753151 loaded so far)...
  72. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (6914701 loaded so far)...
  73. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (7068295 loaded so far)...
  74. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (7221632 loaded so far)...
  75. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (7375275 loaded so far)...
  76. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (7536427 loaded so far)...
  77. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (7689729 loaded so far)...
  78. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (7842801 loaded so far)...
  79. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (7996648 loaded so far)...
  80. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (8157644 loaded so far)...
  81. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (8310471 loaded so far)...
  82. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (8464375 loaded so far)...
  83. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (8625442 loaded so far)...
  84. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (8779388 loaded so far)...
  85. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (8933284 loaded so far)...
  86. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (9086438 loaded so far)...
  87. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (9239330 loaded so far)...
  88. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (9400120 loaded so far)...
  89. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (9553923 loaded so far)...
  90. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (9707107 loaded so far)...
  91. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (9860411 loaded so far)...
  92. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (10022067 loaded so far)...
  93. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (10175145 loaded so far)...
  94. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (10328003 loaded so far)...
  95. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (10481148 loaded so far)...
  96. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (10634347 loaded so far)...
  97. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (10788315 loaded so far)...
  98. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (10941864 loaded so far)...
  99. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (11095517 loaded so far)...
  100. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (11249202 loaded so far)...
  101. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (11402595 loaded so far)...
  102. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (11563281 loaded so far)...
  103. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (11716833 loaded so far)...
  104. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (11870435 loaded so far)...
  105. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (12023539 loaded so far)...
  106. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (12185177 loaded so far)...
  107. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (12338570 loaded so far)...
  108. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (12492296 loaded so far)...
  109. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (12653028 loaded so far)...
  110. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (12806337 loaded so far)...
  111. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (12960467 loaded so far)...
  112. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (13113820 loaded so far)...
  113. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (13274577 loaded so far)...
  114. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (13428222 loaded so far)...
  115. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (13581922 loaded so far)...
  116. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (13742852 loaded so far)...
  117. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (13889085 loaded so far)...
  118. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (14042850 loaded so far)...
  119. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (14203231 loaded so far)...
  120. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (14357482 loaded so far)...
  121. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (14510676 loaded so far)...
  122. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (14671414 loaded so far)...
  123. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (14824201 loaded so far)...
  124. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (14978019 loaded so far)...
  125. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (15138657 loaded so far)...
  126. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (15291788 loaded so far)...
  127. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (15444741 loaded so far)...
  128. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (15606231 loaded so far)...
  129. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (15759000 loaded so far)...
  130. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (15904743 loaded so far)...
  131. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (16059153 loaded so far)...
  132. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (16212601 loaded so far)...
  133. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (16365960 loaded so far)...
  134. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (16511978 loaded so far)...
  135. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (16665664 loaded so far)...
  136. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (16826382 loaded so far)...
  137. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (16979543 loaded so far)...
  138. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (17132848 loaded so far)...
  139. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (17286992 loaded so far)...
  140. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (17448667 loaded so far)...
  141. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (17601847 loaded so far)...
  142. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (17755599 loaded so far)...
  143. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (17908877 loaded so far)...
  144. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (18070305 loaded so far)...
  145. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (18223233 loaded so far)...
  146. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (18376359 loaded so far)...
  147. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (18530181 loaded so far)...
  148. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (18690986 loaded so far)...
  149. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (18844197 loaded so far)...
  150. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (18997199 loaded so far)...
  151. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (19142611 loaded so far)...
  152. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (19303843 loaded so far)...
  153. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (19456774 loaded so far)...
  154. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (19610061 loaded so far)...
  155. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (19764176 loaded so far)...
  156. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (19925307 loaded so far)...
  157. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (20078997 loaded so far)...
  158. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (20232105 loaded so far)...
  159. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (20385414 loaded so far)...
  160. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (20538785 loaded so far)...
  161. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (20692069 loaded so far)...
  162. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (20845738 loaded so far)...
  163. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (20991753 loaded so far)...
  164. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (21152861 loaded so far)...
  165. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (21306787 loaded so far)...
  166. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (21460216 loaded so far)...
  167. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (21613731 loaded so far)...
  168. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 21826104 (21767737 loaded so far)...
  169. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 23232217 (21836948 loaded so far)...
  170. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 23232217 (21990810 loaded so far)...
  171. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 23232217 (22144763 loaded so far)...
  172. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 23232217 (22297950 loaded so far)...
  173. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 23232217 (22451182 loaded so far)...
  174. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 23232217 (22603913 loaded so far)...
  175. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 23232217 (22757257 loaded so far)...
  176. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 23232217 (22910035 loaded so far)...
  177. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 23232217 (23063757 loaded so far)...
  178. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 23232217 (23217089 loaded so far)...
  179. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 23526579 (23232399 loaded so far)...
  180. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 23526579 (23385998 loaded so far)...
  181. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (23539819 loaded so far)...
  182. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (23693276 loaded so far)...
  183. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (23847021 loaded so far)...
  184. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (23992433 loaded so far)...
  185. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (24146396 loaded so far)...
  186. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (24292528 loaded so far)...
  187. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (24445776 loaded so far)...
  188. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (24599544 loaded so far)...
  189. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (24745486 loaded so far)...
  190. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (24891284 loaded so far)...
  191. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (25044506 loaded so far)...
  192. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (25205256 loaded so far)...
  193. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (25358793 loaded so far)...
  194. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (25512318 loaded so far)...
  195. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (25665392 loaded so far)...
  196. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (25819256 loaded so far)...
  197. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (25973004 loaded so far)...
  198. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (26126656 loaded so far)...
  199. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (26280647 loaded so far)...
  200. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (26441910 loaded so far)...
  201. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (26595164 loaded so far)...
  202. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (26748940 loaded so far)...
  203. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (26902341 loaded so far)...
  204. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (27055745 loaded so far)...
  205. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (27209105 loaded so far)...
  206. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (27369780 loaded so far)...
  207. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (27523031 loaded so far)...
  208. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (27676658 loaded so far)...
  209. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (27830454 loaded so far)...
  210. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (27983927 loaded so far)...
  211. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (28137189 loaded so far)...
  212. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (28290428 loaded so far)...
  213. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (28451923 loaded so far)...
  214. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (28597938 loaded so far)...
  215. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (28758610 loaded so far)...
  216. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (28973760 loaded so far)...
  217. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (29288171 loaded so far)...
  218. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (29618233 loaded so far)...
  219. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (29941434 loaded so far)...
  220. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (30256213 loaded so far)...
  221. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 30815203 (30579011 loaded so far)...
  222. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 31533234 (30832360 loaded so far)...
  223. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 31533234 (31147358 loaded so far)...
  224. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 31533234 (31462529 loaded so far)...
  225. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 32692165 (31554159 loaded so far)...
  226. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 32692165 (31891258 loaded so far)...
  227. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 32692165 (32190816 loaded so far)...
  228. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 32692165 (32513343 loaded so far)...
  229. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 32760483 (32698103 loaded so far)...
  230. <TranslationDataset 'train' epoch=1>: waiting for 'data', line 32795599 (32789983 loaded so far)...
  231. Train data:
  232. input: 46300 x 1
  233. output: {'data': [46300, 1], 'classes': [34908, 1]}
  234. TranslationDataset, sequences: 327993, frames: unknown
  235. Dev data:
  236. TranslationDataset, sequences: 2169, frames: unknown
  237. Learning-rate-control: file hmm-factorization/logs/en-de/transformer-hmm/newbob.data does not exist yet
  238. Setup tf.Session with options {'device_count': {'GPU': 1}, 'log_device_placement': False} ...
  239. Rec layer sub net:
  240. Input layers moved out of loop: (#: 0)
  241. None
  242. Output layers moved out of loop: (#: 142)
  243. output_prob
  244. decoder_int
  245. decoder
  246. dec_06
  247. dec_06_ff_out
  248. dec_06_ff_drop
  249. dec_06_ff_conv2
  250. dec_06_ff_conv1
  251. dec_06_ff_laynorm
  252. dec_06_att_out
  253. dec_06_att_drop
  254. dec_06_att_lin
  255. dec_06_att_att
  256. dec_06_att0
  257. dec_06_att_weights_drop
  258. dec_06_att_weights
  259. dec_06_att_energy
  260. dec_06_att_query
  261. dec_06_att_query0
  262. dec_06_att_laynorm
  263. dec_06_self_att_out
  264. dec_06_self_att_drop
  265. dec_06_self_att_lin
  266. dec_06_self_att_att
  267. dec_06_self_att_laynorm
  268. dec_05
  269. dec_05_ff_out
  270. dec_05_ff_drop
  271. dec_05_ff_conv2
  272. dec_05_ff_conv1
  273. dec_05_ff_laynorm
  274. dec_05_att_out
  275. dec_05_att_drop
  276. dec_05_att_lin
  277. dec_05_att_att
  278. dec_05_att0
  279. dec_05_att_weights_drop
  280. dec_05_att_weights
  281. dec_05_att_energy
  282. dec_05_att_query
  283. dec_05_att_query0
  284. dec_05_att_laynorm
  285. dec_05_self_att_out
  286. dec_05_self_att_drop
  287. dec_05_self_att_lin
  288. dec_05_self_att_att
  289. dec_05_self_att_laynorm
  290. dec_04
  291. dec_04_ff_out
  292. dec_04_ff_drop
  293. dec_04_ff_conv2
  294. dec_04_ff_conv1
  295. dec_04_ff_laynorm
  296. dec_04_att_out
  297. dec_04_att_drop
  298. dec_04_att_lin
  299. dec_04_att_att
  300. dec_04_att0
  301. dec_04_att_weights_drop
  302. dec_04_att_weights
  303. dec_04_att_energy
  304. dec_04_att_query
  305. dec_04_att_query0
  306. dec_04_att_laynorm
  307. dec_04_self_att_out
  308. dec_04_self_att_drop
  309. dec_04_self_att_lin
  310. dec_04_self_att_att
  311. dec_04_self_att_laynorm
  312. dec_03
  313. dec_03_ff_out
  314. dec_03_ff_drop
  315. dec_03_ff_conv2
  316. dec_03_ff_conv1
  317. dec_03_ff_laynorm
  318. dec_03_att_out
  319. dec_03_att_drop
  320. dec_03_att_lin
  321. dec_03_att_att
  322. dec_03_att0
  323. dec_03_att_weights_drop
  324. dec_03_att_weights
  325. dec_03_att_energy
  326. dec_03_att_query
  327. dec_03_att_query0
  328. dec_03_att_laynorm
  329. dec_03_self_att_out
  330. dec_03_self_att_drop
  331. dec_03_self_att_lin
  332. dec_03_self_att_att
  333. dec_03_self_att_laynorm
  334. dec_02
  335. dec_02_ff_out
  336. dec_02_ff_drop
  337. dec_02_ff_conv2
  338. dec_02_ff_conv1
  339. dec_02_ff_laynorm
  340. dec_02_att_out
  341. dec_02_att_drop
  342. dec_02_att_lin
  343. dec_02_att_att
  344. dec_02_att0
  345. dec_02_att_weights_drop
  346. dec_02_att_weights
  347. dec_02_att_energy
  348. dec_02_att_query
  349. dec_02_att_query0
  350. dec_02_att_laynorm
  351. dec_02_self_att_out
  352. dec_02_self_att_drop
  353. dec_02_self_att_lin
  354. dec_02_self_att_att
  355. dec_02_self_att_laynorm
  356. dec_01
  357. dec_01_ff_out
  358. dec_01_ff_drop
  359. dec_01_ff_conv2
  360. dec_01_ff_conv1
  361. dec_01_ff_laynorm
  362. dec_01_att_out
  363. dec_01_att_drop
  364. dec_01_att_lin
  365. dec_01_att_att
  366. dec_01_att0
  367. dec_01_att_weights_drop
  368. dec_01_att_weights
  369. dec_01_att_energy
  370. dec_01_att_query
  371. dec_01_att_query0
  372. dec_01_att_laynorm
  373. dec_01_self_att_out
  374. dec_01_self_att_drop
  375. dec_01_self_att_lin
  376. dec_01_self_att_att
  377. dec_01_self_att_laynorm
  378. target_embed
  379. target_embed_with_pos
  380. target_embed_weighted
  381. encoder_int
  382. prev_outputs_int
  383. target_embed_raw
  384. output
  385. Layers in loop: (#: 0)
  386. None
  387. Unused layers: (#: 1)
  388. end
  389. Warning: using numerical unstable sparse Cross-Entropy loss calculation
  390. Network layer topology:
  391. extern data: data: Data(shape=(None,), dtype='int32', sparse=True, dim=46300), classes: Data(shape=(None,), dtype='int32', sparse=True, dim=34908, available_for_inference=False)
  392. used data keys: ['classes', 'data']
  393. layer source 'data' #: 46300
  394. layer split_dims 'dec_01_att_key' #: 64
  395. layer linear 'dec_01_att_key0' #: 512
  396. layer split_dims 'dec_01_att_value' #: 64
  397. layer linear 'dec_01_att_value0' #: 512
  398. layer split_dims 'dec_02_att_key' #: 64
  399. layer linear 'dec_02_att_key0' #: 512
  400. layer split_dims 'dec_02_att_value' #: 64
  401. layer linear 'dec_02_att_value0' #: 512
  402. layer split_dims 'dec_03_att_key' #: 64
  403. layer linear 'dec_03_att_key0' #: 512
  404. layer split_dims 'dec_03_att_value' #: 64
  405. layer linear 'dec_03_att_value0' #: 512
  406. layer split_dims 'dec_04_att_key' #: 64
  407. layer linear 'dec_04_att_key0' #: 512
  408. layer split_dims 'dec_04_att_value' #: 64
  409. layer linear 'dec_04_att_value0' #: 512
  410. layer split_dims 'dec_05_att_key' #: 64
  411. layer linear 'dec_05_att_key0' #: 512
  412. layer split_dims 'dec_05_att_value' #: 64
  413. layer linear 'dec_05_att_value0' #: 512
  414. layer split_dims 'dec_06_att_key' #: 64
  415. layer linear 'dec_06_att_key0' #: 512
  416. layer split_dims 'dec_06_att_value' #: 64
  417. layer linear 'dec_06_att_value0' #: 512
  418. layer decide 'decision' #: 34908
  419. layer copy 'enc_01' #: 512
  420. layer linear 'enc_01_ff_conv1' #: 2048
  421. layer linear 'enc_01_ff_conv2' #: 512
  422. layer dropout 'enc_01_ff_drop' #: 512
  423. layer layer_norm 'enc_01_ff_laynorm' #: 512
  424. layer combine 'enc_01_ff_out' #: 512
  425. layer self_attention 'enc_01_self_att_att' #: 512
  426. layer dropout 'enc_01_self_att_drop' #: 512
  427. layer layer_norm 'enc_01_self_att_laynorm' #: 512
  428. layer linear 'enc_01_self_att_lin' #: 512
  429. layer combine 'enc_01_self_att_out' #: 512
  430. layer copy 'enc_02' #: 512
  431. layer linear 'enc_02_ff_conv1' #: 2048
  432. layer linear 'enc_02_ff_conv2' #: 512
  433. layer dropout 'enc_02_ff_drop' #: 512
  434. layer layer_norm 'enc_02_ff_laynorm' #: 512
  435. layer combine 'enc_02_ff_out' #: 512
  436. layer self_attention 'enc_02_self_att_att' #: 512
  437. layer dropout 'enc_02_self_att_drop' #: 512
  438. layer layer_norm 'enc_02_self_att_laynorm' #: 512
  439. layer linear 'enc_02_self_att_lin' #: 512
  440. layer combine 'enc_02_self_att_out' #: 512
  441. layer copy 'enc_03' #: 512
  442. layer linear 'enc_03_ff_conv1' #: 2048
  443. layer linear 'enc_03_ff_conv2' #: 512
  444. layer dropout 'enc_03_ff_drop' #: 512
  445. layer layer_norm 'enc_03_ff_laynorm' #: 512
  446. layer combine 'enc_03_ff_out' #: 512
  447. layer self_attention 'enc_03_self_att_att' #: 512
layer dropout 'enc_03_self_att_drop' #: 512
layer layer_norm 'enc_03_self_att_laynorm' #: 512
layer linear 'enc_03_self_att_lin' #: 512
layer combine 'enc_03_self_att_out' #: 512
layer copy 'enc_04' #: 512
layer linear 'enc_04_ff_conv1' #: 2048
layer linear 'enc_04_ff_conv2' #: 512
layer dropout 'enc_04_ff_drop' #: 512
layer layer_norm 'enc_04_ff_laynorm' #: 512
layer combine 'enc_04_ff_out' #: 512
layer self_attention 'enc_04_self_att_att' #: 512
layer dropout 'enc_04_self_att_drop' #: 512
layer layer_norm 'enc_04_self_att_laynorm' #: 512
layer linear 'enc_04_self_att_lin' #: 512
layer combine 'enc_04_self_att_out' #: 512
layer copy 'enc_05' #: 512
layer linear 'enc_05_ff_conv1' #: 2048
layer linear 'enc_05_ff_conv2' #: 512
layer dropout 'enc_05_ff_drop' #: 512
layer layer_norm 'enc_05_ff_laynorm' #: 512
layer combine 'enc_05_ff_out' #: 512
layer self_attention 'enc_05_self_att_att' #: 512
layer dropout 'enc_05_self_att_drop' #: 512
layer layer_norm 'enc_05_self_att_laynorm' #: 512
layer linear 'enc_05_self_att_lin' #: 512
layer combine 'enc_05_self_att_out' #: 512
layer copy 'enc_06' #: 512
layer linear 'enc_06_ff_conv1' #: 2048
layer linear 'enc_06_ff_conv2' #: 512
layer dropout 'enc_06_ff_drop' #: 512
layer layer_norm 'enc_06_ff_laynorm' #: 512
layer combine 'enc_06_ff_out' #: 512
layer self_attention 'enc_06_self_att_att' #: 512
layer dropout 'enc_06_self_att_drop' #: 512
layer layer_norm 'enc_06_self_att_laynorm' #: 512
layer linear 'enc_06_self_att_lin' #: 512
layer combine 'enc_06_self_att_out' #: 512
layer layer_norm 'encoder' #: 512
layer rec 'output' #: 34908
layer dropout 'source_embed' #: 512
layer linear 'source_embed_raw' #: 512
layer eval 'source_embed_weighted' #: 512
layer positional_encoding 'source_embed_with_pos' #: 512
net params #: 122126176

net trainable params: [<tf.Variable 'dec_01_att_key0/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'dec_01_att_value0/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'dec_02_att_key0/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'dec_02_att_value0/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'dec_03_att_key0/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'dec_03_att_value0/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'dec_04_att_key0/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'dec_04_att_value0/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'dec_05_att_key0/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'dec_05_att_value0/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'dec_06_att_key0/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'dec_06_att_value0/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'enc_01_ff_conv1/W:0' shape=(512, 2048) dtype=float32_ref>, <tf.Variable 'enc_01_ff_conv1/b:0' shape=(2048,) dtype=float32_ref>, <tf.Variable 'enc_01_ff_conv2/W:0' shape=(2048, 512) dtype=float32_ref>, <tf.Variable 'enc_01_ff_conv2/b:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_01_ff_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_01_ff_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_01_self_att_att/QKV:0' shape=(512, 1536) dtype=float32_ref>, <tf.Variable 'enc_01_self_att_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_01_self_att_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_01_self_att_lin/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'enc_02_ff_conv1/W:0' shape=(512, 2048) dtype=float32_ref>, <tf.Variable 'enc_02_ff_conv1/b:0' shape=(2048,) dtype=float32_ref>, <tf.Variable 'enc_02_ff_conv2/W:0' shape=(2048, 512) dtype=float32_ref>, <tf.Variable 'enc_02_ff_conv2/b:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_02_ff_laynorm/bias:0' shape=(512,) 
dtype=float32_ref>, <tf.Variable 'enc_02_ff_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_02_self_att_att/QKV:0' shape=(512, 1536) dtype=float32_ref>, <tf.Variable 'enc_02_self_att_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_02_self_att_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_02_self_att_lin/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'enc_03_ff_conv1/W:0' shape=(512, 2048) dtype=float32_ref>, <tf.Variable 'enc_03_ff_conv1/b:0' shape=(2048,) dtype=float32_ref>, <tf.Variable 'enc_03_ff_conv2/W:0' shape=(2048, 512) dtype=float32_ref>, <tf.Variable 'enc_03_ff_conv2/b:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_03_ff_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_03_ff_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_03_self_att_att/QKV:0' shape=(512, 1536) dtype=float32_ref>, <tf.Variable 'enc_03_self_att_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_03_self_att_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_03_self_att_lin/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'enc_04_ff_conv1/W:0' shape=(512, 2048) dtype=float32_ref>, <tf.Variable 'enc_04_ff_conv1/b:0' shape=(2048,) dtype=float32_ref>, <tf.Variable 'enc_04_ff_conv2/W:0' shape=(2048, 512) dtype=float32_ref>, <tf.Variable 'enc_04_ff_conv2/b:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_04_ff_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_04_ff_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_04_self_att_att/QKV:0' shape=(512, 1536) dtype=float32_ref>, <tf.Variable 'enc_04_self_att_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_04_self_att_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_04_self_att_lin/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'enc_05_ff_conv1/W:0' shape=(512, 2048) dtype=float32_ref>, <tf.Variable 
'enc_05_ff_conv1/b:0' shape=(2048,) dtype=float32_ref>, <tf.Variable 'enc_05_ff_conv2/W:0' shape=(2048, 512) dtype=float32_ref>, <tf.Variable 'enc_05_ff_conv2/b:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_05_ff_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_05_ff_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_05_self_att_att/QKV:0' shape=(512, 1536) dtype=float32_ref>, <tf.Variable 'enc_05_self_att_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_05_self_att_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_05_self_att_lin/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'enc_06_ff_conv1/W:0' shape=(512, 2048) dtype=float32_ref>, <tf.Variable 'enc_06_ff_conv1/b:0' shape=(2048,) dtype=float32_ref>, <tf.Variable 'enc_06_ff_conv2/W:0' shape=(2048, 512) dtype=float32_ref>, <tf.Variable 'enc_06_ff_conv2/b:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_06_ff_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_06_ff_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_06_self_att_att/QKV:0' shape=(512, 1536) dtype=float32_ref>, <tf.Variable 'enc_06_self_att_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_06_self_att_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'enc_06_self_att_lin/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'encoder/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'encoder/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_01_att_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_01_att_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_01_att_lin/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'output/rec/dec_01_att_query0/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'output/rec/dec_01_ff_conv1/W:0' shape=(512, 2048) dtype=float32_ref>, <tf.Variable 'output/rec/dec_01_ff_conv1/b:0' 
shape=(2048,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_01_ff_conv2/W:0' shape=(2048, 512) dtype=float32_ref>, <tf.Variable 'output/rec/dec_01_ff_conv2/b:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_01_ff_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_01_ff_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_01_self_att_att/QKV:0' shape=(512, 1536) dtype=float32_ref>, <tf.Variable 'output/rec/dec_01_self_att_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_01_self_att_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_01_self_att_lin/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'output/rec/dec_02_att_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_02_att_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_02_att_lin/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'output/rec/dec_02_att_query0/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'output/rec/dec_02_ff_conv1/W:0' shape=(512, 2048) dtype=float32_ref>, <tf.Variable 'output/rec/dec_02_ff_conv1/b:0' shape=(2048,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_02_ff_conv2/W:0' shape=(2048, 512) dtype=float32_ref>, <tf.Variable 'output/rec/dec_02_ff_conv2/b:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_02_ff_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_02_ff_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_02_self_att_att/QKV:0' shape=(512, 1536) dtype=float32_ref>, <tf.Variable 'output/rec/dec_02_self_att_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_02_self_att_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_02_self_att_lin/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'output/rec/dec_03_att_laynorm/bias:0' shape=(512,) 
dtype=float32_ref>, <tf.Variable 'output/rec/dec_03_att_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_03_att_lin/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'output/rec/dec_03_att_query0/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'output/rec/dec_03_ff_conv1/W:0' shape=(512, 2048) dtype=float32_ref>, <tf.Variable 'output/rec/dec_03_ff_conv1/b:0' shape=(2048,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_03_ff_conv2/W:0' shape=(2048, 512) dtype=float32_ref>, <tf.Variable 'output/rec/dec_03_ff_conv2/b:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_03_ff_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_03_ff_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_03_self_att_att/QKV:0' shape=(512, 1536) dtype=float32_ref>, <tf.Variable 'output/rec/dec_03_self_att_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_03_self_att_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_03_self_att_lin/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'output/rec/dec_04_att_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_04_att_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_04_att_lin/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'output/rec/dec_04_att_query0/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'output/rec/dec_04_ff_conv1/W:0' shape=(512, 2048) dtype=float32_ref>, <tf.Variable 'output/rec/dec_04_ff_conv1/b:0' shape=(2048,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_04_ff_conv2/W:0' shape=(2048, 512) dtype=float32_ref>, <tf.Variable 'output/rec/dec_04_ff_conv2/b:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_04_ff_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_04_ff_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 
'output/rec/dec_04_self_att_att/QKV:0' shape=(512, 1536) dtype=float32_ref>, <tf.Variable 'output/rec/dec_04_self_att_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_04_self_att_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_04_self_att_lin/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'output/rec/dec_05_att_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_05_att_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_05_att_lin/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'output/rec/dec_05_att_query0/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'output/rec/dec_05_ff_conv1/W:0' shape=(512, 2048) dtype=float32_ref>, <tf.Variable 'output/rec/dec_05_ff_conv1/b:0' shape=(2048,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_05_ff_conv2/W:0' shape=(2048, 512) dtype=float32_ref>, <tf.Variable 'output/rec/dec_05_ff_conv2/b:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_05_ff_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_05_ff_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_05_self_att_att/QKV:0' shape=(512, 1536) dtype=float32_ref>, <tf.Variable 'output/rec/dec_05_self_att_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_05_self_att_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_05_self_att_lin/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'output/rec/dec_06_att_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_06_att_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_06_att_lin/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'output/rec/dec_06_att_query0/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'output/rec/dec_06_ff_conv1/W:0' shape=(512, 2048) dtype=float32_ref>, <tf.Variable 
'output/rec/dec_06_ff_conv1/b:0' shape=(2048,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_06_ff_conv2/W:0' shape=(2048, 512) dtype=float32_ref>, <tf.Variable 'output/rec/dec_06_ff_conv2/b:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_06_ff_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_06_ff_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_06_self_att_att/QKV:0' shape=(512, 1536) dtype=float32_ref>, <tf.Variable 'output/rec/dec_06_self_att_laynorm/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_06_self_att_laynorm/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/dec_06_self_att_lin/W:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'output/rec/decoder/bias:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/decoder/scale:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'output/rec/decoder_int/W:0' shape=(512, 1000) dtype=float32_ref>, <tf.Variable 'output/rec/encoder_int/W:0' shape=(512, 1000) dtype=float32_ref>, <tf.Variable 'output/rec/output_prob/dense/kernel:0' shape=(1000, 34908) dtype=float32_ref>, <tf.Variable 'output/rec/prev_outputs_int/W:0' shape=(512, 1000) dtype=float32_ref>, <tf.Variable 'output/rec/target_embed_raw/W:0' shape=(34908, 512) dtype=float32_ref>, <tf.Variable 'source_embed_raw/W:0' shape=(46300, 512) dtype=float32_ref>]
start training at epoch 1 and step 0
using batch size: 400, max seqs: 50
learning rate control: NewbobMultiEpoch(numEpochs=20, updateInterval=1, relativeErrorThreshold=-0.005, learningRateDecayFactor=0.9, learningRateGrowthFactor=1.0), epoch data: , error key: None
pretrain: None
start epoch 1 with learning rate 0.0003 ...
TF: log_dir: "net-model/network/en-de/train-2018-10-03-03-22-57
Create optimizer <class 'tensorflow.python.training.adam.AdamOptimizer'> with options {'epsilon': 1e-08, 'learning_rate': <tf.Variable 'learning_rate:0' shape=() dtype=float32_ref>, 'beta1': 0.9, 'beta2': 0.999}.
Initialize optimizer with slots ['m', 'v'].
These additional variable were created by the optimizer: [<tf.Variable 'optimize/gradients/dec_01_att_key0/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/dec_01_att_value0/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/dec_02_att_key0/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/dec_02_att_value0/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/dec_03_att_key0/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/dec_03_att_value0/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/dec_04_att_key0/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/dec_04_att_value0/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/dec_05_att_key0/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/dec_05_att_value0/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/dec_06_att_key0/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 
'optimize/gradients/dec_06_att_value0/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_01_ff_conv1/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_01_ff_conv1/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_01_ff_conv2/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_01_ff_conv2/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_01_ff_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_01_ff_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_01_self_att_att/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 1536) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_01_self_att_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_01_self_att_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_01_self_att_lin/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_02_ff_conv1/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32_ref>, <tf.Variable 
'optimize/gradients/enc_02_ff_conv1/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_02_ff_conv2/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_02_ff_conv2/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_02_ff_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_02_ff_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_02_self_att_att/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 1536) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_02_self_att_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_02_self_att_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_02_self_att_lin/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_03_ff_conv1/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_03_ff_conv1/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_03_ff_conv2/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32_ref>, <tf.Variable 
'optimize/gradients/enc_03_ff_conv2/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_03_ff_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_03_ff_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_03_self_att_att/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 1536) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_03_self_att_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_03_self_att_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_03_self_att_lin/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_04_ff_conv1/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_04_ff_conv1/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_04_ff_conv2/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_04_ff_conv2/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_04_ff_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 
'optimize/gradients/enc_04_ff_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_04_self_att_att/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 1536) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_04_self_att_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_04_self_att_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_04_self_att_lin/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_05_ff_conv1/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_05_ff_conv1/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_05_ff_conv2/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_05_ff_conv2/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_05_ff_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_05_ff_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_05_self_att_att/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 1536) dtype=float32_ref>, <tf.Variable 
'optimize/gradients/enc_05_self_att_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_05_self_att_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_05_self_att_lin/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_06_ff_conv1/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_06_ff_conv1/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_06_ff_conv2/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_06_ff_conv2/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_06_ff_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_06_ff_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_06_self_att_att/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 1536) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_06_self_att_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/enc_06_self_att_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 
'optimize/gradients/enc_06_self_att_lin/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/encoder/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/encoder/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_01_att_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_01_att_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_01_att_lin/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_01_att_query0/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_01_ff_conv1/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_01_ff_conv1/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_01_ff_conv2/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_01_ff_conv2/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_01_ff_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 
'optimize/gradients/output/rec/dec_01_ff_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_01_self_att_att/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 1536) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_01_self_att_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_01_self_att_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_01_self_att_lin/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_02_att_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_02_att_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_02_att_lin/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_02_att_query0/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_02_ff_conv1/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_02_ff_conv1/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_02_ff_conv2/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' 
shape=(2048, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_02_ff_conv2/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_02_ff_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_02_ff_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_02_self_att_att/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 1536) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_02_self_att_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_02_self_att_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_02_self_att_lin/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_03_att_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_03_att_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_03_att_lin/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_03_att_query0/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 
'optimize/gradients/output/rec/dec_03_ff_conv1/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_03_ff_conv1/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_03_ff_conv2/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_03_ff_conv2/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_03_ff_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_03_ff_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_03_self_att_att/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 1536) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_03_self_att_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_03_self_att_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_03_self_att_lin/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_04_att_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_04_att_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) 
dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_04_att_lin/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_04_att_query0/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_04_ff_conv1/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_04_ff_conv1/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_04_ff_conv2/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_04_ff_conv2/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_04_ff_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_04_ff_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_04_self_att_att/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 1536) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_04_self_att_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_04_self_att_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 
'optimize/gradients/output/rec/dec_04_self_att_lin/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_05_att_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_05_att_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_05_att_lin/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_05_att_query0/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_05_ff_conv1/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_05_ff_conv1/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_05_ff_conv2/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_05_ff_conv2/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_05_ff_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_05_ff_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_05_self_att_att/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' 
shape=(512, 1536) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_05_self_att_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_05_self_att_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_05_self_att_lin/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_06_att_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_06_att_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_06_att_lin/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_06_att_query0/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_06_ff_conv1/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_06_ff_conv1/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_06_ff_conv2/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_06_ff_conv2/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 
'optimize/gradients/output/rec/dec_06_ff_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_06_ff_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_06_self_att_att/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 1536) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_06_self_att_laynorm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_06_self_att_laynorm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/dec_06_self_att_lin/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/decoder/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/decoder/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/decoder_int/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 1000) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/encoder_int/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 1000) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/output_prob/dense/Tensordot/transpose_1_grad/transpose_accum_grad/var_accum_grad:0' shape=(1000, 34908) dtype=float32_ref>, <tf.Variable 'optimize/gradients/output/rec/prev_outputs_int/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 1000) dtype=float32_ref>, <tf.Variable 
'optimize/gradients/output/rec/target_embed_raw/linear/embedding_lookup_grad/Reshape_accum_grad/var_accum_grad:0' shape=(34908, 512) dtype=float32_ref>, <tf.Variable 'optimize/gradients/source_embed_raw/linear/embedding_lookup_grad/Reshape_accum_grad/var_accum_grad:0' shape=(46300, 512) dtype=float32_ref>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta1_power:0' shape=() dtype=float32_ref>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta2_power:0' shape=() dtype=float32_ref>].
  503. train epoch 1, step 0, cost:output/output_prob 10.579492196670572, error:decision 0.0, error:output/output_prob 0.9999999441206455, loss 1290.6981, max_size:classes 15, max_size:data 4, mem_usage:GPU:0 2.1GB, num_seqs 26, 8.767 sec/step, elapsed 0:02:14, exp. remaining 51:35:23, complete 0.07%
  504. train epoch 1, step 1, cost:output/output_prob 10.488510898613981, error:decision 0.0, error:output/output_prob 1.0000000251457095, loss 1908.9089, max_size:classes 16, max_size:data 4, mem_usage:GPU:0 3.0GB, num_seqs 25, 0.783 sec/step, elapsed 0:02:23, exp. remaining 51:31:50, complete 0.08%
  505. ...