q-learning-dual-taxi-competitive-output

a guest
Jul 9th, 2021
  1. Total encoded states are 6144
  2. ==============================
  3. Running evaluation after 0 episodes
  4. Evaluation results after 200 trials
  5. Average time steps taken: 1500.0
  6. Average number of penalties incurred: 1500.0
  7. Had 0 wins in 200 episodes
  8. ==============================
  9. Current Episode: 78
  10. Reward distribution: Counter({-4: 16236, -12: 14641, -3: 8871, -20: 4909, -11: 2483, -30: 2128, -2: 436, -10: 227, 99: 43, 90: 26})
  11. Last 10 episode lengths (avg: 563.54)
  12. 15546 Q table 1 zeroes, 57.828776041666664 percent filled
  13. 15524 Q table 2 zeroes, 57.888454861111114 percent filled
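Note on the "percent filled" figures: they are consistent with each Q table holding 6144 x 6 = 36864 entries, that is, six actions per encoded state. The action count is not stated anywhere in this log; it is inferred from the numbers. A minimal sketch of that arithmetic follows, with hypothetical names that are not taken from the original program.

```python
# Assumption: 6 actions per state. This is inferred from the logged
# zero counts and percentages, not stated in the log itself.
NUM_STATES = 6144   # "Total encoded states are 6144"
NUM_ACTIONS = 6     # assumed


def percent_filled(zero_entries, num_states=NUM_STATES, num_actions=NUM_ACTIONS):
    """Return the percentage of Q-table entries that are non-zero."""
    total = num_states * num_actions
    return 100.0 * (total - zero_entries) / total


# Zero counts copied from the log: the first report and the later plateau.
print(percent_filled(15546))
print(percent_filled(8064))
```

Under this assumption, the plateau at 8064 zeroes corresponds to 78.125 percent filled, which matches the value the log settles on.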
  14. Current Episode: 207
  15. Reward distribution: Counter({-12: 15905, -4: 14499, -3: 9253, -20: 4763, -11: 2997, -30: 1632, -2: 575, -10: 253, 99: 86, 90: 37})
  16. Last 10 episode lengths (avg: 380.8)
  17. 9698 Q table 1 zeroes, 73.69249131944444 percent filled
  18. 9702 Q table 2 zeroes, 73.681640625 percent filled
  19. Current Episode: 332
  20. Reward distribution: Counter({-12: 16935, -4: 14206, -3: 8991, -20: 4169, -11: 3063, -30: 1496, -2: 764, -10: 255, 99: 96, 90: 25})
  21. Last 10 episode lengths (avg: 382.14)
  22. 8500 Q table 1 zeroes, 76.94227430555556 percent filled
  23. 8505 Q table 2 zeroes, 76.9287109375 percent filled
  24. Current Episode: 463
  25. Reward distribution: Counter({-12: 17773, -4: 14211, -3: 8342, -20: 4075, -11: 3095, -30: 1319, -2: 807, -10: 256, 99: 85, 90: 37})
  26. Last 10 episode lengths (avg: 346.18)
  27. 8243 Q table 1 zeroes, 77.63943142361111 percent filled
  28. 8244 Q table 2 zeroes, 77.63671875 percent filled
  29. Current Episode: 609
  30. Reward distribution: Counter({-12: 17565, -4: 14015, -3: 8414, -20: 4136, -11: 3263, -30: 1414, -2: 783, -10: 269, 99: 111, 90: 30})
  31. Last 10 episode lengths (avg: 297.88)
  32. 8125 Q table 1 zeroes, 77.95952690972221 percent filled
  33. 8126 Q table 2 zeroes, 77.95681423611111 percent filled
  34. Current Episode: 760
  35. Reward distribution: Counter({-12: 17868, -4: 13579, -3: 8422, -20: 4102, -11: 3412, -30: 1398, -2: 779, -10: 292, 99: 97, 90: 51})
  36. Last 10 episode lengths (avg: 294.4)
  37. 8092 Q table 1 zeroes, 78.04904513888889 percent filled
  38. 8092 Q table 2 zeroes, 78.04904513888889 percent filled
  39. Current Episode: 940
  40. Reward distribution: Counter({-12: 18162, -4: 13845, -3: 8019, -20: 4100, -11: 3299, -30: 1319, -2: 775, -10: 304, 99: 137, 90: 40})
  41. Last 10 episode lengths (avg: 301.72)
  42. 8074 Q table 1 zeroes, 78.09787326388889 percent filled
  43. 8074 Q table 2 zeroes, 78.09787326388889 percent filled
  44. ==============================
  45. Running evaluation after 1000 episodes
  46. Evaluation results after 200 trials
  47. Average time steps taken: 1365.62
  48. Average number of penalties incurred: 1313.53
  49. Had 18 wins in 200 episodes
  50. ==============================
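Note on reading the evaluation blocks: the "Had N wins in M episodes" lines can be turned into a win rate for comparison across checkpoints. The helper below is a hypothetical sketch; its regular expression only mirrors the wording of the lines in this log and is not part of the original program.

```python
import re

# Hypothetical pattern matching the evaluation summary lines in this log.
WINS_RE = re.compile(r"Had (\d+) wins in (\d+) episodes")


def win_rate(line):
    """Return wins / trials for an evaluation line, or None if it does not match."""
    match = WINS_RE.search(line)
    if match is None:
        return None
    wins, trials = int(match.group(1)), int(match.group(2))
    return wins / trials


# Evaluation after 1000 episodes, as logged above.
print(win_rate("Had 18 wins in 200 episodes"))  # 0.09
```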
  51. Current Episode: 1136
  52. Reward distribution: Counter({-12: 18547, -4: 13871, -3: 7794, -20: 3898, -11: 3327, -30: 1281, -2: 810, -10: 279, 99: 134, 90: 59})
  53. Last 10 episode lengths (avg: 263.08)
  54. 8072 Q table 1 zeroes, 78.10329861111111 percent filled
  55. 8072 Q table 2 zeroes, 78.10329861111111 percent filled
  56. Current Episode: 1356
  57. Reward distribution: Counter({-12: 18757, -4: 13830, -3: 7701, -20: 4038, -11: 3193, -30: 1219, -2: 771, -10: 277, 99: 142, 90: 72})
  58. Last 10 episode lengths (avg: 240.66)
  59. 8070 Q table 1 zeroes, 78.10872395833334 percent filled
  60. 8070 Q table 2 zeroes, 78.10872395833334 percent filled
  61. Current Episode: 1594
  62. Reward distribution: Counter({-12: 18412, -4: 13257, -3: 8170, -20: 3962, -11: 3436, -30: 1419, -2: 806, -10: 304, 99: 158, 90: 76})
  63. Last 10 episode lengths (avg: 212.8)
  64. 8067 Q table 1 zeroes, 78.11686197916666 percent filled
  65. 8067 Q table 2 zeroes, 78.11686197916666 percent filled
  66. Current Episode: 1850
  67. Reward distribution: Counter({-12: 18926, -4: 13619, -3: 7642, -20: 3968, -11: 3217, -30: 1345, -2: 773, -10: 258, 99: 177, 90: 75})
  68. Last 10 episode lengths (avg: 152.66)
  69. 8066 Q table 1 zeroes, 78.11957465277779 percent filled
  70. 8066 Q table 2 zeroes, 78.11957465277779 percent filled
  71. ==============================
  72. Running evaluation after 2000 episodes
  73. Evaluation results after 200 trials
  74. Average time steps taken: 1313.485
  75. Average number of penalties incurred: 1282.55
  76. Had 25 wins in 200 episodes
  77. ==============================
  78. Current Episode: 2118
  79. Reward distribution: Counter({-12: 18243, -4: 13003, -3: 8347, -20: 4021, -11: 3706, -30: 1353, -2: 785, -10: 277, 99: 157, 90: 108})
  80. Last 10 episode lengths (avg: 174.96)
  81. 8065 Q table 1 zeroes, 78.12228732638889 percent filled
  82. 8065 Q table 2 zeroes, 78.12228732638889 percent filled
  83. Current Episode: 2436
  84. Reward distribution: Counter({-12: 18241, -4: 13010, -3: 8532, -20: 3969, -11: 3525, -30: 1292, -2: 780, -10: 334, 99: 194, 90: 123})
  85. Last 10 episode lengths (avg: 140.18)
  86. 8065 Q table 1 zeroes, 78.12228732638889 percent filled
  87. 8065 Q table 2 zeroes, 78.12228732638889 percent filled
  88. Current Episode: 2765
  89. Reward distribution: Counter({-12: 18255, -4: 12707, -3: 8732, -20: 4121, -11: 3482, -30: 1325, -2: 755, -10: 300, 99: 188, 90: 135})
  90. Last 10 episode lengths (avg: 134.7)
  91. 8065 Q table 1 zeroes, 78.12228732638889 percent filled
  92. 8065 Q table 2 zeroes, 78.12228732638889 percent filled
  93. ==============================
  94. Running evaluation after 3000 episodes
  95. Evaluation results after 200 trials
  96. Average time steps taken: 1045.075
  97. Average number of penalties incurred: 1007.03
  98. Had 61 wins in 200 episodes
  99. ==============================
  100. Current Episode: 3109
  101. Reward distribution: Counter({-12: 18105, -4: 12934, -3: 8584, -20: 4067, -11: 3579, -30: 1324, -2: 753, -10: 317, 99: 188, 90: 149})
  102. Last 10 episode lengths (avg: 122.02)
  103. 8064 Q table 1 zeroes, 78.125 percent filled
  104. 8064 Q table 2 zeroes, 78.125 percent filled
  105. Current Episode: 3464
  106. Reward distribution: Counter({-12: 17368, -4: 11512, -3: 10147, -11: 4080, -20: 3978, -30: 1427, -2: 836, -10: 301, 99: 190, 90: 161})
  107. Last 10 episode lengths (avg: 138.88)
  108. 8064 Q table 1 zeroes, 78.125 percent filled
  109. 8064 Q table 2 zeroes, 78.125 percent filled
  110. Current Episode: 3850
  111. Reward distribution: Counter({-12: 17172, -4: 11752, -3: 9906, -20: 4163, -11: 4096, -30: 1481, -2: 767, -10: 290, 99: 198, 90: 175})
  112. Last 10 episode lengths (avg: 130.32)
  113. 8064 Q table 1 zeroes, 78.125 percent filled
  114. 8064 Q table 2 zeroes, 78.125 percent filled
  115. ==============================
  116. Running evaluation after 4000 episodes
  117. Evaluation results after 200 trials
  118. Average time steps taken: 1037.775
  119. Average number of penalties incurred: 1034.65
  120. Had 62 wins in 200 episodes
  121. ==============================
  122. Current Episode: 4277
  123. Reward distribution: Counter({-12: 16909, -4: 11606, -3: 10692, -11: 3995, -20: 3873, -30: 1389, -2: 812, -10: 308, 99: 210, 90: 206})
  124. Last 10 episode lengths (avg: 108.5)
  125. 8064 Q table 1 zeroes, 78.125 percent filled
  126. 8064 Q table 2 zeroes, 78.125 percent filled
  127. Current Episode: 4721
  128. Reward distribution: Counter({-12: 17092, -4: 10975, -3: 10850, -11: 4040, -20: 3989, -30: 1493, -2: 788, -10: 334, 99: 227, 90: 212})
  129. Last 10 episode lengths (avg: 118.46)
  130. 8064 Q table 1 zeroes, 78.125 percent filled
  131. 8064 Q table 2 zeroes, 78.125 percent filled
  132. ==============================
  133. Running evaluation after 5000 episodes
  134. Evaluation results after 200 trials
  135. Average time steps taken: 858.98
  136. Average number of penalties incurred: 856.645
  137. Had 86 wins in 200 episodes
  138. ==============================
  139. Current Episode: 5184
  140. Reward distribution: Counter({-12: 16646, -3: 10993, -4: 10954, -11: 4422, -20: 4025, -30: 1393, -2: 785, -10: 324, 90: 248, 99: 210})
  141. Last 10 episode lengths (avg: 108.6)
  142. 8064 Q table 1 zeroes, 78.125 percent filled
  143. 8064 Q table 2 zeroes, 78.125 percent filled
  144. Current Episode: 5603
  145. Reward distribution: Counter({-12: 16664, -3: 11182, -4: 10970, -11: 4254, -20: 3980, -30: 1371, -2: 879, -10: 287, 90: 234, 99: 179})
  146. Last 10 episode lengths (avg: 102.98)
  147. 8064 Q table 1 zeroes, 78.125 percent filled
  148. 8064 Q table 2 zeroes, 78.125 percent filled
  149. ==============================
  150. Running evaluation after 6000 episodes
  151. Evaluation results after 200 trials
  152. Average time steps taken: 687.67
  153. Average number of penalties incurred: 684.09
  154. Had 109 wins in 200 episodes
  155. ==============================
  156. Current Episode: 6095
  157. Reward distribution: Counter({-12: 16605, -3: 11217, -4: 10687, -11: 4378, -20: 3956, -30: 1492, -2: 865, -10: 319, 90: 277, 99: 204})
  158. Last 10 episode lengths (avg: 96.92)
  159. 8064 Q table 1 zeroes, 78.125 percent filled
  160. 8064 Q table 2 zeroes, 78.125 percent filled
  161. Current Episode: 6551
  162. Reward distribution: Counter({-12: 16164, -3: 11626, -4: 10414, -11: 4784, -20: 3999, -30: 1398, -2: 866, -10: 303, 90: 273, 99: 173})
  163. Last 10 episode lengths (avg: 124.7)
  164. 8064 Q table 1 zeroes, 78.125 percent filled
  165. 8064 Q table 2 zeroes, 78.125 percent filled
  166. ==============================
  167. Running evaluation after 7000 episodes
  168. Evaluation results after 200 trials
  169. Average time steps taken: 606.155
  170. Average number of penalties incurred: 605.04
  171. Had 120 wins in 200 episodes
  172. ==============================
  173. Current Episode: 7070
  174. Reward distribution: Counter({-12: 16430, -3: 11566, -4: 10704, -11: 4205, -20: 3954, -30: 1499, -2: 813, -10: 316, 90: 313, 99: 200})
  175. Last 10 episode lengths (avg: 115.9)
  176. 8064 Q table 1 zeroes, 78.125 percent filled
  177. 8064 Q table 2 zeroes, 78.125 percent filled
  178. Current Episode: 7582
  179. Reward distribution: Counter({-12: 16093, -3: 11523, -4: 10834, -11: 4299, -20: 4033, -30: 1520, -2: 840, -10: 347, 90: 326, 99: 185})
  180. Last 10 episode lengths (avg: 120.1)
  181. 8064 Q table 1 zeroes, 78.125 percent filled
  182. 8064 Q table 2 zeroes, 78.125 percent filled
  183. ==============================
  184. Running evaluation after 8000 episodes
  185. Evaluation results after 200 trials
  186. Average time steps taken: 605.74
  187. Average number of penalties incurred: 599.405
  188. Had 120 wins in 200 episodes
  189. ==============================
  190. Current Episode: 8096
  191. Reward distribution: Counter({-12: 15989, -3: 11714, -4: 10738, -11: 4313, -20: 4014, -30: 1484, -2: 859, -10: 380, 90: 329, 99: 180})
  192. Last 10 episode lengths (avg: 102.98)
  193. 8064 Q table 1 zeroes, 78.125 percent filled
  194. 8064 Q table 2 zeroes, 78.125 percent filled
  195. Current Episode: 8589
  196. Reward distribution: Counter({-12: 15374, -3: 12538, -4: 9959, -11: 4705, -20: 4104, -30: 1605, -2: 873, -10: 360, 90: 301, 99: 181})
  197. Last 10 episode lengths (avg: 86.54)
  198. 8064 Q table 1 zeroes, 78.125 percent filled
  199. 8064 Q table 2 zeroes, 78.125 percent filled
  200. ==============================
  201. Running evaluation after 9000 episodes
  202. Evaluation results after 200 trials
  203. Average time steps taken: 590.95
  204. Average number of penalties incurred: 589.885
  205. Had 122 wins in 200 episodes
  206. ==============================
  207. Current Episode: 9099
  208. Reward distribution: Counter({-12: 15838, -3: 11751, -4: 10855, -11: 4403, -20: 4038, -30: 1476, -2: 811, 90: 330, -10: 328, 99: 170})
  209. Last 10 episode lengths (avg: 86.54)
  210. 8064 Q table 1 zeroes, 78.125 percent filled
  211. 8064 Q table 2 zeroes, 78.125 percent filled
  212. Current Episode: 9650
  213. Reward distribution: Counter({-12: 15274, -3: 12905, -4: 9714, -11: 4702, -20: 4031, -30: 1618, -2: 825, -10: 384, 90: 377, 99: 170})
  214. Last 10 episode lengths (avg: 97.16)
  215. 8064 Q table 1 zeroes, 78.125 percent filled
  216. 8064 Q table 2 zeroes, 78.125 percent filled
  217. ==============================
  218. Running evaluation after 10000 episodes
  219. Evaluation results after 200 trials
  220. Average time steps taken: 561.445
  221. Average number of penalties incurred: 552.865
  222. Had 126 wins in 200 episodes
  223. ==============================
  224. Current Episode: 10237
  225. Reward distribution: Counter({-12: 15030, -3: 12976, -4: 10085, -11: 4492, -20: 3905, -30: 1686, -2: 828, -10: 421, 90: 406, 99: 171})
  226. Last 10 episode lengths (avg: 87.66)
  227. 8064 Q table 1 zeroes, 78.125 percent filled
  228. 8064 Q table 2 zeroes, 78.125 percent filled
  229. Current Episode: 10761
  230. Reward distribution: Counter({-12: 15322, -3: 12727, -4: 9937, -11: 4685, -20: 3964, -30: 1634, -2: 840, -10: 375, 90: 365, 99: 151})
  231. Last 10 episode lengths (avg: 89.9)
  232. 8064 Q table 1 zeroes, 78.125 percent filled
  233. 8064 Q table 2 zeroes, 78.125 percent filled
  234. ==============================
  235. Running evaluation after 11000 episodes
  236. Evaluation results after 200 trials
  237. Average time steps taken: 449.455
  238. Average number of penalties incurred: 448.205
  239. Had 141 wins in 200 episodes
  240. ==============================
  241. Current Episode: 11326
  242. Reward distribution: Counter({-12: 15321, -3: 12575, -4: 10546, -11: 4281, -20: 3848, -30: 1600, -2: 848, -10: 418, 90: 400, 99: 163})
  243. Last 10 episode lengths (avg: 82.1)
  244. 8064 Q table 1 zeroes, 78.125 percent filled
  245. 8064 Q table 2 zeroes, 78.125 percent filled
  246. Current Episode: 11926
  247. Reward distribution: Counter({-12: 15002, -3: 12758, -4: 10269, -11: 4509, -20: 3983, -30: 1619, -2: 834, -10: 430, 90: 416, 99: 180})
  248. Last 10 episode lengths (avg: 82.12)
  249. 8064 Q table 1 zeroes, 78.125 percent filled
  250. 8064 Q table 2 zeroes, 78.125 percent filled
  251. ==============================
  252. Running evaluation after 12000 episodes
  253. Evaluation results after 200 trials
  254. Average time steps taken: 479.44
  255. Average number of penalties incurred: 469.51
  256. Had 132 wins in 200 episodes
  257. ==============================
  258. Current Episode: 12504
  259. Reward distribution: Counter({-12: 15175, -3: 12377, -4: 10901, -11: 4140, -20: 3916, -30: 1716, -2: 800, -10: 410, 90: 392, 99: 173})
  260. Last 10 episode lengths (avg: 92.48)
  261. 8064 Q table 1 zeroes, 78.125 percent filled
  262. 8064 Q table 2 zeroes, 78.125 percent filled
  263. ==============================
  264. Running evaluation after 13000 episodes
  265. Evaluation results after 200 trials
  266. Average time steps taken: 449.515
  267. Average number of penalties incurred: 444.605
  268. Had 141 wins in 200 episodes
  269. ==============================
  270. Current Episode: 13115
  271. Reward distribution: Counter({-12: 14838, -3: 12990, -4: 10241, -11: 4541, -20: 3901, -30: 1599, -2: 845, -10: 442, 90: 432, 99: 171})
  272. Last 10 episode lengths (avg: 77.26)
  273. 8064 Q table 1 zeroes, 78.125 percent filled
  274. 8064 Q table 2 zeroes, 78.125 percent filled
  275. Current Episode: 13685
  276. Reward distribution: Counter({-12: 14920, -3: 12772, -4: 10446, -11: 4439, -20: 3845, -30: 1708, -2: 852, -10: 460, 90: 391, 99: 167})
  277. Last 10 episode lengths (avg: 105.0)
  278. 8064 Q table 1 zeroes, 78.125 percent filled
  279. 8064 Q table 2 zeroes, 78.125 percent filled
  280. ==============================
  281. Running evaluation after 14000 episodes
  282. Evaluation results after 200 trials
  283. Average time steps taken: 337.83
  284. Average number of penalties incurred: 336.535
  285. Had 156 wins in 200 episodes
  286. ==============================
  287. Current Episode: 14311
  288. Reward distribution: Counter({-12: 14896, -3: 12656, -4: 10624, -11: 4248, -20: 3911, -30: 1771, -2: 830, -10: 452, 90: 429, 99: 183})
  289. Last 10 episode lengths (avg: 103.02)
  290. 8064 Q table 1 zeroes, 78.125 percent filled
  291. 8064 Q table 2 zeroes, 78.125 percent filled
  292. Current Episode: 14921
  293. Reward distribution: Counter({-12: 14492, -3: 13014, -4: 10568, -11: 4363, -20: 3873, -30: 1818, -2: 829, -10: 444, 90: 441, 99: 158})
  294. Last 10 episode lengths (avg: 72.2)
  295. 8064 Q table 1 zeroes, 78.125 percent filled
  296. 8064 Q table 2 zeroes, 78.125 percent filled
  297. ==============================
  298. Running evaluation after 15000 episodes
  299. Evaluation results after 200 trials
  300. Average time steps taken: 375.08
  301. Average number of penalties incurred: 373.815
  302. Had 151 wins in 200 episodes
  303. ==============================
  304. Current Episode: 15518
  305. Reward distribution: Counter({-12: 14688, -3: 12882, -4: 10438, -11: 4418, -20: 3999, -30: 1766, -2: 764, -10: 459, 90: 427, 99: 159})
  306. Last 10 episode lengths (avg: 94.86)
  307. 8064 Q table 1 zeroes, 78.125 percent filled
  308. 8064 Q table 2 zeroes, 78.125 percent filled
  309. ==============================
  310. Running evaluation after 16000 episodes
  311. Evaluation results after 200 trials
  312. Average time steps taken: 397.555
  313. Average number of penalties incurred: 396.32
  314. Had 148 wins in 200 episodes
  315. ==============================
  316. Current Episode: 16157
  317. Reward distribution: Counter({-12: 14952, -3: 12617, -4: 10620, -11: 4302, -20: 3928, -30: 1703, -2: 824, 90: 467, -10: 423, 99: 164})
  318. Last 10 episode lengths (avg: 72.12)
  319. 8064 Q table 1 zeroes, 78.125 percent filled
  320. 8064 Q table 2 zeroes, 78.125 percent filled
  321. Current Episode: 16765
  322. Reward distribution: Counter({-12: 14656, -3: 12951, -4: 10247, -11: 4564, -20: 4041, -30: 1713, -2: 799, -10: 435, 90: 429, 99: 165})
  323. Last 10 episode lengths (avg: 92.56)
  324. 8064 Q table 1 zeroes, 78.125 percent filled
  325. 8064 Q table 2 zeroes, 78.125 percent filled
  326. ==============================
  327. Running evaluation after 17000 episodes
  328. Evaluation results after 200 trials
  329. Average time steps taken: 405.055
  330. Average number of penalties incurred: 403.8
  331. Had 147 wins in 200 episodes
  332. ==============================
  333. Current Episode: 17360
  334. Reward distribution: Counter({-12: 14657, -3: 12980, -4: 10351, -11: 4446, -20: 3969, -30: 1725, -2: 810, -10: 477, 90: 415, 99: 170})
  335. Last 10 episode lengths (avg: 80.68)
  336. 8064 Q table 1 zeroes, 78.125 percent filled
  337. 8064 Q table 2 zeroes, 78.125 percent filled
  338. Current Episode: 17957
  339. Reward distribution: Counter({-12: 14534, -3: 13057, -4: 10323, -11: 4444, -20: 4045, -30: 1723, -2: 821, -10: 465, 90: 424, 99: 164})
  340. Last 10 episode lengths (avg: 81.18)
  341. 8064 Q table 1 zeroes, 78.125 percent filled
  342. 8064 Q table 2 zeroes, 78.125 percent filled
  343. ==============================
  344. Running evaluation after 18000 episodes
  345. Evaluation results after 200 trials
  346. Average time steps taken: 419.765
  347. Average number of penalties incurred: 410.945
  348. Had 145 wins in 200 episodes
  349. ==============================
  350. Current Episode: 18567
  351. Reward distribution: Counter({-12: 14434, -3: 12967, -4: 10585, -11: 4482, -20: 3955, -30: 1658, -2: 826, -10: 493, 90: 442, 99: 158})
  352. Last 10 episode lengths (avg: 82.86)
  353. 8064 Q table 1 zeroes, 78.125 percent filled
  354. 8064 Q table 2 zeroes, 78.125 percent filled
  355. ==============================
  356. Running evaluation after 19000 episodes
  357. Evaluation results after 200 trials
  358. Average time steps taken: 382.335
  359. Average number of penalties incurred: 373.56
  360. Had 150 wins in 200 episodes
  361. ==============================
  362. Current Episode: 19198
  363. Reward distribution: Counter({-12: 14177, -3: 13460, -4: 10308, -11: 4483, -20: 3825, -30: 1808, -2: 841, -10: 476, 90: 465, 99: 157})
  364. Last 10 episode lengths (avg: 83.92)
  365. 8064 Q table 1 zeroes, 78.125 percent filled
  366. 8064 Q table 2 zeroes, 78.125 percent filled
  367. Current Episode: 19813
  368. Reward distribution: Counter({-12: 14277, -3: 13247, -4: 10433, -11: 4489, -20: 4004, -30: 1658, -2: 817, -10: 476, 90: 443, 99: 156})
  369. Last 10 episode lengths (avg: 79.62)
  370. 8064 Q table 1 zeroes, 78.125 percent filled
  371. 8064 Q table 2 zeroes, 78.125 percent filled
  372. ==============================
  373. Running evaluation after 20000 episodes
  374. Evaluation results after 200 trials
  375. Average time steps taken: 487.09
  376. Average number of penalties incurred: 480.86
  377. Had 136 wins in 200 episodes
  378. ==============================
  379. Current Episode: 20453
  380. Reward distribution: Counter({-12: 14231, -3: 13034, -4: 10585, -11: 4419, -20: 4004, -30: 1812, -2: 739, -10: 542, 90: 466, 99: 168})
  381. Last 10 episode lengths (avg: 100.88)
  382. 8064 Q table 1 zeroes, 78.125 percent filled
  383. 8064 Q table 2 zeroes, 78.125 percent filled
  384. ==============================
  385. Running evaluation after 21000 episodes
  386. Evaluation results after 200 trials
  387. Average time steps taken: 338.18
  388. Average number of penalties incurred: 335.115
  389. Had 156 wins in 200 episodes
  390. ==============================
  391. Current Episode: 21073
  392. Reward distribution: Counter({-12: 14364, -3: 12954, -4: 10742, -11: 4379, -20: 3957, -30: 1715, -2: 753, -10: 533, 90: 435, 99: 168})
  393. Last 10 episode lengths (avg: 79.54)
  394. 8064 Q table 1 zeroes, 78.125 percent filled
  395. 8064 Q table 2 zeroes, 78.125 percent filled
  396. Current Episode: 21670
  397. Reward distribution: Counter({-12: 14366, -3: 13117, -4: 10635, -11: 4511, -20: 3910, -30: 1632, -2: 744, -10: 496, 90: 417, 99: 172})
  398. Last 10 episode lengths (avg: 81.02)
  399. 8064 Q table 1 zeroes, 78.125 percent filled
  400. 8064 Q table 2 zeroes, 78.125 percent filled
  401. ==============================
  402. Running evaluation after 22000 episodes
  403. Evaluation results after 200 trials
  404. Average time steps taken: 338.115
  405. Average number of penalties incurred: 333.195
  406. Had 156 wins in 200 episodes
  407. ==============================
  408. Current Episode: 22284
  409. Reward distribution: Counter({-12: 14302, -3: 12926, -4: 10894, -11: 4343, -20: 3973, -30: 1674, -2: 759, -10: 530, 90: 435, 99: 164})
  410. Last 10 episode lengths (avg: 78.04)
  411. 8064 Q table 1 zeroes, 78.125 percent filled
  412. 8064 Q table 2 zeroes, 78.125 percent filled
  413. Current Episode: 22890
  414. Reward distribution: Counter({-12: 14029, -3: 13648, -4: 10201, -11: 4537, -20: 3935, -30: 1837, -2: 707, -10: 524, 90: 428, 99: 154})
  415. Last 10 episode lengths (avg: 94.58)
  416. 8064 Q table 1 zeroes, 78.125 percent filled
  417. 8064 Q table 2 zeroes, 78.125 percent filled
  418. ==============================
  419. Running evaluation after 23000 episodes
  420. Evaluation results after 200 trials
  421. Average time steps taken: 271.365
  422. Average number of penalties incurred: 270.245
  423. Had 165 wins in 200 episodes
  424. ==============================
  425. Current Episode: 23485
  426. Reward distribution: Counter({-12: 14376, -3: 12795, -4: 10820, -11: 4420, -20: 3964, -30: 1745, -2: 762, -10: 531, 90: 426, 99: 161})
  427. Last 10 episode lengths (avg: 79.12)
  428. 8064 Q table 1 zeroes, 78.125 percent filled
  429. 8064 Q table 2 zeroes, 78.125 percent filled
  430. ==============================
  431. Running evaluation after 24000 episodes
  432. Evaluation results after 200 trials
  433. Average time steps taken: 360.49
  434. Average number of penalties incurred: 359.375
  435. Had 153 wins in 200 episodes
  436. ==============================
  437. Current Episode: 24114
  438. Reward distribution: Counter({-12: 14171, -3: 13113, -4: 10677, -11: 4356, -20: 3904, -30: 1875, -2: 712, -10: 570, 90: 453, 99: 169})
  439. Last 10 episode lengths (avg: 78.52)
  440. 8064 Q table 1 zeroes, 78.125 percent filled
  441. 8064 Q table 2 zeroes, 78.125 percent filled
  442. Current Episode: 24752
  443. Reward distribution: Counter({-12: 14101, -3: 13567, -4: 10279, -11: 4424, -20: 3956, -30: 1767, -2: 689, -10: 586, 90: 458, 99: 173})
  444. Last 10 episode lengths (avg: 78.84)
  445. 8064 Q table 1 zeroes, 78.125 percent filled
  446. 8064 Q table 2 zeroes, 78.125 percent filled
  447. ==============================
  448. Running evaluation after 25000 episodes
  449. Evaluation results after 200 trials
  450. Average time steps taken: 323.16
  451. Average number of penalties incurred: 321.95
  452. Had 158 wins in 200 episodes
  453. ==============================
  454. Current Episode: 25392
  455. Reward distribution: Counter({-12: 14004, -3: 13430, -4: 10356, -11: 4443, -20: 3963, -30: 1876, -2: 718, -10: 578, 90: 476, 99: 156})
  456. Last 10 episode lengths (avg: 76.04)
  457. 8064 Q table 1 zeroes, 78.125 percent filled
  458. 8064 Q table 2 zeroes, 78.125 percent filled
  459. ==============================
  460. Running evaluation after 26000 episodes
  461. Evaluation results after 200 trials
  462. Average time steps taken: 367.72
  463. Average number of penalties incurred: 366.665
  464. Had 152 wins in 200 episodes
  465. ==============================
  466. Current Episode: 26005
  467. Reward distribution: Counter({-12: 14208, -3: 13188, -4: 10442, -11: 4536, -20: 4065, -30: 1699, -2: 697, -10: 561, 90: 431, 99: 173})
  468. Last 10 episode lengths (avg: 77.06)
  469. 8064 Q table 1 zeroes, 78.125 percent filled
  470. 8064 Q table 2 zeroes, 78.125 percent filled
  471. Current Episode: 26646
  472. Reward distribution: Counter({-12: 14136, -3: 13345, -4: 10527, -11: 4463, -20: 3893, -30: 1724, -2: 693, -10: 588, 90: 466, 99: 165})
  473. Last 10 episode lengths (avg: 73.36)
  474. 8064 Q table 1 zeroes, 78.125 percent filled
  475. 8064 Q table 2 zeroes, 78.125 percent filled
  476. ==============================
  477. Running evaluation after 27000 episodes
  478. Evaluation results after 200 trials
  479. Average time steps taken: 270.955
  480. Average number of penalties incurred: 269.895
  481. Had 165 wins in 200 episodes
  482. ==============================
  483. Current Episode: 27275
  484. Reward distribution: Counter({-12: 13800, -3: 13695, -4: 10263, -11: 4588, -20: 3951, -30: 1781, -2: 679, -10: 623, 90: 433, 99: 187})
  485. Last 10 episode lengths (avg: 75.2)
  486. 8064 Q table 1 zeroes, 78.125 percent filled
  487. 8064 Q table 2 zeroes, 78.125 percent filled
  488. Current Episode: 27897
  489. Reward distribution: Counter({-12: 14292, -3: 12911, -4: 11016, -11: 4282, -20: 3938, -30: 1686, -2: 669, -10: 594, 90: 443, 99: 169})
  490. Last 10 episode lengths (avg: 72.1)
  491. 8064 Q table 1 zeroes, 78.125 percent filled
  492. 8064 Q table 2 zeroes, 78.125 percent filled
  493. ==============================
  494. Running evaluation after 28000 episodes
  495. Evaluation results after 200 trials
  496. Average time steps taken: 315.875
  497. Average number of penalties incurred: 314.665
  498. Had 159 wins in 200 episodes
  499. ==============================
  500. Current Episode: 28524
  501. Reward distribution: Counter({-12: 14040, -3: 13440, -4: 10463, -11: 4498, -20: 3944, -30: 1725, -2: 663, -10: 609, 90: 451, 99: 167})
  502. Last 10 episode lengths (avg: 95.02)
  503. 8064 Q table 1 zeroes, 78.125 percent filled
  504. 8064 Q table 2 zeroes, 78.125 percent filled
  505. ==============================
  506. Running evaluation after 29000 episodes
  507. Evaluation results after 200 trials
  508. Average time steps taken: 300.76
  509. Average number of penalties incurred: 299.635
  510. Had 161 wins in 200 episodes
  511. ==============================
  512. Current Episode: 29175
  513. Reward distribution: Counter({-12: 14012, -3: 13396, -4: 10536, -11: 4403, -20: 4026, -30: 1663, -2: 689, -10: 631, 90: 455, 99: 189})
  514. Last 10 episode lengths (avg: 86.1)
  515. 8064 Q table 1 zeroes, 78.125 percent filled
  516. 8064 Q table 2 zeroes, 78.125 percent filled
  517. Current Episode: 29794
  518. Reward distribution: Counter({-12: 13781, -3: 13420, -4: 10579, -11: 4571, -20: 4022, -30: 1711, -2: 675, -10: 631, 90: 446, 99: 164})
  519. Last 10 episode lengths (avg: 91.46)
  520. 8064 Q table 1 zeroes, 78.125 percent filled
  521. 8064 Q table 2 zeroes, 78.125 percent filled
  522. ==============================
  523. Running evaluation after 30000 episodes
  524. Evaluation results after 200 trials
  525. Average time steps taken: 367.66
  526. Average number of penalties incurred: 364.195
  527. Had 137 wins in 200 episodes
  528. ==============================
  529. Current Episode: 30416
  530. Reward distribution: Counter({-12: 14304, -3: 12844, -4: 11222, -11: 4135, -20: 3933, -30: 1733, -2: 640, -10: 580, 90: 430, 99: 179})
  531. Last 10 episode lengths (avg: 92.78)
  532. 8064 Q table 1 zeroes, 78.125 percent filled
  533. 8064 Q table 2 zeroes, 78.125 percent filled
  534. ==============================
  535. Running evaluation after 31000 episodes
  536. Evaluation results after 200 trials
  537. Average time steps taken: 382.64
  538. Average number of penalties incurred: 381.68
  539. Had 150 wins in 200 episodes
  540. ==============================
  541. Current Episode: 31030
  542. Reward distribution: Counter({-12: 13675, -3: 13655, -4: 10510, -11: 4471, -20: 3908, -30: 1901, -2: 663, -10: 613, 90: 433, 99: 171})
  543. Last 10 episode lengths (avg: 81.28)
  544. 8064 Q table 1 zeroes, 78.125 percent filled
  545. 8064 Q table 2 zeroes, 78.125 percent filled
  546. Current Episode: 31659
  547. Reward distribution: Counter({-12: 13985, -3: 13261, -4: 10815, -11: 4377, -20: 4043, -30: 1650, -2: 638, -10: 611, 90: 457, 99: 163})
  548. Last 10 episode lengths (avg: 78.22)
  549. 8064 Q table 1 zeroes, 78.125 percent filled
  550. 8064 Q table 2 zeroes, 78.125 percent filled
  551. ==============================
  552. Running evaluation after 32000 episodes
  553. Evaluation results after 200 trials
  554. Average time steps taken: 345.51
  555. Average number of penalties incurred: 344.46
  556. Had 155 wins in 200 episodes
  557. ==============================
  558. Current Episode: 32287
  559. Reward distribution: Counter({-12: 13867, -3: 13586, -4: 10574, -11: 4493, -20: 3922, -30: 1676, -2: 634, -10: 624, 90: 454, 99: 170})
  560. Last 10 episode lengths (avg: 70.28)
  561. 8064 Q table 1 zeroes, 78.125 percent filled
  562. 8064 Q table 2 zeroes, 78.125 percent filled
  563. Current Episode: 32949
  564. Reward distribution: Counter({-12: 13656, -3: 13570, -4: 10695, -11: 4492, -20: 3934, -30: 1733, -2: 642, -10: 626, 90: 458, 99: 194})
  565. Last 10 episode lengths (avg: 89.22)
  566. 8064 Q table 1 zeroes, 78.125 percent filled
  567. 8064 Q table 2 zeroes, 78.125 percent filled
  568. ==============================
  569. Running evaluation after 33000 episodes
  570. Evaluation results after 200 trials
  571. Average time steps taken: 367.965
  572. Average number of penalties incurred: 366.935
  573. Had 152 wins in 200 episodes
  574. ==============================
  575. Current Episode: 33570
  576. Reward distribution: Counter({-12: 13962, -3: 13261, -4: 10797, -11: 4427, -20: 3933, -30: 1735, -10: 656, -2: 617, 90: 444, 99: 168})
  577. Last 10 episode lengths (avg: 86.86)
  578. 8064 Q table 1 zeroes, 78.125 percent filled
  579. 8064 Q table 2 zeroes, 78.125 percent filled
  580. ==============================
  581. Running evaluation after 34000 episodes
  582. Evaluation results after 200 trials
  583. Average time steps taken: 457.435
  584. Average number of penalties incurred: 445.2
  585. Had 140 wins in 200 episodes
  586. ==============================
  587. Current Episode: 34196
  588. Reward distribution: Counter({-12: 13867, -3: 13416, -4: 10810, -11: 4360, -20: 3922, -30: 1696, -2: 664, -10: 652, 90: 444, 99: 169})
  589. Last 10 episode lengths (avg: 74.5)
  590. 8064 Q table 1 zeroes, 78.125 percent filled
  591. 8064 Q table 2 zeroes, 78.125 percent filled
  592. Current Episode: 34792
  593. Reward distribution: Counter({-12: 13922, -3: 13388, -4: 10780, -11: 4411, -20: 3966, -30: 1764, -10: 619, -2: 565, 90: 419, 99: 166})
  594. Last 10 episode lengths (avg: 93.52)
  595. 8064 Q table 1 zeroes, 78.125 percent filled
  596. 8064 Q table 2 zeroes, 78.125 percent filled
  597. ==============================
  598. Running evaluation after 35000 episodes
  599. Evaluation results after 200 trials
  600. Average time steps taken: 360.21
  601. Average number of penalties incurred: 359.15
  602. Had 153 wins in 200 episodes
  603. ==============================
  604. Current Episode: 35414
  605. Reward distribution: Counter({-12: 13892, -3: 13065, -4: 10874, -11: 4337, -20: 4062, -30: 1887, -10: 659, -2: 612, 90: 438, 99: 174})
  606. Last 10 episode lengths (avg: 84.38)
  607. 8064 Q table 1 zeroes, 78.125 percent filled
  608. 8064 Q table 2 zeroes, 78.125 percent filled
  609. ==============================
  610. Running evaluation after 36000 episodes
  611. Evaluation results after 200 trials
  612. Average time steps taken: 367.675
  613. Average number of penalties incurred: 366.735
  614. Had 152 wins in 200 episodes
  615. ==============================
  616. Current Episode: 36051
  617. Reward distribution: Counter({-12: 13800, -3: 13210, -4: 11182, -11: 4126, -20: 3976, -30: 1780, -10: 675, -2: 619, 90: 446, 99: 186})
  618. Last 10 episode lengths (avg: 69.36)
  619. 8064 Q table 1 zeroes, 78.125 percent filled
  620. 8064 Q table 2 zeroes, 78.125 percent filled
  621. Current Episode: 36673
  622. Reward distribution: Counter({-12: 13613, -3: 13512, -4: 10724, -11: 4459, -20: 4029, -30: 1785, -10: 660, -2: 618, 90: 416, 99: 184})
  623. Last 10 episode lengths (avg: 75.94)
  624. 8064 Q table 1 zeroes, 78.125 percent filled
  625. 8064 Q table 2 zeroes, 78.125 percent filled
  626. ==============================
  627. Running evaluation after 37000 episodes
  628. Evaluation results after 200 trials
  629. Average time steps taken: 248.625
  630. Average number of penalties incurred: 247.665
  631. Had 156 wins in 200 episodes
  632. ==============================
  633. Current Episode: 37302
  634. Reward distribution: Counter({-12: 13773, -3: 13537, -4: 10591, -11: 4392, -20: 3972, -30: 1824, -10: 663, -2: 634, 90: 434, 99: 180})
  635. Last 10 episode lengths (avg: 81.02)
  636. 8064 Q table 1 zeroes, 78.125 percent filled
  637. 8064 Q table 2 zeroes, 78.125 percent filled
  638. Current Episode: 37893
  639. Reward distribution: Counter({-12: 13886, -3: 13001, -4: 11143, -11: 4360, -20: 3951, -30: 1860, -10: 628, -2: 592, 90: 399, 99: 180})
  640. Last 10 episode lengths (avg: 81.52)
  641. 8064 Q table 1 zeroes, 78.125 percent filled
  642. 8064 Q table 2 zeroes, 78.125 percent filled
  643. ==============================
  644. Running evaluation after 38000 episodes
  645. Evaluation results after 200 trials
  646. Average time steps taken: 613.67
  647. Average number of penalties incurred: 612.785
  648. Had 119 wins in 200 episodes
  649. ==============================
  650. Current Episode: 38497
  651. Reward distribution: Counter({-12: 13781, -3: 13546, -4: 10474, -11: 4563, -20: 3962, -30: 1828, -10: 626, -2: 624, 90: 440, 99: 156})
  652. Last 10 episode lengths (avg: 79.48)
  653. 8064 Q table 1 zeroes, 78.125 percent filled
  654. 8064 Q table 2 zeroes, 78.125 percent filled
  655. ==============================
  656. Running evaluation after 39000 episodes
  657. Evaluation results after 200 trials
  658. Average time steps taken: 211.575
  659. Average number of penalties incurred: 206.71
  660. Had 173 wins in 200 episodes
  661. ==============================
  662. Current Episode: 39116
  663. Reward distribution: Counter({-3: 14114, -12: 13655, -4: 9979, -11: 4525, -20: 3999, -30: 1856, -2: 635, -10: 632, 90: 436, 99: 169})
  664. Last 10 episode lengths (avg: 88.96)
  665. 8064 Q table 1 zeroes, 78.125 percent filled
  666. 8064 Q table 2 zeroes, 78.125 percent filled
  667. Current Episode: 39732
  668. Reward distribution: Counter({-12: 13901, -3: 13461, -4: 10706, -11: 4339, -20: 3940, -30: 1752, -10: 675, -2: 619, 90: 446, 99: 161})
  669. Last 10 episode lengths (avg: 78.6)
  670. 8064 Q table 1 zeroes, 78.125 percent filled
  671. 8064 Q table 2 zeroes, 78.125 percent filled
  672. ==============================
  673. Running evaluation after 40000 episodes
  674. Evaluation results after 200 trials
  675. Average time steps taken: 382.64
  676. Average number of penalties incurred: 381.65
  677. Had 150 wins in 200 episodes
  678. ==============================
  679. Current Episode: 40342
  680. Reward distribution: Counter({-3: 13789, -12: 13784, -4: 10429, -11: 4345, -20: 4011, -30: 1829, -10: 624, -2: 589, 90: 419, 99: 181})
  681. Last 10 episode lengths (avg: 72.16)
  682. 8064 Q table 1 zeroes, 78.125 percent filled
  683. 8064 Q table 2 zeroes, 78.125 percent filled
  684. Current Episode: 40975
  685. Reward distribution: Counter({-12: 13773, -3: 13570, -4: 10570, -11: 4430, -20: 3998, -30: 1818, -10: 652, -2: 585, 90: 443, 99: 161})
  686. Last 10 episode lengths (avg: 76.06)
  687. 8064 Q table 1 zeroes, 78.125 percent filled
  688. 8064 Q table 2 zeroes, 78.125 percent filled
  689. ==============================
  690. Running evaluation after 41000 episodes
  691. Evaluation results after 200 trials
  692. Average time steps taken: 509.11
  693. Average number of penalties incurred: 508.125
  694. Had 133 wins in 200 episodes
  695. ==============================
  696. Current Episode: 41585
  697. Reward distribution: Counter({-12: 13920, -3: 13184, -4: 10943, -11: 4272, -20: 4014, -30: 1807, -10: 671, -2: 589, 90: 412, 99: 188})
  698. Last 10 episode lengths (avg: 94.94)
  699. 8064 Q table 1 zeroes, 78.125 percent filled
  700. 8064 Q table 2 zeroes, 78.125 percent filled
  701. ==============================
  702. Running evaluation after 42000 episodes
  703. Evaluation results after 200 trials
  704. Average time steps taken: 472.25
  705. Average number of penalties incurred: 471.295
  706. Had 138 wins in 200 episodes
  707. ==============================
  708. Current Episode: 42180
  709. Reward distribution: Counter({-12: 13858, -3: 13532, -4: 10833, -11: 4407, -20: 3820, -30: 1703, -10: 657, -2: 607, 90: 419, 99: 164})
  710. Last 10 episode lengths (avg: 93.72)
  711. 8064 Q table 1 zeroes, 78.125 percent filled
  712. 8064 Q table 2 zeroes, 78.125 percent filled
  713. Current Episode: 42814
  714. Reward distribution: Counter({-3: 14016, -12: 13572, -4: 10150, -11: 4636, -20: 3967, -30: 1783, -10: 690, -2: 573, 90: 436, 99: 177})
  715. Last 10 episode lengths (avg: 90.86)
  716. 8064 Q table 1 zeroes, 78.125 percent filled
  717. 8064 Q table 2 zeroes, 78.125 percent filled
  718. ==============================
  719. Running evaluation after 43000 episodes
  720. Evaluation results after 200 trials
  721. Average time steps taken: 427.6
  722. Average number of penalties incurred: 426.63
  723. Had 144 wins in 200 episodes
  724. ==============================
  725. Current Episode: 43423
  726. Reward distribution: Counter({-12: 13790, -3: 13406, -4: 10682, -11: 4428, -20: 3869, -30: 1952, -10: 679, -2: 598, 90: 434, 99: 162})
  727. Last 10 episode lengths (avg: 90.76)
  728. 8064 Q table 1 zeroes, 78.125 percent filled
  729. 8064 Q table 2 zeroes, 78.125 percent filled
  730. ==============================
  731. Running evaluation after 44000 episodes
  732. Evaluation results after 200 trials
  733. Average time steps taken: 293.295
  734. Average number of penalties incurred: 292.25
  735. Had 162 wins in 200 episodes
  736. ==============================
  737. Current Episode: 44035
  738. Reward distribution: Counter({-12: 13791, -3: 13481, -4: 10691, -11: 4391, -20: 3910, -30: 1840, -10: 710, -2: 584, 90: 425, 99: 177})
  739. Last 10 episode lengths (avg: 77.52)
  740. 8064 Q table 1 zeroes, 78.125 percent filled
  741. 8064 Q table 2 zeroes, 78.125 percent filled
  742. Current Episode: 44681
  743. Reward distribution: Counter({-12: 13910, -3: 13580, -4: 10628, -11: 4224, -20: 3979, -30: 1776, -10: 683, -2: 585, 90: 450, 99: 185})
  744. Last 10 episode lengths (avg: 90.08)
  745. 8064 Q table 1 zeroes, 78.125 percent filled
  746. 8064 Q table 2 zeroes, 78.125 percent filled
  747. ==============================
  748. Running evaluation after 45000 episodes
  749. Evaluation results after 200 trials
  750. Average time steps taken: 397.55
  751. Average number of penalties incurred: 396.535
  752. Had 148 wins in 200 episodes
  753. ==============================
  754. Current Episode: 45302
  755. Reward distribution: Counter({-12: 13816, -3: 13614, -4: 10698, -11: 4325, -20: 3961, -30: 1740, -10: 680, -2: 556, 90: 434, 99: 176})
  756. Last 10 episode lengths (avg: 84.46)
  757. 8064 Q table 1 zeroes, 78.125 percent filled
  758. 8064 Q table 2 zeroes, 78.125 percent filled
  759. Current Episode: 45939
  760. Reward distribution: Counter({-12: 13962, -3: 13259, -4: 11091, -11: 4197, -20: 3846, -30: 1764, -10: 662, -2: 592, 90: 467, 99: 160})
  761. Last 10 episode lengths (avg: 73.12)
  762. 8064 Q table 1 zeroes, 78.125 percent filled
  763. 8064 Q table 2 zeroes, 78.125 percent filled
  764. ==============================
  765. Running evaluation after 46000 episodes
  766. Evaluation results after 200 trials
  767. Average time steps taken: 412.755
  768. Average number of penalties incurred: 411.76
  769. Had 146 wins in 200 episodes
  770. ==============================
  771. Current Episode: 46561
  772. Reward distribution: Counter({-12: 13906, -3: 13435, -4: 10668, -11: 4224, -20: 3989, -30: 1891, -10: 695, -2: 584, 90: 436, 99: 172})
  773. Last 10 episode lengths (avg: 74.22)
  774. 8064 Q table 1 zeroes, 78.125 percent filled
  775. 8064 Q table 2 zeroes, 78.125 percent filled
  776. ==============================
  777. Running evaluation after 47000 episodes
  778. Evaluation results after 200 trials
  779. Average time steps taken: 405.4
  780. Average number of penalties incurred: 404.51
  781. Had 147 wins in 200 episodes
  782. ==============================
  783. Current Episode: 47173
  784. Reward distribution: Counter({-12: 13821, -3: 13267, -4: 11020, -11: 4261, -20: 3943, -30: 1807, -10: 696, -2: 584, 90: 425, 99: 176})
  785. Last 10 episode lengths (avg: 77.6)
  786. 8064 Q table 1 zeroes, 78.125 percent filled
  787. 8064 Q table 2 zeroes, 78.125 percent filled
  788. Current Episode: 47792
  789. Reward distribution: Counter({-3: 13811, -12: 13742, -4: 10491, -11: 4400, -20: 3913, -30: 1789, -10: 679, -2: 565, 90: 450, 99: 160})
  790. Last 10 episode lengths (avg: 76.32)
  791. 8064 Q table 1 zeroes, 78.125 percent filled
  792. 8064 Q table 2 zeroes, 78.125 percent filled
  793. ==============================
  794. Running evaluation after 48000 episodes
  795. Evaluation results after 200 trials
  796. Average time steps taken: 323.39
  797. Average number of penalties incurred: 322.44
  798. Had 158 wins in 200 episodes
  799. ==============================
  800. Current Episode: 48445
  801. Reward distribution: Counter({-12: 13993, -3: 12754, -4: 11561, -20: 3995, -11: 3972, -30: 1907, -10: 649, -2: 524, 90: 476, 99: 169})
  802. Last 10 episode lengths (avg: 78.08)
  803. 8064 Q table 1 zeroes, 78.125 percent filled
  804. 8064 Q table 2 zeroes, 78.125 percent filled
  805. ==============================
  806. Running evaluation after 49000 episodes
  807. Evaluation results after 200 trials
  808. Average time steps taken: 434.975
  809. Average number of penalties incurred: 434.115
  810. Had 143 wins in 200 episodes
  811. ==============================
  812. Current Episode: 49063
  813. Reward distribution: Counter({-3: 13928, -12: 13605, -4: 10286, -11: 4542, -20: 3917, -30: 1874, -10: 665, -2: 575, 90: 433, 99: 175})
  814. Last 10 episode lengths (avg: 75.1)
  815. 8064 Q table 1 zeroes, 78.125 percent filled
  816. 8064 Q table 2 zeroes, 78.125 percent filled
  817. Current Episode: 49706
  818. Reward distribution: Counter({-12: 13870, -3: 13169, -4: 11054, -11: 4251, -20: 3985, -30: 1810, -10: 704, -2: 529, 90: 436, 99: 192})
  819. Last 10 episode lengths (avg: 76.12)
  820. 8064 Q table 1 zeroes, 78.125 percent filled
  821. 8064 Q table 2 zeroes, 78.125 percent filled
  822. Training finished.