q-learning-dual-taxi-cooperative-output

a guest
Jul 9th, 2021
  1. Total encoded states are 6144
  2. ==============================
  3. Running evaluation after 0 episodes
  4. Evaluation results after 200 trials
  5. Average time steps taken: 1500.0
  6. Average number of penalties incurred: 1500.0
  7. Had 0 wins in 200 episodes
  8. ==============================
  9. Current Episode: 80
  10. Reward distribution: Counter({-4: 17683, -12: 15636, -3: 7789, -20: 3988, -11: 2314, -30: 1972, -2: 366, -10: 182, 99: 41, 90: 29})
  11. Last 10 episode lengths (avg: 554.34)
  12. 180671 Q table zeroes, 18.31642433449074 percent filled
  13. Current Episode: 163
  14. Reward distribution: Counter({-4: 16589, -12: 16432, -3: 7660, -20: 4178, -11: 2566, -30: 1919, -2: 392, -10: 190, 99: 50, 90: 24})
  15. Last 10 episode lengths (avg: 569.62)
  16. 148651 Q table zeroes, 32.7930591724537 percent filled
  17. Current Episode: 262
  18. Reward distribution: Counter({-12: 16838, -4: 15857, -3: 7739, -20: 4183, -11: 2729, -30: 1921, -2: 458, -10: 184, 99: 54, 90: 37})
  19. Last 10 episode lengths (avg: 475.54)
  20. 122432 Q table zeroes, 44.64699074074074 percent filled
  21. Current Episode: 377
  22. Reward distribution: Counter({-12: 17179, -4: 15135, -3: 7679, -20: 4518, -11: 2808, -30: 1903, -2: 473, -10: 194, 99: 85, 90: 26})
  23. Last 10 episode lengths (avg: 406.82)
  24. 102995 Q table zeroes, 53.43469690393518 percent filled
  25. Current Episode: 483
  26. Reward distribution: Counter({-12: 16416, -4: 15575, -3: 7704, -20: 4789, -11: 2855, -30: 1817, -2: 553, -10: 194, 99: 59, 90: 38})
  27. Last 10 episode lengths (avg: 435.08)
  28. 89459 Q table zeroes, 59.55448857060185 percent filled
  29. Current Episode: 602
  30. Reward distribution: Counter({-12: 16332, -4: 16306, -3: 7521, -20: 4515, -11: 2662, -30: 1746, -2: 587, -10: 214, 99: 73, 90: 44})
  31. Last 10 episode lengths (avg: 463.04)
  32. 78543 Q table zeroes, 64.48974609375 percent filled
  33. Current Episode: 748
  34. Reward distribution: Counter({-4: 16338, -12: 15711, -3: 8105, -20: 4591, -11: 2583, -30: 1689, -2: 662, -10: 181, 99: 93, 90: 47})
  35. Last 10 episode lengths (avg: 355.62)
  36. 71371 Q table zeroes, 67.73229528356481 percent filled
  37. Current Episode: 890
  38. Reward distribution: Counter({-4: 16571, -12: 15347, -3: 8455, -20: 4425, -11: 2507, -30: 1746, -2: 622, -10: 187, 99: 95, 90: 45})
  39. Last 10 episode lengths (avg: 387.06)
  40. 66317 Q table zeroes, 70.0172706886574 percent filled
  41. ==============================
  42. Running evaluation after 1000 episodes
  43. Evaluation results after 200 trials
  44. Average time steps taken: 1492.54
  45. Average number of penalties incurred: 1478.15
  46. Had 1 wins in 200 episodes
  47. ==============================
  48. Current Episode: 1069
  49. Reward distribution: Counter({-4: 16374, -12: 15092, -3: 8985, -20: 4344, -11: 2633, -30: 1487, -2: 709, -10: 199, 99: 112, 90: 65})
  50. Last 10 episode lengths (avg: 328.6)
  51. 62942 Q table zeroes, 71.5431495949074 percent filled
  52. Current Episode: 1249
  53. Reward distribution: Counter({-4: 17483, -12: 14986, -3: 8359, -20: 4224, -11: 2330, -30: 1563, -2: 714, -10: 167, 99: 130, 90: 44})
  54. Last 10 episode lengths (avg: 272.6)
  55. 60615 Q table zeroes, 72.59521484375 percent filled
  56. Current Episode: 1468
  57. Reward distribution: Counter({-4: 17589, -12: 14841, -3: 8523, -20: 4095, -11: 2352, -30: 1482, -2: 726, -10: 177, 99: 155, 90: 60})
  58. Last 10 episode lengths (avg: 211.56)
  59. 58536 Q table zeroes, 73.53515625 percent filled
  60. Current Episode: 1713
  61. Reward distribution: Counter({-4: 16659, -12: 14389, -3: 9414, -20: 4157, -11: 2638, -30: 1573, -2: 748, -10: 178, 99: 161, 90: 83})
  62. Last 10 episode lengths (avg: 204.32)
  63. 57105 Q table zeroes, 74.18212890625 percent filled
  64. Current Episode: 1918
  65. Reward distribution: Counter({-4: 18104, -12: 14688, -3: 8226, -20: 4159, -11: 2265, -30: 1464, -2: 711, -10: 178, 99: 123, 90: 82})
  66. Last 10 episode lengths (avg: 238.16)
  67. 56146 Q table zeroes, 74.61570457175925 percent filled
  68. ==============================
  69. Running evaluation after 2000 episodes
  70. Evaluation results after 200 trials
  71. Average time steps taken: 1373.14
  72. Average number of penalties incurred: 1341.665
  73. Had 17 wins in 200 episodes
  74. ==============================
  75. Current Episode: 2167
  76. Reward distribution: Counter({-4: 17223, -12: 14469, -3: 8960, -20: 4081, -11: 2559, -30: 1560, -2: 737, 99: 177, -10: 166, 90: 68})
  77. Last 10 episode lengths (avg: 235.08)
  78. 55242 Q table zeroes, 75.0244140625 percent filled
  79. Current Episode: 2443
  80. Reward distribution: Counter({-4: 17391, -12: 14470, -3: 8867, -20: 4170, -11: 2470, -30: 1433, -2: 755, 99: 182, -10: 168, 90: 94})
  81. Last 10 episode lengths (avg: 176.46)
  82. 54499 Q table zeroes, 75.36033347800925 percent filled
  83. Current Episode: 2741
  84. Reward distribution: Counter({-4: 17578, -12: 14593, -3: 8682, -20: 4178, -11: 2277, -30: 1562, -2: 670, 99: 197, -10: 162, 90: 101})
  85. Last 10 episode lengths (avg: 163.88)
  86. 53778 Q table zeroes, 75.68630642361111 percent filled
  87. ==============================
  88. Running evaluation after 3000 episodes
  89. Evaluation results after 200 trials
  90. Average time steps taken: 1365.77
  91. Average number of penalties incurred: 1328.055
  92. Had 18 wins in 200 episodes
  93. ==============================
  94. Current Episode: 3044
  95. Reward distribution: Counter({-4: 18282, -12: 14977, -3: 7843, -20: 4080, -11: 2152, -30: 1510, -2: 730, 99: 199, -10: 124, 90: 103})
  96. Last 10 episode lengths (avg: 182.52)
  97. 53169 Q table zeroes, 75.96164279513889 percent filled
  98. Current Episode: 3350
  99. Reward distribution: Counter({-4: 18289, -12: 14529, -3: 8133, -20: 4072, -11: 2215, -30: 1541, -2: 755, 99: 213, -10: 162, 90: 91})
  100. Last 10 episode lengths (avg: 168.18)
  101. 52661 Q table zeroes, 76.19131582754629 percent filled
  102. Current Episode: 3723
  103. Reward distribution: Counter({-4: 17416, -12: 14471, -3: 8780, -20: 4135, -11: 2390, -30: 1535, -2: 736, 99: 246, -10: 168, 90: 123})
  104. Last 10 episode lengths (avg: 132.14)
  105. 52114 Q table zeroes, 76.43862123842592 percent filled
  106. ==============================
  107. Running evaluation after 4000 episodes
  108. Evaluation results after 200 trials
  109. Average time steps taken: 1149.505
  110. Average number of penalties incurred: 1140.2
  111. Had 47 wins in 200 episodes
  112. ==============================
  113. Current Episode: 4087
  114. Reward distribution: Counter({-4: 17405, -12: 14483, -3: 8816, -20: 4083, -11: 2494, -30: 1474, -2: 739, 99: 240, -10: 143, 90: 123})
  115. Last 10 episode lengths (avg: 126.34)
  116. 51549 Q table zeroes, 76.69406467013889 percent filled
  117. Current Episode: 4486
  118. Reward distribution: Counter({-4: 17837, -12: 14499, -3: 8311, -20: 4208, -11: 2298, -30: 1570, -2: 743, 99: 250, 90: 148, -10: 136})
  119. Last 10 episode lengths (avg: 136.5)
  120. 51214 Q table zeroes, 76.8455222800926 percent filled
  121. Current Episode: 4925
  122. Reward distribution: Counter({-4: 17579, -12: 14151, -3: 8730, -20: 4243, -11: 2405, -30: 1550, -2: 776, 99: 289, 90: 148, -10: 129})
  123. Last 10 episode lengths (avg: 117.68)
  124. 50904 Q table zeroes, 76.98567708333334 percent filled
  125. ==============================
  126. Running evaluation after 5000 episodes
  127. Evaluation results after 200 trials
  128. Average time steps taken: 1208.835
  129. Average number of penalties incurred: 1198.94
  130. Had 39 wins in 200 episodes
  131. ==============================
  132. Current Episode: 5363
  133. Reward distribution: Counter({-4: 17516, -12: 14165, -3: 8770, -20: 4137, -11: 2454, -30: 1649, -2: 733, 99: 294, -10: 142, 90: 140})
  134. Last 10 episode lengths (avg: 111.7)
  135. 50603 Q table zeroes, 77.1217628761574 percent filled
  136. Current Episode: 5869
  137. Reward distribution: Counter({-4: 16100, -12: 13953, -3: 10198, -20: 3972, -11: 2851, -30: 1456, -2: 810, 99: 321, 90: 179, -10: 160})
  138. Last 10 episode lengths (avg: 98.5)
  139. 50301 Q table zeroes, 77.25830078125 percent filled
  140. ==============================
  141. Running evaluation after 6000 episodes
  142. Evaluation results after 200 trials
  143. Average time steps taken: 1015.34
  144. Average number of penalties incurred: 1003.38
  145. Had 65 wins in 200 episodes
  146. ==============================
  147. Current Episode: 6329
  148. Reward distribution: Counter({-4: 17680, -12: 14322, -3: 8632, -20: 4081, -11: 2464, -30: 1548, -2: 691, 99: 302, 90: 154, -10: 126})
  149. Last 10 episode lengths (avg: 109.5)
  150. 50063 Q table zeroes, 77.3659035011574 percent filled
  151. Current Episode: 6840
  152. Reward distribution: Counter({-4: 16399, -12: 14005, -3: 9923, -20: 4117, -11: 2686, -30: 1440, -2: 793, 99: 370, 90: 139, -10: 128})
  153. Last 10 episode lengths (avg: 86.62)
  154. 49838 Q table zeroes, 77.46762876157408 percent filled
  155. ==============================
  156. Running evaluation after 7000 episodes
  157. Evaluation results after 200 trials
  158. Average time steps taken: 866.03
  159. Average number of penalties incurred: 861.42
  160. Had 85 wins in 200 episodes
  161. ==============================
  162. Current Episode: 7369
  163. Reward distribution: Counter({-4: 16415, -12: 13778, -3: 10046, -20: 4097, -11: 2736, -30: 1460, -2: 799, 99: 355, 90: 169, -10: 145})
  164. Last 10 episode lengths (avg: 93.02)
  165. 49651 Q table zeroes, 77.55217375578704 percent filled
  166. Current Episode: 7902
  167. Reward distribution: Counter({-4: 15640, -12: 13665, -3: 10681, -20: 4017, -11: 2987, -30: 1535, -2: 831, 99: 363, 90: 164, -10: 117})
  168. Last 10 episode lengths (avg: 99.74)
  169. 49486 Q table zeroes, 77.6267722800926 percent filled
  170. ==============================
  171. Running evaluation after 8000 episodes
  172. Evaluation results after 200 trials
  173. Average time steps taken: 828.83
  174. Average number of penalties incurred: 820.435
  175. Had 90 wins in 200 episodes
  176. ==============================
  177. Current Episode: 8419
  178. Reward distribution: Counter({-4: 15397, -12: 13596, -3: 10785, -20: 4138, -11: 3098, -30: 1514, -2: 822, 99: 372, 90: 143, -10: 135})
  179. Last 10 episode lengths (avg: 84.74)
  180. 49355 Q table zeroes, 77.68599898726852 percent filled
  181. Current Episode: 9000
  182. Reward distribution: Counter({-4: 15236, -12: 13366, -3: 11053, -20: 4053, -11: 3108, -30: 1600, -2: 871, 99: 400, 90: 173, -10: 140})
  183. Last 10 episode lengths (avg: 91.4)
  184. 49229 Q table zeroes, 77.74296513310185 percent filled
  185. ==============================
  186. Running evaluation after 9000 episodes
  187. Evaluation results after 200 trials
  188. Average time steps taken: 776.58
  189. Average number of penalties incurred: 770.62
  190. Had 97 wins in 200 episodes
  191. ==============================
  192. Current Episode: 9589
  193. Reward distribution: Counter({-4: 15364, -12: 13554, -3: 10918, -20: 4014, -11: 3039, -30: 1477, -2: 905, 99: 426, 90: 158, -10: 145})
  194. Last 10 episode lengths (avg: 83.36)
  195. 49133 Q table zeroes, 77.78636791087963 percent filled
  196. ==============================
  197. Running evaluation after 10000 episodes
  198. Evaluation results after 200 trials
  199. Average time steps taken: 746.94
  200. Average number of penalties incurred: 745.93
  201. Had 101 wins in 200 episodes
  202. ==============================
  203. Current Episode: 10226
  204. Reward distribution: Counter({-4: 14460, -12: 13340, -3: 11714, -20: 3985, -11: 3257, -30: 1480, -2: 989, 99: 437, 90: 195, -10: 143})
  205. Last 10 episode lengths (avg: 89.0)
  206. 49027 Q table zeroes, 77.8342918113426 percent filled
  207. Current Episode: 10847
  208. Reward distribution: Counter({-4: 14619, -12: 13159, -3: 11666, -20: 4039, -11: 3223, -30: 1596, -2: 925, 99: 440, 90: 177, -10: 156})
  209. Last 10 episode lengths (avg: 68.66)
  210. 48938 Q table zeroes, 77.87452980324075 percent filled
  211. ==============================
  212. Running evaluation after 11000 episodes
  213. Evaluation results after 200 trials
  214. Average time steps taken: 627.375
  215. Average number of penalties incurred: 626.205
  216. Had 117 wins in 200 episodes
  217. ==============================
  218. Current Episode: 11448
  219. Reward distribution: Counter({-4: 15149, -12: 13411, -3: 11217, -20: 4018, -11: 3082, -30: 1465, -2: 926, 99: 434, 90: 159, -10: 139})
  220. Last 10 episode lengths (avg: 76.48)
  221. 48851 Q table zeroes, 77.91386357060185 percent filled
  222. ==============================
  223. Running evaluation after 12000 episodes
  224. Evaluation results after 200 trials
  225. Average time steps taken: 858.73
  226. Average number of penalties incurred: 857.87
  227. Had 86 wins in 200 episodes
  228. ==============================
  229. Current Episode: 12071
  230. Reward distribution: Counter({-4: 14276, -12: 13141, -3: 11973, -20: 4095, -11: 3355, -30: 1427, -2: 965, 99: 430, 90: 186, -10: 152})
  231. Last 10 episode lengths (avg: 77.92)
  232. 48787 Q table zeroes, 77.94279875578704 percent filled
  233. Current Episode: 12699
  234. Reward distribution: Counter({-4: 13808, -12: 13027, -3: 12340, -20: 4107, -11: 3509, -30: 1452, -2: 973, 99: 452, 90: 173, -10: 159})
  235. Last 10 episode lengths (avg: 63.58)
  236. 48737 Q table zeroes, 77.96540436921296 percent filled
  237. ==============================
  238. Running evaluation after 13000 episodes
  239. Evaluation results after 200 trials
  240. Average time steps taken: 620.115
  241. Average number of penalties incurred: 618.935
  242. Had 118 wins in 200 episodes
  243. ==============================
  244. Current Episode: 13366
  245. Reward distribution: Counter({-4: 13608, -12: 13023, -3: 12474, -20: 4093, -11: 3435, -30: 1569, -2: 991, 99: 510, 90: 150, -10: 147})
  246. Last 10 episode lengths (avg: 66.48)
  247. 48686 Q table zeroes, 77.9884620949074 percent filled
  248. ==============================
  249. Running evaluation after 14000 episodes
  250. Evaluation results after 200 trials
  251. Average time steps taken: 597.69
  252. Average number of penalties incurred: 596.48
  253. Had 121 wins in 200 episodes
  254. ==============================
  255. Current Episode: 14047
  256. Reward distribution: Counter({-4: 13151, -3: 13098, -12: 12763, -20: 4055, -11: 3547, -30: 1575, -2: 971, 99: 505, -10: 169, 90: 166})
  257. Last 10 episode lengths (avg: 71.18)
  258. 48651 Q table zeroes, 78.00428602430556 percent filled
  259. Current Episode: 14736
  260. Reward distribution: Counter({-4: 13417, -12: 13022, -3: 12698, -20: 4052, -11: 3547, -30: 1424, -2: 985, 99: 517, -10: 169, 90: 169})
  261. Last 10 episode lengths (avg: 67.3)
  262. 48619 Q table zeroes, 78.01875361689815 percent filled
  263. ==============================
  264. Running evaluation after 15000 episodes
  265. Evaluation results after 200 trials
  266. Average time steps taken: 567.6
  267. Average number of penalties incurred: 566.35
  268. Had 125 wins in 200 episodes
  269. ==============================
  270. Current Episode: 15419
  271. Reward distribution: Counter({-4: 13203, -3: 12984, -12: 12752, -20: 4044, -11: 3698, -30: 1453, -2: 1041, 99: 510, 90: 164, -10: 151})
  272. Last 10 episode lengths (avg: 61.52)
  273. 48584 Q table zeroes, 78.03457754629629 percent filled
  274. ==============================
  275. Running evaluation after 16000 episodes
  276. Evaluation results after 200 trials
  277. Average time steps taken: 485.95
  278. Average number of penalties incurred: 484.59
  279. Had 136 wins in 200 episodes
  280. ==============================
  281. Current Episode: 16078
  282. Reward distribution: Counter({-4: 13505, -12: 12826, -3: 12633, -20: 4067, -11: 3629, -30: 1528, -2: 1007, 99: 470, 90: 181, -10: 154})
  283. Last 10 episode lengths (avg: 79.18)
  284. 48563 Q table zeroes, 78.04407190393519 percent filled
  285. Current Episode: 16783
  286. Reward distribution: Counter({-3: 13270, -12: 12895, -4: 12888, -20: 4069, -11: 3588, -30: 1415, -2: 990, 99: 542, -10: 185, 90: 158})
  287. Last 10 episode lengths (avg: 66.8)
  288. 48542 Q table zeroes, 78.05356626157408 percent filled
  289. ==============================
  290. Running evaluation after 17000 episodes
  291. Evaluation results after 200 trials
  292. Average time steps taken: 478.355
  293. Average number of penalties incurred: 476.985
  294. Had 137 wins in 200 episodes
  295. ==============================
  296. Current Episode: 17490
  297. Reward distribution: Counter({-3: 13667, -12: 12690, -4: 12354, -20: 4026, -11: 3799, -30: 1503, -2: 1101, 99: 549, -10: 160, 90: 151})
  298. Last 10 episode lengths (avg: 69.48)
  299. 48518 Q table zeroes, 78.06441695601852 percent filled
  300. ==============================
  301. Running evaluation after 18000 episodes
  302. Evaluation results after 200 trials
  303. Average time steps taken: 493.315
  304. Average number of penalties incurred: 491.965
  305. Had 135 wins in 200 episodes
  306. ==============================
  307. Current Episode: 18236
  308. Reward distribution: Counter({-3: 13805, -12: 12606, -4: 12256, -20: 4117, -11: 3816, -30: 1384, -2: 1097, 99: 556, -10: 182, 90: 181})
  309. Last 10 episode lengths (avg: 65.0)
  310. 48496 Q table zeroes, 78.07436342592592 percent filled
  311. Current Episode: 18900
  312. Reward distribution: Counter({-4: 13277, -3: 12964, -12: 12863, -20: 4108, -11: 3525, -30: 1425, -2: 1011, 99: 490, -10: 172, 90: 165})
  313. Last 10 episode lengths (avg: 63.88)
  314. 48480 Q table zeroes, 78.08159722222221 percent filled
  315. ==============================
  316. Running evaluation after 19000 episodes
  317. Evaluation results after 200 trials
  318. Average time steps taken: 448.36
  319. Average number of penalties incurred: 446.95
  320. Had 141 wins in 200 episodes
  321. ==============================
  322. Current Episode: 19643
  323. Reward distribution: Counter({-3: 14191, -12: 12568, -4: 11877, -11: 4000, -20: 3970, -30: 1359, -2: 1126, 99: 568, -10: 175, 90: 166})
  324. Last 10 episode lengths (avg: 70.74)
  325. 48465 Q table zeroes, 78.08837890625 percent filled
  326. ==============================
  327. Running evaluation after 20000 episodes
  328. Evaluation results after 200 trials
  329. Average time steps taken: 470.685
  330. Average number of penalties incurred: 469.305
  331. Had 138 wins in 200 episodes
  332. ==============================
  333. Current Episode: 20371
  334. Reward distribution: Counter({-3: 13207, -4: 13040, -12: 12755, -20: 3922, -11: 3598, -30: 1445, -2: 1138, 99: 563, -10: 170, 90: 162})
  335. Last 10 episode lengths (avg: 76.76)
  336. 48458 Q table zeroes, 78.09154369212963 percent filled
  337. ==============================
  338. Running evaluation after 21000 episodes
  339. Evaluation results after 200 trials
  340. Average time steps taken: 411.335
  341. Average number of penalties incurred: 409.875
  342. Had 146 wins in 200 episodes
  343. ==============================
  344. Current Episode: 21106
  345. Reward distribution: Counter({-3: 13433, -12: 12749, -4: 12635, -20: 3982, -11: 3869, -30: 1309, -2: 1124, 99: 569, -10: 170, 90: 160})
  346. Last 10 episode lengths (avg: 72.92)
  347. 48452 Q table zeroes, 78.09425636574075 percent filled
  348. Current Episode: 21846
  349. Reward distribution: Counter({-3: 13503, -4: 12886, -12: 12667, -20: 3921, -11: 3603, -30: 1419, -2: 1088, 99: 595, -10: 179, 90: 139})
  350. Last 10 episode lengths (avg: 80.02)
  351. 48447 Q table zeroes, 78.09651692708334 percent filled
  352. ==============================
  353. Running evaluation after 22000 episodes
  354. Evaluation results after 200 trials
  355. Average time steps taken: 411.4
  356. Average number of penalties incurred: 409.94
  357. Had 146 wins in 200 episodes
  358. ==============================
  359. Current Episode: 22594
  360. Reward distribution: Counter({-3: 13870, -12: 12646, -4: 12325, -20: 3991, -11: 3787, -30: 1338, -2: 1137, 99: 575, 90: 168, -10: 163})
  361. Last 10 episode lengths (avg: 58.24)
  362. 48439 Q table zeroes, 78.10013382523148 percent filled
  363. ==============================
  364. Running evaluation after 23000 episodes
  365. Evaluation results after 200 trials
  366. Average time steps taken: 441.175
  367. Average number of penalties incurred: 439.755
  368. Had 142 wins in 200 episodes
  369. ==============================
  370. Current Episode: 23339
  371. Reward distribution: Counter({-3: 14132, -12: 12470, -4: 12172, -20: 3919, -11: 3892, -30: 1382, -2: 1119, 99: 563, -10: 183, 90: 168})
  372. Last 10 episode lengths (avg: 63.04)
  373. 48433 Q table zeroes, 78.1028464988426 percent filled
  374. ==============================
  375. Running evaluation after 24000 episodes
  376. Evaluation results after 200 trials
  377. Average time steps taken: 411.38
  378. Average number of penalties incurred: 409.92
  379. Had 146 wins in 200 episodes
  380. ==============================
  381. Current Episode: 24083
  382. Reward distribution: Counter({-3: 13555, -4: 12611, -12: 12466, -20: 4159, -11: 3726, -30: 1460, -2: 1104, 99: 578, -10: 182, 90: 159})
  383. Last 10 episode lengths (avg: 60.44)
  384. 48427 Q table zeroes, 78.10555917245371 percent filled
  385. Current Episode: 24835
  386. Reward distribution: Counter({-3: 14073, -12: 12415, -4: 11909, -20: 4149, -11: 3989, -30: 1409, -2: 1129, 99: 584, -10: 181, 90: 162})
  387. Last 10 episode lengths (avg: 68.48)
  388. 48421 Q table zeroes, 78.10827184606481 percent filled
  389. ==============================
  390. Running evaluation after 25000 episodes
  391. Evaluation results after 200 trials
  392. Average time steps taken: 366.455
  393. Average number of penalties incurred: 364.935
  394. Had 152 wins in 200 episodes
  395. ==============================
  396. Current Episode: 25586
  397. Reward distribution: Counter({-3: 13506, -12: 12785, -4: 12533, -20: 3897, -11: 3767, -30: 1465, -2: 1149, 99: 567, 90: 175, -10: 156})
  398. Last 10 episode lengths (avg: 75.5)
  399. 48418 Q table zeroes, 78.10962818287037 percent filled
  400. ==============================
  401. Running evaluation after 26000 episodes
  402. Evaluation results after 200 trials
  403. Average time steps taken: 373.635
  404. Average number of penalties incurred: 372.125
  405. Had 151 wins in 200 episodes
  406. ==============================
  407. Current Episode: 26313
  408. Reward distribution: Counter({-3: 13929, -12: 12522, -4: 12217, -20: 3950, -11: 3872, -30: 1502, -2: 1103, 99: 569, -10: 185, 90: 151})
  409. Last 10 episode lengths (avg: 65.3)
  410. 48417 Q table zeroes, 78.11008029513889 percent filled
  411. ==============================
  412. Running evaluation after 27000 episodes
  413. Evaluation results after 200 trials
  414. Average time steps taken: 366.51
  415. Average number of penalties incurred: 364.99
  416. Had 152 wins in 200 episodes
  417. ==============================
  418. Current Episode: 27052
  419. Reward distribution: Counter({-3: 13877, -12: 12671, -4: 12139, -20: 4059, -11: 3843, -30: 1413, -2: 1087, 99: 569, -10: 177, 90: 165})
  420. Last 10 episode lengths (avg: 68.3)
  421. 48412 Q table zeroes, 78.11234085648148 percent filled
  422. Current Episode: 27833
  423. Reward distribution: Counter({-3: 14565, -12: 12322, -4: 11546, -11: 4032, -20: 3955, -30: 1410, -2: 1215, 99: 613, -10: 182, 90: 160})
  424. Last 10 episode lengths (avg: 59.08)
  425. 48409 Q table zeroes, 78.11369719328704 percent filled
  426. ==============================
  427. Running evaluation after 28000 episodes
  428. Evaluation results after 200 trials
  429. Average time steps taken: 351.79
  430. Average number of penalties incurred: 350.25
  431. Had 154 wins in 200 episodes
  432. ==============================
  433. Current Episode: 28583
  434. Reward distribution: Counter({-3: 13836, -12: 12512, -4: 12157, -20: 4103, -11: 3955, -30: 1422, -2: 1093, 99: 609, -10: 180, 90: 133})
  435. Last 10 episode lengths (avg: 66.58)
  436. 48408 Q table zeroes, 78.11414930555556 percent filled
  437. ==============================
  438. Running evaluation after 29000 episodes
  439. Evaluation results after 200 trials
  440. Average time steps taken: 322.02
  441. Average number of penalties incurred: 320.44
  442. Had 158 wins in 200 episodes
  443. ==============================
  444. Current Episode: 29332
  445. Reward distribution: Counter({-3: 14780, -12: 12249, -4: 11398, -11: 4263, -20: 3865, -30: 1358, -2: 1174, 99: 599, -10: 170, 90: 144})
  446. Last 10 episode lengths (avg: 63.98)
  447. 48406 Q table zeroes, 78.1150535300926 percent filled
  448. ==============================
  449. Running evaluation after 30000 episodes
  450. Evaluation results after 200 trials
  451. Average time steps taken: 307.015
  452. Average number of penalties incurred: 305.415
  453. Had 160 wins in 200 episodes
  454. ==============================
  455. Current Episode: 30077
  456. Reward distribution: Counter({-3: 14247, -12: 12471, -4: 11624, -20: 4062, -11: 4014, -30: 1469, -2: 1207, 99: 601, -10: 169, 90: 136})
  457. Last 10 episode lengths (avg: 60.44)
  458. 48402 Q table zeroes, 78.11686197916666 percent filled
  459. Current Episode: 30835
  460. Reward distribution: Counter({-3: 14250, -12: 12607, -4: 11887, -11: 4007, -20: 3916, -30: 1310, -2: 1103, 99: 619, -10: 175, 90: 126})
  461. Last 10 episode lengths (avg: 66.98)
  462. 48401 Q table zeroes, 78.11731409143519 percent filled
  463. ==============================
  464. Running evaluation after 31000 episodes
  465. Evaluation results after 200 trials
  466. Average time steps taken: 292.405
  467. Average number of penalties incurred: 290.785
  468. Had 162 wins in 200 episodes
  469. ==============================
  470. Current Episode: 31640
  471. Reward distribution: Counter({-3: 14440, -12: 12263, -4: 11833, -20: 4066, -11: 3920, -30: 1301, -2: 1207, 99: 652, -10: 172, 90: 146})
  472. Last 10 episode lengths (avg: 54.0)
  473. 48399 Q table zeroes, 78.11821831597221 percent filled
  474. ==============================
  475. Running evaluation after 32000 episodes
  476. Evaluation results after 200 trials
  477. Average time steps taken: 329.47
  478. Average number of penalties incurred: 327.9
  479. Had 157 wins in 200 episodes
  480. ==============================
  481. Current Episode: 32427
  482. Reward distribution: Counter({-3: 14674, -12: 12297, -4: 11413, -11: 4067, -20: 4042, -30: 1376, -2: 1178, 99: 630, -10: 180, 90: 143})
  483. Last 10 episode lengths (avg: 52.32)
  484. 48399 Q table zeroes, 78.11821831597221 percent filled
  485. ==============================
  486. Running evaluation after 33000 episodes
  487. Evaluation results after 200 trials
  488. Average time steps taken: 351.665
  489. Average number of penalties incurred: 350.125
  490. Had 154 wins in 200 episodes
  491. ==============================
  492. Current Episode: 33234
  493. Reward distribution: Counter({-3: 14318, -12: 12552, -4: 11698, -11: 4059, -20: 3930, -30: 1313, -2: 1132, 99: 644, -10: 195, 90: 159})
  494. Last 10 episode lengths (avg: 67.78)
  495. 48396 Q table zeroes, 78.11957465277779 percent filled
  496. ==============================
  497. Running evaluation after 34000 episodes
  498. Evaluation results after 200 trials
  499. Average time steps taken: 210.105
  500. Average number of penalties incurred: 208.375
  501. Had 173 wins in 200 episodes
  502. ==============================
  503. Current Episode: 34018
  504. Reward distribution: Counter({-3: 14591, -12: 12332, -4: 11463, -11: 4058, -20: 3990, -30: 1395, -2: 1211, 99: 628, -10: 183, 90: 149})
  505. Last 10 episode lengths (avg: 60.54)
  506. 48396 Q table zeroes, 78.11957465277779 percent filled
  507. Current Episode: 34840
  508. Reward distribution: Counter({-3: 14947, -12: 12210, -4: 11123, -11: 4181, -20: 3960, -30: 1366, -2: 1192, 99: 667, -10: 203, 90: 151})
  509. Last 10 episode lengths (avg: 60.08)
  510. 48395 Q table zeroes, 78.12002676504629 percent filled
  511. ==============================
  512. Running evaluation after 35000 episodes
  513. Evaluation results after 200 trials
  514. Average time steps taken: 240.13
  515. Average number of penalties incurred: 238.44
  516. Had 169 wins in 200 episodes
  517. ==============================
  518. Current Episode: 35627
  519. Reward distribution: Counter({-3: 14725, -12: 12209, -4: 11341, -11: 4288, -20: 3957, -30: 1349, -2: 1160, 99: 640, -10: 192, 90: 139})
  520. Last 10 episode lengths (avg: 58.16)
  521. 48395 Q table zeroes, 78.12002676504629 percent filled
  522. ==============================
  523. Running evaluation after 36000 episodes
  524. Evaluation results after 200 trials
  525. Average time steps taken: 239.845
  526. Average number of penalties incurred: 238.155
  527. Had 169 wins in 200 episodes
  528. ==============================
  529. Current Episode: 36421
  530. Reward distribution: Counter({-3: 14609, -12: 12507, -4: 11382, -11: 4123, -20: 3907, -30: 1302, -2: 1212, 99: 627, -10: 170, 90: 161})
  531. Last 10 episode lengths (avg: 58.46)
  532. 48393 Q table zeroes, 78.12093098958334 percent filled
  533. ==============================
  534. Running evaluation after 37000 episodes
  535. Evaluation results after 200 trials
  536. Average time steps taken: 239.885
  537. Average number of penalties incurred: 238.195
  538. Had 169 wins in 200 episodes
  539. ==============================
  540. Current Episode: 37229
  541. Reward distribution: Counter({-3: 14754, -12: 12226, -4: 11445, -11: 4012, -20: 3943, -30: 1418, -2: 1205, 99: 661, -10: 192, 90: 144})
  542. Last 10 episode lengths (avg: 58.62)
  543. 48393 Q table zeroes, 78.12093098958334 percent filled
  544. ==============================
  545. Running evaluation after 38000 episodes
  546. Evaluation results after 200 trials
  547. Average time steps taken: 195.505
  548. Average number of penalties incurred: 193.755
  549. Had 175 wins in 200 episodes
  550. ==============================
  551. Current Episode: 38017
  552. Reward distribution: Counter({-3: 14717, -12: 12127, -4: 11432, -11: 4169, -20: 3969, -30: 1441, -2: 1194, 99: 641, -10: 167, 90: 143})
  553. Last 10 episode lengths (avg: 64.68)
  554. 48393 Q table zeroes, 78.12093098958334 percent filled
  555. Current Episode: 38827
  556. Reward distribution: Counter({-3: 15219, -12: 12190, -4: 10883, -11: 4271, -20: 3906, -30: 1311, -2: 1226, 99: 641, -10: 191, 90: 162})
  557. Last 10 episode lengths (avg: 72.06)
  558. 48392 Q table zeroes, 78.12138310185185 percent filled
  559. ==============================
  560. Running evaluation after 39000 episodes
  561. Evaluation results after 200 trials
  562. Average time steps taken: 225.065
  563. Average number of penalties incurred: 223.355
  564. Had 171 wins in 200 episodes
  565. ==============================
  566. Current Episode: 39628
  567. Reward distribution: Counter({-3: 15479, -12: 11996, -4: 10673, -11: 4290, -20: 3996, -30: 1343, -2: 1223, 99: 656, -10: 203, 90: 141})
  568. Last 10 episode lengths (avg: 59.84)
  569. 48392 Q table zeroes, 78.12138310185185 percent filled
  570. ==============================
  571. Running evaluation after 40000 episodes
  572. Evaluation results after 200 trials
  573. Average time steps taken: 224.98
  574. Average number of penalties incurred: 223.27
  575. Had 171 wins in 200 episodes
  576. ==============================
  577. Current Episode: 40422
  578. Reward distribution: Counter({-3: 14517, -12: 12223, -4: 11759, -11: 4040, -20: 3956, -30: 1368, -2: 1181, 99: 636, -10: 174, 90: 146})
  579. Last 10 episode lengths (avg: 64.44)
  580. 48391 Q table zeroes, 78.12183521412037 percent filled
  581. ==============================
  582. Running evaluation after 41000 episodes
  583. Evaluation results after 200 trials
  584. Average time steps taken: 262.16
  585. Average number of penalties incurred: 260.5
  586. Had 166 wins in 200 episodes
  587. ==============================
  588. Current Episode: 41240
  589. Reward distribution: Counter({-3: 15103, -12: 12163, -4: 10940, -11: 4302, -20: 3912, -30: 1371, -2: 1184, 99: 686, -10: 212, 90: 127})
  590. Last 10 episode lengths (avg: 62.48)
  591. 48390 Q table zeroes, 78.12228732638889 percent filled
  592. ==============================
  593. Running evaluation after 42000 episodes
  594. Evaluation results after 200 trials
  595. Average time steps taken: 314.625
  596. Average number of penalties incurred: 313.035
  597. Had 159 wins in 200 episodes
  598. ==============================
  599. Current Episode: 42037
  600. Reward distribution: Counter({-3: 15293, -12: 12126, -4: 10755, -11: 4289, -20: 3990, -30: 1357, -2: 1207, 99: 656, -10: 191, 90: 136})
  601. Last 10 episode lengths (avg: 62.56)
  602. 48390 Q table zeroes, 78.12228732638889 percent filled
  603. Current Episode: 42867
  604. Reward distribution: Counter({-3: 15136, -12: 12113, -4: 11190, -11: 4144, -20: 3864, -30: 1332, -2: 1202, 99: 675, -10: 201, 90: 143})
  605. Last 10 episode lengths (avg: 67.04)
  606. 48390 Q table zeroes, 78.12228732638889 percent filled
  607. ==============================
  608. Running evaluation after 43000 episodes
  609. Evaluation results after 200 trials
  610. Average time steps taken: 314.43
  611. Average number of penalties incurred: 312.84
  612. Had 159 wins in 200 episodes
  613. ==============================
  614. Current Episode: 43688
  615. Reward distribution: Counter({-3: 15117, -12: 12399, -4: 10866, -11: 4140, -20: 3825, -30: 1405, -2: 1243, 99: 658, -10: 192, 90: 155})
  616. Last 10 episode lengths (avg: 73.54)
  617. 48390 Q table zeroes, 78.12228732638889 percent filled
  618. ==============================
  619. Running evaluation after 44000 episodes
  620. Evaluation results after 200 trials
  621. Average time steps taken: 202.66
  622. Average number of penalties incurred: 200.92
  623. Had 174 wins in 200 episodes
  624. ==============================
  625. Current Episode: 44473
  626. Reward distribution: Counter({-3: 14792, -12: 12254, -4: 11251, -11: 4149, -20: 4122, -30: 1308, -2: 1169, 99: 653, -10: 180, 90: 122})
  627. Last 10 episode lengths (avg: 58.72)
  628. 48390 Q table zeroes, 78.12228732638889 percent filled
  629. ==============================
  630. Running evaluation after 45000 episodes
  631. Evaluation results after 200 trials
  632. Average time steps taken: 202.71
  633. Average number of penalties incurred: 200.97
  634. Had 174 wins in 200 episodes
  635. ==============================
  636. Current Episode: 45266
  637. Reward distribution: Counter({-3: 15201, -12: 12174, -4: 10946, -11: 4142, -20: 4000, -30: 1335, -2: 1237, 99: 672, -10: 179, 90: 114})
  638. Last 10 episode lengths (avg: 62.46)
  639. 48390 Q table zeroes, 78.12228732638889 percent filled
  640. ==============================
  641. Running evaluation after 46000 episodes
  642. Evaluation results after 200 trials
  643. Average time steps taken: 225.11
  644. Average number of penalties incurred: 223.4
  645. Had 171 wins in 200 episodes
  646. ==============================
  647. Current Episode: 46079
  648. Reward distribution: Counter({-3: 14756, -12: 12307, -4: 11201, -11: 4154, -20: 3974, -30: 1422, -2: 1195, 99: 670, -10: 189, 90: 132})
  649. Last 10 episode lengths (avg: 67.38)
  650. 48389 Q table zeroes, 78.1227394386574 percent filled
  651. Current Episode: 46900
  652. Reward distribution: Counter({-3: 14980, -12: 12049, -4: 11369, -11: 4163, -20: 3902, -30: 1292, -2: 1253, 99: 662, -10: 179, 90: 151})
  653. Last 10 episode lengths (avg: 54.1)
  654. 48389 Q table zeroes, 78.1227394386574 percent filled
  655. ==============================
  656. Running evaluation after 47000 episodes
  657. Evaluation results after 200 trials
  658. Average time steps taken: 240.11
  659. Average number of penalties incurred: 238.42
  660. Had 169 wins in 200 episodes
  661. ==============================
  662. Current Episode: 47745
  663. Reward distribution: Counter({-3: 15355, -12: 12024, -4: 10685, -11: 4316, -20: 4010, -30: 1331, -2: 1250, 99: 711, -10: 187, 90: 131})
  664. Last 10 episode lengths (avg: 58.74)
  665. 48388 Q table zeroes, 78.12319155092592 percent filled
  666. ==============================
  667. Running evaluation after 48000 episodes
  668. Evaluation results after 200 trials
  669. Average time steps taken: 173.045
  670. Average number of penalties incurred: 171.265
  671. Had 178 wins in 200 episodes
  672. ==============================
  673. Current Episode: 48573
  674. Reward distribution: Counter({-3: 14990, -12: 12336, -4: 10959, -11: 4184, -20: 3961, -30: 1363, -2: 1212, 99: 691, -10: 174, 90: 130})
  675. Last 10 episode lengths (avg: 55.16)
  676. 48388 Q table zeroes, 78.12319155092592 percent filled
  677. ==============================
  678. Running evaluation after 49000 episodes
  679. Evaluation results after 200 trials
  680. Average time steps taken: 210.315
  681. Average number of penalties incurred: 208.585
  682. Had 173 wins in 200 episodes
  683. ==============================
  684. Current Episode: 49404
  685. Reward distribution: Counter({-3: 15343, -12: 12055, -4: 11018, -11: 4143, -20: 3906, -30: 1266, -2: 1260, 99: 699, -10: 185, 90: 125})
  686. Last 10 episode lengths (avg: 63.0)
  687. 48388 Q table zeroes, 78.12319155092592 percent filled
  688. Training finished.
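
Note on the "percent filled" figure: the values in the log are consistent with a Q table of 221184 entries, which is 6144 encoded states (from the first line of the log) times 36 joint actions. The 36 is an inference from the numbers, assuming 6 actions per taxi for 2 taxis; the log does not state it. A minimal sketch of the arithmetic under that assumption, with a hypothetical helper name:

```python
# Assumption (not stated in the log): the Q table holds one entry per
# (state, joint action) pair, with 6 actions per taxi and 2 taxis.
NUM_STATES = 6144          # "Total encoded states are 6144" in the log
NUM_JOINT_ACTIONS = 6 * 6  # assumed joint action count

def percent_filled(zero_entries, total_entries=NUM_STATES * NUM_JOINT_ACTIONS):
    """Return the percentage of Q-table entries that are non-zero."""
    return 100.0 * (total_entries - zero_entries) / total_entries

# Final report in the log: 48388 zero entries, about 78.123 percent filled.
print(percent_filled(48388))
```

The same formula reproduces the first report as well (180671 zeroes, about 18.316 percent filled), which is what supports the assumed table size.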