- Total encoded states are 6144
- ==============================
- Running evaluation after 0 episodes
- Evaluation results after 200 trials
- Average time steps taken: 1500.0
- Average number of penalties incurred: 1500.0
- Had 0 wins in 200 episodes
- ==============================
- Current Episode: 78
- Reward distribution: Counter({-4: 16236, -12: 14641, -3: 8871, -20: 4909, -11: 2483, -30: 2128, -2: 436, -10: 227, 99: 43, 90: 26})
- Last 10 episode lengths (avg: 563.54)
- 15546 Q table 1 zeroes, 57.828776041666664 percent filled
- 15524 Q table 2 zeroes, 57.888454861111114 percent filled
- Current Episode: 207
- Reward distribution: Counter({-12: 15905, -4: 14499, -3: 9253, -20: 4763, -11: 2997, -30: 1632, -2: 575, -10: 253, 99: 86, 90: 37})
- Last 10 episode lengths (avg: 380.8)
- 9698 Q table 1 zeroes, 73.69249131944444 percent filled
- 9702 Q table 2 zeroes, 73.681640625 percent filled
- Current Episode: 332
- Reward distribution: Counter({-12: 16935, -4: 14206, -3: 8991, -20: 4169, -11: 3063, -30: 1496, -2: 764, -10: 255, 99: 96, 90: 25})
- Last 10 episode lengths (avg: 382.14)
- 8500 Q table 1 zeroes, 76.94227430555556 percent filled
- 8505 Q table 2 zeroes, 76.9287109375 percent filled
- Current Episode: 463
- Reward distribution: Counter({-12: 17773, -4: 14211, -3: 8342, -20: 4075, -11: 3095, -30: 1319, -2: 807, -10: 256, 99: 85, 90: 37})
- Last 10 episode lengths (avg: 346.18)
- 8243 Q table 1 zeroes, 77.63943142361111 percent filled
- 8244 Q table 2 zeroes, 77.63671875 percent filled
- Current Episode: 609
- Reward distribution: Counter({-12: 17565, -4: 14015, -3: 8414, -20: 4136, -11: 3263, -30: 1414, -2: 783, -10: 269, 99: 111, 90: 30})
- Last 10 episode lengths (avg: 297.88)
- 8125 Q table 1 zeroes, 77.95952690972221 percent filled
- 8126 Q table 2 zeroes, 77.95681423611111 percent filled
- Current Episode: 760
- Reward distribution: Counter({-12: 17868, -4: 13579, -3: 8422, -20: 4102, -11: 3412, -30: 1398, -2: 779, -10: 292, 99: 97, 90: 51})
- Last 10 episode lengths (avg: 294.4)
- 8092 Q table 1 zeroes, 78.04904513888889 percent filled
- 8092 Q table 2 zeroes, 78.04904513888889 percent filled
- Current Episode: 940
- Reward distribution: Counter({-12: 18162, -4: 13845, -3: 8019, -20: 4100, -11: 3299, -30: 1319, -2: 775, -10: 304, 99: 137, 90: 40})
- Last 10 episode lengths (avg: 301.72)
- 8074 Q table 1 zeroes, 78.09787326388889 percent filled
- 8074 Q table 2 zeroes, 78.09787326388889 percent filled
- ==============================
- Running evaluation after 1000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 1365.62
- Average number of penalties incurred: 1313.53
- Had 18 wins in 200 episodes
- ==============================
- Current Episode: 1136
- Reward distribution: Counter({-12: 18547, -4: 13871, -3: 7794, -20: 3898, -11: 3327, -30: 1281, -2: 810, -10: 279, 99: 134, 90: 59})
- Last 10 episode lengths (avg: 263.08)
- 8072 Q table 1 zeroes, 78.10329861111111 percent filled
- 8072 Q table 2 zeroes, 78.10329861111111 percent filled
- Current Episode: 1356
- Reward distribution: Counter({-12: 18757, -4: 13830, -3: 7701, -20: 4038, -11: 3193, -30: 1219, -2: 771, -10: 277, 99: 142, 90: 72})
- Last 10 episode lengths (avg: 240.66)
- 8070 Q table 1 zeroes, 78.10872395833334 percent filled
- 8070 Q table 2 zeroes, 78.10872395833334 percent filled
- Current Episode: 1594
- Reward distribution: Counter({-12: 18412, -4: 13257, -3: 8170, -20: 3962, -11: 3436, -30: 1419, -2: 806, -10: 304, 99: 158, 90: 76})
- Last 10 episode lengths (avg: 212.8)
- 8067 Q table 1 zeroes, 78.11686197916666 percent filled
- 8067 Q table 2 zeroes, 78.11686197916666 percent filled
- Current Episode: 1850
- Reward distribution: Counter({-12: 18926, -4: 13619, -3: 7642, -20: 3968, -11: 3217, -30: 1345, -2: 773, -10: 258, 99: 177, 90: 75})
- Last 10 episode lengths (avg: 152.66)
- 8066 Q table 1 zeroes, 78.11957465277779 percent filled
- 8066 Q table 2 zeroes, 78.11957465277779 percent filled
- ==============================
- Running evaluation after 2000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 1313.485
- Average number of penalties incurred: 1282.55
- Had 25 wins in 200 episodes
- ==============================
- Current Episode: 2118
- Reward distribution: Counter({-12: 18243, -4: 13003, -3: 8347, -20: 4021, -11: 3706, -30: 1353, -2: 785, -10: 277, 99: 157, 90: 108})
- Last 10 episode lengths (avg: 174.96)
- 8065 Q table 1 zeroes, 78.12228732638889 percent filled
- 8065 Q table 2 zeroes, 78.12228732638889 percent filled
- Current Episode: 2436
- Reward distribution: Counter({-12: 18241, -4: 13010, -3: 8532, -20: 3969, -11: 3525, -30: 1292, -2: 780, -10: 334, 99: 194, 90: 123})
- Last 10 episode lengths (avg: 140.18)
- 8065 Q table 1 zeroes, 78.12228732638889 percent filled
- 8065 Q table 2 zeroes, 78.12228732638889 percent filled
- Current Episode: 2765
- Reward distribution: Counter({-12: 18255, -4: 12707, -3: 8732, -20: 4121, -11: 3482, -30: 1325, -2: 755, -10: 300, 99: 188, 90: 135})
- Last 10 episode lengths (avg: 134.7)
- 8065 Q table 1 zeroes, 78.12228732638889 percent filled
- 8065 Q table 2 zeroes, 78.12228732638889 percent filled
- ==============================
- Running evaluation after 3000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 1045.075
- Average number of penalties incurred: 1007.03
- Had 61 wins in 200 episodes
- ==============================
- Current Episode: 3109
- Reward distribution: Counter({-12: 18105, -4: 12934, -3: 8584, -20: 4067, -11: 3579, -30: 1324, -2: 753, -10: 317, 99: 188, 90: 149})
- Last 10 episode lengths (avg: 122.02)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 3464
- Reward distribution: Counter({-12: 17368, -4: 11512, -3: 10147, -11: 4080, -20: 3978, -30: 1427, -2: 836, -10: 301, 99: 190, 90: 161})
- Last 10 episode lengths (avg: 138.88)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 3850
- Reward distribution: Counter({-12: 17172, -4: 11752, -3: 9906, -20: 4163, -11: 4096, -30: 1481, -2: 767, -10: 290, 99: 198, 90: 175})
- Last 10 episode lengths (avg: 130.32)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 4000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 1037.775
- Average number of penalties incurred: 1034.65
- Had 62 wins in 200 episodes
- ==============================
- Current Episode: 4277
- Reward distribution: Counter({-12: 16909, -4: 11606, -3: 10692, -11: 3995, -20: 3873, -30: 1389, -2: 812, -10: 308, 99: 210, 90: 206})
- Last 10 episode lengths (avg: 108.5)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 4721
- Reward distribution: Counter({-12: 17092, -4: 10975, -3: 10850, -11: 4040, -20: 3989, -30: 1493, -2: 788, -10: 334, 99: 227, 90: 212})
- Last 10 episode lengths (avg: 118.46)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 5000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 858.98
- Average number of penalties incurred: 856.645
- Had 86 wins in 200 episodes
- ==============================
- Current Episode: 5184
- Reward distribution: Counter({-12: 16646, -3: 10993, -4: 10954, -11: 4422, -20: 4025, -30: 1393, -2: 785, -10: 324, 90: 248, 99: 210})
- Last 10 episode lengths (avg: 108.6)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 5603
- Reward distribution: Counter({-12: 16664, -3: 11182, -4: 10970, -11: 4254, -20: 3980, -30: 1371, -2: 879, -10: 287, 90: 234, 99: 179})
- Last 10 episode lengths (avg: 102.98)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 6000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 687.67
- Average number of penalties incurred: 684.09
- Had 109 wins in 200 episodes
- ==============================
- Current Episode: 6095
- Reward distribution: Counter({-12: 16605, -3: 11217, -4: 10687, -11: 4378, -20: 3956, -30: 1492, -2: 865, -10: 319, 90: 277, 99: 204})
- Last 10 episode lengths (avg: 96.92)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 6551
- Reward distribution: Counter({-12: 16164, -3: 11626, -4: 10414, -11: 4784, -20: 3999, -30: 1398, -2: 866, -10: 303, 90: 273, 99: 173})
- Last 10 episode lengths (avg: 124.7)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 7000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 606.155
- Average number of penalties incurred: 605.04
- Had 120 wins in 200 episodes
- ==============================
- Current Episode: 7070
- Reward distribution: Counter({-12: 16430, -3: 11566, -4: 10704, -11: 4205, -20: 3954, -30: 1499, -2: 813, -10: 316, 90: 313, 99: 200})
- Last 10 episode lengths (avg: 115.9)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 7582
- Reward distribution: Counter({-12: 16093, -3: 11523, -4: 10834, -11: 4299, -20: 4033, -30: 1520, -2: 840, -10: 347, 90: 326, 99: 185})
- Last 10 episode lengths (avg: 120.1)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 8000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 605.74
- Average number of penalties incurred: 599.405
- Had 120 wins in 200 episodes
- ==============================
- Current Episode: 8096
- Reward distribution: Counter({-12: 15989, -3: 11714, -4: 10738, -11: 4313, -20: 4014, -30: 1484, -2: 859, -10: 380, 90: 329, 99: 180})
- Last 10 episode lengths (avg: 102.98)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 8589
- Reward distribution: Counter({-12: 15374, -3: 12538, -4: 9959, -11: 4705, -20: 4104, -30: 1605, -2: 873, -10: 360, 90: 301, 99: 181})
- Last 10 episode lengths (avg: 86.54)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 9000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 590.95
- Average number of penalties incurred: 589.885
- Had 122 wins in 200 episodes
- ==============================
- Current Episode: 9099
- Reward distribution: Counter({-12: 15838, -3: 11751, -4: 10855, -11: 4403, -20: 4038, -30: 1476, -2: 811, 90: 330, -10: 328, 99: 170})
- Last 10 episode lengths (avg: 86.54)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 9650
- Reward distribution: Counter({-12: 15274, -3: 12905, -4: 9714, -11: 4702, -20: 4031, -30: 1618, -2: 825, -10: 384, 90: 377, 99: 170})
- Last 10 episode lengths (avg: 97.16)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 10000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 561.445
- Average number of penalties incurred: 552.865
- Had 126 wins in 200 episodes
- ==============================
- Current Episode: 10237
- Reward distribution: Counter({-12: 15030, -3: 12976, -4: 10085, -11: 4492, -20: 3905, -30: 1686, -2: 828, -10: 421, 90: 406, 99: 171})
- Last 10 episode lengths (avg: 87.66)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 10761
- Reward distribution: Counter({-12: 15322, -3: 12727, -4: 9937, -11: 4685, -20: 3964, -30: 1634, -2: 840, -10: 375, 90: 365, 99: 151})
- Last 10 episode lengths (avg: 89.9)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 11000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 449.455
- Average number of penalties incurred: 448.205
- Had 141 wins in 200 episodes
- ==============================
- Current Episode: 11326
- Reward distribution: Counter({-12: 15321, -3: 12575, -4: 10546, -11: 4281, -20: 3848, -30: 1600, -2: 848, -10: 418, 90: 400, 99: 163})
- Last 10 episode lengths (avg: 82.1)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 11926
- Reward distribution: Counter({-12: 15002, -3: 12758, -4: 10269, -11: 4509, -20: 3983, -30: 1619, -2: 834, -10: 430, 90: 416, 99: 180})
- Last 10 episode lengths (avg: 82.12)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 12000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 479.44
- Average number of penalties incurred: 469.51
- Had 132 wins in 200 episodes
- ==============================
- Current Episode: 12504
- Reward distribution: Counter({-12: 15175, -3: 12377, -4: 10901, -11: 4140, -20: 3916, -30: 1716, -2: 800, -10: 410, 90: 392, 99: 173})
- Last 10 episode lengths (avg: 92.48)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 13000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 449.515
- Average number of penalties incurred: 444.605
- Had 141 wins in 200 episodes
- ==============================
- Current Episode: 13115
- Reward distribution: Counter({-12: 14838, -3: 12990, -4: 10241, -11: 4541, -20: 3901, -30: 1599, -2: 845, -10: 442, 90: 432, 99: 171})
- Last 10 episode lengths (avg: 77.26)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 13685
- Reward distribution: Counter({-12: 14920, -3: 12772, -4: 10446, -11: 4439, -20: 3845, -30: 1708, -2: 852, -10: 460, 90: 391, 99: 167})
- Last 10 episode lengths (avg: 105.0)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 14000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 337.83
- Average number of penalties incurred: 336.535
- Had 156 wins in 200 episodes
- ==============================
- Current Episode: 14311
- Reward distribution: Counter({-12: 14896, -3: 12656, -4: 10624, -11: 4248, -20: 3911, -30: 1771, -2: 830, -10: 452, 90: 429, 99: 183})
- Last 10 episode lengths (avg: 103.02)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 14921
- Reward distribution: Counter({-12: 14492, -3: 13014, -4: 10568, -11: 4363, -20: 3873, -30: 1818, -2: 829, -10: 444, 90: 441, 99: 158})
- Last 10 episode lengths (avg: 72.2)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 15000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 375.08
- Average number of penalties incurred: 373.815
- Had 151 wins in 200 episodes
- ==============================
- Current Episode: 15518
- Reward distribution: Counter({-12: 14688, -3: 12882, -4: 10438, -11: 4418, -20: 3999, -30: 1766, -2: 764, -10: 459, 90: 427, 99: 159})
- Last 10 episode lengths (avg: 94.86)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 16000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 397.555
- Average number of penalties incurred: 396.32
- Had 148 wins in 200 episodes
- ==============================
- Current Episode: 16157
- Reward distribution: Counter({-12: 14952, -3: 12617, -4: 10620, -11: 4302, -20: 3928, -30: 1703, -2: 824, 90: 467, -10: 423, 99: 164})
- Last 10 episode lengths (avg: 72.12)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 16765
- Reward distribution: Counter({-12: 14656, -3: 12951, -4: 10247, -11: 4564, -20: 4041, -30: 1713, -2: 799, -10: 435, 90: 429, 99: 165})
- Last 10 episode lengths (avg: 92.56)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 17000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 405.055
- Average number of penalties incurred: 403.8
- Had 147 wins in 200 episodes
- ==============================
- Current Episode: 17360
- Reward distribution: Counter({-12: 14657, -3: 12980, -4: 10351, -11: 4446, -20: 3969, -30: 1725, -2: 810, -10: 477, 90: 415, 99: 170})
- Last 10 episode lengths (avg: 80.68)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 17957
- Reward distribution: Counter({-12: 14534, -3: 13057, -4: 10323, -11: 4444, -20: 4045, -30: 1723, -2: 821, -10: 465, 90: 424, 99: 164})
- Last 10 episode lengths (avg: 81.18)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 18000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 419.765
- Average number of penalties incurred: 410.945
- Had 145 wins in 200 episodes
- ==============================
- Current Episode: 18567
- Reward distribution: Counter({-12: 14434, -3: 12967, -4: 10585, -11: 4482, -20: 3955, -30: 1658, -2: 826, -10: 493, 90: 442, 99: 158})
- Last 10 episode lengths (avg: 82.86)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 19000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 382.335
- Average number of penalties incurred: 373.56
- Had 150 wins in 200 episodes
- ==============================
- Current Episode: 19198
- Reward distribution: Counter({-12: 14177, -3: 13460, -4: 10308, -11: 4483, -20: 3825, -30: 1808, -2: 841, -10: 476, 90: 465, 99: 157})
- Last 10 episode lengths (avg: 83.92)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 19813
- Reward distribution: Counter({-12: 14277, -3: 13247, -4: 10433, -11: 4489, -20: 4004, -30: 1658, -2: 817, -10: 476, 90: 443, 99: 156})
- Last 10 episode lengths (avg: 79.62)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 20000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 487.09
- Average number of penalties incurred: 480.86
- Had 136 wins in 200 episodes
- ==============================
- Current Episode: 20453
- Reward distribution: Counter({-12: 14231, -3: 13034, -4: 10585, -11: 4419, -20: 4004, -30: 1812, -2: 739, -10: 542, 90: 466, 99: 168})
- Last 10 episode lengths (avg: 100.88)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 21000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 338.18
- Average number of penalties incurred: 335.115
- Had 156 wins in 200 episodes
- ==============================
- Current Episode: 21073
- Reward distribution: Counter({-12: 14364, -3: 12954, -4: 10742, -11: 4379, -20: 3957, -30: 1715, -2: 753, -10: 533, 90: 435, 99: 168})
- Last 10 episode lengths (avg: 79.54)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 21670
- Reward distribution: Counter({-12: 14366, -3: 13117, -4: 10635, -11: 4511, -20: 3910, -30: 1632, -2: 744, -10: 496, 90: 417, 99: 172})
- Last 10 episode lengths (avg: 81.02)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 22000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 338.115
- Average number of penalties incurred: 333.195
- Had 156 wins in 200 episodes
- ==============================
- Current Episode: 22284
- Reward distribution: Counter({-12: 14302, -3: 12926, -4: 10894, -11: 4343, -20: 3973, -30: 1674, -2: 759, -10: 530, 90: 435, 99: 164})
- Last 10 episode lengths (avg: 78.04)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 22890
- Reward distribution: Counter({-12: 14029, -3: 13648, -4: 10201, -11: 4537, -20: 3935, -30: 1837, -2: 707, -10: 524, 90: 428, 99: 154})
- Last 10 episode lengths (avg: 94.58)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 23000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 271.365
- Average number of penalties incurred: 270.245
- Had 165 wins in 200 episodes
- ==============================
- Current Episode: 23485
- Reward distribution: Counter({-12: 14376, -3: 12795, -4: 10820, -11: 4420, -20: 3964, -30: 1745, -2: 762, -10: 531, 90: 426, 99: 161})
- Last 10 episode lengths (avg: 79.12)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 24000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 360.49
- Average number of penalties incurred: 359.375
- Had 153 wins in 200 episodes
- ==============================
- Current Episode: 24114
- Reward distribution: Counter({-12: 14171, -3: 13113, -4: 10677, -11: 4356, -20: 3904, -30: 1875, -2: 712, -10: 570, 90: 453, 99: 169})
- Last 10 episode lengths (avg: 78.52)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 24752
- Reward distribution: Counter({-12: 14101, -3: 13567, -4: 10279, -11: 4424, -20: 3956, -30: 1767, -2: 689, -10: 586, 90: 458, 99: 173})
- Last 10 episode lengths (avg: 78.84)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 25000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 323.16
- Average number of penalties incurred: 321.95
- Had 158 wins in 200 episodes
- ==============================
- Current Episode: 25392
- Reward distribution: Counter({-12: 14004, -3: 13430, -4: 10356, -11: 4443, -20: 3963, -30: 1876, -2: 718, -10: 578, 90: 476, 99: 156})
- Last 10 episode lengths (avg: 76.04)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 26000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 367.72
- Average number of penalties incurred: 366.665
- Had 152 wins in 200 episodes
- ==============================
- Current Episode: 26005
- Reward distribution: Counter({-12: 14208, -3: 13188, -4: 10442, -11: 4536, -20: 4065, -30: 1699, -2: 697, -10: 561, 90: 431, 99: 173})
- Last 10 episode lengths (avg: 77.06)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 26646
- Reward distribution: Counter({-12: 14136, -3: 13345, -4: 10527, -11: 4463, -20: 3893, -30: 1724, -2: 693, -10: 588, 90: 466, 99: 165})
- Last 10 episode lengths (avg: 73.36)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 27000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 270.955
- Average number of penalties incurred: 269.895
- Had 165 wins in 200 episodes
- ==============================
- Current Episode: 27275
- Reward distribution: Counter({-12: 13800, -3: 13695, -4: 10263, -11: 4588, -20: 3951, -30: 1781, -2: 679, -10: 623, 90: 433, 99: 187})
- Last 10 episode lengths (avg: 75.2)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 27897
- Reward distribution: Counter({-12: 14292, -3: 12911, -4: 11016, -11: 4282, -20: 3938, -30: 1686, -2: 669, -10: 594, 90: 443, 99: 169})
- Last 10 episode lengths (avg: 72.1)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 28000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 315.875
- Average number of penalties incurred: 314.665
- Had 159 wins in 200 episodes
- ==============================
- Current Episode: 28524
- Reward distribution: Counter({-12: 14040, -3: 13440, -4: 10463, -11: 4498, -20: 3944, -30: 1725, -2: 663, -10: 609, 90: 451, 99: 167})
- Last 10 episode lengths (avg: 95.02)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 29000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 300.76
- Average number of penalties incurred: 299.635
- Had 161 wins in 200 episodes
- ==============================
- Current Episode: 29175
- Reward distribution: Counter({-12: 14012, -3: 13396, -4: 10536, -11: 4403, -20: 4026, -30: 1663, -2: 689, -10: 631, 90: 455, 99: 189})
- Last 10 episode lengths (avg: 86.1)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 29794
- Reward distribution: Counter({-12: 13781, -3: 13420, -4: 10579, -11: 4571, -20: 4022, -30: 1711, -2: 675, -10: 631, 90: 446, 99: 164})
- Last 10 episode lengths (avg: 91.46)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 30000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 367.66
- Average number of penalties incurred: 364.195
- Had 137 wins in 200 episodes
- ==============================
- Current Episode: 30416
- Reward distribution: Counter({-12: 14304, -3: 12844, -4: 11222, -11: 4135, -20: 3933, -30: 1733, -2: 640, -10: 580, 90: 430, 99: 179})
- Last 10 episode lengths (avg: 92.78)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 31000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 382.64
- Average number of penalties incurred: 381.68
- Had 150 wins in 200 episodes
- ==============================
- Current Episode: 31030
- Reward distribution: Counter({-12: 13675, -3: 13655, -4: 10510, -11: 4471, -20: 3908, -30: 1901, -2: 663, -10: 613, 90: 433, 99: 171})
- Last 10 episode lengths (avg: 81.28)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 31659
- Reward distribution: Counter({-12: 13985, -3: 13261, -4: 10815, -11: 4377, -20: 4043, -30: 1650, -2: 638, -10: 611, 90: 457, 99: 163})
- Last 10 episode lengths (avg: 78.22)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 32000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 345.51
- Average number of penalties incurred: 344.46
- Had 155 wins in 200 episodes
- ==============================
- Current Episode: 32287
- Reward distribution: Counter({-12: 13867, -3: 13586, -4: 10574, -11: 4493, -20: 3922, -30: 1676, -2: 634, -10: 624, 90: 454, 99: 170})
- Last 10 episode lengths (avg: 70.28)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 32949
- Reward distribution: Counter({-12: 13656, -3: 13570, -4: 10695, -11: 4492, -20: 3934, -30: 1733, -2: 642, -10: 626, 90: 458, 99: 194})
- Last 10 episode lengths (avg: 89.22)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 33000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 367.965
- Average number of penalties incurred: 366.935
- Had 152 wins in 200 episodes
- ==============================
- Current Episode: 33570
- Reward distribution: Counter({-12: 13962, -3: 13261, -4: 10797, -11: 4427, -20: 3933, -30: 1735, -10: 656, -2: 617, 90: 444, 99: 168})
- Last 10 episode lengths (avg: 86.86)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 34000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 457.435
- Average number of penalties incurred: 445.2
- Had 140 wins in 200 episodes
- ==============================
- Current Episode: 34196
- Reward distribution: Counter({-12: 13867, -3: 13416, -4: 10810, -11: 4360, -20: 3922, -30: 1696, -2: 664, -10: 652, 90: 444, 99: 169})
- Last 10 episode lengths (avg: 74.5)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 34792
- Reward distribution: Counter({-12: 13922, -3: 13388, -4: 10780, -11: 4411, -20: 3966, -30: 1764, -10: 619, -2: 565, 90: 419, 99: 166})
- Last 10 episode lengths (avg: 93.52)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 35000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 360.21
- Average number of penalties incurred: 359.15
- Had 153 wins in 200 episodes
- ==============================
- Current Episode: 35414
- Reward distribution: Counter({-12: 13892, -3: 13065, -4: 10874, -11: 4337, -20: 4062, -30: 1887, -10: 659, -2: 612, 90: 438, 99: 174})
- Last 10 episode lengths (avg: 84.38)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 36000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 367.675
- Average number of penalties incurred: 366.735
- Had 152 wins in 200 episodes
- ==============================
- Current Episode: 36051
- Reward distribution: Counter({-12: 13800, -3: 13210, -4: 11182, -11: 4126, -20: 3976, -30: 1780, -10: 675, -2: 619, 90: 446, 99: 186})
- Last 10 episode lengths (avg: 69.36)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 36673
- Reward distribution: Counter({-12: 13613, -3: 13512, -4: 10724, -11: 4459, -20: 4029, -30: 1785, -10: 660, -2: 618, 90: 416, 99: 184})
- Last 10 episode lengths (avg: 75.94)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 37000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 248.625
- Average number of penalties incurred: 247.665
- Had 156 wins in 200 episodes
- ==============================
- Current Episode: 37302
- Reward distribution: Counter({-12: 13773, -3: 13537, -4: 10591, -11: 4392, -20: 3972, -30: 1824, -10: 663, -2: 634, 90: 434, 99: 180})
- Last 10 episode lengths (avg: 81.02)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 37893
- Reward distribution: Counter({-12: 13886, -3: 13001, -4: 11143, -11: 4360, -20: 3951, -30: 1860, -10: 628, -2: 592, 90: 399, 99: 180})
- Last 10 episode lengths (avg: 81.52)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 38000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 613.67
- Average number of penalties incurred: 612.785
- Had 119 wins in 200 episodes
- ==============================
- Current Episode: 38497
- Reward distribution: Counter({-12: 13781, -3: 13546, -4: 10474, -11: 4563, -20: 3962, -30: 1828, -10: 626, -2: 624, 90: 440, 99: 156})
- Last 10 episode lengths (avg: 79.48)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 39000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 211.575
- Average number of penalties incurred: 206.71
- Had 173 wins in 200 episodes
- ==============================
- Current Episode: 39116
- Reward distribution: Counter({-3: 14114, -12: 13655, -4: 9979, -11: 4525, -20: 3999, -30: 1856, -2: 635, -10: 632, 90: 436, 99: 169})
- Last 10 episode lengths (avg: 88.96)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 39732
- Reward distribution: Counter({-12: 13901, -3: 13461, -4: 10706, -11: 4339, -20: 3940, -30: 1752, -10: 675, -2: 619, 90: 446, 99: 161})
- Last 10 episode lengths (avg: 78.6)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 40000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 382.64
- Average number of penalties incurred: 381.65
- Had 150 wins in 200 episodes
- ==============================
- Current Episode: 40342
- Reward distribution: Counter({-3: 13789, -12: 13784, -4: 10429, -11: 4345, -20: 4011, -30: 1829, -10: 624, -2: 589, 90: 419, 99: 181})
- Last 10 episode lengths (avg: 72.16)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 40975
- Reward distribution: Counter({-12: 13773, -3: 13570, -4: 10570, -11: 4430, -20: 3998, -30: 1818, -10: 652, -2: 585, 90: 443, 99: 161})
- Last 10 episode lengths (avg: 76.06)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 41000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 509.11
- Average number of penalties incurred: 508.125
- Had 133 wins in 200 episodes
- ==============================
- Current Episode: 41585
- Reward distribution: Counter({-12: 13920, -3: 13184, -4: 10943, -11: 4272, -20: 4014, -30: 1807, -10: 671, -2: 589, 90: 412, 99: 188})
- Last 10 episode lengths (avg: 94.94)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 42000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 472.25
- Average number of penalties incurred: 471.295
- Had 138 wins in 200 episodes
- ==============================
- Current Episode: 42180
- Reward distribution: Counter({-12: 13858, -3: 13532, -4: 10833, -11: 4407, -20: 3820, -30: 1703, -10: 657, -2: 607, 90: 419, 99: 164})
- Last 10 episode lengths (avg: 93.72)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 42814
- Reward distribution: Counter({-3: 14016, -12: 13572, -4: 10150, -11: 4636, -20: 3967, -30: 1783, -10: 690, -2: 573, 90: 436, 99: 177})
- Last 10 episode lengths (avg: 90.86)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 43000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 427.6
- Average number of penalties incurred: 426.63
- Had 144 wins in 200 episodes
- ==============================
- Current Episode: 43423
- Reward distribution: Counter({-12: 13790, -3: 13406, -4: 10682, -11: 4428, -20: 3869, -30: 1952, -10: 679, -2: 598, 90: 434, 99: 162})
- Last 10 episode lengths (avg: 90.76)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 44000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 293.295
- Average number of penalties incurred: 292.25
- Had 162 wins in 200 episodes
- ==============================
- Current Episode: 44035
- Reward distribution: Counter({-12: 13791, -3: 13481, -4: 10691, -11: 4391, -20: 3910, -30: 1840, -10: 710, -2: 584, 90: 425, 99: 177})
- Last 10 episode lengths (avg: 77.52)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 44681
- Reward distribution: Counter({-12: 13910, -3: 13580, -4: 10628, -11: 4224, -20: 3979, -30: 1776, -10: 683, -2: 585, 90: 450, 99: 185})
- Last 10 episode lengths (avg: 90.08)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 45000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 397.55
- Average number of penalties incurred: 396.535
- Had 148 wins in 200 episodes
- ==============================
- Current Episode: 45302
- Reward distribution: Counter({-12: 13816, -3: 13614, -4: 10698, -11: 4325, -20: 3961, -30: 1740, -10: 680, -2: 556, 90: 434, 99: 176})
- Last 10 episode lengths (avg: 84.46)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 45939
- Reward distribution: Counter({-12: 13962, -3: 13259, -4: 11091, -11: 4197, -20: 3846, -30: 1764, -10: 662, -2: 592, 90: 467, 99: 160})
- Last 10 episode lengths (avg: 73.12)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 46000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 412.755
- Average number of penalties incurred: 411.76
- Had 146 wins in 200 episodes
- ==============================
- Current Episode: 46561
- Reward distribution: Counter({-12: 13906, -3: 13435, -4: 10668, -11: 4224, -20: 3989, -30: 1891, -10: 695, -2: 584, 90: 436, 99: 172})
- Last 10 episode lengths (avg: 74.22)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 47000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 405.4
- Average number of penalties incurred: 404.51
- Had 147 wins in 200 episodes
- ==============================
- Current Episode: 47173
- Reward distribution: Counter({-12: 13821, -3: 13267, -4: 11020, -11: 4261, -20: 3943, -30: 1807, -10: 696, -2: 584, 90: 425, 99: 176})
- Last 10 episode lengths (avg: 77.6)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 47792
- Reward distribution: Counter({-3: 13811, -12: 13742, -4: 10491, -11: 4400, -20: 3913, -30: 1789, -10: 679, -2: 565, 90: 450, 99: 160})
- Last 10 episode lengths (avg: 76.32)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 48000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 323.39
- Average number of penalties incurred: 322.44
- Had 158 wins in 200 episodes
- ==============================
- Current Episode: 48445
- Reward distribution: Counter({-12: 13993, -3: 12754, -4: 11561, -20: 3995, -11: 3972, -30: 1907, -10: 649, -2: 524, 90: 476, 99: 169})
- Last 10 episode lengths (avg: 78.08)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- ==============================
- Running evaluation after 49000 episodes
- Evaluation results after 200 trials
- Average time steps taken: 434.975
- Average number of penalties incurred: 434.115
- Had 143 wins in 200 episodes
- ==============================
- Current Episode: 49063
- Reward distribution: Counter({-3: 13928, -12: 13605, -4: 10286, -11: 4542, -20: 3917, -30: 1874, -10: 665, -2: 575, 90: 433, 99: 175})
- Last 10 episode lengths (avg: 75.1)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Current Episode: 49706
- Reward distribution: Counter({-12: 13870, -3: 13169, -4: 11054, -11: 4251, -20: 3985, -30: 1810, -10: 704, -2: 529, 90: 436, 99: 192})
- Last 10 episode lengths (avg: 76.12)
- 8064 Q table 1 zeroes, 78.125 percent filled
- 8064 Q table 2 zeroes, 78.125 percent filled
- Training finished.
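
The "percent filled" figures in the log can be reproduced from the zero counts. The sketch below is not the original training code. It assumes each Q table has one entry per state-action pair, with 6144 states (the "Total encoded states" line at the top of the log) and 6 actions. The action count is an inference: it is the value that makes 8064 zeroes correspond to 78.125 percent. The function name `percent_filled` is hypothetical.

```python
def percent_filled(zero_entries: int, n_states: int = 6144, n_actions: int = 6) -> float:
    """Percentage of Q-table entries that are non-zero.

    n_states comes from the log header; n_actions = 6 is an assumption
    chosen because it reproduces the percentages printed in the log.
    """
    total_entries = n_states * n_actions  # 36864 entries under these assumptions
    return 100.0 * (total_entries - zero_entries) / total_entries


# The plateau value reported for both Q tables late in training.
print(percent_filled(8064))
```

Under these assumptions, a zero count fixed at 8064 means the same 28800 entries per table have been written at least once and no new state-action pairs are being visited, even though the evaluation win counts still vary from one block to the next.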