IhavenonameSDA

TSO miss penalty thoughts

Nov 8th, 2024 (edited)
  1. TSO miss penalties:
  2.  
  3. The first time I saw the metrics was roughly in the state they were used for the playtest. Most things, like base scores relative to each other[1], made sense at a glance, but the thing I couldn't understand was the rationale behind what dictated miss penalties for a game. The 0.8s made sense - the few games where getting to low miss is substantially harder than clearing in the first place - but 0.9 vs. 1.0 seemed completely arbitrary. In my view the EoSD/PCB balance change in response to seeing that matchup has only made this problem worse.
  4. So what is the miss penalty trying to achieve? It's a way of trying to balance how impactful a miss is in a given game - a 5 miss vs. a 7 miss in TD isn't that different, in MoF it's something small, and in DDC or LoLK it's a big gain. In a way, it correlates with game difficulty, but even more than game difficulty I think it should correlate with *how stable a game is*, or places where it wouldn't be unexpected to die. If we look at PCB, there are two main "gates" cited by those who've gone for or gotten LNNN: Hell God Sword and Spam. There are other threats at lower levels, but these two are the main dangers and places where something could be expected to go wrong. Looking to MoF, you have Momiji and PWG as the main ones, but the rest of the game has more places where something could go awry. Good Spell Design, Hydro Camo, Illusory Waterfall, Storm Day, Kanako non 1, even Source of Rains can be funny. And DDC, you have... all of Kagerou, half of your stage 4 boss, most of Seija with bad shots, first 3 Shinmy attacks... there's no stability there.
  5. So what does all that mean in practice? Miss penalties varying by 0.1 per miss results in a rounding error at low miss counts like we've mostly seen in the tournament, but since the goal is to have a larger open tournament with a wider skill range, these metrics also need to encompass mid-miss runs. If things break down once people are getting into continue territory that's fine and is a category choice issue on the players' part (sorry draconic) but for comparing **LNB clears**[2] this variation can add up to an entire miss. That's pretty significant!
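A rough sketch of how that adds up, assuming a run's score is its base score plus misses times the miss penalty (the function name here is made up for illustration). Only the penalty term matters for this comparison, so base scores are left out:

```python
def penalty_gap(misses: int, penalty_a: float, penalty_b: float) -> float:
    """Score difference caused purely by two games using different miss penalties."""
    return round(misses * (penalty_a - penalty_b), 2)


# A 0.1 difference in miss penalty barely registers at low miss counts,
# but grows linearly as the run gets messier.
for misses in (1, 2, 5, 8, 10):
    print(f"{misses:2d} misses -> gap of {penalty_gap(misses, 1.0, 0.9)}")
```

At 1-2 misses the gap is 0.1-0.2, which is the rounding error mentioned above; by 10 misses it is 1.0, a full miss at the 1.0 penalty.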
  6.  
  7. Anyway, my philosophy when coming up with these numbers is as follows:
  8. 1. Ignore game base scores when considering what the miss penalty should be. If a game has a "guaranteed" miss, then adjusting the LNN score by the change in the miss penalty is a good patch, keeping the score for 1M the same while improving the comparison at higher miss counts. Something like this was already done for PCB: miss penalty + 0.2, base score - 0.2 for most shots. In other words, game base scores should be balanced around **very low miss, not LNN** - if this rewards an LNN relative to a 1M in another game more, that's probably just a feature.
  9. 2. Due to the above, ignore sections that are extremely likely misses when considering game stability for the miss penalty. For the most part, the biggest dangers are what define the LNN difficulty, and as a result are what define the base score. That being said, if a game has many major threats (like LoLK) then I'm just ignoring about 3 of them: the largest difference between base scores (WBaWC at 3.6 good Youmu vs. DDC/LoLK 0.0, and 0.8 miss penalty) is a 4.5 miss advantage[3] so this feels reasonably fair to me. Assuming those 3 misses happen, how does the rest of the game compare? For LoLK, ignoring Lunatic September, Doremy non 1, and PDH still leaves a grueling gauntlet well deserving of its miss penalty.
  10. 3. The rest is kind of vibes. Raw data like expected cap rates of patterns across the run would be prohibitively difficult to acquire, it's unclear how all of it would be used, and it would still be somewhat subjective: different players have different skillsets and different levels of experience with a pattern, so the collected data would reflect the biases of the sample.
  11. 4. Avoid referencing the existing metrics as much as possible when coming up with the proposals. I want to start a conversation on the differences, not try to declare that a specific game should have a specific miss penalty. Unfortunately I remember a lot of these offhand...
  12. 5. When possible, keep the miss penalty the same for every shot in a game. For some games/shots (hello, DDC), though, it's almost like playing two completely different games. If it's only a 0.1 difference between shots this can be safely ignored in favor of simplicity even if I call it out, but if it's larger I think there's a real need for a difference there.
  13. 6. 1.0 miss penalty is the base that a game needs to have an argument to move away from, and at the end of this, each distinct miss penalty should feel like its own "tier".
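The arithmetic behind points 1 and 2 can be sketched as below. This assumes score = base score + misses * miss penalty; the function names are hypothetical, and the 2.0 / 0.9 starting values in the rebalance example are placeholders, not real metric values. The 3.6 and 0.8 figures are the ones quoted in point 2.

```python
def run_score(base: float, penalty: float, misses: int) -> float:
    """Penalty score of a run under the assumed base + misses * penalty model."""
    return round(base + penalty * misses, 2)


def rebalance(base: float, penalty: float, delta: float) -> tuple[float, float]:
    """Raise the miss penalty by delta and lower the base score by the same
    amount, which leaves the 1M score unchanged."""
    return round(base - delta, 2), round(penalty + delta, 2)


def miss_advantage(base_gap: float, penalty: float) -> float:
    """How many misses in the harder game a base score gap is worth."""
    return round(base_gap / penalty, 2)


# Placeholder numbers: the 1M score stays put, LNN and high miss counts move.
old = (2.0, 0.9)
new = rebalance(*old, 0.2)
for misses in (0, 1, 5):
    print(misses, run_score(*old, misses), run_score(*new, misses))

# Largest base score gap from point 2: 3.6 vs 0.0 at a 0.8 miss penalty.
print(miss_advantage(3.6 - 0.0, 0.8))
```

With these placeholder values the 1M score is 2.9 both before and after, the LNN score drops by 0.2, and the 5 miss score rises by 0.8, which is the "balanced around very low miss, not LNN" behavior described in point 1.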
  14. Alright, let's get to the games.
  15.  
  16. EoSD: 0.9
  17. Books, Megalith, KD, VI, Meister, SG. There are a lot of high variance threats in EoSD such that the first few misses are honestly kind of expected, and you can't just put all the big threats into the base score. There's more I didn't mention, but EoSD does not rise to the level of 0.8 penalty on the back of its consistent early game and reliable stage portions, outside of books.
  18. PCB: 1.1
  19. Hell God Sword, Spam. The latter of these is basically already in the base score, so this then becomes a mostly stable game. Alice can be a bit funny sometimes, and Prismriver/Youmu/Yuyuko finals can as well, but honestly every miss in PCB past 2 or 3 is avoidable and your fault. 1.1 miss penalty feels right to punish that.
  20. IN: 1.0
  21. IN is another very consistent game like PCB, with a few minor threats in the early game (stuff like Blink). However, unlike PCB there isn't a single defining pattern that goes into the base score; instead there's a whole final boss full of dangerous things regardless of chosen route. Stages 1-5 deserve the 1.1 miss penalty, but Stage 6 brings it back to normal. This is even with Last Spells counting - they really don't have much impact on the run outside of Blink, Stage 6, and bad shot Remote Rabbit. But bad shots have enough other problems, and the difference in difficulty is already in the base score. There's an argument for giving the bad solos a 0.9 miss penalty here, but IN is complicated enough without that. Maybe it's worth it, though.
  22. MoF: 1.0
  23. Good Spell Design, Momiji, PWG. Those are the big ones for a one credit format. Even though MoF is the shortest game in the series, and has many consistent parts, there are still enough dangers packed into that run that it doesn't rise above 1.0. The exception is MarisaB, who I think deserves a 1.5 or 1.6 miss penalty. An LNN with her is still difficult as you have to engage with Momiji and PWG, but uniquely, a miss not only can't cause power trainwrecking (like in other games) but instead actively helps. In that way, while an LNN is not much easier (less than a miss difference compared to ReimuA/B), something like a 2 miss run is significantly easier. A MarisaB 2 miss should not beat a balanced shot 3 miss, and raising the penalty for her specifically would accomplish this. (3.2 + 2*1.5) = 6.2, (2.4 + 3*1.0) = 5.4. For the 2/3 miss comparison, 1.5 is probably more correct, but my gut says 1.6 and drop the LNN score to 3.1.
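The MarisaB comparison can be checked with a few lines, assuming lower total score wins and score = base + misses * penalty (the helper name is made up; the base scores and penalties are the ones quoted above):

```python
def run_score(base: float, penalty: float, misses: int) -> float:
    """Penalty score of a run; lower is better under the assumed model."""
    return round(base + penalty * misses, 2)


balanced_3m = run_score(2.4, 1.0, 3)  # balanced shot, 3 misses

# (label, MarisaB base score, MarisaB miss penalty)
options = [
    ("penalty 1.0 (unchanged)", 3.2, 1.0),
    ("penalty 1.5", 3.2, 1.5),
    ("penalty 1.6, base 3.1", 3.1, 1.6),
]

for label, base, penalty in options:
    marisa_2m = run_score(base, penalty, 2)
    winner = "MarisaB 2M" if marisa_2m < balanced_3m else "balanced 3M"
    print(f"{label}: {marisa_2m} vs {balanced_3m} -> {winner} wins")
```

With the penalty left at 1.0 the MarisaB 2 miss scores 5.2 and beats the balanced 3 miss at 5.4; at 1.5 or 1.6 it scores 6.2 or 6.3 and loses, which is the intended result.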
  24. SA: 1.0
  25. SA feels like the most "average" game in this list. You could maybe make an argument for Marisa being 0.1 lower than Reimu due to things other than the hitbox difference (Satori nons, not having the stage 5 midorin safespot) but I probably wouldn't make that argument.
  26. UFO: 0.8 (no summon) / 0.9 (yes summon)
  27. Greatest Treasure, and honestly UFO is a game where it feels like most things are at least a little threatening, such that GT rises well above the rest. The reason that I'm giving two values here is that this is really kind of borderline, and UFO has an additional factor that I think pushes it to either side of the line: UFO summons. I think there should just be a separate metric for runs with UFO summons, so as not to gatekeep UFO behind TWC level performance, and they help enough with a few key stage sections that it warrants the miss penalty difference. Given that Maribel's site tracks LNN and LNNN together, I don't think you even need to change the base scores to account for summons (maybe +0.1 or 0.2 there as well?) and this just... works. So yeah. 0.8 for no summons, 0.9 for yes summons.
  28. TD: 1.1
  29. Defiance. For defining patterns, Defiance is on the easy side, even if it is fairly variable, especially by TD standards. If PCB gets 1.1, TD definitely gets 1.1. You could make an argument for Youmu 1.2 due to trivializing some stage sections, parts of Seiga, and the high overall DPS, but I wouldn't make it here.
  30. DDC: 0.8 (everyone else) / 1.0 or 0.9 (ReiA/SakA)
  31. One of the metrics I 100% remember, and while I do think that the good shots aren't strong enough to offset the miss penalty that much under normal circumstances... at their full potential they might even be 1.1 candidates, between triple gohei and shift tapping. So since ReimuA and SakuyaA range between 0.9 (normal circumstances through DDC, a hard game they still have to engage with) and 1.1 (every trick in the book and 80% of the game doesn't exist), 1.0 is the fair compromise to make everybody happy.
  32. (update about a week later - upon further reflection, I think the "every trick in the book" score is already reasonably well accounted for in the low miss/base score side, so the miss penalty for ReimuA and SakuyaA should be tuned more towards the average experience. While things like RNA basically don't exist, these shots do still have to cope with most of the game, and so 0.9 is probably more correct when considering non-TWC level runs.)
  33. LoLK: 0.8
  34. This would be 0.7 if lowering the miss penalties that far wouldn't break cross-game comparisons really badly so LoLK.
  35. HSiFS: 0.9 (Sp/Su) / 1.0?? (Au/Wi)
  36. The game HSiFS most feels like is EoSD. EoSD was moved to 0.9 which makes sense, HSiFS should follow. You could make an argument for just having Spring and Summer at 0.9 and having Fall and Winter at 1.0 I think, though the late game is probably tough enough with both to just account for the difference in the base score. I'm not actually sure what I think for Fall/Winter, though, now that I'm actually trying to reason about it. Maybe it should be split? This subsection is probably incoherent but it's also the only one where I'm genuinely not sure of what my opinion should be. Honestly I probably just need to compare hypothetical 4-8 miss runs with similar scores across games with each value and see what makes more sense.
  37. WBaWC: 1.1
  38. The most consistent Touhou game. You could even argue Youmu should be 1.2 due to ignoring half the stages, Mayumi, and Idol Creature. I might have argued that but honestly I don't really want to argue that after waffling about HSiFS, so I'll just mention I had the thought and leave it at that.
  39. UM: 0.8
  40. Stage 4, Stage 4, Sannyo, Stage 4. The game this is most like is bad shot DDC: hard Stage 3 boss (but a bit easier), hard Stage 4 boss (but a bit more consistent) with a few extremely problematic patterns elsewhere. UM also borrows from UFO in that power loss trainwrecking is a very real danger, as UM Sannyo into Stage 4 into Misumaru can have a single mistake snowball into more. This is perhaps mitigated a bit by the post-Sannyo shop (can buy power) or continues being allowed (continue puts you at full power) but UM almost certainly warrants 0.8 miss penalty if UFO does.
  41.  
  42.  
  43. Tiers (ordered within a tier):
  44. 0.8: DDC (bad shot), LoLK, UFO (no summon), UM
  45. 0.9: UFO (yes summon), EoSD, HSiFS (Sp/Su)
  46. 1.0: HSiFS (Au/Wi), DDC (good shot), SA, MoF, IN
  47. 1.1: PCB, TD, WBaWC
  48.  
  49. Now I'm going to check what I've proposed changing, and how strongly I feel about each:
  50. EoSD: 0.9 -> 0.9 (no change)
  51. PCB: 1.1 -> 1.1 (no change)
  52. IN: 0.9 -> 1.0 (+0.1) very very strongly, the reason I started doing this
  53. MoF: 1.0 -> 1.0 (no change)
  54. MoF:MB 1.0 -> 1.6 (+0.6 MarisaB) I mentioned this on the Discord already so yeah this is necessary
  55. SA: 0.9 -> 1.0 (+0.1) very strongly for Reimu, a bit less so for Marisa [a]
  56. UFO: 0.8 -> 0.8 (no change)
  57. UFO:V N/A -> 0.9 (new metric, also add 0.2 to base scores) this should exist
  58. TD: 0.9 -> 1.1 (+0.2) (copies PCB) very strongly
  59. DDC:B 0.8 -> 0.8 (no change for bad shot)
  60. DDC:G 1.0 -> 1.0 (no change for good shot) NOT STRONGLY (0.9 probably better for the average player)
  61. LoLK: 0.8 -> 0.8 (no change)
  62. HSiFS: 1.0 -> 0.9 (-0.1 for Sp/Su) I feel okay about this I guess
  63. HSiFS: 1.0 -> 1.0 (no change for Au/Wi)
  64. WBaWC: 1.0 -> 1.1 (+0.1) very very strongly [b]
  65. UM: 0.9 -> 0.8 (-0.1) not very confident [c]
  66.  
  67. [a] could probably be convinced otherwise but I think under the methods I tried to use this is right
  68. [b] I think this was only 1.0 originally because 1.0 was the max miss penalty
  69. [c] I think this should probably match UFO without summons, but it's a bit below that, and UFO without summons was borderline, so this isn't confident
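To see how much each proposed change actually moves a mid-miss run, the list above can be put into a small table. This is only a sketch: the variable and function names are made up, base score tweaks are ignored, and 5 misses is an arbitrary example count.

```python
# (category, current miss penalty, proposed miss penalty).
# None means there is no current metric for that category.
PENALTIES = [
    ("EoSD", 0.9, 0.9), ("PCB", 1.1, 1.1), ("IN", 0.9, 1.0),
    ("MoF", 1.0, 1.0), ("MoF:MB", 1.0, 1.6), ("SA", 0.9, 1.0),
    ("UFO", 0.8, 0.8), ("UFO:V", None, 0.9), ("TD", 0.9, 1.1),
    ("DDC:B", 0.8, 0.8), ("DDC:G", 1.0, 1.0), ("LoLK", 0.8, 0.8),
    ("HSiFS Sp/Su", 1.0, 0.9), ("HSiFS Au/Wi", 1.0, 1.0),
    ("WBaWC", 1.0, 1.1), ("UM", 0.9, 0.8),
]


def score_shift(current: float, proposed: float, misses: int) -> float:
    """Change in a run's score at a given miss count from the penalty change alone."""
    return round((proposed - current) * misses, 2)


# Keep only categories that already have a metric and whose penalty changes.
changed = [(name, cur, new) for name, cur, new in PENALTIES
           if cur is not None and cur != new]

for name, cur, new in changed:
    print(f"{name:12} {cur} -> {new}  shift at 5 misses: {score_shift(cur, new, 5):+.1f}")
```

Most changes move a 5 miss run by half a miss or so; TD moves by a full point and MarisaB by three.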
  70.  
  71. And what base score tweaks to go with it: [4]
  72. PCB: -0.1 or -0.2 to be in line with the other 1.1 reductions
  73. IN: none other than what's in footnote [1] (and if done for Reimu/Sakuya/Alice/Yuyuko, they can get a miss penalty of 0.9)
  74. MoF:MB as mentioned -0.1
  75. SA: probably none but maybe -0.1
  76. TD: -0.2
  77. HSiFS: none
  78. WBaWC: between none and -0.2 or -0.3, depending on shot, to bring the upper end a bit closer. If MarisaW/YoumuW/YoumuO are really that different, give them 1.2 miss penalty instead, but that seems overkill.
  79. UM: -0.1 for everyone but Marisa I think?
  80.  
  81.  
  82.  
  83.  
  84.  
  85. [1] I think a few cross-game base score comparisons are questionable, with the one that's come up being the worse IN solos. I'm not experienced enough with the solos personally to feel super confident in any proposed change, but the comparisons to similar base scores feel off to me. Solo Reimu, Alice and Sakuya all beat any HSiFS LNNN, including AyaSummer and CirnoSummer. These shots all have to nearly time out significant portions of the game, but the biggest dangers in HSiFS are more threatening than the biggest dangers in IN, even ignoring HPSI, which within HSiFS itself is rated at roughly a miss.
  86. A single game to game comparison just shows one of those is miscalibrated, so here's a full list of 0.6 base scores: EoSD MarisaA, IN Reimu (FinalA), UFO MarisaB, LoLK Reimu, UM Marisa. Lower scores are the rest of IN Reimu/Alice/Sakuya, DDC ReimuB and the really bad shots, and the rest of LoLK.
  87. I believe that the relative comparisons within IN are mostly sound (though some of the 2 miss advantage comparisons are weird?), but the lowest end were overrated overall because of how much worse the bad shots are than the teams purely within IN. Going off the cross-game comparisons, Yuyuko should probably be bumped up slightly (+0.1), Reimu should be bumped up closer to Yuyuko (+0.3), Alice bumped up a bit to where Reimu currently is (+0.3, she does feel closer to the level of the 0.6 base scores), and Sakuya up a bit as well (+0.3?). The better solos (Youmu, Yukari, Remilia, Marisa) are well rated due to being close to the teams. Also, Youmu Final A being 3.0 doesn't make sense cross-game: should probably be 2.9.
  88.  
  89. [2] Obviously some games give more lives to clear with than others - UFO you get 5 without UFOs, while DDC you can get 14+ fairly easily, and LoLK 20+. When I say clears here, I mean clears with a miss count that would be a clear in most games - so up to about 8-10 miss.
  90.  
  91. [3] This is probably too much and WBaWC base scores could be lowered a bit, to go with the proposed (spoilers if you're reading footnotes early) miss penalty increase.
  92.  
  93. [4] This is also an update from after the initial posting with some less thought out more vibe-check thoughts on some comparisons, trying to assess everything in one place. There's some overlap with the above and little given rationale here.
  94. PCB MarisaA / ReimuB comparison seems wrong: MarisaA probably overvalued (2.6 -> 2.8ish?)
  95. IN bad solos probably overvalued (as noted in pastebin), Youmu undervalued (3.0 -> 2.8)
  96. LoLK Sanae probably overvalued (0.2 -> 0.3)
  97. TD and WBaWC all undervalued after miss penalty change (PCB probably slightly undervalued too - decrease of 0.3 to 0.4 across the board)
  98. SA/MoF/IN/DDC good/HSiFS good LNN scores also probably undervalued after that tweak (-0.1/0.2)
  99. UM Sakuya should probably be 1.5 (so UM Sakuya LNNN beats DDC crapshot 2M), scale Sanae/Reimu down slightly to go with it
  100.  
  101. An additional footnote on testing methodology: it wouldn't be hard to throw together a program that presents two runs that are close together in penalty score, and asks which of these runs should win. This would also allow checking multiple sets of metrics simultaneously to see which ones are most agreed with, by comparing the given answer against the calculation for multiple sets of assigned base scores and penalties. Perhaps after Speedromizer.
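A minimal, non-interactive sketch of what such a program could look like. Everything here is an assumption for illustration: the function names, the 0.3 closeness threshold, and the two tiny metric sets, which only reuse the MoF numbers from the MarisaB discussion (base 2.4 for a balanced shot, 3.2 or 3.1 for MarisaB). A real version would load full metric sets and collect votes from players.

```python
from itertools import combinations

# Each metric set maps a category to (base score, miss penalty).
METRIC_SETS = {
    "current": {"MoF ReimuA": (2.4, 1.0), "MoF MarisaB": (3.2, 1.0)},
    "proposed": {"MoF ReimuA": (2.4, 1.0), "MoF MarisaB": (3.1, 1.6)},
}


def run_score(metrics, category, misses):
    """Penalty score of a run; lower is better."""
    base, penalty = metrics[category]
    return round(base + penalty * misses, 2)


def close_pairs(metrics, max_misses=8, max_gap=0.3):
    """Pairs of runs from different categories whose scores are within max_gap."""
    runs = [(cat, m) for cat in metrics for m in range(max_misses + 1)]
    return [
        (a, b) for a, b in combinations(runs, 2)
        if a[0] != b[0]
        and round(abs(run_score(metrics, *a) - run_score(metrics, *b)), 2) <= max_gap
    ]


def predicted_winner(metrics, run_a, run_b):
    """The run this metric set says should win, or None on an exact tie."""
    score_a, score_b = run_score(metrics, *run_a), run_score(metrics, *run_b)
    if score_a == score_b:
        return None
    return run_a if score_a < score_b else run_b


def agreement(votes, metric_sets):
    """Fraction of votes each metric set agrees with.
    votes is a list of (run_a, run_b, run_the_voter_picked)."""
    return {
        name: sum(predicted_winner(m, a, b) == pick for a, b, pick in votes) / len(votes)
        for name, m in metric_sets.items()
    }


# One example vote: the voter thinks a ReimuA 3 miss should beat a MarisaB 2 miss.
votes = [(("MoF ReimuA", 3), ("MoF MarisaB", 2), ("MoF ReimuA", 3))]
print(close_pairs(METRIC_SETS["current"])[:3])
print(agreement(votes, METRIC_SETS))
```

Generating the questions from one metric set and scoring the answers against all of them keeps the close calls in front of voters, which is where the sets actually disagree.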
  102.  
  103. I've put all of the above tweaks (and an overall slight dampening effect so that the worst LNN ties the best 4 miss - something I'm not entirely certain of the validity of, but also something quite hard to try to account for) + some small balancing between games into a spreadsheet, linked below. This is still a first effort and I'd like to whip up a testing program to see if the updated numbers more or less accurately reflect feelings for close runs. Another thought I've had is that we could treat a 0.1 difference as a tie for match result purposes, basically saying that it's too close to call, but then still use the actual numbers for tiebreaks.
  104. https://docs.google.com/spreadsheets/d/1ZwRAh0UUNT2qJDNZE3qHr2YiFieI27AmWbBggSgKl0w/edit?gid=1815514190#gid=1815514190
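The "0.1 is a tie" idea could work roughly like this. It is a sketch under assumptions: lower score wins, "a 0.1 difference" is read as a gap of at most 0.1, and the function name and return shape are made up.

```python
def match_result(score_a: float, score_b: float, tie_margin: float = 0.1):
    """Return (match outcome, tiebreak leader) for two penalty scores.

    The outcome is "tie" when the scores are within tie_margin; the tiebreak
    leader still uses the exact numbers and is None only when they are equal.
    """
    diff = round(score_a - score_b, 2)
    leader = None if diff == 0 else ("A" if diff < 0 else "B")
    outcome = "tie" if abs(diff) <= tie_margin else leader
    return outcome, leader


print(match_result(5.4, 5.5))  # too close to call, A ahead on tiebreak
print(match_result(5.2, 5.4))  # clear win for A
print(match_result(5.4, 5.4))  # exact tie, no tiebreak leader
```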