Sep 29th, 2018
  1. [08:27:03] <hyc> I thought this randomJS PoW idea was pretty cool when I came up with it, but rrol makes it sound 100x cooler
  2. [08:27:32] <-- sgp_1 ( has left #monero-research-lab
  3. [08:28:32] --> sgp_ ( has joined #monero-research-lab
  4. [08:30:01] <gmaxwell> hyc: that kind of approach has been totally busted whenever I've seen it suggested before (going back to 2012 at least). To mine it you can just throw out any nonces that choose code which isn't super trivial to execute.
  5. [08:30:40] <gmaxwell> generally one of the needed properties for a good POW is that every instance is equivalently hard to execute under the most optimized execution model.
  6. [08:30:51] <-- muff1nman ( has quit (Quit: No Ping reply in 180 seconds.)
  7. [08:32:14] <hyc> yes but - how complex is the code that determines "super trivial" ?
  8. [08:33:06] <hyc> the code generator is being fine tuned to keep all the generated code within X +/- 10ms execution time
  9. [08:33:14] <endogenic> hyc: what changes could be made to randomjs if it turns out some aspects of it have been implemented in hardware?
  10. [08:33:20] <andytoshi> i'd expect it to do template-matching and to take single-digit numbers of nanoseconds
  11. [08:33:24] --> muff1nman ( has joined #monero-research-lab
  12. [08:34:26] <hyc> andytoshi: I doubt that very much. particularly since the generated code includes exec("") invocations with more randomly generated code.
  13. [08:34:46] <hyc> it will require actually executing JS, there is no getting around that.
  14. [08:34:53] <gmaxwell> famous last words
  15. [08:36:08] <hyc> the number of permutations is astronomical. template matching for all of that?
  16. [08:36:23] <hyc> no matter what optimizations, the required circuitry will be a lot bigger than a cryptonight ASIC
  17. [08:36:42] <hyc> larger chip area, higher power, lower efficiency ratio.
  18. [08:39:34] <hyc> how are you going to optimize the instruction pointer? you'll need a CPU-style branch predictor
  19. [08:40:01] <endogenic> not sure that reasoning holds when we have big industry players with an incentive and the facilities to produce a JS chip
  20. [08:40:12] <hyc> endogenic: that's still a win.
  21. [08:40:14] <endogenic> but i'm no expert of this..
  22. [08:40:16] <andytoshi> hyc: this really really sounds like something people could find shortcuts for
  23. [08:40:24] <endogenic> it's not necessarily a win if we don't have as much flexibility to change it later
  24. [08:40:26] <hyc> if a JS chip becomes commoditized, everyone wins
  25. [08:40:26] <andytoshi> and cheaply-identifiable weak subsets
  26. [08:41:18] <hyc> andytoshi: I don't see how the code analysis could be significantly cheaper
  27. [08:42:16] <hyc> look at the state of the art in code analysis tools today, like Coverity or Veracode. totally riddled with false positives and bogus results.
  28. [08:42:37] <hyc> and that's for a language like C with just a small set of keywords and semantics
  29. [08:43:09] --> doubletwist25 (~doubletwi@ has joined #monero-research-lab
  30. [08:44:51] <andytoshi> right, so critically you don't care about false negatives here (within reason), nor do you care about the depth of analysis you'd need to do useful static analysis for logic bugs
  31. [08:45:04] <andytoshi> also C is mostly UB
  32. [08:45:22] <andytoshi> so it's not immediately obvious to me that it'd be easier to reason about than js
  33. [08:47:04] <hyc> essentially you're talking about writing a decompiler to turn the generated syntax tree into macro-ops
  34. [08:47:36] <hyc> and somehow setting up a lookup table that can direct flow to various template handlers
  35. [08:48:06] <-- valentinbuza_ (~valentinb@unaffiliated/valentinbuza) has quit (Ping timeout: 252 seconds)
  36. [08:48:12] <hyc> it would still be an interpreter for a tokenized language, you're just saying some of the tokens can be macro scale
  37. [08:48:16] <andytoshi> yes. and if you can do that in 0.1% of the time it takes to actually execute the code, but this only works 0.1% of the time, then you're set
  38. [08:48:28] <andytoshi> and the other 99.9% of the time you just reject the nonce
  39. [08:49:00] <hyc> that's self-defeating, since the winning nonce is 99.9% more likely to be in the code sequences you rejected.
  40. [08:49:36] <gmaxwell> andytoshi: that's exactly how I broke the first proshares pow. several orders of magnitude speedup.
  41. [08:50:00] <andytoshi> hyc: there is no "the winning nonce", just some proportion of nonces that lead to a valid pow
  42. [08:50:01] <gmaxwell> hyc: "the winning nonce" sounds like a common fundamental misunderstanding of POW.
  43. [08:50:42] <-- doubletwist25 (~doubletwi@ has quit (Ping timeout: 272 seconds)
  44. [08:50:43] <hyc> eh? the final result must have bytes XXXX lower than difficulty.
  45. [08:50:45] <gmaxwell> Mining isn't a 'find the needle in a haystack' problem, it's a 'find a shorter than usual straw in a pile of straw' problem.
  46. [08:51:07] <gmaxwell> hyc: there are an effectively infinite number of such solutions.
  47. [08:51:21] <andytoshi> there will be 999 times as many winning nonces that i skip than there are nonces that i don't skip, sure (assuming a uniform distribution)
  48. [08:51:36] <andytoshi> but that's fine because i spend so much less time on skipped nonces than non-skipped ones
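The selective-mining strategy andytoshi and gmaxwell describe can be put into a toy closed-form model. All numbers here are hypothetical, expressed as fractions of one full program execution; `selective_speedup` is an illustration, not a measured result:

```python
def selective_speedup(p_fast, classify_cost, fast_cost):
    """Speedup of a miner that classifies every candidate program, executes
    only the 'easy' fraction p_fast via a fast path, and rejects the rest.

    Honest miner:    1 PoW evaluation per unit cost.
    Selective miner: p_fast evaluations per (classify_cost + p_fast * fast_cost).
    All costs are fractions of one full program execution.
    """
    return p_fast / (classify_cost + p_fast * fast_cost)

# A miner with no shortcut gains nothing:
print(selective_speedup(1.0, 0.0, 1.0))    # 1.0
# gmaxwell's later numbers: keep 10% of programs, run them 1000x faster,
# with (optimistically) free classification:
print(selective_speedup(0.1, 0.0, 0.001))  # ~1000x
```

Under this model the selective miner wins whenever classification is cheaper than `p_fast` full executions, which is why skipping "99.9% of nonces" is not self-defeating.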
  49. [08:52:27] <gmaxwell> I'd suggest reading the equihash paper on the description of what it takes to be a good POW. (not that equihash is particularly good, but the authors had a good understanding of what was required-- they just had incorrect beliefs about how hardware worked)
  50. [08:52:53] <hyc> for a given block hash there is only a small number of valid nonces
  51. [08:53:24] <hyc> hm, I've read the equihash paper, yes.
  52. [08:53:38] <gmaxwell> hyc: the mining process doesn't just change the small nonce field, if it did, it would be unacceptably likely to be stuck and never be able to find a solution.
  53. [08:53:56] <gmaxwell> If you don't find a solution in your nonce field search you just move on to a larger space.
  54. [08:54:08] <hyc> for a given block hash, it is true that there may be no solution.
  55. [08:54:25] <andytoshi> you mean, "for a given blockheader sans nonce"
  56. [08:54:30] <hyc> yes
  57. [08:54:39] <andytoshi> but you can change other parts of the block header
  58. [08:54:51] <gmaxwell> right, so it's wrong to think in terms of "only a small number of valid nonces".
  59. [08:54:51] <andytoshi> and you have to in e.g. bitcoin where the actual "nonce" field is only 32 bits
  60. [08:55:09] <hyc> you can change the timestamp and you can change the mix of txns used to generate the header
  61. [08:55:42] <moneromooo> I think you can just handwave and not care what changes, and just call it the "nonce". Close enough and doesn't change the theory.
  62. [08:55:53] <gmaxwell> moneromooo: exactly.
  63. [08:55:55] <hyc> but you don't know that you need to change anything unless you've already exhausted the nonce space
  64. [08:56:03] <gmaxwell> hyc: sure, and?
  65. [08:56:45] <gmaxwell> hyc: moneromooo's point is that because you can change other stuff, the nonce space is effectively unlimited. Yes, you compute it in two parts, but from an analysis perspective that doesn't really matter much.
  66. [08:57:24] <moneromooo> Oh, the generated program depends on the actual nonce, right ? So you could keep the same nonce, then iterate through other stuff (like non-nonce tx_extra garbage).
  67. [08:57:28] <gmaxwell> other than updating one part is somewhat more expensive, but since you'll do a huge number of checks on the fast to update part, the more costly part is amortized.
  68. [08:57:36] <moneromooo> So you get to do the program analysis once.
  69. [08:58:08] <hyc> moneromooo: the generated program depends on the entire header including nonce
  70. [08:58:18] <hyc> you don't get to reuse any analysis
  71. [08:58:18] <moneromooo> OK, ignore me ^_^
  72. [08:59:02] <hyc> the header with nonce is hashed, hash result feeds PRNG.
  73. [08:59:06] <hyc> seeds
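The seeding pipeline hyc describes (hash of header plus nonce seeds the PRNG that drives generation) can be sketched as below. The six-opcode generator is a made-up placeholder, not the real RandomJS grammar:

```python
import hashlib
import random

# Placeholder opcode set; the real RandomJS generator emits JavaScript ASTs.
OPS = ["add", "mul", "xor", "call", "branch", "exec"]

def generate_program(header: bytes, nonce: int, length: int = 8):
    """Hash the header with the nonce, seed a PRNG with the digest, and
    let the PRNG drive program generation (toy version)."""
    seed = hashlib.sha256(header + nonce.to_bytes(8, "little")).digest()
    rng = random.Random(seed)
    return [rng.choice(OPS) for _ in range(length)]

header = b"example-block-header"
# Deterministic: the same (header, nonce) always yields the same program,
p0 = generate_program(header, 0)
assert p0 == generate_program(header, 0)
# but any change to the nonce reseeds everything, so per-program analysis
# done by an optimizing miner is thrown away on the next attempt.
```

This is the point hyc makes against moneromooo's amortization idea: because the whole header (nonce included) feeds the seed, there is no fast-to-update inner loop that leaves the generated program fixed.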
  74. [09:07:12] <hyc> Equihash also claims memory hardness, which again is a fast moving target. memory tech advances far more quickly than CPU speeds
  75. [09:07:26] <hyc> It's a doomed approach from the get-go
  76. [09:08:58] <gmaxwell> I agree it's a bad idea, but the reason you give is kinda bonkers, "memory tech advances far more quickly than CPU speeds" is just not historically true... memory has advanced much much much much much slower than cpu speeds.
  77. [09:09:21] <hyc> capacity doubles faster than CPU speeds double.
  78. [09:09:35] <gmaxwell> these functions are not 'capacity hard', they're bandwidth hard.
  79. [09:09:39] <hyc> actually CPU speeds have been flat for the past few years, and are regressing now due to Meltdown
  80. [09:10:07] <hyc> they are both - if you shrink the memory space used, the work accelerates by more than 2x
  81. [09:10:27] <hyc> if you shrink the space *required*...
  82. [09:11:02] <gmaxwell> also, (xkcd386) memory per dollar has advanced very slowly as well...
  83. [09:11:35] <gmaxwell> (in recent years at least)
  84. [09:15:17] <gmaxwell> < I made you a graph.
  85. [09:15:25] <gmaxwell> thats the last decade.
  86. [09:16:58] --> valentinbuza_ (~valentinb@unaffiliated/valentinbuza) has joined #monero-research-lab
  87. [09:17:14] --> sfhi ( has joined #monero-research-lab
  88. [09:18:32] <hyc> hmmm. I think that's a result of physically destroyed fabs, not the technology trend
  89. [09:19:30] <gmaxwell> progress on compute/$ has been MUCH better. ... as in it looks kind of similar if you give compute/$ a log scale instead of the linear scale in that chart.
  90. [09:20:19] <andytoshi> fwiw memory _speed_ has still improved over the time period where $/mbyte has been stagnant
  91. [09:20:58] <gmaxwell> andytoshi: yes, but the bandwidth improvement is still a joke compared to computing improvements.
  92. [09:21:25] <hyc> computing cost
  93. [09:21:38] <hyc> storage cost
  94. [09:21:41] <andytoshi> i believe that, i'm more trying to justify to myself all the money i've spent on RAM, than i am trying to contribute to the conversation ;)
  95. [09:21:48] <hyc> lol
  96. [09:21:58] <gmaxwell> the big thing equihash authors misunderstood is that memory bandwidth limits aren't so much a fundamental property of memory as a fundamental property of external-to-chip IO.. put the memory on chip and the bandwidth goes up 100x with no other tech improvements.
  97. [09:22:07] <hyc> storage cost declining 38%/yr, vs compute at 33%/yr.
  98. [09:22:35] <hyc> I guess nobody told those guys about HBM
  99. [09:22:36] <gmaxwell> storage != ram though, yes bulk storage cost overtime has historically improved better than compute.
  100. [09:23:02] <-- muff1nman ( has quit (Quit: No Ping reply in 180 seconds.)
  101. [09:23:25] --> el00ruobuob_[m] ( has joined #monero-research-lab
  102. [09:23:34] --> thelinuxguy7 ( has joined #monero-research-lab
  103. [09:23:37] <gmaxwell> hyc: eldyentyrell posted on the zcash github, basically pointing out package-attached RAM, TSVs etc... they ignored him.
  104. [09:24:14] <hyc> such a surprising response ... ...
  105. [09:24:22] <gmaxwell> but that's what you get when you have self-appointed narrow-field experts design something like that, without input from the very people who would be most responsible for optimizing it (hardware engineers).
  106. [09:24:46] <gmaxwell> the comment has since been deleted, or I'd link to it.
  107. [09:25:21] --> muff1nman ( has joined #monero-research-lab
  108. [09:25:27] <hyc> btw andytoshi, might be time to splurge on some more DRAM again soon
  109. [09:25:28] --> midipoet (uid316937@gateway/web/ has joined #monero-research-lab
  110. [09:25:43] <gmaxwell> in any case, equihash guys appeared to understand what was needed, but misunderstood what hardware provided. ... still better than most altcoin pow designs, that failed in both domains.
  111. [09:26:04] --> jwheare10 (~jwheare@ has joined #monero-research-lab
  112. [09:26:13] <andytoshi> hyc: my real struggle is that the 14in thinkpads have had motherboards capped at 32GB for like 6 years
  113. [09:26:33] <hyc> ah I noticed that. was looking at picking up an A485 myself
  114. [09:26:40] <gmaxwell> andytoshi: the xeon based luggable computers will take 64. :P
  115. [09:26:46] <-- thelinuxguy7 ( has quit (Remote host closed the connection)
  116. [09:27:09] <knaccc> Lenovo’s ThinkPad X1 Extreme: Hex-core, GTX 1050 Ti, 64GB RAM under 4 pounds
  117. [09:27:09] <hyc> gmaxwell: so you believe memory-hardness is still a viable approach? taking on-chip memory into account?
  118. [09:28:11] <gmaxwell> hyc: I think no one really knows. In general you can see memory hardness as a specific instance of "upfront cost hardness", and I think there are good arguments as to why "upfront cost hardness" is bad, both for POW-consensus and for password security (though for different reasons).
  119. [09:29:09] <gmaxwell> Basically, with upfront cost hardness, you pay a lot to buy in to attack, but the amortized cost after that is low. So this favors first movers and parties that can run the stuff the longest before throwing it out.
  120. [09:29:31] <gmaxwell> Proof of Purchase, instead of Proof of Work. :P
  121. [09:29:36] <-- jwheare10 (~jwheare@ has quit (Remote host closed the connection)
  122. [09:29:37] <hyc> that would seem to be true even if we were in a CPU-only world
  123. [09:29:46] <hyc> first movers just buy up the most CPUs
  124. [09:31:51] <gmaxwell> I think trying to be cpu only, mostly, is a lost cause... for one, it's a massive subsidy for patent rights for fancy cpus rather than a fundamental cost, meaning some cpu pirate would perhaps have the best advantage. Also no matter how 'cpu only' your function is, someone could make a mining-specific cpu that strips the useless parts (like pcie busses) and uses a fair bit less power and
  125. [09:31:51] <gmaxwell> silicon area. maybe at best you could make the optimized hardware only 2x power efficient and 10x area efficient, but in mining (at least) that kind of small gap guarantees the generic hardware will be driven out of business eventually.
  126. [09:32:17] <gmaxwell> (cpu pirates, or cpu companies themselves, of course)
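gmaxwell's 2x-power / 10x-area argument can be illustrated with a hypothetical equilibrium calculation: difficulty tends to rise until the most efficient hardware barely breaks even, at which point less efficient hardware mines at a loss. Every per-hash figure below is invented for the sketch:

```python
def margin(revenue, power_cost, amortized_capital):
    """Per-hash mining margin: revenue minus power and amortized hardware cost."""
    return revenue - power_cost - amortized_capital

# Hypothetical per-hash cost units for the specialized chip:
asic_power, asic_capital = 1.0, 0.5
# Difficulty rises until the ASIC's margin is roughly zero:
revenue = asic_power + asic_capital
# A generic CPU that is "only" 2x worse on power and 10x worse on
# silicon area (capital) per hash:
cpu_power, cpu_capital = 2 * asic_power, 10 * asic_capital

print(margin(revenue, asic_power, asic_capital))  # 0.0  (ASIC breaks even)
print(margin(revenue, cpu_power, cpu_capital))    # -5.5 (CPU mines at a loss)
```

Even this "small" efficiency gap pushes the generic hardware's margin well below zero once difficulty equilibrates, which is the sense in which it gets driven out eventually.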
  127. [09:35:29] <knaccc> don't economies of scale factor into things?
  128. [09:35:31] <hyc> I think that's overly pessimistic. on a headless server, dedicated to mining, the ALUs will be busy 100% of the time, and external buses will be mostly idle
  129. [09:35:43] <gmaxwell> I'm much more hopeful of the utility of fancy work functions for password security than for mining. I think for mining making specialized stuff 'merely' 2x/10x more efficient is basically useless in the long run, but for password security, it's fine
  130. [09:35:53] <hyc> a special purpose mining chip will still need a communications channel to get data in and out.
  131. [09:36:09] <gmaxwell> hyc: I dunno about other things, but idle the pci-e controllers in epyc draw about 4 watts. (just as an example)
  132. [09:36:09] <hyc> this comms channel would have about the same utilization as the buses in the server.
  133. [09:36:55] <gmaxwell> a mining chip needs a tin can and string to get data in and out. it's just not comparable to a gbytes-per-second bus. :)
  134. [09:37:22] <hyc> sure, but that gbytes/sec bus only blips on for a microsecond
  135. [09:37:26] <knaccc> what's the worst that can happen if we try hyc's idea for a few Monero release cycles just to see what happens? it seems like it has so much potential
  136. [09:37:47] <hyc> the worst thing that can happen is a few months later an ASIC shows up
  137. [09:38:08] <knaccc> yeah and so we fork back to something else, no real damage
  138. [09:38:48] <knaccc> or make it hybrid, so it's every 4th block or something that requires hyc's idea, and then ramp up depending on the response
  139. [09:38:50] <hyc> the damage is if we have no fallback algorithm for the fork after that
  140. [09:39:03] <moneromooo> The worst is that someone finds a way to speed up massively by shortcut, and 99%s the network and keeps mum.
  141. [09:39:35] <hyc> to get 99% of the network they'd have a huge visible impact on network hashrate
  142. [09:39:57] <gmaxwell> knaccc: it's kind of an embarrassingly unsound design.
  143. [09:40:23] <gmaxwell> so one disadvantage is that people like me start thinking monero is more like ethereum in terms of yolo design.
  144. [09:40:28] <moneromooo> Yes, we'd certainly see. My point is that an ASIC manufacturer would have an incentive to keep the network "unaffected".
  145. [09:40:59] <knaccc> haha yolo design, nice term
  146. [09:41:15] <gmaxwell> (and FWIW, ethereum had proposed a similar POW, with a RNG selecting a random circuit, and they abandoned it after it was shown to be broken by the same kind of optimizations I described)
  147. [09:41:42] <hyc> are you talking about something before ProgPoW?
  148. [09:42:03] <gmaxwell> yes, long before they launched.
  149. [09:42:24] <hyc> can you point me to the discussion?
  150. [09:42:25] <gmaxwell> the original eth hash constructed a random arithmetic circuit over 256 bit integers.
  151. [09:43:26] --> Trieste24 ( has joined #monero-research-lab
  152. [09:43:34] <-- Trieste24 ( has quit (Remote host closed the connection)
  153. [09:43:51] <gmaxwell> they seem to have vanished a lot of the docs, you can see it described on this page:
  154. [09:44:01] <hyc> thanks
  155. [09:44:19] <gmaxwell> (hm that shows 64 bit, but I am pretty darn confident that it was 256 bit at one point, they changed it a lot)
  156. [09:45:08] <gmaxwell> yea, that must be a later version. the first version had instructions like div and mod too.
  157. [09:45:45] <endogenic> "the worst thing that can happen is a few months later an ASIC shows up" < you also lose however many months towards developing the old mining algo's asic resistance while miners are going to be hedging for a future switch back
  158. [09:46:32] <hyc> the current algo is not asic-resistant at all. it is simply being changed, to break existing ASICs
  159. [09:47:41] <hyc> this "Random Circuit" is pretty trivial
  160. [09:48:52] <gmaxwell> It's general for all computation, so I don't think you can say it is any less 'trivial' than anything else. Additional complexity may just be obfscuation.
  161. [09:49:23] <knaccc> Eventually, I predict all of the world's governments will issue every citizen with a unique asymmetric keypair, publish a list of all issued public keys so that the number issued is known, and then we don't need proof of work any more
  162. [09:49:32] <gmaxwell> obfuscation*
  163. [09:49:38] <hyc> difference between theory and practice. in theory algos X and Y may be comparable, while in practice one is far more easily implementable than another
  164. [09:49:41] <gmaxwell> knaccc: hi, mike hearn.
  165. [09:50:01] <knaccc> haha oh is that what he is hoping for too
  166. [09:50:13] <gmaxwell> knaccc: "proof of passport" was a thing he wanted.
  167. [09:50:30] <hyc> is he a book-of-revelations/mark-of-the-beast bible thumper?
  168. [09:50:46] <knaccc> ha interesting, thanks i'll google it
  169. [09:51:26] <hyc> that would kind of defeat the purpose of a permissionless anonymous cryptocurrency
  170. [09:51:40] <gmaxwell> hyc: If you look at X and say it's trivial, but then we look at Y and can show that Y is provably reducible to X, shouldn't that at least worry you that Y is also similarly trivial?
  171. [09:52:02] <knaccc> hyc: the keypair would only be used for "mining", not as wallet public keys
  172. [09:52:05] <hyc> yes, that's a valid point
  173. [09:52:51] <hyc> I would say that since the inputs are ultimately the same, the range of permutations may also be the same.
  174. [09:53:05] <endogenic> the monero project would have had the opportunity to learn something more about asic resistance through having the problem of having to manually tweak the algorithm..
  175. [09:53:49] <hyc> I mean, we know that PRNGs can't create entropy. so it may all just turn out to be pointless window dressing
  176. [09:54:28] <hyc> but the flip side of the argument is that the existing algos haven't squeezed out as many permutations as they could have
  177. [09:56:29] <hyc> if we can say these 2^256 permutations can only be reduced to 2^32 unique templates, we've still created a problem that's much harder for an ASIC implementer
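hyc's point above that "PRNGs can't create entropy" bounds this from the other side: if a (hypothetically flawed) generator effectively consumes only k bits of its 256-bit seed, at most 2^k distinct programs exist no matter how many seeds are tried. A toy demonstration with a deliberately broken generator:

```python
import hashlib

def toy_generator(seed: bytes, k_bytes: int = 2):
    """Hypothetical generator that (by design flaw) only looks at the first
    k_bytes of its seed, so it can emit at most 256**k_bytes 'templates'."""
    return seed[:k_bytes].hex() * 4  # stand-in for a generated program

# 100,000 distinct 256-bit seeds...
seeds = [hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(100_000)]
templates = {toy_generator(s) for s in seeds}
# ...but never more than 2**16 = 65536 distinct programs.
print(len(templates))
```

This is the scenario hyc's 2^256-to-2^32 hypothetical assumes, and the scenario gmaxwell's reply attacks: a small template space is exactly what an ASIC implementer wants to enumerate.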
  178. [09:58:18] <endogenic> how much harder is it for us?
  179. [09:59:33] <hyc> with a CPU? it would be no particular disadvantage
  180. [10:06:55] <endogenic> what changes can be made to it though?
  181. [10:07:57] <gmaxwell> hyc: I don't agree at all.
  182. [10:08:47] <gmaxwell> the asic implementer would look at those 2^32 unique templates, find a large subclass that are stupidly cheap to implement, implement only those, and then just make their chip recognize and abort on the others.
  183. [10:11:24] <hyc> we could simulate this right now. modify a randomJS impl and make it abort on 90% of its inputs
  184. [10:11:50] <gmaxwell> hyc: and be 1000x faster on the remaining 10%...
  185. [10:12:10] <endogenic> ^
  186. [10:12:23] <hyc> sure, we can replace it with a 10ns sleep or whatnot
  187. [10:14:04] <hyc> hm, no, we still need it to produce a valid output for the cases it actually executes.
  188. [10:14:14] <hyc> we'd have to slow down the regular impl instead.
  189. [10:16:11] <hyc> but that'd work fine. we can also vary the delay factor, and see where the breakeven point is
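The experiment hyc sketches here could be prototyped without real timings by modeling costs directly. The classification cost, accept fraction, and delay factors below are all hypothetical cost units, not measurements:

```python
def rate_regular(delay_factor, exec_cost=1.0):
    """PoW evaluations per cost unit for a regular miner whose execution is
    slowed by delay_factor (standing in for the fast path's advantage)."""
    return 1.0 / (exec_cost * delay_factor)

def rate_selective(accept_frac, classify_cost=0.01, exec_cost=1.0):
    """Evaluations per cost unit for a miner that classifies every candidate
    (paying classify_cost) and executes only the accepted fraction."""
    return accept_frac / (classify_cost + accept_frac * exec_cost)

# Sweep delay factors against a selective miner that keeps 10% of programs;
# with these numbers the breakeven falls near a 1.1x slowdown.
for d in (1.0, 1.05, 1.15, 1.3, 2.0):
    print(d, rate_regular(d) >= rate_selective(0.1))
```

Varying `classify_cost` and `accept_frac` in this harness is the "vary the delay factor, find the breakeven point" experiment, just with modeled costs instead of wall-clock delays.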