Untitled

[08:27:03] <hyc> I thought this randomJS PoW idea was pretty cool when I came up with it, but rrol makes it sound 100x cooler
[08:27:32] <-- sgp_1 ([email protected]) has left #monero-research-lab
[08:28:32] --> sgp_ ([email protected]) has joined #monero-research-lab
[08:30:01] <gmaxwell> hyc: that kind of approach has been totally busted whenever I've seen it suggested before (going back to 2012 at least).  To mine it you can just throw out any nonces that choose code which isn't super trivial to execute.
[08:30:40] <gmaxwell> generally one of the needed properties for a good POW is that every instance is equivlently hard to execute under the most optimized execution model.
[08:30:51] <-- muff1nman ([email protected]) has quit (Quit: No Ping reply in 180 seconds.)
[08:32:14] <hyc> yes but - how complex is the code that determines "super trivial" ?
[08:33:06] <hyc> the code generator is being fine tuned to keep all the generated code within X +/- 10ms execution time
[08:33:14] <endogenic> hyc: what changes could be made to randomjs if it turns out some aspects of it have been implemented in hardware?
[08:33:20] <andytoshi> i'd expect it to do template-matching and to take single-digit numbers of nanoseconds
[08:33:24] --> muff1nman ([email protected]) has joined #monero-research-lab
[08:34:26] <hyc> andytoshi: I doubt that very much. particularly since the generated code includes exec("") invocations with more randomly generated code.
[08:34:46] <hyc> it will require actually executing JS, there is no getting around that.
[08:34:53] <gmaxwell> famous last words
[08:36:08] <hyc> the number of permutations is astronomical. template matching for all of that?
[08:36:23] <hyc> no matter what optimizations, the required circuitry will be a lot bigger than a cryptonight ASIC
[08:36:42] <hyc> larger chip area, higher power, lower efficiency ratio.
[08:39:34] <hyc> how are you going to optimize the instruction pointer? you'll need a CPU-style branch predictor
[08:40:01] <endogenic> not sure that reasoning holds when we have big industry players with an incentive and the facilities to produce a JS chip
[08:40:12] <hyc> endogenic: that's still a win.
[08:40:14] <endogenic> but i'm no expert of this..
[08:40:16] <andytoshi> hyc: this really really sounds like something people could find shortcuts for
[08:40:24] <endogenic> it's not necessarily a win if we don't have as much flexibility to change it later
[08:40:26] <hyc> if a JS chip becomes commoditized, everyone wins
[08:40:26] <andytoshi> and cheaply-identifiable weak subsets
[08:41:18] <hyc> andytoshi: I don't see how the code analysis could be significantly cheaper
[08:42:16] <hyc> look at the state of the art in code analysis tools today, like Coverity or Veracode. totally riddled with false positives and bogus results.
[08:42:37] <hyc> and that's for a language like C with just a small set of keywords and semantics
[08:43:09] --> doubletwist25 ([email protected]) has joined #monero-research-lab
[08:44:51] <andytoshi> right, so critically you don't care about false negatives here (within reason), nor do you care about the depth of analysis you'd need to do useful static analysis for logic bugs
[08:45:04] <andytoshi> also C is mostly UB
[08:45:22] <andytoshi> so it's not immediately obvious to me that it'd be easier to reason about than js
[08:47:04] <hyc> essentially you're talking about writing a decompiler to turn the generated syntax tree into macro-ops
[08:47:36] <hyc> and somehow setting up a lookup table that can direct flow to various template handlers
[08:48:06] <-- valentinbuza_ (~valentinb@unaffiliated/valentinbuza) has quit (Ping timeout: 252 seconds)
[08:48:12] <hyc> it would still be an interpreter for a tokenized language, you're just saying some of the tokens can be macro scale
[08:48:16] <andytoshi> yes. and if you can do that in .1% the time it takes to actually execute the code, but this only works .1% of the time, then you're set
[08:48:28] <andytoshi> and the other 99.9% of the time you just reject the nonce
[08:49:00] <hyc> that's self-defeating, since the winning nonce is 99.9% more likely to be in the code sequences you rejected.
[08:49:36] <gmaxwell> andytoshi: thats exactly how I broke the first proshares pow. several orders of magitude speedup.
[08:50:00] <andytoshi> hyc: there is no "the winning nonce", just some proportion of nonces that lead to a valid pow
[08:50:01] <gmaxwell> hyc: "the winning nonce" sounds like a common fundimental misunderstanding of POW.
[08:50:42] <-- doubletwist25 ([email protected]) has quit (Ping timeout: 272 seconds)
[08:50:43] <hyc> eh? the final result must have bytes XXXX lower than difficulty.
[08:50:45] <gmaxwell> Mining is a 'find the needled in a haystack problem' it's a 'find a shorter than usual straw in a pile of straw' problem.
[08:51:07] <gmaxwell> hyc: there are an effectively infinite number of such solutions.
[08:51:21] <andytoshi> there will be 999 times as many winning nonces that i skip than there are nonces that i don't skip, sure (assuming a uniform distribution)
[08:51:36] <andytoshi> but that's fine because i spend so much less time on skipped nonces than non-skipped ones
[08:52:27] <gmaxwell> I'd suggest reading the equihash paper on the description of what it takes to be a good POW.  (not that equihash is particularly good, but the authors had a good understanding of what was required-- they just had incorrect beliefs about how hardware worked)
[08:52:53] <hyc> for a given block hash there is only a small number of valid nonces
[08:53:24] <hyc> hm, I've read the equihash paper, yes.
[08:53:38] <gmaxwell> hyc: the mining process doesn't just change the small nonce field, if it did, it would be unacceptably likely to be stuck and never be able to find a solution.
[08:53:56] <gmaxwell> If you don't find a solution in your nonce field search you just move on to a larger space.
[08:54:08] <hyc> for a given block hash, it is true that there may be no solution.
[08:54:25] <andytoshi> you mean, "for a given blockheader sans nonce"
[08:54:30] <hyc> yes
[08:54:39] <andytoshi> but you can change other parts of the block header
[08:54:51] <gmaxwell> right, so it's wrong to think in trms of "only a small number of valid nonces".
[08:54:51] <andytoshi> and you have to in e.g. bitcoin where the actual "nonce" field is only 32 bits
[08:55:09] <hyc> you can change the timestamp and you can change the mix of txns used to generate the header
[08:55:42] <moneromooo> I think you can just handwave and not care what changes, and just call it the "nonce". Close enough and doesn't change the theory.
[08:55:53] <gmaxwell> moneromooo: exactly.
[08:55:55] <hyc> but you don't know that you need to change anything unless you've already exhausted the nonce space
[08:56:03] <gmaxwell> hyc: sure, and?
[08:56:45] <gmaxwell> hyc: moneromooo's point is that because you can change other stuff, the nonce space is effectively unlimited. Yes, you compute it in two parts, but from an analysis perspective that doesn't really matter much.
[08:57:24] <moneromooo> Oh, the generated program depends on the actual nonce, right ? So you could keep the same nonce, then iterate through other stuff (like non-nonce tx_extra garbage).
[08:57:28] <gmaxwell> other than updating one part is somewhat more expensive, but since you'll do a huge number of checks on the fast to update part, the more costly part is amortized.
[08:57:36] <moneromooo> So you get to do the program analysis once.
[08:58:08] <hyc> moneromooo: the generated program depends on the entire header including nonce
[08:58:18] <hyc> you don't get to reuse any analysis
[08:58:18] <moneromooo> OK, ignore me ^_^
[08:59:02] <hyc> the header with nonce is hashed, hash result feeds PRNG.
[08:59:06] <hyc> seeds
[09:07:12] <hyc> Equihash also claims memory hardness, which again is a fast moving target. memory tech advances far more quickly than CPU speeds
[09:07:26] <hyc> It's a doomed approach from the get-go
[09:08:58] <gmaxwell> I agree it's a bad idea, but the reason you give is kinda bonkers, "memory tech advances far more quickly than CPU speeds" is just not historically true... memory has advanced much much much much much slower than cpu speeds.
[09:09:21] <hyc> capacity doubles faster than CPU speeds double.
[09:09:35] <gmaxwell> these functions are not 'capacity hard', they're bandwidth hard.
[09:09:39] <hyc> actually CPU speeds have been flat for the past few years, and are regressing now due to Meltdown
[09:10:07] <hyc> they are both - if you shrink the memory space used, the work accelerates by more than 2x
[09:10:27] <hyc> if you shrink the space *required*...
[09:11:02] <gmaxwell> also, (xkcd386) memory per dollar has advanced very slowly as well...
[09:11:35] <gmaxwell> (in recent years at least)
[09:15:17] <gmaxwell> https://people.xiph.org/~greg/temp/memoryprices.png  < I made you a graph.
[09:15:25] <gmaxwell> thats the last decade.
[09:16:58] --> valentinbuza_ (~valentinb@unaffiliated/valentinbuza) has joined #monero-research-lab
[09:17:14] --> sfhi ([email protected]) has joined #monero-research-lab
[09:18:32] <hyc> hmmm. I think that's a result of physically destroyed fabs, not the technology trend
[09:19:30] <gmaxwell> progress on compute/$ have been MUCH better. ... as in looks kind of similar if you give the compute/$ a log scale instead of linear like in that chart.
[09:20:19] <andytoshi> fwiw memory _speed_ has still improved over the time period where $/mbyte has been stagnant
[09:20:58] <gmaxwell> andytoshi: yes, but the bandwidth improvement is still a joke compared to computing improvements.
[09:21:25] <hyc> computing cost http://image.slidesharecdn.com/jsu-feb6-2016-160206014218/95/big-data-hpc-convergence-and-a-bunch-of-other-things-18-638.jpg?cb=1454723352
[09:21:38] <hyc> storage cost http://image.slidesharecdn.com/jsu-feb6-2016-160206014218/95/big-data-hpc-convergence-and-a-bunch-of-other-things-19-638.jpg?cb=1454723352
[09:21:41] <andytoshi> i believe that, i'm more trying to justify to myself all the money i've spent on RAM, than i am trying to contribute to the conversation ;)
[09:21:48] <hyc> lol
[09:21:58] <gmaxwell> the big thing equihash authors misunderstood is that memory bandwidth limits aren't as much a fundimental property of memory as they were a fundimental property of external to chip IO.. put the memory on chip and the bandwidth goes up 100x with no other tech improvements.
[09:22:07] <hyc> storage cost declining 38%/yr, vs compute at 33%/yr.
[09:22:35] <hyc> I guess nobody told those guys about HBM
[09:22:36] <gmaxwell> storage != ram though, yes bulk storage cost overtime has historically improved better than compute.
[09:23:02] <-- muff1nman ([email protected]) has quit (Quit: No Ping reply in 180 seconds.)
[09:23:25] --> el00ruobuob_[m] ([email protected]) has joined #monero-research-lab
[09:23:34] --> thelinuxguy7 ([email protected]) has joined #monero-research-lab
[09:23:37] <gmaxwell> hyc: eldyentyrell posted on the zcash github, basically pointing out package attached ram, tsvs etc... they ignored him.
[09:24:14] <hyc> such a surprising response ... ...
[09:24:22] <gmaxwell> but thats what you get when you have self appointed narrow field experts design something like that, without input from the very people who would be most responsible for optimizing it (hardware engineers).
[09:24:46] <gmaxwell> the comment has since been deleted, or I'd link to it.
[09:25:21] --> muff1nman ([email protected]) has joined #monero-research-lab
[09:25:27] <hyc> btw andytoshi, might be time to splurge on some more DRAM again soon https://press.trendforce.com/press/20180926-3163.html
[09:25:28] --> midipoet (uid316937@gateway/web/irccloud.com/x-phbbtmjglhzxrinb) has joined #monero-research-lab
[09:25:43] <gmaxwell> in any case, equihash guys appeared to understand what was needed, but misunderstood what hardware provided. ... still better than most altcoin pow designs, that failed in both domains.
[09:26:04] --> jwheare10 ([email protected]) has joined #monero-research-lab
[09:26:13] <andytoshi> hyc: my real struggle is that the 14in thinkpads have had motherboards capped at 32Gb for like 6 years
[09:26:33] <hyc> ah I noticed that. was looking at picking up an A485 myself
[09:26:40] <gmaxwell> andytoshi: the xeon based luggable computers will take 64. :P
[09:26:46] <-- thelinuxguy7 ([email protected]) has quit (Remote host closed the connection)
[09:27:09] <knaccc> Lenovo’s ThinkPad X1 Extreme: Hex-core, GTX 1050 Ti, 64GB RAM under 4 pounds
[09:27:09] <hyc> gmaxwell: so you believe memory-hardness is still a viable approach? taking on-chip memory into account?
[09:28:11] <gmaxwell> hyc: I think no one really knows. In general you can see memory hardness as a speific instance of "upfront cost hardness", and I think there are good arguments as to why "upfront cost hardness" is bad, both for POW-consensus and for password security (though for different reasons).
[09:29:09] <gmaxwell> Basically, with upfront cost hardness, you pay a lot to buy in to attack, but the amoritized cost after that is low.  So this favors first movers and parties that can run the stuff the longest before throwing it out.
[09:29:31] <gmaxwell> Proof of Purchase, instead of Proof of Work. :P
[09:29:36] <-- jwheare10 ([email protected]) has quit (Remote host closed the connection)
[09:29:37] <hyc> that would seem to be true even if we were in a CPU-only world
[09:29:46] <hyc> first movers just buy up the most CPUs
[09:31:51] <gmaxwell> I think trying to be cpu only, mostly, is a lost cause... for one, it's a massive subsidy for patent rights for fancy cpus rather than a fundimental cost, meaning some cpu pirate would perhaps have the best advantage.   Also no matter how 'cpu only' your function is, someone could make a mining specific cpu that strips the useless parts (like pcie busses) and uses a fair bit less power and
[09:31:51] <gmaxwell> silicon area.  maybe at best you could make the optmized hardware only 2x power efficient and 10x area efficient,  but in mining (at least) that kind of small gap guarentees the generic hardware will be driven out of business eventually.
[09:32:17] <gmaxwell> (cpu pirates, or cpu companies themselves, of course)
[09:35:29] <knaccc> don't economies of scale factor into things?
[09:35:31] <hyc> I think that's overly pessimistic. on a headless server, dedicated to mining, the ALUs will be busy 100% of the time, and external buses will be mostly idle
[09:35:43] <gmaxwell> I'm much more hopeful of the utility of fancy work functions for password security than for mining. I think for mining making specialized stuff 'merely' 2x/10x more efficient is basically useless in the long run,  but for password security, it's fine
[09:35:53] <hyc> a special purpose mining chip will still need a communications channel to get data in and out.
[09:36:09] <gmaxwell> hyc: I dunno about other things, but idle the pci-e controllers in epyc draw about 4 watts. (just as an example)
[09:36:09] <hyc> this comms channel would have about the same utilization as the buses in the server.
[09:36:55] <gmaxwell> a mining chip needs a tin can and string to get data in and out. it's just not compariable to a gbytes per second bus. :)
[09:37:22] <hyc> sure, but that gbytes/sec bus only blips on for a microsecond
[09:37:26] <knaccc> what's the worst that can happen if we try hyc's idea for a few Monero release cycles just to see what happens? it seems like it has so much potential
[09:37:47] <hyc> the worst thing that can happen is a few months later an ASIC shows up
[09:38:08] <knaccc> yeah and so we fork back to something else, no real damage
[09:38:48] <knaccc> or make it hybrid, so it's every 4th block or something that requires hyc's idea, and then ramp up depending on the response
[09:38:50] <hyc> the damage is if we have no fallback algorithm for the fork after that
[09:39:03] <moneromooo> The worst is that someone finds a way to speed up massively by shortcut, and 99%s the network and keeps mum.
[09:39:35] <hyc> to get 99% of the network they'd have a huge visible impact on network hashrate
[09:39:57] <gmaxwell> knaccc: it's kind of an embarassingly unsound design.
[09:40:23] <gmaxwell> so one disadvantages is that people like me start thinking monero is more like ethereum in terms of yolo design.
[09:40:28] <moneromooo> Yes, we'd certainly see. My point is that an ASIC manufacturer would have an incentive to keep the network "unaffected".
[09:40:59] <knaccc> haha yolo design, nice term
[09:41:15] <gmaxwell> (and FWIW, ethereum had proposed a similar POW,  with a RNG selecting a random circuit, and they abandoned it after it was shown to be broken by the same kind of optimizations I described)
[09:41:42] <hyc> are you talking about something before ProgPoW?
[09:42:03] <gmaxwell> yes, long before they launched.
[09:42:24] <hyc> can you point me to the discussion?
[09:42:25] <gmaxwell> the original eth hash constructed a random arithmetic circuit over 256 bit integers.
[09:43:26] --> Trieste24 ([email protected]) has joined #monero-research-lab
[09:43:34] <-- Trieste24 ([email protected]) has quit (Remote host closed the connection)
[09:43:51] <gmaxwell> they seem to have vanished a lot of the docs, you can see it described on this page: http://nbviewer.jupyter.org/gist/anonymous/cb53d06b837be97ebe32
[09:44:01] <hyc> thanks
[09:44:19] <gmaxwell> (hm that shows 64 bit, but I am pretty darn confident that it was 256 bit at one point, they changed it a lot)
[09:45:08] <gmaxwell> yea, that must be a later version. the first version had instructions like div and mod too.
[09:45:45] <endogenic> "the worst thing that can happen is a few months later an ASIC shows up" < you also lose however many months towards developing the old mining algo's asic resistance while miners are going to be hedging for a future switch back
[09:46:32] <hyc> the current algo is not asic-resistant at all. it is simply being changed, to break existing ASICs
[09:47:41] <hyc> this "Random Circuit" is pretty trivial
[09:48:52] <gmaxwell> It's general for all computation, so I don't think you can say it is any less 'trivial' than anything else. Additional complexity may just be obfscuation.
[09:49:23] <knaccc> Eventually, I predict all of the world's governments will issue every citizen with a unique asymmetric keypair, publish a list of all issued public keys so that the number issued is known, and then we don't need proof of work any more
[09:49:32] <gmaxwell> obfuscation*
[09:49:38] <hyc> difference between theory and practice. in theory algos X and Y may be comparable, while in practice one is far more easily implementable than another
[09:49:41] <gmaxwell> knaccc: hi, mike hearn.
[09:50:01] <knaccc> haha oh is that what he is hoping for too
[09:50:13] <gmaxwell> knaccc: "proof of passport" was a thing he wanted.
[09:50:30] <hyc> is he an book-of-revelations/mark-of-the-beast bible thumper?
[09:50:46] <knaccc> ha interesting, thanks i'll google it
[09:51:26] <hyc> that would kind of defeat the purpose of a permissionless anonymous cryptocurrency
[09:51:40] <gmaxwell> hyc: If you look at X and say it's trivial, but then we look at Y and can that Y is provably reducably to X, shouldn't that at least worry you that Y is also similarly trivial?
[09:52:02] <knaccc> hyc: the keypair would only be used for "mining", not as wallet public keys
[09:52:05] <hyc> yes, that's a valid point
[09:52:51] <hyc> I would say that since the inputs are ultimately the same, the range of permutations may also be the same.
[09:53:05] <endogenic> the monero project would have had the opportunity to learn something more about asic resistance through having the problem of having to manually tweak the algorithm..
[09:53:49] <hyc> I mean, we know that PRNGs can't create entropy. so it may all just turn out to be pointless window dressing
[09:54:28] <hyc> but the flip side of the argument is that the existing algos haven't squeezed out as many permutations as they could have
[09:56:29] <hyc> if we can say these 2^256 permutations can only be reduced to 2^32 unique templates, we've still created a problem that's much harder for an ASIC implementer
[09:58:18] <endogenic> how much harder is it for us?
[09:59:33] <hyc> with a CPU? it would be no particular disadvantage
[10:06:55] <endogenic> what changes can be made to it though?
[10:07:57] <gmaxwell> hyc: I don't agree at all.
[10:08:47] <gmaxwell> the asic implemter would look at those 2^32 unique templates, find a large subclass that a stupidly cheap to implement, implement only those, and then just make their chip reconize and abort on the others.
[10:11:24] <hyc> we could simulate this right now. modify a randomJS impl and make it abort on 90% of its inputs
[10:11:50] <gmaxwell> hyc: and be 1000x faster on the remaining 10%...
[10:12:10] <endogenic> ^
[10:12:23] <hyc> sure, we can replace it with a 10ns sleep or whatnot
[10:14:04] <hyc> hm, no, we still need it to produce a valid output for the cases it actually executes.
[10:14:14] <hyc> we'd have to slow down the regular impl isntead.
[10:16:11] <hyc> but that'd work fine. we can also vary the delay factor, and see where the breakeven point is