Effort and block number distributions of mining pools and miners

## Effort distribution

Mining blocks/share can be approximated by a Poisson point process. The uniformity property of the hash function tells us that for a fixed difficulty `diff` the probability of finding a block in a single attempt `p = 1/diff`. Therefore the probability `P(n)` of finding a block after `n` failed attempt is given by the geometric distribution
```
P(n) = (1 - p)(1 - p)...(1 - p) p = (1-p)^n p.
       \________n times_______/
```
Let `p = 1/diff = (n/diff)/n`. By taking the limit `n -> inf` of a large number of attempts while keeping the ratio `n/diff` finite we find
```
P(n)dn = (1/diff) exp(-n/diff) dn
```
where the infinitesimal `1/n -> dn`. The ratio `n/diff` is nothing but the effort (technically `(n+1)/diff` but we don't care for large `n`), i.e. the number of attempted hashes before the next block/share is found in units of the difficulty. Let `f = n/diff`. With this change of variable we get the effort distribution
```
P(f)df = exp(-f)df,
```
i.e. the standard exponential. [Figure 1](https://imgur.com/a/BCfwP41) compares the empirical distribution of the efforts of the last 100 blocks on p2pool and p2pool mini (top row), and the last 49 shares of two randomly selected miners on p2pool and p2pool mini respectively (bottom row). All 4 empirical distributions pass the Kolmogorov-Smirnov test, i.e. fail to reject the null hypothesis that the sample of efforts could have come out of a standard exponential.

It follows that the CDF
```
P(f <= F) = integral exp(-f) over f from 0 to F
          = 1 - exp(-F).
```
and thus the quantiles
```
Q(x) = -log(1 - x).
```
Alternatively, we could focus on the change of variable `n = ht` involving the product of the hash rate `h` and `t` the inter-block time. With this change of variables we find the distribution of inter-block times
```
P(t)dt = (h/diff) exp(-(h/diff)t) dt,
```
namely the exponential distribution with rate `h/diff`.

Compared to the inter-block times distribution, the effort distribution has two favourable properties. It does not depend on the difficulty of the network and is therefore static over time, and it also does not depend on the local hash rate and is thus universal across pools and miners.

Some effort quantiles of interest are
```
Q(0.05) = 5%
Q(0.25) = 29%
Q(0.50) = 69%
Q(0.63) = 100%
Q(0.75) = 139%
Q(0.86) = 200%
Q(0.95) = 300%
```
These tell us that the odds below:above of an effort of 100% are roughly 2:1, and the odds below:above 200% are roughly 6:1. In other words mining blocks with efforts of 200% or more is not that atypical.

### Distribution of the number of shares/blocks for a given cumulative effort

Given once again the probability `p = 1/diff` of hitting a share, the probability of a single typical run `i` of `n` trials out of which `k` shares were found could look something like this

```
Q_i(k | n) = (1 - p)(1 - p) p (1 - p) p (1 - p)(1 - p) p p (1 - p)...(1 - p)(1 - p) = p^k (1-p)^(n-k)
           \____k successes/shares with prob p, n-k failures with prob 1 - p____/
```

To get the total probability of finding `k` shares within `n` attempts, we must account for all possible permutations `i` of failures and successes, i.e. all possible arrangement of `n-k` failure probability `(1 - p)`'s and `k` success probabilities `p`'s. There are n_choose_k of them, namely as many as there are ways of picking `k` positions among `n` possible positions. The probability of hitting `k` shares within `n` attempts is therefore given by the binomial distribution with success probability `p`:
```
                                   n!
P_exact(k shares | n hashes) = ---------- p^k (1 - p)^(n - k).
                               k!(n - k)!
```
Since the cumulative effort `f = np`, we can write `p^k = f^k/n^k`. For `n` much larger than `k` we have `n!/k!/(n-k)!/n^k -1`, and together with large difficulty, that is `p << 1`, we therefore find `(1-p)^(n-k) -exp(-f)`. Putting those limits together we find to a good approximation
```
                         f^k
P(k shares | effort f) = ---- exp(-f)
                          k!
```
which is nothing but the Poisson distribution. The mode of the Poisson distribution is `floor(f)`, the mean is `f`, and the standard deviation `sqrt(f)`. Using the CDF of the Poisson distribution, we find the probability of the d-th deviate
```
                                         Gamma(floor(f + d*sqrt(f) + 1), f)     Gamma(floor(f - d*sqrt(f)), f)
P(f - d*sqrt(f) <= k <= f + d*sqrt(f)) = ----------------------------------  -  ----------------------------------
                                           Gamma(floor(f + d*sqrt(f) + 1))        Gamma(floor(f - d*sqrt(f)))
```
where `Gamma(x, s)` is the upper incomplete gamma function. To give a practical example, this means that for a cumulative effort `f = 1500%` one will likely have found roughly 15 ± 4 blocks/shares with 63% probability (1st deviate), and 15 ± 8 blocks/shares with 95% probability (2nd deviate). As the cumulative effort `f` becomes large, those deviate probabilities will converge to those of the normal deviates, namely 68.3% at 1 standard deviation and 95.4% at 2 standard deviations. For d-th deviates with `d >= sqrt(2 + 1/f + f)` the lower bound in the above expression falls outside the support of the Poisson distribution, i.e. k < 0, and should instead be interpreted as 0. Computationally we simply replace every instances of `f - d*sqrt(f)` with `max(0, f - d*sqrt(f))` to keep the above expression valid for all `d 0`.

We can of course rewrite the all expressions above in terms of the local hash rate `h` and elapsed mining time `T` by substituting `n = hT` such that `f = hT/diff` and thus
```
                     (hT/diff)^k
P(k shares | h, T) = ----------- exp(-hT/diff).
                          k!


## Reference tables of #shares probabilities as a function of the cumulative effort. The last column represents the 1st deviate probabilities, i.e. the probability of falling within 1 standard deviation of the mean number of shares as denoted by square brackets.

 #shares       0     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
  effort                                                                                                   P([bulk])
     25%    [78%]  19%    2%                                                                                   78%
     50%    [61%   30%]   8%    1%                                                                             91%
     75%    [47%   35%]  13%    3%    1%                                                                       83%
    100%    [37%   37%   18%]   6%    2%                                                                       92%
    150%    [22%   33%   25%]  13%    5%    1%                                                                 81%
    200%    [14%   27%   27%   18%]   9%    4%    1%                                                           86%
    300%      5%  [15%   22%   22%   17%]  10%    5%    2%    1%                                               77%
    400%      2%    7%  [15%   20%   20%   16%   10%]   6%    3%    1%    1%                                   80%
    500%      1%    3%   [8%   14%   18%   18%   15%   10%]   7%    4%    2%    1%                             83%
    600%            1%    4%   [9%   13%   16%   16%   14%   10%]   7%    4%    2%    1%    1%                 79%
    700%            1%    2%    5%   [9%   13%   15%   15%   13%   10%]   7%    5%    3%    1%    1%           75%
    800%                  1%    3%    6%   [9%   12%   14%   14%   12%   10%]   7%    5%    3%    2%    1%     72%
    900%                        1%    3%    6%   [9%   12%   13%   13%   12%   10%    7%]   5%    3%    2%     76%
   1000%                        1%    2%    4%   [6%    9%   11%   13%   13%   11%    9%    7%]   5%    3%     80%
 #shares       0     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15