Omnium Gatherum

§1 Selecting a Particular Item Once

Praemissa propositio . Consider k threads independently selecting one item each from a collection of n items. The probability that at least one thread selects a particular item is the complement of the probability that every thread selects an item other than that:

1 - {(\frac{n - 1}{n})}^{k}

Utile theorema . When $k ≪ n$ , this is approximately $k / n$ .

1 - {(1 - 1 / n)}^{k} \sim k / n

More precisely, for fixed k, the relative difference between these can be made arbitrarily small by increasing n, as the argument below shows.

Sometimes this approximation is more useful in a reärranged form:

{(1 - 1 / n)}^{k} \sim 1 - k / n .
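
Exemplum . As a rough numerical check (a sketch only; the function names and the sample values of k and n are illustrative assumptions, not part of the statement above), exact rational arithmetic shows the relative difference shrinking as n grows for a fixed k:

```python
from fractions import Fraction


def exact(k: int, n: int) -> Fraction:
    """Probability that at least one of k threads selects a particular item of n."""
    return 1 - Fraction(n - 1, n) ** k


def approx(k: int, n: int) -> Fraction:
    """The k << n approximation, k / n."""
    return Fraction(k, n)


def relative_difference(k: int, n: int) -> Fraction:
    """(approx - exact) / approx; non-negative because k / n never underestimates."""
    return (approx(k, n) - exact(k, n)) / approx(k, n)


if __name__ == "__main__":
    k = 8  # illustrative thread count
    for n in (100, 1_000, 10_000):
        print(n, float(exact(k, n)), float(approx(k, n)),
              float(relative_difference(k, n)))
```

Fractions are used here so that the comparison is not disturbed by floating-point rounding when n is large.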

Ratio demonstrandi . When k = 1, this is exact:

1 - (1 - 1 / n) = 1 - 1 + 1 / n = 1 / n

From here on, we will assume that k is greater than one.

Considering k = 2 and k = 3 gives the general idea of an argument. We first reärrange

1 - k / n \sim {(1 - 1 / n)}^{k}

and then expand:

{(1 - 1 / n)}^{2} = 1 - 2 / n + 1 / n^{2}

{(1 - 1 / n)}^{3} = (1 - 2 / n + 1 / n^{2}) (1 - 1 / n) = 1 - 3 / n + 3 / n^{2} - 1 / n^{3} .

Here we can see the origin of the $- k / n$ term,

(1 - (k - 1) / n + · · ·) (1 - 1 / n)

= 1 \times 1 + 1 \times (- 1 / n) + (- (k - 1) / n) \times 1 + · · ·

and indeed, simply applying the binomial theorem will suffice:

{(1 - 1 / n)}^{k} = (\binom{k}{0}) 1^{k} {(- 1 / n)}^{0} + (\binom{k}{1}) 1^{k - 1} {(- 1 / n)}^{1}

+ (\binom{k}{2}) 1^{k - 2} {(- 1 / n)}^{2} + (\binom{k}{3}) 1^{k - 3} {(- 1 / n)}^{3} + · · ·

= 1 - \frac{k}{n} + \frac{k (k - 1)}{2} \frac{1}{n^{2}} - \frac{k (k - 1) (k - 2)}{6} \frac{1}{n^{3}} + · · · .
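
Exemplum . The expansion can be spot-checked term by term. The sketch below (the helper names `binomial_terms` and `power` and the values k = 5, n = 7 are assumptions made for illustration) builds each term of the binomial sum exactly and compares the total with the power computed directly:

```python
from fractions import Fraction
from math import comb


def binomial_terms(k: int, n: int) -> list[Fraction]:
    """Terms C(k, j) * 1**(k - j) * (-1/n)**j for j = 0, 1, ..., k."""
    return [comb(k, j) * Fraction(-1, n) ** j for j in range(k + 1)]


def power(k: int, n: int) -> Fraction:
    """The left-hand side, (1 - 1/n)**k, computed directly."""
    return Fraction(n - 1, n) ** k


if __name__ == "__main__":
    k, n = 5, 7  # illustrative values only
    terms = binomial_terms(k, n)
    print(terms[:4])                 # the four terms written out in the text
    print(sum(terms) == power(k, n))  # the full (finite) sum against the power
```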

The absolute error term is

E_{abs} = \frac{k (k - 1)}{2} \frac{1}{n^{2}} - \frac{k (k - 1) (k - 2)}{6} \frac{1}{n^{3}} + · · ·

and the relative error term $E_{rel} = E_{abs} / (k / n)$ is

E_{rel} = \frac{k - 1}{2 n} - \frac{(k - 1) (k - 2)}{6 n^{2}} + · · · .

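We need to show that, for any positive ε less than one, the relative error is less than ε when n is $k / ε$ . Substituting $1 / n = ε / k$ into $E_{rel}$ , and then dividing through by ε, this requirement reads: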

ε > \frac{k - 1}{2 k} ε - \frac{(k - 1) (k - 2)}{6 k^{2}} ε^{2} + · · ·

1 > \frac{k - 1}{2 k} - \frac{k - 1}{2 k} \frac{k - 2}{3 k} ε + · · ·

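Each term on the right-hand side is larger in magnitude than the succeeding term (because ε < 1, that is, n > k), and since the signs alternate, each term is also larger in magnitude than the whole of the series that succeeds it. The series is finite, so in the lemma that follows we may label the magnitudes of its terms starting from the last one, $x_{0}$ .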

Lemma. Suppose that

· · · > x_{2} > x_{1} > x_{0} > 0 .

Then $x_{1} - x_{0} > 0$ and $x_{2} > x_{1} - x_{0}$ (because $x_{2} > x_{1}$ and $x_{1} > x_{1} - x_{0}$ ) .

x_{2} > x_{1} - x_{0} > 0

x_{3} > x_{2} - (x_{1} - x_{0}) > 0

x_{4} > x_{3} - (x_{2} - (x_{1} - x_{0})) > 0

Inductively, we have that, for every index m,

x_{m} > x_{m - 1} - x_{m - 2} + x_{m - 3} - · · · > 0 .

And so we conclude that

1 > \frac{k - 1}{2 k} - \frac{k - 1}{2 k} \frac{k - 2}{3 k} ε + · · · ∎
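
Exemplum . The bound can also be tested numerically. This is a minimal sketch, assuming the relative error is defined as in the argument above, $((1 - 1 / n)^{k} - (1 - k / n)) / (k / n)$ ; the function names and the sampled values of k and ε are illustrative choices. It does not replace the proof, it only checks instances of it.

```python
from fractions import Fraction


def relative_error(k: int, n: Fraction) -> Fraction:
    """E_rel = ((1 - 1/n)**k - (1 - k/n)) / (k/n), evaluated exactly."""
    return ((1 - 1 / n) ** k - (1 - k / n)) / (k / n)


def bound_holds(k: int, eps: Fraction) -> bool:
    """Check that the relative error is below eps when n = k / eps."""
    n = k / eps  # n need not be an integer for the algebra to go through
    return relative_error(k, n) < eps


if __name__ == "__main__":
    for k in (2, 3, 10, 25):
        for eps in (Fraction(1, 2), Fraction(1, 10), Fraction(1, 100)):
            print(k, eps, float(relative_error(k, k / eps)), bound_holds(k, eps))
```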

Addendum . A better approximation is

{(1 - 1 / n)}^{k} \sim 1 - \frac{k}{n + (k - 1) / 2} .
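
Exemplum . To see how much the addendum helps, the sketch below compares both approximations against the exact power. The function names and the sample point k = 10, n = 100 are assumptions chosen for illustration; other values may behave differently in detail.

```python
from fractions import Fraction


def exact(k: int, n: int) -> Fraction:
    """(1 - 1/n)**k, computed exactly."""
    return Fraction(n - 1, n) ** k


def first_order(k: int, n: int) -> Fraction:
    """The simple approximation 1 - k/n."""
    return 1 - Fraction(k, n)


def addendum(k: int, n: int) -> Fraction:
    """The refined approximation 1 - k / (n + (k - 1)/2)."""
    return 1 - k / (n + Fraction(k - 1, 2))


if __name__ == "__main__":
    k, n = 10, 100  # illustrative values only
    e = exact(k, n)
    print(float(e), float(first_order(k, n)), float(addendum(k, n)))
    print(float(abs(first_order(k, n) - e)), float(abs(addendum(k, n) - e)))
```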

§2 Selecting a Particular Item Twice

Praemissa propositio . Consider k threads independently selecting one item each from a collection of n items. The probability that two or more threads select a particular item is the complement of the probability that ⌜every thread selects an item other than that or exactly one thread selects the item⌝ :

1 - ({(\frac{n - 1}{n})}^{k} + \frac{k}{n} {(\frac{n - 1}{n})}^{k - 1})

= 1 - {(\frac{n - 1}{n})}^{k} (1 + \frac{k}{n} \frac{n}{n - 1})

= 1 - (1 + \frac{k}{n - 1}) {(\frac{n - 1}{n})}^{k}

Utile theorema . When $k ≪ n$ , this is approximately

\frac{1}{2} \frac{k (k - 1)}{n (n - 1)} .
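
Exemplum . A small check of both the closed form and its approximation, sketched under the assumption that each thread picks the particular item independently with probability 1 / n (so the number of picks is binomial). The function names and sample values are illustrative.

```python
from fractions import Fraction
from math import comb


def at_least_twice(k: int, n: int) -> Fraction:
    """Closed form: 1 - (1 + k/(n - 1)) * ((n - 1)/n)**k."""
    return 1 - (1 + Fraction(k, n - 1)) * Fraction(n - 1, n) ** k


def at_least_twice_direct(k: int, n: int) -> Fraction:
    """Same probability, summed directly over 2, 3, ..., k selecting threads."""
    p = Fraction(1, n)
    return sum(comb(k, j) * p ** j * (1 - p) ** (k - j) for j in range(2, k + 1))


def approx(k: int, n: int) -> Fraction:
    """The k << n approximation, k (k - 1) / (2 n (n - 1))."""
    return Fraction(k * (k - 1), 2 * n * (n - 1))


if __name__ == "__main__":
    k = 5  # illustrative thread count
    for n in (50, 500, 5_000):
        print(n, float(at_least_twice(k, n)), float(approx(k, n)))
```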

Admonitio . This is different from the probability of a collision occurring, that is, the probability of two or more threads selecting the same item (where the selected item may be any item rather than a particular item).

§3 Selecting Any Item Twice

Praemissa propositio . Consider k threads independently selecting one item each from a collection of n items. The probability of a collision occurring, that is, the probability that two or more threads select the same item (which may be any item), is the complement of the probability that the threads make unique selections:

1 - \frac{n (n - 1) (n - 2) · · · (n - k + 1)}{n^{k}}

= 1 - \frac{n!}{(n - k)!} \frac{1}{n^{k}}

= 1 - (\binom{n}{k}) \frac{k!}{n^{k}} .

(The expression above is valid when k ≤ n; when k > n, the probability is 1.)

Utile theorema . When $k ≪ n$ , this is approximately

\frac{k (k - 1)}{2 n} .

Ratio demonstrandi . For this approximation, we will be slightly less rigorous than we were for §1 in order to demonstrate a different technique.

We can rewrite the expression slightly as

1 - \frac{n}{n} \frac{(n - 1)}{n} \frac{(n - 2)}{n} · · · \frac{(n - k + 1)}{n}

= 1 - (\frac{n}{n} - \frac{0}{n}) (\frac{n}{n} - \frac{1}{n}) (\frac{n}{n} - \frac{2}{n}) · · ·

= 1 - \prod_{x = 0}^{k - 1} (1 - \frac{x}{n})

and then rewrite the product as the exponential of a sum of logarithms:

= 1 - exp \sum_{x = 0}^{k - 1} log (1 - x / n) .

For $z \approx 0$ , we note that $log (1 - z) \approx - z$ :

· · · \sim 1 - exp \sum_{x = 0}^{k - 1} (- x / n)

= 1 - exp (- \frac{k (k - 1)}{2 n})

and $exp (- z) \approx 1 - z$ :

· · · \sim 1 - (1 - \frac{k (k - 1)}{2 n}) = \frac{k (k - 1)}{2 n} ∎

Observantia . This is the birthday problem:

1 - \prod_{x = 0}^{22} (1 - \frac{x}{365}) \approx 51% .

Although 365 is not sufficiently larger than 23 for the approximation to be used advisably,^† the estimate is sane:

\frac{23 \times 22}{2 \times 365} \approx 69% .

†Generally speaking, n ought to be a few times larger than k² for the approximation to hold well.
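
Exemplum . The birthday figures can be reproduced with a few lines of floating-point arithmetic. In this sketch the function names are illustrative assumptions; `collision_exp` is the intermediate form $1 - exp (- k (k - 1) / (2 n))$ from the derivation, included because it tends to stay close to the exact value even when the final linear step does not.

```python
from math import exp, prod


def collision_exact(k: int, n: int) -> float:
    """1 - prod_{x=0}^{k-1} (1 - x/n); certain collision once k exceeds n."""
    if k > n:
        return 1.0
    return 1.0 - prod(1.0 - x / n for x in range(k))


def collision_exp(k: int, n: int) -> float:
    """Intermediate approximation, 1 - exp(-k (k - 1) / (2 n))."""
    return 1.0 - exp(-k * (k - 1) / (2 * n))


def collision_linear(k: int, n: int) -> float:
    """Final approximation, k (k - 1) / (2 n)."""
    return k * (k - 1) / (2 * n)


if __name__ == "__main__":
    # 23 people and 365 days, then a case where n is five times k squared.
    for k, n in ((23, 365), (23, 5 * 23 ** 2)):
        print(k, n, collision_exact(k, n), collision_exp(k, n), collision_linear(k, n))
```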

Admonitio . This is different from the expected collision rate, that is, the expected number of items that are selected by more than one thread or the expected number of threads that select the same item as another thread.

§4 Avoiding the Selection of Any Item Twice

Praemissa propositio . Suppose there are k threads selecting openings (and playing out games from them) and that these threads cannot communicate. How do we avoid collisions (book exits being selected by more than one thread) while also having randomized and uniform coverage? Broadly speaking, we simply make the space of possible book exits to sample from large enough that the chance of a collision is low.

Suppose we want to generate g book exits (play out g games) in total, or g / k games per thread. Let s be a multiplier, so that the number of possible book exits to pick from is g × s. Then the fraction of the space g × s that each thread will explore is

p = \frac{g / k}{g \times s} = 1 / (k \times s) .

For any particular thread, and any particular book exit, this is the probability that the thread will pick that book exit. For convenience, let’s also define $q = 1 - p$ , which is the probability that a particular thread does not pick a particular book exit. The probability that a particular book exit is picked by any thread is

a = 1 - q^{k}

and the probability that a particular book exit is picked by exactly one thread is

u = k \times p \times q^{k - 1} .

Then the probability that a particular book exit is picked by exactly one thread given that it was picked is u / a. Substituting the definitions of u, p, q, and a, we obtain

u / a = \frac{1}{s} \times \frac{{(1 - 1 / (k \times s))}^{k - 1}}{1 - {(1 - 1 / (k \times s))}^{k}} .

We might attempt to approximate this by replacing the numerator and denominator of the right-hand fraction with their approximations. Then we have

u / a \sim \frac{1}{s} \times \frac{1 - (k - 1) / (k \times s)}{k / (k \times s)} = 1 - \frac{k - 1}{k} \times \frac{1}{s}

when $k \times s ≫ k$ , or more simply, when $s ≫ 1$ .

When $k ≫ 1$ , this is approximately $1 - 1 / s$ .

Alternatively, we might instead write

u / a = \frac{1}{s \times (1 - 1 / (k s))} \times \frac{{(1 - 1 / (k s))}^{k}}{1 - {(1 - 1 / (k s))}^{k}}

and then

u / a \sim \frac{1}{s \times (1 - 1 / (k s))} \times \frac{1 - 1 / s}{1 / s} = \frac{1 - 1 / s}{1 - 1 / (k s)}

= \frac{s - 1}{s - 1 / k}

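and once again, when $k ≫ 1$ , this is approximately $(s - 1) / s = 1 - 1 / s$ . However, these approximations are wrong at first order in $1 / s$ : the denominator $1 - {(1 - 1 / (k s))}^{k}$ is itself of order $1 / s$ , so the second-order term discarded from it shifts the quotient at order $1 / s$ . The result actually ought to look like $1 - 1 / (2 s)$ .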

We might try repeating the same procedure above using

{(1 - 1 / n)}^{k} \sim 1 - \frac{k}{n} + \frac{k (k - 1)}{2 n^{2}}

and derive

u / a \sim \frac{s - 1 + 1 / (2 s)}{s - ½}

when $k ≫ 1$ , which is an improvement, but still worse than $1 - 1 / (2 s)$ .
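
Exemplum . The three candidate forms can be ranked against the exact value of u / a. This is a sketch under stated assumptions: the function names are illustrative, s is taken to be an integer for convenience, and k = 1000 is chosen so that the $k ≫ 1$ forms are the appropriate ones to compare. The ranking is only claimed for that regime, not for small k.

```python
from fractions import Fraction


def unique_given_picked(k: int, s: int) -> Fraction:
    """u / a computed from the definitions of p, q, a, and u."""
    p = Fraction(1, k * s)
    q = 1 - p
    a = 1 - q ** k            # picked by at least one thread
    u = k * p * q ** (k - 1)  # picked by exactly one thread
    return u / a


def first_order(s: int) -> Fraction:
    """1 - 1/s, the k >> 1 form of both first-order attempts."""
    return 1 - Fraction(1, s)


def second_order(s: int) -> Fraction:
    """(s - 1 + 1/(2s)) / (s - 1/2), the k >> 1 second-order attempt."""
    return (s - 1 + Fraction(1, 2 * s)) / (s - Fraction(1, 2))


def target(s: int) -> Fraction:
    """1 - 1/(2s)."""
    return 1 - Fraction(1, 2 * s)


if __name__ == "__main__":
    k, s = 1000, 10  # illustrative values only
    exact = unique_given_picked(k, s)
    for name, value in (("first order", first_order(s)),
                        ("second order", second_order(s)),
                        ("1 - 1/(2s)", target(s))):
        print(name, float(value), float(abs(value - exact)))
```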

We can instead take a different tack.

Utile theorema . The probability that a particular item (of k × s items) is selected by exactly one thread (of k threads) given that it was selected by any thread is

\frac{1}{s} \times \frac{{(1 - 1 / (k \times s))}^{k - 1}}{1 - {(1 - 1 / (k \times s))}^{k}} .

When $s ≫ 1$ , this is approximately

1 - \frac{1}{2 s} \times \frac{k - 1}{k - 1 / s},

or simply $1 - 1 / (2 s)$ when $k ≫ 1$ .

Ratio demonstrandi . Let us rewrite the expression as

(⋆) \frac{1}{s} \times \frac{1}{1 - \frac{1}{s k}} \times \frac{{(1 - \frac{1}{s k})}^{k}}{1 - {(1 - \frac{1}{s k})}^{k}}

and consider the rightmost term

\frac{{(1 - 1 / (s k))}^{k}}{1 - {(1 - 1 / (s k))}^{k}}

in isolation.

We first rewrite this term further as

(†) \frac{1}{1 - {(1 - 1 / (s k))}^{k}} - 1 .

Æquatio.

For any z ≠ 1,

\frac{z}{1 - z} = \frac{1}{1 - z} - 1

by the following sequence of rewrites:

\frac{z}{1 - z} = \frac{1 - (1 - z)}{1 - z} = \frac{1}{1 - z} - \frac{1 - z}{1 - z} = \frac{1}{1 - z} - 1 .

The subterm ${(1 - 1 / (s k))}^{k}$ can be rewritten as

\frac{{(s k - 1)}^{k}}{{(s k)}^{k}}

and then we replace ${(s k - 1)}^{k}$ with its binomial expansion:

\frac{{(s k - 1)}^{k}}{{(s k)}^{k}} = \frac{1}{s^{k} k^{k}} (s^{k} k^{k} - s^{k - 1} k^{k} + \frac{1}{2} \frac{k - 1}{k} s^{k - 2} k^{k} - · · ·)

= 1 - \frac{1}{s} + \frac{1}{2} \frac{k - 1}{k} \frac{1}{s^{2}} - · · ·

(The elaboration of the binomial expansion has been omitted for brevity.) We create an approximation of the subterm by truncating the series. Substituting this in † gives us

\frac{1}{1 - (1 - \frac{1}{s} + \frac{1}{2} \frac{k - 1}{k} \frac{1}{s^{2}})} - 1

= \frac{1}{\frac{1}{s} - \frac{1}{2} \frac{k - 1}{k} \frac{1}{s^{2}}} - 1

= \frac{2 k s^{2}}{2 k s - k + 1} - 1

= \frac{2 k s^{2} - 2 k s + k - 1}{2 k s - k + 1}

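which, after polynomial division and discarding a remainder of $\frac{{(k - 1)}^{2}}{2 k (2 k s - k + 1)}$ that vanishes as s grows, is

s - \frac{1}{2} - \frac{1}{2 k} .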

(The steps of polynomial division have been omitted for brevity.)
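
Exemplum . The omitted division can be spot-checked with exact rational arithmetic. The sketch below (function names are illustrative assumptions) confirms that the fraction splits into the quotient $s - 1 / 2 - 1 / (2 k)$ , which is the form substituted into ⋆ next, plus a remainder term $(k - 1)^{2} / (2 k (2 k s - k + 1))$ that shrinks as s grows.

```python
from fractions import Fraction


def truncated_term(k: int, s: int) -> Fraction:
    """(2 k s^2 - 2 k s + k - 1) / (2 k s - k + 1)."""
    return Fraction(2 * k * s * s - 2 * k * s + k - 1, 2 * k * s - k + 1)


def quotient(k: int, s: int) -> Fraction:
    """Polynomial quotient in s: s - 1/2 - 1/(2k)."""
    return s - Fraction(1, 2) - Fraction(1, 2 * k)


def remainder_term(k: int, s: int) -> Fraction:
    """What the quotient leaves out: (k - 1)^2 / (2 k (2 k s - k + 1))."""
    return Fraction((k - 1) ** 2, 2 * k * (2 * k * s - k + 1))


if __name__ == "__main__":
    k = 8  # illustrative thread count
    for s in (2, 10, 100):
        print(s, float(truncated_term(k, s)), float(quotient(k, s)),
              float(remainder_term(k, s)))
```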

We now substitute this in ⋆ and proceed to simplify:

\sim \frac{1}{s} \times \frac{1}{1 - 1 / (s k)} \times (s - \frac{1}{2} - \frac{1}{2 k})

= \frac{k}{s k - 1} (s - \frac{1}{2} - \frac{1}{2 k})

= \frac{s k - k / 2 - 1 / 2}{s k - 1}

= \frac{s k - 1 - k / 2 + 1 / 2}{s k - 1}

= 1 - \frac{1}{2} \frac{k - 1}{s k - 1} = 1 - \frac{1}{2} \frac{1}{s} \frac{s k - s}{s k - 1}

= 1 - \frac{1}{2 s} \frac{k - 1}{k - 1 / s} ∎

Observantia . By inspection, we can see that the probability of uniqueness is about 0.95 at s = 10; in other words, for each book exit that is picked, the probability that it is duplicated is less than 5% when s = 10, and this holds independently of the number of threads, k, and the total number of games, g, that we want to play. (The chance of duplication is even lower when k is small.)
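
Exemplum . The observation can be tabulated for a range of thread counts. This is a minimal sketch: the function names and the sampled values of k are assumptions for illustration, and s must be greater than one for the approximate form to be defined at k = 1.

```python
from fractions import Fraction


def exact_uniqueness(k: int, s: int) -> Fraction:
    """(1/s) * q**(k - 1) / (1 - q**k) with q = 1 - 1/(k s)."""
    q = 1 - Fraction(1, k * s)
    return Fraction(1, s) * q ** (k - 1) / (1 - q ** k)


def approx_uniqueness(k: int, s: int) -> Fraction:
    """1 - (1/(2s)) * (k - 1) / (k - 1/s)."""
    return 1 - Fraction(1, 2 * s) * (k - 1) / (k - Fraction(1, s))


if __name__ == "__main__":
    s = 10
    for k in (1, 2, 4, 16, 64, 256):
        print(k, float(exact_uniqueness(k, s)), float(approx_uniqueness(k, s)))
```

Printing both columns side by side makes it easy to see how the probability of uniqueness changes with k at a fixed s.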