Omnium Gatherum

Praemissa propositio . Consider k threads independently selecting one item each from a collection of n items. The probability that at least one thread selects a particular item is the complement of the probability that every thread selects an item other than that:

1 ( n 1 n ) k

Utile theorema . When k n , this is approximately k / n .

1 ( 1 1 / n ) k k / n

More precisely, the relative difference between these can be made arbitrarily small by decreasing k or increasing n. This can be seen here.

Sometimes this approximation is more useful in a reärranged form:

( 1 1 / n ) k 1 k / n .

Ratio demonstrandi . When k = 1, this is exact:

1 ( 1 1 / n ) = 1 1 + 1 / n = 1 / n

From here on, we will assume that k is greater than one.

Considering k = 2 and k = 3 gives the general idea of an argument. We first reärrange

1 k / n ( 1 1 / n ) k

and then expand:

( 1 1 / n ) 2 = 1 2 / n + 1 / n 2 ( 1 1 / n ) 3 = ( 1 2 / n + 1 / n 2 ) ( 1 1 / n ) = 1 3 / n + 3 / n 2 1 / n 3 .

Here we can see the origin of the k / n term,

( 1 ( k 1 ) / n + · · · ) ( 1 1 / n ) = 1 × 1 + 1 × ( 1 / n ) + ( ( k 1 ) / n ) × 1 + · · ·

and indeed, simply applying the binomial theorem will suffice:

( 1 1 / n ) k = ( k 0 ) 1 k ( 1 / n ) 0 + ( k 1 ) 1 k 1 ( 1 / n ) 1 + ( k 2 ) 1 k 2 ( 1 / n ) 2 + ( k 3 ) 1 k 3 ( 1 / n ) 3 + · · · = 1 k n + k ( k 1 ) 2 1 n 2 k ( k 1 ) ( k 2 ) 6 1 n 3 + . . . .

The absolute error term is

E abs = k ( k 1 ) 2 1 n 2 k ( k 1 ) ( k 2 ) 6 1 n 3 + · · ·

and the relative error term E rel = E abs / ( k / n ) is

E rel = k 1 2 n ( k 1 ) ( k 2 ) 6 n 2 + . . . .

We need to show that, for any ε less than one, the relative error is less than ε when n is k / ε .

ε > k 1 2 k ε ( k 1 ) ( k 2 ) 6 k 2 ε 2 + · · · 1 > k 1 2 k k 1 2 k k 2 3 k ε + · · ·

Each term on the right-hand is larger than the succeeding term, and since the signs alternate, each term is also larger than the succeeding series.

Lemma. Suppose that · · · > x 2 > x 1 > x 0 > 0 . Then x 1 x 0 > 0 and x 2 > x 1 x 0 , because x 2 > x 1 and x 1 > x 1 x 0 . x 2 > x 1 x 0 > 0 x 3 > x 2 ( x 1 x 0 ) > 0 x 4 > x 3 ( x 2 ( x 1 x 0 ) ) > 0 Inductively, we have that x n > x ( n 1 ) x ( n 2 ) + x ( n 3 ) · · · > 0 .

And so we conclude that

1 > k 1 2 k k 1 2 k k 2 3 k ε + . . .

Praemissa propositio . Consider k threads independently selecting one item each from a collection of n items. The probability that two or more threads select a particular item is the complement of the probability that every thread selects an item other than that or exactly one thread selects the item :

1 ( ( n 1 n ) k + k n ( n 1 n ) k 1 ) = 1 ( n 1 n ) k ( 1 + k n n n 1 ) = 1 ( 1 + k n 1 ) ( n 1 n ) k .

Utile theorema . When k n , this is approximately

1 2 k ( k 1 ) n ( n 1 ) .

Admonitio . This is different than the probability of a collision occurring, that is, the prob­a­bility of two or more threads selecting the same item (where the selected item may be any item rather than a particular item).


Praemissa propositio . Suppose there are k threads selecting openings (and playing out games from them) and that these threads cannot communicate. How do we avoid collisions (book exits being selected by more than one thread) while also having randomized and uniform coverage? Broadly speaking, we simply make the space of possible book exits to sample from large enough that the chance of a collision is low.

Suppose we want to generate g book exits (play out g games) in total, or g / k games per thread. Let s be a multiplier, so that that the number of possible book exits to pick from is g × s. Then the fraction of the space g × s that each thread will explore is

p = g / k g × s = 1 / ( k × s ) .

For any particular thread, and any particular book exit, this is the probability that the thread will pick that book exit. For convenience, let’s also define q = 1 p , which is the probability that a particular thread does not pick a particular book exit. The probability that a particular book exit is picked by any thread is

a = 1 q k

and the probability that a particular book exit is picked by exactly one thread is

u = k × p × q k 1 .

Then the probability that a particular book exit is picked by exactly one thread given that it was picked is u / a.

Quite happily, u / a rapidly converges to a limit as k increases, which follows from our approximation. We first substitute the definitions of u, p, q, and a to obtain

u / a = k k × s × ( 1 1 / ( k × s ) ) k 1 1 ( 1 1 / ( k × s ) ) k .

Then we have

u / a 1 s × 1 ( k 1 ) / ( k × s ) k / ( k × s ) = 1 k 1 k × 1 s

when k × s k , or more simply, when s 1 .

When k 1 , this is approximately 1 1 / s .

Alternatively, we might instead write

u / a = 1 s × ( 1 1 / ( k s ) ) × ( 1 1 / ( k s ) ) k 1 ( 1 1 / ( k s ) ) k

and then

u / a 1 s × ( 1 1 / ( k s ) ) × 1 1 / s 1 / s = 1 1 / s 1 1 / ( k s ) = s 1 s 1 / k

and once again, when k 1 , this is approximately ( s 1 ) / s = 1 1 / s . However, these are fairly poor approximations; an approximation like 1 1 / ( 2 s ) works much better.

We might try repeating the same procedure above using

( 1 1 / n ) k 1 k n + k ( k 1 ) 2 n 2

and derive

u / a s 1 + 1 / ( 2 s ) s ½

when k 1 , which is an improvement, but still worse than 1 1 / ( 2 s ) .

We can instead take a different tack.

Utile theorema . The probability that a particular item (of k × s items) is selected by exactly one thread (of k threads) given that it was selected by any thread is

1 s × ( 1 1 / ( k × s ) ) k 1 1 ( 1 1 / ( k × s ) ) k .

When s 1 , this is approximately

1 1 2 s × k 1 k 1 / s ,

or simply 1 1 / ( 2 s ) when k 1 .

Ratio demonstrandi . Let us rewrite the expression as

( ) 1 s × 1 1 1 s k × ( 1 1 s k ) k 1 ( 1 1 s k ) k

and consider the rightmost term

( 1 1 / ( s k ) ) k 1 ( 1 1 / ( s k ) ) k

in isolation.

We first rewrite this term further as

( ) 1 1 ( 1 1 / ( s k ) ) k 1 . Æquatio. For any z, z 1 z = 1 1 z 1 by the following sequence of rewrites: z 1 z = 1 ( 1 z ) 1 z = 1 1 z 1 z 1 z = 1 1 z 1 .

The subterm ( 1 1 / ( s k ) ) k can be rewritten as

( s k 1 ) k ( s k ) k

and then we replace ( s k 1 ) k with its binomial expansion:

( s k 1 ) k ( s k ) k = 1 s k k k ( s k k k s k 1 k k + 1 2 k 1 k s k 2 k k · · · ) = 1 1 s + 1 2 k 1 k 1 s 2 · · ·

(The elaboration of the binomial expansion has been omitted for brevity.) We create an approximation of the subterm by truncating the series. Substituting this in † gives us

1 1 ( 1 1 s + 1 2 k 1 k 1 s 2 ) 1 = 1 1 s 1 2 k 1 k 1 s 2 1 = 2 k s 2 2 k s k + 1 1 = 2 k s 2 2 k s + k 1 2 k s k + 1

which is exactly

s 1 s 1 2 k .

(The steps of polynomial division have been omitted for brevity.)

We now substitute this in and proceed to simplify:

1 s × 1 1 1 / ( s k ) × ( s 1 2 1 2 k ) = k s k 1 ( s 1 2 1 2 k ) = s k k / 2 1 / 2 s k 1 = s k 1 k / 2 + 1 / 2 s k 1 = 1 1 2 k 1 s k 1 = 1 1 2 1 s s k s s k 1 = 1 1 2 s k 1 k 1 / s

Annotatio . By inspection, we can see that the probability of uniqueness is 0.95 around s = 10; in other words, for each book exit, the probability that the book exit is duplicated is less than 5% when s = 10, and this is true independent of the number of threads, k, and the total number of games, g, we want to play. (The chance of duplication is even less when k is small.)