STA 247 - Week 2 lecture summary

Counting

If we assume that outcomes are equally likely, we can find the probability of an event by just counting the number of outcomes in it, and dividing by the total number of outcomes in the sample space. So counting is a fundamental task for probability theory.

The Multiplication Principle:

If an experiment has two parts, and there are n1 ways for the first part of the experiment to happen, and n2 ways for the second part to happen, then there are n1 x n2 ways for the whole experiment to happen. This generalizes in the obvious way to more than two parts.

Examples:

We flip a coin and roll a six-sided die. The coin can land in two ways (heads or tails). The die can land in six ways (showing 1, 2, 3, 4, 5, or 6). The total number of possible outcomes in the sample space is therefore 2x6=12. If we assume that the outcomes are equally likely, the probability for any single outcome, such as the coin landing heads and the die showing 4, is therefore 1/12.

We flip a coin six times and roll a six-sided die. Each flip of the coin can land in two ways (heads or tails). The die can land in six ways (showing 1, 2, 3, 4, 5, or 6). The total number of possible outcomes in the sample space is therefore 2x2x2x2x2x2x6 = 2^6 x 6 = 384. Let A be the event that exactly one flip lands heads and the die shows 1. The number of outcomes in A is 6, since there are six possibilities for which flip is a head and only one way for the die to show 1. Hence, if we assume that the outcomes are equally likely, P(A)=6/384.
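
As a quick check of this count, here is a small R sketch that enumerates the sample space with expand.grid and counts the outcomes in A directly:

    flips <- rep(list(c("H","T")), 6)                # six coin flips
    outcomes <- expand.grid(c(flips, list(die = 1:6)))
    nrow(outcomes)                                   # 384 outcomes in all
    heads <- rowSums(outcomes[, 1:6] == "H")         # heads count per outcome
    sum(heads == 1 & outcomes$die == 1)              # 6 outcomes in A
    mean(heads == 1 & outcomes$die == 1)             # P(A) = 6/384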

Drawing balls from urns

Drawing balls from an urn is a standard example in probability, which represents many actual applications, or parts of applications.

Suppose that we have an urn containing n distinguishable balls. We draw a ball from the urn k times. We can do this in two ways: we might replace the ball drawn each time before drawing the next ball, or we might not replace the ball (in which case k cannot be bigger than n). We may also consider the order of balls drawn to matter, or we may consider the draws to be unordered.

We'll count the number of possible outcomes for each of these possible urn drawing scenarios.

Drawing with replacement, ordered result:

Since we replace the balls drawn, each draw can pick any of the n balls. Since we draw k times, the multiplication principle says that the number of possible outcomes is n multiplied by itself k times, which is n^k.

Example: We draw with replacement two times from an urn containing three balls - red, green, and blue. There are 3^2=9 possible outcomes: RR, RG, RB, GR, GG, GB, BR, BG, BB.
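
A one-line check in R, using expand.grid to build all ordered pairs:

    balls <- c("R","G","B")
    pairs <- expand.grid(first = balls, second = balls)
    nrow(pairs)    # 3^2 = 9 ordered outcomes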

Drawing without replacement, ordered result:

When we don't replace the balls, the number of possible ball choices goes down by one after each draw. The multiplication principle then says that the total number of possible outcomes with k draws from an urn with n balls is n(n-1)(n-2)...(n-k+1) = n!/(n-k)!.

Example: We draw without replacement two times from an urn containing three balls - red, green, and blue. There are 3x2=6 possible ordered outcomes: RG, RB, GR, GB, BR, BG.

If k=n, we are drawing some permutation of the balls. From the formula above, the number of possible permutations is n! (remembering that 0!=1).
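
A quick check of these formulas in R, for the n=3, k=2 example above:

    n <- 3; k <- 2
    factorial(n) / factorial(n - k)   # 3x2 = 6 ordered outcomes
    factorial(n)                      # 3! = 6 permutations when k = n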

Drawing without replacement, unordered result:

If we draw k balls without replacement, and don't look at the order of the balls drawn, the number of possible results of the experiment decreases, compared to the result above when we do look at the order. There are k! ways of ordering k balls (the number of permutations of k items), so the result with ordering over-counts by this factor. So, dividing the number of outcomes with ordering by this factor, we get that the number of ways of drawing k balls without replacement ignoring order from an urn with n balls is n!/((n-k)!k!). This is called ``n choose k'', and is also written as n over k in big parentheses, or as C(n,k). In R, it can be computed as choose(n,k). Note that C(n,k) = C(n,n-k).

Example: We draw without replacement two times from an urn containing three balls - red, green, and blue. There are (3x2)/(1x2)=3 possible unordered outcomes: {R,G}, {R,B}, {G,B}.
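
In R, choose computes this count, and combn enumerates the actual combinations - a quick check for the example above:

    choose(3, 2)                    # 3 unordered outcomes
    t(combn(c("R","G","B"), 2))     # the pairs {R,G}, {R,B}, {G,B}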

Drawing with replacement, unordered result:

We can use the result above to find how many possible outcomes there are when drawing k balls with replacement, when we don't look at the order the balls are drawn in. Since we don't look at the order, all that matters is how many times each of the n balls in the urn are drawn. We can represent a count as a sequence of Os, with the number of Os being equal to the count. We can represent the counts of how many times each of the n balls were drawn by putting together the sequences of Os representing the counts for each ball, separating them with Xs. (We choose some order for the n balls; it doesn't matter which order.)

For example, if n=3, with the balls labelled red, green, and blue, and k=6, one possible outcome is 2 red, 1 green, and 3 blue. Ordering the balls as red, green, blue, these counts can be represented by the sequence OOXOXOOO.

Every set of counts will correspond to a sequence of k+n-1 Xs and Os, in which the number of Xs is exactly n-1, and every such sequence will correspond to a set of counts. The correspondence is one-to-one, so we can count the number of outcomes of the experiment by counting how many sequences there are of length k+n-1 with n-1 of the positions being occupied by Xs.

The number of ways of putting n-1 Xs down in a sequence of length k+n-1 is the same as the number of ways of choosing n-1 balls without replacement from an urn with k+n-1 balls, ignoring order. We figured that out above - it is k+n-1 choose n-1. This is the same as the number of ways of choosing places for the k Os out of the k+n-1 positions, which is k+n-1 choose k.

Example: We draw with replacement two times from an urn containing three balls - red, green, and blue. There are C(2+3-1,2)=6 possible outcomes if we ignore order: {R,R}, {R,G}, {R,B}, {G,G}, {G,B}, {B,B}.

Example: A computer has 6 processors. It is regularly used to run jobs of 4 kinds. It always runs 6 jobs at a time, so that all the processors will be used, but there won't be any processor contention between jobs. The performance of the computer may depend on what kinds of jobs it is running (eg, it may go slowly if two jobs that both access the disk a lot are running). We're therefore interested in how many possible job mixes there are, since we may need to evaluate performance for each job mix.

We can treat this as a problem where we draw k=6 balls (jobs) with replacement from an urn with n=4 balls (kinds of jobs), and we care only about the numbers of jobs of each kind (there's no order to jobs). The answer is therefore C(6+4-1,4-1)=C(9,3)=84 possible job mixes.
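
A sketch in R that checks this count two ways - by the formula, and by brute force over all n^k ordered assignments of kinds to jobs:

    k <- 6; n <- 4
    choose(k + n - 1, n - 1)    # 84 job mixes, by the formula
    # Brute force: enumerate all n^k ordered assignments, reduce each
    # to its vector of per-kind counts, and count the distinct vectors.
    jobs <- expand.grid(rep(list(1:n), k))
    counts <- apply(jobs, 1, tabulate, nbins = n)
    nrow(unique(t(counts)))     # also 84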

The birthday problem / hashing

A famous probability problem is to find how likely it is that, at a party with n people, at least two people have the same birthday. I'll talk about the equivalent problem of finding the probability that hashing n keys to 32-bit hash codes will produce two or more keys with the same hash code.

The hash function takes a string of characters and outputs a 32-bit hash code. For a good hash function, we can model this as a hash code being selected randomly for each key, with all possible combinations of hash codes for keys being equally likely.

Let A be the event that two or more of the n keys have the same hash code. Ac is the event that there is no such collision. We'll find P(Ac), and then get P(A) as 1-P(Ac).

Since we're assuming equally-likely outcomes, P(Ac) = #(Ac)/#(S). The number of outcomes in the sample space is #(S) = (2^32)^n, the same as the number of ways of drawing n balls with replacement from an urn with 2^32 balls, paying attention to the order. Using the multiplication principle, the number of outcomes with no collision is

#(Ac) = 2^32 (2^32 - 1) (2^32 - 2) ... (2^32 - n + 1)
which is the same as the number of ways of drawing n balls from an urn with 2^32 balls without replacement, paying attention to the order. We use these numbers to compute P(A) = 1 - #(Ac)/#(S).

By numerical calculation, I found that P(A) reaches 1% when n is 9292. So even with over 4 billion hash codes (2^32), you can't expect to avoid collisions even with a fairly small number of keys.
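
A short R sketch that reproduces this calculation (p.collision is just an illustrative name); summing logs with log1p avoids numerical problems from multiplying thousands of factors close to 1:

    p.collision <- function (n, m = 2^32)     # P(some collision) among n keys
        1 - exp(sum(log1p(-(0:(n-1)) / m)))   # product of (m-i)/m, on log scale
    p.collision(9292)    # approximately 0.01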

Conditional probability

We may find it easiest to first set up a probability model (eg, with equally likely outcomes) that assumes we know nothing but the basic setup, and then look at conditional probabilities that account for what else we know.

Definition:

If event A has non-zero probability, the conditional probability of event B given event A, written P(B|A), is defined to be P(A intersect B)/P(A).

The intended interpretation is that P(B|A) is how likely B is to occur (or has occurred) if we know that A has occurred (and don't know anything else relevant).

Example:

We flip a coin three times. What is the probability that the first flip is a head (event B), given that two of the three flips are heads (event A)?

The sample space is as follows, with the subsets for the events marked:

                HHH HHT HTH HTT THH THT TTH TTT
       A             x   x       x   
       B         x   x   x   x
 A intersect B       x   x
Assuming equally likely outcomes, we see that P(A intersect B)=2/8 and P(A)=3/8, so P(B|A)=(2/8)/(3/8)=2/3.
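
This is easy to check in R by enumerating the eight outcomes:

    flips <- expand.grid(f1 = c("H","T"), f2 = c("H","T"), f3 = c("H","T"))
    A <- rowSums(flips == "H") == 2   # exactly two of the three flips are heads
    B <- flips$f1 == "H"              # first flip is a head
    sum(A & B) / sum(A)               # (2/8)/(3/8) = 2/3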

Conditional probabilities for events all conditional on the same event, B, obey the axioms of probability. That is,

P(A|B) >= 0 for any event A
P(S|B) = 1
P(A1 U A2 U ... | B) = P(A1|B) + P(A2|B) + ..., for disjoint events A1, A2, ...

Because of this, any theorems we've proved about probabilities also apply to conditional probabilities (when all are conditional on the same event). For example, P(Ac|B)=1-P(A|B).

The multiplication rule:

If P(A)>0, P(A intersect B) = P(A) P(B|A), and if P(B)>0, P(A intersect B) = P(B) P(A|B).

More generally, P(A1 intersect A2 intersect A3 intersect ...) = P(A1) P(A2 | A1) P(A3 | A1 intersect A2) ...

These are immediate consequences of the definition of conditional probability (just substitute the definitions above and cancel factors).

The law of total probability

If B1, B2, B3, ... are disjoint, and S = B1 U B2 U B3 U ..., then the probability of any event A can be written as

P(A) = P(A intersect B1) + P(A intersect B2) + P(A intersect B3) + ...
Applying the multiplication rule, one can also write this as
P(A) = P(A | B1) P(B1) + P(A | B2) P(B2) + P(A | B3) P(B3) + ...
This can be proved using the fact that the events (A intersect Bi) are disjoint, which means we can use one of the axioms of probability to show that the probability of the union of these events (which is A) is the sum of their individual probabilities.

Example: Suppose we roll a die and then flip a coin as many times as the number showing on the die. What is the probability of getting exactly one head? We can answer this using the law of total probability, with B1, B2, ..., B6 being the events that the die shows 1, 2, ..., 6.
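
Carrying out the calculation: given that the die shows k, the probability of exactly one head in k flips is C(k,1)(1/2)^k = k/2^k, so P(exactly one head) = (1/6)(1/2 + 2/4 + 3/8 + 4/16 + 5/32 + 6/64) = 5/16. A one-line check in R, where dbinom(1, k, 1/2) gives the probability of exactly one head in k fair flips:

    sum((1/6) * dbinom(1, 1:6, 0.5))   # 0.3125 = 5/16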

Independence of events

Definition:

Events A and B are said to be independent if P(A intersect B) = P(A) P(B)

If P(A)>0, this is equivalent to P(B)=P(B|A), and if P(B)>0, it is equivalent to P(A)=P(A|B).

NOTE: ``independent'' is not the same as ``disjoint'' or ``mutually exclusive'' - in fact events that are disjoint can't possibly be independent (unless one has probability zero).

Theorem: If A and B are independent, then A and Bc are also independent (and hence also Ac and B, and Ac and Bc).
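
A one-line proof, using the fact that A is the disjoint union of A intersect B and A intersect Bc:

P(A intersect Bc) = P(A) - P(A intersect B) = P(A) - P(A)P(B) = P(A)(1-P(B)) = P(A)P(Bc)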