If we assume that outcomes are equally likely, we can find the probability of an event by just counting the number of outcomes in it, and dividing by the total number of outcomes in the sample space. So counting is a fundamental task for probability theory.

**The Multiplication Principle**:

If an experiment has two parts, and there are n_{1} ways for the first part of the experiment to happen, and n_{2} ways for the second part to happen, then there are n_{1}n_{2} ways for the whole experiment to happen. This generalizes in the obvious way to more than two parts.

**Examples:**

We flip a coin and roll a six-sided die. The coin can land in two ways (heads or tails). The die can land in six ways (showing 1, 2, 3, 4, 5, or 6). The total number of possible outcomes in the sample space is therefore 2x6=12. If we assume that the outcomes are equally likely, the probability of any single outcome, such as the coin landing heads and the die showing 4, is therefore 1/12.

We flip a coin six times and roll a six-sided die. Each flip of the coin can land in two ways (heads or tails). The die can land in six ways (showing 1, 2, 3, 4, 5, or 6). The total number of possible outcomes in the sample space is therefore 2x2x2x2x2x2x6 = 2^{6}x6 = 384. Let A be the event that exactly one flip lands heads and the die shows 1. The number of outcomes in A is 6, since there are six possibilities for which flip is a head and only one way for the die to show 1. Hence, if we assume that the outcomes are equally likely, P(A)=6/384.
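Counts like these can be checked by brute-force enumeration. Here is a minimal Python sketch (the variable names are my own, not from the notes):

```python
from itertools import product

# Enumerate the sample space: six coin flips followed by one die roll.
flips_space = list(product("HT", repeat=6))
sample_space = [(flips, die) for flips in flips_space for die in range(1, 7)]
print(len(sample_space))  # 2^6 x 6 = 384

# Event A: exactly one flip lands heads and the die shows 1.
A = [(flips, die) for flips, die in sample_space
     if flips.count("H") == 1 and die == 1]
print(len(A), len(A) / len(sample_space))  # 6 outcomes, P(A) = 6/384
```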

Drawing balls from an urn is a standard example in probability, which represents many actual applications, or parts of applications.

Suppose that we have an urn containing *n* distinguishable
balls. We draw a ball from the urn *k* times. We can do this
in two ways: we might replace the ball drawn each time before
drawing the next ball, or we might not replace the ball (in which
case *k* cannot be bigger than *n*). We may also consider
the order of balls drawn to matter, or we may consider the draws to
be unordered.

We'll count the number of possible outcomes for each of these possible urn drawing scenarios.

**Drawing with replacement, ordered result:**

Since we replace the balls drawn, each draw can pick any of the n balls. Since we draw k times, the multiplication principle says that the number of possible outcomes is n multiplied by itself k times, which is n^{k}.

Example: We draw with replacement two times from an urn containing three balls - red, green, and blue. There are 3^{2}=9 possible outcomes: RR, RG, RB, GR, GG, GB, BR, BG, BB.
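This enumeration can be reproduced with `itertools.product`, a sketch under the same setup as the example:

```python
from itertools import product

balls = ["R", "G", "B"]
# Ordered draws with replacement: each of the k draws can be any of the n balls.
draws = ["".join(p) for p in product(balls, repeat=2)]
print(len(draws))  # 3^2 = 9
print(draws)       # ['RR', 'RG', 'RB', 'GR', 'GG', 'GB', 'BR', 'BG', 'BB']
```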

**Drawing without replacement, ordered result:**

When we don't replace the balls, the number of possible ball choices goes down by one after each draw. The multiplication principle then says that the total number of possible outcomes with k draws from an urn with n balls is n(n-1)(n-2)...(n-k+1) = n!/(n-k)!.

Example: We draw without replacement two times from an urn containing three balls - red, green, and blue. There are 3x2=6 possible ordered outcomes: RG, RB, GR, GB, BR, BG.

If k=n, we are drawing some permutation of the balls. From the formula above, the number of possible permutations is n! (remembering that 0!=1).
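The same counts come out of `itertools.permutations`; a quick check in Python (names are mine):

```python
from itertools import permutations
from math import factorial

balls = ["R", "G", "B"]
# Ordered draws without replacement: k-permutations of the n balls.
draws = ["".join(p) for p in permutations(balls, 2)]
print(len(draws))  # 3!/(3-2)! = 6

# With k = n, we get all n! permutations of the balls.
print(len(list(permutations(balls))), factorial(3))  # both 6
```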

**Drawing without replacement, unordered result:**

If we draw k balls without replacement, and *don't* look at the order of the balls drawn, the number of possible results of the experiment decreases, compared to the result above when we do look at the order. There are k! ways of ordering k balls (the number of permutations of k items), so the result with ordering over-counts by this factor. So, dividing the number of outcomes with ordering by this factor, we get that the number of ways of drawing k balls without replacement ignoring order from an urn with n balls is n!/((n-k)!k!). This is called ``n choose k'', and is also written as n over k in big parentheses, or as C(n,k). In R, it can be computed as choose(n,k). Note that C(n,k) = C(n,n-k).

Example: We draw without replacement two times from an urn containing three balls - red, green, and blue. There are 3!/(1!x2!)=3 possible unordered outcomes: {R,G}, {R,B}, {G,B}.
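In Python, `math.comb(n, k)` plays the role of R's `choose(n, k)`, and `itertools.combinations` enumerates the unordered outcomes. A minimal sketch:

```python
from itertools import combinations
from math import comb

balls = ["R", "G", "B"]
# Unordered draws without replacement: the k-element subsets of the n balls.
subsets = [set(c) for c in combinations(balls, 2)]
print(len(subsets))  # C(3,2) = 3

print(comb(3, 2))            # 3
print(comb(10, 3), comb(10, 7))  # equal, since C(n,k) = C(n,n-k)
```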

**Drawing with replacement, unordered result:**

We can use the result above to find how many possible outcomes there are when drawing k balls with replacement, when we don't look at the order the balls are drawn in. Since we don't look at the order, all that matters is how many times each of the n balls in the urn is drawn. We can represent a count as a sequence of Os, with the number of Os being equal to the count. We can represent the counts of how many times each of the n balls was drawn by putting together the sequences of Os representing the counts for each ball, separating them with Xs. (We choose some order for the n balls; it doesn't matter which order.)

For example, if n=3, with the balls labelled red, green, and blue, and k=6, one possible outcome is 2 red, 1 green, and 3 blue. Ordering the balls as red, green, blue, these counts can be represented by the sequence OOXOXOOO.

Every set of counts will correspond to a sequence of k+n-1 Xs and Os, in which the number of Xs is exactly n-1, and every such sequence will correspond to a set of counts. The correspondence is one-to-one, so we can count the number of outcomes of the experiment by counting how many sequences there are of length k+n-1 with n-1 of the positions being occupied by Xs.

The number of ways of putting n-1 Xs down in a sequence of length k+n-1 is the same as the number of ways of choosing n-1 balls without replacement from an urn with k+n-1 balls, ignoring order. We figured that out above - it is k+n-1 choose n-1. This is the same as the number of ways of choosing places for the k Os out of the k+n-1 positions, which is k+n-1 choose k.

Example: We draw with replacement two times from an urn containing three balls - red, green, and blue. There are C(2+3-1,2)=6 possible outcomes if we ignore order: {R,R}, {R,G}, {R,B}, {G,G}, {G,B}, {B,B}.
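Python's `itertools.combinations_with_replacement` enumerates exactly these multisets, so the C(k+n-1,k) formula can be checked directly; a sketch:

```python
from itertools import combinations_with_replacement
from math import comb

balls = ["R", "G", "B"]
# Unordered draws with replacement: size-k multisets from the n balls.
multisets = list(combinations_with_replacement(balls, 2))
print(len(multisets))  # C(2+3-1, 2) = C(4,2) = 6
print(multisets)
```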

Example: A computer has 6 processors. It is regularly used to run jobs of 4 kinds. It always runs 6 jobs at a time, so that all the processors will be used, but there won't be any processor contention between jobs. The performance of the computer may depend on what kinds of jobs it is running (eg, it may go slowly if two jobs that both access the disk a lot are running). We're therefore interested in how many possible job mixes there are, since we may need to evaluate performance for each job mix.

We can treat this as a problem where we draw k=6 balls (jobs) with replacement from an urn with n=4 balls (kinds of jobs), and we care only about the numbers of jobs of each kind (there's no order to jobs). The answer is therefore C(6+4-1,4-1)=C(9,3)=84 possible job mixes.
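The job-mix count can be cross-checked the same way; the job-kind labels below are hypothetical placeholders:

```python
from itertools import combinations_with_replacement
from math import comb

# k = 6 jobs drawn with replacement from n = 4 kinds of jobs, order irrelevant.
kinds = ["A", "B", "C", "D"]  # hypothetical labels for the four job kinds
mixes = list(combinations_with_replacement(kinds, 6))
print(len(mixes))  # C(6+4-1, 3) = C(9,3) = 84
```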

A famous probability problem is to find how likely it is that, at a
party with *n* people, at least two people have the same
birthday. I'll talk about the equivalent problem of finding the
probability that hashing *n* keys to 32-bit hash codes will
produce two or more keys with the same hash code.

The hash function takes a string of characters and outputs a 32-bit hash code. For a good hash function, we can model this as a hash code being selected randomly for each key, with all possible combinations of hash codes for keys being equally likely.

Let *A* be the event that two or more of the *n* keys have the same hash code. *A ^{c}* is the event that there is no such collision. We'll find P(*A ^{c}*), from which P(*A*) = 1 - P(*A ^{c}*).

Since we're assuming equally-likely outcomes, P(*A ^{c}*) = #(*A ^{c}*) / #(*S*). Each of the *n* keys can get any of the 2^{32} hash codes, so #(*S*) = (2^{32})^{n}. The number of outcomes with no collision is

#(*A ^{c}*) = 2^{32}(2^{32}-1)(2^{32}-2)...(2^{32}-n+1)

which is the same as the number of ways of drawing *n* balls without replacement, with ordering, from an urn with 2^{32} balls. Hence

P(*A*) = 1 - 2^{32}(2^{32}-1)(2^{32}-2)...(2^{32}-n+1) / (2^{32})^{n}

By numerical calculation, I found that P(*A*) reaches 1% when
*n* is 9292. So even with over 4 billion hash codes (2^{32}),
you can't expect to avoid collisions even with a fairly small number
of keys.
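A numerical calculation along these lines is easy to reproduce; a sketch in Python (function name is mine), computing the product form of P(*A ^{c}*) term by term to avoid huge integers:

```python
def collision_prob(n, codes=2**32):
    """Probability that n independent uniform hash codes are not all distinct."""
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (codes - i) / codes  # i codes already taken
    return 1.0 - p_distinct

print(collision_prob(9292))  # roughly 0.01, matching the value in the text
```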

We may find it easiest to first set up a probability model (eg, with
equally likely outcomes) that assumes we know nothing but the basic
setup, and then look at *conditional probabilities* that account
for what else we know.

**Definition:**

If event A has non-zero probability, the conditional probability of event B given event A, written P(B|A), is defined to be P(A ∩ B)/P(A).

The intended interpretation is that P(*B*|*A*) is how
likely *B* is to occur (or has occurred) if we know that *A*
has occurred (and don't know anything else relevant).

**Example:**

We flip a coin three times. What is the probability that the first flip is a head (event B), given that two of the three flips are heads (event A)?

The sample space is as follows, with the subsets for the events marked:

|       | HHH | HHT | HTH | HTT | THH | THT | TTH | TTT |
|-------|-----|-----|-----|-----|-----|-----|-----|-----|
| A     |     |  x  |  x  |     |  x  |     |     |     |
| B     |  x  |  x  |  x  |  x  |     |     |     |     |
| A ∩ B |     |  x  |  x  |     |     |     |     |     |

Assuming equally likely outcomes, we see that P(A ∩ B)=2/8 and P(A)=3/8, so P(B|A)=(2/8)/(3/8)=2/3.
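This conditional probability can be checked by enumerating the eight outcomes; a minimal Python sketch:

```python
from itertools import product

# The 8 equally likely outcomes of three coin flips.
outcomes = ["".join(p) for p in product("HT", repeat=3)]
A = [o for o in outcomes if o.count("H") == 2]  # exactly two heads
B = [o for o in outcomes if o[0] == "H"]        # first flip is a head
A_and_B = [o for o in A if o in B]

# P(B|A) = P(A and B) / P(A) = (2/8) / (3/8) = 2/3
print(len(A_and_B) / len(A))
```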

Conditional probabilities for events all conditional on the same
event, *B*, obey the axioms of probability. That is,

- For any event *A*, P(*A*|*B*) >= 0.
- If *S* is the sample space, P(*S*|*B*) = 1.
- For any disjoint events *A*_{1}, *A*_{2}, *A*_{3}, ...,
  P(*A*_{1} U *A*_{2} U *A*_{3} U ... | *B*) = P(*A*_{1}|*B*) + P(*A*_{2}|*B*) + P(*A*_{3}|*B*) + ...

**The multiplication rule:**

If P(A)>0, then P(A ∩ B) = P(A) P(B|A), and if P(B)>0, then P(A ∩ B) = P(B) P(A|B).

More generally, P(A_{1} ∩ A_{2} ∩ A_{3} ∩ ...) = P(A_{1}) P(A_{2}|A_{1}) P(A_{3}|A_{1} ∩ A_{2}) ...

These are immediate consequences of the definition of conditional probability (just substitute the definitions above and cancel factors).

**The law of total probability:**

If *B*_{1}, *B*_{2}, *B*_{3}, ... are disjoint events whose union is the whole sample space, then for any event *A*,

P(A) = P(A ∩ B_{1}) + P(A ∩ B_{2}) + P(A ∩ B_{3}) + ...

Applying the multiplication rule, one can also write this as

P(A) = P(A|B_{1}) P(B_{1}) + P(A|B_{2}) P(B_{2}) + P(A|B_{3}) P(B_{3}) + ...

This can be proved using the fact that the events (A ∩ B_{i}) are disjoint and their union is A.

**Example:** Suppose we roll a die and then flip a coin
as many times as the number showing on the die. What is the
probability of getting exactly one head? We can answer this
using the law of total probability, with *B*_{i} being the event that the die shows *i*, so that P(*B*_{i}) = 1/6. Given *B*_{i}, the probability of exactly one head in *i* flips is C(i,1)(1/2)^{i} = i/2^{i}. Hence the probability of exactly one head is (1/6)(1/2 + 2/4 + 3/8 + 4/16 + 5/32 + 6/64) = 0.3125.
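The total-probability sum for this example is easy to evaluate; a sketch in Python, conditioning on the die showing i (probability 1/6 each):

```python
from math import comb

# Given i flips, P(exactly one head) = C(i,1) * (1/2)^i = i / 2^i.
# Law of total probability: sum over the six possible die results.
p_one_head = sum((1 / 6) * comb(i, 1) * 0.5**i for i in range(1, 7))
print(p_one_head)  # 5/16 = 0.3125
```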

**Definition:**

Events A and B are said to be *independent* if

P(A ∩ B) = P(A) P(B)

If P(A)>0, this is equivalent to P(B)=P(B|A), and if P(B)>0, it is equivalent to P(A)=P(A|B).
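A concrete check of the definition, using two coin flips (my own choice of events):

```python
from itertools import product

# Sample space for two coin flips, equally likely outcomes.
outcomes = ["".join(p) for p in product("HT", repeat=2)]
A = {o for o in outcomes if o[0] == "H"}  # first flip is a head
B = {o for o in outcomes if o[1] == "H"}  # second flip is a head

def prob(event):
    return len(event) / len(outcomes)

# P(A intersect B) = 1/4 = P(A) P(B), so A and B are independent.
print(prob(A & B), prob(A) * prob(B))
```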

NOTE: ``independent'' is **not** the same as ``disjoint'' or ``mutually
exclusive'' - in fact events that are disjoint can't possibly be
independent (unless one has probability zero).

**Theorem:** If *A* and *B* are independent, then
*A* and *B ^{c}* are also independent (and hence also