Note that there was no lecture in week 9.

This is a family of distributions with one parameter, *b*. The
range for an exponential distribution is the positive real
numbers. If *X* has the exp(*b*) distribution, then the
probability density function (PDF) for *X* is

f_{X}(x) = b exp(-bx), for x > 0

We can find the cumulative distribution function (CDF) for *X*
as follows:

F_{X}(x) = INTEGRAL(0 to x) b exp(-bt) dt

= [ -exp(-bt) ]_{0}^{x}

= (-exp(-bx)) - (-1)

= 1 - exp(-bx)

We can use the CDF to find the probability that X falls in an interval:

P(X in (a,b]) = P(X in [a,b)) = P(X in [a,b]) = P(X in (a,b)) = F_{X}(b) - F_{X}(a)

Whether the end-points are included in the interval doesn't matter, since the probability that X takes on any single exact value is zero.
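As a quick check (a Python sketch; the rate b = 0.5 and the interval (1,2] are arbitrary choices, not from the notes):

```python
import math

def exp_cdf(x, b):
    """CDF of the exp(b) distribution: F(x) = 1 - exp(-b*x) for x > 0."""
    return 1.0 - math.exp(-b * x) if x > 0 else 0.0

# P(X in (1, 2]) for X ~ exp(0.5), computed as F(2) - F(1)
b = 0.5
p = exp_cdf(2, b) - exp_cdf(1, b)
print(round(p, 4))   # → 0.2387
```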

One common use of the family of exponential distributions is when modelling how long you have to wait for some event - eg, the time until the next failure of a computer system. An exponential distribution is an appropriate model if the event has an equal probability of occurring in each tiny time interval, and whether it occurs in one such interval is independent of whether it occurs in another such interval.

We can derive this by considering the geometric distribution for
how many small intervals of time will pass until the event occurs.
Suppose that the length of such a small interval is *d*, and
assume that the probability of the event occurring in one such small
interval is *bd* for some constant *b*. (This obviously
can't be true when *d* is large, since it might give a
probability greater than one, but it can be approximately true for
small *d*.) The distribution of the number of intervals,
*N*, before the event occurs will then be geometric(*bd*).
Denoting the time that the event occurs (first) as the random variable
*X*, we see that the probability that the event occurs at or
before time *x* will equal the probability that *x*/*d*
intervals pass before the event occurs. So

P(X <= x) = P(N <= x/d) = 1 - P(N > x/d) = 1 - (1 - bd)^{x/d}

In the limit as d goes to zero, (1 - bd)^{x/d} approaches exp(-bx), so P(X <= x) = 1 - exp(-bx), which is the CDF of the exp(b) distribution.
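This limit can be checked numerically (Python; b = 2 and x = 1.5 are arbitrary choices):

```python
import math

b, x = 2.0, 1.5
for d in [0.1, 0.01, 0.001]:
    geometric_tail = (1 - b * d) ** (x / d)   # P(N > x/d) for geometric(bd)
    # As d shrinks, this approaches the exponential tail exp(-b*x)
    print(d, round(geometric_tail, 5), round(math.exp(-b * x), 5))
```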

The expectation of a continuous random variable X is

E(X) = INTEGRAL(range of X) x f_{X}(x) dx

Example: If X has the exp(b) distribution, then

E(X) = INTEGRAL(0 to infinity) x b exp(-bx) dx

= [ -(x+1/b) exp(-bx) ]_{0}^{infinity}

= 1/b

Some people parameterize exponential distributions in terms of this mean, so you have to be careful to check what someone means by an exp(2) distribution (it might be exp(1/2) according to the convention used here).

One can also show that the variance of an exponential(*b*)
distribution, E((X-1/*b*)^{2}), is 1/*b*^{2}.
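Both moments are easy to check by simulation (a Python sketch; b = 2 is an arbitrary choice):

```python
import random

random.seed(0)
b = 2.0
samples = [random.expovariate(b) for _ in range(200_000)]

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 2), round(var, 2))   # should be near 1/b = 0.5 and 1/b^2 = 0.25
```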

Suppose that *Y* = *aX*, for some positive constant *a*.
What is the CDF for *Y*, the function F_{Y}(*y*),
in terms of the CDF for *X*, the function
F_{X}(*x*)?

F_{Y}(y) = P(Y <= y) = P(aX <= y) = P(X <= y/a) = F_{X}(y/a)

Similarly, what is the PDF for *Y*, the function
f_{Y}(*y*), in terms of the PDF for *X*, the
function f_{X}(*x*)? We can differentiate the
CDF to find that

f_{Y}(y) = F'_{Y}(y) = F'_{X}(y/a) / a = f_{X}(y/a) / a
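A simulation check of this scaling rule (Python; the values of b, a, and y are arbitrary), using X ~ exp(b) so that F_{X} has a known closed form:

```python
import math, random

random.seed(1)
b, a, y = 1.0, 3.0, 2.0
xs = [random.expovariate(b) for _ in range(100_000)]

# Empirical P(Y <= y) for Y = aX, versus the formula F_X(y/a)
empirical = sum(1 for x in xs if a * x <= y) / len(xs)
formula = 1 - math.exp(-b * (y / a))
print(round(empirical, 3), round(formula, 3))
```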

We can use these results to see how we can get all the
exp(*b*) distributions by rescaling a random variable with the
exp(1) distribution. If *X* ~ exp(1), then *Y* =
*X*/*b* will have density function

f_{Y}(y) = f_{X}(y/(1/b)) / (1/b) = b exp(-by)

which is the PDF for the exp(b) distribution.

More generally, suppose that *Y* = *g*(*X*) for some
monotonically increasing and differentiable function *g*, with
inverse *g*^{-1}. Then

F_{Y}(y) = P(Y <= y) = P(g(X) <= y) = P(X <= g^{-1}(y)) = F_{X}(g^{-1}(y))

and from this

f_{Y}(y) = F'_{Y}(y) = F'_{X}(g^{-1}(y)) / g'(g^{-1}(y)) = f_{X}(g^{-1}(y)) / g'(g^{-1}(y))
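As a check of this formula, here is a Python sketch using the (arbitrarily chosen) transformation g(x) = x^2 applied to X ~ exp(1), comparing the formula to a numerical derivative of F_{Y}:

```python
import math

# X ~ exp(1), Y = g(X) with g(x) = x**2, so g^{-1}(y) = sqrt(y) and g'(x) = 2x
def f_X(x): return math.exp(-x)
def F_X(x): return 1 - math.exp(-x)

y = 2.0
density = f_X(math.sqrt(y)) / (2 * math.sqrt(y))   # f_Y(y) from the formula above

# Numerical derivative of F_Y(y) = F_X(g^{-1}(y)) = F_X(sqrt(y))
h = 1e-6
numeric = (F_X(math.sqrt(y + h)) - F_X(math.sqrt(y - h))) / (2 * h)
print(round(density, 6), round(numeric, 6))   # should agree
```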

The family of "normal" or "Gaussian" distributions has parameters
*m* (which will turn out to be the mean) and *s* (which will
turn out to be the standard deviation). A distribution in this family
is usually denoted as N(*m*,*s*^{2}) - note that the
second parameter is usually the square of *s* (ie, the variance).

If *X* ~ N(*m*,*s*^{2}), its density
function (over the whole range of real numbers) is

f_{X}(x) = [1/(Cs)] exp(-(1/2)((x-m)/s)^{2})

where C = sqrt(2 pi).

One can get all these density functions as the densities for linear
transformations of a random variable *Z* with the "standard
normal" distribution, N(0,1), for which the PDF is

f_{Z}(z) = [1/C] exp(-z^{2}/2)

Then X = m + sZ has the N(m,s^{2}) distribution, and by the rescaling result above, f_{X}(x) = f_{Z}((x-m)/s) / s.
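The identity f_{X}(x) = f_{Z}((x-m)/s) / s can be verified directly (Python; m = 165, s = 15, and the evaluation point are arbitrary choices):

```python
import math

C = math.sqrt(2 * math.pi)

def f_Z(z):
    """Standard normal N(0,1) PDF."""
    return math.exp(-z * z / 2) / C

def f_X(x, m, s):
    """N(m, s^2) PDF."""
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (C * s)

m, s, x = 165.0, 15.0, 180.0
print(round(f_X(x, m, s), 6), round(f_Z((x - m) / s) / s, 6))   # identical
```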

A normal distribution is appropriate for a quantity that is the sum of many small, mostly independent influences. It may not be appropriate for a quantity that is influenced by a single large factor - eg, heights of people might not be normally distributed because we know that whether someone is a man or woman has a big effect on height. But looking only at men (or only at women) perhaps height is approximately normal, since there are perhaps no other large influences. (Of course, height cannot be exactly normal, because the normal distribution has a range from -infinity to +infinity, and height can't be negative, but heights of men might nevertheless be close to normally distributed.)

**Central Limit Theorem:** Suppose X_{1}, X_{2}, ... are independent, identically-distributed random variables, all with the distribution of X, for which E(X) exists and Var(X) is finite. Define S_{n} = X_{1} + X_{2} + ... + X_{n} and T_{n} = (S_{n} - nE(X)) / sqrt(n Var(X)). Then the distribution of T_{n} approaches the N(0,1) distribution as n goes to infinity - that is, the cumulative distribution function for T_{n} approaches the cumulative distribution function for the N(0,1) distribution.
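A simulation sketch of the theorem (Python), using exponential(1) summands - for which E(X) = Var(X) = 1 - and comparing P(T_n <= 1) with n = 100 to the N(0,1) CDF at 1 (the choices of summand distribution, n, and evaluation point are all arbitrary):

```python
import math, random

random.seed(2)
n, trials = 100, 20_000

def T_n():
    """One draw of T_n = (S_n - n*E(X)) / sqrt(n*Var(X)) for X ~ exp(1)."""
    s = sum(random.expovariate(1.0) for _ in range(n))
    return (s - n) / math.sqrt(n)

empirical = sum(1 for _ in range(trials) if T_n() <= 1.0) / trials
Phi = 0.5 * (1 + math.erf(1.0 / math.sqrt(2)))   # N(0,1) CDF at 1.0
print(round(empirical, 3), round(Phi, 3))         # should be close
```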

One application of the Central Limit Theorem is to approximating the
binomial distribution. If *X* ~ binomial(*n*,*p*),
then we can view *X* as a sum of *n* independent
Bernoulli(*p*) random variables, each with expectation *p*
and variance *p*(1-*p*). The CLT then says that
(*X*/*n*-*p*)/sqrt(*p*(1-*p*)/*n*) approaches
the N(0,1) distribution as *n* goes to infinity. Informally, we
might say that *X* approaches the
N(*np*,*np*(1-*p*)) distribution.
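A numeric comparison (Python; n = 100, p = 0.3, and k = 35 are arbitrary choices). The approximation below also uses the standard continuity correction of adding 0.5 to k, which the notes don't discuss but which improves the approximation:

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def Phi(z):
    """N(0,1) CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p, k = 100, 0.3, 35
exact = binom_cdf(k, n, p)
approx = Phi((k + 0.5 - n * p) / math.sqrt(n * p * (1 - p)))
print(round(exact, 3), round(approx, 3))   # should be close
```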

Some distributions are neither discrete nor continuous, but rather a mixture of the two. Such distributions can still be described by their cumulative distribution function.

**Example:** A bus arrives regularly at a particular stop every
30 minutes, and waits at the stop for 5 minutes before leaving.
Suppose you arrive at the stop at a time that is uniformly distributed
over the day. What is the distribution of the time, *T*, you
have to wait until you can get on the bus?

The range of *T* is 0 to 25 minutes. The cumulative
distribution function for *T* is F_{T}(*t*) = 0 for
*t* < 0, F_{T}(0) = 1/6,
F_{T}(*t*) = (1/6)+(*t*/30) for 0 < *t*
< 25, and F_{T}(*t*) = 1 for *t* >= 25.
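The bus example can be checked by simulation (a Python sketch of the model described above):

```python
import random

random.seed(3)
trials = 100_000

def wait():
    """Arrival time within a 30-minute cycle; the bus waits at the stop for minutes 0-5."""
    u = random.uniform(0, 30)
    return 0.0 if u <= 5 else 30 - u   # board immediately, or wait for the next bus

waits = [wait() for _ in range(trials)]
p_zero = sum(1 for w in waits if w == 0.0) / trials   # F_T(0), should be near 1/6
p_ten = sum(1 for w in waits if w <= 10) / trials     # F_T(10) = 1/6 + 10/30 = 0.5
print(round(p_zero, 3), round(p_ten, 3))
```

Note the point mass at T = 0: the jump in F_{T} there is what makes this distribution neither discrete nor continuous.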

Another way that discrete and continuous random variables are combined is
when
we have a joint distribution for a discrete variable and a continuous variable.
Suppose *X* is discrete (say a range of { 1, 2, 3 }) and *Y* is
continuous (say a range of (0,1)). We need to specify joint probabilities
like

P(X=x, Y in (a,b))

We can use the multiplication rule to write this as

P(X=x) P(Y in (a,b) | X=x)

The first factor above can be specified as usual with a probability mass function for X. The second factor can be specified with a conditional probability density function for Y given that X=x.

One important use for having a discrete random variable together
with a continuous random variable is to specify a *mixture distribution*.

As an example, consider modelling the heights (in cm) of adults.
We know that height, *H*, depends a lot on sex, so we might introduce a
random variable *M* that is 0 for females and 1 for males. We
might assume that P(*M*=0) = P(*M*=1) = 1/2. We might also
model the heights of males as normal with mean 175 and standard
deviation 17, and the heights of females as normal with mean 155
and standard deviation 15. (Normal distributions can't be exactly right,
since they give a small positive probability to negative heights,
but they may be close enough to be useful.)

This mixture model might be better than assuming that the
distribution of *H* for all adults is normal. Even if we don't know
which people are male and which are female, we might model
height in this way. *M* would then be a "latent variable",
which is not observed, but which helps model the distribution of
what is observed.

We can also write the probability density function for a
mixture model without mentioning any latent variable. If *Y*
is modelled by a mixture of distributions with density functions
*f*_{0,Y} and *f*_{1,Y},
with probabilities *p*_{0} and *p*_{1}
= 1 - *p*_{0}, then the density function for *Y*
can be written as

f_{Y}(y) = p_{0} f_{0,Y}(y) + p_{1} f_{1,Y}(y)

There is a corresponding formula for the cumulative distribution function:

F_{Y}(y) =p_{0}F_{0,Y}(y) +p_{1}F_{1,Y}(y)
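A small Python sketch of the height mixture density from the example above (p_0 = p_1 = 1/2, female ~ N(155, 15^2), male ~ N(175, 17^2)), with a crude check that it integrates to 1:

```python
import math

C = math.sqrt(2 * math.pi)

def normal_pdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (C * s)

def height_pdf(h):
    """Mixture density: p0*f_{0,Y}(h) + p1*f_{1,Y}(h) with p0 = p1 = 1/2."""
    return 0.5 * normal_pdf(h, 155, 15) + 0.5 * normal_pdf(h, 175, 17)

# Crude Riemann sum over 60..300 cm; the tails outside this range are negligible
total = sum(height_pdf(60 + 0.1 * i) * 0.1 for i in range(2400))
print(round(total, 3))   # a valid PDF integrates to 1
```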

Since E(*Y*|*X*) is a random variable, we can ask what
*its* expectation is. It turns out to be as follows:

**Theorem:** E(*Y*) = E(E(*Y*|*X*)).

**Proof:**

E(Y) = SUM(over y) y P(Y=y)

= SUM(over y) y SUM(over x) P(Y=y, X=x)

= SUM(over y) y SUM(over x) P(Y=y|X=x) P(X=x)

= SUM(over x) P(X=x) SUM(over y) y P(Y=y|X=x)

= SUM(over x) P(X=x) E(Y|X=x)

= E(E(Y|X))

As an example, consider the model for heights of adult males and females above. From our normal models for height given sex, we get that

E(H) = E(E(H|M)) = P(M=0)E(H|M=0) + P(M=1)E(H|M=1) = (1/2) 155 + (1/2) 175 = 165
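This result can be checked by simulating from the height mixture model above (a Python sketch):

```python
import random

random.seed(4)
trials = 200_000

def height():
    """Sample the latent sex M, then a height from the corresponding normal model."""
    if random.random() < 0.5:            # M = 0: female, N(155, 15^2)
        return random.gauss(155, 15)
    return random.gauss(175, 17)         # M = 1: male, N(175, 17^2)

mean_h = sum(height() for _ in range(trials)) / trials
print(round(mean_h, 1))   # should be near E(H) = 165
```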