STA 247 - Week 8 lecture summary

Uses of directed graphical models

A "mixture model" can be viewed as a directed graphical model in which one node is a discrete random variable that identifies which "kind" of item we have, and all the other variable have just this variable as their parent. So the other variables are independent given what kind of item we have. For example, if we model a patient as having one kind of disease, we might think that symptoms (eg, fever, vomiting, ...) are independent given the disease they have.

But a patient might have more than one disease! So we could generalize this to having one binary variable for each disease, saying whether the patient has it or not, and saying that symptoms are independent given the full list of what diseases a patient has. Each symptom node may have several disease nodes as parents. We need a model of how a symptom depends on which diseases a patient has (e.g., they have the symptom if any of the diseases they have causes it).
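As a concrete illustration, here is a minimal sketch in Python of sampling from such a model, assuming the simplest rule mentioned above: a symptom is present exactly when at least one disease that can cause it is present. The prevalences and the disease-to-symptom links are made-up numbers, used only for illustration.

  import random

  prevalence = {"flu": 0.10, "food_poisoning": 0.05}   # P(disease present)
  causes = {                                            # which diseases can cause which symptoms
      "fever":    ["flu"],
      "vomiting": ["flu", "food_poisoning"],
  }

  def sample_patient():
      # Each disease is an independent binary variable.
      diseases = {d: random.random() < p for d, p in prevalence.items()}
      # A symptom is present iff at least one of its parent diseases is present.
      symptoms = {s: any(diseases[d] for d in parents) for s, parents in causes.items()}
      return diseases, symptoms

  print(sample_patient())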

A Markov model is a simple directed graphical model in which the nodes are ordered, with each node pointing to the immediately following node. In such a model, a variable is conditionally independent of variables before its parent given the value of its parent. If we view the order as time, the state at one time depends directly only on the state at the immediately previous time.

In a Hidden Markov Model (HMM), this Markov model is not directly observed. Instead, we observe only variables that are linked to the states of the Markov model. Such models have been very successful in applications in speech recognition, genomics, and other fields.
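Here is a minimal sketch of sampling from a small HMM, to make the structure concrete. The two hidden states, the transition probabilities, and the emission probabilities are all made-up numbers chosen for illustration, not taken from any real application.

  import random

  transition = {"A": {"A": 0.9, "B": 0.1},   # P(next hidden state | current hidden state)
                "B": {"A": 0.2, "B": 0.8}}
  emission   = {"A": {"x": 0.7, "y": 0.3},   # P(observation | hidden state)
                "B": {"x": 0.1, "y": 0.9}}

  def pick(dist):
      # Sample one key of dist with probability given by its value.
      r, total = random.random(), 0.0
      for value, p in dist.items():
          total += p
          if r < total:
              return value
      return value

  def sample_hmm(n, start="A"):
      state, hidden, observed = start, [], []
      for _ in range(n):
          hidden.append(state)
          observed.append(pick(emission[state]))
          state = pick(transition[state])    # Markov step: depends only on the current state
      return hidden, observed

  print(sample_hmm(10))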

The geometric distributions

So far, we have looked only at random variables with a finite range. Here, we'll look at a family of distributions in which the range is infinite, but still countable.

The geometric family of distributions can be visualized as the distribution for the number of tails before the first head, if you flip a coin repeatedly, with the coin having probability p of landing heads.

Specifically, if X has the geometric(p) distribution, its range will be { 0, 1, 2, ... }, and its probability mass function will be

P(X=x) = (1-p)^x p
This is just the probability of x tails followed by one head.

Note: Sometimes the geometric(p) distribution is defined to be the total number of flips until the first head, including the final head, so its range will be { 1, 2, 3, ... }. You have to be careful to check which definition someone is using.
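Here is a quick simulation sketch (using the definition above, the number of tails before the first head) that compares empirical frequencies to the formula for P(X=x); the choice p = 0.3 and the number of repetitions are arbitrary.

  import random
  from collections import Counter

  p = 0.3

  def tails_before_first_head():
      x = 0
      while random.random() >= p:   # a tail occurs with probability 1 - p
          x += 1
      return x

  n = 100_000
  counts = Counter(tails_before_first_head() for _ in range(n))
  for x in range(5):
      print(x, counts[x] / n, (1 - p) ** x * p)   # empirical frequency vs. (1-p)^x p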

You can easily confirm that the sum of P(X=x) for all values of x in { 0, 1, 2, ... } is one, as it should be, if you remember that the sum of a^i for i = 0, 1, 2, ... is 1/(1-a) when |a| is less than one.

One can also show that the expectation of a geometric(p) random variable is (1-p)/p and its variance is (1-p)/p^2.
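These claims can be checked numerically by truncating the infinite sum at a point where the remaining terms are negligible, as in this sketch (again with the arbitrary choice p = 0.3):

  p = 0.3
  xs = range(2000)                               # truncate the infinite sum; the tail is negligible
  pmf = [(1 - p) ** x * p for x in xs]

  total = sum(pmf)
  mean = sum(x * q for x, q in zip(xs, pmf))
  var = sum((x - mean) ** 2 * q for x, q in zip(xs, pmf))

  print(total)                  # approximately 1
  print(mean, (1 - p) / p)      # approximately equal
  print(var, (1 - p) / p ** 2)  # approximately equal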

Continuous distributions

Now, we will consider random variables that take on a continuous range of values, such as all real numbers, or the real numbers between 0 and 1.

We can't specify such a distribution by giving a table for its probability mass function, not even an infinite table, since it's not possible to arrange all real numbers in some order. We'll look instead at two other ways of specifying such distributions - via a probability density function or via a cumulative distribution function.

Probability density functions

The probability density function (PDF) for X will be written as fX(x). (This notation is sometimes used for the probability mass function too, as in Kerns' online book.) Once we have a probability density function, the probability that X lies in some interval (a,b) is defined to be

P(X in (a,b)) = INTEGRAL(a to b) fX(x) dx
When we define a distribution using a probability density function, the probability of any single value is defined to be zero, so the probability that X is in the open interval (a,b) is the same as the probability that it is in the closed interval [a,b].

We define the probability of events such as X in (a,b) OR X in (c,d) so that the axioms of probability are true - in this case, P(X in (a,b) OR X in (c,d)) will be P(X in (a,b)) + P(X in (c,d)) if a < b < c < d, so that (a,b) and (c,d) are disjoint.

For the axioms of probability to hold, we also require that fX(x) is never negative, and that the integral of fX(x) over the range of X be one.
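To make this concrete, here is a small numerical sketch using the density fX(x) = 2x for 0 < x < 1 (and 0 otherwise), a made-up example that satisfies both requirements; the integrals are approximated with a simple midpoint rule.

  def f(x):
      # Density fX(x) = 2x on (0,1), zero elsewhere (illustrative choice).
      return 2 * x if 0 < x < 1 else 0.0

  def integrate(f, a, b, n=100_000):
      # Midpoint-rule approximation of INTEGRAL(a to b) f(x) dx.
      h = (b - a) / n
      return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

  print(integrate(f, 0, 1))        # total probability: approximately 1
  print(integrate(f, 0.2, 0.5))    # P(X in (0.2, 0.5)): exactly 0.5^2 - 0.2^2 = 0.21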

Cumulative distribution functions

A different way of defining a continuous distribution is to specify its cumulative distribution function (CDF). (One can specify discrete distributions this way too, but for a discrete distribution the probability mass function is usually more intuitive.) We write the CDF for X as FX(x), and define it to be

FX(x) = P(X <= x)

We can use the CDF to define the probability that X is in the interval (a,b] as

P(X in (a,b]) = FX(b) - FX(a)
If the probability of any single value is zero, this will also be the probability that X is in (a,b) or in [a,b].

If we defined the distribution of X using a probability density function, we could derive the CDF as

FX(x) = INTEGRAL(-infinity to x) fX(t) dt
Conversely, if the CDF is differentiable, we will have
fX(x) = F'X(x)

For the probabilities obtained using the CDF to satisfy the axioms of probability theory, FX(x) must approach zero as x goes to -infinity, FX(x) must go to one as x goes to +infinity, and FX(x) must be less than or equal to FX(x+d) for any d > 0.
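Continuing the made-up example with density fX(x) = 2x on (0,1), whose CDF is FX(x) = x^2 on that interval, here is a sketch checking that FX(b) - FX(a) matches the interval probability computed earlier, and that differentiating the CDF (numerically) recovers the density.

  def F(x):
      # CDF of the illustrative density fX(x) = 2x on (0,1).
      if x <= 0:
          return 0.0
      if x >= 1:
          return 1.0
      return x ** 2

  a, b = 0.2, 0.5
  print(F(b) - F(a))               # P(X in (a,b]) = 0.25 - 0.04 = 0.21, matching the integral

  # A numerical derivative of F at x = 0.3 recovers the density there:
  eps = 1e-6
  print((F(0.3 + eps) - F(0.3 - eps)) / (2 * eps))   # approximately fX(0.3) = 0.6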

Uniform distributions on an interval

A simple family of continuous distributions is the family of distributions that are uniform over some interval of the real numbers. We say that X has the U(a,b) distribution (where a < b) if its probability density function is
fX(x) = 1/(b-a) if a < x < b; 0 otherwise
The corresponding cumulative distribution function is
FX(x) = 0 if x < a; 1 if x > b; (x-a)/(b-a) otherwise
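As a small sketch, here are the U(a,b) density and CDF written as Python functions for the arbitrary choice a = 2, b = 5, checked against simulated draws.

  import random

  a, b = 2.0, 5.0

  def pdf(x):
      return 1 / (b - a) if a < x < b else 0.0

  def cdf(x):
      if x < a:
          return 0.0
      if x > b:
          return 1.0
      return (x - a) / (b - a)

  n = 100_000
  samples = [random.uniform(a, b) for _ in range(n)]
  print(sum(x <= 3.5 for x in samples) / n, cdf(3.5))   # empirical vs. (3.5 - 2)/(5 - 2) = 0.5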