A "mixture model" can be viewed as a directed graphical model in which
one node is a discrete random variable that identifies which "kind"
of item we have, and all the other variables have just this variable as
their parent. So the other variables are independent given what kind
of item we have. For example, if we model a patient as having one
kind of disease, we might think that symptoms (eg, fever, vomiting, ...)
are independent *given* the disease they have.

But a patient might have more than one disease! So we could
generalize this to having one binary variable for each disease, saying
whether the patient has it or not, and saying that symptoms are independent
given the full list of what diseases a patient has. Each symptom
node may have several disease nodes as parents. We need a model
of how a symptom depends on which diseases a patient has (eg, they
have the symptom if *any* of the diseases they have causes it).
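One common way to model "the symptom appears if *any* of the diseases the patient has causes it" is sometimes called a noisy-OR model. Here is a minimal sketch in Python; the disease names and cause probabilities are invented for illustration, not taken from any real data.

```python
# Sketch of a noisy-OR model for one symptom with several possible
# disease causes.  The disease names and probabilities are made up
# for illustration.

def p_symptom(diseases_present, cause_probs):
    """Probability the symptom occurs: it appears if *any* present
    disease triggers it, each independently with its own probability."""
    p_no_cause_fires = 1.0
    for d in diseases_present:
        p_no_cause_fires *= 1.0 - cause_probs[d]
    return 1.0 - p_no_cause_fires

# Hypothetical example: fever caused by flu (prob 0.9) or malaria (prob 0.95).
cause_probs = {"flu": 0.9, "malaria": 0.95}
print(p_symptom({"flu"}, cause_probs))             # approx 0.9
print(p_symptom({"flu", "malaria"}, cause_probs))  # 1 - 0.1*0.05 = approx 0.995
```

Note that with two diseases present, the symptom is more likely than with either alone, since each disease gets an independent chance to cause it.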

A Markov model is a simple directed graphical model in which the nodes are ordered, with each node pointing to the immediately following node. In such a model, a variable is conditionally independent of variables before its parent given the value of its parent. If we view the order as time, the state at one time depends directly only on the state at the immediately previous time.

In a Hidden Markov Model (HMM), this Markov model is not directly observed. Instead we observe only variables that are linked to the states of the Markov model. Such models have been very successful in applications in speech recognition, genomics, and other fields.
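The structure of an HMM can be made concrete by sampling from one. This sketch uses a tiny two-state model whose states, transition probabilities, and emission probabilities are all invented for illustration: the hidden state follows a Markov chain, and at each time we observe only a symbol emitted from the current state.

```python
import random

# Sketch: sample a sequence from a tiny two-state HMM.  All the
# probabilities below are invented for illustration.

states = ["A", "B"]
symbols = ["x", "y"]
trans = {"A": {"A": 0.9, "B": 0.1},   # P(next state | current state)
         "B": {"A": 0.2, "B": 0.8}}
emit = {"A": {"x": 0.7, "y": 0.3},    # P(observed symbol | state)
        "B": {"x": 0.1, "y": 0.9}}

def pick(dist, rng):
    """Draw a key from a {value: probability} dictionary."""
    r = rng.random()
    total = 0.0
    for k, p in dist.items():
        total += p
        if r < total:
            return k
    return k  # guard against floating-point rounding

def sample_hmm(n, rng, start="A"):
    state = start
    hidden, observed = [], []
    for _ in range(n):
        hidden.append(state)
        observed.append(pick(emit[state], rng))  # what we actually see
        state = pick(trans[state], rng)          # hidden Markov step
    return hidden, observed

rng = random.Random(1)
hidden, observed = sample_hmm(10, rng)
```

In applications we are given only `observed` and must infer things about `hidden`, which is what makes the Markov chain "hidden".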

So far, we have looked only at random variables with a finite range. Here, we'll look at a family of distributions in which the range is infinite, but still countable.

The geometric family of distributions can be visualized as the
distribution for the number of tails before the first head, if you
flip a coin repeatedly, with the coin having probability *p*
of landing heads.

Specifically, if *X* has the geometric(*p*) distribution,
its range will be { 0, 1, 2, ... }, and its probability mass function
will be

P(X=x) = (1-p)^{x} p

This is just the probability of getting *x* tails followed by a head.

Note: Sometimes the geometric(*p*) distribution is defined to
be the total number of flips until the first head, including the final
head, so its range will be { 1, 2, 3, ... }. You have to be careful
to check which definition someone is using.
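As a quick check on the coin-flipping picture (using the tails-before-first-head convention above), here is a small simulation sketch in Python comparing an empirical frequency with the probability mass function; the choice of *p* = 0.3 and the value *x* = 2 are arbitrary.

```python
import random

def geometric_sample(p, rng):
    """Number of tails before the first head, flipping a coin
    that lands heads with probability p."""
    x = 0
    while rng.random() >= p:   # this flip was a tail
        x += 1
    return x

p = 0.3
n = 100_000
rng = random.Random(1)
samples = [geometric_sample(p, rng) for _ in range(n)]

# Compare the empirical frequency of X = 2 with the pmf (1-p)^x * p.
freq = samples.count(2) / n
pmf = (1 - p) ** 2 * p     # = 0.147
```

With 100,000 samples, `freq` should agree with `pmf` to within about a percentage point.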

You can easily confirm that the sum of P(*X*=*x*) for all
values of *x* in { 0, 1, 2, ... } is one, as it should be,
if you remember that the sum of *a*^{i} for
*i* = 0, 1, 2, ... is 1/(1-*a*) when |*a*| is less than one.

One can also show that the expectation of a geometric(*p*)
random variable is (1-*p*)/*p* and its variance is
(1-*p*)/*p*^{2}.
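These facts can be checked numerically by truncating the infinite sum; since the terms shrink geometrically, a truncation at 1000 terms is far more than enough for *p* = 0.3 (an arbitrary choice here).

```python
p = 0.3
# Truncate the infinite sum; terms shrink geometrically, so 1000 is plenty.
pmf = [(1 - p) ** x * p for x in range(1000)]

total = sum(pmf)                                   # should be approx 1
mean = sum(x * q for x, q in enumerate(pmf))       # should be approx (1-p)/p
var = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))  # approx (1-p)/p^2
```

For *p* = 0.3, the mean is 0.7/0.3 ≈ 2.33 and the variance is 0.7/0.09 ≈ 7.78.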

Now, we will consider random variables that take on a
*continuous* range of values, such as all real numbers, or the
real numbers between 0 and 1.

We can't specify such a distribution by giving a table for its
probability mass function, not even an infinite table, since it's not
possible to arrange all real numbers in some order. We'll look instead
at two other ways of specifying such distributions - via a *probability
density function* or via a *cumulative distribution function*.

The probability density function (PDF) for *X* will be written
as *f*_{X}(*x*). (This notation is sometimes used
for the probability mass function too, as in Kerns' online book.) Once
we have a probability density function, the probability that *X*
lies in some interval (*a*,*b*) is defined to be

P(X in (a,b)) = INTEGRAL(a to b) f_{X}(x) dx

When we define a distribution using a probability density function, the
probability of any single value is defined to be zero, so the probability
that *X* is in [*a*,*b*] is the same as the probability that *X* is in
(*a*,*b*).

We define the probability of events such as *X* in
(*a*,*b*) OR *X* in (*c*,*d*) so that the
axioms of probability are true - in this case, P(*X* in
(*a*,*b*) OR *X* in (*c*,*d*)) will be
P(*X* in (*a*,*b*)) + P(*X* in
(*c*,*d*)) if *a* < *b* < *c* < *d*, so
that (*a*,*b*) and (*c*,*d*) are disjoint.

For the axioms of probability to hold, we also require that
*f*_{X}(*x*) is never negative, and that the
integral of *f*_{X}(*x*) over the range of
*X* be one.

A different way of defining a continuous distribution is to specify
its cumulative distribution function (CDF). (One can specify discrete
distributions this way too, but for discrete distributions specifying
the probability mass function is usually a more intuitive way.) We
write the CDF for *X* as *F*_{X}(*x*), and
define it to be

F_{X}(x) = P(X<= x)

We can use the CDF to define the probability that *X* is in
the interval (*a*,*b*] as

P(X in (a,b]) = F_{X}(b) - F_{X}(a)

If the probability of any single value is zero, this will also be the
probability that *X* is in [*a*,*b*], (*a*,*b*), or [*a*,*b*).

If we defined the distribution of *X* using a probability
density function, we could derive the CDF as

F_{X}(x) = INTEGRAL(-infinity to x) f_{X}(t) dt

Conversely, if the CDF is differentiable, we will have

f_{X}(x) = F'_{X}(x)
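This relationship between the PDF and CDF can be illustrated numerically. The sketch below uses an example density chosen just for illustration, f(x) = 2x on (0,1), whose CDF is F(x) = x^2; it approximates the CDF by a simple midpoint-rule integral and recovers the density by a central difference.

```python
# Numerically check the PDF/CDF relationship for an example
# density, f(x) = 2x on (0, 1), whose CDF is F(x) = x^2.

def f(x):
    return 2 * x if 0 < x < 1 else 0.0

def F_numeric(x, steps=10_000):
    """Approximate F(x) as the integral of f from -infinity to x
    (this density is zero below 0, so the integral starts at 0)."""
    h = x / steps
    return sum(f((i + 0.5) * h) * h for i in range(steps))

def f_numeric(x, eps=1e-6):
    """Approximate f(x) = F'(x) by a central difference."""
    return (F_numeric(x + eps) - F_numeric(x - eps)) / (2 * eps)

# F_numeric(0.5) should be approx 0.25, and f_numeric(0.5) approx 1.0,
# matching F(x) = x^2 and f(x) = 2x.
```

Note also that `F_numeric(1.0)` comes out to approximately one, as required for the integral of a density over the whole range of *X*.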

For the probabilities obtained using the CDF to satisfy the axioms
of probability theory, *F*_{X}(*x*) must
approach zero as *x* goes to -infinity,
*F*_{X}(*x*) must go to one as *x* goes
to +infinity, and *F*_{X}(*x*) must be less
than or equal to *F*_{X}(*x*+*d*) for
any *d* > 0.

As an example, the uniform(a,b) distribution has probability density function

f_{X}(x) = 1/(b-a) ifa<x<b; 0 otherwise

The corresponding cumulative distribution function is

F_{X}(x) = 0 ifx<a; 1 ifx>b; (x-a)/(b-a) otherwise
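These two formulas translate directly into code. The sketch below implements the uniform(a,b) PDF and CDF and uses the CDF to compute an interval probability, with arbitrary example values a = 0, b = 10.

```python
def unif_pdf(x, a, b):
    """Density of the uniform(a, b) distribution."""
    return 1.0 / (b - a) if a < x < b else 0.0

def unif_cdf(x, a, b):
    """Cumulative distribution function of the uniform(a, b) distribution."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

# P(X in (c, d]) = F(d) - F(c); e.g. for uniform(0, 10),
# P(X in (2, 5]) = 0.5 - 0.2 = 0.3
a, b = 0.0, 10.0
prob = unif_cdf(5, a, b) - unif_cdf(2, a, b)
```

The interval probability here is just the length of (2, 5] divided by the length of (0, 10), which is what "uniform" means.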