Consider last week's mixture model for the height of an adult,
*H*, in which we specified *H*|*M*=0 as
N(155,15^{2}) and *H*|*M*=1 as
N(175,17^{2}), where *M*=1 indicates male and *M*=0
indicates female, with P(*M*=1)=P(*M*=0)=1/2. Suppose we
measure that some adult's height is 180cm. How likely is it that they
are male?

We can answer this by finding the conditional probability that *M*=1
given the measured height, using Bayes' Rule. But we have a problem.
P(*M*=1|*H*=180) is a conditional probability in which the
event that we condition on, *H*=180, has zero probability. We
had previously said that such a conditional probability is undefined,
since its definition involves a division by zero.

However, our measurement doesn't actually tell us that *H*=180.
It has some finite precision, and so tells us something like that
*H* is in the interval (179.9,180.1). So what we actually need
to find is

P(M=1 | H in (179.9,180.1)) = P(M=1) P(H in (179.9,180.1) | M=1) / P(H in (179.9,180.1))

which is well defined. And we could actually compute it, using integrals over the normal probability density function.

However, when the precision of a measurement is high compared to the standard deviation of the quantity measured, we will find that these integrals are over intervals where the probability density is almost constant. So the integral is approximately equal to the probability density at the centre of the interval times the width of the interval. For example,

P(H in (179.9,180.1) | M=1) is approximately 0.2 f_{1,H}(180)

where f_{1,H} is the probability density function of H given M=1.
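As a quick numerical check of this approximation, the sketch below (using only the standard library; `normal_pdf` and `normal_cdf` are helper names of my own) compares the exact interval probability, computed via the error function, with the width-times-density approximation:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Exact P(H in (179.9, 180.1) | M=1), with H | M=1 ~ N(175, 17^2)
exact = normal_cdf(180.1, 175, 17) - normal_cdf(179.9, 175, 17)

# Approximation: width of the interval times the density at its centre
approx = 0.2 * normal_pdf(180, 175, 17)

print(exact, approx)  # the two agree to several decimal places
```

The density barely changes over an interval of width 0.2 when the standard deviation is 17, so the midpoint approximation is extremely accurate here.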

When we substitute this into Bayes' Rule, we find that the width
of the interval (0.2 in this example) cancels out. We then have
something that looks just like Bayes' Rule, but with probability
densities for *H* instead of probabilities.

We can often get away with this trick, treating probability densities almost like probabilities. Note, however, that there certainly are differences - for example, probabilities can't be greater than one, but probability densities can be greater than one.
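To make the trick concrete, here is a small sketch (helper name `normal_pdf` is my own) that computes P(M=1 | H ≈ 180) for the height model, using densities in place of probabilities, since the interval width cancels:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

h = 180.0                       # measured height in cm
f1 = normal_pdf(h, 175, 17)     # density of H at 180 given M=1 (male)
f0 = normal_pdf(h, 155, 15)     # density of H at 180 given M=0 (female)

# Bayes' Rule with densities; the interval width has cancelled out
posterior_male = 0.5 * f1 / (0.5 * f1 + 0.5 * f0)
print(posterior_male)  # roughly 0.77
```

So a measured height of 180 cm makes "male" somewhat more probable, but far from certain, reflecting the overlap of the two conditional distributions.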

In a Markov model, we have a sequence of random variables, each of which depends directly only on its predecessor:

X_{0} ---> X_{1} ---> X_{2} ---> X_{3} ---> ...

**Example applications:** Markov models arise in many application
areas. Some examples:

- *X*_{t} could be the length of a queue (eg, of processes waiting to run on a processor) at time *t*.
- *X*_{t} could be the language (0=English, 1=French) of the *t*th word in some document, or document collection.
- *X*_{t} could be 1 or 0 according to whether or not the *t*th bit in a transmission was corrupted by noise.
- *X*_{t} could be the type of the *t*th web site visited by a user (eg, 1=news, 2=shopping, 3=photo, etc.).

Let's suppose that the random variables making up a Markov chain
have some finite range, such as { 1, 2, ..., K }. To specify the
joint distribution of all the *X*_{t}, we will
need to specify

- The *initial probabilities* for the state - in other words, P(X_{0}=x) for all x in the range of X_{0}.
- The *transition probabilities* for moving from a state at time t to a state at time t+1 - in other words, P(X_{t+1}=x'|X_{t}=x) for all x and x'.

If the transition probabilities are the same for all times t, the Markov chain is called *homogeneous*.

For a homogeneous Markov chain, we will write

P_{0}(x) for P(X_{0}=x).

P^{(1)}(x-->x') for P(X_{t+1}=x'|X_{t}=x).

Suppose we want to find P_{n}(*x*) =
P(*X*_{n} = *x*).

We know P_{0}(x). We can find P_{1}(*x*) as follows:

P_{1}(x_{1}) = P(X_{1}=x_{1})

= SUM(over x_{0}) P(X_{1}=x_{1}, X_{0}=x_{0})

= SUM(over x_{0}) P(X_{1}=x_{1} | X_{0}=x_{0}) P(X_{0}=x_{0})

= SUM(over x_{0}) P^{(1)}(x_{0}-->x_{1}) P_{0}(x_{0})
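This one-step computation is easy to sketch in code. The two-state chain and its probabilities below are invented purely for illustration:

```python
# Hypothetical two-state chain, states 0 and 1
P0 = [0.5, 0.5]        # initial probabilities P_0(x)
T = [[0.9, 0.1],       # T[x][y] = P^(1)(x --> y), each row sums to 1
     [0.2, 0.8]]

K = len(P0)

# P_1(x1) = sum over x0 of P^(1)(x0 --> x1) * P_0(x0)
P1 = [sum(T[x0][x1] * P0[x0] for x0 in range(K)) for x1 in range(K)]
print(P1)
```

Each entry of P_1 is a weighted average of a column of transition probabilities, weighted by the initial probabilities, so P_1 is again a proper distribution summing to one.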

Similarly, we could find P_{4}(x) as

P_{4}(x_{4}) = P(X_{4}=x_{4})

= SUM(over x_{0},x_{1},x_{2},x_{3}) P_{0}(x_{0}) P^{(1)}(x_{0}-->x_{1}) P^{(1)}(x_{1}-->x_{2}) P^{(1)}(x_{2}-->x_{3}) P^{(1)}(x_{3}-->x_{4})

But the summation here is over all K^{4} combinations of x_{0},x_{1},x_{2},x_{3}, and in general computing P_{n} this way would require summing over K^{n} combinations.

Fortunately, we can instead proceed sequentially, computing
P_{1}, P_{2}, P_{3}, ... in turn (of course,
we know P_{0} before we start). At each stage, we build a
table of values for P_{n}, which we can use when
computing the next table. To compute P_{n} when
we already have a table of values for P_{n-1}, we
just need to write it as follows:

P_{n}(x_{n}) = SUM(over x_{0}, ..., x_{n-1}) P(X_{n}=x_{n} | X_{0}=x_{0}, ..., X_{n-1}=x_{n-1}) P(X_{0}=x_{0}, ..., X_{n-1}=x_{n-1})

= SUM(over x_{0}, ..., x_{n-1}) P(X_{n}=x_{n} | X_{n-1}=x_{n-1}) P(X_{0}=x_{0}, ..., X_{n-1}=x_{n-1})

= SUM(over x_{n-1}) P(X_{n}=x_{n} | X_{n-1}=x_{n-1}) SUM(over x_{0}, ..., x_{n-2}) P(X_{0}=x_{0}, ..., X_{n-1}=x_{n-1})

= SUM(over x_{n-1}) P(X_{n}=x_{n} | X_{n-1}=x_{n-1}) P(X_{n-1}=x_{n-1})

= SUM(over x_{n-1}) P^{(1)}(x_{n-1}-->x_{n}) P_{n-1}(x_{n-1})

Note how the Markov property is crucial here, in simplifying P(X_{n}=x_{n} | X_{0}=x_{0}, ..., X_{n-1}=x_{n-1}) to P(X_{n}=x_{n} | X_{n-1}=x_{n-1}) in the second line.
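The sequential scheme is straightforward to implement. The sketch below (chain and numbers invented for illustration) also checks it against the brute-force sum over all K^n paths for n = 4:

```python
from itertools import product

# Hypothetical two-state chain, states 0 and 1
P0 = [0.5, 0.5]        # initial probabilities P_0(x)
T = [[0.9, 0.1],       # T[x][y] = P^(1)(x --> y), each row sums to 1
     [0.2, 0.8]]
K = len(P0)

def marginal(n):
    """Compute the table P_n by the sequential recursion,
    building P_1, P_2, ..., P_n in turn from P_0."""
    P = P0
    for _ in range(n):
        P = [sum(T[x][y] * P[x] for x in range(K)) for y in range(K)]
    return P

def marginal_brute(n, xn):
    """P(X_n = xn) by summing over all K^n paths x_0, ..., x_{n-1}."""
    total = 0.0
    for path in product(range(K), repeat=n):
        p = P0[path[0]]
        for t in range(n - 1):
            p *= T[path[t]][path[t + 1]]
        total += p * T[path[-1]][xn]
    return total

P4 = marginal(4)
print(P4)
print([marginal_brute(4, x) for x in range(K)])  # agrees with P4
```

The recursion does n matrix-vector-style updates of cost K^2 each, i.e. O(n K^2) work, versus the O(K^n) terms of the brute-force sum, which is the whole point of proceeding sequentially.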