We can organize the computation of P_{n} from the
last lecture in the form of matrix operations.

To start, we put the marginal distribution for the state at time
*n* into a row vector, π_{n}:

π_{n} = [ P_{n}(1), P_{n}(2), ..., P_{n}(K) ]

Here, the possible values for the state are 1, 2, ..., *K*.

We can put the transition probabilities for a homogeneous Markov
chain into a *K*x*K* matrix, as follows:

    T^{(1)} =  [ P^{(1)}(1 --> 1)   P^{(1)}(1 --> 2)   ...   P^{(1)}(1 --> K) ]
               [ P^{(1)}(2 --> 1)   P^{(1)}(2 --> 2)   ...   P^{(1)}(2 --> K) ]
               [       ...               ...           ...         ...        ]
               [ P^{(1)}(K --> 1)   P^{(1)}(K --> 2)   ...   P^{(1)}(K --> K) ]

We can then find π_{n} from π_{n-1}
by a vector-matrix multiplication, as follows:

π_{n} = π_{n-1}T^{(1)}

This is just a rewriting of the formula
P_{n}(j) = Σ_{i} P_{n-1}(i) P^{(1)}(i --> j) from the last lecture.

Substituting for π_{n-1}, etc. above, we get that

π_{n}= π_{0}[ T^{(1)}]^{n}

We define T^{(n)} = [ T^{(1)} ]^{n}.
It might seem that computing T^{(n)} takes *n*-1
matrix multiplications, but actually many fewer are needed, since we
can get
T^{(2)} as T^{(1)}T^{(1)}, then
T^{(4)} as T^{(2)}T^{(2)}, then
T^{(8)} as T^{(4)}T^{(4)}, etc.
(Of course, this is faster than the obvious way only if you want
just T^{(n)} for some large *n*, but don't need
T^{(2)}, ..., T^{(n-1)}.)
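This repeated-squaring idea can be sketched as follows (the function name and the use of numpy here are my own, not from the lecture):

```python
import numpy as np

def transition_power(T1, n):
    """Compute T^(n) = [T^(1)]^n using about log2(n) matrix
    multiplications, by repeated squaring."""
    result = np.eye(T1.shape[0])   # T^(0) is the identity matrix
    square = T1.astype(float)
    while n > 0:
        if n & 1:                  # this bit of n contributes a factor
            result = result @ square
        square = square @ square   # T^(2), T^(4), T^(8), ...
        n >>= 1
    return result

# Example: an arbitrary 2-state transition matrix
T1 = np.array([[0.9, 0.1],
               [0.2, 0.8]])
print(transition_power(T1, 8))     # same as multiplying T1 by itself 8 times
```

For n=8 this uses three squarings rather than seven multiplications; the savings grow with *n*.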

**Example:** Consider a Markov chain with two states, transition matrix

    T^{(1)} =  [ 0.99  0.01 ]
               [ 0.03  0.97 ]

and with initial state distribution given by π_{0}.

The 4-step transition matrix is

    T^{(4)} =  [ 0.9623  0.0377 ]
               [ 0.1130  0.8870 ]

and the 128-step transition matrix is

    T^{(128)} =  [ 0.7513  0.2487 ]
                 [ 0.7460  0.2540 ]

and the 256-step transition matrix is

    T^{(256)} =  [ 0.7500072  0.2499928 ]
                 [ 0.7499783  0.2500217 ]

The state distribution after 256 transitions is π_{256} = π_{0}T^{(256)}.
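These matrices can be checked with a few lines of numpy (a sketch; `np.linalg.matrix_power` is used here in place of explicit repeated squaring):

```python
import numpy as np

T1 = np.array([[0.99, 0.01],
               [0.03, 0.97]])

T4 = np.linalg.matrix_power(T1, 4)      # 4-step transition matrix
T128 = np.linalg.matrix_power(T1, 128)  # 128-step transition matrix
T256 = np.linalg.matrix_power(T1, 256)  # 256-step: rows near [0.75, 0.25]

print(np.round(T4, 4))
print(np.round(T128, 4))
print(np.round(T256, 7))
```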

We see that the *n*-step transition matrix, and the state distribution
at time *n*, are reaching limits. The limit of the state distribution
is the "equilibrium" or "steady-state" distribution. The limit of the
*n*-step transition matrix has all rows equal to this equilibrium
distribution.

We can find this equilibrium distribution explicitly by solving the
equation πT^{(1)}=π for the equilibrium distribution
π, subject to the constraint that the elements of π must be
non-negative and sum to one. In this case we find that the equilibrium
distribution is π=[ 3/4 1/4 ].
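One way to solve πT^{(1)} = π together with the sum-to-one constraint numerically is as a least-squares problem (a sketch; this stacking trick is one of several options, not the lecture's method):

```python
import numpy as np

T1 = np.array([[0.99, 0.01],
               [0.03, 0.97]])
K = T1.shape[0]

# pi T = pi is equivalent to (T' - I) pi' = 0; append a row of ones
# to impose sum(pi) = 1, then solve in the least-squares sense.
A = np.vstack([T1.T - np.eye(K), np.ones(K)])
b = np.zeros(K + 1)
b[-1] = 1.0
pi = np.linalg.lstsq(A, b, rcond=None)[0]

print(pi)   # approximately [0.75, 0.25], i.e. [3/4, 1/4]
```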

Not all Markov chains have a unique equilibrium distribution, like the one in this example, but many do. The equilibrium distribution is important when using Markov chains to model things like the length of a queue, where it represents how likely the queue is to be of various lengths after the system has "settled down", with the initial state distribution no longer mattering. If the system settles down quickly, this may be what is of most interest.

Recall that the Central Limit Theorem implies that as *n*
increases, with *p* fixed, the binomial(*n*,*p*)
distribution approaches the N(*np*,*np*(1-*p*)) distribution.
A different limit is also of interest, as *n* increases with
*np* being fixed, say at λ. This limiting distribution
is called the Poisson distribution with parameter λ.

The range of a Poisson distribution is the non-negative integers. Unlike a binomial distribution, there is no upper limit.
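The convergence can be seen numerically by comparing binomial(*n*, λ/*n*) probabilities with the Poisson(λ) limit as *n* grows (a sketch; the particular values of λ, *x*, and *n* are illustrative):

```python
import math

lam = 3.0   # fixed value of n*p
x = 2       # point at which to compare the two pmfs

poisson = lam**x * math.exp(-lam) / math.factorial(x)

for n in [10, 100, 10000]:
    p = lam / n
    binom = math.comb(n, x) * p**x * (1 - p)**(n - x)
    print(n, binom, poisson)   # binom approaches poisson as n grows
```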

**Example:** Historically, the first application of the Poisson
distribution is said to have been as a model of the distribution of the
number of Prussian soldiers kicked to death by horses in a year. The
number of soldiers, *n*, is large, but each has only a small
probability, *p*, of being kicked to death by a horse. So the
expected number who are kicked to death is moderate. Furthermore, it
may be reasonable to assume that whether one soldier is kicked to
death is independent of whether another is kicked to death. If so,
the distribution of the number kicked to death in a year will be
binomial(*n*,*p*). But since *n* is large, we can
approximate this by the Poisson(λ) distribution, with
λ=*np*. After we do this, we don't necessarily need to
know *n* and *p*, just their product, λ.

We'll now derive the probability mass function for a random
variable *X* that has the Poisson(λ) distribution:

P(X=x) = lim(n->inf) [ n!/(x!(n-x)!) ] (λ/n)^{x} (1-λ/n)^{n-x}

= lim(n->inf) [ n(n-1)...(n-x+1) / n^{x} ] (λ^{x}/x!) [ (1-1/(n/λ))^{n/λ} ]^{λ} (1-λ/n)^{-x}

= (λ^{x}/x!) exp(-λ)

Here, the first and last factors go to one as *n* goes to infinity, while
(1-1/m)^{m} -> exp(-1) as m=n/λ goes to infinity, so the middle bracketed
factor goes to exp(-λ).

One can show that if *X* has the Poisson(λ) distribution,
then E(*X*)=λ and Var(*X*)=λ. Also, as λ
goes to infinity, the Poisson(λ) distribution approaches the
N(λ,λ) distribution.
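The mean and variance can be checked numerically by summing over the pmf (a brief sketch; the infinite sum is truncated at a cutoff where the tail is negligible):

```python
import math

lam = 4.0
pmf = lambda x: lam**x * math.exp(-lam) / math.factorial(x)

# Truncate the sum over the non-negative integers at 100 terms;
# for lambda = 4 the omitted tail is vanishingly small.
mean = sum(x * pmf(x) for x in range(100))
var = sum((x - mean)**2 * pmf(x) for x in range(100))
print(mean, var)   # both close to lambda = 4
```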

One use of Poisson distributions is in modelling how many times a device fails over a period of time. (We assume that the device recovers quickly after failure (eg, after rebooting), so it can fail again and again.) If we assume the device has a small probability of failing in any small time interval, independently from one interval to another, then the number of failures in an interval will have a Poisson distribution with parameter (the mean) that is proportional to the duration of the interval.

In detail, if for some small δ the probability of failure in
the time interval (*a*,*a*+δ) is ε, then the
distribution of the number of failures in a larger interval of
duration *D* (assumed to be a multiple of δ) will be
binomial(*D*/δ,ε), which when *D*/δ is
large will be approximately Poisson(*D*ε/δ).
We can define ρ=ε/δ to be the "failure rate" (eg,
3.2 failures per day), in which case the mean of the Poisson distribution
for the number of failures in an interval of duration *D* is *D*ρ.
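For instance, with the failure rate ρ = 3.2 failures per day mentioned above, and an interval of *D* = 2 days (a duration chosen purely for illustration), the failure count is modelled as Poisson(6.4):

```python
import math

rho = 3.2          # failure rate: failures per day (from the text)
D = 2.0            # duration of the interval in days (illustrative)
lam = rho * D      # mean of the Poisson distribution: 6.4

# Probability of exactly k failures in the interval
pmf = lambda k: lam**k * math.exp(-lam) / math.factorial(k)
print(pmf(0))      # chance of no failures at all in the two days
```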

Notice that if *a*<*b*<*c*, the number of
failures in the interval (*a*,*c*) will have the
Poisson((*c*-*a*)ρ) distribution, but it will also
be the sum of the number of failures in the interval (*a*,*b*)
and the number of failures in the interval (*b*,*c*), which
are independent, with Poisson((*b*-*a*)ρ) and
Poisson((*c*-*b*)ρ) distributions. This is an instance
of a general theorem:

**Theorem:** If *X* and *Y* are independent random
variables, with *X*~Poisson(λ_{X}) and
*Y*~Poisson(λ_{Y}), then
*Z*=*X*+*Y* will have the
Poisson(λ_{X}+λ_{Y}) distribution.
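The theorem can be checked numerically: convolving the two pmfs (summing over the ways *X*+*Y* can equal *z*) reproduces the Poisson(λ_{X}+λ_{Y}) pmf. A small sketch, with illustrative parameter values:

```python
import math

def pois_pmf(lam, x):
    return lam**x * math.exp(-lam) / math.factorial(x)

lam_x, lam_y = 1.5, 2.5   # illustrative parameters
z = 4

# P(Z=z) = sum over x of P(X=x) P(Y=z-x), by independence of X and Y
conv = sum(pois_pmf(lam_x, x) * pois_pmf(lam_y, z - x) for x in range(z + 1))
direct = pois_pmf(lam_x + lam_y, z)
print(conv, direct)       # equal up to rounding: the sum is Poisson(4)
```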