STA 247 - Week 12 lecture summary

Markov chain computations using matrices

We can organize the computation of P_n from the last lecture in the form of matrix operations.

To start, we put the marginal distribution for the state at time n into a row vector, π_n:

π_n = [ P_n(1), P_n(2), ..., P_n(K) ]

Here, the possible values for X_n are assumed to be 1, 2, ..., K. The initial state distribution can now be written as the vector π_0.

We can put the transition probabilities for a homogeneous Markov chain into a K×K matrix, as follows:

T^(1) = [ P^(1)(1-->1)  P^(1)(1-->2)  ...  P^(1)(1-->K) ]
        [ P^(1)(2-->1)  P^(1)(2-->2)  ...  P^(1)(2-->K) ]
        [      ...           ...      ...       ...     ]
        [ P^(1)(K-->1)  P^(1)(K-->2)  ...  P^(1)(K-->K) ]

We can then find π_n from π_{n-1} by a vector-matrix multiplication, as follows:

π_n = π_{n-1} T^(1)

This is just a rewriting of the formula for P_n from the previous lecture.

Substituting for π_{n-1}, etc. above, we get that

π_n = π_0 [ T^(1) ]^n

We define T^(n) = [ T^(1) ]^n. It might seem that computing T^(n) takes n-1 matrix multiplications, but actually many fewer are needed, since we can get T^(2) as T^(1)T^(1), then T^(4) as T^(2)T^(2), then T^(8) as T^(4)T^(4), etc. (Of course, this is faster than the obvious way only if you want just T^(n) for some large n, but don't need T^(2), ..., T^(n-1).)
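
As a sketch of this repeated-squaring trick in Python with numpy (the function name transition_power is mine, not from the lecture):

    import numpy as np

    def transition_power(T, n):
        """Compute T^(n) = [T^(1)]^n by repeated squaring,
        using about log2(n) matrix multiplications instead of n-1."""
        result = np.eye(T.shape[0])   # T^(0) is the identity matrix
        square = T.copy()             # holds T^(1), T^(2), T^(4), T^(8), ...
        while n > 0:
            if n % 2 == 1:            # this binary digit of n is set
                result = result @ square
            square = square @ square  # square to reach the next power of two
            n //= 2
        return result

numpy's built-in np.linalg.matrix_power(T, n) computes the same thing, so in practice you would just call that.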

Equilibrium distribution for a Markov chain

Suppose that we have a Markov chain with two states, with transition probability matrix

T^(1) = [ 0.99  0.01 ]
        [ 0.03  0.97 ]
and with initial state distribution given by π_0 = [ 0.8  0.2 ]. Here is the four-step transition matrix:

T^(4) = [ 0.9623  0.0377 ]
        [ 0.1130  0.8870 ]
and the 128-step transition matrix:

T^(128) = [ 0.7513  0.2487 ]
          [ 0.7460  0.2540 ]
and the 256-step transition matrix:

T^(256) = [ 0.7500072  0.2499928 ]
          [ 0.7499783  0.2500217 ]
The state distribution after 256 transitions is π_0 T^(256) = [ 0.7500014  0.2499986 ].
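
These numbers are easy to reproduce; a minimal check in Python with numpy (the variable names are mine):

    import numpy as np

    T = np.array([[0.99, 0.01],
                  [0.03, 0.97]])    # one-step transition matrix T^(1)
    pi0 = np.array([0.8, 0.2])      # initial state distribution pi_0

    for n in (4, 128, 256):
        print(n, np.linalg.matrix_power(T, n))   # T^(4), T^(128), T^(256)

    # state distribution after 256 transitions: about [0.7500014 0.2499986]
    print(pi0 @ np.linalg.matrix_power(T, 256))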

We see that the n-step transition matrix, and the state distribution at time n, are reaching limits. The limit of the state distribution is the "equilibrium" or "steady-state" distribution. The limit of the n-step transition matrix has all rows equal to this equilibrium distribution.

We can find this equilibrium distribution explicitly by solving the equation πT^(1) = π for the equilibrium distribution π, subject to the constraint that the elements of π must be non-negative and sum to one. In this case, πT^(1) = π gives 0.99π(1) + 0.03π(2) = π(1), i.e. π(2) = π(1)/3, and combining this with π(1) + π(2) = 1, we find that the equilibrium distribution is π = [ 3/4  1/4 ].
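
As a numerical alternative to solving by hand (a sketch assuming numpy; the variable names are mine), the equation πT^(1) = π and the sum-to-one constraint can be combined into one linear system:

    import numpy as np

    T = np.array([[0.99, 0.01],
                  [0.03, 0.97]])
    K = T.shape[0]

    # pi T = pi is equivalent to (T' - I) pi' = 0.  These K equations are
    # linearly dependent, so append the constraint sum(pi) = 1 and solve
    # the resulting overdetermined system by least squares.
    A = np.vstack([T.T - np.eye(K), np.ones(K)])
    b = np.zeros(K + 1)
    b[-1] = 1.0
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    print(pi)   # [0.75 0.25], i.e. [3/4 1/4]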

Not all Markov chains have a unique equilibrium distribution as the one in this example does, but many do. The equilibrium distribution is important when using Markov chains to model things like the length of a queue, where it represents how likely the queue is to be of various lengths after the system has "settled down", with the initial state distribution no longer mattering. If the system settles down quickly, this may be what is of most interest.

Poisson distributions

Recall that the Central Limit Theorem implies that as n increases, with p fixed, the binomial(n,p) distribution approaches the N(np,np(1-p)) distribution. A different limit is also of interest, as n increases with np being fixed, say at λ. This limiting distribution is called the Poisson distribution with parameter λ.

The range of a Poisson distribution is the non-negative integers. Unlike a binomial distribution, there is no upper limit.

Example: Historically, the first application of the Poisson distribution is said to have been as a model of the distribution of the number of Prussian soldiers kicked to death by horses in a year. The number of soldiers, n, is large, but each has only a small probability, p, of being kicked to death by a horse. So the expected number who are kicked to death is moderate. Furthermore, it may be reasonable to assume that whether one soldier is kicked to death is independent of whether another is kicked to death. If so, the distribution of the number kicked to death in a year will be binomial(n,p). But since n is large, we can approximate this by the Poisson(λ) distribution, with λ=np. After we do this, we don't necessarily need to know n and p, just their product, λ.

We'll now derive the probability mass function for a random variable X that has the Poisson(λ) distribution:

P(X=x) = lim_{n→∞} [ n!/(x!(n-x)!) ] (λ/n)^x (1-λ/n)^{n-x}

       = lim_{n→∞} [ n(n-1)···(n-x+1) / n^x ] (λ^x/x!) [ (1-1/(n/λ))^{n/λ} ]^λ (1-λ/n)^{-x}

As n goes to infinity, the limit of n(n-1)···(n-x+1)/n^x is one, the limit of (1-λ/n)^{-x} is one, and the limit of (1-1/(n/λ))^{n/λ} is 1/e. Accordingly, the probability mass function for the Poisson(λ) distribution is

P(X=x) = (λ^x/x!) exp(-λ)
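
This limit can be checked numerically; a small sketch in Python (the values of λ and x below are arbitrary choices for illustration):

    from math import comb, exp, factorial

    lam, x = 3.0, 5   # arbitrary example values

    def poisson_pmf(x, lam):
        return lam**x / factorial(x) * exp(-lam)

    for n in (10, 100, 10000):
        p = lam / n   # keep np fixed at lambda as n grows
        binom_pmf = comb(n, x) * p**x * (1 - p)**(n - x)
        print(n, binom_pmf, poisson_pmf(x, lam))
    # the binomial(n, lam/n) probabilities approach the Poisson(lam) value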

One can show that if X has the Poisson(λ) distribution, then E(X)=λ and Var(X)=λ. Also, as λ goes to infinity, the Poisson(λ) distribution approaches the N(λ,λ) distribution.
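
These facts are easy to check by simulation (a sketch; λ = 4 is an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(0)
    lam = 4.0
    x = rng.poisson(lam, size=1_000_000)
    print(x.mean(), x.var())   # both should be close to lam = 4.0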

Distribution of the number of failures over a time period

One use of Poisson distributions is in modelling how many times a device fails over a period of time. (We assume that the device recovers quickly after failure (eg, after rebooting), so it can fail again and again.) If we assume the device has a small probability of failing in any small time interval, independently from one interval to another, then the number of failures in an interval will have a Poisson distribution with a parameter (its mean) proportional to the duration of the interval.

In detail, if for some small δ the probability of failure in the time interval (a, a+δ) is ε, then the distribution of the number of failures in a larger interval of duration D (assumed to be a multiple of δ) will be binomial(D/δ, ε), which when D/δ is large will be approximately Poisson(Dε/δ). We can define ρ = ε/δ to be the "failure rate" (eg, 3.2 failures per day), in which case the mean of the Poisson distribution for the number of failures in an interval of duration D is Dρ.
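
As a sketch of this approximation (the rate, duration, and δ below are invented for illustration), we can simulate the binomial count of failures over many small intervals and compare its mean with Dρ:

    import numpy as np

    rng = np.random.default_rng(1)
    rho, D = 3.2, 2.0        # failure rate (per day) and duration (days)
    delta = 1e-4             # length of each small interval (days)
    eps = rho * delta        # failure probability in one small interval
    n = int(D / delta)       # number of small intervals in duration D

    # failures in duration D: binomial(D/delta, eps), approx Poisson(D*rho)
    fails = rng.binomial(n, eps, size=100_000)
    print(fails.mean())      # close to D * rho = 6.4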

Notice that if a<b<c, the number of failures in the interval (a,c) will have the Poisson((c-a)ρ) distribution, but it will also be the sum of the number of failures in the interval (a,b) and the number of failures in the interval (b,c), which are independent, with Poisson((b-a)ρ) and Poisson((c-b)ρ) distributions. This is an instance of a general theorem:

Theorem: If X and Y are independent random variables, with X ~ Poisson(λ_X) and Y ~ Poisson(λ_Y), then Z = X+Y will have the Poisson(λ_X + λ_Y) distribution.
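
A quick empirical check of this theorem (the parameters are arbitrary choices of mine):

    import numpy as np
    from math import exp, factorial

    rng = np.random.default_rng(2)
    lam_x, lam_y = 2.0, 5.0
    x = rng.poisson(lam_x, size=1_000_000)
    y = rng.poisson(lam_y, size=1_000_000)
    z = x + y   # should behave like Poisson(lam_x + lam_y) = Poisson(7)

    # empirical P(Z=7) versus the Poisson(7) pmf at 7 (both about 0.149)
    print(np.mean(z == 7), 7.0**7 / factorial(7) * exp(-7))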