We can estimate probabilities and expectations by *simulation*,
using pseudo-random numbers. Below is an R function for estimating
the probabilities for a binomial(*n*,*p*) distribution based
on *N* values simulated (independently) from this distribution:

sim.binom <- function (N, n, p)
{
  pr <- rep(0,n+1)
  xsq <- 0
  from2to4 <- 0
  for (i in 1:N)
  { x <- rbinom(1,n,p)
    pr[x+1] <- pr[x+1] + 1
    xsq <- xsq + x^2
    if (x>=2 && x<=4) from2to4 <- from2to4 + 1
  }
  list (pr = pr/N, xsq = xsq/N, from2to4 = from2to4/N)
}

Note that indexes for vectors start at 1 in R, so to use a vector to hold the counts of how often we have seen the values from 0 to n, the count for the value x is stored in element x+1 of pr.

Here are four runs of this function, with two different random seeds, first with 1000 and then with 100000 simulated values:

> set.seed(1); sim.binom(1000,10,0.3)
$pr
 [1] 0.022 0.127 0.236 0.265 0.197 0.101 0.037 0.013 0.001 0.001 0.000

$xsq
[1] 11.247

$from2to4
[1] 0.698

> set.seed(2); sim.binom(1000,10,0.3)
$pr
 [1] 0.039 0.119 0.231 0.258 0.190 0.108 0.047 0.007 0.001 0.000 0.000

$xsq
[1] 11.204

$from2to4
[1] 0.679

> set.seed(1); sim.binom(100000,10,0.3)
$pr
 [1] 0.02901 0.12272 0.23219 0.26567 0.19948 0.10327 0.03671 0.00954 0.00131
[10] 0.00010 0.00000

$xsq
[1] 11.0969

$from2to4
[1] 0.69734

> set.seed(2); sim.binom(100000,10,0.3)
$pr
 [1] 0.02793 0.12218 0.23177 0.26475 0.20162 0.10369 0.03713 0.00931 0.00139
[10] 0.00023 0.00000

$xsq
[1] 11.15064

$from2to4
[1] 0.69814

Notice how the results differ a bit with the two different settings of the seed, but the difference is smaller when the number of simulated values is larger.

For this example, we can compare the simulation estimates with the exact values, found below:

> round(dbinom(0:10,10,0.3),5)
 [1] 0.02825 0.12106 0.23347 0.26683 0.20012 0.10292 0.03676 0.00900 0.00145
[10] 0.00014 0.00001
> sum(dbinom(0:10,10,0.3)*(0:10)^2)
[1] 11.1
> sum(dbinom(2:4,10,0.3))
[1] 0.7004233
> pbinom(4,10,0.3) - pbinom(1,10,0.3)
[1] 0.7004233

But we can use simulation for much more complex problems, where computing the exact values would be infeasible.
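For instance, here is a sketch (with arbitrarily chosen numbers, not from any particular application) of estimating the probability that the maximum of five independent binomial(10,0.3) values is at least 6 — a quantity that in this toy case can still be checked exactly, but whose analogue in a more complicated scheme could not be:

```r
# Estimate P( max of 5 independent binomial(10,0.3) values >= 6 ) by simulation.
set.seed(1)

N <- 100000
hits <- 0
for (i in 1:N)
{ x <- rbinom(5,10,0.3)             # five independent binomial draws
  if (max(x) >= 6) hits <- hits + 1
}
est <- hits / N

# For this toy problem, an exact answer happens to be available as a check:
# P(max >= 6) = 1 - P(all five values are <= 5)
exact <- 1 - pbinom(5,10,0.3)^5
```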

Recall that we defined the *expectation* of a random variable *X*
as

E(X) = SUM(over x) x P(X=x)

The expectation is the "average" value for the random variable.
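For a distribution with known probabilities, this sum can be computed directly in R — for example, for the binomial(10,0.3) distribution used above:

```r
# E(X) for X ~ binomial(10,0.3), computed directly from the definition:
x <- 0:10
e <- sum(x * dbinom(x,10,0.3))
# e equals n*p = 10*0.3 = 3, agreeing with the known binomial mean formula
```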

We may
also be interested in how much the value varies around this average.
One way to measure this variation is by the *variance*, defined as

Var(X) = E[ (X - E(X))^{2} ]

We also define the *standard deviation*, SD(X), as the square root of the variance.

Note that for random variables with an infinite number of possible values, it is possible for the expectation to be undefined, in which case the variance will also be undefined, and even if the expectation is defined and finite, the variance might be infinite.

**Example:** We roll a six-sided die, and let *X* be the
number it shows:

E(X) = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 3.5

Var(X) = (-2.5)^{2}(1/6) + (-1.5)^{2}(1/6) + (-0.5)^{2}(1/6) + (0.5)^{2}(1/6) + (1.5)^{2}(1/6) + (2.5)^{2}(1/6) = 2.91666...

SD(X) = sqrt(2.91666...) = 1.70783
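These hand computations can be checked in R by applying the definitions directly:

```r
# Expectation, variance, and standard deviation of one fair die roll:
x <- 1:6
p <- rep(1/6,6)
m <- sum(x*p)               # expectation, 3.5
v <- sum((x-m)^2 * p)       # variance, 2.91666...
s <- sqrt(v)                # standard deviation, about 1.70783
```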

**Theorem (Markov's inequality):** If *X* is a non-negative random variable
(that is, P(*X* >= 0) = 1), then for any *a* > 0,

P(X >= a) <= E(X) / a

or equivalently,

E(X) >= a P(X >= a)

**Proof:**

E(X) = SUM(x in range of X) x P(X=x)

>= SUM(x in range of X with x >= a) x P(X=x)

>= SUM(x in range of X with x >= a) a P(X=x)

= a SUM(x in range of X with x >= a) P(X=x)

= a P(X >= a)
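A quick numerical check of this inequality, for the binomial(10,0.3) distribution and the arbitrarily chosen value a = 5:

```r
# Markov's inequality check: P(X >= a) <= E(X)/a for X ~ binomial(10,0.3).
a <- 5
ex <- sum((0:10) * dbinom(0:10,10,0.3))   # E(X) = 3
ptail <- 1 - pbinom(a-1,10,0.3)           # P(X >= 5)
# ptail (about 0.15) is indeed no more than ex/a = 0.6
```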

**Theorem (Chebyshev's inequality):** If s^{2} = Var(*X*) and m = E(*X*),
then for any *k* > 0,

P(|X - m| >= k s) <= 1 / k^{2}

or equivalently, for any a > 0,

P((X - m)^{2} >= a) <= s^{2} / a

(Just let a = k^{2} s^{2} to get the first form from the second.)

**Proof:** The second form above follows directly from Markov's
inequality, since (*X*-*m*)^{2} is non-negative, and
its expectation is *s*^{2}.

**Example:** For the roll of one die used as an example above,
Chebyshev's inequality says that

P(|X - 3.5| > 2) = P(|X - 3.5| > (2/1.70783) x 1.70783) <= 1 / (2/1.70783)^{2} = 0.72917

(Actually, P(|X - 3.5| > 2) = P(X=1) + P(X=6) = 1/3, so the bound is far from tight here.)
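The looseness of the bound in this example can be seen by computing both the exact probability and the bound in R:

```r
# Exact probability vs. Chebyshev bound for one fair die roll:
x <- 1:6
p <- rep(1/6,6)
m <- sum(x*p)
s2 <- sum((x-m)^2 * p)                # variance
exact <- sum(p[abs(x-m) > 2])         # P(|X-3.5| > 2) = P(X=1)+P(X=6) = 1/3
bound <- s2 / 2^2                     # Chebyshev bound, about 0.729
```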

**Theorem:** For any random variable *X*
and any constant *c*,

E(cX) = c E(X)

**Proof:**

E(cX) = SUM(x in range of X) c x P(X=x) = c SUM(x in range of X) x P(X=x) = c E(X)

**Theorem:** For any random variables *X* and *Y*,

E(X+Y) = E(X) + E(Y)

**Proof:**

E(X+Y) = SUM(x in range of X) SUM(y in range of Y) (x + y) P(X=x, Y=y)

= SUM(x in range of X) SUM(y in range of Y) [ x P(X=x, Y=y) + y P(X=x, Y=y) ]

= SUM(x in range of X) x SUM(y in range of Y) P(X=x, Y=y) + SUM(y in range of Y) y SUM(x in range of X) P(X=x, Y=y)

= SUM(x in range of X) x P(X=x) + SUM(y in range of Y) y P(Y=y)

= E(X) + E(Y)
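As a concrete check, we can verify this for two independent dice — one fair, one with made-up unequal probabilities — by summing over their joint distribution:

```r
# Check E(X+Y) = E(X) + E(Y) for two independent dice:
px <- rep(1/6,6)                      # a fair die
py <- c(0.1,0.1,0.2,0.2,0.2,0.2)      # a (made-up) loaded die
joint <- outer(px,py)                 # P(X=x, Y=y) under independence
lhs <- sum(outer(1:6,1:6,"+") * joint)           # E(X+Y)
rhs <- sum((1:6)*px) + sum((1:6)*py)             # E(X) + E(Y)
# lhs and rhs agree; note the theorem holds even without independence
```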

**Theorem:** For any random variables *X* and *Y* that
are independent,

E(XY) = E(X) E(Y)

**Proof:**

E(XY) = SUM(x in range of X) SUM(y in range of Y) x y P(X=x, Y=y)

= SUM(x in range of X) SUM(y in range of Y) x y P(X=x) P(Y=y)

= [ SUM(x in range of X) x P(X=x) ] [ SUM(y in range of Y) y P(Y=y) ]

= E(X) E(Y)
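The same kind of check works for this product rule, again using two independent fair dice:

```r
# Check E(XY) = E(X) E(Y) for two independent fair dice:
px <- rep(1/6,6); py <- rep(1/6,6)
joint <- outer(px,py)                            # P(X=x, Y=y)
exy <- sum(outer(1:6,1:6,"*") * joint)           # E(XY)
ex <- sum((1:6)*px); ey <- sum((1:6)*py)
# exy equals ex*ey = 3.5^2 = 12.25
```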

**Theorem:** For any random variable *X*
and any constant *c*,

Var(cX) = c^{2} Var(X)

and hence

SD(cX) = |c| SD(X)

**Proof:**

Var(cX) = E[ (cX - E(cX))^{2} ]

= E[ (cX - c E(X))^{2} ]

= E[ c^{2} (X - E(X))^{2} ]

= c^{2} E[ (X - E(X))^{2} ]

= c^{2} Var(X)

**Theorem:** For any random variables *X* and *Y* that
are independent,

Var(X+Y) = Var(X) + Var(Y)

**Proof:**

Var(X+Y) = E[ ((X+Y) - E(X+Y))^{2} ]

= E[ (X + Y - E(X) - E(Y))^{2} ]

= E[ (X - E(X))^{2} + (Y - E(Y))^{2} + 2 (X - E(X)) (Y - E(Y)) ]

= E[ (X - E(X))^{2} ] + E[ (Y - E(Y))^{2} ] + E[ 2 (X - E(X)) (Y - E(Y)) ]

= Var(X) + Var(Y) + 2 E[ (X - E(X)) (Y - E(Y)) ]

But we can see that

E[ (X - E(X)) (Y - E(Y)) ] = E[ X - E(X) ] E[ Y - E(Y) ]

since X - E(X) and Y - E(Y) are independent when X and Y are, and this is zero because E[ X - E(X) ] = E(X) - E(X) = 0. Hence Var(X+Y) = Var(X) + Var(Y).
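This result can also be checked by simulation, in the same spirit as the estimates at the start of these notes:

```r
# Simulation check that Var(X+Y) = Var(X) + Var(Y) for independent dice:
set.seed(1)
N <- 100000
x <- sample(1:6, N, replace=TRUE)     # N independent rolls for X
y <- sample(1:6, N, replace=TRUE)     # N independent rolls for Y
# var(x+y) - (var(x)+var(y)) equals twice the sample covariance of x and y,
# which should be close to zero since the rolls are independent
```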