We can estimate probabilities and expectations by *simulation*,
using pseudo-random numbers. Below is an R function for estimating
the probabilities for a binomial(*n*,*p*) distribution based
on *N* values simulated (independently) from this distribution:

sim.binom <- function (N, n, p)
{
  pr <- rep(0,n+1)
  xsq <- 0
  from2to4 <- 0
  for (i in 1:N)
  { x <- rbinom(1,n,p)
    pr[x+1] <- pr[x+1] + 1
    xsq <- xsq + x^2
    if (x>=2 && x<=4) from2to4 <- from2to4 + 1
  }
  list (pr = pr/N, xsq = xsq/N, from2to4 = from2to4/N)
}

Note that indexes for vectors start at 1 in R, so to use a vector to hold the counts of how often we have seen the values from 0 to n, the count for the value x is stored in element x+1 of pr.

Here are four runs of this function, with two different random seeds, first with 1000 and then with 100000 simulated values:

> set.seed(1); sim.binom(1000,10,0.3)
$pr
 [1] 0.022 0.127 0.236 0.265 0.197 0.101 0.037 0.013 0.001 0.001 0.000

$xsq
[1] 11.247

$from2to4
[1] 0.698

> set.seed(2); sim.binom(1000,10,0.3)
$pr
 [1] 0.039 0.119 0.231 0.258 0.190 0.108 0.047 0.007 0.001 0.000 0.000

$xsq
[1] 11.204

$from2to4
[1] 0.679

> set.seed(1); sim.binom(100000,10,0.3)
$pr
 [1] 0.02901 0.12272 0.23219 0.26567 0.19948 0.10327 0.03671 0.00954 0.00131
[10] 0.00010 0.00000

$xsq
[1] 11.0969

$from2to4
[1] 0.69734

> set.seed(2); sim.binom(100000,10,0.3)
$pr
 [1] 0.02793 0.12218 0.23177 0.26475 0.20162 0.10369 0.03713 0.00931 0.00139
[10] 0.00023 0.00000

$xsq
[1] 11.15064

$from2to4
[1] 0.69814

Notice how the results differ a bit with the two different settings of the seed, but the difference is smaller when the number of simulated values is larger.

For this example, we can compare the simulation estimates with the exact values, found below:

> round(dbinom(0:10,10,0.3),5)
 [1] 0.02825 0.12106 0.23347 0.26683 0.20012 0.10292 0.03676 0.00900 0.00145
[10] 0.00014 0.00001
> sum(dbinom(0:10,10,0.3)*(0:10)^2)
[1] 11.1
> sum(dbinom(2:4,10,0.3))
[1] 0.7004233
> pbinom(4,10,0.3) - pbinom(1,10,0.3)
[1] 0.7004233

But we can use simulation for much more complex problems, where computing the exact values would be infeasible.
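For instance, here is a sketch (with arbitrarily chosen numbers, not from any particular application) of estimating the probability that the maximum of five independent binomial(10,0.3) values is at least 6 — a quantity that in this toy case can still be checked exactly, but whose analogue in a more complicated scheme could not be:

```r
# Estimate P( max of 5 independent binomial(10,0.3) values >= 6 ) by simulation.
set.seed(1)

N <- 100000
hits <- 0
for (i in 1:N)
{ x <- rbinom(5,10,0.3)             # five independent binomial draws
  if (max(x) >= 6) hits <- hits + 1
}
est <- hits / N

# For this toy problem, an exact answer happens to be available as a check:
# P(max >= 6) = 1 - P(all five values are <= 5)
exact <- 1 - pbinom(5,10,0.3)^5
```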

Recall that we defined the *expectation* of a random variable *X*
as

E(X) = SUM(over x) x P(X=x)

The expectation is the "average" value for the random variable.
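For a distribution with known probabilities, this sum can be computed directly in R — for example, for the binomial(10,0.3) distribution used above:

```r
# E(X) for X ~ binomial(10,0.3), computed directly from the definition:
x <- 0:10
e <- sum(x * dbinom(x,10,0.3))
# e equals n*p = 10*0.3 = 3, agreeing with the known binomial mean formula
```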

We may
also be interested in how much the value varies around this average.
One way to measure this variation is by the *variance*, defined as

Var(X) = E[ (X - E(X))^{2} ]

We also define the *standard deviation*, SD(X), as the square root of the variance.

Note that for random variables with an infinite number of possible values, it is possible for the expectation to be undefined, in which case the variance will also be undefined, and even if the expectation is defined and finite, the variance might be infinite.

**Example:** We roll a six-sided die, and let *X* be the
number it shows:

E(X) = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 3.5

Var(X) = (-2.5)^{2}(1/6) + (-1.5)^{2}(1/6) + (-0.5)^{2}(1/6) + (0.5)^{2}(1/6) + (1.5)^{2}(1/6) + (2.5)^{2}(1/6) = 2.91666...

SD(X) = sqrt(2.91666...) = 1.70783
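These hand computations can be checked in R by applying the definitions directly:

```r
# Expectation, variance, and standard deviation of one fair die roll:
x <- 1:6
p <- rep(1/6,6)
m <- sum(x*p)               # expectation, 3.5
v <- sum((x-m)^2 * p)       # variance, 2.91666...
s <- sqrt(v)                # standard deviation, about 1.70783
```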

**Theorem (Markov's inequality):** If *X* is a non-negative random variable
(that is, P(*X* >= 0) = 1), then for any *a* > 0,

P(X >= a) <= E(X) / a

or equivalently,

E(X) >= a P(X >= a)

**Proof:**

E(X) = SUM(x in range of X) x P(X=x)

>= SUM(x in range of X with x >= a) x P(X=x)

>= SUM(x in range of X with x >= a) a P(X=x)

= a SUM(x in range of X with x >= a) P(X=x)

= a P(X >= a)
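A quick numerical check of this inequality, for the binomial(10,0.3) distribution and the arbitrarily chosen value a = 5:

```r
# Markov's inequality check: P(X >= a) <= E(X)/a for X ~ binomial(10,0.3).
a <- 5
ex <- sum((0:10) * dbinom(0:10,10,0.3))   # E(X) = 3
ptail <- 1 - pbinom(a-1,10,0.3)           # P(X >= 5)
# ptail (about 0.15) is indeed no more than ex/a = 0.6
```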

**Theorem (Chebyshev's inequality):** If s^{2} = Var(*X*) and m = E(*X*),
then for any *k* > 0,

P(|X - m| >= k s) <= 1 / k^{2}

or equivalently, for any a > 0,

P((X - m)^{2} >= a) <= s^{2} / a

(Just let a = k^{2} s^{2} to get the first form from the second.)

**Proof:** The second form above follows directly from Markov's
inequality, since (*X*-*m*)^{2} is non-negative, and
its expectation is *s*^{2}.

**Example:** For the roll of one die used as an example above,
Chebyshev's inequality says that

P(|X - 3.5| > 2) = P(|X - 3.5| > (2/1.70783) x 1.70783) <= 1 / (2/1.70783)^{2} = 0.72917

(Actually, P(|X - 3.5| > 2) = P(X=1) + P(X=6) = 1/3, so the bound is far from tight here.)
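The looseness of the bound in this example can be seen by computing both the exact probability and the bound in R:

```r
# Exact probability vs. Chebyshev bound for one fair die roll:
x <- 1:6
p <- rep(1/6,6)
m <- sum(x*p)
s2 <- sum((x-m)^2 * p)                # variance
exact <- sum(p[abs(x-m) > 2])         # P(|X-3.5| > 2) = P(X=1)+P(X=6) = 1/3
bound <- s2 / 2^2                     # Chebyshev bound, about 0.729
```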

**Theorem:** For any random variable *X*
and any constant *c*,

E(cX) = c E(X)

**Proof:**

E(cX) = SUM(x in range of X) c x P(X=x) = c SUM(x in range of X) x P(X=x) = c E(X)

**Theorem:** For any random variables *X* and *Y*,

E(X+Y) = E(X) + E(Y)

**Proof:**

E(X+Y) = SUM(x in range of X) SUM(y in range of Y) (x + y) P(X=x, Y=y)

= SUM(x in range of X) SUM(y in range of Y) [ x P(X=x, Y=y) + y P(X=x, Y=y) ]

= SUM(x in range of X) x SUM(y in range of Y) P(X=x, Y=y) + SUM(y in range of Y) y SUM(x in range of X) P(X=x, Y=y)

= SUM(x in range of X) x P(X=x) + SUM(y in range of Y) y P(Y=y)

= E(X) + E(Y)
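As a concrete check, we can verify this for two independent dice — one fair, one with made-up unequal probabilities — by summing over their joint distribution:

```r
# Check E(X+Y) = E(X) + E(Y) for two independent dice:
px <- rep(1/6,6)                      # a fair die
py <- c(0.1,0.1,0.2,0.2,0.2,0.2)      # a (made-up) loaded die
joint <- outer(px,py)                 # P(X=x, Y=y) under independence
lhs <- sum(outer(1:6,1:6,"+") * joint)           # E(X+Y)
rhs <- sum((1:6)*px) + sum((1:6)*py)             # E(X) + E(Y)
# lhs and rhs agree; note the theorem holds even without independence
```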

**Theorem:** For any random variables *X* and *Y* that
are independent,

E(XY) = E(X) E(Y)

**Proof:**

E(XY) = SUM(x in range of X) SUM(y in range of Y) x y P(X=x, Y=y)

= SUM(x in range of X) SUM(y in range of Y) x y P(X=x) P(Y=y)

= [ SUM(x in range of X) x P(X=x) ] [ SUM(y in range of Y) y P(Y=y) ]

= E(X) E(Y)
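The same kind of check works for this product rule, again using two independent fair dice:

```r
# Check E(XY) = E(X) E(Y) for two independent fair dice:
px <- rep(1/6,6); py <- rep(1/6,6)
joint <- outer(px,py)                            # P(X=x, Y=y)
exy <- sum(outer(1:6,1:6,"*") * joint)           # E(XY)
ex <- sum((1:6)*px); ey <- sum((1:6)*py)
# exy equals ex*ey = 3.5^2 = 12.25
```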

**Theorem:** For any random variable *X*
and any constant *c*,

Var(cX) = c^{2} Var(X)

and hence

SD(cX) = |c| SD(X)

**Proof:**

Var(cX) = E[ (cX - E(cX))^{2} ]

= E[ (cX - c E(X))^{2} ]

= E[ c^{2} (X - E(X))^{2} ]

= c^{2} E[ (X - E(X))^{2} ]

= c^{2} Var(X)

**Theorem:** For any random variables *X* and *Y* that
are independent,

Var(X+Y) = Var(X) + Var(Y)

**Proof:**

Var(X+Y) = E[ ((X+Y) - E(X+Y))^{2} ]

= E[ (X + Y - E(X) - E(Y))^{2} ]

= E[ (X - E(X))^{2} + (Y - E(Y))^{2} + 2 (X - E(X)) (Y - E(Y)) ]

= E[ (X - E(X))^{2} ] + E[ (Y - E(Y))^{2} ] + E[ 2 (X - E(X)) (Y - E(Y)) ]

= Var(X) + Var(Y) + 2 E[ (X - E(X)) (Y - E(Y)) ]

But we can see that

E[ (X - E(X)) (Y - E(Y)) ] = E[ X - E(X) ] E[ Y - E(Y) ]

since X - E(X) and Y - E(Y) are independent when X and Y are, and this is zero because E[ X - E(X) ] = E(X) - E(X) = 0. Hence Var(X+Y) = Var(X) + Var(Y).
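This result can also be checked by simulation, in the same spirit as the estimates at the start of these notes:

```r
# Simulation check that Var(X+Y) = Var(X) + Var(Y) for independent dice:
set.seed(1)
N <- 100000
x <- sample(1:6, N, replace=TRUE)     # N independent rolls for X
y <- sample(1:6, N, replace=TRUE)     # N independent rolls for Y
# var(x+y) - (var(x)+var(y)) equals twice the sample covariance of x and y,
# which should be close to zero since the rolls are independent
```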