STA 247 - Week 5 lecture summary

Families of distributions

Recall that two random variables, say X and Y, may have the same distribution, in which case P(X=a) = P(Y=a) for any a. Note, though, that this isn't the same as X and Y being the same random variable, nor does it imply that P(X=Y) = 1.
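
A small R sketch may make this distinction concrete. It uses rbinom, which is described later in these notes, and the names x and y are only illustrative. Here Y is defined as 1-X for a fair coin indicator X, so X and Y have the same distribution even though they never have the same value.

```r
# Illustrative sketch: X is a fair coin indicator and Y = 1 - X.
# X and Y have the same distribution, but P(X=Y) = 0.
set.seed(1)
x <- rbinom(10000, 1, 0.5)   # simulated values of X (0 or 1)
y <- 1 - x                   # Y is 1 exactly when X is 0

mean(x == 1)   # estimate of P(X=1), close to 0.5
mean(y == 1)   # estimate of P(Y=1), also close to 0.5
mean(x == y)   # fraction of draws in which X equals Y: exactly 0
```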

If we know that X and Y have the same distribution, anything we learn about the distribution of X applies to Y as well.

Of even more general use are families of distributions, in which a particular distribution can be specified by giving the values of one or more parameters. There are many standard families of distributions for which properties such as the probability mass function and the expected value have been worked out (as formulas involving the parameter values). If we know that a random variable has a distribution in one of these families, we immediately know a lot about its distribution.

The Bernoulli family

The simplest family of distributions is the set of distributions for a random variable with two possible values, 0 and 1. It has one parameter, p, which is the probability of the value 1. A distribution in this family is denoted by Bernoulli(p). We write

X ~ Bernoulli(p)

to say that the random variable X has such a distribution.

It is easy to show that if X~Bernoulli(p), then E(X)=p, since E(X) = 0(1-p) + 1p = p.

Example: If we flip a fair coin, and define X to have the value 1 for a head and 0 for a tail, then X~Bernoulli(1/2).
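
As a rough check of E(X)=p, the sketch below computes the expected value directly from the definition and compares it with the average of simulated values. The value p = 0.3 and the variable names are arbitrary choices for illustration, and a Bernoulli(p) variable is simulated here as a binomial with n=1 (using rbinom, described later in these notes).

```r
# Expected value of a Bernoulli(p) random variable, two ways.
p <- 0.3                        # arbitrary illustrative value

values <- c(0, 1)               # the two possible values
probs  <- c(1 - p, p)           # their probabilities
exact  <- sum(values * probs)   # 0*(1-p) + 1*p, which is p
exact

set.seed(1)
sims <- rbinom(10000, 1, p)     # Bernoulli(p) is binomial(1,p)
mean(sims)                      # should be close to p
```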

The binomial family

The binomial family of distributions, which has two parameters, n and p, can be seen as giving the distribution for the number of heads in n independent flips of a coin that has probability p of landing heads. Of course, it applies equally to problems that don't involve coins, as long as the resulting probability mass function is that of the binomial distribution for some n and p.

We can derive the probability mass function for a binomial(n,p) random variable by counting how many ways there are to get each possible value, and multiplying by the probability of each of these ways (all the ways of getting a given value x have the same probability, p^x (1-p)^(n-x)). The result is that if X~binomial(n,p), then for x an integer from 0 to n,

P(X=x) = C(n,x) p^x (1-p)^(n-x)

Here, C(n,x) means "n choose x".

As we will see later, E(X)=np.

Example: If we flip a fair coin twice, and let X be the number of heads we get, then X~binomial(2,1/2), so P(X=0)=1/4, P(X=1)=1/2, and P(X=2)=1/4.
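
The binomial probability mass function can be evaluated directly in R using choose for "n choose x". The sketch below does this for the two-flip example. The function name binom_pmf is made up for illustration and is not a built-in R function.

```r
# Binomial pmf from the formula C(n,x) p^x (1-p)^(n-x).
# binom_pmf is an illustrative name, not a built-in function.
binom_pmf <- function(x, n, p) {
  choose(n, x) * p^x * (1 - p)^(n - x)
}

pmf <- binom_pmf(0:2, 2, 0.5)   # two flips of a fair coin
pmf                             # 0.25 0.50 0.25, as in the example

sum(pmf)                        # the probabilities add up to 1
sum(0:2 * pmf)                  # expected value, equal to n*p = 1
```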

Random number generation

One very useful ability of computers is to simulate random variables - that is, to pick a number as the value for a variable randomly, with some specified distribution.

Some computers can generate "real" random numbers (for instance, by exploiting the random noise in an electronic signal that results from the thermal motion of molecules). This is useful for applications like generation of cryptographic keys, where true unpredictability is essential.

However, for most purposes, it's better to use pseudo-random numbers, which behave like random numbers in almost all practical respects, but which aren't really random at all - they are completely determined by the setting of an initial seed. One advantage of pseudo-random numbers is that results can be reproduced - you can, for example, notice a bug, try to fix it, and then rerun with the same seed to be sure that it really was fixed, at least for the sequence of random numbers that showed the bug before.
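
To illustrate how a sequence can look random while being completely determined by a seed, here is a toy generator. It is only a sketch: it is not the generator R actually uses, the function name toy_rng is hypothetical, and the constants (a multiplier of 16807 and a modulus of 2^31-1) are one classic textbook choice assumed here for illustration.

```r
# Toy pseudo-random number generator, for illustration only.
# This is NOT the generator R uses; toy_rng is a hypothetical name.
toy_rng <- function(seed, k, a = 16807, m = 2^31 - 1) {
  state <- seed                 # seed should be an integer from 1 to m-1
  out <- numeric(k)
  for (i in seq_len(k)) {
    state <- (a * state) %% m   # next state depends only on the previous one
    out[i] <- state / m         # scale to a number between 0 and 1
  }
  out
}

u1 <- toy_rng(seed = 1, k = 5)
u2 <- toy_rng(seed = 1, k = 5)  # same seed, so the same sequence
u3 <- toy_rng(seed = 2, k = 5)  # different seed, different sequence

identical(u1, u2)   # TRUE
identical(u1, u3)   # FALSE
```

Every value is computed from the previous state by a fixed rule, so rerunning with the same seed must give the same numbers.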

Some R functions

R has functions for computing the probability mass function of a binomial random variable, for computing the cumulative distribution function for a binomial random variable, and for generating random values from a binomial distribution.

dbinom (v, n, p)
Returns the probability that a binomial(n,p) random variable will have the value v. If v is a vector, it returns a vector of such probabilities.

pbinom (v, n, p)
Returns the probability that a binomial(n,p) random variable will have a value less than or equal to v. If v is a vector, it returns a vector of such probabilities.

rbinom (k, n, p)
Returns a vector of k numbers randomly generated from the binomial(n,p) distribution.
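
The first two functions are closely related: the cumulative probabilities from pbinom are running sums of the probabilities from dbinom. The short sketch below checks this for one arbitrary choice of n and p, and also shows how an upper tail probability can be found from pbinom.

```r
# pbinom gives running totals of the dbinom probabilities.
n <- 2
p <- 0.9

from_d <- cumsum(dbinom(0:n, n, p))   # P(X<=0), P(X<=1), P(X<=2) by summing
from_p <- pbinom(0:n, n, p)           # the same values computed directly
from_d
from_p

upper <- 1 - pbinom(0, n, p)          # P(X>0), that is, at least one success
upper
```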

Another useful function for generating random numbers is

sample (v, k, prob=..., replace=...)

This generates k random numbers chosen from those in the vector v, with probabilities given by the vector passed as the prob argument (default is all probabilities equal). If k is greater than 1, the replace argument (TRUE or FALSE, default FALSE) specifies whether the numbers should be drawn with or without replacement.
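
The following sketch shows the effect of the replace and prob arguments. The particular vectors and probabilities are arbitrary choices for illustration.

```r
set.seed(1)

# Without replacement (the default), drawing all 5 values gives
# a random ordering of 1 to 5, with no value repeated.
perm <- sample(1:5, 5)
sort(perm)                      # always 1 2 3 4 5

# With replacement and unequal probabilities, values can repeat,
# and the value 1 should turn up about 70% of the time.
draws <- sample(c(0, 1), 10000, prob = c(0.3, 0.7), replace = TRUE)
mean(draws)                     # close to 0.7
```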

The random number seed is set with the function

set.seed (seed)

Here are some examples using these functions:

    > dbinom(0:2,2,0.5)
    [1] 0.25 0.50 0.25
    > dbinom(0:2,2,0.9)
    [1] 0.01 0.18 0.81
    > pbinom(0:2,2,0.9)
    [1] 0.01 0.19 1.00
    
    > set.seed(1)
    > rbinom(10,2,0.5)
     [1] 1 1 1 2 0 2 2 1 1 0
    > rbinom(10,2,0.5)
     [1] 0 0 1 1 2 1 1 2 1 2
    > rbinom(10,2,0.5)
     [1] 2 0 1 0 1 1 0 1 2 1
    > sample(5:8,3,prob=c(0.1,0.8,0.05,0.05),replace=TRUE)
    [1] 6 6 6
    > sample(5:8,3,prob=c(0.1,0.8,0.05,0.05),replace=TRUE)
    [1] 6 5 6
    > sample(5:8,3,prob=c(0.1,0.8,0.05,0.05),replace=TRUE)
    [1] 6 6 6
    > sample(5:8,3,prob=c(0.1,0.8,0.05,0.05),replace=TRUE)
    [1] 6 5 6
    
    > set.seed(2)
    > rbinom(10,2,0.5)
     [1] 0 1 1 0 2 2 0 2 1 1
    
    > set.seed(1)
    > rbinom(10,2,0.5)
     [1] 1 1 1 2 0 2 2 1 1 0
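
These functions can also be used together to check one against another. As a sketch, the code below generates many binomial(2,0.9) values with rbinom and compares the observed proportions of 0, 1, and 2 with the probabilities given by dbinom. The number of draws (100000) and the variable names are arbitrary choices.

```r
# Compare simulated proportions with the exact binomial probabilities.
set.seed(1)
draws <- rbinom(100000, 2, 0.9)

# Proportion of draws equal to 0, 1, and 2 (levels given explicitly
# so that a value that never occurs still gets a count of zero).
est <- table(factor(draws, levels = 0:2)) / length(draws)
est

exact <- dbinom(0:2, 2, 0.9)    # 0.01 0.18 0.81
exact
```

The simulated proportions will not match the exact probabilities perfectly, but with this many draws they should be close.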