STA 247 - Week 1 lecture summary

What is probability?

Probability originated several hundred years ago, in the analysis of gambling games. It involves a shift in thinking, from "fate" or "the gods" to "unpredictable" or "random".

Rolls of dice or flips of coins are "fair" by design. Hence it's reasonable to think that the possible results are "equally likely".

We can extend this way of thinking to situations where the results are not equally likely - for example, rain or no rain tomorrow, based on today's temperature and barometric pressure.

For these situations, we can assign probabilities based on the frequency with which each possibility occurred in the past, in similar situations.

We can apply probability even more widely, to unique situations, which have never occurred in the past. These probabilities may be assigned based on the opinion of an expert (human or machine). They represent the "degree of belief" in the various outcomes.

Probability isn't confined to prediction of future events. We can also use probability to "predict" past events that we don't already know about, or anything that is unknown or uncertain.

Some applications of probability in computer science

Probabilistic algorithms:
A probabilistic algorithm can branch one way or another with a specified probability.

Testing whether a large number is prime: Using a probabilistic algorithm, one can quickly get an answer that is correct with very high probability. It has recently been shown that the same can be done with a non-probabilistic algorithm. Open question: is it always possible to get rid of the probabilistic aspect at little cost?
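
The lecture doesn't name a particular algorithm, but the Miller-Rabin test is a standard example of this kind of probabilistic primality test. A minimal Python sketch (each round either proves n composite or leaves it "probably prime", with the chance of error shrinking as the number of rounds grows):

    import random

    def is_probable_prime(n, rounds=20):
        # Miller-Rabin test: False means n is certainly composite; True means
        # n passed `rounds` random tests, so a composite n slips through with
        # probability at most 4**(-rounds).
        if n < 2:
            return False
        for p in (2, 3, 5, 7, 11, 13):
            if n % p == 0:
                return n == p
        # Write n - 1 as 2**s * d with d odd.
        d, s = n - 1, 0
        while d % 2 == 0:
            d, s = d // 2, s + 1
        for _ in range(rounds):
            a = random.randrange(2, n - 1)   # random base: the probabilistic branch
            x = pow(a, d, n)
            if x in (1, n - 1):
                continue
            for _ in range(s - 1):
                x = pow(x, 2, n)
                if x == n - 1:
                    break
            else:
                return False                 # a is a witness that n is composite
        return True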

Monte Carlo evaluation of integrals: This is a very widely used numerical method in physics, statistics, finance, etc.
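
As a minimal illustration of the idea (not tied to any particular application), an integral can be estimated by averaging the integrand at uniformly random points:

    import random, math

    def mc_integral(f, a, b, n=100_000):
        # Estimate the integral of f over [a, b] by averaging f at n points
        # drawn uniformly at random from [a, b], then scaling by (b - a).
        total = sum(f(random.uniform(a, b)) for _ in range(n))
        return (b - a) * total / n

    # Example: integral of exp(-x^2) from 0 to 1 (true value is about 0.7468).
    print(mc_integral(lambda x: math.exp(-x * x), 0.0, 1.0))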

Probabilistic analysis of algorithms:
Rather than look at the worst-case performance of an algorithm, we might look at its average run time.

Quicksort: This has n^2 worst-case run time, but n log n average run time, if every possible input is equally likely. But for other probability distributions over inputs, average run time may be n^2.
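
A standard way to put the probability under the algorithm's own control (not something the lecture claims, just a common illustration) is to choose the pivot at random, so the expected run time is n log n for every fixed input rather than only on average over inputs:

    import random

    def quicksort(xs):
        # Quicksort with a uniformly random pivot: the randomness comes from
        # the algorithm, not from assumptions about the input distribution.
        if len(xs) <= 1:
            return xs
        pivot = random.choice(xs)
        less = [x for x in xs if x < pivot]
        equal = [x for x in xs if x == pivot]
        greater = [x for x in xs if x > pivot]
        return quicksort(less) + equal + quicksort(greater)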

Information theory:
Data compression and error correction are based on probabilistic models.

We can compress data files only if we have some idea of which files are more likely than other files. Then we can assign short encodings to likely files, at the cost of assigning long encodings to less likely files, winning on average.
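
A concrete instance of this idea, at the level of single symbols rather than whole files and using Huffman coding (which the lecture doesn't specifically name), is sketched below:

    import heapq

    def huffman_code(probs):
        # Build a prefix code giving shorter codewords to more probable symbols.
        # probs maps each symbol to its probability; returns symbol -> codeword.
        heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            p1, _, c1 = heapq.heappop(heap)   # merge the two least probable groups
            p2, _, c2 = heapq.heappop(heap)
            merged = {s: "0" + w for s, w in c1.items()}
            merged.update({s: "1" + w for s, w in c2.items()})
            heapq.heappush(heap, (p1 + p2, count, merged))
            count += 1
        return heap[0][2]

    probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
    code = huffman_code(probs)
    print(code)                                               # e.g. a -> 0, b -> 10, ...
    print(sum(probs[s] * len(w) for s, w in code.items()))    # 1.75 bits per symbol on average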

To correct errors optimally we need to have a model of which error patterns are more likely (eg, do they often come in bursts?). Also, some of the most effective error-correcting codes are constructed randomly.
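
As the simplest possible illustration of correcting errors under a probabilistic model (not one of the sophisticated codes alluded to above), each bit can be repeated three times over a channel that flips bits independently, and decoded by majority vote:

    import random

    def send_with_repetition(bits, flip_prob=0.1):
        # Encode each bit as three copies, flip each copy independently with
        # probability flip_prob (the channel model), then decode by majority vote.
        # A bit is decoded wrongly only if at least two of its three copies are
        # flipped: probability 3*p*p*(1-p) + p**3, about 0.028 when p = 0.1.
        decoded = []
        for b in bits:
            copies = [b ^ (random.random() < flip_prob) for _ in range(3)]
            decoded.append(int(sum(copies) >= 2))
        return decoded

    message = [random.randrange(2) for _ in range(10_000)]
    received = send_with_repetition(message)
    print(sum(m != r for m, r in zip(message, received)) / len(message))   # near 0.028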

Computer graphics:
Manually specifying every object in a scene to be rendered would often be very tedious. Instead, many objects are generated randomly, with probability distributions designed to produce objects that look natural.

High-quality rendering of scenes often involves integration over possible light rays, which can be done probabilistically by Monte Carlo methods.

Statistical analysis:
Probability is the foundation for traditional statistical analysis.

Many areas of computer science need to analyse experiments or surveys:

Computer system performance assessment
Empirical evaluation of algorithm performance
Experiments testing usability of human-computer interfaces
Surveys of user requirements

Machine learning / computer vision / computational linguistics / bioinformatics:
Many applications require both sophisticated computational methods and non-traditional methods for statistical inference, in which probability plays a central role. For example:
Learning to classify customers as good or bad credit risks
Recognizing objects (eg, faces) in images
Translating web pages from one language to another
Finding genes associated with particular diseases

Basic mathematical foundations of probability

Definitions:
Sample space: The set of all possible outcomes of an "experiment" or other situation. Exactly one of these outcomes actually occurs.

Event: A subset of the sample space

We will usually denote the sample space by S, and events by other upper case letters.

Examples:
We roll one 6-sided die. The sample space is S = { 1, 2, 3, 4, 5, 6 }. The event "roll > 4" is the subset { 5, 6 }.

We roll two 6-sided dice (one red die and one green die). The sample space has 6x6 = 36 elements, each a pair of results, (red, green): S = { (1,1), (1,2), ..., (2,1), (2,2), ..., (6,5), (6,6) }. The event "sum of rolls > 7" is the subset { (2,6), (3,5), (3,6), (4,4), (4,5), (4,6), (5,3), (5,4), (5,5), (5,6), (6,2), (6,3), (6,4), (6,5), (6,6) }.
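
These definitions translate directly into code; a small sketch in Python, with sets standing in for the sample space and an event:

    # The two-dice sample space and the event "sum of rolls > 7" as Python sets.
    S = {(red, green) for red in range(1, 7) for green in range(1, 7)}
    E = {outcome for outcome in S if outcome[0] + outcome[1] > 7}

    print(len(S))       # 36 outcomes
    print(sorted(E))    # the 15 pairs listed above
    print(E <= S)       # True: an event is a subset of the sample space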

Axioms:
For any event A, P(A) >= 0.
P(S) = 1.
If A and B are disjoint events (they have no outcomes in common), then P(A ∪ B) = P(A) + P(B). (This additivity extends to any countable collection of pairwise disjoint events.)

Theorem: P(A^c) = 1 - P(A), where A^c is the complement of A (the outcomes in S not in A)

Theorem: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
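
A quick numerical check of these two theorems, using the one-die example above with all six outcomes treated as equally likely (as in the next subsection), and an extra event B = "roll is even" introduced just for this check:

    from fractions import Fraction

    # Fair six-sided die: sample space {1, ..., 6}, each outcome probability 1/6.
    S = {1, 2, 3, 4, 5, 6}

    def P(event):
        return Fraction(len(event), len(S))

    A = {5, 6}        # "roll > 4", as in the example above
    B = {2, 4, 6}     # "roll is even", an extra event just for illustration

    print(P(S - A) == 1 - P(A))                  # True: complement rule
    print(P(A | B) == P(A) + P(B) - P(A & B))    # True: inclusion-exclusion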

Probability with equally-likely outcomes:

It's sometimes reasonable to assume that all the outcomes in the sample space are equally likely. We can then let the probability of an event, E, be the fraction of outcomes in S that are in E:
P(E) = #(E) / #(S)

Example:

We roll two dice, and use the sample space mentioned above. If the dice are "fair", it's reasonable to assume equally likely outcomes. Referring to the event "sum of rolls > 7" above, we get
P(sum of rolls > 7) = 15/36
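
The same count can be done mechanically; a small sketch in Python:

    from fractions import Fraction

    # Count outcomes to get P(sum of rolls > 7) when all 36 pairs are equally likely.
    S = [(red, green) for red in range(1, 7) for green in range(1, 7)]
    E = [o for o in S if o[0] + o[1] > 7]
    print(Fraction(len(E), len(S)))    # 15/36, which reduces to 5/12 (about 0.417)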