STA 410/2102 - Statistical Computation (Jan-Apr 2004)

Marks for all assignments and tests are now available here. Please let me know of any errors. You can collect the assignments and tests from my office. You can phone me at 416-978-4970 to see if I'm in.

This course will look at how statistical computations are done, and how to write programs for statistical problems that aren't handled by standard packages. Students will program in the R language (a free and improved variant of S), which will be introduced at the start of the course. Topics will include the use of simulation to investigate the properties of statistical methods, matrix computations used to implement linear models, optimization methods used for maximum likelihood estimation, and numerical integration methods for Bayesian inference, including a brief introduction to Markov chain Monte Carlo methods.

Instructor: Radford Neal, Office: SS6016A, Phone: (416) 978-4970, Email: radford@stat.utoronto.ca

Office hours: Thursdays from 2:30 to 3:30.

Lectures:

Tuesdays, Thursdays, and Fridays, 1:10pm to 2:00pm, from January 5 to April 8, except for Reading Week (February 16-20), in SS 1070.

Course Text:

R. A. Thisted, Elements of Statistical Computing. Here are some errata for the text (in PDF format).

Computing:

Assignments will be done in R. Graduate students will use the Statistics/Biostatistics computer system. Undergraduates will use CQUEST. You can request an account on CQUEST if you're an undergraduate student in this course. There is a CQUEST page for this course containing information on running R. You can also use R on your home computer by downloading it for free from http://lib.stat.cmu.edu/R/CRAN

Marking scheme:

Four assignments, worth 15%, 20%, 20%, and 15%.
Three one-hour in-class tests, each worth 10%.
The tests are scheduled for February 5, March 16, and April 8.

Please read the important course policies.

Assignments:

Assignment 1: Postscript, PDF. Solution: program, discussion, plots in Postscript and PDF.

Assignment 2: Postscript, PDF. You should test your functions for problems 2 and 3 on the following three datasets:

a2data1 a2data2 a2data3
You can also get these datasets on the UTSTAT and CQUEST computer systems as /u/radford/a2data1, /u/radford/a2data2, and /u/radford/a2data3. For problem 2, you should find the MLE for mu for these three datasets with sigma fixed at 0.5 (i.e., with rho fixed at log(0.5)).
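
To illustrate the general pattern only (not the assignment's model, which is specified in the handout), here is a rough sketch that reads one of the datasets and maximizes a likelihood over mu with rho held fixed at log(0.5); the normal log-likelihood used here is purely an illustrative assumption:

# Rough sketch only: the actual model for problem 2 is given in the handout.
# For illustration we assume i.i.d. normal data with mean mu and standard
# deviation sigma = exp(rho).

x <- scan("a2data1")                 # read one of the three datasets

rho <- log(0.5)                      # sigma fixed at 0.5

neg.log.lik <- function (mu)         # hypothetical negative log-likelihood
  -sum(dnorm(x, mean=mu, sd=exp(rho), log=TRUE))

mle <- optimize(neg.log.lik, interval=range(x))
mle$minimum                          # the (illustrative) MLE for mu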

Solution: program, tests, output of tests, discussion.

Assignment 3: Postscript, PDF. Solution: program, tests, output of tests, discussion.

Assignment 4: Postscript, PDF. Solution: program, results and discussion.

Tests:

Test 1 will be held on February 5, from 1:10 to 2:00, in SS 1070 (the usual classroom). Here are some practice questions: Postscript, PDF. The last practice question is solved by one of the example programs below.

Test 2 will be held on March 16, from 1:10 to 2:00, in SS 1070 (the usual classroom). Here are some practice questions: Postscript, PDF.

Test 3 will be held on April 8, from 1:10 to 2:00, in SS 1070 (the usual classroom). Here are some practice questions: Postscript, PDF.

R Demonstrations:

Here is January 20th's demonstration of simple R data types.

Here is January 22nd's demonstration of more R data types.

Here is January 23rd's demonstration of R programming.

Example R Programs:

Creating a data frame with data for several groups: program.
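
Here is a small sketch (not the linked program) of one way to do this, with made-up group names and values:

# Small sketch: combine measurements from several groups into one data frame,
# with a factor identifying the group each value belongs to.

y <- c(4.1, 3.8, 5.0,        # group A
       2.9, 3.3,             # group B
       6.2, 5.7, 6.0, 5.5)   # group C

grp <- factor(rep(c("A","B","C"), times=c(3,2,4)))

d <- data.frame(group=grp, value=y)

tapply(d$value, d$group, mean)   # e.g., group means computed from the data frame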

Counting words in a file: program.
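
Here is a rough sketch (not the linked program) of one way to count words, with a made-up file name:

# Rough sketch: read a file as whitespace-separated character strings, one
# word per element, then tabulate how often each word occurs.

words <- scan("story.txt", what=character(0), quiet=TRUE)

counts <- table(words)

rev(sort(counts))   # words in decreasing order of frequency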

Testing how well t tests work: program, plots: Postscript, PDF.
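
The general idea is to simulate many datasets for which the truth is known and see how often the t test rejects. Here is a minimal sketch of that idea (the linked program and plots examine more than this):

# Minimal sketch: estimate the type I error rate of the two-sample t test
# when both samples really are normal and the null hypothesis is true.

n.sim <- 1000     # number of simulated datasets
n <- 20           # size of each sample
alpha <- 0.05

reject <- logical(n.sim)

for (i in 1:n.sim)
{ x <- rnorm(n)
  y <- rnorm(n)
  reject[i] <- t.test(x,y)$p.value < alpha
}

mean(reject)      # should be close to alpha if the test behaves as advertised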

Permutation test for correlation: program, output.
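
Here is a rough sketch of such a permutation test (the linked program's details, such as the number of permutations, may differ):

# Rough sketch: permutation test of the hypothesis that x and y are
# independent, using the sample correlation as the test statistic.

perm.test.cor <- function (x, y, n.perm=1000)
{ obs <- cor(x,y)
  perm <- numeric(n.perm)
  for (i in 1:n.perm)
    perm[i] <- cor(x, sample(y))    # correlation after shuffling y
  mean(abs(perm) >= abs(obs))       # two-sided p-value estimate
}

x <- rnorm(30)                      # made-up example data
y <- x + rnorm(30)
perm.test.cor(x,y)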

Testing how trimming affects the t test: program.
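
One way to investigate this is by simulation, comparing rejection rates with and without trimming. Here is a rough sketch under assumptions of my own choosing (the trimming fraction and the heavy-tailed t distribution below are illustrative, and the linked program may define the comparison differently):

# Rough sketch: how often does the one-sample t test reject a true null
# hypothesis when applied to raw samples versus samples with the most
# extreme observations trimmed off?

trim <- function (x, frac=0.1)
{ k <- floor(frac*length(x))
  sort(x)[(k+1):(length(x)-k)]      # drop the k smallest and k largest values
}

n.sim <- 1000
n <- 40
reject.raw <- reject.trim <- logical(n.sim)

for (i in 1:n.sim)
{ x <- rt(n, df=3)                  # heavy-tailed data with true mean zero
  reject.raw[i]  <- t.test(x)$p.value < 0.05
  reject.trim[i] <- t.test(trim(x))$p.value < 0.05
}

c(raw=mean(reject.raw), trimmed=mean(reject.trim))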

Computing a Cholesky decomposition: program.
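
Here is a small sketch (not the linked program) that computes the lower-triangular Cholesky factor directly from its defining equations, checked against R's built-in chol:

# Small sketch: compute the lower-triangular factor L of a symmetric
# positive-definite matrix A, so that A = L %*% t(L).

cholesky <- function (A)
{ n <- nrow(A)
  L <- matrix(0,n,n)
  for (j in 1:n)
  { s <- if (j>1) sum(L[j,1:(j-1)]^2) else 0
    L[j,j] <- sqrt(A[j,j] - s)
    if (j<n)
      for (i in (j+1):n)
      { s <- if (j>1) sum(L[i,1:(j-1)]*L[j,1:(j-1)]) else 0
        L[i,j] <- (A[i,j] - s) / L[j,j]
      }
  }
  L
}

A <- crossprod(matrix(rnorm(9),3,3)) + diag(3)   # a random positive-definite matrix
max(abs(cholesky(A) - t(chol(A))))               # should be essentially zero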

Maximum likelihood for the Poisson model with no zeros: program, output.
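
Here is a rough sketch of the idea, assuming "no zeros" means the zero-truncated Poisson distribution (the linked program may be organized differently):

# Rough sketch: maximum likelihood for the zero-truncated Poisson model, with
# P(X=x) = dpois(x,lambda) / (1-exp(-lambda)) for x = 1, 2, ...

neg.log.lik <- function (lambda, x)
  -sum(dpois(x, lambda, log=TRUE) - log(1-exp(-lambda)))

x <- c(1, 2, 1, 3, 1, 2, 4, 1, 2, 2)   # made-up counts, all at least one

mle <- optimize(neg.log.lik, interval=c(0.01,20), x=x)
mle$minimum                            # estimate of lambda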

Some old maximum likelihood assignments:

A) Handout, Program for part 1, Program for part 2.
B) Handout, Program, Tests.
C) Handout, Program.

Symbolic computation and minimization in R: examples.
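
Here is a small sketch of these two facilities (symbolic differentiation with D, numerical minimization with nlm), using a made-up function rather than the ones in the linked examples:

# Small sketch: symbolic differentiation and numerical minimization in R,
# for the made-up function f(x) = (x-2)^2 + exp(x)/10.

f.expr <- expression((x-2)^2 + exp(x)/10)

D(f.expr[[1]], "x")                # symbolic derivative with respect to x

f <- function (x) (x-2)^2 + exp(x)/10

nlm(f, p=0)$estimate               # numerical minimum, starting from x = 0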

EM algorithm for censored Poisson data: program, output of two tests.
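
Here is a rough sketch of such an EM algorithm, assuming (as an illustrative choice that may differ from the linked program) that the counts are right-censored at a known level, so that for censored observations we only know the count is at least that level:

# Rough sketch: EM for Poisson counts right-censored at a known level cen.
# For a censored observation we only know that the count is >= cen.

em.cpois <- function (y, censored, cen, n.iter=50)
{ lambda <- mean(y)                    # crude starting value
  for (t in 1:n.iter)
  { # E step: expected count for a censored observation, given that it is >= cen.
    k <- 0:(cen-1)
    e.cens <- (lambda - sum(k*dpois(k,lambda))) / (1 - ppois(cen-1,lambda))
    filled <- ifelse(censored, e.cens, y)
    # M step: the complete-data MLE for a Poisson mean is the sample mean.
    lambda <- mean(filled)
  }
  lambda
}

y        <- c(2, 0, 3, 5, 1, 5, 2, 4)  # made-up data, censored at 5
censored <- c(F, F, F, T, F, T, F, F)
em.cpois(y, censored, cen=5)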

An old EM assignment: Handout, Solution.

Bayesian inference for a simple model using the midpoint rule: program, output.
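
Here is a rough sketch of the technique; since the "simple model" of the linked program isn't described here, this assumes, purely for illustration, normal data with known standard deviation one and a N(0,100) prior on the mean, with the posterior mean found by the midpoint rule:

# Rough sketch: Bayesian inference by numerical integration with the midpoint
# rule, for an illustrative model (normal data, sd known to be 1, N(0,100)
# prior on the mean mu).

x <- c(3.9, 3.6, 3.7)              # data borrowed from the examples below

post <- function (mu)              # unnormalized posterior density for mu
  dnorm(mu, 0, 10) * prod(dnorm(x, mu, 1))

lo <- -10; hi <- 10; n <- 1000     # grid covering essentially all the mass
h <- (hi-lo)/n
mid <- lo + (1:n - 0.5)*h
p <- sapply(mid, post)

norm.const <- sum(p)*h             # approximate normalizing constant
sum(mid*p)*h / norm.const          # approximate posterior mean of mu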

Evaluating a double integral with the midpoint rule: program, output.
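
Here is a minimal sketch of the midpoint rule in two dimensions, using a made-up integrand whose exact value is known (the integral of xy over the unit square is 1/4):

# Minimal sketch: approximate a double integral with the midpoint rule.

f <- function (x, y) x*y

midpoint2 <- function (f, ax, bx, ay, by, n=100)
{ hx <- (bx-ax)/n; hy <- (by-ay)/n
  mx <- ax + (1:n - 0.5)*hx
  my <- ay + (1:n - 0.5)*hy
  # Evaluate f at every pair of midpoints, sum, and scale by the cell area.
  sum(outer(mx, my, f)) * hx * hy
}

midpoint2(f, 0, 1, 0, 1)           # should be close to 0.25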

An old Bayesian inference assignment: handout, program, output.

Gibbs sampling example: program, plots of output for data of 3.9, 3.6, 3.7: Postscript, PDF.
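
Here is a rough sketch of Gibbs sampling for this data, assuming (as an illustrative choice that need not match the linked program) normal data with unknown mean mu and precision tau, a normal prior for mu, and a Gamma(1,1) prior for tau; the sampler alternates between the two full conditional distributions:

# Rough sketch: Gibbs sampling for the data 3.9, 3.6, 3.7 under an
# illustrative normal model with unknown mean mu and precision tau.

x <- c(3.9, 3.6, 3.7)
n <- length(x)

mu0 <- 0; tau0 <- 0.01             # prior mean and precision for mu
a <- 1; b <- 1                     # prior shape and rate for tau

n.iter <- 5000
mu <- numeric(n.iter); tau <- numeric(n.iter)
mu[1] <- mean(x); tau[1] <- 1/var(x)

for (t in 2:n.iter)
{ # Update mu from its full conditional given tau.
  prec <- tau0 + n*tau[t-1]
  mu[t] <- rnorm(1, (tau0*mu0 + tau[t-1]*sum(x))/prec, sqrt(1/prec))
  # Update tau from its full conditional given mu.
  tau[t] <- rgamma(1, a + n/2, rate = b + sum((x-mu[t])^2)/2)
}

mean(mu[-(1:500)])                 # posterior mean of mu, after burn-in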

Metropolis algorithm example: program. Run on data of 3.9, 3.6, 3.7. A rough sketch of a random-walk Metropolis sampler appears after the results below.
Results with proposal standard deviations of 0.1: scatterplot (Postscript, PDF), trace (Postscript, PDF).
Results with proposal standard deviations of 1: scatterplot (Postscript, PDF), trace (Postscript, PDF).
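
Here is a rough sketch of a random-walk Metropolis sampler for this data (not the linked program), assuming the same kind of illustrative normal model, parameterized by the mean mu and the log standard deviation, with a tunable proposal standard deviation:

# Rough sketch: random-walk Metropolis for the data 3.9, 3.6, 3.7, assuming
# normal data with unknown mean mu and log standard deviation ls, and flat
# priors on mu and ls (illustrative choices only).

x <- c(3.9, 3.6, 3.7)

log.post <- function (th)            # th = c(mu, ls)
  sum(dnorm(x, th[1], exp(th[2]), log=TRUE))

metropolis <- function (n.iter, prop.sd, start=c(mean(x), log(sd(x))))
{ th <- matrix(NA, n.iter, 2)
  cur <- start
  cur.lp <- log.post(cur)
  for (t in 1:n.iter)
  { prop <- cur + rnorm(2, 0, prop.sd)   # symmetric random-walk proposal
    prop.lp <- log.post(prop)
    if (log(runif(1)) < prop.lp - cur.lp)
    { cur <- prop; cur.lp <- prop.lp }   # accept the proposal
    th[t,] <- cur                        # record the state, changed or not
  }
  th
}

run <- metropolis(10000, prop.sd=0.1)    # try 0.1 and 1, as in the plots above
plot(run[,1], run[,2], pch=".")          # scatterplot of the two parameters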