STA 414/2104: Statistical Methods for Machine Learning and Data Mining (Jan-Apr 2014)

All course material has now been marked, and will be available for pickup in my office 3-4pm on May 7 and 3-4pm on May 9.

Note that marks for test 1 and test 3 were taken as out of 80, even though the maximum possible was 100.

Links to the papers presented by students in STA 2104 are here.


Radford Neal, Office: SS6026A, Phone: (416) 978-4970, Email:

Office Hours: Tuesdays 3:10-4:00pm, in SS6026A.


Lectures: Tuesdays 12:10-2:00pm and Thursdays 12:10-1:00pm, in SS 2127.
The first lecture is January 7. The last lecture is April 3.
There are no lectures February 17 to 21 (Reading Week).

Graduate students in STA 2104 will make presentations the week of April 7 to 11.


For undergraduates in STA 414:
52% Three assignments.
48% Three 50-minute tests, each worth 16%, held in lecture time on February 6, March 13, and April 3.
For graduate students in STA 2104:
48% Three assignments.
42% Three 50-minute tests, each worth 14%, held in lecture time on February 6, March 13, and April 3.
10% A 12-minute individual presentation on a conference paper that you have read.
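
As an illustration only (not an official calculation), the weights above can be combined as in the following Python sketch. The function names are hypothetical, and every component is assumed to be expressed as a percentage. How the assignment weight is divided among the three assignments is not stated here, so the sketch takes the overall assignment percentage as an input. The helper for converting a raw test mark reflects the note that tests 1 and 3 were taken as out of 80; whether marks above 80 are capped is not stated, so no cap is applied.

```python
def to_percent(raw_mark, out_of):
    """Convert a raw mark to a percentage of `out_of` (e.g. out_of=80)."""
    return 100.0 * raw_mark / out_of


def course_mark(assignment_pct, test_pcts, presentation_pct=None):
    """Combine component percentages (0-100 scale) into a course mark.

    assignment_pct   -- overall percentage across the three assignments
    test_pcts        -- the three test percentages
    presentation_pct -- None for STA 414, a percentage for STA 2104
    """
    if len(test_pcts) != 3:
        raise ValueError("expected three test percentages")
    if presentation_pct is None:
        # STA 414: 52% assignments, three tests at 16% each
        return 0.52 * assignment_pct + 0.16 * sum(test_pcts)
    # STA 2104: 48% assignments, three tests at 14% each, 10% presentation
    return (0.48 * assignment_pct
            + 0.14 * sum(test_pcts)
            + 0.10 * presentation_pct)


# Made-up marks, for illustration only
tests = [to_percent(60, 80), 60.0, to_percent(72, 80)]
print(round(course_mark(80.0, tests), 1))        # undergraduate scheme
print(round(course_mark(80.0, tests, 85.0), 1))  # graduate scheme
```

In both schemes the weights sum to 100%, so perfect marks on every component give a course mark of 100.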

The assignments are to be done by each student individually. Any discussion of the assignments with other students should be about general issues only, and should not involve giving or receiving written, typed, or emailed notes.

Graduate students may discuss the conference paper that they will present with anyone, in order to help understand it, but they must prepare their presentation themselves. (They may, if they wish, solicit feedback from others after a practice run of their presentation.)


The book Machine Learning: A Probabilistic Perspective, by Kevin P. Murphy, is strongly recommended (but not required).

I will be posting lecture slides, and links to online references.


Assignments will be done in R. Graduate students in Statistics will use the Statistics research computing system. Undergraduates and graduate students from other departments will use CQUEST. You can request an account on CQUEST if you're an undergraduate student in this course (you need to fill out a form if you're a grad student).

You can also use R on your home computer by downloading it for free from the R project website. That site also has the Introduction to R.

You might also be interested in trying out my faster implementation of R, called pqR, available from the pqR website. (However, it is currently distributed only in source form, and is easily installed only on Linux/Unix systems.)

Some useful on-line references:

Information Theory, Inference, and Learning Algorithms, by David MacKay.

David MacKay's thesis.

The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd edition), by Trevor Hastie, Robert Tibshirani, and Jerome Friedman.

Gaussian Processes for Machine Learning, by Carl Edward Rasmussen and Christopher K. I. Williams.

Proceedings of the International Conference on Machine Learning (ICML)

Proceedings of the annual conference on Neural Information Processing Systems (NIPS)

Lecture slides:

Note that slides may be updated as mistakes are corrected, or as the amount of material covered in the week becomes apparent.

Week 1 (Introduction: Murphy's Ch. 1)
Week 2 (Linear basis functions, penalties, cross-validation)
Week 3 (Introduction to Bayesian methods)
Week 4 (Conjugate priors: Murphy's Ch. 3; Bayesian linear basis function models: Murphy's Ch. 7)
Week 5 (Gaussian process models: Murphy's Ch. 15)
Week 6 (Monte Carlo methods: Murphy's Ch. 23)
Week 7 (Clustering, mixture models: Murphy's Ch. 11)
Week 8 (Classification, generative and discriminative models: Murphy's Section 3.5, Ch. 8)
Week 10 (Neural networks: Murphy's Section 16.5; Dimensionality reduction (PCA and factor analysis): Murphy's Ch. 12)
Week 11 (Support Vector Machines, Kernel PCA: Murphy's Ch. 14)

Practice problem sets:

Practice problem set #1, and the answers.

Practice problem set #2, and the answers.

Practice problem set #3, and the answers.


Assignment 1: handout.
Training set 1: train1x, train1y.
Training set 2: train2x, train2y.
Test set: testx, testy.
Hints on using R.
Solution: GP functions, R script, output of script, discussion. Note that some of the numbers you obtained could be slightly different but still be right.

Assignment 2: handout.
Training set: images, labels.
Test set: images, tstlab.
An R function for displaying a digit: show.r.
Solution: R functions, R script, output, plot, discussion, M step derivation.

Assignment 3: handout.
Training set: covariates, labels.
Test set: covariates, labels.
Solution: R script, MLP functions, output, plot, discussion.

Example R programs:

Week 2 example (linear basis function models): script, functions.
Week 3 example (simple Monte Carlo for Bayesian inference): script, functions.
Week 4 lecture example (Bayesian linear basis function models): script, functions.
Week 6 lecture example (Gaussian process regression): script, functions.
Week 8 lecture example (Gaussian mixture model with EM): script, functions.
Week 10 lecture example (neural network regression model): script, functions, plots.
Week 10 lecture example (PCA): functions.

Web pages for past related courses:

STA 414/2104 (Spring 2013)
STA 414/2104 (Spring 2012)
STA 414/2104 (Spring 2011)
STA 414/2104 (Spring 2007)
STA 414/2104 (Spring 2006)
CSC 411 (Fall 2006)
STA 410/2102 (Spring 2004) - has many examples of R programs