Instructor:
Radford Neal
Phone: (416) 978-4970
Office: SS6016A (but this will change in a few weeks)
Email: radford@stat.utoronto.ca
Office hours: Thursdays, 4:40pm to 5:30pm, in SS6016A.
Lectures:
Mondays 6:10pm to 9:00pm, from September 14 to November 30, except for October 12 (Thanksgiving), plus Wednesday November 11 from 6:10pm to 9:00pm, which makes up for the lecture missed on Thanksgiving.
Lectures are in Sidney Smith Hall, 100 St. George Street, room 2110.
Textbook:
R. A. Johnson and D. W. Wichern, Applied Multivariate Statistical Analysis, 6th edition.
You can get the datasets used as examples in the text, plus some proofs omitted from the book, from this web page. Click on "Take a closer look".
Computing:
Some assignment questions will require use of the R statistics package. You can use this package on the CQUEST computer system, or install it for free on your own computer (MS Windows, Macintosh, or Linux).
You'll be able to get a CQUEST account once classes start at www.cquest.utoronto.ca.
The R package and documentation are at www.r-project.org. Here are some direct links to things available there:
- An introduction to R
- Current version of R for Windows. Click on the link for R 2.9.2 to download the setup program.
- R for Mac OS X.
- R for Linux.
Evaluation:
30% Three assignments, worth 6%, 12%, 12%.
25% Mid-term test, scheduled for Oct. 26, BA 1170, 6:10-8:30.
45% Final exam, scheduled by the Faculty during the exam period.
The first assignment has pen-and-paper exercises, and is due Oct. 8.
The second and third assignments will involve substantial data analysis using R. The third will be submitted in two parts. The first part will be your solution. I will then release a model solution. A week later, you will hand in a critique of your solution, identifying what you think you did right and wrong. The grade for the assignment will be based on both parts.
Assignments:
NOTE: The assignments are worth 6%, 12%, and 12% of the course grade, as stated above. Ignore any contrary information on the assignment handouts.
Assignment 1: handout, solutions.
Assignment 2: handout, data set 1, data set 1 description, data set 2, data set 2 description, hints on using R.
Note: There's a typo in the assignment. Where it says "height to the power p divided by weight to the power q", it should read "weight to the power p divided by height to the power q". Also, you may find it useful to use the "apply" function with a second argument of 1 in order to find means or medians of a set of variables. And you may find it useful to select a subset of observations in a data frame with something like d[d$class==2, ].
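The two R hints above can be sketched roughly as follows. The data frame here is made up for illustration (the variable names class, height, and weight and all the values are assumptions, not the assignment data), so treat it as a minimal sketch rather than part of a solution.

```r
# Hypothetical data frame for illustration only; NOT one of the assignment data sets.
d <- data.frame(
  class  = c(1, 2, 2, 1, 2),
  height = c(150, 160, 170, 180, 175),
  weight = c(50, 60, 72, 80, 68)
)

# The set of variables to summarize for each observation (assumed names).
vars <- c("height", "weight")

# apply() with second argument 1 works across each row, so this gives
# one mean (or median) per observation, taken over the chosen variables.
row_means   <- apply(d[, vars], 1, mean)
row_medians <- apply(d[, vars], 1, median)

# Select only the observations whose class is 2. The empty slot after
# the comma keeps all columns.
class2 <- d[d$class == 2, ]
```

With a second argument of 2 instead of 1, apply would work down each column, giving one value per variable rather than one per observation.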
Model solutions: data set 1, data set 2.
Assignment 3: handout, data set 1 (as modified), data set 1 description, data set 2, data set 2 description, hints on using R.
Test:
Held October 26, in Bahen room 1170, from 6:10 to 8:30.
Here are the questions from last year's midterm test. Note that the last question is on material that won't be covered on this year's mid-term test.
The test will cover all material from lectures so far (and related material from the book). It will be closed book: no books or notes are allowed. Calculators will not be needed. I will provide any really complicated formulas needed, but you should remember the simple ones.
Here is the test paper and the answers.
Lecture topics:
We will likely cover most of Chapters 1 to 5, part of Chapters 6 and 7, most of Chapters 8 and 9, and part of Chapter 11.
You should now have read Chapters 1, 2, 3, 4, 5, 8 and 9 of the text, and you should be reading Chapter 6.
Here are the topics and sections covered each week (this list may not be complete).
Sep. 14: Topics and applications of multivariate analysis, Data organization, Sample statistics, Scatterplots, Demonstration of R and of plots for data analysis. R scripts used are here and here. Text: 1.1-1.4
Sep. 21: Review of sample statistics; Meaning of a random sample; Means, covariances, correlations for random vectors; Estimation of mean, covariance, etc. from sample statistics; Effects of linear transformations; Start of discussion of normal distribution. Text: 2.5-2.6, 3.3, 3.6, 4.1-4.2.
Sep. 28: Multivariate normal distributions, MVN density function, positive definite matrices, properties of multivariate normal; Eigenvalues and eigenvectors, especially of covariance matrices; Distribution of sample mean and covariance. Text: 2.3, 4.1-4.2, 4.4.
Oct. 5: Central Limit Theorem; Maximum likelihood estimation; Statistical distance; Assessing normality and finding outliers, QQ plots; Transformations to make data closer to being normally-distributed. Text: 1.5, 4.3, 4.5-4.8.
Oct. 12: THANKSGIVING. No lecture.
Oct. 19: Review of testing hypotheses about mean of univariate normal distribution with t statistic; Introduction to testing hypotheses about mean of multivariate normal distribution with T² statistic. Text: 5.1-5.2.
Oct. 26: MIDTERM TEST. No lecture.
Nov. 2: Answers to the midterm test questions; T² test as a likelihood ratio test; confidence regions from T² test; simultaneous confidence intervals from T² confidence region and from Bonferroni correction. Text: 5.3-5.4.
Nov. 9: Principal component analysis; Introduction to factor analysis. Text: 8.1-8.4, 9.1-9.2.
Nov. 11: More on factor analysis; Demo of PCA and factor analysis in R. Text: 9.3-9.5.
Web page for previous version of this course: