STA 437/1005, Fall 2008, Discussion for Assignment 3

As described in the assignment sheet, I did PCA in six ways, finding 4 PCs in each case. The 18 observations for the alpha experiment and the 24 observations for the cdc15 experiment were then projected onto these PCs and plotted in time order. These plots are on the first six pages of the PDF file of plots on the web page.

Looking at these plots, one can see that using the covariance matrix for data from both experiments produces clear cyclic behaviour in PC 2 and PC 4. PC 3 has less clear cyclic behaviour, and PC 1 seems to have nothing to do with the cell cycle. The top two plots on the seventh page of the PDF file show PC 2 and PC 4 plotted against each other for the observations in the alpha and cdc15 experiments, confirming that they show cyclic behaviour.

PCA on covariance matrices from just the alpha experiment and just the cdc15 experiment also seems to produce PCs that show cyclic behaviour, but it seems a bit less clear than when using both experiments. Using the correlation matrix rather than the covariance matrix results in PCs that seem to have much less cyclic behaviour. Since these experiments were designed to show the cell cycle, we might expect that the genes that are more variable are likely to be the ones that are related to the cycle, so using the correlation matrix (i.e., rescaling variables to have the same variance) may discard relevant information.

I therefore chose PC 2 and PC 4, found using the covariance matrix of data from both experiments, as the basis for choosing genes that show cyclic behaviour. A large magnitude for the coefficient of a gene in the eigenvector for either of these two PCs means that the gene makes a large contribution to the value of at least one of these PCs, and hence may be cyclic.
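As a rough illustration of the covariance versus correlation choice and of projecting each experiment onto the PCs, here is a minimal Python sketch using NumPy. It is not the code used for the assignment: the function names `pca_directions` and `project` are hypothetical, and the data are random numbers with 50 made-up "genes" standing in for the real expression measurements, so no cyclic pattern should be expected in the output.

```python
import numpy as np

def pca_directions(X, k=4, use_correlation=False):
    """Return column means, column scales, and the first k eigenvectors.

    X has one row per observation and one column per gene.
    """
    mean = X.mean(axis=0)
    # Covariance PCA keeps each gene on its original scale; correlation PCA
    # divides each gene by its standard deviation (assumed non-zero here).
    if use_correlation:
        scale = X.std(axis=0, ddof=1)
    else:
        scale = np.ones(X.shape[1])
    Z = (X - mean) / scale
    # The right singular vectors of the centred data are the eigenvectors of
    # its covariance matrix, ordered by decreasing variance. This avoids
    # forming a genes-by-genes matrix when there are many genes.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mean, scale, Vt[:k].T  # columns are eigenvectors

def project(X, mean, scale, V):
    """Project observations onto the eigenvectors in the columns of V."""
    return ((X - mean) / scale) @ V

# Synthetic stand-in data (an assumption for illustration only).
rng = np.random.default_rng(0)
alpha = rng.normal(size=(18, 50))   # 18 observations, alpha experiment
cdc15 = rng.normal(size=(24, 50))   # 24 observations, cdc15 experiment
both = np.vstack([alpha, cdc15])

# PCs from the covariance matrix of both experiments together.
mean, scale, V = pca_directions(both, k=4)
alpha_scores = project(alpha, mean, scale, V)   # rows are in time order
cdc15_scores = project(cdc15, mean, scale, V)

# The same call with use_correlation=True gives the correlation-matrix PCs,
# and passing alpha or cdc15 alone gives the single-experiment versions.
_, _, V_corr = pca_directions(alpha, k=4, use_correlation=True)
```

Plotting a column of `alpha_scores` or `cdc15_scores` against the row index would give the kind of time-ordered plot described above, and plotting two columns against each other would give the PC-versus-PC view.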
For each gene, I added the squares of its coefficients in the two eigenvectors to obtain an overall measure of how relevant the gene is to the cell cycle (other ways of combining these coefficients would also be possible). The top 800 genes by this measure include 82.6% of the previously identified cyclic genes. This high percentage confirms that the measure has some validity as an indicator of cyclic behaviour.

However, a histogram of this measure of cyclicity shows that its distribution over genes is unimodal, and that the cutoff for the top 800 genes does not correspond to any clear division into cyclic and non-cyclic genes. It may be that most genes are cyclic to some degree, so that a sharp cyclic/non-cyclic classification may not be appropriate.
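The scoring and comparison steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the five gene names, the coefficient values, and the "known" cyclic set are all made up, and a cutoff of 2 genes stands in for the 800 used in the assignment.

```python
def cyclicity_scores(coef_a, coef_b):
    """Sum of squared coefficients of each gene in two eigenvectors."""
    return [a * a + b * b for a, b in zip(coef_a, coef_b)]

def top_genes(scores, names, n_top):
    """Names of the n_top genes with the largest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [names[i] for i in order[:n_top]]

def fraction_recovered(selected, known):
    """Fraction of the known cyclic genes that appear in the selection."""
    known = set(known)
    return len(known & set(selected)) / len(known)

# Hypothetical coefficients of five genes in the eigenvectors for PC 2 and PC 4.
names = ["g1", "g2", "g3", "g4", "g5"]
pc2 = [0.10, -0.70, 0.20, 0.00, 0.68]
pc4 = [0.60, 0.10, -0.10, 0.05, 0.30]

scores = cyclicity_scores(pc2, pc4)
selected = top_genes(scores, names, n_top=2)

# Hypothetical list of previously identified cyclic genes.
known_cyclic = ["g2", "g3"]
recovered = fraction_recovered(selected, known_cyclic)
```

Squaring makes the sign of a coefficient irrelevant, so a gene counts as relevant whether it moves with or against a PC. A histogram of `scores` over all genes is what one would inspect to see whether any cutoff separates two groups.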