USEFUL R COMMANDS FOR ASSIGNMENT 3

You may wish to consult the hints for the previous assignment as well.
Short worked examples using these commands are given at the end of this
handout.

SCALING DATA

An easy way to scale the data is with the "scale" function:

    Xs = scale(X)

This produces a matrix, Xs, of scaled observations from a matrix or data
frame, X. In the scaled matrix, variables are centred to have sample mean
zero, and scaled to have sample standard deviation one.

PRINCIPAL COMPONENT ANALYSIS

PCA can be done using the built-in prcomp function. It takes a matrix or
data frame as its argument, and returns a list in which the element
"rotation" has all the principal components (as columns of a matrix), and
"sdev" has the standard deviations in those directions (ie, the square
roots of the eigenvalues). For example,

    pc = prcomp(X)
    pc1 = pc$rotation[,1]
    pc2 = pc$rotation[,2]
    lambda1 = pc$sdev[1]^2
    lambda2 = pc$sdev[2]^2

Note that prcomp centres the data by default (ie, it looks at the data
after subtracting the sample means, as is standard). By default, prcomp
does not scale the data to have variance one. It can be made to do so with
the scale=T option, but you might be better off scaling the data yourself,
so that you can also look at the scaled data directly if you want to.

FACTOR ANALYSIS

Maximum likelihood factor analysis can be done using the built-in factanal
function. It takes a data frame or matrix as its first argument, and the
number of common factors to use in the model as its second argument. It
returns a list with the matrix L as the element "loadings" and the vector
of specific variances as the element "uniquenesses". For example,

    fa = factanal(X,2)
    L = fa$loadings
    psi = fa$uniquenesses

The loadings matrix is displayed in a special way. If you'd rather look at
it as an ordinary matrix, use unclass(fa$loadings).

Note that factanal always scales the variables to have sample variance one
before fitting the model. (This can be annoying: although scaling makes no
real difference to the fit, we sometimes want to look at the loadings and
uniquenesses in terms of the original scales.)

LINEAR REGRESSION

Ordinary least-squares linear regression can be done in R using the lm
function. Example:

    m = lm(y ~ X + v)

This fits a regression model for values in the vector y in terms of the
variables in X and v. If the response vector y has length 100, then we
might use a matrix X with 100 rows and 4 columns, each column being a
covariate, and a vector v of length 100 as another covariate. Of course,
we could also just say lm(y~X) or lm(y~v) to use just the covariates in X
or the covariate in v.

The summary function can be used to look at the estimated coefficients,
the Adjusted R-squared value, and other things of interest:

    summary(m)
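
WORKED EXAMPLES

The examples below are just sketches, and are not part of the assignment.
They use made-up data: a hypothetical 100-by-6 matrix X, generated (for
illustration only) to have some hidden two-factor structure, so the later
examples have something to find. First, scaling:

    # Made-up data: 100 observations of 6 variables, built from two
    # hidden factors plus noise (for illustration only).
    set.seed(1)
    f = matrix(rnorm(200), nrow=100, ncol=2)
    X = f %*% matrix(rnorm(12), nrow=2, ncol=6) + matrix(rnorm(600), nrow=100, ncol=6)

    # After scaling, each column has mean zero and standard deviation one.
    Xs = scale(X)
    round(colMeans(Xs), 10)   # all (essentially) zero
    apply(Xs, 2, sd)          # all one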
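
Continuing with the made-up X and Xs from above, here is one way to see
how much of the variance the first two principal components account for,
and to project the observations onto them (the names lambda and scores are
arbitrary):

    pc = prcomp(Xs)                     # PCA on the scaled data
    lambda = pc$sdev^2                  # variances (eigenvalues) of the PCs
    lambda[1:2] / sum(lambda)           # fraction of variance in first two PCs
    scores = Xs %*% pc$rotation[,1:2]   # projections onto the first two PCs
                                        # (the same thing is in pc$x[,1:2])
    plot(scores[,1], scores[,2])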
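
For factor analysis, one informal check (again with the made-up X) is to
compare the correlation matrix implied by the fitted model, LL' + Psi,
with the sample correlation matrix, which is what the model is trying to
reproduce, since factanal works with standardized variables:

    fa = factanal(X, 2)
    L = unclass(fa$loadings)           # loadings as an ordinary 6-by-2 matrix
    psi = fa$uniquenesses              # specific variances
    implied = L %*% t(L) + diag(psi)   # correlation matrix implied by the model
    round(implied - cor(X), 3)         # entries near zero suggest a good fit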
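
Finally, a regression sketch, using the made-up six-column X from above
together with a made-up covariate v and a made-up response y that actually
depends on the covariates (the true coefficients used here are arbitrary):

    v = rnorm(100)                     # one more made-up covariate
    y = drop(X %*% c(2,-1,0,1,0,0.5)) + 3*v + rnorm(100)   # made-up response

    m = lm(y ~ X + v)   # regress y on the columns of X and on v
    summary(m)          # estimated coefficients, Adjusted R-squared, etc.
    coef(m)             # just the estimated coefficients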