STA 414/2104, Spring 2014, Hints on using R for Assignment 1

Reading the data. You can read the x values from the files "train1x", "train2x", and "testx" using 'read.table', with the header=FALSE argument, to tell it that there is no header line. This will give you a "data frame" containing the covariates. You should convert this to a matrix using the 'as.matrix' function, since otherwise accesses to elements of it will be rather slow. You can read the y values with either 'read.table' or 'scan'.

Fitting a linear model. You should fit the simple linear model by least squares using R's 'lm' function. It takes as its argument a "formula", such as y ~ x1+x2, which means that the responses in the vector y are modeled by a linear model with covariates x1 and x2, which may also be vectors, or which may be matrices, in which case each column of the matrix is a covariate. If you converted the covariates from the data files to a matrix, you can therefore say that all covariates are to be used in the model by just specifying the matrix of training inputs. An intercept is included in the model by default as well (as desired for this assignment).

The 'lm' function returns a model object, which you should assign to a variable. You can then look at the results (including regression coefficients and the estimated standard deviation of the residuals) using the 'summary' function. You can get the regression coefficients (including the intercept as the first one) with the 'coef' function. You'll need the coefficients to make predictions for the test cases. The 'mean' function will be useful in computing the average squared error on the test cases. To create matrices that have squares and cubes of the original covariates, it may be convenient to use the 'cbind' function, which pastes together two or more matrices side-by-side. A short sketch of this part is given below.

Defining covariance functions. To use the Gaussian process functions from the week 6 demo, you will need to define suitable covariance functions, each of which will take as arguments a single covariate vector, a matrix of covariates for a set of cases (with each row being the covariate vector for one case), and a vector of hyperparameter values. You should ignore the first hyperparameter, which is the noise standard deviation, and is not used in defining the noise-free covariance function.

One R trick that may be useful is that you're allowed to add or subtract a vector and a matrix, if the length of the vector is equal to the number of rows in the matrix. The result of such an operation is found by adding or subtracting the vector to or from each column of the matrix. Combining this with the transpose operation ('t' in R) may be useful. You may also find the rowSums or colSums functions useful - they take a matrix as their argument, and return a vector found by summing all elements in each row, or each column. The sketch below shows one way to use these tricks.

Cross validation. You'll need to extend the week 6 example functions with one to find hyperparameters based on 10-fold cross validation. You can minimize the cross validation error with the 'nlm' function, as was used in the week 6 example functions to maximize the likelihood (but without the minus sign used there to switch from maximization to minimization). You'll need to pass 'nlm' a function for computing the cross validation error. The 'seq' function may be useful in dividing the training data into 10 parts. You can find examples of cross validation in the week 2 example program, as well as in example programs and assignment solutions from previous years.
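To make the steps above more concrete, here is a rough sketch of the linear model part. The file names "train1y" and "testy" for the response values are guesses - use whatever names the assignment handout actually gives.

    trainx <- as.matrix(read.table("train1x",header=FALSE))
    trainy <- scan("train1y")                 # assumed file name
    testx  <- as.matrix(read.table("testx",header=FALSE))
    testy  <- scan("testy")                   # assumed file name

    fit <- lm(trainy~trainx)                  # intercept included by default
    summary(fit)                              # coefficients, residual std. dev.
    b <- coef(fit)                            # b[1] is the intercept

    pred <- b[1] + testx %*% b[-1]            # predictions for the test cases
    mean((testy-pred)^2)                      # average squared test error

    trainx3 <- cbind(trainx,trainx^2,trainx^3)  # covariates plus squares and cubes

For the covariance functions, here is one possible form (a squared exponential, shown only to illustrate the vector/matrix subtraction trick with 't' and 'colSums' - use whatever form the assignment actually asks for). The arguments are a single covariate vector x, a matrix xm of covariates with one case per row, and a hyperparameter vector h, with h[1] (the noise standard deviation) ignored:

    cov.fn <- function (x, xm, h)
    {
      # t(xm) has one case per column, so subtracting the vector x removes
      # it from every case; colSums then gives squared distances to x.
      d2 <- colSums((t(xm)-x)^2)
      h[2]^2 * exp(-d2/(2*h[3]^2))
    }

And here is a rough outline of 10-fold cross validation, assuming the number of training cases is a multiple of 10, and assuming a function pred.error that computes the average squared error of Gaussian process predictions on a validation set (a made-up name - you'll need to write it yourself using the week 6 functions):

    cv.error <- function (h, x, y)
    {
      n <- nrow(x)
      err <- 0
      for (i in 1:10)
      { val <- seq(i,n,by=10)                 # every 10th case goes in fold i
        err <- err + pred.error (x[-val,,drop=FALSE], y[-val],
                                 x[val,,drop=FALSE], y[val], h)
      }
      err / 10
    }

Since 'nlm' minimizes, cv.error can be passed to it directly (no minus sign needed), with the data supplied as extra arguments - for example, nlm(cv.error,c(1,1,1),x=trainx,y=trainy), where c(1,1,1) is just a guess at initial hyperparameter values.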
Importance sampling. You'll also need to extend the week 6 example functions with one to make predictions using Bayesian averaging, done with importance sampling. You can generate a vector of k independent normal random variates with means given by the vector mu and standard deviations given by the vector sd with the expression rnorm(k,mu,sd). You'll need to do this to generate values from the prior distribution. It is good practice to set the random number seed to some constant before starting random number generation, with a call such as set.seed(1). This way, you get the same results if you run the program again, which makes finding bugs a lot easier.

When doing prediction by importance sampling from the prior, the weight given to the predictions with a sampled parameter value is the probability density of the training data given that parameter value. Such a probability density for an entire training set is often very small or very large - so small or large that underflow or overflow may occur, when the number can't be represented on the computer (using the usual floating-point scheme). The usual way around this is to work in terms of the log of the probability density, without ever computing the probability density itself (there are tricks for doing that). However, for this assignment, you can use the actual probability densities, since they don't underflow or overflow with the data sets you are using.

You may find it useful to return several quantities from the importance sampling function. This can be done by putting them together with the 'list' function. A rough sketch of how the pieces might fit together is given below.
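In this sketch, the helper names data.density (probability density of the training data given a sampled hyperparameter vector) and gp.predict (Gaussian process predictions with those hyperparameters) are made up for illustration - replace them with your own functions built from the week 6 example code, and replace prior.mu and prior.sd with the actual prior means and standard deviations.

    imp.pred <- function (n.samples, prior.mu, prior.sd, trainx, trainy, testx)
    {
      pred <- 0
      w <- numeric(n.samples)
      for (s in 1:n.samples)
      { h <- rnorm (length(prior.mu), prior.mu, prior.sd)  # draw from the prior
        w[s] <- data.density (trainx, trainy, h)     # importance weight
        pred <- pred + w[s] * gp.predict (trainx, trainy, testx, h)
      }
      list (pred = pred/sum(w), weights = w)         # return both with 'list'
    }

    set.seed(1)   # set the seed once, before any random number generation
    # r <- imp.pred (1000, prior.mu, prior.sd, trainx, trainy, testx)

The predictions here are averaged with weights proportional to the probability density of the training data under each sampled hyperparameter value, as described above.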