STA 414, Assignment #1, discussion.

As can be seen from the output, the best value for lambda according to cross validation is approximately 0.09, which produces a C.V. average squared error of 0.009278968. Values of 0.08 and 0.10 give only very slightly higher C.V. average squared errors, while much smaller values of lambda (e.g., 0.0001) and much larger values (e.g., 5) produce substantially higher C.V. average squared errors.

Actual performance (by squared error) on the test cases is not best with lambda set to 0.09; a value of around 0.5 gives the best results. The difference in test error (0.0124 with lambda of 0.5 versus 0.0148 with lambda of 0.09) is not huge, but it's not trivial. Looked at the other way, the C.V. assessment of performance for lambda equal to 0.5 is not much worse than for lambda equal to 0.09, which was chosen as the best (0.0108 versus 0.00928).

It's not surprising that the best lambda by C.V. isn't the best for the test cases; it would be too much to expect C.V. to work absolutely perfectly. The difference in test performance between the lambda chosen by C.V. and the lambda that is actually best on the test cases is perhaps a bit bigger than we might hope for, however. Cross validation is sometimes a rather noisy procedure. Note also that for all reasonably good values of lambda, the C.V. average squared error is lower than the average squared error on the test cases. This is probably because the training cases happen, just by chance, to be easier to predict.

The contour plots of predictions for test cases with lambda set to 0.0001, 0.09 (the best by C.V.), and 5 show that with a very small lambda the contours are rather rough, fitting noise in the training data rather than showing the real relationship, while with lambda set to 5 the contours are very smooth, smoother than they should be. (Both these statements are supported by the poor performance of these lambdas on the test data.) The contours with the best lambda chosen by C.V. are intermediate, showing some detail, but not lots of detail that looks like noise.
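For reference, the lambda-selection procedure discussed above (scan a grid of lambda values, compute a cross-validation average squared error for each, pick the minimum, then check performance on the test cases) can be sketched as follows. This is only an illustration, not the model actually fitted for the assignment: ordinary ridge regression stands in for the penalized model, and the data, lambda grid, and function names (ridge_fit, k_fold_cv_error) are all made up for this example.

import numpy as np

def ridge_fit(X, y, lam):
    """Penalized least squares: minimize ||y - X b||^2 + lam * ||b||^2."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def k_fold_cv_error(X, y, lam, k=10, perm=None):
    """Average squared prediction error over k cross-validation folds."""
    n = len(y)
    idx = np.arange(n) if perm is None else perm
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)        # all cases not in this fold
        b = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((y[fold] - X[fold] @ b) ** 2))
    return np.mean(errs)

# Made-up training and test data, just to exercise the procedure.
rng = np.random.default_rng(1)
n, p = 100, 5
X = rng.normal(size=(n, p)); beta = rng.normal(size=p)
y = X @ beta + 0.3 * rng.normal(size=n)
X_test = rng.normal(size=(50, p))
y_test = X_test @ beta + 0.3 * rng.normal(size=50)

# Same folds for every lambda, so the C.V. errors are directly comparable.
perm = rng.permutation(n)
lambdas = [0.0001, 0.01, 0.05, 0.08, 0.09, 0.10, 0.5, 1, 5]
cv_errs = [k_fold_cv_error(X, y, lam, k=10, perm=perm) for lam in lambdas]

best = lambdas[int(np.argmin(cv_errs))]
b = ridge_fit(X, y, best)
test_err = np.mean((y_test - X_test @ b) ** 2)
print("best lambda by C.V.:", best, " test average squared error:", test_err)

The point of keeping the fold assignment fixed across the lambda grid is that the comparison between lambdas is then not affected by which cases happen to land in which fold; the remaining noise in the C.V. estimates is the kind of noise referred to above.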