STA 414/2104, Spring 2006, Assignment #1, Discussion.  R. M. Neal

The prediction errors of knn with different k are smallest with k=3, which is
also the value selected by knnsel, so cross validation worked well in this
example. Predictions are nearly as good for k=9, but substantially worse for
k=1 and k=27. Predictions are much worse for k=81, which is not surprising,
since 81 is a substantial fraction of the training set, so the 81 nearest
neighbors will include many cases that are far from the test case. (A sketch
of selecting k by cross validation in this way appears at the end of this
discussion.)

The results for knncombo are better than for knnsel, which shows that
combining the results for several values of k worked better than selecting a
single value of k in this example.

The knncombo method is related to the idea of using a weighted average of the
k nearest neighbors, rather than averaging all k training responses with equal
weights, since the final result of linearly combining the averages found with
different values of k is some linear combination of the responses of the
neighbors, up to the maximum value of k used. For example, if we used knncombo
with only 1 and 3 as the possible values for k, and we obtained an intercept
of b0=0.1 and regression coefficients of b1=0.2 and b2=0.6 for k=1 and k=3,
the result is the same as using the prediction

    0.1 + (0.2 + 0.6/3) * y1 + (0.6/3) * y2 + (0.6/3) * y3

where y1, y2, and y3 are the responses of the nearest, next nearest, and third
nearest neighbors of the test point. This is similar to a weighted average of
y1, y2, and y3. One difference, however, is that knncombo uses a regression
model with an intercept, which wouldn't appear in a weighted average. A second
difference is that a weighted average would usually use non-negative weights
that sum to one, whereas knncombo can produce negative regression
coefficients, and its coefficients needn't sum to one. Finally, note that
knncombo gives the same weight to every neighbor whose rank falls in the range
from one possible value of k in kvec to the next.

It's possible that modifying knncombo so that the intercept is zero and/or so
that the betas can't be negative might improve it.
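
To see the equivalence above concretely, here is a small numerical check in
R. The intercept and coefficients are the ones from the example; the neighbor
responses y1, y2, y3 are made-up values for illustration:

  y <- c(2.0, 5.0, -1.0)            # responses of the three nearest neighbors

  b0 <- 0.1; b1 <- 0.2; b2 <- 0.6   # intercept and coefficients from above

  avg1 <- mean (y[1])               # average over the 1 nearest neighbor
  avg3 <- mean (y[1:3])             # average over the 3 nearest neighbors

  combo  <- b0 + b1*avg1 + b2*avg3                           # knncombo form
  direct <- b0 + (b1+b2/3)*y[1] + (b2/3)*y[2] + (b2/3)*y[3]  # weighted form

  print (c(combo, direct))          # both give the same prediction, 1.7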
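
Finally, here is a minimal sketch of how k might be selected by leave-one-out
cross validation, in the spirit of knnsel. The function names and details
below are assumptions for illustration, not the actual course code:

  knn.pred <- function (x.train, y.train, x.test, k)
  { # Predict the response at x.test (a vector) by averaging the responses
    # of the k nearest training cases, by Euclidean distance.
    d <- sqrt (rowSums (sweep (x.train, 2, x.test)^2))
    mean (y.train [order(d)[1:k]])
  }

  cv.error <- function (x, y, k)
  { # Average squared leave-one-out prediction error for a given k.
    mean (sapply (1:nrow(x), function (i)
      (y[i] - knn.pred (x[-i,,drop=FALSE], y[-i], x[i,], k))^2))
  }

  select.k <- function (x, y, kvec)
  { # Pick the value in kvec with the smallest cross-validation error,
    # e.g. kvec <- c(1,3,9,27,81) as in the discussion above.
    kvec [which.min (sapply (kvec, function (k) cv.error (x, y, k)))]
  }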