STA 414/2104 - DISCUSSION FOR ASSIGNMENT #2.

For dataset A, with 250 training cases, linear logistic regression was best, with the linear discriminant method being only slightly worse, both having about 7.6% error rate. The quadratic discriminant was a bit worse, with 8.8% error rate, and quadratic logistic regression was worst, with an error rate of 9.8%. The difference between the 7.6% error rate of linear logistic regression and the 9.8% error rate of quadratic logistic regression must result from the small size of the training set, since the quadratic model includes the linear model as a special case. With a larger training set, the quadratic logistic regression model would come to equal, and perhaps surpass, the linear logistic regression model. However, with a small dataset, the larger number of parameters to be estimated with the quadratic methods introduces more variability into the results, degrading performance.

The near-equal error rates of the linear discriminant method and linear logistic regression are matched by near equality in their linear coefficients, as seen in the plot. This would be expected if the distributions of the input variables within each class are close to being multivariate Gaussian. The univariate histograms for the six inputs show no clear departures from Gaussian distributions. The scatterplot of variables 1 and 2 also shows Gaussian-like distributions, and these two variables alone are enough to provide quite a bit of separation of the classes.

For dataset B, with 2000 training cases, quadratic logistic regression performs best, with an error rate of 16.7%. The quadratic discriminant method does less well, with an error rate of 19.1%, and the two linear methods do worst, with error rates of about 24%. Examining histograms of the six input variables, we can see that variable 1 has a bimodal distribution within class 1, and hence is clearly not Gaussian. The scatterplot of variables 1 and 3 shows that the two classes are somewhat separated by these variables, but in a way that cannot be captured by a linear discriminant. It is therefore not surprising that the linear methods do not do well. Although the quadratic discriminant does better, the fact that the Gaussian assumption it relies on is false probably explains why it does not do as well as quadratic logistic regression.

Note that the results obtained may change slightly according to whether the sample covariance matrix is computed with division by n-1 (as I did), or with division by n. Results may also change depending on how ties are broken when choosing w0 to minimize the error rate on training cases. (Such a tie occurs for the quadratic discriminant method on dataset A.)
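
For concreteness, here is a minimal sketch of how the four methods compared above could be fit. This was not the required approach for the assignment; it uses Python with scikit-learn, the stand-in data arrays are hypothetical placeholders for a dataset's inputs and labels, and quadratic logistic regression is obtained by expanding the inputs with squares and pairwise products before fitting a linear logistic model:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.discriminant_analysis import (
        LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)
    from sklearn.preprocessing import PolynomialFeatures

    # Random stand-in data; substitute the actual dataset A or B arrays.
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(250, 6)); y_train = rng.integers(0, 2, 250)
    X_test  = rng.normal(size=(1000, 6)); y_test  = rng.integers(0, 2, 1000)

    def error_rate(model, X_tr, y_tr, X_te, y_te):
        # Fit on training cases, report fraction misclassified on test cases.
        model.fit(X_tr, y_tr)
        return np.mean(model.predict(X_te) != y_te)

    # Linear methods use the six inputs directly.  A large C makes
    # scikit-learn's logistic regression close to unregularized maximum
    # likelihood (scikit-learn regularizes by default).
    print(error_rate(LogisticRegression(C=1e6, max_iter=1000),
                     X_train, y_train, X_test, y_test))
    print(error_rate(LinearDiscriminantAnalysis(),
                     X_train, y_train, X_test, y_test))

    # Quadratic logistic regression: augment the inputs with all squares
    # and pairwise products, so the linear fit is quadratic in the originals.
    quad = PolynomialFeatures(degree=2, include_bias=False)
    Xq_train = quad.fit_transform(X_train)
    Xq_test = quad.transform(X_test)
    print(error_rate(LogisticRegression(C=1e6, max_iter=1000),
                     Xq_train, y_train, Xq_test, y_test))

    # Quadratic discriminant: a separate Gaussian, with its own covariance
    # matrix, is fit within each class.
    print(error_rate(QuadraticDiscriminantAnalysis(),
                     X_train, y_train, X_test, y_test))

This expansion also makes the parameter-count point above concrete: with six inputs, the quadratic model estimates 27 coefficients rather than 6, which is where the extra variability on the small dataset A comes from.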
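
The diagnostic plots referred to above (per-class histograms of each input, and a class-coloured scatterplot of two variables) can be produced along these lines; this is one possible sketch using matplotlib, again with random stand-in data:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    X = rng.normal(size=(250, 6))      # stand-in for the training inputs
    y = rng.integers(0, 2, 250)        # stand-in for the class labels

    # One histogram per input variable, overlaid by class, to check for
    # departures from Gaussian shape (e.g. the bimodality in dataset B).
    fig, axes = plt.subplots(2, 3, figsize=(9, 6))
    for j, ax in enumerate(axes.flat):
        for cls in (0, 1):
            ax.hist(X[y == cls, j], bins=20, alpha=0.5, label=f"class {cls}")
        ax.set_title(f"variable {j + 1}")
    axes.flat[0].legend()

    # Scatterplot of two variables, coloured by class, to see how much
    # separation they provide and whether it is linear in form.
    plt.figure()
    for cls in (0, 1):
        m = y == cls
        plt.scatter(X[m, 0], X[m, 1], s=8, label=f"class {cls}")
    plt.xlabel("variable 1"); plt.ylabel("variable 2"); plt.legend()
    plt.show()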
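
To make the n-1 versus n point concrete, the two covariance estimates differ only by the constant factor n/(n-1), but that is enough to move discriminant boundaries slightly and flip a few borderline classifications. In NumPy the choice is controlled by the ddof argument to np.cov:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(250, 6))          # stand-in for one class's inputs

    S1 = np.cov(X, rowvar=False, ddof=1)   # divide by n-1 (what I used)
    S0 = np.cov(X, rowvar=False, ddof=0)   # divide by n (maximum likelihood)

    # The two estimates differ only by the factor n/(n-1).
    n = X.shape[0]
    assert np.allclose(S1, S0 * n / (n - 1))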
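
The tie-breaking issue when choosing w0 can also be seen in code. The following sketch (my own illustration, not the prescribed procedure) scans candidate thresholds over the discriminant scores and minimizes training error; the comment marks where the arbitrary tie-breaking choice enters:

    import numpy as np

    def choose_w0(scores, labels):
        # Candidate thresholds: midpoints between consecutive sorted
        # scores, plus points below and above every score.
        s = np.sort(scores)
        candidates = np.concatenate(
            ([s[0] - 1.0], (s[:-1] + s[1:]) / 2, [s[-1] + 1.0]))
        errs = np.array([np.mean((scores > t) != (labels == 1))
                         for t in candidates])
        # np.argmin takes the FIRST minimum; when several thresholds tie
        # on training error (as for the quadratic discriminant on dataset
        # A), a different tie-breaking rule gives a different w0 and can
        # change the test error slightly.
        t = candidates[np.argmin(errs)]
        return -t   # classify as class 1 when score + w0 > 0

    # Example with synthetic scores; labels are in {0, 1}.
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, 50)
    scores = labels + rng.normal(scale=1.5, size=50)
    print(choose_w0(scores, labels))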