STA 414, 2011 ASSIGNMENT 3 - DISCUSSION

After modifying the EM algorithm to handle missing values, I ran it three times each for models with 3 and 5 mixture components, using random seeds of 1, 2, and 3 to initialize the responsibilities, and continuing for 100 EM iterations. The accompanying plots show the log likelihood after each iteration of these runs. As it should, the log likelihood never decreased. In these runs, the log likelihood seems to have reached a value very close to its maximum within 50 iterations, so running for more than 100 iterations would probably not have changed the results. The runs for the 3-component model all reached a final log likelihood of -332.3623. Of the three 5-component runs, one reached a final log likelihood of -316.5747, and the other two reached the higher value of -315.1075.

One of the 3-component runs and one of the two higher-likelihood 5-component runs were used to fill in the missing covariates, allowing a linear model to use all the cases. A 1-component model was also fit and used to fill in the covariates; this is equivalent to filling in each missing value with the sample mean of that variable, and was easy to do given that the code for fitting mixture models was already written. Finally, a linear model was fit to just the complete cases (the default for lm when there are missing values).

The squared error on the test cases for these four linear models was as follows:

  Only complete cases:               0.865
  Filled in by sample means:         0.485
  Filled in with 3-component model:  0.473
  Filled in with 5-component model:  0.391

Using only the complete cases (6 out of 100) clearly performed very badly for this dataset. Using the 5-component model appears to be best, but this could be due in part or in whole to chance. One reason for suspicion is that the adjusted R-squared for the linear model with missing values filled in using the 5-component model is actually slightly smaller than that obtained using the 3-component model. A data set with only 100 training cases is small enough that randomness in how the filling-in procedure happens to work on the training cases could affect how good a method appears to be. However, it does seem likely that using a mixture model gives better results than simply filling in with the sample mean.
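
For reference, a minimal R sketch of the filling-in step described above is given below. This is not the assignment's actual code: it assumes a Gaussian mixture with diagonal covariances, with the fitted parameters supplied as mixing proportions w, a K x p matrix of component means mu, and a K x p matrix of component standard deviations sigma. Each missing value is replaced by its expected value under the posterior over components given that case's observed covariates.

  fill.in <- function (X, w, mu, sigma)
  {
    K <- length(w)
    for (i in 1:nrow(X))
    { obs <- !is.na(X[i,])
      if (all(obs)) next
      # Log posterior responsibilities for case i, using only its observed values.
      lp <- log(w)
      for (k in 1:K)
      { lp[k] <- lp[k] + sum (dnorm (X[i,obs], mu[k,obs], sigma[k,obs], log=TRUE))
      }
      r <- exp (lp - max(lp))
      r <- r / sum(r)
      # Replace each missing value by the responsibility-weighted component mean.
      X[i,!obs] <- colSums (r * mu[,!obs,drop=FALSE])
    }
    X
  }

With diagonal covariances the missing coordinates are independent of the observed ones within each component, so this responsibility-weighted mean is the conditional mean of the missing values under the fitted mixture.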
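
The comparison of the linear models could then be carried out along the following lines. Again, this is only a sketch under assumptions not stated in the write-up: training and test data frames trn and tst whose first column is the response y, the fill.in function from the sketch above, and the reported squared errors being means over the test cases.

  trn.filled <- trn
  trn.filled[,-1] <- fill.in (as.matrix(trn[,-1]), w, mu, sigma)

  model <- lm (y ~ ., data=trn.filled)    # linear model on the filled-in training data
  pred  <- predict (model, newdata=tst)   # predictions for the test cases
  mean ((tst$y - pred)^2)                 # average squared error on the test cases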