STA 410/2102, Fall 2014, Discussion

This discussion refers to the output and plots available from the course webpage.

Performance of optimization methods.

The "nlm" and "optim" functions were tried on the datasets generated with seed 1, using two initialization schemes. Both n=200 and n=600 were tried. The first initialization assumes 50% twins and no trend, initializing "a" to fit the general level of births given these assumptions. The second initialization does a regression fit, and assumes 10% twins (small enough that no correction was made to the regression fit to account for twins).

For n=200, both optimization methods and both initializations gave estimates with a log likelihood of -421.0734, to four decimal places. The estimates also agreed closely, though "w" varied from about -15 to -20 (all of which give a probability of nearly zero). The standard errors for "a" and "b" agreed closely for all methods, and the estimates and standard errors seem to be good, in relation to the true values used to generate the data. The standard error for "w" was large, and was "NaN" for "nlm" with the first initialization. When "w" is estimated as -15 or less, so "p" is close to zero, it makes sense that the exact value of "w" is highly uncertain, though the huge standard errors are not quite right, in that they allow for large positive values of "w" that are actually incompatible with the data ("p" can't be very close to 1, since some of the observed counts are odd numbers).

For n=600, the two methods and two initializations all gave estimates with similar, but not identical, log likelihoods, with values varying from -1487.3183 to -1485.5141. Though this range is fairly small, the run that gave the lowest log likelihood (optim with the first initialization) did give estimates that were furthest from the true values (and was the only run to give an estimate for "w" that was very far from the true value of -2.9). The best results, in terms of both log likelihood and actual accuracy, were from "nlm" with the first initialization and "optim" with the second initialization.

The time used by "nlm" was always less than the time used by "optim", sometimes only slightly, but sometimes by up to a factor of six. There seems to be no clear winner here, with regard to either the choice of optimization method or the initialization, but if computation time is an issue, one might choose "nlm", since it isn't clearly worse in terms of results, and is definitely faster. (Note that both "nlm" and "optim" might perform differently with different options, or if provided with the gradient or the Hessian matrix rather than just the function values.)

Checking whether standard errors are reliable.

Plots were made of standard errors versus actual errors for 20 datasets with n=600, using both "nlm" and "optim", always with the first initialization method. Four runs (three with "optim", one with "nlm") gave "NaN" as the standard error for "w"; these points were omitted from the plots. For "w", a few very large standard errors occurred, so plots with these omitted were also produced (allowing the other points to be seen in more detail). The standard errors produced using "nlm" and "optim" were largely similar, though not identical.
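Although the assignment's actual code isn't reproduced here, the general pattern for getting estimates and observed-information standard errors out of both optimizers is roughly as in the sketch below. The made-up data, the stand-in negative log likelihood (a plain Poisson fit with mean a + b*t, with no twins parameter "w" at all), and the choice of method="BFGS" for "optim" are illustrative assumptions, not the model or settings actually used.

  # Minimal sketch: estimates and observed-information standard errors from
  # "nlm" and "optim".  The data and negative log likelihood are stand-ins.

  set.seed(1)
  n <- 200
  t <- 1:n
  x <- rpois (n, 30 + 0.01*t)              # made-up daily birth counts

  neg.log.lik <- function (theta, x, t)    # theta = c(a,b); stand-in likelihood
  { lambda <- theta[1] + theta[2]*t
    if (any (lambda <= 0)) return (1e10)   # keep the optimizer in a valid region
    - sum (dpois (x, lambda, log=TRUE))
  }

  # "nlm" minimizes the function; hessian=TRUE asks for the Hessian at the
  # minimum, which is the observed information, since we minimize the negative
  # log likelihood.

  r1 <- nlm (neg.log.lik, c(mean(x),0), hessian=TRUE, x=x, t=t)

  # "optim" does the same, here with method="BFGS".

  r2 <- optim (c(mean(x),0), neg.log.lik, method="BFGS", hessian=TRUE, x=x, t=t)

  # Standard errors are the square roots of the diagonal of the inverse of the
  # observed information matrix.

  se1 <- sqrt (diag (solve (r1$hessian)))
  se2 <- sqrt (diag (solve (r2$hessian)))

Since both functions return the Hessian of the minimized function when asked, the standard errors come from the same solve/diag/sqrt computation in either case.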
For "a", the standard errors varied over only a small range, so whether this variation is correlated with actual error cannot be determined (at least from only 20 runs). The standard errors for "a" seem to be a good guide to actual accuracy, with one run out of 20 having an actual error of more than twice the standard error, which is just what one would expect (since +- twice the standard error gives an approximately 95% confidence interval).

For "b", the distribution of standard errors seems to be bimodal, with one mode around 0.00065 and the other around 0.00095. There is a slight indication that smaller standard errors go with smaller actual errors, but 20 runs isn't enough to be sure. Overall, three or four of the actual errors were greater than twice the standard error, which is somewhat more than expected, but not enough so to be really worrying.

For "w", standard errors for a few datasets were very large, and these datasets did have large actual errors as well, though not as large as the standard error. For the datasets with smaller standard errors, most actual errors were less than the standard error, indicating that the standard error may somewhat overestimate the actual error (although one out of 20 datasets gave an actual error of more than twice the standard error, which is exactly as expected).

In conclusion, at least with n=600, and for the particular true parameter values used, the method for obtaining standard errors from the observed information seems to be working reasonably well.

Robustness to a nonlinear trend.

To test whether the model produces an estimate of the probability of twins that is robust to a small departure from the assumption of a linear trend, I generated data in which the actual trend is a + b*t + sin(t/50). Two datasets, with n=600 and n=2500, were tried. The data is plotted, and appears visually to have close to a linear trend, so this is the sort of data that this model might well be used for.

Estimates were obtained with both "nlm" and "optim", using both initialization schemes. Using the optimization run that gave the highest log likelihood, the estimate and standard error for "w" were -1.95 +- 0.39 with n=600 and -2.33 +- 0.24 with n=2500. For both sizes of dataset, the true value of -2.9 is outside the 95% confidence interval one would construct from these estimates and standard errors. Note that an estimate for "w" of -2.33 corresponds to a probability of twins of 8.9%, which, compared to the true probability of 5%, is a fairly large error, and one that would be of practical importance.

This lack of robustness can be understood from the fact that the mean and variance of a Poisson distribution are the same. If the actual variance in the number of children born is larger than the mean, the model can explain that by supposing that twins are common (if all births were of twins, the variance would be twice the mean). To this model, departures from linearity look like increased variance, so it overestimates the probability of twins to "explain" this higher variance. To use a method like this to reliably estimate the probability of twins, one would need to carefully model non-linear trends, as well as other sources of extra variance, such as seasonal or day-of-the-week effects.
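As a small check on the arithmetic in the last two paragraphs: the quoted numbers (a true "w" of -2.9 corresponding to a twin probability of about 5%, and an estimate of -2.33 corresponding to 8.9%) are consistent with "w" being the twin probability on the logistic scale, and the variance argument can be made concrete by supposing that a day's count is singletons plus twins from a thinned Poisson process with event rate lambda and twin probability "p". These readings are inferred from the discussion above rather than from the actual model specification, but under them a quick R check is:

  p <- function (w) 1 / (1 + exp(-w))  # twin probability on the logistic scale
  p(-2.9)       # about 0.052, i.e. the true probability of roughly 5%
  p(-2.33)      # about 0.089, i.e. the estimated 8.9%

  # Singletons plus twins: mean lambda*(1+p), variance lambda*(1+3p), so the
  # variance-to-mean ratio depends only on p.
  ratio <- function (p) (1 + 3*p) / (1 + p)
  ratio(0)      # 1: no twins, an ordinary Poisson count
  ratio(1)      # 2: all twins, variance twice the mean
  ratio(0.052)  # about 1.10: the extra variance the true twin rate allows
  ratio(0.089)  # about 1.16: the extra variance the overestimated rate "explains"

So the overestimate of "p" corresponds to only a modest increase in the variance-to-mean ratio, which helps explain why a visually small departure from a linear trend can shift the estimate this much.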