STA 410/2104, Fall 2014, Discussion for Assignment #3.

For both data sets, as initial values for the Metropolis algorithm, I used the intercept and slope coefficients, and the estimated residual standard deviation, from a least-squares regression fit using "lm". I set the initial value of d to the middle of its range, 0.5, corresponding to an initial value of 4 for the degrees of freedom of the t distribution. I chose the proposal standard deviation for each dataset to give an acceptance rate of a bit less than 1/2, which was achieved with a standard deviation of 0.1 for the first dataset (acceptance rate 0.38) and 0.05 for the second (acceptance rate 0.35).

The Markov chain appears to have converged quite rapidly for the first dataset, so I discarded only 100 iterations as burn-in. The second dataset clearly needed more time to converge, so I discarded 1000 iterations as burn-in. The trace plots of the parameters and of the log posterior density mark the end of the burn-in period with a vertical line. For both datasets, I simulated 15000 iterations after the burn-in period.

The scatterplots of the posterior distribution show that there are some correlations between parameters, which are stronger for the second dataset. These stronger correlations may explain why a longer burn-in period was required for the second dataset. The movement of the chain around the posterior distribution also seems to have been slower for the second dataset.

The plots of regression lines drawn from the posterior distribution and found using "lm" show that the least-squares fit from "lm" is far in the tail of the posterior distribution (more so for the second dataset than the first). This is explained by the extreme points (eg, at about x=0.85, y=-2.2 for the second dataset) that strongly influence the least-squares regression line, but that have less influence when the residuals have a heavy-tailed t distribution. Because of this difference in regression lines, the prediction at x=0.9 from the Metropolis run differs substantially from that obtained using "lm". The variation in this prediction over different Metropolis runs was fairly small - a range of around 0.03 for the first dataset, and 0.003 for the second - indicating that the estimate of the model prediction from the Metropolis method with this number of iterations is quite good.
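For reference, a minimal sketch in R of a sampler of this form is below. This is not the actual assignment code: the flat priors, the choice to sample the residual standard deviation on the log scale, and the mapping from d to degrees of freedom (taken here as df = 2/(1-d), one mapping under which d = 0.5 gives 4 degrees of freedom) are all assumptions made for illustration.

    log_post <- function (theta, x, y)
    {
      b0 <- theta[1]; b1 <- theta[2]; log_sigma <- theta[3]; d <- theta[4]
      if (d <= 0 || d >= 1) return (-Inf)  # d is restricted to (0,1)
      df <- 2 / (1-d)                      # assumed mapping: d = 0.5 -> df = 4
      sigma <- exp(log_sigma)
      # Log likelihood for t-distributed residuals with scale sigma; flat
      # priors are assumed, so this is the log posterior up to a constant.
      sum (dt ((y - b0 - b1*x) / sigma, df=df, log=TRUE) - log_sigma)
    }

    metropolis <- function (x, y, n_iter, prop_sd)
    {
      fit <- lm (y ~ x)                    # least-squares fit for initial state
      theta <- c (coef(fit), log (summary(fit)$sigma), 0.5)
      lp <- log_post (theta, x, y)
      S <- matrix (NA, n_iter, 4)
      acc <- 0
      for (i in 1:n_iter)
      { prop <- theta + rnorm (4, 0, prop_sd)  # random-walk proposal
        lp_prop <- log_post (prop, x, y)
        if (log (runif(1)) < lp_prop - lp)     # Metropolis acceptance test
        { theta <- prop; lp <- lp_prop; acc <- acc + 1
        }
        S[i,] <- theta
      }
      list (sample = S, acc_rate = acc / n_iter)
    }

With these definitions, res <- metropolis(x, y, 15100, 0.1), followed by discarding the first 100 rows of res$sample, would mimic the run described above for the first dataset; the prediction at x=0.9 can then be estimated as the mean of res$sample[,1] + 0.9*res$sample[,2] over the retained iterations.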