STA 437/1005, Assignment #2, Question #1.
This discussion refers to plots and R output on the course web page.
Plots of each of the seven types of investment return against year show that gbonds, gbills, and to a lesser extent cbonds can't really be seen as a random sample from some single hypothetical distribution of returns. The mean of gbills seems to have been higher from 1980 to 1985 than before and after, and other periods also seem different. Both the mean and the variance of gbonds seem to be lower before 1980 than after. The mean and variance of cbonds also seem larger after 1980, though the difference is smaller than for gbonds. However, the four variables giving returns for different types of stocks seem like they may be regarded as a random sample. No strong trends or high correlations of nearby years are apparent, beyond what can be plausibly explained by random variation. Of course, it is still possible that performance in future years will be from some different distribution, if it is determined by factors that weren't the same as in previous years. This possibility can only be judged on the basis of economic knowledge, not on the basis of statistical examination of the past data.
Histograms of the seven return variables show distributions that are somewhat right skewed for gbonds and the four stock returns, and a distribution for gbills that has heavy tails (or looked at another way, a high central peak). The histograms of the log returns show right skew for industrial, utility, and finance stocks. Given the fairly small amount of data, these signs of non-normality might affect the validity of any confidence intervals produced, but the departures from normality are not extreme, so it makes sense to go ahead and find confidence intervals based on normal distributions anyway. (Of course, the problems with gbills, gbonds, and cbonds mentioned above will also affect the reliability of confidence intervals for those variables.)
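The visual checks described above can be backed up with simple numerical summaries. The following is a minimal sketch in Python (the course itself uses R), assuming a single series of yearly returns held in a list. The function names and the data are hypothetical and are not the course data. The lag-1 autocorrelation measures how strongly each year resembles the next, and the moment-based skewness is positive for a right-skewed distribution.

```python
import math


def lag1_autocorrelation(x):
    """Sample autocorrelation between each year's value and the next year's."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    numer = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    return numer / denom


def sample_skewness(x):
    """Moment-based skewness; positive values indicate right skew."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5


# Hypothetical yearly returns (1.10 means a 10% gain); NOT the course data.
returns = [1.12, 0.95, 1.30, 1.05, 0.88, 1.21, 1.02, 1.45, 0.97, 1.08]
log_returns = [math.log(r) for r in returns]

print("lag-1 autocorrelation:", round(lag1_autocorrelation(returns), 3))
print("skewness of returns:", round(sample_skewness(returns), 3))
print("skewness of log returns:", round(sample_skewness(log_returns), 3))
```

With only a few dozen years of data, both summaries are noisy, so they are best read alongside the plots rather than as formal tests.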
Since there are some moderate, but not extreme, problems with normality for both returns and log returns, it makes sense for the decision of whether to look at returns or log returns to be based on other considerations. We can consider the two possible uses of the results that are mentioned in the assignment. If estimates (and confidence intervals) for different investments are to be used to estimate the return over one year from a mixed investment (e.g., one third industrial stocks, one third finance stocks, and one third government bonds), it makes sense to look at the returns themselves, since the return from such a mixed investment will be just the corresponding weighted average of the returns from each investment. The return from the mixed investment could not easily be found from just the mean of the log return for each individual investment. However, if the estimates (and confidence intervals) are to be used to judge what return can be expected over many years, the mean of the log return is more useful, since the return over many years is the product of the returns for each year, i.e., the log return over many years is the sum of the log returns for each year. The Law of Large Numbers says that the average log return over many years should be close to the mean log return, providing some guidance as to what to expect from a long-term investment. On the other hand, the mean of the returns themselves is not a good guide. For instance, if the investment is wiped out in some years (return of zero), the long-run return will be zero, but the average return might look quite good.
Both the 90% and 95% confidence intervals for the returns based on the T-squared statistic have a lower bound less than one, and the corresponding intervals for the log returns have a lower bound less than zero. Based on these intervals, we couldn't be sure that any of these investments makes money on average.
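For reference, a T-squared interval for the mean of a single variable has the form mean +/- sqrt(p(n-1)/(n-p) * F) * s/sqrt(n), where n is the number of years, p is the number of variables analysed jointly, s is the sample standard deviation, and F is the upper critical value of the F distribution with (p, n-p) degrees of freedom. The following is a minimal sketch in Python (the course itself uses R). The function name `t2_interval` and the data are hypothetical, and the caller is assumed to look up the F critical value from tables or software.

```python
import math
import statistics


def t2_interval(values, p, f_crit):
    """T-squared simultaneous interval for the mean of one variable.

    values : yearly observations of one variable (n of them)
    p      : number of variables analysed jointly
    f_crit : upper critical value of F with (p, n - p) degrees of
             freedom, supplied by the caller (assumption: looked up)
    """
    n = len(values)
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    multiplier = math.sqrt(p * (n - 1) / (n - p) * f_crit)
    return mean - multiplier * std_err, mean + multiplier * std_err


# Hypothetical yearly returns (1.10 means a 10% gain); NOT the course data.
returns = [1.12, 0.95, 1.30, 1.05, 0.88, 1.21, 1.02, 1.45, 0.97, 1.08]

# Assumed setting: p = 2 variables, n = 10 years, 90% level.
# 3.11 is the tabled upper 10% point of F with (2, 8) degrees of freedom.
low, high = t2_interval(returns, p=2, f_crit=3.11)
print("90% T-squared interval:", round(low, 3), "to", round(high, 3))
```

A lower bound below one for returns (or below zero for log returns) means the interval does not rule out losing money on average.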
This may be too pessimistic, however, since these intervals are designed to simultaneously cover the true values not just for the individual variables, but for all possible linear combinations of them. If we look at the Bonferroni intervals, we see that all investments except gbonds and cbonds have lower bounds for the returns (and for log returns) that are greater than one (or zero). (These seem to be the most appropriate intervals to look at. The univariate t intervals without Bonferroni correction could be misleading, since we are likely to look for the best investment among these seven, for which we are relying on simultaneous coverage.) Note, however, that the uncertainty in the return (or log return) is quite high, even based on the Bonferroni intervals. The variability in return from year to year is also quite high, as seen from the standard deviations.
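The difference between the two kinds of interval comes down to the multiplier applied to the standard error s/sqrt(n). A Bonferroni interval uses the upper alpha/(2p) critical value of the t distribution with n-1 degrees of freedom, while a T-squared interval uses sqrt(p(n-1)/(n-p) * F). The following is a minimal sketch in Python (the course itself uses R). The function names and the setting (n = 10, p = 2, 90% level) are hypothetical, and the critical values are assumed to come from standard tables.

```python
import math
import statistics


def bonferroni_interval(values, t_crit):
    """Bonferroni interval for the mean of one variable.

    t_crit is the upper alpha / (2 * p) critical value of t with
    n - 1 degrees of freedom, supplied by the caller.
    """
    n = len(values)
    mean = statistics.mean(values)
    half_width = t_crit * statistics.stdev(values) / math.sqrt(n)
    return mean - half_width, mean + half_width


def t2_multiplier(n, p, f_crit):
    """Multiplier of the standard error in a T-squared interval."""
    return math.sqrt(p * (n - 1) / (n - p) * f_crit)


# Assumed setting: n = 10 years, p = 2 variables, overall level 90%.
n, p = 10, 2
t_crit = 2.262  # tabled t, 9 df, upper 0.10 / (2 * 2) = 0.025 point
f_crit = 3.11   # tabled F, (2, 8) df, upper 0.10 point

print("Bonferroni multiplier:", t_crit)
print("T-squared multiplier:", round(t2_multiplier(n, p, f_crit), 3))
```

In this setting the T-squared multiplier is the larger of the two, so the Bonferroni intervals are narrower. That is the price the T-squared intervals pay for covering every linear combination rather than only the p individual means.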