SSC2006
Last year I was most gratified to receive the Gold Medal of the Statistical Society of Canada.
I thank the great number of friends who must have worked very hard for this.
That was very kind.
This page contains links to material contained in the
acceptance address "Statistical Inference for Public Policy" presented
at the SSC annual meeting May 30, 2006.
Slides
A pdf file of the slides presented in the talk are at SSC 2006 slides (~700K).
Symbolic Computation in R
Symbolic Computation for Statistical Inference has been implemented in R.
Expressions are computed. These can be transformed into R functions
for numerical evaluation. The R procedures are found at Symbolic Computation in R (4K)
Examples of Symbolic R
Example 1. The expected value of the bootstrap variance of the
square of a mean.
source(file="http://www.fisher.utstat.utoronto.ca/david/Sym2006/SCSI.1.0.2.txt")
set.seed(42); n<- 20; y <- rnorm(n); # y Generate Gauss; 42 is the
answer to everything.
AA <- S(A(X) * A(X)); V <- S( EA(AA * AA) - EA(AA) * EA(AA) )
# Compute the bootstrap estimate of the variance, make it a function and
evaluate it.
BV <- S( BE(V) ); BVF <- Sfn( BV, X)(y)*n*n # -> 9.26
E <- GaussMoment; Sfn( V, X )(y)*n*n # Gauss -> 2
# Compute the expected value of the bootstrap, make it a function and evaluate
it.
EBV <- S( EA( BV ) ); Sfn( EBV, X )(y) * n * n # -> 5.78
# Compute the function for the unbiased estimate, make it a function and
evaluate it.
UV <- S( AE( V ) ); Sfn( UV, X )(y) *n * n #-> 0.91
# Compute the expected value of unbiased estimate, make it a function and
evaluate it.
EUV <- S( EA( UV ) ); Sfn( EUV, X )(y) * n * n # -> 2
Example 2. The expected value of the deviance.
In this example, the mle is expanded and used to compute the expansion of
the deviance. The expectation of the deviance are computed.
theta0 <- S( APPROX(t0, 4) ); thetahat <- S( InverseA(
l(theta0, 1) ) )
hadev <- S( A(l(thetahat)) - A(l(theta0)) ); S( EZ(hadev +
hadev) )
Symbolic Computation in browser
Symbolic computation for statistical inference may be easily done using
a standard browser. Details are found at PureStats (16K).
Examples
The code for the examples may be pasted in the input window above the buttons
of the calculator.
Example 1. The expected value of the bootstrap variance of the
square of a mean.
In ths example the variance V of AA, the square of an average is computed.
The bootstrap variance is the same expression with expectations considered
as averages. The expected value of the expression involving averages
is computed.
AA = A[X] A[X]; V = EfromKA[ AA, AA ]; EfromA[ V ]
Example 2. The expected value of the deviance.
The expression for the mle is used to compute the expression for the drop
in log-liklihood or deviance. The expected value of the deviance is
computed.
Theta = AE[4, theta]; hdev = Taylor[l,0, Root[l, 1, Theta] ] - Taylor[l,
0, Theta];
dev = hdev + hdev; EfromA[dev]
Example 3. The third moment of the signed root.
The expression for the signed root of the drop
in log-liklihood or deviance is used to compute it's third moment.
From the Bartlett identities this is found to be 0 to order n^-5/2 .
Set[ E[ l2 ] = -1];Set[ E[l1 l1] = 1]; Set[ E[l] = 0]
Theta = AE[4, theta]; t = Root[l, 1, Theta];dt = t - Theta;
rf2 = -1 Taylor[l,2,Theta]; rf3 = -0.66667 Taylor[l,3,Theta] dt;
rf4 = - 0.25 Taylor[l,4,Theta] dt dt; rf = rf2 + rf3 + rf4;
sr1 = (rf)^0.5; sr = sr1 dt;
EfromKA[sr, sr]
EfromKA[sr,sr,sr]
Cox-Tukey Confidence Intervals
The R code and an example of the Cox-Tukey intervals are given. At
this point there are two files: one for glm and one for coxph. The
code is designed to be general. We hope to have only one file in the
future.
Example
Example 1. Cox proportional hazards for Stanford Heart Transplant data.
In this example to Cox-Tukey interval is computed for the age effect on the
Stanford Heart Transplant data. The complete data are selected the
significance of the age effect is computed and the ACox-Tukey interval is
computed. Note the calculation will take a few minutes -- which is
small compared to the 52 lifetimes which were lost in the study.
source(file="http://www.fisher.utstat.utoronto.ca/david/SCS2006folder/coxtukeyph.txt")
library(survival)
good <- !is.na(stanford2$t5); time <- stanford2$time[good]
age <- stanford2$age[good]; t5 <- stanford2$t5[good]
status <- stanford2$status[good]; n <- length(t5); w <- rep(1,n)
f = as.formula("Surv(time,status)~t5")
2* sig(0,coxph,f,age,w) #two sided p-value
cti(coxph,f,age,w,alpha=0.025)
Example 2. Binomial glm for Stanford Heart Transplant data.
In ths example the same data is used for the confidence interval of the age
effect in a logistic model for survival. Note the calculation
will take a few minutes -- which is small compared to the 52 lifetimes which
were lost in the study.
source(file="http://www.fisher.utstat.utoronto.ca/david/SCS2006folder/coxtukeyglm.txt")
fg = as.formula("status~t5")
cti(glm,fg,age,w,alpha=0.025,family=binomial)
References
Bailar, J. C. (1992). Medical Uses of Statistics. 2nd Edition.NEJM Books. Boston See contributions by J. Bailar and L. Moses.
Cochrane, W. G. (1940). "The Analysis of Variance when Experimental Errors Follow the Poisson or Binomial Laws." Ann. Mth. Statist. 11, 335-347.
Cochrane, W. G. (1980). Fisher and the Analysis of Variance. in R. A. Fisher: An Appreciation. Springer.
Cox, D. R. (1990). Role of Models in Statistical Analysis. Statistical Science. 5. 169-174.
Note, in particular, section 3.1
Feynmann R. P. (1999). The pleasure of finding things out : the best short works of Richard P. Feynman. Cambridge, Mass. : Perseus Books
Laplace, P. S. (1823). De l'action de la lune sur l'atmosphere. Annales de Chimie et de Physique. 24: 280-294
Stigler, S. M. (1986). The History of Satistics. Harvard University Press. Cambridge. For the use of total sum of squares see pages 153, 155.