* Data from Myers and Montgomery, Table 2.8, p. 48;
data tab28;
input x1 x2 y;
group=100*x1+x2; * for a unique group number for each pair, for lack of fit later;
if x1= 9 then x1= sqrt(2);
if x1=-9 then x1=-sqrt(2);
if x2= 9 then x2= sqrt(2);
if x2=-9 then x2=-sqrt(2);
x1x2=x1*x2;
x1x1=x1*x1;
x2x2=x2*x2;
cards;
-1 -1 43
1 -1 78
-1 1 69
1 1 73
-9 0 48
9 0 76
0 -9 65
0 9 74
0 0 76
0 0 79
0 0 83
0 0 81
;
run;
/*
Because this is a special kind of model (a full second order model),
we can get the test for higher order terms and lack of fit simply
by using proc rsreg: The "/lackfit" option gives this test.
*/
title 'rsreg 1';
proc rsreg data=tab28;
model y=x1 x2/lackfit;
run;
/*
More generally, (i.e. for models other than full second order), we
would use two runs of proc glm to get the lack of fit test. First,
we need to define the full model by setting up a class variable that
will take unique values for each distinct pair of x1 x2 values. The
variable group defined in the data step is equivalent to a one-way ANOVA.
We then fit the
full model specifying the variable group as a classification variable.
(This is a one-way model with group specifying the groups).
SSE for this model is 26.75 with 3 df. This is the pure
error SS and df.
*/
title 'glm 1';
proc glm data=tab28;
class group;
model y=group;
run;
/*
The reduced model (reduced compared to the one-way ANOVA above)
is the regression model fit below (no class statement).
Note that the model statement illustrates the flexibility of proc glm
in specifying the regressors. SSE for this model is 35.35 with 6 df.
This is the residual SS. The LOF SS is the difference of SSE
between the reduced and the full model:
35.35 - 26.75 = 8.6 with 6 - 3 = 3 df.
The F-test is then the LOF SSE and df vs the full SSE and df
F = (8.6/3) / (26.75/3)
where there are 3 df and 3 df in the numerator and denominator
*/
title 'glm 2';
proc glm data=tab28;
model y=x1 x2 x1*x1 x1*x2 x2*x2;
run;
/*
The following illustrates a general way to run a general linear test
using proc reg. Here is where we finally use the variables x1x2, x1x1,
and x2x2.
The alpha option on the model statement defines the level of confidence
and prediction intervals requested in the output statement. The output
statement writes a number of inferential and diagnostic quantities to
the SAS data file
*/
title 'reg';
proc reg data=tab28;
model y=x1 x2 x1x1 x1x2 x2x2/alpha=.99; * alpha for intervals (CI and PI);
output out=tab28out cookd=d h=h lcl=lpred ucl=upred lclm=lest uclm=uest
press=press p=fit r=resid student=standres rstudent=studres stdr=se_resid
stdi=se_pred stdp=se_mean;
quad: test x1x1=0, x1x2=0, x2x2=0;
run;
quit;