382xm3

===================HW7======================================================================================
============================================================================================================
1.	For the dental study example, suppose we assume that every girl has the same intercept and slope (no random effects) and every boy has the same intercept and slope (no random effects). The intercept and slope for girls may be different from those of boys. Compare the intercepts and slopes of girls and boys group. Consider a few different variance-covariance structure of the error term within each subject: (i) Independence with the same variance (this may be accomplished by not including a repeated statement at all) (ii) Homogeneous compound symmetry (iii) Homogeneous AR(1). Choose the one you think fit the model best.
a.	Write down the statistical model you analyze.
b.	Are slopes statistically different from each other?
++++++++++++++++++>Code+++++++++++++++++++++++++++++++++
/*******************************************************************

  Analysis of the dental study data by fitting a random coefficient
  model in time using PROC MIXED.

  -  the repeated measurement factor is age (time)

  -  there is one "treatment" factor, gender

  The model for each child is assumed to be a straight line.
  The intercepts and slopes may have different means depending on
  gender, with the same covariance matrix D for each gender.

  We use the RANDOM and REPEATED statements to fit models that
  make several different assumptions about the forms of the matrices
  Ri and D.

*******************************************************************/

options ls=80 ps=59 nodate; run;

/******************************************************************

  Read in the data set (See Example 1 of Chapter 4)

*******************************************************************/

data dent1; infile 'C:\Users\Min\Box Sync\teaching\Spring 2015\Stat 382\datasets\dental.dat';
  input obsno child age distance gender;
run;

*  Independent case;

proc mixed method=ml data=dent1;
  class gender child;
  model distance = gender gender*age / noint solution;
*  estimate 'diff in mean slope' gender 0 0 gender*age 1 -1;
*  contrast 'overall gender diff' gender 1 -1, gender*age 1 -1 /chisq;
run;

* CS structure;
proc mixed  method=ml data=dent1;
  class  child gender;
  model distance = gender gender*age / noint solution;
  repeated / type=cs subject=child;
*  estimate 'diff in mean slope' gender 0 0 gender*age 1 -1;
*  contrast 'overall gender diff' gender 1 -1, gender*age 1 -1 /chisq;
run;

* AR(1);
proc mixed  method=ml data=dent1;
  class  child gender;
  model distance = gender gender*age / noint solution;
  repeated / type=ar(1) subject=child;
*  estimate 'diff in mean slope' gender 0 0 gender*age 1 -1;
*  contrast 'overall gender diff' gender 1 -1, gender*age 1 -1 /chisq;
run;

* UN;
proc mixed  method=ml data=dent1;
  class  child gender;
  model distance = gender gender*age / noint solution;
  repeated / type=un subject=child;
*  estimate 'diff in mean slope' gender 0 0 gender*age 1 -1;
*  contrast 'overall gender diff' gender 1 -1, gender*age 1 -1 /chisq;
run;


==========================================================================>
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++         2
==========================================================================>

2.	For the weight-lifting example, suppose we assume that a subject’s intercept depends on the subject’s weight, age, and previous weight experience only (no random effects), and slope depends weight, age, previous weight experience and the treatment effect (diet or no-diet). Consider a few different variance-covariance structure of the error term within each subject: (i) Independence with the same variance (this may be accomplished by not including a repeated statement at all) (ii) Homogeneous compound symmetry (iii) Homogeneous AR(1). Choose the one you think fit the model best.
a.	Write down the statistical model you analyze
b.	Is the diet treatment effective?

++++++++++++++>SAS CODE+++++++++++++++++++++++++++++++
/*******************************************************************

  Illustration of fitting a linear mixed effects model derived
  from a random coefficient model, where the mean slope in each
  group depends on a continuous covariate.

  The model for each man is assumed to be a straight line.
  The intercepts are taken to depend on baseline covariates.
  The slopes are taken to depend on baseline covariates, differentially
  by group (diet or not).

  We take D to be common for both groups and take Ri to be
  common to both groups of the form Ri = sigma^2 I.

*******************************************************************/

options ls=80 ps=59 nodate; run;

/******************************************************************

  Read in the data set

*******************************************************************/

data pdat; infile "C:\Users\Min\Box Sync\teaching\Spring 2015\Stat 382\datasets\press.dat";
  input id time press weight age prev diet;
run;

/*******************************************************************

  Use PROC MIXED to fit linear mixed effects model (i); we use
  normal ML rather than REML to get likelihood ratio tests

*******************************************************************/


** independent;
proc mixed method=ml data=pdat;
  class id;
  model press = weight prev age
               time time*diet time*age time*prev time*diet*prev;
run;

** CS struture;
proc mixed method=ml data=pdat;
  class id;
  model press = weight prev age
               time time*diet time*age time*prev time*diet*prev;
  repeated /type=cs subject=id;
run;

** AR(1) structure;
proc mixed method=ml data=pdat;
  class id;
  model press = weight prev age
               time time*diet time*age time*prev time*diet*prev;
  repeated /type=Ar(1) subject=id;
run;


proc mixed data=pdat;
  class id;
  model press = weight prev age
               time time*diet time*prev time*diet*prev / solution;
  repeated /type=AR(1) subject=id;
 /* estimate "slp, diet, no prev" time 1 time*diet 1;

  estimate "slp, no diet, prev" time 1 time*prev 1;
  estimate "slp, diet, prev" time 1 time*prev 1 time*diet 1 time*diet*prev 1;
  contrast "overall slp diff" time*diet 1,
                              time*prev 1,
                              time*diet*prev 1 / chisq;
  contrast "prev effect" time*prev 1, time*diet*prev 1 / chisq;
  */
  contrast "diet effect" time*diet 1, time*diet*prev 1 /chisq;
run;

====================================================================================>
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
====================================================================================>      3

3.	In a clinical trial of studying the effects of AZT in slowing the development of AIDS symptoms. In the study, 338 veterans whose immune systems were beginning to falter after infection with the AIDS virus were randomly assigned either to receive AZT immediately or to wait until their T cells showed severe immune weakness. The following table shows whether they received AZT treatment and whether they developed AIDS symptoms during the 3-year study.
Use logistic regression model to analyze this table.
a. Write down the statistical model you analyze
b. What is the odds ratio between AZT use and development of AIDS symptoms? What is the confidence interval?

+++++++++++++++++++++>SAS CODE================================>

data AZT;
input race $ AZT $ Symptoms $ count;
cards;
white yes yes 14
white yes no 93
white no yes 32
white no no 81
black yes yes 11
black yes no 52
black no yes 12
black no no 43
;

proc logistic data=AZT;
class race AZT/param=ref;
model Symptoms(event='no')=race azt;
weight count;
run;

============================================================>
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++>        4
==============================================================>

options nodate ls=75; run;

/*  Read in the data  */

data lead; infile "C:\Users\Min\Box Sync\teaching\Spring 2015\Stat 382\datasets\lead.dat";
  input id age gender week lead trt;
  time=week;
run;
/*  Fit of model with common D and common within-child variance */

**fixed model;

title "M1";
proc mixed method=ml data=lead;
  class trt id week;
  model lead = age gender age*gender time*trt time*age*trt time*trt*gender time*age*gender*trt;
  repeated week/subject=id;
run;

title "M2";
proc mixed method=ml data=lead;
  class trt id week;
  model lead = age gender age*gender time*trt time*age*trt time*trt*gender time*age*gender*trt;
  repeated week/type=cs subject=id;
run;

title "M3";
proc mixed method=ml data=lead;
  class trt id week;
  model lead = age gender age*gender time*trt time*age*trt time*trt*gender time*age*gender*trt;
  repeated week/type=ar(1) subject=id;
run;

title "M4";
proc mixed method=ml data=lead;
  class trt id week;
  model lead = age gender age*gender time*trt time*age*trt time*trt*gender time*age*gender*trt;
  repeated week/type=un subject=id;
run;


title "check gender effects";
proc mixed method=ml data=lead;
  class trt id week;
  model lead = age  time*trt time*age*trt /  solution;
  repeated week/type=un subject=id;
run;

proc mixed method=reml data=lead;
  class trt id week;
  model lead = age gender age*gender time*trt time*age*trt time*trt*gender time*age*gender*trt;
  repeated week/type=un subject=id;
  contrast "gender effects" gender 1, age*gender 1, time*trt*gender 1 0 0, time*trt*gender 0 1 0,
time*trt*gender 0 0 1, time*age*gender*trt 1 0 0,
time*age*gender*trt 0 1 0, time*age*gender*trt 0 0 1/chisq;
run;


**random model;

title "M1";
proc mixed method=ml data=lead;
  class trt id week;
  model lead = age gender age*gender time*trt time*age*trt time*trt*gender time*age*gender*trt;
  random intercept time/type=un subject=id;
 * repeated week/subject=id;
run;

title "M2";
proc mixed method=ml data=lead;
  class trt id week;
  model lead = age gender age*gender time*trt time*age*trt time*trt*gender time*age*gender*trt;
  random intercept time/type=un subject=id;
  repeated week/type=cs subject=id;
run;

title "M3";
proc mixed method=ml data=lead;
  class trt id week;
  model lead = age gender age*gender time*trt time*age*trt time*trt*gender time*age*gender*trt;
  random intercept time/type=un subject=id;
  repeated week/type=ar(1) subject=id;
run;

title "M4";
proc mixed method=ml data=lead;
  class trt id week;
  model lead = age gender age*gender time*trt time*age*trt time*trt*gender time*age*gender*trt;
  random intercept time/type=un subject=id;
  repeated week/type=un subject=id;
run;


title "check gender effects";
proc mixed method=ml data=lead;
  class trt id week;
  model lead = age time*trt time*age*trt;
  random intercept time/type=un subject=id;
 * repeated week/subject=id;
run;

proc mixed method=reml data=lead;
  class trt id week;
  model lead = age gender age*gender time*trt time*age*trt time*trt*gender time*age*gender*trt;
  random intercept time/type=un subject=id;
  contrast "gender effects" gender 1, age*gender 1, time*trt*gender 1 0 0, time*trt*gender 0 1 0,
time*trt*gender 0 0 1, time*age*gender*trt 1 0 0,
time*age*gender*trt 0 1 0, time*age*gender*trt 0 0 1/chisq;
run;

=======================================================================================>
=======================================================================================================
=============================     HW 6       TOM=========================================================================
=======================================================================================>
=======================================================================================================
======================================================================================================


* Tom Holmes;

*Problem 1;
*H0:  Diet A is not better than Diet B;
*H1:  Diet A is better than Diet B;
proc power;
	twosamplemeans groupmeans=(0 8) stddev=16 alpha=0.05 sides=1 power=0.5 npergroup=.;
run;


/*
She needs to assume that the blood glucose levels for each group is normally distributed.

Note that it doesn't matter what the group means actually are as long as they differ by 8.

Power = .9
Actual Power 	N Total		N Per Group
0.903 			140			70

Power = .8
Actual Power 	N Total		N Per Group
0.806 			102			51

Power = .7
Actual Power 	N Total		N Per Group
0.707 			88			44

Power = .6
Actual Power 	N Total		N Per Group
0.606 			60			30

Power = .5
Actual Power 	N Total		N Per Group
0.510 			46			23
*/

*Problem 2;
*H0:  Men are not better at detecting the sound than women.;
*H1: Men are better are detecting the sound than women.;
proc power;
	twosamplemeans groupmeans=(0 .25) stddev=.2 alpha=0.05 sides=1 power=. npergroup=20;
run;

/*
She needs to assume that the response time for each group is normally distributed.

Mean Difference		Power
.05					0.193
.1					0.463
.15					0.753
.2					0.928
.25					0.987
*/

*Problem 3;
*Same hypothesis as problem 1;
*H0:  Diet A is not better than Diet B;
*H1:  Diet A is better than Diet B;
proc power;
   twosamplewilcoxon
      vardist("DietA") = uniform(50,80)
      vardist("DietB") = normal(75, 100)
      variables = "DietA" |  "DietB"
	  alpha=0.05
	  sides=1
      power = 0.5
	  ntotal = .;
run;

/*
That specified variance is absolutely crazy.

Actual Power 	N Total		N Per Group
0.900 			2106		1053

Actual Power 	N Total		N Per Group
0.800 			1470		735

Actual Power 	N Total		N Per Group
0.700 			1082		541

Actual Power 	N Total		N Per Group
0.600 			798			399

Actual Power 	N Total		N Per Group
0.501 			572			286
*/

*Problem 4;
*Same hypothesis as problem 2;
*H0:  Men are not better at detecting the sound than women.;
*H1: Men are better are detecting the sound than women.;
proc power;
   twosamplewilcoxon
      vardist("Men") = gamma(2, .05)
      vardist("Women") = exponential(.2)
      variables = "Men" |  "Women"
	  alpha=0.05
	  sides=1
      power =.
	  npergroup =40;
run;


/*
N Per Group		Power
10				0.308
20				0.474
30				0.606
40				0.709
*/

data problem5;
	input FeedRate Catalyst AgitRate Temp Concen ReactionPercentage @@;
	datalines;
	10.0   		1.0  		100   		140  	6.0   		37.5
	10.0   		1.0  		120   		180  	3.0   		28.5
	10.0   		2.0  		100   		180  	3.0   		40.4
	10.0   		2.0  		120   		140  	6.0   		48.2
	15.0   		1.0  		100   		180  	6.0   		50.7
	15.0   		1.0  		120   		140  	3.0   		28.9
	15.0   		2.0  		100   		140  	3.0   		43.5
	15.0   		2.0  		120   		180  	6.0   		64.5
	12.5   		1.5  		110   		160  	4.5   		39.0
	12.5   		1.5  		110   		160  	4.5   		40.3
	12.5   		1.5  		110   		160  	4.5   		38.7
	12.5   		1.5  		110   		160  	4.5   		39.7
	10   		2.0  		120   		130  	7.0   		.
	;
run;
proc print; run;

proc reg data=problem5;
	model ReactionPercentage=FeedRate Catalyst AgitRate Temp Concen;
run;

/*
Problem 5
1.
The model form with all five factors as independent variables is:

ReactionPercentage = B_0 + B_1*FeedRate + B_2*Catalyst + B_3*AgitRate + B_4*Temp + B_5*Concen

The actual model is:
ReactionPercentage = -43.69167 + 1.65000*FeedRate + 12.75000*Catalyst + -0.02500*AgitRate + 0.16250*Temp + 4.96667*Concen

R-Square of this model is 0.9652 and the Adjusted R-Square is 0.9362
*/

proc glm data=problem5;
	model ReactionPercentage=FeedRate Catalyst AgitRate Temp Concen;
run;

proc glm data=problem5;
	model ReactionPercentage=FeedRate Catalyst Temp Concen;
run;

proc reg data=problem5;
	model ReactionPercentage=FeedRate Catalyst Temp Concen;
run;


/*
2.
It seems like the variable AgitRate is not significant.
Removing that model, we have the following:
R-Square = 0.9647 (Lower), but the Adjusted R-Square is 0.9446 Which is higher.
We will trust the Adjusted R-Square and assume we made the right choice.

3. The Final Model is the following
ReactionPercentage = -46.44167 + 1.65000*FeedRate + 12.75000*Catalyst + 0.16250*Temp + 4.96667*Concen
*/
ods select all;
proc reg data=problem5 ;
   	model ReactionPercentage = FeedRate Catalyst Temp Concen/clb cli clm p;
	output out=info p=rpredict lcl=lp ucl=up lclm=lm uclm=um;
run;
/*
4.
The 95% confidence intervals of the parameters are:
Intercept(B_0): -66.16853 	-26.71480
FeedRate		0.88954 	2.41046
Catalyst        8.94769 	16.55231
Temp			0.06744 	0.25756
Concen			3.69923 	6.23410
*/


/*
5.  The 95% confidence interval of the expected mean of the first observation is:
31.2514 	39.4653
*/
proc mixed data=problem5 cl;
   model ReactionPercentage = FeedRate Catalyst Temp Concen/solution;
   estimate 'Part6' intercept 1 FeedRate 10 Catalyst 2 Temp 130 Concen 7;
run;

proc print data=info; run;
/*
6.
The 95% confidence interval of the expected mean of the given entry is:
46.1764 	56.7236
The 95% confidence interval of the predicted value is:
43.9183 	58.9817
The predicted value is:
51.4500
*/


proc reg data=problem5;
   *model ReactionPercentage=FeedRate Catalyst AgitRate Temp Concen
         / selection=forward details=summary;
   *model ReactionPercentage=FeedRate Catalyst AgitRate Temp Concen
         / selection=backward details=summary;
   model ReactionPercentage=FeedRate Catalyst AgitRate Temp Concen
         / selection=maxr details=summary;
run;


proc reg data=problem5;
   	model ReactionPercentage = FeedRate Catalyst Temp Concen/clb cli clm p;
	output out=info p=rpredict lcl=lp ucl=up lclm=lm uclm=um;
run;
proc print data=info; run;

=======================================================================================>
=======================================================================================================
=============================     HW 6   ME        =========================================================================
=======================================================================================>
=======================================================================================================
======================================================================================================

*number one;

proc power;
   twosamplemeans test=diff
   meandiff = 8
   stddev = 16
   power = .5, .6, .7, .8, .9
   npergroup = . ;
 run;

 *number two;

 proc power;
   twosamplemeans test=diff
   meandiff = 0.05, 0.1, 0.15, 0.2, .25
   stddev = .2
   power = .
   npergroup = 20 ;
 run;

 *number three;
 proc power;
 	twosamplewilcoxon
 		VARDIST("Group A") = UNIFORM(50,80)
 		VARDIST("Group B") = NORMAL(75,100)
 		variables= "Group A" | "Group B"
 		nperg= .
 		power=.5, .6, .7, .8, .9;
 run;
 *number four;
 proc power;
 	twosamplewilcoxon
 		VARDIST("Male")=GAMMA(2,.05)
 		VARDIST("Female")=EXPONENTIAL(.2)
 		Variables="Male"|"Female"
 		nperg=10, 20, 30, 40
 		power= . ;
 run;
 *number five;
 data materials;
 	input FeedRate Catalyst AgitRate Temp Concen ReactionPercentage @@;
	datalines;
	10 1 100 140 6 37.5
	10 1 120 180 3 28.5
	10 2 100 180 3 40.4
	10 2 120 140 6 48.2
	15 1 100 180 6 50.7
	15 1 120 140 3 28.9
	15 2 100 140 3 43.5
	15 2 120 180 6 64.5
	12.5 1.5 110 160 4.5 39
	12.5 1.5 110 160 4.5 40.3
	12.5 1.5 110 160 4.5 38.7
	;
	proc reg data=materials;
	model ReactionPercentage=FeedRate Catalyst AgitRate Temp Concen;
	run;
	proc reg data=materials;
	model ReactionPercentage=FeedRate Catalyst Temp Concen;
	run;
	pro reg data=materials;
	model ReactionPercentage=FeedRate Catalyst AgitRate Temp Concen/clb cli clm p;
	output out=a2 p=b lcl=c ucl=d lclm=e uclm=f;
	run;


===============================================================================
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++==
===============================================================================
+++++++++++++++++++++++++++     EXAM 2    ++++++++++++++++++++++++++++++++++++++++++++++==
===============================================================================
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++==

http://rpubs.com/mxmrt/stat382_exam2

1.	A company conducted a clinical trial to test a drug treatment for treating bilirubin abnormalities. 86 patients were treated with an experimental drug for 3 months. The following is the summary of the data.

Pre-treatment	Post-treatment
	Yes		No
Yes	6	14
No		6	60
“Yes” means total bilirubin is above upper limit of normal range
Is there evidence of a change in the pre to post-treatment rates of abnormalities?
2.	Lamb’squarter is a common weed in corn fields. A researcher planted corn at the same rate in 8 small plots of ground, then weeded the corn rows by hand to allow no weeds in 4 randomly selected plots and exactly 3 lamb’s-quarter plants per meter of row in the other 4 plots. Here are the yields of corn (bushels per acre) in each of the plots:
0 weeds per meter	166.7,  172.2,  165.0,  176.9
3 weeds per meter	158.6,  176.4,  153.1, 156.0
Choose a proper statistical test to determine whether the presence of small numbers of weeds reduce the yield of corn.

3.	To test whether Drug 6-MP is effective, 10 leukaemia patients (for easy data entry, we just use 10 data point) were divided into two treatment groups: Drug 6-MP and Control. The remission times in weeks for these patients are as following:
Drug 6-MP	6, 6, 6*, 7, 9*
Control	1, 2, 2, 3, 4
	*indicates the survival time is censored.
Can we conclude that Drug 6-MP is effective in term of survival time compared with control treatment?


For the following two problems, you only need to write down the null and alternative hypotheses, as well as the appropriate statistical test for testing the hypotheses.
4.	A study considered the effect of prednisolone on severe hypercalcaemia in women with metastatic breast cancer. Of 30 patients, 15 were randomly selected to receive prednisolone. The other 15 formed a control group. Normalization in their level of serum-ionized calcium was achieved by 7 of the treated patients and none of the control group. Analyze whether results were significantly better for treatment than for control.

5.	In 1878 Charles Darwin recorded some data on the heights of Zea mays plants to determine what effect cross-fertilization or self- fertilization had on the height of Zea mays. The experiment was to select one cross-fertilization and self- fertilization, grow them in the same pot, and later measure their heights. The experiment was conducted on 6 pots. Let (Xi,Yi), be the height of cross-fertilization and the height of self-fertilization at pot i, i=1,…,6.  Charles Darwin wanted to know whether the cross-fertilization plants are generally taller than the self-fertilization plants.  Set up null and alternative hypotheses, and describe your plan to perform the statistical test.