0% found this document useful (0 votes)
4 views16 pages

Expdes Project3

The document presents a project analyzing dental patients and dairy production through experiments on milk production data and dental filling. It outlines objectives, models, methods, data analysis, and interpretations of results, concluding that different treatments (roughage, limited grain, and full grain) significantly affect milk production. The analysis includes statistical tests and graphical representations to validate assumptions of normality and homogeneity of variances.

Uploaded by

awel
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views16 pages

Expdes Project3

The document presents a project analyzing dental patients and dairy production through experiments on milk production data and dental filling. It outlines objectives, models, methods, data analysis, and interpretations of results, concluding that different treatments (roughage, limited grain, and full grain) significantly affect milk production. The analysis includes statistical tests and graphical representations to validate assumptions of normality and homogeneity of variances.

Uploaded by

awel
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

SCHOOL OF MATHEMATICAL AND STATISTICAL SCIENCE Department of Applied Statistics

Project 3:

Dental Patients and Dairy Production

Design of Experiments(stat 631)

Prepared by: ID NO
1. EFREM BELACHEW PGAS/007/08
2. TESHITA UKE PGAS/017/08

Submitted to: Dr. Ayele Taye,Ph.D

Jan, 2017
Hawassa,Ethiopia
Project 3: Dental Patients and Dairy Production

a) Analyze the dental filling experiment in Problem 12.2, page 310 of Oehlert (2010).

b) Analyze the milk production data in Table 13.4, page 339 of Oehlert (2010).

 Objectives of the study

 General objectives
 To compare the milk production during the six week period the cow is on a given feed.

 Specific objectives

 To know the three treatments of interest ; roughage,limited grain, and full [Link]
significantly affects milk production during the six week period .
 To compare which treatment mean effect has greater milk production during the six week
period .

 Models
 Designs balanced for residual effects– two blocks, one treatment, no
interaction.

 A design balanced for residual effects, or carryover design, is a crossover design with the
additional constraint that each treatment follows every other treatment an equal number of
times.

 When g(treatments) is even, we can find a design balanced for residual effects using g subjects;
when g is odd, we need 2g subjects (two squares) to balance for residuals effects.

 A design that includes all possible orders for the treatments an equal number of times will be
balanced for residual effects.

 The model for a residual-effects design has terms for subject, period, direct effect of a
treatment, residual effect of a treatment, and error.

 Methods and Hypothesis

 The subject received treatment i in period l and treatment j in period l-1 indices i and l run from 1
to g, and k runs across the number of subjects. Use j = 0 to indicate that there was no earlier
treatment (that is, when l= 1 and we are in the first period); j then runs from 0 to g. Our model
is

y ijkl =μ+α i + β j +γ k +δ l+ ε ijkl

2
Where ,i=1,2,…,3 ; j=1,2,…,3; k=1,2,…,12; l=1,2,…,3; ε ijkl iidN (0 , σ )

H o :α 1=α 2=…=α i=0 Vs H 1=α i ≠ 0 for some i

With
 y ijkl ; be the response(milk production during the six week period) for the k ' th
subject(the cow is on a given feed) in the l ' thtime period;
 μ is the overall or the grand population mean;
 α i is called the direct effect of treatment i,
 β j ; is called the residual effect of treatment j,
 γ k : is the effect of the k ' th subject(cow); and
 δ l ; is the effect of the l ' th period.
 ε ijkl Random deviation associated with each observation.

Where, All observations are independent and identically distributed normal random variables
2
with equal population variances (σ i =σ 2, for all i) and possibly different population means, i.e
y ijkl iidN ( µi , σ 2 )and ε ijkl iidN (0 , σ 2 )

 We make the usual zero-sum assumptions for the block and direct treatment effects.
3
 ∑ αi=0 for i = 1; 2; 3
i=1

3
 For the β j ' s we assume that β 0=0 and ∑ β j=0, for I,j = 1; 2; 3; That is, we
j=1

assume that there is a zero residual effect when in the first treatment period.

 Direct treatment effects are orthogonal to block effects (we have a crossover design),
but residual effects are not orthogonal to direct treatment effects or subjects.
 Data and Analysis

Data: Dairy production

 Two blocks, column (cow) and row(period) and one treatment. The treatments of interest are
(roughage, limited grain, and full grain.)
 Allocate treatments randomly to units within blocks.
 Response: milk production.

 R Code:
r1=c(0,1,0,0,0,-1,0,-1,1,0,1,-1,0,0,1,0,-1,0,0,1,0,0,0,-1,0,-1,1,0,1,-1,0,0,1,0,-1,0,0,1,0,0,0,-1,0,-1,1,0,1,-
1,0,0,1,0,-1,0)

r2=c(0,0,1,0,1,-1,0,-1,0,0,0,-1,0,1,0,0,-1,1,0,0,1,0,1,-1,0,-1,0,0,0,-1,0,1,0,0,-1,1,0,0,1,0,1,-1,0,-1,0,0,0,-
1,0,1,0,0,-1,1)

period=[Link](rep(c(1,2,3),18))

cow=[Link](c(rep(1,3),rep(2,3),rep(3,3),rep(4,3),rep(5,3),rep(6,3),rep(7,3),rep(8,3),rep(9,3),rep(10,3),r
ep(11,3),rep(12,3),rep(13,3),rep(14,3),rep(15,3),rep(16,3),rep(17,3),rep(18,3)))

treatment=[Link](c(1,2,3,2,3,1,3,1,2,1,3,2,2,1,3,3,2,1,1,2,3,2,3,1,3,1,2,1,3,2,2,1,3,3,2,1,1,2,3,2,3,1,3,1,
2,1,3,2,2,1,3,3,2,1))

[Link]=c(1376,1246,1151,2088,1864,1392,2238,1724,1272,1863,1755,1462,1748,1353,1339,2012,1
626,1010,1655,1517,1366,1938,1804,969,1855,1298,1233,1384,1535,1289,1640,1284,1370,1677,1497,
1059,1342,1294,1371,1344,1312,903,1627,1186,1066,1180,1245,1082,1287,1000,1078,1547,1297,887)

log([Link]) # a log transformation

[Link]=[Link](period,cow,treatment,r1,r2,log([Link]))

[Link]
model=aov(log([Link])~period+cow+treatment+r1+r2,data=[Link])

anova(model)
 interpretation
 The above table gives an ANOVA for the milk production data on the log scale. There is
overwhelming evidence of a treatment effect. There is also reasonably strong evidence that
residual effects exist.

model2=lm(log([Link])~period+cow+treatment+r1+r2,data=[Link])

summary(model2)

 interpretation
 The direct effects for treatments 2 and 3 are estimated to be 0.1458 and 0.241958; the first
must be - 0.387786 by the zero sum criterion. These effects are on the log scale, so there is a
significance partial grain and full grain correspond to about 15% and 27% increases from the
roughage treatment.
 The residual effects for treatments 2 and 3 are estimated to be −.045 and −.002; the third must
be .047 by the zero sum criterion. Thus the period after the roughage treatment tends to be
about 5% lower than might be expected otherwise, and the period after the full-grain treatment
tends to be about 5% higher.
 Multiple R-squareed=0.9508 and Adjusted R-squared=0.9131 which the treatments explains the
milk production difference.
 Tukey method to test all pair wise comparisons of treatment means.

mode3=aov(log([Link])~period+cow+treatment,data=[Link])

TukeyHSD(mode3,"treatment")

 interpretation
 According to the p adj, the means can be considered distinct. i.e, there is a significant
difference among treatment means.

 Statistical tests:

##-- Levene's test --##

library(car)

leveneTest(log([Link]) ~ treatment, data=[Link])

##-- Bartlett Test for Homogeneity of Variances --##

[Link](log([Link]) ~ treatment, data=[Link])

 Interpretation

 At α = 0.05, Since p-value =0.7789 > 0.05, therefore we do not reject the null hypothesis
of homogeneous variances using the Levene’s test at 5% level of significance. And also,
the Bartlett’s test of homogeneity of variance test has p-value = 0.8036>0.05, which
implies that the null hypothesis of homogeneity of variance is not rejected.
plot(log([Link])~treatment,data=[Link])

 interpretation

 The graphical analysis shows that the assumptions of normality and homogeneity of
variances across the treatment levels may not be different.

 Plots used to verify the assumptions normality and Homogeneous variances


test
par(mfrow = c(2, 2))
plot(model, which = 1) #Homogeneous variances test
abline(h = 0, lty = 1)
plot(model, which = 3) #Homogeneous variances test
plot(model, which = 2) #Normal q-q plot to check normality
 interpretation

1. Science A plot of residuals versus predicted values on the original scale shows problems .plot seems
wider on the right than the left, suggesting a lower power to stabilize the variability i.e., homogeneity of
variances are not satisfied.

2. From the log transformation, residual verses fitted plot we can see that the variability of the residuals
contestant with the value of the estimated means. Thus, the residual error variances are homogeneous.

3. From the log transformation Scaled-Location plot we can see that the medians of the absolute values
of the residuals reflects there is no obvious pattern as the treatment means increase it may look like the
error term has constant variance .

4. From the log transformation Q-Q plot we can see that the quartiles of residuals paired with their
corresponding standard normal quartiles lie on the straight line indicate that the residual are normally
distributed.

 Conclusion

 Note that the complete data set is not compatible with the null hypothesis of no
treatment effects.
 In a study on: Dairy production the treatments of interests (roughage, limited grain, and
full grain.) have different response, milk production. during the six week period
 A design balanced for residual effects, or carryover design, the three treatments:
roughage, limited grain, and full grain have different mean milk production.
 The full grain treatment have the higher milk production than the roughage and limited
grain treatments.

 References

 Gray [Link](2010).A First Course in Design and Analysis of


[Link] Minnesota.
 Design and Analysis of experiment 5th edition Douglas
[Link].
 Lecture note prepared by Ayele Taye, Ph.D. School of
Mathematical and Statistical Sciences Hawassa University.
Cross Validated

sign up log in
Questions Tags Users Badges Unanswered Ask
up vote
2
down vote
favorite
What is the right way to analyze a nested design in R?
r mixed-model random-effects-model nested psychology
I know that there are already a host of questions about nested designs but many of
them haven't been answered or come from biological domains which I sometimes find
hard to transfer to my domain.

I am currently trying to analyse the data from a psychological study. Participants read
four statements and indicated how appropriate they found each on a 5-point Likert-
scale. But I'll just call the outcome variable "out" and so the content will not distract
from the statistical problem at hand.

There are two IVs, one that was varied on three levels between subjects (let's call it
"between" and the levels 1, 2 and 3) and one on two levels within subjects ("within"
with its levels a and b). The within subjects factor is realised in such a manner that two
of the four statements correspond to one of the levels and the other two to the other.

Correspondingly, for every participant, I now have four "out" data points, two for the a
statements and two for the b statements, like this (in long format):

subj between within item out


123 1 a a1 3
123 1 a a2 4
123 1 b b1 1
123 1 b b2 2
124 2 a a1 5
124 2 a a2 4
124 2 b b1 2
124 2 b b2 3
125 3 a a1 1
125 3 a a2 1
125 3 b b1 2
125 3 b b2 3
and so on What is the right way to analyse the effect and interaction of between and
within? I assume that the two different a and the two different b statements do not
differ systematically but are just two instances of the same thing. Or do I need an
additional factor that tells R which of the four statements it is in order to allow for that
error? Anyway, I have tried these options:

m1 <- aov(out~between*within+Error(subj/within))
summary(m1)

m2 <- lm(out~between*within+within/subj)
anova(m2)

m3 <- lmer(out~between*within+(between|subj))
Anova(m3)
They produce different results but I'm not entirely sure about which one is right or what
the differences are. Could anyone enlighten me on this? I assume this is about fixed and
random effects which always gets my head in a twist. I have read other posts about
nested designs and I think this one comes nearest. But unfortunately the question ID
being nested groups wasn't answered. Any help would be very much appreciated!

Edit:
I greatly appreciate the answers so far! The comments prompted me to read some more
tutorials on mixed effects models as well as answers to similar questions. This enables
me to clarify my question: I know that I will need to specify random intercepts at least
for subjects and probably also the four different items representing the two levels of the
within factors. However, I am unsure about the specific fixed/random effects structure I
would have to model, and specifically what the maximal model in my case would look
like. Right now, I have tried some code which roughly looks like this:
m1 <- lmer(out ~ between * within + (between|subj) + (within|subj) + (between|
within/item))
This is probably wrong on several levels so any feedback would be greatly appreciated!
Just for clarity: The item factor has no intrinsic meaning, but both levels of the within
factor is realised by two items each (which every participant sees) and I am unsure
whether this is relevant to the model.

Apart from the model specification, I am still getting confused about the concepts
nested and crossed and which one applies to my setup. Lastly, most of my models fail to
converge but that is probably a matter for another question (and has been discussed
here in length.)
share improve this question
asked
Jul 17 '14 at 9:49

SpookyFM
51●6 edited
Jul 22 '14 at 7:29

1
The package "ordinal" does Ordinal Logistic Regression with mixed effects (clmm), which
may be apt for Likert ratings. – jona Jul 17 '14 at 16:35

@jona Thanks for the advice! I had found that package too when googling but wasn't
sure about it. I have now tried it but, referring to your comment in the emudrak's
answer, it doesn't let me specify random slopes... Do you know any solution for this?
– SpookyFM Jul 18 '14 at 9:44

I just analyzed a Likert ratings experiment using ordered logistic regression in Stan,
which let me freely estimate slopes for subjects and items. However, setting up a Stan
model is a somewhat involved procedure. Alternatively, binomial regression in lmer
gave fairly similar results compared to the ordinal regression - maybe you can try doing
the biggest mixed model you can build in clmm and compare its results to binomial
and/or linear regression using lmer. – jona Jul 18 '14 at 10:13
1
I can share my Stan code with you if you want to try that, although I can't promise it's
any good. – jona Jul 21 '14 at 11:55
1
@SpookyFM @jona The documentation for clmm() on CRAN is out of date, make sure to
take a look over at the package on R-Forge. See this message thread from the R special
interest group for mixed models. – Livius Sep 10 '14 at 16:11
show 4 more comments
2 Answers
order by
up vote
2
down vote
As discussed in the comments, here is my Stan model for an Ordered Logistic
Regression. It takes long format data (one observation per row) and estimates the
following parameters: coefficients for dummy-coded predictors (you can for example
use Rs [Link]) from the matrix preds as fixed effects, and also "maximal"
(intercept + slope for all terms plus all interaction) random subject and item effects. It
does not estimate or disallow correlated random effects.

It should give you the coefficients for your predictor matrix (in beta) and the cutoff
points.

data{
int<lower=2> K; // number of outcomes
int<lower=1> n_conds; // e.g. 32 (5 factors and all interactions)
int<lower=0> n_obs; // e.g. 12000 (observations)
int A[n_obs]; // outcomes (e.g. Likert scale rankings, 1-4)
matrix[n_obs,n_conds] preds; // predictor matrix of 0s and 1s

int<lower=0> Ss; // e.g. 190 (subjs)


int<lower=0> Is; // e.g. 2560 (items)

int<lower=0,upper=Ss> S[n_obs]; // subj per observation


int<lower=0,upper=Is> I[n_obs]; // item per observation

parameters{

vector[n_conds] r_subj[Ss]; // random effects subject


vector[n_conds] r_item[Is]; // random effects item

real<lower=0,upper=10> sigma_s; // one sigma for all subjs


real<lower=0,upper=10> sigma_i; // one sigma for all items

real<lower=0,upper=10> sigma; // one sigma for all fixed


ordered[K-1] c; // cut points
vector[n_conds] beta; // fixed effects
}

model {
sigma ~ cauchy(0,5); // rather narrow prior for all sigmas
sigma_s ~ cauchy(0,5);
sigma_i ~ cauchy(0,5);

beta ~ normal(0,sigma); // moderate prior for effects

for (s in 1:Ss) // prior for subj rand. eff.


r_subj[s] ~ normal(0,sigma_s);

for (i in 1:Is) // prior for item rand. eff.


r_item[i] ~ normal(0,sigma_i);

for (n in 1:n_obs) // loop to estimate using ordered logistic regr.

A[n] ~ ordered_logistic( preds[n] * beta // fixed

+ preds[n] * r_subj[S[n]] // random s


+ preds[n] * r_item[I[n]] // random i

, c); // cut points


After you've specified all values asked for in the data part of the code in a list dat, you
can run it, e.g. via fit <- stan(model_code = code, data=dat, chains = 0, iter = 10), and
print the values via print(fit,"beta").

I can't promise this works as intended, but I've compared it with simple ordinal
regression using clmm and "maximal" linear regression using lme4 and the results are
qualitatively very similar!
share improve this answer
answered
Jul 22 '14 at 14:09

jona
1,174●5●17 edited
Jul 23 '14 at 11:13

up vote
0
down vote
Your variables for between and subj are completely confounded- every observation for
subj=123 has between=1, subj=124 has between=1, etc...

Therefore the between variable is redundant, and including both between and subj in
your model is likely causing many of of the problems.

Is there really an independent variable associated with between? If so, then you have no
replication at that level (each between level was tested at only 1 subj), and you can't
separate the IV between from the subject random effect.

If the independent variable you called between is just to differentiate subj, let subj do
that.

I sugget this code:

m3 <- lmer(out ~ within + (1|subj)


For more information about syntax for specifying nested random effects in lme4, see
page 51 of chapter 2 of the book.
share improve this answer
answered
Jul 17 '14 at 16:18

emudrak
116●7
3
Aren't random-intercept-only models anti-conservative? IE., lmer(out ~ within + (within|
subj) is better? – jona Jul 17 '14 at 16:38
1
@emudrak Thank you very much for your response! But my sample data was probably
misleading: Of course I have more than three subjects so subject 126 would have
between = 1 again, 127 would have 2, 128 3, 129 1 again and so on. – SpookyFM Jul 18
'14 at 9:10
add a comment
Your Answer

Body
Add picture

Log in

OR
Name

Email

By posting your answer, you agree to the privacy policy and terms of service.

meta chat tour help blog privacy policy legal contact us full site
2017 Stack Exchange, Inc

You might also like