ANCOVA vs. Change from Baseline Analysis
ORIGINAL ARTICLES
Abstract
Background and Objective: For inferring a treatment effect from the difference between a treated and untreated group on a quantitative
outcome measured before and after treatment, current methods are analysis of covariance (ANCOVA) of the outcome with the baseline as
covariate, and analysis of variance (ANOVA) of change from baseline. This article compares both methods on power and bias, for randomized
and nonrandomized studies.
Methods: The methods are compared by writing both as a regression model and as a repeated measures model, and are applied to a
nonrandomized study of preventing depression.
Results: In randomized studies both methods are unbiased, but ANCOVA has more power. If treatment assignment is based on the
baseline, only ANCOVA is unbiased. In nonrandomized studies with preexisting groups differing at baseline, the two methods cannot both
be unbiased, and may contradict each other. In the study of depression, ANCOVA suggests absence, but ANOVA of change suggests
presence, of a treatment effect. The methods differ because ANCOVA assumes absence of a baseline difference.
Conclusion: In randomized studies and studies with treatment assignment depending on the baseline, ANCOVA must be used. In nonrandomized
studies of preexisting groups, ANOVA of change seems less biased than ANCOVA, but two control groups and two baseline
measurements are recommended. © 2006 Elsevier Inc. All rights reserved.
Keywords: ANCOVA; Change from baseline; Nonrandomized studies; Regression to different means; Regression to the mean; Repeated measures
is the average $\Delta$ in the population of interest. Most treatments are evaluated by a parallel groups design in which half of all persons are treated (the experimental group, $G = 1$) and half are not (control group, $G = 0$). The mean posttest difference between the groups is used to estimate $\Delta$. In doing so, one assumes that, apart from sampling error, the posttest mean of the control group is equal to the posttest mean of the treated group that would have resulted if that group had not been treated. This assumption is warranted if treatment assignment is based on randomization. However, randomization is not always possible. Exposure studies involve preexisting communities. Mass media interventions can only be implemented at the community level. Treatment contamination may occur if persons within the same school or hospital are allocated to different groups.

But if randomization is impossible, then how can we adjust for a baseline group difference to estimate $\Delta$ unbiasedly? Usually, the outcome is observed before treatment (pretest, $X$) and after treatment (posttest, $Y$). If the groups differ significantly at pretest, this invalidates their posttest difference as treatment effect estimator. The next step is adjusting the posttest difference such that $\Delta$ is estimated unbiasedly. ANCOVA with treatment $G$ as a factor, pretest $X$ as a covariate, and posttest $Y$ as an outcome, is one attempt at adjustment. ANOVA of change from baseline, with $G$ as a factor and change ($Y - X$) as an outcome, is another one. This article compares both methods in terms of power and bias. To prevent misunderstanding, it must be emphasized that if the groups in a nonrandomized study do not differ at pretest, this does not guarantee that the posttest difference unbiasedly estimates $\Delta$. For instance, the groups may differ in age, and this may lead to a posttest difference even if the pretest means are equal and there is no treatment. A more dramatic example is given in Section 7.

For an unbiased effect estimation in nonrandomized studies ‘‘strongly ignorable treatment assignment’’ [8] is needed. Roughly, this means that the actual treatment assignment of person $i$ is independent of $\Delta_i$, and rules out selection of the treatment by each person. Strongly ignorable treatment assignment may hold after correcting for some covariate, which is then called a ‘‘complete confounding factor’’ [11]. For an example, see Section 6. One other assumption is needed to test $\Delta$, that is, ‘‘stable unit-treatment value’’ [8], which comes down to independence between the $\Delta_i$

3. Example: A nonrandomized study of prevention of depression

Before going into the differences between ANCOVA and ANOVA of change, both methods will be applied to a nonrandomized study of prevention of depression [12], which serves as an example throughout this article. The study aim was to evaluate the effectiveness of a psychotherapeutic course in preventing depression among adolescents. The treated group consisted of 88 students, 14–20 years old, in the medium-sized Dutch town Nijmegen, the control group of 92 students in the equally large neighboring town Arnhem. Assessment of symptoms of depression and skills was done before and after intervention. Persons were included if their pretest Beck’s Depression Inventory (BDI) score was between 10 and 25, reflecting mild to moderate depression.

Of all 180 students, 32 dropped out before the posttest: 20 treated and 12 controls. Logistic regression of dropout on treatment, age, gender, schooltype, and pretest of all outcomes showed dropout to depend on age and schooltype only (all other P > .30). The present analysis is limited to complete cases, postponing the inclusion of dropouts until Section 4. So two analyses were run with SPSS: (1) ANOVA of change (post- minus pretest) and (2) ANCOVA with posttest as outcome and pretest as covariate. Both analyses were repeated with age, gender, and schooltype as covariates. Residual checks showed only mild violation of normality and homogeneity of variance. No treatment by pretest interaction was found. Figure 1 shows the result for Symptoms, and plots for skills were similar.

The treated group had a higher pretest mean than the control group (P = .000), and the group difference was smaller and no longer significant at posttest (two-tailed P = .10). ANOVA of change suggested a treatment effect, because symptoms decreased more in the treated than in the untreated group (effect estimate −0.46, SE = 0.13, P = .000). In contrast, ANCOVA suggested absence of an effect (estimate −0.13, SE = 0.13, P = 0.31). These results were hardly affected either by adjusting for covariates or by including dropouts (for details, see Section 4).

Fig. 1. Change of mean Symptoms score per group in the study of depression prevention. [Plot of group mean against time point (pretest, posttest) for the control and treated groups.]

4. ANCOVA vs. ANOVA of change: A formal comparison

Traditionally, ANCOVA is treated as an extension of ANOVA [7,13]. Here, we present ANCOVA as a regression
922 G.J.P. Van Breukelen / Journal of Clinical Epidemiology 59 (2006) 920–925
model, briefly mentioning the difference with classical ANCOVA. In terms of regression analysis, ANCOVA assumes that:

$$Y_{ij} = \beta_0 + \beta_1 G_{ij} + \beta_2 X_{ij} + \varepsilon_{ij} \qquad (1)$$

or equivalently,

$$Y_{ij} - \beta_2 X_{ij} = \beta_0 + \beta_1 G_{ij} + \varepsilon_{ij}$$

where $Y_{ij}$ is the posttest score of person $i$ in group $j$ (e.g., $j = 1$ for control, $j = 2$ for treated); $G_{ij}$ is a treatment indicator ($G_{i1} = 0$ for controls, $G_{i2} = 1$ for treated); $X_{ij}$ is the covariate, for example, the pretest score; and $\varepsilon_{ij}$ is normally distributed with zero mean and constant variance. Classical ANCOVA differs from (1) in that $G_{ij}$ is coded $(-1, +1)$ and $X_{ij}$ is ‘‘centered’’ by subtracting its mean [13,14]. This only affects $\beta_0$, called the ‘‘grand mean’’ in ANOVA.

In eq. (1), $\beta_1$ is the group difference on $Y$ adjusted for differences on $X$. Practical use of ANCOVA requires estimation of $\beta_2$, which is a function of the within-group variances and correlation of pretest and posttest. ANCOVA assumes linearity of the covariate effect and absence of covariate by group interaction. Both assumptions can be relaxed [14], but this article is limited to the classical model to allow a comparison with ANOVA of change ($Y - X$), which comes down to (1) with the assumption that $\beta_2 = 1$.

The real difference between ANCOVA and ANOVA of change becomes clear, however, by writing both in terms of repeated measures. ANOVA of change is equivalent to testing the group by time interaction in the following model (with $\gamma$ instead of $\beta$ for regression weights to prevent confusion with the ANCOVA model (1)):

$$Y_{ijt} = \gamma_0 + \gamma_1 G_{ij} + \gamma_2 T_{it} + \gamma_3 G_{ij} T_{it} + \varepsilon_{ijt} \qquad (2)$$

where $Y_{ijt}$ is the observation of person $i$ in group $j$ at time point $t$, $G$ is the group (0 = control, 1 = treated), $T$ is the time point (0 = pretest, 1 = posttest), and $\varepsilon_{ijt}$ is a random person by time effect. Filling in $G$ and $T$ shows that $\gamma_0$ is the pretest (population) mean of the control group, $\gamma_1$ is the pretest mean difference between the groups, $\gamma_2$ is the mean change in the control group, and $\gamma_3$ is the difference in mean change between the groups. So testing absence of group by time interaction, that is, of $H_0$: $\gamma_3 = 0$ in eq. (2), is equivalent to testing the $H_0$ of no group effect on the change ($Y - X$). Repeated-measures ANOVA differs from (2) only in that it uses $(-1, +1)$ instead of $(0, 1)$ coding for $G$ and $T$.

It is much less known that ANCOVA is equivalent to testing the group by time interaction $\gamma_3$ in the reduced model (2), which is obtained by assuming that $\gamma_1 = 0$. So ANCOVA assumes that there is no group difference at pretest [15]. This assumption is warranted if treatment assignment is based on randomization or on the pretest $X$. In both cases, there is only one group of persons, and so there can be no group effect at pretest. Groups come into existence after the pretest, by randomization or treatment assignment based on $X$. For these two designs, ANCOVA is known to be the best method [2,9,10].

Repeated-measures analysis of the psychotherapy example in Section 3 was run with the SPSS procedure Mixed, using model (2) with and without the pretest group effect $\gamma_1 G_{ij}$. These two models gave the same effect estimate, SE and P-value as ANOVA of change and ANCOVA, respectively, confirming that $\gamma_3$ in (2) is equivalent to $\beta_1$ in (1). An advantage of the repeated measures approach is that it allows inclusion of persons with a missing posttest due to dropout. In this example, including dropouts hardly affected the results. In general, it may make a difference (see Section 7).

In summary, in terms of regression (1), ANOVA of change is a special case of ANCOVA in that it assumes a slope $\beta_2 = 1$ for regressing posttest $Y$ on pretest $X$. In terms of repeated measures (2), ANCOVA is a special case of ANOVA of change in that it assumes a slope $\gamma_1 = 0$ for regressing pretest $X$ on group $G$. It is this difference that makes ANCOVA superior in randomized studies and questionable in nonrandomized ones.

5. Randomized studies: Power

In a randomized study any pretest group difference is due to sampling error, so any value of $\beta_2$ in (1) gives the same $\beta_1$ ($= \Delta$) apart from sampling error, because $\beta_1$ is the posttest difference minus $\beta_2$ × the pretest difference. ANOVA of the posttest lets $\beta_2 = 0$, ANOVA of change takes $\beta_2 = 1$, and ANCOVA computes $\beta_2$ such that the residual posttest variance is minimized, thereby minimizing the standard error of the treatment effect estimate. So ANCOVA gives the largest power and the smallest confidence interval. If pretest and posttest have the same within-group variance and $\rho_{XY}$ denotes the pretest–posttest correlation within groups, then ANCOVA needs a sample size only $(1 + \rho_{XY})/2$ as large as that for ANOVA of change to have the same standard error, for instance, only 75% if $\rho_{XY} = 0.50$.

In terms of repeated measures (2), the superiority of ANCOVA in randomized studies is due to the fact that, because there is no group effect at pretest, ANCOVA is more parsimonious than ANOVA of change, which contains a superfluous parameter $\gamma_1$.

In nonrandomized studies the group indicator $G$ in (1) correlates with the pretest $X$, thereby inflating the SE of the ANCOVA estimator [14]. This explains why both methods gave the same SE in Section 3. But in nonrandomized studies bias, not power, is the issue.

6. Treatment assignment based on the pretest: Bias

Suppose that treatment assignment is based on the pretest $X$ such that the groups have different pretest means.
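As a numerical check on the statements in Sections 4 and 5, the following Python sketch computes the three estimators as one adjusted difference, posttest difference minus a slope times the pretest difference, with slope 0 (ANOVA of the posttest), 1 (ANOVA of change), or the pooled within-group slope (the least-squares estimate of $\beta_2$ in model (1)). It also computes $\gamma_3$ of model (2) from the four cell means. The data, the effect size of 0.5, the seed, and all function names are hypothetical and serve only as an illustration; this is not the SPSS analysis reported in the article.

```python
import random
from statistics import mean


def group_mean(values, g, level):
    """Mean of the values belonging to one group (level 0 or 1)."""
    return mean(v for v, gi in zip(values, g) if gi == level)


def pooled_within_slope(x, y, g):
    """Least-squares slope of Y on X after removing the group means.

    With a group dummy in the model, this is the estimate of beta_2 in (1).
    """
    sxy = sxx = 0.0
    for level in (0, 1):
        mx, my = group_mean(x, g, level), group_mean(y, g, level)
        for xi, yi, gi in zip(x, y, g):
            if gi == level:
                sxy += (xi - mx) * (yi - my)
                sxx += (xi - mx) ** 2
    return sxy / sxx


def adjusted_difference(x, y, g, slope):
    """Posttest group difference minus slope times the pretest group difference."""
    pre_diff = group_mean(x, g, 1) - group_mean(x, g, 0)
    post_diff = group_mean(y, g, 1) - group_mean(y, g, 0)
    return post_diff - slope * pre_diff


def gamma3(x, y, g):
    """Group-by-time interaction of model (2), computed from the four cell means."""
    change_treated = group_mean(y, g, 1) - group_mean(x, g, 1)
    change_control = group_mean(y, g, 0) - group_mean(x, g, 0)
    return change_treated - change_control


def relative_sample_size(rho):
    """Sample size ANCOVA needs relative to ANOVA of change (equal variances)."""
    return (1 + rho) / 2


# Hypothetical data: assignment alternates and is independent of the scores.
rng = random.Random(1)
n = 200
g = [i % 2 for i in range(n)]
true_score = [rng.gauss(0, 1) for _ in range(n)]
x = [t + rng.gauss(0, 1) for t in true_score]                         # pretest
y = [t + 0.5 * gi + rng.gauss(0, 1) for t, gi in zip(true_score, g)]  # posttest

b2 = pooled_within_slope(x, y, g)
effects = {
    "anova_posttest": adjusted_difference(x, y, g, 0.0),
    "anova_change": adjusted_difference(x, y, g, 1.0),
    "ancova": adjusted_difference(x, y, g, b2),
}
print(effects, "b2 =", b2, "gamma3 =", gamma3(x, y, g))
print(relative_sample_size(0.5))  # 0.75, the 75% mentioned in Section 5
```

In this sketch the three estimators differ only in the slope passed to `adjusted_difference`, and `gamma3` coincides with the ANOVA of change estimate, which is the sense in which ANOVA of change tests the group by time interaction in (2).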
methods cannot both be unbiased, because $\beta_1$ in (1) is the posttest difference minus $\beta_2$ × the pretest difference. So $\beta_1$ depends on $\beta_2$, which is 1 for ANOVA of change, but less than 1 for ANCOVA unless posttest variance is much larger than pretest variance. If pre- and posttest have the
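The dependence of $\beta_1$ on $\beta_2$ can be illustrated with a small Python sketch, under stated assumptions: two untreated groups are simulated, the second obtained by adding 10 to every pretest and posttest score of the first, so that the true effect is zero and the group difference is 10 at both time points. The group size, means, error variances, and seed are hypothetical choices, not data from the article.

```python
import random
from statistics import mean

rng = random.Random(7)
n = 100

# Hypothetical control group: a stable true score plus independent
# measurement error at pretest (x0) and posttest (y0). No treatment.
true_score = [rng.gauss(10.5, 3) for _ in range(n)]
x0 = [t + rng.gauss(0, 3) for t in true_score]
y0 = [t + rng.gauss(0, 3) for t in true_score]

# Hypothetical second group: the same scores shifted by 10 at both occasions.
x1 = [v + 10 for v in x0]
y1 = [v + 10 for v in y0]


def cross_products(u, v):
    """Sum of products of deviations from the group means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


# Pooled within-group slope: the least-squares estimate of beta_2 in model (1).
b2 = (cross_products(x0, y0) + cross_products(x1, y1)) / (
    cross_products(x0, x0) + cross_products(x1, x1)
)

pre_diff = mean(x1) - mean(x0)    # 10 by construction
post_diff = mean(y1) - mean(y0)   # 10 by construction

change_estimate = post_diff - 1.0 * pre_diff   # ANOVA of change: slope fixed at 1
ancova_estimate = post_diff - b2 * pre_diff    # ANCOVA: slope estimated from the data

print("b2 =", b2)
print("ANOVA of change estimate =", change_estimate)
print("ANCOVA estimate =", ancova_estimate)
```

Because the estimated slope is below 1 whenever the pretest contains measurement error, the ANCOVA estimate in this sketch equals $10(1 - \beta_2)$ rather than zero, whereas the change estimate is zero up to rounding. The sketch only shows the arithmetic of the decomposition; it does not settle which assumption holds in a real study.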
That ANCOVA is biased for preexisting groups, which are random samples from their populations is shown in Fig. 3. The sample of Fig. 2 is now the control group (solid circles) and the experimental group is obtained by adding 10 to each $X$ and $Y$ in the control group (clear circles). So the group difference is 10 at both time points. There is no treatment in either group and so $\Delta = 0$. ANOVA of change correctly estimates $\beta_1$ to be zero (P = 1.00), but ANCOVA estimates $\beta_1$ to be 4.8 (P = .03). Almost the same result (effect = 5.4, P = .03) is obtained by first matching on $X$, which leads to the exclusion of all persons with $X < 11$ or $X > 20$ (vertical lines in Fig. 3), and then applying ANOVA to the posttest $Y$ or the change $Y - X$ of included persons only. Figure 3 shows the cause of this bias. Matching leads to selection of the upper half of control group I and the lower half of experimental group II. At posttest there is regression, not to a common mean as ANCOVA assumes, but to the mean that would have been observed without selection, that is, to 10.5 in the control group and 20.5 in the experimental group. Of all 10 included controls, 6 are below the $Y = X$ line (downward regression). Of all 10 included experimentals, 8 are above it (upward regression). The opposite trends occur in the excluded subgroups. ANCOVA is a mathematical method of matching and shares its bias in nonrandomized studies.

Fig. 3. Bias introduced by matching on the pretest $X$ due to regression to different means, in a nonrandomized study of preexisting groups with a fixed mean each. Reference lines: $X = 10.5$ and $X = 20.5$ (inclusion criterion: $10 < X < 21$) and $Y = X$. ● (group 1): Included: $X > 10$, result: $Y < X$; excluded: $X < 10$, result: $Y > X$, for a majority. ○ (group 2): Included: $X < 20$, result: $Y > X$; excluded: $X > 20$, result: $Y < X$, for a majority. [Scatterplot of posttest against pretest, both axes from −10 to 40.]

In this example the bias is clear, because there is no treatment and we have posttest data of all persons. In practice, there are no posttest data of excluded persons and ANOVA of change on the included (matched) persons suffers from the same differential regression effect as ANCOVA on the total sample. This is a problem in nonrandomized studies where the inclusion of persons is based on cutoffs for the pretest, which act like a mild matching. But there is a simple solution. If exclusion is based on the pretest data, then posttest data of the excluded persons are ‘‘missing at random’’ [17]. This type of missingness can be handled by repeated-measures analysis (2), including the pretest data of excluded persons, which is not possible with (1). In the present example repeated-measures analysis of all 40 persons, using only the pretest data of excluded persons, gave an effect estimate of 2.55, with a two-tailed P = .30, leading to the correct conclusion of no effect, while ANCOVA of all data and ANOVA of change without excluded persons led to the wrong conclusion.

What do these results imply for the example in Section 3? Given that the groups were recruited from different towns and had different pretest means, ANOVA of change seems more reasonable than ANCOVA, and one might conclude that there was a treatment effect. But there are two complications. First, the BDI score was used both as inclusion criterion (10 < BDI < 25) and as part of the outcome Symptoms, implying some matching on the pretest. As Fig. 3 shows, this may lead to differential regression if the two populations (before exclusion based on BDI) have different BDI means. This not only threatens the unbiasedness of ANCOVA, but also that of ANOVA of change when applied to the included persons only. But because no data are available from the excluded persons, no further analysis is possible. A second complication is that the BDI score was measured twice before the intervention period in the control group. The first was used as inclusion criterion and the second was 1 month later on the pretest of all outcomes ([12], p. 142). Given that a BDI score > 10 is well above the population mean ([12], p. 72), it is likely that regression to the mean had already occurred at pretest in the control group. So the pretest difference in Fig. 1 may be artificially large, casting doubt on the treatment effect. Unfortunately, no data from that first BDI measurement in the control group are available.

8. Discussion

Based on literature, we saw that (1) the difference between ANCOVA and ANOVA of change is that between assuming absence or presence of a baseline group difference, and (2) the choice between both methods depends on the treatment assignment procedure. If treatment assignment is by randomization, both methods are unbiased but ANCOVA has more power. If treatment assignment is based on the pretest, ANCOVA is unbiased but ANOVA of change is not, due to regression to the mean. Both designs imply treatment assignment after the pretest and so at pretest there is one group, justifying the ANCOVA assumption of no group effect at pretest. In contrast, if preexisting groups are assigned to treatment, the unbiasedness of both methods
depends on strong assumptions about trends in the absence of treatment. The ANOVA of change assumption (equal change) is more plausible than the ANCOVA assumption (regression to a common mean), at least if each group is a random sample from its population. The larger the pretest difference between preexisting groups, the worse ANCOVA is on bias (Section 7) and efficiency (Section 5). This bias has to do with measurement error (intraindividual variability) in the covariate, which leads to underestimation of $\beta_2$ in (1) and a greater discrepancy between ANCOVA and ANOVA of change. Statistical corrections for this underestimation exist [7,18], but are beyond the present scope. Instead, measurement error can be reduced by repeated pretesting and taking a person’s average as covariate [19].

The present results lead to practical advice for nonrandomized studies, assuming that person and cluster randomization are impossible. The design is enhanced by having (1) more than one control group [20], (2) more than one pretest [7,11], and (3) more than one outcome, including some that are known to be unaffected by treatment [21]. Having more than one control group or pretest allows estimation of group trend in the absence of treatment. Equal change of two control groups provides support for ANOVA of change, especially if the treated group is in-between both control groups at pretest. Likewise, equal change between repeated pretests of treated and control group suggests the use of ANOVA of change rather than ANCOVA. Moreover, a person’s average pretest is less subject to measurement error than a single pretest, leading to larger power for both methods and less disagreement. Finally, including outcomes known to be unaffected by treatment allows checks on hidden bias in methods of analysis. For instance, finding a treatment effect on intelligence in the study of depression would cast doubt on the method of analysis. In nonrandomized studies with one preexisting control group and one pretest, ANOVA of change may be better than ANCOVA, but running both methods may be even better. If both methods lead to the same conclusion, differing only in effect size, this increases one’s confidence in that conclusion [3].

Additional problems, illustrated by the study of depression, are dropout and bias due to inclusion criteria in nonrandomized studies. In randomized and nonrandomized studies, dropouts must be included by using proper methods for missing data [17]. In nonrandomized studies, further bias arises from the exclusion of persons based on pretest data, as Fig. 3 showed. So the best analysis of preexisting groups may be a repeated-measures analysis including all available data of dropouts and of excluded persons.

References

[1] Campbell DT, Kenny DA. A primer on regression artifacts. New York: Guilford Press; 1999.
[2] Holland PW, Rubin DB. On Lord’s paradox. In: Wainer H, Messick S, eds. Principals of modern psychological measurement. Hillsdale, NJ: Erlbaum; 1983. p. 3–25.
[3] Kenny DA. A quasi-experimental approach to assessing treatment effects in the nonequivalent control group design. Psychol Bull 1975;82:345–62.
[4] Lord FM. A paradox in the interpretation of group comparisons. Psychol Bull 1967;68:304–5.
[5] Lord FM. Statistical adjustments when comparing pre-existing groups. Psychol Bull 1969;72:336–7.
[6] Maris E. Covariance adjustment versus gain scores revisited. Psychol Methods 1998;3:309–27.
[7] Porter AC, Raudenbush SW. Analysis of covariance: its model and use in psychological research. J Counseling Psychol 1987;34:383–92.
[8] Rosenbaum PR, Rubin DB. Estimating the effects caused by treatments. J Am Stat Assoc 1984;79:26–8.
[9] Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Ed Psychol 1974;66:688–701.
[10] Rubin DB. Assignment to treatment group on the basis of a covariate. J Ed Stat 1977;2:1–26.
[11] Weisberg HI. Statistical adjustments and uncontrolled studies. Psychol Bull 1979;86:1149–64.
[12] Ruiter M. Preventie van depressie bij jongeren (Prevention of depression among adolescents). Doctoral dissertation, Nijmegen University, The Netherlands; 1997.
[13] Maxwell SE, Delaney HD. Designing experiments and analyzing data: a model comparison perspective. Pacific Grove, CA: Brooks/Cole; 1990.
[14] Kleinbaum DG, Kupper LL, Muller KE, Nizam A. Applied regression analysis and other multivariable methods. Pacific Grove, CA: Brooks/Cole; 1998.
[15] Laird NM, Wang F. Estimating rates of change in randomized clinical trials. Controlled Clin Trials 1990;11:405–19.
[16] Stigler SM. Regression towards the mean, historically considered. Stat Methods Med Res 1997;6:103–14.
[17] Schafer JL, Graham JW. Missing data: our view of the state of the art. Psychol Methods 2002;7:147–77.
[18] Carroll RJ. Covariance analysis in general linear measurement error models. Stat Med 1989;8:1075–93.
[19] Senn SJ. Covariance analysis in generalized linear measurement error models. Stat Med 1990;9:583–6.
[20] Rosenbaum PR. The role of a second control group in an observational study. Stat Sci 1987;2:292–316.
[21] Rosenbaum PR. The role of known effects in observational studies. Biometrics 1987;45:557–69.