15. Causal inference with observational data*


Richard Breen †

1. INTRODUCTION
Causality permeates our understanding of reality; it is fundamental to how we operate in,
and make sense of, the world, but, as many philosophers have pointed out, ‘this deep
intuitive familiarity has not led to any philosophical account of causation that is at once
clean, precise and widely agreed upon’ (Paul & Hall 2013, p. 1). Establishing relationships
of cause and effect is also central to scientific inquiry, and rigorous sociological analysis
must treat seriously the difficulties this entails. Social scientists have approached causality
in diverse ways. In the 1970s and 1980s, ‘causal models’ in sociology meant path models
and structural equation models; in time-series econometrics, Granger (1969) developed a
widely used approach to causality; Hill (1965) presented a set of nine criteria for causality
that were important in epidemiology; and Ní Brolcháin & Dyson (2007) proposed a set
of related criteria for causality in demography. But in recent years two approaches to
causality have come to prominence in the social sciences and more widely: the counter-
factual or potential outcomes model (as in Angrist & Pischke 2009) and the approach that
understands causality in terms of a causal structure displayed as a graph (Pearl 2009),
sometimes called a Structural Causal Model.
In many sciences, causal relations can be established through experiments, often
termed randomized controlled trials (RCTs). Sometimes this is possible in the social sciences
(see the chapter by Gërxhani & Miller on experimental sociology in this Handbook) but
often experiments are not feasible or ethical; as a result, social scientists commonly want
to estimate causal relationships using observational data – that is, data in which they do
not have control over the factor considered to be a cause of the effect they are studying.
This makes the drawing of conclusions about causal relationships much more challeng-
ing. Lack of control over the causal treatment is a problem in some other sciences too.
Sociologists have played fast and loose with the word ‘cause’ and its associated terms –
‘effect’, ‘impact’, ‘tendency’, ‘mechanism’ and so on. Causal language is often used in
analyses that are descriptive. Imagine we observe that students whose families have
higher incomes tend to have higher scores on a standardized test or examination. This is
a descriptive statement and to write of the ‘effect’ of family income on test scores in this
case is misleading. A causal, as distinct from a descriptive, statement, concerns what
would happen if the cause were changed: ‘increasing students’ family income leads, on
average, to better test scores’ is a causal statement. Establishing that a relationship is

* K. Gërxhani, N.D. de Graaf and W. Raub (eds) (2022), Handbook of Sociological Science: Contributions
to Rigorous Sociology, Cheltenham, UK and Northampton, MA, USA: Edward Elgar Publishing. ISBN: 978
1 78990 942 5.
† I thank the editors, Mads Meier Jaeger, John Ermisch, and participants in the online meetings held in
December 2020 for helpful comments on an earlier version.

Richard Breen - 9781789909432
Downloaded on 10/30/2023 via Open Access. This work is licensed under the Creative Commons
Attribution-NonCommercial-No Derivatives 4.0 License.

causal is important but it is at least as important to know the magnitude of the effect.
Smoking is known to cause lung cancer but we also want to know how much the risk is
elevated depending on how much people smoke. Likewise, we would like to know not
only whether class size affects a particular student outcome but also how much a given
change in class size would change that outcome. So contemporary approaches to causal
analysis attempt to both identify a causal effect (is the relationship causal?) and estimate
its magnitude.
There are two common ways of talking about causal relationships. On the one hand,
we can ask what causes people to join protest movements; on the other, we can ask how
a person’s education affects whether or not he or she joins a protest movement. The
former looks for a more or less complete explanation of an outcome, the latter for the
effect of a particular thing on that outcome. The former is called a ‘causes of effects’
approach, the latter ‘effects of causes’. Another frequently drawn distinction is between
singular and general causal claims. ‘John’s cancer was caused by smoking 60 cigarettes a
day’ is singular, ‘smoking causes cancer’ is general.
The causal approaches I shall discuss deal, for the most part, with general causal claims
from an effects of causes perspective. Sociologists’ causal statements (like everyday ones)
are often not of this kind; they have written about the causes of singular events and they
have taken a causes of effects perspective in seeking to account for particular phenomena.
‘How did the industrial revolution come about?’ is an example of the sort of causal
question that lies outside the scope of the approaches discussed here.

2. AN EXAMPLE

Imagine that we survey a large sample of people who have just completed their school
education. We gather data on their grades in a terminal school examination and we
compare the grades of students who attended public secondary schools with those who
attended private schools. We find that the average grade is significantly higher among the
latter. This is a descriptive finding but could we also conclude that the kind of school
attended was the cause of this gap in average grades? Probably not, because different
kinds of students attend private and public schools and some of these differences – for
example in parental income – might account for some or all of the average gap in their
exam scores. This implies that, even if they had attended the same kind of school, the
students who in fact attended private schools would, on average, have done better. In this
case, parental income plays the role of a confounder, something that is correlated with
the treatment (type of school attended) and affects the outcome but which is not a conse-
quence of treatment. Failure to take account of confounders leads to biased estimates of
causal effects. In this example, the bias in question is sometimes called ‘baseline bias’: the
treated and the non-treated would have had different average outcomes even if neither
had received the treatment.
There is another kind of bias arising from confounders. Suppose that there is variation
in the degree to which students’ families provide a supportive learning environment. We
would expect that students from supportive families would benefit more from being in a
private school than would students from less supportive homes. If the students with the
better home learning environment were indeed more likely to attend a private school, the

effect on educational outcomes of being in such a school would be overstated. This is
‘differential treatment effect bias’: those who received a private education benefitted more
from it than those who received a public education would have benefitted from a private
education if, counter to fact, they had received it. Differential treatment effect bias exem-
plifies the fundamental idea that people may respond differently to the same treatment.
This is referred to as heterogeneity of treatment effects.

3. THE POTENTIAL OUTCOMES APPROACH

In the example just given, there were references to what would have eventuated in circum-
stances that did not, in fact, arise. This idea is formalized in the potential outcomes, or
counterfactual, approach. Call the variable representing the cause, D, and the effect, Y. D
is often referred to as ‘the treatment’ or the ‘exposure’. The underlying idea of the coun-
terfactual model is that each unit (typically an individual person) has a potential outcome
for each value that D could take, $Y^{D=d}$. The causal effect of D on Y for the ith person is
given by

$$\mathrm{ICE} = Y_i^{D_i = d} - Y_i^{D_i = d'} \tag{15.1}$$

ICE stands for individual causal effect, i indexes individuals and d and d ʹ are different
values of D. Two things stand out. First, a causal effect is defined in relative terms as the
difference between potential outcomes for two values of the treatment. If D were binary,
with possible values 1 and 0, the potential outcomes would be $Y_i^{D=1}$ and $Y_i^{D=0}$ but in
general there can be many treatment effects, depending on how many comparisons
can be made, though not all of them may be of interest. Second, the ICE cannot
be observed because a person can, in reality, have only one value of D and so only
one potential outcome can be realized. The unobservability of some potential outcomes
leads to what Holland (1986, p. 947) termed ‘the fundamental problem of causal
inference’.
The average causal effect, or average treatment effect, ATE, is the average of the ICE
in the population of interest:

$$\mathrm{ATE} = E(Y_i^{D_i = 1} - Y_i^{D_i = 0}) = E(Y_i^{D_i = 1}) - E(Y_i^{D_i = 0}) \tag{15.2}$$

The second equality follows by the linearity of the expectation operator. It would not be true
if we wanted the median treatment effect, for example. Here the treatment is assumed
binary and I will maintain this assumption for simplicity, referring to those cases
for which D = 1 as being treated, with those for which D = 0 being the controls, or
untreated.
The ATE is the expected causal effect for any member of the population but more often
we want to know how the treatment affected those people who received it rather than how
it would affect a person picked at random. This is the ATT, the average treatment effect
for the treated, defined as

$$\mathrm{ATT} = E((Y_i^{D_i = 1} - Y_i^{D_i = 0}) \mid D_i = 1) = E(Y_i^{D_i = 1} \mid D_i = 1) - E(Y_i^{D_i = 0} \mid D_i = 1) \tag{15.3}$$

Here we condition on those who were treated (Di = 1) or, more simply, we compute the
ATE only for those who were treated. Similarly, the average treatment effect for the
untreated, ATU, is the average effect of treatment among those who did not, in fact, receive
treatment:

$$\mathrm{ATU} = E((Y_i^{D_i = 1} - Y_i^{D_i = 0}) \mid D_i = 0) = E(Y_i^{D_i = 1} \mid D_i = 0) - E(Y_i^{D_i = 0} \mid D_i = 0) \tag{15.4}$$

The observed outcome, Y, is related to the potential Ys by

$$Y_i = Y_i^{D_i = 1} D_i + (1 - D_i)\, Y_i^{D_i = 0} \tag{15.5}$$

For each individual their observed Y is their potential Y for the treatment they actually
received.
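The relationship between Equations 15.1 to 15.5 can be illustrated with a small simulation. What follows is a minimal sketch, assuming an invented data-generating process: the `support` score, all coefficients and the treatment rule are hypothetical and are not taken from the chapter. In a simulation, unlike in real data, both potential outcomes of every unit are visible, so the ICE, ATE, ATT and ATU can be computed directly.

```python
import random
from statistics import fmean

random.seed(0)
N = 100_000

units = []
for _ in range(N):
    support = random.random()                     # hypothetical home-support score in [0, 1)
    y0 = 50 + 10 * support + random.gauss(0, 5)   # potential outcome under D = 0
    ice = 2 + 6 * support                         # individual causal effect (Eq. 15.1), heterogeneous
    y1 = y0 + ice                                 # potential outcome under D = 1
    # Assumed selection rule: better-supported students are more likely to be treated.
    d = 1 if random.random() < 0.2 + 0.6 * support else 0
    y = d * y1 + (1 - d) * y0                     # observed outcome (Eq. 15.5)
    units.append((d, y0, y1, y))

ate = fmean(y1 - y0 for d, y0, y1, y in units)             # Eq. 15.2
att = fmean(y1 - y0 for d, y0, y1, y in units if d == 1)   # Eq. 15.3
atu = fmean(y1 - y0 for d, y0, y1, y in units if d == 0)   # Eq. 15.4

print(f"ATE = {ate:.2f}, ATT = {att:.2f}, ATU = {atu:.2f}")
```

Because treatment is more likely for units with larger individual effects in this assumed set-up, the ATT exceeds the ATU. With real data only the `y` column would be available, which is the fundamental problem of causal inference.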
Calculating the ATE, ATT or ATU seems more promising than calculating the ICE
because now we have observations of both potential outcomes, given that some individuals
were treated (so for them we observe $Y_i^{D_i = 1}$) and others were not (we observe $Y_i^{D_i = 0}$).
An obvious way of trying to estimate the ATE is by comparing the average outcomes of
the treated and the untreated:

$$\mathrm{ATE}^{*} = E(Y \mid D = 1) - E(Y \mid D = 0) \tag{15.6}$$

Here I have dropped the i subscripts for ease of notation. Equation 15.6 is expressed in
observed outcomes whereas Equations 15.2 through 15.4 are expressed in potential
outcomes, leading us to wonder how well ATE* performs as an estimator of the
ATE. In fact, ATE* is equal to the true ATE plus two bias terms (see Morgan & Winship
2015, pp. 58–59). The first bias arises from the difference in the potential outcomes
under non-treatment between the people who received treatment and those
who did not. The second bias term depends on the difference between the ATT and
the ATU. These are the two biases discussed in the previous section. In the schools
example, our naive estimate of the ATE is the observed difference in average exam scores
between the students who attended private and public schools. Here the baseline bias is
the average difference in exam scores between the privately and publicly educated students
that would have been observed even if none of them had attended a private school
(that is, the average difference in $Y^{D=0}$). The differential treatment effect bias is non-zero
if the returns to attending a private school for those who did attend (ATT) differ
from the returns that those who did not attend would have obtained had they attended
(ATU).
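The decomposition of the naive estimator into the ATE and the two bias terms can be written out explicitly. The following is a sketch in the chapter's notation with one added symbol that does not appear in the original text: $\pi = P(D = 1)$, the proportion of the population that is treated.

```latex
\begin{align*}
\mathrm{ATE}^{*}
  &= E(Y \mid D = 1) - E(Y \mid D = 0) \\
  &= E(Y^{D=1} \mid D = 1) - E(Y^{D=0} \mid D = 0)
     && \text{by Equation 15.5} \\
  &= \mathrm{ATT} + \bigl[ E(Y^{D=0} \mid D = 1) - E(Y^{D=0} \mid D = 0) \bigr]
     && \text{add and subtract } E(Y^{D=0} \mid D = 1) \\[4pt]
\mathrm{ATE}
  &= \pi\,\mathrm{ATT} + (1 - \pi)\,\mathrm{ATU}
     && \text{law of total expectation} \\[4pt]
\mathrm{ATE}^{*}
  &= \mathrm{ATE}
   + \underbrace{E(Y^{D=0} \mid D = 1) - E(Y^{D=0} \mid D = 0)}_{\text{baseline bias}}
   + \underbrace{(1 - \pi)\,(\mathrm{ATT} - \mathrm{ATU})}_{\text{differential treatment effect bias}}
\end{align*}
```

The last line follows because $\mathrm{ATT} - \mathrm{ATE} = (1 - \pi)(\mathrm{ATT} - \mathrm{ATU})$. Both bias terms vanish when the potential outcomes are distributed identically among the treated and the untreated.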
The naive estimator of the ATE will be unbiased only if both these bias terms are zero,
and this will occur when there are, on average, no differences between the treated and the
untreated in the distribution of factors that affect the outcome other than the treatment.
In an RCT this condition will be met (at least up to some sampling error) because units
are randomly assigned to the treatment and control groups. This is the attraction of
RCTs: a simple comparison of the average outcomes for the treatment and control
groups will be an unbiased estimate of the ATE.
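A simulation can make the contrast between self-selection and random assignment concrete. The sketch below is illustrative only: the data-generating process, the `support` variable and every coefficient are assumptions invented for the example. It computes the naive estimator, the baseline bias and the differential treatment effect bias once when treatment depends on a confounder and once when it is assigned by a coin flip.

```python
import random
from statistics import fmean

def simulate(n, randomized, seed):
    """Draw (d, y0, y1) triples from a hypothetical population."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        support = rng.random()                      # hypothetical confounder in [0, 1)
        y0 = 50 + 10 * support + rng.gauss(0, 5)    # potential outcome under D = 0
        y1 = y0 + 2 + 6 * support                   # potential outcome under D = 1
        p_treat = 0.5 if randomized else 0.2 + 0.6 * support
        d = 1 if rng.random() < p_treat else 0
        rows.append((d, y0, y1))
    return rows

def decompose(rows):
    """Split the naive treated-minus-control contrast into ATE plus two bias terms."""
    treated = [(y0, y1) for d, y0, y1 in rows if d == 1]
    control = [(y0, y1) for d, y0, y1 in rows if d == 0]
    pi = len(treated) / len(rows)                   # proportion treated
    att = fmean(y1 - y0 for y0, y1 in treated)
    atu = fmean(y1 - y0 for y0, y1 in control)
    return {
        "naive": fmean(y1 for _, y1 in treated) - fmean(y0 for y0, _ in control),
        "ate": pi * att + (1 - pi) * atu,
        "baseline": fmean(y0 for y0, _ in treated) - fmean(y0 for y0, _ in control),
        "differential": (1 - pi) * (att - atu),
    }

observational = decompose(simulate(100_000, randomized=False, seed=1))
experimental = decompose(simulate(100_000, randomized=True, seed=1))

for label, result in (("self-selected", observational), ("randomized", experimental)):
    print(label, {name: round(value, 2) for name, value in result.items()})
```

Under the assumed self-selection rule both bias terms are positive and the naive contrast overstates the ATE; under random assignment both terms are close to zero, apart from sampling error.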
This condition for an unbiased estimate can be written in two ways. Linking observed
and potential outcomes we have

$$E(Y \mid D = d) = E(Y^{D=d}) \quad \text{for all } d \tag{15.7a}$$

and linking potential outcomes and the treatment indicator:

$$Y^{D=d} \perp D \tag{15.7b}$$

Equation 15.7a says that the average outcome for the group that receives treatment D = d
is equal to the average potential outcome for D = d. Equation 15.7b says that the poten-
tial outcomes are independent of the actual treatment received.
The counterfactual model is heavily indebted to RCTs with causes being identified with
treatments. As noted earlier, in an RCT the experimenter has control over treatment
assignment and this has given rise to the view that, from the perspective of the potential
outcomes approach, ‘causes are only those things that could, in principle, be treatments
in experiments. The qualification “in principle” is important because practical, ethical,
and other considerations might make some experiments infeasible, that is, limit us to
contemplating hypothetical experiments’ (Holland 1986, p. 954). This claim and others
like it have proved troublesome to sociologists who often want to know the effect of
things like age or race or ethnicity or gender on particular outcomes. Useful, and less hard
line, discussions of what could count as a cause can be found in Woodward (2003, espe-
cially Chapter 3) and Morgan & Winship (2015, pp. 439–441). In fact, the effects of
ascribed characteristics, such as ethnicity, are often brought about through the differen-
tial treatment of people from different perceived ethnic backgrounds by other people. So
even though it is difficult to imagine how ethnicity might be a treatment to which subjects
could be assigned, it is less difficult to design a study to vary other people’s perceptions
of ethnicity. Audit studies where names are used to shape subjects’ beliefs about the ethnic
origin of job applicants, are an example.
In practice, the potential outcomes approach entails an assumption that one person’s
potential outcomes do not depend on the treatment received by anyone else. This is called
the stable unit treatment value assumption, SUTVA. It would be violated if, for example,
$Y^{D=1}$ for each person depended on how many people received D = 1. Vaccination
provides an example because the value to an individual of getting vaccinated declines as the
number of other people being vaccinated grows, implying that each person has not two
potential outcomes but a large number, depending on their own treatment status and on
how many others were treated. This is sometimes called the problem of interference
(Hudgens & Halloran 2008). One could write down the set of potential outcomes in a case
like this but it would present formidable obstacles to estimation. There are certainly some
situations of interest to sociologists and social scientists in which SUTVA is unlikely to
hold (see Morgan & Winship 2015, pp. 48–52 for a discussion). For example, the returns
to investment in learning a certain skill will, all else equal, depend on how many other
people possess that skill.

4. STRUCTURAL CAUSAL MODELS

Structural Causal Models (SCM) were developed by Pearl, combining ‘features of the
structural equation models (SEM) used in economics and social science (Goldberger,

197[2], Duncan, 1975), the potential-outcome framework of Neyman (1923) and Rubin
(1974), and the graphical models developed for probabilistic reasoning and causal analy-
sis (Pearl, 1988; Lauritzen, 1996, Spirtes, Glymour, and Scheines, 2000, Pearl 2000)’
(Pearl 2010, p. 5).
The bedrock of the SCM is a graphical representation of relationships between
variables.¹
The particular kind of graph used is called a DAG – a directed acyclic graph – in which
nodes or vertices represent variables and directed edges (arrows) from one node to
another indicate that the first is a cause of the second. In a DAG, reciprocal arrows are
not permitted, nor are cycles (A causes B causes C causes A, for example). Sometimes
undirected edges are used to show associations between variables when their causal rela-
tionship is not relevant. Unlike the structural equation diagrams with which sociologists
are familiar, independent error terms are not shown. What we would think of as correla-
tions between error terms in a structural equation model are represented in a DAG by
one or more variables that affect both of the variables whose errors are assumed corre-
lated. DAGs thus can include both observed and unobserved variables. Figure 15.1
shows a simple DAG for the running example of school type, X, affecting test grades, Y,
with A and B being confounders of the effect of X on Y that may or may not have been
measured. A could represent parental income, B a measure of the home learning environ-
ment. DAGs identify causal effects non-parametrically: that is, they do not suppose any
particular functional forms for the relationships between variables.
A DAG is a model of the data generating mechanism underlying the causal system
represented by the graph, which is to say the DAG should capture the underlying social
processes and how they came to be represented in the data. It encodes our assumptions
and beliefs about this data-generating mechanism based on theory and subject matter
knowledge (ideally established by prior research). Pearl has derived three criteria for
identifying causal effects using a DAG: backdoor and front door identification and
instrumental variables. Instrumental variables are discussed later and here I focus on the
backdoor criterion and, more briefly, the front door criterion.
Under the backdoor criterion a causal path from X to Y is identified if all backdoor
paths from X to Y are blocked. In a DAG, a path from nodes P to Q is defined as a set
of arrows, in any direction, that link P and Q. Backdoor paths from X to Y are all those
paths linking X and Y that start with an arrow going into X. In Figure 15.1 there are two

[Figure 15.1: A simple DAG, with nodes X, A, B and Y]

¹ The presentation of the SCM here is informal: other accessible and more comprehensive presentations
are numerous (Pearl 2010; Pearl et al. 2016; Elwert 2013). More rigorous expositions are plentiful, but see
especially Pearl (2009).

backdoor paths from X to Y: one from X to A to Y, the other from X to B to Y. These
backdoor paths represent associations between X and Y that are not causal and so to
identify the causal effect of X on Y we need to block both of them. Blocking a path
involves two things: conditioning on non-colliders and not conditioning on colliders. A
collider is a variable that has more than one arrow going into it: in Figure 15.2a R is a
collider on the path linking P and Q. A collider blocks a path on which it sits: so, accord-
ing to this DAG, P and Q are independent. But conditioning on a collider opens the path:
conditional on R, P and Q are no longer independent. Thus, a backdoor path is blocked
by a collider that has not been conditioned on. Figure 15.2b shows an example in which
the effect of X on Y would be identified even if we had not observed any of U, W or V
because W is a collider blocking the single backdoor path from X to Y. In the DAG of
Figure 15.1, the effect is not identified if we have not conditioned on both A and B.
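The behaviour of a collider can be checked numerically. The sketch below assumes a hypothetical linear version of Figure 15.2a in which P and Q are independent standard normal variables and R is their sum plus noise; these functional forms, and the choice to condition by selecting cases with R above 1, are assumptions made for illustration, since the DAG itself specifies no functional form.

```python
import random
from statistics import fmean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(2)
n = 50_000
p = [rng.gauss(0, 1) for _ in range(n)]
q = [rng.gauss(0, 1) for _ in range(n)]                      # generated independently of p
r = [pi + qi + rng.gauss(0, 0.5) for pi, qi in zip(p, q)]    # collider: P -> R <- Q

# Unconditionally, the collider blocks the path and P and Q are uncorrelated.
marginal = pearson(p, q)

# Conditioning on the collider (here: keeping only cases with high R) opens the path.
selected = [(pi, qi) for pi, qi, ri in zip(p, q, r) if ri > 1.0]
conditional = pearson([pi for pi, _ in selected], [qi for _, qi in selected])

print(f"corr(P, Q) = {marginal:.3f}")
print(f"corr(P, Q | R > 1) = {conditional:.3f}")
```

In this set-up the marginal correlation is close to zero while the correlation among the selected cases is clearly negative: among cases with a high R, a low P must be offset by a high Q.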
The idea of the backdoor criterion, then, is to identify that set of variables that must
be controlled, or conditioned on, in order to identify the causal effect. There may be more
than one set: in the DAG in Figure 15.3 the effect of X on Y is identified by conditioning
on A or B or both. If all backdoor paths are indeed blocked, then Equations 15.7a and
15.7b hold, conditional on the DAG being true.
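Backdoor adjustment can be carried out without any functional-form assumption when the confounders are discrete. The following sketch assumes a hypothetical version of the DAG in Figure 15.1 in which A and B are binary, both are observed, and the effect of X on Y is a constant 3; all probabilities and coefficients are invented. It compares the naive contrast with an estimate that conditions on A and B by stratifying on their joint values.

```python
import random
from collections import defaultdict
from statistics import fmean

rng = random.Random(3)
TRUE_EFFECT = 3.0   # assumed causal effect of X on Y

rows = []
for _ in range(200_000):
    a = int(rng.random() < 0.5)                       # e.g. high parental income
    b = int(rng.random() < 0.5)                       # e.g. supportive home environment
    x = int(rng.random() < 0.2 + 0.3 * a + 0.3 * b)   # A and B both raise P(X = 1)
    y = TRUE_EFFECT * x + 4 * a + 2 * b + rng.gauss(0, 1)
    rows.append((a, b, x, y))

naive = (fmean(y for a, b, x, y in rows if x == 1)
         - fmean(y for a, b, x, y in rows if x == 0))

# Group outcomes by confounder stratum (a, b) and treatment status.
cells = defaultdict(lambda: {0: [], 1: []})
for a, b, x, y in rows:
    cells[(a, b)][x].append(y)

# Within-stratum contrasts, averaged with weights equal to each stratum's population share.
adjusted = sum(
    (fmean(group[1]) - fmean(group[0])) * (len(group[0]) + len(group[1])) / len(rows)
    for group in cells.values()
)

print(f"naive = {naive:.2f}, adjusted for A and B = {adjusted:.2f}")
```

The stratified estimate recovers the assumed effect, whereas the naive contrast is inflated by the two open backdoor paths. If either A or B were unobserved in this DAG, stratifying on the remaining variable alone would leave one path open.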
Although the backdoor criterion is by far the most widely used method for identifying
causal effects in DAGs, a second method, the front door criterion, has some attractive
properties, not least because it is based on specifying the mechanisms by which a cause
has its effect. Suppose we believe that private schools get better exam results because they
spend more money per pupil. We might add, to the DAG in Figure 15.1, a node, S, rep-
resenting the annual amount spent on a pupil by his or her school. This would lie on the
path from X to Y, as shown in Figure 15.4. If we had not measured one of the confound-
ing variables, A and B, we could not use the backdoor criterion but we could use the front
door criterion. This tells us that the effect of X on Y is equal to the effect of X on S

[Figure 15.2: (a) A collider, with nodes P, R and Q; (b) Unconditional identification, with nodes U, W, V, X and Y]

[Figure 15.3: Conditioning sets, with nodes A, B, X and Y]

[Figure 15.4: Front door criterion, with nodes X, S, A, B and Y]

multiplied by the effect of S on Y. The effect of X is identified here because there are no
unblocked backdoor paths from X to S, every backdoor path from S to Y passes through X
and so is blocked by conditioning on X, and the path X to S to Y is the only mechanism
through which X affects Y. These assumptions are unlikely to be true and in a real
application we would need a more elaborated DAG to take into account the other mechanisms
linking school type and exam performance and to address the likely backdoor paths from
X to these mediating mechanisms and from them to Y.
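The front door logic can be sketched with simulated data. The example below assumes a hypothetical process in which a single unobserved binary confounder U affects both X and Y, X affects Y only through a binary mediator S, and the effect of S on Y is a constant 5; every name and number is an assumption made for illustration. The effect of X on S is estimated by a simple contrast, and the effect of S on Y is estimated within levels of X and then averaged over the distribution of X.

```python
import random
from statistics import fmean

rng = random.Random(6)

rows = []
for _ in range(200_000):
    u = int(rng.random() < 0.5)               # unobserved confounder, never used in estimation
    x = int(rng.random() < 0.2 + 0.6 * u)     # treatment depends on U
    s = int(rng.random() < 0.1 + 0.7 * x)     # mediator depends only on X
    y = 5.0 * s + 4.0 * u + rng.gauss(0, 1)   # outcome depends on S and U, not directly on X
    rows.append((x, s, y))

def mean_y(x_val, s_val):
    """Average outcome in the cell X = x_val, S = s_val."""
    return fmean(y for x, s, y in rows if x == x_val and s == s_val)

def share_s(x_val):
    """Proportion with S = 1 among units with X = x_val."""
    return fmean(s for x, s, _ in rows if x == x_val)

p_x1 = fmean(x for x, _, _ in rows)

# Step 1: effect of X on S (no backdoor path needs blocking).
effect_x_on_s = share_s(1) - share_s(0)

# Step 2: effect of S on Y, conditioning on X to block S <- X <- U -> Y.
effect_s_on_y = sum(
    (mean_y(x_val, 1) - mean_y(x_val, 0)) * weight
    for x_val, weight in ((0, 1 - p_x1), (1, p_x1))
)

front_door = effect_x_on_s * effect_s_on_y
naive = (fmean(y for x, _, y in rows if x == 1)
         - fmean(y for x, _, y in rows if x == 0))

print(f"naive = {naive:.2f}, front door = {front_door:.2f}")
```

Under these assumptions the true effect of X on Y is 0.7 × 5 = 3.5, and the front door estimate is close to it even though U is never observed, while the naive contrast is biased upwards.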
Although DAGs encode theoretical knowledge, theories in the social sciences are
seldom if ever definitive. However, it is possible to test some of the assumptions that
underlie a DAG. For example, if, in Figure 15.3, our causal estimate differed depending
on whether we conditioned on A, B or both, this would suggest that the DAG was incor-
rect. More direct tests can be made of the independencies, unconditional and conditional,
contained in a DAG. For example, the DAG in Figure 15.2a implies that P ⊥ Q and this
can be tested given data on the variables represented by P and Q in the DAG. More
complex DAGs will give rise to more testable independencies.
DAGs are widely used in epidemiology and they are especially valuable for identifica-
tion in cases with time-varying treatments and time-varying confounders. Figure 15.5
shows a simple example. Here, a variable, Z, which is a confounder of treatment, D, at
time t is also a consequence of treatment at t–1 and may also be a collider on the path
from treatment at t–1 to the final outcome, Y, because of the variable V (in a circle to
denote that it is unmeasured). In this case there are several causal quantities we might be
interested in, such as the effect of treatment at t on Y, the effect of treatment at t–1, and
the effect of treatment at both t–1 and t. DAGs allow one to determine if and how causal
effects can be identified in such complex settings. Sharkey & Elwert (2011) use a DAG
similar to Figure 15.5 to analyze a sociological example. They explore the effects, on
cognitive ability, of neighborhood poverty during childhood in consecutive generations,

[Figure 15.5: Time varying confounding, with nodes Z, V, D(t), D(t–1) and Y]

so, in our Figure 15.5, D(t–1) would be the poverty rate in the neighborhood where parents
lived at age 15–17 and D(t) the poverty rate in the child’s neighborhood. Sociologists have
also used DAGs to point out the weaknesses in various aspects of sociological practice:
Elwert & Winship (2014) illustrate and explain hitherto unrecognized problems with col-
lider variables that render common research practices problematic. Breen (2018) draws
on DAGs to demonstrate some of the pitfalls in studying multi-generational mobility.
Morgan (2012) is an early use of a DAG in sociology to point to problems with the
primary and secondary effects distinction in studies of educational choice.

5. ESTIMATION OF CAUSAL EFFECTS

We have so far been concerned with identification: that is, the question of whether and
to what extent a particular association can be considered causal. Estimation is about
deriving the causal effect from the data so as to obtain an estimate that is, ideally, unbi-
ased or consistent, and also efficient. In recent years, many books and papers have pre-
sented methods for estimating causal effects using observational data (for example,
Morgan & Winship 2015; Angrist & Pischke 2009; Gangl 2010). I discuss them only in
rather general terms.
There are two broad approaches. The first is to make Equation 15.7b true condition-
ally: that is,

$$Y^{D=d} \perp D \mid X \tag{15.8}$$

Now we say that the potential outcomes are independent of treatment, D, conditional on
some variables, X. Most simply this is the idea, familiar to most social scientists, of con-
trolling for or conditioning on the factors that cause the bias – parental income in the
example we began with (though, in a real application, many other things as well). In
Pearl’s terms, we must block all the backdoor paths that link the treatment (denoted X in the DAGs above, D here) to the observed Y. These
approaches are often termed ‘matching methods’.
There are by now very many such methods, including regression (see Angrist & Pischke
2009, pp. 73–77 for why regression is a matching method). They all follow the same logic:
we seek to match treated with untreated cases on the basis of similarity in their values of
the variables that predict treatment status. DAGs are particularly helpful for matching
methods because they show which variables should be conditioned or matched on and
which (such as colliders or consequences of treatment) should not. Propensity score
matching (Rosenbaum & Rubin 1983) is one of the most popular matching methods. The
propensity score is the estimated probability of receiving treatment, given a set of predic-
tors of treatment. Treated and untreated cases are matched on the basis of closeness in
their propensity scores² and an average causal effect is calculated as an average of the
observed average differences in the outcome between treated and untreated matched
cases. Matching itself can be done in many ways but stratifying by the propensity score
is popular.
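Stratification on the propensity score can be sketched in a few lines. The example below is illustrative and rests on several assumptions that are not part of the chapter: a single continuous confounder `x`, a logistic model for treatment fitted by a hand-written Newton-Raphson routine (`fit_logistic` is a hypothetical helper, not a library function), a constant treatment effect of 2, and 20 equal-sized strata.

```python
import math
import random
from statistics import fmean

def fit_logistic(xs, ds, iterations=25):
    """Newton-Raphson fit of logit P(D = 1 | x) = b0 + b1 * x for one predictor."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, d in zip(xs, ds):
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            w = p * (1 - p)
            g0 += d - p                 # gradient of the log-likelihood
            g1 += (d - p) * x
            h00 += w                    # elements of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

rng = random.Random(4)
n = 20_000
xs = [rng.gauss(0, 1) for _ in range(n)]                           # confounder
ds = [int(rng.random() < 1 / (1 + math.exp(-0.8 * x))) for x in xs]
ys = [2.0 * d + 3.0 * x + rng.gauss(0, 1) for x, d in zip(xs, ds)]  # assumed true effect: 2

b0, b1 = fit_logistic(xs, ds)
scores = [1 / (1 + math.exp(-(b0 + b1 * x))) for x in xs]          # estimated propensity scores

# Sort cases by propensity score and cut them into equal-sized strata.
order = sorted(range(n), key=lambda i: scores[i])
n_strata = 20
size = n // n_strata
effects = []
for k in range(n_strata):
    stratum = order[k * size:(k + 1) * size]
    treated = [ys[i] for i in stratum if ds[i] == 1]
    control = [ys[i] for i in stratum if ds[i] == 0]
    effects.append(fmean(treated) - fmean(control))

stratified = fmean(effects)   # strata are equal-sized, so a simple average suffices
naive = (fmean(y for y, d in zip(ys, ds) if d == 1)
         - fmean(y for y, d in zip(ys, ds) if d == 0))

print(f"naive = {naive:.2f}, stratified on propensity score = {stratified:.2f}")
```

Some residual bias remains within strata, especially the extreme ones, because cases in the same stratum still differ slightly in `x`; using more strata or a finer matching rule reduces it at the cost of noisier within-stratum contrasts.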

² Matching on the propensity score is equivalent to matching on all the confounders, X, in the sense that, if
Equation (15.8) holds, it also holds that $Y^{D=d} \perp D \mid p(X)$ where p(X) is the true propensity score.

Another use for the propensity score is to employ it to weight the data in such a way
that treatment is made independent of the predictors used to form the propensity score.
Since these predictors are confounders, this makes the potential outcomes independent
of treatment status in the reweighted data. The method is called inverse probability of
treatment weighting. Marginal Structural Models are regressions using the weighted data
(Robins 1999; Hernán et al. 2001). Sociological examples include Sharkey & Elwert
(2011), Wodtke et al. (2011), Lawrence & Breen (2016), and Breen & Ermisch (2017).
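Inverse probability of treatment weighting can be illustrated with a single binary confounder, for which the propensity score is simply the share treated at each level of the confounder. The sketch below uses an invented data-generating process with an assumed treatment effect of 1.5; with a binary treatment and no covariates, the weighted difference in means shown here is what a weighted regression of the outcome on treatment would return.

```python
import random
from statistics import fmean

rng = random.Random(5)

rows = []
for _ in range(100_000):
    c = int(rng.random() < 0.4)                      # binary confounder
    d = int(rng.random() < (0.7 if c else 0.2))      # treatment is more likely when c = 1
    y = 1.5 * d + 2.0 * c + rng.gauss(0, 1)          # assumed true effect: 1.5
    rows.append((c, d, y))

# Estimated propensity score: proportion treated at each level of the confounder.
propensity = {
    level: fmean(d for c, d, _ in rows if c == level) for level in (0, 1)
}

def weight(c, d):
    """Inverse of the estimated probability of the treatment actually received."""
    p = propensity[c]
    return 1 / p if d == 1 else 1 / (1 - p)

def weighted_mean_y(group):
    """Weighted mean outcome among units with D = group."""
    numerator = sum(weight(c, d) * y for c, d, y in rows if d == group)
    denominator = sum(weight(c, d) for c, d, _ in rows if d == group)
    return numerator / denominator

def weighted_share_c(group):
    """Weighted proportion with c = 1 among units with D = group (balance check)."""
    numerator = sum(weight(c, d) * c for c, d, _ in rows if d == group)
    denominator = sum(weight(c, d) for c, d, _ in rows if d == group)
    return numerator / denominator

iptw = weighted_mean_y(1) - weighted_mean_y(0)
naive = (fmean(y for _, d, y in rows if d == 1)
         - fmean(y for _, d, y in rows if d == 0))

print(f"naive = {naive:.2f}, IPTW = {iptw:.2f}")
print(f"weighted share with c = 1: treated {weighted_share_c(1):.3f}, "
      f"untreated {weighted_share_c(0):.3f}")
```

After weighting, the confounder has the same distribution among the treated and the untreated, which is the sense in which treatment is made independent of the predictors used to form the propensity score.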
Matching methods assume that the variables used for matching are sufficient to make
Equation 15.8 true. Skepticism about matching methods arises from the possibility that,
even if we condition or match on a large number of relevant variables, there may be some
that we have omitted, leading to bias in our estimates. Sensitivity analyses have been
developed as a way to address this objection. The idea here is to ask how much unob-
served confounding there would have to be in order to get rid of an estimated causal
effect – where ‘get rid of’ might mean reducing the estimate to zero or making it no longer
statistically significant. An early example of this technique is Cornfield et al. (1959, dis-
cussed in Ding & VanderWeele 2016) who demonstrated that in order for the estimated
association of smoking with lung cancer to be explained away by an unmeasured con-
founder (such as a genetic predisposition to both smoking and lung cancer) the associa-
tions between smoking and the unobserved confounder and between the unobserved
confounder and lung cancer would have to be implausibly large. This gave considerable
support to the causal interpretation of the link between smoking and lung cancer. In
general, such analyses are used to show how the magnitude of the estimated causal effect
will be attenuated by increasing the degree of hypothesized confounding. Among the best
known of these tests are the Mantel-Haenszel bounds, often used with propensity scores
(Becker & Caliendo 2007). Other sensitivity tests have been used in sociological studies
by, among others, Sharkey & Elwert (2011) and Lawrence & Breen (2016). A feature of
these is that they calibrate the hypothetical unmeasured confounding relative to the meas-
ured confounding removed by matching. Sensitivity analyses will be most helpful when
the amount of confounding required to remove the causal effect is clearly large or
clearly small.
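The logic of a sensitivity analysis can be shown with a deliberately simple bias formula. This sketch does not implement the Mantel-Haenszel bounds or any of the cited studies' methods; it assumes a linear outcome model with a hypothetical binary unobserved confounder U, where `gamma` is the effect of U on the outcome and `delta` is the difference in the prevalence of U between treated and untreated cases. Under those assumptions the bias in a difference in means is `gamma * delta`. The estimate of 1.5 and the grid values are invented.

```python
def adjusted_effect(estimate, gamma, delta):
    """Estimate after removing the bias gamma * delta due to a hypothetical confounder U."""
    return estimate - gamma * delta

def tipping_gamma(estimate, delta):
    """Effect of U on the outcome that would reduce the estimate to exactly zero."""
    return estimate / delta

estimate = 1.5  # hypothetical estimated causal effect

for delta in (0.1, 0.3, 0.5):
    for gamma in (0.5, 1.0, 2.0):
        remaining = adjusted_effect(estimate, gamma, delta)
        print(f"delta = {delta}, gamma = {gamma}: adjusted effect = {remaining:.2f}")
    print(f"delta = {delta}: estimate vanishes if gamma reaches {tipping_gamma(estimate, delta):.1f}")
```

The tipping values are then compared with what is substantively plausible, for instance with the estimated effects and imbalances of the measured confounders: if no credible unobserved variable could have an effect as large as the tipping value, the causal interpretation gains support.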

6. NATURAL EXPERIMENTS

The second broad approach to estimating causal effects searches for situations or circum-
stances that approximate an RCT. In rare cases an observational study uses a treatment
that is random. Fallesen & Breen (2016), for example, estimate the effect on divorce of
having a child who suffers from colic, for which there are no known predictors. More
usually, however, social scientists have to look for cases for which the potential outcomes
and the treatment are independent or conditionally independent; in other words, subsets
of the data for which assignment to treatment is random. These approaches exploit what
are sometimes called natural experiments. The underlying idea is that naturally occurring
events, such as weather, lightning strikes, earthquakes, a person’s genetic makeup, the
birth of twins, and so on, or human-made features, such as the location of boundaries,
rules about class sizes, quotas, the use of lotteries or the use of cut-offs for deciding what
happens to people, may cause some people to be treated and others not, in an essentially

Richard Breen - 9781789909432


Downloaded from [Link] at 10/30/2023 [Link]PM
via Open Access. This work is licensed under the Creative Commons
Attribution-NonCommercial-No Derivatives 4.0 License
[Link]
282  Handbook of sociological science

random fashion. For example, Torche (2011) studied the effect of maternal stress during
pregnancy on children’s birthweight using an earthquake as a natural experiment.
Earthquakes induce stress, so by comparing the birthweight of children born to women
who experienced an earthquake during pregnancy with the birthweight of children of
women who did not, and assuming that the impact of the earthquake on birthweight was
wholly due to the impact of the earthquake on maternal stress, she could deduce the
causal effect of maternal stress on birthweight. Kirk (2009) investigated the possible
causal effect of released prisoners returning to their old neighborhoods on the likelihood
that they would be re-incarcerated. He used Hurricane Katrina as a source of naturally
occurring variation in whether released prisoners would return to their old neighbor-
hoods or not. In this case the hurricane acted as an instrumental variable (IV). An instru-
mental variable is one which is related to the outcome Y (re-incarceration in this case)
only through the cause, X (returning to the old neighborhood); thus the IV is related to
X, X is related to Y, but there are no other paths from the IV to Y (or none that cannot
be blocked by introducing other variables, W; in Kirk’s study, no association between
Katrina and re-incarceration other than through neighborhood). The assumption that
this is the case is called an exclusion restriction.
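The instrumental variable logic just described can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a reconstruction of Kirk's (2009) analysis: the data generating process, the coefficients, and the function names are all hypothetical, the causal effect is assumed to be the same for everyone, and the exclusion restriction holds by construction because Z enters Y only through X.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def simulate_iv(n=50_000, true_effect=2.0, seed=0):
    """Return (naive regression slope, IV estimate) for simulated data."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                      # unmeasured confounder of X and Y
        zi = 1.0 if rng.random() < 0.5 else 0.0  # instrument, as good as randomly assigned
        xi = 0.8 * zi + u + rng.gauss(0, 1)      # cause depends on instrument and confounder
        yi = true_effect * xi + 1.5 * u + rng.gauss(0, 1)  # no direct path from Z to Y
        z.append(zi)
        x.append(xi)
        y.append(yi)
    naive = cov(x, y) / cov(x, x)  # regression of Y on X, biased by U
    iv = cov(z, y) / cov(z, x)     # ratio of the Z-Y and Z-X associations
    return naive, iv


naive, iv = simulate_iv()
print(f"naive slope: {naive:.2f}, IV estimate: {iv:.2f}, true effect: 2.00")
```

The naive slope absorbs the confounding through U, whereas the IV ratio recovers the effect because Z is related to Y only through X.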
Regression discontinuity designs (Lee & Lemieux 2010) exploit cut-offs. If those who
score above a threshold are admitted to a particular kind of school while those who fall
below are admitted to a school of a different kind, we can assess the effect of school type
on some outcome of interest by comparing those who just exceeded the threshold with
those who just fell short. The intuition here is that these two groups are, to all intents and
purposes, the same, differing only in their treatment status (see Clark & del Bono 2016
for an example).
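The comparison of units just above and just below a threshold can be sketched as follows. The simulation is purely illustrative and is not drawn from Clark & del Bono (2016): it assumes a sharp cut-off, an outcome that is linear in the score, and a hypothetical jump of 3 at the threshold; the bandwidth and all function names are likewise assumptions.

```python
import random


def ols_line(xs, ys):
    """Intercept and slope of a simple least-squares line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope


def simulate_rd(n=40_000, cutoff=0.0, jump=3.0, bandwidth=0.5, seed=1):
    """Estimate the jump in the outcome at the cut-off from units close to it."""
    rng = random.Random(seed)
    left, right = ([], []), ([], [])
    for _ in range(n):
        score = rng.uniform(-2, 2)
        treated = score >= cutoff  # sharp design: the score alone decides treatment
        outcome = 1.0 + 2.0 * score + (jump if treated else 0.0) + rng.gauss(0, 1)
        if abs(score - cutoff) <= bandwidth:  # keep only units near the threshold
            side = right if treated else left
            side[0].append(score - cutoff)    # centre the score at the cut-off
            side[1].append(outcome)
    intercept_left, _ = ols_line(*left)
    intercept_right, _ = ols_line(*right)
    return intercept_right - intercept_left   # difference in fitted values at the cut-off


print(f"estimated effect at the threshold: {simulate_rd():.2f} (simulated jump: 3.00)")
```

Fitting a separate line on each side, rather than comparing raw means, keeps the slope of the outcome in the score from being mistaken for a treatment effect.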
The use of natural experiments to identify causal effects has proved attractive to
economists and increasingly to other social scientists. The attraction of methods such as
IV and regression discontinuity is that we apparently no longer need to worry about any
uncontrolled confounders. Their shortcoming is that they estimate effects only for the
group in which the potential outcomes are independent of treatment – in other words,
only those whose behavior is changed in the natural experiment. For example, studies
using the raising of the minimum school leaving age as an instrument to estimate
the effect of years of education on an outcome are informative only about the effects at
the bottom of the educational distribution – those students whose years of education
were increased by the reform. Heterogeneous treatment effects are likely the norm in
social science applications and so the average causal effect for people whose behavior is
changed by the instrument (the ‘local average treatment effect’ or LATE (Imbens &
Angrist 1994)) is unlikely to be the same as the average across the population or among
the treated.
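A short simulation may help to show why the LATE need not equal the average effect in the population or among the treated. Everything in it is hypothetical: the three response types, their population shares, and their individual causal effects are assumptions chosen for illustration, and the instrument is a randomly assigned binary variable that satisfies the exclusion restriction by construction.

```python
import random


def mean(values):
    return sum(values) / len(values)


def simulate_late(n=100_000, seed=2):
    """Return (IV/Wald estimate, population average effect, average effect on the treated)."""
    rng = random.Random(seed)
    # Hypothetical population: (response type, share, individual causal effect).
    types = [("complier", 0.4, 1.0), ("always-taker", 0.3, 3.0), ("never-taker", 0.3, 0.5)]
    shares = [t[1] for t in types]
    z, d, y, effects = [], [], [], []
    for _unit in range(n):
        kind, _share, effect = rng.choices(types, weights=shares)[0]
        zi = rng.random() < 0.5  # randomly assigned binary instrument
        di = 1 if kind == "always-taker" or (kind == "complier" and zi) else 0
        z.append(zi)
        d.append(di)
        y.append(rng.gauss(0, 1) + effect * di)  # Z affects Y only through D
        effects.append(effect)
    y1, y0 = mean([yi for zi, yi in zip(z, y) if zi]), mean([yi for zi, yi in zip(z, y) if not zi])
    d1, d0 = mean([di for zi, di in zip(z, d) if zi]), mean([di for zi, di in zip(z, d) if not zi])
    wald = (y1 - y0) / (d1 - d0)
    ate = mean(effects)
    att = mean([e for e, di in zip(effects, d) if di == 1])
    return wald, ate, att


wald, ate, att = simulate_late()
print(f"IV (Wald) estimate: {wald:.2f}, ATE: {ate:.2f}, ATT: {att:.2f}")
```

Under these assumptions the IV estimate settles near the compliers' effect of 1.0, because only compliers change treatment status when the instrument changes, while the population average (about 1.45) and the average among the treated (about 2.2) are both larger.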
There has been a robust debate between proponents and critics of the LATE (for
example, Angrist & Imbens 1999; Deaton 2010; Heckman 1997, 1999; Heckman et al.
2006; Heckman & Urzua 2010; Imbens 2010). Central to these debates has been the
critics’ claim that the LATE estimand (and the LATE interpretation of the conventional
IV estimator when treatment effects are heterogeneous) is often not helpful because at
best it identifies an ‘effect’ that does not correspond to a parameter of a theoretical model
or a quantity of relevance to policy. Morgan & Winship (2015, pp. 304–324) summarize
many of the criticisms of LATE.

The assumption in Torche’s (2011) study that her instrument (the earthquake) affected
the outcome only via the treatment (maternal stress) is an example of an exclusion restric-
tion. Just as matching methods require the assumption that all backdoor paths have been
blocked, so IV requires the assumption that the instrument has no effect on the outcome
except through the cause. This is not testable when the model is just-identified (when we
have as many IVs as causal effects we want to estimate – in the usual case, one IV for one
X) or when the model is over-identified but causal effects are assumed heterogeneous.3
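The point about over-identified models can be illustrated with a hypothetical simulation in which two instruments are both valid by construction yet yield different estimates, because each shifts the treatment of a different group. The groups, their effects, and the function names are assumptions made for the sake of the example.

```python
import random


def mean(values):
    return sum(values) / len(values)


def wald(z, d, y):
    """Wald (IV) estimate for a binary instrument z, treatment d, and outcome y."""
    y1, y0 = mean([yi for zi, yi in zip(z, y) if zi == 1]), mean([yi for zi, yi in zip(z, y) if zi == 0])
    d1, d0 = mean([di for zi, di in zip(z, d) if zi == 1]), mean([di for zi, di in zip(z, d) if zi == 0])
    return (y1 - y0) / (d1 - d0)


def simulate_two_instruments(n=100_000, seed=3):
    rng = random.Random(seed)
    z1, z2, d, y = [], [], [], []
    for _ in range(n):
        in_group_a = rng.random() < 0.5      # hypothetical group A responds only to Z1
        a, b = rng.randint(0, 1), rng.randint(0, 1)
        di = a if in_group_a else b          # group B responds only to Z2
        effect = 1.0 if in_group_a else 3.0  # heterogeneous causal effects
        z1.append(a)
        z2.append(b)
        d.append(di)
        y.append(rng.gauss(0, 1) + effect * di)  # neither instrument affects Y directly
    return wald(z1, d, y), wald(z2, d, y)


est1, est2 = simulate_two_instruments()
print(f"estimate using Z1: {est1:.2f}, estimate using Z2: {est2:.2f}")
```

Here the two estimates disagree (roughly 1 and 3) even though neither instrument violates the exclusion restriction, so disagreement between instruments cannot by itself be read as evidence of invalidity once effects are allowed to vary.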

7. CONCLUDING REMARKS

7.1 Heterogeneous Effects

The vast majority of causal relationships studied in the social sciences are likely to be
heterogeneous – that is, people, or other units, respond differently to the same causal
stimulus. Some, but not all, of this heterogeneity is captured in the distinction between
the ATT and ATU. Causal effects might vary according to measured characteristics of
individuals and according to unmeasured characteristics, as in models of self-selection in
economics (the Roy 1951 model is a well-known example). Sociologists have made some
contributions here (Xie et al. 2012; Breen et al. 2015; Zhou & Xie 2019) but the identifica-
tion of heterogeneous effects is a difficult problem.
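The way self-selection on unmeasured gains separates the ATT from the ATU can be seen in a deliberately stripped-down simulation. It is not the Roy (1951) model itself: it simply assumes that each unit's causal effect is drawn from a normal distribution with mean 1 and that a unit takes the treatment only if its own gain exceeds a fixed, hypothetical cost of 0.5.

```python
import random


def mean(values):
    return sum(values) / len(values)


def simulate_selection_on_gains(n=100_000, cost=0.5, seed=4):
    """Return (ATT, ATE, ATU) when units choose treatment because of their own gain."""
    rng = random.Random(seed)
    gains_treated, gains_untreated = [], []
    for _ in range(n):
        gain = rng.gauss(1.0, 1.0)  # individual causal effect, unobserved by the analyst
        if gain > cost:             # treatment is chosen only when the gain exceeds the cost
            gains_treated.append(gain)
        else:
            gains_untreated.append(gain)
    att = mean(gains_treated)
    atu = mean(gains_untreated)
    ate = mean(gains_treated + gains_untreated)
    return att, ate, atu


att, ate, atu = simulate_selection_on_gains()
print(f"ATT: {att:.2f}, ATE: {ate:.2f}, ATU: {atu:.2f}")
```

Because those who gain most are the ones who select into treatment, the ATT exceeds the ATE, which in turn exceeds the ATU, and no single average summarizes the effect for everyone.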

7.2 External Validity

Heterogeneity is usually thought of in terms of variation across individuals but it can also
be thought of more widely to include variation within individuals over time or over other
contexts. This leads to the question of external validity: that is, establishing whether ‘a
causal relationship holds over variations in persons, setting, treatments and outcomes’
(Shadish et al. 2002, p. 21). Many writers have addressed this topic, though more com-
monly in relation to experimental estimates than to those from observational analyses,
and terminology proliferates, but the underlying issue is that causal estimates will often
depend, to a degree that varies from case to case, on a wider context not specified in the
causal model (Cartwright 2010). We might think of causal claims as lying on a continuum
from causal relationships that are invariant and for which contextual variation plays little
or no role (as in some of the natural sciences), through to social causes that may be very
susceptible to contextual variation. Sociologists seem to have paid little attention to these
issues; this may be unproblematic if they intend their causal estimates to be of only his-
torical (or other context-specific) interest, but it certainly is problematic if they want to
go beyond this, to make policy recommendations, for example, or to make more general
claims. Sociologists are familiar with the idea that the application of their theories is given
by scope conditions: the same is also true of causal estimates. The challenge is to establish
those scope conditions with the same seriousness that is used to try to secure the internal

3
As Angrist & Pischke (2009, p. 166) put it: ‘overidentification tests . . . where multiple instruments are
validated according to whether or not they estimate the same thing, is out the window in a fully heterogeneous
world’.

validity of causal estimates. Pearl & Bareinboim (2018, p. 3) discuss the use of DAGs for
specifying ‘formal and transparent conditions under which the transport of results across
differing environments or populations is licensed from first principles’.

7.3 Causal Estimates from Observational Data

Experiments solve the problem of identifying causal effects through their design.
Approaches based on natural experiments also attempt to solve the identification
problem in this way,4 but to do so they have to rely on untestable assumptions of a kind
not required by a true RCT. DAGs can be used for identification if they accurately
capture the underlying data generating mechanism. Ideally, DAGs are derived from
substantive subject matter knowledge, which, in turn, means established scientific find-
ings and well-founded theory. In many areas of science, large bodies of such findings exist
and theories are invariant across space and time or their scope conditions are known. This
is less true of the social sciences. As Efron (2020, pp. 644–645) puts it:

Science, historically, has been the search for the underlying truths that govern our universe,
truths that are supposed to be eternal, like Newton’s laws. The eternal part is clear enough in
physics and astronomy, . . ., [b]ut modern science has moved on to fields where truth may be
more contingent, such as economics, sociology, and ecology.

Similarly, analyses using natural experiments and related methods will yield valid causal
estimates if the exclusion restriction holds and we have an understanding of to whom the
causal estimate applies. But this will not always be the case. Just as true experiments are
not always possible in sociology, so valid instrumental variables or other natural sources
of variation may not always be available or, if they are, may yield estimates that are dif-
ficult to interpret.
Causal claims derived from observational data, rather than being cleanly and defini-
tively proved, are most likely to be credible if they are established through evidence
gathered and analyzed in multiple ways: as Jackson & Cox (2013, p. 44) put it, ‘progress
typically depends on the synthesis of conclusions from various sources’. Freedman (2003,
p. 19) suggested a number of ways that a multi-phasic approach to establishing causality
from observational data can be pursued:

Causal inferences can be drawn from non-experimental data. However, no mechanical rules can
be laid down for the activity. Since Hume, that is almost a truism. Instead, causal inference
seems to require an enormous investment of skill, intelligence, and hard work. Many convergent
lines of evidence must be developed. Natural variation needs to be identified and exploited. Data
must be collected. Confounders need to be considered. Alternative explanations have to be
exhaustively tested. Before anything else, the right question needs to be framed.5

4
‘[E]mpirical researchers in economics have increasingly looked to the ideal of a randomized experiment
to justify causal inference. . . . researchers seek real experiments where feasible, and useful natural experiments
if real experiments seem (at least for a time) infeasible. . . . These studies can be said to be design based in that
they give the research design underlying any sort of study the attention it would command in a real experi-
ment’. (Angrist & Pischke 2010, p. 12).
5
Freedman (2010) presents informative and entertaining historical examples of how causal claims in epi-
demiology were substantiated in this manner.

The approaches discussed in this paper can play important roles in this process. To
Freedman’s list should also be added replication, to support or cast doubt on existing
causal claims. Unfortunately, replication is widely neglected in sociology (see the chapter
by Auspurg & Brüderl on reproducibility and credibility): this is a barrier to the disci-
pline’s ability to establish well-founded relationships of cause and effect.

REFERENCES

Angrist, J.D. and G.W. Imbens (1999), ‘Comment on James J. Heckman, “Instrumental variables: A
study of implicit behavioural assumptions in making program evaluations”’, Journal of Human Resources,
34, 824–827.
Angrist, J. and J.-S. Pischke (2009), Mostly Harmless Econometrics, Princeton, NJ: Princeton University
Press.
Angrist, J.D. and J.-S. Pischke (2010), ‘The credibility revolution in empirical economics: How better research
design is taking the con out of econometrics’, Journal of Economic Perspectives, 24, 3–30.
Becker, S.O. and M. Caliendo (2007), ‘Sensitivity analysis for average treatment effects’, The Stata Journal,
7, 71–83.
Breen, R. (2018), ‘Some methodological problems in the study of multigenerational mobility’, European
Sociological Review, 34, 603–611.
Breen, R., S. Choi, and A. Holm (2015), ‘Heterogeneous causal effects and sample selection bias’, Sociological
Science, 2, 351–369.
Breen, R. and J. Ermisch (2017), ‘Educational reproduction in Great Britain: A prospective approach’,
European Sociological Review, 33, 590–603.
Cartwright, N. (2010), ‘What are randomized control trials good for?’, Philosophical Studies, 147, 59–70.
Clark, D. and E. del Bono (2016), ‘The long-run effects of attending an elite school: Evidence from the United
Kingdom’, American Economic Journal: Applied Economics, 8, 150–176.
Cornfield, J., W. Haenszel, E.C. Hammond, A.M. Lilienfeld, M.B. Shimkin, and E.L. Wynder (1959),
‘Smoking and lung cancer: Recent evidence and a discussion of some questions’, Journal of the National
Cancer Institute, 22, 173–203.
Deaton, A.S. (2010), ‘Instruments, randomization, and learning about development’, Journal of Economic
Literature, 48, 424–455.
Ding, P. and T.J. VanderWeele (2016), ‘Sensitivity analysis without assumptions’, Epidemiology, 27, 368–377.
Duncan, O.D. (1975), Introduction to Structural Equation Models, New York, NY: Academic Press.
Efron, B. (2020), ‘Prediction, estimation, and attribution’, Journal of the American Statistical Association,
115, 636–655.
Elwert, F. (2013), ‘Graphical causal models’, in S.L. Morgan (ed.), Handbook of Causal Analysis for Social
Research, chapter 13, Dordrecht: Springer, pp. 245–274.
Elwert, F. and C. Winship (2014), ‘Endogenous selection bias: The problem of conditioning on a collider’,
Annual Review of Sociology, 40, 31–50.
Fallesen, P. and R. Breen (2016), ‘Temporary life changes and the timing of divorce’, Demography,
53, 1377–1398.
Freedman, D.A. (2003), ‘Structural equation models: A critical review’, Statistics Technical Report, Berkeley,
CA: University of California.
Freedman, D.A. (2010), Statistical Models and Causal Inference: A Dialogue with the Social Sciences,
Cambridge, UK: Cambridge University Press.
Gangl, M. (2010), ‘Causal inference in sociological research’, Annual Review of Sociology, 36, 21–48.
Goldberger, A. (1972), ‘Structural equation models in the social sciences’, Econometrica, 40, 979–1001.
Granger, C.W.J. (1969), ‘Investigating causal relations by econometric models and cross-spectral methods’,
Econometrica, 37, 424–438.
Heckman, J.J. (1997), ‘Instrumental variables: A study of implicit behavioural assumptions in making program
evaluations’, Journal of Human Resources, 32, 441–462.
Heckman, J.J. (1999), ‘Instrumental variables: Response to Angrist and Imbens’, Journal of Human Resources,
34, 828–837.
Heckman, J.J. and S. Urzua (2010), ‘Comparing IV with structural models: What simple IV can and cannot
identify’, Journal of Econometrics, 156, 27–37.
Heckman, J.J., S. Urzua, and E. Vytlacil (2006), ‘Understanding instrumental variables in models with essential
heterogeneity’, The Review of Economics and Statistics, 88, 389–432.
Hernán, M.A., B. Brumback, and J.M. Robins (2001), ‘Marginal structural models to estimate the joint causal
effect of nonrandomized treatments’, Journal of the American Statistical Association, 96, 440–448.
Hill, A.B. (1965), ‘The environment and disease: Association or causation?’, Proceedings of the Royal Society
of Medicine, 58, 295–300.
Holland, P.W. (1986), ‘Statistics and causal inference’, Journal of the American Statistical Association,
81, 945–960.
Hudgens, M.G. and M.E. Halloran (2008), ‘Toward causal inference with interference’, Journal of the American
Statistical Association, 103, 832–842.
Imbens, G.W. (2010), ‘Better LATE than nothing: Some comments on Deaton (2009) and Heckman and Urzua
(2009)’, Journal of Economic Literature, 48, 399–423.
Imbens, G.W. and J.D. Angrist (1994), ‘Identification and estimation of local average treatment effects’,
Econometrica, 62, 467–476.
Jackson, M. and D.R. Cox (2013), ‘The principles of experimental design and their application in sociology’,
Annual Review of Sociology, 39, 27–49.
Kirk, D.S. (2009), ‘A natural experiment on residential change and recidivism: Lessons from Hurricane
Katrina’, American Sociological Review, 74, 484–505.
Lauritzen, S. (1996), Graphical Models, Oxford, UK: Clarendon Press.
Lawrence, M. and R. Breen (2016), ‘And their children after them? The effect of college on educational repro-
duction’, American Journal of Sociology, 122, 532–572.
Lee, D.S. and T. Lemieux (2010), ‘Regression discontinuity designs in economics’, Journal of Economic
Literature, 48, 281–355.
Morgan, S.L. (2012), ‘Models of college entry and the challenges of estimating primary and secondary effects’,
Sociological Methods and Research, 41, 17–56.
Morgan, S.L. and C. Winship (2015), Counterfactuals and Causal Inference: Methods and Principles for Social
Research, 2nd ed., New York, NY: Cambridge University Press.
Neyman, J. (1923), ‘On the application of probability theory to agricultural experiments. Essay on principles.
Section 9’, Statistical Science, 5, 465–480.
Ní Brolcháin, M. and T. Dyson (2007), ‘On causation in demography: Issues and illustrations’, Population and
Development Review, 33, 1–36.
Paul, L.A. and N. Hall (2013), Causation, Oxford, UK: Oxford University Press.
Pearl, J. (1988), Probabilistic Reasoning in Intelligent Systems, San Mateo, CA: Morgan Kaufmann.
Pearl, J. (2009), Causality: Models, Reasoning, and Inference, 2nd ed. (1st ed. 2000), New York, NY: Cambridge
University Press.
Pearl, J. (2010), ‘An introduction to causal inference’, The International Journal of Biostatistics, 6, 1–62.
Pearl, J. and E. Bareinboim (2018), ‘Transportability across studies: A formal approach’, Technical
Report R-372, 1st version 2011, Los Angeles, CA: Computer Science Department, University of
California.
Pearl, J., M. Glymour and N.P. Jewell (2016), Causal Inference in Statistics, Chichester, UK: Wiley.
Robins, J.M. (1999), ‘Association, causation, and marginal structural models’, Synthese, 121, 151–179.
Rosenbaum, P.R. and D.B. Rubin (1983), ‘The central role of the propensity score in observational studies for
causal effects’, Biometrika, 70, 41–55.
Roy, A. (1951), ‘Some thoughts on the distribution of earnings’, Oxford Economic Papers, 3, 135–146.
Rubin, D.B. (1974), ‘Estimating causal effects of treatments in randomized and non-randomized studies’,
Journal of Educational Psychology, 66, 688–701.
Shadish, W.R., T.D. Cook, and D.T. Campbell (2002), Experimental and Quasi-Experimental Designs for
Generalized Causal Inference, Boston, MA: Houghton Mifflin.
Sharkey, P. and F. Elwert (2011), ‘The legacy of disadvantage: Multigenerational neighborhood effects on
cognitive ability’, American Journal of Sociology, 116, 1934–1981.
Spirtes, P., C. Glymour, and R. Scheines (2000), Causation, Prediction, and Search, 2nd ed., Cambridge, MA:
MIT Press.
Torche, F. (2011), ‘The effect of maternal stress on birth outcomes: Exploiting a natural experiment’, Demography,
48, 1473–1491.
Wodtke, G., D. Harding, and F. Elwert (2011), ‘Neighborhood effects in temporal perspective’, American
Sociological Review, 76, 713–736.
Woodward, J. (2003), Making Things Happen: A Theory of Causal Explanation, Oxford, UK: Oxford University
Press.
Xie, Y., J.E. Brand, and B. Jann (2012), ‘Estimating heterogeneous treatment effects with observational data’,
Sociological Methodology, 42, 314–347.
Zhou, X. and Y. Xie (2019), ‘Heterogeneous treatment effects in the presence of self-selection: A propensity
score perspective’, Sociological Methodology, 50, 350–385.
