Causal Inference in Natural Language Processing:
Estimation, Prediction, Interpretation and Beyond
Amir Feder (1, 10, ∗), Katherine A. Keith (2), Emaad Manzoor (3), Reid Pryzant (4),
Dhanya Sridhar (5), Zach Wood-Doughty (6), Jacob Eisenstein (7), Justin Grimmer (8),
Roi Reichart (1), Margaret E. Roberts (9), Brandon M. Stewart (10),
Victor Veitch (7, 11), and Diyi Yang (12)
(1) Technion - Israel Institute of Technology, Israel
(2) Williams College, USA
(3) University of Wisconsin - Madison, USA
(4) Microsoft, USA
(5) Columbia University, Canada
(6) Northwestern University, USA
(7) Google Research, USA
(8) Stanford University, USA
(9) University of California San Diego, USA
(10) Princeton University, USA
(11) University of Chicago, USA
(12) Georgia Tech, USA

Downloaded from [Link] by guest on 09 December 2025
∗ All authors contributed equally to this paper. Author names are organized alphabetically in two clusters: first students and post-docs, and then faculty members. The email address of the corresponding (first) author is: feder@[Link].

Transactions of the Association for Computational Linguistics, vol. 10, pp. 1138–1158, 2022. [Link] a 00511
Action Editor: Chris Brew. Submission batch: 4/2022; Revision batch: 7/2022; Published 10/2022.
© 2022 Association for Computational Linguistics. Distributed under a CC-BY 4.0 license.

Abstract

A fundamental goal of scientific research is to learn about causal relationships. However, despite its critical role in the life and social sciences, causality has not had the same importance in Natural Language Processing (NLP), which has traditionally placed more emphasis on predictive tasks. This distinction is beginning to fade, with an emerging area of interdisciplinary research at the convergence of causal inference and language processing. Still, research on causality in NLP remains scattered across domains without unified definitions, benchmark datasets, and clear articulations of the challenges and opportunities in the application of causal inference to the textual domain, with its unique properties. In this survey, we consolidate research across academic areas and situate it in the broader NLP landscape. We introduce the statistical challenge of estimating causal effects with text, encompassing settings where text is used as an outcome, treatment, or to address confounding. In addition, we explore potential uses of causal inference to improve the robustness, fairness, and interpretability of NLP models. We thus provide a unified overview of causal inference for the NLP community.¹

¹ An online repository containing existing research on causal inference and language processing is available here: [Link]-text-papers.

1 Introduction

The increasing effectiveness of NLP has created exciting new opportunities for interdisciplinary collaborations, bringing NLP techniques to a wide range of external research disciplines (e.g., Roberts et al., 2014; Zhang et al., 2020; Ophir et al., 2020) and incorporating new data and tasks into mainstream NLP (e.g., Thomas et al., 2006; Pryzant et al., 2018). In such interdisciplinary collaborations, many of the most important research questions relate to the inference of causal relationships. For example, before recommending a new drug therapy, clinicians want to know the causal effect of the drug on disease progression. Causal inference involves a question about a counterfactual world created by taking an intervention: What would a patient's disease progression have been if we had given them the drug? As we explain below, with observational data, the causal effect is not equivalent to the correlation between whether the drug is taken and the observed disease progression. There is now a vast literature on techniques for making valid inferences using
traditional (non-text) datasets (e.g., Morgan and Winship, 2015), but the application of these techniques to natural language data raises new fundamental challenges.

Conversely, in many classical NLP applications, the main goal is to make accurate predictions: Any statistical correlation is admissible, regardless of the underlying causal relationship. However, as NLP systems are increasingly deployed in challenging and high-stakes scenarios, we cannot rely on the usual assumption that training and test data are identically distributed, and we may not be satisfied with uninterpretable black-box predictors. For both of these problems, causality offers a promising path forward: Domain knowledge of the causal structure of the data-generating process can suggest inductive biases that lead to more robust predictors, and a causal view of the predictor itself can offer new insights into its inner workings.

The core claim of this survey paper is that deepening the connection between causality and NLP has the potential to advance the goals of both social science and NLP researchers. We divide the intersection of causality and NLP into two areas: estimating causal effects from text, and using causal formalisms to make NLP methods more reliable. We next illustrate this distinction.

Example 1. An online forum has allowed its users to indicate their preferred gender in their profiles with a female or male icon. They notice that users who label themselves with the female icon tend to receive fewer “likes” on their posts. To better evaluate their policy of allowing gender information in profiles, they ask: Does using the female icon cause a decrease in popularity for a post?

Ex. 1 addresses the causal effect of signaling female gender (treatment) on the likes a post receives (outcome) (see the discussion of signaling in Keith et al., 2020). The counterfactual question is: If we could manipulate the gender icon of a post, how many likes would the post have received?

The observed correlation between the gender icons and the number of “likes” generally does not coincide with the causal effect: It might instead be a spurious correlation, induced by other variables, known as confounders, which are correlated with both the treatment and the outcome (see Gururangan et al., 2018, for an early discussion of spurious correlation in NLP). One possible confounder is the topic of each post: Posts written by users who have selected the female icon may be about certain topics (e.g., childbirth or menstruation) more often, and those topics may not receive as many likes from the audience of the broader online platform. As we will see in §2, due to confounding, estimating a causal effect requires assumptions.

Example 1 highlights the setting where the text encodes the relevant confounders of a causal effect. The text-as-confounder setting is one of many causal inferences we can make with text data. The text data can also encode outcomes or treatments of interest. For example, we may wonder how the gender signal affects the sentiment of the reply that a post receives (text as outcome), or how a writing style affects the “likes” a post receives (text as treatment).

NLP Helps Causal Inference. Causal inference with text data involves several challenges that are distinct from typical causal inference settings: Text is high-dimensional, needs sophisticated modeling to measure semantically meaningful factors like topic, and demands careful thought to formalize the intervention that a causal question corresponds to. The developments in NLP around modeling language, from topic models (Blei et al., 2003) to contextual embeddings (e.g., Devlin et al., 2019), offer promising ways to extract the information we need from text to estimate causal effects. However, we need new assumptions to ensure that the use of NLP methods leads to valid causal inferences. We discuss existing research on estimating causal effects from text and emphasize these challenges and opportunities in §3.

Example 2. A medical research center wants to build a classifier to detect clinical diagnoses from the textual narratives of patient medical records. The records are aggregated across multiple hospital sites, which vary both in the frequency of the target clinical condition and the writing style of the narratives. When the classifier is applied to records from sites that were not in the training set, its accuracy decreases. Post-hoc analysis indicates that it puts significant weight on seemingly irrelevant features, such as formatting markers.

Like Ex. 1, Ex. 2 also involves a counterfactual question: Does the classifier's prediction
change if we intervene to change the hospital site, while holding the true clinical status fixed? We want the classifier to rely on phrases that express clinical facts, and not writing style. However, in the training data, the clinical condition and the writing style are spuriously correlated, due to the site acting as a confounding variable. For example, a site might be more likely to encounter the target clinical condition due to its location or speciality, and that site might also employ distinctive textual features, such as boilerplate text at the beginning of each narrative. In the training set, these features will be predictive of the label, but they are unlikely to be useful in deployment scenarios at new sites. In this example, the hospital site acts like a confounder: It creates a spurious correlation between some features of the text and the prediction target.

Example 2 shows how the lack of robustness can make NLP methods less trustworthy. A related problem is that NLP systems are often black boxes, making it hard to understand how human-interpretable features of the text lead to the observed predictions. In this setting, we want to know if some part of the text (e.g., some sequence of tokens) causes the output of an NLP method (e.g., a classification prediction).

Causal Models Can Help NLP. To address the robustness and interpretability challenges posed by NLP methods, we need new criteria to learn models that go beyond exploiting correlations. For example, we want predictors that are invariant to certain changes that we make to text, such as changing the format while holding fixed the ground-truth label. There is considerable promise in using causality to develop new criteria in service of building robust and interpretable NLP methods. In contrast to the well-studied area of causal inference with text, this area of causality and NLP research is less well understood, though well-motivated by recent empirical successes. In §4, we cover the existing research and review the challenges and opportunities around using causality to improve NLP.

This position paper follows a small body of surveys that review the role of text data within causal inference (Egami et al., 2018; Keith et al., 2020). We take a broader view, separating the intersection of causality and NLP into two distinct lines of research: estimating causal effects in which text is at least one causal variable (§3), and using causal formalisms to improve robustness and interpretability in NLP methods (§4). After reading this paper, we envision that the reader will have a broad understanding of different types of causal queries and the challenges they present; the statistical and causal challenges that are unique to working with text data and NLP methods; and open problems in estimating effects from text and applying causality to improve NLP methods.

2 Background

Both focal problems of this survey (causal effect estimation and causal formalisms for robust and explainable prediction) involve causal inference. The key ingredient of causal inference is defining counterfactuals based on an intervention of interest. We will illustrate this idea with the motivating examples from §1.

Example 1 involves online forum posts and the number of likes Y that they receive. We use a binary variable T to indicate whether a post uses a “female icon” (T = 1) or a “male icon” (T = 0). We view the post icon T as the “treatment” in this example, but do not assume that the treatment is randomly assigned (it may be selected by the posts' authors). The counterfactual outcome Y(1) represents the number of likes a post would have received had it used a female icon. The counterfactual outcome Y(0) is defined analogously.

The fundamental problem of causal inference (Holland, 1986) is that we can never observe Y(0) and Y(1) simultaneously for any unit of analysis, the smallest unit about which one wants to make counterfactual inquiries (e.g., a post in Ex. 1). This problem is what makes causal inference harder than statistical inference and impossible without identification assumptions (see §2.2).

Example 2 involves a trained classifier f(X) that takes a textual clinical narrative X as input and outputs a diagnosis prediction. The text X is written based on the physician's diagnosis Y, and is also influenced by the writing style used at the hospital Z. We want to intervene upon the hospital Z while holding the label Y fixed. The counterfactual narrative X(z) is the text we would have observed had we set the hospital to the value z while holding the diagnosis fixed. The counterfactual prediction f(X(z)) is the output
the trained classifier would have produced had we given the counterfactual narrative X(z) as input.

2.1 Causal Estimands

An analyst begins by specifying target causal quantities of interest, called causal estimands, which typically involve counterfactuals. In Example 1, one possible causal estimand is the average treatment effect (ATE) (Rubin, 1974),

$$\mathrm{ATE} = \mathbb{E}[Y(1) - Y(0)] \tag{1}$$

where the expectation is over the generative distribution of posts. The ATE can be interpreted as the change in the number of likes a post would have received, on average, had the post used a female icon instead of a male icon.

Another possible causal effect of interest is the conditional average treatment effect (CATE) (Imbens and Rubin, 2015),

$$\mathrm{CATE} = \mathbb{E}[Y(1) - Y(0) \mid G] \tag{2}$$

where G is a predefined subgroup of the population. For example, G could be all posts on political topics. In this case, the CATE can be interpreted as the change in the number of likes a post on a political topic would have received, on average, had the post used a female icon instead of a male icon. CATEs are used to quantify the heterogeneity of causal effects in different population subgroups.

2.2 Identification Assumptions for Causal Inference

We will focus on Example 1 and the ATE in Equation (1) to explain the assumptions needed for causal inference. Although we focus on the ATE, related assumptions are needed in some form for all causal estimands. Variables are the same as those defined previously in this section.

Ignorability requires that the treatment assignment be statistically independent of the counterfactual outcomes,

$$T \perp\!\!\!\perp Y(a) \quad \forall a \in \{0, 1\} \tag{3}$$

Note that this assumption is not equivalent to independence between the treatment assignment and the observed outcome Y. For example, if ignorability holds, Y ⊥⊥ T would additionally imply that the treatment has no effect.

Randomized treatment assignment guarantees ignorability by design. For example, we can guarantee ignorability in Example 1 by flipping a coin to select the icon for each post, and disallowing post authors from changing it.

Without randomized treatment assignment, ignorability could be violated by confounders, variables that influence both the treatment status and the potential outcomes. In Example 1, suppose that: (i) the default post icon is male, (ii) only experienced users change the icon for their posts based on their gender, and (iii) experienced users write posts that receive relatively more likes. In this scenario, the experience of post authors is a confounder: Posts having female icons are more likely to be written by experienced users, and thus receive more likes. In the presence of confounders, causal inference is only possible if we assume conditional ignorability,

$$T \perp\!\!\!\perp Y(a) \mid X \quad \forall a \in \{0, 1\} \tag{4}$$

where X is a set of observed variables, conditioning on which ensures independence between the treatment assignment and the potential outcomes. In other words, we assume that all confounders are observed.

Positivity requires that the probability of receiving treatment is bounded away from 0 and 1 for all values of the confounders X:

$$0 < \Pr(T = 1 \mid X = x) < 1 \quad \forall x \tag{5}$$

Intuitively, positivity requires that each unit under study has the possibility of being treated and has the possibility of being untreated. Randomized treatment assignment can also guarantee positivity by design.

Consistency requires that the outcome observed for each unit under study at treatment level a ∈ {0, 1} is identical to the outcome we would have observed had that unit been assigned to treatment level a,

$$T = a \implies Y(a) = Y \quad \forall a \in \{0, 1\} \tag{6}$$

Consistency ensures that the potential outcomes for each unit under study take on a single value at each treatment level. Consistency will be violated if different unobservable “versions” of the treatment lead to different potential outcomes. For example, this would occur if red and blue female icons had
different effects on the number of likes received, but icon color was not recorded. Consistency will also be violated if the treatment assignment of one unit affects the potential outcomes of another, a phenomenon called interference (Rosenbaum, 2007). Randomized treatment assignment does not guarantee consistency by design. For example, if different icon colors affect the number of likes but are not considered by the model, then a randomized experiment will not solve the problem. As Hernán (2016) discusses, consistency assumptions are a “matter of expert agreement” and, while subjective, these assumptions are at least made more transparent by causal formalisms.

Figure 1: Causal graphs for the motivating examples. (Left) In Example 1, the post icon (T) is correlated with attributes of the post (X), and both variables affect the number of likes a post receives (Y). (Right) In Example 2, the label (Y, i.e., diagnosis) and hospital site (Z) are correlated, and both affect the clinical narrative (X). Predictions f(X) from the trained classifier depend on X.
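The confounding story in Example 1 can be made concrete with a small simulation. What follows is a minimal Python sketch under loudly stated assumptions: every number in it (an assumed true effect of −2 likes for the female icon, a 50% share of experienced authors, icon-switching probabilities of 0.6 and 0.1 for experienced and inexperienced authors, Gaussian noise on likes) is invented for illustration, and the function names (simulate_post, diff_in_means, propensity_by_stratum, adjusted_estimate) are hypothetical. It deliberately relaxes condition (ii) by letting a few inexperienced authors switch icons too; if none did, Pr(T = 1 | X = 0) would be 0 and positivity (Equation 5) would fail. The sketch compares a naive difference in means on observational data, the same difference in means under coin-flip assignment, and the standard adjustment (standardization) estimator, which averages within-stratum differences over the distribution of the observed confounder X.

```python
import random

# Hypothetical parameters for Example 1 (illustrative assumptions, not real data).
TRUE_EFFECT = -2.0  # assumed effect of the female icon (T=1) on likes
N_POSTS = 20_000


def simulate_post(rng, randomize):
    """Return (x, t, y): author experience, icon, and observed likes for one post."""
    experienced = rng.random() < 0.5  # X: half of the authors are experienced
    if randomize:
        # Coin-flip assignment: T is independent of X and of the potential outcomes.
        female_icon = rng.random() < 0.5
    else:
        # Experienced authors switch away from the male default far more often,
        # so T depends on X. Inexperienced authors switch with a small (assumed)
        # probability so that positivity, 0 < Pr(T=1 | X=x) < 1, still holds.
        female_icon = rng.random() < (0.6 if experienced else 0.1)
    # Potential outcomes: experience adds likes; the icon shifts them by TRUE_EFFECT.
    y0 = 10.0 + 5.0 * experienced + rng.gauss(0.0, 1.0)
    y1 = y0 + TRUE_EFFECT
    observed = y1 if female_icon else y0  # consistency: Y = Y(T)
    return int(experienced), int(female_icon), observed


def diff_in_means(rows):
    """Mean observed likes of treated posts minus mean of untreated posts."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def propensity_by_stratum(rows):
    """Empirical Pr(T=1 | X=x) for each stratum x, as a positivity check."""
    result = {}
    for x in (0, 1):
        icons = [t for xx, t, _ in rows if xx == x]
        result[x] = sum(icons) / len(icons)
    return result


def adjusted_estimate(rows):
    """Standardization: sum over x of Pr(X=x) * (E[Y|X=x,T=1] - E[Y|X=x,T=0])."""
    total = 0.0
    for x in (0, 1):
        stratum = [row for row in rows if row[0] == x]
        # diff_in_means raises ZeroDivisionError if a stratum has no treated or
        # no untreated posts, which is exactly a positivity violation.
        total += len(stratum) / len(rows) * diff_in_means(stratum)
    return total


rng = random.Random(0)
observational = [simulate_post(rng, randomize=False) for _ in range(N_POSTS)]
experiment = [simulate_post(rng, randomize=True) for _ in range(N_POSTS)]

print("Pr(T=1 | X=x):", propensity_by_stratum(observational))
print(f"naive observational difference: {diff_in_means(observational):+.2f}")
print(f"randomized difference:          {diff_in_means(experiment):+.2f}")
print(f"adjusted for experience X:      {adjusted_estimate(observational):+.2f}")
```

With these assumed settings, the naive observational comparison should come out positive (female-icon posts look more popular because they are disproportionately written by experienced authors) even though the assumed effect is negative, while the randomized and adjusted estimates should land near −2. The adjusted estimate is trustworthy here only because experience is, by construction, the sole confounder in the simulation; with real forum data, conditional ignorability cannot be verified this way.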
These three assumptions enable identifying the ATE defined in Equation (1), as formalized in the following identification proof:

$$
\begin{aligned}
\mathbb{E}[Y(a)] &\overset{(i)}{=} \mathbb{E}_X\big[\mathbb{E}[Y(a) \mid X]\big] \\
&\overset{(ii)}{=} \mathbb{E}_X\big[\mathbb{E}[Y(a) \mid X, T = a]\big] \\
&\overset{(iii)}{=} \mathbb{E}_X\big[\mathbb{E}[Y \mid X, T = a]\big], \qquad \forall a \in \{0, 1\}
\end{aligned}
$$

where equality (i) is due to iterated expectation, equality (ii) follows from conditional ignorability, and equality (iii) follows from consistency, with positivity ensuring that the conditional expectation E[Y | X, T = a] is well defined. The final expression can be computed from observable quantities alone.

We refer to other background material to discuss how to identify and estimate causal effects with these assumptions in hand (Rubin, 2005; Pearl, 2009; Imbens and Rubin, 2015; Egami et al., 2018; Keith et al., 2020).

2.3 Causal Graphical Models

Finding a set of variables X that ensures conditional ignorability is challenging, and requires making several carefully assessed assumptions about the causal relationships in the domain under study. Causal directed acyclic graphs (DAGs) (Pearl, 2009) enable formally encoding these assumptions and deriving the set of variables X after conditioning on which ignorability is satisfied.

In a causal DAG, an edge X → Y implies that X may or may not cause Y. The absence of an edge between X and Y implies that X does not directly cause Y. Bi-directed dotted arrows between variables indicate that they are correlated, potentially through some unobserved variable.

Figure 1 illustrates the causal DAGs we assume for Example 1 and Example 2. Given a causal DAG, causal dependencies between any pair of variables can be derived using the d-separation algorithm (Pearl, 1994). These dependencies can then be used to assess whether conditional ignorability holds for a given treatment, outcome, and set of conditioning variables X. For example, in the left DAG in Figure 1, the post icon T is independent of the potential outcomes Y(a) only if we condition on X. In the right DAG, the prediction f(X) is not independent of the hospital Z even after conditioning on the label Y, because Z also influences the narrative X.

3 Estimating Causal Effects with Text

In §2, we described assumptions for causal inference when the treatment, outcome, and confounders were directly measured. In this section, we contribute a novel discussion about how causal assumptions are complicated when variables necessary for a causal analysis are extracted automatically from text. Addressing these open challenges will require collaborations between the NLP and causal estimation communities to understand which assumptions are required to draw valid causal conclusions. We highlight prior approaches and future challenges in settings where the text is a confounder, the outcome, or the treatment, but this discussion applies broadly to many text-based causal problems.

To make these challenges clear, we will expand upon Example 1 by supposing that a hypothetical online forum wants to understand and reduce harassment on its platform. Many such questions are causal: Do gendered icons influence
the harassment users receive? Do longer suspensions make users less likely to harass others? How can a post be rewritten to avoid offending others? In each case, using NLP to measure aspects of language is integral to any causal analysis.

3.1 Causal Effects with Textual Confounders

Returning to Example 1, suppose the platform worries that users with female icons are more likely to receive harassment from other users. Such a finding might significantly influence plans for a new moderation strategy (Jhaver et al., 2018; Rubin et al., 2020). We may be unable or unwilling to randomize our treatment (the gender signal of the author's icon), so the causal effect of the gender signal on harassment received might be confounded by other variables. The topic of the post may be an important confounder: Some subject areas may be discussed by a larger proportion of users with female icons, and more controversial subjects may attract more harassment. The text of the post provides evidence of the topic and thus acts as a confounder (Roberts et al., 2020).

Previous Approaches. The main idea in this setting is to use NLP methods to extract confounding aspects from text and then adjust for those aspects in an estimation approach such as propensity score matching. However, how and when these methods violate causal assumptions are still open questions. Keith et al. (2020) provide a recent overview of several such methods and many potential threats to inference.

One set of methods applies unsupervised dimensionality reduction to map high-dimensional text data to a low-dimensional set of variables. Such methods include latent variable models such as topic models, embedding methods, and auto-encoders. Roberts et al. (2020) and Sridhar and Getoor (2019) have applied topic models to extract confounding patterns from text data, and performed an adjustment for these inferred variables. Mozer et al. (2020) match texts using distance metrics on the bag-of-words representation.

A second set of methods adjusts for confounders from text with supervised NLP methods. Recently, Veitch et al. (2020) adapted pre-trained language models and supervised topic models with multiple classification heads for binary treatment and counterfactual outcomes. By learning a “sufficient” embedding that obtained low classification loss on the treatment and counterfactual outcomes, they show that confounding properties could be found within text data. Roberts et al. (2020) combine these strategies with the topic model approach in a text matching framework.

Challenges for Causal Assumptions with Text. In settings without randomized treatments, NLP methods that adjust for text confounding require a particularly strong statement of conditional ignorability (Equation 4): All aspects of confounding must be measured by the model. Because we cannot test this assumption, we should seek domain expertise to justify it or understand the theoretical and empirical consequences if it is violated.

When the text is a confounder, its high dimensionality makes positivity unlikely to hold (D'Amour et al., 2020). Even for approaches that extract a low-dimensional representation of the confounder from text, positivity is a concern. For example, in Example 1, posts might contain phrases that near-perfectly encode the chosen gender icon of the author. If the learned representation captures this information alongside other confounding aspects, it would be nearly impossible to imagine changing the gender icon while holding the gendered text fixed.

3.2 Causal Effects on Textual Outcomes

Suppose platform moderators can choose to suspend users who violate community guidelines for either one day or one week, and we want to know which option has the greater effect in decreasing the toxicity of the suspended user. Ground-truth human annotations of toxicity for each user's posts, if we could collect them, would be our ideal outcome variable. We would then use those outcomes to calculate the ATE, following the discussion in §2. Our analysis of suspensions is complicated if, instead of ground-truth labels for our toxicity outcome, we rely on NLP methods to extract the outcome from the text. A core challenge is to distill the high-dimensional text into a low-dimensional measure of toxicity.

Challenges for Causal Assumptions with Text. We saw in §2 that randomizing the treatment assignment can ensure ignorability and positivity, but even with randomization, we require more careful assessment to satisfy consistency. Suppose we randomly assign suspension lengths to users and then once those users return and
continue to post, we use a clustering method to discover toxic and non-toxic groupings among the formerly suspended users. To estimate the causal effect of suspension length, we rely on the trained clustering model to infer our outcome variable. If the suspension policy does in truth have a causal effect on posting behavior, then, because our clustering model depends on all posts in its training data, it also depends on the treatment assignments that influenced each post. Thus, when we use the model to infer outcomes, each user's outcome depends on all other users' treatments. This violates the assumption of consistency, which requires that potential outcomes do not depend on the treatment status of other units. This undermines the theoretical basis for our causal estimate and, in practice, implies that different randomized treatment assignments could lead to different treatment effect estimates.

These issues can be addressed by developing the measure on only a sample of the data and then estimating the effect on a separate, held-out data sample (Egami et al., 2018).

3.3 Causal Effects with Textual Treatments

As a third example, suppose we want to understand what makes a post offensive. This might allow the platform to provide automated suggestions that encourage users to rephrase their post. Here, we are interested in the causal effect of the text itself on whether a reader reports it as offensive. Theoretically, the counterfactual Y(t) is defined for any t, but could be limited to an exploration of specific aspects of the text. For example, do second-person pronouns make a post more likely to be reported?

Previous Approaches. One approach to studying the effects of text involves treatment discovery: producing interpretable features of the text, such as latent topics or lexical features like n-grams (Pryzant et al., 2018), that can be causally linked to outcomes. For example, Fong and Grimmer (2016) discovered features of candidate biographies that drove voter evaluations, Pryzant et al. (2017) discovered writing styles in marketing materials that are influential in increasing sales figures, and Zhang et al. (2020) discovered conversational tendencies that lead to positive mental health counseling sessions.

Another approach is to estimate the causal effects of specific latent properties that are intervened on during an experiment or extracted from text for observational studies (Pryzant et al., 2021; Wood-Doughty et al., 2018). For example, Gerber et al. (2008) studied the effect of appealing to civic duty on voter turnout. In this setting, factors are latent properties of the text for which we need a measurement model.

Challenges for Causal Assumptions with Text. Ensuring positivity and consistency remains a challenge in this setting, but assessing conditional ignorability is particularly tricky. Suppose the treatment is the use of second-person pronouns, but the relationship between this treatment and the outcome is confounded by other properties of the text (e.g., politeness). For conditional ignorability to hold, we would need to extract from the text and condition on all such confounders, which requires assuming that we can disentangle the treatment from many other aspects of the text (Pryzant et al., 2021). Such concerns could be avoided by randomly assigning texts to readers (Fong and Grimmer, 2016, 2021), but that may be impractical. Even if we could randomize the assignment of texts, we still have to assume that there is no confounding due to latent properties of the reader, such as their political ideology or their tastes.

3.4 Future Work

We next highlight key challenges and opportunities for NLP researchers to facilitate causal inference from text.

Heterogeneous Effects. Texts are read and interpreted differently by different people; NLP researchers have studied this problem in the context of heterogeneous perceptions of annotators (Paun et al., 2018; Pavlick and Kwiatkowski, 2019). In the field of causal inference, the idea that different subgroups experience different causal effects is formalized by a heterogeneous treatment effect, and is studied using conditional average treatment effects (Equation (2)) for different subgroups. It may also be of interest to discover subgroups where the treatment has a strong effect on an outcome of interest. For example, we may want to identify text features that characterize when a treatment such as a content moderation policy is effective. Wager and Athey (2018) proposed a flexible approach to estimating heterogeneous effects based on random forests. However,
such approaches, which are developed with tabular data in mind, may be computationally infeasible for high-dimensional text data. There is an opportunity to extend NLP methods to discover text features that capture subgroups where the causal effect varies.

Representation Learning. Causal inference from text requires extracting low-dimensional features from text. Depending on the setting, the low-dimensional features are tasked with extracting confounding information, outcomes, or treatments. The need to measure latent aspects from text connects to the field of text representation learning (Le and Mikolov, 2014; Liu et al., 2015; Liu and Lapata, 2018). The usual objective of text representation learning approaches is to model language. Adapting representation learning for causal inference raises open challenges; for example, we might augment the objective function to ensure that (i) positivity is satisfied, (ii) confounding information is not discarded, or (iii) noisily measured outcomes or treatments still enable accurate causal effect estimates.

Benchmarks. Benchmark datasets have propelled machine learning forward by creating shared metrics by which predictive models can be evaluated. There are currently no real-world text-based causal estimation benchmarks, because of the fundamental problem of causal inference: we can never observe the counterfactual outcomes for an individual, and therefore never observe the true causal effects. However, as Keith et al. (2020) discuss, there has been some progress in evaluating text-based estimation methods on semi-synthetic datasets in which real covariates are used to generate treatments and outcomes (e.g., Veitch et al., 2020; Roberts et al., 2020; Pryzant et al., 2021; Feder et al., 2021; Weld et al., 2022). Wood-Doughty et al. (2021) employed large-scale language models for controlled synthetic generation of text on which causal methods can be evaluated. An open problem is the degree to which methods that perform well on synthetic data generalize to real-world data.

Controllable Text Generation. When running a randomized experiment or generating synthetic data, researchers make decisions using the empirical distribution of the data. If we are studying whether a drug prevents headaches, it would make sense to randomly assign a ‘reasonable’ dose: one that is large enough to plausibly be effective but not so large as to be toxic. But when the causal question involves natural language, domain knowledge might not provide a small set of ‘reasonable’ texts. Instead, we might turn to controllable text generation to sample texts that fulfill some requirements (Kiddon et al., 2016). Such methods have a long history in NLP; for example, a conversational agent should be able to answer a user’s question while being perceived as polite (Niu and Bansal, 2018). In our text-as-treatment example, where we want to understand which textual aspects make a text offensive, such methods could enable an experiment allowing us to randomly assign texts that differ on only a specific latent aspect. For example, we could change the style of a text while holding its content fixed (Logeswaran et al., 2018). Recent work has explored text generation from a causal perspective (Hu and Li, 2021), but future work could develop these methods for causal estimation.

4 Robust and Explainable Predictions from Causality

Thus far we have focused on using NLP tools for estimating causal effects in the presence of text data. In this section, we consider using causal reasoning to help solve traditional NLP tasks such as understanding, manipulating, and generating natural language.

At first glance, NLP may appear to have little need for causal ideas. The field has achieved remarkable progress from the use of increasingly high-capacity neural architectures to extract correlations from large-scale datasets (Peters et al., 2018; Devlin et al., 2019; Liu et al., 2019). These architectures make no distinction between causes, effects, and confounders, and they make no attempt to identify causal relationships: A feature may be a powerful predictor even if it has no direct causal relationship with the desired output.

Yet correlational predictive models can be untrustworthy (Jacovi et al., 2021): They may latch onto spurious correlations (“shortcuts”), leading to errors in out-of-distribution (OOD) settings (e.g., McCoy et al., 2019); they may exhibit unacceptable performance differences across groups of users (e.g., Zhao et al., 2017); and their behavior may be too inscrutable to incorporate into high-stakes decisions (Guidotti et al., 2018). Each
of these shortcomings can potentially be addressed by the causal perspective: Knowledge of the causal relationship between observations and labels can be used to formalize spurious correlations and mitigate their impact (§ 4.1); causality also provides a language for specifying and reasoning about fairness conditions (§ 4.2); and the task of explaining predictions may be naturally formulated in terms of counterfactuals (§ 4.3). The application of causality to these problems is still an active area of research, which we attempt to facilitate by highlighting previously implicit connections among a diverse body of prior work.

4.1 Learning Robust Predictors

The NLP field has grown increasingly concerned with spurious correlations (Gururangan et al., 2018; McCoy et al., 2019, inter alia). From a causal perspective, spurious correlations arise when two conditions are met. First, there must be some factor(s) Z that are informative (in the training data) about both the features X and label Y. Second, Y and Z must be dependent in the training data in a way that is not guaranteed to hold in general. A predictor f : X → Y will learn to use parts of X that carry information about Z (because Z is informative about Y), which can lead to errors if the relationship between Y and Z changes when the predictor is deployed.2

2 From the perspective of earlier work on domain adaptation (Søgaard, 2013), spurious correlations can be viewed as a special case of a more general phenomenon in which feature-label relationships change across domains. For example, the lexical feature ‘boring’ might have a stronger negative weight in reviews about books than about kitchen appliances, but this is not a spurious correlation because there is a direct causal relationship between this feature and the label. Spurious correlations are a particularly important form of distributional shift in practice because they can lead to inconsistent predictions on pairs of examples that humans view as identical.

This issue is illustrated by Example 2, where the task is to predict a medical condition from the text of patient records. The training set is drawn from multiple hospitals which vary both in the frequency of the target clinical condition (Y) and the writing style of the narratives (represented in X). A predictor trained on such data will use textual features that carry information about the hospital (Z), even when they are useless at predicting the diagnosis within any individual hospital. Spurious correlations also appear as artifacts in benchmarks for tasks such as natural language inference, where negation words are correlated with semantic contradictions in crowdsourced training data but not in text that is produced under more natural conditions (Gururangan et al., 2018; Poliak et al., 2018).

Such observations have led to several proposals for novel evaluation methodologies (Naik et al., 2018; Ribeiro et al., 2020; Gardner et al., 2020) to ensure that predictors are not “right for the wrong reasons”. These evaluations generally take two forms: invariance tests, which assess whether predictions are affected by perturbations that are causally unrelated to the label, and sensitivity tests, which apply perturbations that should in some sense be the minimal change necessary to flip the true label. Both types of test can be motivated by a causal perspective. The purpose of an invariance test is to determine whether the predictor behaves differently on counterfactual inputs X(Z = z̃), where Z indicates a property that an analyst believes should be causally irrelevant to Y. A model whose predictions are invariant across such counterfactuals can in some cases be expected to perform better on test distributions with a different relationship between Y and Z (Veitch et al., 2021). Similarly, sensitivity tests can be viewed as evaluations of counterfactuals X(Y = ỹ), in which the label Y is changed but all other causal influences on X are held constant (Kaushik et al., 2020). Features that are spuriously correlated with Y will be identical in the factual X and the counterfactual X(Y = ỹ). A predictor that relies solely on such spurious correlations will be unable to correctly label both factual and counterfactual instances.

A number of approaches have been proposed for learning predictors that pass tests of sensitivity and invariance. Many of these approaches are either explicitly or implicitly motivated by a causal perspective. They can be viewed as ways to incorporate knowledge of the causal structure of the data into the learning objective.

4.1.1 Data Augmentation

To learn predictors that pass tests of invariance and sensitivity, a popular and straightforward approach is data augmentation: Elicit or construct counterfactual instances, and incorporate them into the training data. When the counterfactuals involve perturbations to confounding factors Z, it can help to add a term to the learning objective to explicitly penalize disagreements in
the predictions for counterfactual pairs, for example, |f(X(Z = z)) − f(X(Z = z̃))|, where f is the prediction function (Garg et al., 2019). When perturbations are applied to the label Y, training on label counterfactuals X(Y = ỹ) can improve OOD generalization and reduce noise sensitivity (Kaushik et al., 2019, 2020; Jha et al., 2020).3

3 More broadly, there is a long history of methods that elicit or construct new examples and labels with the goal of improving generalization, e.g., self-training (McClosky et al., 2006; Reichart and Rappoport, 2007), co-training (Steedman et al., 2003), and adversarial perturbations (Ebrahimi et al., 2018). The connection of such methods to causal issues such as spurious correlations had not been explored until recently (Chen et al., 2020; Jin et al., 2021).

Counterfactual examples can be generated in several ways: (1) manual post-editing (e.g., Kaushik et al., 2019; Gardner et al., 2020), (2) heuristic replacement of keywords (e.g., Shekhar et al., 2017; Garg et al., 2019; Feder et al., 2021), and (3) automated text rewriting (e.g., Zmigrod et al., 2019; Riley et al., 2020; Wu et al., 2021; Calderon et al., 2022). Manual editing is typically fluent and accurate but relatively expensive. Keyword-based approaches are appropriate in some cases (for example, when counterfactuals can be obtained by making local substitutions of closed-class words like pronouns), but they cannot guarantee fluency or coverage of all labels and covariates of interest (Antoniak and Mimno, 2021), and are difficult to generalize across languages. Fully generative approaches could potentially combine the fluency and coverage of manual editing with the ease of lexical heuristics.

Counterfactual examples are a powerful resource because they directly address the missing data issues that are inherent to causal inference, as described in § 2. However, in many cases it is difficult for even a fluent human to produce meaningful counterfactuals: Imagine the task of converting a book review into a restaurant review while somehow leaving “everything else” constant (as in Calderon et al., 2022). A related concern is lack of precision in specifying the desired impact of the counterfactual. To revise a text from, say, U.S. to U.K. English, it is unambiguous that “colors” should be replaced with “colours”, but should terms like “congress” be replaced with analogous concepts like “parliament”? This depends on whether we view the semantics of the text as a causal descendant of the locale. If such decisions are left to the annotators’ intuitions, it is difficult to ascertain what robustness guarantees we can get from counterfactual data augmentation. Finally, there is the possibility that counterfactuals will introduce new spurious correlations. For example, when asked to rewrite NLI examples without using negation, annotators (or automated text rewriters) may simply find another shortcut, introducing a new spurious correlation. Keyword substitution approaches may also introduce new spurious correlations if the keyword lexicons are incomplete (Joshi and He, 2021). Automated methods for conditional text rewriting are generally not based on a formal counterfactual analysis of the data generating process (cf. Pearl, 2009), which would require modeling the relationships between various causes and consequences of the text. The resulting counterfactual instances may therefore fail to fully account for spurious correlations and may introduce new spurious correlations.

4.1.2 Distributional Criteria

An alternative to data augmentation is to design new learning algorithms that operate directly on the observed data. In the case of invariance tests, one strategy is to derive distributional properties of invariant predictors, and then ensure that these properties are satisfied by the trained model. Given observations of the potential confounder at training time, the counterfactually invariant predictor will satisfy an independence criterion that can be derived from the causal structure of the data generating process (Veitch et al., 2021). Returning to Example 2, the desideratum is that the predicted diagnosis f(X) should not be affected by the aspects of the writing style that are associated with the hospital Z. This can be formalized as counterfactual invariance to Z: The predictor f should satisfy f(X(z)) = f(X(z′)) for all z, z′. In this case, both Z and Y are causes of the text features X.4

4 This is sometimes called the anticausal setting, because the predictor f : X → Ŷ must reverse the causal direction of the data generating process (Schölkopf et al., 2012).

Using this observation, it can be shown that any counterfactually invariant predictor will satisfy f(X) ⊥⊥ Z | Y, that is, the prediction f(X) is independent of the covariate Z conditioned on the true label Y. In other cases, such as content moderation, the label is an effect of the text, rather than a cause; for a detailed
discussion of this distinction, see Jin et al. (2021). In such cases, it can be shown that a counterfactually invariant predictor will satisfy f(X) ⊥⊥ Z (without conditioning on Y). In this fashion, knowledge of the true causal structure of the problem can be used to derive observed-data signatures of the counterfactual invariance. Such signatures can be incorporated as regularization terms in the training objective (e.g., using kernel-based measures of statistical dependence). These criteria do not guarantee counterfactual invariance (the implication works in the other direction), but in practice they increase counterfactual invariance and improve performance in out-of-distribution settings without requiring counterfactual examples.

An alternative set of distributional criteria can be derived by viewing the training data as arising from a finite set of environments, in which each environment is endowed with a unique distribution over causes, but the causal relationship between X and Y is invariant across environments. This view motivates a set of environmental invariance criteria: The predictor should include a representation function that is invariant across environments (Muandet et al., 2013; Peters et al., 2016); we should induce a representation such that the same predictor is optimal in every environment (Arjovsky et al., 2019); the predictor should be equally well calibrated across environments (Wald et al., 2021). Multi-environment training is conceptually similar to domain adaptation (Ben-David et al., 2010), but here the goal is not to learn a predictor for any specific target domain, but rather to learn a predictor that works well across a set of causally compatible domains, known as domain generalization (Ghifary et al., 2015; Gulrajani and Lopez-Paz, 2020). However, it may be necessary to observe data from a very large number of environments to disentangle the true causal structure (Rosenfeld et al., 2021).

Both general approaches require richer training data than in typical supervised learning: Either explicit labels Z for the factors to disentangle from the predictions or access to data gathered from multiple labeled environments. Obtaining such data may be rather challenging, even compared to creating counterfactual instances. Furthermore, the distributional approaches have thus far been applied only to classification problems, while data augmentation can easily be applied to structured outputs such as machine translation.

4.2 Fairness and Bias

NLP systems inherit and sometimes amplify undesirable biases encoded in text training data (Barocas et al., 2019; Blodgett et al., 2020). Causality can provide a language for specifying desired fairness conditions across demographic attributes like race and gender. Indeed, fairness and bias in predictive models have close connections to causality: Hardt et al. (2016) argue that a causal analysis is required to determine the fairness properties of an observed distribution of data and predictions; Kilbertus et al. (2017) show that fairness metrics can be motivated by causal interpretations of the data generating process; Kusner et al. (2017) study “counterfactually fair” predictors where, for each individual, predictions are the same for that individual and for a counterfactual version of them created by changing a protected attribute. However, there are important questions about the legitimacy of treating attributes like race as variables subject to intervention (e.g., Kohler-Hausmann, 2018; Hanna et al., 2020), and Kilbertus et al. (2017) propose to focus instead on invariance to observable proxies such as names.

Fairness with Text. The fundamental connections between causality and unfair bias have been explored mainly in the context of relatively low-dimensional tabular data rather than text. However, there are several applications of the counterfactual data augmentation strategies from § 4.1.1 in this setting: For example, Garg et al. (2019) construct counterfactuals by swapping lists of “identity terms”, with the goal of reducing bias in text classification, and Zhao et al. (2018) swap gender markers such as pronouns and names for coreference resolution. Counterfactual data augmentation has also been applied to reduce bias in pre-trained models (e.g., Huang et al., 2019; Maudslay et al., 2019), but the extent to which biases in pre-trained models propagate to downstream applications remains unclear (Goldfarb-Tarrant et al., 2021). Fairness applications of the distributional criteria discussed in § 4.1.2 are relatively rare, but Adragna et al. (2020) show that invariant risk minimization (Arjovsky et al., 2019) can reduce the use of spurious correlations with race for toxicity detection.

4.3 Causal Model Interpretations

Explanations of model predictions can be crucial to help diagnose errors and establish trust
with decision makers (Guidotti et al., 2018; Jacovi and Goldberg, 2020). One prominent approach to generating explanations is to exploit network artifacts, such as attention weights (Bahdanau et al., 2014), which are computed on the path to generating a prediction (e.g., Xu et al., 2015; Wang et al., 2016). Alternatively, there have been attempts to estimate simpler and more interpretable models by using perturbations of test examples or their hidden representations (Ribeiro et al., 2016; Lundberg and Lee, 2017; Kim et al., 2018). However, both attention and perturbation-based methods have important limitations. Attention-based explanations can be misleading (Jain and Wallace, 2019), and are generally possible only for individual tokens; they cannot explain predictions in terms of more abstract linguistic concepts. Existing perturbation-based methods often generate implausible counterfactuals and also do not allow for estimating the effect of sentence-level concepts.

Viewed as a causal inference problem, explanation can be performed by comparing predictions for each example and its generated counterfactual. While it is usually not possible to observe counterfactual predictions, here the causal system is the predictor itself. In those cases it may be possible to compute counterfactuals, for example, by manipulating the activations inside the network (Vig et al., 2020; Geiger et al., 2021). Treatment effects can then be computed by comparing the predictions under the factual and counterfactual conditions. Such a controlled setting is similar to the randomized experiment described in § 2, where it is possible to compute the difference between an actual text and what the text would have been had a specific concept not been present in it. Indeed, in cases where counterfactual texts can be generated, we can often estimate causal effects on text-based models (Ribeiro et al., 2020; Gardner et al., 2020; Rosenberg et al., 2021; Ross et al., 2021; Meng et al., 2022; Zhang et al., 2022). However, generating such counterfactuals is challenging (see § 4.1.1).

To overcome the counterfactual generation problem, another class of approaches proposes to manipulate the representation of the text and not the text itself (Feder et al., 2021; Elazar et al., 2021; Ravfogel et al., 2021). Feder et al. (2021) compute the counterfactual representation by pre-training an additional instance of the language representation model employed by the classifier, with an adversarial component designed to “forget” the concept of choice, while controlling for confounding concepts. Ravfogel et al. (2020) offer a method for removing information from representations by iteratively training linear classifiers and projecting the representations on their null-spaces, but do not account for confounding concepts.

A complementary approach is to generate counterfactuals with minimal changes that obtain a different model prediction (Wachter et al., 2017; Mothilal et al., 2020). Such examples allow us to observe the changes required to change a model’s prediction. Causal modeling can facilitate this by making it possible to reason about the causal relationships between observed features, thus identifying minimal actions which might have downstream effects on several features, ultimately resulting in a new prediction (Karimi et al., 2021).

Finally, a causal perspective on attention-based explanations is to view internal nodes as mediators of the causal effect from the input to the output (Vig et al., 2020; Finlayson et al., 2021). By querying models using manually crafted counterfactuals, we can observe how information flows, and identify where in the model it is encoded.

4.4 Future Work

In general we cannot expect to have full causal models of text, so a critical question for future work is how to safely use partial causal models, which omit some causal variables and do not completely specify the causal relationships within the text itself. A particular concern is unobserved confounding between the variables that are explicitly specified in the causal model. Unobserved confounding is challenging for causal inference in general, but it is likely to be ubiquitous in language applications, in which the text arises from the author’s intention to express a structured arrangement of semantic concepts, and the label corresponds to a query, either directly on the intended semantics or on those understood by the reader.

Partial causal models of text can be “top down”, in the sense of representing causal relationships between the text and high-level document metadata such as authorship, or “bottom up”, in the sense of representing local linguistic invariance properties, such as the intuition that a
multiword expression like ‘San Francisco’ has a single cause. The methods described here are almost exclusively based on top-down models, but approaches such as perturbing entity spans (e.g., Longpre et al., 2021) can be justified by implicit bottom-up causal models. Making these connections more explicit may yield new insights. Future work may also explore hybrid models that connect high-level document metadata with medium-scale spans of text such as sentences or paragraphs.

A related issue arises when the true variable of interest is unobserved but we do receive some noisy or coarsened proxy variable. For example, we may wish to enforce invariance to dialect but have access only to geographical information, with which dialect is only approximately correlated. This is an emerging area within the statistical literature (Tchetgen et al., 2020), and despite the clear applicability to NLP, we are aware of no relevant prior work.

Finally, applications of causality to NLP have focused primarily on classification, so it is natural to ask how these approaches might be extended to structured output prediction. This is particularly challenging for distributional criteria like f(X) ⊥⊥ Z | Y, because f(X) and Y may now represent sequences of vectors or tokens. In such cases it may be preferable to focus on invariance criteria that apply to the loss distribution or calibration.

5 Conclusion

Our main goal in this survey was to collect the various touchpoints of causality and NLP into one space, which we then subdivided into the problems of estimating the magnitude of causal effects and more traditional NLP tasks. These branches of scientific inquiry share common goals and intuitions, and are beginning to show methodological synergies. In § 3 we showed how recent advances in NLP modeling can help researchers draw causal conclusions from text data, and discussed the challenges of this process. In § 4, we showed how ideas from causal inference can be used to make NLP models more robust, trustworthy, and transparent. We also gathered approaches that are implicitly causal and made their relationship to causal inference explicit. Both of these spaces, especially the use of causal ideas for robust and explainable predictions, remain nascent, with a large number of open challenges which we have detailed throughout this paper.

A particular advantage of causal methodology is that it forces practitioners to explicate their assumptions. To improve scientific standards, we believe that the NLP community should be clearer about these assumptions and analyze their data using causal reasoning. This could lead to a better understanding of language and the models we build to process it.

References

Robert Adragna, Elliot Creager, David Madras, and Richard Zemel. 2020. Fairness and robustness in invariant learning: A case study in toxicity classification. arXiv preprint arXiv:2011.06485.

Maria Antoniak and David Mimno. 2021. Bad seeds: Evaluating lexical methods for bias measurement. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1889–1904, Online. Association for Computational Linguistics. [Link]/[Link]-long.148

Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. 2019. Invariant risk minimization. arXiv preprint arXiv:1907.02893.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.

Solon Barocas, Moritz Hardt, and Arvind Narayanan. 2019. Fairness and Machine Learning. [Link]. [Link].org.

Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. 2010. A theory of learning from different domains. Machine Learning, 79(1):151–175. [Link]/s10994-009-5152-4

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022.
Su Lin Blodgett, Solon Barocas, Hal Daumé III, and Hanna Wallach. 2020. Language (technology) is power: A critical survey of “bias” in NLP. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5454–5476. [Link].org/10.18653/v1/[Link]-main.485

Nitay Calderon, Eyal Ben-David, Amir Feder, and Roi Reichart. 2022. DoCoGen: Domain counterfactual generation for low resource domain adaptation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL). [Link].org/10.18653/v1/[Link]-long.533

Yining Chen, Colin Wei, Ananya Kumar, and Tengyu Ma. 2020. Self-training avoids using spurious features under domain shift. Advances in Neural Information Processing Systems, 33:21061–21071.

Alexander D’Amour, Peng Ding, Avi Feller, Lihua Lei, and Jasjeet Sekhon. 2020. Overlap in observational studies with high-dimensional covariates. Journal of Econometrics.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2–7, 2019, Volume 1 (Long and Short Papers), pages 4171–4186. Association for Computational Linguistics.

Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. 2018. HotFlip: White-box adversarial examples for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 31–36. https://[Link]/10.18653/v1/P18-2006

Naoki Egami, Christian J. Fong, Justin Grimmer, Margaret E. Roberts, and Brandon M. Stewart. 2018. How to make causal inferences using texts. arXiv preprint arXiv:1802.02163.

Yanai Elazar, Shauli Ravfogel, Alon Jacovi, and Yoav Goldberg. 2021. Amnesic probing: Behavioral explanation with amnesic counterfactuals. Transactions of the Association for Computational Linguistics, 9:160–175. [Link] a 00359

Amir Feder, Nadav Oved, Uri Shalit, and Roi Reichart. 2021. CausaLM: Causal model explanation through counterfactual language models. Computational Linguistics, 47(2):333–386. [Link] a 00404

Matthew Finlayson, Aaron Mueller, Sebastian Gehrmann, Stuart Shieber, Tal Linzen, and Yonatan Belinkov. 2021. Causal analysis of syntactic agreement mechanisms in neural language models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1828–1843, Online. Association for Computational Linguistics. [Link]/[Link]-long.144

Christian Fong and Justin Grimmer. 2016. Discovery of treatments from text corpora. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1600–1609.

Christian Fong and Justin Grimmer. 2021. Causal inference with latent treatments. American Journal of Political Science. Forthcoming.

Matt Gardner, Yoav Artzi, Victoria Basmov, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hannaneh Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, Ally Zhang, and Ben Zhou. 2020. Evaluating models’ local decision boundaries via contrast sets. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1307–1323, Online. Association for Computational Linguistics. [Link].findings-emnlp.117

Sahaj Garg, Vincent Perot, Nicole Limtiaco, Ankur Taly, Ed H. Chi, and Alex Beutel. 2019. Counterfactual fairness in text classification through robustness. In Proceedings of the 2019
AAAI/ACM Conference on AI, Ethics, and Society, pages 219–226. https://doi.org/10.1145/3306618.3317950

Atticus Geiger, Hanson Lu, Thomas Icard, and Christopher Potts. 2021. Causal abstractions of neural networks. Advances in Neural Information Processing Systems, 34.

Alan S. Gerber, Donald P. Green, and Christopher W. Larimer. 2008. Social pressure and voter turnout: Evidence from a large-scale field experiment. American Political Science Review, 102(1):33–48. [Link]/S000305540808009X

Muhammad Ghifary, W. Bastiaan Kleijn, Mengjie Zhang, and David Balduzzi. 2015. Domain generalization for object recognition with multi-task autoencoders. In Proceedings of the IEEE International Conference on Computer Vision, pages 2551–2559. https://doi.org/10.1109/ICCV.2015.293

Seraphina Goldfarb-Tarrant, Rebecca Marchant, Ricardo Muñoz Sánchez, Mugdha Pandya, and Adam Lopez. 2021. Intrinsic bias metrics do not correlate with application bias. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1926–1940, Online. Association for Computational Linguistics. https://doi.org/10.18653/v1/[Link]-long.150

Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. 2018. A survey of methods for explaining black box models. ACM Computing Surveys (CSUR), 51(5):1–42. https://doi.org/10.1145/3236009

Ishaan Gulrajani and David Lopez-Paz. 2020. In search of lost domain generalization. arXiv preprint arXiv:2007.01434.

Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel R. Bowman, and Noah A. Smith. 2018. Annotation artifacts in natural language inference data. Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL). https://doi.org/10.18653/v1/N18-2017

Alex Hanna, Emily Denton, Andrew Smart, and Jamila Smith-Loud. 2020. Towards a critical race methodology in algorithmic fairness. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 501–512. [Link]/3351095.3372826

Moritz Hardt, Eric Price, and Nati Srebro. 2016. Equality of opportunity in supervised learning. Advances in Neural Information Processing Systems, 29:3315–3323.

Miguel A. Hernán. 2016. Does water kill? A call for less casual causal inferences. Annals of Epidemiology, 26(10):674–680. https://doi.org/10.1016/[Link].2016.08.016, PubMed: 27641316

Paul W. Holland. 1986. Statistics and causal inference. Journal of the American Statistical Association, 81(396):945–960. https://doi.org/10.2307/2289069

Zhiting Hu and Li Erran Li. 2021. A causal lens for controllable text generation. Advances in Neural Information Processing Systems, 34.

Po-Sen Huang, Huan Zhang, Ray Jiang, Robert Stanforth, Johannes Welbl, Jack Rae, Vishal Maini, Dani Yogatama, and Pushmeet Kohli. 2019. Reducing sentiment bias in language models via counterfactual evaluation. arXiv preprint arXiv:1911.03064. https://doi.org/10.18653/v1/[Link]-emnlp.7

Guido W. Imbens and Donald B. Rubin. 2015. Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press. [Link]/CBO9781139025751

Alon Jacovi and Yoav Goldberg. 2020. Towards faithfully interpretable NLP systems: How should we define and evaluate faithfulness? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4198–4205. https://doi.org/10.18653/v1/[Link]-main.386

Alon Jacovi, Ana Marasović, Tim Miller, and Yoav Goldberg. 2021. Formalizing trust in artificial intelligence: Prerequisites, causes and goals of human trust in AI. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pages 624–635.
[Link].3445923

Sarthak Jain and Byron C. Wallace. 2019. Attention is not explanation. arXiv preprint arXiv:1902.10186.

Rohan Jha, Charles Lovering, and Ellie Pavlick. 2020. Does data augmentation improve generalization in NLP? arXiv preprint arXiv:2004.15012.

Shagun Jhaver, Sucheta Ghoshal, Amy Bruckman, and Eric Gilbert. 2018. Online harassment and content moderation: The case of blocklists. ACM Transactions on Computer-Human Interaction (TOCHI), 25(2):1–33. [Link]

Zhijing Jin, Julius von Kügelgen, Jingwei Ni, Tejas Vaidhya, Ayush Kaushal, Mrinmaya Sachan, and Bernhard Schoelkopf. 2021. Causal direction of data collection matters: Implications of causal and anticausal learning for NLP. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 9499–9513. https://doi.org/10.18653/v1/[Link]-main.748

Nitish Joshi and He He. 2021. An investigation of the (in)effectiveness of counterfactually augmented data. arXiv preprint arXiv:2107.00753. [Link]/[Link]-long.256

Amir-Hossein Karimi, Bernhard Schölkopf, and Isabel Valera. 2021. Algorithmic recourse: from counterfactual explanations to interventions. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pages 353–362. [Link]/3442188.3445899

Divyansh Kaushik, Eduard Hovy, and Zachary C. Lipton. 2019. Learning the difference that makes a difference with counterfactually-augmented data. arXiv preprint arXiv:1909.12434.

Divyansh Kaushik, Amrith Setlur, Eduard Hovy, and Zachary C. Lipton. 2020. Explaining the efficacy of counterfactually-augmented data. arXiv preprint arXiv:2010.02114.

Katherine Keith, David Jensen, and Brendan O’Connor. 2020. Text and causal inference: A review of using text to remove confounding from causal estimates. In ACL. https://doi.org/10.18653/v1/[Link]-main.474

Chloé Kiddon, Luke Zettlemoyer, and Yejin Choi. 2016. Globally coherent text generation with neural checklist models. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 329–339. [Link]-1032

Niki Kilbertus, Mateo Rojas-Carulla, Giambattista Parascandolo, Moritz Hardt, Dominik Janzing, and Bernhard Schölkopf. 2017. Avoiding discrimination through causal reasoning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 656–666.

Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, and Rory Sayres. 2018. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In International Conference on Machine Learning, pages 2668–2677.

Issa Kohler-Hausmann. 2018. Eddie Murphy and the dangers of counterfactual causal thinking about detecting racial discrimination. Nw. UL Rev., 113:1163. https://doi.org/10.2139/ssrn.3050650

Matt J. Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. 2017. Counterfactual fairness. In Advances in Neural Information Processing Systems, pages 4066–4076.

Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In International Conference on Machine Learning, pages 1188–1196. PMLR.

Xiaodong Liu, Jianfeng Gao, Xiaodong He, Li Deng, Kevin Duh, and Ye-Yi Wang. 2015. Representation learning using multi-task deep neural networks for semantic classification and information retrieval. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 912–921.

Yang Liu and Mirella Lapata. 2018. Learning structured text representations. Transactions of the Association for Computational Linguistics, 6:63–75. [Link]/tacl_a_00005

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.

Lajanugen Logeswaran, Honglak Lee, and Samy Bengio. 2018. Content preserving text generation with attribute controls. Advances in Neural Information Processing Systems, 31.

Shayne Longpre, Kartik Perisetla, Anthony Chen, Nikhil Ramesh, Chris DuBois, and Sameer Singh. 2021. Entity-based knowledge conflicts in question answering. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 7052–7063. [Link].emnlp-main.565

Scott M. Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pages 4765–4774.

Rowan Hall Maudslay, Hila Gonen, Ryan Cotterell, and Simone Teufel. 2019. It’s all in the name: Mitigating gender bias with name-based counterfactual data substitution. arXiv preprint arXiv:1909.00871. https://doi.org/10.18653/v1/D19-1530

David McClosky, Eugene Charniak, and Mark Johnson. 2006. Effective self-training for parsing. In Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pages 152–159. Citeseer. [Link]/1220835.1220855

R. Thomas McCoy, Ellie Pavlick, and Tal Linzen. 2019. Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference. arXiv preprint arXiv:1902.01007. [Link]-1334

Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. 2022. Locating and editing factual knowledge in GPT. arXiv preprint arXiv:2202.05262.

Stephen L. Morgan and Christopher Winship. 2015. Counterfactuals and Causal Inference, Cambridge University Press. https://doi.org/10.1017/CBO9781107587991

Ramaravind K. Mothilal, Amit Sharma, and Chenhao Tan. 2020. Explaining machine learning classifiers through diverse counterfactual explanations. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pages 607–617. https://doi.org/10.1145/3351095.3372850

Reagan Mozer, Luke Miratrix, Aaron Russell Kaufman, and L. Jason Anastasopoulos. 2020. Matching with text data: An experimental evaluation of methods for matching documents and of measuring match quality. Political Analysis, 28(4):445–468. https://doi.org/10.1017/pan.2020.1

Krikamol Muandet, David Balduzzi, and Bernhard Schölkopf. 2013. Domain generalization via invariant feature representation. In International Conference on Machine Learning, pages 10–18.

Aakanksha Naik, Abhilasha Ravichander, Norman Sadeh, Carolyn Rose, and Graham Neubig. 2018. Stress test evaluation for natural language inference. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2340–2353, Santa Fe, New Mexico, USA. Association for Computational Linguistics.

Tong Niu and Mohit Bansal. 2018. Polite dialogue generation without parallel data. Transactions of the Association for Computational Linguistics, 6:373–389. [Link]/tacl_a_00027

Yaakov Ophir, Refael Tikochinski, Christa S. C. Asterhan, Itay Sisso, and Roi Reichart. 2020. Deep neural networks detect suicide risk from textual Facebook posts. Scientific Reports, 10(1):1–10. https://doi.org/10.1038/s41598-020-73917-0, PubMed: 33028921

Silviu Paun, Bob Carpenter, Jon Chamberlain, Dirk Hovy, Udo Kruschwitz, and Massimo Poesio. 2018. Comparing Bayesian models of annotation. Transactions of the Association for Computational Linguistics, 6:571–585. [Link]_a_00040

Ellie Pavlick and Tom Kwiatkowski. 2019. Inherent disagreements in human textual inferences. Transactions of the Association for Computational Linguistics, 7:677–694. https://doi.org/10.1162/tacl_a_00293

Judea Pearl. 1994. A probabilistic calculus of actions, Uncertainty Proceedings 1994, pages 454–462. Elsevier. https://doi.org/10.1016/B978-1-55860-332-5.50062-6

Judea Pearl. 2009. Causality. Cambridge University Press.

J. Peters, P. Bühlmann, and N. Meinshausen. 2016. Causal inference using invariant prediction: identification and confidence intervals. Journal of the Royal Statistical Society-Statistical Methodology-Series B, 78(5):947–1012. [Link]

Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1–6, 2018, Volume 1 (Long Papers), pages 2227–2237. Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-1202

Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger, and Benjamin Van Durme. 2018. Hypothesis only baselines in natural language inference. arXiv preprint arXiv:1805.01042. [Link]/v1/S18-2023

Reid Pryzant, Dallas Card, Dan Jurafsky, Victor Veitch, and Dhanya Sridhar. 2021. Causal effects of linguistic properties. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4095–4109. https://doi.org/10.18653/v1/[Link]-main.323

Reid Pryzant, Youngjoo Chung, and Dan Jurafsky. 2017. Predicting sales from the language of product descriptions. In eCOM@SIGIR.

Reid Pryzant, Kelly Shen, Dan Jurafsky, and Stefan Wagner. 2018. Deconfounded lexicon induction for interpretable social science. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1615–1625. https://doi.org/10.18653/v1/N18-1146

Shauli Ravfogel, Yanai Elazar, Hila Gonen, Michael Twiton, and Yoav Goldberg. 2020. Null it out: Guarding protected attributes by iterative nullspace projection. arXiv preprint arXiv:2004.07667. https://doi.org/10.18653/v1/[Link]-main.647

Shauli Ravfogel, Grusha Prasad, Tal Linzen, and Yoav Goldberg. 2021. Counterfactual interventions reveal the causal effect of relative clause representations on agreement prediction. arXiv preprint arXiv:2105.06965. https://doi.org/10.18653/v1/[Link]-1.15

Roi Reichart and Ari Rappoport. 2007. Self-training for enhancement and domain adaptation of statistical parsers trained on small datasets. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 616–623.

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144. ACM. [Link].2939778

Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, and Sameer Singh. 2020. Beyond accuracy: Behavioral testing of NLP models with CheckList. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4902–4912, Online. Association for Computational Linguistics. [Link]/[Link]-main.442

Parker Riley, Noah Constant, Mandy Guo, Girish Kumar, David Uthus, and Zarana Parekh. 2020. TextSETTR: Label-free text style extraction and tunable targeted restyling. arXiv preprint
arXiv:2010.03802. https://doi.org/10.18653/v1/[Link]-long.293

Margaret E. Roberts, Brandon M. Stewart, and Richard A. Nielsen. 2020. Adjusting for confounding with text matching. American Journal of Political Science, 64(4):887–903. https://doi.org/10.1111/ajps.12526

Margaret E. Roberts, Brandon M. Stewart, Dustin Tingley, Christopher Lucas, Jetson Leder-Luis, Shana Kushner Gadarian, Bethany Albertson, and David G. Rand. 2014. Structural topic models for open-ended survey responses. American Journal of Political Science, 58(4):1064–1082. https://doi.org/10.1111/ajps.12103

Paul R. Rosenbaum. 2007. Interference between units in randomized experiments. Journal of the American Statistical Association, 102(477):191–200. https://doi.org/10.1198/016214506000001112

Daniel Rosenberg, Itai Gat, Amir Feder, and Roi Reichart. 2021. Are VQA systems RAD? Measuring robustness to augmented data with focused interventions. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 61–70. [Link]/v1/[Link]-short.10

Elan Rosenfeld, Pradeep Ravikumar, and Andrej Risteski. 2021. The risks of invariant risk minimization. In International Conference on Learning Representations, volume 9.

Alexis Ross, Tongshuang Wu, Hao Peng, Matthew E. Peters, and Matt Gardner. 2021. Tailor: Generating and perturbing text with semantic controls. arXiv preprint arXiv:2107.07150. [Link]/[Link]-long.228

Donald B. Rubin. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688. https://doi.org/10.1037/h0037350

Donald B. Rubin. 2005. Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, 100(469):322–331. https://doi.org/10.1198/016214504000001880

Jennifer D. Rubin, Lindsay Blackwell, and Terri D. Conley. 2020. Fragile masculinity: Men, gender, and online harassment. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pages 1–14. [Link].3376645

B. Schölkopf, D. Janzing, J. Peters, E. Sgouritsa, K. Zhang, and J. Mooij. 2012. On causal and anticausal learning. In 29th International Conference on Machine Learning (ICML 2012), pages 1255–1262. International Machine Learning Society.

Ravi Shekhar, Sandro Pezzelle, Yauhen Klimovich, Aurélie Herbelot, Moin Nabi, Enver Sangineto, and Raffaella Bernardi. 2017. FOIL it! Find one mismatch between image and language caption. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 255–265, Vancouver, Canada. Association for Computational Linguistics. [Link]/v1/P17-1024

Anders Søgaard. 2013. Semi-supervised learning and domain adaptation in natural language processing. Synthesis Lectures on Human Language Technologies, 6(2):1–103. https://doi.org/10.2200/S00497ED1V01Y201304HLT021

Dhanya Sridhar and Lise Getoor. 2019. Estimating causal effects of tone in online debates. In International Joint Conference on Artificial Intelligence. [Link]/ijcai.2019/259

Mark Steedman, Miles Osborne, Anoop Sarkar, Stephen Clark, Rebecca Hwa, Julia Hockenmaier, Paul Ruhlen, Steven Baker, and Jeremiah Crim. 2003. Bootstrapping statistical parsers from small datasets. In 10th Conference of the European Chapter of the Association for Computational Linguistics. https://doi.org/10.3115/1067807.1067851

Eric J. Tchetgen Tchetgen, Andrew Ying, Yifan Cui, Xu Shi, and Wang Miao. 2020. An introduction to proximal causal learning. arXiv
preprint arXiv:2009.10982. https://doi.org/10.1101/2020.09.21.20198762

Matt Thomas, Bo Pang, and Lillian Lee. 2006. Get out the vote: Determining support or opposition from congressional floor-debate transcripts. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 327–335, Sydney, Australia. Association for Computational Linguistics. [Link].1610122

Victor Veitch, Alexander D’Amour, Steve Yadlowsky, and Jacob Eisenstein. 2021. Counterfactual invariance to spurious correlations: Why and how to pass stress tests. arXiv preprint arXiv:2106.00545.

Victor Veitch, Dhanya Sridhar, and David M. Blei. 2020. Adapting text embeddings for causal inference. In UAI.

Jesse Vig, Sebastian Gehrmann, Yonatan Belinkov, Sharon Qian, Daniel Nevo, Yaron Singer, and Stuart M. Shieber. 2020. Investigating gender bias in language models using causal mediation analysis. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6–12, 2020, virtual.

Sandra Wachter, Brent Mittelstadt, and Chris Russell. 2017. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harvard Journal of Law & Technology, 31:841. https://doi.org/10.2139/ssrn.3063289

Stefan Wager and Susan Athey. 2018. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523):1228–1242. https://doi.org/10.1080/01621459.2017.1319839

Yoav Wald, Amir Feder, Daniel Greenfeld, and Uri Shalit. 2021. On calibration and out-of-domain generalization. arXiv preprint arXiv:2102.10395.

Yequan Wang, Minlie Huang, Xiaoyan Zhu, and Li Zhao. 2016. Attention-based LSTM for aspect-level sentiment classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 606–615, Austin, Texas. Association for Computational Linguistics. https://doi.org/10.18653/v1/D16-1058

Galen Weld, Peter West, Maria Glenski, David Arbour, Ryan Rossi, and Tim Althoff. 2022. Adjusting for confounders with text: Challenges and an empirical evaluation framework for causal inference. ICWSM.

Zach Wood-Doughty, Ilya Shpitser, and Mark Dredze. 2018. Challenges of using text classifiers for causal inference. In EMNLP. [Link]-1488, PubMed: 31633125

Zach Wood-Doughty, Ilya Shpitser, and Mark Dredze. 2021. Generating synthetic text data to evaluate causal inference methods. arXiv preprint arXiv:2102.05638.

Tongshuang Wu, Marco Tulio Ribeiro, Jeffrey Heer, and Daniel S. Weld. 2021. Polyjuice: Automated, general-purpose counterfactual generation. arXiv preprint arXiv:2101.00288.

Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning, pages 2048–2057. PMLR.

Justine Zhang, Sendhil Mullainathan, and Cristian Danescu-Niculescu-Mizil. 2020. Quantifying the causal effects of conversational tendencies. Proceedings of the ACM on Human-Computer Interaction, 4(CSCW2):1–24. https://doi.org/10.1145/3415202

Yi-Fan Zhang, Hanlin Zhang, Zachary C. Lipton, Li Erran Li, and Eric P. Xing. 2022. Can transformers be strong treatment effect estimators? arXiv preprint arXiv:2202.01336.

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. 2017. Men also like shopping: Reducing gender bias amplification using corpus-level constraints. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2979–2989, Copenhagen, Denmark. Association for Computational
Linguistics. [Link]/v1/D17-1323

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. 2018. Gender bias in coreference resolution: Evaluation and debiasing methods. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 15–20, New Orleans, Louisiana. Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-2003

Ran Zmigrod, Sabrina J. Mielke, Hanna Wallach, and Ryan Cotterell. 2019. Counterfactual data augmentation for mitigating gender stereotypes in languages with rich morphology. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1651–1661, Florence, Italy. Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1161