Regression Discontinuity Design
Jimmy Toko
0701422116
01/29/2025 Jimmy Toko 070-1422116 1
Regression analysis
• In statistical modeling, regression analysis is a set of statistical
processes for estimating the relationships among variables.
• It helps in analyzing several variables when the focus is on the
relationship between a dependent variable and one or
more independent variables (or 'predictors').
• Regression is a method for studying the relationship between a
response (or dependent) variable Y and covariates (or independent
variables) X.
• The covariate is also called a predictor variable or explanatory variable.
• A regression model provides the user with a functional
relationship between the response variable and explanatory
variables that allows the user to determine which of the
explanatory variables have an effect on the response.
Definition (Example 1)
• Financial officers must predict future cash flows based on specified values
of interest rates, raw material costs, salary increases, and so on.
• More specifically, regression analysis helps one understand how the
typical value of the dependent variable (or 'criterion variable') changes
when any one of the independent variables is varied, while the other
independent variables are held fixed.
• Regression is a statistical technique used in finance, investing, and other
disciplines, including the evaluation of interventions, that attempts to
determine the strength of the relationship between one dependent variable
(usually denoted by Y) and a series of other changing variables (known as
independent variables).
• The two basic types of regression are linear regression and multiple
linear regression, although there are non-linear regression methods
for more complicated data and analysis.
• Linear regression uses one independent variable to explain or predict
the outcome of the dependent variable Y, while multiple linear regression
uses two or more independent variables to predict the outcome.
Regression models involve the following parameters and variables:
• The unknown parameters, denoted as β
• The independent variable X
• The dependent variable Y
• A regression model relates Y to a function of X and β:
Y = f(X, β)
Linear Regression Analysis—cont…
Linear regression is a technique used to measure relationships
between two or more continuous variables.
1. Simple Linear Regression
Is used to measure the relationship between two continuous
variables (one IV and one DV).
In the regression context, Y = a + bX, where Y is the value of the
DV for a given value of X (the IV), a is a constant (the intercept)
and b is the regression coefficient (the slope).
Simple Linear Regression Model
• In simple linear regression, we attempt to model the
relationship between two variables, for example:
• income and number of years of education, height and weight of
people, length and width of envelopes, temperature and output
of an industrial process, altitude and boiling point of water, or
dose of a drug and response.
2. Multiple Linear Regression
Is used to determine the relationships between many IVs and
a given DV.
It can help to find out the individual as well as the
combined effect of the IVs on the DV.
• Linear Regression: Y = a + bX + u
• Multiple Regression: Y = a + b1X1 + b2X2 + b3X3 + ... + btXt + u
• Where:
• Y = the variable that you are trying to predict (dependent variable)
• X = the variable that you are using to predict Y (independent variable)
• a = the intercept
• b = the slope
• u = the regression residual
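As a rough illustration of the simple case Y = a + bX + u, the intercept a and slope b can be estimated by ordinary least squares. The sketch below uses only the Python standard library; the function name fit_simple_ols and the education/income figures are made up for illustration and are not part of the slides.

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Return (a, b) that minimize the sum of squared residuals for Y = a + bX + u."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("X must take at least two distinct values")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

# Hypothetical data: years of education (X) and income (Y)
education = [8, 10, 12, 14, 16]
income = [20.0, 24.0, 31.0, 33.0, 42.0]

a, b = fit_simple_ols(education, income)
# u: the part of Y that the fitted line does not explain
residuals = [yi - (a + b * xi) for xi, yi in zip(education, income)]
print(f"a = {a:.2f}, b = {b:.2f}")
```

With an intercept in the model, the least-squares residuals sum to zero, which gives a quick sanity check on the fit.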
Introduction
• Social programs often use an index to decide who is eligible to
enroll in the program and who is not.
• For example, antipoverty programs are typically targeted to poor
households, which are identified by a poverty score or index.
• The poverty score can be based on a proxy means formula that
measures a set of basic household assets.
• Households with low scores are classified as poor, and households
with higher scores are considered relatively well-off.
Linear Regression Analysis
3. Simple Non-Linear Regression
Is used to correlate one IV and one DV where the relationship between the two
variables is not linear.
4. Multiple Non-Linear Regression
Is an extension of the simple non-linear regression model.
It is used when non-linear relationships between many IVs and the DV are being
investigated.
Regressions are hampered by:
Restricted range.
Extreme scores (outliers).
Over-generalization.
• The program authorities typically determine a threshold or cut-off
score, below which households are deemed poor and are eligible for
the program.
• Pension programs are another example of a type of program that
targets units based on an eligibility index.
• Age constitutes a continuous index, and the retirement age
constitutes the cut-off that determines eligibility.
• In other words, only people above a certain age are eligible to
receive the pension.
• A third example of a continuous eligibility index would be test
scores.
Key Concept:
• Regression discontinuity design (RDD) is appropriate for programs
that use a continuous index to rank potential participants and that
have a cut-off point along the index that determines whether or
not potential participants receive the program or intervention.
• The regression discontinuity design (RDD) is an impact
evaluation method that can be used for programs that have a
continuous eligibility index with a clearly defined cut-off score to
determine who is eligible and who is not.
• To apply a regression discontinuity design, two main conditions
are needed:
• A continuous eligibility index, in other words, a continuous
measure on which the population of interest can be ranked, such
as a poverty index, a test score, or age.
• A clearly defined cut-off score, that is, a point on the index above
or below which the population is classified as eligible for the
program.
For example,
• Households with a poverty index score less than 50 out of 100
might be classified as poor, individuals age 67 and older might
be classified as pensioners, and students with a test score of 90
or more out of 100 might be eligible for a scholarship.
• The cut-off scores in these examples are 50, 67, and 90,
respectively.
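The three eligibility rules above differ in which side of the cut-off is eligible and in whether the cut-off value itself counts. A minimal sketch of such a rule follows; the function name is_eligible and its parameters are hypothetical and chosen only for illustration.

```python
def is_eligible(score, cutoff, eligible_side, inclusive):
    """Apply a cut-off rule to a continuous eligibility index.

    eligible_side: "below" or "above" (which side of the cut-off is eligible).
    inclusive: whether a score exactly equal to the cut-off is eligible.
    """
    if eligible_side == "below":
        return score <= cutoff if inclusive else score < cutoff
    if eligible_side == "above":
        return score >= cutoff if inclusive else score > cutoff
    raise ValueError("eligible_side must be 'below' or 'above'")

# The three examples from the text
poor = is_eligible(49, cutoff=50, eligible_side="below", inclusive=False)       # poverty index < 50
pensioner = is_eligible(67, cutoff=67, eligible_side="above", inclusive=True)   # age 67 and older
scholarship = is_eligible(89, cutoff=90, eligible_side="above", inclusive=True) # test score >= 90
print(poor, pensioner, scholarship)
```

Stating the treatment of the boundary value explicitly matters in practice, because units sitting exactly on the cut-off are the ones an RDD comparison leans on most.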
• Once the program rolls out and subsidizes the cost of
fertilizer for small and medium farms, the program
evaluators could use a regression discontinuity method
to evaluate its impact.
• The regression discontinuity method measures the difference in
post-intervention outcomes, such as total rice yields,
between the units near the eligibility cut-off.
• The farms that were just too large to enroll in the program
constitute the comparison group and generate an estimate of the
counterfactual outcome for those farms in the treatment group that
were just small enough to enroll.
• Given that these two groups of farms were very similar at baseline
and are exposed to the same set of external factors over time (such
as weather, price shocks, local and national agricultural policies, and
so on), the only plausible reason for different outcomes in the
post-intervention period is the program itself.
• The regression discontinuity method allows us to successfully
estimate the impact of a program without excluding any
eligible population.
• However, note that the estimated impact is only valid in the
neighbourhood around the eligibility cut-off score.
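One common way to write down what "valid only in the neighbourhood of the cut-off" means is the sharp RDD estimand. The notation here is an assumption added for illustration and does not appear in the slides: X is the eligibility index, c the cut-off, Y the post-intervention outcome, and units with X at or below c are treated (as with the small farms above).

```latex
\tau_{\text{RDD}}
  = \lim_{x \uparrow c} E[\,Y \mid X = x\,]
  - \lim_{x \downarrow c} E[\,Y \mid X = x\,]
```

The first limit is the expected outcome of treated units as their index approaches the cut-off from below, and the second is the expected outcome of comparison units approaching it from above. Reading the difference as the program's impact at X = c relies on the assumption that, without the program, expected outcomes would change smoothly across the cut-off.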
Example
• Assume that we are trying to evaluate the impact of a cash
transfer program on the daily food expenditures of poor
households.
• Also assume that we can use a poverty index, which takes
observations of a household’s assets and summarizes them into
a score between 0 and 100 that is used to rank households from
the poorest to the richest.
• At baseline, we would expect the poorer households to
spend less on food, on average, than the richer ones.
• The figure below presents a possible relationship between the
poverty index and daily household expenditures on food (the
outcome).
• Now assume that the program targets only poor households,
which are determined to be those with a score of 50 or below.
• In other words, the poverty index can be used to determine
eligibility: treatment will be offered only to households with a
score of 50 or less. Households with a score above 50 are
ineligible.
• The RDD strategy exploits the discontinuity around the cut-off
score to estimate the counterfactual.
• Intuitively, eligible households with scores just below the
cut-off (50 and just below) will be very similar to households with
a score just above the cut-off (for example, those scoring 51).
• On the continuous poverty index, the program has decided on
one particular point (50) at which there is a sudden change, or
discontinuity, in eligibility for the program.
• Since the households just above the cut-off score of 50 are
similar to the ones that are just below it, except that they do not
receive the cash transfers, the households just above can be
used as a comparison group for the households just below.
• In other words, households ineligible for the program but close
enough to the cut-off will be used as a comparison group to
estimate the counterfactual (what would have happened to
the group of eligible households in the absence of the
program).
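The comparison described above can be sketched as a difference in mean outcomes between households just at or below the cut-off and households just above it. This is the simplest possible version, not a full RDD estimator; the function name local_mean_difference, the bandwidth of 5 points, and the household data are all made up for illustration.

```python
from statistics import mean

def local_mean_difference(records, cutoff, bandwidth):
    """Mean outcome of treated units minus mean outcome of comparison units,
    using only units whose score lies within `bandwidth` of the cut-off.

    records: iterable of (score, outcome); units with score <= cutoff are treated.
    """
    treated = [y for s, y in records if cutoff - bandwidth <= s <= cutoff]
    comparison = [y for s, y in records if cutoff < s <= cutoff + bandwidth]
    if not treated or not comparison:
        raise ValueError("no observations on one side of the cut-off within the bandwidth")
    return mean(treated) - mean(comparison)

# Hypothetical households: (poverty index, daily food expenditure after the program)
households = [
    (40, 15.0), (46, 18.0), (48, 19.0), (50, 20.0),   # eligible (score <= 50)
    (51, 16.0), (53, 17.0), (55, 18.0), (62, 22.0),   # ineligible (score > 50)
]

estimate = local_mean_difference(households, cutoff=50, bandwidth=5)
print(estimate)  # 2.0 with these made-up numbers
```

Because expenditure tends to rise with the poverty index even inside a narrow window, a plain difference in means can misstate the jump at the cut-off; fitting a line on each side of the cut-off is a common refinement.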
[Figure: Regression Discontinuity Trial. Left panel: "With No Treatment Effects". Right panel: "With an Effective Treatment", with the intervention effect marked. Horizontal axis: Assignment Variable Score (36 to 62). Vertical axis: Post Test Scores (36 to 62). Groups labelled Control and Intervention.]
Limitations and Interpretation of the Regression Discontinuity
Design Method
• Regression discontinuity design estimates local average impacts around
the eligibility cut-off at the point where treatment and comparison units
are most similar.
• As we get closer to the cut-off, the units that are to the left and right of it
will look more similar.
• In fact, when we get extremely close to the cut-off score, the units on the
left and right of the line will be so similar that our comparison will be as
good as if we had chosen the treatment and comparison groups using
randomized assignment of the treatment.
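A local impact at the cut-off is often estimated by fitting a separate straight line on each side and comparing the two fitted values at the cut-off itself. The sketch below is one simple way to do that, assuming units at or below the cut-off are treated; the names fit_line and rdd_local_linear, the bandwidth, and the simulated data (with a jump of 3 built in) are illustrative assumptions, not taken from the slides.

```python
from statistics import mean

def fit_line(points):
    """Least-squares (intercept, slope) for a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct scores on each side")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sxx
    return y_bar - slope * x_bar, slope

def rdd_local_linear(records, cutoff, bandwidth):
    """Estimated jump in the outcome at the cut-off (treated minus comparison).

    Scores are centred on the cut-off, so each fitted intercept is the
    predicted outcome exactly at the cut-off for that side.
    """
    treated = [(s - cutoff, y) for s, y in records if cutoff - bandwidth <= s <= cutoff]
    comparison = [(s - cutoff, y) for s, y in records if cutoff < s <= cutoff + bandwidth]
    at_cutoff_treated, _ = fit_line(treated)
    at_cutoff_comparison, _ = fit_line(comparison)
    return at_cutoff_treated - at_cutoff_comparison

# Simulated data: outcome rises by 0.2 per index point, and treatment adds 3
scores = [44, 46, 48, 50, 51, 53, 55, 57]
records = [(s, 10 + 0.2 * s + (3 if s <= 50 else 0)) for s in scores]

impact = rdd_local_linear(records, cutoff=50, bandwidth=8)
print(round(impact, 6))
```

Because the simulated outcome is exactly linear on each side, the estimate recovers the built-in jump; with real data the result depends on the bandwidth and on how well a straight line describes the outcome near the cut-off.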
• Because the RDD method estimates the impact of the program
around the cut-off score, or locally, the estimate cannot
necessarily be generalized to units whose scores are further
away from the cut-off score, that is, to units for which eligible
and ineligible individuals may not be as similar.
• The fact that the RDD method produces local average treatment
effects also raises challenges in terms of the statistical power of
the analysis.
• Since effects are estimated only around the cut-off score, fewer
observations can be used than in other methods that would
include all units.
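The loss of observations can be made concrete by counting how many units fall inside a window around the cut-off as the window narrows. The sketch below assumes a hypothetical population with one household at every integer score from 0 to 100; the function name observations_in_window and the bandwidths are illustrative.

```python
def observations_in_window(scores, cutoff, bandwidth):
    """Number of units whose score lies within `bandwidth` of the cut-off."""
    return sum(1 for s in scores if abs(s - cutoff) <= bandwidth)

# Hypothetical population: one household at each integer score from 0 to 100
scores = list(range(0, 101))

counts = {h: observations_in_window(scores, cutoff=50, bandwidth=h) for h in (2, 5, 10, 50)}
for h, n in counts.items():
    print(f"bandwidth {h:>2}: {n} of {len(scores)} households used")
```

A narrower window keeps treated and comparison units more alike but leaves fewer observations, so the estimate becomes noisier; a wider window does the opposite.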
• Even with these limitations, regression discontinuity design yields
unbiased estimates of the impact in the vicinity of the eligibility
cut-off.
• The regression discontinuity strategy takes advantage of the
program assignment rules, using continuous eligibility indexes,
which are already common in many social programs.
• When index-based targeting rules are applied, it is not necessary to
exclude a group of eligible households or individuals from receiving
the treatment for the sake of the evaluation because regression
discontinuity design can be used instead.