Understanding Residuals in Regression

Uploaded by

22070172

1. Is the residual the error term?

Explain
-Residual: The residual is not the error term, although the two are
often confused. The residual is the difference between the observed
value of the dependent variable in a regression analysis and the
value predicted by the estimated regression model. Mathematically,
the residual $e_i$ for each observation i is calculated as:
$e_i = y_i - \hat{y}_i$
Where:
-$y_i$ is the observed value of the dependent variable for
observation i
-$\hat{y}_i$ is the predicted value of the dependent variable for
observation i based on the regression model.
The residual captures the unexplained variation in the dependent
variable after accounting for the effects of the independent variables
in the regression model. It represents the extent to which the
model's predictions deviate from the actual observed data.
-Error Term: The error term, denoted by μ in the models used here,
represents the unobservable random variability or noise in the
relationship between the dependent variable and the independent
variables. It encompasses all factors that influence the dependent
variable but are not explicitly included in the regression model.
The error term is the deviation of the observed value from the true
(population) regression line, whereas the residual is the deviation
from the estimated regression line.
Unlike the residual, which is calculated from the observed data, the
error term is a theoretical concept that is assumed to follow certain
statistical properties, such as having a mean of zero and being
independently and identically distributed (IID) across observations.
In summary, while the residual and the error term both capture
unexplained variability in the dependent variable, the residual is a
calculated value based on the observed data and the model's
predictions, whereas the error term is a theoretical concept
representing the underlying stochastic process that generates the
observed data.
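The distinction can be illustrated with a small simulation, where the error term is known only because the data are generated artificially. This is a minimal sketch: the population parameters (2 and 0.5), the sample size and the seed are assumptions chosen for illustration, not values from the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical population model: y = 2 + 0.5 * x + u
beta0, beta1 = 2.0, 0.5
n = 50
x = rng.uniform(0, 10, size=n)
u = rng.normal(0, 1, size=n)   # error term: observable here only because we simulate it
y = beta0 + beta1 * x + u

# OLS estimates computed from the sample
dx = x - x.mean()
b1 = np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)
b0 = y.mean() - b1 * x.mean()

# Residuals: deviations from the ESTIMATED line, computed from the data
residuals = y - (b0 + b1 * x)

print("largest |residual - error|:", np.max(np.abs(residuals - u)))
print("sum of residuals:", residuals.sum())
print("sum of errors:   ", u.sum())
```

The residuals differ from the errors because the estimated line differs from the population line, and only the residuals are forced to sum to zero.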
2. Why do the sum and mean of the residuals always equal
zero
The residual is the difference between the observed value and the
value estimated by the regression model. OLS chooses the intercept
and slope to minimize the sum of squared residuals. The first-order
condition with respect to the intercept requires the residuals to
sum to zero, so the positive and negative deviations cancel exactly.
This holds whenever the model includes an intercept. The mean of the
residuals is then also zero, because it is the sum divided by the
number of observations.
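The reasoning can be written as a short derivation. The symbols $b_0$, $b_1$ (candidate values) and $\hat\beta_0$, $\hat\beta_1$ (the OLS solution) are notation assumed here for illustration.

```latex
\begin{aligned}
&\min_{b_0,\, b_1} \; \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \\
&\text{First-order condition for } b_0: \quad
  -2 \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0
  \;\Longrightarrow\; \sum_{i=1}^{n} e_i = 0 \\
&\text{Mean of the residuals:} \quad
  \bar{e} = \frac{1}{n} \sum_{i=1}^{n} e_i = 0
\end{aligned}
```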

3. What happens to the sum and mean of the residual if
we exclude the intercept from the OLS model
-With Intercept (Intercept Included): When the intercept term is
included in the regression model, the fitted line passes through the
point of means of the data (the mean of x and the mean of y). The
first-order condition for the intercept forces the residuals to sum
to zero. Consequently, the mean of the residuals is zero.
-Without Intercept (Intercept Excluded): If the intercept is excluded
from the model, the fitted line is forced through the origin and is
no longer required to pass through the point of means. The only
condition OLS imposes is that the residuals are uncorrelated with the
explanatory variable in the sample (the sum of x times the residual
is zero). As a result, the sum of the residuals does not necessarily
equal zero, and the mean of the residuals is not necessarily zero
either.
In summary, excluding the intercept from the OLS model can lead to
a situation where the sum of the residuals does not equal zero, and
the mean of the residuals may not be zero either. This emphasizes
the importance of considering whether the intercept term should be
included in the regression model based on theoretical and empirical
considerations.
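For the model without an intercept, the same argument leaves only one first-order condition. In the sketch below, $\tilde\beta_1$ and $\tilde e_i$ are notation assumed here for the slope and residuals of the regression through the origin.

```latex
\begin{aligned}
&\tilde\beta_1 = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2},
\qquad \tilde e_i = y_i - \tilde\beta_1 x_i \\
&\text{Only condition imposed:} \quad \sum_{i=1}^{n} x_i \tilde e_i = 0 \\
&\text{Mean residual:} \quad
  \frac{1}{n} \sum_{i=1}^{n} \tilde e_i = \bar{y} - \tilde\beta_1 \bar{x},
  \text{ which is zero only if the line passes through } (\bar{x}, \bar{y})
\end{aligned}
```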
4. What happens to the OLS estimator if the sample is not
randomly selected from a population
If the sample is not randomly selected from the population, the random sampling
assumption fails and the OLS estimator may be biased and inconsistent, so the
usual inference can lead to incorrect conclusions about the relationships between
variables. The bias arises when selection into the sample is related to the error
term, for example when observations are selected on the value of the dependent
variable. Selection based only on the explanatory variable does not by itself
bias the estimates. Samples should therefore be selected randomly or through an
appropriate sampling design.
5. What happens to a simple linear regression model if the
value of the explanatory variable is similar for all
observations
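If the explanatory variable takes the same value for all observations, there is
no sample variation in x. In simple regression the slope estimate is the sample
covariance of x and y divided by the sample variance of x, so with zero variance
the slope cannot be computed: x is perfectly collinear with the constant term and
the matrix X'X is not invertible. If the values are merely very similar, the slope
can be computed, but its variance (the error variance divided by the total sample
variation in x) is very large, so the estimate is imprecise. The remedy is more
variation in the data, for example a larger or more varied sample; transforming
or rescaling the variable does not create variation.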
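A minimal sketch of both cases follows. The helper `slope_and_se` and the small data arrays are hypothetical, made up for illustration; they are not part of the exercise.

```python
import numpy as np

def slope_and_se(x, y):
    """Return the OLS slope and its conventional standard error for y = b0 + b1*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.max() == x.min():
        # No sample variation in x: the denominator of the slope formula is zero
        raise ValueError("x has no sample variation: the slope cannot be computed")
    dx = x - x.mean()
    sst_x = np.sum(dx ** 2)                      # total sample variation in x
    b1 = np.sum(dx * (y - y.mean())) / sst_x
    b0 = y.mean() - b1 * x.mean()
    resid = y - b0 - b1 * x
    sigma2_hat = np.sum(resid ** 2) / (len(x) - 2)
    return b1, np.sqrt(sigma2_hat / sst_x)

# Made-up noise, reused so that only the spread of x differs between the two cases
noise = np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.1])

x_wide = np.arange(1.0, 9.0)                     # values 1..8
y_wide = 1.0 + 2.0 * x_wide + noise

x_narrow = 4.5 + 0.01 * (x_wide - 4.5)           # same pattern, 100 times less spread
y_narrow = 1.0 + 2.0 * x_narrow + noise

print("wide x:  ", slope_and_se(x_wide, y_wide))
print("narrow x:", slope_and_se(x_narrow, y_narrow))
```

With nearly identical x values the slope is still computed, but its standard error is far larger; with exactly identical values the function refuses to return an estimate.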

6. Suppose our model satisfies SLR assumptions 1-4 but
suffers from heteroscedasticity. In this case, are our
estimates biased? What is the consequence of the
heteroscedasticity
Heteroscedasticity does not introduce bias into the coefficient estimates: under
SLR assumptions 1-4, OLS remains unbiased. The consequences are that OLS is no
longer the most efficient linear unbiased estimator and that the usual standard
errors are incorrect, so t statistics, F statistics and confidence intervals are
unreliable (the intervals may be too wide or too narrow). It is essential to
diagnose and address heteroscedasticity to ensure the reliability and validity of
regression results. Common approaches include heteroscedasticity-robust
(heteroscedasticity-consistent) standard errors, weighted least squares, and data
transformations such as taking logarithms.
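Both consequences can be checked with a small Monte Carlo experiment. This is a sketch under assumed values: the true parameters, the form of the error variance (growing with x) and the number of replications are all chosen for illustration. The robust standard error uses the basic White (HC0) formula for simple regression.

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1 = 1.0, 2.0          # hypothetical true parameters
n, reps = 200, 2000

slopes, conv_se, robust_se = [], [], []
for _ in range(reps):
    x = rng.uniform(0, 10, size=n)
    u = 0.1 * x**2 * rng.normal(size=n)   # error variance grows with x
    y = beta0 + beta1 * x + u

    dx = x - x.mean()
    sst_x = np.sum(dx**2)
    b1 = np.sum(dx * y) / sst_x
    b0 = y.mean() - b1 * x.mean()
    e = y - b0 - b1 * x

    slopes.append(b1)
    # Usual formula, valid only under homoscedasticity
    conv_se.append(np.sqrt(np.sum(e**2) / (n - 2) / sst_x))
    # Heteroscedasticity-robust (White, HC0) formula
    robust_se.append(np.sqrt(np.sum(dx**2 * e**2)) / sst_x)

mean_slope = np.mean(slopes)
empirical_sd = np.std(slopes, ddof=1)     # actual sampling variability of the slope
mean_conv_se = np.mean(conv_se)
mean_robust_se = np.mean(robust_se)

print("mean of slope estimates:", mean_slope, "(true value 2.0)")
print("empirical sd of slope:  ", empirical_sd)
print("mean conventional se:   ", mean_conv_se)
print("mean robust se:         ", mean_robust_se)
```

In this design the slope estimates centre on the true value, the conventional standard error understates the actual sampling variability, and the robust standard error tracks it much more closely.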

7. Comment on the statement that a model with a high R-squared
shows a strongly causal relationship
Although a high R-squared indicates strong explanatory power of the model, it does
not by itself establish a causal relationship between the variables. Causal inference
requires careful consideration of confounding variables, omitted variable bias,
endogeneity, and other potential sources of bias in the regression model.
Therefore, caution should be exercised when interpreting R-squared as evidence
of causation, and additional methods such as experimentation, instrumental
variables, or causal inference techniques may be necessary to establish
causality.
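A simulated example makes the point concrete. In this sketch a hypothetical common cause `z` drives both `x` and `y`, and `x` has no effect on `y` at all; every number is an assumption chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
z = rng.normal(size=n)                    # hypothetical common cause (confounder)
x = z + 0.2 * rng.normal(size=n)          # x is driven by z
y = 3.0 * z + 0.5 * rng.normal(size=n)    # y depends on z only: x has NO causal effect

# Simple regression of y on x
dx = x - x.mean()
b1 = np.sum(dx * y) / np.sum(dx**2)
b0 = y.mean() - b1 * x.mean()
e = y - b0 - b1 * x
r2 = 1 - np.sum(e**2) / np.sum((y - y.mean())**2)

# Regression of y on x and z: controlling for the confounder
X = np.column_stack([np.ones(n), x, z])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
coef_x_controlled = coef[1]

print("simple regression: slope on x =", b1, ", R-squared =", r2)
print("controlling for z: coefficient on x =", coef_x_controlled)
```

The simple regression fits very well and reports a large slope, yet the coefficient on `x` is close to zero once `z` is included, so the high R-squared reflects association rather than causation.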

8. Which model violates the assumptions of OLS
$Y = \beta_0 + \beta_1 \frac{1}{X} + \mu$ (1)
$Y = \beta_0 + \frac{1}{\beta_1} X + \mu$ (2)
$Y = \beta_0 + \beta_1^2 X + \mu$ (3)
9. Let Qd denote the quantity of a given product, and let P
denote the price of that product. A simple model is
presented that connects quantity demanded to price:
Qd = β0 + β1P + μ
(i) What possible factors are contained in μ? Is it likely that
these will be related to price
(ii) Will a simple regression analysis show the ceteris paribus
effect of price on quantity demanded? Explain
(i) The term μ is the error term (not the residual) in the simple
linear regression model connecting quantity demanded (Qd) to price
(P). It captures all factors other than price that influence quantity
demanded but are not explicitly accounted for in the model. Possible
factors contained in μ could include:
-Consumer preferences and tastes
-Income levels
-Prices of substitute or complementary goods
-Advertising and marketing efforts
-Seasonal variations
-Economic conditions and macroeconomic factors
-Technological advancements
-Government policies and regulations
-Random shocks and unforeseen events
Several of these factors are likely to be related to price. For
example, sellers tend to charge more for products that are heavily
advertised or currently in fashion, and the prices of substitute and
complementary goods tend to move together with the product's own
price. μ therefore cannot be assumed to be uncorrelated with price.
(ii) In the model Qd = β0 + β1P + μ, the parameter β1 measures the
ceteris paribus effect of price (P) on quantity demanded (Qd), that
is, the effect of price holding all other factors (those in μ)
fixed. A simple regression recovers this effect only if μ is
unrelated to price, in the sense that the average value of μ does
not change with P.
However, as noted in (i), this assumption is unlikely to hold,
because factors omitted from the model, such as advertising, tastes
and the prices of related goods, are correlated with price.
Measurement errors and other unobserved factors can add to the
problem. Therefore, a simple regression analysis is unlikely to show
the ceteris paribus effect of price on quantity demanded: the
estimated slope mixes the effect of price with the effects of the
omitted factors, and the results should be interpreted cautiously.
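The role of the correlation between price and the error term can be made explicit. The derivation below is a standard sketch for the simple regression slope; $Q^d_i$, $P_i$ and $\mu_i$ denote the values for observation i, notation assumed here.

```latex
\begin{aligned}
\hat\beta_1
  &= \frac{\sum_{i} (P_i - \bar{P})\, Q^d_i}{\sum_{i} (P_i - \bar{P})^2}
   = \beta_1 + \frac{\sum_{i} (P_i - \bar{P})\, \mu_i}{\sum_{i} (P_i - \bar{P})^2} \\
\operatorname{plim}\, \hat\beta_1
  &= \beta_1 + \frac{\operatorname{Cov}(P, \mu)}{\operatorname{Var}(P)}
\end{aligned}
```

The second term vanishes only when price is uncorrelated with the factors in μ; otherwise the estimate does not converge to the ceteris paribus effect β1.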
10. The following table contains monthly meat
consumption per household (thousand VND) and
monthly household income per capita
List Meat Income List Meat Income
1 1390 5031 11 1770 4365
2 1320 6491 12 1620 4727
3 2900 4900 13 1460 5067
4 790 3267 14 650 5094
5 1600 5164 15 995 3000
6 2400 3260 16 2900 8208
7 1310 4847 17 1450 3613
8 1690 8395 18 1460 4624
9 1880 6625 19 510 4751
10 1205 2394 20 760 5151
(i) Estimate the relationship between the dependent variable
(meat consumption) and the independent variable (household
income per capita) using an OLS regression model. Comment
on the link between the two variables. What is the meaning of
the intercept and slope coefficients
(ii) How much higher is the level of meat consumption
predicted to be if the monthly income per capita is increased
by 200 thousand VND
(iii) Is it true to say that, given a one million VND increase
in household income per capita, the value of meat consumption
increases by the same amount for all households
(iv) Calculate the fitted values of the dependent variable and
the residuals. Do the sum and mean of the residuals equal
zero? What are the averages of the fitted values and the
observed values of the dependent variable
(v) Please interpret the R-squared. How much of the variation in
meat consumption is unexplained by the regression
(vi) Calculate the standard error of the regression. What is the
unit of the standard error of the regression
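Parts (i) to (vi) could be computed along the following lines. This is a minimal sketch with NumPy using the 20 observations from the table; the variable names are chosen here for illustration, and no numerical results are asserted.

```python
import numpy as np

# Data from the table: monthly meat consumption and income per capita (thousand VND)
meat = np.array([1390, 1320, 2900, 790, 1600, 2400, 1310, 1690, 1880, 1205,
                 1770, 1620, 1460, 650, 995, 2900, 1450, 1460, 510, 760], dtype=float)
income = np.array([5031, 6491, 4900, 3267, 5164, 3260, 4847, 8395, 6625, 2394,
                   4365, 4727, 5067, 5094, 3000, 8208, 3613, 4624, 4751, 5151], dtype=float)
n = len(meat)

# (i) OLS estimates of meat = b0 + b1 * income + residual
dx = income - income.mean()
b1 = np.sum(dx * (meat - meat.mean())) / np.sum(dx**2)
b0 = meat.mean() - b1 * income.mean()

# (ii) predicted change in meat consumption for a 200 thousand VND rise in income
effect_200 = 200 * b1

# (iv) fitted values and residuals
fitted = b0 + b1 * income
resid = meat - fitted

# (v) R-squared and the unexplained share of the variation
ssr = np.sum(resid**2)
sst = np.sum((meat - meat.mean())**2)
r2 = 1 - ssr / sst

# (vi) standard error of the regression, in the units of the dependent variable
ser = np.sqrt(ssr / (n - 2))

print("intercept:", b0, "slope:", b1)
print("effect of +200 thousand VND income:", effect_200)
print("sum of residuals:", resid.sum(), "mean of residuals:", resid.mean())
print("mean fitted:", fitted.mean(), "mean observed:", meat.mean())
print("R-squared:", r2, "unexplained share:", 1 - r2)
print("standard error of the regression (thousand VND):", ser)
```

Because the model is linear, the slope in (iii) is the same for every household: a one million VND (1000 thousand VND) increase in income changes predicted meat consumption by `1000 * b1` regardless of the starting income.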
11. Using a simple linear regression model, a
researcher investigates the dependence of the
monthly wage (in thousand VND) on the number of
years of education among wage workers in Hanoi in
2018
(i) What is the average predicted wage when education
equals zero
(ii) How much does the monthly wage increase if the
number of years of education increases from
(iii) Does this model infer a causal relationship between
wage and education
(iv) What percentage of the variance in wages is
explained by education
Monthly Wage = β0 + β1 × Years of Education + ϵ
Where:
-β0 is the intercept, representing the average predicted
wage when education equals zero.
-β1 is the slope coefficient, indicating how much the
monthly wage increases for each additional year of
education.
-ϵ is the error term, representing the unexplained variance
in monthly wage.

(i) What is the average predicted wage when education
equals zero?
The average predicted wage is 797.1756 thousand VND.
(ii) How much does the monthly wage increase if the number
of years of education increases by one unit?
The predicted monthly wage increases by 513.25 thousand VND for
each additional year of education, so four additional years of
education raise it by 513.25*4 = 2053 thousand VND.
(iii) Does this model infer a causal relationship between wage
and education?
A simple linear regression model does not establish causation on
its own. While it can show an association between the
independent and dependent variables, establishing causation
requires additional evidence, such as an experimental design or
rigorous control for confounding variables.
(iv) What percentage of the variance in wages is explained by
education?
15.07% of the variance in wages is explained by education
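The arithmetic behind these answers can be sketched as follows. The intercept and R-squared are the values reported above. Reading 513.25 as the estimated slope is an assumption based on the answer "513.25*4", and the education levels 12 and 16 are hypothetical examples of a four-year increase.

```python
INTERCEPT = 797.1756   # reported predicted wage at zero years of education (thousand VND)
SLOPE = 513.25         # ASSUMPTION: interpreted as the slope from the answer "513.25*4"
R_SQUARED = 0.1507     # reported share of wage variance explained by education

def predicted_wage(years_of_education):
    """Predicted monthly wage in thousand VND under the assumed coefficients."""
    return INTERCEPT + SLOPE * years_of_education

# Hypothetical four-year increase in education, from 12 to 16 years
increase_4_years = predicted_wage(16) - predicted_wage(12)
unexplained_share = 1 - R_SQUARED

print("predicted wage at zero education:", predicted_wage(0))
print("increase for four more years:", increase_4_years)
print("share of variance not explained:", unexplained_share)
```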
12. A sample of 11 households with their income and
food consumption is given in the table
Income (thousand VND per person per month)    Food consumption (thousand VND per person per month)
3000 995
8208 2900
3613 1450
4624 1460
4751 510
5151 760
5884 1005
2696 100
2485 912
8860 570
1436 512

Using the OLS estimator, estimate the relationship between the
dependent variable (food consumption) and the explanatory
variable (income):
Food = β0 + β1Income + μ
(i) Using the regression result, please report the marginal propensity
to consume food (MPCF)
(iii) What is the MPCF if the regression model excludes the intercept?
Food = β1Income + μ
(iv) Using the result from the model without intercept, calculate the
fitted values of the dependent variable
(v) Does the exclusion of the intercept from the model cause
bias? Explain

(i) The marginal propensity to consume food (MPCF) is equal to the
coefficient of income (β1). From the regression output, the
estimated value is 0.143.
(iii) If we exclude the intercept from the model, the MPCF is 0.205.

(iv) For the model without an intercept, the mean of the fitted
values is 948.2135, compared with a mean observed food consumption
of 1015.818, so the mean residual is 67.6047 rather than zero.



(v) Excluding the intercept biases the slope estimate unless the true
intercept β0 is zero. The intercept absorbs the average level of food
consumption that is not explained by income. When it is wrongly
forced to zero, the regression line must pass through the origin and
the slope adjusts to compensate. Here the MPCF changes from 0.143 to
0.205 and the residuals no longer average zero (the mean residual is
67.6047), which suggests that the intercept is not zero and that the
estimate from the model without an intercept is biased.