BBA-4 UNIT-2
(STATISTICS ANALYSIS)
By- Prof. Saloni Rastogi
Introduction to Linear Regression
Linear regression is one of the most important statistical tools used
to study the relationship between two variables. It helps us
understand how the value of one variable changes when the value of
another variable changes. In business, economics, accounting, and
social sciences, linear regression is widely used for forecasting,
planning, and decision-making.
For example, it can be used to estimate sales based on advertising
expenditure or predict profit based on production levels. The
relationship between variables in linear regression is assumed to be
linear, meaning it can be represented by a straight line.
Example: If we know that advertising and sales are closely
correlated, we can find the expected amount of sales for a given
advertising expenditure, or the expenditure required to attain a
given amount of sales.
Similarly, if we know that the yield of rice and the amount of
rainfall are closely related, we can find the amount of rain required
to achieve a certain production figure.
Meaning of Linear Regression
Linear regression refers to a method of measuring the average
relationship between a dependent variable and an independent
variable. The dependent variable is the variable whose value we
want to predict or explain, while the independent variable is the
factor that influences it. The term “linear” indicates that the
relationship between the variables can be shown by a straight
line on a graph. By using linear regression, we can find the best-
fitting line that explains how the dependent variable changes
with respect to the independent variable.
Linear regression can be defined as a statistical technique used
to determine the functional relationship between two variables
by fitting a straight line to the observed data. According to
statistics, linear regression shows the degree and direction of
relationship between variables and helps in predicting the value
of one variable from another.
In simple words, linear regression is a method that explains and
predicts one variable on the basis of another variable using a
straight-line equation.
Definition of Linear Regression
Linear regression is a method of estimating the value of a
dependent variable on the basis of the value of an independent
variable by fitting a straight-line equation to the given data.
OR
Linear regression is a statistical tool that establishes a linear
relationship between two variables and expresses it in the form of
a straight-line equation.
Explanation of Linear Regression
In linear regression, the relationship between variables is
expressed through a regression equation of the form:
𝑌 = 𝑎 + 𝑏𝑋
Where:
• Y = Dependent variable
• X = Independent variable
• a = Intercept (value of Y when X = 0)
• b = Regression coefficient (slope of the line)
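The constants a and b are usually obtained by the least squares method, which is the method named in the practice questions of this unit. The sketch below shows one way to compute them. The function name fit_line and the advertising/sales figures are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope b = sum of cross-deviations / sum of squared deviations of X
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # The fitted line passes through the point of means, which gives a
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data (not from these notes): advertising spend X, sales Y
advertising = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

a, b = fit_line(advertising, sales)
print(f"Y = {a:.2f} + {b:.2f}X")

# Estimate Y for a new value of X by substituting into the equation
predicted = a + b * 6
print(f"Estimated Y when X = 6: {predicted:.2f}")
```

Here a is the estimated value of Y when X = 0, and b is the change in Y for a one-unit change in X.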
Uses of Regression Analysis
Regression analysis is a powerful statistical tool used to study
the relationship between variables and to make predictions. It
helps in understanding how one variable depends on one or
more other variables. The following are the major uses of
regression analysis:
1. Prediction and Forecasting
One of the most important uses of regression analysis is to
predict future values. By establishing a relationship between
dependent and independent variables, regression helps in
forecasting sales, profits, demand, costs, population growth, and
economic trends. For example, future sales can be predicted
based on past sales and advertising expenses.
2. Measuring the Relationship Between Variables
Regression analysis helps in measuring the nature and extent of
the relationship between variables. It shows how much the
dependent variable changes when the independent variable
changes by one unit. This helps in understanding whether the
relationship is strong or weak, positive or negative.
3. Business Planning and Decision Making
In business, regression analysis is widely used for planning
and decision-making. Managers use it to analyze the impact
of factors like price, income, advertising, and production on
sales and profits. It helps in selecting the best course of
action and in minimizing risks.
4. Cost and Profit Estimation
Regression analysis is used to estimate cost and profit
behavior. It helps in determining how costs change with
changes in output levels. This is useful in budgeting, cost
control, and profit planning. For example, regression can be
used to separate fixed and variable costs.
5. Economic and Financial Analysis
Economists use regression analysis to study relationships
between economic variables such as income and
consumption, price and demand, interest rates and
investment, etc. In finance, it is used to analyze risk, return,
and the effect of various factors on financial performance.
6. Policy Formulation and Evaluation
Government agencies and policy makers use regression
analysis to formulate and evaluate policies. It helps in
assessing the impact of policies related to taxation,
employment, education, inflation, and public expenditure
by studying their effect on economic indicators.
7. Control and Performance Evaluation
Regression analysis helps in controlling business
operations by comparing actual performance with expected
performance. Deviations can be identified and corrective
actions can be taken. It is also used to evaluate employee
performance, productivity, and efficiency.
8. Scientific and Social Research
In scientific and social research, regression analysis is used
to test hypotheses and study cause-and-effect
relationships. Researchers use it to analyze data in fields
like psychology, sociology, education, and health sciences.
9. Filling Missing Values
Regression analysis can be used to estimate missing or
unknown values in a data series. When data for one variable
is missing, it can be estimated with the help of related
variables.
10. Risk Analysis
Regression analysis helps in identifying and analyzing risk
by studying how different factors affect outcomes. In
finance, it is used to measure market risk, credit risk, and
investment risk.
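As an illustration of use 4 (Cost and Profit Estimation), the sketch below regresses total cost on output. Under the assumption that cost is linear in output, the intercept can be read as the fixed cost and the slope as the variable cost per unit. The output and cost figures are hypothetical, as are the variable names.

```python
# Hypothetical monthly figures (not from these notes)
units_produced = [100, 200, 300, 400]    # X: output level
total_cost = [1500, 2000, 2500, 3000]    # Y: total cost

n = len(units_produced)
mean_x = sum(units_produced) / n
mean_y = sum(total_cost) / n

# Least squares slope and intercept of total cost on output
sxy = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(units_produced, total_cost))
sxx = sum((x - mean_x) ** 2 for x in units_produced)

variable_cost_per_unit = sxy / sxx                     # slope b
fixed_cost = mean_y - variable_cost_per_unit * mean_x  # intercept a

print(f"Fixed cost: {fixed_cost:.2f}")
print(f"Variable cost per unit: {variable_cost_per_unit:.2f}")
```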
Regression Lines: Important Points
➢ When there is perfect positive or perfect negative
correlation between the two variables (r = +1 or r = −1), the
regression lines coincide, i.e., we will have only one line.
➢ The farther the two regression lines are from each other, the
lower is the degree of correlation.
➢ The nearer the two regression lines are to each other, the higher
is the degree of correlation.
➢ If the variables are independent, then r = 0 and the lines of
regression are at right angles, i.e., parallel to OX and OY.
➢ It should be noted that the regression lines cut each other
at the point of averages of X and Y. That is, if a perpendicular
is drawn to the X-axis from the point where the two regression
lines intersect, it meets the axis at the mean value of X. If a
horizontal line is drawn from that point to the Y-axis, it meets
the axis at the mean value of Y.
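One way to see why these points hold is to write both regression lines in deviation form. This is a standard rewriting of Y = a + bX and is assumed here rather than taken from the notes. The symbols X̄ and Ȳ denote the means, and σx and σy the standard deviations, of X and Y.

```latex
\begin{aligned}
\text{Y on X:}\quad & Y - \bar{Y} = b_{yx}\,(X - \bar{X}), & b_{yx} &= r\,\frac{\sigma_y}{\sigma_x} \\
\text{X on Y:}\quad & X - \bar{X} = b_{xy}\,(Y - \bar{Y}), & b_{xy} &= r\,\frac{\sigma_x}{\sigma_y}
\end{aligned}
```

Putting X = X̄ and Y = Ȳ makes both sides of each equation zero, so both lines pass through the point of means. When r = 0, the equations reduce to Y = Ȳ and X = X̄, which are a horizontal and a vertical line. When r = ±1, the product of the two coefficients is 1, so the second equation can be rearranged into the first and the two lines coincide.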
Example – Regression Analysis
Height Data (in inches)
Height of fathers (X): 65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71
Height of sons (Y): 68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70
Regression Equations
The two regression equations corresponding to these variables are:
(i) Regression equation of X on Y: 𝑋 = −3.38 + 1.036𝑌
(ii) Regression equation of Y on X: 𝑌 = 35.82 + 0.476𝑋
Finding Values Using Regression Equations
By assuming any value of Y, we can find the corresponding value of X from
equation (i).
• If Y = 65, 𝑋 = −3.38 + 1.036 × 65 = 63.96
• If Y = 70, 𝑋 = −3.38 + 1.036 × 70 = 69.14
We can plot these points on the graph and obtain the regression line of X on Y.
Similarly, by assigning any value to X in equation (ii), we can obtain the
corresponding value of Y.
• If X = 63, 𝑌 = 35.82 + 0.476 × 63 = 65.808 ≈ 65.81
• If X = 70, 𝑌 = 35.82 + 0.476 × 70 = 69.14
Conclusion
Thus, by calculating corresponding values and plotting them, we can draw the
two regression lines and study the relationship between the heights of fathers
and sons.
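The two equations quoted in this example can be checked against the height data with a short least squares calculation. The sketch below is one possible way to do it. The helper name regress is hypothetical, and the printed results should agree with equations (i) and (ii) up to rounding.

```python
fathers = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]  # X
sons = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]     # Y


def regress(dep, indep):
    """Least squares (intercept, slope) for dep = a + b * indep."""
    n = len(dep)
    mean_d = sum(dep) / n
    mean_i = sum(indep) / n
    s_cross = sum((i - mean_i) * (d - mean_d) for i, d in zip(indep, dep))
    s_indep = sum((i - mean_i) ** 2 for i in indep)
    slope = s_cross / s_indep
    return mean_d - slope * mean_i, slope


a_yx, b_yx = regress(sons, fathers)   # regression of Y on X
a_xy, b_xy = regress(fathers, sons)   # regression of X on Y

print(f"Y on X: Y = {a_yx:.2f} + {b_yx:.3f}X")
print(f"X on Y: X = {a_xy:.2f} + {b_xy:.3f}Y")
```

Note that the two lines are fitted separately. The regression of X on Y is not obtained by simply rearranging the regression of Y on X.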
Properties of Regression Coefficients
Regression coefficients are the constants 𝑏𝑥𝑦 (the regression
coefficient of X on Y) and 𝑏𝑦𝑥 (the regression coefficient of Y on X)
that appear as the slopes in the regression equations. They show the
extent and direction of change in one variable for a unit change in
the other variable.
1. Regression Coefficients Have the Same Sign
Both regression coefficients 𝑏𝑥𝑦 and 𝑏𝑦𝑥 always have the same sign.
If the correlation between X and Y is positive, both regression
coefficients are positive. If the correlation is negative, both
regression coefficients are negative.
Reason: Each regression coefficient equals the correlation
coefficient (r) multiplied by a ratio of standard deviations, which
is always positive, so r determines the sign.
2. Regression Coefficients Are Independent of Origin but Not of
Scale
Independent of origin: Changing the starting point (origin) of data
does not affect regression coefficients.
Not independent of scale: Changing the unit of measurement
(scale) affects regression coefficients.
Example: Changing values from rupees to paise will change the
regression coefficients.
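This property can be checked numerically. The sketch below reuses the father and son height data from the earlier example. It subtracts arbitrary constants from X and Y (a change of origin) and then multiplies X by 100 (a change of scale, like converting rupees to paise). The helper name slope and the chosen constants are assumptions made for illustration.

```python
def slope(dep, indep):
    """Least squares regression coefficient of dep on indep."""
    n = len(dep)
    mean_d = sum(dep) / n
    mean_i = sum(indep) / n
    s_cross = sum((i - mean_i) * (d - mean_d) for i, d in zip(indep, dep))
    s_indep = sum((i - mean_i) ** 2 for i in indep)
    return s_cross / s_indep


x = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]  # fathers
y = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]  # sons

b_yx = slope(y, x)

# Change of origin: subtract constants from both series
b_shifted = slope([v - 60 for v in y], [v - 65 for v in x])

# Change of scale: express X in a unit 100 times smaller
b_scaled = slope(y, [v * 100 for v in x])

print(f"original b_yx:       {b_yx:.5f}")
print(f"after origin change: {b_shifted:.5f}")   # same as the original
print(f"after scale change:  {b_scaled:.5f}")    # original divided by 100
```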
3. Product of Regression Coefficients Equals Square of
Correlation Coefficient
𝑏𝑥𝑦 × 𝑏𝑦𝑥 = 𝑟²
I. This shows a direct relationship between regression and
correlation.
II. Since 𝑟² is always non-negative, the product of regression
coefficients is also non-negative.
4. If One Regression Coefficient Is Greater Than 1, the Other
Must Be Less Than 1
I. Both regression coefficients cannot exceed 1 in absolute value
simultaneously.
II. If one coefficient is numerically greater than unity, the other
must be numerically less than unity.
III. This happens because their product is equal to 𝑟², which is
≤ 1.
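Properties 3 and 4 can be checked on the father and son height data from the earlier example. The sketch below computes both regression coefficients and the correlation coefficient from the same sums of deviations. The variable names are hypothetical.

```python
import math

x = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]  # fathers
y = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]  # sons

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squared deviations and of cross-deviations
sxx = sum((v - mean_x) ** 2 for v in x)
syy = sum((v - mean_y) ** 2 for v in y)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

b_yx = sxy / sxx                 # regression coefficient of Y on X
b_xy = sxy / syy                 # regression coefficient of X on Y
r = sxy / math.sqrt(sxx * syy)   # correlation coefficient

print(f"b_yx * b_xy = {b_yx * b_xy:.4f}")
print(f"r squared   = {r ** 2:.4f}")
print(f"b_xy = {b_xy:.3f} exceeds 1, b_yx = {b_yx:.3f} is below 1")
```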
5. Regression Coefficients Can Take Any Real Value
Regression coefficients are not restricted between −1 and +1 like
the correlation coefficient. They can take any real value, such as
positive, negative, zero, or values greater than one. This is
because regression coefficients depend on the units and scale of
measurement.
Hence, regression coefficients only show the rate of change
between variables.
6. Regression Coefficients Are Zero When Correlation Is Zero
When the correlation coefficient 𝑟 = 0, it indicates that there is no
linear relationship between the variables.
Regression coefficients are directly dependent on the correlation
coefficient.
From the formulas 𝑏𝑦𝑥 = 𝑟 · 𝜎𝑦/𝜎𝑥 and 𝑏𝑥𝑦 = 𝑟 · 𝜎𝑥/𝜎𝑦, both coefficients
become zero when 𝑟 = 0.
Hence, a change in one variable is not accompanied by any systematic
linear change in the other.
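The following sketch applies these formulas to a small hypothetical data set that was constructed so that r is exactly zero. The data and the function name correlation are assumptions made for illustration. The standard deviations come from Python's statistics.pstdev.

```python
from statistics import pstdev


def correlation(xs, ys):
    """Correlation coefficient r = covariance / (sigma_x * sigma_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    return cov / (pstdev(xs) * pstdev(ys))


# Hypothetical data: Y rises and falls symmetrically, with no linear trend
x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 1, 3]

r = correlation(x, y)
b_yx = r * pstdev(y) / pstdev(x)   # regression coefficient of Y on X
b_xy = r * pstdev(x) / pstdev(y)   # regression coefficient of X on Y

print(f"r = {r}, b_yx = {b_yx}, b_xy = {b_xy}")
```

With both coefficients equal to zero, the regression lines reduce to Y = mean of Y and X = mean of X.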
7. Regression Coefficients Indicate Cause and Effect
Relationship
Regression coefficients help in describing the cause and
effect relationship assumed between the variables. They show
how much the dependent variable changes, on average, for a
unit change in the independent variable. The variable being
predicted is treated as the effect, while the other is treated
as the cause.
Hence, regression analysis is useful for prediction and
forecasting.
8. Regression Coefficients Are Affected by Extreme Values
Regression coefficients are affected by extreme or outlying
values in the data. Such extreme values can distort the
regression line and change the coefficients. This may lead to
misleading conclusions about the relationship between
variables.
Therefore, data should be carefully examined before applying
regression analysis.
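The effect of a single extreme value can be seen in a small experiment. The sketch below fits the regression coefficient of Y on X twice, first on a clean hypothetical data set and then with one outlying observation added. All figures and names are assumptions made for illustration.

```python
def slope(xs, ys):
    """Least squares regression coefficient of Y on X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical clean data lying on the line Y = 2X
x_clean = [1, 2, 3, 4, 5]
y_clean = [2, 4, 6, 8, 10]

# Same data plus one extreme observation (X = 6, Y = 40)
x_outlier = x_clean + [6]
y_outlier = y_clean + [40]

b_clean = slope(x_clean, y_clean)
b_outlier = slope(x_outlier, y_outlier)

print(f"slope without the extreme value: {b_clean:.2f}")
print(f"slope with the extreme value:    {b_outlier:.2f}")
```

One observation out of six is enough to change the slope substantially here, which is why plotting the data before fitting is a sensible precaution.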
Difference Between Correlation and Regression
Basis: Meaning
• Correlation: Correlation measures the degree and direction of
relationship between two variables. It only shows whether variables
move together or not. It does not explain how much change occurs.
• Regression: Regression explains the functional relationship
between variables. It shows how much one variable changes due to a
unit change in another. It is mainly used for prediction.
Basis: Purpose
• Correlation: The main purpose of correlation is to measure the
strength of association between variables. It helps in understanding
whether variables are related. It is not used for forecasting.
• Regression: The main purpose of regression is prediction and
estimation. It helps in forecasting future values. It is widely used
in decision-making.
Basis: Cause and Effect
• Correlation: Correlation does not indicate cause and effect
relationship. Both variables are treated as equal. It only shows
association.
• Regression: Regression clearly indicates a cause-and-effect
relationship. One variable is independent and the other is
dependent. The dependent variable is influenced by the independent
variable.
Basis: Variables
• Correlation: In correlation, there is no distinction between
dependent and independent variables. Both variables are treated
equally. Either variable can be taken first.
• Regression: In regression, variables are clearly classified into
dependent and independent variables. Their roles are fixed.
Interchange of variables changes the equation.
Basis: Range of Values
• Correlation: The value of correlation coefficient lies between −1
and +1. It cannot exceed these limits. This restriction helps in
judging strength.
• Regression: Regression coefficients are not restricted to any
fixed range. They can take any real value. Their value depends on
the scale of measurement.
Basis: Interchangeability
• Correlation: In correlation, interchanging variables does not
change the value of correlation coefficient. The relationship
remains the same: neither its magnitude nor its sign changes.
• Regression: In regression, interchanging variables gives different
equations. Regression of Y on X is different from X on Y. Hence
variables are not interchangeable.
Basis: Nature
• Correlation: Correlation is mainly a descriptive statistical tool.
It summarizes the relationship between variables. It does not
provide a prediction equation.
• Regression: Regression is both descriptive and predictive. It
explains the relationship and also provides an equation. This
equation is used for forecasting.
Question 1- Obtain the regression equation of Y on X by the least
squares method for the following data:
X 1 2 3 4 5
Y 9 9 10 12 11
Also estimate the value of Y when X=10
Question 2- Obtain the regression equation of X on Y by the
least squares method for the following data:
X 1 2 3 4 5
Y 2 5 3 8 7
Also estimate the value of X when Y=20
Question 3- Obtain the regression equation of X on Y by the
least squares method for the following data:
X 9 10 8 9 11
Y 8 7 6 5 4
Also estimate the value of X when Y=43 and estimate the
value of Y when X= 79
Question 4- Calculate the regression equations of X on Y and
Y on X from the following data. (Actual values method)
X 12 6 16 9 15
Y 13 21 14 16 12
Question 5- Calculate the regression equations of X on Y and
Y on X from the following data. (Actual values method)
X 1 2 3 4 5
Y 2 5 3 8 7
Question 6- Calculate the regression equations of X on Y and
Y on X from the following data. (Actual values method)
X 1 2 3 4 5
Y 2 5 3 8 7
Question 7- Calculate the two regression equations from the
following data:
X 2 4 6 8 10 12
Y 4 2 5 10 3 6
Question 8- Calculate the two regression equations from the
following data by method :
X 43 44 46 40 44 42 45 42 38 40 52 57
Y 29 31 19 18 19 27 27 29 41 30 26 10
Also find the value of X when Y = 49 and the value of Y when X = 50. Hence or
Question 9- Calculate the two regression equations from the
following data by the least squares method:
X 12 17 9 29 10 14
Y 13 18 15 10 15 10
Question 10- Calculate the two regression equations from the
following data by method :
X 34 26 18 15 22 18
Y 15 19 17 12 12 25