Father Saturnino Urios University
Arts and Sciences Program
Regression and
Correlation Analysis
GE104: Mathematics in the Modern World
GRACEFYL JOY L. PIA
Faculty, Natural Science and Mathematics Division
glpia@[Link]
Scatter plot - a visual way to describe the nature of the relationship
between two variables. It is a graph of the ordered pairs (x, y) of
numbers consisting of the independent variable x and the dependent
variable y. The independent variable is scaled along the x-axis and the
dependent variable is scaled along the y-axis. Graphing the data on a
scatter plot gives preliminary information about the shape and spread
of the data.
REGRESSION ANALYSIS
Regression analysis is a statistical method that makes use of
the relationship between two or more quantitative variables so
that one variable, called the dependent or response variable (Y),
can be explained with the knowledge of the values of the other
variable, called the independent or explanatory variable (X).
There are two main reasons for conducting a regression analysis:
❑ To determine the relationship between two or more variables,
and
❑ To predict the values of the dependent variable from the
known values of the independent variable.
LINE OF BEST FIT OR LEAST SQUARES LINE
Given a scatter plot, you must be able to draw the line of best fit, or
least squares line. Best fit means that the sum of the squares of the
vertical distances from each point to the line is at a minimum. A line
of best fit is needed because the values of y will be predicted from
the values of x; hence, the closer the points are to the line, the
better the fit and the prediction will be.
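The "best fit" criterion can be sketched in a few lines of Python. This is only an illustration: the data points and the helper name `sum_squared_errors` are hypothetical and do not come from the lecture. It compares the sum of squared vertical distances for the least squares line of the sample data against two other candidate lines.

```python
def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances from each point to the line Y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# The least squares line for this data is Y = 2.2 + 0.6x.
sse_best = sum_squared_errors(xs, ys, a=2.2, b=0.6)

# Two other candidate lines drawn through the same points.
sse_steeper = sum_squared_errors(xs, ys, a=1.0, b=1.0)
sse_flat = sum_squared_errors(xs, ys, a=4.0, b=0.0)

print(round(sse_best, 2), round(sse_steeper, 2), round(sse_flat, 2))  # 2.4 4.0 6.0
```

The least squares line has the smallest total, which is what "best fit" means here.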
Simple regression analysis considers a straight-line relationship
between two variables. This linear relationship can be expressed in
an equation in the form
$$Y = a + bx$$
where
$Y$ = predicted value of the dependent variable
$a$ = the y-intercept
$b$ = the slope of the line
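For the slope of the line,
$$b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$$
For the $y$-intercept,
$$a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} \;\Rightarrow\; a = \bar{y} - b\bar{x}$$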
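The two formulas translate directly into code. The following is a minimal sketch; the function name `fit_line` and the check data are assumptions made for illustration, not part of the lecture.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line Y = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        # All x values are identical, so the slope is undefined.
        raise ValueError("x values must not all be equal")

    b = (n * sum_xy - sum_x * sum_y) / denominator   # slope
    a = sum_y / n - b * sum_x / n                    # y-intercept
    return a, b

# Check with points that lie exactly on Y = 1 + 2x.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```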
EXAMPLE
A law enforcement officer obtained data on the performance ratings of
the police offices and the crime solution efficiency in their
respective areas of responsibility for the last 6 months. Use the
equation of the regression line to predict the crime solution efficiency
of an area whose police office has a performance rating of 82.
PERFORMANCE RATING (X)           85   89   91   93   84   89
CRIME SOLUTION EFFICIENCY (Y)    89   90   92   92   88   90
Solution 1: Draw a scatter plot
[Figure: scatter plot titled "Crime Solution Efficiency of the Police Offices for the last 6 months"; x-axis: performance rating (82 to 94), y-axis: crime solution efficiency (86 to 94)]
Solution 2: Arrange the data in a table as shown
MONTH    x      y      xy       x²
1        85     89     7565     7225
2        89     90     8010     7921
3        91     92     8372     8281
4        93     92     8556     8649
5        84     88     7392     7056
6        89     90     8010     7921
$\sum x = 531$, $\sum y = 541$, $\sum xy = 47{,}905$, $\sum x^2 = 47{,}053$
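Solution 3: Solve for the slope of the regression line
$$b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$$
$$b = \frac{6(47{,}905) - (531)(541)}{6(47{,}053) - (531)^2}$$
$$b = 0.4454$$
Solution 4: Solve for the y-intercept
$$a = \frac{541}{6} - (0.4454)\,\frac{531}{6}$$
$$a = 50.75$$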
Solution 5: Determine the regression equation, or least squares line
$$Y = a + bx$$
$$Y = 50.75 + 0.4454x$$
Solution 6: Solve for the crime solution efficiency if the police
performance rating is 82.
$$Y = 50.75 + 0.4454x$$
$$Y = 50.75 + 0.4454(82)$$
$$Y = 50.75 + 36.5228$$
$$Y = 87.27$$
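The worked example can be re-checked with a short script. This sketch uses the six data pairs from the example; the variable names are illustrative assumptions. It carries the unrounded slope through the calculation, whereas the solution above rounds b to 0.4454 first; the results agree to the precision shown.

```python
# Data from the example table.
x = [85, 89, 91, 93, 84, 89]   # performance rating
y = [89, 90, 92, 92, 88, 90]   # crime solution efficiency

n = len(x)
sum_x = sum(x)                                   # 531
sum_y = sum(y)                                   # 541
sum_xy = sum(xi * yi for xi, yi in zip(x, y))    # 47905
sum_x2 = sum(xi ** 2 for xi in x)                # 47053

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # slope
a = sum_y / n - b * sum_x / n                                  # y-intercept
predicted = a + b * 82                                         # prediction at x = 82

print(round(b, 4), round(a, 2), round(predicted, 2))  # 0.4454 50.75 87.27
```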
CORRELATION ANALYSIS
Correlation analysis is a statistical method used to determine
whether a linear relationship or association exists between variables.
It measures the strength and direction of the linear relationship
between two random variables using a single number, the correlation
coefficient, which is computed from the sample data. The symbol for the
sample correlation coefficient is $r$, while the symbol for the
population correlation coefficient is $\rho$ (rho).
REMARKS
❖ The Pearson correlation coefficient, $r$, ranges from -1 to +1.
❖ An $r$ close to +1 indicates a strong positive linear relationship
between the two variables X and Y; that is, the value of Y increases
as the value of X increases.
❖ An $r$ close to -1 indicates a strong negative linear relationship
between the two variables; that is, the value of Y decreases as
the value of X increases.
❖ An $r$ near 0 means that there is little or no linear relationship
between the two variables. Note that this does not mean they are not
associated at all; the relationship may be nonlinear.
❖ $r$ is used to measure the strength and direction of the linear
relationship between two quantitative variables.
A suggested guideline for interpreting the correlation coefficient, $r$,
is given below.
Correlation Coefficient r               Interpretation
(−0.25, 0.00) or (0.00, 0.25)           Weak linear relationship
(−0.50, −0.25] or [0.25, 0.50)          Fair degree of linear relationship
(−0.75, −0.50] or [0.50, 0.75)          Moderate or good linear relationship
(−1.00, −0.75] or [0.75, 1.00)          Strong linear relationship
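The guideline can be written as a small lookup function. This is a sketch only: the name `interpret_r` is hypothetical, and because the table's intervals exclude exactly 0 and exactly ±1, the labels returned for those two cases are assumptions added here, not part of the guideline.

```python
def interpret_r(r):
    """Map a correlation coefficient to the suggested interpretation."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")

    size = abs(r)  # the guideline is symmetric about 0
    if size == 0.0:
        return "No linear relationship"        # assumption: not in the table
    if size == 1.0:
        return "Perfect linear relationship"   # assumption: not in the table
    if size < 0.25:
        return "Weak linear relationship"
    if size < 0.50:
        return "Fair degree of linear relationship"
    if size < 0.75:
        return "Moderate or good linear relationship"
    return "Strong linear relationship"

print(interpret_r(0.10))   # Weak linear relationship
print(interpret_r(-0.60))  # Moderate or good linear relationship
```

The sign of r is ignored for the label; it only tells whether the relationship is positive or negative.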
There are several ways to compute the value of the correlation
coefficient. One is the Pearson product moment correlation coefficient
(PPMC), or simply the Pearson $r$, named after the statistician Karl
Pearson, who pioneered the research in this area. The formula is
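$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
where $n$ = number of data pairs
$x$ = observed data for the independent variable
$y$ = observed data for the dependent variable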
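As a rough sketch, the formula can be coded as follows. The function name `pearson_r` and the check data are assumptions made for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson product moment correlation coefficient for paired data."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    x_term = n * sum_x2 - sum_x ** 2
    y_term = n * sum_y2 - sum_y ** 2
    if x_term == 0 or y_term == 0:
        # One variable is constant, so r is undefined.
        raise ValueError("neither variable may be constant")
    return numerator / math.sqrt(x_term * y_term)

# Points lying exactly on a rising line give r = +1;
# points lying exactly on a falling line give r = -1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # -1.0
```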
EXAMPLE
The average normal daily temperature (in degrees Celsius) and the
corresponding average monthly precipitation (in inches) for the
month of June are shown here for seven randomly selected cities.
Determine whether there is a relationship between the two variables.
TEMPERATURE (X)      30    27    28    32    27    23    18
PRECIPITATION (Y)    3.4   1.8   3.5   3.6   3.7   1.5   0.2
Solution 1: Draw a scatter plot
[Figure: scatter plot titled "Average Normal Daily Temperature and Monthly Precipitation"; x-axis: temperature (0 to 40), y-axis: precipitation (0 to 4)]
Solution 2: Arrange the data in a table as shown
CITY    x     y       xy        x²         y²
A       30    3.4     102.00    900.00     11.56
B       27    1.8     48.60     729.00     3.24
C       28    3.5     98.00     784.00     12.25
D       32    3.6     115.20    1024.00    12.96
E       27    3.7     99.90     729.00     13.69
F       23    1.5     34.50     529.00     2.25
G       18    0.2     3.60      324.00     0.04
$\sum x = 185$, $\sum y = 17.70$, $\sum xy = 501.80$, $\sum x^2 = 5019$, $\sum y^2 = 55.99$
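Solution 3: Substitute the corresponding values into the formula for r
$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
$$r = \frac{7(501.80) - (185)(17.70)}{\sqrt{\left[7(5019) - (185)^2\right]\left[7(55.99) - (17.70)^2\right]}}$$
$$r = \frac{3{,}512.60 - 3{,}274.5}{\sqrt{(35{,}133 - 34{,}225)(391.93 - 313.29)}}$$
$$r = \frac{238.1}{\sqrt{(908)(78.64)}}$$
$$r = 0.891$$
The correlation coefficient suggests a strong positive linear relationship between the
average normal daily temperature and the corresponding average monthly precipitation.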
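The substitution can be re-checked numerically. The sketch below mirrors the steps of the solution using the seven data pairs from the example; the variable names are illustrative assumptions.

```python
import math

# Data from the example table.
temperature = [30, 27, 28, 32, 27, 23, 18]             # x, degrees Celsius
precipitation = [3.4, 1.8, 3.5, 3.6, 3.7, 1.5, 0.2]    # y, inches

n = len(temperature)
sum_x = sum(temperature)
sum_y = sum(precipitation)
sum_xy = sum(x * y for x, y in zip(temperature, precipitation))
sum_x2 = sum(x * x for x in temperature)
sum_y2 = sum(y * y for y in precipitation)

numerator = n * sum_xy - sum_x * sum_y   # about 238.1
x_term = n * sum_x2 - sum_x ** 2         # 908
y_term = n * sum_y2 - sum_y ** 2         # about 78.64

r = numerator / math.sqrt(x_term * y_term)
print(round(r, 3))  # 0.891
```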