Module 3
Correlation and Regression-I
Correlation and Regression – Rank Correlation;
Partial and Multiple correlation; Multiple regression.
Correlation Coefficient:
Pearson's correlation coefficient (𝑟) ranges from −1 to 1.
A positive value indicates a positive relationship,
a negative value indicates an inverse (negative) relationship, and
a value near 0 indicates no linear relationship.
Scatter Plot
Karl Pearson’s coefficient of Correlation
The correlation coefficient between two variables 𝑋 and 𝑌, usually denoted by 𝑟(𝑋, 𝑌),
𝑟𝑋𝑌, or simply 𝑟, is a numerical measure of the linear relationship between them and
is defined as:
$$ r(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y} $$
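The definition can be checked numerically with a minimal Python sketch. The function name `pearson_r` and the sample values are illustrative assumptions, not data from this module. The sketch uses the population (divide by 𝑛) covariance and standard deviations; the 1/𝑛 factors cancel in the ratio.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed as Cov(X, Y) / (sigma_X * sigma_Y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population covariance and standard deviations (divide by n).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when a series is constant")
    return cov_xy / (sd_x * sd_y)


# Illustrative values only (assumed, not from the module's tables).
r = pearson_r([1, 2, 3, 4], [1, 3, 2, 4])  # approximately 0.8
```

A perfectly increasing linear series gives 𝑟 = 1 and a perfectly decreasing one gives 𝑟 = −1, which matches the interpretation of the sign given earlier.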
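If (𝑥1, 𝑦1), (𝑥2, 𝑦2), (𝑥3, 𝑦3), …, (𝑥𝑛, 𝑦𝑛) are 𝑛 pairs of observations of the variables
𝑋 and 𝑌 in a bivariate distribution, with means x̄ and ȳ, then
$$ r(X, Y) = \frac{\frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2}\,\sqrt{\frac{1}{n}\sum (y_i - \bar{y})^2}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}} \tag{A} $$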
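Equation (A) can also be written as
$$ r(X, Y) = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}} $$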
Calculate Karl Pearson’s coefficient of correlation between expenditure on advertising
and sales from the data given below
Solution:
Let the advertising expenses (in thousand Rs.) be denoted by the variable 𝑥 and the sales (in lakh Rs.) be denoted
by the variable 𝑦.
Carrying out the calculation of the correlation coefficient gives
𝑟 = 0.7804
Hence, there is a fairly high degree of positive correlation between expenditure on advertising and sales.
We may therefore conclude that, in general, sales have increased with an increase in advertising
expenditure.
Correlation
Given the following data:
The following table gives indices of industrial production and number of registered unemployed people (in lakh).
Calculate the value of the correlation coefficient.
Rank Correlation method
Let the random variables 𝑋 and 𝑌 denote the ranks of the individuals in the
characteristics A and B respectively.
If we assume that there is no tie, i.e., if no two individuals get the same rank in a
characteristic then, obviously, 𝑋 and 𝑌 assume numerical values ranging from 1 to 𝑛.
Spearman’s rank correlation coefficient, usually denoted by 𝜌 (rho), is given by the
formula
$$ \rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} $$
where 𝑑 is the difference between the pair of ranks of the same individual in the two
characteristics and 𝑛 is the number of pairs.
Computation of rank correlation coefficient
We shall discuss below the method of computing the Spearman’s rank
correlation coefficient 𝜌 under the following situations:
❑ When actual ranks are given
❑ When ranks are not given
Example 1:
The ranks of the same 15 students in two subjects 𝐴 and 𝐵 are given below,
the two numbers within the brackets denoting the ranks of the same student in 𝐴 and 𝐵
respectively: (1,10), (2,7), (3,2), (4,6), (5,4), (6,8), (7,3), (8,1), (9,11), (10,15), (11,9),
(12,5), (13,14), (14,12), (15,13).
Use Spearman’s formula to find the rank correlation coefficient.
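Solution: Let 𝑋 and 𝑌 denote the ranks of a student in subjects 𝐴 and 𝐵 respectively, and let 𝑑 = 𝑋 − 𝑌.
$$ \rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} $$
Here 𝑛 = 15 and the squared rank differences sum to 272, so
$$ \rho = 1 - \frac{6 \times 272}{15 \times 224} \approx 0.5143 $$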
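The arithmetic for Example 1 can be verified with a short Python sketch. The rank pairs are the ones listed in the example; the function name `spearman_rho_no_ties` is an illustrative assumption, and the formula is only valid when no ranks are tied.

```python
# Rank pairs (rank in A, rank in B) for the 15 students of Example 1.
rank_pairs = [
    (1, 10), (2, 7), (3, 2), (4, 6), (5, 4), (6, 8), (7, 3), (8, 1),
    (9, 11), (10, 15), (11, 9), (12, 5), (13, 14), (14, 12), (15, 13),
]


def spearman_rho_no_ties(pairs):
    """Spearman's rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)), assuming no tied ranks."""
    n = len(pairs)
    sum_d2 = sum((a - b) ** 2 for a, b in pairs)
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


sum_d2 = sum((a - b) ** 2 for a, b in rank_pairs)  # 272
rho = spearman_rho_no_ties(rank_pairs)             # about 0.5143
```

The moderate positive value indicates that students who rank well in one subject tend, on the whole, to rank well in the other.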
2. Ten competitors in a beauty contest are ranked by three judges in the following order.
Use the rank correlation coefficient to determine which pair of judges has the nearest
approach to common tastes in beauty.
Repeated ranks:
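When two or more individuals share the same value, each is assigned the average of the ranks
they would otherwise occupy, and a correction factor 𝐶𝐹 is added to the sum of squared differences:
$$ \rho = 1 - \frac{6\left(\sum d^2 + CF\right)}{n(n^2 - 1)} $$
$$ CF = \sum \frac{m(m^2 - 1)}{12} $$
where 𝑚 is the number of times a value is repeated, and the sum is taken over every repeated
value in both series.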
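The repeated-ranks procedure can be sketched in Python as follows. The data are the 𝑋 and 𝑌 values listed in Problem 1 of the Problems section; the helper names (`average_ranks`, `correction_factor`, `spearman_rho_with_ties`) are illustrative assumptions, and rank 1 is assigned to the largest value.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 = largest value; tied values share the average of their ranks."""
    positions = {}
    for pos, v in enumerate(sorted(values, reverse=True), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def correction_factor(values):
    """Sum of m*(m^2 - 1)/12 over every value repeated m > 1 times."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)


def spearman_rho_with_ties(xs, ys):
    """rho = 1 - 6*(sum(d^2) + CF) / (n*(n^2 - 1)), CF summed over both series."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    cf = correction_factor(xs) + correction_factor(ys)
    return 1 - 6 * (sum_d2 + cf) / (n * (n * n - 1))


x = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]
rho = spearman_rho_with_ties(x, y)  # about 0.545
```

For these values the squared rank differences sum to 72 and the correction factor is 3 (0.5 for the two 75s, 2 for the three 64s, and 0.5 for the two 68s), giving 𝜌 = 1 − 6(75)/990.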
A psychologist wanted to compare two methods A and B of teaching. He selected a random sample of 22
students. He grouped them into 11 pairs so the students in a pair have approximately equal scores on an
intelligence test. In each pair one student was taught by method A and the other by method B and examined
after the course. The marks obtained by them are tabulated below:
Problems
1. Obtain the rank correlation coefficient for the following data
X 68 64 75 50 64 80 75 40 55 64
Y 62 58 68 45 81 60 68 48 50 70
2. A sample of 12 fathers and their eldest sons have the following data about their heights in inches.
Regression
Regression is the measure of the average relationship between two or more
variables in terms of the original units of the data.
Regression analysis attempts to establish the nature of the relationship
between variables and thereby provide a mechanism for prediction, or
forecasting.
In general, a mathematical model for regression analysis is
$$ y_i = f(x_i, \beta) + \varepsilon $$
where
𝑦𝑖: dependent variable
𝑥𝑖: independent variable
𝑓: function
𝛽: unknown parameters
𝜀: error term
The goal is to estimate the function 𝑓(𝑥𝑖, 𝛽) that most closely fits the data.
Simple regression - when only two variables are studied to find the regression
relationship:
$$ y_i = a + b x_i + \varepsilon $$
Partial regression - when more than two variables are studied in a functional
relationship but the regression of only two variables is analyzed at a time, keeping
the other variables constant:
$$ y_i = a + b x_i + c z_i + \varepsilon $$
Multiple regression - when more than two variables are studied and their
relationships are worked out simultaneously. For example, a study of the growth in the
production of wheat in relation to fertilizers, hybrid seeds, irrigation, etc.:
$$ y_i = f(x_i, \beta) + \varepsilon $$
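As an illustration of working out several relationships simultaneously, the sketch below fits the two-predictor linear form 𝑦 = 𝑎 + 𝑏𝑥 + 𝑐𝑧 by ordinary least squares, solving the normal equations directly. Least squares is assumed here as the estimation method (this section does not name one), and the function name `fit_two_predictors` and the data are hypothetical.

```python
def fit_two_predictors(x, z, y):
    """Least-squares estimates (a, b, c) for y = a + b*x + c*z."""
    n = len(y)
    # Design columns [1, x, z]; normal equations are (X^T X) beta = X^T y.
    cols = [[1.0] * n, [float(v) for v in x], [float(v) for v in z]]
    k = len(cols)
    # Augmented matrix [X^T X | X^T y].
    m = [
        [sum(p * q for p, q in zip(ci, cj)) for cj in cols]
        + [sum(p * q for p, q in zip(ci, y))]
        for ci in cols
    ]
    # Gaussian elimination with partial pivoting.
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(m[r][i]))
        if abs(m[pivot][i]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        m[i], m[pivot] = m[pivot], m[i]
        for r in range(i + 1, k):
            factor = m[r][i] / m[i][i]
            for c in range(i, k + 1):
                m[r][c] -= factor * m[i][c]
    # Back substitution.
    beta = [0.0] * k
    for i in reversed(range(k)):
        rest = sum(m[i][c] * beta[c] for c in range(i + 1, k))
        beta[i] = (m[i][k] - rest) / m[i][i]
    return tuple(beta)


# Hypothetical data generated exactly from y = 1 + 2x + 3z (no error term).
x = [1, 2, 3, 4, 5]
z = [2, 1, 4, 3, 6]
y = [9, 8, 19, 18, 29]
a, b, c = fit_two_predictors(x, z, y)  # close to (1, 2, 3)
```

Because the sample data contain no error term, the fit recovers the generating coefficients up to floating-point rounding; with real observations the estimates would only approximate the underlying relationship.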
A regression line is a graphical way to show the functional relationship
between two variables 𝑋 and 𝑌, i.e., the independent and dependent variables:
$$ y = a + bx $$
It is a line that shows the average relationship between the two variables 𝑋 and 𝑌.
Thus, it is a line of averages.
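A minimal sketch of fitting the line 𝑦 = 𝑎 + 𝑏𝑥 is given below. It assumes the usual least-squares estimates, 𝑏 = Σ(𝑥 − x̄)(𝑦 − ȳ) / Σ(𝑥 − x̄)² and 𝑎 = ȳ − 𝑏x̄, which this section does not itself state; the function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all equal; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x  # the fitted line passes through (mean_x, mean_y)
    return a, b


# Hypothetical points lying exactly on y = 1 + 2x.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

The fitted line always passes through the point of means (x̄, ȳ), which is one way to read the statement that it is a line of averages.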
Lines of regression
Properties