Module 3 PandS

Module 3 covers correlation and regression, including Pearson's correlation coefficient, rank correlation, and multiple regression. It explains how to calculate correlation coefficients and provides examples of their application in analyzing relationships between variables. The module also discusses regression analysis as a method for predicting relationships between dependent and independent variables.

Module 3

Correlation and Regression-I

Correlation and Regression – Rank Correlation; Partial and Multiple correlation; Multiple regression.
Correlation Coefficient:
Pearson's correlation coefficient (𝑟) ranges from −1 to 1. A positive value indicates a positive relationship, a negative value indicates an inverse (negative) relationship, and a value near 0 indicates no linear relationship.
Scatter Plot
Karl Pearson’s coefficient of Correlation
The correlation coefficient between two variables 𝑋 and 𝑌, usually denoted by 𝑟(𝑋, 𝑌), 𝑟𝑋𝑌, or simply 𝑟, is a numerical measure of the linear relationship between them and is defined as:

$$r(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
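The definition can be evaluated directly from paired observations. The following is a minimal sketch, assuming population (divide-by-𝑛) forms for the covariance and standard deviations; the function name `pearson_r` and the five data pairs are hypothetical and are not taken from the module.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed as Cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population covariance and standard deviations (divide by n).
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sigma_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sigma_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov_xy / (sigma_x * sigma_y)

# Hypothetical illustrative data, not from the module.
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 5, 4, 5]
r_demo = pearson_r(x_demo, y_demo)
print(round(r_demo, 4))  # 0.7746
```

Because the factor 1/𝑛 cancels between numerator and denominator, using sample (divide-by-(𝑛 − 1)) forms throughout would give the same value of 𝑟.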

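If (𝑥1, 𝑦1), (𝑥2, 𝑦2), (𝑥3, 𝑦3), …, (𝑥𝑛, 𝑦𝑛) are 𝑛 pairs of observations of the variables 𝑋 and 𝑌 in a bivariate distribution, then

$$r(X, Y) = \frac{\frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2}\,\sqrt{\frac{1}{n}\sum (y_i - \bar{y})^2}} \qquad \text{(A)}$$

The equation (A) can also be written as

$$r(X, Y) = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}$$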
Calculate Karl Pearson’s coefficient of correlation between expenditure on advertising
and sales from the data given below
Solution:
Let the advertising expenses (in Rs.) be denoted by the variable 𝑥 and the sales (in lakhs Rs.) be denoted by the variable 𝑦.
We have to find the correlation coefficient.

𝑟 = 0.7804

Hence, there is a fairly high degree of positive correlation between expenditure on advertising and sales. We may, therefore, conclude that, in general, sales have increased with an increase in advertising expenditure.
Correlation
Given the following data:
The following table gives indices of industrial production and number of registered unemployed people (in lakh).
Calculate the value of the correlation coefficient.
Rank Correlation method

Let the random variables 𝑋 and 𝑌 denote the ranks of the individuals in the
characteristics A and B respectively.
If we assume that there is no tie, i.e., if no two individuals get the same rank in a
characteristic then, obviously, 𝑋 and 𝑌 assume numerical values ranging from 1 to 𝑛.

Spearman’s rank correlation coefficient, usually denoted by 𝜌 (rho), is given by the formula

$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

where 𝑑 is the difference between the pair of ranks of the same individual in the two characteristics and 𝑛 is the number of pairs.
Computation of rank correlation coefficient

We shall discuss below the method of computing Spearman’s rank correlation coefficient 𝜌 under the following situations:
❑ When actual ranks are given
❑ When ranks are not given

Example 1:
The ranks of the same 15 students in two subjects 𝐴 and 𝐵 are given below, the two numbers within the brackets denoting the ranks of the same student in 𝐴 and 𝐵 respectively:
(1,10), (2,7), (3,2), (4,6), (5,4), (6,8), (7,3), (8,1), (9,11), (10,15), (11,9), (12,5), (13,14), (14,12), (15,13).
Use Spearman’s formula to find the rank correlation coefficient.
Solution: Let 𝑑 denote the difference between the ranks of the same student in 𝐴 and 𝐵; here 𝑛 = 15.

$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
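The arithmetic for Example 1 can be checked with a few lines of code. This is a minimal sketch that applies Spearman's formula to the 15 rank pairs given in the example; the function name `spearman_rho` is an illustrative choice, and the formula assumes there are no tied ranks.

```python
def spearman_rho(rank_pairs):
    """Spearman's rho for untied ranks: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(rank_pairs)
    sum_d2 = sum((a - b) ** 2 for a, b in rank_pairs)
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Rank pairs (rank in A, rank in B) from Example 1.
ranks = [(1, 10), (2, 7), (3, 2), (4, 6), (5, 4), (6, 8), (7, 3), (8, 1),
         (9, 11), (10, 15), (11, 9), (12, 5), (13, 14), (14, 12), (15, 13)]

sum_d2 = sum((a - b) ** 2 for a, b in ranks)
rho = spearman_rho(ranks)
print(sum_d2, round(rho, 4))  # 272 0.5143
```

With Σ𝑑² = 272 and 𝑛 = 15, the formula gives 𝜌 = 1 − 1632/3360 ≈ 0.5143, a moderate positive agreement between the two rankings.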
2. Ten competitors in a beauty contest are ranked by three judges in the following order:

Use the rank correlation coefficient to determine which pair of judges has the nearest approach to common tastes in beauty.
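Repeated ranks:

$$\rho = 1 - \frac{6\left(\sum d^2 + CF\right)}{n(n^2 - 1)}$$

$$CF = \text{correction factor} = \frac{m(m^2 - 1)}{12}, \qquad m = \text{number of times a value is repeated}$$

One such correction term is added to $\sum d^2$ for every group of repeated values in either series.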

A psychologist wanted to compare two methods A and B of teaching. He selected a random sample of 22
students. He grouped them into 11 pairs so the students in a pair have approximately equal scores on an
intelligence test. In each pair one student was taught by method A and the other by method B and examined
after the course. The marks obtained by them are tabulated below:
Problems
1. Obtain the rank correlation coefficient for the following data

X 68 64 75 50 64 80 75 40 55 64
Y 62 58 68 45 81 60 68 48 50 70

2. A sample of 12 fathers and their eldest sons have the following data about their heights in inches.
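For Problem 1 above, both series contain repeated values, so the repeated-ranks formula applies. The following is a minimal sketch using the 𝑋 and 𝑌 values listed in that problem. It assumes the usual conventions that rank 1 goes to the largest value and that tied values share the average of the ranks they would otherwise occupy; the helper names are hypothetical.

```python
from collections import Counter

def average_ranks(values):
    """Rank with 1 = largest value; tied values share the mean of their positions."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(ordered, start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def correction_factor(values):
    """Sum of m*(m^2 - 1)/12 over every value that occurs m > 1 times."""
    return sum(m * (m ** 2 - 1) / 12 for m in Counter(values).values() if m > 1)

def spearman_rho_ties(x, y):
    """Spearman's rho with the correction factor for repeated ranks."""
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    cf = correction_factor(x) + correction_factor(y)
    return 1 - 6 * (sum_d2 + cf) / (n * (n ** 2 - 1))

# Data from Problem 1.
x = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]

rho_ties = spearman_rho_ties(x, y)
print(round(rho_ties, 4))  # 0.5455
```

Here Σ𝑑² = 72 and the correction terms are 0.5 (75 repeated twice in 𝑋), 2 (64 repeated three times in 𝑋), and 0.5 (68 repeated twice in 𝑌), so 𝜌 = 1 − 6(72 + 3)/990 ≈ 0.5455.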
Regression
Regression is the measure of the average relationship between two or more
variables in terms of the original units of the data.

Regression analysis attempts to establish the nature of the relationship between variables and thereby provide a mechanism for prediction, or forecasting.

In general, a mathematical model for the regression analysis is

$$y_i = f(x_i, \beta) + \varepsilon$$

where
𝑦𝑖: dependent variable
𝑓: function
𝛽: unknown parameters
𝑥𝑖: independent variable
𝜀: error term

The goal is to estimate the function 𝑓(𝑥𝑖, 𝛽) that most closely fits the data.
Simple regression - when only two variables are studied to find the regression relationship.
$$y_i = a + b x_i + \varepsilon$$

Partial regression - when more than two variables are studied in a functional relationship but a regression of only two variables is analyzed at a time, keeping the other variables constant.
$$y_i = a + b x_i + c z_i + \varepsilon$$

Multiple regression - when more than two variables are studied and their relationships are simultaneously worked out. For example, the study of the growth in the production of wheat in relation to fertilizers, hybrid seeds, irrigation, etc.
$$y_i = f(x_i, \beta) + \varepsilon$$
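To illustrate working out several relationships simultaneously, the sketch below fits the linear form 𝑦 = 𝑎 + 𝑏𝑥 + 𝑐𝑧 with two independent variables. It rests on assumptions the module does not state: that 𝑓 is linear, and that the coefficients are estimated by ordinary least squares through the normal equations. The function names and the data are hypothetical; the 𝑦 values are generated exactly from 𝑦 = 1 + 2𝑥 + 3𝑧 so that the fit can be checked by eye.

```python
def solve_linear_system(A, b):
    """Solve A * beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_multiple_regression(xs, zs, ys):
    """Least-squares estimates (a, b, c) for y = a + b*x + c*z."""
    rows = [[1.0, x, z] for x, z in zip(xs, zs)]  # design matrix with intercept column
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve_linear_system(XtX, Xty)

# Hypothetical data generated from y = 1 + 2x + 3z (no error term).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
zs = [2.0, 1.0, 4.0, 3.0, 6.0]
ys = [1 + 2 * x + 3 * z for x, z in zip(xs, zs)]

a, b, c = fit_multiple_regression(xs, zs, ys)
print(round(a, 6), round(b, 6), round(c, 6))
```

Since the data contain no error term, the estimates should recover the generating coefficients 1, 2, and 3 up to floating-point rounding. With real data, a library routine for least squares would normally be preferred to hand-written elimination.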
A regression line is a graphic technique to show the functional relationship between two variables 𝑋 and 𝑌, i.e., the independent and dependent variables.
$$y = a + bx$$
It is a line which shows the average relationship between the two variables 𝑋 and 𝑌. Thus, it is a line of averages.
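The constants 𝑎 and 𝑏 of the line 𝑦 = 𝑎 + 𝑏𝑥 can be estimated from data. The sketch below assumes the least-squares criterion, which is not spelled out in the text above, giving 𝑏 = Σ(𝑥 − 𝑥̄)(𝑦 − 𝑦̄) / Σ(𝑥 − 𝑥̄)² and 𝑎 = 𝑦̄ − 𝑏𝑥̄. The function name and the data are hypothetical.

```python
def fit_regression_line(x, y):
    """Least-squares line of y on x; returns (a, b) for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    s_xx = sum((u - mean_x) ** 2 for u in x)
    b = s_xy / s_xx          # slope
    a = mean_y - b * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return a, b

# Hypothetical illustrative data, not from the module.
x_obs = [1, 2, 3, 4, 5]
y_obs = [2, 4, 5, 4, 5]

a, b = fit_regression_line(x_obs, y_obs)
print(round(a, 4), round(b, 4))  # 2.2 0.6
```

The fitted line always passes through the point of means (𝑥̄, 𝑦̄), which is one sense in which it is a line of averages.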
Lines of regression
Properties
