Understanding Correlation Coefficients

1. Correlation measures the strength and direction of the relationship between two variables. 2. A scatter plot can show the correlation visually - an upward linear pattern indicates positive correlation, downward linear indicates negative correlation. 3. Pearson's correlation coefficient (r) quantifies the linear correlation between two variables, ranging from -1 to 1. Values closer to -1 or 1 indicate a stronger linear correlation.

Uploaded by

GAYATHRI NATESH

Correlation

So far we have confined ourselves to distributions involving only one variable, known as univariate distributions. Normally, however, we collect data on many variables: whenever we conduct an experiment we gather information on several related variables. A questionnaire, for example, will ask for name, age, education, gender, years of experience, income and so on.

Correlation is the study of the relationship between two or more variables.

When there are two related variables their joint distribution is known as a bivariate distribution, and if there are more than two variables their joint distribution is known as a multivariate distribution.

In the case of a bivariate or multivariate distribution, we are interested in discovering and measuring the magnitude and direction of the relationship between two or more variables. For this we use the tool known as correlation.

Suppose we have two continuous variables X and Y. If a change in X is accompanied by a change in Y, the variables are said to be correlated; in other words, a systematic relationship between the variables is termed correlation. When only two variables are involved the correlation is known as simple correlation, and when more than two variables are involved it is known as multiple correlation. For example, with two variables X and Y the relationship is called simple correlation, and with three variables, say Y, X1 and X2, it is known as multiple correlation.

Types of correlation

If an increase in the value of X results in an increase in the value of Y, the variables move in the same direction. If a decrease in the value of X results in a decrease in the value of Y, they also move in the same direction. Similarly, if an increase in X results in a decrease in Y, or a decrease in X results in an increase in Y, they move in opposite directions.

When the variables move in the same direction they are said to be positively correlated, and when they move in opposite directions they are said to be negatively correlated. Positive correlation is also known as direct correlation, and negative correlation as inverse correlation. When the two variables are not related at all they are said to be independent. An example of each:

• rainfall and yield of a crop - positive correlation
• the speed of a vehicle and the time taken to reach a destination - negative correlation
• amount of rainfall in Mumbai and yield of a crop in Chennai - independent

In correlation we need not know which is the cause

variable and which is the effect variable.

Other examples of positive correlation

Height and weight. Taller people tend to be heavier.

Day temperature and sale of ice cream

An example of negative correlation would be

height above sea level and temperature. As you climb the
mountain (increase in height) it gets colder (decrease in
temperature).

A zero correlation (independent) exists when there is no

relationship between two variables. For example there is no
relationship between the amount of tea drunk and level of
intelligence.

Some more examples are

• age and salary
• experience and salary
• quantity of fertiliser applied and yield of a crop
• demand and supply
• the cost (in rupees) of a call and its duration (length)
• The more time you spend running on a treadmill, the more calories you will burn.
• Taller people have larger shoe sizes and shorter people have smaller shoe sizes.
• The number of days of absences in a course and the final exam grade

Scatter Diagram

In correlation studies we first have to investigate whether there is a relation between the variables X and Y. A correlation can be expressed visually by drawing a scatter diagram (also known as a scatterplot, scatter graph or scatter chart).

A scatter diagram is a graphical display of the relationship between two numerical variables, in which each pair of scores is represented as a point (or dot). It indicates the strength and direction of the correlation between the variables.

Let (x1,y1), (x2,y2), …, (xn,yn) be n pairs of observations. If the variables X and Y are plotted along the X-axis and Y-axis respectively in the x-y plane of a graph sheet, the resulting diagram of dots is known as a scatter diagram. From the scatter diagram we can say whether there is any correlation between x and y, whether it is positive or negative, and whether it is linear or curvilinear. It may take any one of the following forms.

When you draw a scatter diagram it doesn't matter which

variable goes on the x-axis and which goes on the y-axis.

Remember, in correlations we are always dealing with

paired scores, so the values of the 2 variables taken

together will be used to make the diagram.

Decide which variable goes on each axis and then simply

put a dot or a cross at the point where the 2 values coincide.

There is no fixed rule for deciding what size of correlation counts as strong, moderate or weak. The interpretation of the coefficient depends on the topic of study.

As a rough guide, if the absolute value of the correlation coefficient is greater than 0.4 we say the relationship is moderate, and if it is greater than 0.75 we say it is relatively strong.

The correlation coefficient can take values between -1 and +1:

• 1 is a perfect positive correlation

• 0 is no correlation (the values don't seem linked at all)

• -1 is a perfect negative correlation
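The rough guide above can be written as a small helper. This is only a sketch: the function name `describe_correlation` is hypothetical, the cut-offs 0.4 and 0.75 are the ones quoted in the text, and labelling everything at or below 0.4 as "weak" is an assumption made for illustration, since the text gives no cut-off for weak correlation.

```python
def describe_correlation(r: float) -> str:
    """Label a correlation coefficient using the rough cut-offs 0.4 and 0.75."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1.0:
        strength = "perfect"
    elif size > 0.75:
        strength = "relatively strong"
    elif size > 0.4:
        strength = "moderate"
    else:
        strength = "weak"  # assumption: the text names no cut-off for "weak"
    return f"{strength} {direction} correlation"


print(describe_correlation(0.78))
print(describe_correlation(-0.5))
```

Because the interpretation depends on the topic of study, the cut-offs would normally be adjusted for the field in question.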

Pearson's product moment correlation coefficient

The most common measure of correlation is Pearson's product-moment correlation, which is commonly referred to simply as the correlation coefficient. It measures the degree of linear relationship between two continuous variables. It is denoted by r for a sample and by ρ (rho) for a population. The coefficient r is known as Pearson's correlation coefficient because it was developed by Karl Pearson. It is also called the product-moment correlation.

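The correlation coefficient r is given as the ratio of the covariance of the variables X and Y to the product of the standard deviations of X and Y.

Symbolically,

$$ r = \frac{\frac{1}{n-1}\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\frac{1}{n-1}\sum (x-\bar{x})^2}\;\sqrt{\frac{1}{n-1}\sum (y-\bar{y})^2}} $$

As the factor 1/(n-1) is common to the numerator and the denominator, it cancels, and the formula can be written as

$$ r_{xy} = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum (x_i-\bar{x})^2 \, \sum (y_i-\bar{y})^2}} $$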

Where:

• rxy – the correlation coefficient of the linear relationship between

the variables x and y

• xi – the values of the x-variable in a sample

• x̅ – the mean of the values of the x-variable

• yi – the values of the y-variable in a sample

• ȳ – the mean of the values of the y-variable

To calculate the correlation coefficient using the formula above, follow these steps:

1. Obtain a data sample with the values of the x-variable and the y-variable.

2. Calculate the means (averages) x̅ for the x-variable and ȳ for the y-variable.

3. For the x-variable, subtract the mean from each value of the x-variable (let's call this new variable "a"). Do the same for the y-variable (let's call this variable "b").

4. Multiply each a-value by the corresponding b-value and find the sum of these products (the final value is the numerator in the formula).

5. Square each a-value and calculate the sum of the results. Do the same for the b-values.

6. Multiply the two sums from the previous step and find the square root of the product (this is the denominator in the formula).

7. Divide the value obtained in step 4 by the value obtained in step 6.

Equivalently,

$$ r = \frac{\mathrm{cov}(x,y)}{\sqrt{\mathrm{var}(x) \times \mathrm{var}(y)}} $$
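The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the function name `pearson_r` and the height and weight figures are hypothetical, and the denominator squares both sets of deviations, as the formula requires.

```python
import math


def pearson_r(x, y):
    """Pearson's r computed from deviations about the means."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples with at least 2 pairs")
    n = len(x)
    mean_x = sum(x) / n                                   # step 2
    mean_y = sum(y) / n
    a = [xi - mean_x for xi in x]                         # step 3
    b = [yi - mean_y for yi in y]
    numerator = sum(ai * bi for ai, bi in zip(a, b))      # step 4
    sum_a2 = sum(ai ** 2 for ai in a)                     # step 5
    sum_b2 = sum(bi ** 2 for bi in b)
    denominator = math.sqrt(sum_a2 * sum_b2)              # step 6
    if denominator == 0:
        raise ValueError("r is undefined when one variable is constant")
    return numerator / denominator                        # step 7


# Hypothetical illustrative data (not taken from the text)
heights = [150, 160, 165, 172, 180]
weights = [52, 58, 63, 70, 79]
print(round(pearson_r(heights, weights), 4))
```

The explicit check for a zero denominator is a design choice: when every value of one variable is the same, the coefficient is undefined, and raising an error is clearer than returning a misleading number.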

Another formula that can be used is

$$ r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}} = \frac{SP(XY)}{\sqrt{SS(X)\,SS(Y)}} $$

This correlation coefficient r is known as Pearson's product moment correlation coefficient. The numerator is termed the sum of products of X and Y, abbreviated SP(XY). In the denominator, the first term is called the sum of squares of X, i.e. SS(X), and the second term is called the sum of squares of Y, i.e. SS(Y).

The denominator in the above formula is always positive. The numerator may be positive or negative, making r either positive or negative.
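As a sketch of this second formula, the quantities SP(XY), SS(X) and SS(Y) can be computed from the raw sums alone. The function names `sp_ss` and `r_from_sums` and the five data pairs are hypothetical, chosen only so that the arithmetic is easy to follow by hand.

```python
import math


def sp_ss(x, y):
    """Return SP(XY), SS(X) and SS(Y) computed from raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sp = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    ss_x = sum(xi ** 2 for xi in x) - sum_x ** 2 / n
    ss_y = sum(yi ** 2 for yi in y) - sum_y ** 2 / n
    return sp, ss_x, ss_y


def r_from_sums(x, y):
    """Pearson's r as SP(XY) / sqrt(SS(X) * SS(Y))."""
    sp, ss_x, ss_y = sp_ss(x, y)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical illustrative data (not taken from the text)
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(sp_ss(x, y))        # (8.0, 10.0, 10.0)
print(r_from_sums(x, y))  # 0.8
```

Here the sign of r comes entirely from SP(XY), since the square root in the denominator is positive.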

Assumptions in correlation analysis:

The correlation coefficient r is used under certain assumptions:

1. The variables under study are continuous random variables and they are normally distributed.

2. The relationship between the variables is linear.

3. Each pair of observations is independent of the other pairs.

Problem:

Compute Pearson's coefficient of correlation between plant height (cm) X and yield (kg) Y from the data given below:

X 39 65 62 90 82 75 25 98 36 78

Y 47 53 58 86 62 68 60 91 51 84

The means are x̅ = 650/10 = 65 and ȳ = 660/10 = 66.

X      Y    (x-65)  (y-66)  (x-65)²  (y-66)²  (x-65)(y-66)
39     47     -26     -19      676      361           494
65     53       0     -13        0      169             0
62     58      -3      -8        9       64            24
90     86      25      20      625      400           500
82     62      17      -4      289       16           -68
75     68      10       2      100        4            20
25     60     -40      -6     1600       36           240
98     91      33      25     1089      625           825
36     51     -29     -15      841      225           435
78     84      13      18      169      324           234
650    660                    5398     2224          2704   (totals)

rxy = 2704 / √(5398 × 2224) = 0.7804

Using the second formula, with n = 10, ∑x = 650, ∑y = 660, ∑xy = 45604, ∑x² = 47648 and ∑y² = 45784:

$$ r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}} = \frac{45604 - \frac{(650)(660)}{10}}{\sqrt{\left(47648 - \frac{(650)^2}{10}\right)\left(45784 - \frac{(660)^2}{10}\right)}} = \frac{45604 - 42900}{(73.47)(47.16)} = 0.7804 $$

The correlation coefficient is positive, so plant height and yield are positively correlated.
Both methods give the same result.
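The worked problem can be checked numerically. The sketch below uses only the data given in the problem; the variable names are illustrative.

```python
import math

x = [39, 65, 62, 90, 82, 75, 25, 98, 36, 78]   # plant height (cm)
y = [47, 53, 58, 86, 62, 68, 60, 91, 51, 84]   # yield
n = len(x)

# Method 1: deviations from the means (65 and 66)
mean_x, mean_y = sum(x) / n, sum(y) / n
sp_dev = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x_dev = sum((xi - mean_x) ** 2 for xi in x)
ss_y_dev = sum((yi - mean_y) ** 2 for yi in y)
r_dev = sp_dev / math.sqrt(ss_x_dev * ss_y_dev)

# Method 2: raw sums
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)
sp_raw = sum_xy - sum_x * sum_y / n
ss_x_raw = sum_x2 - sum_x ** 2 / n
ss_y_raw = sum_y2 - sum_y ** 2 / n
r_raw = sp_raw / math.sqrt(ss_x_raw * ss_y_raw)

print(sp_dev, ss_x_dev, ss_y_dev)          # 2704.0 5398.0 2224.0
print(sum_xy, sum_x2, sum_y2)              # 45604 47648 45784
print(round(r_dev, 4), round(r_raw, 4))    # 0.7804 0.7804
```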

A second example, with u = y - 5.13 and v = x - 79.41 (approximately the deviations from the means):

Y          X      u       v       u²       v²      u×v
5.22    94.2   0.09   14.79   0.0081   218.74   1.3311
8.13    69.3      3  -10.11        9   102.21   -30.33
6.52   114.3   1.39   34.89   1.9321   1217.3   48.497
4.16    83.3  -0.97    3.89   0.9409   15.132   -3.773
8.98    85.4   3.85    5.99   14.823    35.88   23.062
3.05    68.1  -2.08  -11.31   4.3264   127.92   23.525
3.49    50.7  -1.64  -28.71   2.6896   824.26   47.084
5.4     96.2   0.27   16.79   0.0729    281.9   4.5333
2.39    76.1  -2.74   -3.31   7.5076   10.956   9.0694
2.71      52  -2.42  -27.41   5.8564   751.31   66.332
3.97    82.1  -1.16    2.69   1.3456   7.2361    -3.12
7.56    81.3   2.43    1.89   5.9049   3.5721   4.5927
61.58    953   0.02    0.08   54.407   3596.4    190.8   (totals)

r = 190.8 / √(54.407 × 3596.4) = 0.4313

Subtracting a constant from each variable (here 3 from Y and 20 from X) leaves the correlation unchanged:

Y          X    y-3   x-20
5.22    94.2   2.22   74.2
8.13    69.3   5.13   49.3
6.52   114.3   3.52   94.3
4.16    83.3   1.16   63.3
8.98    85.4   5.98   65.4
3.05    68.1   0.05   48.1
3.49    50.7   0.49   30.7
5.4     96.2    2.4   76.2
2.39    76.1  -0.61   56.1
2.71      52  -0.29     32
3.97    82.1   0.97   62.1
7.56    81.3   4.56   61.3

r = 0.4313

Properties

1. It is a unit-free measure.

2. The correlation coefficient ranges between –1 and +1. A value of r beyond these limits indicates an error in computation.

3. The correlation coefficient is not affected by a change of origin, of scale, or of both. When a constant is added to or subtracted from the original values of a variable, we say that the origin is changed. When the original values of a variable are multiplied or divided by a (positive) constant, we say that the scale is changed.

4. It is symmetric, i.e. rxy = ryx.
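These properties can be checked numerically. The sketch below reuses the twelve (Y, X) pairs tabulated earlier in this section; the helper `pearson_r` is a hypothetical name, the shifts of 3 and 20 match the y-3 and x-20 columns of that table, and the scale factors 10 and 0.5 are arbitrary positive constants chosen for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson's r from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)


y = [5.22, 8.13, 6.52, 4.16, 8.98, 3.05, 3.49, 5.4, 2.39, 2.71, 3.97, 7.56]
x = [94.2, 69.3, 114.3, 83.3, 85.4, 68.1, 50.7, 96.2, 76.1, 52, 82.1, 81.3]

r = pearson_r(x, y)

# Property 3: change of origin (subtract constants)
r_shifted = pearson_r([xi - 20 for xi in x], [yi - 3 for yi in y])

# Property 3: change of scale (multiply by positive constants)
r_scaled = pearson_r([xi * 10 for xi in x], [yi * 0.5 for yi in y])

# Property 4: symmetry
r_swapped = pearson_r(y, x)

print(round(r, 4), round(r_shifted, 4), round(r_scaled, 4), round(r_swapped, 4))
```

All four printed values agree to the displayed precision, which is also why the coefficient is unit free: converting a variable to different units is a change of scale.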

Consider
X 2 4 6 8 10
Y 5 8 11 14 17
For an increase in X value of 2 there is an increase of 3 units
in Y. The rate of change is constant and we say they are
linearly related. The correlation calculated for such data is
called simple linear correlation.
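A short check of this example: the sketch below confirms that the rate of change is constant and computes the correlation for these five pairs. The variable names are illustrative.

```python
import math

x = [2, 4, 6, 8, 10]
y = [5, 8, 11, 14, 17]
n = len(x)

# Change in Y per unit change in X between consecutive pairs
rates = [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(n - 1)]

mean_x, mean_y = sum(x) / n, sum(y) / n
sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
ss_x = sum((a - mean_x) ** 2 for a in x)
ss_y = sum((b - mean_y) ** 2 for b in y)
r = sp / math.sqrt(ss_x * ss_y)

print(rates)  # [1.5, 1.5, 1.5, 1.5]
print(r)      # 1.0
```

Because every point lies exactly on one rising straight line, the result is a perfect positive correlation.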

Rank correlation

One of the assumptions of correlation analysis is that the two variables are normally distributed. When the variables are not normally distributed, the linear correlation procedure is not applicable, and in such cases we use rank correlation. Two methods are available for calculating rank correlation, one proposed by Spearman and the other by Kendall. Both can be applied to the same data, but Spearman's rank correlation is the more popular of the two.

Spearman's rank correlation

This is denoted by rs. The procedure starts by ranking the measurements of the variables X and Y separately, with the highest value receiving rank 1. The difference between the two ranks in each of the n pairs is then found and denoted by d. When there are no ties, Spearman's rank correlation is calculated using the formula

$$ r_s = 1 - \frac{6 \sum d^2}{n(n^2-1)} $$

Calculate Spearman's rank correlation for the following data. Here the X and Y values are themselves the ranks 1 to 10, so d = X - Y.

X     Y     d    d²
5     7    -2     4
6     6     0     0
4     2     2     4
8     4     4    16
1     3    -2     4
3     1     2     4
2     5    -3     9
7    10    -3     9
9     8     1     1
10    9     1     1

∑d² = 52

rs = 1 - (6 ∑d² / (n(n²-1)))
rs = 1 - (6 × 52 / (10(10²-1)))
rs = 1 - (312/990) = 0.68
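The calculation above can be reproduced with a short function. This is a sketch for the no-ties case only; the name `spearman_from_ranks` is hypothetical, and the two lists are the X and Y columns of the table, which are already ranks.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's rs from two lists of ranks, assuming there are no ties."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


x = [5, 6, 4, 8, 1, 3, 2, 7, 9, 10]
y = [7, 6, 2, 4, 3, 1, 5, 10, 8, 9]

sum_d2 = sum((a - b) ** 2 for a, b in zip(x, y))
print(sum_d2)                                  # 52
print(round(spearman_from_ranks(x, y), 2))     # 0.68
```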

The following are the marks assigned by two judges in an interview.

Calculate Spearman's rank correlation.

Judge A 40, 48, 25, 32, 50, 41, 15, 18, 27, 36, 43, 49

Judge B 35, 40, 22, 25, 47, 38, 17, 19, 26, 33, 41, 39
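One way to approach this exercise is sketched below: the marks are first converted to ranks (highest mark = rank 1, as described earlier) and the no-ties formula is then applied. The function names `rank_descending` and `spearman_rs` are hypothetical, and the sketch deliberately refuses tied marks, because the simple formula assumes there are none.

```python
def rank_descending(values):
    """Rank values with the highest receiving rank 1; ties are not handled."""
    if len(set(values)) != len(values):
        raise ValueError("tied values: the no-ties formula does not apply")
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(x, y):
    """Spearman's rank correlation for paired data without ties."""
    rank_x, rank_y = rank_descending(x), rank_descending(y)
    n = len(x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


judge_a = [40, 48, 25, 32, 50, 41, 15, 18, 27, 36, 43, 49]
judge_b = [35, 40, 22, 25, 47, 38, 17, 19, 26, 33, 41, 39]

print(rank_descending(judge_a))
print(rank_descending(judge_b))
print(round(spearman_rs(judge_a, judge_b), 4))
```

Printing the two rank lists alongside the result makes it easy to compare with a hand calculation of d and d².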
