
Understanding Correlation Analysis

The document provides an overview of correlation analysis, explaining the concept of correlation, the simple correlation coefficient (Pearson's r), and its interpretation. It details the types of correlation, methods to compute correlation coefficients, and includes examples illustrating the relationships between variables. Additionally, it discusses Spearman Rank Correlation as a non-parametric measure for ordinal data.

Uploaded by

attikabasharat
Copyright
© All Rights Reserved

CORRELATION ANALYSIS

DR. FAHIM ULLAH KHAN


Associate Professor
Department of Agriculture
Hazara University, Mansehra
Correlation

◼ Correlation is a statistical technique used to determine the degree to which two variables are related.

◼ It is a measure of the magnitude and direction of the relationship between two numerical variables.

◼ The estimated correlation value may not be 100% accurate, but the stronger the relationship, the more accurate the estimate.
Simple Correlation coefficient (r)

➢ It is also called Pearson's correlation or the product-moment correlation coefficient.
➢ It measures the nature and strength of the association between two quantitative variables.
➢ The sign of r denotes the nature of the association, while the magnitude of r denotes its strength.
➢ If the sign is positive, the relation is direct: an increase in one variable is associated with an increase in the other, and a decrease in one with a decrease in the other.
➢ If the sign is negative, the relation is inverse (indirect): an increase in one variable is associated with a decrease in the other.
➢ The value of r ranges between -1 and +1.
➢ The strength of the association is read from the magnitude of r, as illustrated by the following diagram.

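[Diagram: scale of r from -1 to +1. r = -1: perfect indirect correlation; -1 to -0.75: strong; -0.75 to -0.25: intermediate; -0.25 to 0: weak; r = 0: no relation; 0 to 0.25: weak; 0.25 to 0.75: intermediate; 0.75 to 1: strong; r = +1: perfect direct correlation.]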
If r = 0: no association or correlation between the two variables.

If 0 < |r| < 0.25: weak correlation.

If 0.25 ≤ |r| < 0.75: intermediate correlation.

If 0.75 ≤ |r| < 1: strong correlation.

If |r| = 1: perfect correlation.
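The cut-offs listed above can be turned into a small helper. This is only a sketch: the function name `describe_correlation` is hypothetical, and it assumes the thresholds apply to the magnitude of r, so that negative values are graded the same way as positive ones (as the diagram suggests).

```python
def describe_correlation(r: float) -> str:
    """Label r with the 0.25 / 0.75 cut-offs, applied to |r| (an assumption)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    magnitude = abs(r)
    # The sign gives the nature of the association.
    direction = "direct" if r > 0 else "indirect"
    # The magnitude gives the strength of the association.
    if magnitude == 1:
        strength = "perfect"
    elif magnitude >= 0.75:
        strength = "strong"
    elif magnitude >= 0.25:
        strength = "intermediate"
    else:
        strength = "weak"
    return f"{strength} {direction} correlation"


print(describe_correlation(0.8))    # strong direct correlation
print(describe_correlation(-0.5))   # intermediate indirect correlation
print(describe_correlation(0.1))    # weak direct correlation
```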
Bivariate distribution
• A distribution showing the association between
two (2) numerical variables
• Usually illustrated via the Scatter diagram/ plot

Wt (kg)      67   69   85   83   74   81   97   92  114   85
SBP (mmHg)  120  125  140  160  130  180  150  140  200  130

[Figure: Scatter diagram of weight and systolic blood pressure. x-axis: Wt (kg), 60 to 120; y-axis: SBP (mmHg), 80 to 220.]


SCATTER PLOTS

The pattern of the data indicates the type of relationship between the two variables:
➢ Positive correlation (r > 0; the variables change in the same direction)
➢ Negative correlation (r < 0; the variables change in opposite directions)
➢ Zero correlation (r = 0; the variables show no association)
Types of correlation

SCHOOL OF NUTRITION AND DIETETICS • UNIVERSITI SULTAN ZAINAL ABIDIN
Interpretation
• A relationship exists between two variables, X and Y.
• This does not mean that every change in X will also alter Y.
• It does not imply a causal relationship (e.g. that X causes Y to occur).
• r = 0.50 does not mean a 50% relationship.
Interpretation
Correlation        Interpretation
-1.00 to -0.76     Strong negative correlation
-0.75 to -0.51     Good negative correlation
-0.50 to -0.26     Fair negative correlation
-0.25 to -0.01     Poor negative correlation
0                  No correlation
0.01 to 0.25       Poor positive correlation
0.26 to 0.50       Fair positive correlation
0.51 to 0.75       Good positive correlation
0.76 to 1.00       Strong positive correlation
Linear vs. Non-linear correlation
• Not all correlations are linear.
• Sometimes the correlation is non-linear, e.g. curvilinear:
– Scores are concentrated along a curved line.
• Pearson's coefficient is not suitable for quantifying this kind of relationship.
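A small illustration of why Pearson's coefficient can miss a curvilinear pattern. The data here are made up for the purpose (y = x² on a symmetric range of x), and `pearson_r` is a hypothetical helper written in the mean-deviation form of the coefficient. Although y is completely determined by x, the linear coefficient comes out as zero.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r in mean-deviation form (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up curvilinear data: every point lies exactly on the curve y = x^2.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(round(r, 3))  # 0.0
```

A scatter plot of such data shows the curve immediately, which is one reason to inspect the plot before relying on r.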

Simple example on correlation
• Research question: Is sleeping duration (hrs) correlated with exam score (marks)?
• Null hypothesis: There is no correlation between sleeping duration (hrs) and exam score (marks).
• Results: Suppose r = -0.89; p < 0.05; n = 220
• Interpretation: The correlation is negative, i.e. as sleeping duration increases, the exam score tends to decrease, and vice versa.
– The large magnitude of the correlation (nearing 1.00) means the association is strong.
[Figure panels: Positive relationship; Negative relationship (axis labels: Reliability, Age of Car); No relation]
How to compute the simple correlation coefficient (r)

$$
r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}
$$
Example:
A sample of 6 children was selected, and their ages in years and weights in kilograms were recorded as shown in the following table. Find the correlation between age and weight.

Serial No.   Age (years)   Weight (kg)
1            7             12
2            6             8
3            8             12
4            5             10
5            6             11
6            9             13
These two variables are quantitative. One variable (age) is called the independent variable and is denoted X; the other (weight) is called the dependent variable and is denoted Y. To find the relation between age and weight, compute the simple correlation coefficient using the following formula:

$$
r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}
$$
Serial No.   Age (years) (x)   Weight (kg) (y)   xy          x²          y²
1            7                 12                84          49          144
2            6                 8                 48          36          64
3            8                 12                96          64          144
4            5                 10                50          25          100
5            6                 11                66          36          121
6            9                 13                117         81          169
Total        ∑x = 41           ∑y = 66           ∑xy = 461   ∑x² = 291   ∑y² = 742

 xy − xy
r = n
 (  x) 2
 (  y) 2 
 x2 −  .  y −
2

 n  n 
  

461− 41 66
r = 6
 (41)2   (66)2 
291− .742 − 
 6   6 

r = 0.759
strong direct correlation
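The arithmetic of this example can be checked with a short script. It is a sketch that follows the sums-based formula used on these slides. The function name `pearson_from_sums` is hypothetical, and the data are the six age and weight pairs from the table.

```python
import math

# Age (years) and weight (kg) of the six children in the example.
ages = [7, 6, 8, 5, 6, 9]
weights = [12, 8, 12, 10, 11, 13]


def pearson_from_sums(xs, ys):
    """Simple correlation coefficient from the column totals."""
    n = len(xs)
    sum_x = sum(xs)                                # 41
    sum_y = sum(ys)                                # 66
    sum_xy = sum(x * y for x, y in zip(xs, ys))    # 461
    sum_x2 = sum(x * x for x in xs)                # 291
    sum_y2 = sum(y * y for y in ys)                # 742
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt(
        (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
    )
    return numerator / denominator


r = pearson_from_sums(ages, weights)
print(round(r, 2))  # 0.76
```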
EXAMPLE: Relationship between Anxiety and Test Scores

Anxiety (X)   Test score (Y)   X²          Y²          XY
10            2                100         4           20
8             3                64          9           24
2             9                4           81          18
1             7                1           49          7
5             6                25          36          30
6             5                36          25          30
∑X = 32       ∑Y = 32          ∑X² = 230   ∑Y² = 204   ∑XY = 129
Calculating Correlation Coefficient

$$
r = \frac{(6)(129) - (32)(32)}{\sqrt{\left(6(230) - 32^2\right)\left(6(204) - 32^2\right)}} = \frac{774 - 1024}{\sqrt{(356)(200)}} = -0.94
$$

r = -0.94
Indirect strong correlation
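The calculation above multiplies the sums by n instead of dividing by n as in the earlier formula. The two forms are algebraically the same, since numerator and denominator are both scaled by n. The sketch below checks this on the anxiety data. The variable names are illustrative only.

```python
import math

# Anxiety (X) and test score (Y) from the table.
anxiety = [10, 8, 2, 1, 5, 6]
scores = [2, 3, 9, 7, 6, 5]

n = len(anxiety)
sum_x, sum_y = sum(anxiety), sum(scores)            # 32, 32
sum_xy = sum(x * y for x, y in zip(anxiety, scores))  # 129
sum_x2 = sum(x * x for x in anxiety)                # 230
sum_y2 = sum(y * y for y in scores)                 # 204

# Form used in this example: sums multiplied by n.
r_scaled = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# Form used in the age/weight example: products of sums divided by n.
r_divided = (sum_xy - sum_x * sum_y / n) / math.sqrt(
    (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
)

print(round(r_scaled, 2))   # -0.94
print(round(r_divided, 2))  # -0.94
```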
Spearman Rank Correlation Coefficient (rs)
It is a non-parametric measure of correlation. The procedure makes use of the two sets of ranks that may be assigned to the sample values of X and Y.
The Spearman rank correlation coefficient can be computed in the following cases:
• Both variables are quantitative.
• Both variables are qualitative ordinal.
• One variable is quantitative and the other is qualitative ordinal.
Procedure:
1. Rank the values of X from 1 to n, where n is the number of pairs of values of X and Y in the sample.
2. Rank the values of Y from 1 to n.
3. Compute the value of di for each pair of observations by subtracting the rank of Yi from the rank of Xi.
4. Square each di and compute ∑di², the sum of the squared values.
5. Apply the following formula:

$$
r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}
$$

The value of rs denotes the magnitude and nature of the association and is interpreted in the same way as simple r.
Example
In a study of the relationship between level of education and income, the following data were obtained. Find the relationship between them and comment.

Sample   Level of education (X)   Income (Y)
A        Preparatory              25
B        Primary                  10
C        University               8
D        Secondary                10
E        Secondary                15
F        Illiterate               50
G        University               60
• RANKING
• The highest value should be labelled "1" and the lowest "7" (if the data set has more than 7 cases, the lowest rank equals the number of cases).
• Look carefully at the samples that share the same level of education (University, Secondary).
• Notice their joint ranks of 1.5 for University and 3.5 for Secondary.
• When two values in the data are identical (a "tie"), each takes the average of the ranks they would otherwise have occupied.
• We do this because, in this example, there is no way of knowing which score should be given rank 1 or 2 (University), or rank 3 or 4 (Secondary).
• Therefore, ranks 1 and 2 do not appear for University: they have been averaged ((1 + 2)/2 = 1.5) and the result assigned to each of the "tied" scores.
Education (X)   Arranged (X)   Rank (ignoring ties)   Rank (with ties)   Final rank (X)
Preparatory     University     1                      (1+2)/2 = 1.5      1.5
Primary         University     2                                         1.5
University      Secondary      3                      (3+4)/2 = 3.5      3.5
Secondary       Secondary      4                                         3.5
Secondary       Preparatory    5                      5                  5
Illiterate      Primary        6                      6                  6
University      Illiterate     7                      7                  7

Income (Y)   Arranged (Y)   Rank (ignoring ties)   Rank (with ties)   Final rank (Y)
25           60             1                      1                  1
10           50             2                      2                  2
8            25             3                      3                  3
10           15             4                      4                  4
15           10             5                      (5+6)/2 = 5.5      5.5
50           10             6                                         5.5
60           8              7                      7                  7
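The ranking with ties can be sketched as a small routine. The function name `average_ranks_descending` and the numeric coding of education levels in `EDUCATION_ORDER` are assumptions made for illustration. The coding simply follows the order shown in the arranged column (University highest, Illiterate lowest).

```python
def average_ranks_descending(values):
    """Rank so the largest value gets rank 1; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` over the run of values tied with the one at `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Mean of the 1-based positions start+1 .. end+1.
        shared = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


# Assumed ordinal coding of the education levels (higher = more education).
EDUCATION_ORDER = {"Illiterate": 0, "Primary": 1, "Preparatory": 2,
                   "Secondary": 3, "University": 4}

education = ["Preparatory", "Primary", "University", "Secondary",
             "Secondary", "Illiterate", "University"]   # samples A to G
income = [25, 10, 8, 10, 15, 50, 60]

rank_x = average_ranks_descending([EDUCATION_ORDER[e] for e in education])
rank_y = average_ranks_descending(income)
print(rank_x)  # [5.0, 6.0, 1.5, 3.5, 3.5, 7.0, 1.5]
print(rank_y)  # [3.0, 5.5, 7.0, 5.5, 4.0, 2.0, 1.0]
```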
Sample   Education (X)   Income (Y)   Rank X   Rank Y   di     di²
A        Preparatory     25           5        3        2      4
B        Primary         10           6        5.5      0.5    0.25
C        University      8            1.5      7        -5.5   30.25
D        Secondary       10           3.5      5.5      -2     4
E        Secondary       15           3.5      4        -0.5   0.25
F        Illiterate      50           7        2        5      25
G        University      60           1.5      1        0.5    0.25

∑di² = 64

$$
r_s = 1 - \frac{6 \times 64}{7(48)} \approx -0.14
$$
Comment:
There is an indirect weak correlation between level of education and income.
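The result can be reproduced from the final ranks with a few lines. This is a sketch using the simplified rank-difference formula from the slides. The function name `spearman_from_ranks` is hypothetical. When ties are present, as here, this formula is usually regarded as an approximation, and computing Pearson's r on the ranks is the common alternative.

```python
# Final ranks for samples A to G, taken from the table above.
rank_x = [5, 6, 1.5, 3.5, 3.5, 7, 1.5]   # education
rank_y = [3, 5.5, 7, 5.5, 4, 2, 1]       # income


def spearman_from_ranks(rx, ry):
    """rs = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), with d = rank X - rank Y."""
    n = len(rx)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


rs = spearman_from_ranks(rank_x, rank_y)
print(round(rs, 2))  # -0.14
```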
exercise
