PEARSON CORRELATION ANALYSIS (REGRESSION)
STATISTICAL TESTS
WHEN TO USE THE PEARSON
CORRELATION ANALYSIS AND WHAT
DOES IT DO?
• Use it when both the independent and the dependent variable are QUANTITATIVE.
• A Pearson correlation analysis tells you whether there is a direct or inverse linear
relation between the variables.
– Direct correlation: When one increases, the other tends to increase as well, and vice
versa.
– Inverse correlation: When one increases, the other tends to decrease, and vice versa.
• It does NOT tell you if one is causing the other. CORRELATION ≠ CAUSATION.
• To establish the cause of something you need further analysis: isolating variables
under strict conditions, with several repetitions.
WHAT DOES A
CORRELATION TEST RESULT
LOOK LIKE?
• The result is a coefficient (r) that measures how well the data fit a
straight line, either increasing or decreasing.
• This correlation coefficient can take values from -1 to +1.
• Interpretation of results within this range is somewhat arbitrary and depends heavily on the
topic of research, but these are the criteria commonly used:
Coefficient (r) between:    Interpretation
+1 and +0.7                 Strong correlation (direct)
+0.7 and +0.4               Moderate correlation (direct)
+0.4 and 0                  Weak or no correlation (direct)
0 and -0.4                  Weak or no correlation (inverse)
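The criteria above can be turned into a small lookup. This is only a sketch: `interpret_r` is a hypothetical helper name, the choice to put the exact boundary values 0.7 and 0.4 in the stronger band is an assumption, and so is mirroring the positive bands for negative coefficients, since the table lists only part of the negative range.

```python
def interpret_r(r):
    """Label a Pearson coefficient using the 0.4 and 0.7 cut-offs."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    # Assumption: a value exactly on a boundary falls in the stronger band.
    if size >= 0.7:
        strength = "Strong correlation"
    elif size >= 0.4:
        strength = "Moderate correlation"
    else:
        strength = "Weak or no correlation"
    # Assumption: negative coefficients mirror the positive bands.
    direction = "direct" if r >= 0 else "inverse"
    return f"{strength} ({direction})"


print(interpret_r(0.85))   # Strong correlation (direct)
print(interpret_r(-0.25))  # Weak or no correlation (inverse)
```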
GRAPHICAL DISPLAY
[Figure: scatter plots for r close to +1, r between 0.4 and 0.7, r close to 0, r between -0.4 and -0.7, and r close to -1.]
PEARSON CORRELATION
TEST IN EXCEL
• In the first column, enter the values of the independent variable (if
there are repetitions, put them all in this column).
• In the second column, enter the value obtained for the dependent
variable right next to the corresponding independent variable value.
• In an empty cell, type =PEARSON(X values, Y values).
– X values refers to the values of the independent variable.
– Y values refers to the values of the dependent variable.
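For readers who want to see what a function like =PEARSON computes, here is a minimal Python sketch of the usual definition of r (the sum of products of deviations divided by the square root of the product of the sums of squared deviations). The function name `pearson_r` and the four data points are made up for illustration. They are not taken from these slides.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products and squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Illustrative data only: y is exactly 2 * x, so the points lie on a line.
x_values = [1, 2, 3, 4]
y_values = [2, 4, 6, 8]
print(pearson_r(x_values, y_values))  # 1.0
```

Reversing the y values gives a perfectly decreasing line and a coefficient of -1.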
BUT WE ALSO NEED THE
P-VALUE!
• The correlation coefficient only means something IF there is a correlation, that is, IF
the null hypothesis (there is no linear correlation between the variables) is rejected.
• The p-value tells us whether there is a correlation to begin with.
• To calculate it, we use the following formula in Excel: =TDIST(t-statistic,
degrees of freedom, 2). The final 2 asks for a two-tailed test.
• This means we have to calculate two numbers before we can apply this
formula:
– t-statistic
– Degrees of freedom
CALCULATING THE P-VALUE
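• To get the t-statistic:
  $t = |r| \sqrt{\frac{n - 2}{1 - r^2}}$
• Where |r| is the ABSOLUTE VALUE of the correlation coefficient (TDIST does not accept
a negative t-statistic), and n is the number of data points.
• To calculate the degrees of freedom:
  $df = n - 2$
• Where n is, once again, the number of data points.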
• Do not fear! It’s much easier than you think.
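As a quick check of how little arithmetic is involved, the sketch below uses the standard formulas for a correlation test, t = |r| * sqrt((n - 2) / (1 - r^2)) and df = n - 2. The function names and the numbers r = 0.5, n = 5 are illustrative assumptions, chosen so the result is easy to verify by hand.

```python
import math


def t_statistic(r, n):
    """t-statistic for a Pearson coefficient r computed from n data points."""
    if n < 3:
        raise ValueError("need at least 3 data points")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    # Use the absolute value of r so the result is never negative.
    return abs(r) * math.sqrt((n - 2) / (1 - r * r))


def degrees_of_freedom(n):
    """Degrees of freedom for the correlation test."""
    return n - 2


# Illustrative numbers, not data from the slides.
r, n = 0.5, 5
print(t_statistic(r, n))      # 1.0
print(degrees_of_freedom(n))  # 3
```

These two numbers are exactly what goes into =TDIST(t-statistic, degrees of freedom, 2).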
EXAMPLE
• You want to see the relation between weight and height, so you gather 10
volunteers and measure both for each one of them:
Height (m) Weight (kg)
1.60 80.53
1.85 93.85
1.83 90.58
1.70 86.12
1.72 88.34
1.63 80.30
1.75 89.87
1.81 92.13
1.81 89.79
1.77 89.41
REPORT THE RESULTS
• The Pearson coefficient (r) obtained was ________.
• The p-value obtained was ______, which is ______ than alpha (0.05).
• There is _____________ correlation between height and weight.
REGRESSION
• Once you know there is a linear pattern, you can predict the value of the dependent
variable for any given value of the independent variable, even one that was not
recorded in your data.
• For example, you could predict the weight of a person whose height is 2.05m.
• For this you need a model, an equation.
• You need to perform a linear regression analysis to obtain the equation.
• Or Excel can do it for you using a graph: add a linear trendline to a scatter plot and
display its equation on the chart.
• CAREFUL: Do not use the Data Analysis ToolPak for this unless you want to
be overwhelmed with numbers. It gives a much more complete analysis, but you need to
know how to read it properly.
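For comparison with the equation Excel shows on the graph, here is a minimal least-squares sketch in Python, using the height and weight table from the example. The function name `fit_line` is a hypothetical helper. The formulas are the ordinary least-squares ones for a straight line: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x).

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


heights = [1.60, 1.85, 1.83, 1.70, 1.72, 1.63, 1.75, 1.81, 1.81, 1.77]
weights = [80.53, 93.85, 90.58, 86.12, 88.34, 80.30, 89.87, 92.13, 89.79, 89.41]

slope, intercept = fit_line(heights, weights)
predicted = slope * 2.05 + intercept  # the 2.05 m person from the example
print(f"weight = {slope:.2f} * height + ({intercept:.2f})")
print(f"Predicted weight at 2.05 m: {predicted:.1f} kg")
```

Keep in mind that 2.05 m lies outside the measured heights (1.60 m to 1.85 m), so this prediction assumes the same straight line keeps holding beyond the data.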
IMPORTANT DETAILS
• Not all correlations are linear.
• To test a non-linear correlation, we need other tools (we will not cover those).
• This test rests on assumptions that the data should meet.
We will not check them strictly, for the sake of simplicity (at the price of
precision).
• You need to report all three: the correlation coefficient, the p-value, and the model
(equation).