Correlation and Simple Linear Regression Guide

This document provides an overview of correlation and simple linear regression analyses. It defines correlation analysis as measuring the strength of the linear or nonlinear relationship between two continuous variables. Simple linear regression focuses on evaluating the impact of a predictor variable on an outcome variable. The document outlines the Pearson correlation coefficient and Spearman's rank correlation coefficient, and explains how to interpret their values. It also describes the key aspects and assumptions of simple linear regression models, including the least squares method for estimating regression parameters.

Uploaded by

Marianne Christie Ragay

Correlation and Simple Linear Regression Analyses

Objectives
At the end of this module, the students will be able to:
1. Describe the correlation and simple linear regression analyses;
2. Appreciate the importance of correlation and simple linear regression analyses; and
3. Apply the concepts of correlation and simple linear regression analyses.

A. Introduction to Correlation and Simple Linear Regression Analysis

(Lind et al., 2006; Mann, 2004; Freund and Simon, 1997)

Analyses between two variables may focus on:

(a) any association between the variables;
(b) the value of one variable in predicting the other; and
(c) the magnitude or strength of the relationship.

For example:
• Family income and expenditure on luxury items.
• Sales revenue and expenses incurred on advertising.
• Yield of a crop and quantity of fertilizer applied.

Aspects to consider in examining the statistical relationship between two or more variables:
• Is there an association between two or more variables? If yes, what is the form and degree of that relationship?
• Is the relationship strong or significant enough to arrive at a desirable conclusion?
• Can the relationship be used for predictive purposes, that is, to predict the most likely value of a dependent
variable corresponding to the given value of the independent variable or variables?

The objective of correlation analysis is to gain insight into the strength of the relationship, whereas regression
analysis focuses on the form of the relationship between variables. These two techniques are used to investigate
relationships between continuous variables. Correlation analysis is often conducted in a retrospective or
observational study. On the other hand, simple regression analysis is preferred when the aim is to evaluate the relative
impact of the predictor variable on the particular outcome.

Although correlation and regression analyses are mathematically similar, their purposes are different.
Correlation analysis is generally overused. It is often interpreted incorrectly (to establish “causation”) and should be
reserved for generating hypotheses rather than for testing them. On the other hand, regression modeling is a more
useful statistical technique that allows us to assess the strength of the relationships in the data and the uncertainty
in the model by using confidence intervals.

B. Correlation Analysis

Correlation analysis measures and interprets the strength of a linear or nonlinear (eg, exponential,
polynomial, and logistic) relationship between two continuous variables. When conducting correlation analysis, the
term association is used to mean “linear association”.

For this course, the focus is on two commonly used correlation coefficients, the Pearson r and the Spearman ρ,
which measure linear and monotonic (not necessarily linear) relationships, respectively, between two continuous
variables. Both coefficients take on values between -1 and +1, ranging from negatively correlated (-1) to
uncorrelated (0) to positively correlated (+1). The sign of the correlation coefficient (ie, positive or negative)
defines the direction of the relationship. The absolute value indicates the strength of the correlation.

TABLE 1. Interpretation of Correlation Coefficient

Correlation Coefficient Value    Direction and Strength of Correlation
-1.0                             Perfectly negative
-0.8                             Strongly negative
-0.5                             Moderately negative
-0.2                             Weakly negative
 0.0                             No association
+0.2                             Weakly positive
+0.5                             Moderately positive
+0.8                             Strongly positive
+1.0                             Perfectly positive

Note.—The sign of the correlation coefficient (ie, positive or negative) defines the direction of the
relationship. The absolute value indicates the strength of the correlation.

1. Linear Correlation

The Pearson correlation coefficient is also known as the sample correlation coefficient (r), product-
moment correlation coefficient, or coefficient of correlation. It was introduced by Galton in 1877 and developed
later by Pearson. It measures the linear relationship between two random variables.
For example, when the value of the predictor is manipulated (increased or decreased) by a fixed amount, the
outcome variable changes proportionally (linearly). A linear correlation coefficient can be computed by means of the
data and their sample means. When a scientific study is planned, the required sample size may be computed on the
basis of a certain hypothesized value of the correlation coefficient with the desired statistical power at a
specified level of significance.
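
As a minimal sketch of how r can be computed from the data and their sample means, the Python function below divides the sum of cross-products of deviations by the square root of the product of the two sums of squared deviations. The function name pearson_r and the fertilizer and yield figures are hypothetical and are used only for illustration.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation computed from deviations about the sample means."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: quantity of fertilizer applied and yield of a crop
fertilizer = [1.0, 2.0, 3.0, 4.0, 5.0]
crop_yield = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(fertilizer, crop_yield)
print(f"Pearson r = {r:.3f}")
```

With these made-up values the coefficient is close to +1, which Table 1 would describe as a strongly positive association.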

2. Rank Correlation

The Spearman ρ is the sample correlation coefficient (rs or rho) of the ranks (the relative order) based on
continuous data. It was first introduced by Spearman in 1904. The Spearman ρ is used to measure the monotonic
relationship between two variables (ie, whether one variable tends to take either a larger or smaller value, though not
necessarily linearly, as the value of the other variable increases).

Linear vs Rank Correlation Coefficients

The Pearson correlation coefficient necessitates use of interval or continuous measurement scales of the
measured outcome in the study population. In contrast, rank correlations also work well with ordinal rating data, and
continuous data are reduced to their ranks. The rank procedure will also be illustrated briefly with our example data.
The smallest value in the sample has rank 1, and the largest has the highest rank. In general, rank correlations are not
easily influenced by the presence of skewed data or data that are highly variable.
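
The ranking step can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation: the helper names average_ranks and spearman_rho are hypothetical, ties are assumed to share the average of their ranks, and the data are made up. The Spearman coefficient is obtained here by applying the Pearson formula to the ranks.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest) to n (largest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Sample Pearson correlation from deviations about the sample means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical monotonic but nonlinear (exponential) relationship
x = [1, 2, 3, 4, 5]
y = [math.exp(v) for v in x]
print(f"Pearson r    = {pearson_r(x, y):.3f}")
print(f"Spearman rho = {spearman_rho(x, y):.3f}")
```

Because y always increases with x, the ranks agree perfectly and the rank correlation equals 1, while the Pearson coefficient is lower because the relationship is not a straight line.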

Limitations and Precautions

It is worth noting that even if two variables (eg, cigarette smoking and lung cancer) are highly correlated, this
is not sufficient proof of causation. One variable may cause the other or vice versa, a third factor may be involved,
or a rare event may have occurred. To conclude causation, the causal variable must precede the variable it causes, and
several conditions must be met (eg, reversibility, strength, and exposure response on the basis of the Bradford-Hill
criteria or the Rubin causal model).

C. Simple Linear Regression

Linear Regression:
When the dependence of one variable on another is represented by a straight line, it is called linear regression;
otherwise it is said to be nonlinear or curvilinear regression.
For example, if ‘X’ is the independent variable and ‘Y’ is the dependent variable, then the relation Y = a + bX is
a linear regression.

The purpose of simple regression analysis is to evaluate the relative impact of a predictor variable on a
particular outcome. This is different from a correlation analysis where the purpose is to examine the strength and
direction of the relationship between two random variables.

Linear regression attempts to find a straight line that best “fits” the data, where the variation of the data
above and below the line is minimized. For this course, only the linear regression of one continuous variable on
another continuous variable with no gaps on each measurement scale is dealt with as an introductory topic. Higher
levels of regression analysis are dealt with at the postgraduate level. There are other types of regression (eg, multiple
linear, logistic, and ordinal) analyses, which are beyond the scope of this course in inferential statistics.

A simple regression model contains only one independent (explanatory) variable, Xi, for i = 1, . . ., n subjects,
and is linear with respect to both the regression parameters and the dependent variable. The corresponding dependent
(outcome) variable is labeled Yi. The model is expressed as

Yi = a + bXi + ei        (1)

where the regression parameter a is the intercept (on the y axis), and the regression parameter b is the slope of the
regression line (Fig 1). The random error terms ei are assumed to be uncorrelated, with a mean of 0 and constant
variance.
For convenience in inference and improved efficiency in estimation, analyses often adopt the additional
assumption that the errors are normally distributed. Transformation of the data to achieve normality may be applied.
Thus, the acronym LINE (linear, independent, normal, equal variance) summarizes these requirements.

Figure 1. Simple linear regression model shows that the expectation of the dependent variable Y is linear in the
independent variable X, with an intercept a = 1.0 and a slope b = 2.0.

Typical steps for regression model analysis are the following:
(a) determine if the assumptions underlying a normal relationship are met in the data,
(b) obtain the equation that best fits the data,
(c) evaluate the equation to determine the strength of the relationship for prediction and estimation, and
(d) assess whether the data fit these criteria before the equation is applied for prediction and estimation.

Least Squares Method

The main goal of linear regression is to fit a straight line through the data that predicts Y based on X. To
estimate the intercept and slope regression parameters that determine this line, the least squares method is commonly
used. It is not necessary for the errors to have a normal distribution, although the regression analysis is more efficient
with this assumption. With this regression method, a set of regression parameters is found such that the sum of
squared residuals (ie, the differences between the observed values of the outcome variable and the fitted values) is
minimized. The fitted y value is then computed as a function of the given x value and the estimated intercept and slope
regression parameters. For example, in Eq. 1, once the estimates of a and b are obtained from the regression analysis,
the predicted y value at any given x value is calculated as a + bx.
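
The least squares estimates have a well-known closed form for the simple linear model: the slope b is the sum of cross-products of deviations divided by the sum of squared deviations of x, and the intercept a is the mean of y minus b times the mean of x. The Python sketch below illustrates this; the function names fit_line and predict and the data values are hypothetical.

```python
def fit_line(x, y):
    """Least squares estimates (a, b) for the model y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("the slope is undefined when all x values are equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope that minimizes the sum of squared residuals
    a = mean_y - b * mean_x  # intercept: the fitted line passes through the means
    return a, b


def predict(a, b, x_new):
    """Fitted y value at x_new, calculated as a + b*x_new."""
    return a + b * x_new


# Hypothetical observations scattered around the line y = 1 + 2x
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.0]
a, b = fit_line(x, y)
residuals = [yi - predict(a, b, xi) for xi, yi in zip(x, y)]
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")
print(f"sum of squared residuals = {sum(e ** 2 for e in residuals):.3f}")
```

For these made-up data the estimates are a = 1.06 and b = 1.97, and the residuals sum to zero (up to rounding), as they always do for a least squares line with an intercept.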

Coefficient of Determination, R²

It is meaningful to interpret the value of the Pearson correlation coefficient r by squaring it; hence, the term
R-square (R²) or coefficient of determination. This measure (with a range of 0–1) is the fraction of the variability in Y
that can be explained by the variability in X through their linear relationship, or vice versa. That is, R² =
SSregression/SStotal, where SS stands for the sum of squares. Note that R² is calculated only on the basis of the
Pearson correlation coefficient in the linear regression analysis. Thus, it is not appropriate to compute R² on the basis
of rank correlation coefficients such as the Spearman ρ.
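
The identity between the ratio SSregression/SStotal and the squared Pearson coefficient can be checked numerically. The Python sketch below fits the least squares line, computes both quantities, and compares them. The function names and data are hypothetical and serve only as an illustration.

```python
import math


def sums_of_squares(x, y):
    """Return (sxx, syy, sxy): sums of squared deviations and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxx, syy, sxy


def r_squared(x, y):
    """Coefficient of determination, SSregression / SStotal, for the fitted line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx, syy, sxy = sums_of_squares(x, y)
    b = sxy / sxx                       # least squares slope
    a = mean_y - b * mean_x             # least squares intercept
    fitted = [a + b * xi for xi in x]
    ss_regression = sum((fi - mean_y) ** 2 for fi in fitted)
    ss_total = syy
    return ss_regression / ss_total


# Hypothetical data
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.0]
sxx, syy, sxy = sums_of_squares(x, y)
r = sxy / math.sqrt(sxx * syy)          # Pearson correlation coefficient
print(f"R^2 from sums of squares = {r_squared(x, y):.4f}")
print(f"r squared                = {r ** 2:.4f}")
```

Both printed values agree, which is why R² may be read as the squared Pearson coefficient in simple linear regression.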

Limitations and Precautions

The following points should be considered when regression analysis is performed.

(a) To understand whether the assumptions have been met, determine the magnitude of the gap between the data
and the assumptions of the model.
(b) No matter how strong a relationship is demonstrated with regression analysis, it should not be interpreted as
causation (as in the correlation analysis).
(c) The regression should not be used to predict or estimate outside the range of values of the independent
variable of the sample (eg, extrapolation of radiation cancer risk from the Hiroshima data to that of diagnostic
radiologic tests).
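
Precaution (c) can be enforced in code. The Python sketch below shows one possible guard, assuming the intercept a and slope b have already been estimated: it refuses to return a prediction when the requested x value lies outside the range of the observed x values. The function name predict_within_range and all numbers are hypothetical.

```python
def predict_within_range(a, b, x_new, x_observed):
    """Return a + b*x_new only if x_new lies inside the observed range of x."""
    lowest, highest = min(x_observed), max(x_observed)
    if not lowest <= x_new <= highest:
        # Prediction here would be an extrapolation beyond the sample data
        raise ValueError(
            f"x = {x_new} is outside the observed range [{lowest}, {highest}]"
        )
    return a + b * x_new


# Hypothetical fitted line y = 1 + 2x, estimated from x values between 0 and 4
x_observed = [0.0, 1.0, 2.0, 3.0, 4.0]
print(predict_within_range(1.0, 2.0, 2.5, x_observed))   # interpolation is allowed

try:
    predict_within_range(1.0, 2.0, 10.0, x_observed)      # extrapolation is refused
except ValueError as err:
    print("Refused:", err)
```

Raising an error is only one design choice. A warning may be more suitable when an analyst wants to see the extrapolated value while being reminded that it is not supported by the data.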

Summary

When correlation analysis is conducted to measure the association between two random variables, either the
Pearson linear correlation coefficient r or the Spearman rank correlation coefficient ρ may be adopted. The former
coefficient is used to measure the linear relationship but is not recommended for use with skewed data or data with
extremely large or small values (often called outliers). In contrast, the latter coefficient is used to measure a
general association, and it is recommended for use with data that are skewed or that have outliers.

When simple regression analysis is conducted to assess the linear relationship of a dependent variable as a
function of the independent variable, caution must be used when determining which of the two variables is viewed as
the independent variable that makes sense clinically. A useful graphical aid is a scatterplot.

Figure 2. Scatterplots of four sets of data generated by means of the following Pearson correlation coefficients
(from left to right): r = 0 (uncorrelated data), r = 0.8 (strongly positively correlated), r = 1.0 (perfectly positively
correlated), and r = -1 (perfectly negatively correlated).

Once the regression line is obtained, caution should also be used to avoid predicting a y value for any value
of x that is outside the range of the data. Finally, correlation and regression analyses do not establish causality, and
more rigorous analyses are required if causal inference is to be made.
