Chapter 7: Correlation and Regression

Chapter 7 covers correlation and regression analysis, explaining the concept of correlation, correlation coefficients, and the distinction between correlation and causation. It details regression techniques, including simple and multiple linear regression, and discusses their assumptions, coefficient estimation, and model evaluation metrics. It also introduces advanced regression models and data transformation methods for improving model fit.

Uploaded by saltwatersouls65

CHAPTER 7

CORRELATION AND REGRESSION ANALYSIS

1. CORRELATION ANALYSIS
Correlation Analysis — a statistical method used to evaluate the strength and direction of
the linear relationship between two quantitative variables. It measures how closely two
variables move together.
Dependent Variable — the variable whose value is being predicted or explained. It is the
outcome or response variable in the analysis, often denoted as Y.
Independent Variable — the variable used to predict or explain changes in the dependent
variable. It is the predictor or explanatory variable, often denoted as X.
Scatter Diagram (Scatter Plot) — a graphical representation that shows the relationship
between two variables. Each point on the graph represents a pair of values (X, Y). It is used
to visually assess whether a relationship exists between two variables.
Positive Correlation — a relationship between two variables where both variables move in
the same direction. When one variable increases, the other also increases.
Negative Correlation — a relationship between two variables where the variables move in
opposite directions. When one variable increases, the other decreases.
Zero Correlation (No Correlation) — a situation where there is no linear relationship
between two variables. Changes in one variable do not linearly predict changes in the other,
although a nonlinear relationship (for example, a U-shaped pattern) may still exist.
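Perfect Positive Correlation — a relationship where the correlation coefficient equals
exactly +1, meaning all data points lie exactly on a straight line with a positive slope: every
one-unit increase in one variable is matched by the same fixed increase in the other.
Perfect Negative Correlation — a relationship where the correlation coefficient equals
exactly -1, meaning all data points lie exactly on a straight line with a negative slope: every
one-unit increase in one variable is matched by the same fixed decrease in the other.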

2. THE CORRELATION COEFFICIENT

Correlation Coefficient — a numerical measure that quantifies the strength and direction of
the linear relationship between two variables. It ranges from -1 to +1.
• A value close to +1 indicates a strong positive relationship.
• A value close to -1 indicates a strong negative relationship.
• A value close to 0 indicates little to no linear relationship.

a. Pearson's Correlation Coefficient

Pearson's Correlation Coefficient (r) — also called the Pearson Product-Moment
Correlation Coefficient, it measures the linear relationship between two continuous,
quantitative variables. It captures only linear relationships, and significance tests on r
assume that the two variables are jointly (bivariate) normally distributed. It is the most
commonly used correlation coefficient.
The value of r always falls between -1 and +1:
• r = +1 → perfect positive linear relationship
• r = -1 → perfect negative linear relationship
• r = 0 → no linear relationship
Strength Interpretation of Pearson's r (a common rule of thumb; exact cutoffs vary between textbooks and fields):
• ±0.00 to ±0.19 — Very Weak Correlation
• ±0.20 to ±0.39 — Weak Correlation
• ±0.40 to ±0.59 — Moderate Correlation
• ±0.60 to ±0.79 — Strong Correlation
• ±0.80 to ±1.00 — Very Strong Correlation
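The section does not write out the formula for r; the standard sample formula is r = Σ(X - X̄)(Y - Ȳ) / √[Σ(X - X̄)² · Σ(Y - Ȳ)²]. The Python sketch below computes it directly and maps |r| onto the rule-of-thumb bands above. The function names and the study-hours data are illustrative assumptions, and values that fall between the listed bands (for example 0.195) are assigned to the lower band here.

```python
import math


def pearson_r(x, y):
    """Pearson's r: co-variation of x and y divided by the product of their spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, with at least 2 pairs")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


def strength_label(r):
    """Map |r| onto the rule-of-thumb strength bands."""
    a = abs(r)
    if a < 0.20:
        return "Very Weak"
    if a < 0.40:
        return "Weak"
    if a < 0.60:
        return "Moderate"
    if a < 0.80:
        return "Strong"
    return "Very Strong"


hours = [1, 2, 3, 4, 5]          # hypothetical hours studied
score = [52, 55, 61, 64, 70]     # hypothetical exam scores
r = pearson_r(hours, score)
print(round(r, 3), strength_label(r))  # → 0.993 Very Strong
```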

b. Spearman's Rank Correlation Coefficient

Spearman's Rank Correlation Coefficient (rₛ or ρ) — a nonparametric measure of the
monotonic relationship between two variables. Instead of using the actual data values, it
uses the ranks of the data. It is used when the data are ordinal, or when the assumptions of
Pearson's correlation (normality, linearity) are not met.
Rank — the position of a data value when all values are arranged in ascending or
descending order.
Monotonic Relationship — a relationship where as one variable increases, the other
variable either consistently increases or consistently decreases, but not necessarily at a
constant rate.
Tied Ranks — when two or more data values are equal, they are assigned the average of
the ranks they would have occupied.
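Because Spearman's coefficient is Pearson's r applied to ranks, it can be sketched in Python by ranking each variable (averaging tied ranks as described above) and then correlating the ranks. The familiar shortcut rₛ = 1 - 6Σd² / [n(n² - 1)] gives the same value only when there are no ties, so this sketch uses the rank-correlation form instead. Function names and data are illustrative.

```python
import math


def average_ranks(values):
    """Rank values in ascending order (1 = smallest); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j across the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as Pearson's r on the (tie-averaged) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


print(average_ranks([10, 20, 20, 30]))                  # → [1.0, 2.5, 2.5, 4.0]
# Monotonic but curved relationship: Y = X squared
print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # → 1.0
```

A perfectly monotonic but nonlinear pattern such as Y = X² gets rₛ = 1, whereas Pearson's r on the same data is below 1.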

3. CORRELATION AND CAUSATION

Correlation and Causation — the principle that a correlation between two variables does
not by itself show that one variable causes the other; correlation alone cannot establish a
cause-and-effect relationship.
Causation — a relationship in which a change in one variable directly produces a change in
another variable.
Spurious Correlation — a correlation between two variables that appears to be meaningful
but is actually caused by a third variable (confounding variable) or is purely coincidental.
Confounding Variable — a third variable that influences both the dependent and the
independent variable. When it is left out of the analysis, it can create a misleading
correlation between them.
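A small simulation can make the confounding idea concrete. In the Python sketch below (assuming NumPy is available), a hypothetical confounder Z drives both X and Y, while X has no effect on Y at all. X and Y still show a clear correlation; once Z's linear effect is removed from both (correlating the residuals of each after regressing it on Z, which gives the partial correlation), the association largely disappears. The sample size and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)     # fixed seed so the run is repeatable
n = 1_000
z = rng.normal(size=n)             # hypothetical confounder
x = z + rng.normal(size=n)         # X depends only on Z
y = z + rng.normal(size=n)         # Y depends only on Z, not on X

raw_r = np.corrcoef(x, y)[0, 1]


def remove_linear_effect(v, z):
    """Residuals of v after a simple least-squares fit on z."""
    slope, intercept = np.polyfit(z, v, 1)
    return v - (intercept + slope * z)


# correlation of the two residual series = partial correlation of X and Y given Z
partial_r = np.corrcoef(remove_linear_effect(x, z), remove_linear_effect(y, z))[0, 1]
print(f"corr(X, Y) = {raw_r:.2f}, after controlling for Z = {partial_r:.2f}")
```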

4. REGRESSION ANALYSIS
Regression Analysis — a statistical technique used to model and analyze the relationship
between a dependent variable and one or more independent variables. It is used to predict
the value of the dependent variable based on the values of the independent variables.
Least Squares Principle — the method used in regression analysis to determine the best-
fitting line by minimizing the sum of the squared differences (residuals) between the
observed values and the values predicted by the regression line. It produces the line that
results in the smallest possible total squared error.
Regression Line — the line that best fits the data in a scatter plot, determined using the
least squares principle. It represents the predicted relationship between the dependent and
independent variables.
Residual (e) — the difference between the observed value of the dependent variable and the
value predicted by the regression model (e = Y - Ŷ). It represents the portion of the
dependent variable that is not explained by the independent variable(s), and it serves as the
sample estimate of the unobservable error term ε in the population model.

5. SIMPLE LINEAR REGRESSION

Simple Linear Regression — a regression model that examines the linear relationship
between one dependent variable (Y) and one independent variable (X). The model is
expressed as:
Y = β₀ + β₁X + ε
Where:
• Y = dependent variable
• β₀ = y-intercept (the mean value of Y when X = 0)
• β₁ = slope (the change in the mean value of Y for every one-unit increase in X)
• ε = error term
Y-Intercept (β₀ or b₀) — the value of the dependent variable Y when the independent
variable X is equal to zero. It is the point where the regression line crosses the Y-axis.
Slope (β₁ or b₁) — the change in the dependent variable Y for every one-unit increase in
the independent variable X. It represents the rate of change in the relationship between X
and Y.

a. Assumptions of Simple Linear Regression

Linearity — the relationship between the dependent variable and the independent variable
must be linear.
Independence — the observations (residuals) must be independent of each other. There
should be no pattern or relationship between the residuals.
Homoscedasticity — the variance of the residuals (errors) must be constant across all
levels of the independent variable. The spread of the residuals should be roughly equal
throughout.
Normality — the residuals (error terms) must be normally distributed.
No Multicollinearity — this assumption does not apply to simple linear regression, which
has only one independent variable; it becomes important in multiple regression, where the
independent variables can be correlated with one another.

b. Estimating the Coefficients of the Simple Linear Regression Model

Estimated Regression Equation — the sample-based equation used to predict the
dependent variable, written as:
Ŷ = b₀ + b₁X
Where:
• Ŷ (Y-hat) = predicted value of Y
• b₀ = estimated y-intercept
• b₁ = estimated slope
b₁ (Estimated Slope) — computed using the sample data to estimate the true population
slope β₁. It tells us how much Y changes for every one-unit change in X.
b₀ (Estimated Y-Intercept) — the estimated value of Y when X equals zero, based on
sample data.
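The section does not list the computing formulas; the standard least-squares estimates are b₁ = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)² and b₀ = Ȳ - b₁X̄. The Python sketch below applies them and also computes the residuals e = Y - Ŷ. The function name and data are hypothetical.

```python
def fit_simple_ols(x, y):
    """Least-squares estimates (b0, b1) for the estimated equation Y-hat = b0 + b1*X."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx               # estimated slope
    b0 = mean_y - b1 * mean_x    # the fitted line passes through (X-bar, Y-bar)
    return b0, b1


hours = [1, 2, 3, 4, 5]          # hypothetical hours studied
score = [52, 55, 61, 64, 70]     # hypothetical exam scores
b0, b1 = fit_simple_ols(hours, score)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(hours, score)]
print(f"Y-hat = {b0:.1f} + {b1:.1f}X")  # → Y-hat = 46.9 + 4.5X
```

With an intercept in the model, least-squares residuals always sum to zero, which is a quick sanity check on a hand calculation.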

c. Estimating the Standard Error

Standard Error of the Estimate (Sₑ) — also called the standard error of the regression, it
measures the typical distance between the observed values and the regression line. It is
computed as Sₑ = √(SSE / (n - 2)), where SSE is the error sum of squares and n is the
number of observations. A smaller standard error indicates more accurate predictions and a
better-fitting model.
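A minimal Python sketch of Sₑ = √(SSE / (n - 2)) on a small hypothetical data set; the n - 2 divisor reflects the two estimated coefficients b₀ and b₁. The function name is illustrative.

```python
import math


def standard_error_of_estimate(x, y):
    """S_e = sqrt(SSE / (n - 2)) for the simple least-squares line of y on x."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations (n - 2 degrees of freedom)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
          / sum((xi - mean_x) ** 2 for xi in x))
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # error sum of squares
    return math.sqrt(sse / (n - 2))


se = standard_error_of_estimate([1, 2, 3, 4, 5], [52, 55, 61, 64, 70])
print(round(se, 3))  # → 0.949
```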

d. Constructing the Confidence and Prediction Intervals

Confidence Interval for the Mean Response — an interval estimate of the mean (average)
value of the dependent variable Y for a given value of X. It gives a range of plausible values
for the true mean of Y at that X value.
Prediction Interval — an interval estimate for the value of an individual observation of the
dependent variable Y for a given value of X. It is wider than the confidence interval because
it accounts for additional uncertainty in predicting a single value rather than a mean.
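At a chosen value X₀, both intervals are centred on Ŷ₀ = b₀ + b₁X₀. The confidence interval uses the half-width t · Sₑ · √[1/n + (X₀ - X̄)² / Σ(X - X̄)²], and the prediction interval adds 1 under the square root, which is why it is always wider. The Python sketch below takes the critical value t (with n - 2 degrees of freedom) as an argument, on the assumption that it is looked up from a t table; the data and the value 3.182 (two-sided 95%, 3 degrees of freedom) are for illustration.

```python
import math


def intervals_at(x, y, x0, t_crit):
    """Return (y_hat, confidence_interval, prediction_interval) at X = x0.

    t_crit is the two-sided t critical value with n - 2 degrees of freedom,
    supplied by the caller (for example, from a t table).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2))                    # standard error of the estimate
    y_hat = b0 + b1 * x0
    h = 1 / n + (x0 - mean_x) ** 2 / sxx             # grows as x0 moves away from X-bar
    ci_half = t_crit * se * math.sqrt(h)             # mean response
    pi_half = t_crit * se * math.sqrt(1 + h)         # one new observation
    return y_hat, (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)


y_hat, conf_int, pred_int = intervals_at([1, 2, 3, 4, 5], [52, 55, 61, 64, 70], x0=3, t_crit=3.182)
print(conf_int, pred_int)
```

Both intervals are narrowest at X₀ = X̄ and widen as X₀ moves away from the centre of the data.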

e. Coefficient of Determination
Coefficient of Determination (R²) — a measure that indicates what proportion or
percentage of the total variation in the dependent variable Y is explained by the independent
variable(s) in the regression model. It ranges from 0 to 1 (or 0% to 100%).
• R² = 1 (or 100%) means the model perfectly explains all the variation in Y.
• R² = 0 means the model explains none of the variation in Y.
Total Sum of Squares (SST) — the total variation in the dependent variable Y around its
mean. It measures how much Y values deviate from the mean of Y.
Regression Sum of Squares (SSR) — the portion of the total variation in Y that is
explained by the regression model (i.e., by the independent variable X).
Error Sum of Squares (SSE) — the portion of the total variation in Y that is NOT explained
by the regression model. It represents the unexplained variation or residual variation.
Relationship: SST = SSR + SSE
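The three sums of squares can be computed directly from a fitted line: SST = Σ(Y - Ȳ)², SSR = Σ(Ŷ - Ȳ)², and SSE = Σ(Y - Ŷ)², with R² = SSR / SST. The identity SST = SSR + SSE holds for a least-squares line that includes an intercept. A minimal Python sketch on hypothetical data:

```python
def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
          / sum((xi - mean_x) ** 2 for xi in x))
    b0 = mean_y - b1 * mean_x
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total variation
    ssr = sum((yh - mean_y) ** 2 for yh in y_hat)           # explained by the line
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # left unexplained
    return sst, ssr, sse


sst, ssr, sse = sums_of_squares([1, 2, 3, 4, 5], [52, 55, 61, 64, 70])
r_squared = ssr / sst
print(f"SST={sst:.1f}  SSR={ssr:.1f}  SSE={sse:.1f}  R^2={r_squared:.4f}")
# → SST=205.2  SSR=202.5  SSE=2.7  R^2=0.9868
```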

f. Relationship of Correlation Coefficient, Standard Error of Estimate, and Coefficient of Determination

• R² = r² — the coefficient of determination is the square of the Pearson correlation
coefficient in simple linear regression.
• A larger |r| (closer to ±1) results in a higher R² (more variation explained) and a lower
standard error (better predictions).
• A smaller |r| (closer to 0) results in a lower R² and a higher standard error (less accurate
predictions).
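These relationships can be written out explicitly. Combining R² = SSR/SST = 1 - SSE/SST = r² with Sₑ = √(SSE / (n - 2)) expresses the standard error of the estimate in terms of r, where s_Y is the sample standard deviation of Y:

```latex
\begin{aligned}
R^2 &= \frac{SSR}{SST} = 1 - \frac{SSE}{SST} = r^2
  \quad\Longrightarrow\quad SSE = (1 - r^2)\,SST \\[4pt]
S_e &= \sqrt{\frac{SSE}{n-2}}
     = \sqrt{\frac{(1 - r^2)\,SST}{n-2}}
     = s_Y \sqrt{\frac{(1 - r^2)(n-1)}{n-2}},
  \qquad s_Y = \sqrt{\frac{SST}{n-1}}
\end{aligned}
```

As |r| approaches 1, Sₑ approaches 0. When r = 0, Sₑ = s_Y √[(n - 1)/(n - 2)], slightly larger than the spread of Y itself, because the fitted line is then flat at Ȳ and predicts no better than the mean.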

6. MULTIPLE LINEAR REGRESSION

Multiple Linear Regression — a regression model that examines the linear relationship
between one dependent variable (Y) and two or more independent variables (X₁, X₂, ...,
Xₖ). The model is expressed as:
Y = β₀ + β₁X₁ + β₂X₂ + ... + βₖXₖ + ε
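In practice the coefficients b₀, b₁, ..., bₖ are found by least squares, usually with software. The Python sketch below assumes NumPy and uses numpy.linalg.lstsq on a design matrix with a leading column of ones for the intercept. The data are hypothetical and built to satisfy Y = 5 + 2X₁ - 3X₂ exactly, so the fit should recover those coefficients.

```python
import numpy as np


def fit_multiple_ols(X, y):
    """Least-squares estimates [b0, b1, ..., bk] for Y-hat = b0 + b1*X1 + ... + bk*Xk.

    X is an (n, k) array of predictor values; a column of ones is added for b0.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]   # hypothetical (X1, X2) pairs
y = [1, 6, -1, 4, 0]                            # exactly 5 + 2*X1 - 3*X2
print(np.round(fit_multiple_ols(X, y), 6))      # close to [5, 2, -3]
```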

a. Assumptions of Multiple Linear Regression

Linearity — the relationship between the dependent variable and each independent
variable must be linear.
Independence of Errors — the residuals must be independent of each other.
Homoscedasticity — the variance of the residuals must be constant across all levels of the
independent variables.
Normality of Errors — the residuals must be normally distributed.
No Multicollinearity — no independent variable may be a perfect linear combination of the
others, and ideally the independent variables should not be highly correlated with each
other. High multicollinearity makes it difficult to determine the individual effect of each
independent variable on Y.
Multicollinearity — a condition in multiple regression where two or more independent
variables are highly correlated with each other, making it difficult to isolate the individual
effect of each predictor on the dependent variable.

b. Estimating the Coefficients of the Multiple Linear Regression Model

Partial Regression Coefficient — the coefficient (b₁, b₂, etc.) associated with each
independent variable in the multiple regression model. It represents the change in Y for a
one-unit change in that specific independent variable, holding all other independent
variables constant.
Adjusted R² (Adjusted Coefficient of Determination) — a modified version of R² that
accounts for the number of independent variables in the model. R² never decreases when a
variable is added, whereas adjusted R² decreases when an added variable does not improve
the fit enough to offset the extra term. It is therefore more appropriate for comparing
multiple regression models with different numbers of independent variables.
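The usual formula is Adjusted R² = 1 - (1 - R²)(n - 1) / (n - k - 1), where n is the number of observations and k the number of independent variables. The Python sketch below shows how the same R² is penalized more heavily as k grows; the numbers are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k independent variables fitted to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("adjusted R^2 needs n > k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical: the same R^2 = 0.80 on n = 30 observations, with 2 versus 10 predictors
print(round(adjusted_r2(0.80, 30, 2), 4))   # → 0.7852
print(round(adjusted_r2(0.80, 30, 10), 4))  # → 0.6947
```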

c. Interpreting the Multiple Linear Regression

Holding Other Variables Constant (Ceteris Paribus) — when interpreting a partial
regression coefficient, the effect of one independent variable on Y is interpreted while
assuming all other independent variables remain unchanged.
Overall F-Test — a hypothesis test used to determine whether the multiple regression
model as a whole is statistically significant, i.e., whether at least one independent variable
significantly predicts Y.
Individual t-Test — a hypothesis test used to determine whether each individual regression
coefficient (βᵢ) is statistically significant, i.e., whether a specific independent variable
significantly contributes to predicting Y when all other variables are in the model.
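Both test statistics are simple ratios: F = (SSR / k) / [SSE / (n - k - 1)], compared with an F distribution with k and n - k - 1 degrees of freedom, and tᵢ = bᵢ / SE(bᵢ), compared with a t distribution with n - k - 1 degrees of freedom. The Python sketch below computes only the statistics; p-values or critical values would come from tables or a statistics package. The inputs are hypothetical numbers from a one-predictor fit, chosen to show that F = t² in that special case.

```python
def overall_f(ssr, sse, n, k):
    """Overall F statistic for a model with k predictors fitted to n observations."""
    msr = ssr / k              # regression mean square (df = k)
    mse = sse / (n - k - 1)    # error mean square (df = n - k - 1)
    return msr / mse


def coefficient_t(b_i, se_b_i):
    """t statistic for H0: beta_i = 0, given an estimate and its standard error."""
    return b_i / se_b_i


# Hypothetical one-predictor output: SSR = 202.5, SSE = 2.7, n = 5, b1 = 4.5, SE(b1) = 0.3
f_stat = overall_f(ssr=202.5, sse=2.7, n=5, k=1)
t_stat = coefficient_t(4.5, 0.3)
print(round(f_stat, 1), round(t_stat, 1), round(t_stat ** 2, 1))  # → 225.0 15.0 225.0
```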

7. REGRESSION ANALYSIS WITH DUMMY VARIABLES

Dummy Variable — also called an indicator variable or binary variable, it is an artificial
variable created to represent a categorical variable in a regression model. It takes the value
of 1 if a condition is met and 0 if it is not.
Reference Category (Base Category) — the category of a categorical variable that is
excluded when creating dummy variables. All other categories are compared against this
reference category.
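Dummy Variable Trap — a situation in regression where a dummy variable is created for every
category while the model also includes an intercept. The dummies then always sum to 1,
duplicating the intercept column and causing perfect multicollinearity. To avoid this, a
categorical variable with k categories should be represented by only k - 1 dummy variables.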
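A minimal Python sketch of k - 1 dummy coding, assuming a hypothetical categorical variable "region" with three categories and "North" as the reference category. Each kept category gets one 0/1 column; rows in the reference category are all zeros, so no column duplicates the intercept.

```python
def dummy_code(values, reference):
    """Build k - 1 dummy (0/1) columns for a categorical variable.

    `reference` is the base category: it gets no column, so its rows are all zeros.
    """
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError("reference category must appear in the data")
    kept = [c for c in categories if c != reference]
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


regions = ["North", "South", "East", "North", "East"]   # hypothetical categorical data
columns, rows = dummy_code(regions, reference="North")
print(columns)  # → ['East', 'South']
print(rows)     # → [[0, 0], [0, 1], [1, 0], [0, 0], [1, 0]]
```

The coefficient on each dummy is then read as the difference in mean Y between that category and North, holding other variables constant.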

8. POSITIVE MONOTONIC TRANSFORMATION OF DATA

Positive Monotonic Transformation — a mathematical transformation applied to data that
preserves the order (ranking) of the data values. As the original value increases, the
transformed value also increases.
Logarithmic Transformation (Log Transformation) — a transformation that applies the
logarithm function to the data values. It is used to reduce skewness, handle non-linearity, or
stabilize variance, and it can be applied only to positive values.
Square Root Transformation — a transformation that takes the square root of each data
value. Used to reduce the effect of large values and normalize skewed data.
Data Transformation — the process of applying a mathematical function to data values to
make the data meet the assumptions of regression (e.g., linearity, normality,
homoscedasticity).
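As an illustration of why a log transformation can help, a relationship of the form Y = a·e^(bX) is curved in X, but ln Y = ln a + bX is a straight line. The Python sketch below uses hypothetical, noise-free data generated with a = 3 and b = 0.5, fits a simple least-squares line to ln Y, and back-transforms the intercept.

```python
import math

# Hypothetical data following Y = 3 * e^(0.5 X) exactly (no noise)
x = [0, 1, 2, 3, 4, 5]
y = [3 * math.exp(0.5 * xi) for xi in x]

log_y = [math.log(yi) for yi in y]   # ln Y = ln 3 + 0.5 X, which is linear in X

# ordinary least-squares fit of ln Y on X
n = len(x)
mean_x = sum(x) / n
mean_ly = sum(log_y) / n
b1 = (sum((xi - mean_x) * (ly - mean_ly) for xi, ly in zip(x, log_y))
      / sum((xi - mean_x) ** 2 for xi in x))
b0 = mean_ly - b1 * mean_x
a = math.exp(b0)                     # back-transform the intercept to the original scale

print(round(b1, 6), round(a, 6))     # → 0.5 3.0
```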

9. MODELING RELATIONSHIPS OF MULTIPLE VARIABLES WITH LINEAR REGRESSION

Model Building — the process of selecting the most appropriate set of independent
variables to include in a regression model to best predict the dependent variable.
Stepwise Regression — a method of model selection where independent variables are
added to or removed from the regression model one at a time based on statistical criteria
(such as p-values or F-statistics), until no further addition or removal meets the criteria.
The resulting model is not guaranteed to be the best possible model.
Forward Selection — a stepwise method where variables are added one at a time to the
model, starting with the most significant variable, until no remaining variable significantly
improves the model.
Backward Elimination — a stepwise method where all variables are initially included in the
model, and variables are removed one at a time, starting with the least significant, until all
remaining variables are statistically significant.
Overfitting — a problem that occurs when a regression model is too complex and fits the
sample data too closely, including random noise, resulting in poor predictive performance on
new data.
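Forward selection is a greedy loop. The Python sketch below assumes NumPy and, for brevity, swaps in adjusted R² as the entry criterion instead of the p-value or F-statistic criteria named above: at each step it adds whichever remaining variable raises adjusted R² the most, and stops when no variable raises it. The simulated data are hypothetical; only the first two columns actually affect Y.

```python
import numpy as np


def adjusted_r2_of(X_cols, y):
    """Fit OLS with an intercept on the given columns and return adjusted R^2."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + X_cols)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    k = len(X_cols)
    r2 = 1 - sse / sst
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def forward_select(X, y):
    """Greedy forward selection: add the column that most raises adjusted R^2; stop when none does."""
    remaining = list(range(X.shape[1]))
    selected = []
    best_score = -np.inf
    while remaining:
        scores = {j: adjusted_r2_of([X[:, i] for i in selected + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best_score:
            break                       # no remaining variable improves the criterion
        best_score = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected


rng = np.random.default_rng(1)                                    # hypothetical simulated data
X = rng.normal(size=(200, 3))                                     # three candidate predictors
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.5, size=200)   # column 2 is irrelevant
print(forward_select(X, y))
```

Because the loop only ever adds variables, an irrelevant predictor can still slip in when it raises adjusted R² slightly by chance; this is one reason stepwise results should be checked on new data.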

10. ADVANCED REGRESSION MODELS

Polynomial Regression — a form of regression where the relationship between the
dependent variable and the independent variable is modeled as an nth-degree polynomial.
Used when the relationship between X and Y is curvilinear rather than strictly linear.
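A polynomial model is still fitted by ordinary least squares, because it is linear in its coefficients even though it is curved in X. The Python sketch below assumes NumPy and uses numpy.polyfit, which returns coefficients from the highest power down. The data are hypothetical and noise-free, generated from Y = 2 + 0.5X - 0.3X²; a straight line fitted to the same data misses the curvature entirely.

```python
import numpy as np

# Hypothetical curvilinear data: Y = 2 + 0.5X - 0.3X^2 exactly (no noise)
x = np.arange(-5, 6, dtype=float)
y = 2 + 0.5 * x - 0.3 * x ** 2

# np.polyfit returns coefficients from the highest power down: [b2, b1, b0]
b2, b1, b0 = np.polyfit(x, y, deg=2)
linear_slope, linear_intercept = np.polyfit(x, y, deg=1)

print(np.round([b0, b1, b2], 6))                        # close to [2, 0.5, -0.3]
print(np.round([linear_intercept, linear_slope], 6))    # close to [-1, 0.5]
```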
Interaction Effect — occurs when the effect of one independent variable on the dependent
variable depends on the value of another independent variable. Modeled by including an
interaction term (the product of two independent variables) in the regression equation.
Interaction Term — a new variable created by multiplying two independent variables
together, used to capture the interaction effect between them in a regression model.
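A minimal Python sketch of an interaction term, assuming NumPy and hypothetical noise-free data built from Y = 1 + 2X₁ + 3X₂ + 4X₁X₂. The product X₁X₂ is added as its own column, and the fitted model implies that the slope of Y with respect to X₁ is b₁ + b₃X₂, so the effect of X₁ depends on the value of X₂.

```python
import numpy as np

# Hypothetical data where the effect of X1 depends on X2: Y = 1 + 2*X1 + 3*X2 + 4*X1*X2
x1 = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2], dtype=float)
x2 = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2], dtype=float)
y = 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

interaction = x1 * x2                      # the interaction term: product of the two predictors
design = np.column_stack([np.ones_like(x1), x1, x2, interaction])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
b0, b1, b2, b3 = coef

# slope of Y with respect to X1 at a given X2 is b1 + b3 * X2
slope_x1_at = {v: b1 + b3 * v for v in (0.0, 1.0, 2.0)}
print(np.round(coef, 6))                   # close to [1, 2, 3, 4]
print({v: round(s, 6) for v, s in slope_x1_at.items()})
```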
Logistic Regression — a regression model used when the dependent variable is
categorical (e.g., binary: yes/no, 0/1). It predicts the probability that an observation belongs
to a particular category.
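Nonlinear Regression — a regression model that is nonlinear in its parameters, so the
relationship between the dependent variable and the independent variables cannot be
expressed as a linear model, even after transformation. (Polynomial regression, by contrast,
is curved in X but still linear in its coefficients.)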
