
Lecture 8

The document outlines a course on Descriptive Statistics focusing on Correlation and Regression. It covers key concepts such as scatter plots, correlation coefficients, and regression analysis, including definitions, examples, and methods for studying relationships between variables. The document also provides formulas and practical applications for understanding and predicting outcomes based on statistical relationships.

Uploaded by

drbkillua

University of Khartoum

Faculty of Mathematical Sciences and Informatics

Descriptive Statistics (S1013)


Correlation and Regression

February 5, 2026

Week Outline

Correlation
Scatter Plots
Correlation Coefficient
Regression

Lecture objectives

Draw a scatter plot:
Visualize the relationship between two variables on a coordinate plane.
Compute the correlation coefficient:
Measure the strength and direction of the linear relationship between variables.
Regression:
Develop an understanding of regression analysis, including how to interpret the regression equation and use it for predictive purposes.

Correlation

Definition
Correlation is a statistical method used to determine whether a relationship exists between two or more variables.

There are two types of relationships:
simple relationships
multiple relationships

Simple Relationship

In a simple relationship, also called simple regression, there are two variables:
1. Independent variable, also called an explanatory variable or a predictor variable
2. Dependent variable, also called a response variable

Example
A manager may wish to see whether the number of years the salespeople have been working for the company has anything to do with the amount of sales they make.

This type of study involves a simple relationship, since there are only
two variables: years of experience and amount of sales.

Simple Relationship

Simple relationships can be either positive or negative.

Positive relationship: As one variable increases, the other also increases.
Example: A person's height and weight tend to increase together.

Negative relationship: As one variable increases, the other decreases.
Example: In older adults, as age increases, physical strength may decrease.

Multiple Relationship

In a multiple relationship, also called multiple regression, two or more independent variables are used to predict one dependent variable.
Example:
An educator may want to study a student's academic success based on several predictors, such as:
the number of hours spent studying,
the student's GPA, and
the student's high school background.
This type of study involves multiple variables and examines their collective influence on a single outcome.

Studying Relationships Between Variables

Once we identify that a relationship may exist between two variables, we can study it in different ways:
Visually, using a scatter plot, to observe general patterns.
Numerically, by calculating the correlation coefficient to measure the strength and direction of the linear relationship.
Analytically, using regression analysis to describe the relationship with an equation and make predictions.

These methods allow us to assess whether a relationship exists, how strong it is, and how we can model it for decision-making.

Scatter Plots
Definition
A scatter plot is a graph of ordered pairs (x, y) of numbers consisting
of the independent variable x and the dependent variable y. It is used
to visually determine whether there is a relationship between the two
variables.

Example 1:
The following table shows the number of hours students studied and
their test scores:
Hours (x) Score (y)
2 65
4 70
6 75
8 80
10 85
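The slides draw the scatter plot as a figure. As a rough text-only stand-in, the sketch below places each (x, y) pair from Example 1 on a character grid so the rising pattern is visible. The helper name `text_scatter` and the grid size are illustrative assumptions, not part of the lecture; in practice a plotting library would be used (for example, matplotlib's `pyplot.scatter(x, y)`).

```python
def text_scatter(xs, ys, width=21, height=9):
    """Hypothetical helper: return a character-grid scatter plot as a list of strings.

    Each (x, y) pair is scaled linearly onto a width-by-height grid and marked
    with '*'. The top row holds the largest y. Assumes at least two distinct
    x values and two distinct y values (otherwise the scaling divides by zero).
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Map x to a column (0 .. width-1) and y to a row (0 = top of the plot).
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y_max - y) / (y_max - y_min) * (height - 1))
        grid[row][col] = "*"
    # Prefix each row with a y-axis bar and finish with an x-axis line.
    return ["|" + "".join(r) for r in grid] + ["+" + "-" * width]

# Example 1: hours studied (x) and test scores (y)
hours = [2, 4, 6, 8, 10]
scores = [65, 70, 75, 80, 85]
for line in text_scatter(hours, scores):
    print(line)
```

The points climb steadily from the lower left to the upper right, which is the visual signature of a positive relationship.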

Example 2:

Problem:
Car Rental Companies: Construct a scatter plot for the data shown
for car rental companies in the United States for a recent year.

Company   Cars x (in ten thousands)   Revenue y (in billions)
A         63.0                        7.0
B         29.0                        3.9
C         20.8                        2.1
D         19.1                        2.8
E         13.4                        1.4
F         8.5                         1.5

Example 3:

Problem:
Absences and Final Grades: Construct a scatter plot for the data obtained in a study on the number of absences and the final grades of seven randomly selected students from a statistics class. The data are shown here.

Student   Number of absences x   Final grade y (%)
A         6                      82
B         2                      86
C         15                     43
D         9                      74
E         12                     58
F         5                      90
G         8                      78

Correlation Coefficient
Definition
The correlation coefficient r is a numerical measure that describes
the strength and direction of a linear relationship between two
variables.
There are several ways to compute the value of the correlation
coefficient. One method is to use the formula shown below:
Formula for the Correlation Coefficient r
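$$
r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\left(\sum x^{2}\right) - \left(\sum x\right)^{2}\right]\left[n\left(\sum y^{2}\right) - \left(\sum y\right)^{2}\right]}}
$$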

Where:
n is the number of data pairs.
x represents values of the independent variable.
y represents values of the dependent variable.
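The computational formula translates directly into code. The following is a minimal sketch in plain Python (the function name `correlation_r` is our own, not from the lecture): it accumulates the five sums Σx, Σy, Σxy, Σx² and Σy² and substitutes them into the formula. With the Example 1 data, whose points lie exactly on a rising straight line, it returns r = 1.

```python
import math

def correlation_r(xs, ys):
    """Sample correlation coefficient r using the computational (sums) formula.

    Raises ValueError for mismatched or too-short input; a ZeroDivisionError
    occurs if all x values (or all y values) are identical.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Example 1: hours studied (x) and test scores (y)
print(correlation_r([2, 4, 6, 8, 10], [65, 70, 75, 80, 85]))  # 1.0
```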
Correlation Coefficient

The symbol for the sample correlation coefficient is r.
The symbol for the population correlation coefficient is ρ (Greek letter rho).
The range of the correlation coefficient is from −1 to +1.
If there is a strong positive linear relationship between the
variables, the value of r will be close to +1.
If there is a strong negative linear relationship between the
variables, the value of r will be close to −1.
When there is no linear relationship or only a weak relationship
between the variables, the value of r will be close to 0.

Assumptions for the Correlation Coefficient

1. The sample is a random sample.
2. The data pairs fall approximately on a straight line and are measured at the interval or ratio level.
3. The variables have a joint normal distribution. (This means that given any specific value of x, the y values are normally distributed; and given any specific value of y, the x values are normally distributed.)

Example

Problem
For the Car Rental Companies data in Example 2, compute the correlation coefficient.

Solution
Company   x (Cars)   y (Revenue)   xy       x²        y²
A         63.0       7.0           441.00   3969.00   49.00
B         29.0       3.9           113.10   841.00    15.21
C         20.8       2.1           43.68    432.64    4.41
D         19.1       2.8           53.48    364.81    7.84
E         13.4       1.4           18.76    179.56    1.96
F         8.5        1.5           12.75    72.25     2.25

Totals: Σx = 153.8, Σy = 18.7, Σxy = 682.77, Σx² = 5859.26, Σy² = 80.67

Example

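$$
r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\left(\sum x^{2}\right) - \left(\sum x\right)^{2}\right]\left[n\left(\sum y^{2}\right) - \left(\sum y\right)^{2}\right]}}
$$

$$
r = \frac{6(682.77) - (153.8)(18.7)}{\sqrt{\left[6(5859.26) - (153.8)^{2}\right]\left[6(80.67) - (18.7)^{2}\right]}}
  = \frac{1220.56}{\sqrt{(11501.12)(134.33)}} \approx 0.982
$$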

The correlation coefficient, r ≈ 0.982, is close to +1, which suggests a strong positive linear relationship between the number of cars a rental agency has and its annual revenue.
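The hand computation can be checked in a few lines. This sketch (plain Python; the variable names are our own) rebuilds the xy, x² and y² columns for the six companies, prints the column totals, and substitutes them into the same formula. Floating-point arithmetic introduces tiny errors, so the totals are displayed rounded to two decimals.

```python
import math

# Car rental data: x = cars (in ten thousands), y = revenue (in billions)
cars = [63.0, 29.0, 20.8, 19.1, 13.4, 8.5]
revenue = [7.0, 3.9, 2.1, 2.8, 1.4, 1.5]
n = len(cars)

# Column totals, as in the table: sums of x, y, xy, x^2, y^2
totals = {
    "x": sum(cars),
    "y": sum(revenue),
    "xy": sum(x * y for x, y in zip(cars, revenue)),
    "x2": sum(x ** 2 for x in cars),
    "y2": sum(y ** 2 for y in revenue),
}
for name, value in totals.items():
    print(f"sum {name:>2} = {value:.2f}")

# Substitute the totals into the correlation formula
numerator = n * totals["xy"] - totals["x"] * totals["y"]
denominator = math.sqrt((n * totals["x2"] - totals["x"] ** 2) *
                        (n * totals["y2"] - totals["y"] ** 2))
r = numerator / denominator
print(f"r = {r:.3f}")  # r = 0.982
```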

Regression
Definition
Regression is a statistical method used to describe the nature of the relationship between variables, that is, whether it is positive or negative, linear or nonlinear.

Regression Line: It is the line of best fit.
The purpose of the regression line is to enable the researcher to observe trends and make predictions based on the data.
Determination of the Regression Line Equation:
The equation of the regression line is:

y = a + bx

where:
a is the y-intercept
b is the slope of the line
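The slide gives the form of the line but not how a and b are found. The standard way to obtain the line of best fit is the least-squares method, whose slope and intercept can be written with the same sums used for r: b = [n(Σxy) − (Σx)(Σy)] / [n(Σx²) − (Σx)²] and a = (Σy − bΣx) / n. The sketch below applies these formulas to the car rental data from Example 2; the function name `least_squares_line` is our own. Python 3.10+ also provides `statistics.linear_regression`, which returns the same slope and intercept.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares regression line y = a + bx."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = (sum_y - b * sum_x) / n                                   # y-intercept
    return a, b

cars = [63.0, 29.0, 20.8, 19.1, 13.4, 8.5]   # x, in ten thousands
revenue = [7.0, 3.9, 2.1, 2.8, 1.4, 1.5]     # y, in billions
a, b = least_squares_line(cars, revenue)
print(f"y = {a:.3f} + {b:.3f}x")  # y = 0.396 + 0.106x
```

Note that the numerator of b is the same quantity, 1220.56, that appears in the numerator of r for these data.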
Example
Problem:
A farming cooperative in Sudan wants to estimate groundnut yield (in tons per hectare) based on the number of irrigation days during the growing season.

From past records, the estimated regression line is:

y = 0.35x + 1.2
where:
x = number of irrigation days
y = groundnut yield in tons per hectare
(In the form y = a + bx, the intercept is a = 1.2 and the slope is b = 0.35.)

Question: What is the expected yield if a field is irrigated on 20 days?

Solution:
x = 20 ⇒ y = 0.35(20) + 1.2 = 7.0 + 1.2 = 8.2

Interpretation: The predicted yield is 8.2 tons per hectare.
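Once a and b are known, prediction is a single substitution into y = a + bx. A minimal sketch using the cooperative's fitted line (a = 1.2, b = 0.35); the function name `predict_yield` is hypothetical:

```python
def predict_yield(irrigation_days, a=1.2, b=0.35):
    """Predicted groundnut yield (tons per hectare) from the line y = a + bx."""
    return a + b * irrigation_days

print(round(predict_yield(20), 2))  # 8.2
```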

