Understanding Simple Linear Regression

The document provides an overview of simple linear regression, detailing its purpose of finding a linear relationship between independent and dependent variables. It explains the method of least squares for calculating the regression line and includes worked examples demonstrating how to derive the regression equation. Additionally, it discusses the interpretation of the regression line and its application for predicting values.

10/23/24, 4:37 PM Numeracy, Maths and Statistics - Academic Skills Kit

Simple Linear Regression


Definition

Simple linear regression aims to find a linear relationship that describes the correlation between an independent variable and a (possibly) dependent variable. The regression line can be used to predict or estimate missing values within the range of the data; this is known as interpolation.

Least Squares Regression Line, LSRL

The calculation is based on the method of least squares. The idea behind it is to minimise the sum of the squared vertical distances between the data points and the line of best fit.

Consider these attempts at drawing the line of best fit. They all look like they could be a fair line of best fit, but in fact Diagram 3 is the most accurate, as its line has been calculated using the method of least squares.
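
To make the least squares idea concrete, the following Python sketch scores a few candidate lines by their sum of squared vertical distances. It borrows the time and mass data from Worked Example 1 further down this page. The function name `sum_squared_residuals` and the two alternative lines are illustrative choices, not part of the original material.

```python
# Time (seconds) and mass (grams) from Worked Example 1 on this page.
xs = [5, 7, 12, 16, 20]
ys = [40, 120, 180, 210, 240]


def sum_squared_residuals(a, b, xs, ys):
    """Sum of squared vertical distances between the points and y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Candidate lines as (intercept, slope). The first is the least squares line
# quoted in Worked Example 1; the other two are made-up alternatives.
candidates = {
    "least squares": (11.506, 12.208),
    "steeper guess": (0.0, 13.0),
    "flatter guess": (40.0, 10.0),
}

for name, (a, b) in candidates.items():
    print(f"{name}: {sum_squared_residuals(a, b, xs, ys):.1f}")
```

Of the three candidates, the least squares line gives the smallest total.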

The equation of the least squares regression line is

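$$\hat{y} = a + bx$$

where:

$\hat{y}$ is the predicted value of $y$,

$a = \bar{y} - b\bar{x}$,

$$b = \frac{S_{xy}}{S_{xx}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{\sum(xy) - \frac{\sum x \sum y}{n}}{\sum(x^2) - \frac{(\sum x)^2}{n}},$$

$$\bar{x} = \frac{\sum x}{n}, \qquad \bar{y} = \frac{\sum y}{n}.$$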

Note: The underlying statistical model here is that there is a linear relation between the variables, say y = a′ + b′ x, and so we should regard the
equation that we obtain using the method above as resulting in an estimate for the true equation. For this reason many authorities write y = a + bx + ϵ
to emphasise this point. A further discussion on the nature of the error ϵ is not appropriate here, but is covered in the references below.
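
The formulas for $a$ and $b$ translate fairly directly into code. The Python sketch below is one possible implementation, assuming the data are given as two equal-length lists. The function names `least_squares_line` and `slope_from_sums` are hypothetical. It shows both forms of $b$: the deviation form and the summed-values form.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y_hat = a + b*x, using the deviation form."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


def slope_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return b from summed values: n, sum of x, y, x squared and x*y."""
    s_xy = sum_xy - sum_x * sum_y / n
    s_xx = sum_x2 - sum_x ** 2 / n
    return s_xy / s_xx


# Four points lying exactly on y = 1 + 2x, so both forms should give b = 2.
a, b = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
print(slope_from_sums(n=4, sum_x=6, sum_y=16, sum_x2=14, sum_xy=34))
```

The summed-values form is convenient when only totals are available, while the deviation form matches a table-based hand calculation.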

Worked Examples

Example 1

Consider the example below where the mass, y (grams), of a chemical is related to the time, x (seconds), for which the chemical reaction has been
taking place according to the table:

Time, x (seconds) 5 7 12 16 20

Mass, y (grams) 40 120 180 210 240

Find the equation of the regression line.

Solution


To work out the regression line the following values need to be calculated: $a = \bar{y} - b\bar{x}$ and $b = \frac{S_{xy}}{S_{xx}}$. The easiest way of calculating them is by using a table.

Start off by working out the mean of the independent and dependent variables.

$$\bar{x} = \frac{\sum x}{n} = \frac{5 + 7 + 12 + 16 + 20}{5} = \frac{60}{5} = 12,$$

$$\bar{y} = \frac{\sum y}{n} = \frac{40 + 120 + 180 + 210 + 240}{5} = \frac{790}{5} = 158.$$

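| $x_i$ | $y_i$ | $x_i - \bar{x}$ | $y_i - \bar{y}$ | $(x_i - \bar{x})(y_i - \bar{y})$ | $(x_i - \bar{x})^2$ |
|---|---|---|---|---|---|
| 5 | 40 | 5 − 12 = −7 | 40 − 158 = −118 | −7 × −118 = 826 | (−7)² = 49 |
| 7 | 120 | 7 − 12 = −5 | 120 − 158 = −38 | −5 × −38 = 190 | (−5)² = 25 |
| 12 | 180 | 12 − 12 = 0 | 180 − 158 = 22 | 0 × 22 = 0 | 0² = 0 |
| 16 | 210 | 16 − 12 = 4 | 210 − 158 = 52 | 4 × 52 = 208 | 4² = 16 |
| 20 | 240 | 20 − 12 = 8 | 240 − 158 = 82 | 8 × 82 = 656 | 8² = 64 |
| $\sum x = 60$ | $\sum y = 790$ | | | $\sum (x_i - \bar{x})(y_i - \bar{y}) = 1880$ | $\sum (x_i - \bar{x})^2 = 154$ |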

Now calculate b

$$b = \frac{S_{xy}}{S_{xx}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{1880}{154} = 12.20779\ldots = 12.208 \text{ (3 d.p.)}$$

and calculate a

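$$a = \bar{y} - b\bar{x} = 158 - 12.20779\ldots \times 12 = 11.50649\ldots = 11.506 \text{ (3 d.p.)}$$

Note that the unrounded value of $b$ is used here; using the rounded value 12.208 would give 11.504 instead.

So the equation of the regression line is: $\hat{y} = a + bx = 11.506 + 12.208x$.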
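
The table-based calculation can be reproduced in a few lines of Python. This is only a sketch of the same arithmetic, with variable names chosen for illustration. It also shows what happens to the intercept if the slope is rounded too early.

```python
xs = [5, 7, 12, 16, 20]          # time, seconds
ys = [40, 120, 180, 210, 240]    # mass, grams

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# One row per data point: x, y, both deviations, their product,
# and the squared x-deviation (the same columns as the hand table).
rows = [
    (x, y, x - x_bar, y - y_bar, (x - x_bar) * (y - y_bar), (x - x_bar) ** 2)
    for x, y in zip(xs, ys)
]
for row in rows:
    print(row)

s_xy = sum(row[4] for row in rows)
s_xx = sum(row[5] for row in rows)
b = s_xy / s_xx
a = y_bar - b * x_bar
print(f"S_xy = {s_xy}, S_xx = {s_xx}")
print(f"b = {b:.3f}, a = {a:.3f}")

# Rounding b before computing a shifts the intercept slightly.
a_from_rounded_b = y_bar - round(b, 3) * x_bar
print(f"a using b rounded to 3 d.p. = {a_from_rounded_b:.3f}")
```

Carrying the unrounded slope into the intercept calculation gives 11.506, whereas rounding the slope to 12.208 first gives 11.504, so it is safer to round only at the end.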

Example 2

To see how students' reaction skills have improved over a year, eight students took a reactions test at the start of the year and at the end of the year.
These are their scores:

Student Liam Felicity Adian Mel Leroy Vic Lawrie Louise


First Test, x 56 75 61 61 67 72 62 61

Second Test, y 21 39 34 21 32 24 29 24

Find the equation of the regression line given that:

$\sum x = 515$, $\sum y = 224$, $\sum x^2 = 33441$, $\sum y^2 = 6576$ and $\sum xy = 14590$.

Solution

We know that the equation of the least squares regression line is

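$$\hat{y} = a + bx.$$

As we have been given some summed values, we are going to use

$$b = \frac{S_{xy}}{S_{xx}} = \frac{\sum(xy) - \frac{\sum x \sum y}{n}}{\sum(x^2) - \frac{(\sum x)^2}{n}}.$$

Substituting the given values with $n = 8$:

$$b = \frac{14590 - \frac{515 \times 224}{8}}{33441 - \frac{515^2}{8}} = \frac{170}{287.875} = 0.590534\ldots = 0.591 \text{ (3 d.p.)}$$

To find $a$ we need to first work out the mean of $x$ and $y$.

$$\bar{x} = \frac{\sum x}{n} = \frac{515}{8} = 64.375, \qquad \bar{y} = \frac{\sum y}{n} = \frac{224}{8} = 28.$$

Using the unrounded value of $b$:

$$a = \bar{y} - b\bar{x} = 28 - (0.590534\ldots \times 64.375) = -10.015631\ldots = -10.016 \text{ (3 d.p.)}$$

So the equation of our regression line is $\hat{y} = -10.016 + 0.591x$.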
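
As a cross-check, the summed values can be recomputed from the table of scores and fed into the summed-values formula. The Python sketch below does this; the variable names are illustrative. Note that $0.590534\ldots$ rounds to 0.591 at three decimal places.

```python
# First and second test scores from the table in Example 2.
xs = [56, 75, 61, 61, 67, 72, 62, 61]
ys = [21, 39, 34, 21, 32, 24, 29, 24]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_xy = sum(x * y for x, y in zip(xs, ys))
print(n, sum_x, sum_y, sum_x2, sum_xy)

s_xy = sum_xy - sum_x * sum_y / n    # 14590 - 14420 = 170
s_xx = sum_x2 - sum_x ** 2 / n       # 33441 - 33153.125 = 287.875
b = s_xy / s_xx

x_bar = sum_x / n
y_bar = sum_y / n
a = y_bar - b * x_bar                # uses the unrounded slope

print(f"b = {b:.6f}, rounded to 3 d.p.: {b:.3f}")
print(f"a = {a:.6f}, rounded to 3 d.p.: {a:.3f}")
```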

Video Example

Alissa Grant-Walker presents a video on working out the linear regression line.


Interpreting the Regression Line

The simple linear regression line, $\hat{y} = a + bx$, can be interpreted as follows:

$\hat{y}$ is the predicted value of $y$,
$a$ is the intercept and predicts where the regression line will cross the $y$-axis,
$b$ predicts the change in $y$ for every unit change in $x$.

We can also use the equation of the regression line for finding approximate values for missing data.

Note: Using this to estimate outside the range of your data is unreliable.
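
One simple way to respect this note in code is to flag any prediction whose $x$ value lies outside the observed range. The Python sketch below assumes the line $\hat{y} = 11.506 + 12.208x$ and the time values from Worked Example 1. The helper `predict` is a hypothetical name, and the range check is only a rough guard rather than a formal test of reliability.

```python
xs = [5, 7, 12, 16, 20]      # observed times (seconds) from Worked Example 1
a, b = 11.506, 12.208        # intercept and slope from Worked Example 1


def predict(x, a, b, xs):
    """Return (y_hat, is_interpolation) for the line y_hat = a + b*x."""
    y_hat = a + b * x
    is_interpolation = min(xs) <= x <= max(xs)
    return y_hat, is_interpolation


for x in (12, 30):
    y_hat, inside = predict(x, a, b, xs)
    label = "interpolation" if inside else "extrapolation, treat with caution"
    print(f"x = {x}: y_hat = {y_hat:.3f} ({label})")
```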

Worked Example

Using the data from Worked Example 1 about the mass of a chemical as time increases, we worked out the equation of the regression line to be $\hat{y} = 11.506 + 12.208x$. We can interpret this as follows: for every 1 second increase in time, the mass of the chemical increases by 12.208 grams. The equation also tells us that when no time has passed (when $x$ is zero), the predicted initial mass of the chemical is 11.506 grams.

Example 1

What is the mass of the chemical after ten seconds has passed?

Solution 1

Take your equation, enter the value of time $x = 10$ and calculate $\hat{y}$.

$$\hat{y} = 11.506 + 12.208 \times 10 = 133.586.$$

This means that after 10 seconds of our experiment has passed, the mass of the chemical will be 133.586 grams. Check this value against a scatter
plot of our data to see if this answer is reasonable.

Example 2

By how much does the chemical increase in weight in five seconds?

Solution 2

For every second increase in time the mass of the chemical increases by 12.208 grams. Multiply 12.208 grams by 5 to find the increase in weight of the chemical in 5 seconds.

$$12.208 \times 5 = 61.04 \text{ (grams)}.$$

Example 3

How much time does it take for the weight of the chemical to increase by 50 grams?

Solution 3

We know that for every second increase in time the mass of the chemical increases by 12.208 grams. This also means it takes $\frac{1}{12.208}$ seconds for the chemical to increase by 1 gram. To find the time taken for the chemical to increase in weight by 50 grams we need to multiply $\frac{1}{12.208}$ by 50.

$$\frac{1}{12.208} \times 50 = 4.096 \text{ seconds (3 d.p.)}.$$
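
The three calculations in this section can be checked with a short Python sketch. It assumes the rounded coefficients from Worked Example 1, with time in seconds and mass in grams; the variable names are illustrative.

```python
a, b = 11.506, 12.208   # intercept (grams) and slope (grams per second)

# Example 1: predicted mass after 10 seconds.
mass_at_10 = a + b * 10

# Example 2: increase in mass over any 5-second interval.
# The intercept cancels, so only the slope matters.
increase_in_5s = b * 5

# Example 3: time needed for the mass to increase by 50 grams.
time_for_50g = 50 / b

print(f"mass after 10 s: {mass_at_10:.3f} g")
print(f"increase in 5 s: {increase_in_5s:.2f} g")
print(f"time for 50 g:   {time_for_50g:.3f} s")
```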

Workbook

This workbook produced by HELM is a good revision aid, containing key points for revision and many worked examples.

Regression and correlation

Test Yourself

's test on regression

Test yourself: Numbas test on linear regression

External Resources

Simple Linear Regression at


Worked Example of Linear Regression at
Wonnacott, T. H. and Wonnacott, R. J. (1990). Introductory Statistics. 5th ed. USA: John Wiley & Sons. pp. 355-514.
Regression line calculator online at [Link]

See Also

Residuals
Multiple Regression

