STATISTICS – Chapter 13
LESSON 1
STATISTICS is a branch of mathematics that deals with the collection, organisation and
interpretation of numerical data.
Data can be continuous or discrete
Continuous – information collected by measurement
Discrete – information collected by counting
Measures of central tendency from a list
Measures of central tendency – are values that tell you where the middle (centre) of the
data lies.
Mean – the average of the data set
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{\text{sum of all the scores}}{\text{number of values}}$
Median – the middle value of the data set after the data has been arranged in ascending order.
Mode – the value that occurs most often.
Mean, median & mode of ungrouped data
Example 1:
The following shoe sizes were collected from 13 learners:
7 8 8 12 7 9 8 10 9 10 11 5 8
Determine:
a) Mean: $\bar{x} = \dfrac{\sum x}{n} = \dfrac{112}{13} = 8{,}62$
b) Median: 5 7 7 8 8 8 8 9 9 10 10 11 12 ∴ median = 8 (the 7th of the 13 ordered values)
c) Mode: 8
Example 2:
The following test marks out of 20 were collected from 10 learners:
18 20 15 15 8 10 12 13 9 17
Determine:
a) Mean: $\bar{x} = \dfrac{\sum x}{n} = \dfrac{137}{10} = 13{,}7$
b) Median: 8 9 10 12 13 15 15 17 18 20 ∴ median = $\dfrac{13 + 15}{2} = 14$
c) Mode: 15
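The answers to Examples 1 and 2 can be checked with a short script. This is only a sketch using Python's standard `statistics` module; the helper name `central_tendency` is made up for illustration.

```python
from statistics import mean, median, mode

shoe_sizes = [7, 8, 8, 12, 7, 9, 8, 10, 9, 10, 11, 5, 8]   # Example 1
test_marks = [18, 20, 15, 15, 8, 10, 12, 13, 9, 17]         # Example 2

def central_tendency(data):
    """Return (mean, median, mode) of an ungrouped list (hypothetical helper)."""
    return mean(data), median(data), mode(data)

ex1 = central_tendency(shoe_sizes)
ex2 = central_tendency(test_marks)

# Round the mean to two decimals, as in the worked examples
print(round(ex1[0], 2), ex1[1], ex1[2])
print(round(ex2[0], 2), ex2[1], ex2[2])
```

`median` sorts the data itself and, for an even number of values, averages the two middle values, which is the same method used in Example 2.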
Mean, median, mode & range from grouped data
Example 3:
The frequency table below shows the marks of 125 Grade 12 Mathematics learners. Marks are
recorded as whole percentages.
Mark             Frequency (f)   Midpoint (x_i)        f·x_i
0 ≤ x < 20       6               (0 + 20)/2 = 10       6 × 10 = 60
20 ≤ x < 40      14              30                    420
40 ≤ x < 60      38              50                    1900
60 ≤ x < 80      45              70                    3150
80 ≤ x < 100     22              90                    1980
TOTAL:           125                                   7510
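Determine:
a) The estimated mean
$\bar{x} = \dfrac{\sum f \cdot x_i}{n} = \dfrac{7510}{125} = 60{,}08$
b) The modal class
60 ≤ x < 80
c) The class interval where the median is found
Position of the median $= \dfrac{n}{2} = \dfrac{125}{2} = 62{,}5$ ∴ 63rd value
The cumulative frequencies are 6; 20; 58; 103; 125, so the 63rd value lies in the fourth class.
∴ 60 ≤ x < 80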
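The grouped-data calculations in Example 3 can be sketched in Python as follows. The tuple layout `(lower, upper, frequency)` and the function name `median_class` are assumptions made for this illustration; the median position is rounded up, as in the worked answer.

```python
import math

# (lower limit, upper limit, frequency) for each class in Example 3
classes = [(0, 20, 6), (20, 40, 14), (40, 60, 38), (60, 80, 45), (80, 100, 22)]

n = sum(f for _, _, f in classes)

# Estimated mean: sum of frequency x midpoint, divided by n
estimated_mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

# Modal class: the class with the highest frequency
modal_class = max(classes, key=lambda c: c[2])[:2]

def median_class(classes):
    """Return the class interval that contains the median position."""
    position = math.ceil(sum(f for _, _, f in classes) / 2)  # 62,5 -> 63rd value
    running_total = 0
    for lo, hi, f in classes:
        running_total += f          # cumulative frequency so far
        if running_total >= position:
            return (lo, hi)

print(n, estimated_mean, modal_class, median_class(classes))
```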
5 number summary
Example (21 elements in the list):
34 35 35 38 41 41 44 45 46 47 47 51 51 54 55 56 57 62 74 75 82
The Median Q2 → the value that divides the data in two halves
Q2 = 47
The lower quartile Q1 → the median of the lower half of the data
Q1 = 41
The upper quartile Q3 → the median of the upper half of the data
Q3 = 56,5
The minimum value → the smallest value of the set
min = 34
The maximum value → the biggest value of the set
max = 82
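A minimal sketch of the five number summary, assuming the "median of each half" method used above (when the number of values is odd, the median itself is left out of both halves). The function name `five_number_summary` is hypothetical, and the sketch assumes at least two data values.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, Q2, Q3, max) using the median-of-halves method."""
    values = sorted(data)
    half = len(values) // 2
    lower_half = values[:half]      # values below the median position
    upper_half = values[-half:]     # values above the median position
    return (min(values), median(lower_half), median(values),
            median(upper_half), max(values))

data = [34, 35, 35, 38, 41, 41, 44, 45, 46, 47, 47,
        51, 51, 54, 55, 56, 57, 62, 74, 75, 82]
summary = five_number_summary(data)
print(summary)
```

Other software may use a different quartile convention, so calculator or spreadsheet answers can differ slightly from this method.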
Box and whiskers diagram
Example continued:
min = 34 Q1 = 41 Q2 = 47 Q3 = 56,5 max = 82
[Figure: box and whisker diagram of the data on a number line from 30 to 80]
Skewed and symmetrical data: $\bar{x} - Q_2$ (mean – median)
Symmetrical
Data is balanced, with the scores spread evenly on each side of the median
𝑚𝑒𝑎𝑛 – 𝑚𝑒𝑑𝑖𝑎𝑛 = 0 ⇔ 𝑚𝑒𝑎𝑛 = 𝑚𝑒𝑑𝑖𝑎𝑛
Skewed to the right
Data is spread more to the right of the median,
positively skewed
𝑚𝑒𝑎𝑛 – 𝑚𝑒𝑑𝑖𝑎𝑛 > 0 ⟺ 𝑚𝑒𝑎𝑛 > 𝑚𝑒𝑑𝑖𝑎𝑛
Skewed to the left
Data is spread more to the left of the median,
negatively skewed
𝑚𝑒𝑎𝑛 – 𝑚𝑒𝑑𝑖𝑎𝑛 < 0 ⟺ 𝑚𝑒𝑎𝑛 < 𝑚𝑒𝑑𝑖𝑎𝑛
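The sign test above can be sketched as a small function. The `tolerance` parameter is an assumption added here, because real data rarely gives a difference of exactly 0; the sample values are the means and medians of the three Grade 11 classes in Example 5.

```python
def describe_skew(mean, median, tolerance=0.5):
    """Classify a data set from the sign of (mean - median).

    tolerance is a hypothetical cut-off: differences smaller than this
    are treated as 'close enough to zero'.
    """
    difference = mean - median
    if abs(difference) <= tolerance:
        return "symmetrical"
    return "skewed right" if difference > 0 else "skewed left"

print(describe_skew(36.094, 36))    # Grade 11 A
print(describe_skew(24.344, 22))    # Grade 11 B
print(describe_skew(46.375, 49.5))  # Grade 11 C
```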
Histograms and frequency polygons
Marks out of 50 Frequency Midpoint
0 ≤ 𝑥 < 10 0 5
10 ≤ 𝑥 < 20 25 15
20 ≤ 𝑥 < 30 74 25
30 ≤ 𝑥 < 40 66 35
40 ≤ 𝑥 < 50 35 45
50 ≤ 𝑥 < 60 0 55
[Figure: Histogram showing learners' marks; x-axis: Marks (10 to 60), y-axis: Frequency (0 to 80)]
Measures of dispersion
Measures of dispersion – are values that tell you how spread out or grouped the data values are.
Range – the difference between the highest and the lowest score
𝑅 = 𝑚𝑎𝑥 − 𝑚𝑖𝑛
Interquartile range – the difference between the 3rd and 1st quartiles
IQ𝑅 = 𝑄3 − 𝑄1
Semi-interquartile range – half the difference between the 3rd and 1st quartiles
$SIQR = \dfrac{Q_3 - Q_1}{2}$
Quartiles – divide ordered data into 4 equal parts (into quarters)
Percentiles – divide ordered data into 100 equal parts
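As a quick illustration of the three formulas, the sketch below uses the five number summary of the 21-element list from earlier in this lesson (min = 34, Q1 = 41, Q3 = 56,5, max = 82). The function name `dispersion` is made up for this example.

```python
def dispersion(minimum, q1, q3, maximum):
    """Return the range, interquartile range and semi-interquartile range."""
    iqr = q3 - q1
    return {"range": maximum - minimum, "iqr": iqr, "siqr": iqr / 2}

result = dispersion(minimum=34, q1=41, q3=56.5, maximum=82)
print(result)
```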
Example 5: (Do the questions for Grade 11A only!!)
The 32 learners from each of the grade 11 classes wrote a maths test out of 60. Their
mathematics teachers recorded their marks on stem and leaf diagrams
Stem    Grade 11 A      Grade 11 B        Grade 11 C
1       2345667         222335799999      4
2       0012            001223356789      07
3       0112579         1579              13
4       46899           0002              00111122669
5       01266788                          0112344667899
6       0                                 000
Key: 1 | 2 means 12
Determine for each class:
a) The mean and the five number summary
b) The difference between the mean and the median
c) Draw a box and whiskers diagram
d) Whether the data is skewed or symmetrical
e) The interquartile range
Example continued:
a)
             Grade 11 A              Grade 11 B              Grade 11 C
Mean x̄       1155/32 = 36,094        779/32 = 24,344         1484/32 = 46,375
Min value    12                      12                      14
Q1           20                      19                      41
Q2           36                      22                      49,5
Q3           50,5                    30                      56
Max value    60                      42                      60

b) x̄ – median:
Grade 11 A: 36,094 – 36 = 0,094
Grade 11 B: 24,344 – 22 = 2,344
Grade 11 C: 46,375 – 49,5 = -3,125
Example continued:
c) [Figure: box and whisker diagrams for Grade 11 A, Grade 11 B and Grade 11 C on a common scale from 10 to 60]
d) Grade 11 A: symmetrical (x̄ – median = 0,094 ≈ 0)
Grade 11 B: skewed right (x̄ – median = 2,344 > 0)
Grade 11 C: skewed left (x̄ – median = -3,125 < 0)
e) Grade 11 A: IQR = 50,5 − 20 = 30,5
Grade 11 B: IQR = 30 − 19 = 11
Grade 11 C: IQR = 56 − 41 = 15
Exercise 11.1 pg. 331 no. b
LESSON 2
Ogive (Cumulative frequency)
Example 6:
Mark   Tally   Frequency (f)   Cumulative frequency
1      I       1               1
2      III     3               1 + 3 = 4
3      II      2               4 + 2 = 6
4      III     3               9
5      II      2               11
6      IIII    4               15
7      I       1               16
8      II      2               18
Frequency → tells us how many learners got that mark
Cumulative frequency → tells us how many learners got that mark or less
• 11 learners got 5 marks or less
• 16 learners got 7 marks or less
• 9 learners got more than 4 marks but no more than 8 marks (18 – 9)
Example continued:
a) Complete the table
Marks      Frequency   Cumulative frequency   Points to plot
1 – 10     0           0                      (10;0)
11 – 20    2           2                      (20;2)
21 – 30    6           8                      (30;8)
31 – 40    7           15                     (40;15)
41 – 50    14          29                     (50;29)
51 – 60    20          49                     (60;49)
61 – 70    35          84                     (70;84)
71 – 80    29          113                    (80;113)
81 – 90    6           119                    (90;119)
91 – 100   1           120                    (100;120)
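The cumulative frequency column and the points to plot can be generated with a running total. This is a sketch using `itertools.accumulate` from the standard library; the variable names are chosen for illustration.

```python
from itertools import accumulate

upper_limits = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]   # top of each class
frequencies = [0, 2, 6, 7, 14, 20, 35, 29, 6, 1]

# Running total of the frequencies
cumulative = list(accumulate(frequencies))

# Each point on the ogive is (upper limit of the class; cumulative frequency)
points = list(zip(upper_limits, cumulative))

print(cumulative)
print(points)
```

The last cumulative frequency must equal the total number of learners (120), which is a quick way to check the table.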
Example continued:
b) Draw an ogive to represent the data
[Figure: cumulative frequency graph (ogive) titled "Marks obtained in an exam"; x-axis: Marks, y-axis: Cumulative frequency]
Example continued:
c) Find an estimate of the median, lower quartile and upper quartile.
Position of the quartiles ($n$ = number of values):
$Q_1: \tfrac{1}{4}(n) = \tfrac{1}{4}(120) = 30$ ∴ 30th position
$Q_2: \tfrac{1}{2}(n) = \tfrac{1}{2}(120) = 60$ ∴ 60th position
$Q_3: \tfrac{3}{4}(n) = \tfrac{3}{4}(120) = 90$ ∴ 90th position
d) 80th percentile
Position = n × percentile%
= 120 × 80%
= 96th position
Example continued:
[Figure: ogive "Marks obtained in an exam" with the 30th, 60th, 90th and 96th positions read off; x-axis: Marks, y-axis: Cumulative frequency]
c) Q1 ≈ 52 ; Q2 ≈ 63 ; Q3 ≈ 72
d) 80th percentile ≈ 75
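Reading values off the ogive can be imitated in code. The sketch below assumes straight lines between the plotted points, so its answers are close to, but not exactly the same as, readings taken from a hand-drawn smooth curve. The function name `read_ogive` is hypothetical.

```python
# Points plotted on the ogive: (mark; cumulative frequency)
points = [(10, 0), (20, 2), (30, 8), (40, 15), (50, 29),
          (60, 49), (70, 84), (80, 113), (90, 119), (100, 120)]

def read_ogive(points, position):
    """Estimate the mark at a cumulative position by linear interpolation."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= position <= c1 and c1 > c0:
            return x0 + (position - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("position is outside the ogive")

n = points[-1][1]                      # total number of values (120)
q1 = read_ogive(points, n * 1 / 4)     # 30th position
q2 = read_ogive(points, n * 2 / 4)     # 60th position
q3 = read_ogive(points, n * 3 / 4)     # 90th position
p80 = read_ogive(points, n * 80 / 100) # 96th position

print(round(q1, 1), round(q2, 1), round(q3, 1), round(p80, 1))
```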
Example continued:
e) Draw the box and whisker plot representing the marks obtained in the exam.
[Figure: box and whisker plot drawn below the ogive using Q1 ≈ 52, Q2 ≈ 63 and Q3 ≈ 72; x-axis: Marks, y-axis: Cumulative frequency]
Exercise 11.2 pg. 340 no. b,c,d,e
LESSON 3
Variance and standard deviation
Variance (var) and standard deviation (σ) measure the dispersion around the mean.
• How spread out is a set of data values?
• Does most of the data lie close to the mean? Or is it widely spread out?
[Figure: two frequency curves of Marks (%) with the same mean of 62%; the narrower curve has σ = 8 and the wider curve has σ = 16]
Variance (𝑣𝑎𝑟) – the average of the squared differences from the mean
Standard deviation (𝜎) – the square root of the variance
Steps to follow:
Find the mean $\bar{x}$
Find the deviations from the mean $(x - \bar{x})$
Square the deviations $(x - \bar{x})^2$
Add up the squared deviations column and divide by the number of data values
– this is to find the variance $var$
Square root the variance – this is the standard deviation $\sigma$
$\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$ or $\sigma = \sqrt{\text{variance}}$
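The steps above can be sketched line by line in Python. The function name `variance_and_std_dev` is made up for illustration, and the shoe sizes from Example 1 are used as sample data. Note that this divides by n (population variance), matching the formula in these notes.

```python
from math import sqrt

def variance_and_std_dev(data):
    """Follow the steps: mean, deviations, squares, average, square root."""
    n = len(data)
    mean = sum(data) / n                          # find the mean
    deviations = [x - mean for x in data]         # deviations from the mean
    squared = [d ** 2 for d in deviations]        # square the deviations
    variance = sum(squared) / n                   # add up and divide by n
    return variance, sqrt(variance)               # square root gives sigma

shoe_sizes = [7, 8, 8, 12, 7, 9, 8, 10, 9, 10, 11, 5, 8]
var, sigma = variance_and_std_dev(shoe_sizes)
print(round(var, 2), round(sigma, 2))
```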
Example 7:
A girl measured the heights of her friends and family. She decided to compare her findings.
Heights of friends Heights of family
𝒙 𝒙
162 179
173 184
160 156
175 174
175 220
180 135
160 135
a) Determine the mean of her friend group.
Friends: $\bar{x} = \dfrac{1185}{7} = 169{,}3$ cm
b) Work out the variance for her friend group.
$\text{Variance} = \dfrac{\sum (x - \bar{x})^2}{n}$
Heights of friends (x)   Deviations from the mean (x − x̄)   (Deviations)² (x − x̄)²
162                      162 – 169,3 = -7,3                  (-7,3)² = 53,29
173                      3,7                                 13,69
160                      -9,3                                86,49
175                      5,7                                 32,49
175                      5,7                                 32,49
180                      10,7                                114,49
160                      -9,3                                86,49
Sum of the (deviation)² column: ∑(x − x̄)² = 419,43
Friends: $Var = \dfrac{419{,}43}{7} = 59{,}9$ cm²
c) Work out the standard deviation for her friends.
$\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$ or $\sigma = \sqrt{\text{variance}}$
Friends: $\sigma = \sqrt{59{,}9} = 7{,}7$ cm
Calculator change to:
frequency table off
d) Work out the mean, variance and standard deviation of her family.
Use the calculator.
Family: x̄ = 169 cm ; σ = 27,86 cm ; Var = σ² = 776 cm²
e) What do the two standard deviations tell us about the spread of the data
around the mean?
Her family's heights are more spread out, as their standard deviation is higher than the friends' standard deviation.
This means the friends' heights are closer together than her family's heights.
f) How many people in her family lie within one standard deviation of the mean?
(169 − 27,86 ; 169 + 27,86)
∴ (141,14 ; 196,86)
The heights 179; 184; 156 and 174 lie inside this interval.
∴ 4 people
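The calculator answers in Example 7 can be cross-checked with Python's `statistics` module. This is a sketch: `pvariance` and `pstdev` divide by n, as the formula in these notes does, and the helper name `within_one_std_dev` is hypothetical.

```python
from statistics import mean, pstdev, pvariance

friends = [162, 173, 160, 175, 175, 180, 160]
family = [179, 184, 156, 174, 220, 135, 135]

def within_one_std_dev(data):
    """Return the values that lie in the interval (mean - sigma ; mean + sigma)."""
    centre, sigma = mean(data), pstdev(data)
    return [x for x in data if centre - sigma <= x <= centre + sigma]

for name, heights in (("Friends", friends), ("Family", family)):
    print(name, round(mean(heights), 1),
          round(pvariance(heights), 1), round(pstdev(heights), 2))

print(within_one_std_dev(family))
```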
Example 8:
Five numbers, 4; 8; 10; 𝑥 and 𝑦, have a mean of 10 and a standard deviation of 4. Find the values of 𝑥 and 𝑦.
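Given: $\bar{x} = 10$ and $\sigma = 4$

Use the mean:
\[
\begin{aligned}
\bar{x} &= \frac{\text{sum of scores}}{n} \\
10 &= \frac{4 + 8 + 10 + x + y}{5} \\
50 &= 22 + x + y \\
28 &= x + y \qquad (1)
\end{aligned}
\]

x      x − x̄            (x − x̄)²
4      4 − 10 = −6      36
8      8 − 10 = −2      4
10     10 – 10 = 0      0
x      x − 10           x² − 20x + 100
y      y − 10           y² − 20y + 100

\[
\begin{aligned}
\sum (x - \bar{x})^2 &= 36 + 4 + 0 + x^2 - 20x + 100 + y^2 - 20y + 100 \\
&= x^2 - 20x + y^2 - 20y + 240
\end{aligned}
\]

Use the standard deviation:
\[
\begin{aligned}
\sigma &= \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \\
4 &= \sqrt{\frac{x^2 - 20x + y^2 - 20y + 240}{5}} \\
16 &= \frac{x^2 - 20x + y^2 - 20y + 240}{5} \\
80 &= x^2 - 20x + y^2 - 20y + 240 \\
0 &= x^2 - 20x + y^2 - 20y + 160 \qquad (2)
\end{aligned}
\]

From (1): $x = 28 - y \qquad (3)$
Substitute $x$ in (2):
\[
\begin{aligned}
0 &= (28 - y)^2 - 20(28 - y) + y^2 - 20y + 160 \\
0 &= 784 - 56y + y^2 - 560 + 20y + y^2 - 20y + 160 \\
0 &= 2y^2 - 56y + 384 \\
0 &= y^2 - 28y + 192 \\
0 &= (y - 16)(y - 12) \\
y &= 16 \text{ or } y = 12
\end{aligned}
\]
Substitute $y$ in (3):
$x = 28 - 16$ or $x = 28 - 12$
$x = 12$ or $x = 16$
∴ the two numbers are 12 and 16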
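The answer to Example 8 can be verified numerically. The sketch below solves the final quadratic y² − 28y + 192 = 0 with the quadratic formula and then checks that the five numbers really have a mean of 10 and a standard deviation of 4. Variable names are chosen for illustration.

```python
from math import sqrt
from statistics import mean, pstdev

# Coefficients of y^2 - 28y + 192 = 0
a, b, c = 1, -28, 192
discriminant = b * b - 4 * a * c
roots = sorted([(-b - sqrt(discriminant)) / (2 * a),
                (-b + sqrt(discriminant)) / (2 * a)])

# The two roots are the two unknown numbers (x and y are interchangeable)
numbers = [4, 8, 10, *roots]

print(roots, mean(numbers), pstdev(numbers))
```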
Exercise 11.3 pg. 347 no. b,d,f,h
LESSON 4
Calculator change to:
frequency table off
Calculator change to:
frequency table on
Distribution of data
NORMALLY DISTRIBUTED OR SYMMETRICAL DATA
Mean = Median: $\bar{x} - Q_2 = 0$
If the data to the left of the median balances with the data on the right, then the mean,
median and mode will be the same.
The data is said to be normally distributed or symmetrical.
DISTRIBUTIONS THAT ARE NOT NORMALLY DISTRIBUTED
Skewed to the right / positively skewed
Mean > Median: $\bar{x} - Q_2 > 0$
Skewed right implies that there is a mass of data values on the left side of the
distribution, with fewer higher values on the right. The data is spread more to the right
of the median, and is said to be positively skewed or skewed to the right.
Skewed to the left / negatively skewed
Mean < Median: $\bar{x} - Q_2 < 0$
Skewed left implies that there is a mass of data values on the right side of the
distribution, with fewer lower values on the left. The data is spread more to the left of
the median, and the mode will be closer to the right of the distribution. The data is said
to be negatively skewed or skewed to the left.
Distribution of data (graphical)
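Bell curve
[Figure: bell curve centred on $\bar{x}$, with the horizontal axis marked at $\bar{x} \pm \sigma$, $\bar{x} \pm 2\sigma$ and $\bar{x} \pm 3\sigma$]
• About 68% of the data lies within one standard deviation of the mean ($\bar{x} - \sigma$ to $\bar{x} + \sigma$)
• About 95% of the data lies within two standard deviations of the mean ($\bar{x} - 2\sigma$ to $\bar{x} + 2\sigma$)
• About 99,7% (almost all) of the data lies within three standard deviations of the mean ($\bar{x} - 3\sigma$ to $\bar{x} + 3\sigma$)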
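The percentages on the bell curve can be reproduced for a perfectly normal distribution. This sketch uses the error function `math.erf`, which gives the proportion of a normal distribution within k standard deviations of the mean as erf(k/√2); the function name `share_within` is made up for illustration.

```python
from math import erf, sqrt

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
```

Real data sets are only approximately normal, so their percentages will usually differ a little from these values.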
Exercise 11.4 pg. 352 no. a,c,e
LESSON 5
Outlier
Exercise 11.5 pg. 357 no. a,b,e
LESSON 5
Calculator change to:
frequency table off
Bivariate data: scatter plots, least squares regression line
(line of best fit) and correlation coefficient.
Bivariate data – relationship between two types of data (two variables)
• We display bivariate data graphically in a scatter plot
• x-axis → independent variable
• y-axis → dependent variable
The questions normally consist of three parts:
• Draw a scatter plot.
• Determine the equation of the least squares regression line and draw the line of best fit.
• Determine the correlation coefficient and comment on it.
Bivariate data: Scatter plot
The shape of the graph tells us about the relationship between the two variables.
Ask yourself these questions when looking at the scatter plot:
Is there a positive or negative association?
Is the relationship strong or weak?
Is the relationship linear or non-linear?
Are there outliers?
Are there groupings?
Bivariate data: scatter plots
[Figure: eight example scatter plots, described in the table below]
Plot   Association   Relationship   Shape         Outliers      Groupings
1      Positive      Strong         Linear        No outliers   No groupings
2      Negative      Strong         Linear        No outliers   No groupings
3      Negative      Weak           Linear        No outliers   No groupings
4      Positive      Weak           Linear        No outliers   No groupings
5      Positive      Strong         Exponential   No outliers   No groupings
6      Positive      Strong         Parabolic     No outliers   No groupings
7      Positive      Strong         Linear        One outlier   No groupings
8      Positive      Strong         Linear        No outliers   Two groupings
Bivariate data: correlation coefficient - r
Correlation coefficient – measures the strength and direction of the linear relationship between the two variables.
The value can vary between −1 (negative) and 1 (positive).
r = 1                 Perfect positive correlation
0,8 < r < 1           Strong positive correlation
0,5 < r < 0,8         Moderately strong positive correlation
0 < r < 0,5           Weak positive correlation
r = 0                 No correlation
−0,5 < r < 0          Weak negative correlation
−0,8 < r < −0,5       Moderate negative correlation
−1 < r < −0,8         Strong negative correlation
r = −1                Perfect negative correlation
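The scale above can be turned into a small lookup function. This is a sketch only: the notes do not say which band the boundary values 0,5 and 0,8 themselves belong to, so placing them in the stronger band is an assumption, and the function name `describe_correlation` is hypothetical.

```python
def describe_correlation(r):
    """Describe a correlation coefficient using the bands in the scale above."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        strength = "perfect"
    elif size >= 0.8:        # assumption: 0,8 itself counts as strong
        strength = "strong"
    elif size >= 0.5:        # assumption: 0,5 itself counts as moderate
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction} correlation"

for r in (1, -0.9999, -0.0009, -0.68):
    print(r, describe_correlation(r))
```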
Bivariate data: correlation coefficient - r
[Figure: four example scatter plots with their correlation coefficients]
r = 1 ∴ Perfect positive correlation
r = −0,9999 ∴ Strong negative correlation
r = −0,0009 ∴ almost no correlation
r = −0,68 ∴ Moderate negative correlation
Bivariate data: regression line
Regression line – the equation for the line of best fit.
$\hat{y} = a + bx$
a → constant value (y-cut)
b → the gradient of the regression line
To plot the line of best fit – use the y-cut and the point $(\bar{x}; \bar{y})$
Interpolation – using the regression line to predict values within the data range
Extrapolation – using the regression line to predict values outside the data range
Bivariate data
The table represents the distance in metres required by a car to brake and reach a
standstill when it is travelling at a given speed.
Speed in km/h           20   40   60   80   100   120   140
Braking distance in m   6    16   3    48   70    80    110
Enter the data in your calculator in order to find the values of $r$, $a$, $b$, $\bar{x}$ and $\bar{y}$.
𝑎 = −24,9
𝑏 = 0,9
𝑟 = 0,95
𝑥ҧ = 80
𝑦ത = 47,6
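The calculator values above can be reproduced from the least squares formulas. This is a sketch, not the calculator's own method: it uses b = S_xy / S_xx, a = ȳ − b·x̄ and r = S_xy / √(S_xx · S_yy), and the function name `least_squares` is made up for illustration.

```python
from math import sqrt

def least_squares(xs, ys):
    """Return (a, b, r, x_bar, y_bar) for the regression line y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx                 # gradient
    a = y_bar - b * x_bar           # y-cut
    r = s_xy / sqrt(s_xx * s_yy)    # correlation coefficient
    return a, b, r, x_bar, y_bar

speed = [20, 40, 60, 80, 100, 120, 140]
braking = [6, 16, 3, 48, 70, 80, 110]

a, b, r, x_bar, y_bar = least_squares(speed, braking)
print(round(a, 1), round(b, 1), round(r, 2), x_bar, round(y_bar, 1))
```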
Bivariate data
Mrs du Trevou owns an ice-cream shop. She analyses her ice-cream sales over 14 randomly selected days and
compares her sales with the temperature on each of those days. The results of her survey are shown in the
table below:
Temp in °C        20   13   18   23   19   28   21   38   26   22   17   14   15   16
Ice creams sold   116  85   107  139  123  172  128  127  148  124  112  89   101  96
a) Draw a scatter plot representing this data
b) Discuss the relationship between the data
c) Describe the correlation:
1) including the outlier
2) excluding the outlier
d) Determine the line of best fit
1) including the outlier
2) excluding the outlier
e) Using the regression line that was calculated without the outlier, predict how many ice creams Mrs du Trevou
will sell if the temperature is:
1) 5 °C
2) 24 °C
Bivariate data
a) Draw a scatter plot representing this data
[Figure: scatter plot; x-axis: Temp degrees Celsius (0 to 40), y-axis: No of ice creams sold (0 to 200)]
b) Discuss the relationship between the data
Positive association, strong relationship, linear, 1 outlier, no groupings
Bivariate data
c) Determine the correlation coefficient and describe the correlation:
r = 0,7135 ∴ Moderately strong positive correlation
d) Determine the equation for the line of best fit.
𝑎 = 66,0467 𝑏 = 2,5598
∴ 𝑦ො = 2,6𝑥 + 66,0
e) Draw the line of best fit, by finding the y-cut and the point $(\bar{x}; \bar{y})$
$\hat{y} = 2{,}6x + 66{,}0$
y-cut = 66
$(\bar{x}; \bar{y}) = (20{,}7 ; 119{,}1)$
[Figure: scatter plot with the line of best fit drawn through (0 ; 66) and (20,7 ; 119,1); x-axis: Temp degrees Celsius (0 to 40), y-axis: No of ice creams sold (0 to 200)]
Bivariate data
f) Determine the correlation coefficient as well as the line of best fit if the outlier is excluded.
r = 0,9756 ∴ Strong positive correlation
a = 16,22235948 b = 5,27424336
∴ $\hat{y} = 5{,}3x + 16{,}2$
g) Draw the new line of best fit.
y-cut = 16,2
$(\bar{x}; \bar{y}) = (19{,}4 ; 118{,}5)$
[Figure: scatter plot with the new line of best fit; x-axis: Temp degrees Celsius (0 to 40), y-axis: No of ice creams sold (0 to 200)]
h) How does the outlier affect the data?
Including the outlier weakens the correlation (r drops from 0,9756 to 0,7135) and flattens the line of best fit (the gradient drops from 5,3 to 2,6).
Bivariate data
i) Using the regression line that was calculated without the outlier, predict how many ice creams
Mrs du Trevou will sell at each temperature below. Explain whether each prediction is valid.
1) 5 °C
$\hat{y} = 5{,}3x + 16{,}2$
$\hat{y} = 5{,}3(5) + 16{,}2$
$\hat{y} = 42{,}7$
∴ 42 ice creams
Not valid, as it is extrapolation (5 °C lies outside the recorded temperatures)
2) 24 °C
$\hat{y} = 5{,}3x + 16{,}2$
$\hat{y} = 5{,}3(24) + 16{,}2$
$\hat{y} = 143{,}4$
∴ 143 ice creams
Valid, as it is interpolation (24 °C lies within the recorded temperatures)
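The whole ice-cream example can be re-run in code. This sketch fits the least squares line with and without the outlier (38 °C; 127 ice creams) and labels each prediction as interpolation or extrapolation. The function names `fit_line` and `predict` are made up for illustration. Because the code keeps the unrounded values of a and b, its predictions differ slightly from the answers obtained with the rounded equation ŷ = 5,3x + 16,2.

```python
temps = [20, 13, 18, 23, 19, 28, 21, 38, 26, 22, 17, 14, 15, 16]
sales = [116, 85, 107, 139, 123, 172, 128, 127, 148, 124, 112, 89, 101, 96]

def fit_line(xs, ys):
    """Least squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

def predict(a, b, x, xs):
    """Return the predicted y and whether x lies inside the data range."""
    kind = "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"
    return a + b * x, kind

# Including the outlier
a_all, b_all = fit_line(temps, sales)

# Excluding the outlier (38 degrees; 127 ice creams)
kept = [(x, y) for x, y in zip(temps, sales) if (x, y) != (38, 127)]
kept_temps = [x for x, _ in kept]
kept_sales = [y for _, y in kept]
a_no, b_no = fit_line(kept_temps, kept_sales)

pred_5, kind_5 = predict(a_no, b_no, 5, kept_temps)
pred_24, kind_24 = predict(a_no, b_no, 24, kept_temps)

print(round(a_all, 4), round(b_all, 4))
print(round(a_no, 4), round(b_no, 4))
print(round(pred_5, 1), kind_5)
print(round(pred_24, 1), kind_24)
```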
Worksheet no 1 – 5
In class and for homework
LESSON 6