
Understanding Statistics: Measures & Examples

Chapter 13 of the statistics document covers the collection, organization, and interpretation of numerical data, focusing on measures of central tendency such as mean, median, and mode. It also explains concepts related to grouped data, measures of dispersion, and visual representations like histograms and box plots. Additionally, it includes examples and exercises to illustrate these statistical concepts.

Uploaded by ngozikannadi7
Copyright
© All Rights Reserved

STATISTICS – Chapter 13

LESSON 1
Statistics is a branch of mathematics that deals with the collection, organisation and interpretation of numerical data.

Data can be continuous or discrete:

Continuous – information collected by measurement

Discrete – information collected by counting


Measures of central tendency from a list
Measures of central tendency are values that tell you where the middle (centre) of the data lies.

Mean – the average of the data set:

$\bar{x} = \frac{\sum x}{n}$, where $\sum x$ is the sum of all the scores and $n$ is the number of values

Median – the middle value of the data set after the data has been arranged in ascending order. If there is an even number of values, the median is the mean of the two middle values.

Mode – the value that occurs most often.


Mean, median & mode of ungrouped data
Example 1:
The following shoe sizes were collected from 13 learners:
7 8 8 12 7 9 8 10 9 10 11 5 8

Determine:
a) Mean: $\bar{x} = \frac{\sum x}{n} = \frac{112}{13} = 8{,}62$

b) Median: 5 7 7 8 8 8 8 9 9 10 10 11 12 ∴ median = 8 (the 7th value)

c) Mode: 8
Example 2:
The following test marks out of 20 were collected from 10 learners:
18 20 15 15 8 10 12 13 9 17

Determine:
a) Mean: $\bar{x} = \frac{\sum x}{n} = \frac{137}{10} = 13{,}7$

b) Median: 8 9 10 12 13 15 15 17 18 20 ∴ median = $\frac{13 + 15}{2} = 14$

c) Mode: 15
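As a cross-check, Examples 1 and 2 can be reproduced with Python's standard `statistics` module. This is a minimal sketch: the helper name `central_tendency` is illustrative only, and `multimode` (Python 3.8 or later) is used so that a data set with more than one mode returns all of them.

```python
from statistics import mean, median, multimode


def central_tendency(data):
    """Return (mean, median, list of modes) for an ungrouped data set."""
    # median() sorts the data itself and averages the two middle values
    # when the number of values is even
    return mean(data), median(data), multimode(data)


shoe_sizes = [7, 8, 8, 12, 7, 9, 8, 10, 9, 10, 11, 5, 8]   # Example 1
test_marks = [18, 20, 15, 15, 8, 10, 12, 13, 9, 17]        # Example 2

for data in (shoe_sizes, test_marks):
    m, med, modes = central_tendency(data)
    print(round(m, 2), med, modes)
```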
Mean, median, mode & range from grouped data

Example 3:
The frequency table below shows the marks of 125 Grade 12 Mathematics learners. Marks are
recorded as whole percentages.

Mark | Frequency (f) | Midpoint (xᵢ) | f · xᵢ
0 ≤ x < 20 | 6 | (0 + 20)/2 = 10 | 6 × 10 = 60
20 ≤ x < 40 | 14 | 30 | 420
40 ≤ x < 60 | 38 | 50 | 1900
60 ≤ x < 80 | 45 | 70 | 3150
80 ≤ x < 100 | 22 | 90 | 1980
TOTAL | 125 | | 7510
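Determine:
a) The estimated mean
$\bar{x} = \frac{\sum f \cdot x_i}{n} = \frac{7510}{125} = 60{,}08$

b) The modal class
60 ≤ x < 80 (the class with the highest frequency, 45)

c) The class interval where the median is found
Position of the median $= \frac{n}{2} = \frac{125}{2} = 62{,}5$ ∴ 63rd value
The cumulative frequency is 58 up to a mark of 60 and 103 up to a mark of 80, so the 63rd value lies in the class:
∴ 60 ≤ x < 80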
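The grouped-data steps can be sketched in Python as follows. The class list mirrors the Example 3 table; the function name `grouped_summary` is hypothetical, and the median class is located by finding where the running (cumulative) frequency first reaches n/2.

```python
# Example 3 classes as (lower, upper, frequency), meaning lower <= x < upper
classes = [(0, 20, 6), (20, 40, 14), (40, 60, 38), (60, 80, 45), (80, 100, 22)]


def grouped_summary(classes):
    """Return n, the estimated mean, the modal class and the median class."""
    n = sum(f for _, _, f in classes)

    # Each class is represented by its midpoint, so the mean is an estimate
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    estimated_mean = total / n

    # Modal class: the class with the highest frequency
    lower, upper, _ = max(classes, key=lambda c: c[2])
    modal_class = (lower, upper)

    # Median class: the first class whose cumulative frequency reaches n / 2
    median_class = None
    cumulative = 0
    for lower, upper, f in classes:
        cumulative += f
        if cumulative >= n / 2:
            median_class = (lower, upper)
            break

    return n, estimated_mean, modal_class, median_class


print(grouped_summary(classes))
```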
Five-number summary
Example (21 elements in the list):
34 35 35 38 41 41 44 45 46 47 47 51 51 54 55 56 57 62 74 75 82

• The median Q2 → the value that divides the data into two halves: Q2 = 47
• The lower quartile Q1 → the median of the lower half of the data: Q1 = 41
• The upper quartile Q3 → the median of the upper half of the data: Q3 = 56,5
• The minimum value → the smallest value of the set: min = 34
• The maximum value → the biggest value of the set: max = 82
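The "median of each half" method used above can be sketched in Python. This illustration rests on one assumption, inferred from the worked answers: when the number of values is odd, the median itself belongs to neither half. Library routines such as `statistics.quantiles` use interpolation-based conventions and may return slightly different quartiles.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, Q2, Q3, max) using the median of each half."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]            # values below the median
    upper = values[half + n % 2:]    # skip the middle value when n is odd
    return values[0], median(lower), median(values), median(upper), values[-1]


data = [34, 35, 35, 38, 41, 41, 44, 45, 46, 47, 47,
        51, 51, 54, 55, 56, 57, 62, 74, 75, 82]
print(five_number_summary(data))
```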
Box and whiskers diagram
Example continued:
[Box-and-whisker diagram on a number line from 30 to 80: min = 34, Q1 = 41, Q2 = 47, Q3 = 56,5, max = 82]
Skewed and symmetrical data: $\bar{x} - Q_2$ (mean − median)

Symmetrical
The data is balanced: the scores are spread out in the same way on both sides of the median.
mean − median = 0 ⟺ mean = median

Skewed to the right (positively skewed)
The data is spread more to the right of the median.
mean − median > 0 ⟺ mean > median

Skewed to the left (negatively skewed)
The data is spread more to the left of the median.
mean − median < 0 ⟺ mean < median
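A small Python sketch of this rule, applied to the 21 values from the five-number summary example. The `tolerance` parameter is an assumption added for illustration: in practice a very small difference (such as 0,094 in Example 5) is read as 0, and the cut-off chosen here is arbitrary.

```python
from statistics import mean, median


def skew_direction(data, tolerance=0.5):
    """Classify data using the sign of mean - median.

    tolerance is an assumed cut-off: differences this small count as 0.
    """
    diff = mean(data) - median(data)
    if abs(diff) <= tolerance:
        return diff, "symmetrical"
    return diff, ("skewed right" if diff > 0 else "skewed left")


data = [34, 35, 35, 38, 41, 41, 44, 45, 46, 47, 47,
        51, 51, 54, 55, 56, 57, 62, 74, 75, 82]
diff, label = skew_direction(data)
print(round(diff, 2), label)
```

Here the mean (about 50,95) is larger than the median (47), so this list is skewed to the right.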
Histograms and frequency polygons
Marks out of 50 | Frequency | Midpoint
0 ≤ x < 10 | 0 | 5
10 ≤ x < 20 | 25 | 15
20 ≤ x < 30 | 74 | 25
30 ≤ x < 40 | 66 | 35
40 ≤ x < 50 | 35 | 45
50 ≤ x < 60 | 0 | 55

[Histogram showing learners' marks: Marks (10 to 60) on the horizontal axis, Frequency (0 to 80) on the vertical axis]
Measures of dispersion
Measures of dispersion are values that tell you how spread out or grouped the data values are.

Range – the difference between the highest and the lowest score:
$R = \max - \min$

Interquartile range – the difference between the 3rd and 1st quartiles:
$IQR = Q_3 - Q_1$

Semi-interquartile range – half the difference between the 3rd and 1st quartiles:
$SIQR = \frac{Q_3 - Q_1}{2}$

Quartiles – divide ordered data into 4 equal parts (into quarters)

Percentiles – divide ordered data into 100 equal parts
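The three formulas translate directly into code. The sketch below (the function name `dispersion` is hypothetical) uses the five-number summary of the 21-value list from earlier: min = 34, Q1 = 41, Q3 = 56,5 and max = 82.

```python
def dispersion(minimum, q1, q3, maximum):
    """Return the range, interquartile range and semi-interquartile range."""
    data_range = maximum - minimum
    iqr = q3 - q1
    siqr = iqr / 2
    return data_range, iqr, siqr


print(dispersion(34, 41, 56.5, 82))
```

For that list this gives R = 48, IQR = 15,5 and SIQR = 7,75.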
Example 5: (Do the questions for Grade 11 A only!)
The 32 learners in each of the Grade 11 classes wrote a maths test out of 60. Their mathematics teachers recorded the marks in stem-and-leaf diagrams (key: 1 | 2 means 12 marks).

Stem | Grade 11 A | Grade 11 B | Grade 11 C
1 | 2 3 4 5 6 6 7 | 2 2 2 3 3 5 7 9 9 9 9 9 | 4
2 | 0 0 1 2 | 0 0 1 2 2 3 3 5 6 7 8 9 | 0 7
3 | 0 1 1 2 5 7 9 | 1 5 7 9 | 1 3
4 | 4 6 8 9 9 | 0 0 0 2 | 0 0 1 1 1 1 2 2 6 6 9
5 | 0 1 2 6 6 7 8 8 |  | 0 1 1 2 3 4 4 6 6 7 8 9 9
6 | 0 |  | 0 0 0

Determine for each class:


a) The mean and the five number summary
b) The difference between the mean and the median
c) Draw a box and whiskers diagram
d) Whether the data is skewed or symmetrical
e) The interquartile range
Example continued:
a)
 | Grade 11 A | Grade 11 B | Grade 11 C
Mean $\bar{x}$ | 1155/32 = 36,094 | 779/32 = 24,344 | 1484/32 = 46,375
Min value | 12 | 12 | 14
Q1 | 20 | 19 | 41
Q2 | 36 | 22 | 49,5
Q3 | 50,5 | 30 | 56
Max value | 60 | 42 | 60

b) $\bar{x}$ − median
Grade 11 A: 36,094 − 36 = 0,094
Grade 11 B: 24,344 − 22 = 2,344
Grade 11 C: 46,375 − 49,5 = −3,125
Example continued:
c) [Box-and-whisker diagrams for 11 A, 11 B and 11 C drawn on a common scale from 10 to 60]

d)
11 A: $\bar{x}$ − median = 0,094 ≈ 0 ∴ symmetrical
11 B: $\bar{x}$ − median = 2,344 > 0 ∴ skewed right
11 C: $\bar{x}$ − median = −3,125 < 0 ∴ skewed left

e)
11 A: IQR = 50,5 − 20 = 30,5
11 B: IQR = 30 − 19 = 11
11 C: IQR = 56 − 41 = 15
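The answers for all three classes can be checked by expanding each stem-and-leaf diagram into a list and applying the same quartile method (median of each half). This is a sketch in Python; the names `from_stem_and_leaf` and `summary` are illustrative, and the dictionaries simply transcribe the diagrams above (stem = tens digit, leaf = units digit).

```python
from statistics import mean, median


def from_stem_and_leaf(rows):
    """Expand {stem: "leaves"} into a sorted list of values."""
    return sorted(10 * stem + int(leaf) for stem, leaves in rows.items() for leaf in leaves)


def summary(values):
    """Mean, five-number summary, mean - median and IQR (median of each half)."""
    values = sorted(values)
    n = len(values)
    half = n // 2
    q1 = median(values[:half])
    q2 = median(values)
    q3 = median(values[half + n % 2:])
    return {"mean": mean(values), "min": values[0], "Q1": q1, "Q2": q2,
            "Q3": q3, "max": values[-1], "mean-median": mean(values) - q2,
            "IQR": q3 - q1}


grade_11 = {
    "11 A": {1: "2345667", 2: "0012", 3: "0112579", 4: "46899", 5: "01266788", 6: "0"},
    "11 B": {1: "222335799999", 2: "001223356789", 3: "1579", 4: "0002", 5: "", 6: ""},
    "11 C": {1: "4", 2: "07", 3: "13", 4: "00111122669", 5: "0112344667899", 6: "000"},
}

for name, rows in grade_11.items():
    s = summary(from_stem_and_leaf(rows))
    print(name, {key: round(value, 3) for key, value in s.items()})
```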
Exercise 11.1 pg. 331 no. b
LESSON 2
Ogive (Cumulative frequency)
Example 6:
Mark | Tally | Frequency (f) | Cumulative frequency
1 | I | 1 | 1
2 | III | 3 | 1 + 3 = 4
3 | II | 2 | 4 + 2 = 6
4 | III | 3 | 9
5 | II | 2 | 11
6 | IIII | 4 | 15
7 | I | 1 | 16
8 | II | 2 | 18

Frequency → tells us how many learners got that mark.
Cumulative frequency → tells us how many learners got that mark or less. For example, 11 learners got 5 marks or less, and 16 learners got 7 marks or less.
18 − 9 = 9 learners got more than 4 marks but no more than 8 marks.
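Cumulative frequency is a running total of the frequency column. A minimal Python sketch using `itertools.accumulate` on the Example 6 table:

```python
from itertools import accumulate

marks = [1, 2, 3, 4, 5, 6, 7, 8]
frequency = [1, 3, 2, 3, 2, 4, 1, 2]

# Running total of the frequencies: learners with that mark or less
cumulative = list(accumulate(frequency))
print(list(zip(marks, frequency, cumulative)))

# Learners with more than 4 marks but no more than 8 marks
cf = dict(zip(marks, cumulative))
print(cf[8] - cf[4])
```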
Example continued:
a) Complete the table

Marks | Frequency | Cumulative frequency | Points to plot
1 – 10 | 0 | 0 | (10; 0)
11 – 20 | 2 | 2 | (20; 2)
21 – 30 | 6 | 8 | (30; 8)
31 – 40 | 7 | 15 | (40; 15)
41 – 50 | 14 | 29 | (50; 29)
51 – 60 | 20 | 49 | (60; 49)
61 – 70 | 35 | 84 | (70; 84)
71 – 80 | 29 | 113 | (80; 113)
81 – 90 | 6 | 119 | (90; 119)
91 – 100 | 1 | 120 | (100; 120)
Example continued:
b) Draw an ogive to represent the data
[Ogive: Marks obtained in an exam, with cumulative frequency plotted against marks]
A cumulative frequency graph is called an ogive.
Example continued:
c) Find an estimate of the median, lower quartile and upper quartile.

Position of the quartiles ($n$ = number of values):
$Q_1: \frac{1}{4}(n) = \frac{1}{4}(120) = 30$ ∴ 30th position
$Q_2: \frac{1}{2}(n) = \frac{1}{2}(120) = 60$ ∴ 60th position
$Q_3: \frac{3}{4}(n) = \frac{3}{4}(120) = 90$ ∴ 90th position

d) 80th percentile
Position $= n \times \text{percentile}\% = 120 \times 80\% = 96$ ∴ 96th position
Example continued:
[Ogive: Marks obtained in an exam, with the readings Q1 ≈ 52, Q2 ≈ 63, Q3 ≈ 72 and 80th percentile ≈ 75 marked on the Marks axis]
c) Q1 ≈ 52; Q2 ≈ 63; Q3 ≈ 72
d) 80th percentile ≈ 75
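Reading values off an ogive can be approximated by linear interpolation between the plotted points. The sketch below (the function name `read_ogive` is hypothetical) assumes straight-line segments between the points in the table. It gives Q2 ≈ 63,1 and Q3 ≈ 72,1, in line with the graph readings, but Q1 ≈ 50,5 and an 80th percentile of about 74,1, which differ slightly from the values read off a smooth hand-drawn curve (≈ 52 and ≈ 75).

```python
# Ogive points (upper class limit; cumulative frequency) from the table
points = [(10, 0), (20, 2), (30, 8), (40, 15), (50, 29),
          (60, 49), (70, 84), (80, 113), (90, 119), (100, 120)]


def read_ogive(points, position):
    """Estimate the mark at a cumulative-frequency position (linear interpolation)."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= position <= y1 and y1 > y0:
            return x0 + (position - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("position lies outside the ogive")


n = points[-1][1]  # 120 values
for label, fraction in [("Q1", 1 / 4), ("Q2", 1 / 2), ("Q3", 3 / 4), ("P80", 0.8)]:
    print(label, round(read_ogive(points, n * fraction), 1))
```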
Example continued:
e) Draw the box and whisker plot representing the marks obtained in the exam.
[Box-and-whisker plot drawn below the ogive, using Q1 ≈ 52, Q2 ≈ 63 and Q3 ≈ 72]
Exercise 11.2 pg. 340 no. b,c,d,e
LESSON 3
Variance and standard deviation
Variance (var) and standard deviation (σ) measure the dispersion around the mean:

• How spread out is a set of data values?
• Does most of the data lie close to the mean, or is it widely spread out?

[Figure: two distributions of marks (%) with the same mean of 62%; the narrower one has σ = 8 and the wider one σ = 16 (Frequency against Marks (%))]
Variance (var) – the average of the squared differences from the mean

Standard deviation (σ) – the square root of the variance

$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$ or $\sigma = \sqrt{\text{variance}}$

Steps to follow:
• Find the mean $\bar{x}$
• Find the deviations from the mean, $x - \bar{x}$
• Square the deviations, $(x - \bar{x})^2$
• Add up the squared-deviation column and divide by the number of data values – this gives the variance (var)
• Take the square root of the variance – this is the standard deviation σ
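The steps above can be followed literally in Python, here applied to the friends' heights from Example 7. This sketch divides by n, as this document does; Python's `statistics.pvariance` and `statistics.pstdev` use the same population formula, whereas `statistics.variance` and `statistics.stdev` divide by n − 1 and would give larger values.

```python
from math import sqrt


def population_variance(data):
    """Variance following the steps: mean, deviations, squares, sum / n."""
    n = len(data)
    x_bar = sum(data) / n                       # step 1: the mean
    deviations = [x - x_bar for x in data]      # step 2: x - x_bar
    squared = [d ** 2 for d in deviations]      # step 3: (x - x_bar)^2
    return sum(squared) / n                     # step 4: add up, divide by n


friends = [162, 173, 160, 175, 175, 180, 160]
var = population_variance(friends)
sigma = sqrt(var)                               # step 5: square root
print(round(var, 1), round(sigma, 1))
```

Because the code keeps the unrounded mean (169,2857...), the variance comes out as 59,92 cm², which agrees with the hand calculation to one decimal place.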
Example 7:
A girl measured the heights of her friends and family and decided to compare her findings.

Heights of friends, x (cm) | Heights of family, x (cm)
162 | 179
173 | 184
160 | 156
175 | 174
175 | 220
180 | 135
160 | 135

a) Determine the mean of her friend group.
Friends: $\bar{x} = \frac{1185}{7} = 169{,}3$ cm
b) Work out the variance for her friend group.
$\text{Variance} = \frac{\sum (x - \bar{x})^2}{n}$

Heights of friends (x) | Deviation from the mean (x − x̄) | (Deviation)², (x − x̄)²
162 | 162 − 169,3 = −7,3 | (−7,3)² = 53,29
173 | 3,7 | 13,69
160 | −9,3 | 86,49
175 | 5,7 | 32,49
175 | 5,7 | 32,49
180 | 10,7 | 114,49
160 | −9,3 | 86,49
Sum | | 419,43

The last column adds up to $\sum (x - \bar{x})^2 = 419{,}43$, so
$\text{Var}_{\text{friends}} = \frac{419{,}43}{7} = 59{,}9\ \text{cm}^2$
c) Work out the standard deviation for her friends.
$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$ or $\sigma = \sqrt{\text{variance}}$
Friends: $\sigma = \sqrt{59{,}9} = 7{,}7$ cm
(Calculator: change to frequency table off.)

d) Work out the mean, variance and standard deviation of her family. Use the calculator.
Family: $\bar{x} = 169$ cm; $\sigma = 27{,}86$ cm; $\text{Var} = \sigma^2 = 776\ \text{cm}^2$

e) What do the two standard deviations tell us about the spread of the data around the mean?

Her family's heights are more spread out, as their standard deviation (27,86 cm) is higher than her friends' standard deviation (7,7 cm). This means her friends' heights are closer together (closer to the mean) than her family's.
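f) How many people in her family lie within one standard deviation of the mean?
$(\bar{x} - \sigma;\ \bar{x} + \sigma) = (169 - 27{,}86;\ 169 + 27{,}86) = (141{,}14;\ 196{,}86)$
The family heights 179, 184, 156 and 174 lie in this interval (220 and the two heights of 135 do not).
∴ 4 people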
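Parts d) and f) can be checked with the `statistics` module. This is a sketch; `pstdev` is the population standard deviation, which matches the formula used in this document (dividing by n).

```python
from statistics import mean, pstdev

family = [179, 184, 156, 174, 220, 135, 135]

x_bar = mean(family)
sigma = pstdev(family)

# Keep the heights inside the interval (x_bar - sigma; x_bar + sigma)
low, high = x_bar - sigma, x_bar + sigma
within = [h for h in family if low <= h <= high]

print(x_bar, round(sigma, 2), round(sigma ** 2), within, len(within))
```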
Example 8:
Five numbers, 4; 8; 10; 𝑥 and 𝑦, have a mean of 10 and a standard deviation of 4. Find the values of 𝑥 and 𝑦.
Given: $\bar{x} = 10$ and $\sigma = 4$.

$\bar{x} = \frac{\text{sum of scores}}{n}$:
$10 = \frac{4 + 8 + 10 + x + y}{5}$
$50 = 22 + x + y$
$28 = x + y$ ... (1)

Value | Deviation from the mean | (Deviation)²
4 | 4 − 10 = −6 | 36
8 | 8 − 10 = −2 | 4
10 | 10 − 10 = 0 | 0
x | x − 10 | x² − 20x + 100
y | y − 10 | y² − 20y + 100

$\sum (x - \bar{x})^2 = 36 + 4 + 0 + x^2 - 20x + 100 + y^2 - 20y + 100 = x^2 - 20x + y^2 - 20y + 240$

$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$:
$4 = \sqrt{\frac{x^2 - 20x + y^2 - 20y + 240}{5}}$
$16 = \frac{x^2 - 20x + y^2 - 20y + 240}{5}$
$80 = x^2 - 20x + y^2 - 20y + 240$
$0 = x^2 - 20x + y^2 - 20y + 160$ ... (2)

From (1): $x = 28 - y$ ... (3)

Substitute (3) into (2):
$0 = (28 - y)^2 - 20(28 - y) + y^2 - 20y + 160$
$0 = 784 - 56y + y^2 - 560 + 20y + y^2 - 20y + 160$
$0 = 2y^2 - 56y + 384$
$0 = y^2 - 28y + 192$
$0 = (y - 16)(y - 12)$
$y = 16$ or $y = 12$

Substitute y into (3):
$x = 28 - 16 = 12$ or $x = 28 - 12 = 16$

∴ The two numbers are 12 and 16.
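The answer can be verified numerically: solve the quadratic $y^2 - 28y + 192 = 0$ with the quadratic formula, then check that each solution really gives a mean of 10 and a (population) standard deviation of 4. A short Python sketch:

```python
from math import sqrt
from statistics import mean, pstdev

# From the mean: x + y = 28; substituting x = 28 - y gives y^2 - 28y + 192 = 0
a, b, c = 1, -28, 192
disc = b * b - 4 * a * c
roots = sorted({(-b - sqrt(disc)) / (2 * a), (-b + sqrt(disc)) / (2 * a)})
print(roots)

# Check each solution against the conditions in the question
for y in roots:
    x = 28 - y
    values = [4, 8, 10, x, y]
    print(x, y, mean(values), pstdev(values))
```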


Exercise 11.3 pg. 347 no. b,d,f,h
LESSON 4

Calculator change to: frequency table off
Calculator change to: frequency table on
Distribution of data
NORMALLY DISTRIBUTED OR SYMMETRICAL DATA
Mean = Median
$\bar{x} - Q_2 = 0$

If the data to the left of the median balances with the data on the right, then the mean, median and mode will be the same. The data is said to be normally distributed or symmetrical.
DISTRIBUTIONS THAT ARE NOT NORMALLY DISTRIBUTED
Skewed to the right / positively skewed
Mean > Median
$\bar{x} - Q_2 > 0$
Skewed right means that most of the data values are massed on the left side of the distribution, with fewer, higher values on the right. The data is spread more to the right of the median, so it is said to be positively skewed or skewed to the right.
Skewed to the left / negatively skewed
Mean < Median
$\bar{x} - Q_2 < 0$
Skewed left means that most of the data values are massed on the right side of the distribution, with fewer, lower values on the left. The data is spread more to the left of the median and the mode lies closer to the right of the distribution, so the data is said to be negatively skewed or skewed to the left.
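Distribution of data (graphical)
Bell curve
[Bell curve centred on the mean $\bar{x}$: about 68% of the data lies between $\bar{x} - \sigma$ and $\bar{x} + \sigma$, about 95% between $\bar{x} - 2\sigma$ and $\bar{x} + 2\sigma$, and almost all of it (about 99,7%) between $\bar{x} - 3\sigma$ and $\bar{x} + 3\sigma$]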
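The percentages on the bell curve can be checked for an ideal normal distribution: the proportion of values within k standard deviations of the mean is erf(k/√2). A short Python sketch using `math.erf` (the helper name `within_k_sigma` is illustrative); real data sets only approximate these figures.

```python
from math import erf, sqrt


def within_k_sigma(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, round(100 * within_k_sigma(k), 1))
```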
Exercise 11.4 pg. 352 no. a,c,e
LESSON 5
Calculator change to: frequency table off
Calculator change to: frequency table on
Outlier
Exercise 11.5 pg. 357 no. a,b,e
LESSON 5

Calculator change to: frequency table off
Bivariate data: scatter plots, least squares regression line (line of best fit) and correlation coefficient
Bivariate data – data that shows the relationship between two types of data (two variables)

• We display bivariate data graphically in a scatter plot
• x-axis → independent variable
• y-axis → dependent variable

The questions normally consist of three parts:
• Draw a scatter plot.
• Determine the equation of the least squares regression line and draw the line of best fit.
• Determine the correlation coefficient and comment on it.
Bivariate data: Scatter plot
The shape of the graph tells us about the relationship between the two variables. Ask yourself these questions when looking at a scatter plot:
• Is there a positive or negative association?
• Is the relationship strong or weak?
• Is the relationship linear or non-linear?
• Are there outliers?
• Are there groupings?
Bivariate data: scatter plots
[Example scatter plots, each described by the features below]
• Positive association, strong relationship, linear, no outliers, no groupings
• Negative association, strong relationship, linear, no outliers, no groupings
• Negative association, weak relationship, linear, no outliers, no groupings
• Positive association, weak relationship, linear, no outliers, no groupings
• Positive association, strong relationship, exponential, no outliers, no groupings
• Positive association, strong relationship, parabolic, no outliers, no groupings
• Positive association, strong relationship, linear, one outlier, no groupings
• Positive association, strong relationship, linear, no outliers, two groupings
Bivariate data: correlation coefficient – r
Correlation coefficient – measures the strength of the (linear) relationship between the two variables. Its value lies between −1 (perfect negative) and 1 (perfect positive).

r = 1: perfect positive correlation
Between 0,8 and 1: strong positive correlation
Between 0,5 and 0,8: moderately strong positive correlation
Between 0 and 0,5: weak positive correlation
r = 0: no correlation
Between −0,5 and 0: weak negative correlation
Between −0,8 and −0,5: moderate negative correlation
Between −1 and −0,8: strong negative correlation
r = −1: perfect negative correlation
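The scale can be turned into a small lookup function. This is a sketch under stated assumptions: the function name is hypothetical, values exactly on a boundary are given the stronger description, the label "moderate" covers the band the scale calls "moderately strong" on the positive side, and `near_zero` is an assumed cut-off for the "almost no correlation" wording used in the examples that follow.

```python
def describe_correlation(r, near_zero=0.05):
    """Describe r using the scale above.

    near_zero is an assumed cut-off for "almost no correlation";
    values exactly on a boundary get the stronger description.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size == 0:
        return "no correlation"
    if size < near_zero:
        return "almost no correlation"
    direction = "positive" if r > 0 else "negative"
    if size == 1:
        strength = "perfect"
    elif size >= 0.8:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction} correlation"


for r in (1, -0.9999, -0.0009, -0.68, 0.7135, 0.9756):
    print(r, describe_correlation(r))
```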
Bivariate data: correlation coefficient – r
r = 1 ∴ perfect positive correlation
r = −0,9999 ∴ strong negative correlation
r = −0,0009 ∴ almost no correlation
r = −0,68 ∴ moderate negative correlation
Bivariate data: regression line
Regression line – the equation for the line of best fit:

$\hat{y} = a + bx$

where $a$ is the constant value (the y-cut) and $b$ is the gradient of the regression line.

To plot the line of best fit, use the y-cut and the point $(\bar{x}; \bar{y})$.

Interpolation – using the regression line to predict values within the data range

Extrapolation – using the regression line to predict values outside the data range
Bivariate data
The table shows the distance in metres that a car needs to brake and come to a standstill when it is travelling at a given speed.
Speed in km/h | 20 | 40 | 60 | 80 | 100 | 120 | 140
Braking distance in m | 6 | 16 | 3 | 48 | 70 | 80 | 110

Enter the data in your calculator to find the values of $r$, $a$, $b$, $\bar{x}$ and $\bar{y}$.

$a = -24{,}9$
$b = 0{,}9$
$r = 0{,}95$
$\bar{x} = 80$
$\bar{y} = 47{,}6$
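The calculator values can be reproduced from the least squares formulas $b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$, $a = \bar{y} - b\bar{x}$ and $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$. A minimal Python sketch (the function name `least_squares` is hypothetical) using the braking-distance table:

```python
from math import sqrt


def least_squares(xs, ys):
    """Return a, b, r, x_bar, y_bar for the regression line y_hat = a + b x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx                  # gradient
    a = y_bar - b * x_bar            # y-cut: the line passes through (x_bar; y_bar)
    r = s_xy / sqrt(s_xx * s_yy)     # correlation coefficient
    return a, b, r, x_bar, y_bar


speed = [20, 40, 60, 80, 100, 120, 140]
distance = [6, 16, 3, 48, 70, 80, 110]
a, b, r, x_bar, y_bar = least_squares(speed, distance)
print(round(a, 1), round(b, 1), round(r, 2), x_bar, round(y_bar, 1))
```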
Bivariate data
Mrs du Trevou owns an ice-cream shop. She analyses her ice-cream sales over 14 randomly selected days and compares her sales with the temperature on each day. The results of her survey are shown in the table below:

Temperature (°C) | 20 | 13 | 18 | 23 | 19 | 28 | 21 | 38 | 26 | 22 | 17 | 14 | 15 | 16
Ice creams sold | 116 | 85 | 107 | 139 | 123 | 172 | 128 | 127 | 148 | 124 | 112 | 89 | 101 | 96

a) Draw a scatter plot representing this data.
b) Discuss the relationship between the data.
c) Describe the correlation:
1) including the outlier
2) excluding the outlier
d) Determine the line of best fit:
1) including the outlier
2) excluding the outlier
e) Using the regression line that was calculated without the outlier, predict how many ice creams Mrs du Trevou will sell if the temperature is:
1) 5 °C
2) 24 °C
Bivariate data
a) Draw a scatter plot representing this data
[Scatter plot: No. of ice creams sold (0 to 200) against temperature in degrees Celsius (0 to 40)]

b) Discuss the relationship between the data

Positive association, strong relationship, linear, 1 outlier (the point (38; 127)), no groupings

Bivariate data
c) Determine the correlation coefficient and describe the correlation:
$r = 0{,}7135$ ∴ moderately strong positive correlation

d) Determine the equation for the line of best fit.
$a = 66{,}0467$; $b = 2{,}5598$
∴ $\hat{y} = 2{,}6x + 66{,}0$

e) Draw the line of best fit by finding the y-cut and the point $(\bar{x}; \bar{y})$.
$\hat{y} = 2{,}6x + 66{,}0$
y-cut = 66
$(\bar{x}; \bar{y}) = (20{,}7; 119{,}1)$
[Scatter plot with the line of best fit through (0; 66) and (20,7; 119,1): No. of ice creams sold (0 to 200) against temperature in degrees Celsius (0 to 40)]
Bivariate data
f) Determine the correlation coefficient as well as the line of best fit if the outlier is excluded.
$r = 0{,}9756$ ∴ strong positive correlation
$a = 16{,}22235948$; $b = 5{,}27424336$
∴ $\hat{y} = 5{,}3x + 16{,}2$
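g) Draw the new line of best fit.
y-cut = 16,2
$(\bar{x}; \bar{y}) = (19{,}4; 118{,}5)$
[Scatter plot with the new line of best fit: No. of ice creams sold (0 to 200) against temperature in degrees Celsius (0 to 40)]

h) How does the outlier affect the data?
Including the outlier weakens the correlation (r = 0,7135 instead of 0,9756) and makes the line of best fit much less steep (gradient 2,6 instead of 5,3).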
Bivariate data
i) Using the regression line that was calculated without the outlier, predict how many ice creams Mrs du Trevou will sell if the temperature is as given below. Explain whether each prediction is valid.
1) 5 °C
$\hat{y} = 5{,}3x + 16{,}2$
$\hat{y} = 5{,}3(5) + 16{,}2$
$\hat{y} = 42{,}7$
∴ 42 ice creams
Not valid, as it is extrapolation (5 °C lies outside the data range).

2) 24 °C
$\hat{y} = 5{,}3x + 16{,}2$
$\hat{y} = 5{,}3(24) + 16{,}2$
$\hat{y} = 143{,}4$
∴ 143 ice creams
Valid, as it is interpolation (24 °C lies within the data range).
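The prediction and the validity check can be sketched in a few lines of Python. This sketch uses the rounded coefficients from the worked answer (a = 16,2 and b = 5,3) and decides between interpolation and extrapolation by comparing the temperature with the smallest and largest temperatures left after removing the outlier (13 °C and 28 °C); the function names are hypothetical.

```python
# Temperatures (degrees C) with the outlier (38; 127) removed
temps = [20, 13, 18, 23, 19, 28, 21, 26, 22, 17, 14, 15, 16]


def predict(t, a=16.2, b=5.3):
    """y_hat = a + b t using the rounded coefficients found without the outlier."""
    return a + b * t


def prediction_type(t, xs=temps):
    """Interpolation if t lies within the data range, otherwise extrapolation."""
    return "interpolation" if min(xs) <= t <= max(xs) else "extrapolation"


for t in (5, 24):
    print(t, round(predict(t), 1), prediction_type(t))
```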
Worksheet no 1 – 5
In class and for homework
LESSON 6
