
Single & Bivariate Statistics Solutions

Uploaded by

ibexampaper2017
Copyright
© All Rights Reserved

Chapter 5 – Single variable and bivariate statistics

Solutions to Pre-test
1
a – J
b – G
c – R
d – K
e – L
f – I
g – A
h – H
i – F
j – B
k – C
l – M
m – O
n – D
o – E
p – P
q – Q
r – N

2
a Eight customers rented 3 DVDs.

b Forty customers were surveyed.

c 5 × 0 + 1 × 8 + 2 × 14 + 3 × 8 + 4 × 3 + 5 × 2 = 82
82 DVDs were rented during the survey.

d The number of customers renting fewer than two DVDs was 8 + 5 = 13.

3
a Six scores were in the 40 – 59 range.

b
i 12 + 7 = 19 scores were at least 60.
ii 2 + 3 + 6 + 12 = 23 scores were less than 80.

c There were 30 scores in total.

d The number of scores in the 20 – 39 range was 3 ÷ 30 = 10%.

4
a (6 + 10) ÷ 2 = 16 ÷ 2 = 8

b (8 + 9) ÷ 2 = 17 ÷ 2 = 8.5

c (2 + 4 + 5 + 9) ÷ 4 = 20 ÷ 4 = 5

d (3 + 5 + 8 + 10 + 14) ÷ 5 = 40 ÷ 5 = 8

© Cambridge University Press & Assessment 2024


5
a 38, 41, 41, 47, 58
i (38 + 41 + 41 + 47 + 58) ÷ 5 = 225 ÷ 5 = 45
ii 41

iii 38, 41, 41, 47, 58

iv 58 – 38 = 20

b 2, 2, 2, 4, 6, 6, 7, 9, 10, 12

i (2 + 2 + 2 + 4 + 6 + 6 + 7 + 9 + 10 + 12) ÷ 10 = 60 ÷ 10 = 6

ii 2
iii 2, 2, 2, 4, 6, 6, 7, 9, 10, 12
(6 + 6) ÷ 2 = 6
iv 12 – 2 = 10
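
The four parts of Question 5 appear to be the mean, mode, median and range of each data set (an inference from the working, since the question text is not shown). As a rough check, the same values can be computed with Python's standard `statistics` module. The helper name `summarise` is made up for this sketch.

```python
from statistics import mean, median, multimode

def summarise(scores):
    """Return (mean, list of modes, median, range) for a list of numbers."""
    return mean(scores), multimode(scores), median(scores), max(scores) - min(scores)

# Question 5a and 5b data sets from the pre-test
print(summarise([38, 41, 41, 47, 58]))
print(summarise([2, 2, 2, 4, 6, 6, 7, 9, 10, 12]))
```

`multimode` returns a list, so a bimodal data set would show both modes.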

6
a Fifteen calculators are represented in
the plot.

b The modes are 111 and 139

c The minimum calculator weight is 98 g


The maximum calculator weight is 145 g

d 145 – 98 = 47



Solutions to Exercise 5A
1
a The larger the population sample, the more accurately the results describe the
population.

b Approximately 380 000

c Approximately 295 000

f Spelling test

g To compare ability, the words should be the same for everyone, but should also
include age-appropriate words for each generation.

h The use of technology to record answers

i Words common to both age groups

j The survey should be conducted in a common area where participants feel
comfortable, e.g. a library, school room or park.

2
a E
Population: All the people or objects in question

b F
Census: Statistics collected from an entire population

c A
Sample: A group chosen from the population

d B
Survey: A tool used to collect statistical data

e H
Data: The factual information collected from a survey or other source

f D
Variable: An element or feature that can vary

g G
Statistics: The practice of collecting and analysing data

h C
Confidentiality: The state of being secret

3
a B
Numerical: Data that are numbers

b E
Continuous: Numerical data that can take any value in a given range (measurable,
e.g. time, weight, height, distance, speed)

c C
Discrete: Numerical data that take on a limited number of values (countable,
e.g. how many?)

d D
Categorical: Data that can be divided into categories

e F
Ordinal: Categorical data that can be ordered

f A
Nominal: Categorical data that has no order

4
A Categorical
E.g. Blue, Green, Yellow…

B Categorical
E.g. Holden, Ford, Mercedes…

C Numerical
E.g. 12 mins, 2 hrs, 37 mins
Students will provide a number answer

D Categorical
E.g. Jack Russell, Labrador, Poodle…



5
A Numerical
E.g. 1 time, No times, 3 times…

B Numerical
E.g. $3, $8, $19…

C Numerical
E.g. 1 item, 3 items, 4 items…

D Categorical
E.g. McDonalds, Subway, Wendys…

6
a Numerical – discrete
Counting numbers (How many?)

b Numerical – discrete
Counting numbers (How many?)

c Categorical – nominal

d Numerical – continuous
Measuring

e Categorical – ordinal
The higher the category, the better your ability

7 D
An election is a form of census: statistics are collected from the entire
population.

8 C
A representative sample of the population should be chosen. The sample needs to
accurately portray the population of households with TVs.

9
A Students who arrive at school earliest are often those who enjoy school, and
are therefore perhaps more likely to have a higher opinion of Mathematics.

B Students who are studying the most advanced maths subject in the school are
usually those who have a better understanding of Mathematics, and therefore have
a higher opinion of the topic.

C If the principal wanted to determine student opinion on Mathematics classes
taught at this school, this would be the best way to determine this information.

D By giving the survey to 20% of students in every class, a large sample of the
population has been surveyed, with no bias towards those who like or dislike
Mathematics as a subject.

11 Answers may vary.

12 Answers may vary.



Solutions to Exercise 5B
1
a This is a set of numbers, so it is numerical data.

b This is a set of terms, so it is categorical data. There is no order to the
set, so it is nominal categorical data.

c This is a set of terms, so it is categorical data. The terms can be ordered
(low/medium/high), so it is ordinal categorical data.

d This is a set of numbers, so it is numerical data.

2
a
Car colour   Tally   Frequency
red          III     3
white        IIII    5
green        II      2
silver       II      2
Total        12      12

b
Class interval   Frequency   Percentage Frequency
80–84            8           16%
85–89            23          46%
90–94            13          26%
95–100           6           12%
Total            50          100%

3 Use the histogram to answer each of the questions.

a 8 (the frequency of the 154.5 column, which represents 150 to 160 minutes)

b Sum all the frequencies: 1 + 3 + 4 + 8 + 7 + 2 = 25 cars in the race

c (i) 1 + 3 + 4 = 8
(ii) 8 ÷ 25 × 100 = 8 × 4 = 32%

4
a i, ii (graphs not reproduced)
b i, ii (graphs not reproduced)



5
a
Class interval   Freq   Percentage frequency
50–59            3      3 ÷ 25 × 100 = 12%
60–69            5      5 ÷ 25 × 100 = 20%
70–79            9      9 ÷ 25 × 100 = 36%
80–89            5      5 ÷ 25 × 100 = 20%
90–99            3      3 ÷ 25 × 100 = 12%
Total            25     100%

b (graph not reproduced)

c 70–79 has the highest frequency

d 5 + 3 = 8 students scored 80 or more.
8 ÷ 25 × 100 = 32%

6
a
Class interval   Tally       Frequency   Percentage Frequency
0–4              IIII        5           25%
5–9              IIII IIII   9           45%
10–14            IIII        5           25%
15–19            I           1           5%
Total                        20          100%

b (graph not reproduced)

c The 5–9 interval has the greatest tally and highest frequency.

d Number of teams with more than 5 wins = 9 + 5 + 1 = 15
Percentage of teams with more than 5 wins = 15 ÷ 20 × 100% = 75%

7
a
Type of transport   Frequency   Percentage Frequency
Car                 16          40%
Train               6           15%
Tram                8           20%
Walking             5           12.5%
Bicycle             2           5%
Bus                 3           7.5%
Total               40          100%

b
i The frequency of people who travel by train is 6.
ii The most popular form of transport is the car.
iii 40% of people travel by car.
iv 12.5 + 5 = 17.5% of people walk or cycle to work.
v The percentage of people who travel by public transport is
15 + 20 + 7.5 = 42.5%.

8
a A skewed data set – more of the data is on the left-hand side at 10, then it
trails off towards 70.

b A symmetrical data set – the highest bar is in the centre and the bars on
either side are approximately the same height.

9
a
Mass     Frequency   Percentage Frequency
10–      3           6%
15–      6           12%
20–      16          32%
25–      21          42%
30–35    4           8%
Total    50          100%

b In total, 50 mice were weighed in the experiment.



c 32% of the mice were in the 20-gram interval.

d The most common weight interval was the 25-gram weight interval – that is,
mice with a mass equal to or greater than 25 grams and less than 30 grams.

e 42% of the mice were in the most common weight interval.

f Only 6% of the mice had a mass less than 15 grams, so 94% of the mice had a
mass of 15 grams or more.

10
a
Section      Frequency   Percentage Frequency
String       21          52.5%
Woodwind     8           20%
Brass        7           17.5%
Percussion   4           10%
Total        40          100%

b In total there are 40 students in the school orchestra.

c 52.5% of the students play in the string section.

d 47.5% of the students do not play in the string section.

e 4 ÷ (40 + 3) = 0.0930
If three more students joined the orchestra in the string section, the
percentage of percussionists would be 9.3%.

11 Individual scores are lost in a histogram. The bars show the frequency of a
numerical category rather than a specific value.

12
a Russia (approximately 14 years)
Male ≈ 59
Female ≈ 73

b Pakistan (approximately 1 year)
Male ≈ 61
Female ≈ 62

c From the graph it can be seen that females outlive males by approximately
5–10 years in all countries excluding South Africa, where males have a higher
life expectancy.

d Living conditions, e.g. poverty, access to clean water, medical facilities
High prevalence of HIV/AIDS

13
a Saturday and Sunday
The vendor would expect higher sales on the weekend.

b A warm day
A public holiday

c
i Wednesday
Cost = $200
Revenue = $450
Profit = 450 – 200 = $250

ii Thursday
Cost = $150
Revenue = $50
Loss = 150 – 50 = $100

d The graph does not help us visualise the ice-cream sales in terms of profit
and loss. The graph would be clearer if the bars were standing side by side
rather than one on top of the other.
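
The percentage frequency columns in this exercise all use the same rule: frequency ÷ total × 100. A minimal Python sketch of that rule, using the transport frequencies from Question 7, is shown here. The function name `percentage_frequencies` is invented for illustration.

```python
def percentage_frequencies(freqs):
    """Convert a {category: frequency} table into {category: percentage}."""
    total = sum(freqs.values())
    return {category: 100 * f / total for category, f in freqs.items()}

# Frequencies from the Question 7 transport table
transport = {"Car": 16, "Train": 6, "Tram": 8, "Walking": 5, "Bicycle": 2, "Bus": 3}

for category, pct in percentage_frequencies(transport).items():
    print(f"{category:8} {pct}%")
```

The percentages should always sum to 100, which is a quick way to catch an arithmetic slip in a hand-built table.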



Solutions to Exercise 5C
1
a A histogram – no gaps between bars
b A dot plot – dots are used to represent frequencies at a particular value
c A bar graph – gaps between bars
d A stem and leaf plot – frequencies are represented by the units in the ‘leaf’

2 (graph not reproduced)

3
a 3|2 = 32, 3|5 = 35, 4|1 = 41
∴ 32, 35, 41, 43, 47, 54, 54, 56, 60, 62, 71, 71

b 0|2 = 0.2, 0|3 = 0.3, 0|7 = 0.7
∴ 0.2, 0.3, 0.7, 1.4, 1.4, 1.8, 1.9, 2.3, 2.6, 2.6, 3.0, 3.5

4
a Eleven families were surveyed.
b Most common number of children is 1
c Total number of children = 0 × 1 + 1 × 5 + 2 × 4 + 3 × 1 = 16
d The data is fairly symmetrical with outliers for 0 and 3 children.

5
a 9 holes; there are 9 dots on the graph, each representing a hole.
b 2 × 3 + 4 × 4 + 2 × 5 + 1 × 7
= 6 + 16 + 10 + 7
= 39 strokes
c The golfer had one hole with 7 strokes; apart from that he had between 3 and
5 strokes.

6
a
i 15, 15, 20, 21, 22, 24, 25, 26, 26, 31, 37, 37, 38, 46, 52
Stem   Leaf
1    | 5 5
2    | 0 1 2 4 5 6 6
3    | 1 7 7 8
4    | 6
5    | 2
1|5 = 15

ii Skewed

b
i 12, 16, 21, 23, 25, 27, 31, 32, 35, 35, 36, 36, 38, 40, 40, 42, 44, 48, 51,
53, 55
Stem   Leaf
1    | 2 6
2    | 1 3 5 7
3    | 1 2 5 5 6 6 8
4    | 0 0 2 4 8
5    | 1 3 5
1|2 = 12

ii Approximately symmetrical

c
i 116, 117, 118, 119, 121, 124, 125, 127, 133, 135, 137, 145, 147, 149, 153, 158
Stem   Leaf
11   | 6 7 8 9
12   | 1 4 5 7
13   | 3 5 7
14   | 5 7 9
15   | 3 8
16   | 0 2
11|6 = 116

ii Skewed



d
i 2.0, 3.3, 3.7, 3.8, 4.3, 4.4, 4.5, 4.7, 4.8, 4.9, 5.2, 5.4, 5.5, 5.8, 6.1,
6.3, 6.5, 7.0
Stem   Leaf
2    | 0
3    | 3 7 8
4    | 3 4 5 7 8 9
5    | 2 4 5 8
6    | 1 3 5
7    | 0
2|0 = 2.0

ii Approximately symmetrical

7
a
i Set 1: 38, 39, 40, 42, 42, 43, 46, 48, 53, 53, 57, 59, 61, 64
Set 2: 32, 35, 41, 47, 47, 52, 52, 55, 56, 60, 61, 63, 64, 67

      Set 1 | Stem | Set 2
        9 8 |  3   | 2 5
8 6 3 2 2 0 |  4   | 1 7 7
    9 7 3 3 |  5   | 2 2 5 6
        4 1 |  6   | 0 1 3 4 7
3|2 = 32

ii Set 1 is approximately symmetrical, Set 2 is slightly negatively skewed.

b
i Set 1: 164, 166, 168, 171, 172, 175, 176, 180, 181, 185, 187, 187, 188, 192,
195, 199, 201, 208
Set 2: 160, 163, 163, 165, 167, 170, 171, 171, 174, 178, 182, 182, 186, 187,
190, 194

      Set 1 | Stem | Set 2
      8 6 4 |  16  | 0 3 3 5 7
    6 5 2 1 |  17  | 0 1 1 4 8
8 7 7 5 1 0 |  18  | 2 2 6 7
      9 5 2 |  19  | 0 4
        8 1 |  20  |
16|0 = 160

ii Set 1 is approximately symmetrical. Set 2 is positively skewed.

8
a (graph not reproduced)

b (graph not reproduced)

c Nick’s performance is well spread

d Jack’s performance is irregular and skewed toward the lower scores

9
a Fastest time = 20.5 minutes
Slowest time = 24.6 minutes
24.6 – 20.5 = 4.1

b The distribution is perfectly symmetrical so the middle time is 22.5 minutes.

c A slower time will increase his average time (a slower pace); the average
might move towards 23 minutes.

10
a
 Inner city | Stem | Outer suburb
9 6 4 3 1 1 |  0   | 3 4 9
    9 4 2 0 |  1   | 2 8 8 9
        4 1 |  2   | 1 3 4
            |  3   | 4
            |  4   | 1
0|3 = 3 km



b Inner city travels are in the interval 1–24 km.
Outer suburb travellers have a greater range, from 3–41 km.

c In the outer suburbs, students are less likely to live near their school,
hence needing to travel further distances.

11
a
Stem   Leaf
1    | 2 4
2    | 3 6 9 b
a    | 1 4
4    | 7 c 8

a = 3 {1, 2, 3, 4}
b = 9
c = 7 or c = 8

b
Stem   Leaf
20   | a 1 4
21   | 2 2 9
22   | 0 b 5 7
23   | 1 4

a = 0 or a = 1
b is between 0 and 5 inclusive.

12
a The stem 1 is allocated 0–4 (included) and stem 1* is allocated 5–9
(included)

i 12° is in the 1 stem
(2 is between 0–4)

ii 5° is in the 0* stem
(5 is between 5–9)

c When there are too many leaves in one category, splitting the stem in two
allows for better analysis.

d City B (17–31°C) is considerably warmer than City A (8–18°C)

e The two cities are in different places; City B might be experiencing summer
and City A might be in winter.
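
The stem and leaf plots in this exercise split each two-digit value into a tens digit (stem) and a units digit (leaf). A minimal Python sketch of that grouping, assuming non-negative whole numbers and a stem width of 10, is given below. The function name `stem_and_leaf` is hypothetical, and stems with no leaves are simply omitted.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group whole numbers into {stem: [sorted leaves]} using stems of width 10."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 37 -> stem 3, leaf 7
        plot[stem].append(leaf)
    return dict(plot)

# Data from Question 6a
data = [15, 15, 20, 21, 22, 24, 25, 26, 26, 31, 37, 37, 38, 46, 52]

for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```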



Solutions to Exercise 5D
1
a The mode is the most frequently occurring value in a data set.

b Dividing the sum of all the values by the total number of values gives the
mean.

c The middle value of a data set ordered from smallest to largest is the median.

d A data set with two most common values is bimodal.

e A data set has a maximum value of 7 and a minimum value of 2. The range is
7 – 2 = 5.

2
a 2 4 6 7 8 10 11

b 6 9 [10 14] 17 20

3
a 4 + 5 + 3 + 6 + 4 + 3 + 3 = 28 cups

b 7 days

c 28 ÷ 7 = 4
On average, Sebastian drinks 4 cups of coffee each day

4
a
i (2 + 4 + 5 + 8 + 8) ÷ 5 = 27 ÷ 5 = 5.4
ii mode = 8
iii range = 8 – 2 = 6

b
i (5 + 8 + 10 + 15 + 20 + 12 + 10 + 50) ÷ 8 = 130 ÷ 8 = 16.25
ii mode = 10
iii range = 50 – 5 = 45

c
i (55 + 70 + 75 + 50 + 90 + 85 + 50 + 65 + 90) ÷ 9 = 630 ÷ 9 = 70
ii bimodal = 50 and 90
iii range = 90 – 50 = 40

d
i (27 + 30 + 28 + 29 + 24 + 12) ÷ 6 = 150 ÷ 6 = 25
ii no mode
iii range = 30 – 12 = 18

e
i (2.0 + 1.9 + 2.7 + 2.9 + 2.6 + 1.9 + 2.7 + 1.9) ÷ 8 = 18.6 ÷ 8 = 2.325
ii mode = 1.9
iii range = 2.9 – 1.9 = 1

f
i (1.7 + 1.2 + 1.4 + 1.6 + 2.4 + 1.3) ÷ 6 = 9.6 ÷ 6 = 1.6
ii no mode
iii range = 2.4 – 1.2 = 1.2



5
a 1 4 7 8 12
Median = 7

b 1 2 2 4 4 7 9
Median = 4

c 6 10 11 11 13 13 14
Median = 11

d 56 62 64 73 75 77 77 78 79
Median = 75

e 2 4 4 5 6 | 8 8 10 12 22
Median = 7

f 1 2 2 3 | 7 12 12 18
Median = 5

g 27 30 31 | 36 38 40
Median = 33.5

h 2.0 2.4 2.8 3.1 | 3.2 3.5 3.7 3.9
Median = 3.15

6 (4 + 4 + 6 + 6 + 6 + 8 + 9 + 9 + 11) ÷ 9 = 63 ÷ 9 = 7

a Mean = 7 hours

b 4 4 6 6 6 8 9 9 11
Median = 6 hours

c range = 11 – 4 = 7 hours

d mode = 6

7
a range = 50 – 8 = $42

b 8 12 12 15 | 20 24 25 50
Median value = $17.50

c (12 + 15 + 12 + 24 + 20 + 8 + 50 + 25) ÷ 8 = 166 ÷ 8 = $20.75

d The mean is skewed by the $50 outlier, which is much larger than the other
amounts

8 (21 + 23 + 27 + 32 + 38 + 39 + 39 + 44 + 46) ÷ 9 = 309 ÷ 9 ≈ 34.3

a
Stem   Leaf
0    | 4 4
1    | 0 2 5 9
2    | 1 7 8
3    | 2

i range = 32 – 4 = 28

ii mode = 4

iii mean = 17.2
(4 + 4 + 10 + 12 + 15 + 19 + 21 + 27 + 28 + 32) ÷ 10 = 172 ÷ 10 = 17.2

iv median = 17 (the middle values are 15 and 19)

b
Stem   Leaf
10   | 1 2 4
11   | 2 6
12   | 5

i range = 125 – 101 = 24



ii No mode

iii mean = 110
(101 + 102 + 104 + 112 + 116 + 125) ÷ 6 = 660 ÷ 6 = 110

iv median = 104 | 112
= 108

c
Stem   Leaf
3    | 0 0 5
4    | 2 7
5    | 1 3 3
6    | 0 2

i range = 6.2 – 3.0 = 3.2

ii bimodal = 3.0 and 5.3

iii mean = 4.63
(3 + 3 + 3.5 + 4.2 + 4.7 + 5.1 + 5.3 + 5.3 + 6 + 6.2) ÷ 10 = 46.3 ÷ 10 = 4.63

iv median = 4.7 | 5.1
= 4.9

9
a
i Hugh’s mean score is 76.4%
(68 + 68 + 65 + 77 + 73 + 85 + 84 + 82 + 81 + 81) ÷ 10 = 764 ÷ 10 = 76.4%

Mark’s mean score is 83.6%
(64 + 74 + 77 + 82 + 84 + 86 + 88 + 92 + 94 + 95) ÷ 10 = 836 ÷ 10 = 83.6%

ii Hugh’s median score is 77 | 81 = 79%
Mark’s median score is 85%

     Hugh | Stem | Mark
    8 8 5 |  6   | 4
      7 3 |  7   | 4 7
5 4 2 1 1 |  8   | 2 4 6 8
          |  9   | 2 4 5

iii Hugh’s range is from 65 to 85 = 20%
Mark’s range is from 64 to 95 = 31%

iv Mark’s scores had a larger range, therefore Hugh was the more consistent
performer.
Mark had a higher mean and median, meaning overall Mark achieved higher results
than Hugh on the exams.

10 The median is a better measure of centre when such a large value has been
included, as the mean will increase.
The mean of $536 000 does not represent the ‘average’ house price as 4 out of 5
houses sold for considerably less than that.



11
a median = 4 wins
(plot on a 0 to 7 scale not reproduced)

b Mean = 3.7 wins/season
(0 + 2 + 2 + 3 + 4 + 4 + 5 + 5 + 5 + 7) ÷ 10 = 37 ÷ 10 = 3.7

c
i Median = 4 (no effect)
(plot on a 0 to 7 scale not reproduced)

ii Mean = 3.6 (decreased slightly)
(0 + 2 + 2 + 3 + 3 + 4 + 4 + 5 + 5 + 5 + 7) ÷ 11 = 40 ÷ 11 ≈ 3.6

12
a
Inn   Score   Total              Moving Average
1     26      26                 26 ÷ 1 = 26
2     38      26 + 38 = 64       64 ÷ 2 = 32
3     5       64 + 5 = 69        69 ÷ 3 = 23
4     10      69 + 10 = 79       79 ÷ 4 ≈ 20
5     52      79 + 52 = 131      131 ÷ 5 ≈ 26
6     103     131 + 103 = 234    234 ÷ 6 = 39
7     75      234 + 75 = 309     309 ÷ 7 ≈ 44
8     21      309 + 21 = 330     330 ÷ 8 ≈ 41
9     33      330 + 33 = 363     363 ÷ 9 ≈ 40
10    0       363 + 0 = 363      363 ÷ 10 ≈ 36

b (graph not reproduced)

c
i The score graph fluctuated wildly with a range of 0–103
ii The average graph is fairly constant with small increases and decreases, a
range of 20 to 40.

d The moving average graph follows the trend of the score (increases with good
scores and decreases with poor scores) but the fluctuations are much less
significant.
The moving average graph gives a better overview of the batsman’s ability.
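
The moving average column in Question 12 is a running total divided by the number of innings so far. A short Python sketch of that calculation, using the ten scores from the table, is shown here. The function name `moving_averages` is invented for this illustration, and rounding to the nearest whole number is assumed to match the table.

```python
from itertools import accumulate

def moving_averages(scores):
    """Cumulative mean after each innings: running total / innings played so far."""
    return [total / n for n, total in enumerate(accumulate(scores), start=1)]

# Scores for innings 1 to 10 from Question 12
scores = [26, 38, 5, 10, 52, 103, 75, 21, 33, 0]

for innings, avg in enumerate(moving_averages(scores), start=1):
    print(innings, round(avg))
```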



Solutions to Exercise 5E
1
a The requirements for a five-figure summary are the minimum value (Min), the
lower quartile (Q1), the median (Q2), the upper quartile (Q3) and the maximum
value (Max).

b The range is the difference between the highest and lowest data points,
representing the spread of the entire data set.
The interquartile range (IQR) represents the spread of the middle 50% of the
data, and is calculated by finding the difference between the upper and lower
quartiles.

c An outlier is a data point outside the vicinity of the rest of the data.

2
a 0 1 1 1 1 | 2 2 2 3 3
Median = 1.5

b
i Q1 = 1 (the middle value of 0 1 1 1 1)
ii Q3 = 2 (the middle value of 2 2 2 3 3)

3
a Q3 – Q1 = 8 – 3 = 5

b Q1 – 1.5 × IQR = 3 – 1.5 × 5 = –4.5
Q3 + 1.5 × IQR = 8 + 1.5 × 5 = 15.5

c 18 is considered to be an outlier as it is above the upper fence value of 15.5.

4
a 3 4 6 | 8 8 10
Median = 7
i Q1 = 4, Q3 = 8
ii IQR = Q3 – Q1
= 8 – 4
= 4

b 10 10 11 14 14 | 15 16 18 20 21
Median = 14.5
i Q1 = 11, Q3 = 18
ii IQR = Q3 – Q1
= 18 – 11
= 7

c 41 49 | 53 58 | 59 62 | 62 65
Q1 = 51, Median = 58.5, Q3 = 62
i Q1 = 51, Q3 = 62
ii IQR = Q3 – Q1
= 62 – 51
= 11

d 1.2 1.7 | 1.9 2.2 | 2.4 2.5 | 2.9 3.2
Q1 = 1.8, Median = 2.3, Q3 = 2.7
i Q1 = 1.8, Q3 = 2.7
ii IQR = Q3 – Q1
= 2.7 – 1.8
= 0.9

5
a
i 1 2 4 8 10 11 14
Q1 = 2, Median = 8, Q3 = 11
ii IQR = Q3 – Q1
= 11 – 2
= 9

b
i 1 2 2 3 5 7 8 9 10 12 14
Q1 = 2, Median = 7, Q3 = 10
ii IQR = Q3 – Q1
= 10 – 2
= 8

c
i 0.8 0.9 | 1.1 1.1 1.2 1.3 1.5 | 1.7 1.9
Q1 = 1, Median = 1.2, Q3 = 1.6



ii IQR = Q3 – Q1
= 1.6 – 1
= 0.6

d
i 4 7 9 | 12 13 15 16 18 18 21 | 24 24 33
Q1 = 10.5, Median = 16, Q3 = 22.5
ii IQR = Q3 – Q1
= 22.5 – 10.5
= 12

6
a
i Max = 17, min = 0
ii Rearranged in ascending order:
0, 8, 9, 10, 10, 11, 12, 13, 14, 14, 15, 15, 15, 15, 17
Median = 13
iii Q1 = 10 and Q3 = 15
iv IQR = 15 – 10 = 5

b Q1 − 1.5 × IQR = 10 − 1.5 × 5 = 2.5
0 is an outlier.

c Answers vary; roads may have been closed that day

d Max = 17, min = 8, so range = 17 − 8 = 9. The IQR remains unchanged, and is
still 5. There are no longer any outliers, but the median remains the same. The
range has reduced from the original dataset.

7
a 4, 5, 5, 5, 7, 8, 9, 9, 10, 14
i Max = 14, min = 4
ii Q2 = (7 + 8) ÷ 2 = 7.5
iii Q1 = 5 and Q3 = 9
iv IQR = 9 – 5 = 4
v Q1 – 1.5 × IQR = 5 – 1.5 × 4 = –1
Q3 + 1.5 × IQR = 9 + 1.5 × 4 = 15
There are no outliers.

b 16, 18, 21, 23, 24, 25, 25, 26, 27, 29, 31
i Max = 31, min = 16
ii Q2 = 25
iii Q1 = 21 and Q3 = 27
iv IQR = 27 – 21 = 6
v Q1 – 1.5 × IQR = 21 – 1.5 × 6 = 12
Q3 + 1.5 × IQR = 27 + 1.5 × 6 = 36
There are no outliers.

c 2, 8, 10, 10, 11, 12, 12, 13, 13, 15, 24
i Max = 24, min = 2
ii Q2 = 12
iii Q1 = 10 and Q3 = 13
iv IQR = 13 – 10 = 3
v Q1 – 1.5 × IQR = 10 – 1.5 × 3 = 5.5
Q3 + 1.5 × IQR = 13 + 1.5 × 3 = 17.5
2 and 24 are outliers

d 1 3 4 | 4 4 6 | 8 8 10 | 10 11 17
Q1 = 4, Median = 7, Q3 = 10
i Min = 1
Max = 17
ii Median = 7
iii Q1 = 4
Q3 = 10
iv IQR = Q3 – Q1
= 10 – 4
= 6
v Q1 – 1.5 × IQR
= 4 – 1.5 × 6
= –5



Q3 + 1.5 × IQR
= 10 + 1.5 × 6
= 19

No outliers

8
a 25, 32, 36, 40, 43, 46, 48, 52, 52, 53, 60, 128
i Max = 128, min = 25
ii Q2 = (46 + 48) ÷ 2 = 47
iii Q1 = (36 + 40) ÷ 2 = 38 and Q3 = (52 + 53) ÷ 2 = 52.5
iv IQR = 52.5 – 38 = 14.5
v Q1 – 1.5 × IQR = 38 – 1.5 × 14.5 = 16.25
Q3 + 1.5 × IQR = 52.5 + 1.5 × 14.5 = 74.25
The only outlier is 128.
vi Mean = 615 ÷ 12 = 51.25

b The mean is more heavily distorted by the outlier than the median, so the
median is the better measure of the centre of this data.

c The larger number of buttons on one calculator could be explained by that
calculator being more advanced than the others, perhaps having a full keyboard
in addition to its normal function keys.

9 1 4 5 5 6 | 7 7 7 8 8 | 10 10 10 13 15 | 16 17 19 30 31
Q1 = 6.5, Median = 9, Q3 = 15.5

IQR = Q3 – Q1
= 15.5 – 6.5
= 9

Q1 – 1.5 × IQR
= 6.5 – 1.5 × 9
= –7

Q3 + 1.5 × IQR
= 15.5 + 1.5 × 9
= 29

The last two pieces of luggage, weighing 30 and 31 kg, are outliers and will
need to undergo a further check as the upper fence is 29 kg.

10 350 850 | 900 1000 1000 1100 1100 | 1200 1700
Q1 = 875, Median = 1000, Q3 = 1150

IQR = Q3 – Q1
= 1150 – 875
= 275
Q1 – 1.5 × IQR
= 875 – 1.5 × 275
= 462.50
Q3 + 1.5 × IQR
= 1150 + 1.5 × 275
= 1562.50
The fridges priced at $350 and $1700 are considered outliers as they are not in
the interval $(462.50 – 1562.50)

11
a 1 16 | 18 20 | 24 26 | 28 30
Q1 = 17, Median = 22, Q3 = 27

IQR = Q3 – Q1
= 27 – 17
= 10

b Q1 – 1.5 × IQR
= 17 – 1.5 × 10
= 2

Q3 + 1.5 × IQR
= 27 + 1.5 × 10
= 42

The value of 1 is an outlier as it is not within the interval (2–42).



c 1 16 | 18 20 24 26 28 | 30 32
Q1 = 17, Median = 24, Q3 = 29
IQR = Q3 – Q1
= 29 – 17
= 12
Q1 – 1.5 × IQR
= 17 – 1.5 × 12
= –1
Q3 + 1.5 × IQR
= 29 + 1.5 × 12
= 47
No outliers as all values are within the interval (–1–47).

12
a There are eleven data points, so the sixth will be the median. Hence, the
existing median is 35. The data point 42 could increase indefinitely; it would
not change the median. However, if 42 became less than 35, the median would
become that data point since that data point would be the sixth largest point in
the dataset. Hence, the possible values are 35 and above.

b The existing Q1 is 21; Q3 is 49. If 42 became larger than 49, Q3 would change,
changing the IQR. If 42 became less than 21, Q1 would change, changing the IQR.
Hence, 42 can change to between 21 and 49 inclusive.

13 This is a research activity and students' responses will vary.
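
The outlier checks in this exercise follow one recipe: find Q1 and Q3 as the medians of the lower and upper halves (leaving out the middle value when the number of data points is odd), then flag anything below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR. The sketch below is one possible Python version of that recipe, checked against Question 8. The function names are made up for illustration, and library quartile functions such as `statistics.quantiles` may use a different convention and give slightly different values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    # For an odd number of values the middle value belongs to neither half.
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return median(lower), median(s), median(upper)

def find_outliers(data):
    """Return (outliers, (lower fence, upper fence)) using the 1.5 x IQR rule."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high], (low, high)

# Question 8 data
buttons = [25, 32, 36, 40, 43, 46, 48, 52, 52, 53, 60, 128]
print(quartiles(buttons))
print(find_outliers(buttons))
```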



Solutions to Exercise 5F
1 Labels on the box plot diagram: box, whisker, min, Q1, Median, Q3, max
(diagram not reproduced)

2
a Median = 15

b Minimum = 5

c Maximum = 25

d Range = 25 – 5 = 20

e Lower quartile = 10

f Upper quartile = 20

g IQR = 20 – 10 = 10

3
a (graph not reproduced)

b (graph not reproduced)

4
a The top 25% of data are above Q3.

b The middle 50% of data are between Q1 and Q3.

c The lowest or first 25% of data are between the minimum and Q1.

d The highest or last 25% of data are between Q3 and the maximum.

5
a
i 1 2 8 11 12 | 15 15 17 18 19
Min = 1
Q1 = 8
Median = 13.5
Q3 = 17
Max = 19

ii (box plot not reproduced)

b
i 0 1 1 | 2 3 3 | 3 4 4 | 4 4 5
Min = 0
Q1 = 1.5
Median = 3
Q3 = 4
Max = 5

ii (box plot not reproduced)

c
i 117 118 | 118 119 120 120 121 | 122 124
Min = 117
Q1 = 118
Median = 120
Q3 = 121.5
Max = 124

ii (box plot not reproduced)



d
i 20 22 25 28 30 32 34 40 41 47 49 62 66 82 85
Min = 20
Q1 = 28
Median = 40
Q3 = 62
Max = 85

ii (box plot not reproduced)

6
a 2, 3, 4, 4, 4, 5, 6, 6, 7, 8, 13
i, ii Q2 = 5
Q1 = 4 and Q3 = 7
IQR = 7 – 4 = 3
Q1 – 1.5 × IQR = 4 – 1.5 × 3 = –0.5
Q3 + 1.5 × IQR = 7 + 1.5 × 3 = 11.5
13 is an outlier.
iii (box plot not reproduced)

b 1.1, 1.4, 1.6, 1.7, 1.8, 1.8, 1.8, 1.9, 1.9, 2.0, 2.2
i, ii Q2 = 1.8
Q1 = 1.6 and Q3 = 1.9
IQR = 1.9 – 1.6 = 0.3
Q1 – 1.5 × IQR = 1.6 – 1.5 × 0.3 = 1.15
Q3 + 1.5 × IQR = 1.9 + 1.5 × 0.3 = 2.35
1.1 is an outlier.
iii (box plot not reproduced)

c 11, 16, 18, 19, 20, 21, 21, 22, 22, 23, 23, 24, 26, 31
i, ii Q2 = (21 + 21) ÷ 2 = 21
Q1 = 19 and Q3 = 23
IQR = 23 – 19 = 4
Q1 – 1.5 × IQR = 19 – 1.5 × 4 = 13
Q3 + 1.5 × IQR = 23 + 1.5 × 4 = 29
11 and 31 are outliers.
iii (box plot not reproduced)

d 0.02, 0.03, 0.03, 0.03, 0.04, 0.04, 0.05, 0.05, 0.06, 0.07
i, ii Q2 = (0.04 + 0.04) ÷ 2 = 0.04
Q1 = 0.03 and Q3 = 0.05
IQR = 0.05 – 0.03 = 0.02
Q1 – 1.5 × IQR = 0.03 – 1.5 × 0.02 = 0
Q3 + 1.5 × IQR = 0.05 + 1.5 × 0.02 = 0.08
There are no outliers.
iii (box plot not reproduced)

7
a 1.6 1.9 2.0 | 2.0 2.1 2.2 | 2.2 2.4 2.5 | 2.7 3.8 3.9
Q1 = 2.0, Median = 2.2, Q3 = 2.6

i Min = 1.6

ii Q1 = 2.0

iii Median = 2.2

iv Q3 = 2.6

v Max = 3.9



vi IQR
= Q3 – Q1
= 2.6 – 2.0
= 0.6

b Q1 – 1.5 × IQR
= 2.0 – 1.5 × 0.6
= 1.1
Q3 + 1.5 × IQR
= 2.6 + 1.5 × 0.6
= 3.5

Outliers are 3.8 and 3.9 kg as they do not fit in the interval (1.1–3.5) kg.

c (box plot not reproduced)

8
a 8 14 15 15 16 16 16 17 19 19 24
Q1 = 15, Median = 16, Q3 = 19

i Min = 8

ii Q1 = 15

iii Median = 16

iv Q3 = 19

v Max = 24

vi IQR
= Q3 – Q1
= 19 – 15
= 4

b Q1 – 1.5 × IQR
= 15 – 1.5 × 4
= 9

Q3 + 1.5 × IQR
= 19 + 1.5 × 4
= 25

8 is an outlier because it is less than the lower fence of 9 in the interval
(9–25) days.

c (box plot not reproduced)

9
a Both boxplots have the same minimum of 1.

b The range of boxplot A is 18, and the range of boxplot B is 20.
Boxplot B has the greater range.

c i IQR = 11 – 6 = 5
ii IQR = 17 – 7 = 10

d The data points for A have a very narrow spread and are concentrated around a
value of 6.5, while the data points for B are more widely and evenly spread.

10
a Both boxplots share a median of 14 and an upper quartile of 17.

b The range of data set A is 6 and the range of data set B is 12.
Data set B has a wider range of values.

c i IQR = 17 – 13 = 4
ii IQR = 17 – 12 = 5

d Data set B is more spread out overall, although the portions of each data set
between the median and the upper quartile are spread similarly.

11
a Set 1: 1, 2, 3, 3, 4, 4, 6, 6, 6, 7, 7, 7, 8, 8, 12
Q2 = 6
Q1 = 3 and Q3 = 7
IQR = 7 – 3 = 4
Q1 – 1.5 × IQR = 3 – 1.5 × 4 = –3
Q3 + 1.5 × IQR = 7 + 1.5 × 4 = 13
There are no outliers.

Set 2: 5, 7, 8, 9, 9, 11, 11, 12, 13, 13, 14, 14, 15, 15, 16
Q2 = 12
Q1 = 8 and Q3 = 14
IQR = 14 – 8 = 6
Q1 – 1.5 × IQR = 8 – 1.5 × 6 = –1
Q3 + 1.5 × IQR = 14 + 1.5 × 6 = 23
There are no outliers.



b The second examiner consistently found more spelling errors than the first examiner.
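
Each box plot in this exercise is drawn from a five-number summary (Min, Q1, Median, Q3, Max). As a hedged sketch, the summary can be computed in Python as below, assuming the same median-of-halves rule for quartiles that the worked solutions use. The function name `five_number_summary` is hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); quartiles are medians of each half."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    # Skip the middle value when the number of data points is odd.
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return s[0], median(lower), median(s), median(upper), s[-1]

# Question 5a (even count) and Question 5d (odd count)
print(five_number_summary([1, 2, 8, 11, 12, 15, 15, 17, 18, 19]))
print(five_number_summary([20, 22, 25, 28, 30, 32, 34, 40, 41, 47, 49, 62, 66, 82, 85]))
```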



Solutions to progress quiz
1
a Categorical and nominal

b Numerical and discrete

2
a (graph not reproduced)

b The data is skewed.

c A total of 1 + 4 + 12 + 8 = 25 people were surveyed. 1 + 4 = 5 had fewer than
six hours of sleep. This is 5 ÷ 25 × 100 = 5 × 4 = 20% of those surveyed who had
fewer than six hours of sleep.

d The 6-8 interval is the most frequent (with a frequency of 12).

3
a Count the number of dots, which represents 20 students surveyed.

b 5 days (the greatest frequency)

c (4 + 5) ÷ 2 = 9 ÷ 2 = 4.5 days is the median

d Mean =
(0×1 + 1×1 + 2×2 + 3×3 + 3×4 + 5×5 + 3×6 + 2×7) ÷ 20
= (0 + 1 + 4 + 9 + 12 + 25 + 18 + 14) ÷ 20
= 83 ÷ 20
= 4.15 days

e Range = 7 – 0 = 7 days

4
c (i) The mode is 30.

(ii) There are 20 data points. The 10th and 11th data points are 30 and 30,
respectively. Hence, the median is (30 + 30) ÷ 2 = 60 ÷ 2 = 30.

5
a Mean = (8 + 15 + 23 + 12 + 3 + 19 + 42 + 33) ÷ 8 = 155 ÷ 8 = 19.375

b Order the dataset: 3, 8, 12, 15, 19, 23, 33, 42. Hence, the median is
(15 + 19) ÷ 2 = 34 ÷ 2 = 17.

c Range = 42 – 3 = 39.



6
a Order the dataset: 8, 10, | 11, 13, | 15, 18, | 20, 24.
The three dividers mark Q1, Q2 and Q3.
Hence, Q1 = (10 + 11) ÷ 2 = 21 ÷ 2 = 10.5, Q3 = (18 + 20) ÷ 2 = 38 ÷ 2 = 19

b IQR = Q3 − Q1 = 19 − 10.5 = 8.5

7
a Order the dataset: 0, 49, 75, 82, 97, 102, 110.
Hence, by ordering the dataset, we see that Q1 = 49, Q2 = 82, Q3 = 102

b There are no outliers. All data points are less than
Q3 + 1.5 × IQR = 102 + 1.5 × (102 − 49) = 181.5 and are larger than
Q1 − 1.5 × IQR = 49 − 1.5 × (102 − 49) = −30.5

c (box plot not reproduced)



Solutions to Exercise 5G
1
a If data are more spread out from the mean then the standard deviation is
larger.

b If data are more concentrated about the mean then the standard deviation is
smaller.

2
a Dot plot B has a greater concentration of values at the higher end of the
range, so its mean will be higher.

b Dot plot A has data more spread out from the mean than dot plot B, so the
standard deviation of dot plot A will be higher.

3
a Dot plot A will have a higher standard deviation than dot plot B.

b Dot plot A has data spread evenly across the range, while dot plot B has data
more closely concentrated on the mean.

4
a Gum Heights has more homes with few plants and fewer homes with many plants,
so Gum Heights will have the smaller mean number of trees or shrubs.

b The data points for Gum Heights are clustered more closely together than the
data points for Oak Valley, so Gum Heights will have the smaller standard
deviation.

5
a Mean = (3 + 5 + 6 + 7 + 9) ÷ 5
= 30 ÷ 5
= 6
σ = √[((–3)² + (–1)² + 0² + 1² + 3²) ÷ 5]
= √4
= 2

b Mean = (1 + 1 + 4 + 5 + 7) ÷ 5
= 18 ÷ 5
= 3.6
σ = √[((–2.6)² + (–2.6)² + (0.4)² + (1.4)² + (3.4)²) ÷ 5]
= √5.44
≈ 2.3

c Mean = (2 + 5 + 6 + 9 + 10 + 11 + 13) ÷ 7
= 56 ÷ 7
= 8
σ = √[((–6)² + (–3)² + (–2)² + 1² + 2² + 3² + 5²) ÷ 7]
= √(88 ÷ 7)
≈ 3.5



d Mean = (28 + 29 + 32 + 33 + 36 + 37) ÷ 6
= 195 ÷ 6
= 32.5
σ = √[((–4.5)² + (–3.5)² + (–0.5)² + (0.5)² + (3.5)² + (4.5)²) ÷ 6]
= √(65.5 ÷ 6)
≈ 3.3

6
a Mean = (1 + 2 × 3 + 3 × 4 + 4 × 2) ÷ 10
= 27 ÷ 10
= 2.7
σ = √[((–1.7)² + 3 × (–0.7)² + 4 × (0.3)² + 2 × (1.3)²) ÷ 10]
= √(8.1 ÷ 10)
= 0.9

b Mean = (4 + 11 + 13 + 17 + 20 + 22) ÷ 6
= 87 ÷ 6
= 14.5
σ = √[((–10.5)² + (–3.5)² + (–1.5)² + (2.5)² + (5.5)² + (7.5)²) ÷ 6]
= √(217.5 ÷ 6)
≈ 6.0

7
a The outer suburb school has more data values in the higher range than the
inner city school, so its mean will be higher.

b The inner city school has a more concentrated spread of data values, so its
standard deviation will be smaller.

c Both dwellings and schools are less concentrated in the outer suburbs of a
city compared to the inner city region, so it is likely that students in the
outer suburbs will have further to travel to get to school.

8
a False: Set B has more data values in the higher range than set A, so its mean
will be higher.

b True: Set A has a less concentrated spread of data values than set B, so its
range will be greater.

c True: Set A has a less concentrated spread of data values than set B, so its
standard deviation will be greater.

9
a Mean = (3 × 1 + 1 × 2 + 3 × 3) ÷ 7
= 14 ÷ 7
= 2
σ = √[(3 × (–1)² + 0² + 3 × 1²) ÷ 7]
= √(6 ÷ 7)
≈ 0.9

b Mean = (1 × 4 + 4 × 5 + 3 × 6) ÷ 8
= 42 ÷ 8
= 5.25
σ = √[(1 × (–1.25)² + 4 × (–0.25)² + 3 × (0.75)²) ÷ 8]
= √(3.5 ÷ 8)
≈ 0.7

10
a The range for set A is 6 and the range for set B is 16, so the ranges are not
equal.



b The mean for set B will be changed by the outlier, so no.

c The median is unaffected by the outlier, so yes.

d The standard deviation will be affected by the outlier because one of the
deviations would be calculated using the outlier.

11
a
i x̄ + σ = 69.16 + 16.0 = 85.16

ii x̄ – σ = 69.16 – 16.0 = 53.16

iii x̄ + 2σ = 69.16 + 32.0 = 101.16

iv x̄ – 2σ = 69.16 – 32.0 = 37.16

v x̄ + 3σ = 69.16 + 48.0 = 117.16

vi x̄ – 3σ = 69.16 – 48.0 = 21.16

b
i 33 students have scores between 53.16 and 85.16, or 66%.

ii 48 students have scores between 37.16 and 101.16, or 96%.

iii All 50 students have scores between 21.16 and 117.16, or 100%.

c
i Research required. In essence, normally distributed data is symmetrically
distributed about the mean, and the further from the mean a data point is, the
less likely it is to occur.

ii The percentages of normally distributed data within 1, 2 or 3 standard
deviations from the mean are:
1 SD: 68%
2 SDs: 95%
3 SDs: 99.7%
This is close to the results for part b.
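
The standard deviations in Questions 5, 6 and 9 all divide the sum of squared deviations by the number of data points, which is the population standard deviation. A minimal Python sketch of that formula follows, checked against Question 5. The function name `population_sd` is invented for this example; the standard library's `statistics.pstdev` uses the same definition.

```python
from math import sqrt

def population_sd(values):
    """Square root of the mean squared deviation from the mean (divide by n)."""
    mean = sum(values) / len(values)
    return sqrt(sum((x - mean) ** 2 for x in values) / len(values))

# Question 5a, 5b and 5c data sets
print(population_sd([3, 5, 6, 7, 9]))
print(round(population_sd([1, 1, 4, 5, 7]), 1))
print(round(population_sd([2, 5, 6, 9, 10, 11, 13]), 1))
```

Dividing by n − 1 instead (the sample standard deviation, `statistics.stdev`) would give slightly larger values than the ones in these solutions.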



Solutions to Exercise 5H
1
a The points form a nearly straight line, so this plot has a linear trend.

b The points form no obvious pattern, so the plot has no trend.

c The points form a curved line, so the plot has a non-linear trend.

d The points form a nearly straight line, so the plot has a linear trend.

2
a i 20°C
ii 30°C
iii 30°C
iv 34°C

b The maximum temperature graphed is 36°C.

c i The temperature stayed the same from 12 noon to 1 p.m.
ii The temperature dropped from 3 p.m. to 4 p.m.

d The points from 8 a.m. to 3 p.m. form a nearly straight line, then the last
point shows a noticeable decrease.
This means that the temperature increased in a linear way from 8 a.m. until
3 p.m., and at 3 p.m. the temperature began to drop.

3
a (graph not reproduced)

b The share price generally increased to a peak in June, then decreased to its
lowest point in November before trending upwards again in its final month.

c The maximum share price over the year was $1.43 and the minimum was $1.22, so
the difference was $0.21.

4
a (graph not reproduced)

b The pass rate for the examination has increased marginally over the 10 years,
with a peak in 2020 and a trough in 2015.

c The highest pass rate was 87% in 2020.

d In 2014 the pass rate was 74%; by 2018 this had jumped to 85%.
85 – 74 = an increase of 11 percentage points

5
a A linear trend has points on or near a straight line. The house prices in the
Adelaide suburb follow a linear trend, with a deviation in 2017.

b i House prices increase by approximately $50 000 each year, so in 2021 a house
would be worth approximately $650 000.
ii House prices would have increased by $150 000, so a house should be worth
approximately $750 000.

6
a (graph not reproduced)

b Sales are high in the summer months and decrease in the cooler months.



c Strawberries are in season in the warmer months.
7
ai 19 000 – 13 000 = $6000
ii 13 000 – 9000 = $4000
b 1 – December
c
di The sales trend for City Central for the 6 months is fairly constant with a small trough in October.
ii Sales for Southbank peaked in August before taking a downturn.
e Approximately $5000 if the graph continues.
8
ai 5.8 km
ii 1.7 km
bi The Blue Crest starts far away and slowly gets closer to the machine.
ii The Green Tail starts near the machine and gets gradually farther from it.
c If the trends continue, the Blue Crest and Green Tail will be the same distance from the machine at approximately 8:30 p.m.
9
a Increases continually, rising more rapidly as the years progress.
b Compound interest – exponential growth.
10
a Graphs may vary, but the graph should decrease from room temperature to the temperature of the fridge. It should represent exponential decay. In other words, the temperature approaches the fridge temperature but never quite reaches it.
b No. The drink cannot cool to a temperature lower than that of the internal environment of the fridge.
c Check with your teacher.
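The exponential decay described in question 10 (a drink cooling in a fridge) can be sketched numerically. Everything in this example is an assumption made for illustration: the room temperature of 25°C, the fridge temperature of 4°C, the cooling rate of 0.05 per minute and the function name `drink_temperature` do not come from the exercise.

```python
import math

def drink_temperature(minutes, room_temp=25.0, fridge_temp=4.0, rate=0.05):
    """Temperature in °C after `minutes` in the fridge, modelled as exponential decay.

    The gap between the drink and the fridge shrinks by the same factor each
    minute, so the drink approaches fridge_temp without dropping below it.
    """
    return fridge_temp + (room_temp - fridge_temp) * math.exp(-rate * minutes)

for t in (0, 30, 60, 120, 240):
    print(t, round(drink_temperature(t), 2))
```

The printed temperatures fall quickly at first and then level off just above the fridge temperature, which is the shape the answer to part a describes and the reason the answer to part b is "No".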



Solutions to Exercise 5I
1
a The height and width of doors vary little and are based on aesthetics and their planned use, so there is likely to be a strong correlation between height and width of doors.
b As more mass means more force is required to accelerate, there is likely to be a strong correlation between weight of car and fuel consumption.
c There is no direct relationship between temperature and phone call length, so there is unlikely to be a strong correlation.
d Since the colour of a flower does not give any indication of the scent of the flower, there is unlikely to be a strong correlation.
e As plants need water to grow, there is likely to be a strong correlation between amount of rain and vegetable size.
2
a increases
b decreases
3
a weak negative (↘)
b strong positive (↗)
c strong negative (↘)
4
a
b The trend line is sloped upwards, so the correlation between x and y is positive.
c The upwards trend is clearly defined, so the correlation is strong.
d (8, 1.0) is an outlier.



5
a
b The trend line is sloped downwards, so the correlation between x and y is negative.
c The downwards trend is clearly defined, so the correlation is strong.
d (14, 4) is an outlier.
6
a
The trend line is sloped downwards, so the correlation is negative.
b
The trend line is sloped upwards, so the correlation is positive.
c
There is no clear trend, so there is no correlation between x and y.
7
a
b Garden bed D seems to go against the general trend.
c With a sample size this small it is difficult to be certain, but it does look like the amount of fertiliser does affect the size of tomatoes produced.
8
a The larger the engine the more fuel it would be expected to consume, so there is probably a correlation between these two variables.
b As the engine size increases more fuel would be burned per second, so fuel economy would probably decrease.
c i As engine size increases fuel economy clearly decreases, so yes.
ii Car H is the only example where engine size increases and fuel economy increases rather than decreases compared to the previous car.
Car H does not support the general trend.
9
a
b The correlation between V and d is negative.
c Generally as d increases V decreases.



10 The graph needs better scales for each axis, as the data involve only small variations in amount of sleep and in exam results.
With such a small sample size of closely clustered results it is difficult to justify the conclusion of a strong correlation.

11
a i There is a weak negative correlation.
ii There is no correlation.
b The first plot supports the position the government department wishes to put forward, so the government would be more likely to use the results of survey 1.
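The answers in this exercise judge the direction and strength of a correlation by eye from a scatter plot. One common way to put a number on the same idea is Pearson's correlation coefficient r, which is not used in these solutions and is shown here only as a sketch. The data sets are invented for illustration and `pearson_r` is an illustrative name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data; always between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: y rises with x in the first set and falls in the second.
xs = [1, 2, 3, 4, 5, 6]
ys_up = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
ys_down = [12.0, 10.1, 7.8, 6.2, 3.9, 2.1]

print(round(pearson_r(xs, ys_up), 3))    # close to +1: strong positive
print(round(pearson_r(xs, ys_down), 3))  # close to -1: strong negative
```

The sign of r matches the direction of the trend line, and values near 0 correspond to plots described above as having no correlation.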



Solutions to Exercise 5J
1
a It is appropriate to fit a trend line when the data appear to fit on or near a straight line, that is, when they show a definite linear trend.
b When fitting a trend line, balance the number of points evenly either side of the line, ignoring outliers when taking distances into account.
2
a
b
4
a
b As x increases y increases, so there is a positive correlation between x and y.
c
d Answers are approximate and will differ slightly from student to student.
i 3.2
ii 1.2
iii 1.5
iv 7.5

5
a

b negative correlation (↘)


d
di y ≈ 13.5 when x = 7.5

3
a Estimating the y value when x = 6 is
an example of interpolation.

b Estimating the y value when x = 10 is


an example of extrapolation.
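The difference between interpolation and extrapolation in question 3 can be illustrated with a fitted line. The exercise fits trend lines by eye; the sketch below swaps in a least-squares fit instead, and the data set is invented (it lies exactly on y = 2x + 1 so the results are easy to check). The function names are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares line through paired data, returned as (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def kind_of_estimate(x, xs):
    """Inside the range of the observed x values is interpolation, outside is extrapolation."""
    return "interpolation" if min(xs) <= x <= max(xs) else "extrapolation"

# Hypothetical data with x values from 1 to 8 (no reading taken at x = 6).
xs = [1, 2, 3, 4, 5, 7, 8]
ys = [3, 5, 7, 9, 11, 15, 17]

slope, intercept = fit_line(xs, ys)
for x in (6, 10):
    print(x, round(slope * x + intercept, 2), kind_of_estimate(x, xs))
```

With this data x = 6 lies inside the observed range and x = 10 lies outside it, mirroring parts a and b. Extrapolated values are generally less reliable because the trend may not continue beyond the data.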



ii y ≈ 23 when x = 0
iii x ≈ 9 when y = 12
iv x ≈ 7 when y = 15
6 Lines of best fit will vary, and so these answers are approximate only. The lines of best fit should have approximately equal numbers of data points above and below them and generally pass through the middle of the data points.
a When x = 5, y ≈ 4.5
b When x = 5, y ≈ 6
c When x = 5, y ≈ 0.5
d When x = 5, y ≈ 50
7
a The number of people entering the park increases with temperature (positive correlation ↗)
b
ci 65 people are in the park when it is 20°
ii It will be approximately 13° when 25 people visit the park
8
a



c $600 profit will be made if 17 customers visit the bookshop.
d In order to make $100 profit, 2 customers must visit the bookshop.
9
a
b
c Answers will vary slightly.
i Growth ≈ 25 cm
ii Growth ≈ 90 cm
d Answers will vary slightly.
i Rainfall ≈ 540 mm
ii Rainfall ≈ 720 mm
10
a
c
i In 2000, the record would be approximately 130 m.
ii In 2015, the record would be approximately 170 m.
d Eventually the records will plateau, and increases in distance will become smaller.
11
a Experiment 1:



Experiment 2:

b Answers will vary.


Experiment 1:

Experiment 2:

i Max. heart rate ≈ 140


ii Max. heart rate ≈ 133

c Answers will vary.


i Age ≈ 25
ii Age ≈ 20

d Experiment 2 estimates a lower maximum heart rate for a person aged 22.
e Answer requires research; students'
responses will vary.



Solutions to Puzzles and Challenges
1 $\dfrac{6 \times 71 + 5 \times 60}{11} = \dfrac{726}{11} = 66$ kg
2 5 × 80 = 400% in total
400 – (4 × 78) = 88%
3 Mean = 5
(therefore must sum to 25)
Middle number = 5
2, 2, 5, 6, 10
Range = 10 – 2 = 8
4
a The mean will increase by 3.
E.g. Consider the data set:
5, 6, 7
Mean $= \dfrac{5 + 6 + 7}{3} = \dfrac{18}{3} = 6$
Add 3 to each: 8, 9, 10
Mean $= \dfrac{8 + 9 + 10}{3} = \dfrac{27}{3} = 9$
b The median will increase by 3.
E.g. 5, 6, 7 (M = 6)
Add 3 to each: 8, 9, 10 (M = 9)
c The range remains unchanged.
E.g. 5, 6, 7
Range = 7 – 5 = 2
Add 3 to each: 8, 9, 10
Range = 10 – 8 = 2
5 Q1 = 2.6 and Q3 = 3.7
IQR
= Q3 – Q1
= 3.7 – 2.6
= 1.1
6 Range of 8
Difference between the max and min is 8
Mode of 3
Most common value is 3
Median of 6
The average of the two centre values is 6: $\dfrac{3 + 9}{2} = 6$
3, 3, 9, 3 + 8
3, 3, 9, 11
7 2, 4, 5, 6, 8, 10, x
Q1 = 4, M = 6, Q3 = 10
IQR
= Q3 – Q1
= 10 – 4
= 6
Q1 – 1.5 × IQR
= 4 – 1.5 × 6
= –5
Q3 + 1.5 × IQR
= 10 + 1.5 × 6
= 19
x must be less than or equal to 19 so as not to be considered an outlier.
8
a This is simply a horizontal shift in the dataset, which means the mean, median and mode will all increase by 10. You can imagine the dot plot, histogram or stem and leaf plot starting from 10 units further to the right than before. This has the effect of increasing every value in the dataset, and therefore the mean, median and mode, by 10.
b The mean will increase by a factor of 10.
E.g. Consider the data set:
5, 6, 7
Mean $= \dfrac{5 + 6 + 7}{3} = \dfrac{18}{3} = 6$



Multiply each by 10: 50, 60, 70
Mean $= \dfrac{50 + 60 + 70}{3} = \dfrac{180}{3} = 60$
The median will increase by a factor of 10.
E.g. 5, 6, 7 (M = 6)
Multiply each by 10: 50, 60, 70 (M = 60)
The mode will increase by a factor of 10.
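The effects described in questions 4 and 8 (adding a constant to every value versus multiplying every value by a constant) can be confirmed on the same example data set 5, 6, 7. This is a quick check rather than a proof, and the helper name `summary` is illustrative.

```python
from statistics import mean, median

def summary(data):
    """Mean, median and range of a data set."""
    return {"mean": mean(data), "median": median(data), "range": max(data) - min(data)}

data = [5, 6, 7]
shifted = [x + 3 for x in data]    # add 3 to each value, as in question 4
scaled = [x * 10 for x in data]    # multiply each value by 10, as in question 8b

print(summary(data))
print(summary(shifted))   # mean and median go up by 3, range unchanged
print(summary(scaled))    # mean and median are 10 times larger, and so is the range
```

Adding a constant moves the centre without changing the spread, whereas multiplying by a constant rescales both the centre and the spread.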

9 Hence, at the end of 2019, the total rainfall for the last ten years was 546 × 10 = 5460 mm
(2010 to 2019 inclusive). At the end of 2020, the total rainfall for the last ten years was 562 ×
10 = 5620 mm (2011 to 2020 inclusive). This represents a 5620 - 5460 = 160 mm increase.
The rainfall for years 2011 to 2019 inclusive is included in both averages. The only data
point that has changed between the ten year averages is that the rainfall in 2010 has
essentially been 'replaced' by the rainfall in 2020, i.e., 5460 mm includes the 2010 rainfall but
excludes the 2020 rainfall; 5620 mm includes the 2020 rainfall but excludes the 2010 rainfall.
In other words, the rainfall in 2020 is 160 mm larger than that of 2010. Hence, the rainfall in
2010 was 654 - 160 = 494 mm.
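The argument in question 9 (one year drops out of a ten-year average and another replaces it) can be written as a short calculation. The averages and the 2020 rainfall are taken from the working above; the function name `dropped_value` is made up for illustration.

```python
def dropped_value(old_average, new_average, added_value, window):
    """Value that left a moving average when `added_value` replaced it.

    All other values in the window are shared, so the change in the total
    equals added_value minus the dropped value.
    """
    change_in_total = (new_average - old_average) * window
    return added_value - change_in_total

# Ten-year averages of 546 mm (2010 to 2019) and 562 mm (2011 to 2020),
# with 654 mm of rain in 2020.
rain_2010 = dropped_value(old_average=546, new_average=562, added_value=654, window=10)
print(rain_2010)
```

The result should match the 494 mm found above, since the change in total is 5620 – 5460 = 160 mm.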



Solutions to short-answer questions
1
a
b
c It is positively skewed.
2
a 15, 19, 20, 24, 28, 29, 32, 34, 37, 38, 38, 42, 49, 50
Stem Leaf
1 59
2 0489
3 24788
4 29
5 0
1|5 = 15
b The data is symmetrical with a centre at 30–39.
3
a
i $\dfrac{2 + 7 + 4 + 8 + 3 + 6 + 5}{7} = \dfrac{35}{7} = 5$
ii 2, 3, 4, 5, 6, 7, 8
Range = 8 – 2 = 6
iii Median = 5
2, 3, 4, 5, 6, 7, 8 (M = 5)
b
i $\dfrac{10 + 55 + 67 + 24 + 11 + 16}{6} = \dfrac{183}{6} = 30.5$
ii 10, 11, 16, 24, 55, 67
Range = 67 – 10 = 57
iii Median = 20
10, 11, 16, | 24, 55, 67
(20)
c
i $\dfrac{1.7 + 1.2 + 1.4 + 1.6 + 2.4 + 1.3}{6} = \dfrac{9.6}{6} = 1.6$
ii 1.2, 1.3, 1.4, 1.6, 1.7, 2.4
Range = 2.4 – 1.2 = 1.2
iii Median = 1.5
1.2, 1.3, 1.4, | 1.6, 1.7, 2.4
(1.5)
4
a $\dfrac{40 + 41 + 37 + 32 + 48 + 43 + 32 + 76 + 29 + 33 + 26 + 38 + 87}{13} = \dfrac{562}{13} \approx 43.2$
b 26, 29, 32, 32, 33, 37, 38, 40, 41, 43, 48, 76, 87 (M = 38)



c The mean is affected by the two older people at the party (76 and 87)
5
a 4, 5, 8, 10, 10, 11, 12, 14, 15, 17, 21
Q1 = 8
Q2 = 11
Q3 = 15
b 2, 6, 6, | 10, 11, 12, | 14, 14, 15, | 16, 18, 23
Q1 = 8
Q2 = 13
Q3 = 15.5
6
a Order the dataset: 2, 2, 3, 5, 6, 6, | 7, 8, 9, 10, 10, 21. Hence, the range is 21 – 2 = 19.
$Q_1 = \dfrac{3 + 5}{2} = \dfrac{8}{2} = 4$, $Q_3 = \dfrac{9 + 10}{2} = \dfrac{19}{2} = 9.5$. Hence, IQR = Q3 – Q1 = 9.5 – 4 = 5.5
b Q3 + 1.5 × IQR = 9.5 + 1.5 × 5.5 = 17.75; 21 is above this, so 21 is an outlier.
Q1 – 1.5 × IQR = 4 – 1.5 × 5.5 = –4.25; there are no outliers below this.
c Removing 21 makes the range 10 – 2 = 8, so the range is significantly reduced.
Now,
2, 2, 3, 5, 6, 6, 7, 8, 9, 10, 10
Q1 = 3, M = 6, Q3 = 9
IQR = Q3 – Q1 = 9 – 3 = 6, so the IQR increases slightly, from 5.5 to 6 (this is a much smaller effect than the effect the removal of the outlier had on the range).
7
a
i 2, 2, | 3, 3, 3, 4, 5, | 6, 12
Q1 = 2.5
M = 3
Q3 = 5.5
ii IQR
= Q3 – Q1
= 5.5 – 2.5
= 3
iii Q1 – 1.5 × IQR
= 2.5 – 1.5 × 3
= –2
Q3 + 1.5 × IQR
= 5.5 + 1.5 × 3
= 10
12 is an outlier
iv
b
i 11, 12, 15, 15, 17, 18, 20, 21, 24, 27, 28
Q1 = 15
M = 18
Q3 = 24
ii IQR
= Q3 – Q1
= 24 – 15
= 9
iii Q1 – 1.5 × IQR
= 15 – 1.5 × 9
= 1.5
Q3 + 1.5 × IQR
= 24 + 1.5 × 9
= 37.5
There are no outliers for this data set.



iv
c
i 0.7, 1.9, 2.1, 2.2, 2.3, | 2.4, 2.6, 2.6, 2.8, 3.1
Median = 2.35
Q1 = 2.1
Q3 = 2.6
ii IQR
= Q3 – Q1
= 2.6 – 2.1
= 0.5
iii Q1 – 1.5 × IQR
= 2.1 – 1.5 × 0.5
= 1.35
Q3 + 1.5 × IQR
= 2.6 + 1.5 × 0.5
= 3.35
0.7 is an outlier as it is below the interval (1.35 – 3.35)
iv
8
a FALSE
range A = 100 – 20 = 80
range B = 90 – 10 = 80
b TRUE
median A = 50
median B = 50
c TRUE
IQR (A)
= Q3 – Q1
= 80 – 40
= 40
IQR (B)
= Q3 – Q1
= 60 – 40
= 20
d TRUE
Q3 = 80
9
a Mean $= \dfrac{4 + 5 + 7 + 9 + 10}{5} = \dfrac{35}{5} = 7$
$\sigma = \sqrt{\dfrac{(4 - 7)^2 + (5 - 7)^2 + (7 - 7)^2 + (9 - 7)^2 + (10 - 7)^2}{5}} = \sqrt{\dfrac{26}{5}} \approx 2.3$ (to 1 decimal place)
b Mean $= \dfrac{1 + 1 + 3 + 5 + 5 + 9}{6} = \dfrac{24}{6} = 4$
$\sigma = \sqrt{\dfrac{(1 - 4)^2 + (1 - 4)^2 + (3 - 4)^2 + (5 - 4)^2 + (5 - 4)^2 + (9 - 4)^2}{6}} = \sqrt{\dfrac{46}{6}} \approx 2.8$ (to 1 decimal place)
10
a linear
b From 10 a.m. to 11 a.m. The increase is 4°C (from 22°C to 26°C)
11
a Negative (↘)
b None
c Positive (↗)



12

b Negative correlation (↘)

c Strong

d (3, 5)

13

b 50 m in length is approximately 1.8 m deep.



Solutions to multiple-choice questions
1 D 7 E
Answers will be a worded answer
where order is not important e.g.
basketball, cricket, football…

2 C
8 C
Stem Leaf
2 49
3 1178
4 246
5 04

9 D
3 B Weak negative (↘)

10 B
Stem Leaf
2 49
3 1178
4 246
5 04

4 A
2, 3, 4, 5, 6, 10
Range = 10 – 2 = 8
Mean $= \dfrac{2 + 4 + 3 + 5 + 10 + 6}{6} = \dfrac{30}{6} = 5$

5 C
12 15 18 | 22 26 26
(20)

6 A

Q1 = 5 and Q3 = 13
IQR
= Q3 – Q1
= 13 – 5
=8
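The stem-and-leaf plot shown in the multiple-choice solutions above can be rebuilt from its values. The sketch below assumes two-digit whole numbers and the key 2 | 4 = 24, so the list of values is read off the plot rather than given in the solutions. The function names are illustrative, and stems with no leaves are simply left out.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group two-digit whole numbers by tens digit (stem) and units digit (leaf)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

def render(plot):
    """Format the plot with one row per stem, leaves in ascending order."""
    return "\n".join(
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(plot.items())
    )

# Values read from the plot above, assuming the key 2 | 4 = 24.
values = [24, 29, 31, 31, 37, 38, 42, 44, 46, 50, 54]
print(render(stem_and_leaf(values)))
```

Because the values are sorted first, quantities such as the median or the range can be read straight from the printed rows.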



Solutions to extended-response questions
1
a 15, 21, 24, 32, 36, 39, 50, | 51, 57, 65, 73, 73, 82, 86
Median = 50.5
Q1 = 32
Q3 = 73
IQR
= Q3 – Q1
= 73 – 32
= 41
b Q1 – 1.5 × IQR
= 32 – 1.5 × 41
= –29.5
Q3 + 1.5 × IQR
= 73 + 1.5 × 41
= 134.5
No outliers
2
b
i 10 customers yields $80 profit
ii 20 customers yields $150 profit
iii 30 customers yields $240 profit
c
i 7 customers yields $50 profit
ii 16 customers yields $105 profit
iii 27 customers yields $220 profit
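The quartile and outlier working in extended-response question 1 can be checked with a short script. The sketch assumes the method used throughout these solutions: split the ordered data into lower and upper halves (leaving out the middle value when the number of values is odd), take the median of each half, and flag anything more than 1.5 × IQR beyond the quartiles. The function names are illustrative, and the data set needs at least two values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the split-into-halves method."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]   # skips the middle value when n is odd
    return median(lower), median(ordered), median(upper)

def outlier_check(data):
    """Return (lower fence, upper fence, list of outliers) for the 1.5 x IQR rule."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return low, high, [x for x in data if x < low or x > high]

# Data from extended-response question 1.
extended_1 = [15, 21, 24, 32, 36, 39, 50, 51, 57, 65, 73, 73, 82, 86]
# Data from short-answer question 6.
short_6 = [2, 2, 3, 5, 6, 6, 7, 8, 9, 10, 10, 21]

print(quartiles(extended_1), outlier_check(extended_1))
print(quartiles(short_6), outlier_check(short_6))
```

Other software may use a different quartile convention and give slightly different values for Q1 and Q3, so results from a calculator or spreadsheet need not match this method exactly.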

