MANCOSA
Bachelor of Commerce in Information and Technology Management
ANALYTICAL TECHNIQUES
UNIT 4: MANAGEMENT STATISTICS
Study Notes | Compiled from Dr H. Matsongoni's Materials & Module Guide
Unit Learning Outcomes
By the end of this unit you should be able to:
• Understand, calculate and interpret the measures of central location (mean, median,
mode) for both ungrouped and grouped data.
• Understand the concept of skewness and interpret the reliability of a measure of
central tendency.
• Understand and interpret the measures of dispersion (range, inter-quartile range,
quartile deviation, variance, standard deviation, coefficient of variation) for sets of
data.
Overview
The behaviour of a random variable can be summarised by two categories of
statistics:
Measures of Central Location | Measures of Dispersion
Describe where the 'centre' or 'typical value' of the data lies. | Describe how spread out or scattered the data is around that centre.
Mean, Median, Mode | Range, IQR, Quartile Deviation, Variance, Standard Deviation, CV
1. Measures of Central Location / Tendency (Section 4.1)
Observations of a random variable tend to group about some central value. The statistical
measures that quantify where the majority of observations are concentrated are called
measures of central location (or central tendency).
A central location statistic represents a typical or representative value in a data set and is
useful for comparing data sets.
1.1 Arithmetic Mean
The arithmetic mean (average) is the sum of all observations divided by the number of
observations. It is the most commonly used measure of central tendency.
Formula: Ungrouped (Raw) Data
Sample Mean x̄ = Σxᵢ / n
where n = number of observations; xᵢ = value of the ith observation
📌 Worked Example — Ungrouped Data (13 values)
Data: 32, 35, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45
Sum (Σxᵢ) = 500
n = 13
x̄ = 500 / 13 = 38.46
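The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative and not part of the module guide.

```python
# The 13 observations from the worked example
marks = [32, 35, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45]

total = sum(marks)   # Σxᵢ
n = len(marks)       # number of observations
x_bar = total / n    # sample mean

print(total, n, round(x_bar, 2))  # 500 13 38.46
```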
Formula: Grouped Data
For grouped data, the midpoint (x) of each interval is used as a representative value for that
class:
Grouped Mean x̄ = Σ(f · x) / n
where f = frequency of each interval; x = midpoint of each interval; n = total
observations
The midpoint of each interval is found as: midpoint = (lower limit + upper limit) / 2
📌 Grouped Mean — Student Marks (n = 100)
Each interval's midpoint is multiplied by its frequency, then all products are summed.
Σ(f · x) = 1×5 + 2×15 + 5×25 + 6×35 + 29×45 + 34×55 + 12×65 + 7×75 + 3×85 + 1×95 = 5 200
x̄ = 5 200 / 100 = 52
Compare to raw data mean of 51.34 — the grouped method gives a close approximation.
From To Frequency (f) Midpoint (x) f×x
0 < 10 1 5 5
10 < 20 2 15 30
20 < 30 5 25 125
30 < 40 6 35 210
40 < 50 29 45 1 305
50 < 60 34 55 1 870
60 < 70 12 65 780
70 < 80 7 75 525
80 < 90 3 85 255
90 < 100 1 95 95
TOTAL 100 5 200
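The table can be reproduced in code as a quick check of the grouped mean. The sketch below assumes each class is stored as a (lower limit, upper limit, frequency) tuple, which is an illustrative layout rather than anything prescribed by the module.

```python
# (lower limit, upper limit, frequency) for each class in the student marks table
classes = [
    (0, 10, 1), (10, 20, 2), (20, 30, 5), (30, 40, 6), (40, 50, 29),
    (50, 60, 34), (60, 70, 12), (70, 80, 7), (80, 90, 3), (90, 100, 1),
]

n = sum(f for _, _, f in classes)                         # total observations
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)  # Σ(f · x), x = class midpoint
grouped_mean = sum_fx / n

print(n, sum_fx, grouped_mean)  # 100 5200.0 52.0
```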
1.2 Mode
The mode is the most frequently occurring value in a data set. For ungrouped data, it is simply
the value that appears most often.
📌 Worked Example — Ungrouped Data
Data: 32, 35, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45
The value 39 appears three times — more than any other value.
Mode = 39
Formula: Grouped Data
Grouped Mode: Mo = Oₘₒ + c(fₘ − fₘ₋₁) / (2fₘ − fₘ₋₁ − fₘ₊₁)
where Oₘₒ = lower limit of the modal class; c = class width; fₘ = frequency of the modal class;
fₘ₋₁ = frequency of the class BEFORE the modal class; fₘ₊₁ = frequency of the class AFTER the modal class
📌 Worked Example — Student Marks
Modal class = 50 < 60 (highest frequency = 34)
Oₘₒ = 50, c = 10, fₘ = 34, fₘ₋₁ = 29, fₘ₊₁ = 12
Mo = 50 + 10(34 − 29) / (2×34 − 29 − 12)
= 50 + 10(5) / (68 − 41)
= 50 + 50 / 27
= 50 + 1.85
= 51.85
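The grouped mode formula translates directly into a small function. This is a sketch only; the function and parameter names are hypothetical and simply mirror the symbols in the formula.

```python
def grouped_mode(lower_limit, class_width, f_modal, f_before, f_after):
    """Grouped-data mode: lower limit + c(fm - f_before) / (2fm - f_before - f_after)."""
    numerator = class_width * (f_modal - f_before)
    denominator = 2 * f_modal - f_before - f_after
    return lower_limit + numerator / denominator

# Student marks: modal class 50 < 60
mode_marks = grouped_mode(lower_limit=50, class_width=10,
                          f_modal=34, f_before=29, f_after=12)
print(round(mode_marks, 2))  # 51.85
```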
1.3 Median
The median is the value that divides an ordered data set into two equal halves: about 50%
of observations fall at or below the median and about 50% at or above it. It is also called
the second quartile (Q2) or the 50th percentile.
Ungrouped Data
Condition Rule
n is ODD Sort ascending. Median = the [(n+1)/2]th value.
n is EVEN Sort ascending. Median = average of the (n/2)th and ((n/2)+1)th values.
📌 Worked Example — Ungrouped (n = 13, odd)
Data (sorted): 32, 35, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45
Position of median = (13+1)/2 = 7th value
Median = 39
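The odd/even rule for the median, and the ungrouped mode from section 1.2, can be sketched as follows. The helper names are illustrative, and the 12-value list used for the even case is a hypothetical variation on the example data (the last value is dropped).

```python
from collections import Counter

def median(values):
    """Median of ungrouped data using the odd/even position rule."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # odd n: the [(n+1)/2]th value (1-based) sits at index n // 2 (0-based)
        return ordered[mid]
    # even n: average of the (n/2)th and ((n/2)+1)th values
    return (ordered[mid - 1] + ordered[mid]) / 2

def mode(values):
    """Most frequent value; if several values tie, the one seen first is returned."""
    counts = Counter(values)
    return max(counts, key=counts.get)

marks = [32, 35, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45]
print(median(marks), mode(marks))  # 39 39

even_case = marks[:-1]             # hypothetical: 12 values, so n is even
print(median(even_case))           # 38.5
```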
Formula: Grouped Data
Grouped Median: Me = Oₘₑ + c(n/2 − f(<)) / fₘₑ
where Oₘₑ = lower limit of the median class; c = class width; n = total observations;
f(<) = cumulative frequency of all classes BEFORE the median class; fₘₑ = frequency of the median class
The median class is found by locating which interval contains the (n/2)th observation using
the cumulative frequency column.
📌 Worked Example — Student Marks (n = 100)
n/2 = 50th observation
Cumulative frequency reaches 43 at the end of interval 40 < 50,
and reaches 77 at the end of 50 < 60. So the 50th observation lies in 50 < 60.
Oₘₑ = 50, c = 10, n/2 = 50, f(<) = 43, fₘₑ = 34
Me = 50 + 10(50 − 43) / 34
= 50 + 10(7) / 34
= 50 + 70 / 34
= 50 + 2.06
= 52.06 ≈ 52.1
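Locating the median class and applying the grouped formula can be sketched in code. The example assumes the frequency table is held as (lower limit, frequency) pairs with a constant class width of 10; these names and this layout are illustrative.

```python
# (lower limit, frequency) for each class in the student marks table
classes = [(0, 1), (10, 2), (20, 5), (30, 6), (40, 29),
           (50, 34), (60, 12), (70, 7), (80, 3), (90, 1)]
width = 10

n = sum(f for _, f in classes)
target = n / 2            # position of the median observation
cum_before = 0            # cumulative frequency of the classes before the current one
grouped_median = None

for lower, f in classes:
    if cum_before + f >= target:
        # this class contains the (n/2)th observation
        grouped_median = lower + width * (target - cum_before) / f
        break
    cum_before += f

print(cum_before, round(grouped_median, 2))  # 43 52.06
```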
1.4 Skewness and the Relationship Between Mean, Median and Mode
Skewness refers to the asymmetry of a frequency distribution. The relationship between the
mean, median, and mode tells you both the shape of the distribution and which measure of
central tendency is most reliable.
Distribution Type | Shape | Relationship | Best Measure
Symmetrical | Bell-shaped, equal on both sides. | Mean = Median = Mode | Any of the three.
Positively Skewed (Right) | Tail extends to the right. Caused by a few very high values (e.g. very hard test). | Mode < Median < Mean | Median (not pulled by extremes).
Negatively Skewed (Left) | Tail extends to the left. Caused by a few very low values (e.g. very easy test). | Mean < Median < Mode | Median (not pulled by extremes).
Bimodal | Two peaks (two most common values). | Mean = Median, but two separate Modes. | Context-dependent.
📌 Memory Aid — Skewness
The MEAN is pulled in the direction of the TAIL (the extreme scores).
Positive skew → long right tail → mean is HIGHEST
Negative skew → long left tail → mean is LOWEST
When data is skewed, the MEDIAN is the most reliable measure of central location because:
• It is not distorted by extreme values (unlike the mean), and
• It is not overly influenced by the frequency of a single value (unlike the mode).
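The ordering rules for mean, median and mode can be turned into a rough classifier. This is a sketch under simplifying assumptions: it only recognises the strict orderings from the table, and the three small data sets are hypothetical values chosen to illustrate each case.

```python
from statistics import mean, median, mode

def describe_skew(values):
    """Classify shape from the ordering of mean, median and mode."""
    avg, med, mod = mean(values), median(values), mode(values)
    if mod < med < avg:
        return "positively skewed"   # mean pulled towards a long right tail
    if avg < med < mod:
        return "negatively skewed"   # mean pulled towards a long left tail
    if avg == med == mod:
        return "symmetrical"
    return "no clear pattern"

right_tail = [1, 2, 2, 3, 4, 12]     # hypothetical: one very high value
left_tail = [1, 9, 10, 11, 11, 12]   # hypothetical: one very low value
symmetric = [1, 2, 3, 3, 3, 4, 5]    # hypothetical: balanced around 3

print(describe_skew(right_tail))  # positively skewed
print(describe_skew(left_tail))   # negatively skewed
print(describe_skew(symmetric))   # symmetrical
```

Real data rarely satisfies the equalities exactly, so in practice the comparison would need a tolerance.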
Other Measures of Central Location (Less Common)
• Geometric mean: used for percentage changes or growth rates (e.g. investment
returns over time).
• Harmonic mean: used to average rates or ratios (e.g. average speed over equal
distances).
• Weighted arithmetic mean: used when observations have different levels of
importance (different weights).
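For reference, the three less common means can be computed with Python's standard `statistics` module. All figures below are hypothetical and serve only to show the kind of data each measure suits.

```python
from statistics import geometric_mean, harmonic_mean

# Geometric mean: hypothetical yearly growth factors (+10%, +20%, -10%)
growth_factors = [1.10, 1.20, 0.90]
avg_growth = geometric_mean(growth_factors)
print(round(avg_growth, 3))   # about 1.059, i.e. roughly 5.9% growth per year

# Harmonic mean: hypothetical speeds (km/h) over two legs of equal distance
speeds = [40, 60]
avg_speed = harmonic_mean(speeds)
print(round(avg_speed, 2))    # 48.0

# Weighted arithmetic mean: hypothetical marks with weights 2 and 3
marks = [60, 80]
weights = [2, 3]
weighted_mean = sum(m * w for m, w in zip(marks, weights)) / sum(weights)
print(weighted_mean)          # 72.0
```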
2. Measures of Dispersion / Variability (Section 4.2)
Measures of dispersion describe how spread out or scattered observations are around the
central value. They tell us how reliable the central tendency measure is:
• Widely dispersed observations → low reliability → central value is not very
representative.
• Tightly clustered observations → high reliability → central value is highly
representative.
Measure | What it Measures | Key Strength / Weakness
Range | Spread from minimum to maximum. | Quick & easy, but distorted by outliers.
Inter-Quartile Range (IQR) | Spread of the middle 50% of data. | Robust to outliers, but ignores 50% of data.
Quartile Deviation | Half the IQR: spread around the median. | Useful with outliers; ignores extremes.
Variance | Average squared deviation from the mean. | Uses ALL data, but expressed in squared units.
Standard Deviation | Square root of variance, in original units. | Most widely used; uses ALL data; in original units.
Coefficient of Variation (CV) | Standard deviation as % of mean. | Allows comparison across different variables/units.
2.1 Range
Range (Ungrouped): Range = Maximum value − Minimum value
Range (Grouped): Range = Upper limit (highest class) − Lower limit (lowest class)
📌 Student Marks Example
Ungrouped: Range = 90 − 8 = 82
Grouped: Range = 100 − 0 = 100 (less accurate — classes cover 0 to 100)
NOTE: The grouped range overestimates here. For larger datasets, grouped range is usually a
closer estimate.
📌 Weakness of the Range
The range uses only TWO values (min and max) — it tells you nothing about how data is clustered in
between.
It is highly sensitive to OUTLIERS. Consider these two distributions, both with range = 13:
Distribution 1: 32, 35, 36, 36, 37, 38, 40, 42, 42, 43, 43, 45 (spread evenly)
Distribution 2: 32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 45 (tightly clustered with one outlier)
The range alone would suggest they are equally variable — but they are clearly not.
2.2 Inter-Quartile Range (IQR)
The IQR excludes outliers by considering only the middle 50% of observations — the
difference between the upper quartile (Q3) and lower quartile (Q1).
IQR = Q3 − Q1 = 75th percentile − 25th percentile
📌 Student Marks Example (from Ogive)
75th percentile (Q3) = 58
25th percentile (Q1) = 45
IQR = 58 − 45 = 13
This means the middle 50% of marks lie within a range of 13 marks.
📌 Limitation of IQR
Like the range, the IQR uses only TWO values (Q1 and Q3).
It excludes the top 25% and bottom 25% of observations from the analysis.
It provides no information about the clustering of values between Q1 and Q3.
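The two distributions used to illustrate the weakness of the range (section 2.1) show what the IQR adds. The sketch below uses `statistics.quantiles` with its default 'exclusive' method, which is only one of several quartile conventions; quartiles read off an ogive or produced by other software may differ slightly.

```python
from statistics import quantiles

dist_1 = [32, 35, 36, 36, 37, 38, 40, 42, 42, 43, 43, 45]  # spread evenly
dist_2 = [32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 45]  # clustered, one outlier

def value_range(values):
    return max(values) - min(values)

def iqr(values):
    q1, _, q3 = quantiles(values, n=4)  # quartiles Q1, Q2, Q3
    return q3 - q1

print(value_range(dist_1), value_range(dist_2))  # 13 13
print(iqr(dist_1), iqr(dist_2))                  # 6.75 1.0
```

Both ranges are 13, but the IQR of the second distribution is far smaller because the single outlier (45) falls outside the middle 50%.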
2.3 Quartile Deviation (Q.D.)
Quartile Deviation Q.D. = (Q3 − Q1) / 2 = IQR / 2
📌 Student Marks Example
IQR = 13, so Q.D. = 13 / 2 = 6.5
Interpretation (using median ≈ 52.06):
About 50% of all observations are expected to lie within 6.5 marks of the median:
→ Between 52.06 − 6.5 = 45.56 and 52.06 + 6.5 = 58.56
These bounds are close to Q1 = 45 and Q3 = 58.
More precisely:
→ About 25% of marks lie within 6.5 marks BELOW the median (45.56 to 52.06)
→ About 25% of marks lie within 6.5 marks ABOVE the median (52.06 to 58.56)
2.4 Variance
The variance is a widely used measure of dispersion because it takes EVERY
observation into account. It measures the average squared deviation of each observation from
the mean.
Formula: Ungrouped Data
Sample Variance Sx² = Σ(xᵢ − x̄)² / (n − 1)
OR equivalently: Sx² = (Σxᵢ² − n·x̄²) / (n − 1)
📌 Why divide by (n − 1) instead of n?
This is called the 'degrees of freedom' correction.
Once the mean and (n−1) values are known, the last value is determined — it has no freedom to
vary.
Dividing by (n−1) produces an unbiased estimate of the population variance from a sample.
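The unbiasedness claim can be illustrated exactly on a tiny case. The sketch below uses a hypothetical population {1, 2, 3}, lists every possible sample of size 2 drawn with replacement, and compares the average of the sample variances under each divisor with the true population variance. Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]                 # hypothetical population
N = len(population)
mu = Fraction(sum(population), N)
pop_var = sum((x - mu) ** 2 for x in population) / N   # true population variance

def sample_variance(sample, divisor):
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / divisor

samples = list(product(population, repeat=2))          # all 9 samples of size n = 2
avg_n_minus_1 = sum(sample_variance(s, 1) for s in samples) / len(samples)  # divisor n - 1
avg_n = sum(sample_variance(s, 2) for s in samples) / len(samples)          # divisor n

print(pop_var, avg_n_minus_1, avg_n)   # 2/3 2/3 1/3
```

Averaged over all samples, dividing by (n − 1) recovers the population variance of 2/3, while dividing by n underestimates it.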
Step-by-Step Calculation (Ungrouped)
Step Action Example: Car Ages {13, 7, 10, 15, 12, 18, 9}
1 Find the mean. x̄ = (13+7+10+15+12+18+9)/7 = 84/7 = 12
2 Subtract mean from each value → deviation. 1, −5, −2, 3, 0, 6, −3
3 Square each deviation. 1, 25, 4, 9, 0, 36, 9
4 Sum the squared deviations. Σ(xᵢ − x̄)² = 84
5 Divide by (n−1). Variance = 84 / (7−1) = 84 / 6 = 14 years²
6 Take the square root → standard deviation. Sx = √14 = 3.74 years
📌 Check: Sum of Deviations Must Always Equal Zero
1 + (−5) + (−2) + 3 + 0 + 6 + (−3) = 0 ✓
If the deviations do not sum to zero, there is an arithmetic error in your mean calculation.
Car Age (xᵢ) Mean (x̄) Deviation (xᵢ − x̄) Squared Dev. (xᵢ − x̄)² xᵢ² (alt. formula)
13 12 1 1 169
7 12 -5 25 49
10 12 -2 4 100
15 12 3 9 225
12 12 0 0 144
18 12 6 36 324
9 12 -3 9 81
Total = 84 Sum = 0 ✓ Σ = 84 Σ = 1 092
Alternative formula verification:
Variance = (Σxᵢ² − n·x̄²) / (n−1) = (1 092 − 7×144) / 6 = (1 092 − 1 008) / 6 = 84 / 6 = 14 ✓
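The six steps and the alternative formula can be followed in code. This is a minimal sketch with illustrative variable names; the final check uses `statistics.variance`, which also divides by (n − 1).

```python
from math import sqrt
from statistics import variance

ages = [13, 7, 10, 15, 12, 18, 9]                  # car ages in years
n = len(ages)

mean = sum(ages) / n                               # step 1: 12.0
deviations = [x - mean for x in ages]              # step 2: these sum to zero
sum_sq_dev = sum(d ** 2 for d in deviations)       # steps 3 and 4: 84.0
sample_var = sum_sq_dev / (n - 1)                  # step 5: 14.0 years squared
std_dev = sqrt(sample_var)                         # step 6: about 3.74 years

# Alternative formula: (Σxᵢ² − n·x̄²) / (n − 1)
alt_var = (sum(x ** 2 for x in ages) - n * mean ** 2) / (n - 1)

print(sum(deviations), sample_var, alt_var, round(std_dev, 2))  # 0.0 14.0 14.0 3.74
print(variance(ages))                                           # 14
```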
Formula: Grouped Data
Grouped Variance Sx² = (Σ fᵢxᵢ² − n·x̄²) / (n − 1)
fᵢ = frequency of interval; xᵢ = midpoint of interval; n = total observations
📌 Student Marks Example (Grouped)
Σfᵢxᵢ² = 293 300 (sum of each frequency × midpoint squared, from the student marks table)
n = 100, x̄ = 52
Sx² = (293 300 − 100 × 52²) / 99
= (293 300 − 270 400) / 99
= 22 900 / 99
= 231.31 (note: raw data variance = 210.10; the grouped value is an approximation because midpoints stand in for the actual marks)
2.5 Standard Deviation
The standard deviation is the square root of the variance. It expresses the average spread of
observations around the mean in the ORIGINAL units of the random variable — making it far
more interpretable than the variance.
Standard Deviation Sx = √Sx² = √Variance
📌 Student Marks Example
Ungrouped: Sx = √210.1 = 14.49 marks
Grouped: Sx = √231.31 = 15.21 marks
Interpretation: On average, student marks deviate from the mean by approximately 14.5 marks.
📌 Variance vs. Standard Deviation — Which to Use?
VARIANCE is expressed in SQUARED units (e.g. marks², years², rands²) — hard to interpret
intuitively.
STANDARD DEVIATION is in ORIGINAL units (e.g. marks, years, rands) — the preferred reporting
measure.
Example: Heights measured in cm → variance in cm² → standard deviation in cm (meaningful).
The standard deviation is also more stable across samples from the same population.
2.6 Coefficient of Variation (CV)
The coefficient of variation expresses the standard deviation as a percentage of the mean. It
allows comparison of variability between data sets that are measured in different units or have
very different means.
Coefficient of Variation CV = (Sx / x̄) × 100%
CV is SMALL (close to 0%) | CV is LARGE
Low variability: observations are tightly clustered around the mean. Central value is highly reliable. | High variability: observations are widely spread around the mean. Central value is less reliable.
📌 Student Marks Example
Ungrouped: Sx = 14.49, x̄ = 51.34
CV = (14.49 / 51.34) × 100 = 28.2%
Interpretation: Although the range of marks is large (8 to 90),
a CV of 28.2% indicates the marks are relatively tightly clustered around the mean.
This is consistent with the large concentration of marks in the 40–60 range seen on the histogram.
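Because the CV is unit-free, it can be wrapped in a one-line helper and applied to variables measured in different units. In the sketch below the function name is illustrative, and the height and weight figures are hypothetical values added only to show a cross-unit comparison.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return std_dev / mean * 100

# Student marks (ungrouped figures from the example above)
cv_marks = coefficient_of_variation(14.49, 51.34)
print(round(cv_marks, 1))      # 28.2

# Hypothetical comparison across different units
cv_heights = coefficient_of_variation(8.5, 170)    # heights in cm
cv_weights = coefficient_of_variation(10.5, 70)    # weights in kg
print(round(cv_heights, 1), round(cv_weights, 1))  # 5.0 15.0
```

In the hypothetical comparison the weights are relatively more variable than the heights, even though the two standard deviations cannot be compared directly.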
3. Summary: Measures of Central Location
Measure | Best Used When | Affected by Outliers? | Student Marks Result
Mean | Data is symmetric, no extreme outliers. | YES, pulled toward extremes. | x̄ = 51.34 (raw) / 52 (grouped)
Mode | Identifying the most common value; useful for categorical data. | NO, based on frequency only. | Mo = 51.85 (grouped)
Median | Data is skewed or has outliers. Best single 'representative' for skewed data. | NO, based on position only. | Me = 52.06 (grouped)
4. Summary: Measures of Dispersion
Measure Formula Uses All Data? Student Marks Result
Range Max − Min NO (2 values) 82 (ungrouped) / 100 (grouped)
IQR Q3 − Q1 NO (2 values) 58 − 45 = 13
Quartile Deviation IQR / 2 NO 13 / 2 = 6.5
Variance Σ(xᵢ−x̄)² / (n−1) YES 210.1 (ungrouped)
Std Deviation √Variance YES 14.49 marks
CV (Sx / x̄) × 100% YES 28.2%
5. Comprehensive Worked Example: Faulty ATMs at Standard Bank SA
For 20 consecutive days, the number of faulty ATMs reported across South Africa was
recorded. The values are:
38 29 33 41 56 35 19 27
32 52 24 45 34 51 28 17
46 31 44 22
Step 1 — Range
Maximum = 56, Minimum = 17
Range = 56 − 17 = 39 faulty ATMs
Step 2 — Grouped Frequency Distribution (Class width = 10, lowest limit = 10)
Interval Freq. (f) Midpoint (x) f×x f × x² Cum. f
10 – 19 2 14.5 29 420.5 2
20 – 29 5 24.5 122.5 3 001.25 7
30 – 39 6 34.5 207 7 141.5 13
40 – 49 4 44.5 178 7 921 17
50 – 59 3 54.5 163.5 8 910.75 20
TOTAL 20 700 27 395
Step 3 — Mean, Median, Mode
(i) Mean:
x̄ = Σ(f·x) / n = 700 / 20 = 35 faulty ATMs
(ii) Median:
n/2 = 10th observation. Cumulative frequency reaches 7 at end of 20–29, and 13 at end of
30–39.
So the 10th observation lies in the interval 30–39.
Me = 30 + 10(10 − 7) / 6 = 30 + 30/6 = 30 + 5 = 35 (or 34.5 with c = 9)
(iii) Mode:
Modal class = 30–39 (highest frequency = 6). fₘ = 6, fₘ₋₁ = 5, fₘ₊₁ = 4
Mo = 30 + 10(6 − 5) / (12 − 5 − 4) = 30 + 10/3 = 33.3 ≈ 33 faulty ATMs
Step 4 — Variance and Standard Deviation
Variance = (Σfx² − n·x̄²) / (n−1)
= (27 395 − 20 × 35²) / 19
= (27 395 − 24 500) / 19
= 2 895 / 19
= 152.37
Standard Deviation = √152.37 = 12.34 faulty ATMs
📌 Interpretation
The standard deviation of 12.34 is roughly a third of the mean of 35.
CV = (12.34 / 35) × 100 = 35.3%
This indicates moderate variability — the number of faulty ATMs fluctuates but shows some
consistency.
Data type: DISCRETE — the number of faulty ATMs must be a whole number (you cannot have 2.5
faulty ATMs).
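The whole ATM example can be rebuilt from the raw data as a check on Steps 1 to 4. This is a sketch with illustrative names; it assumes classes of width 10 starting at 10 and takes lower limit + 4.5 as the midpoint (e.g. 14.5 for 10–19), as in the table above.

```python
from math import sqrt

data = [38, 29, 33, 41, 56, 35, 19, 27, 32, 52,
        24, 45, 34, 51, 28, 17, 46, 31, 44, 22]
n = len(data)

# Step 1: range
data_range = max(data) - min(data)                 # 39

# Step 2: frequency per class, keyed by lower limit (10, 20, ..., 50)
freq = {lower: 0 for lower in range(10, 60, 10)}
for x in data:
    freq[(x // 10) * 10] += 1

# Steps 3 and 4: grouped mean, variance and standard deviation from midpoints
sum_fx = sum(f * (lower + 4.5) for lower, f in freq.items())         # Σ(f · x)
sum_fx2 = sum(f * (lower + 4.5) ** 2 for lower, f in freq.items())   # Σ(f · x²)
grouped_mean = sum_fx / n
grouped_var = (sum_fx2 - n * grouped_mean ** 2) / (n - 1)
grouped_sd = sqrt(grouped_var)
cv = grouped_sd / grouped_mean * 100

print(data_range, freq)              # 39 {10: 2, 20: 5, 30: 6, 40: 4, 50: 3}
print(sum_fx, sum_fx2)               # 700.0 27395.0
print(grouped_mean, round(grouped_var, 2), round(grouped_sd, 2), round(cv, 1))
# 35.0 152.37 12.34 35.3
print(sum(data) / n)                 # 35.2 (raw-data mean, for comparison)
```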
6. Self-Test Questions
Use these questions to check your understanding of Unit 4:
1. Define the three main measures of central tendency. When is each most appropriate
to use?
2. Calculate the mean, median, and mode for the following ungrouped data: 15, 20, 20,
25, 30, 30, 30, 35, 40, 50
3. Using the student marks grouped frequency table, verify that the grouped mean is
52. Show all working.
4. What is skewness? Describe the relationship between mean, median, and mode in:
(a) a positively skewed distribution, (b) a negatively skewed distribution.
5. Why is the median often considered the best measure of central location when data
is skewed?
6. Explain the difference between the range and the inter-quartile range. What is the
main weakness of both?
7. For a data set with Q1 = 40 and Q3 = 68, calculate: (a) IQR, (b) Quartile Deviation.
Interpret the quartile deviation in context.
8. Why is the variance divided by (n−1) rather than n? What are 'degrees of freedom'?
9. A company tracks daily production output (units) over 8 days: 120, 135, 128, 142,
119, 155, 130, 127. Calculate: (a) mean, (b) variance, (c) standard deviation. Show
all steps including the deviation table.
10. Two factories produce the same component. Factory A has mean output = 500 units
with Sx = 30. Factory B has mean output = 200 units with Sx = 20. Which factory
shows greater relative variability? Use the CV to justify your answer.
11. Returning to the ATM example: if the standard deviation is 12.34 and the mean is 35,
calculate the CV and interpret it. Is the number of faulty ATMs consistent across the
20 days?
12. List all six measures of dispersion discussed in this unit and state one advantage and
one disadvantage of each.
Prescribed Reading
• Anderson, D., et al., (2020), Statistics For Business And Economics, 6th Edition,
Cengage Learning, South Africa.
Recommended Reading
• Keller, G., and Gaciu, N., (2020), Statistics For Management And Economics, 2nd
Edition, Cengage Learning, South Africa.
End of Unit 4 Study Notes