CHAPTER THREE
Measures of central tendency
3.1 Objectives of Measuring Central Tendency
A single value that describes the characteristics of an entire mass of data is called a measure of
central tendency, or an average.
The objectives of measuring central tendency are:
To get a single value that represents (describes) the characteristics of the entire data set
To summarize (reduce) the volume of the data
To facilitate comparison within one group or between groups of data
To enable further statistical analysis
Desirable properties of a measure of central tendency
A measure of central tendency is considered good if it possesses most of the following properties. It should:
- be simple to understand and easy to calculate and interpret,
- exist and be unique,
- be rigidly defined by a mathematical formula,
- be based on all observations,
- not be seriously affected by extreme observations,
- be capable of further statistical analysis and/or algebraic manipulation.
3.3 Types of Measures of Central Tendency
Several types of averages, or measures of central tendency, can be defined. The most common are:
Mean
Median
Mode
Quantiles
3.3.1. The Mean
There are four types of means: Arithmetic mean, weighted arithmetic mean, Harmonic mean and
Geometric mean.
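The arithmetic mean ($\bar{x}$) is defined as the sum of the measurements of the items divided by the total
number of items.
Let the variable X take the values $x_1, x_2, \ldots, x_n$. The arithmetic mean is then given as
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]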
Example 1: Find the mean score of 10 students in a midterm exam in a class if their scores are:
25, 27, 30, 23, 16, 27, 29, 14, 20, 28
Solution: We have n = 10 and $\sum x_i = 239$, so
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{239}{10} = 23.9 \]
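As a rough illustration, the calculation in Example 1 can be sketched in Python. The function name `arithmetic_mean` is an assumption made for this sketch and is not part of the course material.

```python
# Midterm scores of the 10 students from Example 1.
scores = [25, 27, 30, 23, 16, 27, 29, 14, 20, 28]


def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


mean_score = arithmetic_mean(scores)
print(mean_score)
```

The same function works for any list of numeric observations.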
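Arithmetic Mean for Ungrouped Frequency Distribution
When the data are arranged or given in the form of an ungrouped frequency distribution, the
formula for the mean is
\[ \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}, \qquad \sum_{i=1}^{k} f_i = n \]
where $f_i$ is the frequency of the value $x_i$ and k is the number of distinct values.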
Example 2: Monthly incomes of fourth year regular students are given in the following
frequency distribution.
Monthly income (birr) 54.5 64.5 74.5 84.5 94.5 104.5 114.5
Number of students 6 9 15 25 13 7 5
Compute the mean for these data. (Exercise)
Arithmetic Mean for Grouped Frequency Distribution
If data are given in the form of a continuous frequency distribution, the sample mean can be
computed as
\[ \bar{x} = \frac{\sum_{i=1}^{k} f_i m_i}{\sum_{i=1}^{k} f_i} = \frac{f_1 m_1 + f_2 m_2 + \cdots + f_k m_k}{f_1 + f_2 + \cdots + f_k} \]
where $m_i$ is the class mark of the i-th class, i = 1, 2, …, k,
$f_i$ is the frequency of the i-th class and k is the number of classes.
Note that $\sum_{i=1}^{k} f_i = n$, the total number of observations.
Example: The following table gives the daily wages of laborers. Calculate the average daily
wages paid to a laborer.
Wages in birr 11-13 13-15 15-17 17-19 19-21 21-23 23-25
Number of laborers 3 4 5 6 6 4 3
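Solution:
Wages in birr            11-13  13-15  15-17  17-19  19-21  21-23  23-25
Number of laborers (f)   3      4      5      6      6      4      3
Class mark (m)           12     14     16     18     20     22     24
f × m                    36     56     80     108    120    88     72
Here $\sum_{i=1}^{k} f_i m_i = 560$ and $\sum_{i=1}^{k} f_i = n = 31$, so
\[ \bar{x} = \frac{\sum_{i=1}^{k} f_i m_i}{\sum_{i=1}^{k} f_i} = \frac{560}{31} \approx 18.06 \text{ birr} \]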
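The grouped-data calculation for the daily wages can be sketched as follows. This is a minimal illustration: the helper name `grouped_mean` and the representation of classes as (lower, upper) pairs are assumptions of the sketch.

```python
# Class limits and frequencies from the daily-wage example.
wage_classes = [(11, 13), (13, 15), (15, 17), (17, 19), (19, 21), (21, 23), (23, 25)]
laborers = [3, 4, 5, 6, 6, 4, 3]


def grouped_mean(classes, frequencies):
    """Mean of grouped data: sum(f_i * m_i) / sum(f_i), with m_i the class mark."""
    class_marks = [(lower + upper) / 2 for lower, upper in classes]
    weighted_total = sum(f * m for f, m in zip(frequencies, class_marks))
    n = sum(frequencies)
    return weighted_total / n


average_wage = grouped_mean(wage_classes, laborers)
print(round(average_wage, 2))
```

Because each observation is replaced by its class mark, the result is an approximation of the mean of the raw data.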
Properties of the Arithmetic Mean
The sum of the deviations of the items from their arithmetic mean is zero. This means that the
algebraic sum of the deviations of a set of numbers $x_1, x_2, \ldots, x_n$ from their mean $\bar{x}$ is zero.
That is,
\[ \sum_{i=1}^{n} (x_i - \bar{x}) = 0 \]
The sum of the squares of the deviations of a set of observations from any number, say A, is
minimum when $A = \bar{x}$. That is,
\[ \sum (x_i - \bar{x})^2 \le \sum (x_i - A)^2 \]
When a set of observations is divided into k groups and $\bar{x}_1$ is the mean of the $n_1$ observations of
group 1, $\bar{x}_2$ is the mean of the $n_2$ observations of group 2, …, $\bar{x}_k$ is the mean of the $n_k$ observations
of group k, then the combined mean, denoted by $\bar{x}_c$, of all observations taken together is
given by
\[ \bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i} \]
If the mean of $x_1, x_2, \ldots, x_n$ is $\bar{x}$, then
a) the mean of $x_1 \pm k, x_2 \pm k, \ldots, x_n \pm k$ will be $\bar{x} \pm k$.
b) the mean of $kx_1, kx_2, \ldots, kx_n$ will be $k\bar{x}$.
Example 1: Last year there were three sections taking the Stat 273 course in University. At the
end of the semester, the three sections got average marks of 80, 83 and 76. There were 28, 32
and 35 students in the sections respectively. Find the mean mark for all the students.
Solution:
\[ \bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + n_3 \bar{x}_3}{n_1 + n_2 + n_3} = \frac{28(80) + 32(83) + 35(76)}{28 + 32 + 35} = \frac{7556}{95} \approx 79.54 \]
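The combined-mean formula can be checked with a short sketch. The function name `combined_mean` is hypothetical and chosen only for this illustration.

```python
def combined_mean(sizes, means):
    """Combined mean of several groups: sum(n_i * mean_i) / sum(n_i)."""
    if len(sizes) != len(means):
        raise ValueError("sizes and means must have the same length")
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


# Section sizes and section means from the Stat 273 example.
section_sizes = [28, 32, 35]
section_means = [80, 83, 76]

overall_mean = combined_mean(section_sizes, section_means)
print(round(overall_mean, 2))  # 79.54
```

Note that simply averaging 80, 83 and 76 would ignore the different section sizes, which is why the means are weighted by the number of students.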
Merits of Arithmetic Mean
- Arithmetic mean has a rigidly defined mathematical formula so that its value is always
definite.
- It is calculated based on all observations.
- Arithmetic mean is simple to calculate and easy to understand.
- It doesn't need arrangement of data in increasing or decreasing order.
- Arithmetic mean is also capable of further algebraic treatment.
- It affords a good standard of comparison.
Drawbacks of Arithmetic Mean
- It is highly affected by extreme (abnormal) values in the series.
- It can be a number which does not exist in the series.
- It sometimes gives results that appear almost absurd. For example, we may well
get an average of '3.6 children' per family.
- It can't be calculated for open-ended classes.
3.3.2 The Median
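The median of a set of items (numbers) arranged in order of magnitude (i.e. in an array form) is the
middle value or the arithmetic mean of the two middle values. We shall denote the median of
$x_1, x_2, \ldots, x_n$ by $\tilde{x}$. For ungrouped data the median is obtained by
\[ \tilde{x} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & \text{if the number of items, } n, \text{ is odd} \\ \frac{1}{2}\left( x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n+2}{2}\right)} \right) & \text{if the number of items, } n, \text{ is even} \end{cases} \]
where $x_{(r)}$ denotes the r-th item of the ordered data.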
For grouped data the median, obtained by the interpolation method, is given by
\[ \tilde{x} = L_{med} + \left( \frac{\frac{n}{2} - F}{f_{med}} \right) W \]
where $L_{med}$ = lower class boundary of the median class,
F = sum of the frequencies of all classes lower than the median class (in other words, the
cumulative frequency immediately preceding the median class),
$f_{med}$ = frequency of the median class, and W = class width.
The median class is the class with the smallest cumulative frequency greater than or equal to $\frac{n}{2}$.
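Example 1: The birth weights in pounds of five babies born in a hospital on a certain day are 9.2,
6.4, 10.5, 8.1 and 7.8. Find the median weight of these five babies.
Solution: Arranged in ascending order, the weights are 6.4, 7.8, 8.1, 9.2, 10.5. Since n = 5 is odd,
the median is the $\frac{5+1}{2}$ = 3rd item, which is 8.1 pounds.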
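The rule for ungrouped data (sort first, then take the middle item or the average of the two middle items) can be sketched in Python. The function below is an illustrative implementation written for these notes; Python's standard `statistics.median` performs the same calculation.

```python
def median(values):
    """Median of ungrouped data: sort, then take the middle item(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th item, which is index n // 2 when counting from 0.
        return ordered[middle]
    # Even n: the mean of the (n / 2)-th and (n / 2 + 1)-th items.
    return (ordered[middle - 1] + ordered[middle]) / 2


birth_weights = [9.2, 6.4, 10.5, 8.1, 7.8]
print(median(birth_weights))  # 8.1
```

Sorting is essential: taking the middle item of the data in the order given (10.5 here) does not give the median.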
Example 2: The following table gives the distribution of the weekly wages of employees of a small
firm.
Wages in birr No. of employees
126 and below 3
127 – 135 5
136 – 144 9
145 – 153 12
154 – 162 5
163 – 171 4
172 and above 2
a. Find the median weekly wage.
b. Why is the median a more suitable measure of central tendency than the mean in
this case?
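The interpolation formula for the grouped median can be sketched as below, using the weekly wage table as input. Several things here are assumptions of the sketch rather than part of the notes: the function name `grouped_median`, the class boundaries taken half a unit outside the stated limits (for example 144.5 to 153.5, so W = 9), and the use of `None` for the unknown lower boundary of the open-ended first class.

```python
def grouped_median(lower_boundaries, frequencies, width):
    """Median by interpolation: L_med + ((n/2 - F) / f_med) * W."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("the distribution has no observations")
    cumulative = 0  # F: cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, frequencies):
        if f > 0 and cumulative + f >= n / 2:
            if lower is None:
                raise ValueError("the median class is open-ended")
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("median class not found")


# Assumed lower class boundaries; the first class ("126 and below") is open-ended.
wage_lower_boundaries = [None, 126.5, 135.5, 144.5, 153.5, 162.5, 171.5]
employees = [3, 5, 9, 12, 5, 4, 2]

median_wage = grouped_median(wage_lower_boundaries, employees, width=9)
print(median_wage)
```

The open-ended classes do not prevent the calculation, because only the boundary, frequency and width of the median class are needed.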
Merits of median
- It is not influenced by extreme values.
- It is rigidly defined by a mathematical formula, so its value is always definite.
- Median can be calculated even in case of open-ended intervals.
- It can be computed for ratio, interval, and ordinal levels of data.
Demerits of median
- It is not capable of further algebraic treatment.
- It is not a good representative of the data if the number of items (data) is small.
- Arranging the items in order of magnitude can be a very tedious process if the number
of items is very large.
3.3.3 The Mode
The mode or the modal value is the most frequently occurring score/observation in a series and
denoted by x̂ . Note that the mode may not exist in the series or, even if it does exist, it may not be
unique.
For grouped data, the mode is found by the following formula:
\[ \hat{x} = L_{mod} + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) W \]
where $L_{mod}$ = lower class boundary of the modal class,
$\Delta_1$ = the difference between the frequency of the modal class and the frequency of the class
immediately preceding the modal class,
$\Delta_2$ = the difference between the frequency of the modal class and the frequency of the class
immediately following the modal class, and
W = the class width.
The modal class is the class with the highest frequency in the distribution.
Example 1: The marks obtained by ten students in a semester exam in statistics are: 70, 65, 68, 70,
75, 73, 80, 70, 83 and 86. Find the mode of the students' marks.
Solution: The mark 70 occurs three times, more often than any other mark, so the mode is 70.
Example 2: Find the mode for the frequency distribution of the birth weight (in kilogram) of 30
children given below.
Weight 1.9-2.3 2.3-2.7 2.7-3.1 3.1-3.5 3.5-3.9 3.9-4.3
No. of children 5 5 9 4 4 3
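Solution: 2.7-3.1 is the modal class since it has the highest frequency (9).
With $\Delta_1$ the modal frequency minus the preceding frequency and $\Delta_2$ the modal frequency minus
the following frequency, $\Delta_1 = 9 - 5 = 4$, $\Delta_2 = 9 - 4 = 5$, $L_{mod} = 2.7$ and $W = 0.4$, so
\[ \hat{x} = 2.7 + \left( \frac{4}{4 + 5} \right)(0.4) \approx 2.878 \]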
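The grouped mode formula can be sketched as follows. The function name `grouped_mode` is hypothetical, and two choices are assumptions of this sketch: when several classes tie for the highest frequency the first one is used, and a missing neighbouring class is treated as having frequency 0.

```python
def grouped_mode(lower_boundaries, frequencies, width):
    """Mode of grouped data: L_mod + (d1 / (d1 + d2)) * W."""
    # Index of the modal class (first class with the highest frequency).
    i = max(range(len(frequencies)), key=lambda k: frequencies[k])
    previous_f = frequencies[i - 1] if i > 0 else 0
    next_f = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    d1 = frequencies[i] - previous_f  # modal frequency minus preceding frequency
    d2 = frequencies[i] - next_f      # modal frequency minus following frequency
    if d1 + d2 == 0:
        raise ValueError("the modal class does not stand out from its neighbours")
    return lower_boundaries[i] + d1 / (d1 + d2) * width


# Birth weights (kg) of 30 children from Example 2.
weight_lower_boundaries = [1.9, 2.3, 2.7, 3.1, 3.5, 3.9]
children = [5, 5, 9, 4, 4, 3]

modal_weight = grouped_mode(weight_lower_boundaries, children, width=0.4)
print(round(modal_weight, 3))
```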
Merits of mode
- Mode is not affected by extreme values.
- Mode can be calculated even in the case of open-end intervals. And it is not necessary to know all
observations.
- It can be computed for all level of data i.e. ratio, interval, ordinal or nominal.
Demerits of mode
- Mode may not exist in the series and if it exists it may not be a unique value.
- It does not fulfill most of the requirements of a good measure of central tendency
3.3.4 Quantiles
Quantiles are values which divide a data set arranged in order of magnitude into a certain number of equal
parts. They are averages of position (non-central tendency). Some of these are quartiles, deciles and
percentiles.
I. Quartiles: are values which divide the data set into four equal parts, denoted by $Q_1$, $Q_2$ and $Q_3$. The
first quartile is also called the lower quartile and the third quartile is the upper quartile. The second
quartile is the median.
For ungrouped data:
Let $Q_j$ be the j-th quartile value for j = 1, 2, 3. Then
\[ Q_j = \left( \frac{j(n+1)}{4} \right)^{th} \text{ item}; \quad j = 1, 2, 3. \]
Example: Find the quartiles $Q_1$, $Q_2$ and $Q_3$ of the following data: 20, 30, 25, 23, 22, 32, 36, 18
Solution:
Arrange the data in ascending order: 18, 20, 22, 23, 25, 30, 32, 36, with n = 8.
$Q_1 = \left( \frac{1(8+1)}{4} \right)^{th}$ item = 2.25th item = 2nd item + 0.25(3rd item - 2nd item)
= 20 + 0.25(22 - 20)
= 20 + 0.5 = 20.5
$Q_2 = \left( \frac{2(8+1)}{4} \right)^{th}$ item = 4.5th item = 4th item + 0.5(5th item - 4th item)
= 23 + 0.5(25 - 23) = 24
$Q_3$ = ?
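The positional rule used above (find the fractional item number, then interpolate between the two neighbouring items) can be sketched in Python. The function name `positional_quantile` and its `parts` argument are assumptions of this sketch: `parts=4` gives quartiles, and the same rule with `parts=10` or `parts=100` gives deciles or percentiles. Clamping positions that fall outside the data to the first or last item is also an assumption.

```python
def positional_quantile(values, j, parts):
    """Value of the (j * (n + 1) / parts)-th ordered item, with linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("quantiles of an empty data set are undefined")
    position = j * (n + 1) / parts       # 1-based and possibly fractional
    position = min(max(position, 1), n)  # assumption: clamp to the range 1..n
    whole = int(position)                # item at or just below the position
    fraction = position - whole
    if whole == n:
        return ordered[-1]
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])


data = [20, 30, 25, 23, 22, 32, 36, 18]
q1 = positional_quantile(data, 1, 4)
q2 = positional_quantile(data, 2, 4)
print(q1, q2)  # 20.5 24.0
```

Statistical software often uses slightly different interpolation conventions, so results from other tools may not match this rule exactly.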
For grouped data
We can apply the following formula:
\[ Q_j = L_{Q_j} + \left( \frac{\frac{jn}{4} - F_{Q_j}}{f_{Q_j}} \right) W; \quad j = 1, 2, 3. \]
where $Q_j$ = the j-th quartile we are going to calculate,
$L_{Q_j}$ = lower class boundary of the j-th quartile class,
$F_{Q_j}$ = sum of the frequencies of all classes lower than the j-th quartile class,
$f_{Q_j}$ = frequency of the j-th quartile class, and W = class width.
The j-th quartile class is the class with the smallest cumulative frequency greater than or equal to
$\frac{jn}{4}$.
Example: Find $Q_1$, $Q_2$ and $Q_3$ of the following data.
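Class      Class boundaries   Frequency   Cumulative frequency (less than type)
50-69      49.5-69.5          3           3
70-89      69.5-89.5          7           10
90-109     89.5-109.5         4           14
110-129    109.5-129.5        4           18
130-149    129.5-149.5        9           27
Solution: N = 27, so for $Q_1$ we have $\frac{1(27)}{4} = 6.75$, which falls in the second class (70-89).
This is the first quartile class, with
$L_{Q_1} = 69.5$, $f_{Q_1} = 7$, $F_{Q_1} = 3$ and W = 20.
\[ Q_1 = L_{Q_1} + \left( \frac{\frac{N}{4} - F_{Q_1}}{f_{Q_1}} \right) W = 69.5 + \left( \frac{6.75 - 3}{7} \right)(20) \approx 80.2 \]
Find $Q_2$ and $Q_3$ in the same way.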
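The grouped-data formula can be sketched as a single function. The name `grouped_quantile` and the `parts` argument are assumptions made for this illustration; with `parts=4` the function follows the quartile formula above.

```python
def grouped_quantile(lower_boundaries, frequencies, width, j, parts):
    """Quantile of grouped data: L + ((j * n / parts - F) / f) * W."""
    n = sum(frequencies)
    target = j * n / parts
    cumulative = 0  # F: cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, frequencies):
        # The quantile class is the first class whose cumulative frequency reaches the target.
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("quantile class not found")


# Lower class boundaries and frequencies from the example table.
boundaries = [49.5, 69.5, 89.5, 109.5, 129.5]
counts = [3, 7, 4, 4, 9]

first_quartile = grouped_quantile(boundaries, counts, width=20, j=1, parts=4)
print(round(first_quartile, 1))  # 80.2
```

Calling the function with `j=2` or `j=3` gives the remaining quartiles for the same table.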
II. Deciles: are values dividing the data into ten equal parts, denoted by $D_1, D_2, \ldots, D_9$. The fifth decile
is the median.
For ungrouped data
Let $D_j$ be the j-th decile value for j = 1, 2, ..., 9. Then
\[ D_j = \left( \frac{j(n+1)}{10} \right)^{th} \text{ item}; \quad j = 1, 2, \ldots, 9 \]
Example: Find the deciles $D_2$, $D_6$ and $D_8$ of the following data: 20, 30, 25, 23, 22, 32, 36, 18
Solution:
Arrange the data in ascending order: 18, 20, 22, 23, 25, 30, 32, 36, with n = 8.
$D_2 = \left( \frac{2(8+1)}{10} \right)^{th}$ item = 1.8th item = 1st item + 0.8(2nd item - 1st item) = 18 + 0.8(20 - 18) = 19.6
$D_8 = \left( \frac{8(8+1)}{10} \right)^{th}$ item = 7.2th item = 7th item + 0.2(8th item - 7th item) = 32 + 0.2(4) = 32.8
$D_6$ = ?
For grouped data
We can apply the following formula:
\[ D_j = L_{D_j} + \left( \frac{\frac{jn}{10} - F_{D_j}}{f_{D_j}} \right) W; \quad j = 1, 2, \ldots, 9 \]
The symbols are defined in the same way as in the case of quartiles.
The j-th decile class is the class with the smallest cumulative frequency greater than or equal to $\frac{jn}{10}$.
Example:
Find D5 and D7
Values Frequency Cf
140- 150 17 17
150- 160 29 46
160- 170 42 88
170- 180 72 160
180- 190 84 244
190- 200 107 351
200- 210 49 400
210- 220 34 434
220- 230 31 465
230- 240 16 481
240- 250 12 493
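Solution:
• First find the less than cumulative frequency (the Cf column).
• Use the formula to calculate the required deciles.
$D_5$: determine the class containing the 5th decile.
$\frac{5N}{10} = \frac{5(493)}{10} = 246.5$, where N = 493.
190-200 is the class containing the fifth decile, since 351 is the smallest cumulative frequency
greater than or equal to 246.5.
$L_{D_5} = 189.5$, W = 10, N = 493, $F_{D_5} = 244$, $f_{D_5} = 107$
\[ D_5 = 189.5 + \left( \frac{246.5 - 244}{107} \right)(10) \approx 189.7 \]
$D_7$: determine the class containing the 7th decile.
$\frac{7N}{10} = \frac{7(493)}{10} = 345.1$
190-200 is also the class containing the seventh decile, since 244 < 345.1 ≤ 351.
$L_{D_7} = 189.5$, W = 10, N = 493, $F_{D_7} = 244$, $f_{D_7} = 107$
\[ D_7 = 189.5 + \left( \frac{345.1 - 244}{107} \right)(10) \approx 198.95 \approx 199 \]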
III. Percentiles: are values which divide the data into one hundred equal parts, denoted by $P_1, P_2, \ldots, P_{99}$.
The fiftieth percentile is the median.
For ungrouped data
Let $P_j$ be the j-th percentile value for j = 1, 2, 3, ..., 99. Then
\[ P_j = \left( \frac{j(n+1)}{100} \right)^{th} \text{ item}; \quad j = 1, 2, 3, \ldots, 99 \]
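Example: Find the percentiles $P_8$, $P_{50}$ and $P_{85}$ of the following data: 20, 30, 25, 23, 22, 32, 36, 18
Solution:
Arrange the data in ascending order: 18, 20, 22, 23, 25, 30, 32, 36, with n = 8.
$P_8 = \left( \frac{8(8+1)}{100} \right)^{th}$ item = 0.72th item. Since this position falls before the first item,
$P_8$ is taken as the 1st item = 18.
$P_{50}$ = 4.5th item = 24 and $P_{85}$ = 7.65th item = 32 + 0.65(36 - 32) = 34.6 ≈ 35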
For grouped data
We can use the following formula:
\[ P_j = L_{P_j} + \left( \frac{\frac{jn}{100} - F_{P_j}}{f_{P_j}} \right) W; \quad j = 1, 2, 3, \ldots, 99 \]
The symbols are defined in the same way as in the case of quartiles.
The j-th percentile class is the class with the smallest cumulative frequency greater than or equal to
$\frac{jn}{100}$.
Example:
Find P90 and P99
Values Frequency Cf
140- 150 17 17
150- 160 29 46
160- 170 42 88
170- 180 72 160
180- 190 84 244
190- 200 107 351
200- 210 49 400
210- 220 34 434
220- 230 31 465
230- 240 16 481
240- 250 12 493
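$P_{90}$: determine the class containing the 90th percentile.
$\frac{90N}{100} = \frac{90(493)}{100} = 443.7$
220-230 is the class containing the 90th percentile, since 434 < 443.7 ≤ 465.
$L_{P_{90}} = 219.5$, W = 10, N = 493, $F_{P_{90}} = 434$, $f_{P_{90}} = 31$
\[ P_{90} = 219.5 + \left( \frac{443.7 - 434}{31} \right)(10) \approx 222.629 \approx 223 \]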
P99= ?
Interpretations
1. $Q_j$ is the value below which (j × 25) percent of the observations in the series are found (where
j = 1, 2, 3). For instance, $Q_3$ is the value below which 75 percent of the observations in the given
series are found.
2. $D_j$ is the value below which (j × 10) percent of the observations in the series are found (where
j = 1, 2, ..., 9). For instance, $D_4$ is the value below which 40 percent of the values are found in the
series.
3. $P_j$ is the value below which j percent of the total observations are found (where j = 1, 2, 3, ..., 99).
For example, 73 percent of the observations in a given series are below $P_{73}$.
Exercise: The following table presents the male population of a certain region in Ethiopia.
Find a) all quartiles,
b) the 9th and 5th deciles, and
c) the 65th and 75th percentiles.
Age groups (in years) 0 – 5 5 – 10 10 – 15 15 – 20 20 – 25 25 – 30 30 – 35 35 - 40
Male population 2580 3737 4620 5200 7250 620 297 355
Measures of Dispersion (Variation) and Shape
Objectives of Measuring Variation
Variation (dispersion) is the scatter or spread of observations (values) in a distribution.
The average or central value is of little use unless the degree of variation, which occurs about it,
is given. If the scatter about the measure of central tendency is very large, the average is not a
typical value. Therefore it is necessary to develop a quantitative measure of the dispersion (or
variation) of the values about the average. Measures of variation are statistical measures, which
provide ways of measuring the extent to which the data are dispersed or spread out.
Measures of variation are needed for the following basic objectives.
To judge the reliability of a measure of central tendency
To compare two or more sets of data with regard to their variability
To control variability itself like in quality control, body temperature, etc
To make further statistical analysis or to facilitate the use of other statistical measures.
Properties of a good measure of dispersion
A good measure of dispersion should:
- be rigidly defined by a mathematical formula,
- be simple to understand and easy to calculate,
- be unique,
- be calculated based on all observations in the series,
- not be affected by some extreme values existing in the series,
- have sampling stability property, and
- be capable of further algebraic treatment as well as further statistical analysis.
4.2 Absolute and Relative Measures of Dispersion
Measures of dispersion/variation may be either absolute or relative. Absolute measures of
dispersion are expressed in the same unit of measurement in which the original data are given.
These values may be used to compare the variation in two distributions provided that the
variables are in the same units and of the same average size.
If the two sets of data are expressed in different units, however, such as quintals of sugar
versus tonnes of sugarcane, or if the average sizes are very different, such as a manager's salary
versus a worker's salary, the absolute measures of dispersion are not comparable. In such cases
measures of relative dispersion should be used.
A measure of relative dispersion is the ratio of a measure of absolute dispersion to an appropriate
measure of central tendency. It is sometimes called a coefficient of dispersion because the word
"coefficient" denotes a pure number (that is, one independent of any unit of measurement). It
should be noted that, when computing the relative dispersion, the average (the measure of central
tendency) used as a base should be the same one from which the absolute deviations were
measured. Note also that the value of a relative dispersion is a unitless quantity.
Types of Measures of Dispersion
The Range and Relative Range
Range (R) is defined as the difference between the largest and the smallest observation in a given
set of data. That is, $R = x_{max} - x_{min}$, where $x_{max}$ and $x_{min}$ are the largest and the smallest
observations in the series respectively.
In the case of grouped data, the range is found by taking the difference between the class mark of the last
class and that of the first class. That is, $R = M_{last} - M_{first}$, where $M_{last}$ and $M_{first}$ are the class
marks of the last class and of the first class respectively.
A relative range (RR), also known as the coefficient of range, is given by
\[ RR = \frac{x_{max} - x_{min}}{x_{max} + x_{min}} = \frac{R}{x_{max} + x_{min}} \quad \text{for ungrouped data} \]
\[ RR = \frac{M_{last} - M_{first}}{M_{last} + M_{first}} = \frac{R}{M_{last} + M_{first}} \quad \text{for grouped data} \]
Properties of Range and Relative Range
- Range and relative range are easy to calculate and simple to understand.
- Both cannot be computed for grouped data with open ended classes.
- They do not tell us anything about the distribution of values in the series.
Example 1: Find the range and relative range for the monthly salary of ten workers in a certain
paint factory given below.
462 480 534 624 498 552 606 588 516 570
Solution:
$x_{max} = 624$ birr and $x_{min} = 462$ birr
$R = x_{max} - x_{min} = 624 \text{ birr} - 462 \text{ birr} = 162 \text{ birr}$
\[ RR = \frac{x_{max} - x_{min}}{x_{max} + x_{min}} = \frac{624 - 462}{624 + 462} = \frac{162}{1086} \approx 0.149 \]
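For ungrouped data, both measures take only a few lines of Python. The function names `value_range` and `relative_range` are hypothetical names chosen for this sketch.

```python
# Monthly salaries (birr) of the ten workers from Example 1.
salaries = [462, 480, 534, 624, 498, 552, 606, 588, 516, 570]


def value_range(values):
    """Range: largest observation minus smallest observation."""
    return max(values) - min(values)


def relative_range(values):
    """Relative range (coefficient of range): (max - min) / (max + min)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


print(value_range(salaries))                 # 162
print(round(relative_range(salaries), 3))    # 0.149
```

Only the two extreme observations enter the calculation, which is why the range says nothing about how the remaining values are distributed.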
Example 2: Find the range and relative range for the following frequency
distribution, which shows the maximum loads supported by a certain number
of cables.
Maximum load (in kilo-Newton)   Number of cables
93 – 97 2
98 – 102 5
103 – 107 12
108 – 112 17
113 – 117 14
118 – 122 6
123 – 127 3
128 – 132 1
Solution:
$M_{first} = 95$ kN and $M_{last} = 130$ kN
$R = M_{last} - M_{first} = 130 \text{ kN} - 95 \text{ kN} = 35 \text{ kN}$
\[ RR = \frac{M_{last} - M_{first}}{M_{last} + M_{first}} = \frac{130 - 95}{130 + 95} = \frac{35}{225} \approx 0.156 \]
3.2 The Mean Deviation and Coefficient of Mean Deviation
The mean deviation (MD) measures the average deviation of a set of observations about their
central value, generally the mean or the median, ignoring the plus/minus sign of the deviations.
The mean deviation of a sample of n observations $x_1, x_2, \ldots, x_n$ is given as
\[ MD = \frac{\sum |x_i - A|}{n} \]
where A is a central measure (the mean or the median).
In the case of grouped data, the formula for MD becomes
\[ MD = \frac{\sum f_i |m_i - A|}{n} \]
where $m_i$ is the class mark of the i-th class, $f_i$ is the frequency of the
i-th class and $n = \sum f_i$.
The mean deviation about the arithmetic mean is, therefore, given by
\[ MD = \frac{\sum |x_i - \bar{x}|}{n} \quad \text{for ungrouped data} \]
\[ MD = \frac{\sum f_i |m_i - \bar{x}|}{n} \quad \text{for a grouped frequency distribution} \]
The mean deviation about the median is given by
\[ MD = \frac{\sum |x_i - \tilde{x}|}{n} \quad \text{for ungrouped data} \]
\[ MD = \frac{\sum f_i |m_i - \tilde{x}|}{n} \quad \text{for a grouped frequency distribution} \]
In the grouped formulas, $m_i$ is the class mark of the i-th class, $f_i$ is the frequency of the
i-th class and $n = \sum f_i$.
The coefficient of mean deviation (CMD) is the ratio of the mean deviation of the observations to
their appropriate measure of central tendency: the arithmetic mean or the median. In general,
\[ CMD = \frac{MD}{A} \]
where A is a measure of central tendency: the arithmetic mean or the median. That
is, CMD about the arithmetic mean is given by $CMD = \frac{MD}{\bar{x}}$, where MD is the mean deviation
calculated about the arithmetic mean. On the other hand, CMD about the median is given by
$CMD = \frac{MD}{\tilde{x}}$, in which case MD is calculated about the median of the observations.
Properties of Mean Deviation and coefficient of mean deviation
- It is easy to understand and compute
- It is based on all observations
- It is not affected very much by extreme values.
- It is not capable of further mathematical treatment, and it is not a very accurate measure
of dispersion.
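The ungrouped formulas for MD and CMD can be sketched as below. For illustration only, the sketch reuses the ten monthly salaries from the range example; the function name `mean_deviation` is an assumption, while `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median


def mean_deviation(values, center):
    """Average absolute deviation of the values about a chosen central value."""
    return sum(abs(x - center) for x in values) / len(values)


# Monthly salaries (birr) reused from the range example, for illustration only.
salaries = [462, 480, 534, 624, 498, 552, 606, 588, 516, 570]

md_about_mean = mean_deviation(salaries, mean(salaries))
md_about_median = mean_deviation(salaries, median(salaries))
cmd_about_mean = md_about_mean / mean(salaries)  # coefficient of mean deviation

print(md_about_mean, md_about_median)  # 45.0 45.0
print(round(cmd_about_mean, 4))
```

For these data the mean and the median both equal 543 birr, so the two mean deviations coincide; in general they differ.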
.3 The Variance, the Standard Deviation and Coefficient of Variation
The Variance
Variance is the arithmetic mean of the squares of the deviations of the observations from their
arithmetic mean.
Population Variance ($\sigma^2$)
For ungrouped data
\[ \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{1}{N} \left[ \sum x_i^2 - \frac{\left( \sum x_i \right)^2}{N} \right] \]
where $\mu$ is the population arithmetic mean
and N is the total number of observations in the population.
For grouped data
\[ \sigma^2 = \frac{\sum f_i (m_i - \mu)^2}{N} = \frac{1}{N} \left[ \sum f_i m_i^2 - \frac{\left( \sum f_i m_i \right)^2}{N} \right] \]
where $\mu$ is the population arithmetic
mean, $m_i$ is the class mark of the i-th class, $f_i$ is the frequency of the i-th class and $N = \sum f_i$.
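Sample Variance ($S^2$)
For ungrouped data
\[ S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1} \left[ \sum x_i^2 - \frac{\left( \sum x_i \right)^2}{n} \right] \]
where $\bar{x}$ is the sample arithmetic mean
and n is the total number of observations in the sample.
For grouped data
\[ S^2 = \frac{\sum f_i (m_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1} \left[ \sum f_i m_i^2 - \frac{\left( \sum f_i m_i \right)^2}{n} \right] \]
where $\bar{x}$ is the sample arithmetic
mean, $m_i$ is the class mark of the i-th class, $f_i$ is the frequency of the i-th class and $n = \sum f_i$.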
The Standard Deviation
Standard deviation is the positive square root of the variance.
Population Standard Deviation ($\sigma$)
$\sigma = \sqrt{\sigma^2}$, where $\sigma^2$ is the population variance.
Sample Standard Deviation (S)
$S = \sqrt{S^2}$, where $S^2$ is the sample variance.
Example 1: Compute the variance for the following data.
Value ($x_i$)                3         6        9        12        15        Total
Frequency ($f_i$)            1         4        10       3         2         20
$f_i x_i$                    3         24       90       36        30        183
$x_i - \bar{x}$              -6.15     -3.15    -0.15    2.85      5.85
$(x_i - \bar{x})^2$          37.8225   9.9225   0.0225   8.1225    34.2225
$f_i (x_i - \bar{x})^2$      37.8225   39.69    0.225    24.3675   68.445    170.55
\[ \bar{x} = \frac{\sum f_i x_i}{n} = \frac{183}{20} = 9.15, \quad \text{where } n = \sum_{i=1}^{5} f_i = 20 \]
and
\[ S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1} = \frac{170.55}{19} \approx 8.976 \]
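The table-based calculation above can be sketched in Python. The function name `frequency_variance` and its `sample` flag are assumptions of this sketch: `sample=True` divides by n - 1 (sample variance) and `sample=False` divides by n (population variance).

```python
import math


def frequency_variance(values, frequencies, sample=True):
    """Variance of values x_i that occur with frequencies f_i."""
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(values, frequencies)) / n
    squared_deviations = sum(f * (x - mean) ** 2 for x, f in zip(values, frequencies))
    divisor = n - 1 if sample else n
    return squared_deviations / divisor


# Values and frequencies from Example 1.
values = [3, 6, 9, 12, 15]
frequencies = [1, 4, 10, 3, 2]

sample_variance = frequency_variance(values, frequencies)
sample_std_dev = math.sqrt(sample_variance)  # standard deviation is the positive square root
print(round(sample_variance, 3))  # 8.976
```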
Coefficient of Variation
The standard deviation is an absolute measure of dispersion. The corresponding relative measure
is known as the coefficient of variation (CV).
The coefficient of variation is used in problems where we want to compare the variability of
two or more different series. It is the ratio of the standard deviation to the
arithmetic mean, usually expressed in percent:
\[ CV = \frac{S}{\bar{x}} \times 100 \]
where S is the standard deviation of the observations and $\bar{x}$ is their arithmetic mean.
A distribution having a smaller coefficient of variation is said to be less variable, or more consistent,
more uniform or more homogeneous.
Example: Last semester, the students of Biology and Chemistry Departments took Stat 273
course. At the end of the semester, the following information was recorded.
Department Biology Chemistry
Mean score 79 64
Standard deviation 23 11
Compare the relative dispersions of the two departments' scores using an appropriate measure.
Solution:
Biology Department: $CV = \frac{S}{\bar{x}} \times 100 = \frac{23}{79} \times 100 \approx 29.11\%$
Chemistry Department: $CV = \frac{S}{\bar{x}} \times 100 = \frac{11}{64} \times 100 \approx 17.19\%$
Interpretation: Since the CV of Biology Department students is greater than that of Chemistry
Department students, we can say that there is more dispersion relative to the mean in the
distribution of Biology students' scores compared with that of Chemistry students.
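The comparison can be sketched as follows. The function name `coefficient_of_variation` is a hypothetical name used only for this illustration.

```python
def coefficient_of_variation(std_dev, mean):
    """Coefficient of variation in percent: (S / mean) * 100."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return std_dev / mean * 100


# Mean scores and standard deviations from the example.
cv_biology = coefficient_of_variation(23, 79)
cv_chemistry = coefficient_of_variation(11, 64)

# The department with the larger CV has more dispersion relative to its mean.
more_variable = "Biology" if cv_biology > cv_chemistry else "Chemistry"
print(round(cv_biology, 2), round(cv_chemistry, 2), more_variable)
```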
Example: The following table illustrates the frequency distribution of masses of 100 male
students in Gander University.
Mass (kg) 60-62 63-65 66-68 69-71 72-74
No. of students 5 18 42 27 8
Find: a) the variance b) the standard deviation c) the coefficient of variation
d) the mean deviation about the mean
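Solution:
Mass (kg)                    60-62    63-65    66-68    69-71    72-74    Total
No. of students ($f_i$)      5        18       42       27       8        100
Class mark ($m_i$)           61       64       67       70       73
$f_i m_i$                    305      1152     2814     1890     584      6745
$f_i m_i^2$                  18605    73728    188538   132300   42632    455803
$|m_i - \bar{x}|$            6.45     3.45     0.45     2.55     5.55
$f_i |m_i - \bar{x}|$        32.25    62.1     18.9     68.85    44.4     226.5
From the table, $\sum_{i=1}^{5} f_i m_i = 6745$, $\sum_{i=1}^{5} f_i m_i^2 = 455803$ and $n = \sum_{i=1}^{5} f_i = 100$,
and
\[ \bar{x} = \frac{\sum_{i=1}^{5} f_i m_i}{n} = \frac{6745}{100} = 67.45 \]
a) \[ S^2 = \frac{1}{n - 1} \left[ \sum_{i=1}^{5} f_i m_i^2 - \frac{\left( \sum_{i=1}^{5} f_i m_i \right)^2}{n} \right] = \frac{1}{99} \left[ 455803 - \frac{(6745)^2}{100} \right] \approx 8.61 \]
b) $S = \sqrt{S^2} = \sqrt{8.61} \approx 2.93$
c) $CV = \frac{S}{\bar{x}} \times 100 = \frac{2.93}{67.45} \times 100 \approx 4.344\%$
d) $MD = \frac{\sum f_i |m_i - \bar{x}|}{n} = \frac{226.5}{100} = 2.265$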
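All four parts of this example can be reproduced with one short sketch. The function name `grouped_summary` and the dictionary it returns are assumptions made for this illustration; the variance uses the computational form with the sum of $f_i m_i^2$.

```python
import math


def grouped_summary(classes, frequencies):
    """Mean, sample variance, standard deviation, CV (%) and mean deviation of grouped data."""
    marks = [(lower + upper) / 2 for lower, upper in classes]  # class marks m_i
    n = sum(frequencies)
    sum_fm = sum(f * m for f, m in zip(frequencies, marks))
    sum_fm2 = sum(f * m ** 2 for f, m in zip(frequencies, marks))
    mean = sum_fm / n
    variance = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)
    std_dev = math.sqrt(variance)
    mean_dev = sum(f * abs(m - mean) for f, m in zip(frequencies, marks)) / n
    return {
        "mean": mean,
        "variance": variance,
        "std_dev": std_dev,
        "cv": std_dev / mean * 100,
        "mean_deviation": mean_dev,
    }


# Masses (kg) of the 100 male students.
mass_classes = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]
students = [5, 18, 42, 27, 8]

summary = grouped_summary(mass_classes, students)
for name, value in summary.items():
    print(name, round(value, 3))
```

Because the sketch does not round S before dividing by the mean, its CV is about 4.35 percent, slightly above the 4.344 percent obtained by hand from the rounded value S = 2.93.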
Properties of the Variance and the Standard Deviation
Variance
– It removes most of the demerits or drawbacks of the measures of dispersion discussed so far.
– Its unit is the square of the unit of measurement of values. For example, if the variable is
measured in kg, the unit of variance is kg².
– It is calculated based on all the observations/data in the series.
– It gives more weight to extreme values and less to those which are near to the mean.
Standard Deviation
– It is considered to be the best measure of dispersion.
– [Demerit] If the values of two series have different units of measurement, then we cannot
compare their variability just by comparing the values of their respective standard deviations.
– It is calculated based on all the observations/data in the series. Standard deviation is capable of
further algebraic treatment.
– Standard deviation is, as such, neither easy to calculate nor easy to understand.
– Similar to the variance, standard deviation gives more weight to extreme values and less to
those which are near to the mean.