Properties and Uses of Central Tendency
The Mean
1. The mean is found by using all the values of the data.
2. The mean varies less than the median or mode when samples are taken
from the same population and all three measures are computed for these
samples.
3. The mean is used in computing other statistics, such as the variance.
4. The mean for the data set is unique and not necessarily one of the data
values.
5. The mean cannot be computed for the data in a frequency distribution
that has an open-ended class.
6. The mean is affected by extremely high or low values, called outliers, and
may not be the appropriate average to use in these situations.
The Median
1. The median is used to find the center or middle value of a data set.
2. The median is used when it is necessary to find out whether the data
values fall into the upper half or lower half of the distribution.
3. The median is used for an open-ended distribution.
4. The median is affected less than the mean by extremely high or
extremely low values.
The Mode
1. The mode is used when the most typical case is desired.
2. The mode is the easiest average to compute.
3. The mode can be used when the data are nominal or categorical, such as
religious preference, gender, or political affiliation.
4. The mode is not always unique. A data set can have more than one mode,
or the mode may not exist for a data set.
CHAPTER 4
MEASURES OF VARIATION (DISPERSION)
The scatter or spread of the items of a distribution is known as dispersion or
variation. It is the degree to which numerical data tend to spread about an
average value. A measure of central tendency alone is not enough to describe
the nature of the data; it is better to report both a measure of central
tendency and a measure of dispersion.
Purposes of Measures of Dispersion
To have an idea about the reliability of the measure of central tendency.
If the degree of scatter is large, an average is less reliable. If the value
of the dispersion is small, the central value is a good representative of
all the values in the data set.
To compare two or more sets of data with regard to their variability. Two
or more data sets can be compared by calculating the same measure of
dispersion, in the same unit of measurement, for each set. The set with the
smaller value possesses less variability, that is, it is more uniform (or
more consistent).
To provide information about the structure of the data. The value of a
measure of dispersion gives an idea about the spread of the observations.
Further, one can infer the limits within which the values in the data set
spread.
To pave the way for the use of other statistical measures. Measures of
dispersion, especially the variance and the standard deviation, lead to many
statistical techniques such as correlation, regression, and analysis of
variance.
Example 4.1: Consider the following summary of the marks (out of 10) of five
students from each of three sections.
Section   Marks         Mean   Variation
1         7,7,7,7,7     7      0
2         8,8,7,6,10    7.8    2.2
3         9,7,6,8,5     7      2.5
As shown above, sections 1 and 3 have the same mean and the mean of section 2
is close to it, but the degree of uniformity or consistency of the marks
differs from section to section, as the measure of dispersion shows.
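The "variation" column of Example 4.1 can be reproduced as a sample variance
(divisor n - 1). The following is a minimal sketch under that assumption; the
variable names are illustrative and not part of the original notes.

```python
from statistics import variance

# Marks of the five students in each section, as listed in Example 4.1.
sections = {
    1: [7, 7, 7, 7, 7],
    2: [8, 8, 7, 6, 10],
    3: [9, 7, 6, 8, 5],
}

# statistics.variance computes the sample variance (it divides by n - 1).
variations = {sec: variance(marks) for sec, marks in sections.items()}

for sec, var in variations.items():
    print(f"Section {sec}: variation = {var:.1f}")
```

A section whose marks are all equal has zero variation, and the variation grows
as the marks spread further from their own mean.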
Types of measures of variation
Absolute measures of variation
Relative measures of variation
Absolute measures of variation: A measure of variation is said to be in
absolute form when it shows the actual amount of variation of the items from a
measure of central tendency and is expressed in the same concrete units as the
data.
Relative measures of variation: A relative measure is the quotient obtained by
dividing the absolute measure by the quantity with respect to which the
absolute deviation has been computed. A relative measure of variation is a
pure (unit-free) number and is used for making comparisons between different
distributions.
Absolute Measures                  Relative Measures
Range                              Relative Range
Quartile Deviation                 Coefficient of Quartile Deviation
Mean Deviation                     Coefficient of Mean Deviation
Variance and Standard Deviation    Coefficient of Variation
                                   Standard Scores
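Range: The range is the largest score minus the smallest score. It is the
quickest and simplest measure of variability.
Ungrouped data: $R = L - S$, where $L$ is the largest (maximum) value and $S$
is the smallest (minimum) value.
Grouped data: $R = UCL_{last} - LCL_{first}$, where $UCL_{last}$ is the upper
class limit of the last class and $LCL_{first}$ is the lower class limit of
the first class.
Relative Range (RR): It is also sometimes called the coefficient of range.
Ungrouped data: $$RR = \frac{L - S}{L + S}$$
Grouped data: $$RR = \frac{UCL_{last} - LCL_{first}}{UCL_{last} + LCL_{first}}$$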
Example 4.2: consider the following distribution
23,14,,13,45,14,1,54,45,46,23,24,48,15,17,18,20,41
a) Find the range
b) Find the coefficient of range
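As a rough sketch of how parts (a) and (b) could be computed, the snippet
below applies the ungrouped-data formulas for the range and the coefficient of
range. It assumes that the doubled comma in the data listing is a typo rather
than a dropped value; if a value is missing, the results may differ.

```python
# Example 4.2 data, ASSUMING the doubled comma in the listing is a typo
# and no observation has been lost.
data = [23, 14, 13, 45, 14, 1, 54, 45, 46, 23, 24, 48, 15, 17, 18, 20, 41]

largest, smallest = max(data), min(data)

# Range: largest value minus smallest value.
data_range = largest - smallest

# Relative range (coefficient of range): (L - S) / (L + S).
relative_range = data_range / (largest + smallest)

print(f"Range = {data_range}")
print(f"Relative range = {relative_range:.4f}")
```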
Exercise 4.1: The range and the relative range of a set of data values are 6
and 0.2, respectively. Find the maximum and the minimum data values.
Quartile Deviation (Q.D): Sometimes known as the semi-interquartile range
(SIR). The interquartile range is the difference between the third and the
first quartiles of a set of items, and the semi-interquartile range is half of
the interquartile range.
$$Q.D = \frac{Q_3 - Q_1}{2}$$
It gives the average amount by which the two quartiles differ from the median.
Q.D involves only the middle 50% of the observations, since it excludes the
observations below the lower quartile and the observations above the upper
quartile.
Coefficient of Quartile Deviation (C.Q.D): It gives the proportion of the
average amount by which the two quartiles differ from the median.
$$C.Q.D = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{2\,Q.D}{Q_3 + Q_1}$$
Example 4.3: Given $Q_1 = 174.90$, $Q_2 = 190.23$, and $Q_3 = 203.83$, find
Q.D and C.Q.D.
$$Q.D = \frac{Q_3 - Q_1}{2} = \frac{203.83 - 174.90}{2} \approx 14.47$$
Interpretation: The average distance by which the lower and the upper
quartiles differ from the median is 14.47; that is, on average the lower
quartile lies 14.47 below the median and the upper quartile lies 14.47 above
it.
$$C.Q.D = \frac{2\,Q.D}{Q_3 + Q_1} = \frac{2 \times 14.47}{203.83 + 174.90} \approx 0.076$$
Interpretation: The proportion of the average distance by which the two
quartiles differ from the median is 0.076; that is, the average distance by
which the lower and the upper quartiles differ from the median is about 7.6%
of the median itself.
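The arithmetic of Example 4.3 can be checked with a short sketch. The two
helper functions below are hypothetical names introduced only for
illustration; they implement the quartile deviation and its coefficient
directly from the two quartiles.

```python
def quartile_deviation(q1, q3):
    """Semi-interquartile range: half the distance between Q3 and Q1."""
    return (q3 - q1) / 2


def coefficient_of_quartile_deviation(q1, q3):
    """Relative (unit-free) version: (Q3 - Q1) / (Q3 + Q1)."""
    return (q3 - q1) / (q3 + q1)


# Quartiles given in Example 4.3.
q1, q3 = 174.90, 203.83

qd = quartile_deviation(q1, q3)
cqd = coefficient_of_quartile_deviation(q1, q3)

print(f"Q.D   = {qd:.3f}")
print(f"C.Q.D = {cqd:.3f}")
```

Note that the median $Q_2$ does not enter either formula; only the first and
third quartiles are needed.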
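Mean Deviation (M.D): It is the arithmetic mean of the absolute deviations of
the observations from a chosen average: the mean, the median, or the mode.
Depending on the type of average used, we have different mean deviations:
a. Mean deviation about the mean, $M.D(\bar{X})$
b. Mean deviation about the median, $M.D(\tilde{X})$
c. Mean deviation about the mode, $M.D(\hat{X})$
a. Mean deviation about the mean, $M.D(\bar{X})$:
Ungrouped data: $$M.D(\bar{X}) = \frac{\sum |X_i - \bar{X}|}{n}$$
Grouped data: $$M.D(\bar{X}) = \frac{\sum f_i |X_i - \bar{X}|}{\sum f_i}$$
b. Mean deviation about the median, $M.D(\tilde{X})$:
Ungrouped data: $$M.D(\tilde{X}) = \frac{\sum |X_i - \tilde{X}|}{n}$$
Grouped data: $$M.D(\tilde{X}) = \frac{\sum f_i |X_i - \tilde{X}|}{\sum f_i}$$
c. Mean deviation about the mode, $M.D(\hat{X})$:
Ungrouped data: $$M.D(\hat{X}) = \frac{\sum |X_i - \hat{X}|}{n}$$
Grouped data: $$M.D(\hat{X}) = \frac{\sum f_i |X_i - \hat{X}|}{\sum f_i}$$
In the grouped-data formulas, $X_i$ is the class mark of the $i$-th class and
$f_i$ is its frequency.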
Example 4.4: The following are the numbers of visits made by ten mothers to
the local doctor's surgery: 8, 6, 5, 5, 7, 4, 5, 9, 7, 4.
Find the mean deviation about the mean, the median, and the mode.
Solution:
First calculate the three averages:
$\bar{X} = 6$, $\tilde{X} = 5.5$, $\hat{X} = 5$
Xi           4    4    5    5    5    6    7    7    8    9    Total
|Xi - 6|     2    2    1    1    1    0    1    1    2    3    14
|Xi - 5.5|   1.5  1.5  0.5  0.5  0.5  0.5  1.5  1.5  2.5  3.5  14
|Xi - 5|     1    1    0    0    0    1    2    2    3    4    14
$$M.D(\bar{X}) = \frac{\sum |X_i - 6|}{n} = \frac{14}{10} = 1.4$$
$$M.D(\tilde{X}) = \frac{\sum |X_i - 5.5|}{n} = \frac{14}{10} = 1.4$$
$$M.D(\hat{X}) = \frac{\sum |X_i - 5|}{n} = \frac{14}{10} = 1.4$$
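The three mean deviations of Example 4.4 can be reproduced with a small
sketch. The helper `mean_deviation` is a hypothetical name used for
illustration; the averages come from Python's standard `statistics` module.

```python
from statistics import mean, median, mode


def mean_deviation(values, center):
    """Arithmetic mean of the absolute deviations of values from center."""
    return sum(abs(x - center) for x in values) / len(values)


# Number of visits made by the ten mothers in Example 4.4.
visits = [8, 6, 5, 5, 7, 4, 5, 9, 7, 4]

md_mean = mean_deviation(visits, mean(visits))      # about the mean (6)
md_median = mean_deviation(visits, median(visits))  # about the median (5.5)
md_mode = mean_deviation(visits, mode(visits))      # about the mode (5)

print(md_mean, md_median, md_mode)
```

For this particular data set the three values coincide; in general they need
not be equal.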
Coefficient of Mean Deviation (C.M.D): The coefficient of mean deviation about
a given average is the ratio of the mean deviation about that average to the
average itself.
$$C.M.D(\bar{X}) = \frac{M.D(\bar{X})}{\bar{X}}$$
$$C.M.D(\tilde{X}) = \frac{M.D(\tilde{X})}{\tilde{X}}$$
$$C.M.D(\hat{X}) = \frac{M.D(\hat{X})}{\hat{X}}$$
Exercise 4.2: Calculate the C.M.D about the mean, the median, and the mode for
the data in Example 4.4 above.
Variance: The variance is the average squared deviation from the mean. The
variance and the standard deviation are the most useful and most widely used
measures of dispersion, and both measure the dispersion of the observations
around the mean.
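Population variance ($\sigma^2$)
For ungrouped data: $$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$$
where $N$ is the population size and $\mu$ is the population mean.
For grouped data: $$\sigma^2 = \frac{\sum f_i (X_{cm} - \mu)^2}{\sum f_i}$$
where $X_{cm}$ is the class mark.
Sample variance ($S^2$)
For ungrouped data: $$S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$
where $n$ is the sample size.
For grouped data: $$S^2 = \frac{\sum f_i (X_{cm} - \bar{X})^2}{\sum f_i - 1}$$
where $X_{cm}$ is the class mark.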
Standard Deviation: The standard deviation measures the typical distance of
the observations from the mean; it is the positive square root of the
variance.
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $S = \sqrt{S^2}$
If the standard deviation of the data is small, the values are concentrated
near the mean; if it is large, the values are scattered away from the mean.
Example 4.5: Find the variance and standard deviation of the following sample
data: 5, 17, 12, 10.
Solution: First find the mean.
$$\bar{X} = \frac{\sum X_i}{n} = \frac{5 + 17 + 12 + 10}{4} = 11$$
Xi             5    17   12   10   Total
(Xi - X̄)^2    36   36   1    1    74
$$S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{74}{3} = 24.67$$
$$S = \sqrt{S^2} = \sqrt{24.67} = 4.97$$
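The steps of Example 4.5 (mean, squared deviations, division by n - 1, square
root) can be sketched as follows. This is only an illustration of the
ungrouped sample formulas; the variable names are not from the original notes.

```python
from math import sqrt

# Sample data from Example 4.5.
sample = [5, 17, 12, 10]
n = len(sample)

# Step 1: the sample mean.
x_bar = sum(sample) / n

# Step 2: the sum of squared deviations from the mean.
sum_sq_dev = sum((x - x_bar) ** 2 for x in sample)

# Step 3: the sample variance divides by n - 1, not n.
s2 = sum_sq_dev / (n - 1)

# Step 4: the standard deviation is the positive square root of the variance.
s = sqrt(s2)

print(f"mean = {x_bar}, S^2 = {s2:.2f}, S = {s:.2f}")
```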
Example 4.6: Consider the following grouped frequency distribution of the
weights of 75 individuals and find the standard deviation.
Weight (kg)   Frequency
40-44         7
45-49         10
50-54         22
55-59         15
60-64         12
65-69         6
70-74         3
Solution: First find the class marks, the mean, and the sum of squared
deviations from the mean.
$\bar{X} = 55$
Weight (kg)   Frequency (f)   Xcm   (Xcm - X̄)^2   f(Xcm - X̄)^2
40-44         7               42    169            1183
45-49         10              47    64             640
50-54         22              52    9              198
55-59         15              57    4              60
60-64         12              62    49             588
65-69         6               67    144            864
70-74         3               72    289            867
Total         75                                   4400
$$S^2 = \frac{\sum f_i (X_{cm} - \bar{X})^2}{\sum f_i - 1} = \frac{4400}{75 - 1} = \frac{4400}{74} = 59.46$$
$$S = \sqrt{S^2} = \sqrt{59.46} = 7.71$$
Interpretation
The standard deviation S = 7.71 indicates that, on average, the weight of an
individual differs from the mean weight ($\bar{X}$ = 55 kg) by about 7.71 kg.
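The grouped-data calculation of Example 4.6 can be sketched in the same way.
The snippet assumes, as the solution does, that every observation in a class
is represented by its class mark; the tuple layout `(lower, upper, frequency)`
is an illustrative choice.

```python
from math import sqrt

# (lower class limit, upper class limit, frequency) from Example 4.6.
classes = [
    (40, 44, 7), (45, 49, 10), (50, 54, 22), (55, 59, 15),
    (60, 64, 12), (65, 69, 6), (70, 74, 3),
]

# The class mark is the midpoint of the class limits.
marks = [(lower + upper) / 2 for lower, upper, _ in classes]
freqs = [f for _, _, f in classes]
total_f = sum(freqs)

# Grouped mean: frequency-weighted average of the class marks.
x_bar = sum(f * m for f, m in zip(freqs, marks)) / total_f

# Frequency-weighted sum of squared deviations from the mean.
sum_sq_dev = sum(f * (m - x_bar) ** 2 for f, m in zip(freqs, marks))

# Sample variance for grouped data divides by (sum of f) - 1.
s2 = sum_sq_dev / (total_f - 1)
s = sqrt(s2)

print(f"mean = {x_bar}, sum of squares = {sum_sq_dev}, S = {s:.2f}")
```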
Special properties of the standard deviation
1. If the standard deviation of X1, X2, X3, ..., Xn is S, then the standard
deviation of
a. X1 ± K, X2 ± K, X3 ± K, ..., Xn ± K will also be S
b. KX1, KX2, KX3, ..., KXn is |K|S
c. b + KX1, b + KX2, b + KX3, ..., b + KXn is |K|S
where b and K are any constants.
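Property 1 can be checked numerically. The sketch below uses a small
hypothetical data set and hypothetical constants `k` and `b` (none of them
come from the notes) and compares the standard deviations before and after
each transformation.

```python
from statistics import stdev

# Hypothetical data and constants, chosen only to illustrate the property.
x = [2, 4, 4, 5, 7, 8]
k, b = -3, 10

s = stdev(x)  # sample standard deviation of the original values

shifted = [xi + b for xi in x]      # adding a constant
scaled = [k * xi for xi in x]       # multiplying by a constant
linear = [b + k * xi for xi in x]   # general linear transformation

print(f"original: {s:.4f}")
print(f"shifted:  {stdev(shifted):.4f}")  # equals S
print(f"scaled:   {stdev(scaled):.4f}")   # equals |K| * S
print(f"linear:   {stdev(linear):.4f}")   # equals |K| * S
```

Because the variance is the square of the standard deviation, the same
transformations multiply the variance by K squared.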
Exercise 4.3:
A. The mean and variance of n tetracycline capsules X1, X2, ..., Xn are known
to be 12 gm and 16 gm², respectively. A new set of capsules of another drug
is obtained by the linear transformation Yi = 2Xi - 0.5 (i = 1, 2, ..., n).
What will be the variance and standard deviation of the new set of
capsules?
B. The mean and the standard deviation of a set of numbers are 500 and 10,
respectively.
i. If 10 is added to each of the numbers in the set, what will be
the variance and standard deviation of the new set?
ii. If each of the numbers in the set is multiplied by -5, what
will be the variance and standard deviation of the new set?
2. If the data are a sample and the distribution is normal (bell-shaped) or
approximately normal, then
Approximately 68% of the scores in the sample fall within one standard
deviation of the mean, i.e. $\bar{X} \pm S$ will include approximately 68% of
the data.
Approximately 95% of the scores in the sample fall within two standard
deviations of the mean, i.e. $\bar{X} \pm 2S$ will include approximately 95%
of the data.
Approximately 99.73% of the scores in the sample fall within three
standard deviations of the mean, i.e. $\bar{X} \pm 3S$ will include
approximately 99.73% of the data.
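The three percentages are the probabilities that an exactly normal variable
falls within one, two, and three standard deviations of its mean. A minimal
sketch using the error function from Python's standard library shows where
they come from; the helper name `normal_coverage` is hypothetical.

```python
from math import erf, sqrt


def normal_coverage(k):
    """P(|Z| <= k) for a standard normal variable Z."""
    return erf(k / sqrt(2))


# Coverage within 1, 2 and 3 standard deviations of the mean.
coverage = {k: normal_coverage(k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within {k} SD: {100 * p:.2f}%")
```

For real samples that are only approximately normal, the observed proportions
will be close to these values but generally not identical.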
Although the standard deviation is preferable to the variance, there is one
difficulty with it. If there are two or more distributions of different
variables (having different units of measurement), their variability cannot be
compared by comparing the values of the standard deviations.
Coefficient of Variation (CV): The coefficient of variation is the ratio of the
standard deviation to the mean, expressed as a percentage.
If two or more distributions differ in their units of measurement, their
variability cannot be compared using any of the absolute measures given
before, but it can be compared using the coefficient of variation.
For a population: $$CV = \frac{\sigma}{\mu} \times 100\%$$
For a sample: $$CV = \frac{S}{\bar{X}} \times 100\%$$
The distribution with the smaller CV is said to be less variable, or more
consistent or more uniform.
The distribution with the larger CV is said to be more variable, or less
consistent or less uniform.
Example 4.7: An analysis of the monthly wages paid (in Birr) to workers in two
firms, A and B, belonging to the same industry gives the following results.
              Firm A   Firm B
Mean wage     52.5     47.5
Median wage   50.5     45.5
Variance      100      121
In which firm, A or B, is there greater variability in individual wages?
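Solution: Find the CV for both firms. The standard deviations are the square
roots of the variances: $S_A = \sqrt{100} = 10$ and $S_B = \sqrt{121} = 11$.
$$CV_A = \frac{S_A}{\bar{X}_A} \times 100\% = \frac{10}{52.5} \times 100\% = 19.05\%$$
$$CV_B = \frac{S_B}{\bar{X}_B} \times 100\% = \frac{11}{47.5} \times 100\% = 23.16\%$$
Interpretation: Since $CV_A < CV_B$, there is greater variability in
individual wages in firm B; that is, the wages paid in firm B are less
consistent.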
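The comparison in Example 4.7 can be sketched as below. The function name
`coefficient_of_variation` is hypothetical; it takes the mean and the variance
(as given in the example) and returns the CV as a percentage.

```python
from math import sqrt


def coefficient_of_variation(mean, variance):
    """CV in percent: standard deviation divided by the mean, times 100."""
    return sqrt(variance) / mean * 100


# Mean wage and variance for each firm, from Example 4.7.
cv_a = coefficient_of_variation(52.5, 100)
cv_b = coefficient_of_variation(47.5, 121)

print(f"CV_A = {cv_a:.2f}%")
print(f"CV_B = {cv_b:.2f}%")
print("More variable firm:", "A" if cv_a > cv_b else "B")
```

The median wages listed in the example are not needed for this comparison.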
Exercise 4.4:
1. A meteorologist interested in the consistency of temperatures in three
cities during a given week collected the following data. The temperatures
for the five days of the week in the three cities were
City 1   25   24   23   26   17
City 2   22   21   24   22   20
City 3   32   27   35   24   28
Which city has the most consistent temperature, based on these data?
2. Given two data sets
Data Set A: 2 meters, 4 meters, 6 meters
Data Set B: 1000 liters, 800 liters, 900 liters
Which data set has less variability?
Standard Scores (Z-scores): For a given value, the Z-score gives its deviation
from the mean in units of standard deviation; it tells us how many standard
deviations the value lies above or below the mean.
If $X_i$ is a measurement from a population with mean $\mu$ and standard
deviation $\sigma$, or from a sample with mean $\bar{X}$ and standard
deviation $S$, then its value in standard units is
$$Z = \frac{X_i - \mu}{\sigma}$$ for the population
$$Z = \frac{X_i - \bar{X}}{S}$$ for the sample
Z gives the deviation from the mean in units of standard deviation, and it
tells us how many standard deviations a given value lies above or below the
mean.
It also helps in hypothesis testing.
It is used to compare two observations coming from different groups.
Example 4.8: Two sections were given an Introduction to Statistics
examination. The following information was given.
Value   Section 1   Section 2
Mean    78          90
S.D     6           5
Student A from section 1 scored 90 and student B from section 2 scored
95. Relatively speaking, who performed better?
Solution: First find the Z-scores of both students.
$$Z_A = \frac{X_A - \bar{X}_1}{S_1} = \frac{90 - 78}{6} = 2$$
$$Z_B = \frac{X_B - \bar{X}_2}{S_2} = \frac{95 - 90}{5} = 1$$
Interpretation: Student A performed better relative to his section, because
the score of student A is two standard deviations above the mean score of his
section, while the score of student B is only one standard deviation above the
mean score of his section.
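The comparison in Example 4.8 can be sketched with a one-line helper. The
function name `z_score` is hypothetical, and the score of 95 for student B is
the value used in the worked solution.

```python
def z_score(x, mean, sd):
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Student A: section 1 (mean 78, S.D 6); student B: section 2 (mean 90, S.D 5).
z_a = z_score(90, 78, 6)
z_b = z_score(95, 90, 5)  # 95 is taken from the worked solution

print(f"Z_A = {z_a}, Z_B = {z_b}")
print("Better relative performance:", "A" if z_a > z_b else "B")
```

Although student B has the higher raw score, student A has the higher
standard score, which is why the comparison is made on Z rather than on the
raw marks.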