Understanding Measures of Central Tendency
A single value that describes the characteristics of the entire mass of data
Abriham S. 1
Properties of Summation
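$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$$
$$\sum_{i=1}^{n} x_i^2 = x_1^2 + x_2^2 + \cdots + x_n^2$$
$$\sum_{i=1}^{n} x_i y_i = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$$
$$\sum_{i=1}^{n} x_i^2 f_i = x_1^2 f_1 + x_2^2 f_2 + \cdots + x_n^2 f_n$$
$$\sum_{i=1}^{n} x_i f_i = x_1 f_1 + x_2 f_2 + \cdots + x_n f_n$$
$$\sum_{i=1}^{n} k = nk, \text{ where } k \text{ is any constant}$$
$$\sum_{i=1}^{n} (x_i \pm y_i) = \sum_{i=1}^{n} x_i \pm \sum_{i=1}^{n} y_i$$
$$\sum_{i=1}^{n} k x_i = k \sum_{i=1}^{n} x_i, \text{ if } k \text{ is a constant}$$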
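Example 3.1: Considering the following data, determine the sums below.
X 5 7 7 10 8
Y 7 6 7 8 8
$$\sum_{i=1}^{5} x_i = 5 + 7 + 7 + 10 + 8 = 37$$
$$\sum_{i=1}^{5} y_i = 7 + 6 + 7 + 8 + 8 = 36$$
$$\sum_{i=1}^{5} 5 = 5 \times 5 = 25$$
$$\sum_{i=1}^{5} x_i^2 = 5^2 + 7^2 + 7^2 + 10^2 + 8^2 = 287$$
$$\sum_{i=1}^{5} x_i y_i = 5 \times 7 + 7 \times 6 + 7 \times 7 + 10 \times 8 + 8 \times 8 = 270$$
$$\left(\sum_{i=1}^{5} x_i\right)\left(\sum_{i=1}^{5} y_i\right) = 37 \times 36 = 1332$$
$$\sum_{i=1}^{5} (4x_i + 3y_i) = 4\sum_{i=1}^{5} x_i + 3\sum_{i=1}^{5} y_i = 4(37) + 3(36) = 256$$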
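The sums in Example 3.1 can be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative and are not part of the original slides.

```python
# Data from Example 3.1
x = [5, 7, 7, 10, 8]
y = [7, 6, 7, 8, 8]
n = len(x)

sum_x = sum(x)                                  # sum of x_i
sum_y = sum(y)                                  # sum of y_i
sum_const = sum(5 for _ in range(n))            # sum of a constant k = 5 is n * k
sum_x_sq = sum(xi ** 2 for xi in x)             # sum of x_i squared
sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # sum of the products x_i * y_i
product_of_sums = sum_x * sum_y                 # (sum of x_i) * (sum of y_i)

# Linearity: the sum of (4x + 3y) equals 4 * sum(x) + 3 * sum(y)
lhs = sum(4 * xi + 3 * yi for xi, yi in zip(x, y))
rhs = 4 * sum_x + 3 * sum_y

print(sum_x, sum_y, sum_const, sum_x_sq, sum_xy, product_of_sums)
print(lhs, rhs, lhs == rhs)
```

Note that the sum of products (270) differs from the product of sums (1332), which is the point of listing both in the example.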
Types of Measures of Central Tendency
• The Mode
• The Median
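If data are given in the form of a frequency distribution, the sample mean can be computed as
$$\bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$$
where, for a grouped frequency distribution, $x_i$ is replaced by the class mark $m_i$.
Example: Calculate the mean for the following data.
$x_i$   2   3   7   8
$f_i$   2   1   3   1
Solutions:
$x_i$   $f_i$   $x_i f_i$
2       2       4
3       1       3
7       3       21
8       1       8
Total   7       36
$$\bar{x} = \frac{\sum_{i=1}^{4} x_i f_i}{\sum_{i=1}^{4} f_i} = \frac{36}{7} = 5.14$$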
Example: Calculate the mean for the following age distribution.
Class     Frequency
6 - 10    35
11 - 15   23
16 - 20   15
21 - 25   12
26 - 30   9
31 - 35   6
Solutions:
Class     $f_i$   $m_i$   $m_i f_i$
6 - 10    35      8       280
11 - 15   23      13      299
16 - 20   15      18      270
21 - 25   12      23      276
26 - 30   9       28      252
31 - 35   6       33      198
Total     100             1575
$$\bar{x} = \frac{\sum_{i=1}^{6} m_i f_i}{\sum_{i=1}^{6} f_i} = \frac{1575}{100} = 15.75$$
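The two worked examples above can be reproduced with a short Python sketch. The helper names `frequency_mean` and `class_marks` are illustrative assumptions, not something defined in the slides.

```python
def frequency_mean(values, freqs):
    """Mean of a frequency distribution: sum(x_i * f_i) / sum(f_i)."""
    total = sum(freqs)
    if total == 0:
        raise ValueError("frequencies must not sum to zero")
    return sum(v * f for v, f in zip(values, freqs)) / total


def class_marks(class_limits):
    """Class mark = (lower class limit + upper class limit) / 2."""
    return [(lo + hi) / 2 for lo, hi in class_limits]


# Discrete frequency distribution from the first example
simple_mean = frequency_mean([2, 3, 7, 8], [2, 1, 3, 1])

# Grouped age distribution from the second example
age_classes = [(6, 10), (11, 15), (16, 20), (21, 25), (26, 30), (31, 35)]
age_freqs = [35, 23, 15, 12, 9, 6]
age_mean = frequency_mean(class_marks(age_classes), age_freqs)

print(round(simple_mean, 2))  # 5.14
print(age_mean)               # 15.75
```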
Properties of Arithmetic Mean
The sum of the deviations of a set of items from their mean is always zero, i.e.
$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$
When a set of observations is divided into k groups and $\bar{x}_1$ is the mean of the $n_1$ observations of group 1, $\bar{x}_2$ is the mean of the $n_2$ observations of group 2, …, $\bar{x}_k$ is the mean of the $n_k$ observations of group k, then the combined mean, denoted by $\bar{x}_c$, of all observations taken together is given by
$$\bar{x}_c = \frac{\bar{x}_1 n_1 + \bar{x}_2 n_2 + \cdots + \bar{x}_k n_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum_{i=1}^{k} \bar{x}_i n_i}{\sum_{i=1}^{k} n_i}$$
Example: In a class there are 30 females and 70 males. If the females averaged 60 in an examination and the males averaged 72, find the mean for the entire class.
Solutions:
$$\bar{x}_c = \frac{\bar{x}_f n_f + \bar{x}_m n_m}{n_f + n_m} = \frac{60 \times 30 + 72 \times 70}{30 + 70} = 68.40$$
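Weighted mean of the values $x_i$ with weights $w_i$:
$$\bar{x}_w = \frac{\sum_{i=1}^{n} x_i w_i}{\sum_{i=1}^{n} w_i} = \frac{60 \times 1 + 75 \times 2 + 63 \times 1 + 59 \times 3 + 55 \times 3}{1 + 2 + 1 + 3 + 3} = \frac{615}{10} = 61.5$$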
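The combined mean and the weighted mean share the same arithmetic: a sum of value times weight divided by the sum of the weights, with group sizes acting as the weights for the combined mean. A minimal sketch, where `weighted_mean` is an illustrative helper name:

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Combined mean: group means weighted by group sizes (30 females, 70 males)
combined = weighted_mean([60, 72], [30, 70])

# Weighted mean from the worked example
weighted = weighted_mean([60, 75, 63, 59, 55], [1, 2, 1, 3, 3])

print(round(combined, 2))  # 68.4
print(weighted)            # 61.5
```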
Geometric Mean
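For a frequency distribution with values $x_1, \ldots, x_k$, frequencies $f_1, \ldots, f_k$ and $n = \sum_{i=1}^{k} f_i$:
$$\log GM = \frac{\sum_{i=1}^{k} f_i \log x_i}{n}$$
$$GM = \text{antilog}\left(\frac{\sum_{i=1}^{k} f_i \log x_i}{n}\right)$$
Example: Compute the geometric mean.
Values      2   4   6   8   10
Frequency   1   2   2   2   1
Solution:
$$GM = \sqrt[n]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_k^{f_k}} = \sqrt[8]{2^1 \cdot 4^2 \cdot 6^2 \cdot 8^2 \cdot 10^1} = 5.41$$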
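If the frequency distribution is grouped (continuous), the class marks of the class intervals are taken as $m_i$:
$$GM = \sqrt[n]{m_1^{f_1} \cdot m_2^{f_2} \cdots m_k^{f_k}} = \sqrt[n]{\prod_{i=1}^{k} m_i^{f_i}}$$
$$\log GM = \frac{\sum_{i=1}^{k} f_i \log m_i}{n}$$
$$GM = \text{antilog}\left(\frac{\sum_{i=1}^{k} f_i \log m_i}{n}\right)$$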
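The log and antilog route to the geometric mean can be sketched in Python as follows. The function name `geometric_mean` is an illustrative choice, and base-10 logarithms are assumed here; any base gives the same result as long as the antilog uses the same base.

```python
import math


def geometric_mean(values, freqs):
    """GM = antilog( sum(f_i * log x_i) / n ), with n = sum(f_i)."""
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs strictly positive values")
    n = sum(freqs)
    log_gm = sum(f * math.log10(v) for v, f in zip(values, freqs)) / n
    return 10 ** log_gm  # antilog in base 10


gm = geometric_mean([2, 4, 6, 8, 10], [1, 2, 2, 2, 1])
print(round(gm, 2))  # 5.41
```

For a grouped distribution, the class marks $m_i$ would be passed as `values`.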
Harmonic Mean
• The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the individual observations. If $x_1, x_2, \ldots, x_n$ are n observations, then the harmonic mean is given by the following formula:
$$HM = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
Example: Find the harmonic mean of 0.3, 0.4, 0.5 and 0.2.
$$HM = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \frac{1}{x_4}} = \frac{4}{\frac{1}{0.3} + \frac{1}{0.4} + \frac{1}{0.5} + \frac{1}{0.2}} = 0.31$$
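If the data are arranged in the form of a frequency distribution, with $n = \sum_{i=1}^{k} f_i$:
$$HM = \frac{\sum_{i=1}^{k} f_i}{\sum_{i=1}^{k} \frac{f_i}{x_i}} = \frac{f_1 + \cdots + f_k}{\frac{f_1}{x_1} + \cdots + \frac{f_k}{x_k}} = \frac{n}{\sum_{i=1}^{k} \frac{f_i}{x_i}}$$
If the frequency distribution is grouped (continuous), the class marks of the class intervals are taken as $m_i$:
$$HM = \frac{n}{\sum_{i=1}^{k} \frac{f_i}{m_i}}, \qquad m_i = \frac{\text{upper class limit} + \text{lower class limit}}{2}$$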
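A minimal Python sketch of the harmonic mean, covering both raw observations and a frequency distribution. The helper name `harmonic_mean` and its optional `freqs` parameter are illustrative assumptions.

```python
def harmonic_mean(values, freqs=None):
    """HM = n / sum(f_i / x_i); every frequency is 1 for raw observations."""
    if freqs is None:
        freqs = [1] * len(values)
    if any(v <= 0 for v in values):
        raise ValueError("the harmonic mean needs strictly positive values")
    n = sum(freqs)
    return n / sum(f / v for v, f in zip(values, freqs))


hm = harmonic_mean([0.3, 0.4, 0.5, 0.2])
print(round(hm, 2))  # 0.31
```

For a grouped distribution, the class marks $m_i$ would be passed as `values` together with the class frequencies.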
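For ungrouped data arranged in order of magnitude, the median $\tilde{x}$ is
$$\tilde{x} = \begin{cases} X_{\left(\frac{n+1}{2}\right)}, & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}\right), & \text{if } n \text{ is even} \end{cases}$$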
Median for Grouped Data.
If data are given in the form of a continuous frequency distribution, the median is defined as:
$$\tilde{x} = L_{med} + \frac{w}{f_{med}}\left(\frac{n}{2} - c\right)$$
Where:
$L_{med}$ = lower boundary of the median class
$w$ = class width of the median class
$f_{med}$ = frequency of the median class
$n$ = the sum of the frequencies
$c$ = cumulative frequency of the class that precedes the median class
Example: Calculate the median for the following frequency distribution.
Frequency 5 8 10 8 5
Less than cf. 5 13 23 31 36
To obtain the median class, compute $n/2 = 36/2 = 18$. The smallest cumulative frequency greater than or equal to 18 is 23.
Hence the median class is the 3rd class, which is 10 – 15.
Given: $L_{med} = 10$, $w = 5$, $f_{med} = 10$, $c = 13$, $n = 36$
$$\tilde{x} = L_{med} + \frac{w}{f_{med}}\left(\frac{n}{2} - c\right) = 10 + \frac{5}{10}(18 - 13) = 12.5$$
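The grouped median calculation can be sketched in Python. The function name `grouped_median` is illustrative. The lower class boundaries 0, 5, 10, 15, 20 are an assumption: only the median class 10 – 15 and the width 5 appear in the example, so the other boundaries are inferred from equal class widths.

```python
def grouped_median(lower_boundaries, width, freqs):
    """Median = L_med + (w / f_med) * (n / 2 - c) for a grouped distribution."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty frequency distribution")
    half = n / 2
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_boundaries, freqs):
        # The median class is the first class whose cumulative frequency reaches n / 2
        if cumulative + f >= half:
            return lower + (width / f) * (half - cumulative)
        cumulative += f
    raise ValueError("median class not found")


# Assumed lower boundaries (equal width 5); frequencies are from the example
median = grouped_median([0, 5, 10, 15, 20], 5, [5, 8, 10, 8, 5])
print(median)  # 12.5
```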
Properties of Median
The Mode
The mode is the value that occurs most frequently in a set of values. The mode may not exist and, even if it does exist, it may not be unique. In the case of a discrete distribution, the value having the maximum frequency is the modal value.
Note: If a distribution has more than two modal values, we call the distribution multimodal. If, in a set of observed values, all values occur once or an equal number of times, there is no mode. The mode is also useful in finding the most typical case when the data are nominal or categorical.
Examples:
1. Find the mode of 5, 3, 5, 8, 9. Mode = 5
2. Find the mode of 8, 9, 9, 7, 8, 2 and 5. The data are bimodal: 8 and 9.
3. Find the mode of 4, 12, 3, 6 and 7. There is no mode for these data.
The mode of a set of numbers $x_1, x_2, \ldots, x_n$ is usually denoted by $\hat{x}$.
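The three cases above (one mode, two modes, no mode) can be sketched with `collections.Counter`. The function name `modes` and the choice to return an empty list for "no mode" are illustrative assumptions.

```python
from collections import Counter


def modes(data):
    """Return the modal values in sorted order; an empty list means no mode."""
    if not data:
        return []
    counts = Counter(data)
    top = max(counts.values())
    # If several distinct values all occur equally often, there is no mode
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == top)


print(modes([5, 3, 5, 8, 9]))        # [5]
print(modes([8, 9, 9, 7, 8, 2, 5]))  # [8, 9]
print(modes([4, 12, 3, 6, 7]))       # []
```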
Mode for Grouped data
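If data are given in the form of a continuous frequency distribution, the mode is defined as:
$$\hat{x} = L_{mo} + w\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right)$$
Where:
$L_{mo}$ = lower boundary of the modal class
$w$ = class width of the modal class
$\Delta_1$ = frequency of the modal class minus the frequency of the preceding class
$\Delta_2$ = frequency of the modal class minus the frequency of the following class
f 5 5 9 4 4 3
Solution:
2.7 – 3.1 is the modal class, since it is the class with the highest frequency. Here $\Delta_1 = 9 - 5 = 4$ and $\Delta_2 = 9 - 4 = 5$.
$$\hat{x} = L_{mo} + w\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) = 2.7 + 0.4\left(\frac{4}{4 + 5}\right) = 2.878$$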
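A minimal Python sketch of the grouped mode formula. The function name `grouped_mode` is illustrative, and the list of lower class boundaries is an assumption: only the modal class 2.7 – 3.1 (width 0.4) is given in the example, so the remaining boundaries are inferred from equal class widths.

```python
def grouped_mode(lower_boundaries, width, freqs):
    """Mode = L_mo + w * d1 / (d1 + d2) for a grouped distribution."""
    i = freqs.index(max(freqs))  # first class with the highest frequency
    prev_f = freqs[i - 1] if i > 0 else 0
    next_f = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = freqs[i] - prev_f  # excess over the preceding class
    d2 = freqs[i] - next_f  # excess over the following class
    if d1 + d2 == 0:
        raise ValueError("mode is not defined: neighbouring classes tie")
    return lower_boundaries[i] + width * d1 / (d1 + d2)


# Assumed lower boundaries (equal width 0.4); frequencies are from the example
lowers = [1.9, 2.3, 2.7, 3.1, 3.5, 3.9]
mode = grouped_mode(lowers, 0.4, [5, 5, 9, 4, 4, 3])
print(round(mode, 3))  # 2.878
```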
Advantages of Mode
• It is not affected by extreme observations.
Disadvantages of Mode
• It is not rigidly defined.
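Steps to Calculate Quartiles for Ungrouped Data
Arrange the data in ascending order. The kth quartile ($k = 1, 2, 3$) is then the item at the following position:
$$Q_k = \begin{cases} \left[\frac{k(n+1)}{4}\right]^{th} \text{ item}, & \text{if } n \text{ is odd} \\ \left[\frac{\frac{kn}{4} + \left(\frac{kn}{4} + 1\right)}{2}\right]^{th} \text{ item}, & \text{if } n \text{ is even} \end{cases}$$
Example: 1, 2, 4, 6, 2, 2, 4, 5, 6, 3, 4, 2, 6, 7, 8, 9
Arranged in ascending order (n = 16):
1, 2, 2, 2, 2, 3, 4, 4, 4, 5, 6, 6, 6, 7, 8, 9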
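$$Q_1 = \left[\frac{\frac{1 \times 16}{4} + \left(\frac{1 \times 16}{4} + 1\right)}{2}\right]^{th} = 4.5^{th} \text{ item} = 4^{th} + 0.5\,(5^{th} - 4^{th}) = 2 + 0.5(2 - 2) = 2$$
$$Q_2 = \left[\frac{\frac{2 \times 16}{4} + \left(\frac{2 \times 16}{4} + 1\right)}{2}\right]^{th} = 8.5^{th} \text{ item} = 8^{th} + 0.5\,(9^{th} - 8^{th}) = 4 + 0.5(4 - 4) = 4$$
Q3 = ?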
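The positional rule and the interpolation step used for $Q_1$ and $Q_2$ can be sketched in Python. The helper names `quantile_position` and `value_at` are illustrative assumptions; the `parts` argument is 4 for quartiles, and the same rule with 10 would serve for deciles.

```python
import math


def quantile_position(k, n, parts):
    """Position of the k-th quantile among n sorted items (parts = 4 for quartiles)."""
    if n % 2 == 1:
        return k * (n + 1) / parts
    return (k * n / parts + (k * n / parts + 1)) / 2


def value_at(sorted_data, position):
    """Item at a 1-based, possibly fractional position, by linear interpolation."""
    lower = math.floor(position)
    fraction = position - lower
    lower_value = sorted_data[lower - 1]
    if fraction == 0:
        return lower_value
    upper_value = sorted_data[lower]
    return lower_value + fraction * (upper_value - lower_value)


data = sorted([1, 2, 4, 6, 2, 2, 4, 5, 6, 3, 4, 2, 6, 7, 8, 9])
n = len(data)

q1 = value_at(data, quantile_position(1, n, 4))  # 4.5th item
q2 = value_at(data, quantile_position(2, n, 4))  # 8.5th item
print(q1, q2)  # 2.0 4.0
```

The sketch does not guard against positions beyond the last item, which cannot occur for quartiles with the sample sizes used here.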
For grouped data, we have the following formula:
$$Q_i = L_{Q_i} + \frac{w}{f_{Q_i}}\left(\frac{i \cdot n}{4} - c\right), \quad i = 1, 2, 3$$
Where:
$L_{Q_i}$ = lower class boundary of the quartile class
$w$ = class width of the quartile class
$n$ = total number of observations
$c$ = cumulative frequency (less than type) of the class that precedes the quartile class
$f_{Q_i}$ = frequency of the quartile class
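For ungrouped data arranged in ascending order, the kth decile ($k = 1, 2, \ldots, 9$) is the item at the following position:
$$D_k = \begin{cases} \left[\frac{k(n+1)}{10}\right]^{th} \text{ item}, & \text{if } n \text{ is odd} \\ \left[\frac{\frac{kn}{10} + \left(\frac{kn}{10} + 1\right)}{2}\right]^{th} \text{ item}, & \text{if } n \text{ is even} \end{cases}$$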
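For grouped data, we have the following formula:
$$D_i = L_{D_i} + \frac{w}{f_{D_i}}\left(\frac{i \cdot n}{10} - c\right), \quad i = 1, 2, \ldots, 9$$
Where:
$L_{D_i}$ = lower class boundary of the decile class
The remaining symbols ($w$, $f_{D_i}$, $n$, $c$) are defined as in the quartile and percentile formulas, with the decile class in place of the quartile or percentile class.
Remark: The decile class (the class containing $D_i$) is the class with the smallest cumulative frequency greater than or equal to $\frac{i \cdot n}{10}$.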
Percentiles
• Percentiles are measures that divide the arranged distribution into one hundred equal parts.
For grouped data, we have the following formula:
$$P_i = L_{P_i} + \frac{w}{f_{P_i}}\left(\frac{i \cdot n}{100} - c\right), \quad i = 1, 2, \ldots, 99$$
Where:
$L_{P_i}$ = lower class boundary of the percentile class
$w$ = class width of the percentile class
$n$ = total number of observations
$c$ = cumulative frequency (less than type) of the class that precedes the percentile class
$f_{P_i}$ = frequency of the percentile class
Remark: The percentile class (the class containing $P_i$) is the class with the smallest cumulative frequency greater than or equal to $\frac{i \cdot n}{100}$.
Example: Considering the following distribution
Class Frequency
140- 150 17
150- 160 29
160- 170 42
170- 180 72
180- 190 84
190- 200 107
200- 210 49
210- 220 34
220- 230 31
230- 240 16
240- 250 12
Calculate:
a) The 3rd quartile
b) The 7th decile
c) The 90th percentile
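Solutions:
First find the less than cumulative frequency. The total frequency is n = 493.
For the 90th percentile, the position is
$$\left(\frac{i \cdot n}{100}\right)^{th} = \left(\frac{90 \times 493}{100}\right)^{th} = 443.7^{th}$$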
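The grouped quartile, decile and percentile formulas differ only in the divisor (4, 10 or 100), so one function can cover all three. The following is a minimal sketch for the distribution above; the function name `grouped_quantile` and its `parts` parameter are illustrative assumptions, and the lower class limits 140, 150, …, 240 are used as lower class boundaries because the classes are continuous.

```python
def grouped_quantile(lower_boundaries, width, freqs, i, parts):
    """i-th quantile of a grouped distribution: L + (w / f) * (i * n / parts - c)."""
    n = sum(freqs)
    target = i * n / parts
    cumulative = 0  # less-than cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, freqs):
        # Quantile class: smallest cumulative frequency >= i * n / parts
        if f > 0 and cumulative + f >= target:
            return lower + (width / f) * (target - cumulative)
        cumulative += f
    raise ValueError("quantile class not found")


lowers = list(range(140, 250, 10))  # 140, 150, ..., 240
freqs = [17, 29, 42, 72, 84, 107, 49, 34, 31, 16, 12]

q3 = grouped_quantile(lowers, 10, freqs, 3, 4)      # 3rd quartile
d7 = grouped_quantile(lowers, 10, freqs, 7, 10)     # 7th decile
p90 = grouped_quantile(lowers, 10, freqs, 90, 100)  # 90th percentile
print(round(q3, 2), round(d7, 2), round(p90, 2))
```

With these inputs the 90th percentile position 443.7 falls in the class 220 – 230, whose cumulative frequency 465 is the smallest one that is at least 443.7.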
Interquartile range (IQR): Indicates the spread of the middle 50% of the observations, and is used together with the median.
IQR = Q3 - Q1
Properties of IQR:
• It is a simple and versatile measure.
• It encloses the central 50% of the observations.
• It is not based on all observations but only on two specific values.
• It is important in selecting cut-off points in the formulation of clinical standards.
• Since it excludes the lowest and highest 25% of values, it is not affected by extreme values.
• It is less sensitive to the size of the sample.
The Mean Deviation (M.D)
The mean deviation of a set of items is defined as the arithmetic mean of the absolute deviations from a given average. Depending on the type of average used, we have different mean deviations.
a) Mean deviation about the mean
$$M.D(\bar{x}) = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$
b) Mean deviation about the median
$$M.D(\tilde{x}) = \frac{\sum_{i=1}^{n} |x_i - \tilde{x}|}{n}$$
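Steps to calculate M.D
Example: for the data 4, 4, 5, 5, 5, 6, 7, 7, 8, 9 we have $\bar{x} = 6$, $\tilde{x} = 5.5$, $\hat{x} = 5$.
The absolute deviations of each observation from the mean, median and mode:
$X_i$            4    4    5    5    5    6    7    7    8    9    Total
$|X_i - 6|$      2    2    1    1    1    0    1    1    2    3    14
$|X_i - 5.5|$    1.5  1.5  0.5  0.5  0.5  0.5  1.5  1.5  2.5  3.5  14
$|X_i - 5|$      1    1    0    0    0    1    2    2    3    4    14
$$M.D(\bar{x}) = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n} = \frac{\sum_{i=1}^{10} |x_i - 6|}{10} = \frac{14}{10} = 1.4$$
$$M.D(\tilde{x}) = \frac{\sum_{i=1}^{n} |x_i - \tilde{x}|}{n} = \frac{\sum_{i=1}^{10} |x_i - 5.5|}{10} = \frac{14}{10} = 1.4$$
$$M.D(\hat{x}) = \frac{\sum_{i=1}^{n} |x_i - \hat{x}|}{n} = \frac{\sum_{i=1}^{10} |x_i - 5|}{10} = \frac{14}{10} = 1.4$$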
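The three mean deviations can be reproduced with Python's standard `statistics` module. This is a minimal sketch; `mean_deviation` is an illustrative helper name.

```python
import statistics


def mean_deviation(data, center):
    """Arithmetic mean of the absolute deviations from a chosen average."""
    return sum(abs(x - center) for x in data) / len(data)


data = [4, 4, 5, 5, 5, 6, 7, 7, 8, 9]
mean = statistics.mean(data)      # 6
median = statistics.median(data)  # 5.5
mode = statistics.mode(data)      # 5

md_mean = mean_deviation(data, mean)
md_median = mean_deviation(data, median)
md_mode = mean_deviation(data, mode)
print(md_mean, md_median, md_mode)  # 1.4 1.4 1.4
```

That all three values coincide is a feature of this particular data set, not a general property.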
For a grouped frequency distribution:
$$MD(A) = \frac{\sum_{i=1}^{k} f_i\,|m_i - A|}{n}$$
where $m_i$ is the class mark of the ith class, $n = \sum f_i$, and A is a measure of central tendency: the arithmetic mean, the median or the mode.
Exercise: Find the mean deviation about the mean, median and mode for the following distribution.
f 20 40 30 10
Population Variance and standard deviation
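The variance is the average of the squared distances of each value from the mean. The symbol for the population variance is $\sigma^2$. If $X_1, X_2, \ldots, X_N$ are the measurements on N population units, then the population variance is given by the formula:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} X_i^2 - \frac{\left(\sum_{i=1}^{N} X_i\right)^2}{N}}{N}, \quad i = 1, 2, \ldots, N$$
For data given as a frequency distribution with k distinct values and $N = \sum_{i=1}^{k} f_i$:
$$\sigma^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \mu)^2}{N}$$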
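For a sample given as a grouped frequency distribution with class marks $m_i$ and $n = \sum f_i$:
$$S^2 = \frac{\sum_{i=1}^{k} f_i (m_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\left[\sum f_i m_i^2 - \frac{\left(\sum f_i m_i\right)^2}{n}\right] = \frac{1}{n - 1}\left[\sum f_i m_i^2 - n\bar{x}^2\right]$$
$$S^2 = \frac{1}{n - 1}\left[\sum f_i m_i^2 - n\bar{x}^2\right] = \frac{1}{20 - 1}\left[13310 - 20 \times 24.5^2\right] = 68.68$$
$$S = \sqrt{68.68} = 8.287$$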
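The shortcut form of the sample variance can be checked against the definitional form in Python. The sketch below first reproduces the figures 68.68 and 8.287 from the summary values in the worked example, then, as an illustrative assumption, applies both forms to the age distribution from the grouped mean example (class marks 8, 13, …, 33), since the raw table for the variance example is not shown. The function name `grouped_sample_variance` is hypothetical.

```python
import math


def grouped_sample_variance(marks, freqs):
    """Return (definitional, shortcut) sample variance of a grouped distribution."""
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(marks, freqs)) / n
    definitional = sum(f * (m - mean) ** 2 for m, f in zip(marks, freqs)) / (n - 1)
    shortcut = (sum(f * m ** 2 for m, f in zip(marks, freqs)) - n * mean ** 2) / (n - 1)
    return definitional, shortcut


# Summary figures from the worked example: sum(f * m^2) = 13310, n = 20, mean = 24.5
s2 = (13310 - 20 * 24.5 ** 2) / (20 - 1)
s = math.sqrt(round(s2, 2))  # the example takes the root of the rounded variance
print(round(s2, 2), round(s, 3))  # 68.68 8.287

# Age distribution from the grouped mean example (illustrative reuse)
marks = [8, 13, 18, 23, 28, 33]
freqs = [35, 23, 15, 12, 9, 6]
definitional, shortcut = grouped_sample_variance(marks, freqs)
print(round(definitional, 2), round(shortcut, 2))
```

The two forms agree up to floating-point rounding; the shortcut form needs only the running totals $\sum f_i m_i$ and $\sum f_i m_i^2$.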
Properties of Variance and Standard Deviation
• Variance and standard deviation are affected by extremely high or low values.
• If the scores in the data set are all equal in value, there is no variability.
• Addition or subtraction of a constant k to or from every value of the original distribution will not affect the original variance and SD.
• If all values in the distribution are multiplied by a constant k, then the new variance = $k^2$ times the original variance, and the new SD = $|k|$ times the original SD.
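The shift and scale properties can be verified numerically with the standard `statistics` module. This is a minimal sketch: the data set is the one used in the mean deviation example, and the constant `k = 3` is an arbitrary illustrative choice.

```python
import math
import statistics

data = [4, 4, 5, 5, 5, 6, 7, 7, 8, 9]
k = 3  # arbitrary constant, chosen only for illustration

var0 = statistics.pvariance(data)
sd0 = statistics.pstdev(data)

shifted = [x + k for x in data]   # add k to every value
scaled = [-k * x for x in data]   # multiply every value by -k

# Adding a constant leaves variance and SD unchanged
print(math.isclose(statistics.pvariance(shifted), var0))
print(math.isclose(statistics.pstdev(shifted), sd0))

# Multiplying by -k scales the variance by k^2 and the SD by |k|
print(math.isclose(statistics.pvariance(scaled), k ** 2 * var0))
print(math.isclose(statistics.pstdev(scaled), abs(-k) * sd0))
```

Using a negative multiplier shows why the SD scales by $|k|$ rather than by $k$: a standard deviation is never negative.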
Coefficient of Variation(CV)
• Coefficient of variation is used to compare the variability of two
or more different series.
• Coefficient of variation is the ratio of the standard deviation to
the arithmetic mean, usually expressed in percent:
$$CV = \frac{S}{\bar{X}} \times 100\%$$
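Example:
Value   SBP         Cholesterol
Mean    130 mmHg    200 mg/dl
SD      15 mmHg     40 mg/dl
Which has the greater variability, SBP or cholesterol?
$$CV_{SBP} = \frac{S_{SBP}}{\bar{X}_{SBP}} \times 100\% = \frac{15 \text{ mmHg}}{130 \text{ mmHg}} \times 100\% = 11.5\%$$
$$CV_{C} = \frac{S_{C}}{\bar{X}_{C}} \times 100\% = \frac{40 \text{ mg/dl}}{200 \text{ mg/dl}} \times 100\% = 20.0\%$$
Since $CV_C > CV_{SBP}$, cholesterol shows the greater relative variability.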
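Because the units cancel, the coefficient of variation lets two series measured on different scales be compared directly. A minimal sketch, where `coefficient_of_variation` is an illustrative helper name:

```python
def coefficient_of_variation(sd, mean):
    """CV = (standard deviation / mean) * 100, expressed in percent."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean * 100


cv_sbp = coefficient_of_variation(15, 130)
cv_chol = coefficient_of_variation(40, 200)

print(round(cv_sbp, 1), round(cv_chol, 1))  # 11.5 20.0
print("cholesterol" if cv_chol > cv_sbp else "SBP", "has the greater relative variability")
```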
Exercise 1: The following data are the ages in years of 15 women who attended health education last year:
30, 41, 39, 41, 32, 29, 35, 31, 30, 36, 33, 37, 32, 30, and 41.
Calculate:
Exercise 2: Marks of 20 students are summarized in the
following frequency distribution
Measure of Shape
Skewness
Skewness is the degree of asymmetry, or departure from symmetry, of a distribution. Skewness is concerned with the shape of the curve, not its size.
Symmetrical distribution: the values are uniformly distributed around the mean (the distributions of the data below the mean and above the mean are equal), i.e. Mean = Median = Mode.
Positively skewed distribution: one or more observations are extremely large, i.e. the mean is greater than the median and the mode.
Negatively skewed distribution: one or more extremely small observations are present, i.e. the mean is smaller than the median and the mode.
Measure of Skewness
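The skewness of a distribution can be measured by Karl Pearson’s coefficient of skewness ($S_k$), for which the formula is given below:
$$S_k = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} = \frac{\bar{x} - \hat{x}}{S}$$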
Measures of kurtosis
The moment coefficient of kurtosis:
$$M_{ck} = \frac{M_4}{M_2^2} = \frac{M_4}{\sigma^4}$$
Where: $M_4$ is the fourth moment about the mean.
$M_2$ is the second moment about the mean.
$\sigma$ is the population standard deviation.
The peakedness depends on the value of $M_{ck}$.
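The moment coefficient of kurtosis can be computed directly from the second and fourth moments about the mean. The sketch below is illustrative only: the helper names are assumptions, and the data set is borrowed from the mean deviation example because no kurtosis data appear in this section.

```python
def central_moment(data, order):
    """Moment of the given order about the mean (population form, divide by n)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** order for x in data) / len(data)


def moment_kurtosis(data):
    """M_ck = M4 / M2^2, where M2 is the population variance."""
    m2 = central_moment(data, 2)
    m4 = central_moment(data, 4)
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2


data = [4, 4, 5, 5, 5, 6, 7, 7, 8, 9]
print(central_moment(data, 2), central_moment(data, 4))
print(round(moment_kurtosis(data), 3))
```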