
Measures of Central Tendency

Measures of central tendency give us information about the location of the center of the distribution of data values. Each is a single value that describes the characteristics of the entire mass of data.

Objectives of measures of central tendency

• To get a single value that represents (describes) the characteristics of the entire data set
• To summarize (reduce) the volume of the data
• To facilitate comparison within one group or between groups of data
• To provide a basis for further statistical analysis

Abriham S. 1
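Properties of Summation

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$$

$$\sum_{i=1}^{n} x_i^2 = x_1^2 + x_2^2 + \cdots + x_n^2$$

$$\sum_{i=1}^{n} x_i y_i = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$$

$$\sum_{i=1}^{n} x_i^2 f_i = x_1^2 f_1 + x_2^2 f_2 + \cdots + x_n^2 f_n$$

$$\sum_{i=1}^{n} x_i f_i = x_1 f_1 + x_2 f_2 + \cdots + x_n f_n$$

$$\sum_{i=1}^{n} k = nk, \quad \text{where } k \text{ is any constant}$$

$$\sum_{i=1}^{n} (x_i \pm y_i) = \sum_{i=1}^{n} x_i \pm \sum_{i=1}^{n} y_i$$

$$\sum_{i=1}^{n} k x_i = k \sum_{i=1}^{n} x_i, \quad \text{if } k \text{ is a constant}$$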
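Example 3.1: Considering the following data, determine the sums below.

X    5    7    7    10    8
Y    7    6    7     8    8

$$\sum_{i=1}^{5} x_i = 5 + 7 + 7 + 10 + 8 = 37$$

$$\sum_{i=1}^{5} y_i = 7 + 6 + 7 + 8 + 8 = 36$$

$$\sum_{i=1}^{5} 5 = 5 \times 5 = 25$$

$$\sum_{i=1}^{5} x_i^2 = 5^2 + 7^2 + 7^2 + 10^2 + 8^2 = 287$$

$$\sum_{i=1}^{5} x_i y_i = 5 \times 7 + 7 \times 6 + 7 \times 7 + 10 \times 8 + 8 \times 8 = 270$$

$$\left(\sum_{i=1}^{5} x_i\right)\left(\sum_{i=1}^{5} y_i\right) = 37 \times 36 = 1332$$

$$\sum_{i=1}^{5} (4x_i + 3y_i) = 4\sum_{i=1}^{5} x_i + 3\sum_{i=1}^{5} y_i = 4(37) + 3(36) = 256$$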
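The sums in Example 3.1 can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative and not part of the original notes.

```python
# Data from Example 3.1
x = [5, 7, 7, 10, 8]
y = [7, 6, 7, 8, 8]
n = len(x)

sum_x = sum(x)                                  # sum of x_i
sum_y = sum(y)                                  # sum of y_i
sum_const = sum(5 for _ in range(n))            # summing the constant 5 n times gives n*k
sum_x_sq = sum(xi ** 2 for xi in x)             # sum of x_i squared
sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # sum of the products x_i * y_i
product_of_sums = sum_x * sum_y                 # (sum of x) times (sum of y)

# Linearity: sum(4x + 3y) equals 4*sum(x) + 3*sum(y)
lhs = sum(4 * xi + 3 * yi for xi, yi in zip(x, y))
rhs = 4 * sum_x + 3 * sum_y
print(sum_x, sum_y, sum_const, sum_x_sq, sum_xy, product_of_sums, lhs, rhs)
```

Note that the sum of products (270) differs from the product of sums (1332), which is a common source of mistakes.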
Types of Measures of Central Tendency

There are several different measures of central tendency:

• The Mean (Arithmetic, Weighted, Geometric and Harmonic)

• The Mode

• The Median

• Quantiles (Quartiles, Deciles and Percentiles)

The Arithmetic Mean is the sum of all observations divided by the number of observations.

The mean of $x_1, x_2, \ldots, x_n$ is denoted by A.M. or $\bar{x}$ and is given by:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \quad i = 1, 2, \ldots, n$$
If the numbers $x_1, x_2, \ldots, x_k$ have the corresponding frequencies $f_1, f_2, \ldots, f_k$ respectively, the A.M. is obtained by

$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_k f_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$$

If data are given in the form of a grouped frequency distribution, the sample mean can be computed as

$$\bar{x} = \frac{m_1 f_1 + m_2 f_2 + \cdots + m_k f_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i}$$

where $m_i$ is the class mark of the $i$th class ($i = 1, 2, \ldots, k$), $f_i$ is the frequency of the $i$th class, and $k$ is the number of classes.

$$m_i = \frac{\text{Upper class limit} + \text{Lower class limit}}{2}$$
Example: Calculate the mean of the following frequency distribution.

x_i    2    3    7    8
f_i    2    1    3    1

Solution:

x_i      f_i    x_i f_i
2        2      4
3        1      3
7        3      21
8        1      8
Total    7      36

$$\bar{x} = \frac{\sum_{i=1}^{4} x_i f_i}{\sum_{i=1}^{4} f_i} = \frac{36}{7} = 5.14$$
Example: Calculate the mean for the following age distribution.

Class      Frequency
6 - 10     35
11 - 15    23
16 - 20    15
21 - 25    12
26 - 30    9
31 - 35    6

Solution:

• First find the class marks.
• Find the product of each frequency and its class mark.
• Find the mean using the formula.

Class      f_i    m_i    m_i f_i
6 - 10     35     8      280
11 - 15    23     13     299
16 - 20    15     18     270
21 - 25    12     23     276
26 - 30    9      28     252
31 - 35    6      33     198
Total      100           1575

$$\bar{x} = \frac{\sum_{i=1}^{6} m_i f_i}{\sum_{i=1}^{6} f_i} = \frac{1575}{100} = 15.75$$
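The three steps for a grouped mean (class marks, products, division by the total frequency) can be sketched in Python as follows. The function name `grouped_mean` is illustrative, not part of the original notes.

```python
# Age distribution from the example: (lower limit, upper limit) and frequency
classes = [(6, 10), (11, 15), (16, 20), (21, 25), (26, 30), (31, 35)]
freqs = [35, 23, 15, 12, 9, 6]

def grouped_mean(classes, freqs):
    """Mean of a grouped frequency distribution using class marks."""
    # Step 1: class mark = (lower limit + upper limit) / 2
    marks = [(lo + hi) / 2 for lo, hi in classes]
    # Step 2: sum of frequency * class mark
    total = sum(f * m for f, m in zip(freqs, marks))
    # Step 3: divide by the total frequency
    return total / sum(freqs)

mean_age = grouped_mean(classes, freqs)
print(mean_age)  # 1575 / 100 = 15.75
```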
Properties of Arithmetic Mean

The sum of the deviations of a set of items from their mean is always zero, i.e.

$$\sum_{i=1}^{n} (X_i - \bar{X}) = 0$$

When a set of observations is divided into $k$ groups and $\bar{x}_1$ is the mean of the $n_1$ observations of group 1, $\bar{x}_2$ is the mean of the $n_2$ observations of group 2, …, $\bar{x}_k$ is the mean of the $n_k$ observations of group $k$, then the combined mean of all observations taken together, denoted by $\bar{x}_c$, is given by

$$\bar{x}_c = \frac{\bar{x}_1 n_1 + \bar{x}_2 n_2 + \cdots + \bar{x}_k n_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum_{i=1}^{k} \bar{x}_i n_i}{\sum_{i=1}^{k} n_i}$$
Example: In a class there are 30 females and 70 males. If the females averaged 60 in an examination and the males averaged 72, find the mean for the entire class.

Solution:

Given: $\bar{x}_f = 60$, $n_f = 30$, $\bar{x}_m = 72$, $n_m = 70$

$$\bar{x}_c = \frac{\bar{x}_f n_f + \bar{x}_m n_m}{n_f + n_m} = \frac{60 \times 30 + 72 \times 70}{30 + 70} = 68.40$$

The effect of transforming the original series on the mean:

a) If a constant k is added to (or subtracted from) every observation, the new mean is the old mean plus (or minus) k.
b) If every observation is multiplied by a constant k, the new mean is k times the old mean.
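A short Python sketch of the combined mean and of the two transformation rules is given below. The helper `combined_mean` and the small list `data` are hypothetical and used only for illustration; the class figures come from the example.

```python
from statistics import mean

def combined_mean(means, sizes):
    """Combined mean of several groups from their means and sizes."""
    return sum(m * n for m, n in zip(means, sizes)) / sum(sizes)

# Females: mean 60, n = 30; males: mean 72, n = 70
class_mean = combined_mean([60, 72], [30, 70])
print(class_mean)  # about 68.4

# Transformation rules, checked on a small made-up data set
data = [2, 4, 6, 8]
k = 3
old_mean = mean(data)
shifted_mean = mean([v + k for v in data])   # adding k shifts the mean by k
scaled_mean = mean([v * k for v in data])    # multiplying by k scales the mean by k
print(old_mean, shifted_mean, scaled_mean)
```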
Advantages of Arithmetic Mean:
• It is rigidly defined.
• It is based on all observations.
• It is suitable for further mathematical treatment.
• It is a stable average, i.e. it is relatively little affected by fluctuations of sampling.
• It is easy to calculate and simple to understand.
Disadvantages of Arithmetic Mean:
• It is affected by extreme observations.
• It cannot be used in the case of open-ended classes.
• It cannot be determined by inspection.
• It cannot be used when dealing with qualitative characteristics, such as intelligence, honesty, or beauty.
• It can be a number which does not exist in the series.
• Sometimes it leads to a wrong conclusion if the details of the data from which it is obtained are not available.
• It gives high weight to high extreme values and less weight to low extreme values.
Weighted Mean
When different data items are to be given different degrees of importance, a weighted mean is appropriate.
Weights are assigned to each item in proportion to its relative importance.
Let $x_1, x_2, \ldots, x_n$ be the values of the items of a series and $w_1, w_2, \ldots, w_n$ their corresponding weights. Then the weighted mean, denoted $\bar{x}_w$, is defined as:

$$\bar{x}_w = \frac{\sum_{i=1}^{n} x_i w_i}{\sum_{i=1}^{n} w_i}$$

Example: A student obtained the following percentages in an examination: English 60, Biology 75, Mathematics 63, Physics 59, and Chemistry 55. Find the student's weighted arithmetic mean if weights 1, 2, 1, 3, 3 respectively are allotted to the subjects.

$$\bar{x}_w = \frac{\sum_{i=1}^{n} x_i w_i}{\sum_{i=1}^{n} w_i} = \frac{60 \times 1 + 75 \times 2 + 63 \times 1 + 59 \times 3 + 55 \times 3}{1 + 2 + 1 + 3 + 3} = \frac{615}{10} = 61.5$$
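The weighted mean calculation can be sketched in Python as below. The function name `weighted_mean` is illustrative; the scores and weights are those used in the worked calculation.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# English, Biology, Mathematics, Physics, Chemistry
scores = [60, 75, 63, 59, 55]
weights = [1, 2, 1, 3, 3]

result = weighted_mean(scores, weights)
print(result)  # 615 / 10 = 61.5
```

With equal weights the function reduces to the ordinary arithmetic mean.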
Geometric Mean

The geometric mean, like the arithmetic mean, is a calculated average. It is used when observed values are measured as ratios, percentages, proportions, indices or growth rates.

The geometric mean, G.M., of a series of numbers $x_1, x_2, \ldots, x_n$ is defined as

$$GM = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = \sqrt[n]{\prod_{i=1}^{n} x_i}$$

$$\log GM = \frac{\sum_{i=1}^{n} \log x_i}{n}$$

$$GM = \text{antilog}\left(\frac{\sum_{i=1}^{n} \log x_i}{n}\right)$$

Example: Find the geometric mean of the numbers 2, 4, 8.

$$GM = \sqrt[3]{2 \times 4 \times 8} = \sqrt[3]{64} = 4$$
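• When the observed values $x_1, x_2, \ldots, x_k$ have the corresponding frequencies $f_1, f_2, \ldots, f_k$ respectively, with $n = \sum_{i=1}^{k} f_i$, the geometric mean is obtained by

$$GM = \sqrt[n]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_k^{f_k}} = \sqrt[n]{\prod_{i=1}^{k} x_i^{f_i}}$$

$$\log GM = \frac{\sum_{i=1}^{k} f_i \log x_i}{n}$$

$$GM = \text{antilog}\left(\frac{\sum_{i=1}^{k} f_i \log x_i}{n}\right)$$

Example: Compute the geometric mean.

Values       2    4    6    8    10
Frequency    1    2    2    2    1

Solution: Here $n = 1 + 2 + 2 + 2 + 1 = 8$.

$$GM = \sqrt[n]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_k^{f_k}} = \sqrt[8]{2^1 \times 4^2 \times 6^2 \times 8^2 \times 10^1} = 5.41$$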
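The logarithmic form of the geometric mean is convenient for computation because it avoids very large products. The sketch below uses natural logarithms (any base works as long as the antilog uses the same base); the function name `geometric_mean` is illustrative.

```python
import math

def geometric_mean(values, freqs=None):
    """Geometric mean via logs: antilog(sum(f * log x) / n), with n = sum(f)."""
    if freqs is None:
        freqs = [1] * len(values)       # ungrouped data: every frequency is 1
    n = sum(freqs)
    log_sum = sum(f * math.log(v) for v, f in zip(values, freqs))
    return math.exp(log_sum / n)        # exp is the antilog of the natural log

gm_simple = geometric_mean([2, 4, 8])                          # cube root of 64
gm_freq = geometric_mean([2, 4, 6, 8, 10], [1, 2, 2, 2, 1])    # about 5.41
print(round(gm_simple, 2), round(gm_freq, 2))
```

The function assumes all values are strictly positive, since the logarithm is undefined otherwise.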
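If the frequency distribution is grouped (continuous), the class marks $m_i$ of the class intervals are used in place of the values, with $n = \sum_{i=1}^{k} f_i$:

$$GM = \sqrt[n]{m_1^{f_1} \cdot m_2^{f_2} \cdots m_k^{f_k}} = \sqrt[n]{\prod_{i=1}^{k} m_i^{f_i}}$$

$$\log GM = \frac{\sum_{i=1}^{k} f_i \log m_i}{n}$$

$$GM = \text{antilog}\left(\frac{\sum_{i=1}^{k} f_i \log m_i}{n}\right)$$

Exercise 3.1: Compute the geometric mean.

Class     f
2 - 4     20
4 - 6     40
6 - 8     30
8 - 10    10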
Properties of geometric mean
a. Its calculation is not easy.
b. It involves all observations in the computation.
c. It may not be defined if even a single observation is negative.
d. If the value of any one observation is zero, the geometric mean becomes zero.
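Harmonic Mean
• The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the individual observations. If $x_1, x_2, \ldots, x_n$ are n observations, then the harmonic mean is given by:

$$HM = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

Example: Find the harmonic mean of 0.3, 0.4, 0.5 and 0.2.

$$HM = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \frac{1}{x_4}} = \frac{4}{\frac{1}{0.3} + \frac{1}{0.4} + \frac{1}{0.5} + \frac{1}{0.2}} = 0.31$$

If the data are arranged in the form of a frequency distribution, with $n = \sum_{i=1}^{k} f_i$:

$$HM = \frac{\sum_{i=1}^{k} f_i}{\sum_{i=1}^{k} \frac{f_i}{x_i}} = \frac{f_1 + \cdots + f_k}{\frac{f_1}{x_1} + \cdots + \frac{f_k}{x_k}} = \frac{n}{\sum_{i=1}^{k} \frac{f_i}{x_i}}$$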
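Both harmonic mean formulas (with and without frequencies) can be sketched with one small Python function. The name `harmonic_mean` and the optional `freqs` argument are illustrative choices.

```python
def harmonic_mean(values, freqs=None):
    """n divided by the sum of f / x, with n = sum of the frequencies."""
    if freqs is None:
        freqs = [1] * len(values)       # ungrouped data: every frequency is 1
    n = sum(freqs)
    return n / sum(f / v for v, f in zip(values, freqs))

hm = harmonic_mean([0.3, 0.4, 0.5, 0.2])
print(round(hm, 2))  # 4 / 12.8333... is about 0.31
```

The values must be nonzero, since the reciprocal of zero is undefined.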
If the frequency distribution is grouped (continuous), the class marks $m_i$ of the class intervals are used:

$$HM = \frac{n}{\sum_{i=1}^{k} \frac{f_i}{m_i}}, \qquad m_i = \frac{\text{Upper class limit} + \text{Lower class limit}}{2}$$

Exercise: Find the harmonic mean for the data in Exercise 3.1.

Properties of harmonic mean
i. It is based on all observations in a distribution.
ii. It is used in situations where small weight is given to larger observations and larger weight to smaller observations.
iii. It is difficult to calculate and understand.
iv. It is an appropriate measure of central tendency in situations where the data are ratios, speeds or rates.
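The Median

The median of a set of items (numbers) arranged in order of magnitude (i.e. in array form) is the middle value or the arithmetic mean of the two middle values. The median is denoted by $\tilde{x}$.

$$\tilde{x} = \begin{cases} X_{\left(\frac{n+1}{2}\right)}, & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}\right), & \text{if } n \text{ is even} \end{cases}$$

Example: Find the median of the following numbers.

a) 6, 5, 2, 8, 9, 4. First order the data: 2, 4, 5, 6, 8, 9. Here n = 6.

$$\tilde{x} = \frac{1}{2}\left(X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}\right) = \frac{1}{2}\left(X_{(3)} + X_{(4)}\right) = \frac{1}{2}(5 + 6) = 5.5$$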
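The odd/even rule for the median can be sketched in Python as follows. The function is written out by hand to mirror the formula; Python's standard `statistics.median` gives the same result.

```python
def median(values):
    """Middle value of the ordered data, or the mean of the two middle values."""
    ordered = sorted(values)            # arrange in increasing order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]             # the ((n + 1) / 2)-th item, 1-based
    # the (n / 2)-th and (n / 2 + 1)-th items, 1-based
    return (ordered[mid - 1] + ordered[mid]) / 2

med = median([6, 5, 2, 8, 9, 4])
print(med)  # (5 + 6) / 2 = 5.5
```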
Exercise: Find the median for the following data.

x    6.7    6.2    8.4    6.9    7.1
f    5      2      8      7      7
Median for Grouped Data
If data are given in the form of a continuous frequency distribution, the median is defined as:

$$\tilde{x} = L_{med} + \frac{w}{f_{med}}\left(\frac{n}{2} - c\right)$$

Where:
$L_{med}$ = lower boundary of the median class
$w$ = class width of the median class
$f_{med}$ = frequency of the median class
$n$ = the sum of the frequencies
$c$ = cumulative frequency of the class that precedes the median class
Example: Calculate the median for the following frequency distribution.

Class           0-5    5-10    10-15    15-20    20-25
Frequency       5      8       10       8        5
Less than cf    5      13      23       31       36

To obtain the median class, compute n/2 = 36/2 = 18. The smallest cumulative frequency that is greater than or equal to 18 is 23.
Hence the median class is the 3rd class, which is 10 - 15.
Given: $L_{med} = 10$, $w = 5$, $f_{med} = 10$, $c = 13$, $n = 36$

$$\tilde{x} = L_{med} + \frac{w}{f_{med}}\left(\frac{n}{2} - c\right) = 10 + \frac{5}{10}(18 - 13) = 12.5$$
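The grouped median procedure (locate the median class from the cumulative frequencies, then interpolate inside it) can be sketched as below. The function name and its argument layout are assumptions made for this illustration; equal class widths are assumed.

```python
def grouped_median(lower_bounds, width, freqs):
    """Median of a continuous frequency distribution with equal class widths."""
    n = sum(freqs)
    target = n / 2
    cumulative = 0                      # cumulative frequency before the current class
    for lower, f in zip(lower_bounds, freqs):
        # The median class is the first class whose cumulative frequency reaches n/2
        if cumulative + f >= target:
            return lower + (width / f) * (target - cumulative)
        cumulative += f
    raise ValueError("frequency list is empty")

med = grouped_median([0, 5, 10, 15, 20], 5, [5, 8, 10, 8, 5])
print(med)  # 10 + (5 / 10) * (18 - 13) = 12.5
```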
Properties of Median

• The median is used to find the center or middle value of a data set.
• The median is used when it is necessary to find out whether the data values fall into the upper half or lower half of the distribution.
• The median can be used for an open-ended distribution.
• The median is affected less than the mean by extremely high or extremely low values.
The Mode
The mode is the value which occurs most frequently in a set of values. The mode may not exist, and even if it does exist, it may not be unique. In the case of a discrete distribution, the value having the maximum frequency is the modal value.

Note: If a distribution has two modal values it is called bimodal, and if it has more than two it is called multimodal. If, in a set of observed values, all values occur once or an equal number of times, there is no mode. The mode is also useful in finding the most typical case when the data are nominal or categorical.
Examples:
1. Find the mode of 5, 3, 5, 8, 9. Mode = 5
2. Find the mode of 8, 9, 9, 7, 8, 2, and 5. The data are bimodal: 8 and 9
3. Find the mode of 4, 12, 3, 6, and 7. There is no mode for these data.
The mode of a set of numbers $x_1, x_2, \ldots, x_n$ is usually denoted by $\hat{x}$.
Mode for Grouped Data
If data are given in the form of a continuous frequency distribution, the mode is defined as:

$$\hat{x} = L_{mo} + w\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right)$$

Where:
$L_{mo}$ = lower class boundary of the modal class
$w$ = class width of the modal class
$\Delta_1 = f_{mo} - f_1$
$\Delta_2 = f_{mo} - f_2$
$f_{mo}$ = frequency of the modal class
$f_1$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class following the modal class
Example: Find the mode for the frequency distribution of the birth weight (in kilograms) of 30 children given below.

Class    1.9-2.3    2.3-2.7    2.7-3.1    3.1-3.5    3.5-3.9    3.9-4.3
f        5          5          9          4          4          3

Solution:
2.7 - 3.1 is the modal class, since it is the class with the highest frequency.

$L_{mo} = 2.7$, $w = 0.4$, $\Delta_1 = f_{mo} - f_1 = 9 - 5 = 4$, $\Delta_2 = f_{mo} - f_2 = 9 - 4 = 5$

$$\hat{x} = L_{mo} + w\left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) = 2.7 + 0.4\left(\frac{4}{4 + 5}\right) = 2.878$$
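The grouped mode formula can be sketched in Python as follows. The function name is illustrative. Two assumptions are made that the notes do not state: when the modal class is the first or last class, the missing neighbouring frequency is taken as 0, and when several classes tie for the highest frequency, the first one is used.

```python
def grouped_mode(lower_bounds, width, freqs):
    """Mode of a continuous frequency distribution with equal class widths."""
    # Index of the modal class (first class with the highest frequency)
    i = max(range(len(freqs)), key=lambda j: freqs[j])
    f_mo = freqs[i]
    f_prev = freqs[i - 1] if i > 0 else 0                # assumption: 0 if no preceding class
    f_next = freqs[i + 1] if i < len(freqs) - 1 else 0   # assumption: 0 if no following class
    delta1 = f_mo - f_prev
    delta2 = f_mo - f_next
    return lower_bounds[i] + width * delta1 / (delta1 + delta2)

mode_weight = grouped_mode([1.9, 2.3, 2.7, 3.1, 3.5, 3.9], 0.4, [5, 5, 9, 4, 4, 3])
print(round(mode_weight, 3))  # 2.7 + 0.4 * 4 / 9 is about 2.878
```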
Advantages of Mode
• It is not affected by extreme observations.

• Easy to calculate and simple to understand.

• It can be calculated for distribution with open end class

Disadvantages of Mode
• It is not rigidly defined.

• It is not based on all observations

• It is not suitable for further mathematical treatment.

• It is not a stable average, i.e. it is affected by fluctuations of sampling to some extent.

• Often its value is not unique.
Quartiles
Quartiles are three points which divide the arranged set of data into four equal parts.
• The Lower Quartile ($Q_1$) is the value below which one-fourth (25%) of the ordered data set lies.
• The Middle Quartile ($Q_2$) is the value below which two-fourths (50%) of the ordered data set lies; it is equal to the median.
• The Upper Quartile ($Q_3$) is the value below which three-fourths (75%) of the ordered data set lies.
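Steps to calculate Quartiles for Ungrouped Data

Arrange the data in increasing order of magnitude. The quartile $Q_k$ ($k = 1, 2, 3$) is the item at the following position:

$$Q_k = \begin{cases} \left(\dfrac{k(n+1)}{4}\right)^{th} \text{ item}, & \text{if } n \text{ is odd} \\ \left(\dfrac{\frac{kn}{4} + \left(\frac{kn}{4} + 1\right)}{2}\right)^{th} \text{ item}, & \text{if } n \text{ is even} \end{cases}$$

Example: Find the quartiles for the following data.

1, 2, 4, 6, 2, 2, 4, 5, 6, 3, 4, 2, 6, 7, 8, 9

Solution: First arrange the data in increasing order (n = 16):

1, 2, 2, 2, 2, 3, 4, 4, 4, 5, 6, 6, 6, 7, 8, 9

$$Q_1 = \left(\frac{\frac{1 \times 16}{4} + \left(\frac{1 \times 16}{4} + 1\right)}{2}\right)^{th} = 4.5^{th} \text{ item} = 4^{th} + 0.5\,(5^{th} - 4^{th}) = 2 + 0.5(2 - 2) = 2$$

$$Q_2 = \left(\frac{\frac{2 \times 16}{4} + \left(\frac{2 \times 16}{4} + 1\right)}{2}\right)^{th} = 8.5^{th} \text{ item} = 8^{th} + 0.5\,(9^{th} - 8^{th}) = 4 + 0.5(4 - 4) = 4$$

$Q_3$ = ?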
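The positional rule for ungrouped quartiles can be sketched in Python as below. The function name `quantile_by_position` and the `parts` argument are illustrative; clamping the position to the range 1 to n is an added assumption to keep the index valid. Note that statistical packages use several other quantile conventions, so their results may differ slightly from this rule.

```python
def quantile_by_position(values, k, parts=4):
    """k-th quantile using the odd/even positional rule (parts=4 gives quartiles)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        position = k * (n + 1) / parts
    else:
        # Average of the (kn/parts)-th and (kn/parts + 1)-th positions
        position = (k * n / parts + (k * n / parts + 1)) / 2
    # Assumption: clamp to the valid 1-based range [1, n]
    position = min(max(position, 1), n)
    whole = int(position)               # whole part of the position
    fraction = position - whole         # fractional part used for interpolation
    lower_value = ordered[whole - 1]    # convert 1-based position to 0-based index
    upper_value = ordered[min(whole, n - 1)]
    return lower_value + fraction * (upper_value - lower_value)

data = [1, 2, 4, 6, 2, 2, 4, 5, 6, 3, 4, 2, 6, 7, 8, 9]
q1 = quantile_by_position(data, 1)      # 4.5th position
q2 = quantile_by_position(data, 2)      # 8.5th position
print(q1, q2)
```

Calling the function with `k=3` applies the same rule to the third quartile, and `parts=10` or `parts=100` applies it to deciles and percentiles.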
For grouped data we have the following formula:

$$Q_i = L_{Q_i} + \frac{w}{f_{Q_i}}\left(\frac{i \cdot n}{4} - c\right), \quad i = 1, 2, 3$$

Where:
$L_{Q_i}$ = lower class boundary of the quartile class
$w$ = class width of the quartile class
$n$ = total number of observations
$c$ = cumulative frequency (less than type) of the class that precedes the quartile class
$f_{Q_i}$ = frequency of the quartile class

Remark: The quartile class (the class containing $Q_i$) is the class with the smallest cumulative frequency greater than or equal to $\frac{i \cdot n}{4}$.
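Deciles
• Deciles are measures which divide the arranged distribution into ten equal parts.
• The values of the variable corresponding to these divisions are denoted $D_1, D_2, \ldots, D_9$, often called the first, the second, …, the ninth decile respectively.

For ungrouped data:

$$D_k = \begin{cases} \left(\dfrac{k(n+1)}{10}\right)^{th} \text{ item}, & \text{if } n \text{ is odd} \\ \left(\dfrac{\frac{kn}{10} + \left(\frac{kn}{10} + 1\right)}{2}\right)^{th} \text{ item}, & \text{if } n \text{ is even} \end{cases}$$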
For grouped data we have the following formula:

$$D_i = L_{D_i} + \frac{w}{f_{D_i}}\left(\frac{i \cdot n}{10} - c\right), \quad i = 1, 2, \ldots, 9$$

Where:
$L_{D_i}$ = lower class boundary of the decile class
$w$ = class width of the decile class
$n$ = total number of observations
$c$ = cumulative frequency (less than type) of the class that precedes the decile class
$f_{D_i}$ = frequency of the decile class

Remark: The decile class (the class containing $D_i$) is the class with the smallest cumulative frequency greater than or equal to $\frac{i \cdot n}{10}$.
Percentiles
• Percentiles are measures that divide the arranged distribution into a hundred equal parts.
• The values of the variable corresponding to these divisions are denoted $P_1, P_2, \ldots, P_{99}$, often called the first, the second, …, the ninety-ninth percentile respectively.

For ungrouped data:

$$P_k = \begin{cases} \left(\dfrac{k(n+1)}{100}\right)^{th} \text{ item}, & \text{if } n \text{ is odd} \\ \left(\dfrac{\frac{kn}{100} + \left(\frac{kn}{100} + 1\right)}{2}\right)^{th} \text{ item}, & \text{if } n \text{ is even} \end{cases}$$
For grouped data we have the following formula:

$$P_i = L_{P_i} + \frac{w}{f_{P_i}}\left(\frac{i \cdot n}{100} - c\right), \quad i = 1, 2, \ldots, 99$$

Where:
$L_{P_i}$ = lower class boundary of the percentile class
$w$ = class width of the percentile class
$n$ = total number of observations
$c$ = cumulative frequency (less than type) of the class that precedes the percentile class
$f_{P_i}$ = frequency of the percentile class

Remark: The percentile class (the class containing $P_i$) is the class with the smallest cumulative frequency greater than or equal to $\frac{i \cdot n}{100}$.
Example: Consider the following distribution.

Class        Frequency
140 - 150    17
150 - 160    29
160 - 170    42
170 - 180    72
180 - 190    84
190 - 200    107
200 - 210    49
210 - 220    34
220 - 230    31
230 - 240    16
240 - 250    12

Calculate:
a) The 3rd quartile
b) The 7th decile
c) The 90th percentile
Solution:
First find the less than cumulative frequencies, then use the formulas to calculate the required quantiles.

Class        Frequency    < Cf
140 - 150    17           17
150 - 160    29           46
160 - 170    42           88
170 - 180    72           160
180 - 190    84           244
190 - 200    107          351
200 - 210    49           400
210 - 220    34           434
220 - 230    31           465
230 - 240    16           481
240 - 250    12           493
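a. First find the class containing the 3rd quartile:

$$\left(\frac{i \cdot n}{4}\right)^{th} = \left(\frac{3 \times 493}{4}\right)^{th} = 369.75^{th}$$

The smallest cumulative frequency greater than or equal to 369.75 is 400, so the class containing the 3rd quartile is 200 - 210.
Given: $L_{Q_3} = 200$, $w = 10$, $n = 493$, $c = 351$, $f_{Q_3} = 49$

$$Q_3 = L_{Q_3} + \frac{w}{f_{Q_3}}\left(\frac{3n}{4} - c\right) = 200 + \frac{10}{49}\left(\frac{3 \times 493}{4} - 351\right) = 203.83$$

b. First find the class containing the 7th decile:

$$\left(\frac{i \cdot n}{10}\right)^{th} = \left(\frac{7 \times 493}{10}\right)^{th} = 345.1^{th}$$

The smallest cumulative frequency greater than or equal to 345.1 is 351, so 190 - 200 is the class containing the 7th decile.
Given: $L_{D_7} = 190$, $f_{D_7} = 107$, $w = 10$, $c = 244$

$$D_7 = L_{D_7} + \frac{w}{f_{D_7}}\left(\frac{7n}{10} - c\right) = 190 + \frac{10}{107}\left(\frac{7 \times 493}{10} - 244\right) = 199.45$$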
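c. First determine the class containing the 90th percentile:

$$\left(\frac{i \cdot n}{100}\right)^{th} = \left(\frac{90 \times 493}{100}\right)^{th} = 443.7^{th}$$

The smallest cumulative frequency greater than or equal to 443.7 is 465, so 220 - 230 is the class containing the 90th percentile.

Given: $L_{P_{90}} = 220$, $f_{P_{90}} = 31$, $n = 493$, $w = 10$, $c = 434$

$$P_{90} = L_{P_{90}} + \frac{w}{f_{P_{90}}}\left(\frac{90n}{100} - c\right) = 220 + \frac{10}{31}\left(\frac{90 \times 493}{100} - 434\right) = 223.13$$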
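Quartiles, deciles and percentiles for grouped data share one formula and differ only in the divisor (4, 10 or 100). The sketch below uses a single hypothetical helper, `grouped_quantile`, and assumes equal class widths.

```python
def grouped_quantile(lower_bounds, width, freqs, i, parts):
    """i-th quantile of a grouped distribution; parts = 4, 10 or 100."""
    n = sum(freqs)
    target = i * n / parts
    cumulative = 0                      # cumulative frequency before the current class
    for lower, f in zip(lower_bounds, freqs):
        # Quantile class: first class whose cumulative frequency reaches the target
        if cumulative + f >= target:
            return lower + (width / f) * (target - cumulative)
        cumulative += f
    raise ValueError("frequency list is empty")

lower_bounds = list(range(140, 250, 10))             # 140, 150, ..., 240
freqs = [17, 29, 42, 72, 84, 107, 49, 34, 31, 16, 12]

q3 = grouped_quantile(lower_bounds, 10, freqs, 3, 4)      # about 203.83
d7 = grouped_quantile(lower_bounds, 10, freqs, 7, 10)     # about 199.45
p90 = grouped_quantile(lower_bounds, 10, freqs, 90, 100)  # about 223.13
print(round(q3, 2), round(d7, 2), round(p90, 2))
```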
Exercise
Based on the above data, find the following measures:
a) The 1st and 2nd quartiles
b) The 5th decile
c) The 20th percentile
Exercise
The marks of 75 students are summarized in the following frequency distribution:

Marks      No. of students
40 - 44    7
45 - 49    10
50 - 54    22
55 - 59    f_4
60 - 64    f_5
65 - 69    6
70 - 74    3

If 20% of the students have marks between 60 and 64:
i. Find the missing frequencies $f_4$ and $f_5$.
ii. Find the mean, median and mode.
iii. Find all quartiles, the 7th decile and the 70th percentile.
MEASURES OF VARIATION (DISPERSION)
A measure of variation indicates the degree to which numerical data tend to spread about an average value.
The objectives of measures of variation are:
• To control variability
• To compare two or more groups in terms of variability
• To make further statistical analysis

The most commonly used measures of dispersion are:
1) Range
2) Interquartile range (IQR)
3) Mean deviation
4) Variance and standard deviation
5) Coefficient of variation
The Range
The range is the difference between the largest and smallest observations in a sample.
Range = Maximum value - Minimum value
Properties of range
• It is the simplest, crude measure and can be easily understood.
• It takes into account only two values, which makes it a poor measure of dispersion.
• It is very sensitive to extreme observations.
• It tends to increase as the sample size increases.
Interquartile range (IQR): Indicates the spread of the middle 50% of the observations, and is used together with the median.

IQR = Q3 - Q1

Properties of IQR:
• It is a simple and versatile measure.
• It encloses the central 50% of the observations.
• It is not based on all observations but only on two specific values.
• It is important in selecting cut-off points in the formulation of clinical standards.
• Since it excludes the lowest and highest 25% of values, it is not affected by extreme values.
• It is less sensitive to the size of the sample.
The Mean Deviation (M.D)
The mean deviation of a set of items is defined as the arithmetic mean of the absolute values of the deviations from a given average. Depending on the type of average used, we have different mean deviations.

a) Mean deviation about the mean

$$M.D(\bar{x}) = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$

For a frequency distribution with $k$ distinct values and $n = \sum_{i=1}^{k} f_i$ it is given as:

$$M.D(\bar{x}) = \frac{\sum_{i=1}^{k} f_i\,|x_i - \bar{x}|}{n}$$
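b) Mean deviation about the median

$$M.D(\tilde{x}) = \frac{\sum_{i=1}^{n} |x_i - \tilde{x}|}{n}$$

For a frequency distribution it is given as:

$$M.D(\tilde{x}) = \frac{\sum_{i=1}^{k} f_i\,|x_i - \tilde{x}|}{n}$$

c) Mean deviation about the mode

$$M.D(\hat{x}) = \frac{\sum_{i=1}^{n} |x_i - \hat{x}|}{n}$$

For a frequency distribution it is given as:

$$M.D(\hat{x}) = \frac{\sum_{i=1}^{k} f_i\,|x_i - \hat{x}|}{n}$$

In the frequency forms, $k$ is the number of distinct values and $n = \sum_{i=1}^{k} f_i$.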
Steps to calculate M.D

1. Find the average concerned: the mean $\bar{x}$, the median $\tilde{x}$ or the mode $\hat{x}$.
2. Find the deviation of each observation from $\bar{x}$, $\tilde{x}$ or $\hat{x}$.
3. Find the arithmetic mean of the deviations, ignoring sign.

Example: The following are the numbers of visits made by ten mothers to the local doctor's surgery: 8, 6, 5, 5, 7, 4, 5, 9, 7, 4. Find the mean deviation about the mean, median and mode.

Solution: First order the data and calculate the three averages: 4, 4, 5, 5, 5, 6, 7, 7, 8, 9

$\bar{x} = 6$, $\tilde{x} = 5.5$, $\hat{x} = 5$
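The absolute deviations of each observation from the mean, median and mode:

X_i            4     4     5     5     5     6     7     7     8     9     Total
|X_i - 6|      2     2     1     1     1     0     1     1     2     3     14
|X_i - 5.5|    1.5   1.5   0.5   0.5   0.5   0.5   1.5   1.5   2.5   3.5   14
|X_i - 5|      1     1     0     0     0     1     2     2     3     4     14

$$M.D(\bar{x}) = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n} = \frac{\sum_{i=1}^{10} |x_i - 6|}{10} = \frac{14}{10} = 1.4$$

$$M.D(\tilde{x}) = \frac{\sum_{i=1}^{n} |x_i - \tilde{x}|}{n} = \frac{\sum_{i=1}^{10} |x_i - 5.5|}{10} = \frac{14}{10} = 1.4$$

$$M.D(\hat{x}) = \frac{\sum_{i=1}^{n} |x_i - \hat{x}|}{n} = \frac{\sum_{i=1}^{10} |x_i - 5|}{10} = \frac{14}{10} = 1.4$$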
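The three mean deviations can be checked with a short Python sketch. The helper `mean_deviation` is illustrative; the averages come from Python's standard `statistics` module.

```python
from statistics import mean, median, mode

def mean_deviation(values, center):
    """Arithmetic mean of the absolute deviations from a chosen average."""
    return sum(abs(v - center) for v in values) / len(values)

visits = [8, 6, 5, 5, 7, 4, 5, 9, 7, 4]

md_mean = mean_deviation(visits, mean(visits))        # about the mean (6)
md_median = mean_deviation(visits, median(visits))    # about the median (5.5)
md_mode = mean_deviation(visits, mode(visits))        # about the mode (5)
print(md_mean, md_median, md_mode)
```

For this data set the three results coincide at 1.4; in general they differ.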
For a grouped frequency distribution:

$$MD(A) = \frac{\sum_{i=1}^{k} f_i\,|m_i - A|}{n}$$

where $m_i$ is the class mark of the $i$th class, $n = \sum_{i=1}^{k} f_i$, and $A$ is a measure of central tendency: the arithmetic mean, the median or the mode.

Exercise: Find the mean deviation about the mean, median and mode for the following distribution.

Class    2-4    4-6    6-8    8-10
f        20     40     30     10
Population Variance and Standard Deviation
The variance is the average of the squared distances of the values from the mean. The symbol for the population variance is $\sigma^2$. If $X_1, X_2, \ldots, X_N$ are the measurements on N population units, then the population variance is given by the formula:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} X_i^2 - \frac{\left(\sum_{i=1}^{N} X_i\right)^2}{N}}{N}, \quad i = 1, 2, \ldots, N$$

For a frequency distribution with $k$ distinct values and $N = \sum_{i=1}^{k} f_i$:

$$\sigma^2 = \frac{\sum_{i=1}^{k} f_i (X_i - \mu)^2}{N}$$

• The standard deviation is the square root of the variance. The symbol for the population standard deviation is $\sigma$. The corresponding formula for the standard deviation is

$$\sigma = \sqrt{\sigma^2}$$
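Sample Variance
For the sample variance, the sum of the squares of the deviations is divided by one less than the sample size.

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}{n - 1}$$

For a frequency distribution with $n = \sum f_i$ it is expressed as:

$$S^2 = \frac{1}{n - 1}\sum f_i (x_i - \bar{x})^2 = \frac{\sum f_i x_i^2 - n\bar{x}^2}{n - 1}$$

For grouped data:

$$S^2 = \frac{\sum_{i=1}^{k} f_i (m_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\left[\sum f_i m_i^2 - \frac{\left(\sum f_i m_i\right)^2}{n}\right] = \frac{1}{n - 1}\left[\sum f_i m_i^2 - n\bar{x}^2\right]$$

Where:

$$m_i = \frac{\text{Upper class limit} + \text{Lower class limit}}{2}$$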
Example: Find the variance and the standard deviation of the following data.

Class          f     m_i    f_i m_i    f_i m_i^2
5.5 - 10.5     1     8      8          64
10.5 - 15.5    2     13     26         338
15.5 - 20.5    3     18     54         972
20.5 - 25.5    5     23     115        2645
25.5 - 30.5    4     28     112        3136
30.5 - 35.5    3     33     99         3267
35.5 - 40.5    2     38     76         2888
Total          20           490        13310

Here $\bar{x} = 490 / 20 = 24.5$.

$$S^2 = \frac{1}{n - 1}\left[\sum f_i m_i^2 - n\bar{x}^2\right] = \frac{1}{20 - 1}\left[13310 - 20 \times 24.5^2\right] = 68.68$$

$$S = \sqrt{68.68} = 8.287$$
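The grouped sample variance can be cross-checked in Python using the definitional form (squared deviations of class marks from the mean), which should agree with the shortcut form used in the worked calculation. The function name is illustrative.

```python
import math

classes = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
           (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
freqs = [1, 2, 3, 5, 4, 3, 2]

def grouped_sample_variance(classes, freqs):
    """Sample variance of grouped data: sum f (m - mean)^2 / (n - 1)."""
    marks = [(lo + hi) / 2 for lo, hi in classes]           # class marks
    n = sum(freqs)
    mean = sum(f * m for f, m in zip(freqs, marks)) / n     # grouped mean
    squared_dev = sum(f * (m - mean) ** 2 for f, m in zip(freqs, marks))
    return squared_dev / (n - 1)

variance = grouped_sample_variance(classes, freqs)   # 1305 / 19, about 68.68
sd = math.sqrt(variance)                             # about 8.29
print(round(variance, 2), round(sd, 2))
```

Taking the square root of the unrounded variance gives roughly 8.288; the figure 8.287 arises when the variance is first rounded to 68.68.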
Properties of Variance and Standard Deviation
• Variance and standard deviation are affected by extremely high or low values.
• If the scores in the data set are all equal in value, there is no variability (the variance and SD are zero).
• Addition or subtraction of a constant k to or from every value of the original distribution will not affect the original variance and SD.
• If all values in the distribution are multiplied by a constant k, then the new variance = $k^2$ times the original variance.
• The new SD = $|k|$ times the original SD.
Coefficient of Variation (CV)
• The coefficient of variation is used to compare the variability of two or more different series.
• The coefficient of variation is the ratio of the standard deviation to the arithmetic mean, usually expressed in percent:

$$CV = \frac{S}{\bar{X}} \times 100\%$$

where S is the standard deviation of the observations.

• A distribution having a smaller coefficient of variation is said to be less variable, or more consistent, more uniform, or more homogeneous.
Example:

Value    SBP       Cholesterol
Mean     130 mm    200 mg/dl
SD       15 mm     40 mg/dl

Which shows greater variability, SBP or cholesterol?

$$CV_{SBP} = \frac{S_{SBP}}{\bar{X}_{SBP}} \times 100\% = \frac{15\text{ mm}}{130\text{ mm}} \times 100\% = 11.5\%$$

$$CV_{C} = \frac{S_{C}}{\bar{X}_{C}} \times 100\% = \frac{40\text{ mg/dl}}{200\text{ mg/dl}} \times 100\% = 20.0\%$$

Since $CV_{SBP} < CV_{C}$, cholesterol is more variable than systolic blood pressure.
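Because the CV is unit-free, it allows series measured in different units to be compared directly. A minimal Python sketch of the comparison, with an illustrative function name:

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100

cv_sbp = coefficient_of_variation(15, 130)     # systolic blood pressure
cv_chol = coefficient_of_variation(40, 200)    # cholesterol
print(round(cv_sbp, 1), round(cv_chol, 1))     # about 11.5 and 20.0

more_variable = "cholesterol" if cv_chol > cv_sbp else "SBP"
print(more_variable)
```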
Exercise 1: The following data are the ages in years of 15 women who attended health education last year:

30, 41, 39, 41, 32, 29, 35, 31, 30, 36, 33, 37, 32, 30, and 41.

Calculate:

A. Mean, median and mode
B. IQR, MD, SD and CV
C. All quartiles, the 5th decile and the 20th percentile
Exercise 2: Marks of 20 students are summarized in the following frequency distribution. Calculate:

A. Mean, median and mode
B. IQR, MD, SD and CV
C. All quartiles, the 5th decile and the 20th percentile
Measure of Shape

Skewness
Skewness is the degree of asymmetry, or departure from symmetry, of a distribution. Skewness is concerned with the shape of the curve, not its size.
• Symmetrical distribution: the values are uniformly distributed around the mean (the distributions of the data below and above the mean are equal), i.e. Mean = Median = Mode.
• Positively skewed distribution: one or more observations are extremely large, i.e. the mean is greater than the median and mode.
• Negatively skewed distribution: one or more extremely small observations are present, i.e. the mean is smaller than the median and mode.
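Measure of Skewness
The skewness of a distribution can be measured by Karl Pearson's Coefficient of Skewness ($S_k$), for which the formula is given below:

$$S_k = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}} \approx \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$$

The two forms agree only approximately, through the empirical relation Mean - Mode ≈ 3(Mean - Median) for moderately skewed distributions.

$S_k$ (median form) lies between -3 and 3.

If $S_k = 0$, the distribution is symmetrical.
If $S_k > 0$, the distribution is positively skewed.
If $S_k < 0$, the distribution is negatively skewed.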
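As an illustration, the sketch below computes both forms of Pearson's coefficient for the doctor-visit data used earlier for the mean deviation. Reusing that data set here, and using the sample standard deviation (n - 1 divisor), are choices made for this sketch rather than something the notes prescribe.

```python
from statistics import mean, median, mode, stdev

def pearson_skewness(values):
    """Return (median form, mode form) of Pearson's coefficient of skewness."""
    m = mean(values)
    s = stdev(values)                   # sample standard deviation (n - 1 divisor)
    sk_median = 3 * (m - median(values)) / s
    sk_mode = (m - mode(values)) / s
    return sk_median, sk_mode

visits = [8, 6, 5, 5, 7, 4, 5, 9, 7, 4]
sk_median, sk_mode = pearson_skewness(visits)
print(round(sk_median, 2), round(sk_mode, 2))
```

Both values are positive, indicating positive skewness, but they are not equal, which is why the two forms should be treated as approximations of each other.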
Kurtosis
• Kurtosis is a measure of the shape of a distribution that relates to its flatness or peakedness.
• It is usually taken relative to a normal distribution.
• A distribution having a relatively high peak is called leptokurtic.
• If a curve representing a distribution is flat-topped, it is called platykurtic.
• The normal distribution, which is neither very high peaked nor flat-topped, is called mesokurtic.
Measures of kurtosis
The moment coefficient of kurtosis:

$$M_{ck} = \frac{M_4}{M_2^2} = \frac{M_4}{\sigma^4}$$

Where:
$M_4$ is the fourth moment about the mean.
$M_2$ is the second moment about the mean.
$\sigma$ is the population standard deviation.

The peakedness depends on the value of $M_{ck}$:

If $M_{ck} > 3$, the curve is leptokurtic (more peaked).
If $M_{ck} < 3$, the curve is platykurtic (less peaked).
If $M_{ck} = 3$, the curve is mesokurtic (normal curve).
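The moment coefficient of kurtosis can be sketched in Python as below. The function names and the small data set are hypothetical, chosen only to show the calculation; the moments use the population (divide by n) definition, matching the use of $\sigma$ in the formula.

```python
def moment_kurtosis(values):
    """Fourth central moment divided by the square of the second central moment."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n     # second moment about the mean
    m4 = sum((v - mu) ** 4 for v in values) / n     # fourth moment about the mean
    return m4 / m2 ** 2

def classify_kurtosis(mck, tol=1e-9):
    """Label the curve by comparing the coefficient with 3."""
    if abs(mck - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if mck > 3 else "platykurtic"

sample = [1, 2, 3, 4, 5]                # hypothetical data for illustration
mck = moment_kurtosis(sample)           # m2 = 2, m4 = 6.8, so 6.8 / 4 = 1.7
print(round(mck, 2), classify_kurtosis(mck))
```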
Thank you!
