
Basic Stat Chapter - 3

The document discusses measures of central tendency, which are statistical tools used to summarize a data set with a single representative value. It outlines the objectives, properties, and types of central tendency measures, including the arithmetic mean, median, and mode. Additionally, it explains the summation notation and provides examples for calculating the arithmetic mean for both ungrouped and grouped data.

Uploaded by

Tade Mf
Copyright
© All Rights Reserved

Basic Statistics December 30, 1899

3. Measures of Central Tendency


3.1. Introduction
The most important aspect of studying the distribution of a sample
measurement is the position of the central value, that is, a representative
value about which the measurements are distributed. It is often
convenient to have one figure that is representative of each group. This
figure is known as the average of the group. If the numbers of the group
are arranged in order of magnitude, the averages tend to fall around the
central position in the group, so averages are called measures of central
tendency. In short, any measure intended to represent the center of a data
set is called a measure of location or central tendency.
Objectives:
The most important objectives of measuring central tendency are:
• To determine a single value around which the other data concentrate
• To summarize (reduce the volume of) the data
• To facilitate comparison within one group or between groups of data
• To comprehend the data easily
• To enable further statistical analysis
3.2. Properties of a measure of central tendency

A good measure of central tendency should be:

• Simple to understand and easy to calculate and interpret
• Rigidly defined by a mathematical formula (unique)
• Based on all observations under investigation
• Not seriously affected by extreme observations
• Capable of further statistical analysis and algebraic manipulation
• Little affected by fluctuations of sampling

3.3. The Summation Notation ($\sum$)

Let a data set consist of a number of observations, represented by $x_1, x_2, \ldots, x_n$, where $n$ (the last subscript) denotes the number of observations in the data and $x_i$ is the $i$th observation. Then the sum of all the numbers $x_i$, where $i$ goes from 1 up to $n$, is written symbolically as $\sum_{i=1}^{n} x_i$, or $\sum x_i$, or simply $\sum x$; that is,

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$$

$x$: the whole set of numbers
$x_i$: a specific score in the set of numbers
$n$: the total number of observations

For instance, a data set consisting of the six measurements 2, 3, 9, 10, 8 and -2 is represented by $x_1, x_2, \ldots, x_6$, where $x_1 = 2$, $x_2 = 3$, $x_3 = 9$, $x_4 = 10$, $x_5 = 8$ and $x_6 = -2$. Their sum is

$$\sum_{i=1}^{6} x_i = x_1 + x_2 + \cdots + x_6 = 2 + 3 + 9 + 10 + 8 + (-2) = 30$$

The "i = 1" at the bottom of the summation sign tells where to begin
the sequence of summation. If the expression were written with "i = 3",
the summation would start with the third number in the set.
For example:

$$\sum_{i=3}^{n} x_i = x_3 + x_4 + x_5 + \cdots + x_n$$

For the data set above ($n = 6$), this gives the following result:

$$\sum_{i=3}^{6} x_i = x_3 + x_4 + x_5 + x_6 = 9 + 10 + 8 + (-2) = 25$$

The "n" at the top of the summation sign tells where to end
the sequence of summation.
Some Properties of the Summation Notation:

1. $\sum_{i=1}^{n} c = nc$, where $c$ is a constant.

2. $\sum_{i=1}^{n} b x_i = b \sum_{i=1}^{n} x_i$, where $b$ is a constant.

3. $\sum_{i=1}^{n} (a + b x_i) = na + b \sum_{i=1}^{n} x_i$, where $a$ and $b$ are constants.

4. $\sum_{i=1}^{n} (x_i \pm y_i) = \sum_{i=1}^{n} x_i \pm \sum_{i=1}^{n} y_i$

5. $\sum_{i=1}^{n} x_i y_i \neq \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)$ in general.

The sum of the products of two variables is written as follows:

$$\sum_{i=1}^{n} x_i y_i = x_1 y_1 + x_2 y_2 + x_3 y_3 + \cdots + x_n y_n$$

Example 1: Given $\sum_{i=1}^{7} x_i = 20$, $\sum_{i=1}^{7} y_i = 30$, $\sum_{i=1}^{7} x_i^2 = 420$ and $\sum_{i=1}^{7} y_i^2 = 280$, find:

A. $\sum_{i=1}^{7} (6x_i + 4y_i) = 6\sum_{i=1}^{7} x_i + 4\sum_{i=1}^{7} y_i = 6(20) + 4(30) = 240$

B. $3\sum_{i=1}^{7} x_i^2 - 2\sum_{i=1}^{7} y_i^2 = 3(420) - 2(280) = 700$

Example 2: Consider the following data and find the sums listed below.

X   5   7   7   6   8
Y   6   7   8   7   8

a) $\sum_{i=1}^{5} X_i$
b) $\sum_{i=1}^{5} Y_i$
c) $\sum_{i=1}^{5} 10$
d) $\sum_{i=1}^{5} (X_i + Y_i)$
e) $\sum_{i=1}^{5} (X_i - Y_i)$
f) $\sum_{i=1}^{5} X_i Y_i$
g) $\sum_{i=1}^{5} X_i^2$
h) $\left(\sum_{i=1}^{5} X_i\right)\left(\sum_{i=1}^{5} Y_i\right)$

Solutions:
a) $\sum_{i=1}^{5} X_i = 5 + 7 + 7 + 6 + 8 = 33$
b) $\sum_{i=1}^{5} Y_i = 6 + 7 + 8 + 7 + 8 = 36$
c) $\sum_{i=1}^{5} 10 = 5 \times 10 = 50$
d) $\sum_{i=1}^{5} (X_i + Y_i) = (5+6) + (7+7) + (7+8) + (6+7) + (8+8) = 69 = 33 + 36$
e) $\sum_{i=1}^{5} (X_i - Y_i) = (5-6) + (7-7) + (7-8) + (6-7) + (8-8) = -3$
f) $\sum_{i=1}^{5} X_i Y_i = (5)(6) + (7)(7) + (7)(8) + (6)(7) + (8)(8) = 241$
g) $\sum_{i=1}^{5} X_i^2 = 5^2 + 7^2 + 7^2 + 6^2 + 8^2 = 223$
h) $\left(\sum_{i=1}^{5} X_i\right)\left(\sum_{i=1}^{5} Y_i\right) = 33 \times 36 = 1188$
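
The sums in Example 2 can also be checked numerically. The following is only an illustrative Python sketch (the variable names are chosen here and are not part of the original notes); it recomputes parts (a) to (h) and shows that the sum of products in (f) differs from the product of sums in (h).

```python
# Data from Example 2
X = [5, 7, 7, 6, 8]
Y = [6, 7, 8, 7, 8]

sum_x = sum(X)                                # part (a)
sum_y = sum(Y)                                # part (b)
sum_const = sum(10 for _ in X)                # part (c): adds the constant 10 n times
sum_add = sum(x + y for x, y in zip(X, Y))    # part (d)
sum_diff = sum(x - y for x, y in zip(X, Y))   # part (e)
sum_prod = sum(x * y for x, y in zip(X, Y))   # part (f): sum of products
sum_sq = sum(x ** 2 for x in X)               # part (g)
prod_sums = sum_x * sum_y                     # part (h): product of sums

print(sum_x, sum_y, sum_const, sum_add, sum_diff, sum_prod, sum_sq, prod_sums)
# prints: 33 36 50 69 -3 241 223 1188
```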

3.4. Types of measures of central tendency

Measures of central tendency give us information about the location
of the center of the distribution of data values. A single value that
approximately describes the characteristics of the entire mass of data is
called a measure of central tendency. In this unit we briefly discuss the three
measures of central tendency: mean, median and mode.

The following types of measures of central tendency are each suitable for a
particular type of data:

• Mean (arithmetic mean, geometric mean and harmonic mean)
• Mode or modal value
• Median
The choice among these averages depends on which best fits the property
under discussion.
3.4.1. Arithmetic Mean (AM)
The arithmetic mean is defined as the sum of the measurements of the items divided by the total
number of items. It is usually denoted by $\bar{x}$.
Arithmetic mean for individual series (ungrouped data)

Suppose $x_1, x_2, \ldots, x_n$ are observed values in a sample of size $n$ from a population of size $N$,
$n < N$. Then the arithmetic mean of the sample, denoted by $\bar{x}$, is given by

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

If we take an entire population, the mean is denoted by $\mu$ and is given by:

$$\mu = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}$$

where $N$ stands for the total number of observations in the population.
Example: Consider the samples given below:
I. 46 54 21 35
II. 10.5 2.4 3.6 5.9 8.7
Find the arithmetic mean of each sample.
Solution:
i. The sample values are: 46 54 21 35

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{46 + 54 + 21 + 35}{4} = \frac{156}{4} = 39$$

The arithmetic mean of the sample is 39.

ii. The sample values are: 10.5 2.4 3.6 5.9 8.7

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{10.5 + 2.4 + 3.6 + 5.9 + 8.7}{5} = \frac{31.1}{5} = 6.22$$

The arithmetic mean of the sample is 6.22.
Arithmetic mean for discrete data arranged in a frequency distribution (ungrouped)
When the numbers $x_1, x_2, \ldots, x_k$ occur with frequencies $f_1, f_2, \ldots, f_k$, respectively, the
mean can be expressed in a more compact form as:

$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_k f_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$$

Example 1: Obtain the mean of the following numbers:

2, 7, 8, 2, 7, 3, 7
Solution:
Xi      fi      Xi fi
2       2       4
3       1       3
7       3       21
8       1       8
Total   7       36

$$\bar{X} = \frac{\sum_{i=1}^{4} f_i X_i}{\sum_{i=1}^{4} f_i} = \frac{36}{7} \approx 5.14$$

Example 2: Calculate the arithmetic mean of the sample of numbers of students in 10
classes:
50 42 48 60 58 54 50 42 50 42

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{50 + 42 + 48 + 60 + 58 + 54 + 50 + 42 + 50 + 42}{10} = \frac{496}{10} = 49.6 \approx 50$$

In this case there are three 42's, one 48, three 50's, one 54, one 58 and one 60. The number
of times each number occurs is called its frequency, and the frequency is usually denoted by f.
The information in the sentence above can be written in a table, as follows.

Value, xi        42    48    50    54    58    60
Frequency, fi    3     1     3     1     1     1
xi * fi          126   48    150   54    58    60

The formula for the arithmetic mean for data of this type is

$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_k f_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum_{i=1}^{k} f_i}$$

In this case, we have:

$$\bar{x} = \frac{42(3) + 48(1) + 50(3) + 54(1) + 58(1) + 60(1)}{3 + 1 + 3 + 1 + 1 + 1} = \frac{126 + 48 + 150 + 54 + 58 + 60}{10} = \frac{496}{10} = 49.6 \approx 50$$

The mean number of students per class in the ten classes is 49.6, or about 50.
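
As a cross-check of Example 2, the sketch below builds the frequency table from the raw class sizes and applies the frequency-distribution formula. It is a minimal illustration in Python; the helper name `mean_from_frequencies` is an assumption made here, not something defined in these notes.

```python
from collections import Counter

def mean_from_frequencies(freq):
    """Mean of a discrete frequency distribution given as {value: frequency}."""
    total = sum(x * f for x, f in freq.items())   # sum of x_i * f_i
    n = sum(freq.values())                        # sum of f_i
    return total / n

class_sizes = [50, 42, 48, 60, 58, 54, 50, 42, 50, 42]
freq = Counter(class_sizes)                 # 42 and 50 occur three times each
mean_fd = mean_from_frequencies(freq)       # from the frequency table
mean_raw = sum(class_sizes) / len(class_sizes)   # directly from the raw data

print(mean_fd, mean_raw)   # both are 49.6 up to floating-point rounding
```

Both routes give the same value because the frequency table is only a compact way of writing the same ten observations.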

Arithmetic Mean for Grouped Continuous Frequency Distribution

If data are given in the form of a continuous frequency distribution, the sample mean can be
computed as:

$$\bar{x} = \frac{\sum_{i=1}^{k} CM_i f_i}{\sum_{i=1}^{k} f_i} = \frac{CM_1 f_1 + CM_2 f_2 + \cdots + CM_k f_k}{f_1 + f_2 + \cdots + f_k}$$

where $CM_i$ is the class mark of the $i$th class, $i = 1, 2, \ldots, k$, $f_i$ is the frequency of the $i$th class and $k$ is the number of classes.
Note that $\sum_{i=1}^{k} f_i = n$ = the total number of observations.

Example 1: The following frequency table gives the height (in inches) of 100 students in a
college.

Class Interval (CI)   60-62   62-64   64-66   66-68   68-70   70-72   Total
Frequency (f)         5       18      42      20      8       7       100

Calculate the mean.

Solution:
The formula to be used for the mean is as follows:

$$\bar{x} = \frac{\sum_{i=1}^{k} CM_i f_i}{\sum_{i=1}^{k} f_i}$$

Let us calculate these values and arrange them in a table for convenience.

Class Interval (CI)   60-62   62-64   64-66   66-68   68-70   70-72   Total
Frequency (f)         5       18      42      20      8       7       100
Mid-Point (CM i)      61      63      65      67      69      71
f i CM i              305     1134    2730    1340    552     497     6558

Substituting these values, with $\sum_{i=1}^{6} f_i = 100$, we get

$$\bar{x} = \frac{\sum_{i=1}^{k} CM_i f_i}{\sum_{i=1}^{k} f_i} = \frac{6558}{100} = 65.58$$

The mean height of the students is 65.58 inches.

Example 2: Calculate the mean for the following age distribution.

Class      Frequency
6 - 10     35
11 - 15    23
16 - 20    15
21 - 25    12
26 - 30    9
31 - 35    6

Solutions:
1. First find the class marks (mid-points).
2. Find the product of frequency and class mark (mid-point).
3. Find the mean using the formula.

Class      fi     CMi    CMi * fi
6 - 10     35     8      280
11 - 15    23     13     299
16 - 20    15     18     270
21 - 25    12     23     276
26 - 30    9      28     252
31 - 35    6      33     198
Total      100           1575

$$\bar{X} = \frac{\sum_{i=1}^{6} f_i \, CM_i}{\sum_{i=1}^{6} f_i} = \frac{1575}{100} = 15.75$$
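
The three steps used for grouped data (class marks, products, division by the total frequency) can be sketched in Python as follows. This is an illustrative sketch only; `grouped_mean` and the tuple layout `(lower_limit, upper_limit, frequency)` are assumptions made for this example.

```python
def grouped_mean(classes):
    """Mean of grouped data.

    classes: list of (lower_limit, upper_limit, frequency) tuples.
    The class mark is taken as the midpoint of the two class limits.
    """
    n = sum(f for _, _, f in classes)                            # total frequency
    total = sum(((lo + hi) / 2) * f for lo, hi, f in classes)    # sum of CM_i * f_i
    return total / n

# Height data (Example 1) and age data (Example 2)
heights = [(60, 62, 5), (62, 64, 18), (64, 66, 42),
           (66, 68, 20), (68, 70, 8), (70, 72, 7)]
ages = [(6, 10, 35), (11, 15, 23), (16, 20, 15),
        (21, 25, 12), (26, 30, 9), (31, 35, 6)]

mean_height = grouped_mean(heights)   # 6558 / 100
mean_age = grouped_mean(ages)         # 1575 / 100
print(round(mean_height, 2), round(mean_age, 2))
```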

Exercises:
1. Marks of 75 students are summarized in the following frequency
distribution:

Marks             40-44   45-49   50-54   55-59   60-64   65-69   70-74
No. of students   7       10      22      f4      f5      6       3

If 20% of the students have marks between 55 and 59:
i. Find the missing frequencies f4 and f5.
ii. Find the mean.

Special properties of Arithmetic mean

1. The sum of the deviations of a set of items from their mean is always
zero, i.e. $\sum_{i=1}^{n} (X_i - \bar{X}) = 0$.

2. If $\bar{X}_1$ is the mean of $n_1$ observations, $\bar{X}_2$ is the mean of $n_2$ observations, ..., and $\bar{X}_k$ is the mean of $n_k$ observations,
then the mean of all the observations in all groups, often called the
combined mean, is given by:

$$\bar{X}_c = \frac{\bar{X}_1 n_1 + \bar{X}_2 n_2 + \cdots + \bar{X}_k n_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum_{i=1}^{k} \bar{X}_i n_i}{\sum_{i=1}^{k} n_i}$$

Example: In a class, there are 30 females and 70 males. If the females
averaged 60 in an examination and the males averaged 72, find the mean for
the entire class.
Solution:

Females: $\bar{X}_1 = 60$, $n_1 = 30$
Males: $\bar{X}_2 = 72$, $n_2 = 70$

$$\bar{X}_c = \frac{\bar{X}_1 n_1 + \bar{X}_2 n_2}{n_1 + n_2} = \frac{\sum_{i=1}^{2} \bar{X}_i n_i}{\sum_{i=1}^{2} n_i}$$

$$\Rightarrow \bar{X}_c = \frac{30(60) + 70(72)}{30 + 70} = \frac{6840}{100} = 68.40$$

3. If a wrong figure has been used when calculating the mean, the
correct mean can be obtained without repeating the whole process
using:

$$\text{Correct Mean} = \text{Wrong Mean} + \frac{\text{Correct Value} - \text{Wrong Value}}{n}$$

where $n$ is the total number of observations.
Example: The average weight of 10 students was calculated to be 65 kg.
Later it was discovered that one weight was misread as 40 kg instead of
80 kg. Calculate the correct average weight.
Solution:

$$\text{Correct Mean} = 65 + \frac{80 - 40}{10} = 65 + 4 = 69 \text{ kg}$$
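
The correction rule can be verified against a direct recomputation of the total. The sketch below is illustrative only; the function name `corrected_mean` is an assumption made here.

```python
def corrected_mean(wrong_mean, n, wrong_value, correct_value):
    """Adjust a mean that was computed with one misread observation."""
    return wrong_mean + (correct_value - wrong_value) / n

# Weight example: mean 65 kg from 10 students, 40 was recorded instead of 80
corrected = corrected_mean(65, 10, wrong_value=40, correct_value=80)

# Cross-check: rebuild the total, swap the wrong value for the right one
recomputed = (65 * 10 - 40 + 80) / 10

print(corrected, recomputed)   # prints: 69.0 69.0
```

The two results agree because the wrong total differs from the correct total by exactly (correct value - wrong value).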
4. The sum of squares of deviations taken from the mean is smaller than the sum of squares of deviations taken from any other value. That is, $\sum_{i=1}^{n} (x_i - A)^2$ is minimum when $A = \bar{x}$.

5. If the mean of $x_1, x_2, \ldots, x_n$ is $\bar{x}$, then

a) The mean of $x_1 \pm k, x_2 \pm k, \ldots, x_n \pm k$ will be $\bar{x} \pm k$.

b) The mean of $kx_1, kx_2, \ldots, kx_n$ will be $k\bar{x}$.
Merits and Demerits of Arithmetic Mean
Merits of Arithmetic Mean

o The arithmetic mean has a rigidly defined mathematical formula, so its value is always
definite and unique. It can be calculated for any set of numerical data.
o It is calculated based on all observations.
o It is simple to calculate and easy to understand.
o It is suitable for further mathematical treatment.
o It is a stable average, i.e. it is comparatively little affected by fluctuations of sampling:
the arithmetic means of many samples from the same population do not fluctuate considerably.
o It does not require the data to be arranged in increasing or decreasing order.
o It affords a good standard of comparison.

Demerits of Arithmetic Mean

o It cannot be calculated for data which are not quantifiable.
o It is highly affected by extreme (abnormal) values in the series.
o It can be a number which does not exist in the series.
o It cannot be calculated for a grouped continuous distribution with open-ended classes.
o It cannot be determined by inspection.
o Sometimes it leads to a wrong conclusion if the details of the data from
which it is obtained are not available.
Weighted Mean
When calculating the simple arithmetic mean, all items are assumed to be of equal
importance (each value in the data set has equal weight). When the observations have
different weights, we use a weighted average. Weights are assigned to each item in proportion
to its relative importance.
If $x_1, x_2, \ldots, x_n$ represent the values of the items and $w_1, w_2, \ldots, w_n$ are the corresponding
weights, then the weighted mean, $\bar{x}_w$, is given by:

$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

Example 1:
A student obtained the following percentages in an examination: English
60, Biology 75, Mathematics 63, Physics 59, and Chemistry 55. Find the
student's weighted arithmetic mean if weights 1, 2, 1, 3, 3 respectively are
allotted to the subjects.
Solution:

$$\bar{X}_w = \frac{\sum_{i=1}^{5} X_i W_i}{\sum_{i=1}^{5} W_i} = \frac{60(1) + 75(2) + 63(1) + 59(3) + 55(3)}{1 + 2 + 1 + 3 + 3} = \frac{615}{10} = 61.5$$

Example 2: A student's final marks in Mathematics, Physics, Chemistry and Biology are
respectively A, B, D and C. If the respective credits (weights) received for these courses are 4,
4, 3 and 2, determine the average grade the student has got for the courses.
Solution
We use a weighted arithmetic mean, the weight associated with each course being the
number of credits received for that course. The grades are scored as A = 4, B = 3, C = 2 and D = 1.

xi       4     3     1     2     Total
wi       4     4     3     2     13
xi wi    16    12    3     4     35

$$\bar{x}_w = \frac{\sum x_i w_i}{\sum w_i} = \frac{16 + 12 + 3 + 4}{13} = \frac{35}{13} \approx 2.69$$

The average grade of the student is approximately 2.69.

Combined mean: When a set of observations is divided into k groups and $\bar{x}_1$ is the mean of
$n_1$ observations of group 1, $\bar{x}_2$ is the mean of $n_2$ observations of group 2, ..., $\bar{x}_k$ is the mean of
$n_k$ observations of group k, then the combined mean, denoted by $\bar{x}_c$, of all observations taken
together is given by

$$\bar{x}_c = \frac{\bar{x}_1 n_1 + \bar{x}_2 n_2 + \cdots + \bar{x}_k n_k}{n_1 + n_2 + \cdots + n_k}$$

This is a special case of the weighted mean. In this case the sample sizes are the weights.

Example: In the previous year there were two sections taking a Statistics course. At the end of
the semester, the two sections got average marks of 70 and 78. There were 45 and 50 students
in the sections respectively. Find the mean mark for all the students.
Solution:

$$\bar{x}_c = \frac{\bar{x}_1 n_1 + \bar{x}_2 n_2}{n_1 + n_2} = \frac{70(45) + 78(50)}{45 + 50} = \frac{7050}{95} \approx 74.21$$

The combined mean for all the students is 74.21.
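
Because the combined mean is a weighted mean with the group sizes as weights, one small function covers both cases. The following Python sketch is illustrative; `weighted_mean` is a name chosen here, and the grade points A = 4, B = 3, C = 2, D = 1 follow the table in Example 2.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# Example 1: exam percentages with subject weights
exam_mean = weighted_mean([60, 75, 63, 59, 55], [1, 2, 1, 3, 3])

# Example 2: grade points weighted by credit hours
grade_mean = weighted_mean([4, 3, 1, 2], [4, 4, 3, 2])

# Combined mean: section means weighted by section sizes
combined = weighted_mean([70, 78], [45, 50])

print(exam_mean, round(grade_mean, 2), round(combined, 2))
# prints: 61.5 2.69 74.21
```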

3.4.2. Geometric Mean

The geometric mean, like the arithmetic mean, is a calculated average. It is used when observed
values are measured as ratios, percentages, proportions, indices or growth rates.

Geometric mean for individual series: The geometric mean (G.M.) of an individual series of
positive numbers $x_1, x_2, \ldots, x_n$ is defined as the $n$th root of their product. If the
value of any one observation is zero, the geometric mean becomes zero.

The geometric mean of $X_1, X_2, \ldots, X_n$ is denoted by G.M. and given by:

$$G.M. = \sqrt[n]{X_1 X_2 \cdots X_n} \;\Rightarrow\; G.M. = \text{antilog}\left(\frac{1}{n}\sum_{i=1}^{n} \log X_i\right)$$

To show this, take logarithms on both sides:

$$\log(G.M.) = \log\left(\sqrt[n]{X_1 X_2 \cdots X_n}\right) = \log\left((X_1 X_2 \cdots X_n)^{1/n}\right)$$

$$\Rightarrow \log(G.M.) = \frac{1}{n}\log(X_1 X_2 \cdots X_n) = \frac{1}{n}(\log X_1 + \log X_2 + \cdots + \log X_n)$$

$$\Rightarrow \log(G.M.) = \frac{1}{n}\sum_{i=1}^{n} \log X_i$$

That is, the logarithm of the G.M. of a set of observations is the arithmetic
mean of their logarithms.

$$\Rightarrow G.M. = \text{antilog}\left(\frac{1}{n}\sum_{i=1}^{n} \log X_i\right)$$
Example 1:
Find the G.M. of the numbers 2, 4, 8.
Solution:
$$G.M. = \sqrt[n]{X_1 X_2 \cdots X_n} = \sqrt[3]{2 \times 4 \times 8} = \sqrt[3]{64} = 4$$

Example 2: Find the G.M. of 3, 12.
Solution:
$$G.M. = \sqrt[n]{X_1 X_2 \cdots X_n} = \sqrt[2]{3 \times 12} = \sqrt{36} = 6$$

Properties of geometric mean

It is less affected by extreme values than the arithmetic mean. For example, for x = 2, 5, 8, 72,
find and compare the arithmetic mean and the geometric mean.
It takes each and every observation into consideration.

Geometric mean for discrete data arranged in FD: When the numbers $x_1, x_2, \ldots, x_k$
occur with frequencies $f_1, f_2, \ldots, f_k$, respectively, the geometric mean is obtained by

$$G.M. = \sqrt[n]{x_1^{f_1} x_2^{f_2} \cdots x_k^{f_k}} = \text{antilog}\left(\frac{1}{n}\sum_{i=1}^{k} f_i \log x_i\right)$$

where $n$ is the sum of $f_i$ for all $i$.

Example 3.8: Compute the geometric mean of the following values: 3, 3, 4, 4, 4, 5, 6 and 6.

Solution
Values       3   4   5   6
Frequency    2   3   1   2

$$G.M. = \sqrt[8]{3^2 \times 4^3 \times 5^1 \times 6^2} = 4.236$$

The geometric mean for the given data is 4.236.

Geometric mean for continuous grouped FD: The above formula can also be used
whenever the frequency distribution is grouped continuous; the class marks of the class intervals
are taken as $x_i$.
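
The logarithmic form of the geometric mean translates directly into code. The Python sketch below is an illustration under stated assumptions: the function name `geometric_mean` and its optional `freqs` argument are chosen here, base-10 logarithms stand in for the "log/antilog" of the text, and non-positive values are rejected because their logarithm is undefined.

```python
import math

def geometric_mean(values, freqs=None):
    """Geometric mean via logarithms: antilog((1/n) * sum(f_i * log x_i)).

    freqs is optional; without it every value has frequency 1.
    """
    if freqs is None:
        freqs = [1] * len(values)
    if any(x <= 0 for x in values):
        raise ValueError("geometric mean needs strictly positive values")
    n = sum(freqs)
    log_sum = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (log_sum / n)      # antilog of the mean logarithm

gm_a = geometric_mean([2, 4, 8])                    # Example 1, about 4
gm_b = geometric_mean([3, 12])                      # Example 2, about 6
gm_c = geometric_mean([3, 4, 5, 6], [2, 3, 1, 2])   # Example 3.8, about 4.236

print(round(gm_a, 3), round(gm_b, 3), round(gm_c, 3))
```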
3.4.3. Harmonic Mean
• It is a suitable measure of central tendency when the data pertain to speed,
rate and time.
• The harmonic mean of n values is defined as n divided by the sum of
their reciprocals.
Harmonic mean for individual series: If $x_1, x_2, \ldots, x_n$ are n observations, then the harmonic
mean is given by the following formula:

$$H.M. = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

Example 3.9: A car travels 25 miles at 25 mph, 25 miles at 50 mph, and 25 miles at 75 mph. Find
the average speed (the harmonic mean) of the three velocities.
Solution:

$$H.M. = \frac{3}{\frac{1}{25} + \frac{1}{50} + \frac{1}{75}} \approx 40.9 \text{ mph}$$
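
To see why the harmonic mean is the right average here, the sketch below compares it with total distance divided by total time for the car in Example 3.9. It is a minimal Python illustration; the name `harmonic_mean` and the optional `freqs` argument are assumptions made for this example.

```python
def harmonic_mean(values, freqs=None):
    """Harmonic mean: n / sum(f_i / x_i), with n = sum(f_i)."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    return n / sum(f / x for x, f in zip(values, freqs))

speeds = [25, 50, 75]               # mph, each held for 25 miles
hm = harmonic_mean(speeds)

# Direct check: average speed = total distance / total time
total_distance = 25 * len(speeds)
total_time = sum(25 / s for s in speeds)
average_speed = total_distance / total_time

print(round(hm, 1), round(average_speed, 1))   # prints: 40.9 40.9
```

The agreement holds because equal distances were covered at each speed; with unequal distances a weighted version would be needed.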

Harmonic mean for discrete data arranged in FD: If the data are arranged in the form of a
frequency distribution, then

$$H.M. = \frac{n}{\sum_{i=1}^{k} \frac{f_i}{x_i}}, \quad \text{where } n = \sum_{i=1}^{k} f_i$$

Harmonic mean for continuous grouped FD: Whenever the frequency distribution is
grouped continuous, the class marks of the class intervals are taken as $x_i$ and the above
formula is used:

$$H.M. = \frac{n}{\sum_{i=1}^{k} \frac{f_i}{x_i}}, \quad \text{where } x_i \text{ is the class mark of the } i\text{th class}$$

Remark: The harmonic mean is useful and appropriate for finding average
speeds and average rates.
Properties of harmonic mean
• It is unique for a given set of data.
• It takes each observation into consideration.
• It is difficult to calculate and understand.
• It is an appropriate measure of central tendency in situations where the data are in time, speed or rate.
Relations among different means
• If all the observations are positive, the three means satisfy the relationship
$\bar{x} \geq GM \geq HM$.
• For two observations, $\sqrt{\bar{x} \times HM} = GM$.
• $\bar{x} = GM = HM$ if all observations are positive and have equal value.
3.4.4. The Mode or modal value
The mode or modal value is the value with the highest frequency and is denoted by $\hat{x}$. A
data set may have no mode or may have more than one mode. A distribution is called
bimodal if it has two data values that appear with the greatest frequency. If a
distribution has more than two modes, it is multimodal. If no value occurs more often
than the others, the distribution has no mode.

Mode of individual series: The mode or modal value of an individual series (raw data) is
obtained simply by locating the observation with the maximum frequency.

Example 3.12: Consider the following data:

A. 30 45 69 70 32 18 32. The mode ($\hat{x}$) = 32.
B. 10 20 30 10 40 30. The modes ($\hat{x}$) are 10 and 30.
C. 10 40 30 20 50 60. No mode.

Note that in some samples there may be more than one mode or there may be no mode.
The mode is not a suitable measure of central tendency in these cases. We use the mode as a
measure of central tendency if we require a measure that takes on one of the sample values.
The mode can be used for variables that are measured on a categorical (nominal) scale, e.g. the
most popular computer type.

Mode for discrete data arranged in a frequency distribution: In the case of discrete grouped
data, the mode is determined simply by looking for the value(s) having the highest frequency.

Mode for Grouped Continuous Frequency Distribution

For grouped data, the exact mode cannot be read off directly. In such cases, one can only
determine the modal class easily: the class with the highest frequency.

After locating this class, the mode is interpolated using:

$$\hat{x} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2}\, w$$

where L = the lower class boundary of the modal class; $\Delta_1 = f_{mode} - f_1$; $\Delta_2 = f_{mode} - f_2$;
w = the common class width; $f_1$ = frequency of the class immediately preceding the modal class;
$f_2$ = frequency of the class immediately succeeding the modal class; and $f_{mode}$ = frequency of the modal class.

Example: Calculate the mode for the frequency distribution of data of example 3.11.

Solution: By inspection, the mode lies in the third class, where L = 10.5, $f_{mode}$ = 12, $f_1$ = 8, $f_2$ = 6, w = 5.

Using the formula, the mode is:

$$\hat{x} = 10.5 + \frac{(12 - 8)}{(12 - 8) + (12 - 6)} \times 5 = 10.5 + 2 = 12.5$$

Note: The modal class is the class with the highest frequency.

Example: The following is the distribution of the size of certain farms selected
at random from a district. Calculate the mode of the distribution.

Size of farms    No. of farms
5-15             8
15-25            12
25-35            17
35-45            29
45-55            31
55-65            5
65-75            3

Solution:
45-55 is the modal class, since it is the class with the highest frequency.
$L_{mo} = 45$
$w = 10$
$f_{mo} = 31$
$f_1 = 29$
$f_2 = 5$
$\Delta_1 = f_{mo} - f_1 = 2$
$\Delta_2 = f_{mo} - f_2 = 26$

$$\Rightarrow \hat{X} = 45 + 10\left(\frac{2}{2 + 26}\right) = 45.71$$
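
The interpolation for the grouped mode can be sketched as below. This is an illustrative Python sketch with names chosen here (`grouped_mode`, `boundaries`, `freqs`). Two choices in it are assumptions not stated in the notes: ties are resolved by taking the first class with the highest frequency, and a missing neighbouring class (when the modal class is first or last) is treated as having frequency 0.

```python
def grouped_mode(boundaries, freqs):
    """Interpolated mode of a grouped frequency distribution.

    boundaries: list of (lower_boundary, upper_boundary) per class
    freqs:      list of class frequencies, same order
    """
    m = max(range(len(freqs)), key=lambda i: freqs[i])   # index of modal class
    f_mode = freqs[m]
    f_prev = freqs[m - 1] if m > 0 else 0                # class before the modal class
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0   # class after the modal class
    d1 = f_mode - f_prev
    d2 = f_mode - f_next
    lower, upper = boundaries[m]
    return lower + d1 / (d1 + d2) * (upper - lower)

# Farm-size example: modal class 45-55
farm_bounds = [(5, 15), (15, 25), (25, 35), (35, 45), (45, 55), (55, 65), (65, 75)]
farm_freqs = [8, 12, 17, 29, 31, 5, 3]
farm_mode = grouped_mode(farm_bounds, farm_freqs)

# Earlier example with L = 10.5, f_mode = 12, f1 = 8, f2 = 6, w = 5
mode_b = grouped_mode([(5.5, 10.5), (10.5, 15.5), (15.5, 20.5)], [8, 12, 6])

print(round(farm_mode, 2), mode_b)   # prints: 45.71 12.5
```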
Merits and Demerits of Mode
Merits:
• It is not affected by extreme observations.
• It is easy to calculate and simple to understand.
• It can be calculated for a distribution with open-ended classes.
• The values of the other observations can change without changing the mode.
• It can be computed for all levels of data, i.e. ratio, interval, ordinal or nominal.

Demerits:
• It is not rigidly defined.
• It is not based on all observations.
• It is not suitable for further mathematical treatment.
• It is not a stable average, i.e. it is affected by fluctuations of sampling
to some extent.
• The mode may not exist in the series, and if it exists it may not be unique.
Note: Being the point of maximum density, the mode is especially useful in
finding the most popular size in studies relating to marketing, trade,
business, and industry. It is the appropriate average to use to find the
ideal size.

3.4.5. Median
◆ The median is, as its name indicates, the middle value in the
arrangement, which divides the data into two equal parts.

◆ It is obtained by arranging the data in increasing or decreasing
order of magnitude and is denoted by $\tilde{x}$.

◆ In a distribution, the median is the value of the variable which divides it
into two equal halves.
Median for individual series
We arrange the sample in ascending order of the variable of interest. Then the median is the
middle value (if the sample size n is odd) or the average of the two middle values (if the
sample size n is even).
For an individual series the median is obtained by
A. $\tilde{x} = \left(\frac{n+1}{2}\right)^{th}$ value, if n is odd, and
B. $\tilde{x} = \frac{\left(\frac{n}{2}\right)^{th} \text{value} + \left(\frac{n}{2} + 1\right)^{th} \text{value}}{2}$, if n is even.
Example 3.10: Find the median for the following data.
a. -5 15 10 5 0 2 1 4 6 and 8
b. 5 2 2 3 1 8 4

Solution:
a. The data in ascending order are:
-5 0 1 2 4 5 6 8 10 15
n = 10, so n is even. The two middle values are the 5th and 6th observations. So the median is

$$\tilde{x} = \frac{\left(\frac{10}{2}\right)^{th} \text{value} + \left(\frac{10}{2} + 1\right)^{th} \text{value}}{2} = \frac{5^{th} \text{value} + 6^{th} \text{value}}{2} = \frac{4 + 5}{2} = 4.5$$

b. The data in ascending order are:
1 2 2 3 4 5 8
n = 7, so n is odd. The middle value is the 4th observation. So the median is 3.

Note: The median is easy to calculate for small samples and is not affected by an "outlier".

Median for discrete data arranged in a frequency distribution: In this case, the position of
the median is obtained by the above formula (A or B). After arranging the values in increasing
order, find the smallest cumulative frequency (CF) greater than or equal to the rank/position
of the median; the corresponding value is the median.
Median for grouped continuous data
If the data are given in the form of a continuous frequency distribution, the
median is obtained by the following formula:

$$\tilde{x} = L + \frac{w}{f_{med}}\left(\frac{n}{2} - c\right)$$

where: L = the lower class boundary of the median class;

w = the class width of the median class;

$f_{med}$ = the frequency of the median class; and

c = the cumulative frequency corresponding to the class preceding the median class, that is,
the sum of the frequencies of all classes lower than the median class.

The median class is the class which contains the (n/2)th observation,
whether n is odd or even, since the items lose their individual identity once they are
grouped into continuous classes.

Example 1: Calculate the median for the following frequency distribution.

C.I      1-5   6-10   11-15   16-20   21-25   26-30   31-35   Total
Freq.    4     8      12      6       3       4       3       40

Solution: Construct the less-than cumulative frequency distribution:

C.I            1-5   6-10   11-15   16-20   21-25   26-30   31-35   Total
Freq.          4     8      12      6       3       4       3       40
Cuml. Freq.    4     12     24      30      33      37      40

Since n = 40, n/2 = 20, and the smallest CF greater than or equal to 20 is 24; thus, the median
class is the third class. For this class, L = 10.5, w = 5, $f_{med}$ = 12 and c = 12. Applying
the formula, we get:

$$\tilde{x} = 10.5 + \frac{5}{12}(20 - 12) \approx 13.83$$

Example 2: Find the median of the following distribution.

Class    Frequency
40-44    7
45-49    10
50-54    22
55-59    15
60-64    12
65-69    6
70-74    3

Solutions:
• First find the less-than cumulative frequency.
• Identify the median class.
• Find the median using the formula.

Class    Frequency    Cum. Freq. (less than type)
40-44    7            7
45-49    10           17
50-54    22           39
55-59    15           54
60-64    12           66
65-69    6            72
70-74    3            75

$\frac{n}{2} = \frac{75}{2} = 37.5$
39 is the first cumulative frequency greater than or equal to 37.5
$\Rightarrow$ 50-54 is the median class.

$L_{med} = 49.5$, $w = 5$
$n = 75$, $c = 17$, $f_{med} = 22$

$$\Rightarrow \tilde{X} = L_{med} + \frac{w}{f_{med}}\left(\frac{n}{2} - c\right) = 49.5 + \frac{5}{22}(37.5 - 17) = 54.16$$
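
The procedure for the grouped median (accumulate frequencies, locate the median class, interpolate) can be sketched as follows. This is an illustrative Python sketch; the name `grouped_median` and its arguments are assumptions made here, and a common class width is assumed as in both examples.

```python
def grouped_median(lower_boundaries, width, freqs):
    """Median of a grouped distribution: L + (w / f_med) * (n/2 - c)."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("the distribution has no observations")
    target = n / 2
    cum = 0                                  # c: cumulative frequency before the class
    for lower, f in zip(lower_boundaries, freqs):
        if cum + f >= target:                # first class whose CF reaches n/2
            return lower + (width / f) * (target - cum)
        cum += f

# Example 1: classes 1-5, 6-10, ..., 31-35 (boundaries 0.5, 5.5, ...)
median_1 = grouped_median([0.5, 5.5, 10.5, 15.5, 20.5, 25.5, 30.5], 5,
                          [4, 8, 12, 6, 3, 4, 3])

# Example 2: classes 40-44, 45-49, ..., 70-74 (boundaries 39.5, 44.5, ...)
median_2 = grouped_median([39.5, 44.5, 49.5, 54.5, 59.5, 64.5, 69.5], 5,
                          [7, 10, 22, 15, 12, 6, 3])

print(round(median_1, 2), round(median_2, 2))   # prints: 13.83 54.16
```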
Merits and Demerits of Median
Merits:
• The median is a positional average and hence is not influenced by
extreme observations.
• It can be calculated in the case of open-end intervals.
• It can be located even if the data are incomplete.
• It can be computed for ratio, interval, and ordinal levels of data.
Demerits:
• It is not a good representative of the data if the number of items is
small.
• It is not amenable to further algebraic treatment.
• It is susceptible to sampling fluctuations.
• Its value is not determined by each and every observation.
• Arranging the items in order of magnitude is sometimes a
very tedious process if the number of items is very large.

The Relationship of the Mean, Median and Mode


Comparing the Mean, Median, and the Mode
• If the data are skewed, avoid the mean.

• If there is a large gap around the middle of the data, avoid the median.

• A measure is a resistant measure if its value is not affected by an
outlier or an extreme data value.

• The mean is not a resistant measure of central tendency, because it
is influenced by extreme data values or outliers.

• The median is resistant to the influence of extreme data values or
outliers; its value does not respond strongly to changes in a
few extreme data values, regardless of how large the changes may
be.

• The mode has an advantage over both the mean and the median
when the data are categorical (nominal), since neither the mean nor the
median can be calculated for this type of data. Also, the mode usually
indicates the location within a large distribution where the data
values are concentrated. However, the mode cannot always be
determined: if all the data values in a distribution are different,
the distribution has no mode.

• In the case of a symmetrical distribution, the mean, median and mode
coincide; that is, mean = median = mode. However, for a
moderately asymmetrical (non-symmetrical) distribution, the mean and
mode lie at the two ends and the median lies between them, and they
satisfy the following important empirical relationship:

Mean - Mode = 3(Mean - Median)


Example: In a moderately asymmetrical distribution, the mean and the
mode are 30 and 42 respectively. What is the median of the distribution?
Solution:
From Mean - Mode = 3(Mean - Median) we get 3 Median = 2 Mean + Mode, so
Median = (2 Mean + Mode)/3 = (2*30 + 42)/3 = 34
Hence, the median of the distribution is 34.
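
The rearrangement used in this solution can be checked with a few lines of Python. The sketch is illustrative only (the function name is chosen here), and the relationship itself is empirical, so the result is an approximation for moderately asymmetrical distributions rather than an exact identity.

```python
def median_from_mean_and_mode(mean, mode):
    """Solve Mean - Mode = 3 * (Mean - Median) for the median."""
    return (2 * mean + mode) / 3

median_est = median_from_mean_and_mode(mean=30, mode=42)
print(median_est)   # prints: 34.0

# Substituting back reproduces the empirical relationship
lhs = 30 - 42
rhs = 3 * (30 - median_est)
print(lhs, rhs)     # prints: -12 -12.0
```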

3.5. Measures of Positions


Quantiles
The median is the value of the middle item, which divides the data into two equal parts and is found
by arranging the data in increasing or decreasing order of magnitude. Quantiles
are measures which divide a given set of data into approximately equal subdivisions and are
obtained by the same procedure as the median. They are averages of position. Some of
these are quartiles, deciles and percentiles.
3.5.1. Quartiles:
• Quartiles are measures that divide the frequency distribution into
four equal parts.
• The values of the variable corresponding to these divisions are
denoted Q1, Q2 and Q3, often called the first, the second and the third
quartile respectively.
• Q1 is a value which has 25% of the items less than or equal to it.
• Q2 has 50% of the items less than or equal to it; it is the same
as the median.
• Q3 has 75% of the items less than or equal to it.
• To find Qi (i = 1, 2, 3) we count $\frac{in}{4}$ of the observations, beginning from the
lowest class.
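Quartiles for Ungrouped data:

o Let $x_1, x_2, \ldots, x_n$ be n ordered observations. Then the ith quartile is

$$Q_i = \left(\frac{i(n+1)}{4}\right)^{th} \text{value/item, if } n \text{ is odd}, \; i = 1, 2, 3$$

$$Q_i = \frac{\left(\frac{in}{4}\right)^{th} \text{value} + \left(\frac{in}{4} + 1\right)^{th} \text{value}}{2}, \text{ if } n \text{ is even}, \; i = 1, 2, 3$$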

Quartiles for Grouped data can be determined by the following
formula:

$$Q_i = L_{Q_i} + \frac{w}{f_{Q_i}}\left(\frac{in}{4} - c\right), \quad i = 1, 2, 3$$

where:
$L_{Q_i}$ = the lower class boundary of the quartile class.
w = the size of the quartile class.
n = the total number of observations.
c = the cumulative frequency (less than type) preceding the quartile class.
$f_{Q_i}$ = the frequency of the quartile class.

Remark:
The quartile class (the class containing $Q_i$) is the class with the smallest
cumulative frequency (less than type) greater than or equal to $\frac{in}{4}$.
3.5.2. Deciles:
o Deciles are measures that divide the frequency distribution into ten equal
parts.

o The values of the variable corresponding to these divisions are denoted
D1, D2, ..., D9, often called the first, the second, ..., the ninth decile
respectively.
Deciles for Ungrouped data

o Let $x_1, x_2, \ldots, x_n$ be n ordered observations. The ith decile ($D_i$) is

$$D_i = \left(\frac{i(n+1)}{10}\right)^{th} \text{value/item, if } n \text{ is odd}, \; i = 1, 2, 3, \ldots, 9$$

$$D_i = \frac{\left(\frac{in}{10}\right)^{th} \text{value} + \left(\frac{in}{10} + 1\right)^{th} \text{value}}{2}, \text{ if } n \text{ is even}, \; i = 1, 2, 3, \ldots, 9$$

o To find Di (i = 1, 2, ..., 9) we count $\frac{in}{10}$ of the observations, beginning from the
lowest class.
For deciles of grouped data we have the following formula.

$D_i = L_{D_i} + \frac{w}{f_{D_i}}\left(\frac{iN}{10} - c\right)$, i = 1, 2, ..., 9

Where:
$L_{D_i}$ = lower class boundary of the decile class.
w = the size of the decile class.
N = total number of observations.
c = the cumulative frequency (less than type) preceding the decile class.
$f_{D_i}$ = the frequency of the decile class.

Remark:
The decile class (class containing $D_i$) is the class with the smallest
cumulative frequency (less than type) greater than or equal to $\frac{iN}{10}$.
3.5.3. Percentiles:
o Percentiles are measures that divide the frequency distribution into one
hundred equal parts.
o The values of the variable corresponding to these divisions are denoted
P1, P2, …, P99, often called the first, the second, …, the ninety-ninth
percentile respectively.

Percentiles for Ungrouped data

Let $x_1, x_2, \ldots, x_n$ be n ordered observations. The ith percentile ($P_i$) is

$P_i = \left(\frac{i(n+1)}{100}\right)^{th}$ value/item, if n is odd, i = 1, 2, 3, …, 99

$P_i = \left(\frac{\frac{in}{100} + \left(\frac{in}{100} + 1\right)}{2}\right)^{th}$ value/item, if n is even, i = 1, 2, 3, …, 99
o To find Pi (i = 1, 2, ..., 99) we count $\frac{iN}{100}$ observations through the
classes, beginning from the lowest class.
o $P_{50} = \tilde{x} = Q_2 = D_5$; all are equal in magnitude.
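
One way to see why these four measures coincide is to compare the positions given by the formulas of this section. The short derivation below is a sketch that assumes the median position for ungrouped data is the usual $\left(\frac{n+1}{2}\right)^{th}$ item; for even n the averaged position $\frac{1}{2}\left(\frac{n}{2} + \left(\frac{n}{2} + 1\right)\right)$ also equals $\frac{n+1}{2}$.

```latex
\[
\frac{2(n+1)}{4} \;=\; \frac{5(n+1)}{10} \;=\; \frac{50(n+1)}{100} \;=\; \frac{n+1}{2},
\qquad
\frac{2n}{4} \;=\; \frac{5n}{10} \;=\; \frac{50n}{100} \;=\; \frac{n}{2}.
\]
```

Since $Q_2$, $D_5$ and $P_{50}$ are all read off at the same position as the median, they must take the same value.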
 Percentile for grouped data, we have the following formula.
$P_i = L_{P_i} + \frac{w}{f_{P_i}}\left(\frac{iN}{100} - c\right)$, i = 1, 2, ..., 99

Where:
$L_{P_i}$ = lower class boundary of the percentile class.
w = the size of the percentile class.
N = total number of observations.
c = the cumulative frequency (less than type) preceding the percentile class.
$f_{P_i}$ = the frequency of the percentile class.

Remark:
The percentile class (class containing $P_i$) is the class with the smallest
cumulative frequency (less than type) greater than or equal to $\frac{iN}{100}$.
Example: Find the median, Q1, Q2, Q3, D2, D5 and P50 for the following data.
25, 38, 42, 46, 31, 29, 21, 9, 5
Solution: n = 9. First arrange the data in increasing order:
5, 9, 21, 25, 29, 31, 38, 42, 46

$Q_i = \left(\frac{i(n+1)}{4}\right)^{th}$ value/item

o $Q_1 = \left(\frac{1(9+1)}{4}\right)^{th}$ value/item = (2.5)th item = 2nd item + 0.5 (3rd item − 2nd item)
  $Q_1 = 9 + 0.5(21 - 9) = 9 + 6 = 15$
o $Q_2 = \left(\frac{2(9+1)}{4}\right)^{th}$ value/item = (5)th item = 29
o $Q_3 = \left(\frac{3(9+1)}{4}\right)^{th}$ value/item = (7.5)th item = 7th item + 0.5 (8th item − 7th item)
  $Q_3 = 38 + 0.5(42 - 38) = 38 + 2 = 40$

$D_i = \left(\frac{i(n+1)}{10}\right)^{th}$ value/item

o $D_2 = \left(\frac{2(9+1)}{10}\right)^{th}$ value/item = 2nd item = 9
o $D_5 = \left(\frac{5(9+1)}{10}\right)^{th}$ value/item = 5th item = 29

$P_i = \left(\frac{i(n+1)}{100}\right)^{th}$ value/item

o $P_{50} = \left(\frac{50(9+1)}{100}\right)^{th}$ value/item = 5th item = 29

Example 2: Find the median, D2, D7, D5, Q2, Q5, P2 and P50 for the following
data.
25, 38, 42, 46, 50, 31, 29, 21, 9, 5
Solution: n = 10. First arrange the data in increasing order:
5, 9, 21, 25, 29, 31, 38, 42, 46, 50

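$D_i = \left(\frac{\frac{in}{10} + \left(\frac{in}{10} + 1\right)}{2}\right)^{th}$ value/item

$D_2 = \left(\frac{\frac{2 \cdot 10}{10} + \left(\frac{2 \cdot 10}{10} + 1\right)}{2}\right)^{th}$ value/item = $\left(\frac{2 + 3}{2}\right)^{th}$ value/item = 2.5th value/item
= 2nd item + 0.5 (3rd item − 2nd item) = 9 + 0.5(21 − 9) = 15

$D_7 = \left(\frac{7 + 8}{2}\right)^{th}$ value/item = 7.5th value/item
= 7th item + 0.5 (8th item − 7th item) = 38 + 0.5(42 − 38) = 40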

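$Q_i = \left(\frac{\frac{in}{4} + \left(\frac{in}{4} + 1\right)}{2}\right)^{th}$ value/item

$Q_2 = \left(\frac{\frac{2 \cdot 10}{4} + \left(\frac{2 \cdot 10}{4} + 1\right)}{2}\right)^{th}$ value/item = 5.5th value/item = 5th item + 0.5 (6th item − 5th item)
$Q_2 = 29 + 0.5(31 - 29) = 29 + 1 = 30$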

Example 1: Calculate Q1, Q2, Q3, D4, D9, P40, P50 and P90 for the data
given in the table below.
X 10 11 12 13 14 15 16 17 18
F 2 8 25 48 65 40 20 9 2

Solution: The data are already arranged in increasing order, so we only need
to construct the cumulative frequency table before calculating the required
values.
X 10 11 12 13 14 15 16 17 18
F 2 8 25 48 65 40 20 9 2
Cum. Freq. (less than type) 2 10 35 83 148 188 208 217 219

The total number of observations is 219, which is odd. Clearly then the
median is 14, i.e.

$\tilde{x} = \left(\frac{n+1}{2}\right)^{th}$ value = $\left(\frac{219+1}{2}\right)^{th}$ value = 110th value = 14

$Q_1 = \left(\frac{1(n+1)}{4}\right)^{th}$ value = $\left(\frac{1(219+1)}{4}\right)^{th}$ value = 55th value = 13

$Q_2 = \left(\frac{2(n+1)}{4}\right)^{th}$ value = $\left(\frac{2(219+1)}{4}\right)^{th}$ value = 110th value = 14 = Median = $\tilde{x}$

$Q_3 = \left(\frac{3(n+1)}{4}\right)^{th}$ value = $\left(\frac{3(219+1)}{4}\right)^{th}$ value = 165th value = 15

$D_4 = \left(\frac{4(n+1)}{10}\right)^{th}$ value = $\left(\frac{4(219+1)}{10}\right)^{th}$ value = 88th value = 14

$D_9 = \left(\frac{9(n+1)}{10}\right)^{th}$ value = $\left(\frac{9(219+1)}{10}\right)^{th}$ value = 198th value = 16

$P_{40} = \left(\frac{40(n+1)}{100}\right)^{th}$ value = $\left(\frac{40(219+1)}{100}\right)^{th}$ value = 88th value = 14

$P_{50} = \left(\frac{50(n+1)}{100}\right)^{th}$ value = $\left(\frac{50(219+1)}{100}\right)^{th}$ value = 110th value = 14

$P_{90} = \left(\frac{90(n+1)}{100}\right)^{th}$ value = $\left(\frac{90(219+1)}{100}\right)^{th}$ value = 198th value = 16
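
Reading the "55th value" or "198th value" off the cumulative frequency row can be sketched in Python as a search for the first cumulative frequency that reaches the required rank. The helper name `value_at_rank` is an assumption made for this illustration, and the sketch assumes the rank is a whole number, as it is for every quantile in this example because n + 1 = 220.

```python
from bisect import bisect_left
from itertools import accumulate


def value_at_rank(values, frequencies, rank):
    """Return the rank-th smallest observation of a frequency table.

    Hypothetical helper for illustration; values must be in increasing order.
    """
    cumulative = list(accumulate(frequencies))  # less-than cumulative frequency
    if not 1 <= rank <= cumulative[-1]:
        raise ValueError("rank outside 1..n")
    # first position whose cumulative frequency is >= rank
    return values[bisect_left(cumulative, rank)]


x = [10, 11, 12, 13, 14, 15, 16, 17, 18]
f = [2, 8, 25, 48, 65, 40, 20, 9, 2]
n = sum(f)                                         # 219
print(value_at_rank(x, f, 1 * (n + 1) // 4))       # Q1: 55th value -> 13
print(value_at_rank(x, f, 9 * (n + 1) // 10))      # D9: 198th value -> 16
```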
Example 2: Consider the following distribution.
Calculate:
a) All quartiles.
b) The 7th decile.
c) The 90th percentile.
d) The 2nd quartile, the median and the 50th percentile. What is your
conclusion based on the values?

Values Frequency
140- 150 17
150- 160 29
160- 170 42
170- 180 72
180- 190 84
190- 200 107
200- 210 49
210- 220 34
220- 230 31
230- 240 16
240- 250 12
Solutions:
 First find the less than cumulative frequency.
 Use the formula to calculate the required quantile.

Values Frequency Cum. Freq. (less than type)
140- 150 17 17
150- 160 29 46
160- 170 42 88
170- 180 72 160
180- 190 84 244
190- 200 107 351
200- 210 49 400
210- 220 34 434
220- 230 31 465
230- 240 16 481
240- 250 12 493
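
The less-than cumulative frequency column is a running total of the frequency column. A minimal Python sketch of that step, using the frequencies of this table (the variable names are illustrative only):

```python
from itertools import accumulate

classes = ["140-150", "150-160", "160-170", "170-180", "180-190", "190-200",
           "200-210", "210-220", "220-230", "230-240", "240-250"]
frequencies = [17, 29, 42, 72, 84, 107, 49, 34, 31, 16, 12]

# running total of the frequencies = less-than cumulative frequency
cumulative = list(accumulate(frequencies))

for label, freq, cum in zip(classes, frequencies, cumulative):
    print(label, freq, cum)

print(cumulative[-1])  # total number of observations: 493
```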

a) Quartiles:
i. Q1
- Determine the class containing the first quartile.
$\frac{n}{4} = 123.25$
⇒ 170−180 is the class containing the first quartile.

$L_{Q_1} = 170$, w = 10, n = 493, c = 88, $f_{Q_1} = 72$

⇒ $Q_1 = L_{Q_1} + \frac{w}{f_{Q_1}}\left(\frac{n}{4} - c\right)$
$= 170 + \frac{10}{72}(123.25 - 88)$
$= 174.90$

ii. Q2
- Determine the class containing the second quartile.
$\frac{2n}{4} = 246.5$
⇒ 190−200 is the class containing the second quartile.

$L_{Q_2} = 190$, w = 10, n = 493, c = 244, $f_{Q_2} = 107$

⇒ $Q_2 = L_{Q_2} + \frac{w}{f_{Q_2}}\left(\frac{2n}{4} - c\right)$
$= 190 + \frac{10}{107}(246.5 - 244)$
$= 190.23$

iii. Q3
- Determine the class containing the third quartile.
$\frac{3n}{4} = 369.75$
⇒ 200−210 is the class containing the third quartile.

$L_{Q_3} = 200$, w = 10, n = 493, c = 351, $f_{Q_3} = 49$

⇒ $Q_3 = L_{Q_3} + \frac{w}{f_{Q_3}}\left(\frac{3n}{4} - c\right)$
$= 200 + \frac{10}{49}(369.75 - 351)$
$= 203.83$

b) D7
- Determine the class containing the 7th decile.
$\frac{7n}{10} = 345.1$
⇒ 190−200 is the class containing the seventh decile.

$L_{D_7} = 190$, w = 10, n = 493, c = 244, $f_{D_7} = 107$

⇒ $D_7 = L_{D_7} + \frac{w}{f_{D_7}}\left(\frac{7n}{10} - c\right)$
$= 190 + \frac{10}{107}(345.1 - 244)$
$= 199.45$

c) P90
- Determine the class containing the 90th percentile.
$\frac{90n}{100} = 443.7$
⇒ 220−230 is the class containing the 90th percentile.

$L_{P_{90}} = 220$, w = 10, n = 493, c = 434, $f_{P_{90}} = 31$

⇒ $P_{90} = L_{P_{90}} + \frac{w}{f_{P_{90}}}\left(\frac{90n}{100} - c\right)$
$= 220 + \frac{10}{31}(443.7 - 434)$
$= 223.13$
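
For part d), a short worked comparison is sketched below. It assumes the grouped median is computed with the same interpolation formula, using $\frac{n}{2}$ as the count to reach; under that assumption the three measures share the same class and the same arithmetic.

```latex
\[
\frac{2n}{4} \;=\; \frac{n}{2} \;=\; \frac{50n}{100} \;=\; \frac{493}{2} \;=\; 246.5
\]
\[
Q_2 \;=\; \tilde{x} \;=\; P_{50} \;=\; 190 + \frac{10}{107}\,(246.5 - 244) \;\approx\; 190.23
\]
```

The second quartile, the median and the 50th percentile are therefore the same value for this distribution, in line with the earlier statement that $P_{50} = \tilde{x} = Q_2 = D_5$.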

Exercise:
1. Complete the following frequency distribution.

Class limit Frequency
11-20 12
21-30 30
31-40 f3
41-50 65
51-60 f5
61-70 25
71-80 18
Total 229

If the median value is 46:
I) Find the missing frequencies.
II) Calculate the mean and the second quartile and interpret them.
III) Comment on the values of the median and the second quartile.

2. The first quartile for the following data is 21.5.
Class 10-15 15-20 20-25 25-30 30-35 35-40 40-45 45-50 Total
Frequency 24 f1 90 122 f5 56 20 33 460

I) Find the missing frequencies.
II) Find the modal value.
