Measures of Central Tendency Explained

The document discusses descriptive statistics, focusing on measures of central tendency, which include mean, median, and mode. It outlines the characteristics of an ideal measure, provides formulas and examples for calculating these measures, and highlights their merits and demerits. Additionally, it explains the relationship between mean, median, and mode in statistical distributions.

Uploaded by

Hifa Islam
Copyright
© All Rights Reserved

Descriptive Statistics

Measures of Central Tendency


The term ‘central tendency’ was coined because observations (numerical values) in most data
sets show a distinct tendency to group or cluster around a value of an observation located
somewhere in the middle of all observations. It is necessary to identify or calculate this typical
central value (also called average) to describe or project the characteristic of the entire data
set. This descriptive value is the measure of the central tendency or location and methods of
computing this central value are called measures of central tendency.

Requisites for an Ideal Measure of Central Tendency: According to Professor Yule, the
following are the characteristics to be satisfied by an ideal measure of central tendency:
1. It should be rigidly defined.
2. It should be readily comprehensible and easy to calculate.
3. It should be based on all the observations.
4. It should be suitable for further mathematical treatment.
5. It should be affected as little as possible by fluctuations of sampling.
6. It should not be affected much by extreme values.

The various measures of central tendency or averages commonly used can be broadly classified
in the following categories:

1. Mean or Mathematical Averages


(a) Arithmetic Mean commonly called the Average
(b) Geometric Mean
(c) Harmonic Mean
2. Median
3. Mode
1. Arithmetic Mean: The arithmetic mean of a set of observations is their sum divided by the
number of observations; that is, the arithmetic mean $\bar{x}$ of n observations $x_1, x_2, \ldots, x_n$ is given
by:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
In case of a frequency distribution $(x_i, f_i)$, i = 1, 2, …, n, where $f_i$ is the frequency of the
variable $x_i$:
$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i, \qquad \sum_{i=1}^{n} f_i = N$$

In case of a grouped or continuous frequency distribution, x is taken as the mid-value of the
corresponding class.

Example 1: In a survey of 5 textile companies, the profit (in Rs. lakh) earned during a year
was 15, 20, 10, 35, and 32. Find the arithmetic mean of the profit earned.
Solution:
$$\bar{x} = \frac{1}{5}\sum_{i=1}^{5} x_i = \frac{15 + 20 + 10 + 35 + 32}{5} = \frac{112}{5} = 22.4$$

Thus, the arithmetic mean of the profit earned by these companies during a year was Rs. 22.4
lakh.
Example 2: The number of new orders received by a company over the last 25 working days
were recorded as follows: 3, 0, 1, 4, 4, 4, 2, 5, 3, 6, 4, 5, 1, 4, 2, 3, 0, 2, 0, 5, 4, 2, 3, 3, 1.
Calculate the arithmetic mean for the number of orders received over all similar working days.
Solution: The arithmetic mean is
$$\bar{x} = \frac{1}{25}\sum_{i=1}^{25} x_i = \frac{3 + 0 + 1 + \cdots + 3 + 1}{25} = \frac{71}{25} = 2.84 \approx 3 \text{ orders}$$

Alternatively, prepare the frequency table:

No. of orders (xi)    Frequency (fi)    fi × xi
0                     3                 0
1                     3                 3
2                     4                 8
3                     5                 15
4                     6                 24
5                     3                 15
6                     1                 6
Total                 25                71

Arithmetic mean,
$$\bar{x} = \frac{1}{25}\sum_{i=1}^{7} f_i x_i = \frac{71}{25} = 2.84 \approx 3 \text{ orders}$$
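The two routes used in Example 2 (summing the raw observations, or summing fi × xi from a frequency table) can be sketched in Python as follows. This is only an illustrative sketch: the function names `arithmetic_mean` and `frequency_mean` are hypothetical and not part of the original notes.

```python
from collections import Counter

def arithmetic_mean(values):
    """Simple arithmetic mean: sum of the observations divided by their number."""
    return sum(values) / len(values)

def frequency_mean(freq):
    """Mean of a frequency distribution given as {value: frequency}."""
    total = sum(freq.values())                        # N = sum of f_i
    return sum(x * f for x, f in freq.items()) / total

# Orders received over 25 working days (data of Example 2)
orders = [3, 0, 1, 4, 4, 4, 2, 5, 3, 6, 4, 5, 1, 4, 2,
          3, 0, 2, 0, 5, 4, 2, 3, 3, 1]

order_freq = Counter(orders)            # builds the frequency table
mean_raw = arithmetic_mean(orders)      # from the raw observations
mean_freq = frequency_mean(order_freq)  # from the frequency table

print(mean_raw, mean_freq)              # 2.84 2.84
```

Both routes give the same value because the frequency table only regroups the same 25 observations.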

Example 3: Find the arithmetic mean for the following frequency distribution:
x 1 2 3 4 5 6 7
f 5 9 12 17 14 10 6
Solution:
x f fx
1 5 5
2 9 18
3 12 36
4 17 68
5 14 70
6 10 60
7 6 42
Total 73 299
$$\bar{x} = \frac{1}{73}\sum_{i=1}^{7} f_i x_i = \frac{299}{73} \approx 4.10$$

Example 4: Calculate the arithmetic mean of the marks from the following table:
Marks 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60
No. of students 12 18 27 20 17 6

Solution:
Marks No. of students (f) Mid-points (x) fx
0 – 10 12 5 60
10 – 20 18 15 270
20 – 30 27 25 675
30 – 40 20 35 700
40 – 50 17 45 765
50 – 60 6 55 330
Total 100 2800
$$\bar{x} = \frac{1}{100}\sum_{i=1}^{6} f_i x_i = \frac{2800}{100} = 28$$
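For a grouped distribution such as the one in Example 4, the same calculation can be sketched in Python by replacing each class with its mid-value. The function name `grouped_mean` and the (lower, upper, frequency) layout are assumptions made for this illustration.

```python
def grouped_mean(classes):
    """Mean of a grouped distribution.

    classes: list of (lower limit, upper limit, frequency) tuples.
    Each class is represented by its mid-value, as in the text.
    """
    total_f = 0
    total_fx = 0.0
    for lower, upper, f in classes:
        mid = (lower + upper) / 2      # mid-value stands in for x
        total_f += f
        total_fx += f * mid
    return total_fx / total_f

# Marks distribution of Example 4
marks = [(0, 10, 12), (10, 20, 18), (20, 30, 27),
         (30, 40, 20), (40, 50, 17), (50, 60, 6)]

print(grouped_mean(marks))   # 28.0
```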

Merits:
(i) The calculation of arithmetic mean is simple and it is unique.
(ii) The calculation of arithmetic mean is based on all values given in the data set.
(iii) The arithmetic mean is a reliable single value that reflects all values in the data set.
(iv) The arithmetic mean is least affected by fluctuations of sampling. In other
words, its value, determined from various samples drawn from a population, varies
by the least possible amount.
(v) It can be readily put to algebraic treatment.
Demerits:
(i) Arithmetic mean cannot be located graphically.
(ii) The value of A.M. cannot be calculated accurately for unequal and open-ended class
intervals either at the beginning or end of the given frequency distribution.
(iii) The mean cannot be calculated for qualitative characteristics such as intelligence,
honesty, beauty, or loyalty.
(iv) Arithmetic mean cannot be obtained if a single observation is missing or lost.

Properties of A.M.
1. The sum of the deviations of the values from their A.M. is always zero, i.e., if $x_1, x_2, \ldots, x_n$
are the values, then $\sum (x_i - \bar{x}) = 0$ or $\sum f_i (x_i - \bar{x}) = 0$.
2. The sum of the squares of the deviations of a set of values is minimum when taken
about the mean, i.e., if
$$Z = \sum_{i=1}^{n} f_i (x_i - A)^2$$
then Z is minimum at the point $A = \bar{x}$.


3. If $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ are the A.M.s of k component series of sizes $n_i$ (i = 1, 2, …, k) respectively,
then the A.M. of the combined series is given by
$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum n_i \bar{x}_i}{\sum n_i}$$
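Property 2 can be checked with a short derivation that uses property 1. The steps below are a standard argument supplied here as an illustration; they are not taken from the original notes. Writing $x_i - A = (x_i - \bar{x}) + (\bar{x} - A)$ and $N = \sum f_i$:

$$
\begin{aligned}
Z &= \sum_{i=1}^{n} f_i (x_i - A)^2 \\
  &= \sum_{i=1}^{n} f_i (x_i - \bar{x})^2 + 2(\bar{x} - A)\sum_{i=1}^{n} f_i (x_i - \bar{x}) + N(\bar{x} - A)^2 \\
  &= \sum_{i=1}^{n} f_i (x_i - \bar{x})^2 + N(\bar{x} - A)^2
\end{aligned}
$$

The middle term vanishes because $\sum f_i (x_i - \bar{x}) = 0$ (property 1). Since $N(\bar{x} - A)^2 \geq 0$, with equality only when $A = \bar{x}$, Z takes its smallest value at $A = \bar{x}$.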

Uses: The A.M. is the most important and most widely used measure of central tendency. It is the
most appropriate measure for phenomena that are capable of direct quantitative measurement,
such as average income, average price, average cost of production, and average sales.

2. Median: Median may be defined as the middle value in the data set when its elements are
arranged in a sequential order, that is, in either ascending or descending order of magnitude.
It is called a middle value in an ordered sequence of data in the sense that half of the
observations are smaller and half are larger than this value. The median is thus a measure
of the location or centrality of the observations.
In the discrete case, the data are first arranged in either ascending or descending order of magnitude.
(i) If the number of observations (n) is odd, then the median (Med) is the value of the
((n + 1)/2)th observation in the ordered data. That is,
$$\text{Med} = \text{value of the } \left(\frac{n+1}{2}\right)\text{th observation in the data array}$$
(ii) If the number of observations (n) is even, then the median is defined as the
arithmetic mean of the values of the (n/2)th and (n/2 + 1)th observations in the data array.
That is,
$$\text{Med} = \frac{\left(\frac{n}{2}\right)\text{th obs.} + \left(\frac{n}{2} + 1\right)\text{th obs.}}{2}$$

Example 1: Number of insects per plant is given as follows: 17, 27, 30, 26, 24, 18, 19, 28, 23,
25, and 20. Find out the median value of number of insects per plant.
Solution: Let us arrange the data in ascending order of their values as follows: 17, 18, 19, 20,
23, 24, 25, 26, 27, 28, and 30. Here, we have 11 observations, so the middlemost observation
is the $\left(\frac{11+1}{2}\right)$th = 6th observation, and the value of the sixth observation is 24. Hence, the
median number of insects per plant is 24.


Example 2: Calculate the median of the following data that relates to the number of patients
examined per hour in the outpatient ward (OPD) in a hospital: 10, 12, 15, 20, 13, 24, 17, 18.
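Solution: Arranging the data in ascending order, we have
10 12 13 15 17 18 20 24
Here n = 8, so the median is the A.M. of the 4th and 5th observations:
$$\text{Median} = \frac{\left(\frac{n}{2}\right)\text{th obs.} + \left(\frac{n}{2} + 1\right)\text{th obs.}}{2} = \frac{15 + 17}{2} = 16$$
Thus, the median number of patients examined per hour in the OPD of the hospital is 16.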
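The odd and even rules above can be sketched as a small Python function. The name `median_value` is hypothetical; Python's standard library also provides `statistics.median`, which follows the same rule.

```python
def median_value(values):
    """Median of ungrouped data using the odd/even rule described in the text."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # odd n: the ((n + 1)/2)th observation, which is index mid when counting from 0
        return ordered[mid]
    # even n: A.M. of the (n/2)th and (n/2 + 1)th observations
    return (ordered[mid - 1] + ordered[mid]) / 2

insects = [17, 27, 30, 26, 24, 18, 19, 28, 23, 25, 20]   # Example 1, n = 11
patients = [10, 12, 15, 20, 13, 24, 17, 18]              # Example 2, n = 8

print(median_value(insects))    # 24
print(median_value(patients))   # 16.0
```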
Merits:
(i) Median is unique, i.e., like mean, there is only one median for a set of data.
(ii) The value of median is easy to understand and may be calculated from any type of
data.
(iii) The extreme values in the data set do not affect the calculation of the median.
(iv) The median is considered the best statistical technique for studying the qualitative
attribute of an observation in the data set.
(v) The median value may be calculated for an open-ended distribution of data set.
Demerits:
(i) The median is not capable of algebraic treatment. For example, the median of a
combined data set cannot be determined from the medians of its component sets.
(ii) The value of median is affected more by sampling variations as compared with
mean.
(iii) Median is not based on all the observations.
(iv) Since median is an average of position, therefore arranging the data in ascending or
descending order of magnitude is time consuming in case of a large number of
observations.
Uses: The median is helpful in understanding the characteristic of a data set when
(i) observations are qualitative in nature
(ii) extreme values are present in the data set

3. Mode: The mode is that value of an observation which occurs most frequently in the data
set, that is, the point with the highest frequency. The concept of mode is of great use to
large-scale manufacturers of consumer items such as ready-made garments and shoes.
In all such cases it is important to know the size that fits most persons rather
than the ‘mean’ size.
Mode of discrete series is the value which occurs most frequently.

Example 1: Consider the data set 1, 2, 3, 4, 4, 5. The mode for this data is 4 because 4 occurs
most frequently (twice) in the data.
Example 2: Look at the following discrete series:
Variable: 10 20 30 40 50
Frequency 2 8 20 10 5
Here the value of mode is 30.
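Locating the mode of a discrete series amounts to finding the value with the highest frequency. A minimal Python sketch follows; the function name `modes` is hypothetical, and it returns a list so that series with more than one mode are reported in full.

```python
from collections import Counter

def modes(freq):
    """Return every value that shares the highest frequency in {value: frequency}."""
    highest = max(freq.values())
    return [x for x, f in freq.items() if f == highest]

# Example 1: raw observations, counted into a frequency table first
print(modes(Counter([1, 2, 3, 4, 4, 5])))                  # [4]

# Example 2: a discrete series given directly as value -> frequency
print(modes({10: 2, 20: 8, 30: 20, 40: 10, 50: 5}))        # [30]
```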
Merits:
(i) Mode value is easy to understand and to calculate. Mode class can also be located
by inspection.
(ii) The mode is not affected by the extreme values in the distribution.
(iii) The mode value can also be calculated for open-ended frequency distributions.
(iv) The mode can be used to describe quantitative as well as qualitative data. For
example, its value is used for comparing consumer preferences for various types of
products, say cigarettes, soaps, toothpastes, or other products.
Demerits:
(i) Mode is not a rigidly defined measure as there are several methods for calculating
its value.
(ii) It is difficult to locate modal class in the case of multi-modal frequency
distributions.
(iii) Mode is not suitable for algebraic manipulations.
(iv) When a data set contains more than one mode, such values are difficult to interpret
and compare.
(v) Mode is a poor measure when the most frequently occurring value does not lie
close to the centre of the data or is not unique.
Uses: Mode is mostly used in business and commerce. Meteorological forecasts are based on
mode.

Relationship between Mean, Median and Mode:


In a unimodal and symmetrical distribution, the values of mean, median, and mode are equal
as indicated in the following figure. In other words, when these three values are not equal
to each other, the distribution is asymmetrical (skewed). For a moderately asymmetrical
distribution, Karl Pearson suggested the following empirical relationship between these three
measures of central tendency:
Mean – Mode = 3 (Mean – Median)
or, Mode = 3 Median – 2 Mean
Partition Values: Quartiles, Deciles and Percentiles
The measures of central tendency which are used for dividing the data into several equal parts
are called partition values. A data set can be divided into four, ten, and hundred parts of equal
size and the corresponding partition values are called quartiles, deciles, and percentiles. All
these values can be determined in the same way as median. The only difference is in their
location.
Quartiles: The values of observations in a data set, when arranged in an ordered sequence, can
be divided into four equal parts, or quarters, using three quartiles namely Q1, Q2, and Q3. The
first quartile Q1 divides a distribution in such a way that 25% (=n/4) of observations have a
value less than Q1 and 75% (= 3n/4) have a value more than Q1. The second quartile Q2 has the
same number of observations above and below it. It is therefore same as median value. The
quartile Q3 divides the data set in such a way that 75% of the observations have a value less
than Q3 and 25% have a value more than Q3.
Deciles: The values of observations in a data set when arranged in an ordered sequence can be
divided into ten equal parts, using nine deciles, Di (i = 1, 2, . . ., 9).
Percentiles: The values of observations in a data when arranged in an ordered sequence can be
divided into hundred equal parts using ninety-nine percentiles, Pi (i = 1, 2, . . ., 99). In general,
the ith percentile is a number that has i% of the data values at or below it and (100 – i)% of the
data values at or above it. The lower quartile (Q1), median and upper quartile (Q3) are also the
25th percentile, 50th percentile and 75th percentile, respectively.
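The notes do not fix a rule for locating a quartile when its position falls between two observations. As one possible sketch, Python's `statistics.quantiles` with `method="exclusive"` places the i-th quartile at position i(n + 1)/4 in the ordered data and interpolates linearly when needed; that positional rule is an assumption of this example, not something stated in the notes. The data are the insect counts from the median example.

```python
from statistics import quantiles

# Number of insects per plant (11 observations)
insects = [17, 27, 30, 26, 24, 18, 19, 28, 23, 25, 20]

# n=4 asks for the three cut points Q1, Q2, Q3 (n=10 gives deciles, n=100 percentiles).
# method="exclusive" uses positions i * (n_obs + 1) / 4 in the sorted data.
q1, q2, q3 = quantiles(insects, n=4, method="exclusive")

print([q1, q2, q3])   # [19.0, 24.0, 27.0]
```

Here the positions are 3, 6 and 9, so Q2 coincides with the median (24) found earlier. Other interpolation conventions can give slightly different quartiles for the same data.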

Measures of Dispersion:
Measures of central tendency help to locate the centre of the distribution but they do not reveal
how the items are spread out on either side of the central value. The dispersion of values is
indicated by the extent to which these values tend to spread over an interval rather than cluster
closely around an average. Scatteredness of data about an average is termed as dispersion or
variation.

Example: Let us observe the following three series:


Series A: 40 40 40 40 40
Series B: 35 39 41 42 43
Series C: 10 18 35 57 80

All three series have the same mean, 40. In the first series the mean fully represents the series and
the individual items; the data are not at all scattered. In the second series, although the mean is
40, the observations are only slightly scattered, as the minimum value of the series is 35
and the maximum value is 43. Hence, in the case of the second series too, the mean is a good
representation of the series. In the third series the observations are widely scattered, and
clearly the mean does not satisfactorily represent the entire series. Thus, although all three
series have the same average, they differ widely from one another in their formation. Hence it
is important for any investigation to know not only the average of any type (mean, median or
mode) but also the scatteredness of the various observations of a distribution.
Requisites for a Measure of Dispersion:
The essential requisites for a good measure of variation are:
1. It should be rigidly defined.
2. It should be based on all the values in the data set.
3. It should be calculated easily, quickly, and accurately.
4. It should not be unduly affected by the fluctuations of sampling and also by extreme
observations.
5. It should be amenable to further mathematical or algebraic manipulations.

Classification of Measures of Dispersion:


The various measures of dispersion (variation) can be classified into two categories:
1. Absolute measures, and
2. Relative measures

Absolute measures are described by a number or value to represent the amount of variation or
differences among values in a data set. Such a number or value is expressed in the same unit
of measurement as the set of values in the data such as rupees, inches, feet, kilograms, or tonnes.

Relative measures are described as the ratio of a measure of absolute variation to an
average and are termed coefficients of variation. The word ‘coefficient’ means a number that
is independent of any unit of measurement.
1. Range and its Coefficient:
The range is the simplest measure of dispersion and is based on the location of the largest and
the smallest values in the data. Thus, the range is defined to be the difference between the
largest and lowest observed values in a data set.
Range (R) = Highest value of an observation – Lowest value of an observation
R = H – L
The relative measure of range, called the coefficient of range, is obtained by applying the
following formula:
$$\text{Coefficient of range} = \frac{H - L}{H + L}$$

Example 1: Weight of 7 students (in kg): 47, 50, 49, 70, 63, 55, 81. Calculate the range and
coefficient of range.
Solution: Highest observation, H = 81 and lowest observation, L = 47.
Range = H – L = 81 – 47 = 34 kg and
$$\text{Coefficient of range} = \frac{H - L}{H + L} = \frac{34}{128} \approx 0.27$$

Example 2: The following are the sales figures of a firm for the last 12 months. Calculate the
range and coefficient of range.
Months 1 2 3 4 5 6 7 8 9 10 11 12
Sales (Rs. ‘000) 80 82 82 84 84 86 86 88 88 90 90 92

Solution: Here, H = 92 and L = 80.
Range = H – L = 92 – 80 = 12 (in ‘000 rupees) and
$$\text{Coefficient of range} = \frac{H - L}{H + L} = \frac{12}{172} \approx 0.070$$

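Example 3: The following table gives the marks distribution of students in an examination.
Calculate the range and coefficient of range.

Marks 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60
No. of Students 7 10 12 13 8

Solution: For grouped data, H is taken as the upper limit of the highest class and L as the
lower limit of the lowest class. Here H = 60 and L = 10.
Range = 60 – 10 = 50 marks and Coefficient of range = 50/70 ≈ 0.71.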
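The range and its coefficient for ungrouped data, as in Examples 1 and 2, can be sketched in Python as below. The function names are hypothetical and chosen only for this illustration.

```python
def value_range(values):
    """Range R = H - L, the highest minus the lowest observation."""
    return max(values) - min(values)

def coefficient_of_range(values):
    """Relative measure (H - L) / (H + L); a pure number without units."""
    high, low = max(values), min(values)
    return (high - low) / (high + low)

weights = [47, 50, 49, 70, 63, 55, 81]                          # Example 1 (kg)
sales = [80, 82, 82, 84, 84, 86, 86, 88, 88, 90, 90, 92]        # Example 2 (Rs. '000)

print(value_range(weights), coefficient_of_range(weights))      # 34 0.265625
print(value_range(sales), round(coefficient_of_range(sales), 3))
```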
Merits:
(i) Range is rigidly defined, very simple to calculate and easy to understand.
(ii) It is independent of the measure of central tendency.
(iii)It is quite useful in cases where the purpose is only to find out the extent of extreme
variation.
Demerits:
(i) Range is not capable of further mathematical treatment.
(ii) It is largely influenced by two extreme values and completely independent of the other
values.
(iii)Range is much affected by fluctuation of sampling.
(iv) It cannot be used in case of open-end distributions.

2. Quartile Deviation and its Coefficient:


The difference between the third quartile (Q3) and the first quartile (Q1) is known as the
interquartile range, and half of the interquartile range is known as the quartile deviation (Q.D.)
or semi-interquartile range.
$$\text{Q.D.} = \frac{Q_3 - Q_1}{2}$$

$$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
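A sketch of these two formulas in Python is given below. It assumes that the quartiles are located with the (n + 1)-position rule (`method="exclusive"` in `statistics.quantiles`), which the notes do not specify; the function name `quartile_deviation` and the sample data (seven student weights in kg) are chosen only for illustration.

```python
from statistics import quantiles

def quartile_deviation(values):
    """Return (Q.D., coefficient of Q.D.) for ungrouped data.

    Quartiles use the (n + 1)-position rule; this is an assumption,
    other textbooks use slightly different conventions.
    """
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    qd = (q3 - q1) / 2              # half of the interquartile range
    coeff = (q3 - q1) / (q3 + q1)   # unit-free relative measure
    return qd, coeff

weights = [47, 50, 49, 70, 63, 55, 81]   # illustrative data, in kg
qd, coeff = quartile_deviation(weights)

print(qd, round(coeff, 3))   # 10.5 0.176
```

For these data Q1 = 49 and Q3 = 70, so the extreme value 81 has no influence on the result.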

Merits:
(i) It is rigidly defined and not difficult to calculate.
(ii) It measures the spread over the middle half of the values and so minimises the influence of
outliers.
(iii)It is the most appropriate measure in open ended class interval.
(iv) It is useful in case of erratic or highly skewed distribution.

Demerits:
(i) Q.D. is based on only 50% of the observations of a distribution.
(ii) It is much affected by fluctuations of sampling.
(iii)It is not amenable for further mathematical treatment.
(iv) Q.D. does not show the variation of the data about its central value.

3. Mean Deviation and its Coefficient:


The mean deviation (M.D.) or mean absolute deviation (M.A.D.) of a set of observations is
the arithmetic mean of the absolute deviations of the observations from a central value
(usually the mean or the median).

If a variable takes n values $x_1, x_2, \ldots, x_n$, then
$$\text{M.D.} = \frac{|x_1 - A| + |x_2 - A| + \cdots + |x_n - A|}{n} = \frac{1}{n}\sum_{i=1}^{n} |x_i - A|$$

Again, if the frequencies of $x_1, x_2, \ldots, x_n$ are $f_1, f_2, \ldots, f_n$ respectively, then
$$\text{M.D.} = \frac{f_1 |x_1 - A| + f_2 |x_2 - A| + \cdots + f_n |x_n - A|}{N} = \frac{1}{N}\sum_{i=1}^{n} f_i |x_i - A|$$

Here A is the mean or the median. The mean used for A may be the A.M., G.M. or H.M., but
usually A is taken as the A.M.

Generally, the median is preferred because the sum of the absolute values of the deviations from
the median is smaller than that from any other value.
$$\text{Coefficient of M.D.} = \frac{\text{M.D.}}{\bar{x} \text{ or Med}}$$
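The following Python sketch computes the mean deviation about the mean and about the median for a small data set and illustrates why the median is generally preferred. The function name `mean_deviation` and the sample data (seven student weights in kg) are assumptions made for this illustration.

```python
from statistics import mean, median

def mean_deviation(values, about):
    """Arithmetic mean of the absolute deviations of the values from `about`."""
    return sum(abs(x - about) for x in values) / len(values)

weights = [47, 50, 49, 70, 63, 55, 81]    # illustrative data, in kg

md_mean = mean_deviation(weights, mean(weights))      # A = A.M.
md_median = mean_deviation(weights, median(weights))  # A = median

# Coefficient of M.D.: divide by the average the deviations were taken from
coeff_median = md_median / median(weights)

print(round(md_mean, 2), round(md_median, 2))   # 10.33 9.71
```

The deviation about the median (55) is the smaller of the two, in line with the statement above.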

Merits:
(i) It is rigidly defined, based on all observations, and easy to compute.
(ii) M.D. is less affected by extreme values.
(iii)Average deviation from mean is always zero in any data set. The M.D. avoids this
problem by using absolute values to eliminate the negative signs.

Demerits:
(i) It is not capable of further mathematical treatment.
(ii) It cannot be computed in case of open-end distributions.
(iii)The algebraic signs are ignored while calculating M.D.
(iv) Mean deviation from mode is not considered to be a good measure of dispersion.

4. Standard Deviation, Variance and Coefficient of Variation:


The arithmetic mean of the squares of the deviations of the values of a variable from its
arithmetic mean is called the variance of that variable, denoted by $\sigma^2$, and the positive square
root of the variance is called the standard deviation (S.D.), denoted by $\sigma$.

If a variable x takes values $x_1, x_2, \ldots, x_n$ and $\bar{x}$ is the arithmetic mean of these values, then
$$\sigma^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

and

$$\sigma = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
Again, in case of a frequency distribution, i.e., if the frequencies of $x_1, x_2, \ldots, x_n$ are $f_1, f_2, \ldots, f_n$
respectively, then
$$\sigma^2 = \frac{f_1 (x_1 - \bar{x})^2 + f_2 (x_2 - \bar{x})^2 + \cdots + f_n (x_n - \bar{x})^2}{N} = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2$$

and

$$\sigma = \sqrt{\frac{f_1 (x_1 - \bar{x})^2 + f_2 (x_2 - \bar{x})^2 + \cdots + f_n (x_n - \bar{x})^2}{N}} = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}$$

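Working Formula:
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2 \quad \text{and} \quad \sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2}$$

and for frequency data

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{n} f_i x_i^2 - \bar{x}^2 \quad \text{and} \quad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i x_i^2 - \bar{x}^2}$$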
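The working formula follows from the definition of variance by expanding the square. The short derivation below is a standard argument added as an illustration; it is not taken from the original notes.

$$
\begin{aligned}
\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 \\
         &= \frac{1}{n}\sum_{i=1}^{n} x_i^2 - 2\bar{x}\cdot\frac{1}{n}\sum_{i=1}^{n} x_i + \bar{x}^2 \\
         &= \frac{1}{n}\sum_{i=1}^{n} x_i^2 - 2\bar{x}^2 + \bar{x}^2 \\
         &= \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2
\end{aligned}
$$

The same steps with weights $f_i$ and $N = \sum f_i$ in place of $n$ give the frequency form.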

Merits:
(i) It is rigidly defined and the value of standard deviation is based on all observations in
a set of data.
(ii) It is the only measure of variation capable of algebraic treatment.
(iii)It is less affected by fluctuations of sampling as compared to other measures of
variation.
Demerits:
(i) It is difficult to calculate as compared to other measures of dispersion.
(ii) In comparison to mean deviation it is more affected by extreme values.
(iii)It cannot be computed in case of open-end distribution.

However, by studying the relative merits and demerits of all the measures of dispersion,
standard deviation is found to be the best of all the measures of dispersion and hence it is the
most widely used technique of dispersion. Thus, standard deviation is an ideal measure of
dispersion.

Mathematical Properties of S.D.

1. S.D. is independent of the change of origin but dependent on the change of scale.
Let $\sigma_x$ be the S.D. of the variable x taking values $x_1, x_2, \ldots, x_n$. If change of origin and
scale are effected simultaneously, the new variable v is given by
$$v = \frac{x - A}{h}$$
where A is the new origin and h (> 0) is the scale factor, and we have $\sigma_x = h\sigma_v$.
2. S.D. of a combined distribution: Let a distribution of $n_1$ observations have mean $\bar{x}_1$
and S.D. $\sigma_1$, and let another distribution of $n_2$ observations have mean $\bar{x}_2$ and S.D.
$\sigma_2$. Then the S.D. $\sigma$ of the distribution obtained by combining the two distributions,
having altogether $(n_1 + n_2)$ observations, is given by:
$$\sigma = \sqrt{\frac{n_1 \sigma_1^2 + n_2 \sigma_2^2 + n_1 d_1^2 + n_2 d_2^2}{n_1 + n_2}}$$
where $d_1 = \bar{x}_1 - \bar{x}$, $d_2 = \bar{x}_2 - \bar{x}$
and $\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$, i.e., $\bar{x}$ is the mean of the combined distribution.
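Both properties can be checked numerically. The Python sketch below uses `statistics.pstdev`, which divides by n and therefore matches the definition of σ used in these notes. The two data sets, the values A = 5 and h = 2, and the function name `combined_sd` are chosen only for illustration.

```python
import math
from statistics import mean, pstdev

def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """S.D. of two combined distributions, using the formula of property 2."""
    combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1 = mean1 - combined_mean
    d2 = mean2 - combined_mean
    return math.sqrt((n1 * sd1**2 + n2 * sd2**2 + n1 * d1**2 + n2 * d2**2)
                     / (n1 + n2))

a = [7, 8, 6, 8, 9, 7, 5, 6]            # illustrative data set 1
b = [10, 12, 15, 20, 13, 24, 17, 18]    # illustrative data set 2

# Property 1: v = (x - A) / h gives sigma_x = h * sigma_v
A, h = 5, 2
v = [(x - A) / h for x in a]
print(math.isclose(pstdev(a), h * pstdev(v)))     # True

# Property 2: the formula agrees with the S.D. of the pooled observations
by_formula = combined_sd(len(a), mean(a), pstdev(a), len(b), mean(b), pstdev(b))
direct = pstdev(a + b)
print(math.isclose(by_formula, direct))           # True
```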

5. Coefficient of Variation (C.V.):


Standard deviation is an absolute measure of variation and expresses variation in the same unit
of measurement as the arithmetic mean or the original data. A relative measure called the
coefficient of variation (C.V.), developed by Karl Pearson, is a very useful measure for
(i) comparing two or more data sets expressed in different units of measurement, or
(ii) comparing data sets that are in same unit of measurement but the mean values of
data sets in a comparable field are widely dissimilar.

Thus, C.V. measures the standard deviation relative to the mean in percentages. In other words,
C.V. indicates how large the standard deviation is in relation to the mean and is computed as
follows:
$$\text{C.V.} = \frac{\sigma}{\bar{x}} \times 100$$
Multiplying by 100 converts the decimal to a percent.

While comparing the variability between distributions, the distribution having the smaller C.V.
is said to be less variable, more stable, more uniform, more consistent or more homogeneous.

Relationship between Different Measures of Variation:

It can be verified empirically that, for a symmetrical or moderately skewed distribution, the
following relationship among Q.D., M.D. and S.D. holds approximately. For a normal
distribution, Q.D. ≈ (2/3) S.D. and M.D. ≈ (4/5) S.D., which gives

6 Q.D. = 5 M.D. = 4 S.D.


Example 1: The following data represents the number of cars entering a gas station between 10 a.m.
and 11 a.m. in a city for repairs during last 8 days of a month.
7 8 6 8 9 7 5 6
Calculate the standard deviation for these data.

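Solution: Here the A.M. is
$$\bar{x} = \frac{7 + 8 + \cdots + 6}{8} = \frac{56}{8} = 7$$

and the value of the S.D. is
$$\sigma = \sqrt{\frac{(7 - 7)^2 + (8 - 7)^2 + \cdots + (6 - 7)^2}{8}} = \sqrt{\frac{12}{8}} = \sqrt{1.5} \approx 1.22$$

Or, using the working formula,
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2} = \sqrt{\frac{7^2 + 8^2 + \cdots + 6^2}{8} - 7^2} = \sqrt{\frac{404}{8} - 49} = \sqrt{1.5} \approx 1.22$$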
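The two routes used in this solution, the definition and the working formula, can be sketched in Python as follows. The function names `sd_definition` and `sd_working` are hypothetical.

```python
import math

def sd_definition(values):
    """S.D. from the definition: root of the mean squared deviation from the A.M."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((x - m) ** 2 for x in values) / n)

def sd_working(values):
    """S.D. from the working formula: root of (mean of squares - square of mean)."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum(x * x for x in values) / n - m * m)

cars = [7, 8, 6, 8, 9, 7, 5, 6]   # data of Example 1

print(round(sd_definition(cars), 2))   # 1.22
print(round(sd_working(cars), 2))      # 1.22
```

The working formula avoids computing each deviation, which is convenient by hand; in floating-point arithmetic it can lose accuracy when the mean is large relative to the spread, so the definition is the safer route in code.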

Example 2: The following data gives the heights (in cms.) of 100 plants. Calculate the value of variance.

Height in cms. 60 61 62 63 64 65 66 67 68
No. of Plants 2 0 15 29 25 12 10 4 3

Solution: To calculate the variance the following table is formed:

x f fx fx²
60 2 120 7200
61 0 0 0
62 15 930 57660
63 29 1827 115101
64 25 1600 102400
65 12 780 50700
66 10 660 43560
67 4 268 17956
68 3 204 13872
Total 100 6389 408449

Here $\bar{x} = \frac{6389}{100} = 63.89$ cm

and $\sigma^2 = \frac{408449}{100} - 63.89^2 \approx 2.56$ cm².
Example 3: A study of 100 engineering companies gives the following information:

Profit (Rs. In crore) 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60


No. of Companies 8 12 20 30 20 10

Calculate the standard deviation of the profit earned.

Solution: To calculate the value of S.D. the following table is formed:

Profit (Rs. in crore) Mid-value (x) f fx fx²
0 – 10 5 8 40 200
10 – 20 15 12 180 2700
20 – 30 25 20 500 12500
30 – 40 35 30 1050 36750
40 – 50 45 20 900 40500
50 – 60 55 10 550 30250
Total 100 3220 122900

Here $\bar{x} = \frac{3220}{100} = 32.2$ (Rs. in crore),

$$\sigma^2 = \frac{122900}{100} - 32.2^2 = 192.16 \ (\text{Rs. in crore})^2$$

and $\sigma = \sqrt{192.16} \approx 13.86$ (Rs. in crore).
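The table-based calculation in Example 3 can be sketched in Python with the working formula for frequency data. The function name `frequency_variance` and the (mid-value, frequency) layout are assumptions made for this illustration.

```python
import math

def frequency_variance(pairs):
    """Return (mean, variance) of a frequency distribution.

    pairs: list of (x, f) tuples, where x is a value or class mid-value
    and f its frequency. Uses sigma^2 = (1/N) * sum(f * x^2) - mean^2.
    """
    n_total = sum(f for _, f in pairs)                      # N
    mean = sum(f * x for x, f in pairs) / n_total           # sum(f x) / N
    mean_sq = sum(f * x * x for x, f in pairs) / n_total    # sum(f x^2) / N
    return mean, mean_sq - mean ** 2

# Mid-values and frequencies of the profit classes in Example 3
profits = [(5, 8), (15, 12), (25, 20), (35, 30), (45, 20), (55, 10)]

mean_profit, var_profit = frequency_variance(profits)
sd_profit = math.sqrt(var_profit)

print(round(mean_profit, 2), round(var_profit, 2), round(sd_profit, 2))
# 32.2 192.16 13.86
```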

Example 4: Calculate the A.M., S.D. and C.V. from the following data.

Age (year) 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70 70 – 80 80 – 90
No. of Persons 3 61 132 153 140 51 2

Solution:

Age Mid-value (x) f fx fx²
20 – 30 25 3 75 1875
30 – 40 35 61 2135 74725
40 – 50 45 132 5940 267300
50 – 60 55 153 8415 462825
60 – 70 65 140 9100 591500
70 – 80 75 51 3825 286875
80 – 90 85 2 170 14450
Total 542 29660 1699550
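Here $\bar{x} = \frac{29660}{542} \approx 54.72$ years,

$$\sigma^2 = \frac{1699550}{542} - \left(\frac{29660}{542}\right)^2 \approx 141.07 \text{ years}^2$$

$$\sigma = \sqrt{141.07} \approx 11.88 \text{ years}$$

and
$$\text{C.V.} = \frac{11.88}{54.72} \times 100\% \approx 21.71\%$$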

Example 5: The number of employees, wages per employee and the variance of the wages of employees
for two factories are given below:
No. of employees Factory A Factory B
Avg. wages per employee per month 1200 850
Variance of the wages (Rs.) 81 256
In which factory is there greater variation in the distribution of wages per employee?

Solution: The S.D. is the square root of the variance, so

C.V. for factory A: $\text{C.V.}_A = \frac{\sqrt{81}}{1200} \times 100\% = 0.75\%$

C.V. for factory B: $\text{C.V.}_B = \frac{\sqrt{256}}{850} \times 100\% \approx 1.88\%$

Since the coefficient of variation is greater for factory B (C.V.B > C.V.A), there is greater variation in the
distribution of wages per employee in factory B.

Example 6: Two football teams A and B scored goals as shown below. Determine with the
help of C.V. the team that is more consistent.
No. of goals No. of games
A B
0 27 17
1 9 9
2 8 6
3 5 5
4 4 3
Solution:

No. of goals (x)    Team A: f1   f1x   f1x²    Team B: f2   f2x   f2x²
0 27 0 0 17 0 0
1 9 9 9 9 9 9
2 8 16 32 6 12 24
3 5 15 45 5 15 45
4 4 16 64 3 12 48
Total 53 56 150 40 48 126
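Here,
$$\bar{x}_A = \frac{56}{53} \approx 1.057$$

$$\sigma_A = \sqrt{\frac{150}{53} - 1.057^2} \approx 1.309$$

$$\text{C.V.}_A = \frac{1.309}{1.057} \times 100\% \approx 123.84\%$$

and
$$\bar{x}_B = \frac{48}{40} = 1.2$$

$$\sigma_B = \sqrt{\frac{126}{40} - 1.2^2} \approx 1.308$$

$$\text{C.V.}_B = \frac{1.308}{1.2} \times 100\% = 109\%$$

Since C.V.B < C.V.A, team B is more consistent than team A.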
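The comparison in Example 6 can be sketched in Python as below. The function name `coefficient_of_variation` and the {goals: games} layout are assumptions for this illustration. Because the code carries unrounded intermediate values, its C.V. for team A (about 123.9%) differs slightly in the second decimal from the hand calculation, which uses rounded values; the conclusion is unchanged.

```python
import math

def coefficient_of_variation(freq):
    """C.V. (in percent) of a frequency distribution given as {value: frequency}."""
    n_total = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / n_total
    variance = sum(f * x * x for x, f in freq.items()) / n_total - mean ** 2
    return math.sqrt(variance) / mean * 100

# Number of goals -> number of games, for each team (Example 6)
team_a = {0: 27, 1: 9, 2: 8, 3: 5, 4: 4}
team_b = {0: 17, 1: 9, 2: 6, 3: 5, 4: 3}

cv_a = coefficient_of_variation(team_a)
cv_b = coefficient_of_variation(team_b)

print(round(cv_a, 1), round(cv_b, 1))          # 123.9 109.0
print("B" if cv_b < cv_a else "A", "is more consistent")
```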

**************************************************************************

GLOSSARY:
• Arithmetic Mean: Arithmetic mean of a set of observations is their sum divided by the
number of observations
• Median: Median may be defined as the middle value in the data set when its elements
are arranged in a sequential order
• Mode: The mode is that value of an observation which has the highest frequency.
• Quartiles: The values of observations in a data set, when arranged in an ordered
sequence, can be divided into four equal parts.
• Deciles: The values of observations in a data set, when arranged in an ordered sequence,
can be divided into ten equal parts.
• Percentiles: The values of observations in a data set, when arranged in an ordered
sequence, can be divided into hundred equal parts
• Dispersion: Dispersion is a measure of the degree to which numerical data tends to
spread about an average value.
• Absolute measures: Absolute measures of dispersion are those which measure the
variability of data in the original units of measurement.
• Relative measures: Relative measures of dispersion are those which measure the
variability in relation to an appropriate average. They are free from units of
measurement.
• Range: The range is defined to be the difference between the largest and lowest
observed values in a data set.
• Standard deviation: The square root of the arithmetic mean of the squares of the
deviations of the values of a variable from its arithmetic mean is called the standard
deviation of that variable.
