
Central Tendency and Dispersion Measures

The document discusses various measures of central tendency and dispersion. It defines the mean, median and mode as the three main measures of central tendency. For measures of dispersion, it discusses the range, interquartile range, mean absolute deviation, variance and standard deviation. It provides formulas and properties for calculating each of these measures for both ungrouped and grouped data. The mean, median and mode each have their own merits and demerits for describing the central or typical value in a data set. Measures of dispersion describe how spread out the values are around the central measure.

Uploaded by

mussaiyib
Copyright © All Rights Reserved

UNIT – 2

MEASURES OF CENTRAL TENDENCY:


The central tendency of a variable is a typical value around which the other values tend to concentrate; hence
a value representing the central tendency of a series is called a measure of central tendency, or an average.
According to Clark, “Average is an attempt to find one single figure to describe whole of figures.”
The term central tendency refers to the "middle" value or perhaps a typical value of the data, and is measured
using the mean, median, or mode. Each of these measures is calculated differently, and the one that is best
to use depends upon the situation. The commonly used averages are as follows:
• Arithmetic mean
• Median
• Mode
Properties of Best Average
It should be
• Rigidly defined.
• Based on all observations of the series.
• Easy to calculate and simple to understand.
• Capable of further algebraic treatment.
• Free from the extreme values (i.e., it should not be affected by extreme values).

Arithmetic Mean or Mean (X̅) : The most popular and widely used measure of representing the entire data
by one value is known as arithmetic mean. Its value is obtained by adding together all the items and by dividing
this total by the number of items.

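Calculation of Arithmetic Mean :

Ungrouped Data
• Direct Method: $\bar{X} = \frac{\sum X}{N}$
• Shortcut Method: $\bar{X} = A + \frac{\sum d}{N}$, where A → Assumed Mean and $d = X - A$
• Step Deviation Method: Not Applicable

Grouped Data
• Direct Method: $\bar{X} = \frac{\sum fX}{N}$
• Shortcut Method: $\bar{X} = A + \frac{\sum fd}{N}$, where A → Assumed Mean and $d = X - A$
• Step Deviation Method: $\bar{X} = A + \frac{\sum fd}{N} \times h$, where $d = \frac{X - A}{h}$ and h → Class Size (width of class interval)

For grouped data, X is the class midpoint and $N = \sum f$.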
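As a quick check of the formulas above, the following Python sketch computes the mean of a hypothetical frequency table by the direct, shortcut and step deviation methods. The function names, the class midpoints, the frequencies and the assumed mean A = 25 are illustrative assumptions, not part of the notes.

```python
def mean_direct(x, f):
    """Direct method: sum(f * X) / N, where N = sum(f)."""
    n = sum(f)
    return sum(fi * xi for xi, fi in zip(x, f)) / n


def mean_shortcut(x, f, a):
    """Shortcut method: A + sum(f * d) / N with d = X - A."""
    n = sum(f)
    return a + sum(fi * (xi - a) for xi, fi in zip(x, f)) / n


def mean_step_deviation(x, f, a, h):
    """Step deviation method: A + (sum(f * d) / N) * h with d = (X - A) / h."""
    n = sum(f)
    return a + sum(fi * (xi - a) / h for xi, fi in zip(x, f)) / n * h


# Hypothetical classes 0-10, 10-20, ..., 40-50, represented by their midpoints.
midpoints = [5, 15, 25, 35, 45]
freqs = [5, 8, 12, 10, 5]

print(mean_direct(midpoints, freqs))
print(mean_shortcut(midpoints, freqs, a=25))
print(mean_step_deviation(midpoints, freqs, a=25, h=10))
```

All three calls should give the same value (25.5 for this table): the assumed-mean methods only simplify the arithmetic. Passing a frequency of 1 for every item reduces the grouped formulas to the ungrouped ones.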

Merits of Mean
1. It is rigidly defined.
2. It is easy to understand and easy to calculate.
3. If the number of items is sufficiently large, it is more accurate and more reliable.
4. It is a calculated value and is not based on its position in the series.
5. It is possible to calculate even if some of the details of the data are lacking.
6. Of all averages, it is affected least by fluctuations of sampling.
7. It provides a good basis for comparison.
Demerits of Mean

1. It cannot be obtained by inspection nor located through a frequency graph.


2. It cannot be used in the study of qualitative phenomena that are not capable of numerical measurement, e.g., intelligence,
beauty, honesty.
3. No single item can be ignored without risking the accuracy of the mean.
4. It is affected very much by extreme values.
5. It cannot be calculated for open-end classes.
6. It may lead to fallacious conclusions if the details of the data from which it is computed are not given.

Median (M) : Median is that value of the variable which divides the group into two equal parts, one part
comprising all values greater than the median and the other all values less than the median. Median is also called the second
quartile (Q2).

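Calculation of Median :

First arrange the data in increasing order.

Ungrouped Data
• If N is odd: Median = $\left(\frac{N+1}{2}\right)^{\text{th}}$ term
• If N is even: Median = $\frac{\left(\frac{N}{2}\right)^{\text{th}}\ \text{term} + \left(\frac{N}{2}+1\right)^{\text{th}}\ \text{term}}{2}$

Grouped Data
• Median = $l + \frac{N/2 - cf}{f} \times h$
where l → lower limit of the median class
cf → cumulative frequency of the class before the median class
f → frequency of the median class
h → class size
N → no. of observations (total frequency)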
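The sketch below applies the median formulas in Python: the odd/even rule for raw data and the interpolation formula for grouped data with equal class widths. The function names and the example table are hypothetical. The median class is taken to be the first class whose cumulative frequency reaches N/2.

```python
def median_ungrouped(values):
    """Median of raw data: the ((N + 1)/2)th term if N is odd,
    else the average of the (N/2)th and (N/2 + 1)th terms."""
    xs = sorted(values)                      # the data must be arranged in order first
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]          # convert the 1-based position to a 0-based index
    return (xs[n // 2 - 1] + xs[n // 2]) / 2


def median_grouped(lower_limits, h, freqs):
    """Interpolated median l + ((N/2 - cf) / f) * h for classes of equal width h."""
    n = sum(freqs)
    half = n / 2
    cf = 0                                   # cumulative frequency before the current class
    for l, f in zip(lower_limits, freqs):
        if cf + f >= half:                   # first class whose cumulative frequency reaches N/2
            return l + (half - cf) / f * h
        cf += f
    raise ValueError("no classes given")


print(median_ungrouped([7, 1, 3]), median_ungrouped([4, 1, 3, 2]))
# Hypothetical classes 0-10, 10-20, ..., 40-50.
print(median_grouped([0, 10, 20, 30, 40], 10, [5, 8, 12, 10, 5]))
```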

Merits of Median

1. Median is not influenced by extreme values because it is a positional average.


2. Median can be calculated in case of distribution with open-end intervals.
3. Median can be located even if the data are incomplete.
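To illustrate the first merit, this sketch uses Python's standard `statistics` module on a small hypothetical data set and then adds one extreme value.

```python
import statistics

incomes = [30, 32, 35, 36, 40]          # hypothetical incomes, in thousands
with_outlier = incomes + [400]          # the same data with one extreme value added

# The mean moves a long way; the median (a positional average) barely changes.
print(statistics.mean(incomes), statistics.median(incomes))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

With these numbers the mean jumps from 34.6 to 95.5 while the median only moves from 35 to 35.5.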

Demerits of Median

1. A slight change in the series may bring drastic change in median value.
2. In case of an even number of items or a continuous series, the median is an estimated value that may not be equal to any
value in the series.
3. It is not suitable for further mathematical treatment except its use in calculating mean deviation.
4. It does not take into account all the observations.

Mode (Z) : Mode is the value that appears most frequently in a series, i.e., it is the value of the item around
which the frequencies are most densely concentrated.

Calculation of Mode :

Ungrouped Data: Mode = value that has the maximum frequency


Properties of Mode
1. It is used when you want to find the value which occurs most often.
2. It is a quick approximation of the average.
3. It is an inspection average.
4. It is the most unreliable of the three measures of central tendency because it is undefined for some sets
of observations (for example, when no value repeats) and may not be unique.
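A minimal Python sketch of the ungrouped rule, which also shows why the mode can be undefined or not unique. Treating data in which no value repeats as having no mode is a convention assumed here; `modes` is a hypothetical helper name.

```python
from collections import Counter


def modes(values):
    """Return every value that has the maximum frequency.

    An empty list means no value repeats (no mode); more than one
    element means the data are multimodal."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []                        # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([2, 3, 3, 5, 7]))   # one mode
print(modes([1, 1, 2, 2, 3]))   # two modes
print(modes([1, 2, 3]))         # no mode
```

Python 3.8+ also provides `statistics.multimode`, which, unlike this sketch, returns every value when none repeats.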

Quartiles

The quartiles divide the distribution into four equal parts. There are three quartiles. The first (lower) quartile (Q1)
marks off the first one-fourth of the distribution and the third (upper) quartile (Q3) marks off the first three-fourths.
The second quartile (Q2) divides the distribution into two halves and is therefore the same as the median, i.e., the
50th percentile.

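Calculation of Quartile :

Ungrouped Data
First arrange the given data in increasing order. Then
$Q_k = \left(\frac{k(N+1)}{4}\right)^{\text{th}}$ term, k = 1, 2, 3

Grouped Data
$Q_k = l + \frac{kN/4 - m}{f} \times c$, k = 1, 2, 3
where l = lower limit of the quartile class (the first class whose cumulative frequency is at least kN/4)
f = frequency of the quartile class
c = width of the quartile class
m = c.f. preceding the quartile class
N = total frequency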
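The following Python sketch implements both quartile formulas above. It assumes linear interpolation when the position k(N + 1)/4 falls between two items, and classes of equal width c. The helper names and the data are hypothetical.

```python
import math


def quartile_ungrouped(data, k):
    """k-th quartile (k = 1, 2, 3) of raw data at position k(N + 1)/4.

    Fractional positions are interpolated linearly between the two
    neighbouring items; positions outside 1..N are clamped."""
    xs = sorted(data)                        # arrange in increasing order
    n = len(xs)
    if n == 0:
        raise ValueError("quartile of empty data")
    pos = min(max(k * (n + 1) / 4, 1), n)    # 1-based position of Q_k
    lo = math.floor(pos)
    frac = pos - lo
    if frac == 0:
        return xs[lo - 1]
    return xs[lo - 1] + frac * (xs[lo] - xs[lo - 1])


def quartile_grouped(lower_limits, c, freqs, k):
    """Q_k = l + ((kN/4 - m) / f) * c for classes of equal width c."""
    n = sum(freqs)
    target = k * n / 4
    m = 0                                    # c.f. preceding the current class
    for l, f in zip(lower_limits, freqs):
        if m + f >= target:                  # this is the quartile class
            return l + (target - m) / f * c
        m += f
    raise ValueError("no classes given")


data = [3, 7, 8, 5, 12, 14, 21, 13, 18]      # hypothetical raw data
print([quartile_ungrouped(data, k) for k in (1, 2, 3)])

lower = [0, 10, 20, 30, 40]                  # hypothetical classes 0-10, ..., 40-50
freqs = [5, 8, 12, 10, 5]
print([quartile_grouped(lower, 10, freqs, k) for k in (1, 2, 3)])
```

For the raw data the quartiles come out as 6.0, 12 and 16.0; these agree with `statistics.quantiles(data, n=4)`, whose default `'exclusive'` method uses the same (N + 1) positions.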

Measures of Dispersion

Dispersion is a measure of the extent to which the individual items vary from a central value. Dispersion is used
in two senses: (i) the difference between the extreme items of the series and (ii) the average of the deviations of
the items from the mean.
Absolute Measure : The figure showing the limit or magnitude of dispersion is known as an absolute measure.
It is expressed in the same units as the original data, e.g., measures of dispersion in the age, height or weight
of students.
Relative Measure : For comparative studies, the absolute measure is divided by the corresponding
mean or some other characteristic value to obtain a ratio or percentage, which is known as a relative measure.
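One common relative measure divides the standard deviation by the mean and expresses the result as a percentage (usually called the coefficient of variation). The Python sketch below assumes the population standard deviation and uses hypothetical heights and weights that have the same absolute spread; `coefficient_of_variation` is an illustrative name.

```python
import statistics


def coefficient_of_variation(values):
    """Relative dispersion: population standard deviation divided by the mean, in per cent."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero; the ratio is undefined")
    return statistics.pstdev(values) / mean * 100


heights_cm = [150, 155, 160, 165, 170]   # hypothetical data
weights_kg = [45, 50, 55, 60, 65]
print(coefficient_of_variation(heights_cm), coefficient_of_variation(weights_kg))
```

Both series have a standard deviation of about 7.07 in their own units, but relative to their means the weights (about 12.9%) vary much more than the heights (about 4.4%).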

Characteristics of an Ideal Measure of Dispersion

1. It should be rigidly defined.


2. It should be easy to understand and easy to calculate.
3. It should be based on all the observations of the data.
4. It should be easily subjected to further mathematical treatment.
5. It should be least affected by sampling fluctuations.
6. It should not be unduly affected by the extreme values.

Common Measures of Dispersion

– The range
– The Interquartile range (IQR)
– Mean Absolute Deviation (MAD)
– Variance
– Standard deviation

Range
The range is defined as the difference between the largest score (XL) and the smallest score (XS) in the set of
data: Range = XL - XS
• It is sensitive to extreme scores.
• This can be compensated for by calculating the interquartile range (the distance between the 25th and 75th
percentile points), which represents the range of scores for the middle half of a distribution.
It is usually used in combination with other measures of dispersion.

Advantages
• It is very easy to calculate.
• It is easy to understand.

Disadvantages
• It is affected by the extreme values.
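A short Python sketch of the range, with a hypothetical set of scores, shows how a single extreme value changes it. The name `value_range` is chosen only to avoid shadowing Python's built-in `range`.

```python
def value_range(values):
    """Range = largest value (XL) minus smallest value (XS)."""
    if not values:
        raise ValueError("range of empty data")
    return max(values) - min(values)


scores = [12, 15, 17, 18, 20]          # hypothetical scores
print(value_range(scores))             # 8
print(value_range(scores + [60]))      # 48: one extreme score dominates the range
```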

Inter Quartile Range

The interquartile range is a better measure of variation in a distribution than the range.
Here only the middle 50 per cent of the distribution is used, leaving out 25 per cent of the distribution at each
end. In other words, the interquartile range is the difference between the third quartile and the first
quartile.

Interquartile range = upper quartile - lower quartile = Q3 - Q1
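A Python sketch of the interquartile range, assuming quartiles at positions k(N + 1)/4, which is what `statistics.quantiles` computes with its default `'exclusive'` method. The data are hypothetical.

```python
import statistics


def interquartile_range(values):
    """IQR = Q3 - Q1, using statistics.quantiles with its 'exclusive'
    method (quartile positions k(N + 1)/4 in the sorted data)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1


data = [3, 7, 8, 5, 12, 14, 21, 13, 18]    # hypothetical data
print(interquartile_range(data))
print(interquartile_range(data + [500]))   # an extreme value barely moves the IQR
```

Adding the extreme value 500 raises the IQR only from 10.0 to 12.25, whereas the range grows from 18 to 497.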

Quartile Deviation (QD) or Semi Inter Quartile Range


It is a measure based on the quartiles: half the difference between the third quartile and the
first quartile. Mathematically, it can be expressed as:
QD = (Q3 – Q1)/2
Advantages
• It is also called the semi-inter-quartile range.
• It is easy to compute.
• It is not affected by the extreme values.
• It is used when the extreme values are thought to be unrepresentative.

Disadvantages
• It is not suited for further algebraic treatment.

NOTE: If the QDs of two samples are known, we cannot calculate the QD for the combined sample. That is,
composite QD cannot be evaluated.

Coefficient of Quartile Deviation


The coefficient of quartile deviation (CQD) is a relative measure of dispersion based on the quartile deviation.
It is defined as,

CQD = (Q3 – Q1)/(Q3 + Q1)


Being based on quartiles, it is a positional measure and is preferred when the distribution is badly skewed.
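Given Q1 and Q3, the quartile deviation and its coefficient follow directly. The Python sketch below uses hypothetical quartiles Q1 = 6 and Q3 = 16; the function names are illustrative.

```python
def quartile_deviation(q1, q3):
    """Semi-interquartile range: (Q3 - Q1) / 2."""
    return (q3 - q1) / 2


def coefficient_of_quartile_deviation(q1, q3):
    """Relative measure: (Q3 - Q1) / (Q3 + Q1)."""
    if q1 + q3 == 0:
        raise ValueError("Q1 + Q3 is zero; the coefficient is undefined")
    return (q3 - q1) / (q3 + q1)


print(quartile_deviation(6, 16))                  # 5.0
print(coefficient_of_quartile_deviation(6, 16))   # 10 / 22, about 0.455
```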

Mean Deviation
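Mean deviation (MD) is the mean of the absolute differences between each item in the distribution and an average
(i.e., the mean, median or mode). All the deviations are treated as positive (i.e., only the magnitudes of the
deviations are considered).
Calculation of Mean Deviation

Ungrouped Data: $MD = \frac{\sum |X - A|}{N}$
Grouped Data: $MD = \frac{\sum f\,|X - A|}{N}$, where $N = \sum f$ and X is the class midpoint
where A is the average (mean, median or mode) from which the deviations are taken.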
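A Python sketch of both mean deviation formulas. The function names and example data are hypothetical; the ungrouped version takes deviations from the mean, median or mode via the standard `statistics` module.

```python
import statistics


def mean_deviation(values, about="mean"):
    """Mean of the absolute deviations |X - A|, A being the mean, median or mode."""
    centres = {"mean": statistics.mean,
               "median": statistics.median,
               "mode": statistics.mode}
    a = centres[about](values)
    return sum(abs(x - a) for x in values) / len(values)


def mean_deviation_grouped(midpoints, freqs, a):
    """Grouped form: sum(f * |X - A|) / N with N = sum(f), X = class midpoint."""
    n = sum(freqs)
    return sum(f * abs(x - a) for x, f in zip(midpoints, freqs)) / n


data = [2, 4, 6, 8, 30]                          # hypothetical data
print(mean_deviation(data), mean_deviation(data, about="median"))
print(mean_deviation_grouped([5, 15, 25, 35, 45], [5, 8, 12, 10, 5], a=25.5))
```

For this data the mean deviation is 8.0 about the mean (10) but 6.4 about the median (6), which is in line with the remark under Limitations that mean deviation gives its best results when taken from the median.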

Merits of Mean Deviation


1. A major advantage of mean deviation is that it is simple to understand and easy to calculate.
2. It takes into consideration each and every item in the distribution. As a result, a change in the value of any
item will have its effect on the magnitude of mean deviation.
3. The values of extreme items have less effect on the value of the mean deviation.
4. As deviations are taken from a central value, it is possible to have meaningful comparisons of the formation
of different distributions.

Limitations of Mean Deviation


1. It is not capable of further algebraic treatment.
2. At times it may fail to give accurate results. The mean deviation gives best results when deviations are taken
from the median instead of from the mean. But in a series, which has wide variations in the items, median is
not a satisfactory measure.
3. Strictly on mathematical considerations, the method is wrong as it ignores the algebraic signs when the
deviations are taken from the mean.

Variance

The variance is the average of the squared deviations about the arithmetic mean for a set of numbers.

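Calculation of Variance

Ungrouped Data
• Population Variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
• Population Variance (Computational Formula): $\sigma^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}$
• Sample Variance: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
• Sample Variance (Computational Formula): $s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}$

Grouped Data
• Population Variance: $\sigma^2 = \frac{\sum f(M - \mu)^2}{N}$
• Population Variance (Computational Formula): $\sigma^2 = \frac{\sum fM^2 - \frac{(\sum fM)^2}{N}}{N}$
• Sample Variance: $s^2 = \frac{\sum f(M - \bar{x})^2}{n - 1}$
• Sample Variance (Computational Formula): $s^2 = \frac{\sum fM^2 - \frac{(\sum fM)^2}{n}}{n - 1}$
where:
µ = population mean
f = frequency
M = class midpoint
N = ∑f for a population; n = ∑f, or total of the frequencies of the sample
x̅ = grouped mean for the sample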
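The Python sketch below implements the definitional and computational variance formulas, including the grouped form with class midpoints M, and compares them with `statistics.pvariance` and `statistics.variance`. The function names and data are hypothetical.

```python
import statistics


def variance(values, sample=False):
    """Definitional formula: sum((X - mean)^2) / N, or / (n - 1) for a sample."""
    n = len(values)
    mu = sum(values) / n
    ss = sum((x - mu) ** 2 for x in values)
    return ss / (n - 1 if sample else n)


def variance_computational(values, sample=False):
    """Computational formula: [sum(X^2) - (sum X)^2 / n] / n, or / (n - 1)."""
    n = len(values)
    ss = sum(x * x for x in values) - sum(values) ** 2 / n
    return ss / (n - 1 if sample else n)


def variance_grouped(midpoints, freqs, sample=False):
    """Grouped computational formula with class midpoints M and n = sum(f)."""
    n = sum(freqs)
    sfm = sum(f * m for m, f in zip(midpoints, freqs))
    sfm2 = sum(f * m * m for m, f in zip(midpoints, freqs))
    ss = sfm2 - sfm ** 2 / n
    return ss / (n - 1 if sample else n)


data = [2, 4, 4, 4, 5, 5, 7, 9]                                # hypothetical data
print(variance(data), statistics.pvariance(data))             # population
print(variance(data, sample=True), statistics.variance(data))  # sample
print(variance_grouped([5, 15, 25, 35, 45], [5, 8, 12, 10, 5]))
```

Both formulas agree algebraically, but in floating point the computational form can lose precision when the values are large and their spread is small, because it subtracts two nearly equal sums; the definitional form is the safer choice in code.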

Standard Deviation (SD)


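It is the most important absolute measure of dispersion. The SD reveals how scattered the elements of
the given distribution are. It is also called the ‘root mean square deviation’ from the mean. It is defined as
$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$

Calculation of Standard Deviation

Ungrouped Data (Computational Formula): $\sigma = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}}$

Grouped Data (Computational Formula): $\sigma = \sqrt{\frac{\sum fx^2 - \frac{(\sum fx)^2}{N}}{N}}$, where x is the class midpoint and $N = \sum f$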
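The following Python sketch evaluates the grouped computational formula and checks it against the root mean square deviation from the mean. It also lets the reference point A vary, to show that the root mean square deviation is smallest when A is the arithmetic mean. The names and the frequency table are hypothetical.

```python
import math


def rms_deviation_about(x, f, a):
    """Root mean square deviation of a frequency table about an arbitrary point A."""
    n = sum(f)
    return math.sqrt(sum(fi * (xi - a) ** 2 for xi, fi in zip(x, f)) / n)


def sd_definitional(x, f):
    """SD as the root mean square deviation from the arithmetic mean."""
    n = sum(f)
    mean = sum(fi * xi for xi, fi in zip(x, f)) / n
    return rms_deviation_about(x, f, mean)


def sd_grouped(x, f):
    """Computational formula: sqrt([sum(f x^2) - (sum(f x))^2 / N] / N), N = sum(f)."""
    n = sum(f)
    sfx = sum(fi * xi for xi, fi in zip(x, f))
    sfx2 = sum(fi * xi * xi for xi, fi in zip(x, f))
    return math.sqrt((sfx2 - sfx ** 2 / n) / n)


midpoints = [5, 15, 25, 35, 45]     # hypothetical classes 0-10, ..., 40-50
freqs = [5, 8, 12, 10, 5]
print(sd_grouped(midpoints, freqs), sd_definitional(midpoints, freqs))
print(rms_deviation_about(midpoints, freqs, 20))   # larger: A = 20 is not the mean
```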

Advantages
1. It is the most widely used measure of dispersion.
2. It possesses all the qualities needed for a good measure of dispersion.
3. It is less affected by sampling fluctuations than most other measures of dispersion.
4. It is a highly reliable measure of dispersion.
5. It includes all the data values and, unlike the mean deviation, does not ignore the algebraic signs of the deviations (it squares them instead).
6. It is mathematically logical and can be treated further algebraically.
7. It is a minimum when it is calculated from the arithmetic mean.
8. It has great practical utility in sampling, statistical inference, and fitting a normal curve.

Disadvantages
1. It is difficult to understand.
2. Its evaluation process is complicated.
3. It is not a helpful measure for comparing the variability of two or more distributions given in different units.

SKEWNESS
A distribution of data in which the right half is a mirror image of the left half is said to be symmetrical.
Lack of symmetry is called skewness. If a distribution is not symmetrical, it is called a skewed distribution:
the mean, median and mode take different values and one tail becomes longer than the other. The skewness may
be positive or negative.
Symmetrical Distribution
For a symmetrical distribution, the values at equal distances on either side of the mode have equal
frequencies. Thus, the mode, median and mean all coincide.

Positively skewed distribution:


If the frequency curve has a longer tail to the right, the distribution is known as a positively skewed distribution and
Mean > Median > Mode.

Negatively skewed distribution:


If the frequency curve has a longer tail to the left, the distribution is known as a negatively skewed distribution and
Mean < Median < Mode.

Coefficient of Skewness
Statistician Karl Pearson is credited with developing at least two coefficients of skewness that can be used to
determine the degree of skewness in a distribution. Both Pearsonian coefficients of skewness are given below:
the first compares the mean with the mode, and the second compares the mean with the median, each in light of
the magnitude of the standard deviation. Note that if the distribution is symmetrical, the mean, median and mode
are the same value and hence the coefficient of skewness is equal to zero.

Coefficient of Skewness (Sk) = (Mean − Mode)/SD

Coefficient of Skewness (Sk) = 3(Mean − Median)/SD
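A Python sketch computing both Pearsonian coefficients with the standard `statistics` module. It assumes the population standard deviation and Python 3.8+ (where `statistics.mode` returns the first mode on ties); `pearson_skewness` is a hypothetical name and the data are a hypothetical right-skewed sample.

```python
import statistics


def pearson_skewness(values):
    """Return Pearson's two coefficients of skewness:
    (Mean - Mode) / SD and 3 * (Mean - Median) / SD, using the population SD."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    mode = statistics.mode(values)      # first most-common value if there are ties
    sd = statistics.pstdev(values)
    return (mean - mode) / sd, 3 * (mean - median) / sd


data = [1, 2, 2, 2, 3, 3, 4, 5, 8, 10]  # hypothetical right-skewed data
print(statistics.mean(data), statistics.median(data), statistics.mode(data))
print(pearson_skewness(data))
```

Here Mean = 4, Median = 3 and Mode = 2, so Mean > Median > Mode and both coefficients are positive, as expected for a positively skewed distribution.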

Kurtosis
Kurtosis describes the amount of peakedness of a distribution. Distributions that are high and thin are referred
to as leptokurtic distributions. Distributions that are flat and spread out are referred to as platykurtic
distributions. Between these two types are distributions that are more “normal” in shape, referred to as
mesokurtic distributions.

1. If the curve is more peaked than a normal curve, it is called leptokurtic. In this case the items are more
clustered about the mode.
2. If the curve is more flat-topped than the normal curve, it is called platykurtic.

3. The normal curve itself is known as mesokurtic.
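The notes describe kurtosis only qualitatively. A common numerical measure, assumed here rather than taken from the notes, is the moment coefficient β₂ = m₄/m₂², where m_k is the k-th central moment; β₂ = 3 for a normal (mesokurtic) curve, values above 3 suggest a leptokurtic shape and values below 3 a platykurtic one. The Python sketch, its function name and the data are hypothetical.

```python
def moment_kurtosis(values):
    """Moment coefficient of kurtosis beta_2 = m4 / m2**2, where m_k is the
    k-th central moment. Compare with 3, the value for a normal curve."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2


print(moment_kurtosis([1, 2, 3, 4, 5]))          # flat, evenly spread data
print(moment_kurtosis([0] * 8 + [-10, 10]))      # values piled at the centre, long tails
```

The evenly spread data give β₂ = 1.7 (platykurtic), while the data piled at the centre with two long tails give 5.0 (leptokurtic).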
