Measure of Central Tendency and
Dispersion
Week 2
Reference Books:
Statistics for Management, 7th Edition, by Levin & Rubin, Pearson Education Ltd.
Statistics
• Every organization is collecting and using data to develop knowledge and intelligence that will help
people make informed decisions and to track the implementation of their decisions.
• A good working knowledge of statistics is useful for summarizing and organizing data to provide
information that is useful and supportive of decision making.
• Statistics is used to make valid comparisons and to predict the outcomes of decisions.
• Two major categories of statistics are: Descriptive Statistics and Inferential Statistics
Level Of Measurement
• In the previous module we saw that variables can be categorized as Numerical
(Quantitative) or Categorical (Qualitative).
• We now need to know how these variables are measured.
Level Of Measurement
• Levels of measurement, also called scales of measurement, tell you how precisely variables are
recorded. In scientific research, a variable is anything that can take on different values across your
data set (e.g., height or test scores).
• There are 4 levels of measurement:
1. Nominal: the data can only be categorized
2. Ordinal: the data can be categorized and ranked
3. Interval: the data can be categorized, ranked, and evenly spaced
4. Ratio: the data can be categorized, ranked, evenly spaced, and has a natural zero.
Summarizing Categorical Data
• Categorical data is data in which individuals are placed into groups or categories, for
example gender, region, or type of movie.
• When summarizing categorical data, the information is condensed into a few numbers that
tell the basic story of the data.
• Because categorical data involves pieces of data that belong in categories, you have to look at how
many individuals fall into each group and summarize the numbers appropriately.
Counting on the Frequency
• One way to summarize categorical data is to simply count, or tally up, the number of individuals that
fall into each category.
• The number of individuals in any given category is called the frequency (or count) for that category.
If you list all the possible categories along with the frequency for each, you create a frequency
table.
• The total of all the frequencies should equal the size of the sample (because you place each
individual in one category).
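The counting described above can be sketched in a few lines of Python. This is only an illustration: the movie-type responses are made-up data, and the variable names are hypothetical.

```python
from collections import Counter

# Hypothetical sample: favourite type of movie for 8 people (made-up data).
responses = ["comedy", "drama", "action", "comedy",
             "comedy", "drama", "action", "comedy"]

frequency = Counter(responses)   # frequency (count) for each category
n = len(responses)               # sample size

# Percentage of the sample that falls into each category.
percentage = {category: 100 * count / n for category, count in frequency.items()}

for category, count in frequency.items():
    print(f"{category}: frequency={count}, percentage={percentage[category]:.0f}%")

# Each individual is placed in exactly one category,
# so the frequencies must add up to the sample size.
total = sum(frequency.values())
print("total:", total, "sample size:", n)
```

Checking that the total equals the sample size is a quick way to catch an individual who was counted twice or left out.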
Summarizing Categorical Data
Example of summarizing data by a frequency table
Suppose that you take a sample of 10 people and ask them all whether they own a cellphone. Each
person falls into one of two categories: yes or no. The data is shown in the following table.
Below is the summarized data in terms of frequency and percentage
Frequency Distribution for Discrete Variables
A frequency distribution organizes data points into specified ranges, allowing for an easy
understanding of how often each value occurs. This technique is vital for identifying patterns, trends,
and potential outliers, providing deeper insights into the data.
Example: Done in Excel
In 55 basketball matches, the number of points Kruti scored is recorded as shown below
Create a frequency distribution and identify the class interval in which her scores fall most
often. Also find the relative frequency of each class.
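The same kind of exercise can be sketched in Python instead of Excel. The scores below are made up (the 55 recorded values are not reproduced here), and the class width of 10 is an assumption chosen for illustration.

```python
from collections import Counter

# Hypothetical points per match (made-up data, not the values from the slide).
points = [12, 25, 7, 18, 31, 22, 15, 28, 9, 21, 35, 24]
width = 10  # assumed class width

# Map each score to the lower limit of its class, e.g. 25 -> 20 for class 20-29.
counts = Counter((p // width) * width for p in points)

n = len(points)
table = []  # rows of (class label, frequency, relative frequency)
for lower in sorted(counts):
    freq = counts[lower]
    table.append((f"{lower}-{lower + width - 1}", freq, freq / n))

for label, freq, rel in table:
    print(f"{label:>6}  frequency={freq}  relative frequency={rel:.3f}")

# The class interval with the highest frequency.
modal_class = max(table, key=lambda row: row[1])[0]
print("Most frequent class:", modal_class)
```

The relative frequency of a class is its frequency divided by the number of observations, so the relative frequencies add up to 1.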
Cumulative Frequency Distribution
• A variation of the frequency distribution that provides another tabular summary of quantitative
data is the cumulative frequency distribution, which uses the number of classes, class widths, and
class limits developed for the frequency distribution.
• However, rather than showing the frequency of each class, the cumulative frequency
distribution shows the number of data items with values less than or equal to the upper
class limit of each class.
Examples in Excel sheet
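As a minimal sketch of the idea, a cumulative frequency distribution is a running total of the class frequencies. The class limits and frequencies below are made-up values used only for illustration.

```python
from itertools import accumulate

# Hypothetical frequency distribution: upper class limits and class frequencies.
upper_limits = [9, 19, 29, 39]
frequencies = [2, 3, 5, 2]

# Running total: number of data items less than or equal to each upper class limit.
cumulative = list(accumulate(frequencies))

for upper, cum in zip(upper_limits, cumulative):
    print(f"less than or equal to {upper}: {cum}")
```

The last cumulative frequency always equals the total number of observations.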
Summary Statistics
• Summary statistics help in understanding the characteristics of a dataset
• The following are the two types of summary statistics that are widely used by decision makers
1. Central tendency: Central tendency measures represent the central or middle point of a data
distribution
• Another name for the measures of central tendency is the measures of location
2. Dispersion: Dispersion characterizes the spread of the data in a distribution
• Dispersion shows the extent to which the observations are scattered, usually around the central
point
Measure of Central Tendency
• A measure of central tendency identifies the center point of a data distribution.
• The central tendency measures are easy to compute and apply.
• They are the most widely used statistical measures.
• The measures of central tendency are
• Mean or average of a data set
• Median or the midpoint of a data array
• Mode or the most frequent observation(s) in a data set
• The choice of which measure of central tendency to compute depends on how the data
are organized.
Measure of Central Tendency
The Mean (Arithmetic) :
• The most common way to compute mean or average is the arithmetic mean.
• The arithmetic mean is computed by dividing the sum of the values in a distribution by the
number of observations: $\bar{x} = \frac{\sum x_i}{n}$
The formula for the mean when frequencies are given:
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$, where $f_i$ is the frequency of the value $x_i$.
Measure of Central Tendency
The Median:
• The median represents the central item in a data distribution.
• It is a single item, the middlemost or most central item in the set of numbers.
• Half of the items lie above the median and half lie below it.
• Unlike the mean, the median is not distorted by extreme values.
The Mode:
• The mode is the most repeated element (value) in the data set.
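The three measures can be computed with Python's standard `statistics` module. This is a small sketch on made-up data; the second part computes the mean from a frequency table, assuming each value is weighted by its frequency.

```python
import statistics

# Hypothetical data set (made-up values).
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)        # sum of the values / number of observations
median = statistics.median(data)    # middle of the sorted data (average of the two middle items here)
modes = statistics.multimode(data)  # list of the most frequent value(s)

print("mean:", mean, "median:", median, "mode(s):", modes)

# Mean when frequencies are given: sum(f * x) / sum(f).
values = [2, 3, 5, 7, 10]
freqs = [1, 2, 1, 1, 1]
mean_from_table = sum(f * x for x, f in zip(values, freqs)) / sum(freqs)
print("mean from frequency table:", mean_from_table)
```

Replacing the 10 with 100 would raise the mean sharply but leave the median unchanged, which is why the median is preferred when extreme values are present.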
Measure of Central Tendency
Comparing Mean , Median and Mode:
• While working on statistical problems, we have to decide which measure of central
tendency is appropriate for the study.
• The decision depends on the nature of the data distribution.
• If the data have a symmetrical distribution with a single peak, then the mean, median, and
mode are the same.
• In that case we need not choose the measure of central tendency, because the choice has
been made for us.
Measure of Central Tendency
Comparing Mean , Median and Mode:
• In a positively skewed frequency distribution, the mean is typically greater than the
median and the median is typically greater than the mode.
• In a negatively skewed frequency distribution, the mean is typically less than the
median and the median is typically less than the mode.
Positively skewed: Mean > Median > Mode
Negatively skewed: Mean < Median < Mode
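A small check of this ordering, using a made-up data set with a long right tail and its mirror image. It is a single illustration, not a proof that the ordering holds for every skewed data set.

```python
import statistics

# Hypothetical positively skewed data: most values are small, a few are large.
right_skewed = [1, 2, 2, 2, 3, 4, 5, 9, 15]
# Mirror image of the same data, which is negatively skewed.
left_skewed = [-x for x in right_skewed]

def summary(data):
    """Return (mean, median, mode) of the data."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)

pos_mean, pos_median, pos_mode = summary(right_skewed)
neg_mean, neg_median, neg_mode = summary(left_skewed)

print("positively skewed:", pos_mean, pos_median, pos_mode)
print("negatively skewed:", neg_mean, neg_median, neg_mode)
```

In the first data set the large values pull the mean above the median; in the mirrored data set the same values pull it below.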
Measure of Dispersion
Why is dispersion important?
• Dispersion gives additional information that
enables one to judge the reliability of the
measure of central tendency.
• When data are widely dispersed, the central
tendency measure is less representative of
the data as a whole.
• The problems peculiar to widely dispersed data
can be tackled only after the dispersion is
recognized.
• Understanding of the dispersion may be used to
compare various samples.
Measure of Dispersion
The dispersion is measured in the following ways
1. Dispersion measured as the difference between two values selected from the data set
• Range, interquartile range
2. Dispersion measured as the average deviation from some measure of central tendency
• Variance, standard deviation
Measure of Dispersion
Range:
• The range is the difference between highest and lowest values in the dataset
• Usually written as [lowest value, highest value]
Range = Highest value – Lowest value
• However, the usefulness of the range is limited
• The range considers only the highest and lowest values of a distribution and ignores the
remaining values in the distribution
• It does not account for the nature of the variation among all the other values
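The limitation can be seen in a short sketch with two made-up data sets that share the same range even though their values are spread very differently.

```python
# Two hypothetical data sets with the same lowest and highest values.
clustered = [10, 50, 50, 50, 90]   # most values sit at the centre
spread_out = [10, 20, 50, 80, 90]  # values are scattered across the interval

def value_range(data):
    """Range = highest value - lowest value."""
    return max(data) - min(data)

range_clustered = value_range(clustered)
range_spread_out = value_range(spread_out)

print("range of clustered data:", range_clustered)
print("range of spread-out data:", range_spread_out)
```

Both ranges are identical because only the two extreme values enter the calculation.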
Measure of Dispersion
Quartile :
• A quartile is a statistical term that describes a division of observations into four defined
intervals based on the values of the data and how they compare to the entire set of
observations.
• The three quartiles are the lower quartile (Q1), the median (Q2), and the upper quartile (Q3).
• When the data points are arranged in increasing order, data are divided into four sections
of 25% of the data each.
Measure of Dispersion
Interquartile Range:
• Quartiles are also used to calculate the interquartile range, which is a measure of
variability around the median.
• The interquartile range is simply the range between the first and third quartiles.
Interquartile range = Q3–Q1
Interpretation:
• If Q1 is farther away from the median (Q2) than Q3 is, then you can say there is greater
dispersion among the smaller values of the dataset than among the larger values. The same
logic applies in reverse if Q3 is farther away from the median than Q1 is. This is called
quartile skewness.
• A convenient way to compute quartiles is to use a spreadsheet and the QUARTILE function. For
example, "=QUARTILE(A1:A53,1)" returns the first (lower) quartile of the values in cells A1 to A53.
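Outside a spreadsheet, quartiles and the interquartile range can be sketched with Python's `statistics.quantiles`. The data are made up. The choice of `method="inclusive"` is an assumption: it interpolates linearly between the lowest and highest data values, and different tools use slightly different quartile conventions, so results may not match every spreadsheet exactly.

```python
import statistics

# Hypothetical data set (made-up values).
data = [1, 3, 5, 7, 9, 11, 13, 15, 17]

# n=4 splits the data into four parts and returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1  # interquartile range = Q3 - Q1

print("Q1:", q1, "Q2 (median):", q2, "Q3:", q3)
print("interquartile range:", iqr)

# Quartile skewness check: compare the distance of Q1 and Q3 from the median.
lower_gap = q2 - q1
upper_gap = q3 - q2
print("Q2 - Q1:", lower_gap, "Q3 - Q2:", upper_gap)
```

Here the two gaps are equal, so this data set shows no quartile skewness.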
Measure of Dispersion
BOXPLOT: A box and whisker plot or diagram (otherwise known as a boxplot), is a graph
summarizing a set of data. The shape of the boxplot shows how the data is distributed and it also
shows any outliers.
• The line splitting the box in two represents the
median value. This shows that 50% of the data
lies on the left hand side of the median value
and 50% lies on the right hand side.
• The left edge of the box represents the lower
quartile: 25% of the data lies at or below this
value. The right edge of the box shows the
upper quartile: 25% of the data lies to the
right of the upper quartile value.
• The horizontal lines (whiskers) stop at the
lowest and highest values of the data,
excluding any outliers.
• The single points on the diagram show the
outliers.
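The numbers behind a boxplot can be sketched as follows. The data are made up. The rule used to flag outliers (values more than 1.5 times the interquartile range beyond the quartiles) is a common convention assumed here for illustration, since the slides do not state which rule the plot uses.

```python
import statistics

# Hypothetical data set with one unusually large value.
data = [2, 4, 5, 6, 7, 8, 9, 10, 30]

# Box: lower quartile, median, upper quartile.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed outlier rule: beyond 1.5 * IQR from the box edges.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

# Whiskers end at the most extreme values that are not outliers.
inside = [x for x in data if low_fence <= x <= high_fence]
whisker_low, whisker_high = min(inside), max(inside)

print("box:", q1, median, q3)
print("whiskers:", whisker_low, whisker_high)
print("outliers:", outliers)
```

With this rule the value 30 is drawn as a single point, and the right whisker stops at 10.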
Measure of Dispersion
Variance:
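• Variance is the mean of the squared differences between each data point in a data set and
the mean of that data set.
• The variance of a population is symbolized by $\sigma^2$ (sigma squared)
• Variance is calculated as $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, where $\mu$ is the population mean and $N$ is the number of observations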
• It is always non-negative since each term in the variance sum is squared and therefore the result is
either positive or zero.
Measure of Dispersion
Standard Deviation:
• The standard deviation is the square root of the variance. It measures how far the values
typically lie from the mean. A lower standard deviation indicates that the values are close to
the mean, whereas a higher standard deviation indicates that the values are spread far from the
mean. The standard deviation can never be negative.
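A short sketch of both measures on made-up data, treating the data as a whole population (dividing by N). The manual calculation is compared with `statistics.pvariance` and `statistics.pstdev` from the standard library.

```python
import math
import statistics

# Hypothetical population (made-up values).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = sum(data) / len(data)  # population mean

# Variance: mean of the squared differences from the mean.
variance = sum((x - mu) ** 2 for x in data) / len(data)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print("mean:", mu, "variance:", variance, "standard deviation:", std_dev)

# The same quantities from the standard library (population versions).
print("pvariance:", statistics.pvariance(data), "pstdev:", statistics.pstdev(data))
```

For a sample rather than a population, `statistics.variance` and `statistics.stdev` divide by n - 1 instead of N.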
Measure of Dispersion
• Bell Shaped Curve (Normal Distribution):
1. About 68 percent of the values in the population will fall within ±1 standard deviation from the
mean.
2. About 95 percent of the values will lie within ±2 standard deviations from the mean.
3. About 99.7 percent of the values will be in an interval ranging from 3 standard deviations below the
mean to 3 standard deviations above the mean.
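These percentages can be checked against the standard normal distribution using `statistics.NormalDist`. This sketch assumes an exactly normal population; real data will only approximate these shares.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of values within k standard deviations of the mean, for k = 1, 2, 3.
shares = {}
for k in (1, 2, 3):
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within ±{k} standard deviations: {100 * shares[k]:.1f}%")
```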
Measure of Dispersion
Coefficient Of Variation
Can you compare the standard deviations of two datasets?
The standard deviation is an absolute measure of dispersion that computes variation in the same
units as the original data
• If the units of two datasets differ, then their standard deviations are not comparable
• Considering only standard deviation translates into ignoring the central tendency of the data
distribution
• The coefficient of variation is a relative measure of dispersion that relates the standard deviation
and the mean
• It expresses the standard deviation as a percentage of the mean, which frees the standard
deviation from the scale of measurement: CV = (standard deviation / mean) × 100%
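A minimal sketch comparing two made-up data sets measured in different units. The helper name `coefficient_of_variation` is hypothetical, and the population standard deviation is assumed.

```python
import statistics

def coefficient_of_variation(data):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * statistics.pstdev(data) / statistics.mean(data)

# Hypothetical measurements in different units.
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"heights: CV = {cv_heights:.1f}%")
print(f"weights: CV = {cv_weights:.1f}%")
```

The standard deviations (in cm and kg) cannot be compared directly, but the coefficients of variation show that the weights vary more relative to their mean than the heights do.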
Measure of Skewness and Kurtosis
Skewness refers to the degree of symmetry, or more precisely, the degree of lack of symmetry.
Distributions, or data sets, are said to be symmetric if they appear the same on both sides of a central
point.
Interpreting the skewness value:
1. Between -0.5 and 0.5: the distribution is almost symmetrical.
2. Between -1 and -0.5: the data is negatively skewed. Between 0.5 and 1: the data is positively skewed. In both cases the
skewness is moderate.
3. Lower than -1 (negatively skewed) or greater than 1 (positively skewed): the data is highly skewed.
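The slides do not give the formula behind the skewness value, so the sketch below assumes the moment coefficient of skewness (third central moment divided by the 1.5 power of the second central moment). Software packages may apply small-sample corrections and return slightly different numbers. The function names and data are hypothetical.

```python
def moment_skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5 (assumed definition)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

def describe_skewness(value):
    """Apply the rule-of-thumb bands to a skewness value."""
    size = abs(value)
    if size <= 0.5:
        return "almost symmetrical"
    direction = "positively" if value > 0 else "negatively"
    if size <= 1:
        return f"moderately {direction} skewed"
    return f"highly {direction} skewed"

# Hypothetical data sets.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 2, 2, 3, 4, 5, 9, 15]

skew_symmetric = moment_skewness(symmetric)
skew_right = moment_skewness(right_tailed)

print(skew_symmetric, describe_skewness(skew_symmetric))
print(round(skew_right, 2), describe_skewness(skew_right))
```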
Measure of Skewness and Kurtosis
• Kurtosis indicates how prone the data are to outliers: it reflects how much weight lies in the tails of the distribution.
• The distribution can be light-tailed with a flatter peak, almost like pressing down on the distribution.
This is called Negative Kurtosis (Platykurtic). If the distribution is heavy-tailed with a sharper peak, like
pulling up the distribution, it is called Positive Kurtosis (Leptokurtic). Negative and positive here are relative to the normal distribution.
• Kurtosis tells us how heavy or light the tails are compared to a normal distribution.
The kurtosis of a normal distribution is 3. This
is the reference value for comparison.
• Leptokurtic (High Kurtosis, >3): Taller peak
and fatter tails (e.g., Laplace distribution)
• Mesokurtic (Normal Kurtosis, ≈3): Similar to
the normal distribution.
• Platykurtic (Low Kurtosis, <3): Flatter peak
and thinner tails (e.g., uniform distribution).
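As a rough sketch, kurtosis can be computed as the fourth central moment divided by the square of the second central moment. This definition is an assumption, since the slides do not state a formula, and some software reports "excess" kurtosis (this value minus 3) instead. The data sets are made up.

```python
def moment_kurtosis(data):
    """Kurtosis as m4 / m2**2 (assumed definition; a normal distribution gives about 3)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2

# Evenly spread values, similar to a uniform distribution: thin tails.
uniform_like = list(range(1, 101))

# Most values at the centre with two extreme values: heavy tails.
heavy_tailed = [0] * 20 + [-10, 10]

kurt_uniform = moment_kurtosis(uniform_like)
kurt_heavy = moment_kurtosis(heavy_tailed)

print(f"uniform-like data: {kurt_uniform:.2f} (platykurtic, below 3)")
print(f"heavy-tailed data: {kurt_heavy:.2f} (leptokurtic, above 3)")
```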