Descriptive Statistics
• Statistics is the branch of mathematics that deals with
collecting, analyzing, summarizing, interpreting and
presenting data. It helps in making informed decisions
based on numerical data and patterns.
There are two main types of statistics:
• Descriptive Statistics – Summarizes and describes data
using measures like mean, median, mode, and standard
deviation.
• Inferential Statistics – Makes predictions or inferences
about a population based on a sample using techniques like
hypothesis testing and regression analysis.
• Data: information, especially facts, collected for
examination.
• Grouped vs. Ungrouped Data in Statistics
Ungrouped Data
• Raw data that has not been organized into groups or
categories.
• It is presented as an individual list of values.
• Example: Test scores of 10 students:
45, 67, 89, 56, 78, 90, 34, 76, 88, 92
Grouped Data
• Data that is organized into classes or intervals to
make analysis easier.
• It is often presented in frequency tables.
• Grouped data is useful for summarizing large datasets
and finding trends.
• Example: Test scores grouped into intervals:
• A frequency distribution table is a way to organize data into
classes or intervals and show how often each class occurs. It
helps in summarizing large datasets for easier interpretation.
• A class interval refers to the range of values in a grouped
frequency distribution table.
• Class interval terms
Lower Class Limit – The smallest value in the class.
Upper Class Limit – The largest value in the class.
Class Width (Class Size) – The difference between the
lower limit of one class and the lower limit of the next
class.
Class Boundaries (real limits) – Adjusted values used to
avoid gaps between intervals.
Midpoint (Class Mark) – The average of the lower and upper
class limits.
Class interval types
Exclusive (continuous) class interval: the upper limit of one
class is the lower limit of the next (e.g. 10-20, 20-30), so a
value equal to the upper limit is excluded from that class and
counted in the next one.
Inclusive (discrete) class interval: the upper limit of one class
is included in that same class (e.g. 10-19, 20-29).
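As a minimal sketch of how the class-interval terms fit together, the Python snippet below groups the ten test scores from the ungrouped-data example into exclusive classes. The class width of 10 and the starting lower limit of 30 are assumptions chosen for illustration; they are not taken from the original frequency table.

```python
# Test scores from the ungrouped-data example.
scores = [45, 67, 89, 56, 78, 90, 34, 76, 88, 92]

# Assumed exclusive classes of width 10: 30-40, 40-50, ..., 90-100.
# The upper limit is excluded from its class, so a score of 90
# is counted in 90-100, not in 80-90.
width = 10
lower_limits = range(30, 100, width)

table = []
for lower in lower_limits:
    upper = lower + width
    frequency = sum(1 for s in scores if lower <= s < upper)
    midpoint = (lower + upper) / 2  # class mark
    table.append((lower, upper, midpoint, frequency))

for lower, upper, midpoint, frequency in table:
    print(f"{lower}-{upper}  midpoint={midpoint}  frequency={frequency}")
```

Every score lands in exactly one class, so the frequencies add up to the number of observations.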
• Graphical Representation of Frequency Distribution
• Graphical methods make it easier to visualize patterns and
trends in a dataset.
• Here are the most common ways to represent a frequency
distribution graphically:
Histogram, Frequency Polygon, Ogive (Cumulative Frequency
Graph)
Measures of central tendency
• Mean for grouped and ungrouped data
(a) Arithmetic mean: It represents the sum of all values in a
dataset divided by the number of values.
- By direct method
- By assumed mean method
- By coding method
(b) Harmonic mean: the reciprocal of the average of the
reciprocals of the data values
(c) Geometric mean: the nth root of the product of the n values
for the set of numbers
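The three means can be compared on a small dataset. The sketch below uses hypothetical values (2, 4, 8) and an arbitrarily chosen assumed mean of 4; it cross-checks the hand-written formulas against Python's standard `statistics` module (Python 3.8 or later is assumed for `geometric_mean` and `math.prod`).

```python
import math
import statistics

values = [2, 4, 8]  # hypothetical data, chosen for illustration only
n = len(values)

# Arithmetic mean, direct method: sum of the values divided by n.
arithmetic = sum(values) / n

# Arithmetic mean, assumed-mean method: A + (sum of deviations from A) / n.
assumed = 4  # any convenient guess A gives the same result
arithmetic_assumed = assumed + sum(v - assumed for v in values) / n

# Harmonic mean: reciprocal of the average of the reciprocals.
harmonic = n / sum(1 / v for v in values)

# Geometric mean: nth root of the product of the n values.
geometric = math.prod(values) ** (1 / n)

# Cross-check against the standard library.
assert math.isclose(arithmetic, statistics.mean(values))
assert math.isclose(arithmetic, arithmetic_assumed)
assert math.isclose(harmonic, statistics.harmonic_mean(values))
assert math.isclose(geometric, statistics.geometric_mean(values))

print(arithmetic, harmonic, geometric)
```

For positive values that are not all equal, the harmonic mean is the smallest of the three and the arithmetic mean the largest.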
• Properties of Arithmetic mean:
The algebraic sum of deviations of values from the
mean is zero
• Combined mean
Example: The average salary of male employees in a
firm was Rs.520 and that of females was Rs.420. The
mean salary of all the employees was Rs.500. Find
the percentage of male and female employees.
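One way to work the combined-mean example: if p is the fraction of male employees, then 520p + 420(1 - p) = 500, so p = (500 - 420)/(520 - 420). The short sketch below carries out that arithmetic; the function name is hypothetical.

```python
def share_of_first_group(mean_first, mean_second, combined_mean):
    """Fraction of the first group implied by the combined mean.

    Solves mean_first * p + mean_second * (1 - p) = combined_mean for p.
    """
    return (combined_mean - mean_second) / (mean_first - mean_second)


male_share = share_of_first_group(520, 420, 500)
female_share = 1 - male_share

print(f"male: {male_share:.0%}, female: {female_share:.0%}")
```

With the figures from the example this gives 80% male and 20% female employees.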
Median
• The median is the value of the middle item (or the mean
of the values of the two middle items) when the data are
arranged in ascending or descending order of magnitude.
• Thus, in an ungrouped frequency distribution if the n values
are arranged in ascending or descending order of magnitude,
the median is the middle value if n is odd. When n is even, the
median is the mean of the two middle values.
• Suppose we have the following series: 15, 19, 21, 7, 10, 33, 25, 18
and 5.
• We first arrange it in either ascending or descending
order. Arranged in ascending order, the figures are:
5, 7, 10, 15, 18, 19, 21, 25, 33
• As the series consists of an odd number of items, the
position of the middle item is given by the formula
(n+1)/2, where n is the number of items.
In this case n is 9, so (n+1)/2 = 5; that is, the value of
the 5th item is the median, which is 18.
• In the case of a grouped series, the median is
calculated with the help of the following formula:
• Median = L + (C/nw)(N/2 - nb)
where: L  - lower boundary (lower real limit) of the median class
       C  - class size of the median class
       N  - total frequency
       nb - cumulative frequency below the median class
       nw - frequency of the median class
• The class corresponding to the cumulative frequency just
greater than N/2 is called the median class.
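The two median rules can be sketched in Python. The ungrouped series is the one from the worked example; the grouped frequency table (classes 0-10 to 40-50) is hypothetical, since the original table is not reproduced here, and the function name is an assumption for illustration.

```python
import statistics

# Ungrouped data: the series from the worked example.
series = [15, 19, 21, 7, 10, 33, 25, 18, 5]
ungrouped_median = statistics.median(series)  # middle value of the sorted series


def grouped_median(lower_boundaries, class_size, frequencies):
    """Median = L + (C / nw) * (N / 2 - nb) for equal-width classes."""
    total = sum(frequencies)  # N
    below = 0                 # nb: cumulative frequency below the current class
    for lower, freq in zip(lower_boundaries, frequencies):
        # Median class: first class whose cumulative frequency exceeds N/2.
        if below + freq > total / 2:
            return lower + (class_size / freq) * (total / 2 - below)
        below += freq
    raise ValueError("frequencies must contain at least one observation")


# Hypothetical grouped data: classes 0-10, 10-20, 20-30, 30-40, 40-50.
lower_boundaries = [0, 10, 20, 30, 40]
frequencies = [5, 8, 12, 10, 5]
median_value = grouped_median(lower_boundaries, 10, frequencies)

print(ungrouped_median, round(median_value, 2))
```

Here N = 40, so N/2 = 20; the cumulative frequencies are 5, 13, 25, which makes 20-30 the median class with nb = 13 and nw = 12.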
Mode
• Mode is the value which occurs most frequently in a
set of observations
• In other words, mode is the value of the variable
which is predominant in the series.
• In the case of discrete frequency distribution mode is
the value of x corresponding to the maximum
frequency
• For example, in the following frequency distribution
• The mode is another measure of central tendency.
• It is the value at the point around which the items are
most heavily concentrated. As an example, consider
the following series: 8,9, 11, 15, 16, 12, 15,3, 7, 15,37
• There are eleven observations in the series, and the
figure 15 occurs the maximum number of times (three).
The mode is therefore 15.
• In the case of grouped data, mode is determined by
the following formula:
• Mode = L + t1 × C/(t1 + t2)
where: L  - lower boundary (lower real limit) of the modal class
       C  - class size of the modal class
       t1 - difference between the highest frequency and the
            frequency of the class before it
       t2 - difference between the highest frequency and the
            frequency of the class after it
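A small Python sketch of both mode rules follows. The ungrouped series is the one quoted in the mode example; the grouped table is hypothetical, and treating a missing neighbouring class as having frequency 0 is an assumption of this sketch, not something the formula itself specifies.

```python
import statistics

# Ungrouped data: the series from the mode example.
series = [8, 9, 11, 15, 16, 12, 15, 3, 7, 15, 37]
ungrouped_mode = statistics.mode(series)  # most frequent value


def grouped_mode(lower_boundaries, class_size, frequencies):
    """Mode = L + t1 * C / (t1 + t2) for equal-width classes."""
    i = frequencies.index(max(frequencies))  # first class with the highest frequency
    before = frequencies[i - 1] if i > 0 else 0                     # assumed 0 if absent
    after = frequencies[i + 1] if i + 1 < len(frequencies) else 0   # assumed 0 if absent
    t1 = frequencies[i] - before
    t2 = frequencies[i] - after
    if t1 + t2 == 0:
        raise ValueError("modal class is not distinct from its neighbours")
    return lower_boundaries[i] + t1 * class_size / (t1 + t2)


# Hypothetical grouped data: classes 0-10, 10-20, 20-30, 30-40, 40-50.
lower_boundaries = [0, 10, 20, 30, 40]
frequencies = [5, 8, 12, 10, 5]
mode_value = grouped_mode(lower_boundaries, 10, frequencies)

print(ungrouped_mode, round(mode_value, 2))
```

For this table the modal class is 20-30, with t1 = 12 - 8 = 4 and t2 = 12 - 10 = 2.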
RELATIONSHIPS BETWEEN MEAN, MEDIAN
AND MODE
• Sometimes the mode is estimated from the mean and the
median. For a symmetrical distribution, the mean,
median and mode coincide.
• If the distribution is moderately asymmetrical, the mean,
median and mode approximately obey the following
empirical relationship (due to Karl Pearson):
Mode = 3 Median - 2 Mean
MEASURE OF DISPERSION AND SKEWNESS
MEASURE OF DISPERSION
• Some important definitions of dispersion are given below:
1. "Dispersion is the measure of the variation of the items."
2. "The degree to which numerical data tend to spread about an
average value is called the variation or dispersion of the data."
3. "Dispersion or spread is the degree of the scatter or variation of
the variable about a central value."
4. "The measurement of the scatterness of the mass of figures in
a series about an average is called measure of variation or
dispersion."
• When dispersion is small, the average is a typical
value.
• When dispersion is large, the average is not so typical
and, unless the sample is very large, may be unreliable.
• The following are measures of dispersion: Range,
Inter-quartile range, Mean deviation, Standard
Deviation
• RANGE: The simplest measure of dispersion is the
range, which is the difference between the
maximum value and the minimum value of the data.
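As a quick illustration, the range of the ten test scores from the ungrouped-data example can be computed as follows.

```python
# Test scores from the ungrouped-data example.
scores = [45, 67, 89, 56, 78, 90, 34, 76, 88, 92]

# Range = maximum value - minimum value.
data_range = max(scores) - min(scores)
print(data_range)  # 92 - 34 = 58
```

Because it depends only on the two extreme values, the range can change sharply when a single unusual observation is added.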
MEASURES OF POSITION
Quartile, Decile, and Percentile
• They are measures of position in statistics.
• They help you understand how data is spread
out or distributed by dividing the dataset into
equal parts.
QUARTILE
• Divides the data into 4 equal parts
• Each part represents 25% of the data
Q1 = 25% of the data is below this value
Q2 = 50% (the median)
Q3 = 75% of the data is below this value
INTERQUARTILE RANGE
• The interquartile range is the difference between the
third quartile and the first quartile.
• It is a better measure of variation in a distribution
than the range because it is not affected by extreme
values.
• Symbolically, interquartile range = Q3 - Q1
• The interquartile range is often halved to give the
semi-interquartile range, or quartile deviation:
• Semi-interquartile range (quartile deviation)
= (Q3 - Q1)/2
• When the quartile deviation is small, the middle 50%
of the data lies close to the median.
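The quartiles and the two ranges can be sketched with Python's standard library. The data are the nine values from the median example. Note that several conventions exist for locating quartiles in a small dataset; this sketch relies on the default ("exclusive") method of `statistics.quantiles`, so a textbook using another convention may give slightly different values.

```python
import statistics

# Series from the median example.
series = [15, 19, 21, 7, 10, 33, 25, 18, 5]

# n=4 returns the three cut points Q1, Q2, Q3 (default method: "exclusive").
q1, q2, q3 = statistics.quantiles(series, n=4)

interquartile_range = q3 - q1
quartile_deviation = interquartile_range / 2  # semi-interquartile range

print(q1, q2, q3)
print(interquartile_range, quartile_deviation)
```

Q2 agrees with the median of the series (18), as the definition of the second quartile requires.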
DECILE
• Divides the data into 10 equal parts
• Each part represents 10% of the data
D1 = 10% of the data is below this value
D5 = 50% (same as median)
D9 = 90% of the data is below this value
PERCENTILE
• Divides the data into 100 equal parts
• Each part represents 1% of the data
P1 = 1% of the data is below this value
P50 = 50% (median)
P99 = 99% of the data is below this value
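Deciles and percentiles can be obtained the same way as quartiles, by changing the number of equal parts. The sketch below uses the ten test scores from the ungrouped-data example and the default interpolation method of `statistics.quantiles`; with so few observations the extreme percentiles are only rough estimates.

```python
import statistics

# Test scores from the ungrouped-data example.
scores = [45, 67, 89, 56, 78, 90, 34, 76, 88, 92]

deciles = statistics.quantiles(scores, n=10)       # 9 cut points: D1..D9
percentiles = statistics.quantiles(scores, n=100)  # 99 cut points: P1..P99

d5 = deciles[4]        # D5 is the 5th cut point (index 4)
p50 = percentiles[49]  # P50 is the 50th cut point (index 49)

print(d5, p50, statistics.median(scores))
```

D5, P50 and the median all mark the same position in the data, so the three printed values coincide.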
Mean deviation
Standard deviation
• Calculate the mean and standard deviation for the
table below, which gives the age distribution of
542 members.
• Standard deviation = 11.5 years
H/W
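The age table for this exercise is not reproduced here, so the sketch below uses a hypothetical grouped table to show the mechanics only. It assumes each class is represented by its midpoint and that the population form (division by N) is intended; the mean deviation is taken about the mean.

```python
import math

# Hypothetical grouped data: class midpoints and their frequencies.
midpoints = [5, 15, 25, 35, 45]
frequencies = [5, 8, 12, 10, 5]

total = sum(frequencies)  # N

# Mean = sum(f * x) / N
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / total

# Mean deviation about the mean = sum(f * |x - mean|) / N
mean_deviation = sum(f * abs(x - mean) for f, x in zip(frequencies, midpoints)) / total

# Variance = sum(f * (x - mean)^2) / N; standard deviation is its square root.
variance = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints)) / total
standard_deviation = math.sqrt(variance)

print(mean, mean_deviation, round(standard_deviation, 2))
```

The same steps apply to the age distribution once its class midpoints and frequencies are entered in place of the hypothetical lists.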
SKEWNESS
• Some important definitions of skewness are as
follows:
1. "When a series is not symmetrical it is said to be
asymmetrical or skewed."
2. "Skewness refers to the asymmetry or lack of
symmetry in the shape of a frequency distribution."
3. "Measures of skewness tell us the direction and
the extent of skewness. In symmetrical distribution
the mean, median and mode are identical. The more
the mean moves away from the mode, the larger the
asymmetry or skewness."
4. "A distribution is said to be 'skewed' when the mean
and the median fall at different points in the
distribution, and the balance (or centre of gravity) is
shifted to one side or the other, to left or right."
• The above definitions show that the term 'skewness'
refers to lack of symmetry, i.e., when a distribution is
not symmetrical (or is asymmetrical) it is called a
skewed distribution.
• In the positively skewed distribution the frequencies
are spread out over a greater range of values on the
high-value end of the curve (the right-hand side) .
• In the negatively skewed distribution the position is
reversed, i.e. the excess tail is on the left-hand side.
TESTS OF SKEWNESS
• In order to ascertain whether a distribution is skewed or
not, the following tests may be applied. Skewness is
present if:
1. The values of mean, median and mode do not coincide.
2. When the data are plotted on a graph they do not give
the normal bell shaped form i.e. when cut along a vertical
line through the centre the two halves are not equal.
3. The sum of the positive deviations from the median is
not equal to the sum of the negative deviations.
MEASURES OF SKEWNESS
• There are four measures of skewness, each divided
into absolute and relative measures.
• The relative measure is known as the coefficient of
skewness and is more frequently used than the
absolute measure of skewness.
• Further, when a comparison between two or more
distributions is involved, it is the relative measure of
skewness, which is used.
• The measures of skewness are: (i) Karl Pearson's
measure, (ii) Bowley’s measure, (iii) Kelly’s measure,
and (iv) Moment’s measure
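KARL PEARSON’S MEASURE
• Absolute measure of skewness = Mean - Mode
• Coefficient of skewness = (Mean - Mode)/Standard deviation
• When the mode is ill-defined, it is estimated from
Mode = 3 Median - 2 Mean, which gives
Coefficient of skewness = 3(Mean - Median)/Standard deviation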
• The direction of skewness is determined by
ascertaining whether the mean is greater than the
mode or less than the mode. If it is greater than the
mode, then skewness is positive.
• But when the mean is less than the mode, it is
negative.
• The difference between the mean and mode indicates
the extent of departure from symmetry.
• The value of coefficient of skewness is zero, when
the distribution is symmetrical.
• Normally, this coefficient of skewness lies between
-1 and +1.
• If the mean is greater than the mode, then the
coefficient of skewness will be positive, otherwise
negative.
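A minimal sketch of Karl Pearson's coefficient follows, assuming its usual form (Mean - Mode)/Standard deviation and the median-based alternative 3(Mean - Median)/Standard deviation. The dataset is hypothetical and the population standard deviation is assumed.

```python
import statistics

values = [2, 3, 3, 4, 5, 7]  # hypothetical data with a longer right-hand tail

mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)
std_dev = statistics.pstdev(values)  # population standard deviation (assumed)

# Coefficient based on the mode.
skew_from_mode = (mean - mode) / std_dev

# Alternative based on the median, from Mode = 3 Median - 2 Mean.
skew_from_median = 3 * (mean - median) / std_dev

print(round(skew_from_mode, 3), round(skew_from_median, 3))
```

Here the mean (4) exceeds the mode (3), so both versions of the coefficient are positive, indicating positive skewness. The two versions need not be equal, because the empirical relationship between mean, median and mode is only approximate.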
BOWLEY’S MEASURE
• Bowley’s coefficient of skewness is given by:
k = (Q1 + Q3 - 2Q2)/(Q3 - Q1)
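Bowley's formula can be sketched as a small function of the three quartiles. The function name is hypothetical; the data are the nine values from the median example, with quartiles taken from the default method of `statistics.quantiles`, which is only one of several quartile conventions.

```python
import statistics


def bowley_skewness(q1, q2, q3):
    """k = (Q1 + Q3 - 2*Q2) / (Q3 - Q1)."""
    return (q1 + q3 - 2 * q2) / (q3 - q1)


# Series from the median example.
series = [15, 19, 21, 7, 10, 33, 25, 18, 5]
q1, q2, q3 = statistics.quantiles(series, n=4)

bowley_k = bowley_skewness(q1, q2, q3)
print(round(bowley_k, 3))
```

The coefficient is zero when the median lies exactly midway between Q1 and Q3, and it always falls between -1 and +1.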
KELLY’S MEASURE
• Kelly’s coefficient of skewness is given by:
k = (P90 + P10 - 2P50)/(P90 - P10) or
k = (D9 + D1 - 2D5)/(D9 - D1)
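Kelly's measure has the same shape as Bowley's but uses the 10th and 90th percentiles, with their difference P90 - P10 as the denominator. The sketch below uses a hypothetical function name and the ten test scores from the ungrouped-data example; the decile values depend on the interpolation convention of `statistics.quantiles`.

```python
import statistics


def kelly_skewness(p10, p50, p90):
    """k = (P90 + P10 - 2*P50) / (P90 - P10)."""
    return (p90 + p10 - 2 * p50) / (p90 - p10)


# Test scores from the ungrouped-data example.
scores = [45, 67, 89, 56, 78, 90, 34, 76, 88, 92]

deciles = statistics.quantiles(scores, n=10)  # D1..D9
d1, d5, d9 = deciles[0], deciles[4], deciles[8]

# D1 = P10, D5 = P50 and D9 = P90, so the two forms of the formula agree.
kelly_k = kelly_skewness(d1, d5, d9)
print(round(kelly_k, 3))
```

A negative result, as for these scores, means the median sits closer to the 90th percentile than to the 10th, so the longer tail is on the low-value side.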
MOMENT’S MEASURE
• The Moment’s measure coefficient of skewness is
given by:
k = μ3/σ^3
μ3: the third central moment
σ^3: the cube of the standard deviation
• The nth central moment is given by
μn = Σ f (x - x̄)^n ÷ N
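The moment measure can be sketched directly from the central-moment formula. The grouped table below is hypothetical, each class is represented by its midpoint, and the function name is an assumption for illustration.

```python
def central_moment(midpoints, frequencies, order):
    """nth central moment: sum(f * (x - mean)^n) / N."""
    total = sum(frequencies)
    mean = sum(f * x for f, x in zip(frequencies, midpoints)) / total
    return sum(f * (x - mean) ** order for f, x in zip(frequencies, midpoints)) / total


# Hypothetical grouped data: class midpoints and their frequencies.
midpoints = [5, 15, 25, 35, 45]
frequencies = [5, 8, 12, 10, 5]

mu2 = central_moment(midpoints, frequencies, 2)  # variance
mu3 = central_moment(midpoints, frequencies, 3)  # third central moment
sigma = mu2 ** 0.5                               # standard deviation

moment_skewness = mu3 / sigma ** 3
print(round(moment_skewness, 3))
```

The first central moment is always zero, which restates the property that the deviations from the mean sum to zero; the second is the variance, so the standard deviation needed in the denominator comes from the same function.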