Descriptive Statistics

This document provides an overview of descriptive statistics, including its definition, key measurements such as mean, median, mode, and measures of variability. It explains how to analyze and interpret datasets using various statistical methods and presents concepts like normal distribution and skewness. The content is structured into lessons covering central tendency, variability, and normality, with examples and applications throughout.

Uploaded by 24-09163
Copyright © All Rights Reserved

COLLEGE OF INFORMATICS AND COMPUTING SCIENCES

DESCRIPTIVE
STATISTICS
JOEBERT C. CEPILLO
Course Facilitator
Learning Objectives
1. Explain the concept of descriptive statistics and distinguish it from inferential (inductive) statistics
2. Enumerate and describe the key measurements of descriptive statistics (mean, median, mode, range, variance, standard deviation, etc.)
3. Use the different measurements of descriptive statistics to analyze and interpret a given dataset
DESCRIPTIVE STATISTICS
❑ Descriptive Statistics is the branch of statistics that
focuses on summarizing, organizing, and presenting
data.
❑ It deals only with the data at hand—it does not generalize
or make predictions about a larger population.
❑ Its purpose is to describe the characteristics and
properties of a dataset (such as scores, measurements,
or survey results).
❑ Because it is based on actual values, descriptive
statistics are factual and verifiable.
🛠 How Do Descriptive Statistics Work?
❑It uses methods of collection and
description to turn raw data into
meaningful information.
❑Results are usually presented in three
forms:
✔Tabular (tables, frequency distributions)
✔Graphical (charts, histograms, pie graphs)
✔Numerical (mean, median, mode, etc.)
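To make the tabular form concrete, here is a small sketch that turns raw values into a frequency distribution using Python's standard `collections.Counter`. The ratings list is hypothetical and is used only for illustration.

```python
from collections import Counter

# Hypothetical survey ratings (1 to 5), used only for illustration
ratings = [4, 5, 3, 4, 4, 2, 5, 3, 4, 1]

# Count how many times each distinct rating occurs
freq = Counter(ratings)
n = len(ratings)

# Tabular form: value, frequency, relative frequency (percent)
print("Rating | f | %")
for value in sorted(freq):
    print(f"{value:>6} | {freq[value]} | {100 * freq[value] / n:.0f}")
```

The relative-frequency column always sums to 100%, which is a quick check that every observation was counted exactly once.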
SOME MEASUREMENTS OF DESCRIPTIVE STATISTICS
Category | Measurements
Measures of Central Tendency | Mean, Median, Mode
Measures of Location (Position) | Quartile, Decile, Percentile
Measures of Variation (Dispersion) | Range, Mean Absolute Deviation, Standard Deviation, Variance, Coefficient of Variation, Quartile/Decile/Percentile Deviation, Semi-Interquartile Range
Measures of Normality (Shape of Distribution) | Normal Distribution, Skewness, Kurtosis
MEASURES OF CENTRAL TENDENCY
Lesson 1
CENTRAL TENDENCY
❑Definition:
• Value around which data tends to cluster.
• Represents the "typical" or most common observation.
❑Purpose:
• Shows the center or representative value of a dataset.
❑Main Measures:
• Mean → Average of all values
• Median → Middle value when ordered
• Mode → Most frequent value
MEAN
Definition:
• Most popular measure of central location.
• Called the average in everyday use; statisticians call it the arithmetic mean.
Properties:
1. A dataset has only one mean.
2. Applicable for interval and ratio data.
3. All values are included in computation.
4. Useful for comparing datasets.
5. Affected by extreme values (outliers).
6. Cannot be computed for open-ended frequency distributions.
To determine the mean of a set of data, use the formula below:

x̄ = Σx / n

Where:
x̄ = arithmetic mean
Σx = sum of all observations
n = number of observations
Example
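The original worked example did not survive extraction. As a stand-in, this sketch applies the arithmetic-mean formula (sum of the values divided by their count, x̄ = Σx / n) to the ten scores used in the fractile example of Lesson 2. The helper name `arithmetic_mean` is hypothetical; `statistics.mean` from Python's standard library returns the same value. The last lines replace 95 with a hypothetical 195 to illustrate Property 5.

```python
import statistics

def arithmetic_mean(values):
    """Sum all observations and divide by their count (x̄ = Σx / n)."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)

scores = [75, 78, 80, 84, 85, 88, 89, 90, 92, 95]
print(arithmetic_mean(scores))        # 85.6
print(statistics.mean(scores))        # 85.6 (standard-library equivalent)

# Property 5: a single extreme value pulls the mean toward it
with_outlier = scores[:-1] + [195]    # hypothetical: 95 replaced by 195
print(arithmetic_mean(with_outlier))  # 95.6
```

Changing one value out of ten moves the mean by a full 10 points, which is why the median is often preferred when outliers are present.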

MEDIAN
Definition:
❑ The middle value of an ordered dataset.
❑ If odd number of values → middle item.
❑ If even number of values → average of two middle items.
Purpose:
❑ Avoids being misled by extreme values (outliers).
❑ Represents the central point of the data.
Properties:
1. Unique → only one median per dataset.
2. Found by ordering data (lowest to highest or vice versa).
3. Not affected by outliers (extreme values).
4. Can be computed for open-ended frequency distributions.
5. Applicable for ordinal, interval, and ratio data.
MEDIAN FOR UNGROUPED DATA
🔑 Steps to Find the Median
1. Arrange data → lowest to highest
(or vice versa).
2. Check number of observations
(n):
•Odd n → Median = middle value.
•Even n → Median = average of two
middle values.
Example
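The worked example for the median is also missing from the extracted slides. The sketch below follows the two steps listed above: sort the data, then take the middle value (odd n) or the average of the two middle values (even n). The function name `median` is our own; `statistics.median` in Python's standard library behaves the same way for numeric data.

```python
def median(values):
    """Median of ungrouped data, following the two steps above."""
    if not values:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)              # Step 1: arrange from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                        # Step 2: odd n -> the middle value
        return ordered[mid]
    # Step 2: even n -> the average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([89, 75, 95, 80, 85]))                        # odd n: 85
print(median([75, 78, 80, 84, 85, 88, 89, 90, 92, 95]))    # even n: 86.5
```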

MODE
Definition:
❑ Value that occurs most frequently in a dataset.
Ungrouped data → found by counting occurrences.
Can be one mode (unimodal) or multiple modes:
✔ Bimodal → two modes
✔ Trimodal → three modes
✔ Multimodal → more than three modes
Properties:
1. Found by locating the most frequent value.
2. Easiest average to compute.
3. A dataset may have one, more than one, or no mode.
4. Not affected by outliers (extreme values).
5. Applicable for nominal, ordinal, interval, and ratio data.
Example
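Here is a hedged sketch for finding the mode or modes by counting occurrences. It returns every value tied for the highest frequency, so it can report unimodal, bimodal, or multimodal data. It assumes the convention that a dataset in which no value repeats has no mode. Python's `statistics.multimode` instead returns all values in that case, which is why this sketch counts with `collections.Counter` directly. The function name `modes` and the datasets are hypothetical.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values (hypothetical helper).

    Convention assumed here: if no value repeats, the dataset has no mode.
    """
    counts = Counter(values)                 # occurrences of each distinct value
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:                             # every value occurs once -> no mode
        return []
    # Keep first-seen order so nominal labels (strings) also work
    return [v for v, c in counts.items() if c == top]

print(modes([3, 5, 5, 7, 9]))                # unimodal: [5]
print(modes([2, 3, 3, 5, 7, 7, 9]))          # bimodal: [3, 7]
print(modes([75, 78, 80, 84, 85]))           # no mode: []
```

Because the function only counts, it also works for nominal data such as `["red", "blue", "red"]`.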

MEASURES OF LOCATION
Lesson 2
FRACTILES IN STATISTICS
❑ Values that divide data into equal parts.
❑ The median is one example (it divides data into 2 parts).
❑ Types of Fractiles:
✔ Quartiles → Divide data into 4 equal parts (25% each).
✔ Deciles → Divide data into 10 equal parts (10% each).
✔ Percentiles → Divide data into 100 equal parts (1% each).
FRACTILES IN STATISTICS
Fractile | Division | Notation | Meaning
Quartiles | Divide data into 4 equal parts | Q1, Q2, Q3 | Q1 = 25% below, Q2 = 50% (Median), Q3 = 75% below
Deciles | Divide data into 10 equal parts | D1–D9 | D1 = 10% below, D2 = 20%, … D9 = 90% below
Percentiles | Divide data into 100 equal parts | P1–P99 | P1 = 1% below, P50 = Median, P99 = 99% below
Position Formulas

Example

Solution:
75, 78, 80, 84, 85, 88, 89, 90, 92, 95

Quartile Decile
Solution:
75, 78, 80, 84, 85, 88, 89, 90, 92, 95

Percentile
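The deck's position-formula and worked-solution slides did not survive extraction. The sketch below therefore assumes one common textbook convention: the position of the k-th fractile is k(n + 1)/4 for quartiles, k(n + 1)/10 for deciles, and k(n + 1)/100 for percentiles, with linear interpolation between neighbouring ordered values. Other conventions (for example kn/4) give slightly different answers, so treat these numbers as illustrative rather than as the deck's official solution. The function name `fractile` is hypothetical.

```python
def fractile(values, k, parts):
    """Return the k-th fractile of `values` (hypothetical helper).

    parts = 4 for quartiles, 10 for deciles, 100 for percentiles.
    Assumes the position rule k(n + 1)/parts with linear interpolation.
    """
    if not values:
        raise ValueError("fractiles of an empty dataset are undefined")
    ordered = sorted(values)                 # fractiles need ordered data
    n = len(ordered)
    pos = k * (n + 1) / parts                # 1-based position in the ordered list
    if pos <= 1:                             # at or before the first value
        return ordered[0]
    if pos >= n:                             # at or after the last value
        return ordered[-1]
    lower = int(pos)                         # whole-number part of the position
    frac = pos - lower                       # fraction of the way to the next value
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

scores = [75, 78, 80, 84, 85, 88, 89, 90, 92, 95]
print(fractile(scores, 1, 4))                # Q1 = 79.5
print(fractile(scores, 3, 4))                # Q3 = 90.5
print(round(fractile(scores, 7, 10), 2))     # D7 ≈ 89.7
print(round(fractile(scores, 45, 100), 2))   # P45 ≈ 84.95
```

Under this rule Q2 = D5 = P50 = 86.5, which agrees with the median of the same ten scores.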

MEASURES OF VARIABILITY
Lesson 3
Measures of Variability
Definition:
•Show how data values are spread out around
the center.
•Also called measures of spread or dispersion.
Purpose:
•Describe the degree of scatter in a dataset.
•Complement measures of central tendency
(mean, median, mode).
•Variability shows the degree of difference among
values
Consider the following sets of prelim
grades in Statistics 103 of two groups of
five students each:
Name of Male Student | Prelim Grade in Stat 103 | Name of Female Student | Prelim Grade in Stat 103
1. Ernesto Angulo | 100 | 1. Lenie Sanchez | 84
2. Allan Eleponga | 95 | 2. Patricia Reyes | 86
3. Allan Ochoa | 75 | 3. Norma Mendoza | 85
4. Mark Anthony Aseron | 85 | 4. Letecia Santillan | 82
5. Casiano Custodio | 65 | 5. Bella Florencia | 83
MEAN | ? | MEAN | ?

1. What can you say about the means of the two groups?
2. How widely are the grades in each group spread about the mean?
3. Which group performs better, the male students or the female students? Explain briefly.
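One way to check your answers to the questions above is to compute the mean, range, and standard deviation of each group. This sketch treats each group of five as a complete population (`statistics.pstdev`). Using the sample formula (`statistics.stdev`) changes the SD values but not the comparison between the groups.

```python
import statistics

# Prelim grades in Stat 103 from the table above
male = [100, 95, 75, 85, 65]
female = [84, 86, 85, 82, 83]

for label, grades in (("Male", male), ("Female", female)):
    mean = statistics.mean(grades)
    spread = max(grades) - min(grades)      # range: highest minus lowest
    sd = statistics.pstdev(grades)          # population SD of the five grades
    print(f"{label:<6} mean={mean} range={spread} SD={sd:.2f}")
```

Both groups have the same mean, but the male group's range and SD are far larger, so the mean alone hides a big difference in consistency.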
INTERPRETING MEASURES OF VARIABILITY
Small Variability | Large Variability
• Data clustered closely around the mean | • Data spread far from the mean
• More homogeneous (similar) | • More heterogeneous (different)
• Less variable | • More variable
• More consistent | • Less consistent
• More uniformly distributed | • Less uniformly distributed
Values are stable and predictable. | Values are scattered and less reliable.
Measures of Variability
Measure | Definition / Key Idea
Range | Difference between the highest and lowest values.
Mean Absolute Deviation (MAD) | Average of absolute differences from the mean.
Variance (Var) | Average of squared deviations from the mean.
Standard Deviation (SD) | Square root of variance; shows typical spread and how concentrated the data are around the mean; the more concentrated, the smaller the standard deviation.
Coefficient of Variation (CV) | Ratio of standard deviation to mean; compares variability across datasets.
Other Deviations | Spread based on fractiles (Quartile, Decile, Percentile Deviation).
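MEASURES OF VARIABILITY (x̄ = 85.6)
x | Absolute deviation from x̄ | (x − x̄)²
75 | 10.6 | 112.36
78 | 7.6 | 57.76
80 | 5.6 | 31.36
84 | 1.6 | 2.56
85 | 0.6 | 0.36
88 | 2.4 | 5.76
89 | 3.4 | 11.56
90 | 4.4 | 19.36
92 | 6.4 | 40.96
95 | 9.4 | 88.36
Σx = 856 | 52.0 | 370.40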
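The formula slides for these measures were lost in extraction. As a hedged sketch, the code below applies the standard definitions to the same ten scores, whose mean is 85.6: range = highest − lowest, MAD = Σ|x − x̄| / n, variance, standard deviation, and CV = s / x̄ × 100%. The slide does not show whether the variance divides by n (population) or n − 1 (sample), so both are printed. The CV uses the sample SD, which is also an assumption.

```python
import statistics

scores = [75, 78, 80, 84, 85, 88, 89, 90, 92, 95]
n = len(scores)
mean = statistics.mean(scores)                   # 85.6

value_range = max(scores) - min(scores)          # highest - lowest
mad = sum(abs(x - mean) for x in scores) / n     # mean absolute deviation
ss = sum((x - mean) ** 2 for x in scores)        # sum of squared deviations

pop_var = ss / n                                 # population variance (divide by n)
samp_var = ss / (n - 1)                          # sample variance (divide by n - 1)
pop_sd = pop_var ** 0.5
samp_sd = samp_var ** 0.5
cv = samp_sd / mean * 100                        # coefficient of variation in %, sample SD

print(f"Range={value_range}  MAD={mad:.2f}")
print(f"Variance: population={pop_var:.2f}, sample={samp_var:.2f}")
print(f"SD: population={pop_sd:.2f}, sample={samp_sd:.2f}")
print(f"CV={cv:.2f}%")
```

The `ss` and `mad` values reproduce the column totals of the deviation table (370.4 and 52.0 / 10).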
MEASURES OF VARIABILITY
Spread based on fractiles (Quartile, Decile, Percentile Deviation)
Measure | Definition / Key Idea
Interquartile Range (IQR) | Spread of the middle 50% of data (between Q3 and Q1).
Semi-Interquartile Range (SIR) | Half of the interquartile range; shows average spread around the median.
Decile Deviation (DD) | Spread between the 1st and 9th deciles.
Percentile Deviation (PD) | Spread between the 10th and 90th percentiles.
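Here is a sketch of the fractile-based spreads for the ten scores 75, 78, 80, 84, 85, 88, 89, 90, 92, 95. It assumes the k(n + 1)/parts position rule with linear interpolation, because the deck's own position formulas are not recoverable. It takes DD = D9 − D1 and PD = P90 − P10 exactly as the table describes them. Some textbooks halve these differences, so check the convention your course uses. The helper `fractile` and the variable names are hypothetical.

```python
def fractile(values, k, parts):
    """k-th fractile, assuming position k(n + 1)/parts and linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    pos = k * (n + 1) / parts                # 1-based position in ordered data
    if pos <= 1:
        return ordered[0]
    if pos >= n:
        return ordered[-1]
    lower = int(pos)
    frac = pos - lower
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

scores = [75, 78, 80, 84, 85, 88, 89, 90, 92, 95]

q1, q3 = fractile(scores, 1, 4), fractile(scores, 3, 4)
iqr = q3 - q1                                               # middle 50% of the data
sir = iqr / 2                                               # semi-interquartile range
dec_dev = fractile(scores, 9, 10) - fractile(scores, 1, 10)      # D9 - D1
pct_dev = fractile(scores, 90, 100) - fractile(scores, 10, 100)  # P90 - P10

print(f"IQR={iqr}  SIR={sir}  DD={dec_dev:.1f}  PD={pct_dev:.1f}")
```

Because D1 = P10 and D9 = P90 by definition, the decile and percentile deviations coincide here.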

MEASURES OF NORMALITY
Lesson 4
NORMAL DISTRIBUTION
The normal distribution is the most important and most widely used distribution in statistics. It is sometimes called the "bell curve" because its shape resembles a bell.

This distribution is also called the "Gaussian curve," after the mathematician Carl Friedrich Gauss.
PROPERTIES OF THE NORMAL DISTRIBUTION
1. All measures of central tendency (mean, median, and mode) are equal.
2. The curve is perfectly symmetrical about the mean.
3. The tails of the curve approach (but never touch) the horizontal axis.
4. The total area under the normal curve is 1.0, or 100%.
5. The area under the normal curve may be subdivided into standard-deviation units, at least 3 to the left and 3 to the right of the mean:
• About 68% of the data lie within 1 standard deviation of the mean.
• About 95% of the data lie within 2 standard deviations of the mean.
• About 99.7% of the data lie within 3 standard deviations of the mean.
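The 68%, 95%, and 99.7% figures can be checked numerically. This sketch uses `statistics.NormalDist` from Python's standard library (Python 3.8 or later) to compute the area between −k and +k standard deviations of the standard normal curve.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)   # standard normal curve

for k in (1, 2, 3):
    # Area between z = -k and z = +k: left area up to +k minus left area up to -k
    area = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {area:.4f}")
```

The printed areas round to 0.6827, 0.9545, and 0.9973, which is where the "about" in each statement comes from.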
STANDARD Z-SCORE
Definition:
A z-score tells us how many standard
deviations a raw score is above or below the
mean.
•Positive z-score → above the mean
•Negative z-score → below the mean
FORMULA
z = (x − μ) / σ
where x = raw score, μ = mean, σ = standard deviation
Interpreting the Standard Score (Z-Score)
1. z < 0 → Raw score is below the mean.
2. z > 0 → Raw score is above the mean.
3. z = 0 → Raw score is equal to the mean.
4. z = +1, +2, … → 1, 2, … standard deviations above the mean.
5. z = -1, -2, … → 1, 2, … standard deviations below the mean.
6. Empirical Rule (68–95–99.7 Rule):
❑ About 68% of data lie between z = -1 and z = +1.
❑ About 95% lie between z = -2 and z = +2.
❑ About 99.7% lie between z = -3 and z = +3.
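Here is a minimal z-score sketch, assuming the population form z = (x − μ) / σ. The exam mean of 80 and standard deviation of 5 are hypothetical values chosen for illustration. `z_score` is our own helper name.

```python
def z_score(x, mean, sd):
    """Standard score: how many SDs the raw score x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

# Hypothetical exam: mean 80, standard deviation 5
for raw in (70, 80, 92):
    z = z_score(raw, 80, 5)
    if z > 0:
        where = "above"
    elif z < 0:
        where = "below"
    else:
        where = "equal to"
    print(f"x = {raw}: z = {z:+.2f} ({where} the mean)")
```

A score of 70 is 2 standard deviations below the mean (z = -2.00), and 92 is 2.4 standard deviations above it (z = +2.40).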
AREAS UNDER THE NORMAL CURVE
1. Total Area = 1.0 (100%)
❑ The entire distribution is represented by the area under the curve.
2. Symmetry
❑ Since the curve is symmetrical about the mean, each half has an area of 0.5 (50%).
3. Use of Tables
❑ Standard normal distribution tables give the area between the mean (z = 0) and a given z-score.
❑ These tables list values from z = 0 up to about z = 3.99, giving areas from 0 up to about 0.49997 (≈ 0.5).
4. Area vs. z-score
❑ Area is always positive, but z-scores can be positive or negative.
❑ Negative z-scores represent values below the mean; positive z-scores represent values above the mean.
REMINDERS
1. Always sketch the normal curve and shade the region
representing the probability you want.
2. Only addition and subtraction are needed to combine areas.
3. Half of the curve (left or right of the mean) = 0.5 (50%).
4. Finding the area under the curve is the same as finding the
probability of an event.
5. The standard normal table gives the area between z = 0 and a
positive or negative z-score.
6. This makes it easy to determine the area between the mean
and a given z-score.
Find the area under the normal curve
based on the given conditions.
1. From z = 0 to z = 1.35 (same as P(0 ≤ z ≤ 1.35))
2. From z = 0 to z = -1.08
3. From z = -2.0 to z = 2.15
4. From z = 0.25 to z = 1.75
5. To the right of z = -1.58
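To check table look-ups for the five exercises, the sketch below computes the areas with `statistics.NormalDist` (Python 3.8+). Its `cdf(z)` method gives the area to the left of z. Subtracting two `cdf` values gives the area between them, which is the same add-and-subtract reasoning as in the reminders. Results may differ from a printed table in the fourth decimal place because the table rounds each area before you add or subtract. `area_between` is a hypothetical helper name.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_between(z1, z2):
    """Area under the standard normal curve between z1 and z2 (order does not matter)."""
    lo, hi = sorted((z1, z2))
    return Z.cdf(hi) - Z.cdf(lo)

answers = {
    "1. z = 0 to 1.35": area_between(0, 1.35),
    "2. z = 0 to -1.08": area_between(0, -1.08),
    "3. z = -2.0 to 2.15": area_between(-2.0, 2.15),
    "4. z = 0.25 to 1.75": area_between(0.25, 1.75),
    "5. right of z = -1.58": 1 - Z.cdf(-1.58),   # total area 1 minus left tail
}
for label, area in answers.items():
    print(f"{label}: {area:.4f}")
```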
APPLICATION OF A
NORMAL CURVE
Problem 1
Most graduate schools of business require applicants for
admission to take the Graduate Management Admission
Council’s GMAT examination. Scores on the GMAT are
roughly normally distributed with a mean of 527 and a
standard deviation of 112.
a. What is the probability of an individual scoring above
500 on the GMAT?
b. How high must an individual score on the GMAT in
order to score in the highest 5%?
Problem 2
The length of human pregnancies from conception to
birth approximates a normal distribution with a mean
of 266 days and a standard deviation of 16 days.
a. What proportion of all pregnancies will last
between 240 and 270 days (roughly between 8 and
9 months)?
b. What length of time marks the shortest 70% of all
pregnancies?
Problem 3
A radar unit is used to measure speeds of
cars on a motorway. The speeds are
normally distributed with a mean of 90
km/hr and a standard deviation of 10
km/hr. What is the probability that a car
picked at random is traveling at more than
100 km/hr?
Problem 4
For a certain type of computer, the length of
time between battery charges is normally
distributed with a mean of 50 hours and a
standard deviation of 15 hours. John Lenard
owns one of these computers and wants to
know the probability, as a percentage, that the
time between charges will be between 60 hours
and 70 hours.
Problem 5
Entry to a certain University is determined by a
Entrance Examination. The scores on this test are
normally distributed with a mean of 500 and a
standard deviation of 100. A freshman applicant
wants to be admitted to this university and he knows
that he must score better than at least 70% of the
students who took the test. The applicant takes the
test and scores 585. Will he be admitted to this
university?
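Instead of table look-ups, the five problems can be cross-checked with Python's standard-library `statistics.NormalDist` (Python 3.8+). It provides `cdf` (the area to the left of a value) and `inv_cdf` (the value with a given area to its left). This is a verification sketch, not the deck's worked solution. Answers from a printed z-table may differ slightly because z-scores are rounded to two decimals there. The variable names are hypothetical.

```python
from statistics import NormalDist

# Problem 1: GMAT scores ~ N(527, 112)
gmat = NormalDist(mu=527, sigma=112)
p1a = 1 - gmat.cdf(500)              # P(score > 500)
p1b = gmat.inv_cdf(0.95)             # cut-off score for the highest 5%

# Problem 2: pregnancy length ~ N(266, 16) days
preg = NormalDist(mu=266, sigma=16)
p2a = preg.cdf(270) - preg.cdf(240)  # P(240 < X < 270)
p2b = preg.inv_cdf(0.70)             # length marking the shortest 70%

# Problem 3: car speeds ~ N(90, 10) km/hr
p3 = 1 - NormalDist(mu=90, sigma=10).cdf(100)    # P(speed > 100)

# Problem 4: hours between charges ~ N(50, 15)
battery = NormalDist(mu=50, sigma=15)
p4 = battery.cdf(70) - battery.cdf(60)           # P(60 < X < 70)

# Problem 5: entrance scores ~ N(500, 100); must beat at least 70% of takers
exam = NormalDist(mu=500, sigma=100)
cutoff = exam.inv_cdf(0.70)                      # 70th percentile score
admitted = 585 > cutoff

print(f"1a: {p1a:.4f}   1b: {p1b:.1f}")
print(f"2a: {p2a:.4f}   2b: {p2b:.1f} days")
print(f"3: {p3:.4f}")
print(f"4: {p4:.2%}")
print(f"5: cutoff={cutoff:.1f}, 585 is at the {exam.cdf(585):.2%} point, admitted={admitted}")
```

For Problem 5 the same conclusion follows from comparing areas: z = (585 − 500)/100 = 0.85, and the area to the left of z = 0.85 is about 0.80, which exceeds 0.70.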
Exercises

SKEWNESS
Skewness measures the degree of
asymmetry in a distribution compared
to the normal curve.
✔Normal distribution → skewness = 0
(perfect symmetry).
✔Skewed distribution → skewness ≠ 0.
SKEWNESS
Negative Skewness (Left-Skewed):
• Coefficient of skewness < 0.
• Distribution has a longer tail on the left (toward smaller values).
• The bulk of values are concentrated on the right (larger values).
• Relationship among measures of central tendency: Mean < Median < Mode
Positive Skewness (Right-Skewed):
• Coefficient of skewness > 0.
• Distribution has a longer tail on the right (toward larger values).
• The bulk of values are concentrated on the left (smaller values).
• Relationship among measures of central tendency: Mode < Median < Mean
SKEWNESS
[Figure: negative skewness (Mean < Median < Mode) and positive skewness (Mode < Median < Mean)]
INTERPRETING SKEWNESS (SK)
1. SK = 0 → Distribution is perfectly normal
(symmetrical about the mean).
2. SK ≈ 0 → Distribution is almost normal (nearly
symmetrical).
3. SK < 0 → Distribution is negatively skewed (tail
extends to the left; mean < median < mode).
4. SK > 0 → Distribution is positively skewed (tail
extends to the right; mean > median > mode).
FORMULA:

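Moment-Based Formula (x̄ = 85.6)
x | x − x̄ | (x − x̄)³
75 | -10.6 | -1191.02
78 | -7.6 | -438.976
80 | -5.6 | -175.616
84 | -1.6 | -4.096
85 | -0.6 | -0.216
88 | 2.4 | 13.824
89 | 3.4 | 39.304
90 | 4.4 | 85.184
92 | 6.4 | 262.144
95 | 9.4 | 830.584
Σx = 856 | 0.0 | -578.88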
Pearson’s Median Skewness
SK = 3(x̄ − Md) / s, where Md = median and s = standard deviation
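The moment-based formula slide was lost in extraction. The sketch below computes two common skewness measures for the ten scores 75 to 95. The first is the moment coefficient m₃ / m₂^(3/2), built from the cubed deviations, whose sum is −578.88. The second is Pearson's median skewness, 3(x̄ − median) / s. The deck does not show which standard deviation it divides by. This sketch assumes population moments for the first measure and the sample SD for the second, so treat the exact values as illustrative.

```python
import statistics

scores = [75, 78, 80, 84, 85, 88, 89, 90, 92, 95]
n = len(scores)
mean = statistics.mean(scores)

# Moment coefficient of skewness: m3 / m2**1.5, using population moments (assumption)
m2 = sum((x - mean) ** 2 for x in scores) / n     # ≈ 37.04
m3 = sum((x - mean) ** 3 for x in scores) / n     # sum of cubes (-578.88) divided by n
moment_sk = m3 / m2 ** 1.5

# Pearson's median skewness: 3(mean - median) / s, using the sample SD (assumption)
pearson_sk = 3 * (mean - statistics.median(scores)) / statistics.stdev(scores)

print(f"moment SK  = {moment_sk:.3f}")
print(f"Pearson SK = {pearson_sk:.3f}")
```

Both coefficients come out negative, so the scores are mildly negatively (left) skewed, consistent with the mean (85.6) lying below the median (86.5).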


INTERPRETATION OF SKEWNESS
Interpretation | Description
Highly Skewed | Distribution shows strong asymmetry; extreme values heavily influence the mean.
Moderately Skewed | Distribution is noticeably skewed; the mean is pulled away from the median.
Approximately Symmetric | Distribution is close to normal; the mean, median, and mode are nearly equal.
KURTOSIS
•In probability theory and statistics,
kurtosis is a measure of the “tailedness”
of the probability distribution of a
real-valued random variable.
•It describes whether the data are peaked
or flat relative to a normal distribution.
•Like skewness, kurtosis is a descriptor of
distribution shape.
TYPES OF KURTOSIS
Kurtosis Value | Type | Description
Ku = 0 | Mesokurtic | Distribution is normal in relation to height; tails and peak are moderate.
Ku ≈ 0 | Almost Normal | Distribution is nearly normal; slight deviations, but the shape is still close to normal.
Ku < 0 | Platykurtic | Distribution is flatter than normal; lighter tails, fewer extreme values.
Ku > 0 | Leptokurtic | Distribution is taller than normal; heavier tails, more extreme values.
FORMULA

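Kurtosis (x̄ = 85.6)
x | x − x̄ | (x − x̄)⁴
75 | -10.6 | 12624.77
78 | -7.6 | 3336.218
80 | -5.6 | 983.4496
84 | -1.6 | 6.5536
85 | -0.6 | 0.1296
88 | 2.4 | 33.1776
89 | 3.4 | 133.6336
90 | 4.4 | 374.8096
92 | 6.4 | 1677.722
95 | 9.4 | 7807.49
Σx = 856 | 0.0 | 26977.95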
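The kurtosis formula slide was also lost. The sketch below assumes the moment definition m₄ / m₂² with population moments. It reports excess kurtosis (kurtosis − 3), because the table of types treats Ku = 0 as mesokurtic, and the normal curve has moment kurtosis 3. The fourth-power deviations sum to about 26977.95. Under these assumptions the excess comes out negative, so the ten scores are platykurtic.

```python
import statistics

scores = [75, 78, 80, 84, 85, 88, 89, 90, 92, 95]
n = len(scores)
mean = statistics.mean(scores)

m2 = sum((x - mean) ** 2 for x in scores) / n   # second central moment
m4 = sum((x - mean) ** 4 for x in scores) / n   # fourth central moment

kurt = m4 / m2 ** 2          # moment kurtosis; equals 3 for a normal distribution
excess = kurt - 3            # the "Ku" scale in the table, where 0 means mesokurtic

if excess > 0:
    kind = "leptokurtic"
elif excess < 0:
    kind = "platykurtic"
else:
    kind = "mesokurtic"
print(f"kurtosis = {kurt:.3f}, excess = {excess:.3f} ({kind})")
```

In practice an excess very close to zero would be read as "almost normal"; how close counts as close is a convention the slides do not specify.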
