Descriptive Statistics: Numerical Measures
Measures of Location (central tendency)
Measures of Variability
Measures of Location
Mean
Median
Mode
Percentiles
Quartiles
If the measures are computed for data from a sample,
they are called sample statistics.
If the measures are computed for data from a population,
they are called population parameters.
A sample statistic is referred to as the point estimator
of the corresponding population parameter.
Mean
Perhaps the most important measure of location is
the mean.
The mean provides a measure of central location.
The mean of a data set is the average of all the data
values.
The sample mean $\bar{x}$ is the point estimator of the
population mean $\mu$.
Applicable for interval and ratio data
Affected by each value in the data set, including extreme
values
Sample Mean $\bar{x}$
$\bar{x} = \frac{\sum x_i}{n}$
where $\sum x_i$ is the sum of the values of the n observations
and n is the number of observations in the sample.
Population Mean $\mu$
$\mu = \frac{\sum x_i}{N}$
where $\sum x_i$ is the sum of the values of the N observations
and N is the number of observations in the population.
Sample Mean
Example: Apartment Rents
Seventy efficiency apartments were randomly
sampled in a small college town. The monthly rent
prices for these apartments are listed below.
445 615 430 590 435 600 460 600 440 615
440 440 440 525 425 445 575 445 450 450
465 450 525 450 450 460 435 460 465 480
450 470 490 472 475 475 500 480 570 465
600 485 580 470 490 500 549 500 500 480
570 515 450 445 525 535 475 550 480 510
510 575 490 435 600 435 445 435 430 440
Sample Mean
Example: Apartment Rents
$\bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80$
445 615 430 590 435 600 460 600 440 615
440 440 440 525 425 445 575 445 450 450
465 450 525 450 450 460 435 460 465 480
450 470 490 472 475 475 500 480 570 465
600 485 580 470 490 500 549 500 500 480
570 515 450 445 525 535 475 550 480 510
510 575 490 435 600 435 445 435 430 440
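The sample mean computation can be sketched in a few lines of Python. This is a minimal illustration using the 70 rents as listed; the variable names are assumptions of this sketch, not part of the slides.

```python
# The 70 monthly rents, in the order listed on the slide.
rents = [
    445, 615, 430, 590, 435, 600, 460, 600, 440, 615,
    440, 440, 440, 525, 425, 445, 575, 445, 450, 450,
    465, 450, 525, 450, 450, 460, 435, 460, 465, 480,
    450, 470, 490, 472, 475, 475, 500, 480, 570, 465,
    600, 485, 580, 470, 490, 500, 549, 500, 500, 480,
    570, 515, 450, 445, 525, 535, 475, 550, 480, 510,
    510, 575, 490, 435, 600, 435, 445, 435, 430, 440,
]

n = len(rents)           # number of observations in the sample
total = sum(rents)       # sum of the observed values
sample_mean = total / n  # x-bar = (sum of x_i) / n

print(n, total, round(sample_mean, 2))  # 70 34356 490.8
```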
Median
The median of a data set is the value in the middle
when the data items are arranged in ascending order.
Whenever a data set has extreme values, the median
is the preferred measure of central location.
The median is the measure of location most often
reported for annual income and property value data.
A few extremely large incomes or property values
can inflate the mean.
Applicable for ordinal, interval, and ratio data
Unaffected by extremely large and extremely small values
Median
For an odd number of observations:
26 18 27 12 14 27 19 (7 observations)
12 14 18 19 26 27 27 (in ascending order)
The median is the middle value.
Median = 19
Median
For an even number of observations:
26 18 27 12 14 27 30 19 (8 observations)
12 14 18 19 26 27 27 30 (in ascending order)
The median is the average of the middle two values.
Median = (19 + 26)/2 = 22.5
Median
Example: Apartment Rents
Averaging the 35th and 36th data values:
Median = (475 + 475)/2 = 475
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Note: Data is in ascending order.
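The odd and even rules for the median can be sketched as one small function. The function name `median` and its layout are choices made for this illustration; Python's standard `statistics.median` follows the same rule.

```python
def median(values):
    """Middle value if n is odd, average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

print(median([26, 18, 27, 12, 14, 27, 19]))      # 19
print(median([26, 18, 27, 12, 14, 27, 30, 19]))  # 22.5
print(median(rents))                             # 475.0
```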
Mode
The mode of a data set is the value that occurs with
greatest frequency.
The greatest frequency can occur at two or more
different values.
If the data have exactly two modes, the data are
bimodal.
If the data have more than two modes, the data are
multimodal.
Applicable to all levels of data measurement (nominal,
ordinal, interval, and ratio)
Mode
Example: Apartment Rents
450 occurred most frequently (7 times)
Mode = 450
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Note: Data is in ascending order.
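Finding the mode amounts to counting how often each value occurs. The sketch below uses `collections.Counter`; the helper name `modes` is an assumption of this example. It returns every value tied for the greatest frequency, so bimodal and multimodal data are handled as well.

```python
from collections import Counter

def modes(values):
    """Return (list of most frequent values, their frequency)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top), top

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

print(modes(rents))         # ([450], 7)
print(modes([1, 1, 2, 2]))  # ([1, 2], 2): a bimodal data set
```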
Percentiles
A percentile provides information about how the data
are spread over the interval from the smallest value to
the largest value.
Admission test scores for colleges and universities
are frequently reported in terms of percentiles.
The pth percentile of a data set is a value such that at
least p percent of the items take on this value or less and
at least (100 - p) percent of the items take on this value
or more.
Not applicable for nominal data
Example: the 90th percentile is a value such that at least
90% of the data lie at or below it and at least 10% of the
data lie at or above it
Percentiles
Arrange the data in ascending order.
Compute index i, the position of the pth percentile.
i = (p/100)n
If i is not an integer, round up. The pth percentile
is the value in the ith position.
If i is an integer, the pth percentile is the average
of the values in positions i and i + 1.
80th Percentile
Example: Apartment Rents
i = (p/100)n = (80/100)70 = 56
Averaging the 56th and 57th data values:
80th Percentile = (535 + 549)/2 = 542
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Note: Data is in ascending order.
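The index rule for percentiles can be sketched as follows. This is only an illustration of the procedure on these slides (statistical packages often use other interpolation rules and may give slightly different answers). The function name `percentile` and the restriction to whole-number p between 1 and 99 are assumptions of this sketch.

```python
def percentile(values, p):
    """pth percentile using the index rule i = (p/100) n; p is an integer, 0 < p < 100."""
    if not 0 < p < 100:
        raise ValueError("p must be between 1 and 99")
    ordered = sorted(values)
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)  # i = whole + remainder/100
    if remainder != 0:
        # i is not an integer: round up and take the value in that position
        # (position whole + 1, which is list index `whole`).
        return ordered[whole]
    # i is an integer: average the values in positions i and i + 1.
    return (ordered[whole - 1] + ordered[whole]) / 2

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

print(percentile(rents, 80))  # 542.0 (i = 56, average of 56th and 57th values)
print(percentile(rents, 75))  # 525   (i = 52.5, rounded up to the 53rd value)
```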
80th Percentile
Example: Apartment Rents
“At least 80% of the items take on a value of 542 or less.”
56/70 = .8 or 80%
“At least 20% of the items take on a value of 542 or more.”
14/70 = .2 or 20%
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Quartiles
Quartiles are specific percentiles.
First Quartile = 25th Percentile
Second Quartile = 50th Percentile = Median
Third Quartile = 75th Percentile
Third Quartile
Example: Apartment Rents
Third quartile = 75th percentile
i = (p/100)n = (75/100)70 = 52.5, which rounds up to 53
Third quartile = 525
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Note: Data is in ascending order.
Measures of Variability
It is often desirable to consider measures of variability
(dispersion), as well as measures of location.
For example, in choosing supplier A or supplier B we
might consider not only the average delivery time for
each, but also the variability in delivery time for each.
Variability
[Figure: two cash-flow diagrams, each marked with its mean: one with
no variability (same amounts) and one with variability (different amounts)]
Measures of Variability
Range
Interquartile Range
Variance
Standard Deviation
Coefficient of Variation
Range
The range of a data set is the difference between the
largest and smallest data values.
It is the simplest measure of variability.
It is very sensitive to the smallest and largest data
values.
Range
Example: Apartment Rents
Range = largest value - smallest value
Range = 615 - 425 = 190
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Note: Data is in ascending order.
Interquartile Range
The interquartile range of a data set is the difference
between the third quartile and the first quartile.
It is the range for the middle 50% of the data.
It overcomes the sensitivity to extreme data values.
Interquartile Range
Example: Apartment Rents
3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Note: Data is in ascending order.
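The range and interquartile range for the rents can be checked with a short sketch. The `percentile` helper below restates the index rule from the Percentiles slides; its name and its restriction to whole-number p are assumptions of this example.

```python
def percentile(values, p):
    """pth percentile by the index rule i = (p/100) n; p is an integer, 0 < p < 100."""
    ordered = sorted(values)
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder != 0:
        return ordered[whole]                         # round i up
    return (ordered[whole - 1] + ordered[whole]) / 2  # average positions i, i + 1

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

value_range = max(rents) - min(rents)  # largest value - smallest value
q1 = percentile(rents, 25)             # first quartile
q3 = percentile(rents, 75)             # third quartile
iqr = q3 - q1                          # range of the middle 50% of the data

print(value_range, q1, q3, iqr)  # 190 445 525 80
```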
Deviation from the Mean
Data set: 5, 9, 16, 17, 18
Mean: $\mu$ = 13
Deviations $(x - \mu)$ from the mean: -8, -4, +3, +4, +5
[Figure: number line from 0 to 20 showing each data value's
deviation from the mean]
Mean Absolute Deviation
Average of the absolute deviations from the
mean
$M.A.D. = \frac{\sum |X - \mu|}{N} = \frac{24}{5} = 4.8$

  X     X - μ    |X - μ|
  5      -8        +8
  9      -4        +4
 16      +3        +3
 17      +4        +4
 18      +5        +5
Total     0        24
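The deviation table and the mean absolute deviation can be reproduced with a brief sketch (variable names are illustrative). It also shows why absolute values are needed: the raw deviations always sum to zero.

```python
data = [5, 9, 16, 17, 18]

mean = sum(data) / len(data)                 # 13.0
deviations = [x - mean for x in data]        # -8, -4, +3, +4, +5
abs_deviations = [abs(d) for d in deviations]
mad = sum(abs_deviations) / len(data)        # 24 / 5

print(sum(deviations))  # 0.0
print(mad)              # 4.8
```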
Variance
The variance is a measure of variability that utilizes
all the data.
It is based on the difference between the value of
each observation ($x_i$) and the mean ($\bar{x}$ for a sample,
$\mu$ for a population).
The variance is useful in comparing the variability
of two or more variables.
Variance
The variance is the average of the squared
differences between each data value and the mean.
The variance is computed as follows:
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ for a sample
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$ for a population
Standard Deviation
The standard deviation of a data set is the positive
square root of the variance.
It is measured in the same units as the data, making
it more easily interpreted than the variance.
Standard Deviation
The standard deviation is computed as follows:
$s = \sqrt{s^2}$ for a sample
$\sigma = \sqrt{\sigma^2}$ for a population
Coefficient of Variation
The coefficient of variation indicates how large the
standard deviation is in relation to the mean.
The coefficient of variation is computed as follows:
$\left(\frac{s}{\bar{x}} \times 100\right)\%$ for a sample
$\left(\frac{\sigma}{\mu} \times 100\right)\%$ for a population
Sample Variance, Standard Deviation,
And Coefficient of Variation
Example: Apartment Rents
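• Variance
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16$
• Standard Deviation
$s = \sqrt{s^2} = \sqrt{2996.16} = 54.74$
• Coefficient of Variation
$\left(\frac{s}{\bar{x}} \times 100\right)\% = \left(\frac{54.74}{490.80} \times 100\right)\% = 11.15\%$
The standard deviation is about 11% of the mean.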
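The three sample measures for the rents can be recomputed directly from their definitions. The sketch below is a minimal illustration with assumed variable names; note the n - 1 divisor for the sample variance.

```python
import math

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(rents)
mean = sum(rents) / n
sum_sq_dev = sum((x - mean) ** 2 for x in rents)  # sum of squared deviations

sample_variance = sum_sq_dev / (n - 1)            # s^2
sample_sd = math.sqrt(sample_variance)            # s
coeff_variation = sample_sd / mean * 100          # CV, in percent

print(round(sample_variance, 2))  # 2996.16
print(round(sample_sd, 2))        # 54.74
print(round(coeff_variation, 2))  # 11.15
```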
Uses of Standard Deviation
Indicator of financial risk
Quality Control
construction of quality control charts
process capability studies
Comparing populations
household incomes in two cities
employee absenteeism at two plants
Standard Deviation as an
Indicator of Financial Risk
Financial     Annualized Rate of Return
Security      Mean     Standard Deviation
A             15%      3%
B             15%      7%
Coefficient of Variation
$\mu_1 = 29$, $\sigma_1 = 4.6$
$CV_1 = \frac{\sigma_1}{\mu_1}(100) = \frac{4.6}{29}(100) = 15.86$
$\mu_2 = 84$, $\sigma_2 = 10$
$CV_2 = \frac{\sigma_2}{\mu_2}(100) = \frac{10}{84}(100) = 11.90$
Measures of Central Tendency
and Variability: Grouped Data
Measures of Central Tendency
Mean
Median
Mode
Measures of Variability
Variance
Standard Deviation
Mean of Grouped Data
Weighted average of class midpoints
Class frequencies are the weights
$\mu = \frac{\sum fM}{\sum f} = \frac{\sum fM}{N} = \frac{f_1 M_1 + f_2 M_2 + f_3 M_3 + \cdots + f_i M_i}{f_1 + f_2 + f_3 + \cdots + f_i}$
Weighted Mean
When the mean is computed by giving each data
value a weight that reflects its importance, it is
referred to as a weighted mean.
In the computation of a grade point average (GPA),
the weights are the number of credit hours earned for
each grade.
When data values vary in importance, the analyst
must choose the weight that best reflects the
importance of each value.
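A weighted mean such as a GPA can be sketched as below. The grades and credit hours are hypothetical values invented for this illustration, and the function name `weighted_mean` is likewise an assumption.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical transcript: grade points earned and credit hours per course.
grade_points = [4.0, 3.0, 2.0, 3.0]
credit_hours = [3, 4, 3, 2]

gpa = weighted_mean(grade_points, credit_hours)
print(gpa)  # 3.0, i.e. (12 + 12 + 6 + 6) / 12
```

With equal weights the weighted mean reduces to the ordinary mean.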
Grouped Data
The weighted mean computation can be used to
obtain approximations of the mean, variance, and
standard deviation for the grouped data.
To compute the weighted mean, we treat the
midpoint of each class as though it were the mean
of all items in the class.
We compute a weighted mean of the class midpoints
using the class frequencies as weights.
Similarly, in computing the variance and standard
deviation, the class frequencies are used as weights.
Calculation of Grouped Mean
Class Interval   Frequency (f)   Class Midpoint (M)     fM
20-under 30            6                25              150
30-under 40           18                35              630
40-under 50           11                45              495
50-under 60           11                55              605
60-under 70            3                65              195
70-under 80            1                75               75
Total                 50                               2150
$\mu = \frac{\sum fM}{\sum f} = \frac{2150}{50} = 43.0$
Sample Mean for Grouped Data
Another Example: Apartment Rents
The previously presented sample of apartment rents
is shown here as grouped data in the form of a
frequency distribution.
Find the sample mean and sample variance for the
grouped apartment rents data.
Rent ($)    Frequency
420-439         8
440-459        17
460-479        12
480-499         8
500-519         7
520-539         4
540-559         2
560-579         4
580-599         2
600-619         6
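One way to work this exercise is sketched below, treating each class midpoint as the value of every item in the class and using the class frequencies as weights. The midpoint is taken as the average of the stated class limits (for example 429.5 for 420-439), which is an assumption of this sketch; the variable names are illustrative.

```python
import math

# (lower limit, upper limit, frequency) for each rent class.
classes = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8),
    (500, 519, 7), (520, 539, 4), (540, 559, 2), (560, 579, 4),
    (580, 599, 2), (600, 619, 6),
]

n = sum(f for _, _, f in classes)                         # 70 observations
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]      # class midpoints M
freqs = [f for _, _, f in classes]

grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
grouped_variance = sum(
    f * (m - grouped_mean) ** 2 for f, m in zip(freqs, midpoints)
) / (n - 1)                                               # sample divisor n - 1
grouped_sd = math.sqrt(grouped_variance)

print(round(grouped_mean, 2))      # 493.21
print(round(grouped_variance, 2))  # 3017.89
print(round(grouped_sd, 2))        # 54.94
```

The grouped results are approximations: they are close to, but not equal to, the values 490.80 and 2,996.16 computed from the individual rents.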
Median of Grouped Data
$\text{Median} = L + \frac{\frac{N}{2} - cf_p}{f_{med}} (W)$
Where:
L = the lower limit of the median class
cfp = cumulative frequency of class preceding the median class
fmed = frequency of the median class
W = width of the median class
N = total of frequencies
Median of Grouped Data: Example
Class Interval   Frequency   Cumulative Frequency
20-under 30          6               6
30-under 40         18              24
40-under 50         11              35
50-under 60         11              46
60-under 70          3              49
70-under 80          1              50
N = 50
$M_d = L + \frac{\frac{N}{2} - cf_p}{f_{med}} (W) = 40 + \frac{\frac{50}{2} - 24}{11} (10) = 40.909$
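The grouped-median formula can be sketched as a function that scans the classes until the cumulative frequency reaches N/2. The function name and the (lower, upper, frequency) representation of each class are assumptions made for this illustration.

```python
def grouped_median(classes):
    """classes: list of (lower limit, upper limit, frequency) in ascending order."""
    n = sum(f for _, _, f in classes)
    cumulative = 0  # cf_p: cumulative frequency of the classes already passed
    for lower, upper, f in classes:
        if f > 0 and cumulative + f >= n / 2:
            # This is the median class: L + ((N/2 - cf_p) / f_med) * W
            return lower + (n / 2 - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("no observations")

classes = [
    (20, 30, 6), (30, 40, 18), (40, 50, 11),
    (50, 60, 11), (60, 70, 3), (70, 80, 1),
]

print(round(grouped_median(classes), 3))  # 40.909
```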
Mode of Grouped Data
$\text{Mode} = l + \frac{f_1 - f_0}{2 f_1 - f_0 - f_2} (w)$
• l = lower limit of the modal class
• w = size of the class interval
• f1 = frequency of the modal class
• f0 = frequency of the class preceding
the modal class
• f2 = frequency of the class succeeding
the modal class
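As an illustration, the grouped-mode formula can be applied to the frequency table used for the grouped mean and median examples; the slides do not work this case, so the numbers below are this sketch's own. The function name is assumed, a missing neighbouring class is treated as having frequency 0, and ties are resolved by taking the first class with the greatest frequency.

```python
def grouped_mode(classes):
    """classes: list of (lower limit, upper limit, frequency) in ascending order."""
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))                     # first modal class
    lower, upper, f1 = classes[k]
    f0 = freqs[k - 1] if k > 0 else 0               # class preceding the modal class
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0  # class succeeding the modal class
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)

classes = [
    (20, 30, 6), (30, 40, 18), (40, 50, 11),
    (50, 60, 11), (60, 70, 3), (70, 80, 1),
]

# Modal class is 30-under 40: 30 + (18 - 6) / (36 - 6 - 11) * 10
print(round(grouped_mode(classes), 2))  # 36.32
```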
Variance and Standard Deviation
of Grouped Data
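Population
$\sigma^2 = \frac{\sum f (M - \mu)^2}{N}$
$\sigma = \sqrt{\sigma^2}$
Sample
$S^2 = \frac{\sum f (M - \bar{X})^2}{n - 1}$
$S = \sqrt{S^2}$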
Population Variance and Standard
Deviation of Grouped Data
Class Interval    f     M     fM    M - μ   (M - μ)^2   f(M - μ)^2
20-under 30       6    25    150     -18       324         1944
30-under 40      18    35    630      -8        64         1152
40-under 50      11    45    495       2         4           44
50-under 60      11    55    605      12       144         1584
60-under 70       3    65    195      22       484         1452
70-under 80       1    75     75      32      1024         1024
Total            50         2150                           7200
$\sigma^2 = \frac{\sum f (M - \mu)^2}{N} = \frac{7200}{50} = 144$
$\sigma = \sqrt{\sigma^2} = \sqrt{144} = 12$
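The grouped population calculation can be retraced with a short sketch, using the frequencies and midpoints from the table. Variable names are illustrative only.

```python
import math

freqs = [6, 18, 11, 11, 3, 1]          # class frequencies f
midpoints = [25, 35, 45, 55, 65, 75]   # class midpoints M

N = sum(freqs)                                                # 50
mu = sum(f * m for f, m in zip(freqs, midpoints)) / N         # 2150 / 50
weighted_sq_dev = sum(f * (m - mu) ** 2 for f, m in zip(freqs, midpoints))

variance = weighted_sq_dev / N   # population divisor N
std_dev = math.sqrt(variance)

print(mu, weighted_sq_dev, variance, std_dev)  # 43.0 7200.0 144.0 12.0
```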
Measures of Shape
Skewness
Absence of symmetry
Extreme values in one side of a distribution
Kurtosis
Peakedness of a distribution
Leptokurtic: high and thin
Mesokurtic: normal shape
Platykurtic: flat and spread out
Symmetrical and Skewness
[Figure: three example distributions: symmetrical; right or
positively skewed; left or negatively skewed]
Relationship of Mean, Median and Mode
Coefficient of Skewness
Summary measure for skewness
$S_k = \frac{3(\mu - M_d)}{\sigma}$
If Sk < 0, the distribution is negatively skewed
(skewed to the left).
If Sk = 0, the distribution is symmetric (not
skewed).
If Sk > 0, the distribution is positively skewed
(skewed to the right).
Distribution Shape: Skewness
Another formula for the skewness of sample data is
$\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum \left(\frac{x_i - \bar{x}}{s}\right)^3$
Skewness can be easily computed using statistical
software.
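For readers without a statistics package at hand, the sample skewness formula can be sketched directly. The function name is an assumption of this example; s is the sample standard deviation (divisor n - 1), and at least three observations with some variation are required.

```python
import math

def sample_skewness(values):
    """n / ((n-1)(n-2)) * sum(((x_i - x_bar) / s) ** 3), with s the sample std. dev."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

print(sample_skewness([1, 2, 3, 4, 5]))       # 0.0 for symmetric data
print(sample_skewness([1, 2, 3, 4, 20]) > 0)  # True: a long right tail
```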
Distribution Shape: Skewness
Highly Skewed Right
• Skewness is positive (often above 1.0).
• Mean will usually be more than the median.
[Figure: relative frequency histogram of a highly right-skewed
distribution; Skewness = 1.25]
Distribution Shape: Skewness
Example: Apartment Rents
Seventy efficiency apartments were randomly
sampled in a college town. The monthly rent prices
for the apartments are listed below in ascending order.
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Distribution Shape: Skewness
Example: Apartment Rents
[Figure: relative frequency histogram of the apartment rents;
Skewness = .92]
Coefficient of Skewness
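Distribution 1: $\mu_1 = 23$, $M_{d1} = 26$, $\sigma_1 = 12.3$
$S_1 = \frac{3(\mu_1 - M_{d1})}{\sigma_1} = \frac{3(23 - 26)}{12.3} = -0.73$
Distribution 2: $\mu_2 = 26$, $M_{d2} = 26$, $\sigma_2 = 12.3$
$S_2 = \frac{3(\mu_2 - M_{d2})}{\sigma_2} = \frac{3(26 - 26)}{12.3} = 0$
Distribution 3: $\mu_3 = 29$, $M_{d3} = 26$, $\sigma_3 = 12.3$
$S_3 = \frac{3(\mu_3 - M_{d3})}{\sigma_3} = \frac{3(29 - 26)}{12.3} = +0.73$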
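The three coefficient-of-skewness cases can be checked with a short sketch; the function name is an assumption of this example.

```python
def coefficient_of_skewness(mean, median, std_dev):
    """S_k = 3 (mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev

for mean in (23, 26, 29):
    sk = coefficient_of_skewness(mean, 26, 12.3)
    print(mean, round(sk, 2))
# 23 -0.73  (negatively skewed)
# 26 0.0    (symmetric)
# 29 0.73   (positively skewed)
```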
Types of Kurtosis
Platykurtic Distribution
Leptokurtic Distribution
Mesokurtic Distribution
Measures of Relative Location and
Detecting Outliers
z-Scores
Empirical Rule
Detecting Outliers
z-Scores
The z-score is often called the standardized value.
It denotes the number of standard deviations a data
value $x_i$ is from the mean.
$z_i = \frac{x_i - \bar{x}}{s}$
z-Scores
An observation’s z-score is a measure of the relative
location of the observation in a data set.
A data value less than the sample mean will have a
z-score less than zero.
A data value greater than the sample mean will have
a z-score greater than zero.
A data value equal to the sample mean will have a
z-score of zero.
z-Scores
Example: Apartment Rents
• z-Score of Smallest Value (425)
$z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20$
Standardized Values for Apartment Rents
-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
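The table of standardized values can be regenerated with a few lines. This is a sketch with an assumed helper name `z_scores`; it uses the sample mean and the sample standard deviation (divisor n - 1).

```python
import math

def z_scores(values):
    """Standardize each value: z_i = (x_i - x_bar) / s."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return [(x - mean) / s for x in values]

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

z = z_scores(rents)
print(round(z[0], 2), round(z[-1], 2))  # -1.2 2.27
```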
Empirical Rule
When the data are believed to approximate a
bell-shaped distribution …
The empirical rule can be used to determine the
percentage of data values that must be within a
specified number of standard deviations of the mean.
The empirical rule is based on the normal distribution.
Empirical Rule
For data having a bell-shaped
distribution:
68.26% of the values of a normal random variable
are within +/- 1 standard deviation of its mean.
95.44% of the values of a normal random variable
are within +/- 2 standard deviations of its mean.
99.72% of the values of a normal random variable
are within +/- 3 standard deviations of its mean.
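The percentages in the empirical rule come from the normal distribution and can be recomputed with the error function: the share of a normal variable within k standard deviations of its mean is erf(k / sqrt(2)). The sketch below uses `math.erf` from the Python standard library; the function name `share_within` is an assumption. The computed values differ from the slide's figures in the last digit, presumably because the slide's figures were rounded from a printed normal table.

```python
import math

def share_within(k):
    """Proportion of a normal random variable within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, f"{share_within(k):.4%}")
# 1 68.2689%
# 2 95.4500%
# 3 99.7300%
```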
Empirical Rule
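[Figure: bell curve over x centered at $\mu$, with 68.26% of values
between $\mu - 1\sigma$ and $\mu + 1\sigma$, 95.44% between $\mu - 2\sigma$ and
$\mu + 2\sigma$, and 99.72% between $\mu - 3\sigma$ and $\mu + 3\sigma$]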
Detecting Outliers
Example: Apartment Rents
• The most extreme z-scores are -1.20 and 2.27
• Using |z| > 3 as the criterion for an outlier, there
are no outliers in this data set.
Standardized Values for Apartment Rents
-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
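The outlier screen can be sketched as a filter on the standardized values. The helper name `find_outliers` and the default threshold argument are assumptions of this example; the |z| > 3 criterion is the one stated above.

```python
import math

def find_outliers(values, threshold=3.0):
    """Return the values whose z-score exceeds the threshold in absolute value."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return [x for x in values if abs((x - mean) / s) > threshold]

rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

print(find_outliers(rents))  # []: no rent is more than 3 standard deviations from the mean
```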