Statistics for Finance and Business Analytics
FR1203 and BA1201
Week 2: Descriptive Statistics
Copyright © 2023 Pearson Education Ltd. Slide - 1
Chapter Topics
• Measures of central tendency, variation, and shape
– Mean, median, mode
– Quartiles, box-and-whisker plots
– Range, interquartile range, variance and standard deviation
– Symmetric and skewed distributions
• Population summary measures
– Mean, variance, and standard deviation
– Covariance and coefficient of correlation
Describing Data Numerically
Section 2.1 Measures of Central Tendency
Arithmetic Mean (1 of 2)
• The arithmetic mean (mean) is the most common
measure of central tendency
– For a population of N values: \( \mu = \frac{\sum_{i=1}^{N} x_i}{N} \)
– For a sample of size n: \( \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \)
Arithmetic Mean (2 of 2)
• The most common measure of central tendency
• Mean = sum of values divided by the number of values
• Affected by extreme values (outliers)
\[ \frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3 \qquad \frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4 \]
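A minimal Python sketch of the two calculations above; the helper name `mean` is our own and does not come from the slides. Replacing the 5 with a 10 adds 5 to the sum, so the mean rises by 5 / 5 = 1.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by the number of values."""
    return sum(values) / len(values)


print(mean([1, 2, 3, 4, 5]))   # 3.0
print(mean([1, 2, 3, 4, 10]))  # 4.0: the single large value pulls the mean up
```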
Median
• In an ordered list, the median is the “middle”
number (50% above, 50% below)
• Not affected by extreme values
Finding the Median
• The location of the median:
Median position = \( \frac{n+1}{2} \)th position in the ordered data
– If the number of values is odd, the median is the middle number
– If the number of values is even, the median is the average of the two middle numbers
• Note that \( \frac{n+1}{2} \) is not the value of the median, only the position of the median in the ranked data
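The position rule can be sketched in Python as follows (the function name `median` is our own). When (n + 1)/2 is a whole number it points at a single middle value; otherwise it falls halfway between the two middle values, which are averaged.

```python
def median(values):
    """Median located at the (n + 1)/2 th position of the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    position = (n + 1) / 2               # 1-based position, e.g. 3.0 or 2.5
    if position.is_integer():            # odd n: one middle value
        return ordered[int(position) - 1]
    lower = ordered[int(position) - 1]   # even n: average the two middle values
    upper = ordered[int(position)]
    return (lower + upper) / 2


print(median([1, 2, 3, 4, 10]))  # 3: the outlier 10 does not move the median
print(median([4, 1, 3, 2]))      # 2.5: average of the 2nd and 3rd ranked values
```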
Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical data
• There may be no mode
• There may be several modes
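A small Python sketch covering the three cases above (one mode, several modes, no mode) for numerical and categorical data. The helper `modes` is hypothetical. Treating "every value occurs exactly once" as "no mode" is our assumption; Python's `statistics.multimode` would instead return every value in that case.

```python
from collections import Counter


def modes(values):
    """All values that occur most often; [] when no value repeats (assumed 'no mode')."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:                 # assumption: every value unique -> no mode
        return []
    return [v for v, c in counts.items() if c == top]


print(modes([2_000_000, 500_000, 300_000, 100_000, 100_000]))  # [100000]
print(modes(["red", "blue", "red", "blue", "green"]))          # ['red', 'blue']
print(modes([1, 2, 3]))                                         # []
```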
Review Example
• Five houses on a hill by the beach
House Prices:
$2,000,000
$500,000
$300,000
$100,000
$100,000
Review Example: Summary Statistics
House prices: $2,000,000, $500,000, $300,000, $100,000, $100,000 (sum = $3,000,000)
• Mean: $3,000,000 / 5 = $600,000
• Median: middle value of the ranked data = $300,000
• Mode: most frequent value = $100,000
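The same three summary statistics can be checked with Python's standard-library `statistics` module; this is just a cross-check of the slide's arithmetic, not part of the original example.

```python
import statistics

prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]  # house prices from the slide

print(statistics.mean(prices))    # mean:   600,000
print(statistics.median(prices))  # median: 300,000
print(statistics.mode(prices))    # mode:   100,000
```

The single $2,000,000 house drags the mean to twice the median, which is why the median is often preferred for prices.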
Which Measure of Location Is the “Best”?
• Mean is generally used, unless extreme values
(outliers) exist …
• Then median is often used, since the median is
not sensitive to extreme values.
– Example: Median home prices may be reported for a
region – less sensitive to outliers
Shape of a Distribution
• Describes how data are distributed
• Measures of shape
– Symmetric or skewed
Percentiles and Quartiles
• Percentiles and Quartiles indicate the position of a value
relative to the entire set of data
• Generally used to describe large data sets
• Example: An IQ score at the 90th percentile means that 10% of the
population has a higher IQ score and 90% have a lower IQ score.
\[ P\text{th percentile} = \text{value located in the } \frac{P}{100}(n + 1)\text{th ordered position} \]
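A Python sketch of the position formula (the function name `percentile` is hypothetical). The slide gives only the position; how to handle a fractional position is our assumption here: we interpolate linearly between the neighbouring ranked values and clamp positions that fall outside 1..n. Other textbooks and software use different conventions, so results can differ slightly.

```python
def percentile(values, p):
    """Value at the (p/100)(n + 1)th ordered position (1-based)."""
    ordered = sorted(values)
    n = len(ordered)
    position = p * (n + 1) / 100          # position from the slide's formula
    position = min(max(position, 1), n)   # assumption: clamp positions outside 1..n
    k = int(position)                     # whole part of the position
    frac = position - k                   # fractional part
    if frac == 0:
        return ordered[k - 1]
    # assumption: interpolate linearly between the k-th and (k+1)-th values
    return ordered[k - 1] + frac * (ordered[k] - ordered[k - 1])


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print(percentile(data, 25))  # 12.5 (position 2.5)
print(percentile(data, 90))  # 22   (position 9)
```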
Quartiles (1 of 2)
• Quartiles split the ranked data into 4 segments with an
equal number of values per segment (note that the widths
of the segments may be different)
• The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger
• Q2 is the same as the median (50% are smaller, 50% are
larger)
• Only 25% of the observations are greater than the third
quartile
Quartile Formulas
Find a quartile by determining the value in the
appropriate position in the ranked data, where
First quartile position: Q1 = 0.25(n + 1)
Second quartile position: Q2 = 0.50(n + 1) (the median position)
Third quartile position: Q3 = 0.75(n + 1)
where n is the number of observed values
Quartiles (2 of 2)
• Example: Find the first quartile
Sample ranked data (n = 9): 11 12 13 16 16 17 18 21 22
Q1 is in the 0.25(9 + 1) = 2.5 position of the ranked data, so use the value halfway between the 2nd and 3rd values (12 and 13): Q1 = 12.5
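In Python 3.8+, the standard-library `statistics.quantiles` with `method="exclusive"` (its default) places the quartiles at positions 0.25(n + 1), 0.50(n + 1) and 0.75(n + 1) and interpolates linearly between ranked values. It therefore reproduces the example above. Treat this as a convenient cross-check; other software may use a different quartile convention.

```python
from statistics import quantiles

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # ranked sample from the slide, n = 9

# "exclusive" method: cut points at positions 0.25(n+1), 0.50(n+1), 0.75(n+1)
q1, q2, q3 = quantiles(data, n=4, method="exclusive")
print(q1, q2, q3)  # 12.5 16.0 19.5
```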
Five-Number Summary
The five-number summary refers to five descriptive
measures:
minimum
first quartile
median
third quartile
maximum
minimum Q1 median Q3 maximum
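A minimal sketch of the five-number summary, assuming the (n + 1)-position quartile rule from the quartile slides (implemented by `statistics.quantiles` with `method="exclusive"`). The helper name `five_number_summary` is our own.

```python
from statistics import quantiles


def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum) with quartiles at positions p(n + 1)."""
    q1, med, q3 = quantiles(values, n=4, method="exclusive")
    return min(values), q1, med, q3, max(values)


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print(five_number_summary(data))  # (11, 12.5, 16.0, 19.5, 22)
```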
Section 2.2 Measures of Variability
• Measures of variation give information on the spread or variability of the data values.
Range
• Simplest measure of variation
• Difference between the largest and the smallest
observations:
\[ \text{Range} = X_{\text{largest}} - X_{\text{smallest}} \]
Disadvantages of the Range
• Ignores the way in which data are distributed
• Sensitive to outliers
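Both disadvantages can be illustrated with a short Python sketch on small made-up data sets. The helper `data_range` and the numbers are illustrative, not from the slides.

```python
def data_range(values):
    """Range = largest observation - smallest observation."""
    return max(values) - min(values)


print(data_range([1, 2, 3, 4, 5]))   # 4
print(data_range([1, 1, 1, 1, 5]))   # 4: same range, very different distribution
print(data_range([1, 2, 3, 4, 10]))  # 9: one outlier more than doubles the range
```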
Interquartile Range (1 of 2)
• Can eliminate some outlier problems by using the
interquartile range
• Eliminate high- and low-valued observations and calculate the range of the middle 50% of the data
• Interquartile range = 3rd quartile − 1st quartile
IQR = Q3 – Q1
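A Python sketch comparing the range and the IQR on the quartile example data, assuming the (n + 1)-position quartile rule (`statistics.quantiles`, `method="exclusive"`). The `with_outlier` data set is hypothetical: its largest value is replaced by 220. The range jumps, but the IQR does not change, because the outlier lies outside the middle 50%.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range Q3 - Q1, quartiles at positions p(n + 1)."""
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    return q3 - q1


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
with_outlier = [11, 12, 13, 16, 16, 17, 18, 21, 220]  # hypothetical outlier

print(max(data) - min(data), iqr(data))                          # 11 7.0
print(max(with_outlier) - min(with_outlier), iqr(with_outlier))  # 209 7.0
```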
Interquartile Range (2 of 2)
• The interquartile range (IQR) measures the
spread in the middle 50% of the data
• Defined as the difference between the observation
at the third quartile and the observation at the first
quartile
IQR = Q3 − Q1
Box-and-Whisker Plot (1 of 2)
• A box-and-whisker plot is a graph that describes the shape
of a distribution
• Created from the five-number summary: the minimum value, Q1, the median, Q3, and the maximum
• The inner box shows the range from Q1 to Q3, with a line drawn at the median
• Two “whiskers” extend from the box. One whisker is
the line from Q1 to the minimum, the other is the line from
Q3 to the maximum value
Box-and-Whisker Plot (2 of 2)
The plot can be oriented horizontally or vertically
Population Variance
• Average of squared deviations of values from the
mean
– Population variance:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
Where \( \mu \) = population mean
N = population size
\( x_i \) = ith value of the variable x
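A direct Python translation of the population variance formula (the helper name is our own). The result is cross-checked against the standard library's `statistics.pvariance`, which also divides by N. The data set is illustrative.

```python
import statistics


def population_variance(values):
    """sigma^2: average squared deviation from the population mean (divide by N)."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


population = [1, 2, 3, 4, 5]
print(population_variance(population))   # 2.0  (deviations 4+1+0+1+4 = 10, 10 / 5)
print(statistics.pvariance(population))  # standard-library cross-check, same value
```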
Sample Variance
• Average (approximately) of squared deviations of
values from the mean
– Sample variance:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
Where \( \bar{x} \) = arithmetic mean
n = sample size
\( x_i \) = ith value of the variable x
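The only change from the population formula is the divisor n − 1. A Python sketch (helper name our own), cross-checked against `statistics.variance`, which computes the sample variance:

```python
import statistics


def sample_variance(values):
    """s^2: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


sample = [1, 2, 3, 4, 5]
print(sample_variance(sample))     # 2.5  (10 / 4, versus 10 / 5 for a population)
print(statistics.variance(sample))  # standard-library cross-check, same value
```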
Population Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Has the same units as the original data
– Population standard deviation:
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \]
Sample Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Has the same units as the original data
– Sample standard deviation:
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]
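A Python sketch of both standard deviations. The helper `std_dev` and its `sample` flag are our own naming. It takes the square root of the matching variance, so the result is in the same units as the data.

```python
import math


def std_dev(values, sample=True):
    """Sample s (divide by n - 1) or population sigma (divide by N), then square root."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    return math.sqrt(ss / (n - 1 if sample else n))


data = [1, 2, 3, 4, 5]
print(round(std_dev(data), 4))                # 1.5811 (sqrt(10 / 4))
print(round(std_dev(data, sample=False), 4))  # 1.4142 (sqrt(10 / 5))
```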
Calculation Example: Sample Standard Deviation
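Sample data (\( x_i \)), with n = 8 and \( \bar{x} = 16 \):
\[ s = \sqrt{\frac{(10 - \bar{x})^2 + (12 - \bar{x})^2 + (14 - \bar{x})^2 + \cdots + (24 - \bar{x})^2}{n - 1}} = \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}} \]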
Measuring Variation
Comparing Standard Deviations
Mean = 15.5 for each data set
Advantages of Variance and Standard
Deviation
• Each value in the data set is used in the
calculation
• Values far from the mean are given extra weight
(because deviations from the mean are squared)
z-Score (1 of 3)
A z-score shows the position of a value relative
to the mean of the distribution.
• It indicates the number of standard deviations a value is from the mean.
– A z-score greater than zero indicates that the value is greater than the mean.
– A z-score less than zero indicates that the value is less than the mean.
– A z-score of zero indicates that the value is equal to the mean.
z-Score (2 of 3)
• If the data set is the entire population of data and the population mean, \( \mu \), and the population standard deviation, \( \sigma \), are known, then for each value, \( x_i \), the z-score associated with \( x_i \) is
\[ z = \frac{x_i - \mu}{\sigma} \]
z-Score (3 of 3)
• If intelligence is measured for a population using
an IQ score, where the mean IQ score is 100 and
the standard deviation is 15, what is the z-score
for an IQ of 121?
\[ z = \frac{x_i - \mu}{\sigma} = \frac{121 - 100}{15} = 1.4 \]
A score of 121 is 1.4 standard deviations above the mean.
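The z-score formula is a one-line function in Python; `z_score` is our own name. The second and third calls use illustrative IQ values to show the sign convention from the earlier slide.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean: (x - mu) / sigma."""
    return (x - mu) / sigma


print(z_score(121, 100, 15))  # 1.4  (above the mean)
print(z_score(85, 100, 15))   # -1.0 (below the mean)
print(z_score(100, 100, 15))  # 0.0  (equal to the mean)
```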
Section 2.4 Measures of Relationships Between Variables
Two measures of the relationship between variables are
• Covariance
– a measure of the direction of a linear relationship
between two variables
• Correlation Coefficient
– a measure of both the direction and the strength of a
linear relationship between two variables
Covariance
• The covariance measures the strength of the linear relationship
between two variables
• The population covariance:
\[ \operatorname{Cov}(x, y) = \sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N} \]
• The sample covariance:
\[ \operatorname{Cov}(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \]
– Only concerned with the strength of the relationship
– No causal effect is implied
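A Python sketch of both covariance formulas. The `covariance` helper, its `sample` flag and the data are all illustrative. Pairs that rise together give a positive covariance; reversing y gives a negative one.

```python
def covariance(x, y, sample=True):
    """Sum of cross-products of deviations, divided by n - 1 (sample) or N (population)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / (n - 1 if sample else n)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]                      # illustrative data
print(covariance(x, y))                  # 1.5  (cross-products sum to 6; 6 / 4)
print(covariance(x, y, sample=False))    # 1.2  (6 / 5)
print(covariance(x, [5, 4, 3, 2, 1]))    # -2.5 (opposite directions)
```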
Interpreting Covariance
• Covariance between two variables:
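Cov(x, y) > 0 → x and y tend to move in the same direction
Cov(x, y) < 0 → x and y tend to move in opposite directions
Cov(x, y) = 0 → x and y have no linear relationship (independent variables have zero covariance, but zero covariance does not by itself imply independence)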
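A zero covariance rules out only a *linear* relationship. In this illustrative Python sketch, y is completely determined by x (y = x²), yet the sample covariance is exactly zero. This happens because the positive and negative cross-products cancel on x values that are symmetric about 0. The `covariance` helper is our own.

```python
def covariance(x, y, sample=True):
    """Sum of cross-products of deviations, divided by n - 1 (sample) or N (population)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / (n - 1 if sample else n)


x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]   # [4, 1, 0, 1, 4]: a perfect, but non-linear, relationship
print(covariance(x, y))   # 0.0
```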
Coefficient of Correlation
• Measures the relative strength of the linear relationship
between two variables
• Population correlation coefficient:
\[ \rho = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y} \]
• Sample correlation coefficient:
\[ r = \frac{\operatorname{Cov}(x, y)}{s_x s_y} \]
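A Python sketch of the sample correlation coefficient; `correlation` and the data are illustrative. Rescaling y (for example, recording it in different units) leaves r unchanged, and a perfectly decreasing straight line gives r = −1.

```python
import math


def correlation(x, y):
    """Sample correlation r = s_xy / (s_x * s_y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(correlation(x, y), 4))                      # 0.7746
print(round(correlation(x, [1000 * v for v in y]), 4))  # 0.7746: unit free
print(round(correlation(x, [5, 4, 3, 2, 1]), 4))        # -1.0
```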
Features of Correlation Coefficient, r
• Unit free
• Ranges between −1 and 1
• The closer to −1, the stronger the negative linear
relationship
• The closer to 1, the stronger the positive linear relationship
• The closer to 0, the weaker any linear relationship
Scatter Plots of Data with Various Correlation Coefficients
Chapter Summary
• Described measures of central tendency
– Mean, median, mode
• Illustrated the shape of the distribution
– Symmetric, skewed
• Described measures of variation
– Range, interquartile range, variance and standard deviation
• Calculated measures of relationships between variables
– Covariance and correlation coefficient