INTRODUCTION TO
STATISTICS AND
DESCRIPTIVE STATISTICS
LECTURE GOALS
After completing this lecture, you should be able to:
Know key definitions: Population vs. Sample
Identify the different types of statistical charts
Compute and interpret the mean, median, and mode for a
set of data
Compute the range, variance, and standard deviation and
know what these values mean
POPULATIONS and SAMPLES
A Population - the entire set of items or individuals of interest
e.g., all voters in an election, all students in a university, all
workers in a factory, etc.
Parameter - a summary measure computed to describe a
characteristic of the population, e.g., the average age of the population
A Sample - a subset of the population
e.g., 1000 voters selected at random for interview, 500 students
from a university, etc.
Statistic - a summary measure computed to describe a
characteristic of the sample
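The parameter/statistic distinction can be made concrete with a small simulation. The sketch below is illustrative only: the population of voter ages is made-up data generated with a fixed seed, and the names `population_ages` and `sample_ages` are hypothetical.

```python
import random
import statistics

# Hypothetical population: ages of 10,000 voters (made-up data, fixed seed).
rng = random.Random(42)
population_ages = [rng.randint(18, 90) for _ in range(10_000)]

# Parameter: a summary measure computed from the entire population.
population_mean_age = statistics.mean(population_ages)

# Sample: 1,000 voters selected at random, without replacement.
sample_ages = rng.sample(population_ages, 1_000)

# Statistic: the same summary measure, computed from the sample only.
sample_mean_age = statistics.mean(sample_ages)

print(f"parameter (population mean age): {population_mean_age:.2f}")
print(f"statistic (sample mean age):     {sample_mean_age:.2f}")
```

The two values will usually be close but not identical, which is why a statistic is treated as an estimate of the corresponding parameter.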
DESCRIBING DATA USING NUMERICAL
MEASURES
Describing Data Numerically
Central Location: Mean, Median, Mode
Variation / Dispersion: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
MEAN (ARITHMETIC AVERAGE)
The Mean is the arithmetic average of
data values
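Population mean (N = population size):
$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$
Sample mean (n = sample size):
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$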
MEAN (ARITHMETIC AVERAGE)
The most common measure of central tendency
Mean = sum of values divided by the number of
values
Affected by extreme values (outliers)
For example, two sets of data are given below:
2, 7, 9, 12, 15 : mean = (2+7+9+12+15)/5 = 9
2, 7, 9, 12, 15, 70 : mean = (2+7+9+12+15+70)/6
= 19.17
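The effect of the outlier on the mean can be checked with a few lines of Python. This is a minimal sketch; the function name `mean` and the variable names are chosen for illustration.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by the number of values."""
    return sum(values) / len(values)


without_outlier = [2, 7, 9, 12, 15]
with_outlier = [2, 7, 9, 12, 15, 70]

print(mean(without_outlier))          # 9.0
print(round(mean(with_outlier), 2))   # 19.17
```

Adding the single value 70 roughly doubles the mean, even though the other five values are unchanged.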
MEDIAN
In an ordered array, the median is the
“middle” number, i.e., the number that splits
the distribution in half
The median is not affected by extreme
values (outliers)
To find the median, sort the data values from low
to high
Find the value in the i = (n+1)/2 position
(middle position); if n is even, this position falls between
two values, so take their average
For example, a set of data is given below:
10 5 19 8 3
First, we rank the given data in increasing
order as follows:
3 5 8 10 19
There are five observations in the data set.
Consequently, n = 5 and
Position of the middle term = (5+1)/2 = 3
Median is 8
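The same procedure (sort, then take the value in the (n+1)/2 position) can be sketched in Python. Averaging the two middle values when n is even is the usual convention and is assumed here; the function name `median` is illustrative.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: position (n+1)/2 is a whole number (1-based), so subtract 1 for the index.
        return ordered[(n + 1) // 2 - 1]
    # Even n: position (n+1)/2 falls halfway between positions n/2 and n/2 + 1.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(median([10, 5, 19, 8, 3]))        # 8
print(median([2, 7, 9, 12, 15, 70]))    # 10.5
```

The second call uses the outlier data set from the mean example: its median is 10.5, far less affected by the value 70 than the mean of 19.17.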
MODE
The value that occurs most often
Not affected by extreme values
There may be no mode
There may be several modes
For example, two sets of data are given as
follows:
77 69 74 81 71 68 74 73
Mode = 74
495, 486, 503, 495, 470, 505, 470, 499
Modes = 470 and 495
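Counting occurrences makes the "no mode" and "several modes" cases explicit. The sketch below assumes a non-empty list and treats data in which every value occurs exactly once as having no mode; the function name `modes` is illustrative.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or an empty list if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([77, 69, 74, 81, 71, 68, 74, 73]))            # [74]
print(modes([495, 486, 503, 495, 470, 505, 470, 499]))    # [470, 495]
print(modes([1, 2, 3]))                                   # []
```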
SHAPE OF A DISTRIBUTION
Describes how data is distributed
Symmetric or skewed
Left-Skewed: Mean < Median (longer tail, with outliers, extends to the left)
Example: 1, 30, 34, 40, 46, 50
Symmetric: Mean = Median
Example: 1, 2, 4, 6, 7
Right-Skewed: Mean > Median (longer tail, with outliers, extends to the right)
Example: 2, 7, 9, 12, 15, 70
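The mean-versus-median comparison can be applied directly to the three example data sets. This is a rule-of-thumb sketch, not a formal skewness test, and the exact-equality check for "symmetric" is an assumption that only suits small, clean examples like these.

```python
import statistics


def describe_shape(values):
    """Classify shape by comparing mean and median (rule of thumb only)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean < median:
        return "left-skewed"
    if mean > median:
        return "right-skewed"
    return "symmetric"


print(describe_shape([1, 30, 34, 40, 46, 50]))   # left-skewed  (mean 33.5 < median 37)
print(describe_shape([1, 2, 4, 6, 7]))           # symmetric    (mean 4 = median 4)
print(describe_shape([2, 7, 9, 12, 15, 70]))     # right-skewed (mean 19.17 > median 10.5)
```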
Which measure of location
is the “best”?
The mean is generally used for
symmetric distributions with no
extreme values (outliers).
The median is often used for
skewed distributions or when
extreme values (outliers) exist.
Measures of Variation
Variation
Range
Interquartile Range
Variance: Population Variance, Sample Variance
Standard Deviation: Population Standard Deviation, Sample Standard Deviation
Coefficient of Variation
Variation
Measures of variation give information on the
spread or variability of the data values.
Example sets of data:
20, 25, 30, 35, 40
Mean: 30
Standard deviation (SD): 7.9
10, 20, 30, 40, 50
Mean: 30
Standard deviation (SD): 15.8
Same center / mean,
different variation
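The two example sets can be checked with Python's standard `statistics` module. Note that the values 7.9 and 15.8 match the sample standard deviation (divisor n − 1), which is what `statistics.stdev` computes; the variable names are illustrative.

```python
import statistics

narrow = [20, 25, 30, 35, 40]
wide = [10, 20, 30, 40, 50]

for data in (narrow, wide):
    # stdev() uses the sample formula (divides by n - 1).
    print(statistics.mean(data), round(statistics.stdev(data), 1))
# 30 7.9
# 30 15.8
```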
RANGE
Simplest measure of variation
Difference between the largest and the smallest
observations:
Range = x_max – x_min
Example: [Figure: number line from 0 to 14]
Disadvantages of the Range
Ignores the way in which data are distributed
[Figure: two data sets plotted on the same axis, 7 to 12]
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5 (Range = 5 – 1 = 4)
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120 (Range = 120 – 1 = 119)
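The sensitivity of the range to a single outlier is easy to reproduce. A minimal sketch, with the two data sets above rebuilt from their value counts (eleven 1s, eight 2s, four 3s, one 4, and a final 5 or 120); the function name `value_range` is illustrative.

```python
def value_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)


typical = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

print(value_range(typical))        # 4
print(value_range(with_outlier))   # 119
```

Only one of the 25 observations differs between the two sets, yet the range changes from 4 to 119.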
Interquartile Range
Difference between the 3rd quartile and the
1st quartile of the data
Interquartile range = 3rd quartile – 1st quartile
To find the 1st quartile and the 3rd quartile, sort
the data values from low to high.
1st quartile: the value in the (1/4)(n+1) position
3rd quartile: the value in the (3/4)(n+1) position
Find the Interquartile Range
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
*make sure the data are ranked in increasing order
1st quartile position = (1/4)(9+1) = 2.5, halfway between the 2nd and 3rd values
1st quartile = (12+13)/2 = 12.5
3rd quartile position = (3/4)(9+1) = 7.5, halfway between the 7th and 8th values
3rd quartile = (18+21)/2 = 19.5
Interquartile range = 19.5 – 12.5 = 7
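The position-based quartile rule can be sketched in Python as follows. The slides only show a position ending in .5 (averaging the two neighbours); using linear interpolation for other fractional positions is an assumption made here, as are the helper names `value_at_position` and `quartiles`. Other textbooks and software use slightly different quartile conventions.

```python
import math


def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in sorted data."""
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    # Assumption: interpolate linearly between the two neighbouring values
    # (a fraction of 0.5 gives their average, as in the worked example).
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


def quartiles(values):
    """Return (Q1, Q3) using the (n+1)/4 and 3(n+1)/4 positions."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least 3 values for these positions to exist")
    q1 = value_at_position(ordered, (n + 1) / 4)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q1, q3


q1, q3 = quartiles([11, 12, 13, 16, 16, 17, 18, 21, 22])
print(q1, q3, q3 - q1)   # 12.5 19.5 7.0
```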
VARIANCE
Population variance:
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Sample variance:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
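The only difference between the two formulas is the divisor (N versus n − 1). The sketch below implements both from the definitions and applies them to the data set 20, 25, 30, 35, 40 used earlier in the lecture; the function names are illustrative.

```python
def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Squared deviations from the sample mean, dividing by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [20, 25, 30, 35, 40]
print(population_variance(data))   # 50.0
print(sample_variance(data))       # 62.5
```

The square root of the sample variance, 62.5, is about 7.9, which matches the standard deviation quoted for this data set.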
STANDARD DEVIATION
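Population standard deviation:
$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
OR
$\sigma = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}}$
Sample standard deviation:
$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
OR
$s = \sqrt{\frac{1}{n - 1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)}$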
Calculation Example:
Sample Standard Deviation
Sample Data (x_i): 48.50 38.40 65.50 22.60 79.80 54.60
x        x²
48.50    2352.25
38.40    1474.56
65.50    4290.25
22.60    510.76
79.80    6368.04
54.60    2981.16
∑x = 309.40    ∑x² = 17,977.02
$s = \sqrt{\frac{1}{n - 1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)} = \sqrt{\frac{1}{6 - 1}\left(17{,}977.02 - \frac{(309.40)^2}{6}\right)} = 20.11$
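The calculation can be reproduced in Python, computing the sample standard deviation both from the sums ∑x and ∑x² and from the definition, to confirm that the two forms agree. A minimal sketch; variable names are illustrative.

```python
import math

data = [48.50, 38.40, 65.50, 22.60, 79.80, 54.60]
n = len(data)

sum_x = sum(data)                    # sum of the values (309.40 on the slide)
sum_x2 = sum(x * x for x in data)    # sum of the squared values (17,977.02 on the slide)

# Computational form: uses only n and the two sums.
s_shortcut = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

# Definitional form: squared deviations from the sample mean.
x_bar = sum_x / n
s_definition = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))

print(round(s_shortcut, 2), round(s_definition, 2))   # 20.11 20.11
```

The computational form is convenient by hand, but with floating-point arithmetic it can lose precision when the mean is large relative to the spread, so the definitional form is often the safer choice in code.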
Coefficient of Variation
The coefficient of variation is used to
measure variation relative to the mean
Population: $CV = \frac{\sigma}{\mu} \times 100\%$
Sample: $CV = \frac{s}{\bar{x}} \times 100\%$
Comparing Coefficients of Variation
Stock A:
Average price last year = $50
Standard deviation = $5
CV = ($5/$50) × 100% = 10%
Stock B:
Average price last year = $100
Standard deviation = $5
CV = ($5/$100) × 100% = 5%
Both stocks have the same standard
deviation, but stock B is less variable
relative to its price, i.e., the price of stock
B is less volatile than that of stock A.
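The stock comparison reduces to one division per stock. A minimal sketch, assuming a non-zero mean; the function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * std_dev / mean


cv_stock_a = coefficient_of_variation(std_dev=5, mean=50)
cv_stock_b = coefficient_of_variation(std_dev=5, mean=100)

print(cv_stock_a)   # 10.0
print(cv_stock_b)   # 5.0
```

Because the CV is unit-free, it can also be used to compare variability between data sets measured in different units or on very different scales.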