Online Course #1
Descriptive Statistics
Roger L. Brown, Ph.D.
Medical Research Consulting
Middleton, WI
This online course is a FREE
service to all MRC clients
Purpose of this series
To assist researchers in the
interpretation and application of
statistical analyses
Statistics?
The science of collecting,
organizing, analyzing,
interpreting, and presenting data
Topics we will review
• Descriptive Statistics
• Frequency Distributions and Histograms
Relative / Cumulative Frequency
• Measures of Central Tendency
Mean, Median, Mode, Midrange
Topics (continued)
• Measures of Dispersion (Variation)
Range, Standard Deviation,
Variance and Coefficient of variation
• Shape
Symmetric, Skewed, using Box-and-Whisker Plots
• Quartile
• Statistical Relationships
Correlation, Covariance
Descriptive Statistics
A collection of quantitative measures and
ways of describing data. This includes
frequency distributions and histograms,
measures of central tendency, and
measures of dispersion.
Descriptive Statistics
•Collect Data e.g. Survey
•Present Data e.g. Tables and Graphs
•Characterize Data e.g. Sample Mean $\bar{x} = \frac{\sum x_i}{n}$
A Characteristic of a:
Population is a Parameter
Sample is a Statistic.
Collection of Data
Survey/questionnaires/interviews
Direct observation
Secondary data source (e.g., Medical charts)
Presenting Data
Graphics
The visual representation of data may be used not
only to present results and findings, but
also to learn about the data.
Summary Measures in Descriptive
Statistics
Summary Measures
• Central Tendency: Mean, Median, Mode, Midrange
• Quartile
• Variation: Range, Variance, Standard Deviation, Coefficient of Variation
Measures of Central Tendency
Central Tendency: Mean, Median, Mode, Midrange
The Mean (Arithmetic Average)
•It is the Arithmetic Average of data values:
Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
•The Most Common Measure of Central Tendency
•Affected by Extreme Values (Outliers)
[Figure: two number-line plots. Left (axis 0 to 10): Mean = 5. Right (axis extended to 14): Mean = 6.]
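The effect of an extreme value on the mean can be checked with a short sketch. The two data sets below are hypothetical (the points plotted on the slide are not recoverable here); they were chosen only so that the means come out to 5 and 6.

```python
# Hypothetical data sets, not taken from the slides.
without_outlier = [1, 3, 5, 7, 9]
with_outlier = [1, 3, 5, 7, 14]  # same data, largest value replaced by 14


def sample_mean(values):
    """Arithmetic average: sum of the values divided by their count."""
    return sum(values) / len(values)


mean_without = sample_mean(without_outlier)
mean_with = sample_mean(with_outlier)
print(mean_without, mean_with)
```

Changing a single observation from 9 to 14 moves the mean by a full unit, which is the sensitivity to outliers noted above.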
The Median
•Important Measure of Central Tendency
•In an ordered array, the median is the
“middle” number.
•If n is odd, the median is the middle number.
•If n is even, the median is the average of the 2
middle numbers.
•Not Affected by Extreme Values
[Figure: two number-line plots. Left (axis 0 to 10): Median = 5. Right (axis extended to 14): Median = 5.]
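The odd/even rule for the median can be sketched as follows. The function name `median_of` and the data sets are illustrative assumptions, not part of the course material.

```python
def median_of(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical data sets.
odd_case = [1, 3, 5, 7, 9]
outlier_case = [1, 3, 5, 7, 14]  # extreme value does not move the median
even_case = [1, 3, 5, 7]

median_odd = median_of(odd_case)
median_outlier = median_of(outlier_case)
median_even = median_of(even_case)
print(median_odd, median_outlier, median_even)
```

The first two results are equal even though the largest observation differs, which illustrates why the median is described as not affected by extreme values.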
The Mode
•A Measure of Central Tendency
•Value that Occurs Most Often
•Not Affected by Extreme Values
•There May Not be a Mode
•There May be Several Modes
•Used for Either Numerical or Categorical Data
[Figure: two number-line plots. One data set (axis 0 to 14) has Mode = 9; the other (axis 0 to 6) has No Mode.]
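A minimal sketch of the mode, covering the "no mode" and "several modes" cases listed above. The helper `modes` and the sample data are hypothetical; the convention that a data set in which every value occurs once has no mode is an assumption made for this example.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s) of a non-empty data set.

    Returns an empty list when every value occurs exactly once (no mode).
    """
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical data sets.
no_mode = modes([0, 1, 2, 3, 4, 5, 6])
one_mode = modes([1, 3, 5, 9, 9, 9, 14])
two_modes = modes(["A", "B", "A", "O", "B"])  # categorical data also works
print(no_mode, one_mode, two_modes)
```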
Midrange
•A Measure of Central Tendency
•Average of Smallest and Largest
Observation:
Midrange $= \frac{x_{largest} + x_{smallest}}{2}$
•Affected by Extreme Values
[Figure: two number-line plots on an axis from 0 to 10. Left: Midrange = 5. Right: Midrange = 5.]
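The midrange depends only on the two most extreme observations, which a small sketch makes visible. The data sets are hypothetical and not taken from the slides.

```python
def midrange(values):
    """Average of the smallest and largest observation."""
    return (max(values) + min(values)) / 2


# Hypothetical data sets.
midrange_plain = midrange([1, 3, 5, 7, 9])
midrange_outlier = midrange([1, 3, 5, 7, 14])  # one extreme value shifts the result
print(midrange_plain, midrange_outlier)
```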
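Quartiles
Not a Measure of Central Tendency
Split Ordered Data into 4 Quarters (25% each) at the cut points Q1, Q2, Q3
Position of i-th Quartile: position of point $Q_i = \frac{i(n+1)}{4}$
Data in Ordered Array: 11 12 13 16 16 17 18 21 22
Position of $Q_1 = \frac{1 \cdot (9 + 1)}{4} = 2.50$
Position 2.50 is halfway between the 2nd and 3rd values, so $Q_1 = \frac{12 + 13}{2} = 12.5$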
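Quartiles
Not a Measure of Central Tendency
Split Ordered Data into 4 Quarters (25% each) at the cut points Q1, Q2, Q3
Position of i-th Quartile: position of point $Q_i = \frac{i(n+1)}{4}$
Data in Ordered Array: 11 12 13 16 16 17 18 21 22
Position of $Q_3 = \frac{3 \cdot (9 + 1)}{4} = 7.50$
Position 7.50 is halfway between the 7th and 8th values, so $Q_3 = \frac{18 + 21}{2} = 19.5$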
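The position rule i(n+1)/4 can be sketched in code using the ordered array from the slide. The function `quartile` is a hypothetical helper; linear interpolation between neighbouring values at fractional positions is an assumption (the slide only shows the halfway case). The standard library's `statistics.quantiles` with `method="exclusive"` is included as a cross-check.

```python
import math
from statistics import quantiles


def quartile(ordered, i):
    """i-th quartile (i = 1, 2, 3) at 1-based position i(n+1)/4.

    Assumes `ordered` is sorted and has at least 3 values. Fractional
    positions are resolved by linear interpolation (an assumption).
    """
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least 3 observations")
    position = i * (n + 1) / 4
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1, q2, q3 = (quartile(data, i) for i in (1, 2, 3))
library_quartiles = quantiles(data, n=4, method="exclusive")
print(q1, q2, q3, library_quartiles)
```

Other software may use a different quartile convention, so results for the same data can differ slightly between packages.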
Measures of Dispersion (Variation)
Variation
• Range
• Variance: Population Variance, Sample Variance
• Standard Deviation: Population Standard Deviation, Sample Standard Deviation
• Coefficient of Variation
Understanding Variation
• The more spread out or dispersed the data,
the larger the measures of variation
• The more concentrated or homogeneous the data,
the smaller the measures of variation
• If all observations are equal,
measures of variation = zero
• All measures of variation are nonnegative
The Range
• Measure of Variation
• Difference Between Largest & Smallest
Observations:
Range $= x_{largest} - x_{smallest}$
• Ignores How Data Are Distributed:
[Figure: two differently distributed data sets on an axis from 7 to 12; both have Range = 12 - 7 = 5]
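A brief sketch of why the range ignores how the data are distributed: two hypothetical data sets (not the ones plotted on the slide) share the same smallest and largest values, so their ranges agree even though one is far more concentrated.

```python
def value_range(values):
    """Difference between the largest and smallest observation."""
    return max(values) - min(values)


# Hypothetical data sets with the same extremes but different spread.
spread_out = [7, 8, 9, 10, 11, 12]
clustered = [7, 10, 10, 10, 10, 12]

range_spread = value_range(spread_out)
range_clustered = value_range(clustered)
print(range_spread, range_clustered)
```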
Variance
•Important Measure of Variation
•Shows Variation About the Mean:
•For the Population: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$
•For the Sample: $s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$
For the Population: use N in the denominator.
For the Sample: use n - 1 in the denominator.
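The only difference between the two variance formulas is the denominator, which the following sketch makes explicit. The function names and the data set are hypothetical; the standard library's `statistics.pvariance` and `statistics.variance` are used as a cross-check.

```python
from statistics import pvariance, variance


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


# Hypothetical data: mean is 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
pop_var = population_variance(data)   # 32 / 8
samp_var = sample_variance(data)      # 32 / 7
print(pop_var, samp_var, pvariance(data), variance(data))
```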
Standard Deviation
•Most Important Measure of Variation
•Shows Variation About the Mean:
•For the Population: $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$
•For the Sample: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$
For the Population: use N in the denominator.
For the Sample: use n - 1 in the denominator.
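Sample Standard Deviation
$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$
For the Sample: use n - 1 in the denominator.
Data: $X_i$: 10 12 14 15 17 18 18 24
n = 8 Mean = 16
$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + (15-16)^2 + (17-16)^2 + (18-16)^2 + (18-16)^2 + (24-16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} = 4.3095$
Comparing Standard Deviations
Data: $X_i$: 10 12 14 15 17 18 18 24
N = 8 Mean = 16
$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{130}{7}} = 4.3095$
$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = \sqrt{\frac{130}{8}} = 4.0311$
Value for the Standard Deviation is larger for data considered as a Sample.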
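The worked example can be recomputed directly from the eight observations as a check. In this sketch every observation, including both values of 18, contributes a squared deviation, so the sum of squared deviations is 130. Variable names are illustrative.

```python
import math

data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)
x_bar = sum(data) / n

# Sum of squared deviations about the mean, over all eight observations.
sum_sq = sum((x - x_bar) ** 2 for x in data)

s = math.sqrt(sum_sq / (n - 1))  # data treated as a sample
sigma = math.sqrt(sum_sq / n)    # data treated as a population
print(x_bar, sum_sq, round(s, 4), round(sigma, 4))
```

Because the sample formula divides by n - 1 rather than N, `s` is always at least as large as `sigma` for the same data.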
Comparing Standard Deviations
[Figure: three data sets plotted on a common axis from 11 to 21]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57
Coefficient of Variation
Measure of Relative Variation
Always a percentage (%)
Shows Variation Relative to Mean
Used to Compare 2 or More Groups
Formula (for Sample):
$CV = \left( \frac{S}{\bar{X}} \right) \cdot 100\%$
Comparing Coefficient of Variation
Group A: Average Health Measure = 50
Standard Deviation = 5
Group B: Average Health Measure = 100
Standard Deviation = 5
Coefficient of Variation: $CV = \left( \frac{S}{\bar{X}} \right) \cdot 100\%$
Group A: CV = (5 / 50) · 100% = 10%
Group B: CV = (5 / 100) · 100% = 5%
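The comparison of the two groups can be reproduced with a one-line function. The function name is an illustrative choice; the means and standard deviations are those given for Groups A and B.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(std_dev=5, mean=50)
cv_b = coefficient_of_variation(std_dev=5, mean=100)
print(cv_a, cv_b)
```

Both groups have the same standard deviation, but relative to its mean Group A is twice as variable as Group B.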
Shape
Describes How Data Are Distributed
Measures of Shape:
Symmetric or skewed
Left-Skewed (Negatively Skewed): skewness < -1; Mean < Median < Mode
Symmetric: -0.5 < skewness < 0.5 (near 0); Mean = Median = Mode
Right-Skewed (Positively Skewed): skewness > 1; Mode < Median < Mean
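A rough sketch of classifying shape numerically. It assumes the cutoffs shown on the slide (-1, -0.5, 0.5, 1) refer to a skewness coefficient, and it uses the moment coefficient of skewness (third central moment divided by the 1.5 power of the second); both choices are assumptions, as are the function names and the data.

```python
from statistics import mean, median


def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5 (undefined if all values are equal)."""
    n = len(values)
    x_bar = sum(values) / n
    m2 = sum((x - x_bar) ** 2 for x in values) / n
    m3 = sum((x - x_bar) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def describe_shape(values):
    """Label the shape using the assumed skewness cutoffs."""
    g = skewness(values)
    if g < -1:
        return "left-skewed"
    if g > 1:
        return "right-skewed"
    if -0.5 < g < 0.5:
        return "approximately symmetric"
    return "moderately skewed"


# Hypothetical data sets.
symmetric = [1, 2, 3, 4, 5]
right = [1, 1, 1, 2, 2, 3, 10]   # long tail to the right
left = [1, 8, 9, 9, 10, 10, 10]  # long tail to the left

for name, data in [("symmetric", symmetric), ("right", right), ("left", left)]:
    print(name, describe_shape(data), mean(data), median(data))
```

In the right-skewed set the mean exceeds the median, and in the left-skewed set the mean falls below it, in line with the orderings listed above.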
Box-and-Whisker Plot
Graphical Display of Data Using
5-Number Summary
$X_{smallest}$, $Q_1$, Median, $Q_3$, $X_{largest}$
[Figure: box-and-whisker plot drawn over an axis marked 4, 6, 8, 10, 12]
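The five numbers behind a box-and-whisker plot can be computed as in the sketch below. It reuses the ordered array from the quartile example and relies on the standard library's `statistics.quantiles` with `method="exclusive"`, which for this data agrees with the i(n+1)/4 position rule; the function name is an illustrative choice.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest) for the data."""
    ordered = sorted(values)
    q1, q2, q3 = quantiles(ordered, n=4, method="exclusive")
    return (ordered[0], q1, q2, q3, ordered[-1])


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
summary = five_number_summary(data)
print(summary)
```

The box spans Q1 to Q3 with a line at the median, and the whiskers extend to the smallest and largest values.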
Distribution Shape &
Box-and-Whisker Plots
[Figure: three box-and-whisker plots, labeled Left-Skewed, Symmetric, and Right-Skewed, each marking Q1, Median, and Q3]
Summary
Discussed Measures of Central Tendency
Mean, Median, Mode, Midrange
Quartiles
Addressed Measures of Variation
The Range, Interquartile Range, Variance,
Standard Deviation, Coefficient of Variation
Determined Shape of Distributions
Symmetric, Skewed, Box-and-Whisker Plot