Chapter 3
Descriptive Statistics:
Tabular and Graphical Presentations
Main Ideas
• Summarizing Qualitative Data
• Summarizing Quantitative Data
STAT 4053 1
Summarizing Qualitative Data
Main Ideas
• Frequency Distribution
• Relative Frequency Distribution
• Percent Frequency Distribution
• Bar Graph
• Pie Chart
Frequency Distribution
A frequency distribution is a tabular summary of
data showing the frequency (or number) of items
in each of several nonoverlapping classes.
The objective is to provide insights about the data
that cannot be quickly obtained by looking only at
the original data.
Example: Chinese Buffet
Customers eating at Chinese Buffet were asked to rate the quality of
their service as being excellent, above average, average, below
average, or poor.
The ratings provided by a sample of 20 guests are:
Below Average, Average, Above Average,
Above Average, Above Average, Above Average,
Above Average, Below Average, Below Average,
Average, Poor, Poor,
Above Average, Excellent, Above Average,
Average, Above Average, Average,
Above Average, Average
Frequency Distribution
Rating           Frequency
Poor                 2
Below Average        3
Average              5
Above Average        9
Excellent            1
Total               20
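The tally above can be reproduced with a short script. This is a minimal sketch, assuming Python and its standard `collections.Counter`; the variable names (`ratings`, `classes`, `frequency`) are illustrative and not part of the course material.

```python
from collections import Counter

# The 20 service-quality ratings from the sample, in the order listed.
ratings = [
    "Below Average", "Average", "Above Average",
    "Above Average", "Above Average", "Above Average",
    "Above Average", "Below Average", "Below Average",
    "Average", "Poor", "Poor",
    "Above Average", "Excellent", "Above Average",
    "Average", "Above Average", "Average",
    "Above Average", "Average",
]

# Classes listed from worst to best so the table keeps its natural order.
classes = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

counts = Counter(ratings)
frequency = {c: counts[c] for c in classes}

for c in classes:
    print(f"{c:<15}{frequency[c]:>3}")
print(f"{'Total':<15}{sum(frequency.values()):>3}")
```

Listing the classes explicitly, rather than relying on the order in which ratings happen to appear, keeps the table in its natural order.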
Relative Frequency Distribution
The relative frequency of a class is the fraction or
proportion of the total number of data items
belonging to the class.
A relative frequency distribution is a tabular
summary of a set of data showing the relative
frequency for each class.
Percent Frequency Distribution
The percent frequency of a class is the relative
frequency multiplied by 100.
A percent frequency distribution is a tabular
summary of a set of data showing the percent
frequency for each class.
Relative Frequency and Percent Frequency Distributions
Rating           Relative Frequency   Percent Frequency
Poor                   .10                  10
Below Average          .15                  15
Average                .25                  25
Above Average          .45                  45
Excellent              .05                   5
Total                 1.00                 100
For example, the relative frequency of Excellent is 1/20 = .05, and the percent frequency of Poor is .10(100) = 10.
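The two derived columns follow directly from the class frequencies. The sketch below assumes Python and starts from the frequency counts of the Chinese Buffet example; the names `relative` and `percent` are illustrative.

```python
# Class frequencies from the Chinese Buffet frequency distribution.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

n = sum(frequency.values())  # total number of data items (20)

# Relative frequency = class frequency / total number of items.
relative = {c: f / n for c, f in frequency.items()}
# Percent frequency = relative frequency * 100 (computed as 100*f/n).
percent = {c: 100 * f / n for c, f in frequency.items()}

for c in frequency:
    print(f"{c:<15}{relative[c]:>6.2f}{percent[c]:>6.0f}")
print(f"{'Total':<15}{sum(relative.values()):>6.2f}{sum(percent.values()):>6.0f}")
```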
Bar Graph
A bar graph is a graphical device for depicting
qualitative data.
On one axis (usually the horizontal axis), we specify
the labels that are used for each of the classes.
A frequency, relative frequency, or percent frequency
scale can be used for the other axis (usually the
vertical axis).
Using a bar of fixed width drawn above each class
label, we extend the height appropriately.
The bars are separated to emphasize the fact that each
class is a separate category.
Bar Graph
[Figure: Bar Chart of Chinese Buffet Quality Ratings. Vertical axis: Frequency (1 to 10); horizontal axis: Rating (Poor, Below Average, Average, Above Average, Excellent).]
Example: Disabling injuries by industry group
Pie Chart
The pie chart is a commonly used graphical device
for presenting relative frequency distributions for
qualitative data.
First draw a circle; then use the relative
frequencies to subdivide the circle
into sectors that correspond to the
relative frequency for each class.
Since there are 360 degrees in a circle,
a class with a relative frequency of .25 would
consume .25(360) = 90 degrees of the circle.
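The sector-angle rule can be checked for every class at once. This is a small sketch, assuming Python and the class frequencies of the Chinese Buffet example; the name `angles` is illustrative.

```python
# Class frequencies from the Chinese Buffet example.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

n = sum(frequency.values())
angles = {}
for rating, count in frequency.items():
    relative = count / n              # relative frequency of the class
    angles[rating] = relative * 360   # share of the 360-degree circle
    print(f"{rating:<15}{relative:>5.2f}{angles[rating]:>8.1f} degrees")
```

The sector angles add up to 360 degrees because the relative frequencies add up to 1.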
Pie Chart
[Figure: Pie Chart of Chinese Buffet Quality Ratings. Sectors: Excellent 5%, Poor 10%, Below Average 15%, Average 25%, Above Average 45%.]
Example: Chinese Buffet
Insights Gained from the Preceding Pie Chart
• One-half of the customers surveyed gave Chinese Buffet
a quality rating of “above average” or “excellent”
(looking at the left side of the pie). This might
please the manager.
• For each customer who gave an “excellent” rating,
there were two customers who gave a “poor”
rating (looking at the top of the pie). This should
displease the manager.
Pie chart for disabling injuries by industry group
Summarizing Quantitative Data
Main Ideas
• Frequency Distribution
• Relative Frequency and Percent Frequency Distributions
• Dot Plot
• Histogram
• Cumulative Distributions
• Ogive
• Stem-and-leaf
• Box-plot
Example: Tyre 4 Less Auto Repair
The manager of Tyre 4 Less Auto would like to have a better
understanding of the cost of parts used in the engine tune-ups
performed in the shop. He examines 50 customer invoices for
tune-ups. The costs of parts, rounded to the nearest dollar,
are listed on the next slide.
Example: Tyre 4 Less Auto Repair
Sample of Parts Cost for 50 Tune-ups
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
Frequency Distribution
Guidelines for Selecting Number of Classes
• Use between 5 and 20 classes.
• Data sets with a larger number of elements
usually require a larger number of classes.
• Smaller data sets usually require fewer classes.
Frequency Distribution
Guidelines for Selecting Width of Classes
• Use classes of equal width.
• Approximate Class Width = (Largest Data Value - Smallest Data Value) / Number of Classes
Frequency Distribution
For Tyre 4 Less Auto Repair, if we choose six classes:
Approximate Class Width = (109 - 52)/6 = 9.5, rounded up to 10
Parts Cost ($)   Frequency
50-59                2
60-69               13
70-79               16
80-89                7
90-99                7
100-109              5
Total               50
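The class-width calculation and the resulting tally can be sketched in a few lines. This example assumes Python; the starting point of 50 for the first class and the width of 10 are read off the table above, and the names `costs`, `width`, `lower`, and `freq` are illustrative.

```python
# Parts cost (in dollars) for the 50 tune-ups in the sample.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

num_classes = 6
# Approximate width = (largest value - smallest value) / number of classes.
approx_width = (max(costs) - min(costs)) / num_classes
print("Approximate class width:", approx_width)

width = 10   # 9.5 rounded up to a convenient value, as in the table
lower = 50   # lower limit of the first class (assumed from the table)

# Each cost falls in class index (cost - lower) // width.
freq = [0] * num_classes
for c in costs:
    freq[(c - lower) // width] += 1

for i, f in enumerate(freq):
    start = lower + i * width
    print(f"{start}-{start + width - 1}: {f}")
print("Total:", sum(freq))
```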
Relative Frequency and Percent Frequency Distributions
Parts Cost ($)   Relative Frequency   Percent Frequency
50-59                  .04                   4
60-69                  .26                  26
70-79                  .32                  32
80-89                  .14                  14
90-99                  .14                  14
100-109                .10                  10
Total                 1.00                 100
For example, for the 50-59 class: 2/50 = .04 and .04(100) = 4.
Relative Frequency and
Percent Frequency Distributions
Insights Gained from the Percent Frequency
Distribution
• Only 4% of the parts costs are in the $50-59 class.
• 30% of the parts costs are under $70.
• The greatest percentage (32% or almost one-third)
of the parts costs are in the $70-79 class.
• 10% of the parts costs are $100 or more.
Dot Plot
One of the simplest graphical
summaries of data is a dot plot.
A horizontal axis shows the range of
data values.
Then each data value is represented by
a dot placed above the axis.
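A text-only version of this construction is sketched below. It assumes Python and the 50 tune-up parts costs from the Tyre 4 Less example; stacking repeated values vertically and the axis range of 50 to 110 are choices made for this illustration.

```python
from collections import Counter

# Parts cost (in dollars) for the 50 tune-ups in the sample.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

heights = Counter(costs)       # number of dots stacked at each value
top = max(heights.values())    # tallest stack
lo, hi = 50, 110               # axis range (assumed for this sketch)

# Build the plot from the top row down; one text column per dollar value.
rows = []
for level in range(top, 0, -1):
    row = "".join("." if heights[v] >= level else " " for v in range(lo, hi + 1))
    rows.append(row.rstrip())

axis = "-" * (hi - lo + 1)
# Tick labels every 10 dollars, each starting at the column of its value.
labels = "".join(f"{t:<10}" for t in range(lo, hi + 1, 10)).rstrip()

print("\n".join(rows + [axis, labels]))
```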
Dot Plot
[Figure: dot plot of Tune-up Parts Cost. Horizontal axis: Cost ($), 50 to 110.]
Histogram
Another common graphical presentation of
quantitative data is a histogram.
The variable of interest is placed on the horizontal
axis.
A rectangle is drawn above each class interval with
its height corresponding to the interval’s frequency,
relative frequency, or percent frequency.
Unlike a bar graph, a histogram has no natural
separation between rectangles of adjacent classes.
Histogram
[Figure: histogram of Tune-up Parts Cost. Vertical axis: Frequency (2 to 18); horizontal axis: Parts Cost ($), classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109.]
Histogram
Symmetric
Left tail is the mirror image of the right tail
Examples: heights or weights of people
[Figure: symmetric histogram. Vertical axis: Relative Frequency, 0 to .35.]
Histogram
Moderately Skewed Left
A longer tail to the left
Example: exam scores
[Figure: histogram moderately skewed left. Vertical axis: Relative Frequency, 0 to .35.]
Histogram
Moderately Skewed Right
A longer tail to the right
Example: housing values
[Figure: histogram moderately skewed right. Vertical axis: Relative Frequency, 0 to .35.]
Histogram
Highly Skewed Right
A very long tail to the right
Example: executive salaries
[Figure: histogram highly skewed right. Vertical axis: Relative Frequency, 0 to .35.]
Cumulative Distributions
A cumulative frequency distribution shows the
number of items with values less than or equal to
the upper limit of each class.
A cumulative relative frequency distribution shows
the proportion of items with values less than or
equal to the upper limit of each class.
A cumulative percent frequency distribution shows
the percentage of items with values less than or
equal to the upper limit of each class.
Cumulative Distributions
Tyre 4 Less Auto Repair
            Cumulative   Cumulative Relative   Cumulative Percent
Cost ($)    Frequency    Frequency             Frequency
≤ 59             2            .04                    4
≤ 69            15            .30                   30
≤ 79            31            .62                   62
≤ 89            38            .76                   76
≤ 99            45            .90                   90
≤ 109           50           1.00                  100
For example, for the ≤ 69 class: 2 + 13 = 15, 15/50 = .30, and .30(100) = 30.
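The cumulative columns are running totals of the class frequencies. The sketch below assumes Python and starts from the class frequencies of the parts-cost frequency distribution; the names `cum_freq`, `cum_rel`, and `cum_pct` are illustrative.

```python
from itertools import accumulate

# Upper class limits and class frequencies from the parts-cost distribution.
upper_limits = [59, 69, 79, 89, 99, 109]
freq = [2, 13, 16, 7, 7, 5]

n = sum(freq)  # 50 invoices in total

# Running total of frequencies: items with values <= each upper limit.
cum_freq = list(accumulate(freq))
cum_rel = [cf / n for cf in cum_freq]        # cumulative proportion
cum_pct = [100 * cf / n for cf in cum_freq]  # cumulative percentage

for limit, cf, cr, cp in zip(upper_limits, cum_freq, cum_rel, cum_pct):
    print(f"<= {limit:<5}{cf:>4}{cr:>7.2f}{cp:>6.0f}")
```

The last row always reaches the total count, a proportion of 1.00, and 100 percent, which makes a quick sanity check on the table.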
The Five-Number Summary
1 Smallest Value
2 First Quartile
3 Median
4 Third Quartile
5 Largest Value
Five-Number Summary
Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
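The five values can be recomputed from the 70 observations listed above. This is a hedged sketch, assuming Python's standard `statistics.quantiles`. Quartile conventions differ between textbooks and software, so other tools may give slightly different quartiles. For this data set the function's default method reproduces the values shown on the slide. The name `rents` is illustrative.

```python
import statistics

# The 70 observations from the slide, already in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

# n=4 splits the data into quarters and returns the three cut points.
q1, median, q3 = statistics.quantiles(rents, n=4)

five = (min(rents), q1, median, q3, max(rents))
print("Smallest, Q1, Median, Q3, Largest:", five)
```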
Box Plot
A box is drawn with its ends located at the first and
third quartiles.
A vertical line is drawn in the box at the location of
the median (second quartile).
[Figure: box plot on an axis from 375 to 625 in steps of 25, marking Q1 = 445, Q2 = 475, and Q3 = 525.]
Box Plot
Limits are located (not drawn) using the
interquartile range (IQR = Q3 - Q1 = 525 - 445 = 80).
Data outside these limits are considered
outliers.
The location of each outlier is shown with the
symbol *.
The lower limit is located 1.5(IQR) below Q1.
Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325
The upper limit is located 1.5(IQR) above Q3.
Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645
There are no outliers (values less than 325 or
greater than 645) in the apartment rent data.
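The limit calculation can be sketched as follows, assuming Python and taking the IQR directly as Q3 - Q1 from the quartiles of the five-number summary (Q1 = 445, Q3 = 525, so IQR = 80). The variable names are illustrative.

```python
# Quartiles and extremes from the five-number summary.
q1, q3 = 445, 525
smallest, largest = 425, 615

iqr = q3 - q1            # interquartile range: 525 - 445 = 80
lower = q1 - 1.5 * iqr   # lower limit: 1.5(IQR) below Q1
upper = q3 + 1.5 * iqr   # upper limit: 1.5(IQR) above Q3

# If both extremes lie inside the limits, no data value can be an outlier.
has_outliers = smallest < lower or largest > upper

print("IQR:", iqr)
print("Lower limit:", lower)
print("Upper limit:", upper)
print("Any outliers?", has_outliers)
```

Because the smallest and largest observations both fall inside the limits, the whiskers extend all the way to 425 and 615.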
Box Plot
Whiskers (dashed lines) are drawn from the
ends of the box to the smallest and largest
data values inside the limits.
[Figure: box plot with whiskers on an axis from 375 to 625 in steps of 25. Smallest value inside limits = 425; largest value inside limits = 615.]
Summary of Tabular and Graphical Procedures
Data
  Qualitative Data
    Tabular Methods
      • Frequency Distribution
      • Rel. Freq. Dist.
      • Percent Freq. Distribution
    Graphical Methods
      • Bar Graph
      • Pie Chart
  Quantitative Data
    Tabular Methods
      • Frequency Distribution
      • Rel. Freq. Dist.
      • Cum. Freq. Dist.
      • Cum. Rel. Freq. Distribution
      • Stem-and-Leaf Display
    Graphical Methods
      • Dot Plot
      • Histogram
      • Box Plot