
Descriptive Statistics: Data Summarization

Chapter 3 discusses descriptive statistics, focusing on summarizing both qualitative and quantitative data through various methods. It covers frequency distributions, relative and percent frequency distributions, and graphical representations such as bar graphs and pie charts for qualitative data, as well as dot plots and histograms for quantitative data. The chapter also introduces cumulative distributions and the five-number summary for deeper data analysis.

Chapter 3

Descriptive Statistics:
Tabular and Graphical Presentations

Main Ideas
• Summarizing Qualitative Data
• Summarizing Quantitative Data
STAT 4053 1
Summarizing Qualitative Data

Main Ideas
• Frequency Distribution
• Relative Frequency Distribution
• Percent Frequency Distribution
• Bar Graph
• Pie Chart
Frequency Distribution

• A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several nonoverlapping classes.
• The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.
Example: Chinese Buffet

• Customers eating at Chinese Buffet were asked to rate the quality of their service as being excellent, above average, average, below average, or poor.
• The ratings provided by a sample of 20 guests are:

Below Average    Average          Above Average
Above Average    Above Average    Above Average
Above Average    Below Average    Below Average
Average          Poor             Poor
Above Average    Excellent        Above Average
Average          Above Average    Average
Above Average    Average
Frequency Distribution

Rating           Frequency
Poor                 2
Below Average        3
Average              5
Above Average        9
Excellent            1
Total               20
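The following minimal Python sketch builds this frequency distribution from the raw ratings listed in the example. The names `ratings`, `classes` and `frequency` are illustrative and do not come from the slides. `collections.Counter` does the tallying, and the explicit class list keeps the table in rating order rather than in the order the counts happen to appear.

```python
from collections import Counter

# The 20 ratings from the Chinese Buffet example, in the order listed.
ratings = [
    "Below Average", "Average", "Above Average",
    "Above Average", "Above Average", "Above Average",
    "Above Average", "Below Average", "Below Average",
    "Average", "Poor", "Poor",
    "Above Average", "Excellent", "Above Average",
    "Average", "Above Average", "Average",
    "Above Average", "Average",
]

# Classes in their natural order, from worst to best.
classes = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

counts = Counter(ratings)                         # tally each distinct rating
frequency = {c: counts[c] for c in classes}       # reorder into the table's order

for c in classes:
    print(f"{c:<15}{frequency[c]:>3}")
print(f"{'Total':<15}{sum(frequency.values()):>3}")
```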
Relative Frequency Distribution

• The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.
• A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.
Percent Frequency Distribution

• The percent frequency of a class is the relative frequency multiplied by 100.
• A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.
Relative Frequency and Percent Frequency Distributions

                 Relative     Percent
Rating           Frequency    Frequency
Poor               .10           10
Below Average      .15           15
Average            .25           25
Above Average      .45           45
Excellent          .05            5
Total             1.00          100

Example calculations: the relative frequency of Excellent is 1/20 = .05, and the percent frequency of Poor is .10(100) = 10.
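Both columns can be computed from the class frequencies. The sketch below uses illustrative names. It computes percent frequency as 100 * frequency / n. This equals relative frequency times 100, but avoids floating-point noise such as 45.00000000000001.

```python
# Class frequencies from the Chinese Buffet frequency distribution.
frequency = {"Poor": 2, "Below Average": 3, "Average": 5,
             "Above Average": 9, "Excellent": 1}
n = sum(frequency.values())   # 20 customers

# Relative frequency = class frequency / n
relative = {c: f / n for c, f in frequency.items()}
# Percent frequency = relative frequency * 100, computed as 100 * f / n
percent = {c: 100 * f / n for c, f in frequency.items()}

for c in frequency:
    print(f"{c:<15}{relative[c]:>6.2f}{percent[c]:>6.0f}")
```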
Bar Graph
• A bar graph is a graphical device for depicting qualitative data.
• On one axis (usually the horizontal axis), we specify the labels that are used for each of the classes.
• A frequency, relative frequency, or percent frequency scale can be used for the other axis (usually the vertical axis).
• Using a bar of fixed width drawn above each class label, we extend the height of each bar to the frequency, relative frequency, or percent frequency of its class.
• The bars are separated to emphasize the fact that each class is a separate category.
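Plotting libraries such as matplotlib are the usual tool for bar graphs. To stay dependency-free, the sketch below is a hypothetical plain-text version. It draws one bar per class, with bar length equal to the frequency. The bars run horizontally, so the class labels go down the side instead of along the horizontal axis.

```python
# Class frequencies from the Chinese Buffet frequency distribution.
frequency = {"Poor": 2, "Below Average": 3, "Average": 5,
             "Above Average": 9, "Excellent": 1}

def text_bar_graph(freq, symbol="#"):
    """Return one line per class: the label, then a bar of len == frequency."""
    width = max(len(label) for label in freq)   # align all bars
    return [f"{label:<{width}} | {symbol * count} ({count})"
            for label, count in freq.items()]

for line in text_bar_graph(frequency):
    print(line)
```

Each line is separate, which mirrors the separated bars of a bar graph for distinct categories.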
Bar Graph

[Figure: Bar Chart of Chinese Buffet Quality Ratings. Horizontal axis: Rating (Poor, Below Average, Average, Above Average, Excellent); vertical axis: Frequency (0 to 10).]
Example: Disabling injuries by industry group

Pie Chart
• The pie chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data.
• First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class.
• Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.
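Applying the 360-degree rule to every class gives the sector sizes for the Chinese Buffet pie chart. In this sketch the names are illustrative. Each angle is computed as frequency * 360 / n, which is the same as relative frequency * 360 but stays exact in floating point.

```python
# Class frequencies from the Chinese Buffet frequency distribution.
frequency = {"Poor": 2, "Below Average": 3, "Average": 5,
             "Above Average": 9, "Excellent": 1}
n = sum(frequency.values())

# Sector angle (degrees) = relative frequency * 360 = frequency * 360 / n
angles = {c: f * 360 / n for c, f in frequency.items()}

for c, deg in angles.items():
    print(f"{c:<15}{deg:6.1f} degrees")
print(f"{'Total':<15}{sum(angles.values()):6.1f} degrees")
```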
Pie Chart

[Figure: Pie Chart of Chinese Buffet Quality Ratings. Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%.]
Example: Chinese Buffet

Insights Gained from the Preceding Pie Chart

• One-half of the customers surveyed gave Chinese Buffet a quality rating of “above average” or “excellent” (looking at the left side of the pie). This might please the manager.
• For each customer who gave an “excellent” rating, there were two customers who gave a “poor” rating (looking at the top of the pie). This should displease the manager.
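Both insights are simple arithmetic on the frequency distribution. The sketch below, with illustrative names, checks them directly.

```python
# Class frequencies from the Chinese Buffet frequency distribution.
frequency = {"Poor": 2, "Below Average": 3, "Average": 5,
             "Above Average": 9, "Excellent": 1}
n = sum(frequency.values())

# Share rating "above average" or "excellent": (9 + 1) / 20
share_top = (frequency["Above Average"] + frequency["Excellent"]) / n
# "Poor" ratings per "excellent" rating: 2 / 1
poor_per_excellent = frequency["Poor"] / frequency["Excellent"]

print(share_top, poor_per_excellent)
```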
Pie chart for Disabling injuries by industry group
Summarizing Quantitative Data

Main Ideas
• Frequency Distribution
• Relative Frequency and Percent Frequency Distributions
• Dot Plot
• Histogram
• Cumulative Distributions
• Ogive
• Stem-and-Leaf Display
• Box Plot
Example: Tyre 4 Less Auto Repair

The manager of Tyre 4 Less Auto Repair would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. He examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.
Example: Tyre 4 Less Auto Repair

Sample of Parts Cost ($) for 50 Tune-ups

91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
Frequency Distribution
Guidelines for Selecting Number of Classes

• Use between 5 and 20 classes.
• Data sets with a larger number of elements usually require a larger number of classes.
• Smaller data sets usually require fewer classes.
Frequency Distribution
Guidelines for Selecting Width of Classes

• Use classes of equal width.
• Approximate Class Width = (Largest Data Value - Smallest Data Value) / Number of Classes
Frequency Distribution

For Tyre 4 Less Auto Repair, if we choose six classes:

Approximate Class Width = (109 - 52)/6 = 9.5, rounded up to 10

Parts Cost ($)    Frequency
50-59                 2
60-69                13
70-79                16
80-89                 7
90-99                 7
100-109               5
Total                50
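The sketch below applies the class-width rule and then tallies the classes. It assumes six classes, a width rounded up with `math.ceil`, and a first lower limit of 50. The slides start the classes at 50; the code takes that value as given rather than deriving it. Because the costs are whole dollars, each class runs from its lower limit to its lower limit + 9.

```python
import math

# Parts costs ($) for the 50 tune-ups, as listed in the sample.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

num_classes = 6
approx_width = (max(costs) - min(costs)) / num_classes   # (109 - 52) / 6 = 9.5
width = math.ceil(approx_width)                         # round up to 10
first_lower = 50                                        # assumed starting limit

frequency = {}
for k in range(num_classes):
    lo = first_lower + k * width
    hi = lo + width - 1                                 # e.g. 50-59, 60-69, ...
    frequency[f"{lo}-{hi}"] = sum(lo <= x <= hi for x in costs)

for label, f in frequency.items():
    print(f"{label:<8}{f:>4}")
print(f"{'Total':<8}{sum(frequency.values()):>4}")
```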
Relative Frequency and Percent Frequency Distributions

Parts             Relative     Percent
Cost ($)          Frequency    Frequency
50-59               .04            4
60-69               .26           26
70-79               .32           32
80-89               .14           14
90-99               .14           14
100-109             .10           10
Total              1.00          100

Example calculations: the relative frequency of the 50-59 class is 2/50 = .04, and its percent frequency is .04(100) = 4.
Relative Frequency and Percent Frequency Distributions

Insights Gained from the Percent Frequency Distribution
• Only 4% of the parts costs are in the $50-59 class.
• 30% of the parts costs are under $70.
• The greatest percentage (32%, or almost one-third) of the parts costs are in the $70-79 class.
• 10% of the parts costs are $100 or more.
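Each of the four insights can be checked against the raw costs instead of the rounded table. `percent_where` is a hypothetical helper written for this sketch, not part of any library.

```python
# Parts costs ($) for the 50 tune-ups.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]
n = len(costs)

def percent_where(condition):
    """Percentage of the parts costs for which condition(cost) is true."""
    return 100 * sum(condition(x) for x in costs) / n

print(percent_where(lambda x: 50 <= x <= 59))   # the $50-59 class
print(percent_where(lambda x: x < 70))          # under $70
print(percent_where(lambda x: 70 <= x <= 79))   # the $70-79 class
print(percent_where(lambda x: x >= 100))        # $100 or more
```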
Dot Plot
• One of the simplest graphical summaries of data is a dot plot.
• A horizontal axis shows the range of data values.
• Then each data value is represented by a dot placed above the axis.
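Without a plotting library, a dot plot can be approximated in plain text. In this hypothetical sketch, each character column stands for one whole-dollar value between the smallest and largest cost. A value that occurs k times gets a stack of k dots above the axis line.

```python
from collections import Counter

# Parts costs ($) for the 50 tune-ups.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

def text_dot_plot(data):
    """Return the rows of a plain-text dot plot, top row first.

    One column per whole-dollar value from min(data) to max(data);
    the last row is the horizontal axis.
    """
    counts = Counter(data)
    lo, hi = min(data), max(data)
    rows = []
    for level in range(max(counts.values()), 0, -1):   # highest stack level first
        row = "".join("." if counts[v] >= level else " " for v in range(lo, hi + 1))
        rows.append(row.rstrip())
    rows.append("-" * (hi - lo + 1))                   # the axis
    return rows

for row in text_dot_plot(costs):
    print(row)
print(f"Cost ($): {min(costs)} at the left edge, {max(costs)} at the right edge")
```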
Dot Plot

[Figure: Dot plot of tune-up parts cost. Horizontal axis: Cost ($), from 50 to 110.]
Histogram
• Another common graphical presentation of quantitative data is a histogram.
• The variable of interest is placed on the horizontal axis.
• A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency, relative frequency, or percent frequency.
• Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.
Histogram

[Figure: Histogram of tune-up parts cost. Horizontal axis: Parts Cost ($), classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109; vertical axis: Frequency (0 to 18).]
Histogram
Symmetric
• Left tail is the mirror image of the right tail.
• Examples: heights or weights of people

[Figure: Symmetric histogram. Vertical axis: Relative Frequency (0 to .35).]
Histogram
Moderately Skewed Left
• A longer tail to the left
• Example: exam scores

[Figure: Moderately left-skewed histogram. Vertical axis: Relative Frequency (0 to .35).]
Histogram
Moderately Skewed Right
• A longer tail to the right
• Example: housing values

[Figure: Moderately right-skewed histogram. Vertical axis: Relative Frequency (0 to .35).]
Histogram
Highly Skewed Right
• A very long tail to the right
• Example: executive salaries

[Figure: Highly right-skewed histogram. Vertical axis: Relative Frequency (0 to .35).]
Cumulative Distributions

• Cumulative frequency distribution – shows the number of items with values less than or equal to the upper limit of each class.
• Cumulative relative frequency distribution – shows the proportion of items with values less than or equal to the upper limit of each class.
• Cumulative percent frequency distribution – shows the percentage of items with values less than or equal to the upper limit of each class.
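Cumulative Distributions
Tyre 4 Less Auto Repair

                 Cumulative    Cumulative Relative    Cumulative Percent
Cost ($)         Frequency     Frequency              Frequency
≤ 59                 2             .04                     4
≤ 69                15             .30                    30
≤ 79                31             .62                    62
≤ 89                38             .76                    76
≤ 99                45             .90                    90
≤ 109               50            1.00                   100

Example calculations for the ≤ 69 row: cumulative frequency = 2 + 13 = 15, cumulative relative frequency = 15/50 = .30, and cumulative percent frequency = .30(100) = 30.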
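Cumulative frequencies are running totals of the class frequencies, so `itertools.accumulate` produces them directly. The names in this sketch are illustrative.

```python
from itertools import accumulate

classes = ["50-59", "60-69", "70-79", "80-89", "90-99", "100-109"]
frequency = [2, 13, 16, 7, 7, 5]     # Tyre 4 Less frequency distribution
n = sum(frequency)                   # 50 tune-ups

cum_freq = list(accumulate(frequency))        # running totals: 2, 15, 31, ...
cum_rel = [cf / n for cf in cum_freq]         # cumulative relative frequency
cum_pct = [100 * cf / n for cf in cum_freq]   # cumulative percent frequency

for label, cf, cr, cp in zip(classes, cum_freq, cum_rel, cum_pct):
    upper = label.split("-")[1]               # upper class limit
    print(f"<= {upper:<4}{cf:>4}{cr:>7.2f}{cp:>6.0f}")
```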
The Five-Number Summary

1. Smallest Value
2. First Quartile
3. Median
4. Third Quartile
5. Largest Value
Five-Number Summary

Smallest Value = 425    First Quartile = 445    Median = 475
Third Quartile = 525    Largest Value = 615

Apartment rent data (70 values, sorted):
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
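The slides do not say how the quartiles were computed. The sketch below assumes a common textbook location rule: compute i = (p/100)n, and round up if i is not a whole number; otherwise average the values in positions i and i + 1. This rule reproduces the values above. Statistical software often uses other interpolation rules and can return slightly different quartiles. `percentile` is a hypothetical helper; it uses integer arithmetic so that whole-number locations are detected exactly.

```python
# Apartment rent data (70 values).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

def percentile(sorted_data, p):
    """p-th percentile (integer p, 0 < p < 100) of already-sorted data.

    Location i = (p/100) * n. If i is not a whole number, round up and take
    that position; otherwise average positions i and i + 1 (1-based).
    """
    whole, remainder = divmod(p * len(sorted_data), 100)
    if remainder:                     # i is not a whole number
        return sorted_data[whole]     # position whole + 1
    return (sorted_data[whole - 1] + sorted_data[whole]) / 2

data = sorted(rents)
five_number = (data[0], percentile(data, 25), percentile(data, 50),
               percentile(data, 75), data[-1])
print(five_number)   # → (425, 445, 475.0, 525, 615)
```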
Box Plot
• A box is drawn with its ends located at the first and third quartiles.
• A vertical line is drawn in the box at the location of the median (second quartile).

[Figure: Box plot of the apartment rent data on an axis from 375 to 625, with Q1 = 445, Q2 = 475, and Q3 = 525.]
Box Plot
• Limits are located (not drawn) using the interquartile range, IQR = Q3 - Q1 = 525 - 445 = 80.
• Data outside these limits are considered outliers.
• The location of each outlier is shown with the symbol *.
• The lower limit is located 1.5(IQR) below Q1.
  Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325
• The upper limit is located 1.5(IQR) above Q3.
  Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645
• There are no outliers (values less than 325 or greater than 645) in the apartment rent data.
Box Plot
• Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.

[Figure: Box plot of the apartment rent data on an axis from 375 to 625. Smallest value inside limits = 425; largest value inside limits = 615.]
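The sketch below locates the limits, outliers and whisker ends for the apartment rent data. It takes Q1 = 445 and Q3 = 525 from the five-number summary and computes the IQR as Q3 - Q1. The variable names are illustrative.

```python
# Apartment rent data (70 values).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

q1, q3 = 445, 525                   # quartiles from the five-number summary
iqr = q3 - q1                       # interquartile range
lower_limit = q1 - 1.5 * iqr        # limits are located, not drawn
upper_limit = q3 + 1.5 * iqr

# Values beyond the limits would be marked with * on the box plot.
outliers = [x for x in rents if x < lower_limit or x > upper_limit]
inside = [x for x in rents if lower_limit <= x <= upper_limit]
whisker_ends = (min(inside), max(inside))   # whiskers stop at these values

print("IQR:", iqr)
print("Limits:", lower_limit, upper_limit)
print("Outliers:", outliers)
print("Whiskers end at:", whisker_ends)
```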
Summary of Tabular and Graphical Procedures

Data
• Qualitative Data
  Tabular Methods: Frequency Distribution; Rel. Freq. Dist.; Percent Freq. Distribution
  Graphical Methods: Bar Graph; Pie Chart
• Quantitative Data
  Tabular Methods: Frequency Distribution; Rel. Freq. Dist.; Cum. Freq. Dist.; Cum. Rel. Freq. Distribution; Stem-and-Leaf Display
  Graphical Methods: Dot Plot; Histogram; Box Plot