Statistics
What is Statistics?
• Science of gathering, analyzing, interpreting,
and presenting data
• Branch of mathematics
• Facts and figures
• Measurement taken on a sample
Statistics is the scientific method that
enables us to make decisions as responsibly
as possible.
Statistics in Business
• Accounting — auditing and cost estimation
• Economics — regional, national, and international economic performance
• Finance — investments and portfolio management
• Management — human resources, compensation, and quality
management
• Management Information Systems — (ERP): performance of systems
which gather, summarize, and disseminate information to various
managerial levels
• Marketing — market analysis and consumer research
• International Business — market and demographic analysis
Statistics…
• The science of data to answer research
questions
– Formulate research question(s) (hypothesis)
– Collect data
– Analyze and summarize data
– Draw conclusions to answer research
questions
• Statistical Inference
– In the presence of variation
Answers Questions from Everyday
Life
• Business: Will a new marketing strategy be
profitable?
• Industry: Will a product’s life exceed the
warranty period?
• Medicine: Will this year’s flu vaccine reduce the
chance of flu?
• Education: Will technology improve learning?
• Government: Will a change in interest rates
affect inflation?
Statistics: Science of
variability?
• Virtually everything varies
• Variation occurs among individuals
• Variation occurs within any one individual
as time passes
Population Versus Sample
• Population — the whole
– a collection of persons, objects, or items under study
– The entire group of individuals in a statistical study we
want information about.
• Census — gathering data from the entire
population
• Sample — a portion of the whole
– a subset of the population
– a part of the population from which we actually collect
information, used to draw conclusions about the
whole (statistical inference)
Statistics can be split into two
broad categories
1. Descriptive statistics
2. Inferential statistics
Descriptive Statistics
Collect data
ex. Survey
Present data
ex. Tables and graphs
Characterize data
ex. Sample mean: $\bar{X} = \frac{\sum X_i}{n}$
Descriptive statistics..
• Encompasses the following:
– Graphical or pictorial display
– Condensation of large masses of data into a
form such as tables
– Preparation of summary measures to give a
concise description of complex information (e.g.
an average figure)
– Exhibition of patterns that may be found in sets
of information
Inferential Statistics
Estimation
ex. Estimate the population
mean weight using the
sample mean weight
Hypothesis testing
ex. Test the claim that the
population mean weight is
120 pounds
Drawing conclusions and/or making decisions
concerning a population based on sample results.
Inferential Statistics..
• Especially relates to:
– Determining whether characteristics of a situation
are unusual or if they have happened by chance
– Estimating values of numerical quantities and
determining the reliability of those estimates
– Using past occurrences to attempt to predict the
future
Process of Inferential Statistics
1. Select a random sample from the population.
2. Calculate the sample mean $\bar{x}$ (statistic).
3. Use $\bar{x}$ to estimate the population mean $\mu$ (parameter).
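The sampling process can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 1,000 weights is simulated (it is not data from these slides), and the names `population`, `sample`, `mu`, and `x_bar` are chosen here for readability.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1,000 simulated weights in pounds (assumption).
population = [random.gauss(120, 15) for _ in range(1000)]
mu = statistics.mean(population)      # parameter (usually unknown in practice)

# Select a random sample of 50 items without replacement.
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)       # statistic computed from the sample

print(f"population mean (parameter): {mu:.1f}")
print(f"sample mean (statistic):     {x_bar:.1f}")
```

In practice only the sample is observed, so `x_bar` is what gets reported as the estimate of the population mean.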
Population vs. Sample
• Population: measures used to describe the population are called parameters
• Sample: measures computed from sample data are called statistics
Parameter vs. Statistic
• Parameter — descriptive measure of the
population
– Usually represented by Greek letters
• Statistic — descriptive measure of a
sample
– Usually represented by Roman letters
Symbols for Population Parameters
and Sample Statistics
$\mu$ denotes population mean
$\sigma^2$ denotes population variance
$\sigma$ denotes population standard deviation
$\bar{x}$ denotes sample mean
$S^2$ denotes sample variance
$S$ denotes sample standard deviation
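To make the notation concrete, the sketch below uses Python's standard `statistics` module. The five data values are invented for illustration, and treating the same list first as a whole population and then as a sample is an assumption made only to contrast the two sets of symbols.

```python
import statistics

# Hypothetical data, invented for illustration only.
data = [2, 4, 4, 6, 9]

# Treating the data as the entire population (parameters).
mu = statistics.mean(data)             # population mean
sigma_sq = statistics.pvariance(data)  # population variance (divides by N)
sigma = statistics.pstdev(data)        # population standard deviation

# Treating the same data as a sample (statistics).
x_bar = statistics.mean(data)          # sample mean
s_sq = statistics.variance(data)       # sample variance (divides by n - 1)
s = statistics.stdev(data)             # sample standard deviation

print(mu, sigma_sq, sigma)
print(x_bar, s_sq, s)
```

The means agree, but the sample variance is larger than the population variance because it divides by n - 1 rather than N.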
Types of Variables
Categorical (qualitative) variables have values
that can only be placed into categories, such as
“yes” and “no.”
Numerical (quantitative) variables have
values that represent quantities.
Types of Variables
Data
• Categorical (defined categories)
– Examples: Marital Status, Political Party, Eye Color
• Numerical
– Discrete (counted items). Examples: Number of Children, Defects per hour
– Continuous (measured characteristics). Examples: Weight, Voltage
Levels of Data Measurement
• Nominal — Lowest level of measurement
• Ordinal
• Interval
• Ratio — Highest level of measurement
Levels of Measurement
A nominal scale classifies data into distinct
categories in which no ranking is implied.
Categorical Variable          Categories
Personal Computer Ownership   Yes / No
Type of Stocks Owned          Growth / Value / Other
Internet Provider             Microsoft Network / AOL
Levels of Measurement
An ordinal scale classifies data into distinct
categories in which ranking is implied
Categorical Variable             Ordered Categories
Student class designation        Freshman, Sophomore, Junior, Senior
Product satisfaction             Satisfied, Neutral, Unsatisfied
Faculty rank                     Professor, Associate Professor, Assistant Professor, Instructor
Standard & Poor’s bond ratings   AAA, AA, A, BBB, BB, B, CCC, CC, C, DDD, DD, D
Student Grades                   A, B, C, D, F
Levels of Measurement
An interval scale is an ordered scale in which the
difference between measurements is a meaningful
quantity but the measurements do not have a true
zero point.
A ratio scale is an ordered scale in which the
difference between the measurements is a
meaningful quantity and the measurements have a
true zero point.
Interval and Ratio Scales
Usage Potential of Various
Levels of Data
Ratio
Interval
Ordinal
Nominal
Data Level, Operations,
and Statistical Methods
Data Level   Meaningful Operations                               Statistical Methods
Nominal      Classifying and Counting                            Nonparametric
Ordinal      All of the above plus Ranking                       Nonparametric
Interval     All of the above plus Addition, Subtraction         Parametric
Ratio        All of the above plus Multiplication and Division   Parametric
Data preparation rules
• Data presented must be
– factual
– relevant
Before presentation, always check:
• the source of the data
• that the data have been accurately
transcribed
• that the figures are relevant to the problem
Methods of visual presentation
of data
• Table
        1st Qtr   2nd Qtr   3rd Qtr   4th Qtr
East    20.4      27.4      90        20.4
West    30.6      38.6      34.6      31.6
North   45.9      46.9      45        43.9
Methods of visual presentation
of data
• Graphs
[Figure: column chart of East, West, and North values by quarter (1st Qtr to 4th Qtr); vertical axis from 0 to 90]
Methods of visual presentation
of data
• Pie chart
[Figure: pie chart with one slice per quarter (1st Qtr, 2nd Qtr, 3rd Qtr, 4th Qtr)]
Methods of visual presentation
of data
• Multiple bar chart
[Figure: horizontal multiple bar chart of East, West, and North for each quarter (1st Qtr to 4th Qtr); horizontal axis from 0 to 100]
Methods of visual presentation
of data
• Simple pictogram
[Figure: simple pictogram of East, West, and North values by quarter (1st Qtr to 4th Qtr); vertical axis from 0 to 100]
Frequency distributions
• Frequency tables
Observation Table
Class Interval   Frequency   Cumulative Frequency
<20              13          13
<40              18          31
<60              25          56
<80              15          71
<100             9           80
Frequency diagrams
[Figures: bar chart of Frequency and line chart of Cumulative Frequency by class interval (<20, <40, <60, <80, <100)]
Ungrouped Versus Grouped
Data
• Ungrouped data
– have not been summarized in any way
– are also called raw data
• Grouped data
– have been organized into a frequency
distribution
Example of Ungrouped Data
Ages of a Sample of Managers from XYZ
42 26 32 34 57
30 58 37 50 30
53 40 30 47 49
50 40 32 31 40
52 28 23 35 25
30 36 32 26 50
55 30 58 64 52
49 33 43 46 32
61 31 30 40 60
74 37 29 43 54
Frequency Distribution of
Ages
Class Interval Frequency
20-under 30 6
30-under 40 18
40-under 50 11
50-under 60 11
60-under 70 3
70-under 80 1
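The step from the raw ages to this frequency distribution can be reproduced with a short Python sketch. The variable names are chosen here for illustration; "20-under 30" is read as 20 <= age < 30.

```python
# Ages of the 50 managers, copied from the ungrouped data.
ages = [
    42, 26, 32, 34, 57,
    30, 58, 37, 50, 30,
    53, 40, 30, 47, 49,
    50, 40, 32, 31, 40,
    52, 28, 23, 35, 25,
    30, 36, 32, 26, 50,
    55, 30, 58, 64, 52,
    49, 33, 43, 46, 32,
    61, 31, 30, 40, 60,
    74, 37, 29, 43, 54,
]

class_width = 10
lower_limits = range(20, 80, class_width)  # 20, 30, ..., 70

# Count how many ages fall in each class: low <= age < low + class_width.
frequencies = {
    f"{low}-under {low + class_width}": sum(
        low <= age < low + class_width for age in ages
    )
    for low in lower_limits
}

for label, count in frequencies.items():
    print(f"{label:<12} {count}")
```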
Data Range
For the ages of the sample of managers:
Smallest = 23
Largest = 74
Range = Largest - Smallest
= 74 - 23
= 51
Number of Classes and Class
Width
• The number of classes should be between 5 and 15.
• Fewer than 5 classes cause excessive summarization.
• More than 15 classes leave too much detail.
• Class Width
• Divide the range by the number of classes for an
approximate class width
• Round up to a convenient number
Approximate Class Width $= \frac{51}{6} = 8.5$
Class Width = 10
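The range and class-width arithmetic can be checked with a small Python sketch. The choice of six classes comes from the slide; rounding up to the next multiple of 10 is an assumed reading of "a convenient number", not a fixed rule.

```python
import math

largest, smallest = 74, 23
data_range = largest - smallest            # 74 - 23 = 51

num_classes = 6
approx_width = data_range / num_classes    # 51 / 6 = 8.5

# Assumption: "convenient" means the next multiple of 10.
convenient_step = 10
class_width = math.ceil(approx_width / convenient_step) * convenient_step

print(data_range, approx_width, class_width)
```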
Class Midpoint
Class Midpoint $= \frac{\text{beginning class endpoint} + \text{ending class endpoint}}{2} = \frac{30 + 40}{2} = 35$
Class Midpoint $= \text{class beginning point} + \frac{1}{2}(\text{class width}) = 30 + \frac{1}{2}(10) = 35$
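Both midpoint formulas give the same result for every class, as the following Python sketch suggests. The class endpoints are those of the age distribution; the list names are illustrative.

```python
class_width = 10
lower_endpoints = [20, 30, 40, 50, 60, 70]

# Formula 1: average of the beginning and ending class endpoints.
midpoints_by_average = [
    (low + (low + class_width)) / 2 for low in lower_endpoints
]

# Formula 2: beginning point plus half the class width.
midpoints_by_half_width = [low + class_width / 2 for low in lower_endpoints]

print(midpoints_by_average)
print(midpoints_by_half_width)
```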
Relative Frequency
Class Interval   Frequency   Relative Frequency
20-under 30      6           .12   (6/50)
30-under 40      18          .36   (18/50)
40-under 50      11          .22
50-under 60      11          .22
60-under 70      3           .06
70-under 80      1           .02
Total            50          1.00
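Each relative frequency is the class frequency divided by the total number of observations. A minimal Python sketch, using the frequencies from the table (variable names are illustrative):

```python
frequencies = [6, 18, 11, 11, 3, 1]
total = sum(frequencies)  # 50 observations

# Relative frequency = class frequency / total frequency.
relative_frequencies = [f / total for f in frequencies]

for f, rel in zip(frequencies, relative_frequencies):
    print(f"{f:>3}  {rel:.2f}")
```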
Cumulative Frequency
Class Interval   Frequency   Cumulative Frequency
20-under 30      6           6
30-under 40      18          24   (18 + 6)
40-under 50      11          35   (11 + 24)
50-under 60      11          46
60-under 70      3           49
70-under 80      1           50
Total            50
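A cumulative frequency is a running total of the class frequencies. One way to sketch it in Python is with `itertools.accumulate` from the standard library:

```python
from itertools import accumulate

frequencies = [6, 18, 11, 11, 3, 1]

# Running total: each entry adds the current class to all earlier classes.
cumulative_frequencies = list(accumulate(frequencies))

print(cumulative_frequencies)
```

The last cumulative frequency always equals the total number of observations.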
Class Midpoints, Relative Frequencies,
and Cumulative Frequencies
Class Interval   Frequency   Midpoint   Relative Frequency   Cumulative Frequency
20-under 30      6           25         .12                  6
30-under 40      18          35         .36                  24
40-under 50      11          45         .22                  35
50-under 60      11          55         .22                  46
60-under 70      3           65         .06                  49
70-under 80      1           75         .02                  50
Total            50                     1.00
Cumulative Relative Frequencies
Class Interval   Frequency   Relative Frequency   Cumulative Frequency   Cumulative Relative Frequency
20-under 30      6           .12                  6                      .12
30-under 40      18          .36                  24                     .48
40-under 50      11          .22                  35                     .70
50-under 60      11          .22                  46                     .92
60-under 70      3           .06                  49                     .98
70-under 80      1           .02                  50                     1.00
Total            50          1.00
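The cumulative relative frequency column can be obtained by dividing each cumulative frequency by the total. The sketch below is one possible way to compute it in Python; the names are illustrative.

```python
from itertools import accumulate

frequencies = [6, 18, 11, 11, 3, 1]
total = sum(frequencies)

# Cumulative relative frequency = cumulative frequency / total frequency.
cumulative_relative = [c / total for c in accumulate(frequencies)]

print([f"{value:.2f}" for value in cumulative_relative])
```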
Common Statistical Graphs
• Histogram -- vertical bar chart of frequencies
• Frequency Polygon -- line graph of frequencies
• Ogive -- line graph of cumulative frequencies
• Pie Chart -- proportional representation for
categories of a whole
• Stem and Leaf Plot
• Pareto Chart
• Scatter Plot
Histogram
Class Interval   Frequency
20-under 30      6
30-under 40      18
40-under 50      11
50-under 60      11
60-under 70      3
70-under 80      1
[Figure: histogram of Frequency (vertical axis, 0 to 20) against Years (horizontal axis, 0 to 80)]
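Without a plotting library, the shape of this histogram can still be previewed as text. The sketch below is a rough stand-in rather than a true histogram: it prints one row of asterisks per class, with bar length equal to the class frequency.

```python
class_labels = [
    "20-under 30", "30-under 40", "40-under 50",
    "50-under 60", "60-under 70", "70-under 80",
]
frequencies = [6, 18, 11, 11, 3, 1]

# One text bar per class; the number of asterisks equals the frequency.
bars = [
    f"{label:<12} {'*' * count} ({count})"
    for label, count in zip(class_labels, frequencies)
]

print("\n".join(bars))
```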
Histogram Construction
Class Interval   Frequency
20-under 30      6
30-under 40      18
40-under 50      11
50-under 60      11
60-under 70      3
70-under 80      1
[Figure: construction of the histogram; Frequency (vertical axis, 0 to 20) against Years (horizontal axis, 0 to 80)]
Frequency Polygon
Class Interval   Frequency
20-under 30      6
30-under 40      18
40-under 50      11
50-under 60      11
60-under 70      3
70-under 80      1
[Figure: frequency polygon of Frequency (vertical axis, 0 to 20) against Years (horizontal axis, 0 to 80)]
Ogive
Class Interval   Cumulative Frequency
20-under 30      6
30-under 40      24
40-under 50      35
50-under 60      46
60-under 70      49
70-under 80      50
[Figure: ogive of cumulative Frequency (vertical axis, 0 to 60) against Years (horizontal axis, 0 to 80)]
Relative Frequency Ogive
Class Interval   Cumulative Relative Frequency
20-under 30      .12
30-under 40      .48
40-under 50      .70
50-under 60      .92
60-under 70      .98
70-under 80      1.00
[Figure: ogive of Cumulative Relative Frequency (vertical axis, 0.00 to 1.00) against Years (horizontal axis, 0 to 80)]
Complaints by Passengers
[Figure: pie chart of complaints by category]
• Stations, Etc.: 40%
• Train Performance: 21%
• Equipment: 15%
• Personnel: 14%
• Schedules, Etc.: 10%
Second Quarter Truck Production
Company   2d Quarter Truck Production
A         357,411
B         354,936
C         160,997
D         34,099
E         12,747
Totals    920,190
Pie Chart Calculations for
Company A
Company   2d Quarter Truck Production   Proportion   Degrees
A         357,411                       .388         140
B         354,936                       .386         139
C         160,997                       .175         63
D         34,099                        .037         13
E         12,747                        .014         5
Totals    920,190                       1.000        360
For Company A:
$\frac{357{,}411}{920{,}190} = .388$
$.388 \times 360 \approx 140$
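The same calculation can be repeated for every company with a short Python sketch. The production figures come from the table; the dictionary and variable names are illustrative. Each slice angle is the company's proportion of the total multiplied by 360 degrees.

```python
production = {
    "A": 357_411,
    "B": 354_936,
    "C": 160_997,
    "D": 34_099,
    "E": 12_747,
}
total = sum(production.values())

# Proportion of the whole, then the matching slice angle in degrees.
proportions = {company: units / total for company, units in production.items()}
degrees = {company: share * 360 for company, share in proportions.items()}

for company, units in production.items():
    print(
        f"{company}  {units:>9,}  "
        f"{proportions[company]:.3f}  {degrees[company]:.0f}"
    )
```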
Second Quarter
Truck Production
[Figure: pie chart of second quarter truck production by company: A 39%, B 39%, C 17%, D 4%, E 1%]
Pareto Chart
[Figure: Pareto chart of Frequency (left axis, 0 to 100) with cumulative percentage (right axis, 0% to 100%) for the categories Poor Wiring, Short in Coil, Defective Plug, and Other]
Scatter Plot
Registered Vehicles (1000's)   Gasoline Sales (1000's of Gallons)
5                              60
15                             120
9                              90
15                             140
7                              60
[Figure: scatter plot of Gasoline Sales (vertical axis, 0 to 200) against Registered Vehicles (horizontal axis, 0 to 20)]
Principles of Excellent Graphs
The graph should not distort the data.
The graph should not contain unnecessary
adornments (sometimes referred to as chart junk).
The scale on the vertical axis should begin at zero.
All axes should be properly labeled.
The graph should contain a title.
The simplest possible graph should be used for a
given set of data.
Graphical Errors: Chart Junk
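Minimum Wage data: 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80
[Figure, Bad Presentation: Minimum Wage shown with decorative chart junk]
[Figure, Good Presentation: Minimum Wage in $ (vertical axis, 0 to 4) by year (1960, 1970, 1980, 1990)]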
Graphical Errors:
Compressing the Vertical Axis
[Figure, Bad Presentation: Quarterly Sales in $ for Q1 to Q4, vertical axis from 0 to 200]
[Figure, Good Presentation: Quarterly Sales in $ for Q1 to Q4, vertical axis from 0 to 50]
Graphical Errors: No Zero Point
on the Vertical Axis
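[Figure, Bad Presentation: Monthly Sales in $ for J, F, M, A, M, J, vertical axis from 36 to 45 with no zero point]
[Figure, Good Presentations: Monthly Sales in $ for J, F, M, A, M, J, vertical axis showing 0, 36, 39, 42, 45]
Graphing the first six months of sales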