Lecture 1. Part 1
Chapter 1
What is statistics?
What is statistics?
• We are constantly being bombarded with
statistics and statistical information. For
example:
– customer surveys
– economic predictions
– marketing information
– political polls (proportion of people voting
for candidate A or policy B)
• How can we make sense out of all this data?
What is statistics?
“Statistics is a way to get information from data”
Data → Statistics → Information
Data: Facts, especially numerical facts, collected together for reference or information.
Information: Knowledge communicated concerning some particular fact.
“Statistics is a tool for creating new understanding from a set of numbers”
Definitions: Oxford English Dictionary
Key statistical concepts
Population
• A population is the group of all items of interest to a
statistics practitioner.
• Frequently very large; sometimes infinite.
Example 1.3 (page 3): An Australian automobile club
and a new life insurance policy. The population is all
of the million or so current members of the automobile club.
Sample
• A sample is a set of data drawn from the population.
• Potentially very large, but smaller than the population.
Example 1.3 (contd.): A sample of 500 members of
the club selected.
Key statistical concepts
Parameter
• A descriptive measure of a population.
Example 1.3 (Contd.): the proportion of all
members who would purchase the new life
insurance policy
Statistic
• A descriptive measure of a sample.
Example 1.3 (Contd.): the proportion of 500
selected members who would purchase the
new life insurance policy
Key statistical concepts
[Diagram: the sample is a subset of the population; the population has a parameter and the sample has a statistic.]
• Populations have parameters
• Samples have statistics.
• Examples of parameters / populations and
statistics / samples?
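The difference between a parameter and a statistic can be sketched with a small simulation. The population below is synthetic: the population size of one million and the sample size of 500 echo Example 1.3, but the 12% purchase rate and the random seed are assumptions made purely for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable (arbitrary choice)

# Synthetic population: 1 = would purchase the policy, 0 = would not.
# The 12% purchase rate is an assumption for illustration only.
POPULATION_SIZE = 1_000_000
buyers = 120_000
population = [1] * buyers + [0] * (POPULATION_SIZE - buyers)

# Parameter: a descriptive measure of the whole population.
parameter = sum(population) / len(population)

# Sample: 500 members drawn without replacement from the population.
sample = random.sample(population, 500)

# Statistic: the same descriptive measure, computed on the sample only.
statistic = sum(sample) / len(sample)

print(f"population proportion (parameter): {parameter:.3f}")
print(f"sample proportion (statistic):     {statistic:.3f}")
```

The parameter is fixed once the population is fixed, whereas the statistic changes from sample to sample; rerunning with a different seed shows this.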
Descriptive statistics
• Methods of organizing, summarizing, and
presenting data in a convenient and informative
way. These methods include:
– graphical techniques (Chapter 2), and
– numerical techniques (Chapter 4).
• The actual method used depends on what
information you would like to extract. Are you
interested in:
– measure(s) of central location and/or
– measure(s) of variability (dispersion)?
Descriptive statistics helps to answer these questions.
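As a minimal sketch of numerical descriptive measures, the snippet below computes two measures of central location and two measures of variability with Python's standard `statistics` module. The seven exam marks are small illustrative values, and the choice of these four particular measures is an assumption rather than a prescribed list.

```python
import statistics

# Illustrative exam marks for seven students.
marks = [67, 74, 71, 83, 93, 55, 48]

# Measures of central location
mean_mark = statistics.mean(marks)
median_mark = statistics.median(marks)

# Measures of variability (dispersion)
mark_range = max(marks) - min(marks)
sample_sd = statistics.stdev(marks)  # sample standard deviation (n - 1 divisor)

print(f"mean   = {mean_mark:.2f}")
print(f"median = {median_mark}")
print(f"range  = {mark_range}")
print(f"st.dev = {sample_sd:.2f}")
```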
Inferential statistics
• Descriptive statistics describes the data set
that is being analysed, but does not allow us
to draw any conclusions or make any
inferences about the population the data
came from. Hence we need another branch
of statistics: inferential statistics.
• Inferential statistics is also a set of methods,
but it is used to draw conclusions or inferences
about characteristics of populations based on
data from a sample.
Statistical inference
• Statistical inference is the process of making
an estimate, prediction, or decision about a
population based on a sample.
[Diagram: a sample is drawn from the population; inference runs from the sample statistic back to the population parameter.]
What can we infer about a population’s parameters
based on a sample’s statistics?
Statistical inference
• We use the sample statistics to make inferences
about the population parameters.
• Therefore, we can make an estimate, prediction,
or decision about a population based on sample
data.
• Thus, we can apply what we know about a sample
to the larger population from which it was drawn.
Example 1.3 (contd.): Suppose 60 out of 500
selected members want to purchase the new life
insurance policy (12%). A statistical inference
may then be made: about 12% (or at least 10%) of all
one million members would purchase the new
policy.
Statistical inference
• Rationale
– Large populations make investigating each
member impractical and expensive.
– It is easier and cheaper to take a sample
and make estimates about the population
from the sample.
• However
– Such conclusions and estimates are not
always going to be correct.
– For this reason, we build into the statistical
inference 'measures of reliability', namely
confidence level and significance level.
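To illustrate how a confidence level attaches a measure of reliability to an estimate, the sketch below builds an interval estimate around the 12% sample proportion from Example 1.3. The 95% confidence level and the normal-approximation formula are assumptions chosen for illustration; these notes do not specify either.

```python
import math
from statistics import NormalDist

# Figures from Example 1.3: 60 of the 500 sampled members would buy.
n = 500
successes = 60
p_hat = successes / n  # sample statistic (point estimate)

# Assumptions: a 95% confidence level and the normal approximation
# to the sampling distribution of a proportion.
confidence_level = 0.95
z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)

standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin = z * standard_error
lower, upper = p_hat - margin, p_hat + margin

print(f"point estimate: {p_hat:.3f}")
print(f"{confidence_level:.0%} interval: ({lower:.3f}, {upper:.3f})")
```

Under these assumptions the interval runs from roughly 9% to 15%, which shows why a single sample proportion should be reported together with its reliability.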
Statistical applications in
economics & business
• Statistical analysis plays an important role in virtually all
aspects of business and economics.
• Throughout this course, we will see applications of
statistics in:
– accounting
– economics
– finance
– human resources management
– marketing
– and operations management.
Lecture 1. Part 2
Chapter 2
Graphical descriptive methods
2.1 Types of data
2.2 Graphical and tabular techniques for
nominal data
Re-cap
Descriptive statistics involves arranging,
summarizing, and presenting a set of data in such a
way that useful information is produced.
Data → Statistics → Information
Its methods make use of graphical techniques and
numerical descriptive measures (such as averages)
to summarize and present the data.
Some useful definitions
A variable is some characteristic of a population
or sample. E.g. the marks of IB2015D on the
math exam (example, page 21)
Typically denoted with a capital letter: X, Y, Z…
The values of a variable are the possible
observations of that variable.
E.g. student marks 0, 1, 2, …, 100
Data are the observed values of a variable.
E.g. student marks: {67, 74, 71, 83, 93, 55, 48}
2.1 Types of data
Data (at least for purposes of statistics) fall into
three main groups:
• Numerical (interval or quantitative) data
• Nominal (categorical or qualitative) data
• Ordinal (ranked) data
Types of data – Example page 21
Numerical data
  age    income
  55     75 000
  42     68 000
  .      .
  weight gain: +10, +5, ...
Nominal data
  person    married
  1         yes
  2         no
  3         no
  .         .
  computer brand: 1 IBM, 2 Dell, 3 Compaq, 4 IBM, ...
Ordinal data
  exam grade: HD, D, C, P, F
  food quality: Excellent, Good, Satisfactory, Poor
With nominal data, all we can calculate is the proportion of data that falls into each category.
  IBM    Dell    Compaq    other    total
  25     11      8         6        50
  50%    22%     16%       12%
With ordinal data, all we can use is computations involving the ordering process.
Calculations for Types of Data
As mentioned above,
• All calculations are permitted on numerical
data.
• No calculations are allowed for nominal data,
except counting the number of observations in
each category and calculating their proportions.
• Only calculations involving a ranking process
are allowed for ordinal data.
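The three rules above can be sketched in code. All values below are illustrative assumptions; the grade ordering F < P < C < D < HD is also assumed from the usual meaning of these grade labels.

```python
from collections import Counter
from statistics import mean

# Numerical data: arithmetic is meaningful (ages are illustrative values).
ages = [55, 42, 37, 61]
average_age = mean(ages)

# Nominal data: only counts and proportions per category make sense.
married = ["yes", "no", "no", "yes", "no"]
counts = Counter(married)
proportions = {category: count / len(married) for category, count in counts.items()}

# Ordinal data: categories have an order, so rank-based summaries
# such as the median are allowed, but averaging the labels is not.
grade_order = ["F", "P", "C", "D", "HD"]  # assumed order, lowest to highest
grades = ["P", "HD", "C", "D", "C"]
ranked = sorted(grades, key=grade_order.index)
median_grade = ranked[len(ranked) // 2]  # middle value (odd number of grades)

print(f"average age:  {average_age}")
print(f"proportions:  {proportions}")
print(f"median grade: {median_grade}")
```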
2.2 Graphical and tabular
techniques for nominal data
The only allowable calculation on nominal data is
to count the frequency of each value of the
variable.
We can summarize the data in a table that
presents the categories and their counts, called a
frequency distribution.
A relative frequency distribution lists the
categories and the proportion with which each
occurs.
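A frequency distribution and its relative frequency distribution can be built in a few lines. This sketch reuses the computer-brand counts from the types-of-data example (IBM 25, Dell 11, Compaq 8, other 6); the raw observation list is reconstructed from those counts, and `collections.Counter` is simply one convenient way to do the counting.

```python
from collections import Counter

# Raw nominal observations, rebuilt from the computer-brand counts.
observations = ["IBM"] * 25 + ["Dell"] * 11 + ["Compaq"] * 8 + ["other"] * 6

# Frequency distribution: category -> count
frequency = Counter(observations)
total = sum(frequency.values())

# Relative frequency distribution: category -> proportion of all observations
relative_frequency = {brand: count / total for brand, count in frequency.items()}

print(f"{'Brand':<8}{'Count':>6}{'Proportion':>12}")
for brand, count in frequency.items():
    print(f"{brand:<8}{count:>6}{relative_frequency[brand]:>12.0%}")
```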
Example 2.1 (page 28)
• To determine the approximate market share of
various women’s magazines in New Zealand, a
women’s magazine readership survey was
conducted using a sample of 200 readers.
• Data were collected and the count of the
occurrences (frequencies) was recorded for each
magazine.
• The frequencies were presented in a bar chart.
• Then the frequencies were converted to
proportions and the results were presented in a
pie chart.
Example 2.1 (contd.)
1 = Australian Women’s Weekly (NZ Edition); 2 = Next;
3 = NZ New Idea; 4 = NZ Woman’s Day; 5 = NZ Women’s
Weekly; and 6 = That’s Life.
Magazine                                 Frequency  Percentage
Australian Women’s Weekly, NZ Edn (1)    36         18
Next (2)                                 20         10
NZ New Idea (3)                          28         14
NZ Woman’s Day (4)                       56         28
NZ Women’s Weekly (5)                    42         21
That’s Life (6)                          18         9
Total                                    200        100
Example 2.1 (contd.) (Excel representation)
[Figure: Excel bar chart and pie chart of the magazine data]
The size of each slice in a pie chart is proportional
to the percentage corresponding to the category it
represents. For example, a category with 10% of the
observations gets a slice of
(10/100)(360°) = 36°
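The slice calculation can be applied to every category in Example 2.1. This is a minimal sketch using the frequencies from the magazine table; the variable names are illustrative.

```python
# Frequencies from Example 2.1 (sample of 200 readers).
frequencies = {
    "Australian Women's Weekly (NZ Edition)": 36,
    "Next": 20,
    "NZ New Idea": 28,
    "NZ Woman's Day": 56,
    "NZ Women's Weekly": 42,
    "That's Life": 18,
}

total = sum(frequencies.values())

# Each slice angle is the category's share of the full 360-degree circle.
percentages = {name: 100 * count / total for name, count in frequencies.items()}
angles = {name: 360 * count / total for name, count in frequencies.items()}

for name in frequencies:
    print(f"{name:<40}{percentages[name]:>5.0f}%{angles[name]:>8.1f} deg")
```

The angles add up to 360 degrees, just as the percentages add up to 100.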
Summary: page 76
Home assignment:
- Section 2.1 Exercises pages 26-27: 2.3, 2.5, 2.8
- Section 2.2 Exercises pages 38-43: 2.11, 2.12, 2.13
- Sections 2.1 & 2.2 Supplementary exercises pages 77-79: 2.74