
Introduction to Descriptive Statistics

The document provides an introduction to descriptive statistics, covering key concepts such as the definition of statistics, types of statistics (descriptive and inferential), and the importance of statistical methods in decision-making across various fields. It also discusses data collection methods, sampling techniques, and levels of measurement, including nominal, ordinal, interval, and ratio scales. Additionally, it highlights the significance of understanding variables and statistical studies in analyzing data effectively.

Uploaded by niizzatiisme
Copyright © All Rights Reserved

WEEK 1

CHAPTER 1 : DESCRIPTIVE STATISTICS


I) INTRODUCTION TO STATISTICS
LEARNING OUTCOMES
• What is statistics?
• Uses of statistics
• Types of statistics
• Common statistical terms
• Sources of data
• Types of variables
• Levels of measurement
• Data collection
• Sampling techniques
WHY STUDY STATISTICS?
• Data are everywhere.
• Statistical techniques are used to make many decisions that affect our lives.
• No matter what your career, you will make professional decisions that involve data. An understanding of statistical methods will help you make these decisions effectively.
WHAT IS MEANT BY STATISTICS?
In the more common usage, statistics refers to numerical information, for example:
✓ the average starting salary of college graduates
✓ the number of deaths due to motor accidents last year
✓ the number of students enrolled at a university this semester
We often present statistical information in graphical form to capture the reader's attention and to portray a large amount of information.
FORMAL DEFINITION
STATISTICS is the science of collecting, organizing, presenting, analyzing and interpreting numerical data to assist in making more effective decisions.
WHO USES STATISTICS?
01 Medicine
• Effectiveness of drugs
• Predict diseases
02 Education
• Predict most favourite subject
• Predict CGPA
03 Business
• Predict sales
• Consumer preferences
• Financial trends
04 Others
• Law: organize evidence to make decisions
TYPES OF STATISTICS
Descriptive statistics (describe the situation)
• Methods of organizing, summarizing, and presenting data in an informative way.
Inferential statistics
• A decision, estimate, prediction, or generalization about a population, based on a sample (making inferences about a population based on a sample).
EXAMPLE
Out of 350 randomly selected students in the faculty FSKM, Shah Alam, 180 students had the first name Mohd.
Descriptive: "51% of these students have the first name Mohd."
Inferential: "51% of FSKM students have the first name Mohd."
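The descriptive figure in this example is plain arithmetic on the sample counts. The short Python sketch below reproduces it; it only computes the sample percentage, and stretching that number to all FSKM students is the inferential step, which the arithmetic itself does not justify.

```python
# Descriptive statistic: the share of the 350 sampled FSKM students named Mohd.
named_mohd = 180
sample_size = 350

proportion = named_mohd / sample_size      # about 0.514
percent = round(100 * proportion)          # rounded to a whole percent

print(f"{percent}% of these {sample_size} students have the first name Mohd.")
# prints: 51% of these 350 students have the first name Mohd.
```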
COMMON STATISTICAL TERMS
A population (universe) is a collection of all possible individuals, objects, or measurements of interest.
A sample is a portion, or part, of the population of interest.

Population: all items of interest
• Census: a study that involves the whole population
• Parameter: a summary measure of the whole population
Population parameter: a numerical value that describes a characteristic of a population
Calculation: calculated using data from the entire population
Purpose: used to describe the entire population
Example: the population mean income of all households in a country

Sample: items selected from the population (a portion of the population)
• Sample survey: a study that involves a subgroup (or sample) of the selected population
• Statistic: a summary measure computed from sample data
Sample statistic: a numerical value that describes a characteristic of a sample
Calculation: calculated using data from a sample
Purpose: used to estimate the population parameter
Example: the sample mean income of a randomly selected group of households
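The difference between a parameter and a statistic can be seen by computing the same summary (the mean) twice: once over every unit, once over a random subset. In the Python sketch below the ten incomes are made-up numbers for a tiny hypothetical "population", and the seed is an arbitrary choice so the run is reproducible.

```python
import random
import statistics

# Hypothetical monthly household incomes for a tiny "population" of 10 households.
# These numbers are invented purely for illustration.
population_incomes = [2100, 3500, 2800, 4200, 1900, 5100, 3000, 2600, 3900, 4400]

# Population parameter: computed from every member of the population (a census).
population_mean = statistics.mean(population_incomes)

# Sample statistic: computed from a random subset of the population (a sample survey).
rng = random.Random(1)                      # fixed seed so the sketch is reproducible
sample_incomes = rng.sample(population_incomes, k=4)
sample_mean = statistics.mean(sample_incomes)

print("population mean (parameter):", population_mean)
print("sample mean (statistic):", sample_mean)
```

The parameter is fixed (here 3350), while the statistic changes from sample to sample; that variability is why a sample mean is used to estimate, not to state, the population mean.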

• a small survey taken in advance


Pilot Study
Pilot study before a major observations
Definition: A pre-study which is conducted on a small scale before a larger research effort
Purpose: Can help identify design issues and evaluate a study’s feasibility, practicality,
resources, time, and cost before the main research is conducted. Feedback from
participants in the pilot study can be used to improve the experience for participants in the
main study.
SOURCES OF DATA
Primary data
• First-hand data
• Collected by the investigator
• E.g. interviewing respondents, surveys, experiments
• Advantage: more accurate and consistent; the investigator is able to explain how the data were collected and their limitations
• Disadvantage: requires more time and manpower, and has a high cost

Secondary data
• Taken from another investigator's collection of figures
• Data collected by other parties
• E.g. Bank Negara, Statistics Department
• Advantage: easily accessible from the internet, journals, books, annual reports etc.; inexpensive and takes less time to collect
• Disadvantage: may lack accuracy because the method of data collection is not explained, and may be biased because the original purpose of the data collection is not known
TYPES OF VARIABLES
Qualitative (categorical)
• e.g. make of a computer, hair colour, gender
Quantitative
• Discrete: mostly integers or numbers used for counting; e.g. number of houses, cars, accidents
• Continuous: obtained through a measuring process; e.g. length, age, height, weight, time
LEVELS OF MEASUREMENT
• Nominal: categorical (names)
• Ordinal: nominal, plus can be ranked (order)
• Interval: ordinal, plus intervals are consistent
• Ratio: interval, plus ratios are consistent and there is a true zero
i. NOMINAL DATA
• Represents observations that can be categorized but do not have a meaningful numeric value
• Examples: gender, religion, nationality, favourite colour, number on a football jersey

Properties:
1. Observations of a qualitative variable can only be classified and counted.
2. There is no particular order to the labels.

Note:
• The values cannot be compared to see if one is larger than the other
• The MEAN cannot be calculated
ii. ORDINAL DATA
• Represents observations that can be categorized and rank ordered
• The values can be compared to see if one is larger or smaller than the other
• Examples:
o Consumer satisfaction ratings
o Military rank: Private, Lieutenant, Captain, General
o Class ranking: grades (A, B, C, D, E, F)

Properties:
1. Data classifications are represented by sets of labels or names (high, medium, low) that have relative values.
2. Because of the relative values, the classified data can be ranked or ordered.

Note:
• The differences between adjacent scale values cannot be assumed to be equal
iii. INTERVAL DATA
• Represents observations that can be categorized, rank ordered, and have a unit of measure
• A unit of measure implies that the difference between any two successive values is identical
• Example: the Fahrenheit temperature scale

Properties:
1. Interval data are ordered and can be continuous or discrete
2. The degree of difference between items is meaningful (their intervals are equal)
Note:
o Values can be added or subtracted, but ratios of values are not meaningful (they cannot be meaningfully multiplied or divided)
o Interval data can be negative
o There is no true zero point (a value of zero on an interval scale does not mean the absence of the variable; 0 °F does not mean "no temperature")
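Why ratios break down on an interval scale can be checked with a quick calculation. The Python sketch below assumes the standard conversion from Fahrenheit to kelvin (a ratio scale with a true zero) and compares 80 °F with 40 °F; the two temperatures are arbitrary choices for illustration.

```python
def fahrenheit_to_kelvin(f):
    """Convert °F to kelvin, a ratio scale whose zero means no thermal energy."""
    return (f - 32) * 5 / 9 + 273.15

hot_f, mild_f = 80.0, 40.0

# Ratio of the raw Fahrenheit readings: looks like "twice as hot",
# but only because 0 °F is an arbitrary zero point.
naive_ratio = hot_f / mild_f

# Ratio after converting to kelvin (true zero): only about 1.08.
true_ratio = fahrenheit_to_kelvin(hot_f) / fahrenheit_to_kelvin(mild_f)

# Differences, by contrast, are meaningful on an interval scale.
difference_f = hot_f - mild_f

print(naive_ratio, round(true_ratio, 2), difference_f)   # prints: 2.0 1.08 40.0
```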
iv. RATIO DATA
• The highest and most informative scale
• Observations that can be categorized, rank ordered, have a unit of measure and have a true zero (an absolute zero point)
• The true zero implies that a value of zero represents the complete absence of the variable
• Examples:
- amount of money: zero money indicates the absence of money
- time
Properties:
1. Data classifications are ordered according to the amount of the characteristic they possess.
2. Equal differences in the characteristic are represented by equal differences in the numbers assigned to the classifications.
3. The zero point is the absence of the characteristic, and the ratio between two numbers is meaningful.
Note:
• Values can be multiplied or divided
Hierarchy of measurement scales:
• Ratio: highest scale, strongest form of measurement
• Interval
• Ordinal
• Nominal: lowest scale, weakest form of measurement
VARIABLES AND LEVELS OF MEASUREMENT

Variable          Nominal   Ordinal   Interval   Ratio   Level
Hair colour       ✓                                      Nominal
Postcode          ✓                                      Nominal
Letter grade      ✓         ✓                            Ordinal
CGPA              ✓         ✓         ✓                  Interval
Temperature (F)   ✓         ✓         ✓                  Interval
Height            ✓         ✓         ✓          ✓       Ratio
Age               ✓         ✓         ✓          ✓       Ratio
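The ticks in the table are cumulative: each level keeps everything the lower levels allow and adds something. The Python sketch below encodes that idea. The grouping of operations (mode for nominal, median for ordinal, mean for interval, meaningful ratios for ratio) is the usual textbook convention and is an assumption here, as are the names `Level` and `allowed_operations`.

```python
from enum import IntEnum

class Level(IntEnum):
    """Levels of measurement, ordered from weakest (1) to strongest (4)."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Operations each level ADDS on top of the levels below it (illustrative grouping).
ADDED_OPERATIONS = {
    Level.NOMINAL: ["classify and count", "mode"],
    Level.ORDINAL: ["rank / compare order", "median"],
    Level.INTERVAL: ["add and subtract", "mean"],
    Level.RATIO: ["multiply and divide (meaningful ratios)"],
}

def allowed_operations(level):
    """Return every operation valid at `level`: its own plus all lower levels'."""
    ops = []
    for lvl in Level:                 # iterates NOMINAL, ORDINAL, INTERVAL, RATIO
        if lvl <= level:
            ops.extend(ADDED_OPERATIONS[lvl])
    return ops

# Levels taken from the table above.
variables = {
    "Hair colour": Level.NOMINAL,
    "Letter grade": Level.ORDINAL,
    "Temperature (F)": Level.INTERVAL,
    "Height": Level.RATIO,
}

for name, level in variables.items():
    print(f"{name:16} {level.name:9} {allowed_operations(level)}")
```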
TYPES OF STATISTICAL STUDIES
Observational study
o The researcher merely observes and tries to draw conclusions based on the observations.
Experimental study
o The researcher manipulates the independent (explanatory) variable and tries to determine how the manipulation influences the dependent (outcome) variable.
o A confounding variable influences the dependent variable but cannot be separated from the independent variable.
EXAMPLE
Traffic offence is a growing concern at Dewan Bandaraya in Kuala Lumpur.
A study was conducted to determine the profile of these traffic offenders.
A researcher from this office collected data on the age, gender, race, types
of offence, the amount of fine paid and the years of driving experience from a
sample of traffic offenders as they entered the building to pay their fines. The
researcher also checked the office database to obtain the number of traffic
offences by these drivers.

i. State the population for the above study.
ii. Is the above study a census study or a sample study?
iii. Was any secondary data used for the above study? If there was, please state the data.
iv. State the variable(s) and measurement scale from this study.
v. What is the most suitable data collection method? Give ONE (1) advantage and ONE (1) disadvantage of this method.
REVIEW EXERCISE
True/False
1. The heights of the mountains in the state of Alaska are an example of a variable.
2. The lowest level of measurement is the nominal level.
3. The variable temperature is an example of a quantitative variable.
4. The height of basketball players is considered a continuous variable.

5. The number of ads on a one-hour television show is what type of data?
a. Nominal
b. Qualitative
c. Discrete
d. Continuous

6. Data that can be classified according to colour are measured on what scale?
a. Nominal
b. Ratio
c. Ordinal
d. Interval
REVIEW EXERCISE
7. For each statement, decide whether descriptive or inferential statistics is
used.
a. The average life expectancy in New Zealand is 78.49 years.
b. A diet high in fruits and vegetables will lower blood pressure.
c. The total amount of estimated losses for Hurricane Katrina was $125 billion.
d. Researchers stated that the shape of a person’s ears is related to the person’s aggression.

8. Classify each as nominal, ordinal, interval or ratio level of measurement.
a. Rating of movies as G, PG, and R.
b. Number of candy bars sold on a fund drive.
c. Classification of automobiles as subcompact, compact, standard and luxury.
d. Temperatures of hair dryers.
e. Weights of suitcases on a commercial airliner.
REVIEW EXERCISE
9. Classify each as discrete or continuous.

a. Ages of people working in a large factory.

b. Number of cups of coffee served at a restaurant.

c. The amount of drug injected into a guinea pig.

d. The time it takes a student to drive to university.

e. The number of gallons of milk sold each day at a grocery store.


WEEK 1

CHAPTER 1 : DESCRIPTIVE STATISTICS


II) SAMPLING AND DATA COLLECTION METHOD
IMPORTANT STATISTICAL TERMS
Population:
▪ A set which includes all measurements of interest to the researcher (the collection of all responses, measurements, or counts that are of interest)
▪ e.g. census
Sample:
▪ A subset of the population
▪ e.g. sample survey

WHAT IS SAMPLING?
▪ The process of selecting a sample from a population
▪ The sample must be selected in such a way that it accurately represents its population
▪ Sampling technique: a scientific method of selecting a sample from a population (must be random and represent the population)
▪ Sampling unit: the individuals or items to be sampled
✓ e.g. students, people who use credit cards
▪ Sampling frame: a LIST of individuals or items from which the sample can be obtained (a list of sampling units)
✓ e.g. telephone directory, student list, customer list of credit card users
WHY DO SAMPLING
▪ Sampling (i.e. selecting a sub-set of a whole population) is often done for
reasons of:
1) Cost (it’s less expensive to sample 1,000 television viewers
than 100 million TV viewers) and
2) Practicality (e.g. performing a crash test on every automobile
produced is impractical).
▪ The sampled population and the target population should be similar to
one another.
▪ The sampling technique used in each study depends on the nature of the
population.
▪ This includes factors such as:
- homogeneity (or heterogeneity) of the population
- the availability of the sampling frame (list of individuals or items in
population from which the sample can be obtained)
- the budget
SAMPLING TECHNIQUES
Non-probability sampling:
▪ Convenience sampling
▪ Judgmental sampling
▪ Snowball sampling
▪ Quota sampling
Probability sampling:
▪ Simple random sampling
▪ Systematic sampling
▪ Stratified random sampling
▪ Cluster sampling
PROBABILITY SAMPLING
▪ The items/individuals are selected randomly, based on known probabilities
▪ Random means the item has an equal chance of being selected (unbiased)
▪ Used when a researcher plans to make inferences about the population
▪ Advantage
Because all members have an equal chance of being selected, your
sample is less likely to be biased.
The ability to generalize conclusions about the population is higher

▪ Disadvantage
It can be time-consuming when you’re dealing with a large population
size.
Resource use (e.g. cost) can be higher to develop these types of
samples.
Greater expertise and knowledge of the subject matter is needed to
determine what type of sampling approach is most appropriate.
PROBABILITY SAMPLING
i. SIMPLE RANDOM SAMPLING (SRS)
▪ A probability sample in which every member of a study population has an equal chance of selection. Example: lucky draw, random numbers.
▪ Characteristics of SRS:
o The target population must be homogeneous
o There must be a complete sampling frame
▪ Advantage:
o Easy to conduct
o Every element has an equal chance of being selected
▪ Disadvantage:
o Difficult to obtain a sampling frame
o The sample may still be biased by chance
o Sometimes there is no assurance of representativeness
PROBABILITY SAMPLING
i. SIMPLE RANDOM SAMPLING (SRS)

STEP 1 Prepare the sampling frame
o i.e. write everyone's name on a slip of paper or assign a number to each person.

STEP 2 Select the sample by using:
o Lucky draw method
o Table of random numbers
o Calculator random number generator
PROBABILITY SAMPLING
i. SIMPLE RANDOM SAMPLING (SRS)
A sample of size n = 4 people is to be selected from a list containing the names of N = 12 people in a certain housing area.

This can be done by generating 4 distinct random numbers from 1 to 12 using the random number generator in a computer, or by using a lucky draw.
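The "random number generator in a computer" route can be sketched with Python's standard `random` module. In the sketch below the function name `simple_random_sample` and the seed values are illustrative assumptions; `random.Random.sample` draws without replacement, so no person can be picked twice, matching a lucky draw where slips are not put back.

```python
import random

def simple_random_sample(population_size, n, seed=None):
    """Draw n distinct labels from 1..population_size, each equally likely.

    Computer version of the lucky-draw method: every person in the sampling
    frame has the same chance of being chosen, and no one is chosen twice.
    """
    rng = random.Random(seed)
    frame = range(1, population_size + 1)   # sampling frame: people numbered 1..N
    return sorted(rng.sample(frame, n))

sample = simple_random_sample(population_size=12, n=4, seed=2024)
print(sample)   # four different numbers between 1 and 12
```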
PROBABILITY SAMPLING
ii. SYSTEMATIC SAMPLING

▪ The first element is selected randomly from a list or from sequential files, and then every kth element is selected.

▪ Steps to do systematic sampling:
1) Identify the population size N and the sample size n, and have a complete sampling frame.
2) Obtain the sampling interval k by dividing the population size by the sample size: k = N/n
3) Randomly select one element from the first k elements in the list (using SRS). Suppose the rth element is selected.
4) Lastly, starting with the rth element, sample every kth element in the population until a sample of size n is obtained:
rth, (r+k)th, (r+2k)th, ... , (r+(n-1)k)th
PROBABILITY SAMPLING
ii. SYSTEMATIC SAMPLING

1) There are 12 elements in the population and a sample of 4 is desired.
2) In this case, the sampling interval is k = N/n = 12/4 = 3.
3) The first sample element is selected randomly from the first k elements (between 1 and 3). For example, this number is r = 2.
4) Every 3rd person on the list is included in the sample, starting with person no. 2. The selected numbers are: 2, 2+3 = 5, 5+3 = 8, 8+3 = 11.
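The four systematic-sampling steps translate almost line for line into code. The Python sketch below is one way to write them; the function name `systematic_sample` is hypothetical, and the sketch assumes N is an exact multiple of n so that k is a whole number (when it is not, textbooks differ on how to round k).

```python
import random

def systematic_sample(population_size, n, start=None, seed=None):
    """Return the positions chosen by systematic sampling.

    k = N / n, pick a random start r in 1..k (or use `start`),
    then take r, r + k, r + 2k, ..., r + (n - 1)k.
    Assumes N is a multiple of n so that k is a whole number.
    """
    if population_size % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = population_size // n                                 # step 2: sampling interval
    r = start if start is not None else random.Random(seed).randint(1, k)  # step 3
    return [r + i * k for i in range(n)]                     # step 4: every kth element

print(systematic_sample(12, 4, start=2))   # the worked example: prints [2, 5, 8, 11]
```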
PROBABILITY SAMPLING
ii. SYSTEMATIC SAMPLING

▪ Advantage:
o Researchers can create, analyze, and conduct samples easily when using this
method because of its structure.
o Systematic sampling makes it easy to check whether every kth number or
name has been selected.

▪ Disadvantage
o Systematic sampling ignores all persons between every kth element chosen.
o In systematic sampling, the sample can be biased if the list is arranged in a cyclical (periodic) order that coincides with the interval k.
PROBABILITY SAMPLING
iii. STRATIFIED SAMPLING

▪ A probability sampling procedure that involves dividing the population into groups, or strata, defined by the presence of certain characteristics, and then randomly selecting individuals from each stratum.

▪ The aim of this sampling method is to enhance representativeness.

▪ Characteristics of the population:
o Elements within each stratum are homogeneous.
o Elements between the strata are heterogeneous.
PROBABILITY SAMPLING
iii. STRATIFIED SAMPLING
▪ A group of researchers plans to study the workers in an industrial area. They divide the population of workers into subgroups (strata) by education level and then randomly select a sample from each stratum.
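The workers example can be sketched as "split into strata, then run a simple random sample inside each stratum". In the Python sketch below the worker IDs, the education labels ("SPM", "Diploma", "Degree") and the stratum sizes are hypothetical, and the same number is drawn from every stratum for simplicity; proportional allocation (sample sizes in proportion to stratum sizes) is an equally common choice.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Randomly draw n_per_stratum units from every stratum (SRS within each)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:                       # step 1: split the population into strata
        strata[stratum_of(unit)].append(unit)
    sample = {}
    for name, members in strata.items():     # step 2: simple random sample per stratum
        sample[name] = rng.sample(members, n_per_stratum)
    return sample

# Hypothetical workers as (worker id, education level); invented for illustration.
levels = ["SPM"] * 10 + ["Diploma"] * 8 + ["Degree"] * 6
workers = [(worker_id, level) for worker_id, level in enumerate(levels, start=1)]

picked = stratified_sample(workers, stratum_of=lambda w: w[1], n_per_stratum=2, seed=3)
for level, chosen in picked.items():
    print(level, [worker_id for worker_id, _ in chosen])
```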
PROBABILITY SAMPLING
iii. STRATIFIED SAMPLING

▪ When to use?
o Stratified sampling is beneficial in cases where the population has diverse
subgroups, and researchers want to be sure that the sample includes all of them.
o When the group means are different, and the goal of the study is to understand
these differences.

▪ Advantage:
o Stratified random sampling is often more precise than a simple random sample of the same size because it divides the population into smaller groups, or strata, based on important characteristics.
o Studies can become less expensive and more practical when the researchers divide a large population into smaller groups containing similar members.

▪ Disadvantage:
o Researchers must have sufficient information to assign subjects to the correct strata.
PROBABILITY SAMPLING
iv. CLUSTER SAMPLING

▪ A probability sampling procedure that involves randomly selecting clusters of elements from a population and subsequently selecting every element in each selected cluster for inclusion in the sample.

▪ This method is useful when it is difficult or costly to develop a complete list of the population members or when the population elements are widely dispersed geographically.
PROBABILITY SAMPLING
iv. CLUSTER SAMPLING
A group of researchers plans to survey families in Alam Maju.
Suppose they divide the people who live in Alam Maju into 6 villages.
To save cost, they decide to survey only 2 villages.
Using simple random sampling or systematic random sampling, they select 2 of the 6 villages and include every (all) element in those 2 villages in the sample.
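The key contrast with stratified sampling is that randomness is applied to whole groups, and then everyone in a chosen group is kept. The Python sketch below mirrors the Alam Maju example; the village names, family labels and seed are hypothetical placeholders, and clusters are chosen by simple random sampling.

```python
import random

# Hypothetical sampling frame of clusters: 6 villages, each listing its families.
villages = {
    "Village 1": ["F1", "F2", "F3"],
    "Village 2": ["F4", "F5"],
    "Village 3": ["F6", "F7", "F8", "F9"],
    "Village 4": ["F10", "F11"],
    "Village 5": ["F12", "F13", "F14"],
    "Village 6": ["F15", "F16"],
}

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep EVERY element inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)            # SRS of clusters
    families = [f for name in chosen for f in clusters[name]]    # all members kept
    return chosen, families

chosen_villages, surveyed_families = cluster_sample(villages, n_clusters=2, seed=5)
print(chosen_villages, surveyed_families)
```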
PROBABILITY SAMPLING
CLUSTER SAMPLING

▪ Advantage:
o Cluster sampling is relatively easy to implement and cost effective. E.g. it is more economical to observe clusters of units in a population than randomly selected units scattered throughout the state.

o Cluster sampling is particularly useful when dealing with large and widely
dispersed populations.

▪ Disadvantage
o The participants within each cluster may not be representative of the entire
population. Therefore, it might not be possible to apply its findings to another
area.
PROBABILITY SAMPLING
SUMMARY
EXERCISE
Name the following sampling technique.
i. The population is divided into groups. Samples are collected randomly from
each group.

ii. A sample is drawn in such a way that each element of the population has
the same chance of being selected.

iii. One member is randomly selected from the first k units. Then every kth
member starting with the first selected number is included in the sample.

iv. The population is divided into groups. All elements from the randomly
selected groups are taken as the sample.
EXERCISE
For each of the following statements, identify the sampling technique used.

i. To check the accuracy of a machine that is used for filling detergent containers, every 20th bottle is selected and weighed.

ii. In a large school district, a researcher numbers all the full-time teachers and then randomly selects 30 teachers to be interviewed.

iii. To determine how long people exercise, a researcher interviews 5 people selected from a yoga class, 5 people from a weight lifting class, 5 people from an aerobics class and 5 people from a swimming class.

iv. Out of 10 hospitals in a city, a researcher selects 2 hospitals and collects records for a 24-hour period on the types of emergencies that were treated there.
NON-PROBABILITY SAMPLING

Non-probability sampling:
▪ Convenience sampling
▪ Judgmental sampling
▪ Snowball sampling
▪ Quota sampling
NON-PROBABILITY SAMPLING
▪ The process of selecting a sample from a population without using statistical probability, i.e. the chance of each element being in the sample is unknown.

▪ Used when generalization concerning the population is not required or when sampling frames are difficult to obtain (not much information is available).

▪ Advantage
Obtaining the sample can be easier and less costly. Little research is required prior to surveying, as the researcher simply seeks out those easily within reach.
▪ Disadvantage
Difficult to make valid inferences about the entire population because the selected sample may not be representative.
Sampling bias. For example, a researcher may only select people they feel comfortable with.
NON-PROBABILITY SAMPLING
i. CONVENIENCE SAMPLING

A nonprobability sampling procedure that involves selecting elements that are readily accessible to the researcher.

Convenience sampling is the most commonly used sampling method in many disciplines.

Example: asking people who live in your area to take a survey for your project.
NON-PROBABILITY SAMPLING
ii. JUDGMENTAL SAMPLING (PURPOSIVE SAMPLING)

A nonprobability sampling technique in which the selection criteria are based on personal judgement that the element is representative of the population under study.

This method can introduce a great deal of sampling error, since the researcher's judgement may be wrong.

Example: A researcher wants to study the buying patterns of high-end luxury car owners. The researcher may use judgmental sampling to select a sample of individuals who they believe are most likely to purchase a luxury car.
NON-PROBABILITY SAMPLING
iii. SNOWBALL SAMPLING

A nonprobability sampling procedure that involves using members of the group of interest to identify other members of the group.

Used when the researcher is unable to identify participants in advance.

Example: suppose you want to collect responses from patients who suffer from a rare type of cancer. In this case, other sampling techniques might prove inadequate for gathering relevant subjects: you cannot just walk into the hospital and request patients’ contact information or medical records. What you can do, however, is put out a call to speak with one or two patients with the condition, and then ask them to refer you to other potential subjects who might be willing to participate in your study.
NON-PROBABILITY SAMPLING
iv. QUOTA SAMPLING

A technique in which population subgroups are classified on the basis of researcher judgment.

It is the nonprobability equivalent of stratified sampling.

First, identify the strata and their proportions as they are represented in the population. Then convenience or judgment sampling is used to select the required number of subjects from each stratum.
NON-PROBABILITY SAMPLING
iv. QUOTA SAMPLING

Let us assume that we need to know about the career goals of university students. More particularly, the differences in career goals among freshers, juniors and seniors are to be examined. Suppose the university concerned has 10,000 students, who can be taken as our population.

Now, we divide our population of 10,000 students into categories: freshers, juniors and seniors. Suppose we find that there are 4,500 freshers (45%), 3,000 juniors (30%) and 2,500 seniors (25%).

Our sample must have these proportions, 45%-30%-25%. This means that if we sample 1,000 students, we must include 450 freshers, 300 juniors and 250 seniors.

Lastly, we collect samples from these students according to our proportions, using convenience or judgmental sampling.
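The quota arithmetic in this example (each group's share of the population applied to the total sample size) can be written as a small function. The Python sketch below uses the example's own numbers; the function name `quota_allocation` is hypothetical. With other group sizes, rounding can make the quotas sum to one more or one less than the intended sample size, so a real allocation may need a final adjustment.

```python
def quota_allocation(group_sizes, sample_size):
    """Split sample_size across groups in proportion to their population share."""
    population = sum(group_sizes.values())
    return {group: round(sample_size * size / population)
            for group, size in group_sizes.items()}

students = {"freshers": 4500, "juniors": 3000, "seniors": 2500}
quotas = quota_allocation(students, sample_size=1000)
print(quotas)   # prints: {'freshers': 450, 'juniors': 300, 'seniors': 250}
```

Only the quotas come from this calculation; who fills each quota is still chosen by convenience or judgment, which is what makes quota sampling non-probability.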
NON-PROBABILITY SAMPLING
Difference between non-probability sampling and probability sampling:

Non-probability sampling | Probability sampling
Sample selection is based on the subjective judgment of the researcher. | The sample is selected at random.
Not everyone has an equal chance to participate. | Everyone in the population has an equal chance of getting selected.
The researcher does not consider sampling bias. | Used when sampling bias has to be reduced.
Useful when the population has similar traits. | Useful when the population is diverse.
The sample does not accurately represent the population. | Used to create an accurate sample.
Finding respondents is easy. | Finding the right respondents is not easy.
QUESTIONNAIRE DESIGN
Over the years, a lot of thought has been put into the science of designing survey questions. Key design principles:

1. Keep the questionnaire as short as possible.
2. Ask short, simple, and clearly worded questions.
3. Start with demographic questions to help respondents get started comfortably.
4. Use dichotomous (yes/no) and multiple-choice questions.
5. Use open-ended questions cautiously.
6. Avoid using leading questions.
7. Pretest the questionnaire on a small number of people.
8. Think about the way you intend to use the collected data when preparing the questionnaire.
DATA COLLECTION METHOD
▪ Simply refers to how the researcher obtains the empirical data used to answer his or her research questions and to get the results or findings.

▪ Of course, your data collection strategy also depends on other factors, such as the amount of time you have to collect the data, the money available in your budget, and the complexity or nature of the questions.

▪ Four main methods of survey data collection:
1. Interviews:
▪ Face-to-face
▪ Telephone
2. Questionnaire:
▪ Direct (multiple-choice, yes/no)
▪ Indirect (open-ended)
3. Direct observation
4. Others: internet, e-mail and online surveys, video records
The following table sets out the main components of each method of collection and their advantages and disadvantages.
EXERCISE
The Public Service Department (JPA) wants to carry out a survey on
students studying overseas under its sponsorship. The objective of the
study is to collect information on the problems faced by them.

a) State the most suitable data collection method for this study.

b) Explain why the method you chose is better than the others.
WEEK 2

CHAPTER 1 : DESCRIPTIVE STATISTICS


III) ORGANIZING DATA

Bluman Chapter 2
LEARNING OBJECTIVES

1 Organize data using a frequency distribution.

2 Represent qualitative data in frequency distributions, pie charts, bar charts and contingency tables.

3 Represent quantitative data in frequency distributions and graphically using histograms, frequency polygons, and ogives.

4 Draw and interpret a stem-and-leaf plot.

DATA PRESENTATION
Data can be summarized in tabular form and presented in pictorial form using graphs so that important features can be grasped quickly and effectively.

QUALITATIVE DATA
1. Frequency Distribution
2. Graphical Method
- Pie chart
- Bar chart (vertical and horizontal)
- Cluster bar chart
3. Contingency Table / Cross Tabulation

QUANTITATIVE DATA
1. Frequency Distribution
- Ungrouped data
- Grouped data
2. Graphical Method
- Histogram
- Polygon
- Ogive
- Stem and Leaf Plot

Copyright © 2015 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
PRESENTING QUALITATIVE DATA
▪ After data is collected, it will be processed, organized and presented.

▪ In order to enhance the presentation, some charts, tables and graphs can
be used.

▪ Some considerations in drawing charts/graphs:
a. Indicate the title
b. Draw the axes properly
c. Use proper size and scale
d. Use colours/shading if needed

1. FREQUENCY DISTRIBUTION
▪ Data collected in original form is called raw data.

▪ A frequency distribution is the organization of raw data in table form, using classes and frequencies.

▪ Nominal- or ordinal-level data that can be placed in categories is organized in categorical frequency distributions.

Example 1
Twenty-five army inductees were given a blood test to determine their blood type. The data set is:

A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A


Answer: Frequency distribution for the data

Class   Tally        Frequency   Percent (%)
A       |||||        5           20
B       ||||| ||     7           28
O       ||||| ||||   9           36
AB      ||||         4           16
Total                25          100
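Tallying by hand is exactly what `collections.Counter` does. The Python sketch below counts the 25 blood types from Example 1 and prints the frequency and percent columns (the tally column is omitted); the printed layout is just one readable choice.

```python
from collections import Counter

# The 25 blood types from Example 1, read row by row.
blood_types = """
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
""".split()

counts = Counter(blood_types)   # frequency of each class
n = len(blood_types)            # total number of observations

print(f"{'Class':<6}{'Frequency':>10}{'Percent (%)':>13}")
for blood_type in ["A", "B", "O", "AB"]:
    f = counts[blood_type]
    print(f"{blood_type:<6}{f:>10}{100 * f / n:>13.0f}")
print(f"{'Total':<6}{n:>10}{100:>13}")
```

The frequencies (A 5, B 7, O 9, AB 4) sum to 25, and the percentages match the pie chart values for the same data.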

2. GRAPHICAL METHOD: PIE CHART
▪ Pie chart can be used to represent categorical data.
▪ It is a circle that is divided into sectors.
▪ The sectors show the percentage of frequencies of each category of the distribution.

Pie Chart using data in Example 1

Blood Type   Percent
A            20
B            28
O            36
AB           16

[Pie chart: O 36%, B 28%, A 20%, AB 16%]

Note: If possible, construct the pie chart so that the percentages are in either ascending or descending order (this helps in interpreting the data).
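To draw the pie chart by hand, each sector's angle is its share of the full circle: angle = (f / n) × 360° = (percent / 100) × 360°. The Python sketch below applies that formula to the percentages in the table, listed in descending order as the note suggests.

```python
# Percentages from the blood-type table, in descending order.
percent = {"O": 36, "B": 28, "A": 20, "AB": 16}

# Each sector's angle is its share of the full 360-degree circle.
angles = {blood: pct / 100 * 360 for blood, pct in percent.items()}

for blood, angle in angles.items():
    print(f"{blood:<3} {angle:6.1f} degrees")
```

This gives O 129.6°, B 100.8°, A 72.0° and AB 57.6°, which together make up the full 360°.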

2. GRAPHICAL METHOD: BAR CHART
▪ A graph of bars whose heights represent the frequencies of respective categories.

▪ Types of bar charts:
o Vertical/horizontal bar chart (single/simple)
o Cluster bar chart (multiple)
- One graph represents more than one subject
- Colour/shading needed

[Figure: example bar charts titled "How People Go to Work" (vertical, horizontal, and a cluster bar chart that includes a "Men" series) over the categories CAR, MOTOR, BUS, TRAIN, WALK]

2. GRAPHICAL METHOD: BAR CHART
Example 2

From the following table, construct:
- a single (simple) bar chart for the year 2000
- a cluster (multiple) bar chart for the years 2000 and 2001

NUMBER OF STUDENTS
Program   Year 2000   Year 2001
A         450         600
B         1200        1500
C         800         1100
D         300         400
E         650         800
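The defining rule of a simple bar chart is that bar length is proportional to frequency. As a dependency-free stand-in for a plotting library, the Python sketch below prints a text-only horizontal bar chart of the year-2000 column; the function name `text_bar_chart` and the scale of 50 students per `#` are arbitrary assumptions, and a plotting tool would normally be used for the real chart.

```python
students_2000 = {"A": 450, "B": 1200, "C": 800, "D": 300, "E": 650}

def text_bar_chart(data, unit=50):
    """Return one text bar per category; each '#' stands for `unit` students."""
    lines = []
    for program, count in data.items():
        bar = "#" * round(count / unit)   # bar length proportional to the frequency
        lines.append(f"{program} | {bar} {count}")
    return lines

for line in text_bar_chart(students_2000):
    print(line)
```

A cluster bar chart would place the 2000 and 2001 bars for each program side by side, using different shading for the two years.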

3. CROSS TABULATION/ CONTINGENCY TABLE
▪ A cross tabulation (often abbreviated as cross tab) or cross-classification table is often used to examine categorical responses in terms of two qualitative variables simultaneously.

▪ Some data can be grouped according to two or more criteria of classification or variables.

▪ Cross tabs are frequently used because:
o They are easy to understand. They appeal to people who do not want to use more sophisticated measures.
o They can be used with any level of data: nominal, ordinal, interval, or ratio (cross tabs treat all data as if it is nominal).
o A table can provide greater insight than a single statistic.
o It solves the problem of empty or sparse cells.
o They are simple to conduct.

Example: Cross-tabulation between location and education level

Location   No College   Four-year degree   Advanced degree   Total
Urban      15           12                 8                 35
Suburban   8            15                 9                 32
Rural      6            8                  7                 21
Total      29           35                 24                88
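A quick way to check a cross-tabulation is to recompute its row totals, column totals and grand total from the inner cells. The Python sketch below does this. It uses 15 for the Urban / No College cell, which is the value implied by the printed row total (35) and column total (29). The helper name `margins` is hypothetical.

```python
def margins(table):
    """Return (row totals, column totals, grand total) for a {row: {column: count}} table."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    columns = list(next(iter(table.values())).keys())
    col_totals = {col: sum(cells[col] for cells in table.values()) for col in columns}
    grand_total = sum(row_totals.values())
    return row_totals, col_totals, grand_total


counts = {
    "Urban":    {"No College": 15, "Four-year degree": 12, "Advanced degree": 8},
    "Suburban": {"No College": 8,  "Four-year degree": 15, "Advanced degree": 9},
    "Rural":    {"No College": 6,  "Four-year degree": 8,  "Advanced degree": 7},
}

rows, cols, grand = margins(counts)
print("Row totals:", rows)
print("Column totals:", cols)
print("Grand total:", grand)
```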
3. CROSS TABULATION/ CONTINGENCY TABLE
Example 3:

A group of researchers surveyed 530 staff working with Company Y. Out of 145 professional staff, 40 are women, whereas 140 of the non-professional staff are men. Present these data in the form of a 2 x 2 table.

Answer:
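The missing cells follow from subtraction: the professional men are 145 − 40, the non-professional staff are 530 − 145, and the non-professional women are what remains after the 140 men. A minimal Python sketch that fills in the 2 x 2 table with its totals follows. Putting staff type in the rows and gender in the columns is a layout choice, not something the question fixes.

```python
total_staff = 530
professional = 145
professional_women = 40
non_professional_men = 140

professional_men = professional - professional_women               # 145 - 40
non_professional = total_staff - professional                       # 530 - 145
non_professional_women = non_professional - non_professional_men   # 385 - 140

table = {
    "Professional":     {"Men": professional_men,     "Women": professional_women},
    "Non-professional": {"Men": non_professional_men, "Women": non_professional_women},
}

print(f"{'':<18}{'Men':>6}{'Women':>7}{'Total':>7}")
for group, cells in table.items():
    print(f"{group:<18}{cells['Men']:>6}{cells['Women']:>7}{cells['Men'] + cells['Women']:>7}")
men_total = sum(cells["Men"] for cells in table.values())
women_total = sum(cells["Women"] for cells in table.values())
print(f"{'Total':<18}{men_total:>6}{women_total:>7}{men_total + women_total:>7}")
```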

PRESENTING QUANTITATIVE DATA
▪ Quantitative data is information about quantities; that is, information that can be
measured and written down with numbers.

▪ The point of your reporting is to communicate important information in a way that is as accessible as possible.

▪ Some other aspects to consider about quantitative data:

a. Focuses on numbers

b. Can be displayed through graphs, charts, tables, and maps

c. Data can be displayed over time (such as a line chart)

1. FREQUENCY DISTRIBUTION: UNGROUPED DATA
▪ Frequency is the number of times a value occurs. By counting frequencies, we can build a frequency distribution table.
▪ A frequency distribution is a table that lists the data values and their frequencies.
▪ Ungrouped data is defined as the data given as individual points (i.e. values or numbers) such as
15, 63, 34, 20, 25, and so on.
Example:
These are the numbers of newspapers sold at a local shop over the last 10 days:
22, 20, 18, 23, 20, 22, 20, 18, 20

Answer: Frequency Distribution for Ungrouped Data


Papers sold Frequency
18
19
20
21
22
23
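Counting the frequencies can be sketched with Python's `collections.Counter`. The sketch counts the values exactly as they are listed above (nine values are printed), and it reports every whole number from 18 to 23 so that values which never occur still appear with frequency 0. The helper name `frequency_table` is hypothetical.

```python
from collections import Counter


def frequency_table(values, low, high):
    """Frequency of every integer from low to high, including values that never occur."""
    counts = Counter(values)
    return {value: counts[value] for value in range(low, high + 1)}  # Counter returns 0 for missing keys


papers_sold = [22, 20, 18, 23, 20, 22, 20, 18, 20]   # values as listed above

for value, freq in frequency_table(papers_sold, 18, 23).items():
    print(value, freq)
```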
1. FREQUENCY DISTRIBUTION: GROUPED DATA
▪ It is also possible to group the data. Grouped data means the data (or information) given in the
form of class intervals such as 0-20, 20-40 and so on.
▪ Grouped frequency distributions are used when the range of the data is large.
▪ Terminologies of frequency distribution:
o Class limit: Lower limit = smallest value in the class; Upper limit = largest value in the class.
  Example: for 80 – 90, the lower limit is 80 and the upper limit is 90.
o Class boundary: the value that falls midway between the upper limit of one class and the lower limit of the next class.
o Class midpoint, Xm: the middle value of a class interval, found by averaging the lower and upper limits (or the lower and upper boundaries).

            Class interval/limit   Class boundary   Class midpoint
Example 1   30 – 50                30 – 50          40
            50 – 70                50 – 70
Example 2   30 – 49                29.5 – 49.5      39.5
            50 – 69                49.5 – 69.5
Example 3   30 – < 50              30 – 50          40
            50 – < 70              50 – 70
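For whole-number class limits like those in Example 2, the boundaries sit half a unit outside the limits, and the midpoint is the average of the limits, which equals the average of the boundaries. A small Python sketch follows. The helper names and the `unit` parameter (the recording precision, 1 for whole numbers) are assumptions for illustration. For classes written as in Examples 1 and 3, the boundaries are simply the limits themselves.

```python
def class_boundaries(lower, upper, unit=1):
    """Boundaries for classes whose limits are recorded to the nearest `unit`."""
    half = unit / 2
    return (lower - half, upper + half)


def class_midpoint(lower, upper):
    """Average of the lower and upper limits (or, equivalently, of the boundaries)."""
    return (lower + upper) / 2


for lower, upper in [(30, 49), (50, 69)]:          # the classes of Example 2
    lo_b, hi_b = class_boundaries(lower, upper)
    print(f"{lower} - {upper}: boundaries {lo_b} - {hi_b}, midpoint {class_midpoint(lower, upper)}")
```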
1. FREQUENCY DISTRIBUTION: GROUPED DATA
Constructing a Grouped Frequency Distributions:

1. Compute the Range [Range = Maximum – Minimum]

2. Find the class width.

[Class width = range ÷ number of classes, rounded up. The number of classes is usually between 5 and 20.]

3. Pick a suitable starting point less than or equal to the minimum value.
The subsequent lower class limits are found by adding the width to the previous lower class limits.

4. To find the upper limit of the first class, subtract one from the lower limit of the second class.
Then continue to add the class width to this upper limit to find the rest of the upper limits.

5. Find the boundaries by subtracting 0.5 units from the lower limits and adding 0.5 units to the
upper limits (if necessary). Tally the data.

6. Find the frequencies.

7. Find the cumulative frequencies. Depending on what you're trying to accomplish, it may not
be necessary to find the cumulative frequencies.
8. If necessary, find the relative frequencies and/or relative cumulative frequencies.
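Steps 1 to 5 can be sketched in Python for whole-number data. `build_classes` is a hypothetical helper. It assumes integer data (so the boundaries are the limits ± 0.5), and it raises an error instead of guessing when the classes fail to reach the maximum, which can happen when the range divides exactly by the number of classes. The usage lines apply it to the minimum (100), maximum (134) and 7 classes of Example 4.

```python
import math


def build_classes(minimum, maximum, num_classes, start=None):
    """Class width, limits and boundaries for whole-number data (steps 1 to 5)."""
    data_range = maximum - minimum                     # Step 1: range
    width = math.ceil(data_range / num_classes)        # Step 2: divide and round up
    lower = minimum if start is None else start        # Step 3: starting point <= minimum
    if lower > minimum:
        raise ValueError("the starting point must not exceed the minimum")
    classes = []
    for _ in range(num_classes):
        upper = lower + width - 1                      # Step 4: next lower limit minus one
        classes.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),  # Step 5: half a unit either side
        })
        lower += width
    if classes[-1]["limits"][1] < maximum:
        raise ValueError("classes do not reach the maximum; choose a larger width")
    return width, classes


width, classes = build_classes(100, 134, 7)
print("class width:", width)
for c in classes:
    print(c["limits"], c["boundaries"])
```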
1. FREQUENCY DISTRIBUTION: GROUPED DATA
Example 4: Constructing a Grouped Frequency Distribution

The following data represent the record high temperatures for each of the 50 states. Construct a
grouped frequency distribution for the data using 7 classes.

112 100 127 120 134 118 105 110 109 112

110 118 117 116 118 122 114 114 105 109

107 112 114 115 118 117 118 122 106 110

116 108 110 121 113 120 119 111 104 111

120 113 120 117 105 110 118 112 114 114

1. FREQUENCY DISTRIBUTION: GROUPED DATA
Step 1: Compute range = max – min = 134 – 100 = 34
Step 2: Find class width (number of classes = 7) = 34/7 ≈ 4.86, rounded up to 5
Step 3: Starting point of 1st class = 100. The subsequent lower-class limits are found
by adding the width to the previous lower-class limits.
Step 4: Upper limit of 1st class = 105 – 1 = 104. The subsequent upper-class limits are found
by adding the width to the previous upper-class limits.
Step 5: The class boundary is midway between an upper-class limit and a
subsequent lower-class limit.
Class Limits Class Boundaries
1st class

1. FREQUENCY DISTRIBUTION: GROUPED DATA

Class Limits   Class Boundaries   Frequency   Cumulative Frequency
100 - 104      99.5 - 104.5
105 - 109      104.5 - 109.5
110 - 114      109.5 - 114.5
115 - 119      114.5 - 119.5
120 - 124      119.5 - 124.5
125 - 129      124.5 - 129.5
130 - 134      129.5 - 134.5
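Steps 6 and 7 (tallying and accumulating) for the 50 temperatures can be checked with a short Python sketch. The class limits are the ones in the table above, and `itertools.accumulate` produces the running totals.

```python
from itertools import accumulate

temperatures = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112,
    110, 118, 117, 116, 118, 122, 114, 114, 105, 109,
    107, 112, 114, 115, 118, 117, 118, 122, 106, 110,
    116, 108, 110, 121, 113, 120, 119, 111, 104, 111,
    120, 113, 120, 117, 105, 110, 118, 112, 114, 114,
]

class_limits = [(100, 104), (105, 109), (110, 114), (115, 119),
                (120, 124), (125, 129), (130, 134)]

# Step 6: count the values that fall inside each pair of class limits.
frequencies = [sum(lower <= t <= upper for t in temperatures) for lower, upper in class_limits]
# Step 7: running total of the frequencies.
cumulative = list(accumulate(frequencies))

for (lower, upper), f, cf in zip(class_limits, frequencies, cumulative):
    print(f"{lower} - {upper}   {lower - 0.5} - {upper + 0.5}   {f:>3}   {cf:>3}")
```

The tally gives frequencies 2, 8, 18, 13, 7, 1, 1, and the last cumulative frequency equals the 50 data values.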
2. GRAPHICAL METHOD
▪ Display of data:
o Histogram
o Frequency polygon
o Ogive
o Stem-and-leaf plot

▪ Steps:
Step 1 Draw and label the x and y axes.
Step 2 Choose a suitable scale for the frequencies or cumulative frequencies, and
label it on the y axis. (Do not label the y axis with numbers in the cumulative
frequency)
Step 3 Represent the class boundaries for the histogram or ogive, or the midpoint for
the frequency polygon, on the x axis.
Step 4 Plot the points and then draw the bars or lines.

Histogram: y-axis (frequency or relative frequency), x-axis (class boundaries)

Polygon: y-axis (frequency or relative frequency), x-axis (class midpoints)

Ogive: y-axis (cumulative frequency or relative cumulative frequency), x-axis (class boundaries)
2. GRAPHICAL METHOD: HISTOGRAM
▪ The histogram is a graph that displays the data by using vertical bars of
various heights to represent the frequencies of the classes.
▪ The histogram plots frequency on the y-axis and the class boundaries
along the x-axis.

Class Limits   Class Boundaries   Frequency
100 - 104      99.5 - 104.5
105 - 109      104.5 - 109.5
110 - 114      109.5 - 114.5
115 - 119      114.5 - 119.5
120 - 124      119.5 - 124.5
125 - 129      124.5 - 129.5
130 - 134      129.5 - 134.5
2. GRAPHICAL METHOD: POLYGON
▪ The frequency polygon is a graph that displays the data by using lines that
connect points plotted for the frequencies at the class midpoints. The
frequencies are represented by the heights of the points.
▪ The frequency polygon plots frequency on the y-axis and the class midpoints along
the x-axis.

Class Limits   Class Midpoint (Xm)   Frequency
100 - 104
105 - 109
110 - 114
115 - 119
120 - 124
125 - 129
130 - 134
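The points of the frequency polygon are simply (class midpoint, frequency) pairs joined in order. The Python sketch below lists them. The frequencies 2, 8, 18, 13, 7, 1, 1 are what a tally of the 50 temperatures in Example 4 gives, and `polygon_points` is a hypothetical helper name.

```python
class_limits = [(100, 104), (105, 109), (110, 114), (115, 119),
                (120, 124), (125, 129), (130, 134)]
frequencies = [2, 8, 18, 13, 7, 1, 1]   # tallied from the 50 temperatures in Example 4


def polygon_points(limits, freqs):
    """(class midpoint, frequency) pairs; the polygon joins these points in order."""
    return [((lower + upper) / 2, f) for (lower, upper), f in zip(limits, freqs)]


for x, y in polygon_points(class_limits, frequencies):
    print(f"Xm = {x}, frequency = {y}")
```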
2. GRAPHICAL METHOD: OGIVE
▪ An ogive, sometimes called a cumulative frequency polygon, is a type of
frequency polygon that shows cumulative frequencies.

▪ An ogive graph plots cumulative frequency on the y-axis and upper class
boundaries along the x-axis.

Class Limits   Class Boundaries   Cumulative Frequency
100 - 104      99.5 - 104.5
105 - 109      104.5 - 109.5
110 - 114      109.5 - 114.5
115 - 119      114.5 - 119.5
120 - 124      119.5 - 124.5
125 - 129      124.5 - 129.5
130 - 134      129.5 - 134.5
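The ogive pairs each upper class boundary with the cumulative frequency up to that class. In the sketch below, the ogive also starts at the first lower boundary with a cumulative frequency of 0. That is a common textbook convention, assumed here rather than stated on the slide. The frequencies are again the tally of the Example 4 temperatures, and `ogive_points` is a hypothetical helper name.

```python
from itertools import accumulate

class_boundaries = [(99.5, 104.5), (104.5, 109.5), (109.5, 114.5), (114.5, 119.5),
                    (119.5, 124.5), (124.5, 129.5), (129.5, 134.5)]
frequencies = [2, 8, 18, 13, 7, 1, 1]   # tallied from the Example 4 data


def ogive_points(boundaries, freqs):
    """(upper boundary, cumulative frequency) pairs, preceded by (first lower boundary, 0)."""
    points = [(boundaries[0][0], 0)]
    for (_, upper), cf in zip(boundaries, accumulate(freqs)):
        points.append((upper, cf))
    return points


for x, y in ogive_points(class_boundaries, frequencies):
    print(f"boundary = {x}, cumulative frequency = {y}")
```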
EXERCISE
Example 5
Construct a histogram, frequency polygon, and ogive using relative frequencies for the
distribution (shown here) of the miles that 20 randomly selected runners ran during a given
week.

Class Boundaries   Frequency
5.5 - 10.5         1
10.5 - 15.5        2
15.5 - 20.5        3
20.5 - 25.5        5
25.5 - 30.5        4
30.5 - 35.5        3
35.5 - 40.5        2

*Relative frequency: divide each frequency by the total frequency.
ANSWER: 1. HISTOGRAM

Class Boundaries   Frequency   Relative Frequency
5.5 - 10.5         1
10.5 - 15.5        2
15.5 - 20.5        3
20.5 - 25.5        5
25.5 - 30.5        4
30.5 - 35.5        3
35.5 - 40.5        2
ANSWER: 2. POLYGON

Class Boundaries   Class Midpoints, Xm   Relative Frequency
5.5 - 10.5
10.5 - 15.5
15.5 - 20.5
20.5 - 25.5
25.5 - 30.5
30.5 - 35.5
35.5 - 40.5
ANSWER: 3. OGIVE

Class Boundaries   Relative Frequency   Cumulative Relative Frequency
5.5 - 10.5
10.5 - 15.5
15.5 - 20.5
20.5 - 25.5
25.5 - 30.5
30.5 - 35.5
35.5 - 40.5
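All three answer tables for Example 5 come from the same frequency list: the relative frequency is f ÷ n, the midpoint is the average of the two boundaries, and the cumulative relative frequency is the running total divided by n. A minimal Python sketch that computes these columns follows.

```python
from itertools import accumulate

boundaries = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
              (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
frequencies = [1, 2, 3, 5, 4, 3, 2]

n = sum(frequencies)                                            # total frequency (20 runners)
relative = [f / n for f in frequencies]                         # histogram and polygon heights
midpoints = [(lo + hi) / 2 for lo, hi in boundaries]            # polygon x-values
cumulative_relative = [cf / n for cf in accumulate(frequencies)]  # ogive heights

for (lo, hi), xm, rf, crf in zip(boundaries, midpoints, relative, cumulative_relative):
    print(f"{lo} - {hi}   Xm = {xm:>4}   rel. freq = {rf:.2f}   cum. rel. freq = {crf:.2f}")
```

Dividing each running total by n, rather than adding up rounded relative frequencies, keeps the last cumulative value at exactly 1.00.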

2. GRAPHICAL METHOD: STEM AND LEAF
▪ A stem-and-leaf plot is a data plot that uses part of a data value as the stem
and part of the data value as the leaf to form groups or classes.
▪ It has the advantage over a grouped frequency distribution of retaining the
actual data while showing them in graphic form.

[Figure: an unordered stem plot and an ordered stem plot of the same data. A key shows how to read the diagram; annotations note that "6 is recorded as 06" and point to a value reading "This number is 39".]
2. GRAPHICAL METHOD: STEM AND LEAF
Advantage:
▪ A basic tool for determining the skewness of the data
▪ Depicts outliers. An outlier is an extreme value located far from the mean value.
▪ Suitable for small data sets and for variables measured on a ratio scale

Steps for constructing a stem-and-leaf plot manually:

Step 1 : Arrange the data values from the smallest value to the largest value
Step 2 : The leading digit(s) become the stem and the trailing digit becomes the leaf
Step 3 : Display the data

2. GRAPHICAL METHOD: STEM AND LEAF
Example 6
At an outpatient testing center, the number of cardiograms performed each
day for 20 days is shown. Construct a stem and leaf plot for the data.

25 31 20 32 13
14 43 02 57 23
36 32 33 32 44
32 52 44 51 45

Answer
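The three construction steps can be sketched in Python for whole numbers of at most two digits: the stem is the tens part (`value // 10`) and the leaf is the units digit (`value % 10`). `stem_and_leaf` is a hypothetical helper. Empty stems between the smallest and largest stem are kept so that gaps in the data stay visible.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """One line per stem for whole numbers >= 0: stem = tens part, leaf = units digit."""
    ordered = sorted(values)                          # Step 1: smallest to largest
    leaves = defaultdict(list)
    for value in ordered:
        leaves[value // 10].append(value % 10)        # Step 2: leading digit(s) | trailing digit
    first_stem, last_stem = ordered[0] // 10, ordered[-1] // 10
    # Keep any empty stems between the first and last so gaps remain visible.
    return {stem: leaves.get(stem, []) for stem in range(first_stem, last_stem + 1)}


# "02" from the table is written as 2 (Python does not allow a leading zero in an int literal).
cardiograms = [25, 31, 20, 32, 13, 14, 43, 2, 57, 23,
               36, 32, 33, 32, 44, 32, 52, 44, 51, 45]

for stem, leaf_list in stem_and_leaf(cardiograms).items():   # Step 3: display
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaf_list)}")
```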

2. GRAPHICAL METHOD: STEM AND LEAF
Example 7
The following data shows the ages for the CEOs of the 30 top-ranked small
companies in a country.

59 38 47 53 60 69
44 50 56 63 40 48
53 61 41 44 49 55
62 43 55 61 61 53
48 48 55 62 43 48

Construct:
(a) a stem-and-leaf diagram with one line per stem.
(b) a stem-and-leaf diagram with two lines per stem.
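For part (b), each stem is split into two lines: leaves 0 to 4 go on the first line and leaves 5 to 9 on the second. A minimal Python sketch follows. The helper name `split_stem_plot` and the (stem, half) keys are illustrative choices. Lines are produced from the first occupied line to the last, so an empty line in between would still appear.

```python
def split_stem_plot(values):
    """Two lines per stem for whole numbers: half 0 holds leaves 0-4, half 1 holds leaves 5-9."""
    ordered = sorted(values)

    def line_key(value):
        return (value // 10, 0 if value % 10 < 5 else 1)

    first, last = line_key(ordered[0]), line_key(ordered[-1])
    lines = {}
    for stem in range(first[0], last[0] + 1):
        for half in (0, 1):
            if first <= (stem, half) <= last:          # tuple comparison keeps the order
                lines[(stem, half)] = []
    for value in ordered:
        lines[line_key(value)].append(value % 10)
    return lines


ages = [59, 38, 47, 53, 60, 69, 44, 50, 56, 63, 40, 48, 53, 61, 41,
        44, 49, 55, 62, 43, 55, 61, 61, 53, 48, 48, 55, 62, 43, 48]

for (stem, half), leaf_list in split_stem_plot(ages).items():
    label = "0-4" if half == 0 else "5-9"
    print(f"{stem} ({label}) | {' '.join(map(str, leaf_list))}")
```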

WEEK 3

CHAPTER 1 : DESCRIPTIVE STATISTICS


IV) DATA DESCRIPTION

LEARNING OBJECTIVES

In this chapter, the students should be able to:

1. Describe the properties of central tendency, variation, and skewness in numerical data.

2. Calculate descriptive summary measures.

NUMERICAL DESCRIPTIVE MEASURES

1. MEASURES OF CENTRAL TENDENCY: the extent to which all the data values group around a typical or central value
▪ Arithmetic Mean
▪ Median
▪ Mode
▪ Midrange

2. MEASURES OF VARIATION: the amount of dispersion, or scattering, of values
▪ Range
▪ Variance
▪ Standard deviation
▪ Coefficient of variation

3. MEASURES OF SKEWNESS: the pattern of the distribution of values from the lowest value to the highest value
▪ Skewness
▪ Mean, Median, Mode

MEASURES OF
CENTRAL TENDENCY

MEASURES OF CENTRAL TENDENCY: THE MEAN
▪ The mean is the sum of the values divided by the total number of values.
▪ Useful in comparing two or more populations.
▪ The most common measure of central tendency.
▪ Affected by extreme values (outliers).

[Figure: two dot plots on a 0–10 scale. Left: (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3. Right: (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4.]

UNGROUPED DATA
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
where $\bar{X}$ (pronounced "x-bar") is the sample mean, $X_i$ is the ith observed value, and $n$ is the sample size.

GROUPED DATA
$\bar{X} = \frac{\sum_{m=1}^{N} f \cdot X_m}{n}$
Where:
f = frequency
$X_m$ = midpoint of each class
$n$ = sample size (total frequency)
THE MEAN: UNGROUPED DATA
Example 3-1 (Days off per year)

The data represent the number of days off per year for a sample of
individuals selected from nine different countries. Find the mean.

20, 26, 40, 36, 23, 42, 35, 24, 30

Answer

Interpretation:
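The ungrouped mean is the sum of the values divided by their count. A minimal Python check for Example 3-1 follows, with the standard library's `statistics.mean` as a cross-check.

```python
import statistics

days_off = [20, 26, 40, 36, 23, 42, 35, 24, 30]

mean_days = sum(days_off) / len(days_off)   # sum of the values / number of values
print(f"mean = {sum(days_off)} / {len(days_off)} = {mean_days:.2f}")

# The standard library gives the same value.
print(statistics.mean(days_off))
```

Here the sum is 276 and n = 9, so the mean is about 30.67 days off per year.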
THE MEAN: GROUPED DATA
Example 3-3 (Miles Run per Week)

Below is a frequency distribution of miles run per week. Find the mean.

Class Boundaries   Frequency
5.5 - 10.5         1
10.5 - 15.5        2
15.5 - 20.5        3
20.5 - 25.5        5
25.5 - 30.5        4
30.5 - 35.5        3
35.5 - 40.5        2

Formula: $\bar{X} = \frac{\sum_{m=1}^{N} f \cdot X_m}{n}$

Answer:
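For grouped data, each class is represented by its midpoint, weighted by the class frequency. A minimal Python sketch of the grouped-mean formula follows. The helper name `grouped_mean` is hypothetical.

```python
def grouped_mean(boundaries, freqs):
    """Sum of f * Xm divided by n, where Xm is the midpoint of each class."""
    midpoints = [(lo + hi) / 2 for lo, hi in boundaries]
    n = sum(freqs)
    return sum(f * xm for f, xm in zip(freqs, midpoints)) / n


boundaries = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
              (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
frequencies = [1, 2, 3, 5, 4, 3, 2]

print(grouped_mean(boundaries, frequencies))
```

The midpoints are 8, 13, ..., 38, the sum of f · Xm is 490 and n = 20, so the mean is 490 / 20 = 24.5 miles.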
MEASURES OF CENTRAL TENDENCY: THE MEDIAN
▪ In an ordered array, the median is the "middle" number (50% above, 50% below).
▪ Not affected by extreme values.

[Figure: two dot plots on a 0–10 scale, both with Median = 3.]

UNGROUPED DATA
Step 1 Arrange the data values in ascending order.
Step 2 Median position = $\frac{n+1}{2}$ (this is the position of the median in the ranked data).
Step 3 a. If the number of values is odd, the median is the middle number.
       b. If the number of values is even, the median is the average of the two middle numbers.

GROUPED DATA
1) Find the median class: compute n/2 and locate the first class whose cumulative frequency is greater than or equal to n/2.
2) Median = $\tilde{x} = L_m + \frac{\frac{\sum f}{2} - \sum f_{m-1}}{f_m} \cdot C$
where
$L_m$ = lower class boundary of the median class
$\sum f_{m-1}$ = cumulative frequency of all class intervals before the median class
$f_m$ = frequency of the median class
$C$ = width of the median class boundaries
THE MEDIAN: UNGROUPED DATA
Example 3-4 (Hotel Rooms)

The number of rooms in the seven hotels in downtown Pittsburgh is 713, 300,
618, 595, 311, 401, and 292. Find the median.

Solution:
THE MEDIAN: UNGROUPED DATA
Example 3-6 (Tornadoes)

The number of tornadoes that have occurred in the United States over an 8-
year period follows. Find the median.

684, 764, 656, 702, 856, 1133, 1132, 1303

Solution:
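The three median steps translate directly into Python. Taking the floor and ceiling of the (n + 1)/2 position picks the same value twice when n is odd, and picks the two middle values when n is even. The helper name `median` is hypothetical, and `statistics.median` is used only as a cross-check.

```python
import math
import statistics


def median(values):
    """Median using the (n + 1) / 2 position rule."""
    ordered = sorted(values)                  # Step 1: ascending order
    position = (len(ordered) + 1) / 2         # Step 2: 1-based median position
    # Step 3: for odd n the position is a whole number, so both indices are the same;
    # for even n it ends in .5, and the two middle values are averaged.
    lower = ordered[math.floor(position) - 1]
    upper = ordered[math.ceil(position) - 1]
    return (lower + upper) / 2


hotel_rooms = [713, 300, 618, 595, 311, 401, 292]          # Example 3-4
tornadoes = [684, 764, 656, 702, 856, 1133, 1132, 1303]    # Example 3-6

print(median(hotel_rooms), statistics.median(hotel_rooms))
print(median(tornadoes), statistics.median(tornadoes))
```

For the seven hotels the median position is 4, giving 401 rooms. For the eight years it is 4.5, so the median is the average of 764 and 856, which is 810 tornadoes.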
THE MEDIAN: GROUPED DATA
Example 3-3 (Miles Run)

Below is a frequency distribution of miles run per week. Find the median.

Class Boundaries   Frequency
5.5 – 10.5         1
10.5 – 15.5        2
15.5 – 20.5        3
20.5 – 25.5        5
25.5 – 30.5        4
30.5 – 35.5        3
35.5 – 40.5        2

Solution:
1) Find the median class:
2) Use the formula: Median = $\tilde{x} = L_m + \frac{\frac{\sum f}{2} - \sum f_{m-1}}{f_m} \cdot C$
Interpretation:
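The grouped-median procedure first finds the class whose cumulative frequency first reaches n/2, then interpolates within it. A minimal Python sketch follows. `grouped_median` is a hypothetical helper, and it uses the "greater than or equal to n/2" reading of the median-class rule.

```python
from itertools import accumulate


def grouped_median(boundaries, freqs):
    """L_m + (n/2 - cumulative frequency before the median class) / f_m * C."""
    half = sum(freqs) / 2
    cumulative = list(accumulate(freqs))
    # Median class: the first class whose cumulative frequency reaches n/2.
    m = next(i for i, cf in enumerate(cumulative) if cf >= half)
    lower_boundary, upper_boundary = boundaries[m]
    width = upper_boundary - lower_boundary
    cf_before = cumulative[m - 1] if m > 0 else 0
    return lower_boundary + (half - cf_before) / freqs[m] * width


boundaries = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
              (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
frequencies = [1, 2, 3, 5, 4, 3, 2]

print(grouped_median(boundaries, frequencies))
```

Here n/2 = 10, the cumulative frequencies are 1, 3, 6, 11, ..., so the median class is 20.5 – 25.5 and the median is 20.5 + (10 − 6)/5 × 5 = 24.5 miles.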
MEASURES OF CENTRAL TENDENCY: THE MODE
▪ Value that occurs most often
▪ Not affected by extreme values
▪ Used for either numerical or categorical (nominal) data
▪ There may be no mode
▪ There may be several modes

[Figure: two dot plots on a 0–10 scale; one has Mode = 9, the other has no mode.]

UNGROUPED DATA
Step 1 Arrange the data values in ascending order.
Step 2 See which number appears most often.

GROUPED DATA
1) Find the modal class (the class with the highest frequency).
2) Mode = $\hat{x} = L_{mo} + \frac{\Delta_1}{\Delta_1 + \Delta_2} \cdot C$
where
$L_{mo}$ = lower class boundary of the modal class
$\Delta_1$ = frequency of the modal class − frequency of the class before the modal class
$\Delta_2$ = frequency of the modal class − frequency of the class after the modal class
$C$ = width of the modal class boundaries
THE MODE: UNGROUPED DATA
Example 3-9 (NFL Signing Bonuses)
Find the mode of the signing bonuses of eight NFL players for a specific year.
The bonuses in millions of dollars are
18.0 14.0 34.5 10 11.3 10 12.4 10

Solution
You may find it easier to sort first
10 10 10 11.3 12.4 14.0 18.0 34.5

Select the value that occurs the most:


Interpretation:
THE MODE: UNGROUPED DATA
Example 3-10 (Coal Employees in PA)

Find the mode for the number of coal employees per county for 10
selected counties in southwestern Pennsylvania.

110 731 1031 84 20 118 1162 1977 103 752

Solution

Conclusion:
THE MODE: UNGROUPED DATA
Example 3-11 (Licensed Nuclear Reactors)

The data show the number of licensed nuclear reactors in the US for a
recent 15-year period. Find the mode.

104 104 104 104 104 107 109 109 109 110 109 111 112 111 109

Solution
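Examples 3-9 to 3-11 show the three possible outcomes: one mode, no mode and more than one mode. The Python sketch below reports all values tied for the highest count. It treats the case where no value repeats as "no mode" and returns an empty list. The helper name `modes` is hypothetical.

```python
from collections import Counter


def modes(values):
    """All values tied for the highest count; an empty list when no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)


bonuses = [18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10]                                 # Example 3-9
coal_employees = [110, 731, 1031, 84, 20, 118, 1162, 1977, 103, 752]                 # Example 3-10
reactors = [104, 104, 104, 104, 104, 107, 109, 109, 109, 110, 109, 111, 112, 111, 109]  # Example 3-11

print(modes(bonuses))
print(modes(coal_employees))
print(modes(reactors))
```

The bonuses have a single mode of $10 million, the coal data have no mode, and the reactor data are bimodal (104 and 109 each occur five times).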
THE MODE: GROUPED DATA
Example 3-12 (Miles Run per Week)
Find the modal class for the frequency distribution of miles that 20 runners ran in one week.

Class Boundaries   Frequency
5.5 - 10.5         1
10.5 - 15.5        2
15.5 - 20.5        3
20.5 - 25.5        5
25.5 - 30.5        4
30.5 - 35.5        3
35.5 - 40.5        2

Solution:
1) Find the modal class.
2) Use the formula: Mode = $\hat{x} = L_{mo} + \frac{\Delta_1}{\Delta_1 + \Delta_2} \cdot C$
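The grouped-mode formula needs the modal class and its two neighbouring frequencies. In the Python sketch below, a missing neighbour (when the modal class is the first or last class) is treated as having frequency 0. That is an assumption of this sketch, not a rule from the slide. `grouped_mode` is a hypothetical helper name.

```python
def grouped_mode(boundaries, freqs):
    """L_mo + delta1 / (delta1 + delta2) * C for a grouped frequency distribution."""
    m = max(range(len(freqs)), key=lambda i: freqs[i])   # modal class (first one if tied)
    lower_boundary, upper_boundary = boundaries[m]
    width = upper_boundary - lower_boundary
    before = freqs[m - 1] if m > 0 else 0                # assumption: no class before -> 0
    after = freqs[m + 1] if m + 1 < len(freqs) else 0    # assumption: no class after -> 0
    delta1 = freqs[m] - before
    delta2 = freqs[m] - after
    if delta1 + delta2 == 0:
        raise ValueError("modal class frequency equals both neighbours; formula undefined")
    return lower_boundary + delta1 / (delta1 + delta2) * width


boundaries = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
              (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
frequencies = [1, 2, 3, 5, 4, 3, 2]

print(round(grouped_mode(boundaries, frequencies), 2))
```

The modal class is 20.5 – 25.5 (f = 5), with Δ1 = 5 − 3 = 2 and Δ2 = 5 − 4 = 1, so the mode is 20.5 + 2/3 × 5 ≈ 23.83 miles.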
MEASURES OF CENTRAL TENDENCY: THE MIDRANGE
▪ The number that is exactly halfway between the minimum and maximum numbers in a set of data.

UNGROUPED DATA
$MR = \frac{\text{Maximum value} + \text{Minimum value}}{2}$

GROUPED DATA
$MR = \frac{\text{Lower class limit of the smallest class} + \text{Upper class limit of the largest class}}{2}$
THE MIDRANGE: UNGROUPED DATA
Example 3-15 (Waterline Breaks)

In the last two winter seasons, the city of Brownville, Minnesota, reported these numbers of water-line breaks per month. Find the midrange.

2, 3, 6, 8, 4, 1

Solution
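The ungrouped midrange only needs the largest and smallest values. A minimal Python sketch follows, using the hypothetical helper name `midrange`.

```python
def midrange(values):
    """(maximum value + minimum value) / 2."""
    return (max(values) + min(values)) / 2


water_line_breaks = [2, 3, 6, 8, 4, 1]
print(midrange(water_line_breaks))
```

With a maximum of 8 and a minimum of 1, the midrange is (8 + 1)/2 = 4.5 breaks.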
MEASURES OF CENTRAL TENDENCY :
WHICH MEASURE TO CHOOSE?
o The mean is generally used, unless extreme values (outliers) exist.

o The median is often used when outliers are present, since it is not sensitive to extreme
values. For example, median home prices are often reported for a region, because a few
very expensive homes would pull the mean upward.

o In some situations it makes sense to report both the mean and the
median.
TUTORIAL
Based on the grouped data below, find the mean, median and
mode.

Time to travel to work Frequency


1-10 8
11-20 14
21-30 12
31-40 9
41-50 7

MEASURES OF
VARIATION (DISPERSION)
Measures of variation give information on the spread (variability or dispersion) of the data values.

[Figure: two distributions with the same center but different variation.]
MEASURES OF VARIATION: THE RANGE
▪ Simplest measure of variation
▪ Difference between the largest and the smallest value

UNGROUPED DATA
$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$
Example: 16 24 22 25 26 27 28 23
Range: 28 – 16 = 12

▪ Disadvantages of using the range:
➢ Based on two values only. All other values in a data set are ignored (it ignores the way in which the data are distributed).
[Figure: two dot plots on a 7–12 scale with differently distributed values; both have Range = 12 − 7 = 5.]
➢ Sensitive to outliers:
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5     Range = 5 − 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120   Range = 120 − 1 = 119


MEASURES OF VARIATION: THE VARIANCE
▪ Measure of how spread out the data are from their mean value.
▪ Purpose:
➢ To determine the spread of the data.
➢ To determine the consistency of a variable.
▪ Notation
Population variance, $\sigma^2$
Sample variance, $s^2$

UNGROUPED DATA
Population variance, $\sigma^2 = \frac{1}{N}\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]$
*Sample variance, $s^2 = \frac{1}{n-1}\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]$

GROUPED DATA
Population variance, $\sigma^2 = \frac{1}{N}\left[\sum f \cdot X_m^2 - \frac{(\sum f \cdot X_m)^2}{N}\right]$
*Sample variance, $s^2 = \frac{1}{n-1}\left[\sum f \cdot X_m^2 - \frac{(\sum f \cdot X_m)^2}{n}\right]$

*$X_m$ is the midpoint

MEASURES OF VARIATION: THE STANDARD DEVIATION
▪ Most commonly used measure of variation.
▪ It is the square root of the variance.
▪ Notation
➢ Population standard deviation, $\sigma$
➢ Sample standard deviation, $s$
▪ The lower the standard deviation, the closer the values are to the mean and the less variability there is.
▪ The higher the standard deviation, the farther the values are spread from the mean and the more variability there is.

UNGROUPED DATA
Population standard deviation, $\sigma = \sqrt{\frac{1}{N}\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]}$
*Sample standard deviation, $s = \sqrt{\frac{1}{n-1}\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]}$

GROUPED DATA
Population standard deviation, $\sigma = \sqrt{\frac{1}{N}\left[\sum f \cdot X_m^2 - \frac{(\sum f \cdot X_m)^2}{N}\right]}$
*Sample standard deviation, $s = \sqrt{\frac{1}{n-1}\left[\sum f \cdot X_m^2 - \frac{(\sum f \cdot X_m)^2}{n}\right]}$

*$X_m$ is the midpoint (please check and compare the formula in the Appendix).
THE VARIANCE & STANDARD DEVIATION:
UNGROUPED DATA
Example
Sample Data (Xi) : 10 12 14 15 17 18 18 24

Solution:
Variance, $s^2 = \frac{1}{n-1}\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]$
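The computational (shortcut) formula only needs ΣX and ΣX². The Python sketch below applies it to the sample data and cross-checks the result against `statistics.variance` and `statistics.stdev`, which also divide by n − 1. The helper name `sample_variance` is hypothetical.

```python
import math
import statistics


def sample_variance(values):
    """Computational formula: s^2 = [sum(X^2) - (sum X)^2 / n] / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


data = [10, 12, 14, 15, 17, 18, 18, 24]
s2 = sample_variance(data)
s = math.sqrt(s2)
print(f"sum X = {sum(data)}, sum X^2 = {sum(x * x for x in data)}")
print(f"s^2 = {s2:.4f}, s = {s:.4f}")

# Cross-check against the standard library (which also divides by n - 1).
print(statistics.variance(data), statistics.stdev(data))
```

Here ΣX = 128 and ΣX² = 2178, so s² = (2178 − 2048)/7 = 130/7 ≈ 18.57 and s ≈ 4.31.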
THE VARIANCE & STANDARD DEVIATION:
UNGROUPED DATA
Example 3-23 (European Auto Sales)

Find the variance and standard deviation for the amount of European auto
sales for a sample of 6 years. The data are in millions of dollars.

Sales (Millions of Dollars), X   X²
11.2                             125.44
11.9                             141.61
12.0                             144.00
12.8                             163.84
13.4                             179.56
14.3                             204.49
COMPARING STANDARD DEVIATION

[Figure: dot plots of three data sets on a scale from 11 to 21.]
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926 (smaller standard deviation)
Data C: Mean = 15.5, S = 4.570 (larger standard deviation)
COMPARING STANDARD DEVIATION
Example 3-21 (Outdoor Paint)
The average for both brands is the same, but the range for Brand A is much
greater than the range for Brand B. Which brand would you buy?

Brand A   Brand B
10        35
60        45
50        30
30        35
40        40
20        25

Brand A: $\sum X = 210$, $\sum X^2 = 9100$, $N = 6$, $\mu = \frac{\sum X}{N} = \frac{210}{6} = 35$, $R = 60 - 10 = 50$
Brand B: $\sum X = 210$, $\sum X^2 = 7600$, $N = 6$, $\mu = \frac{\sum X}{N} = \frac{210}{6} = 35$, $R = 45 - 25 = 20$
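The ranges alone suggest Brand B varies less, and the standard deviation confirms it. Because the slide writes the mean as μ with N = 6, the Python sketch below treats each brand's six values as a population and uses `statistics.pvariance` and `statistics.pstdev` (which divide by N).

```python
import statistics

brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]

for name, data in (("Brand A", brand_a), ("Brand B", brand_b)):
    mu = statistics.mean(data)
    variance = statistics.pvariance(data)    # population variance (divide by N)
    sd = statistics.pstdev(data)             # population standard deviation
    print(f"{name}: mean = {mu}, range = {max(data) - min(data)}, "
          f"variance = {variance:.2f}, standard deviation = {sd:.2f}")
```

Both brands have a mean of 35, but σ ≈ 17.08 for Brand A and σ ≈ 6.45 for Brand B, so Brand B is the more consistent paint.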
THE VARIANCE & STANDARD DEVIATION:
GROUPED DATA
Example
Find the variance and the standard deviation for the frequency distribution of miles that 20 runners ran in one week.

Distance (miles)   Number of people, f
5.5 – 10.5         1
10.5 – 15.5        2
15.5 – 20.5        3
20.5 – 25.5        5
25.5 – 30.5        4
30.5 – 35.5        3
35.5 – 40.5        2

Formula: $s = \sqrt{\frac{1}{n-1}\left[\sum f \cdot X_m^2 - \frac{(\sum f \cdot X_m)^2}{n}\right]}$
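The grouped version replaces each value with its class midpoint and weights it by the class frequency. A minimal Python sketch of the grouped sample-variance formula follows. The helper name `grouped_sample_variance` is hypothetical.

```python
import math


def grouped_sample_variance(boundaries, freqs):
    """[sum(f * Xm^2) - (sum(f * Xm))^2 / n] / (n - 1), with Xm the class midpoint."""
    midpoints = [(lo + hi) / 2 for lo, hi in boundaries]
    n = sum(freqs)
    sum_fx = sum(f * xm for f, xm in zip(freqs, midpoints))
    sum_fx2 = sum(f * xm ** 2 for f, xm in zip(freqs, midpoints))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


boundaries = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
              (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
frequencies = [1, 2, 3, 5, 4, 3, 2]

s2 = grouped_sample_variance(boundaries, frequencies)
s = math.sqrt(s2)
print(f"s^2 = {s2:.2f}, s = {s:.2f}")
```

Here Σf · Xm = 490 and Σf · Xm² = 13310, so s² = (13310 − 490²/20)/19 = 1305/19 ≈ 68.68 and s ≈ 8.29 miles.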
THE VARIANCE & STANDARD DEVIATION:
GROUPED DATA
Exercise
The data below show the distances run by several people at a popular recreational park in Shah Alam on a particular day. Calculate the variance and standard deviation.

Distance (in kilometers)   Number of people
2 - 5                      2
6 - 9                      4
10 - 13                    7
14 - 17                    15
18 - 21                    5
MEASURES OF VARIATION: SUMMARY CHARACTERISTICS
▪ The more the data are spread out, the greater the range, variance, and
standard deviation.
▪ The more the data are concentrated, the smaller the range, variance, and
standard deviation.
▪ If the values are all the same (no variation), all these measures will be zero.
▪ None of these measures are ever negative.
▪ Symbol:

Measure              Population (Parameter)   Sample (Statistic)
Mean                 $\mu$                    $\bar{X}$
Variance             $\sigma^2$               $s^2$
Standard Deviation   $\sigma$                 $s$
MEASURES OF VARIATION: THE COEFFICIENT OF VARIATION

THE FORMULA: $CV = \frac{S}{\bar{X}} \cdot 100\%$

▪ Measures relative variation
▪ Always expressed as a percentage (%)
▪ The purpose of calculating the CV is to identify
a. which groups are more consistent.
b. which groups are more or less variable.
▪ To determine the most consistent group, choose the group with the smallest percentage. In other words, the smallest CV indicates that the data are the least dispersed relative to their mean.
THE COEFFICIENT OF VARIATION
Example

Stock A:
Average price last year = $50
Standard deviation = $5

Stock B:
Average price last year = $100
Standard deviation = $5
THE COEFFICIENT OF VARIATION
Example 3-25 (Sales of Automobiles)

The mean of the number of sales of cars over a 3-month period is 87, and the
standard deviation is 5. The mean of the commissions is $5225, and the
standard deviation is $773. Compare the variations of the two.

Solution:
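The coefficient of variation divides the standard deviation by the mean and expresses it as a percentage, so spreads measured on different scales can be compared. The Python sketch below applies it to the stock example and to Example 3-25. The helper name `coefficient_of_variation` is hypothetical.

```python
def coefficient_of_variation(sd, mean):
    """CV = standard deviation / mean * 100, in percent."""
    return sd / mean * 100


examples = {
    "Stock A price": (5, 50),       # (standard deviation, mean)
    "Stock B price": (5, 100),
    "Car sales": (5, 87),
    "Commissions": (773, 5225),
}

for label, (sd, mean) in examples.items():
    print(f"{label}: CV = {coefficient_of_variation(sd, mean):.2f}%")
```

Stock A has CV = 10% and Stock B has CV = 5%, so Stock B's price is relatively less variable even though both standard deviations are $5. For the car dealer, sales give CV ≈ 5.75% and commissions give CV ≈ 14.79%, so the commissions are more variable.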

MEASURES OF
SKEWNESS

MEASURES OF SKEWNESS: THE SKEWNESS
▪ Skewness is a measure of the shape of a distribution.
▪ This measure is widely used whenever we want to identify whether the data are symmetrical (normal) or skewed.
▪ In general, the shape of a distribution can be normal, positively skewed (skewed to the right), or negatively skewed (skewed to the left).

i. PEARSON COEFFICIENT OF SKEWNESS
$\text{Skewness} = \frac{\text{mean} - \text{mode}}{\text{standard deviation}}$
or
$\text{Skewness} = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}}$

Skewness = 0                Symmetrical/normal
Skewness = positive value   Skewed to the right
Skewness = negative value   Skewed to the left

ii. THE RELATIONSHIP AMONG THE MEAN, MEDIAN, MODE
Mean = Median = Mode   Symmetrical/normal
Mean > Median > Mode   Skewed to the right
Mean < Median < Mode   Skewed to the left
THE SKEWNESS
Example:

A study is conducted to determine the performance of students from various classes of Sekolah Menengah Taman Meru Jati. The measurements of students' CGPA were tabulated as follows:

Class    Mean   Median   Mode   Variance
Melati   3.12   3.10     2.12   0.2755
Lily     3.2    3.13     2.36   0.2509
Mawar    3.15   3.02     3.00   0.2237

Calculate the skewness for the data and comment on the shape of the distribution.
THE SKEWNESS
Class    Skewness = (mean − mode)/std dev    Skewness = 3(mean − median)/std dev
Melati
Lily
Mawar
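Note that the table gives the variance, so the standard deviation must be found first as its square root. The Python sketch below computes both Pearson coefficients for each class and labels the shape by the sign of each value. The helper name `describe_shape` is hypothetical.

```python
import math

classes = {
    # class: (mean, median, mode, variance), taken from the table above
    "Melati": (3.12, 3.10, 2.12, 0.2755),
    "Lily": (3.2, 3.13, 2.36, 0.2509),
    "Mawar": (3.15, 3.02, 3.00, 0.2237),
}


def describe_shape(value):
    if value > 0:
        return "skewed to the right"
    if value < 0:
        return "skewed to the left"
    return "symmetrical"


results = {}
for name, (mean, median, mode, variance) in classes.items():
    sd = math.sqrt(variance)                  # standard deviation from the variance
    by_mode = (mean - mode) / sd              # (mean - mode) / std dev
    by_median = 3 * (mean - median) / sd      # 3(mean - median) / std dev
    results[name] = (by_mode, by_median)
    print(f"{name}: {by_mode:.2f} ({describe_shape(by_mode)}), "
          f"{by_median:.2f} ({describe_shape(by_median)})")
```

Both coefficients are positive for all three classes (about 1.91 and 0.11 for Melati, 1.68 and 0.42 for Lily, 0.32 and 0.82 for Mawar), so each CGPA distribution is skewed to the right.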
