
Statistical Tools for Data Management

Module 4 focuses on data management and the application of statistical tools for processing numerical data. It covers the importance of statistics in decision-making, types of statistics (descriptive and inferential), methods of data collection, and various statistical representations such as charts and graphs. Additionally, it discusses measures of central tendency, including mean, median, and mode, as well as the concept of weighted mean.

Uploaded by

jheocarbonilla9
Copyright
© All Rights Reserved

Module 4: Mathematics as a Tool

Lesson 1: Data Management (Basic Statistics)

Course Outcome:
• Use a variety of statistical tools to process and manage numerical data
• Advocate the use of statistical data in making important decisions
Pre Activity
• Collect quantitative information (age, height, weight, number of family members, cost of phone, weekly allowance, pulse rate, grades, etc.) from your classmates. Organize the data in a table and write your findings based on the result.
• Example (table headings): Age | Frequency
Why study statistics?

1. Data are everywhere.
2. Statistical techniques are used to make many decisions that affect our lives.
3. No matter what your career, you will make professional decisions that involve data. An understanding of statistical methods will help you make these decisions effectively.
What Is Statistics?

1. Collecting data (e.g., survey)
2. Presenting data (e.g., charts and tables)
3. Characterizing data (e.g., average)

Why? Data analysis supports decision-making.

© 2011 Pearson Education, Inc
What Is Statistics?

Statistics is the science of data. It involves collecting, classifying, summarizing, organizing, analyzing, and interpreting numerical information.
Statistical Methods
Statistical methods are divided into two branches:
• Descriptive statistics
• Inferential statistics
Types of statistics
• Descriptive statistics: the branch of statistics that involves the collection, organization, summarization, and presentation of data.
• Inferential statistics: the branch that interprets data and draws conclusions about a population on the basis of a sample.
• Population: the entire set of individuals or objects of interest, or the measurements obtained from all individuals or objects of interest.
• Sample: a portion, or part, of the population of interest.
Descriptive Statistics
1. Involves
• Collecting data
• Presenting data
• Characterizing data
2. Purpose
• Describe data

[Figure: bar chart of dollar values for quarters Q1 to Q4; $\bar{X} = 30.5$, $S^2 = 113$]
Descriptive Statistics

• Collect data (e.g., survey)
• Present data (e.g., tables and graphs)
• Summarize data (e.g., sample mean $\bar{X} = \frac{\sum X_i}{n}$)
Inferential Statistics
1. Involves
• Estimation
• Hypothesis testing
2. Purpose
• Make decisions about population characteristics
Inferential Statistics
• Estimation (e.g., estimate the population mean weight using the sample mean weight)
• Hypothesis testing (e.g., test the claim that the population mean weight is 70 kg)

Inference is the process of drawing conclusions or making decisions about a population based on sample results.
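As a rough illustration of estimation, the following Python sketch draws a random sample from a made-up population of weights and uses the sample mean as an estimate of the population mean. The population values, the sample size of 50, and the seed are assumptions chosen for the illustration; none of them come from the module.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed (an assumption) so the run is repeatable

# Hypothetical population: weights in kg of 1,000 people (made-up values).
population = [rng.uniform(50, 90) for _ in range(1000)]

# In practice the whole population is rarely measured; a sample is taken instead.
sample = rng.sample(population, 50)

population_mean = statistics.mean(population)  # the parameter, usually unknown
sample_mean = statistics.mean(sample)          # the estimate of that parameter

print(f"population mean: {population_mean:.1f} kg")
print(f"sample mean:     {sample_mean:.1f} kg")
```

The two values are usually close but not identical, which is why the sample mean is treated as an estimate rather than an exact value.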
DATA COLLECTION
• Data collection is the process of gathering and measuring information about the variables under study using an established, systematic procedure, which makes it possible to answer the relevant questions and evaluate outcomes.
Obtaining Data
1. Data from a published source
2. Data from a designed experiment
3. Data from a survey
4. Data collected observationally

Obtaining Data
• Published source: book, journal, newspaper, Web site
• Designed experiment: the researcher exerts strict control over the units
• Survey: a group of people are surveyed and their responses are recorded
• Observational study: units are observed in their natural setting and the variables of interest are recorded
Sampling methods
Sampling methods can be:
• random (each member of the population has an equal chance of being selected)
• nonrandom

The actual process of sampling causes sampling errors. For example, the sample may not be large enough or representative of the population. Factors not related to the sampling process cause nonsampling errors. For example, a defective counting device can cause a nonsampling error.
Random sampling methods
• simple random sample (each sample of the same size has an equal chance of being selected)
• stratified sample (divide the population into groups called strata and then take a sample from each stratum)
• cluster sample (divide the population into groups called clusters and then randomly select some of the clusters; all the members of the selected clusters are in the cluster sample)
• systematic sample (randomly select a starting point and take every n-th piece of data from a listing of the population)
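The four random sampling methods can be sketched in Python as follows. This is only an illustration: the student ID numbers, the four sections used as groups, the sample sizes, and the function names are all hypothetical, and the seed is fixed just to make the run repeatable.

```python
import random

def simple_random_sample(population, k, rng):
    """Every possible sample of size k is equally likely."""
    return rng.sample(population, k)

def stratified_sample(strata, k_per_stratum, rng):
    """Take a random sample from every stratum."""
    return [x for group in strata.values() for x in rng.sample(group, k_per_stratum)]

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every member of the chosen ones."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [x for name in chosen for x in clusters[name]]

def systematic_sample(population, step, rng):
    """Random starting point, then every step-th item of the listing."""
    start = rng.randrange(step)
    return population[start::step]

rng = random.Random(0)         # fixed seed (an assumption) for repeatability
students = list(range(1, 41))  # hypothetical student ID numbers 1..40
sections = {                   # hypothetical groups of 10 students each
    "A": students[:10], "B": students[10:20],
    "C": students[20:30], "D": students[30:],
}

srs = simple_random_sample(students, 8, rng)
strat = stratified_sample(sections, 2, rng)  # 2 students from every section
clus = cluster_sample(sections, 2, rng)      # all students of 2 whole sections
syst = systematic_sample(students, 5, rng)   # every 5th student

for name, result in [("simple", srs), ("stratified", strat),
                     ("cluster", clus), ("systematic", syst)]:
    print(name, result)
```

Note the difference between the stratified and cluster versions: the first samples inside every group, while the second keeps only some groups but takes all of their members.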
Statistical data
• The collection of data that are relevant to the problem being studied is commonly the most difficult, expensive, and time-consuming part of the entire research project.
• Statistical data are usually obtained by counting or measuring items.
• Primary data are collected specifically for the analysis desired.
• Secondary data have already been compiled and are available for statistical analysis.
• A variable is an item of interest that can take on many different numerical values.
• A constant has a fixed numerical value.
Data
Statistical data are usually obtained by counting or measuring items. Most data can be put into the following categories:
• Qualitative: data are measurements that each fall into one of several categories (hair color, ethnic group, and other attributes of the population).
• Quantitative: data are observations that are measured on a numerical scale (distance traveled to college, number of children in a family, etc.).
Qualitative data
Qualitative data are generally described by words or letters. They are not as widely used as quantitative data because many numerical techniques do not apply to qualitative data. For example, it does not make sense to find an average hair color or blood type.
Qualitative data can be separated into two subgroups:
• dichotomic, if the variable takes one of two options (e.g., gender: male or female)
• polynomic, if the variable takes one of more than two options (e.g., education: primary school, secondary school, or university)
Qualitative Data
Classified into categories.
• College major of each student in a class.
• Gender of each employee at a company.
• Method of payment (cash, check, credit card).
Quantitative data
Quantitative data are always numbers and are the result of counting or measuring attributes of a population.
Quantitative data can be separated into two subgroups:
• discrete, if the data are the result of counting (the number of students of a given ethnic group in a class, the number of books on a shelf, ...)
• continuous, if the data are the result of measuring (distance traveled, weight of luggage, ...)
Quantitative Data
Measured on a numeric scale.
• Number of defective items in a lot.
• Salaries of CEOs of oil companies.
• Ages of employees at a company.
Types of variables
Variables
• Qualitative
  • Dichotomic: gender, marital status
  • Polynomic: brand of PC, hair color
• Quantitative
  • Discrete: children in family, strokes on a golf hole
  • Continuous: amount of income tax paid, weight of a student
Charts and graphs
• Frequency distributions are good ways to present the essential aspects of data collections in concise and understandable terms.
• Pictures are often more effective for displaying large data collections.
Histogram
• Frequently used to graphically present interval and ratio data
• The adjacent bars indicate that a numerical range is being summarized by showing the frequencies in arbitrarily chosen classes
Frequency polygon
• Another common method for graphically presenting interval and ratio data
• To construct a frequency polygon, mark the frequencies on the vertical axis and the values of the variable being measured on the horizontal axis, as with the histogram.
• If the purpose of the presentation is comparison with other distributions, the frequency polygon provides a good summary of the data.
Ogive
• An ogive is a graph of a cumulative frequency distribution.
• An ogive is used when one wants to determine how many observations lie above or below a certain value in a distribution.
• First, a cumulative frequency distribution is constructed.
• The cumulative frequencies are then plotted at the upper class limit of each category.
• An ogive can also be constructed for a relative frequency distribution.
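The cumulative frequencies behind an ogive can be computed with a running total. The sketch below is a minimal illustration in Python; the class intervals and the frequencies are made-up values, and "greater than" is taken to mean the number of observations in a class or any higher class, which is an assumption about the intended convention.

```python
from itertools import accumulate

# Hypothetical class intervals (lower limit, upper limit) and frequencies.
classes = [(71, 75), (76, 80), (81, 85), (86, 90), (91, 95), (96, 100)]
frequencies = [2, 5, 8, 9, 4, 2]

total = sum(frequencies)

# "Less than" cumulative frequency: running total from the first class onward.
less_than = list(accumulate(frequencies))

# "Greater than" cumulative frequency: running total from the last class backward.
greater_than = list(accumulate(reversed(frequencies)))[::-1]

# Cumulative relative frequency, for an ogive of a relative frequency distribution.
cumulative_relative = [c / total for c in less_than]

# Ogive points: each cumulative frequency paired with its upper class limit.
ogive_points = [(upper, c) for (_, upper), c in zip(classes, less_than)]

print(less_than)     # [2, 7, 15, 24, 28, 30]
print(greater_than)  # [30, 28, 23, 15, 6, 2]
print(ogive_points)
```

Plotting `ogive_points` and joining them with line segments gives the ogive; the last cumulative frequency always equals the total number of observations.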
Pie Chart
• The pie chart is an effective way of displaying the percentage breakdown of data by category.
• Useful if the relative sizes of the data components are to be emphasized
• Pie charts also provide an effective way of presenting ratio- or interval-scaled data after they have been organized into categories.
Bar chart
• Another common method for graphically presenting nominal and ordinal scaled data
• One bar is used to represent the frequency for each category.
• The bars are usually positioned vertically with their bases located on the horizontal axis of the graph.
• The bars are separated, and this is why such a graph is frequently used for nominal and ordinal data: the separation emphasizes the plotting of frequencies for distinct categories.
Time Series Graph
• The time series graph is a graph of data that have been measured over time.
• The horizontal axis of this graph represents time periods and the vertical axis shows the numerical values corresponding to these time periods.
Measures of Central Tendency
The Arithmetic Mean
One of the most basic statistical concepts involves finding measures of central tendency of a set of numerical data.

We will consider three types of averages, known as the arithmetic mean, the median, and the mode. Each of these averages is a measure of central tendency for the numerical data.

The Arithmetic Mean
In statistics it is often necessary to find the sum of a set of numbers. The traditional symbol used to indicate a summation is the Greek letter sigma, $\Sigma$. Thus the notation $\Sigma x$, called summation notation, denotes the sum of all the numbers in a given set.

We can define the mean using summation notation: the mean of $n$ numbers is the sum of the numbers divided by $n$, that is, $\text{mean} = \frac{\Sigma x}{n}$.

The Arithmetic Mean
Statisticians often collect data from small portions of a large group in order to determine information about the group.

In such situations the entire group under consideration is known as the population, and any subset of the population is called a sample.

It is traditional to denote the mean of a sample by $\bar{x}$ (which is read as “x bar”) and to denote the mean of a population by the Greek letter $\mu$ (lowercase mu).

Example 1 – Find a Mean
Six friends in a biology class of 20 students received test grades of 92, 84, 65, 76, 88, and 90.

Find the mean of these test scores.

Solution:
The 6 friends are a sample of the population of 20 students. Use $\bar{x}$ to represent the mean.

$\bar{x} = \frac{\Sigma x}{n} = \frac{92 + 84 + 65 + 76 + 88 + 90}{6} = \frac{495}{6} = 82.5$

The mean of these test scores is 82.5.

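The arithmetic in Example 1 can be checked with a few lines of Python. The function name `arithmetic_mean` is just a label chosen for this sketch; the scores are the six grades from the example.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by how many values there are."""
    return sum(values) / len(values)

scores = [92, 84, 65, 76, 88, 90]  # the six test grades from Example 1
mean_score = arithmetic_mean(scores)

print(sum(scores))  # 495
print(mean_score)   # 82.5
```

Python's standard library offers the same calculation as `statistics.mean`.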
The Median
Another type of average is the median. Essentially, the median is the middle number or the mean of the two middle numbers in a list of numbers that have been arranged in numerical order from smallest to largest or largest to smallest.

Any list of numbers that is arranged in numerical order from smallest to largest or largest to smallest is a ranked list.

Example 2 – Find a Median
Find the median of the data in the following lists.
a. 4, 8, 1, 14, 9, 21, 12
b. 46, 23, 92, 89, 77, 108

Solution:
a. The list 4, 8, 1, 14, 9, 21, 12 contains 7 numbers. The median of a list with an odd number of entries is found by ranking the numbers and finding the middle number. Ranking the numbers from smallest to largest gives
1, 4, 8, 9, 12, 14, 21

The middle number is 9. Thus 9 is the median.

b. The list 46, 23, 92, 89, 77, 108 contains 6 numbers. The median of a list of data with an even number of entries is found by ranking the numbers and computing the mean of the two middle numbers. Ranking the numbers from smallest to largest gives
23, 46, 77, 89, 92, 108

The two middle numbers are 77 and 89. The mean of 77 and 89 is 83. Thus 83 is the median of the data.

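The ranking procedure used in Example 2 can be written out as a short Python sketch. The function name `median` and the variable names are choices made for this illustration; the two lists are the ones from the example.

```python
def median(values):
    """Rank the values, then take the middle one or the mean of the two middle ones."""
    ranked = sorted(values)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of entries: a single middle number exists.
        return ranked[middle]
    # Even number of entries: average the two middle numbers.
    return (ranked[middle - 1] + ranked[middle]) / 2

list_a = [4, 8, 1, 14, 9, 21, 12]
list_b = [46, 23, 92, 89, 77, 108]

median_a = median(list_a)
median_b = median(list_b)

print(median_a)  # 9
print(median_b)  # 83.0
```

The standard library function `statistics.median` follows the same rule and gives the same two results.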
The Mode
A third type of average is the mode. The mode of a list of numbers is the number that occurs most frequently.

Example 3 – Find a Mode
Find the mode of the data in the following lists.
a. 18, 15, 21, 16, 15, 14, 15, 21
b. 2, 5, 8, 9, 11, 4, 7, 23

Solution:
a. In the list 18, 15, 21, 16, 15, 14, 15, 21, the number 15 occurs more often than the other numbers. Thus 15 is the mode.

b. Each number in the list 2, 5, 8, 9, 11, 4, 7, 23 occurs only once. Because no number occurs more often than the others, there is no mode.

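A possible way to find the mode in Python is to count the occurrences of each value. The sketch below follows the rule used in Example 3, where a list in which no number occurs more often than the others has no mode; the function name `find_modes` and the choice to return an empty list in that case are assumptions of this illustration, and the list is assumed to be non-empty.

```python
from collections import Counter

def find_modes(values):
    """Return the most frequent value(s), or [] when no value stands out."""
    counts = Counter(values)
    highest = max(counts.values())
    # If several distinct values all occur equally often, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

list_a = [18, 15, 21, 16, 15, 14, 15, 21]
list_b = [2, 5, 8, 9, 11, 4, 7, 23]

modes_a = find_modes(list_a)
modes_b = find_modes(list_b)

print(modes_a)  # [15]
print(modes_b)  # []
```

The standard library function `statistics.multimode` uses a different convention: for list b it would return every value, since all of them tie for the highest count.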
The Weighted Mean
A value called the weighted mean is often used when some data values are more important than others. For instance, many professors determine a student’s course grade from the student’s tests and the final examination.

Consider the situation in which a professor counts the final examination score as 2 test scores. To find the weighted mean of the student’s scores, the professor first assigns a weight to each score.

In this case the professor could assign each of the test scores a weight of 1 and the final exam score a weight of 2.

A student with test scores of 65, 70, and 75 and a final examination score of 90 has a weighted mean of

$\frac{(65 \times 1) + (70 \times 1) + (75 \times 1) + (90 \times 2)}{5} = \frac{390}{5} = 78$

In general, the weighted mean of the numbers $x_1, x_2, \ldots, x_n$ with respective weights $w_1, w_2, \ldots, w_n$ is $\frac{\Sigma (x \cdot w)}{\Sigma w}$.

Example 4 – Find a Weighted Mean
Table 13.1 shows Dillon’s fall semester course grades. Use the weighted mean formula to find Dillon’s GPA for the fall semester.

[Table 13.1: Dillon’s Grades, Fall Semester]

Example 4 – Solution
The B is worth 3 points, with a weight of 4; the A is worth 4 points, with a weight of 3; the D is worth 1 point, with a weight of 3; and the C is worth 2 points, with a weight of 4. The sum of all the weights is 4 + 3 + 3 + 4, or 14.

$\text{Weighted mean} = \frac{(3 \times 4) + (4 \times 3) + (1 \times 3) + (2 \times 4)}{14} = \frac{35}{14} = 2.5$

Dillon’s GPA for the fall semester is 2.5.

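Both weighted mean calculations, the professor's course grade and Dillon's GPA, can be reproduced with one small Python function. The function name `weighted_mean` and the variable names are choices made for this sketch; the scores, grade points, and weights are the ones given in the text.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Three tests with a weight of 1 each and a final exam with a weight of 2.
course_grade = weighted_mean([65, 70, 75, 90], [1, 1, 1, 2])

# Example 4: grade points (B = 3, A = 4, D = 1, C = 2) weighted by 4, 3, 3, 4.
gpa = weighted_mean([3, 4, 1, 2], [4, 3, 3, 4])

print(course_grade)  # 78.0
print(gpa)           # 2.5
```

When every weight is 1, the function reduces to the ordinary arithmetic mean.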
The Weighted Mean
Data that have not been organized or manipulated in any manner are called raw data.

A large collection of raw data may not provide much readily observable information.

A frequency distribution, which is a table that lists observed events and the frequency of occurrence of each observed event, is often used to organize raw data.

Frequency Distribution
A frequency distribution is a table or graph that organizes and presents the count of the number of times a certain value or class of values occurs in a sample or population.

Frequency Distribution
Frequency tables list categories of scores along with their corresponding frequencies. A frequency table most often includes all of the following:
1. Absolute frequency (or just frequency): This tells you how many times a particular category in your variable occurs. This is a tally, count, or frequency of occurrence of each individual category/value in the table.
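A frequency table of this kind can be tallied in Python with `collections.Counter`. The sketch below is only an illustration: the list of ages is made up, and the relative frequency column (each count divided by the total) is added here as an assumption about what such a table usually contains next.

```python
from collections import Counter

# Hypothetical ages collected from a class (made-up values).
ages = [18, 19, 18, 20, 19, 18, 21, 19, 18, 20]

counts = Counter(ages)  # absolute frequency of each distinct age
total = len(ages)

print("Age  Frequency  Relative frequency")
for age in sorted(counts):
    frequency = counts[age]
    print(f"{age:<4} {frequency:<10} {frequency / total:.2f}")
```

The absolute frequencies always add up to the number of observations, which is a quick check that no value was missed in the tally.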

Seatwork
1. Find the mean, median, and mode.
2. Find the class interval, frequency, less-than cumulative frequency, greater-than cumulative frequency, relative frequency, and cumulative relative frequency.

Class interval | Frequency | Less-than cumulative frequency | Greater-than cumulative frequency | Relative frequency | Cumulative relative frequency
71-75          |           |                                |                                   |                    |
76-80          |           |                                |                                   |                    |
81-85          |           |                                |                                   |                    |
86-90          |           |                                |                                   |                    |
91-95          |           |                                |                                   |                    |
96-100         |           |                                |                                   |                    |

Application
Make a frequency distribution table of the colors in one pack of M&M's or Nips candies, then compare your result with a classmate's.

Assignment
1. Find the mean, median, and mode.
2. Find the class interval, frequency, less-than cumulative frequency, greater-than cumulative frequency, relative frequency, and cumulative relative frequency.
