
Exam Impact of Notecards on Scores

The document covers descriptive statistics, including experimental design, data organization, and graphical representation. It explains how to summarize qualitative and quantitative data using frequency distribution tables, bar graphs, histograms, and pie charts, while also addressing common graphical misrepresentations. Additionally, it discusses measures of central tendency, such as mean and median, and provides guidelines for constructing effective graphics.

Uploaded by

christtuna0

DESCRIPTIVE STATISTICS
ENS185 2nd Semester
CONCEPT CHECK
A statistics instructor wants to see if allowing students to use a notecard on exams will affect their overall exam scores. The instructor thinks that the results could be affected by the type of class: traditional, Web-enhanced, or online. He randomly selects half of his students from each type of class to use notecards. At the end of the semester, the average exam grade for those using notecards is compared to the average for those without notecards.

QUESTIONS
(a) What type of experimental design is this?
(b) What is the response variable in this experiment?
(c) What is the factor that is set to predetermined levels? What are the treatments?
(d) Identify the experimental units.
(e) Draw a diagram similar to Figure 7, 8, or 10 to illustrate the design.
DESCRIPTIVE STATISTICS

❑ Organizing and Summarizing Data
❑ Numerically Summarizing Data
❑ Describing Relationship between Two Variables
Introduction to Data Analysis 4

PROCESS OF STATISTICS
Identify the research objective: A researcher must determine the questions he or she wants answered. The questions must clearly identify the population that is to be studied.

Collect the data needed: Collecting data on the whole population is usually impractical and expensive, so data are typically collected from a sample. Appropriate data collection techniques must be followed.

Describe the data: Describe the data collected using numerical and visual tools. This gives us an overview of the data and can help us determine which statistical tools to use for inference.

Perform inference: Apply the appropriate techniques to extend the results obtained from the sample to the population and report a level of reliability of the results.
ORGANIZING AND SUMMARIZING YOUR DATA
Organizing Qualitative Data
Organizing Quantitative Data
Graphical Misrepresentations of Data
ORGANIZING QUALITATIVE DATA

Recall: Qualitative data provide measures that categorize or classify data based on a characteristic or attribute.
Descriptive Statistics 8

ORGANIZING QUALITATIVE DATA


1. USE FREQUENCY DISTRIBUTION TABLES
When raw qualitative data are collected, we often first determine the categories and the number of individuals or units in each category. We then organize this into a table.

A frequency distribution lists each category of data and the number of occurrences for each category of data.

Relative frequency is the proportion of observations within a category.

$$\text{Relative frequency} = \frac{\text{frequency}}{\text{sample size}}$$

Day of the Week   Frequency   Relative Frequency
Monday            52          52/400 = 0.1300
Tuesday           66          66/400 = 0.1650
Wednesday         72          72/400 = 0.1800
Thursday          57          57/400 = 0.1425
Friday            57          57/400 = 0.1425
Saturday          43          43/400 = 0.1075
Sunday            53          53/400 = 0.1325
Total             400         1.00
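The relative frequencies in the table above can be reproduced with a few lines of Python. This is only a sketch: the dictionary name `frequency` and the print layout are illustrative choices, and the counts are copied from the day-of-week table.

```python
# Frequencies copied from the day-of-week table above.
frequency = {
    "Monday": 52,
    "Tuesday": 66,
    "Wednesday": 72,
    "Thursday": 57,
    "Friday": 57,
    "Saturday": 43,
    "Sunday": 53,
}

# The sample size is the total of all category frequencies.
sample_size = sum(frequency.values())

# Relative frequency = frequency / sample size, for each category.
relative_frequency = {day: count / sample_size for day, count in frequency.items()}

for day, count in frequency.items():
    print(f"{day:<10} {count:>4} {relative_frequency[day]:.4f}")
print(f"{'Total':<10} {sample_size:>4} {sum(relative_frequency.values()):.2f}")
```

The relative frequencies should always add up to 1 (apart from rounding), which is a quick check that no category was missed.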

ORGANIZING QUALITATIVE DATA


2. CONSTRUCT BAR GRAPHS
Once the data is organized in a table, we can create graphs.

A bar graph is constructed by labeling each category of data on either the horizontal or vertical axis and the frequency or relative frequency of the category on the other axis. Rectangles of equal width are drawn for each category. The height of each rectangle represents the category’s frequency or relative frequency.

ORGANIZING QUALITATIVE DATA


2. CONSTRUCT BAR GRAPHS: A PARETO CHART

A Pareto chart is a bar graph whose bars are drawn in decreasing order of frequency or relative frequency.

ORGANIZING QUALITATIVE DATA


2. CONSTRUCT BAR GRAPHS: SIDE-BY-SIDE BAR GRAPHS

Side-by-side bar graphs can also be used to compare two different data sets.

ORGANIZING QUALITATIVE DATA


2. CONSTRUCT BAR GRAPHS: HORIZONTAL BAR GRAPHS

Horizontal bar graphs can be drawn when the category names are lengthy.

ORGANIZING QUALITATIVE DATA


3. CONSTRUCT PIE CHARTS
Pie charts are used to present the relative frequency of qualitative data.

A pie chart is a circle divided into sectors. Each sector represents a category of data. The area of each sector is proportional to the frequency of the category.
ORGANIZING QUANTITATIVE DATA

IN SUMMARIZING QUANTITATIVE DATA, WE FIRST DETERMINE WHETHER THE DATA IS DISCRETE OR CONTINUOUS.

1. ORGANIZE DISCRETE DATA IN TABLES


If there are relatively few different values, then the categories (called classes) will be the observations (just like qualitative data).

Use the values of the discrete variable to create the classes when the number of distinct data values is small.

2. CONSTRUCT HISTOGRAMS OF DISCRETE DATA


The histogram is similar to a bar graph.

A histogram is constructed by drawing rectangles for each class of data. The height of each rectangle is the frequency or relative frequency of the class. The width of each rectangle is the same and the rectangles touch each other.

Relative frequency can also be used to create a histogram.

3. ORGANIZE CONTINUOUS DATA IN TABLES


Classes are categories into which numerical data are grouped.

Lower class limit is the smallest value within the class.
Upper class limit is the largest value within the class.
Class width is the difference between consecutive lower class limits, for example 45 − 40 = 5.

Note: Open-ended tables are tables where the first class has no lower class limit or the last class has no upper class limit.

3. ORGANIZE CONTINUOUS DATA IN TABLES


GUIDELINES IN CONSTRUCTING CONTINUOUS DATA TABLES

1. Decide on the number of classes. Generally, there should be between 5 and 20 classes. You can also use the 2^k rule: choose the smallest k such that
$$2^k \ge n$$
where k = number of classes and n = sample size.
2. Determine the class width by computing
$$\text{Class width} = \frac{\text{maximum} - \text{minimum}}{\text{number of classes}}$$
Round up this value to a more convenient number.
3. Choose a value equal to or just below the smallest data value as the lower class limit of the first class.

3. ORGANIZE CONTINUOUS DATA IN TABLES


EXAMPLE

1. The sample size is n = 50.
2. Using the 2^k rule, k = 6 classes are needed:
$$2^6 = 64 \ge 50$$
3. The class width is:
$$\text{Class width} = \frac{322.61 - 65}{6} = 42.94 \approx 43$$
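The first steps of this example can be sketched in Python. The function names `number_of_classes` and `class_width` are hypothetical, and rounding the width up to the next whole number is an assumption; the slides only say to round up to "a more convenient number".

```python
import math

def number_of_classes(n):
    """Smallest integer k with 2**k >= n (the 2^k rule)."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k

def class_width(minimum, maximum, k):
    """Range divided by the number of classes, rounded up to a whole number.

    Rounding up to the next integer is an assumption made for this sketch.
    """
    return math.ceil((maximum - minimum) / k)

n = 50                                # sample size from the example
k = number_of_classes(n)              # number of classes
width = class_width(65, 322.61, k)    # minimum and maximum from the example
print("classes:", k, "class width:", width)
```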

3. ORGANIZE CONTINUOUS DATA IN TABLES


EXAMPLE
4. Starting with 65, the classes are:
65 – 107.99
108 – 150.99
151 – 193.99
194 – 236.99
237 – 279.99
280 – 322.99

Together, the classes should include every member of the data set.
Extend the significant figures based on the given data.
Once you have established your table, count the number of data values per class.
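The class limits listed in this example follow a simple pattern: each lower limit is the previous lower limit plus the class width. A minimal sketch, assuming a hypothetical helper `class_limits` and a measurement precision of two decimal places (the `step` argument):

```python
def class_limits(start, width, k, step=0.01):
    """Return (lower, upper) limits for k classes of equal width.

    step is the assumed precision of the data (two decimal places here),
    so each upper limit sits just below the next lower limit.
    """
    limits = []
    for i in range(k):
        lower = start + i * width
        upper = round(lower + width - step, 2)
        limits.append((lower, upper))
    return limits

classes = class_limits(start=65, width=43, k=6)
for lower, upper in classes:
    print(f"{lower} - {upper}")

# Check that the largest observation in the example (322.61) is covered.
print("maximum covered:", 322.61 <= classes[-1][1])
```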

4. CONSTRUCT HISTOGRAMS OF CONTINUOUS DATA


Once you have created your table, you can create a histogram. Unlike the bar graph, a
histogram has no spaces between the bars.

Create a histogram using a bar chart in Excel (start at 12:14)
Create a histogram using the Excel histogram chart

5. DRAW A DOT PLOT


A dot plot is drawn by placing each observation horizontally in increasing order and
placing a dot above the observation each time it is observed.

A dot plot is generally used for discrete data. It can also be used for continuous data if there are relatively few values.

6. CONSTRUCT A FREQUENCY POLYGON

Class midpoint: the sum of consecutive lower class limits divided by 2.

Frequency polygon: a graph that uses points, connected by line segments, to represent the frequencies for the classes. It is constructed by plotting a point above each class midpoint on a horizontal axis at a height equal to the frequency of the class. Next, line segments are drawn connecting consecutive points. Two additional line segments are drawn connecting each end of the graph with the horizontal axis.

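7. CONSTRUCT CUMULATIVE FREQUENCY TABLES

Since quantitative data can be ordered, they can be summarized using cumulative frequency.

A cumulative frequency distribution displays the aggregate frequency of the category: the total number of observations less than or equal to the category (or, for continuous data, the upper class limit).

A cumulative relative frequency distribution displays the aggregate relative frequency of the category: the proportion of observations less than or equal to the category (or upper class limit).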
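As a rough illustration, cumulative and cumulative relative frequencies can be built with a running total. The class labels and frequencies below are borrowed from the grouped-data example later in these slides; the variable names are illustrative only.

```python
from itertools import accumulate

# Classes and frequencies from the grouped-data example in these slides.
classes = ["8-8.99", "9-9.99", "10-10.99", "11-11.99",
           "12-12.99", "13-13.99", "14-14.99"]
frequencies = [2, 2, 4, 1, 6, 13, 7]

n = sum(frequencies)

# Running total of the frequencies, class by class.
cumulative = list(accumulate(frequencies))

# Divide each running total by the sample size to get proportions.
cumulative_relative = [c / n for c in cumulative]

for label, f, cf, crf in zip(classes, frequencies, cumulative, cumulative_relative):
    print(f"{label:<10} {f:>3} {cf:>3} {crf:.4f}")
```

The last cumulative frequency always equals the sample size, and the last cumulative relative frequency always equals 1.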

8. CONSTRUCT AN OGIVE
An ogive (read as “oh jive”) is a graph that represents the cumulative frequency or cumulative relative frequency for the class. It is constructed by plotting points whose x-coordinates are the upper class limits and whose y-coordinates are the cumulative frequencies or cumulative relative frequencies of the class.

IDENTIFY THE SHAPE OF THE DISTRIBUTION


One way to describe a variable is through the shape of its distribution.
GRAPHICAL MISREPRESENTATIONS OF DATA

WHAT CAN MAKE A GRAPH MISLEADING?


Graphics are one method of distorting the truth.

A misleading graphic unintentionally creates a wrong impression.
A deceiving graphic purposely creates an incorrect impression.

Common Graphical Misrepresentations
a) Inconsistent scale
b) Misplaced origin

*Increments between tick marks should be constant.

MISREPRESENTATION OF DATA
A home security company located in Minneapolis, Minnesota, develops a summer ad campaign with the slogan “When you leave for vacation, burglars leave for work.” According to the city of Minneapolis, roughly 20% of home burglaries occur during the peak vacation months of July and August. The advertisement contains the graphic shown in Figure 20. Explain what is wrong with the graphic.

MISREPRESENTING DATA

MANIPULATING THE VERTICAL SCALE
A national news organization developed the graphic shown below to illustrate the change in the highest marginal tax rate effective January 1, 2018. Why might this graph be considered misleading?

MANIPULATING THE VERTICAL SCALE

MISLEADING GRAPHS

Soccer seems to be losing popularity as a sport in the United States. In 2010, there were approximately 14 million participants in the United States. By 2017, this number had dropped to 12 million. To illustrate this decrease, we could create a graphic like the one shown below. Describe how the graph may be misleading.

MISLEADING GRAPHS

In order to avoid misleading graphs, one can add data values or choose a consistent unit to depict the values.

MISREPRESENTATIONS IN 3D SCALE

MISREPRESENTATION IN 3D SCALE

GUIDELINES FOR CONSTRUCTING GOOD GRAPHICS


• Title and label the graphic axes clearly, providing explanations if needed.
• Include units of measurement and a data source when appropriate.
• Avoid distortion. Never lie about the data.
• Minimize the amount of white space in the graph. Use the available space to let the data stand out. If you truncate the scales, clearly indicate this to the reader.
• Avoid clutter, such as excessive gridlines and unnecessary backgrounds or pictures.
• Don’t distract the reader.
• Avoid three dimensions. Three-dimensional charts may look nice, but they distract the reader and often lead to misinterpretation of the graphic.
• Do not use more than one design in the same graphic. Sometimes graphs use a different design in one portion of the graph to draw attention to that area.
• Don’t try to force the reader to any specific part of the graph. Let the data speak for themselves.
• Avoid relative graphs that are devoid of data or scales.
NUMERICALLY SUMMARIZING YOUR DATA
Measures of Central Tendency
Measures of Dispersion
Grouped Data
Measures of Positions and Outliers
The Five-Number Summary and Boxplots
MEASURES OF CENTRAL TENDENCY
Descriptive Statistics – Measures of Central Tendency 41

MEASURES OF CENTRAL TENDENCY


A measure of central tendency numerically describes the
average or typical data value.

The three most widely used measures of central tendency are the mean, median, and mode.

ARITHMETIC MEAN
The arithmetic mean is computed by adding all the values of the variable in the data set and dividing by the number of observations.

If the arithmetic mean is computed using all the individuals in the population, it is a parameter; if it is computed using sample data, it is a statistic.

$$\text{Population mean: } \mu = \frac{\sum x_i}{N}$$
$$\text{Sample mean: } \bar{x} = \frac{\sum x_i}{n}$$

where N is the population size and n is the sample size.

MEDIAN
The median of a variable is the value that lies in the middle of the data when arranged in ascending order. We use M to represent the median.

If the number of observations, n, is ODD:
The median is the observation that lies in the $\frac{n+1}{2}$ position.

If the number of observations, n, is EVEN:
The median is the mean of the observations that lie in the $\frac{n}{2}$ and $\frac{n}{2} + 1$ positions.

EXAMPLE

Jonas wants to know how much time he typically spends on his cell phone. He goes to his phone’s website and randomly selects 12 calls. Find the mean and median.

1   7   4   1
2   4   3   48
3   5   3   6

EXAMPLE

Solve for the mean:

$$\bar{x} = \frac{1 + 7 + 4 + 1 + 2 + 4 + 3 + 48 + 3 + 5 + 3 + 6}{12} = \frac{87}{12} = 7.25$$

EXAMPLE

Solve for the median:

1. Arrange the data in order.
1 1 2 3 3 3 4 4 5 6 7 48
2. Since n = 12, it is even.
3. Identify the values at the $\frac{n}{2} = 6$ and $\frac{n}{2} + 1 = 7$ positions: 3 and 4.
4. Find the mean of the two values:
$$M = \frac{3 + 4}{2} = 3.50$$
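The mean and median of Jonas's 12 calls can be checked with Python's standard `statistics` module. The sketch below also walks through the position rule for an even n by hand; the variable names are illustrative.

```python
from statistics import mean, median

# The 12 sampled calls from the example.
calls = [1, 7, 4, 1, 2, 4, 3, 48, 3, 5, 3, 6]

x_bar = mean(calls)   # sum of the values divided by n
M = median(calls)     # middle of the sorted data

# The same median, found by hand with the position rule for even n.
ordered = sorted(calls)
n = len(ordered)
lower_middle = ordered[n // 2 - 1]   # value in position n/2 (1-based)
upper_middle = ordered[n // 2]       # value in position n/2 + 1 (1-based)
M_by_hand = (lower_middle + upper_middle) / 2

print("mean:", x_bar)
print("median:", M, "by hand:", M_by_hand)
```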

WHAT DOES A RESISTANT STATISTIC MEAN?


A numerical summary of data is said to be resistant if extreme observations
(very large or small) relative to the data do not affect its value substantially.

Therefore, the median is resistant while the mean is not resistant.

When data are skewed, there are extreme values that would tend to pull the
mean in the direction of the tail.

EXAMPLE

Recall Jonas's 12 sampled cell phone calls: 1, 7, 4, 1, 2, 4, 3, 48, 3, 5, 3, 6.

An extremely large value of 48 led to a much bigger mean (7.25) compared to the median (3.50).
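One way to see resistance in action is to recompute both measures with the extreme value left out. This is only an illustrative check on the example data, not a recommendation to discard observations.

```python
from statistics import mean, median

calls = [1, 7, 4, 1, 2, 4, 3, 48, 3, 5, 3, 6]

# Same data with the extreme observation (48) left out, for comparison only.
without_extreme = [x for x in calls if x != 48]

mean_change = mean(calls) - mean(without_extreme)
median_change = median(calls) - median(without_extreme)

print("mean with / without 48:  ", mean(calls), round(mean(without_extreme), 2))
print("median with / without 48:", median(calls), median(without_extreme))
print("change in mean:", round(mean_change, 2), "change in median:", median_change)
```

The mean moves by several units while the median barely changes, which is what "the median is resistant, the mean is not" describes.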

MODE
The mode of a variable is the most frequent observation of that variable that occurs in the data set.

The mode can be used for both quantitative and qualitative data.

To determine the mode, just tally the number of observations for each data value. The data value that occurs most often is the mode.
A data set can have one mode, two modes (bimodal), more than two modes (multimodal), or no mode.
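The tallying described above is what `statistics.multimode` (Python 3.8 or later) does. The sketch below applies it to the cell phone data from the earlier example and to a small qualitative list; the `colors` list is hypothetical and only there to show a bimodal case.

```python
from collections import Counter
from statistics import multimode

# Quantitative data: the 12 calls from the earlier example.
calls = [1, 7, 4, 1, 2, 4, 3, 48, 3, 5, 3, 6]

# Qualitative data: a hypothetical list, used only to show two modes.
colors = ["red", "blue", "red", "blue", "green"]

tally = Counter(calls)              # number of observations for each data value
call_modes = multimode(calls)       # every value tied for the highest count
color_modes = multimode(colors)

print("tally:", dict(tally))
print("mode(s) of calls:", call_modes)
print("mode(s) of colors:", color_modes)
```

Note that `multimode` returns every value when all values occur equally often, so the "no mode" case has to be judged from the tally itself.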
MEASURES OF DISPERSION
Descriptive Statistics – Measures of Dispersion 51

MEASURES OF DISPERSION
Dispersion is the degree to which the data are spread out. Sometimes, the measures of central tendency are not enough to describe a distribution.

To quantify the spread of data, we have the following measures of dispersion:
a) Range
b) Standard deviation
c) Variance

EXAMPLE

Both data sets have approximately the same central tendency. However, University A has a more spread out distribution than University B.

RANGE
The range is the simplest measure of dispersion.

The range is simply the difference between the largest and smallest data values.
$$\text{Range} = R = \text{maximum} - \text{minimum}$$

The range is affected by extreme values, so it is not a resistant statistic.

STANDARD DEVIATION
Measures of dispersion are meant to describe how spread out data are. In other words, they describe how far, on average, each observation is from the typical data value.

The standard deviation is based on the deviation about the mean. The further an observation is from the mean, the larger the absolute value of its deviation.
The sum of all deviations about the mean always equals zero, which is why the deviations are squared before they are averaged.
Note: The standard deviation is not resistant.

STANDARD DEVIATION
The population standard deviation of a variable is the square root of the sum of squared deviations about the population mean divided by the number of observations in the population, N. That is, it is the square root of the mean of the squared deviations about the population mean.

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}}{N}}$$

STANDARD DEVIATION
The sample standard deviation of a variable is the square root of the sum of squared deviations about the sample mean divided by n − 1, where n is the sample size.

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1}}$$

where n − 1 = degrees of freedom.
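The two standard deviation formulas can be written out directly. The sketch below uses hypothetical function names, applies them to the cell phone data from the earlier example, and compares the results with `statistics.pstdev` and `statistics.stdev` from the standard library.

```python
import math
from statistics import pstdev, stdev

def population_sd(values):
    """Square root of the mean squared deviation about the mean (divide by N)."""
    N = len(values)
    mu = sum(values) / N
    return math.sqrt(sum((x - mu) ** 2 for x in values) / N)

def sample_sd(values):
    """Same idea, but divide by n - 1 (the degrees of freedom)."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

def sample_sd_shortcut(values):
    """Computational form: uses the sum of x and the sum of x squared."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

calls = [1, 7, 4, 1, 2, 4, 3, 48, 3, 5, 3, 6]
print("population SD:", population_sd(calls), "library:", pstdev(calls))
print("sample SD:    ", sample_sd(calls), "library:", stdev(calls))
print("shortcut form:", sample_sd_shortcut(calls))
```

The sample value is slightly larger than the population value because it divides by n − 1 instead of N.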

INTERPRETING STANDARD DEVIATIONS


✓ The standard deviation can be used to judge whether a particular value is “far away” from the mean.
✓ Values are significantly far away if they are more than 2 standard deviations away from the mean.
✓ When comparing two populations, the larger the standard deviation, the greater the dispersion or spread of the distribution, provided that the value of interest has the same unit of measure.

VARIANCE
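The variance is simply the square of the standard deviation.

Sample variance:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1}$$

where n − 1 = degrees of freedom.

Population variance:
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}}{N}$$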

EMPIRICAL RULE FOR BELL-SHAPED DISTRIBUTION

For data that have a bell-shaped distribution, the empirical rule (or the 68-95-99.7 rule) can be used to estimate the percentage of data within k standard deviations of the mean: approximately 68% of the data lie within 1 standard deviation, 95% within 2, and 99.7% within 3.

EXAMPLE
Given a bell-shaped distribution with a sample mean of 40 and a standard deviation of 10,
a) What is the percentage of observations that will have a value between 20 and 60?
b) What is the percentage of observations that have a value less than 10 or greater than 60?
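One possible way to work this example is to split each empirical-rule percentage in half, since a bell-shaped distribution is symmetric about the mean. The helper names below are hypothetical, and the sketch only handles values that sit a whole number (0 to 3) of standard deviations from the mean.

```python
# Approximate percent of data between the mean and k standard deviations
# on ONE side of it (half of 68%, 95%, and 99.7%).
HALF_PERCENT = {0: 0.0, 1: 34.0, 2: 47.5, 3: 49.85}

def sds_from_mean(mean, sd, x):
    """Number of standard deviations between x and the mean (0, 1, 2, or 3)."""
    k = abs(x - mean) / sd
    if k not in HALF_PERCENT:
        raise ValueError("only 0 to 3 whole standard deviations are covered")
    return int(k)

def percent_between(mean, sd, low, high):
    """Percent of data between low and high, assuming low <= mean <= high."""
    return (HALF_PERCENT[sds_from_mean(mean, sd, low)]
            + HALF_PERCENT[sds_from_mean(mean, sd, high)])

def percent_in_tail(mean, sd, x):
    """Percent of data beyond x, on the side of the mean where x lies."""
    return 50.0 - HALF_PERCENT[sds_from_mean(mean, sd, x)]

# Part (a): 20 and 60 are each 2 standard deviations from the mean of 40.
part_a = percent_between(40, 10, 20, 60)

# Part (b): 10 is 3 SDs below the mean, 60 is 2 SDs above it.
part_b = percent_in_tail(40, 10, 10) + percent_in_tail(40, 10, 60)

print("(a)", part_a, "percent")
print("(b)", round(part_b, 2), "percent")
```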

EXAMPLE
The weight of an organ in adult males has a bell-shaped distribution with a mean of 300 grams and a standard deviation of 45 grams.
(a) About 99.7% of organs will be between what weights?
(b) What percentage of organs weighs between 255 grams and 345 grams?
(c) What percentage of organs weighs less than 255 grams or more than 345 grams?
(d) What percentage of organs weighs between 255 grams and 390 grams?

CHEBYSHEV’S INEQUALITY

Russian mathematician Pafnuty Chebyshev developed an inequality that determines a minimum percentage of observations that lie within k standard deviations of the mean, where k > 1. The result is applicable for any shape of distribution.

For any data set or distribution, at least $\left(1 - \frac{1}{k^2}\right) \cdot 100\%$ of the observations lie within k standard deviations of the mean.

EXAMPLE: USE CHEBYSHEV’S INEQUALITY
Given a bell-shaped distribution with a sample mean of 40 and a standard deviation of 10,
a) What is the percentage of observations that will have a value between 20 and 60?

EXAMPLE: USE CHEBYSHEV’S INEQUALITY
The mean of the commute time to work for a resident of a certain city is 28.8 minutes. Assume that the standard deviation of the commute time is 7.1 minutes.
(a) What minimum percentage of commuters in the city has a commute time within 2 standard deviations of the mean?
(b) What minimum percentage of commuters in the city has a commute time within 2.5 standard deviations of the mean? What are the commute times within 2.5 standard deviations of the mean?
(c) What is the minimum percentage of commuters who have commute times between 7.5 minutes and 50.1 minutes?
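Chebyshev's inequality is a one-line formula, so the commute-time questions can be explored with a short sketch. The function names are hypothetical; for part (c), note that 7.5 and 50.1 minutes are the mean minus and plus 3 standard deviations.

```python
def chebyshev_min_percent(k):
    """Minimum percent of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return (1 - 1 / k ** 2) * 100

def interval(mean, sd, k):
    """Values that lie within k standard deviations of the mean."""
    return mean - k * sd, mean + k * sd

mean, sd = 28.8, 7.1   # commute-time example

for k in (2, 2.5, 3):
    low, high = interval(mean, sd, k)
    print(f"k = {k}: at least {chebyshev_min_percent(k):.2f}% "
          f"between {low:.2f} and {high:.2f} minutes")
```

Because the inequality holds for any distribution shape, its percentages are lower bounds and are smaller than the empirical-rule values for bell-shaped data.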
MEASURES FOR GROUPED DATA

MEASURES FOR GROUPED DATA

Sometimes, the data provided have already been summarized and grouped into frequency distributions (grouped data). Although we cannot find the exact measures of central tendency or dispersion, we can still approximate them.

MEAN FOR GROUPED DATA

Because raw data cannot be retrieved from the distribution table, it is assumed that within each class the mean of the data is equal to the class midpoint.

The mean for grouped data is computed as follows:

$$\mu = \bar{x} = \frac{\sum x_i f_i}{\sum f_i}$$

where $x_i$ is the midpoint of the ith class and $f_i$ is the frequency of the ith class.

STANDARD DEVIATION FOR GROUPED DATA


The standard deviation for grouped data is computed as follows:

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2 f_i}{\sum f_i}} = \sqrt{\frac{\sum x_i^2 f_i - \frac{\left(\sum x_i f_i\right)^2}{\sum f_i}}{\sum f_i}}$$

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\left(\sum f_i\right) - 1}}$$

where $x_i$ is the midpoint of the ith class and $f_i$ is the frequency of the ith class.

WEIGHTED MEAN

When observations have different importance or weight associated with them, the weighted mean is computed.
The weighted mean is found by multiplying each value of the variable by its corresponding weight, summing the products, and dividing by the sum of the weights:

$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}$$

where $w_i$ is the weight of the ith observation and $x_i$ is the value of the ith observation.
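A minimal sketch of the weighted mean follows. The scores and weights are hypothetical (a course grade built from three components) and are not taken from the slides.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical example: three course components with different weights.
scores = [80, 90, 70]        # quizzes, project, final exam
weights = [0.2, 0.3, 0.5]    # relative importance of each component

course_grade = weighted_mean(scores, weights)
print("weighted mean:", round(course_grade, 2))
```

When every weight is the same, the weighted mean reduces to the ordinary arithmetic mean.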

EXAMPLE
Find the mean and standard deviation for the following distribution:

Class       Frequency
8-8.99      2
9-9.99      2
10-10.99    4
11-11.99    1
12-12.99    6
13-13.99    13
14-14.99    7

EXAMPLE
Class       f_i   x_i                x_i f_i
8-8.99      2     (8 + 9)/2 = 8.5    ?
9-9.99      2     ?                  ?
10-10.99    4     ?                  ?
11-11.99    1     ?                  ?
12-12.99    6     ?                  ?
13-13.99    13    ?                  ?
14-14.99    7     ?                  ?
SUM         35                       ?

$$\mu = \bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{?}{35} = \;?$$

EXAMPLE
Class       f_i   x_i    f_i x_i   (x_i − x̄)^2   (x_i − x̄)^2 f_i
8-8.99      2     8.5
9-9.99      2     9.5
10-10.99    4     10.5
11-11.99    1     11.5
12-12.99    6     12.5
13-13.99    13    13.5
14-14.99    7     14.5
SUM         35

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2 f_i}{\left(\sum f_i\right) - 1}} = \sqrt{\frac{?}{35 - 1}} = \;?$$
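The blank cells in the two tables above can be filled in by following the grouped-data formulas step by step. The sketch below does this for the example distribution; the variable names are illustrative, and the midpoints are taken as lower class limit plus half the class width, which matches the (8 + 9)/2 = 8.5 pattern shown in the table.

```python
import math

# Lower class limits and frequencies from the example distribution.
lower_limits = [8, 9, 10, 11, 12, 13, 14]
frequencies = [2, 2, 4, 1, 6, 13, 7]
class_width = 1   # difference between consecutive lower class limits

# Class midpoints: 8.5, 9.5, ..., 14.5
midpoints = [low + class_width / 2 for low in lower_limits]

n = sum(frequencies)

# Grouped mean: sum of (midpoint * frequency) divided by the total frequency.
sum_xf = sum(x * f for x, f in zip(midpoints, frequencies))
grouped_mean = sum_xf / n

# Grouped sample standard deviation: squared deviations weighted by frequency.
sum_sq_dev = sum((x - grouped_mean) ** 2 * f for x, f in zip(midpoints, frequencies))
grouped_sd = math.sqrt(sum_sq_dev / (n - 1))

print("n:", n)
print("sum of x*f:", sum_xf)
print("approximate mean:", round(grouped_mean, 4))
print("approximate sample standard deviation:", round(grouped_sd, 4))
```

These are approximations, since every observation in a class is treated as if it sat at the class midpoint.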
MEASURES OF POSITIONS AND OUTLIERS

THE Z-SCORE

The z-score represents the distance that a data value is from the mean in terms of standard deviations. It is unitless.

$$z = \frac{x - \mu}{\sigma}$$

It can be used for both samples and populations; for a sample, use $\bar{x}$ and s.
The z-scores of a data set have a mean of 0 and a standard deviation of 1.

A positive z-score indicates that the value is larger than the mean.
A negative z-score indicates that the value is smaller than the mean.

INTERPRETING Z-SCORES

Z-scores can be used to compare apples to oranges by converting variables with different centers or spreads to variables with the same center (0) and spread (1).

EXAMPLE

Mara and Clara were taking Chemistry classes and wanted to compare each other’s performance. Mara’s teacher was quite strict. In her class, the average score for the midterm is 73 with a standard deviation of 10. Mara scored 85. In Clara’s class, the average is 84 with a standard deviation of 7. Clara scored 90.
Who did better?

EXAMPLE

For Mara:
$$z = \frac{x - \mu}{\sigma}$$
For Clara:
$$z = \frac{x - \mu}{\sigma}$$
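Plugging the two students' numbers into the z-score formula takes one small function. The function name `z_score` is illustrative; the scores, means, and standard deviations come from the example.

```python
def z_score(x, mean, sd):
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mean) / sd

z_mara = z_score(85, mean=73, sd=10)
z_clara = z_score(90, mean=84, sd=7)

print("Mara: ", round(z_mara, 2))
print("Clara:", round(z_clara, 2))

# The larger z-score is the better result relative to that student's own class.
better = "Mara" if z_mara > z_clara else "Clara"
print("Better relative performance:", better)
```

Clara has the higher raw score, but the comparison is made on z-scores because the two classes have different means and spreads.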

PERCENTILES AND QUARTILES

Recall that the median divides the lower 50% of a set of data from the upper 50%.

Percentiles divide a set of data written in ascending order into 100 parts. The kth percentile, denoted $P_k$, of a set of data is a value such that k percent of the observations are less than or equal to the value.

Quartiles divide a set of data written in ascending order into 4 parts.

Deciles divide a set of data written in ascending order into 10 parts.


Frequently Used Percentiles

Median: divides the data, from the lowest to the highest value, into two parts of 50% each.
Quartiles (Q1, Q2, Q3): divide the data into four parts of 25% each.
Deciles: divide the data into ten parts of 10% (1/10) each.

PERCENTILE COMPUTATION

To formalize the computational procedure, let $L_p$ refer to the location of a desired percentile.

Location of a percentile:
$$L_p = (n + 1)\frac{P}{100}$$

EXAMPLE
Listed below are the commissions earned last month by a sample of 15 brokers at Salomon Smith Barney’s Oakland, California, office.

$2,038  $1,758  $1,721  $1,637
$2,097  $2,047  $2,205  $1,787
$2,287  $1,940  $2,311  $2,054
$2,406  $1,471  $1,460

Locate the 25th percentile (first quartile) and the 75th percentile (third quartile) for the commissions earned.
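The location formula can be applied to the broker commissions as follows. The function name `percentile` is hypothetical, and the linear interpolation used when the location is not a whole number is an assumption; the slides give only the location formula itself.

```python
def percentile(values, p):
    """Value at location L_p = (n + 1) * p / 100 in the sorted data.

    When L_p is not a whole number, this sketch interpolates linearly
    between the two neighboring observations (an assumption).
    """
    data = sorted(values)
    n = len(data)
    location = (n + 1) * p / 100      # 1-based position in the sorted data
    lower_index = int(location)       # whole part of the location
    fraction = location - lower_index
    if lower_index < 1:
        return data[0]
    if lower_index >= n:
        return data[-1]
    lower_value = data[lower_index - 1]
    upper_value = data[lower_index]
    return lower_value + fraction * (upper_value - lower_value)

commissions = [2038, 1758, 1721, 1637, 2097, 2047, 2205, 1787,
               2287, 1940, 2311, 2054, 2406, 1471, 1460]

n = len(commissions)
print("L_25 =", (n + 1) * 25 / 100, "-> Q1 =", percentile(commissions, 25))
print("L_75 =", (n + 1) * 75 / 100, "-> Q3 =", percentile(commissions, 75))
```

With n = 15 the two locations are whole numbers, so the quartiles are simply the 4th and 12th values of the sorted data.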

INTERPRETING THE INTERQUARTILE RANGE

The IQR is the range of the middle 50% of the observations in a data set. That is, the IQR is the difference between the third and first quartiles and is found using the formula:

$$IQR = Q_3 - Q_1$$

The more spread out the data are, the higher the interquartile range will be.

CHECKING FOR OUTLIERS

Extreme observations are referred to as outliers. Outliers can result from erroneous measurement. Sometimes, extreme measurements are common in some distributions.

Fences serve as cutoff points for determining outliers.

$$\text{Lower fence} = Q_1 - 1.5(IQR)$$
$$\text{Upper fence} = Q_3 + 1.5(IQR)$$

If a data value is smaller than the lower fence or greater than the upper fence, it is an outlier.
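The fence check can be sketched on the cell phone data from the earlier example. The quartiles used here (Q1 = 2.25, Q3 = 5.75) are an assumption: they come from applying the location formula $L_p = (n + 1)\frac{P}{100}$ with linear interpolation to the 12 sorted calls, and other quartile conventions give slightly different values. The function names are illustrative.

```python
def fences(q1, q3):
    """Lower and upper fences: 1.5 * IQR beyond the first and third quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Data values that fall below the lower fence or above the upper fence."""
    lower, upper = fences(q1, q3)
    return [x for x in values if x < lower or x > upper]

calls = [1, 7, 4, 1, 2, 4, 3, 48, 3, 5, 3, 6]

# Assumed quartiles (location formula with linear interpolation).
q1, q3 = 2.25, 5.75

lower_fence, upper_fence = fences(q1, q3)
print("fences:", lower_fence, upper_fence)
print("outliers:", find_outliers(calls, q1, q3))
```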


FIVE-NUMBER SUMMARY

The range, standard deviation, and variance are not resistant to outliers, but the interquartile range is.

Five-Number Summary

Minimum   $Q_1$   Median   $Q_3$   Maximum
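A five-number summary can be assembled from the minimum, the maximum, and three percentiles. The sketch below uses the broker commissions from the earlier example; the helper names are hypothetical, and the quartiles follow the location formula $L_p = (n + 1)\frac{P}{100}$ with linear interpolation as an assumed convention.

```python
def percentile(values, p):
    """Value at location (n + 1) * p / 100, interpolating linearly if needed."""
    data = sorted(values)
    n = len(data)
    location = (n + 1) * p / 100
    lower_index = int(location)
    fraction = location - lower_index
    if lower_index < 1:
        return data[0]
    if lower_index >= n:
        return data[-1]
    return data[lower_index - 1] + fraction * (data[lower_index] - data[lower_index - 1])

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum."""
    return (min(values),
            percentile(values, 25),
            percentile(values, 50),
            percentile(values, 75),
            max(values))

commissions = [2038, 1758, 1721, 1637, 2097, 2047, 2205, 1787,
               2287, 1940, 2311, 2054, 2406, 1471, 1460]

summary = five_number_summary(commissions)
for label, value in zip(("Minimum", "Q1", "Median", "Q3", "Maximum"), summary):
    print(f"{label:<8} {value}")
```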

BOXPLOT

The five-number summary can be used to create another graph, called the boxplot.

USING THE BOXPLOT TO DESCRIBE THE SHAPE



EXAMPLE
The data below represent the age of the mother at the time of her first birth for a random sample of 30 mothers.

Construct a boxplot of the data and describe the shape of the distribution.

PROBLEMS
1. The weight, in grams, of the pair of kidneys in adult males between the ages of 40 and 49 has a bell-shaped distribution with a mean of 325 grams and a standard deviation of 30 grams.
(a) About 95% of kidney pairs will be between what weights?
(b) What percentage of kidney pairs weighs between 235 grams and 415 grams?
(c) What percentage of kidney pairs weighs less than 235 grams or more than 415 grams?
(d) What percentage of kidney pairs weighs between 295 grams and 385 grams?

2. Which boxplot likely has a larger standard deviation? Why?
3. Describe the shape of the distribution of I and II.
REFERENCE
Chapter 2, Statistics: Informed Decisions Using Data with Integrated Review by Michael Sullivan III
