Chapter Two: Descriptive Methods (Continued)

This document discusses four common measures of variability in data: range, mean deviation, variance, and standard deviation. It provides examples and definitions for range (exclusive and inclusive), mean deviation (based on absolute values of deviations from the mean), and variance (based on squared deviations from the mean, with the sample variance statistic using n-1 in the denominator to remove bias). Variance is presented as the most useful measure for inference, while all four are useful in different circumstances.

Uploaded by Fiza Mushtaq
Copyright © All Rights Reserved

Measures of Variability

It is important to be able to quantify the degree of spread or scatter in a data set. Measures of this sort are referred to as measures of variability or dispersion. There are many such measures; we will consider four of them: (a) range, (b) mean deviation, (c) variance, and (d) standard deviation. While all of these are useful in given circumstances, the last two are by far the most important and will form the foundation of many of the methods you will study.

2.6 Numerical Methods (continued) 2/53


The Range

The range is a function of only the largest and smallest scores in a data set. Two forms of the range are often identified: the exclusive range and the inclusive range. The exclusive range is the more commonly used form.

Exclusive Range

The exclusive range is defined as the difference between the largest and smallest scores in the data set, or more formally

$$\text{Range (exclusive)} = x_L - x_S$$

where $x_L$ and $x_S$ are the largest and smallest scores in the data set, respectively.

Example

The exclusive range of the numbers 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5 is

5−3=2
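As a quick check, the exclusive range can be computed directly. The Python sketch below is illustrative only; the function name exclusive_range is our own and does not come from the slides.

```python
def exclusive_range(scores):
    """Exclusive range: largest score minus smallest score."""
    if not scores:
        raise ValueError("the range is undefined for an empty data set")
    return max(scores) - min(scores)


data = [3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5]
print(exclusive_range(data))  # 2
```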

Inclusive Range

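The inclusive range takes into account the upper and lower real limits of the highest and lowest scores. (The real limits of a score lie half a unit of measurement above and below it; for whole-number scores, the score 5 spans 4.5 to 5.5.) It is expressed as

$$\text{Range (inclusive)} = URL_L - LRL_S$$

where $URL_L$ is the upper real limit of the largest score and $LRL_S$ is the lower real limit of the smallest score in the data set.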

Example

The inclusive range of the numbers 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5 is

5.5 − 2.5 = 3
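A minimal sketch of the inclusive range, assuming every score is recorded to the same unit of measurement so that its real limits lie half a unit on either side. The parameter name unit is a hypothetical choice for this illustration; it defaults to 1 for whole-number scores.

```python
def inclusive_range(scores, unit=1.0):
    """Inclusive range: upper real limit of the largest score minus the
    lower real limit of the smallest score.

    Assumes each score x occupies the interval x - unit/2 to x + unit/2,
    where `unit` (hypothetical parameter) is the unit of measurement.
    """
    if not scores:
        raise ValueError("the range is undefined for an empty data set")
    upper_real_limit = max(scores) + unit / 2
    lower_real_limit = min(scores) - unit / 2
    return upper_real_limit - lower_real_limit


data = [3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5]
print(inclusive_range(data))  # 3.0
```

Under this assumption the inclusive range is always the exclusive range plus one unit (here 2 + 1 = 3).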

The Mean Deviation

Like the range, the mean deviation is a highly intuitive measure of variability. Unlike the range, however, mean deviation takes into account all the data for which variability is to be assessed, thereby making it a more stable statistic.

Deviation Score

As with some other measures of variability, as well as some other statistics, mean deviation is based on what are termed deviation scores, or simply deviations. A deviation score is defined as

$$x - \bar{x}$$

where x is the score whose deviation is to be calculated and x̄ is the data set mean. A deviation therefore gives the number of units between a score and the data set mean, positive for scores above the mean and negative for scores below it. When data are closely “clumped” around the mean, deviations tend to be small. For data that are more spread out, deviations will be larger.

Mean of Deviation Scores

Plausibly, a reasonable representation of variability could be based on the average of these deviations: when data are more spread out, the average of the deviations would be larger than for data with less spread. The difficulty with using deviations in this manner is that they always sum to zero, because the mean is defined so that $n\bar{x} = \sum x$. That is,

$$\sum (x - \bar{x}) = \sum x - n\bar{x} = 0$$

which makes the mean of the deviations always equal to zero as well.
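The sum-to-zero property is easy to confirm numerically. This sketch uses Python's fractions module so that the arithmetic is exact and the sum comes out as exactly zero rather than a tiny floating-point residue; the helper name deviation_scores is our own.

```python
from fractions import Fraction


def deviation_scores(scores):
    """Return each score's deviation x - mean, using exact fractions."""
    values = [Fraction(x) for x in scores]
    mean = sum(values) / len(values)
    return [x - mean for x in values]


devs = deviation_scores([9, 3, 3, 1])
print([str(d) for d in devs])  # ['5', '-1', '-1', '-3']
print(sum(devs))               # 0
```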

The Mean Deviation

The summation-to-zero problem can be overcome by taking the absolute values of the deviations, and this is the rationale for mean deviation. Mean deviation (MD) is the average of the absolute values of the deviations of a set of scores:

$$MD = \frac{\sum |x - \bar{x}|}{n}$$

Example
To find the mean deviation of the scores 9, 3, 3, and 1, we first note that

$$\bar{x} = \frac{\sum x}{n} = \frac{16}{4} = 4$$

We now calculate the deviation scores and their absolute values as shown.

      (1)      (2)        (3)
       x      x − x̄     |x − x̄|
       9        5          5
       3       −1          1
       3       −1          1
       1       −3          3
Σ     16        0         10

Example (continued)

The mean deviation is then

$$MD = \frac{\sum |x - \bar{x}|}{n} = \frac{10}{4} = 2.5$$
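The mean deviation formula translates line for line into code. A minimal Python sketch (the function name mean_deviation is illustrative) reproduces the worked example:

```python
def mean_deviation(scores):
    """Mean deviation: average absolute deviation from the mean."""
    n = len(scores)
    mean = sum(scores) / n
    return sum(abs(x - mean) for x in scores) / n


print(mean_deviation([9, 3, 3, 1]))  # 2.5
```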

Variance

Variance is a less intuitive but generally much more useful measure of variability than the range or mean deviation. As a descriptive statistic, variance is less appealing than mean deviation, but it is generally more useful because of its role in inference, as will be seen. Like mean deviation, variance uses deviations as its basis, but it squares them rather than taking their absolute values.

Variance-Parameter Form

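The parameter form of variance is given by

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

where µ is the population mean and N is the number of scores in the population. Thus the parameter form of variance is the average of the squared deviations of the scores that make up the population.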
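A minimal sketch of the parameter form, treating the four values 9, 3, 3, and 1 as a complete population (an assumption made only for this illustration). Python's standard library offers statistics.pvariance for the same quantity; the hand-written version below just makes the divisor N explicit.

```python
def population_variance(population):
    """sigma^2: mean squared deviation from the population mean mu (divisor N)."""
    N = len(population)
    mu = sum(population) / N
    return sum((x - mu) ** 2 for x in population) / N


print(population_variance([9, 3, 3, 1]))  # 9.0
```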

Variance-Statistic Form

The statistic form of variance is given by

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Notice that the divisor for the statistic is n − 1, while that for the parameter is N. This is because s² is commonly used as an estimate of σ² in inferential settings. It can be shown that if the divisor of s² were n, the resulting estimate would be biased: on average, its value would be only (n − 1)/n times σ², that is, smaller than σ². Dividing by n − 1 removes this bias, making s² a better estimate of σ². This formulation of sample variance is termed a conceptual form because the expression shows the deviation-score nature of s².
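The bias claim can be checked exactly on a small case. The sketch below assumes samples are drawn with replacement (independent draws), treats 9, 3, 3, and 1 as the whole population (so σ² = 9), and averages each estimator over every possible ordered sample of size n. The helper names variance_estimate and average_estimate are our own.

```python
from fractions import Fraction
from itertools import product


def variance_estimate(values, divisor_offset):
    """Sum of squared deviations from the sample mean, divided by n - divisor_offset."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - divisor_offset)


def average_estimate(population, n, divisor_offset):
    """Average a variance estimate over every ordered sample of size n
    drawn with replacement (len(population) ** n samples in all)."""
    pop = [Fraction(x) for x in population]
    samples = list(product(pop, repeat=n))
    return sum(variance_estimate(s, divisor_offset) for s in samples) / len(samples)


population = [9, 3, 3, 1]  # sigma^2 = 9 for this population
print(average_estimate(population, n=2, divisor_offset=1))  # 9    (divisor n - 1)
print(average_estimate(population, n=2, divisor_offset=0))  # 9/2  (divisor n)
```

With n = 2 the n − 1 version averages exactly σ² = 9, while the n version averages 9/2, which is σ² scaled by (n − 1)/n.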

Alternative Expression for s²

An algebraically equivalent expression for s² that is useful for computations is given by

$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$
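To see that the conceptual and computational forms agree, both can be coded side by side. This is a sketch; the function names are illustrative, and both assume at least two scores.

```python
def sample_variance_conceptual(scores):
    """s^2 = sum((x - mean)^2) / (n - 1)."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / (n - 1)


def sample_variance_computational(scores):
    """s^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


data = [9, 3, 3, 1]
print(sample_variance_conceptual(data))     # 12.0
print(sample_variance_computational(data))  # 12.0
```

The computational form needs only Σx and Σx², which made it convenient for hand calculation. In floating-point arithmetic, however, subtracting two large, nearly equal quantities can lose precision when the scores are large relative to their spread, so software usually prefers the deviation-based form (or a library routine such as statistics.variance).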

Example

To calculate the variance of the values 9, 3, 3, and 1 via the conceptual method, we arrange the data as follows.

      (1)      (2)        (3)
       x      x − x̄    (x − x̄)²
       9        5         25
       3       −1          1
       3       −1          1
       1       −3          9
Σ     16        0         36

Example (continued)

We then calculate

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{36}{3} = 12$$

Example

To calculate the variance of the values 9, 3, 3, and 1 via the computational method, we note that

$$\sum x = 9 + 3 + 3 + 1 = 16$$

and

$$\sum x^2 = 9^2 + 3^2 + 3^2 + 1^2 = 100$$

then

$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{100 - \frac{(16)^2}{4}}{4 - 1} = \frac{100 - 64}{3} = 12$$

Standard Deviation

Standard deviation is defined as the square root of variance. It follows from the equations for variance that the statistic form of standard deviation can be represented by

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

and

$$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$$
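A minimal sketch of the statistic form of the standard deviation; the standard library's statistics.stdev computes the same n − 1 version. The function name sample_sd is our own.

```python
import math


def sample_sd(scores):
    """s = sqrt(s^2), with s^2 computed using the n - 1 divisor."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    mean = sum(scores) / n
    return math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))


print(round(sample_sd([9, 3, 3, 1]), 3))  # 3.464
```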

Example

To calculate the standard deviation of the values 9, 3, 3, and 1 via the conceptual method, we arrange the data as follows.

      (1)      (2)        (3)
       x      x − x̄    (x − x̄)²
       9        5         25
       3       −1          1
       3       −1          1
       1       −3          9
Σ     16        0         36

Example (continued)

We then calculate

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{36}{3}} = 3.464$$

Example

To calculate the standard deviation of the values 9, 3, 3, and 1 via the computational method, we note that

$$\sum x = 9 + 3 + 3 + 1 = 16$$

and

$$\sum x^2 = 9^2 + 3^2 + 3^2 + 1^2 = 100$$

then

$$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} = \sqrt{\frac{100 - \frac{(16)^2}{4}}{4 - 1}} = 3.464$$

Measures of Relative Position

Measures of relative position are methods that locate the relative positions
of observations in a distribution. To this end we will in turn examine
percentiles, percentile ranks, and z scores.

Percentiles

There is no standard definition for percentile. We will use the following: a percentile is a point on the scale of measurement below which a specified percentage of the observations are located. By this definition, the (scale-based) median can be defined as the fiftieth percentile.
An expression for a given percentile is provided by

$$P_p = LRL + w\left[\frac{(p_r)(n) - cf}{f}\right]$$

where $P_p$ represents the pth percentile, LRL is the lower real limit of the interval that contains the pth percentile, w is the width of the interval (the difference between the upper and lower real limits of that interval), $p_r$ is p expressed as a proportion (i.e., p/100), n is the total number of observations, cf is the cumulative frequency up to the percentile interval, and f is the frequency of that interval.
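The percentile formula maps directly onto a small function. In this sketch the function and argument names are our own, and the call plugs in the numbers from the P60 example worked later in this section.

```python
def percentile_in_interval(lrl, w, p, n, cf, f):
    """P_p = LRL + w * ((p_r * n - cf) / f), with p_r = p / 100.

    lrl: lower real limit of the interval containing P_p
    w:   interval width (upper minus lower real limit)
    p:   percentile sought, in percent (e.g. 60)
    n:   total number of observations
    cf:  cumulative frequency below the interval
    f:   frequency of the interval (must be nonzero)
    """
    p_r = p / 100
    return lrl + w * ((p_r * n - cf) / f)


print(round(percentile_in_interval(lrl=.35, w=.10, p=60, n=60, cf=12, f=26), 3))  # 0.442
```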

Three Scenarios

When computing a percentile, three different scenarios are possible.
1. When a percentile interval (an interval with nonzero frequency that contains the sought point) is identified, the percentile formula is applied.
2. When the sought proportion of observations falls exactly below a real limit (the cumulative frequency at that limit equals (p_r)(n)) and the interval above that limit has nonzero frequency, the real limit is taken as the percentile.
3. When the sought proportion of observations falls exactly below a real limit and the interval above that limit has zero frequency, the midpoint of the zero-frequency interval(s), taken together if several are adjacent, is taken as the sought percentile.
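The three scenarios can be combined into one routine. This is a sketch under stated assumptions: all intervals have the same width w, the frequency table is supplied from lowest to highest score (the slides list it highest first), and the function name percentile is our own.

```python
import math


def percentile(table, p, w):
    """P_p for grouped data, following the three scenarios in the slides.

    table: (score, frequency) pairs in ascending score order; each score
           occupies the interval from score - w/2 to score + w/2
    p:     the percentile sought, in percent (0 to 100)
    w:     interval width, assumed equal for all intervals
    """
    n = sum(f for _, f in table)
    target = p * n / 100                    # (p_r)(n): observations below P_p
    limits = [table[0][0] - w / 2]          # real limits, lowest first
    cum = [0]                               # cumulative frequency below each limit
    for score, f in table:
        limits.append(score + w / 2)
        cum.append(cum[-1] + f)

    # Scenarios 2 and 3: the target count is reached exactly at a real limit.
    hits = [i for i, c in enumerate(cum) if math.isclose(c, target)]
    if hits:
        # A single limit is returned as is (scenario 2); several limits mean
        # zero-frequency intervals lie between them, so take the midpoint of
        # that zero-frequency span (scenario 3).
        return (limits[hits[0]] + limits[hits[-1]]) / 2

    # Scenario 1: the target falls strictly inside an interval; interpolate.
    for i, (_, f) in enumerate(table):
        if cum[i] < target < cum[i + 1]:
            return limits[i] + w * (target - cum[i]) / f
    raise ValueError("p must be between 0 and 100")


# The slides' frequency table, reordered from lowest to highest score.
table = [(.0, 3), (.1, 9), (.2, 0), (.3, 0), (.4, 26), (.5, 22)]
for p in (5, 20, 60):
    print(p, round(percentile(table, p, w=.1), 3))
# 5 0.05
# 20 0.25
# 60 0.442
```

These match the P5, P20, and P60 examples that follow. math.isclose guards the exact-limit test against small floating-point differences.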

Example

Find P5 for the data provided below.

Score   Frequency   Cumulative Frequency
 .5        22              60
 .4        26              38
 .3         0              12
 .2         0              12
 .1         9              12
 .0         3               3

P5 is the point on the measurement scale below which 5%, or (.05)(60) = 3, of the observations fall. Because the cumulative frequency at .05 (the upper real limit of the score .0) is exactly 3, and the interval above it has nonzero frequency, .05 is the 5th percentile.

Example

Find P20 for the data provided below.

Score   Frequency   Cumulative Frequency
 .5        22              60
 .4        26              38
 .3         0              12
 .2         0              12
 .1         9              12
 .0         3               3

Since (.2)(60) = 12 of the observations fall below any point between .15 and .35 (the scores .2 and .3 have zero frequency), the midpoint of that span, .25, is taken as the 20th percentile.

Example

Find P60 for the data provided below.

Score   Frequency   Cumulative Frequency
 .5        22              60
 .4        26              38
 .3         0              12
 .2         0              12
 .1         9              12
 .0         3               3

Because the cumulative frequencies up to .35 and .45 are 12 and 38, respectively, the point below which (.60)(60) = 36 observations fall must be in the .35 to .45 interval.

Example (continued)

Applying Equation 2.5 gives

$$P_{60} = LRL + w\left[\frac{(p_r)(n) - cf}{f}\right] = .35 + (.10)\left[\frac{(.60)(60) - 12}{26}\right] = .442$$

Percentile Rank

While a percentile is a point on the scale of measurement below which a given percentage of observations fall, a percentile rank is the percentage of observations that fall below a given point on the scale. Thus, percentiles are points and percentile ranks are percentages. Because of their close relationship, the two are often confused.

Percentile Rank (continued)

When the scale point whose percentile rank is sought falls in a nonzero-frequency interval, the following equation can be used to find percentile ranks:

$$PR_P = \frac{100\left[\frac{f(P - LRL)}{w} + cf\right]}{n}$$

where P is the point on the scale for which the percentile rank is to be calculated, LRL is the lower real limit of the interval containing P, w is the width of the interval (the difference between its upper and lower real limits), n is the total number of observations, cf is the cumulative frequency up to the interval containing P, and f is the frequency of that interval.
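As with percentiles, the percentile rank formula can be written as a one-line function. The names below are illustrative, and the call uses the values from the .442 example later in this section.

```python
def percentile_rank_in_interval(point, lrl, w, n, cf, f):
    """PR_P = 100 * (f * (P - LRL) / w + cf) / n for P inside a nonzero-frequency interval."""
    return 100 * (f * (point - lrl) / w + cf) / n


print(round(percentile_rank_in_interval(.442, lrl=.35, w=.1, n=60, cf=12, f=26), 2))  # 59.87
```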

Percentile Rank (continued)

When the point falls at the upper (or lower) real limit of an interval, or in an interval with zero frequency, we use

$$PR_P = 100\,\frac{cf}{n}$$

where cf is the cumulative frequency up to the point.
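Both percentile rank rules can be handled by a single function. This sketch assumes equal-width intervals and a frequency table ordered from lowest to highest score; the function name percentile_rank is our own. Note that the interpolation formula already reduces to 100 cf/n when the point sits at a real limit or when f = 0, so the second rule is a special case of the first.

```python
def percentile_rank(table, point, w):
    """Percentile rank PR_P of a scale point for grouped data.

    table: (score, frequency) pairs in ascending score order; each score
           occupies the interval from score - w/2 to score + w/2
    """
    n = sum(f for _, f in table)
    cf = 0                                   # observations below the current interval
    for score, f in table:
        lrl = score - w / 2
        if point <= lrl:
            # At (or below) a real limit: PR = 100 * cf / n.
            return 100 * cf / n
        if point < score + w / 2:
            # Inside the interval: interpolation formula. When f == 0 this
            # reduces to 100 * cf / n, matching the zero-frequency rule.
            return 100 * (f * (point - lrl) / w + cf) / n
        cf += f
    return 100.0                             # at or above the top real limit


table = [(.0, 3), (.1, 9), (.2, 0), (.3, 0), (.4, 26), (.5, 22)]
for point in (.05, .25, .442):
    print(point, round(percentile_rank(table, point, w=.1), 2))
# 0.05 5.0
# 0.25 20.0
# 0.442 59.87
```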

Example
Find the percentile rank of the point .05 for the data provided below.

Score   Frequency   Cumulative Frequency
 .5        22              60
 .4        26              38
 .3         0              12
 .2         0              12
 .1         9              12
 .0         3               3

Noting that .05 is located at the upper real limit of the −.05 to .05 interval, we calculate

$$PR_{.05} = 100\,\frac{cf}{n} = 100\,\frac{3}{60} = 5$$

Example

Find the percentile rank of the point .25 for the data provided below.

Score   Frequency   Cumulative Frequency
 .5        22              60
 .4        26              38
 .3         0              12
 .2         0              12
 .1         9              12
 .0         3               3

Noting that .25 is located in an interval with zero frequency, we calculate

$$PR_{.25} = 100\,\frac{cf}{n} = 100\,\frac{12}{60} = 20$$

Example
Find the percentile rank of the point .442 for the data provided below.

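Score   Frequency   Cumulative Frequency
 .5        22              60
 .4        26              38
 .3         0              12
 .2         0              12
 .1         9              12
 .0         3               3

Noting that .442 is located in the interval .35 to .45, we calculate

$$PR_{.442} = \frac{100\left[\frac{f(P - LRL)}{w} + cf\right]}{n} = \frac{100\left[\frac{26(.442 - .35)}{.1} + 12\right]}{60} = 59.87 \approx 60$$

This agrees, up to rounding, with the earlier finding that P60 = .442.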

Percentile Ranks of Scores

Unfortunately, data analysts are usually not interested in finding the percentile rank of a point on the measurement scale; rather, they want to know the percentile rank of a score or observation. This presents a difficulty, since scores are not points on the scale but represent intervals. How, then, do you find the percentile rank of a score?

Percentile Ranks of Scores (continued)

Basically, you must choose a point on the scale to represent the score. Three common choices are used:
1. The lower real limit of the score interval.
2. The midpoint of the score interval.
3. The upper real limit of the score interval.
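The three choices can give quite different answers. The sketch below, with hypothetical function names, computes the percentile rank of the score .4 from the slides' frequency table under each choice. Its compact percentile_rank adds, for every interval, the frequency times the fraction of the interval lying below the point, which is the same quantity as f(P − LRL)/w + cf and so also covers the real-limit and zero-frequency rules.

```python
def percentile_rank(table, point, w):
    """Percentile rank of a scale point for grouped data.

    table: (score, frequency) pairs, each score spanning score - w/2 to score + w/2.
    Adds, for every interval, its frequency times the fraction of the interval
    lying below `point`; this equals f(P - LRL)/w + cf from the slides.
    """
    n = sum(f for _, f in table)
    below = 0.0
    for score, f in table:
        lrl = score - w / 2
        share = min(max((point - lrl) / w, 0.0), 1.0)  # clipped to [0, 1]
        below += f * share
    return 100 * below / n


def score_percentile_ranks(table, score, w):
    """PR of a score represented by its lower real limit, midpoint, and upper real limit."""
    return {
        "lower real limit": percentile_rank(table, score - w / 2, w),
        "midpoint": percentile_rank(table, score, w),
        "upper real limit": percentile_rank(table, score + w / 2, w),
    }


table = [(.0, 3), (.1, 9), (.2, 0), (.3, 0), (.4, 26), (.5, 22)]
for choice, pr in score_percentile_ranks(table, .4, w=.1).items():
    print(f"{choice}: {pr:.2f}")
# lower real limit: 20.00
# midpoint: 41.67
# upper real limit: 63.33
```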

z Scores

Percentiles locate points relative to all the observations in a data set. By contrast, z scores locate points relative to the mean of the data. More precisely, a z score indicates the distance and direction of a point from the mean in standard deviation units.

z Scores (continued)

A sample z score is calculated by

$$z = \frac{x - \bar{x}}{s}$$

while the population equivalent is given by

$$Z = \frac{x - \mu}{\sigma}$$
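A minimal sketch of the sample z score, applied to the scores 1, 3, 3, and 9 from the example that follows; the function name z_scores is illustrative. It also checks that the z scores themselves have mean 0 and standard deviation 1.

```python
import math


def z_scores(scores):
    """Sample z scores (x - mean) / s, where s uses the n - 1 divisor."""
    n = len(scores)
    mean = sum(scores) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
    return [(x - mean) / s for x in scores]


z = z_scores([1, 3, 3, 9])
print([round(v, 3) for v in z])  # [-0.866, -0.289, -0.289, 1.443]

# The z scores have mean 0 and standard deviation 1,
# up to floating-point round-off.
z_mean = sum(z) / len(z)
z_sd = math.sqrt(sum((v - z_mean) ** 2 for v in z) / (len(z) - 1))
```

The printed values differ from the example in the third decimal (−.866 rather than −.867, 1.443 rather than 1.445) because the example divides by s rounded to 3.46, whereas the code keeps full precision.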

Example

Convert the set of scores 1, 3, 3, and 9 to z scores. Then find the mean and standard deviation of the z scores.
The mean and standard deviation of the original data are, respectively, 4 and 3.46. Using these values in Equation 2.2.3 gives

$$z_1 = \frac{1 - 4}{3.46} = -.867$$

$$z_3 = \frac{3 - 4}{3.46} = -.289$$

$$z_3 = \frac{3 - 4}{3.46} = -.289$$

$$z_9 = \frac{9 - 4}{3.46} = 1.445$$

Example (continued)

Because the sum of these z scores is zero, their mean is also zero. With mean zero, Equation 2.15 simplifies to

$$s = \sqrt{\frac{\sum z^2}{n - 1}} = \sqrt{\frac{3}{3}} = 1$$

Measures of Distribution Shape

Certain aspects of distribution shapes can be characterized numerically. The two most common of these, skew and kurtosis, will be discussed here.

Skew

Various methods have been developed to numerically describe the amount of skew (or lack thereof) that characterizes a distribution. Skew is generally defined as the degree of asymmetry in a distribution.

Skew (continued)

A common measure of skew is given by

$$\text{Skew} = \frac{\sum z^3}{n}$$

where z is the standardized deviation as described by Equation 2.2.3 and n is the sample size. Stated differently, this expression of skew is simply the average of the cubed z scores.
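A minimal sketch of this skew measure, computing the z scores with the sample standard deviation (n − 1 divisor) as in the slides; the function name skew is our own.

```python
import math


def skew(scores):
    """Skew = (sum of z^3) / n, with z computed from the sample SD (n - 1 divisor)."""
    n = len(scores)
    mean = sum(scores) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
    return sum(((x - mean) / s) ** 3 for x in scores) / n


print(round(skew([1, 3, 3, 9]), 3))  # 0.577
print(skew([1, 2, 3]))               # 0.0 (symmetric data)
```

For 1, 3, 3, and 9 this gives .577 rather than the .579 in the worked example below, because the example cubes z scores that were computed with s rounded to 3.46. Library routines such as scipy.stats.skew use other conventions (by default the moments use divisor n), so their results differ slightly as well.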

Skew (continued)

When this expression results in a negative or positive value, the distribution is said to be negatively or positively skewed, respectively. When the value is zero, the distribution is said to be symmetric. The following are depictions of each type.

Skew (continued)

[Figure: frequency histograms over scores 1 to 7 illustrating (A) a symmetric, (B) a negatively skewed, and (C) a positively skewed distribution.]

Example

Calculate skew for the values 1, 3, 3 and 9.


Solution
The z scores of the data and their cubes are as follows.

Score      z         z³
  1      −.867    −0.652
  3      −.289    −0.024
  3      −.289    −0.024
  9      1.445     3.017
  Σ      0.000     2.317

$$\text{Skew} = \frac{\sum z^3}{n} = \frac{2.317}{4} = .579$$

Kurtosis

Kurtosis refers to the peakedness of a distribution relative to the length and size of its tails. Distributions with sharp peaks are said to be leptokurtic, while those with flattened middles are said to be platykurtic. Kurtosis pertains to distributions with no more than one mode.

Kurtosis (continued)

[Figure: frequency distributions A and B plotted over scores 1 to 7.]

Distribution A is more peaked and has longer tails than does distribution B
and therefore has greater kurtosis.

Kurtosis (continued)

$$\text{Kurtosis} = \frac{\sum z^4}{n}$$

Notice that while skew was expressed as the average of the cubed z scores in a distribution, kurtosis is the average of the z scores raised to the fourth power. In general, larger kurtosis values reflect sharper peaks than do smaller values.
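The kurtosis measure differs from the skew measure only in the exponent. This sketch (function name illustrative) again computes z with the sample standard deviation.

```python
import math


def kurtosis(scores):
    """Kurtosis = (sum of z^4) / n, with z computed from the sample SD (n - 1 divisor)."""
    n = len(scores)
    mean = sum(scores) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
    return sum(((x - mean) / s) ** 4 for x in scores) / n


print(round(kurtosis([1, 3, 3, 9]), 3))  # 1.229
```

The worked example below reports 1.235 because it raises z scores rounded from s = 3.46 to the fourth power. Be aware that library functions use other conventions; scipy.stats.kurtosis, for example, reports excess kurtosis (the value minus 3) by default.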

Example

Calculate kurtosis for the values 1, 3, 3 and 9.


Solution
The z scores of the data with their values raised to the fourth power are as follows.

Score      z         z⁴
  1      −.867     0.565
  3      −.289     0.007
  3      −.289     0.007
  9      1.445     4.360
  Σ      0.000     4.939

$$\text{Kurtosis} = \frac{\sum z^4}{n} = \frac{4.939}{4} = 1.235$$
n 4
