Chapter Two: Descriptive Methods (Continued)
Measures of Variability
The range is a function of only the largest and smallest scores in a data set. Two forms of the range are commonly identified: the exclusive range and the inclusive range. The exclusive range is the more frequently used form.
The exclusive range is defined as the difference between the largest and smallest scores in the data set, or more formally

$$\text{Range}_{\text{exclusive}} = x_L - x_S$$

where $x_L$ and $x_S$ are the largest and smallest scores in the data set, respectively.
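For example, if the largest score in a data set is 5 and the smallest is 3, the exclusive range is 5 − 3 = 2.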
The inclusive range takes into account the upper and lower real limits of the highest and lowest scores and is expressed as

$$\text{Range}_{\text{inclusive}} = URL_L - LRL_S$$

where $URL_L$ and $LRL_S$ are the upper real limit of the largest score and the lower real limit of the smallest score in the data set, respectively.
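For the same scores, the upper real limit of 5 is 5.5 and the lower real limit of 3 is 2.5, so the inclusive range is 5.5 − 2.5 = 3.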
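A minimal Python sketch of both forms of the range follows. It assumes scores are recorded to the nearest whole unit, so each score's real limits lie half a unit above and below it; the function names and the `unit` parameter are illustrative, not part of the text. The scores 3, 4, 5 are hypothetical, chosen so that the largest and smallest scores match the example.

```python
def exclusive_range(scores):
    """Largest score minus smallest score."""
    return max(scores) - min(scores)


def inclusive_range(scores, unit=1.0):
    """Upper real limit of the largest score minus lower real limit of the smallest.

    Assumes each score was recorded to the nearest `unit`, so its real limits
    lie unit / 2 above and below the recorded value.
    """
    upper_real_limit = max(scores) + unit / 2
    lower_real_limit = min(scores) - unit / 2
    return upper_real_limit - lower_real_limit


scores = [3, 4, 5]  # hypothetical data: largest 5, smallest 3
print(exclusive_range(scores), inclusive_range(scores))  # 2 3.0
```

With these assumptions the inclusive range is always exactly one `unit` larger than the exclusive range.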
A deviation score is defined as

$$x - \bar{x}$$

where $x$ is the score whose deviation is to be calculated and $\bar{x}$ is the data set mean. A deviation thus gives the number of units, and the direction, by which a score lies from the data set mean. When data are closely "clumped" around the mean, deviations tend to be small; for data that are more spread out, deviations are larger.
Deviations for the scores 9, 3, 3, and 1 (x̄ = 4):

       x    x − x̄    |x − x̄|
       9      5         5
       3     −1         1
       3     −1         1
       1     −3         3
  Σ   16      0        10
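The deviations in the table can be checked with a few lines of Python. The sketch below (variable names are illustrative) also confirms that the raw deviations sum to zero, as the Σ row shows, which is why absolute values or squares of the deviations are summed when measuring spread.

```python
scores = [9, 3, 3, 1]
mean = sum(scores) / len(scores)             # 16 / 4 = 4.0

deviations = [x - mean for x in scores]      # x - x̄ for each score
abs_deviations = [abs(d) for d in deviations]

print(deviations)                            # [5.0, -1.0, -1.0, -3.0]
print(sum(deviations), sum(abs_deviations))  # 0.0 10.0
```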
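The population variance (a parameter) and the sample variance (a statistic) are defined as

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$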
Notice that the divisor for the statistic is $n - 1$, while that for the parameter is $N$. This is because $s^2$ is commonly used as an estimate of $\sigma^2$ in inferential settings. It can be shown that if the divisor of $s^2$ were $n$, the resulting estimate would be biased; that is, on average the value of $s^2$ would be smaller than $\sigma^2$. Dividing by $n - 1$ removes this bias, making $s^2$ a better estimate of $\sigma^2$. This formulation of the sample variance is termed a conceptual form because the expression shows the deviation-score nature of $s^2$.
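The bias claim can be checked exactly for a small case. The sketch below treats the scores 1, 3, 3, 9 as an illustrative population (σ² = 9), enumerates every ordered sample of size 4 drawn with replacement (all 4⁴ = 256 are equally likely), and averages the variance computed with divisor n and with divisor n − 1. Python's `statistics.pvariance` divides by the number of values and `statistics.variance` by one less; the population and sample size are assumptions made for the demonstration.

```python
from itertools import product
from statistics import fmean, pvariance, variance

population = [1, 3, 3, 9]       # illustrative population: mu = 4
sigma2 = pvariance(population)  # divisor N: 36 / 4 = 9
n = 4                           # sample size

# All ordered samples of size n drawn with replacement are equally likely,
# so averaging over every one of them gives each estimator's exact expected value.
samples = list(product(population, repeat=n))

avg_divisor_n = fmean(pvariance(s) for s in samples)         # divisor n
avg_divisor_n_minus_1 = fmean(variance(s) for s in samples)  # divisor n - 1

print(len(samples), sigma2, avg_divisor_n, avg_divisor_n_minus_1)
```

The divisor-n average comes out at 6.75, which is (n − 1)/n · σ² and falls short of σ², while the divisor n − 1 average equals σ² = 9 up to floating-point rounding.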
An algebraically equivalent computational form is

$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$
       x    x − x̄    (x − x̄)²
       9      5        25
       3     −1         1
       3     −1         1
       1     −3         9
  Σ   16      0        36
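We then calculate

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{36}{3} = 12$$

or, using the computational form with $\sum x = 16$ and $\sum x^2 = 81 + 9 + 9 + 1 = 100$,

$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{100 - \frac{(16)^2}{4}}{4 - 1} = 12$$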
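The sample standard deviation is the positive square root of the sample variance:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

and, in computational form,

$$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$$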
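We then calculate

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{36}{3}} = 3.464$$

or

$$s = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}} = \sqrt{\frac{100 - \frac{(16)^2}{4}}{4 - 1}} = 3.464$$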
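Both forms of the variance, and the standard deviation derived from them, can be reproduced as follows. This is a sketch; the helper names are illustrative, and `statistics.stdev` (which uses the n − 1 divisor) is included only as a cross-check.

```python
import math
import statistics


def variance_conceptual(xs):
    """Sample variance from squared deviations: sum((x - mean)^2) / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def variance_computational(xs):
    """Sample variance from raw sums: (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


scores = [9, 3, 3, 1]
s2_a = variance_conceptual(scores)     # 36 / 3
s2_b = variance_computational(scores)  # (100 - 256 / 4) / 3
s = math.sqrt(s2_a)                    # standard deviation

print(s2_a, s2_b, round(s, 3), round(statistics.stdev(scores), 3))  # 12.0 12.0 3.464 3.464
```

The computational form needs only the running totals Σx and Σx², which suits hand calculation, but subtracting two large, nearly equal quantities can lose precision in floating-point arithmetic, so the conceptual (two-pass) form is generally the safer choice in code.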
Measures of relative position locate an observation relative to the other observations in a distribution. To this end, we will examine, in turn, percentiles, percentile ranks, and z scores.
For grouped data, the $p$th percentile is

$$P_p = LRL + w\left(\frac{pr \cdot n - cf}{f}\right)$$

where $P_p$ represents the $p$th percentile, $LRL$ is the lower real limit of the interval that contains the $p$th percentile, $w$ is the width of the interval (the difference between the upper and lower real limits of that interval), $pr$ is $p$ expressed as a proportion (i.e., $p/100$), $n$ is the total number of observations, $cf$ is the cumulative frequency below the percentile interval, and $f$ is the frequency of that interval.
Score   Frequency   Cumulative Frequency
 .5        22               60
 .4        26               38
 .3         0               12
 .2         0               12
 .1         9               12
 .0         3                3
For the 20th percentile, (.2)(60) = 12. Since 12 of the observations fall below any point between .15 and .35 (the .2 and .3 intervals have zero frequency), the midpoint of that stretch, .25, is taken as the 20th percentile.
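The grouped-data percentile formula can be sketched in Python using the frequency table above. The interval width of .1 and the real limits (score ± .05) follow from the table; the function name and table layout are illustrative. The search picks the first nonempty interval whose cumulative frequency reaches pr · n. One caveat: when pr · n equals a cumulative frequency that is followed by empty intervals, as for the 20th percentile here, this sketch returns the lower end of the empty stretch (.15) rather than applying the midpoint (.25) convention described above.

```python
# Frequency table from the example, lowest score first: (score, frequency).
TABLE = [(0.0, 3), (0.1, 9), (0.2, 0), (0.3, 0), (0.4, 26), (0.5, 22)]
WIDTH = 0.1  # interval width w; real limits are score -/+ WIDTH / 2


def percentile(p, table=TABLE, w=WIDTH):
    """Grouped-data percentile: P_p = LRL + w * (pr * n - cf) / f."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    n = sum(f for _, f in table)
    target = p * n / 100  # pr * n, the number of observations below P_p
    cf = 0                # cumulative frequency below the current interval
    for score, f in table:
        if f > 0 and cf + f >= target:
            lrl = score - w / 2
            return lrl + w * (target - cf) / f
        cf += f
    raise ValueError("table has no observations")


print(round(percentile(50), 4))  # 0.4192
```

For the median, pr · n = 30 falls in the .4 interval (LRL = .35, cf = 12, f = 26), giving .35 + .1(18/26) ≈ .419.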
When the scale point whose percentile rank is sought falls in a nonzero-frequency interval, the following equation can be used to find percentile ranks:

$$PR_P = \frac{100\left[\frac{f(P - LRL)}{w} + cf\right]}{n}$$
where P is the point on the scale for which the percentile rank is to be
calculated, LRL is the lower real limit of the interval containing P, w is
the width of the interval calculated as the difference between the upper
and lower real limits of that interval, n is the total number of observations,
cf is the cumulative frequency up to the interval containing P and f is the
frequency of that interval.
When the point falls exactly at the upper or lower real limit of an interval, or anywhere in an interval with zero frequency, we use

$$PR_P = 100\,\frac{cf}{n}$$

where $cf$ is the cumulative frequency below the point.
For example, to find the percentile rank of .05, note that .05 is located at the upper real limit of the −.05 to .05 interval, so we calculate

$$PR_{.05} = 100\,\frac{cf}{n} = 100\,\frac{3}{60} = 5$$
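A sketch of the percentile-rank formula, using the same frequency table, is given below; the function name and table layout are illustrative. Because the general formula gives 100 · cf/n whenever f = 0, and agrees with that rule at both real limits of an interval, the sketch uses it throughout instead of branching on the special case.

```python
# Frequency table from the example, lowest score first: (score, frequency).
TABLE = [(0.0, 3), (0.1, 9), (0.2, 0), (0.3, 0), (0.4, 26), (0.5, 22)]
WIDTH = 0.1  # interval width w; real limits are score -/+ WIDTH / 2


def percentile_rank(point, table=TABLE, w=WIDTH):
    """Grouped-data percentile rank: PR_P = 100 * (f * (P - LRL) / w + cf) / n."""
    n = sum(f for _, f in table)
    cf = 0  # cumulative frequency below the current interval
    for score, f in table:
        lrl, url = score - w / 2, score + w / 2
        if lrl <= point <= url:
            return 100 * (f * (point - lrl) / w + cf) / n
        cf += f
    raise ValueError("point lies outside the real limits of the table")


print(round(percentile_rank(0.05), 2))  # 5.0
```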
Find the percentile rank of the point .25 for the data provided below.
Score   Frequency   Cumulative Frequency
 .5        22               60
 .4        26               38
 .3         0               12
 .2         0               12
 .1         9               12
 .0         3                3
To find the percentile rank of a score, rather than of a point, you must first choose a point on the scale to represent the score. Three choices are common:
1. The lower real limit of the score interval.
2. The midpoint of the score interval.
3. The upper real limit of the score interval.
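A z score expresses the distance of a score from the mean in standard deviation units. For a sample and for a population, respectively,

$$z = \frac{x - \bar{x}}{s}$$

$$Z = \frac{x - \mu}{\sigma}$$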
Convert the set of scores 1, 3, 3, and 9 to z scores. Then find the mean
and standard deviation of the z scores.
The mean and standard deviation of the original data are, respectively, 4 and 3.46. Using these values in Equation 2.2.3 gives

$$z_1 = \frac{1 - 4}{3.46} = -.867$$

$$z_3 = \frac{3 - 4}{3.46} = -.289$$

$$z_3 = \frac{3 - 4}{3.46} = -.289$$

$$z_9 = \frac{9 - 4}{3.46} = 1.445$$
Because the sum of these z scores is zero, their mean is also zero. With a mean of zero, Equation 2.15 simplifies to

$$s = \sqrt{\frac{\sum z^2}{n - 1}} = \sqrt{\frac{3}{3}} = 1$$
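The same conversion can be sketched with Python's standard `statistics` module. Because the sketch uses the unrounded s = √12 ≈ 3.4641 instead of 3.46, its z scores differ from the hand-computed ones in the third decimal place, but their mean is 0 and their standard deviation is 1 up to floating-point rounding.

```python
import statistics

scores = [1, 3, 3, 9]
mean = statistics.fmean(scores)  # 4.0
s = statistics.stdev(scores)     # sample standard deviation (divisor n - 1), sqrt(12)

# z = (x - x̄) / s for each score
z = [(x - mean) / s for x in scores]

print([round(v, 3) for v in z])  # [-0.866, -0.289, -0.289, 1.443]
print(statistics.fmean(z), statistics.stdev(z))  # approximately 0 and 1
```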
Skewness can be measured as the average of the cubed z scores:

$$\text{Skew} = \frac{\sum z^3}{n}$$
[Figure: frequency histograms over scores 1 to 7 for (A) a symmetric, (B) a negatively skewed, and (C) a positively skewed distribution.]
Score      z        z³
  1      −.867    −0.652
  3      −.289    −0.024
  3      −.289    −0.024
  9      1.445     3.017
  Σ      0.000     2.317
$$\text{Skew} = \frac{\sum z^3}{n} = \frac{2.317}{4} = .579$$
[Figure: frequency distributions A and B plotted over scores 1 to 7.]
Distribution A is more peaked and has longer tails than distribution B, and therefore has greater kurtosis.
$$\text{Kurtosis} = \frac{\sum z^4}{n}$$
Notice that while skewness was expressed as the average of the cubed z
scores in a distribution, kurtosis is the average of z scores raised to the
fourth power. In general, larger kurtosis values reflect sharper peaks than
do smaller values.
Score      z        z⁴
  1      −.867     0.565
  3      −.289     0.007
  3      −.289     0.007
  9      1.445     4.360
  Σ      0.000     4.939
$$\text{Kurtosis} = \frac{\sum z^4}{n} = \frac{4.939}{4} = 1.235$$
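Skewness and kurtosis as defined here (averages of cubed and fourth-power z scores, with z based on the sample standard deviation) can be sketched as follows; the function names are illustrative. Working with unrounded z scores, the results for 1, 3, 3, 9 come out at about 0.577 and 1.229, slightly below the hand values .579 and 1.235, which used s = 3.46. Library routines such as `scipy.stats.skew` and `scipy.stats.kurtosis` follow different conventions (for example, kurtosis reported as excess over 3 by default), so their output will not match these figures.

```python
import statistics


def z_scores(xs):
    """z = (x - mean) / s, with s the sample standard deviation (divisor n - 1)."""
    mean = statistics.fmean(xs)
    s = statistics.stdev(xs)
    return [(x - mean) / s for x in xs]


def skew(xs):
    """Average of the cubed z scores."""
    z = z_scores(xs)
    return sum(v ** 3 for v in z) / len(z)


def kurtosis(xs):
    """Average of the z scores raised to the fourth power."""
    z = z_scores(xs)
    return sum(v ** 4 for v in z) / len(z)


scores = [1, 3, 3, 9]
print(round(skew(scores), 3), round(kurtosis(scores), 3))  # 0.577 1.229
```

A symmetric set such as 1, 2, 3 gives a skew of exactly 0, since the cubed z scores cancel.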