3
UNDERSTANDING
VARIABILITY
Vive la Différence
Difficulty Scale ☺ ☺ ☺ ☺
(moderately easy but not a cinch)
WHAT YOU WILL LEARN IN THIS CHAPTER
• Understanding the value of variability as a descriptive tool
• Computing the range
• Computing the standard deviation
• Computing the variance
• Understanding what the standard deviation and variance have in
common—and how they are different
WHY UNDERSTANDING VARIABILITY IS IMPORTANT
In Chapter 2, you learned about different types of averages, what they mean, how
they are computed, and when to use them. But when it comes to descriptive statistics
and describing the characteristics of a distribution, averages are only half the story.
The other half is measures of variability.
42 Part II ■ Σigma Freud and Descriptive Statistics
In the simplest of terms, variability reflects how much scores differ from one
another. For example, the following set of scores shows some variability:
7, 6, 3, 3, 1
The following set of scores has the same mean (4) but has less variability than the
previous set:
3, 4, 4, 5, 4
The next set has no variability at all—the scores do not differ from one another—
but it also has the same mean as the other two sets we just showed you.
4, 4, 4, 4, 4
Variability (also called spread or dispersion) can be thought of as a measure of how
different scores are from one another. It’s even more accurate (and maybe even eas-
ier) to think of variability as how different scores are from one particular score. And
what “score” do you think that might be? Well, instead of comparing each score to
every other score in a distribution, the one score that could be used as a comparison
is—that’s right—the average. So, variability becomes a measure of how much each
score in a group of scores differs from the average, usually the mean. More about
this in a moment.
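To see this idea with the three sets of scores shown earlier, here is a minimal Python sketch (not part of the original text). The function name average_distance_from_mean is made up for illustration, and using absolute distances is just one simple choice for "how far from the mean"; the chapter's standard measures are developed in the sections that follow.

```python
from statistics import mean

# The three sets of scores from the text; each has a mean of 4.
sets_of_scores = {
    "some variability": [7, 6, 3, 3, 1],
    "less variability": [3, 4, 4, 5, 4],
    "no variability": [4, 4, 4, 4, 4],
}


def average_distance_from_mean(scores):
    """Average absolute distance of each score from the mean (illustrative helper)."""
    m = mean(scores)
    return sum(abs(x - m) for x in scores) / len(scores)


for label, scores in sets_of_scores.items():
    print(label, "mean =", mean(scores),
          "average distance =", average_distance_from_mean(scores))
```

All three sets report the same mean, but the average distance from that mean shrinks from 2.0 to 0.4 to 0 as the scores become more alike.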
Remember what you already know about computing averages—that an average
(whether it is the mean, the median, or the mode) is a representative score of a set
of scores. Now, add your new knowledge about variability—that it reflects how
different scores are from one another. Each is an important descriptive statistic.
Together, these two (average and variability) can be used to describe the character-
istics of a distribution and show how distributions differ from one another.
Three measures of variability are commonly used to reflect the degree of variability,
spread, or dispersion in a group of scores. These are the range, the standard devia-
tion, and the variance. Let’s take a closer look at each one and how each one is used.
Actually, how data points differ from one another is a central part of under-
standing and using basic statistics. But when it comes to differences between
individuals and groups (a mainstay of most social and behavioral sciences), the
whole concept of variability becomes really important. Sometimes it’s called
fluctuation, or error, or one of many other terms, but the fact is, variety is the
spice of life, and what makes people different from one another also makes
understanding them and their behavior all the more challenging (and inter-
esting). Without variability either in a set of data or between individuals and
groups, things are just boring.
Chapter 3 ■ Understanding Variability 43
COMPUTING THE RANGE
The range is the simplest measure of variability and kind of intuitive. It is the distance
of the biggest score from the smallest score. The range is computed simply by subtract-
ing the lowest score in a distribution from the highest score in the distribution.
In general, the formula for the range is
$$r = h - l, \qquad (3.1)$$
where
• r is the range,
• h is the highest score in the data set, and
• l is the lowest score in the data set.
Take the following set of scores, for example (shown here in descending order):
98, 86, 77, 56, 48
In this example, 98 − 48 = 50. The range is 50. Even if there were a set of 500 scores,
where the largest was 98 and the smallest was 48, the range would still be 50.
There really are two kinds of ranges. One is the exclusive range, which is the
highest score minus the lowest score (or h − l) and the one we just defined. The
second kind of range is the inclusive range, which is the highest score minus
the lowest score plus 1 (or h − l + 1). The difference is that the inclusive range
counts both the high and low scores as being among the values in the range.
You most commonly see the exclusive range in research articles, but the inclu-
sive range is also used on occasion if the researcher prefers it.
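As a small illustration of the two kinds of ranges, here is a Python sketch using the five scores from the example. The function names exclusive_range and inclusive_range are our own labels for this sketch, not standard library functions.

```python
def exclusive_range(scores):
    """Highest score minus lowest score (h - l)."""
    return max(scores) - min(scores)


def inclusive_range(scores):
    """Highest score minus lowest score, plus 1 (h - l + 1)."""
    return max(scores) - min(scores) + 1


scores = [98, 86, 77, 56, 48]
print(exclusive_range(scores))  # 50
print(inclusive_range(scores))  # 51
```

Because only the highest and lowest scores enter the calculation, adding hundreds of scores between 48 and 98 would leave both results unchanged.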
The range tells you how different the highest and lowest values in a data set are from
one another—that is, the range shows how much spread there is from the lowest to
the highest point in a distribution. So, although the range is fine as a general indi-
cator of variability, it should not be used to reach any conclusions regarding how
individual scores differ from one another. And you will rarely see it reported
as the only measure of variability; it usually appears as one of several, which brings us to. . . .
COMPUTING THE STANDARD DEVIATION
Now we get to the most frequently used measure of variability, the standard deviation.
Just think about what the term implies—it’s a deviation from something (guess
what?) that has been standardized. Actually, the standard deviation (sometimes
abbreviated as SD, sometimes s) represents the average amount of variability in a set
of scores. In practical terms, it’s the average distance of each score from the mean.
The larger the standard deviation, the larger the average distance each data point is
from the mean of the distribution and the more variable the set of scores is.
So, what’s the logic behind computing the standard deviation? Your initial thoughts
may be to compute the mean of a set of scores and then subtract each individual
score from the mean. Then, compute the average of that distance. That’s a good
idea—you’ll end up with the average distance of each score from the mean. But it
won’t work. (For fun, play with a small set of scores and see if you can figure out
why this simple formula won’t work. We’ll show you why in a moment.)
Here’s the formula for computing the standard deviation:
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}, \qquad (3.2)$$
where
• s is the standard deviation;
• ∑ is sigma, which tells you to find the sum of what follows;
• X is each individual score;
• X̄ is the mean of all the scores; and
• n is the sample size.
This formula finds the difference between each individual score and the mean
(X − X̄), squares each difference, and sums them all together. Then, it divides the
sum by the size of the sample (minus 1) and takes the square root of the result. As
you can see, and as we mentioned earlier, the standard deviation is an average devi-
ation from the mean.
Here are the data we’ll use in the following step-by-step explanation of how to
compute the standard deviation:
5, 8, 5, 4, 6, 7, 8, 8, 3, 6
1. List each score. It doesn’t matter whether the scores are in any particular
order.
2. Compute the mean of the group.
3. Subtract the mean from each score.
Here’s what we’ve done so far, where (X − X̄) represents the difference between
the actual score and the mean of all the scores, which is 6.
X      X̄      (X − X̄)
8      6      8 − 6 = +2
8      6      8 − 6 = +2
8      6      8 − 6 = +2
7      6      7 − 6 = +1
6      6      6 − 6 = 0
6      6      6 − 6 = 0
5      6      5 − 6 = −1
5      6      5 − 6 = −1
4      6      4 − 6 = −2
3      6      3 − 6 = −3
4. Square each individual difference. The result is the column marked (X − X̄)².
5. Sum all the squared deviations about the mean. As you can see, the total
is 28.
X      (X − X̄)      (X − X̄)²
8      +2           4
8      +2           4
8      +2           4
7      +1           1
6      0            0
6      0            0
5      −1           1
5      −1           1
4      −2           4
3      −3           9
Sum    0            28
6. Divide the sum by n − 1, or 10 − 1 = 9, so then 28/9 = 3.11.
7. Compute the square root of 3.11, which is 1.76 (after rounding). That is the
standard deviation for this set of 10 scores.
What we now know from these results is that each score in this distribution differs
from the mean by an average of 1.76 points.
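The seven steps can also be traced in a few lines of Python. This is only a sketch that mirrors the hand calculation for the 10 scores in the example; the variable names are our own.

```python
import math

# Step 1: list each score (order does not matter).
scores = [5, 8, 5, 4, 6, 7, 8, 8, 3, 6]
n = len(scores)

# Step 2: compute the mean of the group.
mean = sum(scores) / n

# Step 3: subtract the mean from each score.
deviations = [x - mean for x in scores]

# Step 4: square each individual difference.
squared_deviations = [d ** 2 for d in deviations]

# Step 5: sum all the squared deviations about the mean.
sum_of_squares = sum(squared_deviations)

# Step 6: divide the sum by n - 1.
variance = sum_of_squares / (n - 1)

# Step 7: take the square root.
standard_deviation = math.sqrt(variance)

print(mean)                          # 6.0
print(sum_of_squares)                # 28.0
print(round(variance, 2))            # 3.11
print(round(standard_deviation, 2))  # 1.76
```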
Let’s step back and look at a few of the operations in the standard deviation formula.
They’re important to review and will increase your understanding of what the
standard deviation is.
First, why didn’t we just add up the deviations from the mean? Because the sum of
the deviations from the mean is always equal to zero. Try it by summing the devi-
ations (2 + 2 + 2 + 1 + 0 + 0 − 1 − 1 − 2 − 3). In fact, that’s the best way to check
whether you computed the mean correctly.
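You can try the same check in Python. The sketch below uses the 10 scores from the example; the value 5.5 for wrong_mean is a deliberately incorrect mean that we made up to show what the check catches.

```python
scores = [5, 8, 5, 4, 6, 7, 8, 8, 3, 6]

mean = sum(scores) / len(scores)
deviations = [x - mean for x in scores]
print(sum(deviations))  # 0.0

# A made-up, incorrect "mean" for comparison: the deviations no longer cancel.
wrong_mean = 5.5
wrong_deviations = [x - wrong_mean for x in scores]
print(sum(wrong_deviations))  # 5.0
```

With data whose mean is not a tidy number, floating-point rounding can leave a tiny nonzero remainder, so in practice a comparison such as math.isclose against zero (with a small absolute tolerance) is a safer test than strict equality.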
There’s another type of deviation that you may read about, and you should know
what it means. The mean deviation (also called the mean absolute deviation)
is the sum of the absolute value of the deviations from the mean divided by the
number of scores. You already know that the sum of the deviations from the
mean must equal zero (otherwise, the mean is computed incorrectly). That’s
why we square the deviations before summing them. Another option, though,
would be to take the absolute value of each deviation (which is the value regard-
less of the sign). Sum the absolute values and divide by the number of data
points, and you have the mean deviation. So, if you have a set of scores such as
3, 4, 5, 5, 8 and the arithmetic mean is 5, the mean deviation is the sum of 2 (the
absolute value of 5 − 3), 1, 0, 0, and 3, for a total of 6. Then divide this by 5 to get
the result of 1.2. (Note: The absolute value of a number is usually represented
as that number with a vertical line on each side of it, such as |5|. For example,
the absolute value of −6, or |−6|, is 6.)
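Here is the mean deviation for that same small set of scores as a short Python sketch. The function name mean_deviation is our own; Python's built-in abs supplies the absolute value.

```python
def mean_deviation(scores):
    """Sum of absolute deviations from the mean, divided by the number of scores."""
    mean = sum(scores) / len(scores)
    return sum(abs(x - mean) for x in scores) / len(scores)


scores = [3, 4, 5, 5, 8]
print(mean_deviation(scores))  # 1.2
print(abs(-6))                 # 6
```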
Second, why do we square the deviations? Because we want to get rid of the nega-
tive sign so that when we do eventually sum them, they don’t add up to 0.
And finally, why do we eventually end up taking the square root of the entire value
in Step 7? Because we want to return to the same units with which we originally
started. We squared the deviations from the mean in Step 4 (to get rid of negative
values) and then took the square root of their total in Step 7 to get back where we
started. Pretty tidy.
Why n − 1? What’s Wrong With Just n?
It might make sense to square the deviations about the mean and then later go back
and take the square root of their sum. But how about subtracting the value of 1
from the denominator of the formula? Why do we divide by n − 1 rather than just
plain ol’ n, like we usually do when we calculate the mean? Good question.
The answer is that s (the standard deviation) is an estimate of the population standard
deviation, and it is an unbiased estimate. Unbiased means that your sample
estimate is just as likely to be a little higher than the population value
as it is to be a little lower. It is unbiased, though, only when we subtract 1 from n.
(Strictly speaking, this property holds exactly for the variance, s², and only
approximately for s itself.)
By subtracting 1 from the denominator, we artificially force the standard deviation
to be a tiny bit larger than it would be otherwise. Why would we want to do that?
Because, as good scientists, we are conservative. Being conservative means that if
we have to err (and there is always a pretty good chance that we will), we will do
so on the side of overestimating what the standard deviation of the population is.
Dividing by a smaller denominator lets us do so. Thus, instead of dividing by 10,
we divide by 9. Or instead of dividing by 100, we divide by 99.
Notice that with larger sample sizes, the adjustment of dividing by n − 1 doesn’t
make as much difference as it does with a smaller set of scores. This pattern, that
larger sample sizes lead to more accurate estimates of population statistics (like
the standard deviation and the mean), is reflected throughout all statistics and
throughout this book.
Biased estimates are appropriate if your intent is only to describe the charac-
teristics of the sample. But if you intend to use the sample as an estimate of a
population parameter, then it’s best to calculate the unbiased statistic. That’s
why spreadsheets and calculators sometimes offer two equations for computing
the standard deviation, one that divides the sum of squared deviations by n − 1 and
one that divides the sum of squared deviations by just n.
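Python's standard library is one place where you can see both versions side by side. In the sketch below, statistics.stdev divides by n − 1 and statistics.pstdev divides by n; the data are the 10 scores from the earlier example, and the labels "unbiased" and "biased" follow this chapter's wording.

```python
import statistics

scores = [5, 8, 5, 4, 6, 7, 8, 8, 3, 6]

# Divides the sum of squared deviations by n - 1 (sample standard deviation).
unbiased = statistics.stdev(scores)

# Divides the sum of squared deviations by n (population standard deviation).
biased = statistics.pstdev(scores)

print(round(unbiased, 2))  # 1.76
print(round(biased, 2))    # 1.67
```

Other tools make the same choice in different ways, so it is worth checking the documentation of whatever software you use to see which denominator is the default.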
Take a look at the following table and see what happens as the size of the sample
gets larger (and moves closer to the population in size). The n − 1 adjustment
has far less impact on the difference between the biased and the unbiased estimates
of the standard deviation (the Difference column in the table) as the sample size
increases.
All other things being equal, then, the larger the size of the sample, the less dif-
ference there is between the biased and the unbiased estimates of the standard
deviation.
Sample Size   Value of Numerator    Biased Estimate of     Unbiased Estimate of    Difference Between
              in Standard           the Population         the Population          Biased and
              Deviation Formula     Standard Deviation     Standard Deviation      Unbiased Estimates
                                    (dividing by n)        (dividing by n − 1)
10            500                   7.07                   7.45                    0.38
100           500                   2.24                   2.25                    0.01
1,000         500                   0.7071                 0.7075                  0.0004
The moral of the story? When you compute the standard deviation for a sample,
which is an estimate of the population standard deviation, the larger the sample is,
the more accurate the estimate will be.
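The figures in the table above can be recomputed with a short Python sketch. It assumes, as the table does, that the numerator of the formula (the sum of squared deviations) stays fixed at 500 while the sample size changes; the function name estimates is our own.

```python
import math


def estimates(numerator, n):
    """Return (biased, unbiased) standard deviations for a given sum of squares."""
    biased = math.sqrt(numerator / n)          # dividing by n
    unbiased = math.sqrt(numerator / (n - 1))  # dividing by n - 1
    return biased, unbiased


for n in (10, 100, 1000):
    biased, unbiased = estimates(500, n)
    print(n, round(biased, 4), round(unbiased, 4), round(unbiased - biased, 4))
```

The last printed value on each line is the gap between the two estimates, and it shrinks quickly as n grows.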
What’s the Big Deal?
The computation of the standard deviation is very straightforward. But what does
it mean? As a measure of variability, all it tells us is how much each score in a set
of scores, on the average, varies from the mean. But it has some very practical
applications, as you will find out in Chapter 4. Just to whet your appetite, consider
this: The standard deviation can be used to help us compare scores from different
distributions, even when the means and standard deviations are different. Amazing!
• The standard deviation is computed as the average distance from the
mean. So, you will need to first compute the mean as a measure of central
tendency. Don’t fool around with the median or the mode in trying to
compute the standard deviation.
• The larger the standard deviation, the more spread out the values are, and
the more different they are from one another.
• Just like the mean, the standard deviation is sensitive to extreme scores.
When you are computing the standard deviation of a sample and you have
extreme scores, note that fact somewhere in your written report and in your
interpretation of what the data mean.
• If s = 0, there is absolutely no variability in the set of scores, and the scores
are essentially identical in value. This will rarely happen and actually makes
it hard to compute statistics about the relationship between scores. Without
variability, we can’t see connections between variables. And, when you
think about it, a variable is only a variable when it varies.
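Two of these points are easy to try for yourself. The Python sketch below starts with the 10 scores from the earlier example, then swaps one score for an extreme value of 30 (a number we made up purely for illustration), and finally uses a set of identical scores.

```python
import statistics

scores = [5, 8, 5, 4, 6, 7, 8, 8, 3, 6]

# Hypothetical change: replace the last score (6) with an extreme score (30).
scores_with_extreme = scores[:-1] + [30]

sd_original = statistics.stdev(scores)
sd_with_extreme = statistics.stdev(scores_with_extreme)
print(round(sd_original, 2))
print(round(sd_with_extreme, 2))

# Identical scores: no variability at all, so s = 0.
sd_identical = statistics.stdev([4, 4, 4, 4, 4])
print(sd_identical)
```

A single extreme score is enough to inflate the standard deviation well beyond its original value, which is why such scores deserve a note in your report.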
So, what’s the big deal about variability and its importance? As a statistical
concept, you learned above that it is a measure of dispersion or how scores
differ from one another. But, “shakin’ it up” really is the spice of life, and as you
may have learned in your biology or psychology classes or from your own per-
sonal reading, variability is the key component of evolution, for without change
(which variability always accompanies), organisms cannot adapt. All very cool.
COMPUTING THE VARIANCE
Here comes another measure of variability and a nice surprise. If you know the
standard deviation of a set of scores and you can square a number, you can easily
compute the variance of that same set of scores. This third measure of variability,
the variance, is simply the standard deviation squared.
In other words, it’s the same formula you saw earlier but without the square root
bracket, like the one shown in Formula 3.3:
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \qquad (3.3)$$
If you take the standard deviation and never complete the last step (taking the
square root), you have the variance. In other words, s² = s × s, or the variance equals
the standard deviation times itself (which is what squaring does). In our earlier
example, where the standard deviation was equal to 1.76 (after rounding), the
variance is equal to 1.76², or 3.10. (Squaring the unrounded standard deviation gives
28/9, or 3.11, the value from Step 6.) As another example, let’s say that the standard
deviation of a set of 150 scores is 2.34. Then, the variance would be 2.34², or 5.48.
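The relationship between the two measures, and the small effect of rounding, can be checked with Python's statistics module. This sketch reuses the 10 scores from the standard deviation example; statistics.variance divides by n − 1, matching Formula 3.3.

```python
import math
import statistics

scores = [5, 8, 5, 4, 6, 7, 8, 8, 3, 6]

variance = statistics.variance(scores)  # sum of squared deviations / (n - 1)
sd = statistics.stdev(scores)           # square root of the variance

print(round(variance, 2))               # 3.11
print(math.isclose(sd ** 2, variance))  # True

# Squaring the already-rounded standard deviation loses a little precision.
print(round(round(sd, 2) ** 2, 2))      # 3.1
```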
You are not likely to see the variance mentioned by itself in a journal article or see it
used as a descriptive statistic. This is because the variance is a difficult number to interpret
and apply to a set of data. After all, it is based on squared deviation scores, so
it is expressed in squared units rather than in the units of the original scores.
But the variance is important because it is used both as a concept and as a practical
measure of variability in many statistical formulas and techniques. You will learn
about these later in Statistics for People Who (Think They) Hate Statistics.
PEOPLE WHO LOVED STATISTICS
You may know about the famous English nurse Florence Nightingale (1820–1910),
who managed and trained nurses in the Crimean War and developed standards and
procedures to make nursing a profession. She also, though, was a gifted
mathematician and was influential in promoting the collection and analysis of
statistical data. She was one of the first medical researchers to use graphs and
charts to display the variability of data.
She used these visual forms of communication to examine causes of death for
soldiers and to convince administrators that environmental conditions, like
unsanitary water, affected death rates. Her application of statistics to
real-world problems of life and death like sanitation and other issues is believed
to have dramatically increased life expectancy throughout English towns in the
latter parts of the 19th century.
The Standard Deviation Versus the Variance
How are standard deviation and the variance the same, and how are they different?
Well, they are both measures of variability, dispersion, or spread. The formulas
used to compute them are very similar. You see them (but mostly the standard
deviation) reported all over the place in the “Results” sections of journals.
They are also quite different.
First, and most important, the standard deviation (because we take the square root
of the average summed squared deviation) is stated in the original units from which
it was derived. The variance is stated in units that are squared (the square root of
the final value is never taken).
What does this mean? Let’s say that we need to know the variability of a group
of production workers assembling circuit boards. Let’s say that they average 8.6
boards per hour and the standard deviation is 1.59. The value 1.59 means that,
on average, the number of boards each worker assembles per hour differs from the
group mean by about 1 1/2 circuit boards. This kind of information is quite valuable
when trying to understand the overall performance of groups.
Let’s look at an interpretation of the variance, which is 1.59², or 2.53. This would
be interpreted as meaning that the average difference between the workers is about
2.53 circuit boards squared from the mean. Which of these two makes more sense?
USING SPSS TO COMPUTE MEASURES OF VARIABILITY
Let’s use SPSS to compute some measures of variability. We are using the file
named Chapter 3 Data Set 1.
There is one variable in this data set:
Variable Definition
ReactionTime Reaction time on a tapping task in seconds
Here are the steps to compute the measures of variability that we discussed
in this chapter:
1. Open the file named Chapter 3 Data Set 1.
2. Click Analyze → Descriptive Statistics → Frequencies.
3. Double-click on the ReactionTime variable to move it to the Variable(s)
box. If it’s highlighted, you can also click on that left arrow between the
two boxes to move a variable.
4. Click on the Statistics button, and you will see the Frequencies:
Statistics dialog box. This dialog box is used to select the variables and
procedures you want to perform.
5. Under Dispersion, click Std. deviation.
6. Under Dispersion, click Variance.
7. Under Dispersion, click Range.
8. Click Continue.
9. Click OK.
The SPSS Output
Figure 3.1 shows selected output from the SPSS procedure for ReactionTime.
FIGURE 3.1 SPSS output for the variable reaction time
Source: IBM.
Understanding the SPSS Output
There are 30 valid cases with no missing cases, and the standard deviation is
0.70255. The variance equals 0.494 (or s²), and the range is 2.60. As you already
know, the standard deviation, variance, and range are measures of dispersion or
variability. In the case of this set of 30 observations, the standard deviation (the
most commonly used of all these measures) is equal to 0.703 or about 0.70.
We will go into much greater detail as to how this value is used in Chapter 8, but
for now, the value of 0.70 represents the average amount each score is from the
centermost point in the set of scores, or the mean. For these reaction time data, it
means that, on average, a reaction time differs from the mean by about 0.70 seconds.
Let’s try another one, using the data titled Chapter 3 Data Set 2. There are two
variables in this data set:
Variable Definition
Math_Score Score on a mathematics test
Reading_Score Score on a reading test
Follow the same set of instructions as given previously, only in Step 3, select both
variables (either by double-clicking on each one or by selecting it and clicking the
“move” arrow).
More SPSS Output
The SPSS output is shown in Figure 3.2, where you can see selected output from
the SPSS procedure for these two variables. There are 30 valid cases with no miss-
ing cases, and the standard deviation for math scores is 12.36 with a variance of
152.70 and a range of 43. For reading scores, the standard deviation is 18.70, the
variance is a whopping 349.69 (that’s pretty big), and the range is 76 (which is large
as well, reflecting the similarly large variance).
FIGURE 3.2 Output for the variables Math_Score and Reading_Score
Source: IBM.
Understanding the SPSS Output
As with the example shown in Figure 3.1, the interpretation of the SPSS output is
relatively straightforward.
On average, the amount of deviation from the mean for math scores is 12.36, and
on average, the amount of deviation for reading scores is 18.70. Whether these
measures of dispersion are considered “large” or “small” is a question that can only
be answered relative to a variety of factors, including what’s being measured, the
number of observations, and the range of possible scores. Context is everything.
REAL-WORLD STATS
If you were like a mega stats person, then you might be interested in the
properties of measures of variability for the sake of those properties. That’s
what mainline statisticians spend lots of their time doing: looking at the
characteristics and performance and assumptions (and the violation thereof) of
certain statistics.
But we’re more interested in how these tools are used, so let’s take a look at one
such study that actually focused on variability as an outcome. And, as you read
earlier, variability among scores is interesting for sure, but when it comes to
understanding the reasons for variability among substantive performances and
people, then the topic becomes really interesting.
This is exactly what Nicholas Stapelberg and his colleagues in Australia did when
they looked at variability in heart rate as it related to coronary heart disease.
Now, they did not look at this phenomenon directly, but they entered the search
terms heart rate variability, depression, and heart disease into multiple
electronic databases and found that decreased heart rate variability is found in
conjunction with both major depressive disorders and coronary heart disease.
Why might this be the case? The researchers think that both diseases disrupt
control feedback loops that help the heart function efficiently. This is a terrific
example of how looking at variability can be the focal point of a study rather than
an accompanying descriptive statistic.
Want to know more? Go online or to the library and read . . .
Stapelberg, N. J., Hamilton-Craig, I., Neumann, D. L., Shum, D. H., &
McConnell, H. (2012). Mind and heart: Heart rate variability in major depressive
disorder and coronary heart disease - a review and recommendations. The
Australian and New Zealand Journal of Psychiatry, 46, 946–957.
Summary
Measures of variability help us even more fully understand what a distribution of data points looks
like. Along with a measure of central tendency, we can use these values to distinguish distributions
from one another and effectively describe what a collection of test scores, heights, or measures of
personality looks like and what those individual scores represent. Now that we can think and talk
about distributions, let’s explore ways we can look at them.
Time to Practice
1. Why is the range the most convenient measure of dispersion, yet the most imprecise measure
of variability? When would you use the range?
2. Fill in the exclusive and inclusive ranges for the following items.
High Score Low Score Inclusive Range Exclusive Range
12.1 3
92 51
42 42
7.5 6
27 26
3. Why would you expect more variability on a measure of personality in college freshmen than
you would on a measure of age?
4. Why does the standard deviation get smaller as the individuals in a group score more similarly
on a test? And why would you expect the amount of variability on a measure to be relatively
less with a larger number of observations than with a smaller one?
5. For the following set of scores, compute the range, the unbiased and the biased standard
deviations, and the variance. Do the exercise by hand.
94, 86, 72, 69, 93, 79, 55, 88, 70, 93
6. In Question 5, why is the unbiased estimate greater than the biased estimate?
7. Use SPSS to compute all the descriptive statistics for the following set of three test scores
over the course of a semester. Which test had the highest average score? Which test had the
smallest amount of variability?
Test 1 Test 2 Test 3
50 50 49
48 49 47
51 51 51
46 46 55
49 48 55
48 53 45
49 49 47
49 52 45
50 48 46
50 55 53
8. For the following set of scores, compute by hand the unbiased estimates of the standard
deviation and variance.
58
56
48
76
69
76
78
45
66
9. The variance for a set of scores is 36. What is the standard deviation, and what is the range?
10. Find the inclusive range, the sample standard deviation, and the sample variance of each of
the following sets of scores:
a. 5, 7, 9, 11
b. 0.3, 0.5, 0.6, 0.9
c. 6.1, 7.3, 4.5, 3.8
d. 435, 456, 423, 546, 465
11. This practice problem uses the data contained in the file named Chapter 3 Data Set 3. There
are two variables in this data set:
Variable Definition
Height Height in inches
Weight Weight in pounds
Using SPSS, compute all of the measures of variability you can for height and weight.
12. How can you tell whether SPSS produces a biased or an unbiased estimate of the standard
deviation?
13. Compute the biased and unbiased values of the standard deviation and variance for the set of
accuracy scores shown in Chapter 3 Data Set 4. Use SPSS if you can; otherwise, do it by hand.
Which one is smaller, and why?
14. On a spelling test, the standard deviation is equal to 0.94. What does this mean?