Understanding Variability in Statistics

The document discusses variability in data, explaining how it is measured through range, variance, and standard deviation. It provides formulas for calculating these measures, along with examples and problems for practice. Additionally, it touches on degrees of freedom and measures of variability for qualitative and ranked data.

Uploaded by

MOnika

Describing Variability

Variability describes how the data points (scores) in a distribution are scattered around a central
point such as the mean. It is the extent to which the data points in a statistical distribution or data set
vary from the average value, as well as the extent to which they differ from each other. Variability can be
measured with the range, the interquartile range, and the variance or standard deviation.

RANGE
The range in statistics for a given data set is the difference between the highest and lowest values.
For example, if the given data set is {2, 5, 8, 10, 3}, then the range will be 10 – 2 = 8.

Problem 1: Find the range of given observations: 32, 41, 28, 54, 35, 26, 23, 33, 38, 40.
Solution: Let us first arrange the given values in ascending order: 23, 26, 28, 32, 33, 35, 38, 40, 41, 54
Since 23 is the lowest value and 54 is the highest value, the range of the observations is:
Range (X) = Max (X) – Min (X) = 54 – 23 = 31
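The range calculation in Problem 1 can be sketched in a few lines of Python. The function name `data_range` is an illustrative choice, not something taken from the notes.

```python
def data_range(values):
    """Return the range: the largest value minus the smallest value."""
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    return max(values) - min(values)


# Observations from Problem 1.
observations = [32, 41, 28, 54, 35, 26, 23, 33, 38, 40]
print(data_range(observations))  # 31
```

Sorting the data, as in the worked solution, is not strictly needed because `max` and `min` scan the values directly.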

VARIANCE
Variance is a measure of how far a set of data points is spread out from their mean (average)
value. It is calculated as the mean of the squared deviations from the mean, and it equals the square of
the standard deviation. For a population:
Variance = (Standard deviation)$^2$, that is, $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
In the formula above, $\mu$ represents the mean of the data points, $X$ is the value of an individual data
point, and $N$ is the total number of data points.
Problem 2: Find the variance: X = 5, 8, 6, 10, 12, 9, 11, 10, 12, 7
Solution:
Mean = sum (X) / N, N = 10
sum (X) = 5 + 8 + 6 + 10 + 12 + 9 + 11 + 10 + 12 + 7 = 90
Mean, $\mu$ = 90 / 10 = 9
Deviations from the mean, $X - \mu$: -4, -1, -3, 1, 3, 0, 2, 1, 3, -2
Squared deviations, $(X - \mu)^2$: 16, 1, 9, 1, 9, 0, 4, 1, 9, 4
$\sum (X - \mu)^2$ = 16 + 1 + 9 + 1 + 9 + 0 + 4 + 1 + 9 + 4 = 54

$\sigma^2 = \sum (X - \mu)^2 / N$ = 54 / 10 = 5.4
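The steps of Problem 2 translate directly into code. The following is a minimal Python sketch of the population variance; the function name `population_variance` is illustrative.

```python
def population_variance(values):
    """Population variance: the mean of the squared deviations from the mean."""
    n = len(values)
    if n == 0:
        raise ValueError("variance is undefined for an empty data set")
    mean = sum(values) / n
    # Square each deviation from the mean, then average over all N points.
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / n


# Scores from Problem 2.
scores = [5, 8, 6, 10, 12, 9, 11, 10, 12, 7]
print(population_variance(scores))  # 5.4
```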

STANDARD DEVIATION
Standard deviation is simply the square root of the variance. It measures the typical distance
between a score and the mean.
Standard deviation = $\sqrt{\text{Variance}}$
The standard deviation shows how spread out the data are. Variance and standard deviation each come in
two versions, one for a population and one for a sample.
Sum of Squares (SS):
The sum of squared deviation scores, or more simply the sum of squares, is symbolized by SS.
Sum of Squares (SS) Formulas for Population:
Sum of squares for population, definition formula:
$SS = \sum (X - \mu)^2$
Sum of squares for population, computation formula:
$SS = \sum X^2 - \frac{(\sum X)^2}{N}$
Sum of Squares (SS) Formulas for Sample:
Sum of squares for sample, definition formula:
$SS = \sum (X - \bar{X})^2$
Sum of squares for sample, computation formula:
$SS = \sum X^2 - \frac{(\sum X)^2}{n}$
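As a quick check that the definition and computation formulas give the same sum of squares, here is a minimal Python sketch applied to the scores from Problem 2. The function names `ss_definition` and `ss_computation` are illustrative, not taken from the notes.

```python
def ss_definition(values):
    """Sum of squares from the definition formula: sum of (X - mean)^2."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


def ss_computation(values):
    """Sum of squares from the computation formula: sum(X^2) - (sum X)^2 / n."""
    n = len(values)
    return sum(x * x for x in values) - sum(values) ** 2 / n


scores = [5, 8, 6, 10, 12, 9, 11, 10, 12, 7]
print(ss_definition(scores), ss_computation(scores))  # 54.0 54.0
```

The computation formula avoids working with individual deviations, which is convenient by hand when the mean is not a whole number.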
Standard deviation for a population, $\sigma$:
$\sigma = \sqrt{\frac{SS}{N}}$
Standard deviation for a sample, $s$:
$s = \sqrt{\frac{SS}{n - 1}}$


DEGREES OF FREEDOM (df)
Degrees of freedom (df) refers to the number of values that are free to vary, given one or more
mathematical restrictions, in a sample being used to estimate a population characteristic.
Formula: degrees of freedom, df = n − 1
In fact, we can use degrees of freedom to rewrite the formulas for the sample variance and standard
deviation:
$s^2 = \frac{SS}{n - 1} = \frac{SS}{df}$ and $s = \sqrt{\frac{SS}{n - 1}} = \sqrt{\frac{SS}{df}}$
where $s^2$ and $s$ represent the sample variance and standard deviation, SS is the sample sum of
squares (from either the definition or the computation formula), and df is the degrees of freedom and
equals n − 1.
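To make the role of df concrete, the sketch below divides the sum of squares by df = n − 1 for a sample and by N for a population. It uses the four scores 1, 3, 4, 4 from the Problems section; all function names are illustrative.

```python
import math


def sum_of_squares(values):
    """Sum of squared deviations from the mean (definition formula)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


def sample_variance(values):
    """Sample variance: SS divided by the degrees of freedom, df = n - 1."""
    df = len(values) - 1
    if df < 1:
        raise ValueError("at least two scores are needed for a sample variance")
    return sum_of_squares(values) / df


def sample_sd(values):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))


def population_sd(values):
    """Population standard deviation: SS is divided by N instead of n - 1."""
    return math.sqrt(sum_of_squares(values) / len(values))


scores = [1, 3, 4, 4]
print(sample_variance(scores))              # 2.0
print(round(sample_sd(scores), 2))          # 1.41
print(round(population_sd(scores), 2))      # 1.22
```

Dividing by n − 1 rather than n gives the larger of the two results, which is the point of the adjustment when a sample is used to estimate population variability.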
INTERQUARTILE RANGE (IQR)
The interquartile range (IQR) is simply the range for the middle 50 percent of the scores.
More specifically, the IQR equals the distance between the third quartile (or 75th percentile) and the first
quartile (or 25th percentile), that is, after the highest quarter (or top 25 percent) and the lowest quarter (or
bottom 25 percent) have been trimmed from the original set of scores.
To find the interquartile range (IQR), first arrange the scores in order and split them at the median into a
lower half and an upper half. The median of the lower half is quartile 1 (Q1) and the median of the upper
half is quartile 3 (Q3). The IQR is the difference between Q3 and Q1.
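The median-of-halves procedure described above can be sketched in Python as follows. This sketch assumes that, when the number of scores is odd, the overall median is left out of both halves; other quartile conventions exist and can give slightly different values. The function name `interquartile_range` is illustrative.

```python
from statistics import median


def interquartile_range(values):
    """IQR as Q3 - Q1, where Q1 and Q3 are the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("at least two scores are needed")
    half = n // 2
    lower = ordered[:half]
    # For an odd number of scores, skip the middle score (the overall median).
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(upper) - median(lower)


# Data from Progress Check 4.8.
retirement_ages = [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63]
residence_changes = [1, 3, 4, 1, 0, 2, 5, 8, 0, 2, 3, 4, 7, 11, 0, 2, 3, 4]
print(interquartile_range(retirement_ages))    # 5
print(interquartile_range(residence_changes))  # 3
```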

Progress Check *4.8 Determine the values of the range and the IQR for the following sets of data.
(a) Retirement ages: 60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63
(b) Residence changes: 1, 3, 4, 1, 0, 2, 5, 8, 0, 2, 3, 4, 7, 11, 0, 2, 3, 4

MEASURES OF VARIABILITY FOR QUALITATIVE AND RANKED DATA

Qualitative Data
Measures of variability are virtually nonexistent for qualitative or nominal data. It is probably
adequate to note merely whether scores are evenly divided among the various classes (maximum
variability), unevenly divided among the various classes (intermediate variability), or concentrated mostly
in one class (minimum variability).
Ordered Qualitative and Ranked Data
If qualitative data can be ordered because measurement is ordinal (or if the data are ranked), then
it’s appropriate to describe variability by identifying extreme scores (or ranks).

Problems

1. Progress Check *4.8 Determine the values of the range and the IQR for the following
sets of data.
(a) Retirement ages: 60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63
(b) Residence changes: 1, 3, 4, 1, 0, 2, 5, 8, 0, 2, 3, 4, 7, 11, 0, 2, 3, 4
Ans:
(a) range = 25; IQR = 65 – 60 = 5
(b) range = 11; IQR = 4 – 1 = 3

2. Using the definition formula for the sum of squares, calculate the
sample standard deviation for the following four scores: 1, 3, 4, 4.
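Ans: mean = 3; SS = $(-2)^2 + 0^2 + 1^2 + 1^2$ = 6; $s = \sqrt{SS/(n-1)} = \sqrt{6/3} = \sqrt{2} \approx 1.41$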

3. Using the computation formula for the sum of squares, calculate the population standard
deviation for the scores in (a) and the sample standard deviation for the scores in (b).
(a) 1, 3, 7, 2, 0, 4, 7, 3
(b) 10, 8, 5, 0, 1, 1, 7, 9, 2

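Ans:
(a) $\sum X$ = 27, $\sum X^2$ = 137, SS = 137 − 27²/8 = 45.875; $\sigma = \sqrt{45.875/8} \approx 2.39$
(b) $\sum X$ = 43, $\sum X^2$ = 325, SS = 325 − 43²/9 ≈ 119.56; $s = \sqrt{119.56/8} \approx 3.87$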

4. Days absent from school for a sample of 10 first-grade children are:
8, 5, 7, 1, 4, 0, 5, 7, 2, 9.
a) Before calculating the standard deviation, decide whether the definitional or computational formula
would be more efficient. Why?
b) Use the more efficient formula to calculate the sample standard deviation.

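Ans: (a) The computation formula, since the mean (4.8) is not a whole number.
(b) $\sum X$ = 48, $\sum X^2$ = 314, SS = 314 − 48²/10 = 83.6; $s = \sqrt{83.6/9} \approx 3.05$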

5. As a first step toward modifying his study habits, Phil keeps daily
records of his study time.
(a) During the first two weeks, Phil’s mean study time equals 20 hours per week. If he studied
22 hours during the first week, how many hours did he study during the second week?
(b) During the first four weeks, Phil’s mean study time equals 21 hours. If he studied 22, 18,
and 21 hours during the first, second, and third weeks, respectively, how many hours did
he study during the fourth week?
(c) If the information in (a) and (b) is to be used to estimate some unknown population characteristic, the
notion of degrees of freedom can be introduced. How many degrees of
freedom are associated with (a) and (b)?
(d) Describe the mathematical restriction that causes a loss of degrees of freedom in (a) and (b).

Ans: (a) 18 hours
(b) 23 hours
(c) df = 1 in (a) and df = 3 in (b)
(d) When all observations are expressed as deviations from their mean, the sum of all
deviations must equal zero.
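The restriction in (d) is why only n − 1 values are free to vary: once the mean and all but one observation are known, the last observation is fixed. A minimal Python sketch of that reasoning, using Phil's study times, is given below; the function name `missing_value` is illustrative.

```python
def missing_value(mean, n, known_values):
    """Return the one remaining observation, given the mean of n observations
    and the other n - 1 observations."""
    if len(known_values) != n - 1:
        raise ValueError("exactly n - 1 observations must be known")
    # The total must equal mean * n, so the last value is whatever is left over.
    return mean * n - sum(known_values)


print(missing_value(20, 2, [22]))          # 18
print(missing_value(21, 4, [22, 18, 21]))  # 23
```

In (a) one value can be chosen freely and in (b) three can, which matches df = 1 and df = 3.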
