TOPIC: DESCRIPTIVE STATISTICS.
MEASURES OF CENTRAL TENDENCY AND MEASURES OF DISPERSION: DEFINITIONS, CHARACTERISTICS, EXAMPLES. T-TEST
Statistical methods can be used to summarize or describe a collection of
data; this is called descriptive statistics.
Descriptive statistics describe the basic features of the data gathered in an experimental study.
In general, statistical data can be briefly described as a list of subjects or units
and the data associated with each of them. Although most research uses many
data types for each unit, this introduction treats only the simplest case.
A summary statistic may be formulated with one of two objectives:
1. To choose a statistic that shows how similar the units are. Statistics textbooks call a solution to this objective a measure of central tendency.
2. To choose another statistic that shows how the units differ. This kind of statistic is often called a measure of statistical variability.
When summarizing a quantity like length, weight or age, it is common to meet the first objective with the arithmetic mean, the median or, in the case of a unimodal distribution, the mode. Sometimes quantiles are used instead. These are values of the variable below which a given fraction of the distribution lies, for example the quartiles.
The most common measures of variability for quantitative data are the variance; its square root, the standard deviation; the range; the interquartile range; and the mean (average) absolute deviation.
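The sketch below is a quick illustration of several of these summary statistics. It uses Python's standard `statistics` module on a small data set, the same ten values used in Question 2 below. There are several conventions for computing quartiles. `statistics.quantiles` is used here with its default "exclusive" method, so other software may report slightly different quartiles and interquartile range.

```python
import statistics

# Illustrative data set (the ten values from Question 2).
data = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]

mean = statistics.mean(data)                  # arithmetic mean
median = statistics.median(data)              # even count: mean of the two middle values
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles, default "exclusive" method
iqr = q3 - q1                                 # interquartile range
data_range = max(data) - min(data)            # range

print(mean, median, (q1, q2, q3), iqr, data_range)
# prints: 27.5 27.5 (13.75, 27.5, 41.25) 27.5 45
```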
Measures of central tendency
1. The arithmetic mean (or simply the mean) of a list of numbers is the sum of all
the members of the list divided by the number of items in the list.
Arithmetic Mean Formula
The arithmetic mean is the ratio of the sum of the items to the number of items. It is denoted by x̄. The formulas for the arithmetic mean are given below.
Arithmetic mean formula (Individual series)
If x1, x2, ..., xn are the n items, then the arithmetic mean is defined as
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Arithmetic mean formula (Discrete series)
If x1, x2, ..., xn are the n items and f1, f2, ..., fn are the corresponding frequencies, then the mean is given by
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{N}$$
where N = ∑f.
Solved Examples
Question 1: Find the mean of the following marks
x 5 15 25 35 45 55 65
f 5 4 3 5 12 8 1
Solution:
x      f      fx
5      5      25
15     4      60
25     3      75
35     5     175
45    12     540
55     8     440
65     1      65
Total 38    1380
∑fx = 1380 and ∑f = 38, so the mean is given by
x̄ = ∑fx / ∑f = 1380/38 = 36.32
Mean = 36.32
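The two mean formulas can be checked with the minimal sketch below. The function names `mean_individual` and `mean_discrete` are hypothetical helpers chosen for illustration. The sketch reproduces Question 1. It also confirms that expanding the frequency table into an individual series gives the same mean.

```python
def mean_individual(xs):
    """Arithmetic mean of an individual series: sum of items / number of items."""
    return sum(xs) / len(xs)

def mean_discrete(xs, fs):
    """Mean of a discrete series: sum(f * x) / N, where N = sum(f)."""
    if len(xs) != len(fs):
        raise ValueError("each value needs exactly one frequency")
    n_total = sum(fs)
    return sum(f * x for x, f in zip(xs, fs)) / n_total

# Marks and frequencies from Question 1.
marks = [5, 15, 25, 35, 45, 55, 65]
freqs = [5, 4, 3, 5, 12, 8, 1]
print(round(mean_discrete(marks, freqs), 2))  # 36.32

# Repeating each mark f times turns the table into an individual series.
expanded = [x for x, f in zip(marks, freqs) for _ in range(f)]
print(mean_individual(expanded) == mean_discrete(marks, freqs))  # True
```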
2. The median is the value that separates the lower half of the data from the upper half.
The median of a finite list of numbers can be found by arranging all the observations from the lowest value to the highest and picking the middle one. If the number of observations is even, there is no single middle value, and the median is usually taken as the mean of the two middle values.
3. The mode is the most frequent value taken by a random variable, or occurring in a sample of a random variable. The term is applied both to probability distributions and to collections of experimental data.
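The median rule and the mode can be sketched as follows. The `median` function is a hypothetical hand-written helper that follows the description above. The small lists are illustrative data, not from the text. `statistics.multimode` from the standard library returns every most-frequent value, so ties are not hidden. For the frequency table in Question 1, the mode is simply the mark with the highest frequency.

```python
from statistics import multimode

def median(values):
    """Sort, take the middle value; for an even count, average the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))               # 3
print(median([7, 1, 3, 10]))           # 5.0
print(multimode([2, 4, 4, 5, 5, 9]))   # [4, 5]

# Mode of the Question 1 frequency table (assumes a single highest frequency).
marks = [5, 15, 25, 35, 45, 55, 65]
freqs = [5, 4, 3, 5, 12, 8, 1]
mode_mark = marks[freqs.index(max(freqs))]
print(mode_mark)                       # 45
```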
Measures of Dispersion
Measures of dispersion describe the spread of a distribution, that is, how much the values vary. They can be computed both from raw data and from data given as frequency distributions. The main measures of dispersion are:
1. Range
2. Mean absolute Deviation
3. Variance
4. Standard deviation
5. Coefficient of variation
1. Range
Range is the difference between the largest and the smallest values in a data set:
Range = Largest Value - Smallest Value
For example, if the lowest and highest marks scored in a test are 15 and 95, the range is 95 - 15 = 80.
Even though the range is the easiest measure of dispersion to calculate, it is not considered a good one, because it ignores all the values between the two extremes. Outliers, whether extremely low or extremely high, can affect the range considerably. In the test-marks example, suppose the next lowest mark is 25. Then the mark 15 is an extreme low value that pushes the range up by 10 marks, from 95 - 25 = 70 to 80.
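The outlier effect can be shown in a few lines. The sketch uses a hypothetical list of test marks that only matches the text's lowest (15), next lowest (25) and highest (95) marks. The other marks are made up for illustration, and `value_range` is a hypothetical helper name.

```python
def value_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

# Hypothetical marks: 15 is the extreme low value, 25 the next lowest.
marks = [15, 25, 40, 55, 70, 95]
print(value_range(marks))                          # 80
print(value_range([m for m in marks if m != 15]))  # 70
```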
2. Mean absolute Deviation
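This measure of dispersion is the average of the absolute deviations of the data values from the mean. It is computed using the formula
$$\frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert$$
where x̄ is the arithmetic mean of the data set. The absolute value is essential. Without it, the positive and negative deviations cancel, and their sum is always zero.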
What does this measure tell us about the spread? A data set with a larger mean absolute deviation (MAD) is more spread out than a data set with a smaller MAD. The MAD is sensitive to outliers.
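A minimal sketch of the mean absolute deviation follows, applied to the Question 2 data. The helper name `mean_absolute_deviation` is hypothetical. The sketch also shows why the absolute value is needed: the signed deviations from the mean sum to zero.

```python
def mean_absolute_deviation(values):
    """Average of |x_i - mean| over all values."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

data = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
mean = sum(data) / len(data)

# Signed deviations cancel out, so their sum carries no information about spread.
print(sum(x - mean for x in data))      # 0.0
print(mean_absolute_deviation(data))    # 12.5
```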
3. Variance and 4. Standard deviation
The variance and the standard deviation are the most widely used measures of dispersion because they use every value in the data set: the deviation of each value from the mean is taken into account. The variance is the average of the squared deviations from the mean, and the standard deviation is the square root of the variance.
Solved Examples
Question 2: Find the standard deviation of the data set 5, 10, 15, 20, 25, 30, 35, 40, 45, 50.
Solution:
Step 1: Compute the mean:
x̄ = (5 + 10 + 15 + 20 + 25 + 30 + 35 + 40 + 45 + 50)/10 = 275/10 = 27.5
Step 2: Build a table of the deviations and squared deviations, then total the squared-deviations column.
x      Deviation (x - x̄)    Squared deviation (x - x̄)²
5      -22.5                 506.25
10     -17.5                 306.25
15     -12.5                 156.25
20      -7.5                  56.25
25      -2.5                   6.25
30       2.5                   6.25
35       7.5                  56.25
40      12.5                 156.25
45      17.5                 306.25
50      22.5                 506.25
Total                        2062.5
Step 3: Substitute the total from the last column and n = 10 into the formula for the variance, where μ = x̄ = 27.5 is the mean:
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{n} = \frac{2062.5}{10} = 206.25$$
The standard deviation is then
$$\sigma = \sqrt{206.25} \approx 14.36$$
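The three steps of Question 2 can be sketched as follows. The helper name `population_variance` is hypothetical. It divides by n, as in the worked example. For comparison, Python's `statistics.pvariance`/`pstdev` use the same n divisor. `statistics.variance`/`stdev` divide by n - 1 (the usual sample estimate), so they give a somewhat larger value.

```python
import math
import statistics

def population_variance(values):
    """sigma^2 = sum((x_i - mu)^2) / n, as in Question 2."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

data = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
var = population_variance(data)
sd = math.sqrt(var)
print(var, round(sd, 2))                                         # 206.25 14.36

# The standard library's population versions agree with the hand calculation.
print(statistics.pvariance(data), round(statistics.pstdev(data), 2))  # 206.25 14.36

# Sample standard deviation (divides by n - 1) is larger.
print(round(statistics.stdev(data), 2))                          # 15.14
```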
5. Coefficient of variation
The coefficient of variation (CV) is a measure of dispersion of a
probability distribution. It is defined as the ratio of the standard deviation to
the mean. The coefficient of variation is a dimensionless number that allows
comparison of the variation of populations that have significantly different
mean values. It is often reported as a percentage (%) by multiplying the above
calculation by 100.
The coefficient of variation is used to compare the dispersion of two series. The greater the coefficient of variation, the greater the variability and the less consistent the series.
$$\text{Coefficient of variation} = \frac{\text{Standard deviation}}{\text{Mean}} \times 100\%$$
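Below is a minimal sketch of the coefficient of variation, assuming the population standard deviation used in Question 2. The helper name `coefficient_of_variation` is hypothetical. The second series is the Question 2 data multiplied by 2, a made-up series. It illustrates that the CV is dimensionless: rescaling every value leaves it unchanged.

```python
import statistics

def coefficient_of_variation(values):
    """CV = (standard deviation / mean) * 100 %, using the population SD."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(values) / mean * 100

data = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
doubled = [2 * x for x in data]   # hypothetical series on a different scale

print(round(coefficient_of_variation(data), 2))     # 52.22
print(round(coefficient_of_variation(doubled), 2))  # 52.22
```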
T-Test (“Student’s T-Test”)
The t-test is the most commonly used method to evaluate the differences in
means between two groups. For example, the t-test can be used to test for a
difference in test scores between a group of patients who were given a drug
and a control group who received a placebo. Theoretically, the t-test can be
used even if the sample sizes are very small (e.g., as small as 10; some
researchers claim that even smaller n's are possible), as long as the variables
are normally distributed within each group and the variation of scores in the
two groups is not reliably different.
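If the difference between the two means is at least twice its standard error, the difference is considered significant. A value of t ≥ 2 corresponds to a degree of reliability of P = 95.5%. This rule of thumb holds for reasonably large samples. For small samples, the critical value of t is larger and should be read from Student's t distribution.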
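This can be represented by the formulas
$$t = \frac{M_1 - M_2}{\sqrt{m_1^2 + m_2^2}} \qquad t = \frac{p_1 - p_2}{\sqrt{m_1^2 + m_2^2}}$$
where M is the mean, p is the relative value (in percent), and m is the standard error of the corresponding mean or relative value.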
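The t formula for two means can be sketched as follows. The function names `standard_error_of_mean` and `t_means` are hypothetical. σ is computed with N in the denominator, matching the formula summary. The group means and standard errors in the example are made-up numbers. The t ≥ 2 check is the text's large-sample rule of thumb, applied to |t| because the sign only depends on which group is listed first. For a full Student's t-test with p-values, SciPy offers `scipy.stats.ttest_ind`. Its t statistic is computed from sample (n - 1) variances, so it will not match this formula exactly.

```python
import math

def standard_error_of_mean(values):
    """m = sigma / sqrt(N), with sigma = sqrt(sum(d^2) / N)."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sigma / math.sqrt(n)

def t_means(mean1, se1, mean2, se2):
    """t = (M1 - M2) / sqrt(m1^2 + m2^2)."""
    return (mean1 - mean2) / math.sqrt(se1 ** 2 + se2 ** 2)

# Hypothetical group summaries: mean and standard error of each group.
t = t_means(12.5, 0.6, 10.0, 0.8)
print(round(t, 2))                                           # 2.5
print("significant" if abs(t) >= 2 else "not significant")   # significant
```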
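MEASURES OF CENTRAL TENDENCY AND MEASURES OF DISPERSION (summary of formulas)
Arithmetic mean (simple): $M = \frac{\sum V}{N}$
Arithmetic mean (weighted): $M = \frac{\sum V p}{N}$, where V is each value (variant) and p is its frequency
Absolute deviation: $d = V - M$
Standard deviation: $\sigma = \pm\sqrt{\frac{\sum d^2 p}{N}}$
Standard error of the arithmetic mean: $m = \pm\frac{\sigma}{\sqrt{N}}$
Coefficient of variation: $\gamma = \frac{\sigma}{M} \times 100\%$
T-test (means): $t = \frac{M_1 - M_2}{\sqrt{m_1^2 + m_2^2}}$
T-test (relative values): $t = \frac{p_1 - p_2}{\sqrt{m_1^2 + m_2^2}}$
Standard error of the relative value: $m = \pm\sqrt{\frac{p q}{N}}$, where p is the relative value in percent and $q = 100 - p$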
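The last two formulas, the standard error of a relative value and the t-test for two relative values, can be sketched as below. The function names are hypothetical, and the percentages and group sizes are made-up illustrative numbers. Percentages are entered on a 0 to 100 scale so that q = 100 - p matches the summary.

```python
import math

def standard_error_relative(p, n):
    """m = sqrt(p * q / N), where p is a percentage and q = 100 - p."""
    q = 100 - p
    return math.sqrt(p * q / n)

def t_relative(p1, n1, p2, n2):
    """t = (p1 - p2) / sqrt(m1^2 + m2^2) for two relative values (percentages)."""
    m1 = standard_error_relative(p1, n1)
    m2 = standard_error_relative(p2, n2)
    return (p1 - p2) / math.sqrt(m1 ** 2 + m2 ** 2)

# Hypothetical: 20% of 400 patients vs 10% of 900 patients.
print(standard_error_relative(20, 400))         # 2.0
print(standard_error_relative(10, 900))         # 1.0
print(round(t_relative(20, 400, 10, 900), 2))   # 4.47
```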