College of Natural and Computational Science
Department of Statistics
Introduction to Statistics
(Dep’t: IT)
November, 2024
© Seid A. (MSc.), Tepi, Ethiopia
3. Summarization of Data
3.1 Measure of Central Tendency
• The most important objective of statistical analysis is to determine a single value that
represents the entire mass of data.
• It tells us where the center of the distribution of data is located.
The most commonly used measures of central tendency are:
The Mean (Arithmetic mean, Weighted mean, Geometric mean and Harmonic means)
The Median
The Mode
Quantiles (Quartiles, Deciles and Percentiles).
3.1.1 Types of measure of central tendency
The Mean
1. Arithmetic mean
• The arithmetic mean of a sample is the sum of all the observations divided by the
number of observations in the sample, i.e.
$$\text{Sample mean (arithmetic mean)} = \frac{\text{the sum of all values in the sample}}{\text{number of values in the sample}}$$
Suppose that 𝑥1 , 𝑥2 , … , 𝑥𝑛 are n observed values in a sample of size n taken
from a population of size N.
• Then the arithmetic mean of the sample, denoted by $\bar{X}$, is given by
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n} \quad \rightarrow \text{for sample}$$
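• In general, the sample arithmetic mean is calculated by
$$\bar{X} = \begin{cases} \dfrac{\sum_{i=1}^{n} X_i}{n} & \rightarrow \text{for raw data} \\[2ex] \dfrac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i} & \rightarrow \text{for ungrouped data, where } \sum_{i=1}^{k} f_i = n \\[2ex] \dfrac{\sum_{i=1}^{k} f_i m_i}{\sum_{i=1}^{k} f_i} & \rightarrow \text{for grouped data, where } m_i \text{ is the midpoint of class } i \text{ and } \sum_{i=1}^{k} f_i = n \end{cases}$$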
Example 1 (raw data)
The net weights of five perfume bottles selected at random from the
production line are 85.4, 85.3, 84.9, 85.4 and 85. What is the arithmetic
mean weight of the sample observations?
• Solution: Given $n = 5$,
$x_1 = 85.4$, $x_2 = 85.3$, $x_3 = 84.9$, $x_4 = 85.4$ and $x_5 = 85$.
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{85.4 + 85.3 + 84.9 + 85.4 + 85}{5} = \frac{426}{5} = 85.2$$
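The raw-data calculation in Example 1 can be checked with a few lines of Python. This is only a minimal sketch; the variable names are illustrative and not part of the course material.

```python
from statistics import mean

# Net weights of the five perfume bottles from Example 1.
weights = [85.4, 85.3, 84.9, 85.4, 85]

# Arithmetic mean for raw data: sum of the observations divided by n.
manual_mean = sum(weights) / len(weights)

# The standard library function should agree with the manual formula.
library_mean = mean(weights)

print(round(manual_mean, 2), round(library_mean, 2))
```

Both values agree with the hand calculation of 85.2 up to floating-point rounding.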
Example 2 (ungrouped frequency distribution)
Calculate the mean of the marks of 46 students given below.
Marks ($X_i$)        9   10   11   12   13   14   15   16   17   18
Frequency ($f_i$)    1    2    3    6   10   11    7    3    2    1
Solution: $\sum f_i = n = 46$ is the sum of the frequencies, i.e. the total
number of observations.
Calculate $\sum_{i=1}^{k} f_i X_i = 623$. Then
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i} = \frac{623}{46} = 13.54$$
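As a rough check of Example 2, the frequency-weighted mean can be sketched in Python. The helper name `frequency_mean` is hypothetical and chosen here only for illustration.

```python
# Marks and their frequencies from Example 2.
marks = [9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
freqs = [1, 2, 3, 6, 10, 11, 7, 3, 2, 1]

def frequency_mean(values, frequencies):
    """Mean of an ungrouped frequency distribution: sum(f * x) / sum(f)."""
    total = sum(f * x for x, f in zip(values, frequencies))
    n = sum(frequencies)
    return total / n

n = sum(freqs)                                    # total number of students
sum_fx = sum(f * x for x, f in zip(marks, freqs)) # sum of f_i * X_i
print(n, sum_fx, round(frequency_mean(marks, freqs), 2))
```

The intermediate totals correspond to $\sum f_i$ and $\sum f_i X_i$ in the solution.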
Example 3 (grouped frequency distribution)
The net income of a sample of large importers of Urea was
organized into the following table. What is the arithmetic mean
of net income?
Net income             2-4   5-7   8-10   11-13   14-16
Number of importers     1     4     10      3       2
Midpoint ($m_i$)        3     6      9     12      15
Solution: $\sum f_i = n = 20$ is the sum of the frequencies, i.e. the total
number of observations.
To calculate $\sum_{i=1}^{k} f_i m_i$, first find the class midpoints $m_i$ (third row of the table). This gives $\sum_{i=1}^{k} f_i m_i = 183$, so
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i m_i}{\sum_{i=1}^{k} f_i} = \frac{183}{20} = 9.15$$
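The grouped-data mean of Example 3 can be sketched the same way. Here the midpoints are computed from the class limits rather than typed in; the variable names are illustrative assumptions.

```python
# Class limits and frequencies from Example 3 (net income of Urea importers).
classes = [(2, 4), (5, 7), (8, 10), (11, 13), (14, 16)]
freqs = [1, 4, 10, 3, 2]

# Class midpoint m_i = (lower limit + upper limit) / 2.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Grouped mean: sum(f_i * m_i) / sum(f_i).
sum_fm = sum(f * m for f, m in zip(freqs, midpoints))
grouped_mean = sum_fm / sum(freqs)

print(midpoints, sum_fm, grouped_mean)
```

Because every observation in a class is replaced by the class midpoint, the result is an approximation to the mean of the original (unavailable) raw data.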
Combined mean
If we have the arithmetic means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ of k groups
having the same unit of measurement of a variable, with sizes
$n_1, n_2, \ldots, n_k$ observations respectively,
• we can compute the combined mean of the values of the
groups taken together from the individual means by
$$\bar{X}_{com} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i}$$
Example 1:
Compute the combined mean for the following two sets.
Set A: 1, 4, 12, 2, 8 and 6; Set B: 3, 6, 2, 7 and 4.
Solution: $n_1 = 6$, $\bar{x}_1 = \dfrac{\sum_{i=1}^{6} X_i}{n_1} = \dfrac{33}{6} = 5.5$; $n_2 = 5$, $\bar{x}_2 = \dfrac{\sum_{i=1}^{5} X_i}{n_2} = \dfrac{22}{5} = 4.4$
$$\bar{X}_{com} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} = \frac{6 \times 5.5 + 5 \times 4.4}{6 + 5} = \frac{55}{11} = 5$$
Exercise: The mean weight of 150 students in a certain class is 60 kg. The mean
weight of boys in the class is 70 kg and that of girls is 55 kg. Find the number of
boys and girls in the class. Answer: $n_1 = 50$ (boys) and $n_2 = 100$ (girls).
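A minimal Python sketch of the combined-mean formula, applied to Sets A and B and to the stated answer of the exercise. The function name `combined_mean` is a hypothetical helper, not something defined in the course.

```python
def combined_mean(sizes, means):
    """Combined mean of k groups: sum(n_i * mean_i) / sum(n_i)."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

set_a = [1, 4, 12, 2, 8, 6]
set_b = [3, 6, 2, 7, 4]

mean_a = sum(set_a) / len(set_a)
mean_b = sum(set_b) / len(set_b)

# Combined mean from the two group means and group sizes.
com_mean = combined_mean([len(set_a), len(set_b)], [mean_a, mean_b])

# Pooling all observations directly should give the same value.
pooled_mean = sum(set_a + set_b) / len(set_a + set_b)

# Check of the exercise answer: 50 boys (70 kg) and 100 girls (55 kg).
exercise_mean = combined_mean([50, 100], [70, 55])

print(com_mean, pooled_mean, exercise_mean)
```

The last value reproduces the class mean of 60 kg given in the exercise, which confirms the stated answer.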
Properties of arithmetic mean
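1. The algebraic sum of the deviations of each value ($X_i$) from the mean ($\bar{X}$) is equal
to zero. That is
$$\sum_{i=1}^{n} (X_i - \bar{X}) = 0, \quad \text{since} \quad \sum_{i=1}^{n} X_i - \sum_{i=1}^{n} \bar{X} = n\bar{x} - n\bar{x} = 0.$$
For example, the mean of 3, 8 and 4 is 5. Then $\sum_{i=1}^{3} (X_i - \bar{X}) = (3 - 5) + (8 - 5) + (4 - 5) = 0$.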
5. The mean is sensitive to extreme values.
6. Uniqueness: the mean of any set of data is unique.
7. It can be used for further statistical treatment, for example:
Comparison of means.
Tests on means.
Disadvantages of the arithmetic mean
1. The mean is meaningless in the case of nominal or qualitative data.
2. In case of grouped data, if any class interval is open ended, arithmetic mean
cannot be calculated, since the class mark of this interval cannot be found.
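The Median ($\tilde{X}$)
The median is the middle value in the sorted list. We denote it by $\tilde{x}$.
• Let $x_1, x_2, \ldots, x_n$ be n ordered observations. Then the median is
given by:
$$\tilde{x} = \begin{cases} X_{\frac{n+1}{2}} & \text{if } n \text{ is odd} \\[2ex] \dfrac{X_{\frac{n}{2}} + X_{\frac{n}{2}+1}}{2} & \text{if } n \text{ is even} \end{cases}$$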
Example 1:
Find the median for the following data:
23, 16, 31, 77, 21, 14, 32, 6, 155, 9, 36, 24, 5, 27, 19
Solution: First arrange the given data in increasing order. That is
5, 6, 9, 14, 16, 19, 21, 23, 24, 27, 31, 32, 36, 77 and 155.
$n = 15$ is odd, so $\tilde{x} = X_{\frac{n+1}{2}} = X_{\frac{15+1}{2}} = X_8 = 23$.
Example 2:
Find the median for the following data.
61, 62, 63, 64, 64, 60, 65, 61, 63, 64, 65, 66, 64, 63
Solution: First arrange the given data in increasing order. That is
60, 61, 61, 62, 63, 63, 63, 64, 64, 64, 64, 65, 65 and 66.
$n = 14$ is even, so
$$\tilde{x} = \frac{X_{\frac{n}{2}} + X_{\frac{n}{2}+1}}{2} = \frac{X_{\frac{14}{2}} + X_{\frac{14}{2}+1}}{2} = \frac{X_7 + X_8}{2} = \frac{63 + 64}{2} = \frac{127}{2} = 63.5$$
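The odd/even rule for the median can be sketched in Python as follows. The function `median_by_position` is a hypothetical helper written for illustration; it is compared against `statistics.median` from the standard library.

```python
from statistics import median

def median_by_position(data):
    """Median via the (n+1)/2 rule for odd n and the n/2, n/2+1 rule for even n."""
    ordered = sorted(data)
    n = len(ordered)
    if n % 2 == 1:
        # 1-based position (n+1)/2 becomes 0-based index (n+1)//2 - 1.
        return ordered[(n + 1) // 2 - 1]
    # Average of the values at 1-based positions n/2 and n/2 + 1.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

example_1 = [23, 16, 31, 77, 21, 14, 32, 6, 155, 9, 36, 24, 5, 27, 19]
example_2 = [61, 62, 63, 64, 64, 60, 65, 61, 63, 64, 65, 66, 64, 63]

print(median_by_position(example_1), median(example_1))
print(median_by_position(example_2), median(example_2))
```

Note that the data must be sorted first; applying the position rule to unsorted data is a common mistake.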
The formula for computing the median for grouped data is given by
$$\text{median} = \tilde{x} = lcb_{\tilde{x}} + \frac{\left(\frac{n}{2} - lcf_p\right) \times w}{f_m}$$
Where: $lcb_{\tilde{x}}$ is the lower class boundary (lcb) of the median class.
$n$ is the total number of observations.
$f_m$ is the frequency of the median class.
$w$ is the width of the median class. Recall: $w = ucb - lcb$.
$lcf_p$ is the less-than cumulative frequency (lcf) corresponding to the class immediately preceding the median class.
The class corresponding to the smallest lcf which is $\geq \frac{n}{2}$ is called the median class,
so the median lies in this class.
Example
Find the median for the following data.
Daily production    80-89   90-99   100-109   110-119   120-129   130-139
Frequency             5       9       20         8         6         2
Solution: First construct the lcf table.
Daily production (CI)   80-89   90-99   100-109   110-119   120-129   130-139
Frequency ($f_i$)         5       9       20         8         6         2
lcf                       5      14       34        42        48        50
• To obtain the median class, calculate $\frac{n}{2} = \frac{50}{2} = 25$. Thus the smallest lcf
which is $\geq \frac{n}{2}$ is 34, so the class corresponding to this lcf, 100-109,
is the median class.
Therefore, $lcb_{\tilde{x}} = 99.5$, $w = 10$, $f_m = 20$, $lcf_p = 14$.
$$\text{median} = \tilde{x} = lcb_{\tilde{x}} + \frac{\left(\frac{n}{2} - lcf_p\right) \times w}{f_m} = 99.5 + \frac{(25 - 14) \times 10}{20} = 99.5 + 5.5 = 105$$
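The grouped-median steps (find the median class from the cumulative frequencies, then interpolate) can be sketched in Python. This is an illustration under a stated assumption: class limits are integers with a gap of 1 between consecutive classes, so each class boundary lies 0.5 beyond the stated limit. The function name `grouped_median` is hypothetical.

```python
def grouped_median(class_limits, freqs):
    """Median of a grouped frequency distribution.

    Assumption: integer class limits with a gap of 1 between classes,
    so lcb = lower limit - 0.5 and ucb = upper limit + 0.5.
    """
    n = sum(freqs)
    half = n / 2
    lcf_p = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(class_limits, freqs):
        if lcf_p + f >= half:
            # First class whose lcf reaches n/2: this is the median class.
            lcb = lower - 0.5
            width = (upper + 0.5) - lcb
            return lcb + (half - lcf_p) * width / f
        lcf_p += f
    raise ValueError("no classes given")

limits = [(80, 89), (90, 99), (100, 109), (110, 119), (120, 129), (130, 139)]
freqs = [5, 9, 20, 8, 6, 2]

print(grouped_median(limits, freqs))
```

For the daily production data this reproduces the median of 105 obtained by hand.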
Properties of the median
1. The median is unique.
2. It can be computed for an open ended frequency distribution if the
median does not lie in an open ended class.
3. It is not affected by extremely large or small values.
4. It can be computed for ratio level, interval level and ordinal level
data.
The Mode ($\hat{X}$)
• In everyday speech, something is “in the mode” if it is fashionable or popular.
• In statistics this “popularity” refers to frequency of observations.
• Therefore, the mode is the most frequently observed value in a set of
observations.
Example: Set A: 10, 10, 9, 8, 5, 4, 5, 12, 10; mode = 10 → unimodal.
Set B: 10, 10, 9, 9, 8, 12, 15, 5; mode = 9 and 10 → bimodal.
Set C: 4, 6, 7, 15, 12, 9; no mode.
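The three cases above (unimodal, bimodal, no mode) can be illustrated with a short Python sketch. The helper `modes` is hypothetical; it follows the convention used here that a data set in which every value occurs exactly once has no mode, and it assumes a non-empty list.

```python
from collections import Counter

def modes(data):
    """Return the sorted list of modes, or [] when every value occurs once."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []  # no value is repeated: no mode
    return sorted(value for value, count in counts.items() if count == highest)

set_a = [10, 10, 9, 8, 5, 4, 5, 12, 10]
set_b = [10, 10, 9, 9, 8, 12, 15, 5]
set_c = [4, 6, 7, 15, 12, 9]

print(modes(set_a), modes(set_b), modes(set_c))
```

A result with one element corresponds to a unimodal set, two elements to a bimodal set, and an empty list to a set with no mode.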
Mode for a grouped data
If the data are grouped into a frequency distribution of finite class intervals,
we do not know the value of every item, but we can easily determine the class with the highest
frequency.
• Therefore, the modal class is the class with the highest frequency.
• So that the mode of the distribution lies in this class.
To compute the mode for grouped data we use the formula:
$$\text{mode} = \hat{X} = lcb_{\hat{x}} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) \times w$$
where $\Delta_1 = f_m - f_p$, $\Delta_2 = f_m - f_s$, and
• $lcb_{\hat{x}}$ is the lcb of the modal class.
• $f_m$ is the frequency of the modal class.
• $f_p$ is the frequency of the class preceding the modal class.
• $f_s$ is the frequency of the class succeeding the modal class.
• $w$ is the width of the modal class.
Example:
The ages of newly hired, unskilled employees are grouped into the following
distribution. Compute the modal age.
Ages       18-20   21-23   24-26   27-29   30-32
Numbers      4       8      11      20       7
Solution: First we determine the modal class. The modal class is 27-29, since it has the highest
frequency. Thus, $lcb_{\hat{x}} = 26.5$, $w = 3$, $\Delta_1 = 20 - 11 = 9$, $\Delta_2 = 20 - 7 = 13$.
$$\text{Mode} = \hat{X} = lcb_{\hat{x}} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) \times w = 26.5 + \frac{9}{9 + 13} \times 3 = 26.5 + \frac{27}{22} \approx 26.5 + 1.2 = 27.7$$
The modal age of these newly hired employees is about 27.7 years (roughly 27 years and 9 months, since $26.5 + \frac{27}{22} \approx 27.73$).
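The grouped-mode formula can be sketched in Python for the age distribution above. Assumptions made for this illustration: integer class limits with a gap of 1 (so boundaries lie 0.5 beyond the limits), a single modal class whose neighbours have lower frequencies, and a frequency of 0 for a missing neighbour when the modal class is the first or last class. The function name `grouped_mode` is hypothetical.

```python
def grouped_mode(class_limits, freqs):
    """Mode of a grouped frequency distribution (single modal class assumed)."""
    i = freqs.index(max(freqs))           # position of the modal class
    lower, upper = class_limits[i]
    lcb = lower - 0.5                     # lower class boundary
    width = (upper + 0.5) - lcb           # class width w = ucb - lcb
    f_m = freqs[i]
    f_p = freqs[i - 1] if i > 0 else 0               # preceding class
    f_s = freqs[i + 1] if i < len(freqs) - 1 else 0  # succeeding class
    delta_1 = f_m - f_p
    delta_2 = f_m - f_s
    return lcb + delta_1 / (delta_1 + delta_2) * width

ages = [(18, 20), (21, 23), (24, 26), (27, 29), (30, 32)]
numbers = [4, 8, 11, 20, 7]

print(round(grouped_mode(ages, numbers), 2))
```

The unrounded result is slightly above 27.7, which is why the fractional part corresponds to closer to nine months than seven.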
Example 2: The following table shows the distribution of a group of families
according to their expenditure per week. The median and the mode of the following
distribution are known to be 25.50 Birr and 24.50 Birr respectively. Two frequency
values are, however, missing from the table. Find the missing frequencies.
Class interval   1-10   11-20   21-30   31-40   41-50
Frequency         14    $f_2$    27     $f_4$    15
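One possible way to approach Example 2 is a brute-force search, sketched below in Python. It relies on an observation that is not spelled out in the example: both 25.50 and 24.50 lie between the boundaries 20.5 and 30.5, so the class 21-30 must be both the median class and the modal class, which in turn means $f_2$ and $f_4$ cannot exceed 27. The function name and search range are assumptions made for this illustration.

```python
def median_and_mode(f2, f4):
    """Grouped median and mode, taking 21-30 as both median and modal class."""
    freqs = [14, f2, 27, f4, 15]
    n = sum(freqs)
    lcb, width, f_m = 20.5, 10, 27
    lcf_p = 14 + f2                       # cumulative frequency before 21-30
    med = lcb + (n / 2 - lcf_p) * width / f_m
    delta_1, delta_2 = f_m - f2, f_m - f4
    mode = lcb + delta_1 / (delta_1 + delta_2) * width
    return med, mode

# Try every pair of whole-number frequencies below the modal frequency 27.
solutions = [
    (f2, f4)
    for f2 in range(0, 27)
    for f4 in range(0, 27)
    if abs(median_and_mode(f2, f4)[0] - 25.5) < 1e-9
    and abs(median_and_mode(f2, f4)[1] - 24.5) < 1e-9
]

print(solutions)
```

The same result follows algebraically: the median condition gives $f_4 = f_2 - 1$ and the mode condition gives $3f_2 - 2f_4 = 27$, so $f_2 = 25$ and $f_4 = 24$.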
Properties of mode
• It is not affected by extreme values of a set of observations.
• It can be calculated for distribution with open ended classes.
• It can be computed for all levels of data i.e. nominal, ordinal, interval and ratio.
• The main drawback of mode is that often it does not exist.
• Often its values are not unique.
4 Measures of Variation (Dispersion)
4.1. Introduction
Measures of central tendency locate the center of the distribution.
But they do not tell how individual observations are scattered on either side of the
center.
The spread of observations around the center is known as dispersion or variability.
In other words, the degree to which numerical data tend to spread about an average
value is called dispersion or variation of the data.
Small dispersion indicates high uniformity of the observations, while larger dispersion
indicates less uniformity.
Objective of measure of dispersion
• To measure reliability of the average being used.
• To control variability itself.
• To compare two or more groups of numbers in terms of their variability.
Absolute and Relative Measures of Dispersion
• Measures of dispersion can be classified as absolute or relative.
a. Absolute measures of dispersion: measures of dispersion which are expressed in
terms of the original unit of a series.
• Such measures are not suitable for comparing the variability of two distributions
which are expressed in different units of measurement or have different average sizes.
b. Relative measures of dispersions
• They are a ratio or percentage of a measure of absolute dispersion to an
appropriate measure of central tendency.
• These are pure numbers independent of the units of measurement.
• Relative measures are used for comparing the variability of two
distributions (even if they are measured in different units).
Types of Measures of Dispersion
1. The Range (R) and Relative Range (RR)
The range is the difference between the largest and the smallest observation. That is
$$R = \begin{cases} X_{max} - X_{min} & \rightarrow \text{for raw and ungrouped data} \\ UCL_{last} - LCL_{first} & \rightarrow \text{for grouped data} \end{cases}$$
where $UCL_{last}$ is the upper class limit of the last class and $LCL_{first}$ is the lower class limit of the first class.
• It is a quick but crude measure of variability.
• Because the range is greatly affected by extreme values, it may give a distorted picture of
the scores.
• Range is a measure of absolute dispersion and as such cannot be used for comparing
variability of two distributions expressed in different units.
• The solution is to use the relative range or any other relative measure of variation.
$$\text{Relative Range (RR)} = \frac{R}{X_{max} + X_{min}}, \text{ also known as the coefficient of range.}$$
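As a small illustration, the range and relative range can be computed in Python for the 15 observations used in the first median example. The function names are hypothetical helpers.

```python
def value_range(data):
    """Range R = X_max - X_min for raw data."""
    return max(data) - min(data)

def relative_range(data):
    """Relative range (coefficient of range) = R / (X_max + X_min)."""
    return (max(data) - min(data)) / (max(data) + min(data))

data = [23, 16, 31, 77, 21, 14, 32, 6, 155, 9, 36, 24, 5, 27, 19]

print(value_range(data), relative_range(data))
```

Here the single extreme value 155 dominates the range, which illustrates why the range can give a distorted picture of the spread.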
2. Variance and Standard Deviation
Variance: the average of the squared deviations taken from the mean.
Suppose that $x_1, x_2, \ldots, x_N$ is the set of observations on a population of size N.
Then,
$$\text{Population variance} = \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2 - N\mu^2}{N} \quad \rightarrow \text{for population}$$
$$\text{Sample variance} = s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1} \quad \rightarrow \text{for sample}$$
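• In general, the sample variance is computed by:
$$s^2 = \begin{cases} \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \dfrac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1} & \rightarrow \text{for raw data} \\[2ex] \dfrac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{k} f_i - 1} = \dfrac{\sum_{i=1}^{k} f_i x_i^2 - n\bar{x}^2}{n - 1} & \rightarrow \text{for ungrouped data} \\[2ex] \dfrac{\sum_{i=1}^{k} f_i (m_i - \bar{x})^2}{\sum_{i=1}^{k} f_i - 1} = \dfrac{\sum_{i=1}^{k} f_i m_i^2 - n\bar{x}^2}{n - 1} & \rightarrow \text{for grouped data} \end{cases}$$
Standard Deviation: the square root of the variance.
It is a measure of how far, on average, an individual measurement is from the mean.
$$S.d = \sqrt{\text{variance}} = \sqrt{S^2} = S$$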
Example:
Suppose the data given below indicate the time in minutes required for a laboratory experiment to
complete a certain laboratory test. Calculate the variance and standard deviation for the
following data.
$x_i$     32    36    40    44    48    Total
$f_i$      2     5     8     4     1     20
Solution:
$x_i$          32     36      40      44     48     Total
$f_i$           2      5       8       4      1       20
$f_i x_i$      64    180     320     176     48      788
$f_i x_i^2$  2048   6480   12800    7744   2304    31376
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{n} = \frac{788}{20} = 39.4$$
$$s^2 = \frac{\sum_{i=1}^{k} f_i x_i^2 - n\bar{x}^2}{n - 1} = \frac{31376 - 20 \times 39.4^2}{19} = 17.31, \quad S = \sqrt{17.31} = 4.16$$
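The laboratory-time example can be reproduced with a short Python sketch that computes the sample variance both from the definition and from the shortcut formula. Variable names are illustrative assumptions.

```python
from math import sqrt

# Times (minutes) and frequencies from the example.
times = [32, 36, 40, 44, 48]
freqs = [2, 5, 8, 4, 1]

n = sum(freqs)
sum_fx = sum(f * x for x, f in zip(times, freqs))
sum_fx2 = sum(f * x ** 2 for x, f in zip(times, freqs))
mean = sum_fx / n

# Definition: sum of f_i * (x_i - mean)^2 divided by n - 1.
var_definition = sum(f * (x - mean) ** 2 for x, f in zip(times, freqs)) / (n - 1)

# Shortcut: (sum of f_i * x_i^2 - n * mean^2) divided by n - 1.
var_shortcut = (sum_fx2 - n * mean ** 2) / (n - 1)

sd = sqrt(var_definition)

print(n, sum_fx, sum_fx2, mean)
print(round(var_definition, 2), round(var_shortcut, 2), round(sd, 2))
```

Both forms give the same variance up to floating-point rounding; the shortcut is convenient for hand calculation because it only needs the column totals.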
Properties of Variance
1. The variance is always non-negative ($s^2 \geq 0$).
2. If every element of the data is multiplied by a constant "c", then the new variance is
$s^2_{new} = c^2 \times s^2_{old}$.
3. When a constant is added to all elements of the data, then the variance does not
change.
4. The variance of a constant (c) measured n times is zero, i.e. $Var(c) = 0$.
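These properties can be checked numerically. The sketch below uses the small data set 3, 8, 4 and the constants 3 and 10, all chosen only for illustration, together with `statistics.variance`, which computes the sample variance with divisor n - 1.

```python
from statistics import variance

data = [3, 8, 4]
c = 3

base = variance(data)                          # sample variance of the data
scaled = variance([c * x for x in data])       # every value multiplied by c
shifted = variance([x + 10 for x in data])     # a constant added to every value
constant = variance([5, 5, 5])                 # the same value measured 3 times

print(base, scaled, shifted, constant)
```

The scaled variance equals $c^2$ times the original, the shifted variance is unchanged, and the variance of a repeated constant is zero.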
Uses of the Variance and Standard Deviation
• They can be used to determine the spread of the data. If the variance or S.D is large,
then the data are more dispersed.
• They are used to measure the consistency of a variable.
• They are used quite often in inferential statistics.
3. Coefficient of Variation (C.V)
Whenever two groups have the same units of measurement, the variance
and standard deviation for each can be compared directly.
A statistic that allows one to compare two groups when the units of
measurement are different is called the coefficient of variation.
It is computed by:
$$C.V = \frac{S}{\bar{x}} \times 100\% \quad \rightarrow \text{for sample}$$
Example
The following data refer to the hemoglobin level for 5 male and 5 female
students. In which case does the hemoglobin level have higher variability (less
consistency)?
Males     13    13.8   14.6   15.6   17
Females   12    12.5   13.8   14.6   15.6
Solution:
$$\bar{x}_{male} = \frac{13 + 13.8 + 14.6 + 15.6 + 17}{5} = \frac{74}{5} = 14.8$$
$$\bar{x}_{female} = \frac{12 + 12.5 + 13.8 + 14.6 + 15.6}{5} = \frac{68.5}{5} = 13.7$$
• $s^2_{males} = \dfrac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1} = 2.44$, $S_{males} = \sqrt{2.44} = 1.56$
• $s^2_{females} = \dfrac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1} = 2.19$, $S_{females} = \sqrt{2.19} = 1.48$
• $C.V_{males} = \dfrac{S_{males}}{\bar{x}_{male}} \times 100\% = \dfrac{1.56}{14.8} \times 100\% = 10.54\%$
• $C.V_{females} = \dfrac{S_{females}}{\bar{x}_{female}} \times 100\% = \dfrac{1.48}{13.7} \times 100\% = 10.80\%$
Therefore, the variability in hemoglobin level is higher for females than for males,
because $C.V_{females} > C.V_{males}$. In other words, there is less consistency among
females than males in hemoglobin level.
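The hemoglobin comparison can be sketched in Python using the values that appear in the sums of the solution. The function `coefficient_of_variation` is a hypothetical helper built on `statistics.stdev` (sample standard deviation, divisor n - 1) and `statistics.mean`.

```python
from statistics import mean, stdev

def coefficient_of_variation(sample):
    """C.V = (sample standard deviation / sample mean) * 100, in percent."""
    return stdev(sample) / mean(sample) * 100

males = [13, 13.8, 14.6, 15.6, 17]
females = [12, 12.5, 13.8, 14.6, 15.6]

cv_males = coefficient_of_variation(males)
cv_females = coefficient_of_variation(females)

print(round(cv_males, 2), round(cv_females, 2))
print("more variable:", "females" if cv_females > cv_males else "males")
```

The male figure may differ from the hand calculation in the second decimal place, because the hand calculation rounds the standard deviation to 1.56 before dividing. The conclusion is unaffected.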
4. Standard Scores (Z-Scores)
Z gives the number of standard deviations a particular observation lies above
or below the mean.
It is used for describing the relative position of a single score in the entire
set of data in terms of the mean and standard deviation.
If X is a measurement (an observation) from a distribution with mean $\bar{x}$
and standard deviation S, then its value in standard units is
$$Z = \frac{x - \bar{x}}{S} \quad \rightarrow \text{for sample}$$
• A positive Z-score indicates that the observation is above the mean.
• A negative Z-score indicates that the observation is below the mean.
Example:
• Two sections were given an examination in a certain course. For section 1, the
average mark (score) was 72 with a standard deviation of 6, and for section 2, the
average mark (score) was 85 with a standard deviation of 7. If student A from
section 1 scored 84 and student B from section 2 scored 90, who performed
better relative to their group?
Solution: The Z-score for A is $Z_A = \dfrac{x - \bar{x}}{S} = \dfrac{84 - 72}{6} = 2$
The Z-score for B is $Z_B = \dfrac{x - \bar{x}}{S} = \dfrac{90 - 85}{7} = 0.71$
Both Z-scores are positive, indicating that both observations are above their
section means.
• Comparing the two scores, since $Z_A > Z_B$, i.e. 2 > 0.71, student A performed
better relative to his group than student B.
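The comparison of the two students can be written as a short Python sketch. The function name `z_score` is a hypothetical helper.

```python
def z_score(x, mean, sd):
    """Standard score: number of standard deviations x lies from the mean."""
    return (x - mean) / sd

z_a = z_score(84, 72, 6)   # student A, section 1
z_b = z_score(90, 85, 7)   # student B, section 2

print(z_a, round(z_b, 2))
print("better relative standing:", "A" if z_a > z_b else "B")
```

Although student B has the higher raw mark, student A lies further above the mean of his own section once the scores are standardized.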